Other ADT articles in the series:
- Part 1: Pre-processing and EDA
- Part 2: Synthetic dataset and first NN (coming soon)
- Part 3: Align (coming soon)
- Part 4: Final NN (coming soon)
Some people have a garden for letting off steam, some exercise, some go to wood shops, some prefer music.
I personally like drums, because when you are behind the kit, there's you, there's your kit, there's your body, completely unrelated to anything you do all day long, and - this is wonderful - there's no tomorrow and there are no tomorrow-related issues. There's only now. It helps. A lot.
So, if you've ever thought about taking drum lessons, don't think twice - just try, you'll love it. And if you haven't, then consider thinking about it.
So one day I came across a post about spleeter - a track separation tool built with the help of machine learning. Among other tracks, it is capable of separating drums, and does it quite decently. And I thought, 'Well, what about a tool that could automatically produce a written drum score from a song?'
And sure enough, there are papers on the matter dating back as far as the 80s, which is super sick, if you think about it. The task itself is known as 'Automatic Drum Transcription'. One particular paper on the issue is just incredible for the tons of work the researchers put into it. It contains a lot of info about the history, existing solutions and types of problems. They even implemented different solutions from more or less recent papers, compared them and put their code online. So if you are interested, definitely check it out.
So, in general, when you move from a sparkling idea to actually solving the problem, you ask yourself these questions all at once:
- Where could I possibly get lots of data in a unified format? Or is it possible to convert diverse data from different sources to a unified format?
- What challenges should I overcome and how? Why might it not work at all, and what step could break all the previous work?
- Is it worth putting my time and effort into it at all? What am I getting out of it?
- What solutions are out there and what should I read on the topic?
- Do I have anybody to consult if I get stuck (it saves time, but you can do anything you want without it)?
Some people with musical experience recommended data sources to me. Plenty of data, all in a unified format (Guitar Pro tabs), and there's a Python library that parses them (only specific versions though, but hey, it saves time and effort, so isn't it wonderful?). The docs are great, and the only thing that confused me a bit was the number of repetitions in alternate endings. Turns out, all you need to do is convert the number from decimal to binary, and then the positions of the ones will tell you on which repetitions the alternate ending occurs.
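To make the binary trick concrete, here is a minimal sketch. The function name is made up for illustration, and it assumes that the lowest bit stands for the first repetition, which is worth checking against the parser's docs.

```python
def alternate_ending_passes(mask: int) -> list[int]:
    """Return 1-based repetition numbers encoded as set bits in `mask`.

    Assumption: the least significant bit corresponds to repetition 1.
    """
    passes = []
    position = 1
    while mask:
        if mask & 1:          # this bit is a one, so the ending is played here
            passes.append(position)
        mask >>= 1            # move on to the next binary digit
        position += 1
    return passes


# 5 is 101 in binary: ones at positions 1 and 3
print(alternate_ending_passes(5))  # [1, 3]
```

So a stored value of 5 would mean "play this ending on the 1st and 3rd pass", under the bit-order assumption above.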
Expanding repetitions was a bit challenging, and I was so pissed off with them (damn you, vague musical notation rules and nested brackets). I am pretty sure I still have bugs somewhere in my code and not enough fail-safes, but whatever. I couldn't sleep, and in my dreams I was recursively opening brackets while other people were trying to stop me. What an interesting life I have! /s
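For a feel of what "recursively opening brackets" means, here is a toy sketch, not the actual code behind this post. It assumes a hypothetical flat encoding where a string is a measure, `"["` opens a repeat and `("]", n)` closes it with "play n times". It ignores alternate endings and other notation quirks completely.

```python
def expand_repeats(tokens):
    """Expand (possibly nested) repeat brackets into a flat list of measures.

    Hypothetical encoding: a string is a measure, "[" opens a repeat,
    and ("]", n) closes it, meaning the bracketed part is played n times.
    """
    def parse(pos):
        out = []
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == "[":
                # Recurse into the bracket, then repeat what came back
                inner, pos, times = parse(pos + 1)
                out.extend(inner * times)
            elif isinstance(tok, tuple) and tok[0] == "]":
                # Close the current bracket and report its repeat count
                return out, pos + 1, tok[1]
            else:
                out.append(tok)
                pos += 1
        # End of input: an unclosed bracket is simply played once
        return out, pos, 1

    expanded, _, _ = parse(0)
    return expanded


tokens = ["A", "[", "B", "[", "C", ("]", 2), ("]", 2), "D"]
print(expand_repeats(tokens))
# ['A', 'B', 'C', 'C', 'B', 'C', 'C', 'D']
```

Real tabs need more fail-safes than this (stray closing brackets, alternate endings inside repeats), which is exactly where the pain comes from.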
Our world is huge and very diverse, and this is true for humans, their culture and their everyday life. That means whatever data you have is probably skewed and biased. There's no way to find all the possible imbalances in your data. And weird effects and outcomes are always possible, so being skeptical is beneficial.
On the other hand, sometimes you want your data to be intentionally biased. For example, if I don't know anything about the traditional music of Latin America or South-East Asia, I won't be able to analyze it. The thing is, our perception of music is heavily influenced by the culture we live in. For me it is mostly the western musical tradition.
This isn't the only reason I intentionally left some music genres out of my data. It's better to have some limit on the number of musical instruments to recognize from audio, and I am comfortable with a standard 5-piece drum kit, heavily used in western music. I also excluded some genres that use electronic samples a lot, hoping to deal with instrumental music in the vast majority of cases.
So the first attempt to analyze the data revealed that there are instruments I am not even familiar with. That led me to the idea of using the MusicBrainz search API to obtain band information, and using that information to intentionally bias my dataset towards music I am familiar with. I used my previous super-old work based on MusicBrainz and some other sources, which explored genre importance and relations. Here's a post from my colleague about how we did it (RU), and if we had known about pandas and numpy at the time, it would have been much easier and more elegant, but we live in bubbles unfortunately.
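A rough sketch of what such a lookup could look like, not the code used for this post. It only builds the search URL and reads tags out of an already decoded JSON response, so nothing is sent over the network. The function names are made up, the band name is just an example, and the response shape (an `artists` list whose items may carry `tags` with `name` and `count`) is my understanding of the search output, so verify it against the MusicBrainz docs before relying on it.

```python
from urllib.parse import urlencode

# MusicBrainz web service search endpoint for artists
MB_ARTIST_SEARCH = "https://musicbrainz.org/ws/2/artist/"


def artist_search_url(band_name: str, limit: int = 1) -> str:
    """Build a search URL for a band name (no request is sent here)."""
    params = {"query": f'artist:"{band_name}"', "fmt": "json", "limit": limit}
    return MB_ARTIST_SEARCH + "?" + urlencode(params)


def tag_names(search_response: dict) -> list[str]:
    """Collect tag names of the best match, most-voted first.

    Assumes the decoded JSON looks like {"artists": [{"tags": [...]}]}.
    """
    artists = search_response.get("artists", [])
    if not artists:
        return []
    tags = artists[0].get("tags", [])
    return [t["name"] for t in sorted(tags, key=lambda t: -t.get("count", 0))]


print(artist_search_url("Metallica"))
```

If you do send real requests, MusicBrainz asks for a meaningful User-Agent header and polite request rates.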
It is worth noting, though, that I have no idea how those tabs were collected and what other biases might be planted in the dataset. For example, there is a notably big chunk of Russian music in the data, since it comes from the Russian side of the Internet, not because Russians make more music than other parts of the world, obviously. Some things aren't that obvious though. We, as humans, are really passionate when it comes to our music choices, bordering on elitism and snobbery. The tabs aren't 100% correct and could be absolute garbage in many cases. I definitely have some bugs in my code. So whatever conclusions I come up with here could be true for this dataset only, not for entire music genres or music in general. As I said, it is always good to be skeptical.
I ended up filtering out a lot of percussion instruments and treating some playing techniques or even different instruments as the same instrument. For example, I treat all the toms as just one tom, and it doesn't matter to me how and where you strike a ride cymbal - it's gonna be either bell or body. A side stick snare, however, is distinct from a plain snare hit. That resulted in 14 possible instruments.
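The merging can be pictured as a simple lookup table. The sketch below is only an illustration: it assumes drum notes come as General MIDI percussion numbers, the label names are made up, and the table is a partial example, not the actual list of 14 instruments.

```python
# Partial, illustrative mapping from General MIDI percussion notes
# to merged instrument labels (label names are hypothetical).
CANONICAL = {
    35: "kick", 36: "kick",
    37: "side_stick",                       # kept apart from a plain snare hit
    38: "snare", 40: "snare",
    41: "tom", 43: "tom", 45: "tom",        # all toms collapse into one "tom"
    47: "tom", 48: "tom", 50: "tom",
    42: "hihat_closed", 46: "hihat_open",
    51: "ride_body", 59: "ride_body",
    53: "ride_bell",
}


def canonical_hits(midi_notes):
    """Map raw notes to merged labels, dropping anything not in the table."""
    return frozenset(CANONICAL[n] for n in midi_notes if n in CANONICAL)


# Kick + two different toms + cowbell (56, filtered out)
print(sorted(canonical_hits([36, 45, 50, 56])))  # ['kick', 'tom']
```

Returning a `frozenset` means two toms hit together count as a single "tom", and the result can be used directly as a dictionary key later on.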
Here are the genres most represented in the data I had after some filtering (mostly rock and metal sub-genres):
I did not spiral deeper into comparing genres. Don't get me wrong, I do not think they are all too similar; it's an interesting topic, but a bit tedious. Plus, I did not bother with beat durations, playing techniques, tempo or time signatures. That would consume too much time. What I actually needed the most was an understanding of how much diversity there is in drum "phonemes". I might do a better analysis later, because there could be lots of interesting insights.
Now, if you treat each combination of strokes occurring at the same time as a phoneme, here's what the most popular combinations look like:
So how many combinations have I got in the end?
That's actually moderate, given 14 instruments. There are still some impossible combinations (because drummers don't have three arms, for example), but we aren't going to address that. Chopping off combinations of more than 4 simultaneous strokes already helped dramatically.
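To put the "length > 4" cut into perspective, here is a small sketch that counts phonemes and computes the theoretical ceiling. The beats are toy data and the names are made up. The ceiling is just combinatorics for 14 instruments, not the number of combinations observed in the dataset.

```python
from collections import Counter
from math import comb

N_INSTRUMENTS = 14
MAX_SIMULTANEOUS = 4


def count_phonemes(beats):
    """Count combinations of simultaneous strokes, skipping rests and
    anything longer than MAX_SIMULTANEOUS."""
    counts = Counter(frozenset(b) for b in beats if b)
    return Counter({c: n for c, n in counts.items() if len(c) <= MAX_SIMULTANEOUS})


# Toy beats: each item is the set of instruments struck at the same moment
beats = [{"kick", "hihat_closed"}, {"snare"}, {"hihat_closed", "kick"}, set()]
print(count_phonemes(beats).most_common(1))

# Non-empty subsets of 14 instruments, with and without the size cap
print(2 ** N_INSTRUMENTS - 1)                                   # 16383
print(sum(comb(N_INSTRUMENTS, k) for k in range(1, MAX_SIMULTANEOUS + 1)))  # 1470
```

So the cap alone shrinks the space of possible phonemes from 16383 to 1470, before even looking at which ones actually occur.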
I also tried (out of curiosity) to convert my combos into Unicode characters and then use sentencepiece (BPE) to see which patterns are more prevalent. Since I don't have a nice and easy-to-use visualization, it's a bit tricky to show something nice about the result. But just like with words and Zipf's law, the most popular tokens aren't really meaningful ('The Of And To A In Is I'), and neither are the rarest ones. The meaning is somewhere in between. Not a shocking conclusion, I know.
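The "combos into Unicode" step could look roughly like this. It is a sketch with made-up function names. Starting the alphabet at U+4E00 is my own assumption, picked only because that block is large and contains no whitespace or control characters.

```python
def build_alphabet(combos, start=0x4E00):
    """Assign one Unicode character to each distinct combination.

    Combos are sorted first so the mapping is reproducible between runs.
    """
    distinct = sorted(set(combos), key=sorted)
    return {combo: chr(start + i) for i, combo in enumerate(distinct)}


def encode_track(beats, alphabet):
    """Turn a sequence of beats into a string, one character per beat."""
    return "".join(alphabet[frozenset(b)] for b in beats)


beats = [{"kick", "hihat_closed"}, {"hihat_closed"}, {"snare", "hihat_closed"},
         {"hihat_closed"}]
alphabet = build_alphabet(frozenset(b) for b in beats)
text = encode_track(beats, alphabet)
print(len(text), len(set(text)))  # 4 characters, 3 distinct ones
```

Each track then becomes one line of "text", and a file of such lines is something a subword tokenizer like sentencepiece can be trained on.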
So I used the tokenized tabs to see how similar some music genres are in terms of their drum parts, and well, most of them are similar. Somehow, instrumental rock stands out here and I don't know why. Could be just a bug.
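One common way to compare genres like this is cosine similarity between token frequency counts. The sketch below shows that idea on toy counts. It is an assumption about a reasonable measure, not necessarily the one behind the results in this post.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if one is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy token counts per genre (made-up numbers, for illustration only)
genre_a = Counter({"tok1": 10, "tok2": 5, "tok3": 1})
genre_b = Counter({"tok1": 8, "tok2": 6})
genre_c = Counter({"tok4": 7})

print(round(cosine_similarity(genre_a, genre_b), 3))
print(cosine_similarity(genre_a, genre_c))  # no shared tokens
```

With a measure like this, a handful of very frequent tokens shared by every genre can push all the scores up, which may be one reason most genres end up looking alike.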
But on a lower level (strikes), the similarity picture makes more sense:
If somebody is going to conduct a more focused, in-depth analysis, please let me know, I'd be really curious to see the results!
As for visualization, there isn't much that could help with it. I personally hate standard musical notation, but my search for an alternative did not give me anything worth considering either. If you know how to solve this problem using Python, please let me know as well, that would be of great help!
Special thanks and sources
- Huge thanks to Alexander Veysov for the help and advice.
P.S. On the internet, nobody knows you're a cat.