drums_rnn is a Magenta model that generates drum sequences using an LSTM (long short-term memory) network. LSTMs are a kind of recurrent neural network – a neural network with loops, allowing its activations to depend on a series of previous events. Plain recurrent networks can struggle to retain information about events far back in a sequence, and the LSTM architecture is designed to overcome this.
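
For intuition only, here is a minimal sketch (not drums_rnn's actual implementation) of the general shape of such a model: stacked LSTM layers followed by a softmax over a vocabulary of drum events, predicting the event at the next time step. The layer sizes and vocabulary size are illustrative assumptions.

```python
import tensorflow as tf

VOCAB_SIZE = 512  # illustrative: one class per possible drum-hit combination (see below)

model = tf.keras.Sequential([
    # Two stacked LSTM layers; the sizes here are placeholders, not drums_rnn's.
    tf.keras.layers.LSTM(256, return_sequences=True),
    tf.keras.layers.LSTM(256, return_sequences=True),
    # Softmax over the event vocabulary: a distribution over the next step's drums.
    tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```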

In terms of encoding, drums are polyphonic – i.e. multiple drums can be struck at the same time. drums_rnn handles this by using a single value to denote the whole set of drums struck at a particular time step, and by mapping the many different MIDI drum pitches onto a smaller number of drum classes.
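
As a rough illustration of that pitch-to-class mapping (the exact grouping here is an assumption; Magenta defines its own table), many General MIDI percussion pitches can be collapsed into 9 classes like so:

```python
# Illustrative mapping from General MIDI percussion pitches to 9 drum classes.
# The exact grouping used by drums_rnn may differ; this only shows the idea.
DRUM_CLASSES = [
    ("kick",          [35, 36]),
    ("snare",         [38, 40]),
    ("closed hi-hat", [42, 44]),
    ("open hi-hat",   [46]),
    ("low tom",       [41, 45]),
    ("mid tom",       [47, 48]),
    ("high tom",      [50]),
    ("crash",         [49, 57]),
    ("ride",          [51, 59]),
]
PITCH_TO_CLASS = {p: i for i, (_, pitches) in enumerate(DRUM_CLASSES)
                  for p in pitches}

def classes_at_step(pitches):
    """Map the MIDI pitches struck at one time step to a set of class indices."""
    return {PITCH_TO_CLASS[p] for p in pitches if p in PITCH_TO_CLASS}
```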

It offers two configurations:

  • a single drum, which uses a binary encoding (on or off) to indicate whether at least one drum is struck at each time step.
  • a 9-piece drum kit, where the set of drums struck at each time step is encoded as a one-hot vector of length 512. Each of the 512 positions corresponds to one of the 2^9 possible combinations of the 9 drum classes, so exactly one position is on at a time (see the sketch after this list).
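
One way to picture the drum-kit encoding: treat the 9 drum classes as bits of an integer, so every combination of struck drums maps to a number between 0 and 511, which is the index of the single "on" position in the one-hot vector. A small sketch, assuming the class indices from the mapping above:

```python
def drum_kit_index(class_indices):
    """Encode a set of drum classes (0-8) as one of 512 combination indices."""
    index = 0
    for c in class_indices:
        index |= 1 << c            # set the bit for each struck drum class
    return index                   # 0 means a rest (no drum struck)

def one_drum_bit(class_indices):
    """Binary encoding for the single-drum configuration: any hit vs. rest."""
    return 1 if class_indices else 0

# Example: kick (class 0) + closed hi-hat (class 2) -> index 0b101 == 5
assert drum_kit_index({0, 2}) == 5
assert one_drum_bit(set()) == 0
```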

To train a model, one can convert MIDI files into 'NoteSequences', which are stored as protocol buffers – a format for serializing structured data. These can then be fed into the model, with either of the configuration options mentioned above. One can prime the model with a drum sequence encoded like so –  [(36, 42), (), (42,)] , where each element of the list is a time step and the numbers are the MIDI pitches of the drums struck. This example is kick+hi-hat; rest; hi-hat.
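
As a small sketch of what such a primer looks like as a NoteSequence (assuming the note_seq package, and treating each list element as one 16th-note step at 120 qpm – both assumptions for illustration):

```python
from note_seq.protobuf import music_pb2

primer = music_pb2.NoteSequence()
primer.tempos.add(qpm=120)

STEP = 0.125                       # one 16th note at 120 qpm, in seconds
hits = [(36, 42), (), (42,)]       # kick+hi-hat; rest; hi-hat
for i, drums in enumerate(hits):
    for pitch in drums:
        # Drum hits are ordinary notes flagged with is_drum=True.
        primer.notes.add(pitch=pitch, velocity=100, is_drum=True,
                         start_time=i * STEP, end_time=(i + 1) * STEP)
primer.total_time = len(hits) * STEP
```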