Context-free grammar with audio samples from American pharmaceutical ads (view here)


I had the (unfortunate) experience of watching a lot of American television the last time I was visiting New York, and was amazed by the number of advertisements for prescription drugs I saw. It turns out the US is one of only two countries where direct-to-consumer advertising for prescription drugs is legal (New Zealand is the other).

I decided to collect a bunch of audio samples from these ads. I used youtube-dl to download a whole playlist that somebody was kind enough to compile. At first I tried using video-grep to extract video snippets corresponding to certain words and phrases. However, the subtitles were not aligned well enough to give me the exact portions I wanted, and since I wasn't yet sure what exactly I was looking for or where I was headed, I decided to work with just the audio and extract samples manually. [I learnt the hard way how to do this correctly with Adobe Audition – the first time round I created 50 samples, all of duration 0:00.]

I wanted to create a rule-based system for playing the samples, inspired by methods seen in this great presentation on generative music. I first categorized the samples as "problems" (recordings of short phrases such as 'chest-pain' and 'adult ADD'), "talk-to-your-doctor" (various recordings of similar phrases) and "free" (recordings such as 'i got my first prescription free', and 'free for up to one year'). Using the Sequence object in Tone.js I created rules for playing these samples, along with some pitch shifting and randomization within the categories. The results were just okay, but a good starting point:

I then added another category of sample with the words "side effects" and used the scheduleRepeat function to play a sample at set intervals. I created a small set of rules to decide how to choose samples, and how they should mutate/change over time. (Rules below, with each letter corresponding to a certain category or sample)

let rules = {
  "A" : "BB",
  "B" : "DAE",
  "C" : "AC",
  "D" : "DB",
  "E" : "CA"
};

let str = "ABAB";
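
As a rough illustration of how a rule table like this can drive a sequence, here is a minimal sketch of the rewriting step. It assumes every symbol in the string is replaced in parallel on each pass and that symbols without a rule are left alone; the post does not say exactly how the rules were applied, and the helper names `expandOnce` and `expand` are made up for this example.

```javascript
// Rules and seed string copied from the post.
const rules = {
  A: "BB",
  B: "DAE",
  C: "AC",
  D: "DB",
  E: "CA",
};

// Hypothetical helper: rewrite every symbol once, in parallel.
// Symbols without a rule are copied through unchanged (an assumption).
function expandOnce(str, ruleSet) {
  return Array.from(str, (symbol) => ruleSet[symbol] ?? symbol).join("");
}

// Hypothetical helper: apply the rewrite step a fixed number of times.
function expand(seed, ruleSet, generations) {
  let current = seed;
  for (let i = 0; i < generations; i++) {
    current = expandOnce(current, ruleSet);
  }
  return current;
}

console.log(expand("ABAB", rules, 1)); // "BBDAEBBDAE"
```

Each letter of the resulting string would then be mapped to a category or sample and played in order. Because every rule here yields at least two symbols, the string grows on every pass, so in practice the number of generations needs to stay small.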

This was still not very satisfying, but the context-free grammar direction seemed promising. I tried adding more ways of categorizing the samples, and tried playing 2 simultaneous tracks each with their own set of rules. While messing around, I stopped applying the rules and constructed the sequences manually, and realized there were certain repetitions and orderings that sounded better than others, and creating an overall repetitive structure helped in creating a rhythm of sorts.

Working with this, I added more samples, and explored different permutations and combinations that sounded good.

At this point I took a detour of sorts, building on the idea of creating ordered/patterned sets of sequences (e.g. 1-1-2-1). I created an array of 4 such patterned sequences, and then added rules to replace all similar elements of a single sequence with either a randomly generated set of samples, or a set that I had marked/chosen as sounding good. I tried various versions of this, with different lengths of patterned sequences and different methods of replacing the sounds.
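
To make the patterned-sequence idea concrete, here is a small sketch of filling a pattern such as 1-1-2-1 so that equal slot numbers always receive the same sound. It simplifies things by assigning a single sample per slot rather than a set, the sample names are placeholders, and `fillPattern` is a hypothetical helper rather than the code used in the project.

```javascript
// Placeholder sample names; the post does not list the actual files.
const samplePool = ["chest-pain", "adult-ADD", "side-effects", "ask-your-doctor", "free"];

// Hypothetical helper: every distinct slot number in the pattern gets one
// randomly chosen sample, and repeated slot numbers reuse that choice.
// `random` is injectable so a run can be reproduced.
function fillPattern(pattern, pool, random = Math.random) {
  const assignment = new Map();
  return pattern.map((slot) => {
    if (!assignment.has(slot)) {
      assignment.set(slot, pool[Math.floor(random() * pool.length)]);
    }
    return assignment.get(slot);
  });
}

// The 1-1-2-1 pattern from the text; the result varies from run to run.
console.log(fillPattern([1, 1, 2, 1], samplePool));
```

Swapping the random pick for a lookup into a hand-chosen list would give the "marked as sounding good" variant.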

This approach led to very slowly changing sounds – I liked the discomfort and drudgery, but it felt a little boring, and took too long to explore all the different samples. I also realized that I was moving away from the 'generative' side of things to something far more curated and composed.

It seemed a good next step would be to formulate rules that complemented the selected juxtapositions of samples. I created a very simple set of rules, which themselves follow a pattern of sorts:

let rules = {
  "A" : ["AB", "BC", ""],
  "B" : ["BC", "CD", ""],
  "C" : ["CD", "DE", ""],
  "D" : ["DE", "EF", ""],
  "E" : ["EF", "FG", ""],
  "F" : ["FG", "GH", ""],
  "G" : ["GH", "HI", ""],
  "H" : ["HI", "IJ", ""],
  "I" : ["IJ", "JK", ""],
  "J" : ["JK", "KL", ""],
  "K" : ["KL", "LM", ""],
  "L" : ["LM", "MN", ""],
  "M" : ["MN", "NO", ""],
  "N" : ["NO", "OP", ""],
  "O" : ["OP", "PQ", ""],
  "P" : ["PQ", "QR", ""],
  "Q" : ["QR", "RS", ""],
  "R" : ["RS", "ST", ""],
  "S" : ["ST", "TU", ""],
  "T" : ["TU", "UV", ""],
  "U" : ["UV", "VW", ""],
  "V" : ["VW", "WX", ""],
  "W" : ["WX", "XY", ""],
  "X" : ["XY", "YZ", ""],
  "Y" : ["YZ", "Z0", ""],
  "Z" : ["Z0", "01", ""],
  "0" : ["01", "12", ""],
  "1" : ["12", "23", ""],
  "2" : ["23", "3A", ""],
  "3" : ["3A", "AB", ""]
};

[Sample "A" gets replaced by "AB", "BC" or nothing ]
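
The table above is regular enough that it can be generated: each sample maps to itself plus the next sample, to the next two samples, or to nothing, wrapping around after "3". The sketch below builds the same table and applies one random rewrite pass. It assumes the three alternatives are picked with equal probability, which the post does not specify, and the helper names are invented for illustration.

```javascript
// The 30 sample labels used in the rule table: A-Z followed by 0-3.
const symbols = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123";

// Hypothetical helper: build the table following the same pattern,
// wrapping around at the end of the alphabet.
function buildRules(alphabet) {
  const n = alphabet.length;
  const table = {};
  for (let i = 0; i < n; i++) {
    const current = alphabet[i];
    const next = alphabet[(i + 1) % n];
    const afterNext = alphabet[(i + 2) % n];
    table[current] = [current + next, next + afterNext, ""];
  }
  return table;
}

// Hypothetical helper: replace each symbol with one of its alternatives.
// Uniform choice is an assumption; `random` is injectable for repeatable runs.
function expandOnceRandom(str, ruleSet, random = Math.random) {
  return Array.from(str, (symbol) => {
    const options = ruleSet[symbol];
    if (!options) return symbol; // no rule: keep the symbol as it is
    return options[Math.floor(random() * options.length)];
  }).join("");
}

const generatedRules = buildRules(symbols);
console.log(generatedRules["3"]); // [ '3A', 'AB', '' ]
console.log(expandOnceRandom("ABAB", generatedRules)); // varies from run to run
```

With uniform choice each symbol yields 4/3 symbols on average, so the string tends to grow slowly while the empty alternative keeps pruning it.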

This ruleset allowed the identified repetitions to occur while still letting the soundscape evolve, so that all of the samples could eventually be discovered, and it introduced new repetition patterns that helped to create a rhythm.

These results were far more satisfying, and I tweaked the orderings of the samples a little more for the final version. By this point I knew and recognized all the audio samples well enough to understand what the words were, but upon sharing this with friends and getting some feedback, I realized this was not so easy for the first-time listener – and so I added visual elements to make the audio more discernible.

I'd like to work on this further – now that I have an understanding of the types of rules that could work well, I could gather more samples to add to the mix. I'd also like to explore pitch-shifting, which might first involve auto-tuning the samples to a certain note. Another possible direction could be to play with the envelopes and filters on the sounds, and to create another rule base for modulating them, so that there are two sets of rules working together that could phase in and out of each other in interesting ways.