WHY THIS MATTERS IN BRIEF
AIs are getting better at generating synthetic music, and some of them are even being signed by major record labels.
The number of so-called creative machines capable of generating and synthesising new music, whether from scratch or from nothing more than lyrics, is growing as fast as the field itself is evolving. They range from machines that compose classical and pop music through to machines that have now been signed by major record labels such as Warner Music. The music biz isn’t what it used to be, and it’s changing faster every day. In March, for example, Google released an algorithmic Google Doodle that let users create melodic homages to Bach. And late last year Project Magenta, a Google Brain effort “exploring the role of machine learning as a tool in the creative process,” showed off Music Transformer, a model capable of generating songs with recognisable repetition. All of that is just the tip of the proverbial iceberg.
In what might be characterised as a small but noteworthy step forward in autonomous music generation research, San Francisco-based capped-profit firm OpenAI, the company behind the world’s “most dangerous” synthetic text generator, has just announced its latest creation: MuseNet, an Artificial Intelligence (AI) system that can create four-minute compositions with 10 different instruments across styles “from country to Mozart to the Beatles.”
OpenAI plans to livestream pieces composed by MuseNet on Twitch later next week, and will release a MuseNet-powered music tool in October. The MuseNet composer has two modes: simple mode, which plays uncurated samples in a composer or style of your choice, optionally starting from a famous piece, and advanced mode, which lets you interact with the model directly to create a novel piece.
For example, MuseNet can be prompted with the first five notes of a Chopin piece and left to continue the composition.
As OpenAI technical staff member Christine Payne explains in a blog post, MuseNet, like all deep learning networks, contains neurons (mathematical functions loosely modelled on biological neurons) arranged in interconnected layers that transmit “signals” from input data, and during training it slowly adjusts the synaptic strength, or weight, of each connection. What sets it apart from simpler networks is attention: every output element is connected to every input element, and the weightings between them are calculated dynamically.
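To make “every output connected to every input, with dynamically calculated weightings” concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is a simplified, single-head version with no learned projections and no causal masking, so it illustrates the general mechanism rather than MuseNet’s actual implementation; the toy vectors are made up.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Each output is a weighted mix of ALL value vectors; the weights are
    computed on the fly from query/key similarity, not stored as fixed
    parameters.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one component at a time
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy "note" vectors attending to one another (self-attention)
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(x, x, x)
for row in w:
    print([round(p, 3) for p in row])
```

Because the weights come from the inputs themselves, changing one note changes how strongly every other position attends to it.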
MuseNet isn’t explicitly programmed with an understanding of music. Instead, it discovers patterns of harmony, rhythm, and style by learning to predict the next token (a note encoded in a way that combines pitch, volume, and instrument information) across hundreds of thousands of MIDI files. MuseNet builds on OpenAI’s recent work on the Sparse Transformer, which in turn was based on Google’s Transformer neural network architecture.
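One way to picture a token that “combines pitch, volume, and instrument information” is to pack all three into a single integer. The sketch below assumes a hypothetical vocabulary layout (five instruments, 32 volume bins, 128 MIDI pitches); it is not MuseNet’s real encoding, only an illustration of why the model sees one token per note event.

```python
# Hypothetical vocabulary layout: the instrument list and the number of
# volume bins are illustrative assumptions, not MuseNet's actual scheme.
INSTRUMENTS = ["piano", "violin", "bass", "guitar", "drums"]
N_VOLUME_BINS = 32   # MIDI velocity 0-127 quantised into 32 bins
N_PITCHES = 128      # full MIDI pitch range

def encode_note(instrument, velocity, pitch):
    """Pack instrument, volume bin and pitch into one integer token."""
    inst = INSTRUMENTS.index(instrument)
    vol = velocity * N_VOLUME_BINS // 128
    return (inst * N_VOLUME_BINS + vol) * N_PITCHES + pitch

def decode_note(token):
    """Invert encode_note (volume comes back as its bin, not the exact velocity)."""
    rest, pitch = divmod(token, N_PITCHES)
    inst, vol = divmod(rest, N_VOLUME_BINS)
    return INSTRUMENTS[inst], vol, pitch

VOCAB_SIZE = len(INSTRUMENTS) * N_VOLUME_BINS * N_PITCHES

tok = encode_note("piano", 72, 43)   # piano, velocity 72, pitch G2
print(tok, decode_note(tok))         # → 2347 ('piano', 18, 43)
```

Under this layout, “predicting the next token” means choosing one of `VOCAB_SIZE` combined note events at each step.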
MuseNet was trained on MIDI samples from a range of sources, including ClassicalArchives, BitMidi, and the open source Maestro corpus. To improve the model’s ability to generalise, Payne and colleagues augmented the samples in several ways: transposing them (raising and lowering their pitches), turning the overall volume of the samples up or down, and slightly slowing down or speeding up the pieces.
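The three augmentations described above (transposition, volume change, tempo change) can be sketched on a simple list of note events. The `Note` type and the default ranges here are illustrative assumptions, not OpenAI’s settings; each call produces a new variant of the same piece with its melodic intervals and relative timing preserved.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Note:
    start: float     # onset time in seconds
    duration: float  # length in seconds
    pitch: int       # MIDI pitch, 0-127
    velocity: int    # MIDI velocity (loudness), 1-127

def augment(notes, rng, max_shift=3, gain_range=(0.8, 1.2), stretch_range=(0.95, 1.05)):
    """Return a transposed, re-levelled, re-timed copy of `notes`.

    The default ranges are illustrative assumptions, not MuseNet's settings.
    """
    if not notes:
        return []
    pitches = [n.pitch for n in notes]
    # Transpose: pick a semitone shift that keeps every note inside 0-127
    lo = max(-max_shift, -min(pitches))
    hi = min(max_shift, 127 - max(pitches))
    shift = rng.randint(lo, hi)
    # Volume: scale every velocity by the same factor, clamped to the MIDI range
    gain = rng.uniform(*gain_range)
    # Timing: stretch > 1 slows the piece down, < 1 speeds it up
    stretch = rng.uniform(*stretch_range)
    return [
        replace(
            n,
            start=n.start * stretch,
            duration=n.duration * stretch,
            pitch=n.pitch + shift,
            velocity=max(1, min(127, round(n.velocity * gain))),
        )
        for n in notes
    ]

melody = [Note(0.0, 0.5, 60, 80), Note(0.5, 0.5, 64, 80), Note(1.0, 1.0, 67, 90)]
for note in augment(melody, random.Random(0)):
    print(note)
```

Applying one shift, gain and stretch to the whole piece keeps its musical shape intact while giving the model many slightly different copies to learn from.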
To give the model more “structural context,” they added learned embeddings (mathematical representations) that help it track the passage of time in each MIDI file. They also added a so-called “inner critic” component that predicts whether a given sample truly comes from the data set or is one of the model’s own past generations.
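A rough sketch of how a learned time embedding can give the model structural context: each input vector is the sum of a “which note” vector and a “when in the piece” vector. The table sizes and the bucketing scheme (equal slices of the piece) are hypothetical choices made for illustration, and the tables are randomly initialised here where a real model would learn them during training.

```python
import random

# Hypothetical toy sizes; a real model's vocabulary and width are far larger
EMBED_DIM = 8
N_TOKENS = 512
N_TIME_BUCKETS = 64

rng = random.Random(42)

def init_table(rows, dim):
    # Small random starting values; in a real model training updates these
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]

token_table = init_table(N_TOKENS, EMBED_DIM)
time_table = init_table(N_TIME_BUCKETS, EMBED_DIM)

def time_bucket(seconds, piece_length, n_buckets=N_TIME_BUCKETS):
    # Map an absolute time onto one of n_buckets equal slices of the piece
    frac = min(max(seconds / piece_length, 0.0), 1.0)
    return min(int(frac * n_buckets), n_buckets - 1)

def embed(token, seconds, piece_length):
    # Model input = "which note" vector + "when in the piece" vector
    bucket = time_bucket(seconds, piece_length)
    return [a + b for a, b in zip(token_table[token], time_table[bucket])]

early = embed(7, 5.0, 240.0)
late = embed(7, 200.0, 240.0)
print(time_bucket(5.0, 240.0), time_bucket(200.0, 240.0))
```

The same note token therefore reaches the network as a different vector near the opening than near the ending, which is what lets attention pick up on where it is in the piece.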
MuseNet’s additional token types – one for composer and another for instrumentation – also gave the researchers greater control over the kinds of samples it generates, Payne explains.
During training, these tokens were prepended to each music sample so that MuseNet learned to use that information when predicting notes. Then, at generation time, the model can be conditioned to create samples in a chosen style by starting with a prompt, such as a Rachmaninoff piano opening or the band Journey’s piano, bass, guitar, and drums.
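The conditioning trick can be sketched as plain sequence construction: the same header format is used to prefix real pieces during training and to build prompts at generation time. The token spellings (`composer:…`, `inst:…`, `<start>`) and the note strings are hypothetical; only the idea of prepending composer and instrumentation tokens comes from the text.

```python
def build_sequence(composer, instruments, notes=()):
    """Prefix a note sequence with composer and instrumentation tokens.

    The token spellings ("composer:...", "inst:...", "<start>") are
    hypothetical; only the idea of prepending style tokens comes from the text.
    """
    header = [f"composer:{composer}"] + [f"inst:{name}" for name in instruments]
    return header + ["<start>"] + list(notes)

# Training: each piece is prefixed with its own composer and instruments,
# so the model learns to predict notes given that header
train_seq = build_sequence("rachmaninoff", ["piano"], ["piano:v72:G2", "piano:v72:D3"])

# Generation: choose a header and let the model continue from "<start>"
journey_prompt = build_sequence("journey", ["piano", "bass", "guitar", "drums"])

# Blending: a header for one style followed by opening notes from another
chopin_opening = ["piano:v64:E4", "piano:v64:G4"]  # hypothetical stand-in notes
blend_prompt = build_sequence("pop", ["piano", "drums", "bass", "guitar"], chopin_opening)
print(journey_prompt)
```

Because the model has only ever seen notes that follow such a header, swapping the header at generation time is enough to steer the style, which is also how a pop header can be combined with a classical opening.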
“Since MuseNet knows many different styles, we can blend generations in novel ways,” she added. “[For example, the model was] given the first six notes of a Chopin Nocturne, but is asked to generate a piece in a pop style with piano, drums, bass, and guitar. [It] manages to blend the two styles convincingly.”
Payne notes that MuseNet isn’t perfect, though. Because it generates each note by calculating probabilities across all possible notes and instruments, it occasionally makes poor note choices. And, predictably, it has a difficult time with incongruous pairings of styles and instruments, such as Chopin with bass and drums.
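Why sampling from a probability distribution “occasionally makes poor note choices” is easy to see with a toy example. The sketch below samples from a softmax over four candidate notes; the logits are invented, and the temperature knob is a common sampling control assumed here for illustration, not something the article says MuseNet exposes.

```python
import math
import random

def sample_next(logits, rng, temperature=1.0):
    """Draw one token index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    # Unnormalised softmax weights; random.choices normalises them itself
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Toy distribution over four candidate notes: index 0 is the "good" note
logits = [4.0, 1.0, 0.5, 0.0]
rng = random.Random(0)
draws = [sample_next(logits, rng) for _ in range(10_000)]
poor = sum(1 for d in draws if d != 0) / len(draws)
print(f"poor choices: {poor:.1%}")
```

Even when one note is strongly preferred (about 91% here), roughly one draw in eleven lands on a weaker option; lowering the temperature sharpens the distribution and makes such slips rarer at the cost of less varied output.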
But she says music generation is an excellent test for AI architectures with attention, because it’s easy to hear whether the model is capturing the long-term structure in its training data.
“It’s much more obvious if a music model messes up structure by changing the rhythm, in a way that it’s less clear if a text model goes on a brief tangent,” she concludes.