OpenAI’s MuseNet AI can generate 4 minute songs across a wide range of genres and styles

By Matthew Griffin | Intelligence and the Senses | 29th September 2019

WHY THIS MATTERS IN BRIEF

AIs are getting better at generating synthetic music, and some of them are starting to get signed by major record labels.

The number of so-called creative machines capable of generating and synthesising new music, both from scratch and from just lyrics, is increasing as fast as the field is evolving. They range from machines that compose classical and pop music through to machines that have now been signed by major record labels such as Warner Music. The music biz isn’t what it used to be, and it’s changing faster every day. In March, for example, Google released an algorithmic Google Doodle that let users create melodic homages to Bach. And late last year, Project Magenta, a Google Brain effort “exploring the role of machine learning as a tool in the creative process,” showed off Music Transformer, a model capable of generating songs with recognisable repetition. And all of that is just the tip of the proverbial iceberg.

In what might be characterised as a small but noteworthy step forward in autonomous music generation research, San Francisco capped-profit firm OpenAI, the same company behind the world’s “most dangerous” synthetic text generator, has just announced its latest creation: MuseNet, an Artificial Intelligence (AI) system that can create four-minute compositions with 10 different instruments across styles “from country to Mozart to the Beatles.”

OpenAI plans to livestream pieces composed by MuseNet on Twitch next week, and will release a MuseNet-powered music tool in October. The MuseNet composer has two modes: simple mode, which plays an uncurated sample once you choose a composer or style and, optionally, the start of a famous piece, and advanced mode, which lets you interact with the model directly to create a novel piece.

For example, here’s MuseNet prompted with the first 5 notes of Chopin:

As OpenAI technical staff member Christine Payne explains in a blog post, MuseNet, like all deep learning networks, contains neurons: mathematical functions loosely modelled on biological neurons and arranged in interconnected layers. These layers transmit “signals” from the input data, and training slowly adjusts the synaptic strength, or weight, of each connection. What sets MuseNet apart from many earlier networks is attention: every output element is connected to every input element, and the weightings between them are calculated dynamically.
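To make the idea of dynamically calculated weightings more concrete, here is a minimal sketch of scaled dot-product attention, the building block of Transformer-style models. It is an illustration only and not OpenAI’s code: the function names and the toy two-number “note” vectors are assumptions, and it omits the causal masking and sparse attention patterns that a model like MuseNet would rely on.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # The weights come from the data itself rather than from fixed parameters.
        weights = softmax([dot(q, k) / scale for k in keys])
        # Each output is a weighted mix of every value vector.
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy "note" vectors attending to each other (self-attention).
notes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(notes, notes, notes)
```

Each row of `weights` sums to 1 and changes whenever the input changes, which is what “calculated dynamically” means here.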

MuseNet isn’t explicitly programmed with an understanding of music. Rather, it discovers patterns of harmony, rhythm, and style by learning to predict tokens, which are notes encoded in a way that combines pitch, volume, and instrument information, across hundreds of thousands of MIDI files. MuseNet is informed by OpenAI’s recent work on Sparse Transformer, which in turn builds on Google’s Transformer neural network architecture.
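One way to picture a token that combines pitch, volume, and instrument is to pack the three values into a single integer. The sketch below does exactly that. The instrument list, the eight volume buckets, and the packing scheme are all hypothetical choices made for illustration, and MuseNet’s real vocabulary may look quite different.

```python
from dataclasses import dataclass

# Hypothetical vocabulary sizes, chosen only for illustration.
INSTRUMENTS = ["piano", "violin", "bass", "drums"]
NUM_PITCHES = 128      # MIDI note numbers run from 0 to 127
VOLUME_BUCKETS = 8     # MIDI velocity 0-127 quantised into 8 buckets

@dataclass(frozen=True)
class Note:
    instrument: str
    pitch: int      # MIDI note number, 0-127
    velocity: int   # MIDI velocity, 0-127

def encode(note):
    """Pack instrument, volume bucket and pitch into one integer token."""
    inst = INSTRUMENTS.index(note.instrument)
    bucket = note.velocity * VOLUME_BUCKETS // 128
    return (inst * VOLUME_BUCKETS + bucket) * NUM_PITCHES + note.pitch

def decode(token):
    """Recover (instrument, volume bucket, pitch) from a token."""
    rest, pitch = divmod(token, NUM_PITCHES)
    inst, bucket = divmod(rest, VOLUME_BUCKETS)
    return INSTRUMENTS[inst], bucket, pitch

token = encode(Note("violin", 60, 100))   # middle C on a violin, fairly loud
print(token, decode(token))               # prints: 1852 ('violin', 6, 60)
```

A model trained on sequences of such integers never sees “music” as such, only a stream of symbols whose next value it learns to predict.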

Above: MuseNet’s understanding of composers and how they relate stylistically.

Image Credit: OpenAI

MuseNet was trained on MIDI samples from a range of different sources, including ClassicalArchives, BitMidi, and the open source Maestro corpus. Payne and colleagues transformed them in various ways to improve the model’s generalisability: first by transposing them, that is raising and lowering the pitches, and then by turning the overall volumes of the various samples up or down and slightly slowing or speeding up the pieces.
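The three transformations described above can be sketched in a few lines. This is an assumption-laden illustration, not OpenAI’s pipeline: the note representation as `(start_seconds, pitch, velocity)` tuples and the random ranges in `random_augment` are made up for the example.

```python
import random

def augment(notes, semitones, volume_scale, time_scale):
    """Transpose, rescale volume and stretch timing.

    notes is a list of (start_seconds, pitch, velocity) tuples.
    A time_scale above 1 slows the piece down, below 1 speeds it up.
    """
    out = []
    for start, pitch, velocity in notes:
        new_pitch = pitch + semitones
        if not 0 <= new_pitch <= 127:
            continue  # drop notes transposed outside the MIDI range
        new_velocity = max(1, min(127, round(velocity * volume_scale)))
        out.append((start * time_scale, new_pitch, new_velocity))
    return out

def random_augment(notes, rng):
    """Apply one randomly chosen variation (ranges are illustrative guesses)."""
    return augment(
        notes,
        semitones=rng.randint(-3, 3),
        volume_scale=rng.uniform(0.8, 1.2),
        time_scale=rng.uniform(0.95, 1.05),
    )

melody = [(0.0, 60, 80), (0.5, 64, 80), (1.0, 67, 90)]
shifted = augment(melody, semitones=2, volume_scale=0.5, time_scale=2.0)
print(shifted)  # prints: [(0.0, 62, 40), (1.0, 66, 40), (2.0, 69, 45)]
```

Because each variation still sounds like the same piece, the model sees many more examples of the same musical structure without any new data being collected.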

To lend more “structural context,” they added learned embeddings, mathematical representations that help to track the passage of time in MIDI files. They also added a so-called “inner critic” component that predicts whether a given sample is truly from the data set or is one of the model’s own past generations.

MuseNet’s additional token types – one for composer and another for instrumentation – also gave the researchers greater control over the kinds of samples it generates, Payne explains.

During training, these composer and instrumentation tokens were prepended to each music sample so that MuseNet learned to use the information in making note predictions. Then, at generation time, the model was conditioned to create samples in a chosen style by starting with a prompt like a Rachmaninoff piano start or the band Journey’s piano, bass, guitar, and drums.
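As a rough sketch of this conditioning step, the same helper can build both a training example and a generation prompt. The token spellings such as `composer:chopin` and the `start` marker are hypothetical and chosen for readability, not taken from MuseNet itself.

```python
def build_sequence(composer, instruments, note_tokens):
    """Prepend composer and instrument tokens to a sequence of note tokens."""
    header = [f"composer:{composer}"]
    header += [f"instrument:{name}" for name in instruments]
    return header + ["start"] + list(note_tokens)

# Training: the header is followed by the notes of a real piece.
training_example = build_sequence(
    "chopin", ["piano"], ["piano:60", "piano:64", "piano:67"]
)

# Generation: only the header is supplied and the model continues from there.
prompt = build_sequence("journey", ["piano", "bass", "guitar", "drums"], [])
print(prompt)
```

Since the header sits at the front of every training sequence, the model can learn to associate it with everything that follows, which is what makes it usable as a style control later.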

“Since MuseNet knows many different styles, we can blend generations in novel ways,” she added. “[For example, the model was] given the first six notes of a Chopin Nocturne, but is asked to generate a piece in a pop style with piano, drums, bass, and guitar. [It] manages to blend the two styles convincingly.”

Payne notes that MuseNet isn’t perfect, though. Because it generates each note by calculating the probabilities across all possible notes and instruments, it occasionally makes poor note choices. And predictably, it has a difficult time with incongruous pairings of styles and instruments, such as Chopin with bass and drums.
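A small sampling experiment shows why a probabilistic generator occasionally picks a poor note. The sketch below assumes a toy model that scores only three candidate tokens. The scores, the `temperature` parameter, and the function name are illustrative assumptions rather than details of MuseNet.

```python
import math
import random

def sample_token(logits, rng, temperature=1.0):
    """Draw one token index in proportion to softmax(logits / temperature)."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(0)
logits = [4.0, 2.0, 0.0]   # token 0 fits best, token 2 is a poor choice
draws = [sample_token(logits, rng) for _ in range(1000)]
counts = [draws.count(i) for i in range(len(logits))]
print(counts)
```

Token 2 has a probability of under 2 percent in this setup, yet over a piece containing thousands of tokens even such unlikely choices will turn up now and then.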

But she says that music is an excellent test for AI architectures with attention, because it’s easy to hear whether the model is capturing long-term structure in the training data set’s tokens.

“It’s much more obvious if a music model messes up structure by changing the rhythm, in a way that it’s less clear if a text model goes on a brief tangent,” she concludes.

Source: OpenAI
