Google’s new AI tool can generate music from text descriptions

A new artificial intelligence (AI) tool from Google can generate music in any genre from text prompts, and can even render a whistled or hummed melody on other instruments. According to Google Research, the technology, called MusicLM, is a text-to-music generation system. It works by analysing the text and deciphering the scale and complexity of the composition.

"We introduce MusicLM, a model generating high-fidelity music from text descriptions such as 'a calming violin melody backed by a distorted guitar riff'," the research paper read. "We demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption," it added.

According to the paper, MusicLM was trained on a dataset of 280,000 hours of music to learn to generate coherent songs from text descriptions and capture nuances like mood, melody and instruments. Its capabilities extend beyond generating short clips of songs. Google researchers showed that the system can build on existing melodies, whether hummed, sung, whistled or played on an instrument.

According to the research, MusicLM can also take several descriptions written in sequence (for example "time to meditate," "time to wake up," "time to run," and "time to give 100%") and create a sort of melodic "story" or narrative up to several minutes long. It can also be instructed via a combination of picture and caption, or generate audio that's "played" by a specific type of instrument in a certain genre.

Notably, Google is not the first company to do this. As per TechCrunch, projects like OpenAI's Jukebox, Riffusion (an AI that generates music by visualising it) and Google's own AudioLM have all tried their hand at it. However, owing to technical limitations and limited training data, none has been able to produce songs that are particularly complex in composition or high in fidelity. Researchers believe that MusicLM is perhaps the first that can.

"MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modelling task, and it generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description," Google researchers said in the paper.

But MusicLM is not flawless. For starters, some of the sample music that Google released with its research paper has a distorted quality to it. While the system can technically generate vocals, they are often synthesised and sound like gibberish, as per TechCrunch. Another drawback is the sometimes compressed sound quality, a byproduct of the training process.

Google researchers have also noted the many ethical challenges posed by a system like MusicLM, including a tendency to incorporate copyrighted material from the training data into the generated songs. During an experiment, researchers found that about 1 percent of the music the system generated was directly replicated from the songs on which it was trained. That rate is apparently high enough to discourage Google researchers from releasing the AI system in its current state.

"We acknowledge the risk of potential misappropriation of creative content associated to the use case," the co-authors of the paper wrote. "We strongly emphasise the need for more future work in tackling these risks associated to music generation," they added.

Source: NDTV