Riffusion: AI Music Generation from Text Prompts Using Visual Sonograms


Riffusion is an AI model, built by two tech enthusiasts, that generates music from text prompts. It first generates a visual representation of sound (a sonogram) and then converts that image into audio. It is based on a fine-tuned version of the Stable Diffusion 1.5 image synthesis model, applying visual latent diffusion to sound processing.
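To make the "image first, audio second" idea concrete, here is a minimal Python sketch that turns a tiny grayscale grid into sound samples. It is not Riffusion's actual conversion, which this article does not describe. It uses plain additive synthesis as a stand-in, and the function name `image_to_audio`, the sample rate, the frame length, and the linear row-to-frequency mapping are all assumptions chosen for illustration.

```python
import math


def image_to_audio(image, sample_rate=8000, frame_len=400, max_freq=2000.0):
    """Turn a tiny grayscale 'sonogram' into a list of audio samples.

    image[row][col] is a brightness in [0, 1]. Row 0 is the lowest
    frequency, and each column is one time frame of frame_len samples.
    (Hypothetical layout, assumed for this sketch.)
    """
    n_rows = len(image)
    n_cols = len(image[0])
    # Assumed linear mapping from row index to a frequency in Hz.
    freqs = [max_freq * (r + 1) / n_rows for r in range(n_rows)]
    samples = []
    for col in range(n_cols):
        for i in range(frame_len):
            t = (col * frame_len + i) / sample_rate
            # Each row contributes a sine wave scaled by its brightness.
            value = sum(
                image[r][col] * math.sin(2 * math.pi * freqs[r] * t)
                for r in range(n_rows)
            )
            samples.append(value / n_rows)  # keep the sum within [-1, 1]
    return samples


# A 4-row, 3-column image: a low note, then a higher note, then both together.
image = [
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
]
audio = image_to_audio(image)
```

The resulting `audio` list could be written to a WAV file with Python's `wave` module. A real sonogram stores only loudness per frequency, so a production system also has to estimate the phase of each frequency, which this sketch ignores.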

Stable Diffusion can process sonograms because they are a kind of picture. Riffusion's creators, Seth Forsgren and Hayk Martiros, trained a custom Stable Diffusion model on sonograms paired with descriptions of musical genres or sounds. Riffusion draws on this training to create new music on demand from text prompts describing what you want to hear, such as "jazz," "rock," or even the sound of typing on a computer keyboard.
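The claim that a sonogram is just a picture can be illustrated with a short Python sketch that converts a test tone into a 2D grid of numbers, where one axis is time, the other is frequency, and each value is loudness. This is a toy version under stated assumptions: the `sonogram` function, the frame length of 64 samples, the hop of 32, and the 8000 Hz sample rate are illustrative choices, and the slow direct Fourier transform is used only to avoid external libraries.

```python
import cmath
import math


def sonogram(samples, frame_len=64, hop=32):
    """Return a 2D grid where grid[frame][k] is the magnitude of bin k."""
    grid = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Hann window to reduce spectral leakage at the frame edges.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len))
            for n, s in enumerate(frame)
        ]
        row = []
        # Direct (slow) discrete Fourier transform, non-negative bins only.
        for k in range(frame_len // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed)
            )
            row.append(abs(acc))
        grid.append(row)
    return grid


sample_rate = 8000  # assumed sample rate in Hz
tone_hz = 1000.0    # a pure test tone
samples = [math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(512)]

grid = sonogram(samples)
# The brightest "pixel" in the first frame marks the tone's frequency.
peak_bin = max(range(len(grid[0])), key=lambda k: grid[0][k])
peak_hz = peak_bin * sample_rate / 64
```

With these settings each bin spans 125 Hz, so the 1000 Hz tone shows up as a bright horizontal line at bin 8. Scaling the grid values to 0-255 would give a grayscale image of the kind an image model can be trained on.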

