Riffusion: AI-Powered Text-to-Image Spectrograms for Generating Precise Audio
Riffusion modifies Stable Diffusion so that spectrogram images generated from text can be played back as audio
The model has been fine-tuned to produce images of spectrograms.
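Riffusion is the team's version of the Stable Diffusion model, fine-tuned on images of spectrograms paired with text. A spectrogram is a picture of how the frequency content of a sound changes over time, so an image generated from a text prompt can be converted into sound, and the fine-tuning lets the model generate those sounds with greater precision.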
All of the Stable Diffusion features remain.
Audio processing is also performed, but it happens downstream of the model, after the spectrogram image has been generated.
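To make the downstream step concrete, here is a minimal sketch of how a spectrogram image could be turned into audio samples. It is not Riffusion's actual code: the function name `spectrogram_to_audio`, its parameters, and the simple approach of summing one sine wave per image row are all assumptions made for illustration. A real pipeline would work on a much larger image and would need a proper method for estimating the phase information that a spectrogram image does not store.

```python
import math

def spectrogram_to_audio(image, sample_rate=8000, frame_len=80, max_freq=2000.0):
    """Turn a tiny grayscale spectrogram into a list of audio samples.

    Hypothetical helper for illustration only.
    image: list of rows, each a list of pixel amplitudes in [0, 1].
           Row 0 is the top of the picture (highest frequency);
           each column is one time frame.
    """
    n_rows = len(image)
    n_cols = len(image[0])
    # Assign each row a frequency; the bottom row gets the lowest one.
    freqs = [max_freq * (n_rows - r) / n_rows for r in range(n_rows)]
    samples = []
    for col in range(n_cols):
        for i in range(frame_len):
            t = (col * frame_len + i) / sample_rate
            # Sum one sine wave per row, weighted by that pixel's brightness.
            value = sum(
                image[r][col] * math.sin(2 * math.pi * freqs[r] * t)
                for r in range(n_rows)
            )
            samples.append(value / n_rows)  # scale so output stays in [-1, 1]
    return samples

# A 2x2 "image": a low tone in the first frame, a high tone in the second.
audio = spectrogram_to_audio([[0.0, 1.0],
                              [1.0, 0.0]])
```

The resulting list of floats could then be scaled to 16-bit integers and written to a file with Python's standard `wave` module.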