
Disco Diffusion
Disco Diffusion (DD) is a Google Colab notebook that uses an AI image-generation technique called CLIP-Guided Diffusion to create compelling and beautiful images from text inputs alone. It was created by Somnai, augmented by Gandamu, and builds on the work of RiversHaveWings, nshepperd, and many others.
It’s magic. And also, free. (!)
However, the storied history and complex internal workings of CLIP and Diffusion are NOT the primary topic of this wiki. Rather, this site is to help you understand how to use basic DD controls to create your own images, and to provide some insight on how all of the parameters affect CLIP and Diffusion behavior.
DD Diffusion Process (vastly simplified)
Diffusion is a mathematical process for removing noise from an image. CLIP is a tool for labeling images. When combined, CLIP uses its image identification skills to iteratively guide the diffusion denoising process toward an image that closely matches a text prompt.
The image to the left was created in DD using just the text prompt: “A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation.”
Diffusion is an iterative process. At each iteration, or step, CLIP evaluates the existing image against the prompt and provides a ‘direction’ to the diffusion process. Diffusion then ‘denoises’ the existing image, and DD displays its ‘current estimate’ of what the final image will look like. Initially, the image is just a blurry mess, but as DD advances through the iteration timesteps, first the coarse and then the fine details of the image emerge.
The example image took 250 diffusion steps to complete. As you can see in the image sequence above, the images get progressively clearer over the range of steps, as the diffusion denoising process is guided toward the desired image by CLIP.
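To make the loop concrete, here is a toy sketch in Python. It is not how DD, CLIP, or diffusion are actually implemented: the "image" is just a short list of numbers, the "prompt" is a list of target values, and `clip_direction`, `guided_denoise`, `guidance`, and the noise schedule are all names and choices I made up for illustration. It only shows the shape of the process described above: start from noise, ask for a direction each step, nudge the image, and watch the error shrink.

```python
import random


def clip_direction(image, target):
    """Stand-in for CLIP (an assumption): returns, per pixel, which way to move.

    Real CLIP compares the image to a text prompt; here the "prompt" is just
    a list of target pixel values, so the direction is target minus image.
    """
    return [t - x for x, t in zip(image, target)]


def mean_abs_error(image, target):
    """Average distance between the current image and the target."""
    return sum(abs(t - x) for x, t in zip(image, target)) / len(target)


def guided_denoise(target, steps=250, guidance=0.05, seed=0):
    """Toy guided-denoising loop; returns the final image and per-step error."""
    rng = random.Random(seed)
    # Start from pure noise.
    image = [rng.gauss(0.0, 1.0) for _ in target]
    history = []
    for step in range(steps):
        direction = clip_direction(image, target)
        # Less fresh noise is mixed in as the schedule advances.
        noise_scale = 1.0 - (step + 1) / steps
        image = [
            x + guidance * d + 0.02 * noise_scale * rng.gauss(0.0, 1.0)
            for x, d in zip(image, direction)
        ]
        history.append(mean_abs_error(image, target))
    return image, history


if __name__ == "__main__":
    target = [0.2, 0.8, 0.5, 0.1]  # placeholder "prompt" as pixel values
    _, history = guided_denoise(target)
    for step in (0, 49, 249):
        print(f"step {step + 1:3d}: error {history[step]:.4f}")
```

Running it prints the error at an early, middle, and final step, which mirrors the way DD's preview goes from a blurry mess to a finished image. The default of 250 steps simply matches the example image; the other numbers are arbitrary.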
The content of the image is generally controlled by the text used in a ‘prompt’: a sentence, phrase, or series of descriptive words that tells CLIP what you want to see. Creating a good text prompt for AI art is a nuanced, challenging task that takes much trial and error and practice. It deserves study of its own, but it won’t be covered in detail in this guide. We’re focused on the knobs and levers that drive Disco Diffusion.
Most of DD’s controls are numerical and control various aspects of the CLIP model and the diffusion curve. The general approach for using DD is to pick a text prompt, tune the parameters, then run the notebook to create an image. Depending on the settings used and the processor available, DD can take anywhere from 5 minutes to an hour or longer to render a single image.
Fine-tuning your prompt and parameters is complex and time-consuming, so taking a methodical approach will benefit you. I recommend you first try out the default settings in the notebook, to confirm that the notebook runs properly and there are no errors with your setup. Beyond that, experiment!
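One way to stay methodical is to change a single setting at a time against a known-good baseline, so you can tell which change caused which effect. The sketch below only plans such a run list; it does not call DD. The setting names and values (`text_prompt`, `steps`, `guidance_strength`) are placeholders I chose for illustration, not necessarily the names used in the notebook, and `one_at_a_time` is a hypothetical helper.

```python
# Hypothetical baseline: names and values are placeholders, not DD's own.
BASELINE = {
    "text_prompt": "a lighthouse on a stormy sea",
    "steps": 250,
    "guidance_strength": 5000,
}


def one_at_a_time(baseline, sweeps):
    """Yield (label, settings) pairs that each differ from baseline in one key."""
    yield "baseline", dict(baseline)
    for name, values in sweeps.items():
        if name not in baseline:
            raise KeyError(f"unknown setting: {name}")
        for value in values:
            if value == baseline[name]:
                continue  # already covered by the baseline run
            variant = dict(baseline)  # copy, so the baseline is never mutated
            variant[name] = value
            yield f"{name}={value}", variant


if __name__ == "__main__":
    sweeps = {"steps": [100, 250, 500], "guidance_strength": [2500, 10000]}
    for label, settings in one_at_a_time(BASELINE, sweeps):
        print(label, settings)
```

You would then copy each set of values into the notebook by hand, render, and compare the results side by side. Keeping the prompt fixed across the whole list helps separate the effect of the parameters from the effect of the wording.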
Also, while there are animation controls in DD, you should begin by learning how to create still images, as those skills transfer directly to animations.
Creating art with AI is magical and complex, and the tools are constantly being developed by data scientists and programmers. It should be no surprise, then, that learning them will take work and focus. Also, DD has dozens of controls, with complex interactions and few limits, so it’s easy to get bad results. But don’t get discouraged!
See Also:
Using Disco Diffusion: Basic Settings
Using Disco Diffusion: Advanced Settings