Suno and Udio are huge platforms for making music with AI, with all the pros and cons of this kind of tool. Both offer free access, which is fine for understanding what they are, but if you want anything more you have to pay, obviously.
What if I wanted to use Artificial Intelligence on my computer, and perhaps base my results on truly free material without stealing anything from anyone?
Personally, I believe an excellent tool for exploring AI on a personal PC is ComfyUI.
In addition to letting you test all kinds of checkpoints for images and video through its powerful node interface, it also lets you generate audio that is free in both senses: free to use, and built on openly licensed source material, thanks to Stable Audio Open 1.0.
Unlike other models, which are private and not accessible for artists and researchers to build upon, Stable Audio Open is an open-weights text-to-audio model whose architecture and training process are built on Creative Commons data.
Stable Audio Open generates stereo audio at 44.1 kHz in FLAC format, and is an open-source model optimized for generating short audio samples, sound effects, and production elements from text prompts. Ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings, and other audio samples, the model was trained on data from Freesound and the Free Music Archive, respecting creator rights.
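To make those format numbers concrete, here is a minimal Python sketch (standard library only, hypothetical filename) that writes one second of silent two-channel audio at the same 44.1 kHz sample rate. It is purely illustrative: the model itself emits FLAC, while Python's stdlib `wave` module only writes WAV.

```python
import wave

# Illustrative only: one second of silent stereo audio at 44.1 kHz,
# the same sample rate and channel count Stable Audio Open outputs.
SAMPLE_RATE = 44100   # samples per second, per channel
CHANNELS = 2          # stereo
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)

with wave.open("silence.wav", "wb") as out:
    out.setnchannels(CHANNELS)
    out.setsampwidth(SAMPLE_WIDTH)
    out.setframerate(SAMPLE_RATE)
    # One second = SAMPLE_RATE frames; each frame holds one sample per channel.
    out.writeframes(b"\x00" * (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH))
```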
How do you do it? Let's get to it right away.
As I said, we need a tool to manage all the necessary pieces, and I have had a good experience with ComfyUI. If you are inexperienced, or Python and its cousins scare you, you can install the desktop version.
The latest version right now is 0.3.10, which you can download here.
It’s really very simple, and you can find all the necessary installation details on the ComfyUI website.
One of the features of ComfyUI is that it allows you to use AI without having to resort to excessively powerful PCs, which I personally find very sensible.
Once the installation is finished, and before launching the program, take a moment to look at the directory tree: inside the main folder there are some important subfolders, where we will have to place the files needed to make the whole thing work.
Inside the ComfyUI folder, under models, note the one called checkpoints. All the files needed to make our workflow run will go into these folders.
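If you prefer, you can locate (or create) that folder from Python instead of browsing by hand. A minimal sketch, assuming the default folder names of a standard install; adjust `root` to wherever your copy lives:

```python
from pathlib import Path

# Assumed default layout of a ComfyUI install; change `root` to your own path.
root = Path("ComfyUI")
checkpoints = root / "models" / "checkpoints"  # model weights (.safetensors) go here

# Create the folder if a fresh install does not have it yet.
checkpoints.mkdir(parents=True, exist_ok=True)
print(checkpoints, "exists:", checkpoints.exists())
```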
At this point our installation is a blank slate, and since our goal is to create sounds with AI, let’s get what we need.
- Open the ComfyUI audio examples page, and literally follow the instructions. Rename the two files needed as stated, and put them in the right directories.
- Download the workflow: simply save the FLAC file (you can keep it in the ComfyUI > user folder), which you will later drag into the ComfyUI interface to extract the embedded workflow.
- Now we can open ComfyUI by double clicking on the relevant icon created by the installation.
- Drag the previously downloaded .flac file onto the ComfyUI window, and you should see an interface similar to the following image. The nodes can be repositioned as is most convenient for you.
That’s it. You don’t need anything else, and you’re ready to type your prompt into the CLIP Text Encode node and click Queue.
I hope it wasn’t too difficult. The technical part is done: if you got an audio file in the Save audio node, the installation works.
Creating meaningful prompts requires some experimentation, of course. Your results will be saved in ComfyUI’s output folder.
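Once renders start piling up, it can be handy to grab the most recent one from that folder programmatically. A small sketch; the `ComfyUI/output` path in the usage comment is an assumption based on a default install, so adjust it to yours:

```python
from pathlib import Path

def newest_audio(output_dir: str, pattern: str = "*.flac"):
    """Return the most recently modified audio file in the given folder,
    or None if nothing matches. Sorting by mtime picks the latest render."""
    files = sorted(Path(output_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None

# Usage (path is an assumption; point it at your own output folder):
# latest = newest_audio("ComfyUI/output")
# print(latest)
```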
I strongly suggest studying the prompts page in the Stable Audio User Guide; it really explains how to proceed.
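To give a feel for the style of prompt that tends to work, here are a few strings you might type into the CLIP Text Encode node. These are my own illustrative examples, not taken from the guide:

```python
# Illustrative prompt strings for the CLIP Text Encode node;
# my own examples of the descriptive, detail-rich style the guide encourages.
prompts = [
    "128 BPM tech house drum loop, punchy kick, crisp hi-hats",
    "warm ambient pad, slowly evolving, airy reverb",
    "foley: footsteps on gravel, outdoors, close-miked",
]
for p in prompts:
    print(p)
```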
This is just the starting point; from here you can build your own path with AI.
BEWARE: it is a dangerous drug, and your hard drive will fill up quickly.
You can find countless examples by doing a little search for “ComfyUI audio workflow”.
Obviously this is only one way to get this result; there are many others. It’s just probably the easiest one to start with.