DAW plugins have been approaching a saturation point over the past decade. Reskinned VSTs seem to outnumber innovative music software ten to one. How many new delay and reverb plugins do we really need? Generative AI and machine learning models are bringing a wave of novelty into the music software ecosystem. But there's a problem: most of these applications lack the plugin UIs that we've grown accustomed to.
These new models are built, summarized in papers on arXiv, promoted on Twitter, and circulated through a small community of AI enthusiasts. Only a few have been productized and marketed to musicians.
Meanwhile, instant-song generators like Boomy and Soundraw are dominating public perceptions of AI music. These browser apps are backed by venture capital and go after the royalty-free music market. Their marketing appeals to content creators who would normally shop for audio at Artlist, Epidemic, and SoundStripe. These browser apps aren't designed for digital audio workflows.
The majority of AI music models are housed in open-source GitHub repositories. Loading a model on a personal computer can be time- and resource-intensive, so they tend to be run on cloud services like Hugging Face or Google Colab instead. Some programmers are generous enough to share access to one of these spaces.
To overcome this problem, companies will need to start hiring designers and programmers who can make AI models more accessible.
In this article, I'll share a few exclusive interviews that we held with AI VST developers at Neutone and Samplab. I was also fortunate to speak with Voger, a prolific plugin design company, to get their take on emerging trends in this space. Readers who want to learn more about AI models and innovative plugins will find a list at the end of this piece.
Ableton's Max For Live is currently one of the few plugin formats that can load AI models directly in a DAW. The device shown below, shared by Kev at The Collabage Patch, leverages MusicGen's buffer feature to generate AI music in Ableton.
This next Max For Live device, called Text2Synth, was programmed by Jake (@Naughtttt), an acquaintance I met through Twitter. He created an AI text-prompt system that generates wavetable synths based on a description of the sound you're looking for.
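Whatever the model behind a text-to-wavetable system, its output ultimately has to be a single-cycle waveform a synth engine can scan. As a rough illustration only (not Jake's actual implementation), here is a minimal additive-synthesis sketch in Python: it assumes the text prompt has already been mapped to a set of harmonic amplitudes, which is a simplification of what a real model would do.

```python
import math

def make_wavetable(harmonics, size=2048):
    """Render a single-cycle wavetable by additive synthesis.

    harmonics: list of (partial_number, amplitude) pairs describing the timbre.
    """
    table = []
    for n in range(size):
        phase = 2 * math.pi * n / size
        sample = sum(amp * math.sin(k * phase) for k, amp in harmonics)
        table.append(sample)
    # Normalize to the [-1.0, 1.0] range most synth engines expect.
    peak = max(abs(s) for s in table) or 1.0
    return [s / peak for s in table]

# A "hollow, clarinet-like" description might map to odd harmonics only.
wavetable = make_wavetable([(1, 1.0), (3, 0.33), (5, 0.2)])
```

A 2048-sample table like this could then be exported and loaded into any wavetable synth that accepts single-cycle waveforms.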
Generative audio workstations: Designing UX/UI for AI models
The idea of a generative audio workstation (GAW) was popularized by Samim Winiger, CEO at generative AI music company Okio. His team is actively building a suite of generative music software and we hope to see more from them in the coming year.
A handful of AI DAW plugins are already available commercially. AudioCipher published a deep-dive on generative audio workstations last month that explored some of the most popular AI tools in depth. Instead of regurgitating that list, I've interviewed software developers and a design team to hear what they have to say about this niche. To kick things off, let's have a look at Samplab.
Samplab is an AI-powered audio-to-MIDI plugin that offers stem separation, MIDI transcription, and chord recognition. Once the audio is transcribed, users can adjust notes on a MIDI piano roll to change the individual pitch values of the raw audio. Their founder Gian-Marco Hutter has become a close personal friend over the past year and I was excited to learn more about his company's backstory.
1. Can you tell us about your background in machine learning and how Samplab got started?
We are two friends who met during our studies in electrical engineering and focused on machine learning while pursuing our master's degrees. Thanks to co-founder Manuel Fritsche's research in the field, we were able to receive support from the Swiss government through a grant aimed at helping academic projects become viable businesses. Through this program, we could research our technology while being employed at the university.
2. How far along were you with development when you decided to turn Samplab into a commercial product?
We explored various applications of AI in music production to find out where we could provide the most value. At first, we launched a simple VST plugin and gave it out for free to get some people to test it. To our surprise, it was picked up by many blogs and spread faster than we anticipated. From that point on, we knew that we were onto something and built more advanced features on top of the free tier.
3. What were some of the biggest design challenges you faced while building Samplab 2?
It's hard to pinpoint a single challenge, as we had to overcome so many. Our AI model can't be run locally on a computer in a reasonable time, so we decided to provide an online service and run the frontend in a VST plugin or a standalone desktop app. Making the backend run efficiently and keeping the user interface clean and simple were all quite challenging tasks.
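Samplab hasn't published its API, so the endpoint and field names below are invented for illustration, but the division of labor Gian-Marco describes is the interesting part: the plugin frontend never runs inference, it only uploads audio and draws whatever note data the cloud backend sends back. A minimal sketch of that frontend parsing step, assuming a JSON response:

```python
import json

def parse_transcription(payload: str):
    """Turn a (hypothetical) backend JSON response into piano-roll notes.

    Returns (midi_pitch, start_seconds, duration_seconds) tuples that a
    plugin frontend could render on a MIDI editor without local inference.
    """
    data = json.loads(payload)
    return [(n["pitch"], n["start"], n["duration"]) for n in data["notes"]]

# Example of the kind of response a cloud transcription service might return.
example = json.dumps({
    "notes": [
        {"pitch": 60, "start": 0.0, "duration": 0.5},  # middle C
        {"pitch": 64, "start": 0.5, "duration": 0.5},  # E above it
    ]
})
notes = parse_transcription(example)
```

Keeping the heavy model server-side means the plugin stays lightweight in the DAW, at the cost of requiring a network round trip per transcription.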
4. What Samplab features are you most proud of?
When working with polyphonic material, our audio-to-MIDI is at the cutting edge and it keeps improving. We're proud of our ability to change single notes, even in complex audio, and completely rearrange what's played in samples. We're also super happy every time we get an email from a user giving us positive feedback.
Check out Samplab's website here.
Next up is Neutone's CTO, Andrew Fyfe. We reached out to learn more about their company's mission and design philosophy.
The Neutone plugin acts as a hub for in-house and third-party AI models in the music and audio space. Typically these services would be hosted in a web browser or run locally on an advanced user's machine. To create a hub like this, Neutone's design team needed to work closely with each model and expose its core features, to make sure they run properly within any DAW.
1. Neutone has taken a unique approach to AI music software, acting as a hub for multiple AI models. What were some of the key considerations when designing the "main view" that displays the list of models?
When considering the model browser design, we wanted to make it easy for users to stay up to date with newly released AI models. Accessibility to cutting-edge AI tech for artists is one of the major pillars of Neutone so we focused on making the workflow simple and intuitive! One of the biggest advantages of Neutone is that the browser hooks into our online model repository so we can keep it up to date with the latest models we release.
Users download modules straight into the plugin, sort of like Output's Arcade but for AI models instead of sample libraries! We also wanted to make sure that we credit the AI model creators appropriately, so this information is upfront and visible when navigating the browser. Additional search utilities and tags allow users to easily navigate the browser panel to find the model they are looking for.
2. When users make their selection, they see different interfaces depending on the core functionality of that model. How does your team arrive at the designs for each screen? For example, do you have in-house UX/UI designers that determine the most important features to surface and then design around that? Or do the model's creators tend to serve up wireframes to you to expand upon?
We worked closely with our UI/UX designer and frontend team to put together the most generalized, intuitive, and slickest looking interface possible for our use-case. We wanted to create an interface that could adapt to various kinds of neural audio models. It needed to be able to communicate these changes to the user but keep the user interaction consistent. The knobs that we expose update their labels based on the control parameters of the equipped model.
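That "relabel the fixed knobs from the equipped model's parameters" pattern is easy to illustrate. The parameter names and structures below are hypothetical (the real Neutone SDK defines its own parameter classes), but the sketch shows the core idea of one generic interface adapting to whichever model is loaded:

```python
from dataclasses import dataclass

@dataclass
class KnobParam:
    name: str       # label drawn under the knob
    default: float  # normalized 0.0-1.0 position

# Hypothetical parameter sets two different neural audio models might expose.
timbre_model_params = [KnobParam("Timbre", 0.5), KnobParam("Chaos", 0.2)]
drum_model_params = [KnobParam("Density", 0.7), KnobParam("Swing", 0.4),
                     KnobParam("Fill", 0.1)]

def knob_labels(model_params, num_knobs=4):
    """Relabel the plugin's fixed knob bank from the equipped model's params."""
    labels = [p.name for p in model_params[:num_knobs]]
    # Leave unused knobs blank so stale labels never survive a model switch.
    return labels + [""] * (num_knobs - len(labels))

labels = knob_labels(drum_model_params)
```

The interaction stays consistent (the same four knobs in the same place) while the meaning of each knob changes per model, which is exactly the trade-off Andrew describes.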
We use our Neutone SDK to enforce some requirements on model developers for compatibility with our plugin. Artists and researchers can leverage our Python SDK to wrap and export their neural audio models to load into the plugin at runtime. We also warmly accept submissions from our community so that we can host them on our online repo for all to access!
3. Were there any special considerations when designing the color palette and aesthetic of the app? I noticed that each model interface feels coherent with the others, so I'm wondering if you have brand guidelines in place to ensure the app maintains a unified feel.
From Neutone's conception, we imagined it not only as a product but as an impactful brand that could hold its own among the other audio software brands out there. We are pioneering new kinds of technology and making neural audio more accessible to creators, so we wanted to marry this bold mission with a bold brand and product identity. We worked closely with our graphic designer to achieve our current look and feel and established brand guidelines to maintain visual consistency across our website and product range. The Neutone platform is only the beginning, and we have many more tools for artists coming soon!
Check out the Neutone website here.
To round things out, I spoke with Nataliia Hera of Voger Design. She's an expert in UX/UI design for DAW plugins. Her interfaces have been used by major software products including Kontakt, Propellerhead's Reason, and Antares. So of course we had to get her take on the future of design for AI VSTs.
Nataliia is also the founder of UI Mother, a music construction kit marketplace for plugin developers who want to assemble high-quality interfaces without paying the usual premium for custom designs.

1. I'm excited to understand your design philosophy. But first, can you share some of the companies you've built plugin interfaces for?
2. I read that you started the company with your husband more than ten years ago. Can you tell us about your design background personally? What inspired you two to begin creating interfaces and music business together?
We're self-taught designers. We started from scratch 15 years ago with no startup funds or educational resources, especially in audio plugin design, which was relatively new. We pieced together knowledge from various sources, learning by studying and analyzing designs, and not only in audio.
Our journey began when Vitaly tried using the REAPER DAW and saw the chance to improve its graphics, which weren't appealing compared to competitors. Creating themes for REAPER garnered positive user feedback and prompted Vitaly to explore professional audio design, given the industry's demand.
As for me, I lacked a design background initially but learned over the years, collaborating with my husband, an art director. We successfully revamped REAPER themes, along with plugin and Kontakt library GUIs, leading to a growing portfolio. About six months later, we established our design agency, filling a need for quality audio software design.
Today, I've acquired skills in graphic design, branding, and product creation processes, thanks to our extensive library of knowledge and years of practice.
3. What does your design process look like when starting with a new client? For example, do they typically give you a quick sketch to guide the wireframe? I noticed that your portfolio included wireframes and polished designs. How do you work together to deliver a final product that matches the aesthetic they're looking for?
Interestingly, from the outside the client simply sees the process as working with a design agency. I've compared our process with those of agency owners I know, and with my own experience ordering design services. The client provides whatever materials they have: usually a rough sketch, a list of controls, and sometimes an example of the style they want to achieve.
Typically, the process looks something like this:
Everything related that we do and implement helps improve the result. For example, we introduced an internal standard for each stage to reduce the risk of the client receiving a result they didn't expect.
The same is true of KPIs and all our other business instruments; all of them are focused on the client getting everything they need on time. I say "needs," but there is a difference between what a client needs and what they want. That's another story, the typical client's creative process, you know. We even have a kind of client scale where we forecast the risk of hard, creative work that requires time, iteration, and sorting through variants, often under conflicting or unchanging data in the technical brief. It's normal!
4. Voger designs more than just flat user interfaces. Your portfolio includes some beautiful 3D-modeled interfaces. Are these used to promote the applications on websites and advertisements? What software do you use?
3D technology is very important in creating the most expensive and attractive designs in the audio sphere. This applies to promos and presentations, but primarily to the UI itself. Most of the attractive, photo-realistic audio plugin designs you've seen over the last 20 years were either partially or completely created with 3D technologies, because only 3D can give us believable shadows, lighting, and materials for objects.
Plausibility and photo-realism are essential for audio plugin UIs. People are used to seeing and using familiar, legendary hardware devices (synthesizers, drum machines, effects, and others). All of these devices have recognizable, unique design styles and ways of use, for example, the Juno-106, TR-808, or LA-2A, familiar to all.
We all live in the real world, so even an ordinary flat UI tries to mimic it with shadows, imitation of real light, and real materials like wood, plastic, or metal. (By the way, that is the reason the flat UI of iOS 7 failed, but that is material for another interview.) Actually, there is no real difference between programs: any modern, capable 3D editor can create a cool audio design. But we have used only Blender for all these years. It is free, stable, doesn't cause problems, and we can always rely on it (which cannot be said of, for example, Adobe and similar companies).
5. If a client wants to build a physical synth, are you able to create 3D models that they can hand off to a manufacturer?
Theoretically, yes, why not? Practically, it all depends on the project, and there are many co-factors we can't influence, since clients haven't given us such an opportunity yet. I'm talking about the engineering side, with inner components like the types of faders, potentiometers, and transformers, chip placement, and so on. We understand clients in such situations; we are not engineers who design schematics. Still, an exterior design made of five types of knobs and three types of faders available on eBay can hardly be claimed as extremely complex. We do have experience creating a couple of such hardware devices. Unfortunately, we cannot disclose them because of NDAs while they're still in production, but one will appear very soon, and you will definitely hear about it.
6. There's recently been a shift toward artificial intelligence in generative music software. These AI models typically run on services like Hugging Face or Google Colab, or on a skilled user's personal computer. There are some amazing services out there that have no interface at all. I suspect these engineers don't know where to go to turn their software into usable product interfaces.
Firstly, we're a long way from achieving true AI (maybe 20 or even 40 years away). Right now, our "AI" isn't quite the cool, creative intelligence we imagine. It's more like trying to spread butter with a flint knife. If AI-generated designs flood the market, they'll all look alike since they're based on the same internet data. Soon, people will want something new, and we'll start the cycle again. The idea of becoming AI's foundation is intriguing. We'd love to see what the VOGER design, averaged over time by a neural model, looks like. Will it be light or dark? My art director is grinning; he knows something!
7. I noticed that you have a second company called UI Mother, which sells music interface construction kits. Can you talk about how the service differs from Voger? What is the vision for this company?
It's simple: this is a more democratic, budget-friendly model for getting a cool audio software design than VOGER. Imagine developers who want to try creating a design independently but don't have enough money to pay a professional team of designers, or who risk hiring unknown freelancers who might let them down at an unexpected moment. These developers can get professional, high-quality graphics in the form of a UI kit optimized for audio programs, covering many nuances of this sphere, like frame-by-frame animation, strips, different screen sizes, etc. These clients are often independent developers or experimentalists. Sometimes they're small companies that need to test an idea or create a simple, even free, product.
To wrap up this article, let's have a look at some of the most interesting AI models in circulation and try to identify programs that don't have an interface yet. I've organized them by category and as you'll see, there's a broad spread of options for the new plugin market:
This list covers the majority of AI music model categories we've encountered. I'm sure there are plenty of others out there, and there may even be entire categories that I missed. The field is evolving on a monthly basis, which is part of what makes it so exciting.