
Stable Diffusion 3's Disastrous Launch Could Change the AI Landscape Forever

Stability AI botched the launch of its latest model, proving the Stable Diffusion community doesn’t need the company that brought it to the world.
Image: Stability AI

On June 12, Stability AI released Stable Diffusion 3, which it called “the latest and most advanced” version of its text-to-image Stable Diffusion model. The original Stable Diffusion took over and probably forever changed the internet when it was first released in 2022.

OpenAI’s AI image generator DALL-E, competitor Midjourney, and similar products from Google make more news because they come from huge and established tech companies. But because Stability AI made its code public, and because Stable Diffusion can run on relatively easy-to-get consumer-grade GPUs, anyone can use it to generate images freely and easily. More importantly, anyone can modify and train the model to produce images that expertly recreate highly specific styles, the likeness of any human being, and pornographic content, making it the dominant and in many ways boundary-pushing AI image generation model on the internet.

Entire companies and business models have been built on top of previous releases of Stable Diffusion. Some have raised millions of dollars from venture capital firms, which are betting that the millions of users generating images with Stable Diffusion, and the many more millions of images they produce, can be monetized.

All of that, however, might change because of Stable Diffusion 3’s disastrous launch. In the eyes of many users, Stable Diffusion 3 was a big step back from previous models. It specifically struggles to generate correct human anatomy, a shortcoming some users think is due to Stability AI filtering adult content out of its training data for this model. More importantly, the companies and platforms that have built their businesses around Stable Diffusion are worried about Stable Diffusion 3’s new and different license, which they fear introduces fees incompatible with their business models. In response, the community that has grown around Stable Diffusion over the last two years, as well as some of the companies built on top of it, are starting to chart a new path by banding together to build their own models with more permissive licenses.

Users thought something was wrong with Stable Diffusion 3 the moment they got their hands on it. As Ars Technica wrote at the time, the model was so bad at generating human bodies that users asked if it was “supposed to be a joke.” The Stable Diffusion subreddit was quickly flooded with abominations, from mangled hands to women with upside-down bodies, and with memes about how bad the new model is. These types of AI-generated image errors are not uncommon, but they are largely a solved problem, especially given the endless stream of fine-tuned Stable Diffusion models people can download from model-sharing platforms like Civitai.
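For readers unfamiliar with how those community fine-tunes are actually used: checkpoints downloaded from Civitai are typically single .safetensors files that drop into the same open tooling as the official models. Here is a minimal sketch using Hugging Face’s diffusers library, assuming an SDXL-based fine-tune saved locally; the file name is a hypothetical placeholder:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load a community fine-tune downloaded from Civitai as a single
# .safetensors checkpoint (the file name here is a placeholder).
pipe = StableDiffusionXLPipeline.from_single_file(
    "./models/community_finetune.safetensors",
    torch_dtype=torch.float16,
).to("cuda")  # runs on a consumer GPU with enough VRAM

image = pipe(prompt="portrait photo of a woman, natural light").images[0]
image.save("portrait.png")
```

This interchangeability is the whole point: a fine-tune behaves exactly like the base model from the tooling’s perspective, which is why platforms like Civitai can host tens of thousands of them.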

Some users have said that the reason Stable Diffusion 3 is worse at human anatomy is that Stability AI filtered out the adult content that previous models were trained on. I don’t know if that’s the case, and Stability AI did not respond to a request for comment for this story. Weirdly, other users are finding that they get better results if they use adult or “NSFW” terms in the “negative prompt” for Stable Diffusion 3 generations, the field where users describe what they don’t want the resulting image to look like.
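For context, a negative prompt is just another parameter passed alongside the main prompt at generation time. A minimal sketch with diffusers and the publicly released SD3 Medium weights; the prompt strings here are illustrative, not a recommendation:

```python
import torch
from diffusers import StableDiffusion3Pipeline

# The SD3 Medium weights are gated on Hugging Face: downloading them
# requires accepting Stability AI's license terms first.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a woman lying on grass, photorealistic",
    # The negative prompt describes what the image should NOT look like.
    # Some users report better anatomy after adding NSFW terms here.
    negative_prompt="deformed, extra limbs, mangled hands",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("out.png")
```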

But I can see why people would assume that a lack of adult content in training data is the problem, mostly because Stability AI has justifiably taken a beating for the kind of content Stable Diffusion was originally trained on and the kind of terrible content it has allowed users to generate. 

On Reddit, users have pointed to Stability AI’s emphasis on “safety” to argue that adult content was kept out of the training data, and that its absence is making human anatomy worse in generated images.

“We have conducted extensive internal and external testing of this model and have developed and implemented numerous safeguards to prevent harms,” Stability AI said in its Stable Diffusion 3 release announcement.

In Stability AI’s Stable Diffusion 3 research paper published in March, under a section about “Pre-Training Mitigation,” the company said that “Training data significantly impacts a generative model’s abilities. Consequently, data filtering is effective at constraining undesirable capabilities. Before training at scale, we filter our data for the following categories: (i) Sexual content: We use NSFW-detection models to filter for explicit content.”

Civitai founder and CEO Justin Maier said in a post on the company’s site that Stable Diffusion 3 was struggling with human anatomy because of “safety.”

“But not safety for us—safety for Stability AI,” he wrote. “It seems they're trying to avoid any liability for inappropriate content generated with their model, something they've likely faced significant scrutiny over since the ‘opening of Pandora's box’ with the release of SD1.”

Previous versions of Stable Diffusion were trained on LAION-5B, a dataset of 5 billion images indiscriminately scraped from the internet. It included everything from images of ISIS beheadings to pornography to countless images of normal people posted to social media. As we reported in December, LAION-5B also included over a thousand human-verified instances of child sexual abuse material (CSAM).

This, as well as custom Stable Diffusion models, has made it possible and easy to generate CSAM. As I’ve reported over the last year, Civitai users tried to generate images that the platform’s cloud computing provider said “could be categorized as child pornography,” and the site also makes it incredibly easy to download and combine models to create nonconsensual AI-generated porn of anyone.

Civitai is key to Stable Diffusion’s popularity because it makes making and sharing models for any imaginable purpose so easy, and it shows how adult content is both the reason for Stable Diffusion’s popularity and the greatest risk it poses to ordinary people. The company does not allow CSAM or nonconsensual adult content, but it insists that its users deserve the freedom to generate pornography, and that adult content in training data makes Stable Diffusion better.

In December, Maier told VentureBeat’s Sharon Goldman that SDXL, a Stable Diffusion model released in July 2023, also got worse at rendering human bodies because Stability AI filtered adult content out of its training data.

“If we didn’t capture penises [in training data], what else is going to be affected by that?” he said. “How the weights affect each other with this stuff is that by not properly capturing penises means that fingers look funny now.”

As Maier explained, whatever Stability AI’s base Stable Diffusion model can or can’t do ultimately will not stop Civitai’s community, which is founded on the ability to change, iterate, and improve that model. 

“This is an open source community of hobbyists who have pushed the technology forward, perhaps even further than Stability, this company that had hundreds of millions of dollars for the tech,” Maier said. 

In theory, that means Stable Diffusion 3’s anatomy woes are moot, because users can reintroduce adult content into the training data for their own custom models, assuming that was even the issue.
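In practice, those custom models are usually distributed as fine-tuned checkpoints or LoRA weights that get layered on top of the base model at load time, so whatever the base model can’t render, a community fine-tune can shift. A rough sketch with diffusers, assuming a community-trained SD3 LoRA saved locally; the directory name and prompt are hypothetical:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

# Layer community-trained LoRA weights over the base model. The local
# directory name is a placeholder for any downloaded fine-tune.
pipe.load_lora_weights("./loras/sd3_anatomy_finetune")

image = pipe(prompt="full-body photo of a person standing").images[0]
image.save("finetuned.png")
```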

But that leads to the other, much bigger problem with Stable Diffusion 3 and Stability AI’s future: companies like Civitai, which relied on Stability AI’s permissive licenses for previous Stable Diffusion models, don’t trust the language of the license for Stable Diffusion 3. In a post to Civitai on June 17, Maier announced a “temporary Stable Diffusion 3 ban,” banning all Stable Diffusion 3 models and all models trained on content created with outputs from Stable Diffusion 3 models.

“The concern is that from our current understanding, this license grants Stability AI too much power over the use of not only any models fine-tuned on SD3, but on any other models that include SD3 images in their datasets,” Maier wrote. “This could be devastating for the community given Stability's current status and who may ultimately end up with those license rights. It's not unimaginable that a year down the line the new owner of these rights comes to collect and the majority of models are forced to be either taken down or their creators made to pay hefty fees or membership dues.”

Kent Keirsey, founder and CEO of image generation site Invoke, also made a video about the issues he and his company might have with the license.

“We all respect what Stability has done for the ecosystem by creating openly licensed models like Stable Diffusion 1.5 and SDXL, and I think there’s a fundamental need for Stability to find a viable revenue stream. However, the issues that I have with this license are that for open source this really doesn’t move us forward, it’s a restrictive license and it doesn’t solve the underlying value proposition of what open source AI is intended to solve for professionals, which is you own your data, you own your intellectual property.”

Keirsey goes on to explain that, under his reading of the Stable Diffusion 3 license, Stability AI would consider fine-tuned, modified versions of the model, the same kind that people download from Civitai, to be derivative works. They would not be the intellectual property of the people who created those fine-tuned models.

When reached for comment, a Civitai spokesperson said that the company is in contact with Stability AI to better understand the changes to the license, but that it does “not yet have any updates to share.”

“As an open-source platform, potential collection on these rights, including forced removal or monetization of Civitai users’ creative works, is a major concern for our community and the evolution of SD-based models moving forward,” the spokesperson said. “In order to protect the creative freedom and artistic ownership of our users, particularly given the current status of Stability and ambiguity surrounding the long-term holders of those licenses, Civitai has paused model sharing and image generation using SD3 until the parameters of this latest agreement can be clarified.”

As Civitai’s statement vaguely gestures, Stability AI’s leadership has seen a lot of changes lately. Many of the company’s top researchers who originally created Stable Diffusion have left the company, and its CEO Emad Mostaque stepped down in March. Forbes later reported that Mostaque was pushed out for mismanaging the company. Earlier this week, The Information broke the news that Stability AI got a “lifeline” by raising money from Sean Parker (Napster cofounder and Facebook’s first president), ex-Google CEO Eric Schmidt, Greycroft, Lightspeed Venture Partners, and other venture capital firms. 

On Tuesday, Civitai’s Maier, Invoke’s Keirsey, and the anonymous creator of a popular graphical user interface for Stable Diffusion named ComfyUI announced that they are partnering to launch the Open Model Initiative, “a new community-driven effort to promote the development and adoption of openly licensed AI models for image, video and audio generation.”

“Unfortunately, recent image and video models have been released under restrictive, non-commercial license agreements, which limit the ownership of novel intellectual property and offer compromised capabilities that are unresponsive to community needs,” the Open Model Initiative said in its announcement on the Stable Diffusion subreddit.

At the moment, it appears the group is mostly a Discord server, but it says its first order of business is to establish a governance framework, collect community feedback, and make models and tools more compatible “across the ecosystem.” It also wants to support “ethical” model development that addresses “major, substantiated complaints about unconsented references to artists and other individuals in the base model while recognizing training activities as fair use.” (That basically sounds like the status quo on Civitai.)

Even before the Open Model Initiative announcement, the Stable Diffusion community started the “Open Diffusion” Reddit community, which aims to accomplish similar goals. 

In theory, once the dust settles at Stability AI, the company could get back to Civitai and the rest of the Stable Diffusion community, clarify or change the language of the Stable Diffusion 3 license, and everything could go back to normal. But Civitai and the Stable Diffusion community are not counting on that or waiting for it to happen. Even if Stability AI fixes everything that went wrong with Stable Diffusion 3, the misstep has made clear what has always been true about Stable Diffusion: allowing users to freely modify it and, for better or worse, do whatever they want with it means that those users don’t really need the company that made it anymore. That is a pretty compelling argument for the staying power of “Pandora’s box” open AI models versus closed AI models.
