Stable Diffusion As a Tool: The New Generative Genie

VP of People, Opreto


ChatGPT, DALL-E 2, and Midjourney (and their ilk) are having a moment. Their user growth is explosive, news coverage is fawning, and mind share is off the charts. Their generated text and images are everywhere. There is no doubt that the generative AI models underpinning these systems (large language models for text, diffusion models such as Stable Diffusion for images) are a powerful tool for communication. But this efflorescence of generative AI models trained on large datasets has also produced at least one interesting orthogonal experience: the fascinating, hyper-modern "generative genie" that Midjourney's users share with one another.

I’ve been exploring the liminal dreamscape realm of Generative AI models, checking their fit, observing the war. Some of the things I’ve found have been deeply moving, some inspiring, some terrifying or disturbing. This is part of a series about my journey and the best practices (and anti-practices) I find for these new tools.

First, a disclosure: I love Midjourney. I am a Midjourney user with a paid subscription, and I am a daily participant in its channels. I am not ignorant of its societal impacts, and I discussed my fears about this technology in my previous post. But I am also not a blind reactionary, hurling defensive vitriol at generative AI because I'm afraid my meal ticket might disappear. I believe statistical generation from large models is a powerful innovation by our species, even though it threatens many economic assumptions and is a current source of chaos for our most sensitive and precious class (artists).

At first I was simply entranced by the style Midjourney applied to the art it generated; I was purely attracted to its aesthetics and what it could do for me, and I adored the doors it could unlock. Coming from experiments with OpenAI's DALL-E 2, it certainly struck me as more polished, and it vibed harder. It made the kind of opinionated stylistic choices that create striking visuals, and each image felt like it was ringing a gong inside me.

But over time, and after more direct experience with the system, I have realized that what I appreciate most about it is the interface and the user experience, even more than the art it generates on a case-by-case basis.

Whereas DALL-E 2 is a traditional web application (enter your prompt into this field, hit "generate"), Midjourney uses the hugely popular Discord chat app as its interface.

You join the Midjourney chat server, find one of its channels to sit in, and prompt the AI system for your images with crafted lines of chat that everyone else in the channel can see. It is like a party line: 50 users on a call with a generative genie trapped in a mainframe, all making their wishes at once, and all seeing the results of each other's wishes and evocations.

Even when split across 100 channels on the server, the chat window is a perpetually scrolling carousel of prompts from different humans, each one summoning a different highly stylized image. Each person then usually continues refining their prompt over the next hour with more and more targeted details: an ongoing public interaction, happening in the town commons. These conversations with the oracle weave in and out of each other, homing in on ever more specific memetic fragments, sharing a common stream of consciousness, and reacting to each other's images with approving emoji.

Taken in aggregate, it feels like experiencing a collective dream, an expression of that day's total-civilizational subconscious. The constant stream of images shows the things that trouble or inspire all those other connected users, and I am glimpsing their dreams: ephemeral, powerful, evocative. During the World Cup, there were oil painting portraits of Lionel Messi wearing a crown; during the early days of Elon Musk's takeover of Twitter, there were images of blue birds evaporating into smoke.

People crowd around Midjourney's generative flame to express complicated feelings, wrangling them out in a conversation with our entire cultural milieu through the blinking cursor and the call and response of these generative AI systems. Its turnkey nature is what makes it such a powerful tool for generating images, and its way of capturing dreams, visions, and nightmares in the established language of artistic expression is what makes it exciting to the individuals using it. Even when it falls down and lands in the uncanny valley (which it often does), it can evoke a complicated sentiment or vibe. In doing so, it speeds up and improves the way humans talk to each other in collectives. When we are given windows onto this conversation, a fireside ring of seats around a generative flame, it puts our whole civilization in a different light.

Generative AI models like Stable Diffusion can change how we experience society and each other. This is what art was always supposed to do; now it's happening at hyperscale. Midjourney's user experience is a hint of things to come, an evolution of our species' communicative capacity, and tomorrow's "generative genie" may well be prompted by brain signals and display its images on a screen on our foreheads.

In my next post, I will visit some of the nightmare fuel that Stable Diffusion can inadvertently produce, both because of how it works and because of how it doesn't. I will examine the places where Stable Diffusion models fail, and the things it is bad practice to use them for. There are things in this world that will always need a human eye and an organic hand.