My Journey with Midjourney

Written by Mark McElroy

New technologies challenge what it means to be an artist.

When I was in third grade, our teacher gave each of us rectangles of beige paper and a few crayons and said, “Show me the very best drawing you can make!”

I spent the next quarter-hour drawing a female magician pulling a rabbit out of a hat. I remember the picture clearly, because my mother saved it. The figure of the magician is little more than an oval head atop a triangle dress, standing upright on two stick legs. The hat is a lopsided square with a warped ellipse for a brim. The rabbit looks like an obese, buck-toothed hamster.

When my teacher saw my masterpiece, she took away my crayons, gently slid my paper off the desk, and said, “Drawing just isn’t your thing. Why don’t you write one of those stories you’re always writing?”

A Genie for Frustrated Artists

From a very young age, I was able to imagine fantastic vistas, translate them into words, and capture them in stories. As an adult, I’ve written scripts for Tarot deck illustrations that graphic artists have made into compelling images. As a professional, I’ve been empowered by computers to make a small portion of my living through graphic design. I’m a decent photographer. But I’ve never been able to take the vivid images in my head and translate them directly into sketches or paintings or photos — until now.

Midjourney is an artificial intelligence that generates images based on text prompts. (Other systems, including DALL-E 2 and NightCafe, do much the same thing.) For someone like me, stumbling on Midjourney has been like finding a genie who takes any image I can describe and renders it into a sketch, a cartoon, an oil painting, or a photograph in any number of styles in about 60 seconds.

Magic … and Mystery

The Midjourney AI understands composition, lighting, perspective, and levels of detail. It knows the styles of thousands of illustrators and painters and art schools. It knows how the use of very specific lenses will impact the look and feel of a photograph, or how certain kinds of paint or media influence the appearance of a painting. And because it’s been trained on millions of images from the internet, it knows all about celebrities, movies, and pop culture, and it can blend together elements from these sources to create something new (or, as some would say, derivative).

The results are portraits, illustrations, and vistas that are equal parts delightful and frustrating. While delightful in their dreamlike vividness, they are also frustrating in their tendency to be almost perfect. A stunningly realistic portrait may have black, dead eyes. An exquisite painting of a potter in his workshop may have seven fingers on each hand. A breathtaking landscape may have a shadowy figure lurking in the meadow.

“Imagine a southern mansion being consumed by flames.”

Midjourney Basics

Using Midjourney is simple. Using the Discord app (of all things), you submit a prompt — a text instruction, beginning with the command “/imagine” and describing what you want to see — to the Midjourney AI.

Prompts have four basic parts:

  • They always begin with the command “/imagine,” followed by
  • The subject (what you want to see)
  • The details (modifiers, atmospherics, styles, color palettes, lighting instructions, perspectives, cameras, lenses, tones), and
  • Parameters, like aspect ratios or image sizes or levels of quality.
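
As a rough illustration of that four-part structure, here is a minimal Python sketch that assembles a prompt string. The build_prompt helper and its argument names are hypothetical; Midjourney itself only sees the final line of text you type into Discord. The "--ar" aspect-ratio parameter is the only Midjourney parameter used here.

```python
def build_prompt(subject, details=(), parameters=()):
    """Join the four parts of a prompt into one line of text.

    Hypothetical helper for illustration only; it just builds a string.
    """
    # Part 2: the subject comes first in the description.
    description = [subject.strip()]
    # Part 3: details are appended as a comma-separated list.
    description += [d.strip() for d in details if d.strip()]
    # Part 1: the command always leads the prompt.
    prompt = "/imagine " + ", ".join(description)
    # Part 4: parameters go at the very end, separated by spaces.
    if parameters:
        prompt += " " + " ".join(parameters)
    return prompt


simple = build_prompt("a man with a hat")
detailed = build_prompt(
    "a man with a hat",
    details=["soft focus", "natural lighting", "beautifully lit"],
    parameters=["--ar 9:16"],
)
print(simple)
print(detailed)
```

Keeping the subject, details, and parameters in separate lists makes it easier to change one part at a time and see what each change does to the results.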

You can get intriguing results with very simple prompts. Here, for example, is what Midjourney returned when I typed “/imagine a man with a hat”:

Four initial images produced by the simple prompt, “Imagine a man with a hat.”

Getting Better Images

After weeks of obsessive Midjourney use, I’ve become adept at what Discord members call “promptcraft” — endlessly tweaking arcane strings of text to nudge Midjourney into generating more pleasing images.

For example, here’s the output when Midjourney is given the prompt: “/imagine a man with a hat, symetrical face, mix of matt smith, connor swindells, thimothee chalmet, jake gyllennhal, noah wyle, soft skin, soft focus, in style of Tom Bagshaw, Alfons Mucha, William Morris art background, 80 mm, natural lighting, beautifully lit --ar 9:16”:

Add more descriptive language, and you get more detailed, more specific images.

Watching the Paint Dry

Whether your prompt is simple or complex, by default, Midjourney will draft four versions for you to review. Waiting for these to finish is the modern-day equivalent of fanning and blowing on Polaroid prints to speed up their development.

At first, all four images are just shapeless masses of color and shadow, but as Midjourney keeps working, four images slowly emerge. To help you pass the time, Midjourney provides a progress statement (“10%” or “35%” or “85%”) telling you how close the images are to being finished. At about 65%, the images are clear, but soft. But even at 100%, your first four results may have odd components, weird inclusions, or suffer from an unpleasant roughness.

But: no worries — you’re just getting started. If none of the four candidates suit you at all, you can click the “reroll” button (the one featuring two curved arrows) and Midjourney will start from scratch, honoring your prompt but generating images with an entirely different “seed,” producing four new candidates for review.

Slouching Toward Perfection

But most of the time, one of the four initial images will be a distant cousin of the image in your head. When this is the case, you can pick the candidate (1, 2, 3, or 4) that pleases you most and ask Midjourney to give you variations on it, producing four more images based on your favorite.

You repeat this process over and over, producing variations of variations, until you’re satisfied. Once the image pretty much suits you, you can upscale it, producing a single image large enough to use on social media or your desktop. For printing on a large canvas, you’ll also have to do some upscaling in a third-party program, like Topaz or Affinity Photo or even Pixelmator Photo.

Alternatively, you can “remaster” the image, submitting it to an algorithm that takes some liberties with your design, but renders it with a much higher level of detail.

You follow this process over and over again until Midjourney produces results you like.

What Midjourney Does Well

Landscapes. Midjourney, by design, leans toward the atmospheric: moody fantasy scenes, cyberpunk cities, and apocalyptic landscapes.

Illustrations, paintings, photos, and renders. Midjourney also generates “hand-drawn” art, logos, book covers, movie posters, paintings and photos “in the style of” famous artists, and even images of hand-crafted objects with ease, particularly if you’re not too picky about exact details.

Portraits. While rendering slightly abstracted images of people is easy, rendering photorealistic people is harder and takes longer. It took me an hour of promptcrafting and tweaking to get my first fairly photorealistic image:

This was the first “photorealistic” image I managed to create with detailed prompting.

Now, I can generate more realistic portraits with ease, but even the best of these tend to have odd eyes or extra digits:

Midjourney struggles with eyes, hands, and limbs.

For the very best photorealistic results, I use Midjourney for initial renders, then finalize the images in Affinity Photo (which is like Photoshop, but smarter):

A little tweaking in Affinity Photo makes AI-generated portraits look much more realistic.

What Midjourney Doesn’t Do Well … Yet

Eyes. They tend to be soulless, lifeless, deformed, or, of all things, double-irised.

Hands and feet. Human and animal bodies tend to feature extra digits and deformities. Multiple limbs are not uncommon.

Fine adjustments. Images that don’t quite suit you must be varied or remastered or rerolled completely. You can’t say, “Keep that guy, but fix his arm” or “This woman, but have her standing instead of sitting.”

Replicate earlier results. Because of the way images are “seeded,” using the same prompt will never produce the same image twice.

Naughty images. Aware of Midjourney’s potential to be used to produce deep fakes and realistic pornographic images of well-known celebrities, the app’s creators have banned certain prompts, including selected body parts, words like “shirtless,” and even, as I discovered when trying to render “a big black tornado” (I swear!), the phrase “big black.”

That said: even casual browsing of the public image stream on Midjourney.com reveals just how far certain users have pushed these limits, sometimes with disturbingly realistic results. While browsing the public stream, I’ve stumbled on (and reported) images of sexual violence against women more than once. Midjourney encourages community policing of images, providing a “Report This” button and actively booting folks off the platform who use clever promptcraft to skirt the no pornography rule.

What Midjourney Does That’s Weird

  • Midjourney occasionally freaks out. One morning, every single person I tried to draw was a twisted, Silly-Putty mass of flesh: revolting and horrific images that never evolved beyond grotesque deformity. Why? I’ve no idea.
  • Midjourney leans into color palettes. Mine has latched onto autumnal palettes, and, despite specific direction to the contrary, most of my images feature deep reds, golds, oranges, and yellows.
  • Earlier work influences later work. After a morning spent trying to render UFOs floating over Mississippi soybean fields, other images created later in the day incorporated odd lights in the sky or random patches of agriculture.
  • Much work is done in public. Because Midjourney communicates via Discord, a lot of people are working in public image generation channels. There are more than twenty of these for beginners and at least as many for experienced users, so jump in, type a prompt, and get going. The people there don’t interact much (that’s more for the showcase-style channels), but you can learn a lot from watching how others tweak their prompts over time.

Don’t want to work in public? Send your prompts to the Midjourney bot via private messaging … or, if you’re a clever programmer, you can integrate the Midjourney bot into a private Discord server of your own.

What I’ve Done that Pleases Me

Using a combination of Midjourney and Affinity Photo, I’ve created many images I’m unreasonably proud of. In one case, I’m working on a series of editorial images poking fun at the homophobic new Global Methodist Church:

From a series of paintings I’ve done mocking the homophobic Global Methodist Church.

I’ve created professional photos of each person in my recently finished novel, Parallel Lines:

Thomas, the main character in Parallel Lines. Portrait generated by Midjourney.
Hope, Thomas’ unhappy wife, from Alternative Timeline 1 in Parallel Lines.
Davina, the antagonistic driver of much of the tension in Parallel Lines.

And I’m getting better and better at summoning more and more photorealistic results:

A steampunk android character, rendered by Midjourney
King of the Dwarves
Mr. April in my “Mr. Midjourney” calendar. (Just joking.)

What Does Midjourney Cost?

Midjourney offers three tiers of service:

  • $10.00 a month to generate about 200 “fast mode” images (usually rendered in a minute or less), with the option to generate more fast-mode images at the rate of about $4.00 for 60 images.
  • $30.00 a month to generate about 900 fast mode images and an unlimited number of “relaxed mode” (usually rendered in two or three minutes, tops) images.
  • $600 a year to create unlimited fast mode images in “private mode” where no one else can see what you’re doing. (You can also add private mode to the $30.00 plan by paying an additional $50.00 per month.)

When choosing a plan, bear in mind that you often need to generate between 4 and 10 intermediate images to steer Midjourney to the result you want. Each attempt counts as an image.
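
To make that arithmetic concrete, here is a small Python sketch using the figures quoted above for the $10.00 plan (about 200 fast-mode images a month) and the 4 to 10 attempts each finished image tends to need. The function names are hypothetical and the inputs are approximations, so treat the output as a rough estimate, not a pricing guarantee.

```python
def finished_results(images_per_month, attempts_per_result):
    """Estimate how many finished images a monthly allowance yields."""
    return images_per_month // attempts_per_result


def cost_per_result(monthly_price, images_per_month, attempts_per_result):
    """Estimate the dollar cost of one finished image."""
    price_per_image = monthly_price / images_per_month
    return price_per_image * attempts_per_result


# Figures from the $10.00 tier: about 200 fast-mode images a month.
PRICE, IMAGES = 10.00, 200

for attempts in (4, 10):
    count = finished_results(IMAGES, attempts)
    cost = cost_per_result(PRICE, IMAGES, attempts)
    print(f"{attempts} attempts each: about {count} results at ${cost:.2f} apiece")
```

By this estimate, the entry-level plan yields somewhere between 20 and 50 finished images a month, depending on how much steering each one needs.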

I signed up for the $10.00 plan and quickly moved to the $30.00 plan. After a month of heavy use, I decided to be less explorational and more purposeful in my use, so I dropped back down to the $10.00 plan. So far, the stingier plan is more than adequate … but a lot less fun.

The Ethics and Impact of Midjourney

As mentioned before, Midjourney has been trained on millions of images scraped from the internet, including the work of artists, photographers, and graphic designers all over the world. As a result, it’s absurdly easy to say, for example, “/Imagine a painting of New York City in the style of Thomas Kinkade, the Painter of Light”:

Or in the style of Keith Haring:

Or even photographed by Ansel Adams:

Ethical questions abound.

  • What does it mean to live in a world where the style of a given painter, without his or her consent, can be copied wholesale and applied to any number of projects?
  • While the Midjourney terms of service say images created are mine to use as I please, to what extent are images based on the style and composition of another artist and drawn by a computer actually mine?
  • Midjourney is already beating humans in art competitions. How fair is it for artists to enter a Midjourney render as their own digital artwork?
  • When the creation of art is commoditized in a way that lets me produce virtually unlimited art for $30.00 a month … what does that do to the value of stock art … or any human-produced art?

As much as I love Midjourney, questions like these give me pause.

This is Just the Beginning

New technologies — the iPhone, ultra-high definition television, virtual reality, voice assistants — all feel magical for a moment … until, with familiarity, they become mundane. For now, creating illustrations with Midjourney still feels magical, as though I’ve found an artistically-inclined genie. I, the master, wish for art, and Midjourney, the genie, produces it.

But of course, in every story about wish-granting djinns, the clever genie’s services always come at a terrible price.

People are already using text-based AIs (like Jasper) to write blog posts. As a professional writer, I can spot these a mile off, at least for now. But how long before I’ll be able to ask an AI, “/Imagine a fast-paced, breathtaking novel set in the world of Star Trek, a massive galactic threat, diverse characters, gay romantic subplot, main crisis resolved, romantic love triangle unresolved, with a cliff-hanger ending that leads to a sequel, written in the style of James S. A. Corey”?

More than one startup is already offering an AI that turns prompts into video clips. Given enough computing power, how long before I can say, “/Imagine a twelve-episode comedy series with characters like those from Sex Education, set in the world of Game of Thrones, with stand-alone episodes organized in a single long story arc concerning a lost magical ring”?

If you think there’s a glut of (good and bad) media now, just wait until we’re living in a world where everyone is generating paintings, novels, series, and movies at the rate of millions per minute.

Magical Thinking

In a world where the creation of art has become democratized, everyone is an artist. In response to this, though, new arts will emerge. Already, among Midjourney users, there are some of us who are more adept at promptcraft than others. One day, perhaps, “He’s a great prompter” will carry all the cachet of “She’s a great painter” or “They’re a great photographer.”

And then what? As the AI becomes more clever, obtaining desired results will depend less and less on guessing what orders will best direct the genie. When we can easily demand whatever we can imagine, the new star artists will become those who are the first to imagine output more surprising and engaging than everyone else’s, at least until the AI learns to copy and replicate their style.

And at that point, all that’s left for us to do is say, “Alexa, I’m bored” — and, based on previous consumer activity, Amazon.com’s AI, knowing what we haven’t seen, will create on-demand the exact media experience we would have asked for, had we only been clever enough to think it up ourselves.

And what if the entertainment were to be so engaging and so immersive, I could lose myself in it completely, forgetting who I really am, becoming so immersed and engaged that I could spend seventy or eighty years on life support, mesmerized by a digital experience’s twists and turns, setbacks and victories, mistaking them for my own?

At that point, who’s the artist? And given the power of the genie, who is really the master?

About the author

Mark McElroy

I'm a writer and professional facilitator. I'm the author of a dozen or so non-fiction books and hundreds of corporate video scripts. As a professional facilitator, I coach individuals, committees, and teams to change how they meet, make decisions, and plan, so they can get out of their own way and do work that really matters. I use this site to write about writing, adaptive strategy, travel, and spirituality ... and to "learn out loud" by sharing what works (and what doesn't).