Of what do AIs dream?

Midjourney is an AI platform that deploys machine learning and processing techniques similar to those of DALL-E. The backend to the Midjourney platform drew (and continues to draw) on a vast repository of images and their associated texts. That’s as much as I dare surmise about its operations. I have yet to read the definitive academic articles revealing the precise functioning of its neural-network-based algorithms.

Most reviews seem to agree that Midjourney’s impressive performance in generating original images is shaped by an inbuilt bias that defaults towards photorealistic, 3D-style, shaded and airbrushed, moody images.

An interesting web post by Steve Dennis shows how a simple instruction to Midjourney to draw a circle defaults to “a cinematic teal-and-orange look” — a circle in a 3D setting with depth, shading and clouds. He then enhances the circle request to Midjourney by naming different materials, settings, and art styles. All are rich and evocative. Midjourney veers towards circles, and the complicated.

I’ve tried something similar to the circle exercise with the words “unicursal labyrinth,” an obsession indulged in previous posts. Midjourney’s generated imagery is rich in form, materials and context. The platform provides 4 versions in response to the prompt and offers the option to choose and enhance each of them. That said, the 2×2 image array is compelling in its own right. I’m inclined to show all 4 in what follows.

I prompted Midjourney to create versions of the unicursal labyrinth with six further keywords, each in turn: (1) Vitruvius, (2) modernist, (3) Le Corbusier, (4) floral, (5) video game, and (6) pixel art. I show the outputs here.
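For anyone wanting to repeat the exercise, the set of prompts can be laid out as a small Python sketch. The comma-separated wording is an assumption made for illustration, since the exact phrasing typed into Midjourney is not recorded here, and `build_prompts` is a hypothetical helper, not part of any Midjourney tool.

```python
# Base term and keywords as listed in the text above.
BASE = "unicursal labyrinth"
KEYWORDS = ["Vitruvius", "modernist", "Le Corbusier",
            "floral", "video game", "pixel art"]


def build_prompts(base, keywords):
    """Return the bare base prompt followed by one variant per keyword.

    Joining with a comma is an assumed format, not a documented requirement.
    """
    return [base] + [f"{base}, {word}" for word in keywords]


prompts = build_prompts(BASE, KEYWORDS)
for prompt in prompts:
    print(prompt)
```

Each line printed is one prompt to paste into Midjourney by hand, which keeps the base term constant while only the keyword changes.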

A prompt to generate a simple line drawing produces these images.

Close scrutiny of any of these images reveals interesting spatial paradoxes — for example, ambiguities between walls and corridors. Most of the drawings are not true labyrinths, unicursal or otherwise, though they are certainly labyrinthine.

Midjourney also takes a picture as input that prompts further generation. So I gave it my own truly simple drawing of a unicursal labyrinth.

I added the terms: (1) stainless steel, (2) crowds of people, (3) accessible railway station, (4) Spongebob Squarepants, (5) cryptography, and (6) blockchain. Nothing in the list of prompts included the word “labyrinth,” yet Midjourney seemed to produce labyrinths all the same.

Midjourney seemed to adapt and enhance my simple line drawing as a frame on which to hang the other concepts. I suspect that the Midjourney algorithms implemented some smart feature detection that identified my drawing as a mostly symmetrical, compact labyrinth shape drawn with prominent lines and no colour. If so, textual descriptions mediated between my image and the final image. I’m happy to have this hypothesis challenged.
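To make the hypothesis concrete, here is a toy Python sketch of what "feature detection that ends in a textual description" could mean. It is emphatically not Midjourney's method, which remains unknown to me. The stand-in drawing (nested square outlines), the three measures, and every threshold are assumptions chosen purely for illustration.

```python
def concentric_squares(size=21, step=4):
    """Draw nested square outlines, a crude stand-in for a labyrinth sketch.

    Returns a grid of (r, g, b) tuples: black lines on a white page.
    """
    white, black = (255, 255, 255), (0, 0, 0)
    grid = [[white] * size for _ in range(size)]
    for inset in range(0, size // 2, step):
        lo, hi = inset, size - 1 - inset
        for i in range(lo, hi + 1):
            grid[lo][i] = grid[hi][i] = black  # top and bottom edges
            grid[i][lo] = grid[i][hi] = black  # left and right edges
    return grid


def colour_spread(grid):
    """Mean gap between strongest and weakest channel (0 means no colour)."""
    spreads = [max(px) - min(px) for row in grid for px in row]
    return sum(spreads) / len(spreads)


def mirror_symmetry(grid):
    """Share of pixels equal to their left-right mirror (1.0 is symmetrical)."""
    matches = [row[x] == row[-1 - x] for row in grid for x in range(len(row))]
    return sum(matches) / len(matches)


def line_fraction(grid, threshold=128):
    """Share of pixels darker than the threshold, i.e. the amount of line work."""
    dark = [sum(px) / 3 < threshold for row in grid for px in row]
    return sum(dark) / len(dark)


def describe(grid):
    """Turn the three measures into words; all cut-offs are arbitrary."""
    words = []
    if colour_spread(grid) < 5:
        words.append("monochrome")
    if mirror_symmetry(grid) > 0.9:
        words.append("symmetrical")
    if 0.05 < line_fraction(grid) < 0.5:
        words.append("line drawing")
    return ", ".join(words)


drawing = concentric_squares()
print(describe(drawing))
```

The point of the sketch is only that simple measurements of an image can be reduced to a handful of words, and those words could then sit alongside the typed prompt. Whether anything like this happens inside Midjourney is exactly the open question.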

The featured image at the top of this page was generated by Midjourney prompted by the second sentence in my new book Cryptographic City:

“Cryptographic methods and technologies at times appear exotic and external to the concerns of those of us interested in the history, design, and shaping of cities, but in what follows I will demonstrate that cities are already invested in cryptographic ideas and practices. At the very least, cities and cryptography have concepts and procedures in common” [1].

Here is the initial set generated by Midjourney.

There was nothing in my description about circles or spheres, though the algorithm seems to favour them. See my post: Circles and how to get out of them. This particular AI platform dreams of circles. Also see post: The hallucination machine.

References

Notes

  • James Stewart keeps an impressive array of AI generated images on his Facebook page at https://www.facebook.com/jameskstewart.
  • Midjourney (1) converted a photograph I took of Frank Gehry’s Bilbao Museum into a railway station, and (2) generated the image on the right from the prompt: “Escher-style, self-referential tree house on the moon.”
