I’m revisiting my post of 18 January 2014, #179 What’s wrong with parametricism. As that post is ostensibly about computer-generated 3D building forms, it fits my previous reflections on AI-generated photo-real still images and animated videos. See the post Bohemian melancholy.
Diffusion-based image generation can certainly generate images of 3D architectural forms. I uploaded one of my photographs from a recent trip to Doha and prompted ChatGPT: “This is a sample of a crystalline rock formation often called a ‘desert rose.’ This particular sample could be held in one hand. Please generate an image of a public building (e.g. a museum) in a waterfront setting that captures the smooth curves of the stone ‘petals’ and the ways they collide and intermesh. Don’t show any people in the representation.”


The AI generated the image above, which happens to adopt a formal style similar to that of Jean Nouvel’s actual National Museum of Qatar in Doha, designed explicitly to reflect the forms of the desert rose.

ChatGPT explained this similarity: Diffusion models “work through statistical recall of precedent morphologies. The desert-rose → Doha museum connection is strongly represented in architectural imagery circulating online. As a result, the system tends to produce a culturally recognisable solution rather than a formally unprecedented one.” It then suggested prompts to steer the generative process towards something more original.
But here I am more interested in a different challenge — how to calculate a 3D computer (or CNC) model of a structure implied by an image generated by AI.
How to parameterise a picture in 3D
As explained in previous posts, AI diffusion-based image generation requires no spatial geometrical modelling to create an illusion of 3D space, with convincing perspective, parallax, occlusion, object persistence and shifts in point of view (POV).
Some digital cameras (e.g. Apple iPhone) produce photographs that incorporate spatial point-cloud data, hidden from direct view as a specialised data layer: a depth map, or 2.5D representation. Legacy photography, analogue films and hand drawings have no such 3D layer, and nor, as yet, do AI-generated pictures (such as the desert rose building above) or videos.
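To make the depth-map idea concrete, here is a minimal sketch (in Python, with illustrative camera values, not real iPhone parameters) of how a 2.5D depth layer back-projects into a 3D point cloud under a pinhole camera model:

```python
import numpy as np

# Assumed pinhole-camera intrinsics -- illustrative values only.
h, w = 480, 640
fx = fy = 500.0            # focal length in pixels
cx, cy = w / 2.0, h / 2.0  # principal point at the image centre

# Stand-in for the hidden depth layer: one depth value (metres) per pixel.
depth = np.random.uniform(1.0, 5.0, (h, w))

# Back-project each pixel along its camera ray, scaled by its depth.
u, v = np.meshgrid(np.arange(w), np.arange(h))
x = (u - cx) * depth / fx
y = (v - cy) * depth / fy
points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)  # N x 3 point cloud
```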
There are techniques for deriving 3D information about building and landscape geometries from 2D photographs (photogrammetry), especially via stereopsis and the analysis of multiple views of a scene from different points of view. The data so derived is generally in the form of point clouds: 3D coordinates of points visible from particular viewpoints.
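As a sketch of the stereopsis step, assuming a rectified stereo pair and placeholder calibration values, OpenCV’s semi-global block matcher can recover a disparity map, from which depth follows as focal length × baseline ÷ disparity:

```python
import cv2
import numpy as np

# Placeholder file names for a rectified stereo pair.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global matching estimates how far each scene point shifts
# between the two views (disparity, in pixels).
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# depth = focal_length * baseline / disparity (assumed calibration values).
f, baseline = 700.0, 0.12  # focal length in pixels, baseline in metres
depth = np.where(disparity > 0, f * baseline / disparity, 0.0)
```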
Point clouds
LiDAR scans of actual settings produce point clouds by calculating the distances traversed by a scanning laser beam reflected from surfaces back to the POV transmitter. See post: Fade to black: LiDAR in the age of extinction.
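The distance arithmetic itself is simple time-of-flight: the pulse travels out and back, so the surface lies at half the round trip. A trivial sketch:

```python
C = 299_792_458.0  # speed of light in m/s

def lidar_distance(round_trip_seconds: float) -> float:
    """Distance to a surface from the round-trip time of a laser pulse."""
    return C * round_trip_seconds / 2.0

print(lidar_distance(66.7e-9))  # a ~66.7 ns echo puts the surface ~10 m away
```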
Techniques exist for converting point-cloud data to contiguous meshes. With supplementary information and some computational effort, such data can be converted to likely point and edge coordinates of objects suitable for representation, analysis and manipulation in CAD and geometrical modelling platforms. Can diffusion techniques assist?
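Before turning to that question, here is a minimal sketch of the conventional point-cloud-to-mesh step, using the open-source Open3D library and a placeholder file name; Poisson surface reconstruction is one common choice:

```python
import open3d as o3d

# "scan.ply" is a placeholder for any point-cloud file.
pcd = o3d.io.read_point_cloud("scan.ply")
pcd.estimate_normals()  # Poisson reconstruction needs oriented normals

# Fit a watertight triangle mesh to the points; higher depth = finer detail.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)
o3d.io.write_triangle_mesh("mesh.ply", mesh)
```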
Diffusion models produce images or videos as “appearance fields” (ChatGPT’s term), rather than point clouds or CAD-ready point-line-plane data. However, diffusion modelling can feature in workflows that produce point-cloud data, or directly produce point-line-plane data as required in a CAD system.
As usual, I called on ChatGPT to supplement (and circumvent) my own web searches. What follows is adapted from a recent interaction.
3D workflows
Several research and production techniques now explore the feasibility of workflows that transition from AI-generated imagery to serviceable CAD models.
One approach is to prompt a diffusion system to generate outputs from different POVs designed to be geometrically reconstructable via photogrammetry. ChatGPT incorporates and interfaces with its own diffusion model. Here is the AI’s second attempt at generating a stereo pair from an image that I uploaded (Geoffrey Bawa’s house in Colombo, Sri Lanka).

The stereoscopic effects are very sparse and random in this case (though ChatGPT’s explanation of the difficulties is interesting). Were such a diffusion system better adapted to, and trained and tuned on, such outputs, the rudiments of a 3D model could be derived via photogrammetry.
There are more sophisticated approaches, e.g. where the platform operates with volumetric representations, rather than pixels on a picture plane. The neural network of a diffusion platform could be trained on images that contain pixel-by-pixel depth information. When used to generate new images, a trained neural network would predict or infer depth rather than triangulate mathematically from multiple views. I gather such techniques are under development.
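One published example of this inference-rather-than-triangulation approach is monocular depth estimation. Here is a sketch using Intel’s MiDaS model via PyTorch Hub (the file name is a placeholder, and the output is relative, not metric, depth):

```python
import cv2
import torch

# Load a small pretrained monocular depth model and its input transform.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

# "render.png" stands in for an AI-generated image.
img = cv2.cvtColor(cv2.imread("render.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    depth = midas(transform(img)).squeeze().numpy()  # per-pixel relative depth
```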
What’s wrong with AI-based 3D
It’s evident by now that diffusion-derived geometry tends to be noisy, irregular, and in some cases physically improbable. At best, it captures “perceptual plausibility” rather than architectural accuracy. Moving from generative geometry to design geometry requires post-processing such as surface fitting, plane detection, and parametric reconstruction.
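Plane detection, for instance, is routinely done with RANSAC. A sketch with Open3D (placeholder file name) that pulls the dominant planar surface, a wall or floor candidate, out of a noisy cloud:

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("noisy.ply")  # placeholder diffusion-derived cloud

# RANSAC repeatedly fits planes to random triples of points and keeps
# the plane with the most inliers within the distance threshold.
plane_model, inliers = pcd.segment_plane(distance_threshold=0.02,
                                         ransac_n=3,
                                         num_iterations=1000)
a, b, c, d = plane_model              # plane equation: ax + by + cz + d = 0
plane = pcd.select_by_index(inliers)  # candidate wall/floor surface
```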
Diffusion models can suggest volumes, articulate façade rhythms, or infer spatial depth, but they don’t yet produce defined architectural elements such as orthogonal wall systems, planar surfaces, or dimensionally specific structural grids.
As my AI interlocutor intoned, “Diffusion predicts appearance consistent with learned spatial statistics — probabilistic reconstruction.”
So diffusion models can contribute to point-cloud production and even to mesh generation, but, without additional reconstruction and constraint-imposition stages, they don’t yet produce the clean point-line-plane abstractions required for architectural modelling, as presented in my post of 18 January 2014, #179 What’s wrong with parametricism.