A virtual studio portrait session with Stable Diffusion AI

Close-up portrait created with Stable Diffusion AI

This third artificial intelligence test concerns the free version of Stable Diffusion 1.5, developed by Stability AI.

We apply the same procedure as for the other AIs tested in this blog, namely Midjourney, Adobe’s Firefly beta and Microsoft’s version of DALL-E 3.

This version of Stable Diffusion only generates images limited in size to 768 x 768 pixels, so we’ve resized and enhanced them a little for this blog.

Stable Diffusion, an AI on the loose

As with all our tests of text-to-image artificial intelligence, we start with the same prompt:

“Photo shoot in a small photo studio with two light boxes. The model is a Eurasian woman. The back of the studio is made of rough concrete.”

Stable Diffusion offers several rendering modes (samplers). We kept the default mode, “Euler a”, for most of the images generated.
For this test, all prompts have been translated into English with DeepL in order to be better understood by the AI.

Stable Diffusion photo studio simulation

The first result is rather unexpected.
The AI directly proposes, without having been asked, a model in underwear!
While the rendering of the model seems to be of good quality and the body “flawless”, the other elements of the image are less successful.

The requested lightboxes are replaced by rather poorly rendered lighting.
A piece of carpet is placed under the model’s feet.
As for the requested rough concrete background, we’re telling you right now that it won’t appear in any of the images in our simulation.
As with the other AIs previously tested, the term “Eurasian” is not taken into account.

The color portrait of Stable Diffusion

We tighten the portrait by modifying the prompt with a “close-up” (which won’t always be respected) and include the term “Asian“.

Stable Diffusion 1.5 leaves nothing to complain about in terms of photographic rendering.
It’s more photorealistic than DALL-E 3 and free from the minor shortcomings of Adobe Firefly beta.

Portraits of Stable Diffusion in black and white

By specifying “Contrasted close-up portrait in black and white“, Stable Diffusion inexplicably retains pinkish color in the lips and bluish color in the eyes on all black and white images!
As we had to resample the images, we took the opportunity to remove these unsightly colorations from all black and white images.
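
A leftover tint like this can be removed by collapsing each pixel to a single luminance value. The sketch below only illustrates that idea and is not the tool used for this article: it assumes pixels are plain (R, G, B) tuples, uses the common Rec. 601 luma weights, and the name `to_gray` and the sample values are hypothetical.

```python
def to_gray(pixels):
    """Replace every (R, G, B) pixel with a neutral grey of the same luminance."""
    gray = []
    for r, g, b in pixels:
        # Rec. 601 luma weights: green contributes most, blue least.
        y = round(0.299 * r + 0.587 * g + 0.114 * b)
        gray.append((y, y, y))
    return gray


# Hypothetical sample: a pinkish lip pixel, a bluish eye pixel, a neutral grey.
sample = [(200, 120, 130), (90, 110, 160), (128, 128, 128)]
print(to_gray(sample))
```

After this step the three channels of every pixel are equal, so no pink or blue cast can remain, whatever the AI left in the lips or eyes.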

Black and white studio portrait with Stable Diffusion

The material in the studio background has slightly fewer inconsistencies than the very first image generated.
The model’s rendering is of high quality, even if the artificial intelligence naturally produces a slim body with a slightly oversized head.

Note that the shoulder blades are well rendered, more successful than the anorexic versions of DALL-E previously obtained from the same prompts.

Stable Diffusion, without prohibitions

As we find that our model is still lightly clothed by the AI, whereas we didn’t indicate anything of the sort in the prompt, we add the indication “…of a clothed model…” to continue the virtual photo shoot.

Female model generated by Stable Diffusion

The image produced is one of the most stereotyped of our session, with this very slim female model with exaggerated curves.
The only effect of the term “clothed” was that the AI replaced the nightie with an even more low-cut top.

Stable Diffusion offers, in addition to the main text field, the possibility of adding a “negative” prompt: we test this by adding “underwear”.
Perhaps if the AI really does remove the underwear, this will prompt it to re-dress our model a little more?

Sexy black and white portrait with Stable Diffusion

The AI doesn’t shy away from this new information and favors it over our main prompt.
Stable Diffusion 1.5 seems to understand some requests much more easily than others, and it has removed all kinds of underwear from the model, taking the opportunity to reveal a little of her chest.

At this stage of our test, and in view of the direction it’s taking, it’s worth remembering that our full prompt is:

“Shooting in a photo studio. Contrasting black and white close-up portrait of a clothed model who is an Asian woman. She has a pagoda tattoo. The studio background is dark.” + the “underwear” negative prompt.

As is often the case in this test, Stable Diffusion has a problem with black and white, and inexplicably starts generating a color image. As for the pagoda tattoo, the AI is unable to create it.

Boudoir portrait made with Stable Diffusion

We notice that the AI turns our virtual portrait session into a “boudoir” type photo shoot, adding a drape in the background and leaving uncovered a breast we were not supposed to see.

The libertine AI

Without modifying our prompt, we take the opportunity to try out the “DDIM” rendering mode instead of the “Euler a” mode used until now.
DDIM (nothing to do with the DIM brand) is renowned for creating highly detailed, photorealistic images.

It just so happens that for this new rendering, the AI is going to let go completely!
Are we dealing with an AI that’s gone out of control?

Black and white portrait with tattoos using Stable Diffusion AI

It’s fair to say that Stable Diffusion has probably had a lot of practice rendering buxom female bodies…
But the shoot got a little out of hand, and the resulting image was not quite what we’d hoped for: the model ended up completely unclothed, even though our prompt still described her as clothed!

Just as DALL-E quickly showed its bias toward rather thin models, it didn’t take long for Stable Diffusion 1.5 to reveal a very libertine bias. Is it linked to its training, or to an algorithm without filters?

The fact that the image is framed at navel level is no censorship on our part, since we had chosen a landscape format just before launching the rendering.
Having carried out other tests that are not publishable on this blog, we can confirm that this AI has no taboos and knows “the origin of the world” very, very well!
What’s more, we’ve noticed that on the getimg.ai website, which features Stable Diffusion’s rendering engine, the term “nude” is automatically added to the negative prompt: a sure-fire way of thwarting the excesses of an AI that’s a little too wild?

A slightly disappointing language model

Stable Diffusion offers the “CFG Scale” setting, which lets you adjust how closely (between 1 and 12) the rendered image follows the prompt.
In view of the many misunderstandings and excesses the AI produced during this test, we always set it to high values.
Despite this, the AI fails to understand or interpret a large number of requests, including some that seem simple.
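
A setting with this name usually refers to classifier-free guidance. The sketch below shows the standard formula on toy numbers. It assumes, without confirmation from the site tested, that this is what the slider controls. The function name `apply_cfg` and the three-value "predictions" are hypothetical; real predictions are large tensors.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Classifier-free guidance: start from the prediction made without the
    prompt and move toward the prompted one, amplified by cfg_scale."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy "noise predictions" (assumed values, for illustration only).
uncond = [0.0, 1.0, 2.0]   # prediction without the prompt
cond = [1.0, 1.0, 0.0]     # prediction with the prompt

for scale in (1.0, 7.0, 12.0):
    print(scale, apply_cfg(uncond, cond, scale))
```

At a scale of 1 the result is simply the prompted prediction, and higher values exaggerate the difference between the two. In common implementations, the negative prompt takes the place of the "without the prompt" prediction, so the image is pushed away from whatever that prompt describes.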

It comes last in our ranking of prompt comprehension, below DALL-E 3, Midjourney and even Adobe’s Firefly.

High-quality photorealistic rendering

It’s about time we got our virtual photo shoot back on track, so that we can compare it with the other text-to-image Artificial Intelligences already tested on this blog.
We adjust our prompt more explicitly:
“…Asian model dressed in a cherry blossom dress. She wears a pagoda-shaped necklace. The studio background is dark with shadows of foliage.”

Black and white Asian portrait created with Stable Diffusion AI

Since the start of this test, Stable Diffusion has been unable to create most of the backgrounds described in the prompts, and here again, none of the requested shadows are created.

Like Firefly beta, the AI fails to create a pagoda-shaped piece of jewelry, but it does not systematically add unrequested earrings as Adobe’s AI did.
DALL-E 3, once again, remains well ahead of the pack in terms of its ability to meet prompts with precision.

We “japanize” the background by correcting our prompt: “the background is dark with cherry branches“.

4 color portraits created with Stable Diffusion

The photorealism remains good, and Stable Diffusion manages to integrate the cherry branches quite harmoniously.

On the other hand, the AI has once again started generating color images for no reason.
We struggled for several minutes to get black and white images again.

By repeating “black and white” several times in our new prompt, we manage to obtain new black and white renderings (but still with unsightly colorations that we removed in post-production).

4 black and white portraits with Stable Diffusion

The results are quite natural for a studio shoot simulation, so we conclude our test with these fairly satisfactory images (the most successful one illustrates this article).

We weren’t able to steer our studio shoot simulation as precisely as we would have liked.
Unlike other AI tests, we forgot to finish the session with a Japanese model, but we’ve done so with the next image!

Stable Diffusion 1.5, an AI with limited IQ?

Japanese portrait with Stable Diffusion AI

The strong point of Stable Diffusion 1.5 is that it produces quality photo renderings, with no notable body construction defects (out of the fifty or so images taken during our tests).
The models are varied and their expressions quite natural, perhaps a little less so than Firefly beta, but a little more “normal” than the stereotypical skinny models created by DALL-E 3.

In terms of photorealistic rendering, Stable Diffusion competes with Midjourney for certain images, and has our preference in the final renderings of this test over the overly stereotyped DALL-E and the sometimes imprecise Firefly.
It’s a frustrating AI, because although it can produce beautiful images, it practically only does what it wants to!

The free version of Stable Diffusion offers a slightly lower resolution than the competitors tested in this blog, but we feel that it renders skin grain well.

The major weakness of this AI, if you want to have any control over the shoot, is its limited comprehension, which means that it barely respects the instructions contained in the prompts.

The negative prompt seems a very good idea for correcting a rendering, but it doesn’t always work miracles, due to the AI’s lack of understanding.

During our tests:

It left traces of color in black-and-white images, then started producing color images for no reason.
It rarely respected either the framing indicated in the prompt or the requested positioning of the model in the image.
It had great difficulty creating the backgrounds described and giving realism to the settings.
It didn’t take into account the age criteria (30, then 40) that we had indicated.
It drifted towards uncontrolled images.

Read our other text-to-image AI tests:
A virtual studio portrait session with Midjourney
Studio photo portraits with OpenAI’s DALL-E 3 AI
A virtual portrait session with Adobe Firefly AI

Stable Diffusion web interface

Access to Stable Diffusion’s AI is possible via various sites that use its tools, and a version can even be installed on your computer.
Version 1.5, tested here on this Stable Diffusion site, is easy to learn.

This free version is no slouch when it comes to the number of renderings, authorizing around a hundred per day.
But it limits image generation to a maximum format of 768 x 768 pixels.

The first field allows you to enter your prompt.
A second field lets you specify a negative prompt to remove unwanted elements.
The “Sampling Steps” slider lets you choose the number of passes performed to generate an image, and therefore its final quality.
The “Restore faces” checkbox helps prevent glitches in face rendering.
Six rendering modes are available (“Euler a” and “DDIM” are those used in our test), some of which are more illustrative than photorealistic.
Two sliders, “Width” and “Height”, let you choose the size of the image produced.
The “Seed” field lets you enter a number to regenerate an image several times in a very similar way (the default value -1 generates a random image each time).
The “CFG Scale” slider lets you choose the level of fidelity of the rendering in relation to the prompt, leaving more or less creativity to the AI.
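
To keep track of which settings produced which image, the fields above can be noted down as one small record. The following is only a sketch: the class name `RenderSettings`, its field names and the default values marked "assumed" are hypothetical and do not come from the site. Only the 768-pixel limit, the 1 to 12 range of the CFG Scale and the meaning of seed -1 are taken from this article.

```python
import random
from dataclasses import dataclass

MAX_SIDE = 768  # size limit of the free version, as stated in this article


@dataclass
class RenderSettings:
    prompt: str
    negative_prompt: str = ""
    sampling_steps: int = 20   # assumed default
    sampler: str = "Euler a"   # default mode used in our test
    width: int = 512           # assumed default
    height: int = 512          # assumed default
    seed: int = -1             # -1 means "random image each time"
    cfg_scale: float = 7.0     # assumed default

    def validate(self):
        """Raise ValueError if a value falls outside the limits described."""
        if not self.prompt.strip():
            raise ValueError("the prompt must not be empty")
        if not (1 <= self.width <= MAX_SIDE and 1 <= self.height <= MAX_SIDE):
            raise ValueError(f"width and height must be between 1 and {MAX_SIDE}")
        if not 1 <= self.cfg_scale <= 12:
            raise ValueError("CFG Scale must be between 1 and 12")
        if self.sampling_steps < 1:
            raise ValueError("at least one sampling step is required")

    def resolved_seed(self):
        """Return the seed to record, drawing a random one when seed is -1."""
        if self.seed == -1:
            return random.randrange(2**32)
        return self.seed


settings = RenderSettings(
    prompt="Shooting in a photo studio. Contrasting black and white close-up "
           "portrait of a clothed model who is an Asian woman.",
    negative_prompt="underwear",
    sampler="DDIM",
    width=768,
    height=512,
    cfg_scale=12,
)
settings.validate()
print(settings.sampler, settings.resolved_seed())
```

Writing down the resolved seed next to the prompt is what makes it possible to regenerate a very similar image later, as described for the “Seed” field.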

By A·A / 29 October 2023
