The Art Of Conversation: Creating Masterworks With ChatGPT

SPYSCAPE

The latest update to ChatGPT has brought about another revolutionary change in the way we interact with computers, but you’d be forgiven for not noticing. The change seems small at first, but it offers an early peek at what many predict the future of creative work will look like: a new way of working collaboratively with multiple AIs. It brings two OpenAI products together under one roof: not just ChatGPT, but also DALL-E 3, the newly released upgrade to OpenAI’s text-to-image generative AI model.

We should note that this service is only available to OpenAI’s ChatGPT Plus and Enterprise subscribers, and unfortunately seems unlikely to be available to free users any time soon. If you have access, you’ll be able to enable the DALL-E 3 mode from the dropdown menu on the ChatGPT GPT-4 page. But what does it do differently?

WELCOME TO META-PROMPTING

Until now, text-to-image AIs have all operated fundamentally through some variation of prompting: you provide your instruction as a text prompt, and the AI interprets that prompt directly, attempting to create an image that best fits it. As these services have grown they’ve added impressive new tools, such as the ability to read images, or inpainting and outpainting, but the fundamentals remain the same for all: the user inputs a text prompt, which is then recreated as an image.

The ChatGPT/DALL-E collaboration introduces a radical change to this workflow that, if you’re not looking closely, is hard to spot. The process of turning ChatGPT inputs into DALL-E 3 outputs seems identical to every other text-to-image workflow: the user types a prompt in a text box, and four images pop out at the other end. What’s changed is the bit in between, where ChatGPT now acts as a middleman - middlebot, if you prefer - between you and DALL-E 3, changing your instructions to make them more easily digestible for DALL-E’s delicate palate. This is called meta-prompting: the act of prompting an AI for prompts to give to an AI.
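For readers who think in code, the middleman arrangement can be pictured as two functions chained together. What follows is only a minimal sketch in Python, not OpenAI’s implementation: `meta_prompt`, `toy_rewriter` and `toy_image_model` are hypothetical names we made up for illustration, and the two "models" are simple stand-ins that return strings rather than calling any real service.

```python
from typing import Callable

def meta_prompt(
    user_prompt: str,
    rewrite: Callable[[str], str],
    generate: Callable[[str], list[str]],
) -> tuple[str, list[str]]:
    """Send the user's prompt through a rewriting model, then pass the
    rewritten prompt (not the original) to an image model."""
    image_prompt = rewrite(user_prompt)   # the "middlebot" step
    images = generate(image_prompt)       # the image model never sees the original
    return image_prompt, images

# Hypothetical stand-in for the chat model: it only pads the prompt with detail.
def toy_rewriter(prompt: str) -> str:
    return f"Photo of {prompt.strip()}, with detailed lighting and composition"

# Hypothetical stand-in for the image model: it returns four placeholder strings.
def toy_image_model(prompt: str) -> list[str]:
    return [f"<image {i + 1} for: {prompt}>" for i in range(4)]

if __name__ == "__main__":
    rewritten, images = meta_prompt("a cat", toy_rewriter, toy_image_model)
    print(rewritten)
    for image in images:
        print(image)
```

The point of the sketch is the shape of the pipeline: the user only ever talks to the first function, and the second receives whatever the first decides to send.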

A DALL-E 3 attempt at illustrating the new workflow. Artist robots require three arms for practical reasons.

A PEDANTIC COLLABORATOR

As is often the case with chatbots, there’s not a great deal of transparency - and no documentation - about the inner workings of this process, so much of the information on how this works comes from ChatGPT itself. Two important caveats are therefore needed here: firstly, while we have no more reliable source, ChatGPT is far from reliable, and secondly, AI behaviors are likely to change over time. You may find asking ChatGPT these questions yourself generates different answers!

We asked ChatGPT how it interprets user prompts, and the short version of its response involves three steps. Firstly, it identifies the key visual elements requested, such as subject, composition, lighting, and background, just like any other text-to-image service. The second stage is more novel, however; ChatGPT will edit the text of your prompt to make it more palatable for its intended audience. For example, our sample prompt was “condensed” to remove “redundant and overly verbose phrasing.” This is frankly rude, but our wounded pride aside, it’s fascinating to see how ChatGPT fashions natural language prompts for other AIs to work with. This collaborative approach continues in the third stage of the process, where the bot attempts to ensure that the intent of your original prompt is retained.
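The three steps ChatGPT described can be mimicked with a toy example. To be clear, this is our own rough sketch and not how ChatGPT actually works: a real system would use a language model at every stage, whereas this version relies on a hand-written word list. The names `FILLER_PHRASES`, `STOPWORDS`, `key_elements`, `condense` and `rewrite` are all hypothetical.

```python
import re

# Hypothetical filler phrases and stopwords, chosen purely for illustration.
FILLER_PHRASES = ("please", "i would like", "very", "really", "kind of")
STOPWORDS = {"a", "an", "the", "of", "and", "at", "in", "on", "with", "to"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def key_elements(prompt: str) -> list[str]:
    """Stage 1: pick out the words that carry the request (a crude proxy
    for subject, composition, lighting and background)."""
    filler_words = {word for phrase in FILLER_PHRASES for word in phrase.split()}
    return [w for w in tokenize(prompt)
            if w not in STOPWORDS and w not in filler_words]

def condense(prompt: str) -> str:
    """Stage 2: strip filler phrases and tidy up the leftover spacing."""
    text = prompt
    for phrase in FILLER_PHRASES:
        text = re.sub(rf"\b{re.escape(phrase)}\b", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip(" ,.")

def rewrite(prompt: str) -> str:
    """Stage 3: keep the condensed prompt only if every key element survived;
    otherwise fall back to the user's original wording."""
    condensed = condense(prompt)
    surviving = set(tokenize(condensed))
    if all(element in surviving for element in key_elements(prompt)):
        return condensed
    return prompt

if __name__ == "__main__":
    verbose = ("Please, I would like a really very detailed image of a human "
               "and a robot collaborating at a painting")
    print(rewrite(verbose))
```

Even in this crude form, the third stage shows why the process feels collaborative: the rewrite is only accepted if nothing the user asked for has gone missing.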

Incredible results from the prompt: "image of a human and a robot collaborating at a painting"

PAINTING WITH QUESTIONS

This collaborative approach is both a blessing and a curse. Over the last few months, the established text-to-image services have added a wide range of tools and toys to augment the basic prompting process, and you’ll get none of these with ChatGPT. There’s no inpainting or outpainting, and at the time of writing no ability to use reference images. You’re restricted to telling ChatGPT what you want, and then hoping that it will handle its end of the bargain.

Fortunately, the odds of success seem to be very high. The first thing to note is that you can give extremely short prompts and ChatGPT will flesh them out into fully fledged instructions, adding creative elements in an attempt to generate the best prompt possible. The prompt we provided for the main image in this week’s newsletter, seen above, was simply “image of a human and a robot collaborating at a painting.” ChatGPT converted that to “Photo of a young Asian woman and a sleek modern robot, both holding paintbrushes and working on a large canvas. They are surrounded by paint pots and an easel, with the center of the canvas capturing their collaborative art piece.” DALL-E 3 then took those instructions and ran with them, dropping in a perfectly realized optical illusion of the robot painting itself. Not bad from an eleven-word prompt!

What’s more, when the creative process requires a little more human input, ChatGPT’s collaborative strengths come to the fore. The most impressive use of meta-prompting comes when you don’t get the results you need the first time: the ability to use natural language instructions to clarify your intent - and ChatGPT’s determination to understand it - gives you far greater odds of a successful output than with its brittle and idiosyncratic rivals. That said, it’s an all-or-nothing approach. Midjourney, Firefly and the rest of the gang all offer some degree of inpainting, which permits minor alterations to largely successful images, but if you don’t like a ChatGPT image, there’s no saving it without resorting to other software. Any attempt to recreate the image itself in ChatGPT is doomed to failure; you can only reroll the prompt.

As always, we feature an animal in a pointillist style; this time, a cat.

KEEPING IT GENERAL

One other novel feature of the meta-prompting approach is that it allows OpenAI to protect copyright in an unusual but highly effective - and certainly controversial - way. ChatGPT acts as a filter between the user’s request and the DALL-E 3 model, and its eagerness to translate the user’s intent means it will accept prompts that DALL-E 3 would reject if received directly. The most glaring example is the work of living artists: OpenAI’s website clearly states that “DALL·E 3 is designed to decline requests that ask for an image in the style of a living artist”, but ChatGPT is not. We tested the system with several distinctive contemporary artists, and ChatGPT converted our barefaced requests for copyright infringement without hesitation. For example, we prompted for “an image of a cat in the style of Yayoi Kusama”, which led to the prompt “Oil painting of a cat lounging in a room filled with infinite mirrored reflections. The entire scene is dotted with vibrant, colorful polka dots, reminiscent of avant-garde art from the 1960s. The repetitive patterns create a sense of endlessness and immersion.” ChatGPT refers to the generated outputs as “images inspired by Yayoi Kusama’s polka dot style”, which is both legally careful language and painfully reductive.

Artistic copyright is a thorny issue that is well beyond the scope of this article, but we would like to offer one tip for those seeking to add stylistic variation to their images in a slightly more ethical way. Wikipedia has a list of several hundred art movements, and referencing these will have exactly the same effect as referencing the artists who belonged to them.
