Seven months ago, when Jason Allen used a new kind of generative artificial intelligence (AI) program to win an art competition in Colorado, he was immediately accused of cheating.
At the time, few had used these tools, and few understood what he had done. Text-to-image programs like Midjourney or Stable Diffusion were weeks old. ChatGPT hadn’t been launched.
In the period since, a remarkable change has taken place. The tools have become widely popular, and an industry has sprung up around how to best use them.
Rather than “cheating”, this skill goes by another name: “prompt engineering.”
We want to hear from you:
Already, jobs sites feature ads for prompt engineers, while Australian marketing companies are asking for “know-how on prompting ChatGPT”. Many experts say knowing how to talk to an AI program to get the best results will be a basic skill, like email or googling.
This new-found appreciation of prompt engineering skills is colliding with interpretations of copyright law. US authorities recently rejected Mr Allen’s application to copyright his prize-winning image, saying “it does not contain any human authorship”.
“I thought we were going to be basking in the celebration of AI,” said Mr Allen, based in Colorado, about the creation of this image.
“Instead, I’m kind of like the guy that opened Pandora’s Box.”
So, what is prompt engineering?
And how in-demand will this new skill become?
Learning the words of power
Most AI tools available now either generate text, such as ChatGPT, or images, such as Midjourney.
A few others do text-to-song, text-to-video, and so on.
Bing Chat AI is an exception to this. It is multi-modal, meaning it can generate both text and images from a text prompt.
First up, let’s focus on text-to-image generation.
To understand how to communicate with the AI program, we need to know a little bit about how it works.
Text-to-image AI programs have been trained on a vast number of images scraped from the internet, along with any associated text.
Images scraped from, for instance, a forum for the Unreal gaming engine, will be automatically tagged with “Unreal” (along with other words, such as their caption).
Because many images have this tag (Unreal is a popular engine, so there are lots of images), the word “Unreal” has “weight” within the AI-training dataset.
This weight makes the word a powerful prompt for the AI program, generating a certain predictable aesthetic.
The same is true for any words related to cinematography, art direction, graphic design and art criticism.
However, there’s no definitive guide or manual to what prompts work, or have what effect. Instead, learning how to speak to the AI is a process of trial and error.
In the months since programs like Midjourney became available, communities devoted to unpicking and sharing the secrets of AI prompts have sprung up online.
One of the largest of these is PromptHero. It started in September and has 150,000 users, of which 10,000 are active.
“I spotted this problem that when you first try to get something done with this, the first thing you usually do is pretty bad,” said its co-founder Javier Ramirez, based in Portugal.
“You need to prompt in the right way to get a high-quality output.”
He referred us to one of the community’s members, a man who lives in the US Midwest and prefers to be identified by his PlatformHero profile, JHawkk.
The image above, for instance, was made in Stable Diffusion with a string of 15 prompt phrases, from “analog style” to “cyberpunk”.
JHawkk also used 31 negative prompts, describing what the image should not contain, from “disgusting” to “poorly drawn feet”.
The trick to AI art was knowing the right words, JHawkk said. Like an engineer translating a design into mathematical figures, he converts the discrete aesthetic elements of an image (“ray tracing”, “rim lighting”) into the peculiar language of the model.
“Sometimes you see an image, and so you start to break apart the images into smaller phrases,” he said.
“Essentially, it’s how you might describe that image and especially in a way that the actual model itself could interpret that.”
The making of Théâtre D’opéra Spatial
Jason Allen said he will “never” share the string of prompts that generated his prize-winning image, Théâtre D’opéra Spatial, but was willing to talk about the process of creation.
As with JHawkk, he started by learning the right words.
“I wanted to create a cinematic scene like you might see out of a movie,” he said.
“So I’m going online and finding a keyword dictionary of everything related to cinematography. I’m essentially learning to be a cinematographer.”
He spent “weeks” testing different aesthetic elements in Midjourney, until he was confident he could accurately reproduce the images he imagined.
“We’re looking for results coming out of your mind and seeing it in the final piece.”
Then it was time to introduce the subject.
“I was in a hypnagogic state, I was half asleep, half dreaming these women in Victorian dresses wearing space helmets.”
He wanted to combine the fashion of 19th-century England’s Victorian era and the romance and melodrama of a Star Wars-like space opera.
He then made many variations on this theme by tweaking the prompts.
All up, he spent roughly 80 hours working on his entry.
Mr Allen has hired a lawyer and appealed the US Copyright Office’s decision to not award copyright, arguing that it didn’t understand AI is “just a tool”.
“What are you saying? Is it a person? Because it’s not a person. I’m the person,” he said.
“Stop throwing the users under the bus.
“We all have our creative dreams. We all have our ideas. Without that, the AI is nothing.”
AI-whisperer, salary $US335,000
While prompting for image generators involves mixing together thematic elements in an unpredictable way, prompting for text generation is very different. The focus is more on giving a very clear set of instructions.
In London, the law firm Mishcon de Raya is after a prompt engineer, able to “design and develop high-quality prompts for a range of legal and non-legal use cases”.
Nick West, its chief strategy officer, said one “use case” might be analysing contracts.
“The combination of the breakthroughs of GPT4 and the chat bit of ChatGPT suddenly make this a very attractive space.
“I do believe we will be able to do stuff that we wouldn’t otherwise have been able to do.”
But first, the firm needs a prompt engineer.
“It turns out there are better ways of writing the prompt and less good ways of writing that prompt.
“That’s what prompt engineering is.”
In San Francisco, tech company Anthropic is offering a salary of up to $US335,000 ($500,000) for a prompt engineer.
“It’s a hybrid between programming, instructing, and teaching,” the ad reads.
Matt Bell, a member of Anthropic’s technical staff, said the company had received more than 1,000 responses.
Although coding knowledge and a background in machine learning were helpful, “prompting itself does not require a coding background,” he said.
“In fact, our best prompter is a philosopher.
“Good prompting involves writing extremely clear explanations … and figuring out reasons for misunderstandings and how to prevent them.”
How to speak to ChatGPT or Bing Chat AI
The key to good prompt engineering was breaking a complex task into a set of simple ones, said Lingqiao Liu, a senior lecturer at the University of Adelaide’s Australian Institute for Machine Learning (AIML).
Programs like ChatGPT have a tendency to slip into what he calls an “easy pattern”. If you ask it a simple question (known as a “zero-shot prompt”), it will typically respond with a “vanilla” answer, lacking detail or structure.
Write an introduction to the cassowary.
The cassowary is a large, flightless bird that is native to the tropical forests of Australia and Papua New Guinea. It is considered one of the most dangerous birds in the world, due to its powerful legs, sharp claws, and aggressive behavior. With its striking appearance and unique characteristics, the cassowary has captured the attention of many people around the world. In this article, we will delve into the fascinating world of the cassowary, exploring its physical features, habitat, behavior, diet, and conservation status. Despite its fearsome reputation, the cassowary plays an important role in the ecosystem as a seed disperser and predator, and it is essential to protect this iconic species from threats such as habitat loss and hunting.
But there are ways to encourage it to do more.
“The general philosophy is we try and find a smarter way to let GPT explore more possibilities,” Dr Liu said.
One method is “one-shot prompting”. Give the AI quick explanation of what you need, by giving an example of a question-answer pair. When you ask the next question, the AI will use that answer template.
Question: What is a cassowary?
Quick stat: The cassowary is is considered one of the most dangerous birds in the world
Characteristics: They are large, with powerful legs and sharp claws.
Where do they live: The tropical forests of Australia and Papua New Guinea.
What do they eat: Cassowaries are omnivores, which means they eat both plant and animal matter.
Endangered status: The Southern Cassowary is listed as Endangered by the International Union for Conservation of Nature (IUCN). The Northern Cassowary is listed as Vulnerable.Question:
What is a kookaburra?
Quick stat: The kookaburra is a beloved bird native to Australia, known for its distinctive call that sounds like raucous laughter.
Characteristics: They are large, with powerful beaks and a stocky build.
Where do they live: Throughout Australia, primarily in eucalyptus forests and woodlands, as well as suburban areas where they often nest in tree hollows.
What do they eat: Carnivores, and their diet consists of insects, small mammals, reptiles, and birds.
Endangered status: The kookaburra is listed as a species of least concern by the International Union for Conservation of Nature (IUCN).
Another method is known as role prompting. Provide the AI with a role that matches the form of the answer you want to receive.
Explain the powerful owl to me in the style of David Attenborough.
Ah, the powerful owl, one of Australia’s most magnificent and imposing birds of prey. With its piercing yellow eyes and impressive wingspan, the powerful owl commands respect in the forests where it resides. This remarkable creature can grow up to 65 cm in length and weigh up to 1.4 kg, making it one of the largest owl species in the world.
The powerful owl is found throughout the eastern and southeastern parts of Australia, from northern Queensland to Tasmania. It prefers to make its home in tall eucalyptus forests, where it can easily hunt for its preferred prey of small mammals such as possums and gliders, as well as birds and insects. Despite its size and power, the powerful owl is also known for its gentle nature and the care it shows towards its young.
A third method is introducing a critical agent. You can ask ChatGPT to write, for instance, a story about a robot, and then ask it to criticise and rewrite the story using its own suggestions.
Criticise the following text and convince me why its not good. Let’s think about the problems with the text step by step.
Firstly, the text lacks an engaging opening. The writer could have started with a hook or a question to capture the reader’s attention.
Secondly, the description of the powerful owl’s physical attributes is somewhat bland and generic. While mentioning its size and weight is informative, the writer could have added more interesting details to make the description more engaging and captivating.
Thirdly, the writer could have included more information on the owl’s behaviour and lifestyle, such as their nesting habits or social behaviour. This would give the reader a more complete picture of the powerful owl’s life and make the text more interesting.
Lastly, the discussion of the owl’s endangerment and the need for conservation feels rushed and tacked on. The writer could have expanded on this point to explain the specific reasons for their decline and what steps are being taken to conserve them.
Overall, while the text provides some basic information about the powerful owl, it lacks engaging descriptions and detailed information that would make it more interesting and informative.
Great points. Rewrite the text and improve it based on your criticism.
The powerful owl is an iconic bird of prey in Australia, known for its impressive size and distinctive features. Growing up to 65 cm in length and weighing up to 1.4 kg, it is one of the largest owl species in the world. Its piercing yellow eyes and imposing wingspan command respect in the tall eucalyptus forests where it prefers to make its home. [answer continues]
A final strategy, “chain of thought”, is a way of encouraging the AI to reason through a complex problem by asking it to list the steps it took to reach its answer.
Instead of asking it to “write an introduction to Australian birds”, you’ll get better results by asking it to first “write an outline to an introduction” for this subject. Then, ask it to follow that outline.
Please follow the outline to write an introduction to Australian birds.
Sure, here’s an introduction to Australian birds based on the outline provided:
Australia is home to an incredibly diverse range of bird species, many of which are found nowhere else on Earth. With over 800 species of birds, Australia boasts a birdwatcher’s paradise. From majestic birds of prey like the wedge-tailed eagle and the powerful owl, to the unique flightless emu and the iconic kookaburra, there is no shortage of fascinating birdlife to discover. [answer continues]
Anna Bernstein, a full-time prompt engineer at the AI-powered copywriting service copy.ai, said her main tip was “clarity”.
“If you printed out your input, and handed it to Data from Star Trek, what would he do?” said Ms Bernstein, who’s based in New York.
“You need to pay attention to how you refer to things, label things, and give instructions.”
Another tip was, “Contain the main focus of your task within the verb itself.”
“Instead of ‘Rewrite this to be shorter,’ it’s more effective and consistent to say ‘Condense this.'”
Finally, avoid negative commands. Instead of saying, “Don’t be overly formal”, say “Keep it casual”.
“The AI tends to think of pink elephants when instructed not to,” Ms Bernstein said.
The rest of her know-how had to be kept secret.
“Unfortunately a lot of the discoveries I’ve made are [intellectual property], and can’t be shared.”
Will prompt engineering become a standard skill?
Some argue that prompt engineering will only ever be a specialist skill, and that it will become increasingly obsolete as AI tools grow better at interpreting human queries.
Knowledge of the HTML, for instance, is no longer necessary to make a basic website, like it was in the ’90s.
Others say prompt engineering will become a foundational skill for white-collar workers, and that using these tools will be folded into many day-to-day tasks.
Dr Liu’s position was somewhere between these two predictions: prompt engineering will be a standard skill, he said, but some people will be better at it than others.
For some, it would be a specialist skill, like being able to code.
“The key skillset for a programmer is actually not to write code. The important skill is how to decompose a complicated task into a sequence of several more tasks,” he said.
“This kind of skill will transfer to prompt engineering.”
Mishcon de Raya’s Mr West had a similar take on the future.
“Basic capability in GPT4 or other models is necessary to do the job in the same way that advanced capability to use Word or whatever else is necessary.
“But again, we’re right in the foothills here, no-one really knows.”