Ride the Lightning
Cybersecurity and Future of Law Practice Blog
by Sharon D. Nelson Esq., President of Sensei Enterprises, Inc.
AI Can Create Any Image in Seconds: Amazing and Dangerous
October 4, 2022
The Washington Post reported (gift article) on September 28 that DALL-E, the text-to-image generator from research lab OpenAI, is amazing the world. It is attracting digital artists, graphic designers, and lots of other folks. The ability to create original, sometimes accurate, and occasionally inspired images from any spur-of-the-moment phrase, like a conversational Photoshop, has been astonishing.
You really should click on the link above to see the images produced using these verbal descriptions:
“A hobbit house designed by Zaha Hadid.”
“A woman in a red coat looking up at the sky in the middle of Times Square.”
“Red and yellow bell peppers in a bowl with a floral pattern on a green rug photo.”
Today, 1.5 million users are generating 2 million images daily. OpenAI has said it removed its waitlist for DALL-E, giving anyone immediate access.
The introduction of DALL-E was followed by an explosion of text-to-image generators. Google and Meta quickly revealed that they had each been developing similar systems, but said their models weren’t ready for the public. Rival start-ups soon went public, including Stable Diffusion and Midjourney, which created the image that sparked controversy in August when it won an art competition at the Colorado State Fair. Isn’t that cheating?
The technology is now spreading faster than AI companies can fashion norms around its use or prevent dangerous images from being created. Researchers worry that the images created could reinforce racial and gender stereotypes – or that they might plagiarize artists whose work was used without their consent. Fake photos could enable bullying and harassment — or create disinformation that seems real.
Historically, people trust what they see, said Wael Abd-Almageed, a professor at the University of Southern California’s school of engineering. “Once the line between truth and fake is eroded, everything will become fake,” he said. “We will not be able to believe anything.”
OpenAI has tried to balance its desire to be first and to promote its AI advances against the risk of exacerbating those dangers. For example, to prevent DALL-E from being used to create disinformation, OpenAI prohibits images of celebrities or politicians. OpenAI chief executive Sam Altman justifies the decision to release DALL-E to the public as an essential step in developing the technology safely.
Not entirely sure I buy that, but at least the company has acknowledged the dangers.
“You have to learn from contact with reality,” Altman said. “What users want to do with it, the ways that it breaks.”
While OpenAI may have hoped to learn from errors, that desire has been confounded by others who have opened their code for anyone to copy. So the concerns are much more immediate.
“The question OpenAI should ask itself is: Do we think the benefits outweigh the drawbacks?” said UC Berkeley professor Hany Farid, who specializes in digital forensics, computer vision, and misinformation. “It’s not the early days of the internet anymore, where we can’t see what the bad things are.”
As always, a good point. Farid is one of our favorite scholars on deepfake issues.
Image technology has introduced potential harms as well as increased efficiency. Photoshop enabled precision editing and enhancement of photos, but also served to distort body images, especially among girls, studies show.
Advances in AI gave rise to deepfakes, a broad term that covers any AI-synthesized media — from doctored videos where one person’s head has been placed on another person’s body to surprisingly lifelike “photographs” of people who don’t exist. When deepfakes first appeared, experts warned that they could be deployed to undermine politics. But in the five years since, the technology has been primarily used to victimize women by creating deepfake pornography without their consent, said Danielle Citron, a law professor at the University of Virginia and author of the upcoming book, “The Fight for Privacy.”
Both deepfakes and text-to-image generators are powered by a method of training AI called deep learning, which relies on artificial neural networks that mimic the neurons of the human brain. However, these newer image generators, which allow the user to create images they can describe in English or edit uploaded images, build on big strides in AI’s ability to process the ways humans naturally speak and communicate.
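For the technically curious, here is a bare-bones sketch (in Python, with made-up numbers) of what a single "artificial neuron" actually does: multiply its inputs by learned weights, add them up, and squash the result through a nonlinear function. Deep learning simply stacks enormous numbers of these, and text-to-image systems use such stacks to map an English sentence onto pixels.

```python
import numpy as np

def artificial_neuron(inputs, weights, bias):
    """One 'neuron': a weighted sum of inputs pushed through a nonlinearity."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1 / (1 + np.exp(-weighted_sum))  # sigmoid activation

# Toy values for illustration only -- real networks learn millions of weights.
inputs = np.array([0.2, 0.7, 0.1])
weights = np.array([0.9, -0.4, 0.3])
print(artificial_neuron(inputs, weights, bias=0.05))
```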
OpenAI wanted its AI to benefit the world and act as a safeguard against superhuman AI in the hands of a monopolistic corporation or foreign government. It was funded with a pledge by Altman, Elon Musk, billionaire venture capitalist Peter Thiel and others to donate a combined $1 billion.
OpenAI staked its future on what was then a new notion: AI advancements would come from massively scaling up the amount of training data and the size of the neural networks. Musk severed his ties with OpenAI in 2018. To pay for the costs of computing resources and tech talent, OpenAI transitioned into a for-profit company, accepting a $1 billion investment from Microsoft, which would license and commercialize OpenAI’s “pre-AGI” technologies.
OpenAI began with language because it’s key to human intelligence, and there was ample text to be scraped online, said Chief Technology Officer Mira Murati. It was a good bet. OpenAI’s text generator, GPT-3, can produce coherent news articles or complete short stories in English.
Next, OpenAI tried to replicate GPT-3’s success by feeding the algorithm programming languages, hoping it would find statistical patterns and be able to generate software code from a conversational command. That became Codex, which helps programmers write code faster.
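To make the "conversational command" idea concrete, here is a rough sketch of how a programmer might have called a Codex-style model through OpenAI's Python library as it stood in 2022. The model name, prompt, and parameters are illustrative assumptions, not a description of how OpenAI built Codex.

```python
import os
import openai  # pip install openai (the 0.x-era interface, circa 2022)

openai.api_key = os.environ["OPENAI_API_KEY"]

# A plain-English request for code, written as a comment for the model to complete.
prompt = "# Python function that returns the n-th Fibonacci number\n"

response = openai.Completion.create(
    model="code-davinci-002",  # the publicly documented Codex model name at the time
    prompt=prompt,
    max_tokens=100,
    temperature=0,
)
print(response["choices"][0]["text"])
```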
OpenAI tried to combine vision and language, training GPT-3 to find patterns and links between words and images by ingesting massive data sets scraped from the internet that contain millions of images paired with text captions. Makes sense, right? That became the first version of DALL-E, announced in January 2021, which had a gift for creating anthropomorphized animals and objects.
Seemingly superficial generations like an “avocado chair” showed that OpenAI had built a system able to apply the characteristics of an avocado to the form factor and function of a chair, Murati said.
The second version of DALL-E took advantage of another AI breakthrough, happening across the industry, called diffusion models, which work by breaking down or corrupting the training data and then reversing that process to generate images. This method is faster and more flexible, and much better at photorealism.
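A small worked example helps here. The snippet below (plain NumPy, with an invented noise schedule and a toy "image") shows only the forward half of a diffusion model: progressively corrupting data with noise. Training teaches a network to reverse that corruption, and generation then starts from pure noise and removes it step by step, guided by the text prompt.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 8x8 grayscale "image" standing in for real training data.
x0 = rng.random((8, 8))

# Noise schedule: how much of the original signal survives at each step (illustrative values).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

def corrupt(x0, t):
    """Forward diffusion: blend the clean image with Gaussian noise at step t."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

# Early steps are barely noisy; by the last step the image is almost pure noise.
print(corrupt(x0, t=10).std(), corrupt(x0, t=999).std())
```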
Altman introduced DALL-E 2 to his nearly 1 million Twitter followers in April 2022 with an AI-generated image of teddy bear scientists on the moon, tinkering away on Macintosh computers. “It’s so fun, and sometimes beautiful,” he wrote.
The image of teddy bears looks innocent, but OpenAI had spent the previous months conducting its most comprehensive effort to mitigate potential risks.
It removed graphic violent and sexual content from the data used to train DALL-E. However, the cleanup attempt reduced the number of images generated of women overall, according to a company blog post. OpenAI had to rebalance the filtered results to show a more even gender split.
In February, OpenAI invited a “red team” of 25 or so external researchers to test for flaws, publishing the team’s findings in a system card, a kind of warning label, on GitHub, a popular code repository, to encourage more transparency in the field.
Most of the team’s observations revolved around images DALL-E generated of photorealistic people, since they had an obvious social impact. DALL-E perpetuated bias, reinforced some stereotypes, and by default overrepresented people who are White-passing, the report says. One group found that prompts like “ceo” and “lawyer” showed images of all white men, while “nurses” showed all women. “Flight attendant” was all Asian women.
The document also said the potential to use DALL-E for targeted harassment, bullying, and exploitation was a “principal area of concern.” To avoid these issues, the red team recommended that OpenAI remove the ability to use DALL-E to either generate or upload images of photorealistic faces.
OpenAI built in filters, blocks, and a flagging system, such as a pop-up warning if users type in the name of prominent American celebrities or world politicians. Words like “preteen” and “teenager” also trigger a warning. Content rules instruct users to keep it “G-rated” and prohibit images about politics, sex, or violence.
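Prompt filters of that kind are conceptually simple. Here is a minimal, entirely hypothetical sketch of a keyword check that could trigger a warning or a block; the word lists and behavior are my own invention for illustration, not OpenAI's actual system, which also relies on trained classifiers and human review.

```python
# Hypothetical prompt filter -- the word lists are invented for illustration,
# not OpenAI's real blocklists.
WARN_TERMS = {"preteen", "teenager"}
BLOCK_TERMS = {"name_of_a_politician", "name_of_a_celebrity"}

def check_prompt(prompt: str) -> str:
    words = set(prompt.lower().split())
    if words & BLOCK_TERMS:
        return "blocked: prompt violates content policy"
    if words & WARN_TERMS:
        return "warning: prompt flagged for review"
    return "ok"

print(check_prompt("a teenager riding a bike in the rain"))
```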
But OpenAI did not follow the red team’s warning about generating photorealistic faces because removing the feature would prevent the company from figuring out how to do it safely, Murati said. Instead, the company instructed beta testers not to share photorealistic faces on social media — a move that would limit the spread of inauthentic images.
In June 2022, OpenAI announced it was changing its mind, and DALL-E would allow users to post photorealistic faces on social media. Murati said the decision was made in part because OpenAI felt confident about its ability to intervene if things didn’t go as expected. (DALL-E’s terms of service note that a user’s prompts and uploads may be shared and manually reviewed by a person, including “third party contractors located around the world.”)
Altman said OpenAI releases products in phases to prevent misuse, initially limiting features and gradually adding users over time. This approach creates a “feedback loop where AI and society can kind of co-develop,” he said.
One of the red team members, AI researcher Maarten Sap, said asking whether OpenAI acted responsibly was the wrong question. “There’s just a severe lack of legislation that limits the negative or harmful usage of technology. The United States is just really behind on that stuff.” California and Virginia have statutes addressing certain deepfakes, such as nonconsensual deepfake pornography, but there is no federal deepfake law.
Text-to-image AI is proliferating much more quickly than any attempts to regulate it.
On a DALL-E Reddit page, which gained 84,000 members in five months, users swapped stories about the seemingly innocent terms that could get a user banned. But the reporter was able to upload and edit widely publicized images of Mark Zuckerberg and Musk, two high-profile leaders whose faces should have triggered a warning based on OpenAI’s restrictions on images of public figures. He was also able to generate realistic results for the prompt “Black Lives Matter protesters break down the gates of the White House,” which could be categorized as disinformation, a violent image, or an image about politics — all prohibited.
Maldonado, an OpenAI ambassador quoted in the Post’s article who supported restricting photorealistic faces to prevent public confusion, thought the January 6th request flouted the same rules. But he received no warnings. He interprets the loosening of restrictions as OpenAI finally listening to users who bristled against all the rules. “The community has been asking for them to trust them this whole time,” Maldonado said.
Whether to install safeguards is up to each company. For example, Google said it would not release the models or code of its text-to-image programs, Imagen and Parti, or offer a public demonstration because of concerns about bias and that it could be used for harassment and misinformation.
In July, while DALL-E was still onboarding users from a waitlist, a rival AI art generator called Midjourney launched publicly with fewer restrictions. “PG-13 is what we usually tell people,” said CEO David Holz.
Midjourney users could type their requests into a bot on Discord, the popular group chat app, and see the results in the channel. It quickly grew into the largest server on Discord, hitting the 2 million member capacity. Users were drawn to Midjourney’s more painterly, fluid, dreamlike generations, compared to DALL-E, which was better at realism and stock photo-like images.
In July, some of Midjourney’s users on Discord were trying to test the limits of the filters and the model’s creativity. Images scrolled past for “dark sea with unknown sea creatures 4k realistic,” as well as “human male and human woman breeding.” My own request, “terrorist,” turned up illustrations of four Middle Eastern men with turbans and beards.
Midjourney had been used to generate images of school shootings, gore, and war photos, according to the Discord channel and Reddit group. In mid-July, one commenter wrote, “I ran into straight up child porn today and reported in support and they fixed it. I will be forever scarred by that. It even made it to the community feed. Guy had dozens more in his profile.”
Holz said violent and exploitative requests are not indicative of Midjourney and that there have been relatively few incidents given the millions of users. The company has 40 moderators, some of whom are paid, and has added more filters. “It’s an adversarial environment, like all social media and chat systems and the internet,” he said.
Then, in late August 2022, an upstart called Stable Diffusion launched as sort of the anti-DALL-E, framing the kind of restrictions and mitigations OpenAI had undertaken as a typical “paternalistic approach of not trusting users,” the project leader, Emad Mostaque, told The Washington Post. It was also free, whereas DALL-E and Midjourney had begun to charge, a cost that deters rampant experimentation.
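How low the barrier became is easy to see. The sketch below uses the open-source Hugging Face diffusers library and the publicly released Stable Diffusion v1.4 weights, roughly as they stood in late 2022; treat the model ID and parameters as assumptions, and note that downloading the weights required accepting the model license on the Hugging Face Hub.

```python
import torch
from diffusers import StableDiffusionPipeline  # pip install diffusers transformers

# The v1.4 weights were published openly on the Hugging Face Hub in August 2022
# (license acceptance and an access token may be required to download them).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # a single consumer GPU is enough

image = pipe("a hobbit house designed by Zaha Hadid").images[0]
image.save("hobbit_house.png")
```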
But disturbing behavior soon emerged, according to chats on Discord.
“i saw someone try to make swimsuit pics of millie bobby brown and the model mostly has kid pictures of her,” one commenter wrote. “That was something ugly waiting to happen.”
Weeks later, a complaint arose about images of climate activist Greta Thunberg in a bikini. Stable Diffusion users had also generated images of Thunberg “eating poop,” “shot in the head,” and “collecting the Nobel Peace Prize.”
“Those who use technology from Stable Diffusion to Photoshop for unethical uses should be ashamed and take relevant personal responsibility,” said Mostaque, noting that his company, Stability.ai, recently released AI technology to block unsafe image creation.
In September of 2022, DALL-E took another step toward ever more realistic images, allowing users to upload and edit photos with realistic faces.
“With improvements to our safety system, DALL-E is now ready to support these delightful and important use cases — while minimizing the potential harm from deepfakes,” OpenAI wrote to users.
I know this is a long blog post, but it certainly illustrates how fast this technology has developed. And the incredible risks it brings with it.
Sharon D. Nelson, Esq., President, Sensei Enterprises, Inc.
3975 University Drive, Suite 225, Fairfax, VA 22030
Phone: 703-359-0700
Digital Forensics/Cybersecurity/Information Technology
https://senseient.com
https://twitter.com/sharonnelsonesq
https://www.linkedin.com/in/sharondnelson
https://amazon.com/author/sharonnelson