'The Cat Is Out of the Bag': As DALL-E Becomes Public, the Possibilities — and Pitfalls — of AI Imagery

Sep 26, 2022

Updated Oct 15, 2024


An illustration of three purple kittens wearing snorkeling masks. The illustration is in a style reminiscent of Pablo Picasso's artwork.

"Purple kittens snorkeling in the style of Picasso," an image generated by OpenAI's DALL-E 2. (Image via OpenAI)

Update, 11:50 a.m. Wednesday: DALL-E was previously accessible to the public only by invite, but the AI generator has now been made available to all without a waitlist.

Original post, 2:10 p.m. Monday: Think of something, type it in, and wait for it …

Imagine an image generated from your thoughts and words in just minutes, all ready for the world to see. That’s what OpenAI’s DALL-E, a neural network designed to convert text descriptions into images, and its newest version, DALL-E 2, are for.

This artificial intelligence-powered image generator seems futuristic and fun, but is it necessary? More importantly, is it safe? And how can technologies like these handle online disinformation, bias, violent content and other harms?

In a recent episode of KQED Forum, host Mina Kim led a conversation on the possibilities and pitfalls of this technology with:

Lama Ahmad, policy researcher from DALL-E creator OpenAI;
Hany Farid, deepfake expert at UC Berkeley.

The following interview has been edited for length and clarity.

MINA KIM: What is DALL-E, and how does it work?

LAMA AHMAD: DALL-E is an AI that basically does text-to-image. Someone can type in a prompt, and the output could be anything you can imagine. It learned from lots and lots of pairs of images and captions, and it is able to construct those concepts entirely from scratch. Even though it learned from existing images, words and language, it can construct what a new image would look like.

You can also use inpainting to select a region of an image and modify it, also using words. And you can create variations on existing images, or on images generated by the software itself.

The images can look hyperrealistic, fantastical or abstract, if that’s what you would like. It’s all in the words that you use and how you craft your prompts.

You can also upload images to DALL-E. That ties into the inpainting functionality, where you can edit regions of an image. That being said, you cannot upload images of people, and that was a concerted effort on the part of OpenAI to combat misinformation and disinformation related to people.

It was really important to OpenAI that we are putting out the AI that is most helpful, and most beneficial to people. And at the end of the day, uploading images of people was not something we wanted to allow.

DALL-E doesn’t allow uploading images of people. But how could DALL-E still be used in a malicious way?

LAMA AHMAD: Before we even deployed DALL-E, we worked with researchers at lots of universities, academic institutions, civil society organizations that have been thinking deeply about these problems. Not just disinformation, but things like bias, things like representations of harmful imagery around children, violent content — a whole host of things.

And so we really thought about this, from training the model and deciding what sorts of images to include, all the way up to the development of the system and how it would be presented to the world.

Including external researchers was a really important part of that, because it’s not just people at OpenAI who are thinking about it; it’s the minds of lots of great people that helped us develop our mitigations and make sure they were working.

What benefit does DALL-E provide?

LAMA AHMAD: I think some of the exciting things that we’ve seen happen with DALL-E are changes in how artists are doing their work, a new genre of art, and the emerging accessibility of creating images. Not everyone has the skills to do Photoshop. I certainly don’t.

Being able to make those kinds of things accessible to a broad range of people is something that we hope will be unlocked as a benefit of models like DALL-E. But it’s certainly something that we think about, like, how do you deploy? Or should you even deploy at all?

Are the safeguards that DALL-E puts in place really good?

HANY FARID: They’re not perfect.

Shortly after DALL-E was released, another image-synthesis system called Stable Diffusion was released, with zero guardrails. You can put in anybody’s name and create violent or sexually explicit material: no rules. And it’s completely open source. It took us all of an hour to get it up and running on one of our computers, and all of the safeguards that OpenAI very thoughtfully put in, in some ways, don’t matter anymore.

We’ve seen three very different approaches to this technology. We’ve been talking about OpenAI’s thoughtful methods with safeguards; I’ve mentioned Stable Diffusion with zero safeguards; and Google has a version of this that they refuse to release. I couldn’t even get access as a forensic scientist who just wanted to do forensic analysis. Google’s rationale was that they are concerned about misuse, so they are not releasing it.

So there are three very different approaches. But of course, it’s the lowest common denominator that matters, and the lowest common denominator is no rules.

Can you talk a little bit about regulating tools like this?

HANY FARID: What we spend most of our time doing here in my lab is, somebody sends me an image or a video, and we try to determine if it’s real or fake.

Another way to think about this problem is that you can authenticate an image or a video at the point of recording. There’s a really nice initiative called the C2PA, the Coalition for Content Provenance and Authenticity, which is building a system so that when I take an image, a video or an audio recording on my device, it will authenticate the date and time, the location, who took it, and that all the pixels are authentic. Then, when that piece of content makes its way onto the internet, I can trust that it is authentic. I like that technology a lot because it’s here, it works, and it will scale. But that doesn’t mean we can simply ignore all the other problems.

I think it’s part of a larger solution that includes education, government regulation and technology, and also more corporate responsibility. OpenAI has done a good job, but I don’t think you can say that all companies have shown the same thoughtfulness that we have seen coming out of OpenAI.

What are the dangers of tools like DALL-E in cases like predictive policing, facial recognition and surveillance regimes?

HANY FARID: I think a legitimate point to make is: How are these systems being trained? And is there bias in the data?

For example, in policing, historical data is biased against people of color. If you simply train an AI algorithm on that data, you are going to repeat history. So I think it’s a legitimate and reasonable concern to have, from both a bias perspective and a fairness perspective.

California has two laws about deepfakes. One allows anyone whose images are nonconsensually used in pornography to sue, and the other prohibits the malicious use of deepfakes depicting candidates for office. What do you think of those?

HANY FARID: There are some problems with the laws. You have to prove intent to harm. It’s not simply illegal to create it; you have to prove that the creator intended to harm somebody. And proving intent is very, very difficult.

Also, these laws only work within the borders of California. What happens when this content is coming from somewhere else, like Romania, Russia or China? We have no ability to litigate that here. Once those images are on the internet, they don’t come down. The harm is done, and maybe you’ll get some retribution a year or 10 years from now. But it doesn’t really deal with the problem.

I think the deepfake law on politics is also problematic, because it’s very narrowly tailored. There are all kinds of things you have to show, and again, it only applies within California’s borders. I think the laws were well-intentioned, but I think they are largely impotent.

You’ve talked about “the liar’s dividend.” Can you explain what that is?

HANY FARID: The thing I really worry about is that if we enter this world where any image, any audio, any video, any tweet, any article can be fake, well, then — nothing has to be real. We can simply dismiss things that we don’t like or agree with. And now we are living in a completely alternate universe relative to those around us.

You see this playing out on a regular basis. For example, in 2016 Trump gets himself in trouble for saying awful things about women on the “Access Hollywood” tape. A year and a half later, deepfakes are on the horizon; they have come into our vocabulary. He’s asked about the audio, and he says it’s fake.

Now we are living in alternate realities, the so-called “liar’s dividend.” And now we have a double-fisted weapon. I can create fake content and I can deny reality by using exactly the same specter of that technology.

What about democratizing access to knowledge?

HANY FARID: I think the paradox of the internet is that the idea was to democratize access to knowledge and information and to wrest away from a handful of publishers the ability to publish. That was a big leap, and the thinking was that the world would be better off.

But of course, what we’ve done is we’ve just traded off who controls information. It went from CBS, ABC, NBC Nightly News to Facebook. And if you asked me which one I would prefer, I would rather go back to the nightly news. Why? Because they have editorial standards.

There are now five tech companies that control the internet, and I think that has been to our detriment, because they have not brought in the editorial and journalistic standards that the mainstream media has, even though those standards are not perfect.

In which ways do you see this technology being used for good?

HANY FARID: I’m not 100% convinced that we should have done this stuff. And I’m not referring specifically to DALL-E. I’m referring to the general world of synthetic media.

Going back decades now, the computer-graphics community has developed technology that allows for really cool special effects for Hollywood studios. And now we’ve just democratized access to that.

I think it’s fun and creative. But are those benefits outweighed by the downsides, and if so, by how much? And if they are, then what do we do about it?

I admit that I have a biased worldview because I come at it from the other side. I see more on the downside than the upside. That’s not to say that we can’t mitigate the harm. But the reality today is the cat is out of the bag. We’re not putting this technology back in, and we’re going to have to start to get more serious about dealing with and mitigating the harm that is coming from these types of technologies.
