On May 13, OpenAI unveiled a new GPT-4o AI model to power its ChatGPT chatbot. The newest version is wildly capable and much more humanlike, with the ability to solve equations and tell bedtime stories, and OpenAI claims it can identify emotions from facial expressions.
OpenAI has made a big deal about wanting to make its tools available to everyone for free. But experts say GPT-4o’s turbocharged capabilities increase the amount of information that can potentially be swept up by OpenAI, heightening concerns about privacy.
The firm has a spotty track record in the area. After it launched in 2020, a technical paper revealed how millions of pages scraped from Reddit posts, books, and the broader web were used to create the generative AI text system, including personal data you share about yourself online. This led to ChatGPT falling foul of data protection regulators in Italy, resulting in a temporary ban in the country last year.
Not long after the launch of GPT-4o, an initial demo of the macOS ChatGPT desktop app indicated the chatbot was potentially able to access a user’s screen. And in July, the same app came under fire again when it emerged that a worrying security issue made it easy to find chats stored on your computer and read them in plain text.
OpenAI quickly issued an update that encrypts the chats, but with this level of scrutiny on the company and GPT-4o, it’s easy to see why people are worried about privacy. How private is the newest iteration of ChatGPT? Is it worse than previous versions? And is there any way to lock it down?
On the face of it, OpenAI’s privacy policy does show a large amount of data collection, including personal information, usage data, and content provided when you use it. ChatGPT uses the data you share to train its models, unless you turn it off in the settings or use the enterprise version.
OpenAI is quick to say in its privacy policy that individual data is “anonymized,” but the approach on the whole seems to be “take everything now and sort it out later,” says Angus Allan, senior product manager at digital consultancy CreateFuture, which advises firms on ways to use AI and data analytics. “Their privacy policy explicitly states they collect all user input and reserve the right to train their models on this.”
The catch-all “user content” clause likely covers images and voice data too, says Allan. “It’s a data hoover on steroids, and it’s all there in black and white. The policy hasn’t changed significantly with GPT-4o, but given its expanded capabilities, the scope of what constitutes ‘user content’ has broadened dramatically.”
OpenAI’s privacy policies are clear that ChatGPT does not have access to any data on your device beyond what you explicitly input into the chat. However, by default, ChatGPT does collect lots of other data about you, says Jules Love, founder at Spark, a consultancy that advises companies on how to build AI tools including ChatGPT into their workflows while addressing data privacy. “It uses everything from prompts and responses to email addresses, phone numbers, geolocation data, network activity, and what device you’re using.”
OpenAI says this data is used to train the AI model and improve its responses, but the terms allow the firm to share your personal information with affiliates, vendors, service providers, and law enforcement. “So it’s hard to know where your data will end up,” says Love.
OpenAI’s privacy policy states that ChatGPT collects the information you provide when you create an account or communicate with the business, says Bharath Thota, a data scientist and chief solutions officer of analytics practice at management consulting firm Kearney, which advises firms on managing and using AI data to power new revenue streams.
Part of this data collection includes full names, account credentials, payment card information, and transaction history, he says. “Personal information can also be stored, particularly if images are uploaded as part of prompts. Likewise, if a user decides to connect with any of the company’s social media pages like Facebook, LinkedIn, or Instagram, personal information may be collected if they’ve shared their contact details.”
OpenAI uses consumer data like other big tech and social media companies, but it does not sell advertising. Instead, it provides tools—an important difference, says Jeff Schwartzentruber, senior machine learning scientist at security firm eSentire. “The user input data is not used directly as a commodity. Instead, it is used to improve the services that benefit the user—but it also increases the value of OpenAI’s intellectual property.”
Since its launch in 2020 and amid criticism and privacy scandals, OpenAI has introduced tools and controls you can use to lock down your data. OpenAI says it is “committed to protecting people’s privacy.”
For ChatGPT specifically, OpenAI says it understands users may not want their information used to improve its models and therefore provides ways for them to manage their data. “ChatGPT Free and Plus users can easily control whether they contribute to future model improvements in their settings,” the firm writes on its website, adding that it does not train on API, ChatGPT Enterprise, and ChatGPT Team customer data by default.
“We provide ChatGPT users with a number of privacy controls, including giving them an easy way to opt out of training our AI models and a temporary chat mode that automatically deletes chats on a regular basis,” OpenAI spokesperson Taya Christianson tells WIRED.
The firm says it does not seek out personal information to train its models, and it does not use public information on the internet to build profiles about people, advertise to them, target them, or sell user data.
OpenAI does not train its models on audio clips from voice chats unless you choose to share your audio “to improve voice chats for everyone,” the Voice Chat FAQ on OpenAI’s website notes.
“If you share your audio with us, then we may use audio from your voice chats to train our models,” OpenAI says in its Voice Chat FAQ. Meanwhile, transcribed chats may be used to train models depending on your choices and plan.
In recent years, OpenAI has enhanced transparency about data collection and usage “to a degree,” providing users with clear options to manage their privacy settings, says Rob Cobley, commercial partner at law firm Harper James, which offers legal advice on data protection matters. “Users can access, update, and delete their personal information, ensuring they have control over data.”
The easiest way to keep your data private is to go into your personal settings and turn off data collection, says Love.
Allan recommends that “almost everyone” take a few minutes to opt out of model training as soon as they can. “This doesn’t remove your content from their platform, but it means it can’t be used to train future models, where there is a theoretical risk of your data leaking.”
To do this, go to Settings, Data Controls, and turn off “Improve the model for everyone.”
You can also stop OpenAI collecting your data by using a “Temporary Chat” each time you use it. Click on ChatGPT in the top left, and toggle on Temporary Chat from the bottom of the list.
However, limiting data collection does reduce functionality. “It will not remember anything from your previous chats, so your answers will typically be more generic and less nuanced,” says Love.
From the ChatGPT web interface, users can delete their chat history, add personalized instructions to help support privacy, control any shared links, make a request to export data, and delete the account, says Schwartzentruber. “For extra security, you can also add multifactor authentication and the ability to log out of all devices.”
There are other things to look out for to safeguard your privacy when using ChatGPT. For example, you can unknowingly make sensitive data available through the use of Custom GPTs.
You can also manage your interaction data by being selective about the content you share with ChatGPT-4o in the first place. The challenge is the trade-off between maintaining privacy and optimizing your experience, says Schwartzentruber. “If you restrict data sharing when using ChatGPT, key aspects of your interaction with the AI may be affected. That can mean reduced personalization and less accuracy and relevance, as the AI draws on a more limited set of generic algorithms.”
Update 7/31/24 8:40am ET: This story has been updated to clarify that GPT-4o is the underlying AI model that powers ChatGPT.