You can tell a lot about someone from their eyes. They can indicate how tired you are, the type of mood you’re in, and potentially provide clues about health problems. But your eyes could also leak more sensitive information: your passwords, PINs, and the messages you type.
Today, a team of six computer scientists is unveiling a new attack against Apple’s Vision Pro mixed reality headset that decodes what users type on its virtual keyboard from eye-tracking data alone. The method, named GAZEploit and reported exclusively by WIRED, allowed the researchers to accurately reconstruct passwords, PINs, and messages by watching users’ eyes.
“By observing the direction of eye movements, the attacker can determine which key is being pressed by the user,” explains Hanqiu Wang, one of the lead researchers on the work. The team correctly identified the characters in passwords 77 percent of the time within five attempts, and in typed messages 92 percent of the time.
It’s important to note that the researchers did not have access to what was displayed inside Apple’s headset. Instead, they analyzed the eye movements of the virtual avatar that the Vision Pro generates, a feature available in apps such as Zoom, Teams, Slack, Reddit, Tinder, Twitter, Skype, and FaceTime.
The researchers reported the security flaw to Apple in April, and a fix preventing the leak was released at the end of July. According to the researchers, it is the first attack to exploit people’s “gaze” data in this way. The findings underscore how biometric data, information and measurements of the body, can inadvertently reveal private details and feed the expanding field of surveillance.
With the Vision Pro, your eyes act as a mouse. A virtual keyboard appears in front of you, which you can move and resize as needed. To type, you look at the letter you want and “click” by tapping your fingers together.
While you use the headset, what you do stays private. But when you join a virtual meeting or a live stream, the Vision Pro creates a Persona, a kind of ghostly 3D avatar generated from scans of your face.
In their preprint paper, the researchers note that such avatars can inadvertently expose sensitive facial biometrics, including eye movements, during video calls. The study by Wang and his colleagues focused on extracting two biometric signals from recordings of Personas: eye aspect ratio (EAR) and eye gaze direction. The team also included Siqi Dai, Max Panoff, and Shuo Wang of the University of Florida; Haoqi Shan of blockchain security firm CertiK; and Zihao Zhan of Texas Tech University.
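The eye aspect ratio is a standard signal in blink-detection research: it compares the vertical opening of the eye to its width and collapses toward zero when the eye closes. Below is a minimal sketch of how EAR is commonly computed from six eye landmarks; the landmark ordering and the blink threshold are illustrative assumptions, not details taken from the GAZEploit paper.

```python
import numpy as np

def eye_aspect_ratio(landmarks: np.ndarray) -> float:
    """Compute the eye aspect ratio (EAR) from six 2D eye landmarks.

    Standard formulation from blink-detection literature: landmarks[0]
    and landmarks[3] are the horizontal eye corners; the remaining pairs
    (landmarks[1], landmarks[5]) and (landmarks[2], landmarks[4]) span
    the eye vertically. EAR falls toward zero as the eye closes, so
    thresholding it over time reveals blinks.
    """
    p1, p2, p3, p4, p5, p6 = landmarks
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

# Example: EAR values sampled from an avatar video feed; frames where
# EAR drops below a (hypothetical) threshold count as blink frames.
ear_series = np.array([0.31, 0.30, 0.12, 0.09, 0.28, 0.32])
BLINK_THRESHOLD = 0.2  # assumed value, for illustration only
print(ear_series < BLINK_THRESHOLD)  # [False False  True  True False False]
```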
The GAZEploit attack consists of two stages, explains Zhan, one of the lead researchers. First, the team developed a way to detect when someone wearing the Vision Pro is typing by analyzing the 3D avatar they are sharing. For this they trained a recurrent neural network, a type of deep learning model, on recordings of 30 people completing a variety of typing tasks.
When someone types on the Vision Pro, their gaze fixates on the key they are about to press before flicking quickly to the next one, the researchers found. “Our gaze demonstrates certain predictable patterns while typing,” notes Zhan.
According to Wang, these eye movement patterns differ markedly from those seen during other headset activities, such as browsing the web or watching videos. “The frequency of eye blinks lessens when engaged in gaze typing tasks due to increased concentration,” Wang remarks. Put simply, eyes moving across a QWERTY keyboard trace recognizable patterns.
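As a rough illustration of this first stage, the sketch below shows the shape of a recurrent classifier that could consume per-frame gaze features extracted from an avatar video and output a typing-versus-not-typing probability. The feature set, window length, and network size are assumptions; this is not the authors’ actual architecture.

```python
import torch
import torch.nn as nn

class TypingDetector(nn.Module):
    """Sketch of a recurrent classifier that flags gaze-typing episodes.

    Input: a window of per-frame gaze features, e.g. gaze x/y position,
    gaze velocity, and EAR. These features are an assumption based on
    the article's description of the attack, not the paper's design.
    """
    def __init__(self, n_features: int = 4, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # typing vs. not-typing logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features); classify from the final hidden state
        _, h = self.rnn(x)
        return torch.sigmoid(self.head(h[-1]))

# A hypothetical 2-second window at 30 fps: 60 frames of 4 gaze features.
window = torch.randn(1, 60, 4)
prob_typing = TypingDetector()(window)
```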
The second stage of the attack uses geometric calculations to work out where the virtual keyboard sits and how large it is, explains Zhan. “The primary premise is that as long as we can gather sufficient gaze data to accurately map out the keyboard layout, we can then track all keystrokes thereafter,” he states.
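In spirit, that geometric stage might look like the sketch below: estimate the keyboard’s position and scale from the spread of observed gaze fixations, lay a QWERTY grid over that region, and map each subsequent fixation to its nearest keys. The layout constants and the bounding-box heuristic are illustrative assumptions; the paper’s actual geometric model is more involved.

```python
import numpy as np

# Hypothetical QWERTY layout on a grid: one string per row, with an
# assumed horizontal stagger for the middle and bottom rows.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]

def estimate_layout(fixations: np.ndarray):
    """Rough layout estimate: fit the keyboard bounding box to the spread
    of fixations (assumes the user eventually looks near the edges)."""
    lo, hi = fixations.min(axis=0), fixations.max(axis=0)
    key_w = (hi[0] - lo[0]) / 9.0  # 10 keys across the top row
    key_h = (hi[1] - lo[1]) / 2.0  # 3 rows of keys
    return lo, key_w, key_h

def key_centers(origin, key_w, key_h):
    """Place a center point for every key on the estimated grid."""
    centers = {}
    for r, row in enumerate(QWERTY_ROWS):
        for c, ch in enumerate(row):
            x = origin[0] + (c + ROW_OFFSETS[r]) * key_w
            y = origin[1] + r * key_h
            centers[ch] = np.array([x, y])
    return centers

def decode_fixations(fixations, origin, key_w, key_h, k=5):
    """Map each gaze fixation to its k nearest keys (ranked guesses)."""
    keys, pts = zip(*key_centers(origin, key_w, key_h).items())
    pts = np.stack(pts)
    guesses = []
    for f in fixations:
        order = np.argsort(np.linalg.norm(pts - f, axis=1))
        guesses.append([keys[i] for i in order[:k]])
    return guesses
```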
Combining these two elements, they were able to predict the keys someone was likely to be typing. In a series of lab tests, they had no prior knowledge of the victim’s typing habits, speed, or keyboard location. Even so, the researchers could predict the correct letters typed, within a maximum of five guesses, with 92.1 percent accuracy for messages, 77 percent for passwords, 73 percent for PINs, and 86.1 percent for emails, URLs, and webpages. (On the first guess, the correct letters were identified 35 to 59 percent of the time, depending on the type of data being analyzed.) Duplicate letters and typos posed additional challenges.
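The “within a maximum of five guesses” figures correspond to a top-k accuracy metric. A minimal sketch, assuming per-keystroke ranked guesses like those produced by the mapping above:

```python
def top_k_accuracy(ranked_guesses, truth, k=5):
    """Fraction of keystrokes whose true key appears among the top-k
    ranked guesses (e.g., the five keys nearest to a fixation)."""
    hits = sum(t in g[:k] for g, t in zip(ranked_guesses, truth))
    return hits / len(truth)

# e.g. top_k_accuracy([["p", "o", "l"], ["a", "s", "q"]], ["o", "a"], k=3) -> 1.0
```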
“It’s very powerful to know where someone is looking,” says Alexandra Papoutsaki, an associate professor of computer science at Pomona College who has studied eye tracking for years and reviewed the GAZEploit research for WIRED.
Papoutsaki notes that the research is particularly significant as it depends solely on the video feed of a person’s Persona, making it a more “realistic” scenario for an attack compared to instances where a hacker physically manipulates someone’s headset to access eye-tracking data. “The fact that now someone, just by streaming their Persona, could expose potentially what they’re doing is where the vulnerability becomes a lot more critical,” Papoutsaki comments.
While the attack was developed in lab settings and has not been used against real-world Persona users, the researchers note that hackers could have abused the leak. A criminal could, in theory, share a file with a victim during a Zoom call that prompts them to log in to a Google or Microsoft account, then record the Persona feed while the target types in order to recover the password and access the account.
The GAZEploit researchers reported their findings to Apple in April and subsequently shared their proof-of-concept code so the attack could be replicated. Apple fixed the flaw in a Vision Pro software update at the end of July, which stops a Persona from being shared while the virtual keyboard is in use.
An Apple spokesperson confirmed the company fixed the vulnerability in visionOS 1.3, although its software update notes do not mention the fix. The researchers say Apple assigned the flaw CVE-2024-40865, and they recommend that users download the latest software updates.
The research underscores how people’s personal data can be inadvertently leaked or exposed. In recent years, law enforcement has pulled fingerprints from photos posted online and identified people by the way they walk in CCTV footage, and police departments have begun experimenting with Vision Pro headsets for surveillance.
As wearable technologies such as smart glasses, XR headsets, and smartwatches become cheaper and more widespread, they collect ever more data about their users’ behavior and intentions, raising significant privacy concerns. Cheng Zhang, an assistant professor at Cornell University who builds wearable devices that interpret human behavior and who reviewed the Vision Pro research, says the findings highlight exactly these risks.
“This paper clearly demonstrates one specific risk with gaze typing, but it’s just the tip of the iceberg,” Zhang says. “While these technologies are developed for positive purposes and applications, we also need to be aware of the privacy implications and start taking measures to mitigate potential risks for the future generation of everyday wearables.”