If you know where to look, plenty of secrets can be found online. Since the fall of 2021, independent security researcher Bill Demirkapi has been building ways to tap into huge data sources, which are often overlooked by researchers, to find masses of security problems. This includes automatically finding developer secrets—such as passwords, API keys, and authentication tokens—that could give cybercriminals access to company systems and the ability to steal data.
Today, at the Defcon security conference in Las Vegas, Demirkapi is unveiling the results of this work, detailing a massive trove of leaked secrets and wider website vulnerabilities. Among at least 15,000 developer secrets hard-coded into software, he found hundreds of username and password details linked to Nebraska’s Supreme Court and its IT systems; the details needed to access Stanford University’s Slack channels; and more than a thousand API keys belonging to OpenAI customers.
A major smartphone manufacturer, customers of a fintech company, and a multibillion-dollar cybersecurity company are counted among the thousands of organizations that inadvertently exposed secrets. As part of his efforts to stem the tide, Demirkapi hacked together a way to automatically get the details revoked, making them useless to any hackers.
In a second strand to the research, Demirkapi also scanned data sources to find 66,000 websites with dangling subdomain issues, making them vulnerable to various attacks including hijacking. Some of the world’s biggest websites, including a development domain owned by The New York Times, had the weaknesses.
While the two security issues he looked into are well-known among researchers, Demirkapi says that turning to unconventional datasets, which are usually reserved for other purposes, allowed thousands of issues to be identified en masse and, if expanded, offers the potential to help protect the web at large. “The goal has been to find ways to discover trivial vulnerability classes at scale,” Demirkapi tells WIRED. “I think that there’s a gap for creative solutions.”
It is relatively trivial for a developer to accidentally include their company’s secrets in software or code. Alon Schindel, the vice president of AI and threat research at the cloud security company Wiz, says there’s a huge variety of secrets that developers can inadvertently hard-code, or expose, throughout the software development pipeline. These can include passwords, encryption keys, API access tokens, cloud provider secrets, and TLS certificates.
“The most acute risk of leaving secrets hard-coded is that if digital authentication credentials and secrets are exposed, they can grant adversaries unauthorized access to a company’s code bases, databases, and other sensitive digital infrastructure,” Schindel says.
The risks are high: Exposed secrets can result in data breaches, hackers breaking into networks, and supply chain attacks, Schindel adds. Previous research in 2019 found thousands of secrets were being leaked on GitHub every day. And while various secret scanning tools exist, these largely are focused on specific targets and not the wider web, Demirkapi says.
During his research, Demirkapi, known for his hacking during his teen years, searched for secret keys on a large scale. He used VirusTotal, a Google-owned platform where developers can check files for malware by uploading them.
VirusTotal’s Retrohunt feature scans a year’s worth of files using YARA rules to detect specific patterns. Demirkapi explains, “We can use these tools and VirusTotal’s extensive data to search for secrets.” He examined more than 1.5 million samples and identified more than 15,000 active secrets, confirming their validity through API calls.
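The pattern-matching step can be illustrated with a small sketch. This is not Demirkapi’s tooling, and it uses Python regular expressions in place of YARA rules; the two patterns, the `CANDIDATE_PATTERNS` table, and the `find_candidate_secrets` helper are illustrative assumptions. A match is only a candidate: checking whether a key is still active, which the research did through API calls, is not shown here.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real provider
# key formats vary and change over time.
CANDIDATE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "slack_token": re.compile(r"\bxox[abprs]-[0-9A-Za-z-]{10,}\b"),
}


def find_candidate_secrets(text):
    """Return sorted, de-duplicated (label, match) pairs that look like secrets."""
    hits = set()
    for label, pattern in CANDIDATE_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.add((label, match.group(0)))
    return sorted(hits)


if __name__ == "__main__":
    # Placeholder key ID for demonstration, not a live credential.
    sample = 'client = connect(key="AKIAIOSFODNN7EXAMPLE")'
    for label, value in find_candidate_secrets(sample):
        print(label, value)
```

Scanning at the scale described in the research would mean running rules like these over millions of files, then discarding any match that fails validation with the issuing provider.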
This extensive search revealed numerous keys that could potentially allow attackers access to digital resources and sensitive information of various organizations. For instance, credentials from a member of Nebraska’s Supreme Court and access to Stanford University’s Slack channels were among the findings.
Following the discovery, Nebraska State Court Administrator Corey R. Steel confirmed that the credentials were immediately updated, there was no misuse of the data, and policies were revised to prevent such breaches. Stanford University has not officially responded, but reports suggest that the issue was promptly addressed after being reported.
Demirkapi also scoured passive DNS replication data to search for websites with dangling subdomain issues. Vulnerable websites can be impersonated, used to deploy malware or phishing pages, steal cookies, and more. “Dangling domains are widespread, and it’s pretty easy for attackers to find high-valuable targets,” says Daiping Liu, a senior research manager at Palo Alto Networks. Liu says tens of thousands of dangling records are exposed at any one time, adding that larger domains can be more susceptible to the issue as they’re harder to manage and there’s more chance for human error.
For example, Demirkapi briefly published an (almost convincing) satirical article on a New York Times development domain with the headline “U.S. Declares War Against Russia Amid Escalating Tensions, Sending Shockwaves Through International Community.” This was removed after around a week, Demirkapi says. A spokesperson for The New York Times declined to comment.
The researcher says that starting with dangling cloud resources, instead of looking for issues with a specific domain or set of domains, allows issues to be discovered systematically. Overall, he found more than 78,000 dangling cloud resources linked to 66,000 apex domains. Pointing to academic research that followed a similar technique using passive DNS replication data, but starting with URLs, Demirkapi says his approach was able to find orders of magnitude more issues.
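A rough sketch of the resource-first idea: take DNS records that point at cloud-hosted names, keep those whose target no longer exists, and group the results by apex domain. Everything here is a simplifying assumption rather than the researcher’s method: the `CLOUD_SUFFIXES` value, the `live_resources` set standing in for a real existence check, and the naive two-label `apex_domain` helper (a real implementation would consult the Public Suffix List).

```python
from collections import defaultdict

# Hypothetical suffix of hostnames managed by a cloud provider.
CLOUD_SUFFIXES = (".cloudprovider.example",)


def apex_domain(hostname):
    """Naive apex: the last two labels. Wrong for suffixes such as co.uk."""
    return ".".join(hostname.rstrip(".").lower().split(".")[-2:])


def find_dangling(records, live_resources):
    """Group dangling records by apex domain.

    records: iterable of (hostname, cname_target) pairs, as might be
             derived from passive DNS data.
    live_resources: set of cloud hostnames that still exist.
    """
    dangling = defaultdict(list)
    for hostname, target in records:
        target = target.rstrip(".").lower()
        if target.endswith(CLOUD_SUFFIXES) and target not in live_resources:
            dangling[apex_domain(hostname)].append((hostname, target))
    return dict(dangling)


if __name__ == "__main__":
    records = [
        ("dev.news.example", "old-app.cloudprovider.example."),
        ("www.news.example", "site.cloudprovider.example"),
        ("mail.shop.example", "mx.mailhost.example"),
    ]
    live = {"site.cloudprovider.example"}
    print(find_dangling(records, live))
```

In this toy data only `dev.news.example` is flagged, because its target is a cloud hostname that no longer exists and could be re-registered by someone else.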
Finding thousands of vulnerable websites and exposed secrets is one thing; getting them fixed is another. While Demirkapi says it has not been possible to alert all websites with dangling domain issues to the problems, he has managed to find ways to clean up the 15,000 hard-coded secrets.
Some experts addressed the issue by contacting affected companies directly. However, Demirkapi reached out to providers that were issuing credentials to see if there was a better way to communicate about the leaked secrets. In one instance, after he revealed over 1,000 compromised OpenAI API keys in February, the company granted him access to a self-service tool that automatically revokes any revealed details. OpenAI’s spokesperson, Niko Felix, mentioned that this service not only deactivates keys identified as compromised but also ensures the ongoing safety of their clients.
However, not all efforts were as seamless. GitHub, which manages over 420 million repositories, has long operated a secret scanning service that identifies exposed tokens and keys. It collaborates with third parties to allow these secrets to be reported and possibly deactivated. In March, Demirkapi inquired if GitHub had a public endpoint to report the secrets he uncovered for faster action. Unfortunately, a spokesperson from the company confirmed that no such service exists for individuals to use.
Demirkapi then approached Amazon Web Services (AWS), which denied him access to the reporting tools used by its affiliates. AWS spokesperson Aisha Johnson stated, “We staunchly believe that customer credentials, including security keys, are the sole property of the customers. AWS does not allow external entities to handle or revoke these keys, as it would contravene security procedures and diminish trust in our services.” However, she noted that anyone can contact the company’s security team via email, and that AWS informs clients if any exposure of keys is detected.
Finding existing methods insufficient, Demirkapi used GitHub to upload the secrets himself and trigger the platform’s secret scanning system to report them. “I devised a technique that keeps it completely private,” Demirkapi says of the workaround, which involved uploading the secrets in a way that did not expose them to the public.
Ultimately, Demirkapi says he picked low-hanging fruit for the research. “Detecting a hard-coded secret or detecting if a resource is dangling, those are fairly trivial classes of vulnerability,” he says, adding that more complex vulnerabilities could potentially be detected in big data sources. There may be plenty of untapped databases that can help fix security issues. “I think that we need to think more about leveraging these large data sources to derive value from them in unconventional ways,” Demirkapi says.