One task where AI tools have proven to be particularly superhuman is analyzing vast troves of data to find patterns that humans can’t see, or automating and accelerating the discovery of those we can. That makes Bitcoin’s blockchain, a public record of nearly a billion transactions between pseudonymous addresses, the perfect sort of puzzle for AI to solve. Now, a new study—along with a vast, newly released trove of crypto crime training data—may be about to trigger a leap forward in automated tools’ ability to suss out illicit money flows across the Bitcoin economy.
On Wednesday, researchers from cryptocurrency tracing firm Elliptic, MIT, and IBM published a paper that lays out a new approach to finding money laundering on Bitcoin’s blockchain. Rather than try to identify cryptocurrency wallets or clusters of addresses associated with criminal entities such as dark-web black markets, thieves, or scammers, the researchers collected patterns of bitcoin transactions that led from one of those known bad actors to a cryptocurrency exchange where dirty crypto might be cashed out. They then used those example patterns to train an AI model capable of spotting similar money movements—what they describe as a kind of detector capable of spotting the “shape” of suspected money laundering behavior on the blockchain.
Now, they’re not only releasing an experimental version of that AI model for detecting bitcoin money laundering but also publishing the training data set behind it: a 200-million-transaction trove of Elliptic’s tagged and classified blockchain data, which the researchers describe as a thousand times larger than any comparable data set previously made public. “We’re providing about a thousand times more data, and instead of labeling illicit wallets, we’re labeling examples of money laundering which might be made up of chains of transactions,” says Tom Robinson, Elliptic’s chief scientist and cofounder. “It’s a paradigm shift in the way that blockchain analytics is used.”
Blockchain analysts have used machine learning tools for years to automate and sharpen their tracing of crypto funds and their identification of criminal actors. In 2019, Elliptic partnered with MIT and IBM to build an AI model for detecting suspicious money movements and released a smaller data set of roughly 200,000 transactions to train it.
For the new research, by contrast, the same team took a far more ambitious approach. Rather than trying to classify individual transactions as legitimate or illicit, Elliptic examined collections of up to six transactions running between Bitcoin address clusters it had already tagged as illicit actors and the exchanges where those actors cashed out their crypto. The researchers hypothesized that the transaction patterns between the criminals and their cash-out points could serve as examples of money laundering behavior.
Working from that hypothesis, Elliptic assembled 122,000 of these so-called subgraphs, patterns of known money laundering within a total data set of 200 million transactions. The research team then used that training data to build an AI model designed to recognize money laundering patterns across the entire Bitcoin blockchain.
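To make the idea of a subgraph concrete, here is a minimal sketch of how chains of up to six transactions between a known illicit cluster and an exchange could be enumerated. It is an illustration under simplifying assumptions, not the researchers' actual pipeline: each node stands for an address cluster, each directed edge for a transaction, and the function name, cluster names, and sample edges are all hypothetical.

```python
from collections import defaultdict


def find_cashout_paths(edges, illicit, exchanges, max_hops=6):
    """Enumerate simple paths of at most max_hops transactions that
    start at an illicit cluster and end at an exchange."""
    # Adjacency list: sender cluster -> list of receiver clusters.
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    paths = []

    def walk(node, path):
        # A path is complete once it reaches an exchange.
        if node in exchanges and len(path) > 1:
            paths.append(list(path))
            return
        # len(path) - 1 is the number of transactions used so far.
        if len(path) - 1 >= max_hops:
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no revisits)
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    for start in sorted(illicit):
        walk(start, [start])
    return paths


# Hypothetical toy data: one illicit cluster, one exchange.
edges = [
    ("darkmarket", "a"), ("a", "b"), ("b", "exchange"),
    ("a", "exchange"), ("darkmarket", "c"), ("c", "d"),
]
paths = find_cashout_paths(edges, {"darkmarket"}, {"exchange"})
short_paths = find_cashout_paths(edges, {"darkmarket"}, {"exchange"}, max_hops=2)
```

On the toy data, the search finds two routes from the illicit cluster to the exchange; capping the search at two transactions leaves only the shorter one. Labeled examples like these, rather than individual wallets, are what the article describes as the training signal.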
As a test of their resulting AI tool, the researchers checked its outputs with one cryptocurrency exchange—which the paper doesn’t name—identifying 52 suspicious chains of transactions that had all ultimately flowed into that exchange. The exchange, it turned out, had already flagged 14 of the accounts that had received those funds for suspected illicit activity, including eight it had marked as associated with money laundering or fraud, based in part on know-your-customer information it had requested from the account owners. Despite having no access to that know-your-customer data or any information about the origin of the funds, the researchers’ AI model had matched the conclusions of the exchange’s own investigators.
Fourteen hits out of 52 customer accounts might not sound like a high rate. But the researchers point out that only about 0.1 percent of the exchange’s accounts are flagged for potential money laundering overall, so the automated tool had dramatically narrowed the search for suspicious accounts. Going from “out of a thousand items we examine, one is illicit” to 14 in 52 is a massive leap, says Mark Weber of MIT’s Media Lab, one of the study’s coauthors. Further investigation is intended to show whether anything was overlooked among the rest.
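The size of that jump can be checked with simple arithmetic. The sketch below uses only the figures quoted above (14 flagged accounts out of 52, against a 0.1 percent base rate); the variable names are illustrative.

```python
flagged_by_exchange = 14   # accounts the exchange had already flagged
accounts_surfaced = 52     # accounts reached by the chains the model surfaced
base_rate = 0.001          # 0.1 percent of all accounts are flagged

# Share of surfaced accounts that the exchange had independently flagged.
hit_rate = flagged_by_exchange / accounts_surfaced

# How many times more concentrated the flagged accounts are
# compared with picking accounts at random.
lift = hit_rate / base_rate

print(f"hit rate: {hit_rate:.1%}, lift: {lift:.0f}x")
# prints "hit rate: 26.9%, lift: 269x"
```

In other words, an account surfaced by the model was roughly 270 times more likely to be one the exchange had already flagged than an account chosen at random.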
Elliptic says it has already been using the AI model privately in its own work. As further evidence that the tool works, the researchers say it has helped trace the source of funds for some suspicious transactions, identifying bitcoin addresses linked to a Russian dark-web market, a cryptocurrency “mixer” designed to obscure the trail of bitcoins on the blockchain, and a Panamanian Ponzi scheme. Elliptic declined to name any of them, citing ongoing investigations.
Potentially more important than the AI model itself is Elliptic’s training data, which the researchers have shared with the machine learning and data science community on the Google-owned site Kaggle. MIT’s Weber says he values that open-source ethos. The data contains no information about the owners of the Bitcoin addresses, or even the addresses themselves. It includes only the structure of the transaction “subgraphs” Elliptic labeled, along with their ratings for suspected money laundering.
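One way to picture data that keeps a subgraph's structure while dropping the addresses is to replace each address with an arbitrary integer. The sketch below is only an illustration of that idea: the function name, field names, and sample addresses are hypothetical, and the actual schema of the published data set is not described here.

```python
def anonymize_subgraph(edges, label):
    """Replace addresses with integers in order of first appearance,
    keeping only who-paid-whom structure and the label."""
    index = {}

    def node_id(address):
        # Assign the next unused integer to each new address.
        if address not in index:
            index[address] = len(index)
        return index[address]

    anon_edges = [(node_id(src), node_id(dst)) for src, dst in edges]
    return {"num_nodes": len(index), "edges": anon_edges, "label": label}


# Hypothetical addresses; only the shape of the graph survives.
record = anonymize_subgraph(
    [("addrA", "addrB"), ("addrB", "addrC"), ("addrA", "addrC")],
    label="suspicious",
)
```

The resulting record still shows three parties and three payments among them, which is what a pattern-matching model needs, but nothing in it points back to a specific address.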
According to Stefan Savage, a computer science professor at the University of California San Diego, this enormous repository of data will spur research into blockchain money laundering. Savage, who served as an adviser on a landmark Bitcoin-tracing paper published in 2013, believes the tool in its current form won’t revolutionize crypto anti-money-laundering efforts and instead serves as a proof of concept. “It’s a signal that there’s work to be done here. More people should join in,” he says.
Savage warns, though, that AI-based money-laundering investigation tools will likely raise new ethical and legal questions if they end up being used as actual criminal evidence—in part because AI tools often serve as a “black box” that provides a result without any explanation of how it was produced. “This is on the edge where people get uncomfortable in the same way they get uncomfortable about face recognition,” he says. “You can’t quite explain how it works, and now you’re depending on it for decisions that may have an impact on people’s liberty.”
MIT’s Weber counters that money laundering investigators have always used algorithms to flag potentially suspicious behavior. AI-based tools, he argues, just mean those algorithms will be more efficient and have fewer false positives that waste investigators’ time and incriminate the wrong suspects. “This isn’t about automation,” Weber says. “This is a needle-in-a-haystack problem, and we’re saying let’s use metal detectors instead of chopsticks.”
As for the research impact that Savage expects, he argues that even beyond blockchain analysis, Elliptic’s training data is so voluminous and detailed that it may even help with other kinds of AI research into analogous problems like health care and recommendation systems. But he says the researchers do also intend their work to have a practical effect, enabling a new and very real way to hunt for patterns that reveal financial crime.
“We’re hopeful that this is much more than an academic exercise,” Weber says, “that people in this domain can actually take this and run with it.”