We once expected futuristic inventions like self-driving cars and robot helpers, but instead we have witnessed the rise of artificial intelligence (AI) systems that rival human abilities at chess, text analysis, and poetry composition. These advances have revealed a curious aspect of such systems: an unexpected capacity for something that looks like creative expression.
Diffusion models, the core technology behind AI-generated images, are trained to reproduce the images they were shown. Yet in practice they appear to innovate, merging features from different training images into entirely new creations rather than simply copying what they have seen. This puzzled researchers for years: in principle, such systems were expected to memorize, not create.
Diffusion models generate images through a process called denoising. During training, an image is progressively degraded into digital noise, and the model learns to reverse that degradation step by step; at generation time it can then start from pure noise and rebuild a coherent image, a process akin to shredding a painting and then reassembling the fragments. Skeptics questioned how genuine novelty could arise from what looks like a reassembly operation.
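To make the shredding-and-reassembly analogy concrete, here is a minimal NumPy sketch of the two halves of the process. It is an illustration, not anyone's actual implementation: the variance schedule, step count, and the use of the true noise in place of a learned prediction are all simplifying assumptions; a real diffusion model trains a neural network to make that prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a flat array of pixel values in [0, 1].
image = rng.random(64)

# Assumed linear variance schedule over T steps (a common simplification).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def noise_to_step(x0, t):
    """Forward process: jump straight to step t by mixing the clean image
    with Gaussian noise (the standard closed-form diffusion identity)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return x_t, eps

def denoise(x_t, t, predicted_eps):
    """Reverse step: estimate the clean image from the noisy one, given a
    prediction of the noise that was added."""
    x0_hat = (x_t - np.sqrt(1.0 - alphas_bar[t]) * predicted_eps) / np.sqrt(alphas_bar[t])
    return np.clip(x0_hat, 0.0, 1.0)

# By the final step the image is essentially pure noise...
x_T, _ = noise_to_step(image, T - 1)

# ...and with a perfect noise prediction the reverse step recovers it.
x_t, eps = noise_to_step(image, 500)
recovered = denoise(x_t, 500, eps)
```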
A breakthrough came when two physicists proposed that imperfections in the denoising process might be the key to the apparent creativity of diffusion models. Their research, shared at the International Conference on Machine Learning 2025, suggests that what appears as creativity is actually an inevitable outcome of the models’ design.
Mason Kamb, a graduate student at Stanford University, has been captivated by the concept of morphogenesis—the way living systems assemble themselves. This interest led him to compare diffusion models to Turing patterns, which describe how cells coordinate to form complex structures without a central directive. Rather than adhering to a preordained plan, cells interact locally, leading to innovations and sometimes aberrations, like limbs with extra fingers.
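Turing patterns are a vivid illustration of what purely local rules can do. The toy simulation below is a standard Gray-Scott reaction-diffusion sketch in NumPy, with conventional illustrative parameter values that are not taken from Kamb's work; it grows a global spotted pattern even though every cell only ever reacts to its immediate neighbours.

```python
import numpy as np

def laplacian(Z):
    """Discrete Laplacian: each cell sees only its four nearest neighbours."""
    return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0) +
            np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4 * Z)

n = 128
U = np.ones((n, n))
V = np.zeros((n, n))
# Seed a small square of the second chemical at the centre.
U[n//2-5:n//2+5, n//2-5:n//2+5] = 0.5
V[n//2-5:n//2+5, n//2-5:n//2+5] = 0.25

# Conventional Gray-Scott parameters that yield spot-like patterns.
Du, Dv, F, k = 0.16, 0.08, 0.035, 0.065

for _ in range(5000):
    uvv = U * V * V
    U += Du * laplacian(U) - uvv + F * (1 - U)
    V += Dv * laplacian(V) + uvv - (F + k) * V

# U and V now hold a spotted, Turing-like pattern that no cell "planned".
```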
When AI-generated images began to surface, many resembled these anomalies—images with unrealistic features that evoked a sense of the uncanny. Kamb recognized these unexpected results as failures intrinsic to a decentralized construction process akin to morphogenesis.
Initially, researchers treated two built-in features of diffusion models as mere technical constraints: locality, meaning the model attends to only a small patch of pixels at a time, and translational equivariance, meaning that if the input image is shifted, the output shifts in exactly the same way. No one connected this patch-by-patch processing to the emergence of creativity. Kamb's developing hypothesis, however, was that these very restrictions are what drive creative outputs.
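Both properties are already visible in a plain convolution filter, the basic building block of these models. The short NumPy check below is an illustrative sketch rather than code from the paper: each output value depends only on a three-pixel window (locality), and shifting the input shifts the output by exactly the same amount (translational equivariance).

```python
import numpy as np

def local_filter(x, kernel):
    """Apply a small filter with wrap-around padding: each output value
    depends only on a short local window of the input (locality)."""
    k = len(kernel)
    pad = k // 2
    xp = np.concatenate([x[-pad:], x, x[:pad]])
    return np.array([np.dot(xp[i:i + k], kernel) for i in range(len(x))])

rng = np.random.default_rng(1)
x = rng.random(32)
kernel = np.array([0.25, 0.5, 0.25])   # a tiny three-pixel filter

out_then_shift = np.roll(local_filter(x, kernel), 3)
shift_then_out = local_filter(np.roll(x, 3), kernel)

# Translational equivariance: shifting before or after filtering agrees.
assert np.allclose(out_then_shift, shift_then_out)
```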
Working with his advisor, Surya Ganguli, Kamb formulated a predictive model called the equivariant local score (ELS) machine. It is not a trained diffusion model but a set of equations that predicts what a denoising process constrained by locality and equivariance should produce. Remarkably, when its predictions were compared with the outputs of established, trained diffusion models, they matched with roughly 90 percent accuracy on average, an unusually close agreement in machine learning.
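The article does not reproduce the ELS equations, but the flavor of an estimator constrained only by locality and equivariance can be sketched: estimate each pixel by comparing its local noisy patch with every same-sized patch in the training set, then take a similarity-weighted average of the corresponding clean centre pixels. The function below is a toy guess at that flavor, with an arbitrary patch size and similarity scale; it should not be read as the ELS machine itself.

```python
import numpy as np

def patch_based_estimate(noisy, train_images, patch=3, temperature=0.1):
    """Toy patch-matching estimator: reconstruct each pixel from training
    patches that resemble its local neighbourhood. Illustrative only; not
    the ELS equations from Kamb and Ganguli's paper."""
    pad = patch // 2
    # Gather every (patch x patch) window from the training images along
    # with its clean centre pixel. Position is never used as a feature,
    # which keeps the estimator translation-equivariant.
    windows, centres = [], []
    for img in train_images:
        padded = np.pad(img, pad, mode="wrap")
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                windows.append(padded[i:i + patch, j:j + patch].ravel())
                centres.append(img[i, j])
    windows = np.array(windows)
    centres = np.array(centres)

    out = np.zeros_like(noisy)
    padded = np.pad(noisy, pad, mode="wrap")
    for i in range(noisy.shape[0]):
        for j in range(noisy.shape[1]):
            w = padded[i:i + patch, j:j + patch].ravel()
            dists = np.sum((windows - w) ** 2, axis=1)
            # Similarity weights (shifted by the minimum for numerical stability).
            weights = np.exp(-(dists - dists.min()) / temperature)
            out[i, j] = np.dot(weights, centres) / weights.sum()
    return out

# Tiny usage example on random stand-in "images".
rng = np.random.default_rng(0)
train = [rng.random((8, 8)) for _ in range(3)]
noisy = train[0] + 0.3 * rng.standard_normal((8, 8))
estimate = patch_based_estimate(noisy, train)
```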
Kamb's findings indicated that imposing this strictly local perspective is precisely what generates creativity in the models: because each patch is assembled without reference to the image as a whole, the system can produce novel compositions, and it can also err, as when it adds extra fingers to a hand it never sees in full. While experts acknowledge that Kamb and Ganguli's research sheds light on the mechanisms behind diffusion model creativity, they also note that it leaves many questions unanswered, particularly regarding other AI systems that display creativity without relying on these same principles.
This research could reshape our understanding of creativity, not just in AI but potentially in human cognition as well. Kamb's work suggests that both AI and humans assemble creative ideas from past experience while navigating a complex world with incomplete information. In this light, creativity, whether human or artificial, might stem from the need to bridge gaps in what we know.
Read the original story on Quanta Magazine.