How AI Transformers Mimic Parts of the Brain
Understanding how the brain organizes and accesses spatial information (where we are, what's around the corner, how to get there) remains an exquisite challenge. The process involves recalling an entire network of memories and stored spatial data from tens of billions of neurons, each connected to thousands of others. Neuroscientists have identified key elements such as grid cells, neurons that map locations. But going deeper will prove tricky: It's not as though researchers can remove and study slices of human gray matter to watch how location-based memories of images, sounds and smells flow through and connect to each other.
Artificial intelligence offers another way in. For years, neuroscientists have harnessed many types of neural networks (the engines that power most deep learning applications) to model the firing of neurons in the brain. In recent work, researchers have shown that the hippocampus, a structure of the brain critical to memory, is basically a special kind of neural net, known as a transformer, in disguise. Their new model tracks spatial information in a way that parallels the inner workings of the brain. They've seen remarkable success.
"The fact that we know these models of the brain are equivalent to the transformer means that our models perform much better and are easier to train," said James Whittington, a cognitive neuroscientist who splits his time between Stanford University and the lab of Tim Behrens at the University of Oxford.
Studies by Whittington and others hint that transformers can greatly improve the ability of neural network models to mimic the sorts of computations carried out by grid cells and other parts of the brain. Such models "could push our understanding of how artificial neural networks work and, even more likely, how computations are carried out in the brain," Whittington said.
"We're not trying to re-create the brain," said David Ha, a computer scientist at Google Brain who also works on transformer models. "But can we create a mechanism that can do what the brain does?"
Transformers first appeared five years ago as a new way for AI to process language. They are the secret sauce in those headline-grabbing sentence-completing programs like BERT and GPT-3, which can generate convincing song lyrics, compose Shakespearean sonnets and impersonate customer service representatives.
Transformers work using a mechanism called self-attention, in which every input (a word, a pixel, a number in a sequence) is always connected to every other input. (Other neural networks connect inputs only to certain other inputs.) But while transformers were designed for language tasks, they've since excelled at other tasks, such as classifying images, and now modeling the brain.
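To make that "everything attends to everything" idea concrete, here is a minimal sketch of scaled dot-product self-attention in Python with NumPy. The matrix names, toy sizes and random inputs are illustrative assumptions, not drawn from any particular model.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of input vectors.

    The score matrix is n x n, so every input is compared with every other
    input, and each output is a weighted mix of all of them.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # similarity of every pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v                              # mix information from all inputs

# Toy example: 4 inputs (e.g. word embeddings), 8 dimensions each
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): each output draws on every input
```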
In 2020, a group led by Sepp Hochreiter, a computer scientist at Johannes Kepler University Linz in Austria, used a transformer to retool a powerful, long-standing model of memory retrieval called a Hopfield network. First introduced 40 years ago by the Princeton physicist John Hopfield, these networks follow a general rule: Neurons that are active at the same time build strong connections with each other.
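That rule is simple enough to write down directly. Below is a minimal sketch of a classic binary Hopfield network in Python with NumPy; the number of patterns, their size and the amount of corruption are arbitrary toy choices. Connections strengthen wherever stored patterns are co-active, and repeated updates pull a noisy cue back toward a stored memory.

```python
import numpy as np

def hebbian_weights(patterns):
    """Hopfield's rule: units that are active together get stronger connections."""
    n = patterns.shape[1]
    w = np.zeros((n, n))
    for p in patterns:                 # p is a vector of +1/-1 activities
        w += np.outer(p, p)            # co-active pairs strengthen each other
    np.fill_diagonal(w, 0)             # no self-connections
    return w / len(patterns)

def recall(w, probe, steps=10):
    """Repeatedly update the state until it settles near a stored memory."""
    state = probe.copy()
    for _ in range(steps):
        state = np.sign(w @ state)
        state[state == 0] = 1
    return state

# Store two 16-unit patterns, then try to recover one from a corrupted cue
rng = np.random.default_rng(1)
memories = rng.choice([-1, 1], size=(2, 16))
w = hebbian_weights(memories)
noisy = memories[0].copy()
noisy[:4] *= -1                        # flip a few units to corrupt the cue
print(np.array_equal(recall(w, noisy), memories[0]))  # usually recovers the memory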
Hochreiter and his collaborators, noting that researchers have been looking for better models of memory retrieval, saw a connection between how a new class of Hopfield networks retrieve memories and how transformers perform attention. These new Hopfield networks, developed by Hopfield and Dmitry Krotov at the MIT-IBM Watson AI Lab, can store and retrieve more memories than standard Hopfield networks because of more effective connections. Hochreiter's team upgraded these networks by adding a rule that acts like the attention mechanism in transformers.
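The connection they exploited is visible in the retrieval step itself. The sketch below shows one update of a continuous ("modern") Hopfield network in Python with NumPy; the sharpness parameter beta and the toy dimensions are assumptions for illustration. The update has the same softmax-then-weighted-sum shape as a single attention step, with the stored memories playing the role of keys and values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def modern_hopfield_retrieve(memories, query, beta=8.0):
    """One retrieval step of a continuous Hopfield network.

    Compute a softmax over the similarity between the query and every stored
    pattern, then return the weighted sum of the patterns; the same form as
    one attention step.
    """
    sims = memories @ query            # similarity to every stored memory
    weights = softmax(beta * sims)     # sharp softmax favors the best match
    return memories.T @ weights        # weighted combination of memories

rng = np.random.default_rng(2)
memories = rng.normal(size=(5, 16))               # 5 stored patterns, 16-dimensional
cue = memories[3] + 0.3 * rng.normal(size=16)     # noisy version of memory 3
retrieved = modern_hopfield_retrieve(memories, cue)
print(np.argmax(memories @ retrieved))            # typically 3: the cue recovers memory 3
```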
Then, earlier this year, Whittington and Behrens helped further tweak the approach, modifying the transformer so that instead of treating memories as a linear sequence (like a string of words in a sentence) it encoded them as coordinates in higher-dimensional spaces. That "twist," as the researchers called it, further improved the model's performance on neuroscience tasks. They also showed that the model was mathematically equivalent to models of the grid cell firing patterns that neuroscientists see in fMRI scans.
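The model's actual spatial encoding is learned, but the basic idea of swapping a sequence index for a location code can be illustrated with a toy stand-in. In the hypothetical sketch below, each memory is tagged with a higher-dimensional vector built from sinusoids of the (x, y) position where it was formed, rather than with its place in a sequence; the function name and scales are illustrative choices, not the paper's method.

```python
import numpy as np

def spatial_code(positions, scales=(1.0, 2.0, 4.0, 8.0)):
    """Tag each memory with a code for *where* it happened, not when.

    positions: array of shape (n_memories, 2) holding (x, y) coordinates.
    Returns sinusoidal features at several spatial scales, giving each
    location a distinct higher-dimensional coordinate vector.
    """
    feats = []
    for s in scales:
        feats.append(np.sin(positions * s))   # (n, 2) per scale
        feats.append(np.cos(positions * s))   # (n, 2) per scale
    return np.concatenate(feats, axis=-1)     # (n, 4 * len(scales))

# Three memories formed at three different places in a room
places = np.array([[0.0, 0.0], [1.5, 0.2], [0.1, 2.0]])
codes = spatial_code(places)
print(codes.shape)   # (3, 16): one 16-dimensional location tag per memory
```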
"Grid cells have this kind of exciting, beautiful, regular structure, and with striking patterns that are unlikely to pop up at random," said Caswell Barry, a neuroscientist at University College London. The new work showed how transformers replicate exactly those patterns observed in the hippocampus. The researchers recognized that a transformer can figure out where it is based on previous states and how it's moved, and in a way that's keyed into traditional models of grid cells.