The Computing Pioneer Helping AI See | Quanta Magazine
I understood early on about the importance of prior data when looking at the world. I couldn't see very well myself, but my memory of prior experiences filled in the holes enough that I could function basically as good as a normal person. Most people don't know that I don't see well. That gave me, I think, this unique intuition that it might be less about the pixels and more about the memory.
Computers only see what's there now, whereas we see the moment connected to the tapestry of everything we've seen before.
Is it even possible to express in words the subtle visual patterns that, for example, make Paris look like Paris?
When you're in a particular city, sometimes you just know what city you're in; there's this je ne sais quoi, even though you've never been to that particular street corner. That's extremely hard to describe in words, but it's right there in the pixels.
[For Paris], you could talk about how it's usually six-story buildings, and usually there are balconies on the fourth story. You could put some of this into words, but a lot is not linguistic. To me that's exciting.
Your recent work involves teaching computers to ingest visual data in ways that mimic human sight. How does that work?
Right now, computers have a ginormous data set: billions of random images scraped off the internet. They take random images, process one image, then take another random image, process that, etc. You train your [computer's visual] system by going over and over this data set.
The way that we biological agents ingest data is very different. When we are faced with a novel situation, it is the one and only time this data will be there for us. We've never been in this exact situation, in this room, with this lighting, dressed this way. First, we use this data to do what we need to do, to understand the world. Then, we use this data to learn from it, [to predict] the future.
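To make the contrast concrete, here is a minimal sketch of the two ways of ingesting data described above. It is a toy illustration only, not the system discussed in the interview: the one-parameter linear model, the learning rate, and the function names `sgd_step`, `train_shuffled_epochs`, and `train_streaming` are all assumptions made for the example.

```python
import random


def sgd_step(w, x, y, lr=0.05):
    """One gradient step for a toy 1-D model y ~ w * x under squared error."""
    error = w * x - y
    return w - lr * error * x


def train_shuffled_epochs(data, epochs=20, seed=0):
    """Dataset-style training: revisit the same samples in random order, many times."""
    rng = random.Random(seed)
    samples = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            w = sgd_step(w, x, y)
    return w


def train_streaming(stream):
    """Stream-style training: each sample is seen exactly once, in order.

    The sample is first used to act (make a prediction with the current model),
    and only then used to learn (update the model).
    """
    w = 0.0
    predictions = []
    for x, y in stream:
        predictions.append(w * x)  # first, use the data
        w = sgd_step(w, x, y)      # then, learn from it
    return w, predictions


# Hypothetical data: the true relationship is y = 2 * x.
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

w_epochs = train_shuffled_epochs(data)
w_stream, preds = train_streaming(data)
```

In this toy setting the repeated-pass learner ends up much closer to the true weight, because it gets many looks at the same four samples. The streaming learner gets one look at each and has to act before it learns, which is the constraint the answer above attributes to biological agents.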