Google’s PlaNet Makes Identifying a Photo So Easy a Machine Can Do It

PlaNet: The world is divided into a grid of many cells. (Screenshot: Tobias Weyand, Ilya Kostrikov, James Philbin)
Google. Don't let the cutie-pie name fool you. Since way back in 1998 we've used, abused, and relied on it for our search engine needs.

Planning a family vacation? Chances are good you’ve scoured Google for information and photos of potential destinations like the Grand Canyon or New York City. Google has made it their business to make finding things as easy as possible for the masses, and we’ve proven over and over again that people are on an endless quest to find things. Plus, having some decent online pics of Rockefeller Center to present to the kids never hurts the cause on the vacation front, right?

Those picture searches lead us to Google’s latest endeavour, christened PlaNet: a machine heavy on artificial intelligence, programmed to recognize details or objects within an image in order to memorize them. PlaNet focuses on the photo geolocation of objects, buildings, even animals and plants, but without all that geotagging silliness normally required to do so.

Google engineers Tobias Weyand, Ilya Kostrikov, and James Philbin recently released their paper detailing the science and math behind how PlaNet works, and how it stacks up against the human brain when it comes to identifying what a building or landmark is, or where a photo was taken, provided it's a relatively clear photo, of course (c'mon, cut it some slack for that).

Ground truth location (yellow), human guess (green), PlaNet guess (blue).

The (very) simplified breakdown of the PlaNet brain is this: Weyand, Kostrikov, and Philbin devised a grid system that covered the earth in roughly 26,000 boxes. The size of each box varied depending on how many geotagged photos or images existed for that particular location. New York City, with millions of pictures available? Smaller squares needed. Antarctica? Not so many visuals around, so bigger squares as a result.
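To make that adaptive-grid idea concrete, here's a toy sketch, assuming a simple quadtree over flat lat/lon boxes: split any box holding too many photos into four children, and drop boxes that are nearly empty. The thresholds and the synthetic data below are made up for illustration; the actual paper partitions the globe using Google's S2 cells and far larger photo counts.

```python
import random

MAX_PER_CELL = 100   # assumed cap: denser areas get split into smaller boxes
MIN_PER_CELL = 5     # assumed floor: near-empty boxes (open ocean) are dropped

def partition(photos, box, cells):
    """photos: list of (lat, lon); box: (lat_min, lat_max, lon_min, lon_max)."""
    if len(photos) < MIN_PER_CELL:
        return  # too few photos here to make a useful class
    if len(photos) <= MAX_PER_CELL:
        cells.append(box)  # dense enough to keep, small enough to stop
        return
    la0, la1, lo0, lo1 = box
    mla, mlo = (la0 + la1) / 2, (lo0 + lo1) / 2
    for child in [(la0, mla, lo0, mlo), (la0, mla, mlo, lo1),
                  (mla, la1, lo0, mlo), (mla, la1, mlo, lo1)]:
        inside = [p for p in photos
                  if child[0] <= p[0] < child[1] and child[2] <= p[1] < child[3]]
        partition(inside, child, cells)

# Synthetic data: a dense "New York" cluster plus a sparse global scatter.
random.seed(0)
city = [(40.7 + random.gauss(0, 0.5), -74.0 + random.gauss(0, 0.5))
        for _ in range(2000)]
scatter = [(random.uniform(-90, 90), random.uniform(-180, 180))
           for _ in range(500)]

cells = []
partition(city + scatter, (-90.0, 90.0, -180.0, 180.0), cells)
print(f"{len(cells)} cells; the tiniest ones blanket the dense cluster")
```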

From there, 91 million geotagged images were placed in a database, and PlaNet was given the task of learning where each image was taken in relation to the boxed grid system already in place. To test how well PlaNet had done on its homework, the team then culled 2.3 million geotagged images from Flickr and presented them to PlaNet, minus the geotags.
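In other words, the training turns geolocation into a straight classification problem: every grid cell is a class, and a photo's geotag boils down to the index of the cell that contains it. Here's a minimal sketch of that bookkeeping, using a made-up three-cell grid in place of the real ~26,000 (the convolutional network that learns pixels-to-cell is out of scope here):

```python
# A made-up three-cell "grid", each cell as (lat_min, lat_max, lon_min, lon_max).
CELLS = [(-90.0, 90.0, -180.0, -60.0),
         (-90.0, 90.0, -60.0, 60.0),
         (-90.0, 90.0, 60.0, 180.0)]

def cell_label(lat, lon):
    """Turn a geotag into a class index: the cell containing the photo."""
    for i, (la0, la1, lo0, lo1) in enumerate(CELLS):
        if la0 <= lat < la1 and lo0 <= lon < lo1:
            return i
    return None  # landed in a region dropped for having too few photos

def cell_center(i):
    """Turn a predicted class back into a concrete coordinate guess."""
    la0, la1, lo0, lo1 = CELLS[i]
    return ((la0 + la1) / 2, (lo0 + lo1) / 2)

# Training pairs become (image, cell index); at test time the geotag is
# withheld and the model's top-scoring cell is its answer.
label = cell_label(40.758, -73.985)   # Times Square -> westernmost cell
print(label, cell_center(label))      # 0 (0.0, -120.0)
```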

PlaNet correctly identified the continent on which the photo was taken 48 percent of the time, the country 28.4 percent, and the city 10.1 percent. PlaNet even had a 3.6 percent success rate with street-level views, which is not too shabby for something plugged into a wall outlet.
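For the curious, those level-by-level numbers can be tallied by scoring each guess by its great-circle distance from the true location, counting a guess as correct at a given level when the error falls inside that level's radius. The radii below follow the paper's evaluation setup (1 km for street up to 2,500 km for continent); the sample guesses are invented stand-ins for the 2.3 million Flickr test photos.

```python
import math

def km_between(p, q):
    """Great-circle (haversine) distance in km between two (lat, lon) points."""
    la1, lo1 = map(math.radians, p)
    la2, lo2 = map(math.radians, q)
    h = (math.sin((la2 - la1) / 2) ** 2
         + math.cos(la1) * math.cos(la2) * math.sin((lo2 - lo1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

# Error radius per level, in km, following the paper's evaluation setup.
LEVELS = {"street": 1, "city": 25, "region": 200,
          "country": 750, "continent": 2500}

# Invented (truth, guess) pairs standing in for real test photos.
results = [((40.758, -73.985), (40.761, -73.977)),   # near-perfect NYC guess
           ((48.858, 2.294), (50.847, 4.352)),       # Paris guessed as Brussels
           ((-33.857, 151.215), (35.676, 139.650))]  # Sydney guessed as Tokyo

for level, radius in LEVELS.items():
    hits = sum(km_between(truth, guess) <= radius for truth, guess in results)
    print(f"{level:>9}: {hits / len(results):.1%}")
```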

Photos shown to PlaNet (left) and the resulting probability distribution showing where the photo was likely taken (right).

Those numbers may seem lackluster on the surface, but compared to human capabilities they're quite impressive. PlaNet took on ten worldly human travellers in a friendly online game of Geoguessr and showed everyone who's boss, winning 28 of the 50 rounds played.

The developers and Google are being tight-lipped about possible uses or markets for PlaNet, but since the current core program is relatively reasonable in size (377 MB), don't be surprised if you're asking Siri to open it for you sometime down the road.


Jay Moon

Jay Moon is a writer who has turned the wanderlust that found him backpacking around Canada and the U.S. as a young lad into a writing lust that has him embracing the opportunity to cover topics about anything (and everything) he can get his now middle-aged eyes, ears, and hands on.