During this last lockdown in the UK, my wife and I have been playing GeoGuessr. It’s slower paced than the computer games we normally play but it goes well with having an 11-week-old baby who becomes more alive each and every day.
GeoGuessr is a geographic discovery game. You are dropped into a random Google Street View and tasked with pointing out your location on a map. You can look around, zoom, and follow the car’s path through the local streets.
We’ve started to take the Daily Challenge on GeoGuessr quite seriously. We show up every day and push for a new high score. In the Daily Challenge there’s a three-minute limit per round which, for us, is either filled with frantic clicking as we zip through the Australian outback (probably while mistaking it for South Africa), or debating back and forth about whether ø exists in the Swedish language.
I now have a lot of “I’ll know it when I see it” knowledge. I know Greenland on sight. My lost knowledge of country flags has returned, along with new knowledge of USA state flags, which countries drive on the right vs. left, and which use kilometres vs. miles. I know pretty much every country-specific domain name (they’re often on roadside adverts) – I won’t forget .yu anytime soon.
Did you know that black and white guard rails are commonly found in Russia and Ukraine? Or that you can make out the blue EU bar on license plates through Google Street View’s blurification? Read more in this 80,000-word guide – Geoguessr - the Top Tips, Tricks and Techniques.
The red and white striped arrow pointing downwards indicates that you are in Japan, most likely on the island of Hokkaido or possibly on the island of Honshu near mountains.
I once read that machine learning is currently capable of doing anything a human can do in under one second. Recognize a face, pick out some text from an image, swerve to avoid another car. This got me thinking, and thinking led me to a paper called Geolocation Estimation of Photos using a Hierarchical Model and Scene Classification by Eric Müller-Budack, Kader Pustu-Iren, and Ralph Ewerth. This paper treats “geolocalization as a classification problem where the earth is subdivided into geographical cells.”
It predicts the GPS coordinates of photos.
Even indoor photos! (GeoGuessr’s Daily Challenge will often trap you inside a museum).
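To make the “earth subdivided into geographical cells” idea concrete, here’s a toy sketch of how GPS coordinates can map to cell labels and back. This uses a uniform latitude/longitude grid purely for illustration – the paper actually builds an adaptive, hierarchical partitioning, and the grid size here is made up:

```python
def latlng_to_cell(lat, lng, cells_per_axis=36):
    '''
    Map a GPS coordinate to a cell index on a uniform grid.
    (Illustrative only - the paper uses an adaptive partitioning.)
    '''
    row = min(int((lat + 90) / 180 * cells_per_axis), cells_per_axis - 1)
    col = min(int((lng + 180) / 360 * cells_per_axis), cells_per_axis - 1)
    return row * cells_per_axis + col


def cell_to_latlng(cell, cells_per_axis=36):
    '''
    Return the centre of a cell as (lat, lng) - a cell-level
    "prediction" turned back into coordinates.
    '''
    row, col = divmod(cell, cells_per_axis)
    lat = (row + 0.5) / cells_per_axis * 180 - 90
    lng = (col + 0.5) / cells_per_axis * 360 - 180
    return lat, lng
```

Classifying into cells and then reporting the cell centre is what turns “which region is this?” into concrete GPS coordinates.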
Recently, the paper’s authors released a PyTorch implementation and provided weights of a pre-trained base(M, f*) model with underlying ResNet50 architecture.
I presumed that the pretrained model would not map well to the sections of photospheres that I could scrape from GeoGuessr. For training data, the authors used “a subset of the Yahoo Flickr Creative Commons 100 Million dataset (YFCC100M)”, which contains “around five million geo-tagged images from Flickr [and] ambiguous photos of, e.g., indoor environments, food, and humans for which the location is difficult to predict.”
What was interesting was that on the Im2GPS dataset, humans found the location of an image at country granularity (within 750km) 13.9% of the time but the Individual Scene Networks were able to do it 66.7% of the time!
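“Country granularity” here means the guess lands within 750 km of the true location, i.e. great-circle distance between two GPS points. A quick check using the standard haversine formula (my own helper, not code from the paper):

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(lat1, lng1, lat2, lng2):
    '''Great-circle distance in km between two GPS coordinates.'''
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius


def within_country_granularity(guess, truth, threshold_km=750):
    '''True if a (lat, lng) guess is within the 750 km threshold.'''
    return haversine_km(*guess, *truth) <= threshold_km
```

For example, a guess of London against a true location of Paris (roughly 340 km apart) counts as correct at country granularity; London against New York does not.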
So the question became: who is better at GeoGuessr, my wife (a formidable player) or the machine?
To scrape screenshots of the current in-game location, I created a Selenium program that performs the following four times: take a screenshot of the Street View canvas, click an arrow to move one step down the road, then rotate the view ~90 degrees.

The number of times this happens is tuneable via NUMBER_OF_SCREENSHOTS in the snippet below.
'''
Given a GeoGuessr map URL (e.g. https://www.geoguessr.com/game/5sXkq4e32OvHU4rf)
take a number of screenshots, each one step further down the road
and rotated ~90 degrees.

Usage: python file_name.py https://www.geoguessr.com/game/5sXkq4e32OvHU4rf
'''
import sys
import time

from selenium import webdriver

NUMBER_OF_SCREENSHOTS = 4

geo_guessr_map = sys.argv[1]

driver = webdriver.Chrome()
driver.get(geo_guessr_map)

# let JS etc. load
time.sleep(2)


def screenshot_canvas():
    '''
    Take a screenshot of the streetview canvas.
    '''
    with open(f'canvas_{int(time.time())}.png', 'xb') as f:
        canvas = driver.find_element_by_tag_name('canvas')
        f.write(canvas.screenshot_as_png)


def rotate_canvas():
    '''
    Drag and click the <main> elem a few times to rotate us ~90 degrees.
    '''
    main = driver.find_element_by_tag_name('main')
    for _ in range(0, 5):
        action = webdriver.common.action_chains.ActionChains(driver)
        action.move_to_element(main) \
            .click_and_hold(main) \
            .move_by_offset(118, 0) \
            .release(main) \
            .perform()


def move_to_next_point():
    '''
    Click one of the next point arrows, doesn't matter which one
    as long as it's the same one for a session of Selenium.
    '''
    next_point = driver.find_element_by_css_selector('[fill="black"]')
    action = webdriver.common.action_chains.ActionChains(driver)
    action.click(next_point).perform()


for _ in range(0, NUMBER_OF_SCREENSHOTS):
    screenshot_canvas()
    move_to_next_point()
    rotate_canvas()

driver.close()