Scrape Google’s Top 300 Results and Screenshot Them

Part of what I’ll do for any company’s SEO is create local business directory listings for them. To do this right, it’s important to understand how many listings already exist. Most businesses have fewer than 50 business listings; for bigger, older companies, it’s usually 200+. Either way, there is no such thing as too many business directories. They all help.

To find all live business directories, we can do a simple Google search. We enter the company’s name and its address:

Some Company Store "123 Fake St"

This will search Google for any mention of the company name AND filter that search to include only pages that mention this address. It’s a great way to get a sneak peek at how strong a company’s online presence really is. If you want to see more results on one page, you can add the modifier &num=100 to your search query in the address bar.

https://google.com/search?q=Some Company Store "123 Fake St"&num=100

Now you can get up to 100 results. But what if there are more than 100? Let’s just catch the first 300, since this kind of search rarely goes past that. We can query the next page of results by adding &start=100 to the search query; that’s page 2 when we’re pulling results by the hundred. Then &start=200 gets the third hundred. No, you cannot just type &num=300. The limit is 100 per query.

https://google.com/search?q=Some Company Store "123 Fake St"&num=100&start=100
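
If you want to see all three URLs before running anything, here’s a minimal sketch (Python, using the standard library’s quote_plus so the spaces and quotes survive the trip into the query string) that just prints them:

from urllib.parse import quote_plus

query = 'Some Company Store "123 Fake St"'

# three pages of up to 100 results each: start=0, start=100, start=200
for start in (0, 100, 200):
    print('https://www.google.com/search?q=' + quote_plus(query) + '&num=100&start=' + str(start))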

Ok cool. Now we’ve got hundreds of results. Let’s grab all of those URLs at once and dump them into a text file. This loosely written Python script will do the trick: it pulls the 100 URLs from one page of Google results, moves on to the next 100, and saves those too. Everything ends up in one text file, which we’ll feed to the screenshot grabber.

from urllib.parse import quote_plus
from time import sleep

import requests
from bs4 import BeautifulSoup as bs

query = 'Some Company Store "123 Fake St"'
urls = []

def parse_url(href):
    # Google links each result through a redirect like /url?q=https://example.com/page&sa=...
    # so keep only what sits between 'q=' and the first '&'.
    return href.split('&')[0].split('=', 1)[1]

for page_num in range(3):  # results 1-100, 101-200, 201-300
    url = ('https://www.google.ca/search?q=' + quote_plus(query)
           + '&num=100&start=' + str(page_num * 100))
    print('Searching URL: ' + url)
    response = requests.get(url)
    page = bs(response.text, "html.parser")
    for heading in page.find_all('h3'):
        try:
            url_to_add = parse_url(heading.parent['href'])
            urls.append(url_to_add)
            print('Saving URL: ' + url_to_add)
        except (KeyError, IndexError):
            continue
    print('10 second delay before searching again...')
    sleep(10)

urls = list(dict.fromkeys(urls))  # DE-DUPE, keeps first-seen order

with open('urls.txt', 'a') as file:
    for u in urls:
        print('Saving to File: ' + u)
        file.write(u + '\n')
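
One thing worth knowing about the parse_url step: each result link Google returns is wrapped in a redirect that looks like /url?q=<real URL>&sa=..., which is why the script slices the href apart instead of using it directly. A quick illustration (the href below is made up):

href = '/url?q=https://www.example-directory.com/some-company-store&sa=U'

# keep what comes after 'q=' and drop everything from the first '&' onward
print(href.split('&')[0].split('=', 1)[1])
# https://www.example-directory.com/some-company-store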
    

Screenshotting is actually really simple. We just set up a headless Chrome web driver with Selenium, visit each web page in the list, take a screenshot of the rendered page, and save it as a PNG.

from selenium import webdriver

# headless Chrome, so no browser window opens for every page
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
driver = webdriver.Chrome(options=chrome_options)

with open('urls.txt') as file:  # FILE NAME GOES HERE
    urls = [line.strip() for line in file if line.strip()]

for count, url in enumerate(urls):
    driver.get(url)
    driver.save_screenshot(str(count) + '.png')

driver.quit()
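
By default, save_screenshot only captures the current viewport, not the whole page. If you want full-page captures, one option is to measure the rendered page and grow the headless window to match before taking the shot. Here’s a sketch, assuming document.body.scrollHeight reports the page height sensibly, that you could swap in for the loop above:

for count, url in enumerate(urls):
    driver.get(url)
    # measure the page, then resize the window so the screenshot covers all of it
    height = driver.execute_script("return document.body.scrollHeight")
    driver.set_window_size(1280, height)
    driver.save_screenshot(str(count) + '.png')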

And just like that, we’ve got a big juicy list of web pages and a quick way to see what’s on each of them. It’s a fantastic way to speed up any kind of research. This could easily replace having to open hundreds of tabs to see if a web page has something you’re looking for.
