
SERP API Google Scholar API Scraper

Download Data to Excel & CSV Files

Written by Steve Spagnola
Last updated May 2, 2024

Scraping the Google Scholar API

If you’re looking to scrape Google Scholar search results, unfortunately no official API exists. However, you can use the unofficial SerpApi Google Scholar API to run searches and receive structured Google Scholar data in response.

Scrape Google Scholar Articles

We’ll walk through querying the SerpApi Search Endpoint for Google Scholar search results with our service, so you can see how the API works and what options you have, whether you access it through our scraping platform or directly from your own software.

Read on below, or check out the video above for a full tutorial on how to use the API.

1. Get a Serp API Key

To use SerpApi, you’ll need to sign up for a free SerpApi account, which includes 100 free searches per month (20 Google Scholar results per search) so you can test the service out. Once you have an account, go to your SerpApi API Key page and copy the value so you can use it with our service.


2. Run Queries

Using the green box on this page, you can run an initial search on Google Scholar. Enter any search term you would otherwise use on Google Scholar, including any advanced operators you need. We’ll query SerpApi on your behalf (using the SerpApi key you provide) for Google Scholar results and extract the publications, authors, and links to PDFs and publications.
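If you would rather call the API directly from your own software, the request can be sketched roughly as follows. This is a minimal sketch, not our service's implementation: it assumes the SerpApi search endpoint at https://serpapi.com/search.json and the engine, q, api_key, and start parameters, so check the names against the SerpApi documentation before relying on them. The helper names build_scholar_url and fetch_json are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed SerpApi search endpoint; verify against the SerpApi docs.
SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_scholar_url(query, api_key, start=0):
    """Build a Google Scholar search URL (hypothetical helper)."""
    params = {
        "engine": "google_scholar",  # selects the Google Scholar engine
        "q": query,                  # same search term you would type into Google Scholar
        "api_key": api_key,          # your SerpApi key from step 1
        "start": start,              # result offset, 0 for the first page
    }
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


def fetch_json(url):
    """Perform the HTTP request and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


# Building the URL needs no network access; advanced operators such as
# quoted phrases are URL-encoded automatically.
url = build_scholar_url('"graph neural networks"', "YOUR_API_KEY")
print(url)
```

Calling fetch_json(url) with a real key would use one of your monthly searches, so the example above only builds and prints the URL.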


3. Download Data

You’ll then see the first 20 Google Scholar results, broken out by result type into separate collections and de-normalized when an article has more than one value (for example, several authors). You can also download the results as JSON from our service on the free forever tier.

Articles

The organic_results collection will contain rows that each represent an article scraped from Google Scholar.


Each article row includes the following fields from the response data:

  • Title
  • Position
  • Result ID
  • Link URL
  • Text Snippet
  • Publication Info Snippet
  • Citation Count
  • Versions Count
  • First Author
  • First Resource
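To show how the fields above might map onto the raw JSON, here is a rough sketch that pulls them out of a single organic_results entry. The key names (publication_info.summary, inline_links.cited_by.total, and so on) reflect our understanding of the SerpApi response shape and are assumptions to verify against a real response. The sample article is a made-up placeholder, not real data, and summarize_article is a hypothetical helper.

```python
def summarize_article(result):
    """Flatten one organic_results entry into the fields listed above."""
    info = result.get("publication_info", {})
    links = result.get("inline_links", {})
    authors = info.get("authors", [])
    resources = result.get("resources", [])
    return {
        "title": result.get("title"),
        "position": result.get("position"),
        "result_id": result.get("result_id"),
        "link": result.get("link"),
        "snippet": result.get("snippet"),
        "publication_info": info.get("summary"),
        "citation_count": links.get("cited_by", {}).get("total"),
        "versions_count": links.get("versions", {}).get("total"),
        # Only the first author and resource fit on the article row;
        # the full lists live in their own collections.
        "first_author": authors[0].get("name") if authors else None,
        "first_resource": resources[0].get("link") if resources else None,
    }


# Placeholder entry shaped like an assumed SerpApi result (not real data).
sample_result = {
    "position": 0,
    "title": "Example Paper Title",
    "result_id": "EXAMPLE_ID",
    "link": "https://example.org/paper",
    "snippet": "An example snippet ...",
    "publication_info": {
        "summary": "A Author, B Author - Example Journal, 2020",
        "authors": [{"name": "A Author", "author_id": "AUTHOR_1"}],
    },
    "resources": [{"title": "example.org", "file_format": "PDF",
                   "link": "https://example.org/paper.pdf"}],
    "inline_links": {"cited_by": {"total": 12, "cites_id": "123"},
                     "versions": {"total": 3}},
}

summary = summarize_article(sample_result)
print(summary["title"], summary["citation_count"])
```

Using .get() with defaults throughout means an article with no citations, authors, or resources yields None for those fields instead of raising an error.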

Authors

Since each article can have more than one author, our service automatically separates authors out into their own collection. Each page returns 20 articles, so this collection will typically contain more than 20 rows.


If you click “Download CSV” on the collection, the right-most columns of the file will contain a reference to each row’s parent article. This way you can see exactly which author wrote which article, or ingest the data into a relational database with the parent-child relationship intact.
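The de-normalization described above can be approximated in a few lines if you are working with the raw JSON yourself. This is only a sketch of the idea, not our service's actual code: flatten_authors and the parent_* column names are hypothetical, and the publication_info.authors key is an assumption about the SerpApi response. The sample articles are placeholders.

```python
def flatten_authors(organic_results):
    """Emit one row per (article, author) pair with a parent reference."""
    rows = []
    for article in organic_results:
        authors = article.get("publication_info", {}).get("authors", [])
        for author in authors:
            rows.append({
                "author_name": author.get("name"),
                "author_id": author.get("author_id"),
                # Parent columns go last, mirroring the CSV layout
                # where they appear on the right-hand side.
                "parent_result_id": article.get("result_id"),
                "parent_title": article.get("title"),
            })
    return rows


# Placeholder articles (not real data): two authors, then one author.
articles = [
    {"result_id": "R1", "title": "First Example",
     "publication_info": {"authors": [
         {"name": "A Author", "author_id": "A1"},
         {"name": "B Author", "author_id": "B1"}]}},
    {"result_id": "R2", "title": "Second Example",
     "publication_info": {"authors": [
         {"name": "C Author", "author_id": "C1"}]}},
]

author_rows = flatten_authors(articles)
for row in author_rows:
    print(row["author_name"], "->", row["parent_result_id"])
```

Because every row carries parent_result_id, the rows can be loaded into an authors table with a foreign key back to the articles table.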

Resources

Similar to authors, each article can have zero or more resources, such as links to downloadable PDF files or HTML versions of the paper when it is not behind a paywall. These are available in the organic_results › resources collection.


As with authors, if you click “Download CSV” you’ll get a CSV file whose right-most columns reference the parent article for each resource row.
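For readers producing a similar file on their own, the sketch below writes resource rows to CSV with the parent article columns on the right. It assumes each resource has title, file_format, and link keys, which is our understanding of the SerpApi response and should be verified. The function name resources_to_csv and the parent_* column names are hypothetical, and the sample articles are placeholders.

```python
import csv
import io


def resources_to_csv(organic_results):
    """Return CSV text with one row per resource, parent columns last."""
    fieldnames = ["title", "file_format", "link",
                  "parent_result_id", "parent_title"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for article in organic_results:
        # Articles behind a paywall may have no resources at all.
        for resource in article.get("resources", []):
            writer.writerow({
                "title": resource.get("title"),
                "file_format": resource.get("file_format"),
                "link": resource.get("link"),
                "parent_result_id": article.get("result_id"),
                "parent_title": article.get("title"),
            })
    return buffer.getvalue()


# Placeholder articles (not real data): one with a PDF, one with nothing.
articles = [
    {"result_id": "R1", "title": "First Example",
     "resources": [{"title": "example.org", "file_format": "PDF",
                    "link": "https://example.org/first.pdf"}]},
    {"result_id": "R2", "title": "Paywalled Example"},
]

csv_text = resources_to_csv(articles)
print(csv_text)
```

An article without resources simply contributes no rows, which matches the "zero or more" behavior described above.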

Pagination

SerpApi offers a start parameter that sets the result offset, so you can retrieve multiple pages of a search and download all of the results available from Google Scholar.

You can increase this value by 20 manually and download CSV files for each page of results. Alternatively, use our Google Scholar Search - Pagination Workflow to query the API automatically, increasing the offset by 20 each time and combining the results into a single CSV file for each of the three collections mentioned above.
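The offset logic can be sketched as a simple loop. This is an illustration of the idea rather than the workflow's actual implementation: fetch_all_results is a hypothetical helper that takes any function returning a parsed response for a given start value, and the stopping rules (an empty or short page, or a max_pages cap) are our own assumptions. A fake in-memory fetcher stands in for the API so the example runs without a key.

```python
def fetch_all_results(fetch_page, page_size=20, max_pages=5):
    """Collect organic_results across pages by stepping the start offset."""
    all_results = []
    for page in range(max_pages):
        start = page * page_size  # 0, 20, 40, ...
        response = fetch_page(start)
        results = response.get("organic_results", [])
        if not results:
            break  # no more results available
        all_results.extend(results)
        if len(results) < page_size:
            break  # a short page means this was the last one
    return all_results


# Fake data source standing in for the API: 45 placeholder articles.
FAKE_ARTICLES = [{"result_id": f"R{i}"} for i in range(45)]
requested_offsets = []


def fake_fetch_page(start):
    requested_offsets.append(start)
    return {"organic_results": FAKE_ARTICLES[start:start + 20]}


combined = fetch_all_results(fake_fetch_page)
print(len(combined), requested_offsets)
```

The max_pages cap is worth keeping in a real run, since each page counts as one search against your monthly quota.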

Article Citations

You can also use the Google Scholar cites feature through the API to see exactly which papers cite a specific paper. To do this, pass the cites search parameter, set to the value from the cites_id field of the article whose citing papers you want to scrape.
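As a rough illustration, a citing-papers request could be built like this. The sketch assumes the cites_id value sits under inline_links.cited_by in each article and that the endpoint is https://serpapi.com/search.json; both are assumptions to check against a real response and the SerpApi documentation. The helper name build_cites_url is hypothetical and the sample article is a placeholder.

```python
from urllib.parse import urlencode

# Assumed SerpApi search endpoint; verify against the SerpApi docs.
SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_cites_url(article, api_key, start=0):
    """Build a URL that lists papers citing the given article."""
    cited_by = article.get("inline_links", {}).get("cited_by", {})
    cites_id = cited_by.get("cites_id")
    if cites_id is None:
        # Articles without citations may not carry a cites_id at all.
        raise ValueError("article has no cites_id")
    params = {
        "engine": "google_scholar",
        "cites": cites_id,   # takes the place of a regular search term
        "api_key": api_key,
        "start": start,      # citing papers can be paginated too
    }
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


# Placeholder article (not real data).
article = {"title": "Example Paper Title",
           "inline_links": {"cited_by": {"total": 12, "cites_id": "123456789"}}}

cites_url = build_cites_url(article, "YOUR_API_KEY")
print(cites_url)
```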

Unreliable Alternatives

You’ll see a few alternative screen scrapers and open source projects that attempt to scrape Google Scholar directly from your computer. This is a very bad idea, as it risks getting your IP address or Google Account banned for automated access to Google’s search engine.