🛡 HAR File Web Scraper

Scrape Any Interactive Website Without Being Detected or Blocked


What is a HAR File

🗄 HAR (HTTP Archive) files contain the raw network data exchanged between your web browser and the servers it talks to while you surf the web. They are essentially "recordings" of the raw data your browser processes while you use your favorite websites. This makes them ideal for scraping, since they contain all the data sent to your browser as you interact with a given website.
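
Under the hood, a HAR file has a top-level "log" object whose "entries" array holds one request/response pair per network call. The minimal Python sketch below walks that structure and lists the method, status code and URL of each recorded request. The helper name summarize_entries is hypothetical, and the input is a trimmed-down copy of the sample HAR file shown further down this page, not a full recording.

```python
from typing import Any


def summarize_entries(har: dict[str, Any]) -> list[tuple[str, int, str]]:
    """Return (method, status, url) for every request recorded in a parsed HAR."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        rows.append((request.get("method", ""), response.get("status", 0), request.get("url", "")))
    return rows


# A trimmed-down HAR with one entry, modeled on the sample further down this page.
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {
                "request": {"method": "GET", "url": "https://stevesie.com/cloud/api/v1/search?query=data"},
                "response": {"status": 200},
            }
        ],
    }
}

for method, status, url in summarize_entries(sample_har):
    print(method, status, url)  # GET 200 https://stevesie.com/cloud/api/v1/search?query=data
```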

Web Scraping from HAR Files

🤖 Scraping websites using anything other than a legitimate web browser with a real human being behind the screen is becoming more and more difficult as companies have become very good at cracking down on screen scraping, even leveraging AI!

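🎥 Since a HAR file is a passive "recording" of the raw data already sent to your browser while you use a website normally, in accordance with its Terms of Service, the recording happens entirely on your own computer. The website only sees ordinary browsing, so it has no practical way to detect that you are recording, and you are simply saving data you were already given access to.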

⚖️ Scraping websites with HAR files is therefore a very low-risk approach: there is no automated traffic for the site to detect or block, and you will never waste time on broken web scraping software or proxies. The downside is that someone must manually browse the website to record its traffic to a HAR file, but you can easily outsource this to a virtual assistant.

How to Generate a HAR File

💻 You'll need to open your web browser's developer tools to generate a HAR file. Simply right-click on a web page and choose "Inspect" from the context menu. This opens developer tools, which begin recording your web traffic.

🔍 Once developer tools are recording, make sure the data you want actually loads in your browser: reload the web page if needed so the data arrives while you're recording, and click and scroll around the website to load in more data. To export the recorded traffic to a HAR file, select the "Network" tab in developer tools, then click the down-arrow icon labeled "Export HAR..."

Scraping Interactive Data

🧲 Most modern websites, such as Instagram and Airbnb, have AJAX-style interfaces for interacting with their data. Think of the Airbnb map that dynamically loads listings as you pan and zoom around it. This data is sent to your browser in JSON format and only then rendered as HTML. HAR file web scraping captures the raw JSON before it's rendered to HTML, which makes the data very easy to scrape.
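
Here is a minimal Python sketch of grabbing that raw JSON from a parsed HAR file. It assumes a Chrome export, because it filters on "_resourceType" ("xhr" or "fetch"), a Chrome-specific extension field that appears in the sample below but is not part of the core HAR format; other browsers may omit it. The helper name ajax_json_responses is hypothetical, and the sketch skips base64-encoded bodies and does not guard against malformed JSON.

```python
import json
from typing import Any

# Chrome adds a non-standard "_resourceType" field to each HAR entry; "xhr" and
# "fetch" mark requests made by the page's own JavaScript (the AJAX calls).
AJAX_TYPES = {"xhr", "fetch"}


def ajax_json_responses(har: dict[str, Any]) -> list[tuple[str, Any]]:
    """Return (url, decoded JSON) for each AJAX response with a JSON body."""
    results = []
    for entry in har.get("log", {}).get("entries", []):
        if entry.get("_resourceType") not in AJAX_TYPES:
            continue
        content = entry.get("response", {}).get("content", {})
        text = content.get("text")
        # Skip non-JSON bodies, empty bodies and (in this sketch) base64 bodies.
        if (
            "json" not in content.get("mimeType", "")
            or not text
            or content.get("encoding") == "base64"
        ):
            continue
        # The body is stored as a JSON *string* inside the HAR's own JSON,
        # so it needs a second json.loads() to become Python objects.
        results.append((entry.get("request", {}).get("url", ""), json.loads(text)))
    return results


# One entry trimmed from the sample HAR file shown further down this page.
sample_har = {"log": {"entries": [{
    "_resourceType": "xhr",
    "request": {"url": "https://stevesie.com/cloud/api/v1/search?query=data"},
    "response": {"content": {
        "mimeType": "application/json",
        "text": '{"objects": [{"type": "app", "name": "UPC Database"}, '
                '{"type": "app", "name": "Opendatasoft"}]}',
    }},
}]}}

for url, payload in ajax_json_responses(sample_har):
    print(url, [obj["name"] for obj in payload["objects"]])
```

Run as-is, this prints the search URL followed by ['UPC Database', 'Opendatasoft'], the listing names exactly as the server sent them, with no HTML parsing involved.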

🚫 Older or static websites that only show data as HTML will not work well with HAR file web scraping, as the data is "baked" into the HTML and thus more difficult to scrape. For these scenarios, we suggest using a more traditional screen scraper or building your own Python web scraper using a library like Beautiful Soup.

How to Open a HAR File

📝 HAR files are simply text files containing JSON under the hood, so you can open them in any text editor or with any programming language or tool, such as Python. You can also open your HAR file using this webpage: just drag and drop the file here! Below is a sample HAR file showing a single network request that dynamically loads search results from our website.

{
  "log": {
    "version": "1.2",
    "creator": {
      "name": "WebInspector",
      "version": "537.36"
    },
    "pages": [
      {
        "startedDateTime": "2024-07-07T15:23:20.232Z",
        "id": "page_1",
        "title": "https://stevesie.com/apps",
        "pageTimings": {
          "onContentLoad": 1038.0420000001322,
          "onLoad": 1491.081000000122
        }
      }
    ],
    "entries": [
      {
        "_initiator": {
          "type": "script",
          "stack": {
            "callFrames": [
              {
                "functionName": "",
                "scriptId": "106",
                "url": "https://stevesie.com/static/web/app/base_brochure.50224e7848640dbcecca.bundle.js",
                "lineNumber": 0,
                "columnNumber": 59287
              },
              {
                "functionName": "",
                "scriptId": "106",
                "url": "https://stevesie.com/static/web/app/base_brochure.50224e7848640dbcecca.bundle.js",
                "lineNumber": 0,
                "columnNumber": 49672
              },
              {
                "functionName": "send",
                "scriptId": "112",
                "url": "https://stevesie.com/static/web/app/vendors~bootstrap~bootstrap-slider~devbridge-autocomplete~jquery.50224e7848640dbcecca.bundle.js",
                "lineNumber": 24,
                "columnNumber": 78905
              },
              {
                "functionName": "ajax",
                "scriptId": "112",
                "url": "https://stevesie.com/static/web/app/vendors~bootstrap~bootstrap-slider~devbridge-autocomplete~jquery.50224e7848640dbcecca.bundle.js",
                "lineNumber": 24,
                "columnNumber": 74504
              },
              {
                "functionName": "getSuggestions",
                "scriptId": "119",
                "url": "https://stevesie.com/static/web/app/vendors~devbridge-autocomplete.50224e7848640dbcecca.bundle.js",
                "lineNumber": 0,
                "columnNumber": 7869
              },
              {
                "functionName": "onValueChange",
                "scriptId": "119",
                "url": "https://stevesie.com/static/web/app/vendors~devbridge-autocomplete.50224e7848640dbcecca.bundle.js",
                "lineNumber": 0,
                "columnNumber": 6642
              },
              {
                "functionName": "onKeyUp",
                "scriptId": "119",
                "url": "https://stevesie.com/static/web/app/vendors~devbridge-autocomplete.50224e7848640dbcecca.bundle.js",
                "lineNumber": 0,
                "columnNumber": 6194
              },
              {
                "functionName": "",
                "scriptId": "119",
                "url": "https://stevesie.com/static/web/app/vendors~devbridge-autocomplete.50224e7848640dbcecca.bundle.js",
                "lineNumber": 0,
                "columnNumber": 3239
              },
              {
                "functionName": "dispatch",
                "scriptId": "112",
                "url": "https://stevesie.com/static/web/app/vendors~bootstrap~bootstrap-slider~devbridge-autocomplete~jquery.50224e7848640dbcecca.bundle.js",
                "lineNumber": 24,
                "columnNumber": 39256
              },
              {
                "functionName": "v.handle",
                "scriptId": "112",
                "url": "https://stevesie.com/static/web/app/vendors~bootstrap~bootstrap-slider~devbridge-autocomplete~jquery.50224e7848640dbcecca.bundle.js",
                "lineNumber": 24,
                "columnNumber": 37251
              },
              {
                "functionName": "r",
                "scriptId": "106",
                "url": "https://stevesie.com/static/web/app/base_brochure.50224e7848640dbcecca.bundle.js",
                "lineNumber": 0,
                "columnNumber": 46156
              }
            ]
          }
        },
        "_priority": "High",
        "_resourceType": "xhr",
        "cache": {},
        "connection": "440506",
        "pageref": "page_3",
        "request": {
          "method": "GET",
          "url": "https://stevesie.com/cloud/api/v1/search?query=data",
          "httpVersion": "HTTP/1.1",
          "headers": [
            {
              "name": "Accept",
              "value": "application/json, text/javascript, */*; q=0.01"
            },
            {
              "name": "Accept-Encoding",
              "value": "gzip, deflate, br, zstd"
            },
            {
              "name": "Accept-Language",
              "value": "en-US,en;q=0.9"
            },
            {
              "name": "Cache-Control",
              "value": "no-cache"
            },
            {
              "name": "Connection",
              "value": "keep-alive"
            },
            {
              "name": "Host",
              "value": "stevesie.com"
            },
            {
              "name": "Pragma",
              "value": "no-cache"
            },
            {
              "name": "Referer",
              "value": "https://stevesie.com/apps"
            },
            {
              "name": "Sec-Fetch-Dest",
              "value": "empty"
            },
            {
              "name": "Sec-Fetch-Mode",
              "value": "cors"
            },
            {
              "name": "Sec-Fetch-Site",
              "value": "same-origin"
            },
            {
              "name": "User-Agent",
              "value": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
            },
            {
              "name": "X-Requested-With",
              "value": "XMLHttpRequest"
            },
            {
              "name": "sec-ch-ua",
              "value": "\"Not/A)Brand\";v=\"8\", \"Chromium\";v=\"126\", \"Google Chrome\";v=\"126\""
            },
            {
              "name": "sec-ch-ua-mobile",
              "value": "?0"
            },
            {
              "name": "sec-ch-ua-platform",
              "value": "\"macOS\""
            }
          ],
          "queryString": [
            {
              "name": "query",
              "value": "data"
            }
          ],
          "headersSize": 1277,
          "bodySize": 0
        },
        "response": {
          "status": 200,
          "statusText": "OK",
          "httpVersion": "HTTP/1.1",
          "headers": [
            {
              "name": "Connection",
              "value": "keep-alive"
            },
            {
              "name": "Content-Length",
              "value": "1866"
            },
            {
              "name": "Content-Type",
              "value": "application/json"
            },
            {
              "name": "Date",
              "value": "Sun, 07 Jul 2024 15:26:26 GMT"
            },
            {
              "name": "Server",
              "value": "nginx/1.10.3"
            },
            {
              "name": "Vary",
              "value": "Cookie"
            },
            {
              "name": "X-Frame-Options",
              "value": "SAMEORIGIN"
            }
          ],
          "cookies": [],
          "content": {
            "size": 1866,
            "mimeType": "application/json",
            "compression": 0,
            "text": "{\"objects\": [{\"id\": \"7a00cd96-d34c-4412-87e6-f6833b3de7a3\", \"type\": \"app\", \"name\": \"UPC Database\", \"slug\": \"upc-database\"}, {\"id\": \"ac2255df-da33-47f9-b1df-f8ecc806bcf2\", \"type\": \"app\", \"name\": \"Opendatasoft\", \"slug\": \"opendatasoft\"}, {\"id\": \"91f4ba55-7945-4f15-9bf7-0395e25694c8\", \"type\": \"endpoint\", \"name\": \"Amazon Product Data\", \"app_name\": \"Rainforest\", \"app_slug\": \"rainforest\", \"slug\": \"amazon-product-data\"}, {\"id\": \"beb551fe-61fc-4406-a199-8a90bc66669a\", \"type\": \"endpoint\", \"name\": \"Keyword Data\", \"app_name\": \"SEOmonitor\", \"app_slug\": \"seomonitor\", \"slug\": \"keyword-data\"}, {\"id\": \"936db7e5-139c-406d-a4da-6bc4afdcb502\", \"type\": \"endpoint\", \"name\": \"Instagram Location Metadata\", \"app_name\": \"SocialScrape\", \"app_slug\": \"socialscrape\", \"slug\": \"instagram-location-metadata\"}, {\"id\": \"1ccbe76c-1f43-4a84-872f-6a431f11529b\", \"type\": \"endpoint\", \"name\": \"Instagram Hashtag Metadata\", \"app_name\": \"SocialScrape\", \"app_slug\": \"socialscrape\", \"slug\": \"instagram-hashtag-metadata\"}, {\"id\": \"54cc5b77-365c-437b-9ba3-9bb83d29b83a\", \"type\": \"endpoint\", \"name\": \"Target Product Data\", \"app_name\": \"RedCircle\", \"app_slug\": \"redcircle\", \"slug\": \"target-product-data\"}, {\"id\": \"ce591a2a-2c20-4230-b855-1a48953f667e\", \"type\": \"formula\", \"name\": \"Countdown eBay Product Data - Pagination\", \"slug\": \"countdown-ebay-product-data-pagination\"}, {\"id\": \"29ce848a-ef01-4553-a931-804e26da5c27\", \"type\": \"article\", \"name\": \"\\u2696\\ufe0f Is Data Scraping Legal?\", \"slug\": \"is-data-scraping-legal\"}, {\"id\": \"c180258e-d73c-4596-b7b3-941797392336\", \"type\": \"article\", \"name\": \"Scrape Netflix Most Watched Shows & Movies Data\", \"slug\": \"scrape-netflix-historical-viewership-data-shows-movies\"}, {\"id\": \"0a761f36-dd4b-4870-93d1-17c560f8fce6\", \"type\": \"article\", \"name\": \"\\ud83c\\udf7f Visualizing Netflix Catalog Data from Guidebox\", \"slug\": \"visualize-netflix-catalog-data-guidebox\"}]}"
          },
          "redirectURL": "",
          "headersSize": 199,
          "bodySize": 1866,
          "_transferSize": 2065,
          "_error": null
        },
        "serverIPAddress": "159.89.40.197",
        "startedDateTime": "2024-07-07T15:26:26.032Z",
        "time": 294.1140000025472,
        "timings": {
          "blocked": 0.5910000003897585,
          "dns": 0.0040000000000000036,
          "ssl": 73.09400000000001,
          "connect": 143.787,
          "send": 2.1469999999999914,
          "wait": 145.7370000022757,
          "receive": 1.847999999881722,
          "_blocked_queueing": 0.4310000003897585
        }
      }
    ]
  }
}
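
To open a HAR file like the one above in Python, the standard json module is enough. The sketch below (hypothetical helpers parse_har_text and load_har) also checks for the "log" → "entries" layout so that a non-HAR JSON file fails loudly. It reads with the "utf-8-sig" codec because the HAR 1.2 format requires UTF-8 and tells readers to ignore a byte-order mark if one is present.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def parse_har_text(text: str) -> dict[str, Any]:
    """Parse HAR text and check it has the expected log.entries layout."""
    har = json.loads(text)
    log = har.get("log") if isinstance(har, dict) else None
    if not isinstance(log, dict) or not isinstance(log.get("entries"), list):
        raise ValueError("not a HAR file: missing log.entries")
    return har


def load_har(path: str | Path) -> dict[str, Any]:
    """Read a .har file from disk; "utf-8-sig" also strips an optional BOM."""
    return parse_har_text(Path(path).read_text(encoding="utf-8-sig"))
```

Usage, with a hypothetical file name: har = load_har("recording.har"), after which len(har["log"]["entries"]) tells you how many network requests were captured.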
  

💥 The interesting parts of a HAR file are in the "entries" section, where each item contains one network request and its response. For scraping, you'll want to look at the responses from the web servers, especially those containing raw JSON with the information you'd like to scrape. You can extract this data with your own scripts, or simply drag and drop your HAR file here to download any detected JSON blocks for free!
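
If you'd rather write that script yourself, here is a minimal Python sketch under a few stated assumptions: a response counts as JSON if its "mimeType" mentions "json"; bodies flagged with "encoding": "base64" (which the HAR format allows) are decoded first; bodies that fail to parse are skipped; and responses are grouped by scheme, host and path with the query string dropped, so repeated calls to one API endpoint land together. That grouping rule is our own illustrative choice, not necessarily how this page's tool forms its request groups, and response_text and json_blocks_by_endpoint are hypothetical names.

```python
from __future__ import annotations

import base64
import json
from collections import defaultdict
from typing import Any
from urllib.parse import urlsplit


def response_text(content: dict[str, Any]) -> str | None:
    """Return a HAR response body as text, decoding base64 if flagged."""
    text = content.get("text")
    if text is not None and content.get("encoding") == "base64":
        text = base64.b64decode(text).decode("utf-8", errors="replace")
    return text


def json_blocks_by_endpoint(har: dict[str, Any]) -> dict[str, list[Any]]:
    """Group every decodable JSON response body by scheme://host/path."""
    groups: dict[str, list[Any]] = defaultdict(list)
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {})
        if "json" not in content.get("mimeType", "").lower():
            continue
        text = response_text(content)
        if not text:
            continue
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            continue  # truncated or mislabeled body: skip it
        parts = urlsplit(entry.get("request", {}).get("url", ""))
        # Dropping the query string puts repeated calls to one endpoint together.
        groups[f"{parts.scheme}://{parts.netloc}{parts.path}"].append(payload)
    return dict(groups)


# Two entries: a trimmed copy of the sample search request, plus a made-up
# second query whose body is base64-encoded, to exercise both code paths.
body = '{"objects": [{"name": "UPC Database"}]}'
sample_har = {"log": {"entries": [
    {"request": {"url": "https://stevesie.com/cloud/api/v1/search?query=data"},
     "response": {"content": {"mimeType": "application/json", "text": body}}},
    {"request": {"url": "https://stevesie.com/cloud/api/v1/search?query=upc"},
     "response": {"content": {"mimeType": "application/json", "encoding": "base64",
                              "text": base64.b64encode(body.encode()).decode()}}},
]}}

for endpoint, payloads in json_blocks_by_endpoint(sample_har).items():
    print(endpoint, len(payloads))  # https://stevesie.com/cloud/api/v1/search 2
```

From here, each group's list of payloads can be written out with json.dump or flattened into rows for a spreadsheet.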

HAR File Security

❌ Never share your HAR file with an untrusted third party, as HAR files may contain sensitive fields like cookies. For example, if you share a HAR file recorded while browsing Instagram logged in, the other party can see your cookies and attempt to access your Instagram account. While modern websites like Instagram can often block these attempts, you may still be compromised, so always be careful!
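
Before handing a HAR file to anyone, you can strip the most obvious credentials yourself. The Python sketch below (hypothetical sanitize_har helper) returns a copy in which every request and response "cookies" list is emptied and Cookie, Set-Cookie, Authorization and Proxy-Authorization headers are removed. That header list is an assumption and not exhaustive: tokens can also hide in URLs, query strings, POST bodies and response content, so treat the result as reduced risk rather than a guarantee.

```python
import copy
from typing import Any

# Lower-cased header names that commonly carry credentials. This set is an
# assumption for illustration; extend it for the sites you record.
SENSITIVE_HEADERS = {"cookie", "set-cookie", "authorization", "proxy-authorization"}


def sanitize_har(har: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the HAR with cookies and credential headers removed."""
    clean = copy.deepcopy(har)  # leave the caller's original untouched
    for entry in clean.get("log", {}).get("entries", []):
        for side in ("request", "response"):
            message = entry.get(side)
            if not isinstance(message, dict):
                continue
            # HAR 1.2 requires both arrays, so empty/filter them rather than delete.
            message["cookies"] = []
            message["headers"] = [
                header for header in message.get("headers", [])
                if header.get("name", "").lower() not in SENSITIVE_HEADERS
            ]
    return clean


# Placeholder values only; a real recording would hold live session tokens here.
recorded = {"log": {"entries": [{
    "request": {
        "url": "https://example.com/",
        "headers": [{"name": "Cookie", "value": "session=PLACEHOLDER"},
                    {"name": "Accept", "value": "*/*"}],
        "cookies": [{"name": "session", "value": "PLACEHOLDER"}],
    },
    "response": {
        "headers": [{"name": "Set-Cookie", "value": "session=PLACEHOLDER"}],
        "cookies": [{"name": "session", "value": "PLACEHOLDER"}],
    },
}]}}

safe = sanitize_har(recorded)
print(safe["log"]["entries"][0]["request"]["headers"])  # [{'name': 'Accept', 'value': '*/*'}]
```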

🔒 Due to the sensitive nature of HAR files, our HAR file web scraper processes the raw HAR file entirely in your browser: the file is never uploaded to our servers, so we never access it. The only time data from your HAR file is sent to our servers is when you click the "Parse" button on a request group, and even then only the network response data is sent, not sensitive data like cookies.