
Yelp Web Scraper

  • Writer: Atharva Anil Dastane
  • Nov 16, 2022
  • 1 min read

Updated: Nov 30, 2022

Introduction

What is Web Scraping?

Web scraping is an automated method of obtaining large amounts of data from websites. Most of this data is unstructured HTML, which is then converted into structured data in a spreadsheet or a database so that it can be used in various applications.

What is Yelp?
Yelp's website, Yelp.com, is a crowd-sourced local business review and social networking site. The site has pages devoted to individual locations, such as restaurants or schools, where Yelp users can submit a review of their products or services using a one to five star rating scale. Businesses can also update contact information, hours, and other basic listing information or add special deals. In addition to writing reviews, users can react to reviews, plan events, or discuss their personal lives.

Problem Statement

Nowadays data is rarely easy to come by, and even when we do acquire it, it seldom arrives in a clean, usable format. I wanted to look at some of the best restaurants in different metropolitan cities in the US on yelp.com, and I also wanted to see what people say about those restaurants based on their experiences. Instead of hopping back and forth between webpages, I wanted to collect all the restaurants and their reviews in one go. Here is my notebook, which gives you a detailed overview of scraping Yelp data into an Excel file.


Beautiful Soup

Beautiful Soup is a Python library for getting data out of HTML, XML, and other markup languages. Say you’ve found some webpages that display data relevant to your research, such as date or address information, but that do not provide any way of downloading the data directly. Beautiful Soup helps you pull particular content from a webpage, remove the HTML markup, and save the information. It is a tool for web scraping that helps you clean up and parse the documents you have pulled down from the web.
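As a quick illustration of the idea (a toy HTML snippet, not Yelp markup):

from bs4 import BeautifulSoup

html = "<div class='biz'><a href='/biz/example'>Example Diner</a> <span>4.5 stars</span></div>"
soup = BeautifulSoup(html, "html.parser")

name = soup.find("a").get_text()        # "Example Diner"
rating = soup.find("span").get_text()   # "4.5 stars"
print(name, "-", rating)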


Importing necessary libraries

  • csv - to write all the restaurant and review data to CSV files.

  • requests - to send HTTP requests from Python.

  • re - for regular expressions, if needed.

  • numpy, pandas - for basic data manipulation.
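
As a minimal sketch, the imports might look like the following; bs4 (Beautiful Soup) is also imported here, since it does the HTML parsing described above.

import csv          # write restaurant and review data to CSV files
import re           # regular expressions, if needed for cleaning text
import requests     # send HTTP requests to Yelp

import numpy as np             # basic numerical helpers
import pandas as pd            # tabular data manipulation / export
from bs4 import BeautifulSoup  # parse the downloaded HTML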


Creation of 10 URLs to scrape the 100 best restaurants from a particular city
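
The post shows this step as an image, so here is a rough sketch of how the 10 search URLs could be built. It assumes Yelp's search results are paginated with a start query parameter in steps of 10 results per page; the city and state values are examples, and build_search_urls is an illustrative name, not the author's exact code.

def build_search_urls(city, state, pages=10):
    """Build the paginated Yelp search URLs for a city's restaurants."""
    base = "https://www.yelp.com/search"
    urls = []
    for page in range(pages):
        # 10 results per page, so start = 0, 10, ..., 90 covers the top 100.
        urls.append(
            f"{base}?find_desc=Restaurants"
            f"&find_loc={city}%2C+{state}"
            f"&start={page * 10}"
        )
    return urls

boston_urls = build_search_urls("Boston", "MA")   # e.g. the first page of results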



Web Scraping Restaurants and their Reviews


Function to get the Soup Data

  • The function takes in the URL to be scraped.

  • We create dummy request headers so that Yelp does not block repeated requests from the same IP address.

  • We send the HTTP request through requests.get(url) and store the result in response (see the sketch below).
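
A minimal sketch of such a function is shown below; the User-Agent string and the name get_soup are illustrative assumptions, not the author's exact code.

import requests
from bs4 import BeautifulSoup

# Dummy browser-like headers so Yelp is less likely to reject the request.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0 Safari/537.36"
    )
}

def get_soup(url):
    """Send an HTTP GET request and return the parsed HTML as a soup object."""
    response = requests.get(url, headers=HEADERS)
    response.raise_for_status()   # fail loudly on 4xx/5xx responses
    return BeautifulSoup(response.text, "html.parser")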

Function to get the HTML tag and its class
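
Such a helper would most likely wrap Beautiful Soup's find_all; the function name and the example class value below are placeholders.

def get_tag_elements(soup, tag, class_name):
    """Return every element matching the given HTML tag and CSS class."""
    return soup.find_all(tag, class_=class_name)

# Usage (placeholder class name - Yelp's generated class names change often
# and have to be read from the current page source):
# cards = get_tag_elements(soup, "div", "RESTAURANT-CARD-CLASS")
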
Main Function to be executed to generate the final Excel files
Function to get Restaurant Attributes
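
For illustration only: the selectors below are placeholders, since Yelp's generated class names change frequently, but the overall shape of collecting the name, rating, and page URL for each listing could look like this.

def get_restaurant_attributes(soup):
    """Collect basic attributes for every restaurant card on a search page."""
    restaurants = []
    # Placeholder selectors: inspect the live page for Yelp's current class names.
    for card in soup.find_all("div", class_="RESTAURANT-CARD-CLASS"):
        name_tag = card.find("a", class_="BUSINESS-NAME-CLASS")
        rating_tag = card.find("div", class_="RATING-CLASS")
        restaurants.append({
            "name": name_tag.get_text(strip=True) if name_tag else None,
            "rating": rating_tag.get("aria-label") if rating_tag else None,
            "url": ("https://www.yelp.com" + name_tag["href"]) if name_tag else None,
        })
    return restaurants
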
Function to generate CSV for Restaurant Attributes/Info
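
Writing the collected attributes out with the csv module could look like the sketch below; the file name and field names are assumptions that match the attribute sketch above.

import csv

def write_restaurants_csv(restaurants, filename="restaurants.csv"):
    """Write a list of restaurant dictionaries to a CSV file."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "rating", "url"])
        writer.writeheader()
        writer.writerows(restaurants)
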
Function to get the Review Data for every Restaurant
Function to get the list of URLs containing Reviews
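
Review pages appear to be paginated the same way as search results, so a sketch of building a restaurant's review-page URLs might look like this (the assumption of 10 reviews per start step is mine, not stated in the post).

def build_review_urls(restaurant_url, pages=5):
    """Build paginated review-page URLs for a single restaurant."""
    return [f"{restaurant_url}?start={page * 10}" for page in range(pages)]
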
Function to get the Reviews and Ratings of the Customers
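
Extracting the review text and star rating from each review block would follow the same find/find_all pattern; once again, the class names below are placeholders.

def get_reviews_and_ratings(soup):
    """Extract the review text and star rating from each review on a page."""
    reviews = []
    # Placeholder selectors: check the live review page for current class names.
    for review in soup.find_all("li", class_="REVIEW-ITEM-CLASS"):
        text_tag = review.find("span", class_="REVIEW-TEXT-CLASS")
        rating_tag = review.find("div", class_="RATING-CLASS")
        reviews.append({
            "review": text_tag.get_text(strip=True) if text_tag else None,
            "rating": rating_tag.get("aria-label") if rating_tag else None,
        })
    return reviews
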
Function to generate CSV for Reviews and Ratings
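
The reviews can be written out the same way as the restaurant info; the pandas variant below is a sketch, and also shows where an Excel export (to_excel) would fit the post's final output.

import pandas as pd

def write_reviews_csv(reviews, filename="reviews.csv"):
    """Save the collected reviews and ratings to a CSV file."""
    df = pd.DataFrame(reviews, columns=["review", "rating"])
    df.to_csv(filename, index=False)
    # df.to_excel(filename.replace(".csv", ".xlsx"), index=False)  # optional Excel output
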
Main Function Execution
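
Tying the pieces together, the driver could look roughly like the following. The helper names match the sketches above, not the author's actual notebook, and the nested loops issue many requests, so real runs should be throttled.

def main(city, state):
    """Scrape the top restaurants for one city and save details and reviews."""
    all_restaurants, all_reviews = [], []
    for search_url in build_search_urls(city, state):
        restaurants = get_restaurant_attributes(get_soup(search_url))
        all_restaurants.extend(restaurants)
        for restaurant in restaurants:
            if not restaurant["url"]:
                continue
            for review_url in build_review_urls(restaurant["url"]):
                all_reviews.extend(get_reviews_and_ratings(get_soup(review_url)))
    write_restaurants_csv(all_restaurants, f"{city}_restaurants.csv")
    write_reviews_csv(all_reviews, f"{city}_reviews.csv")

main("Boston", "MA")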

Output


Top 20 Boston Restaurants Details and Reviews sample output.



Output files attached for Boston and Seattle

I hope you enjoyed the Yelp Web Scraping journey!!


Feel Free to Reach me!

  • Github
  • LinkedIn
  • Email
