Summary:
This blog post explains how to make a simple demo web scraping program in Python using Beautiful Soup 4. The program searches eBay for each name in a list of items and prints the URL, name, and price of the first result listed for each one.
Step 1: Doing the Process Manually
Before starting any programming, the first step is to search eBay manually while looking for ways to automate the process. Start by searching for a random item on eBay. In this example, “Python textbook” is entered into the search bar.
Step 2: Automating the Process
Notice that the search term, “Python textbook”, also appears in the URL as “Python+textbook”, with the space replaced by “+”.
This means the URL can be edited directly to search for other items on eBay. In this example, “Python+textbook” is replaced with “watering+hose”, and the modified URL returns the search results page for “watering hose”.
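The substitution described above can be sketched with the standard library. This is a minimal illustration, not the code used later in this post: the shortened base URL (keeping only the _nkw parameter) and the helper name build_search_url are assumptions made for the example.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit

# Assumption: a shortened search URL that keeps only the _nkw (keyword) parameter.
BASE_SEARCH_URL = "https://www.ebay.com/sch/i.html?_nkw="


def build_search_url(item_name):
    """Return a search URL with the item name encoded for a query string."""
    # quote_plus turns spaces into "+" and percent-encodes characters such as "&".
    return BASE_SEARCH_URL + quote_plus(item_name)


url = build_search_url("watering hose")
print(url)

# Decoding the query string recovers the original search phrase.
print(parse_qs(urlsplit(url).query)["_nkw"][0])
```

Using quote_plus instead of a plain space replacement also keeps the URL valid when an item name contains characters like “&” or “#”.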
Step 3: Getting Unique Identifiers
To scrape the name and the price of each item, Beautiful Soup needs identifying information such as HTML tag names and CSS class names. A quick way to find where the name or price sits in the HTML is to use the browser’s inspect tool. Chrome was used in this example, but Firefox has a similar tool. To get the specific tag containing the first item’s name:
- Right click on the name of the item
- Click Inspect
This should open the Elements tab and highlight the HTML containing the item’s name.
HTML for Name:
<h3 class="s-item__title" role="text">Latex 25 50 75 100 FT Expanding Flexible Garden Water Hose with Spray Nozzle</h3>
HTML for Price:
<span class="s-item__price">$6.80<span class="DEFAULT"> to </span>$18.95</span>
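As a quick check before writing the full program, the two fragments above can be fed straight into Beautiful Soup. This is only a sketch run on the copied HTML rather than on a live eBay page, and the variable names are chosen for illustration.

```python
from bs4 import BeautifulSoup

# The two fragments captured with the inspect tool in this step.
html = (
    '<h3 class="s-item__title" role="text">Latex 25 50 75 100 FT Expanding '
    'Flexible Garden Water Hose with Spray Nozzle</h3>'
    '<span class="s-item__price">$6.80<span class="DEFAULT"> to </span>$18.95</span>'
)

soup = BeautifulSoup(html, "html.parser")

# find() returns the first element matching the tag name and class.
title_tag = soup.find("h3", class_="s-item__title")
price_tag = soup.find("span", class_="s-item__price")

# get_text() joins the text of the element and any tags nested inside it.
title = title_tag.get_text()
price = price_tag.get_text()

print(title)
print(price)
```

Note that the price text includes the nested “ to ” span, so a price range comes back as a single string.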
Step 4: Writing the Code
Now that the preliminary steps are done, it’s time to put everything together using code. Import Beautiful Soup 4 and Requests to allow HTML parsing and HTTP requests. Next, create a list of items to be searched on eBay. The rest of the code will be divided into two functions, make_urls() and ebay_scrape().

from bs4 import BeautifulSoup
import requests

# List of item names to search on eBay
name_list = ["Ramen", "Monster Hunter World", "Adhesive page markers",
             "Calculator", "arduino", "gtx 1070", "bluetooth headphones",
             "coffee machine", "sweet tea", "Python textbook"]

make_urls()
The make_urls() function creates a search-page URL for each item in name_list. It replaces the spaces in each name with “+” and appends the result to the end of the modified URL from step 2. When finished, the function returns a list of URLs, one for each item’s search results.

# Returns a list of urls that search eBay for an item
def make_urls(names):
    # eBay url that can be modified to search for a specific item on eBay
    url = "https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1312.R1.TR11.TRC2.A0.H0.XIp.TRS1&_nkw="
    # List of urls created
    urls = []

    for name in names:
        # Adds the name of the item being searched to the end of the eBay url
        # and appends it to the urls list.
        # In order for it to work the spaces need to be replaced with a +
        urls.append(url + name.replace(" ", "+"))

    # Returns the list of completed urls
    return urls

ebay_scrape()
ebay_scrape() scrapes and prints the URL, name, and price of the first eBay search result for each item. To find the name and price, the tag and class name from step 3 are passed to the soup.find() method of the Beautiful Soup 4 object, “soup”, with the tag listed first and the class name second.

# Scrapes and prints the url, name, and price of the first item result listed on eBay
def ebay_scrape(urls):
    for url in urls:
        # Downloads the eBay page for processing
        res = requests.get(url)
        # Raises an exception if there's an error downloading the website
        res.raise_for_status()
        # Creates a BeautifulSoup object for HTML parsing
        soup = BeautifulSoup(res.text, 'html.parser')
        # Scrapes the first listed item's name
        name = soup.find("h3", {"class": "s-item__title"}).get_text(separator=" ")
        # Scrapes the first listed item's price
        price = soup.find("span", {"class": "s-item__price"}).get_text()

        # Prints the url, listed item name, and the price of the item
        print(url)
        print("Item Name: " + name)
        print("Price: " + price + "\n")

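One thing to be aware of is that soup.find() returns None when nothing matches, so calling get_text() on the result raises an AttributeError if eBay changes its markup or a search has no results. The sketch below shows one possible guard. The helper first_text and the sample fragment are hypothetical and are not part of the program in this post.

```python
from bs4 import BeautifulSoup


def first_text(soup, tag, css_class):
    """Return the text of the first matching element, or None if absent."""
    element = soup.find(tag, class_=css_class)
    if element is None:
        return None
    return element.get_text(separator=" ", strip=True)


# Hypothetical page fragment that has a title but no price.
sample = BeautifulSoup('<h3 class="s-item__title">Garden Hose</h3>', "html.parser")

name = first_text(sample, "h3", "s-item__title")
price = first_text(sample, "span", "s-item__price")

# Fall back to a placeholder instead of crashing on a missing element.
print("Item Name: " + (name or "not found"))
print("Price: " + (price or "not found"))
```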
Entire eBay_search Code
from bs4 import BeautifulSoup
import requests

# List of item names to search on eBay
name_list = ["Ramen", "Monster Hunter World", "Adhesive page markers",
             "Calculator", "arduino", "gtx 1070", "bluetooth headphones",
             "coffee machine", "sweet tea", "Python textbook"]


# Returns a list of urls that search eBay for an item
def make_urls(names):
    # eBay url that can be modified to search for a specific item on eBay
    url = "https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1312.R1.TR11.TRC2.A0.H0.XIp.TRS1&_nkw="
    # List of urls created
    urls = []

    for name in names:
        # Adds the name of the item being searched to the end of the eBay url
        # and appends it to the urls list.
        # In order for it to work the spaces need to be replaced with a +
        urls.append(url + name.replace(" ", "+"))

    # Returns the list of completed urls
    return urls


# Scrapes and prints the url, name, and price of the first item result listed on eBay
def ebay_scrape(urls):
    for url in urls:
        # Downloads the eBay page for processing
        res = requests.get(url)
        # Raises an exception if there's an error downloading the website
        res.raise_for_status()
        # Creates a BeautifulSoup object for HTML parsing
        soup = BeautifulSoup(res.text, 'html.parser')
        # Scrapes the first listed item's name
        name = soup.find("h3", {"class": "s-item__title"}).get_text(separator=" ")
        # Scrapes the first listed item's price
        price = soup.find("span", {"class": "s-item__price"}).get_text()

        # Prints the url, listed item name, and the price of the item
        print(url)
        print("Item Name: " + name)
        print("Price: " + price + "\n")


# Runs the code
# 1. Make the eBay url list
# 2. Use the returned url list to search eBay and scrape and print information on each item
ebay_scrape(make_urls(name_list))

Step 5: Running the Code
When run, the program prints the search URL, name, and price of the first result for each item in name_list.
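The printed price is plain text, and a listing with a price range such as the one in step 3 comes back as “$6.80 to $18.95”. If numbers are needed, for example to compare prices, the text could be converted along these lines. This is a sketch that assumes US dollar formatting, and parse_prices is a hypothetical helper that is not part of the program above.

```python
import re


def parse_prices(price_text):
    """Return every dollar amount found in the text as a float."""
    # Matches a "$" followed by digits, optional thousands commas, optional cents.
    matches = re.findall(r"\$(\d[\d,]*(?:\.\d+)?)", price_text)
    return [float(m.replace(",", "")) for m in matches]


# A price range yields two values; a single price yields one.
print(parse_prices("$6.80 to $18.95"))
print(parse_prices("$1,299.00"))
```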