
Amazon Web Scraping

In this project, I developed an automated web scraper using Python, Selenium WebDriver, and BeautifulSoup to extract product details (title, price, ratings, and the percentage of reviews at each star level) from the Amazon product page for a gaming laptop.



METHODS


Libraries and Dependencies


The code begins by importing the necessary libraries: 'BeautifulSoup', 'requests', 'smtplib', 'selenium', 'datetime', 'csv', and 'pandas'. 'BeautifulSoup' parses HTML and 'requests' handles HTTP requests; 'smtplib' sends emails through SMTP; 'selenium' automates and controls web browser actions; 'datetime' handles date and time manipulation; 'csv' handles comma-separated values (CSV) file operations; and 'pandas' supports data analysis and manipulation in dataframes.


Selenium WebDriver and Browser Automation


A pivotal aspect of the script is the use of Selenium WebDriver. This tool automates a web browser, allowing the script to navigate to the Amazon product page. Configuring WebDriver and setting browser options lets the script run smoothly and efficiently, mimicking a real user's interaction with the website.
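
As a rough illustration of this step, the sketch below shows one way the browser could be configured and the page loaded with Selenium 4. The function names 'chrome_arguments' and 'fetch_page_source' and the specific Chrome switches are my own assumptions for illustration, not necessarily what the script uses, and it assumes Chrome and a matching driver are available on the machine.

```python
def chrome_arguments(headless=True):
    """Return the command-line switches passed to Chrome (illustrative choices)."""
    args = ["--window-size=1920,1080", "--disable-gpu"]
    if headless:
        args.append("--headless=new")  # run without opening a visible window
    return args


def fetch_page_source(url, headless=True):
    """Open `url` in Chrome and return the rendered HTML as a string."""
    # Imported here so chrome_arguments() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

The returned HTML string can then be handed to BeautifulSoup for parsing. Closing the driver in a 'finally' block keeps stray browser processes from piling up when the script runs repeatedly.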


Data Extraction Process


The core functionality involves extracting various pieces of information from the Amazon product page. This includes the product title, the total number of customer ratings, the product price (divided into whole and fractional parts), the average customer rating, and the distribution of ratings across star levels 1 to 5. These pieces of data are critical for understanding the product's market performance and customer reception.
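
The raw text pulled from the page usually needs cleaning before it can be stored as numbers. The helpers below are a minimal sketch of that cleaning step, assuming the strings have already been extracted with BeautifulSoup. The function names and the example input formats (such as '4.5 out of 5 stars') are assumptions, since the exact page markup is not shown here.

```python
import re


def parse_price(whole, fraction):
    """Combine the whole and fractional price parts, e.g. '1,299.' and '99'."""
    whole_digits = re.sub(r"[^\d]", "", whole)
    fraction_digits = re.sub(r"[^\d]", "", fraction) or "0"
    return float(f"{whole_digits}.{fraction_digits}")


def parse_rating_count(text):
    """Turn text such as '1,234 ratings' into the integer 1234."""
    return int(re.sub(r"[^\d]", "", text))


def parse_average_rating(text):
    """Turn text such as '4.5 out of 5 stars' into the float 4.5."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no rating found in {text!r}")
    return float(match.group())


def parse_star_distribution(percent_texts):
    """Map star levels 5..1 to integer percentages.

    Assumes the texts (e.g. '72%') are listed from 5 stars down to 1 star.
    """
    levels = range(5, 0, -1)
    return {
        level: int(text.strip().rstrip("%"))
        for level, text in zip(levels, percent_texts)
    }
```

Keeping the parsing separate from the scraping makes it possible to check the cleaning logic on plain strings without opening a browser.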


Data Storage and CSV File Handling


Once extracted, the data is carefully stored in a CSV file named 'AmazonGamingLaptop2023.csv'. The script will create this file if it doesn't exist and append new data with each execution. Organizing the data into a structured format and writing it into the CSV file is an essential part of this process, enabling easy access and analysis of the data.
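
The create-if-missing and append behaviour can be sketched as follows. The column names in 'HEADER' and the function name 'append_row' are assumptions for illustration; the actual script may store different or additional fields.

```python
import csv
import datetime
import os

# Assumed column layout; the real file may differ.
HEADER = ["Title", "Price", "Rating", "RatingCount", "Date"]


def append_row(path, title, price, rating, rating_count, today=None):
    """Append one observation to the CSV, writing the header on first use."""
    today = today or datetime.date.today()
    is_new_file = not os.path.exists(path)

    # Mode "a" creates the file if needed and never overwrites earlier rows.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new_file:
            writer.writerow(HEADER)
        writer.writerow([title, price, rating, rating_count, today.isoformat()])
```

Checking for the file before opening it ensures the header is written only once, so each later run adds exactly one data row.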


Automated Price Monitoring and Email Notifications


A significant feature of this script is the automated monitoring of the product's price. The 'check_price' function watches the product price and triggers an email alert if the price drops below a pre-set threshold of $1,000. The 'send_mail' function then uses Gmail's SMTP server to send the alert, handling the server connection, authentication, and email dispatch.
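
A simplified sketch of the alert logic is shown below. It keeps the names 'check_price' and 'send_mail' from the description, but the signatures, the 'build_alert' helper, and the message wording are assumptions. In particular, this version of 'check_price' receives an already-scraped price rather than scraping it itself. Gmail generally requires an app password rather than the normal account password for SMTP logins.

```python
import smtplib
from email.message import EmailMessage

PRICE_THRESHOLD = 1000.00  # alert when the price falls below this value


def build_alert(price, sender, recipient, product_url):
    """Create the alert email (wording is illustrative)."""
    msg = EmailMessage()
    msg["Subject"] = f"Price alert: gaming laptop now ${price:,.2f}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"The price dropped below ${PRICE_THRESHOLD:,.2f}.\n{product_url}"
    )
    return msg


def send_mail(msg, username, app_password):
    """Connect to Gmail over SSL, authenticate, and send the message."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(username, app_password)
        server.send_message(msg)


def check_price(price, notify, threshold=PRICE_THRESHOLD):
    """Call `notify(price)` if the price is below the threshold."""
    if price < threshold:
        notify(price)
        return True
    return False
```

Passing the notifier in as an argument lets the threshold check be exercised without sending real email, for example 'check_price(949.99, print)'.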


Continuous Monitoring Loop


To ensure consistent monitoring, the script operates in an infinite loop, calling the 'check_price' function every 24 hours. This routine ensures daily updates on the product's price and other key characteristics, providing timely information to the user.
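
The daily loop can be sketched as below. The 'run_daily' wrapper and its 'max_runs' and 'sleep' parameters are assumptions added so the loop can be stopped and tried out quickly; a plain 'while True' with 'time.sleep(86400)' behaves the same way when 'max_runs' is left as None.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def run_daily(task, interval=SECONDS_PER_DAY, max_runs=None, sleep=time.sleep):
    """Call `task` repeatedly, pausing `interval` seconds between calls.

    With max_runs=None the loop never ends, matching the infinite loop
    described above. Returns the number of completed runs otherwise.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        # Skip the pause after the final run so the function returns promptly.
        if max_runs is None or runs < max_runs:
            sleep(interval)
    return runs
```

One limitation of any sleep-based loop is that the process must stay running; if the machine restarts, the monitoring stops until the script is launched again.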


Pandas DataFrame Integration


Lastly, the script reads the data from the CSV file into a Pandas dataframe, facilitating further analysis, manipulation, or visualization of the data. This feature enhances the script's utility, making it a more versatile tool for data-driven decision-making.
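
As a small example of what the dataframe makes possible, the sketch below loads CSV data with pandas and finds the day with the lowest recorded price. The column names and the two sample rows are made up for illustration; with the real file, the 'io.StringIO' object would be replaced by the path 'AmazonGamingLaptop2023.csv'.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real CSV file.
sample_csv = io.StringIO(
    "Title,Price,Rating,RatingCount,Date\n"
    "Example Laptop,1099.99,4.5,1234,2023-11-01\n"
    "Example Laptop,949.99,4.5,1240,2023-11-02\n"
)

# parse_dates converts the Date column from text to datetime values.
df = pd.read_csv(sample_csv, parse_dates=["Date"])

# Select the row with the lowest price and report when it occurred.
lowest = df.loc[df["Price"].idxmin()]
print(lowest["Date"].date(), lowest["Price"])
```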


SUMMARY


In summary, this Python script is a comprehensive solution for monitoring Amazon product details, especially useful for tracking price changes and other key metrics of a gaming laptop. It combines the power of web scraping, automated browser interaction, data management, and alert mechanisms in a single, efficient package. However, it is tailored for Amazon and demands careful handling, particularly concerning the automation of email notifications. As e-commerce continues to evolve, such scripts play an increasingly important role in empowering users with timely and accurate information.



Below is the full Python script I used:




Click on the GitHub icon below to view the Python script for this project.


