A simple web scraping project that extracts product details from Hamleys.in using Python.
This repository contains Python scripts to scrape product metadata from Hamleys product pages and export the results to a CSV file. The primary entry point is final.py.
- Scrapes key fields: URL, Product Name, Brand Name, Price, Category, Image URL, Description
- Reads input URLs from url_list.csv
- Exports results to data.csv via pandas
- final.py: Main script. Reads url_list.csv, scrapes each page, writes data.csv.
- scraper.py, here_is_everything.py, etc.: Supporting or exploratory scripts used during development.
- url_list.csv: Input list of product page URLs (one per line).
- data.csv: Output CSV generated by final.py.
- Python 3.9+
- See requirements.txt for Python package dependencies.
- Install dependencies: pip install -r requirements.txt
- Add target product URLs (one per line) to url_list.csv.
- Run the main script: python final.py

On success, a data.csv file is created (or overwritten) with the columns URL, Product Name, Brand Name, Price, Category, Image URL, Description.
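
The input and output handling described above could be sketched roughly as follows. This is an illustration, not the code in final.py: the helper names read_urls and write_rows are hypothetical, and the sketch uses the standard-library csv module instead of pandas so that it runs without extra dependencies. Only the file layout (one URL per line) and the column names come from this README.

```python
import csv
from urllib.parse import urlparse

# Column order as listed in this README; everything else is illustrative.
COLUMNS = ["URL", "Product Name", "Brand Name", "Price",
           "Category", "Image URL", "Description"]


def read_urls(path):
    """Return the entries of the input file that look like http(s) URLs."""
    urls = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            candidate = row[0].strip()
            parsed = urlparse(candidate)
            # Keep only entries with an http(s) scheme and a host name.
            if parsed.scheme in ("http", "https") and parsed.netloc:
                urls.append(candidate)
    return urls


def write_rows(path, rows):
    """Write one CSV row per product dict, leaving missing fields empty."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                                restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

Because read_urls drops anything that does not parse as an http(s) URL, a header row or a stray blank line in url_list.csv is skipped rather than passed on to the scraper.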
- The scraper relies on the current HTML structure of Hamleys.in. If the site structure changes, selectors in final.py may need updates.
- Be respectful of robots.txt and the website's terms of use. Use responsibly.
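
One way to honor robots.txt programmatically is the standard-library urllib.robotparser module. The sketch below is a minimal example under stated assumptions: build_robots_checker is a hypothetical helper, and the sample rules are made up for illustration and are not the actual Hamleys.in rules.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_lines, user_agent="*"):
    """Return a function that reports whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # accepts an iterable of robots.txt lines
    return lambda url: parser.can_fetch(user_agent, url)


# Illustrative rules only; NOT the real rules of any site.
sample_rules = [
    "User-agent: *",
    "Disallow: /checkout/",
]
allowed = build_robots_checker(sample_rules)

print(allowed("https://example.com/toys/item.html"))  # path is not disallowed
print(allowed("https://example.com/checkout/cart"))   # path is under /checkout/
```

To check a live site instead of a hard-coded list, RobotFileParser also offers set_url() and read(), which download and parse the site's robots.txt file.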
- Connection errors: If you encounter timeouts or failed requests, check your internet connection and try again later. You can also add retry logic or exponential backoff in final.py for improved reliability.
- Invalid URLs: Double-check that every line in url_list.csv contains a valid, reachable Hamleys product URL.
- Missing data: If some fields are empty in data.csv, the website's HTML structure may have changed. Inspect the page and update selectors in final.py as needed.
- Dependency issues: Ensure all required packages are installed using pip install -r requirements.txt. Use Python 3.9 or newer.
- Permission errors: Run your scripts with appropriate permissions, especially if writing files to protected directories.
- Respect website policies: Avoid sending too many requests in a short time. Review Hamleys.in robots.txt and terms of use.
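
The retry and backoff suggestion above could look something like the sketch below. It is not taken from final.py, whose HTTP library is not documented here; fetch_with_retry and _default_fetch are hypothetical names, and the default fetcher uses only the standard-library urllib.

```python
import time
import urllib.request
from urllib.error import URLError


def fetch_with_retry(url, fetch=None, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on network errors with growing waits."""
    if fetch is None:
        fetch = _default_fetch
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except (URLError, TimeoutError, ConnectionError):
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Exponential backoff: 1s, 2s, 4s, ... with the default base_delay.
            sleep(base_delay * 2 ** attempt)


def _default_fetch(url, timeout=10):
    """Download a page body as text using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

The growing wait between attempts also helps keep the request rate low. Note that urllib's HTTPError is a subclass of URLError, so in this sketch HTTP error responses such as 404 are retried too; a stricter version might retry only on timeouts and 5xx responses.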
If you need help, feel free to open an issue or start a discussion!
This project is provided for educational purposes. You may adapt it for your own use.
This project is for educational purposes only. Always check a website's robots.txt file and terms of service before scraping. Use responsibly and avoid overloading servers with too many requests.
