GitHub - Advik274/WebScrapping-using-python-hamleys.in: Python web scraper for extracting product data from Hamleys toy store · GitHub

Advik274/WebScrapping-using-python-hamleys.in

WebScraping-using-python-Hamleys.in

A simple web scraping project that extracts product details from Hamleys.in using Python.

This repository contains Python scripts to scrape product metadata from Hamleys product pages and export the results to a CSV file. The primary entry point is final.py.

Features

  • Scrapes key fields: URL, Product Name, Brand Name, Price, Category, Image URL, Description
  • Reads input URLs from url_list.csv
  • Exports results to data.csv via pandas
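The read, scrape, export flow listed above can be sketched roughly as follows. This is a minimal illustration, not the code in final.py: the names read_urls, write_rows, run and scrape_page are hypothetical, the input file is assumed to have no header row, and the standard-library csv module stands in for pandas so the sketch has no dependencies. With pandas, the equivalent export would be `pandas.DataFrame(rows, columns=COLUMNS).to_csv(path, index=False)`.

```python
import csv

# Column order taken from the field list in this README.
COLUMNS = ["URL", "Product Name", "Brand Name", "Price",
           "Category", "Image URL", "Description"]


def read_urls(path):
    """Return the non-empty lines of the input file (assumed: no header row)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def write_rows(path, rows):
    """Write one dict per product, overwriting any existing file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # restval="" leaves a blank cell for any field the scraper did not find.
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)


def run(input_path, output_path, scrape_page):
    """Scrape every URL in input_path and export the rows to output_path.

    scrape_page is a hypothetical callable that takes a URL and returns a
    dict keyed by COLUMNS; the real parsing logic lives in final.py.
    """
    rows = [scrape_page(url) for url in read_urls(input_path)]
    write_rows(output_path, rows)
    return len(rows)
```

Passing the scraping function in as an argument keeps the file handling testable without touching the network.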

Project structure

  • final.py: Main script. Reads url_list.csv, scrapes each page, writes data.csv.
  • scraper.py, here_is_everything.py, etc.: Supporting or exploratory scripts used during development.
  • url_list.csv: Input list of product page URLs (one per line).
  • data.csv: Output CSV generated by final.py.

Requirements

  • Python 3.9+
  • See requirements.txt for Python package dependencies.

Install dependencies:

pip install -r requirements.txt

Usage

  1. Add target product URLs (one per line) to url_list.csv.
  2. Run the main script:
python final.py

On success, final.py creates data.csv (overwriting any existing file) with the columns: URL, Product Name, Brand Name, Price, Category, Image URL, Description.
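Before running the script, it can help to check that every line in url_list.csv at least looks like a Hamleys URL. The sketch below is one possible pre-flight check, not part of the repository: is_hamleys_url and find_invalid_lines are hypothetical helpers, and the rule "http or https, host hamleys.in or a subdomain of it" is an assumption about what counts as valid. It checks the shape of each URL only, not whether the page is reachable.

```python
from urllib.parse import urlparse


def is_hamleys_url(line):
    """Return True for an http(s) URL whose host is hamleys.in or a subdomain."""
    try:
        parts = urlparse(line.strip())
        host = (parts.hostname or "").lower()
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parts.scheme in ("http", "https") and (
        host == "hamleys.in" or host.endswith(".hamleys.in")
    )


def find_invalid_lines(path):
    """Return (line_number, text) pairs for non-empty lines that fail the check."""
    with open(path, encoding="utf-8") as handle:
        return [
            (number, line.strip())
            for number, line in enumerate(handle, start=1)
            if line.strip() and not is_hamleys_url(line)
        ]


if __name__ == "__main__":
    for number, text in find_invalid_lines("url_list.csv"):
        print(f"line {number}: not a Hamleys URL: {text!r}")
```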

Notes and assumptions

  • The scraper relies on the current HTML structure of Hamleys.in. If the site structure changes, selectors in final.py may need updates.
  • Be respectful of robots.txt and the website's terms of use. Use responsibly.
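One way to act on the robots.txt note is to test each URL against the site's rules before requesting it. The following is a minimal sketch using the standard library's urllib.robotparser; is_allowed is a hypothetical helper, and the rules in the usage example are invented for illustration and are not Hamleys' actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so the text can come from anywhere.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative rules only; fetch the real file from the site before relying on it.
    sample_rules = "User-agent: *\nDisallow: /checkout/\n"
    print(is_allowed(sample_rules, "https://www.hamleys.in/checkout/cart"))
```

RobotFileParser can also download the file itself through its set_url and read methods; parsing text that is passed in keeps the check easy to test offline.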

🚦 Troubleshooting

  • Connection errors: If requests time out or fail, check your internet connection and try again later. Adding retry logic with exponential backoff to final.py can also improve reliability.
  • Invalid URLs: Double-check that every line in url_list.csv contains a valid, reachable Hamleys product URL.
  • Missing data: If some fields are empty in data.csv, the website's HTML structure may have changed. Inspect the page and update selectors in final.py as needed.
  • Dependency issues: Ensure all required packages are installed using pip install -r requirements.txt. Use Python 3.9 or newer.
  • Permission errors: Run your scripts with appropriate permissions, especially if writing files to protected directories.
  • Respect website policies: Avoid sending too many requests in a short time. Review Hamleys.in robots.txt and terms of use.
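The retry suggestion under connection errors could look something like the sketch below. It is an assumption-laden illustration rather than code from final.py: fetch_html and fetch_with_backoff are hypothetical names, urllib.request stands in for whichever HTTP library the project uses, and the defaults (4 attempts, 1 second base delay, doubling each time) are arbitrary.

```python
import time
import urllib.request


def fetch_html(url, timeout=10.0):
    """Download a page body as text (a stand-in for the project's own fetch)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_backoff(url, fetch=fetch_html, attempts=4,
                       base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), waiting base_delay * 2**n seconds after the n-th failure."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            # urllib.error.URLError, connection resets and socket timeouts
            # all derive from OSError.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The growing delay between attempts also spaces out requests, which fits the advice above about not sending too many in a short time. The fetch and sleep parameters exist so the function can be exercised without a network or real waiting.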

If you need help, feel free to open an issue or start a discussion!

License

This project is provided for educational purposes. You may adapt it for your own use.

📝 Note on Web Scraping

This project is for educational purposes only. Always check a website's robots.txt file and terms of service before scraping. Use responsibly and avoid overloading servers with too many requests.
