Sunbelt Computer Software

WebScraping-using-python-Hamleys.in

A simple web scraping project that extracts product details from Hamleys.in using Python.

This repository contains Python scripts to scrape product metadata from Hamleys product pages and export the results to a CSV file. The primary entry point is final.py.

Features

Scrapes key fields: URL, Product Name, Brand Name, Price, Category, Image URL, Description
Reads input URLs from url_list.csv
Exports results to data.csv via pandas
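This README does not reproduce the selectors final.py uses, so what follows is only a minimal sketch of how the seven fields could be collected from one page with Python's built-in html.parser module. It assumes, hypothetically, that product pages expose the values as meta tags with Open Graph-style names such as og:title. The META_TO_COLUMN mapping, MetaTagParser, and extract_fields are illustrative names and may not match the live site or final.py.

```python
from html.parser import HTMLParser

# Output columns, in the order this README lists them.
COLUMNS = ["URL", "Product Name", "Brand Name", "Price",
           "Category", "Image URL", "Description"]

# HYPOTHETICAL: meta tag names assumed to hold each field.
# The real selectors live in final.py and may differ entirely.
META_TO_COLUMN = {
    "og:title": "Product Name",
    "product:brand": "Brand Name",
    "product:price:amount": "Price",
    "product:category": "Category",
    "og:image": "Image URL",
    "og:description": "Description",
}


class MetaTagParser(HTMLParser):
    """Collect the content of <meta property=...> and <meta name=...> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        # Keep only the first occurrence of each tag name.
        if key and key not in self.meta:
            self.meta[key] = (attributes.get("content") or "").strip()


def extract_fields(url, html):
    """Return one record with every output column (empty string if not found)."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    record = dict.fromkeys(COLUMNS, "")
    record["URL"] = url
    for meta_key, column in META_TO_COLUMN.items():
        record[column] = parser.meta.get(meta_key, "")
    return record
```

For example, calling extract_fields on a page containing <meta property="og:title" content="Toy Car"> returns a record whose Product Name is "Toy Car" and whose unmatched fields are empty strings, so every row keeps the same seven columns.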

Project structure

final.py: Main script. Reads url_list.csv, scrapes each page, writes data.csv.
scraper.py, here_is_everything.py, etc.: Supporting or exploratory scripts used during development.
url_list.csv: Input list of product page URLs (one per line).
data.csv: Output CSV generated by final.py.
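The flow final.py follows (read url_list.csv, scrape each page, write data.csv) can be outlined roughly as below. This is a hedged sketch rather than the actual final.py: it uses the standard-library csv module instead of pandas so it runs without extra packages, and scrape is a hypothetical callable that takes a URL and returns a dict keyed by the output column names.

```python
import csv

# Output columns, in the order this README lists them.
COLUMNS = ["URL", "Product Name", "Brand Name", "Price",
           "Category", "Image URL", "Description"]


def read_urls(path="url_list.csv"):
    """Return URLs from the first column, one per line, in file order.

    Blank lines, duplicates, and lines that do not start with http(s)://
    (for example a header row) are skipped.
    """
    urls, seen = [], set()
    # utf-8-sig also strips a byte-order mark that spreadsheet tools may add.
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row in csv.reader(f):
            url = row[0].strip() if row else ""
            if url.lower().startswith(("http://", "https://")) and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


def write_records(records, path="data.csv"):
    """Write records (dicts keyed by COLUMNS) to CSV, overwriting the file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(records)


def run(scrape, url_path="url_list.csv", out_path="data.csv"):
    """Scrape every URL with the hypothetical scrape callable and save results."""
    records = []
    for url in read_urls(url_path):
        try:
            records.append(scrape(url))
        except Exception as exc:  # keep going if one page fails
            print(f"Skipping {url}: {exc}")
    write_records(records, out_path)
    return len(records)
```

Catching errors per URL means one broken page does not discard the rows already collected; missing fields are written as empty cells through restval.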

Requirements

Python 3.9+
See requirements.txt for Python package dependencies.

Install dependencies:

pip install -r requirements.txt
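A quick way to confirm the environment before running final.py is to check the interpreter version and whether the required packages can be imported. The sketch below checks only pandas, because it is the only package this README names; REQUIRED_MODULES, missing_modules, and check_environment are hypothetical names. Extend the tuple with the other packages from requirements.txt, using their import names, which can differ from their pip names (beautifulsoup4 is imported as bs4, for example).

```python
import importlib.util
import sys

# ASSUMPTION: pandas is the only dependency this README names; extend this
# tuple with the import names of the other packages in requirements.txt.
REQUIRED_MODULES = ("pandas",)


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in modules that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def check_environment(modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < (3, 9):
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python 3.9+ is required, found {found}")
    for name in missing_modules(modules):
        problems.append(f"Missing package '{name}': run pip install -r requirements.txt")
    return problems
```

Printing each entry of check_environment() gives a short checklist; an empty list means the version and import checks passed.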

Usage

Add target product URLs (one per line) to url_list.csv.
Run the main script:

python final.py

On success, data.csv is created (or overwritten, if it already exists) with the columns: URL, Product Name, Brand Name, Price, Category, Image URL, Description.
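To confirm that a run produced the expected output, data.csv can be read back and checked for the seven columns and for empty cells. This sketch uses only the csv module, and summarize_output is a hypothetical helper name. It tolerates an extra leading column because pandas' DataFrame.to_csv writes the index unless index=False is passed, and this README does not say whether final.py does so.

```python
import csv
from collections import Counter

EXPECTED_COLUMNS = ["URL", "Product Name", "Brand Name", "Price",
                    "Category", "Image URL", "Description"]


def summarize_output(path="data.csv"):
    """Return (row_count, {column: empty_cell_count}) for a scraper output file.

    Raises ValueError if any expected column is missing from the header.
    Extra columns (such as an unnamed pandas index column) are ignored.
    """
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in EXPECTED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"data.csv is missing columns: {missing}")
        empty = Counter()
        row_count = 0
        for row in reader:
            row_count += 1
            for column in EXPECTED_COLUMNS:
                # Short rows yield None for absent cells; treat as empty.
                if not (row[column] or "").strip():
                    empty[column] += 1
    return row_count, dict(empty)
```

A column that is empty in most rows usually means its selector no longer matches the page, which is the "Missing data" case described under Troubleshooting.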

Notes and assumptions

The scraper relies on the current HTML structure of Hamleys.in. If the site structure changes, selectors in final.py may need updates.
Be respectful of robots.txt and the website's terms of use. Use responsibly.
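The robots.txt rules can also be checked programmatically with Python's urllib.robotparser before any product page is requested. In this sketch the base URL https://www.hamleys.in and the user-agent string are assumptions; load_robots performs a network request, while filter_allowed only consults rules that have already been loaded.

```python
from urllib import robotparser

# ASSUMPTIONS: the site root and the user-agent string are illustrative.
BASE_URL = "https://www.hamleys.in"
USER_AGENT = "hamleys-scraper-example"


def load_robots(base_url=BASE_URL):
    """Download and parse robots.txt for base_url (performs a network request)."""
    parser = robotparser.RobotFileParser()
    parser.set_url(base_url.rstrip("/") + "/robots.txt")
    parser.read()
    return parser


def filter_allowed(urls, parser, user_agent=USER_AGENT):
    """Split urls into (allowed, blocked) according to the parsed rules."""
    allowed, blocked = [], []
    for url in urls:
        if parser.can_fetch(user_agent, url):
            allowed.append(url)
        else:
            blocked.append(url)
    return allowed, blocked
```

Typical use is allowed, blocked = filter_allowed(urls, load_robots()), scraping only the allowed list. RobotFileParser also provides crawl_delay(user_agent), which returns the Crawl-delay value if the file sets one.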

🚦 Troubleshooting

Connection errors: If you encounter timeouts or failed requests, check your internet connection and try again later. You can also add retry logic with exponential backoff to final.py for improved reliability.
Invalid URLs: Double-check that every line in url_list.csv contains a valid, reachable Hamleys product URL.
Missing data: If some fields are empty in data.csv, the website's HTML structure may have changed. Inspect the page and update selectors in final.py as needed.
Dependency issues: Ensure all required packages are installed using pip install -r requirements.txt. Use Python 3.9 or newer.
Permission errors: Run your scripts with appropriate permissions, especially if writing files to protected directories.
Respect website policies: Avoid sending too many requests in a short time. Review the Hamleys.in robots.txt file and terms of use.
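For the connection errors listed above, each page request can be wrapped in retries with exponential backoff. The sketch below doubles the wait after each failed attempt (1 s, 2 s, 4 s, and so on, capped at max_delay), adds optional random jitter, and gives up immediately on errors that retrying will not fix, such as a 404. It uses urllib from the standard library rather than whatever HTTP client final.py uses; fetch_html, fetch_with_retry, the user-agent string, and the set of retryable status codes are all assumptions.

```python
import random
import socket
import time
import urllib.error
import urllib.request

USER_AGENT = "hamleys-scraper-example"      # ASSUMPTION: illustrative value
RETRY_STATUS = {429, 500, 502, 503, 504}    # ASSUMPTION: statuses worth retrying


def fetch_html(url, timeout=10):
    """Fetch a page with urllib and decode it using the declared charset."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def is_retryable(exc):
    """Retry network failures and throttling/server errors, but not e.g. 404."""
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code in RETRY_STATUS
    return isinstance(exc, (urllib.error.URLError, ConnectionError,
                            TimeoutError, socket.timeout))


def fetch_with_retry(url, fetch=fetch_html, retries=3, base_delay=1.0,
                     max_delay=30.0, jitter=0.5, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on transient errors.

    The wait before retry n (n = 0, 1, 2, ...) is min(max_delay,
    base_delay * 2**n) plus a random extra of up to jitter times that amount.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            if attempt == retries or not is_retryable(exc):
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, delay * jitter)
            print(f"Attempt {attempt + 1} for {url} failed ({exc}); "
                  f"retrying in {delay:.1f}s")
            sleep(delay)
```

The fetch and sleep parameters are injectable so the retry logic can be exercised without touching the network; jitter spreads retries out so that many failed requests do not all retry at the same moment.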

If you need help, feel free to open an issue or start a discussion!

License

This project is provided for educational purposes. You may adapt it for your own use.

📝 Note on Web Scraping

This project is for educational purposes only. Always check a website's robots.txt file and terms of service before scraping. Use responsibly and avoid overloading servers with too many requests.
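One simple way to avoid overloading the server is to enforce a minimum gap between consecutive requests. The Throttle class below is a hypothetical helper, and its 2-second default interval is an arbitrary assumption rather than a rate published by Hamleys.in; call wait() immediately before each request.

```python
import time


class Throttle:
    """Ensure at least min_interval seconds pass between consecutive wait() calls."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # ASSUMPTION: arbitrary default
        self.clock = clock
        self.sleep = sleep
        self._last = None

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

Using a monotonic clock keeps the spacing correct even if the system time changes during a run, and time already spent parsing a page counts toward the interval, so the scraper only sleeps for whatever remains.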
