This project analyzes the best-selling Amazon books dataset using Python and pandas. It provides insights such as top authors by the number of books listed, average ratings by genre, and basic data exploration.
📊 Data Loading and Exploration:
- Load the dataset from a CSV file.
- Display the first 5 rows of the dataset.
- Show the shape and column names of the dataset.
- Provide summary statistics for numerical columns.
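The loading and exploration steps above could be sketched roughly as follows. The sample rows are made up so the snippet runs without the real dataset, the `Author`, `Reviews`, and `Genre` column names are assumptions about the CSV layout, and `explore` is a hypothetical helper name rather than something taken from `main.py`.

```python
import io

import pandas as pd

# Made-up rows standing in for bestsellers.csv (illustrative only).
# Column names other than Name, Year, User Rating, and Price are assumed.
SAMPLE_CSV = """Name,Author,User Rating,Reviews,Price,Year,Genre
Book A,Author One,4.7,1200,8,2016,Fiction
Book B,Author Two,4.5,800,12,2017,Non Fiction
Book C,Author One,4.8,3000,15,2018,Fiction
"""


def explore(source):
    """Load a CSV (path or file-like object) and print a quick overview."""
    df = pd.read_csv(source)
    print(df.head())            # first 5 rows
    print(df.shape)             # (row count, column count)
    print(df.columns.tolist())  # column names
    print(df.describe())        # summary statistics for numeric columns
    return df


# With the real dataset this would be explore("bestsellers.csv").
df = explore(io.StringIO(SAMPLE_CSV))
```

Passing a file-like object here is only a convenience for the sketch; `pd.read_csv` accepts a file path in the same position.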
🛠️ Data Cleaning:
- Remove duplicate entries.
- Rename columns for better readability (e.g., "Name" to "Title", "Year" to "Publication Year", "User Rating" to "Rating").
- Convert the "Price" column to a float data type.
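A minimal sketch of the cleaning steps, assuming the raw columns are named `Name`, `Year`, `User Rating`, and `Price` as listed above. The sample rows are invented, and `clean` is a hypothetical helper name, not necessarily how `main.py` is organized.

```python
import pandas as pd

# Made-up rows; the second row is an exact duplicate of the first.
raw = pd.DataFrame({
    "Name": ["Book A", "Book A", "Book B"],
    "Author": ["Author One", "Author One", "Author Two"],  # assumed column
    "User Rating": [4.7, 4.7, 4.5],
    "Price": [8, 8, 12],
    "Year": [2016, 2016, 2017],
    "Genre": ["Fiction", "Fiction", "Non Fiction"],        # assumed column
})


def clean(df):
    """Drop duplicate rows, rename columns, and store Price as float."""
    df = df.drop_duplicates()
    df = df.rename(columns={
        "Name": "Title",
        "Year": "Publication Year",
        "User Rating": "Rating",
    })
    # astype returns a new DataFrame, so the input is left untouched.
    return df.astype({"Price": "float64"})


cleaned = clean(raw)
print(cleaned.dtypes)
```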
📈 Analysis:
- Identify the most popular authors by counting the number of books they have listed.
- Calculate the average rating for each genre.
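Both analysis steps map onto standard pandas operations. The sketch below uses invented rows and assumes the cleaned columns are called `Author`, `Rating`, and `Genre`; the actual script may compute these differently.

```python
import pandas as pd

# Made-up cleaned data (illustrative only).
books = pd.DataFrame({
    "Title": ["Book A", "Book B", "Book C", "Book D"],
    "Author": ["Author One", "Author Two", "Author One", "Author Three"],
    "Rating": [4.6, 4.2, 4.8, 4.0],
    "Genre": ["Fiction", "Non Fiction", "Fiction", "Non Fiction"],
})

# Count books per author; value_counts sorts by count, largest first.
top_authors = books["Author"].value_counts().head(10)

# Mean rating within each genre.
avg_rating_by_genre = books.groupby("Genre")["Rating"].mean()

print(top_authors)
print(avg_rating_by_genre)
```

Note that counting rows per author measures how many listings an author has, so it is only as reliable as the duplicate removal done beforehand.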
💾 Export Results:
- Save the top 10 most popular authors to a CSV file (`top_authors.csv`).
- Save the average rating by genre to a CSV file (`avg_rating_by_genre.csv`).
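The export step might look something like the sketch below. The result tables are made up, the `Book Count` and `Average Rating` column headers are assumptions, and the files are written to a temporary directory so that running the snippet does not overwrite real output; the script itself presumably writes to the project folder.

```python
import os
import tempfile

import pandas as pd

# Made-up analysis results (illustrative only; column headers are assumed).
top_authors = pd.DataFrame({
    "Author": ["Author One", "Author Two"],
    "Book Count": [3, 1],
})
avg_rating_by_genre = pd.DataFrame({
    "Genre": ["Fiction", "Non Fiction"],
    "Average Rating": [4.7, 4.1],
})

# Temporary directory used only for this sketch.
out_dir = tempfile.mkdtemp()
top_path = os.path.join(out_dir, "top_authors.csv")
genre_path = os.path.join(out_dir, "avg_rating_by_genre.csv")

# index=False keeps the row index out of the CSV files.
top_authors.head(10).to_csv(top_path, index=False)
avg_rating_by_genre.to_csv(genre_path, index=False)

print(pd.read_csv(top_path))
print(pd.read_csv(genre_path))
```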
Files:
- main.py: Contains the Python script for data analysis.
- bestsellers.csv: The dataset of best-selling Amazon books (not included in this repository).
- top_authors.csv: Output file containing the top 10 authors by the number of books listed.
- avg_rating_by_genre.csv: Output file containing the average rating by genre.
Requirements:
- Python 3.x
- pandas library
Usage:
- Clone the repository and navigate to the project directory.
- Place the `bestsellers.csv` file in the `Analyze Best Selling Amazon Books` folder.
- Run the `main.py` script: `python main.py`
- Check the output files (`top_authors.csv` and `avg_rating_by_genre.csv`) in the same folder.
