Sunbelt Computer Software

Customer Segmentation with RFM Model and K-Means Clustering

🔍 Overview

This project demonstrates how to segment customers using the RFM (Recency, Frequency, Monetary) model combined with K-Means clustering. It helps identify patterns in customer behavior and groups them into actionable segments such as "Loyal Customers", "At Risk", and "Need Attention".

All customer data used in this project comes from OTA (Online Travel Agency) transactional data on Kaggle. The data did not include customer IDs, so they were randomly generated.

📁 Dataset

Source: Proprietary OTA customer booking data
Anonymized with randomized CustomerID
Time Period: 1 year snapshot

Fields used:

Recency: Days since last booking
Frequency: Number of bookings
Monetary: Total value of bookings

🧪 Step 1: Data Cleaning & Preprocessing

After loading the dataset, I:

Removed duplicates and nulls
Calculated RFM values
Applied square root transformation to reduce skew
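
A minimal sketch of these steps is shown here. The column names (`CustomerID`, `BookingDate`, `Amount`), the sample rows, and the snapshot date are hypothetical, since the dataset's actual schema is not listed in this README.

```python
import numpy as np
import pandas as pd

# Hypothetical booking records; the real dataset's columns may differ.
bookings = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2, 3, 3, 4],
    "BookingDate": pd.to_datetime([
        "2023-01-10", "2023-06-01", "2023-03-15", "2023-03-15",
        "2023-11-20", "2023-02-05", "2023-02-05", "2023-07-07",
    ]),
    "Amount": [120.0, 80.0, 200.0, 200.0, 50.0, 300.0, 300.0, None],
})

# Remove exact duplicate rows and rows with missing values.
clean = bookings.drop_duplicates().dropna()

# Assumed reference date for Recency: the day after the one-year window ends.
snapshot = pd.Timestamp("2024-01-01")

rfm_df = clean.groupby("CustomerID").agg(
    Recency=("BookingDate", lambda d: (snapshot - d.max()).days),
    Frequency=("BookingDate", "count"),
    Monetary=("Amount", "sum"),
)

# Square-root transform to dampen right skew in all three features.
rfm_sqrt = np.sqrt(rfm_df).add_suffix("_sqrt")
print(rfm_sqrt.round(2))
```

Keeping the raw `rfm_df` alongside the transformed copy makes it possible to report cluster averages later in the original units.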

📊 Step 2: RFM Scoring

Each customer was scored from 1 to 4 on Recency, Frequency, and Monetary quartiles using pd.qcut. The three scores were summed into an RFM_Score ranging from 3 to 12.

rfm_df['RFM_Score'] = rfm_df['R_Score'] + rfm_df['F_Score'] + rfm_df['M_Score']
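
One way the quartile scoring might look is sketched here. The sample values are made up, and two details are assumptions rather than something this README states: the Recency labels are reversed so that recent customers score higher, and ties are broken with `rank(method="first")` so that `pd.qcut` always finds four distinct bins.

```python
import pandas as pd

# Hypothetical RFM table with eight customers (illustrative values only).
rfm_df = pd.DataFrame({
    "Recency":   [5, 20, 40, 75, 110, 180, 250, 330],
    "Frequency": [12, 9, 7, 6, 4, 3, 2, 1],
    "Monetary":  [950.0, 800.0, 620.0, 500.0, 310.0, 220.0, 140.0, 60.0],
})

# Assumption: fewer days since the last booking is better, so labels run 4 -> 1.
rfm_df["R_Score"] = pd.qcut(
    rfm_df["Recency"], q=4, labels=[4, 3, 2, 1]
).astype(int)

# Ranking first breaks ties, which avoids duplicate bin edges in qcut.
rfm_df["F_Score"] = pd.qcut(
    rfm_df["Frequency"].rank(method="first"), q=4, labels=[1, 2, 3, 4]
).astype(int)
rfm_df["M_Score"] = pd.qcut(
    rfm_df["Monetary"].rank(method="first"), q=4, labels=[1, 2, 3, 4]
).astype(int)

rfm_df["RFM_Score"] = rfm_df["R_Score"] + rfm_df["F_Score"] + rfm_df["M_Score"]
print(rfm_df[["R_Score", "F_Score", "M_Score", "RFM_Score"]])
```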

🔄 Step 3: Optimal Cluster Selection

To determine the best number of clusters for K-Means, I used:

Elbow Method
Silhouette Score

from sklearn.metrics import silhouette_score

Based on these results, I chose k = 4.
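
The comparison across candidate values of k could be sketched as follows. The data here is synthetic (four well-separated blobs standing in for the scaled RFM features) and the range of k values is an assumption, so the numbers it produces say nothing about the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the transformed RFM features: four separated blobs.
rng = np.random.default_rng(42)
centers = np.array([[0, 0, 0], [10, 10, 10], [0, 10, 0], [10, 0, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    # Elbow method: plot inertia against k and look for the bend.
    inertias[k] = model.inertia_
    # Silhouette score: higher means tighter, better-separated clusters.
    silhouettes[k] = silhouette_score(X_scaled, model.labels_)

best_k = max(silhouettes, key=silhouettes.get)
print("k with the highest silhouette score:", best_k)
```

The two criteria do not always agree on real data, so it helps to inspect both before settling on a value.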

📊 Step 4: Clustering with K-Means

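I standardized the square-root-transformed RFM values and applied K-Means clustering with k = 4.

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# X holds the square-root-transformed Recency, Frequency, and Monetary columns
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=4, random_state=42)
rfm_df['Cluster'] = kmeans.fit_predict(X_scaled)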

📋 Step 5: Inspect distributions after preprocessing

Before running the clustering algorithm, I checked whether the square-root transformation actually reduced skewness and gave the Recency, Frequency, and Monetary features comparably shaped distributions.
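
A check of this kind could be as simple as comparing skewness before and after the transform. The sketch below uses synthetic right-skewed data in place of the real RFM table, so the distributions and their parameters are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed values standing in for the real RFM table.
rng = np.random.default_rng(0)
rfm_df = pd.DataFrame({
    "Recency": rng.exponential(scale=90, size=1000),
    "Frequency": rng.poisson(lam=2, size=1000) + 1,
    "Monetary": rng.lognormal(mean=5, sigma=1, size=1000),
})

rfm_sqrt = np.sqrt(rfm_df)

# Side-by-side sample skewness; values closer to 0 are more symmetric.
skew_report = pd.DataFrame({
    "raw": rfm_df.skew(),
    "sqrt": rfm_sqrt.skew(),
})
print(skew_report.round(2))
```

Histograms of each column, for example with `rfm_sqrt.hist()`, give a visual version of the same comparison.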

📋 Step 6: Summarise the resulting clusters

Once the optimal k was chosen, I calculated the mean Recency, Frequency, and Monetary values for each cluster and counted how many customers fell into each group.
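
A per-cluster profile like this can be built with a single `groupby`. The table below is a hypothetical example with three clusters and made-up values, and `cluster_profile` is just an illustrative name.

```python
import pandas as pd

# Hypothetical clustered RFM table (values are illustrative only).
rfm_df = pd.DataFrame({
    "Recency":   [10, 14, 200, 220, 90, 95],
    "Frequency": [8, 10, 1, 2, 4, 5],
    "Monetary":  [900.0, 1100.0, 80.0, 120.0, 400.0, 500.0],
    "Cluster":   [0, 0, 1, 1, 2, 2],
})

# Mean of each RFM feature plus the number of customers per cluster.
cluster_profile = (
    rfm_df.groupby("Cluster")
    .agg(
        Recency=("Recency", "mean"),
        Frequency=("Frequency", "mean"),
        Monetary=("Monetary", "mean"),
        Customers=("Recency", "size"),
    )
    .round(1)
)
print(cluster_profile)
```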

📋 Step 7: Segment Mapping

I assigned a human-readable label to each cluster based on its average RFM behavior:

cluster_labels = {
    0: 'Champions',
    1: 'At Risk',
    2: 'Loyal Customers',
    3: 'Need Attention'
}
rfm_df['Segment'] = rfm_df['Cluster'].map(cluster_labels)

📈 Step 8: Final Analysis & Insights

I aggregated cluster statistics and visualized them using bar plots.

cluster_summary = rfm_df.groupby('Segment')[['Recency', 'Frequency', 'Monetary']].mean().round(1)
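
The bar plots might be produced along these lines. The segment averages are hypothetical placeholders, and the sketch uses plain Matplotlib with a non-interactive backend; the original notebook may have used Seaborn instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical segment averages (illustrative values only).
cluster_summary = pd.DataFrame(
    {
        "Recency": [12.0, 210.0, 45.0, 150.0],
        "Frequency": [9.0, 3.0, 6.0, 1.5],
        "Monetary": [1000.0, 700.0, 450.0, 100.0],
    },
    index=pd.Index(
        ["Champions", "At Risk", "Loyal Customers", "Need Attention"],
        name="Segment",
    ),
)

# One bar chart per RFM metric, with segments on the x-axis.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, metric in zip(axes, cluster_summary.columns):
    ax.bar(cluster_summary.index, cluster_summary[metric])
    ax.set_title(f"Mean {metric} by segment")
    ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```

In a notebook the figure renders inline without the `matplotlib.use("Agg")` line; `fig.savefig(...)` writes it to disk.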

🔺 Segment Takeaways

Segment	Insight
Champions	Recently active, high value, and frequent bookers
Loyal Customers	Consistent bookers with solid spend
At Risk	Haven't booked in a while but used to spend a lot
Need Attention	Low engagement and low spend

🪧 Tools Used

Python (Pandas, Scikit-learn, Seaborn, Matplotlib)
JupyterLab (GitHub Codespaces)

📚 How to Use This Repo

git clone https://github.com/13Saksham/rfm-customer-segmentation-python.git
cd rfm-customer-segmentation-python

🙏 Acknowledgements

Kaggle Datasets
Concepts inspired by DataCamp, StackOverflow, and self-practice
