Dedupe.io was shut down Jan 31, 2023.
The Dedupe.io team has decided to dedicate our focus to our consulting practice at DataMade and work on projects more aligned with our mission to support our clients in working toward democracy, justice, and equity.
De-duplicate and find matches in your Excel spreadsheet or database
Dedupe.io is a powerful tool that learns the best way to find similar rows in your data. Using cutting-edge research in machine learning, we quickly and accurately identify matches in your Excel spreadsheet or database, saving you time and money.
A simple tool for a complex problem
In today’s world of big data, there’s never been more information available to work with. Unfortunately, all this data is hard to use, especially if it’s been entered by hand or comes from different systems. Simply figuring out who is who in a spreadsheet or database can be a daunting, time-consuming task.

That’s where Dedupe.io comes in. We developed a dynamic, scalable solution for de-duplicating and linking datasets, and built a simple step-by-step wizard so that anyone can use it.
Read more about how and why we built Dedupe.io »
Dedupe.io uses
- De-duplicating customer records
- Combining lists of addresses or businesses
- Master data management
- Merging different database systems
- Creating a master list of products or parts
- Cleaning up lists of names and emails
- Finding contributions in campaign finance
- Cross-referencing government records
Dedupe in action
Select examples of impactful projects powered by Dedupe.io and the dedupe Python library.
How can you use Dedupe.io?
Find duplicates in a spreadsheet
Upload a spreadsheet and find all exact and similar records within it
Merge multiple files
Link together two or more spreadsheets and find overlapping records in each
Check against a canonical list
Upload a master list and check new spreadsheets against it
We find the hard matches
Real-world data is messy, and Dedupe.io was built to work with it
We find matches even when there are major data quality issues
Typos, misspellings and abbreviations
Data that is hand-typed can have misspellings, abbreviations and other typos
We match them using powerful text similarity algorithms
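As a rough illustration of what a text similarity score looks like, here is a minimal sketch using `SequenceMatcher` from Python's standard library. This is an assumption made for illustration, not the algorithm Dedupe.io used, and the `similarity` helper is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1 for how alike two strings are, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Same word, different capitalization
exact = similarity("Guadalupano", "guadalupano")
# One-letter misspelling
typo = similarity("Guadalupano", "Guadelupano")
# A different word entirely
unrelated = similarity("Guadalupano", "Paulina")

print(exact, typo, unrelated)
```

A misspelled value scores close to an exact match, while an unrelated value scores noticeably lower, so a threshold on the score can flag likely typos.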
Inconsistent formatting
Different people and systems format data differently
We parse out names, addresses and any text to make smart comparisons
| site_name | address | phone |
|---|---|---|
| Chicago Commons Guadalupano | 1814 S. Paulina 60608 | 6663883 |
| Chicago Commons Guadalupano Family Center | 1814 South Paulina 60608 | 6663884 |
| Chicago Commons Association - Guadalupano Family Center | 1814 S Paulina St | 6663883 |
| CHICAGO COMMONS ASSOCIATION GUADALUPANO FAMILY CENTER | 1814 S PAULINA 60608 | 6663883 |
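To give a sense of how differently formatted values can be brought into a comparable form, here is a minimal sketch that normalizes the addresses from the table above. The `normalize` helper and the tiny `ABBREVIATIONS` map are hypothetical and only cover this example; real address parsing is considerably more involved, and this is not Dedupe.io's actual parser.

```python
import re

# Hypothetical abbreviation map, just large enough for the sample rows
ABBREVIATIONS = {"s": "south", "st": "street"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and expand a few abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


addresses = [
    "1814 S. Paulina 60608",
    "1814 South Paulina 60608",
    "1814 S Paulina St",
    "1814 S PAULINA 60608",
]
normalized = [normalize(address) for address in addresses]

for original, cleaned in zip(addresses, normalized):
    print(f"{original!r} -> {cleaned!r}")
```

After normalization, three of the four addresses become identical strings, and the remaining one differs only in its trailing token, which a similarity score can then absorb.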
Contradictory fields
Sometimes, your data doesn't agree with itself
We compare using multiple fields to find records with the most agreement
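One simple way to picture comparing on multiple fields is a weighted average of per-field similarity scores. The sketch below assumes hypothetical weights and helper names (`WEIGHTS`, `field_similarity`, `record_score`); Dedupe.io learned how to weigh fields from training rather than using fixed values like these.

```python
from difflib import SequenceMatcher

# Hypothetical fixed weights that sum to 1; a trained model would learn these
WEIGHTS = {"site_name": 0.4, "address": 0.4, "phone": 0.2}


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two field values, from 0 to 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(r1: dict, r2: dict) -> float:
    """Weighted average of per-field similarities for two records."""
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in WEIGHTS.items())


a = {"site_name": "Chicago Commons Guadalupano",
     "address": "1814 S. Paulina 60608", "phone": "6663883"}
b = {"site_name": "Chicago Commons Guadalupano Family Center",
     "address": "1814 South Paulina 60608", "phone": "6663884"}
# Made-up record, included only as an unrelated comparison point
c = {"site_name": "Lakeview Bakery",
     "address": "22 W. Belmont 60657", "phone": "5550100"}

score_ab = record_score(a, b)
score_ac = record_score(a, c)
print(score_ab, score_ac)
```

Even though the phone numbers of the first two records disagree by one digit, agreement on the name and address keeps their combined score well above that of the unrelated pair.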
How it works
Upload your data
Upload any spreadsheet or connect directly to your database
Train it
You provide training on the right way to identify similar records in your data
Validate and download
Matches are automatically found for you to review and then download
Questions?
We're happy to help! Read our FAQ

