How to Remove Duplicate Contacts From a CSV

TL;DR

Duplicates creep into CSVs from multiple exports, inconsistent formatting, and merged lists - and they compound every time you import.
Exact-match deduplication catches the obvious ones, but fuzzy matching is what finds "Acme Ltd" vs "ACME Limited" vs "acme".
DataFixr's clean page handles deduplication, merging, and formatting standardisation in one upload - no scripts or spreadsheet formulas needed.

Duplicate contacts are among the most common data quality problems in B2B. They’re also among the easiest to fix - if you catch them before they hit your CRM.

The issue is that most people don’t catch them. They export a CSV, glance at the row count, and import it straight into their CRM or sequencer. Three months later, they’ve got reps emailing the same person from two different sequences, pipeline numbers that don’t add up, and an ops team spending Friday afternoons merging records that should never have been separated.

This guide covers where duplicates come from, why they matter more than you’d think, and how to remove them - including a walkthrough of doing it in DataFixr in about two minutes.

Where duplicates come from

Duplicates rarely happen because someone added the same contact twice on purpose. They happen because of how data gets combined, exported, and reformatted across tools.

There are four common sources. The first is multiple exports from the same platform at different times (Sales Navigator this week, Sales Navigator again next month - same people, slightly different data). The second is merging lists from different sources without deduplicating first. The third is inconsistent formatting that makes the same record look different (“Acme Ltd” in one row, “ACME Limited” in another). The fourth is CRM exports that include both a contact’s current and previous entry because their job title or email changed.

The tricky part is that most of these aren’t exact duplicates. The name might be slightly different, the company might be formatted differently, or one row has an email while the other has a phone number. Simple “remove duplicates” in Excel won’t catch these.
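
To make the gap concrete, here is a minimal Python sketch of exact-match deduplication. The sample rows and the exact_dedupe helper are made up for illustration and are not how any particular tool works. The first two rows are identical and collapse into one, but the third survives because its company name is formatted differently.

```python
rows = [
    {"name": "Jane Doe", "company": "Acme Ltd", "email": "jane@acme.example"},
    {"name": "Jane Doe", "company": "Acme Ltd", "email": "jane@acme.example"},
    {"name": "Jane Doe", "company": "ACME Limited", "email": "jane@acme.example"},
]


def exact_dedupe(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    kept = []
    for row in rows:
        # Every field has to match for two rows to share a key.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


survivors = exact_dedupe(rows)
print(len(survivors))  # 2: the "ACME Limited" row is not an exact match
```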

Why it matters

A few duplicates in a 200-row list might not seem like a big deal. But the problems compound quickly.

You burn credits enriching or verifying the same contact twice. Reps reach out to the same person from different sequences, which looks unprofessional and damages trust. Your CRM contact count is inflated, which skews every report built on top of it. Automations - lead scoring, territory routing, sequence enrolment - fire twice for the same person. And when you eventually do try to clean things up, you’ve got activity history split across multiple records that needs to be merged carefully.

The earlier you catch duplicates, the cheaper the fix. Deduplicating a CSV before import takes minutes. Merging duplicate CRM records with months of activity history takes hours.

How to do it manually

If you’re working with a small list and want to handle it in a spreadsheet, the basic approach is to sort by the column most likely to contain duplicates (usually email), scan for exact matches, and delete the less complete row. For near-duplicates, sort by company name and scan for variations.

This works for lists under a few hundred rows, but it doesn’t scale. You’ll miss fuzzy matches, you’ll lose data from deleted rows that had fields the surviving row didn’t, and you’ll spend time you could be spending on outreach.

How to do it in DataFixr

DataFixr’s clean page handles deduplication as part of a broader cleaning workflow - so you’re not just removing duplicates, you’re also standardising the formatting that causes them in the first place.

Here’s the step-by-step.

Step 1 - Upload your CSV

Go to portal.datafixr.io/clean and upload your CSV file. DataFixr will parse it and show you the row count and column count immediately. You can preview the raw data in the table at the bottom of the page to make sure everything loaded correctly.

Step 2 - Choose a preset

You’ll see three cleaning presets: Basic, Standard (recommended), and Aggressive.

For deduplication specifically, the key difference is how each preset handles matching. Basic runs with deduplication set to none - it cleans formatting but doesn’t remove duplicate rows. Standard deduplicates by company + domain and keeps the most complete record from each group. Aggressive uses fuzzy company matching, which catches near-duplicates like “Marks & Spencer” and “Marks and Spencer” or “Acme Ltd” and “ACME Limited.”

If your main goal is removing duplicates, start with Standard. If you suspect your list has a lot of formatting inconsistencies creating near-duplicates, go Aggressive.

Step 3 - Fine-tune the dedupe settings (optional)

Below the presets, you can adjust the dedupe mode directly. The options are: none, exact row, company, company + domain, email, phone, and fuzzy company. Each one changes what DataFixr uses as the matching key.

You can also set what happens when duplicates are found. Dedupe keep controls which row survives - first, last, or most complete. “Most complete” is usually the best choice because it keeps the row with the most populated fields.
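
If it helps to see the idea behind “most complete”, here is a rough Python sketch. It assumes rows are dictionaries of strings and that a blank matching key means the row can’t be matched. The completeness and dedupe_keep_most_complete names are hypothetical; DataFixr’s actual implementation isn’t documented here.

```python
def completeness(row):
    """Count the fields that hold a non-blank value."""
    return sum(1 for value in row.values() if value and value.strip())


def dedupe_keep_most_complete(rows, key_field):
    """Group rows by a matching key and keep the fullest row in each group."""
    groups = {}
    for index, row in enumerate(rows):
        key = (row.get(key_field) or "").strip().lower()
        # Rows with a blank key cannot be matched, so each gets its own group.
        group_id = key if key else ("__blank__", index)
        groups.setdefault(group_id, []).append(row)
    # max() returns the earliest row when two rows are equally complete.
    return [max(group, key=completeness) for group in groups.values()]


rows = [
    {"email": "jane@acme.example", "phone": "", "title": ""},
    {"email": "Jane@Acme.example", "phone": "555-0100", "title": "CTO"},
    {"email": "sam@globex.example", "phone": "", "title": "CEO"},
]
kept = dedupe_keep_most_complete(rows, "email")
```

Here the second Jane row survives because it has three populated fields against one.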

If you turn on merge duplicates by completeness, DataFixr won’t just keep one row and delete the others - it’ll merge them, pulling in any unique data from the duplicate rows into the surviving record. So if one duplicate has a phone number and the other has an email, the merged record gets both.
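
As a sketch of what merging by completeness means, the Python below starts from the fullest row in a group and fills its blanks from the others. The merge_group helper and its tie-breaking (earliest row wins) are assumptions for illustration, not DataFixr’s documented behaviour.

```python
def completeness(row):
    """Count the fields that hold a non-blank value."""
    return sum(1 for value in row.values() if value and value.strip())


def merge_group(group):
    """Merge duplicate rows into one record without overwriting existing values."""
    # sorted() is stable, so equally complete rows keep their original order.
    ordered = sorted(group, key=completeness, reverse=True)
    merged = dict(ordered[0])
    for other in ordered[1:]:
        for field, value in other.items():
            current = (merged.get(field) or "").strip()
            if not current and (value or "").strip():
                merged[field] = value  # fill a blank from a duplicate row
    return merged


duplicates = [
    {"name": "Jane Doe", "email": "jane@acme.example", "phone": ""},
    {"name": "Jane Doe", "email": "", "phone": "555-0100"},
]
merged = merge_group(duplicates)
```

The merged record ends up with both the email from the first row and the phone number from the second.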

Step 4 - Run the clean

Hit Clean data. A confirmation modal shows you the credit cost (1 credit per 10 rows, based on your uploaded row count) and your balance before and after. Confirm and the clean runs.
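
If you want to estimate the cost before uploading, the arithmetic is simple. The sketch below assumes partial blocks of ten rows are rounded up, which this guide does not confirm, so treat the modal as the source of truth. The estimate_credits function is hypothetical.

```python
def estimate_credits(row_count, rows_per_credit=10):
    """Estimate credits for a clean at 1 credit per 10 rows.

    Assumption: a partial block of rows is rounded up to a full credit.
    """
    if row_count < 0:
        raise ValueError("row_count cannot be negative")
    return -(-row_count // rows_per_credit)  # ceiling division


print(estimate_credits(200))  # 20
print(estimate_credits(205))  # 21 under the round-up assumption
```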

Step 5 - Review the results

Once it’s done, DataFixr shows you a full breakdown: rows in vs rows out, duplicates removed, cell changes, emails and phones normalised, countries standardised, and formula cells escaped.

Open the change preview to see exactly what was changed - row by row, field by field, with before and after values and the rules that were applied. If you used fuzzy matching, you’ll also see the fuzzy duplicate clusters - groups of company names that DataFixr identified as likely duplicates, with examples and row numbers.

Step 6 - Download the cleaned file

If everything looks right, hit Download CSV. The output file has the suffix _fixr_cleaned.csv so you can keep it alongside the original. From here, it’s ready to import into your CRM, load into a sequencer, or push into whatever comes next in your workflow.

Which dedupe mode should you use?

It depends on what your list looks like.

Email is the safest starting point if every row has an email address. It’s deterministic - the same email address almost always means the same person. The main exception is a shared inbox such as an info@ or sales@ address.
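
Before picking email mode, it’s worth checking how many rows actually have an email. Here is a small Python sketch that counts missing emails and duplicate rows after trimming whitespace and lowercasing. The email_report name and the email column name are assumptions, and lowercasing treats addresses as case-insensitive, which is how most mail systems behave in practice.

```python
from collections import Counter


def email_report(rows, field="email"):
    """Summarise email coverage and duplication for a list of row dicts."""
    keys = [(row.get(field) or "").strip().lower() for row in rows]
    filled = [key for key in keys if key]
    counts = Counter(filled)
    return {
        "rows": len(rows),
        "missing_email": len(keys) - len(filled),
        # Every occurrence after the first counts as a duplicate row.
        "duplicate_rows": sum(n - 1 for n in counts.values()),
    }


report = email_report([
    {"email": "jane@acme.example"},
    {"email": " Jane@Acme.example "},
    {"email": ""},
    {"email": "sam@globex.example"},
])
print(report)  # {'rows': 4, 'missing_email': 1, 'duplicate_rows': 1}
```

If missing_email is high, a company-based mode is probably a better fit for that list.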

Company + domain works well for lists where multiple contacts might share an email domain but you only want one contact per company. This is common with prospecting lists where you’ve pulled several people from the same organisation.

Fuzzy company is for lists where company names are inconsistent. It uses similarity matching to group names that are close but not identical, such as “Procter & Gamble” and “Procter and Gamble”. An abbreviation like “P&G” only joins that group if you add it to the company alias map. This catches the duplicates that exact matching misses.

Exact row is the most conservative - it only removes rows where every single field is identical. Useful as a safety net, but it won’t catch much in real-world data.

Tips for cleaner deduplication

For a broader comparison of tools that handle deduplication alongside other cleaning tasks, see best CSV cleaning tools for sales and RevOps teams.

Standardise before you dedupe. If “ACME LTD” and “Acme Limited” get normalised to the same value before the duplicate check runs, exact matching works much better. DataFixr does this automatically - it strips legal suffixes, normalises casing, and collapses whitespace before running the dedupe pass.
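
Here is a minimal Python sketch of that kind of normalisation, in case you want to see why it makes exact matching work. The suffix list and the normalise_company function are illustrative assumptions, not DataFixr’s actual rules.

```python
# Illustrative suffix list only; a real one would be longer.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "incorporated", "llc", "plc", "corp", "gmbh"}


def normalise_company(name):
    """Lowercase, collapse whitespace, and strip trailing legal suffixes."""
    # split() with no arguments also collapses runs of whitespace.
    words = [word.strip(".,") for word in name.lower().split()]
    words = [word for word in words if word]
    # Never strip the name down to nothing (e.g. a company called "Limited").
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


print(normalise_company("ACME LTD"))       # acme
print(normalise_company("Acme  Limited"))  # acme
print(normalise_company("Acme, Inc."))     # acme
```

Once all three variants normalise to the same value, a plain exact match on that value groups them.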

Use the company alias map for known variations. If your data regularly contains abbreviations like “M&S” for “Marks and Spencer” or “P&G” for “Procter and Gamble,” add them to the alias map in the advanced settings. DataFixr will resolve them to the canonical name before matching.
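
Conceptually, an alias map is just a lookup that runs before matching. The Python sketch below shows the idea with a hypothetical ALIASES dictionary and resolve_alias function. The format DataFixr expects in its advanced settings may differ.

```python
# Hypothetical alias map: abbreviation -> canonical name, both lowercase.
ALIASES = {
    "m&s": "marks and spencer",
    "p&g": "procter and gamble",
}


def resolve_alias(name, aliases=ALIASES):
    """Return the canonical name for a known alias, else the cleaned-up name."""
    key = " ".join(name.lower().split())
    return aliases.get(key, key)


print(resolve_alias("M&S"))    # marks and spencer
print(resolve_alias(" P&G "))  # procter and gamble
print(resolve_alias("Acme"))   # acme
```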

Always review fuzzy clusters before trusting them. Fuzzy matching is powerful but not infallible - “Riverside Health” and “Riverside Heating” are similar strings but different companies. The cluster preview lets you sanity-check before anything gets merged.
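
To see how a false positive like that happens, here is a rough Python sketch of fuzzy clustering. It uses difflib.SequenceMatcher from the standard library as a stand-in similarity measure. DataFixr’s actual matching algorithm and threshold aren’t described in this guide, and the 0.8 cut-off and the fuzzy_clusters helper are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-1 similarity score for two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_clusters(names, threshold=0.8):
    """Greedily group each name with the first cluster it resembles."""
    clusters = []
    for name in names:
        for cluster in clusters:
            # Compare against the first name in the cluster only.
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


names = [
    "Marks & Spencer",
    "Marks and Spencer",
    "Riverside Health",
    "Riverside Heating",
]
for cluster in fuzzy_clusters(names):
    print(cluster)
```

With this threshold, the two Marks and Spencer variants are grouped correctly, but the two Riverside companies also land in one cluster. Raising the threshold reduces false positives at the cost of missing some real duplicates, which is why a manual check of the clusters is worth the minute it takes.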

Wrapping up

After deduplication, the guide “How to clean a lead list before CRM import” covers the remaining steps before uploading to your CRM. For a full pre-import cleaning workflow - deduplication, validation, and field mapping - see “CSV cleaning tool for CRM imports”. Not sure what problems your CSV has? Use the free CSV health checker to get a quick readiness report before you start.

Deduplication isn’t glamorous, but it’s one of the highest-ROI data tasks you can do. Every duplicate you remove before import is a credit saved, a misfire prevented, and a report that’s slightly more accurate.

If you’re doing this manually in spreadsheets, you’re probably spending more time on it than you need to - and missing the fuzzy matches that cause the most problems downstream.

Try the DataFixr clean page →

DataFixr handles deduplication, formatting standardisation, validation, and formula protection in a single upload-and-clean workflow. Start using DataFixr free →

