
How to Clean a CSV File Before Uploading It to HubSpot

A practical checklist for cleaning a CSV before importing it into HubSpot - covering duplicates, field mapping, formatting, email validation, company names, lifecycle stages, and CRM hygiene.

Zacc
Director
24 Apr 2026 10 min read
TL;DR
  • Before importing a CSV into HubSpot, check duplicates, required fields, column names, email formats, company names, lifecycle stages, opt-out fields, and owner mapping.
  • Most HubSpot import problems start before the upload screen: messy source files, inconsistent formatting, and unclear field mapping.
  • The safest workflow is to clean, standardise, deduplicate, validate, preview, then import - rather than trying to fix everything after it hits the CRM.

A CSV import looks simple until it goes wrong.

You upload the file, map the columns, click import, and wait for the records to appear in HubSpot.

Then the problems start.

Duplicate contacts appear. Companies do not match. Lifecycle stages are wrong. Owners are missing. Phone numbers are inconsistent. Email fields contain junk. Some records overwrite useful data. Others fail to import. Reps start asking which version of the data they should trust.

The issue is usually not HubSpot itself.

The issue is the file you uploaded.

Most CRM import problems begin before the upload screen. If the CSV is messy, incomplete, duplicated, or badly mapped, the CRM will inherit those problems. And once bad data enters the CRM, it becomes harder to fix because it starts touching workflows, reports, sequences, tasks, owners, deals, and automations.

This guide walks through how to clean a CSV before uploading it to HubSpot, so your CRM stays usable and your team does not spend the next week fixing avoidable import mistakes.


The short version

If you’re choosing a tool to help, the guide to the best CSV cleaning tools for sales and RevOps compares the options.

Before uploading a CSV to HubSpot, check six things:

  1. Are any records duplicated?
  2. Are the columns mapped clearly?
  3. Are emails, phone numbers, names, and company fields formatted consistently?
  4. Are required CRM fields present?
  5. Are lifecycle stages, owners, countries, and other picklist values valid?
  6. Are suppression, opt-out, and compliance fields handled correctly?

The goal is not just to get the file imported.

The goal is to make sure the imported records are clean enough to use.


Why CSV imports cause CRM hygiene problems

A CSV is flexible. That is useful, but it is also dangerous.

One spreadsheet can contain data from LinkedIn exports, event lists, enrichment tools, scraped sources, old CRM exports, partner lists, webinar attendees, and manually edited rows.

Each source may use different field names, formatting, naming conventions, and levels of quality.

For example:

  • One row uses “United Kingdom” and another uses “UK”
  • One company is listed as “Acme Ltd” and another as “ACME LIMITED”
  • One phone number includes the country code and another does not
  • One email field contains a personal Gmail address
  • One record has a company domain and another only has a company name
  • One contact appears three times with slight variations
  • One column says “Job Title” and another says “Position”
  • One owner value uses a name, another uses an email address
  • One row has “CEO”, another has “Founder & CEO”, and another has “Chief Executive Officer”

None of these issues feel dramatic in a spreadsheet.

But inside a CRM, they matter.

They affect matching, routing, segmentation, reporting, enrichment, personalisation, deduplication, workflow triggers, and rep trust.


Step 1: Save a clean working copy

Before you touch the file, save a copy of the original.

This sounds basic, but it matters.

If the import goes wrong or someone asks where a value came from, you need a source file to compare against. Never clean directly on the only copy of the CSV.

Use a simple naming convention like:

event-leads-original-2026-04-24.csv
event-leads-cleaned-for-hubspot-2026-04-24.csv

If multiple people are involved, make it clear which file is the source and which file is the upload-ready version.
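
The copy step can be scripted so nobody edits the source by accident. The sketch below is a minimal example, assuming the original file name contains the word "original" as in the convention above; the function name `make_working_copy` is hypothetical.

```python
import shutil
from pathlib import Path


def make_working_copy(original: Path, suffix: str = "cleaned-for-hubspot") -> Path:
    """Copy the original CSV to a sibling file and return the new path."""
    stem = original.stem
    if "original" in stem:
        # event-leads-original-... becomes event-leads-cleaned-for-hubspot-...
        new_stem = stem.replace("original", suffix)
    else:
        new_stem = f"{stem}-{suffix}"
    target = original.with_name(new_stem + original.suffix)
    if target.exists():
        # Refuse to clobber an earlier cleaning pass.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.copy2(original, target)
    return target
```

All later cleaning then happens on the returned path, and the original stays untouched for comparison.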


Step 2: Remove obvious junk rows

Start by deleting rows that should never reach the CRM.

That might include:

  • Empty rows
  • Test records
  • Placeholder values
  • Internal team members
  • Supplier or vendor contacts
  • Competitors
  • Students or personal contacts if they are not relevant
  • Rows with no usable contact or company identifier
  • Records that clearly fall outside your ICP

A lot of teams skip this step because they assume the CRM can hold everything.

It can. But that does not mean it should.

Every bad record creates noise. It can affect reporting, waste rep time, trigger irrelevant automation, and make segmentation harder later.
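
A first pass at junk removal might look like the sketch below. The placeholder list, the internal domain, and the column names ("Work Email", "Company Name") are assumptions for illustration; swap in whatever your file actually uses. Dropped rows are returned rather than discarded so they can be reviewed.

```python
# Assumed placeholder values and internal domain; adjust to your data.
PLACEHOLDERS = {"", "n/a", "na", "none", "unknown", "test", "-"}
INTERNAL_DOMAINS = {"yourcompany.com"}


def is_junk(row: dict) -> bool:
    """Return True for rows that should never reach the CRM."""
    email = (row.get("Work Email") or "").strip().lower()
    company = (row.get("Company Name") or "").strip().lower()
    # No usable contact or company identifier.
    if email in PLACEHOLDERS and company in PLACEHOLDERS:
        return True
    # Obvious test records.
    if email.startswith("test@"):
        return True
    # Internal team members.
    if email.rpartition("@")[2] in INTERNAL_DOMAINS:
        return True
    return False


def drop_junk(rows):
    """Split rows into (kept, dropped) so dropped rows can still be audited."""
    kept, dropped = [], []
    for row in rows:
        (dropped if is_junk(row) else kept).append(row)
    return kept, dropped
```

Checks such as "competitor" or "outside ICP" usually need a human decision, so they are deliberately left out of this sketch.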


Step 3: Standardise column names

Before mapping fields into HubSpot, make your CSV headers clear.

Avoid vague column names like:

  • Name
  • Info
  • Status
  • Type
  • Notes
  • Source
  • Company
  • Number

Use specific names instead:

  • First Name
  • Last Name
  • Work Email
  • Mobile Phone
  • Company Name
  • Company Domain
  • Job Title
  • Lead Source
  • Lifecycle Stage
  • Contact Owner
  • Country
  • Industry

Good column names reduce mapping mistakes.

They also make it easier for someone else to review the file before import.

A simple rule: if a new team member could not understand the column without asking, rename it.
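
One way to apply this is a small rename table that is reviewed once and reused. The mapping below is hypothetical; which vague header means which specific field depends entirely on your source file.

```python
# Hypothetical mapping from vague source headers to specific ones.
HEADER_MAP = {
    "name": "Full Name",
    "email": "Work Email",
    "company": "Company Name",
    "position": "Job Title",
    "number": "Mobile Phone",
    "source": "Lead Source",
}


def rename_headers(headers):
    """Rename known vague headers; list the ones the mapping did not touch."""
    renamed, untouched = [], []
    for header in headers:
        key = header.strip().lower()
        if key in HEADER_MAP:
            renamed.append(HEADER_MAP[key])
        else:
            renamed.append(header.strip())
            untouched.append(header.strip())
    return renamed, untouched
```

The second return value is the list to eyeball: anything left untouched is either already specific or a header nobody has classified yet.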


Step 4: Split full names into first and last name

Many CSVs contain a single “Full Name” column.

That might look fine in a spreadsheet, but it is usually less useful in a CRM. Sales emails, personalisation variables, routing rules, and CRM fields often work better when first and last names are separate.

For example:

Sophie Chen should become:

  • First Name: Sophie
  • Last Name: Chen

This is not always perfect. Some names include prefixes, suffixes, middle names, double-barrelled surnames, or non-Western naming structures. Do not assume every name splits neatly.

But for most B2B lists, separating obvious first and last names makes the data more usable.

Keep the original full name in a separate column if you are unsure.
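
A cautious split only handles the unambiguous case and flags the rest. This is a minimal sketch, assuming that exactly two whitespace-separated tokens means first name plus last name; the "Needs Review" column is a hypothetical helper, not a HubSpot property.

```python
def split_full_name(full_name: str) -> dict:
    """Split only the unambiguous two-token case; flag everything else."""
    tokens = full_name.split()
    if len(tokens) == 2:
        return {"First Name": tokens[0], "Last Name": tokens[1], "Needs Review": False}
    # Prefixes, middle names, multi-part surnames: leave for a human.
    return {"First Name": "", "Last Name": "", "Needs Review": True}
```

For example, "Sophie Chen" splits cleanly, while "Dr Maria de la Cruz" is flagged instead of being guessed at.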


Step 5: Clean email addresses

Email is one of the most important fields in a HubSpot import because it is often used to identify and update contact records.

Before importing, check that emails are:

  • Lowercase
  • Free from spaces
  • Free from commas or semicolons
  • Structurally valid
  • Not obviously fake
  • Not duplicated across multiple contacts
  • Not personal addresses unless that is acceptable for your workflow
  • Not role-based addresses if you only want named contacts

Examples of values to review:

  • info@company.com
  • sales@company.com
  • admin@company.com
  • test@test.com
  • john@
  • n/a
  • unknown
  • john.smith @ company.com

A malformed email can break imports or create records that are not useful for outreach.

A valid-looking email can still be risky if it has not been verified. If the file is being used for outbound, validation should happen before the data reaches a sequencer.
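
The structural checks above can be sketched as follows. The regular expression is deliberately loose and only tests shape; it does not verify that a mailbox exists. The role-prefix list is an assumption, and malformed values are flagged rather than silently repaired.

```python
import re

# Loose structural check: something@something.tld with no spaces, commas or semicolons.
EMAIL_RE = re.compile(r"^[^@\s,;]+@[^@\s,;]+\.[^@\s,;.]+$")
ROLE_PREFIXES = {"info", "sales", "admin", "support", "hello"}  # assumed list


def check_email(raw: str):
    """Return (cleaned_email, issue); issue is None when nothing was flagged."""
    email = raw.strip().lower()
    if not EMAIL_RE.match(email):
        return email, "malformed"
    local, _, domain = email.partition("@")
    if local == "test" or domain.startswith("test."):
        return email, "looks fake"
    if local in ROLE_PREFIXES:
        return email, "role-based"
    return email, None
```

Running it over the examples above flags `john@`, `n/a`, and `john.smith @ company.com` as malformed, `test@test.com` as fake-looking, and `info@company.com` as role-based.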


Step 6: Standardise phone numbers

Phone numbers are one of the messiest fields in B2B data.

The same number can appear in multiple formats:

  • 07123 456789
  • +44 7123 456789
  • 0044 7123 456789
  • (07123) 456789
  • 7123456789
  • Call main office
  • N/A

Before importing, decide what format you want.

For UK records, an international format such as +44 7123 456789 is usually cleaner because it makes the country context explicit. If your team works across regions, this matters even more.

Also separate phone types if possible.

A direct dial, mobile number, switchboard, and company mainline are not the same thing. If they are all dumped into one phone field, reps lose context.
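
As an illustration, the sketch below converts the UK formats listed above into a single +44 form. It assumes every row is a UK record and that a valid national number has 10 digits once the leading 0 is removed, which holds for most but not all UK numbers. Anything else returns `None` for manual review.

```python
import re


def normalise_uk_phone(raw: str):
    """Return a +44 number, or None when the value needs manual review."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if digits.startswith("0044"):
        national = digits[4:]
    elif digits.startswith("44") and len(digits) == 12:
        national = digits[2:]
    elif digits.startswith("0"):
        national = digits[1:]
    else:
        # Assumption: a bare value is a UK number with its leading 0 dropped.
        national = digits
    if len(national) != 10:
        return None
    return "+44" + national
```

Text values such as "Call main office" contain no digits and come back as `None`, so they can be moved to a notes column instead of the phone field.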


Step 7: Standardise company names

Company names cause a lot of duplicate account problems.

The same company may appear as:

  • Acme Ltd
  • ACME LIMITED
  • Acme
  • Acme Group
  • Acme UK
  • Acme Technologies Ltd

Before importing, standardise obvious variations.

That does not always mean stripping every suffix. Sometimes the suffix matters. But you should at least remove inconsistent capitalisation, extra spaces, obvious typos, and formatting differences.

Where possible, use company domain as a matching field.

Company names can vary. Domains are often more stable.

For example:

  • Acme Ltd
  • ACME LIMITED
  • Acme Group

may all share:

  • acme.com

That makes matching easier.
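
A matching key can capture this idea without altering the name that gets imported. The sketch below prefers the domain when one is present and otherwise falls back to a normalised name; the suffix list is an assumption and should be checked against your own data before any suffix is ignored.

```python
import re

LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp", "inc", "llc"}  # assumed list


def company_match_key(name: str, domain: str = "") -> str:
    """Build a comparison key; the original company name is left untouched."""
    if domain.strip():
        d = domain.strip().lower()
        d = re.sub(r"^https?://", "", d)  # drop the scheme
        d = re.sub(r"^www\.", "", d)      # drop a leading www.
        return "domain:" + d.split("/")[0]
    # Fall back to the name: lowercase, strip punctuation, collapse spaces.
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return "name:" + " ".join(words)
```

With names alone, "Acme Ltd" and "ACME LIMITED" share a key but "Acme Group" does not, which is the cautious outcome. Once all three carry acme.com, they share one domain key.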


Step 8: Check for duplicates before import

For a detailed walkthrough on deduplication specifically, see how to remove duplicate contacts from a CSV.

Do not wait until after import to deal with duplicates.

Check the CSV for duplicates using fields like:

  • Email address
  • Company domain
  • Company name
  • Phone number
  • LinkedIn URL
  • First name + last name + company
  • Company name + country

The right matching rule depends on the dataset.

For contacts, email is usually the strongest dedupe key when available. But not every file has email addresses, and not every duplicate uses the same email.

For companies, domain is usually stronger than company name. But some companies have multiple domains, regional domains, or parent-subsidiary structures.

This is why deduplication should be reviewed, not blindly automated.

The aim is to remove clear duplicates while avoiding accidental merges between genuinely different records.
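
In keeping with "reviewed, not blindly automated", the sketch below only groups likely duplicates and leaves the decision to a person. The default key field "Work Email" is an assumption; pass a different tuple, such as first name, last name, and company, when the file has no emails.

```python
from collections import defaultdict


def flag_duplicates(rows, key_fields=("Work Email",)):
    """Group row indexes by a normalised key; return only groups with more than one row."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        parts = [(row.get(field) or "").strip().lower() for field in key_fields]
        if all(parts):  # rows with a blank key field cannot be matched on it
            groups[tuple(parts)].append(index)
    return {key: indexes for key, indexes in groups.items() if len(indexes) > 1}
```

The result maps each duplicated key to the row positions that share it, which is enough to build a review sheet without deleting anything.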


Step 9: Review lifecycle stages and statuses

Lifecycle stage fields can cause reporting problems if they are imported carelessly.

Before importing, ask:

  • Should these records be leads, subscribers, MQLs, SQLs, opportunities, or something else?
  • Are you updating existing records or only creating new ones?
  • Should the CSV be allowed to overwrite lifecycle stage?
  • Does the value in the file exactly match the values used in HubSpot?
  • Will changing this field trigger any workflows?

This is one of the most common import mistakes.

A team uploads a list and accidentally changes lifecycle stages for existing contacts. Then reports become unreliable, automations trigger unexpectedly, and sales teams lose context.

If you are unsure, do not overwrite lifecycle stage automatically.

Create a review column instead.
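
A review column could be populated along these lines. The stage set below lists HubSpot's default lifecycle stage labels, but a portal may have customised them, so treat it as an assumption and replace it with the values from your own account. The `existing` argument is assumed to come from a prior HubSpot export.

```python
# Default stage labels; replace with the values your portal actually uses.
VALID_STAGES = {
    "Subscriber", "Lead", "Marketing Qualified Lead", "Sales Qualified Lead",
    "Opportunity", "Customer", "Evangelist", "Other",
}


def lifecycle_review(incoming: str, existing: str = "") -> str:
    """Return the text for a lifecycle review column."""
    value = incoming.strip()
    if not value:
        return "ok: no stage in file"
    if value not in VALID_STAGES:
        return "review: not a known stage"
    if existing and existing != value:
        return "review: would overwrite existing stage"
    return "ok"
```

Rows marked "review" can then be held back or imported without the lifecycle column.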


Step 10: Check owner mapping

If your CSV contains owners, make sure the values match how your CRM expects ownership to be assigned.

Owner fields often fail because the CSV contains:

  • Nicknames
  • Old employees
  • Full names instead of emails
  • Emails that do not match CRM users
  • Blank owner values
  • Regional owner names that are not unique

A bad owner import can send leads to the wrong rep or leave records unassigned.

Before upload, check that every owner value is valid.

If ownership should be assigned later based on region, territory, company size, or segment, it may be better to import the records without owner values and let your routing rules handle it.
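
The owner check can be sketched as a comparison against a list of CRM user emails. That list is assumed to come from your HubSpot user settings or an export, and the "Contact Owner" column name is also an assumption.

```python
def check_owners(rows, crm_user_emails, field="Contact Owner"):
    """Return {row_index: problem} for owner values that will not map cleanly."""
    valid = {email.strip().lower() for email in crm_user_emails}
    problems = {}
    for index, row in enumerate(rows):
        owner = (row.get(field) or "").strip().lower()
        if not owner:
            problems[index] = "blank"  # fine if routing rules assign owners later
        elif "@" not in owner:
            problems[index] = "not an email"
        elif owner not in valid:
            problems[index] = "no matching CRM user"
    return problems
```

Blank values are reported rather than treated as errors, since leaving ownership to routing rules may be intentional.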


Step 11: Validate picklist fields

Picklist fields need exact values.

That might include:

  • Country
  • Industry
  • Company size
  • Lead source
  • Lifecycle stage
  • Persona
  • Department
  • Region
  • Territory
  • Segment

If your CRM uses “United Kingdom” but your CSV says “UK”, “GB”, “Great Britain”, and “England”, your import may fail or create messy segmentation.

Picklist values should be standardised before import.

Do not rely on someone fixing these manually later.

They usually will not.
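
A small alias table handles most of this. In the sketch below, the alias entries and the allowed set are illustrative assumptions; the allowed values should be copied from the property's options in your CRM. Note that mapping "England" to "United Kingdom" loses detail, so only do it if that is acceptable for your segmentation.

```python
# Illustrative aliases; keys are lowercase.
COUNTRY_ALIASES = {
    "uk": "United Kingdom",
    "gb": "United Kingdom",
    "great britain": "United Kingdom",
    "england": "United Kingdom",
}


def standardise_picklist(value: str, aliases: dict, allowed: set):
    """Return (value, is_valid); unknown values come back unchanged and invalid."""
    cleaned = value.strip()
    mapped = aliases.get(cleaned.lower(), cleaned)
    if mapped in allowed:
        return mapped, True
    return cleaned, False
```

Values that come back invalid should be fixed in the file or added to the alias table, not left for the import to reject.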


Step 12: Protect opt-outs and suppression fields

This is one of the most important checks.

Before importing a CSV, make sure you understand whether any records are unsubscribed, suppressed, opted out, blocked, or excluded from outreach.

Do not let a new import accidentally reactivate people who should not be contacted.

Useful fields to review include:

  • Email opt-out
  • Do not call
  • Suppression reason
  • Consent status
  • Legitimate interest basis
  • Source
  • Date collected
  • Last updated
  • Data retention date

Even if these fields are not all in your CRM today, your import process should account for them.

Bad suppression handling creates operational and compliance risk.
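
One defensive pattern is to hold back any row that matches a suppression list before the file is finalised. The sketch assumes you can export suppressed or unsubscribed emails from your CRM, and that the file uses the hypothetical columns "Work Email" and "Email opt-out".

```python
TRUTHY = {"true", "yes", "1", "y"}


def split_suppressed(rows, suppressed_emails):
    """Split rows into (safe, held); held rows must not be imported as contactable."""
    blocked = {email.strip().lower() for email in suppressed_emails}
    safe, held = [], []
    for row in rows:
        email = (row.get("Work Email") or "").strip().lower()
        opted_out = (row.get("Email opt-out") or "").strip().lower() in TRUTHY
        if email in blocked or opted_out:
            held.append(row)
        else:
            safe.append(row)
    return safe, held
```

Held rows are kept in a separate list so there is a record of who was excluded and why.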


Step 13: Decide what can overwrite existing records

Not every field in the CSV should be allowed to update HubSpot.

Before importing, divide fields into three groups.

Safe to fill if blank

These are fields that can be added when the CRM has no value.

Examples might include:

  • Job title
  • Company domain
  • Industry
  • LinkedIn URL
  • Company size

Review before overwriting

These are fields where the CSV might be newer, but the CRM might be more trusted.

Examples might include:

  • Phone number
  • Lifecycle stage
  • Owner
  • Company name
  • Country
  • Segment

Do not overwrite

These are fields where changing values could break workflows, reporting, or compliance.

Examples might include:

  • Opt-out fields
  • Suppression status
  • Original source
  • Lifecycle stage history
  • Manually verified fields
  • Customer status

This decision should happen before upload.

Once the data is imported, it is harder to separate a good update from a bad one.
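
The three groups can be expressed as a pre-import comparison between a HubSpot export and the incoming file. This is a sketch of that idea, not a HubSpot feature; the field names in each set are examples taken from the lists above, and any field not named falls into the review group.

```python
FILL_IF_BLANK = {"Job Title", "Company Domain", "Industry", "LinkedIn URL", "Company Size"}
NEVER_OVERWRITE = {"Email opt-out", "Suppression Status", "Original Source", "Customer Status"}


def plan_update(existing: dict, incoming: dict):
    """Return (apply, review): safe fills, and changes that need a human decision."""
    apply, review = {}, {}
    for field, new_value in incoming.items():
        new = (new_value or "").strip()
        old = (existing.get(field) or "").strip()
        if field in NEVER_OVERWRITE or not new or new == old:
            continue  # protected field, nothing to add, or no change
        if field in FILL_IF_BLANK and not old:
            apply[field] = new
        else:
            review[field] = (old, new)
    return apply, review
```

Only the `apply` values go into the upload file automatically; everything in `review` is resolved by a person first.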


Step 14: Run a final import-readiness check

Before uploading the file, ask these questions:

  • Are all required fields present?
  • Are emails formatted and validated?
  • Are phone numbers standardised?
  • Are company names cleaned?
  • Are duplicate contacts removed or flagged?
  • Are duplicate companies removed or flagged?
  • Are picklist values valid?
  • Are owners mapped correctly?
  • Are lifecycle stages intentional?
  • Are opt-outs protected?
  • Are source fields clear?
  • Are risky overwrite fields removed or reviewed?
  • Has someone previewed the final file?

This final check takes less time than cleaning up a bad import later.
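
Part of this checklist can be automated. The sketch below covers only the first question, required fields, using Python's standard `csv` module; the `REQUIRED` tuple is an assumption and should list whatever your import actually needs.

```python
import csv
import io

REQUIRED = ("Work Email", "First Name", "Last Name")  # assumed required columns


def readiness_report(csv_text: str, required=REQUIRED) -> dict:
    """Count rows, missing required columns, and blank required values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    missing_columns = [column for column in required if column not in headers]
    blank_counts = {column: 0 for column in required if column in headers}
    total = 0
    for row in reader:
        total += 1
        for column in blank_counts:
            if not (row[column] or "").strip():
                blank_counts[column] += 1
    return {"rows": total, "missing_columns": missing_columns, "blank_counts": blank_counts}
```

A report with no missing columns and zero blank counts does not prove the file is clean, but a report with problems is a clear reason not to upload yet.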


A simple CSV cleaning workflow

For keeping your CRM clean after import, the CRM data hygiene checklist for sales and RevOps covers the recurring maintenance steps.

A good HubSpot import process usually looks like this:

  1. Save the original file
  2. Remove junk rows
  3. Standardise column names
  4. Split and format key fields
  5. Clean emails and phone numbers
  6. Standardise company names and domains
  7. Deduplicate contacts and companies
  8. Validate required fields
  9. Review lifecycle stage and owner fields
  10. Protect opt-outs and suppression fields
  11. Decide overwrite rules
  12. Preview the final CSV
  13. Import a small sample if needed
  14. Upload the final file

The workflow is not complicated.

The discipline is doing it every time.


Final thought

A clean HubSpot import starts before HubSpot.

The upload screen is not the place to discover duplicates, broken emails, inconsistent company names, bad owner values, or risky overwrite fields.

By that point, the file should already be ready.

Clean first. Standardise second. Deduplicate before import. Validate before use. Protect fields that should not be overwritten.

That process keeps your CRM cleaner, your reps more confident, and your reporting more reliable.

A CSV import should improve your CRM.

It should not become the reason your team stops trusting it.

For a focused workflow covering every step of preparing a CSV for HubSpot - including deduplication, field mapping, and validation - see HubSpot import cleaning. To check your CSV import readiness in seconds, use the free CSV health checker.


DataFixr helps teams clean, deduplicate, validate, and prepare CSV files before they reach HubSpot or any other CRM - so messy imports do not become long-term CRM hygiene problems.

Frequently asked questions

What should you clean before importing a CSV into HubSpot?

Before importing a CSV into HubSpot, check for duplicate contacts, missing required fields, inconsistent email formats, badly formatted phone numbers, and company name variations. You should also remove formula injection characters, map column names to HubSpot property names, and validate email addresses to avoid bounce-related sender reputation damage.

Can HubSpot deduplicate a messy import for you?

HubSpot deduplicates on email address at import time, but it won't catch fuzzy duplicates where names, phone numbers, or company names vary slightly. If the same contact appears twice with different email addresses, both will be imported as separate records. Clean duplicates before you upload, not after.