Preparing Data for ERP Implementation: a Plan for Cleaning Master Data and Reference Lists
Preparing data for ERP implementation: how to clean master data and reference lists, remove duplicates and align rules across departments before go-live.

Why ERP launches often fail because of data
ERP is usually implemented to bring order: unified reference lists, transparent accounting, less manual work. But if source data is messy, the system won't clean it by itself. It will quickly and strictly replicate mistakes. Therefore, preparing data for ERP implementation often matters more than configuring modules.
After go-live, the same issues surface again and again: the same counterparty is entered in different ways (TOO/LLP, abbreviations, different BIN/ID); items and services mismatch on units of measure and tax attributes; departments use different accounting rules and account names; details are incomplete or outdated; and documents don’t reconcile by analytics because fields were filled in inconsistently.
When such data enters an ERP, manual fixes become a constant burden. Managers resend cards, accounting corrects postings, procurement creates new items instead of finding existing ones. At the company level this becomes a queue of clarifications and loss of trust in reports.
Disputes between departments arise not out of malice but because their goals differ. Warehouse wants to post receipts quickly, sales wants to issue an invoice, finance needs correct analytics, legal needs accurate details. If rules are not aligned, ERP records the conflict right in the reference data.
A useful rule is simple: before go-live, fix what breaks daily processes; “cosmetic” work can be finished later. Before launch it's almost always more beneficial to remove duplicates of key master data (counterparties, item master, employees) and approve required fields, unified naming and coding rules. Descriptions, additional attributes, rare classifiers and deep historical rework can be postponed if they are not needed for initial processes.
Example: if a manufacturing company with several sites has the same supplier entered as “ABC”, “ABC Ltd” and “АБС”, purchases will be fragmented, limits and prices won't be comparable, and the accounts payable report will become a source of disputes in the first month.
Which data should be prepared for ERP first
Don't start with everything at once; start with the data the system cannot operate without on a daily basis. This data sets unified rules for all departments and has the strongest influence on whether you get order or endless manual fixes.
Master data are the main business “cards” that documents and transactions attach to. Typically these are counterparties (customers and suppliers), items and materials (item master), employees, and accounting objects: warehouses, points of sale, departments, equipment.
A separate group is reference lists and classifiers. These are sets of values that help fill documents consistently: units of measure, currencies, VAT rates, statuses (for example, “new”, “in progress”, “closed”), reasons for write-offs, contract types, country and region codes. If these are not aligned beforehand, users will start creating their own variants and reports will stop matching.
At the start, there is usually no point in moving all history into ERP. Archived transactions, closed orders and years of documents are often needed only for reference. You can leave them in the source system or in an export, and move only the necessary minimum to ERP: open documents and opening balances.
To determine the minimal set for the first launch, answer four questions:
- Which processes start on day 1 (procurement, sales, warehouse, finance)?
- Which reference lists are mandatory for these processes?
- Which fields in the cards are actually used, and which are filled “just in case”?
- Which balances and open documents must be transferred so work does not stop?
If you start with warehouse and procurement, it is critical to prepare the item master, units of measure, warehouses, suppliers and opening balances. The rest can be added iteratively after stabilization.
Roles and agreements: who decides about the data
Before cleaning, agree on the basics: who has the right to say “this is correct” for each dataset. Without this, data preparation for ERP becomes endless edits and disputes.
The best working model is “data owner + executor + approver of rules”. The owner is responsible for meaning and final quality, not for personally editing every row.
Roles are often distributed by domain like this: finance is responsible for the chart of accounts, cost items, payment terms and tax attributes; procurement — for suppliers, purchasing item master and units of measure; sales — for customers and delivery terms; HR — for employees and org structure; IT or the project team — for loading rules, version control, tools and access.
Next, fix two principles. First: each reference list must have one “source of truth” where data are considered canonical. Second: the person who approves rules (formats, required fields) should be available when problematic records are found.
A practice that saves weeks is a short decision log. For example, when merging counterparty data, pre-agree required fields (BIN/ID, full and short name, address, status), naming format (how to write “TOO”, language, case), duplicate rules (what counts as a match, who approves a merge), allowed statuses (active, archive, under review) and exception handling procedures.
If you are a manufacturing company or an integrator, it is especially important that procurement, finance and warehouse agree on the item master and counterparties before migration. Then ERP will start working as a single system, not a set of different interpretations across departments.
Inventory of data sources and fields
Inventory is the moment when you stop arguing “we have everything” and document exactly where data live and which fields are actually needed. For ERP preparation this is one of the fastest ways to spot gaps, duplicates and manual-entry points.
Start with a list of sources. Typically these are 1С, CRM, Excel files from departments, warehouse or production systems, email (requests and details in correspondence), network folders with price lists and contracts. Include “grey” sources too: personal spreadsheets and files like “final_2.xlsx”.
To keep the inventory from becoming an endless registry, record for each source the owner, how data appears (automatically or manually), the fields actually used, update frequency and main risks (empty fields, mixed formats, duplicates).
Then mark critical reference lists and documents without which ERP processes won’t take off: counterparties and contracts for invoices, item master and units for warehouse, warehouses and storage locations, employees and departments for requests and approvals. Agree on the “golden” source here: for example, counterparty details should come from one place, not three.
Separately identify points where data are entered manually without controls. A typical scenario: a manager copies a BIN from an email, accounting edits an address “as in the contract”, warehouse keeps balances in Excel. These spots generate most of the fixes after go-live and should be included in the cleanup plan and input rules.
Data profiling: quick checks before cleanup
Before mass cleanup it is useful to quickly measure what exactly is “sick” in the data. Such profiling usually takes 1–3 days and gives a clear picture: where the biggest gaps are, which fields conflict by format and how much time the fixes will take.
Checks that almost always yield the most effect:
- Completeness of required fields: empty BIN/ID, addresses, department codes, units of measure, payment terms.
- Uniqueness of keys: one BIN/ID per counterparty, one SKU per item, one employee number per person.
- Values “out of rules”: different date and phone formats, mixing Cyrillic and Latin, leading or trailing spaces.
- Logical inconsistencies: counterparty without country, item without group, employee without department.
Then compile a short quality report so the discussion rests on numbers rather than impressions. It is usually enough to record: field and source, problem type, volume (how many records) with 3–5 examples, risk for ERP processes (high/medium/low) and recommendation (fix in source, normalize by rule or correct manually with confirmation).
Example: if 18% of counterparties have no BIN/ID, duplicate merging and contract reconciliation will fail. This is almost always a higher priority than inconsistent phone formats. Such a report helps quickly form the work queue and assign responsibilities.
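The profiling checks above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the field names (`bin`, `name`, `country`) and the sample rows are hypothetical and would need to be adapted to a real export.

```python
from collections import Counter

# Hypothetical sample; field names are assumptions for illustration only.
counterparties = [
    {"id": 1, "name": "ABC Ltd", "bin": "123456789012", "country": "KZ"},
    {"id": 2, "name": " ABC ", "bin": "123456789012", "country": ""},
    {"id": 3, "name": "TechSupply", "bin": "", "country": "KZ"},
]

def profile(rows, required, key):
    """Return simple data-quality counts for a list of dict records."""
    report = {"total": len(rows), "empty": {}, "untrimmed": {}, "duplicate_keys": {}}
    for field in required:
        # Completeness: a value is missing if it is absent or whitespace-only.
        report["empty"][field] = sum(
            1 for r in rows if not str(r.get(field) or "").strip()
        )
        # Format noise: leading or trailing spaces.
        report["untrimmed"][field] = sum(
            1 for r in rows
            if isinstance(r.get(field), str) and r[field] != r[field].strip()
        )
    # Uniqueness: non-empty key values that occur more than once.
    counts = Counter(str(r.get(key) or "").strip() for r in rows)
    report["duplicate_keys"] = {k: n for k, n in counts.items() if k and n > 1}
    return report

report = profile(counterparties, required=["bin", "name", "country"], key="bin")
```

The resulting counts per field can go straight into the quality report, next to a handful of example records for each problem type.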
A step-by-step cleanup and normalization plan before go-live
To prevent data preparation for ERP from turning into endless edits, follow a simple principle: first agree what the data should look like, then clean.
A typical workflow is:
- Define the target data model in ERP: which fields are actually needed, which are required, which reference lists are shared. If ERP doesn’t have a field, or the field is calculated by rules, don’t spend time filling it in manually.
- Describe normalization rules: BIN/ID format, unified abbreviations (TOO, LLP), address rules, phone format, units of measure, allowed statuses. Decide separately what to do with empty required fields.
- Export data from sources and clean only in a working copy. Preserve the original so you can explain what was changed and why.
- Configure duplicate detection and agree on merging rules. Decide in advance which source has priority in conflicts (for example, legal name from accounting, contacts from CRM) and who approves disputed cases.
- Perform a pilot load and reconciliation: load a small sample (for example, 200 counterparties and 50 items) into the test ERP and check reports, search, and the absence of record “multiplication.”
After the pilot, formalize the result in a regulation: who creates new records, which fields are required, which checks run at input, how often duplicates are controlled. Otherwise the reference lists will become chaotic again within a month.
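One way to check the pilot load for lost or “multiplied” records is to compare the keys exported from the working copy with the keys read back from the test ERP. The sketch below assumes both sides can be reduced to plain lists of key values (for example BIN/ID or SKU); how those lists are obtained depends on the systems involved and is not shown.

```python
from collections import Counter

def reconcile(source_keys, loaded_keys):
    """Compare keys sent to the test ERP with keys found there after the load."""
    src, dst = Counter(source_keys), Counter(loaded_keys)
    return {
        "source_count": len(source_keys),
        "loaded_count": len(loaded_keys),
        # Keys that were sent but are absent after the load.
        "missing": sorted(set(src) - set(dst)),
        # Keys that exist in the target but were never sent.
        "unexpected": sorted(set(dst) - set(src)),
        # Keys that appear more often in the target than in the source.
        "multiplied": sorted(k for k in src if dst[k] > src[k]),
    }

result = reconcile(["A", "B", "C"], ["A", "A", "B", "D"])
```

An empty `missing`, `unexpected` and `multiplied` list does not prove the load is correct, but a non-empty one is a cheap early signal before anyone opens a report.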
Normalizing master data: counterparties, items, employees
Normalizing master data means bringing key reference lists to a single standard so ERP does not contain dozens of variants of the same object. This usually has the most visible effect: fewer manual fixes, fewer disputes, faster searches and reporting.
For counterparties start with a naming rule. Decide what is stored in the “Name” field: legal name, brand or short name for documents. Store legal form, BIN/ID, branch indicator and status (active/archived) separately. This avoids variants like “TOO Chamomile (Almaty) / Chamomile TOO / Chamomile LLP”.
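As an illustration of storing the legal form and branch separately from the name, here is a minimal sketch. The list of legal-form tokens and the convention that text in parentheses marks a branch are assumptions made for this example; a real rule set would come from the decision log.

```python
import re

# Assumed list of legal-form tokens; extend it to match your own agreement.
LEGAL_FORMS = {"TOO", "LLP", "LTD"}

def split_name(raw):
    """Split a raw counterparty name into legal form, core name and branch."""
    text = " ".join(raw.split())  # collapse repeated whitespace
    branch = None
    match = re.search(r"\(([^)]*)\)", text)
    if match:
        # Assumption: text in parentheses, e.g. "(Almaty)", marks a branch.
        branch = match.group(1).strip()
        text = text[:match.start()] + text[match.end():]
    words = text.split()
    forms = [w for w in words if w.upper().rstrip(".") in LEGAL_FORMS]
    core = [w for w in words if w.upper().rstrip(".") not in LEGAL_FORMS]
    return {
        "legal_form": forms[0].upper().rstrip(".") if forms else None,
        "name": " ".join(core),
        "branch": branch,
    }

variants = [split_name(v) for v in
            ["TOO Chamomile (Almaty)", "Chamomile TOO", "Chamomile  LLP"]]
```

All three variants end up with the same `name`, which is what makes them comparable. Whether “TOO” and “LLP” should be stored as one legal form is a naming decision the sketch deliberately leaves open.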
Contacts and addresses are better structured: split address into parts (country, city, street, house), store phones in one format, email without spaces. Record contact persons as separate entries, not as text in a “Comments” field.
A common issue is a “combo” field: code + text + manager note. Move notes to a separate field and store codes and values in strictly defined attributes.
Minimum rules to approve before cleanup:
- required fields for each type (counterparty, item, employee) and who fills them;
- default values (currency, unit of measure, country, department);
- ban on free-text input where a list should be used;
- unified naming and abbreviation rules;
- checks before creating a new record (by BIN/ID, phone, SKU, employee number).
If procurement, service and accounting operate simultaneously in a company, the counterparty in ERP should be a single record with roles and contracts linked to it. Then a new supplier will not become three different cards across departments.
Reference lists and classifiers: aligning without conflicts
Reference lists seem minor, but they are the source of post-launch disputes: why one report says “pcs”, another “pieces”, and a third “pc”. That is why preparing data for ERP almost always starts with aligning basic classifiers.
Start with a short core of lists that affect most documents: units of measure, currencies, countries, statuses, operation types. Then decide what is a code and what is a name. The code should be stable and short (for example, PCS); the name should be human-friendly (“piece”). If you put “pcs” or “pieces” in the code, you’ll end up with different values for the same meaning.
A simple normalization works well: one meaning — one value (no synonyms), unified format (case, spaces, punctuation, abbreviations), ban on free-text where a list must be used, and addition of new values only by request with justification.
To migrate without losses, create a mapping table “how it was” → “how it will be”. For example: “шт”, “штук”, “ед” → code PCS, name “piece”. Use it during migration, when loading history, and in integrations.
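A mapping table of this kind can be as simple as a dictionary. In the sketch below, the code PCS and the name “piece” follow the example above; the kilogram rows and the function names are assumptions added for illustration.

```python
# "How it was" -> "how it will be". The KG rows are illustrative assumptions.
UNIT_MAP = {
    "шт": "PCS", "штук": "PCS", "ед": "PCS",
    "pcs": "PCS", "pieces": "PCS", "pc": "PCS",
    "кг": "KG", "kg": "KG",
}
UNIT_NAMES = {"PCS": "piece", "KG": "kilogram"}

def map_unit(raw):
    """Return (code, name) for a legacy unit value, or None if it is unmapped."""
    key = raw.strip().lower().rstrip(".")
    code = UNIT_MAP.get(key)
    return (code, UNIT_NAMES[code]) if code else None

def map_units(values):
    """Map legacy values; collect unmapped ones for manual review."""
    mapped, unmapped = [], []
    for value in values:
        result = map_unit(value)
        if result:
            mapped.append(result[0])
        else:
            unmapped.append(value)
    return mapped, unmapped

mapped, unmapped = map_units(["шт", "pieces", "box"])
```

Returning unmapped values separately, instead of guessing, keeps new variants visible so they can be added to the table by request rather than slipping into the target list.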
Decide which lists are centralized and which can be local. Currencies and countries are usually company-wide. Internal request statuses may be allowed by branch, but that should be an explicit rule.
Duplicates: how to find, verify and merge correctly
Duplicates in ERP are dangerous because they look small but quickly become sources of disputes: sales see one counterparty, accounting another, procurement a third. Therefore it is important to define in advance what constitutes a duplicate and who resolves disputed cases.
First, define duplicate types
Duplicates are rarely exact copies. More often they are a mix of exact copies, “almost identical” records, typos and “similar but different” cases (branches, same surnames, different legal entities under one brand). To avoid merging the wrong records, describe duplicate types and exception rules in advance.
For detection choose comparison keys that truly distinguish entities. Usually this is a combination of details: BIN/ID, name, address, phone, bank details. One key almost never gives reliable results.
How to merge: the “golden record” and review queue
When a pair of records is recognized as a duplicate, you need a clear “golden record” principle: which record remains primary and which is archived or linked. It is often easiest to assign a source priority (for example, accounting over CRM) and field-level rules: which source supplies the legal name, which supplies bank details, and how contacts are handled.
If fields conflict (two different accounts, two addresses, different legal forms), don’t try to resolve automatically. Questionable cases should go to a manual review queue for data owners and the decision should be recorded: who confirmed and why.
Example: two counterparties differ by one letter in the name and have different phones. If BIN is the same — it’s almost certainly a duplicate. If BIN is empty but address and bank details match — it’s a candidate for review, not an automatic merge.
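The decision logic in this example can be written down as a small rule function. The field names (`bin`, `address`, `bank_account`) are hypothetical, and treating two different non-empty BINs as different entities is a simplifying assumption: a typo inside a BIN would need its own check.

```python
def _same(a, b, field):
    """True if both records hold a non-empty, equal value (case/space-insensitive)."""
    va = " ".join(str(a.get(field) or "").lower().split())
    vb = " ".join(str(b.get(field) or "").lower().split())
    return bool(va) and va == vb

def classify_pair(a, b):
    """Classify two counterparty records as 'duplicate', 'review' or 'different'."""
    bin_a = (a.get("bin") or "").strip()
    bin_b = (b.get("bin") or "").strip()
    if bin_a and bin_b:
        # Both identifiers are known, so the identifier decides.
        return "duplicate" if bin_a == bin_b else "different"
    # At least one BIN is missing: fall back to weaker evidence.
    if _same(a, b, "address") and _same(a, b, "bank_account"):
        return "review"  # candidate for manual review, never merged automatically
    return "different"

same_bin = classify_pair(
    {"name": "TechSupply", "bin": "123456789012", "phone": "111"},
    {"name": "TechSuply", "bin": "123456789012", "phone": "222"},
)
no_bin = classify_pair(
    {"bin": "", "address": "Almaty, Abay 1", "bank_account": "KZ001"},
    {"bin": None, "address": "almaty,  abay 1", "bank_account": "kz001"},
)
```

Pairs classified as `review` go to the data owner's queue; only `duplicate` pairs are eligible for merging by rule.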
Example scenario: preparing the counterparty directory for ERP
A typical situation: procurement has a supplier as “TOO TechSupply”, accounting as “TechSnab LLP”, and a branch as “Tech Supply (Almaty)”. Contracts don’t link, payments go “to the wrong place”, and e-invoice errors appear due to mismatched details. It’s better to sort this out once during data preparation than to fix every document manually later.
First, collect requirements from the people who actually issue documents. Procurement cares about delivery terms and contact; accounting — payment details and closing documents; compliance — statuses and checks. Agree on the minimal set of fields required for a counterparty to be active.
A typical core is:
- full legal and short name
- BIN/ID and country of registration
- bank: BIC, account number, bank name
- legal and actual address (if different)
- contact person, phone, email
Then normalize entry rules: BIN — only digits, strictly by length; phone — in one format; address — a uniform template (city, street, house, office). Validate bank details for length and allowed characters. If a company has several accounts, store them as multiple bank records under one counterparty rather than separate counterparties.
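Entry rules like these are easy to express as small validation functions. The sketch below is illustrative: the 12-digit BIN length, the accepted phone length range and the `+digits` output format are assumptions to be replaced by whatever the decision log specifies.

```python
import re

BIN_LENGTH = 12  # assumption: set this to the length your registry uses

def validate_bin(value, length=BIN_LENGTH):
    """Return the cleaned BIN, or raise ValueError if it breaks the rule."""
    cleaned = re.sub(r"\s+", "", value or "")
    if not (cleaned.isascii() and cleaned.isdigit()) or len(cleaned) != length:
        raise ValueError(f"BIN must be exactly {length} digits: {value!r}")
    return cleaned

def normalize_phone(value):
    """Reduce a phone number to '+' followed by digits only."""
    digits = re.sub(r"[^0-9]", "", value or "")
    if not 10 <= len(digits) <= 15:  # assumed plausible length range
        raise ValueError(f"unexpected phone length: {value!r}")
    return "+" + digits

def format_address(city, street, house, office=None):
    """Build an address from parts using one template: city, street, house, office."""
    parts = [city, street, house, office]
    return ", ".join(p.strip() for p in parts if p and p.strip())
```

Raising an error instead of silently “fixing” a bad BIN keeps problematic records visible, so they can be routed to the person who owns the rule.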
When merging, do not simply “delete extras”; merge by a clear rule. For example: the main profile becomes the one with a confirmed BIN and current payment details, alternative names are kept as synonyms for search. The final decision is usually made by the data owner (often accounting), with procurement confirming operational contacts.
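A merge “by a clear rule” might look like the following sketch. It uses source priority with optional per-field overrides, keeps every conflicting variant for review, and returns alternative names as search synonyms. The source names (`accounting`, `crm`) and field names are assumptions for the example.

```python
def merge_golden(records, priority, field_source=None):
    """Merge duplicate records into one golden record.

    records: {source_name: record_dict}
    priority: source names, highest priority first
    field_source: optional {field: source_name} overrides for specific fields
    Returns (golden, conflicts, aliases).
    """
    field_source = field_source or {}
    golden, conflicts = {}, {}
    fields = sorted({f for rec in records.values() for f in rec})
    for field in fields:
        # A per-field override goes first, then the general priority order.
        order = [field_source[field]] if field in field_source else []
        order += [s for s in priority if s not in order]
        values = [(s, records[s][field]) for s in order
                  if s in records and records[s].get(field)]
        if not values:
            continue
        golden[field] = values[0][1]
        if len({v for _, v in values}) > 1:
            # Keep every variant so nothing is lost and a person can review it.
            conflicts[field] = dict(values)
    # Alternative names stay searchable as synonyms.
    aliases = sorted(set(conflicts.get("name", {}).values()) - {golden.get("name")})
    return golden, conflicts, aliases

golden, conflicts, aliases = merge_golden(
    {
        "accounting": {"name": "TechSnab LLP", "bin": "123456789012",
                       "bank_account": "KZ001", "phone": ""},
        "crm": {"name": "TOO TechSupply", "bin": "", "bank_account": "",
                "phone": "+77011234567"},
    },
    priority=["accounting", "crm"],
    field_source={"phone": "crm"},
)
```

Nothing is deleted here: empty fields are filled from the lower-priority source, and fields where sources disagree are reported in `conflicts` for the data owner to confirm.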
Document the process: who creates a new counterparty, which fields are required, which checks run before activation and how to quickly correct mistakes. Then the directory won’t fall apart a month after go-live.
Common mistakes when preparing data for ERP
The most frequent problem is trying to perfect everything at once. When a team tackles items, counterparties, employees, warehouses and finance simultaneously, you quickly hit disputes and fatigue. It’s much more reliable to start from key processes: what procurement, sales, production and accounting need first.
Second mistake — decisions remain in chats and calls instead of becoming rules. If you don’t document how names are written, which fields are required, what counts as the current address or BIN/ID, the same dispute will return every week and different people will make different edits.
Third issue — mixing structures: reference lists and transactions live together. For example, one table stores item card, price history, balances and ad-hoc notes. In ERP this becomes confusing: it’s unclear what to migrate as master data and what to keep as transaction history.
A separate risk is one-click duplicate merging. Without checks it’s easy to lose important bank details, contacts or contract terms. A typical case: one card has the correct BIN, another has the active bank account and responsible manager. If you just delete the “extra” one, you’ll later restore data manually and hunt for missing details.
Finally, many teams clean data but don’t change input rules. After go-live errors return because there are no required fields and input validations when creating cards, no unified formats for names, addresses and units, no assigned owner for the list, and users weren’t told when to create a new record versus search for an existing one.
A simple test: if after cleanup you cannot explain to a newcomer in 10 minutes how to correctly create a counterparty or an item, the problem will reappear soon.
Short checklist before migration and next steps
Before moving data to ERP, pause and verify that key decisions are made. These 30–60 minutes of final checks often save weeks of manual fixes after go-live.
If any item lacks a clear answer, postpone migration:
- Required fields and formats are filled in key master data (BIN/ID, dates, units, codes, addresses).
- Reference lists are aligned: agreed names, codes and mapping tables for old and new values.
- Rules for handling duplicates are clear: how to find matches, who confirms and which criteria define the golden record.
- A trial load has been performed (on a copy of the environment) and results reconciled with process owners: totals, balances, record counts, key reports.
- Data owners are assigned and a regulation for creating new records after go-live is approved so chaos does not return within a month.
Next, prepare a 2–4 week plan: a cut-off date, the order of fixes and who gives final approval for the load. Prepare a simple error log in advance: what was found, where the source is, who fixes it and when it is rechecked.
If time is short or sources are too many, sometimes it’s simpler to engage a systems integrator to build the migration process, quality control and repeatable checks. In this format you can involve teams like GSE.kz (gse.kz), who besides integration usually take on organizing the work and support so process owners are not pulled into endless manual reconciliations.
After the first production load, schedule a short post-migration audit within 1–2 weeks. It quickly shows which rules should be strengthened so that preparing data for ERP does not turn into continuous cleanup in the live system.
FAQ
Which data should be prepared first so ERP actually works on day 1?
Start with what is needed for day 1 processes:
- counterparties (customers/suppliers) and their details;
- item master and units of measure;
- warehouses/storage locations, departments;
- employees (if requests, approvals, or access depend on them);
- opening balances and open documents.
History going back many years is usually better left out of the first go-live unless it is required for current operations.
Why won’t ERP clean up data on its own after go-live?
ERP does not “fix” data: it applies rules strictly and amplifies mistakes. As a result:
- counterparties and items multiply as duplicates;
- documents don’t reconcile by analytics;
- accounting and procurement spend time on constant fixes;
- reports become a subject of disputes, because numbers are formally correct but based on different versions of reference data.
Who should make decisions about the “correct” values in reference lists?
A working model is a “data owner” for each domain:
- finance: chart of accounts, cost items, taxes, payment terms;
- procurement: suppliers, purchasing items, units of measure;
- sales: customers, delivery terms;
- HR: employees and org structure;
- IT/project: load rules, access, version control.
It’s important that each reference list has a single source of truth and one person/role who makes the final decisions in disputed cases.
How do you find out where needed data lives if there are many sources?
At minimum, run a short inventory:
- list of sources (1С, CRM, Excel, warehouse or production systems, email with requests and details, network folders with price lists and contracts);
- owner of each source and update frequency;
- which fields are actually used (not filled “just in case”);
- where data is entered manually without controls.
This quickly shows where major duplicates, empty fields and format conflicts will come from.
What quick data-quality checks should be done before cleanup?
Do a short profiling run in 1–3 days:
- completeness of required fields (BIN/ID, units, departments);
- uniqueness of keys (one BIN/ID per counterparty, one SKU per item);
- format “noise” (extra spaces, mixed date/phone formats, Cyrillic/Latin mix);
- logical gaps (item without group, counterparty without country).
Present results as numbers: how many records, examples, risk level for processes and suggested fixes.
How to determine that records are duplicates and not different companies?
A duplicate is not just an identical row. Common cases:
- one legal entity with several name variants (TOO/LLP, abbreviations, different languages);
- typos and case differences;
- “similar but different” (branch vs head company, brand vs legal entity).
Look for matches by a combination of keys: BIN/ID + name + address/phone/bank details. A single indicator almost always gives false positives.
How to merge duplicates without losing important details and contacts?
Use the “golden record” principle:
- predefine source priorities (e.g., legal details from accounting, contacts from CRM);
- describe which fields to take from which source when conflicts occur;
- send disputed cases to a manual review queue handled by the data owner.
Do not delete the “extra” record without merging: it’s important to preserve accounts, contacts and contractual terms that may be spread across records.
Which normalization rules for counterparties typically yield the biggest benefit?
Agree on simple rules before loading:
- what goes into the Name field and what is stored separately (legal form, status, branch);
- unified BIN/ID format (no spaces, fixed length);
- address structure (country/city/street/house), unified phone format;
- ban on “combining” code + text + notes in one field.
The goal is that the same entity does not appear under many variants in the system.
Which reference lists and classifiers are most important to align before migration?
Reports break on the basics:
- units of measure (e.g., “pcs”, “pieces”, “pc”);
- currencies;
- VAT rates and flags;
- document/request statuses;
- operation types.
Make a mapping table “how it was → how it will be” and forbid free-text entry where a reference list should be used. Add new values only by request with justification.
What is better to do before ERP go-live and what can wait until after?
Golden rule: fix what blocks daily operations first.
Usually completed before go-live:
- removal/merging of duplicates for key master data;
- required fields and formats;
- basic reference lists (units, currencies, taxes, statuses);
- transfer of opening balances and open documents.
Leave for after go-live:
- “cosmetic” descriptions;
- additional attributes that do not affect documents;
- deep historical data rework if not needed for initial processes.