DSPM: what data classes it finds and how to choose a product
DSPM helps discover sensitive data in files and databases. We cover data classes, product selection criteria and how to handle thousands of ownerless findings.

Why DSPM matters if you already have security tools
In many companies the problem isn’t a lack of security controls, but that nobody really knows where important data lives. Documents spread across shared folders, email, file shares, test databases and Excel exports. Often it’s unclear who owns a dataset and who should decide: delete it, restrict access, or change the process.
DLP detects leakage attempts and suspicious actions well, but usually doesn’t answer what sensitive data is already stored and where. IAM shows who has which rights, but without understanding the data’s value those rights are hard to assess. SIEM collects events and helps investigate, but doesn’t inventory file and table contents or explain which assets are more important.
DSPM fills that gap: it discovers data, adds clear context, and helps set priorities. A useful DSPM “finding” is not just a file or table, but a bundle: the object (file, folder, table, column, dump), its context (where it’s stored, who has access, how recently it was used, whether there are copies), an explanation of the risk and recommendations for action.
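The bundle described above can be pictured as a small record. The sketch below is only an illustration, not any product's schema: the class names (`Finding`, `AccessContext`), the field names and the sample path are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class AccessContext:
    """Hypothetical context attached to a discovered object."""
    location: str                      # where the object is stored
    readers: List[str]                 # groups or users with read access
    last_used: Optional[date] = None   # how recently it was used, if known
    copies: int = 0                    # number of known copies


@dataclass
class Finding:
    """One finding: object + context + risk explanation + recommendations."""
    object_ref: str                    # file, folder, table, column or dump
    data_classes: List[str]
    context: AccessContext
    risk_explanation: str
    recommendations: List[str] = field(default_factory=list)

    def is_actionable(self) -> bool:
        # A finding is only useful if it says why it matters and what to do.
        return bool(self.risk_explanation) and bool(self.recommendations)


# Made-up example mirroring the "ID scans in a department folder" case.
finding = Finding(
    object_ref="//fileserver/dept/scans/id_cards",
    data_classes=["personal data: ID scans"],
    context=AccessContext(location="department network folder",
                          readers=["dept-all"], copies=2),
    risk_explanation="ID scans readable by dozens of employees",
    recommendations=["restrict access to the HR group", "assign an owner"],
)
```

A record without an explanation or a recommended action fails `is_actionable()`, which is one simple way to tell a useful finding from a bare match.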
Agree in advance what success looks like. A good outcome is not “fewer alerts” but reduced real risk: fewer shared folders with personal data, fewer databases with excessive privileges, fewer abandoned archives, clear owners and retention terms.
A practical example: an organization may have a modern DLP system, yet scans of IDs sit in a department’s network folder accessible to dozens of employees. DSPM finds this quickly, shows who has access, and suggests what to fix first.
Data sources DSPM usually covers
DSPM starts with a simple question: where does your data actually live? The answer is almost always broader than “the production database.” Mature solutions therefore work with files, databases and cloud storage, and can find forgotten copies that only surface after an incident.
Coverage most often includes file stores: network folders, shared drives, corporate portals and collaboration documents. These locations accumulate Excel sheets, scans, system exports and “temporary” files for years.
Another class is object stores and archives: backups, database dumps, log archives and packages sent to contractors. These often have no owner but contain the most complete snapshots of customer or HR information.
For databases DSPM typically operates at several levels: relational DBs, NoSQL, analytics marts and BI warehouses. The tool should see not only the “primary” tables but derived sets: replicas, exports to marts, and intermediate layers used for reporting.
If your company uses SaaS and cloud storage, DSPM connects to them too. In practice cloud folders and shared spaces often become the places where "anyone with the link" access is accidentally enabled.
Check specifically how the tool finds test environments and one-off exports. A typical scenario: a developer was given an export “for a week,” it landed in a test DB or project folder and then became part of daily work.
Data classes DSPM most often searches for
DSPM’s practical value depends on which types of data it can find and how accurately it distinguishes them in the real world. In most organizations the riskiest findings are not exotic, but everyday documents, exports and tables.
Most commonly searched for:
- personal data: IIN, full name, dates of birth, addresses, phone numbers, email, scans of IDs and passports;
- payment and banking data: card numbers, IBAN/account numbers, payment orders, contractor details, transaction details;
- medical information: diagnoses, test results, prescriptions, medical histories;
- financial and trade secrets: budgets, price lists, contracts, internal reports, procurement plans;
- credentials and “secrets”: passwords in files, API keys, tokens, connection strings, private keys.
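For card numbers, a digit pattern alone tends to over-report. One widely used way to cut false positives is the Luhn checksum, which payment card numbers satisfy. The sketch below is an illustration under that assumption, not a description of how any particular DSPM product works; the function names are made up, and a real detector would also check issuer prefixes and lengths.

```python
import re
from typing import List

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return candidate card numbers in text that also pass Luhn."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

With the common test number `4111 1111 1111 1111` the function reports a hit, while the same string ending in `1112` is rejected by the checksum.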
A “data class” is not only a pattern like a document number. Quality is often defined by context: where the file resides, its name, nearby columns in a table, and signs of an export from CRM/ERP.
A simple example: a shared Excel “Employee list” stores IIN as a number without spaces, and nearby columns are “Department” and “Phone.” A good DSPM will mark this as personal data even with empty values or broken formats.
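The “Employee list” case can be sketched as a score that blends cell values with table context. Everything here is an assumption for illustration: the header hints, the weights (0.5, 0.3, 0.2) and the sample values are made up, and real products use their own tuned models.

```python
import re
from typing import List

IIN_PATTERN = re.compile(r"^\d{12}$")        # an IIN is a 12-digit identifier
# Hypothetical header hints; a real deployment would use tuned dictionaries.
PERSONAL_HEADERS = {"iin", "phone", "department", "full name", "date of birth"}


def iin_confidence(header: str, values: List[str], sibling_headers: List[str]) -> float:
    """Score (0..1) that a column holds IINs, using values and table context."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    matches = sum(1 for v in non_empty if IIN_PATTERN.match(v))
    # Share of non-empty cells that look like an IIN; empty cells are ignored.
    pattern_score = matches / len(non_empty) if non_empty else 0.0

    header_hit = 1.0 if header.strip().lower() == "iin" else 0.0
    siblings = {h.strip().lower() for h in sibling_headers}
    context_score = min(len(siblings & PERSONAL_HEADERS) / 2, 1.0)

    # Illustrative weights: values matter most, context rescues sparse columns.
    return round(0.5 * pattern_score + 0.3 * header_hit + 0.2 * context_score, 2)


# Made-up column: one valid value, one empty cell, one broken 10-digit value.
employee_iin = iin_confidence("IIN", ["900101300123", "", "8501014001"],
                              ["Department", "Phone"])
# A 12-digit order number with no personal-data context around it.
order_number = iin_confidence("order_id", ["123456789012"], ["sku", "qty"])
```

In this sketch the half-broken employee column still scores higher than a clean column of order numbers, because the header and neighbouring columns carry part of the evidence.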
Before choosing a product, clarify which classes are critical for you and how they can be configured: are there ready-made policies for local formats (for example, IIN), can you add custom rules and dictionaries, does it support searching for secrets in configs and code, and are there confidence levels and exclusions to reduce noise.
Files vs databases: differences in search and labeling
DSPM handles structured and unstructured data differently. In a database a “finding” usually has a clear meaning (a field, table, or record set), while in files a match may be a text fragment without obvious context. This affects accuracy, labeling and how quickly you can assign an owner and act.
In databases DSPM relies on structure: tables, columns, types, relationships and reference tables. Labeling is more precise: you can say “this column stores IIN” or “this table contains medical diagnoses.” Good tools consider patterns and business context to avoid confusing, for example, an order ID with a document number.
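As a rough illustration of structure-based labeling, the sketch below samples each column of a table and combines a value pattern with a naming rule. It uses an in-memory SQLite database so it can run anywhere; the table, the sample rows and the “names ending in `_id` are technical” rule are assumptions for the example, not a product feature.

```python
import re
import sqlite3
from typing import Dict

IIN_PATTERN = re.compile(r"^\d{12}$")


def label_columns(conn: sqlite3.Connection, table: str, sample: int = 100) -> Dict[str, str]:
    """Label each column of a table as 'iin' or 'unclassified' from a sample."""
    # The table name is interpolated, so it must come from a trusted catalog.
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    labels = {}
    for col in columns:
        rows = conn.execute(
            f'SELECT "{col}" FROM "{table}" WHERE "{col}" IS NOT NULL LIMIT ?',
            (sample,),
        ).fetchall()
        values = [str(r[0]) for r in rows]
        looks_like_iin = bool(values) and all(IIN_PATTERN.match(v) for v in values)
        # Business context: an identifier-style column name vetoes the pattern.
        name_is_technical = col.lower().endswith("_id")
        labels[col] = "iin" if looks_like_iin and not name_is_technical else "unclassified"
    return labels


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (order_id TEXT, iin TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [("100000000001", "900101300123", "Almaty"),
     ("100000000002", "850101400456", "Astana")],
)
labels = label_columns(conn, "clients")
```

Both `order_id` and `iin` hold 12-digit strings here, but only `iin` is labeled, which is the order-ID versus document-number distinction in miniature.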
Files are harder. Documents, emails and attachments may contain sensitive fragments within ordinary text. If images or scanned PDFs appear, OCR is required or some risks will be missed. Semi-structured formats (JSON, XML, logs, exports) produce many matches but lack a schema, so the tool must distinguish real personal data from technical fields and test values.
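For semi-structured data, one possible approach is to walk the document and keep the path of every match, so that technical fields and test values can be filtered out by rule. The sketch below does this for JSON and email addresses; the skipped keys, the test domains and the sample document are assumptions for the example.

```python
import json
import re
from typing import Any, Iterator, List, Tuple

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical list of technical keys whose values are skipped.
TECHNICAL_KEYS = {"trace_id", "request_id", "build"}
# Hypothetical markers of test values.
TEST_DOMAINS = ("example.com", "test.local")


def walk(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, string_value) for every string leaf in the document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key not in TECHNICAL_KEYS:
                yield from walk(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from walk(item, f"{path}[{i}]")
    elif isinstance(node, str):
        yield path, node


def find_emails(raw: str) -> List[str]:
    """Return paths of string leaves that contain a non-test email address."""
    hits = []
    for path, value in walk(json.loads(raw)):
        match = EMAIL.search(value)
        if match and not match.group().lower().endswith(TEST_DOMAINS):
            hits.append(path)
    return hits


doc = '{"trace_id": "a@b.io", "users": [{"email": "aigul@corp.kz"}, {"email": "qa@example.com"}]}'
email_paths = find_emails(doc)
```

Keeping the path with each match also gives the reviewer the “where exactly” that a flat text search would lose.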
For files, metadata and access rights are especially important: where the object sits, who owns or is responsible for it, which groups have read/write permissions, creation and modification dates, and signs of wide distribution (many copies, frequent forwarding).
If the product adds usage context (who reads it, from where, how often), prioritization becomes easier. The same set of personal data represents a different risk level in a quiet table than in a shared network folder with broad rights, and the two cases require different actions.
How to evaluate classification quality and risk scoring
DSPM is valuable not for the number of findings but for how accurately it separates real risk from noise. Test on a small but diverse set: several file shares, a few database tables, test exports, archives and office documents.
Classification: depth, accuracy and explainability
Look for classification depth: besides built-in detectors (IIN, card numbers, etc.) there should be custom templates and dictionaries. Otherwise the product will see “universal” patterns well but miss internal codes, project names and your contract formats.
Accuracy has two sides: false positives (safe objects flagged as sensitive) and false negatives (sensitive objects the tool missed). Ask the vendor to show an explanation for each finding: what matched (pattern, dictionary, model), where it was found (column, sheet, range) and with what confidence. If the product can’t clearly answer “why,” it’s hard to tune and even harder to defend to the business.
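On a hand-labelled control set, both kinds of error can be measured with the standard precision and recall formulas. The sketch below assumes you have two sets of object names: what you know is sensitive and what the tool reported. The object names are made up.

```python
from typing import Set, Tuple


def precision_recall(expected: Set[str], reported: Set[str]) -> Tuple[float, float]:
    """Compare the tool's findings with a hand-labelled control set."""
    true_pos = len(expected & reported)
    false_pos = len(reported - expected)    # safe lookalikes flagged as sensitive
    false_neg = len(expected - reported)    # sensitive objects the tool missed
    precision = true_pos / (true_pos + false_pos) if reported else 0.0
    recall = true_pos / (true_pos + false_neg) if expected else 0.0
    return precision, recall


# Hypothetical control set for a pilot.
expected = {"hr/employees.xlsx", "crm.clients.iin", "backup/dump.sql", "scans/id_01.pdf"}
reported = {"hr/employees.xlsx", "crm.clients.iin", "backup/dump.sql", "logs/app.log"}
precision, recall = precision_recall(expected, reported)
```

Here the tool missed one scan and flagged one log file, so both precision and recall come out at 0.75. A missed scanned PDF is a recall problem (often OCR), while a flagged log file is a precision problem (often missing context rules).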
Rule flexibility often decides a pilot’s fate. You need exclusions (folders, schemas, file types), confidence thresholds and contextual conditions. Example: 12 digits in logs is not an IIN if there are no other personal data signals nearby.
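The 12-digit example can be written as a contextual rule: a number counts as an IIN candidate only if another personal-data signal appears nearby. This is a minimal sketch; the signal words, the 40-character window and the sample lines are assumptions, not defaults of any product.

```python
import re
from typing import List

TWELVE_DIGITS = re.compile(r"\b\d{12}\b")
# Hypothetical context signals; real products use tuned dictionaries.
PERSONAL_SIGNALS = re.compile(
    r"\b(iin|full name|date of birth|passport|phone)\b", re.IGNORECASE
)


def iin_matches(text: str, window: int = 40) -> List[str]:
    """Return 12-digit numbers with a personal-data signal within `window` chars."""
    hits = []
    for match in TWELVE_DIGITS.finditer(text):
        start = max(match.start() - window, 0)
        nearby = text[start:match.end() + window]
        if PERSONAL_SIGNALS.search(nearby):
            hits.append(match.group())
    return hits


# Made-up sample lines.
log_line = "2024-01-01 job=export batch 202401010001 rows=120 status=ok"
hr_line = "Full name: A. Example; IIN: 900101300123; Phone: 555 0100"
```

The batch number in the log line is ignored, while the number in the HR line is reported because “Full name”, “IIN” and “Phone” all sit within the window.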
Risk scoring: what it should consider in practice
Risk scoring should consider not only the data class but also context: where the object is, who can read it, whether there’s external access, whether encryption is used and the retention period. A table with personal data in the finance segment and the same data in a shared folder carry different levels of risk.
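A context-aware score can be as simple as a base weight per data class plus adjustments for exposure. The sketch below is illustrative only: every weight, threshold and field name is an assumption to be tuned, not a formula taken from a specific product.

```python
from dataclasses import dataclass

# Illustrative weights only; every number here is an assumption.
CLASS_WEIGHT = {"personal": 40, "payment": 50, "medical": 50, "secrets": 60, "internal": 20}


@dataclass
class Exposure:
    data_class: str
    readers: int              # how many accounts can read the object
    external_access: bool     # guests, contractors or "anyone with the link"
    encrypted: bool
    past_retention: bool      # kept longer than its retention period


def risk_score(e: Exposure) -> int:
    """Combine data class and context into a 0-100 score."""
    score = CLASS_WEIGHT.get(e.data_class, 10)
    if e.readers > 50:
        score += 20           # broad internal access
    if e.external_access:
        score += 30
    if not e.encrypted:
        score += 10
    if e.past_retention:
        score += 10
    return min(score, 100)


finance_table = Exposure("personal", readers=5, external_access=False,
                         encrypted=True, past_retention=False)
shared_folder = Exposure("personal", readers=400, external_access=False,
                         encrypted=False, past_retention=True)
```

With these made-up weights the finance table scores 40 and the shared folder 80, although both hold the same data class.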
Before the pilot ask these questions:
- Can you add templates and dictionaries without vendor development?
- Is there an explanation and confidence level for each finding?
- Can you configure exclusions and contextual rules?
- How are large volumes scanned and does that affect production systems?
- What leaves the environment: metadata, content fragments, results, or does everything stay inside the perimeter?
Check the deployment mode separately. For government, finance and healthcare it’s critical that data does not leave the environment and only minimal technical information (or nothing) is sent outside.
Rolling out DSPM step by step: from pilot to routine
To avoid turning DSPM into an endless list of findings, start with a pilot and clear boundaries. In the first stage it’s more important to understand where risk actually lives and how the team will work with it than to “find everything.”
Map sources and pick 1–2 highest-risk areas. Often this is a shared network folder with broad rights or an old database that no one remembers.
Then define the critical data classes for your industry and regulations. Usually 10–20 classes are enough; otherwise you’ll drown in noise.
Agree roles before the first scan. In practice it’s useful to separate responsibilities into three levels: data owner (business, responsible for meaning), system owner (IT, manages storage and access), and process owner (security/compliance, sets rules and controls).
After launch, capture a baseline: where sensitive data lives, who has access and how many “open” locations exist. Set exclusions immediately (some archives, system directories, etc.) so the statistics don’t balloon.
Then run a short remediation cycle: tighten access, enable encryption where justified, remove duplicates, and move data to proper stores. Reinforce regularity: scheduled rescans and change monitoring so new “holes” don’t appear silently.
If an integrator runs the pilot, make sure they bring the solution into operation: not only installation, but also help with the owner model and the finding-handling process.
How not to drown in thousands of ownerless findings
“Thousands of findings” usually come from three causes: connecting too many sources at once, setting low thresholds (treating every match as sensitive), and duplicates across copies, exports and backups. If you try to handle each item individually, the security team will quickly burn out.
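Exact copies are the easiest duplicates to collapse: files with the same content hash can be treated as one object with several locations. The sketch below uses SHA-256 over in-memory bytes to stay self-contained; the paths and contents are made up, and a real scan would read files in chunks. It only catches byte-for-byte copies, so an export that was edited after copying will not be grouped.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group paths whose content is byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    # Keep only digests seen more than once; sort for stable output.
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)


# Made-up example: a renamed copy of the same export plus an unrelated file.
files = {
    "Common/Temp/export.xlsx": b"iin;phone\n",
    "Common/Temp/final_2.xlsx": b"iin;phone\n",
    "Sales/plan.docx": b"budget",
}
duplicate_groups = group_duplicates(files)
```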
Focus on impact rather than single files and tables. Prioritize places with broad access (e.g., “everyone in domain”), external users (contractors, guest accounts) and data that is actively used and frequently opened.
Simple triage rules help: elevate public/wide access, treat external access as higher priority than internal, prioritize massive datasets (thousands of records) over single matches, and route duplicates and backups to a separate stream for systematic handling.
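These triage rules can be expressed as a sort key plus a separate stream for backups and duplicates. The strict ordering used here (wide access first, then external access, then dataset size) is one possible reading of the rules, and the field names and sample findings are assumptions.

```python
from typing import Dict, List, Tuple


def triage_key(f: Dict) -> Tuple:
    """Sort key: lower tuples are handled first."""
    return (
        not f["public_or_wide"],     # wide or public access first
        not f["external_access"],    # then external before internal
        -f["record_count"],          # then larger datasets
    )


def triage(findings: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split findings into a prioritized queue and a backup/duplicate stream."""
    main = [f for f in findings if not f["is_backup_or_duplicate"]]
    separate = [f for f in findings if f["is_backup_or_duplicate"]]
    return sorted(main, key=triage_key), separate


# Made-up findings for illustration.
findings = [
    {"id": "A", "public_or_wide": False, "external_access": False,
     "record_count": 3, "is_backup_or_duplicate": False},
    {"id": "B", "public_or_wide": True, "external_access": False,
     "record_count": 20000, "is_backup_or_duplicate": False},
    {"id": "C", "public_or_wide": True, "external_access": True,
     "record_count": 50, "is_backup_or_duplicate": False},
    {"id": "D", "public_or_wide": False, "external_access": False,
     "record_count": 90000, "is_backup_or_duplicate": True},
]
queue, backup_stream = triage(findings)
```

In this example the queue order is C, B, A, and the large backup D goes to the separate stream instead of jumping the queue on size alone.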
Then group findings into cases: one risk across many similar objects. For example, “Accounting shared folder: 340 files with passport data” instead of 340 tickets. This reduces noise and helps fix the root cause, not just symptoms.
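Grouping can be sketched as counting findings per folder and data class, so that 340 files become one case title. The paths below are made up, and grouping by parent folder is only one of several reasonable keys (system, owner or project would work the same way).

```python
from collections import Counter
from typing import List, Tuple


def build_cases(findings: List[Tuple[str, str]]) -> List[str]:
    """Collapse (path, data_class) findings into one case per folder and class."""
    counts = Counter()
    for path, data_class in findings:
        folder = path.rsplit("/", 1)[0]      # parent folder of the file
        counts[(folder, data_class)] += 1
    return [
        f"{folder}: {n} file(s) with {data_class}"
        for (folder, data_class), n in sorted(counts.items())
    ]


# Made-up findings: 340 scans in one folder plus one unrelated export.
findings = [(f"Accounting/shared/scan_{i}.pdf", "passport data") for i in range(340)]
findings.append(("Sales/exports/clients.xlsx", "phone numbers"))
cases = build_cases(findings)
```

The 341 findings collapse into two cases, the first of which reads “Accounting/shared: 340 file(s) with passport data”.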
The “no owner” problem is organizational. Assign ownership to a system, department or business process (e.g., “HR documents” or “client statements”), not to each file. If you work with an integrator, agree in advance who in the business will decide on access and retention.
Use a few statuses (e.g., “new”, “in progress”, “closed”, “accepted as risk”) and 3–5 metrics without excessive detail: cases opened and closed in a period, share of findings without an owner, top systems by risk, average time to owner assignment.
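A few of these metrics can be computed from a plain list of case records. The sketch below assumes hypothetical fields (`status`, `owner`, `opened`, `owner_assigned`) and made-up dates; a real setup would pull the same data from the ticketing system.

```python
from datetime import date
from typing import Dict, List


def adoption_metrics(cases: List[Dict]) -> Dict[str, float]:
    """Compute a few pilot metrics from a list of case records."""
    total = len(cases)
    without_owner = sum(1 for c in cases if c["owner"] is None)
    closed = sum(1 for c in cases if c["status"] == "closed")
    # Days from case creation to owner assignment, for cases that have an owner.
    waits = [(c["owner_assigned"] - c["opened"]).days
             for c in cases if c["owner"] is not None]
    return {
        "closed": closed,
        "share_without_owner": without_owner / total if total else 0.0,
        "avg_days_to_owner": sum(waits) / len(waits) if waits else 0.0,
    }


# Made-up cases using the four statuses mentioned above.
cases = [
    {"status": "closed", "owner": "HR",
     "opened": date(2024, 3, 1), "owner_assigned": date(2024, 3, 3)},
    {"status": "in progress", "owner": "Sales",
     "opened": date(2024, 3, 1), "owner_assigned": date(2024, 3, 5)},
    {"status": "new", "owner": None,
     "opened": date(2024, 3, 4), "owner_assigned": None},
    {"status": "accepted as risk", "owner": "IT",
     "opened": date(2024, 3, 2), "owner_assigned": date(2024, 3, 5)},
]
metrics = adoption_metrics(cases)
```

For these four cases the result is one closed case, a quarter of cases without an owner, and an average of three days to owner assignment.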
Common mistakes in choosing and launching DSPM
The most frequent mistake is trying to cover all systems in the first month. DSPM quickly finds a lot of “interesting” items, but without a focused pilot you’ll get an avalanche of findings and disappointment. Better to pick 1–2 critical sources (for example, sales file shares and one key database) and define which data classes and risk scenarios you want to surface.
A second mistake is underestimating exclusions and false positive control. If you don’t agree on test data, project archives, technical dumps and other noise sources, alerts will be ignored.
A third is the lack of an owner model and an approval process. A finding without a data owner, a system owner and a person responsible for remediation quickly becomes nobody’s problem. The result: DSPM turns into a showcase of problems, not a risk management tool.
Another common failure is connecting backups without a plan. Backups contain outdated versions and duplicates, and you can’t fix everything there as you would in the production environment. Connect backups only when you have a clear scenario: who stores them, who accesses them and what measures you can realistically apply.
Finally, problems begin when security handles fixes alone. Access rights, encryption, segmentation and retention usually require IT and business involvement. A practical rule keeps the process working: each finding must have a next step and a deadline.
Real-world example: sensitive data in a shared network folder
A typical story: the sales team needs to “quickly check” an export from CRM. An analyst exports the client base to Excel, saves the file in a shared folder “Common\Temp,” sends it to colleagues and forgets to delete it. After a few weeks several such files accumulate, some renamed “final_2”, “current”, “for_review.”
After connecting DSPM to the file shares it becomes clear that one folder contains exports with IIN, phone numbers and addresses, and nearby are scans of contracts and powers of attorney. The worst part is not the file count but permissions: the folder has “read for all employees” and rights are inherited down into subfolders.
Why is there “no owner”? A former employee may have created the folder. The group they granted access to is still in place. The business no longer remembers who requested the folder and why.
To avoid drowning in such cases, prioritize by impact and accessibility: is there external access, is access wide, are these fresh exports or old archives, can a large volume be quickly exfiltrated (e.g., one file with tens of thousands of rows).
Fixing often takes less time than assigning blame: remove “everyone” access, move exports to a controlled store, assign a folder owner and retention for temporary data, and document rules about who stores exports and who approves access.
If internal resources are insufficient to adjust permissions and processes, a systems integrator can help operationalize the work. In this role, for example, GSE.kz is often useful as an integrator that ties the tool into infrastructure, roles and daily case handling.
Quick checklist before buying and piloting DSPM
Agree on a clear starting scope rather than “all company data.” Otherwise DSPM will become a stream of findings where it’s unclear what’s critical and who should respond.
Limit the perimeter: choose 5–10 sources that reflect real risk (for example, sales file shares, CRM exports, 1–2 main databases, archived email attachments). For each source, assign responsible contacts from business and IT in advance.
Before launch, check the basics: the list of critical data classes and their priority, the presence of guest, public or wide access to stores, the locations of DB dumps and Excel/CSV exports, and the rule for grouping findings (by system, owner, data type or project). Also decide how you will assign owners: by system owner, department, directory or project tags.
To make the pilot useful, select 3–4 metrics in advance: top risks per week, average time to remediation, share of findings without an owner, share of “critical” findings that are validated on review.
Next steps: how to choose a product and run a pilot
Define the pilot goal and 2–3 risk scenarios where results are clear to both business and security. Examples: personal data in shared folders, secrets (passwords, keys) in exports and backups, sensitive tables in databases without an owner and without access control.
Then document product and environment requirements: where to deploy (cloud or on-prem, segment, access), what happens to metadata and scan results, required integrations (AD/IdP, DLP/SIEM, ticketing, data catalogs, CMDB), what reports and KPIs leaders expect, scanning windows and DB load constraints, and which exclusions must be in place from day one.
Request a demo on your data or the closest possible dataset. A good test includes a few control files and tables: some with obvious sensitive fields and some with lookalikes that are safe.
Plan the finding workflow before you start: who accepts findings (security), who remediates (system and data owners), and who approves changes (business). A practical rule for a pilot: each finding should get an owner and a status within 3–5 business days.
If resources are lacking or you need to link DSPM to infrastructure and security processes, engage an integrator. GSE.kz, as a systems integrator, can help design the pilot, integrations and infrastructure for DSPM within your organization’s IT and security architecture.
FAQ
Why do we need DSPM if we already have DLP, IAM and SIEM?
DSPM answers the question “where exactly does sensitive data live and what condition is it in?” DLP, IAM and SIEM are useful, but without an inventory of your data you often don’t know which objects to protect first and where risk has already accumulated.
Which data sources are best for a DSPM pilot?
It’s common to start with the most “alive” and chaotic places: network folders, shared drives, document portals and one key database. This usually yields quick wins because exports, scans and files with wide access are found there most often.
What types of data does DSPM typically look for in companies?
Most often DSPM finds personal data, financial and bank details, medical records, commercial secrets and “secrets” like passwords, tokens and keys. The practical value is not just recognizing a pattern, but correctly understanding the object’s context and risk.
How does search and classification differ between files and databases?
In databases classification is usually more accurate because of tables and columns, and it’s easier to assign owners and actions. Files generate more noise: text fragments, attachments, scans and broken formats, so OCR, metadata and actual access rights matter more.
How can I quickly check whether DSPM classifies data well?
Take a small but varied sample and prepare a few control examples where sensitive fields are obvious and a few similar but safe examples. A good tool shows why it decided that way, where exactly it found the match and with what confidence; otherwise you won’t be able to tune rules and reduce noise.
What should influence risk scoring in DSPM?
A sensible scoring should consider not only the data class but also context: breadth of access, external access, how recently data was used, presence of copies, encryption and retention. The same set of personal data in a closed zone and in a shared folder with “everyone” access should get different priority.
Which metrics indicate successful DSPM adoption?
For a start, focus on reducing real risk rather than the number of findings. For example: reduce the number of shared folders containing personal data, remove excessive database privileges, locate and address abandoned archives, and assign owners and retention periods.
How not to drown in thousands of findings after the first scan?
Limit the scope immediately and don’t set thresholds so low that everything is considered sensitive. Then group findings into cases by folder, system or data type and start with objects that have wide access and external exposure. Otherwise the team will drown in one-file-at-a-time reviews.
How to handle findings that have no owners?
Assign owners not to every file but to a system, department or business process, and agree in advance who will decide on access and retention. If a finding has no next step and deadline, it quickly becomes nobody’s problem and stalls.
What mistakes are most common when choosing and launching DSPM?
Common mistakes are trying to connect all systems at once, forgetting about exclusions and test environments, or attaching backups without a clear remediation plan. If internal resources are insufficient, an integrator can help turn the pilot into routine operation: configure sources, roles, case handling and integrations with IT/security processes. This is the kind of engagement GSE.kz commonly provides.