Why change a commercial DWH at all if it works?

Most often — because of the "growth tax": as data and user counts increase, licenses, extensions and additional modules become expensive quickly. Switching makes sense when you want to pay for infrastructure and support rather than for every core, and to reduce dependence on a single vendor.

Why do teams often choose Postgres + ClickHouse instead of a single DB?

Because a DWH typically has two different workloads: careful data preparation with rules and constraints, and heavy analytics over large volumes. Postgres is convenient as a landing zone and quality-control area, while ClickHouse is the engine for fast scans and aggregations for BI.

What data is reasonable to keep in Postgres in a DWH?

Keep there what requires integrity and clear updates: staging for raw data, catalogs and their versions, loading metadata, task statuses, and results of quality checks. This helps explain the origin of numbers and to safely handle corrections and replays.

What is best to store and compute in ClickHouse?

Put facts and marts there that are mainly read: large event and sales tables, period reports, tops, funnels, and cohort slices. ClickHouse shines when you need to read many rows, compute aggregates quickly and return a small result set.

How to start designing data marts so you don't redo them later?

Define which business questions the mart answers, lock down metric definitions, granularity and join keys, and the update mode. Try to include only fields that are actually needed; otherwise the mart grows and becomes hard to maintain.

When do aggregates and precomputations really pay off?

Aggregates are needed where the same reports repeat and source tables grow quickly. Build precomputations for the most used dashboards and compute them incrementally so you don't recalculate the full history each time.

How to handle late-arriving data or backdated corrections?

Plan a small recalculation window for recent days or weeks, because corrections usually arrive there. In Postgres record receipt and validation of a batch, and in ClickHouse update data by partition/chunks rather than many tiny row updates.

How to estimate disk, memory and network requirements without guessing?

Count not only "how many GB per day", but how much will be in raw, marts, aggregates, plus operational overhead and growth/backups. ClickHouse needs fast disks and stable network for inserts and background merges; Postgres needs write IOPS and reliable WAL operation.

What usually makes queries slow after migrating from a commercial DWH?

The most dangerous mistake is carrying over heavy JOINs on large raw tables as‑is. Usually it’s better to change the model: narrower fact tables, needed keys stored together, compact catalogs in a separate layer, and check real BI filters before optimizing.

How to roll out Postgres + ClickHouse safely without breaking reports?

Run a short pilot on 1–2 domains and 10–20 key reports, measure load speed, response times and the cost of recalculation when corrections arrive. Switch to the new platform only after parallel reconciliation of numbers and with a rollback plan; plan infrastructure and support in advance, especially for on‑prem scenarios where 24/7 operations matter.

Postgres and ClickHouse for DWH: when to choose and deploy

What problem is solved by replacing a commercial DWH

A commercial DWH often becomes a "growth tax." As data, users and reports grow, licenses and extension costs rise quickly and vendor lock‑in increases. Changes to the data model or new marts turn into approvals, purchases and waiting for maintenance windows instead of direct business work.

Replacement is usually not about open source fashion, but about regaining control. The company wants to decide where to store data, how to scale, what to optimize and what to pay for: infrastructure and support, not every CPU core or an extra feature.

Most teams expect three things from a new platform: predictable analytics speed, transparency (clear reasons why a query is fast or slow) and flexibility to evolve marts. Another common motive is risk reduction: it's easier to change contractors and not depend on whether a critical feature is locked behind a license.

Why choose a pair of DBMSs instead of one? Because a DWH typically has two different tasks. First — carefully ingest, clean, reconcile and serve the "single source of truth" for references and facts, where transactions and constraints matter. Second — quickly compute heavy analytical queries over large volumes, where columnar storage and efficient aggregations matter. Postgres and ClickHouse cover these roles well separately.

Expect no magic. If sources are noisy, there are no data owners, metric definitions are not agreed, and loads run without quality control, any stack will produce questionable numbers and unstable performance. Open source DWHs win when there is discipline: rules for schemas, versioned marts, load monitoring and a clear change process.

Postgres and ClickHouse: a simple division of roles

The Postgres + ClickHouse pairing gives predictable results if roles are agreed in advance. You avoid forcing one DBMS to do everything at once and encounter fewer surprises in speed and storage cost.

Postgres often becomes the "ingest zone" and the place where rules live. It's convenient to load raw data (staging) there, coerce types, remove duplicates, validate keys and keep history for reference data. It works where transactions, constraints and clear update logic are important.

ClickHouse takes analytics tasks: large scans, GROUP BYs, time filters, wide fact tables and fast BI reports. It prefers prepared data laid out for reading many rows with minimal per‑row processing during queries.

A typical flow: sources (1C, CRM, billing, logs) land in Postgres staging, undergo normalization and quality checks, then facts and marts are formed and loaded into ClickHouse. BI users mainly read ClickHouse, while Postgres remains the "control panel": storing metadata and explaining where numbers came from.

Even if analytics run in ClickHouse, it often makes sense to keep reference data and their versions in Postgres (counterparties, branches, products), load metadata (what, when, from which source), schedules and task statuses, quality rules and results, plus service tables for reconciliations and audits.

Simple example: a branch network sends daily sales. In Postgres you record that a file or batch arrived, check completeness (did all branches report?), normalize product codes. In ClickHouse you store wide sales facts and compute daily, category and regional reports in seconds.

Data marts: when Postgres, when ClickHouse

A data mart is not just a "table for analysts." It's a product for the business: it answers a defined set of questions, has an owner and updates on a schedule everyone understands.

A good mart contains only what’s needed for decisions, not "everything just in case." Fix field and metric definitions (what is "revenue", "active customer", "sale date"), granularity (by day, by receipt, by customer) and join keys. Know the data source and update mode (incremental or full refresh) and have basic quality checks: sum checks, missing values and duplicate checks.

Postgres is convenient when a mart lives close to operational data or requires frequent small changes: appending rows, updating statuses, storing catalogs and ensuring integrity. A typical example is a "customer directory + current contract status" mart where records update many times per day.

ClickHouse usually wins when a mart is read‑heavy: large slices, groupings, period calculations, segment comparisons and multi-dimensional reports. For example, a sales-by-receipt mart with hundreds of millions of rows where the main load is aggregations by day, store and category.

A common rule in the pair: catalogs and current states stay in Postgres, while facts and heavy analytics go to ClickHouse.

Don't overcreate marts. Dozens of similar tables with no owner appear and nobody knows which is the "right" one. Maintain a limited number of marts and treat them as products: assign an owner, describe the questions they answer, document definitions and sources, and merge or retire duplicates. Changes should follow a clear process, not "a request in chat."

Aggregates and precomputations: where they help most

Aggregates are precomputed summaries that repeat daily: revenue by day, operation counts by week, KPIs by branch or product. In a Postgres + ClickHouse setup they often yield more benefit than tuning because they remove the most expensive GROUP BYs from user queries.

Aggregates are almost mandatory when the same reports run frequently and base tables grow quickly. If managers check branch sales every morning by day, week and month, grouping billions of rows each time will consume time and resources even on a fast engine, and systems will degrade during peaks.

Compute on the fly when queries are rare, data is small, or slices constantly change (ad‑hoc analysis). Otherwise a precomputation becomes another mart to maintain and explain.

To keep precomputations manageable, design incremental recomputation: recalc only the new period, not the whole history. Data is usually time‑based; each day you recompute yesterday plus a small "tail" for late adjustments.

A practical balance: build aggregates for the top 3–5 most popular reports, store only the needed granularity (day, branch, product), fix metric formulas, and include a short lookback window for recalculation if data can be corrected.

More aggregates mean faster queries but higher maintenance cost. Better fewer, well‑understood and regularly used ones.

Analytical queries: what will be fast and what won't

The Postgres + ClickHouse pair works well when you understand query patterns in advance. Two visually similar reports can behave differently: one runs in seconds, the other chokes on disks or network.

ClickHouse usually wins when you need to read many rows and quickly compute totals. Typical cases: time filters (day, week, month), top‑N by sales or events, funnels (step 1 → step 2 → step 3), cohort analysis by first action date. Its strength is large scans and columnar aggregations.

Postgres is more predictable for point lookups and small joins. For example, viewing a customer card with 5–20 related rows, or carefully joining a few small catalogs with facts when volumes are moderate.

Practical engine selection rules:

ClickHouse: many rows to read, few columns returned, many GROUP BY/COUNT/SUM by periods.
Postgres: point filters by keys, frequent small JOINs, transactional checks and updates.
Be cautious: joining two large tables without a designed key/order of data.
Be cautious: queries with complex per‑row business logic instead of aggregation.
Always: check which fields BI users actually filter on.

The most dangerous pattern when moving from a commercial DWH is trying to reproduce heavy JOINs on raw large tables. In ClickHouse this often becomes long reads and bloated intermediate results. The solution is to change the model: prepare narrower fact tables, keep needed keys nearby, and store compact catalogs.

Collect real queries before implementation: export BI logs, request the 10–20 most important reports, note filters (especially time) and expected response times. Run them on a test volume and only then decide what to compute in ClickHouse and what to leave in Postgres.

Disk, memory and network requirements: how to estimate without guessing

Infrastructure sizing for DWH

We will calculate disks, network and growth margin for your DWH for 12–24 months.

Request sizing

Size the hardware by workloads, not just current data volume. DWH storage usually grows across layers: raw (staging), marts and aggregates, plus retention and backups.

A simplified volume estimate:

Raw: average row size × rows per day × days of retention.
Marts: often 20–80% of raw (depends on normalization and duplicates).
Aggregates: often 5–30%, but can exceed marts if you precompute everything.
Overhead: Postgres indexes and WAL; ClickHouse parts and merges (often +20–50%).
Growth: at least +30% and separate space for backups.

Disk throughput matters as much as capacity. Critical spots are WAL and indexes in Postgres, and background merges and inserts in ClickHouse. Slow disks cause latency patterns: loads proceed but queries slow during merges. Fast SSD/NVMe and separate disks for roles usually help.

Network becomes a bottleneck during bulk loads, replication, distributed queries and backups. A common mistake is counting only daily GB and ignoring peaks: nightly backfills, mart recomputations and data transfers between Postgres and ClickHouse. Narrow networks stretch load windows and can push nightly work into business hours.

Memory and CPU can also hit limits. ClickHouse grouping, sorting and JOINs may spike memory usage. In Postgres cache, work_mem and parallelism matter. CPU is stressed by complex aggregations and high concurrency.

Practical rule: budget a margin and measure on a pilot. Test real query speeds for loads and mart recomputations, monitor query latency during background work (merges, vacuum), check network headroom for replication and backups, and evaluate performance when data grows 3–5×.

If you pilot on production‑class servers, results are closer to reality than tests on random VMs. For on‑prem scenarios, this might be racks like the S200 Series from GSE.kz.

Data and pipelines: what to agree on before the first load

Most DWH problems start not with the choice of Postgres or ClickHouse, but with agreements about the data. If you don't fix basic rules before the first load, you'll spend time reconciling reports and repeatedly redesigning marts.

First, define the schema framework. You need stable keys (not dependent on name, status or branch), unified catalogs and clear time rules. For example: what is a "day": source local timezone, head office time, or UTC? Even "revenue for yesterday" can differ if systems slice days differently.

What to fix in advance

A single identifier for entities: customer, contract, sales point, device.
Shared catalogs (organizations, branches, statuses) and their owners.
Time agreements: timezone, calendar, "business day", closing delays.
Responsibility boundaries: what is counted in source, in ETL/ELT, and in the mart.
Change rules: how to handle backdated corrections.

Then decide on the loading mode: batch or near real time. "Near online" is often requested out of habit but is truly needed for a limited set of marts (queue monitoring, fraud signals, operational KPIs). Financial and managerial reporting is usually fine with hourly or daily batch loads. Batching makes quality control and stability easier.

Quality control should be embedded in the pipeline, not done manually after complaints. Minimum checks: duplicate keys, missing required fields, late arrivals, and reconciliations with sources (at least sums and counts).

Computation logic and observability

Version calculation rules: who changed a formula, when and for which dates it applies. A good practice is to store transformation code and schema migrations together and release changes as a versioned release.

Observability is necessary from day one: load times, volumes, error counts, share of late data, and execution time of key queries. Failures then show up immediately, not when a manager urgently needs a report.

Step‑by‑step implementation of Postgres + ClickHouse

Architecture of the pairing for your task

We will review the roles of Postgres and ClickHouse and design a staging-core-marts layer scheme for your case.

Discuss the project

Implement the pair as a series of short, verifiable steps rather than one big migration. This reveals bottlenecks (disk, network, data model) early and avoids building the system by guesswork.

Start with specifics: which reports and metrics must work on day one and what delay is acceptable (5 minutes, an hour, a day). Record not only the list of reports but typical filters: by branch, manager, period, product. Filters drive where to place aggregates and how to partition data.

Map sources: where data comes from, update frequency, what is critical (e.g., financial ops), and where late events may occur. This defines load, deduplication and quality rules.

Typical sequence:

Lock the list of priority reports and success criteria: response time, freshness, accuracy.
Describe sources and update patterns: schedule, volumes, which fields may change back in time.
Choose layers and owners: staging for raw, core for agreed entities, marts for productized views, and assign responsible people.
Pilot one domain (e.g., sales): build the mart in Postgres, push heavy slices and groupings to ClickHouse, and measure bottlenecks.
Migrate marts one by one and deactivate old DWH pieces only after parallel reconciliation of numbers.

Measure three things in the pilot: load speed, typical query speed and the "cost of errors" (time to recalc if corrections arrive). If a branch sales report is commonly sliced by date and city, test partitioning and distribution keys in advance. Otherwise ClickHouse may look fast in demos but slow in reality.

Final step: a cutover plan — a period of parallel computation, reconciliation of key metrics and a clear rollback if discrepancies appear in production.

Common implementation mistakes and how to avoid them

The most frequent mistake is dumping all data into a single huge table and leaving analysts to "figure it out." There are no clear marts, no owners, and every new report becomes manual work and an argument over which numbers are correct.

The second trap is premature aggregation. Teams build dozens of precomputations "just in case," then half are unused, they take long to update and they break load windows. Simple rule: an aggregate must answer a concrete business question and have an owner who confirms it's needed.

Common causes of broken reports

Time and timezones are a separate pain. If one system stores time in local timezone, another in UTC, and the mart keeps "as received" timestamps, daily reports will jump. For example, branches close day at 20:00 local time and part of sales shifts to the next day in analytics.

An ill‑thought update strategy is also dangerous. Postgres handles point updates and catalogs well, while frequent row updates in ClickHouse are usually expensive and can slow queries. Design loads so ClickHouse mostly sees inserts and partition replays, and apply corrections by partition recalculation rather than endless tiny updates.

How to avoid problems in practice

Before launch, agree on basic rules: each mart has an owner and metric descriptions, time is stored consistently (often UTC) with local time separate, update plans are clear (what is full refresh vs incremental, acceptable delays), monitoring for loads and quality is configured (freshness, sum checks, anomalies), and there is a "golden" report compared to a benchmark after each change.

If deploying a DWH across branches, start with one mart (e.g., daily sales by location), stabilize it and then scale the approach to other domains. In SI projects (including with GSE.kz) discipline around marts, time and quality control usually yields more benefit than trying to "speed up everything" at once.

Short checklist before production

Before launch, agree on meanings and system boundaries. Even a strong Postgres + ClickHouse pairing will break if teams calculate the same metric differently or don't know where to find data.

Start with reports. Make a short list of key dashboards and extracts actually in use: financial results, sales, stock levels, SLA, branch load. For each item fix metric definitions (e.g., "revenue" — by payment or shipment; "active customer" — last 30 days or calendar month) and an owner who approves changes.

Check layer architecture: where raw data, marts and aggregates live. There must be a single authoritative source for each field and clarity about where it can be used. A common mistake is when a Postgres mart pulls data from different places "because it's faster", and after a month nobody can reproduce numbers.

Set retention and storage rules per layer. Raw is kept longer for replays, marts by business need, aggregates just as long as they speed up queries. Without predefining retention, disks fill and cleanups break reports.

Verify data quality and load observability: checks for completeness, duplicates and metric jumps; alerts for late loads and sudden volume changes; and a simple way to see which table and step failed.

Finally — pilot with measurements. On a limited but real dataset measure disk growth by layer, network traffic between components, typical query times and aggregate recompute time. If results are much worse than expected, it's usually a sign to change mart granularity, partitioning or precompute strategy rather than "add hardware at random."

Example scenario: reporting for a branch network

Short-cycle implementation plan

We will discuss goals, SLA and a 2–6 week work plan to kick off the migration.

Schedule a meeting

Imagine a network of 30–100 branches. Managers need the same numbers daily: revenue, number of receipts, average check, top products, weekly trends. Sources are usually different: POS, CRM, warehouse and branch/product catalogs. Data arrives with delays and sometimes returns/statuses change retroactively.

In Postgres + ClickHouse split roles: Postgres holds staging (raw uploads) and catalogs: branches, categories, managers, aggregation rules. It's easier there to control quality, store catalog versions and run checks. ClickHouse stores ready reporting marts where read speed and aggregations matter.

A practical set of aggregates: daily sales by branch and category, plus a separate per‑product table for top lists. Reports like "yesterday vs last year" or "top‑20 categories by region" read from small summary tables rather than recomputing millions of receipt rows.

Build updates so you don't recalc everything. A common approach: daily load of new data and recompute only the last N days (e.g., 7–14), because adjustments and returns most often affect that window. Older history is treated as stable.

In the pilot check not only the fastest chart but repeatability: times for 5–10 typical queries (by branch, region, category, tops), duration of daily load and last‑N‑day recompute, behavior under concurrent requests from multiple managers, catch‑up speed when a load is late, and how quickly you detect an error before reports go out.

For on‑prem infrastructure, decide where ClickHouse will run (it needs fast disks and stable network) and who will provide 24/7 support. In Kazakhstan such projects are often done on local servers with local support — experience from an SI like GSE.kz can be useful.

Next steps: pilot, infrastructure and support

To avoid the Postgres + ClickHouse stack becoming another unfinished DWH, start with a short pilot on real pain points: which tables are read, which filters are used, and where users wait minutes.

Pick 1–2 domains for the pilot (e.g., sales and stock or finance and payments) and take at least 3–6 months of data. This quickly shows which marts belong in Postgres and which heavy aggregates and long analytics should move to ClickHouse.

Agree on minimal SLAs: otherwise "fast" means different things to everyone. Two simple numbers are enough: data freshness (e.g., every 15 minutes or daily) and response time for key reports (e.g., up to 5 seconds for typical filters).

Then plan infrastructure and operations. Size for 12–24 months: data growth, more users, new marts. ClickHouse often hits disk capacity and network between nodes; Postgres needs write IOPS and reliable backups.

A practical 2–6 week work plan:

Collect 20–30 real queries and the list of fields needed in marts.
Fix SLA for update latency and report response times.
Estimate 12–24 month data growth and reserve disk and memory accordingly.
Assign owners: who owns marts, who monitors data quality, who grants access.
Set up basic operations: monitoring, backups, restore test, and schedules for regular tasks.

If the team lacks bandwidth for integration and on‑call, consider a partner. For example, GSE.kz as a vendor and integrator can help select servers, build infrastructure and organize round‑the‑clock support.