Why does BI run fast at first but slow down after a few months?

Most often the server was sized for a demo load, while in reality dozens of users work concurrently, data volumes grow, and nightly loads conflict with daytime queries. As a result, hot data no longer fits in RAM, disks can't keep up, and workloads compete for CPU, so response times become erratic.

What numbers should be collected before buying a server for BI and data marts?

Start with peak concurrency and a clear response-time SLA for 3–5 key reports. Then record the nightly window for ETL and mart recalculation, and a 1–3 year data growth forecast. These three items usually define CPU, RAM and disk requirements most accurately.

When should ETL/DWH/marts and BI be placed on different servers?

Separate roles when nightly loads regularly spill into the morning or when one heavy report slows everyone else. Even a minimal separation of the BI service and the database with marts usually gives more stable response than trying to beef up a single large server.

What's more important for BI: more cores or higher CPU frequency?

Interactive dashboards usually benefit more from single-core speed and predictable latency because queries are short and numerous. Nightly recalculations and bulk loads benefit from more cores and parallelism — but only if memory and disks aren't the bottleneck.

How to tell the system lacks RAM, and how to plan a buffer?

If marts and indexes fit in cache, reports speed up noticeably because there is less disk reading. Plan memory with a buffer: after OS and services there should be room left for the DB cache and for growth of marts, otherwise the system will start using disks and performance will be inconsistent.

What disks and storage are typically needed for fast reports and stable loads?

NVMe/SSD usually deliver a tangible effect in BI: lower read latency, faster sorts and faster temp files. A sound baseline is not to chase capacity alone but to provide a fast layer for active marts and predictable write performance for logs and loads; otherwise nightly processes stretch out.

When does the network become a bottleneck for BI and DWH?

If disks and CPUs look underutilized while query times still jump, the network is often the culprit: data sources, backup transfers or replication may be saturating the link. For most scenarios where BI and DWH are colocated, 10GbE is sufficient, but active replication, large backups and growing marts may require higher bandwidth.

What are RPO and RTO and how do they affect analytics infrastructure choice?

RPO is how much data you are willing to lose, RTO is how quickly the system must be restored. These figures decide whether backups are enough or you need a standby node/replication — because backups alone don't guarantee a fast service return.

What to look for in licenses and software compatibility when choosing CPU and platform?

Check licensing of your DB and BI platform in advance: it may be counted by cores, sockets or users, which affects the cost-efficiency of a configuration. Also verify how your software behaves on dual-socket platforms and with NUMA to avoid uneven latency despite similar specs on paper.

How to run a pilot correctly and not miss the growth requirements?

Run a nightly load together with the 2–3 heaviest reports, then measure response times, the duration of the nightly job and the peaks in RAM, disk and CPU usage. If local support in Kazakhstan matters, consider a platform that is easy to expand and service locally (for example GSE S200) to avoid a full replacement in a year.

Servers for Analytics and BI: choosing for reports and data marts

What usually goes wrong with servers for BI and analytics

The main reason BI infrastructure starts to slow down is simple: the server was chosen for a polished demo, not for real load. As a result, reports take a minute to open, filters freeze, and users stop trusting the numbers because “it was faster yesterday.”

Most often the bottlenecks are three things: memory, disks or CPU. There's not enough RAM, so data is constantly pushed out of cache. Disks can't read and write fast enough during loads, and daytime reports end up waiting for nightly updates. Or CPU is insufficient when many people run heavy queries and calculations at the same time.

Expectations make the problem worse. The business needs fast reports during the day, while IT plans to update marts at night. If these goals aren't separated in advance, conflicts arise: nightly loads consume resources, dashboards are slow in the morning, and data freshness is in question.

For the quasi-public sector in Kazakhstan there's another practical point: data almost always grows faster than planned. New sources are added, retention rules and audit metrics expand. If the server is bought "tight," after a year you have to add capacity — which means new procurement cycles, approvals and the risk of incompatible configurations.

Before purchasing, answer a few questions so you don't overpay or miss on resources:

How many users open reports concurrently during peak hours and which reports are the heaviest?
What is the target SLA: “report within 5 seconds” or “30–60 seconds is acceptable”?
When and how are marts updated: only at night or are there daytime loads too?
What is the planned hardware lifetime and expected data growth?
What's more important: report response speed, load speed, or balance?

Practical example: if departmental analytics serves 50–100 employees during the day and multiple systems load at night, a single “universal” configuration often disappoints. In such cases it’s better to plan separate roles (queries vs loads) or at least build in extra RAM and disk capacity. That's why servers like the GSE S200 are often considered: they make it easier to plan scaling for growing marts and parallel queries.

Typical architecture: DWH, marts, ETL and BI server

To choose servers for analytics, think of the chain as roles rather than one big system. This makes it easier to see what must be fast and where conflicts arise.

A typical scheme usually has four layers:

ETL/ELT: pulls data from sources, cleans and loads on a schedule.
DWH (warehouse): stores history and raw data in processing-friendly structures.
Data marts: ready sets for reports and KPIs.
BI server: visualization, dashboards, access control, cache and report publishing.

Bottlenecks don't appear "everywhere at once," but in specific places. At night the pressure is on loads: ETL writes a lot to disk and taxes CPU with transformations. During the day the focus is on reads and concurrency: users open reports, apply filters and drilldowns, and marts receive many simultaneous queries. The BI server can consume memory for cache and suddenly cause slowdowns even if the database still performs.

One server for all tasks often ends with nightly loads hurting morning reports and a heavy report from one department slowing others. This is particularly noticeable in the quasi-public sector, where update windows and reporting SLAs are fixed.

In practice roles are separated like this:

ETL and DWH are often split so writes don't block reads.
Marts are moved to a separate node if there are many “hot” reports.
BI service is run on a separate node if there are many users and a stable response is required.

If growth is expected, it's easier to plan from the start to separate roles onto distinct nodes without changing the entire system logic.

Load profiles: types and why they matter

A BI server rarely lives in one mode. During the day it handles dozens or hundreds of user queries, at night it recalculates marts, and sometimes it receives ad-hoc analyst requests. Mixing these modes makes it easy to buy a configuration that looks powerful on paper but is slow in reality.

Interactive reports and dashboards are many short queries. Fast response and stable latency are important: users notice pauses in peak minutes, not average speed. Typical needs are CPUs with good single-core performance, enough RAM for cache, and fast disks.

Batch calculations and scheduled marts are long tasks in night windows. Here parallelism, high disk throughput and predictability matter. If the jobs don't finish within the window, they run into the morning and compete with daytime reports.

Ad-hoc analyst queries are unpredictable spikes. One query can trigger heavy aggregation on a large table. You need spare CPU/RAM and clear limits (queues, priorities) so experiments don't break interactive reporting.

Also clarify where computations run: in the database (joins, aggregations, window functions) or in the BI tool. More logic in the DB raises server requirements for the database. More calculations in BI increase load on the BI server, especially memory.

If ML models or forecasting are added, requirements expand: usually more RAM and fast disks for training sets, more cores for parallel computations, and sometimes GPUs.

How to gather requirements without complicated terms

Start not with CPU models and disk names, but with how people actually use reports. This quickly shows what matters most: response speed, steady nightly loads, or headroom for growth.

First, pin down the users and their habits. What matters is not how many employees exist but how many work concurrently and at what hours. If the peak is 10:00–12:00, the system must handle maximum load at that time without sudden drops.

Next, determine which reports cause the most pain. Agree on simple thresholds: what must open in 5 seconds, what can take 30 seconds, and what is a batch task of 5 minutes. This affects the configuration immediately: fast interactive reports and heavy recalculations need different emphases.

A short requirements card helps:

Peak concurrency and peak hours (e.g., 60 users at 10:00).
Critical reports and target response times (5/30/300 seconds).
Current data volume and 12–36 month forecast (e.g., 12 TB -> 25 TB).
Marts update window (e.g., 4 hours at night).
Support and regulatory constraints (24/7, SLA, procurement cycle).
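
The card above can be kept as a small structured record so the numbers are easy to reuse in later sizing steps. The following is a minimal sketch, not a prescribed format: the class name, field names and the compound-growth formula are illustrative assumptions, and the sample values are the ones from the card.

```python
from dataclasses import dataclass


@dataclass
class RequirementsCard:
    """One-page summary of BI sizing inputs (all field names are illustrative)."""
    peak_users: int            # concurrent users at peak
    peak_hour: str             # e.g. "10:00"
    sla_seconds: tuple         # response-time tiers, e.g. (5, 30, 300)
    data_tb_now: float         # current data volume, TB
    data_tb_forecast: float    # forecast volume at the end of the horizon, TB
    horizon_months: int        # forecast horizon, 12-36 months
    etl_window_hours: float    # nightly window for mart updates

    def monthly_growth_rate(self) -> float:
        """Compound monthly growth implied by the forecast (assumes steady growth)."""
        ratio = self.data_tb_forecast / self.data_tb_now
        return ratio ** (1 / self.horizon_months) - 1


card = RequirementsCard(60, "10:00", (5, 30, 300), 12.0, 25.0, 36, 4.0)
print(f"Implied growth: {card.monthly_growth_rate():.2%} per month")
```

Expressing the forecast as a monthly rate makes it easier to check later whether RAM, disk capacity and the nightly window still hold at month 12, 18 or 36.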

In the quasi-public sector in Kazakhstan, procurement cycles and local support matter. It makes sense to plan a 2–3 year buffer and clarify on-site service and repair. If you consider local vendors with support networks like GSE.kz, these questions are usually easier to agree on within internal rules.

CPU: choosing for queries, calculations and parallelism

CPU becomes a bottleneck not only because of one heavy report but due to competition: user queries, mart updates and background calculations often run at once. So it's important to know which mode will be more common: short interactive queries or long batch recalculations.

If reports must open quickly and filters are used live in meetings, strong single cores and higher frequency matter. If complex aggregations run at night and many parallel queries occur by day, more cores usually win.

Parallelism and query competition

Query competition is when multiple tasks contend for the same cores. Example: in the morning one department builds summaries, another pulls detail, and marts are being refreshed. In such cases “many medium cores” often beat “few fast ones,” but only if memory and disks keep up.

Sockets, NUMA and headroom for growth

NUMA can be pictured as two halves of a dual-socket server: cores work faster with their local memory and slower with memory on the other socket. Sometimes a single-socket server with a faster CPU gives smoother response than two sockets with slower CPUs, especially for latency-sensitive queries.

Practical guidelines:

For interactive reports prefer higher frequency and low latency.
For marts, calculations and many users add cores.
Keep CPU headroom of at least 20–30% for new reports and data growth.
Check BI and DB licensing: by cores, sockets or users.
If considering dual-socket platforms (including GSE S200-class servers), verify how your software handles NUMA.
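
The 20–30% headroom guideline can be turned into a core count once a pilot shows how many cores are busy at peak. This is a rough sketch under that assumption; `peak_busy_cores` is a hypothetical measured value, and the function ignores licensing and NUMA effects.

```python
def cores_with_headroom(peak_busy_cores: int, headroom_pct: int = 25) -> int:
    """Smallest core count that keeps `headroom_pct` percent of cores free at peak."""
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in [0, 100)")
    usable_pct = 100 - headroom_pct
    # Ceiling division in integer arithmetic to avoid float rounding.
    return -(-peak_busy_cores * 100 // usable_pct)


# Assumed example: the pilot showed about 24 busy cores at the morning peak.
for pct in (20, 25, 30):
    print(f"{pct}% headroom -> {cores_with_headroom(24, pct)} cores")
```

If licensing is per core, compare the result against the license cost before rounding up to the next CPU model.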

Memory: how much RAM is needed for marts and fast delivery

In analytics you often hit RAM limits before CPU. Reports and dashboards generate repeated queries, filters and aggregations. All of this runs faster when the needed pages and indexes are in the DB cache rather than read from disk.

User behavior shows whether data should be kept hot in memory. If the same marts and reference tables are used by dozens of people during work hours (finance, procurement, HR), caching has a noticeable effect. If reports are rare and varied, RAM still matters but can't work miracles.

Signs of insufficient memory are visible even without deep metrics:

swap appears (system pages memory to disk);
speed drops during peak report hours;
the same report is fast one moment and suddenly slow the next;
disk reads increase at similar load.

Don't plan memory “tight.” After the OS and services, leave a clear reserve for cache and add headroom for 12–18 months of growth (new metrics, more detail, longer history). Otherwise a mart grows, the cache no longer fits, and the system hits disks again.
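
As a back-of-the-envelope illustration of this rule, the sketch below adds the OS and service footprint to the hot data set after it has grown over the planning horizon. All inputs are assumptions you would replace with measured values; the function name and the compound-growth model are illustrative.

```python
def plan_ram_gb(os_and_services_gb: float, hot_set_gb: float,
                monthly_growth: float, months: int = 18) -> float:
    """RAM to provision: OS/services plus the hot set grown over the horizon."""
    grown_hot_set = hot_set_gb * (1 + monthly_growth) ** months
    return os_and_services_gb + grown_hot_set


# Assumed inputs: 32 GB for OS and services, 200 GB of hot marts and indexes,
# 2% growth per month, 18-month horizon.
needed = plan_ram_gb(os_and_services_gb=32, hot_set_gb=200, monthly_growth=0.02)
print(f"Plan for roughly {needed:.0f} GB of RAM")
```

The result is a floor, not a target: round up to the next practical DIMM configuration and leave free slots if further growth is likely.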

A separate cache or semantic layer server makes sense when the BI platform actively caches results while the DB is busy with ETL. In such cases it's usually easier to split roles across nodes than to keep adding RAM to one overloaded server.

Disks and storage: fast queries and predictable loads

BI performance often comes down to disks. A user opens a dashboard, and if the server spends time reading data or working on temporary files, the report will lag even if the CPU is good.

Two key metrics: IOPS is the number of small read/write operations per second; throughput is how much data moves per second. Interactive reports suffer from latency and low IOPS, while nightly loads need throughput and steady write performance.
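
The two metrics are linked through the I/O block size, which is why the same disk can look fast for one workload and slow for another. A small sketch of that relationship follows; the block sizes and IOPS figures are illustrative assumptions, not measurements of any particular device.

```python
def throughput_mib_s(iops: float, block_kib: float) -> float:
    """Throughput implied by a given IOPS rate at a given block size."""
    return iops * block_kib / 1024


# Interactive reports: many small random reads (8 KiB blocks assumed).
interactive = throughput_mib_s(iops=20_000, block_kib=8)
# Nightly bulk load: fewer but large sequential writes (1 MiB blocks assumed).
bulk = throughput_mib_s(iops=2_000, block_kib=1024)

print(f"interactive: {interactive:.2f} MiB/s, bulk load: {bulk:.0f} MiB/s")
```

The interactive profile moves little data despite a high operation count, so it is limited by IOPS and latency; the bulk profile is limited by sustained throughput.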

SSD, preferably NVMe, usually gives a noticeable boost: reports open faster, sorts and aggregations pause less, and loads run more consistently. Capacity is important, but a large slow array often loses to a smaller fast one.

Good practice is to separate storage workloads so they don't interfere:

tables and marts on the fastest layer;
transaction logs separate, optimized for steady writes;
temp files (temp, sort, spill) on a fast disk;
backups on a separate, larger and cheaper tier.

Choose RAID based on risk and write patterns. Databases often use RAID10 for speed and predictable latency. RAID5/6 fits where capacity matters more but is more sensitive to heavy writes and rebuilds.
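
To compare the options numerically, the sketch below uses the common rule-of-thumb write penalties for small random writes (2 for RAID10, 4 for RAID5, 6 for RAID6). These are approximations: controller cache and full-stripe writes change real results, and the disk figures in the example are assumptions.

```python
# Rule-of-thumb back-end I/Os per small random write.
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def usable_disks(level: str, disks: int) -> int:
    """Number of disks' worth of capacity left after redundancy."""
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID10 needs an even number of disks, at least 4")
        return disks // 2
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return disks - 1
    if level == "raid6":
        if disks < 4:
            raise ValueError("RAID6 needs at least 4 disks")
        return disks - 2
    raise ValueError(f"unknown RAID level: {level}")


def raid_summary(level: str, disks: int, disk_tb: float, disk_write_iops: float) -> dict:
    """Usable capacity and approximate random-write IOPS for the array."""
    return {
        "usable_tb": usable_disks(level, disks) * disk_tb,
        "write_iops": disks * disk_write_iops / WRITE_PENALTY[level],
    }


# Assumed example: 8 disks of 4 TB, each sustaining 10,000 random write IOPS.
for level in WRITE_PENALTY:
    print(level, raid_summary(level, 8, 4, 10_000))
```

The trade-off from the text shows up directly: RAID5/6 return more usable capacity from the same disks, while RAID10 keeps more of the raw write performance.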

If data grows, tiered storage helps: “hot” for current marts, “warm” for less-frequent periods and “archive” for regulatory retention. In the quasi-public sector it’s common to keep the last 12–18 months on fast storage and older periods on cheaper storage with known response times.

Network, reliability and recovery: what to check before buying

BI servers can slow for reasons other than CPU. Network and poor recovery planning often cause issues, especially when morning reports access remote sources and nightly loads run into working hours.

Quick network guide

10GbE is usually enough if BI is colocated with DWH, loads are moderate, and backups run at night without saturating the link. Move to 25/40GbE when marts grow, many parallel users appear, active replication exists or frequent heavy backups take place.

Typical network bottlenecks: source exports (file registries), replication between nodes, backup transfers, and analysts accessing large datasets. A simple sign of a network issue: disks and CPUs are underutilized while query times fluctuate.
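
A quick way to judge whether a link is enough is to estimate how long a backup or replication transfer would occupy it. The sketch below assumes decimal terabytes and a 70% effective link utilization; that efficiency figure is an assumption, not a measured value, and should be replaced with what your network actually sustains.

```python
def transfer_hours(data_tb: float, link_gbit_s: float, efficiency: float = 0.7) -> float:
    """Hours needed to move `data_tb` over a link at the given effective utilization."""
    bits = data_tb * 8e12                              # decimal terabytes to bits
    effective_bit_s = link_gbit_s * 1e9 * efficiency   # usable bits per second
    return bits / effective_bit_s / 3600


# Assumed example: a 10 TB backup over 10GbE versus 25GbE.
for link in (10, 25):
    print(f"{link}GbE: {transfer_hours(10, link):.1f} h")
```

If the estimated transfer time approaches the length of the nightly window, the backup will compete with loads for the same link, which is one of the cases where a faster or dedicated channel pays off.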

Recovery without surprises

Reliability starts with basics: dual power supplies, network redundancy, disks resilient to failure and a clear component replacement procedure. This is not a luxury but a way to avoid stopping reports over a single component fault.

RPO and RTO can be explained plainly. RPO is how much data loss is acceptable (e.g., 15 minutes or 1 day). RTO is how quickly the system must be back (e.g., 1 hour or 8 hours). These numbers show whether you need replication or a spare node, and how often to run backups.

Remember: backup and fault tolerance are different. Backup helps return data from "yesterday" but doesn't prevent downtime now. Fault tolerance keeps the service available during a failure but doesn't replace backups.
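
The distinction can be checked with two simple comparisons: the backup interval against RPO, and the measured restore time against RTO. The following is a minimal sketch; the class, function and message texts are illustrative, and the sample figures are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTargets:
    rpo_minutes: float   # acceptable data loss
    rto_minutes: float   # acceptable time to restore service


def assess(targets: RecoveryTargets, backup_interval_min: float,
           restore_duration_min: float) -> list[str]:
    """Return findings where the current backup setup misses the targets."""
    findings = []
    if backup_interval_min > targets.rpo_minutes:
        findings.append("RPO at risk: back up more often or add replication")
    if restore_duration_min > targets.rto_minutes:
        findings.append("RTO at risk: restore is too slow, consider a standby node")
    return findings


# Assumed example: daily backups, a 6-hour tested restore,
# targets of 15 minutes (RPO) and 1 hour (RTO).
targets = RecoveryTargets(rpo_minutes=15, rto_minutes=60)
for finding in assess(targets, backup_interval_min=24 * 60, restore_duration_min=6 * 60):
    print(finding)
```

The restore duration should come from an actual recovery test, not from an estimate based on backup size.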

Before purchase check:

is there a separate channel or window for backups and replication;
can the network handle morning reports and parallel loads;
is power, network and disk redundancy planned;
are RPO/RTO agreed with report owners;
has real recovery been tested, not just the presence of backups.

Step-by-step: how to pick a configuration for reports and marts

Picking a server for BI is easier when you start from real scenarios: how people work by day and what happens at night. This is critical in the quasi-public sector: morning reports must be fast and scheduled tasks must fit the window.

5 steps that give a clear result

Choose 3–5 most important reports and marts. For each record a target response time (e.g., 3–5 seconds for typical filters) and the heaviest actions (drilldown, export, recalculation).
Estimate concurrency: how many users at peak and how many tasks run in parallel (updates, calculations, publishing). Record the nightly window: how many hours for loads and rebuilds.
Decide how to split roles. If marts and loads conflict, separation usually wins: a node for DWH/ETL and a node for the BI service. One powerful node works when users are few and nightly loads are short.
Draft a configuration per profile. Fast responses usually need more RAM and fast disks. Bulk loads and calculations need more CPU and steady write performance. Network matters if data comes from various systems or external storage is used.
Build in a 1–3 year buffer: data growth, new marts, more users. Decide how you'll scale: add RAM/disks, add a second server or move to a cluster.

Then verify compatibility with your software (DB, BI platform, drivers) and support/parts availability.

Quick check before final specification

Which hurts more: morning report peak or nightly loads?
Will the nightly schedule still fit the window as data grows?
Is there redundancy: disks, power supplies, recovery plan?
Can the server be expanded without a full replacement?
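
Whether nightly jobs will still fit the window as data grows can be estimated roughly before purchase. The sketch below assumes job duration scales in proportion to data volume and that data grows at a steady monthly rate; both are simplifying assumptions, and the function name and sample values are illustrative.

```python
import math


def months_until_window_overflow(job_hours: float, window_hours: float,
                                 monthly_growth: float) -> float:
    """Months until the nightly job no longer fits, assuming linear scaling with data."""
    if job_hours >= window_hours:
        return 0.0            # already over the window
    if monthly_growth <= 0:
        return math.inf       # no growth, the job keeps fitting
    return math.log(window_hours / job_hours) / math.log(1 + monthly_growth)


# Assumed example: the job takes 3 hours today, the window is 4 hours,
# data grows about 2% per month.
months = months_until_window_overflow(3, 4, 0.02)
print(f"Window is exhausted in about {months:.0f} months")
```

If the answer is shorter than the planned hardware lifetime, the configuration needs either faster loads or a path to split ETL onto its own node.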

If choosing infrastructure for Kazakh organizations, clarify local origin and support requirements. GSE S200 servers are often reviewed because of available support and predictable supply in the region.

Example scenario for the quasi-public sector: from reports to marts

Imagine a quasi-public organization in Kazakhstan: 200–500 users open reports in the morning (9:00–11:00), and at night marts are recalculated and sources loaded. Typical data: finance (payments, budgets, postings), HR (headcount, payroll), procurement (plans, contracts, deliveries). Sources vary and update frequency differs.

To prevent analytics servers from "falling over" at peak, split roles early, even if initially it's a single physical server.

Where RAM matters more and where disks matter

Tasks typically fall into three types, each with its own bottleneck:

BI delivery and interactive reports: RAM and CPU matter to keep hot sets in memory and handle many concurrent queries.
Nightly ETL and mart recalculation: fast disks and steady writes matter because of bulk inserts, recalculations and sorts.
Historical storage (years): capacity and predictable performance matter so growth doesn't eat space or slow marts.

In practice teams often prioritize RAM for marts and BI, and size disks with a buffer for nightly loads and historical growth. Don't plan capacity "tight": add 30–50% for the next 12–18 months.

Pilot plan: what to measure

Before buying, run a pilot on one node with real data. Measure:

open times for 10–20 key reports during morning peak;
duration of nightly mart recalculation and source loads;
maximum concurrent load without degradation;
RAM, CPU and disk usage (write queues, IOPS);
mart growth per month and a yearly forecast.

If the pilot shows reports stall while disks are idle, prioritize RAM/CPU. If nightly jobs don't finish in the window, the disk subsystem or contention between ETL and BI is usually at fault.
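
These rules of thumb, together with the network sign described earlier (idle disks and CPU while query times fluctuate), can be written down as a small triage function. This is only a sketch: the 0.8 "busy" threshold, the parameter names and the returned labels are assumptions, and real diagnosis needs the underlying metrics.

```python
def likely_bottleneck(reports_slow: bool, nightly_overrun: bool,
                      cpu_util: float, disk_util: float,
                      swapping: bool, busy: float = 0.8) -> str:
    """First-pass guess at the bottleneck from pilot observations."""
    disks_idle = disk_util < busy
    cpu_idle = cpu_util < busy
    if reports_slow and disks_idle:
        if cpu_idle and not swapping:
            return "network"      # nothing local is busy, yet reports stall
        return "RAM/CPU"          # disks idle, so memory or CPU is the limit
    if nightly_overrun:
        return "disk subsystem or ETL/BI contention"
    if reports_slow:
        return "disk subsystem"   # reports stall while disks are busy
    return "no clear bottleneck"


# Assumed pilot observation: slow reports, CPU at 90%, disks at 30%, no swap.
print(likely_bottleneck(True, False, cpu_util=0.9, disk_util=0.3, swapping=False))
```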

Common mistakes when choosing BI servers

Problems often start when a server is chosen by a single parameter. For example, buying many cores and expecting reports to fly, then being surprised performance barely changes.

Typical errors:

Lots of CPU but little RAM. For marts and interactive queries memory often matters more. If working sets don't fit in cache, the system goes to disk while the CPU idles.
Choosing disks by capacity only. For DWH latency and IOPS matter. A load that should take 30 minutes can take 4 hours with slow storage and write queues.
Mixing nightly ETL and daytime interactive loads without priority rules. Result: pauses by day, missed nightly SLAs.
Not allowing space for temp tables, sorts and log growth. Especially painful at quarter end.
Testing on small samples. Production data reveals locks, memory contention and slow joins.

Simple way to catch these before purchase:

Run 2–3 heaviest reports in parallel with a typical mart load.
Measure which resource saturates first: RAM, disk, CPU or network.
Check temp and log space for 6–12 months growth.
Separate workloads by time or node if daytime reports are critical.

A balanced configuration is better than the single most powerful component, whether you consider GSE S200 or other hardware classes.

Checklist before purchase and next steps

Collect a few facts on one page before buying servers for analytics. This saves money and reduces surprises after launch.

Record these:

How many users will be in BI concurrently and when are peaks (morning, end of day, month end).
Which reports and dashboards are critical and acceptable response times.
Current data volumes and 12–24 month growth forecast.
How often marts update and the window for loads.
Availability requirements: are night stoppages allowed, is a recovery plan needed?

Then verify basics:

CPU: enough cores for parallel queries and background loads?
RAM: will working sets and caches fit to avoid disks?
Disks: is there a fast layer for active tables and separate space for ETL/staging?
Network: enough bandwidth between BI, DWH and storage?
Redundancy: RAID, power supplies, spare capacity to survive failures?

To refine numbers, ask the team for measurable examples: 5–10 heaviest queries, nightly load durations, concurrency stats and some real timings.

Before a big purchase run a short pilot on real data. Success criteria are simple: target response times for critical reports, duration of mart updates, stability under peak and absence of rare but severe slowdowns.
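
Rare but severe slowdowns are hidden by averages, so the response-time criterion is better checked against a high percentile. The sketch below uses a simple nearest-rank percentile; the choice of the 95th percentile and the sample timings are assumptions for illustration.

```python
def percentile(values: list, pct: int):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1-100)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Ceiling of pct * n / 100 in integer arithmetic, at least rank 1.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def meets_sla(timings_s: list, target_s: float, pct: int = 95) -> bool:
    """True if the chosen percentile of response times is within the target."""
    return percentile(timings_s, pct) <= target_s


# Assumed pilot timings for one report, in seconds, against a 5-second target.
timings = [2.1, 2.4, 3.0, 2.8, 11.5]
average = sum(timings) / len(timings)
print(f"average {average:.2f} s, SLA met at p95: {meets_sla(timings, 5)}")
```

In this sample the average stays under the target while the percentile check fails, which is exactly the "fast one moment, slow the next" behavior users complain about.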

After that, agree on the target architecture (where DWH, marts and BI live) and assign responsibility for support and monitoring. If you need a hardware baseline for BI and marts, consider GSE S200 servers and, if needed, system integration and 24/7 support from GSE.kz.