Where the choice starts: the agency's problem and constraints

Infrastructure questions rarely come from curiosity — they come from practice: warranties expire on servers, databases and registries grow, new citizen services appear, and the IT staff doesn't expand. At the same time, requirements for resilience, logging, backups and clear responsibility for outages increase.

That leads to a choice: hyperconverged infrastructure (HCI) or the classic "servers plus SAN" approach. On paper both cover virtualization and storage. In reality they differ in how quickly they can be deployed, what skills are needed to support them, and how the solution will behave a year later when load increases.

In the public sector, peak benchmark numbers matter less than a combination of constraints: deployment timelines, predictable resilience, change control, regulated service delivery, spare parts availability and procurement specifics (lots, compatibility, origin requirements).

Example. If you need to migrate 60–80 virtual machines in a short maintenance window and don't want to depend on scarce SAN specialists, HCI often looks sensible. If you already have a mature SAN team and need to scale storage independently of compute, the classic approach usually offers more flexibility.

Below is a practical, marketing-free comparison of deployment timelines, support skills and scaling, viewed against typical public-sector requirements.

What HCI means in simple terms

HCI is an approach where compute, storage and virtualization are combined in a single cluster. Instead of separate servers and a separate SAN, you use several identical nodes that together provide a shared resource pool for virtual machines.

A typical HCI system consists of nodes (servers with CPU, RAM and disks), a network for inter-node communication, and a software management layer. This layer aggregates disks from all nodes into a shared datastore and provides a single management interface: where VMs run, how much space is used, which node is overloaded.

Resilience is achieved through the cluster. VM data is usually replicated across multiple nodes. If a node fails, its VMs are restarted on the remaining nodes and the data stays available thanks to the copies. The higher the protection level, the more capacity is reserved as "spare".
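The trade-off between protection level and usable space can be estimated with simple arithmetic. The sketch below is a rough model, not any vendor's sizing formula: it assumes every block is stored `copies` times and that the raw capacity of `spare_nodes` nodes is held back for re-protection after a failure. The function name and all figures are illustrative.

```python
def usable_capacity_tb(nodes: int, raw_tb_per_node: float,
                       copies: int, spare_nodes: int = 1) -> float:
    """Rough usable capacity of a replicated cluster, in TB.

    Assumption: data is mirrored `copies` times and the raw capacity of
    `spare_nodes` nodes stays free so the cluster can re-protect data
    after a node failure. Real platforms add their own overheads.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if nodes <= spare_nodes:
        raise ValueError("cluster needs more nodes than spare_nodes")
    raw_available = (nodes - spare_nodes) * raw_tb_per_node
    return raw_available / copies


# Hypothetical 4-node cluster with 20 TB of raw disk per node.
two_copies = usable_capacity_tb(nodes=4, raw_tb_per_node=20, copies=2)
three_copies = usable_capacity_tb(nodes=4, raw_tb_per_node=20, copies=3)
print(f"2 copies: {two_copies:.1f} TB usable, 3 copies: {three_copies:.1f} TB usable")
```

With these example inputs, moving from two copies to three cuts usable space from 30 TB to 20 TB out of 80 TB raw, which is the "spare" cost of the higher protection level.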

HCI is most often chosen for typical virtual servers, VDI, test and backup environments, small agency data centers and remote sites where there's no dedicated SAN team.

Keep in mind that HCI is not "magic"; it is just another way to assemble a data center. You still choose the CPUs, memory, disks, network and software platform. They are simply managed as a single system.

Classic "servers plus SAN" architecture: how it works

A classic virtual infrastructure is usually layered: compute, storage and network. Servers run the VMs, the SAN stores data, and switches and network settings connect everything.

A typical setup includes a hypervisor cluster on servers, a separate SAN, a dedicated storage network (FC or iSCSI) and tools for virtualization and backups.

Because layers are separate, responsibilities often split too: a server/hypervisor admin, a SAN admin, a network engineer, backup and security specialists.

The classic approach shines in large environments that require fine-grained performance and capacity tuning. You can scale disk subsystems separately, upgrade servers independently, choose RAID levels, caching and disk classes for different workloads.

Problems usually appear at the integration points. To create a new VM pool you may need to allocate a LUN on the SAN, configure zoning or VLANs, verify multipathing and access policies, check firmware and driver compatibility, then test resilience. The more handoffs and configuration points, the more important it is to assign roles and coordinate the work in advance.

Deployment timelines: which is faster and why

In IT projects, most of the calendar is often consumed not by procurement itself but by the chain of steps around it: delivery, racking, power and network, configuration, testing, commissioning and sign-off. In agencies, add approvals, security checks and formal inspections on top.

HCI is typically deployed faster because there are fewer components and part of the configuration follows templates. But this advantage only holds if the network is ready: agreed L2/L3 designs, VLANs, MTU, link redundancy and address space. Without that, a "fast" cluster can easily lose its lead.

The classic scheme usually takes longer because there are more stages: separate servers, a separate storage array, a separate storage network, more compatibility checks and more handoffs between teams. Even when deliveries arrive on time, the schedule goes into integration and into resolving small mismatches in settings.

What to check to avoid schedule slips

Before starting, go through a few points:

site readiness: ports, racks, power, cooling;
approved network diagrams, addressing, security and access rules;
migration plan and downtime window for live services;
acceptance test scenarios and "commissioned" criteria;
who will handle integration and support in the first weeks.

If a regional agency needs a virtual infrastructure in 6–8 weeks, HCI often fits, provided the network and documentation are ready. In projects that require a SAN, zoning and separate storage segments, timelines usually stretch. In such projects, speeding things up often starts not with hardware but with network preparation and paperwork.

Support and competencies: who will maintain it

Support often determines the choice more than raw specs.

For HCI, a team proficient with virtualization and basic networking is usually enough. Storage is built-in, so many traditional SAN admin tasks either disappear or become simpler: fewer consoles and fewer places to make mistakes.

In the classic model the skill set is broader. Besides virtualization, you need storage array knowledge (RAID policies, pools, performance), storage networking (FC/iSCSI), zoning and path diagnostics. It's not inherently difficult, but it requires practice and time.

In either case, assign the basic roles in advance: virtualization, network, backup and restore (with tests), security (accounts, logs, segmentation), and incident/change processes.

In a small IT team HCI is usually easier to organize: a single stack, unified updates, a clear growth model. In the classic approach dependency on multiple vendors and contractors appears faster: servers, SAN, network — each with its own support.

Plan training ahead: appoint an owner and a backup owner for the platform, rehearse restore from backup and run "node failure" and "controller/path failure" scenarios. If an integrator provides 24/7 support, define responsibility boundaries and response times before go-live.

Scaling: what happens as load grows

Load growth in an agency usually looks like this: new systems are added, user numbers increase, archives and logs grow, heavy reports appear. It's important to know what grows — compute, storage or resilience requirements.

In HCI scaling is often simple: add a node, and CPU, RAM and disk pool grow together. This is convenient when growth is roughly uniform and you want to manage the infrastructure as a single unit.

In the classic model growth is separate: need compute — buy servers; need capacity or IOPS — expand the SAN. This is often more economical when growth is uneven (for example, storage volumes suddenly increase due to archives or video while compute needs stay the same).

A common issue with HCI is "I only need more disks." In some architectures that means buying additional nodes that also bring CPU/RAM. Sometimes vendors offer disk-focused configurations or dedicated storage nodes, but clarify this before procurement.
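The "I only need more disks" case is easy to quantify before talking to a vendor. The following sketch assumes the simplest HCI model, where every added node brings a fixed amount of usable storage and vCPU. The function name and per-node figures are hypothetical, not taken from any product.

```python
import math


def hci_nodes_for_growth(extra_storage_tb: int, extra_vcpu: int,
                         usable_tb_per_node: int, vcpu_per_node: int) -> dict:
    """Nodes needed to cover growth, plus the resources left idle.

    Assumption: nodes are identical and cannot be bought as disks only.
    """
    nodes_for_storage = math.ceil(extra_storage_tb / usable_tb_per_node)
    nodes_for_compute = math.ceil(extra_vcpu / vcpu_per_node)
    nodes = max(nodes_for_storage, nodes_for_compute)
    return {
        "nodes": nodes,
        "idle_vcpu": nodes * vcpu_per_node - extra_vcpu,
        "idle_storage_tb": nodes * usable_tb_per_node - extra_storage_tb,
    }


# Archives add 40 TB while compute demand stays flat.
plan = hci_nodes_for_growth(extra_storage_tb=40, extra_vcpu=0,
                            usable_tb_per_node=10, vcpu_per_node=64)
print(plan)
```

In this example the storage-only growth requires four nodes and leaves 256 vCPU unused, which is the kind of figure worth comparing against the price of a SAN shelf or a disk-focused node.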

In almost any scenario the network quickly becomes the limiting factor: bandwidth, latency and redundancy. As the cluster grows, internal traffic increases (replication, data rebalancing after failures), so plan the network with headroom.
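A back-of-the-envelope estimate shows why headroom matters. The sketch below computes how long it takes to re-copy a failed node's data over a single link, assuming only part of the bandwidth can be given to rebuild traffic. It deliberately ignores parallel rebuilds across nodes and disk speed limits, so treat it as a coarse estimate. The `usable_share` default is an assumption.

```python
def rebuild_hours(data_tb: float, link_gbps: float,
                  usable_share: float = 0.5) -> float:
    """Hours needed to copy `data_tb` terabytes over one network link.

    `usable_share` is the assumed fraction of the link that rebuild
    traffic may use while production traffic continues.
    """
    if not 0 < usable_share <= 1:
        raise ValueError("usable_share must be in (0, 1]")
    bits_to_move = data_tb * 1e12 * 8            # decimal terabytes to bits
    bits_per_second = link_gbps * 1e9 * usable_share
    return bits_to_move / bits_per_second / 3600


# Hypothetical node holding 20 TB of data.
on_10g = rebuild_hours(data_tb=20, link_gbps=10)
on_25g = rebuild_hours(data_tb=20, link_gbps=25)
print(f"10 GbE: {on_10g:.1f} h, 25 GbE: {on_25g:.1f} h")
```

Under these assumptions the cluster runs with reduced protection for roughly nine hours on 10 GbE and under four hours on 25 GbE, which is a useful input when deciding on link speed.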

Economics and procurement for an agency: what to watch besides price

When comparing HCI and classic setups, people often look only at hardware price. For an agency, total cost of ownership matters more: deployment, support and downtime risks.

A fair estimate must include not only servers and disks, but also licenses (virtualization, management, backup, monitoring), design and migration, testing and training, service and spare parts, and the cost of downtime (service loss, fines, manual workarounds).
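One way to keep such an estimate honest is to put every cost line into the same small model for each option. The sketch below is only an illustration: the class, field names and all amounts are hypothetical placeholders in arbitrary units, and the expected downtime hours are an input you would have to estimate yourself.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    hardware: float
    design_and_migration: float
    training: float
    licenses_per_year: float
    support_per_year: float
    expected_downtime_hours_per_year: float
    downtime_cost_per_hour: float

    def total(self, years: int) -> float:
        """One-time costs plus recurring costs over `years`."""
        one_time = self.hardware + self.design_and_migration + self.training
        downtime = (self.expected_downtime_hours_per_year
                    * self.downtime_cost_per_hour)
        recurring = self.licenses_per_year + self.support_per_year + downtime
        return one_time + recurring * years


# Placeholder figures: option_b has cheaper hardware but more integration work.
option_a = CostModel(hardware=100, design_and_migration=15, training=5,
                     licenses_per_year=10, support_per_year=8,
                     expected_downtime_hours_per_year=4,
                     downtime_cost_per_hour=0.5)
option_b = CostModel(hardware=90, design_and_migration=30, training=10,
                     licenses_per_year=12, support_per_year=10,
                     expected_downtime_hours_per_year=6,
                     downtime_cost_per_hour=0.5)

print(option_a.total(years=3), option_b.total(years=3))
```

With these made-up inputs the option with cheaper hardware ends up more expensive over three years (205 against 180), which is exactly the effect that a hardware-only comparison hides.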

Timelines and effort often change the outcome. If the classic approach requires more approvals, integration and stitching components together, the project takes longer and the team stays busy with repetitive tasks. Sometimes a pricier delivery pays off because commissioning is faster and the migration risk is lower.

For procurement, decide in advance whether to order a single integrated kit or separate items. A single kit simplifies acceptance and accountability; separate items offer flexibility when replacing parts. In all cases, specify who is responsible for compatibility and for a single point of support.

One more point — supply chain transparency and nationwide service: predictable lead times, clear equipment origin, and the ability to recover quickly after failures. If local content requirements matter, check the manufacturer's status and supporting documents. In Kazakhstan this is often critical for procurement with preferences: official local manufacturer status and certificates (ISO 9001, ISO 14001, ISO 45001) reduce some bureaucratic risks.

How to choose: a plan for the IT manager and the customer

Start not with a technology name but with the outcome over 1–3 years: fewer outages, faster launch of new services, simpler updates, and clear responsibility.

Formulate goals in verifiable terms: "migrate 60 VMs to the new platform without stopping critical systems", "ensure N+1 resilience", "reduce time to deploy a standard service to 1 day." Then follow a short sequence:

Describe the load profile with simple metrics: number of VMs, data volume, annual growth, peak hours, share of critical systems.
Record constraints: deployment timelines, staff and skills, isolation requirements, acceptable downtime windows.
Define backup and DR first: where copies are stored, required RPO/RTO, and how recovery works if a site fails.
Run a pilot on 1–2 typical services and evaluate not only speed but also supportability: updates, diagnostics, response times.
Document acceptance criteria and a migration plan with clear rollback steps.

If an integrator is involved, clarify in documents who is responsible for design, migration, training and ongoing support.

Example scenario: a regional agency updating its server room

A regional agency runs several key services: file shares, domain, mail or collaboration, 1–2 agency systems, a reporting database and backups. The team is small: 2–3 admins covering everything, including network and workstations. A few hours of downtime is painful; maintenance windows are usually nights or weekends.

Option A: HCI cluster

A typical start is a 3-node cluster: compute, storage and resilience in one environment. Deployment often fits a short schedule: installation, cluster setup, VM migration.

The schedule advantage is clear when you suddenly need a new service (for example, a citizen request portal or a video conferencing system). You usually allocate resources inside the cluster and spin up another VM without a separate SAN provisioning step.

Option B: servers plus SAN

The classic approach is convenient when roles are well understood: separate servers, separate storage, separate storage networks and policies. But the project usually takes longer: more coordination points, more settings and compatibility tests.

After 12–18 months, when data grows (scans, archives, databases), HCI often receives 1–2 additional nodes, increasing both disk and CPU/RAM. In the classic model you can expand only the SAN without touching compute, but you must plan licenses, ports, shelves and sometimes migrations.

With branches and remote sites HCI is often easier to replicate with small clusters and a unified management approach. In the classic model a branch can become a separate mini-install (servers, SAN, network) and increase support burden.

Common mistakes when choosing HCI or classic

The most common issue is deciding based on general promises ("HCI is simpler" or "SAN is more reliable") rather than on real loads and processes.

Frequent mistakes:

Choosing HCI without understanding the storage profile. If you have heavy databases, logging and rapid growth, be clear about what matters most: IOPS, low latency, capacity or room for growth.
Not validating the network and resilience design. HCI is especially sensitive to a correct inter-node network and sufficient bandwidth.
Mixing critical and test systems without rules. Without resource isolation, test workloads can unexpectedly consume production performance.
Postponing backups. Backup windows, retention and recovery requirements (RPO/RTO) must be clear before procurement.
Comparing only hardware price and forgetting about deployment, training and support.

When working with an integrator, ask for a short load model and migration plan, not a presentation. That reveals risks faster than a high-level discussion.

Short checklist before deciding

Document a few items before choosing. Usually 1–2 meetings suffice and save weeks of approvals and rework.

List the systems that will run in the new environment and set criticality by at least two markers: maximum allowable downtime (RTO) and acceptable data loss (RPO). Then clarify real growth: data volumes, growth rate, presence of archives, scans, video and backups.
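The two markers can be turned into a simple tiering rule so that every system gets a label in the same way. The sketch below is one possible convention, not a standard: the tier names and the 1-hour and 24-hour thresholds are assumptions to be replaced with the agency's own.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    rto_hours: float  # maximum allowable downtime
    rpo_hours: float  # acceptable data loss, expressed as time


def tier(service: Service, strict_hours: float = 1.0,
         relaxed_hours: float = 24.0) -> str:
    """Label a service by its tightest requirement (RTO or RPO)."""
    tightest = min(service.rto_hours, service.rpo_hours)
    if tightest <= strict_hours:
        return "critical"
    if tightest <= relaxed_hours:
        return "important"
    return "deferrable"


# Hypothetical inventory with example targets.
services = [
    Service("domain", rto_hours=1, rpo_hours=1),
    Service("reporting database", rto_hours=8, rpo_hours=4),
    Service("file archive", rto_hours=72, rpo_hours=48),
]
tiers = {s.name: tier(s) for s in services}
print(tiers)
```

The resulting labels give a starting order for resilience design, backup frequency and migration waves.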

Minimum checks:

critical services described and a maintenance window defined;
current capacities and growth forecast (at least approximate) known;
resilience requirements recorded: node failure and site failure;
site readiness: network, power (UPS, feeds), racks and cooling;
backup described with responsible people and a test restore.

Also agree on roles: who administers the platform, who takes 24/7 incidents, who communicates with vendors and oversees part replacements. In public organizations, the absence of such an agreement often delays commissioning.

Next steps: how to prepare the project and avoid delays

To keep the debate factual, start with baseline data: which services must stay online, how many users, acceptable downtime windows. List constraints separately: procurement rules, security, placement, import substitution.

Then compare 2–3 options (HCI, classic, hybrid) using the same criteria table: timelines, risks, support, and how capacity grows.
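A criteria table like this can be kept as a small weighted score so that the discussion is about weights and evidence rather than impressions. In the sketch below every weight and score is a placeholder chosen for illustration. They do not represent a recommendation and should come from your own pilot and constraints.

```python
# Criteria weights (higher means more important); placeholder values.
weights = {"timelines": 3, "risks": 2, "support": 3, "scaling": 2}

# Scores from 1 (poor) to 5 (good) per option; placeholder values.
scores = {
    "HCI":     {"timelines": 4, "risks": 3, "support": 4, "scaling": 3},
    "classic": {"timelines": 2, "risks": 3, "support": 2, "scaling": 5},
    "hybrid":  {"timelines": 3, "risks": 3, "support": 3, "scaling": 4},
}


def weighted_total(option_scores: dict, criteria_weights: dict) -> int:
    """Sum of score * weight over all criteria; fails on a missing criterion."""
    return sum(option_scores[name] * weight
               for name, weight in criteria_weights.items())


totals = {option: weighted_total(s, weights) for option, s in scores.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
print(totals, ranking)
```

Changing the weights (for example, raising "scaling" for an archive-heavy agency) can reorder the ranking, which is the point of writing the table down.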

A practical sequence that usually saves time:

collect a load profile (CPU, RAM, disks, network) and a 2–3 year growth estimate;
agree acceptance criteria and performance requirements;
plan a pilot on representative services with pre-agreed measurements;
create a wave-based migration plan and rollback order;
approve a minimal support arrangement: shifts, monitoring, update windows.
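For the growth estimate in the first step, a compound-growth projection is usually enough at this stage. The sketch below assumes a constant annual growth rate, which is a simplification. The starting volume, rate and installed capacity are example values.

```python
def project_growth(current_tb: float, annual_growth: float,
                   years: int) -> list[float]:
    """Projected volume at year 0, 1, ..., `years` with constant growth."""
    return [current_tb * (1 + annual_growth) ** year
            for year in range(years + 1)]


def first_year_over(projection: list[float], installed_tb: float):
    """First year index where the projection exceeds capacity, else None."""
    for year, volume in enumerate(projection):
        if volume > installed_tb:
            return year
    return None


# Example: 40 TB today, 25% growth per year, 60 TB installed.
projection = project_growth(current_tb=40, annual_growth=0.25, years=3)
expansion_year = first_year_over(projection, installed_tb=60)
print(projection, expansion_year)
```

Here the projection reaches 62.5 TB in year 2, so an expansion (an extra node or a SAN shelf) would need to be budgeted before then.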

If you lack experience with public-sector infrastructure projects, involve an integrator who can run pilots, migrations and documentation for audits. In Kazakhstan this is commonly done together with a local manufacturer and system integrator such as GSE.kz: the company offers its own S200 series servers for data centers and nationwide system integration services with 24/7 technical support.