Feb 01, 2025·8 min

HPE SimpliVity 380 for Regional Sites: Choosing Nodes

We explain how to choose HPE SimpliVity 380 for regional sites: sizing CPU/RAM/disks, pilot testing, VM migration and fault-tolerance checks.

Why HCI in regional sites and which problems it solves

A regional site often follows a simple rule: there are few IT people, and downtime is unacceptable. If a single server or storage device fails, it impacts cash desks, reception, classrooms, or office services. Traveling to the site, bringing spare parts and troubleshooting on location is often not possible or takes too long.

Branches choose HCI primarily because it reduces the number of separate devices and failure points. Instead of a bundle like "servers + SAN + separate controllers" you get a cluster of nodes where compute and storage work together. Management becomes straightforward: add a node, expand the pool, migrate VMs, and schedule firmware updates.

In practice, HPE SimpliVity 380 at regional sites usually handles basic tasks that must "just work" every day: infrastructure VMs (AD/DNS, printing, monitoring, local apps), file services and document exchange, VDI or terminal desktops for small groups, and small databases and accounting services if the load is predictable.

Discuss site constraints in advance. They directly affect configuration and rollout plans. If the link to the main DC is unstable, the branch must survive a disconnect without stopping local services. Power and cooling in regional sites are often weaker, so clarify available power, UPS presence and battery runtime. Don't forget the physical details: rack unit capacity, whether there is a rack at all, who can access the server room and how quickly someone can get in outside working hours.

A simple example: a branch has 2–3 critical services and a dozen supporting VMs. Putting this into an HCI cluster with a clear fault-tolerance scheme reduces the risk of a single point of failure and simplifies support, even if there isn't a skilled engineer on site.

Gathering baseline data: what you need before choosing a configuration

Before selecting HPE SimpliVity 380 for regional sites, collect facts about current loads. Without that you may overspecify "just in case" or, conversely, underestimate peaks and run out of resources in the first month.

Start with a list of virtual machines. For each VM record its role (AD, file server, 1C, VDI, video surveillance, etc.), OS, service owner and criticality. Note allowable downtime windows and dependencies: what must start first, which VMs communicate with each other, and which services rely on the external link.

Then gather actual resource consumption, not "allocated" values. Measure CPU (average and peaks), RAM (working set and peak growth), disk (used capacity, daily/weekly growth, IOPS and latency), and network (peak traffic and common issues). Usually 2–4 weeks of monitoring is enough. If there is seasonality, capture a heavy period.
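The difference between averages and peaks can be made concrete with a small script. The following is a minimal sketch in Python, assuming the monitoring system can export a metric as a plain list of samples. The function name `summarize` and the use of the 95th percentile as a "sustained peak" are illustrative assumptions, not something the platform prescribes.

```python
from statistics import mean


def summarize(samples: list[float]) -> dict[str, float]:
    """Summarize one metric series (e.g. host CPU % sampled every few minutes)."""
    if not samples:
        raise ValueError("no samples to summarize")
    ordered = sorted(samples)
    # Nearest-rank 95th percentile, computed with integer math to avoid
    # floating-point rounding: rank = ceil(0.95 * n).
    rank = (95 * len(ordered) + 99) // 100
    return {
        "avg": mean(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Hypothetical CPU samples in percent; real data would cover 2-4 weeks.
cpu = [20, 30, 25, 75, 35]
print(summarize(cpu))
```

Comparing `avg` with `p95` and `max` for each VM quickly shows which workloads are steady and which ones need peak headroom.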

Estimate growth for 12–36 months: new services, staff growth, video surveillance projects, database growth. Even rough forecasts help provision a reasonable buffer.

Also define availability requirements. Decide in advance what must survive a single-node failure and what must continue working if the site has problems (and how exactly this is ensured).

And don’t forget placement conditions. Check:

access to the rack and physical security
power and redundancy (UPS, feeds)
cooling and room temperature
acceptable noise levels
maintenance constraints (who and how quickly can reach the site)

Example: a branch has 25 VMs, 6 of them critical (including AD, file server, accounting, print and VPN). The accounting system causes disk peaks at month end, and downtime windows are available only at night. This immediately affects the required CPU/RAM buffer and the disk and fault-tolerance requirements.

Basic approach to choosing node count and N+1 buffer

For a regional site, the two main questions are: how many nodes are needed for a pilot, and how many for production. The minimal fault-tolerant HCI setup usually starts with a cluster that can survive the loss of one node without stopping critical VMs. So the basic rule of thumb is N+1: key resources should remain available even if one node fails.

Before calculations, group VMs. This makes it easier to see what must always run and what can degrade during an incident:

steady workloads (AD/DNS, file services, 1C, terminals)
peak workloads (month-end closing, reporting, backups)
latency-sensitive workloads (VDI, voice, some databases)
RAM-heavy workloads (in-memory, large caches, analytics)
utility VMs (monitoring, updates, test VMs)

Then sum CPU/RAM/disk needs for steady and critical VMs, add realistic peak headroom (a typical busy day), and then apply N+1. For example, in a 3-node cluster, after one node fails you have 2 left, and those remaining must be able to host the mandatory workload.
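The N+1 rule described above can be sketched as a simple check. This is an illustration under stated assumptions: all nodes are identical, the per-node figures are usable capacity after hypervisor and platform overhead, and the numbers are hypothetical. Disk is left out on purpose, because usable storage depends on the data protection scheme rather than on a plain sum across nodes.

```python
def n_plus_1_check(
    node_count: int,
    per_node: dict[str, float],
    required: dict[str, float],
) -> dict[str, bool]:
    """Return, per resource, whether the mandatory workload fits with one node down."""
    if node_count < 2:
        raise ValueError("N+1 needs at least two nodes")
    surviving = node_count - 1
    return {
        name: required[name] <= per_node[name] * surviving
        for name in required
    }


# Hypothetical usable capacity of one node.
per_node = {"ram_gb": 256, "cpu_cores": 32}
# Hypothetical mandatory workload: steady and critical VMs at a busy-day peak.
required = {"ram_gb": 480, "cpu_cores": 40}

print(n_plus_1_check(3, per_node, required))
print(n_plus_1_check(2, per_node, required))
```

With these sample numbers the workload fits on a 3-node cluster after one node fails, but not on a 2-node cluster.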

When is CPU frequency more important than core count? Frequency matters for VMs with few vCPUs and strict response-time requirements (terminal services, single-threaded apps). Core count matters where many parallel threads run (multiple application servers, lots of background jobs).

Also note VMs with licensing or binding constraints: socket limits, hypervisor version requirements, MAC bindings, USB keys, or special drivers. These often affect cluster size and migration planning, especially when you want a smooth pilot.

Short example: a branch with 35 VMs and reporting peaks. For a pilot, use the minimum number of nodes needed to validate migration and single-node failure on a test VM group. For production, size the configuration to N+1 for RAM and disks, because branches often hit memory and storage limits rather than pure CPU cores.

CPU sizing: how to avoid mistakes with cores and frequency

CPU selection for HPE SimpliVity 380 at a regional site comes down to how your VMs behave during peaks. Translate current metrics into clear guidelines: average load, peak load during business hours, and specific times when users complain about slowness.

It helps to categorize VMs by behavior: some provide a steady baseline, others create spikes, and some almost always load the CPU. Keep in mind VMs that "live on CPU": reporting jobs, dense VDI sessions, databases and integrations, and anything performing encryption or archiving.

Check the following to avoid mistakes:

are there sustained peaks above 70–80% on hosts or individual VMs during business hours?
are there apps bound to a single thread (then frequency matters more than core count)?
how often do VMs show CPU Ready or similar CPU-wait signs?
how much overhead do the hypervisor and cluster services consume?
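In vSphere environments, CPU Ready is reported as a summation in milliseconds per sampling interval, and it is easier to compare across VMs as a percentage. The sketch below applies the commonly used conversion (ready time divided by the interval length). The function name and the per-vCPU normalization are assumptions for illustration, and other hypervisors expose CPU wait differently.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms) into a percentage of the interval.

    interval_s is the chart sampling interval, e.g. 20 seconds for
    vSphere real-time charts. Dividing by vcpus gives a per-vCPU figure.
    """
    if interval_s <= 0 or vcpus < 1:
        raise ValueError("interval_s must be positive and vcpus at least 1")
    return ready_ms * 100 / (interval_s * 1000) / vcpus


# Hypothetical reading: 1000 ms of ready time in a 20-second sample.
print(cpu_ready_percent(1000, 20))
# Same VM with 4 vCPUs and 4000 ms summed across them.
print(cpu_ready_percent(4000, 20, vcpus=4))
```

Which percentage counts as a problem depends on the workload, so treat any threshold as something to validate against user-visible slowness during the pilot.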

Example: a branch with 35–45 VMs has average host CPU of 25–35%, but peaks up to 75% at 10:00 and 16:00 due to reports and antivirus scans. Adding cores might help, but if the main slow service is single-threaded, a higher-frequency CPU plus job scheduling will be more effective.

Agree on a growth policy in advance. If you expect +20% users in a year, decide whether you will add nodes (scale out) or replace nodes with higher-performance CPUs. Fixing this before procurement avoids a pilot that looks good only on initial load.

RAM sizing: peaks, headroom and limits for heavy VMs

Memory in HCI often runs out before CPU or disks, especially in branches where teams like to keep an extra buffer "just in case." For HPE SimpliVity 380 focus on peaks, not averages. Peaks determine whether the site survives the working day and a single-node failure.

Collect two values per VM: actual active RAM usage and peak spikes over a typical week (workdays, month-end, night jobs). If you only have "allocated" RAM, that is a poor basis, because some memory may be reserved but unused.

Identify RAM-heavy VMs that don’t compress well or tolerate shortages: databases with large caches, apps with fixed memory settings, dense VDI. Decide in advance whether overcommit is acceptable for those VMs or whether they need a 1:1 policy.

Practical rules that usually save the project:

size for N+1: RAM must be sufficient when one node is lost so VMs relocate without swapping
do not overcommit RAM for critical VMs, or strictly limit overcommitment
moderate overcommitment is acceptable for non-critical services with peak monitoring
plan for 12–18 months growth to avoid immediate upgrades after launch
check where reservations and limits are configured, because they skew the picture

Also consider NUMA for large VMs. If a VM has many vCPUs and lots of RAM spread across NUMA nodes, performance can drop unexpectedly. Practical guidance: keep a heavy VM’s RAM within a single NUMA domain of the host and avoid oversizing vCPU.

Example: a branch with 35 VMs has average RAM use of 5–6 GB per VM, but 6 database and reporting VMs spike to 32–48 GB. Sizing by averages would pass on paper; sizing by peaks plus N+1 shows you need more RAM per node or lower VM density, otherwise an outage will cause swapping and noticeable slowdowns.
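The arithmetic behind this example can be sketched as follows. The assumptions are loud ones: a 3-node cluster, every ordinary VM counted at 6 GB, the six heavy VMs counted at their 48 GB peak, and no allowance for hypervisor or platform overhead, which must be added on top. The function name is hypothetical.

```python
def ram_per_node_gb(vm_ram_gb: list[float], node_count: int, growth: float = 0.0) -> float:
    """RAM each node must offer to VMs so they all fit on node_count - 1 nodes."""
    if node_count < 2:
        raise ValueError("N+1 needs at least two nodes")
    return sum(vm_ram_gb) * (1 + growth) / (node_count - 1)


# 35 VMs as in the example: 29 ordinary VMs and 6 heavy ones.
by_average = [6] * 35          # every VM counted at the ~6 GB average
by_peak = [6] * 29 + [48] * 6  # heavy VMs counted at their 48 GB peak

print(ram_per_node_gb(by_average, node_count=3))  # 105.0
print(ram_per_node_gb(by_peak, node_count=3))     # 231.0
```

Under these assumptions, sizing by peaks more than doubles the RAM each node has to provide compared with sizing by averages.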

Storage sizing: capacity and performance without complex formulas

At branches, disks are often the source of surprises: documentation may show enough capacity, but in reality you run into latency issues or sudden data growth. For HPE SimpliVity 380 rely on facts: metrics and a reasonable buffer.

First, map current VMs: average and peak IOPS, read/write profile, average latency during business hours and the size of the working set (how much data is actually "hot"). Look not only at daily averages but also at month-end peaks, backups, and update windows.

Practical sizing steps

Capacity can be split into useful (what you must store now) and buffer. The buffer covers data growth, platform overhead, and the impact of the chosen data protection scheme (for example, how many copies you store and where).
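A rough way to turn this split into a number is shown below. It is a sketch, not a vendor formula: the parameter names, the growth rate, the data reduction ratio and the number of copies are all hypothetical inputs that should come from your own metrics and be validated in the pilot.

```python
def required_raw_tb(
    used_tb: float,
    annual_growth: float,
    years: float,
    reduction_ratio: float,
    copies: int = 1,
    free_buffer: float = 0.2,
) -> float:
    """Estimate raw capacity needed for current data plus growth and buffer."""
    if reduction_ratio <= 0 or copies < 1:
        raise ValueError("reduction_ratio must be positive and copies at least 1")
    # Logical data after compound growth.
    future_logical = used_tb * (1 + annual_growth) ** years
    # Physical footprint: multiply by stored copies, divide by a
    # conservative dedup/compression ratio.
    stored = future_logical * copies / reduction_ratio
    # Keep a share of free space on top as operating buffer.
    return stored * (1 + free_buffer)


# Hypothetical site: 10 TB used, 20% yearly growth, 2-year horizon,
# conservative 1.5:1 reduction, 2 copies, 25% free-space buffer.
print(round(required_raw_tb(10, 0.20, 2, 1.5, copies=2, free_buffer=0.25), 1))  # 24.0
```

Rerunning the estimate with a lower reduction ratio shows how much headroom is needed if compression turns out less effective than expected.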

Keep the checklist short:

which VMs are latency-sensitive (DB, VDI, 1C, terminal farms) and which tolerate more latency (file shares, printing, infra)
where IOPS matter more and where consistent low latency matters more
how large the hot working set is and how frequently it changes
expected data growth over 12–24 months and acceptable buffer
available windows for heavy operations (backup, replication, maintenance)

Clarify responsibility boundaries: the platform covers local fault tolerance and fast snapshots, while classic backup and cross-site replication often have separate rules and windows. In the pilot test how the chosen disk profile behaves during backup and peak load so you don’t address problems only in production.

Pilot: how to plan and what to validate before migration

A pilot reduces risks before the main migration: validate HCI behavior on real network links and loads, and determine the actual resource buffer required. This is vital for regions where on-site staff and connectivity are limited.

What to check in infrastructure

Start by recording network and connectivity. Measure bandwidth, latency and packet loss on primary and backup channels, and actual failover times. If the site relies on a single provider, a local pilot will quickly show where redundancy is required.

Choose a pilot model: a small cluster on site (best reflects reality) or a lab in the main DC (faster and easier access). If the branch regularly suffers latency spikes, an on-site pilot gives honest response-time data.

Prepare a short test matrix and agree on a maintenance window:

fail one node and verify services remain up
simulate a disk problem and observe system reaction
reboot a node during business hours
restore a test VM from backup
validate admin ops: create VM, snapshot, migrate

How to document pilot results

Define success criteria in advance and agree who signs off: IT, business and security. Usually 3–4 measurable points are enough: recovery time, performance stability, ease of daily operations, clarity of reports.

Agree on observability: which metrics to capture (CPU/RAM, disk latency, network, task durations), where to store results, and who is responsible for the final report.

VM migration plan: steps from preparation to rollback

A successful migration to HCI is rarely platform magic; it’s disciplined preparation. Start with inventory: which VMs exist, who owns them, dependencies (DB, file shares, licenses, integrations), and acceptable downtime windows. Before moving, remove old snapshots and ensure guest OSes have no conflicting legacy drivers or agent software. Align VMware Tools (or equivalents) and OS updates to avoid post-migration surprises.

Choose the migration method by risk and downtime, not convenience. In regions two approaches work best: move VMs in groups (e.g., 10–20 per window) or move by service (application with its DB and supporting nodes). If downtime is critical, schedule a short planned outage and mandatory post-move checks.

Practical migration order:

non-critical VMs (test, auxiliary) to rehearse the process
business-critical services with a clear verification plan (user scenarios, printing, 1C/ERP, CRM)
infrastructure roles (AD/DNS, monitoring, backup) only after application VMs are stable

Have a rollback plan before the first move. Define what constitutes failure (for example, 2–3 failed key checks, rising application errors, unacceptable latency), who decides, and how many minutes it takes to revert. Rollback usually uses a prepared return to the old host and a clear restore point (snapshot or backup). Record results in a simple protocol: what was moved, what was tested, and outcomes.
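The failure definition can be written down as explicit thresholds so the decision is not argued during the maintenance window. The following is a minimal sketch; the class, the function and every threshold value are hypothetical and should be replaced with the criteria agreed for the specific site.

```python
from dataclasses import dataclass


@dataclass
class PostMoveResult:
    failed_key_checks: int     # key user-scenario checks that failed
    app_errors_per_min: float  # application error rate after the move
    p95_latency_ms: float      # 95th-percentile response time after the move


def rollback_reasons(
    result: PostMoveResult,
    baseline_errors_per_min: float,
    *,
    max_failed_checks: int = 2,
    max_error_growth: float = 2.0,
    max_latency_ms: float = 200.0,
) -> list[str]:
    """Return the reasons to roll back; an empty list means the move is accepted."""
    reasons = []
    if result.failed_key_checks >= max_failed_checks:
        reasons.append("key checks failed")
    if result.app_errors_per_min > baseline_errors_per_min * max_error_growth:
        reasons.append("application errors rising")
    if result.p95_latency_ms > max_latency_ms:
        reasons.append("latency unacceptable")
    return reasons


after_move = PostMoveResult(failed_key_checks=3, app_errors_per_min=5.0, p95_latency_ms=350.0)
print(rollback_reasons(after_move, baseline_errors_per_min=1.0))
```

The returned list can go straight into the migration protocol next to the name of the person who made the decision.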

Communicate clearly: for each maintenance window, name one responsible person on site and one central IT contact, and send users a brief notice. If an integrator runs the migration, agree on the communication channel and a stop-word for aborting work.

Fault-tolerance testing in the pilot: a simple checklist

In the pilot, it's important not just to confirm VMs boot, but to validate how the site handles failures and maintenance. This matters for branches: there are often no on-site engineers and services must remain available.

Before tests, fix a baseline: which VMs and apps participate, which metrics you capture (service availability, latency, recovery time), and what success looks like. Inform service owners and choose a safe window for simulating failures.

Minimum test set

Start with the most realistic failure: simulate one node failing. Ensure key services (domain, file shares, DB, terminal server) remain available and that users notice only a brief pause, if any.

Next, reboot a node during business hours. This test reveals actual user impact: how apps handle load migration, whether responses dip, and whether sessions break.

Third, restore a VM from backup and verify the application, not just that the VM started. For example, log into the system with a test account, access data, perform a simple operation (create a document, run a transaction, open a report) and confirm data integrity.

RTO/RPO and recording results

Measure RTO and RPO with a stopwatch and system logs: how long recovery took and what recovery point was achieved. After each run, produce a short report: what was done, the results, where issues occurred (network, DNS, app dependencies) and what fixes were made. Then rerun the tests to confirm the fixes.
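Once the timestamps are taken from the logs, the two values are simple differences. The sketch below assumes three recorded moments per test run; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def measure_rto_rpo(
    failure_at: datetime,
    service_restored_at: datetime,
    last_recovery_point_at: datetime,
) -> tuple[timedelta, timedelta]:
    """Return (achieved RTO, achieved RPO) for one test run."""
    if service_restored_at < failure_at or last_recovery_point_at > failure_at:
        raise ValueError("timestamps are out of order")
    rto = service_restored_at - failure_at     # how long the service was down
    rpo = failure_at - last_recovery_point_at  # how much recent data was lost
    return rto, rpo


# Hypothetical test run during a night window.
failure = datetime(2025, 2, 1, 22, 0)
restored = datetime(2025, 2, 1, 22, 18)
last_backup = datetime(2025, 2, 1, 21, 0)

rto, rpo = measure_rto_rpo(failure, restored, last_backup)
print(f"RTO achieved: {rto}, RPO achieved: {rpo}")
```

Comparing the achieved values with the agreed targets for each service gives a clear pass or fail line in the test report.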

Common mistakes in node selection and migration

The most frequent problem when deploying HPE SimpliVity 380 in branches is relying on default assumptions instead of measured figures. The cluster launches but immediately hits memory limits, or storage is slow, or the environment becomes risky when a single node fails.

Errors in sizing CPU, RAM and disks

Many overestimate deduplication and compression as guaranteed capacity savings. Real gains depend on OS types, databases, encryption and how much similar data exists. Use a conservative reduction factor and validate it in the pilot.

Another common mistake is sizing memory by averages. A branch can be calm 90% of the time, but daily peaks (shift close, reports, backups) cause issues. Under a single-node failure the load must fit on the remaining nodes; RAM shortage appears instantly.

Also, mixing all workloads without placement rules means 1–2 heavy VMs (1C, SQL, VDI) consume resources and degrade others.

Quick pre-check list:

real RAM peak and N+1 behavior
which VMs are "heavy" and where they'll be placed
required disk headroom if compression efficiency is lower than expected

Pilot and migration mistakes

Migrations often start without a clear rollback plan. The maintenance window then overruns and tests are left incomplete. The pilot is seen as a failure, though the real issue was poor organization.

Another mistake is running a pilot without success criteria. Without agreed metrics (key app response time, allowed downtime, RPO/RTO), it's hard to make decisions and justify budget afterwards.

Practical example: a pilot moved 10 VMs and "everything seems to work", but peaks and node-failure tests were skipped. After production cutover, the branch experiences slowness at month end, and during an outage memory proves insufficient to restart all VMs. This is preventable by defining test scenarios and scheduling repeated runs.

Short checklist before purchase and start of work

Before ordering HPE SimpliVity 380 for regional sites, document facts and agreements on a single page. This reduces the risk of undersized nodes and of migrations that drag on because of hidden dependencies.

Gather the real VM picture: not only counts but roles—domain controller, file services, 1C, VDI, video surveillance, local DBs. Note what will break on move: IP bindings, licenses, integrations, backup schedules.

Agree availability rules: do you need N+1 (survive a node failure without degrading critical services) or is a recovery model sufficient? This directly impacts node count and CPU/RAM/disk buffers.

Minimum checks before start:

VM inventory is ready: owners, dependencies, criticality and downtime windows agreed
2–4 weeks of metrics: CPU/RAM peaks, used space, data growth, disk workload profile
agreed availability and capacity goals: N+1 rule, headroom for growth and heavy VMs
pilot defined as a project: success criteria, test matrix including fault tests
migration planned in waves: transfer order, communications and a clear rollback plan with a decision point

Finally assign responsibilities: who provides access, who signs test results, who accepts downtime. For a regional site it’s critical to fix the work schedule and an on-site contact to avoid wasting the night on logistics.

Example scenario for a branch: from pilot to roll-out

Imagine a regional office with 35–60 VMs: two critical systems (e.g., 1C/SQL and domain controller/DNS) and the rest are file shares, printing, terminal services and small apps. The goal is simple: the office continues to operate if one HPE SimpliVity 380 node fails, and total downtime stays within the agreed limit.

Build the pilot around a typical cluster and agreed metrics: how long switchovers take, network behavior, and performance during peaks. Decide which VMs can be briefly stopped and which must survive a failure almost unnoticed.

Migrate VMs in waves so you can test not only booting but actual user behavior after each wave:

Wave 1: secondary services (test, auxiliary) and 5–10 VMs to warm up the process
Wave 2: core infrastructure (AD/DNS, file services) but without the heaviest loads
Wave 3: critical applications and databases in a pre-planned window

After each wave, record the facts: migration time, latency changes, application errors, user complaints, CPU/RAM/IOPS deviations. Also verify backups, the restore of a VM, and console access during an incident.

Document pilot results concisely: a "before/after" table, a list of risks (e.g., network bottleneck or insufficient RAM at peak), and required remediation tasks with owners and deadlines.

Then roll out using the same template: identical waves, a unified test suite and a standardized report format. If an integrator runs the pilot, agree in advance who is responsible for infrastructure, applications and day-of-migration support to minimize surprises on other sites.

Next steps after the pilot: operation, support and scaling

After a successful HPE SimpliVity 380 pilot at a regional site, solidify the outcome. The pilot proves the solution works. Ongoing success depends on clear operations and agreed support.

First decide who will maintain the site daily. For some branches central IT is sufficient and on-site staff only perform simple hardware swaps or reboots by checklist. For critical sites (healthcare, finance) a hybrid model is common: centralized control plus a local responsible person.

What to document

Keep documentation lean and usable:

current diagram: nodes, networks, service dependencies
roles and access: who can stop VMs, change policies, or apply updates
short procedures: upgrades, scale-out, part replacement, restoring from backup
emergency actions: "what to do in 5 minutes", "what to do in 1 hour"

Support and spare parts

In regions downtime often depends on spare parts and approvals, not HCI complexity. Pre-agree maintenance windows, escalation order and a minimal set of spare parts to keep closer to sites.

Plan roll-out: choose 1–2 standard configurations, provide short training for local staff and perform quarterly checks (capacity, VM growth, backup health, fault-test results).

If you need help moving the pilot into production, migrating and integrating sites across Kazakhstan, this can be arranged through GSE.kz. The company operates as a systems integrator and, according to information on the site, provides 24/7 technical support and a service network across the country.

FAQ

Why does a branch need HCI if it currently "somehow works" on one or two servers?

HCI is convenient for branches because it reduces the number of separate components and failure points. Instead of separate servers and separate storage, you get a cluster where compute and storage work together, and typical administrative tasks become simple operations like adding a node, expanding a pool, and applying planned updates.

What tasks are typically handled by HPE SimpliVity 380 at a regional site?

Most often these are infrastructure VMs such as AD/DNS, print and monitoring services, file sharing and document exchange, small terminal/VDI scenarios for user groups, and predictably loaded databases and accounting systems. The key requirement is stability: services must run reliably day-to-day and survive incidents without long downtime.

What data should be collected before choosing the node configuration?

Start with an accurate list of VMs and their criticality, then collect real metrics of CPU, RAM, disk and network usage for 2–4 weeks, including peaks. Also record site conditions: power, UPS, cooling, access to the server room and link quality to the main DC, because these constraints directly affect configuration and rollout plans.

How to correctly apply the N+1 rule in an HCI cluster for a branch?

Aim to ensure that after the failure of one node, critical VMs continue to run on the remaining nodes without memory swapping and without severe disk degradation. Practically this means sizing based on actual peaks rather than "allocated" values, and verifying that the mandatory workload fits into the cluster with one node down.

What matters more when choosing CPU: more cores or higher frequency?

If issues are response-time related for single-threaded services with few vCPUs, higher clock speeds usually help more. If you have many parallel workloads and many VMs loading the CPU at once, total core count matters more. Tie the final decision to peak periods when users actually experience slowness, not to daily averages.

Why does RAM usually become the limiting factor and how to provision a buffer?

Memory often becomes the bottleneck first because peaks can be rare but sharp, and when a node fails the load must move to the remaining nodes. Size RAM based on peak actual usage and keep enough headroom so critical VMs don’t go into swap during an outage; otherwise they’ll technically run but users will feel severe slowness.

How to choose the storage setup without complex calculations and avoid performance pitfalls?

Separate capacity and latency concerns: available space doesn’t guarantee performance. Record average and peak IOPS and latency for current VMs, mark services sensitive to response time, and test behavior during backups, updates and month-end processing, because that is when surprises typically surface.

What must be checked in a pilot before the main migration?

A useful pilot must demonstrate behavior on the branch’s real network and under real loads, including incident scenarios. At minimum check one-node failure, a planned node reboot during business hours, restoring a test VM from backup, and basic admin operations, then record measurable success criteria like recovery time and performance stability.

How to plan VM migration to HCI so rollback is clear?

Start with inventory, dependencies and allowable downtime windows, then clean up old snapshots and align guest tools/drivers to avoid post-migration conflicts. Migrate in waves and prepare a rollback plan in advance: define what constitutes failure, who decides, and how quickly you can return the service to the old environment.

What to consider about connectivity, power and site “physical” conditions to avoid derailing the rollout?

If the link to the main DC is unstable, the branch must remain operational locally when the connection drops, and remote administration behavior should be tested. Also confirm there’s real available power, UPS and sufficient cooling, because in regional sites these limits often cause unexpected shutdowns and complex incidents.

What is the short pre-purchase and pre-start checklist?

Collect real facts and agreements on a single page: VM inventory and roles, chosen availability rules (N+1 or recovery model), metrics for 2–4 weeks, pilot scope and success criteria, migration waves and rollback plan. Assign who gives access, who signs test results and who is the on-site contact to avoid wasting time on logistics.