Why containers don't always replace VMs

Containers often promise three things: fast startup, cheaper infrastructure and easier application delivery. In many cases that's true, especially when you need to update services frequently and bring up identical environments quickly.

The problem is that asking "will containers replace VMs?" is usually the wrong question. It's more useful to ask: what exactly do you want to improve and what risks are you willing to accept? Containers change how an application is packaged and delivered, but they don't remove operational requirements: predictability, security, support and recovery.

Practically speaking, containers are strong where speed and repeatability matter. Virtual machines are more convenient where strict isolation, different operating systems or familiar admin processes are required. So the boundary often follows responsibility: what are you willing to trust to the host kernel, and what do you want fully separated?

As a simple guideline:

Containers are suitable for microservices, web apps, background jobs and test environments.
VMs are often preferred for legacy systems, applications that change rarely, workloads that need strong isolation, and scenarios with different OSes.
A common compromise is containers running inside VMs: this makes it easier to combine Kubernetes with familiar security boundaries.
For some workloads "bare metal" still makes sense when predictable performance or licensing terms are critical.

In one data center it’s perfectly fine to run a container cluster for new services while keeping VMs for accounting or medical systems that change slowly. Updates and incidents in one zone don’t spill over into the other, and support stays clear: each zone has its own rules and processes.

Containers vs VMs: the core difference is isolation

A container and a virtual machine try to solve a similar problem—packaging an app so it’s easier to run and move—but they do it differently, and that difference in isolation shapes risks and operational practices.

Containers share the host operating system kernel. There is one OS on the server, and all containers use its kernel; they are isolated by kernel mechanisms: namespaces restrict what a process can see, and cgroups limit the resources it can use. This gives fast startup and lower memory usage, but because a container has no kernel of its own, the boundaries between containers are thinner.

A virtual machine behaves like a separate computer inside the physical server: it has its own guest OS and kernel. A hypervisor separates VMs much more firmly. If one VM fails, it usually doesn't affect neighbors. In the container model, a kernel vulnerability or a misconfiguration can have a wider impact.

Three terms that help avoid confusion:

Image — the template used to start a container (application and dependencies).
Layer — a part of an image. Layers are reused and speed up updates.
Registry — the storage for images, where servers pull them from.
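To see why shared layers speed up updates, here is a toy model in Python. It assumes an image is nothing more than an ordered list of layer digests; the image names and digest strings are made up, and this is not how a real registry client works. A node that already ran version 1.0 only needs the layers it doesn't have yet:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Toy model: an image is an ordered tuple of layer digests."""
    name: str
    layers: tuple


def layers_to_pull(image: Image, cached: set) -> list:
    """Return the layers a node still has to download, in image order."""
    return [digest for digest in image.layers if digest not in cached]


# Hypothetical digests; real registries use sha256 content digests.
base = ("sha256:base-os", "sha256:runtime")
v1 = Image("shop-api:1.0", base + ("sha256:app-v1",))
v2 = Image("shop-api:1.1", base + ("sha256:app-v2",))

node_cache = set(v1.layers)            # the node already ran v1
print(layers_to_pull(v2, node_cache))  # ['sha256:app-v2']
```

In this model only the changed application layer travels over the network, which is why keeping rarely changing parts (base OS, runtime) in the lower layers pays off.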

Containers are especially convenient when an application is split into small components that must be deployed and scaled quickly: web, API, background workers. If load increases, add instances; if it drops, remove them.

Remember: containerizing an app doesn't make it "automatically safer." It makes it portable and convenient for frequent releases, but isolation and security require explicit configuration and discipline.

Where virtual machines remain mandatory

Containers work well when applications share similar requirements and live in the same OS environment. In real life they don't replace VMs when a different level of separation or a different OS is needed.

The most common blocker is different operating systems or kernel versions. If you need Windows and Linux side by side, or must keep an old distribution with a specific kernel, containerization quickly hits platform limits, because ordinary containers share the host's kernel. A VM provides a capsule with its own OS and predictable behavior.

VMs also win when environments must be strictly separated: different departments, different trust levels, contractors, test and production on the same hardware. With VMs it’s easier to draw responsibility boundaries and to reduce the risk that a misconfigured container platform opens access where it shouldn't.

Legacy applications are another pain point. These systems may need drivers, special security or monitoring agents, OS-bound services or old library versions. You can try to "wrap" them in containers, but this often becomes a separate project with questionable return on investment.

Another frequent argument is auditability and host-level control. If policy requires strict tracking of admin actions, ties to specific OS images and a clear update model, isolation via VMs is usually easier to explain and verify during checks.

Typically VMs remain mandatory if:

you need Windows alongside Linux or different kernels/old OS versions;
strong separation is required between teams or trust levels;
the application depends on drivers, agents or OS-specific features;
host-level auditing and control are important;
regulatory requirements are easier to satisfy with a VM model.

Practical example: an organization has a medical system on Windows with a certified security agent, while new Linux services are being developed. It's logical to keep the medical system in VMs and containerize the new components. On rack servers this setup often strikes a balance: compliance without giving up modern approaches.

Security risks when running containers

The main nuance with containers is the shared security base. They share the host kernel, so a kernel vulnerability or misconfigured isolation expands the attack surface for all containers on that host.

A common issue is privilege. "Root in a container" is often seen as a harmless convenience, but with misconfigurations it can become a stepping stone out of the container. Privileged containers and access to devices or the container runtime socket are even more dangerous: they can quickly elevate risk to a full node compromise.

Secrets and configuration are another risk area. Leaks usually happen not because of "hackers," but due to team habits: tokens in environment variables, passwords baked into images, secrets in repositories, or logging sensitive configuration "for debugging." A simple scenario: a key added to a Dockerfile ends up in a public registry and becomes accessible to anyone with read access.
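A minimal sketch of a pre-build check for the Dockerfile scenario above, assuming secrets can be recognized by variable names (the pattern list is a hypothetical starting point, not a standard). It only inspects the first variable on each ENV or ARG line and ignores line continuations, so treat it as a cheap guardrail rather than a replacement for a dedicated secret scanner:

```python
import re

# Hypothetical name patterns; tune them to your own naming conventions.
SECRET_NAME = re.compile(r"(PASSWORD|PASSWD|SECRET|TOKEN|API_KEY|PRIVATE_KEY)", re.IGNORECASE)


def find_baked_secrets(dockerfile: str) -> list:
    """Flag ENV/ARG instructions whose variable name looks like a secret."""
    findings = []
    for lineno, raw in enumerate(dockerfile.splitlines(), start=1):
        parts = raw.strip().split(None, 1)
        if len(parts) < 2 or parts[0].upper() not in ("ENV", "ARG"):
            continue
        # The variable name is everything before the first '=' or whitespace.
        name = re.split(r"[=\s]", parts[1], maxsplit=1)[0]
        if SECRET_NAME.search(name):
            findings.append((lineno, name))
    return findings


dockerfile = """FROM python:3.12-slim
ARG BUILD_VERSION=1.0
ENV DB_PASSWORD=hunter2
ENV API_TOKEN abc123
COPY . /app
"""
print(find_baked_secrets(dockerfile))  # [(3, 'DB_PASSWORD'), (4, 'API_TOKEN')]
```

Removing such a value in a later instruction is not enough: files stay in the earlier layer, and ENV/ARG values remain visible in the image metadata and history.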

In multi-tenant environments there’s also the "noisy neighbor" factor. Even without an attack, a resource-heavy container can degrade other services on the node by exhausting memory or saturating disk I/O. Some classes of vulnerabilities also let one container affect others through the shared node.

Finally, supply chain issues. Unverified images, random base layers and dependencies bring vulnerabilities into the infrastructure. Image substitution in a registry can become a quiet backdoor.

A minimal set of practices that usually gives a big improvement:

avoid privileged containers, and don't run as root by default;
separate workloads across nodes, especially for different teams or systems;
store secrets separately from images and logs;
allow only verified base images and regularly scan dependencies;
patch the kernel and runtime on a schedule as a critical security task.
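The first items of this list can be checked mechanically. Below is a sketch that audits a Pod manifest already loaded as a Python dict; the field names (`privileged`, `runAsNonRoot`, `hostPath`, `resources.limits`) are real Kubernetes fields, but the function itself is a hypothetical illustration. In a real cluster such rules are normally enforced at admission time (for example with Pod Security Admission) rather than by a script:

```python
def audit_pod(pod: dict) -> list:
    """Return human-readable findings for a Pod manifest loaded as a dict."""
    spec = pod.get("spec", {})
    findings = []
    # Pod-level runAsNonRoot applies to containers that don't override it.
    pod_non_root = spec.get("securityContext", {}).get("runAsNonRoot", False)

    for vol in spec.get("volumes", []):
        if "hostPath" in vol:
            findings.append(f"volume {vol.get('name')}: hostPath exposes the node filesystem")

    for c in spec.get("containers", []):
        name = c.get("name", "?")
        sc = c.get("securityContext", {})
        if sc.get("privileged", False):
            findings.append(f"{name}: privileged container")
        if not sc.get("runAsNonRoot", pod_non_root):
            findings.append(f"{name}: may run as root (runAsNonRoot is not true)")
        if "limits" not in c.get("resources", {}):
            findings.append(f"{name}: no resource limits")
    return findings


# Hypothetical pod with a hostPath volume for the runtime socket and a privileged debug container.
pod = {
    "spec": {
        "securityContext": {"runAsNonRoot": True},
        "volumes": [{"name": "sock", "hostPath": {"path": "/var/run/docker.sock"}}],
        "containers": [
            {"name": "api", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
            {"name": "debug", "securityContext": {"privileged": True, "runAsNonRoot": False}},
        ],
    }
}

for finding in audit_pod(pod):
    print(finding)
```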

Operational risks: performance, storage, recovery

Containers often look simpler: run an image, scale, update. In practice "easier" can become "more complex" when many components appear: a cluster, networking, ingress, storage, image registry, access policies, secrets, controllers. The more layers, the more failure points and configuration settings.

Performance: bottlenecks you don't see in tests

Issues usually surface not in CPU but in network and disks. Containers add an abstraction layer and services share the same host resources.

Typical hidden bottlenecks:

network: CNI quirks, MTU, extra hops through services and load balancers;
disk: IOPS and latency on shared storage, especially with many small operations;
resource limits: CPU throttling, and OOM kills when a container exceeds its memory limit or the node runs out of memory;
"noisy neighbor": one container increases latency for others;
node imbalance: the scheduler placed heavy pods together.
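MTU problems are often plain arithmetic. Assuming the CNI uses a VXLAN overlay over an IPv4 underlay (many plugins can, but some use other encapsulations or none at all), each packet carries extra headers, so pod interfaces must use a smaller MTU than the physical network:

```python
# Headers a VXLAN tunnel adds inside the underlay's IP MTU (IPv4 underlay, no VLAN tag).
VXLAN_OVERHEAD = {
    "outer IPv4 header": 20,
    "outer UDP header": 8,
    "VXLAN header": 8,
    "inner Ethernet header": 14,  # the encapsulated frame keeps its own L2 header
}


def pod_mtu(underlay_mtu: int, overhead: dict) -> int:
    """Largest MTU a pod interface can use without fragmenting on the underlay."""
    return underlay_mtu - sum(overhead.values())


print(pod_mtu(1500, VXLAN_OVERHEAD))  # 1450 on a standard network
print(pod_mtu(9000, VXLAN_OVERHEAD))  # 8950 with jumbo frames
```

If pods keep the full 1500 bytes while the overlay adds 50, large packets get fragmented or dropped, which tends to show up as hangs on big responses rather than as clean errors.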

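Another trap is misconfigured requests/limits. If requests are set too low, the scheduler packs too many pods onto a node, and they then compete for resources, causing instability. If CPU limits are too tight, the app is throttled and slows down without clear errors; if memory limits are too tight, it is OOM-killed and restarts.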
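A quick way to spot this trap is to compare the sums on one node with its allocatable capacity. The pods and node size below are hypothetical; CPU is in millicores and memory in MiB, the units Kubernetes commonly uses:

```python
from dataclasses import dataclass


@dataclass
class PodResources:
    name: str
    cpu_request_m: int   # millicores
    cpu_limit_m: int
    mem_request_mi: int  # MiB
    mem_limit_mi: int


def node_report(pods, alloc_cpu_m: int, alloc_mem_mi: int) -> dict:
    """Summed requests and limits as a fraction of the node's allocatable capacity."""
    return {
        "cpu_requested": sum(p.cpu_request_m for p in pods) / alloc_cpu_m,
        "cpu_limits": sum(p.cpu_limit_m for p in pods) / alloc_cpu_m,
        "mem_requested": sum(p.mem_request_mi for p in pods) / alloc_mem_mi,
        "mem_limits": sum(p.mem_limit_mi for p in pods) / alloc_mem_mi,
    }


# Hypothetical node (4 CPUs, 8 GiB allocatable) with three pods whose requests are set too low.
pods = [PodResources(f"worker-{i}", 100, 2000, 128, 4096) for i in range(3)]
report = node_report(pods, alloc_cpu_m=4000, alloc_mem_mi=8192)
for key, ratio in report.items():
    # A ratio above 1.0 for limits means the node is overcommitted.
    print(f"{key}: {ratio:.1%}")
```

Here requests cover only a few percent of the node, so the scheduler sees plenty of room and keeps adding pods, while limits add up to 150% of both CPU and memory. CPU overcommit turns into throttling; memory overcommit is riskier, because when pods actually use what their limits allow, the node runs out of memory and something gets OOM-killed or evicted.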

Data, backups and recovery: where mistakes are expensive

For stateful services (databases, queues, file storage) the container model requires more discipline. VMs are often simpler: one system image, one disk, straightforward snapshots. In Kubernetes, recovery usually depends on a combination of manifests, volumes and external services, so there are more places to make mistakes.

Before running stateful workloads check:

where the data lives: locally on the node or on network storage;
how backups are done: at the app level, volume level or platform level;
how long recovery takes and how much data may be lost: RTO and RPO should be verified with tests;
what happens if a node fails: will the pod move and will it see its data?
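RPO is the amount of data you can afford to lose (the time since the last usable backup), and RTO is how long restoration may take. A sketch of checking both, with a hypothetical nightly schedule and targets; the restore duration has to come from an actual test restore, not an estimate:

```python
from datetime import datetime, timedelta


def data_loss_window(backups, failure: datetime) -> timedelta:
    """Time between the last backup taken before the failure and the failure itself."""
    earlier = [b for b in backups if b <= failure]
    if not earlier:
        raise ValueError("no backup precedes the failure")
    return failure - max(earlier)


def check_targets(backups, failure, restore_took, rpo, rto) -> dict:
    """Compare a simulated failure and a measured restore with RPO/RTO targets."""
    loss = data_loss_window(backups, failure)
    return {"data_loss": loss, "rpo_ok": loss <= rpo, "rto_ok": restore_took <= rto}


# Hypothetical nightly backups at 02:00 and a failure late in the working day.
backups = [datetime(2024, 3, 1, 2, 0), datetime(2024, 3, 2, 2, 0)]
failure = datetime(2024, 3, 2, 18, 30)
result = check_targets(
    backups,
    failure,
    restore_took=timedelta(hours=1, minutes=20),  # measured in a test restore
    rpo=timedelta(hours=4),
    rto=timedelta(hours=2),
)
print(result)
```

Here the restore is fast enough, but a nightly backup cannot meet a 4-hour RPO no matter how quickly data comes back; that gap is easy to miss until the numbers are written down.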

Example: a team moves an internal service with a DB into a cluster. Tests pass, but in production load spikes increase latency due to the disk subsystem and CPU throttling. On a VM the problem would be visible on one machine; in containers it’s harder to localize: symptoms are smeared across layers. Plan resources and capacity (IOPS, network, CPU/RAM headroom) in advance.

How to mix approaches without unnecessary complexity

The simplest and often safest approach is to treat the virtual machine as the trust boundary and run containers inside it. This preserves the hypervisor’s familiar isolation while giving the convenience of containers.

Next, decide whether to run the container cluster on dedicated nodes or to co-locate its nodes with other VMs on shared hypervisors. Dedicated nodes are easier to secure and explain to auditors but require capacity discipline. Shared hypervisors save hardware but increase the risk of workload interference and complicate incident analysis.

To avoid confusion, define one clear template and stick to it: the same structure for prod, test and sandboxes, but with different limits, access and data.

A practical separation by workload type usually looks like this:

stateless services (API, frontend, workers) — in containers, with fast scaling;
stateful systems (databases, critical queues, monoliths with heavy disks) — often in VMs, with clear backup and recovery processes;
integrations requiring legacy drivers or licenses — in VMs to avoid breaking support;
scheduled batch jobs — where resource control and rollback are simpler.

To avoid proliferating platforms, agree in advance on a minimal set of standards: one orchestrator, one logging approach, one secrets strategy and one base OS image.

Step-by-step plan to move to a hybrid model

A hybrid approach works best when you acknowledge technology boundaries up front. The goal is not to urgently "move everything to Kubernetes," but to reduce risk and keep support manageable.

Start with a clear inventory: what applications you have, what they depend on and what would be painful to lose. Surprises are more often in licenses, drivers, network rules and storage than in code.

Inventory. Record dependencies: OS version, kernel requirements, ports and network links, storage type, backups, licenses, integrations (for example AD, 1C, payment gateways).
Classify by criticality and risk. Mark services where downtime is acceptable and those where any failure is an incident.
Define isolation boundaries. Keep in VMs anything that needs its own OS, a clear update model or strong isolation. Put stateless services and easy-to-roll-back parts into containers.
Pilot 1–2 services. Choose something that brings quick value: an API, a task processor, an internal portal. Set metrics: release time, recovery time, incident frequency, on-call load.
Standards and support playbook. Base images, CI/CD rules, secrets management, and simple answers to "who’s on call", "where to look for logs", "how to roll back".
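The inventory and classification steps can feed pilot selection directly. A sketch, assuming each application is recorded with a few hypothetical fields; it keeps only low-criticality, stateless Linux services without special drivers and prefers those released most often, since they give the quickest feedback:

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    os: str                 # e.g. "linux", "windows"
    stateful: bool          # owns a database, queue or file store
    special_drivers: bool   # drivers, certified agents, licensed components
    criticality: str        # "low", "medium", "high"
    releases_per_month: int


def pilot_shortlist(inventory, size: int = 2) -> list:
    """Pick low-risk container candidates that change most often."""
    candidates = [
        a for a in inventory
        if a.os == "linux" and not a.stateful and not a.special_drivers and a.criticality == "low"
    ]
    candidates.sort(key=lambda a: a.releases_per_month, reverse=True)
    return [a.name for a in candidates[:size]]


# Hypothetical inventory records.
inventory = [
    App("core-banking", "linux", True, True, "high", 1),
    App("medical-records", "windows", True, True, "high", 0),
    App("internal-portal", "linux", False, False, "low", 6),
    App("report-worker", "linux", False, False, "low", 10),
    App("public-api", "linux", False, False, "high", 12),
    App("hr-archive", "linux", False, False, "low", 1),
]
print(pilot_shortlist(inventory))  # ['report-worker', 'internal-portal']
```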

Train through the pilot: a short hands-on exercise on a real incident reproduced in the test environment teaches more than a general course.

Monitoring, updates and backups: keep control

To prevent a VM/container hybrid from sprawling out of control, follow a simple rule: unify core signals across everything. Otherwise the team will live in two different worlds and waste time figuring out where something broke.

Unified metrics and clear SLOs

Use common top-level metrics: service availability, latency, errors, load and capacity. Details differ: for VMs watch CPU steal, memory, disk queues and hypervisor health. For containers watch restarts, CPU/RAM limits, throttling, node saturation and scheduler behavior.

Agree on a small set of dashboards so the on-call person doesn’t jump between unrelated systems:

service: RPS, errors, latency and dependencies;
platform: Kubernetes nodes and VM hosts, disks, network;
data: databases, queues, storage, free space and IOPS;
changes: recent deploys, patches, restarts.

Logs, tracing and alerts without noise

Minimum standard for incident postmortems: centralized logs correlated by request-id, application metrics and basic tracing for key APIs. For containers it’s critical not to lose logs when a pod is recreated. For VMs remember log rotation and consistent formats.
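A minimal sketch of request-id correlation using only the Python standard library: a `contextvars.ContextVar` holds the id of the current request and a `logging.Filter` stamps it onto every record. The logger name, format and ids are assumptions. In containers, writing logs to stdout (a `StringIO` stands in for it here) lets a node-level log collector ship them to central storage, so they survive the pod being recreated:

```python
import contextvars
import io
import logging
import uuid

# Holds the id of the request currently being handled ("-" outside any request).
request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Stamp the current request id onto every record so logs can be correlated."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id.get()
        return True


def make_logger(stream) -> logging.Logger:
    """Configure the logger once at startup; calling it twice would add a second handler."""
    logger = logging.getLogger("svc")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(logging.Formatter("%(levelname)s request_id=%(request_id)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, incoming_id=None) -> None:
    # Reuse the caller's id (e.g. taken from an HTTP header) or mint a new one.
    token = request_id.set(incoming_id or uuid.uuid4().hex)
    try:
        logger.info("order received")
        logger.info("payment authorized")
    finally:
        request_id.reset(token)


buf = io.StringIO()  # stands in for stdout
log = make_logger(buf)
handle_request(log, "abc123")
print(buf.getvalue(), end="")
# INFO request_id=abc123 order received
# INFO request_id=abc123 payment authorized
```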

Keep alerts to "what matters": errors, latency growth, outages, low disk space, node failures, spikes in restarts. Noisy events that don’t affect users should not become night calls.
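To keep individual restarts from turning into night calls, page only when they cluster. A sketch with a hypothetical threshold (three restarts within ten minutes); in practice the same idea is usually expressed as an alert rule in the monitoring system rather than in application code:

```python
from collections import deque


class RestartSpikeAlert:
    """Page when restarts cluster in a sliding window, not for every single restart."""

    def __init__(self, threshold: int, window_s: float):
        self.threshold = threshold
        self.window_s = window_s
        self.events = deque()  # restart timestamps inside the current window
        self.firing = False

    def observe(self, ts: float) -> bool:
        """Record a restart at ts (seconds); return True only when a new page should go out."""
        self.events.append(ts)
        # Drop restarts that fell out of the window.
        while self.events and self.events[0] <= ts - self.window_s:
            self.events.popleft()
        spike = len(self.events) >= self.threshold
        should_page = spike and not self.firing  # one page per spike, not per restart
        self.firing = spike
        return should_page


# Hypothetical rule: 3 restarts within 10 minutes.
alert = RestartSpikeAlert(threshold=3, window_s=600)
times = [0, 1200, 2400, 3000, 3100, 3200, 3250]
pages = [alert.observe(t) for t in times]
print(pages)  # [False, False, False, False, False, True, False]
```

Isolated restarts still show up on dashboards, but only the cluster at 3000 to 3250 seconds produces a page, and only once.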

Patches and backups

Split patch management into three layers: hypervisors and hosts, container base images, and application dependencies. Each layer needs a maintenance window and a clear rollback (VM snapshot, previous image, previous manifest).

Backups in a hybrid setup are more than data. Usually you need to copy:

data (databases, volumes);
configuration (manifests, Helm values, secrets in a safe form);
infrastructure parameters (VM templates, network settings).

Common mistakes and traps when choosing containers

Many problems come not from the technology itself but from starting without rules.

The first trap is containerizing everything. Teams move apps without criteria or a rollback plan. Critical systems end up in a new environment and it's hard to go back: the environment changed, dependencies moved, documentation is missing.

Second mistake — no resource limits. Without CPU and RAM limits one service can consume a whole node and impact others. This usually appears during traffic spikes.

Third trap — mixing production and test on the same node "to save money." Even if it works initially, you get unpredictable resource contention and risk a test build affecting production.

What commonly breaks support:

images built on different base OSes with no standard;
no vulnerability scanning of images and dependencies;
manual fixes applied on servers and lost reproducibility;
secrets ending up in env vars, logs or inside images;
making the cluster complex too early.

Kubernetes is not always needed

Another common mistake is building a complex Kubernetes cluster when a simple orchestrator or a couple of VMs with clear deployment would suffice. The more components, the more updates, certificates, network policies and failure points.

Practical example: a branch office runs several internal services on one server. If experience is limited, it’s smarter to start with containers inside a dedicated VM, with resource limits and standardized images. Later, if scaling and resilience are required, move to a cluster.

Short checklist: what to choose in your case

Start from constraints, not trends: OS, isolation, data and team readiness.

Need different OSes or kernel versions (e.g., Linux and Windows)? Choose VMs.
Need strong isolation for security or regulatory reasons? Keep VMs as the outer boundary and use containers inside them.
Is the app stateless, scales easily and tolerates restarts (API, frontend, workers)? Good candidate for containers.
Is there critical state (DBs, queues, file stores)? Decide carefully: either VMs or containers with solid storage, snapshots and tested backups.
Is the operations team ready (image standards, monitoring, update procedures, access control)? Then containers won’t turn into chaos.
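The checklist can be written as an ordered decision function. The ordering here (hard OS constraints first, then isolation, then team readiness, then state) is one reasonable reading of the list rather than a rule, and the field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical fields mirroring the checklist questions.
    needs_other_os_or_kernel: bool
    strong_isolation_required: bool
    stateless: bool
    critical_state: bool
    ops_team_ready: bool


def recommend(w: Workload) -> str:
    """Walk the checklist in order: hard constraints first, team readiness before state."""
    if w.needs_other_os_or_kernel:
        return "VM"
    if w.strong_isolation_required:
        return "containers inside a VM boundary"
    if not w.ops_team_ready:
        return "VM or hybrid until standards are in place"
    if w.critical_state:
        return "VM, or containers with proven storage and tested backups"
    if w.stateless:
        return "containers"
    return "review case by case"


api = Workload(False, False, True, False, True)
legacy = Workload(True, False, False, True, True)
print(recommend(api), "|", recommend(legacy))  # containers | VM
```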

Two often-forgotten questions: how do you roll back after a failure, and how do you recover after a disaster? If the answer is "we’ll figure it out on the way," it’s safer to start with VMs or a hybrid.

Example: a hybrid without overloading the team

Imagine a government agency or a bank: a payment system and customer database (highest trust and strict regulations) sit next to an employee portal, reporting, external integrations and internal analytics. The usual approach is simple: leave critical parts on VMs and build containerized services around them that are easier to change and scale.

A practical separation looks like this. Critical subsystems (core banking, licensed apps, audit-heavy components) run in separate VMs, sometimes on dedicated clusters and in isolated networks. Containers are used for frequently updated parts that should not have direct data access without controlled gateways: APIs, queues, web fronts, reporting services, background jobs.

To keep support simple, fix rules in advance: a single change process (even if part goes to VMs and part to Kubernetes), unified monitoring and alerts with owners, clear SLAs and a minimal number of environment types (for example one VM template and one container service standard).

Plan capacity with headroom for CPU and RAM: in a hybrid setup load spikes often come from releases, reports or month-end processing. For storage and network decide in advance where you need fast disks and where volume plus reliable backups is more important.
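A sketch of sizing with headroom: the cluster is sized so that peak load (for example month-end processing) uses at most a chosen share of capacity, plus a spare node so that losing one node does not consume that headroom. The 30% headroom, node size and peaks are illustrative assumptions, and storage and network still need their own estimate:

```python
import math


def nodes_needed(peak: float, per_node: float, headroom: float, spare_nodes: int = 1) -> int:
    """Nodes so that peak load fills at most (1 - headroom) of capacity, plus spares for node failure."""
    usable_per_node = per_node * (1 - headroom)
    return math.ceil(peak / usable_per_node) + spare_nodes


# Hypothetical month-end peaks and node size (64 vCPU, 512 GiB RAM per node).
cpu_nodes = nodes_needed(peak=180, per_node=64, headroom=0.3)   # 180 / 44.8 -> 5, plus 1 spare
ram_nodes = nodes_needed(peak=900, per_node=512, headroom=0.3)  # 900 / 358.4 -> 3, plus 1 spare
print(cpu_nodes, ram_nodes, max(cpu_nodes, ram_nodes))  # 6 4 6
```

The larger of the two results wins: in this example CPU, not RAM, determines the node count.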

If you are choosing server hardware for virtualization and container nodes, consider not only specs but also support and integration. For example, GSE.kz (gse.kz), as a vendor and integrator, can both supply S200 Series rack servers and provide infrastructure and support services, which simplifies the division of responsibility in a hybrid model.

After deployment measure outcomes, not the number of containers: release time, incidents per month, recovery time (RTO) and how many manual actions remain.
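Measuring outcomes can stay simple: collect the same few numbers before and after the pilot and compare them. A sketch with made-up figures; release time is assumed to be commit-to-production hours, and recovery time is minutes to restore service per incident:

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Period:
    release_hours: list     # commit-to-production time per release, hours
    recovery_minutes: list  # time to restore service per incident, minutes
    months: float           # length of the observation period
    manual_steps: int       # manual actions left in a typical release


def summarize(p: Period) -> dict:
    return {
        "median_release_h": median(p.release_hours),
        "incidents_per_month": len(p.recovery_minutes) / p.months,
        "mean_recovery_min": mean(p.recovery_minutes) if p.recovery_minutes else 0.0,
        "manual_steps": p.manual_steps,
    }


# Hypothetical quarter before the pilot and quarter after it.
before = Period([48, 72, 30], [90, 240], months=3, manual_steps=12)
after = Period([4, 6, 3, 5], [35], months=3, manual_steps=3)

b, a = summarize(before), summarize(after)
for key in b:
    print(f"{key}: {b[key]:.2f} -> {a[key]:.2f}")
```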

Next steps are usually straightforward: a short app audit, a pilot on 1–2 noncritical services, then formalizing standards and selecting a platform that fits your security and procurement model.