Leaf-spine Data Center Network: When to Move On From a Two-Tier Design
Leaf-spine in data centers: how to tell when it’s time to move away from a two-tier design, and what to require of switches and optics (10G, 25G, 100G) so the network can grow without rework.

Why change the data center network topology at all
When a data center is small, the network usually runs without surprises. You have a few access switches, a pair of aggregation switches — everything is familiar. Problems start not on day one, but as racks, services and speed requirements are added.
Usually the first signs are not a single "lack of bandwidth" but a set of small, recurring issues: there are not enough ports where you need them, uplinks and patch cords multiply, settings become harder to keep consistent, and every expansion turns into a project with downtime risk. Other symptoms appear at the same time: latency spikes from overloaded uplinks, rising east-west traffic (between servers), and oversubscription whose effects are harder to predict.
At some point you hit limits not only in hardware but in the network "geometry." Then the question arises: continue scaling a classic two-tier data center or move to leaf-spine, where scaling and traffic paths work differently.
People usually compare two options:
- Two-tier (access + aggregation), where growth is achieved by adding uplinks or beefing up the aggregation layer.
- Leaf-spine, where racks connect to leaf switches and each leaf connects to all spines by the same rules.
To decide without guesswork, you need to know when a topology change is truly justified, which metrics to check before buying hardware, and what to require from switches and optics (10G, 25G, 100G and above) so the network can grow without rework.
Two-tier and leaf-spine in simple terms
A two-tier data center typically looks like this: Top-of-Rack (ToR) or End-of-Rack (EoR) access switches in racks, and above them a pair of powerful aggregation or core switches. Servers connect to access, access uplinks go to aggregation, and traffic either goes to other racks or out to users and services.
A leaf-spine network works differently. Leaf switches sit next to servers like ToR, but each leaf connects not to a single aggregation layer but to all spines. The spines form a common "fabric": between any two racks there are multiple equivalent paths and routing spreads flows across them.
The difference is most visible for east-west traffic (between servers). In a two-tier design rack-to-rack exchange often goes through aggregation, which becomes a bottleneck as load grows. In leaf-spine the common pattern is leaf -> spine -> leaf, so paths usually have the same number of hops.
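The uniform paths described above can be illustrated with a toy model. This is a minimal sketch that assumes every leaf has one link to every spine; the switch names and the `paths_between` helper are hypothetical, not taken from any real tool.

```python
def paths_between(src_leaf, dst_leaf, spines):
    """List the paths between two leaves in a fabric where every leaf
    connects to every spine (the assumption of this sketch)."""
    if src_leaf == dst_leaf:
        # Traffic inside one rack is switched locally on the leaf.
        return [(src_leaf,)]
    # One leaf -> spine -> leaf path per spine.
    return [(src_leaf, spine, dst_leaf) for spine in spines]

spines = ["spine1", "spine2", "spine3", "spine4"]  # hypothetical names
paths = paths_between("leaf1", "leaf7", spines)
hop_counts = {len(path) - 1 for path in paths}

print(len(paths))   # 4
print(hop_counts)   # {2}
```

With four spines there are four candidate paths between any pair of racks, and all of them are two hops long, which is why latency between racks tends to be uniform.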
For north-south (to users, internet, office) both designs can work well, but leaf-spine is usually easier to expand without reworking the core.
In short:
- Latency: leaf-spine tends to be more predictable because paths are more uniform.
- Scaling: two-tier can run into the port and throughput limits of the aggregation pair; leaf-spine lets you add leaves or spines as you grow.
- Cabling: leaf-spine needs more inter-rack links, because every leaf connects to every spine, but fewer one-off links to the core.
- Oversubscription: leaf-spine makes it easier to keep predictable ratios, but you must design those ratios up front.
Example: for a virtualization cluster or database that communicates heavily across racks, switching to leaf-spine often reduces queues at the top layer and stabilizes response times.
When a two-tier design remains a good choice
The two-tier model (access + aggregation) looks boring — and that’s a strength. If your data center is small and growth is predictable, you get a working design with fewer devices, fewer cables and simpler configuration. In that case leaf-spine can add complexity without much benefit.
The most common scenario where two-tier is justified: predictable loads and a small number of racks. When you have roughly 3–10 racks and servers mostly talk to the internet, users and central services, simplicity and cost matter more than maximum scale.
When traffic is mainly north-south
If services are classic (mail, file shares, ERP, domain services, some VDI), much of the traffic flows "out" from servers rather than between them. With this profile a two-tier network usually copes well, provided the uplinks from access to aggregation are not an obvious bottleneck.
Signs you can keep the current design:
- few racks and slow, infrequent additions
- no dense clusters with heavy east-west traffic
- simple failover scenarios and easy diagnostics are priorities
- a limited budget that also has to cover server or storage upgrades
- a small team without capacity for complicated changes
Example: a regional organization runs 6 racks with virtualization, telephony, ERP and backups. Peaks happen in the morning and month-ends, but inter-server exchange is moderate. Investing in reliable access switches, decent uplinks and spare ports is often better than rebuilding the topology.
If you plan procurement and deployment holistically (servers, racks, network, support), a systems integrator like GSE.kz usually starts with a traffic profile and growth plan, not the choice of a "trendy" topology.
At what size and load does leaf-spine make sense
Moving to leaf-spine is not triggered by a fixed rack count but by the moment when predictability and uniformity become more important than the lowest per-port cost. If most traffic is internal (east-west) rather than external (north-south), a two-tier design is more likely to hit bottlenecks and show uneven latency.
A strong signal to move is rising east-west traffic. It grows gradually: more virtualization and VM migrations, clustered platforms, network replication and backups, distributed databases and analytics. In a two-tier network some racks end up "closer" to resources and others "further", causing variable latency and throughput depending on direction.
Another sign is frequent expansions. If you add racks every few months, you want to plug new ToRs like building blocks: add a leaf, bring up uplinks and get the same characteristics as neighbors without reworking the core.
Leaf-spine is often chosen when you need:
- more even latency and bandwidth between any racks
- horizontal scaling without major redesigns
- a clear path to 25/100/200/400G as you grow
- oversubscription that is set per leaf by design rather than left to chance
Simple example: you had 6 racks with classic services and occasional nightly backups. Then you add a virtualization cluster, data replication and daily backups to a storage pool. Rack-to-rack traffic becomes constant, and leaf-spine usually brings visible benefits: fewer surprises and easier capacity planning.
How to decide: step-by-step assessment for your data center
The move to leaf-spine boils down to numbers: how many racks, server port speeds, how much internal traffic, and headroom needed for growth.
A checklist that helps avoid mistakes:
- Record current and target scale: number of racks and servers, ports needed per rack, access speeds (10G, 25G) and uplink speeds (40G, 100G). Count actual usage today and planned usage one year out.
- Measure traffic profile: is it mostly east-west or north-south? Look at peaks by time and service (backups at night, month-end reports).
- Choose target oversubscription separately for the rack-facing side (servers to leaf) and for the fabric (leaf to spine). The more internal exchange you have, the closer to 1:1 you may want at the rack level and the more predictable the inter-rack segment should be.
- Project architecture 2–3 years forward: how many uplinks per leaf, how many spines, and spare port capacity. If calculations show you'll hit port or speed limits in a year, that's a clear sign the current design needs to change.
- Check physical constraints: space for switches and patch panels, power and cooling, and actual cable routes. Cable density and tray space often become blockers.
Small example: currently 8 racks with 20 servers each using 10G, traffic mainly north-south — a two-tier design is fine. But if you plan to grow to 24 racks, move servers to 25G and add clusters, east-west traffic will rise and you should calculate oversubscription and port capacity: it may be easier to design a fabric now than to patch uplinks and rebuild the core later.
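The oversubscription part of that calculation can be sketched as follows. The 20 servers per rack and the 10G and 25G port speeds come from the example above; the uplink counts and speeds are assumptions chosen only for illustration, and `oversubscription` is a hypothetical helper.

```python
def oversubscription(servers, server_gbps, uplinks, uplink_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth on one leaf."""
    downlink = servers * server_gbps
    uplink = uplinks * uplink_gbps
    return downlink / uplink

# Uplink counts and speeds below are assumed, not taken from the article.
today = oversubscription(servers=20, server_gbps=10, uplinks=2, uplink_gbps=40)
planned = oversubscription(servers=20, server_gbps=25, uplinks=2, uplink_gbps=40)
upgraded = oversubscription(servers=20, server_gbps=25, uplinks=4, uplink_gbps=100)

print(f"{today:.2f}:1, {planned:.2f}:1, {upgraded:.2f}:1")
# 2.50:1, 6.25:1, 1.25:1
```

Under these assumptions, moving servers to 25G without touching the uplinks raises the ratio from 2.5:1 to 6.25:1, which is the kind of jump worth catching before hardware is ordered.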
Requirements for switches: leaf and spine
Pick switches based on tasks, not on the "fashionable" speed. First determine ports per rack, predominant traffic type, and growth tempo.
A common rule: servers at 10/25G, uplinks at 40/100G and up. If you have lots of virtualization and distributed storage, 25G to servers gives more headroom than 10G, and 100G uplinks reduce the chance of hitting uplink limits within a year. Look at total switching capacity and routing/forwarding table headroom, not just individual port speeds.
Features to include up front:
- ECMP to spread flows across multiple links
- QoS with clear queues and prioritization (for storage and voice)
- Telemetry and stats export (sFlow/NetFlow) to detect hotspots
- LACP; MLAG only if you intentionally keep L2 aggregation and want L2 redundancy
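The idea behind ECMP in the list above can be shown with a small sketch: a hash of the flow's 5-tuple selects the uplink, so packets of one flow stay on one path while different flows spread out. Real switches use their own hardware hash functions; SHA-256 is used here only as a stand-in, and the addresses, port numbers and `pick_uplink` helper are hypothetical.

```python
import hashlib
from collections import Counter

def pick_uplink(src_ip, dst_ip, proto, src_port, dst_port, uplinks):
    """Choose an uplink from a flow's 5-tuple so one flow stays on one path."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()  # stand-in for a hardware hash
    index = int.from_bytes(digest[:4], "big") % len(uplinks)
    return uplinks[index]

uplinks = ["spine1", "spine2", "spine3", "spine4"]  # hypothetical names

# 1000 flows that differ only in source port (made-up addresses).
counts = Counter(
    pick_uplink("10.0.1.10", "10.0.2.20", "tcp", 40000 + i, 443, uplinks)
    for i in range(1000)
)
print(counts)
```

Because the choice is made per flow, a single very large flow still lands on one uplink, which is one reason to pair ECMP with telemetry that shows per-link load.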
Also verify buffer sizes and behavior during microbursts. For mixed flows (backups, replication, VM migration) small buffers can cause short packet loss and odd slowdowns even when average utilization is low.
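Why low average utilization does not rule out loss can be seen in a toy queue model. This is a simplified sketch with a tail-drop queue and time measured in abstract ticks; the burst size, drain rate and buffer sizes are made-up numbers, not properties of any particular switch.

```python
def simulate_queue(arrivals, drain_per_tick, buffer_size):
    """Tail-drop queue model: returns (packets dropped, packets sent)."""
    queue = dropped = sent = 0
    for arriving in arrivals:
        # Packets that do not fit into the remaining buffer are dropped.
        accepted = min(arriving, buffer_size - queue)
        dropped += arriving - accepted
        queue += accepted
        # The port then transmits up to its per-tick capacity.
        out = min(queue, drain_per_tick)
        queue -= out
        sent += out
    return dropped, sent

# A burst of 100 packets every 10 ticks; the port can send 20 per tick.
arrivals = ([100] + [0] * 9) * 10
average_load = sum(arrivals) / (len(arrivals) * 20)

print(average_load)                                                # 0.5
print(simulate_queue(arrivals, drain_per_tick=20, buffer_size=50))   # (500, 500)
print(simulate_queue(arrivals, drain_per_tick=20, buffer_size=100))  # (0, 1000)
```

In this model the port is only 50% utilized on average, yet the shallow buffer loses half of every burst, while the deeper buffer absorbs it completely.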
On the L2/L3 boundary, L3 up to the rack usually wins: fewer broadcasts, easier scaling and fewer complex incidents. If you plan overlays, include VXLAN/EVPN on the leaf (in the VTEP role) and treat the spines mainly as an IP fabric.
Example: when racks are upgraded from 10G to 25G and backups hit several server groups at once, a leaf with ECMP, sensible QoS and telemetry makes overloads visible and easier to mitigate. In systems-integration projects (including at GSE.kz) these requirements should be fixed in the design, so that hardware does not have to be replaced later because a feature was missed.
Optics and cabling: what to plan from the start
In leaf-spine, cables and transceivers become part of the architecture. A common mistake: the network supports 25G/100G but real speed is limited by optics, link lengths or lack of fiber.
DAC, AOC and optics: practical choices
In short: DAC and AOC are convenient for short links; traditional optics are needed for distance and flexibility.
- Use DAC (copper) for short patch connections inside a rack or between adjacent racks when price and simplicity matter.
- AOC (active optical cable) is convenient as a "plug-and-forget" solution for tens of meters, but is less flexible if you want to change lengths or routes.
- SR multimode optics suit typical in-hall distances.
- LR single-mode optics are needed when racks are far apart, there are multiple halls, or you need distance headroom.
Tie your choice to physical placement: where are the spines, how many rows, are there separate halls, and how will trays and rack entries run? If placement is uncertain, plan for a solution that survives rearrangement without rerouting cables.
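The selection logic from the list above can be written down as a simple rule of thumb. This is only a sketch: the length limits are illustrative assumptions that depend on speed, fiber grade and vendor, so check them against the datasheets of the modules you actually buy. The `suggest_media` helper is hypothetical.

```python
def suggest_media(length_m, route_may_change=False,
                  dac_max_m=3, aoc_max_m=30, sr_max_m=70):
    """Suggest a link type from the cable route length in meters.

    The default limits are illustrative assumptions, not vendor data.
    """
    if length_m <= dac_max_m:
        return "DAC"
    if length_m <= aoc_max_m and not route_may_change:
        # AOC has a fixed length, so it suits routes that will not move.
        return "AOC"
    if length_m <= sr_max_m:
        return "SR (multimode)"
    return "LR (single-mode)"

print(suggest_media(2))                           # DAC
print(suggest_media(20))                          # AOC
print(suggest_media(20, route_may_change=True))   # SR (multimode)
print(suggest_media(300))                         # LR (single-mode)
```

Feeding real route lengths from the cable plan through a rule like this gives a first bill of materials and highlights links that sit close to a limit.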
Speed ladder and reserve for expansion
Define a "speed ladder" for 2–3 years: 10G now, 25G to servers later, 100G uplinks, then 200G/400G. Optics and cross-connects should support these steps without mass recabling.
To avoid reworking every six months, aim for at least:
- fiber reserve in trunks (often 1.5–2× current need)
- cross-connects and patch panels with spare ports, not "tight fit"
- common connectors and module types across the hall
- clear labeling: rack, unit, port, line type, length, module type
- a compatibility check of modules and switches done in advance, so you are not troubleshooting a link that will not come up in the middle of the work
When these items are documented, upgrades go much smoother.
Common mistakes when moving to leaf-spine
Buying leaf-spine as a "quick upgrade" sometimes leads to rising latency, insufficient links and weeks of investigation. Usually the issue is not the topology but design choices carried over from the two-tier network, where they did no harm.
The most frequent mistake is deploying ToR (leaf) switches with 25G to servers but skimping on uplinks to the spine. The result is excessive oversubscription: the fabric exists but inter-rack bandwidth is lacking. This is painful for virtualization, distributed storage and backups.
Another trap is mixing speeds without a growth plan. Today 10G down and 40G up may look OK, but after adoption of 25G and 100G, temporary uplinks become permanent bottlenecks.
MTU and jumbo frames are another issue. If part of the network uses MTU 1500 and part 9000, overlays and storage may behave erratically: fragmentation in some places, hidden losses in others. This looks like random slowdowns though the root cause is configuration.
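The effect of a mixed MTU on an overlay can be checked with simple arithmetic. The sketch below assumes VXLAN over an IPv4 underlay without an inner VLAN tag, which adds 50 bytes on top of the inner IP packet; the link MTU values are made up and the helper names are hypothetical.

```python
# Inner Ethernet (14) + outer IPv4 (20) + UDP (8) + VXLAN (8) header bytes.
VXLAN_OVERHEAD_IPV4 = 14 + 20 + 8 + 8

def path_mtu(link_mtus):
    """The smallest MTU along the path limits the whole path."""
    return min(link_mtus)

def max_inner_mtu(underlay_mtu, overhead=VXLAN_OVERHEAD_IPV4):
    """Largest inner IP packet that fits without fragmentation."""
    return underlay_mtu - overhead

mixed_path = [9000, 9000, 1500, 9000]   # one link left at the default
clean_path = [9000, 9000, 9000, 9000]

print(path_mtu(mixed_path), max_inner_mtu(path_mtu(mixed_path)))  # 1500 1450
print(path_mtu(clean_path), max_inner_mtu(path_mtu(clean_path)))  # 9000 8950
```

A host that sends 9000-byte packets over the mixed path exceeds what the 1500-byte link can carry, which matches the "random slowdown" symptom described above.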
People also forget separate out-of-band (OOB) management and resiliency. Without an OOB network, a fabric error can cut access to the switches, and recovery turns into a site visit.
Finally, optics and transceivers: untested compatibility, mixed batches, and mismatched fiber types and lengths lead to link flaps under load.
Before commissioning, verify basics:
- calculate and validate oversubscription with tests
- lock in target speeds (10G/25G down, 100G up, or your variant) and an upgrade plan
- align MTU end-to-end and test critical flows (storage, overlays, backups)
- provide OOB management, plus independent dual power feeds and cable routes
- test optics and transceivers for compatibility and stability
Short checklist before changing topology
Before moving from two-tier to leaf-spine, spend half an hour capturing some numbers and decisions. That removes subjective arguments and quickly shows whether risk lies in ports, optics or growth expectations.
Keep answers documented, not just in an engineer's head:
- Scale and horizontal growth: how many racks and servers in 24–36 months, and which growth is already confirmed by projects.
- Oversubscription targets: separately for the rack (servers -> leaf) and for the inter-rack segment (leaf -> spine). With lots of east-west traffic (virtualization, containers, hyperconverged infrastructure, databases) the target is usually stricter.
- Speeds and handoff points: server port speeds (10G/25G), leaf uplinks, and required egress speed (WAN/internet/inter-DC). Decide where you move to 100G and where 25G suffices.
- Mandatory switch features: ECMP, telemetry, monitoring, QoS for critical apps, buffers for bursts, and required protocol support.
- Optics and cabling plan: module choices (SR/LR/DR according to distance), real route lengths, fiber reserve, labeling and inventory rules.
A practical test: model one typical rack and calculate what happens if it suddenly hosts 30% more servers. If uplink or optics limits are hit immediately, plan leaf-spine now.
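That test is easy to put into numbers. The sketch below checks one rack against two limits after 30% growth: free server-facing ports on the leaf and the target oversubscription. All input values are hypothetical and `growth_check` is an illustrative helper, so substitute your own rack data.

```python
def growth_check(servers, server_gbps, uplinks, uplink_gbps,
                 leaf_server_ports, target_ratio, growth_percent=30):
    """Check one rack against port count and oversubscription after growth."""
    # Integer ceiling of the extra servers avoids float rounding surprises.
    extra = -(-servers * growth_percent // 100)
    grown = servers + extra
    ratio = (grown * server_gbps) / (uplinks * uplink_gbps)
    return {
        "servers": grown,
        "ports_ok": grown <= leaf_server_ports,
        "ratio": ratio,
        "ratio_ok": ratio <= target_ratio,
    }

# Hypothetical rack: 20 servers at 25G, 4 x 100G uplinks, 48-port leaf.
result = growth_check(servers=20, server_gbps=25, uplinks=4, uplink_gbps=100,
                      leaf_server_ports=48, target_ratio=1.5)
print(result)
# {'servers': 26, 'ports_ok': True, 'ratio': 1.625, 'ratio_ok': False}
```

In this made-up case the ports are sufficient but the uplinks are not: six extra servers push the rack past its 1.5:1 target.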
Example: how a data center outgrows a two-tier design
Imagine a small corporate DC with 10–20 racks. The network is classic: access (ToR/EoR) and a pair of aggregation switches. This is convenient initially: fewer optics, a clear layout, and rare expansions.
After a year or two the site grows to 40–60 racks. Several virtualization clusters appear, a backup zone, some heavy applications and a test environment. Traffic becomes mixed: not only north-south but also east-west, including replication, storage, service-to-service traffic and backups. Oversubscription starts to hurt: during peaks, the uplinks from racks carry less bandwidth than the servers need.
Symptoms repeat:
- aggregation hits port and throughput limits, especially during backup windows
- every expansion requires reworking VLANs and uplinks
- latency between zones becomes uneven and visible in clusters
A phased approach is common rather than a big rip-and-replace. Build a new zone as leaf-spine (for new racks or a new cluster). Migrate services gradually: first those needing predictable latency and wide east-west bandwidth, then the rest. Keep the old two-tier as an "island" during migration.
Agree with operations on three things upfront:
- unified port and utilization monitoring (for both old and new zones)
- optics inventory for 10G/25G/100G and route documentation
- migration plan with tests: latency measurements, resiliency checks, rollback window
Systems integrators (including GSE.kz) often start with a pilot of a few racks to validate calculations and choose switches and optics without risking the whole site.
Next steps: pilot, migration plan and support
To avoid a "big bang", start with preparation: an up-to-date diagram, current uplink and port speeds, identified hotspots, and a 12–24 month rack growth forecast. List critical applications and whether they value latency, bandwidth, predictability or operational simplicity.
Then run a pilot on a small segment rather than the entire site. A practical setup: 1–2 racks with typical load (for example, a virtualization cluster and storage) connected to new leaf and spine.
Pilot steps usually include:
- document "as-is": latencies, link loads, losses and east-west peaks
- bring up the new segment and repeat measurements at the same times
- test resiliency: remove a link, a leaf, or reboot a spine
- evaluate operational aspects: diagrams, labeling and monitoring
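For the resiliency step it helps to know in advance what a failure should cost, so the measured result can be compared with an expected one. The sketch below assumes one uplink from the leaf to each spine; the rack size and link speeds are hypothetical, as is the `uplink_capacity` helper.

```python
def uplink_capacity(spines, uplink_gbps, failed_spines=0):
    """Leaf uplink bandwidth with one link per spine and some spines down."""
    if not 0 <= failed_spines <= spines:
        raise ValueError("failed_spines must be between 0 and spines")
    return (spines - failed_spines) * uplink_gbps

server_side_gbps = 20 * 25  # hypothetical rack: 20 servers at 25G

normal = uplink_capacity(spines=4, uplink_gbps=100)
degraded = uplink_capacity(spines=4, uplink_gbps=100, failed_spines=1)

print(normal, degraded)                       # 400 300
print(round(server_side_gbps / normal, 2))    # 1.25
print(round(server_side_gbps / degraded, 2))  # 1.67
```

With four spines, rebooting one removes a quarter of the uplink bandwidth; with only two spines the same event would remove half, which is worth weighing when choosing the number of spines for the pilot.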
After the pilot, lock down standards, otherwise the network becomes heterogeneous. Agree on optics types (10G/25G/100G), module classes and lengths, patching and labeling rules, and configuration templates (port profiles, LACP/ECMP, MTU, security policies). A simple rule like "one transceiver type for leaf, one for spine" reduces errors significantly.
If you engage a partner, specify in the terms of reference (the project brief, not to be confused with ToR switches):
- target speeds and acceptable oversubscription
- resilience requirements and maintenance windows
- rack, power and cooling constraints
- required integrations (monitoring, logging, access)
In Kazakhstan, on-site support matters: response time, spare optics and equipment availability. If you need a partner that covers the full cycle (hardware, integration, support), discuss a project with GSE.kz (gse.kz): they provide systems integration, DC infrastructure solutions and 24/7 technical support.
FAQ
When should I really consider moving to leaf-spine instead of just "adding uplinks"?
If you have few racks and growth is infrequent, a two-tier design is usually simpler and cheaper to operate. Transitioning becomes justified when inter-server traffic grows noticeably and expansions turn into constant rework of uplinks and configurations. The clearest trigger is when issues are not caused by "bad ports" but by the upper layer regularly becoming the bottleneck for rack-to-rack traffic.
Which metrics and observations best show that a two-tier network is already "not coping"?
Look at the share of east-west traffic: replication, VM migrations, traffic between cluster nodes, network backups, and application-to-application exchange. If peak rack-to-rack loads become steady or repeat regularly, that’s a signal. Also check whether latency grows specifically during internal windows like backups, replications and mass cluster tasks.
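The east-west share can be estimated from exported flow records. The sketch below is a simplified illustration: the record format (source, destination, bytes), the server subnet and the sample addresses are all assumptions, and real sFlow or NetFlow data would first need to be parsed into this shape.

```python
import ipaddress

# Assumed server subnets of the data center (hypothetical value).
DC_NETWORKS = [ipaddress.ip_network("10.0.0.0/16")]

def is_internal(addr):
    """True if the address belongs to one of the data center subnets."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in DC_NETWORKS)

def east_west_share(flows):
    """Share of bytes exchanged between two internal endpoints.

    flows: list of (src_ip, dst_ip, byte_count) tuples.
    """
    total = sum(size for _, _, size in flows)
    internal = sum(size for src, dst, size in flows
                   if is_internal(src) and is_internal(dst))
    return internal / total if total else 0.0

flows = [
    ("10.0.1.10", "10.0.2.20", 600),     # server to server
    ("10.0.1.10", "203.0.113.5", 300),   # server to outside
    ("198.51.100.7", "10.0.3.30", 100),  # outside to server
]
share = east_west_share(flows)
print(f"{share:.0%}")  # 60%
```

Note that this counts traffic inside a single rack as east-west too; to see what actually crosses the uplinks, the records would also need a rack or switch identifier.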
How to choose oversubscription so leaf-spine doesn’t become a "paper fabric"?
Base the choice on your real traffic model and latency goals, not on a single "correct" number. For virtualization, distributed storage and analytics, teams usually aim for lower oversubscription at the leaf so the inter-rack fabric behaves predictably. A practical method is to set a target aggregate bandwidth per rack (what servers can generate) and then choose uplinks so the fabric can handle peaks without queuing in one place.
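The method in the last sentence can be turned into a small calculation: start from what the servers in a rack can generate and derive the number of uplinks for a chosen target ratio. The server count and speeds below are hypothetical inputs, and `uplinks_needed` is an illustrative helper.

```python
import math

def uplinks_needed(servers, server_gbps, uplink_gbps, target_ratio):
    """Smallest number of uplinks that keeps a leaf at or below the target."""
    downlink = servers * server_gbps
    return math.ceil(downlink / (uplink_gbps * target_ratio))

# Hypothetical rack: 40 servers at 25G with 100G uplinks.
for ratio in (3, 2, 1):
    print(ratio, uplinks_needed(servers=40, server_gbps=25,
                                uplink_gbps=100, target_ratio=ratio))
# 3 4
# 2 5
# 1 10
```

In practice the result is usually rounded up further to a multiple of the spine count, so that every spine receives the same number of links from each leaf.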
What mistakes happen most often when moving to leaf-spine?
The most common mistake is to give servers fast ports but skimp on uplinks to the spine. The result is high oversubscription: the fabric exists on paper but lacks real inter-rack bandwidth, which hurts virtualization, distributed storage and backups. Other frequent errors: mixing speeds without a growth plan, temporary uplinks becoming long-term bottlenecks, and mismatched MTU settings that cause mysterious performance problems.
What should I look for in leaf and spine switches besides port speeds?
Choose leaf switches by rack-facing ports and uplink capacity into the fabric. Make sure the switch provides the required aggregate throughput, supports even flow distribution and offers good observability. Choose spines by number of ports, uplink speeds from leaves and growth headroom. If spines are sized too tightly, expansion will quickly force a core replacement—the very thing you wanted to avoid.
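How tight a spine is can be estimated before purchase. The sketch below assumes a two-tier fabric in which each leaf has the same number of links to every spine and ignores spine ports reserved for border or egress connections; the port counts are hypothetical and `fabric_limits` is an illustrative helper.

```python
def fabric_limits(spine_ports, uplinks_per_leaf, links_per_spine=1):
    """Return (number of spines, maximum number of leaves) for a fabric."""
    if uplinks_per_leaf % links_per_spine:
        raise ValueError("uplinks must divide evenly across spines")
    spines = uplinks_per_leaf // links_per_spine
    # Every leaf consumes links_per_spine ports on each spine.
    max_leaves = spine_ports // links_per_spine
    return spines, max_leaves

# Hypothetical hardware: 32-port spines, leaves with 4 uplinks.
spines, max_leaves = fabric_limits(spine_ports=32, uplinks_per_leaf=4)
print(spines, max_leaves)   # 4 32

planned_racks = 24
print(max_leaves - planned_racks * 1)   # 8  (one leaf per rack)
print(max_leaves - planned_racks * 2)   # -16 (two leaves per rack)
```

With one leaf per rack this made-up design leaves room for eight more racks, but with a redundant pair of leaves per rack the same spines are already too small.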
Is it better to use L2 or L3 up to the rack in leaf-spine?
Decide where L2 ends and L3 begins. For scale, running L3 up to the rack is usually simpler: fewer broadcasts and fewer "magic" behaviors to troubleshoot. If you need L2 segments over the fabric, plan overlays in advance and verify end-to-end MTU and policies. Without that, the network may function but applications can be unstable.
What to choose for leaf–spine links: DAC, AOC or optics?
For short connections inside a rack or between adjacent racks, DAC or AOC are convenient: inexpensive and simple to deploy. Where distances are longer, routes may change, or there are multiple halls, transceivers with the right fiber type (multimode or single-mode) are the safer choice. Choose based on actual cable lengths and the need for flexibility. Optics issues usually appear under load at the worst possible time.
Do I need separate OOB management when moving to leaf-spine?
Yes. If management traffic uses the same fabric, a configuration error or fabric outage can cut off access to the switches themselves. An OOB management network enables recovery without a site visit and without manual console connections. At minimum, deploy a separate management path and a clear access scheme for emergency operations.
How to migrate to leaf-spine without a "big bang" and service downtime?
Start with a pilot on 1–2 racks with representative load and repeatable measurements: latency, losses, uplink utilization, and behavior during peak windows. Define success criteria and a rollback plan in advance. Then standardize module types, patching rules, configuration templates, MTU, and monitoring. Without standards, the network becomes heterogeneous and support gets harder.
Are there scenarios where a two-tier design remains the best choice even when speeds grow to 25/100G?
Yes. If your traffic is truly dominated by north-south flows, you have few racks and infrequent expansions, a two-tier design often gives a better balance of cost, simplicity and deployment speed even at 25/100G. Focus on quality uplinks, port reserve and consistent settings to avoid technical debt. Leaf-spine makes sense when inter-rack exchange and frequent scaling are your main pain points.