Why do problems most often start at the access layer rather than the internet or core?

Usually because the "last meter" near the user fails most often: switch power, a port, a PoE device, a patch cord or an uplink. If access has no redundancy, even with good internet and a reliable core people will still regularly lose connection to servers, telephony and Wi‑Fi.

What does resilient access actually give offices and branches?

A single failure can take out a room, a floor or a small branch, and recovery drags on if there’s no local engineer. Redundancy at the access layer limits the impact of one failure and makes maintenance predictable, which is critical for point‑of‑sale systems, telephony and Wi‑Fi.

What is the simple difference between stacking/VSF and MLAG?

A stack (Cisco Catalyst) or VSF (Aruba CX) is several switches operating as one logical device, with a shared config and a single management point. MLAG/VSX keeps two separate devices and lets you build one LACP channel across both boxes, but it usually requires more careful paired configuration.

What happens to the network if one switch fails?

In a stack/VSF, the failing member’s ports disappear while the rest keep working; if the master fails there may be a short pause during re‑election. In MLAG/VSX, the remaining switch keeps its ports and uplinks active, so the impact is often smaller if LACP is configured correctly.

Should endpoints have redundant connections or is that only for servers and Wi‑Fi?

For a single PC there’s usually no difference—one cable. A second link is more useful for access points, mini‑switches, servers and any devices with dual NICs where surviving a port or switch failure matters.

How do you connect two uplinks without surprises?

The common approach is to build uplinks as an LACP port‑channel so that losing one link doesn’t take down the whole floor or branch. Make sure LACP is configured the same on both ends and that trunk and VLAN lists match, otherwise you’ll get black holes where some traffic never returns.

Why is STP still needed, and how do you keep it from becoming a source of problems?

STP is needed where L2 loops are possible, but make it predictable: set a clear root at distribution and avoid accidental bridges. Enable port protection on access ports (for example, BPDU guard) so a misplaced patch cord or device won’t bring down the segment.

Where is it better to place the L2/L3 boundary in an office and in a branch?

Keep L2 at access and place L3 and VLAN gateways at distribution or on the branch router unless there’s a specific reason not to. That way L2 issues don’t spread through the network and a single switch failure is easier to isolate and fix.

How do you ensure the default gateway doesn’t disappear when one device fails?

Make sure the gateway survives the failure of one device without clients needing to change IPs: usually VRRP/HSRP on a distribution pair or the vendor’s equivalent. The key is to decide in advance where the default gateway sits for each VLAN and keep that consistent across sites.

How do you safely update a stack/VSF or an MLAG/VSX pair without causing downtime?

Plan updates so you service the secondary node first and then the primary, and always have a tested rollback image. After changes, check not just ping but LACP state, MAC/ARP table stability, uplink error counters and logs—those show VLAN, trunk or flapping problems.

Resilient access for Cisco and Aruba: stacking, VSF, MLAG

Why offices and branches need resilient access

The access network is the part users notice first. Calls drop, video meetings freeze, printers don’t work, a POS terminal can’t reach the server — then users start saying "it works sometimes, then it doesn’t." Often the issue isn’t the internet or the core, but simple failures on floors and in branches.

Failures usually happen closest to the user: switch power, a failed power supply, a port (easy to “kill” with a cable or a faulty PoE device), a patch cord, a module/transceiver or the uplink. There are also planned causes: firmware upgrades, hardware replacement, moving desks.

Outages come in different scales. Sometimes one port goes down and a single user or access point is affected. Sometimes a whole switch dies and you lose a room, a floor or a small branch. Sometimes the uplink breaks: devices look up but traffic doesn’t reach servers and services.

If you make only the core or internet link redundant but leave access without redundancy, large‑scale incidents will still happen. In branches that’s especially painful: it can take half a day from the first report to someone driving over to reboot the equipment.

The goal for resilient access on Cisco and Aruba is simple: limit the impact of any single failure, shorten recovery time and make maintenance predictable. For example, a firmware update shouldn’t take down an entire floor, and replacing one switch in a branch shouldn’t stop cashiers and phones.

Stacking, VSF and MLAG: the short difference

In practice resilient access is usually built in three ways: stacking (Cisco Catalyst), VSF (Aruba CX) and MLAG (called MLAG, VSX, MC‑LAG by different vendors). The objective is the same: a single device failure shouldn’t "take down" users and uplinks.

Stacking and VSF are easiest to understand as "several switches behave as one." Shared config, a single VLAN table, one management point. For an admin it looks like one large switch with many ports.

MLAG is a different approach: two separate switches that can appear as a single logical endpoint to connected devices. Management is usually separate (two configs), but upstream (or to servers/APs) you can present a single LACP aggregate across different boxes.

What happens on failure

In a stack/VSF the failed member’s ports disappear, but the rest keep working. If the master fails, role switching happens and there may be a short pause. That’s why critical uplinks should be distributed across different members.

In MLAG, if one switch fails the other keeps its ports and uplinks. LACP typically stays up on the remaining link, so the degradation is usually smaller.
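
The advice to spread critical uplinks across members can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical mapping of uplink ports to the stack/VSF member that owns them. The port names and the helper functions are illustrative and do not come from any vendor tool.

```python
def surviving_links(uplinks, failed_member):
    """Return uplink ports that stay up when one stack/VSF member fails."""
    return [port for port, member in uplinks.items() if member != failed_member]


def single_member_risk(uplinks):
    """List members whose failure would take down every uplink in the bundle."""
    members = set(uplinks.values())
    return sorted(m for m in members if not surviving_links(uplinks, m))


# Hypothetical layouts: port name -> stack member that owns the port.
spread = {"1/1/1": 1, "2/1/1": 2}
concentrated = {"1/1/1": 1, "1/1/2": 1}

print(single_member_risk(spread))        # []
print(single_member_risk(concentrated))  # [1]
```

An empty result means no single member failure removes all uplinks. Any member listed is a single point of failure for the bundle.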

How this affects uplinks and ports

The difference is felt more on uplinks than on user ports.

In stacking/VSF you create one LACP bundle "to the stack" and the physical ports are distributed across members.
In MLAG you can split LACP across two devices without stacking, but the peer pair and control links need careful setup.

For a workstation there’s normally no difference: a PC has one cable. For access points, mini‑switches and servers a second link is often genuinely useful.

Sometimes you don’t need complexity. A single switch is fine where downtime is acceptable, there are no critical services on the floor, and ease of maintenance matters more than maximum availability.

Choosing a scheme: offices, floors and branches without extra complexity

The aim is simple: one switch or uplink failure shouldn’t leave people offline, but the scheme should not be more complex than necessary. For access on Cisco and Aruba, one of the typical models usually suffices.

Typical schemes that rarely fail

Three common reliable options:

Small office: a pair of access switches in a stack (or VSF) with two uplinks to distribution.
Medium multi‑floor office: each floor has its own stack/VSF and uplinks go to a pair of distribution switches.
Branch: a compact stack/VSF or two switches with MLAG if device independence and maintenance without a single management point are important.

Where you place the L2/L3 boundary solves half the problems. If a branch has no complex routing, keep L2 at access and move L3 to distribution or the branch router. Then L2 failures don’t spread and it’s easier to find where a loop occurred.

Two uplinks up: where to terminate them

If there’s a single distribution switch above, both uplinks are usually put into one LACP port‑channel. If there’s a distribution pair that supports MLAG/VSX, the best option is to terminate the port‑channel across the pair. If the distribution switches don’t form such a pair, two independent uplinks with properly configured STP are often safer.

Keep segmentation consistent across sites. A few VLANs usually suffice: users, telephony, corporate Wi‑Fi, guest Wi‑Fi, cameras and access control.

STP is required wherever an L2 domain carries a risk of loops. To prevent STP from becoming a problem, set a clear root at distribution and avoid accidental L2 bridges (for example, someone connecting two wall ports with a patch cord). On access ports enable protection features such as BPDU guard and keep uplink topology predictable.
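
Why an explicit root matters can be shown with the election rule itself: the bridge with the lowest bridge ID (priority first, then MAC address) becomes root. The sketch below is a simplified Python model of that rule only. The device names, priorities and MAC addresses are made up for illustration.

```python
def elect_root(bridges):
    """Pick the STP root: lowest priority wins, lowest MAC breaks ties.

    MAC addresses are compared as strings, so they must all use the same
    format (here: lowercase, colon-separated).
    """
    return min(bridges, key=lambda b: (b["priority"], b["mac"].lower()))


# Hypothetical devices; priorities and MACs are illustrative.
bridges = [
    {"name": "DIST-1", "priority": 4096, "mac": "00:1a:2b:00:00:10"},
    {"name": "DIST-2", "priority": 8192, "mac": "00:1a:2b:00:00:20"},
    {"name": "ACCESS-FLOOR3", "priority": 32768, "mac": "00:1a:2b:00:00:01"},
]
print(elect_root(bridges)["name"])  # DIST-1

# With the same priority everywhere, the lowest MAC decides.
defaults = [dict(b, priority=32768) for b in bridges]
print(elect_root(defaults)["name"])  # ACCESS-FLOOR3
```

The second call shows the unplanned case: with identical priorities, an access switch can end up as root simply because of its MAC address.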

Example: in a two‑floor office you deploy a stack per floor, keep the VLANs the same, and run two uplinks from each stack to the distribution pair. In a branch, use a small stack and two uplinks to one router, or to a pair if present. This gives resilience without extra moving parts.

Step‑by‑step: how to build stacking on Cisco Catalyst

Stacking on Cisco Catalyst is handy when you want one logical switch from two or three devices. For a typical office it gives clear management, a shared config and fast commissioning.

Before you start

Verify that the models actually support stacking in your hardware revision and that you have the required StackWise modules/cables (often sold separately). Decide in advance which switch will be active (master) and which will be standby, and lock the member order. This affects port numbering and how you later identify "that" port in the rack.

Plan a single management approach: one management IP for the stack, a clear device name (e.g. ACCESS‑FLOOR3) and a document mapping "stack member 1 — top switch in the rack."

Assembly and basic config

Physically it’s best to wire a ring: connect stacking cables so each switch links to two neighbors. A ring survives one stacking cable break without collapsing the stack.

Then configure basics: VLANs must match the upstream level, access ports must be set for workstations and phones, and uplinks should carry the correct VLAN list. Bundle uplinks into an LACP port‑channel so that no single link is a single point of failure. Check that STP sees the stack as a single device.

Plan reboots and upgrades. Agree on a maintenance window, save configs, verify image space and update following a chosen scenario. The order of operations and post‑update checks usually save more time than the initial setup.

Step‑by‑step: how to build VSF on Aruba CX

VSF (Virtual Switching Framework) turns two Aruba CX switches into one logical node. For users it looks like one switch; for admins it simplifies management and provides access‑level resiliency.

1) Preparation

Check that models support VSF and ArubaOS‑CX versions match. Choose roles (leader and member) and set priorities so leader election is predictable. A practical setup is a preferred leader and a standby.

Label devices and ports before installation — it saves hours when you need to quickly find a link.

2) Cabling and initial checks

Build VSF links according to vendor recommendations for your model: usually two links so there’s no single point of failure. Before putting users on, verify cable health: connectors, latches and link speeds.

Then configure VSF and ensure the node sees both members and reports links as healthy. Only after that migrate user connections.

3) Management, user ports, uplinks and LACP

Assign one management IP for the VSF, configure access (SSH/AAA per your policy) and apply consistent port profiles: VLAN, voice VLAN (if needed), PoE and port‑security.

Uplinks to distribution or core are simplest and most reliable as an LACP aggregation (usually one physical link from each VSF member). After assembly check LACP, VLAN correctness on trunks and that STP doesn’t block an unexpected path. A quick health indicator is a stable MAC table without jumps.

4) Failure test

Simulate a failure: power off one VSF member. Ports on the remaining switch should keep working and traffic should flow over the remaining LACP link. Power the unit back on and verify it rejoins the VSF without manual steps.

MLAG at access: when it’s better than stacking

Choose MLAG when device independence matters more than a single logical switch. Stacking gives a unified management point — a plus — but there’s also a shared risk: a stack failure or a bad update can affect the whole group.

MLAG keeps two switches logically separate, so maintenance is calmer: you can reboot and update nodes one by one while keeping the network up.

The layout is a pair (A and B) with two types of links between them. First, a peer‑link (sometimes called ISL) for control traffic and parts of data forwarding. Second, a keepalive (often via a separate network or the management plane) so nodes don’t falsely view each other as down if the peer‑link is congested or broken.
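
The role of the keepalive can be illustrated with a small decision function. This is a simplified Python model of commonly described MLAG behavior, not the exact logic of VSX or any other implementation. The function name and the returned strings are assumptions made for the example.

```python
def secondary_action(peer_link_up: bool, keepalive_up: bool) -> str:
    """Decide what the secondary MLAG node does (simplified model)."""
    if peer_link_up:
        return "forward normally"
    if keepalive_up:
        # The peer is alive but the peer-link is broken: step aside so that
        # only the primary forwards on the shared LACP bundles.
        return "disable MLAG member ports"
    # No peer-link and no keepalive: the peer is presumed dead.
    return "take over as the only active node"


for peer_link_up in (True, False):
    for keepalive_up in (True, False):
        print(peer_link_up, keepalive_up, "->",
              secondary_action(peer_link_up, keepalive_up))
```

The last branch is the risky one: if both links are lost while both nodes are actually alive, each node assumes the other is down. That is why the keepalive should take a path independent of the peer-link.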

Don’t mix up terms. On Aruba CX MLAG is often implemented as VSX: two devices synchronizing state and enabling LACP across them. On Cisco similar functionality can be done with StackWise Virtual and Multi‑Chassis EtherChannel in campus scenarios.

Use MLAG for things that must not fail when one switch dies: uplinks (LACP to two devices), servers and storage with dual NICs, Wi‑Fi controllers and critical access points.

Uplinks and aggregation: how to connect upward without surprises

Most sudden access outages aren’t caused by stack vs MLAG but by uplinks: one port is up, another is not; a VLAN is missing; or traffic goes into a black hole. Discipline in LACP, trunking and gateway configuration fixes most of these.

If you have stacking on Cisco Catalyst or VSF on Aruba CX, uplinks are easiest as an LACP channel to the logical switch. Two physical links provide capacity and protection. Both ends must build the channel the same way: LACP on both sides — mixing LACP and "on" often ends badly.
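
The "same way on both ends" rule can be written down as a small compatibility check. The sketch below models the usual mode semantics (active, passive, and static "on") in Python. The function name is hypothetical and the model ignores timers and vendor-specific options.

```python
VALID_MODES = {"active", "passive", "on"}


def channel_forms(local: str, remote: str) -> bool:
    """Return True if the two ends agree on how the bundle is built."""
    if local not in VALID_MODES or remote not in VALID_MODES:
        raise ValueError(f"unknown mode: {local!r} / {remote!r}")
    modes = {local, remote}
    if modes == {"on"}:
        return True   # static bundle on both sides, no LACP negotiation
    if "on" in modes:
        return False  # one side is static, the other expects LACP
    return "active" in modes  # at least one side must initiate LACP


for pair in [("active", "active"), ("active", "passive"),
             ("passive", "passive"), ("on", "active"), ("on", "on")]:
    print(pair, channel_forms(*pair))
```

Static "on" at both ends does form a bundle, but without LACP neither side can detect a miscabled or misconfigured peer. This is one reason LACP on both sides is the safer default.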

With MLAG (VSX/MC‑LAG) you can choose where to terminate the channel: to a distribution pair or to a single device. Rule of thumb: if the upstream pair supports MLAG, terminate there (one link to each device) and get redundancy without STP blocking. If there’s only a single upstream device, build the channel there and solve further redundancy with routing or a second uplink.

Trunk VLAN lists are a risk zone. Agree on an explicit allowed VLAN list on uplinks and keep native VLANs consistent (or avoid native VLANs where possible). Otherwise traffic may go upstream but never return.

Also plan the VLAN gateway: who is the default gateway for clients. On Cisco this is usually HSRP/VRRP at distribution; on Aruba it’s VRRP or a distributed gateway in a pair. The goal is the same: the gateway should remain reachable without changing client IPs.
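
How a redundant gateway stays reachable can be sketched with the VRRP election rule: the live router with the highest priority owns the virtual IP, and a tie goes to the higher interface address. The Python model below is deliberately simplified (no timers, no preemption settings). All names and addresses are illustrative assumptions.

```python
import ipaddress


def elect_master(routers):
    """Return the live router that should own the virtual gateway IP."""
    alive = [r for r in routers if r["alive"]]
    if not alive:
        return None
    return max(alive,
               key=lambda r: (r["priority"], ipaddress.ip_address(r["ip"])))


# Hypothetical distribution pair; clients always use the virtual IP.
VIRTUAL_GATEWAY = "10.0.10.1"
routers = [
    {"name": "DIST-1", "ip": "10.0.10.2", "priority": 110, "alive": True},
    {"name": "DIST-2", "ip": "10.0.10.3", "priority": 100, "alive": True},
]

print(elect_master(routers)["name"])  # DIST-1
routers[0]["alive"] = False           # DIST-1 fails
print(elect_master(routers)["name"])  # DIST-2
```

In both cases clients keep `10.0.10.1` as their default gateway. Only the device answering for it changes.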

After bringing links up don’t stop at ping. At minimum check LACP state, MAC/ARP stability, error counters on uplinks, STP behavior and logs for VLAN/trunk mismatches or flapping.

Real‑world example: a floor has a VSF pair and two uplinks to distribution. If someone forgets to add the telephony VLAN on one uplink, phones may be visible but won’t register: some traffic goes via the second link and gets lost upstream. Explicit allowed VLANs and matching trunk settings fix this quickly.
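
A mismatch like the one in this example is easy to catch by comparing the allowed VLAN lists of the two trunks. The Python sketch below assumes the lists are available as plain strings such as `10,20,30-32`. How you collect them from the devices is out of scope, and the function names are hypothetical.

```python
def parse_vlans(spec):
    """Expand a list such as '10,20,30-32' into a set of VLAN IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            vlans.update(range(start, end + 1))
        else:
            vlans.add(int(part))
    return vlans


def trunk_mismatch(side_a, side_b):
    """Return (VLANs only on A, VLANs only on B) as sorted lists."""
    a, b = parse_vlans(side_a), parse_vlans(side_b)
    return sorted(a - b), sorted(b - a)


# Illustrative values: VLAN 20 (telephony) is missing on the second uplink.
only_on_first, only_on_second = trunk_mismatch("10,20,30-32", "10,30-32")
print(only_on_first)   # [20]
print(only_on_second)  # []
```

The same comparison works for the two ends of one link or for the two member links of a bundle.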

Maintenance and upgrades: how not to cause downtime

Resiliency often breaks during planned work rather than accidents. In stacks (Cisco Catalyst stacking, Aruba CX VSF) and MLAG pairs (VSX/MLAG) you can often do changes almost transparently if you know the boundaries.

What can be done without a maintenance window, and what needs one

You can usually do tasks that don’t change device roles or break uplinks without a window: replacing one uplink cable when another link exists in LACP, small port/profile changes, or replacing one device in a pair if the config is ready.

You generally need a window for changes that can restart a stack/pair or cause protocol recalculations: OS version changes, changing master/primary, major STP changes, or reassigning LACP groups on uplinks.

Software updates: safe order and rollback

Keep the update plan consistent across offices and branches. A practical order: backup configs and record versions, check free space and image integrity, update secondary first then primary, then test services (LACP, addressing, telephony, Wi‑Fi).
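
The order above can be captured in a small plan generator so every site gets the same sequence. This Python sketch assumes each node is described by a hypothetical `name` and `role` field. It only builds a list of steps and does not talk to any device.

```python
def upgrade_order(nodes):
    """Order nodes so that non-primary ones are upgraded before the primary."""
    # sorted() is stable and False sorts before True, so the primary goes last.
    return sorted(nodes, key=lambda n: n["role"] == "primary")


def build_plan(nodes):
    """Return the ordered list of steps for one stack or pair."""
    steps = ["back up configs and record versions",
             "check free space and image integrity"]
    steps += [f"upgrade {n['name']} ({n['role']})" for n in upgrade_order(nodes)]
    steps.append("test services: LACP, addressing, telephony, Wi-Fi")
    return steps


# Illustrative pair; names and roles are assumptions.
pair = [{"name": "SW-A", "role": "primary"},
        {"name": "SW-B", "role": "secondary"}]

for number, step in enumerate(build_plan(pair), start=1):
    print(number, step)
```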

Plan rollback: keep the previous image and document how to return to it without scrambling through chats during the maintenance window.

Device replacement is faster with a template and clear port names. In branches this often means: bring a new switch, load the config, verify connections — and people are working within 15–20 minutes.

Monitor early signals: rising port error counts, link flapping, frequent master role changes, overheating, power issues, and unstable VSF/stack links.
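
Link flapping is one of the signals that is simple to quantify. The Python sketch below counts state changes of one link inside a time window, assuming events are already available as hypothetical `(timestamp_seconds, state)` pairs taken from logs or monitoring. The alert threshold is left to you.

```python
def count_flaps(events, window_start, window_end):
    """Count up/down transitions of one link inside a time window."""
    flaps = 0
    previous = None
    for timestamp, state in sorted(events):
        changed = previous is not None and state != previous
        if changed and window_start <= timestamp <= window_end:
            flaps += 1
        previous = state
    return flaps


# Illustrative event log for a single uplink.
events = [(0, "up"), (10, "down"), (12, "up"),
          (40, "down"), (41, "up"), (300, "up")]

print(count_flaps(events, 0, 60))    # 4
print(count_flaps(events, 60, 600))  # 0
```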

Keep documentation short and actionable: connection diagrams, version list and image locations, role rules, console/OOB access, and a short post‑work checklist.

If an integrator maintains your infrastructure, agree on a single playbook and checks for all sites to avoid "special" branches that fall over at the first scheduled update.

Typical mistakes with stacking, VSF and MLAG

Most painful outages are caused not by hardware but by small config and process errors.

A common trap is mismatched software versions and hidden incompatibilities. In a Cisco stack or an Aruba CX pair devices might come up but some features behave oddly: LACP or STP problems. It’s especially risky when one switch was updated and the other was forgotten.

Second mistake: uplinks without proper aggregation. A single uplink may seem simple but during a failure you can get STP blocking, a loop or selective VLAN losses. Another frequent issue is mismatched allowed VLANs on trunk ends — it looks like "sometimes it works, sometimes it doesn’t."

Third mistake: messy stacking/VSF cabling. For stacking and VSF the right ports, member order and ring redundancy matter. An unlatched or swapped cable may not show up immediately but can cause a failure at the next reboot.

Mixing STP and MLAG/VSX without understanding roles is dangerous. MLAG prevents some loops at the pair level, but STP still exists and must be predictable (root, priorities and where it’s really needed).

Before launch do a short checklist: record models and versions, verify LACP and VLAN on both ends of uplinks, ensure stacking/VSF is wired in a ring and labeled, set STP root, run a failover test and prepare a rollback plan.

Quick checklist before going live

Spend an hour on checks before production — it’s cheaper than troubleshooting a strange stack or MLAG later.

What to verify before switching users

Start with physical checks: port mapping matches diagrams, labels are correct, stacking/VSF links are in the intended ports.

Then software and configs: software versions match, backups saved, device roles clear.

Make uplinks predictable: LACP, identical trunk settings and matching VLAN lists across the path.

Short control list:

Diagram and labeling match reality.
Software versions match and config backups are taken.
Uplinks in LACP, identical trunk settings, VLANs match.
Failover test passed: shut down one device and one uplink — services remain up.
Basic alerts configured: link down, port errors, overheating.

Mini failover test that actually finds problems

During low load disable one uplink and then power off one switch. If users lose connectivity for more than a few dozen seconds, first check LACP, STP/loop protection and VLAN consistency on trunks.
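
To turn "a few dozen seconds" into a number, you can record reachability probes during the test and measure the longest gap. The Python sketch below assumes probes are stored as hypothetical `(timestamp_seconds, reachable)` pairs, sorted by time. How the probes are collected (ping, a synthetic call and so on) is up to you.

```python
def longest_outage(probes):
    """Return the longest outage in seconds.

    An outage runs from the first failed probe to the next successful one,
    or to the last probe if connectivity never came back.
    """
    longest = 0
    outage_start = None
    for timestamp, reachable in probes:
        if not reachable and outage_start is None:
            outage_start = timestamp
        elif reachable and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    if outage_start is not None:
        longest = max(longest, probes[-1][0] - outage_start)
    return longest


# Illustrative one-second probes around an uplink shutdown.
probes = [(0, True), (1, True), (2, False), (3, False),
          (4, False), (5, True), (6, True)]
print(longest_outage(probes))  # 3
```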

Example: HQ and three branches on the same logic

A practical low‑complexity layout: at HQ run pairs of access switches per floor and two uplinks to distribution. Repeat the same idea compactly in branches to simplify scaling and support.

At HQ place two access switches per floor and make them one logical node: Cisco Catalyst stacking or Aruba CX VSF. Give two uplinks to the distribution pair via LACP. If an access switch or one uplink fails, the whole floor doesn’t go down.

Avoid single points of failure: spread Wi‑Fi APs and IP phones across both switches. Put critical cameras on different switches and, where possible, different PoE power supplies. Servers on a floor get most benefit from two NICs into different switches with server‑side aggregation.

For branches without a strong local engineer a stack (stacking/VSF) is often simpler: it’s managed as one device with fewer manual steps. If the branch needs independent control plane and frequent updates with minimal downtime, consider MLAG/VSX.

Leave headroom: 20–30% spare ports, space for a third stack member and a clear VLAN scheme (corporate, guest Wi‑Fi, telephony, video). This helps growth without reworking the network.
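
The port headroom is simple arithmetic. The Python sketch below assumes "spare" means a share of the total installed ports and that all switches have the same port count. Both assumptions, and the function name, are illustrative.

```python
import math


def switches_needed(devices, spare_fraction, ports_per_switch):
    """Smallest number of switches that leaves the given share of ports free."""
    if not 0 <= spare_fraction < 1:
        raise ValueError("spare_fraction must be in [0, 1)")
    required_ports = math.ceil(devices / (1 - spare_fraction))
    return math.ceil(required_ports / ports_per_switch)


# 70 devices with 25% spare need 94 ports, which fits two 48-port switches.
print(switches_needed(70, 0.25, 48))  # 2
# 80 devices with 25% spare need 107 ports, so a third member is required.
print(switches_needed(80, 0.25, 48))  # 3
```

The second case shows why leaving physical space for a third stack member is worth planning up front.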

Next steps: how to move from idea to working design quickly

Gather basic data: exact switch models and modules, acceptable downtime (e.g. 5 minutes or 0), list of VLANs and critical services (telephony, POS, Wi‑Fi, video), and current uplink topology.

Create a one‑page diagram. No need to map every wall port — show where the switch pair/stack is, where uplinks go, where LACP is needed and which devices must survive a single node or link failure. At this stage it becomes clear whether stacking/VSF or MLAG/VSX is better.

A short implementation plan

Work in small steps: verify hardware compatibility and fix the target scheme, prepare a template config (VLAN, trunk/access, STP, management), test a pair on the bench with simple failovers, schedule a maintenance window and write a rollback plan.

Also plan support: who responds at night, spare parts to keep (stacking/VSF cables, SFPs, power supplies), and your recovery time objective. For offices and branches this is often more important than a theoretical perfect architecture.

If you need resilient access on Cisco and Aruba without extra risk, you can engage GSE.kz as a systems integrator: they can help with design, implementation and 24/7 support with a service network across Kazakhstan.