Where should you start a business case for server modernization when time is limited?

Start from facts you already have: downtimes over the last 6–12 months, number of incidents, average MTTR, and user complaints with dates and affected systems. Then in one paragraph state which business processes suffer and how you measure it: lost working hours, missed deadlines, delayed services.

What does the budget committee usually want to see in such a case?

The committee usually looks at three things: how much money and time are lost now, how risks change after the project, and when the investment will pay back. Technical details matter only where they explain effect and risks — for example, impact on downtimes, support and compatibility.

How do you calculate the cost of an hour of downtime so the figure won't be dismissed?

Use simple logic: losses = hours of downtime × cost per hour of the service. Build the cost per hour from clear components: paid time of employees who cannot work and fixed obligations (regulatory deadlines, SLAs, penalties) if they are actually applied.

How do you quantify the damage when 'everything works but slowly'?

Convert waiting time into person-hours. If many users regularly wait for reports, document processing, or UI loads, measure time lost on typical operations and multiply by frequency and number of users, then multiply by the employee hourly rate.

How do you present the effect as a range instead of a single disputable number?

Provide two scenarios: conservative and realistic. The conservative scenario counts only emergency downtimes and uses a minimal cost-per-hour estimate; the realistic one also adds partial outages and degradations. Use the same data sources and state assumptions clearly.

What must be included in the 3–5 year TCO besides server price?

Separate one-off and recurring costs clearly. 3–5 year TCO normally includes CAPEX for equipment and implementation, and OPEX for power, cooling, service and spare parts, licenses, renewals and support labour.

Which migration risks should be reflected in advance so the case looks honest?

Describe risks briefly as ‘what can go wrong’ and next to each write ‘what we will do in advance’. Typical measures are a pilot on one service, a rehearsal switch, an agreed maintenance window, acceptance criteria by metrics and a clear rollback plan.

How do you answer the committee's question: 'Why not extend the life of the current servers?'

Because extending equipment life usually doesn't eliminate key risks and can be more expensive due to emergency repairs, spare parts shortages and growing manual workarounds. In the case, compare 'leave as is' and 'modernize' under the same rules: same time horizons, same metrics and transparent assumptions.

Should I prepare multiple upgrade scenarios or is one enough?

Prepare at least two options: a 'minimal' one to reduce risks and a 'target' one for the three-year effect, and briefly show differences in CAPEX, OPEX and payback. A phased option is useful as plan B: critical services first, then the rest, so the project survives budget cuts.

How do you describe delivery and support so it reads as risk management rather than marketing?

Check whether the supplier provides verifiable parameters: delivery times, availability of spare parts, service coverage, response format and responsibility for the first weeks of operation. If local production or procurement requirements matter, indicate which documents confirm manufacturer status and certificates, for example for Kazakh suppliers and integrators such as GSE.

Business Case: Server Modernization for a Budget Committee

What the committee wants to see in the business case

Budget committees usually care less about which server is "faster" and more about why to spend money now and what measurable benefits the organization will get. A strong business case for server modernization reads like a financial memo: concise, verifiable, with clear assumptions.

The same questions often come up: what's wrong now, what damage does it cause, why can't the problem be fixed cheaper, when will the investment pay off, and what happens if the project is not done. Committees also look separately at risks: migration, compatibility, security, supply dependency and support.

In server projects the word “effect” usually means three things:

Money: lower losses from downtime, fewer emergency fixes and support costs, predictable spending over 3–5 years.
Time: key systems run faster, staff and users wait less, incidents close faster.
Risk: lower probability of service outages, easier to pass audits and handle growth in load.

Define up front what you mean by “modernization.” It can be replacing aged servers, expanding capacity, consolidating nodes into a more compact setup, or moving to a more manageable, standardized architecture.

To avoid drowning in technical detail, focus on numbers and verifiable facts:

which services are affected and how many people or departments depend on them
how many hours of downtime occurred and how you estimate the cost per hour
current support costs (hours, contractors, spare parts)
what will change after the project and how it will be measured
assumptions and limits you accept, and what can go wrong

Delivery and support terms are often a separate argument: lead times, spare parts availability, response format and service geography. For committees this is a language of risk, not "hardware marketing."

Document structure: short and verifiable

The committee approves upgrades faster when the document looks like a verifiable memo, not a 30-slide deck. A practical format is a one-page decision summary plus an appendix where each item can be checked.

On the first page include only what influences the decision:

objective and what exactly changes (e.g., replace 10 servers, migrate 6 critical services)
the current problem in facts (hours of downtime, performance drops, increasing incidents)
two scenarios: minimal (mitigate risks) and target (deliver effect)
headline figures: CAPEX, OPEX, savings, payback period
effect per year over a 3–5 year horizon (a short table of a few rows)

Then provide a detailed calculation in the appendix. Important: include not only formulas but the sources of the numbers.

The appendix usually needs four blocks:

calculations and logic (what was counted, which formulas, what’s included in TCO)
assumptions and data sources (tickets, monitoring, payroll rates, tariffs, SLA)
risks and what happens if assumptions fail (for example, effect is 20% lower)
estimates by scenario and project boundaries (what is in scope and what is not)

If a vendor or integrator provides a turnkey solution, keep the structure anchored in numbers and verifiable inputs, not in equipment descriptions.

Gathering source data without long surveys

For committee approval, clear sources and simple assumptions matter more than “perfect” accuracy. Most numbers can be collected in 1–2 days if you agree in advance where the facts come from: monitoring logs, Service Desk, security and finance reports.

Start with a list of critical services. Usually this includes accounting and budgeting, document workflow, key registers and databases, email and collaboration. For each service record the owner (who is responsible) and the “cost of error”: what halts during downtime and which deadlines are at risk.

Next quickly identify who actually suffers from failures. You don’t need every employee—3–5 departments where a failure immediately causes delays are enough: accounting, procurement, archives/clerical, operations, contact center. A short 20-minute interview and one example of a “failed day” are usually sufficient.

Use available metrics: incident counts, downtime durations, mean time to recovery (MTTR), queue sizes and response times. Even if data is incomplete, state the period (e.g., the last 6–12 months) and source honestly.

Check current expenses, which are often scattered across accounts. It’s convenient to group them into a few lines:

internal support (admin hours, on-call)
spare parts and repairs
external contractors and onsite visits
licenses and renewals
penalties or losses caused by outages (if recorded)

A rule of thumb: if Service Desk shows 12 major incidents per year and average MTTR is 3 hours, that’s already a basis for effect calculation. Present other figures as assumptions and note they will be refined during the project.

How to calculate the effect of reduced downtime in numbers

Committees need verifiable calculations: which downtimes are counted, where hours come from, and how the cost per hour was derived. It helps to separate downtimes by type, because they have different causes and respond differently to an upgrade.

Four common categories are often enough:

planned outages (updates, maintenance)
emergency downtime (failures, overheating, disk or power faults)
partial outages (one service fails while others work)
degradations (everything runs but so slowly people effectively wait)

Basic formula: losses = hours of downtime × cost per hour of the service. The issue is not the formula but an honest estimate of the cost per hour.

Cost per hour can be built from 2–3 clear components: payroll costs for employees who cannot work; lost volume of services (e.g., citizen reception, processing requests); penalties or SLA obligations, if any. If a service affects multiple departments, count the most impacted process and state that this is conservative.
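A minimal Python sketch of this build-up, for readers who prefer to see the arithmetic. The class and function names, the headcount, the hourly rate and the 28 hours of downtime are illustrative assumptions, not data from a real organization.

```python
from dataclasses import dataclass


@dataclass
class HourlyCost:
    """Components of one hour of downtime for a single service, in tenge."""
    idle_payroll: float          # paid time of employees who cannot work
    lost_services: float = 0.0   # value of services not delivered in that hour
    penalties: float = 0.0       # SLA or regulatory penalties, only if actually applied

    def total(self) -> float:
        return self.idle_payroll + self.lost_services + self.penalties


def downtime_losses(hours_down: float, cost: HourlyCost) -> float:
    """losses = hours of downtime x cost per hour of the service."""
    return hours_down * cost.total()


# Hypothetical inputs: 60 blocked employees at 2,500 tenge per paid hour,
# no lost service volume and no penalties counted (the conservative choice).
cost = HourlyCost(idle_payroll=60 * 2_500)
annual_losses = downtime_losses(hours_down=28, cost=cost)

print(f"Cost per hour: {cost.total():,.0f} tenge")
print(f"Annual losses: {annual_losses:,.0f} tenge")
```

Keeping each component as a separate field makes it easy to show the committee which parts of the hourly cost are measured and which are assumed.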

Hidden downtime from “slowness” is easy to convert to waiting hours. Example: 120 employees lose 10 minutes a day because the accounting system is slow. That’s 20 person-hours per day. Multiply by the cost per hour and by the number of working days.

To avoid disputes about assumptions, present a range:

conservative: count only emergency downtimes and the minimal cost per hour
realistic: add partial outages and degradations plus lost service volume

This shows the committee a corridor of effect instead of one “magical” number.
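The two scenarios can be sketched in Python as follows. The hours per category, the impact weights for partial outages and degradations, and both hourly costs are hypothetical placeholders; the weights in particular are an assumption of this sketch, not something the method prescribes.

```python
# Hypothetical annual hours per downtime category (from tickets or monitoring).
HOURS_BY_TYPE = {"emergency": 28, "partial": 15, "degradation": 40}

# Assumed share of the full hourly cost that each category actually causes.
IMPACT_WEIGHT = {"emergency": 1.0, "partial": 0.5, "degradation": 0.25}


def scenario_losses(counted_types, cost_per_hour):
    """Price the weighted downtime hours of the categories a scenario counts."""
    weighted_hours = sum(HOURS_BY_TYPE[t] * IMPACT_WEIGHT[t] for t in counted_types)
    return weighted_hours * cost_per_hour


# Conservative: emergency downtime only, minimal cost per hour.
conservative = scenario_losses(["emergency"], cost_per_hour=150_000)

# Realistic: all three categories, hourly cost that also includes lost service volume.
realistic = scenario_losses(["emergency", "partial", "degradation"], cost_per_hour=250_000)

print(f"Annual losses: {conservative:,.0f} to {realistic:,.0f} tenge")
```

Both scenarios read from the same hour counts, so the only things the committee has to accept or dispute are the counted categories, the weights and the hourly cost.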

How to estimate gains from system acceleration without complex math

Committees rarely care about GHz or benchmark scores. They care how speed improvements affect people and processes: how many minutes per day are lost waiting, how many tasks are delayed, and where overtime occurs.

Rely on a few indicators: response time of key systems (accounting, document workflow), report generation times, batch operation speeds (day-end closing, exports, calculations) and the share of peak hours when everything “lags.”

Simple monetary calculation:

Pick 3–5 typical operations that many users perform frequently. Measure “before” on current infrastructure and “after” on a pilot or test bench. Convert the effect:

daily time saved = (time_before - time_after) × operations per user per day × number of users
annual time saved = daily time saved × working days
monetary effect = annual hours saved × average employee hourly cost

Example: a report takes 6 minutes now and will take 3 minutes. 40 users run it twice a day. Saving: 3 minutes × 80 runs = 240 minutes/day = 4 hours. Multiply by working days and the hourly rate (use payroll with overheads or an internal rate).
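The report example can be reproduced with a short Python sketch. The function names are illustrative, and the 250 working days and the 3,000 tenge hourly cost are assumptions added here to complete the calculation.

```python
def daily_minutes_saved(minutes_before, minutes_after, runs_per_user, users):
    """(time_before - time_after) x operations per user x number of users."""
    return (minutes_before - minutes_after) * runs_per_user * users


def annual_effect(minutes_per_day, working_days, hourly_cost):
    """Return (hours saved per year, monetary effect per year)."""
    hours_per_year = minutes_per_day / 60 * working_days
    return hours_per_year, hours_per_year * hourly_cost


# From the example: 6 minutes before, 3 minutes after, 40 users, 2 runs a day.
minutes = daily_minutes_saved(6, 3, runs_per_user=2, users=40)

# Assumed: 250 working days and 3,000 tenge per employee hour (with overheads).
hours_per_year, money = annual_effect(minutes, working_days=250, hourly_cost=3_000)

print(f"{minutes} minutes/day, {hours_per_year:.0f} hours/year, {money:,.0f} tenge/year")
```

Running the same two functions for each of the 3–5 measured operations and summing the results gives the total effect of the speed-up.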

Do not promise “10× faster” without conditions. State what changes along with servers (storage, network, DB) and where speed may not improve (application bottleneck, license limits, external channels). That makes the calculation honest and verifiable.

Support and maintenance: where costs usually hide

Committees often see only the purchase price, while main losses sit in support. These costs are hard to spot because they are spread over time: small tickets, onsite visits, emergency part replacements, manual workarounds after failures.

Count support costs as time to handle an incident (from registration to closure), visits or remote sessions, component replacements, manual workarounds (reboots, log cleanup, service restores) and downtimes that require extra IT effort.

A practical way to estimate savings is to start from incident counts and labour per incident. For example: currently 18 incidents/month, average effort 2.5 hours, two specialists involved (1st/2nd line), plus one onsite visit per month. After platform unification you forecast 10 incidents/month and 1.5 hours per incident. Labour savings are immediately visible without complex models.
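A rough Python sketch of that estimate is shown below. It assumes that the effort figures are per specialist, that two specialists stay involved after unification, and that one FTE is about 1,800 working hours a year; onsite visits are left out. All three are assumptions of this sketch.

```python
def monthly_labour_hours(incidents, hours_per_incident, specialists):
    """Person-hours spent on incidents in one month."""
    return incidents * hours_per_incident * specialists


# Figures from the example; the second line is the forecast after unification.
before = monthly_labour_hours(incidents=18, hours_per_incident=2.5, specialists=2)
after = monthly_labour_hours(incidents=10, hours_per_incident=1.5, specialists=2)

saved_per_month = before - after
saved_per_year = saved_per_month * 12

HOURS_PER_FTE = 1_800  # assumed annual working hours of one full-time specialist
freed_fte = saved_per_year / HOURS_PER_FTE

print(f"Before: {before:.0f} h/month, after: {after:.0f} h/month")
print(f"Saved: {saved_per_year:.0f} h/year, about {freed_fte:.1f} FTE")
```

Multiplying the saved hours by the internal hourly rate turns the result into a money line for the summary table.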

Show effects in a few lines:

reduction in incidents (pcs/month) and average time to resolution (hours)
fewer onsite visits and emergency procurement
unification: fewer models and spare parts, simpler training and procedures
freed IT hours for institutional tasks (without increasing headcount)

Also define the spare parts and support plan: what is stored locally (disks, PSUs), what is covered by warranty, required response times. If you count on 24/7 supplier support and a vendor’s service network, present this as reduced risk of long outages and emergency visit costs. Always state assumptions: current incident statistics, expected unification effect and limits (for example, "assuming current load and operation mode").

TCO for 3–5 years: a simple cost model

To convince the committee, show not only the purchase price but the full cost of ownership (TCO) over 3–5 years. This horizon is easy to defend: it matches typical server refresh cycles and reveals recurring yearly costs even if CAPEX is one-off.

Start with two cost buckets and list items in plain language so finance and security understand what’s counted.

CAPEX: servers, storage (if needed), network gear, racks and UPS, installation and commissioning
OPEX: power and cooling, service and spare parts, warranty extensions, licenses and subscriptions, contractor or internal support labour

Compare the “leave as is” scenario against “modernize.” The base (as-is) must be honest or the calculation will be challenged.

In the ‘leave as is’ case include: rising repair costs as equipment ages, support renewal costs (if applicable), downtime risks and time staff spend on manual workarounds. In the ‘modernize’ case add: training, migration, maintenance windows and lower OPEX due to more predictable service.

A simple format: take 3 years, show CAPEX in year 1, then OPEX per year, and the net difference between scenarios.
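This three-year format can be sketched in Python. Every figure below is a hypothetical placeholder in million tenge, and the OPEX lines are assumed to already include support, repairs and priced downtime.

```python
def cumulative_costs(capex, opex_by_year):
    """Running total per year: CAPEX lands in year 1, OPEX is added every year."""
    totals = []
    running = capex
    for opex in opex_by_year:
        running += opex
        totals.append(running)
    return totals


# Hypothetical figures, million tenge. "As is" has no CAPEX, but its yearly
# costs grow as equipment ages; "modernize" pays CAPEX once, then lower OPEX.
as_is = cumulative_costs(capex=0, opex_by_year=[9, 10, 11])
modernize = cumulative_costs(capex=14, opex_by_year=[4, 4, 4])

for year, (a, m) in enumerate(zip(as_is, modernize), start=1):
    print(f"Year {year}: as-is {a}, modernize {m}, difference {a - m}")
```

With these placeholder numbers the difference turns positive in year 3, which is the kind of crossover point a committee usually looks for in the table.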

Step-by-step: build the business case in 5 working days

Treat the case as a short one-week project: each day yields a verifiable result you can review.

5-day plan

Day 1: document the problem with facts. Collect 3–5 pieces of evidence: incidents from Service Desk, SLA reports, user emails, downtime windows, recovery times. Dates, durations and affected systems are required.
Day 2: describe 2–3 upgrade scenarios. Usually “minimal to reduce risk,” “target for 3 years,” and “phased if budget is cut.” For each scenario list changes: servers, storage, redundancy, support.
Day 3: calculate effects across three buckets. Downtime (hours/year × cost/hour), speed (how many people wait and how much time is lost), support (current cost of the old fleet: visits, parts, night work).
Day 4: consolidate into one table. Investment, annual savings, payback period, assumptions and risks. If a vendor or integrator submits proposals, list what is included in the price.
Day 5: prepare a 5–7 slide defense. Problem, options, calculations, risks, migration plan, and what you need from the committee.

A short example assumption: “ERP is down 6 hours per quarter, 80 users affected, cost per hour 4,000 tenge.” Committees prefer a calculation with clear logic over “exact” figures without sources.

Example scenario with clear assumptions

District organization: 220 employees, three key systems—1C and accounting services, electronic document workflow, and a common file share (templates, orders, scans). Users complain about hangs; IT spends time on manual restarts and recovery.

Source data from last year: 8 emergency downtimes. Average duration per downtime 2.5 hours plus about 1 hour of post-recovery catch-up (queues, reprocessing). Total 28 hours of unavailability. Assumed cost per hour of downtime: user time plus IT overtime, conservatively 150,000–250,000 tenge/hour (not lost profit but real paid time and broken schedules).

Modernization scenario: update servers and standardize the platform (same OS versions, typical roles, unified monitoring and backups). The goal is not to make everything 10× faster, but to reduce failures and simplify support.

Expected conservative effect: 8 downtimes/year become 2, average duration 1 hour. Unavailability drops from 28 to 2 hours — a saving of 26 hours. In money: 3.9–6.5 million tenge per year. Plus a 20–30% drop in support tickets (Service Desk), freeing 0.2–0.4 FTE in IT without hiring.

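If the modernization budget is 10–18 million tenge (servers, implementation, migration, tests), payback on downtime savings alone ranges from about 18 months (10 million budget, 6.5 million saved per year) to about 55 months (18 million budget, 3.9 million saved per year); freed IT time shortens it. This format works well: a range, assumptions and verifiable sources.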
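The payback arithmetic for this example can be checked with a short Python sketch. It counts downtime savings only: the freed IT time (0.2–0.4 FTE) is not priced here, so the slow end of the range is longer than it would be with support savings included. The function name is illustrative.

```python
def payback_months(budget, annual_saving):
    """Simple payback: months until cumulative savings cover the budget."""
    return budget / annual_saving * 12


# From the example: unavailability drops from 28 to 2 hours per year.
hours_saved = 28 - 2

# Cost per hour of downtime: 150,000 to 250,000 tenge.
savings = [hours_saved * cost for cost in (150_000, 250_000)]

# Modernization budget: 10 to 18 million tenge.
budgets = (10_000_000, 18_000_000)

fastest = payback_months(min(budgets), max(savings))
slowest = payback_months(max(budgets), min(savings))

print(f"Annual saving: {savings[0]:,} to {savings[1]:,} tenge")
print(f"Payback: {fastest:.1f} to {slowest:.1f} months")
```

Showing both corners of the range makes it clear which combination of budget and hourly cost the committee is actually approving.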

To make the committee "see the numbers" attach:

incident export with dates and durations
short “before” timings for typical operations (opening a report, posting a document, processing)
support ticket statistics by category
calculation of cost per hour of downtime with formula and sources (payroll, user counts)
migration plan and maintenance windows

Common mistakes that make the case fail

The committee often rejects the calculations rather than the modernization itself. Even a good project looks weak if the numbers come from “thin air” or costs are combined so they can’t be verified.

First issue — overly optimistic percentages for downtime reduction. If you claim "-70%", show where that comes from: incident statistics, ticket logs, monitoring. Without basis, stick to conservative figures (15–25%) and show a range.

Second — mixing one-off and annual costs in the same line. The committee needs to see what is one-time (hardware, implementation) and what repeats yearly (support, licenses, power). When mixed, the total is easily challenged.

Third — forgetting the “invisible” work: admin training, testing, pilot, migration windows, rollback plan. If you move a critical system, allow time for parallel runs and backup checks. Otherwise promised gains may turn into missed deadlines.

Fourth — ignoring risks: delivery times, compatibility with legacy software, license limits and security requirements. This is critical when tied to specific OS, DB or virtualization versions.

Fifth — no phased option. If budget is cut, the case must survive: upgrade the core (critical services) first, then the rest.

Self-check list:

key percentages have sources and a conservative scenario
CAPEX and OPEX are separated and labeled in plain words
migration, testing, training and buffer time are included
supply, compatibility and licensing risks are described with mitigations
there is a phased rollout plan with measurable effect at each step

If you work with local vendors and integrators, also record how this affects delivery times, support and procurement requirements. No promises — only verifiable facts.

Risks and assumptions: how to state them honestly and convincingly

Committees catch unstated expectations more often than wrong numbers. Write risks and assumptions briefly and next to each say what you'll do to prevent the risk becoming a problem.

A good rule: each risk answers “what can go wrong?” and its mitigation answers “what do we do beforehand?”. Switching to new servers almost always entails a short downtime. Admitting this and planning a night or weekend window is stronger than promising “no downtime.”

Project risks and mitigations

Delivery and implementation times. Plan with buffers and stages (test environment first, then production) so a delay in one part doesn’t derail everything.
Migration and switch downtime. Pilot one service, rehearse the switch, and have a rollback plan to the old equipment.
Compatibility and performance. Test key systems on a test server, agree acceptance metrics (response time, CPU load, IOPS) and pass/fail criteria.
Resilience. Redundancy for power, disks and network, and regular recovery tests so “there is a reserve” is proven in practice.
Personnel factor. Assign an IT owner for migration, a clear work schedule and vendor responsibility for support in the first weeks.

How to record assumptions so they are accepted

State assumptions as a list: “we assume this, because otherwise the calculation would change.” Indicate sources and what changes if an assumption fails.

Examples:

cost per hour of downtime is based on internal data (lost employee time, penalties). If exact data is missing, use a conservative estimate and state it.
load forecast is confirmed by the department. If growth doesn’t happen, the effect is lower, but risk of failures is still reduced.
the maintenance window for switching is agreed with the business. If the window is shortened, preparation and contingency costs increase.

If you depend on supplier service conditions (response time, spare parts, 24/7 coverage), list them in assumptions and check that they are documented.

Quick checklist before submitting to the budget committee

Before submission: data, calculations, evidence

Collect the minimum that can be quickly rechecked:

environment and criticality: list of servers and systems, service owners, number of users, acceptable maintenance window, current availability requirements
reliability facts: downtimes for 6–12 months (tickets or monitoring), MTTR, main causes (hardware, power, storage, OS, human)
calculation logic: base scenario (as-is) and target (post-upgrade), units (hours, tenge, FTE), no double counting
full cost of ownership: what is included in CAPEX and OPEX for 3–5 years (support, spare parts, licenses, power, rack space, visits, training)
appendices for verifiability: specs and quotes, migration plan with dates and owners, assumptions and risk table, sign-off from key service owners

If procurement depends on local content or compliance, attach supplier documents (domestic manufacturer status, ISO certificates, service coverage). This reduces questions before the defense.

At the defense: what to say and how to answer

Three effective theses:

we are buying a reduction in downtime and risk, with a clear hourly cost and confirmed statistics
the chosen option gives the best 3–5 year effect by TCO and support, not because it’s "more powerful"
the transition is planned to reduce implementation risk (pilot, maintenance window, rollback, clear responsibilities)

Three typical questions and short answers:

Why now? There are measurable signals: rising incidents, end of support, spare part shortages, increased load.

Why not extend current equipment? It usually does not close key risks and can be more expensive due to emergency repairs and downtime.

What if the effect is smaller than expected? The model includes a conservative scenario and assumptions that will be monitored by KPIs after launch.

Next steps: from calculation to project

Once numbers are agreed, the committee wants to see you understand the path from idea to launch. Turn the case into a concrete plan: what to buy, how to migrate, who is responsible, and how results will be measured.

Define 2–3 modernization options (spot replacements, virtualization cluster refresh, hybrid approach) and request commercial offers. In the request state the assumptions used in calculations (current downtimes, critical services, maintenance windows) so offers are comparable.

To reduce migration fear ask the vendor or integrator for a short assessment of phases and risks: what can be moved without downtime, where a night slot is required, and which tests are mandatory.

If local production and supply chain transparency are important, record this as a procurement factor and confirm with documents. For example, GSE.kz (gse.kz) is a technology manufacturer and system integrator in Kazakhstan that assembles PCs and servers locally and provides round-the-clock support through a service network. Such parameters are convenient to state in assumptions and confirm in procurement documents.

A pilot is useful when performance or compatibility is in dispute. Pick one service or a small environment and set success criteria in advance:

response time and batch operation times
actual downtime during migration
support load (number of incidents)
backup and recovery requirements

If you need a contractor, check scope and SLA: which works are included (migration, tests, documentation), who is responsible for rollback, response times for incidents and how performance is recorded.