What is downtime and why calculate its cost

Workstation downtime is the time when an employee cannot perform their main work due to an IT problem. It’s not just a “broken PC.” This usually includes network or internet outages, freezes or errors in key software, printer and peripheral issues, and situations where someone cannot log in because of a locked account, expired password, or access rights.

Calculating the cost of downtime matters because an “hour of downtime” is almost never equal to one hour of salary. Even with a fixed salary, downtime often triggers a chain reaction: deadlines shift, service quality drops, queues grow, colleagues work overtime, and managers spend time on investigations and manual workarounds.

A calculation is useful for practical decisions: buying new PCs or servers, choosing support levels and response times, agreeing on SLAs and service contracts, and planning upgrades (OS updates, moving to a new office suite version, strengthening link redundancy). When you have numbers, the conversation about “reliability” turns into choosing between clear options and their costs.

Usually the calculation helps decide:

which SLA for recovery time is justified for different departments;
where to replace equipment versus where improved support and processes are enough;
which workstations are critical (front office, registration, cash desk) and which can tolerate longer downtime;
whether to invest in backup channels, spare PCs, and standard images.

If a workstation in the service window freezes, not only the specialist’s time is lost. People in the queue lose time and sometimes trust in the service. So a simple one-page model helps justify IT service quality requirements with numbers rather than feelings.

What makes up the cost of downtime

Workstation downtime cost is almost never just the employee’s salary for the “idle” time. That’s a convenient lower bound. In reality, downtime also brings losses in service volume, queues, and quality.

1) Direct losses: paid time without results

The most obvious part is the hours when an employee is present but cannot work due to PC, software, network, or peripheral problems. Usually you take the hourly cost (salary plus taxes and contributions) and multiply by downtime.

2) Service losses: unperformed operations and “missed revenue”

If the employee produces measurable results (appointments, transactions, requests, system operations), downtime becomes underproduction. In a government office this is fewer processed requests, in a bank fewer transactions, in a clinic missed appointments.

Then one key question matters: is the result lost forever or merely delayed? If the client leaves, it’s closer to “lost revenue.” If the task is postponed, queues and overload usually increase.

3) Queue and delays: domino effect

Even a short incident makes waiting times grow quickly. For example, if the client flow is high, 30 minutes of downtime for one operator at a service window means not “minus 30 minutes” but several hours of backlog. Load often shifts to colleagues, creating a hidden cost: they delay their own tasks.

4) Quality risks: mistakes and rework

After downtime, employees usually rush to catch up. This leads to more errors, returns, repeated calls, and re-entering data. It’s hard to measure precisely, but you can at least estimate time for fixes and the number of repeat requests.

5) Indirect effects: deadlines, fines, reputation

If a process has fixed deadlines (regulation, contract, internal KPI), downtime can cause delays. Then fines, overtime, increased support requests, and complaints appear. Reputation rarely converts directly into money, but to justify reliability requirements it’s usually enough to show that downtime increases both costs and the risk of failing obligations.

A practical approach: first calculate the “minimum” (direct losses), then add 1–2 of the most noticeable components for your process. Most often that’s unperformed service and increased queue.

One-page model: rows and columns you need

One page in Excel or Google Sheets is enough to consistently calculate losses for different departments. The sheet should be understandable not only to IT but to a manager: what happened, how much time was lost, how much it cost, and why.

Columns (what to record for each incident)

It’s convenient to log row-by-row: one row = one incident (or one day of downtime if an incident lasts several days).

Date (or period), department, workstation type (office, cash desk/service window, engineering, remote)
Cause (PC, network, account, software) and who resolved it (IT, contractor, user)
Incident duration (min/hour) and share of actually lost time (0–100%)
Hourly cost of the employee and a coefficient for impact on service/queue
Final losses (auto-calculated)

How to calculate key fields (simple rules)

Share of actually lost time is needed because not every incident equals a full stop. If an employee waited 30 minutes but spent 20 minutes answering emails by phone, the time loss would be 10/30 = 33%.

Take the hourly cost as: (salary + mandatory contributions) / working hours per month. If you don’t want to argue about accuracy, add a single overhead coefficient (for example, 1.2) and use it for everyone.

Service/queue impact coefficient helps to quickly account for the domino effect when one person’s downtime slows the flow, increases waiting, and causes repeat requests. Start with 2–3 levels, for instance: 1.0 for an office role without queue, 1.3 for customer flow outside peak, and 1.6–2.0 for peak hours.
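The three fields above can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary keys, the 160-hour month, and the payroll figures are assumptions, not values from a real organization.

```python
def hourly_cost(salary, contributions, hours_per_month, overhead=1.0):
    """(salary + mandatory contributions) / working hours per month,
    multiplied by an optional single overhead coefficient."""
    return (salary + contributions) / hours_per_month * overhead


def lost_share(incident_minutes, useful_minutes):
    """Share of the incident with no useful work, from 0.0 to 1.0."""
    if incident_minutes <= 0:
        raise ValueError("incident_minutes must be positive")
    lost = min(max(incident_minutes - useful_minutes, 0), incident_minutes)
    return lost / incident_minutes


# Coefficient levels suggested in the text; the key names are made up here.
# The peak level uses the lower end of the 1.6-2.0 range.
SERVICE_IMPACT = {"office": 1.0, "flow_off_peak": 1.3, "flow_peak": 1.6}

# Example from the text: 30 minutes of waiting, 20 of them spent on email.
share = lost_share(30, 20)  # 10/30, roughly 0.33

# Hypothetical payroll figures, for illustration only.
cost = hourly_cost(salary=400_000, contributions=80_000, hours_per_month=160)
print(round(share, 2), cost)
```

In a spreadsheet the same logic fits into two formula columns; the code form is mainly useful if incidents are exported from a ticket system.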

Step-by-step calculation: from incident to sum

You don’t need a complex model to calculate downtime losses. Describe each incident so the calculation can be easily repeated weekly or monthly.

Five steps you can do in 10 minutes

Record downtime duration and frequency. Measure from the moment the employee could not perform core work until recovery. If there was waiting for confirmation or issuing a replacement, that is also downtime.
Calculate hourly cost of the employee in a clear way. The simplest option: (salary + taxes/contributions + regular allowances) / planned working hours per month. If you don’t have data, use salary/hour as the minimum and separately show “full cost.”
Estimate the share of lost output. Downtime is not always 100% loss: sometimes part of the tasks can be done without the system. Use a coefficient from 0 to 1 (for example, 0.7 if 30% of time was used for useful work).
Add the effect on service or queue. This is more important than salary where there is customer flow or strict deadlines: how many requests weren’t processed, how many appointments were missed, what penalty or revenue loss is possible.
Summarize and make 2–3 scenarios. Basic formula:

Downtime x Hourly cost x Share of loss + Service impact.

Mini example

An operator at a service window is idle for 1.5 hours. Hourly cost = 4,000 KZT. Output loss = 0.8 (part of the time was spent consulting without the system). Base: 1.5 x 4,000 x 0.8 = 4,800 KZT.

Queue effect: on average 6 clients per hour, margin = 700 KZT per contact. Loss: 1.5 x 6 x 700 = 6,300 KZT.

Total: 11,100 KZT for one incident. For reporting, show three variants: optimistic (lower losses), base, and pessimistic (higher queue, higher loss coefficient). This helps justify SLA requirements, support level (for example, 24/7), and the need for spare workstations.
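The mini example can be reproduced with a short Python sketch of the basic formula. The function names are illustrative, and only the base inputs come from the example; the optimistic and pessimistic inputs are assumptions chosen to show the shape of a three-variant report.

```python
def incident_loss(downtime_hours, hourly_cost, loss_share, service_impact=0.0):
    """Downtime x Hourly cost x Share of loss + Service impact."""
    return downtime_hours * hourly_cost * loss_share + service_impact


def queue_effect(downtime_hours, clients_per_hour, margin_per_contact):
    """Value of the contacts that were not served during the downtime."""
    return downtime_hours * clients_per_hour * margin_per_contact


# Base case with the figures from the mini example.
base = incident_loss(1.5, 4_000, 0.8)            # direct part, 4,800 KZT
queue = queue_effect(1.5, 6, 700)                # queue part, 6,300 KZT
total = incident_loss(1.5, 4_000, 0.8, service_impact=queue)

# Three variants for reporting. Only the "base" row comes from the text;
# the optimistic and pessimistic inputs are assumptions.
scenarios = {
    "optimistic": {"loss_share": 0.6, "clients_per_hour": 4},
    "base": {"loss_share": 0.8, "clients_per_hour": 6},
    "pessimistic": {"loss_share": 1.0, "clients_per_hour": 8},
}
for name, s in scenarios.items():
    impact = queue_effect(1.5, s["clients_per_hour"], 700)
    print(name, round(incident_loss(1.5, 4_000, s["loss_share"], impact)))
```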

How to estimate impact on services and queues

It’s important to see not only salary losses but what exactly stopped happening: appointments, calls, request processing, document issuance. This is especially visible where work is a flow.

The simplest conversion of downtime into “unperformed services”: take the service rate (services per hour) and multiply by downtime. The rate can be taken from regulations, last month’s statistics, or the historical average.

A small set of formulas is usually enough to keep handy:

Unperformed services = (average speed, services/hour) x (downtime, hours)
Revenue or value loss = unperformed services x value per service (or penalty/risk)
Queue growth = unperformed services from downtime (they move to later)
Additional waiting = queue growth / actual processing speed after recovery
Peak adjustment = downtime during peak x coefficient 1.3–2.0 (by load)
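As a rough sketch, the formulas above translate into Python one to one. The function names are illustrative, the value of 700 KZT per service is an assumed figure, and the sample input (6 services per hour, 40 minutes of downtime) is just one plausible service-window case.

```python
def unperformed_services(rate_per_hour, downtime_hours):
    """Services that were not performed during the downtime."""
    return rate_per_hour * downtime_hours


def value_loss(services, value_per_service):
    """Revenue or value loss for the unperformed services."""
    return services * value_per_service


def additional_waiting_hours(queue_growth, rate_after_recovery):
    """Queue growth divided by the actual processing speed after recovery."""
    return queue_growth / rate_after_recovery


def peak_adjusted_hours(peak_downtime_hours, coefficient):
    """Downtime during peak, weighted by a load coefficient (1.3 to 2.0)."""
    return peak_downtime_hours * coefficient


backlog = unperformed_services(6, 40 / 60)        # about 4 clients move to later
loss = value_loss(backlog, 700)                   # 700 KZT per service is assumed
catch_up = additional_waiting_hours(backlog, 6)   # backlog processed at the normal rate
print(round(backlog), round(loss), round(catch_up * 60))
```

The catch-up estimate is deliberately simple: it treats the whole post-recovery rate as available for the backlog, so with a continuing client flow the real figure will be higher.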

If there is a queue, one workstation’s downtime affects others: clients don’t disappear, they wait. Waiting time increases, the share of “left without being served” grows, and repeat requests appear. Even after recovery, the team takes time to catch up with the backlog.

Separate out downtime of key roles: operator, cashier, dispatcher, doctor, accountant. Their downtime often blocks the chain: others are present but the process is stuck (cannot register, pay, schedule, or close a case).

Don’t forget workarounds. If part of the work moved to paper, personal phone, or someone else’s PC, that is not zero loss but a cost transfer: more time, more errors, and later reprocessing.

Example: at a service window they usually serve 6 clients per hour. If one station was down for 40 minutes during peak, that’s about 4 clients who will move into the queue. Then calculate how long it takes to catch up and what that did to waiting and service quality.

Where to get data without complex analytics

You don’t need a big BI report to estimate losses. A few simple numbers from sources that most companies already have are enough.

Data sources usually already available

Start with Service Desk or at least a ticket log (email, chat, Excel). There you usually have date and time of the request, problem type, and a comment about the result. If monitoring exists (network or server availability), it helps confirm outages and shows when things went down and up. A short talk with the department manager is useful: they quickly separate incidents that truly stopped work from those that were just inconvenient.

If you need quick data, collect a minimal set for two weeks:

start date and time (when the employee lost the ability to work)
restore date and time (when they could perform tasks again)
employee role (operator, doctor, accountant, manager)
cause of downtime (category)
how many people were affected (1 or a group)

Before calculating, agree on boundaries. Count from the moment “cannot perform core work” to “can work again.” Do not include user waiting time if the workstation is already restored. Separately note cases where a spare PC could have been used.
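If the two-week log is kept as plain records, downtime can be derived from the start and restore timestamps. The sketch below assumes a `YYYY-MM-DD HH:MM` text format and made-up sample rows; both are illustrative choices, not a required layout.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed log format


def downtime_hours(start, restore):
    """Hours from 'cannot perform core work' to 'can work again'."""
    delta = datetime.strptime(restore, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    if delta.total_seconds() < 0:
        raise ValueError("restore time is earlier than start time")
    return delta.total_seconds() / 3600


# Hypothetical log rows with the five fields listed above.
log = [
    {"start": "2024-03-04 10:15", "restore": "2024-03-04 11:00",
     "role": "operator", "cause": "PC", "people": 1},
    {"start": "2024-03-05 14:00", "restore": "2024-03-05 14:30",
     "role": "accountant", "cause": "network", "people": 4},
]

for row in log:
    hours = downtime_hours(row["start"], row["restore"])
    # Person-hours: an outage that hits a group counts once per affected person.
    print(row["role"], row["cause"], hours, hours * row["people"])
```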

How to avoid arguing about salaries and speed up agreement

Don’t discuss individual salaries. Use an average hourly cost by role or department (payroll fund + taxes divided by working hours). For a basic model, 3–5 roles are usually enough. This reduces disputes and speeds up approval.

Group downtime causes broadly to see where main losses are: PC (hardware), OS and updates, network/Wi‑Fi, accounts and access, peripherals (printer, scanner, cash register). For example, in a clinic one failing scanner may cause fewer minutes of downtime but more complaints because of the queue. So keep the category separate.

Scenarios: how to show the effect of improvements

To justify improvements, it’s easier to compare three clear scenarios. Then losses turn from an abstraction into choices: what to change and what effect it will have.

3 variants that are easy to compare

Summarize them in one table:

Scenario	What changes	What we recalculate
Current	As is: incident frequency and recovery time	Monthly losses at current MTTR and incident count
Improved service	Faster response and recovery (lower MTTR)	Savings from reduced downtime
Fleet upgrade	Fewer failures and hangs (less frequent incidents)	Savings from reduced incident frequency

The effect of reducing MTTR is direct: if average recovery was 90 minutes and becomes 45, losses per incident halve. Multiply by incidents per month.

The effect of reducing incident frequency is even easier to see: if there were 4 incidents/month and it becomes 2, losses from that cause drop by 50% even without MTTR changes.
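The three scenarios can be compared with one small function. In this sketch the incident counts and recovery times are the ones from the paragraphs above, while the cost of 12,000 KZT per downtime hour is an assumed figure used only to make the numbers concrete.

```python
def monthly_loss(incidents_per_month, mttr_minutes, cost_per_downtime_hour):
    """Monthly losses = incident count x average recovery time x cost per hour."""
    return incidents_per_month * (mttr_minutes / 60) * cost_per_downtime_hour


COST_PER_HOUR = 12_000  # KZT per hour of downtime; assumed for illustration

scenarios = {
    "Current": {"incidents": 4, "mttr": 90},
    "Improved service": {"incidents": 4, "mttr": 45},   # lower MTTR
    "Fleet upgrade": {"incidents": 2, "mttr": 90},      # fewer incidents
}

current = monthly_loss(4, 90, COST_PER_HOUR)
for name, s in scenarios.items():
    loss = monthly_loss(s["incidents"], s["mttr"], COST_PER_HOUR)
    print(f"{name}: loss {loss:,.0f} KZT, savings {current - loss:,.0f} KZT")
```

With these inputs both improvements halve the monthly loss, which is why the choice between them usually comes down to the cost of each measure.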

How to show payback without “magic”

Present results as “monthly savings” vs “cost of the measure” and show payback period. Example: faster support (extra shift, spare workstations, 24/7 for critical points) yields 1.2–1.8 million KZT/month savings at a cost of 0.9 million KZT/month. Upgrading PCs reduces incidents and yields 0.8–1.4 million KZT/month savings with a one-time purchase.
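A minimal payback sketch for the two measures might look like this. The savings ranges and the 0.9 million KZT monthly cost come from the example above; the purchase price of the PC upgrade is not stated there, so the 12 million KZT used here is purely a placeholder.

```python
def net_monthly_effect(monthly_savings, monthly_cost):
    """Savings minus the recurring cost of the measure, per month."""
    return monthly_savings - monthly_cost


def payback_months(one_time_cost, monthly_savings):
    """Months needed for monthly savings to cover a one-time purchase."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return one_time_cost / monthly_savings


# Faster support: savings range from the text, cost 0.9 million KZT/month.
support_net = [net_monthly_effect(s, 900_000) for s in (1_200_000, 1_800_000)]

# PC upgrade: savings range from the text; the purchase price is NOT given
# in the text, so 12 million KZT is a placeholder.
PURCHASE_PRICE = 12_000_000
upgrade_payback = [payback_months(PURCHASE_PRICE, s) for s in (1_400_000, 800_000)]

print(support_net)                              # net effect, low and high
print([round(m, 1) for m in upgrade_payback])   # payback in months, best and worst
```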

To avoid promises you can’t keep, use ranges:

MTTR: “from 60 to 40 minutes,” not “it will be 37 minutes”
Frequency: “minus 20–40%,” not “no incidents”
Losses: “minimum, base, maximum”
Note separately that peak hours have higher queue impact

Example: workstation downtime at a service window

Context: a service window in a government agency or bank. The operator works 09:00–18:00, peaks are 10:00–12:00 and 15:00–17:00. During peak, any failure immediately affects waiting and complaints.

Use the one-page model and simple numbers:

Hourly cost (salary with contributions) – 3,500 KZT/hour
Productivity – 6 clients/hour
Service impact coefficient – 1.6 in peak and 1.3 outside peak
Monthly repetition – based on support requests

Incident 1: “PC does not boot” (peak). Diagnosis and replacement take 45 minutes (0.75 hours). Direct time loss: 0.75 x 3,500 = 2,625 KZT. With service impact: 2,625 x 1.6 = 4,200 KZT.

Incident 2: “no access to the system” (off peak). Password reset and rights check take 25 minutes (about 0.42 hours). Direct loss: 0.42 x 3,500 = 1,470 KZT. With impact: 1,470 x 1.3 ≈ 1,910 KZT.

Total for the day if both occur once: 4,200 + 1,910 = 6,110 KZT. If per month “PC does not boot” happens 3 times and “no access” 8 times, cost is: 3 x 4,200 + 8 x 1,910 = 27,880 KZT.
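The same arithmetic can be scripted so that it is easy to rerun with new incident counts. The sketch below uses exact minutes instead of rounding 25 minutes to 0.42 hours, so the second incident and the monthly total come out slightly lower than the rounded figures in the text; the function name is illustrative.

```python
def incident_cost(minutes, hourly_cost, impact_coefficient):
    """Direct time loss multiplied by the service impact coefficient."""
    return minutes / 60 * hourly_cost * impact_coefficient


HOURLY_COST = 3_500   # KZT/hour, salary with contributions

pc_no_boot = incident_cost(45, HOURLY_COST, 1.6)   # peak
no_access = incident_cost(25, HOURLY_COST, 1.3)    # off peak

# Monthly repetition from the example: 3 boot failures, 8 access problems.
monthly = 3 * pc_no_boot + 8 * no_access

print(round(pc_no_boot), round(no_access), round(monthly))
```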

Then frame the conclusion as a requirement rather than a wish:

For “PC does not boot”: provide quick exchange to a spare workstation and a standard image for recovery (target – 15–20 minutes during peak).
For “no access”: have a clear procedure and response time for support, plus control of typical causes (target – 10–15 minutes).

This turns the calculation into an argument: you’re not asking to “improve support,” you’re asking to reduce losses and queues through specific measures and measurable recovery times.

How to use the calculation for SLA and service requirements

When losses are expressed in money, SLA discussions become easier. Instead of arguing “faster or not,” you can say: downtime in this department costs X KZT per hour, so it’s reasonable to require these response and recovery times.

Metrics to request in the SLA

Three metrics are usually enough and can be measured without complex systems:

Response time: how many minutes/hours until work on the request begins.
Recovery time (RTO): how long until the workstation is ready to perform the key task again.
Availability: share of time the workstation or service is operational (for example, during business hours).

Translate metrics into money. If one hour of downtime costs 12,000 KZT, the difference between recovery in 2 hours and 6 hours is 48,000 KZT per incident. Such figures are useful for budget requests or support contract changes.
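For comparing several candidate recovery targets, the conversion can be wrapped in a tiny helper. This is a sketch: the function name is made up, and the figure of 5 incidents per month is an assumption added only to show how a per-incident difference scales.

```python
def recovery_gap_cost(cost_per_hour, slow_recovery_hours, fast_recovery_hours):
    """Money saved per incident by the faster recovery target."""
    return (slow_recovery_hours - fast_recovery_hours) * cost_per_hour


# Figures from the text: 12,000 KZT per hour, recovery in 6 hours vs 2 hours.
per_incident = recovery_gap_cost(12_000, 6, 2)

# Hypothetical: 5 such incidents per month in the department.
per_month = per_incident * 5
print(per_incident, per_month)
```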

How to prioritize and set requirements

Don’t try to improve everything at once. Use 1–3 months of statistics and calculate losses by cause. Usually the top-3 causes account for the majority (for example, OS failure, disk failure, account issues). Focus strict targets on them.
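Ranking causes by total loss takes only a few lines once incidents are logged with a cause and a calculated loss. The incident list below is entirely hypothetical and stands in for 1–3 months of real data.

```python
from collections import defaultdict

# Hypothetical incident log: (cause, loss in KZT).
incidents = [
    ("OS failure", 42_000), ("disk failure", 30_000), ("account issues", 18_000),
    ("printer", 6_000), ("OS failure", 24_000), ("network", 9_000),
    ("account issues", 12_000), ("disk failure", 15_000),
]


def losses_by_cause(rows):
    """Sum losses per cause and sort from largest to smallest."""
    totals = defaultdict(int)
    for cause, loss in rows:
        totals[cause] += loss
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = losses_by_cause(incidents)
grand_total = sum(loss for _, loss in ranked)
top3 = ranked[:3]
top3_share = sum(loss for _, loss in top3) / grand_total

for cause, loss in top3:
    print(f"{cause}: {loss:,} KZT")
print(f"Top 3 share of all losses: {top3_share:.0%}")
```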

A minimal set of requirement statements:

For critical roles: response within N minutes, recovery within M hours.
For non-critical: response within N hours, recovery within M days.
Escalation: if not recovered within M hours, the next level is involved.
Reporting: monthly incidents count, average recovery time, amount of losses per model.

To quickly reduce losses, simple measures often help: 1–2 spare workstations per site, standardized images and configurations, replacement rules, and clear rules for when issuing a spare is faster than “fixing in place.”

If you select hardware and support for these requirements, check how integrators and vendors implement them locally. For example, GSE.kz (gse.kz) works with infrastructure and workstations for organizations in Kazakhstan, and your downtime model helps define where maximum reliability is needed and where a basic service level is enough.

Common mistakes and how to avoid them

Calculations often “fail” in negotiations because the numbers sound plausible but don’t answer what the business actually lost.

Mistakes that give the wrong total

Confusing “incident time” with actually lost time. An incident might last 2 hours but the employee worked offline for 40 minutes. Record lost minutes by process, not ticket lifetime.
Counting only salary and forgetting service impact. If one workstation increases the queue, losses come not only from that employee but from delays, missed deadlines, and repeat requests.
Picking coefficients “by eye.” Coefficients like “efficiency loss 0.7” should be quickly validated. Five minutes with the department manager is often enough.
Mixing different roles into one average figure. A call center operator and an accountant have different hourly costs and error costs. Break down into at least 2–3 role groups.
Ignoring peak hours and critical processes. Thirty minutes at 10:00 and at 16:50 can cost differently, especially around shift closures, payment windows, or citizen reception.

How to ground the calculation quickly

Test the model on one real case. For example: a PC failure prevented work for 45 minutes, but for 15 minutes the employee handled requests manually. Compare “45 minutes by ticket” and “30 minutes net loss plus customers who missed the window and returned.” Such an example makes the sum clear and defensible.

Checklist and next steps

You don’t need perfect accounting for a first estimate. An honest, clear model that you can refine and compare month to month is enough.

10-minute checklist

Employee role and work type (customer reception, request processing, accounting)
Net time lost per incident (when work was impossible)
Hourly cost (salary with taxes and contributions divided by working hours)
How many operations are usually done per hour and what happened to the queue
What was done manually or later (overtime) and how long it took

This is enough to get a basic estimate and compare types of failures. If data is missing, use conservative assumptions and record them in the model to avoid methodology disputes later.

Minimal pilot and regular updates

For a pilot, choose one department where downtime is noticeable: service window, registration desk, call center, or accounting at month-end. It’s important that there are clear volume and queue metrics.

Update the model monthly with the same method:

collect 5–10 most frequent incidents and their durations;
update hourly cost (if rates or schedules changed);
record service consequences (delays, reschedules, queue growth);
note IT changes (updates, PC replacements, new support rules);
compare results to the previous month and briefly note reasons for changes.

The next step for IT and management is to discuss measures that directly reduce downtime: standard PC configurations or all-in-ones, a spare pool, replacement rules, and clear 24/7 support for critical points.