Why is server monitoring not enough to know if the business is fine?

If you monitor only infrastructure, you can see that services are "alive" but not whether work moves through the process on time. Processes can stall because of queues, manual checks, returns or unclear ownership — issues that don’t show up in CPU, disk or 500 error metrics.

Which process is best to start monitoring to get quick results?

Choose one process where delays directly hurt money, deadlines or trust, and where the flow repeats regularly. Often this is request approvals, support ticket handling or invoicing — start with a process that already has complaints or workarounds.

How to define process boundaries so metrics aren’t disputed?

Agree on two short phrases: “Start = …” and “Finish = …”, tied to specific events or statuses in your system. That removes disputes (for example, whether to count from draft creation or from when the request was sent) and makes metrics comparable.

What data should be collected first to compute basic metrics?

At minimum: request ID, type/topic, initiator, current responsible person, creation time, current status and time of last change. That’s enough to calculate cycle time, spot stuck requests and see where queues form.

What exactly counts as “approval time” and where do people usually err?

“Approval time” is the duration from the chosen start to the chosen finish, not simply the difference between “created” and “closed.” Decide in advance whether you count calendar time or working hours and don’t mix these approaches in the same metric.

How to measure the share of returns so it helps improve the process?

The share of returns is the number of requests explicitly sent back for rework divided by total requests in the period. To make it actionable, record return reasons from a short list rather than free text, otherwise you’ll get too much “other” and few conclusions.

How to tell if a request is really “stuck” and not just complex?

A stuck request is one whose status hasn’t changed longer than an allowed time for that step, not simply a long overall cycle. Define which statuses mean “waiting for action” and set a concrete threshold for each important stage.

How to choose thresholds so alerts don’t become noisy or silent?

Start from facts for the last 2–4 weeks and set thresholds slightly better than the current level so goals are achievable. Use percentiles (especially P90) rather than only averages, and separate request types so urgent requests don’t skew ordinary ones.

What should an alert contain so it can be closed by action?

A useful alert includes the minimum needed to act: request ID, current stage, owner, how long there’s been no movement and what the next step should be. If an alert doesn’t lead to a concrete action or has no owner, it quickly turns into noise.

What are the most common mistakes when implementing process monitoring?

Common failures are vague definitions, too many metrics, alerts that go “nowhere,” and poor data hygiene (empty statuses, backdated edits, unclear return reasons). Also avoid using metrics as punishment — people will learn to bypass measurements instead of improving the flow.

Business Process Monitoring: Simple Metrics and Alerts

Why monitor the process, not just the servers

Server monitoring answers the question: “Is the system alive?” But for the business the more important question is often: “Is work getting done on time and without extra loops?” Servers can look green on all dashboards while requests sit in queues for weeks because nobody picks them up or they keep getting sent back for rework.

When you look only at infrastructure, you see symptoms (CPU load, 500 errors, disk space) but not the outcome. A process consists of people, rules, queues, manual checks and approvals. Delays and quality losses usually appear there first — and they often don’t show up in server metrics.

Typical problems that are “invisible” at the hardware and service level:

queues between stages: a document is ready but waits a week for signatures;
manual steps: someone copies data from an email into a system and that slows the flow;
returns: a request is sent “back for revision” several times and the deadline stretches;
unclear responsibility: it’s not clear at which stage and with whom the task currently is;
bypass exceptions: some requests are resolved in chats and have no history.

Process monitoring shifts the conversation from “everything is running” to “we processed X requests with this time and quality.” Bottlenecks appear faster: you can see which step accumulates a backlog, where returns happen most, and how the situation changes after a rule update.

Success here is measured by simple things: fewer delays, fewer returns, predictable cycle times and clear ownership for each stage. For example, if purchase approvals regularly “stall” at legal review, you’ll see an increase in waiting time specifically at that step even if the system has no errors. Then it makes sense to change the rule, redistribute tasks or update document templates — not to search for a server problem.

Where to start: pick a process and measurement boundaries

Don’t start with dozens of charts. Pick one process where delays really hurt money or trust. Often it’s approval flows (purchases, business trips, access requests), support ticket handling, or invoicing. Trying to cover everything at once wastes time and yields few useful signals.

For monitoring to work, a process needs clear boundaries: where it starts and where it ends. Without that, “approval time” will be calculated differently by everyone and debates will eat the benefit.

How to choose 1–2 starter processes

Evaluate candidates by simple signs. A good starter process:

repeats frequently (daily or weekly);
shows noticeable delays or queues;
involves several teams, so handoffs get lost;
already has complaints or manual workarounds.

For example, in purchase approvals the servers and email may all be green while employees wait weeks and requests are returned over a single form error. Here it’s more important to see process bottlenecks than infrastructure load.

Lock the measurement boundaries with simple rules

Write two short phrases: “Start = …” and “Finish = …”. Start could be when a request gets status “Submitted” (not “Created”, if drafts hang around). Finish could be “Paid” or “Completed”, depending on what you want to control.

Next assign roles. You need a process owner (makes decisions and changes rules) and a data owner (ensures statuses are filled and events are recorded consistently).

Agree in advance what you’ll do with the metrics. Example rules:

if 10% of requests are stuck more than 48 hours at one step, the stage manager reviews the queue daily;
if the share of returns grows, update the form and the applicant checklist.

Data and events: where to get facts for metrics

Process metrics are based on simple facts: when a request appeared, how and when its status changed, who participated, whether it was returned and why. The good news: these data points often already exist, just scattered.

Traces of the process are usually where people already work daily: service desk (incidents and requests), CRM (deals and approvals), corporate email (threads and replies), spreadsheets (request registers), and internal portals (forms and routes). Even if you have a single system, look at status history and comments, not only the final status. Comments often include return reasons or clarifications that explain why a request stalled.

To start, don’t try to collect everything. A minimal set of fields is enough to compute approval time, share of returns and stuck requests without debates:

request identifier (number);
what is requested (type/topic);
initiator and approver(s);
creation date and time;
current status and date of last change.

If returns occur, add one more field: return reason (a list of options, not free text). Then the share of returns will be understandable and comparable across teams.

The main trap is forcing people to fill too much manually. It’s better to capture data automatically: record status changes as events, pull the author and timestamp from the system, and present return reasons as a short picklist.

For example, when approving workstations or servers (a typical IT and procurement scenario), returns often relate to incomplete specifications or wrong budget codes. If that is captured with one click, the metric becomes reliable and alerts about stuck requests stop being noisy.
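A minimal sketch of how such an event record might look. The names `StatusEvent`, `ReturnReason` and `record_status_change`, the status name "Returned" and the picklist values are illustrative assumptions, not fields of any particular service desk or CRM.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List, Optional


class ReturnReason(Enum):
    # Short picklist instead of free text (illustrative values).
    INCOMPLETE_SPEC = "incomplete specification"
    WRONG_BUDGET_CODE = "wrong budget code"
    OTHER = "other"


@dataclass(frozen=True)
class StatusEvent:
    request_id: str
    request_type: str
    initiator: str
    owner: str             # who holds the request after the change
    status: str
    changed_at: datetime   # taken from the system, not typed by hand
    return_reason: Optional[ReturnReason] = None


def record_status_change(log: List[StatusEvent], event: StatusEvent) -> None:
    """Append an event; a return without a picklist reason is rejected."""
    if event.status == "Returned" and event.return_reason is None:
        raise ValueError("a return needs a reason from the picklist")
    log.append(event)


log: List[StatusEvent] = []
record_status_change(log, StatusEvent(
    "REQ-1", "workstation", "alice", "bob", "Submitted",
    datetime(2024, 1, 8, 10, 0)))
record_status_change(log, StatusEvent(
    "REQ-1", "workstation", "alice", "alice", "Returned",
    datetime(2024, 1, 9, 15, 30), ReturnReason.WRONG_BUDGET_CODE))
```

Storing every status change as a separate event, rather than overwriting a single "current status" field, keeps the history needed later for time-in-status and return metrics.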

Three basic metrics: clear definitions

For process monitoring to be useful, metrics must be defined so two people calculate them the same way. Below are three basic metrics that usually reveal bottlenecks within the first week.

1) Approval time

This is the duration of a request’s path from a clear start to a clear finish. A common mistake is to subtract whatever timestamps the system happens to store (for example, “created” and “closed”) without first agreeing on the rules.

The start is usually recorded when the request becomes “ready for approval”: submitted, required fields filled, documents attached. The finish is when a final decision is made (approved or rejected), not when someone just “looked” at it.

Decide separately how to count pauses. Two useful options:

calendar time (how the business experiences it);
active time (how long the team actually worked on it).

For example, if a request waited two days for information from the initiator, that can be part of total time (for end-to-end service evaluation) or excluded (for evaluating approvers). It’s best to store both numbers but not mix them on one chart.
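One way to keep both numbers could look like the sketch below. It assumes a status history of `(timestamp, status)` pairs and hypothetical status names ("Submitted", "Waiting for initiator", "Approved", "Rejected"); active time is approximated here as calendar time minus the intervals spent waiting for the initiator.

```python
from datetime import datetime, timedelta

FINAL_STATUSES = {"Approved", "Rejected"}      # assumed names
PAUSED_STATUSES = {"Waiting for initiator"}    # assumed name


def approval_times(events, start_status="Submitted"):
    """events: (timestamp, status) pairs sorted by timestamp.

    Returns (calendar_time, active_time), or None while the request
    has no start or no final decision yet.
    """
    start = next((t for t, s in events if s == start_status), None)
    finish = next((t for t, s in events if s in FINAL_STATUSES), None)
    if start is None or finish is None or finish < start:
        return None
    paused = timedelta()
    # Each status lasts until the next event; sum the paused intervals.
    for (t, s), (t_next, _) in zip(events, events[1:]):
        if s in PAUSED_STATUSES and t >= start and t_next <= finish:
            paused += t_next - t
    calendar_time = finish - start
    return calendar_time, calendar_time - paused


events = [
    (datetime(2024, 1, 8, 9, 0), "Created"),
    (datetime(2024, 1, 8, 10, 0), "Submitted"),
    (datetime(2024, 1, 8, 12, 0), "Waiting for initiator"),
    (datetime(2024, 1, 10, 12, 0), "In approval"),
    (datetime(2024, 1, 10, 16, 0), "Approved"),
]
calendar_time, active_time = approval_times(events)
```

For this request the calendar time is 54 hours (from submission to decision), while the active time is 6 hours, because 48 hours were spent waiting for the initiator. The draft hour before "Submitted" is not counted in either number.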

2) Share of returns

A return is not just any comment. It’s a clear status change after which the initiator must fix something: “needs revision,” “returned for completion,” “documents do not match.” The share of returns is simply returns / all requests over a period.

To make the metric actionable, classify return reasons. Don’t create 30 categories — 4–5 practical ones are enough. For example: “missing data,” “amount/item error,” “wrong approval route,” “no supporting document,” “other.”

If 60% of returns are “missing data,” that is a signal to improve the form and its prompts, not to pressure approvers.
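The calculation can be sketched as follows. The record layout (a `return_reasons` list with one entry per return) is an assumption for illustration, and the share is counted here as requests returned at least once divided by all requests in the period.

```python
from collections import Counter


def return_stats(requests):
    """requests: dicts with a 'return_reasons' list (one entry per return).

    Returns (share of requests returned at least once,
             {reason: share of all returns}).
    """
    if not requests:
        return 0.0, {}
    returned = sum(1 for r in requests if r["return_reasons"])
    reasons = Counter(x for r in requests for x in r["return_reasons"])
    total_returns = sum(reasons.values())
    breakdown = {k: v / total_returns for k, v in reasons.items()}
    return returned / len(requests), breakdown


requests = [
    {"id": "REQ-1", "return_reasons": []},
    {"id": "REQ-2", "return_reasons": ["missing data", "missing data"]},
    {"id": "REQ-3", "return_reasons": []},
    {"id": "REQ-4", "return_reasons": ["other"]},
    {"id": "REQ-5", "return_reasons": []},
]
share, breakdown = return_stats(requests)
```

Here two of five requests were returned, so the share is 0.4, and "missing data" accounts for two of the three returns.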

3) Stuck requests

A “stuck” request is one that hasn’t changed status longer than the allowed time for a given step. Measure the absence of movement, not just a long total cycle. A request can be long but still active: actions happen every day.

Define which statuses count as “waiting for action” and after how many hours or days this becomes a problem. Simple example: “With manager for approval” without changes for more than 48 hours = stuck. This way you catch bottlenecks precisely instead of blaming the entire process.
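A minimal sketch of this rule, assuming each request carries its current status and the time of its last status change. The 72-hour threshold for "Legal review" and the field names are hypothetical.

```python
from datetime import datetime, timedelta

# Only "waiting for action" statuses get a threshold; others are ignored.
WAITING_THRESHOLDS = {
    "With manager for approval": timedelta(hours=48),
    "Legal review": timedelta(hours=72),   # assumed value
}


def find_stuck(requests, now):
    """Return (id, status, idle time) for requests idle past their threshold."""
    stuck = []
    for r in requests:
        limit = WAITING_THRESHOLDS.get(r["status"])
        if limit is None:
            continue
        idle = now - r["last_change"]
        if idle > limit:
            stuck.append((r["id"], r["status"], idle))
    # Longest idle first.
    return sorted(stuck, key=lambda item: item[2], reverse=True)


requests = [
    {"id": "REQ-1", "status": "With manager for approval",
     "last_change": datetime(2024, 1, 8, 10, 0)},
    {"id": "REQ-2", "status": "With manager for approval",
     "last_change": datetime(2024, 1, 9, 12, 0)},
    {"id": "REQ-3", "status": "Completed",
     "last_change": datetime(2023, 12, 1, 9, 0)},
    {"id": "REQ-4", "status": "Legal review",
     "last_change": datetime(2024, 1, 5, 9, 0)},
]
stuck = find_stuck(requests, now=datetime(2024, 1, 10, 12, 0))
```

REQ-3 is old but finished, so it is not reported; REQ-2 has been waiting only 24 hours. The result lists REQ-4 and REQ-1, in that order.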

Targets and thresholds: set goals that work

Targets exist to quickly separate normal variation from a real problem. If thresholds are too strict, alerts fire constantly and people stop reading them. If too loose, you’ll learn about failures from an unhappy customer.

For approval time it’s better to start with percentiles rather than averages. P50 (median) shows typical time, P90 shows what happens in the tail where delays hide. In practice P90 often reflects the pain: rare but severe cases.
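To illustrate the difference, here is a small sketch using the nearest-rank percentile definition, which is one of several common definitions; other tools may interpolate and give slightly different values. The sample data is invented.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data is less than or equal to it."""
    if not values:
        raise ValueError("no data")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Approval times in hours for ten requests (made-up sample).
approval_hours = [2, 3, 3, 4, 4, 5, 5, 6, 30, 40]
mean = sum(approval_hours) / len(approval_hours)
p50 = percentile(approval_hours, 50)
p90 = percentile(approval_hours, 90)
print(f"mean={mean:.1f} h, P50={p50} h, P90={p90} h")
```

On this sample the mean is 10.2 hours, which describes neither the typical request (P50 = 4 hours) nor the painful tail (P90 = 30 hours).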

Separate norms by request type from the start. Urgent requests usually have a different route and expectations. Mixing urgent and normal requests under one threshold creates false alarms or hides deterioration.

Another source of dispute is the calendar. “24 hours” on the wall clock and “24 working hours” are different things. For approvals it’s usually fairer to count only working hours (for example, 09:00–18:00 on weekdays). Then a request created on Friday evening doesn’t look catastrophic on Monday morning.
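Counting working hours could be sketched like this, assuming a 09:00–18:00 weekday schedule, timestamps in a single time zone, and no public holidays (a real calendar would need a holiday list).

```python
from datetime import datetime, time, timedelta

WORK_START = time(9, 0)
WORK_END = time(18, 0)


def working_hours_between(start, end):
    """Hours between start and end, counting only 09:00-18:00 on weekdays."""
    if end <= start:
        return 0.0
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            window_start = datetime.combine(day, WORK_START)
            window_end = datetime.combine(day, WORK_END)
            overlap_start = max(start, window_start)
            overlap_end = min(end, window_end)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600


friday_evening = datetime(2024, 1, 5, 17, 0)   # a Friday
monday_morning = datetime(2024, 1, 8, 10, 0)   # the following Monday
waited = working_hours_between(friday_evening, monday_morning)
```

The request has existed for 65 calendar hours, but only 2 working hours have passed: one on Friday and one on Monday.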

To set an initial target without overpromising, start from facts for the last 2–4 weeks and set thresholds slightly better than current performance:

capture baseline: current P50 and P90 for approval time;
split into 2–3 categories (for example, urgent vs normal, or by department);
decide whether you count only working hours;
set a goal one step ahead (e.g., improve P90 by 10–20%, not halve it);
define what counts as a breach: “daily P90 above threshold” or “N requests older than threshold.”

Example: if normal requests currently have P50 = 6 working hours and P90 = 2 working days, a reasonable first goal can be P50 ≤ 5 hours and P90 ≤ 1.5 days. For urgent requests set separate goals, e.g., P50 ≤ 1 hour and P90 ≤ 4 hours. This gives clear expectations and enables simple alerts without constant noise.
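A rough sketch of two of these steps: deriving a first target from the baseline and checking the "N requests older than threshold" breach rule. The function names, the 15% improvement and the limit of three requests are illustrative assumptions; 18 working hours stands for two 9-hour working days.

```python
def first_target(baseline_hours, improvement_pct=15):
    """A goal one step ahead: the baseline reduced by a modest percentage."""
    return baseline_hours * (100 - improvement_pct) / 100


def count_breach(open_ages_hours, threshold_hours, max_over):
    """Breach if at least max_over open requests are older than the threshold."""
    over = [age for age in open_ages_hours if age > threshold_hours]
    return len(over) >= max_over, over


baseline_p90 = 18                          # working hours
target_p90 = first_target(baseline_p90)    # about 15.3 working hours
open_ages = [3, 20, 16, 7, 30]             # working hours since submission
breached, offenders = count_breach(open_ages, target_p90, max_over=3)
```

With these numbers three open requests are older than the target, so the rule reports a breach.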

How to set simple alerts: a step-by-step scheme

Alerts should help a person intervene before a client is affected or a deadline is missed. Start with 2–3 clear signals and make them habitual rather than creating dozens of rules at once.

Five-step scheme

Define which signal is useful and who should receive it. A team lead needs to know about requests with no movement; an executor benefits from a near-deadline reminder; the process owner needs to know about a spike in returns.
Pick a simple trigger. Start with one metric: time in status (e.g., “In approval” more than 48 hours), repeat events (“returned” more than twice) or queue accumulation (“awaiting assignment” more than 20 requests). Complex formulas often cause disputed triggers.
Add context so an alert can be closed with an action, not a long chat. Minimum: request ID, current status, current owner, how long it’s been idle and what the next step should be (for example, “assign an executor” or “awaiting manager signature”).
Define channel and quiet hours. Some notifications belong in email, others in chat, others only in a daily digest. Set hours when alerts are suppressed and limit frequency: for example, no more than one message per request every 4 hours.
Do a weekly check of noise. Review which alerts fired most often and what resulted. If a signal is regularly ignored, simplify or replace the rule.
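The context and frequency rules from the steps above can be sketched as follows. The `Alert` and `AlertSender` classes, the quiet window of 20:00–08:00 and the message format are hypothetical; actual delivery to email or chat is left out.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class Alert:
    request_id: str
    status: str
    owner: str
    idle_hours: float
    next_step: str

    def text(self):
        return (f"[{self.request_id}] {self.status}, owner {self.owner}: "
                f"no movement for {self.idle_hours:.0f} h. "
                f"Next step: {self.next_step}")


class AlertSender:
    def __init__(self, min_interval=timedelta(hours=4),
                 quiet_from=time(20, 0), quiet_to=time(8, 0)):
        self.min_interval = min_interval
        self.quiet_from = quiet_from
        self.quiet_to = quiet_to
        self._last_sent = {}   # request_id -> time of the last message

    def _is_quiet(self, now):
        t = now.time()
        if self.quiet_from <= self.quiet_to:
            return self.quiet_from <= t < self.quiet_to
        # The quiet window crosses midnight.
        return t >= self.quiet_from or t < self.quiet_to

    def should_send(self, alert, now):
        """True if the alert may go out now; records the send time."""
        if self._is_quiet(now):
            return False
        last = self._last_sent.get(alert.request_id)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_sent[alert.request_id] = now
        return True


sender = AlertSender()
alert = Alert("REQ-7", "In approval", "bob", 50.0, "awaiting manager signature")
first = sender.should_send(alert, datetime(2024, 1, 10, 10, 0))
repeat = sender.should_send(alert, datetime(2024, 1, 10, 12, 0))
```

The first call allows the message; the repeat two hours later is suppressed by the 4-hour limit.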

How to quickly reduce noise

Keep alerts only where there’s a clear action owner. If there is no owner, fix the request route first. Keep thresholds realistic: alerts for every small delay quickly stop being effective.

Dashboards and reports: who needs what

Monitoring usually fails not because of data, but because of presentation. The same metrics help different roles in different ways. Separate who makes decisions, who manages queues, and who does the work.

For managers: 3–5 numbers that show process health

Managers don’t need a list of requests. They need to see where the process loses time and quality. One screen with 3–5 indicators suffices: trend of approval time (by week), share of returns, share of overdue items and incoming volume. Add top-3 return reasons so discussion moves from feelings to actions.

Show not only averages but also the median or the 75th percentile. Averages are skewed by rare cases; percentiles give a truer picture for most cases.

For executors: what’s stuck and what to do now

For executors the dashboard should look like a workspace: few charts, lots of concrete items. The centerpiece is a list of stuck requests with a clear priority. Priority can be a simple combination of how overdue the request is and how important it is (type, client, department).

Add brief hints to each request: where it’s stuck (stage), how many hours in status, and what’s needed to move it (comment, document, approval).
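One possible way to order such a list is sketched below: hours past the step threshold multiplied by an importance weight. The weights and field names are assumptions; any monotonic combination the team agrees on would work.

```python
IMPORTANCE = {"urgent": 3, "normal": 1}   # assumed weights


def priority(request):
    """Hours past the step threshold, weighted by request type."""
    overdue = max(0.0, request["hours_in_status"] - request["threshold_hours"])
    return overdue * IMPORTANCE.get(request["type"], 1)


stuck = [
    {"id": "REQ-1", "type": "normal", "stage": "Legal",
     "hours_in_status": 60, "threshold_hours": 48, "needs": "approval"},
    {"id": "REQ-2", "type": "urgent", "stage": "Finance",
     "hours_in_status": 10, "threshold_hours": 4, "needs": "document"},
    {"id": "REQ-3", "type": "normal", "stage": "Manager",
     "hours_in_status": 50, "threshold_hours": 48, "needs": "comment"},
]
work_queue = sorted(stuck, key=priority, reverse=True)
for r in work_queue:
    print(f'{r["id"]}: {r["stage"]}, {r["hours_in_status"]} h in status, '
          f'needs {r["needs"]}')
```

The urgent request comes first even though it is only 6 hours overdue, because its weight outranks the normal request that is 12 hours overdue.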

Daily short report: only items that need decisions

A daily report is a morning “pulse.” It should answer three questions: what’s already overdue, what’s about to become overdue, and where the manager’s help is needed. Usually three blocks suffice:

overdue: count and the five oldest cases;
at risk: which items will cross the threshold today (by stage);
needs action: requests without an owner, without a response, or with repeated returns.
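The three blocks can be assembled with a short function like the sketch below. The field names, the 8-hour "today" horizon and the rule of two or more returns are illustrative assumptions.

```python
def daily_report(requests, horizon_hours=8):
    """Build the three blocks of the morning report."""
    def is_overdue(r):
        return r["hours_in_status"] > r["threshold_hours"]

    overdue = sorted((r for r in requests if is_overdue(r)),
                     key=lambda r: r["hours_in_status"], reverse=True)
    # Not overdue yet, but will cross the threshold within the horizon.
    at_risk = [r for r in requests
               if not is_overdue(r)
               and r["hours_in_status"] + horizon_hours > r["threshold_hours"]]
    needs_action = [r for r in requests
                    if r["owner"] is None or r["returns"] >= 2]
    return {
        "overdue_count": len(overdue),
        "oldest_overdue": [r["id"] for r in overdue[:5]],
        "at_risk": [(r["id"], r["stage"]) for r in at_risk],
        "needs_action": [r["id"] for r in needs_action],
    }


requests = [
    {"id": "REQ-1", "stage": "Legal", "hours_in_status": 60,
     "threshold_hours": 48, "owner": "dana", "returns": 0},
    {"id": "REQ-2", "stage": "Finance", "hours_in_status": 44,
     "threshold_hours": 48, "owner": "eli", "returns": 0},
    {"id": "REQ-3", "stage": "Finance", "hours_in_status": 5,
     "threshold_hours": 48, "owner": None, "returns": 0},
    {"id": "REQ-4", "stage": "Manager", "hours_in_status": 10,
     "threshold_hours": 48, "owner": "fay", "returns": 2},
    {"id": "REQ-5", "stage": "Legal", "hours_in_status": 90,
     "threshold_hours": 48, "owner": "dana", "returns": 0},
]
report = daily_report(requests)
```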

To avoid confusion, fix common names and units. For example: “Approval time, hours” = from submission to final decision. Include a short example calculation for one request to reduce disputes.

In organizations that run deliveries, approvals and IT changes in parallel (for example, during infrastructure rollout and 24/7 support), role-based presentation quickly turns metrics into working tools rather than pretty charts.

Practical example: approval flow in an organization

Imagine a purchase or contract approval: the initiator creates a request, it goes through finance, legal, information security (IS) and then manager approval. Servers can be perfect, but the business still halts if requests sit for weeks with a single approver.

To make monitoring useful, agree on exactly which events you record: “request created,” “moved to step,” “approved,” “returned for rework,” “closed.” Then metrics are computed transparently.

Helpful minimum:

time per step (how many hours or days a request spends with finance, legal, IS) — this shows a bottleneck rather than an average across the whole flow;
returns: share of requests returned at least once, and separately the share with repeated returns;
overall cycle: from creation to closure.
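Time per step can be derived from the recorded events roughly as follows. The sketch assumes each event is a `(request_id, step, entered_at)` tuple and that a final "Closed" event ends the last step; the step names and data are invented.

```python
from collections import defaultdict
from datetime import datetime


def time_per_step(events):
    """events: (request_id, step, entered_at) tuples.

    Returns {step: [hours spent there, one entry per visit]}. The last
    event of each request (for example "Closed") only ends the previous step.
    """
    by_request = defaultdict(list)
    for request_id, step, entered_at in events:
        by_request[request_id].append((entered_at, step))
    durations = defaultdict(list)
    for visits in by_request.values():
        visits.sort()
        for (entered, step), (left, _) in zip(visits, visits[1:]):
            durations[step].append((left - entered).total_seconds() / 3600)
    return dict(durations)


events = [
    ("REQ-1", "Finance", datetime(2024, 1, 8, 9, 0)),
    ("REQ-1", "Legal", datetime(2024, 1, 8, 13, 0)),
    ("REQ-1", "IS", datetime(2024, 1, 10, 13, 0)),
    ("REQ-1", "Closed", datetime(2024, 1, 10, 15, 0)),
    ("REQ-2", "Finance", datetime(2024, 1, 9, 9, 0)),
    ("REQ-2", "Legal", datetime(2024, 1, 9, 11, 0)),
    ("REQ-2", "Closed", datetime(2024, 1, 11, 11, 0)),
]
hours_by_step = time_per_step(events)
slowest = max(hours_by_step,
              key=lambda s: sum(hours_by_step[s]) / len(hours_by_step[s]))
```

In this sample both requests spend 48 hours at "Legal" and only a few hours elsewhere, so the per-step view points at the bottleneck that an overall average would blur.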

Keep alerts short and unambiguous:

“Request stuck”: no status change on a specific step longer than X hours (legal and IS thresholds often differ);
“Repeated return”: the request was sent back for rework twice in a row;
“Cycle threshold exceeded”: total approval time exceeded Y days even if each step looked "almost normal."
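The “Repeated return” rule leaves room for interpretation. The sketch below reads “twice in a row” as two returns with no approval in between, which is an assumption; the event names follow the list agreed above.

```python
def has_repeated_return(event_names, limit=2):
    """True if the request was returned `limit` times with no approval between."""
    streak = 0
    for name in event_names:
        if name == "returned for rework":
            streak += 1
            if streak >= limit:
                return True
        elif name == "approved":
            streak = 0   # an approval at any step breaks the streak
    return False


looping = ["request created", "moved to step", "returned for rework",
           "moved to step", "returned for rework"]
recovered = ["request created", "returned for rework", "approved",
             "returned for rework"]
alerts = [has_repeated_return(looping), has_repeated_return(recovered)]
```

The first history triggers the alert; the second does not, because an approval separates the two returns.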

Most important is agreeing on actions. If returns mainly come from legal, the issue is often criteria: no contract template, missing required fields, or wrong documents from the initiator. If one approver consistently causes delays, it’s likely overload or vacation without a replacement, so redistribute tasks and set clear substitution rules.

In implementation and support projects provided by system integrators like GSE.kz, it’s useful to document not only the dashboard but also a “reaction playbook”: who gets notified, when escalations kick in and what counts as a resolution.

Common mistakes and pitfalls in process monitoring

Monitoring usually loses its effect when teams try to track everything at once. You get ten charts that are all “important,” but nobody knows what to do with the numbers. After a couple of weeks reports are ignored because they don’t help decisions.

Common traps:

chasing many indicators: with too many metrics people pick the ones that “look good” and lose trust in the rest;
vague calculation rules: one measures from creation, another from first view, a third excludes weekends;
alerts that go “nowhere”: messages are sent to everyone or no one and no one takes responsibility;
poor data quality: empty statuses, backdated edits, return reasons that mean the same thing;
metrics used as punishment: when people are measured harshly they learn to game the system (close stages formally, split requests, change statuses).

Simple example: a request is stuck at “Legal” for six days. An alert fires but goes to ten people. No one takes it because “it’s not my area.” Meanwhile the card lacks the required return reason field and many returns are logged as “other.” The report shows delays but no clear causes, so nothing improves.

A few rules to avoid these pitfalls:

start with 2–3 metrics that can be explained in one sentence and tied to an action;
write down the definitions: from which event to which, what to exclude, which statuses are valid;
every alert must have an owner and a simple action: “check,” “escalate,” “return with comment”;
agree on data hygiene: required fields, a short list of return reasons, forbid empty statuses;
present metrics as a reason to improve a stage, not to find blame.

This builds trust in the numbers and lets you manage the process instead of arguing about how to calculate time.

Quick checklist and next steps

Process monitoring works only when the process has a clear owner and metrics mean the same thing for everyone.

Quick pre-launch checklist

a process owner is assigned and someone can make decisions based on metrics;
metrics are described unambiguously (from which event to which, which statuses count, what is a return);
thresholds and goals are set (normal, warning, critical) and it’s clear what to do at each level;
alerts contain context: request ID, stage, how long it’s been stuck, who is the owner, what should be the next step;
data is reliable: statuses are actually filled, return reasons are not all “other” but chosen from a clear list.

After that, test alerts in real conditions. One glance at a notification should answer: what happened, where, how urgent it is, and who should do what. If alerts are noisy, people will ignore them. If they are silent, you’ll hear about problems from customers.

30-day plan to avoid getting stuck

Run a pilot on one process and make it part of daily work:

days 1–7: pick the process, fix definitions, start collecting events;
days 8–14: set up 2–3 metrics and simple thresholds, run them on historical data;
days 15–21: enable alerts for a limited group, reduce noise, add context;
days 22–30: confirm the owner, establish a daily check ritual (for example, 10 minutes each morning), and eliminate the first 3–5 recurring causes of delay.

Then expand to neighboring processes, but only transfer approaches that proved useful.

If you need integration with existing systems, event collection infrastructure or 24/7 support, you can discuss this with GSE.kz (gse.kz) as a system integrator so process monitoring becomes a working management tool, not a checkbox report.