Where to start when replacing a WAF and what goals to set

Moving from a commercial WAF to an Nginx + ModSecurity stack almost always starts not with rules, but with reasons. Common drivers are license costs, vendor lock-in, a desire for full control over configuration and logs, and requirements for local support and supply chain transparency. If these reasons aren’t documented up front, the project can quickly turn into endless tuning “to be like before” with no clear idea of the expected outcome.

Set expectations early. A WAF reduces risk and blocks some attacks, but it does not fix vulnerabilities in the code or replace secure development practices. If the application has an SQL injection or broken authorization logic, a filter can temporarily hide symptoms, but the underlying issue remains.

Next, check whether the service is a good candidate for this change. The ideal case is a web application with predictable URLs and forms, clear parameters, stable clients, and decent logs. It’s harder when there are many nonstandard APIs, frequent releases, multiple teams, and any extra false positive hurts revenue. In such systems it can be smarter to keep a managed WAF at the perimeter and use ModSecurity as an additional layer or enable it only for specific domains.

Before starting, agree on things that will save weeks later: who owns the service and who owns the rules (who decides to "block"), SLAs for incidents and false blocks, change windows and rollback procedures, success criteria (for example, the share of attacks mitigated and the acceptable false positive rate), and the initial mode: detection only or blocking from day one.

When goals and boundaries are clear, replacing a WAF stops being "security magic" and becomes a manageable engineering change.

What Nginx and ModSecurity provide in practice

In this model Nginx usually acts as a reverse proxy in front of the application. It accepts incoming HTTP(S) traffic, can terminate TLS, normalize requests, and then forward them upstream. Filtering is enabled at the Nginx level (http/server/location context) by loading the ModSecurity module, so the control point sits before the application — the same place a commercial WAF normally sits.

ModSecurity is a rules engine. Importantly, it has two main modes:

Detection (log only), set with SecRuleEngine DetectionOnly
Blocking, set with SecRuleEngine On

In real deployments teams almost always start in Detection to see what breaks, and only then carefully enable blocking.

A basic start is OWASP CRS — an out-of-the-box rule set that covers many common web-level attacks (injections, suspicious headers, encoding bypasses, odd parameters). CRS doesn’t make a service impenetrable, but it provides a sensible baseline you can live with and improve over time.

The difference with commercial platforms often shows up in what’s not included by default: behavioral analysis and auto-learning for your traffic, advanced bot protection (browser checks, fingerprints, reputation), L7 DDoS mechanics provided by managed services (challenges, global networks, ready-made profiles). Nginx + ModSecurity work well as a transparent, controllable rules layer, but the "automation" of commercial WAFs must be compensated with separate solutions or accepted as an explicit risk.

Attacks typically covered by rules

When you enable OWASP CRS the first wins are usually against common web attacks. This is not "smart protection by itself" but signatures and checks of what appears in URLs, headers, parameters, JSON or form bodies.

The most common category is injections. Rules catch typical fragments for SQLi, XSS and command injections: unexpected quotes and operators, suspicious functions, attempts to break out of a context and inject code. For example, a search request with a parameter like q=' OR 1=1 -- will almost always trigger an alert even if the application later sanitizes it.

The second big group is attempts to read or bypass file paths. Path traversal usually looks like ../ (in various encodings) or attempts to access system files. CRS handles such requests well, as well as scanner calls for common files and directories.
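
To make "../ in various encodings" concrete, here is a minimal Python sketch of the idea. It is not how CRS implements its checks; the function name looks_like_traversal and the limit of three decoding rounds are assumptions made for illustration.

```python
from urllib.parse import unquote


def looks_like_traversal(raw_path, max_rounds=3):
    """Return True if a '..' path segment appears, plainly or after URL decoding."""
    candidate = raw_path
    # Check the raw value, then up to max_rounds successively decoded versions.
    for _ in range(max_rounds + 1):
        # Treat backslashes like forward slashes so '..\' is also caught.
        normalized = candidate.replace("\\", "/")
        # Padding with slashes lets one test cover '../', '/..' and '/../'.
        if "/../" in f"/{normalized}/":
            return True
        decoded = unquote(candidate)
        if decoded == candidate:
            return False  # nothing left to decode
        candidate = decoded
    return False
```

A plain /static/../etc/passwd, a single-encoded %2e%2e and a double-encoded %252e%252e%252f are all flagged, while a legitimate name such as /docs/v1..2/file.txt is not.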

Another class covers protocol violations and “weird” requests: invalid methods, unexpectedly large bodies, conflicting headers, suspicious Content-Type values, and malformed requests often used by bots and scanners.

Rules for sessions and cookies also trigger frequently: overly long values, unusual character sets, attempts to inject code into cookies or headers. This helps against primitive attacks and noisy traffic.

Protect admin areas with more than CRS alone. Simple Nginx controls and targeted restrictions usually help: allow only the methods you need, restrict admin panels by IP lists, and enable geo-restrictions only where justified.

The broader the rules, the higher the risk of false positives on legitimate requests. Start in Detection mode and gradually enable blocking for the categories you are most confident in.

Protection boundaries: what a WAF doesn’t solve

When migrating to Nginx + ModSecurity, agree in advance on boundaries. A WAF catches pattern-based web attacks, but it does not make the application secure by itself. If expectations are too high, you’ll end up with either gaps or endless false blocks.

Where signatures fail

OWASP CRS rules rely heavily on patterns. They can be bypassed by payload variations, encodings, parameter splitting, rare headers and nonstandard field order. Sometimes an attack looks like a normal business request and the WAF cannot tell without application context.

Another trap is expecting features that are actually different protections: anti-bot and behavioral defenses (scraping, fraud, credential stuffing), device fingerprints and session risk scoring, reputation-based account protection, advanced API protection by schema/contract, and incident-management-as-a-service (analytics and response).

Protocols and performance

Traffic inspection depends on how you proxy the connection. For TLS you either terminate encryption at the proxy or you won’t see the content. HTTP/2, WebSocket and especially gRPC introduce nuances: long-lived connections, binary bodies, and special headers. Formally it can work, but content parsing and rules become trickier and sometimes less useful.

Performance issues are usually related to request body sizes and rule complexity. Large JSON, file uploads, parallel requests and heavy regular expressions consume CPU quickly. Enabling deep inspection everywhere can cause latency and timeouts that look like application failures.

When to combine protections

A WAF is most effective when it complements, not replaces, other measures: fix code logic and authorization bugs, use rate limiting and body size limits, enforce strict API contracts (allowlists/schemas) for critical endpoints, deploy dedicated anti-bot solutions where business needs them. Viewed this way, ModSecurity becomes a predictable control layer, not the lone line of defense.

Choosing architecture and placement in the traffic flow

The first decision is where the WAF will sit. The simplest option is at the perimeter, in front of or immediately behind the load balancer, to cover general inbound traffic. This works well when you have many services and need centralized control. If services differ greatly (for example, a public site and an internal API), it’s often better to place the WAF closer to the specific service — that makes rules easier and reduces the risk of breaking legitimate requests.

Sometimes the WAF is placed in a separate zone (DMZ) as an independent proxy layer. That helps with strict network boundaries but adds an extra hop and complicates debugging.

For resilience don’t overcomplicate: the WAF should be as reliable as your entry point. Common approaches are active-active behind the balancer (multiple nodes with identical config), active-passive (standby node with fast failover), or horizontal scaling by load.

Split configuration into two layers: common rules (baseline protections, headers, limits) and service-specific rules (exceptions, thresholds, allowed paths). This reduces chaos: update the common layer once, while service exceptions live near the team’s code.

Store configs as code: a single repo, versioning, reviews and clear change history. That way you can see why an exception was added and who added it.

Have a fast rollback plan: keep a mode switch (Detection/Off) per service or location. If a release spikes blocks for an API, you can quickly move the problematic rule to Detection, record false positives and restore blocking without stopping traffic.

Step-by-step migration without downtime

The risky part of migration is not the installation but the abrupt switch to blocking. To avoid downtime, start with observation.

First enable ModSecurity in Detection mode (log only) and let it run on real traffic. In 3–7 days you typically see a baseline: which rules fire most, which methods and URIs produce hits, and where unexpected spikes occur.

Then enable OWASP CRS and still avoid blocking. At this stage it’s important to check how the app behaves with typical checks: forms, authentication, search, file uploads, API calls.

At the same time fix technical settings so requests are parsed correctly. Common adjustments include request body size limits and per-parameter limits, correct handling of file uploads (multipart) and common extensions, JSON parsing (content-type, encoding), unified log format and correlation with request id.

When statistics are clear, pick a few critical endpoints and enable blocking one by one. Start with endpoints where risk is highest and the business tolerates stricter checks: admin panels, authentication, payment requests, contact forms. Leave Detection for others and add exceptions gradually.

Prepare a short runbook for on-call and service owners: how to tell an attack from a false positive, where to find events in Nginx and ModSecurity logs, how to temporarily relax a rule for a specific URI or parameter, and how to roll back changes without restarting the whole service.

How to collect and triage false positives

A false positive is when the WAF blocks or flags a legitimate request from a user or service. Decide in advance who makes the call: security owns the risk assessment, and the application owner confirms whether a request is legitimately needed by the business.

Start in Detection and collect a baseline. Without a quiet period you’ll almost certainly be firefighting instead of tuning.

To triage quickly, include in logs: rule id, what matched (matched data), URL and method, parameters and headers, and response status. A single rule id often indicates whether the hit is SQLi, XSS or a suspicious body format.
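
As a starting point for triage, hits can be grouped by rule id and by endpoint. The Python sketch below assumes the audit log is written as one JSON object per line, with fields laid out as transaction.request.method, transaction.request.uri and transaction.messages[].details.ruleId. Check these names against your own logs, since the layout depends on the ModSecurity version and logging settings; summarize_hits is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize_hits(log_lines):
    """Count rule hits per rule id and per (method, uri) from JSON log lines."""
    by_rule = Counter()
    by_endpoint = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue
        tx = entry.get("transaction", {})
        request = tx.get("request", {})
        endpoint = (request.get("method", "?"), request.get("uri", "?"))
        for message in tx.get("messages", []):
            # Assumed field name; adjust if your log layout differs.
            rule_id = message.get("details", {}).get("ruleId")
            if rule_id:
                by_rule[rule_id] += 1
                by_endpoint[endpoint] += 1
    return by_rule, by_endpoint
```

Calling by_rule.most_common(10) on the result gives the kind of top-rules list that is useful for deciding which hits to look at first.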

Common suppression approaches from safer to riskier:

Narrow exception by parameter (allow specific special characters only for the comment field)
Exception by endpoint (only for a specific URL with unusual format)
Disable a rule by rule id (only with strong justification and compensating measures)

Exceptions should be minimal and documented. Add a comment, an owner and an expiry (for example, 30 days), after which the exception is reviewed. If you need to relax a check, first scope it to a field or format, not the whole site.
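
One way to keep exceptions from living forever is to store them as records with an owner and a review date, then list the overdue ones on a schedule. This is a minimal Python sketch: the RuleException fields and the due_for_review helper are hypothetical names, and the 30-day default simply mirrors the example above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RuleException:
    rule_id: str
    scope: str    # e.g. "POST /portal/comments, parameter: comment"
    owner: str
    reason: str
    created: date
    ttl_days: int = 30

    @property
    def expires(self):
        """Date on which the exception must be reviewed."""
        return self.created + timedelta(days=self.ttl_days)


def due_for_review(exceptions, today):
    """Return exceptions whose review date has been reached."""
    return [e for e in exceptions if e.expires <= today]
```

Running due_for_review in CI or a scheduled job turns "review after 30 days" from a convention into a check that someone actually sees.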

A good FP workflow usually looks like: a ticket with the example request and timestamp, reproduction in test and confirmation by the app owner, decision on an exception agreed with security, implementation of the rule together with a release and a quick log check after rollout.

Embedding rules into release processes

The most important change is to stop editing rules directly in production. Rules should live alongside code and be released as predictably as other changes.

Rules as code

Split rules into base rules, exceptions and local tweaks. This makes changes clearer and easier to roll back:

crs/ - immutable OWASP CRS set (updated via separate PRs)
custom/ - your app- and API-specific rules
exclusions/ - targeted exceptions tied to URL, parameter or method
tests/ - a suite of requests for validation (legit and attack cases)
docs/ - short description of conventions and typical solutions

For PRs use a short template: why the change, what risk, which endpoints are affected, what logs and events to expect. Review is mandatory: it’s easy to "fix" a false positive by opening a hole inadvertently.

Releasing changes

Different strictness across environments is normal. On dev you can enable more aggressive rules to catch problems early; on prod keep minimal necessary blocking.

Before release run automated tests: good requests should not be blocked and bad ones should be consistently caught. A typical sequence is: Detection-only on stage, then on a small portion of prod, then selective blocking for the most confident rules, measure effect (block rate, number of FPs, load, user complaints), and have a fast rollback (flag/env var or revert to previous rules tag).
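
The "good requests pass, bad requests are caught" check can be sketched as a small table-driven test. In the Python example below, Case, run_cases and stub_send are hypothetical names. stub_send stands in for the real WAF so the sketch runs on its own; in practice it would be replaced by an HTTP call against the stage environment. A 403 response is assumed to mean "blocked".

```python
from typing import Callable, NamedTuple


class Case(NamedTuple):
    name: str
    path: str
    should_block: bool


def run_cases(send: Callable[[str], int], cases):
    """Return the names of cases whose outcome differs from the expectation."""
    failures = []
    for case in cases:
        blocked = send(case.path) == 403  # assumption: 403 means blocked
        if blocked != case.should_block:
            failures.append(case.name)
    return failures


def stub_send(path: str) -> int:
    # Stand-in for the WAF under test; replace with a real request to stage.
    return 403 if "../" in path or "' OR " in path else 200


CASES = [
    Case("plain search", "/search?q=shoes", False),
    Case("sqli in search", "/search?q=' OR 1=1 --", True),
    Case("path traversal", "/files/../../etc/passwd", True),
]
```

An empty result from run_cases(stub_send, CASES) means every legitimate request passed and every attack case was blocked. Any returned name points at either a false positive or a missed detection.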

Release rules in small batches, observe the effect immediately and avoid turning the WAF into a black box that blocks releases.

Observability: logs, metrics and alerts

When moving to Nginx + ModSecurity you lose some ready-made dashboards, so you should build observability intentionally. Otherwise you either won’t notice attacks or you’ll start disabling rules at random because of false positives.

Useful minimum metrics to track regularly: block ratio (blocked vs allowed) by hour/day, top firing rules (rule id) and their trend, top endpoints with hits, Nginx latency and WAF impact (if measured separately), and status-code distribution (200/3xx/4xx/5xx) alongside WAF events.
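
The block ratio by hour is simple to compute once events are available as (timestamp, status) pairs. The Python sketch below rests on two assumptions: timestamps are ISO 8601 strings without a timezone suffix, and a 403 status marks a blocked request. It reports blocked requests as a share of all requests in the hour; block_ratio_by_hour is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime


def block_ratio_by_hour(events):
    """events: iterable of (iso_timestamp, http_status). Returns {hour: ratio}."""
    totals = defaultdict(int)
    blocked = defaultdict(int)
    for timestamp, status in events:
        hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")
        totals[hour] += 1
        if status == 403:  # assumption: 403 marks a blocked request
            blocked[hour] += 1
    return {hour: blocked[hour] / totals[hour] for hour in totals}
```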

Don’t view the WAF in isolation from the app. If 403s rise after a rule change, check which requests are blocked and whether app errors increased. If 5xxs grow, it may be a side effect: a rule allowed a heavy request that overloaded the app, or retries and timeouts increased load.

Keep alerts simple: a sudden spike in blocks relative to baseline (e.g. 3–5x), growth in 403s for a single endpoint or client/subnet, rise in 5xxs immediately after a rule change or OWASP CRS update, a new rule appearing in the top list, or sustained latency growth at the entry point after enabling blocking.
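
A "spike relative to baseline" alert can be as simple as comparing the current count with a multiple of the recent median. In this Python sketch the factor of 3 comes from the range mentioned above. The min_blocks floor is an added assumption that keeps quiet endpoints from alerting on a handful of events, and is_block_spike is a hypothetical helper name.

```python
from statistics import median


def is_block_spike(history, current, factor=3.0, min_blocks=20):
    """True if current blocks exceed factor times the median of history."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = median(history)
    # max(..., 1) avoids alerting on any nonzero count when the baseline is 0.
    return current >= min_blocks and current > factor * max(baseline, 1)
```

The median is used instead of the mean so that one earlier attack in the history window does not inflate the baseline.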

Store logs so incidents and FPs can be investigated: request, rule, reason, endpoint, correlation id (request id), source. Every 2–4 weeks review exceptions: old paths retire, new ones appear, and temporary workarounds often remain "forever" and turn into holes.

Common mistakes when replacing a commercial WAF

Migration failures usually come from the approach, not ModSecurity or Nginx. People often expect rules to immediately block everything bad and never interfere with business. In practice the first weeks are spent on observation, tuning and discipline.

Mistakes that most hurt availability

Typical examples:

Switching to blocking immediately, without a quiet period, and getting unexpected 403s on legitimate requests
Creating overly broad exceptions (e.g. "don’t check the entire /api") and forgetting them
Not validating real traffic formats (JSON, multipart, large files, long headers, nonstandard encodings)
Lacking a rollback plan and an accountable person
Confusing the WAF with protection against traffic spikes (a WAF handles request-level attack classes but doesn’t replace rate limiting and overload protection)

Agree on a simple rule early: until rules are proven, operate in detection mode, collect request examples and decide on each type of hit.

Short practical example

A team has a portal with a document upload form and a separate JSON API for mobile apps. After enabling rules they see hits on the comment parameter because users paste text containing angle brackets (<>) or code snippets. If you simply disable checks for all POSTs, you lose protection where it’s needed.

Better: narrow the exception to that specific parameter and endpoint, set a review deadline and add rate limiting for expensive methods (e.g. uploads).

Quick checklist before and after enabling blocking

The riskiest moment is switching from Detection to Blocking. The checklist below helps avoid breaking legitimate traffic and drowning in incidents.

Before enabling blocking verify the base so OWASP CRS behaves predictably and logs are useful:

Services and critical endpoints, owners and change windows are recorded.
Body size limits, allowed methods, encodings and URL normalization are configured.
Real client IP handling is enabled (so events show the real client, not the balancer) and logs are in a unified format.
Detection mode is the starting point and a phased rollout is planned: one service or a small share of traffic, then expand.
There’s a process for false positives: who decides, SLA for triage, where decisions are stored, how exceptions are named and when they’re reviewed.
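
For the "small share of traffic" step in the checklist above, one common approach is deterministic bucketing: hash a stable client key and enable blocking only for the buckets below a threshold. This Python sketch illustrates the selection logic only. How the decision is wired into Nginx or the balancer is left open, and in_blocking_cohort and the choice of client key are assumptions.

```python
import hashlib


def in_blocking_cohort(client_key: str, percent: int) -> bool:
    """Deterministically place a client in one of 100 buckets; block the first `percent`."""
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Because the bucket depends only on the key, a client stays in the same cohort between requests, and raising percent only adds clients to the blocking cohort without removing anyone already in it.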

After the first Blocking release, maintain control:

Monitor metrics: block share, top rules, top URLs, growth in 4xx/5xx and impact on key business events.
Set alerts for spikes in blocks by endpoint, rule and source.
Do short retros after each incident: what was blocked, why, how discovered, and how to improve rules and tests.
Regularly clean exceptions: remove temporary ones, merge duplicates, enforce expiry and owner.
Align rule changes with application releases: when an API or form changes, prepare rule updates and test cases in advance.

Example rollout: from pilot to stable protection

Imagine a company with two entry points: an internal staff portal (VPN access, many admin actions) and a public client form (public traffic, conversion is critical). During migration risk profiles diverge: the portal can be strict, while the public form must avoid breaking legitimate submissions.

A pilot can be organized by week:

Week 1: Detection. Collect OWASP CRS events, list top-10 rules and top-20 URLs with most hits, and mark where events look like attacks vs where they hit business logic.
Week 2: Targeted exclusions. Add narrow exclusions only for specific parameters and paths, run common scenarios on stage (login, search, form submit). For the admin portal enable partial Blocking for the clearest classes (SQLi, RCE, auth bypass attempts).
Week 3: Expand Blocking. Gradually add blocking for the public form, but start with rare, clearly malicious signatures and anomalous requests not seen in normal traffic.

To ensure you’re not harming the product, agree to track daily: conversion on the public form and share of 4xx/5xx, response times and retry growth from clients, blocked request share for key URLs, and support tickets saying "submission fails" or "can’t access".

Architectural questions often surface: where real IPs are lost, how to split policies for portal and form, how to version rules with releases. If you lack people or need 24/7 coverage, teams often hand this phase to an integrator. For example, GSE.kz (gse.kz) as a system integrator can help with deployment and support of infrastructure around the WAF as part of an overall protection system.