What exactly does “backup without commercial agents” mean?

This means you don't install proprietary vendor agents on servers and workstations and you don't become dependent on their license, closed format, or separate console. Instead you use tools like restic or borg and store backups in S3-compatible storage, retaining control over keys and the recovery process.

Do I need to back up the entire disk or are folders enough?

Usually you back up the data you actually need to recover: application directories, configs, user documents and database dumps. The OS and runtime environment are often faster to restore using automation or an image, while user data is restored from backups.

How do I quickly choose between restic and borg for S3?

Choose restic if you want to write directly to S3 out of the box and quickly scale the same setup across many nodes. Choose borg when you have a central backup server that source machines write to over SSH, with S3 used as an external copy in a second step.

What should I check in S3-compatible storage before starting backups?

Check critical compatibility items: request signature support, multipart upload behavior for large objects, request and object size limits, and bucket addressing style. Then run a practical test from the machine that will do backups: upload and download a several-GB file and measure real throughput to see if it fits the window.

What permissions should S3 keys have so we don't lose backups?

A basic rule is a separate user and separate keys just for backups, with permissions only for the needed bucket. Where possible, separate keys for writing, for reading (recovery tests) and for administration so a leaked key cannot silently delete history.

Is encryption enabled automatically in restic and borg or do I need to configure it?

Restic encrypts repositories by design, and without the password a repository cannot be restored. With Borg, encryption is chosen at repository initialization, so enable an encrypted mode from the start and plan where the passphrase and key material are stored for emergency recovery.

Where should repository passwords and keys be stored so recovery is possible?

At minimum, keep secrets out of common user files and chat: use a system secrets store or a file with strict permissions for the service user, plus an offline backup of keys in case the server is lost. Crucially, verify ahead of time that the planned recovery procedure actually works with the stored keys.

What are the first steps after `init` to make backups reliable?

After init, take a first working snapshot without trying to perfect excludes, then add excludes and run another snapshot. Always perform a test restore into a separate folder or VM to ensure S3 access and the password are correct and data can be read.

How do I know a backup fits the "window" and what should I do if it doesn't?

Estimate daily change size, add a safety margin and compare with real throughput to S3. If you don't fit the window, first reduce volume (excludes, split tasks by time), then limit resource usage (rate limits, lower parallelism), and only then consider a more complex architecture.

How often should recovery tests be run and what to check?

At minimum, restore a small dataset to a separate location monthly and check files open and permissions are correct. Quarterly, run a full scenario for critical systems: restore a DB dump, bring the service up in a test mode and verify it performs key operations.

Restic and Borg backups to S3: setup, encryption, tests

What “without commercial agents” means and why it matters

“Without commercial agents” usually means you don’t install a proprietary vendor agent on every server or workstation and you don’t tie yourself to its license, closed format or separate console. Instead, simple tools like restic or borg are used and data is sent to S3-compatible storage.

In practice this approach is often simpler and more reliable. Fewer moving parts: one binary, clear commands, client-side encryption, ordinary logs. If tomorrow you need to recover on different hardware, you don’t need “that exact” agent and management server. Keys and access to the repository are enough.

There are disadvantages too. Without a single GUI it’s harder to see the overall picture across all nodes and quickly know who hasn't backed up recently. This is usually compensated by discipline and simple workarounds: a common config and repository naming template, centralized log collection with failure alerts, regular scheduled recovery tests, and short reports (what was copied, how long it took, how much data was added).

It’s important to understand the limits. Restic and borg to S3 are great for file data: directories, configs, DB dumps, snapshots of key folders on servers and VMs. Consistent backups of complex applications still require scripts: stop the service correctly or make a dump, verify recovery, and ensure the application actually starts.

This approach fits well if you have one office server, a few branches, a small virtual environment or a fleet of workstations and servers (for example, standard PCs and rack servers), where transparency, control and vendor independence matter.

Basic scheme: what we back up, where and how we protect it

A minimal working scheme for restic and borg backups to S3 has three parts: the data source (server or workstation), the backup repository (where snapshots are written) and S3-compatible storage as the remote target. On top of that there are always two more layers: encryption keys and access rules.

Typically you don’t copy the whole disk, but specific directories and data you actually need to restore: databases, user files, configs, and key application folders. It’s often easier to restore the system from an image or automation and restore the data from backups.

Restic includes encryption by design: the repository is encrypted with a password, and only encrypted blocks are stored on S3. With Borg, encryption depends on the mode chosen during repository initialization, so it’s important to consciously choose an encrypted option once and not keep the repository unencrypted.

A key question: where to store the password and keys. Practical options:

environment variables and a file with permissions restricted to root
a system secret store (if you already use one)
a separate offline medium for emergency access (in case a server is lost)

S3 access should be tightened along two axes: network and permissions. Network-wise, allow connections only from machines that actually perform backups and, where possible, block the rest at the firewall or access policy. For permissions, create separate credentials for backup and split operations:

write (append) for daily work
read-only for checks and recovery tests
a separate admin credential not used in the schedule

Example: in a government office in Kazakhstan you can configure the file server to write to an S3 repository at night, while a separate daytime account only reads and performs a weekly test restore into a temporary folder. Even if the write key leaks, an attacker will have a harder time silently deleting history if deletion and bucket management are available only to an admin key.

Restic or Borg: choosing without extra theory

Restic and Borg solve the same problem: taking “snapshots” so incremental backups use less space. Both support deduplication, compression (in different forms) and client-side encryption, so the storage receives already-encrypted blocks.

The difference is usually not “which is better” but how you will live with the tool daily: where the repository sits, how jobs are run, how you check status and how fast you can restore.

When restic is the simpler choice

Restic often wins where you need S3 backups without extra intermediate layers. It talks to S3-compatible storage “out of the box,” so the scheme is typically straightforward: machine -> S3.

Restic is a good fit if you have:

an S3 repository as the primary target (cloud or on-prem S3)
many identical nodes and want identical commands and logs
minimal environment dependencies and simple installation
clear automation via cron/systemd without complex wrappers
a need to quickly onboard new servers from a template

When borg makes more sense

Borg is convenient when you keep repositories on a dedicated server that the source machines reach over SSH. This fits a “central backup server + many sources” model, especially if you only use S3 as an external copy via additional synchronization.

Borg suits you if you prefer:

repositories on a filesystem with SSH access
strictly controlled networks without direct S3 access from application servers
a model where one backup host serves many sources

If the team is small, it’s usually better to standardize on one tool. It’s easier to train on-call staff, write unified recovery instructions and avoid confusion over repository formats, keys and check commands.

Preparing S3-compatible storage and access

Before configuring restic and borg to S3, make sure the chosen storage really fits. “S3-compatible” sometimes means “similar to S3” but with nuances that appear at 3 a.m. when the backup window has already closed.

Check provider details that most affect operation: support for AWS Signature v4, correct handling of large objects (multipart upload), request and object size limits, and bucket addressing style (virtual-hosted or path-style). If there is a region-specific endpoint, get it in advance and lock it into settings.

Next, separate resources. Use a dedicated bucket for backups and a dedicated user (or key) just for backups. This reduces accidental deletion risk and simplifies auditing.

Short rights checklist for the backup user:

access limited to a single bucket
read, write, list objects (no account-level admin operations)
deny delete if your access model allows it
separate keys for test and production environments
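
As an illustration of the checklist above, a policy for the daily "write" key in AWS IAM syntax might look like the sketch below. It assumes the bucket is named backup-prod (the name used in the examples in this article), that the repository sits at the bucket root, and that your provider accepts AWS-style policies, which not every S3-compatible service does. The file name is hypothetical. Delete is allowed only under locks/ because restic removes its own lock files after each run; treat that prefix as an assumption to verify against your repository.

```shell
# Sketch: write an AWS-style policy for the daily "write" key to a local file.
# Assumptions: bucket "backup-prod", repository at the bucket root,
# provider understands IAM-style JSON policies.
POLICY_FILE="${POLICY_FILE:-./backup-writer-policy.json}"

cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBackupBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::backup-prod"
    },
    {
      "Sid": "ReadAndWriteObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::backup-prod/*"
    },
    {
      "Sid": "DeleteOnlyLockFiles",
      "Effect": "Allow",
      "Action": "s3:DeleteObject",
      "Resource": "arn:aws:s3:::backup-prod/locks/*"
    }
  ]
}
EOF

echo "Policy written to $POLICY_FILE"
```

With a key like this, restic backup should keep working while forget and prune fail, so retention has to run under the separate admin credential. How the policy is attached to the user depends on the provider.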

If versioning is available, enable it for the backup bucket. Versioning helps when objects are deleted or overwritten. Add lifecycle rules so versions don’t grow forever: keep frequent restore points briefly and rare ones longer.

Store access secrets as if they were the password to everything: not in a group chat or notes. At minimum, use a file with 600 permissions accessible only to the service user. Better, use a system secret store or systemd credentials.
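
A minimal sketch of the "file with 600 permissions" option follows. The helper name write_secret_file and the paths in the usage comment are hypothetical; RESTIC_PASSWORD_FILE is the environment variable restic reads a password file path from.

```shell
# Sketch: store a secret read from stdin in a file only its owner can read.
write_secret_file() (
    umask 077            # new files are created without group/other access
    cat > "$1"
    chmod 600 "$1"       # also tighten a file that already existed
)

# Example usage (hypothetical paths; run as the service user or root):
#   printf '%s\n' "$REPO_PASSWORD" | write_secret_file /etc/restic/repo-password
#   export RESTIC_PASSWORD_FILE=/etc/restic/repo-password
```

For Borg the same file can be used through BORG_PASSCOMMAND, for example BORG_PASSCOMMAND="cat /etc/borg/passphrase" (path again hypothetical).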

Finally test the network: upload and download a 1–5 GB test file from the server that will run backups (for example, from your rack or a GSE S200-class server). Record actual throughput and latency to determine if copying fits the available window.
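
The network test can be sketched as below. It assumes the AWS CLI is installed and that credentials are already in the environment; ENDPOINT, BUCKET and the object name throughput-test.bin are placeholders. Timing uses whole seconds, so treat the numbers as rough.

```shell
# Sketch: time an upload and a download of a random test file and print MiB/s.
# Assumptions: AWS CLI available, AWS_* credentials exported, placeholders below.
ENDPOINT="${ENDPOINT:-https://s3.example.local}"
BUCKET="${BUCKET:-backup-prod}"

# Integer MiB/s for a byte count ($1) and a duration in seconds ($2).
mib_per_s() {
    secs=$2
    [ "$secs" -gt 0 ] || secs=1      # avoid division by zero on very fast runs
    echo $(( $1 / 1048576 / secs ))
}

# Upload and download a file of $1 MiB (default 2048) and report both speeds.
s3_roundtrip() (
    size_mib=${1:-2048}
    bytes=$(( size_mib * 1048576 ))
    obj="s3://$BUCKET/throughput-test.bin"
    f=$(mktemp) || exit 1
    dd if=/dev/urandom of="$f" bs=1048576 count="$size_mib" 2>/dev/null

    start=$(date +%s)
    aws s3 cp "$f" "$obj" --endpoint-url "$ENDPOINT" --only-show-errors || exit 1
    echo "upload:   $(mib_per_s "$bytes" $(( $(date +%s) - start ))) MiB/s"

    start=$(date +%s)
    aws s3 cp "$obj" "$f" --endpoint-url "$ENDPOINT" --only-show-errors || exit 1
    echo "download: $(mib_per_s "$bytes" $(( $(date +%s) - start ))) MiB/s"

    # Cleanup needs delete rights; run the test with a key that has them.
    aws s3 rm "$obj" --endpoint-url "$ENDPOINT" --only-show-errors
    rm -f "$f"
)

# Example: s3_roundtrip 2048
```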

Configuring restic: a minimal step-by-step to start

To make restic and borg backups to S3 predictable, start not with commands but with a data list. Mark what you really need to restore (databases, documents, configs) and what can be rebuilt (caches, temp files). That immediately reduces copy time and shrinks the backup window.

Prepare S3 credentials (key/secret, bucket name, path) and choose where to store the repository password. The password is always required: restic encrypts client-side and without the password the backup becomes an unrecoverable archive.

Repository initialization and the first check look like this (substitute your values):

# S3 credentials of the dedicated backup user
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
# The endpoint and bucket are part of the repository URL; no extra -o option is needed
export RESTIC_REPOSITORY="s3:s3.example.local/backup-prod"
export RESTIC_PASSWORD="strong_password"

restic init
restic snapshots

After init make the first full backup. At this stage it’s more important to get a working snapshot than a “perfect” set of excludes:

restic backup /etc /home /var/lib/app
restic snapshots

Now add excludes so you don’t transfer unnecessary data. Common excludes:

cache directories (for example, */cache*)
temporary files (/tmp, */tmp*)
large build and download directories if they are not needed for recovery
logs, if you have a separate log retention policy

The last mandatory step is a trial restore into a separate folder. This takes minutes but immediately proves that the password, S3 access and data structure are correct:

mkdir -p /restore-test
restic restore latest --target /restore-test
ls -la /restore-test

If the test restore succeeds, you already have a minimal working scheme. Next, lock excludes in a file, remove the password from environment variables and move to scheduling and backup window control.
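
A sketch of that follow-up step is shown below. The exclude patterns and file locations are examples only, and the password file path is hypothetical; it assumes the AWS_* variables and RESTIC_REPOSITORY are set as in the initialization example.

```shell
# Sketch: keep excludes in a file and read the repository password from a file.
# Assumptions: paths are placeholders; AWS_* and RESTIC_REPOSITORY are exported.
EXCLUDE_FILE="${EXCLUDE_FILE:-./restic-excludes.txt}"

cat > "$EXCLUDE_FILE" <<'EOF'
# one pattern per line; lines starting with # are ignored
/var/lib/app/cache
/home/*/.cache
*.tmp
EOF

run_backup() {
    unset RESTIC_PASSWORD        # no secret left in the environment
    RESTIC_PASSWORD_FILE="${RESTIC_PASSWORD_FILE:-/etc/restic/repo-password}"
    export RESTIC_PASSWORD_FILE
    restic backup --exclude-file "$EXCLUDE_FILE" /etc /home /var/lib/app
}

# Example: run_backup
```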

Configuring borg: a working scheme without surprises

Borg is good for fast incremental file-level backups and convenient restores. Remember: Borg 1.x writes its repository to a regular filesystem (a local disk or a remote host over SSH) and has no built-in S3 backend, so reaching S3 usually requires an intermediate step.

There are two common options. The first is a repository on a dedicated SSH gateway (or backup server) with a normal disk, which a separate job then copies to S3. The second is mounting S3 as a filesystem. The latter is more convenient but more prone to latency, locking problems and incomplete POSIX behavior. For Borg it’s usually safer to “store locally first, then upload.”

Initialization and encryption

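Choose a key mode. The simplest is repokey (the key is stored in the repository and protected by a passphrase); the alternative is keyfile (the key lives in a separate file on the client, which must itself be backed up). In either mode, keep an offline copy of the key (borg key export) together with the passphrase. Example initialization: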

export BORG_REPO=ssh://backup@backup-gw/./repos/app1
export BORG_PASSPHRASE='StrongPassphrase'

borg init --encryption=repokey-blake2 "$BORG_REPO"

Agree on clear archive names. Practice: hostname + role + date, so you can quickly find and compare:

borg create --stats --compression lz4 \
  "$BORG_REPO"::"srv1-app-{now:%Y-%m-%d_%H%M}" \
  /etc /var/lib/app

Run scheduled jobs via cron or a systemd timer. In a restic/borg + S3 scheme, Borg often handles local increments and the upload to S3 is a separate task after a successful archive creation.
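
The separate upload task might look like the sketch below, run on the backup gateway where the repository is a local directory. It assumes the AWS CLI is installed; the function name and all paths are placeholders. borg with-lock holds the repository lock while the copy runs, so no archive is written halfway through it.

```shell
# Sketch: copy a local Borg repository to S3 while holding the repository lock.
# Assumptions: AWS CLI available, credentials exported, paths are placeholders.
sync_repo_to_s3() {
    repo_dir=$1      # e.g. /srv/borg/repos/app1 (local path on the gateway)
    s3_url=$2        # e.g. s3://backup-prod/borg/app1
    endpoint=$3      # e.g. https://s3.example.local

    # "borg with-lock" takes the lock, runs the given command, then releases it.
    # Lock files are excluded so the copy in S3 does not look locked.
    borg with-lock "$repo_dir" \
        aws s3 sync "$repo_dir" "$s3_url" \
            --endpoint-url "$endpoint" --exclude 'lock.*' --only-show-errors
}

# Example:
#   sync_repo_to_s3 /srv/borg/repos/app1 s3://backup-prod/borg/app1 https://s3.example.local
```

Whether to add --delete so that segments removed by prune also disappear from S3 is a design choice; combined with bucket versioning the removed objects stay recoverable for the lifecycle period.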

Checking and test restores

Periodically run repository checks: they catch corruption and disk issues before you need a restore.

borg check --verify-data "$BORG_REPO"

Do recovery tests regularly. At minimum: restore one file and a whole directory to a separate folder.

mkdir -p /tmp/restore-test
# borg extract always writes into the current directory, so change into the target first
cd /tmp/restore-test
borg extract "$BORG_REPO"::srv1-app-2026-01-11_0200 etc/app.conf
borg extract "$BORG_REPO"::srv1-app-2026-01-11_0200 var/lib/app

Scheduling and controlling the backup window

A good backup runs on a schedule and doesn't interfere with production services. For restic/borg, cron or systemd-timer is usually enough. What matters more than how you run jobs is what and how you check after they run.

How to build a schedule

Start with a simple set of recovery points and only add complexity when there is a real need. A basic scheme usually looks like:

nightly: incremental backup of data and configs
weekly: a “control” copy plus repository integrity check
monthly: a separate point you won’t delete early
before changes: an ad-hoc backup before upgrades or migrations

This covers everyday mistakes (deleted files) and rare issues (failure after an update).

How to keep the window under control

The backup window is the time when copying impacts disks, network and S3. Assess it once and revisit after data growth: take the daily changed data size, multiply by 2 (margin), and compare with real upload speed and deduplication time.
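
The arithmetic can be sketched in a few lines of shell. The function name is hypothetical, the inputs (40 GiB of daily changes, 50 MiB/s measured throughput) are made-up example values, and the estimate covers transfer time only, not scanning or deduplication.

```shell
# Sketch: estimate how many minutes the nightly upload needs.
# $1 = daily changed data in GiB, $2 = measured throughput in MiB/s.
# The factor 2 is the safety margin; results are rounded up.
window_minutes() {
    mib=$(( $1 * 2 * 1024 ))
    secs=$(( (mib + $2 - 1) / $2 ))
    echo $(( (secs + 59) / 60 ))
}

# 40 GiB of daily changes at 50 MiB/s
window_minutes 40 50    # prints 28
```

If the result is close to the length of the night window, there is no room for growth and the measures below apply.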

If backups interfere with production, limit the load rather than simply postponing. Practical measures:

limit speed: restic has --limit-upload and --limit-download, borg has --upload-ratelimit (--remote-ratelimit in releases before 1.2)
reduce parallelism: fewer threads often give more predictable load
split tasks: copy databases and large directories at different times
exclude unnecessary data: caches, temp files and logs that can be restored another way
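
A sketch of the first measure: both tools take the upload limit in KiB/s, while links are usually quoted in Mbit/s, so a small conversion helps. The function names and the 100 Mbit/s figure are examples, not recommendations.

```shell
# Sketch: cap upload bandwidth for restic and borg.
# Convert a link share in Mbit/s to the KiB/s value both tools expect.
mbit_to_kib() {
    echo $(( $1 * 1000 * 1000 / 8 / 1024 ))
}

LIMIT_KIB=$(mbit_to_kib 100)    # 100 Mbit/s is about 12207 KiB/s

limited_restic_backup() {
    # nice lowers CPU priority so production services are scheduled first
    nice -n 19 restic backup --limit-upload "$LIMIT_KIB" "$@"
}

limited_borg_create() {
    # --upload-ratelimit needs Borg 1.2+; older releases call it --remote-ratelimit
    nice -n 19 borg create --upload-ratelimit "$LIMIT_KIB" "$@"
}

# Example: limited_restic_backup /etc /home /var/lib/app
```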

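Success is not “something wrote to the log.” Treat only exit code 0 as success for both restic and borg. Borg exit code 1 means the run finished with warnings and should be investigated; restic exit code 3 from a backup means a snapshot was created but some source files could not be read.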

Failures should alert automatically. Minimum set: an email to the on-call, a message to a workplace messenger or an automatic ticket. For example, if the nightly accounting server backup missed its window, the alert should arrive in the morning before work starts, not at the moment recovery is urgently needed.
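
One way to wire this into cron or a systemd unit is a small wrapper like the sketch below. The function names are hypothetical and alert is only a stub that writes to stderr; replace its body with your mail, messenger or ticket integration.

```shell
# Sketch: map tool exit codes to a status and alert on anything but success.
# Codes used here: restic 3 = snapshot created but some files unreadable,
# borg 1 = warning. Any other non-zero code is treated as a failure.
classify_exit() {
    case "$1:$2" in
        *:0)      echo ok ;;
        restic:3) echo warning ;;
        borg:1)   echo warning ;;
        *)        echo failed ;;
    esac
}

# Hypothetical alert hook: replace with mail, a messenger webhook or a ticket.
alert() {
    echo "BACKUP ALERT: $*" >&2
}

# Run the given command; its first word selects the exit-code table.
run_and_report() {
    tool=$1
    "$@"
    status=$(classify_exit "$tool" $?)
    [ "$status" = ok ] || alert "$(uname -n): $tool finished with status $status"
    [ "$status" = ok ]
}

# Example: run_and_report restic backup /etc /home /var/lib/app
```

Warnings are reported as well as failures, which matches the rule that only exit code 0 counts as success.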

Retention, storage and integrity checks

Backup storage quickly becomes cluttered without simple rules: how many copies to keep, when to delete and how to ensure archives are readable. For restic and borg to S3 this is especially important since S3 is billed for volume and operations.

Retention rules: how long and why

Start from business and regulatory requirements, then map them to a clear scheme. A typical combination of frequent-short and rare-long points works well.

Daily: 7–14 days to cover accidental deletions and quick rollbacks.
Weekly: 4–8 weeks to catch late-discovered issues.
Monthly: 6–12 months for reporting, audits or seasonal data.
“Before changes”: a snapshot before upgrades, migrations and patches.

Both restic and borg have built-in retention mechanisms. One important thing: retention rules should be identical for all servers of the same type, otherwise forecasted volumes become unpredictable.

Deletion policy and protection from mistakes

Delete only through the safe commands of the tool, not manually in the bucket. Before enabling automatic prune/compact, run 1–2 weeks in “report only” mode so the job reports what would be removed.
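
A "report only" run might look like the sketch below. It uses the upper ends of the retention ranges listed earlier as example values; the tag name before-change and the archive prefix srv1-app- are placeholders, and the repository variables are assumed to be exported already.

```shell
# Sketch: retention in "report only" mode for both tools.
# --dry-run only prints what would be removed; nothing is deleted.
restic_retention_report() {
    restic forget --dry-run \
        --keep-daily 14 --keep-weekly 8 --keep-monthly 12 \
        --keep-tag before-change
}

borg_retention_report() {
    borg prune --dry-run --list \
        --keep-daily 14 --keep-weekly 8 --keep-monthly 12 \
        --glob-archives 'srv1-app-*' "$BORG_REPO"
}
```

Once the reports look right, drop --dry-run; for restic add --prune to actually free space, and with Borg 1.2 or later follow prune with borg compact.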

Integrity checks should run on a schedule. Typically:

light metadata checks weekly at night or on weekends
a deeper full check monthly in a separate window when load is minimal
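
The two levels could be scripted roughly as follows. The function names are hypothetical and the repository variables are assumed to be exported. Reading all data back from S3 costs time and possibly egress fees, so the deep check belongs in its own window.

```shell
# Sketch: a light weekly check and a deep monthly one for each tool.
weekly_check() {
    case $1 in
        restic) restic check ;;                                # structure and metadata
        borg)   borg check --repository-only "$BORG_REPO" ;;   # repository segments only
    esac
}

monthly_check() {
    case $1 in
        # --read-data re-reads every pack file; --read-data-subset=10% is a cheaper sample
        restic) restic check --read-data ;;
        borg)   borg check --verify-data "$BORG_REPO" ;;       # re-reads and verifies all data
    esac
}

# Example: weekly_check restic
```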

S3 costs and the role of deduplication

Deduplication in restic and borg reduces storage growth when data has many repeats (shared files, profiles, mail stores). The first full backup will still be large; subsequent growth depends on daily changes. Monitor two metrics: total repository size and "how much was added per day." This quickly reveals if someone started backing up temp files or logs without rotation.

Simple change log of settings

Even without complex systems, keep a short log of changes:

date and reason (e.g., excluded a directory)
who approved it
what changed in schedule/retention
expected effect on backup window and volume

This helps when investigating incidents: “why were old copies deleted” or “why did integrity checks double in time.”

Regular recovery tests: how to do them systematically

A backup without recovery is not a backup. Until you try to restore data and run the service, you’ve only verified that “something was written somewhere.” A lost encryption key, wrong permissions, a broken increment chain or a forgotten repository password often appear only on the day of an incident.

To avoid tests becoming a one-off checkbox, create a simple calendar: monthly restore of one file or a small folder, and quarterly run a full scenario. This is equally important for restic, borg and S3-based variants where access policies and storage behavior add risk.

Where and how to restore

Avoid restoring over live data. Use a separate VM, a test folder on a spare disk, or a small lab. In an office, this can be a test virtual machine on a rack server (for example, a rack-class S200) with user access disabled so results aren’t accidentally modified.

A typical test cycle:

choose a “reference” (file, project folder, DB dump) and a recovery date
restore to a test location and compare size, file counts and a few checksums
verify permissions (owners, groups, modes) so applications can read data
start the application or service in test mode and validate a key workflow (login, report, search)
delete test data or archive the test result
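
The comparison step in the cycle above can be sketched with standard tools. The function names are hypothetical, and sha256sum from GNU coreutils is assumed to be available. The check covers file lists and content only; owners and modes still need a separate look.

```shell
# Sketch: compare a restored tree with a reference copy by file list and SHA-256.

# Print "checksum  ./relative/path" for every regular file under $1, sorted by path.
tree_checksums() (
    cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2
)

# Succeeds only if both trees contain the same files with the same content.
compare_trees() (
    src=$(tree_checksums "$1") || exit 2
    dst=$(tree_checksums "$2") || exit 2
    [ "$src" = "$dst" ]
)

# Example (placeholder paths):
#   compare_trees /srv/reference/project /restore-test/srv/reference/project \
#       && echo "restore matches" || echo "restore differs"
```

Compare against a reference that has not changed since the snapshot was taken, otherwise legitimate edits show up as differences.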

After each test record at least: elapsed time, volume restored, exact commands and steps, result (success/partial/failure) and reason if something broke. Also note whether you met the allowed recovery window and whether network throughput or storage limits were a blocker.

If a test fails, act the same day: fixes should be made immediately, otherwise temporary workarounds become permanent by the next incident.

Common mistakes and traps in setup

The most painful error with encryption is losing the password or key and discovering it only during an incident. For restic and borg this means data exists in the repository but cannot be read. As a minimum safeguard, store secrets in a password manager, keep a sealed offline copy for emergencies and assign a responsible person who knows where the recovery plan lives.

A tempting but dangerous trap is backing up “nearby,” i.e., to the same server, same RAID or the same cloud account. After a compromise, an admin error or a storage failure, both primary and backup copies can disappear. For restic and borg to S3, separate at least access levels and, where possible, accounts.

Too-broad S3 permissions are a frequent problem. If a key can delete buckets and object versions, a bad script or key compromise can empty your backup. Give a key only what it needs: write to a specific bucket, read for checks and minimal delete rights if client-side rotation is used.

Another class of errors is exclude masks. You think you’ve excluded temp files, but you actually stop backing up a DB, an attachments folder or an app key directory after a path change. Rule: after each change to excludes, run a trial and inspect what ended up in the snapshot.
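
For restic, that inspection can be partly automated with a check like the sketch below. The function name and the example paths are hypothetical, and it assumes restic ls latest prints one absolute path per line after its header line.

```shell
# Sketch: confirm that paths you must be able to restore are in the latest snapshot.
# Prints each missing path and returns non-zero if any is absent.
missing_from_snapshot() (
    listing=$(restic ls latest) || exit 2
    rc=0
    for path in "$@"; do
        if ! printf '%s\n' "$listing" | grep -qxF -- "$path"; then
            echo "missing: $path"
            rc=1
        fi
    done
    exit "$rc"
)

# Example (placeholder paths):
#   missing_from_snapshot /etc/app.conf /var/lib/app/data || echo "review excludes"
```

Running it right after the trial backup turns "inspect the snapshot" into a pass or fail result that can also feed the alerting.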

Finally, backups often “fail silently”: jobs fail night after night and nobody notices. Check yourself with a short list:

alerts for failed runs and for exceeded runtime
visibility of the last successful backup and its size (to catch sudden “0 MB” runs)
rotation schedule and expiry for S3 credentials with a clear rotation procedure
scheduled repository integrity checks
monthly test restore to a folder or VM

Simple example: after an app update the data folder moved from /var/lib to /srv but the backup paths and excludes stayed the same. Backups keep “succeeding” but there’s nothing to restore. These issues are caught only by monitoring sizes and regular recovery tests.

Example scenario: office infrastructure with S3 as an external copy

Imagine a typical office with 15–30 users. There’s a file server with shared folders (contracts, scans, projects), a separate machine for 1С and a database, plus several important application directories (configs, reports). The goal is simple: if a disk fails, a server breaks or someone deletes a folder, data must be returned quickly without hunting for laptops.

RPO and RTO should be stated plainly. RPO is how much data you can afford to lose: e.g., up to 1 hour of accounting work and 4 hours for general folders. RTO is how quickly you must recover: e.g., file access within 2 hours, 1С within 4–6 hours (at least in a temporary test environment).

For this case restic and borg to S3 fit well: local snapshots on a fast disk for quick rollbacks and an external copy in S3-compatible storage for fire, theft or ransomware scenarios.

Typical schedule:

night: a full run (or logically full with deduplication) when load is minimal
daytime: short deltas every 1–2 hours for the most critical data
before day close: a separate dump of the 1С database and immediate send to the external repository

Make recovery tests part of routine. Weekly, restore a sample: a couple of document folders and a random deeply nested file. Monthly, bring up a full test environment: deploy a server (on separate hardware or a VM), restore the database and verify 1С opens.

Simple metrics that show things are OK:

age of the last successful backup (hours) for files and DB
backup duration and whether it fits the night window
percentage of successful jobs in 7 days (without manual restarts)
actual test restore time (minutes until the required file/DB is available)

Next steps: lock it in and scale

Before moving restic and borg to production, tidy the small details. Those are the things that usually break recovery at the worst moment.

Pre-launch checklist

Run this short list before enabling a full schedule:

access: separate backup keys, minimal permissions, a clear repository name
encryption: password/keys stored off the server and a backup of key material
recovery test: at least once restored files into an empty folder and verified they open
backup window: you know how long full and incremental runs take and they fit night/weekend windows
logs and alerts: you can see when the last successful run occurred and how much data was transferred

If the recovery test is only “verbal,” that’s not a backup, it’s hope.

How to scale without chaos

When backups cover multiple servers and roles (DBs, files, VMs), decide in advance how responsibilities and repositories are divided. Common models:

one repository per server (easier to search and restore specific hosts)
one repository per role (e.g., databases separate from file data)
separate repositories by access level (so app admins don’t have keys to critical copies)

Then it’s a matter of resources. Add a dedicated backup server when night windows stop fitting, production load grows, or you want a local buffer before S3 upload. Hardware upgrades make sense when compression/deduplication consumes CPU and integrity checks start affecting production.

To make backups a procedure, include them in change management (what and how is protected before a release), incident playbooks (who and how initiates recovery) and audits (test intervals, log retention, access control to keys).

If you need help with infrastructure, GSE.kz has experience in system integration. For backup servers or storage nodes, resilient GSE S200 servers may fit, and above all it’s important to agree on a clear SLO for recovery and regular checks.