Diagnostic package for RMA: SupportAssist and HPE AHS
How to collect a diagnostic package for RMA: what to export from iDRAC SupportAssist and HPE Active Health System, and which data to include to speed up replacements.

Why prepare diagnostics for RMA in advance
RMA (Return Merchandise Authorization) is the process of replacing or repairing equipment under warranty. In practice it rarely starts with a simple request to “just replace the part.” Manufacturers almost always ask for facts: what happened, under which conditions, and whether there is evidence in the logs. Without a diagnostic package, the correspondence can stretch for days because support must be convinced by words rather than data.
On first contact, support typically asks for the same things: exact model and serial number, when the error appeared and whether it repeats, what changed before the incident (firmware, settings, load), whether there were reboots or cable/board swaps, whether a diagnostic export (SupportAssist or HPE AHS) exists, and screenshots of the errors.
It's important to distinguish symptom from cause. A symptom is what the admin sees: “server powered off,” “RAID degraded,” “fan at 100%.” The cause is often hidden in telemetry and controller logs: power, memory, temperatures, management events, disk and HBA/RAID errors. The value of a diagnostic package is that it shows the chain of events leading up to the failure, not just the final result.
Start collection as early as possible—before reboots, firmware rollbacks and “swap tests.” Any intervention can erase context: some logs are circular and get overwritten, and controller state changes. If diagnostics are captured early, it’s easier to agree on replacement quickly and order the correct spare part rather than guessing from a description.
SupportAssist and AHS: what they are and how they differ
SupportAssist and HPE Active Health System (AHS) solve the same problem: quickly gather a clear picture of a failure for the vendor so the RMA doesn’t turn into a week-long email thread. These are ready-made diagnostic archives taken from the server management controller that show what happened to the hardware.
iDRAC SupportAssist is for Dell servers. It exports hardware events and component status: memory, disk and controller errors, power, temperatures, iDRAC logs and firmware info.
HPE Active Health System (AHS) is used on HPE servers. It stores telemetry and equipment/firmware logs, plus a history of configuration changes. This helps understand what changed before the problem appeared.
Key practical differences
Both packages are useful but differ in depth and format, so support will ask different follow-up questions.
- SupportAssist is often seen as a "state snapshot + iDRAC logs" for a specific incident.
- AHS is stronger on "history": it shows sequences of events and configuration changes over time.
- For Dell, a Service Tag is usually important; for HPE, the Serial Number and iLO data are key.
Sometimes one package is not enough. For example, the server is "up" but an application crashes and hardware logs are silent.
When OS logs are needed and why screenshots are requested
Windows/Linux logs are usually requested if the error looks software-related or borderline: driver, filesystem, network, monitoring agent failures. In that case, system logs and kernel messages are added to the hardware diagnostics.
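As a rough illustration of pulling OS logs for the same window as the hardware export, the sketch below only builds the commands and does not run them. The function names are hypothetical. It assumes a Linux host with a persistent systemd journal and a Windows host where `wevtutil` is available; the `_TRANSPORT=kernel` match is used instead of `-k` because `-k` limits output to the current boot.

```python
from datetime import datetime


def linux_kernel_log_command(since, until):
    """Build a journalctl command that prints kernel messages for a time window.

    Assumes a persistent journal, otherwise entries from before the last
    reboot are not available.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    return [
        "journalctl", "_TRANSPORT=kernel", "--no-pager", "-o", "short-iso",
        "--since", since.strftime(fmt),
        "--until", until.strftime(fmt),
    ]


def windows_system_log_command(out_path):
    """Build a wevtutil command that exports the System event log to a file."""
    return ["wevtutil", "epl", "System", out_path]


if __name__ == "__main__":
    # Hypothetical window around an incident.
    cmd = linux_kernel_log_command(datetime(2026, 1, 27, 2, 0),
                                   datetime(2026, 1, 27, 8, 0))
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` on the affected host, with the output saved next to the SupportAssist or AHS export.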
Screenshots (or photos of the front panel) are requested for a simple reason: they capture what is visible right now—error codes, drive status, warnings. This avoids vague descriptions like “it was blinking orange.” A photo with the code on the LCD panel plus the diagnostic archive often moves the case from “need more details” to “preparing replacement.”
Data to note before exporting logs
Before collecting diagnostics, record basic server and error details. Support often needs these immediately. If they’re missing, you’ll be asked and the process will be delayed.
- Device identifiers. For Dell this is the Service Tag; for HPE, the Product ID and serial number. Record them exactly as shown in the management interface or on the nameplate, with no abbreviations or approximations.
- Model and configuration. “Server won’t boot” is not enough. It matters which CPU, how much RAM, what disks (type and capacity), which RAID/HBA controller, and which network cards. If the issue is related to disks or a controller, this covers half the questions.
- Firmware versions. Usually BIOS, BMC (iDRAC or iLO), RAID/HBA and NIC firmware are enough. Replacements often go faster when it’s clear the issue is not caused by obviously outdated firmware.
- Symptom and time. Record the exact error text and the time it occurred (include the time zone). Example: “Predictive failure” on a specific disk at 03:17, after a reboot.
If you need a one-screen note, keep this minimum next to the logs:
- serial number and Service Tag/Product ID;
- model and key configuration (CPU, RAM, disks, controller);
- BIOS, iDRAC/iLO, RAID/HBA and NIC versions;
- error text and exact time;
- what happened before the failure (updates, moves, power interruptions).
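One way to keep this note consistent between cases is a small template. The sketch below is only an illustration: the `CaseNote` class and its field names are hypothetical, and all values in the usage example are placeholders rather than real identifiers or versions.

```python
from dataclasses import dataclass, field


@dataclass
class CaseNote:
    """One-screen summary kept next to the diagnostic archive."""
    identifier: str        # Service Tag (Dell) or serial number / Product ID (HPE)
    model: str
    configuration: str     # CPU, RAM, disks, controller
    firmware: dict         # component name -> version string
    error_text: str
    occurred_at: str       # exact time, including the time zone
    prior_changes: list = field(default_factory=list)

    def render(self):
        fw = ", ".join(f"{name} {ver}" for name, ver in self.firmware.items())
        changes = "; ".join(self.prior_changes) or "none known"
        return "\n".join([
            f"Identifier: {self.identifier}",
            f"Model: {self.model}",
            f"Configuration: {self.configuration}",
            f"Firmware: {fw}",
            f"Error: {self.error_text}",
            f"Time: {self.occurred_at}",
            f"Before the failure: {changes}",
        ])


if __name__ == "__main__":
    # Placeholder values for illustration only.
    note = CaseNote(
        identifier="EXAMPLE-TAG",
        model="example 2U server",
        configuration="2 CPUs, 256 GB RAM, 8 SAS disks, RAID controller",
        firmware={"BIOS": "x.y.z", "BMC": "x.y.z"},
        error_text="Predictive failure on disk in bay 3",
        occurred_at="2026-01-27 03:17 UTC+05:00",
        prior_changes=["BIOS update two days earlier"],
    )
    print(note.render())
```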
Quick prep: what to do in 10 minutes before collecting
Before exporting logs, capture the server state so support can match errors to time and context.
Start by checking the controller’s clock (iDRAC or iLO). If the time is off or the wrong time zone is set, events in the logs won’t match what you saw.
Then work through these steps quickly:
- Verify date, time and time zone on the management controller.
- Record current indicators: which LEDs are lit, what the screen shows, any POST/boot codes.
- Open the system and hardware event logs and note the last 2–3 errors (time, code, short description).
- Do not reset controllers or roll back settings before exporting: a reset often "washes away" useful context.
- If the server is unstable, collect diagnostics first, then run experiments.
Example: a server randomly reboots every 20–30 minutes. The most useful actions are to quickly photograph indicators and the screen, record the latest log entries and immediately start SupportAssist or AHS collection. If you begin manually rebooting to “check,” logs will contain noise and the needed segment may be lost.
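The clock check from the first step can be reduced to a simple comparison. The sketch below assumes the controller time is available as an ISO 8601 string with a UTC offset (for example, the `DateTime` value a Redfish Manager resource reports); how you obtain that string is outside the sketch, and the function names and the one-minute tolerance are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def clock_skew(controller_time, reference):
    """Return controller time minus reference time as a timedelta.

    controller_time: ISO 8601 string with an explicit offset (assumed format).
    reference: timezone-aware datetime from a trusted source (NTP-synced host).
    """
    reported = datetime.fromisoformat(controller_time)
    if reported.tzinfo is None or reference.tzinfo is None:
        raise ValueError("both timestamps need a time zone to be comparable")
    return reported - reference


def skew_note(skew, tolerance=timedelta(minutes=1)):
    """Turn the skew into a one-line remark for the case comments."""
    seconds = int(skew.total_seconds())
    if abs(skew) <= tolerance:
        return f"Controller clock within tolerance ({seconds:+d} s)."
    return f"Controller clock is off by {seconds:+d} s; mention this in the case."


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical controller reading that runs five minutes ahead.
    reading = (now + timedelta(minutes=5)).isoformat()
    print(skew_note(clock_skew(reading, now)))
```

Comparing timezone-aware values means a wrong time zone setting on the controller shows up as a skew of whole hours rather than being silently ignored.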
Step-by-step: how to collect an iDRAC SupportAssist package
SupportAssist in iDRAC is convenient because it bundles logs, hardware state and some configuration into a single archive. This reduces the chance support will ask for extra files.
- Log into the iDRAC web interface (usually a separate management IP).
- Find the SupportAssist or Collection section (names vary by iDRAC version).
- Choose full collection, not minimal. A full archive usually includes event logs, hardware sensors, storage/controller details and basic configuration.
- If you can choose a period, set the interval that covers the issue. 7–30 days is often enough. If the clear failure was last night but warnings have been appearing for a week, pick a wider range.
Before sending, check:
- the collection completed without errors;
- the archive downloaded and opens;
- the file is named clearly: date, server name, short issue (for example, 2026-01-27_SRV-DB01_PSU-fault.zip);
- screenshots/photos of current warnings were saved if needed.
If possible, include a Lifecycle Log export and a screenshot showing the specific message (for example, Power Supply Failure or controller error). The log shows history; the screenshot documents the current status.
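The "downloaded and opens" check can be automated before the file is sent. This is a minimal sketch that assumes the export is a ZIP file, as in the naming example above; the `verify_archive` name and the 10 kB size threshold are illustrative assumptions, not vendor limits.

```python
import zipfile
from pathlib import Path


def verify_archive(path, min_bytes=10_000):
    """Return a list of problems; an empty list means the basic checks passed."""
    p = Path(path)
    if not p.is_file():
        return [f"{p.name}: file not found"]
    problems = []
    size = p.stat().st_size
    if size < min_bytes:
        problems.append(f"{p.name}: only {size} bytes, export may be incomplete")
    if not zipfile.is_zipfile(p):
        problems.append(f"{p.name}: not a readable ZIP archive")
        return problems
    with zipfile.ZipFile(p) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            problems.append(f"{p.name}: corrupt member {bad}")
    return problems


if __name__ == "__main__":
    for problem in verify_archive("2026-01-27_SRV-DB01_PSU-fault.zip"):
        print(problem)
```

The check reads the archive without extracting or rewriting it, so the file that goes to support is the one that came from the controller.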
Step-by-step: how to collect HPE Active Health System (AHS)
AHS is a diagnostic archive that helps HPE support quickly see what happened around a failure. Collect it as close to the event as possible. Even if the server has been rebooted, capturing the period before and after the event is important.
Collecting AHS from iLO
- Open the iLO web interface.
- Find Active Health System (usually under Information or Diagnostics).
- Select a date range. A practical choice is 24 hours before the failure and 24 hours after. If you know the exact time, sometimes 1–2 hours before and 4–6 hours after is enough.
- Start the export and wait for completion. Save the file locally without unpacking.
- Check the archive size. If it’s suspiciously small (a few kilobytes), the export may have been interrupted; repeat it.
Support often asks separately for the Integrated Management Log (IML) and a system summary (System Information, Firmware, Storage, Network) if these are available as separate iLO reports.
To avoid confusion, use a consistent naming template: hostname, serial number (if needed), date, ticket number. Example: SRV-ALM01_2026-01-27_TKT123_AHS.zip and SRV-ALM01_2026-01-27_TKT123_IML.txt.
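A naming template like this is easy to apply mechanically. The sketch below is one possible helper; the function name, argument order and the character clean-up rule are assumptions made for illustration.

```python
import re
from datetime import date


def archive_name(hostname, day, ticket, kind, ext):
    """Build '<hostname>_<YYYY-MM-DD>_<ticket>_<kind>.<ext>'."""
    parts = [hostname, day.isoformat(), ticket, kind]
    # Replace anything outside letters, digits, dot and hyphen, so the
    # underscore stays reserved as the separator between parts.
    cleaned = [re.sub(r"[^A-Za-z0-9.-]+", "-", part).strip("-") for part in parts]
    return "_".join(cleaned) + "." + ext.lstrip(".")


if __name__ == "__main__":
    print(archive_name("SRV-ALM01", date(2026, 1, 27), "TKT123", "IML", "txt"))
    # prints SRV-ALM01_2026-01-27_TKT123_IML.txt
```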
As a guideline: for a night reboot at 03:12, set roughly 02:00–08:00 so AHS will include preceding warnings and subsequent effects (array rebuilds, disk errors, power events).
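The arithmetic behind this guideline can be sketched as follows: take a margin before and after the incident and round outward to whole hours. The defaults of one hour before and four hours after are picked from the ranges mentioned above, and the function name is hypothetical. If the export interface accepts only whole dates, use the dates of the returned start and end.

```python
from datetime import datetime, timedelta


def collection_window(incident, before=timedelta(hours=1), after=timedelta(hours=4)):
    """Return (start, end) around the incident, rounded outward to whole hours."""
    start = (incident - before).replace(minute=0, second=0, microsecond=0)
    end = incident + after
    floored = end.replace(minute=0, second=0, microsecond=0)
    if end != floored:
        end = floored + timedelta(hours=1)  # round up to the next full hour
    return start, end


if __name__ == "__main__":
    start, end = collection_window(datetime(2026, 1, 27, 3, 12))
    print(start.strftime("%H:%M"), end.strftime("%H:%M"))
    # prints 02:00 08:00
```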
What to attach to avoid being asked again
Even with SupportAssist or AHS, engineers often need a couple of simple confirmations. These help quickly tie logs to the onsite reality and avoid repeated requests.
The most useful item is one clear screenshot or photo: POST message, RAID warning, BSOD or kernel panic. The image shows the exact code and phrasing.
The second frequent question is about the disk subsystem. Archives may include general telemetry, but separate screenshots from the RAID utility/BIOS help determine if the array was degraded, which disk showed errors, and whether a rebuild was interrupted.
A minimal kit that usually prevents follow-up requests:
- photo/screenshot of the error with time reference;
- RAID summary (array state, problem disks, rebuild status);
- 3–5 lines about recent changes (disk replacement, firmware updates, rack move, power reconfiguration);
- signs of power/cooling issues from logs;
- if the failure repeats, a clear reproduction scenario.
Write the scenario as a short instruction, not a story. For example: “After a cold boot, RAID shows a warning during initialization, then the server reboots after 2–3 minutes.” If a specific action reliably triggers the failure (starting a backup, heavy disk I/O, powering on a particular VM), state it.
Common mistakes that delay RMA
Delays usually come from missing evidence. Without enough data to confirm the root cause, support has to ask you to re-create the archive, clarify details or schedule another downtime.
Mistakes that ruin diagnostics
- Collecting an incomplete archive or choosing too short a period so the event is not in the export.
- Rebooting or resetting iDRAC/iLO before collection: some counters and events may be cleared.
- Mixing up servers and files: several identical support.zip files without labels or logs from another host.
- Not providing the exact incident time and symptoms: “server crashed” without date/time leaves support without a reference point.
- “Cleaning” logs: removing lines, unpacking and re-saving files. This raises questions about integrity. It is better to add an explanatory note as a separate text file.
Imagine a rack with two identical servers. One rebooted during the night. In the morning someone sends an archive without hostname or time and then resets the controller “just in case.” Support can’t correlate events and returns the case for clarification.
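To address the integrity concern from the "cleaning" item, you can record a checksum of the untouched export and keep any explanation in a separate text file. This is a minimal sketch; the function names and the `.sha256.txt` suffix are illustrative, and support is not known to require a checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large exports do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_note(archive, note_text=""):
    """Write '<archive name>.sha256.txt' next to the archive and return its path."""
    archive = Path(archive)
    note = archive.with_name(archive.name + ".sha256.txt")
    lines = [f"{sha256_of(archive)}  {archive.name}"]
    if note_text:
        lines.append(note_text)
    note.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return note


if __name__ == "__main__":
    # Hypothetical file name and comment.
    print(write_checksum_note("SRV-DB01_export.zip",
                              "Controller was reset at 09:40, after this export."))
```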
How to reduce the risk of returns
- Rename the archive so the host cannot be confused: hostname or model, serial number and date.
- Record the incident time (with time zone) and 1–2 sentences about the symptoms.
- Note what changed before the incident (updates, disk replacement, a move, power reconfiguration).
- Don’t reset anything before collection. If you already did, state that clearly in your comments.
Short checklist before sending to support
Spend 2 minutes to verify the package—this significantly reduces the chance you’ll be asked to re-collect logs or re-check dates.
- Model and unambiguous identifier (Service Tag/serial number) recorded in text.
- Incident time noted: date, time, what happened and which error was shown.
- SupportAssist or AHS collected fully and covers the needed period.
- Screenshot/photo available if the error is visible on a screen, iDRAC/iLO or front panel.
- Files clearly named and placed in a single archive, not a scatter of attachments.
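If cases are prepared regularly, the checklist can be encoded so that nothing is skipped under time pressure. The sketch below is illustrative only: the dictionary keys and the `missing_items` function are hypothetical names, and the screenshot is treated as required only when the error is visible somewhere.

```python
REQUIRED_ITEMS = {
    "identifier": "model and Service Tag / serial number recorded in text",
    "incident_time": "incident date, time, what happened and which error was shown",
    "diagnostic_archive": "full SupportAssist or AHS export covering the period",
    "single_bundle": "files clearly named and placed in a single archive",
}


def missing_items(package):
    """Return descriptions of checklist items that the package does not satisfy."""
    missing = [desc for key, desc in REQUIRED_ITEMS.items() if not package.get(key)]
    # A screenshot is only expected when the error is visible on a screen or panel.
    if package.get("error_visible") and not package.get("screenshot"):
        missing.append("screenshot or photo of the visible error")
    return missing


if __name__ == "__main__":
    draft = {"identifier": "EXAMPLE-TAG", "error_visible": True}
    for item in missing_items(draft):
        print("missing:", item)
```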
Real example: how one package saved days
A server at a branch rebooted several times overnight; in the morning the admin saw a power warning. One PSU LED was blinking, but by midday the indicator had returned to normal and it was unclear whether the cause was PSU, power feed, overheating or firmware.
The most common mistake is to reboot, update firmware or reset the controller before investigating. After such actions, traces disappear and support must ask for another observation window.
In this case the admin did the opposite: they recorded exact reboot times and captured SupportAssist (for Dell) or AHS (for HPE) before any resets. They also photographed the PSU indicator and took a screenshot of events around the time window.
They added three things to the package: a list of recent operations (cable replacement, rack move, BIOS update), the current power topology (two PSUs, a single feed, presence of UPS) and a one-paragraph symptom summary. The engineer then saw the chain: a power event, input voltage sag, repeated restarts and the specific PSU slot. Replacement was approved quickly without further "please clarify" messages.
If the server is critical and you can’t wait for a part, temporarily reduce risk: shift load to a backup, check connector seating and power cable condition, ensure both PSUs are on separate lines/UPS, and record repeated events without unnecessary resets.
Next steps: how to organize RMA with minimal downtime
After exporting SupportAssist or AHS the goal is simple: make sure the vendor has all inputs so replacement happens in a convenient window with no surprises onsite.
Assemble a single package to submit and keep it together as one file or folder: diagnostic archive, short problem description (what happens, how often, since when), system identifiers, contact of the onsite person during the work, and access limitations (where the equipment is, how to enter the server room, escort rules).
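Putting everything into one file can be sketched as below. The diagnostic export is added as-is, without unpacking, together with a text description and any photos. The function name, the `README.txt` member name and the file names in the usage example are assumptions.

```python
import zipfile
from pathlib import Path


def build_submission(bundle_path, diagnostic_archive, description, extra_files=()):
    """Write one ZIP with the untouched diagnostic export, a text note and extras."""
    diagnostic_archive = Path(diagnostic_archive)
    # ZIP_STORED skips recompression, which gains little for an export
    # that is already compressed.
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_STORED) as bundle:
        bundle.write(diagnostic_archive, arcname=diagnostic_archive.name)
        bundle.writestr("README.txt", description)
        for extra in map(Path, extra_files):
            bundle.write(extra, arcname=extra.name)
    return Path(bundle_path)


if __name__ == "__main__":
    # Hypothetical file names.
    build_submission(
        "SRV-DB01_2026-01-27_case.zip",
        "2026-01-27_SRV-DB01_PSU-fault.zip",
        "PSU warning, three reboots overnight, first at 03:12 local time.",
        extra_files=["psu_led.jpg"],
    )
```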
Preselect a maintenance window when services can be stopped or moved to a standby system to perform the swap, run basic tests and then return the load and check logs. Also agree who will run post-replacement checks: memory test, RAID verification, network port validation, sensor monitoring.
If your team lacks time for initial diagnosis and organization, an integrator can help. For example, GSE.kz (gse.kz) provides 24/7 technical support and a service network; handing them the collected logs and facts can speed up the RMA process.
A good benchmark: if the vendor can open a case, understand the situation and plan work without extra questions, the RMA proceeds noticeably faster.
FAQ
Why prepare a diagnostic package for RMA in advance?
To avoid wasting days on back-and-forth. Manufacturers almost always require evidence from logs and telemetry, not just a verbal description. A ready diagnostic package immediately shows what happened to the hardware before the failure and speeds up repair or replacement decisions.
When is the best time to collect SupportAssist or AHS?
Collect it as soon as you detect the symptom and, if possible, before any reboots, resets or firmware rollbacks. Logs are often circular and can be overwritten, and controller state changes after interventions. The closer the collection is to the incident, the higher the chance the archive contains the needed chain of events.
What is the difference between iDRAC SupportAssist and HPE Active Health System (AHS)?
SupportAssist is the archive for Dell servers; it usually includes iDRAC events, sensors, component errors and firmware details. HPE Active Health System (AHS) is for HPE servers and is especially useful because it stores a history of events and configuration changes over time. Both aim to give support a clear "picture," but they differ in format and depth of data.
What data should I record before exporting logs?
Record the exact model, identifier (Service Tag for Dell or serial number/Product ID for HPE), the incident time with time zone, and the exact error text. Add key configuration (CPU, RAM, disks, RAID/HBA, NICs) and firmware versions (BIOS, iDRAC/iLO, RAID/HBA, NIC). This covers the most common follow-up questions on the first try.
Why is it important to check the time on iDRAC/iLO before collecting diagnostics?
Always verify date, time and time zone on the iDRAC or iLO. If the controller’s clock is off, events in the archive won’t match what you saw in the OS or monitoring, and analysis will be delayed. If you find a mismatch, note it explicitly in your case comments.
How to choose the time range when collecting SupportAssist or AHS?
Choose an interval with margin: at least a few hours before the event and a few hours after. For ongoing warnings, take several days or weeks. Often the failure looks like a single event, but the logs show a gradual degradation (disk, power, temperature). If the archive is too short, the needed event may simply be missing.
When are OS logs needed if I already have SupportAssist or AHS?
When you suspect a driver, filesystem, network, monitoring agent or application crash without hardware errors, hardware logs alone may be insufficient. In that case attach Windows/Linux system logs and kernel messages alongside SupportAssist/AHS. That helps separate hardware and software causes and avoids unnecessary parts replacement.
Why does support ask for screenshots or photos of indicators?
A screenshot or photo captures the exact error code and phrasing at the moment: POST, RAID menu, iDRAC/iLO or front panel. Descriptions like "was blinking orange" are ambiguous, while an image removes doubt. Together with the diagnostic archive, this often moves the case straight to concrete actions.
Which mistakes most often prolong RMA?
Common issues that delay RMA are too short or incomplete collections, and rebooting or resetting iDRAC/iLO before exporting. Another frequent problem is mixing up files from different servers and not providing the exact incident time. Do not unpack or edit archives; if you already did, add a clear note explaining what changed.
How to prepare the case so it won't be returned for clarifications?
Put everything into one clear package: the SupportAssist/AHS archive, a short description of the symptom and frequency, exact time, server identifiers and contact details of the onsite person. Name files so they can't be confused, e.g. include date and hostname. If you need help with initial diagnostics and organizing work, you can hand the collected logs to GSE.kz (gse.kz) to speed up agreement and prepare the replacement in a suitable window.