
SIEM Log Volume Estimator

Enter your events per second, average event size, retention period, and indexing overhead to get daily ingest in GB and total retention storage in TB. Adjust the overhead multiplier for your SIEM vendor's observed compression ratio. Use the result as a planning baseline — validate against a sample week of actual data before purchasing long-term storage.

Last reviewed May 14, 2026 by ToolSpilo Editorial Team.

Review method: Reviewed against Microsoft Sentinel billing guidance, Splunk sizing/performance guidance, PCI DSS log-retention references, and GDPR storage-limitation guidance. Existing EPS, storage, and tier tables preserved.


How this calculator works

Use the explanation to understand the formula, assumptions, and practical limits behind the calculator result.

The Core Formula

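Daily ingest is EPS multiplied by average event size and by the 86,400 seconds in a day, scaled by the indexing overhead factor, then divided by 10^9 to convert bytes to gigabytes. Retention storage is daily ingest multiplied by retention days, divided by 1,000 to convert gigabytes to terabytes.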

Where:

  • D = daily ingest in gigabytes (1 GB = 10^9 bytes)
  • E = events per second
  • S = average raw (uncompressed) event size in bytes; compression is already netted into O
  • O = indexing and storage multiplier, where 1.35 means 35% net overhead
  • T = retention days
  • R = retention storage in terabytes (1 TB = 1,000 GB)
$$D = \frac{E \times S \times 86{,}400 \times O}{10^9}$$

$$R = \frac{D \times T}{1{,}000}$$
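The two formulas translate directly into code. Below is a minimal Python sketch, assuming decimal units as in the formulas (1 GB = 10^9 bytes, 1 TB = 1,000 GB); the function names are illustrative and are not part of the calculator itself.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 10**9   # decimal gigabytes, matching the formula
GB_PER_TB = 1_000      # decimal terabytes


def daily_ingest_gb(eps: float, avg_event_bytes: float, overhead: float) -> float:
    """D = E x S x 86,400 x O / 10^9 (gigabytes per day)."""
    return eps * avg_event_bytes * SECONDS_PER_DAY * overhead / BYTES_PER_GB


def retention_storage_tb(daily_gb: float, retention_days: int) -> float:
    """R = D x T / 1,000 (terabytes held for the whole retention window)."""
    return daily_gb * retention_days / GB_PER_TB


if __name__ == "__main__":
    d = daily_ingest_gb(eps=2_500, avg_event_bytes=650, overhead=1.35)
    r = retention_storage_tb(d, retention_days=90)
    print(f"{d:.1f} GB/day, {r:.1f} TB")  # prints: 189.5 GB/day, 17.1 TB
```

Keeping the two steps separate mirrors the formulas and lets you reuse one daily figure across several retention scenarios.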

Worked Example — Mid-Size Enterprise

Environment: 500 endpoints, 20 servers, 5 network devices, 2 cloud environments

| Log source | Typical EPS | Avg event size |
|---|---|---|
| Windows endpoints (500) | 1,000 | 800 bytes |
| Linux servers (20) | 300 | 600 bytes |
| Firewall / IDS (5) | 800 | 500 bytes |
| Cloud (AWS/Azure) | 400 | 700 bytes |
| Total | 2,500 | ~650 bytes |
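The total event size in the table is only meaningful as an EPS-weighted mean, because high-volume sources dominate the bytes on disk. A short sketch computing it from the table rows: the weighted mean works out to 664 bytes, which the example approximates as ~650 bytes (using 664 would raise the daily figure by about 2%). The dictionary layout and function name are illustrative.

```python
# (typical EPS, average event size in bytes) per log source, copied from the table
sources: dict[str, tuple[int, int]] = {
    "Windows endpoints (500)": (1_000, 800),
    "Linux servers (20)": (300, 600),
    "Firewall / IDS (5)": (800, 500),
    "Cloud (AWS/Azure)": (400, 700),
}


def weighted_avg_event_size(rows: dict[str, tuple[int, int]]) -> tuple[int, float]:
    """Return (total EPS, EPS-weighted average event size in bytes)."""
    total_eps = sum(eps for eps, _ in rows.values())
    bytes_per_second = sum(eps * size for eps, size in rows.values())
    return total_eps, bytes_per_second / total_eps


total_eps, avg_size = weighted_avg_event_size(sources)
print(total_eps, round(avg_size))  # prints: 2500 664
```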

With E = 2,500, S = 650, O = 1.35, and T = 90:

$$D = \frac{2{,}500 \times 650 \times 86{,}400 \times 1.35}{10^9} \approx 190 \text{ GB/day}$$

$$R = \frac{190 \times 90}{1{,}000} \approx 17.1 \text{ TB}$$

At $0.02 per GB-month for cloud object storage, keeping 90 days of data (17.1 TB, about 17,100 GB) costs roughly $342/month.

EPS Benchmarks by Log Source

| Source type | Low EPS | Typical EPS | High EPS |
|---|---|---|---|
| Windows DC (per server) | 50 | 200 | 800 |
| Windows workstation (per host) | 1 | 3 | 10 |
| Linux server (per host) | 5 | 15 | 100 |
| Palo Alto / Fortinet firewall | 100 | 500 | 5,000 |
| IDS/IPS sensor | 200 | 1,000 | 10,000 |
| Web application (per node) | 10 | 100 | 2,000 |
| AWS CloudTrail (per account) | 5 | 50 | 500 |

Multiply per-host EPS by the device count in your environment. For planning, add 30–50% headroom for incident response spikes and new source onboarding.
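The multiply-and-add-headroom step can be sketched as follows. The per-unit EPS values are the "Typical EPS" column above; the device counts are hypothetical, and the 40% default headroom is simply the midpoint of the 30–50% range.

```python
# Hypothetical inventory: (typical EPS per unit from the benchmark table, unit count)
inventory: dict[str, tuple[float, int]] = {
    "Windows DC": (200, 2),
    "Windows workstation": (3, 500),
    "Linux server": (15, 20),
    "Palo Alto / Fortinet firewall": (500, 2),
}


def planned_eps(rows: dict[str, tuple[float, int]], headroom: float = 0.4) -> tuple[float, float]:
    """Return (baseline EPS, EPS including headroom for spikes and new sources)."""
    baseline = sum(per_unit * count for per_unit, count in rows.values())
    return baseline, baseline * (1 + headroom)


baseline, with_headroom = planned_eps(inventory)
print(baseline, round(with_headroom))  # prints: 3200 4480
```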

Indexing Overhead

Raw event volume (EPS × event size) is not what your SIEM actually stores. Vendors add metadata, field extractions, and index structures, and most also compress the data:

| Factor | Effect on stored volume |
|---|---|
| Field extraction (parsing) | +20–40% |
| Index structures | +15–25% |
| Compression | −40–70% (depends on log type) |
| Net typical overhead | +10–50% |

Splunk typically runs at 1.5–2× raw; Elastic, with compression enabled, is closer to 1.0–1.3×. Enter your vendor's observed ratio from a pilot deployment in the overhead field.
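The observed ratio is just stored bytes divided by raw bytes for the same pilot window. A minimal sketch; the pilot figures are hypothetical, and how you read the two byte counts (index size reports, ingest or license usage reports) depends on your SIEM and is not shown here.

```python
def observed_overhead(raw_bytes: float, stored_bytes: float) -> float:
    """Stored-to-raw ratio for the same pilot window; use it as O in the formula."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return stored_bytes / raw_bytes


# Hypothetical pilot: 120 GB of raw events occupied 162 GB on disk after indexing.
ratio = observed_overhead(raw_bytes=120e9, stored_bytes=162e9)
print(f"O = {ratio:.2f}")  # prints: O = 1.35
```

Measure both numbers over the same days and the same sources; mixing windows is the most common way to get a misleading ratio.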

Storage Tiers

Not all retention data needs the same access speed or cost tier:

| Tier | Access time | Relative cost | Typical use |
|---|---|---|---|
| Hot (SSD/SAN) | <1 second | High | Last 7–30 days, active investigation |
| Warm (spinning disk) | Seconds | Medium | 30–90 days, routine queries |
| Cold (object store, S3/GCS) | Minutes | Very low | 90–365+ days, compliance archive |

A tiered architecture — hot for recent data, cold for compliance — can reduce storage cost by 60–80% compared to keeping everything on SSD. Verify your SIEM supports tiered storage before designing for it.
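To see how tiering changes cost, each tier can be treated as holding daily volume × its days of data, priced per GB-month. The sketch below uses the 190 GB/day example and a 365-day window; the per-GB prices are hypothetical placeholders, and the resulting saving depends entirely on the prices you plug in.

```python
# Hypothetical per-GB-month prices for illustration only; replace with your own quotes.
PRICES: dict[str, float] = {"hot": 0.10, "warm": 0.04, "cold": 0.02}


def monthly_storage_cost(daily_gb: float, days_per_tier: dict[str, int],
                         prices: dict[str, float]) -> float:
    """Steady-state monthly cost: each tier holds daily_gb x its retention days."""
    return sum(daily_gb * days * prices[tier] for tier, days in days_per_tier.items())


daily_gb = 190
all_hot = monthly_storage_cost(daily_gb, {"hot": 365}, PRICES)
tiered = monthly_storage_cost(daily_gb, {"hot": 30, "warm": 60, "cold": 275}, PRICES)
print(f"all hot: ${all_hot:,.0f}/month, tiered: ${tiered:,.0f}/month, "
      f"saving {1 - tiered / all_hot:.0%}")
```

This models storage only; retrieval fees, query compute, and the ramp-up period before the retention window fills are deliberately left out.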

Frequently asked questions

Should I use peak or average EPS?

Use average EPS for storage sizing and peak EPS for ingest pipeline capacity planning.

Storage is determined by total volume over time — average EPS × seconds is the right metric. But your ingest pipeline (log forwarders, collectors, message queues) must handle peak load without dropping events. Peaks during incident response, scans, or business hours can be 5–10× the average EPS. Size ingest capacity for peak; size storage for average.
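Peak sizing for the pipeline can be expressed as a throughput figure. A minimal sketch using the worked example's 2,500 EPS and 650-byte events at the 5× and 10× peak factors mentioned above; it counts raw event bytes only, so transport framing, TLS, and retries are not included.

```python
def peak_ingest(avg_eps: float, avg_event_bytes: float,
                peak_factor: float) -> tuple[float, float]:
    """Return (peak EPS, peak throughput in megabits per second)."""
    peak_eps = avg_eps * peak_factor
    mbps = peak_eps * avg_event_bytes * 8 / 1_000_000  # bytes -> bits -> megabits
    return peak_eps, mbps


for factor in (5, 10):
    peak_eps, mbps = peak_ingest(2_500, 650, factor)
    print(f"{factor}x peak: {peak_eps:,.0f} EPS, ~{mbps:.0f} Mbps")
```

At 10× the example pipeline must absorb 25,000 EPS, roughly 130 Mbps, even though storage is still sized from the 2,500 EPS average.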

How do I find my actual EPS?

If you have an existing SIEM or log management system, query the events-per-second metric from its monitoring dashboard. Most SIEMs expose EPS as a built-in operational metric.

For a new deployment, run your log sources into a sample pipeline for one week — ideally including a business day, weekend, and a period of elevated activity. Divide total events by total seconds to get average EPS. This sample is far more reliable than a theoretical estimate and will also reveal event size averages from real log data.
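Turning a sample week into the calculator's inputs is two divisions. A sketch assuming you can obtain total event count and total raw bytes for the window from your collector; the `SampleWindow` type and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleWindow:
    total_events: int  # events collected during the window
    total_bytes: int   # raw bytes of those events, before indexing
    days: float        # length of the window in days


def sample_averages(sample: SampleWindow) -> tuple[float, float]:
    """Return (average EPS, average event size in bytes) for the window."""
    seconds = sample.days * 86_400
    avg_eps = sample.total_events / seconds
    avg_event_bytes = sample.total_bytes / sample.total_events
    return avg_eps, avg_event_bytes


week = SampleWindow(total_events=1_512_000_000, total_bytes=982_800_000_000, days=7)
print(sample_averages(week))  # prints: (2500.0, 650.0)
```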

What retention period should I plan for?

Start with your legal, regulatory, customer, and internal policy requirements. Do not assume one retention period fits every organization or every log type.

Useful planning anchors:

  • PCI DSS commonly requires audit log history for at least 12 months, with at least the most recent 3 months immediately available for analysis.
  • GDPR-style storage limitation does not set one fixed security-log period; personal data should be kept only as long as necessary for the purpose and legal basis.
  • SOC 2, ISO 27001, and internal security policies usually depend on the controls, contracts, risk assessment, and auditor expectations.

From a threat-hunting perspective, many teams keep 30–90 days searchable and move older logs to cheaper retention. Confirm the final retention design with compliance, legal, and security leadership.

How does SIEM compression affect the storage estimate?

Log data compresses very well — text-based formats (syslog, JSON, CEF) typically achieve 70–85% compression. However, SIEM vendors store much more than raw events: parsed fields, inverted indexes for fast search, correlation state, and metadata.
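You can check the raw-compression part of this claim on your own exports. The sketch uses gzip from the Python standard library as a stand-in; your SIEM may use a different codec, and this measures raw text only, not index structures. The sample lines are synthetic and more repetitive than real logs, so treat the printed figure as optimistic.

```python
import gzip


def gzip_savings(lines: list[str]) -> float:
    """Fraction of bytes saved by gzip; 0.8 means the data shrank by 80%."""
    raw = "\n".join(lines).encode("utf-8")
    if not raw:
        raise ValueError("no log data supplied")
    return 1 - len(gzip.compress(raw)) / len(raw)


# Synthetic firewall-style lines for illustration only.
sample = [
    f"<134>fw01 action=allow src=10.0.0.{i % 254 + 1} dst=10.0.1.5 dport=443 bytes={i * 37 % 9000}"
    for i in range(1_000)
]
print(f"gzip saved {gzip_savings(sample):.0%}")
```

For real data, pass the lines of an exported sample file (for example `open(path, encoding="utf-8").read().splitlines()`, where `path` is your own export).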

The net result varies significantly:

  • Splunk SmartStore: expect 1.5–2× raw event volume after indexing
  • Elastic (ECS + ILM): with compression enabled, 1.0–1.4× raw
  • Microsoft Sentinel (Log Analytics): ~1.2× raw for most log types

Run a pilot with 1–2 representative log sources before committing to a storage architecture. Vendor sizing tools give useful starting estimates but tend to be conservative.