How this calculator works
Use the explanation to understand the formula, assumptions, and practical limits behind the calculator result.
The Core Formula
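Daily ingest is EPS multiplied by average event size multiplied by the 86,400 seconds in a day, then scaled by the indexing overhead factor and converted from bytes to gigabytes. Retention storage is daily ingest multiplied by retention days, converted to terabytes.

$$D = \frac{\text{EPS} \times S \times 86{,}400 \times O}{10^{9}}$$

$$R = \frac{D \times T}{1{,}000}$$

Where:
- $D$ = daily ingest in gigabytes
- $\text{EPS}$ = events per second
- $S$ = average compressed event size in bytes
- $O$ = indexing and storage multiplier, where 1.35 means 35% overhead
- $R$ = retention storage in terabytes
- $T$ = retention days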
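The calculation described in this section can be sketched in Python. This is a minimal illustration, not the calculator's actual code: the function and parameter names are made up for the example, and decimal units (1 GB = 10^9 bytes, 1 TB = 1,000 GB) are an assumption.

```python
SECONDS_PER_DAY = 86_400


def daily_ingest_gb(eps: float, avg_event_bytes: float, overhead: float = 1.0) -> float:
    """Daily stored volume in decimal gigabytes (assumes 1 GB = 10**9 bytes)."""
    return eps * avg_event_bytes * SECONDS_PER_DAY * overhead / 1e9


def retention_storage_tb(daily_gb: float, retention_days: int) -> float:
    """Total retained volume in decimal terabytes (assumes 1 TB = 1,000 GB)."""
    return daily_gb * retention_days / 1_000


# Illustrative inputs only: 1,000 EPS, 500-byte events, 35% overhead, 30 days.
daily = daily_ingest_gb(eps=1_000, avg_event_bytes=500, overhead=1.35)
total = retention_storage_tb(daily, retention_days=30)
print(f"{daily:.2f} GB/day, {total:.2f} TB retained")
```

Keeping the two steps as separate functions makes it easy to reuse the daily figure for license sizing and the retention figure for storage purchasing.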
Worked Example — Mid-Size Enterprise
Environment: 500 endpoints, 20 servers, 5 network devices, 2 cloud environments
| Log source | Typical EPS | Avg event size |
|---|---|---|
| Windows endpoints (500) | 1,000 | 800 bytes |
| Linux servers (20) | 300 | 600 bytes |
| Firewall / IDS (5) | 800 | 500 bytes |
| Cloud (AWS/Azure) | 400 | 700 bytes |
| Total | 2,500 | ~650 bytes |
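With 2,500 EPS and an average event size of 650 bytes from the table, raw volume before overhead is 2,500 × 650 × 86,400 bytes, or about 140.4 GB per day. The overhead factor and retention period then scale that figure to the stored total, which at the example's storage price works out to **$342/month**.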
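The totals row of the table can be checked with a short script. This is a sketch using only the EPS and event-size values listed in the table; the variable names are illustrative. It also shows that the EPS-weighted mean event size (664 bytes) is slightly higher than the simple mean of the four sizes (650 bytes), which is the rounded figure the table reports.

```python
SECONDS_PER_DAY = 86_400

# (typical EPS, average event size in bytes), copied from the table above.
sources = {
    "Windows endpoints": (1_000, 800),
    "Linux servers": (300, 600),
    "Firewall / IDS": (800, 500),
    "Cloud (AWS/Azure)": (400, 700),
}

total_eps = sum(eps for eps, _ in sources.values())

# Simple mean treats every source equally; weighted mean accounts for volume.
simple_avg_bytes = sum(size for _, size in sources.values()) / len(sources)
weighted_avg_bytes = sum(eps * size for eps, size in sources.values()) / total_eps

# Raw daily volume before any overhead factor, in decimal GB (assumption).
raw_gb_rounded = total_eps * 650 * SECONDS_PER_DAY / 1e9
raw_gb_weighted = total_eps * weighted_avg_bytes * SECONDS_PER_DAY / 1e9

print(total_eps, simple_avg_bytes, weighted_avg_bytes)
print(f"{raw_gb_rounded:.1f} GB/day vs {raw_gb_weighted:.1f} GB/day")
```

The difference between the two averages is about 2%, which is small next to the uncertainty in the overhead factor, but the weighted figure is the safer choice when one source dominates the event mix.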
EPS Benchmarks by Log Source
| Source type | Low EPS | Typical EPS | High EPS |
|---|---|---|---|
| Windows DC (per server) | 50 | 200 | 800 |
| Windows workstation (per host) | 1 | 3 | 10 |
| Linux server (per host) | 5 | 15 | 100 |
| Palo Alto / Fortinet firewall | 100 | 500 | 5,000 |
| IDS/IPS sensor | 200 | 1,000 | 10,000 |
| Web application (per node) | 10 | 100 | 2,000 |
| AWS CloudTrail (per account) | 5 | 50 | 500 |
Multiply per-host EPS by the device count in your environment. For planning, add 30–50% headroom for incident response spikes and new source onboarding.
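A rough estimate along these lines might look like the following. The per-host rates are the "Typical EPS" values from the benchmark table; the inventory counts and the 40% headroom are hypothetical inputs chosen for illustration.

```python
# Typical EPS per device, taken from the benchmark table above.
TYPICAL_EPS = {
    "windows_dc": 200,
    "windows_workstation": 3,
    "linux_server": 15,
    "firewall": 500,
}

# Hypothetical environment: replace with your own device counts.
inventory = {
    "windows_workstation": 500,
    "linux_server": 20,
    "firewall": 5,
}


def planned_eps(inventory: dict, per_host_eps: dict, headroom: float = 0.4):
    """Return (baseline EPS, EPS with planning headroom applied)."""
    base = sum(count * per_host_eps[kind] for kind, count in inventory.items())
    return base, base * (1 + headroom)


base, planned = planned_eps(inventory, TYPICAL_EPS)
print(f"baseline {base} EPS, planning figure {planned:.0f} EPS")
```

Swapping in the Low or High columns gives a quick sensitivity range around the typical estimate.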
Indexing Overhead
Raw event volume is not what your SIEM actually stores. Vendors add metadata, field extractions, and index structures, while compression pulls the total back down:
| Factor | Effect on stored volume |
|---|---|
| Field extraction (parsing) | +20–40% |
| Index structures | +15–25% |
| Compression | −40–70% (depends on log type) |
| Net typical overhead | +10–50% |
Splunk typically runs at 1.5–2× raw; Elastic with compression closer to 1.0–1.3×. Use the overhead field to enter your vendor's observed ratio from a pilot deployment.
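One way to derive the observed ratio from a pilot is to divide the bytes the SIEM reports as stored by the raw bytes sent to it over the same period. The sketch below assumes you can obtain both numbers from your forwarders and your SIEM's storage metrics; the function name and sample figures are hypothetical.

```python
def observed_overhead(raw_bytes: float, stored_bytes: float) -> float:
    """Ratio of on-disk volume to raw ingested volume for the same time window."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return stored_bytes / raw_bytes


# Hypothetical pilot: 200 GB of raw logs sent, 270 GB consumed on disk.
ratio = observed_overhead(raw_bytes=200e9, stored_bytes=270e9)
print(f"overhead factor: {ratio:.2f}")
```

A ratio below 1.0 is possible when compression outweighs indexing overhead, so the result should not be clamped.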
Storage Tiers
Not all retention data needs the same access speed or cost tier:
| Tier | Access time | Relative cost | Typical use |
|---|---|---|---|
| Hot (SSD/SAN) | <1 second | High | Last 7–30 days, active investigation |
| Warm (spinning disk) | Seconds | Medium | 30–90 days, routine queries |
| Cold (object store, S3/GCS) | Minutes | Very low | 90–365+ days, compliance archive |
A tiered architecture — hot for recent data, cold for compliance — can reduce storage cost by 60–80% compared to keeping everything on SSD. Verify your SIEM supports tiered storage before designing for it.
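The effect of tiering on cost can be estimated with a small model. The per-GB prices, tier boundaries, and daily volume below are hypothetical placeholders, not vendor quotes, and the model assumes a steady state in which each tier holds its full number of days of data.

```python
def monthly_cost(daily_gb: float, tiers: list) -> float:
    """Steady-state monthly cost; tiers is a list of (days_in_tier, price_per_gb_month)."""
    return sum(daily_gb * days * price for days, price in tiers)


DAILY_GB = 100  # hypothetical stored volume per day

# Hypothetical prices in dollars per GB per month: hot 0.10, warm 0.05, cold 0.02.
tiered = monthly_cost(DAILY_GB, [(30, 0.10), (60, 0.05), (275, 0.02)])
all_hot = monthly_cost(DAILY_GB, [(365, 0.10)])

savings = 1 - tiered / all_hot
print(f"tiered ${tiered:,.0f}/month vs all-hot ${all_hot:,.0f}/month ({savings:.0%} lower)")
```

With these placeholder prices the tiered layout comes out roughly two-thirds cheaper; actual savings depend on your real prices and on any retrieval or rehydration fees, which this sketch ignores.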
Frequently asked questions
Should I use peak or average EPS?
Use average EPS for storage sizing and peak EPS for ingest pipeline capacity planning.
Storage is determined by total volume over time — average EPS × seconds is the right metric. But your ingest pipeline (log forwarders, collectors, message queues) must handle peak load without dropping events. Peaks during incident response, scans, or business hours can be 5–10× the average EPS. Size ingest capacity for peak; size storage for average.
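The two sizing targets can be computed side by side. This is a sketch with illustrative names; the 8× peak multiplier is an assumed value picked from within the 5–10× range mentioned above.

```python
SECONDS_PER_DAY = 86_400


def storage_bytes_per_day(avg_eps: float, avg_event_bytes: float) -> float:
    """Storage sizing uses the average rate over the whole day."""
    return avg_eps * avg_event_bytes * SECONDS_PER_DAY


def ingest_bytes_per_sec(avg_eps: float, avg_event_bytes: float, peak_multiplier: float) -> float:
    """Pipeline sizing uses the peak rate the collectors must sustain."""
    return avg_eps * peak_multiplier * avg_event_bytes


avg_eps, avg_bytes = 2_500, 650
storage = storage_bytes_per_day(avg_eps, avg_bytes)
throughput = ingest_bytes_per_sec(avg_eps, avg_bytes, peak_multiplier=8)  # assumed peak factor

print(f"storage: {storage / 1e9:.1f} GB/day raw")
print(f"pipeline: {throughput / 1e6:.1f} MB/s at peak")
```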
How do I find my actual EPS?
If you have an existing SIEM or log management system, query the events-per-second metric from its monitoring dashboard. Most SIEMs expose EPS as a built-in operational metric.
For a new deployment, run your log sources into a sample pipeline for at least one week, so the sample covers business days, a weekend, and ideally a period of elevated activity. Divide total events by total seconds to get average EPS. This sample is far more reliable than a theoretical estimate, and it also gives you average event sizes from real log data.
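The sample measurement reduces to two divisions. The sketch below assumes you have recorded the total event count and total bytes for the sample window; the function name and the sample figures are hypothetical.

```python
from datetime import datetime


def summarize_sample(total_events: int, total_bytes: int, start: datetime, end: datetime):
    """Return (average EPS, average event size in bytes) for a sample window."""
    seconds = (end - start).total_seconds()
    if seconds <= 0 or total_events <= 0:
        raise ValueError("sample window and event count must be positive")
    return total_events / seconds, total_bytes / total_events


# Hypothetical one-week sample.
avg_eps, avg_size = summarize_sample(
    total_events=1_512_000_000,
    total_bytes=982_800_000_000,
    start=datetime(2024, 3, 4),
    end=datetime(2024, 3, 11),
)
print(f"{avg_eps:.0f} EPS average, {avg_size:.0f} bytes per event")
```

Running the same summary per log source, rather than only on the combined stream, gives the per-source EPS and event-size inputs the calculator expects.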
What retention period should I plan for?
Start with your legal, regulatory, customer, and internal policy requirements. Do not assume one retention period fits every organization or every log type.
Useful planning anchors:
- PCI DSS commonly requires audit log history for at least 12 months, with at least the most recent 3 months immediately available for analysis.
- GDPR-style storage limitation does not set one fixed security-log period; personal data should be kept only as long as necessary for the purpose and legal basis.
- SOC 2, ISO 27001, and internal security policies usually depend on the controls, contracts, risk assessment, and auditor expectations.
From a threat-hunting perspective, many teams keep 30–90 days searchable and move older logs to cheaper retention. Confirm the final retention design with compliance, legal, and security leadership.
How does SIEM compression affect the storage estimate?
Log data compresses very well — text-based formats (syslog, JSON, CEF) typically achieve 70–85% compression. However, SIEM vendors store much more than raw events: parsed fields, inverted indexes for fast search, correlation state, and metadata.
The net result varies significantly:
- Splunk SmartStore: expect 1.5–2× raw event volume after indexing
- Elastic (ECS + ILM): with compression enabled, 1.0–1.4× raw
- Microsoft Sentinel (Log Analytics): ~1.2× raw for most log types
Run a pilot with 1–2 representative log sources before committing to a storage architecture. Vendor sizing tools offer a starting estimate but tend to be conservative.