Real-Time Log Aggregator
A high-throughput log aggregation pipeline processing 500K events/sec with sub-second search latency.
Overview
A distributed log aggregation system, built primarily in Rust, that ingests, transforms, and indexes log events at scale. It was designed to replace a legacy ELK stack that could no longer keep up with growing traffic.
Problem
The existing logging infrastructure had several pain points:
- Ingestion lag — during traffic spikes, Logstash would fall behind by minutes, making real-time debugging impossible.
- High resource usage — the JVM-based pipeline consumed 64 GB of RAM per node.
- Schema drift — unstructured logs made it hard to build reliable dashboards.
Solution
A purpose-built pipeline with three stages:
- Collector — lightweight Rust agent (< 10 MB memory) that tails log files, parses structured fields, and publishes to Kafka.
- Transformer — stream processing layer that enriches events (geo-IP, user lookup), normalizes schemas, and routes to appropriate indices.
- Query API — TypeScript/Node.js service with a search UI supporting full-text search, field filtering, and time-range queries.
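To illustrate the collector's parsing stage, here is a minimal sketch of structured-field extraction from a logfmt-style line. The function name, the logfmt input format, and the simplified quoting rules are assumptions for illustration, not the production parser:

```rust
use std::collections::HashMap;

/// Parse a logfmt-style line (`key=value key2="quoted value"`) into fields.
/// Simplified sketch: a real collector would also handle escapes,
/// bare words without `=`, and malformed input.
fn parse_fields(line: &str) -> HashMap<String, String> {
    let mut fields = HashMap::new();
    let mut rest = line.trim();
    while let Some(eq) = rest.find('=') {
        let key = rest[..eq].trim().to_string();
        rest = &rest[eq + 1..];
        let value = if rest.starts_with('"') {
            // Quoted value: take everything up to the closing quote.
            match rest[1..].find('"') {
                Some(end) => {
                    let v = rest[1..1 + end].to_string();
                    rest = rest.get(end + 2..).unwrap_or("");
                    v
                }
                None => {
                    let v = rest[1..].to_string();
                    rest = "";
                    v
                }
            }
        } else {
            // Bare value: take everything up to the next space.
            match rest.find(' ') {
                Some(end) => {
                    let v = rest[..end].to_string();
                    rest = &rest[end + 1..];
                    v
                }
                None => {
                    let v = rest.to_string();
                    rest = "";
                    v
                }
            }
        };
        fields.insert(key, value);
    }
    fields
}

fn main() {
    let line = r#"level=info msg="user login" user_id=42"#;
    let fields = parse_fields(line);
    assert_eq!(fields["level"], "info");
    assert_eq!(fields["msg"], "user login");
    assert_eq!(fields["user_id"], "42");
}
```

Parsed fields like these are what the collector publishes to Kafka, so the transformer stage downstream never has to guess at log structure.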
Architecture
- Kafka provides durable, partitioned message transport
- Elasticsearch handles indexing and search
- The Rust collector uses zero-copy parsing for minimal overhead
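A minimal sketch of what zero-copy parsing means here: the parser returns `&str` slices that borrow from the input buffer rather than allocating owned strings per field. The three-part record layout is assumed for illustration:

```rust
/// Split a record into (timestamp, level, message) without copying:
/// the returned slices borrow from `line`, so no per-field allocation
/// occurs. Illustrative sketch, not the actual collector code.
fn split_record(line: &str) -> Option<(&str, &str, &str)> {
    // Assumed record layout: "<timestamp> <level> <message>"
    let mut parts = line.splitn(3, ' ');
    Some((parts.next()?, parts.next()?, parts.next()?))
}

fn main() {
    let buf = String::from("2024-05-01T12:00:00Z INFO connection accepted");
    let (ts, level, msg) = split_record(&buf).unwrap();
    // ts/level/msg are views into `buf`; nothing was copied.
    assert_eq!(ts, "2024-05-01T12:00:00Z");
    assert_eq!(level, "INFO");
    assert_eq!(msg, "connection accepted");
}
```

At hundreds of thousands of events per second, avoiding a heap allocation per field is where much of the collector's headroom over a JVM pipeline comes from.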
Key Learnings
- Rust's ownership model makes concurrent I/O safe and fast — zero data races in production over 18 months.
- Schema enforcement early in the pipeline prevents downstream chaos. We added a schema registry backed by Protobuf definitions.
- Sampling beats dropping. When overloaded, keeping a uniform 10% sample of debug logs is better than dropping them all — you retain statistical visibility into what the system was doing.
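The sampling idea above can be sketched as deterministic hash-based sampling: hash a stable event attribute and keep the event when the hash falls in the sample band. The `request_id` key and per-mille knob are hypothetical, not the production interface:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Keep roughly `sample_per_mille`/1000 of debug events, keyed on a
/// stable attribute so the decision is consistent for a given event
/// across collector replicas and restarts.
fn keep_debug_event(request_id: &str, sample_per_mille: u64) -> bool {
    let mut h = DefaultHasher::new();
    request_id.hash(&mut h);
    h.finish() % 1000 < sample_per_mille
}

fn main() {
    // With a uniform hash, roughly 10% of events survive a 100‰ sample.
    let kept = (0..10_000)
        .filter(|i| keep_debug_event(&format!("req-{i}"), 100))
        .count();
    println!("kept {kept} of 10000 debug events");
    assert!(kept > 700 && kept < 1300);
}
```

Hashing (rather than a per-node random draw) means all replicas make the same keep/drop decision for the same request, so a sampled request's debug logs survive end to end.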
Tech Stack
- Rust 1.75 (tokio async runtime)
- Apache Kafka 3.6
- Elasticsearch 8
- TypeScript, React (query dashboard)
- Docker, Kubernetes
- Prometheus, Grafana for monitoring