DataLens is a local-first log/event analytics engine built with Java 21, Spring Boot 4, and PostgreSQL. It ingests high-volume synthetic events (100K to 5M rows), serves real-time analytics APIs (error rate, top IPs, p95 latency, sliding windows), and includes benchmark + EXPLAIN tooling to compare query/index strategies in a reproducible way.
- High-volume ingestion with two modes:
  - JDBC batch insert (fast path)
  - JPA batch insert (reference path)
- Analytics endpoints for operational and security signals:
  - Error-rate time series
  - Top suspicious IPs
  - Top endpoints with average latency
  - P95 latency time series
  - Sliding-window error analytics
- Index profile manager for performance experiments:
  - `baseline` profile (minimal index set)
  - `optimized` profile (analytics index set)
- Benchmark subsystem:
  - Repeated timings for scenarios
  - Mean / p95 / min / max metrics
  - Benchmark run persistence in `benchmark_runs`
- EXPLAIN ANALYZE endpoint for query plan visibility
- Integration tests with Testcontainers PostgreSQL (auto-skips if Docker is unavailable)
- GitHub Actions CI workflow (`mvn test` + optional gitleaks scan)
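The mean / p95 / min / max metrics above imply a percentile computation over repeated timings. A minimal sketch of a nearest-rank p95, assuming in-memory samples (this is one common convention, not necessarily the project's actual implementation):

```java
import java.util.Arrays;

public class Percentile {
    /**
     * Nearest-rank percentile: sort the samples and take the value at
     * ceil(p/100 * n) - 1. Hypothetical helper, not DataLens code.
     */
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        double[] timingsMs = {10.29, 25.98, 13.2, 12.5, 11.1};
        System.out.println(percentile(timingsMs, 95)); // -> 25.98
        System.out.println(percentile(timingsMs, 50)); // -> 12.5
    }
}
```

With five iterations, nearest-rank p95 is simply the maximum sample, which is why `p95Ms` can equal `maxMs` in the benchmark output below.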
```mermaid
flowchart LR
  Client[Client / curl / Postman] --> API[Spring Boot API]
  API --> A[AnalyticsController]
  API --> D[DevController]
  D --> I[EventIngestionService]
  D --> IDX[IndexManagementService]
  D --> B[BenchmarkService]
  A --> Q[AnalyticsService]
  I --> G[SyntheticLogEventGenerator]
  I --> PG[(PostgreSQL)]
  Q --> PG
  IDX --> PG
  B --> PG
  B --> BR[(benchmark_runs)]
```
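The `SyntheticLogEventGenerator` node in the diagram produces the seeded events. A hypothetical sketch of what such a generator might look like — the field names, value pools, and error ratio here are assumptions, not the project's actual code:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SyntheticEvents {
    // Minimal event shape for illustration; the real entity likely has more fields.
    record LogEvent(Instant ts, String service, String endpoint,
                    int status, int latencyMs, String ip) {}

    static final String[] SERVICES = {"api", "gateway", "auth"};
    static final String[] ENDPOINTS = {"/api/login", "/api/search", "/api/orders"};

    static List<LogEvent> generate(int n, int days, long seed) {
        Random rnd = new Random(seed);          // fixed seed keeps runs reproducible
        Instant now = Instant.now();
        long spanSec = days * 86_400L;          // spread timestamps over the last N days
        List<LogEvent> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            Instant ts = now.minusSeconds((long) (rnd.nextDouble() * spanSec));
            int status = rnd.nextDouble() < 0.05 ? 500 : 200; // ~5% errors (an assumption)
            out.add(new LogEvent(ts,
                    SERVICES[rnd.nextInt(SERVICES.length)],
                    ENDPOINTS[rnd.nextInt(ENDPOINTS.length)],
                    status,
                    5 + rnd.nextInt(995),       // latency in [5, 999] ms
                    "10.0.0." + rnd.nextInt(256)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(generate(1000, 14, 42L).size()); // -> 1000
    }
}
```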
- Java 21
- Spring Boot 4.0.3
- Maven
- PostgreSQL 16 (Docker Compose)
- Spring Data JPA + JdbcTemplate
- Flyway (schema/index migration ownership)
- Testcontainers (integration tests)
- Start PostgreSQL:

  ```bash
  docker compose up -d
  ```

- Start the app:

  ```bash
  ./mvnw spring-boot:run
  ```

The app listens on http://localhost:8081.
docker-compose.yml provisions:

- image: `postgres:16`
- db: `datalens`
- user/password: `datalens` / `datalens`
- port mapping: `5433:5432`
- persistent volume: `datalens_pgdata`
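Based on the values above, the compose file presumably looks roughly like this (a sketch of the likely shape, not the exact file in the repo):

```yaml
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: datalens
      POSTGRES_USER: datalens
      POSTGRES_PASSWORD: datalens
    ports:
      - "5433:5432"           # host 5433 -> container 5432
    volumes:
      - datalens_pgdata:/var/lib/postgresql/data

volumes:
  datalens_pgdata:
```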
Seed 100K events via the JDBC fast path:

```bash
curl -X POST "http://localhost:8081/api/dev/seed?n=100000&mode=jdbc&days=14"
```

Seed 250K via JPA batch mode:

```bash
curl -X POST "http://localhost:8081/api/dev/seed?n=250000&mode=jpa&days=30"
```

Example response:

```json
{
  "requested": 100000,
  "inserted": 100000,
  "mode": "jdbc",
  "elapsedMs": 3495,
  "totalRows": 100000
}
```

Health + row count:

```bash
curl "http://localhost:8081/api/analytics/health"
```

Error-rate time series:

```bash
curl "http://localhost:8081/api/analytics/error-rate?service=api&bucketMinutes=5"
```

Top IPs:

```bash
curl "http://localhost:8081/api/analytics/top-ips?limit=20"
```

P95 latency:

```bash
curl "http://localhost:8081/api/analytics/p95-latency?service=gateway&bucketMinutes=5"
```

Sliding-window errors:

```bash
curl "http://localhost:8081/api/analytics/sliding-window-errors?windowMinutes=15&stepMinutes=5"
```

Apply the baseline profile:

```bash
curl -X POST "http://localhost:8081/api/dev/indexes/apply?profile=baseline"
```

Apply the optimized profile:

```bash
curl -X POST "http://localhost:8081/api/dev/indexes/apply?profile=optimized"
```

Drop the optimized indexes:

```bash
curl -X POST "http://localhost:8081/api/dev/indexes/drop?profile=optimized"
```

Run a benchmark scenario:

```bash
curl -X POST "http://localhost:8081/api/dev/benchmark/run?scenario=errorRate&iterations=5"
```

Sample output:
```json
{
  "scenario": "errorRate",
  "profile": "baseline",
  "iterations": 5,
  "variants": [
    {"variant": "date_trunc_bucket", "meanMs": 13.91, "p95Ms": 25.98, "minMs": 10.29, "maxMs": 25.98},
    {"variant": "generated_series_bucket", "meanMs": 8.1, "p95Ms": 9.32, "minMs": 7.32, "maxMs": 9.32}
  ],
  "totalDurationMs": 111
}
```

The test fixture in `src/test/resources/sql/deterministic_fixture.sql` is used to validate exact analytics behavior, not only response shape or smoke-level checks.
Example expected output (top-endpoints):

```json
[
  {"endpoint": "/api/login", "requestCount": 4, "avgLatencyMs": 250.0},
  {"endpoint": "/api/search", "requestCount": 3, "avgLatencyMs": 500.0},
  {"endpoint": "/api/orders", "requestCount": 3, "avgLatencyMs": 70.0}
]
```

Example expected output (sliding-window-errors, 5m window/step):

```json
[
  {"windowStart": "2026-01-01T00:00:00Z", "totalCount": 5, "errorCount": 1},
  {"windowStart": "2026-01-01T00:05:00Z", "totalCount": 5, "errorCount": 1},
  {"windowStart": "2026-01-01T00:10:00Z", "totalCount": 0, "errorCount": 0}
]
```

Example expected output (suspicious-ips, top record):

```json
{"ip": "10.0.0.1", "totalRequests": 6, "authFailures": 4, "maxRequestsPerMinute": 3, "suspicionScore": 13.1, "reasons": "auth_failures=4, max_rpm=3"}
```

| Dataset Size | Profile | Scenario | Variant | Mean (ms) | p95 (ms) |
|---|---|---|---|---|---|
| 100,000 | baseline | errorRate | date_trunc_bucket | 13.91 | 25.98 |
| 100,000 | baseline | errorRate | generated_series_bucket | 8.10 | 9.32 |
| 100,000 | optimized | errorRate | date_trunc_bucket | 12.16 | 16.42 |
| 100,000 | optimized | errorRate | generated_series_bucket | 54.24 | 221.58 |
| 100,000 | baseline | topIps | base | 14.08 | 14.49 |
| 100,000 | optimized | topIps | base | 9.69 | 13.72 |
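The suspicious-ips record earlier combines auth failures and per-minute request bursts into one score. A hypothetical weighted heuristic in the same spirit — the weights and formula here are assumptions for illustration and do not reproduce the project's actual scoring:

```java
public class SuspicionScore {
    // Hypothetical weights; the real heuristic's coefficients are not documented here.
    static final double AUTH_FAILURE_WEIGHT = 2.0;
    static final double BURST_WEIGHT = 1.5;

    /** Higher auth-failure counts and request bursts yield a higher score. */
    static double score(int authFailures, int maxRequestsPerMinute) {
        return authFailures * AUTH_FAILURE_WEIGHT
             + maxRequestsPerMinute * BURST_WEIGHT;
    }

    public static void main(String[] args) {
        // An IP with repeated auth failures and a burst ranks above a quiet one.
        System.out.println(score(4, 3) > score(0, 1)); // -> true
    }
}
```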
Error-rate explain:

```bash
curl "http://localhost:8081/api/dev/explain?scenario=errorRate&profile=optimized&variant=date_trunc_bucket"
```

Top-ips explain:

```bash
curl "http://localhost:8081/api/dev/explain?scenario=topIps&profile=optimized&variant=base"
```

Typical plan snippets:

```
Bitmap Index Scan on idx_log_event_ts
Planning Time: 0.092 ms
Execution Time: 8.456 ms
```

```
HashAggregate
Sort Method: top-N heapsort
Execution Time: 7.114 ms
```
- JDBC batch ingestion is consistently faster than JPA batch for high-cardinality inserts.
- The `optimized` index profile improved `topIps` in this run (mean 14.08 ms -> 9.69 ms).
- For error-rate bucketing:
  - Direct bucketing with epoch/trunc stayed stable (~12-14 ms).
  - `generate_series` showed higher variance under the optimized profile in this run.
- EXPLAIN plans clearly show index-usage differences between the baseline and optimized profiles.
- Benchmark values can vary with host resources, container warmup, PostgreSQL cache state, and data distribution; compare profiles on the same machine and the same dataset window.
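The "epoch/trunc" bucketing mentioned above floors each timestamp to the start of its fixed-width bucket. The same arithmetic expressed in Java, as a sketch of the idea (not the SQL the service actually runs):

```java
import java.time.Instant;

public class TimeBucket {
    /**
     * Floor an instant to the start of its bucketMinutes-wide bucket —
     * the epoch-arithmetic equivalent of SQL date_trunc-style bucketing.
     */
    static Instant bucketStart(Instant ts, int bucketMinutes) {
        long bucketSec = bucketMinutes * 60L;
        long epoch = ts.getEpochSecond();
        return Instant.ofEpochSecond(epoch - Math.floorMod(epoch, bucketSec));
    }

    public static void main(String[] args) {
        System.out.println(bucketStart(Instant.parse("2026-01-01T00:07:31Z"), 5));
        // -> 2026-01-01T00:05:00Z
    }
}
```

Grouping events by this bucket start gives the per-bucket counts directly; the `generate_series` variant instead enumerates every bucket in the range and joins events onto it, which also surfaces empty buckets.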
Run all tests:

```bash
./mvnw test
```

Notes:

- Integration tests run against a Testcontainers PostgreSQL instance, so Postgres-specific SQL and query plans are exercised against a real database.
- If Docker is unavailable locally, container-based tests are skipped automatically.
GitHub Actions pipeline (.github/workflows/ci.yml):

- `mvn -B test`
- optional gitleaks scan (non-blocking)
- surefire reports uploaded as a workflow artifact
Use the repo script to generate real benchmark/explain/output artifacts:

```bash
pwsh ./scripts/final-proof.ps1 -BaseUrl http://localhost:8081 -Dataset 100000 -Iterations 5
```

Artifacts are written to docs/proof/. Final checklist: docs/FINALIZATION.md.
Generated evidence in docs/proof/ includes:
- seed run outputs
- baseline vs optimized benchmark results
- EXPLAIN ANALYZE responses
- sample analytics endpoint responses
- Built a high-volume event analytics backend with dual ingestion paths (JDBC/JPA) up to multi-million rows.
- Designed and benchmarked query/index strategies using PostgreSQL native analytics SQL and EXPLAIN ANALYZE.
- Implemented reproducible performance experiments with index profile switching and persisted benchmark runs.
- Delivered production-style API layering, validation, error handling, integration tests, and CI automation.
- Demonstrated security-oriented analytics via suspicious IP heuristics and burst detection.