██████╗ ██╗███╗   ██╗██╗   ██╗ █████╗ ███╗   ███╗
██╔══██╗██║████╗  ██║╚██╗ ██╔╝██╔══██╗████╗ ████║
██████╔╝██║██╔██╗ ██║ ╚████╔╝ ███████║██╔████╔██║
██╔══██╗██║██║╚██╗██║  ╚██╔╝  ██╔══██║██║╚██╔╝██║
██████╔╝██║██║ ╚████║   ██║   ██║  ██║██║ ╚═╝ ██║
╚═════╝ ╚═╝╚═╝  ╚═══╝   ╚═╝   ╚═╝  ╚═╝╚═╝     ╚═╝
Sr. Data Scientist · ML Systems · LLM Engineering · AdTech
Senior Data Scientist specializing in end-to-end ML system design, spanning quantile regression, temporal clustering, causal inference, anomaly detection, and LLM-powered agent pipelines. I build production systems for latency-sensitive AdTech environments, where statistical rigor and engineering precision drive measurable outcomes at scale.
EDA EMBED MODEL SERVE FEEDBACK
─── ───── ───── ───── ────────
Polars ●─────────● Word2Vec ───────● LightGBM ───────● Lambda ───────● Thompson
╲ ╱╲ ╲ ╱╲ ╲ ╱╲ ╲ ╱
╲ ╱ ╲ ╲ ╱ ╲ ╲ ╱ ╲ ╲ ╱
Kafka ●─────────● BERT ─────── ● DCN ───────── ● O(1) ───● Elasticity
╱ ╲ ╱ ╱ ╲ ╱ ╱ ╲ ╱ ╱ ╲
╱ ╲╱ ╱ ╲╱ ╱ ╲╱ ╱ ╲
DSP ●─────────● HMM ───────● Iso.Forest ───────● RT infer──────● AutoLoop
└─────────────────────────────────────────────────────────────────┘
hourly recalibration feedback arc
┌─────────────┐ ┌──────────────┐ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐
│ User Query │───▶│ Planner │───▶│ Tool Use │───▶│ Reflection │───▶│ Response │
│ │ │ LangGraph │ │ MCP · RAG │ │ self-critique│ │ grounded │
└─────────────┘ └──────────────┘ └─────────────┘ └──────────────┘ └──────────────┘
│ │
▼ ▼
┌────────────┐ ┌──────────────┐
│ Memory │ │ Vector DB │
│ store │ │ FAISS·pgvec │
└────────────┘ └──────────────┘
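The flow above can be read as a plan, act, reflect loop. The sketch below is plain Python, not LangGraph code: `AgentState`, `plan`, `retrieve`, `critique`, `run_agent`, and the `DOCS` stub are all hypothetical names, and the stub retriever stands in for whatever MCP or RAG tools the real pipeline calls.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    query: str
    memory: list = field(default_factory=list)    # stands in for the memory store
    evidence: list = field(default_factory=list)  # tool outputs gathered so far


# Hypothetical stub corpus; a real pipeline would query a vector DB or an MCP tool.
DOCS = {
    "bid floor": "Bid floors are recalibrated hourly.",
    "ivt": "Invalid traffic is flagged by anomaly detection.",
}


def retrieve(query: str) -> list:
    """Tool use: return stub documents whose key appears in the query."""
    q = query.lower()
    return [text for key, text in DOCS.items() if key in q]


def plan(state: AgentState) -> str:
    """Planner: retrieve first, respond once some evidence exists."""
    return "retrieve" if not state.evidence else "respond"


def critique(state: AgentState) -> bool:
    """Reflection: accept a response only if it can cite evidence."""
    return len(state.evidence) > 0


def run_agent(query: str, max_steps: int = 3) -> str:
    state = AgentState(query=query)
    for _ in range(max_steps):
        step = plan(state)
        state.memory.append(step)  # record each decision in memory
        if step == "retrieve":
            found = retrieve(state.query)
            if not found:
                break  # nothing to ground on; stop instead of guessing
            state.evidence.extend(found)
        elif step == "respond" and critique(state):
            return "Grounded answer: " + " ".join(state.evidence)
    return "No grounded answer found."
```

Calling `run_agent("How is the bid floor set?")` takes one retrieval step and one response step; a query with no matching document falls through to the fallback string rather than producing an ungrounded answer.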
| Layer | Method | Detail |
|---|---|---|
| Embedding | Dense + sparse hybrid | BERT, Word2Vec, BM25 fusion |
| Indexing | HNSW approximate NN | 200M+ document scale |
| Re-ranking | Cross-encoder | Precision boost post-retrieval |
| Entity linking | Custom taxonomy mapper | Publisher content → audience graph |
| Storage | FAISS · pgvector | On-prem and cloud portable |
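The table lists dense + sparse fusion without naming the fusion rule. One common option is reciprocal rank fusion (RRF), shown here as an assumption rather than the method actually used; the document ids and the constant `k=60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids; higher fused score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by doc id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical embedding-search order
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical BM25 order
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents found by both retrievers (`doc_b`, `doc_a`) rise to the top of `fused`; in the stack above, that fused shortlist is what a cross-encoder would then re-rank.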
Query complexity assessment
│
├──▶ Simple retrieval ──▶ Haiku (fast · cheap)
├──▶ Structured output ──▶ Sonnet (balanced)
└──▶ Complex generation ──▶ Opus (frontier)
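A minimal sketch of the routing tree above. The three tiers come from the diagram; the keyword cues and the `route` function are hypothetical, and a production router might well use a trained classifier or token-count thresholds instead.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "Haiku"    # simple retrieval: fast, cheap
    SONNET = "Sonnet"  # structured output: balanced
    OPUS = "Opus"      # complex generation: frontier


# Hypothetical keyword cues used only for illustration.
STRUCTURED_CUES = ("json", "table", "schema", "extract")
COMPLEX_CUES = ("design", "analyze", "explain why", "write a report")


def route(query: str) -> Tier:
    """Assess query complexity and pick the cheapest tier that fits."""
    q = query.lower()
    if any(cue in q for cue in COMPLEX_CUES):
        return Tier.OPUS
    if any(cue in q for cue in STRUCTURED_CUES):
        return Tier.SONNET
    return Tier.HAIKU
```

Checking the complex cues first means a query that asks for both analysis and structured output is sent to the stronger tier.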
Tools & Frameworks: LangGraph · LangSmith · PydanticAI · MCP · FAISS · pgvector · RAG
01 INGEST        02 FEATURES      03 MODEL         04 SERVE         05 FEEDBACK
─────────        ───────────      ─────────        ────────         ───────────
Polars · Kafka   Word2Vec         LightGBM QR      Lambda           Thompson MAB
Bidstream EDA    HMM states       DCN features     O(1) lookup      Auto-calibrate
DSP signals      GloVe embeds     Anomaly detect   Real-time        Price elasticity
│                │                │                │                │
└────────────────┴────────────────┴────────────────┴────────────────┘
                Feedback loop (hourly recalibration)
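The feedback stage names a Thompson-sampling multi-armed bandit. The sketch below shows the Beta-Bernoulli form of that technique over a few candidate bid floors. The reward definition (a binary "impression sold" signal), the floor values, and the simulated success rates are all assumptions made for illustration; the README does not say how rewards are defined, and a real loop would presumably update in hourly batches from observed revenue.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms."""

    def __init__(self, arms, seed=0):
        self.arms = list(arms)
        self.wins = {a: 1 for a in self.arms}    # Beta prior alpha = 1
        self.losses = {a: 1 for a in self.arms}  # Beta prior beta = 1
        self.rng = random.Random(seed)

    def choose(self):
        # Draw one sample per arm from its posterior and play the best draw.
        draws = {
            a: self.rng.betavariate(self.wins[a], self.losses[a])
            for a in self.arms
        }
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        # A success raises alpha; a failure raises beta.
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1


# Simulated environment: made-up success rates per candidate floor (USD CPM).
true_rates = {0.50: 0.30, 1.00: 0.60, 1.50: 0.40}
sampler = ThompsonSampler(true_rates, seed=42)
env = random.Random(7)
pulls = {arm: 0 for arm in true_rates}
for _ in range(2000):
    arm = sampler.choose()
    pulls[arm] += 1
    sampler.update(arm, env.random() < true_rates[arm])
best_arm = max(pulls, key=pulls.get)
```

As the posteriors sharpen, most pulls concentrate on the floor with the highest simulated success rate, while the weaker floors still receive occasional exploratory pulls.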
| Metric | Result |
|---|---|
| Bid request reduction | 50%+ |
| GCPM gain | 2×+ |
| Pipeline speedup | 10× |
| Directional decision accuracy | 76% |
| Daily revenue lift | $44–$500 |
| ID5 integration revenue | $10K+/day |
| Infra cost reduction | 61% ($7.63 → $3/hr) |
Bid floor optimization ████████████████████ 95%
Quantile regression ███████████████████ 92%
LLM agents · LangGraph ██████████████████ 88%
Anomaly detection · IVT ██████████████████ 88%
Embedding · HNSW · RAG █████████████████ 87%
NLP · BERT · Word2Vec █████████████████ 86%
Thompson Sampling · MAB ████████████████ 83%
Hidden Markov Models ████████████████ 82%
Causal inference ███████████████ 80%
Core ML
LLM & Agents
Data & Infrastructure
Cloud
Addis Ababa, ET · Open to remote · AdTech · ML Systems · LLM Engineering



