Bina-man/README.md

 ██████╗ ██╗███╗   ██╗██╗   ██╗ █████╗ ███╗   ███╗
 ██╔══██╗██║████╗  ██║╚██╗ ██╔╝██╔══██╗████╗ ████║
 ██████╔╝██║██╔██╗ ██║ ╚████╔╝ ███████║██╔████╔██║
 ██╔══██╗██║██║╚██╗██║  ╚██╔╝  ██╔══██║██║╚██╔╝██║
 ██████╔╝██║██║ ╚████║   ██║   ██║  ██║██║ ╚═╝ ██║
 ╚═════╝ ╚═╝╚═╝  ╚═══╝   ╚═╝   ╚═╝  ╚═╝╚═╝     ╚═╝
 Sr. Data Scientist · ML Systems · LLM Engineering · AdTech

Senior Data Scientist specializing in end-to-end ML system design, spanning quantile regression, temporal clustering, causal inference, anomaly detection, and LLM-powered agent pipelines. I build production systems for latency-sensitive AdTech environments, where statistical rigor and engineering precision drive measurable outcomes at scale.


⚡ ML Architecture · Pipeline as Neural Net

   EDA              EMBED             MODEL             SERVE           FEEDBACK
   ───              ─────             ─────             ─────           ────────

  Polars ●─────────● Word2Vec ───────● LightGBM ───────● Lambda ───────● Thompson
         ╲         ╱╲        ╲      ╱╲          ╲     ╱╲        ╲     ╱
          ╲       ╱  ╲        ╲    ╱  ╲           ╲   ╱  ╲        ╲   ╱
   Kafka  ●─────────● BERT    ─────── ● DCN    ───────── ● O(1)   ───● Elasticity
          ╱       ╲  ╱        ╱    ╲  ╱           ╱   ╲  ╱        ╱   ╲
         ╱         ╲╱        ╱      ╲╱            ╱    ╲╱         ╱     ╲
     DSP ●─────────● HMM     ───────● Iso.Forest ───────● RT infer──────● AutoLoop

          └─────────────────────────────────────────────────────────────────┘
                         hourly recalibration feedback arc

🤖 LLM Engineering · Agents · Retrieval

Agentic Pipeline Architecture

┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────────┐    ┌──────────────┐
│  User Query │───▶│   Planner    │───▶│  Tool Use   │───▶│  Reflection  │───▶│   Response   │
│             │    │  LangGraph   │    │  MCP · RAG  │    │ self-critique│    │   grounded   │
└─────────────┘    └──────────────┘    └─────────────┘    └──────────────┘    └──────────────┘
                          │                   │
                          ▼                   ▼
                   ┌────────────┐     ┌──────────────┐
                   │  Memory    │     │  Vector DB   │
                   │  store     │     │  FAISS·pgvec │
                   └────────────┘     └──────────────┘
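
The control flow in the diagram (plan, call tools, reflect, respond) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: it does not use LangGraph or MCP, and `AgentState`, `plan`, `run_tool`, `critique`, and `answer` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable

Tool = Callable[[str], list[str]]


@dataclass
class AgentState:
    """Hypothetical state passed between the stages of the loop."""
    query: str
    memory: list[str] = field(default_factory=list)    # stands in for the memory store
    evidence: list[str] = field(default_factory=list)  # stands in for retrieved context
    draft: str = ""


def plan(state: AgentState) -> list[str]:
    # Planner: decide which tools to call. This toy planner always retrieves once.
    return ["retrieve"]


def run_tool(name: str, state: AgentState, tools: dict[str, Tool]) -> None:
    # Tool use: call the named tool and keep its output as evidence.
    state.evidence.extend(tools[name](state.query))


def critique(state: AgentState) -> bool:
    # Reflection: accept the draft only if it is non-empty and backed by evidence.
    return bool(state.evidence) and bool(state.draft)


def answer(query: str, tools: dict[str, Tool], max_rounds: int = 2) -> str:
    state = AgentState(query=query)
    for _ in range(max_rounds):
        for step in plan(state):
            run_tool(step, state, tools)
        state.draft = " ".join(state.evidence)
        state.memory.append(state.draft)
        if critique(state):
            return state.draft
    return "No grounded answer found."
```

Calling `answer("bids", {"retrieve": lambda q: ["doc about " + q]})` returns the retrieved text, while a tool that returns nothing falls through to the fallback message after `max_rounds` attempts.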

Retrieval Stack

| Layer | Method | Detail |
|---|---|---|
| Embedding | Dense + sparse hybrid | BERT, Word2Vec, BM25 fusion |
| Indexing | HNSW approximate NN | 200M+ document scale |
| Re-ranking | Cross-encoder | Precision boost post-retrieval |
| Entity linking | Custom taxonomy mapper | Publisher content → audience graph |
| Storage | FAISS · pgvector | On-prem and cloud portable |
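
The embedding layer fuses dense and sparse (BM25) results. The profile does not say which fusion method is used; reciprocal rank fusion (RRF) is one common choice and is shown here purely as an assumption. The function name, the constant `k=60`, and the document ids are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["d3", "d1", "d2"]   # hypothetical dense (embedding) ranking
sparse = ["d1", "d4", "d3"]  # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense, sparse])
```

RRF works on ranks only, so it needs no score normalization between the dense and sparse retrievers. A cross-encoder re-ranker would then be applied to the top of `fused`.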

LLM Routing Strategy

Query complexity assessment
        │
        ├──▶ Simple retrieval   ──▶  Haiku  (fast · cheap)
        ├──▶ Structured output  ──▶  Sonnet (balanced)
        └──▶ Complex generation ──▶  Opus   (frontier)
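
A rough sketch of the routing tree above follows. Only the three-way mapping comes from the diagram; the complexity heuristic (keyword cues and a word-count threshold) and the names `Tier` and `route` are assumptions made for illustration, and no model API is called.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"    # fast, cheap
    SONNET = "sonnet"  # balanced
    OPUS = "opus"      # frontier


# Hypothetical cues that suggest a multi-step, generative request.
COMPLEX_CUES = ("explain", "compare", "design", "why")


def route(query: str, needs_structured_output: bool = False) -> Tier:
    """Pick a model tier from a crude complexity assessment."""
    text = query.lower()
    is_complex = any(cue in text for cue in COMPLEX_CUES) or len(text.split()) > 40
    if is_complex:
        return Tier.OPUS
    if needs_structured_output:
        return Tier.SONNET
    return Tier.HAIKU
```

In practice the assessment step could be a small classifier or a cheap model call; the point of the sketch is that routing is decided before any expensive generation happens.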

Tools & Frameworks: LangGraph · LangSmith · PydanticAI · MCP · FAISS · pgvector · RAG


🏭 Production ML Pipeline

01 INGEST        02 FEATURES      03 MODEL         04 SERVE         05 FEEDBACK
─────────        ───────────      ─────────        ────────         ───────────
Polars · Kafka   Word2Vec         LightGBM QR      Lambda           Thompson MAB
Bidstream EDA    HMM states       DCN features     O(1) lookup      Auto-calibrate
DSP signals      GloVe embeds     Anomaly detect   Real-time        Price elasticity
     │                │                │                │                │
     └────────────────┴────────────────┴────────────────┴────────────────┘
                              Feedback loop (hourly recalibration)
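
The feedback stage names a Thompson-sampling multi-armed bandit. A minimal sketch of that idea, assuming a Beta-Bernoulli model over a few candidate floor prices, is shown below. The class `FloorBandit`, the floor values, and the simulated clear rates are hypothetical; the profile does not describe the production reward model.

```python
import random


class FloorBandit:
    """Beta-Bernoulli Thompson sampling over candidate floor prices."""

    def __init__(self, floors, seed=None):
        self.floors = list(floors)
        self.wins = [1] * len(self.floors)    # Beta prior alpha = 1
        self.losses = [1] * len(self.floors)  # Beta prior beta = 1
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Sample a clear rate per arm, then pick the highest sampled revenue.
        sampled = [
            floor * self.rng.betavariate(w, l)
            for floor, w, l in zip(self.floors, self.wins, self.losses)
        ]
        return max(range(len(sampled)), key=sampled.__getitem__)

    def update(self, arm: int, cleared: bool) -> None:
        # Posterior update: count whether the auction cleared at this floor.
        if cleared:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1


def simulate(rounds=5000, seed=0):
    floors = [0.5, 1.0, 2.0]
    true_clear = [0.9, 0.6, 0.1]  # simulated; expected revenue 0.45, 0.60, 0.20
    bandit = FloorBandit(floors, seed=seed)
    env = random.Random(seed + 1)
    pulls = [0] * len(floors)
    for _ in range(rounds):
        arm = bandit.choose()
        pulls[arm] += 1
        bandit.update(arm, env.random() < true_clear[arm])
    return pulls
```

`simulate()` returns how often each floor was chosen; the bandit should concentrate its pulls on the floor with the highest expected revenue. An hourly recalibration could apply `update` in batches from logged outcomes rather than per request.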

📊 Impact at a Glance

| Metric | Result |
|---|---|
| Bid request reduction | 50%+ |
| GCPM gain | 2×+ |
| Pipeline speedup | 10× |
| Directional decision accuracy | 76% |
| Daily revenue lift | $44–$500 |
| ID5 integration revenue | $10K+/day |
| Infra cost reduction | 61% ($7.63 → $3/hr) |

🧠 ML Competencies

Bid floor optimization    ████████████████████  95%
Quantile regression       ███████████████████   92%
LLM agents · LangGraph    ██████████████████    88%
Anomaly detection · IVT   ██████████████████    88%
Embedding · HNSW · RAG    █████████████████     87%
NLP · BERT · Word2Vec     █████████████████     86%
Thompson Sampling · MAB   ████████████████      83%
Hidden Markov Models      ████████████████      82%
Causal inference          ███████████████       80%

🔧 Tech Stack

Core ML

Python · LightGBM · XGBoost · scikit-learn · TensorFlow

LLM & Agents

LangGraph · LangSmith · PydanticAI · MCP · FAISS · pgvector · RAG

Data & Infrastructure

Polars · Apache Spark · Apache Kafka · Airflow · Docker · MLflow · dbt · Terraform

Cloud

AWS · Amazon Kinesis · Amazon SageMaker · AWS Lambda · Amazon Redshift


🗂 Open Source Projects

All Repositories



Addis Ababa, ET · Open to remote · AdTech · ML Systems · LLM Engineering

Pinned

  1. Answers-Challenges Public

    This repo is my answer to multiple challenges from different sites.

    TSQL 3

  2. game Public

    JavaScript

  3. Sensor_data_ETL Public

    Data engineering project demonstrating ETL for sensor data. Includes Airflow for automation and dbt for templating and documentation.

    Python 3

  4. Pharmaceutical-Sales-Prediction Public

    Time series analysis of pharmaceutical sales, forecasting sales across several cities six weeks ahead using three years of consumer and sales data from multiple stores.

    Jupyter Notebook 1

  5. Casualty-Challenge/Breast_Cancer_Causality_Inference Public

    A causal graph is a central object in the framework, but it is often unknown, subject to personal knowledge and bias, or loosely connected to the available data. The main objective of the task is t…

    Jupyter Notebook 4 4

  6. Equb Public

    M-application that demonstrates pixel-perfect UI replication, efficient state management using Provider, and advanced Git techniques. The project integrates with mock APIs for dynamic financial tra…

    Dart