I'm a Data Scientist and AI Resident building production agentic AI systems, with an MSc in Artificial Intelligence from the University of Aberdeen.
Most AI demos work because you control the inputs. Production is different. Evaluation gets expensive fast. Token costs explode at scale. Agents return plausible but wrong outputs that pass all your validation checks and you only find out when you dig into the logs. That is the problem I work on every day.
At VectorCube I built a multi-agent supply chain risk assessment system using LangGraph hierarchical orchestration. It compressed analyst review from hours to under 30 seconds, used a keyword-scored context gate that cut token consumption by 92%, and included an LLM-as-judge evaluation module that measures synthesis quality across 5 dimensions. At Apziva I built a Bitcoin trading bot with ChromaDB RAG pipelines running 24/7 on GCP.
Currently working on a Sales CRM Deep Q-Network agent that improved customer acquisition 3.16x from a 0.44% baseline while dealing with 65:1 class imbalance.
Building ML systems that actually work in production. There is a real gap between something working on your laptop and something working reliably for real users. That gap is where the actual engineering happens.
Agentic AI and multi-agent systems. Agents that break down complex tasks, use tools, and self-correct. LangGraph has been my main framework for building these with hierarchical supervisor patterns.
LLM evaluation. LLM-as-judge frameworks, HITL controls, Pydantic guardrails. Getting a system to actually work in production requires knowing how to measure whether it is working at all.
Reinforcement learning for real problems. DQN achieving 3.16x improvement on actual business metrics. The algorithms work beyond toy problems when you set them up properly.
Computer vision. Page flip detection for mobile document scanning. Fake audio detection through spectrogram analysis. Vision keeps surprising me with how effective it is when applied to the right problem.
Transformer decoder architecture. Working through DeepLearning.AI Transformers in Practice. Recently went deep on autoregressive generation: how models build responses one token at a time by computing a probability distribution over the vocabulary and choosing the next token by greedy decoding or sampling. Also studied attention and how Query, Key and Value vectors let each token update its meaning based on surrounding context.
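A minimal sketch of those two ideas in plain Python, to make them concrete. The toy vocabulary, logits, and vectors are made up for illustration and do not come from any real model; the function names are my own, not a library API. It shows one decoding step (greedy versus sampled) and scaled dot-product attention for a single query.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(probs):
    """Greedy decoding: always take the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sampled_pick(probs, rng):
    """Sampling: draw a token index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)  # how much the query attends to each key
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Hypothetical toy vocabulary and logits, standing in for a real model's output.
vocab = ["the", "cat", "sat", "mat"]
logits = [1.0, 3.0, 0.5, 2.0]

probs = softmax(logits)
greedy_token = vocab[greedy_pick(probs)]
sampled_token = vocab[sampled_pick(probs, random.Random(0))]

# The query matches the first key more closely, so the output leans
# towards the first value vector.
context = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

Greedy decoding is deterministic and always returns the highest-probability token, while sampling can return any token with non-zero probability. A real decoder repeats this step, appending each chosen token to the input before predicting the next one.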
AI system evaluation. Studying pre-deployment and post-deployment evaluation properly. Building reference datasets, defining metrics around real failure patterns rather than generic ones, and tracing input and output at every step so you can actually see where things go wrong.
Open-source ML/AI community. Part of a collaborative learning group working through gradient boosting, ensemble methods, regularisation, and agentic AI patterns together.
Production ML. FastAPI, PostgreSQL, APScheduler, tmux, GitHub Actions for CI/CD, GCP for cloud deployment. The unglamorous stuff that makes things actually run.
Dwarkesh Patel and Lex Fridman podcasts. Research papers that pull me down rabbit holes at 2am. The open-source community that shares everything freely. Currently reading The Alignment Problem by Brian Christian.
Playing football, still waiting for the Premier League call-up. Hiking when I can, the Aberdeenshire trails were something else. Podcasts and coffee and thinking about AI alignment.
Trying to explain what I do to non-tech friends:
"So you teach computers to think?" "Not exactly..." 15 minutes later "So... you teach computers to think?" "Close enough."
Always happy to talk AI, ML engineering, or swap learning resources.
📧 krishnanair041@gmail.com 🔗 https://www.linkedin.com/in/krishna-balachandran-nair-46621987/
