Unified Project: Milestone 1 + Milestone 2
- Milestone 1 — ML-Based Student Performance Prediction
- Milestone 2 — Agentic AI Study Coach
- System Architecture
- Tech Stack
- Getting Started
- Project Structure
- Roadmap
Milestone 1 implements a machine learning pipeline that predicts student academic performance using demographic, academic, and behavioural data.
The system provides three outputs:
- Exam score prediction (regression)
- Pass/Fail classification
- Learner segmentation using clustering
A Streamlit dashboard visualizes the predictions interactively.
Live Demo:
https://predictive-learning-analytics-ml.streamlit.app/
| Model | Task | Performance |
|---|---|---|
| Linear Regression | Predict ExamScore | R² = 0.9397 |
| Logistic Regression | Pass/Fail | Accuracy = 91.76% |
| K-Means Clustering | Learner Segmentation | Silhouette = 0.2112 |
- WritingScore used as the proxy target for the exam score (no leakage risk)
- Median-based Pass/Fail threshold (69.0)
- Manual ordinal encoding for education-related features
- Midpoint encoding for study hours
- class_weight='balanced' instead of SMOTE
- k=3 clustering aligned with interpretability requirement
- Train-only scaling to avoid data leakage
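Three of the decisions above (midpoint encoding, the median-based threshold, and train-only scaling) can be sketched with the standard library alone. This is an illustrative sketch, not the project's code: the study-hours bucket labels and their midpoints, the toy scores, the `>=` pass rule, and all helper names are assumptions.

```python
from statistics import mean, median, pstdev

# Assumed bucket labels and midpoints for weekly study hours (illustrative only).
STUDY_HOUR_MIDPOINTS = {"< 5": 2.5, "5 - 10": 7.5, "> 10": 12.5}


def encode_study_hours(bucket):
    """Map a study-hours bucket to a numeric midpoint."""
    return STUDY_HOUR_MIDPOINTS[bucket]


def label_pass_fail(scores, threshold=None):
    """Label each score 1 (pass) or 0 (fail); the threshold defaults to the median.

    Treating a score equal to the threshold as a pass is an assumption.
    """
    if threshold is None:
        threshold = median(scores)
    return [1 if s >= threshold else 0 for s in scores]


def fit_scaler(train_values):
    """Compute mean and population standard deviation from training data only."""
    return mean(train_values), pstdev(train_values)


def apply_scaler(values, mu, sigma):
    """Standardize values with statistics that were fitted on the training split."""
    return [(v - mu) / sigma for v in values]


train = [60.0, 70.0, 80.0]
test = [90.0]
mu, sigma = fit_scaler(train)              # statistics come from train only
scaled_test = apply_scaler(test, mu, sigma)  # test data never influences mu or sigma
labels = label_pass_fail([55, 69, 72, 80], threshold=69.0)
```

Fitting the scaler on the training split and reusing its statistics on the test split is what keeps test-set information out of training.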
- 30,640 student records
- 14 original features
- 11 engineered features
Source: Kaggle Students Exam Scores Extended Dataset
- Data cleaning and normalization
- Missing value imputation
- Ordinal + one-hot encoding
- Outlier handling (IQR method)
- Feature scaling (StandardScaler)
- Train-test split (80/20)
- Model training and evaluation
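The IQR outlier step listed above can be sketched as follows. The source does not say whether outliers are removed or clipped, so clipping to the fences is an assumption here, as are the multiplier `k=1.5`, the quartile method, and the toy data.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Clip every value into the IQR fences (assumed handling strategy)."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
low, high = iqr_bounds(scores)
clipped = clip_outliers(scores)  # only the value 100 falls outside the fences
```

As with scaling, the fences would normally be computed on the training split and then applied to the test split.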
Milestone 2 transforms the ML system into a conversational agentic AI tutor.
Instead of static inputs, students interact through natural language, and the system dynamically decides:
- What analysis to run
- What knowledge to retrieve
- Whether to generate a study plan or quiz
- How to respond using ML + LLM reasoning
This is implemented using LangGraph-based multi-agent orchestration.
Students interact via chat instead of form inputs.
A structured graph of specialized nodes:
- Analyser Node (ML inference)
- Retriever Node (RAG system)
- Planner Node (study plan generation)
- Quizzer Node (MCQ system)
- End Node (response finalization)
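The routing idea behind this node graph can be sketched without any dependency. This is not LangGraph's API and not the project's implementation: the source does not say how the master node chooses a path, so the keyword rules, the placeholder nodes, and every name below are assumptions.

```python
State = dict


def make_node(name):
    """Build a placeholder node that only records that it ran."""
    def node(state):
        state["trace"].append(name)
        return state
    return node


# One placeholder per specialist node named in the list above.
NODES = {name: make_node(name)
         for name in ("analyser", "retriever", "planner", "quizzer", "end")}


def route(message):
    """Choose which nodes to run for a message (keyword rules are assumptions)."""
    text = message.lower()
    path = []
    if "score" in text or "predict" in text:
        path.append("analyser")
    if "plan" in text:
        path += ["retriever", "planner"]
    elif "quiz" in text:
        path += ["retriever", "quizzer"]
    path.append("end")
    return path


def run(message):
    """Pass a shared state dict through the chosen nodes in order."""
    state = {"message": message, "trace": []}
    for name in route(message):
        state = NODES[name](state)
    return state
```

For example, `run("Make me a study plan")["trace"]` visits the retriever, the planner, and the end node, in that order.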
- Extracts student data from natural language
- Runs all Milestone 1 models dynamically
- Predicts:
  - Exam score
  - Pass/Fail status
  - Learner category
- FAISS vector database
- Sentence-transformer embeddings (MiniLM)
- Academic knowledge base
- Optional Tavily web search fallback
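The retrieve-then-fall-back flow can be illustrated with a toy example. Word-count vectors stand in for the MiniLM embeddings and a sorted list stands in for the FAISS index. The documents, the `min_score` cutoff, and the rule that an empty result triggers the web fallback are all assumptions.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a sentence-transformer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2, min_score=0.1):
    """Return up to k documents whose similarity to the query is at least min_score."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(d)), d) for d in documents), reverse=True)
    return [d for score, d in scored[:k] if score >= min_score]


DOCS = [
    "spaced repetition improves long term retention",
    "the pythagorean theorem relates the sides of a right triangle",
    "active recall beats passive rereading for exam revision",
]

hits = retrieve("how does spaced repetition help retention", DOCS, k=1)
# Assumed rule: when nothing in the knowledge base matches, fall back to web search.
needs_web_fallback = not retrieve("quantum chromodynamics", DOCS)
```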
- 7-day structured study plan
- Generated from retrieved knowledge
- Adapted to learner category:
  - At-Risk → fundamentals + revision
  - Average → balanced learning
  - High Performer → advanced + challenge tasks
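A minimal sketch of category-aware planning, assuming the plan simply cycles through retrieved topics and through the focus areas listed above. In the described system the plan is generated from retrieved knowledge, so the function name, the cycling rule, and the sample topics are hypothetical.

```python
# Focus areas per learner category, taken from the list above.
FOCUS_BY_CATEGORY = {
    "At-Risk": ["fundamentals", "revision"],
    "Average": ["balanced learning"],
    "High Performer": ["advanced topics", "challenge tasks"],
}


def build_plan(category, topics, days=7):
    """Return one entry per day, cycling through the topics and focus areas."""
    focus = FOCUS_BY_CATEGORY[category]
    return [
        {
            "day": d + 1,
            "topic": topics[d % len(topics)],
            "focus": focus[d % len(focus)],
        }
        for d in range(days)
    ]


plan = build_plan("At-Risk", ["fractions", "algebra"])
```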
- 5 MCQs per session
- Auto-generated from retrieved content
- Automatic grading + explanations
- Performance-based feedback
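Grading and feedback for a five-question session might look like the sketch below. The `MCQ` structure, the feedback thresholds (80% and 50%), and the arithmetic sample questions are assumptions for illustration. In the described system the questions come from retrieved content.

```python
from dataclasses import dataclass


@dataclass
class MCQ:
    question: str
    options: list
    answer_index: int
    explanation: str


def grade(quiz, responses):
    """Compare chosen option indices with the answer key; return (score, details)."""
    results = []
    for q, chosen in zip(quiz, responses):
        results.append({
            "question": q.question,
            "correct": chosen == q.answer_index,
            "explanation": q.explanation,
        })
    score = sum(r["correct"] for r in results)
    return score, results


def feedback(score, total):
    """Turn a score into a short message (thresholds are assumptions)."""
    ratio = score / total
    if ratio >= 0.8:
        return "Strong result: try more challenging material."
    if ratio >= 0.5:
        return "Decent result: review the explanations for missed questions."
    return "Revisit the fundamentals before retrying."


# Five placeholder questions; the correct option is always at index 1.
quiz = [
    MCQ(f"What is {n} + {n}?", [str(n), str(2 * n), str(3 * n)], 1,
        f"{n} + {n} = {2 * n}")
    for n in range(1, 6)
]
score, results = grade(quiz, [1, 1, 0, 1, 2])
message = feedback(score, len(quiz))
```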
- PostgreSQL (Neon)
- Stores:
  - chat history
  - agent state
  - session data
- Allows conversation resumption
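Conversation resumption can be sketched as follows, using an in-memory SQLite database as a stand-in so the example runs without a server. The table layout and function names are assumptions, not the project's schema. `INSERT OR REPLACE` is SQLite syntax, and a PostgreSQL version would use `INSERT ... ON CONFLICT` instead.

```python
import json
import sqlite3


def init_db(conn):
    """Create the two assumed tables: chat messages and per-session agent state."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agent_state ("
        "session_id TEXT PRIMARY KEY, state_json TEXT NOT NULL)"
    )


def save_message(conn, session_id, role, content):
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )


def save_state(conn, session_id, state):
    """Store the latest agent state as JSON, replacing any earlier snapshot."""
    conn.execute(
        "INSERT OR REPLACE INTO agent_state (session_id, state_json) VALUES (?, ?)",
        (session_id, json.dumps(state)),
    )


def resume(conn, session_id):
    """Load the ordered chat history and the last saved state for a session."""
    history = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    row = conn.execute(
        "SELECT state_json FROM agent_state WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    state = json.loads(row[0]) if row else {}
    return history, state


conn = sqlite3.connect(":memory:")
init_db(conn)
save_message(conn, "s1", "user", "hi")
save_message(conn, "s1", "assistant", "hello")
save_state(conn, "s1", {"learner_category": "Average"})
history, state = resume(conn, "s1")
```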
- Blocks non-academic queries
- Prevents cheating requests
- Ensures safe tutoring behavior
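A rule-based version of these checks is sketched below. The source does not describe how the guardrails are implemented (they may well be LLM-based), so the patterns, the keyword set, and the return labels are assumptions chosen only to illustrate the two kinds of block.

```python
import re

# Assumed patterns for requests that ask the tutor to do graded work.
CHEATING_PATTERNS = [
    r"\bwrite my (essay|assignment|homework)\b",
    r"\bdo my homework\b",
    r"\bexam answers?\b",
]

# Assumed vocabulary that marks a query as academic.
ACADEMIC_KEYWORDS = {
    "study", "exam", "quiz", "math", "score", "revision",
    "homework", "reading", "writing", "plan",
}


def check_query(text):
    """Classify a query as 'blocked_cheating', 'blocked_off_topic', or 'allowed'."""
    lowered = text.lower()
    for pattern in CHEATING_PATTERNS:
        if re.search(pattern, lowered):
            return "blocked_cheating"
    words = set(re.findall(r"[a-z]+", lowered))
    if not words & ACADEMIC_KEYWORDS:
        return "blocked_off_topic"
    return "allowed"
```

Checking for cheating before checking the topic matters here: a request such as "write my essay" would otherwise slip through once it also contains an academic keyword.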
User (Streamlit UI)
|
v
LangGraph Agent (Master Node)
|
v
+----------------------------------+
| Specialist Nodes |
| - Analyser (ML Models) |
| - Retriever (RAG System) |
| - Planner (Study Plan Generator) |
| - Quizzer (MCQ System) |
| - End Node |
+----------------------------------+
|
v
PostgreSQL (Persistent Memory)
| Layer | Technology |
|---|---|
| ML Models | Scikit-Learn |
| Agent Framework | LangGraph |
| LLM | Groq (Llama 3.3 70B) |
| Embeddings | all-MiniLM-L6-v2 |
| Vector DB | FAISS |
| Backend | Python |
| UI | Streamlit |
| Database | PostgreSQL (Neon) |
| Deployment | Streamlit Cloud |
| Web Search | Tavily API |
git clone https://github.com/sathvik89/Predictive-Learning-Analytics_ML.git
cd Predictive-Learning-Analytics_ML
python -m venv venv
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows
pip install -r requirements.txt
streamlit run app.py
predictive-learning-analytics
├── __pycache__
│ ├── app.cpython-313.pyc
│ └── styles.cpython-313.pyc
├── agent
│ ├── __init__.py
│ ├── __pycache__
│ ├── chat_history.py
│ ├── formatting.py
│ ├── graph.py
│ ├── guardrails.py
│ ├── ml_pipeline.py
│ ├── nodes.py
│ ├── rag.py
│ ├── session_context.py
│ └── state.py
├── app_errors.log
├── app.py
├── chat_history.db
├── Data
│ ├── processed
│ └── raw
├── knowledge
│ ├── academic_coaching.md
│ ├── algebra_geometry_trig.md
│ ├── math_foundations.md
│ ├── performance_intervention.md
│ ├── reading_comprehension.md
│ ├── README.md
│ ├── statistics_probability.md
│ ├── study_skills.md
│ └── writing_skills.md
├── models
│ ├── kmeans_model.pkl
│ ├── linear_model.pkl
│ ├── logistic_model.pkl
│ ├── scaler_clf.pkl
│ ├── scaler_cluster.pkl
│ └── scaler_reg.pkl
├── modules
│ ├── __pycache__
│ ├── components.py
│ ├── home.py
│ ├── icons.py
│ ├── model_loader.py
│ ├── performance.py
│ ├── predict.py
│ ├── sidebar.py
│ └── styling.py
├── notebooks
│ ├── AgenticAI_Practice_Roughbook.ipynb.ipynb
│ ├── Cleaned__Notebook.ipynb
│ └── GenAi_Project_Predictive_learning.ipynb
├── README.md
├── Report
│ └── GenAi_Final_Report_v2.pdf
├── requirements.txt
├── styles.py
└── venv
├── bin
├── etc
├── include
├── lib
├── pyvenv.cfg
└── share
- ML-based student prediction system
- Streamlit dashboard
- LangGraph agent system
- RAG pipeline
- Quiz system
- Persistent memory
- User authentication system
- Step-by-step adaptive quiz flow
- Larger academic knowledge base
- Student analytics dashboard
- Mobile UI optimization
- Personalized long-term learning tracking
This project demonstrates the evolution from a traditional machine learning pipeline into a fully agentic AI tutoring system. It integrates predictive modeling, retrieval-augmented generation, and multi-agent orchestration to create a system that not only analyzes student performance but actively supports learning through conversation, planning, and assessment.
Built for Gen AI Course: Milestone 1 & 2
Sathvik Koriginja | Anushka Tyagi | Apoorva Choudhary