A summary of technical work done during my internship at Ren, an AI EdTech startup building an AI-powered essay grading platform for schools.
- Engineered an end-to-end essay grading pipeline in Python, orchestrating GPT Vision to extract handwritten essay content into structured paragraphs and bounding boxes, then applying a dual-method grading strategy (image-based and text-extracted) that produces per-component rubric scores with justifications and a strengths/weaknesses/actionables summary
- Extended the AI grading output schema to add per-rubric component scoring with justifications, giving students structured breakdowns of their performance across each rubric criterion
- Designed a type-safe discriminated union interface for the grading adapter layer, enabling new grading strategies to be added without modifying the upstream worker - demonstrated by integrating text-extracted grading alongside the original image-based method with zero changes to the worker contract
- Replaced legacy image-per-page grading with a GPT Vision pre-processing step that extracts paragraph text and page-spanning boundaries upfront, eliminating repeated vision API calls per grading iteration to reduce inference cost
- Designed a question-specific context injection layer for the grading pipeline, generating structured per-question analysis (argument scope, judgment requirements, common misreadings) from each exam question before grading, so the LLM grader assesses answers against what each question is specifically testing rather than generic subject knowledge. Built the generator as a two-layer system (a generic base extended by subject-specific guide fields) so exam teams can add analysis parameters for new subjects without code changes
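The two-layer generator might look something like this sketch: a generic base carrying the common analysis fields, with subject-specific guide fields supplied as data rather than code. Field names and the prompt format are illustrative assumptions, not the real system.

```python
from dataclasses import dataclass, field

@dataclass
class QuestionAnalysis:
    # Generic base fields shared by every subject (names are hypothetical)
    argument_scope: str
    judgment_requirements: str
    common_misreadings: list[str]
    # Subject-specific guide fields arrive as data, so a new subject
    # just supplies new entries here rather than requiring code changes.
    subject_guides: dict[str, str] = field(default_factory=dict)

def build_grading_context(analysis: QuestionAnalysis) -> str:
    """Render the analysis into a context block for the LLM grader."""
    lines = [
        f"Argument scope: {analysis.argument_scope}",
        f"Judgment requirements: {analysis.judgment_requirements}",
        "Common misreadings: " + "; ".join(analysis.common_misreadings),
    ]
    lines += [f"{key}: {value}" for key, value in analysis.subject_guides.items()]
    return "\n".join(lines)
```

The design choice is that the base layer stays closed to modification while the guide dictionary stays open to extension by non-engineers.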
- Built essay grading benchmark tooling using embedding cosine similarity and an LLM judge against gold-standard teacher-annotated scripts, with automated quality gates to detect AI hallucination in grading outputs and prevent rubric regression across model iterations
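The embedding-similarity half of that benchmark reduces to a cosine comparison against the gold-standard script plus a threshold gate. A minimal sketch, with a purely illustrative threshold (the real gate value and embeddings are not shown in this summary):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

SIMILARITY_FLOOR = 0.85  # illustrative gate threshold, not the actual value

def passes_quality_gate(candidate_emb: list[float], gold_emb: list[float]) -> bool:
    """Flag outputs that drift too far from the teacher-annotated gold standard."""
    return cosine_similarity(candidate_emb, gold_emb) >= SIMILARITY_FLOOR
```

In practice the LLM-judge check would run alongside this, since cosine similarity alone cannot distinguish a fluent hallucination from a faithful grading justification.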
- Built a concurrent load testing framework for the grading API, instrumenting LLM token usage, latency distributions, and Docker memory footprints across parallel grading jobs, generating per-run analytics reports to establish cost-per-submission baselines
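The core of a concurrent load-test harness like this can be sketched with a semaphore-bounded `asyncio.gather` plus latency percentiles. Everything here is a simplified assumption of the real framework, which also instrumented token usage and Docker memory:

```python
import asyncio
import statistics
import time

async def timed_call(worker, payload):
    """Run one grading call and measure its wall-clock latency."""
    start = time.perf_counter()
    result = await worker(payload)
    return result, time.perf_counter() - start

async def run_load_test(worker, payloads, concurrency: int = 8) -> dict:
    """Fire payloads at the worker with bounded parallelism; report latencies."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(payload):
        async with sem:
            return await timed_call(worker, payload)

    results = await asyncio.gather(*(bounded(p) for p in payloads))
    latencies = sorted(latency for _, latency in results)
    return {
        "count": len(latencies),
        "p50": statistics.median(latencies),
        "p95": latencies[int(0.95 * (len(latencies) - 1))],
    }
```

Per-run reports built from these numbers (plus token counts) are what make a cost-per-submission baseline possible.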
- Designed LLM observability infrastructure aggregating token costs, latency distributions, and memory footprints across concurrent grading workers, to inform capacity planning and pricing strategy
- Built the end-to-end AI feedback summarization feature from Next.js/tRPC API through Python workers to GPT, automatically condensing ~20-30 teacher annotations from 30-page marked scripts into structured student summaries - giving students targeted takeaways without manually reviewing multi-page annotated scripts
- Extended the post-marking pipeline to auto-generate student-facing cover pages from existing component scores, delivering structured grading reports without additional LLM inference costs
- Extended the Next.js LaTeX renderer to support inline math notation and built a PDF export sanitisation utility, enabling mathematical content in student feedback to render correctly across browser and PDF output
- Designed a unit testing strategy for a Python FastAPI backend and authored 15+ test modules covering grading engine orchestration, worker concurrency, LLM inference, S3 storage, document handling, and notification services - using pytest-asyncio, factory-boy, and fakeredis to fully isolate all external dependencies, with 80% branch coverage enforced via GitHub Actions CI
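The isolation approach for the LLM inference dependency can be illustrated with `unittest.mock.AsyncMock` (the summary names factory-boy and fakeredis for other dependencies). The service function and its signature below are hypothetical stand-ins for the real grading code:

```python
import asyncio
from unittest.mock import AsyncMock

# Hypothetical grading call; the real orchestration is more involved.
async def grade_essay(llm_client, essay_text: str) -> dict:
    response = await llm_client.complete(prompt=f"Grade this essay: {essay_text}")
    return {"score": response["score"]}

def test_grade_essay_with_mocked_llm():
    # AsyncMock stands in for the OpenAI client, so no network call is made.
    llm = AsyncMock()
    llm.complete.return_value = {"score": 7}

    result = asyncio.run(grade_essay(llm, "sample essay"))

    assert result == {"score": 7}
    llm.complete.assert_awaited_once()
```

Under pytest-asyncio the test body would be an `async def` with `await` instead of `asyncio.run`, but the isolation pattern is the same.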
- Built a Playwright E2E test suite from scratch using the Page Object Model pattern, covering the full marking workflow end-to-end: class creation, student enrolment, assignment upload, AI pipeline completion, and graded status verification
- Identified a silent failure propagation risk in the grading pipeline where error states could pass through undetected, and eliminated it through test-driven development - writing failing invariant tests to define the contract, then modifying production code until the contract held
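The invariant-first shape of that fix can be sketched as follows: the test encodes the contract that no failed stage may pass through silently, and the pipeline is then written (or rewritten) to satisfy it. The result type and stage shape are illustrative, not the production schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StageResult:
    ok: bool
    error: Optional[str] = None

def run_pipeline(stages) -> str:
    """Fail fast: any stage error must surface, never propagate silently."""
    for stage in stages:
        result = stage()
        if not result.ok:
            raise RuntimeError(f"pipeline stage failed: {result.error}")
    return "graded"

def test_error_states_never_pass_silently():
    # This test fails against code that swallows errors, defining the contract.
    stages = [
        lambda: StageResult(ok=True),
        lambda: StageResult(ok=False, error="vision extraction failed"),
    ]
    try:
        run_pipeline(stages)
        assert False, "error state propagated silently"
    except RuntimeError as exc:
        assert "vision extraction failed" in str(exc)
```

Writing the failing test first pins down the contract before any production code changes, which is the TDD loop the bullet describes.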
- Configured GitHub Actions CI pipelines for both unit and E2E test suites, automating the full test run on every pull request
Built with: Python, FastAPI, OpenAI GPT API, Next.js, TypeScript, tRPC, Prisma, Playwright, pytest, Docker