Quickdraw

Benchmark LLM streaming — TTFT, TPS, $/1K tokens. Across providers, on your prompts, with a hard cost ceiling.

The problem

LLM SDKs report a single latency number, not a streaming breakdown. Time to first token, total time through the last token, and throughput in tokens per second are different numbers that tell you different things. Quickdraw splits the stream into phases and reports each one separately.

How it works

flowchart LR
    P[prompts<br/>test-prompts.ts]
    B[BenchmarkRunner<br/>runBenchmark]
    O[openai provider<br/>gpt-4o-mini]
    A[anthropic provider<br/>claude-3-5-haiku]
    M[computeMetrics<br/>ttft / tps / cost]
    R[results.jsonl<br/>api_calls.jsonl]
    P --> B
    B --> O
    B --> A
    O --> M
    A --> M
    M --> R

runBenchmark() iterates over providers[], streams each prompt, measures TTFT and TPS, writes api_calls.jsonl with raw data, and computes summary stats.
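
To make the timing step concrete, here is a minimal sketch of how a single stream could be split into phases. It is not Quickdraw's actual implementation: `timeStream`, `StreamTiming`, `Clock`, and the `countTokens` parameter are hypothetical names, and the stream is assumed to be any `AsyncIterable<string>` of text chunks. TPS is interpreted here as tokens that arrive after the first chunk, divided by the time between the first and last chunk.

```typescript
// Hypothetical shapes for illustration; not Quickdraw's real types.
interface StreamTiming {
  ttft_ms: number;           // request start -> first chunk
  total_duration_ms: number; // request start -> stream fully consumed
  tps: number;               // tokens per second after the first chunk
  total_tokens: number;
}

type Clock = () => number; // returns milliseconds

async function timeStream(
  stream: AsyncIterable<string>,
  countTokens: (chunk: string) => number,
  now: Clock = Date.now, // assumption: a monotonic clock would be preferable
): Promise<StreamTiming> {
  const start = now();
  let firstAt: number | null = null;
  let lastAt = start;
  let tokensAfterFirst = 0;
  let totalTokens = 0;

  for await (const chunk of stream) {
    lastAt = now();
    const n = countTokens(chunk);
    totalTokens += n;
    if (firstAt === null) {
      firstAt = lastAt; // the first chunk marks the end of the TTFT phase
    } else {
      tokensAfterFirst += n; // only later chunks count toward TPS
    }
  }

  const end = now();
  const decodeMs = firstAt === null ? 0 : lastAt - firstAt;
  return {
    ttft_ms: firstAt === null ? end - start : firstAt - start,
    total_duration_ms: end - start,
    tps: decodeMs > 0 ? (tokensAfterFirst * 1000) / decodeMs : 0,
    total_tokens: totalTokens,
  };
}
```

Injecting the clock keeps the function deterministic under test; an empty stream yields a TPS of 0 rather than a division by zero.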

Metrics captured per run:

| Metric | Description |
| --- | --- |
| `ttft_ms` | Milliseconds from request start to first token received |
| `tps` | Tokens per second after first token |
| `total_duration_ms` | Full end-to-end time |
| `cost_usd` | Computed from token counts × provider pricing |
| `guardrail_overhead_ms` | Time spent in per-chunk callbacks |
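
As an illustration of the `cost_usd` row, the sketch below multiplies token counts by per-token prices. The `Usage` and `Pricing` shapes, the function names, and the per-million-token pricing unit are assumptions made for this example; real provider prices are not shown, and Quickdraw's own pricing table may be structured differently.

```typescript
// Hypothetical shapes; prices are expressed in USD per 1M tokens.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// cost = input tokens × input price + output tokens × output price
function computeCostUsd(usage: Usage, pricing: Pricing): number {
  const raw =
    usage.inputTokens * pricing.inputPerMTok +
    usage.outputTokens * pricing.outputPerMTok;
  return raw / 1e6;
}

// Normalizes a run's cost to dollars per 1K tokens (input + output).
function costPer1kTokens(costUsd: number, usage: Usage): number {
  const total = usage.inputTokens + usage.outputTokens;
  return total === 0 ? 0 : (costUsd / total) * 1000;
}
```

Keeping input and output prices separate matters because providers typically charge different rates for each.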

Quick start

# Install
npm install -g @ykstormsorg/quickdraw

# Run against both providers, 3 runs each, $2 hard cost cap
quickdraw bench --providers openai,anthropic --runs 3 --cost-cap 2

# Use your own prompt and save the full results JSON
quickdraw bench --providers openai --runs 5 --prompt-file ./bench/standard-prompt.md --json run.json

# Dry run (no API calls, prints the plan only)
DRY_RUN=true quickdraw bench --providers openai --runs 1

# Regression-diff two saved runs (exit code 2 if a regression is detected)
quickdraw diff baseline.json candidate.json

The benchmark table reports avg / p50 / p95 / p99 for both TTFT and TPS, plus per-provider cost. If a required API key is missing, the CLI exits with a clean "Set OPENAI_API_KEY" or "Set ANTHROPIC_API_KEY" message and makes no network call.
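
The avg / p50 / p95 / p99 columns can be sketched as follows. This example uses the nearest-rank percentile method, which is an assumption: the source does not say which method Quickdraw uses, and an interpolating method would give slightly different values on small samples. `percentile` and `summarize` are hypothetical helper names.

```typescript
// Nearest-rank percentile over an already sorted (ascending) array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

interface Summary {
  avg: number;
  p50: number;
  p95: number;
  p99: number;
}

// Summarizes per-run samples such as ttft_ms or tps values.
function summarize(samples: number[]): Summary {
  const sorted = [...samples].sort((a, b) => a - b); // copy, then numeric sort
  const sum = sorted.reduce((acc, x) => acc + x, 0);
  return {
    avg: sorted.length === 0 ? NaN : sum / sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}
```

With only a handful of runs, p95 and p99 collapse to the slowest sample, so tail percentiles become meaningful only as the run count grows.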

Try locally

git clone https://github.com/ykstorm/quickdraw.git
cd quickdraw
npm install
npm test                    # vitest suite
DRY_RUN=true npm run bench  # dry run against mock infra
# Then with real keys:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
npm run bench               # live against OpenAI + Anthropic

Library mode

import { runBenchmark } from '@ykstormsorg/quickdraw'

const results = await runBenchmark({
  providers: ['openai', 'anthropic'],
  runs: 3,
  guardrails: false,
})
// results: BenchmarkResult[] with per-provider stream metrics

Stack

| Layer | Choice |
| --- | --- |
| Runtime | Node.js 18+ |
| Types | TypeScript |
| Build | tsup |
| Tests | Vitest |
| Providers | OpenAI + Anthropic REST streaming (raw `fetch`) |
| License | Apache 2.0 |

What's here now

Percentile reporting. TTFT and TPS are reported as avg / p50 / p95 / p99 across runs.
Regression diffing. quickdraw diff <run1.json> <run2.json> compares two saved runs and flags TTFT/TPS/cost regressions and success/model changes (exit code 2 when a regression is found).
Exact token counts. Token counts come from each provider's usage field when available, falling back to a char/4 estimate.
API-key preflight. Missing keys produce a clean Set <ENV_VAR> message and exit 1 — never a Bearer undefined 401 dump.
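
The regression check described in the list above could look roughly like this. It is a sketch, not the logic behind `quickdraw diff`: the `RunSummary` shape, the function names, and the 10% tolerance are assumptions, since the source does not state the thresholds the CLI uses. Only the exit code of 2 on regression comes from the description above.

```typescript
// Hypothetical summary of one saved run.
interface RunSummary {
  ttft_ms: number;
  tps: number;
  cost_usd: number;
}

// Returns the names of metrics that got worse by more than `tolerance`.
// Higher TTFT and cost are worse; lower TPS is worse.
function findRegressions(
  baseline: RunSummary,
  candidate: RunSummary,
  tolerance = 0.1, // assumed threshold, not Quickdraw's documented value
): string[] {
  const regressions: string[] = [];
  if (candidate.ttft_ms > baseline.ttft_ms * (1 + tolerance)) {
    regressions.push("ttft_ms");
  }
  if (candidate.tps < baseline.tps * (1 - tolerance)) {
    regressions.push("tps");
  }
  if (candidate.cost_usd > baseline.cost_usd * (1 + tolerance)) {
    regressions.push("cost_usd");
  }
  return regressions;
}

// Maps the result to a process exit code: 2 on regression, 0 otherwise.
function exitCodeFor(regressions: string[]): number {
  return regressions.length > 0 ? 2 : 0;
}
```

A distinct exit code lets a CI job tell "regression found" apart from an ordinary crash.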

What's NOT here

No Bedrock / Vertex / Gemini support. Only OpenAI and Anthropic. Azure and local models are not wired.
No hosted nightly dashboard. The nightly workflow runs the real CLI and publishes a results page to GitHub Pages, but there is no richer dashboard UI yet.
Guardrail overhead is a stub. guardrail_overhead_ms is measured with a no-op callback — it doesn't run real Tripwire patterns.
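
To show what measuring `guardrail_overhead_ms` with a no-op callback might involve, here is a small sketch that wraps a per-chunk callback and accumulates the time spent inside it. `withOverheadTimer`, `ChunkCallback`, and `noopGuardrail` are hypothetical names, not Quickdraw's API.

```typescript
type ChunkCallback = (chunk: string) => void;

// Wraps a per-chunk callback and sums the time spent inside it.
function withOverheadTimer(
  callback: ChunkCallback,
  now: () => number = Date.now, // assumption: injectable clock for testing
) {
  let overheadMs = 0;
  const timed: ChunkCallback = (chunk) => {
    const t0 = now();
    try {
      callback(chunk);
    } finally {
      overheadMs += now() - t0; // count the time even if the callback throws
    }
  };
  return { timed, getOverheadMs: () => overheadMs };
}

// A stand-in guardrail that does nothing, mirroring the stub described above.
const noopGuardrail: ChunkCallback = () => {};
```

Passing `timed` to the streaming loop in place of the raw callback would report near-zero overhead for the no-op, which gives a baseline against which a real pattern-matching guardrail could later be compared.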

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details on how to get involved.

License

Apache 2.0 — see LICENSE.
