This repository contains the PDF inputs and local tooling needed to reproduce the Material Database Agent (MDA) benchmark workflows.
MDA is a multimodal, agentic literature-mining workflow. It converts scientific PDFs into markdown and extracted figures, sends each paper's parsed content to specialized subagents, writes one intermediate JSON file per paper, and then aggregates those JSON files into a final CSV material database.
- PDFs_meltpoolnet/ - paper PDFs used for the MeltpoolNet benchmark.
- PDFs_refrac/ - paper PDFs used for the refractory HEA/CCA benchmark.
- BulkModulus_test_database_MPPolak_DMorgan.xlsx - bulk modulus benchmark spreadsheet used for the prior text-only comparison.
- marker_pdfs/ - copied local MCP server for converting PDF folders to markdown with image extraction. The virtual environment from /home/ash/matdatabase/marker_pdfs/.venv was intentionally not copied.
The copied MCP server is in marker_pdfs/. It exposes two tools:
- convert_pdfs_to_markdown - process selected numbered folders.
- convert_all_pdfs_to_markdown - process every folder under the configured input directory.
The server expects an input directory containing numbered subfolders. Each subfolder can contain one or more PDF files. It writes markdown, extracted images, and log files to the configured output directory while preserving the folder structure.
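The folder contract above can be sketched in Python as a path plan. This is a minimal sketch, not the server's code: plan_conversion is a hypothetical helper that only maps each paper folder to its PDFs and to a same-named output folder, and it assumes every subdirectory of the input directory is a paper folder.

```python
from pathlib import Path


def plan_conversion(pdf_base_dir, output_base_dir, folders=None):
    """Map each paper folder to (its PDF files, its mirrored output folder).

    Hypothetical helper: it only plans paths and never calls marker.
    If `folders` is given (e.g. ["1", "2", "12"]), only those folders are
    planned; otherwise every subdirectory of pdf_base_dir is included.
    """
    base = Path(pdf_base_dir)
    out = Path(output_base_dir)
    if folders is None:
        names = sorted(p.name for p in base.iterdir() if p.is_dir())
    else:
        names = list(folders)
    plan = {}
    for name in names:
        src = base / name
        # A folder may hold one paper plus supplemental PDFs; match .pdf in any case.
        pdfs = sorted(
            p for p in src.glob("*") if p.is_file() and p.suffix.lower() == ".pdf"
        )
        # The output folder reuses the input folder's name, preserving structure.
        plan[name] = (pdfs, out / name)
    return plan
```

Passing a folder list mirrors the selective tool (convert_pdfs_to_markdown), while omitting it mirrors convert_all_pdfs_to_markdown.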
From the repository root:
cd /home/ash/matdatabase/Material-Database-Agent/marker_pdfs
uv sync
This creates a new local .venv/ inside the copied server directory. The venv is intentionally generated locally instead of checked into the repository.
The server also needs marker_chunk_convert. If you already have the marker environment from the original workspace, use:
export MARKER_CHUNK_CONVERT_BIN=/home/ash/matdatabase/marker/.venv/bin/marker_chunk_convert
If not, install marker-pdf through uv sync and make sure marker_chunk_convert is available on PATH, or set MARKER_CHUNK_CONVERT_BIN to its full path.
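Before starting the server, it can help to confirm which marker_chunk_convert binary would be picked up. The Python sketch below assumes the lookup order is MARKER_CHUNK_CONVERT_BIN first and PATH second; that order is an assumption, not taken from the server's source, and find_marker_chunk_convert is a hypothetical helper.

```python
import os
import shutil


def find_marker_chunk_convert():
    """Return a usable marker_chunk_convert path, or None if none is found.

    Assumed lookup order: MARKER_CHUNK_CONVERT_BIN first, then PATH.
    """
    explicit = os.environ.get("MARKER_CHUNK_CONVERT_BIN")
    if explicit:
        # An explicit setting must point at an executable file.
        if os.path.isfile(explicit) and os.access(explicit, os.X_OK):
            return explicit
        return None
    # Fall back to whatever marker_chunk_convert is on PATH.
    return shutil.which("marker_chunk_convert")


if __name__ == "__main__":
    found = find_marker_chunk_convert()
    print(found or "marker_chunk_convert not found; set MARKER_CHUNK_CONVERT_BIN")
```

Treating a bad explicit path as "not found" (rather than silently falling back to PATH) makes a typo in the environment variable visible.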
For the MeltpoolNet PDF set:
cd /home/ash/matdatabase/Material-Database-Agent/marker_pdfs
export PDF_BASE_DIR=/home/ash/matdatabase/Material-Database-Agent/PDFs_meltpoolnet
export OUTPUT_BASE_DIR=/home/ash/matdatabase/Material-Database-Agent/pdfs_markdown_meltpoolnet
export MARKER_CHUNK_CONVERT_BIN=/home/ash/matdatabase/marker/.venv/bin/marker_chunk_convert
.venv/bin/python main.py
For the refractory HEA/CCA PDF set, change the two dataset paths:
export PDF_BASE_DIR=/home/ash/matdatabase/Material-Database-Agent/PDFs_refrac
export OUTPUT_BASE_DIR=/home/ash/matdatabase/Material-Database-Agent/pdfs_markdown_refrac
If PDF_BASE_DIR and OUTPUT_BASE_DIR are not set, the copied server keeps the original defaults:
- Input: /home/ash/matdatabase/PDFs
- Output: /home/ash/matdatabase/pdfs_markdown
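The fallback behavior can be sketched in Python as an environment lookup with defaults. This is an illustration of the documented behavior, not the server's code; resolve_dirs is hypothetical, and treating an empty value as unset is an assumption.

```python
import os

# Defaults quoted from this README; the real server may resolve them differently.
DEFAULT_PDF_BASE_DIR = "/home/ash/matdatabase/PDFs"
DEFAULT_OUTPUT_BASE_DIR = "/home/ash/matdatabase/pdfs_markdown"


def resolve_dirs(env=None):
    """Return (input_dir, output_dir), falling back to the original defaults.

    An empty value is treated the same as an unset variable (an assumption).
    """
    env = os.environ if env is None else env
    pdf_dir = env.get("PDF_BASE_DIR") or DEFAULT_PDF_BASE_DIR
    out_dir = env.get("OUTPUT_BASE_DIR") or DEFAULT_OUTPUT_BASE_DIR
    return pdf_dir, out_dir
```

Printing resolve_dirs() in the same shell before launching the server is a quick way to catch a forgotten export, which would otherwise send output to the original workspace paths.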
Claude Code can run this as a local stdio MCP server. Project scope writes a shareable .mcp.json; local scope stores the setting privately in your Claude config.
MeltpoolNet:
cd /home/ash/matdatabase/Material-Database-Agent
claude mcp add --transport stdio --scope project \
--env PDF_BASE_DIR=/home/ash/matdatabase/Material-Database-Agent/PDFs_meltpoolnet \
--env OUTPUT_BASE_DIR=/home/ash/matdatabase/Material-Database-Agent/pdfs_markdown_meltpoolnet \
--env MARKER_CHUNK_CONVERT_BIN=/home/ash/matdatabase/marker/.venv/bin/marker_chunk_convert \
marker-pdfs-meltpoolnet -- \
/home/ash/matdatabase/Material-Database-Agent/marker_pdfs/.venv/bin/python \
/home/ash/matdatabase/Material-Database-Agent/marker_pdfs/main.py
Refractory HEA/CCA:
cd /home/ash/matdatabase/Material-Database-Agent
claude mcp add --transport stdio --scope project \
--env PDF_BASE_DIR=/home/ash/matdatabase/Material-Database-Agent/PDFs_refrac \
--env OUTPUT_BASE_DIR=/home/ash/matdatabase/Material-Database-Agent/pdfs_markdown_refrac \
--env MARKER_CHUNK_CONVERT_BIN=/home/ash/matdatabase/marker/.venv/bin/marker_chunk_convert \
marker-pdfs-refrac -- \
/home/ash/matdatabase/Material-Database-Agent/marker_pdfs/.venv/bin/python \
/home/ash/matdatabase/Material-Database-Agent/marker_pdfs/main.py
Useful Claude Code MCP commands:
claude mcp list
claude mcp get marker-pdfs-meltpoolnet
Inside Claude Code, run /mcp to inspect server status and available tools.
Codex MCP servers are configured in ~/.codex/config.toml or in a trusted project at .codex/config.toml.
Example Codex config for MeltpoolNet:
[mcp_servers.marker_pdfs_meltpoolnet]
command = "/home/ash/matdatabase/Material-Database-Agent/marker_pdfs/.venv/bin/python"
args = ["/home/ash/matdatabase/Material-Database-Agent/marker_pdfs/main.py"]
startup_timeout_sec = 20
tool_timeout_sec = 3600
[mcp_servers.marker_pdfs_meltpoolnet.env]
PDF_BASE_DIR = "/home/ash/matdatabase/Material-Database-Agent/PDFs_meltpoolnet"
OUTPUT_BASE_DIR = "/home/ash/matdatabase/Material-Database-Agent/pdfs_markdown_meltpoolnet"
MARKER_CHUNK_CONVERT_BIN = "/home/ash/matdatabase/marker/.venv/bin/marker_chunk_convert"
Example Codex config for refractory HEA/CCA:
[mcp_servers.marker_pdfs_refrac]
command = "/home/ash/matdatabase/Material-Database-Agent/marker_pdfs/.venv/bin/python"
args = ["/home/ash/matdatabase/Material-Database-Agent/marker_pdfs/main.py"]
startup_timeout_sec = 20
tool_timeout_sec = 3600
[mcp_servers.marker_pdfs_refrac.env]
PDF_BASE_DIR = "/home/ash/matdatabase/Material-Database-Agent/PDFs_refrac"
OUTPUT_BASE_DIR = "/home/ash/matdatabase/Material-Database-Agent/pdfs_markdown_refrac"
MARKER_CHUNK_CONVERT_BIN = "/home/ash/matdatabase/marker/.venv/bin/marker_chunk_convert"
You can also add stdio servers with codex mcp add, then inspect active servers from the Codex TUI with /mcp.
- Install the MCP server dependencies with uv sync.
- Configure either the MeltpoolNet or refractory HEA/CCA MCP server in Claude Code or Codex.
- Ask the main agent to call convert_all_pdfs_to_markdown, or call convert_pdfs_to_markdown with a selected folder list such as ["1", "2", "12"].
- After conversion, each output folder contains markdown, figures, and a marker log.
- Ask doc-writer subagents to read each output paper folder and write one inference.txt file (containing JSON) per folder.
- Ask a csv-writer subagent to read all inference.txt files and consolidate them into the final CSV.
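Between the doc-writer and csv-writer steps, a quick check of which folders actually produced parseable JSON helps decide what to re-run. This Python sketch is a hypothetical helper, not part of the MDA tooling; it only checks that inference.txt exists and parses as JSON, not that it follows the requested schema.

```python
import json
from pathlib import Path


def summarize_inference(output_base_dir):
    """Classify each paper folder by the state of its inference.txt.

    Returns (completed, failed): completed folders hold an inference.txt
    that parses as JSON; failed folders have no file or invalid JSON.
    """
    completed, failed = [], []
    for folder in sorted(p for p in Path(output_base_dir).iterdir() if p.is_dir()):
        path = folder / "inference.txt"
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # OSError covers a missing file; ValueError covers bad JSON or encoding.
            failed.append(folder.name)
        else:
            completed.append(folder.name)
    return completed, failed
```

The failed list can be passed back to doc-writer subagents before the csv-writer aggregates anything.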
Claude Code supports built-in and custom subagents. Custom subagents are Markdown files with YAML frontmatter plus a system prompt body.
Recommended setup:
.claude/agents/doc-writer.md
.claude/agents/csv-writer.md
Project-scoped agents live in .claude/agents/. User-scoped agents live in ~/.claude/agents/.
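To sanity-check an agent file before using it, the frontmatter can be split from the prompt body. The Python sketch below is hypothetical (split_agent_file is not a Claude Code API); it handles only flat key: value frontmatter like the examples that follow, and treating name and description as required reflects those examples rather than a full schema check. Use a real YAML parser for anything richer.

```python
def split_agent_file(text):
    """Split a subagent Markdown file into (frontmatter dict, prompt body).

    Minimal sketch: handles only flat `key: value` frontmatter lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file must start with a '---' frontmatter line")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter is not closed with '---'") from None
    meta = {}
    for line in lines[1:end]:
        if line.strip():
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    for required in ("name", "description"):
        if not meta.get(required):
            raise ValueError(f"missing required field: {required}")
    return meta, body
```

Running it over every file in .claude/agents/ catches an unclosed frontmatter block or a missing description before /agents is opened.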
Minimal doc-writer example:
---
name: doc-writer
description: Extracts structured material data from one parsed paper folder.
tools: Read, Grep, Glob, Write
model: inherit
---
Read every markdown and image file in the assigned paper folder together.
Extract only material data supported by the source files.
Write one inference.txt file containing valid JSON in the requested schema.
Return a short summary of extracted rows and any uncertain fields.
Minimal csv-writer example:
---
name: csv-writer
description: Consolidates material JSON inference files into a clean CSV.
tools: Read, Grep, Glob, Write
model: inherit
---
Read every inference.txt file requested by the main agent.
Validate JSON structure, normalize units where instructed, preserve nulls for missing data, and write one CSV with the requested column order.
Do not call PDF conversion MCP tools during CSV aggregation.
Use /agents in Claude Code to create, edit, inspect, and manage these subagents. You can invoke them naturally:
Use the doc-writer subagent on every folder in pdfs_markdown_meltpoolnet, one folder per subagent, then summarize which folders produced inference.txt.
Claude Code can also use @ mentions for a specific agent, and claude --agent <name> can start a session where the main thread itself uses that agent's prompt. Subagents start with isolated context, inherit available tools by default, can run in foreground or background, and cannot spawn nested subagents.
Codex supports subagent workflows when you explicitly ask for parallel agents. It does not spawn subagents automatically. This is useful for MDA because each paper folder can be processed independently and the main thread only needs the final summaries.
Project-scoped custom agents live in:
.codex/agents/
Personal custom agents live in:
~/.codex/agents/
Each custom agent is a standalone TOML file. Required fields are name, description, and developer_instructions.
Example .codex/agents/doc-writer.toml:
name = "doc-writer"
description = "Extracts structured material data from one parsed paper folder."
model = "gpt-5.5"
model_reasoning_effort = "medium"
developer_instructions = """
Read every markdown and image file in the assigned paper folder together.
Extract only material data supported by the source files.
Write one inference.txt file containing valid JSON in the requested schema.
Return a short summary of extracted rows and uncertain fields.
"""
Example .codex/agents/csv-writer.toml:
name = "csv-writer"
description = "Consolidates material JSON inference files into a clean CSV."
model = "gpt-5.5"
model_reasoning_effort = "medium"
developer_instructions = """
Read inference.txt files, validate JSON, normalize units where instructed, preserve nulls for missing data, and write one CSV with the requested column order.
Do not call PDF conversion MCP tools during CSV aggregation.
"""
Example prompt:
Spawn one doc-writer subagent per folder in pdfs_markdown_meltpoolnet. Each subagent should write inference.txt in its assigned folder. Wait for all agents, then summarize completed and failed folders.
Use /agent in the Codex CLI to inspect and switch between active agent threads. You can tune concurrency in Codex config:
[agents]
max_threads = 6
max_depth = 1
The MeltpoolNet benchmark comes from Akbari et al., "MeltpoolNet: Melt pool characteristic prediction in Metal Additive Manufacturing using machine learning," Additive Manufacturing 55, 102817 (2022).
This benchmark is an experimental dataset for powder bed fusion / metal additive manufacturing. It contains melt pool characteristics, processing parameters, and material data. The relevant fields include laser power, scanning velocity, hatch spacing, layer thickness, beam diameter, meltpool depth/width/length, density, melting point, specific heat capacity, thermal conductivity, absorptivity, material composition, particle size, paper ID, title, and DOI.
The ground-truth MeltpoolNet table used for evaluation has 789 rows. The repository currently contains PDFs_meltpoolnet/ with 37 numbered paper folders and 39 PDF files, including supplemental PDFs for some papers.
The refractory benchmark is a database of high-entropy alloys and complex concentrated alloys. It targets mechanical-property extraction from source papers and includes alloy composition, reported phases, density, Vickers hardness, test type, yield strength, ultimate strength, elongation, and Young's modulus.
The original HEA/CCA ground-truth database contains roughly 370 rows, of which 366 original rows (datapoints) are used for evaluation. Unlike MeltpoolNet, this dataset is heavily graphical: many values are reported in stress-strain curves, bar charts, plots, and annotated micrographs rather than clean tables.
The repository currently contains PDFs_refrac/ with 74 folders and 72 PDF files. Folders no_12 and no_41 are present but do not contain PDFs in this checkout.
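The folder and PDF counts quoted for both PDF sets can be re-checked against a local checkout. In this Python sketch, inventory is a hypothetical helper that counts every subdirectory as a paper folder and matches the .pdf extension case-insensitively. If the checkout matches the counts above, it should report 37 folders and 39 PDFs for PDFs_meltpoolnet, and 74 folders and 72 PDFs for PDFs_refrac, with no_12 and no_41 among the empty folders.

```python
from pathlib import Path


def inventory(pdf_base_dir):
    """Return (folder count, PDF count, folders that contain no PDF)."""
    folders = sorted(p for p in Path(pdf_base_dir).iterdir() if p.is_dir())
    # Count PDFs per folder, matching the .pdf extension case-insensitively.
    counts = {
        f.name: sum(1 for p in f.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")
        for f in folders
    }
    empty = [name for name, n in counts.items() if n == 0]
    return len(folders), sum(counts.values()), empty
```

A PDF count higher than the folder count is expected where supplemental PDFs sit next to the main paper.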
For both benchmarks, extracted databases are evaluated against manually mapped ground-truth row pairs. MeltpoolNet rows are mapped using material composition, laser power, scan velocity, layer thickness, and paper DOI. HEA/CCA rows are mapped using alloy composition, source paper number, and hardness value.
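The mapping described here is manual. As a rough automated approximation, not the authors' evaluation code, rows can be paired on exact key-field matches and row-level precision and recall computed from the pairs. In this Python sketch the field names are hypothetical stand-ins for the MeltpoolNet keys (composition, laser power, scan velocity, layer thickness, DOI); text is compared case-insensitively and numbers by value.

```python
def normalize(value):
    # Compare text case-insensitively and ignore surrounding whitespace;
    # numbers compare by value, so 200 and 200.0 are equal.
    return value.strip().lower() if isinstance(value, str) else value


def row_key(row, fields):
    return tuple(normalize(row.get(f)) for f in fields)


def match_rows(extracted, ground_truth, fields):
    """Pair rows whose key fields agree; return (pairs, precision, recall)."""
    remaining = {}
    for gt in ground_truth:
        remaining.setdefault(row_key(gt, fields), []).append(gt)
    pairs = []
    for row in extracted:
        bucket = remaining.get(row_key(row, fields))
        if bucket:
            # Each ground-truth row can be matched at most once, so
            # duplicated extracted rows count against precision.
            pairs.append((row, bucket.pop()))
    precision = len(pairs) / len(extracted) if extracted else 0.0
    recall = len(pairs) / len(ground_truth) if ground_truth else 0.0
    return pairs, precision, recall


# Hypothetical key fields for MeltpoolNet-style rows.
MELTPOOL_KEYS = ("material", "power", "velocity", "layer_thickness", "doi")
```

For HEA/CCA rows the same function would take keys such as composition, paper number, and hardness. Exact matching is stricter than a manual mapping, which can tolerate formatting differences in compositions or rounded values.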
This workflow also includes a comparison against the Polak and Morgan (2024) ChatExtract benchmark for text-based extraction of bulk modulus values. This benchmark is not one of the two main multimodal MDA datasets; it is used as a previous-generation baseline for constrained text-only extraction.
The local spreadsheet is BulkModulus_test_database_MPPolak_DMorgan.xlsx. It contains two sheets:
- Positive - 179 labeled rows from 63 papers. Columns are passage, sentence, previous_sentence, title, doi, material, value, and unit. These rows contain positive examples where a material and its bulk modulus value are annotated. All units are GPa; the parsed numeric values range from 9.6 to 843.0 GPa.
- Negative - 1,912 rows from the same 63 papers. Columns are passage, sentence, previous_sentence, title, and doi. These rows provide paper context that should not be extracted as positive material-bulk-modulus records.
The bulk modulus extraction workflow runs ten independent subagents over this spreadsheet. The prompt instructs the agents to read each row of the passage column, group rows that share a DOI, and extract the unique material and bulk modulus values for each row. On this benchmark, Opus 4.6 reaches 99.23% precision and 100% recall, compared with 90.8% precision and 87.7% recall for the 2024 ChatExtract GPT-4 result.
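The DOI-grouping step can be sketched in Python. This is an illustration under stated assumptions, not the prompt the agents actually receive: group_by_doi is hypothetical, rows are dicts using the Positive-sheet columns (doi, material, value, unit), and de-duplication after stripping whitespace and parsing values as floats is an assumed normalization.

```python
from collections import defaultdict


def group_by_doi(rows):
    """Group records by DOI, keeping each (material, value, unit) once per DOI.

    Materials are whitespace-stripped and values parsed as floats before
    de-duplication (an assumption; the real prompt may normalize differently).
    """
    grouped = defaultdict(list)
    for row in rows:
        # Unit defaults to GPa because every Positive-sheet row uses GPa.
        record = (row["material"].strip(), float(row["value"]), row.get("unit", "GPa"))
        # Skip a record already seen for the same DOI.
        if record not in grouped[row["doi"]]:
            grouped[row["doi"]].append(record)
    return dict(grouped)
```

Grouping by DOI lets one paper's repeated mentions of the same material and value collapse into a single record before scoring.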