Parxy provides a command-line interface (CLI) for parsing documents, converting them to Markdown, and managing configuration files directly from your terminal, without writing any Python code.
Once installed, you can run the CLI via the `parxy` command.
The Parxy CLI lets you:

| Command | Description |
|---|---|
| `parxy parse` | Extract text content from documents with customizable detail levels and output formats. Process files or folders with multiple drivers. |
| `parxy markdown` | Convert documents to Markdown files, with support for multiple drivers and folder processing |
| `parxy pdf:merge` | Merge multiple PDF files into one, with support for page ranges |
| `parxy pdf:split` | Split a PDF into individual pages, with optional page range and single-file extraction |
| `parxy pdf:outline` | Print or export a PDF's outline (bookmarks / table of contents) |
| `parxy pdf:tags` | Inspect and extract the tag (structure) tree of a tagged, accessible PDF |
| `parxy pdf:xmp` | Read and extract XMP metadata from a PDF |
| `parxy drivers` | List available document processing drivers |
| `parxy env` | Generate a default `.env` configuration file |
| `parxy docker` | Create a Docker Compose configuration for running Parxy-related services |

The `parse` command extracts text from documents. It supports individual files or entire folders, multiple output formats, and multiple drivers for side-by-side comparison.
Parse a single document using the default settings (PyMuPDF driver, json output):

```bash
parxy parse document.pdf
```

This creates a `pymupdf-document.json` file in the same directory as the source file. Parxy always prefixes the output file with the driver name.
Parse multiple files at once:

```bash
parxy parse doc1.pdf doc2.pdf doc3.pdf
```

Process all PDFs in a folder (recursively):

```bash
parxy parse /path/to/folder
```

Mix files and folders:

```bash
parxy parse document.pdf /path/to/folder
```

Control the output format with the `--mode` (`-m`) option:

```bash
# Markdown format
parxy parse document.pdf -m markdown

# Plain text
parxy parse document.pdf -m plain

# JSON (full document structure, default)
parxy parse document.pdf -m json
```

The file extension is automatically set based on the output mode (`.md`, `.txt`, or `.json`).
Specify where to save the output files with `--output` (`-o`):

```bash
parxy parse document.pdf -o output/
```

If not specified, files are saved in the same directory as the source files.
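
As a rough illustration of the naming rules described so far (driver-name prefix, extension chosen by mode, output next to the source unless `-o` is given), the sketch below predicts where a result would land. `expected_output_path` is a hypothetical helper, not part of Parxy, and it assumes the `pymupdf` driver and JSON mode as defaults, as in the single-document example.

```python
from pathlib import Path

# Extensions per output mode, as listed in the text above.
MODE_EXTENSIONS = {"markdown": ".md", "plain": ".txt", "json": ".json"}


def expected_output_path(source, driver="pymupdf", mode="json", output_dir=None):
    """Guess where `parxy parse` would write its result (hypothetical helper)."""
    source = Path(source)
    # Files land next to the source unless an output directory is given.
    target_dir = Path(output_dir) if output_dir else source.parent
    return target_dir / f"{driver}-{source.stem}{MODE_EXTENSIONS[mode]}"


print(expected_output_path("reports/document.pdf"))
print(expected_output_path("document.pdf", driver="llamaparse", mode="markdown", output_dir="output"))
```

A script that post-processes Parxy output could use something like this to locate result files without listing the directory.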
Adjust the extraction level with the `--level` (`-l`) option:

```bash
parxy parse --level line document.pdf
```

Supported levels (depending on the driver):

- `page`
- `block` (default)
- `line`
- `span`
- `character`

Specify a driver with the `--driver` (`-d`) option:

```bash
parxy parse --driver llamaparse document.pdf
# output will be saved as llamaparse-document.json
```

Parse the same document(s) with multiple drivers by specifying `--driver` (or `-d` for short) multiple times:

```bash
parxy parse document.pdf -d pymupdf -d llamaparse
```

When using multiple drivers, Parxy always prepends the driver name to the output filenames, e.g. `pymupdf-document.json` and `llamaparse-document.json`. This is particularly useful for comparing extraction quality across different parsers.
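
When driving the CLI from a script, the repeated `--driver` flag can be assembled programmatically. The following is a minimal sketch; `build_parse_command` is a hypothetical helper that only uses the options described above (`--driver`, `--mode`, `--output`).

```python
import shlex


def build_parse_command(inputs, drivers=(), mode=None, output=None):
    """Assemble a `parxy parse` argument list (hypothetical helper)."""
    cmd = ["parxy", "parse", *inputs]
    for driver in drivers:
        # One --driver flag per driver, as in the multi-driver example.
        cmd += ["--driver", driver]
    if mode:
        cmd += ["--mode", mode]
    if output:
        cmd += ["--output", output]
    return cmd


cmd = build_parse_command(
    ["document.pdf"], drivers=["pymupdf", "llamaparse"], mode="json", output="output/"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Parxy is installed.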
When processing multiple files, Parxy displays a progress bar showing:
- Files being processed
- Driver being used
- Output file location
- Number of pages extracted
Process all PDFs in a folder with two drivers, output as JSON, and save to a specific directory:

```bash
parxy parse /path/to/pdfs -d pymupdf -d llamaparse -m json -o output/
```

The `markdown` command converts documents to Markdown format, preserving structure such as headings and lists. It follows the same conventions as the `parse` command: output files are prefixed with the driver name and saved next to the source file by default.

```bash
parxy markdown document.pdf
```

This creates a `pymupdf-document.md` file in the same directory as the source file.

```bash
# Parse multiple files
parxy markdown doc1.pdf doc2.pdf doc3.pdf

# Parse all PDFs in a folder (non-recursive by default)
parxy markdown /path/to/folder

# Parse recursively
parxy markdown /path/to/folder --recursive

# Limit recursion depth
parxy markdown /path/to/folder --recursive --max-depth 2
```

Save the output to a specific directory with `-o`:

```bash
parxy markdown document.pdf -o output/
```

Run the same documents through multiple drivers for comparison:

```bash
parxy markdown document.pdf -d pymupdf -d llamaparse
```

This produces `pymupdf-document.md` and `llamaparse-document.md`.

If you have a JSON file produced by `parxy parse -m json`, you can convert it to Markdown directly without re-parsing:

```bash
parxy markdown result.json
```

This loads the Document model from the JSON and converts it immediately, with no driver or API call required. You can mix JSON files and PDF files in the same invocation:

```bash
parxy markdown result.json document.pdf -d pymupdf -o output/
```

Use `--page-separators` to insert HTML comments before each page's content:

```bash
parxy markdown document.pdf --page-separators
```

Output will contain markers like:

```markdown
<!-- page: 1 -->
First page content...
<!-- page: 2 -->
Second page content...
```

This is useful for post-processing scripts that need to identify page boundaries.
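
A post-processing script might split the Markdown back into pages along these markers. The sketch below assumes the markers look exactly like the `<!-- page: N -->` comments shown above; `split_pages` is a hypothetical helper, not a Parxy function.

```python
import re

# Matches the page markers shown above and captures the page number.
PAGE_MARKER = re.compile(r"<!--\s*page:\s*(\d+)\s*-->")


def split_pages(markdown):
    """Return {page_number: page_markdown} from text with page separators."""
    # re.split with one capture group yields [preamble, num, body, num, body, ...].
    parts = PAGE_MARKER.split(markdown)
    pages = {}
    for number, body in zip(parts[1::2], parts[2::2]):
        pages[int(number)] = body.strip()
    return pages


sample = (
    "<!-- page: 1 -->\nFirst page content...\n"
    "<!-- page: 2 -->\nSecond page content...\n"
)
pages = split_pages(sample)
print(pages[2])
```

Any text before the first marker is ignored here; a real script may want to keep it.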
Use `--inline` with a single file to print Markdown directly to stdout with a YAML frontmatter header, which is useful for shell pipelines:

```bash
parxy markdown document.pdf --inline
parxy markdown document.pdf --inline | your-tool
```

Output format:

```markdown
---
file: "document.pdf"
pages: 10
---
# Document heading
...
```

Parxy provides two commands for PDF manipulation: merging multiple PDFs into one and splitting a single PDF into multiple files.
The pdf:merge command combines multiple PDF files into a single output file. You can merge entire files, specific page ranges, or folders of PDFs.
Basic merge:

```bash
parxy pdf:merge file1.pdf file2.pdf -o merged.pdf
```

Merge with page ranges:

```bash
parxy pdf:merge doc1.pdf[1:5] doc2.pdf[3:7] -o combined.pdf
```

Page range syntax (1-based indexing):

- `file.pdf[1]`: single page (page 1)
- `file.pdf[1:5]`: pages 1 through 5
- `file.pdf[:3]`: first 3 pages
- `file.pdf[5:]`: from page 5 to the end

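
To make the bracket syntax concrete, here is a small sketch that splits such an argument into a path and a 1-based page range. It illustrates the notation only and is not how Parxy parses its arguments; `parse_merge_input` is a hypothetical name, and `None` stands for an open end.

```python
import re

# Path followed by an optional trailing [range] suffix.
SPEC = re.compile(r"^(?P<path>.+?)(?:\[(?P<range>[^\[\]]*)\])?$")


def parse_merge_input(spec):
    """Split 'file.pdf[1:5]' into (path, first_page, last_page)."""
    match = SPEC.match(spec)
    if match is None:
        raise ValueError(f"cannot parse {spec!r}")
    path, page_range = match.group("path"), match.group("range")
    if page_range is None:
        return path, None, None  # no brackets: the whole file
    if ":" not in page_range:
        page = int(page_range)
        return path, page, page  # single page
    start, _, end = page_range.partition(":")
    return path, int(start) if start else None, int(end) if end else None


for spec in ["file.pdf", "file.pdf[1]", "file.pdf[1:5]", "file.pdf[:3]", "file.pdf[5:]"]:
    print(spec, "->", parse_merge_input(spec))
```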
Merge entire folders:

```bash
parxy pdf:merge /path/to/pdfs -o combined.pdf
```

Mix files, folders, and page ranges:

```bash
parxy pdf:merge cover.pdf /chapters doc.pdf[10:20] appendix.pdf -o book.pdf
```

The `pdf:split` command divides a PDF file into individual pages, with optional page range extraction and single-file output.
Split into individual pages:

```bash
parxy pdf:split document.pdf
```

This creates a `document_split/` folder containing `document_page_1.pdf`, `document_page_2.pdf`, etc.

Specify output directory and prefix:

```bash
parxy pdf:split report.pdf -o ./pages -p page
```

Extract a page range as individual files:

```bash
parxy pdf:split document.pdf --pages 2:5 -o ./pages
```

Combine a page range into a single PDF:

```bash
# Auto-named output next to the input file
parxy pdf:split document.pdf --pages 2:5 --combine

# Custom output path
parxy pdf:split document.pdf --pages 2:5 --combine -o extracted.pdf
```

Page range formats (1-based): `3`, `2:5`, `:5`, `3:`
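
The sketch below shows one way to read these range formats, expanding each into explicit page numbers for a document of known length. It assumes both ends are inclusive, as in the merge syntax; how Parxy itself treats out-of-bounds ranges is not covered here, so the hypothetical `pages_in_range` simply raises an error.

```python
def pages_in_range(spec, page_count):
    """Expand a 1-based range such as '3', '2:5', ':5' or '3:' into page numbers."""
    if ":" in spec:
        start, _, end = spec.partition(":")
        first = int(start) if start else 1          # ':5' starts at page 1
        last = int(end) if end else page_count      # '3:' runs to the last page
    else:
        first = last = int(spec)                    # '3' is a single page
    if not 1 <= first <= last <= page_count:
        raise ValueError(f"range {spec!r} does not fit a {page_count}-page document")
    return list(range(first, last + 1))


for spec in ["3", "2:5", ":5", "3:"]:
    print(spec, "->", pages_in_range(spec, page_count=6))
```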
For more detailed examples and use cases, see the Merge and split PDFs guide.
Beyond text extraction, Parxy can inspect a PDF's structure and metadata: its outline (bookmarks), its accessibility tag tree, and its XMP metadata. Each command prints a human-readable view by default and can emit JSON with --json (to stdout) or --output (to a file).
The pdf:outline command prints the table of contents as a tree:

```bash
parxy pdf:outline document.pdf
```

Use `--flat` for an indented list instead of a tree, or export the structure:

```bash
# Flat listing
parxy pdf:outline document.pdf --flat

# Export as JSON (flat entries + nested tree)
parxy pdf:outline document.pdf -o outline.json
```

The command exits with code 2 when the PDF has no bookmarks, which is handy in scripts.
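
A script could branch on that exit code roughly as follows. This is a sketch under two assumptions that the text does not state explicitly: exit code 0 means an outline was found, and any other code is a genuine failure. `has_outline` is a hypothetical helper, and its `run` parameter exists only so the function can be exercised without Parxy installed.

```python
import subprocess


def has_outline(pdf_path, run=subprocess.run):
    """Return True if the PDF has bookmarks, False if `parxy pdf:outline` exits with 2."""
    result = run(["parxy", "pdf:outline", str(pdf_path)], capture_output=True, text=True)
    if result.returncode == 0:
        return True  # assumption: 0 means an outline was printed
    if result.returncode == 2:
        return False  # documented: the PDF has no bookmarks
    # Assumption: anything else is an actual error.
    raise RuntimeError(f"parxy pdf:outline failed with exit code {result.returncode}")
```

With Parxy on the `PATH`, `has_outline("document.pdf")` can then be used to filter a batch of PDFs before exporting their outlines.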
A tagged PDF carries a logical structure tree (/StructTreeRoot) that makes it accessible. Start by checking whether a PDF is tagged:

```bash
parxy pdf:tags-check document.pdf
```

This reports whether the content is marked, whether a structure tree is present, the document language, and the number of structure elements. It exits with 0 for a tagged PDF and 2 otherwise.
Extract the tag tree itself with pdf:tags:

```bash
# Print the structure tree (with page references and alt text)
parxy pdf:tags document.pdf

# Include the visible text of each element (rebuilt per page)
parxy pdf:tags document.pdf --text

# Export the full nested structure as JSON
parxy pdf:tags document.pdf -o tags.json
```

The default view walks the document-wide structure tree and shows accessibility attributes (alt text, titles, page references) but not body text, which lives in the page content streams. The `--text` view reconstructs the structure per page including each element's visible text, but without the accessibility attributes.
Two companion commands help with accessibility work:

```bash
# Copy a tagged PDF keeping its tags but removing visible content
parxy pdf:tag-skeleton document.pdf -o tags-only.pdf

# Create an empty tagged PDF skeleton from scratch
parxy pdf:tag-template -o template.pdf --pages 3 --lang en-US
```

The `pdf:xmp` command reads the XMP metadata packet (an RDF/XML block holding properties such as `dc:title`, `dc:creator`, and `pdf:Producer`) and prints the parsed properties alongside the classic `/Info` dictionary:

```bash
parxy pdf:xmp document.pdf
```

You can view the original packet or export the metadata:

```bash
# Print the raw XMP XML packet
parxy pdf:xmp document.pdf --raw

# Export parsed metadata as JSON
parxy pdf:xmp document.pdf --json

# Save the raw XMP packet (a .xml path writes the raw packet,
# any other extension writes parsed JSON)
parxy pdf:xmp document.pdf -o metadata.xml
```

To view the list of supported document parsing drivers:

```bash
parxy drivers
```

This displays all available backends (e.g., `pymupdf`, `pdfact`, `llamaparse`).
To create a default .env configuration file for Parxy:

```bash
parxy env
```

If a `.env` file already exists, you'll be prompted before overwriting it.
You can then edit this file to adjust driver settings, API keys, or other environment variables.
Parxy can generate a ready-to-use Docker Compose configuration for self-hosted services (e.g., parsers available via an HTTP-based API):

```bash
parxy docker
```

This creates a `compose.yaml` file in your working directory.
To start the services, run:

```bash
docker compose pull
docker compose up -d
```

Run the following to see all available commands and options:

```bash
parxy --help
```

Each command also supports `--help` for detailed usage, for example:

```bash
parxy parse --help
```

With the CLI, you can use Parxy as a standalone document parsing tool, ideal for quick experiments, batch conversions, or integrations in shell-based pipelines.

| Command | Purpose |
|---|---|
| `parxy parse` | Extract text from documents with multiple formats & drivers |
| `parxy markdown` | Generate Markdown files; accepts JSON results and supports `--page-separators` |
| `parxy pdf:merge` | Merge multiple PDF files with page range support |
| `parxy pdf:split` | Split PDF into individual pages; supports `--pages` and `--combine` |
| `parxy pdf:outline` | Print or export a PDF's outline (bookmarks) |
| `parxy pdf:tags` | Inspect and extract a tagged PDF's structure tree; supports `--text` |
| `parxy pdf:xmp` | Read and extract XMP metadata; supports `--raw` and JSON export |
| `parxy drivers` | List supported drivers |
| `parxy env` | Create default configuration file |
| `parxy docker` | Generate Docker Compose setup |