mbackschat/webex-export

webex-export

A CLI tool to export your Cisco Webex messaging history as readable Markdown files with all attachments preserved.

Webex organizes conversations into Direct Messages (1:1 chats between two people) and Spaces (group conversations, optionally grouped under Teams). This tool exports both, creating a local, searchable archive of your Webex communications — complete with threaded replies, formatted messages, file attachments, participant info, and accurate timestamps.

Key features:

  • Export Direct Messages and Spaces to clean, interlinked Markdown
  • Spaces organized into Team folders matching your Webex Team structure
  • Threaded replies grouped under their parent message
  • Full-text search across your exported archive
  • Deferred downloads: export text first, fetch attachments later
  • Incremental updates: only re-export conversations with new messages
  • Offline rebuild: re-render Markdown from JSON without contacting Webex
  • Agent-friendly: structured output, quiet mode, file:line search results

Prerequisites

  • Python 3.10+
  • uv (installs the dependencies: requests, click)
  • Webex Access Token (see below)

Getting a token

Option 1: Create a Personal Access Token at developer.webex.com. These expire after 12 hours.

Option 2: Log in to Webex in your browser, open Dev Tools (F12), and extract the accessToken cookie. This uses your existing session token.

Installation

git clone <repo-url>
cd webex-export
uv sync

Quick start

export WEBEX_TOKEN="YOUR_TOKEN"

# Export all 1:1 chats
uv run webex-export chats

# Export all group spaces (organized by team)
uv run webex-export spaces

# Export both
uv run webex-export chats && uv run webex-export spaces

Add --json to any export command to also save the raw API data. This is recommended — it enables offline rebuild of markdown files and preserves fields the markdown doesn't show. See JSON export and offline rebuild.

uv run webex-export chats --json
uv run webex-export spaces --json

Subcommands

| Command | Description |
|---------|-------------|
| chats | Export Webex Direct Messages (1:1 conversations) |
| spaces | Export Webex Spaces (group conversations, organized by Team) |
| search | Full-text search across exported Markdown files (offline) |
| reindex | Rebuild index files from state data on disk (offline) |
| rebuild | Re-render Markdown from JSON exports (offline; requires an earlier export made with --json) |

Running webex-export without a subcommand shows help. Use --version to check the version.

Flags

All flags go after the subcommand, for example: webex-export chats -t TOKEN -o ~/backup

Shared flags (chats and spaces)

| Flag | Description |
|------|-------------|
| -t, --token TOKEN | Webex Access Token. Alternative: set the WEBEX_TOKEN env var. |
| -o, --output DIR | Output directory (default: current directory). Chats → <DIR>/chats/, spaces → <DIR>/spaces/. |
| -s, --select NAME | Filter by name (substring, case-insensitive). Repeatable: -s alice -s bob. |
| --list | List available chats/spaces with message counts from previous exports. |
| --dry-run | Show what would be exported without fetching messages. |
| --incremental | Skip chats/spaces whose last message ID matches the stored state. The message list is still fetched to make that comparison. |
| --skip-downloads | Don't download attachments. Filenames are resolved via HEAD requests and linked in the markdown. |
| --json | Write raw API message data as <name>.json alongside each markdown file. Recommended: it enables offline rebuild and preserves the full API response for future use. |
| -q, --quiet | Suppress progress output. Only print errors and the final summary. |
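
The -s/--select matching described in the table (substring, case-insensitive, repeatable) can be pictured with a short sketch. The function name `matches_any` is hypothetical and not taken from the tool's source, and treating an empty selector list as "match everything" is an assumption.

```python
def matches_any(name: str, selectors: list[str]) -> bool:
    """Return True if any selector is a case-insensitive substring of name."""
    if not selectors:
        # Assumption: no -s flag means nothing is filtered out.
        return True
    lowered = name.casefold()
    return any(selector.casefold() in lowered for selector in selectors)


titles = ["Alice Johnson", "Bob Martinez", "Carol Wu"]
# Roughly what "-s alice -s BOB" would keep under these assumptions.
selected = [title for title in titles if matches_any(title, ["alice", "BOB"])]
```

The same idea would apply to --team, which the README describes with identical matching rules.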

Spaces-only flags

| Flag | Description |
|------|-------------|
| --team NAME | Filter by team name (substring, case-insensitive). Repeatable: --team Eng --team Sales. |
| --list-teams | List all teams with space counts and last activity. |

Search flags

| Flag | Description |
|------|-------------|
| query | Search term (positional, case-insensitive). |
| -o, --output DIR | Base directory to search in (default: current directory). |
| -C N, --context N | Lines of context around each match (default: 1, max: 50). |
| --chats-only | Search only in chats/. |
| --spaces-only | Search only in spaces/. |

Usage examples

Chats

# Export all chats
uv run webex-export chats

# List all chats
uv run webex-export chats --list

# Export specific chats by name
uv run webex-export chats -s alice -s bob

# Dry run
uv run webex-export chats --dry-run

# Export to a specific directory
uv run webex-export chats -o ~/webex-backup

# Quiet mode (only summary + errors)
uv run webex-export chats -q

Spaces

# List all teams
uv run webex-export spaces --list-teams
No.   Team                                  Spaces  Last Activity
---   ----                                  ------  -------------
1     Engineering                                5  2026-04-13
2     Falcon                                     6  2026-03-26
3     Training                                   9  2025-10-14
4     (standalone)                              19  2026-04-13

# Export all spaces in a team
uv run webex-export spaces --team "Falcon"

# Export multiple teams
uv run webex-export spaces --team "Falcon" --team "Training"

# List spaces in a team
uv run webex-export spaces --team "Falcon" --list

# Export a specific space by name
uv run webex-export spaces -s "Dev Chat"

# Export all spaces
uv run webex-export spaces

Search

Search works on local exported files — no token needed.

# Search across all exports
uv run webex-export search "project timeline"
chats/Alice_Johnson.md:142
      141 | **You** _2026-01-09 12:08_
  >   142 | Here's the updated project timeline for Q1.
      143 |

chats/Bob_Martinez.md:89
       88 | **Bob Martinez** _2026-02-15 14:30_
  >    89 | Can you share the project timeline with the team?
       90 | I need it for the stakeholder meeting.

--- 2 matches in 2 files ---

# More context lines
uv run webex-export search "deployment" -C 3

# Search only in chats or spaces
uv run webex-export search "API" --chats-only
uv run webex-export search "release" --spaces-only

Output is agent-friendly: each match starts with file:line so tools can navigate directly to the source.
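
As a rough illustration of how a script might consume that output, the sketch below pulls the file:line headers out of the sample shown earlier. The regular expression is inferred from that sample only, and `parse_match_headers` is a hypothetical helper, not part of the tool.

```python
import re

# A match header looks like "chats/Alice_Johnson.md:142": path, colon, line number.
# Context lines are indented, so requiring a non-space first character skips them.
HEADER = re.compile(r"^(?P<path>\S.*\.md):(?P<line>\d+)$")


def parse_match_headers(output: str) -> list[tuple[str, int]]:
    """Extract (path, line_number) pairs from search output."""
    hits = []
    for raw in output.splitlines():
        match = HEADER.match(raw)
        if match:
            hits.append((match.group("path"), int(match.group("line"))))
    return hits


sample = """chats/Alice_Johnson.md:142
      141 | **You** _2026-01-09 12:08_
  >   142 | Here's the updated project timeline for Q1.
      143 |

chats/Bob_Martinez.md:89
       88 | **Bob Martinez** _2026-02-15 14:30_
  >    89 | Can you share the project timeline with the team?
       90 | I need it for the stakeholder meeting.

--- 2 matches in 2 files ---"""

hits = parse_match_headers(sample)
```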

Deferred attachment download

# First pass: text only (fast, filenames resolved via HEAD requests)
uv run webex-export chats --skip-downloads

# Later: download missing attachments
uv run webex-export chats

Only files not already on disk are downloaded. This works even if .file-map.json is deleted — the tool detects existing files by name.
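
To see which attachments a later run would still need to fetch, something like the following sketch could be used. It assumes that .file-map.json is a single JSON object mapping download URLs to local filenames; the README only says "URL → filename mapping", so the exact layout is a guess, and `missing_attachments` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def missing_attachments(attachments_dir: Path) -> list[str]:
    """List mapped filenames that are not present in the attachments folder."""
    map_path = attachments_dir / ".file-map.json"
    if not map_path.exists():
        return []
    # Assumption: {"<download URL>": "<local filename>", ...}
    file_map = json.loads(map_path.read_text(encoding="utf-8"))
    return sorted(
        name for name in file_map.values()
        if not (attachments_dir / name).exists()
    )


# Demo with a throwaway folder: one file was downloaded, one was deferred.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    mapping = {
        "https://example.invalid/file/1": "Q4-Report.pdf",
        "https://example.invalid/file/2": "notes.txt",
    }
    (folder / ".file-map.json").write_text(json.dumps(mapping), encoding="utf-8")
    (folder / "notes.txt").write_text("already downloaded", encoding="utf-8")
    pending = missing_attachments(folder)
```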

Incremental export

# First run: full export
uv run webex-export chats

# Later: skip chats with no new messages
uv run webex-export chats --incremental

The tool always fetches the full message list to compare last_message_id with the stored state. If unchanged, it skips re-rendering and re-downloading.
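
The comparison can be sketched as follows. The last_message_id key comes from the State files section; the message ID key ("id", as in the public Webex Messages API) and the helper name `is_unchanged` are assumptions, and the real implementation may differ.

```python
import json
import tempfile
from pathlib import Path

ID_KEY = "id"  # assumption: the ID field name in each fetched message object


def is_unchanged(state_file: Path, messages: list[dict]) -> bool:
    """True if the newest fetched message ID equals the stored last_message_id."""
    if not state_file.exists() or not messages:
        return False  # no stored state or nothing fetched: do a normal export
    state = json.loads(state_file.read_text(encoding="utf-8"))
    newest = messages[-1][ID_KEY]  # assumes chronological order, newest last
    return state.get("last_message_id") == newest


# Demo: stored state says the last exported message was "m2".
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / ".export-state.json"
    state_file.write_text(
        json.dumps({"last_message_id": "m2", "message_count": 2}),
        encoding="utf-8",
    )
    unchanged = is_unchanged(state_file, [{"id": "m1"}, {"id": "m2"}])
    changed = is_unchanged(state_file, [{"id": "m1"}, {"id": "m2"}, {"id": "m3"}])
```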

JSON export and offline rebuild

Using --json is recommended for all exports. It stores the complete Webex API response alongside each markdown file, giving you a lossless archive that is independent of this tool's markdown rendering. This means you can:

  • Rebuild markdown anytime — re-render all .md files offline from the JSON, without contacting Webex. Useful when the tool improves its formatting (threads, timestamps, etc.) or when you want to experiment with the output.
  • Preserve data the markdown doesn't show — the JSON includes personId, roomId, roomType, mentionedPeople, raw html, and other API fields not present in the rendered markdown.
  • Feed data into other tools — the JSON files are standard arrays of Webex message objects, ready for scripts, search indexes, analytics, or LLM ingestion.

Sample workflow:

# 1. Export everything with JSON (one-time, needs token)
uv run webex-export chats --json
uv run webex-export spaces --json

# 2. Later: the tool gets a formatting update — rebuild all markdown from JSON (offline, no token)
uv run webex-export rebuild

# 3. Or rebuild selectively
uv run webex-export rebuild --chats-only
uv run webex-export rebuild -s alice
uv run webex-export rebuild --spaces-only --team "Falcon"

The JSON contains the full API response per message (messageId, personId, timestamps, parentId, html, text, files, etc.). Each .json file is a flat array of message objects in chronological order.
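
Because each file is a flat array, a few lines of Python are enough to analyze an export. The sketch below counts messages per personId and counts thread replies via parentId, both field names taken from the list above; the sample data is invented for illustration, and `summarize` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize(messages: list[dict]) -> tuple[Counter, int]:
    """Return (messages per personId, number of thread replies)."""
    per_person = Counter(m.get("personId", "unknown") for m in messages)
    reply_count = sum(1 for m in messages if m.get("parentId"))
    return per_person, reply_count


# Invented sample standing in for the contents of an exported .json file.
sample_export = json.loads("""
[
  {"personId": "P1", "text": "Do you have the Q4 report ready?"},
  {"personId": "P2", "text": "Yes, updated version attached."},
  {"personId": "P1", "text": "Thanks!", "parentId": "M2"}
]
""")

per_person, reply_count = summarize(sample_export)
```

For a real export, the list would come from reading a file such as chats/Alice_Johnson.json with `json.loads` instead of the inline sample.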

Interrupt and resume

^C
Export interrupted. Use --incremental to resume.

Output structure

<output>/
  chats_index.md                        # index of all chats (sorted by message count)
  spaces_index.md                       # index of all spaces (includes Team column)
  chats/                                # webex-export chats
    Alice_Johnson.md                    # chat as markdown
    Alice_Johnson.json                  # raw API data (with --json)
    Alice_Johnson/                      # attachments folder
      Q4-Report.pdf
      .file-map.json                    # URL → filename mapping
      .export-state.json                # last_message_id for --incremental
    Bob_Martinez.md
    Bob_Martinez/
  spaces/                               # webex-export spaces
    Falcon/                             # team folder
      Backend_Services.md
      Backend_Services/
    Training/                           # team folder
      Onboarding_2025.md
      Onboarding_2025/
    General_Discussion.md               # standalone space (no team)
    General_Discussion/

  • Team spaces go into spaces/<TeamName>/
  • Standalone spaces (no team) go flat into spaces/
  • Chats go flat into chats/
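
The layout rules above can be expressed as a small path helper. This is a sketch: `safe_name` and `markdown_path` are hypothetical names, and replacing spaces with underscores is inferred from examples such as Alice_Johnson.md rather than from the tool's actual sanitization code.

```python
from pathlib import Path
from typing import Optional


def safe_name(title: str) -> str:
    # Assumption: spaces become underscores ("Alice Johnson" -> "Alice_Johnson").
    return title.strip().replace(" ", "_")


def markdown_path(output: Path, kind: str, title: str,
                  team: Optional[str] = None) -> Path:
    """Compute where a chat or space would be written under the output directory."""
    base = output / kind  # kind is "chats" or "spaces"
    if kind == "spaces" and team:
        base = base / safe_name(team)  # team spaces get a team folder
    return base / f"{safe_name(title)}.md"


team_space = markdown_path(Path("out"), "spaces", "Backend Services", team="Falcon")
standalone = markdown_path(Path("out"), "spaces", "General Discussion")
chat = markdown_path(Path("out"), "chats", "Alice Johnson")
```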

State files

| File | Purpose |
|------|---------|
| .export-state.json | Stores last_message_id and message_count. Used by --incremental, --list, and reindex. |
| .file-map.json | Maps Webex download URLs to local filenames. Prevents re-downloading and enables --skip-downloads. Avoid deleting it: without it, re-exports make a HEAD/GET request per file to re-resolve filenames (slow, but not destructive). |

Index files

Index files live next to (not inside) the data directories. Successive exports merge new rows without overwriting. Use webex-export reindex to rebuild them from state data.

# Rebuild both indexes
uv run webex-export reindex

# Rebuild only one
uv run webex-export reindex --chats-only
uv run webex-export reindex --spaces-only

Rebuild markdown from JSON

If you exported with --json, you can re-render all markdown files offline without contacting the Webex API. See JSON export and offline rebuild for the full workflow and examples.

Example spaces_index.md (one of the index files described under Index files):

# Webex Spaces Export

> Exported on 2026-04-14 12:30. 3 spaces.

| Space | Team | Messages | Last Message |
|-------|------|----------|--------------|
| [Backend Services](spaces/Falcon/Backend_Services.md) | Falcon | 353 | 2025-10-30 |
| [General Discussion](spaces/General_Discussion.md) || 89 | 2026-02-15 |

Links point into the chats/ or spaces/ subdirectory.

Chat / Space markdown

Messages are grouped by date. Thread replies are grouped under their parent message with a #### Thread heading. Edited messages show an (edited) indicator. Participant emails are listed in the header.

# Chat with Alice Johnson

> Exported on 2026-04-14 12:30.
> 342 messages from 2024-01-15 to 2026-04-12.
> Participants: Alice Johnson (alice.johnson@example.com)

### 2024-01-15

**Alice Johnson** _15:42_
Do you have the Q4 report ready?

**You** _15:45_ _(edited)_
Yes, updated version attached.
[Q4-Report.pdf](Alice_Johnson/Q4-Report.pdf)

---

### 2024-01-16

**Alice Johnson** _09:01_
Looks great, thanks!

#### Thread: "What tools are you using for..."

**Alice Johnson** _09:15_
What tools are you using for the data analysis?

**You** _2024-01-17 10:30_
Mostly Python with pandas and a few custom scripts.

**Alice Johnson** _2024-01-17 10:45_
Nice, we should align on the tooling.

Thread rendering:

  • Thread replies are grouped under a #### Thread heading with a truncated preview
  • The parent message is included inside the thread block
  • Same-day replies show time only (14:05), cross-date replies show full date + time (2024-01-17 10:30)
  • Orphan replies (parent not in export) render inline with *(thread reply)*
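
A minimal sketch of these rules follows. It assumes each message carries an "id" key (as in the public Webex Messages API) next to the parentId field, and that "same-day" is judged against the parent message's date; both are assumptions, and the function names are hypothetical.

```python
from datetime import datetime

ID_KEY = "id"  # assumption: the ID field name in each message object


def group_threads(messages: list[dict]) -> tuple[dict, list]:
    """Split replies into threads keyed by parent ID and orphan replies."""
    known_ids = {m[ID_KEY] for m in messages}
    threads: dict[str, list[dict]] = {}
    orphans: list[dict] = []
    for message in messages:
        parent = message.get("parentId")
        if parent is None:
            continue  # top-level message, not a reply
        if parent in known_ids:
            threads.setdefault(parent, []).append(message)
        else:
            orphans.append(message)  # parent is not part of this export
    return threads, orphans


def reply_timestamp(parent_time: datetime, reply_time: datetime) -> str:
    """Time only for same-day replies, full date and time otherwise."""
    if reply_time.date() == parent_time.date():
        return reply_time.strftime("%H:%M")
    return reply_time.strftime("%Y-%m-%d %H:%M")


messages = [
    {"id": "m1", "text": "What tools are you using for the data analysis?"},
    {"id": "m2", "parentId": "m1", "text": "Mostly Python with pandas."},
    {"id": "m3", "parentId": "gone", "text": "Reply whose parent is missing."},
]
threads, orphans = group_threads(messages)
same_day = reply_timestamp(datetime(2024, 1, 16, 9, 15), datetime(2024, 1, 16, 14, 5))
next_day = reply_timestamp(datetime(2024, 1, 16, 9, 15), datetime(2024, 1, 17, 10, 30))
```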

Spaces use # Space: <name> as title and show all participant names from the conversation.

Deactivated accounts: Chats where the other person's account has been deactivated show "Empty Title" in Webex. The tool reconstructs the name from the email address and marks the chat:

# Chat with Maria Santos *(deactivated account)*

> Exported on 2026-04-14 12:30.
> 98 messages from 2023-06-16 to 2025-03-24.
> Participants: Maria Santos (maria.santos@example.com)

File timestamps

On macOS, exported files carry two dates:

  • Creation date = when the message was sent in Webex (the original timestamp)
  • Modification date = when you ran the export (download time)

This applies to both attachments and markdown files. Finder and ls -l show modification date by default (export time). Use "Get Info" or stat to see the original Webex timestamp in the creation date.
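
Both dates can also be read from a script. The sketch below uses `os.stat`, whose `st_birthtime` attribute is available on macOS and BSD but is often missing on Linux, so the helper (a hypothetical `file_dates`) returns None for the creation date in that case.

```python
import os
import tempfile
from datetime import datetime, timezone
from typing import Optional


def file_dates(path: str) -> tuple[Optional[datetime], datetime]:
    """Return (creation date or None, modification date) as UTC datetimes."""
    info = os.stat(path)
    # st_birthtime is platform-dependent; fall back to None where it is absent.
    birth = getattr(info, "st_birthtime", None)
    created = (
        datetime.fromtimestamp(birth, tz=timezone.utc) if birth is not None else None
    )
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return created, modified


# Demo on a temporary file; for an export you would pass a path such as
# "chats/Alice_Johnson/Q4-Report.pdf".
with tempfile.NamedTemporaryFile() as handle:
    created, modified = file_dates(handle.name)
```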

Markdown formatting

Webex messages with formatting (lists, bold, code, links, etc.) are correctly converted to Markdown. The tool uses the html field from the API and converts it, since the markdown field is only present on messages explicitly sent as Markdown. The conversion handles:

  • Bold, italic, inline code, code blocks
  • Ordered and unordered lists (including nested)
  • Links, blockquotes, headings
  • Line breaks and paragraphs
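
To give a feel for this kind of conversion, here is a deliberately small sketch built on the standard library's `html.parser`. It is not the tool's converter: it covers only bold, italic, inline code, links, flat unordered lists, paragraphs, and line breaks, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class MiniHtmlToMarkdown(HTMLParser):
    """Convert a small subset of HTML to Markdown (illustrative only)."""

    INLINE = {"strong": "**", "b": "**", "em": "_", "i": "_", "code": "`"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.hrefs: list[str] = []  # stack of pending link targets

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.parts.append("[")
        elif tag == "li":
            self.parts.append("- ")
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a" and self.hrefs:
            self.parts.append(f"]({self.hrefs.pop()})")
        elif tag in ("p", "li"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html: str) -> str:
    parser = MiniHtmlToMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


result = html_to_markdown(
    '<p>See <strong>Q4</strong> in <a href="https://example.com">the report</a></p>'
    "<ul><li>one</li><li><code>two</code></li></ul>"
)
```

Nested lists, code blocks, blockquotes, and headings would need extra state (indent depth, block context) that this sketch leaves out.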

Attachment handling

  • All files referenced in messages are resolved and linked in the markdown
  • Messages with multiple files get multiple links
  • Image files (.png, .jpg, .gif, .webp) use ![image]() syntax for inline preview
  • Other files use [filename]() link syntax
  • File paths in links are URL-encoded to handle spaces and special characters
  • Files under Webex malware scan (HTTP 423) are skipped with a warning
  • Downloaded filenames are sanitized against path traversal
  • Duplicate filenames (e.g. multiple attachment.gif from different messages) get unique suffixes (attachment_1.gif, attachment_2.gif, etc.)
  • Corrupt file-maps from older exports are auto-detected and self-healed on re-export — no need to delete files
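
Several of these rules (sanitizing, unique suffixes, URL-encoding, image versus link syntax) can be sketched in a few helpers. All function names are hypothetical, and details such as stripping leading dots or letting the first duplicate keep its original name are assumptions rather than documented behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

IMAGE_EXTENSIONS = {".png", ".jpg", ".gif", ".webp"}


def sanitize_filename(name: str) -> str:
    """Keep only the final path component so the name cannot escape its folder."""
    base = name.replace("\\", "/").split("/")[-1]
    # Assumption: leading dots are dropped so ".." or dotfiles cannot be produced.
    base = base.strip().lstrip(".")
    return base or "attachment"


def unique_name(name: str, taken: set[str]) -> str:
    """Append _1, _2, ... before the extension until the name is unused."""
    path = PurePosixPath(name)
    candidate = name
    counter = 0
    while candidate in taken:
        counter += 1
        candidate = f"{path.stem}_{counter}{path.suffix}"
    taken.add(candidate)
    return candidate


def markdown_link(folder: str, filename: str) -> str:
    """Build an image or file link with a URL-encoded relative path."""
    target = quote(f"{folder}/{filename}")  # "/" stays unescaped by default
    if PurePosixPath(filename).suffix.lower() in IMAGE_EXTENSIONS:
        return f"![image]({target})"
    return f"[{filename}]({target})"


taken: set[str] = set()
names = [unique_name("attachment.gif", taken) for _ in range(3)]
escaped = sanitize_filename("../../etc/passwd")
file_link = markdown_link("Alice_Johnson", "Q4 Report.pdf")
image_link = markdown_link("Alice_Johnson", "shot.PNG")
```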

Project structure

webex-export/
  src/webex_export/
    __init__.py
    __main__.py           # python -m webex_export
    cli.py                # Click subcommands (chats, spaces, search, reindex, rebuild)
    api_client.py         # Webex API client (auth, pagination, downloads)
    exporter.py           # Markdown rendering, attachments, state, index
    utils.py              # Filename sanitization, HTML→Markdown, timestamps
  pyproject.toml          # uv/hatchling project config
  README.md

Notes

  • The tool is deliberately slow (0.5s between API requests) to respect rate limits
  • On interruption (Ctrl+C), use --incremental to resume
  • Failed individual downloads do not stop the entire export
  • Existing attachments on disk are detected and never re-downloaded
  • The tool handles HTTP 429 (rate limit) responses automatically with retry
  • Edited messages show an (edited) indicator after the timestamp
  • Deactivated accounts are resolved from email and marked in the export
  • The search subcommand works offline on local files — no token needed
  • Messages are capped at 500,000 per room to prevent unbounded memory usage
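
The pacing and retry behavior noted above might look roughly like this sketch. To stay self-contained it takes the HTTP call as a hypothetical `send` callable returning (status, headers, body) instead of using requests directly; the retry limit and the use of the Retry-After header are assumptions about how such a client could be written, not a description of the tool's code.

```python
import time
from typing import Callable, Mapping

Response = tuple[int, Mapping[str, str], bytes]


def fetch_with_retry(send: Callable[[], Response], *, pause: float = 0.5,
                     max_retries: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), waiting and retrying on HTTP 429, then pause before returning."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            sleep(pause)  # fixed delay between API requests
            return status, headers, body
        if attempt == max_retries:
            break
        try:
            wait = float(headers.get("Retry-After", pause))
        except ValueError:
            wait = pause  # header present but not a plain number of seconds
        sleep(wait)
    raise RuntimeError("rate limited: retries exhausted")


# Demo with a fake transport: first response is 429, second succeeds.
responses = iter([(429, {"Retry-After": "2"}, b""), (200, {}, b"ok")])
waits: list[float] = []
result = fetch_with_retry(lambda: next(responses), sleep=waits.append)
```

Injecting `sleep` keeps the demo instant; in real use the default `time.sleep` would apply.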

License

This project is licensed under the MIT License — see the LICENSE file for details.
