diff --git a/.cursor/rules/dataframe-serialization.mdc b/.cursor/rules/dataframe-serialization.mdc
new file mode 100644
index 00000000..47d1ab5a
--- /dev/null
+++ b/.cursor/rules/dataframe-serialization.mdc
@@ -0,0 +1,48 @@
+# DataFrame Serialization
+
+All DataFrame-to-records conversion for API responses, streaming events, or
+frontend-visible data MUST use the centralized helpers in
+`data_formulator.datalake.parquet_utils`:
+
+| Source type | Helper |
+|---|---|
+| `pd.DataFrame` | `df_to_safe_records(df)` |
+| `pa.Table` (Arrow) | `get_sample_rows_from_arrow(table)` |
+
+## Why
+
+`pandas.DataFrame.to_json(orient='records')` defaults to `date_format='epoch'`,
+which serializes datetime columns as **epoch milliseconds** (e.g. `1773532800000`).
+The frontend interprets these as plain numbers and renders them with commas
+(`1,773,532,800,000`) instead of formatted dates.
+
+`df_to_safe_records` enforces `date_format='iso'` and `default_handler=str`,
+ensuring datetimes become ISO-8601 strings and exotic types degrade gracefully.
+
+## Banned Patterns
+
+```python
+# BAD — missing date_format, datetimes become epoch numbers
+json.loads(df.to_json(orient='records'))
+
+# BAD — to_dict returns Timestamp objects, not JSON-safe values
+df.to_dict(orient='records')
+
+# ACCEPTABLE, but migrate to df_to_safe_records for consistency
+json.loads(df.to_json(orient='records', date_format='iso'))
+```
+
+## Correct Pattern
+
+```python
+from data_formulator.datalake.parquet_utils import df_to_safe_records
+
+rows = df_to_safe_records(df)
+preview = df_to_safe_records(df.head(5))
+```
+
+## Exceptions
+
+Internal data processing that never reaches the frontend or JSON serialization
+(e.g. Kusto SDK metadata parsing, Vega-Lite spec construction) may use
+`to_dict(orient='records')` directly.
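The three behaviors this rule contrasts can be reproduced with a small, self-contained sketch that assumes only pandas is installed. `df_to_safe_records` below is a local copy of the one-line body shown in `docs/dev-guides/15-dataframe-serialization.md`, not an import of the real module, and the sample data is made up. `date_format="epoch"` is passed explicitly so the example reproduces the default described above regardless of the installed pandas version.

```python
import json

import pandas as pd


def df_to_safe_records(df: pd.DataFrame) -> list[dict]:
    """Local copy of the helper body shown in 15-dataframe-serialization.md."""
    return json.loads(
        df.to_json(orient="records", date_format="iso", default_handler=str)
    )


# Made-up sample: one datetime column and one integer column.
df = pd.DataFrame({"day": pd.to_datetime(["2026-03-15"]), "sales": [42]})

# Epoch format: datetimes become integer milliseconds since 1970-01-01.
epoch_rows = json.loads(df.to_json(orient="records", date_format="epoch"))

# to_dict keeps pd.Timestamp objects, which the stdlib json module rejects.
raw_rows = df.to_dict(orient="records")

# ISO format: datetimes become strings the frontend can render as dates.
safe_rows = df_to_safe_records(df)
```

Here `epoch_rows[0]["day"]` is the integer `1773532800000` that the frontend would print with thousands separators, while `safe_rows[0]["day"]` is an ISO-8601 string beginning with `2026-03-15T00:00:00`.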
diff --git a/.cursor/rules/dev-guides-first.mdc b/.cursor/rules/dev-guides-first.mdc index 118ea366..73695347 100644 --- a/.cursor/rules/dev-guides-first.mdc +++ b/.cursor/rules/dev-guides-first.mdc @@ -9,21 +9,23 @@ alwaysApply: true ## Before Starting Any Development Before implementing or designing a new feature, module, or significant change, **always -read the relevant documents in `dev-guides/`** to understand existing conventions: +read the relevant documents in `docs/dev-guides/`** to understand existing conventions: | Guide | When to Read | |-------|-------------| -| `dev-guides/1-streaming-protocol.md` | Any work on streaming endpoints or NDJSON protocol | -| `dev-guides/2-log-sanitization.md` | Any work involving logging, credentials, external services, or DataLoaders | -| `dev-guides/3-data-loader-development.md` | Any work on ExternalDataLoader, DataConnector, or connector routes | -| `dev-guides/4-authentication-oidc-tokenstore.md` | Any work on OIDC, TokenStore, AUTH_MODE, or SSO flows | -| `dev-guides/6-i18n-language-injection.md` | Any work on Agent prompts, Agent routes, backend user-visible messages, or frontend i18n | -| `dev-guides/7-unified-error-handling.md` | Any work on API errors, frontend API calls, streaming error events, or error tests | -| `dev-guides/8-path-safety.md` | Any work on backend file access, downloads, Workspace paths, Agent tools, DataLoaders, or sandbox config | -| `dev-guides/10-agent-knowledge-reasoning-log.md` | Any work on Agent knowledge injection, KnowledgeStore, reasoning logs, or experience distillation | -| `dev-guides/11-catalog-metadata-sync.md` | Any work on catalog sync, catalog_cache, catalog_annotations, metadata merge, Agent catalog tools, or frontend catalog browsing | -| `dev-guides/12-sandbox-session.md` | Any work on sandbox execution, Agent tool-calling loops, explore/execute_python code execution, or namespace management | -| `dev-guides/13-unified-row-limits.md` | Any work on row limits, data 
loading size caps, frontendRowLimit, MAX_IMPORT_ROWS, or DataLoader size parameter | +| `docs/dev-guides/1-streaming-protocol.md` | Any work on streaming endpoints or NDJSON protocol | +| `docs/dev-guides/2-log-sanitization.md` | Any work involving logging, credentials, external services, or DataLoaders | +| `docs/dev-guides/3-data-loader-development.md` | Any work on ExternalDataLoader, DataConnector, or connector routes | +| `docs/dev-guides/4-authentication-oidc-tokenstore.md` | Any work on OIDC, TokenStore, AUTH_MODE, or SSO flows | +| `docs/dev-guides/6-i18n-language-injection.md` | Any work on Agent prompts, Agent routes, backend user-visible messages, or frontend i18n | +| `docs/dev-guides/7-unified-error-handling.md` | Any work on API errors, frontend API calls, streaming error events, or error tests | +| `docs/dev-guides/8-path-safety.md` | Any work on backend file access, downloads, Workspace paths, Agent tools, DataLoaders, or sandbox config | +| `docs/dev-guides/10-agent-knowledge-reasoning-log.md` | Any work on Agent knowledge injection, KnowledgeStore, reasoning logs, or experience distillation | +| `docs/dev-guides/11-catalog-metadata-sync.md` | Any work on catalog sync, catalog_cache, catalog_annotations, metadata merge, Agent catalog tools, or frontend catalog browsing | +| `docs/dev-guides/12-sandbox-session.md` | Any work on sandbox execution, Agent tool-calling loops, explore/execute_python code execution, or namespace management | +| `docs/dev-guides/13-unified-row-limits.md` | Any work on row limits, data loading size caps, frontendRowLimit, MAX_IMPORT_ROWS, or DataLoader size parameter | +| `docs/dev-guides/14-model-capability-runtime-degradation.md` | Any work on LLM Client calls, Agent LLM invocations, model capability checks, reasoning_effort, or image/vision degradation | +| `docs/dev-guides/15-dataframe-serialization.md` | Any work on DataFrame→JSON serialization, Agent result rows, DataLoader sample_rows, or table API responses | | 
`.cursor/rules/i18n-no-hardcoded-strings.mdc` | Any work adding or changing user-visible strings in `src/` | Also check `.cursor/rules/` and `.cursor/skills/` for related coding conventions. @@ -37,7 +39,7 @@ decision, you MUST update documentation before considering the task complete: (e.g. new sensitive key type → update `2-log-sanitization.md`) 2. **Create new dev-guide** — if your change introduces a new cross-cutting convention - that future developers must follow. Place in `dev-guides/` with the next number prefix. + that future developers must follow. Place in `docs/dev-guides/` with the next number prefix. 3. **Update existing SKILL** — if your change affects how an existing skill works (e.g. new error handling pattern → update `error-handling/SKILL.md`) @@ -55,7 +57,7 @@ decision, you MUST update documentation before considering the task complete: For every PR that introduces new patterns: -- [ ] Searched `dev-guides/` for related existing docs +- [ ] Searched `docs/dev-guides/` for related existing docs - [ ] Updated or created dev-guide if introducing cross-cutting conventions - [ ] Updated or created `.cursor/skills/` if introducing reusable workflows - [ ] Updated or created `.cursor/rules/` if introducing file-scoped constraints diff --git a/.cursor/rules/error-response-safety.mdc b/.cursor/rules/error-response-safety.mdc index 963f7394..76cbc7c7 100644 --- a/.cursor/rules/error-response-safety.mdc +++ b/.cursor/rules/error-response-safety.mdc @@ -10,7 +10,7 @@ Never return raw exception text (`str(e)`, `f"...{e}"`) directly in HTTP respons Python exceptions may contain stack traces, file paths, database connection strings, API keys, or internal IP addresses — all of which are security risks (CWE-209). -See `dev-guides/7-unified-error-handling.md` for the full error handling contract. +See `docs/dev-guides/7-unified-error-handling.md` for the full error handling contract. 
## Unified Error System diff --git a/.cursor/rules/language-injection-conventions.mdc b/.cursor/rules/language-injection-conventions.mdc index 57c533e6..9ee458da 100644 --- a/.cursor/rules/language-injection-conventions.mdc +++ b/.cursor/rules/language-injection-conventions.mdc @@ -40,4 +40,4 @@ yield { yield {"message": translate_in_python("empty_df", lang)} ``` -For detailed architecture and anti-pattern explanations, see `dev-guides/6-i18n-language-injection.md`. +For detailed architecture and anti-pattern explanations, see `docs/dev-guides/6-i18n-language-injection.md`. diff --git a/.cursor/rules/path-safety.mdc b/.cursor/rules/path-safety.mdc index f53337d6..56de8eaf 100644 --- a/.cursor/rules/path-safety.mdc +++ b/.cursor/rules/path-safety.mdc @@ -6,7 +6,7 @@ alwaysApply: false # 路径安全编码规范 -编辑此目录下的文件时,请遵守以下安全规则(详见 `dev-guides/8-path-safety.md` 和 `.cursor/skills/path-safety/SKILL.md`): +编辑此目录下的文件时,请遵守以下安全规则(详见 `docs/dev-guides/8-path-safety.md` 和 `.cursor/skills/path-safety/SKILL.md`): ## ConfinedDir 是唯一的路径约束原语 diff --git a/.cursor/rules/unified-error-protocol.mdc b/.cursor/rules/unified-error-protocol.mdc index 2e8eb3db..1428adee 100644 --- a/.cursor/rules/unified-error-protocol.mdc +++ b/.cursor/rules/unified-error-protocol.mdc @@ -6,7 +6,7 @@ alwaysApply: false # Unified Error Protocol -See `dev-guides/7-unified-error-handling.md` for the full developer guide. +See `docs/dev-guides/7-unified-error-handling.md` for the full developer guide. ## HTTP Status Code Policy diff --git a/.cursor/skills/error-handling/SKILL.md b/.cursor/skills/error-handling/SKILL.md index 133992e2..bd09d21c 100644 --- a/.cursor/skills/error-handling/SKILL.md +++ b/.cursor/skills/error-handling/SKILL.md @@ -7,8 +7,8 @@ description: 统一错误处理系统。在添加 API 端点、修改错误处 Unified error handling system for DF. Use when adding API endpoints, modifying error handling, or adding frontend API calls. -> **Prerequisites**: Read `dev-guides/7-unified-error-handling.md` before changing API error behavior. 
-> Read `dev-guides/2-log-sanitization.md` when the work involves logging, credentials, external services, or DataLoaders. +> **Prerequisites**: Read `docs/dev-guides/7-unified-error-handling.md` before changing API error behavior. +> Read `docs/dev-guides/2-log-sanitization.md` when the work involves logging, credentials, external services, or DataLoaders. > If your work introduces new error handling patterns or conventions, update this file and related dev-guides accordingly. ## Architecture Overview diff --git a/.cursor/skills/language-injection/SKILL.md b/.cursor/skills/language-injection/SKILL.md index a75c3a01..121bda64 100644 --- a/.cursor/skills/language-injection/SKILL.md +++ b/.cursor/skills/language-injection/SKILL.md @@ -5,9 +5,9 @@ description: LLM Agent 多语言注入规范。在修改 Agent 提示词、添 # Language Injection for Agent Prompts -Authoritative developer guide: `dev-guides/6-i18n-language-injection.md`. +Authoritative developer guide: `docs/dev-guides/6-i18n-language-injection.md`. -> **Prerequisites**: Read `dev-guides/6-i18n-language-injection.md` before changing Agent prompts, Agent routes, backend user-visible messages, or frontend i18n strings. +> **Prerequisites**: Read `docs/dev-guides/6-i18n-language-injection.md` before changing Agent prompts, Agent routes, backend user-visible messages, or frontend i18n strings. > If your work introduces new language injection patterns or conventions, update this file and related dev-guides accordingly. 
## Architecture diff --git a/.cursor/skills/path-safety/SKILL.md b/.cursor/skills/path-safety/SKILL.md index 7704d064..f93cf8fa 100644 --- a/.cursor/skills/path-safety/SKILL.md +++ b/.cursor/skills/path-safety/SKILL.md @@ -5,7 +5,7 @@ description: 服务端路径安全与文件访问编码规范。在编写文件 # Path Safety — 服务端安全编码规范 -> **来源**:`dev-guides/8-path-safety.md`(正式开发规范)+ `design-docs/issues/002-arbitrary-file-read-audit.md`(安全审计复核)。 +> **来源**:`docs/dev-guides/8-path-safety.md`(正式开发规范)+ `design-docs/issues/002-arbitrary-file-read-audit.md`(安全审计复核)。 > 本文档提炼了 6 条必须遵守的编码规范。违反任一条即可能引入路径穿越(LFI)漏洞。 --- @@ -154,7 +154,7 @@ def _enforce_deployment_restrictions(): ## 参考文档 -- `dev-guides/8-path-safety.md` — 服务端路径安全开发规范 +- `docs/dev-guides/8-path-safety.md` — 服务端路径安全开发规范 - `design-docs/6-path-safety-confined-dir.md` — 剩余未完成实现项状态页 - `design-docs/issues/002-arbitrary-file-read-audit.md` — 安全审计复核报告 - `py-src/data_formulator/security/path_safety.py` — ConfinedDir 源码 diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md index 2db60c86..6ae19d08 100644 --- a/DEVELOPMENT.md +++ b/DEVELOPMENT.md @@ -527,7 +527,7 @@ cp -r /backup/workspaces $DATA_FORMULATOR_HOME/workspaces > **Tip:** If you forgot to back up `FLASK_SECRET_KEY` and it was auto-generated, there is no way to recover it. Users will need to log in again, and any chart with a cached code signature will need to be re-executed by the Agent. -> **中文版:** 详细的迁移操作指南见 [docs-cn/7-server-migration-guide.md](docs-cn/7-server-migration-guide.md)。 +> **中文版:** 详细的迁移操作指南见 [docs/docs-cn/7-server-migration-guide.md](docs/docs-cn/7-server-migration-guide.md)。 ## Authentication Architecture diff --git a/README.md b/README.md index 1d8f697f..8e88cc9f 100644 --- a/README.md +++ b/README.md @@ -26,21 +26,35 @@ + +## Why Data Formulator? + +Your data lives everywhere — databases, warehouses, BI tools, files. Coding agents can help, but only after someone wires them up, and answers come back as walls of code or text that are hard to follow, refine, or share. 
+ +Data Formulator makes it simple: **connect any data, ask anything, get charts you can edit, branch, and share** — all on one interactive, visual canvas. + +- **Data & platform teams**: wire up your databases, warehouses, and BI sources once, and give the whole org an AI-powered data exploration layer. +- **Analysts & users**: ask, edit, branch, share. It's so easy to get insights from good-looking charts. + - ## News 🔥🔥🔥 -[05-11-2026] **Data Formulator 0.7 (alpha 2)** — A new chapter for AI-powered data exploration -- 🔌 **Data connectors** — first-class persistent connection to Superset, Kusto, Cosmos DB, MySQL, PostgreSQL, MSSQL, S3, Azure Blob, BigQuery, and more, with SSO, lazy catalog loading, search, and smart filters. -- 💬 **Conversational agent with thread memory** — a unified `DataAgent` that weaves explanation, exploration, visualization, and recommendation into one fluid conversation, carrying context across turns so the agent stays in sync with your train of thought. -- 🗂️ **Persistent session & workspace management** — identity-isolated workspaces with local and Azure Blob backends; sessions persist across restarts with timestamps and sort. -- 📊 **Expressive visualization** — 30+ chart types via a new semantic chart engine (area, streamgraph, candlestick, pie, radar, maps, …), plus a chart style-refinement agent that turns rough charts into presentation-ready visuals: restyle in one click, refine typography, color, layout, and annotations through natural language. -- 📚 **Knowledge distillation (experimental)** — agents distill reusable skills and experiences from your sessions into a shared knowledge library that informs future sessions. +[05-28-2026] **Data Formulator 0.7** — turn ANY data into insights in five easy steps: + +1. **Connect.** Governed, reusable connections to databases, warehouses, BI systems, object stores, and files (Superset, Kusto, Cosmos DB, MySQL, PostgreSQL, MSSQL, BigQuery, S3, Azure Blob, …). Need a custom source? 
Point your coding agent at the [data loader plugin guide](examples/plugins/README.md). +2. **Load.** Ask the **data-loading agent** to find tables from connected databases, or extract data from Excel files, images, websites, and text. +3. **Explore.** A unified **Data Agent** with thread memory inspects data, runs sandboxed code, and weaves explanation, exploration, and recommendation into one fluid conversation — grounded in your context. The **Data Thread** keeps questions, intermediate results, and charts navigable: revisit earlier steps, branch into alternatives, and compare side by side. +4. **Refine.** 30+ chart types (area, streamgraph, candlestick, radar, maps, KPI, …) via a new semantic chart engine, plus a **style-refinement agent** that turns rough charts into presentation-ready visuals through natural language. +5. **Share.** Build reports and export as image or PDF to tell the story. + +➕ **Persistent sessions & workspaces** — identity-isolated, saved across restarts. Data Formulator is your de facto data analysis pane. -> Install the pre-release with `pip install --pre data-formulator` or pin `==0.7.0a2`. +**Multilingual UI** — Data Formulator now speaks Chinese in addition to English (没错,DF现在会说中文了!). More languages on the way — [contributions welcome](src/i18n/TRANSLATION_GUIDE.md). + +> Install with `pip install data_formulator` or run instantly with `uvx data_formulator`. > [!TIP] > **Are you a developer?** Join us to shape the future of AI-powered data exploration! @@ -50,6 +64,7 @@ https://github.com/user-attachments/assets/8ca57b68-4d7a-42cb-bcce-43f8b1681ce2 ## Previous Updates Here are milestones that lead to the current design: +- **v0.7 alpha 2** (05-11-2026): Early preview of data connectors, the unified `DataAgent` with thread memory, persistent workspaces, the semantic chart engine, and experimental knowledge distillation. 
- **v0.6** ([Demo](https://github.com/microsoft/data-formulator/releases/tag/0.6)): Real-time insights from live data — connect to URLs and databases with automatic refresh - **uv support**: Faster installation with [uv](https://docs.astral.sh/uv/) — `uvx data_formulator` or `uv pip install data_formulator` - **v0.5.1** ([Demo](https://github.com/microsoft/data-formulator/pull/200#issue-3635408217)): Community data loaders, US Map & Pie Chart, editable reports, snappier UI @@ -67,9 +82,7 @@ Here are milestones that lead to the current design: ## Overview -**Data Formulator** is a Microsoft Research prototype for data exploration with visualizations powered by AI agents. - -Data Formulator enables analysts to explore data with visualizations. Started with data in any format (screenshot, text, csv, or database), you can work with AI agents with a novel blended interface that combines *user interface interactions (UI)* and *natural language (NL) inputs* to communicate their intents, control branching exploration directions, and create reports to share their insights. +**Data Formulator** is a Microsoft Research project for data exploration with visualizations powered by AI agents. It combines *UI interactions* with *natural language* so analysts can communicate intent, branch into alternative analyses, and share results — starting from any data format (screenshot, text, CSV, or database). 
## Get Started diff --git a/dev-guides/1-streaming-protocol.md b/docs/dev-guides/1-streaming-protocol.md similarity index 96% rename from dev-guides/1-streaming-protocol.md rename to docs/dev-guides/1-streaming-protocol.md index 4885483f..dc9a7521 100644 --- a/dev-guides/1-streaming-protocol.md +++ b/docs/dev-guides/1-streaming-protocol.md @@ -34,7 +34,6 @@ | `get-recommendation-questions` | `"question"` | 探索建议问题 | | `generate-report-chat` | `"text_delta"`, `"embed_chart"`, `"embed_table"` | 报告生成流 | | `data-loading-chat` | `"text_delta"`, `"tool_call"`, `"tool_result"`, `"done"` | 数据加载对话 | -| `clean-data-stream` | 各种 agent 事件 | 数据清洗流 | | (跨端点通用) | `"thinking_text"` | Agent 推理/思考过程文本(参见 2.4) | `data-agent-streaming` 的 `result.type === "clarify"` 使用结构化多问题格式。后端和前端都以 @@ -177,7 +176,7 @@ Agent 在执行过程中产生的推理/思考文本。前端应实时展示为 1. **Agent 层面的 think tool**:`DataAgent` 使用 `think` 工具时,将 tool message 以 `thinking_text` 事件输出。 2. **LLM 伴随内容**:当 LLM 在 tool_calls 旁返回文本 content 时,Route 层将其作为 `thinking_text` 事件输出。 -3. **(Phase 2/3)模型原生推理**:Anthropic extended thinking 或 OpenAI reasoning tokens(`reasoning_content` 字段),由 `client_utils.py` 解析后输出为 `thinking_text` 事件。 +3. **模型原生推理链保持**:部分推理模型(当前为 DeepSeek V4)在响应中返回 `reasoning_content` 字段,该字段在多轮对话中必须回传至 assistant 消息。已通过 `agent_utils.attach_reasoning_content()` 和 `accumulate_reasoning_content()` 在所有 tool-loop agent 中统一处理。未来 Anthropic extended thinking 或 OpenAI reasoning tokens 如需类似处理,可复用同一机制。 **后端生成**: @@ -272,7 +271,6 @@ if (parsed.text) { ... 
} | `/get-recommendation-questions` | `x-ndjson` | route 累积碎片 → `_try_parse_explore_line` | `stream_error_event` | ✅ `_with_warnings` | | `/generate-report-chat` | `x-ndjson` | route `json.dumps(event)` | `stream_error_event` | ✅ `_with_warnings` | | `/data-loading-chat` | `x-ndjson` | route `json.dumps(event)` | `stream_error_event` | ✅ `_with_warnings` | -| `/clean-data-stream` | `x-ndjson` | agent 直接 yield | `stream_error_event` | ✅ `_with_warnings` | > **注意**: `/refine-data` 曾出现在此表中,但实际实现为普通 JSON endpoint(`jsonify` 返回),不使用 NDJSON 流。已于 2026-04-30 Phase 0 盘点中确认并移除。详见 `design-docs/20` 附录 A.10。 diff --git a/dev-guides/10-agent-knowledge-reasoning-log.md b/docs/dev-guides/10-agent-knowledge-reasoning-log.md similarity index 100% rename from dev-guides/10-agent-knowledge-reasoning-log.md rename to docs/dev-guides/10-agent-knowledge-reasoning-log.md diff --git a/dev-guides/11-catalog-metadata-sync.md b/docs/dev-guides/11-catalog-metadata-sync.md similarity index 100% rename from dev-guides/11-catalog-metadata-sync.md rename to docs/dev-guides/11-catalog-metadata-sync.md diff --git a/dev-guides/12-sandbox-session.md b/docs/dev-guides/12-sandbox-session.md similarity index 100% rename from dev-guides/12-sandbox-session.md rename to docs/dev-guides/12-sandbox-session.md diff --git a/dev-guides/13-unified-row-limits.md b/docs/dev-guides/13-unified-row-limits.md similarity index 100% rename from dev-guides/13-unified-row-limits.md rename to docs/dev-guides/13-unified-row-limits.md diff --git a/docs/dev-guides/14-model-capability-runtime-degradation.md b/docs/dev-guides/14-model-capability-runtime-degradation.md new file mode 100644 index 00000000..3d795ebe --- /dev/null +++ b/docs/dev-guides/14-model-capability-runtime-degradation.md @@ -0,0 +1,136 @@ +# 14 - 模型能力运行时降级架构 + +## 概述 + +所有 LLM 调用统一通过 LiteLLM,利用 `drop_params=True` 实现运行时自动降级。 +前端和后端不再预判模型能力(vision、reasoning 等),而是让模型调用自然失败后自动重试。 + +## 核心设计原则 + +1. 
**唯一调用路径** — 所有 Agent 通过 `Client.get_completion()` 或 `Client.get_completion_with_tools()` 调用 LLM,两者内部均走 `litellm.completion()`。 +2. **`drop_params=True`** — LiteLLM 自动丢弃模型不支持的参数(如 `reasoning_effort`、`parallel_tool_calls`),不会报错。 +3. **图片降级** — 如果模型不支持图片,`_is_image_deserialize_error()` 捕获异常后自动剥离图片重试。 +4. **无前端预检查** — 前端始终允许用户上传图片;后端自动处理。 + +## Client API + +```python +class Client: + def get_completion(self, messages, stream=False, + reasoning_effort="low", **kwargs): + """通用 completion 调用。""" + + def get_completion_with_tools(self, messages, tools, stream=False, + reasoning_effort="low", **kwargs): + """带 tools 的 completion 调用。接受 tool_choice、 + parallel_tool_calls 等通过 **kwargs 传入。""" +``` + +两个方法共享相同的内部逻辑: +- 拷贝 `self.params`,注入 `reasoning_effort` +- 调用 `litellm.completion(model=..., drop_params=True, ...)` +- 捕获 image deserialize 错误 → 剥离图片 → 重试 + +## reasoning_effort 分层 + +每个 Agent 的默认 tier 在 `py-src/data_formulator/agent_config.py` 的 +`AGENT_REASONING_EFFORT` 字典里统一维护。Agent 调用 LLM 时通过 +`reasoning_effort_for(_AGENT_ID, self.client.model)` 解析最终值,自动按 +目标模型做能力降级。 + +### 五档定义 + +| Tier | 适用范围 | 不支持的模型如何降级 | +|---|---|---| +| `none` | 仅 GPT-5 `codex` / `pro`(最轻档) | 其它模型回退到 `low` | +| `minimal` | 仅 OpenAI GPT-5 base / mini / nano / 5.x | GPT-5 codex/pro → `none`;其它 → `low` | +| `low` / `medium` / `high` | 所有支持 reasoning 的模型(LiteLLM 统一映射) | 不支持 reasoning 的模型由 `drop_params=True` 静默忽略 | + +> 选择原则:**挑能产出可接受质量的最低档**。重型代码生成 / 多步工具 +> 调用使用 `low`;单轮抽取 / 分类 / 格式化使用 `minimal`。 + +### 当前配置(来源:`agent_config.py`) + +| Agent ID | Tier | 备注 | +|---|---|---| +| `data_transform` | `low` | 生成 Python 转换脚本 | +| `data_rec` | `low` | 图表 / 转换推荐 | +| `data_agent` | `low` | 多步探索 agent | +| `report_gen` | `low` | 叙述 + inspect/embed 工具 | +| `interactive_explore` | `low` | 探索想法 agent | +| `data_loading_chat` | `low` | 会话式数据加载(带工具) | +| `data_load` | `minimal` | 一次性类型推断 | +| `experience_distill` | `minimal` | 总结分析上下文 | +| `chart_insight` | `minimal` | 图表标题 + 1–3 个 takeaway | +| `chart_restyle` | 
`minimal` | 对 Vega-Lite spec 做样式编辑 | +| `code_explanation` | `minimal` | 解释衍生字段 | +| `sort_data` | `minimal` | 小列表的自然顺序排序 | +| `simple` | `minimal` | nl_to_filter / workspace_name / intent | + +`DEFAULT_REASONING_EFFORT = "low"` —— 未在表中列出的 agent id 走默认值。 + +### 运行时覆盖 + +通过环境变量 `DF_REASONING_EFFORT_` 可在不改代码的情况下临时 +调整某个 agent 的 tier: + +```bash +DF_REASONING_EFFORT_DATA_TRANSFORM=medium +DF_REASONING_EFFORT_REPORT_GEN=high +``` + +合法取值:`none` / `minimal` / `low` / `medium` / `high`。 + +### Agent 调用模板 + +```python +from data_formulator.agent_config import reasoning_effort_for + +_AGENT_ID = "data_transform" + +response = self.client.get_completion( + messages=messages, + reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model), +) +``` + +### 查询方式 + +```bash +# 当前所有 agent 的默认 tier +grep -nE '"\w+": +"' py-src/data_formulator/agent_config.py + +# 所有调用点 +grep -rn 'reasoning_effort_for' py-src/data_formulator/agents/ +``` + +## 已删除的机制 + +| 删除项 | 原位置 | 理由 | +|---|---|---| +| `import openai` + 直连分支 | `client_utils.py`、5 个 Agent | 统一走 LiteLLM | +| `get_response()` | `client_utils.py` | 死代码 | +| `is_likely_text_only_model()` | `model_registry.py` | 硬编码模型名检查 | +| `model_supports_vision()` | `model_registry.py` | 前端/路由不再预检查 | +| `supports_vision` 字段 | `ModelConfig` (dfSlice)、`_reload`/`list_public` | 前端不消费 | +| `checkIsLikelyTextOnlyModel()` | `DataLoadingChat.tsx` | 前端不预检查 | +| `checkModelSupportsImageInput()` | `DataLoadingChat.tsx` | 前端不预检查 | +| vision 路由预检查 | `routes/agents.py` chart-insight、data-loading-chat | 改为运行时降级 | + +## 新增 Agent 检查清单 + +- [ ] Agent 继承正确的 base class 并使用 `self.client` +- [ ] 在 `agent_config.py` 的 `AGENT_REASONING_EFFORT` 中为新 agent_id 设置默认 tier +- [ ] 调用 LLM 时使用 `reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)` +- [ ] 需要 tools → 使用 `self.client.get_completion_with_tools(messages, tools, ...)` +- [ ] **不要** 直接 `import litellm` 或 `import openai` 调用 API +- [ ] **不要** 检查模型名来决定是否支持某功能(vision / reasoning 等) +- [ ] **不要** 
在调用点硬编码 `reasoning_effort="high"`——统一通过 `agent_config.py` 维护 + +## 厂商映射 + +| 厂商 | `reasoning_effort` 效果 | +|---|---| +| OpenAI (o1/o3/gpt-5) | 直接透传 | +| Anthropic (Claude) | 映射为 `thinking.budget_tokens` | +| 其他(Gemini、Ollama 等) | `drop_params=True` 静默忽略 | diff --git a/docs/dev-guides/15-dataframe-serialization.md b/docs/dev-guides/15-dataframe-serialization.md new file mode 100644 index 00000000..50271d64 --- /dev/null +++ b/docs/dev-guides/15-dataframe-serialization.md @@ -0,0 +1,54 @@ +# 15 — DataFrame Serialization Convention + +## Problem + +pandas `DataFrame.to_json(orient='records')` defaults to `date_format='epoch'`, +which serializes `datetime64` columns as **epoch milliseconds** (e.g. +`1773532800000`). When the frontend receives these numbers, `formatCellValue` +treats them as plain integers and displays `1,773,532,800,000` instead of a +formatted date. + +`DataFrame.to_dict(orient='records')` is even worse — it returns Python +`pd.Timestamp` objects that are not JSON-serializable at all and rely on +whatever `json.dumps` fallback happens to be in scope. + +## Solution + +A single utility function in `data_formulator.datalake.parquet_utils`: + +```python +def df_to_safe_records(df: pd.DataFrame) -> list[dict[str, Any]]: + return json.loads( + df.to_json(orient="records", date_format="iso", default_handler=str) + ) +``` + +For Arrow tables, the existing `get_sample_rows_from_arrow(table)` already +handles this correctly via `make_json_safe`. 
+ +## When to Use + +| Scenario | Function | +|---|---| +| Agent result rows (`content["rows"]`) | `df_to_safe_records(query_output)` | +| Table sample rows (`sample_rows`) | `df_to_safe_records(sample_df)` | +| Data loader metadata previews | `df_to_safe_records(df.head(5))` | +| File parse results (Excel/CSV) | `df_to_safe_records(df)` | +| Arrow table samples | `get_sample_rows_from_arrow(table)` | + +## Exceptions + +The following uses of `to_dict(orient='records')` are exempt because they never +reach JSON serialization or the frontend: + +- **Kusto SDK metadata** — `.show tables details` results iterated in Python only +- **Vega-Lite spec construction** — `create_vl_plots.py` builds inline data for + Vega specs; Vega handles temporal formatting itself + +## New Module Checklist + +When writing a new Agent, DataLoader, or route that returns DataFrame rows: + +1. Import: `from data_formulator.datalake.parquet_utils import df_to_safe_records` +2. Convert: `rows = df_to_safe_records(df)` (not `to_json` / `to_dict`) +3. Test: verify datetime columns appear as ISO strings in the response diff --git a/dev-guides/2-log-sanitization.md b/docs/dev-guides/2-log-sanitization.md similarity index 100% rename from dev-guides/2-log-sanitization.md rename to docs/dev-guides/2-log-sanitization.md diff --git a/dev-guides/3-data-loader-development.md b/docs/dev-guides/3-data-loader-development.md similarity index 98% rename from dev-guides/3-data-loader-development.md rename to docs/dev-guides/3-data-loader-development.md index 481cae55..7f5db97c 100644 --- a/dev-guides/3-data-loader-development.md +++ b/docs/dev-guides/3-data-loader-development.md @@ -249,7 +249,7 @@ Loader type 注册后,`DataConnector` 会自动把它包装成 connector insta 1. **内置 loader**:修改源码,在 `data_loader/__init__.py` 的 `_LOADER_SPECS` 列表中添加 `(registry_key, module_path, class_name, pip_package)` 条目。 2. 
**外部 loader 插件**:不修改仓库源码,把 `*_data_loader.py` 放到 - `DF_PLUGIN_DIR` 指向的目录。默认目录是 `~/.data-formulator/plugins/`。 + `DF_PLUGIN_DIR` 指向的目录。默认目录是 `~/.data_formulator/plugins/`。 内置 loader 当前**不是**包内自动发现;新增内置 loader 必须显式更新 `_LOADER_SPECS`。 零源码修改只适用于外部 `DF_PLUGIN_DIR` 插件。 @@ -315,7 +315,8 @@ my_report_data_loader.py -> my_report - [ ] 大目录优先实现 `ls(path, filter, limit, offset)` 与 `search_catalog(query)`,避免前端展开时全量扫描。 - [ ] table 节点携带稳定源标识符,例如 `metadata["_source_name"]`,供 preview/import/refresh 使用。 - [ ] `fetch_data_as_arrow()` 尊重 `import_options` 中的 `size`、`columns`、`sort_columns`、`sort_order`、`filters`、`source_filters`。 -- [ ] `size` 必须通过 `min(opts.get("size", MAX_IMPORT_ROWS), MAX_IMPORT_ROWS)` 截断,从 `external_data_loader` 导入 `MAX_IMPORT_ROWS`。详见 `dev-guides/13-unified-row-limits.md`。 +- [ ] `sample_rows` 等面向前端的 DataFrame→records 转换使用 `df_to_safe_records(df)` 而非 `to_json`/`to_dict`(详见 `docs/dev-guides/15-dataframe-serialization.md`)。 +- [ ] `size` 必须通过 `min(opts.get("size", MAX_IMPORT_ROWS), MAX_IMPORT_ROWS)` 截断,从 `external_data_loader` 导入 `MAX_IMPORT_ROWS`。详见 `docs/dev-guides/13-unified-row-limits.md`。 - [ ] SQL 类 loader 构造筛选条件时使用参数化、标识符白名单或受控运算符集合,禁止拼接未校验的 operator/column。 **可靠性与测试** @@ -450,7 +451,7 @@ Loader 应在 SQL 构建时使用参数化或白名单校验运算符(参考 ` - 连接和读取错误必须抛出清晰的 `ValueError` - 表名/对象名需校验或清洗 - `fetch_data_as_arrow` 必须尊重 `import_options` 中的 `size`、`columns`、`sort_columns`、`sort_order`、`filters`、`source_filters` -- `size` 参数必须受 `MAX_IMPORT_ROWS`(200 万)硬上限约束,防止意外 OOM(参见 `dev-guides/13-unified-row-limits.md`) +- `size` 参数必须受 `MAX_IMPORT_ROWS`(200 万)硬上限约束,防止意外 OOM(参见 `docs/dev-guides/13-unified-row-limits.md`) - `list_tables()` 返回值必须包含统一的轻量 metadata 字段(至少稳定源标识、列名/类型;有低成本来源时包含表/列描述) - 覆盖 `get_column_types()` 或 `get_metadata()` 时,测试表级 `description`、列级 `description`、空描述清空、缺 key 保留和 metadata 失败不阻断导入/预览 diff --git a/dev-guides/4-authentication-oidc-tokenstore.md b/docs/dev-guides/4-authentication-oidc-tokenstore.md similarity index 99% rename from 
dev-guides/4-authentication-oidc-tokenstore.md rename to docs/dev-guides/4-authentication-oidc-tokenstore.md index 870bdaa5..4565edac 100644 --- a/dev-guides/4-authentication-oidc-tokenstore.md +++ b/docs/dev-guides/4-authentication-oidc-tokenstore.md @@ -140,7 +140,7 @@ def auth_config() -> dict: - 前端收到消息后调用 `POST /api/auth/tokens/save`,请求体包含 `system_id`、`access_token`、可选 `refresh_token`、`user`、`remember`。 - Loader 的 `__init__` 必须消费注入的 `access_token` 或 `sso_access_token`,否则声明 delegated/SSO 模式没有实际效果。 -用户或运维侧 Superset 配置步骤见 `docs-cn/5.1-superset-sso-oauth-config-guide.md`。 +用户或运维侧 Superset 配置步骤见 `docs/docs-cn/5.1-superset-sso-oauth-config-guide.md`。 --- diff --git a/dev-guides/5-data-connector-api.md b/docs/dev-guides/5-data-connector-api.md similarity index 96% rename from dev-guides/5-data-connector-api.md rename to docs/dev-guides/5-data-connector-api.md index ca3a4c64..37d99602 100644 --- a/dev-guides/5-data-connector-api.md +++ b/docs/dev-guides/5-data-connector-api.md @@ -193,7 +193,7 @@ Vault 凭证同样按 `identity + connector_id` 隔离。两个用户连接同 Loader type discovery 来自 `DATA_LOADERS`: - 内置 loader 在 `data_loader/__init__.py` 的 `_LOADER_SPECS` 中注册。 -- 外部 loader 从 `DF_PLUGIN_DIR` 扫描,默认目录为 `~/.data-formulator/plugins/`。 +- 外部 loader 从 `DF_PLUGIN_DIR` 扫描,默认目录为 `~/.data_formulator/plugins/`。 - 文件名必须匹配 `*_data_loader.py`,例如 `my_report_data_loader.py` 会注册为 `my_report`。 - 文件中需要定义公开的 `ExternalDataLoader` 子类;如果 key 与内置 loader 相同, @@ -261,7 +261,7 @@ metadata 时,前端继续调用 `/api/connectors/get-catalog` 并传入该节 `design-docs/2-external-dataloader-enhancements.md` 的“缺陷二:认证方式单一, 缺少 SSO/集成认证”。 -认证相关细节见 `dev-guides/4-authentication-oidc-tokenstore.md`。 +认证相关细节见 `docs/dev-guides/4-authentication-oidc-tokenstore.md`。 --- @@ -302,11 +302,11 @@ Superset dashboard。成员表放在节点 `metadata["tables"]` 中。 `import-group` 会遍历请求中的 `tables`,为每个成员表单独写入 workspace。 每张表的行数受全局 `MAX_IMPORT_ROWS`(200 万)硬上限约束(前端不再传 `row_limit`, -由后端 DataLoader 统一使用 `frontendRowLimit`;详见 `dev-guides/13-unified-row-limits.md`)。 +由后端 
DataLoader 统一使用 `frontendRowLimit`;详见 `docs/dev-guides/13-unified-row-limits.md`)。 若传入 `source_filters`,只把 `applies_to` 命中的筛选条件传给对应成员表。 `source_filters` 是数据源原生筛选条件,不等同于通用前端 WHERE 构造器。格式与 -`dev-guides/3-data-loader-development.md` 保持一致: +`docs/dev-guides/3-data-loader-development.md` 保持一致: ```json [ @@ -359,8 +359,8 @@ Loader 应在 `fetch_data_as_arrow(source_table, import_options)` 中读取 ## 14. 相关文档 -- `dev-guides/3-data-loader-development.md` -- `dev-guides/4-authentication-oidc-tokenstore.md` -- `docs-cn/1-data-source-connections.md` -- `docs-cn/6-credential-vault.md` -- `docs-cn/7-server-migration-guide.md` +- `docs/dev-guides/3-data-loader-development.md` +- `docs/dev-guides/4-authentication-oidc-tokenstore.md` +- `docs/docs-cn/1-data-source-connections.md` +- `docs/docs-cn/6-credential-vault.md` +- `docs/docs-cn/7-server-migration-guide.md` diff --git a/dev-guides/6-i18n-language-injection.md b/docs/dev-guides/6-i18n-language-injection.md similarity index 98% rename from dev-guides/6-i18n-language-injection.md rename to docs/dev-guides/6-i18n-language-injection.md index cfb4c8e0..15001b11 100644 --- a/dev-guides/6-i18n-language-injection.md +++ b/docs/dev-guides/6-i18n-language-injection.md @@ -95,10 +95,12 @@ agent = SortDataAgent(client=client, language_instruction=language_instruction) | 场景 | Mode | |------|------| | `DataAgent`、`ChartInsightAgent`、`InteractiveExploreAgent`、`ReportGenAgent` | `full` | +| `CodeExplanationAgent`、`DataLoadingAgent` | `full` | | `DataRecAgent`、`DataTransformationAgent`、`DataLoadAgent` | `compact` | -| `SortDataAgent` | `compact` | +| `SortDataAgent`、`ChartRestyleAgent` | `compact` | | `workspace-name` | `full` | | `test-model`、模型列表、纯状态检查 | 不注入 | +| `nl-to-filter`、`classify-chart-intent` | 不注入(纯结构化输出) | ### 2.2 Agent 层 @@ -222,7 +224,6 @@ event = { | `agent.noCodeBlock` | `DataRecAgent` / `DataTransformationAgent` | | `agent.unexpectedError` | `DataRecAgent` fallback | | `agent.codeExecError` | code execution fallback | -| 
`agent.unableExtractTables` | `DataCleanAgentStream` | | `agent.unableExtractScript` | `DataLoadAgent` / `SortDataAgent` | | `agent.errorCallingModel` | `DataLoadingChatAgent` | @@ -419,5 +420,5 @@ locale 文件和 `AVAILABLE_LANGUAGES` 都配置完成的语言,才应出现 - `.cursor/skills/language-injection/SKILL.md` - `.cursor/rules/language-injection-conventions.mdc` - `.cursor/rules/i18n-no-hardcoded-strings.mdc` -- `dev-guides/1-streaming-protocol.md` -- `dev-guides/7-unified-error-handling.md` +- `docs/dev-guides/1-streaming-protocol.md` +- `docs/dev-guides/7-unified-error-handling.md` diff --git a/dev-guides/7-unified-error-handling.md b/docs/dev-guides/7-unified-error-handling.md similarity index 99% rename from dev-guides/7-unified-error-handling.md rename to docs/dev-guides/7-unified-error-handling.md index 59554dfb..2127fdc3 100644 --- a/dev-guides/7-unified-error-handling.md +++ b/docs/dev-guides/7-unified-error-handling.md @@ -191,7 +191,7 @@ return json_ok(data), 400 # json_ok 已返回 (Response, status_code) ## 3. 后端流式 API -流式端点使用 NDJSON,详见 `dev-guides/1-streaming-protocol.md`。 +流式端点使用 NDJSON,详见 `docs/dev-guides/1-streaming-protocol.md`。 基本要求: @@ -459,7 +459,7 @@ metadata 请求不要复制短客户端 abort 模式。 前端通过 `getErrorMessage(apiError)` 优先使用本地 i18n,缺失时回退到后端英文 `message`。 -普通后端固定消息如果不是 `AppError` 体系,优先参考 `dev-guides/6-i18n-language-injection.md` 的 `message_code` / `content_code` 规则。 +普通后端固定消息如果不是 `AppError` 体系,优先参考 `docs/dev-guides/6-i18n-language-injection.md` 的 `message_code` / `content_code` 规则。 ## 6. 
错误分类工具 diff --git a/dev-guides/8-path-safety.md b/docs/dev-guides/8-path-safety.md similarity index 99% rename from dev-guides/8-path-safety.md rename to docs/dev-guides/8-path-safety.md index 3d1ee749..ae74cd0a 100644 --- a/dev-guides/8-path-safety.md +++ b/docs/dev-guides/8-path-safety.md @@ -147,7 +147,7 @@ def _enforce_deployment_restrictions(): ) ``` -`local_folder` 是当前参考实现。Data Loader 通用开发规范见 `dev-guides/3-data-loader-development.md`。 +`local_folder` 是当前参考实现。Data Loader 通用开发规范见 `docs/dev-guides/3-data-loader-development.md`。 ## 7. Sandbox 部署 diff --git a/dev-guides/9-workspace-storage-architecture.md b/docs/dev-guides/9-workspace-storage-architecture.md similarity index 96% rename from dev-guides/9-workspace-storage-architecture.md rename to docs/dev-guides/9-workspace-storage-architecture.md index cb4af07d..ef3ecffb 100644 --- a/dev-guides/9-workspace-storage-architecture.md +++ b/docs/dev-guides/9-workspace-storage-architecture.md @@ -55,6 +55,12 @@ DATA_FORMULATOR_HOME/ # 默认 ~/.data_formulator,可 **大小**: ~150 bytes **写入时机**: `create_workspace()`, `save_session_state()`, `update_display_name()` +`update_display_name()` 采用 write-through 策略:同时写 `workspace_meta.json` 和 +`session_state.json`(patch `activeWorkspace.displayName`),确保两个文件的 displayName +始终一致——即使被重命名的 workspace 不是当前前端打开的那个。 +Azure Blob 后端因 session blob 可能好几 MB,下载+重上传开销大,不做 write-through; +如果出现不一致,用户再改一次名字即可,下次 auto-save 会同步。 + ```json { "id": "session_20260426_212411_1503", @@ -318,7 +324,7 @@ def _ensure_meta(self, workspace_id: str) -> dict: - 适用于 `--disable-database` 模式(无服务端持久化) - 默认行数限制为 20,000(`DEFAULT_ROW_LIMIT_EPHEMERAL`),以兼顾浏览器性能 -> **行数限制**: 两种模式的数据导入行数由统一的 `frontendRowLimit`(前端)和 `MAX_IMPORT_ROWS`(后端硬上限 200 万)控制。详见 `dev-guides/13-unified-row-limits.md`。 +> **行数限制**: 两种模式的数据导入行数由统一的 `frontendRowLimit`(前端)和 `MAX_IMPORT_ROWS`(后端硬上限 200 万)控制。详见 `docs/dev-guides/13-unified-row-limits.md`。 --- diff --git a/docs-cn/1-data-source-connections.md b/docs/docs-cn/1-data-source-connections.md similarity 
index 98% rename from docs-cn/1-data-source-connections.md rename to docs/docs-cn/1-data-source-connections.md index fd6de89e..2b5e38ce 100644 --- a/docs-cn/1-data-source-connections.md +++ b/docs/docs-cn/1-data-source-connections.md @@ -132,7 +132,7 @@ DATA_FORMULATOR_HOME/credentials.db ``` 服务器迁移、备份或 Docker 部署时,这两个文件需要和用户数据一起保留。更多说明见 -`docs-cn/6-credential-vault.md` 和 `docs-cn/7-server-migration-guide.md`。 +`docs/docs-cn/6-credential-vault.md` 和 `docs/docs-cn/7-server-migration-guide.md`。 --- diff --git a/docs-cn/5-datasource_plugin-development-guide.md b/docs/docs-cn/5-datasource_plugin-development-guide.md similarity index 96% rename from docs-cn/5-datasource_plugin-development-guide.md rename to docs/docs-cn/5-datasource_plugin-development-guide.md index 9279c93a..80adc43d 100644 --- a/docs-cn/5-datasource_plugin-development-guide.md +++ b/docs/docs-cn/5-datasource_plugin-development-guide.md @@ -44,7 +44,7 @@ DF_PLUGIN_DIR # 不改源码外加 loader type 的目录 如果要接入一个全新的报表系统或内部数据服务,管理员可以提供一个外部 loader 文件: ```text -~/.data-formulator/plugins/my_report_data_loader.py +~/.data_formulator/plugins/my_report_data_loader.py ``` 也可以通过环境变量指定目录: @@ -225,7 +225,7 @@ DF_SOURCES__my_report_prod__params__url=https://report.example.com | `delegated_login_config()` | 声明弹窗登录 URL 与按钮文案 | | `auth_mode()` | 旧兼容接口,新 loader 优先使用 `auth_config()` | -认证细节见 `dev-guides/4-authentication-oidc-tokenstore.md`。 +认证细节见 `docs/dev-guides/4-authentication-oidc-tokenstore.md`。 --- @@ -250,5 +250,5 @@ DF_SOURCES__my_report_prod__params__url=https://report.example.com - [ ] 服务重启后,`GET /api/data-loaders` 能看到新的 loader type。 - [ ] Add Connection 能创建连接,preview/import/refresh 能正常工作。 -更多开发细节见 [dev-guides/3-data-loader-development.md](../dev-guides/3-data-loader-development.md) -和 [dev-guides/5-data-connector-api.md](../dev-guides/5-data-connector-api.md)。 +更多开发细节见 [docs/dev-guides/3-data-loader-development.md](../dev-guides/3-data-loader-development.md) +和 
[docs/dev-guides/5-data-connector-api.md](../dev-guides/5-data-connector-api.md)。 diff --git a/docs-cn/5.1-superset-sso-oauth-config-guide.md b/docs/docs-cn/5.1-superset-sso-oauth-config-guide.md similarity index 97% rename from docs-cn/5.1-superset-sso-oauth-config-guide.md rename to docs/docs-cn/5.1-superset-sso-oauth-config-guide.md index e0b9e13a..6b62beff 100644 --- a/docs-cn/5.1-superset-sso-oauth-config-guide.md +++ b/docs/docs-cn/5.1-superset-sso-oauth-config-guide.md @@ -113,7 +113,7 @@ superset_config.py ← Superset 主配置(导入 oauth_config) oauth_config.py ← SSO 认证 + DF 桥接(独立文件,便于维护) ``` -> **参考示例**:完整的配置示例文件见 [`docs-cn/config-examples/superset/`](config-examples/superset/),可作为起点按需修改。 +> **参考示例**:完整的配置示例文件见 [`docs/docs-cn/config-examples/superset/`](config-examples/superset/),可作为起点按需修改。 --- diff --git a/docs-cn/6-credential-vault.md b/docs/docs-cn/6-credential-vault.md similarity index 91% rename from docs-cn/6-credential-vault.md rename to docs/docs-cn/6-credential-vault.md index becc73b2..5c6d3f97 100644 --- a/docs-cn/6-credential-vault.md +++ b/docs/docs-cn/6-credential-vault.md @@ -39,7 +39,7 @@ DATA_FORMULATOR_HOME/users//connectors.yaml # 用户个人连接 ``` 这些文件在重启和升级后持续保留。迁移服务器时必须一起备份,详见 -`docs-cn/7-server-migration-guide.md`。 +`docs/docs-cn/7-server-migration-guide.md`。 --- @@ -114,7 +114,7 @@ service token 或通过 SSO exchange 获取目标系统 token。 5. 使用 vault 中保存的静态凭证。 6. 
无可用凭证,提示用户重新授权或重新输入。 -开发细节见 `dev-guides/4-authentication-oidc-tokenstore.md`。 +开发细节见 `docs/dev-guides/4-authentication-oidc-tokenstore.md`。 --- @@ -174,6 +174,6 @@ py-src/data_formulator/credential_vault/ 相关文档: -- `docs-cn/1-data-source-connections.md` -- `docs-cn/7-server-migration-guide.md` -- `dev-guides/5-data-connector-api.md` +- `docs/docs-cn/1-data-source-connections.md` +- `docs/docs-cn/7-server-migration-guide.md` +- `docs/dev-guides/5-data-connector-api.md` diff --git a/docs-cn/7-server-migration-guide.md b/docs/docs-cn/7-server-migration-guide.md similarity index 98% rename from docs-cn/7-server-migration-guide.md rename to docs/docs-cn/7-server-migration-guide.md index 6162682c..9edc36e9 100644 --- a/docs-cn/7-server-migration-guide.md +++ b/docs/docs-cn/7-server-migration-guide.md @@ -240,5 +240,5 @@ Vault 加密密钥丢失后,已存储的凭证 **无法恢复**。处理方式 ## 8. 相关文档 - [凭证保险箱(Credential Vault)](6-credential-vault.md) — Vault 加密机制详解 -- [DEVELOPMENT.md — Server Migration Checklist](../DEVELOPMENT.md) — 英文版迁移清单 -- [.env.template](../.env.template) — 完整配置项说明 +- [DEVELOPMENT.md — Server Migration Checklist](../../DEVELOPMENT.md) — 英文版迁移清单 +- [.env.template](../../.env.template) — 完整配置项说明 diff --git a/docs-cn/config-examples/superset/oauth_config.py b/docs/docs-cn/config-examples/superset/oauth_config.py similarity index 100% rename from docs-cn/config-examples/superset/oauth_config.py rename to docs/docs-cn/config-examples/superset/oauth_config.py diff --git a/docs-cn/config-examples/superset/superset_config.py b/docs/docs-cn/config-examples/superset/superset_config.py similarity index 100% rename from docs-cn/config-examples/superset/superset_config.py rename to docs/docs-cn/config-examples/superset/superset_config.py diff --git a/embed/README.md b/embed/README.md deleted file mode 100644 index ce484d5b..00000000 --- a/embed/README.md +++ /dev/null @@ -1,85 +0,0 @@ -# Embed Data Formulator - -First you'll need to build the bundle: -``` -yarn build -``` - -This puts the complete 
js file in the `dist` folder. - -## Test bundle - -Next you can test to see the complete Data Formulator app by opening `/embed/index.html` in your browser. You can do this by double-clicking in your file explorer (this would use the `file://` protocol). - -To test cross-frame messaging, launch `postMessageTest.html` which hosts the app in an iframe, and has buttons to send commands such as `load data`. - -## Use in Fabric Notebook - -You willl need to enable access to your `dist` from the cloud. There are 2 ways to do this: -* Publish the `dist` (e.g. pip, npm, or other) -* Create a tunnel to your localhost - -### Tunnel to localhost -One way is to install [local-web-server](https://www.npmjs.com/package/local-web-server). This will serve a local folder as a website on http://localhost:8000. Next, you can set up a tunnel such as [ngrok](https://ngrok.com/download) which can provide a cloud-accesible url proxy to your local server. - -Copy the python function in a notebook cell: -```py -def dfviz(df, tableName, serverUrl): - # df is a PySpark DataFrame - - import json - from datetime import date, datetime - - # Custom function to convert datetime objects to string - def json_serial(obj): - if isinstance(obj, (datetime, date)): - return obj.isoformat() - raise TypeError ("Type %s not serializable" % type(obj)) - - # Convert DataFrame rows to dictionaries and collect them into a list - data = [row.asDict() for row in df.collect()] - - # Convert list of dictionaries to a single JSON array using the custom function - json_data = json.dumps(data, default=json_serial) - - displayHTML(f""" - - -""" -) -``` - -Get a dataframe and pass it to the `dfviz` function: -```py -df = spark.sql("SELECT * FROM Sample_lakehouse_475.publicholidays LIMIT 100") -display(df) -dfviz(df, 'Holidays', 'https://') -``` - diff --git a/embed/dynamic.html b/embed/dynamic.html deleted file mode 100644 index 02ebae7f..00000000 --- a/embed/dynamic.html +++ /dev/null @@ -1,110 +0,0 @@ - - - - - - - 
Data Formulator embedded dynamically - - - - - - - \ No newline at end of file diff --git a/embed/index.html b/embed/index.html deleted file mode 100644 index d81fe74d..00000000 --- a/embed/index.html +++ /dev/null @@ -1,12 +0,0 @@ - - - - - - Data Formulator embedded - - -
- - - \ No newline at end of file diff --git a/embed/postMessageTest.html b/embed/postMessageTest.html deleted file mode 100644 index c132d3ff..00000000 --- a/embed/postMessageTest.html +++ /dev/null @@ -1,116 +0,0 @@ - - - - - - - Data Formulator embedded iframe - - - - -
- -
- - - - \ No newline at end of file diff --git a/examples/plugins/README.md b/examples/plugins/README.md new file mode 100644 index 00000000..2aa9a58e --- /dev/null +++ b/examples/plugins/README.md @@ -0,0 +1,223 @@ +# Data Formulator — Data Loader Plugins + +Drop-in Python files that add new data-source connectors to Data +Formulator without modifying its source code. If the built-in +connectors don't cover your data source (an internal warehouse, a SaaS +API, a niche database), write a small plugin and DF will pick it up on +the next restart. + +This folder contains **example plugins**. Treat them as templates: copy +one, rename it, and adapt the body. + +--- + +## Quick start (3 steps) + +1. **Find your plugin directory.** It lives under your Data Formulator + home dir: + + ``` + $DATA_FORMULATOR_HOME/plugins/ + ``` + + If `DATA_FORMULATOR_HOME` is not set, DF defaults to + `~/.data_formulator/`, so the plugin dir is `~/.data_formulator/plugins/`. + + Power users can point somewhere else with `DF_PLUGIN_DIR` + (highest precedence) — useful for sharing one plugin folder across + multiple DF installs. + +2. **Copy an example into it.** For instance: + + ```bash + mkdir -p "${DATA_FORMULATOR_HOME:-$HOME/.data_formulator}/plugins" + cp examples/plugins/sqlite_data_loader.py \ + "${DATA_FORMULATOR_HOME:-$HOME/.data_formulator}/plugins/" + ``` + +3. **Restart Data Formulator.** The new connector appears in the UI + automatically. No registry edits, no rebuilds. + +To verify it loaded, check the startup log for a line like: + +``` +INFO ... Plugin loader 'sqlite' registered from sqlite_data_loader.py +INFO ... 
Plugin scan complete: 1 registered, 0 failed (dir=..., reason=WORKSPACE_BACKEND=local) +``` + +### Plugin directory resolution order + +| Precedence | Source | Default | +| ---------- | ----------------------------------------- | ----------------------------- | +| 1 | `DF_PLUGIN_DIR` env var (explicit override) | — | +| 2 | `$DATA_FORMULATOR_HOME/plugins` | — | +| 3 | Fallback | `~/.data_formulator/plugins/` | + +--- + +## File-name contract + +| Filename | Registry key | +| --------------------------------- | -------------- | +| `sqlite_data_loader.py` | `sqlite` | +| `acme_warehouse_data_loader.py` | `acme_warehouse` | +| `notion_data_loader.py` | `notion` | + +Rules: + +* The filename **must** end in `_data_loader.py`. +* The prefix becomes the registry key — keep it lowercase, no spaces. +* If the key matches a built-in (e.g. `mysql_data_loader.py`), the + plugin **overrides** the built-in. Useful for hot-patching. + +--- + +## What goes inside the file + +Each plugin defines exactly one class that subclasses +[`ExternalDataLoader`](../../py-src/data_formulator/data_loader/external_data_loader.py). +The minimum surface area: + +```python +from data_formulator.data_loader.external_data_loader import ( + ExternalDataLoader, MAX_IMPORT_ROWS, +) +import pyarrow as pa + +class MyLoader(ExternalDataLoader): + + # Optional: human-friendly UI label. Without this, the registry key + # is title-cased (``"my_warehouse"`` → ``"My Warehouse"``). Override + # to fix awkward casing (``"SQLite"``, ``"BigQuery"``). + DISPLAY_NAME = "My Warehouse" + + @staticmethod + def list_params() -> list[dict]: + """Declare connection-form fields. 
The UI auto-renders this.""" + return [ + {"name": "endpoint", "type": "string", "required": True, + "tier": "connection", "description": "Server URL"}, + {"name": "token", "type": "string", "required": True, + "tier": "auth", "sensitive": True, "description": "API token"}, + ] + + @staticmethod + def auth_instructions() -> str: + """Markdown help text shown next to the form.""" + return "Get your API token from https://example.com/settings/tokens" + + def __init__(self, params: dict): + self.params = params + # validate + open connection here + + def list_tables(self, table_filter: str | None = None) -> list[dict]: + """Return catalog: [{name, metadata: {columns, row_count}}, ...]""" + ... + + def fetch_data_as_arrow(self, source_table: str, + import_options: dict | None = None) -> pa.Table: + """Read rows. Honour import_options['size'] up to MAX_IMPORT_ROWS.""" + ... +``` + +Look at [`sqlite_data_loader.py`](sqlite_data_loader.py) for a runnable +implementation (~170 lines, stdlib only). + +### `list_params()` field reference + +| Key | Meaning | +| ------------- | ---------------------------------------------------------------- | +| `name` | Parameter key passed into `__init__(params)` | +| `type` | `"string"`, `"int"`, `"bool"`, `"password"` | +| `required` | If `True`, DF rejects connections that omit it | +| `default` | Pre-filled value in the form | +| `sensitive` | If `True`, value is redacted from logs / stored metadata | +| `tier` | `"auth"`, `"connection"`, or `"filter"` — groups fields visually | +| `description` | Help text shown under the field | + +--- + +## Security gate — please read + +Plugins execute arbitrary Python in the server process. To prevent +accidental code execution in shared deployments, the plugin scanner is +**enabled only in single-user local mode**: + +* `WORKSPACE_BACKEND` unset or `local` → scanner runs. +* `WORKSPACE_BACKEND` is anything else → scanner is skipped silently. 
+* To opt in for a hosted deployment, set `DF_ALLOW_PLUGINS=1` + **and** make sure the plugin directory is writable only by trusted + administrators. + +--- + +## Debugging a plugin that won't load + +The scanner reports every failure two ways: + +1. **In the log**, with a full traceback: + + ``` + WARNING Failed to load plugin foo_data_loader.py: ... ModuleNotFoundError ... + ``` + +2. **In `DISABLED_LOADERS`**, which the frontend surfaces as a greyed-out + connector with the reason. Common reasons: + + | Message | Fix | + | ------------------------------------------------------ | --- | + | `missing dependency 'X' (pip install X)` | `uv pip install X` in DF's venv | + | `no ExternalDataLoader subclass found` | Add a `class Foo(ExternalDataLoader)` to the file | + | `could not create import spec` | File isn't valid Python or has wrong extension | + | `: ` | Anything else raised during `import` — check the traceback | + +The plugin scanner cleans `sys.modules` on failure, so simply restarting +DF after editing the file picks up the fix — no manual cleanup needed. + +--- + +## Adding third-party dependencies + +If your plugin needs a package that isn't already in DF's environment: + +```bash +# from the data-formulator repo root +uv pip install +``` + +The plugin file is loaded into the same Python process as DF, so +anything importable from DF's venv is importable from your plugin. + +--- + +## For AI coding agents + +When asked to **add a new data source** to a user's Data Formulator +install, prefer writing a plugin file over modifying DF's source tree: + +1. Read [`sqlite_data_loader.py`](sqlite_data_loader.py) and the + `ExternalDataLoader` ABC in + `py-src/data_formulator/data_loader/external_data_loader.py` to + understand the required methods. +2. Create a new file `_data_loader.py` in this folder (or + directly in `~/.data_formulator/plugins/`). +3. Subclass `ExternalDataLoader`. 
Implement at minimum: + `list_params`, `auth_instructions`, `__init__`, `list_tables`, + `fetch_data_as_arrow`. +4. Return data as a `pyarrow.Table` from `fetch_data_as_arrow` — do + **not** convert to pandas in the hot path. +5. Quote identifiers when building SQL (see `_quote_ident` in the + SQLite example) to avoid injection vulnerabilities. +6. Respect `import_options['size']` and cap at `MAX_IMPORT_ROWS`. +7. If the source has credentials, mark those params `sensitive: True` + and `tier: "auth"` so DF redacts them from stored metadata. +8. Do not modify the user's data — open read-only connections where + the source supports it. + +After writing the file, verify it loads with: + +```bash +DF_PLUGIN_DIR= uv run python -c \ + "from data_formulator import data_loader as dl; \ + print(dl.PLUGIN_LOADERS, dl.DISABLED_LOADERS)" +``` diff --git a/examples/plugins/sqlite_data_loader.py b/examples/plugins/sqlite_data_loader.py new file mode 100644 index 00000000..cde32a82 --- /dev/null +++ b/examples/plugins/sqlite_data_loader.py @@ -0,0 +1,203 @@ +"""SQLite data loader — example plugin for Data Formulator. + +This file is a complete, working example of a Data Formulator plugin. +SQLite is part of the Python standard library, so this loader has **no +extra dependencies** and is a good template for building your own. + +How to install +-------------- +1. Make sure Data Formulator runs in single-user mode + (the default — ``WORKSPACE_BACKEND`` unset or ``local``). +2. Copy this file to ``~/.data_formulator/plugins/`` (or whatever + directory ``DF_PLUGIN_DIR`` points to). Filename must end in + ``_data_loader.py``. +3. Restart Data Formulator. A new "sqlite" connector should appear + alongside the built-ins. 
+ +How to test it quickly +---------------------- +:: + + sqlite3 /tmp/demo.db <<'SQL' + CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER); + INSERT INTO users VALUES (1, 'Alice', 30), (2, 'Bob', 25); + SQL + +Then in DF, add a SQLite connector pointing at ``/tmp/demo.db``. + +What this example demonstrates +------------------------------ +* Implementing every abstract method on :class:`ExternalDataLoader`. +* Declaring connection params via :meth:`list_params` so the UI + auto-renders a config form. +* Identifier quoting to avoid SQL-injection in table/column names. +* Returning data directly as a PyArrow Table (no pandas in the hot path). +* Read-only access (``mode=ro`` URI) — the loader never writes to the DB. +""" + +from __future__ import annotations + +import logging +import os +import sqlite3 +import threading +from typing import Any +from urllib.parse import quote as _url_quote + +import pyarrow as pa + +from data_formulator.data_loader.external_data_loader import ( + ExternalDataLoader, + MAX_IMPORT_ROWS, +) + +logger = logging.getLogger(__name__) + + +def _quote_ident(name: str) -> str: + """Quote a SQLite identifier safely (table or column name).""" + return '"' + str(name).replace('"', '""') + '"' + + +class SQLiteDataLoader(ExternalDataLoader): + """Read tables from a local SQLite database file.""" + + # Override the default title-casing of the registry key ("Sqlite") with + # a properly-cased product name. Optional; remove if you don't care. 
+ DISPLAY_NAME = "SQLite" + + # ------------------------------------------------------------------ # + # Static metadata: what the UI shows + how to configure # + # ------------------------------------------------------------------ # + + @staticmethod + def list_params() -> list[dict[str, Any]]: + return [ + { + "name": "database_path", + "type": "string", + "required": True, + "default": "", + "tier": "connection", + "description": "Absolute path to a .db / .sqlite file", + }, + ] + + @staticmethod + def auth_instructions() -> str: + return ( + "**Example:** `/tmp/demo.db`\n\n" + "Provide the absolute path to a local SQLite database file. " + "The file must exist and be readable. The loader opens it " + "read-only — your data is never modified.\n\n" + "**Create a test DB:**\n" + "```\nsqlite3 /tmp/demo.db 'CREATE TABLE t(a,b); " + "INSERT INTO t VALUES (1,2),(3,4);'\n```" + ) + + # ------------------------------------------------------------------ # + # Connection # + # ------------------------------------------------------------------ # + + def __init__(self, params: dict[str, Any]): + self.params = params or {} + self.database_path = (self.params.get("database_path") or "").strip() + + if not self.database_path: + raise ValueError("SQLite database_path is required") + if not os.path.isfile(self.database_path): + raise ValueError(f"SQLite database not found: {self.database_path}") + + # Open read-only via URI so we cannot accidentally mutate the DB. 
+ uri = f"file:{_url_quote(self.database_path)}?mode=ro" + try: + self._conn = sqlite3.connect( + uri, uri=True, check_same_thread=False, isolation_level=None, + ) + except sqlite3.Error as e: + raise ValueError( + f"Failed to open SQLite database '{self.database_path}': {e}" + ) from e + + self._lock = threading.Lock() + logger.info("SQLite plugin connected: %s", self.database_path) + + # ------------------------------------------------------------------ # + # Helpers # + # ------------------------------------------------------------------ # + + def _read_sql(self, query: str) -> pa.Table: + """Run a query and return the result as a PyArrow Table.""" + with self._lock: + cur = self._conn.execute(query) + if cur.description is None: + return pa.table({}) + columns = [d[0] for d in cur.description] + rows = cur.fetchall() + # Transpose rows to per-column lists; pyarrow infers types. + col_data: dict[str, list[Any]] = {c: [] for c in columns} + for row in rows: + for col, value in zip(columns, row): + col_data[col].append(value) + return pa.table(col_data) + + # ------------------------------------------------------------------ # + # Required loader API # + # ------------------------------------------------------------------ # + + def list_tables(self, table_filter: str | None = None) -> list[dict[str, Any]]: + """List every table + view in the database.""" + names_tbl = self._read_sql( + "SELECT name, type FROM sqlite_master " + "WHERE type IN ('table', 'view') " + "AND name NOT LIKE 'sqlite_%' " + "ORDER BY name" + ) + if names_tbl.num_rows == 0: + return [] + + results: list[dict[str, Any]] = [] + for name in names_tbl.column("name").to_pylist(): + if table_filter and table_filter.lower() not in name.lower(): + continue + # PRAGMA table_info gives column name + declared type. 
+ info = self._read_sql(f"PRAGMA table_info({_quote_ident(name)})") + columns = [ + {"name": n, "type": t or "ANY"} + for n, t in zip( + info.column("name").to_pylist() if info.num_rows else [], + info.column("type").to_pylist() if info.num_rows else [], + ) + ] + count_tbl = self._read_sql( + f"SELECT COUNT(*) AS n FROM {_quote_ident(name)}" + ) + row_count = int(count_tbl.column("n")[0].as_py()) if count_tbl.num_rows else 0 + results.append({ + "name": name, + "metadata": {"columns": columns, "row_count": row_count}, + }) + return results + + def fetch_data_as_arrow( + self, + source_table: str, + import_options: dict[str, Any] | None = None, + ) -> pa.Table: + """Fetch rows from ``source_table`` as a PyArrow Table.""" + if not source_table: + raise ValueError("source_table must be provided") + + opts = import_options or {} + size = min(int(opts.get("size", MAX_IMPORT_ROWS)), MAX_IMPORT_ROWS) + sort_columns = opts.get("sort_columns") or [] + sort_order = "DESC" if str(opts.get("sort_order", "asc")).lower() == "desc" else "ASC" + + query = f"SELECT * FROM {_quote_ident(source_table)}" + if sort_columns: + order_by = ", ".join(f"{_quote_ident(c)} {sort_order}" for c in sort_columns) + query += f" ORDER BY {order_by}" + query += f" LIMIT {int(size)}" + + logger.info("SQLite plugin query: %s", query) + return self._read_sql(query) diff --git a/package.json b/package.json index 9e40746d..42c8e2bd 100644 --- a/package.json +++ b/package.json @@ -19,6 +19,10 @@ "@mui/x-tree-view": "^9.0.1", "@reduxjs/toolkit": "^1.8.6", "@tiptap/extension-image": "^3.22.2", + "@tiptap/extension-table": "^3.23.6", + "@tiptap/extension-table-cell": "^3.23.6", + "@tiptap/extension-table-header": "^3.23.6", + "@tiptap/extension-table-row": "^3.23.6", "@tiptap/pm": "^3.22.2", "@tiptap/react": "^3.22.2", "@tiptap/starter-kit": "^3.22.2", @@ -48,7 +52,7 @@ "react": "^18.2.0", "react-animate-height": "^3.0.4", "react-animate-on-change": "^2.2.0", - "react-arborist": "^3.5.0", + 
"react-arborist": "3.7.0", "react-dnd": "^16.0.1", "react-dnd-html5-backend": "^16.0.1", "react-dom": "^18.2.0", diff --git a/public/climate-gas-indicator.xlsx b/public/climate-gas-indicator.xlsx new file mode 100644 index 00000000..b936c18b Binary files /dev/null and b/public/climate-gas-indicator.xlsx differ diff --git a/public/df_global_energy.json b/public/df_global_energy.json index 3c678f94..6b82ad5e 100644 --- a/public/df_global_energy.json +++ b/public/df_global_energy.json @@ -1 +1 @@ -{"tables": [{"kind": "table", "id": "global-energy-20-small.csv", "displayId": "energy-co2", "names": ["Year", "Entity", "Value_co2_emissions_kt_by_country", "Electricity from fossil fuels (TWh)", "Electricity from nuclear (TWh)", "Electricity from renewables (TWh)"], "metadata": {"Year": {"type": "number", "semanticType": "Year"}, "Entity": {"type": "string", "semanticType": "Location"}, "Value_co2_emissions_kt_by_country": {"type": "number", "semanticType": "Number"}, "Electricity from fossil fuels (TWh)": {"type": "number", "semanticType": "Number"}, "Electricity from nuclear (TWh)": {"type": "number", "semanticType": "Number"}, "Electricity from renewables (TWh)": {"type": "number", "semanticType": "Number"}}, "rows": [{"Year": 2000, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 339450, "Electricity from fossil fuels (TWh)": 181.05, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 17.11}, {"Year": 2001, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 345640, "Electricity from fossil fuels (TWh)": 194.33, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 17.4}, {"Year": 2002, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 353369.9951, "Electricity from fossil fuels (TWh)": 197.29, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 17.35}, {"Year": 2003, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 352579.9866, "Electricity from fossil fuels (TWh)": 
195.13, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 18.5}, {"Year": 2004, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 365809.9976, "Electricity from fossil fuels (TWh)": 203.66, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 19.41}, {"Year": 2005, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 370089.9963, "Electricity from fossil fuels (TWh)": 195.95, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 19.75}, {"Year": 2006, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 375489.9902, "Electricity from fossil fuels (TWh)": 198.72, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 21.19}, {"Year": 2007, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 385750, "Electricity from fossil fuels (TWh)": 208.59, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 20.93}, {"Year": 2008, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 388940.0024, "Electricity from fossil fuels (TWh)": 211.06, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 18.49}, {"Year": 2009, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 395290.0085, "Electricity from fossil fuels (TWh)": 216.42, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 18.32}, {"Year": 2010, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 387540.0085, "Electricity from fossil fuels (TWh)": 212.5, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 21.13}, {"Year": 2011, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 386380.0049, "Electricity from fossil fuels (TWh)": 213.56, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 27.33}, {"Year": 2012, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 386970.0012, "Electricity from fossil fuels (TWh)": 206.75, "Electricity from nuclear (TWh)": 0, 
"Electricity from renewables (TWh)": 26.63}, {"Year": 2013, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 380279.9988, "Electricity from fossil fuels (TWh)": 195.78, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 34.2}, {"Year": 2014, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 371630.0049, "Electricity from fossil fuels (TWh)": 205.46, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 36.15}, {"Year": 2015, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 377799.9878, "Electricity from fossil fuels (TWh)": 197.72, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 33.12}, {"Year": 2016, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 384989.9902, "Electricity from fossil fuels (TWh)": 207.66, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 38.41}, {"Year": 2017, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 389160.0037, "Electricity from fossil fuels (TWh)": 209.14, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 40.77}, {"Year": 2018, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 387070.0073, "Electricity from fossil fuels (TWh)": 207.45, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 42.93}, {"Year": 2019, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 386529.9988, "Electricity from fossil fuels (TWh)": 196.45, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 53.41}, {"Year": 2020, "Entity": "Australia", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 186.92, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 63.99}, {"Year": 2000, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 313670, "Electricity from fossil fuels (TWh)": 28.87, "Electricity from nuclear (TWh)": 4.94, "Electricity from renewables (TWh)": 308.77}, {"Year": 2001, 
"Entity": "Brazil", "Value_co2_emissions_kt_by_country": 319380, "Electricity from fossil fuels (TWh)": 35.19, "Electricity from nuclear (TWh)": 14.27, "Electricity from renewables (TWh)": 273.71}, {"Year": 2002, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 317760.0098, "Electricity from fossil fuels (TWh)": 33.5, "Electricity from nuclear (TWh)": 13.84, "Electricity from renewables (TWh)": 292.95}, {"Year": 2003, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 310809.9976, "Electricity from fossil fuels (TWh)": 31.62, "Electricity from nuclear (TWh)": 13.4, "Electricity from renewables (TWh)": 313.88}, {"Year": 2004, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 328519.989, "Electricity from fossil fuels (TWh)": 40.14, "Electricity from nuclear (TWh)": 11.6, "Electricity from renewables (TWh)": 329.43}, {"Year": 2005, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 331690.0024, "Electricity from fossil fuels (TWh)": 39.56, "Electricity from nuclear (TWh)": 9.2, "Electricity from renewables (TWh)": 346.96}, {"Year": 2006, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 335619.9951, "Electricity from fossil fuels (TWh)": 39.4, "Electricity from nuclear (TWh)": 12.98, "Electricity from renewables (TWh)": 359.55}, {"Year": 2007, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 352559.9976, "Electricity from fossil fuels (TWh)": 37.64, "Electricity from nuclear (TWh)": 11.65, "Electricity from renewables (TWh)": 387.88}, {"Year": 2008, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 373630.0049, "Electricity from fossil fuels (TWh)": 55.87, "Electricity from nuclear (TWh)": 13.21, "Electricity from renewables (TWh)": 385.61}, {"Year": 2009, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 350000, "Electricity from fossil fuels (TWh)": 36.32, "Electricity from nuclear (TWh)": 12.22, "Electricity from renewables (TWh)": 410.13}, {"Year": 2010, "Entity": "Brazil", 
"Value_co2_emissions_kt_by_country": 397929.9927, "Electricity from fossil fuels (TWh)": 61.02, "Electricity from nuclear (TWh)": 13.77, "Electricity from renewables (TWh)": 435.99}, {"Year": 2011, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 418309.9976, "Electricity from fossil fuels (TWh)": 50.27, "Electricity from nuclear (TWh)": 14.8, "Electricity from renewables (TWh)": 462.32}, {"Year": 2012, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 454230.011, "Electricity from fossil fuels (TWh)": 77.21, "Electricity from nuclear (TWh)": 15.17, "Electricity from renewables (TWh)": 454.78}, {"Year": 2013, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 486839.9963, "Electricity from fossil fuels (TWh)": 112, "Electricity from nuclear (TWh)": 14.65, "Electricity from renewables (TWh)": 436.84}, {"Year": 2014, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 511619.9951, "Electricity from fossil fuels (TWh)": 136.58, "Electricity from nuclear (TWh)": 14.46, "Electricity from renewables (TWh)": 430.82}, {"Year": 2015, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 485339.9963, "Electricity from fossil fuels (TWh)": 128.85, "Electricity from nuclear (TWh)": 13.91, "Electricity from renewables (TWh)": 428.81}, {"Year": 2016, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 447079.9866, "Electricity from fossil fuels (TWh)": 93.06, "Electricity from nuclear (TWh)": 14.97, "Electricity from renewables (TWh)": 463.37}, {"Year": 2017, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 456489.9902, "Electricity from fossil fuels (TWh)": 101.9, "Electricity from nuclear (TWh)": 14.86, "Electricity from renewables (TWh)": 464.4}, {"Year": 2018, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 433989.9902, "Electricity from fossil fuels (TWh)": 86.69, "Electricity from nuclear (TWh)": 14.79, "Electricity from renewables (TWh)": 492.66}, {"Year": 2019, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 
434299.9878, "Electricity from fossil fuels (TWh)": 90.91, "Electricity from nuclear (TWh)": 15.16, "Electricity from renewables (TWh)": 512.59}, {"Year": 2020, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 81.15, "Electricity from nuclear (TWh)": 13.21, "Electricity from renewables (TWh)": 520.01}, {"Year": 2000, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 514220, "Electricity from fossil fuels (TWh)": 155.56, "Electricity from nuclear (TWh)": 69.16, "Electricity from renewables (TWh)": 363.7}, {"Year": 2001, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 506620, "Electricity from fossil fuels (TWh)": 159.93, "Electricity from nuclear (TWh)": 72.86, "Electricity from renewables (TWh)": 339.58}, {"Year": 2002, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 524349.9756, "Electricity from fossil fuels (TWh)": 155.12, "Electricity from nuclear (TWh)": 71.75, "Electricity from renewables (TWh)": 357.06}, {"Year": 2003, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 544539.978, "Electricity from fossil fuels (TWh)": 157.35, "Electricity from nuclear (TWh)": 71.15, "Electricity from renewables (TWh)": 343.88}, {"Year": 2004, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 536419.9829, "Electricity from fossil fuels (TWh)": 148.86, "Electricity from nuclear (TWh)": 85.87, "Electricity from renewables (TWh)": 347.68}, {"Year": 2005, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 549030.0293, "Electricity from fossil fuels (TWh)": 150.78, "Electricity from nuclear (TWh)": 86.83, "Electricity from renewables (TWh)": 368.86}, {"Year": 2006, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 540530.0293, "Electricity from fossil fuels (TWh)": 139.71, "Electricity from nuclear (TWh)": 92.44, "Electricity from renewables (TWh)": 360.48}, {"Year": 2007, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 571630.0049, "Electricity from fossil fuels 
(TWh)": 149.36, "Electricity from nuclear (TWh)": 88.19, "Electricity from renewables (TWh)": 375.42}, {"Year": 2008, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 550469.9707, "Electricity from fossil fuels (TWh)": 141.33, "Electricity from nuclear (TWh)": 88.3, "Electricity from renewables (TWh)": 385.21}, {"Year": 2009, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 521320.0073, "Electricity from fossil fuels (TWh)": 129.76, "Electricity from nuclear (TWh)": 85.13, "Electricity from renewables (TWh)": 380.24}, {"Year": 2010, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 537010.0098, "Electricity from fossil fuels (TWh)": 130.08, "Electricity from nuclear (TWh)": 85.53, "Electricity from renewables (TWh)": 366.21}, {"Year": 2011, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 549289.978, "Electricity from fossil fuels (TWh)": 131.3, "Electricity from nuclear (TWh)": 88.29, "Electricity from renewables (TWh)": 391.95}, {"Year": 2012, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 546210.022, "Electricity from fossil fuels (TWh)": 124.2, "Electricity from nuclear (TWh)": 89.49, "Electricity from renewables (TWh)": 398.58}, {"Year": 2013, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 555659.9731, "Electricity from fossil fuels (TWh)": 122.87, "Electricity from nuclear (TWh)": 97.58, "Electricity from renewables (TWh)": 417.28}, {"Year": 2014, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 561679.9927, "Electricity from fossil fuels (TWh)": 122.75, "Electricity from nuclear (TWh)": 101.21, "Electricity from renewables (TWh)": 412.13}, {"Year": 2015, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 558700.0122, "Electricity from fossil fuels (TWh)": 125.7, "Electricity from nuclear (TWh)": 96.05, "Electricity from renewables (TWh)": 417.2}, {"Year": 2016, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 556830.0171, "Electricity from fossil fuels (TWh)": 122.35, "Electricity from 
nuclear (TWh)": 95.69, "Electricity from renewables (TWh)": 426.84}, {"Year": 2017, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 568080.0171, "Electricity from fossil fuels (TWh)": 113.7, "Electricity from nuclear (TWh)": 95.57, "Electricity from renewables (TWh)": 435.43}, {"Year": 2018, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 580090.0269, "Electricity from fossil fuels (TWh)": 112.47, "Electricity from nuclear (TWh)": 95.03, "Electricity from renewables (TWh)": 428.39}, {"Year": 2019, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 580210.022, "Electricity from fossil fuels (TWh)": 110.65, "Electricity from nuclear (TWh)": 95.47, "Electricity from renewables (TWh)": 421.8}, {"Year": 2020, "Entity": "Canada", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 102.19, "Electricity from nuclear (TWh)": 92.65, "Electricity from renewables (TWh)": 429.24}, {"Year": 2000, "Entity": "China", "Value_co2_emissions_kt_by_country": 3346530, "Electricity from fossil fuels (TWh)": 1113.3, "Electricity from nuclear (TWh)": 16.74, "Electricity from renewables (TWh)": 225.56}, {"Year": 2001, "Entity": "China", "Value_co2_emissions_kt_by_country": 3529080, "Electricity from fossil fuels (TWh)": 1182.59, "Electricity from nuclear (TWh)": 17.47, "Electricity from renewables (TWh)": 280.73}, {"Year": 2002, "Entity": "China", "Value_co2_emissions_kt_by_country": 3810060.059, "Electricity from fossil fuels (TWh)": 1337.46, "Electricity from nuclear (TWh)": 25.13, "Electricity from renewables (TWh)": 291.41}, {"Year": 2003, "Entity": "China", "Value_co2_emissions_kt_by_country": 4415910.156, "Electricity from fossil fuels (TWh)": 1579.96, "Electricity from nuclear (TWh)": 43.34, "Electricity from renewables (TWh)": 287.28}, {"Year": 2004, "Entity": "China", "Value_co2_emissions_kt_by_country": 5124819.824, "Electricity from fossil fuels (TWh)": 1795.41, "Electricity from nuclear (TWh)": 50.47, "Electricity from 
renewables (TWh)": 357.43}, {"Year": 2005, "Entity": "China", "Value_co2_emissions_kt_by_country": 5824629.883, "Electricity from fossil fuels (TWh)": 2042.8, "Electricity from nuclear (TWh)": 53.09, "Electricity from renewables (TWh)": 404.37}, {"Year": 2006, "Entity": "China", "Value_co2_emissions_kt_by_country": 6437470.215, "Electricity from fossil fuels (TWh)": 2364.16, "Electricity from nuclear (TWh)": 54.84, "Electricity from renewables (TWh)": 446.72}, {"Year": 2007, "Entity": "China", "Value_co2_emissions_kt_by_country": 6993180.176, "Electricity from fossil fuels (TWh)": 2718.7, "Electricity from nuclear (TWh)": 62.13, "Electricity from renewables (TWh)": 500.71}, {"Year": 2008, "Entity": "China", "Value_co2_emissions_kt_by_country": 7199600.098, "Electricity from fossil fuels (TWh)": 2762.29, "Electricity from nuclear (TWh)": 68.39, "Electricity from renewables (TWh)": 665.08}, {"Year": 2009, "Entity": "China", "Value_co2_emissions_kt_by_country": 7719069.824, "Electricity from fossil fuels (TWh)": 2980.2, "Electricity from nuclear (TWh)": 70.05, "Electricity from renewables (TWh)": 664.39}, {"Year": 2010, "Entity": "China", "Value_co2_emissions_kt_by_country": 8474919.922, "Electricity from fossil fuels (TWh)": 3326.19, "Electricity from nuclear (TWh)": 74.74, "Electricity from renewables (TWh)": 786.38}, {"Year": 2011, "Entity": "China", "Value_co2_emissions_kt_by_country": 9282549.805, "Electricity from fossil fuels (TWh)": 3811.77, "Electricity from nuclear (TWh)": 87.2, "Electricity from renewables (TWh)": 792.38}, {"Year": 2012, "Entity": "China", "Value_co2_emissions_kt_by_country": 9541870.117, "Electricity from fossil fuels (TWh)": 3869.38, "Electricity from nuclear (TWh)": 98.32, "Electricity from renewables (TWh)": 999.56}, {"Year": 2013, "Entity": "China", "Value_co2_emissions_kt_by_country": 9984570.313, "Electricity from fossil fuels (TWh)": 4203.77, "Electricity from nuclear (TWh)": 111.5, "Electricity from renewables (TWh)": 1093.37}, 
{"Year": 2014, "Entity": "China", "Value_co2_emissions_kt_by_country": 10006669.92, "Electricity from fossil fuels (TWh)": 4345.86, "Electricity from nuclear (TWh)": 133.22, "Electricity from renewables (TWh)": 1289.23}, {"Year": 2015, "Entity": "China", "Value_co2_emissions_kt_by_country": 9861099.609, "Electricity from fossil fuels (TWh)": 4222.76, "Electricity from nuclear (TWh)": 171.38, "Electricity from renewables (TWh)": 1393.66}, {"Year": 2016, "Entity": "China", "Value_co2_emissions_kt_by_country": 9874660.156, "Electricity from fossil fuels (TWh)": 4355, "Electricity from nuclear (TWh)": 213.18, "Electricity from renewables (TWh)": 1522.79}, {"Year": 2017, "Entity": "China", "Value_co2_emissions_kt_by_country": 10096009.77, "Electricity from fossil fuels (TWh)": 4643.1, "Electricity from nuclear (TWh)": 248.1, "Electricity from renewables (TWh)": 1667.06}, {"Year": 2018, "Entity": "China", "Value_co2_emissions_kt_by_country": 10502929.69, "Electricity from fossil fuels (TWh)": 4990.28, "Electricity from nuclear (TWh)": 295, "Electricity from renewables (TWh)": 1835.32}, {"Year": 2019, "Entity": "China", "Value_co2_emissions_kt_by_country": 10707219.73, "Electricity from fossil fuels (TWh)": 5098.22, "Electricity from nuclear (TWh)": 348.7, "Electricity from renewables (TWh)": 2014.57}, {"Year": 2020, "Entity": "China", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 5184.13, "Electricity from nuclear (TWh)": 366.2, "Electricity from renewables (TWh)": 2184.94}, {"Year": 2000, "Entity": "France", "Value_co2_emissions_kt_by_country": 373120, "Electricity from fossil fuels (TWh)": 50.61, "Electricity from nuclear (TWh)": 415.16, "Electricity from renewables (TWh)": 67.83}, {"Year": 2001, "Entity": "France", "Value_co2_emissions_kt_by_country": 376730, "Electricity from fossil fuels (TWh)": 46.48, "Electricity from nuclear (TWh)": 421.08, "Electricity from renewables (TWh)": 76.09}, {"Year": 2002, "Entity": "France", 
"Value_co2_emissions_kt_by_country": 371019.989, "Electricity from fossil fuels (TWh)": 52.67, "Electricity from nuclear (TWh)": 436.76, "Electricity from renewables (TWh)": 62.69}, {"Year": 2003, "Entity": "France", "Value_co2_emissions_kt_by_country": 376709.9915, "Electricity from fossil fuels (TWh)": 57.38, "Electricity from nuclear (TWh)": 441.07, "Electricity from renewables (TWh)": 61.47}, {"Year": 2004, "Entity": "France", "Value_co2_emissions_kt_by_country": 377790.0085, "Electricity from fossil fuels (TWh)": 56.53, "Electricity from nuclear (TWh)": 448.24, "Electricity from renewables (TWh)": 62.42}, {"Year": 2005, "Entity": "France", "Value_co2_emissions_kt_by_country": 380660.0037, "Electricity from fossil fuels (TWh)": 63.35, "Electricity from nuclear (TWh)": 451.53, "Electricity from renewables (TWh)": 54.98}, {"Year": 2006, "Entity": "France", "Value_co2_emissions_kt_by_country": 371549.9878, "Electricity from fossil fuels (TWh)": 56.9, "Electricity from nuclear (TWh)": 450.19, "Electricity from renewables (TWh)": 60.91}, {"Year": 2007, "Entity": "France", "Value_co2_emissions_kt_by_country": 362829.9866, "Electricity from fossil fuels (TWh)": 58.18, "Electricity from nuclear (TWh)": 439.73, "Electricity from renewables (TWh)": 64.3}, {"Year": 2008, "Entity": "France", "Value_co2_emissions_kt_by_country": 357989.9902, "Electricity from fossil fuels (TWh)": 55.57, "Electricity from nuclear (TWh)": 439.45, "Electricity from renewables (TWh)": 72.33}, {"Year": 2009, "Entity": "France", "Value_co2_emissions_kt_by_country": 343730.011, "Electricity from fossil fuels (TWh)": 51.32, "Electricity from nuclear (TWh)": 409.74, "Electricity from renewables (TWh)": 68.15}, {"Year": 2010, "Entity": "France", "Value_co2_emissions_kt_by_country": 347779.9988, "Electricity from fossil fuels (TWh)": 57.63, "Electricity from nuclear (TWh)": 428.52, "Electricity from renewables (TWh)": 76.68}, {"Year": 2011, "Entity": "France", "Value_co2_emissions_kt_by_country": 
335140.0146, "Electricity from fossil fuels (TWh)": 58.99, "Electricity from nuclear (TWh)": 442.39, "Electricity from renewables (TWh)": 66.02}, {"Year": 2012, "Entity": "France", "Value_co2_emissions_kt_by_country": 338420.0134, "Electricity from fossil fuels (TWh)": 56.42, "Electricity from nuclear (TWh)": 425.41, "Electricity from renewables (TWh)": 85.25}, {"Year": 2013, "Entity": "France", "Value_co2_emissions_kt_by_country": 338559.9976, "Electricity from fossil fuels (TWh)": 53.35, "Electricity from nuclear (TWh)": 423.68, "Electricity from renewables (TWh)": 99.42}, {"Year": 2014, "Entity": "France", "Value_co2_emissions_kt_by_country": 306100.0061, "Electricity from fossil fuels (TWh)": 35.68, "Electricity from nuclear (TWh)": 436.48, "Electricity from renewables (TWh)": 94.03}, {"Year": 2015, "Entity": "France", "Value_co2_emissions_kt_by_country": 311299.9878, "Electricity from fossil fuels (TWh)": 44.65, "Electricity from nuclear (TWh)": 437.43, "Electricity from renewables (TWh)": 91.84}, {"Year": 2016, "Entity": "France", "Value_co2_emissions_kt_by_country": 313920.0134, "Electricity from fossil fuels (TWh)": 56.45, "Electricity from nuclear (TWh)": 403.2, "Electricity from renewables (TWh)": 99}, {"Year": 2017, "Entity": "France", "Value_co2_emissions_kt_by_country": 317829.9866, "Electricity from fossil fuels (TWh)": 65.09, "Electricity from nuclear (TWh)": 398.36, "Electricity from renewables (TWh)": 92.63}, {"Year": 2018, "Entity": "France", "Value_co2_emissions_kt_by_country": 307049.9878, "Electricity from fossil fuels (TWh)": 49.27, "Electricity from nuclear (TWh)": 412.94, "Electricity from renewables (TWh)": 113.62}, {"Year": 2019, "Entity": "France", "Value_co2_emissions_kt_by_country": 300519.989, "Electricity from fossil fuels (TWh)": 53.5, "Electricity from nuclear (TWh)": 399.01, "Electricity from renewables (TWh)": 113.21}, {"Year": 2020, "Entity": "France", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels 
(TWh)": 48.14, "Electricity from nuclear (TWh)": 353.83, "Electricity from renewables (TWh)": 125.28}, {"Year": 2000, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 830280, "Electricity from fossil fuels (TWh)": 367.22, "Electricity from nuclear (TWh)": 169.61, "Electricity from renewables (TWh)": 35.47}, {"Year": 2001, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 847680, "Electricity from fossil fuels (TWh)": 372.69, "Electricity from nuclear (TWh)": 171.3, "Electricity from renewables (TWh)": 37.9}, {"Year": 2002, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 833380.0049, "Electricity from fossil fuels (TWh)": 372.64, "Electricity from nuclear (TWh)": 164.84, "Electricity from renewables (TWh)": 44.48}, {"Year": 2003, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 836789.978, "Electricity from fossil fuels (TWh)": 390.81, "Electricity from nuclear (TWh)": 165.06, "Electricity from renewables (TWh)": 46.67}, {"Year": 2004, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 821070.0073, "Electricity from fossil fuels (TWh)": 385.24, "Electricity from nuclear (TWh)": 167.07, "Electricity from renewables (TWh)": 57.97}, {"Year": 2005, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 802380.0049, "Electricity from fossil fuels (TWh)": 386.96, "Electricity from nuclear (TWh)": 163.05, "Electricity from renewables (TWh)": 63.4}, {"Year": 2006, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 814409.9731, "Electricity from fossil fuels (TWh)": 390.03, "Electricity from nuclear (TWh)": 167.27, "Electricity from renewables (TWh)": 72.51}, {"Year": 2007, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 783799.9878, "Electricity from fossil fuels (TWh)": 402.4, "Electricity from nuclear (TWh)": 140.53, "Electricity from renewables (TWh)": 89.38}, {"Year": 2008, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 789690.0024, "Electricity from fossil fuels (TWh)": 390.43, "Electricity from 
nuclear (TWh)": 148.49, "Electricity from renewables (TWh)": 94.28}, {"Year": 2009, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 734809.9976, "Electricity from fossil fuels (TWh)": 358.07, "Electricity from nuclear (TWh)": 134.93, "Electricity from renewables (TWh)": 95.94}, {"Year": 2010, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 773070.0073, "Electricity from fossil fuels (TWh)": 378.9, "Electricity from nuclear (TWh)": 140.56, "Electricity from renewables (TWh)": 105.18}, {"Year": 2011, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 746479.9805, "Electricity from fossil fuels (TWh)": 373.16, "Electricity from nuclear (TWh)": 107.97, "Electricity from renewables (TWh)": 124.04}, {"Year": 2012, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 760130.0049, "Electricity from fossil fuels (TWh)": 377.89, "Electricity from nuclear (TWh)": 99.46, "Electricity from renewables (TWh)": 143.04}, {"Year": 2013, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 776150.0244, "Electricity from fossil fuels (TWh)": 381.52, "Electricity from nuclear (TWh)": 97.29, "Electricity from renewables (TWh)": 152.34}, {"Year": 2014, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 736010.0098, "Electricity from fossil fuels (TWh)": 360.28, "Electricity from nuclear (TWh)": 97.13, "Electricity from renewables (TWh)": 162.54}, {"Year": 2015, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 742309.9976, "Electricity from fossil fuels (TWh)": 359.99, "Electricity from nuclear (TWh)": 91.79, "Electricity from renewables (TWh)": 188.79}, {"Year": 2016, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 747150.0244, "Electricity from fossil fuels (TWh)": 368.67, "Electricity from nuclear (TWh)": 84.63, "Electricity from renewables (TWh)": 189.67}, {"Year": 2017, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 732200.0122, "Electricity from fossil fuels (TWh)": 353.37, "Electricity from nuclear (TWh)": 
76.32, "Electricity from renewables (TWh)": 216.32}, {"Year": 2018, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 707700.0122, "Electricity from fossil fuels (TWh)": 334.65, "Electricity from nuclear (TWh)": 76, "Electricity from renewables (TWh)": 222.07}, {"Year": 2019, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 657400.0244, "Electricity from fossil fuels (TWh)": 284.09, "Electricity from nuclear (TWh)": 75.07, "Electricity from renewables (TWh)": 240.33}, {"Year": 2020, "Entity": "Germany", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 251.4, "Electricity from nuclear (TWh)": 64.38, "Electricity from renewables (TWh)": 251.48}, {"Year": 2000, "Entity": "India", "Value_co2_emissions_kt_by_country": 937860, "Electricity from fossil fuels (TWh)": 475.35, "Electricity from nuclear (TWh)": 15.77, "Electricity from renewables (TWh)": 80.27}, {"Year": 2001, "Entity": "India", "Value_co2_emissions_kt_by_country": 953540, "Electricity from fossil fuels (TWh)": 491.01, "Electricity from nuclear (TWh)": 18.89, "Electricity from renewables (TWh)": 76.19}, {"Year": 2002, "Entity": "India", "Value_co2_emissions_kt_by_country": 985450.0122, "Electricity from fossil fuels (TWh)": 517.51, "Electricity from nuclear (TWh)": 19.35, "Electricity from renewables (TWh)": 72.78}, {"Year": 2003, "Entity": "India", "Value_co2_emissions_kt_by_country": 1011770.02, "Electricity from fossil fuels (TWh)": 545.36, "Electricity from nuclear (TWh)": 18.14, "Electricity from renewables (TWh)": 74.63}, {"Year": 2004, "Entity": "India", "Value_co2_emissions_kt_by_country": 1085670.044, "Electricity from fossil fuels (TWh)": 567.86, "Electricity from nuclear (TWh)": 21.26, "Electricity from renewables (TWh)": 109.2}, {"Year": 2005, "Entity": "India", "Value_co2_emissions_kt_by_country": 1136469.971, "Electricity from fossil fuels (TWh)": 579.32, "Electricity from nuclear (TWh)": 17.73, "Electricity from renewables (TWh)": 107.47}, 
{"Year": 2006, "Entity": "India", "Value_co2_emissions_kt_by_country": 1215209.961, "Electricity from fossil fuels (TWh)": 599.24, "Electricity from nuclear (TWh)": 17.63, "Electricity from renewables (TWh)": 127.56}, {"Year": 2007, "Entity": "India", "Value_co2_emissions_kt_by_country": 1336739.99, "Electricity from fossil fuels (TWh)": 636.68, "Electricity from nuclear (TWh)": 17.83, "Electricity from renewables (TWh)": 141.75}, {"Year": 2008, "Entity": "India", "Value_co2_emissions_kt_by_country": 1424380.005, "Electricity from fossil fuels (TWh)": 674.27, "Electricity from nuclear (TWh)": 15.23, "Electricity from renewables (TWh)": 138.91}, {"Year": 2009, "Entity": "India", "Value_co2_emissions_kt_by_country": 1564880.005, "Electricity from fossil fuels (TWh)": 728.56, "Electricity from nuclear (TWh)": 16.82, "Electricity from renewables (TWh)": 134.33}, {"Year": 2010, "Entity": "India", "Value_co2_emissions_kt_by_country": 1659979.98, "Electricity from fossil fuels (TWh)": 771.78, "Electricity from nuclear (TWh)": 23.08, "Electricity from renewables (TWh)": 142.61}, {"Year": 2011, "Entity": "India", "Value_co2_emissions_kt_by_country": 1756739.99, "Electricity from fossil fuels (TWh)": 828.16, "Electricity from nuclear (TWh)": 32.22, "Electricity from renewables (TWh)": 173.62}, {"Year": 2012, "Entity": "India", "Value_co2_emissions_kt_by_country": 1909439.941, "Electricity from fossil fuels (TWh)": 893.45, "Electricity from nuclear (TWh)": 33.14, "Electricity from renewables (TWh)": 165.25}, {"Year": 2013, "Entity": "India", "Value_co2_emissions_kt_by_country": 1972430.054, "Electricity from fossil fuels (TWh)": 924.93, "Electricity from nuclear (TWh)": 33.31, "Electricity from renewables (TWh)": 187.9}, {"Year": 2014, "Entity": "India", "Value_co2_emissions_kt_by_country": 2147110.107, "Electricity from fossil fuels (TWh)": 1025.29, "Electricity from nuclear (TWh)": 34.69, "Electricity from renewables (TWh)": 202.04}, {"Year": 2015, "Entity": "India", 
"Value_co2_emissions_kt_by_country": 2158020.02, "Electricity from fossil fuels (TWh)": 1080.44, "Electricity from nuclear (TWh)": 38.31, "Electricity from renewables (TWh)": 203.21}, {"Year": 2016, "Entity": "India", "Value_co2_emissions_kt_by_country": 2195250, "Electricity from fossil fuels (TWh)": 1155.52, "Electricity from nuclear (TWh)": 37.9, "Electricity from renewables (TWh)": 208.21}, {"Year": 2017, "Entity": "India", "Value_co2_emissions_kt_by_country": 2320409.912, "Electricity from fossil fuels (TWh)": 1198.85, "Electricity from nuclear (TWh)": 37.41, "Electricity from renewables (TWh)": 234.9}, {"Year": 2018, "Entity": "India", "Value_co2_emissions_kt_by_country": 2451929.932, "Electricity from fossil fuels (TWh)": 1276.32, "Electricity from nuclear (TWh)": 39.05, "Electricity from renewables (TWh)": 263.61}, {"Year": 2019, "Entity": "India", "Value_co2_emissions_kt_by_country": 2456300.049, "Electricity from fossil fuels (TWh)": 1273.59, "Electricity from nuclear (TWh)": 45.16, "Electricity from renewables (TWh)": 303.16}, {"Year": 2020, "Entity": "India", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 1202.34, "Electricity from nuclear (TWh)": 44.61, "Electricity from renewables (TWh)": 315.76}, {"Year": 2000, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 280650, "Electricity from fossil fuels (TWh)": 78.43, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 19.6}, {"Year": 2001, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 302060, "Electricity from fossil fuels (TWh)": 83.96, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 22.19}, {"Year": 2002, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 305640.0146, "Electricity from fossil fuels (TWh)": 92.03, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 21}, {"Year": 2003, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 333890.0146, 
"Electricity from fossil fuels (TWh)": 97.57, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 19.82}, {"Year": 2004, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 341239.9902, "Electricity from fossil fuels (TWh)": 103.8, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 20.97}, {"Year": 2005, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 342149.9939, "Electricity from fossil fuels (TWh)": 110.22, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 22.66}, {"Year": 2006, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 364470.0012, "Electricity from fossil fuels (TWh)": 116.8, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 21.18}, {"Year": 2007, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 379959.9915, "Electricity from fossil fuels (TWh)": 124.1, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 24.29}, {"Year": 2008, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 376140.0146, "Electricity from fossil fuels (TWh)": 129.55, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 26.34}, {"Year": 2009, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 391079.9866, "Electricity from fossil fuels (TWh)": 136.05, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 26.79}, {"Year": 2010, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 415519.989, "Electricity from fossil fuels (TWh)": 142.88, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 34.63}, {"Year": 2011, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 475309.9976, "Electricity from fossil fuels (TWh)": 161.41, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 30.46}, {"Year": 2012, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 481510.0098, "Electricity from fossil 
fuels (TWh)": 177.83, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 31.11}, {"Year": 2013, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 447940.0024, "Electricity from fossil fuels (TWh)": 189.66, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 35.5}, {"Year": 2014, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 483910.0037, "Electricity from fossil fuels (TWh)": 203.11, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 34.41}, {"Year": 2015, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 488549.9878, "Electricity from fossil fuels (TWh)": 209.71, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 33.56}, {"Year": 2016, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 482510.0098, "Electricity from fossil fuels (TWh)": 217.97, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 39.58}, {"Year": 2017, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 517320.0073, "Electricity from fossil fuels (TWh)": 222.64, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 43.17}, {"Year": 2018, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 576989.9902, "Electricity from fossil fuels (TWh)": 235.41, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 48.38}, {"Year": 2019, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 619840.0269, "Electricity from fossil fuels (TWh)": 247.39, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 48.04}, {"Year": 2020, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 238.91, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 52.91}, {"Year": 2000, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 436300, "Electricity from fossil fuels (TWh)": 218.28, "Electricity 
from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 50.87}, {"Year": 2001, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 436570, "Electricity from fossil fuels (TWh)": 216.73, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 54.35}, {"Year": 2002, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 443470.0012, "Electricity from fossil fuels (TWh)": 228.45, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 48.31}, {"Year": 2003, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 462200.0122, "Electricity from fossil fuels (TWh)": 238.52, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 46.86}, {"Year": 2004, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 472399.9939, "Electricity from fossil fuels (TWh)": 240.95, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 53.88}, {"Year": 2005, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 473829.9866, "Electricity from fossil fuels (TWh)": 247.29, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 48.43}, {"Year": 2006, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 466649.9939, "Electricity from fossil fuels (TWh)": 256.03, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 50.64}, {"Year": 2007, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 459369.9951, "Electricity from fossil fuels (TWh)": 259.49, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 47.72}, {"Year": 2008, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 444980.011, "Electricity from fossil fuels (TWh)": 254.34, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 58.16}, {"Year": 2009, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 397059.9976, "Electricity from fossil fuels (TWh)": 218.32, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 69.26}, {"Year": 2010, 
"Entity": "Italy", "Value_co2_emissions_kt_by_country": 405269.989, "Electricity from fossil fuels (TWh)": 220.93, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 76.98}, {"Year": 2011, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 396690.0024, "Electricity from fossil fuels (TWh)": 216.78, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 82.96}, {"Year": 2012, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 376750, "Electricity from fossil fuels (TWh)": 204.26, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 92.22}, {"Year": 2013, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 346459.9915, "Electricity from fossil fuels (TWh)": 175.07, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 112}, {"Year": 2014, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 327500, "Electricity from fossil fuels (TWh)": 156.76, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 120.68}, {"Year": 2015, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 337859.9854, "Electricity from fossil fuels (TWh)": 172.06, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 108.89}, {"Year": 2016, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 333339.9963, "Electricity from fossil fuels (TWh)": 179.19, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 108.01}, {"Year": 2017, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 329190.0024, "Electricity from fossil fuels (TWh)": 189.44, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 103.89}, {"Year": 2018, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 324880.0049, "Electricity from fossil fuels (TWh)": 172.98, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 114.41}, {"Year": 2019, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 317239.9902, "Electricity from 
fossil fuels (TWh)": 175.52, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 115.83}, {"Year": 2020, "Entity": "Italy", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 161.17, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 116.9}, {"Year": 2000, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1182610, "Electricity from fossil fuels (TWh)": 578.29, "Electricity from nuclear (TWh)": 305.95, "Electricity from renewables (TWh)": 104.16}, {"Year": 2001, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1170380, "Electricity from fossil fuels (TWh)": 564.95, "Electricity from nuclear (TWh)": 303.86, "Electricity from renewables (TWh)": 101.36}, {"Year": 2002, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1206599.976, "Electricity from fossil fuels (TWh)": 605.12, "Electricity from nuclear (TWh)": 280.34, "Electricity from renewables (TWh)": 101.1}, {"Year": 2003, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1214949.951, "Electricity from fossil fuels (TWh)": 633.76, "Electricity from nuclear (TWh)": 228.01, "Electricity from renewables (TWh)": 114.18}, {"Year": 2004, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1209849.976, "Electricity from fossil fuels (TWh)": 621.6, "Electricity from nuclear (TWh)": 268.32, "Electricity from renewables (TWh)": 114.73}, {"Year": 2005, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1212819.946, "Electricity from fossil fuels (TWh)": 634.09, "Electricity from nuclear (TWh)": 280.5, "Electricity from renewables (TWh)": 100.57}, {"Year": 2006, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1189520.02, "Electricity from fossil fuels (TWh)": 628.77, "Electricity from nuclear (TWh)": 291.54, "Electricity from renewables (TWh)": 112.07}, {"Year": 2007, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1225069.946, "Electricity from fossil fuels (TWh)": 705.37, "Electricity from nuclear 
(TWh)": 267.34, "Electricity from renewables (TWh)": 100.8}, {"Year": 2008, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1158219.971, "Electricity from fossil fuels (TWh)": 663.88, "Electricity from nuclear (TWh)": 241.25, "Electricity from renewables (TWh)": 100.79}, {"Year": 2009, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1100979.98, "Electricity from fossil fuels (TWh)": 611.86, "Electricity from nuclear (TWh)": 263.05, "Electricity from renewables (TWh)": 102.28}, {"Year": 2010, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1156479.98, "Electricity from fossil fuels (TWh)": 689.89, "Electricity from nuclear (TWh)": 278.36, "Electricity from renewables (TWh)": 113.92}, {"Year": 2011, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1213520.02, "Electricity from fossil fuels (TWh)": 777.1, "Electricity from nuclear (TWh)": 153.38, "Electricity from renewables (TWh)": 116.5}, {"Year": 2012, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1253609.985, "Electricity from fossil fuels (TWh)": 920.39, "Electricity from nuclear (TWh)": 15.12, "Electricity from renewables (TWh)": 111.09}, {"Year": 2013, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1262780.029, "Electricity from fossil fuels (TWh)": 897.88, "Electricity from nuclear (TWh)": 10.43, "Electricity from renewables (TWh)": 121.48}, {"Year": 2014, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1217119.995, "Electricity from fossil fuels (TWh)": 892.18, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 136.53}, {"Year": 2015, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1179439.941, "Electricity from fossil fuels (TWh)": 844.23, "Electricity from nuclear (TWh)": 3.24, "Electricity from renewables (TWh)": 157.34}, {"Year": 2016, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1167790.039, "Electricity from fossil fuels (TWh)": 832.4, "Electricity from nuclear (TWh)": 14.87, "Electricity from renewables 
(TWh)": 157.7}, {"Year": 2017, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1155229.98, "Electricity from fossil fuels (TWh)": 806.12, "Electricity from nuclear (TWh)": 27.75, "Electricity from renewables (TWh)": 175.12}, {"Year": 2018, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1116150.024, "Electricity from fossil fuels (TWh)": 780.61, "Electricity from nuclear (TWh)": 47.82, "Electricity from renewables (TWh)": 183.63}, {"Year": 2019, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1081569.946, "Electricity from fossil fuels (TWh)": 735.66, "Electricity from nuclear (TWh)": 63.88, "Electricity from renewables (TWh)": 192.72}, {"Year": 2020, "Entity": "Japan", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 716.67, "Electricity from nuclear (TWh)": 41.86, "Electricity from renewables (TWh)": 205.6}, {"Year": 2000, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 120150, "Electricity from fossil fuels (TWh)": 44.11, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.53}, {"Year": 2001, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 117440, "Electricity from fossil fuels (TWh)": 47.3, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.08}, {"Year": 2002, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 131059.9976, "Electricity from fossil fuels (TWh)": 49.44, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.89}, {"Year": 2003, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 146139.9994, "Electricity from fossil fuels (TWh)": 55.24, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.62}, {"Year": 2004, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 158029.9988, "Electricity from fossil fuels (TWh)": 58.89, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.06}, {"Year": 2005, "Entity": 
"Kazakhstan", "Value_co2_emissions_kt_by_country": 169210.0067, "Electricity from fossil fuels (TWh)": 60.06, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.86}, {"Year": 2006, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 185300.0031, "Electricity from fossil fuels (TWh)": 63.89, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.77}, {"Year": 2007, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 198389.9994, "Electricity from fossil fuels (TWh)": 68.45, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.17}, {"Year": 2008, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 242029.9988, "Electricity from fossil fuels (TWh)": 72.89, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.46}, {"Year": 2009, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 213610.0006, "Electricity from fossil fuels (TWh)": 71.85, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 6.88}, {"Year": 2010, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 229699.9969, "Electricity from fossil fuels (TWh)": 74.63, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.02}, {"Year": 2011, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 245449.9969, "Electricity from fossil fuels (TWh)": 78.7, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.88}, {"Year": 2012, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 244600.0061, "Electricity from fossil fuels (TWh)": 82.98, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.64}, {"Year": 2013, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 260010.0098, "Electricity from fossil fuels (TWh)": 84.88, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.73}, {"Year": 2014, "Entity": "Kazakhstan", 
"Value_co2_emissions_kt_by_country": 209229.9957, "Electricity from fossil fuels (TWh)": 86.37, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.27}, {"Year": 2015, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 190729.9957, "Electricity from fossil fuels (TWh)": 82.2, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 9.45}, {"Year": 2016, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 202149.9939, "Electricity from fossil fuels (TWh)": 82.65, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 11.98}, {"Year": 2017, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 214580.0018, "Electricity from fossil fuels (TWh)": 91.48, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 11.64}, {"Year": 2018, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 216600.0061, "Electricity from fossil fuels (TWh)": 96.36, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 10.91}, {"Year": 2019, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 212110.0006, "Electricity from fossil fuels (TWh)": 95.39, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 11.09}, {"Year": 2020, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 96.7, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 11.94}, {"Year": 2000, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 379180, "Electricity from fossil fuels (TWh)": 141.8, "Electricity from nuclear (TWh)": 7.81, "Electricity from renewables (TWh)": 44.51}, {"Year": 2001, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 378830, "Electricity from fossil fuels (TWh)": 153.32, "Electricity from nuclear (TWh)": 8.29, "Electricity from renewables (TWh)": 39.56}, {"Year": 2002, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 386000, 
"Electricity from fossil fuels (TWh)": 159.81, "Electricity from nuclear (TWh)": 9.26, "Electricity from renewables (TWh)": 35.67}, {"Year": 2003, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 404690.0024, "Electricity from fossil fuels (TWh)": 160.45, "Electricity from nuclear (TWh)": 9.98, "Electricity from renewables (TWh)": 32.11}, {"Year": 2004, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 414100.0061, "Electricity from fossil fuels (TWh)": 173.66, "Electricity from nuclear (TWh)": 8.73, "Electricity from renewables (TWh)": 38.19}, {"Year": 2005, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 432190.0024, "Electricity from fossil fuels (TWh)": 178.76, "Electricity from nuclear (TWh)": 10.32, "Electricity from renewables (TWh)": 42.29}, {"Year": 2006, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 448299.9878, "Electricity from fossil fuels (TWh)": 182.76, "Electricity from nuclear (TWh)": 10.4, "Electricity from renewables (TWh)": 43.63}, {"Year": 2007, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 457119.9951, "Electricity from fossil fuels (TWh)": 191.83, "Electricity from nuclear (TWh)": 9.95, "Electricity from renewables (TWh)": 42.14}, {"Year": 2008, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 459549.9878, "Electricity from fossil fuels (TWh)": 184.51, "Electricity from nuclear (TWh)": 9.36, "Electricity from renewables (TWh)": 53.22}, {"Year": 2009, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 448369.9951, "Electricity from fossil fuels (TWh)": 194.75, "Electricity from nuclear (TWh)": 10.11, "Electricity from renewables (TWh)": 40.59}, {"Year": 2010, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 462869.9951, "Electricity from fossil fuels (TWh)": 207.38, "Electricity from nuclear (TWh)": 5.66, "Electricity from renewables (TWh)": 51.37}, {"Year": 2011, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 478399.9939, "Electricity from fossil fuels (TWh)": 
219.88, "Electricity from nuclear (TWh)": 9.66, "Electricity from renewables (TWh)": 50.7}, {"Year": 2012, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 486450.0122, "Electricity from fossil fuels (TWh)": 229.14, "Electricity from nuclear (TWh)": 8.41, "Electricity from renewables (TWh)": 47.2}, {"Year": 2013, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 475739.9902, "Electricity from fossil fuels (TWh)": 231.23, "Electricity from nuclear (TWh)": 11.38, "Electricity from renewables (TWh)": 44.67}, {"Year": 2014, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 462239.9902, "Electricity from fossil fuels (TWh)": 223.43, "Electricity from nuclear (TWh)": 9.3, "Electricity from renewables (TWh)": 57.46}, {"Year": 2015, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 471630.0049, "Electricity from fossil fuels (TWh)": 234.28, "Electricity from nuclear (TWh)": 11.18, "Electricity from renewables (TWh)": 52.42}, {"Year": 2016, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 473309.9976, "Electricity from fossil fuels (TWh)": 239.78, "Electricity from nuclear (TWh)": 10.27, "Electricity from renewables (TWh)": 52.97}, {"Year": 2017, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 471579.9866, "Electricity from fossil fuels (TWh)": 242.69, "Electricity from nuclear (TWh)": 10.57, "Electricity from renewables (TWh)": 55.88}, {"Year": 2018, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 452570.0073, "Electricity from fossil fuels (TWh)": 259.92, "Electricity from nuclear (TWh)": 13.32, "Electricity from renewables (TWh)": 58.78}, {"Year": 2019, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 449269.989, "Electricity from fossil fuels (TWh)": 248.2, "Electricity from nuclear (TWh)": 10.88, "Electricity from renewables (TWh)": 59}, {"Year": 2020, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 245.46, "Electricity from nuclear (TWh)": 10.87, 
"Electricity from renewables (TWh)": 69.19}, {"Year": 2000, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 295770, "Electricity from fossil fuels (TWh)": 140.85, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 2.33}, {"Year": 2001, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 293630, "Electricity from fossil fuels (TWh)": 140.94, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 2.78}, {"Year": 2002, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 287320.0073, "Electricity from fossil fuels (TWh)": 139.72, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 2.77}, {"Year": 2003, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 297730.011, "Electricity from fossil fuels (TWh)": 147.76, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 2.25}, {"Year": 2004, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 301850.0061, "Electricity from fossil fuels (TWh)": 149.06, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 3.2}, {"Year": 2005, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 301350.0061, "Electricity from fossil fuels (TWh)": 151.2, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 3.85}, {"Year": 2006, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 314089.9963, "Electricity from fossil fuels (TWh)": 156.16, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 4.29}, {"Year": 2007, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 313380.0049, "Electricity from fossil fuels (TWh)": 153.08, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 5.43}, {"Year": 2008, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 308329.9866, "Electricity from fossil fuels (TWh)": 148.03, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 6.61}, {"Year": 2009, "Entity": "Poland", 
"Value_co2_emissions_kt_by_country": 297260.0098, "Electricity from fossil fuels (TWh)": 142.4, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 8.69}, {"Year": 2010, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 313739.9902, "Electricity from fossil fuels (TWh)": 146.12, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 10.88}, {"Year": 2011, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 310589.9963, "Electricity from fossil fuels (TWh)": 149.88, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 13.13}, {"Year": 2012, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 303350.0061, "Electricity from fossil fuels (TWh)": 144.75, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 16.88}, {"Year": 2013, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 298299.9878, "Electricity from fossil fuels (TWh)": 146.85, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 17.06}, {"Year": 2014, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 285730.011, "Electricity from fossil fuels (TWh)": 138.53, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 19.85}, {"Year": 2015, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 289079.9866, "Electricity from fossil fuels (TWh)": 141.55, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 22.69}, {"Year": 2016, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 299799.9878, "Electricity from fossil fuels (TWh)": 143.28, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 22.81}, {"Year": 2017, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 312859.9854, "Electricity from fossil fuels (TWh)": 145.8, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 24.13}, {"Year": 2018, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 311910.0037, "Electricity from fossil 
fuels (TWh)": 147.87, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 21.62}, {"Year": 2019, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 295130.0049, "Electricity from fossil fuels (TWh)": 137.58, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 25.46}, {"Year": 2020, "Entity": "Poland", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 128.91, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 28.23}, {"Year": 2000, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 249660, "Electricity from fossil fuels (TWh)": 138.68, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2001, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 254090, "Electricity from fossil fuels (TWh)": 146.09, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2002, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 272250, "Electricity from fossil fuels (TWh)": 154.91, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2003, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 284829.9866, "Electricity from fossil fuels (TWh)": 166.58, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2004, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 299890.0146, "Electricity from fossil fuels (TWh)": 173.41, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2005, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 315290.0085, "Electricity from fossil fuels (TWh)": 191.05, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2006, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 335440.0024, "Electricity from fossil fuels (TWh)": 196.31, "Electricity from nuclear (TWh)": 
null, "Electricity from renewables (TWh)": 0}, {"Year": 2007, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 354619.9951, "Electricity from fossil fuels (TWh)": 204.43, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2008, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 389720.0012, "Electricity from fossil fuels (TWh)": 204.2, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2009, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 406529.9988, "Electricity from fossil fuels (TWh)": 217.31, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2010, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 446130.0049, "Electricity from fossil fuels (TWh)": 240.06, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2011, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 463769.989, "Electricity from fossil fuels (TWh)": 250.07, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.01}, {"Year": 2012, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 492470.0012, "Electricity from fossil fuels (TWh)": 271.68, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.03}, {"Year": 2013, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 503209.9915, "Electricity from fossil fuels (TWh)": 284.02, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.04}, {"Year": 2014, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 540520.0195, "Electricity from fossil fuels (TWh)": 311.81, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.05}, {"Year": 2015, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 565190.0024, "Electricity from fossil fuels (TWh)": 338.34, "Electricity from nuclear (TWh)": null, "Electricity 
from renewables (TWh)": 0.05}, {"Year": 2016, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 561229.9805, "Electricity from fossil fuels (TWh)": 337.38, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.05}, {"Year": 2017, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 545070.0073, "Electricity from fossil fuels (TWh)": 354.3, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.07}, {"Year": 2018, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 521260.0098, "Electricity from fossil fuels (TWh)": 334.7, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.16}, {"Year": 2019, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 523780.0293, "Electricity from fossil fuels (TWh)": 335.24, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.21}, {"Year": 2020, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 337.82, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.21}, {"Year": 2000, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 284660, "Electricity from fossil fuels (TWh)": 181.67, "Electricity from nuclear (TWh)": 13.01, "Electricity from renewables (TWh)": 1.79}, {"Year": 2001, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 320540, "Electricity from fossil fuels (TWh)": 183.36, "Electricity from nuclear (TWh)": 10.72, "Electricity from renewables (TWh)": 2.46}, {"Year": 2002, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 331320.0073, "Electricity from fossil fuels (TWh)": 188.79, "Electricity from nuclear (TWh)": 11.99, "Electricity from renewables (TWh)": 2.81}, {"Year": 2003, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 353089.9963, "Electricity from fossil fuels (TWh)": 204.39, "Electricity from nuclear (TWh)": 12.66, "Electricity from renewables 
(TWh)": 1.19}, {"Year": 2004, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 379989.9902, "Electricity from fossil fuels (TWh)": 212.63, "Electricity from nuclear (TWh)": 14.28, "Electricity from renewables (TWh)": 1.33}, {"Year": 2005, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 377649.9939, "Electricity from fossil fuels (TWh)": 215.23, "Electricity from nuclear (TWh)": 12.24, "Electricity from renewables (TWh)": 1.75}, {"Year": 2006, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 379790.0085, "Electricity from fossil fuels (TWh)": 223.25, "Electricity from nuclear (TWh)": 10.07, "Electricity from renewables (TWh)": 3.28}, {"Year": 2007, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 397059.9976, "Electricity from fossil fuels (TWh)": 232.91, "Electricity from nuclear (TWh)": 12.6, "Electricity from renewables (TWh)": 1.3}, {"Year": 2008, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 426739.9902, "Electricity from fossil fuels (TWh)": 226.32, "Electricity from nuclear (TWh)": 12.75, "Electricity from renewables (TWh)": 1.66}, {"Year": 2009, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 404200.0122, "Electricity from fossil fuels (TWh)": 218.17, "Electricity from nuclear (TWh)": 11.57, "Electricity from renewables (TWh)": 1.86}, {"Year": 2010, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 425309.9976, "Electricity from fossil fuels (TWh)": 227.57, "Electricity from nuclear (TWh)": 12.9, "Electricity from renewables (TWh)": 2.51}, {"Year": 2011, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 409260.0098, "Electricity from fossil fuels (TWh)": 229.06, "Electricity from nuclear (TWh)": 12.94, "Electricity from renewables (TWh)": 2.49}, {"Year": 2012, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 426779.9988, "Electricity from fossil fuels (TWh)": 226.84, "Electricity from nuclear (TWh)": 12.4, "Electricity from 
renewables (TWh)": 1.66}, {"Year": 2013, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 436920.0134, "Electricity from fossil fuels (TWh)": 223.28, "Electricity from nuclear (TWh)": 13.61, "Electricity from renewables (TWh)": 1.62}, {"Year": 2014, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 447929.9927, "Electricity from fossil fuels (TWh)": 218.42, "Electricity from nuclear (TWh)": 14.76, "Electricity from renewables (TWh)": 3.38}, {"Year": 2015, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 424809.9976, "Electricity from fossil fuels (TWh)": 214.88, "Electricity from nuclear (TWh)": 10.97, "Electricity from renewables (TWh)": 6.09}, {"Year": 2016, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 425140.0146, "Electricity from fossil fuels (TWh)": 213.09, "Electricity from nuclear (TWh)": 15.21, "Electricity from renewables (TWh)": 7.69}, {"Year": 2017, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 435649.9939, "Electricity from fossil fuels (TWh)": 212.77, "Electricity from nuclear (TWh)": 15.09, "Electricity from renewables (TWh)": 10.04}, {"Year": 2018, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 434350.0061, "Electricity from fossil fuels (TWh)": 214.25, "Electricity from nuclear (TWh)": 10.56, "Electricity from renewables (TWh)": 12.22}, {"Year": 2019, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 439640.0146, "Electricity from fossil fuels (TWh)": 208.39, "Electricity from nuclear (TWh)": 13.6, "Electricity from renewables (TWh)": 12.57}, {"Year": 2020, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 197.5, "Electricity from nuclear (TWh)": 11.62, "Electricity from renewables (TWh)": 12.83}, {"Year": 2000, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 293310, "Electricity from fossil fuels (TWh)": 124.22, "Electricity from nuclear (TWh)": 62.21, "Electricity from 
renewables (TWh)": 34.49}, {"Year": 2001, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 294790, "Electricity from fossil fuels (TWh)": 120.06, "Electricity from nuclear (TWh)": 63.71, "Electricity from renewables (TWh)": 49.3}, {"Year": 2002, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 312750, "Electricity from fossil fuels (TWh)": 143.72, "Electricity from nuclear (TWh)": 63.02, "Electricity from renewables (TWh)": 33.17}, {"Year": 2003, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 318660.0037, "Electricity from fossil fuels (TWh)": 139.67, "Electricity from nuclear (TWh)": 61.88, "Electricity from renewables (TWh)": 55.75}, {"Year": 2004, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 335559.9976, "Electricity from fossil fuels (TWh)": 159.91, "Electricity from nuclear (TWh)": 63.61, "Electricity from renewables (TWh)": 50.13}, {"Year": 2005, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 350500, "Electricity from fossil fuels (TWh)": 184.65, "Electricity from nuclear (TWh)": 57.54, "Electricity from renewables (TWh)": 42.27}, {"Year": 2006, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 341779.9988, "Electricity from fossil fuels (TWh)": 182.98, "Electricity from nuclear (TWh)": 60.13, "Electricity from renewables (TWh)": 52.15}, {"Year": 2007, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 354679.9927, "Electricity from fossil fuels (TWh)": 188.13, "Electricity from nuclear (TWh)": 55.1, "Electricity from renewables (TWh)": 58.3}, {"Year": 2008, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 324269.989, "Electricity from fossil fuels (TWh)": 189.55, "Electricity from nuclear (TWh)": 58.97, "Electricity from renewables (TWh)": 62.15}, {"Year": 2009, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 287489.9902, "Electricity from fossil fuels (TWh)": 164.69, "Electricity from nuclear (TWh)": 52.76, "Electricity from renewables (TWh)": 74.08}, {"Year": 2010, "Entity": "Spain", 
"Value_co2_emissions_kt_by_country": 273250, "Electricity from fossil fuels (TWh)": 138.39, "Electricity from nuclear (TWh)": 61.99, "Electricity from renewables (TWh)": 97.77}, {"Year": 2011, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 274399.9939, "Electricity from fossil fuels (TWh)": 146.12, "Electricity from nuclear (TWh)": 57.72, "Electricity from renewables (TWh)": 87.53}, {"Year": 2012, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 269269.989, "Electricity from fossil fuels (TWh)": 145.33, "Electricity from nuclear (TWh)": 61.47, "Electricity from renewables (TWh)": 86.97}, {"Year": 2013, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 242809.9976, "Electricity from fossil fuels (TWh)": 113.32, "Electricity from nuclear (TWh)": 56.73, "Electricity from renewables (TWh)": 111.42}, {"Year": 2014, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 240960.0067, "Electricity from fossil fuels (TWh)": 107.37, "Electricity from nuclear (TWh)": 57.31, "Electricity from renewables (TWh)": 110.26}, {"Year": 2015, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 256279.9988, "Electricity from fossil fuels (TWh)": 123.19, "Electricity from nuclear (TWh)": 57.2, "Electricity from renewables (TWh)": 97.09}, {"Year": 2016, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 247029.9988, "Electricity from fossil fuels (TWh)": 107.93, "Electricity from nuclear (TWh)": 58.63, "Electricity from renewables (TWh)": 104.63}, {"Year": 2017, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 263450.0122, "Electricity from fossil fuels (TWh)": 126.93, "Electricity from nuclear (TWh)": 58.04, "Electricity from renewables (TWh)": 87.93}, {"Year": 2018, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 257040.0085, "Electricity from fossil fuels (TWh)": 112.23, "Electricity from nuclear (TWh)": 55.77, "Electricity from renewables (TWh)": 103.88}, {"Year": 2019, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 239979.9957, 
"Electricity from fossil fuels (TWh)": 111.55, "Electricity from nuclear (TWh)": 58.35, "Electricity from renewables (TWh)": 100.99}, {"Year": 2020, "Entity": "Spain", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 87.64, "Electricity from nuclear (TWh)": 58.3, "Electricity from renewables (TWh)": 113.79}, {"Year": 2000, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 164490, "Electricity from fossil fuels (TWh)": 83.15, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 6.38}, {"Year": 2001, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 173160, "Electricity from fossil fuels (TWh)": 88.97, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 6.76}, {"Year": 2002, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 184240.0055, "Electricity from fossil fuels (TWh)": 93.51, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 8.07}, {"Year": 2003, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 191929.9927, "Electricity from fossil fuels (TWh)": 100.61, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 8.36}, {"Year": 2004, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 210190.0024, "Electricity from fossil fuels (TWh)": 109.46, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 7.63}, {"Year": 2005, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 217770.0043, "Electricity from fossil fuels (TWh)": 115.58, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 7.42}, {"Year": 2006, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 219880.0049, "Electricity from fossil fuels (TWh)": 119.41, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 9.82}, {"Year": 2007, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 224589.9963, "Electricity from fossil fuels (TWh)": 122.12, "Electricity from nuclear 
(TWh)": 0, "Electricity from renewables (TWh)": 10.2}, {"Year": 2008, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 227580.0018, "Electricity from fossil fuels (TWh)": 127.43, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 8.95}, {"Year": 2009, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 220259.9945, "Electricity from fossil fuels (TWh)": 128.09, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 9.09}, {"Year": 2010, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 234380.0049, "Electricity from fossil fuels (TWh)": 141.72, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 8.58}, {"Year": 2011, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 233600.0061, "Electricity from fossil fuels (TWh)": 135.31, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 11.83}, {"Year": 2012, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 250679.9927, "Electricity from fossil fuels (TWh)": 143.73, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 13.42}, {"Year": 2013, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 260700.0122, "Electricity from fossil fuels (TWh)": 148.29, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 12.33}, {"Year": 2014, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 256799.9878, "Electricity from fossil fuels (TWh)": 149.26, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 13.68}, {"Year": 2015, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 264000, "Electricity from fossil fuels (TWh)": 153.4, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 13.33}, {"Year": 2016, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 261600.0061, "Electricity from fossil fuels (TWh)": 161.79, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 15.97}, {"Year": 
2017, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 258820.0073, "Electricity from fossil fuels (TWh)": 161.88, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 19.92}, {"Year": 2018, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 257049.9878, "Electricity from fossil fuels (TWh)": 156.26, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 25.84}, {"Year": 2019, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 267089.9963, "Electricity from fossil fuels (TWh)": 162.59, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 28.02}, {"Year": 2020, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 154.52, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 24.73}, {"Year": 2000, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 297380, "Electricity from fossil fuels (TWh)": 82.65, "Electricity from nuclear (TWh)": 77.34, "Electricity from renewables (TWh)": 11.28}, {"Year": 2001, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 300550, "Electricity from fossil fuels (TWh)": 84.59, "Electricity from nuclear (TWh)": 76.17, "Electricity from renewables (TWh)": 12.05}, {"Year": 2002, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 303940.0024, "Electricity from fossil fuels (TWh)": 85.93, "Electricity from nuclear (TWh)": 77.99, "Electricity from renewables (TWh)": 9.65}, {"Year": 2003, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 330230.011, "Electricity from fossil fuels (TWh)": 89.52, "Electricity from nuclear (TWh)": 81.41, "Electricity from renewables (TWh)": 9.27}, {"Year": 2004, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 307140.0146, "Electricity from fossil fuels (TWh)": 83.22, "Electricity from nuclear (TWh)": 87.02, "Electricity from renewables (TWh)": 11.78}, {"Year": 2005, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 
295410.0037, "Electricity from fossil fuels (TWh)": 84.75, "Electricity from nuclear (TWh)": 88.76, "Electricity from renewables (TWh)": 12.4}, {"Year": 2006, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 303989.9902, "Electricity from fossil fuels (TWh)": 90.09, "Electricity from nuclear (TWh)": 90.22, "Electricity from renewables (TWh)": 12.92}, {"Year": 2007, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 312140.0146, "Electricity from fossil fuels (TWh)": 93.13, "Electricity from nuclear (TWh)": 92.54, "Electricity from renewables (TWh)": 10.47}, {"Year": 2008, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 301200.0122, "Electricity from fossil fuels (TWh)": 90.92, "Electricity from nuclear (TWh)": 89.84, "Electricity from renewables (TWh)": 11.82}, {"Year": 2009, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 251619.9951, "Electricity from fossil fuels (TWh)": 78.58, "Electricity from nuclear (TWh)": 82.92, "Electricity from renewables (TWh)": 12.12}, {"Year": 2010, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 268920.0134, "Electricity from fossil fuels (TWh)": 86.28, "Electricity from nuclear (TWh)": 89.15, "Electricity from renewables (TWh)": 13.39}, {"Year": 2011, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 283339.9963, "Electricity from fossil fuels (TWh)": 93.5, "Electricity from nuclear (TWh)": 90.25, "Electricity from renewables (TWh)": 11.2}, {"Year": 2012, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 277109.9854, "Electricity from fossil fuels (TWh)": 96.99, "Electricity from nuclear (TWh)": 90.14, "Electricity from renewables (TWh)": 11.23}, {"Year": 2013, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 270269.989, "Electricity from fossil fuels (TWh)": 95.39, "Electricity from nuclear (TWh)": 83.21, "Electricity from renewables (TWh)": 15.11}, {"Year": 2014, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 237729.9957, "Electricity from fossil 
fuels (TWh)": 83.42, "Electricity from nuclear (TWh)": 88.39, "Electricity from renewables (TWh)": 10.17}, {"Year": 2015, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 191070.0073, "Electricity from fossil fuels (TWh)": 66.91, "Electricity from nuclear (TWh)": 87.63, "Electricity from renewables (TWh)": 7.1}, {"Year": 2016, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 201660.0037, "Electricity from fossil fuels (TWh)": 72.66, "Electricity from nuclear (TWh)": 80.95, "Electricity from renewables (TWh)": 9.25}, {"Year": 2017, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 174940.0024, "Electricity from fossil fuels (TWh)": 57.96, "Electricity from nuclear (TWh)": 85.58, "Electricity from renewables (TWh)": 10.88}, {"Year": 2018, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 185619.9951, "Electricity from fossil fuels (TWh)": 60.81, "Electricity from nuclear (TWh)": 84.4, "Electricity from renewables (TWh)": 13.02}, {"Year": 2019, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 174729.9957, "Electricity from fossil fuels (TWh)": 57.79, "Electricity from nuclear (TWh)": 83, "Electricity from renewables (TWh)": 11.87}, {"Year": 2020, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 54.5, "Electricity from nuclear (TWh)": 76.2, "Electricity from renewables (TWh)": 17.56}, {"Year": 2000, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 530890, "Electricity from fossil fuels (TWh)": 279.34, "Electricity from nuclear (TWh)": 85.06, "Electricity from renewables (TWh)": 9.98}, {"Year": 2001, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 545260, "Electricity from fossil fuels (TWh)": 282.72, "Electricity from nuclear (TWh)": 90.09, "Electricity from renewables (TWh)": 9.56}, {"Year": 2002, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 530789.978, "Electricity from fossil fuels (TWh)": 285.62, "Electricity from 
nuclear (TWh)": 87.85, "Electricity from renewables (TWh)": 11.13}, {"Year": 2003, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 543039.978, "Electricity from fossil fuels (TWh)": 296.15, "Electricity from nuclear (TWh)": 88.69, "Electricity from renewables (TWh)": 10.62}, {"Year": 2004, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 543080.0171, "Electricity from fossil fuels (TWh)": 297.15, "Electricity from nuclear (TWh)": 80, "Electricity from renewables (TWh)": 14.14}, {"Year": 2005, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 540919.9829, "Electricity from fossil fuels (TWh)": 296.87, "Electricity from nuclear (TWh)": 81.62, "Electricity from renewables (TWh)": 16.93}, {"Year": 2006, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 542059.9976, "Electricity from fossil fuels (TWh)": 299.88, "Electricity from nuclear (TWh)": 75.45, "Electricity from renewables (TWh)": 18.11}, {"Year": 2007, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 530500, "Electricity from fossil fuels (TWh)": 310.26, "Electricity from nuclear (TWh)": 63.03, "Electricity from renewables (TWh)": 19.69}, {"Year": 2008, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 515340.0269, "Electricity from fossil fuels (TWh)": 310.5, "Electricity from nuclear (TWh)": 52.49, "Electricity from renewables (TWh)": 21.85}, {"Year": 2009, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 466489.9902, "Electricity from fossil fuels (TWh)": 278.73, "Electricity from nuclear (TWh)": 69.1, "Electricity from renewables (TWh)": 25.25}, {"Year": 2010, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 482440.0024, "Electricity from fossil fuels (TWh)": 290.59, "Electricity from nuclear (TWh)": 62.14, "Electricity from renewables (TWh)": 26.18}, {"Year": 2011, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 445589.9963, "Electricity from fossil fuels (TWh)": 
260.88, "Electricity from nuclear (TWh)": 68.98, "Electricity from renewables (TWh)": 35.2}, {"Year": 2012, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 467779.9988, "Electricity from fossil fuels (TWh)": 249.25, "Electricity from nuclear (TWh)": 70.4, "Electricity from renewables (TWh)": 41.24}, {"Year": 2013, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 453760.0098, "Electricity from fossil fuels (TWh)": 231.56, "Electricity from nuclear (TWh)": 70.61, "Electricity from renewables (TWh)": 53.21}, {"Year": 2014, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 415600.0061, "Electricity from fossil fuels (TWh)": 206.94, "Electricity from nuclear (TWh)": 63.75, "Electricity from renewables (TWh)": 64.52}, {"Year": 2015, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 401079.9866, "Electricity from fossil fuels (TWh)": 182.43, "Electricity from nuclear (TWh)": 70.34, "Electricity from renewables (TWh)": 82.57}, {"Year": 2016, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 380809.9976, "Electricity from fossil fuels (TWh)": 181.56, "Electricity from nuclear (TWh)": 71.73, "Electricity from renewables (TWh)": 82.99}, {"Year": 2017, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 367000, "Electricity from fossil fuels (TWh)": 165.91, "Electricity from nuclear (TWh)": 70.34, "Electricity from renewables (TWh)": 98.85}, {"Year": 2018, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 360730.011, "Electricity from fossil fuels (TWh)": 155.41, "Electricity from nuclear (TWh)": 65.06, "Electricity from renewables (TWh)": 110.03}, {"Year": 2019, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 348920.0134, "Electricity from fossil fuels (TWh)": 144.99, "Electricity from nuclear (TWh)": 56.18, "Electricity from renewables (TWh)": 120.48}, {"Year": 2020, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": null, "Electricity from 
fossil fuels (TWh)": 124.78, "Electricity from nuclear (TWh)": 50.85, "Electricity from renewables (TWh)": 131.74}, {"Year": 2000, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5775810, "Electricity from fossil fuels (TWh)": 2697.28, "Electricity from nuclear (TWh)": 753.89, "Electricity from renewables (TWh)": 350.93}, {"Year": 2001, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5748260, "Electricity from fossil fuels (TWh)": 2678.68, "Electricity from nuclear (TWh)": 768.83, "Electricity from renewables (TWh)": 280.06}, {"Year": 2002, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5593029.785, "Electricity from fossil fuels (TWh)": 2727.83, "Electricity from nuclear (TWh)": 780.06, "Electricity from renewables (TWh)": 336.34}, {"Year": 2003, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5658990.234, "Electricity from fossil fuels (TWh)": 2756.03, "Electricity from nuclear (TWh)": 763.73, "Electricity from renewables (TWh)": 349.18}, {"Year": 2004, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5738290.039, "Electricity from fossil fuels (TWh)": 2818.28, "Electricity from nuclear (TWh)": 788.53, "Electricity from renewables (TWh)": 345.14}, {"Year": 2005, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5753490.234, "Electricity from fossil fuels (TWh)": 2899.96, "Electricity from nuclear (TWh)": 781.99, "Electricity from renewables (TWh)": 353.04}, {"Year": 2006, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5653080.078, "Electricity from fossil fuels (TWh)": 2878.56, "Electricity from nuclear (TWh)": 787.22, "Electricity from renewables (TWh)": 381.16}, {"Year": 2007, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5736319.824, "Electricity from fossil fuels (TWh)": 2988.24, "Electricity from nuclear (TWh)": 806.42, "Electricity from renewables (TWh)": 347.91}, {"Year": 2008, "Entity": "United States", 
"Value_co2_emissions_kt_by_country": 5558379.883, "Electricity from fossil fuels (TWh)": 2924.21, "Electricity from nuclear (TWh)": 806.21, "Electricity from renewables (TWh)": 377.11}, {"Year": 2009, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5156430.176, "Electricity from fossil fuels (TWh)": 2725.41, "Electricity from nuclear (TWh)": 798.85, "Electricity from renewables (TWh)": 415.56}, {"Year": 2010, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5392109.863, "Electricity from fossil fuels (TWh)": 2882.49, "Electricity from nuclear (TWh)": 806.97, "Electricity from renewables (TWh)": 424.48}, {"Year": 2011, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5173600.098, "Electricity from fossil fuels (TWh)": 2788.93, "Electricity from nuclear (TWh)": 790.2, "Electricity from renewables (TWh)": 509.74}, {"Year": 2012, "Entity": "United States", "Value_co2_emissions_kt_by_country": 4956060.059, "Electricity from fossil fuels (TWh)": 2779.02, "Electricity from nuclear (TWh)": 769.33, "Electricity from renewables (TWh)": 492.32}, {"Year": 2013, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5092100.098, "Electricity from fossil fuels (TWh)": 2746.21, "Electricity from nuclear (TWh)": 789.02, "Electricity from renewables (TWh)": 520.38}, {"Year": 2014, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5107209.961, "Electricity from fossil fuels (TWh)": 2752.01, "Electricity from nuclear (TWh)": 797.17, "Electricity from renewables (TWh)": 546.83}, {"Year": 2015, "Entity": "United States", "Value_co2_emissions_kt_by_country": 4990709.961, "Electricity from fossil fuels (TWh)": 2730.32, "Electricity from nuclear (TWh)": 797.18, "Electricity from renewables (TWh)": 556.49}, {"Year": 2016, "Entity": "United States", "Value_co2_emissions_kt_by_country": 4894500, "Electricity from fossil fuels (TWh)": 2656.96, "Electricity from nuclear (TWh)": 805.69, "Electricity from renewables (TWh)": 
624.91}, {"Year": 2017, "Entity": "United States", "Value_co2_emissions_kt_by_country": 4819370.117, "Electricity from fossil fuels (TWh)": 2540.17, "Electricity from nuclear (TWh)": 804.95, "Electricity from renewables (TWh)": 707.19}, {"Year": 2018, "Entity": "United States", "Value_co2_emissions_kt_by_country": 4975310.059, "Electricity from fossil fuels (TWh)": 2661.3, "Electricity from nuclear (TWh)": 807.08, "Electricity from renewables (TWh)": 733.17}, {"Year": 2019, "Entity": "United States", "Value_co2_emissions_kt_by_country": 4817720.215, "Electricity from fossil fuels (TWh)": 2588.21, "Electricity from nuclear (TWh)": 809.41, "Electricity from renewables (TWh)": 760.76}, {"Year": 2020, "Entity": "United States", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 2431.9, "Electricity from nuclear (TWh)": 789.88, "Electricity from renewables (TWh)": 821.4}], "anchored": true, "attachedMetadata": ""}, {"kind": "table", "id": "table-82", "displayId": "energy-source", "names": ["Entity", "Year", "energy", "source"], "rows": [{"Entity": "Australia", "Year": 2000, "energy": 181.05, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2001, "energy": 194.33, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2002, "energy": 197.29, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2003, "energy": 195.13, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2004, "energy": 203.66, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2005, "energy": 195.95, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2006, "energy": 198.72, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2007, "energy": 208.59, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2008, "energy": 211.06, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2009, "energy": 216.42, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2010, "energy": 212.5, "source": "fossil fuels"}, 
{"Entity": "Australia", "Year": 2011, "energy": 213.56, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2012, "energy": 206.75, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2013, "energy": 195.78, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2014, "energy": 205.46, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2015, "energy": 197.72, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2016, "energy": 207.66, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2017, "energy": 209.14, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2018, "energy": 207.45, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2019, "energy": 196.45, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2020, "energy": 186.92, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2000, "energy": 28.87, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2001, "energy": 35.19, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2002, "energy": 33.5, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2003, "energy": 31.62, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2004, "energy": 40.14, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2005, "energy": 39.56, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2006, "energy": 39.4, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2007, "energy": 37.64, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2008, "energy": 55.87, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2009, "energy": 36.32, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2010, "energy": 61.02, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2011, "energy": 50.27, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2012, "energy": 77.21, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2013, "energy": 112, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2014, "energy": 136.58, "source": 
"fossil fuels"}, {"Entity": "Brazil", "Year": 2015, "energy": 128.85, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2016, "energy": 93.06, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2017, "energy": 101.9, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2018, "energy": 86.69, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2019, "energy": 90.91, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2020, "energy": 81.15, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2000, "energy": 155.56, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2001, "energy": 159.93, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2002, "energy": 155.12, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2003, "energy": 157.35, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2004, "energy": 148.86, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2005, "energy": 150.78, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2006, "energy": 139.71, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2007, "energy": 149.36, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2008, "energy": 141.33, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2009, "energy": 129.76, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2010, "energy": 130.08, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2011, "energy": 131.3, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2012, "energy": 124.2, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2013, "energy": 122.87, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2014, "energy": 122.75, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2015, "energy": 125.7, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2016, "energy": 122.35, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2017, "energy": 113.7, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2018, "energy": 112.47, "source": "fossil 
fuels"}, {"Entity": "Canada", "Year": 2019, "energy": 110.65, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2020, "energy": 102.19, "source": "fossil fuels"}, {"Entity": "China", "Year": 2000, "energy": 1113.3, "source": "fossil fuels"}, {"Entity": "China", "Year": 2001, "energy": 1182.59, "source": "fossil fuels"}, {"Entity": "China", "Year": 2002, "energy": 1337.46, "source": "fossil fuels"}, {"Entity": "China", "Year": 2003, "energy": 1579.96, "source": "fossil fuels"}, {"Entity": "China", "Year": 2004, "energy": 1795.41, "source": "fossil fuels"}, {"Entity": "China", "Year": 2005, "energy": 2042.8, "source": "fossil fuels"}, {"Entity": "China", "Year": 2006, "energy": 2364.16, "source": "fossil fuels"}, {"Entity": "China", "Year": 2007, "energy": 2718.7, "source": "fossil fuels"}, {"Entity": "China", "Year": 2008, "energy": 2762.29, "source": "fossil fuels"}, {"Entity": "China", "Year": 2009, "energy": 2980.2, "source": "fossil fuels"}, {"Entity": "China", "Year": 2010, "energy": 3326.19, "source": "fossil fuels"}, {"Entity": "China", "Year": 2011, "energy": 3811.77, "source": "fossil fuels"}, {"Entity": "China", "Year": 2012, "energy": 3869.38, "source": "fossil fuels"}, {"Entity": "China", "Year": 2013, "energy": 4203.77, "source": "fossil fuels"}, {"Entity": "China", "Year": 2014, "energy": 4345.86, "source": "fossil fuels"}, {"Entity": "China", "Year": 2015, "energy": 4222.76, "source": "fossil fuels"}, {"Entity": "China", "Year": 2016, "energy": 4355, "source": "fossil fuels"}, {"Entity": "China", "Year": 2017, "energy": 4643.1, "source": "fossil fuels"}, {"Entity": "China", "Year": 2018, "energy": 4990.28, "source": "fossil fuels"}, {"Entity": "China", "Year": 2019, "energy": 5098.22, "source": "fossil fuels"}, {"Entity": "China", "Year": 2020, "energy": 5184.13, "source": "fossil fuels"}, {"Entity": "France", "Year": 2000, "energy": 50.61, "source": "fossil fuels"}, {"Entity": "France", "Year": 2001, "energy": 46.48, "source": "fossil fuels"}, 
{"Entity": "France", "Year": 2002, "energy": 52.67, "source": "fossil fuels"}, {"Entity": "France", "Year": 2003, "energy": 57.38, "source": "fossil fuels"}, {"Entity": "France", "Year": 2004, "energy": 56.53, "source": "fossil fuels"}, {"Entity": "France", "Year": 2005, "energy": 63.35, "source": "fossil fuels"}, {"Entity": "France", "Year": 2006, "energy": 56.9, "source": "fossil fuels"}, {"Entity": "France", "Year": 2007, "energy": 58.18, "source": "fossil fuels"}, {"Entity": "France", "Year": 2008, "energy": 55.57, "source": "fossil fuels"}, {"Entity": "France", "Year": 2009, "energy": 51.32, "source": "fossil fuels"}, {"Entity": "France", "Year": 2010, "energy": 57.63, "source": "fossil fuels"}, {"Entity": "France", "Year": 2011, "energy": 58.99, "source": "fossil fuels"}, {"Entity": "France", "Year": 2012, "energy": 56.42, "source": "fossil fuels"}, {"Entity": "France", "Year": 2013, "energy": 53.35, "source": "fossil fuels"}, {"Entity": "France", "Year": 2014, "energy": 35.68, "source": "fossil fuels"}, {"Entity": "France", "Year": 2015, "energy": 44.65, "source": "fossil fuels"}, {"Entity": "France", "Year": 2016, "energy": 56.45, "source": "fossil fuels"}, {"Entity": "France", "Year": 2017, "energy": 65.09, "source": "fossil fuels"}, {"Entity": "France", "Year": 2018, "energy": 49.27, "source": "fossil fuels"}, {"Entity": "France", "Year": 2019, "energy": 53.5, "source": "fossil fuels"}, {"Entity": "France", "Year": 2020, "energy": 48.14, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2000, "energy": 367.22, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2001, "energy": 372.69, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2002, "energy": 372.64, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2003, "energy": 390.81, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2004, "energy": 385.24, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2005, "energy": 386.96, "source": "fossil fuels"}, {"Entity": 
"Germany", "Year": 2006, "energy": 390.03, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2007, "energy": 402.4, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2008, "energy": 390.43, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2009, "energy": 358.07, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2010, "energy": 378.9, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2011, "energy": 373.16, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2012, "energy": 377.89, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2013, "energy": 381.52, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2014, "energy": 360.28, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2015, "energy": 359.99, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2016, "energy": 368.67, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2017, "energy": 353.37, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2018, "energy": 334.65, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2019, "energy": 284.09, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2020, "energy": 251.4, "source": "fossil fuels"}, {"Entity": "India", "Year": 2000, "energy": 475.35, "source": "fossil fuels"}, {"Entity": "India", "Year": 2001, "energy": 491.01, "source": "fossil fuels"}, {"Entity": "India", "Year": 2002, "energy": 517.51, "source": "fossil fuels"}, {"Entity": "India", "Year": 2003, "energy": 545.36, "source": "fossil fuels"}, {"Entity": "India", "Year": 2004, "energy": 567.86, "source": "fossil fuels"}, {"Entity": "India", "Year": 2005, "energy": 579.32, "source": "fossil fuels"}, {"Entity": "India", "Year": 2006, "energy": 599.24, "source": "fossil fuels"}, {"Entity": "India", "Year": 2007, "energy": 636.68, "source": "fossil fuels"}, {"Entity": "India", "Year": 2008, "energy": 674.27, "source": "fossil fuels"}, {"Entity": "India", "Year": 2009, "energy": 728.56, "source": "fossil fuels"}, 
{"Entity": "India", "Year": 2010, "energy": 771.78, "source": "fossil fuels"}, {"Entity": "India", "Year": 2011, "energy": 828.16, "source": "fossil fuels"}, {"Entity": "India", "Year": 2012, "energy": 893.45, "source": "fossil fuels"}, {"Entity": "India", "Year": 2013, "energy": 924.93, "source": "fossil fuels"}, {"Entity": "India", "Year": 2014, "energy": 1025.29, "source": "fossil fuels"}, {"Entity": "India", "Year": 2015, "energy": 1080.44, "source": "fossil fuels"}, {"Entity": "India", "Year": 2016, "energy": 1155.52, "source": "fossil fuels"}, {"Entity": "India", "Year": 2017, "energy": 1198.85, "source": "fossil fuels"}, {"Entity": "India", "Year": 2018, "energy": 1276.32, "source": "fossil fuels"}, {"Entity": "India", "Year": 2019, "energy": 1273.59, "source": "fossil fuels"}, {"Entity": "India", "Year": 2020, "energy": 1202.34, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2000, "energy": 78.43, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2001, "energy": 83.96, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2002, "energy": 92.03, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2003, "energy": 97.57, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2004, "energy": 103.8, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2005, "energy": 110.22, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2006, "energy": 116.8, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2007, "energy": 124.1, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2008, "energy": 129.55, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2009, "energy": 136.05, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2010, "energy": 142.88, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2011, "energy": 161.41, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2012, "energy": 177.83, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2013, "energy": 
189.66, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2014, "energy": 203.11, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2015, "energy": 209.71, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2016, "energy": 217.97, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2017, "energy": 222.64, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2018, "energy": 235.41, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2019, "energy": 247.39, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2020, "energy": 238.91, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2000, "energy": 218.28, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2001, "energy": 216.73, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2002, "energy": 228.45, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2003, "energy": 238.52, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2004, "energy": 240.95, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2005, "energy": 247.29, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2006, "energy": 256.03, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2007, "energy": 259.49, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2008, "energy": 254.34, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2009, "energy": 218.32, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2010, "energy": 220.93, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2011, "energy": 216.78, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2012, "energy": 204.26, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2013, "energy": 175.07, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2014, "energy": 156.76, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2015, "energy": 172.06, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2016, "energy": 179.19, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2017, 
"energy": 189.44, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2018, "energy": 172.98, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2019, "energy": 175.52, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2020, "energy": 161.17, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2000, "energy": 578.29, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2001, "energy": 564.95, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2002, "energy": 605.12, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2003, "energy": 633.76, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2004, "energy": 621.6, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2005, "energy": 634.09, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2006, "energy": 628.77, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2007, "energy": 705.37, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2008, "energy": 663.88, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2009, "energy": 611.86, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2010, "energy": 689.89, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2011, "energy": 777.1, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2012, "energy": 920.39, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2013, "energy": 897.88, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2014, "energy": 892.18, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2015, "energy": 844.23, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2016, "energy": 832.4, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2017, "energy": 806.12, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2018, "energy": 780.61, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2019, "energy": 735.66, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2020, "energy": 716.67, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2000, "energy": 44.11, 
"source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2001, "energy": 47.3, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2002, "energy": 49.44, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2003, "energy": 55.24, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2004, "energy": 58.89, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2005, "energy": 60.06, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2006, "energy": 63.89, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2007, "energy": 68.45, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2008, "energy": 72.89, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2009, "energy": 71.85, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2010, "energy": 74.63, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2011, "energy": 78.7, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2012, "energy": 82.98, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2013, "energy": 84.88, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2014, "energy": 86.37, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2015, "energy": 82.2, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2016, "energy": 82.65, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2017, "energy": 91.48, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2018, "energy": 96.36, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2019, "energy": 95.39, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2020, "energy": 96.7, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2000, "energy": 141.8, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2001, "energy": 153.32, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2002, "energy": 159.81, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2003, "energy": 160.45, "source": "fossil 
fuels"}, {"Entity": "Mexico", "Year": 2004, "energy": 173.66, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2005, "energy": 178.76, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2006, "energy": 182.76, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2007, "energy": 191.83, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2008, "energy": 184.51, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2009, "energy": 194.75, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2010, "energy": 207.38, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2011, "energy": 219.88, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2012, "energy": 229.14, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2013, "energy": 231.23, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2014, "energy": 223.43, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2015, "energy": 234.28, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2016, "energy": 239.78, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2017, "energy": 242.69, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2018, "energy": 259.92, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2019, "energy": 248.2, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2020, "energy": 245.46, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2000, "energy": 140.85, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2001, "energy": 140.94, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2002, "energy": 139.72, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2003, "energy": 147.76, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2004, "energy": 149.06, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2005, "energy": 151.2, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2006, "energy": 156.16, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2007, "energy": 153.08, "source": "fossil 
fuels"}, {"Entity": "Poland", "Year": 2008, "energy": 148.03, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2009, "energy": 142.4, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2010, "energy": 146.12, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2011, "energy": 149.88, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2012, "energy": 144.75, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2013, "energy": 146.85, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2014, "energy": 138.53, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2015, "energy": 141.55, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2016, "energy": 143.28, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2017, "energy": 145.8, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2018, "energy": 147.87, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2019, "energy": 137.58, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2020, "energy": 128.91, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2000, "energy": 138.68, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2001, "energy": 146.09, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2002, "energy": 154.91, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2003, "energy": 166.58, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2004, "energy": 173.41, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2005, "energy": 191.05, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2006, "energy": 196.31, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2007, "energy": 204.43, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2008, "energy": 204.2, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2009, "energy": 217.31, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2010, "energy": 240.06, "source": "fossil fuels"}, 
{"Entity": "Saudi Arabia", "Year": 2011, "energy": 250.07, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2012, "energy": 271.68, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2013, "energy": 284.02, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2014, "energy": 311.81, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2015, "energy": 338.34, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2016, "energy": 337.38, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2017, "energy": 354.3, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2018, "energy": 334.7, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2019, "energy": 335.24, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2020, "energy": 337.82, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2000, "energy": 181.67, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2001, "energy": 183.36, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2002, "energy": 188.79, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2003, "energy": 204.39, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2004, "energy": 212.63, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2005, "energy": 215.23, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2006, "energy": 223.25, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2007, "energy": 232.91, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2008, "energy": 226.32, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2009, "energy": 218.17, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2010, "energy": 227.57, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2011, "energy": 229.06, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2012, "energy": 226.84, "source": "fossil fuels"}, {"Entity": "South 
Africa", "Year": 2013, "energy": 223.28, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2014, "energy": 218.42, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2015, "energy": 214.88, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2016, "energy": 213.09, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2017, "energy": 212.77, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2018, "energy": 214.25, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2019, "energy": 208.39, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2020, "energy": 197.5, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2000, "energy": 124.22, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2001, "energy": 120.06, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2002, "energy": 143.72, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2003, "energy": 139.67, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2004, "energy": 159.91, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2005, "energy": 184.65, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2006, "energy": 182.98, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2007, "energy": 188.13, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2008, "energy": 189.55, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2009, "energy": 164.69, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2010, "energy": 138.39, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2011, "energy": 146.12, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2012, "energy": 145.33, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2013, "energy": 113.32, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2014, "energy": 107.37, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2015, "energy": 123.19, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2016, "energy": 107.93, "source": 
"fossil fuels"}, {"Entity": "Spain", "Year": 2017, "energy": 126.93, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2018, "energy": 112.23, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2019, "energy": 111.55, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2020, "energy": 87.64, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2000, "energy": 83.15, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2001, "energy": 88.97, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2002, "energy": 93.51, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2003, "energy": 100.61, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2004, "energy": 109.46, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2005, "energy": 115.58, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2006, "energy": 119.41, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2007, "energy": 122.12, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2008, "energy": 127.43, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2009, "energy": 128.09, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2010, "energy": 141.72, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2011, "energy": 135.31, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2012, "energy": 143.73, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2013, "energy": 148.29, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2014, "energy": 149.26, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2015, "energy": 153.4, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2016, "energy": 161.79, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2017, "energy": 161.88, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2018, "energy": 156.26, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2019, "energy": 162.59, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 
2020, "energy": 154.52, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2000, "energy": 82.65, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2001, "energy": 84.59, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2002, "energy": 85.93, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2003, "energy": 89.52, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2004, "energy": 83.22, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2005, "energy": 84.75, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2006, "energy": 90.09, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2007, "energy": 93.13, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2008, "energy": 90.92, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2009, "energy": 78.58, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2010, "energy": 86.28, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2011, "energy": 93.5, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2012, "energy": 96.99, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2013, "energy": 95.39, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2014, "energy": 83.42, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2015, "energy": 66.91, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2016, "energy": 72.66, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2017, "energy": 57.96, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2018, "energy": 60.81, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2019, "energy": 57.79, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2020, "energy": 54.5, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2000, "energy": 279.34, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2001, "energy": 282.72, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2002, "energy": 285.62, "source": "fossil fuels"}, 
{"Entity": "United Kingdom", "Year": 2003, "energy": 296.15, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2004, "energy": 297.15, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2005, "energy": 296.87, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2006, "energy": 299.88, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2007, "energy": 310.26, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2008, "energy": 310.5, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2009, "energy": 278.73, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2010, "energy": 290.59, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2011, "energy": 260.88, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2012, "energy": 249.25, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2013, "energy": 231.56, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2014, "energy": 206.94, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2015, "energy": 182.43, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2016, "energy": 181.56, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2017, "energy": 165.91, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2018, "energy": 155.41, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2019, "energy": 144.99, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2020, "energy": 124.78, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2000, "energy": 2697.28, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2001, "energy": 2678.68, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2002, "energy": 2727.83, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2003, "energy": 2756.03, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2004, "energy": 2818.28, 
"source": "fossil fuels"}, {"Entity": "United States", "Year": 2005, "energy": 2899.96, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2006, "energy": 2878.56, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2007, "energy": 2988.24, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2008, "energy": 2924.21, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2009, "energy": 2725.41, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2010, "energy": 2882.49, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2011, "energy": 2788.93, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2012, "energy": 2779.02, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2013, "energy": 2746.21, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2014, "energy": 2752.01, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2015, "energy": 2730.32, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2016, "energy": 2656.96, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2017, "energy": 2540.17, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2018, "energy": 2661.3, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2019, "energy": 2588.21, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2020, "energy": 2431.9, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2000, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2001, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2002, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2003, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2004, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2005, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2006, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2007, "energy": 0, 
"source": "nuclear"}, {"Entity": "Australia", "Year": 2008, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2009, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2010, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2011, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2012, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2013, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2014, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2015, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2016, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2017, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2018, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2019, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2020, "energy": 0, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2000, "energy": 4.94, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2001, "energy": 14.27, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2002, "energy": 13.84, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2003, "energy": 13.4, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2004, "energy": 11.6, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2005, "energy": 9.2, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2006, "energy": 12.98, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2007, "energy": 11.65, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2008, "energy": 13.21, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2009, "energy": 12.22, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2010, "energy": 13.77, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2011, "energy": 14.8, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2012, "energy": 15.17, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2013, "energy": 14.65, "source": 
"nuclear"}, {"Entity": "Brazil", "Year": 2014, "energy": 14.46, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2015, "energy": 13.91, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2016, "energy": 14.97, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2017, "energy": 14.86, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2018, "energy": 14.79, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2019, "energy": 15.16, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2020, "energy": 13.21, "source": "nuclear"}, {"Entity": "Canada", "Year": 2000, "energy": 69.16, "source": "nuclear"}, {"Entity": "Canada", "Year": 2001, "energy": 72.86, "source": "nuclear"}, {"Entity": "Canada", "Year": 2002, "energy": 71.75, "source": "nuclear"}, {"Entity": "Canada", "Year": 2003, "energy": 71.15, "source": "nuclear"}, {"Entity": "Canada", "Year": 2004, "energy": 85.87, "source": "nuclear"}, {"Entity": "Canada", "Year": 2005, "energy": 86.83, "source": "nuclear"}, {"Entity": "Canada", "Year": 2006, "energy": 92.44, "source": "nuclear"}, {"Entity": "Canada", "Year": 2007, "energy": 88.19, "source": "nuclear"}, {"Entity": "Canada", "Year": 2008, "energy": 88.3, "source": "nuclear"}, {"Entity": "Canada", "Year": 2009, "energy": 85.13, "source": "nuclear"}, {"Entity": "Canada", "Year": 2010, "energy": 85.53, "source": "nuclear"}, {"Entity": "Canada", "Year": 2011, "energy": 88.29, "source": "nuclear"}, {"Entity": "Canada", "Year": 2012, "energy": 89.49, "source": "nuclear"}, {"Entity": "Canada", "Year": 2013, "energy": 97.58, "source": "nuclear"}, {"Entity": "Canada", "Year": 2014, "energy": 101.21, "source": "nuclear"}, {"Entity": "Canada", "Year": 2015, "energy": 96.05, "source": "nuclear"}, {"Entity": "Canada", "Year": 2016, "energy": 95.69, "source": "nuclear"}, {"Entity": "Canada", "Year": 2017, "energy": 95.57, "source": "nuclear"}, {"Entity": "Canada", "Year": 2018, "energy": 95.03, "source": "nuclear"}, {"Entity": "Canada", "Year": 2019, "energy": 95.47, "source": 
"nuclear"}, {"Entity": "Canada", "Year": 2020, "energy": 92.65, "source": "nuclear"}, {"Entity": "China", "Year": 2000, "energy": 16.74, "source": "nuclear"}, {"Entity": "China", "Year": 2001, "energy": 17.47, "source": "nuclear"}, {"Entity": "China", "Year": 2002, "energy": 25.13, "source": "nuclear"}, {"Entity": "China", "Year": 2003, "energy": 43.34, "source": "nuclear"}, {"Entity": "China", "Year": 2004, "energy": 50.47, "source": "nuclear"}, {"Entity": "China", "Year": 2005, "energy": 53.09, "source": "nuclear"}, {"Entity": "China", "Year": 2006, "energy": 54.84, "source": "nuclear"}, {"Entity": "China", "Year": 2007, "energy": 62.13, "source": "nuclear"}, {"Entity": "China", "Year": 2008, "energy": 68.39, "source": "nuclear"}, {"Entity": "China", "Year": 2009, "energy": 70.05, "source": "nuclear"}, {"Entity": "China", "Year": 2010, "energy": 74.74, "source": "nuclear"}, {"Entity": "China", "Year": 2011, "energy": 87.2, "source": "nuclear"}, {"Entity": "China", "Year": 2012, "energy": 98.32, "source": "nuclear"}, {"Entity": "China", "Year": 2013, "energy": 111.5, "source": "nuclear"}, {"Entity": "China", "Year": 2014, "energy": 133.22, "source": "nuclear"}, {"Entity": "China", "Year": 2015, "energy": 171.38, "source": "nuclear"}, {"Entity": "China", "Year": 2016, "energy": 213.18, "source": "nuclear"}, {"Entity": "China", "Year": 2017, "energy": 248.1, "source": "nuclear"}, {"Entity": "China", "Year": 2018, "energy": 295, "source": "nuclear"}, {"Entity": "China", "Year": 2019, "energy": 348.7, "source": "nuclear"}, {"Entity": "China", "Year": 2020, "energy": 366.2, "source": "nuclear"}, {"Entity": "France", "Year": 2000, "energy": 415.16, "source": "nuclear"}, {"Entity": "France", "Year": 2001, "energy": 421.08, "source": "nuclear"}, {"Entity": "France", "Year": 2002, "energy": 436.76, "source": "nuclear"}, {"Entity": "France", "Year": 2003, "energy": 441.07, "source": "nuclear"}, {"Entity": "France", "Year": 2004, "energy": 448.24, "source": "nuclear"}, 
{"Entity": "France", "Year": 2005, "energy": 451.53, "source": "nuclear"}, {"Entity": "France", "Year": 2006, "energy": 450.19, "source": "nuclear"}, {"Entity": "France", "Year": 2007, "energy": 439.73, "source": "nuclear"}, {"Entity": "France", "Year": 2008, "energy": 439.45, "source": "nuclear"}, {"Entity": "France", "Year": 2009, "energy": 409.74, "source": "nuclear"}, {"Entity": "France", "Year": 2010, "energy": 428.52, "source": "nuclear"}, {"Entity": "France", "Year": 2011, "energy": 442.39, "source": "nuclear"}, {"Entity": "France", "Year": 2012, "energy": 425.41, "source": "nuclear"}, {"Entity": "France", "Year": 2013, "energy": 423.68, "source": "nuclear"}, {"Entity": "France", "Year": 2014, "energy": 436.48, "source": "nuclear"}, {"Entity": "France", "Year": 2015, "energy": 437.43, "source": "nuclear"}, {"Entity": "France", "Year": 2016, "energy": 403.2, "source": "nuclear"}, {"Entity": "France", "Year": 2017, "energy": 398.36, "source": "nuclear"}, {"Entity": "France", "Year": 2018, "energy": 412.94, "source": "nuclear"}, {"Entity": "France", "Year": 2019, "energy": 399.01, "source": "nuclear"}, {"Entity": "France", "Year": 2020, "energy": 353.83, "source": "nuclear"}, {"Entity": "Germany", "Year": 2000, "energy": 169.61, "source": "nuclear"}, {"Entity": "Germany", "Year": 2001, "energy": 171.3, "source": "nuclear"}, {"Entity": "Germany", "Year": 2002, "energy": 164.84, "source": "nuclear"}, {"Entity": "Germany", "Year": 2003, "energy": 165.06, "source": "nuclear"}, {"Entity": "Germany", "Year": 2004, "energy": 167.07, "source": "nuclear"}, {"Entity": "Germany", "Year": 2005, "energy": 163.05, "source": "nuclear"}, {"Entity": "Germany", "Year": 2006, "energy": 167.27, "source": "nuclear"}, {"Entity": "Germany", "Year": 2007, "energy": 140.53, "source": "nuclear"}, {"Entity": "Germany", "Year": 2008, "energy": 148.49, "source": "nuclear"}, {"Entity": "Germany", "Year": 2009, "energy": 134.93, "source": "nuclear"}, {"Entity": "Germany", "Year": 2010, 
"energy": 140.56, "source": "nuclear"}, {"Entity": "Germany", "Year": 2011, "energy": 107.97, "source": "nuclear"}, {"Entity": "Germany", "Year": 2012, "energy": 99.46, "source": "nuclear"}, {"Entity": "Germany", "Year": 2013, "energy": 97.29, "source": "nuclear"}, {"Entity": "Germany", "Year": 2014, "energy": 97.13, "source": "nuclear"}, {"Entity": "Germany", "Year": 2015, "energy": 91.79, "source": "nuclear"}, {"Entity": "Germany", "Year": 2016, "energy": 84.63, "source": "nuclear"}, {"Entity": "Germany", "Year": 2017, "energy": 76.32, "source": "nuclear"}, {"Entity": "Germany", "Year": 2018, "energy": 76, "source": "nuclear"}, {"Entity": "Germany", "Year": 2019, "energy": 75.07, "source": "nuclear"}, {"Entity": "Germany", "Year": 2020, "energy": 64.38, "source": "nuclear"}, {"Entity": "India", "Year": 2000, "energy": 15.77, "source": "nuclear"}, {"Entity": "India", "Year": 2001, "energy": 18.89, "source": "nuclear"}, {"Entity": "India", "Year": 2002, "energy": 19.35, "source": "nuclear"}, {"Entity": "India", "Year": 2003, "energy": 18.14, "source": "nuclear"}, {"Entity": "India", "Year": 2004, "energy": 21.26, "source": "nuclear"}, {"Entity": "India", "Year": 2005, "energy": 17.73, "source": "nuclear"}, {"Entity": "India", "Year": 2006, "energy": 17.63, "source": "nuclear"}, {"Entity": "India", "Year": 2007, "energy": 17.83, "source": "nuclear"}, {"Entity": "India", "Year": 2008, "energy": 15.23, "source": "nuclear"}, {"Entity": "India", "Year": 2009, "energy": 16.82, "source": "nuclear"}, {"Entity": "India", "Year": 2010, "energy": 23.08, "source": "nuclear"}, {"Entity": "India", "Year": 2011, "energy": 32.22, "source": "nuclear"}, {"Entity": "India", "Year": 2012, "energy": 33.14, "source": "nuclear"}, {"Entity": "India", "Year": 2013, "energy": 33.31, "source": "nuclear"}, {"Entity": "India", "Year": 2014, "energy": 34.69, "source": "nuclear"}, {"Entity": "India", "Year": 2015, "energy": 38.31, "source": "nuclear"}, {"Entity": "India", "Year": 2016, "energy": 
37.9, "source": "nuclear"}, {"Entity": "India", "Year": 2017, "energy": 37.41, "source": "nuclear"}, {"Entity": "India", "Year": 2018, "energy": 39.05, "source": "nuclear"}, {"Entity": "India", "Year": 2019, "energy": 45.16, "source": "nuclear"}, {"Entity": "India", "Year": 2020, "energy": 44.61, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2000, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2001, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2002, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2003, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2004, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2005, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2006, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2007, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2008, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2009, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2010, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2011, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2012, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2013, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2014, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2015, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2016, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2017, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2018, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2019, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2020, "energy": null, "source": "nuclear"}, {"Entity": "Italy", "Year": 2000, "energy": 0, "source": "nuclear"}, {"Entity": 
"Italy", "Year": 2001, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2002, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2003, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2004, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2005, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2006, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2007, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2008, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2009, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2010, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2011, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2012, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2013, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2014, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2015, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2016, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2017, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2018, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2019, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2020, "energy": 0, "source": "nuclear"}, {"Entity": "Japan", "Year": 2000, "energy": 305.95, "source": "nuclear"}, {"Entity": "Japan", "Year": 2001, "energy": 303.86, "source": "nuclear"}, {"Entity": "Japan", "Year": 2002, "energy": 280.34, "source": "nuclear"}, {"Entity": "Japan", "Year": 2003, "energy": 228.01, "source": "nuclear"}, {"Entity": "Japan", "Year": 2004, "energy": 268.32, "source": "nuclear"}, {"Entity": "Japan", "Year": 2005, "energy": 280.5, "source": "nuclear"}, {"Entity": "Japan", "Year": 2006, "energy": 291.54, "source": "nuclear"}, {"Entity": "Japan", "Year": 2007, "energy": 267.34, "source": "nuclear"}, {"Entity": "Japan", "Year": 2008, 
"energy": 241.25, "source": "nuclear"}, {"Entity": "Japan", "Year": 2009, "energy": 263.05, "source": "nuclear"}, {"Entity": "Japan", "Year": 2010, "energy": 278.36, "source": "nuclear"}, {"Entity": "Japan", "Year": 2011, "energy": 153.38, "source": "nuclear"}, {"Entity": "Japan", "Year": 2012, "energy": 15.12, "source": "nuclear"}, {"Entity": "Japan", "Year": 2013, "energy": 10.43, "source": "nuclear"}, {"Entity": "Japan", "Year": 2014, "energy": 0, "source": "nuclear"}, {"Entity": "Japan", "Year": 2015, "energy": 3.24, "source": "nuclear"}, {"Entity": "Japan", "Year": 2016, "energy": 14.87, "source": "nuclear"}, {"Entity": "Japan", "Year": 2017, "energy": 27.75, "source": "nuclear"}, {"Entity": "Japan", "Year": 2018, "energy": 47.82, "source": "nuclear"}, {"Entity": "Japan", "Year": 2019, "energy": 63.88, "source": "nuclear"}, {"Entity": "Japan", "Year": 2020, "energy": 41.86, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2000, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2001, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2002, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2003, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2004, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2005, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2006, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2007, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2008, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2009, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2010, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2011, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2012, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2013, "energy": null, "source": "nuclear"}, 
{"Entity": "Kazakhstan", "Year": 2014, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2015, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2016, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2017, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2018, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2019, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2020, "energy": null, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2000, "energy": 7.81, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2001, "energy": 8.29, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2002, "energy": 9.26, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2003, "energy": 9.98, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2004, "energy": 8.73, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2005, "energy": 10.32, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2006, "energy": 10.4, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2007, "energy": 9.95, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2008, "energy": 9.36, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2009, "energy": 10.11, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2010, "energy": 5.66, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2011, "energy": 9.66, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2012, "energy": 8.41, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2013, "energy": 11.38, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2014, "energy": 9.3, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2015, "energy": 11.18, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2016, "energy": 10.27, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2017, "energy": 10.57, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2018, "energy": 13.32, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2019, "energy": 10.88, "source": 
"nuclear"}, {"Entity": "Mexico", "Year": 2020, "energy": 10.87, "source": "nuclear"}, {"Entity": "Poland", "Year": 2000, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2001, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2002, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2003, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2004, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2005, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2006, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2007, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2008, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2009, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2010, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2011, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2012, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2013, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2014, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2015, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2016, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2017, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2018, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2019, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2020, "energy": 0, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2000, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2001, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2002, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2003, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2004, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2005, 
"energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2006, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2007, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2008, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2009, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2010, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2011, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2012, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2013, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2014, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2015, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2016, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2017, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2018, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2019, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2020, "energy": null, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2000, "energy": 13.01, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2001, "energy": 10.72, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2002, "energy": 11.99, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2003, "energy": 12.66, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2004, "energy": 14.28, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2005, "energy": 12.24, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2006, "energy": 10.07, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2007, "energy": 12.6, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2008, "energy": 12.75, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2009, "energy": 11.57, 
"source": "nuclear"}, {"Entity": "South Africa", "Year": 2010, "energy": 12.9, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2011, "energy": 12.94, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2012, "energy": 12.4, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2013, "energy": 13.61, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2014, "energy": 14.76, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2015, "energy": 10.97, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2016, "energy": 15.21, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2017, "energy": 15.09, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2018, "energy": 10.56, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2019, "energy": 13.6, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2020, "energy": 11.62, "source": "nuclear"}, {"Entity": "Spain", "Year": 2000, "energy": 62.21, "source": "nuclear"}, {"Entity": "Spain", "Year": 2001, "energy": 63.71, "source": "nuclear"}, {"Entity": "Spain", "Year": 2002, "energy": 63.02, "source": "nuclear"}, {"Entity": "Spain", "Year": 2003, "energy": 61.88, "source": "nuclear"}, {"Entity": "Spain", "Year": 2004, "energy": 63.61, "source": "nuclear"}, {"Entity": "Spain", "Year": 2005, "energy": 57.54, "source": "nuclear"}, {"Entity": "Spain", "Year": 2006, "energy": 60.13, "source": "nuclear"}, {"Entity": "Spain", "Year": 2007, "energy": 55.1, "source": "nuclear"}, {"Entity": "Spain", "Year": 2008, "energy": 58.97, "source": "nuclear"}, {"Entity": "Spain", "Year": 2009, "energy": 52.76, "source": "nuclear"}, {"Entity": "Spain", "Year": 2010, "energy": 61.99, "source": "nuclear"}, {"Entity": "Spain", "Year": 2011, "energy": 57.72, "source": "nuclear"}, {"Entity": "Spain", "Year": 2012, "energy": 61.47, "source": "nuclear"}, {"Entity": "Spain", "Year": 2013, "energy": 56.73, "source": "nuclear"}, {"Entity": "Spain", "Year": 2014, "energy": 57.31, "source": "nuclear"}, 
{"Entity": "Spain", "Year": 2015, "energy": 57.2, "source": "nuclear"}, {"Entity": "Spain", "Year": 2016, "energy": 58.63, "source": "nuclear"}, {"Entity": "Spain", "Year": 2017, "energy": 58.04, "source": "nuclear"}, {"Entity": "Spain", "Year": 2018, "energy": 55.77, "source": "nuclear"}, {"Entity": "Spain", "Year": 2019, "energy": 58.35, "source": "nuclear"}, {"Entity": "Spain", "Year": 2020, "energy": 58.3, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2000, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2001, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2002, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2003, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2004, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2005, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2006, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2007, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2008, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2009, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2010, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2011, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2012, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2013, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2014, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2015, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2016, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2017, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2018, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2019, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2020, "energy": 0, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2000, "energy": 
77.34, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2001, "energy": 76.17, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2002, "energy": 77.99, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2003, "energy": 81.41, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2004, "energy": 87.02, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2005, "energy": 88.76, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2006, "energy": 90.22, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2007, "energy": 92.54, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2008, "energy": 89.84, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2009, "energy": 82.92, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2010, "energy": 89.15, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2011, "energy": 90.25, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2012, "energy": 90.14, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2013, "energy": 83.21, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2014, "energy": 88.39, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2015, "energy": 87.63, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2016, "energy": 80.95, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2017, "energy": 85.58, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2018, "energy": 84.4, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2019, "energy": 83, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2020, "energy": 76.2, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2000, "energy": 85.06, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2001, "energy": 90.09, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2002, "energy": 87.85, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2003, "energy": 88.69, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2004, "energy": 80, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2005, "energy": 81.62, 
"source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2006, "energy": 75.45, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2007, "energy": 63.03, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2008, "energy": 52.49, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2009, "energy": 69.1, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2010, "energy": 62.14, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2011, "energy": 68.98, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2012, "energy": 70.4, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2013, "energy": 70.61, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2014, "energy": 63.75, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2015, "energy": 70.34, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2016, "energy": 71.73, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2017, "energy": 70.34, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2018, "energy": 65.06, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2019, "energy": 56.18, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2020, "energy": 50.85, "source": "nuclear"}, {"Entity": "United States", "Year": 2000, "energy": 753.89, "source": "nuclear"}, {"Entity": "United States", "Year": 2001, "energy": 768.83, "source": "nuclear"}, {"Entity": "United States", "Year": 2002, "energy": 780.06, "source": "nuclear"}, {"Entity": "United States", "Year": 2003, "energy": 763.73, "source": "nuclear"}, {"Entity": "United States", "Year": 2004, "energy": 788.53, "source": "nuclear"}, {"Entity": "United States", "Year": 2005, "energy": 781.99, "source": "nuclear"}, {"Entity": "United States", "Year": 2006, "energy": 787.22, "source": "nuclear"}, {"Entity": "United States", "Year": 2007, "energy": 806.42, "source": "nuclear"}, {"Entity": "United States", "Year": 2008, "energy": 806.21, "source": "nuclear"}, {"Entity": 
"United States", "Year": 2009, "energy": 798.85, "source": "nuclear"}, {"Entity": "United States", "Year": 2010, "energy": 806.97, "source": "nuclear"}, {"Entity": "United States", "Year": 2011, "energy": 790.2, "source": "nuclear"}, {"Entity": "United States", "Year": 2012, "energy": 769.33, "source": "nuclear"}, {"Entity": "United States", "Year": 2013, "energy": 789.02, "source": "nuclear"}, {"Entity": "United States", "Year": 2014, "energy": 797.17, "source": "nuclear"}, {"Entity": "United States", "Year": 2015, "energy": 797.18, "source": "nuclear"}, {"Entity": "United States", "Year": 2016, "energy": 805.69, "source": "nuclear"}, {"Entity": "United States", "Year": 2017, "energy": 804.95, "source": "nuclear"}, {"Entity": "United States", "Year": 2018, "energy": 807.08, "source": "nuclear"}, {"Entity": "United States", "Year": 2019, "energy": 809.41, "source": "nuclear"}, {"Entity": "United States", "Year": 2020, "energy": 789.88, "source": "nuclear"}, {"Entity": "Australia", "Year": 2000, "energy": 17.11, "source": "renewables"}, {"Entity": "Australia", "Year": 2001, "energy": 17.4, "source": "renewables"}, {"Entity": "Australia", "Year": 2002, "energy": 17.35, "source": "renewables"}, {"Entity": "Australia", "Year": 2003, "energy": 18.5, "source": "renewables"}, {"Entity": "Australia", "Year": 2004, "energy": 19.41, "source": "renewables"}, {"Entity": "Australia", "Year": 2005, "energy": 19.75, "source": "renewables"}, {"Entity": "Australia", "Year": 2006, "energy": 21.19, "source": "renewables"}, {"Entity": "Australia", "Year": 2007, "energy": 20.93, "source": "renewables"}, {"Entity": "Australia", "Year": 2008, "energy": 18.49, "source": "renewables"}, {"Entity": "Australia", "Year": 2009, "energy": 18.32, "source": "renewables"}, {"Entity": "Australia", "Year": 2010, "energy": 21.13, "source": "renewables"}, {"Entity": "Australia", "Year": 2011, "energy": 27.33, "source": "renewables"}, {"Entity": "Australia", "Year": 2012, "energy": 26.63, "source": 
"renewables"}, {"Entity": "Australia", "Year": 2013, "energy": 34.2, "source": "renewables"}, {"Entity": "Australia", "Year": 2014, "energy": 36.15, "source": "renewables"}, {"Entity": "Australia", "Year": 2015, "energy": 33.12, "source": "renewables"}, {"Entity": "Australia", "Year": 2016, "energy": 38.41, "source": "renewables"}, {"Entity": "Australia", "Year": 2017, "energy": 40.77, "source": "renewables"}, {"Entity": "Australia", "Year": 2018, "energy": 42.93, "source": "renewables"}, {"Entity": "Australia", "Year": 2019, "energy": 53.41, "source": "renewables"}, {"Entity": "Australia", "Year": 2020, "energy": 63.99, "source": "renewables"}, {"Entity": "Brazil", "Year": 2000, "energy": 308.77, "source": "renewables"}, {"Entity": "Brazil", "Year": 2001, "energy": 273.71, "source": "renewables"}, {"Entity": "Brazil", "Year": 2002, "energy": 292.95, "source": "renewables"}, {"Entity": "Brazil", "Year": 2003, "energy": 313.88, "source": "renewables"}, {"Entity": "Brazil", "Year": 2004, "energy": 329.43, "source": "renewables"}, {"Entity": "Brazil", "Year": 2005, "energy": 346.96, "source": "renewables"}, {"Entity": "Brazil", "Year": 2006, "energy": 359.55, "source": "renewables"}, {"Entity": "Brazil", "Year": 2007, "energy": 387.88, "source": "renewables"}, {"Entity": "Brazil", "Year": 2008, "energy": 385.61, "source": "renewables"}, {"Entity": "Brazil", "Year": 2009, "energy": 410.13, "source": "renewables"}, {"Entity": "Brazil", "Year": 2010, "energy": 435.99, "source": "renewables"}, {"Entity": "Brazil", "Year": 2011, "energy": 462.32, "source": "renewables"}, {"Entity": "Brazil", "Year": 2012, "energy": 454.78, "source": "renewables"}, {"Entity": "Brazil", "Year": 2013, "energy": 436.84, "source": "renewables"}, {"Entity": "Brazil", "Year": 2014, "energy": 430.82, "source": "renewables"}, {"Entity": "Brazil", "Year": 2015, "energy": 428.81, "source": "renewables"}, {"Entity": "Brazil", "Year": 2016, "energy": 463.37, "source": "renewables"}, {"Entity": 
"Brazil", "Year": 2017, "energy": 464.4, "source": "renewables"}, {"Entity": "Brazil", "Year": 2018, "energy": 492.66, "source": "renewables"}, {"Entity": "Brazil", "Year": 2019, "energy": 512.59, "source": "renewables"}, {"Entity": "Brazil", "Year": 2020, "energy": 520.01, "source": "renewables"}, {"Entity": "Canada", "Year": 2000, "energy": 363.7, "source": "renewables"}, {"Entity": "Canada", "Year": 2001, "energy": 339.58, "source": "renewables"}, {"Entity": "Canada", "Year": 2002, "energy": 357.06, "source": "renewables"}, {"Entity": "Canada", "Year": 2003, "energy": 343.88, "source": "renewables"}, {"Entity": "Canada", "Year": 2004, "energy": 347.68, "source": "renewables"}, {"Entity": "Canada", "Year": 2005, "energy": 368.86, "source": "renewables"}, {"Entity": "Canada", "Year": 2006, "energy": 360.48, "source": "renewables"}, {"Entity": "Canada", "Year": 2007, "energy": 375.42, "source": "renewables"}, {"Entity": "Canada", "Year": 2008, "energy": 385.21, "source": "renewables"}, {"Entity": "Canada", "Year": 2009, "energy": 380.24, "source": "renewables"}, {"Entity": "Canada", "Year": 2010, "energy": 366.21, "source": "renewables"}, {"Entity": "Canada", "Year": 2011, "energy": 391.95, "source": "renewables"}, {"Entity": "Canada", "Year": 2012, "energy": 398.58, "source": "renewables"}, {"Entity": "Canada", "Year": 2013, "energy": 417.28, "source": "renewables"}, {"Entity": "Canada", "Year": 2014, "energy": 412.13, "source": "renewables"}, {"Entity": "Canada", "Year": 2015, "energy": 417.2, "source": "renewables"}, {"Entity": "Canada", "Year": 2016, "energy": 426.84, "source": "renewables"}, {"Entity": "Canada", "Year": 2017, "energy": 435.43, "source": "renewables"}, {"Entity": "Canada", "Year": 2018, "energy": 428.39, "source": "renewables"}, {"Entity": "Canada", "Year": 2019, "energy": 421.8, "source": "renewables"}, {"Entity": "Canada", "Year": 2020, "energy": 429.24, "source": "renewables"}, {"Entity": "China", "Year": 2000, "energy": 225.56, "source": 
"renewables"}, {"Entity": "China", "Year": 2001, "energy": 280.73, "source": "renewables"}, {"Entity": "China", "Year": 2002, "energy": 291.41, "source": "renewables"}, {"Entity": "China", "Year": 2003, "energy": 287.28, "source": "renewables"}, {"Entity": "China", "Year": 2004, "energy": 357.43, "source": "renewables"}, {"Entity": "China", "Year": 2005, "energy": 404.37, "source": "renewables"}, {"Entity": "China", "Year": 2006, "energy": 446.72, "source": "renewables"}, {"Entity": "China", "Year": 2007, "energy": 500.71, "source": "renewables"}, {"Entity": "China", "Year": 2008, "energy": 665.08, "source": "renewables"}, {"Entity": "China", "Year": 2009, "energy": 664.39, "source": "renewables"}, {"Entity": "China", "Year": 2010, "energy": 786.38, "source": "renewables"}, {"Entity": "China", "Year": 2011, "energy": 792.38, "source": "renewables"}, {"Entity": "China", "Year": 2012, "energy": 999.56, "source": "renewables"}, {"Entity": "China", "Year": 2013, "energy": 1093.37, "source": "renewables"}, {"Entity": "China", "Year": 2014, "energy": 1289.23, "source": "renewables"}, {"Entity": "China", "Year": 2015, "energy": 1393.66, "source": "renewables"}, {"Entity": "China", "Year": 2016, "energy": 1522.79, "source": "renewables"}, {"Entity": "China", "Year": 2017, "energy": 1667.06, "source": "renewables"}, {"Entity": "China", "Year": 2018, "energy": 1835.32, "source": "renewables"}, {"Entity": "China", "Year": 2019, "energy": 2014.57, "source": "renewables"}, {"Entity": "China", "Year": 2020, "energy": 2184.94, "source": "renewables"}, {"Entity": "France", "Year": 2000, "energy": 67.83, "source": "renewables"}, {"Entity": "France", "Year": 2001, "energy": 76.09, "source": "renewables"}, {"Entity": "France", "Year": 2002, "energy": 62.69, "source": "renewables"}, {"Entity": "France", "Year": 2003, "energy": 61.47, "source": "renewables"}, {"Entity": "France", "Year": 2004, "energy": 62.42, "source": "renewables"}, {"Entity": "France", "Year": 2005, "energy": 54.98, 
"source": "renewables"}, {"Entity": "France", "Year": 2006, "energy": 60.91, "source": "renewables"}, {"Entity": "France", "Year": 2007, "energy": 64.3, "source": "renewables"}, {"Entity": "France", "Year": 2008, "energy": 72.33, "source": "renewables"}, {"Entity": "France", "Year": 2009, "energy": 68.15, "source": "renewables"}, {"Entity": "France", "Year": 2010, "energy": 76.68, "source": "renewables"}, {"Entity": "France", "Year": 2011, "energy": 66.02, "source": "renewables"}, {"Entity": "France", "Year": 2012, "energy": 85.25, "source": "renewables"}, {"Entity": "France", "Year": 2013, "energy": 99.42, "source": "renewables"}, {"Entity": "France", "Year": 2014, "energy": 94.03, "source": "renewables"}, {"Entity": "France", "Year": 2015, "energy": 91.84, "source": "renewables"}, {"Entity": "France", "Year": 2016, "energy": 99, "source": "renewables"}, {"Entity": "France", "Year": 2017, "energy": 92.63, "source": "renewables"}, {"Entity": "France", "Year": 2018, "energy": 113.62, "source": "renewables"}, {"Entity": "France", "Year": 2019, "energy": 113.21, "source": "renewables"}, {"Entity": "France", "Year": 2020, "energy": 125.28, "source": "renewables"}, {"Entity": "Germany", "Year": 2000, "energy": 35.47, "source": "renewables"}, {"Entity": "Germany", "Year": 2001, "energy": 37.9, "source": "renewables"}, {"Entity": "Germany", "Year": 2002, "energy": 44.48, "source": "renewables"}, {"Entity": "Germany", "Year": 2003, "energy": 46.67, "source": "renewables"}, {"Entity": "Germany", "Year": 2004, "energy": 57.97, "source": "renewables"}, {"Entity": "Germany", "Year": 2005, "energy": 63.4, "source": "renewables"}, {"Entity": "Germany", "Year": 2006, "energy": 72.51, "source": "renewables"}, {"Entity": "Germany", "Year": 2007, "energy": 89.38, "source": "renewables"}, {"Entity": "Germany", "Year": 2008, "energy": 94.28, "source": "renewables"}, {"Entity": "Germany", "Year": 2009, "energy": 95.94, "source": "renewables"}, {"Entity": "Germany", "Year": 2010, 
"energy": 105.18, "source": "renewables"}, {"Entity": "Germany", "Year": 2011, "energy": 124.04, "source": "renewables"}, {"Entity": "Germany", "Year": 2012, "energy": 143.04, "source": "renewables"}, {"Entity": "Germany", "Year": 2013, "energy": 152.34, "source": "renewables"}, {"Entity": "Germany", "Year": 2014, "energy": 162.54, "source": "renewables"}, {"Entity": "Germany", "Year": 2015, "energy": 188.79, "source": "renewables"}, {"Entity": "Germany", "Year": 2016, "energy": 189.67, "source": "renewables"}, {"Entity": "Germany", "Year": 2017, "energy": 216.32, "source": "renewables"}, {"Entity": "Germany", "Year": 2018, "energy": 222.07, "source": "renewables"}, {"Entity": "Germany", "Year": 2019, "energy": 240.33, "source": "renewables"}, {"Entity": "Germany", "Year": 2020, "energy": 251.48, "source": "renewables"}, {"Entity": "India", "Year": 2000, "energy": 80.27, "source": "renewables"}, {"Entity": "India", "Year": 2001, "energy": 76.19, "source": "renewables"}, {"Entity": "India", "Year": 2002, "energy": 72.78, "source": "renewables"}, {"Entity": "India", "Year": 2003, "energy": 74.63, "source": "renewables"}, {"Entity": "India", "Year": 2004, "energy": 109.2, "source": "renewables"}, {"Entity": "India", "Year": 2005, "energy": 107.47, "source": "renewables"}, {"Entity": "India", "Year": 2006, "energy": 127.56, "source": "renewables"}, {"Entity": "India", "Year": 2007, "energy": 141.75, "source": "renewables"}, {"Entity": "India", "Year": 2008, "energy": 138.91, "source": "renewables"}, {"Entity": "India", "Year": 2009, "energy": 134.33, "source": "renewables"}, {"Entity": "India", "Year": 2010, "energy": 142.61, "source": "renewables"}, {"Entity": "India", "Year": 2011, "energy": 173.62, "source": "renewables"}, {"Entity": "India", "Year": 2012, "energy": 165.25, "source": "renewables"}, {"Entity": "India", "Year": 2013, "energy": 187.9, "source": "renewables"}, {"Entity": "India", "Year": 2014, "energy": 202.04, "source": "renewables"}, {"Entity": 
"India", "Year": 2015, "energy": 203.21, "source": "renewables"}, {"Entity": "India", "Year": 2016, "energy": 208.21, "source": "renewables"}, {"Entity": "India", "Year": 2017, "energy": 234.9, "source": "renewables"}, {"Entity": "India", "Year": 2018, "energy": 263.61, "source": "renewables"}, {"Entity": "India", "Year": 2019, "energy": 303.16, "source": "renewables"}, {"Entity": "India", "Year": 2020, "energy": 315.76, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2000, "energy": 19.6, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2001, "energy": 22.19, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2002, "energy": 21, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2003, "energy": 19.82, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2004, "energy": 20.97, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2005, "energy": 22.66, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2006, "energy": 21.18, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2007, "energy": 24.29, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2008, "energy": 26.34, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2009, "energy": 26.79, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2010, "energy": 34.63, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2011, "energy": 30.46, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2012, "energy": 31.11, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2013, "energy": 35.5, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2014, "energy": 34.41, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2015, "energy": 33.56, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2016, "energy": 39.58, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2017, "energy": 43.17, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2018, "energy": 48.38, "source": "renewables"}, {"Entity": "Indonesia", "Year": 
2019, "energy": 48.04, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2020, "energy": 52.91, "source": "renewables"}, {"Entity": "Italy", "Year": 2000, "energy": 50.87, "source": "renewables"}, {"Entity": "Italy", "Year": 2001, "energy": 54.35, "source": "renewables"}, {"Entity": "Italy", "Year": 2002, "energy": 48.31, "source": "renewables"}, {"Entity": "Italy", "Year": 2003, "energy": 46.86, "source": "renewables"}, {"Entity": "Italy", "Year": 2004, "energy": 53.88, "source": "renewables"}, {"Entity": "Italy", "Year": 2005, "energy": 48.43, "source": "renewables"}, {"Entity": "Italy", "Year": 2006, "energy": 50.64, "source": "renewables"}, {"Entity": "Italy", "Year": 2007, "energy": 47.72, "source": "renewables"}, {"Entity": "Italy", "Year": 2008, "energy": 58.16, "source": "renewables"}, {"Entity": "Italy", "Year": 2009, "energy": 69.26, "source": "renewables"}, {"Entity": "Italy", "Year": 2010, "energy": 76.98, "source": "renewables"}, {"Entity": "Italy", "Year": 2011, "energy": 82.96, "source": "renewables"}, {"Entity": "Italy", "Year": 2012, "energy": 92.22, "source": "renewables"}, {"Entity": "Italy", "Year": 2013, "energy": 112, "source": "renewables"}, {"Entity": "Italy", "Year": 2014, "energy": 120.68, "source": "renewables"}, {"Entity": "Italy", "Year": 2015, "energy": 108.89, "source": "renewables"}, {"Entity": "Italy", "Year": 2016, "energy": 108.01, "source": "renewables"}, {"Entity": "Italy", "Year": 2017, "energy": 103.89, "source": "renewables"}, {"Entity": "Italy", "Year": 2018, "energy": 114.41, "source": "renewables"}, {"Entity": "Italy", "Year": 2019, "energy": 115.83, "source": "renewables"}, {"Entity": "Italy", "Year": 2020, "energy": 116.9, "source": "renewables"}, {"Entity": "Japan", "Year": 2000, "energy": 104.16, "source": "renewables"}, {"Entity": "Japan", "Year": 2001, "energy": 101.36, "source": "renewables"}, {"Entity": "Japan", "Year": 2002, "energy": 101.1, "source": "renewables"}, {"Entity": "Japan", "Year": 2003, 
"energy": 114.18, "source": "renewables"}, {"Entity": "Japan", "Year": 2004, "energy": 114.73, "source": "renewables"}, {"Entity": "Japan", "Year": 2005, "energy": 100.57, "source": "renewables"}, {"Entity": "Japan", "Year": 2006, "energy": 112.07, "source": "renewables"}, {"Entity": "Japan", "Year": 2007, "energy": 100.8, "source": "renewables"}, {"Entity": "Japan", "Year": 2008, "energy": 100.79, "source": "renewables"}, {"Entity": "Japan", "Year": 2009, "energy": 102.28, "source": "renewables"}, {"Entity": "Japan", "Year": 2010, "energy": 113.92, "source": "renewables"}, {"Entity": "Japan", "Year": 2011, "energy": 116.5, "source": "renewables"}, {"Entity": "Japan", "Year": 2012, "energy": 111.09, "source": "renewables"}, {"Entity": "Japan", "Year": 2013, "energy": 121.48, "source": "renewables"}, {"Entity": "Japan", "Year": 2014, "energy": 136.53, "source": "renewables"}, {"Entity": "Japan", "Year": 2015, "energy": 157.34, "source": "renewables"}, {"Entity": "Japan", "Year": 2016, "energy": 157.7, "source": "renewables"}, {"Entity": "Japan", "Year": 2017, "energy": 175.12, "source": "renewables"}, {"Entity": "Japan", "Year": 2018, "energy": 183.63, "source": "renewables"}, {"Entity": "Japan", "Year": 2019, "energy": 192.72, "source": "renewables"}, {"Entity": "Japan", "Year": 2020, "energy": 205.6, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2000, "energy": 7.53, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2001, "energy": 8.08, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2002, "energy": 8.89, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2003, "energy": 8.62, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2004, "energy": 8.06, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2005, "energy": 7.86, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2006, "energy": 7.77, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2007, "energy": 8.17, "source": "renewables"}, {"Entity": 
"Kazakhstan", "Year": 2008, "energy": 7.46, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2009, "energy": 6.88, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2010, "energy": 8.02, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2011, "energy": 7.88, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2012, "energy": 7.64, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2013, "energy": 7.73, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2014, "energy": 8.27, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2015, "energy": 9.45, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2016, "energy": 11.98, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2017, "energy": 11.64, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2018, "energy": 10.91, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2019, "energy": 11.09, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2020, "energy": 11.94, "source": "renewables"}, {"Entity": "Mexico", "Year": 2000, "energy": 44.51, "source": "renewables"}, {"Entity": "Mexico", "Year": 2001, "energy": 39.56, "source": "renewables"}, {"Entity": "Mexico", "Year": 2002, "energy": 35.67, "source": "renewables"}, {"Entity": "Mexico", "Year": 2003, "energy": 32.11, "source": "renewables"}, {"Entity": "Mexico", "Year": 2004, "energy": 38.19, "source": "renewables"}, {"Entity": "Mexico", "Year": 2005, "energy": 42.29, "source": "renewables"}, {"Entity": "Mexico", "Year": 2006, "energy": 43.63, "source": "renewables"}, {"Entity": "Mexico", "Year": 2007, "energy": 42.14, "source": "renewables"}, {"Entity": "Mexico", "Year": 2008, "energy": 53.22, "source": "renewables"}, {"Entity": "Mexico", "Year": 2009, "energy": 40.59, "source": "renewables"}, {"Entity": "Mexico", "Year": 2010, "energy": 51.37, "source": "renewables"}, {"Entity": "Mexico", "Year": 2011, "energy": 50.7, "source": "renewables"}, {"Entity": "Mexico", "Year": 2012, 
"energy": 47.2, "source": "renewables"}, {"Entity": "Mexico", "Year": 2013, "energy": 44.67, "source": "renewables"}, {"Entity": "Mexico", "Year": 2014, "energy": 57.46, "source": "renewables"}, {"Entity": "Mexico", "Year": 2015, "energy": 52.42, "source": "renewables"}, {"Entity": "Mexico", "Year": 2016, "energy": 52.97, "source": "renewables"}, {"Entity": "Mexico", "Year": 2017, "energy": 55.88, "source": "renewables"}, {"Entity": "Mexico", "Year": 2018, "energy": 58.78, "source": "renewables"}, {"Entity": "Mexico", "Year": 2019, "energy": 59, "source": "renewables"}, {"Entity": "Mexico", "Year": 2020, "energy": 69.19, "source": "renewables"}, {"Entity": "Poland", "Year": 2000, "energy": 2.33, "source": "renewables"}, {"Entity": "Poland", "Year": 2001, "energy": 2.78, "source": "renewables"}, {"Entity": "Poland", "Year": 2002, "energy": 2.77, "source": "renewables"}, {"Entity": "Poland", "Year": 2003, "energy": 2.25, "source": "renewables"}, {"Entity": "Poland", "Year": 2004, "energy": 3.2, "source": "renewables"}, {"Entity": "Poland", "Year": 2005, "energy": 3.85, "source": "renewables"}, {"Entity": "Poland", "Year": 2006, "energy": 4.29, "source": "renewables"}, {"Entity": "Poland", "Year": 2007, "energy": 5.43, "source": "renewables"}, {"Entity": "Poland", "Year": 2008, "energy": 6.61, "source": "renewables"}, {"Entity": "Poland", "Year": 2009, "energy": 8.69, "source": "renewables"}, {"Entity": "Poland", "Year": 2010, "energy": 10.88, "source": "renewables"}, {"Entity": "Poland", "Year": 2011, "energy": 13.13, "source": "renewables"}, {"Entity": "Poland", "Year": 2012, "energy": 16.88, "source": "renewables"}, {"Entity": "Poland", "Year": 2013, "energy": 17.06, "source": "renewables"}, {"Entity": "Poland", "Year": 2014, "energy": 19.85, "source": "renewables"}, {"Entity": "Poland", "Year": 2015, "energy": 22.69, "source": "renewables"}, {"Entity": "Poland", "Year": 2016, "energy": 22.81, "source": "renewables"}, {"Entity": "Poland", "Year": 2017, "energy": 
24.13, "source": "renewables"}, {"Entity": "Poland", "Year": 2018, "energy": 21.62, "source": "renewables"}, {"Entity": "Poland", "Year": 2019, "energy": 25.46, "source": "renewables"}, {"Entity": "Poland", "Year": 2020, "energy": 28.23, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2000, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2001, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2002, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2003, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2004, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2005, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2006, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2007, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2008, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2009, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2010, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2011, "energy": 0.01, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2012, "energy": 0.03, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2013, "energy": 0.04, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2014, "energy": 0.05, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2015, "energy": 0.05, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2016, "energy": 0.05, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2017, "energy": 0.07, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2018, "energy": 0.16, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2019, "energy": 0.21, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2020, "energy": 0.21, "source": "renewables"}, {"Entity": "South Africa", "Year": 2000, 
"energy": 1.79, "source": "renewables"}, {"Entity": "South Africa", "Year": 2001, "energy": 2.46, "source": "renewables"}, {"Entity": "South Africa", "Year": 2002, "energy": 2.81, "source": "renewables"}, {"Entity": "South Africa", "Year": 2003, "energy": 1.19, "source": "renewables"}, {"Entity": "South Africa", "Year": 2004, "energy": 1.33, "source": "renewables"}, {"Entity": "South Africa", "Year": 2005, "energy": 1.75, "source": "renewables"}, {"Entity": "South Africa", "Year": 2006, "energy": 3.28, "source": "renewables"}, {"Entity": "South Africa", "Year": 2007, "energy": 1.3, "source": "renewables"}, {"Entity": "South Africa", "Year": 2008, "energy": 1.66, "source": "renewables"}, {"Entity": "South Africa", "Year": 2009, "energy": 1.86, "source": "renewables"}, {"Entity": "South Africa", "Year": 2010, "energy": 2.51, "source": "renewables"}, {"Entity": "South Africa", "Year": 2011, "energy": 2.49, "source": "renewables"}, {"Entity": "South Africa", "Year": 2012, "energy": 1.66, "source": "renewables"}, {"Entity": "South Africa", "Year": 2013, "energy": 1.62, "source": "renewables"}, {"Entity": "South Africa", "Year": 2014, "energy": 3.38, "source": "renewables"}, {"Entity": "South Africa", "Year": 2015, "energy": 6.09, "source": "renewables"}, {"Entity": "South Africa", "Year": 2016, "energy": 7.69, "source": "renewables"}, {"Entity": "South Africa", "Year": 2017, "energy": 10.04, "source": "renewables"}, {"Entity": "South Africa", "Year": 2018, "energy": 12.22, "source": "renewables"}, {"Entity": "South Africa", "Year": 2019, "energy": 12.57, "source": "renewables"}, {"Entity": "South Africa", "Year": 2020, "energy": 12.83, "source": "renewables"}, {"Entity": "Spain", "Year": 2000, "energy": 34.49, "source": "renewables"}, {"Entity": "Spain", "Year": 2001, "energy": 49.3, "source": "renewables"}, {"Entity": "Spain", "Year": 2002, "energy": 33.17, "source": "renewables"}, {"Entity": "Spain", "Year": 2003, "energy": 55.75, "source": "renewables"}, {"Entity": 
"Spain", "Year": 2004, "energy": 50.13, "source": "renewables"}, {"Entity": "Spain", "Year": 2005, "energy": 42.27, "source": "renewables"}, {"Entity": "Spain", "Year": 2006, "energy": 52.15, "source": "renewables"}, {"Entity": "Spain", "Year": 2007, "energy": 58.3, "source": "renewables"}, {"Entity": "Spain", "Year": 2008, "energy": 62.15, "source": "renewables"}, {"Entity": "Spain", "Year": 2009, "energy": 74.08, "source": "renewables"}, {"Entity": "Spain", "Year": 2010, "energy": 97.77, "source": "renewables"}, {"Entity": "Spain", "Year": 2011, "energy": 87.53, "source": "renewables"}, {"Entity": "Spain", "Year": 2012, "energy": 86.97, "source": "renewables"}, {"Entity": "Spain", "Year": 2013, "energy": 111.42, "source": "renewables"}, {"Entity": "Spain", "Year": 2014, "energy": 110.26, "source": "renewables"}, {"Entity": "Spain", "Year": 2015, "energy": 97.09, "source": "renewables"}, {"Entity": "Spain", "Year": 2016, "energy": 104.63, "source": "renewables"}, {"Entity": "Spain", "Year": 2017, "energy": 87.93, "source": "renewables"}, {"Entity": "Spain", "Year": 2018, "energy": 103.88, "source": "renewables"}, {"Entity": "Spain", "Year": 2019, "energy": 100.99, "source": "renewables"}, {"Entity": "Spain", "Year": 2020, "energy": 113.79, "source": "renewables"}, {"Entity": "Thailand", "Year": 2000, "energy": 6.38, "source": "renewables"}, {"Entity": "Thailand", "Year": 2001, "energy": 6.76, "source": "renewables"}, {"Entity": "Thailand", "Year": 2002, "energy": 8.07, "source": "renewables"}, {"Entity": "Thailand", "Year": 2003, "energy": 8.36, "source": "renewables"}, {"Entity": "Thailand", "Year": 2004, "energy": 7.63, "source": "renewables"}, {"Entity": "Thailand", "Year": 2005, "energy": 7.42, "source": "renewables"}, {"Entity": "Thailand", "Year": 2006, "energy": 9.82, "source": "renewables"}, {"Entity": "Thailand", "Year": 2007, "energy": 10.2, "source": "renewables"}, {"Entity": "Thailand", "Year": 2008, "energy": 8.95, "source": "renewables"}, {"Entity": 
"Thailand", "Year": 2009, "energy": 9.09, "source": "renewables"}, {"Entity": "Thailand", "Year": 2010, "energy": 8.58, "source": "renewables"}, {"Entity": "Thailand", "Year": 2011, "energy": 11.83, "source": "renewables"}, {"Entity": "Thailand", "Year": 2012, "energy": 13.42, "source": "renewables"}, {"Entity": "Thailand", "Year": 2013, "energy": 12.33, "source": "renewables"}, {"Entity": "Thailand", "Year": 2014, "energy": 13.68, "source": "renewables"}, {"Entity": "Thailand", "Year": 2015, "energy": 13.33, "source": "renewables"}, {"Entity": "Thailand", "Year": 2016, "energy": 15.97, "source": "renewables"}, {"Entity": "Thailand", "Year": 2017, "energy": 19.92, "source": "renewables"}, {"Entity": "Thailand", "Year": 2018, "energy": 25.84, "source": "renewables"}, {"Entity": "Thailand", "Year": 2019, "energy": 28.02, "source": "renewables"}, {"Entity": "Thailand", "Year": 2020, "energy": 24.73, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2000, "energy": 11.28, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2001, "energy": 12.05, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2002, "energy": 9.65, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2003, "energy": 9.27, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2004, "energy": 11.78, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2005, "energy": 12.4, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2006, "energy": 12.92, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2007, "energy": 10.47, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2008, "energy": 11.82, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2009, "energy": 12.12, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2010, "energy": 13.39, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2011, "energy": 11.2, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2012, "energy": 11.23, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2013, "energy": 15.11, 
"source": "renewables"}, {"Entity": "Ukraine", "Year": 2014, "energy": 10.17, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2015, "energy": 7.1, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2016, "energy": 9.25, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2017, "energy": 10.88, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2018, "energy": 13.02, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2019, "energy": 11.87, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2020, "energy": 17.56, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2000, "energy": 9.98, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2001, "energy": 9.56, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2002, "energy": 11.13, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2003, "energy": 10.62, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2004, "energy": 14.14, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2005, "energy": 16.93, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2006, "energy": 18.11, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2007, "energy": 19.69, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2008, "energy": 21.85, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2009, "energy": 25.25, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2010, "energy": 26.18, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2011, "energy": 35.2, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2012, "energy": 41.24, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2013, "energy": 53.21, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2014, "energy": 64.52, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2015, "energy": 82.57, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2016, "energy": 82.99, "source": 
"renewables"}, {"Entity": "United Kingdom", "Year": 2017, "energy": 98.85, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2018, "energy": 110.03, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2019, "energy": 120.48, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2020, "energy": 131.74, "source": "renewables"}, {"Entity": "United States", "Year": 2000, "energy": 350.93, "source": "renewables"}, {"Entity": "United States", "Year": 2001, "energy": 280.06, "source": "renewables"}, {"Entity": "United States", "Year": 2002, "energy": 336.34, "source": "renewables"}, {"Entity": "United States", "Year": 2003, "energy": 349.18, "source": "renewables"}, {"Entity": "United States", "Year": 2004, "energy": 345.14, "source": "renewables"}, {"Entity": "United States", "Year": 2005, "energy": 353.04, "source": "renewables"}, {"Entity": "United States", "Year": 2006, "energy": 381.16, "source": "renewables"}, {"Entity": "United States", "Year": 2007, "energy": 347.91, "source": "renewables"}, {"Entity": "United States", "Year": 2008, "energy": 377.11, "source": "renewables"}, {"Entity": "United States", "Year": 2009, "energy": 415.56, "source": "renewables"}, {"Entity": "United States", "Year": 2010, "energy": 424.48, "source": "renewables"}, {"Entity": "United States", "Year": 2011, "energy": 509.74, "source": "renewables"}, {"Entity": "United States", "Year": 2012, "energy": 492.32, "source": "renewables"}, {"Entity": "United States", "Year": 2013, "energy": 520.38, "source": "renewables"}, {"Entity": "United States", "Year": 2014, "energy": 546.83, "source": "renewables"}, {"Entity": "United States", "Year": 2015, "energy": 556.49, "source": "renewables"}, {"Entity": "United States", "Year": 2016, "energy": 624.91, "source": "renewables"}, {"Entity": "United States", "Year": 2017, "energy": 707.19, "source": "renewables"}, {"Entity": "United States", "Year": 2018, "energy": 733.17, "source": "renewables"}, {"Entity": "United 
States", "Year": 2019, "energy": 760.76, "source": "renewables"}, {"Entity": "United States", "Year": 2020, "energy": 821.4, "source": "renewables"}], "metadata": {"Entity": {"type": "string", "semanticType": "Location"}, "Year": {"type": "number", "semanticType": "Year"}, "energy": {"type": "number", "semanticType": "Number"}, "source": {"type": "string", "semanticType": "String", "levels": ["fossil fuels", "nuclear", "renewables"]}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Unpivot the energy source columns into long format\n transformed_df = pd.melt(\n df_energy,\n id_vars=['Year', 'Entity'],\n value_vars=[\n 'Electricity from fossil fuels (TWh)',\n 'Electricity from nuclear (TWh)',\n 'Electricity from renewables (TWh)'\n ],\n var_name='source',\n value_name='energy'\n )\n \n # Clean up the source names to be more readable\n transformed_df['source'] = transformed_df['source'].str.replace('Electricity from ', '').str.replace(' (TWh)', '')\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate 
based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. \n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate 
fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 158029.9988, 190729.9957, 209229.9957, ..., 6993180.176, 9861099.609, 9874660.156, 10096009.77\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 90.14, 91.79, 92.54, 804.95\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Transform the data into a long format where different energy sources (fossil fuels, nuclear, renewables) are unpivoted into separate rows. Each row should contain the year, entity (country), energy source type, and the corresponding energy value in TWh. 
This will enable visualization of energy generation trends over time by source and country.\",\n \"display_instruction\": \"Show **energy generation** trends by **source** and **Entity** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"source\", \"energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n },\n \"reason\": \"The user wants to visualize energy data across different sources with a line chart faceted by source. The original data has energy sources in separate columns (fossil fuels, nuclear, renewables), so we need to unpivot these columns into a long format with 'source' and 'energy' fields. The chart_encodings provided by the user are appropriate for this visualization goal, using faceting (column) to separate different energy sources.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Unpivot the energy source columns into long format\n transformed_df = pd.melt(\n df_energy,\n id_vars=['Year', 'Entity'],\n value_vars=[\n 'Electricity from fossil fuels (TWh)',\n 'Electricity from nuclear (TWh)',\n 'Electricity from renewables (TWh)'\n ],\n var_name='source',\n value_name='energy'\n )\n \n # Clean up the source names to be more readable\n transformed_df['source'] = transformed_df['source'].str.replace('Electricity from ', '').str.replace(' (TWh)', '')\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "global-energy-20-small.csv", "resultTableId": "table-82", "chart": {"id": "chart-1760738394155", "chartType": "Line Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"channel": "y", "bin": false, "fieldID": "concept-1760738385163"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {"channel": "column", "bin": false, "fieldID": "concept-1760738389404"}, 
"row": {"channel": "row", "bin": false}}, "tableRef": "global-energy-20-small.csv", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "", "displayContent": "Show **energy generation** trends by **source** and **Entity** over **Year**"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "The code performs a data transformation to restructure the energy data from a wide format to a long format:\n\n1. **Unpivot energy columns**: Uses `pd.melt()` to transform the three energy source columns (**Electricity from fossil fuels**, **Electricity from nuclear**, and **Electricity from renewables**) from separate columns into rows, while keeping **Year** and **Entity** as identifier columns.\n\n2. **Create source classification**: Generates a new **source** column that captures which energy type each row represents, and stores the corresponding energy values in an **energy** column.\n\n3. **Clean source labels**: Removes the prefix \"Electricity from \" and suffix \" (TWh)\" from the source names, resulting in cleaner labels: **fossil fuels**, **nuclear**, and **renewables**.\n\n4. **Return restructured data**: The final dataset has each country-year-energy source combination as a separate row, making it easier to analyze and visualize energy mix across countries and time periods.", "concepts": [], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. 
They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. 
If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. 
\n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 190729.9957, 227580.0018, 233600.0061, ..., 5736319.824, 5738290.039, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 90.14, 91.79, 92.54, nan\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Unpivot the 
energy source columns into long format\n transformed_df = pd.melt(\n df_energy,\n id_vars=['Year', 'Entity'],\n value_vars=[\n 'Electricity from fossil fuels (TWh)',\n 'Electricity from nuclear (TWh)',\n 'Electricity from renewables (TWh)'\n ],\n var_name='source',\n value_name='energy'\n )\n \n # Clean up the source names to be more readable\n transformed_df['source'] = transformed_df['source'].str.replace('Electricity from ', '').str.replace(' (TWh)', '')\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\nThe code performs a data transformation to restructure the energy data from a wide format to a long format:\n\n1. **Unpivot energy columns**: Uses `pd.melt()` to transform the three energy source columns (**Electricity from fossil fuels**, **Electricity from nuclear**, and **Electricity from renewables**) from separate columns into rows, while keeping **Year** and **Entity** as identifier columns.\n\n2. **Create source classification**: Generates a new **source** column that captures which energy type each row represents, and stores the corresponding energy values in an **energy** column.\n\n3. **Clean source labels**: Removes the prefix \"Electricity from \" and suffix \" (TWh)\" from the source names, resulting in cleaner labels: **fossil fuels**, **nuclear**, and **renewables**.\n\n4. 
**Return restructured data**: The final dataset has each country-year-energy source combination as a separate row, making it easier to analyze and visualize energy mix across countries and time periods.\n\n[CONCEPTS EXPLANATION]\n\n```json\n[]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-45", "displayId": "renewable-energy", "names": ["Entity", "Year", "renewable_percentage"], "rows": [{"Entity": "Australia", "Year": 2000, "renewable_percentage": 8.6344368187}, {"Entity": "Australia", "Year": 2001, "renewable_percentage": 8.2180135078}, {"Entity": "Australia", "Year": 2002, "renewable_percentage": 8.0833022736}, {"Entity": "Australia", "Year": 2003, "renewable_percentage": 8.6598324205}, {"Entity": "Australia", "Year": 2004, "renewable_percentage": 8.7013045232}, {"Entity": "Australia", "Year": 2005, "renewable_percentage": 9.1562355123}, {"Entity": "Australia", "Year": 2006, "renewable_percentage": 9.6357600837}, {"Entity": "Australia", "Year": 2007, "renewable_percentage": 9.1190310213}, {"Entity": "Australia", "Year": 2008, "renewable_percentage": 8.0548900022}, {"Entity": "Australia", "Year": 2009, "renewable_percentage": 7.8043793133}, {"Entity": "Australia", "Year": 2010, "renewable_percentage": 9.0442152121}, {"Entity": "Australia", "Year": 2011, "renewable_percentage": 11.3454273735}, {"Entity": "Australia", "Year": 2012, "renewable_percentage": 11.4105750279}, {"Entity": "Australia", "Year": 2013, "renewable_percentage": 14.8708583355}, {"Entity": "Australia", "Year": 2014, "renewable_percentage": 14.9621290509}, {"Entity": "Australia", "Year": 2015, "renewable_percentage": 14.3476000693}, {"Entity": "Australia", "Year": 2016, "renewable_percentage": 15.6093794449}, {"Entity": "Australia", "Year": 2017, "renewable_percentage": 16.3138729943}, {"Entity": "Australia", "Year": 2018, "renewable_percentage": 17.145938174}, {"Entity": "Australia", "Year": 2019, 
"renewable_percentage": 21.3759705435}, {"Entity": "Australia", "Year": 2020, "renewable_percentage": 25.5031684668}, {"Entity": "Brazil", "Year": 2000, "renewable_percentage": 90.1307723743}, {"Entity": "Brazil", "Year": 2001, "renewable_percentage": 84.6953615744}, {"Entity": "Brazil", "Year": 2002, "renewable_percentage": 86.0883364189}, {"Entity": "Brazil", "Year": 2003, "renewable_percentage": 87.4561159097}, {"Entity": "Brazil", "Year": 2004, "renewable_percentage": 86.4260041451}, {"Entity": "Brazil", "Year": 2005, "renewable_percentage": 87.6781562721}, {"Entity": "Brazil", "Year": 2006, "renewable_percentage": 87.2842473236}, {"Entity": "Brazil", "Year": 2007, "renewable_percentage": 88.7252098726}, {"Entity": "Brazil", "Year": 2008, "renewable_percentage": 84.8072313004}, {"Entity": "Brazil", "Year": 2009, "renewable_percentage": 89.4172280725}, {"Entity": "Brazil", "Year": 2010, "renewable_percentage": 85.3576882415}, {"Entity": "Brazil", "Year": 2011, "renewable_percentage": 87.6618820986}, {"Entity": "Brazil", "Year": 2012, "renewable_percentage": 83.1164558813}, {"Entity": "Brazil", "Year": 2013, "renewable_percentage": 77.5240022006}, {"Entity": "Brazil", "Year": 2014, "renewable_percentage": 74.0418657409}, {"Entity": "Brazil", "Year": 2015, "renewable_percentage": 75.0231817625}, {"Entity": "Brazil", "Year": 2016, "renewable_percentage": 81.0938046902}, {"Entity": "Brazil", "Year": 2017, "renewable_percentage": 79.9091472228}, {"Entity": "Brazil", "Year": 2018, "renewable_percentage": 82.9198505403}, {"Entity": "Brazil", "Year": 2019, "renewable_percentage": 82.8548799017}, {"Entity": "Brazil", "Year": 2020, "renewable_percentage": 84.6411771408}, {"Entity": "Canada", "Year": 2000, "renewable_percentage": 61.8095917882}, {"Entity": "Canada", "Year": 2001, "renewable_percentage": 59.3287558747}, {"Entity": "Canada", "Year": 2002, "renewable_percentage": 61.1477403113}, {"Entity": "Canada", "Year": 2003, "renewable_percentage": 60.0789685174}, 
{"Entity": "Canada", "Year": 2004, "renewable_percentage": 59.6967771845}, {"Entity": "Canada", "Year": 2005, "renewable_percentage": 60.8208155391}, {"Entity": "Canada", "Year": 2006, "renewable_percentage": 60.8271602855}, {"Entity": "Canada", "Year": 2007, "renewable_percentage": 61.2460642446}, {"Entity": "Canada", "Year": 2008, "renewable_percentage": 62.6520720838}, {"Entity": "Canada", "Year": 2009, "renewable_percentage": 63.8919227732}, {"Entity": "Canada", "Year": 2010, "renewable_percentage": 62.9421470558}, {"Entity": "Canada", "Year": 2011, "renewable_percentage": 64.0922915917}, {"Entity": "Canada", "Year": 2012, "renewable_percentage": 65.098730952}, {"Entity": "Canada", "Year": 2013, "renewable_percentage": 65.4320794066}, {"Entity": "Canada", "Year": 2014, "renewable_percentage": 64.791145907}, {"Entity": "Canada", "Year": 2015, "renewable_percentage": 65.2946239925}, {"Entity": "Canada", "Year": 2016, "renewable_percentage": 66.1890584295}, {"Entity": "Canada", "Year": 2017, "renewable_percentage": 67.5399410579}, {"Entity": "Canada", "Year": 2018, "renewable_percentage": 67.3685700357}, {"Entity": "Canada", "Year": 2019, "renewable_percentage": 67.1741623137}, {"Entity": "Canada", "Year": 2020, "renewable_percentage": 68.7796436354}, {"Entity": "China", "Year": 2000, "renewable_percentage": 16.639126586}, {"Entity": "China", "Year": 2001, "renewable_percentage": 18.9581237042}, {"Entity": "China", "Year": 2002, "renewable_percentage": 17.6185006046}, {"Entity": "China", "Year": 2003, "renewable_percentage": 15.0362717081}, {"Entity": "China", "Year": 2004, "renewable_percentage": 16.2224108273}, {"Entity": "China", "Year": 2005, "renewable_percentage": 16.1731179957}, {"Entity": "China", "Year": 2006, "renewable_percentage": 15.5884036124}, {"Entity": "China", "Year": 2007, "renewable_percentage": 15.2583847828}, {"Entity": "China", "Year": 2008, "renewable_percentage": 19.0253335469}, {"Entity": "China", "Year": 2009, "renewable_percentage": 
17.8857170547}, {"Entity": "China", "Year": 2010, "renewable_percentage": 18.7800759915}, {"Entity": "China", "Year": 2011, "renewable_percentage": 16.8902341543}, {"Entity": "China", "Year": 2012, "renewable_percentage": 20.122965176}, {"Entity": "China", "Year": 2013, "renewable_percentage": 20.2152481955}, {"Entity": "China", "Year": 2014, "renewable_percentage": 22.3502204285}, {"Entity": "China", "Year": 2015, "renewable_percentage": 24.079270189}, {"Entity": "China", "Year": 2016, "renewable_percentage": 25.0007798429}, {"Entity": "China", "Year": 2017, "renewable_percentage": 25.419242299}, {"Entity": "China", "Year": 2018, "renewable_percentage": 25.7747942589}, {"Entity": "China", "Year": 2019, "renewable_percentage": 26.9995671106}, {"Entity": "China", "Year": 2020, "renewable_percentage": 28.2464606924}, {"Entity": "France", "Year": 2000, "renewable_percentage": 12.7117691154}, {"Entity": "France", "Year": 2001, "renewable_percentage": 13.9961372206}, {"Entity": "France", "Year": 2002, "renewable_percentage": 11.3544157067}, {"Entity": "France", "Year": 2003, "renewable_percentage": 10.9783540506}, {"Entity": "France", "Year": 2004, "renewable_percentage": 11.0051305559}, {"Entity": "France", "Year": 2005, "renewable_percentage": 9.6479837153}, {"Entity": "France", "Year": 2006, "renewable_percentage": 10.7235915493}, {"Entity": "France", "Year": 2007, "renewable_percentage": 11.4370075239}, {"Entity": "France", "Year": 2008, "renewable_percentage": 12.7487441615}, {"Entity": "France", "Year": 2009, "renewable_percentage": 12.8776856068}, {"Entity": "France", "Year": 2010, "renewable_percentage": 13.6240072491}, {"Entity": "France", "Year": 2011, "renewable_percentage": 11.63553049}, {"Entity": "France", "Year": 2012, "renewable_percentage": 15.0331522889}, {"Entity": "France", "Year": 2013, "renewable_percentage": 17.2469424928}, {"Entity": "France", "Year": 2014, "renewable_percentage": 16.6074992494}, {"Entity": "France", "Year": 2015, 
"renewable_percentage": 16.002230276}, {"Entity": "France", "Year": 2016, "renewable_percentage": 17.7212924013}, {"Entity": "France", "Year": 2017, "renewable_percentage": 16.6576751547}, {"Entity": "France", "Year": 2018, "renewable_percentage": 19.7315179827}, {"Entity": "France", "Year": 2019, "renewable_percentage": 20.0116665488}, {"Entity": "France", "Year": 2020, "renewable_percentage": 23.7610241821}, {"Entity": "Germany", "Year": 2000, "renewable_percentage": 6.1977983575}, {"Entity": "Germany", "Year": 2001, "renewable_percentage": 6.5132585197}, {"Entity": "Germany", "Year": 2002, "renewable_percentage": 7.6431369854}, {"Entity": "Germany", "Year": 2003, "renewable_percentage": 7.7455438643}, {"Entity": "Germany", "Year": 2004, "renewable_percentage": 9.4989185292}, {"Entity": "Germany", "Year": 2005, "renewable_percentage": 10.3356645637}, {"Entity": "Germany", "Year": 2006, "renewable_percentage": 11.5129959829}, {"Entity": "Germany", "Year": 2007, "renewable_percentage": 14.135471525}, {"Entity": "Germany", "Year": 2008, "renewable_percentage": 14.8894504106}, {"Entity": "Germany", "Year": 2009, "renewable_percentage": 16.2902842395}, {"Entity": "Germany", "Year": 2010, "renewable_percentage": 16.8384989754}, {"Entity": "Germany", "Year": 2011, "renewable_percentage": 20.4967199299}, {"Entity": "Germany", "Year": 2012, "renewable_percentage": 23.056464482}, {"Entity": "Germany", "Year": 2013, "renewable_percentage": 24.1368929731}, {"Entity": "Germany", "Year": 2014, "renewable_percentage": 26.2182434067}, {"Entity": "Germany", "Year": 2015, "renewable_percentage": 29.4721888318}, {"Entity": "Germany", "Year": 2016, "renewable_percentage": 29.4990435013}, {"Entity": "Germany", "Year": 2017, "renewable_percentage": 33.4855497593}, {"Entity": "Germany", "Year": 2018, "renewable_percentage": 35.0976735365}, {"Entity": "Germany", "Year": 2019, "renewable_percentage": 40.0890757144}, {"Entity": "Germany", "Year": 2020, "renewable_percentage": 
44.3324048937}, {"Entity": "India", "Year": 2000, "renewable_percentage": 14.0481982534}, {"Entity": "India", "Year": 2001, "renewable_percentage": 12.9997099422}, {"Entity": "India", "Year": 2002, "renewable_percentage": 11.938193032}, {"Entity": "India", "Year": 2003, "renewable_percentage": 11.695109147}, {"Entity": "India", "Year": 2004, "renewable_percentage": 15.6375300722}, {"Entity": "India", "Year": 2005, "renewable_percentage": 15.2543575768}, {"Entity": "India", "Year": 2006, "renewable_percentage": 17.1352578483}, {"Entity": "India", "Year": 2007, "renewable_percentage": 17.8019742295}, {"Entity": "India", "Year": 2008, "renewable_percentage": 16.768266921}, {"Entity": "India", "Year": 2009, "renewable_percentage": 15.269804822}, {"Entity": "India", "Year": 2010, "renewable_percentage": 15.2122201244}, {"Entity": "India", "Year": 2011, "renewable_percentage": 16.7911025145}, {"Entity": "India", "Year": 2012, "renewable_percentage": 15.1350014654}, {"Entity": "India", "Year": 2013, "renewable_percentage": 16.3941577818}, {"Entity": "India", "Year": 2014, "renewable_percentage": 16.0092550039}, {"Entity": "India", "Year": 2015, "renewable_percentage": 15.3718720687}, {"Entity": "India", "Year": 2016, "renewable_percentage": 14.8548475703}, {"Entity": "India", "Year": 2017, "renewable_percentage": 15.9669920335}, {"Entity": "India", "Year": 2018, "renewable_percentage": 16.6949549709}, {"Entity": "India", "Year": 2019, "renewable_percentage": 18.6915426873}, {"Entity": "India", "Year": 2020, "renewable_percentage": 20.2059243238}, {"Entity": "Indonesia", "Year": 2000, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2001, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2002, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2003, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2004, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2005, "renewable_percentage": null}, {"Entity": 
"Indonesia", "Year": 2006, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2007, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2008, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2009, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2010, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2011, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2012, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2013, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2014, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2015, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2016, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2017, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2018, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2019, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2020, "renewable_percentage": null}, {"Entity": "Italy", "Year": 2000, "renewable_percentage": 18.900241501}, {"Entity": "Italy", "Year": 2001, "renewable_percentage": 20.049431902}, {"Entity": "Italy", "Year": 2002, "renewable_percentage": 17.4555571614}, {"Entity": "Italy", "Year": 2003, "renewable_percentage": 16.4202116476}, {"Entity": "Italy", "Year": 2004, "renewable_percentage": 18.2749380999}, {"Entity": "Italy", "Year": 2005, "renewable_percentage": 16.3769782226}, {"Entity": "Italy", "Year": 2006, "renewable_percentage": 16.5128639906}, {"Entity": "Italy", "Year": 2007, "renewable_percentage": 15.5333485238}, {"Entity": "Italy", "Year": 2008, "renewable_percentage": 18.6112}, {"Entity": "Italy", "Year": 2009, "renewable_percentage": 24.0837332221}, {"Entity": "Italy", "Year": 2010, "renewable_percentage": 25.8400187976}, {"Entity": "Italy", "Year": 2011, "renewable_percentage": 27.6773203443}, {"Entity": "Italy", "Year": 2012, "renewable_percentage": 31.1049649217}, {"Entity": "Italy", 
"Year": 2013, "renewable_percentage": 39.0148744209}, {"Entity": "Italy", "Year": 2014, "renewable_percentage": 43.4976931949}, {"Entity": "Italy", "Year": 2015, "renewable_percentage": 38.7577860829}, {"Entity": "Italy", "Year": 2016, "renewable_percentage": 37.6079387187}, {"Entity": "Italy", "Year": 2017, "renewable_percentage": 35.4174479255}, {"Entity": "Italy", "Year": 2018, "renewable_percentage": 39.8100142663}, {"Entity": "Italy", "Year": 2019, "renewable_percentage": 39.7563068474}, {"Entity": "Italy", "Year": 2020, "renewable_percentage": 42.0397741576}, {"Entity": "Japan", "Year": 2000, "renewable_percentage": 10.5382436261}, {"Entity": "Japan", "Year": 2001, "renewable_percentage": 10.447653504}, {"Entity": "Japan", "Year": 2002, "renewable_percentage": 10.2477294843}, {"Entity": "Japan", "Year": 2003, "renewable_percentage": 11.6993698448}, {"Entity": "Japan", "Year": 2004, "renewable_percentage": 11.4198974767}, {"Entity": "Japan", "Year": 2005, "renewable_percentage": 9.9068127192}, {"Entity": "Japan", "Year": 2006, "renewable_percentage": 10.8554989442}, {"Entity": "Japan", "Year": 2007, "renewable_percentage": 9.3897588285}, {"Entity": "Japan", "Year": 2008, "renewable_percentage": 10.0196834738}, {"Entity": "Japan", "Year": 2009, "renewable_percentage": 10.4667464874}, {"Entity": "Japan", "Year": 2010, "renewable_percentage": 10.5269966826}, {"Entity": "Japan", "Year": 2011, "renewable_percentage": 11.1272421632}, {"Entity": "Japan", "Year": 2012, "renewable_percentage": 10.6143703421}, {"Entity": "Japan", "Year": 2013, "renewable_percentage": 11.7965798852}, {"Entity": "Japan", "Year": 2014, "renewable_percentage": 13.2719619718}, {"Entity": "Japan", "Year": 2015, "renewable_percentage": 15.6586817408}, {"Entity": "Japan", "Year": 2016, "renewable_percentage": 15.6920107068}, {"Entity": "Japan", "Year": 2017, "renewable_percentage": 17.3559698312}, {"Entity": "Japan", "Year": 2018, "renewable_percentage": 18.144181175}, {"Entity": "Japan", 
"Year": 2019, "renewable_percentage": 19.4223288251}, {"Entity": "Japan", "Year": 2020, "renewable_percentage": 21.324925062}, {"Entity": "Kazakhstan", "Year": 2000, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2001, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2002, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2003, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2004, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2005, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2006, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2007, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2008, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2009, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2010, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2011, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2012, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2013, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2014, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2015, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2016, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2017, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2018, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2019, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2020, "renewable_percentage": null}, {"Entity": "Mexico", "Year": 2000, "renewable_percentage": 22.9291160107}, {"Entity": "Mexico", "Year": 2001, "renewable_percentage": 19.6649599841}, {"Entity": "Mexico", "Year": 2002, "renewable_percentage": 17.4220963173}, {"Entity": "Mexico", "Year": 2003, "renewable_percentage": 15.8536585366}, {"Entity": "Mexico", "Year": 2004, "renewable_percentage": 17.3134463687}, {"Entity": "Mexico", 
"Year": 2005, "renewable_percentage": 18.2780827246}, {"Entity": "Mexico", "Year": 2006, "renewable_percentage": 18.4256091896}, {"Entity": "Mexico", "Year": 2007, "renewable_percentage": 17.2761561168}, {"Entity": "Mexico", "Year": 2008, "renewable_percentage": 21.5387105913}, {"Entity": "Mexico", "Year": 2009, "renewable_percentage": 16.5369729069}, {"Entity": "Mexico", "Year": 2010, "renewable_percentage": 19.4281608109}, {"Entity": "Mexico", "Year": 2011, "renewable_percentage": 18.0916357408}, {"Entity": "Mexico", "Year": 2012, "renewable_percentage": 16.5759438104}, {"Entity": "Mexico", "Year": 2013, "renewable_percentage": 15.5492898914}, {"Entity": "Mexico", "Year": 2014, "renewable_percentage": 19.8008201523}, {"Entity": "Mexico", "Year": 2015, "renewable_percentage": 17.5976903451}, {"Entity": "Mexico", "Year": 2016, "renewable_percentage": 17.4806943436}, {"Entity": "Mexico", "Year": 2017, "renewable_percentage": 18.0759526428}, {"Entity": "Mexico", "Year": 2018, "renewable_percentage": 17.703752786}, {"Entity": "Mexico", "Year": 2019, "renewable_percentage": 18.5487927565}, {"Entity": "Mexico", "Year": 2020, "renewable_percentage": 21.2552224134}, {"Entity": "Poland", "Year": 2000, "renewable_percentage": 1.6273222517}, {"Entity": "Poland", "Year": 2001, "renewable_percentage": 1.934316727}, {"Entity": "Poland", "Year": 2002, "renewable_percentage": 1.9439960699}, {"Entity": "Poland", "Year": 2003, "renewable_percentage": 1.4999000067}, {"Entity": "Poland", "Year": 2004, "renewable_percentage": 2.1016681991}, {"Entity": "Poland", "Year": 2005, "renewable_percentage": 2.4830699774}, {"Entity": "Poland", "Year": 2006, "renewable_percentage": 2.673730134}, {"Entity": "Poland", "Year": 2007, "renewable_percentage": 3.4256513785}, {"Entity": "Poland", "Year": 2008, "renewable_percentage": 4.2744438696}, {"Entity": "Poland", "Year": 2009, "renewable_percentage": 5.7515388179}, {"Entity": "Poland", "Year": 2010, "renewable_percentage": 6.9299363057}, 
{"Entity": "Poland", "Year": 2011, "renewable_percentage": 8.0547205693}, {"Entity": "Poland", "Year": 2012, "renewable_percentage": 10.4436057663}, {"Entity": "Poland", "Year": 2013, "renewable_percentage": 10.4081508145}, {"Entity": "Poland", "Year": 2014, "renewable_percentage": 12.5331481248}, {"Entity": "Poland", "Year": 2015, "renewable_percentage": 13.8151485631}, {"Entity": "Poland", "Year": 2016, "renewable_percentage": 13.7335179722}, {"Entity": "Poland", "Year": 2017, "renewable_percentage": 14.1999646913}, {"Entity": "Poland", "Year": 2018, "renewable_percentage": 12.7559148032}, {"Entity": "Poland", "Year": 2019, "renewable_percentage": 15.6157998037}, {"Entity": "Poland", "Year": 2020, "renewable_percentage": 17.9648720886}, {"Entity": "Saudi Arabia", "Year": 2000, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2001, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2002, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2003, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2004, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2005, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2006, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2007, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2008, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2009, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2010, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2011, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2012, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2013, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2014, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2015, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2016, "renewable_percentage": null}, {"Entity": "Saudi Arabia", 
"Year": 2017, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2018, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2019, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2020, "renewable_percentage": null}, {"Entity": "South Africa", "Year": 2000, "renewable_percentage": 0.9110805721}, {"Entity": "South Africa", "Year": 2001, "renewable_percentage": 1.2516536074}, {"Entity": "South Africa", "Year": 2002, "renewable_percentage": 1.3802249619}, {"Entity": "South Africa", "Year": 2003, "renewable_percentage": 0.545271261}, {"Entity": "South Africa", "Year": 2004, "renewable_percentage": 0.5827199439}, {"Entity": "South Africa", "Year": 2005, "renewable_percentage": 0.763458686}, {"Entity": "South Africa", "Year": 2006, "renewable_percentage": 1.3863060017}, {"Entity": "South Africa", "Year": 2007, "renewable_percentage": 0.5267209594}, {"Entity": "South Africa", "Year": 2008, "renewable_percentage": 0.6895692269}, {"Entity": "South Africa", "Year": 2009, "renewable_percentage": 0.8031088083}, {"Entity": "South Africa", "Year": 2010, "renewable_percentage": 1.0330068318}, {"Entity": "South Africa", "Year": 2011, "renewable_percentage": 1.0184465622}, {"Entity": "South Africa", "Year": 2012, "renewable_percentage": 0.6890826069}, {"Entity": "South Africa", "Year": 2013, "renewable_percentage": 0.6792168043}, {"Entity": "South Africa", "Year": 2014, "renewable_percentage": 1.4288129861}, {"Entity": "South Africa", "Year": 2015, "renewable_percentage": 2.6256790549}, {"Entity": "South Africa", "Year": 2016, "renewable_percentage": 3.2586126531}, {"Entity": "South Africa", "Year": 2017, "renewable_percentage": 4.2202606137}, {"Entity": "South Africa", "Year": 2018, "renewable_percentage": 5.1554655529}, {"Entity": "South Africa", "Year": 2019, "renewable_percentage": 5.3589699864}, {"Entity": "South Africa", "Year": 2020, "renewable_percentage": 5.780581212}, {"Entity": "Spain", "Year": 2000, "renewable_percentage": 
15.6119862394}, {"Entity": "Spain", "Year": 2001, "renewable_percentage": 21.1524434719}, {"Entity": "Spain", "Year": 2002, "renewable_percentage": 13.8260180901}, {"Entity": "Spain", "Year": 2003, "renewable_percentage": 21.667314419}, {"Entity": "Spain", "Year": 2004, "renewable_percentage": 18.3190206468}, {"Entity": "Spain", "Year": 2005, "renewable_percentage": 14.8597342333}, {"Entity": "Spain", "Year": 2006, "renewable_percentage": 17.6623992413}, {"Entity": "Spain", "Year": 2007, "renewable_percentage": 19.3347262296}, {"Entity": "Spain", "Year": 2008, "renewable_percentage": 20.0051501593}, {"Entity": "Spain", "Year": 2009, "renewable_percentage": 25.4107639008}, {"Entity": "Spain", "Year": 2010, "renewable_percentage": 32.7922186819}, {"Entity": "Spain", "Year": 2011, "renewable_percentage": 30.0408415417}, {"Entity": "Spain", "Year": 2012, "renewable_percentage": 29.6047928652}, {"Entity": "Spain", "Year": 2013, "renewable_percentage": 39.5850357054}, {"Entity": "Spain", "Year": 2014, "renewable_percentage": 40.1032952644}, {"Entity": "Spain", "Year": 2015, "renewable_percentage": 34.9899091826}, {"Entity": "Spain", "Year": 2016, "renewable_percentage": 38.5818061138}, {"Entity": "Spain", "Year": 2017, "renewable_percentage": 32.220593624}, {"Entity": "Spain", "Year": 2018, "renewable_percentage": 38.2080329557}, {"Entity": "Spain", "Year": 2019, "renewable_percentage": 37.280815091}, {"Entity": "Spain", "Year": 2020, "renewable_percentage": 43.8108805298}, {"Entity": "Thailand", "Year": 2000, "renewable_percentage": 7.1261029822}, {"Entity": "Thailand", "Year": 2001, "renewable_percentage": 7.061527212}, {"Entity": "Thailand", "Year": 2002, "renewable_percentage": 7.9444772593}, {"Entity": "Thailand", "Year": 2003, "renewable_percentage": 7.6718362852}, {"Entity": "Thailand", "Year": 2004, "renewable_percentage": 6.5163549406}, {"Entity": "Thailand", "Year": 2005, "renewable_percentage": 6.0325203252}, {"Entity": "Thailand", "Year": 2006, 
"renewable_percentage": 7.5988547551}, {"Entity": "Thailand", "Year": 2007, "renewable_percentage": 7.7085852479}, {"Entity": "Thailand", "Year": 2008, "renewable_percentage": 6.5625458278}, {"Entity": "Thailand", "Year": 2009, "renewable_percentage": 6.6263303689}, {"Entity": "Thailand", "Year": 2010, "renewable_percentage": 5.7085828343}, {"Entity": "Thailand", "Year": 2011, "renewable_percentage": 8.039961941}, {"Entity": "Thailand", "Year": 2012, "renewable_percentage": 8.5396118358}, {"Entity": "Thailand", "Year": 2013, "renewable_percentage": 7.6765035487}, {"Entity": "Thailand", "Year": 2014, "renewable_percentage": 8.395728489}, {"Entity": "Thailand", "Year": 2015, "renewable_percentage": 7.9949619145}, {"Entity": "Thailand", "Year": 2016, "renewable_percentage": 8.9840234023}, {"Entity": "Thailand", "Year": 2017, "renewable_percentage": 10.9570957096}, {"Entity": "Thailand", "Year": 2018, "renewable_percentage": 14.1900054915}, {"Entity": "Thailand", "Year": 2019, "renewable_percentage": 14.7001731284}, {"Entity": "Thailand", "Year": 2020, "renewable_percentage": 13.7963737796}, {"Entity": "Ukraine", "Year": 2000, "renewable_percentage": 6.5860921352}, {"Entity": "Ukraine", "Year": 2001, "renewable_percentage": 6.9729761009}, {"Entity": "Ukraine", "Year": 2002, "renewable_percentage": 5.5597165409}, {"Entity": "Ukraine", "Year": 2003, "renewable_percentage": 5.1442841287}, {"Entity": "Ukraine", "Year": 2004, "renewable_percentage": 6.4718162839}, {"Entity": "Ukraine", "Year": 2005, "renewable_percentage": 6.6698940347}, {"Entity": "Ukraine", "Year": 2006, "renewable_percentage": 6.68633235}, {"Entity": "Ukraine", "Year": 2007, "renewable_percentage": 5.3380238605}, {"Entity": "Ukraine", "Year": 2008, "renewable_percentage": 6.1377090041}, {"Entity": "Ukraine", "Year": 2009, "renewable_percentage": 6.980762585}, {"Entity": "Ukraine", "Year": 2010, "renewable_percentage": 7.0914098083}, {"Entity": "Ukraine", "Year": 2011, "renewable_percentage": 
5.7450628366}, {"Entity": "Ukraine", "Year": 2012, "renewable_percentage": 5.6614236741}, {"Entity": "Ukraine", "Year": 2013, "renewable_percentage": 7.8003200661}, {"Entity": "Ukraine", "Year": 2014, "renewable_percentage": 5.5885262117}, {"Entity": "Ukraine", "Year": 2015, "renewable_percentage": 4.3924771096}, {"Entity": "Ukraine", "Year": 2016, "renewable_percentage": 5.6797249171}, {"Entity": "Ukraine", "Year": 2017, "renewable_percentage": 7.0457194664}, {"Entity": "Ukraine", "Year": 2018, "renewable_percentage": 8.228528092}, {"Entity": "Ukraine", "Year": 2019, "renewable_percentage": 7.7754487096}, {"Entity": "Ukraine", "Year": 2020, "renewable_percentage": 11.8440577364}, {"Entity": "United Kingdom", "Year": 2000, "renewable_percentage": 2.6657406913}, {"Entity": "United Kingdom", "Year": 2001, "renewable_percentage": 2.5001961451}, {"Entity": "United Kingdom", "Year": 2002, "renewable_percentage": 2.8939157566}, {"Entity": "United Kingdom", "Year": 2003, "renewable_percentage": 2.6854802003}, {"Entity": "United Kingdom", "Year": 2004, "renewable_percentage": 3.6136880575}, {"Entity": "United Kingdom", "Year": 2005, "renewable_percentage": 4.2815234434}, {"Entity": "United Kingdom", "Year": 2006, "renewable_percentage": 4.6029890199}, {"Entity": "United Kingdom", "Year": 2007, "renewable_percentage": 5.0104331009}, {"Entity": "United Kingdom", "Year": 2008, "renewable_percentage": 5.6776842324}, {"Entity": "United Kingdom", "Year": 2009, "renewable_percentage": 6.7679854187}, {"Entity": "United Kingdom", "Year": 2010, "renewable_percentage": 6.9092924441}, {"Entity": "United Kingdom", "Year": 2011, "renewable_percentage": 9.6422505889}, {"Entity": "United Kingdom", "Year": 2012, "renewable_percentage": 11.4273047189}, {"Entity": "United Kingdom", "Year": 2013, "renewable_percentage": 14.9727052732}, {"Entity": "United Kingdom", "Year": 2014, "renewable_percentage": 19.2476358104}, {"Entity": "United Kingdom", "Year": 2015, "renewable_percentage": 
24.6227709191}, {"Entity": "United Kingdom", "Year": 2016, "renewable_percentage": 24.6788390627}, {"Entity": "United Kingdom", "Year": 2017, "renewable_percentage": 29.4986571173}, {"Entity": "United Kingdom", "Year": 2018, "renewable_percentage": 33.2919818457}, {"Entity": "United Kingdom", "Year": 2019, "renewable_percentage": 37.4568630499}, {"Entity": "United Kingdom", "Year": 2020, "renewable_percentage": 42.8603962651}, {"Entity": "United States", "Year": 2000, "renewable_percentage": 9.2298992662}, {"Entity": "United States", "Year": 2001, "renewable_percentage": 7.5132056541}, {"Entity": "United States", "Year": 2002, "renewable_percentage": 8.749216358}, {"Entity": "United States", "Year": 2003, "renewable_percentage": 9.0252110397}, {"Entity": "United States", "Year": 2004, "renewable_percentage": 8.7334100887}, {"Entity": "United States", "Year": 2005, "renewable_percentage": 8.7494640631}, {"Entity": "United States", "Year": 2006, "renewable_percentage": 9.4184742052}, {"Entity": "United States", "Year": 2007, "renewable_percentage": 8.3984096829}, {"Entity": "United States", "Year": 2008, "renewable_percentage": 9.180943292}, {"Entity": "United States", "Year": 2009, "renewable_percentage": 10.547689996}, {"Entity": "United States", "Year": 2010, "renewable_percentage": 10.3180892283}, {"Entity": "United States", "Year": 2011, "renewable_percentage": 12.4665249812}, {"Entity": "United States", "Year": 2012, "renewable_percentage": 12.1841179804}, {"Entity": "United States", "Year": 2013, "renewable_percentage": 12.8311154179}, {"Entity": "United States", "Year": 2014, "renewable_percentage": 13.3503092033}, {"Entity": "United States", "Year": 2015, "renewable_percentage": 13.6261352256}, {"Entity": "United States", "Year": 2016, "renewable_percentage": 15.2880936304}, {"Entity": "United States", "Year": 2017, "renewable_percentage": 17.4515276472}, {"Entity": "United States", "Year": 2018, "renewable_percentage": 17.4499886946}, {"Entity": "United 
States", "Year": 2019, "renewable_percentage": 18.2946243489}, {"Entity": "United States", "Year": 2020, "renewable_percentage": 20.3156921037}], "metadata": {"Entity": {"type": "string", "semanticType": "Location"}, "Year": {"type": "number", "semanticType": "Year"}, "renewable_percentage": {"type": "number", "semanticType": "Percentage"}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage']].copy()\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, 
and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. \n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data 
should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 158029.9988, 190729.9957, 209229.9957, ..., 6993180.176, 9861099.609, 9874660.156, 10096009.77\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 90.14, 91.79, 92.54, 804.95\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Transform the data into a long format where different energy sources (fossil fuels, nuclear, renewables) are unpivoted into separate rows. Each row should contain the year, entity (country), energy source type, and the corresponding energy value in TWh. 
This will enable visualization of energy generation trends over time by source and country.\",\n \"display_instruction\": \"Show **energy generation** trends by **source** and **Entity** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"source\", \"energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n },\n \"reason\": \"The user wants to visualize energy data across different sources with a line chart faceted by source. The original data has energy sources in separate columns (fossil fuels, nuclear, renewables), so we need to unpivot these columns into a long format with 'source' and 'energy' fields. The chart_encodings provided by the user are appropriate for this visualization goal, using faceting (column) to separate different energy sources.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Unpivot the energy source columns into long format\n transformed_df = pd.melt(\n df_energy,\n id_vars=['Year', 'Entity'],\n value_vars=[\n 'Electricity from fossil fuels (TWh)',\n 'Electricity from nuclear (TWh)',\n 'Electricity from renewables (TWh)'\n ],\n var_name='source',\n value_name='energy'\n )\n \n # Clean up the source names to be more readable\n transformed_df['source'] = transformed_df['source'].str.replace('Electricity from ', '').str.replace(' (TWh)', '')\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Entity Year energy source\n0 Australia 2000 181.05 fossil fuels\n1 Australia 2001 194.33 fossil fuels\n2 Australia 2002 197.29 fossil fuels\n3 Australia 2003 195.13 fossil fuels\n4 Australia 2004 203.66 fossil fuels\n5 Australia 2005 195.95 fossil fuels\n6 Australia 2006 198.72 fossil fuels\n7 Australia 2007 208.59 fossil fuels\n8 Australia 2008 211.06 fossil fuels\n9 Australia 2009 216.42 fossil fuels\n......\n\nUpdate the code above 
based on the following instruction:\n\n{\n \"followup_instruction\": \"show renewable percentage for each country\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage for each country and year. The renewable percentage is computed as the ratio of renewable energy to total energy (sum of fossil fuels, nuclear, and renewables) multiplied by 100. The output should contain year, entity (country), and the renewable percentage value.\",\n \"display_instruction\": \"Show **renewable percentage** for each **Entity** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"renewable_percentage\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"renewable_percentage\",\n \"color\": \"Entity\"\n },\n \"reason\": \"The user wants to see renewable percentage trends, which requires calculating the proportion of renewable energy relative to total energy generation. This is a single metric per country per year, so we don't need the 'column' faceting by source anymore. 
The chart_encodings are updated to reflect that we're now showing a percentage metric rather than absolute energy values by source.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage']].copy()\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "table-82", "resultTableId": "table-45", "chart": {"id": "chart-1760738416626", "chartType": "Line Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "concept-1760738385163"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {"fieldID": "concept-1760738389404", "sortBy": "[\"fossil fuels\",\"nuclear\",\"renewables\"]"}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-82", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "show renewable percentage for each country", "displayContent": "Show **renewable percentage** for each **Entity** over **Year**"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "The code transforms global energy data to analyze renewable energy adoption:\n\n- Calculates **total_energy** by summing electricity generation 
from three sources: **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n- Computes **renewable_percentage** by dividing **Electricity from renewables (TWh)** by **total_energy** and multiplying by **100**\n- Handles edge cases by replacing infinite values (from division by zero) with **NaN**\n- Filters the dataset to retain only **Year**, **Entity** (country), and **renewable_percentage** columns", "concepts": [{"explanation": "The proportion of a country's total electricity generation that comes from renewable sources, expressed as a percentage. Calculated as: \\( \\frac{\\text{Electricity from renewables}}{\\text{Total electricity generation}} \\times 100 \\). This metric indicates the degree of renewable energy adoption in each country's energy mix.", "field": "renewable_percentage"}], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. 
Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. 
If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. \n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5738290.039, 9861099.609, 10502929.69, 10707219.73\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., nan, nan, nan, nan\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## 
sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage']].copy()\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\nThe code transforms global energy data to analyze renewable energy adoption:\n\n- Calculates **total_energy** by summing electricity generation from three sources: **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n- Computes **renewable_percentage** by dividing **Electricity from renewables (TWh)** by **total_energy** and multiplying by **100**\n- Handles edge cases by replacing infinite values (from division by zero) with **NaN**\n- Filters the dataset to retain only **Year**, **Entity** (country), and **renewable_percentage** columns\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"renewable_percentage\",\n \"explanation\": \"The proportion of a country's 
total electricity generation that comes from renewable sources, expressed as a percentage. Calculated as: \\\\( \\\\frac{\\\\text{Electricity from renewables}}{\\\\text{Total electricity generation}} \\\\times 100 \\\\). This metric indicates the degree of renewable energy adoption in each country's energy mix.\"\n }\n]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-78", "displayId": "renewable-energy-rank", "names": ["Entity", "Year", "rank", "renewable_percentage"], "rows": [{"Entity": "Australia", "Year": 2000, "rank": 11, "renewable_percentage": 8.6344368187}, {"Entity": "Australia", "Year": 2001, "rank": 10, "renewable_percentage": 8.2180135078}, {"Entity": "Australia", "Year": 2002, "rank": 11, "renewable_percentage": 8.0833022736}, {"Entity": "Australia", "Year": 2003, "rank": 11, "renewable_percentage": 8.6598324205}, {"Entity": "Australia", "Year": 2004, "rank": 12, "renewable_percentage": 8.7013045232}, {"Entity": "Australia", "Year": 2005, "rank": 11, "renewable_percentage": 9.1562355123}, {"Entity": "Australia", "Year": 2006, "rank": 11, "renewable_percentage": 9.6357600837}, {"Entity": "Australia", "Year": 2007, "rank": 11, "renewable_percentage": 9.1190310213}, {"Entity": "Australia", "Year": 2008, "rank": 12, "renewable_percentage": 8.0548900022}, {"Entity": "Australia", "Year": 2009, "rank": 12, "renewable_percentage": 7.8043793133}, {"Entity": "Australia", "Year": 2010, "rank": 12, "renewable_percentage": 9.0442152121}, {"Entity": "Australia", "Year": 2011, "rank": 11, "renewable_percentage": 11.3454273735}, {"Entity": "Australia", "Year": 2012, "rank": 12, "renewable_percentage": 11.4105750279}, {"Entity": "Australia", "Year": 2013, "rank": 11, "renewable_percentage": 14.8708583355}, {"Entity": "Australia", "Year": 2014, "rank": 11, "renewable_percentage": 14.9621290509}, {"Entity": "Australia", "Year": 2015, "rank": 12, "renewable_percentage": 
14.3476000693}, {"Entity": "Australia", "Year": 2016, "rank": 11, "renewable_percentage": 15.6093794449}, {"Entity": "Australia", "Year": 2017, "rank": 12, "renewable_percentage": 16.3138729943}, {"Entity": "Australia", "Year": 2018, "rank": 12, "renewable_percentage": 17.145938174}, {"Entity": "Australia", "Year": 2019, "rank": 8, "renewable_percentage": 21.3759705435}, {"Entity": "Australia", "Year": 2020, "rank": 8, "renewable_percentage": 25.5031684668}, {"Entity": "Brazil", "Year": 2000, "rank": 1, "renewable_percentage": 90.1307723743}, {"Entity": "Brazil", "Year": 2001, "rank": 1, "renewable_percentage": 84.6953615744}, {"Entity": "Brazil", "Year": 2002, "rank": 1, "renewable_percentage": 86.0883364189}, {"Entity": "Brazil", "Year": 2003, "rank": 1, "renewable_percentage": 87.4561159097}, {"Entity": "Brazil", "Year": 2004, "rank": 1, "renewable_percentage": 86.4260041451}, {"Entity": "Brazil", "Year": 2005, "rank": 1, "renewable_percentage": 87.6781562721}, {"Entity": "Brazil", "Year": 2006, "rank": 1, "renewable_percentage": 87.2842473236}, {"Entity": "Brazil", "Year": 2007, "rank": 1, "renewable_percentage": 88.7252098726}, {"Entity": "Brazil", "Year": 2008, "rank": 1, "renewable_percentage": 84.8072313004}, {"Entity": "Brazil", "Year": 2009, "rank": 1, "renewable_percentage": 89.4172280725}, {"Entity": "Brazil", "Year": 2010, "rank": 1, "renewable_percentage": 85.3576882415}, {"Entity": "Brazil", "Year": 2011, "rank": 1, "renewable_percentage": 87.6618820986}, {"Entity": "Brazil", "Year": 2012, "rank": 1, "renewable_percentage": 83.1164558813}, {"Entity": "Brazil", "Year": 2013, "rank": 1, "renewable_percentage": 77.5240022006}, {"Entity": "Brazil", "Year": 2014, "rank": 1, "renewable_percentage": 74.0418657409}, {"Entity": "Brazil", "Year": 2015, "rank": 1, "renewable_percentage": 75.0231817625}, {"Entity": "Brazil", "Year": 2016, "rank": 1, "renewable_percentage": 81.0938046902}, {"Entity": "Brazil", "Year": 2017, "rank": 1, "renewable_percentage": 
79.9091472228}, {"Entity": "Brazil", "Year": 2018, "rank": 1, "renewable_percentage": 82.9198505403}, {"Entity": "Brazil", "Year": 2019, "rank": 1, "renewable_percentage": 82.8548799017}, {"Entity": "Brazil", "Year": 2020, "rank": 1, "renewable_percentage": 84.6411771408}, {"Entity": "Canada", "Year": 2000, "rank": 2, "renewable_percentage": 61.8095917882}, {"Entity": "Canada", "Year": 2001, "rank": 2, "renewable_percentage": 59.3287558747}, {"Entity": "Canada", "Year": 2002, "rank": 2, "renewable_percentage": 61.1477403113}, {"Entity": "Canada", "Year": 2003, "rank": 2, "renewable_percentage": 60.0789685174}, {"Entity": "Canada", "Year": 2004, "rank": 2, "renewable_percentage": 59.6967771845}, {"Entity": "Canada", "Year": 2005, "rank": 2, "renewable_percentage": 60.8208155391}, {"Entity": "Canada", "Year": 2006, "rank": 2, "renewable_percentage": 60.8271602855}, {"Entity": "Canada", "Year": 2007, "rank": 2, "renewable_percentage": 61.2460642446}, {"Entity": "Canada", "Year": 2008, "rank": 2, "renewable_percentage": 62.6520720838}, {"Entity": "Canada", "Year": 2009, "rank": 2, "renewable_percentage": 63.8919227732}, {"Entity": "Canada", "Year": 2010, "rank": 2, "renewable_percentage": 62.9421470558}, {"Entity": "Canada", "Year": 2011, "rank": 2, "renewable_percentage": 64.0922915917}, {"Entity": "Canada", "Year": 2012, "rank": 2, "renewable_percentage": 65.098730952}, {"Entity": "Canada", "Year": 2013, "rank": 2, "renewable_percentage": 65.4320794066}, {"Entity": "Canada", "Year": 2014, "rank": 2, "renewable_percentage": 64.791145907}, {"Entity": "Canada", "Year": 2015, "rank": 2, "renewable_percentage": 65.2946239925}, {"Entity": "Canada", "Year": 2016, "rank": 2, "renewable_percentage": 66.1890584295}, {"Entity": "Canada", "Year": 2017, "rank": 2, "renewable_percentage": 67.5399410579}, {"Entity": "Canada", "Year": 2018, "rank": 2, "renewable_percentage": 67.3685700357}, {"Entity": "Canada", "Year": 2019, "rank": 2, "renewable_percentage": 67.1741623137}, 
{"Entity": "Canada", "Year": 2020, "rank": 2, "renewable_percentage": 68.7796436354}, {"Entity": "China", "Year": 2000, "rank": 5, "renewable_percentage": 16.639126586}, {"Entity": "China", "Year": 2001, "rank": 6, "renewable_percentage": 18.9581237042}, {"Entity": "China", "Year": 2002, "rank": 3, "renewable_percentage": 17.6185006046}, {"Entity": "China", "Year": 2003, "rank": 6, "renewable_percentage": 15.0362717081}, {"Entity": "China", "Year": 2004, "rank": 6, "renewable_percentage": 16.2224108273}, {"Entity": "China", "Year": 2005, "rank": 5, "renewable_percentage": 16.1731179957}, {"Entity": "China", "Year": 2006, "rank": 7, "renewable_percentage": 15.5884036124}, {"Entity": "China", "Year": 2007, "rank": 7, "renewable_percentage": 15.2583847828}, {"Entity": "China", "Year": 2008, "rank": 5, "renewable_percentage": 19.0253335469}, {"Entity": "China", "Year": 2009, "rank": 5, "renewable_percentage": 17.8857170547}, {"Entity": "China", "Year": 2010, "rank": 6, "renewable_percentage": 18.7800759915}, {"Entity": "China", "Year": 2011, "rank": 7, "renewable_percentage": 16.8902341543}, {"Entity": "China", "Year": 2012, "rank": 6, "renewable_percentage": 20.122965176}, {"Entity": "China", "Year": 2013, "rank": 6, "renewable_percentage": 20.2152481955}, {"Entity": "China", "Year": 2014, "rank": 6, "renewable_percentage": 22.3502204285}, {"Entity": "China", "Year": 2015, "rank": 7, "renewable_percentage": 24.079270189}, {"Entity": "China", "Year": 2016, "rank": 6, "renewable_percentage": 25.0007798429}, {"Entity": "China", "Year": 2017, "rank": 7, "renewable_percentage": 25.419242299}, {"Entity": "China", "Year": 2018, "rank": 7, "renewable_percentage": 25.7747942589}, {"Entity": "China", "Year": 2019, "rank": 7, "renewable_percentage": 26.9995671106}, {"Entity": "China", "Year": 2020, "rank": 7, "renewable_percentage": 28.2464606924}, {"Entity": "France", "Year": 2000, "rank": 8, "renewable_percentage": 12.7117691154}, {"Entity": "France", "Year": 2001, "rank": 7, 
"renewable_percentage": 13.9961372206}, {"Entity": "France", "Year": 2002, "rank": 8, "renewable_percentage": 11.3544157067}, {"Entity": "France", "Year": 2003, "rank": 9, "renewable_percentage": 10.9783540506}, {"Entity": "France", "Year": 2004, "rank": 9, "renewable_percentage": 11.0051305559}, {"Entity": "France", "Year": 2005, "rank": 10, "renewable_percentage": 9.6479837153}, {"Entity": "France", "Year": 2006, "rank": 10, "renewable_percentage": 10.7235915493}, {"Entity": "France", "Year": 2007, "rank": 9, "renewable_percentage": 11.4370075239}, {"Entity": "France", "Year": 2008, "rank": 9, "renewable_percentage": 12.7487441615}, {"Entity": "France", "Year": 2009, "rank": 9, "renewable_percentage": 12.8776856068}, {"Entity": "France", "Year": 2010, "rank": 9, "renewable_percentage": 13.6240072491}, {"Entity": "France", "Year": 2011, "rank": 10, "renewable_percentage": 11.63553049}, {"Entity": "France", "Year": 2012, "rank": 9, "renewable_percentage": 15.0331522889}, {"Entity": "France", "Year": 2013, "rank": 7, "renewable_percentage": 17.2469424928}, {"Entity": "France", "Year": 2014, "rank": 9, "renewable_percentage": 16.6074992494}, {"Entity": "France", "Year": 2015, "rank": 9, "renewable_percentage": 16.002230276}, {"Entity": "France", "Year": 2016, "rank": 8, "renewable_percentage": 17.7212924013}, {"Entity": "France", "Year": 2017, "rank": 11, "renewable_percentage": 16.6576751547}, {"Entity": "France", "Year": 2018, "rank": 8, "renewable_percentage": 19.7315179827}, {"Entity": "France", "Year": 2019, "rank": 9, "renewable_percentage": 20.0116665488}, {"Entity": "France", "Year": 2020, "rank": 9, "renewable_percentage": 23.7610241821}, {"Entity": "Germany", "Year": 2000, "rank": 14, "renewable_percentage": 6.1977983575}, {"Entity": "Germany", "Year": 2001, "rank": 14, "renewable_percentage": 6.5132585197}, {"Entity": "Germany", "Year": 2002, "rank": 13, "renewable_percentage": 7.6431369854}, {"Entity": "Germany", "Year": 2003, "rank": 12, 
"renewable_percentage": 7.7455438643}, {"Entity": "Germany", "Year": 2004, "rank": 10, "renewable_percentage": 9.4989185292}, {"Entity": "Germany", "Year": 2005, "rank": 8, "renewable_percentage": 10.3356645637}, {"Entity": "Germany", "Year": 2006, "rank": 8, "renewable_percentage": 11.5129959829}, {"Entity": "Germany", "Year": 2007, "rank": 8, "renewable_percentage": 14.135471525}, {"Entity": "Germany", "Year": 2008, "rank": 8, "renewable_percentage": 14.8894504106}, {"Entity": "Germany", "Year": 2009, "rank": 7, "renewable_percentage": 16.2902842395}, {"Entity": "Germany", "Year": 2010, "rank": 7, "renewable_percentage": 16.8384989754}, {"Entity": "Germany", "Year": 2011, "rank": 5, "renewable_percentage": 20.4967199299}, {"Entity": "Germany", "Year": 2012, "rank": 5, "renewable_percentage": 23.056464482}, {"Entity": "Germany", "Year": 2013, "rank": 5, "renewable_percentage": 24.1368929731}, {"Entity": "Germany", "Year": 2014, "rank": 5, "renewable_percentage": 26.2182434067}, {"Entity": "Germany", "Year": 2015, "rank": 5, "renewable_percentage": 29.4721888318}, {"Entity": "Germany", "Year": 2016, "rank": 5, "renewable_percentage": 29.4990435013}, {"Entity": "Germany", "Year": 2017, "rank": 4, "renewable_percentage": 33.4855497593}, {"Entity": "Germany", "Year": 2018, "rank": 5, "renewable_percentage": 35.0976735365}, {"Entity": "Germany", "Year": 2019, "rank": 3, "renewable_percentage": 40.0890757144}, {"Entity": "Germany", "Year": 2020, "rank": 3, "renewable_percentage": 44.3324048937}, {"Entity": "India", "Year": 2000, "rank": 7, "renewable_percentage": 14.0481982534}, {"Entity": "India", "Year": 2001, "rank": 8, "renewable_percentage": 12.9997099422}, {"Entity": "India", "Year": 2002, "rank": 7, "renewable_percentage": 11.938193032}, {"Entity": "India", "Year": 2003, "rank": 8, "renewable_percentage": 11.695109147}, {"Entity": "India", "Year": 2004, "rank": 7, "renewable_percentage": 15.6375300722}, {"Entity": "India", "Year": 2005, "rank": 6, 
"renewable_percentage": 15.2543575768}, {"Entity": "India", "Year": 2006, "rank": 5, "renewable_percentage": 17.1352578483}, {"Entity": "India", "Year": 2007, "rank": 4, "renewable_percentage": 17.8019742295}, {"Entity": "India", "Year": 2008, "rank": 7, "renewable_percentage": 16.768266921}, {"Entity": "India", "Year": 2009, "rank": 8, "renewable_percentage": 15.269804822}, {"Entity": "India", "Year": 2010, "rank": 8, "renewable_percentage": 15.2122201244}, {"Entity": "India", "Year": 2011, "rank": 8, "renewable_percentage": 16.7911025145}, {"Entity": "India", "Year": 2012, "rank": 8, "renewable_percentage": 15.1350014654}, {"Entity": "India", "Year": 2013, "rank": 8, "renewable_percentage": 16.3941577818}, {"Entity": "India", "Year": 2014, "rank": 10, "renewable_percentage": 16.0092550039}, {"Entity": "India", "Year": 2015, "rank": 11, "renewable_percentage": 15.3718720687}, {"Entity": "India", "Year": 2016, "rank": 13, "renewable_percentage": 14.8548475703}, {"Entity": "India", "Year": 2017, "rank": 13, "renewable_percentage": 15.9669920335}, {"Entity": "India", "Year": 2018, "rank": 13, "renewable_percentage": 16.6949549709}, {"Entity": "India", "Year": 2019, "rank": 11, "renewable_percentage": 18.6915426873}, {"Entity": "India", "Year": 2020, "rank": 13, "renewable_percentage": 20.2059243238}, {"Entity": "Indonesia", "Year": 2000, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2001, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2002, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2003, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2004, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2005, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2006, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2007, "rank": null, "renewable_percentage": null}, {"Entity": 
"Indonesia", "Year": 2008, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2009, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2010, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2011, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2012, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2013, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2014, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2015, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2016, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2017, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2018, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2019, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2020, "rank": null, "renewable_percentage": null}, {"Entity": "Italy", "Year": 2000, "rank": 4, "renewable_percentage": 18.900241501}, {"Entity": "Italy", "Year": 2001, "rank": 4, "renewable_percentage": 20.049431902}, {"Entity": "Italy", "Year": 2002, "rank": 4, "renewable_percentage": 17.4555571614}, {"Entity": "Italy", "Year": 2003, "rank": 4, "renewable_percentage": 16.4202116476}, {"Entity": "Italy", "Year": 2004, "rank": 4, "renewable_percentage": 18.2749380999}, {"Entity": "Italy", "Year": 2005, "rank": 4, "renewable_percentage": 16.3769782226}, {"Entity": "Italy", "Year": 2006, "rank": 6, "renewable_percentage": 16.5128639906}, {"Entity": "Italy", "Year": 2007, "rank": 6, "renewable_percentage": 15.5333485238}, {"Entity": "Italy", "Year": 2008, "rank": 6, "renewable_percentage": 18.6112}, {"Entity": "Italy", "Year": 2009, "rank": 4, "renewable_percentage": 24.0837332221}, {"Entity": "Italy", "Year": 2010, "rank": 4, "renewable_percentage": 25.8400187976}, 
{"Entity": "Italy", "Year": 2011, "rank": 4, "renewable_percentage": 27.6773203443}, {"Entity": "Italy", "Year": 2012, "rank": 3, "renewable_percentage": 31.1049649217}, {"Entity": "Italy", "Year": 2013, "rank": 4, "renewable_percentage": 39.0148744209}, {"Entity": "Italy", "Year": 2014, "rank": 3, "renewable_percentage": 43.4976931949}, {"Entity": "Italy", "Year": 2015, "rank": 3, "renewable_percentage": 38.7577860829}, {"Entity": "Italy", "Year": 2016, "rank": 4, "renewable_percentage": 37.6079387187}, {"Entity": "Italy", "Year": 2017, "rank": 3, "renewable_percentage": 35.4174479255}, {"Entity": "Italy", "Year": 2018, "rank": 3, "renewable_percentage": 39.8100142663}, {"Entity": "Italy", "Year": 2019, "rank": 4, "renewable_percentage": 39.7563068474}, {"Entity": "Italy", "Year": 2020, "rank": 6, "renewable_percentage": 42.0397741576}, {"Entity": "Japan", "Year": 2000, "rank": 9, "renewable_percentage": 10.5382436261}, {"Entity": "Japan", "Year": 2001, "rank": 9, "renewable_percentage": 10.447653504}, {"Entity": "Japan", "Year": 2002, "rank": 9, "renewable_percentage": 10.2477294843}, {"Entity": "Japan", "Year": 2003, "rank": 7, "renewable_percentage": 11.6993698448}, {"Entity": "Japan", "Year": 2004, "rank": 8, "renewable_percentage": 11.4198974767}, {"Entity": "Japan", "Year": 2005, "rank": 9, "renewable_percentage": 9.9068127192}, {"Entity": "Japan", "Year": 2006, "rank": 9, "renewable_percentage": 10.8554989442}, {"Entity": "Japan", "Year": 2007, "rank": 10, "renewable_percentage": 9.3897588285}, {"Entity": "Japan", "Year": 2008, "rank": 10, "renewable_percentage": 10.0196834738}, {"Entity": "Japan", "Year": 2009, "rank": 11, "renewable_percentage": 10.4667464874}, {"Entity": "Japan", "Year": 2010, "rank": 10, "renewable_percentage": 10.5269966826}, {"Entity": "Japan", "Year": 2011, "rank": 12, "renewable_percentage": 11.1272421632}, {"Entity": "Japan", "Year": 2012, "rank": 13, "renewable_percentage": 10.6143703421}, {"Entity": "Japan", "Year": 2013, "rank": 
13, "renewable_percentage": 11.7965798852}, {"Entity": "Japan", "Year": 2014, "rank": 13, "renewable_percentage": 13.2719619718}, {"Entity": "Japan", "Year": 2015, "rank": 10, "renewable_percentage": 15.6586817408}, {"Entity": "Japan", "Year": 2016, "rank": 10, "renewable_percentage": 15.6920107068}, {"Entity": "Japan", "Year": 2017, "rank": 10, "renewable_percentage": 17.3559698312}, {"Entity": "Japan", "Year": 2018, "rank": 9, "renewable_percentage": 18.144181175}, {"Entity": "Japan", "Year": 2019, "rank": 10, "renewable_percentage": 19.4223288251}, {"Entity": "Japan", "Year": 2020, "rank": 10, "renewable_percentage": 21.324925062}, {"Entity": "Kazakhstan", "Year": 2000, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2001, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2002, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2003, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2004, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2005, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2006, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2007, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2008, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2009, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2010, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2011, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2012, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2013, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2014, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2015, "rank": null, "renewable_percentage": null}, {"Entity": 
"Kazakhstan", "Year": 2016, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2017, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2018, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2019, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2020, "rank": null, "renewable_percentage": null}, {"Entity": "Mexico", "Year": 2000, "rank": 3, "renewable_percentage": 22.9291160107}, {"Entity": "Mexico", "Year": 2001, "rank": 5, "renewable_percentage": 19.6649599841}, {"Entity": "Mexico", "Year": 2002, "rank": 5, "renewable_percentage": 17.4220963173}, {"Entity": "Mexico", "Year": 2003, "rank": 5, "renewable_percentage": 15.8536585366}, {"Entity": "Mexico", "Year": 2004, "rank": 5, "renewable_percentage": 17.3134463687}, {"Entity": "Mexico", "Year": 2005, "rank": 3, "renewable_percentage": 18.2780827246}, {"Entity": "Mexico", "Year": 2006, "rank": 3, "renewable_percentage": 18.4256091896}, {"Entity": "Mexico", "Year": 2007, "rank": 5, "renewable_percentage": 17.2761561168}, {"Entity": "Mexico", "Year": 2008, "rank": 3, "renewable_percentage": 21.5387105913}, {"Entity": "Mexico", "Year": 2009, "rank": 6, "renewable_percentage": 16.5369729069}, {"Entity": "Mexico", "Year": 2010, "rank": 5, "renewable_percentage": 19.4281608109}, {"Entity": "Mexico", "Year": 2011, "rank": 6, "renewable_percentage": 18.0916357408}, {"Entity": "Mexico", "Year": 2012, "rank": 7, "renewable_percentage": 16.5759438104}, {"Entity": "Mexico", "Year": 2013, "rank": 9, "renewable_percentage": 15.5492898914}, {"Entity": "Mexico", "Year": 2014, "rank": 7, "renewable_percentage": 19.8008201523}, {"Entity": "Mexico", "Year": 2015, "rank": 8, "renewable_percentage": 17.5976903451}, {"Entity": "Mexico", "Year": 2016, "rank": 9, "renewable_percentage": 17.4806943436}, {"Entity": "Mexico", "Year": 2017, "rank": 8, "renewable_percentage": 18.0759526428}, {"Entity": "Mexico", "Year": 2018, "rank": 
10, "renewable_percentage": 17.703752786}, {"Entity": "Mexico", "Year": 2019, "rank": 12, "renewable_percentage": 18.5487927565}, {"Entity": "Mexico", "Year": 2020, "rank": 11, "renewable_percentage": 21.2552224134}, {"Entity": "Poland", "Year": 2000, "rank": 16, "renewable_percentage": 1.6273222517}, {"Entity": "Poland", "Year": 2001, "rank": 16, "renewable_percentage": 1.934316727}, {"Entity": "Poland", "Year": 2002, "rank": 16, "renewable_percentage": 1.9439960699}, {"Entity": "Poland", "Year": 2003, "rank": 16, "renewable_percentage": 1.4999000067}, {"Entity": "Poland", "Year": 2004, "rank": 16, "renewable_percentage": 2.1016681991}, {"Entity": "Poland", "Year": 2005, "rank": 16, "renewable_percentage": 2.4830699774}, {"Entity": "Poland", "Year": 2006, "rank": 16, "renewable_percentage": 2.673730134}, {"Entity": "Poland", "Year": 2007, "rank": 16, "renewable_percentage": 3.4256513785}, {"Entity": "Poland", "Year": 2008, "rank": 16, "renewable_percentage": 4.2744438696}, {"Entity": "Poland", "Year": 2009, "rank": 16, "renewable_percentage": 5.7515388179}, {"Entity": "Poland", "Year": 2010, "rank": 14, "renewable_percentage": 6.9299363057}, {"Entity": "Poland", "Year": 2011, "rank": 14, "renewable_percentage": 8.0547205693}, {"Entity": "Poland", "Year": 2012, "rank": 14, "renewable_percentage": 10.4436057663}, {"Entity": "Poland", "Year": 2013, "rank": 14, "renewable_percentage": 10.4081508145}, {"Entity": "Poland", "Year": 2014, "rank": 14, "renewable_percentage": 12.5331481248}, {"Entity": "Poland", "Year": 2015, "rank": 13, "renewable_percentage": 13.8151485631}, {"Entity": "Poland", "Year": 2016, "rank": 14, "renewable_percentage": 13.7335179722}, {"Entity": "Poland", "Year": 2017, "rank": 14, "renewable_percentage": 14.1999646913}, {"Entity": "Poland", "Year": 2018, "rank": 15, "renewable_percentage": 12.7559148032}, {"Entity": "Poland", "Year": 2019, "rank": 14, "renewable_percentage": 15.6157998037}, {"Entity": "Poland", "Year": 2020, "rank": 14, 
"renewable_percentage": 17.9648720886}, {"Entity": "Saudi Arabia", "Year": 2000, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2001, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2002, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2003, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2004, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2005, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2006, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2007, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2008, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2009, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2010, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2011, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2012, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2013, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2014, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2015, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2016, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2017, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2018, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2019, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2020, "rank": null, "renewable_percentage": null}, {"Entity": "South Africa", "Year": 2000, "rank": 17, "renewable_percentage": 0.9110805721}, {"Entity": "South Africa", "Year": 2001, "rank": 17, 
"renewable_percentage": 1.2516536074}, {"Entity": "South Africa", "Year": 2002, "rank": 17, "renewable_percentage": 1.3802249619}, {"Entity": "South Africa", "Year": 2003, "rank": 17, "renewable_percentage": 0.545271261}, {"Entity": "South Africa", "Year": 2004, "rank": 17, "renewable_percentage": 0.5827199439}, {"Entity": "South Africa", "Year": 2005, "rank": 17, "renewable_percentage": 0.763458686}, {"Entity": "South Africa", "Year": 2006, "rank": 17, "renewable_percentage": 1.3863060017}, {"Entity": "South Africa", "Year": 2007, "rank": 17, "renewable_percentage": 0.5267209594}, {"Entity": "South Africa", "Year": 2008, "rank": 17, "renewable_percentage": 0.6895692269}, {"Entity": "South Africa", "Year": 2009, "rank": 17, "renewable_percentage": 0.8031088083}, {"Entity": "South Africa", "Year": 2010, "rank": 17, "renewable_percentage": 1.0330068318}, {"Entity": "South Africa", "Year": 2011, "rank": 17, "renewable_percentage": 1.0184465622}, {"Entity": "South Africa", "Year": 2012, "rank": 17, "renewable_percentage": 0.6890826069}, {"Entity": "South Africa", "Year": 2013, "rank": 17, "renewable_percentage": 0.6792168043}, {"Entity": "South Africa", "Year": 2014, "rank": 17, "renewable_percentage": 1.4288129861}, {"Entity": "South Africa", "Year": 2015, "rank": 17, "renewable_percentage": 2.6256790549}, {"Entity": "South Africa", "Year": 2016, "rank": 17, "renewable_percentage": 3.2586126531}, {"Entity": "South Africa", "Year": 2017, "rank": 17, "renewable_percentage": 4.2202606137}, {"Entity": "South Africa", "Year": 2018, "rank": 17, "renewable_percentage": 5.1554655529}, {"Entity": "South Africa", "Year": 2019, "rank": 17, "renewable_percentage": 5.3589699864}, {"Entity": "South Africa", "Year": 2020, "rank": 17, "renewable_percentage": 5.780581212}, {"Entity": "Spain", "Year": 2000, "rank": 6, "renewable_percentage": 15.6119862394}, {"Entity": "Spain", "Year": 2001, "rank": 3, "renewable_percentage": 21.1524434719}, {"Entity": "Spain", "Year": 2002, "rank": 6, 
"renewable_percentage": 13.8260180901}, {"Entity": "Spain", "Year": 2003, "rank": 3, "renewable_percentage": 21.667314419}, {"Entity": "Spain", "Year": 2004, "rank": 3, "renewable_percentage": 18.3190206468}, {"Entity": "Spain", "Year": 2005, "rank": 7, "renewable_percentage": 14.8597342333}, {"Entity": "Spain", "Year": 2006, "rank": 4, "renewable_percentage": 17.6623992413}, {"Entity": "Spain", "Year": 2007, "rank": 3, "renewable_percentage": 19.3347262296}, {"Entity": "Spain", "Year": 2008, "rank": 4, "renewable_percentage": 20.0051501593}, {"Entity": "Spain", "Year": 2009, "rank": 3, "renewable_percentage": 25.4107639008}, {"Entity": "Spain", "Year": 2010, "rank": 3, "renewable_percentage": 32.7922186819}, {"Entity": "Spain", "Year": 2011, "rank": 3, "renewable_percentage": 30.0408415417}, {"Entity": "Spain", "Year": 2012, "rank": 4, "renewable_percentage": 29.6047928652}, {"Entity": "Spain", "Year": 2013, "rank": 3, "renewable_percentage": 39.5850357054}, {"Entity": "Spain", "Year": 2014, "rank": 4, "renewable_percentage": 40.1032952644}, {"Entity": "Spain", "Year": 2015, "rank": 4, "renewable_percentage": 34.9899091826}, {"Entity": "Spain", "Year": 2016, "rank": 3, "renewable_percentage": 38.5818061138}, {"Entity": "Spain", "Year": 2017, "rank": 5, "renewable_percentage": 32.220593624}, {"Entity": "Spain", "Year": 2018, "rank": 4, "renewable_percentage": 38.2080329557}, {"Entity": "Spain", "Year": 2019, "rank": 6, "renewable_percentage": 37.280815091}, {"Entity": "Spain", "Year": 2020, "rank": 4, "renewable_percentage": 43.8108805298}, {"Entity": "Thailand", "Year": 2000, "rank": 12, "renewable_percentage": 7.1261029822}, {"Entity": "Thailand", "Year": 2001, "rank": 12, "renewable_percentage": 7.061527212}, {"Entity": "Thailand", "Year": 2002, "rank": 12, "renewable_percentage": 7.9444772593}, {"Entity": "Thailand", "Year": 2003, "rank": 13, "renewable_percentage": 7.6718362852}, {"Entity": "Thailand", "Year": 2004, "rank": 13, "renewable_percentage": 
6.5163549406}, {"Entity": "Thailand", "Year": 2005, "rank": 14, "renewable_percentage": 6.0325203252}, {"Entity": "Thailand", "Year": 2006, "rank": 13, "renewable_percentage": 7.5988547551}, {"Entity": "Thailand", "Year": 2007, "rank": 13, "renewable_percentage": 7.7085852479}, {"Entity": "Thailand", "Year": 2008, "rank": 13, "renewable_percentage": 6.5625458278}, {"Entity": "Thailand", "Year": 2009, "rank": 15, "renewable_percentage": 6.6263303689}, {"Entity": "Thailand", "Year": 2010, "rank": 16, "renewable_percentage": 5.7085828343}, {"Entity": "Thailand", "Year": 2011, "rank": 15, "renewable_percentage": 8.039961941}, {"Entity": "Thailand", "Year": 2012, "rank": 15, "renewable_percentage": 8.5396118358}, {"Entity": "Thailand", "Year": 2013, "rank": 16, "renewable_percentage": 7.6765035487}, {"Entity": "Thailand", "Year": 2014, "rank": 15, "renewable_percentage": 8.395728489}, {"Entity": "Thailand", "Year": 2015, "rank": 15, "renewable_percentage": 7.9949619145}, {"Entity": "Thailand", "Year": 2016, "rank": 15, "renewable_percentage": 8.9840234023}, {"Entity": "Thailand", "Year": 2017, "rank": 15, "renewable_percentage": 10.9570957096}, {"Entity": "Thailand", "Year": 2018, "rank": 14, "renewable_percentage": 14.1900054915}, {"Entity": "Thailand", "Year": 2019, "rank": 15, "renewable_percentage": 14.7001731284}, {"Entity": "Thailand", "Year": 2020, "rank": 15, "renewable_percentage": 13.7963737796}, {"Entity": "Ukraine", "Year": 2000, "rank": 13, "renewable_percentage": 6.5860921352}, {"Entity": "Ukraine", "Year": 2001, "rank": 13, "renewable_percentage": 6.9729761009}, {"Entity": "Ukraine", "Year": 2002, "rank": 14, "renewable_percentage": 5.5597165409}, {"Entity": "Ukraine", "Year": 2003, "rank": 14, "renewable_percentage": 5.1442841287}, {"Entity": "Ukraine", "Year": 2004, "rank": 14, "renewable_percentage": 6.4718162839}, {"Entity": "Ukraine", "Year": 2005, "rank": 13, "renewable_percentage": 6.6698940347}, {"Entity": "Ukraine", "Year": 2006, "rank": 14, 
"renewable_percentage": 6.68633235}, {"Entity": "Ukraine", "Year": 2007, "rank": 14, "renewable_percentage": 5.3380238605}, {"Entity": "Ukraine", "Year": 2008, "rank": 14, "renewable_percentage": 6.1377090041}, {"Entity": "Ukraine", "Year": 2009, "rank": 13, "renewable_percentage": 6.980762585}, {"Entity": "Ukraine", "Year": 2010, "rank": 13, "renewable_percentage": 7.0914098083}, {"Entity": "Ukraine", "Year": 2011, "rank": 16, "renewable_percentage": 5.7450628366}, {"Entity": "Ukraine", "Year": 2012, "rank": 16, "renewable_percentage": 5.6614236741}, {"Entity": "Ukraine", "Year": 2013, "rank": 15, "renewable_percentage": 7.8003200661}, {"Entity": "Ukraine", "Year": 2014, "rank": 16, "renewable_percentage": 5.5885262117}, {"Entity": "Ukraine", "Year": 2015, "rank": 16, "renewable_percentage": 4.3924771096}, {"Entity": "Ukraine", "Year": 2016, "rank": 16, "renewable_percentage": 5.6797249171}, {"Entity": "Ukraine", "Year": 2017, "rank": 16, "renewable_percentage": 7.0457194664}, {"Entity": "Ukraine", "Year": 2018, "rank": 16, "renewable_percentage": 8.228528092}, {"Entity": "Ukraine", "Year": 2019, "rank": 16, "renewable_percentage": 7.7754487096}, {"Entity": "Ukraine", "Year": 2020, "rank": 16, "renewable_percentage": 11.8440577364}, {"Entity": "United Kingdom", "Year": 2000, "rank": 15, "renewable_percentage": 2.6657406913}, {"Entity": "United Kingdom", "Year": 2001, "rank": 15, "renewable_percentage": 2.5001961451}, {"Entity": "United Kingdom", "Year": 2002, "rank": 15, "renewable_percentage": 2.8939157566}, {"Entity": "United Kingdom", "Year": 2003, "rank": 15, "renewable_percentage": 2.6854802003}, {"Entity": "United Kingdom", "Year": 2004, "rank": 15, "renewable_percentage": 3.6136880575}, {"Entity": "United Kingdom", "Year": 2005, "rank": 15, "renewable_percentage": 4.2815234434}, {"Entity": "United Kingdom", "Year": 2006, "rank": 15, "renewable_percentage": 4.6029890199}, {"Entity": "United Kingdom", "Year": 2007, "rank": 15, "renewable_percentage": 
5.0104331009}, {"Entity": "United Kingdom", "Year": 2008, "rank": 15, "renewable_percentage": 5.6776842324}, {"Entity": "United Kingdom", "Year": 2009, "rank": 14, "renewable_percentage": 6.7679854187}, {"Entity": "United Kingdom", "Year": 2010, "rank": 15, "renewable_percentage": 6.9092924441}, {"Entity": "United Kingdom", "Year": 2011, "rank": 13, "renewable_percentage": 9.6422505889}, {"Entity": "United Kingdom", "Year": 2012, "rank": 11, "renewable_percentage": 11.4273047189}, {"Entity": "United Kingdom", "Year": 2013, "rank": 10, "renewable_percentage": 14.9727052732}, {"Entity": "United Kingdom", "Year": 2014, "rank": 8, "renewable_percentage": 19.2476358104}, {"Entity": "United Kingdom", "Year": 2015, "rank": 6, "renewable_percentage": 24.6227709191}, {"Entity": "United Kingdom", "Year": 2016, "rank": 7, "renewable_percentage": 24.6788390627}, {"Entity": "United Kingdom", "Year": 2017, "rank": 6, "renewable_percentage": 29.4986571173}, {"Entity": "United Kingdom", "Year": 2018, "rank": 6, "renewable_percentage": 33.2919818457}, {"Entity": "United Kingdom", "Year": 2019, "rank": 5, "renewable_percentage": 37.4568630499}, {"Entity": "United Kingdom", "Year": 2020, "rank": 5, "renewable_percentage": 42.8603962651}, {"Entity": "United States", "Year": 2000, "rank": 10, "renewable_percentage": 9.2298992662}, {"Entity": "United States", "Year": 2001, "rank": 11, "renewable_percentage": 7.5132056541}, {"Entity": "United States", "Year": 2002, "rank": 10, "renewable_percentage": 8.749216358}, {"Entity": "United States", "Year": 2003, "rank": 10, "renewable_percentage": 9.0252110397}, {"Entity": "United States", "Year": 2004, "rank": 11, "renewable_percentage": 8.7334100887}, {"Entity": "United States", "Year": 2005, "rank": 12, "renewable_percentage": 8.7494640631}, {"Entity": "United States", "Year": 2006, "rank": 12, "renewable_percentage": 9.4184742052}, {"Entity": "United States", "Year": 2007, "rank": 12, "renewable_percentage": 8.3984096829}, {"Entity": 
"United States", "Year": 2008, "rank": 11, "renewable_percentage": 9.180943292}, {"Entity": "United States", "Year": 2009, "rank": 10, "renewable_percentage": 10.547689996}, {"Entity": "United States", "Year": 2010, "rank": 11, "renewable_percentage": 10.3180892283}, {"Entity": "United States", "Year": 2011, "rank": 9, "renewable_percentage": 12.4665249812}, {"Entity": "United States", "Year": 2012, "rank": 10, "renewable_percentage": 12.1841179804}, {"Entity": "United States", "Year": 2013, "rank": 12, "renewable_percentage": 12.8311154179}, {"Entity": "United States", "Year": 2014, "rank": 12, "renewable_percentage": 13.3503092033}, {"Entity": "United States", "Year": 2015, "rank": 14, "renewable_percentage": 13.6261352256}, {"Entity": "United States", "Year": 2016, "rank": 12, "renewable_percentage": 15.2880936304}, {"Entity": "United States", "Year": 2017, "rank": 9, "renewable_percentage": 17.4515276472}, {"Entity": "United States", "Year": 2018, "rank": 11, "renewable_percentage": 17.4499886946}, {"Entity": "United States", "Year": 2019, "rank": 13, "renewable_percentage": 18.2946243489}, {"Entity": "United States", "Year": 2020, "rank": 12, "renewable_percentage": 20.3156921037}], "metadata": {"Entity": {"type": "string", "semanticType": "Location"}, "Year": {"type": "number", "semanticType": "Year"}, "rank": {"type": "number", "semanticType": "Number"}, "renewable_percentage": {"type": "number", "semanticType": "Percentage"}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero 
cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Rank countries by renewable percentage within each year (highest percentage = rank 1)\n df_energy['rank'] = df_energy.groupby('Year')['renewable_percentage'].rank(\n method='dense', \n ascending=False\n )\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage', 'rank']].copy()\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". 
Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. \n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) 
to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 158029.9988, 190729.9957, 209229.9957, ..., 6993180.176, 9861099.609, 9874660.156, 10096009.77\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 90.14, 91.79, 92.54, 804.95\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Transform the data into a long format where different energy sources (fossil fuels, nuclear, renewables) are unpivoted into separate rows. Each row should contain the year, entity (country), energy source type, and the corresponding energy value in TWh. 
This will enable visualization of energy generation trends over time by source and country.\",\n \"display_instruction\": \"Show **energy generation** trends by **source** and **Entity** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"source\", \"energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n },\n \"reason\": \"The user wants to visualize energy data across different sources with a line chart faceted by source. The original data has energy sources in separate columns (fossil fuels, nuclear, renewables), so we need to unpivot these columns into a long format with 'source' and 'energy' fields. The chart_encodings provided by the user are appropriate for this visualization goal, using faceting (column) to separate different energy sources.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Unpivot the energy source columns into long format\n transformed_df = pd.melt(\n df_energy,\n id_vars=['Year', 'Entity'],\n value_vars=[\n 'Electricity from fossil fuels (TWh)',\n 'Electricity from nuclear (TWh)',\n 'Electricity from renewables (TWh)'\n ],\n var_name='source',\n value_name='energy'\n )\n \n # Clean up the source names to be more readable\n transformed_df['source'] = transformed_df['source'].str.replace('Electricity from ', '').str.replace(' (TWh)', '')\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Entity Year energy source\n0 Australia 2000 181.05 fossil fuels\n1 Australia 2001 194.33 fossil fuels\n2 Australia 2002 197.29 fossil fuels\n3 Australia 2003 195.13 fossil fuels\n4 Australia 2004 203.66 fossil fuels\n5 Australia 2005 195.95 fossil fuels\n6 Australia 2006 198.72 fossil fuels\n7 Australia 2007 208.59 fossil fuels\n8 Australia 2008 211.06 fossil fuels\n9 Australia 2009 216.42 fossil fuels\n......\n\nUpdate the code above 
based on the following instruction:\n\n{\n \"followup_instruction\": \"show renewable percentage for each country\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage for each country and year. The renewable percentage is computed as the ratio of renewable energy to total energy (sum of fossil fuels, nuclear, and renewables) multiplied by 100. The output should contain year, entity (country), and the renewable percentage value.\",\n \"display_instruction\": \"Show **renewable percentage** for each **Entity** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"renewable_percentage\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"renewable_percentage\",\n \"color\": \"Entity\"\n },\n \"reason\": \"The user wants to see renewable percentage trends, which requires calculating the proportion of renewable energy relative to total energy generation. This is a single metric per country per year, so we don't need the 'column' faceting by source anymore. 
The chart_encodings are updated to reflect that we're now showing a percentage metric rather than absolute energy values by source.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage']].copy()\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Entity Year renewable_percentage\n0 Australia 2000 8.634437\n1 Australia 2001 8.218014\n2 Australia 2002 8.083302\n3 Australia 2003 8.659832\n4 Australia 2004 8.701305\n5 Australia 2005 9.156236\n6 Australia 2006 9.635760\n7 Australia 2007 9.119031\n8 Australia 2008 8.054890\n9 Australia 2009 7.804379\n......\n\nUpdate the code above based on the following instruction:\n\n{\n \"followup_instruction\": \"rank countries by their renewable percentage\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"renewable_percentage\",\n \"color\": \"Entity\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage for each country and year, then rank countries based on their renewable percentage. The ranking should be computed for each year, showing which countries have the highest to lowest renewable percentage. 
The output should contain year, entity (country), renewable percentage, and the rank of each country within that year.\",\n \"display_instruction\": \"Show **countries ranked** by **renewable percentage** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"renewable_percentage\", \"rank\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"renewable_percentage\",\n \"color\": \"Entity\"\n },\n \"reason\": \"The user wants to rank countries by their renewable percentage, which requires adding a rank field that orders countries within each year based on their renewable percentage (highest percentage = rank 1). This allows comparison of countries' relative positions in renewable energy adoption over time. The chart_encodings remain the same as the previous step since we're still visualizing renewable percentage trends, but now with ranking information available.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Rank countries by renewable percentage within each year (highest percentage = rank 1)\n df_energy['rank'] = df_energy.groupby('Year')['renewable_percentage'].rank(\n method='dense', \n ascending=False\n )\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage', 'rank']].copy()\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "table-45", "resultTableId": 
"table-78", "chart": {"id": "chart-1760738441793", "chartType": "Line Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "concept-renewable_percentage-1760738424337"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-45", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "rank countries by their renewable percentage", "displayContent": "Show **countries ranked** by **renewable percentage** over **Year**"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "- Calculate **total_energy** for each country-year by summing electricity generated from **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n- Compute **renewable_percentage** by dividing **Electricity from renewables (TWh)** by **total_energy** and multiplying by **100**\n- Replace any infinite values in **renewable_percentage** with NaN to handle cases where total energy is zero\n- Assign a **rank** to each country within each **Year** based on **renewable_percentage**, where rank **1** represents the highest renewable percentage (using dense ranking method)\n- Return a dataset containing **Year**, **Entity** (country name), **renewable_percentage**, and **rank**", "concepts": [{"explanation": "The proportion of a country's total electricity generation that comes from renewable sources, expressed as a percentage. 
Calculated as: \\[ \\text{renewable\\_percentage} = \\frac{\\text{Electricity from renewables (TWh)}}{\\text{total\\_energy}} \\times 100 \\]", "field": "renewable_percentage"}], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... 
\\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. 
\n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 190729.9957, 227580.0018, 233600.0061, ..., 9282549.805, 9861099.609, 9874660.156, 10096009.77\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 88.76, 90.14, 91.79, 92.54\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # 
Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Rank countries by renewable percentage within each year (highest percentage = rank 1)\n df_energy['rank'] = df_energy.groupby('Year')['renewable_percentage'].rank(\n method='dense', \n ascending=False\n )\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage', 'rank']].copy()\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\n- Calculate **total_energy** for each country-year by summing electricity generated from **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n- Compute **renewable_percentage** by dividing **Electricity from renewables (TWh)** by **total_energy** and multiplying by **100**\n- Replace any infinite values in **renewable_percentage** with NaN to handle cases where total energy is zero\n- Assign a **rank** to each country within each **Year** based on **renewable_percentage**, where rank **1** represents the highest renewable percentage (using dense ranking method)\n- Return a dataset containing **Year**, **Entity** (country name), **renewable_percentage**, and **rank**\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"renewable_percentage\",\n \"explanation\": \"The proportion of a country's total electricity generation that comes from renewable sources, expressed as a percentage. 
Calculated as: \\\\[ \\\\text{renewable\\\\_percentage} = \\\\frac{\\\\text{Electricity from renewables (TWh)}}{\\\\text{total\\\\_energy}} \\\\times 100 \\\\]\"\n }\n]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-97", "displayId": "renewable-elec", "names": ["Electricity from renewables (TWh)", "Entity", "Year"], "rows": [{"Electricity from renewables (TWh)": 17.11, "Entity": "Australia", "Year": "2000"}, {"Electricity from renewables (TWh)": 63.99, "Entity": "Australia", "Year": "2020"}, {"Electricity from renewables (TWh)": 308.77, "Entity": "Brazil", "Year": "2000"}, {"Electricity from renewables (TWh)": 520.01, "Entity": "Brazil", "Year": "2020"}, {"Electricity from renewables (TWh)": 363.7, "Entity": "Canada", "Year": "2000"}, {"Electricity from renewables (TWh)": 429.24, "Entity": "Canada", "Year": "2020"}, {"Electricity from renewables (TWh)": 225.56, "Entity": "China", "Year": "2000"}, {"Electricity from renewables (TWh)": 2184.94, "Entity": "China", "Year": "2020"}, {"Electricity from renewables (TWh)": 67.83, "Entity": "France", "Year": "2000"}, {"Electricity from renewables (TWh)": 125.28, "Entity": "France", "Year": "2020"}, {"Electricity from renewables (TWh)": 35.47, "Entity": "Germany", "Year": "2000"}, {"Electricity from renewables (TWh)": 251.48, "Entity": "Germany", "Year": "2020"}, {"Electricity from renewables (TWh)": 80.27, "Entity": "India", "Year": "2000"}, {"Electricity from renewables (TWh)": 315.76, "Entity": "India", "Year": "2020"}, {"Electricity from renewables (TWh)": 19.6, "Entity": "Indonesia", "Year": "2000"}, {"Electricity from renewables (TWh)": 52.91, "Entity": "Indonesia", "Year": "2020"}, {"Electricity from renewables (TWh)": 50.87, "Entity": "Italy", "Year": "2000"}, {"Electricity from renewables (TWh)": 116.9, "Entity": "Italy", "Year": "2020"}, {"Electricity from renewables (TWh)": 104.16, "Entity": "Japan", "Year": "2000"}, 
{"Electricity from renewables (TWh)": 205.6, "Entity": "Japan", "Year": "2020"}, {"Electricity from renewables (TWh)": 7.53, "Entity": "Kazakhstan", "Year": "2000"}, {"Electricity from renewables (TWh)": 11.94, "Entity": "Kazakhstan", "Year": "2020"}, {"Electricity from renewables (TWh)": 44.51, "Entity": "Mexico", "Year": "2000"}, {"Electricity from renewables (TWh)": 69.19, "Entity": "Mexico", "Year": "2020"}, {"Electricity from renewables (TWh)": 2.33, "Entity": "Poland", "Year": "2000"}, {"Electricity from renewables (TWh)": 28.23, "Entity": "Poland", "Year": "2020"}, {"Electricity from renewables (TWh)": 0, "Entity": "Saudi Arabia", "Year": "2000"}, {"Electricity from renewables (TWh)": 0.21, "Entity": "Saudi Arabia", "Year": "2020"}, {"Electricity from renewables (TWh)": 1.79, "Entity": "South Africa", "Year": "2000"}, {"Electricity from renewables (TWh)": 12.83, "Entity": "South Africa", "Year": "2020"}, {"Electricity from renewables (TWh)": 34.49, "Entity": "Spain", "Year": "2000"}, {"Electricity from renewables (TWh)": 113.79, "Entity": "Spain", "Year": "2020"}, {"Electricity from renewables (TWh)": 6.38, "Entity": "Thailand", "Year": "2000"}, {"Electricity from renewables (TWh)": 24.73, "Entity": "Thailand", "Year": "2020"}, {"Electricity from renewables (TWh)": 11.28, "Entity": "Ukraine", "Year": "2000"}, {"Electricity from renewables (TWh)": 17.56, "Entity": "Ukraine", "Year": "2020"}, {"Electricity from renewables (TWh)": 9.98, "Entity": "United Kingdom", "Year": "2000"}, {"Electricity from renewables (TWh)": 131.74, "Entity": "United Kingdom", "Year": "2020"}, {"Electricity from renewables (TWh)": 350.93, "Entity": "United States", "Year": "2000"}, {"Electricity from renewables (TWh)": 821.4, "Entity": "United States", "Year": "2020"}], "metadata": {"Electricity from renewables (TWh)": {"type": "number", "semanticType": "Number"}, "Entity": {"type": "string", "semanticType": "Location"}, "Year": {"type": "number", "semanticType": "Year"}}, "derive": 
{"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. 
The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. \n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) 
to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5736319.824, 5738290.039, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 61.99, 62.21, 63.02, 63.71\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"compare each countries renewable between 2000 and 2020\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Energy\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Compare renewable energy (Electricity from renewables) for each country between the years 2000 and 2020. 
Filter the data to include only years 2000 and 2020, and prepare the output with country names, year, and renewable energy values for visualization as a grouped bar chart.\",\n \"display_instruction\": \"Compare **renewable energy** by **country** between **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Electricity from renewables (TWh)\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to compare renewable energy across countries between 2000 and 2020. The output field 'Renewable Energy' mentioned in the original chart_encodings maps to 'Electricity from renewables (TWh)' in the dataset. To create a grouped bar chart comparing these two years, we need to filter the data to only include years 2000 and 2020, and output the Entity (country), Year, and the renewable energy values. The chart_encodings are updated to use the actual column name from the dataset.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "global-energy-20-small.csv", "resultTableId": "table-97", "chart": {"id": "chart-1760738769634", "chartType": "Grouped Bar Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "y": {"fieldID": "concept-1760738743125"}, "color": 
{"channel": "color", "bin": false, "fieldID": "original--global-energy-20-small.csv--Year"}, "column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}}, "tableRef": "global-energy-20-small.csv", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "compare each countries renewable between 2000 and 2020", "displayContent": "Compare **renewable energy** by **country** between **2000** and **2020**"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "The code performs a straightforward comparison analysis of renewable energy generation across countries:\n\n- **Filters** the dataset to include only data from years **2000** and **2020**, creating a 20-year comparison window\n- **Selects** three key columns: `Entity` (country name), `Year`, and `Electricity from renewables (TWh)` (renewable energy generation)\n- **Converts** the `Year` field to string format to treat it as a categorical variable for visualization purposes\n- **Sorts** the results by `Entity` and `Year` to organize countries alphabetically with their respective year data grouped together\n- **Returns** a cleaned dataset showing how renewable electricity generation has changed for each country between 2000 and 2020", "concepts": [], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. 
They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. 
If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. 
\n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 10006669.92, 10502929.69, 10707219.73, nan\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 56.18, 61.99, 63.02, 63.71\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data 
for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\nThe code performs a straightforward comparison analysis of renewable energy generation across countries:\n\n- **Filters** the dataset to include only data from years **2000** and **2020**, creating a 20-year comparison window\n- **Selects** three key columns: `Entity` (country name), `Year`, and `Electricity from renewables (TWh)` (renewable energy generation)\n- **Converts** the `Year` field to string format to treat it as a categorical variable for visualization purposes\n- **Sorts** the results by `Entity` and `Year` to organize countries alphabetically with their respective year data grouped together\n- **Returns** a cleaned dataset showing how renewable electricity generation has changed for each country between 2000 and 2020\n\n[CONCEPTS EXPLANATION]\n\n```json\n[]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-27", "displayId": "renewable-energy1", "names": ["Entity", "Renewable Percentage", "Year"], "rows": [{"Entity": "Australia", "Renewable Percentage": 8.6344368187, "Year": "2000"}, {"Entity": "Australia", "Renewable Percentage": 25.5031684668, "Year": "2020"}, {"Entity": "Brazil", "Renewable Percentage": 90.1307723743, "Year": "2000"}, {"Entity": "Brazil", "Renewable Percentage": 84.6411771408, "Year": "2020"}, {"Entity": "Canada", "Renewable 
Percentage": 61.8095917882, "Year": "2000"}, {"Entity": "Canada", "Renewable Percentage": 68.7796436354, "Year": "2020"}, {"Entity": "China", "Renewable Percentage": 16.639126586, "Year": "2000"}, {"Entity": "China", "Renewable Percentage": 28.2464606924, "Year": "2020"}, {"Entity": "France", "Renewable Percentage": 12.7117691154, "Year": "2000"}, {"Entity": "France", "Renewable Percentage": 23.7610241821, "Year": "2020"}, {"Entity": "Germany", "Renewable Percentage": 6.1977983575, "Year": "2000"}, {"Entity": "Germany", "Renewable Percentage": 44.3324048937, "Year": "2020"}, {"Entity": "Global Average", "Renewable Percentage": 16.4213212559, "Year": "2000"}, {"Entity": "Global Average", "Renewable Percentage": 29.2955247263, "Year": "2020"}, {"Entity": "India", "Renewable Percentage": 14.0481982534, "Year": "2000"}, {"Entity": "India", "Renewable Percentage": 20.2059243238, "Year": "2020"}, {"Entity": "Indonesia", "Renewable Percentage": null, "Year": "2000"}, {"Entity": "Indonesia", "Renewable Percentage": null, "Year": "2020"}, {"Entity": "Italy", "Renewable Percentage": 18.900241501, "Year": "2000"}, {"Entity": "Italy", "Renewable Percentage": 42.0397741576, "Year": "2020"}, {"Entity": "Japan", "Renewable Percentage": 10.5382436261, "Year": "2000"}, {"Entity": "Japan", "Renewable Percentage": 21.324925062, "Year": "2020"}, {"Entity": "Kazakhstan", "Renewable Percentage": null, "Year": "2000"}, {"Entity": "Kazakhstan", "Renewable Percentage": null, "Year": "2020"}, {"Entity": "Mexico", "Renewable Percentage": 22.9291160107, "Year": "2000"}, {"Entity": "Mexico", "Renewable Percentage": 21.2552224134, "Year": "2020"}, {"Entity": "Poland", "Renewable Percentage": 1.6273222517, "Year": "2000"}, {"Entity": "Poland", "Renewable Percentage": 17.9648720886, "Year": "2020"}, {"Entity": "Saudi Arabia", "Renewable Percentage": null, "Year": "2000"}, {"Entity": "Saudi Arabia", "Renewable Percentage": null, "Year": "2020"}, {"Entity": "South Africa", "Renewable Percentage": 
0.9110805721, "Year": "2000"}, {"Entity": "South Africa", "Renewable Percentage": 5.780581212, "Year": "2020"}, {"Entity": "Spain", "Renewable Percentage": 15.6119862394, "Year": "2000"}, {"Entity": "Spain", "Renewable Percentage": 43.8108805298, "Year": "2020"}, {"Entity": "Thailand", "Renewable Percentage": 7.1261029822, "Year": "2000"}, {"Entity": "Thailand", "Renewable Percentage": 13.7963737796, "Year": "2020"}, {"Entity": "Ukraine", "Renewable Percentage": 6.5860921352, "Year": "2000"}, {"Entity": "Ukraine", "Renewable Percentage": 11.8440577364, "Year": "2020"}, {"Entity": "United Kingdom", "Renewable Percentage": 2.6657406913, "Year": "2000"}, {"Entity": "United Kingdom", "Renewable Percentage": 42.8603962651, "Year": "2020"}, {"Entity": "United States", "Renewable Percentage": 9.2298992662, "Year": "2000"}, {"Entity": "United States", "Renewable Percentage": 20.3156921037, "Year": "2020"}], "metadata": {"Entity": {"type": "string", "semanticType": "Name"}, "Renewable Percentage": {"type": "number", "semanticType": "Percentage"}, "Year": {"type": "number", "semanticType": "Year"}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n df_filtered = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Calculate total electricity generation\n df_filtered['Total Electricity'] = (\n df_filtered['Electricity from fossil fuels (TWh)'] + \n df_filtered['Electricity from nuclear (TWh)'] + \n df_filtered['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_filtered['Renewable Percentage'] = (\n df_filtered['Electricity from renewables (TWh)'] / df_filtered['Total Electricity'] * 100\n )\n \n # Select required columns for individual countries\n country_data = df_filtered[['Entity', 'Year', 'Renewable Percentage']].copy()\n \n # Calculate global average for each year\n global_avg = df_filtered.groupby('Year').apply(\n lambda 
x: pd.Series({\n 'Entity': 'Global Average',\n 'Year': x['Year'].iloc[0],\n 'Renewable Percentage': (\n x['Electricity from renewables (TWh)'].sum() / x['Total Electricity'].sum() * 100\n )\n })\n ).reset_index(drop=True)\n \n # Combine country data with global average\n transformed_df = pd.concat([country_data, global_avg], ignore_index=True)\n \n # Convert Year to string for better visualization\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. 
The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. \n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) 
to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5736319.824, 5738290.039, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 61.99, 62.21, 63.02, 63.71\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"compare each countries renewable between 2000 and 2020\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Energy\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Compare renewable energy (Electricity from renewables) for each country between the years 2000 and 2020. 
Filter the data to include only years 2000 and 2020, and prepare the output with country names, year, and renewable energy values for visualization as a grouped bar chart.\",\n \"display_instruction\": \"Compare **renewable energy** by **country** between **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Electricity from renewables (TWh)\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to compare renewable energy across countries between 2000 and 2020. The output field 'Renewable Energy' mentioned in the original chart_encodings maps to 'Electricity from renewables (TWh)' in the dataset. To create a grouped bar chart comparing these two years, we need to filter the data to only include years 2000 and 2020, and output the Entity (country), Year, and the renewable energy values. The chart_encodings are updated to use the actual column name from the dataset.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Electricity from renewables (TWh) Entity Year\n0 17.11 Australia 2000\n1 63.99 Australia 2020\n2 308.77 Brazil 2000\n3 520.01 Brazil 2020\n4 363.70 Canada 2000\n5 429.24 Canada 2020\n6 225.56 China 2000\n7 2184.94 China 2020\n8 67.83 
France 2000\n9 125.28 France 2020\n......\n\nUpdate the code above based on the following instruction:\n\n{\n \"followup_instruction\": \"show their renewable percentage instead, and add an entity to show global average\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage (proportion of renewables to total electricity generation from all sources) for each country between 2000 and 2020. Add a 'Global Average' entity that represents the worldwide average renewable percentage for each year. The output should include Entity (with 'Global Average' added), Year, and the renewable percentage.\",\n \"display_instruction\": \"Show **renewable energy percentage** by **country** with **global average** for **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Renewable Percentage\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Percentage\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to see renewable percentage instead of absolute values, and wants to add a global average entity for comparison. Renewable percentage is calculated as the ratio of renewable energy to total electricity generation (fossil fuels + nuclear + renewables). A 'Global Average' entity needs to be computed by aggregating all countries' data for each year. 
The y-axis encoding is updated to reflect the new metric 'Renewable Percentage'.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n df_filtered = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Calculate total electricity generation\n df_filtered['Total Electricity'] = (\n df_filtered['Electricity from fossil fuels (TWh)'] + \n df_filtered['Electricity from nuclear (TWh)'] + \n df_filtered['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_filtered['Renewable Percentage'] = (\n df_filtered['Electricity from renewables (TWh)'] / df_filtered['Total Electricity'] * 100\n )\n \n # Select required columns for individual countries\n country_data = df_filtered[['Entity', 'Year', 'Renewable Percentage']].copy()\n \n # Calculate global average for each year\n global_avg = df_filtered.groupby('Year').apply(\n lambda x: pd.Series({\n 'Entity': 'Global Average',\n 'Year': x['Year'].iloc[0],\n 'Renewable Percentage': (\n x['Electricity from renewables (TWh)'].sum() / x['Total Electricity'].sum() * 100\n )\n })\n ).reset_index(drop=True)\n \n # Combine country data with global average\n transformed_df = pd.concat([country_data, global_avg], ignore_index=True)\n \n # Convert Year to string for better visualization\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "table-97", "resultTableId": "table-27", "chart": {"id": "chart-1760738813383", "chartType": "Grouped Bar Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "y": {"fieldID": "original--global-energy-20-small.csv--Electricity from renewables (TWh)"}, "color": {"fieldID": "original--global-energy-20-small.csv--Year"}, 
"column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-97", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "show their renewable percentage instead, and add an entity to show global average", "displayContent": "Show **renewable energy percentage** by **country** with **global average** for **2000** and **2020**"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "The code performs the following transformation steps:\n\n1. **Filter data** to include only records from **2000** and **2020**\n2. Calculate **Total Electricity** generation by summing three sources: **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n3. Compute **Renewable Percentage** for each country and year by dividing **Electricity from renewables** by **Total Electricity** and multiplying by 100\n4. Extract relevant columns: **Entity** (country name), **Year**, and **Renewable Percentage**\n5. Calculate **Global Average** renewable percentage for each year by:\n - Summing all countries' **renewable electricity** generation\n - Dividing by the sum of all countries' **total electricity** generation\n - Multiplying by 100 to get percentage\n6. Combine individual country data with the **Global Average** statistics\n7. Convert **Year** values to string format\n8. Sort results by **Entity** name and **Year**", "concepts": [{"explanation": "The sum of electricity generated from all three sources (fossil fuels, nuclear, and renewables) measured in terawatt-hours (TWh). 
This represents the total electricity production capacity for each country.", "field": "Total Electricity"}, {"explanation": "The proportion of electricity generated from renewable sources relative to total electricity production, expressed as a percentage: \\( \\text{Renewable Percentage} = \\frac{\\text{Electricity from renewables}}{\\text{Total Electricity}} \\times 100 \\). This metric indicates the extent of renewable energy adoption in a country's electricity mix.", "field": "Renewable Percentage"}], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. 
Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. 
\n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 10096009.77, 10502929.69, 10707219.73, nan\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., nan, nan, nan, nan\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 
2000 and 2020\n df_filtered = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Calculate total electricity generation\n df_filtered['Total Electricity'] = (\n df_filtered['Electricity from fossil fuels (TWh)'] + \n df_filtered['Electricity from nuclear (TWh)'] + \n df_filtered['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_filtered['Renewable Percentage'] = (\n df_filtered['Electricity from renewables (TWh)'] / df_filtered['Total Electricity'] * 100\n )\n \n # Select required columns for individual countries\n country_data = df_filtered[['Entity', 'Year', 'Renewable Percentage']].copy()\n \n # Calculate global average for each year\n global_avg = df_filtered.groupby('Year').apply(\n lambda x: pd.Series({\n 'Entity': 'Global Average',\n 'Year': x['Year'].iloc[0],\n 'Renewable Percentage': (\n x['Electricity from renewables (TWh)'].sum() / x['Total Electricity'].sum() * 100\n )\n })\n ).reset_index(drop=True)\n \n # Combine country data with global average\n transformed_df = pd.concat([country_data, global_avg], ignore_index=True)\n \n # Convert Year to string for better visualization\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\nThe code performs the following transformation steps:\n\n1. **Filter data** to include only records from **2000** and **2020**\n2. Calculate **Total Electricity** generation by summing three sources: **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n3. Compute **Renewable Percentage** for each country and year by dividing **Electricity from renewables** by **Total Electricity** and multiplying by 100\n4. Extract relevant columns: **Entity** (country name), **Year**, and **Renewable Percentage**\n5. 
Calculate **Global Average** renewable percentage for each year by:\n - Summing all countries' **renewable electricity** generation\n - Dividing by the sum of all countries' **total electricity** generation\n - Multiplying by 100 to get percentage\n6. Combine individual country data with the **Global Average** statistics\n7. Convert **Year** values to string format\n8. Sort results by **Entity** name and **Year**\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"Total Electricity\",\n \"explanation\": \"The sum of electricity generated from all three sources (fossil fuels, nuclear, and renewables) measured in terawatt-hours (TWh). This represents the total electricity production capacity for each country.\"\n },\n {\n \"field\": \"Renewable Percentage\",\n \"explanation\": \"The proportion of electricity generated from renewable sources relative to total electricity production, expressed as a percentage: \\\\( \\\\text{Renewable Percentage} = \\\\frac{\\\\text{Electricity from renewables}}{\\\\text{Total Electricity}} \\\\times 100 \\\\). 
This metric indicates the extent of renewable energy adoption in a country's electricity mix.\"\n }\n]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-81", "displayId": "energy-source1", "names": ["Energy", "Energy Source", "Year"], "rows": [{"Energy": 7160.71, "Energy Source": "Fossil Fuels", "Year": 2000}, {"Energy": 1996.65, "Energy Source": "Nuclear", "Year": 2000}, {"Energy": 1742.56, "Energy Source": "Renewables", "Year": 2000}, {"Energy": 7273.89, "Energy Source": "Fossil Fuels", "Year": 2001}, {"Energy": 2037.54, "Energy Source": "Nuclear", "Year": 2001}, {"Energy": 1690.11, "Energy Source": "Renewables", "Year": 2001}, {"Energy": 7621.07, "Energy Source": "Fossil Fuels", "Year": 2002}, {"Energy": 2042.18, "Energy Source": "Nuclear", "Year": 2002}, {"Energy": 1757.63, "Energy Source": "Renewables", "Year": 2002}, {"Energy": 8043.86, "Energy Source": "Fossil Fuels", "Year": 2003}, {"Energy": 1998.52, "Energy Source": "Nuclear", "Year": 2003}, {"Energy": 1804.52, "Energy Source": "Renewables", "Year": 2003}, {"Energy": 8399.72, "Energy Source": "Fossil Fuels", "Year": 2004}, {"Energy": 2095, "Energy Source": "Nuclear", "Year": 2004}, {"Energy": 1952.72, "Energy Source": "Renewables", "Year": 2004}, {"Energy": 8828.43, "Energy Source": "Fossil Fuels", "Year": 2005}, {"Energy": 2094.4, "Energy Source": "Nuclear", "Year": 2005}, {"Energy": 2025.26, "Energy Source": "Renewables", "Year": 2005}, {"Energy": 9183.05, "Energy Source": "Fossil Fuels", "Year": 2006}, {"Energy": 2120.38, "Energy Source": "Nuclear", "Year": 2006}, {"Energy": 2165.94, "Energy Source": "Renewables", "Year": 2006}, {"Energy": 9853.09, "Energy Source": "Fossil Fuels", "Year": 2007}, {"Energy": 2067.04, "Energy Source": "Nuclear", "Year": 2007}, {"Energy": 2256.79, "Energy Source": "Renewables", "Year": 2007}, {"Energy": 9817.15, "Energy Source": "Fossil Fuels", "Year": 2008}, {"Energy": 2043.94, 
"Energy Source": "Nuclear", "Year": 2008}, {"Energy": 2496.03, "Energy Source": "Renewables", "Year": 2008}, {"Energy": 9686.86, "Energy Source": "Fossil Fuels", "Year": 2009}, {"Energy": 2017.25, "Energy Source": "Nuclear", "Year": 2009}, {"Energy": 2563.95, "Energy Source": "Renewables", "Year": 2009}, {"Energy": 10427.03, "Energy Source": "Fossil Fuels", "Year": 2010}, {"Energy": 2083.37, "Energy Source": "Nuclear", "Year": 2010}, {"Energy": 2802.89, "Energy Source": "Renewables", "Year": 2010}, {"Energy": 10974.83, "Energy Source": "Fossil Fuels", "Year": 2011}, {"Energy": 1956, "Energy Source": "Nuclear", "Year": 2011}, {"Energy": 2997.29, "Energy Source": "Renewables", "Year": 2011}, {"Energy": 11277.49, "Energy Source": "Fossil Fuels", "Year": 2012}, {"Energy": 1788.26, "Energy Source": "Nuclear", "Year": 2012}, {"Energy": 3226.1, "Energy Source": "Renewables", "Year": 2012}, {"Energy": 11561.86, "Energy Source": "Fossil Fuels", "Year": 2013}, {"Energy": 1813, "Energy Source": "Nuclear", "Year": 2013}, {"Energy": 3473.9, "Energy Source": "Renewables", "Year": 2013}, {"Energy": 11761.51, "Energy Source": "Fossil Fuels", "Year": 2014}, {"Energy": 1847.87, "Energy Source": "Nuclear", "Year": 2014}, {"Energy": 3753.03, "Energy Source": "Renewables", "Year": 2014}, {"Energy": 11653.61, "Energy Source": "Fossil Fuels", "Year": 2015}, {"Energy": 1886.61, "Energy Source": "Nuclear", "Year": 2015}, {"Energy": 3903.7, "Energy Source": "Renewables", "Year": 2015}, {"Energy": 11785.35, "Energy Source": "Fossil Fuels", "Year": 2016}, {"Energy": 1906.92, "Energy Source": "Nuclear", "Year": 2016}, {"Energy": 4186.83, "Energy Source": "Renewables", "Year": 2016}, {"Energy": 12003.24, "Energy Source": "Fossil Fuels", "Year": 2017}, {"Energy": 1942.94, "Energy Source": "Nuclear", "Year": 2017}, {"Energy": 4500.22, "Energy Source": "Renewables", "Year": 2017}, {"Energy": 12445.24, "Energy Source": "Fossil Fuels", "Year": 2018}, {"Energy": 2016.82, "Energy Source": "Nuclear", 
"Year": 2018}, {"Energy": 4834.65, "Energy Source": "Renewables", "Year": 2018}, {"Energy": 12355.91, "Energy Source": "Fossil Fuels", "Year": 2019}, {"Energy": 2073.87, "Energy Source": "Nuclear", "Year": 2019}, {"Energy": 5146.11, "Energy Source": "Renewables", "Year": 2019}, {"Energy": 12032.75, "Energy Source": "Fossil Fuels", "Year": 2020}, {"Energy": 1974.46, "Energy Source": "Nuclear", "Year": 2020}, {"Energy": 5497.73, "Energy Source": "Renewables", "Year": 2020}], "metadata": {"Energy": {"type": "number", "semanticType": "Number"}, "Energy Source": {"type": "string", "semanticType": "String", "levels": ["Fossil Fuels", "Nuclear", "Renewables"]}, "Year": {"type": "number", "semanticType": "Year"}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Group by Year and sum electricity generation from all countries\n global_energy = df_energy.groupby('Year').agg({\n 'Electricity from fossil fuels (TWh)': 'sum',\n 'Electricity from nuclear (TWh)': 'sum',\n 'Electricity from renewables (TWh)': 'sum'\n }).reset_index()\n \n # Rename columns for clarity\n global_energy.columns = ['Year', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n global_energy,\n id_vars=['Year'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # Sort by Year and Energy Source\n transformed_df = transformed_df.sort_values(['Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and 
\"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. 
\n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) 
to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5736319.824, 5738290.039, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 61.99, 62.21, 63.02, 63.71\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"compare each countries renewable between 2000 and 2020\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Energy\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Compare renewable energy (Electricity from renewables) for each country between the years 2000 and 2020. 
Filter the data to include only years 2000 and 2020, and prepare the output with country names, year, and renewable energy values for visualization as a grouped bar chart.\",\n \"display_instruction\": \"Compare **renewable energy** by **country** between **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Electricity from renewables (TWh)\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to compare renewable energy across countries between 2000 and 2020. The output field 'Renewable Energy' mentioned in the original chart_encodings maps to 'Electricity from renewables (TWh)' in the dataset. To create a grouped bar chart comparing these two years, we need to filter the data to only include years 2000 and 2020, and output the Entity (country), Year, and the renewable energy values. The chart_encodings are updated to use the actual column name from the dataset.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Electricity from renewables (TWh) Entity Year\n0 17.11 Australia 2000\n1 63.99 Australia 2020\n2 308.77 Brazil 2000\n3 520.01 Brazil 2020\n4 363.70 Canada 2000\n5 429.24 Canada 2020\n6 225.56 China 2000\n7 2184.94 China 2020\n8 67.83 
France 2000\n9 125.28 France 2020\n......\n\nUpdate the code above based on the following instruction:\n\n{\n \"followup_instruction\": \"show their renewable percentage instead, and add an entity to show global average\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage (proportion of renewables to total electricity generation from all sources) for each country between 2000 and 2020. Add a 'Global Average' entity that represents the worldwide average renewable percentage for each year. The output should include Entity (with 'Global Average' added), Year, and the renewable percentage.\",\n \"display_instruction\": \"Show **renewable energy percentage** by **country** with **global average** for **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Renewable Percentage\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Percentage\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to see renewable percentage instead of absolute values, and wants to add a global average entity for comparison. Renewable percentage is calculated as the ratio of renewable energy to total electricity generation (fossil fuels + nuclear + renewables). A 'Global Average' entity needs to be computed by aggregating all countries' data for each year. 
The y-axis encoding is updated to reflect the new metric 'Renewable Percentage'.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n df_filtered = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Calculate total electricity generation\n df_filtered['Total Electricity'] = (\n df_filtered['Electricity from fossil fuels (TWh)'] + \n df_filtered['Electricity from nuclear (TWh)'] + \n df_filtered['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_filtered['Renewable Percentage'] = (\n df_filtered['Electricity from renewables (TWh)'] / df_filtered['Total Electricity'] * 100\n )\n \n # Select required columns for individual countries\n country_data = df_filtered[['Entity', 'Year', 'Renewable Percentage']].copy()\n \n # Calculate global average for each year\n global_avg = df_filtered.groupby('Year').apply(\n lambda x: pd.Series({\n 'Entity': 'Global Average',\n 'Year': x['Year'].iloc[0],\n 'Renewable Percentage': (\n x['Electricity from renewables (TWh)'].sum() / x['Total Electricity'].sum() * 100\n )\n })\n ).reset_index(drop=True)\n \n # Combine country data with global average\n transformed_df = pd.concat([country_data, global_avg], ignore_index=True)\n \n # Convert Year to string for better visualization\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Entity Renewable Percentage Year\n0 Australia 8.634437 2000\n1 Australia 25.503168 2020\n2 Brazil 90.130772 2000\n3 Brazil 84.641177 2020\n4 Canada 61.809592 2000\n5 Canada 68.779644 2020\n6 China 16.639127 2000\n7 China 28.246461 2020\n8 France 12.711769 2000\n9 France 23.761024 2020\n......\n\nUpdate the code above 
based on the following instruction:\n\n{\n \"followup_instruction\": \"show global energy composition over time\",\n \"chart_type\": \"Custom Area\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Show the global energy composition over time by aggregating electricity generation from all countries for each year, broken down by energy source (fossil fuels, nuclear, and renewables). The output should be in long format with Year, Energy Source, and Energy values to support a stacked area chart visualization.\",\n \"display_instruction\": \"Show global **energy composition** by **energy source** over time\",\n \"output_fields\": [\"Year\", \"Energy Source\", \"Energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\"\n },\n \"reason\": \"The user wants to visualize global energy composition over time, which requires aggregating electricity generation data across all countries for each year and breaking it down by energy source type. The data needs to be transformed into long format where each row represents a year-energy source combination, with the total energy value for that source globally. 
This follows up from the previous renewable percentage analysis by showing the overall energy mix trends rather than focusing on specific countries or percentages.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Group by Year and sum electricity generation from all countries\n global_energy = df_energy.groupby('Year').agg({\n 'Electricity from fossil fuels (TWh)': 'sum',\n 'Electricity from nuclear (TWh)': 'sum',\n 'Electricity from renewables (TWh)': 'sum'\n }).reset_index()\n \n # Rename columns for clarity\n global_energy.columns = ['Year', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n global_energy,\n id_vars=['Year'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # Sort by Year and Energy Source\n transformed_df = transformed_df.sort_values(['Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "table-27", "resultTableId": "table-81", "chart": {"id": "chart-1760738942326", "chartType": "Custom Area", "encodingMap": {"x": {"channel": "x", "bin": false, "fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"channel": "y", "bin": false, "fieldID": "concept-1760738917347"}, "x2": {"channel": "x2", "bin": false}, "y2": {"channel": "y2", "bin": false}, "color": {"channel": "color", "bin": false, "fieldID": "concept-1760738921927"}, "column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-27", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "show global energy composition over time", "displayContent": "Show global **energy composition** by **energy source** over time"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "- Group 
the energy data by **Year** and aggregate electricity generation values across all countries by summing **Electricity from fossil fuels (TWh)**, **Electricity from nuclear (TWh)**, and **Electricity from renewables (TWh)**\n- Rename the aggregated columns to simplified labels: **Fossil Fuels**, **Nuclear**, and **Renewables**\n- Transform the data from wide format to long format by unpivoting the three energy source columns into two columns: **Energy Source** (containing the type of energy) and **Energy** (containing the generation value in TWh)\n- Sort the resulting dataset by **Year** and **Energy Source** for consistent ordering", "concepts": [], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. 
Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. 
If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. \n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 158029.9988, 190729.9957, 209229.9957, ..., nan, 4956060.059, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., nan, nan, nan, nan\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year 
Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Group by Year and sum electricity generation from all countries\n global_energy = df_energy.groupby('Year').agg({\n 'Electricity from fossil fuels (TWh)': 'sum',\n 'Electricity from nuclear (TWh)': 'sum',\n 'Electricity from renewables (TWh)': 'sum'\n }).reset_index()\n \n # Rename columns for clarity\n global_energy.columns = ['Year', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n global_energy,\n id_vars=['Year'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # Sort by Year and Energy Source\n transformed_df = transformed_df.sort_values(['Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\n- Group the energy data by **Year** and aggregate electricity generation values across all countries by summing **Electricity from fossil fuels (TWh)**, **Electricity from nuclear (TWh)**, and **Electricity from renewables (TWh)**\n- Rename the aggregated columns to simplified labels: **Fossil Fuels**, **Nuclear**, and **Renewables**\n- Transform the data from wide format to long format by unpivoting the three energy source columns into two columns: **Energy Source** (containing the type of energy) and **Energy** (containing the generation value in TWh)\n- Sort the resulting dataset by **Year** and **Energy Source** for consistent 
ordering\n\n[CONCEPTS EXPLANATION]\n\n```json\n[]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-10", "displayId": "energy-source2", "names": ["Energy", "Energy Source", "Entity", "Year"], "rows": [{"Energy": 1113.3, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2000}, {"Energy": 16.74, "Energy Source": "Nuclear", "Entity": "China", "Year": 2000}, {"Energy": 225.56, "Energy Source": "Renewables", "Entity": "China", "Year": 2000}, {"Energy": 1182.59, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2001}, {"Energy": 17.47, "Energy Source": "Nuclear", "Entity": "China", "Year": 2001}, {"Energy": 280.73, "Energy Source": "Renewables", "Entity": "China", "Year": 2001}, {"Energy": 1337.46, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2002}, {"Energy": 25.13, "Energy Source": "Nuclear", "Entity": "China", "Year": 2002}, {"Energy": 291.41, "Energy Source": "Renewables", "Entity": "China", "Year": 2002}, {"Energy": 1579.96, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2003}, {"Energy": 43.34, "Energy Source": "Nuclear", "Entity": "China", "Year": 2003}, {"Energy": 287.28, "Energy Source": "Renewables", "Entity": "China", "Year": 2003}, {"Energy": 1795.41, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2004}, {"Energy": 50.47, "Energy Source": "Nuclear", "Entity": "China", "Year": 2004}, {"Energy": 357.43, "Energy Source": "Renewables", "Entity": "China", "Year": 2004}, {"Energy": 2042.8, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2005}, {"Energy": 53.09, "Energy Source": "Nuclear", "Entity": "China", "Year": 2005}, {"Energy": 404.37, "Energy Source": "Renewables", "Entity": "China", "Year": 2005}, {"Energy": 2364.16, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2006}, {"Energy": 54.84, "Energy Source": "Nuclear", "Entity": "China", "Year": 2006}, {"Energy": 446.72, "Energy 
Source": "Renewables", "Entity": "China", "Year": 2006}, {"Energy": 2718.7, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2007}, {"Energy": 62.13, "Energy Source": "Nuclear", "Entity": "China", "Year": 2007}, {"Energy": 500.71, "Energy Source": "Renewables", "Entity": "China", "Year": 2007}, {"Energy": 2762.29, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2008}, {"Energy": 68.39, "Energy Source": "Nuclear", "Entity": "China", "Year": 2008}, {"Energy": 665.08, "Energy Source": "Renewables", "Entity": "China", "Year": 2008}, {"Energy": 2980.2, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2009}, {"Energy": 70.05, "Energy Source": "Nuclear", "Entity": "China", "Year": 2009}, {"Energy": 664.39, "Energy Source": "Renewables", "Entity": "China", "Year": 2009}, {"Energy": 3326.19, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2010}, {"Energy": 74.74, "Energy Source": "Nuclear", "Entity": "China", "Year": 2010}, {"Energy": 786.38, "Energy Source": "Renewables", "Entity": "China", "Year": 2010}, {"Energy": 3811.77, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2011}, {"Energy": 87.2, "Energy Source": "Nuclear", "Entity": "China", "Year": 2011}, {"Energy": 792.38, "Energy Source": "Renewables", "Entity": "China", "Year": 2011}, {"Energy": 3869.38, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2012}, {"Energy": 98.32, "Energy Source": "Nuclear", "Entity": "China", "Year": 2012}, {"Energy": 999.56, "Energy Source": "Renewables", "Entity": "China", "Year": 2012}, {"Energy": 4203.77, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2013}, {"Energy": 111.5, "Energy Source": "Nuclear", "Entity": "China", "Year": 2013}, {"Energy": 1093.37, "Energy Source": "Renewables", "Entity": "China", "Year": 2013}, {"Energy": 4345.86, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2014}, {"Energy": 133.22, "Energy Source": "Nuclear", "Entity": "China", "Year": 2014}, {"Energy": 1289.23, 
"Energy Source": "Renewables", "Entity": "China", "Year": 2014}, {"Energy": 4222.76, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2015}, {"Energy": 171.38, "Energy Source": "Nuclear", "Entity": "China", "Year": 2015}, {"Energy": 1393.66, "Energy Source": "Renewables", "Entity": "China", "Year": 2015}, {"Energy": 4355, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2016}, {"Energy": 213.18, "Energy Source": "Nuclear", "Entity": "China", "Year": 2016}, {"Energy": 1522.79, "Energy Source": "Renewables", "Entity": "China", "Year": 2016}, {"Energy": 4643.1, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2017}, {"Energy": 248.1, "Energy Source": "Nuclear", "Entity": "China", "Year": 2017}, {"Energy": 1667.06, "Energy Source": "Renewables", "Entity": "China", "Year": 2017}, {"Energy": 4990.28, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2018}, {"Energy": 295, "Energy Source": "Nuclear", "Entity": "China", "Year": 2018}, {"Energy": 1835.32, "Energy Source": "Renewables", "Entity": "China", "Year": 2018}, {"Energy": 5098.22, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2019}, {"Energy": 348.7, "Energy Source": "Nuclear", "Entity": "China", "Year": 2019}, {"Energy": 2014.57, "Energy Source": "Renewables", "Entity": "China", "Year": 2019}, {"Energy": 5184.13, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2020}, {"Energy": 366.2, "Energy Source": "Nuclear", "Entity": "China", "Year": 2020}, {"Energy": 2184.94, "Energy Source": "Renewables", "Entity": "China", "Year": 2020}, {"Energy": 475.35, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2000}, {"Energy": 15.77, "Energy Source": "Nuclear", "Entity": "India", "Year": 2000}, {"Energy": 80.27, "Energy Source": "Renewables", "Entity": "India", "Year": 2000}, {"Energy": 491.01, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2001}, {"Energy": 18.89, "Energy Source": "Nuclear", "Entity": "India", "Year": 2001}, {"Energy": 
76.19, "Energy Source": "Renewables", "Entity": "India", "Year": 2001}, {"Energy": 517.51, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2002}, {"Energy": 19.35, "Energy Source": "Nuclear", "Entity": "India", "Year": 2002}, {"Energy": 72.78, "Energy Source": "Renewables", "Entity": "India", "Year": 2002}, {"Energy": 545.36, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2003}, {"Energy": 18.14, "Energy Source": "Nuclear", "Entity": "India", "Year": 2003}, {"Energy": 74.63, "Energy Source": "Renewables", "Entity": "India", "Year": 2003}, {"Energy": 567.86, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2004}, {"Energy": 21.26, "Energy Source": "Nuclear", "Entity": "India", "Year": 2004}, {"Energy": 109.2, "Energy Source": "Renewables", "Entity": "India", "Year": 2004}, {"Energy": 579.32, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2005}, {"Energy": 17.73, "Energy Source": "Nuclear", "Entity": "India", "Year": 2005}, {"Energy": 107.47, "Energy Source": "Renewables", "Entity": "India", "Year": 2005}, {"Energy": 599.24, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2006}, {"Energy": 17.63, "Energy Source": "Nuclear", "Entity": "India", "Year": 2006}, {"Energy": 127.56, "Energy Source": "Renewables", "Entity": "India", "Year": 2006}, {"Energy": 636.68, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2007}, {"Energy": 17.83, "Energy Source": "Nuclear", "Entity": "India", "Year": 2007}, {"Energy": 141.75, "Energy Source": "Renewables", "Entity": "India", "Year": 2007}, {"Energy": 674.27, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2008}, {"Energy": 15.23, "Energy Source": "Nuclear", "Entity": "India", "Year": 2008}, {"Energy": 138.91, "Energy Source": "Renewables", "Entity": "India", "Year": 2008}, {"Energy": 728.56, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2009}, {"Energy": 16.82, "Energy Source": "Nuclear", "Entity": "India", "Year": 2009}, {"Energy": 
134.33, "Energy Source": "Renewables", "Entity": "India", "Year": 2009}, {"Energy": 771.78, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2010}, {"Energy": 23.08, "Energy Source": "Nuclear", "Entity": "India", "Year": 2010}, {"Energy": 142.61, "Energy Source": "Renewables", "Entity": "India", "Year": 2010}, {"Energy": 828.16, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2011}, {"Energy": 32.22, "Energy Source": "Nuclear", "Entity": "India", "Year": 2011}, {"Energy": 173.62, "Energy Source": "Renewables", "Entity": "India", "Year": 2011}, {"Energy": 893.45, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2012}, {"Energy": 33.14, "Energy Source": "Nuclear", "Entity": "India", "Year": 2012}, {"Energy": 165.25, "Energy Source": "Renewables", "Entity": "India", "Year": 2012}, {"Energy": 924.93, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2013}, {"Energy": 33.31, "Energy Source": "Nuclear", "Entity": "India", "Year": 2013}, {"Energy": 187.9, "Energy Source": "Renewables", "Entity": "India", "Year": 2013}, {"Energy": 1025.29, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2014}, {"Energy": 34.69, "Energy Source": "Nuclear", "Entity": "India", "Year": 2014}, {"Energy": 202.04, "Energy Source": "Renewables", "Entity": "India", "Year": 2014}, {"Energy": 1080.44, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2015}, {"Energy": 38.31, "Energy Source": "Nuclear", "Entity": "India", "Year": 2015}, {"Energy": 203.21, "Energy Source": "Renewables", "Entity": "India", "Year": 2015}, {"Energy": 1155.52, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2016}, {"Energy": 37.9, "Energy Source": "Nuclear", "Entity": "India", "Year": 2016}, {"Energy": 208.21, "Energy Source": "Renewables", "Entity": "India", "Year": 2016}, {"Energy": 1198.85, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2017}, {"Energy": 37.41, "Energy Source": "Nuclear", "Entity": "India", "Year": 2017}, 
{"Energy": 234.9, "Energy Source": "Renewables", "Entity": "India", "Year": 2017}, {"Energy": 1276.32, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2018}, {"Energy": 39.05, "Energy Source": "Nuclear", "Entity": "India", "Year": 2018}, {"Energy": 263.61, "Energy Source": "Renewables", "Entity": "India", "Year": 2018}, {"Energy": 1273.59, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2019}, {"Energy": 45.16, "Energy Source": "Nuclear", "Entity": "India", "Year": 2019}, {"Energy": 303.16, "Energy Source": "Renewables", "Entity": "India", "Year": 2019}, {"Energy": 1202.34, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2020}, {"Energy": 44.61, "Energy Source": "Nuclear", "Entity": "India", "Year": 2020}, {"Energy": 315.76, "Energy Source": "Renewables", "Entity": "India", "Year": 2020}, {"Energy": 2697.28, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2000}, {"Energy": 753.89, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2000}, {"Energy": 350.93, "Energy Source": "Renewables", "Entity": "United States", "Year": 2000}, {"Energy": 2678.68, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2001}, {"Energy": 768.83, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2001}, {"Energy": 280.06, "Energy Source": "Renewables", "Entity": "United States", "Year": 2001}, {"Energy": 2727.83, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2002}, {"Energy": 780.06, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2002}, {"Energy": 336.34, "Energy Source": "Renewables", "Entity": "United States", "Year": 2002}, {"Energy": 2756.03, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2003}, {"Energy": 763.73, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2003}, {"Energy": 349.18, "Energy Source": "Renewables", "Entity": "United States", "Year": 2003}, {"Energy": 2818.28, "Energy Source": "Fossil Fuels", 
"Entity": "United States", "Year": 2004}, {"Energy": 788.53, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2004}, {"Energy": 345.14, "Energy Source": "Renewables", "Entity": "United States", "Year": 2004}, {"Energy": 2899.96, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2005}, {"Energy": 781.99, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2005}, {"Energy": 353.04, "Energy Source": "Renewables", "Entity": "United States", "Year": 2005}, {"Energy": 2878.56, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2006}, {"Energy": 787.22, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2006}, {"Energy": 381.16, "Energy Source": "Renewables", "Entity": "United States", "Year": 2006}, {"Energy": 2988.24, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2007}, {"Energy": 806.42, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2007}, {"Energy": 347.91, "Energy Source": "Renewables", "Entity": "United States", "Year": 2007}, {"Energy": 2924.21, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2008}, {"Energy": 806.21, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2008}, {"Energy": 377.11, "Energy Source": "Renewables", "Entity": "United States", "Year": 2008}, {"Energy": 2725.41, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2009}, {"Energy": 798.85, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2009}, {"Energy": 415.56, "Energy Source": "Renewables", "Entity": "United States", "Year": 2009}, {"Energy": 2882.49, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2010}, {"Energy": 806.97, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2010}, {"Energy": 424.48, "Energy Source": "Renewables", "Entity": "United States", "Year": 2010}, {"Energy": 2788.93, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2011}, {"Energy": 790.2, "Energy 
Source": "Nuclear", "Entity": "United States", "Year": 2011}, {"Energy": 509.74, "Energy Source": "Renewables", "Entity": "United States", "Year": 2011}, {"Energy": 2779.02, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2012}, {"Energy": 769.33, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2012}, {"Energy": 492.32, "Energy Source": "Renewables", "Entity": "United States", "Year": 2012}, {"Energy": 2746.21, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2013}, {"Energy": 789.02, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2013}, {"Energy": 520.38, "Energy Source": "Renewables", "Entity": "United States", "Year": 2013}, {"Energy": 2752.01, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2014}, {"Energy": 797.17, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2014}, {"Energy": 546.83, "Energy Source": "Renewables", "Entity": "United States", "Year": 2014}, {"Energy": 2730.32, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2015}, {"Energy": 797.18, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2015}, {"Energy": 556.49, "Energy Source": "Renewables", "Entity": "United States", "Year": 2015}, {"Energy": 2656.96, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2016}, {"Energy": 805.69, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2016}, {"Energy": 624.91, "Energy Source": "Renewables", "Entity": "United States", "Year": 2016}, {"Energy": 2540.17, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2017}, {"Energy": 804.95, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2017}, {"Energy": 707.19, "Energy Source": "Renewables", "Entity": "United States", "Year": 2017}, {"Energy": 2661.3, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2018}, {"Energy": 807.08, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2018}, 
{"Energy": 733.17, "Energy Source": "Renewables", "Entity": "United States", "Year": 2018}, {"Energy": 2588.21, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2019}, {"Energy": 809.41, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2019}, {"Energy": 760.76, "Energy Source": "Renewables", "Entity": "United States", "Year": 2019}, {"Energy": 2431.9, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2020}, {"Energy": 789.88, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2020}, {"Energy": 821.4, "Energy Source": "Renewables", "Entity": "United States", "Year": 2020}], "metadata": {"Energy": {"type": "number", "semanticType": "Number"}, "Energy Source": {"type": "string", "semanticType": "String", "levels": ["Fossil Fuels", "Nuclear", "Renewables"]}, "Entity": {"type": "string", "semanticType": "Location"}, "Year": {"type": "number", "semanticType": "Year"}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total CO2 emissions by country across all years\n total_co2_by_country = df_energy.groupby('Entity')['Value_co2_emissions_kt_by_country'].sum().reset_index()\n \n # Get top 3 CO2 emitting countries\n top3_countries = total_co2_by_country.nlargest(3, 'Value_co2_emissions_kt_by_country')['Entity'].tolist()\n \n # Filter data for top 3 countries\n df_top3 = df_energy[df_energy['Entity'].isin(top3_countries)].copy()\n \n # Select relevant columns and rename for clarity\n df_top3 = df_top3[['Year', 'Entity', 'Electricity from fossil fuels (TWh)', \n 'Electricity from nuclear (TWh)', 'Electricity from renewables (TWh)']].copy()\n \n df_top3.columns = ['Year', 'Entity', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n df_top3,\n id_vars=['Year', 'Entity'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # 
Sort by Entity, Year, and Energy Source\n transformed_df = transformed_df.sort_values(['Entity', 'Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. 
\n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) 
to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5736319.824, 5738290.039, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 61.99, 62.21, 63.02, 63.71\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"compare each countries renewable between 2000 and 2020\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Energy\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Compare renewable energy (Electricity from renewables) for each country between the years 2000 and 2020. 
Filter the data to include only years 2000 and 2020, and prepare the output with country names, year, and renewable energy values for visualization as a grouped bar chart.\",\n \"display_instruction\": \"Compare **renewable energy** by **country** between **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Electricity from renewables (TWh)\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to compare renewable energy across countries between 2000 and 2020. The output field 'Renewable Energy' mentioned in the original chart_encodings maps to 'Electricity from renewables (TWh)' in the dataset. To create a grouped bar chart comparing these two years, we need to filter the data to only include years 2000 and 2020, and output the Entity (country), Year, and the renewable energy values. The chart_encodings are updated to use the actual column name from the dataset.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Electricity from renewables (TWh) Entity Year\n0 17.11 Australia 2000\n1 63.99 Australia 2020\n2 308.77 Brazil 2000\n3 520.01 Brazil 2020\n4 363.70 Canada 2000\n5 429.24 Canada 2020\n6 225.56 China 2000\n7 2184.94 China 2020\n8 67.83 
France 2000\n9 125.28 France 2020\n......\n\nUpdate the code above based on the following instruction:\n\n{\n \"followup_instruction\": \"show their renewable percentage instead, and add an entity to show global average\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage (proportion of renewables to total electricity generation from all sources) for each country between 2000 and 2020. Add a 'Global Average' entity that represents the worldwide average renewable percentage for each year. The output should include Entity (with 'Global Average' added), Year, and the renewable percentage.\",\n \"display_instruction\": \"Show **renewable energy percentage** by **country** with **global average** for **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Renewable Percentage\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Percentage\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to see renewable percentage instead of absolute values, and wants to add a global average entity for comparison. Renewable percentage is calculated as the ratio of renewable energy to total electricity generation (fossil fuels + nuclear + renewables). A 'Global Average' entity needs to be computed by aggregating all countries' data for each year. 
The y-axis encoding is updated to reflect the new metric 'Renewable Percentage'.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n df_filtered = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Calculate total electricity generation\n df_filtered['Total Electricity'] = (\n df_filtered['Electricity from fossil fuels (TWh)'] + \n df_filtered['Electricity from nuclear (TWh)'] + \n df_filtered['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_filtered['Renewable Percentage'] = (\n df_filtered['Electricity from renewables (TWh)'] / df_filtered['Total Electricity'] * 100\n )\n \n # Select required columns for individual countries\n country_data = df_filtered[['Entity', 'Year', 'Renewable Percentage']].copy()\n \n # Calculate global average for each year\n global_avg = df_filtered.groupby('Year').apply(\n lambda x: pd.Series({\n 'Entity': 'Global Average',\n 'Year': x['Year'].iloc[0],\n 'Renewable Percentage': (\n x['Electricity from renewables (TWh)'].sum() / x['Total Electricity'].sum() * 100\n )\n })\n ).reset_index(drop=True)\n \n # Combine country data with global average\n transformed_df = pd.concat([country_data, global_avg], ignore_index=True)\n \n # Convert Year to string for better visualization\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Entity Renewable Percentage Year\n0 Australia 8.634437 2000\n1 Australia 25.503168 2020\n2 Brazil 90.130772 2000\n3 Brazil 84.641177 2020\n4 Canada 61.809592 2000\n5 Canada 68.779644 2020\n6 China 16.639127 2000\n7 China 28.246461 2020\n8 France 12.711769 2000\n9 France 23.761024 2020\n......\n\nUpdate the code above 
based on the following instruction:\n\n{\n \"followup_instruction\": \"show global energy composition over time\",\n \"chart_type\": \"Custom Area\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Show the global energy composition over time by aggregating electricity generation from all countries for each year, broken down by energy source (fossil fuels, nuclear, and renewables). The output should be in long format with Year, Energy Source, and Energy values to support a stacked area chart visualization.\",\n \"display_instruction\": \"Show global **energy composition** by **energy source** over time\",\n \"output_fields\": [\"Year\", \"Energy Source\", \"Energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\"\n },\n \"reason\": \"The user wants to visualize global energy composition over time, which requires aggregating electricity generation data across all countries for each year and breaking it down by energy source type. The data needs to be transformed into long format where each row represents a year-energy source combination, with the total energy value for that source globally. 
This follows up from the previous renewable percentage analysis by showing the overall energy mix trends rather than focusing on specific countries or percentages.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Group by Year and sum electricity generation from all countries\n global_energy = df_energy.groupby('Year').agg({\n 'Electricity from fossil fuels (TWh)': 'sum',\n 'Electricity from nuclear (TWh)': 'sum',\n 'Electricity from renewables (TWh)': 'sum'\n }).reset_index()\n \n # Rename columns for clarity\n global_energy.columns = ['Year', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n global_energy,\n id_vars=['Year'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # Sort by Year and Energy Source\n transformed_df = transformed_df.sort_values(['Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Energy Energy Source Year\n0 7160.71 Fossil Fuels 2000\n1 1996.65 Nuclear 2000\n2 1742.56 Renewables 2000\n3 7273.89 Fossil Fuels 2001\n4 2037.54 Nuclear 2001\n5 1690.11 Renewables 2001\n6 7621.07 Fossil Fuels 2002\n7 2042.18 Nuclear 2002\n8 1757.63 Renewables 2002\n9 8043.86 Fossil Fuels 2003\n......\n\nUpdate the code above based on the following instruction:\n\n{\n \"followup_instruction\": \"show the composition of top 3 CO2 emitting countries\",\n \"chart_type\": \"Custom Area\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Identify the top 3 CO2 emitting countries based on total CO2 emissions across all years, then show the energy composition (fossil fuels, nuclear, and renewables) for each of these countries over time. 
The output should be in long format with Year, Entity (country), Energy Source, and Energy values to support faceted area charts.\",\n \"display_instruction\": \"Show **energy composition** of **top 3 CO2 emitting countries** over time\",\n \"output_fields\": [\"Year\", \"Entity\", \"Energy Source\", \"Energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\",\n \"facet\": \"Entity\"\n },\n \"reason\": \"The user wants to see the energy composition specifically for the top 3 CO2 emitting countries over time, following up from the global energy composition view. We need to identify which countries have the highest total CO2 emissions, filter the data for those countries, and transform it into long format by energy source. A facet encoding is added to show separate area charts for each of the top 3 countries, allowing comparison of their individual energy compositions over time.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total CO2 emissions by country across all years\n total_co2_by_country = df_energy.groupby('Entity')['Value_co2_emissions_kt_by_country'].sum().reset_index()\n \n # Get top 3 CO2 emitting countries\n top3_countries = total_co2_by_country.nlargest(3, 'Value_co2_emissions_kt_by_country')['Entity'].tolist()\n \n # Filter data for top 3 countries\n df_top3 = df_energy[df_energy['Entity'].isin(top3_countries)].copy()\n \n # Select relevant columns and rename for clarity\n df_top3 = df_top3[['Year', 'Entity', 'Electricity from fossil fuels (TWh)', \n 'Electricity from nuclear (TWh)', 'Electricity from renewables (TWh)']].copy()\n \n df_top3.columns = ['Year', 'Entity', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n df_top3,\n id_vars=['Year', 'Entity'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n 
# Sort by Entity, Year, and Energy Source\n transformed_df = transformed_df.sort_values(['Entity', 'Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "table-81", "resultTableId": "table-10", "chart": {"id": "chart-1760739075465", "chartType": "Custom Area", "encodingMap": {"x": {"channel": "x", "bin": false, "fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"channel": "y", "bin": false, "fieldID": "concept-1760738917347"}, "x2": {}, "y2": {"channel": "y2", "bin": false}, "color": {"fieldID": "concept-1760738921927", "sortBy": "[\"Fossil Fuels\",\"Nuclear\",\"Renewables\"]"}, "column": {}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-81", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "show the composition of top 3 CO2 emitting countries", "displayContent": "Show **energy composition** of **top 3 CO2 emitting countries** over time"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "The code performs the following high-level transformations:\n\n1. **Identifies top polluters**: Calculates the total `Value_co2_emissions_kt_by_country` across all years for each `Entity` (country), then selects the **top 3** countries with the highest cumulative CO2 emissions.\n\n2. **Filters the dataset**: Retains only records for the **top 3 CO2-emitting countries** from the original energy data.\n\n3. **Simplifies column structure**: Extracts the relevant energy production columns (`Electricity from fossil fuels (TWh)`, `Electricity from nuclear (TWh)`, `Electricity from renewables (TWh)`) and renames them to **Fossil Fuels**, **Nuclear**, and **Renewables** for clarity.\n\n4. 
**Reshapes data to long format**: Transforms the data from wide format (separate columns for each energy source) to long format, where each row represents a specific `Year`, `Entity`, and `Energy Source` combination with its corresponding `Energy` production value.\n\n5. **Sorts the output**: Orders the final dataset by `Entity`, `Year`, and `Energy Source` for systematic presentation.", "concepts": [], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... 
\\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. 
\n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5593029.785, 5736319.824, 5738290.039, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 62.21, 63.02, 63.71, 439.73\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # 
Calculate total CO2 emissions by country across all years\n total_co2_by_country = df_energy.groupby('Entity')['Value_co2_emissions_kt_by_country'].sum().reset_index()\n \n # Get top 3 CO2 emitting countries\n top3_countries = total_co2_by_country.nlargest(3, 'Value_co2_emissions_kt_by_country')['Entity'].tolist()\n \n # Filter data for top 3 countries\n df_top3 = df_energy[df_energy['Entity'].isin(top3_countries)].copy()\n \n # Select relevant columns and rename for clarity\n df_top3 = df_top3[['Year', 'Entity', 'Electricity from fossil fuels (TWh)', \n 'Electricity from nuclear (TWh)', 'Electricity from renewables (TWh)']].copy()\n \n df_top3.columns = ['Year', 'Entity', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n df_top3,\n id_vars=['Year', 'Entity'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # Sort by Entity, Year, and Energy Source\n transformed_df = transformed_df.sort_values(['Entity', 'Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\nThe code performs the following high-level transformations:\n\n1. **Identifies top polluters**: Calculates the total `Value_co2_emissions_kt_by_country` across all years for each `Entity` (country), then selects the **top 3** countries with the highest cumulative CO2 emissions.\n\n2. **Filters the dataset**: Retains only records for the **top 3 CO2-emitting countries** from the original energy data.\n\n3. **Simplifies column structure**: Extracts the relevant energy production columns (`Electricity from fossil fuels (TWh)`, `Electricity from nuclear (TWh)`, `Electricity from renewables (TWh)`) and renames them to **Fossil Fuels**, **Nuclear**, and **Renewables** for clarity.\n\n4. 
**Reshapes data to long format**: Transforms the data from wide format (separate columns for each energy source) to long format, where each row represents a specific `Year`, `Entity`, and `Energy Source` combination with its corresponding `Energy` production value.\n\n5. **Sorts the output**: Orders the final dataset by `Entity`, `Year`, and `Energy Source` for systematic presentation.\n\n[CONCEPTS EXPLANATION]\n\n```json\n[]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}], "charts": [{"id": "chart-1760739084455", "chartType": "Custom Area", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "concept-1760738917347"}, "x2": {}, "y2": {"channel": "y2", "bin": false}, "color": {"fieldID": "concept-1760738921927"}, "column": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-10", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760739021201", "chartType": "Custom Area", "encodingMap": {"x": {"channel": "x", "bin": false, "fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"channel": "y", "bin": false, "fieldID": "concept-1760738917347"}, "x2": {}, "y2": {"channel": "y2", "bin": false}, "color": {"fieldID": "concept-1760738921927", "sortBy": "[\"Fossil Fuels\",\"Nuclear\",\"Renewables\"]"}, "column": {}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-81", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760738819387", "chartType": "Grouped Bar Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "y": {"fieldID": "concept-Renewable Percentage-1760738820889"}, "column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}, "group": {"fieldID": "original--global-energy-20-small.csv--Year"}}, "tableRef": "table-27", "saved": false, "source": "user", "unread": false}, 
{"id": "chart-1760738770100", "chartType": "Grouped Bar Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "y": {"fieldID": "original--global-energy-20-small.csv--Electricity from renewables (TWh)"}, "column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}, "group": {"fieldID": "original--global-energy-20-small.csv--Year"}}, "tableRef": "table-97", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760738436615", "chartType": "Line Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year", "sortOrder": "ascending"}, "y": {"fieldID": "concept-rank-1760738444550", "sortOrder": "descending"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-78", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760738423852", "chartType": "Dotted Line Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "concept-renewable_percentage-1760738424337"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "column": {}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-45", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760738400970", "chartType": "Line Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "concept-1760738385163"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {"fieldID": "concept-1760738389404", "sortBy": "[\"fossil fuels\",\"nuclear\",\"renewables\"]"}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-82", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760738355655", "chartType": "Line Chart", "encodingMap": {"x": {"fieldID": 
"original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "original--global-energy-20-small.csv--Value_co2_emissions_kt_by_country"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}}, "tableRef": "global-energy-20-small.csv", "saved": false, "source": "user", "unread": false}], "conceptShelfItems": [{"id": "concept-1760738921927", "name": "Energy Source", "type": "auto", "description": "", "source": "custom", "tableRef": "custom"}, {"id": "concept-1760738917347", "name": "Energy", "type": "auto", "description": "", "source": "custom", "tableRef": "custom"}, {"id": "concept-Renewable Percentage-1760738820889", "name": "Renewable Percentage", "type": "auto", "description": "", "source": "custom", "tableRef": "custom", "temporary": true}, {"id": "concept-1760738743125", "name": "Renewable Energy", "type": "auto", "description": "", "source": "custom", "tableRef": "custom"}, {"id": "concept-rank-1760738444550", "name": "rank", "type": "auto", "description": "", "source": "custom", "tableRef": "custom", "temporary": true}, {"id": "concept-renewable_percentage-1760738424337", "name": "renewable_percentage", "type": "auto", "description": "", "source": "custom", "tableRef": "custom", "temporary": true}, {"id": "concept-1760738389404", "name": "source", "type": "auto", "description": "", "source": "custom", "tableRef": "custom"}, {"id": "concept-1760738385163", "name": "energy", "type": "auto", "description": "", "source": "custom", "tableRef": "custom"}, {"id": "original--global-energy-20-small.csv--Year", "name": "Year", "type": "integer", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}, {"id": "original--global-energy-20-small.csv--Entity", "name": "Entity", "type": "string", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}, {"id": 
"original--global-energy-20-small.csv--Value_co2_emissions_kt_by_country", "name": "Value_co2_emissions_kt_by_country", "type": "number", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}, {"id": "original--global-energy-20-small.csv--Electricity from fossil fuels (TWh)", "name": "Electricity from fossil fuels (TWh)", "type": "number", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}, {"id": "original--global-energy-20-small.csv--Electricity from nuclear (TWh)", "name": "Electricity from nuclear (TWh)", "type": "number", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}, {"id": "original--global-energy-20-small.csv--Electricity from renewables (TWh)", "name": "Electricity from renewables (TWh)", "type": "number", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}], "messages": [{"timestamp": 1760831081885, "type": "success", "component": "data formulator", "value": "Successfully loaded Global Energy"}], "displayedMessageIdx": 0, "viewMode": "report", "chartSynthesisInProgress": [], "config": {"formulateTimeoutSeconds": 60, "maxRepairAttempts": 1, "defaultChartWidth": 300, "defaultChartHeight": 300}, "dataCleanBlocks": [], "cleanInProgress": false, "generatedReports": [{"id": "report-1760831156182-8277", "content": "# Global Renewable Energy Shift: 2000 to 2020\n\nBetween 2000 and 2020, the world witnessed a notable transformation in renewable energy adoption. Global renewable electricity nearly doubled from 16% to 29% of total generation, signaling meaningful progress in the energy transition.\n\n[IMAGE(chart-1760738819387)]\n\nThe data reveals striking regional variations. Brazil maintained renewable leadership above 84%, while countries like Australia, Germany, and Italy dramatically expanded their renewable capacity—Australia tripling from 9% to 26%. 
However, some nations like Mexico experienced declining renewable shares, highlighting uneven progress across different energy systems.\n\n**In summary**, while the 20-year period shows encouraging momentum toward cleaner energy, the pace and direction vary significantly by country, suggesting that achieving global renewable energy goals will require sustained, coordinated efforts tailored to each nation's unique energy landscape and policy environment.", "style": "short note", "selectedChartIds": ["chart-1760738819387"], "createdAt": 1760831163718, "status": "completed", "title": "Global Renewable Energy Shift: 2000 to 2020", "anchorChartId": "chart-1760738819387"}, {"id": "report-1760831130105-4063", "content": "# The Global Renewable Energy Revolution: Two Decades of Transformation\n\nThe world's energy landscape has undergone a remarkable transformation between 2000 and 2020, with renewable electricity generation emerging as a critical component of the global energy mix. This shift reflects both technological advancement and growing commitment to sustainable energy solutions.\n\n[IMAGE(chart-1760738770100)]\n\nLooking at renewable energy adoption across major economies, the growth has been nothing short of extraordinary. China leads the pack with a staggering increase from 225.56 TWh in 2000 to 2,184.94 TWh in 2020—nearly a tenfold expansion. The United States more than doubled its renewable output from 335.45 TWh to 821.40 TWh, while Brazil grew from 308.77 TWh to 520.01 TWh. Notably, countries like Australia, India, and Germany also demonstrated significant gains, with Australia jumping from just 17.11 TWh to 63.99 TWh during this period.\n\n[IMAGE(chart-1760739084455)]\n\nWhen examining the energy portfolios of the three largest CO2 emitters—China, India, and the United States—a complex picture emerges. 
While China's total energy consumption has grown exponentially, with fossil fuels still dominating, the renewable sector (shown in red) has expanded substantially. The United States shows a more stable total energy consumption, with renewables gradually claiming a larger share. India's energy growth, though significant, remains heavily reliant on fossil fuels, though renewable adoption is accelerating.\n\n**In summary**, the past two decades reveal a global energy transition in progress. While renewable energy has achieved impressive growth worldwide, fossil fuels continue to dominate electricity generation in major economies. Key questions remain: Can this momentum accelerate sufficiently to meet climate goals? What policies will drive faster renewable adoption in emerging economies?", "style": "blog post", "selectedChartIds": ["chart-1760739084455", "chart-1760738770100"], "createdAt": 1760831142289, "status": "completed", "title": "The Global Renewable Energy Revolution: Two Decades of Transformation", "anchorChartId": "chart-1760739084455"}, {"id": "report-1760831094231-2424", "content": "# Global Renewable Energy: A Tale of Leaders and Laggards\n\nThe global energy landscape has undergone significant transformation over the past two decades, with renewable energy emerging as a critical player in the electricity mix. Analyzing data from 21 major economies between 2000 and 2020 reveals striking disparities in how nations have embraced clean energy alternatives.\n\n[IMAGE(chart-1760738423852)]\n\nThe first visualization reveals a fascinating divergence in renewable energy adoption. **Brazil** stands out as a consistent leader, maintaining renewable electricity percentages between 75-90% throughout the entire period, thanks largely to its robust hydroelectric infrastructure. **Canada** follows a similar trajectory, steadily increasing from about 60% to nearly 70% by 2020. 
Meanwhile, **Germany, Spain, Italy, and the United Kingdom** show remarkable growth trajectories, climbing from under 20% in the early 2000s to over 40% by 2020—demonstrating that nations can dramatically reshape their energy portfolios within two decades.\n\nOn the opposite end of the spectrum, **South Africa** remains nearly flat at the bottom, showing minimal renewable adoption despite global trends. **Saudi Arabia** and **Poland** also lag significantly, though both show modest upticks in recent years.\n\n[IMAGE(chart-1760738436615)]\n\nThe ranking chart illustrates the competitive dynamics of renewable energy leadership. **Brazil and Canada** maintain their dominance at ranks 1-2 throughout most years, while European nations like **Germany, Spain, and the UK** engage in a dynamic competition for the 3rd-5th positions, particularly after 2010. The volatility in middle rankings reflects the rapid changes in energy policy and investment across different nations, with countries like **China** climbing from lower ranks to break into the top 7 by 2020.\n\n**In summary**, the data reveals a bifurcated global energy transition: a group of progressive nations have successfully scaled renewable electricity to 40-90% of their mix, while others remain heavily dependent on fossil fuels. These patterns suggest that political will, natural resource endowment, and infrastructure investment are key determinants of renewable energy success. Important follow-up questions include: What policy mechanisms enabled top performers to achieve such high renewable percentages? 
Can lagging nations replicate these successes, or do geographic and economic constraints create insurmountable barriers?", "style": "blog post", "selectedChartIds": ["chart-1760738423852", "chart-1760738436615"], "createdAt": 1760831110064, "status": "completed", "title": "Global Renewable Energy: A Tale of Leaders and Laggards", "anchorChartId": "chart-1760738423852"}], "currentReport": {"id": "report-1760750575650-2619", "content": "# Hollywood's Billion-Dollar Hitmakers\n\n*Avatar* stands alone—earning over $2.5B in profit, dwarfing all competition. Action and Adventure films dominate the most profitable titles, with franchises like *Jurassic Park*, *The Dark Knight*, and *Lord of the Rings* proving blockbuster formulas work.\n\n\"Chart\"\n\nSteven Spielberg leads all directors with $7.2B in total profit across his career, showcasing remarkable consistency with hits spanning decades—from *Jurassic Park* to *E.T.* His nearest competitors trail by billions, underlining his unmatched commercial impact.\n\n\"Chart\"\n\n**In summary**, mega-budget Action and Adventure films generate extraordinary returns when they succeed, and a handful of elite directors—led by Spielberg—have mastered the formula for sustained box office dominance.", "style": "short note", "selectedChartIds": ["chart-1760743347871", "chart-1760743768741"], "chartImages": {}, "createdAt": 1760750584189, "title": "Report - 10/17/2025"}, "activeChallenges": [], "_persist": {"version": -1, "rehydrated": true}, "draftNodes": [], "focusedId": {"type": "report", "reportId": "report-1760831156182-8277"}} \ No newline at end of file +{"tables": [{"kind": "table", "id": "global-energy-20-small.csv", "displayId": "energy-co2", "names": ["Year", "Entity", "Value_co2_emissions_kt_by_country", "Electricity from fossil fuels (TWh)", "Electricity from nuclear (TWh)", "Electricity from renewables (TWh)"], "metadata": {"Year": {"type": "number", "semanticType": "Year"}, "Entity": {"type": "string", "semanticType": 
"Location"}, "Value_co2_emissions_kt_by_country": {"type": "number", "semanticType": "Number"}, "Electricity from fossil fuels (TWh)": {"type": "number", "semanticType": "Number"}, "Electricity from nuclear (TWh)": {"type": "number", "semanticType": "Number"}, "Electricity from renewables (TWh)": {"type": "number", "semanticType": "Number"}}, "rows": [{"Year": 2000, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 339450, "Electricity from fossil fuels (TWh)": 181.05, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 17.11}, {"Year": 2001, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 345640, "Electricity from fossil fuels (TWh)": 194.33, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 17.4}, {"Year": 2002, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 353369.9951, "Electricity from fossil fuels (TWh)": 197.29, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 17.35}, {"Year": 2003, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 352579.9866, "Electricity from fossil fuels (TWh)": 195.13, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 18.5}, {"Year": 2004, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 365809.9976, "Electricity from fossil fuels (TWh)": 203.66, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 19.41}, {"Year": 2005, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 370089.9963, "Electricity from fossil fuels (TWh)": 195.95, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 19.75}, {"Year": 2006, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 375489.9902, "Electricity from fossil fuels (TWh)": 198.72, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 21.19}, {"Year": 2007, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 385750, "Electricity from fossil fuels (TWh)": 208.59, 
"Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 20.93}, {"Year": 2008, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 388940.0024, "Electricity from fossil fuels (TWh)": 211.06, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 18.49}, {"Year": 2009, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 395290.0085, "Electricity from fossil fuels (TWh)": 216.42, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 18.32}, {"Year": 2010, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 387540.0085, "Electricity from fossil fuels (TWh)": 212.5, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 21.13}, {"Year": 2011, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 386380.0049, "Electricity from fossil fuels (TWh)": 213.56, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 27.33}, {"Year": 2012, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 386970.0012, "Electricity from fossil fuels (TWh)": 206.75, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 26.63}, {"Year": 2013, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 380279.9988, "Electricity from fossil fuels (TWh)": 195.78, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 34.2}, {"Year": 2014, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 371630.0049, "Electricity from fossil fuels (TWh)": 205.46, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 36.15}, {"Year": 2015, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 377799.9878, "Electricity from fossil fuels (TWh)": 197.72, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 33.12}, {"Year": 2016, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 384989.9902, "Electricity from fossil fuels (TWh)": 207.66, "Electricity from nuclear (TWh)": 0, 
"Electricity from renewables (TWh)": 38.41}, {"Year": 2017, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 389160.0037, "Electricity from fossil fuels (TWh)": 209.14, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 40.77}, {"Year": 2018, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 387070.0073, "Electricity from fossil fuels (TWh)": 207.45, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 42.93}, {"Year": 2019, "Entity": "Australia", "Value_co2_emissions_kt_by_country": 386529.9988, "Electricity from fossil fuels (TWh)": 196.45, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 53.41}, {"Year": 2020, "Entity": "Australia", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 186.92, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 63.99}, {"Year": 2000, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 313670, "Electricity from fossil fuels (TWh)": 28.87, "Electricity from nuclear (TWh)": 4.94, "Electricity from renewables (TWh)": 308.77}, {"Year": 2001, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 319380, "Electricity from fossil fuels (TWh)": 35.19, "Electricity from nuclear (TWh)": 14.27, "Electricity from renewables (TWh)": 273.71}, {"Year": 2002, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 317760.0098, "Electricity from fossil fuels (TWh)": 33.5, "Electricity from nuclear (TWh)": 13.84, "Electricity from renewables (TWh)": 292.95}, {"Year": 2003, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 310809.9976, "Electricity from fossil fuels (TWh)": 31.62, "Electricity from nuclear (TWh)": 13.4, "Electricity from renewables (TWh)": 313.88}, {"Year": 2004, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 328519.989, "Electricity from fossil fuels (TWh)": 40.14, "Electricity from nuclear (TWh)": 11.6, "Electricity from renewables (TWh)": 329.43}, {"Year": 2005, 
"Entity": "Brazil", "Value_co2_emissions_kt_by_country": 331690.0024, "Electricity from fossil fuels (TWh)": 39.56, "Electricity from nuclear (TWh)": 9.2, "Electricity from renewables (TWh)": 346.96}, {"Year": 2006, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 335619.9951, "Electricity from fossil fuels (TWh)": 39.4, "Electricity from nuclear (TWh)": 12.98, "Electricity from renewables (TWh)": 359.55}, {"Year": 2007, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 352559.9976, "Electricity from fossil fuels (TWh)": 37.64, "Electricity from nuclear (TWh)": 11.65, "Electricity from renewables (TWh)": 387.88}, {"Year": 2008, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 373630.0049, "Electricity from fossil fuels (TWh)": 55.87, "Electricity from nuclear (TWh)": 13.21, "Electricity from renewables (TWh)": 385.61}, {"Year": 2009, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 350000, "Electricity from fossil fuels (TWh)": 36.32, "Electricity from nuclear (TWh)": 12.22, "Electricity from renewables (TWh)": 410.13}, {"Year": 2010, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 397929.9927, "Electricity from fossil fuels (TWh)": 61.02, "Electricity from nuclear (TWh)": 13.77, "Electricity from renewables (TWh)": 435.99}, {"Year": 2011, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 418309.9976, "Electricity from fossil fuels (TWh)": 50.27, "Electricity from nuclear (TWh)": 14.8, "Electricity from renewables (TWh)": 462.32}, {"Year": 2012, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 454230.011, "Electricity from fossil fuels (TWh)": 77.21, "Electricity from nuclear (TWh)": 15.17, "Electricity from renewables (TWh)": 454.78}, {"Year": 2013, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 486839.9963, "Electricity from fossil fuels (TWh)": 112, "Electricity from nuclear (TWh)": 14.65, "Electricity from renewables (TWh)": 436.84}, {"Year": 2014, "Entity": "Brazil", 
"Value_co2_emissions_kt_by_country": 511619.9951, "Electricity from fossil fuels (TWh)": 136.58, "Electricity from nuclear (TWh)": 14.46, "Electricity from renewables (TWh)": 430.82}, {"Year": 2015, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 485339.9963, "Electricity from fossil fuels (TWh)": 128.85, "Electricity from nuclear (TWh)": 13.91, "Electricity from renewables (TWh)": 428.81}, {"Year": 2016, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 447079.9866, "Electricity from fossil fuels (TWh)": 93.06, "Electricity from nuclear (TWh)": 14.97, "Electricity from renewables (TWh)": 463.37}, {"Year": 2017, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 456489.9902, "Electricity from fossil fuels (TWh)": 101.9, "Electricity from nuclear (TWh)": 14.86, "Electricity from renewables (TWh)": 464.4}, {"Year": 2018, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 433989.9902, "Electricity from fossil fuels (TWh)": 86.69, "Electricity from nuclear (TWh)": 14.79, "Electricity from renewables (TWh)": 492.66}, {"Year": 2019, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": 434299.9878, "Electricity from fossil fuels (TWh)": 90.91, "Electricity from nuclear (TWh)": 15.16, "Electricity from renewables (TWh)": 512.59}, {"Year": 2020, "Entity": "Brazil", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 81.15, "Electricity from nuclear (TWh)": 13.21, "Electricity from renewables (TWh)": 520.01}, {"Year": 2000, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 514220, "Electricity from fossil fuels (TWh)": 155.56, "Electricity from nuclear (TWh)": 69.16, "Electricity from renewables (TWh)": 363.7}, {"Year": 2001, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 506620, "Electricity from fossil fuels (TWh)": 159.93, "Electricity from nuclear (TWh)": 72.86, "Electricity from renewables (TWh)": 339.58}, {"Year": 2002, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 524349.9756, 
"Electricity from fossil fuels (TWh)": 155.12, "Electricity from nuclear (TWh)": 71.75, "Electricity from renewables (TWh)": 357.06}, {"Year": 2003, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 544539.978, "Electricity from fossil fuels (TWh)": 157.35, "Electricity from nuclear (TWh)": 71.15, "Electricity from renewables (TWh)": 343.88}, {"Year": 2004, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 536419.9829, "Electricity from fossil fuels (TWh)": 148.86, "Electricity from nuclear (TWh)": 85.87, "Electricity from renewables (TWh)": 347.68}, {"Year": 2005, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 549030.0293, "Electricity from fossil fuels (TWh)": 150.78, "Electricity from nuclear (TWh)": 86.83, "Electricity from renewables (TWh)": 368.86}, {"Year": 2006, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 540530.0293, "Electricity from fossil fuels (TWh)": 139.71, "Electricity from nuclear (TWh)": 92.44, "Electricity from renewables (TWh)": 360.48}, {"Year": 2007, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 571630.0049, "Electricity from fossil fuels (TWh)": 149.36, "Electricity from nuclear (TWh)": 88.19, "Electricity from renewables (TWh)": 375.42}, {"Year": 2008, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 550469.9707, "Electricity from fossil fuels (TWh)": 141.33, "Electricity from nuclear (TWh)": 88.3, "Electricity from renewables (TWh)": 385.21}, {"Year": 2009, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 521320.0073, "Electricity from fossil fuels (TWh)": 129.76, "Electricity from nuclear (TWh)": 85.13, "Electricity from renewables (TWh)": 380.24}, {"Year": 2010, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 537010.0098, "Electricity from fossil fuels (TWh)": 130.08, "Electricity from nuclear (TWh)": 85.53, "Electricity from renewables (TWh)": 366.21}, {"Year": 2011, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 549289.978, "Electricity from fossil fuels 
(TWh)": 131.3, "Electricity from nuclear (TWh)": 88.29, "Electricity from renewables (TWh)": 391.95}, {"Year": 2012, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 546210.022, "Electricity from fossil fuels (TWh)": 124.2, "Electricity from nuclear (TWh)": 89.49, "Electricity from renewables (TWh)": 398.58}, {"Year": 2013, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 555659.9731, "Electricity from fossil fuels (TWh)": 122.87, "Electricity from nuclear (TWh)": 97.58, "Electricity from renewables (TWh)": 417.28}, {"Year": 2014, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 561679.9927, "Electricity from fossil fuels (TWh)": 122.75, "Electricity from nuclear (TWh)": 101.21, "Electricity from renewables (TWh)": 412.13}, {"Year": 2015, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 558700.0122, "Electricity from fossil fuels (TWh)": 125.7, "Electricity from nuclear (TWh)": 96.05, "Electricity from renewables (TWh)": 417.2}, {"Year": 2016, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 556830.0171, "Electricity from fossil fuels (TWh)": 122.35, "Electricity from nuclear (TWh)": 95.69, "Electricity from renewables (TWh)": 426.84}, {"Year": 2017, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 568080.0171, "Electricity from fossil fuels (TWh)": 113.7, "Electricity from nuclear (TWh)": 95.57, "Electricity from renewables (TWh)": 435.43}, {"Year": 2018, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 580090.0269, "Electricity from fossil fuels (TWh)": 112.47, "Electricity from nuclear (TWh)": 95.03, "Electricity from renewables (TWh)": 428.39}, {"Year": 2019, "Entity": "Canada", "Value_co2_emissions_kt_by_country": 580210.022, "Electricity from fossil fuels (TWh)": 110.65, "Electricity from nuclear (TWh)": 95.47, "Electricity from renewables (TWh)": 421.8}, {"Year": 2020, "Entity": "Canada", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 102.19, "Electricity from nuclear 
(TWh)": 92.65, "Electricity from renewables (TWh)": 429.24}, {"Year": 2000, "Entity": "China", "Value_co2_emissions_kt_by_country": 3346530, "Electricity from fossil fuels (TWh)": 1113.3, "Electricity from nuclear (TWh)": 16.74, "Electricity from renewables (TWh)": 225.56}, {"Year": 2001, "Entity": "China", "Value_co2_emissions_kt_by_country": 3529080, "Electricity from fossil fuels (TWh)": 1182.59, "Electricity from nuclear (TWh)": 17.47, "Electricity from renewables (TWh)": 280.73}, {"Year": 2002, "Entity": "China", "Value_co2_emissions_kt_by_country": 3810060.059, "Electricity from fossil fuels (TWh)": 1337.46, "Electricity from nuclear (TWh)": 25.13, "Electricity from renewables (TWh)": 291.41}, {"Year": 2003, "Entity": "China", "Value_co2_emissions_kt_by_country": 4415910.156, "Electricity from fossil fuels (TWh)": 1579.96, "Electricity from nuclear (TWh)": 43.34, "Electricity from renewables (TWh)": 287.28}, {"Year": 2004, "Entity": "China", "Value_co2_emissions_kt_by_country": 5124819.824, "Electricity from fossil fuels (TWh)": 1795.41, "Electricity from nuclear (TWh)": 50.47, "Electricity from renewables (TWh)": 357.43}, {"Year": 2005, "Entity": "China", "Value_co2_emissions_kt_by_country": 5824629.883, "Electricity from fossil fuels (TWh)": 2042.8, "Electricity from nuclear (TWh)": 53.09, "Electricity from renewables (TWh)": 404.37}, {"Year": 2006, "Entity": "China", "Value_co2_emissions_kt_by_country": 6437470.215, "Electricity from fossil fuels (TWh)": 2364.16, "Electricity from nuclear (TWh)": 54.84, "Electricity from renewables (TWh)": 446.72}, {"Year": 2007, "Entity": "China", "Value_co2_emissions_kt_by_country": 6993180.176, "Electricity from fossil fuels (TWh)": 2718.7, "Electricity from nuclear (TWh)": 62.13, "Electricity from renewables (TWh)": 500.71}, {"Year": 2008, "Entity": "China", "Value_co2_emissions_kt_by_country": 7199600.098, "Electricity from fossil fuels (TWh)": 2762.29, "Electricity from nuclear (TWh)": 68.39, "Electricity from 
renewables (TWh)": 665.08}, {"Year": 2009, "Entity": "China", "Value_co2_emissions_kt_by_country": 7719069.824, "Electricity from fossil fuels (TWh)": 2980.2, "Electricity from nuclear (TWh)": 70.05, "Electricity from renewables (TWh)": 664.39}, {"Year": 2010, "Entity": "China", "Value_co2_emissions_kt_by_country": 8474919.922, "Electricity from fossil fuels (TWh)": 3326.19, "Electricity from nuclear (TWh)": 74.74, "Electricity from renewables (TWh)": 786.38}, {"Year": 2011, "Entity": "China", "Value_co2_emissions_kt_by_country": 9282549.805, "Electricity from fossil fuels (TWh)": 3811.77, "Electricity from nuclear (TWh)": 87.2, "Electricity from renewables (TWh)": 792.38}, {"Year": 2012, "Entity": "China", "Value_co2_emissions_kt_by_country": 9541870.117, "Electricity from fossil fuels (TWh)": 3869.38, "Electricity from nuclear (TWh)": 98.32, "Electricity from renewables (TWh)": 999.56}, {"Year": 2013, "Entity": "China", "Value_co2_emissions_kt_by_country": 9984570.313, "Electricity from fossil fuels (TWh)": 4203.77, "Electricity from nuclear (TWh)": 111.5, "Electricity from renewables (TWh)": 1093.37}, {"Year": 2014, "Entity": "China", "Value_co2_emissions_kt_by_country": 10006669.92, "Electricity from fossil fuels (TWh)": 4345.86, "Electricity from nuclear (TWh)": 133.22, "Electricity from renewables (TWh)": 1289.23}, {"Year": 2015, "Entity": "China", "Value_co2_emissions_kt_by_country": 9861099.609, "Electricity from fossil fuels (TWh)": 4222.76, "Electricity from nuclear (TWh)": 171.38, "Electricity from renewables (TWh)": 1393.66}, {"Year": 2016, "Entity": "China", "Value_co2_emissions_kt_by_country": 9874660.156, "Electricity from fossil fuels (TWh)": 4355, "Electricity from nuclear (TWh)": 213.18, "Electricity from renewables (TWh)": 1522.79}, {"Year": 2017, "Entity": "China", "Value_co2_emissions_kt_by_country": 10096009.77, "Electricity from fossil fuels (TWh)": 4643.1, "Electricity from nuclear (TWh)": 248.1, "Electricity from renewables (TWh)": 
1667.06}, {"Year": 2018, "Entity": "China", "Value_co2_emissions_kt_by_country": 10502929.69, "Electricity from fossil fuels (TWh)": 4990.28, "Electricity from nuclear (TWh)": 295, "Electricity from renewables (TWh)": 1835.32}, {"Year": 2019, "Entity": "China", "Value_co2_emissions_kt_by_country": 10707219.73, "Electricity from fossil fuels (TWh)": 5098.22, "Electricity from nuclear (TWh)": 348.7, "Electricity from renewables (TWh)": 2014.57}, {"Year": 2020, "Entity": "China", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 5184.13, "Electricity from nuclear (TWh)": 366.2, "Electricity from renewables (TWh)": 2184.94}, {"Year": 2000, "Entity": "France", "Value_co2_emissions_kt_by_country": 373120, "Electricity from fossil fuels (TWh)": 50.61, "Electricity from nuclear (TWh)": 415.16, "Electricity from renewables (TWh)": 67.83}, {"Year": 2001, "Entity": "France", "Value_co2_emissions_kt_by_country": 376730, "Electricity from fossil fuels (TWh)": 46.48, "Electricity from nuclear (TWh)": 421.08, "Electricity from renewables (TWh)": 76.09}, {"Year": 2002, "Entity": "France", "Value_co2_emissions_kt_by_country": 371019.989, "Electricity from fossil fuels (TWh)": 52.67, "Electricity from nuclear (TWh)": 436.76, "Electricity from renewables (TWh)": 62.69}, {"Year": 2003, "Entity": "France", "Value_co2_emissions_kt_by_country": 376709.9915, "Electricity from fossil fuels (TWh)": 57.38, "Electricity from nuclear (TWh)": 441.07, "Electricity from renewables (TWh)": 61.47}, {"Year": 2004, "Entity": "France", "Value_co2_emissions_kt_by_country": 377790.0085, "Electricity from fossil fuels (TWh)": 56.53, "Electricity from nuclear (TWh)": 448.24, "Electricity from renewables (TWh)": 62.42}, {"Year": 2005, "Entity": "France", "Value_co2_emissions_kt_by_country": 380660.0037, "Electricity from fossil fuels (TWh)": 63.35, "Electricity from nuclear (TWh)": 451.53, "Electricity from renewables (TWh)": 54.98}, {"Year": 2006, "Entity": "France", 
"Value_co2_emissions_kt_by_country": 371549.9878, "Electricity from fossil fuels (TWh)": 56.9, "Electricity from nuclear (TWh)": 450.19, "Electricity from renewables (TWh)": 60.91}, {"Year": 2007, "Entity": "France", "Value_co2_emissions_kt_by_country": 362829.9866, "Electricity from fossil fuels (TWh)": 58.18, "Electricity from nuclear (TWh)": 439.73, "Electricity from renewables (TWh)": 64.3}, {"Year": 2008, "Entity": "France", "Value_co2_emissions_kt_by_country": 357989.9902, "Electricity from fossil fuels (TWh)": 55.57, "Electricity from nuclear (TWh)": 439.45, "Electricity from renewables (TWh)": 72.33}, {"Year": 2009, "Entity": "France", "Value_co2_emissions_kt_by_country": 343730.011, "Electricity from fossil fuels (TWh)": 51.32, "Electricity from nuclear (TWh)": 409.74, "Electricity from renewables (TWh)": 68.15}, {"Year": 2010, "Entity": "France", "Value_co2_emissions_kt_by_country": 347779.9988, "Electricity from fossil fuels (TWh)": 57.63, "Electricity from nuclear (TWh)": 428.52, "Electricity from renewables (TWh)": 76.68}, {"Year": 2011, "Entity": "France", "Value_co2_emissions_kt_by_country": 335140.0146, "Electricity from fossil fuels (TWh)": 58.99, "Electricity from nuclear (TWh)": 442.39, "Electricity from renewables (TWh)": 66.02}, {"Year": 2012, "Entity": "France", "Value_co2_emissions_kt_by_country": 338420.0134, "Electricity from fossil fuels (TWh)": 56.42, "Electricity from nuclear (TWh)": 425.41, "Electricity from renewables (TWh)": 85.25}, {"Year": 2013, "Entity": "France", "Value_co2_emissions_kt_by_country": 338559.9976, "Electricity from fossil fuels (TWh)": 53.35, "Electricity from nuclear (TWh)": 423.68, "Electricity from renewables (TWh)": 99.42}, {"Year": 2014, "Entity": "France", "Value_co2_emissions_kt_by_country": 306100.0061, "Electricity from fossil fuels (TWh)": 35.68, "Electricity from nuclear (TWh)": 436.48, "Electricity from renewables (TWh)": 94.03}, {"Year": 2015, "Entity": "France", "Value_co2_emissions_kt_by_country": 
311299.9878, "Electricity from fossil fuels (TWh)": 44.65, "Electricity from nuclear (TWh)": 437.43, "Electricity from renewables (TWh)": 91.84}, {"Year": 2016, "Entity": "France", "Value_co2_emissions_kt_by_country": 313920.0134, "Electricity from fossil fuels (TWh)": 56.45, "Electricity from nuclear (TWh)": 403.2, "Electricity from renewables (TWh)": 99}, {"Year": 2017, "Entity": "France", "Value_co2_emissions_kt_by_country": 317829.9866, "Electricity from fossil fuels (TWh)": 65.09, "Electricity from nuclear (TWh)": 398.36, "Electricity from renewables (TWh)": 92.63}, {"Year": 2018, "Entity": "France", "Value_co2_emissions_kt_by_country": 307049.9878, "Electricity from fossil fuels (TWh)": 49.27, "Electricity from nuclear (TWh)": 412.94, "Electricity from renewables (TWh)": 113.62}, {"Year": 2019, "Entity": "France", "Value_co2_emissions_kt_by_country": 300519.989, "Electricity from fossil fuels (TWh)": 53.5, "Electricity from nuclear (TWh)": 399.01, "Electricity from renewables (TWh)": 113.21}, {"Year": 2020, "Entity": "France", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 48.14, "Electricity from nuclear (TWh)": 353.83, "Electricity from renewables (TWh)": 125.28}, {"Year": 2000, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 830280, "Electricity from fossil fuels (TWh)": 367.22, "Electricity from nuclear (TWh)": 169.61, "Electricity from renewables (TWh)": 35.47}, {"Year": 2001, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 847680, "Electricity from fossil fuels (TWh)": 372.69, "Electricity from nuclear (TWh)": 171.3, "Electricity from renewables (TWh)": 37.9}, {"Year": 2002, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 833380.0049, "Electricity from fossil fuels (TWh)": 372.64, "Electricity from nuclear (TWh)": 164.84, "Electricity from renewables (TWh)": 44.48}, {"Year": 2003, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 836789.978, "Electricity from fossil fuels (TWh)": 
390.81, "Electricity from nuclear (TWh)": 165.06, "Electricity from renewables (TWh)": 46.67}, {"Year": 2004, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 821070.0073, "Electricity from fossil fuels (TWh)": 385.24, "Electricity from nuclear (TWh)": 167.07, "Electricity from renewables (TWh)": 57.97}, {"Year": 2005, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 802380.0049, "Electricity from fossil fuels (TWh)": 386.96, "Electricity from nuclear (TWh)": 163.05, "Electricity from renewables (TWh)": 63.4}, {"Year": 2006, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 814409.9731, "Electricity from fossil fuels (TWh)": 390.03, "Electricity from nuclear (TWh)": 167.27, "Electricity from renewables (TWh)": 72.51}, {"Year": 2007, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 783799.9878, "Electricity from fossil fuels (TWh)": 402.4, "Electricity from nuclear (TWh)": 140.53, "Electricity from renewables (TWh)": 89.38}, {"Year": 2008, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 789690.0024, "Electricity from fossil fuels (TWh)": 390.43, "Electricity from nuclear (TWh)": 148.49, "Electricity from renewables (TWh)": 94.28}, {"Year": 2009, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 734809.9976, "Electricity from fossil fuels (TWh)": 358.07, "Electricity from nuclear (TWh)": 134.93, "Electricity from renewables (TWh)": 95.94}, {"Year": 2010, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 773070.0073, "Electricity from fossil fuels (TWh)": 378.9, "Electricity from nuclear (TWh)": 140.56, "Electricity from renewables (TWh)": 105.18}, {"Year": 2011, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 746479.9805, "Electricity from fossil fuels (TWh)": 373.16, "Electricity from nuclear (TWh)": 107.97, "Electricity from renewables (TWh)": 124.04}, {"Year": 2012, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 760130.0049, "Electricity from fossil fuels (TWh)": 377.89, "Electricity 
from nuclear (TWh)": 99.46, "Electricity from renewables (TWh)": 143.04}, {"Year": 2013, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 776150.0244, "Electricity from fossil fuels (TWh)": 381.52, "Electricity from nuclear (TWh)": 97.29, "Electricity from renewables (TWh)": 152.34}, {"Year": 2014, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 736010.0098, "Electricity from fossil fuels (TWh)": 360.28, "Electricity from nuclear (TWh)": 97.13, "Electricity from renewables (TWh)": 162.54}, {"Year": 2015, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 742309.9976, "Electricity from fossil fuels (TWh)": 359.99, "Electricity from nuclear (TWh)": 91.79, "Electricity from renewables (TWh)": 188.79}, {"Year": 2016, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 747150.0244, "Electricity from fossil fuels (TWh)": 368.67, "Electricity from nuclear (TWh)": 84.63, "Electricity from renewables (TWh)": 189.67}, {"Year": 2017, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 732200.0122, "Electricity from fossil fuels (TWh)": 353.37, "Electricity from nuclear (TWh)": 76.32, "Electricity from renewables (TWh)": 216.32}, {"Year": 2018, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 707700.0122, "Electricity from fossil fuels (TWh)": 334.65, "Electricity from nuclear (TWh)": 76, "Electricity from renewables (TWh)": 222.07}, {"Year": 2019, "Entity": "Germany", "Value_co2_emissions_kt_by_country": 657400.0244, "Electricity from fossil fuels (TWh)": 284.09, "Electricity from nuclear (TWh)": 75.07, "Electricity from renewables (TWh)": 240.33}, {"Year": 2020, "Entity": "Germany", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 251.4, "Electricity from nuclear (TWh)": 64.38, "Electricity from renewables (TWh)": 251.48}, {"Year": 2000, "Entity": "India", "Value_co2_emissions_kt_by_country": 937860, "Electricity from fossil fuels (TWh)": 475.35, "Electricity from nuclear (TWh)": 15.77, 
"Electricity from renewables (TWh)": 80.27}, {"Year": 2001, "Entity": "India", "Value_co2_emissions_kt_by_country": 953540, "Electricity from fossil fuels (TWh)": 491.01, "Electricity from nuclear (TWh)": 18.89, "Electricity from renewables (TWh)": 76.19}, {"Year": 2002, "Entity": "India", "Value_co2_emissions_kt_by_country": 985450.0122, "Electricity from fossil fuels (TWh)": 517.51, "Electricity from nuclear (TWh)": 19.35, "Electricity from renewables (TWh)": 72.78}, {"Year": 2003, "Entity": "India", "Value_co2_emissions_kt_by_country": 1011770.02, "Electricity from fossil fuels (TWh)": 545.36, "Electricity from nuclear (TWh)": 18.14, "Electricity from renewables (TWh)": 74.63}, {"Year": 2004, "Entity": "India", "Value_co2_emissions_kt_by_country": 1085670.044, "Electricity from fossil fuels (TWh)": 567.86, "Electricity from nuclear (TWh)": 21.26, "Electricity from renewables (TWh)": 109.2}, {"Year": 2005, "Entity": "India", "Value_co2_emissions_kt_by_country": 1136469.971, "Electricity from fossil fuels (TWh)": 579.32, "Electricity from nuclear (TWh)": 17.73, "Electricity from renewables (TWh)": 107.47}, {"Year": 2006, "Entity": "India", "Value_co2_emissions_kt_by_country": 1215209.961, "Electricity from fossil fuels (TWh)": 599.24, "Electricity from nuclear (TWh)": 17.63, "Electricity from renewables (TWh)": 127.56}, {"Year": 2007, "Entity": "India", "Value_co2_emissions_kt_by_country": 1336739.99, "Electricity from fossil fuels (TWh)": 636.68, "Electricity from nuclear (TWh)": 17.83, "Electricity from renewables (TWh)": 141.75}, {"Year": 2008, "Entity": "India", "Value_co2_emissions_kt_by_country": 1424380.005, "Electricity from fossil fuels (TWh)": 674.27, "Electricity from nuclear (TWh)": 15.23, "Electricity from renewables (TWh)": 138.91}, {"Year": 2009, "Entity": "India", "Value_co2_emissions_kt_by_country": 1564880.005, "Electricity from fossil fuels (TWh)": 728.56, "Electricity from nuclear (TWh)": 16.82, "Electricity from renewables (TWh)": 134.33}, 
{"Year": 2010, "Entity": "India", "Value_co2_emissions_kt_by_country": 1659979.98, "Electricity from fossil fuels (TWh)": 771.78, "Electricity from nuclear (TWh)": 23.08, "Electricity from renewables (TWh)": 142.61}, {"Year": 2011, "Entity": "India", "Value_co2_emissions_kt_by_country": 1756739.99, "Electricity from fossil fuels (TWh)": 828.16, "Electricity from nuclear (TWh)": 32.22, "Electricity from renewables (TWh)": 173.62}, {"Year": 2012, "Entity": "India", "Value_co2_emissions_kt_by_country": 1909439.941, "Electricity from fossil fuels (TWh)": 893.45, "Electricity from nuclear (TWh)": 33.14, "Electricity from renewables (TWh)": 165.25}, {"Year": 2013, "Entity": "India", "Value_co2_emissions_kt_by_country": 1972430.054, "Electricity from fossil fuels (TWh)": 924.93, "Electricity from nuclear (TWh)": 33.31, "Electricity from renewables (TWh)": 187.9}, {"Year": 2014, "Entity": "India", "Value_co2_emissions_kt_by_country": 2147110.107, "Electricity from fossil fuels (TWh)": 1025.29, "Electricity from nuclear (TWh)": 34.69, "Electricity from renewables (TWh)": 202.04}, {"Year": 2015, "Entity": "India", "Value_co2_emissions_kt_by_country": 2158020.02, "Electricity from fossil fuels (TWh)": 1080.44, "Electricity from nuclear (TWh)": 38.31, "Electricity from renewables (TWh)": 203.21}, {"Year": 2016, "Entity": "India", "Value_co2_emissions_kt_by_country": 2195250, "Electricity from fossil fuels (TWh)": 1155.52, "Electricity from nuclear (TWh)": 37.9, "Electricity from renewables (TWh)": 208.21}, {"Year": 2017, "Entity": "India", "Value_co2_emissions_kt_by_country": 2320409.912, "Electricity from fossil fuels (TWh)": 1198.85, "Electricity from nuclear (TWh)": 37.41, "Electricity from renewables (TWh)": 234.9}, {"Year": 2018, "Entity": "India", "Value_co2_emissions_kt_by_country": 2451929.932, "Electricity from fossil fuels (TWh)": 1276.32, "Electricity from nuclear (TWh)": 39.05, "Electricity from renewables (TWh)": 263.61}, {"Year": 2019, "Entity": "India", 
"Value_co2_emissions_kt_by_country": 2456300.049, "Electricity from fossil fuels (TWh)": 1273.59, "Electricity from nuclear (TWh)": 45.16, "Electricity from renewables (TWh)": 303.16}, {"Year": 2020, "Entity": "India", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 1202.34, "Electricity from nuclear (TWh)": 44.61, "Electricity from renewables (TWh)": 315.76}, {"Year": 2000, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 280650, "Electricity from fossil fuels (TWh)": 78.43, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 19.6}, {"Year": 2001, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 302060, "Electricity from fossil fuels (TWh)": 83.96, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 22.19}, {"Year": 2002, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 305640.0146, "Electricity from fossil fuels (TWh)": 92.03, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 21}, {"Year": 2003, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 333890.0146, "Electricity from fossil fuels (TWh)": 97.57, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 19.82}, {"Year": 2004, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 341239.9902, "Electricity from fossil fuels (TWh)": 103.8, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 20.97}, {"Year": 2005, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 342149.9939, "Electricity from fossil fuels (TWh)": 110.22, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 22.66}, {"Year": 2006, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 364470.0012, "Electricity from fossil fuels (TWh)": 116.8, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 21.18}, {"Year": 2007, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 
379959.9915, "Electricity from fossil fuels (TWh)": 124.1, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 24.29}, {"Year": 2008, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 376140.0146, "Electricity from fossil fuels (TWh)": 129.55, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 26.34}, {"Year": 2009, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 391079.9866, "Electricity from fossil fuels (TWh)": 136.05, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 26.79}, {"Year": 2010, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 415519.989, "Electricity from fossil fuels (TWh)": 142.88, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 34.63}, {"Year": 2011, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 475309.9976, "Electricity from fossil fuels (TWh)": 161.41, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 30.46}, {"Year": 2012, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 481510.0098, "Electricity from fossil fuels (TWh)": 177.83, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 31.11}, {"Year": 2013, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 447940.0024, "Electricity from fossil fuels (TWh)": 189.66, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 35.5}, {"Year": 2014, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 483910.0037, "Electricity from fossil fuels (TWh)": 203.11, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 34.41}, {"Year": 2015, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 488549.9878, "Electricity from fossil fuels (TWh)": 209.71, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 33.56}, {"Year": 2016, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 482510.0098, 
"Electricity from fossil fuels (TWh)": 217.97, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 39.58}, {"Year": 2017, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 517320.0073, "Electricity from fossil fuels (TWh)": 222.64, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 43.17}, {"Year": 2018, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 576989.9902, "Electricity from fossil fuels (TWh)": 235.41, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 48.38}, {"Year": 2019, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": 619840.0269, "Electricity from fossil fuels (TWh)": 247.39, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 48.04}, {"Year": 2020, "Entity": "Indonesia", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 238.91, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 52.91}, {"Year": 2000, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 436300, "Electricity from fossil fuels (TWh)": 218.28, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 50.87}, {"Year": 2001, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 436570, "Electricity from fossil fuels (TWh)": 216.73, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 54.35}, {"Year": 2002, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 443470.0012, "Electricity from fossil fuels (TWh)": 228.45, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 48.31}, {"Year": 2003, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 462200.0122, "Electricity from fossil fuels (TWh)": 238.52, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 46.86}, {"Year": 2004, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 472399.9939, "Electricity from fossil fuels (TWh)": 240.95, "Electricity from 
nuclear (TWh)": 0, "Electricity from renewables (TWh)": 53.88}, {"Year": 2005, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 473829.9866, "Electricity from fossil fuels (TWh)": 247.29, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 48.43}, {"Year": 2006, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 466649.9939, "Electricity from fossil fuels (TWh)": 256.03, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 50.64}, {"Year": 2007, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 459369.9951, "Electricity from fossil fuels (TWh)": 259.49, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 47.72}, {"Year": 2008, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 444980.011, "Electricity from fossil fuels (TWh)": 254.34, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 58.16}, {"Year": 2009, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 397059.9976, "Electricity from fossil fuels (TWh)": 218.32, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 69.26}, {"Year": 2010, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 405269.989, "Electricity from fossil fuels (TWh)": 220.93, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 76.98}, {"Year": 2011, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 396690.0024, "Electricity from fossil fuels (TWh)": 216.78, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 82.96}, {"Year": 2012, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 376750, "Electricity from fossil fuels (TWh)": 204.26, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 92.22}, {"Year": 2013, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 346459.9915, "Electricity from fossil fuels (TWh)": 175.07, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 112}, {"Year": 2014, "Entity": 
"Italy", "Value_co2_emissions_kt_by_country": 327500, "Electricity from fossil fuels (TWh)": 156.76, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 120.68}, {"Year": 2015, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 337859.9854, "Electricity from fossil fuels (TWh)": 172.06, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 108.89}, {"Year": 2016, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 333339.9963, "Electricity from fossil fuels (TWh)": 179.19, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 108.01}, {"Year": 2017, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 329190.0024, "Electricity from fossil fuels (TWh)": 189.44, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 103.89}, {"Year": 2018, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 324880.0049, "Electricity from fossil fuels (TWh)": 172.98, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 114.41}, {"Year": 2019, "Entity": "Italy", "Value_co2_emissions_kt_by_country": 317239.9902, "Electricity from fossil fuels (TWh)": 175.52, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 115.83}, {"Year": 2020, "Entity": "Italy", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 161.17, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 116.9}, {"Year": 2000, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1182610, "Electricity from fossil fuels (TWh)": 578.29, "Electricity from nuclear (TWh)": 305.95, "Electricity from renewables (TWh)": 104.16}, {"Year": 2001, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1170380, "Electricity from fossil fuels (TWh)": 564.95, "Electricity from nuclear (TWh)": 303.86, "Electricity from renewables (TWh)": 101.36}, {"Year": 2002, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1206599.976, "Electricity from fossil 
fuels (TWh)": 605.12, "Electricity from nuclear (TWh)": 280.34, "Electricity from renewables (TWh)": 101.1}, {"Year": 2003, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1214949.951, "Electricity from fossil fuels (TWh)": 633.76, "Electricity from nuclear (TWh)": 228.01, "Electricity from renewables (TWh)": 114.18}, {"Year": 2004, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1209849.976, "Electricity from fossil fuels (TWh)": 621.6, "Electricity from nuclear (TWh)": 268.32, "Electricity from renewables (TWh)": 114.73}, {"Year": 2005, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1212819.946, "Electricity from fossil fuels (TWh)": 634.09, "Electricity from nuclear (TWh)": 280.5, "Electricity from renewables (TWh)": 100.57}, {"Year": 2006, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1189520.02, "Electricity from fossil fuels (TWh)": 628.77, "Electricity from nuclear (TWh)": 291.54, "Electricity from renewables (TWh)": 112.07}, {"Year": 2007, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1225069.946, "Electricity from fossil fuels (TWh)": 705.37, "Electricity from nuclear (TWh)": 267.34, "Electricity from renewables (TWh)": 100.8}, {"Year": 2008, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1158219.971, "Electricity from fossil fuels (TWh)": 663.88, "Electricity from nuclear (TWh)": 241.25, "Electricity from renewables (TWh)": 100.79}, {"Year": 2009, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1100979.98, "Electricity from fossil fuels (TWh)": 611.86, "Electricity from nuclear (TWh)": 263.05, "Electricity from renewables (TWh)": 102.28}, {"Year": 2010, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1156479.98, "Electricity from fossil fuels (TWh)": 689.89, "Electricity from nuclear (TWh)": 278.36, "Electricity from renewables (TWh)": 113.92}, {"Year": 2011, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1213520.02, "Electricity from fossil fuels (TWh)": 777.1, "Electricity 
from nuclear (TWh)": 153.38, "Electricity from renewables (TWh)": 116.5}, {"Year": 2012, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1253609.985, "Electricity from fossil fuels (TWh)": 920.39, "Electricity from nuclear (TWh)": 15.12, "Electricity from renewables (TWh)": 111.09}, {"Year": 2013, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1262780.029, "Electricity from fossil fuels (TWh)": 897.88, "Electricity from nuclear (TWh)": 10.43, "Electricity from renewables (TWh)": 121.48}, {"Year": 2014, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1217119.995, "Electricity from fossil fuels (TWh)": 892.18, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 136.53}, {"Year": 2015, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1179439.941, "Electricity from fossil fuels (TWh)": 844.23, "Electricity from nuclear (TWh)": 3.24, "Electricity from renewables (TWh)": 157.34}, {"Year": 2016, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1167790.039, "Electricity from fossil fuels (TWh)": 832.4, "Electricity from nuclear (TWh)": 14.87, "Electricity from renewables (TWh)": 157.7}, {"Year": 2017, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1155229.98, "Electricity from fossil fuels (TWh)": 806.12, "Electricity from nuclear (TWh)": 27.75, "Electricity from renewables (TWh)": 175.12}, {"Year": 2018, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1116150.024, "Electricity from fossil fuels (TWh)": 780.61, "Electricity from nuclear (TWh)": 47.82, "Electricity from renewables (TWh)": 183.63}, {"Year": 2019, "Entity": "Japan", "Value_co2_emissions_kt_by_country": 1081569.946, "Electricity from fossil fuels (TWh)": 735.66, "Electricity from nuclear (TWh)": 63.88, "Electricity from renewables (TWh)": 192.72}, {"Year": 2020, "Entity": "Japan", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 716.67, "Electricity from nuclear (TWh)": 41.86, "Electricity from 
renewables (TWh)": 205.6}, {"Year": 2000, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 120150, "Electricity from fossil fuels (TWh)": 44.11, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.53}, {"Year": 2001, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 117440, "Electricity from fossil fuels (TWh)": 47.3, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.08}, {"Year": 2002, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 131059.9976, "Electricity from fossil fuels (TWh)": 49.44, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.89}, {"Year": 2003, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 146139.9994, "Electricity from fossil fuels (TWh)": 55.24, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.62}, {"Year": 2004, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 158029.9988, "Electricity from fossil fuels (TWh)": 58.89, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.06}, {"Year": 2005, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 169210.0067, "Electricity from fossil fuels (TWh)": 60.06, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.86}, {"Year": 2006, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 185300.0031, "Electricity from fossil fuels (TWh)": 63.89, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.77}, {"Year": 2007, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 198389.9994, "Electricity from fossil fuels (TWh)": 68.45, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.17}, {"Year": 2008, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 242029.9988, "Electricity from fossil fuels (TWh)": 72.89, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.46}, {"Year": 
2009, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 213610.0006, "Electricity from fossil fuels (TWh)": 71.85, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 6.88}, {"Year": 2010, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 229699.9969, "Electricity from fossil fuels (TWh)": 74.63, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.02}, {"Year": 2011, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 245449.9969, "Electricity from fossil fuels (TWh)": 78.7, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.88}, {"Year": 2012, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 244600.0061, "Electricity from fossil fuels (TWh)": 82.98, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.64}, {"Year": 2013, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 260010.0098, "Electricity from fossil fuels (TWh)": 84.88, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 7.73}, {"Year": 2014, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 209229.9957, "Electricity from fossil fuels (TWh)": 86.37, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 8.27}, {"Year": 2015, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 190729.9957, "Electricity from fossil fuels (TWh)": 82.2, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 9.45}, {"Year": 2016, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 202149.9939, "Electricity from fossil fuels (TWh)": 82.65, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 11.98}, {"Year": 2017, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 214580.0018, "Electricity from fossil fuels (TWh)": 91.48, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 11.64}, {"Year": 2018, "Entity": 
"Kazakhstan", "Value_co2_emissions_kt_by_country": 216600.0061, "Electricity from fossil fuels (TWh)": 96.36, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 10.91}, {"Year": 2019, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": 212110.0006, "Electricity from fossil fuels (TWh)": 95.39, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 11.09}, {"Year": 2020, "Entity": "Kazakhstan", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 96.7, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 11.94}, {"Year": 2000, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 379180, "Electricity from fossil fuels (TWh)": 141.8, "Electricity from nuclear (TWh)": 7.81, "Electricity from renewables (TWh)": 44.51}, {"Year": 2001, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 378830, "Electricity from fossil fuels (TWh)": 153.32, "Electricity from nuclear (TWh)": 8.29, "Electricity from renewables (TWh)": 39.56}, {"Year": 2002, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 386000, "Electricity from fossil fuels (TWh)": 159.81, "Electricity from nuclear (TWh)": 9.26, "Electricity from renewables (TWh)": 35.67}, {"Year": 2003, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 404690.0024, "Electricity from fossil fuels (TWh)": 160.45, "Electricity from nuclear (TWh)": 9.98, "Electricity from renewables (TWh)": 32.11}, {"Year": 2004, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 414100.0061, "Electricity from fossil fuels (TWh)": 173.66, "Electricity from nuclear (TWh)": 8.73, "Electricity from renewables (TWh)": 38.19}, {"Year": 2005, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 432190.0024, "Electricity from fossil fuels (TWh)": 178.76, "Electricity from nuclear (TWh)": 10.32, "Electricity from renewables (TWh)": 42.29}, {"Year": 2006, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 448299.9878, 
"Electricity from fossil fuels (TWh)": 182.76, "Electricity from nuclear (TWh)": 10.4, "Electricity from renewables (TWh)": 43.63}, {"Year": 2007, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 457119.9951, "Electricity from fossil fuels (TWh)": 191.83, "Electricity from nuclear (TWh)": 9.95, "Electricity from renewables (TWh)": 42.14}, {"Year": 2008, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 459549.9878, "Electricity from fossil fuels (TWh)": 184.51, "Electricity from nuclear (TWh)": 9.36, "Electricity from renewables (TWh)": 53.22}, {"Year": 2009, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 448369.9951, "Electricity from fossil fuels (TWh)": 194.75, "Electricity from nuclear (TWh)": 10.11, "Electricity from renewables (TWh)": 40.59}, {"Year": 2010, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 462869.9951, "Electricity from fossil fuels (TWh)": 207.38, "Electricity from nuclear (TWh)": 5.66, "Electricity from renewables (TWh)": 51.37}, {"Year": 2011, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 478399.9939, "Electricity from fossil fuels (TWh)": 219.88, "Electricity from nuclear (TWh)": 9.66, "Electricity from renewables (TWh)": 50.7}, {"Year": 2012, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 486450.0122, "Electricity from fossil fuels (TWh)": 229.14, "Electricity from nuclear (TWh)": 8.41, "Electricity from renewables (TWh)": 47.2}, {"Year": 2013, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 475739.9902, "Electricity from fossil fuels (TWh)": 231.23, "Electricity from nuclear (TWh)": 11.38, "Electricity from renewables (TWh)": 44.67}, {"Year": 2014, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 462239.9902, "Electricity from fossil fuels (TWh)": 223.43, "Electricity from nuclear (TWh)": 9.3, "Electricity from renewables (TWh)": 57.46}, {"Year": 2015, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 471630.0049, "Electricity from fossil fuels (TWh)": 234.28, 
"Electricity from nuclear (TWh)": 11.18, "Electricity from renewables (TWh)": 52.42}, {"Year": 2016, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 473309.9976, "Electricity from fossil fuels (TWh)": 239.78, "Electricity from nuclear (TWh)": 10.27, "Electricity from renewables (TWh)": 52.97}, {"Year": 2017, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 471579.9866, "Electricity from fossil fuels (TWh)": 242.69, "Electricity from nuclear (TWh)": 10.57, "Electricity from renewables (TWh)": 55.88}, {"Year": 2018, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 452570.0073, "Electricity from fossil fuels (TWh)": 259.92, "Electricity from nuclear (TWh)": 13.32, "Electricity from renewables (TWh)": 58.78}, {"Year": 2019, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": 449269.989, "Electricity from fossil fuels (TWh)": 248.2, "Electricity from nuclear (TWh)": 10.88, "Electricity from renewables (TWh)": 59}, {"Year": 2020, "Entity": "Mexico", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 245.46, "Electricity from nuclear (TWh)": 10.87, "Electricity from renewables (TWh)": 69.19}, {"Year": 2000, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 295770, "Electricity from fossil fuels (TWh)": 140.85, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 2.33}, {"Year": 2001, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 293630, "Electricity from fossil fuels (TWh)": 140.94, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 2.78}, {"Year": 2002, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 287320.0073, "Electricity from fossil fuels (TWh)": 139.72, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 2.77}, {"Year": 2003, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 297730.011, "Electricity from fossil fuels (TWh)": 147.76, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 
2.25}, {"Year": 2004, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 301850.0061, "Electricity from fossil fuels (TWh)": 149.06, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 3.2}, {"Year": 2005, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 301350.0061, "Electricity from fossil fuels (TWh)": 151.2, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 3.85}, {"Year": 2006, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 314089.9963, "Electricity from fossil fuels (TWh)": 156.16, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 4.29}, {"Year": 2007, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 313380.0049, "Electricity from fossil fuels (TWh)": 153.08, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 5.43}, {"Year": 2008, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 308329.9866, "Electricity from fossil fuels (TWh)": 148.03, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 6.61}, {"Year": 2009, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 297260.0098, "Electricity from fossil fuels (TWh)": 142.4, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 8.69}, {"Year": 2010, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 313739.9902, "Electricity from fossil fuels (TWh)": 146.12, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 10.88}, {"Year": 2011, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 310589.9963, "Electricity from fossil fuels (TWh)": 149.88, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 13.13}, {"Year": 2012, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 303350.0061, "Electricity from fossil fuels (TWh)": 144.75, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 16.88}, {"Year": 2013, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 
298299.9878, "Electricity from fossil fuels (TWh)": 146.85, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 17.06}, {"Year": 2014, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 285730.011, "Electricity from fossil fuels (TWh)": 138.53, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 19.85}, {"Year": 2015, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 289079.9866, "Electricity from fossil fuels (TWh)": 141.55, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 22.69}, {"Year": 2016, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 299799.9878, "Electricity from fossil fuels (TWh)": 143.28, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 22.81}, {"Year": 2017, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 312859.9854, "Electricity from fossil fuels (TWh)": 145.8, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 24.13}, {"Year": 2018, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 311910.0037, "Electricity from fossil fuels (TWh)": 147.87, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 21.62}, {"Year": 2019, "Entity": "Poland", "Value_co2_emissions_kt_by_country": 295130.0049, "Electricity from fossil fuels (TWh)": 137.58, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 25.46}, {"Year": 2020, "Entity": "Poland", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 128.91, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 28.23}, {"Year": 2000, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 249660, "Electricity from fossil fuels (TWh)": 138.68, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2001, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 254090, "Electricity from fossil fuels (TWh)": 146.09, "Electricity from 
nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2002, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 272250, "Electricity from fossil fuels (TWh)": 154.91, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2003, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 284829.9866, "Electricity from fossil fuels (TWh)": 166.58, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2004, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 299890.0146, "Electricity from fossil fuels (TWh)": 173.41, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2005, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 315290.0085, "Electricity from fossil fuels (TWh)": 191.05, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2006, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 335440.0024, "Electricity from fossil fuels (TWh)": 196.31, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2007, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 354619.9951, "Electricity from fossil fuels (TWh)": 204.43, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2008, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 389720.0012, "Electricity from fossil fuels (TWh)": 204.2, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2009, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 406529.9988, "Electricity from fossil fuels (TWh)": 217.31, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0}, {"Year": 2010, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 446130.0049, "Electricity from fossil fuels (TWh)": 240.06, "Electricity from nuclear (TWh)": null, "Electricity 
from renewables (TWh)": 0}, {"Year": 2011, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 463769.989, "Electricity from fossil fuels (TWh)": 250.07, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.01}, {"Year": 2012, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 492470.0012, "Electricity from fossil fuels (TWh)": 271.68, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.03}, {"Year": 2013, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 503209.9915, "Electricity from fossil fuels (TWh)": 284.02, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.04}, {"Year": 2014, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 540520.0195, "Electricity from fossil fuels (TWh)": 311.81, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.05}, {"Year": 2015, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 565190.0024, "Electricity from fossil fuels (TWh)": 338.34, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.05}, {"Year": 2016, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 561229.9805, "Electricity from fossil fuels (TWh)": 337.38, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.05}, {"Year": 2017, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 545070.0073, "Electricity from fossil fuels (TWh)": 354.3, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.07}, {"Year": 2018, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 521260.0098, "Electricity from fossil fuels (TWh)": 334.7, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.16}, {"Year": 2019, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": 523780.0293, "Electricity from fossil fuels (TWh)": 335.24, "Electricity from nuclear (TWh)": null, "Electricity from 
renewables (TWh)": 0.21}, {"Year": 2020, "Entity": "Saudi Arabia", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 337.82, "Electricity from nuclear (TWh)": null, "Electricity from renewables (TWh)": 0.21}, {"Year": 2000, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 284660, "Electricity from fossil fuels (TWh)": 181.67, "Electricity from nuclear (TWh)": 13.01, "Electricity from renewables (TWh)": 1.79}, {"Year": 2001, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 320540, "Electricity from fossil fuels (TWh)": 183.36, "Electricity from nuclear (TWh)": 10.72, "Electricity from renewables (TWh)": 2.46}, {"Year": 2002, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 331320.0073, "Electricity from fossil fuels (TWh)": 188.79, "Electricity from nuclear (TWh)": 11.99, "Electricity from renewables (TWh)": 2.81}, {"Year": 2003, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 353089.9963, "Electricity from fossil fuels (TWh)": 204.39, "Electricity from nuclear (TWh)": 12.66, "Electricity from renewables (TWh)": 1.19}, {"Year": 2004, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 379989.9902, "Electricity from fossil fuels (TWh)": 212.63, "Electricity from nuclear (TWh)": 14.28, "Electricity from renewables (TWh)": 1.33}, {"Year": 2005, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 377649.9939, "Electricity from fossil fuels (TWh)": 215.23, "Electricity from nuclear (TWh)": 12.24, "Electricity from renewables (TWh)": 1.75}, {"Year": 2006, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 379790.0085, "Electricity from fossil fuels (TWh)": 223.25, "Electricity from nuclear (TWh)": 10.07, "Electricity from renewables (TWh)": 3.28}, {"Year": 2007, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 397059.9976, "Electricity from fossil fuels (TWh)": 232.91, "Electricity from nuclear (TWh)": 12.6, "Electricity from renewables 
(TWh)": 1.3}, {"Year": 2008, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 426739.9902, "Electricity from fossil fuels (TWh)": 226.32, "Electricity from nuclear (TWh)": 12.75, "Electricity from renewables (TWh)": 1.66}, {"Year": 2009, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 404200.0122, "Electricity from fossil fuels (TWh)": 218.17, "Electricity from nuclear (TWh)": 11.57, "Electricity from renewables (TWh)": 1.86}, {"Year": 2010, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 425309.9976, "Electricity from fossil fuels (TWh)": 227.57, "Electricity from nuclear (TWh)": 12.9, "Electricity from renewables (TWh)": 2.51}, {"Year": 2011, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 409260.0098, "Electricity from fossil fuels (TWh)": 229.06, "Electricity from nuclear (TWh)": 12.94, "Electricity from renewables (TWh)": 2.49}, {"Year": 2012, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 426779.9988, "Electricity from fossil fuels (TWh)": 226.84, "Electricity from nuclear (TWh)": 12.4, "Electricity from renewables (TWh)": 1.66}, {"Year": 2013, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 436920.0134, "Electricity from fossil fuels (TWh)": 223.28, "Electricity from nuclear (TWh)": 13.61, "Electricity from renewables (TWh)": 1.62}, {"Year": 2014, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 447929.9927, "Electricity from fossil fuels (TWh)": 218.42, "Electricity from nuclear (TWh)": 14.76, "Electricity from renewables (TWh)": 3.38}, {"Year": 2015, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 424809.9976, "Electricity from fossil fuels (TWh)": 214.88, "Electricity from nuclear (TWh)": 10.97, "Electricity from renewables (TWh)": 6.09}, {"Year": 2016, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 425140.0146, "Electricity from fossil fuels (TWh)": 213.09, "Electricity from nuclear (TWh)": 15.21, "Electricity from 
renewables (TWh)": 7.69}, {"Year": 2017, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 435649.9939, "Electricity from fossil fuels (TWh)": 212.77, "Electricity from nuclear (TWh)": 15.09, "Electricity from renewables (TWh)": 10.04}, {"Year": 2018, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 434350.0061, "Electricity from fossil fuels (TWh)": 214.25, "Electricity from nuclear (TWh)": 10.56, "Electricity from renewables (TWh)": 12.22}, {"Year": 2019, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": 439640.0146, "Electricity from fossil fuels (TWh)": 208.39, "Electricity from nuclear (TWh)": 13.6, "Electricity from renewables (TWh)": 12.57}, {"Year": 2020, "Entity": "South Africa", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 197.5, "Electricity from nuclear (TWh)": 11.62, "Electricity from renewables (TWh)": 12.83}, {"Year": 2000, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 293310, "Electricity from fossil fuels (TWh)": 124.22, "Electricity from nuclear (TWh)": 62.21, "Electricity from renewables (TWh)": 34.49}, {"Year": 2001, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 294790, "Electricity from fossil fuels (TWh)": 120.06, "Electricity from nuclear (TWh)": 63.71, "Electricity from renewables (TWh)": 49.3}, {"Year": 2002, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 312750, "Electricity from fossil fuels (TWh)": 143.72, "Electricity from nuclear (TWh)": 63.02, "Electricity from renewables (TWh)": 33.17}, {"Year": 2003, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 318660.0037, "Electricity from fossil fuels (TWh)": 139.67, "Electricity from nuclear (TWh)": 61.88, "Electricity from renewables (TWh)": 55.75}, {"Year": 2004, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 335559.9976, "Electricity from fossil fuels (TWh)": 159.91, "Electricity from nuclear (TWh)": 63.61, "Electricity from renewables (TWh)": 50.13}, {"Year": 2005, 
"Entity": "Spain", "Value_co2_emissions_kt_by_country": 350500, "Electricity from fossil fuels (TWh)": 184.65, "Electricity from nuclear (TWh)": 57.54, "Electricity from renewables (TWh)": 42.27}, {"Year": 2006, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 341779.9988, "Electricity from fossil fuels (TWh)": 182.98, "Electricity from nuclear (TWh)": 60.13, "Electricity from renewables (TWh)": 52.15}, {"Year": 2007, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 354679.9927, "Electricity from fossil fuels (TWh)": 188.13, "Electricity from nuclear (TWh)": 55.1, "Electricity from renewables (TWh)": 58.3}, {"Year": 2008, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 324269.989, "Electricity from fossil fuels (TWh)": 189.55, "Electricity from nuclear (TWh)": 58.97, "Electricity from renewables (TWh)": 62.15}, {"Year": 2009, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 287489.9902, "Electricity from fossil fuels (TWh)": 164.69, "Electricity from nuclear (TWh)": 52.76, "Electricity from renewables (TWh)": 74.08}, {"Year": 2010, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 273250, "Electricity from fossil fuels (TWh)": 138.39, "Electricity from nuclear (TWh)": 61.99, "Electricity from renewables (TWh)": 97.77}, {"Year": 2011, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 274399.9939, "Electricity from fossil fuels (TWh)": 146.12, "Electricity from nuclear (TWh)": 57.72, "Electricity from renewables (TWh)": 87.53}, {"Year": 2012, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 269269.989, "Electricity from fossil fuels (TWh)": 145.33, "Electricity from nuclear (TWh)": 61.47, "Electricity from renewables (TWh)": 86.97}, {"Year": 2013, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 242809.9976, "Electricity from fossil fuels (TWh)": 113.32, "Electricity from nuclear (TWh)": 56.73, "Electricity from renewables (TWh)": 111.42}, {"Year": 2014, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 
240960.0067, "Electricity from fossil fuels (TWh)": 107.37, "Electricity from nuclear (TWh)": 57.31, "Electricity from renewables (TWh)": 110.26}, {"Year": 2015, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 256279.9988, "Electricity from fossil fuels (TWh)": 123.19, "Electricity from nuclear (TWh)": 57.2, "Electricity from renewables (TWh)": 97.09}, {"Year": 2016, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 247029.9988, "Electricity from fossil fuels (TWh)": 107.93, "Electricity from nuclear (TWh)": 58.63, "Electricity from renewables (TWh)": 104.63}, {"Year": 2017, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 263450.0122, "Electricity from fossil fuels (TWh)": 126.93, "Electricity from nuclear (TWh)": 58.04, "Electricity from renewables (TWh)": 87.93}, {"Year": 2018, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 257040.0085, "Electricity from fossil fuels (TWh)": 112.23, "Electricity from nuclear (TWh)": 55.77, "Electricity from renewables (TWh)": 103.88}, {"Year": 2019, "Entity": "Spain", "Value_co2_emissions_kt_by_country": 239979.9957, "Electricity from fossil fuels (TWh)": 111.55, "Electricity from nuclear (TWh)": 58.35, "Electricity from renewables (TWh)": 100.99}, {"Year": 2020, "Entity": "Spain", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 87.64, "Electricity from nuclear (TWh)": 58.3, "Electricity from renewables (TWh)": 113.79}, {"Year": 2000, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 164490, "Electricity from fossil fuels (TWh)": 83.15, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 6.38}, {"Year": 2001, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 173160, "Electricity from fossil fuels (TWh)": 88.97, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 6.76}, {"Year": 2002, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 184240.0055, "Electricity from fossil fuels (TWh)": 93.51, 
"Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 8.07}, {"Year": 2003, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 191929.9927, "Electricity from fossil fuels (TWh)": 100.61, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 8.36}, {"Year": 2004, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 210190.0024, "Electricity from fossil fuels (TWh)": 109.46, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 7.63}, {"Year": 2005, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 217770.0043, "Electricity from fossil fuels (TWh)": 115.58, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 7.42}, {"Year": 2006, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 219880.0049, "Electricity from fossil fuels (TWh)": 119.41, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 9.82}, {"Year": 2007, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 224589.9963, "Electricity from fossil fuels (TWh)": 122.12, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 10.2}, {"Year": 2008, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 227580.0018, "Electricity from fossil fuels (TWh)": 127.43, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 8.95}, {"Year": 2009, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 220259.9945, "Electricity from fossil fuels (TWh)": 128.09, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 9.09}, {"Year": 2010, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 234380.0049, "Electricity from fossil fuels (TWh)": 141.72, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 8.58}, {"Year": 2011, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 233600.0061, "Electricity from fossil fuels (TWh)": 135.31, "Electricity from nuclear (TWh)": 0, "Electricity from 
renewables (TWh)": 11.83}, {"Year": 2012, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 250679.9927, "Electricity from fossil fuels (TWh)": 143.73, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 13.42}, {"Year": 2013, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 260700.0122, "Electricity from fossil fuels (TWh)": 148.29, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 12.33}, {"Year": 2014, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 256799.9878, "Electricity from fossil fuels (TWh)": 149.26, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 13.68}, {"Year": 2015, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 264000, "Electricity from fossil fuels (TWh)": 153.4, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 13.33}, {"Year": 2016, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 261600.0061, "Electricity from fossil fuels (TWh)": 161.79, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 15.97}, {"Year": 2017, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 258820.0073, "Electricity from fossil fuels (TWh)": 161.88, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 19.92}, {"Year": 2018, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 257049.9878, "Electricity from fossil fuels (TWh)": 156.26, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 25.84}, {"Year": 2019, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": 267089.9963, "Electricity from fossil fuels (TWh)": 162.59, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 28.02}, {"Year": 2020, "Entity": "Thailand", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 154.52, "Electricity from nuclear (TWh)": 0, "Electricity from renewables (TWh)": 24.73}, {"Year": 2000, "Entity": "Ukraine", 
"Value_co2_emissions_kt_by_country": 297380, "Electricity from fossil fuels (TWh)": 82.65, "Electricity from nuclear (TWh)": 77.34, "Electricity from renewables (TWh)": 11.28}, {"Year": 2001, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 300550, "Electricity from fossil fuels (TWh)": 84.59, "Electricity from nuclear (TWh)": 76.17, "Electricity from renewables (TWh)": 12.05}, {"Year": 2002, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 303940.0024, "Electricity from fossil fuels (TWh)": 85.93, "Electricity from nuclear (TWh)": 77.99, "Electricity from renewables (TWh)": 9.65}, {"Year": 2003, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 330230.011, "Electricity from fossil fuels (TWh)": 89.52, "Electricity from nuclear (TWh)": 81.41, "Electricity from renewables (TWh)": 9.27}, {"Year": 2004, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 307140.0146, "Electricity from fossil fuels (TWh)": 83.22, "Electricity from nuclear (TWh)": 87.02, "Electricity from renewables (TWh)": 11.78}, {"Year": 2005, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 295410.0037, "Electricity from fossil fuels (TWh)": 84.75, "Electricity from nuclear (TWh)": 88.76, "Electricity from renewables (TWh)": 12.4}, {"Year": 2006, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 303989.9902, "Electricity from fossil fuels (TWh)": 90.09, "Electricity from nuclear (TWh)": 90.22, "Electricity from renewables (TWh)": 12.92}, {"Year": 2007, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 312140.0146, "Electricity from fossil fuels (TWh)": 93.13, "Electricity from nuclear (TWh)": 92.54, "Electricity from renewables (TWh)": 10.47}, {"Year": 2008, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 301200.0122, "Electricity from fossil fuels (TWh)": 90.92, "Electricity from nuclear (TWh)": 89.84, "Electricity from renewables (TWh)": 11.82}, {"Year": 2009, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 251619.9951, 
"Electricity from fossil fuels (TWh)": 78.58, "Electricity from nuclear (TWh)": 82.92, "Electricity from renewables (TWh)": 12.12}, {"Year": 2010, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 268920.0134, "Electricity from fossil fuels (TWh)": 86.28, "Electricity from nuclear (TWh)": 89.15, "Electricity from renewables (TWh)": 13.39}, {"Year": 2011, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 283339.9963, "Electricity from fossil fuels (TWh)": 93.5, "Electricity from nuclear (TWh)": 90.25, "Electricity from renewables (TWh)": 11.2}, {"Year": 2012, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 277109.9854, "Electricity from fossil fuels (TWh)": 96.99, "Electricity from nuclear (TWh)": 90.14, "Electricity from renewables (TWh)": 11.23}, {"Year": 2013, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 270269.989, "Electricity from fossil fuels (TWh)": 95.39, "Electricity from nuclear (TWh)": 83.21, "Electricity from renewables (TWh)": 15.11}, {"Year": 2014, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 237729.9957, "Electricity from fossil fuels (TWh)": 83.42, "Electricity from nuclear (TWh)": 88.39, "Electricity from renewables (TWh)": 10.17}, {"Year": 2015, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 191070.0073, "Electricity from fossil fuels (TWh)": 66.91, "Electricity from nuclear (TWh)": 87.63, "Electricity from renewables (TWh)": 7.1}, {"Year": 2016, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 201660.0037, "Electricity from fossil fuels (TWh)": 72.66, "Electricity from nuclear (TWh)": 80.95, "Electricity from renewables (TWh)": 9.25}, {"Year": 2017, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 174940.0024, "Electricity from fossil fuels (TWh)": 57.96, "Electricity from nuclear (TWh)": 85.58, "Electricity from renewables (TWh)": 10.88}, {"Year": 2018, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 185619.9951, "Electricity from fossil fuels (TWh)": 
60.81, "Electricity from nuclear (TWh)": 84.4, "Electricity from renewables (TWh)": 13.02}, {"Year": 2019, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": 174729.9957, "Electricity from fossil fuels (TWh)": 57.79, "Electricity from nuclear (TWh)": 83, "Electricity from renewables (TWh)": 11.87}, {"Year": 2020, "Entity": "Ukraine", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 54.5, "Electricity from nuclear (TWh)": 76.2, "Electricity from renewables (TWh)": 17.56}, {"Year": 2000, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 530890, "Electricity from fossil fuels (TWh)": 279.34, "Electricity from nuclear (TWh)": 85.06, "Electricity from renewables (TWh)": 9.98}, {"Year": 2001, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 545260, "Electricity from fossil fuels (TWh)": 282.72, "Electricity from nuclear (TWh)": 90.09, "Electricity from renewables (TWh)": 9.56}, {"Year": 2002, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 530789.978, "Electricity from fossil fuels (TWh)": 285.62, "Electricity from nuclear (TWh)": 87.85, "Electricity from renewables (TWh)": 11.13}, {"Year": 2003, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 543039.978, "Electricity from fossil fuels (TWh)": 296.15, "Electricity from nuclear (TWh)": 88.69, "Electricity from renewables (TWh)": 10.62}, {"Year": 2004, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 543080.0171, "Electricity from fossil fuels (TWh)": 297.15, "Electricity from nuclear (TWh)": 80, "Electricity from renewables (TWh)": 14.14}, {"Year": 2005, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 540919.9829, "Electricity from fossil fuels (TWh)": 296.87, "Electricity from nuclear (TWh)": 81.62, "Electricity from renewables (TWh)": 16.93}, {"Year": 2006, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 542059.9976, "Electricity from fossil fuels (TWh)": 299.88, 
"Electricity from nuclear (TWh)": 75.45, "Electricity from renewables (TWh)": 18.11}, {"Year": 2007, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 530500, "Electricity from fossil fuels (TWh)": 310.26, "Electricity from nuclear (TWh)": 63.03, "Electricity from renewables (TWh)": 19.69}, {"Year": 2008, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 515340.0269, "Electricity from fossil fuels (TWh)": 310.5, "Electricity from nuclear (TWh)": 52.49, "Electricity from renewables (TWh)": 21.85}, {"Year": 2009, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 466489.9902, "Electricity from fossil fuels (TWh)": 278.73, "Electricity from nuclear (TWh)": 69.1, "Electricity from renewables (TWh)": 25.25}, {"Year": 2010, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 482440.0024, "Electricity from fossil fuels (TWh)": 290.59, "Electricity from nuclear (TWh)": 62.14, "Electricity from renewables (TWh)": 26.18}, {"Year": 2011, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 445589.9963, "Electricity from fossil fuels (TWh)": 260.88, "Electricity from nuclear (TWh)": 68.98, "Electricity from renewables (TWh)": 35.2}, {"Year": 2012, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 467779.9988, "Electricity from fossil fuels (TWh)": 249.25, "Electricity from nuclear (TWh)": 70.4, "Electricity from renewables (TWh)": 41.24}, {"Year": 2013, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 453760.0098, "Electricity from fossil fuels (TWh)": 231.56, "Electricity from nuclear (TWh)": 70.61, "Electricity from renewables (TWh)": 53.21}, {"Year": 2014, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 415600.0061, "Electricity from fossil fuels (TWh)": 206.94, "Electricity from nuclear (TWh)": 63.75, "Electricity from renewables (TWh)": 64.52}, {"Year": 2015, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 401079.9866, "Electricity from 
fossil fuels (TWh)": 182.43, "Electricity from nuclear (TWh)": 70.34, "Electricity from renewables (TWh)": 82.57}, {"Year": 2016, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 380809.9976, "Electricity from fossil fuels (TWh)": 181.56, "Electricity from nuclear (TWh)": 71.73, "Electricity from renewables (TWh)": 82.99}, {"Year": 2017, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 367000, "Electricity from fossil fuels (TWh)": 165.91, "Electricity from nuclear (TWh)": 70.34, "Electricity from renewables (TWh)": 98.85}, {"Year": 2018, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 360730.011, "Electricity from fossil fuels (TWh)": 155.41, "Electricity from nuclear (TWh)": 65.06, "Electricity from renewables (TWh)": 110.03}, {"Year": 2019, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": 348920.0134, "Electricity from fossil fuels (TWh)": 144.99, "Electricity from nuclear (TWh)": 56.18, "Electricity from renewables (TWh)": 120.48}, {"Year": 2020, "Entity": "United Kingdom", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 124.78, "Electricity from nuclear (TWh)": 50.85, "Electricity from renewables (TWh)": 131.74}, {"Year": 2000, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5775810, "Electricity from fossil fuels (TWh)": 2697.28, "Electricity from nuclear (TWh)": 753.89, "Electricity from renewables (TWh)": 350.93}, {"Year": 2001, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5748260, "Electricity from fossil fuels (TWh)": 2678.68, "Electricity from nuclear (TWh)": 768.83, "Electricity from renewables (TWh)": 280.06}, {"Year": 2002, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5593029.785, "Electricity from fossil fuels (TWh)": 2727.83, "Electricity from nuclear (TWh)": 780.06, "Electricity from renewables (TWh)": 336.34}, {"Year": 2003, "Entity": "United States", "Value_co2_emissions_kt_by_country": 
5658990.234, "Electricity from fossil fuels (TWh)": 2756.03, "Electricity from nuclear (TWh)": 763.73, "Electricity from renewables (TWh)": 349.18}, {"Year": 2004, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5738290.039, "Electricity from fossil fuels (TWh)": 2818.28, "Electricity from nuclear (TWh)": 788.53, "Electricity from renewables (TWh)": 345.14}, {"Year": 2005, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5753490.234, "Electricity from fossil fuels (TWh)": 2899.96, "Electricity from nuclear (TWh)": 781.99, "Electricity from renewables (TWh)": 353.04}, {"Year": 2006, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5653080.078, "Electricity from fossil fuels (TWh)": 2878.56, "Electricity from nuclear (TWh)": 787.22, "Electricity from renewables (TWh)": 381.16}, {"Year": 2007, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5736319.824, "Electricity from fossil fuels (TWh)": 2988.24, "Electricity from nuclear (TWh)": 806.42, "Electricity from renewables (TWh)": 347.91}, {"Year": 2008, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5558379.883, "Electricity from fossil fuels (TWh)": 2924.21, "Electricity from nuclear (TWh)": 806.21, "Electricity from renewables (TWh)": 377.11}, {"Year": 2009, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5156430.176, "Electricity from fossil fuels (TWh)": 2725.41, "Electricity from nuclear (TWh)": 798.85, "Electricity from renewables (TWh)": 415.56}, {"Year": 2010, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5392109.863, "Electricity from fossil fuels (TWh)": 2882.49, "Electricity from nuclear (TWh)": 806.97, "Electricity from renewables (TWh)": 424.48}, {"Year": 2011, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5173600.098, "Electricity from fossil fuels (TWh)": 2788.93, "Electricity from nuclear (TWh)": 790.2, "Electricity from renewables (TWh)": 509.74}, {"Year": 2012, "Entity": 
"United States", "Value_co2_emissions_kt_by_country": 4956060.059, "Electricity from fossil fuels (TWh)": 2779.02, "Electricity from nuclear (TWh)": 769.33, "Electricity from renewables (TWh)": 492.32}, {"Year": 2013, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5092100.098, "Electricity from fossil fuels (TWh)": 2746.21, "Electricity from nuclear (TWh)": 789.02, "Electricity from renewables (TWh)": 520.38}, {"Year": 2014, "Entity": "United States", "Value_co2_emissions_kt_by_country": 5107209.961, "Electricity from fossil fuels (TWh)": 2752.01, "Electricity from nuclear (TWh)": 797.17, "Electricity from renewables (TWh)": 546.83}, {"Year": 2015, "Entity": "United States", "Value_co2_emissions_kt_by_country": 4990709.961, "Electricity from fossil fuels (TWh)": 2730.32, "Electricity from nuclear (TWh)": 797.18, "Electricity from renewables (TWh)": 556.49}, {"Year": 2016, "Entity": "United States", "Value_co2_emissions_kt_by_country": 4894500, "Electricity from fossil fuels (TWh)": 2656.96, "Electricity from nuclear (TWh)": 805.69, "Electricity from renewables (TWh)": 624.91}, {"Year": 2017, "Entity": "United States", "Value_co2_emissions_kt_by_country": 4819370.117, "Electricity from fossil fuels (TWh)": 2540.17, "Electricity from nuclear (TWh)": 804.95, "Electricity from renewables (TWh)": 707.19}, {"Year": 2018, "Entity": "United States", "Value_co2_emissions_kt_by_country": 4975310.059, "Electricity from fossil fuels (TWh)": 2661.3, "Electricity from nuclear (TWh)": 807.08, "Electricity from renewables (TWh)": 733.17}, {"Year": 2019, "Entity": "United States", "Value_co2_emissions_kt_by_country": 4817720.215, "Electricity from fossil fuels (TWh)": 2588.21, "Electricity from nuclear (TWh)": 809.41, "Electricity from renewables (TWh)": 760.76}, {"Year": 2020, "Entity": "United States", "Value_co2_emissions_kt_by_country": null, "Electricity from fossil fuels (TWh)": 2431.9, "Electricity from nuclear (TWh)": 789.88, "Electricity from renewables 
(TWh)": 821.4}], "anchored": true, "attachedMetadata": ""}, {"kind": "table", "id": "table-82", "displayId": "energy-source", "names": ["Entity", "Year", "energy", "source"], "rows": [{"Entity": "Australia", "Year": 2000, "energy": 181.05, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2001, "energy": 194.33, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2002, "energy": 197.29, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2003, "energy": 195.13, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2004, "energy": 203.66, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2005, "energy": 195.95, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2006, "energy": 198.72, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2007, "energy": 208.59, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2008, "energy": 211.06, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2009, "energy": 216.42, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2010, "energy": 212.5, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2011, "energy": 213.56, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2012, "energy": 206.75, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2013, "energy": 195.78, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2014, "energy": 205.46, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2015, "energy": 197.72, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2016, "energy": 207.66, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2017, "energy": 209.14, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2018, "energy": 207.45, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2019, "energy": 196.45, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2020, "energy": 186.92, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2000, "energy": 28.87, "source": "fossil 
fuels"}, {"Entity": "Brazil", "Year": 2001, "energy": 35.19, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2002, "energy": 33.5, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2003, "energy": 31.62, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2004, "energy": 40.14, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2005, "energy": 39.56, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2006, "energy": 39.4, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2007, "energy": 37.64, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2008, "energy": 55.87, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2009, "energy": 36.32, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2010, "energy": 61.02, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2011, "energy": 50.27, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2012, "energy": 77.21, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2013, "energy": 112, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2014, "energy": 136.58, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2015, "energy": 128.85, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2016, "energy": 93.06, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2017, "energy": 101.9, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2018, "energy": 86.69, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2019, "energy": 90.91, "source": "fossil fuels"}, {"Entity": "Brazil", "Year": 2020, "energy": 81.15, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2000, "energy": 155.56, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2001, "energy": 159.93, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2002, "energy": 155.12, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2003, "energy": 157.35, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2004, "energy": 148.86, "source": "fossil fuels"}, {"Entity": 
"Canada", "Year": 2005, "energy": 150.78, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2006, "energy": 139.71, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2007, "energy": 149.36, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2008, "energy": 141.33, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2009, "energy": 129.76, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2010, "energy": 130.08, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2011, "energy": 131.3, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2012, "energy": 124.2, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2013, "energy": 122.87, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2014, "energy": 122.75, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2015, "energy": 125.7, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2016, "energy": 122.35, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2017, "energy": 113.7, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2018, "energy": 112.47, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2019, "energy": 110.65, "source": "fossil fuels"}, {"Entity": "Canada", "Year": 2020, "energy": 102.19, "source": "fossil fuels"}, {"Entity": "China", "Year": 2000, "energy": 1113.3, "source": "fossil fuels"}, {"Entity": "China", "Year": 2001, "energy": 1182.59, "source": "fossil fuels"}, {"Entity": "China", "Year": 2002, "energy": 1337.46, "source": "fossil fuels"}, {"Entity": "China", "Year": 2003, "energy": 1579.96, "source": "fossil fuels"}, {"Entity": "China", "Year": 2004, "energy": 1795.41, "source": "fossil fuels"}, {"Entity": "China", "Year": 2005, "energy": 2042.8, "source": "fossil fuels"}, {"Entity": "China", "Year": 2006, "energy": 2364.16, "source": "fossil fuels"}, {"Entity": "China", "Year": 2007, "energy": 2718.7, "source": "fossil fuels"}, {"Entity": "China", "Year": 2008, "energy": 2762.29, "source": "fossil fuels"}, {"Entity": 
"China", "Year": 2009, "energy": 2980.2, "source": "fossil fuels"}, {"Entity": "China", "Year": 2010, "energy": 3326.19, "source": "fossil fuels"}, {"Entity": "China", "Year": 2011, "energy": 3811.77, "source": "fossil fuels"}, {"Entity": "China", "Year": 2012, "energy": 3869.38, "source": "fossil fuels"}, {"Entity": "China", "Year": 2013, "energy": 4203.77, "source": "fossil fuels"}, {"Entity": "China", "Year": 2014, "energy": 4345.86, "source": "fossil fuels"}, {"Entity": "China", "Year": 2015, "energy": 4222.76, "source": "fossil fuels"}, {"Entity": "China", "Year": 2016, "energy": 4355, "source": "fossil fuels"}, {"Entity": "China", "Year": 2017, "energy": 4643.1, "source": "fossil fuels"}, {"Entity": "China", "Year": 2018, "energy": 4990.28, "source": "fossil fuels"}, {"Entity": "China", "Year": 2019, "energy": 5098.22, "source": "fossil fuels"}, {"Entity": "China", "Year": 2020, "energy": 5184.13, "source": "fossil fuels"}, {"Entity": "France", "Year": 2000, "energy": 50.61, "source": "fossil fuels"}, {"Entity": "France", "Year": 2001, "energy": 46.48, "source": "fossil fuels"}, {"Entity": "France", "Year": 2002, "energy": 52.67, "source": "fossil fuels"}, {"Entity": "France", "Year": 2003, "energy": 57.38, "source": "fossil fuels"}, {"Entity": "France", "Year": 2004, "energy": 56.53, "source": "fossil fuels"}, {"Entity": "France", "Year": 2005, "energy": 63.35, "source": "fossil fuels"}, {"Entity": "France", "Year": 2006, "energy": 56.9, "source": "fossil fuels"}, {"Entity": "France", "Year": 2007, "energy": 58.18, "source": "fossil fuels"}, {"Entity": "France", "Year": 2008, "energy": 55.57, "source": "fossil fuels"}, {"Entity": "France", "Year": 2009, "energy": 51.32, "source": "fossil fuels"}, {"Entity": "France", "Year": 2010, "energy": 57.63, "source": "fossil fuels"}, {"Entity": "France", "Year": 2011, "energy": 58.99, "source": "fossil fuels"}, {"Entity": "France", "Year": 2012, "energy": 56.42, "source": "fossil fuels"}, {"Entity": "France", "Year": 
2013, "energy": 53.35, "source": "fossil fuels"}, {"Entity": "France", "Year": 2014, "energy": 35.68, "source": "fossil fuels"}, {"Entity": "France", "Year": 2015, "energy": 44.65, "source": "fossil fuels"}, {"Entity": "France", "Year": 2016, "energy": 56.45, "source": "fossil fuels"}, {"Entity": "France", "Year": 2017, "energy": 65.09, "source": "fossil fuels"}, {"Entity": "France", "Year": 2018, "energy": 49.27, "source": "fossil fuels"}, {"Entity": "France", "Year": 2019, "energy": 53.5, "source": "fossil fuels"}, {"Entity": "France", "Year": 2020, "energy": 48.14, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2000, "energy": 367.22, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2001, "energy": 372.69, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2002, "energy": 372.64, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2003, "energy": 390.81, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2004, "energy": 385.24, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2005, "energy": 386.96, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2006, "energy": 390.03, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2007, "energy": 402.4, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2008, "energy": 390.43, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2009, "energy": 358.07, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2010, "energy": 378.9, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2011, "energy": 373.16, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2012, "energy": 377.89, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2013, "energy": 381.52, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2014, "energy": 360.28, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2015, "energy": 359.99, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2016, "energy": 368.67, "source": "fossil fuels"}, {"Entity": "Germany", 
"Year": 2017, "energy": 353.37, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2018, "energy": 334.65, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2019, "energy": 284.09, "source": "fossil fuels"}, {"Entity": "Germany", "Year": 2020, "energy": 251.4, "source": "fossil fuels"}, {"Entity": "India", "Year": 2000, "energy": 475.35, "source": "fossil fuels"}, {"Entity": "India", "Year": 2001, "energy": 491.01, "source": "fossil fuels"}, {"Entity": "India", "Year": 2002, "energy": 517.51, "source": "fossil fuels"}, {"Entity": "India", "Year": 2003, "energy": 545.36, "source": "fossil fuels"}, {"Entity": "India", "Year": 2004, "energy": 567.86, "source": "fossil fuels"}, {"Entity": "India", "Year": 2005, "energy": 579.32, "source": "fossil fuels"}, {"Entity": "India", "Year": 2006, "energy": 599.24, "source": "fossil fuels"}, {"Entity": "India", "Year": 2007, "energy": 636.68, "source": "fossil fuels"}, {"Entity": "India", "Year": 2008, "energy": 674.27, "source": "fossil fuels"}, {"Entity": "India", "Year": 2009, "energy": 728.56, "source": "fossil fuels"}, {"Entity": "India", "Year": 2010, "energy": 771.78, "source": "fossil fuels"}, {"Entity": "India", "Year": 2011, "energy": 828.16, "source": "fossil fuels"}, {"Entity": "India", "Year": 2012, "energy": 893.45, "source": "fossil fuels"}, {"Entity": "India", "Year": 2013, "energy": 924.93, "source": "fossil fuels"}, {"Entity": "India", "Year": 2014, "energy": 1025.29, "source": "fossil fuels"}, {"Entity": "India", "Year": 2015, "energy": 1080.44, "source": "fossil fuels"}, {"Entity": "India", "Year": 2016, "energy": 1155.52, "source": "fossil fuels"}, {"Entity": "India", "Year": 2017, "energy": 1198.85, "source": "fossil fuels"}, {"Entity": "India", "Year": 2018, "energy": 1276.32, "source": "fossil fuels"}, {"Entity": "India", "Year": 2019, "energy": 1273.59, "source": "fossil fuels"}, {"Entity": "India", "Year": 2020, "energy": 1202.34, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 
2000, "energy": 78.43, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2001, "energy": 83.96, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2002, "energy": 92.03, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2003, "energy": 97.57, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2004, "energy": 103.8, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2005, "energy": 110.22, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2006, "energy": 116.8, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2007, "energy": 124.1, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2008, "energy": 129.55, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2009, "energy": 136.05, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2010, "energy": 142.88, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2011, "energy": 161.41, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2012, "energy": 177.83, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2013, "energy": 189.66, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2014, "energy": 203.11, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2015, "energy": 209.71, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2016, "energy": 217.97, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2017, "energy": 222.64, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2018, "energy": 235.41, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2019, "energy": 247.39, "source": "fossil fuels"}, {"Entity": "Indonesia", "Year": 2020, "energy": 238.91, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2000, "energy": 218.28, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2001, "energy": 216.73, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2002, "energy": 228.45, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2003, "energy": 238.52, 
"source": "fossil fuels"}, {"Entity": "Italy", "Year": 2004, "energy": 240.95, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2005, "energy": 247.29, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2006, "energy": 256.03, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2007, "energy": 259.49, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2008, "energy": 254.34, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2009, "energy": 218.32, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2010, "energy": 220.93, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2011, "energy": 216.78, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2012, "energy": 204.26, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2013, "energy": 175.07, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2014, "energy": 156.76, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2015, "energy": 172.06, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2016, "energy": 179.19, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2017, "energy": 189.44, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2018, "energy": 172.98, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2019, "energy": 175.52, "source": "fossil fuels"}, {"Entity": "Italy", "Year": 2020, "energy": 161.17, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2000, "energy": 578.29, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2001, "energy": 564.95, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2002, "energy": 605.12, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2003, "energy": 633.76, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2004, "energy": 621.6, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2005, "energy": 634.09, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2006, "energy": 628.77, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2007, "energy": 705.37, "source": "fossil 
fuels"}, {"Entity": "Japan", "Year": 2008, "energy": 663.88, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2009, "energy": 611.86, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2010, "energy": 689.89, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2011, "energy": 777.1, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2012, "energy": 920.39, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2013, "energy": 897.88, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2014, "energy": 892.18, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2015, "energy": 844.23, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2016, "energy": 832.4, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2017, "energy": 806.12, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2018, "energy": 780.61, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2019, "energy": 735.66, "source": "fossil fuels"}, {"Entity": "Japan", "Year": 2020, "energy": 716.67, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2000, "energy": 44.11, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2001, "energy": 47.3, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2002, "energy": 49.44, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2003, "energy": 55.24, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2004, "energy": 58.89, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2005, "energy": 60.06, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2006, "energy": 63.89, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2007, "energy": 68.45, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2008, "energy": 72.89, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2009, "energy": 71.85, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2010, "energy": 74.63, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2011, "energy": 
78.7, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2012, "energy": 82.98, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2013, "energy": 84.88, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2014, "energy": 86.37, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2015, "energy": 82.2, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2016, "energy": 82.65, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2017, "energy": 91.48, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2018, "energy": 96.36, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2019, "energy": 95.39, "source": "fossil fuels"}, {"Entity": "Kazakhstan", "Year": 2020, "energy": 96.7, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2000, "energy": 141.8, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2001, "energy": 153.32, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2002, "energy": 159.81, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2003, "energy": 160.45, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2004, "energy": 173.66, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2005, "energy": 178.76, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2006, "energy": 182.76, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2007, "energy": 191.83, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2008, "energy": 184.51, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2009, "energy": 194.75, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2010, "energy": 207.38, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2011, "energy": 219.88, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2012, "energy": 229.14, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2013, "energy": 231.23, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2014, "energy": 223.43, "source": "fossil fuels"}, {"Entity": "Mexico", 
"Year": 2015, "energy": 234.28, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2016, "energy": 239.78, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2017, "energy": 242.69, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2018, "energy": 259.92, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2019, "energy": 248.2, "source": "fossil fuels"}, {"Entity": "Mexico", "Year": 2020, "energy": 245.46, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2000, "energy": 140.85, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2001, "energy": 140.94, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2002, "energy": 139.72, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2003, "energy": 147.76, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2004, "energy": 149.06, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2005, "energy": 151.2, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2006, "energy": 156.16, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2007, "energy": 153.08, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2008, "energy": 148.03, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2009, "energy": 142.4, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2010, "energy": 146.12, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2011, "energy": 149.88, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2012, "energy": 144.75, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2013, "energy": 146.85, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2014, "energy": 138.53, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2015, "energy": 141.55, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2016, "energy": 143.28, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2017, "energy": 145.8, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2018, "energy": 147.87, "source": "fossil fuels"}, {"Entity": "Poland", 
"Year": 2019, "energy": 137.58, "source": "fossil fuels"}, {"Entity": "Poland", "Year": 2020, "energy": 128.91, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2000, "energy": 138.68, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2001, "energy": 146.09, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2002, "energy": 154.91, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2003, "energy": 166.58, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2004, "energy": 173.41, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2005, "energy": 191.05, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2006, "energy": 196.31, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2007, "energy": 204.43, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2008, "energy": 204.2, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2009, "energy": 217.31, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2010, "energy": 240.06, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2011, "energy": 250.07, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2012, "energy": 271.68, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2013, "energy": 284.02, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2014, "energy": 311.81, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2015, "energy": 338.34, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2016, "energy": 337.38, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2017, "energy": 354.3, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2018, "energy": 334.7, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2019, "energy": 335.24, "source": "fossil fuels"}, {"Entity": "Saudi Arabia", "Year": 2020, "energy": 337.82, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2000, "energy": 
181.67, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2001, "energy": 183.36, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2002, "energy": 188.79, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2003, "energy": 204.39, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2004, "energy": 212.63, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2005, "energy": 215.23, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2006, "energy": 223.25, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2007, "energy": 232.91, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2008, "energy": 226.32, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2009, "energy": 218.17, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2010, "energy": 227.57, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2011, "energy": 229.06, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2012, "energy": 226.84, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2013, "energy": 223.28, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2014, "energy": 218.42, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2015, "energy": 214.88, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2016, "energy": 213.09, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2017, "energy": 212.77, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2018, "energy": 214.25, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2019, "energy": 208.39, "source": "fossil fuels"}, {"Entity": "South Africa", "Year": 2020, "energy": 197.5, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2000, "energy": 124.22, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2001, "energy": 120.06, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2002, "energy": 143.72, "source": "fossil fuels"}, 
{"Entity": "Spain", "Year": 2003, "energy": 139.67, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2004, "energy": 159.91, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2005, "energy": 184.65, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2006, "energy": 182.98, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2007, "energy": 188.13, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2008, "energy": 189.55, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2009, "energy": 164.69, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2010, "energy": 138.39, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2011, "energy": 146.12, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2012, "energy": 145.33, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2013, "energy": 113.32, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2014, "energy": 107.37, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2015, "energy": 123.19, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2016, "energy": 107.93, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2017, "energy": 126.93, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2018, "energy": 112.23, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2019, "energy": 111.55, "source": "fossil fuels"}, {"Entity": "Spain", "Year": 2020, "energy": 87.64, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2000, "energy": 83.15, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2001, "energy": 88.97, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2002, "energy": 93.51, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2003, "energy": 100.61, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2004, "energy": 109.46, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2005, "energy": 115.58, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2006, "energy": 119.41, "source": "fossil fuels"}, 
{"Entity": "Thailand", "Year": 2007, "energy": 122.12, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2008, "energy": 127.43, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2009, "energy": 128.09, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2010, "energy": 141.72, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2011, "energy": 135.31, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2012, "energy": 143.73, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2013, "energy": 148.29, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2014, "energy": 149.26, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2015, "energy": 153.4, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2016, "energy": 161.79, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2017, "energy": 161.88, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2018, "energy": 156.26, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2019, "energy": 162.59, "source": "fossil fuels"}, {"Entity": "Thailand", "Year": 2020, "energy": 154.52, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2000, "energy": 82.65, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2001, "energy": 84.59, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2002, "energy": 85.93, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2003, "energy": 89.52, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2004, "energy": 83.22, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2005, "energy": 84.75, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2006, "energy": 90.09, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2007, "energy": 93.13, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2008, "energy": 90.92, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2009, "energy": 78.58, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2010, "energy": 86.28, 
"source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2011, "energy": 93.5, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2012, "energy": 96.99, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2013, "energy": 95.39, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2014, "energy": 83.42, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2015, "energy": 66.91, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2016, "energy": 72.66, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2017, "energy": 57.96, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2018, "energy": 60.81, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2019, "energy": 57.79, "source": "fossil fuels"}, {"Entity": "Ukraine", "Year": 2020, "energy": 54.5, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2000, "energy": 279.34, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2001, "energy": 282.72, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2002, "energy": 285.62, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2003, "energy": 296.15, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2004, "energy": 297.15, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2005, "energy": 296.87, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2006, "energy": 299.88, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2007, "energy": 310.26, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2008, "energy": 310.5, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2009, "energy": 278.73, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2010, "energy": 290.59, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2011, "energy": 260.88, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2012, "energy": 249.25, "source": "fossil fuels"}, {"Entity": "United Kingdom", 
"Year": 2013, "energy": 231.56, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2014, "energy": 206.94, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2015, "energy": 182.43, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2016, "energy": 181.56, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2017, "energy": 165.91, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2018, "energy": 155.41, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2019, "energy": 144.99, "source": "fossil fuels"}, {"Entity": "United Kingdom", "Year": 2020, "energy": 124.78, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2000, "energy": 2697.28, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2001, "energy": 2678.68, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2002, "energy": 2727.83, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2003, "energy": 2756.03, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2004, "energy": 2818.28, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2005, "energy": 2899.96, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2006, "energy": 2878.56, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2007, "energy": 2988.24, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2008, "energy": 2924.21, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2009, "energy": 2725.41, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2010, "energy": 2882.49, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2011, "energy": 2788.93, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2012, "energy": 2779.02, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2013, "energy": 2746.21, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2014, "energy": 2752.01, "source": "fossil fuels"}, 
{"Entity": "United States", "Year": 2015, "energy": 2730.32, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2016, "energy": 2656.96, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2017, "energy": 2540.17, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2018, "energy": 2661.3, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2019, "energy": 2588.21, "source": "fossil fuels"}, {"Entity": "United States", "Year": 2020, "energy": 2431.9, "source": "fossil fuels"}, {"Entity": "Australia", "Year": 2000, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2001, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2002, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2003, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2004, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2005, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2006, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2007, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2008, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2009, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2010, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2011, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2012, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2013, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2014, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2015, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2016, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2017, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2018, "energy": 0, "source": "nuclear"}, {"Entity": "Australia", "Year": 2019, "energy": 0, "source": "nuclear"}, {"Entity": 
"Australia", "Year": 2020, "energy": 0, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2000, "energy": 4.94, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2001, "energy": 14.27, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2002, "energy": 13.84, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2003, "energy": 13.4, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2004, "energy": 11.6, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2005, "energy": 9.2, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2006, "energy": 12.98, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2007, "energy": 11.65, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2008, "energy": 13.21, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2009, "energy": 12.22, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2010, "energy": 13.77, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2011, "energy": 14.8, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2012, "energy": 15.17, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2013, "energy": 14.65, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2014, "energy": 14.46, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2015, "energy": 13.91, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2016, "energy": 14.97, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2017, "energy": 14.86, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2018, "energy": 14.79, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2019, "energy": 15.16, "source": "nuclear"}, {"Entity": "Brazil", "Year": 2020, "energy": 13.21, "source": "nuclear"}, {"Entity": "Canada", "Year": 2000, "energy": 69.16, "source": "nuclear"}, {"Entity": "Canada", "Year": 2001, "energy": 72.86, "source": "nuclear"}, {"Entity": "Canada", "Year": 2002, "energy": 71.75, "source": "nuclear"}, {"Entity": "Canada", "Year": 2003, "energy": 71.15, "source": "nuclear"}, {"Entity": "Canada", "Year": 2004, "energy": 85.87, "source": "nuclear"}, {"Entity": 
"Canada", "Year": 2005, "energy": 86.83, "source": "nuclear"}, {"Entity": "Canada", "Year": 2006, "energy": 92.44, "source": "nuclear"}, {"Entity": "Canada", "Year": 2007, "energy": 88.19, "source": "nuclear"}, {"Entity": "Canada", "Year": 2008, "energy": 88.3, "source": "nuclear"}, {"Entity": "Canada", "Year": 2009, "energy": 85.13, "source": "nuclear"}, {"Entity": "Canada", "Year": 2010, "energy": 85.53, "source": "nuclear"}, {"Entity": "Canada", "Year": 2011, "energy": 88.29, "source": "nuclear"}, {"Entity": "Canada", "Year": 2012, "energy": 89.49, "source": "nuclear"}, {"Entity": "Canada", "Year": 2013, "energy": 97.58, "source": "nuclear"}, {"Entity": "Canada", "Year": 2014, "energy": 101.21, "source": "nuclear"}, {"Entity": "Canada", "Year": 2015, "energy": 96.05, "source": "nuclear"}, {"Entity": "Canada", "Year": 2016, "energy": 95.69, "source": "nuclear"}, {"Entity": "Canada", "Year": 2017, "energy": 95.57, "source": "nuclear"}, {"Entity": "Canada", "Year": 2018, "energy": 95.03, "source": "nuclear"}, {"Entity": "Canada", "Year": 2019, "energy": 95.47, "source": "nuclear"}, {"Entity": "Canada", "Year": 2020, "energy": 92.65, "source": "nuclear"}, {"Entity": "China", "Year": 2000, "energy": 16.74, "source": "nuclear"}, {"Entity": "China", "Year": 2001, "energy": 17.47, "source": "nuclear"}, {"Entity": "China", "Year": 2002, "energy": 25.13, "source": "nuclear"}, {"Entity": "China", "Year": 2003, "energy": 43.34, "source": "nuclear"}, {"Entity": "China", "Year": 2004, "energy": 50.47, "source": "nuclear"}, {"Entity": "China", "Year": 2005, "energy": 53.09, "source": "nuclear"}, {"Entity": "China", "Year": 2006, "energy": 54.84, "source": "nuclear"}, {"Entity": "China", "Year": 2007, "energy": 62.13, "source": "nuclear"}, {"Entity": "China", "Year": 2008, "energy": 68.39, "source": "nuclear"}, {"Entity": "China", "Year": 2009, "energy": 70.05, "source": "nuclear"}, {"Entity": "China", "Year": 2010, "energy": 74.74, "source": "nuclear"}, {"Entity": "China", 
"Year": 2011, "energy": 87.2, "source": "nuclear"}, {"Entity": "China", "Year": 2012, "energy": 98.32, "source": "nuclear"}, {"Entity": "China", "Year": 2013, "energy": 111.5, "source": "nuclear"}, {"Entity": "China", "Year": 2014, "energy": 133.22, "source": "nuclear"}, {"Entity": "China", "Year": 2015, "energy": 171.38, "source": "nuclear"}, {"Entity": "China", "Year": 2016, "energy": 213.18, "source": "nuclear"}, {"Entity": "China", "Year": 2017, "energy": 248.1, "source": "nuclear"}, {"Entity": "China", "Year": 2018, "energy": 295, "source": "nuclear"}, {"Entity": "China", "Year": 2019, "energy": 348.7, "source": "nuclear"}, {"Entity": "China", "Year": 2020, "energy": 366.2, "source": "nuclear"}, {"Entity": "France", "Year": 2000, "energy": 415.16, "source": "nuclear"}, {"Entity": "France", "Year": 2001, "energy": 421.08, "source": "nuclear"}, {"Entity": "France", "Year": 2002, "energy": 436.76, "source": "nuclear"}, {"Entity": "France", "Year": 2003, "energy": 441.07, "source": "nuclear"}, {"Entity": "France", "Year": 2004, "energy": 448.24, "source": "nuclear"}, {"Entity": "France", "Year": 2005, "energy": 451.53, "source": "nuclear"}, {"Entity": "France", "Year": 2006, "energy": 450.19, "source": "nuclear"}, {"Entity": "France", "Year": 2007, "energy": 439.73, "source": "nuclear"}, {"Entity": "France", "Year": 2008, "energy": 439.45, "source": "nuclear"}, {"Entity": "France", "Year": 2009, "energy": 409.74, "source": "nuclear"}, {"Entity": "France", "Year": 2010, "energy": 428.52, "source": "nuclear"}, {"Entity": "France", "Year": 2011, "energy": 442.39, "source": "nuclear"}, {"Entity": "France", "Year": 2012, "energy": 425.41, "source": "nuclear"}, {"Entity": "France", "Year": 2013, "energy": 423.68, "source": "nuclear"}, {"Entity": "France", "Year": 2014, "energy": 436.48, "source": "nuclear"}, {"Entity": "France", "Year": 2015, "energy": 437.43, "source": "nuclear"}, {"Entity": "France", "Year": 2016, "energy": 403.2, "source": "nuclear"}, {"Entity": 
"France", "Year": 2017, "energy": 398.36, "source": "nuclear"}, {"Entity": "France", "Year": 2018, "energy": 412.94, "source": "nuclear"}, {"Entity": "France", "Year": 2019, "energy": 399.01, "source": "nuclear"}, {"Entity": "France", "Year": 2020, "energy": 353.83, "source": "nuclear"}, {"Entity": "Germany", "Year": 2000, "energy": 169.61, "source": "nuclear"}, {"Entity": "Germany", "Year": 2001, "energy": 171.3, "source": "nuclear"}, {"Entity": "Germany", "Year": 2002, "energy": 164.84, "source": "nuclear"}, {"Entity": "Germany", "Year": 2003, "energy": 165.06, "source": "nuclear"}, {"Entity": "Germany", "Year": 2004, "energy": 167.07, "source": "nuclear"}, {"Entity": "Germany", "Year": 2005, "energy": 163.05, "source": "nuclear"}, {"Entity": "Germany", "Year": 2006, "energy": 167.27, "source": "nuclear"}, {"Entity": "Germany", "Year": 2007, "energy": 140.53, "source": "nuclear"}, {"Entity": "Germany", "Year": 2008, "energy": 148.49, "source": "nuclear"}, {"Entity": "Germany", "Year": 2009, "energy": 134.93, "source": "nuclear"}, {"Entity": "Germany", "Year": 2010, "energy": 140.56, "source": "nuclear"}, {"Entity": "Germany", "Year": 2011, "energy": 107.97, "source": "nuclear"}, {"Entity": "Germany", "Year": 2012, "energy": 99.46, "source": "nuclear"}, {"Entity": "Germany", "Year": 2013, "energy": 97.29, "source": "nuclear"}, {"Entity": "Germany", "Year": 2014, "energy": 97.13, "source": "nuclear"}, {"Entity": "Germany", "Year": 2015, "energy": 91.79, "source": "nuclear"}, {"Entity": "Germany", "Year": 2016, "energy": 84.63, "source": "nuclear"}, {"Entity": "Germany", "Year": 2017, "energy": 76.32, "source": "nuclear"}, {"Entity": "Germany", "Year": 2018, "energy": 76, "source": "nuclear"}, {"Entity": "Germany", "Year": 2019, "energy": 75.07, "source": "nuclear"}, {"Entity": "Germany", "Year": 2020, "energy": 64.38, "source": "nuclear"}, {"Entity": "India", "Year": 2000, "energy": 15.77, "source": "nuclear"}, {"Entity": "India", "Year": 2001, "energy": 18.89, 
"source": "nuclear"}, {"Entity": "India", "Year": 2002, "energy": 19.35, "source": "nuclear"}, {"Entity": "India", "Year": 2003, "energy": 18.14, "source": "nuclear"}, {"Entity": "India", "Year": 2004, "energy": 21.26, "source": "nuclear"}, {"Entity": "India", "Year": 2005, "energy": 17.73, "source": "nuclear"}, {"Entity": "India", "Year": 2006, "energy": 17.63, "source": "nuclear"}, {"Entity": "India", "Year": 2007, "energy": 17.83, "source": "nuclear"}, {"Entity": "India", "Year": 2008, "energy": 15.23, "source": "nuclear"}, {"Entity": "India", "Year": 2009, "energy": 16.82, "source": "nuclear"}, {"Entity": "India", "Year": 2010, "energy": 23.08, "source": "nuclear"}, {"Entity": "India", "Year": 2011, "energy": 32.22, "source": "nuclear"}, {"Entity": "India", "Year": 2012, "energy": 33.14, "source": "nuclear"}, {"Entity": "India", "Year": 2013, "energy": 33.31, "source": "nuclear"}, {"Entity": "India", "Year": 2014, "energy": 34.69, "source": "nuclear"}, {"Entity": "India", "Year": 2015, "energy": 38.31, "source": "nuclear"}, {"Entity": "India", "Year": 2016, "energy": 37.9, "source": "nuclear"}, {"Entity": "India", "Year": 2017, "energy": 37.41, "source": "nuclear"}, {"Entity": "India", "Year": 2018, "energy": 39.05, "source": "nuclear"}, {"Entity": "India", "Year": 2019, "energy": 45.16, "source": "nuclear"}, {"Entity": "India", "Year": 2020, "energy": 44.61, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2000, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2001, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2002, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2003, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2004, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2005, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2006, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2007, "energy": null, 
"source": "nuclear"}, {"Entity": "Indonesia", "Year": 2008, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2009, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2010, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2011, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2012, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2013, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2014, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2015, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2016, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2017, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2018, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2019, "energy": null, "source": "nuclear"}, {"Entity": "Indonesia", "Year": 2020, "energy": null, "source": "nuclear"}, {"Entity": "Italy", "Year": 2000, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2001, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2002, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2003, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2004, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2005, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2006, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2007, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2008, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2009, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2010, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2011, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2012, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2013, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", 
"Year": 2014, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2015, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2016, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2017, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2018, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2019, "energy": 0, "source": "nuclear"}, {"Entity": "Italy", "Year": 2020, "energy": 0, "source": "nuclear"}, {"Entity": "Japan", "Year": 2000, "energy": 305.95, "source": "nuclear"}, {"Entity": "Japan", "Year": 2001, "energy": 303.86, "source": "nuclear"}, {"Entity": "Japan", "Year": 2002, "energy": 280.34, "source": "nuclear"}, {"Entity": "Japan", "Year": 2003, "energy": 228.01, "source": "nuclear"}, {"Entity": "Japan", "Year": 2004, "energy": 268.32, "source": "nuclear"}, {"Entity": "Japan", "Year": 2005, "energy": 280.5, "source": "nuclear"}, {"Entity": "Japan", "Year": 2006, "energy": 291.54, "source": "nuclear"}, {"Entity": "Japan", "Year": 2007, "energy": 267.34, "source": "nuclear"}, {"Entity": "Japan", "Year": 2008, "energy": 241.25, "source": "nuclear"}, {"Entity": "Japan", "Year": 2009, "energy": 263.05, "source": "nuclear"}, {"Entity": "Japan", "Year": 2010, "energy": 278.36, "source": "nuclear"}, {"Entity": "Japan", "Year": 2011, "energy": 153.38, "source": "nuclear"}, {"Entity": "Japan", "Year": 2012, "energy": 15.12, "source": "nuclear"}, {"Entity": "Japan", "Year": 2013, "energy": 10.43, "source": "nuclear"}, {"Entity": "Japan", "Year": 2014, "energy": 0, "source": "nuclear"}, {"Entity": "Japan", "Year": 2015, "energy": 3.24, "source": "nuclear"}, {"Entity": "Japan", "Year": 2016, "energy": 14.87, "source": "nuclear"}, {"Entity": "Japan", "Year": 2017, "energy": 27.75, "source": "nuclear"}, {"Entity": "Japan", "Year": 2018, "energy": 47.82, "source": "nuclear"}, {"Entity": "Japan", "Year": 2019, "energy": 63.88, "source": "nuclear"}, {"Entity": "Japan", "Year": 2020, "energy": 41.86, "source": 
"nuclear"}, {"Entity": "Kazakhstan", "Year": 2000, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2001, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2002, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2003, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2004, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2005, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2006, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2007, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2008, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2009, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2010, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2011, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2012, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2013, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2014, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2015, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2016, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2017, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2018, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2019, "energy": null, "source": "nuclear"}, {"Entity": "Kazakhstan", "Year": 2020, "energy": null, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2000, "energy": 7.81, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2001, "energy": 8.29, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2002, "energy": 9.26, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2003, "energy": 9.98, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2004, "energy": 8.73, "source": "nuclear"}, 
{"Entity": "Mexico", "Year": 2005, "energy": 10.32, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2006, "energy": 10.4, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2007, "energy": 9.95, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2008, "energy": 9.36, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2009, "energy": 10.11, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2010, "energy": 5.66, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2011, "energy": 9.66, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2012, "energy": 8.41, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2013, "energy": 11.38, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2014, "energy": 9.3, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2015, "energy": 11.18, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2016, "energy": 10.27, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2017, "energy": 10.57, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2018, "energy": 13.32, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2019, "energy": 10.88, "source": "nuclear"}, {"Entity": "Mexico", "Year": 2020, "energy": 10.87, "source": "nuclear"}, {"Entity": "Poland", "Year": 2000, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2001, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2002, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2003, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2004, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2005, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2006, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2007, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2008, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2009, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2010, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2011, "energy": 0, 
"source": "nuclear"}, {"Entity": "Poland", "Year": 2012, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2013, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2014, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2015, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2016, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2017, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2018, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2019, "energy": 0, "source": "nuclear"}, {"Entity": "Poland", "Year": 2020, "energy": 0, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2000, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2001, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2002, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2003, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2004, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2005, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2006, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2007, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2008, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2009, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2010, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2011, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2012, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2013, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2014, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2015, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2016, "energy": null, "source": "nuclear"}, 
{"Entity": "Saudi Arabia", "Year": 2017, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2018, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2019, "energy": null, "source": "nuclear"}, {"Entity": "Saudi Arabia", "Year": 2020, "energy": null, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2000, "energy": 13.01, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2001, "energy": 10.72, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2002, "energy": 11.99, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2003, "energy": 12.66, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2004, "energy": 14.28, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2005, "energy": 12.24, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2006, "energy": 10.07, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2007, "energy": 12.6, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2008, "energy": 12.75, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2009, "energy": 11.57, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2010, "energy": 12.9, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2011, "energy": 12.94, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2012, "energy": 12.4, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2013, "energy": 13.61, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2014, "energy": 14.76, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2015, "energy": 10.97, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2016, "energy": 15.21, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2017, "energy": 15.09, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2018, "energy": 10.56, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2019, "energy": 13.6, "source": "nuclear"}, {"Entity": "South Africa", "Year": 2020, "energy": 11.62, "source": "nuclear"}, 
{"Entity": "Spain", "Year": 2000, "energy": 62.21, "source": "nuclear"}, {"Entity": "Spain", "Year": 2001, "energy": 63.71, "source": "nuclear"}, {"Entity": "Spain", "Year": 2002, "energy": 63.02, "source": "nuclear"}, {"Entity": "Spain", "Year": 2003, "energy": 61.88, "source": "nuclear"}, {"Entity": "Spain", "Year": 2004, "energy": 63.61, "source": "nuclear"}, {"Entity": "Spain", "Year": 2005, "energy": 57.54, "source": "nuclear"}, {"Entity": "Spain", "Year": 2006, "energy": 60.13, "source": "nuclear"}, {"Entity": "Spain", "Year": 2007, "energy": 55.1, "source": "nuclear"}, {"Entity": "Spain", "Year": 2008, "energy": 58.97, "source": "nuclear"}, {"Entity": "Spain", "Year": 2009, "energy": 52.76, "source": "nuclear"}, {"Entity": "Spain", "Year": 2010, "energy": 61.99, "source": "nuclear"}, {"Entity": "Spain", "Year": 2011, "energy": 57.72, "source": "nuclear"}, {"Entity": "Spain", "Year": 2012, "energy": 61.47, "source": "nuclear"}, {"Entity": "Spain", "Year": 2013, "energy": 56.73, "source": "nuclear"}, {"Entity": "Spain", "Year": 2014, "energy": 57.31, "source": "nuclear"}, {"Entity": "Spain", "Year": 2015, "energy": 57.2, "source": "nuclear"}, {"Entity": "Spain", "Year": 2016, "energy": 58.63, "source": "nuclear"}, {"Entity": "Spain", "Year": 2017, "energy": 58.04, "source": "nuclear"}, {"Entity": "Spain", "Year": 2018, "energy": 55.77, "source": "nuclear"}, {"Entity": "Spain", "Year": 2019, "energy": 58.35, "source": "nuclear"}, {"Entity": "Spain", "Year": 2020, "energy": 58.3, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2000, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2001, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2002, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2003, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2004, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2005, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2006, 
"energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2007, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2008, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2009, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2010, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2011, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2012, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2013, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2014, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2015, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2016, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2017, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2018, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2019, "energy": 0, "source": "nuclear"}, {"Entity": "Thailand", "Year": 2020, "energy": 0, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2000, "energy": 77.34, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2001, "energy": 76.17, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2002, "energy": 77.99, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2003, "energy": 81.41, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2004, "energy": 87.02, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2005, "energy": 88.76, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2006, "energy": 90.22, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2007, "energy": 92.54, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2008, "energy": 89.84, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2009, "energy": 82.92, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2010, "energy": 89.15, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2011, "energy": 90.25, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2012, "energy": 90.14, 
"source": "nuclear"}, {"Entity": "Ukraine", "Year": 2013, "energy": 83.21, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2014, "energy": 88.39, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2015, "energy": 87.63, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2016, "energy": 80.95, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2017, "energy": 85.58, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2018, "energy": 84.4, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2019, "energy": 83, "source": "nuclear"}, {"Entity": "Ukraine", "Year": 2020, "energy": 76.2, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2000, "energy": 85.06, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2001, "energy": 90.09, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2002, "energy": 87.85, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2003, "energy": 88.69, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2004, "energy": 80, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2005, "energy": 81.62, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2006, "energy": 75.45, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2007, "energy": 63.03, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2008, "energy": 52.49, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2009, "energy": 69.1, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2010, "energy": 62.14, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2011, "energy": 68.98, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2012, "energy": 70.4, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2013, "energy": 70.61, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2014, "energy": 63.75, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2015, "energy": 70.34, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2016, "energy": 71.73, "source": 
"nuclear"}, {"Entity": "United Kingdom", "Year": 2017, "energy": 70.34, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2018, "energy": 65.06, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2019, "energy": 56.18, "source": "nuclear"}, {"Entity": "United Kingdom", "Year": 2020, "energy": 50.85, "source": "nuclear"}, {"Entity": "United States", "Year": 2000, "energy": 753.89, "source": "nuclear"}, {"Entity": "United States", "Year": 2001, "energy": 768.83, "source": "nuclear"}, {"Entity": "United States", "Year": 2002, "energy": 780.06, "source": "nuclear"}, {"Entity": "United States", "Year": 2003, "energy": 763.73, "source": "nuclear"}, {"Entity": "United States", "Year": 2004, "energy": 788.53, "source": "nuclear"}, {"Entity": "United States", "Year": 2005, "energy": 781.99, "source": "nuclear"}, {"Entity": "United States", "Year": 2006, "energy": 787.22, "source": "nuclear"}, {"Entity": "United States", "Year": 2007, "energy": 806.42, "source": "nuclear"}, {"Entity": "United States", "Year": 2008, "energy": 806.21, "source": "nuclear"}, {"Entity": "United States", "Year": 2009, "energy": 798.85, "source": "nuclear"}, {"Entity": "United States", "Year": 2010, "energy": 806.97, "source": "nuclear"}, {"Entity": "United States", "Year": 2011, "energy": 790.2, "source": "nuclear"}, {"Entity": "United States", "Year": 2012, "energy": 769.33, "source": "nuclear"}, {"Entity": "United States", "Year": 2013, "energy": 789.02, "source": "nuclear"}, {"Entity": "United States", "Year": 2014, "energy": 797.17, "source": "nuclear"}, {"Entity": "United States", "Year": 2015, "energy": 797.18, "source": "nuclear"}, {"Entity": "United States", "Year": 2016, "energy": 805.69, "source": "nuclear"}, {"Entity": "United States", "Year": 2017, "energy": 804.95, "source": "nuclear"}, {"Entity": "United States", "Year": 2018, "energy": 807.08, "source": "nuclear"}, {"Entity": "United States", "Year": 2019, "energy": 809.41, "source": "nuclear"}, {"Entity": "United 
States", "Year": 2020, "energy": 789.88, "source": "nuclear"}, {"Entity": "Australia", "Year": 2000, "energy": 17.11, "source": "renewables"}, {"Entity": "Australia", "Year": 2001, "energy": 17.4, "source": "renewables"}, {"Entity": "Australia", "Year": 2002, "energy": 17.35, "source": "renewables"}, {"Entity": "Australia", "Year": 2003, "energy": 18.5, "source": "renewables"}, {"Entity": "Australia", "Year": 2004, "energy": 19.41, "source": "renewables"}, {"Entity": "Australia", "Year": 2005, "energy": 19.75, "source": "renewables"}, {"Entity": "Australia", "Year": 2006, "energy": 21.19, "source": "renewables"}, {"Entity": "Australia", "Year": 2007, "energy": 20.93, "source": "renewables"}, {"Entity": "Australia", "Year": 2008, "energy": 18.49, "source": "renewables"}, {"Entity": "Australia", "Year": 2009, "energy": 18.32, "source": "renewables"}, {"Entity": "Australia", "Year": 2010, "energy": 21.13, "source": "renewables"}, {"Entity": "Australia", "Year": 2011, "energy": 27.33, "source": "renewables"}, {"Entity": "Australia", "Year": 2012, "energy": 26.63, "source": "renewables"}, {"Entity": "Australia", "Year": 2013, "energy": 34.2, "source": "renewables"}, {"Entity": "Australia", "Year": 2014, "energy": 36.15, "source": "renewables"}, {"Entity": "Australia", "Year": 2015, "energy": 33.12, "source": "renewables"}, {"Entity": "Australia", "Year": 2016, "energy": 38.41, "source": "renewables"}, {"Entity": "Australia", "Year": 2017, "energy": 40.77, "source": "renewables"}, {"Entity": "Australia", "Year": 2018, "energy": 42.93, "source": "renewables"}, {"Entity": "Australia", "Year": 2019, "energy": 53.41, "source": "renewables"}, {"Entity": "Australia", "Year": 2020, "energy": 63.99, "source": "renewables"}, {"Entity": "Brazil", "Year": 2000, "energy": 308.77, "source": "renewables"}, {"Entity": "Brazil", "Year": 2001, "energy": 273.71, "source": "renewables"}, {"Entity": "Brazil", "Year": 2002, "energy": 292.95, "source": "renewables"}, {"Entity": "Brazil", 
"Year": 2003, "energy": 313.88, "source": "renewables"}, {"Entity": "Brazil", "Year": 2004, "energy": 329.43, "source": "renewables"}, {"Entity": "Brazil", "Year": 2005, "energy": 346.96, "source": "renewables"}, {"Entity": "Brazil", "Year": 2006, "energy": 359.55, "source": "renewables"}, {"Entity": "Brazil", "Year": 2007, "energy": 387.88, "source": "renewables"}, {"Entity": "Brazil", "Year": 2008, "energy": 385.61, "source": "renewables"}, {"Entity": "Brazil", "Year": 2009, "energy": 410.13, "source": "renewables"}, {"Entity": "Brazil", "Year": 2010, "energy": 435.99, "source": "renewables"}, {"Entity": "Brazil", "Year": 2011, "energy": 462.32, "source": "renewables"}, {"Entity": "Brazil", "Year": 2012, "energy": 454.78, "source": "renewables"}, {"Entity": "Brazil", "Year": 2013, "energy": 436.84, "source": "renewables"}, {"Entity": "Brazil", "Year": 2014, "energy": 430.82, "source": "renewables"}, {"Entity": "Brazil", "Year": 2015, "energy": 428.81, "source": "renewables"}, {"Entity": "Brazil", "Year": 2016, "energy": 463.37, "source": "renewables"}, {"Entity": "Brazil", "Year": 2017, "energy": 464.4, "source": "renewables"}, {"Entity": "Brazil", "Year": 2018, "energy": 492.66, "source": "renewables"}, {"Entity": "Brazil", "Year": 2019, "energy": 512.59, "source": "renewables"}, {"Entity": "Brazil", "Year": 2020, "energy": 520.01, "source": "renewables"}, {"Entity": "Canada", "Year": 2000, "energy": 363.7, "source": "renewables"}, {"Entity": "Canada", "Year": 2001, "energy": 339.58, "source": "renewables"}, {"Entity": "Canada", "Year": 2002, "energy": 357.06, "source": "renewables"}, {"Entity": "Canada", "Year": 2003, "energy": 343.88, "source": "renewables"}, {"Entity": "Canada", "Year": 2004, "energy": 347.68, "source": "renewables"}, {"Entity": "Canada", "Year": 2005, "energy": 368.86, "source": "renewables"}, {"Entity": "Canada", "Year": 2006, "energy": 360.48, "source": "renewables"}, {"Entity": "Canada", "Year": 2007, "energy": 375.42, "source": 
"renewables"}, {"Entity": "Canada", "Year": 2008, "energy": 385.21, "source": "renewables"}, {"Entity": "Canada", "Year": 2009, "energy": 380.24, "source": "renewables"}, {"Entity": "Canada", "Year": 2010, "energy": 366.21, "source": "renewables"}, {"Entity": "Canada", "Year": 2011, "energy": 391.95, "source": "renewables"}, {"Entity": "Canada", "Year": 2012, "energy": 398.58, "source": "renewables"}, {"Entity": "Canada", "Year": 2013, "energy": 417.28, "source": "renewables"}, {"Entity": "Canada", "Year": 2014, "energy": 412.13, "source": "renewables"}, {"Entity": "Canada", "Year": 2015, "energy": 417.2, "source": "renewables"}, {"Entity": "Canada", "Year": 2016, "energy": 426.84, "source": "renewables"}, {"Entity": "Canada", "Year": 2017, "energy": 435.43, "source": "renewables"}, {"Entity": "Canada", "Year": 2018, "energy": 428.39, "source": "renewables"}, {"Entity": "Canada", "Year": 2019, "energy": 421.8, "source": "renewables"}, {"Entity": "Canada", "Year": 2020, "energy": 429.24, "source": "renewables"}, {"Entity": "China", "Year": 2000, "energy": 225.56, "source": "renewables"}, {"Entity": "China", "Year": 2001, "energy": 280.73, "source": "renewables"}, {"Entity": "China", "Year": 2002, "energy": 291.41, "source": "renewables"}, {"Entity": "China", "Year": 2003, "energy": 287.28, "source": "renewables"}, {"Entity": "China", "Year": 2004, "energy": 357.43, "source": "renewables"}, {"Entity": "China", "Year": 2005, "energy": 404.37, "source": "renewables"}, {"Entity": "China", "Year": 2006, "energy": 446.72, "source": "renewables"}, {"Entity": "China", "Year": 2007, "energy": 500.71, "source": "renewables"}, {"Entity": "China", "Year": 2008, "energy": 665.08, "source": "renewables"}, {"Entity": "China", "Year": 2009, "energy": 664.39, "source": "renewables"}, {"Entity": "China", "Year": 2010, "energy": 786.38, "source": "renewables"}, {"Entity": "China", "Year": 2011, "energy": 792.38, "source": "renewables"}, {"Entity": "China", "Year": 2012, "energy": 
999.56, "source": "renewables"}, {"Entity": "China", "Year": 2013, "energy": 1093.37, "source": "renewables"}, {"Entity": "China", "Year": 2014, "energy": 1289.23, "source": "renewables"}, {"Entity": "China", "Year": 2015, "energy": 1393.66, "source": "renewables"}, {"Entity": "China", "Year": 2016, "energy": 1522.79, "source": "renewables"}, {"Entity": "China", "Year": 2017, "energy": 1667.06, "source": "renewables"}, {"Entity": "China", "Year": 2018, "energy": 1835.32, "source": "renewables"}, {"Entity": "China", "Year": 2019, "energy": 2014.57, "source": "renewables"}, {"Entity": "China", "Year": 2020, "energy": 2184.94, "source": "renewables"}, {"Entity": "France", "Year": 2000, "energy": 67.83, "source": "renewables"}, {"Entity": "France", "Year": 2001, "energy": 76.09, "source": "renewables"}, {"Entity": "France", "Year": 2002, "energy": 62.69, "source": "renewables"}, {"Entity": "France", "Year": 2003, "energy": 61.47, "source": "renewables"}, {"Entity": "France", "Year": 2004, "energy": 62.42, "source": "renewables"}, {"Entity": "France", "Year": 2005, "energy": 54.98, "source": "renewables"}, {"Entity": "France", "Year": 2006, "energy": 60.91, "source": "renewables"}, {"Entity": "France", "Year": 2007, "energy": 64.3, "source": "renewables"}, {"Entity": "France", "Year": 2008, "energy": 72.33, "source": "renewables"}, {"Entity": "France", "Year": 2009, "energy": 68.15, "source": "renewables"}, {"Entity": "France", "Year": 2010, "energy": 76.68, "source": "renewables"}, {"Entity": "France", "Year": 2011, "energy": 66.02, "source": "renewables"}, {"Entity": "France", "Year": 2012, "energy": 85.25, "source": "renewables"}, {"Entity": "France", "Year": 2013, "energy": 99.42, "source": "renewables"}, {"Entity": "France", "Year": 2014, "energy": 94.03, "source": "renewables"}, {"Entity": "France", "Year": 2015, "energy": 91.84, "source": "renewables"}, {"Entity": "France", "Year": 2016, "energy": 99, "source": "renewables"}, {"Entity": "France", "Year": 2017, 
"energy": 92.63, "source": "renewables"}, {"Entity": "France", "Year": 2018, "energy": 113.62, "source": "renewables"}, {"Entity": "France", "Year": 2019, "energy": 113.21, "source": "renewables"}, {"Entity": "France", "Year": 2020, "energy": 125.28, "source": "renewables"}, {"Entity": "Germany", "Year": 2000, "energy": 35.47, "source": "renewables"}, {"Entity": "Germany", "Year": 2001, "energy": 37.9, "source": "renewables"}, {"Entity": "Germany", "Year": 2002, "energy": 44.48, "source": "renewables"}, {"Entity": "Germany", "Year": 2003, "energy": 46.67, "source": "renewables"}, {"Entity": "Germany", "Year": 2004, "energy": 57.97, "source": "renewables"}, {"Entity": "Germany", "Year": 2005, "energy": 63.4, "source": "renewables"}, {"Entity": "Germany", "Year": 2006, "energy": 72.51, "source": "renewables"}, {"Entity": "Germany", "Year": 2007, "energy": 89.38, "source": "renewables"}, {"Entity": "Germany", "Year": 2008, "energy": 94.28, "source": "renewables"}, {"Entity": "Germany", "Year": 2009, "energy": 95.94, "source": "renewables"}, {"Entity": "Germany", "Year": 2010, "energy": 105.18, "source": "renewables"}, {"Entity": "Germany", "Year": 2011, "energy": 124.04, "source": "renewables"}, {"Entity": "Germany", "Year": 2012, "energy": 143.04, "source": "renewables"}, {"Entity": "Germany", "Year": 2013, "energy": 152.34, "source": "renewables"}, {"Entity": "Germany", "Year": 2014, "energy": 162.54, "source": "renewables"}, {"Entity": "Germany", "Year": 2015, "energy": 188.79, "source": "renewables"}, {"Entity": "Germany", "Year": 2016, "energy": 189.67, "source": "renewables"}, {"Entity": "Germany", "Year": 2017, "energy": 216.32, "source": "renewables"}, {"Entity": "Germany", "Year": 2018, "energy": 222.07, "source": "renewables"}, {"Entity": "Germany", "Year": 2019, "energy": 240.33, "source": "renewables"}, {"Entity": "Germany", "Year": 2020, "energy": 251.48, "source": "renewables"}, {"Entity": "India", "Year": 2000, "energy": 80.27, "source": "renewables"}, 
{"Entity": "India", "Year": 2001, "energy": 76.19, "source": "renewables"}, {"Entity": "India", "Year": 2002, "energy": 72.78, "source": "renewables"}, {"Entity": "India", "Year": 2003, "energy": 74.63, "source": "renewables"}, {"Entity": "India", "Year": 2004, "energy": 109.2, "source": "renewables"}, {"Entity": "India", "Year": 2005, "energy": 107.47, "source": "renewables"}, {"Entity": "India", "Year": 2006, "energy": 127.56, "source": "renewables"}, {"Entity": "India", "Year": 2007, "energy": 141.75, "source": "renewables"}, {"Entity": "India", "Year": 2008, "energy": 138.91, "source": "renewables"}, {"Entity": "India", "Year": 2009, "energy": 134.33, "source": "renewables"}, {"Entity": "India", "Year": 2010, "energy": 142.61, "source": "renewables"}, {"Entity": "India", "Year": 2011, "energy": 173.62, "source": "renewables"}, {"Entity": "India", "Year": 2012, "energy": 165.25, "source": "renewables"}, {"Entity": "India", "Year": 2013, "energy": 187.9, "source": "renewables"}, {"Entity": "India", "Year": 2014, "energy": 202.04, "source": "renewables"}, {"Entity": "India", "Year": 2015, "energy": 203.21, "source": "renewables"}, {"Entity": "India", "Year": 2016, "energy": 208.21, "source": "renewables"}, {"Entity": "India", "Year": 2017, "energy": 234.9, "source": "renewables"}, {"Entity": "India", "Year": 2018, "energy": 263.61, "source": "renewables"}, {"Entity": "India", "Year": 2019, "energy": 303.16, "source": "renewables"}, {"Entity": "India", "Year": 2020, "energy": 315.76, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2000, "energy": 19.6, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2001, "energy": 22.19, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2002, "energy": 21, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2003, "energy": 19.82, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2004, "energy": 20.97, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2005, "energy": 22.66, "source": 
"renewables"}, {"Entity": "Indonesia", "Year": 2006, "energy": 21.18, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2007, "energy": 24.29, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2008, "energy": 26.34, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2009, "energy": 26.79, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2010, "energy": 34.63, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2011, "energy": 30.46, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2012, "energy": 31.11, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2013, "energy": 35.5, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2014, "energy": 34.41, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2015, "energy": 33.56, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2016, "energy": 39.58, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2017, "energy": 43.17, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2018, "energy": 48.38, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2019, "energy": 48.04, "source": "renewables"}, {"Entity": "Indonesia", "Year": 2020, "energy": 52.91, "source": "renewables"}, {"Entity": "Italy", "Year": 2000, "energy": 50.87, "source": "renewables"}, {"Entity": "Italy", "Year": 2001, "energy": 54.35, "source": "renewables"}, {"Entity": "Italy", "Year": 2002, "energy": 48.31, "source": "renewables"}, {"Entity": "Italy", "Year": 2003, "energy": 46.86, "source": "renewables"}, {"Entity": "Italy", "Year": 2004, "energy": 53.88, "source": "renewables"}, {"Entity": "Italy", "Year": 2005, "energy": 48.43, "source": "renewables"}, {"Entity": "Italy", "Year": 2006, "energy": 50.64, "source": "renewables"}, {"Entity": "Italy", "Year": 2007, "energy": 47.72, "source": "renewables"}, {"Entity": "Italy", "Year": 2008, "energy": 58.16, "source": "renewables"}, {"Entity": "Italy", "Year": 2009, "energy": 69.26, "source": "renewables"}, {"Entity": "Italy", 
"Year": 2010, "energy": 76.98, "source": "renewables"}, {"Entity": "Italy", "Year": 2011, "energy": 82.96, "source": "renewables"}, {"Entity": "Italy", "Year": 2012, "energy": 92.22, "source": "renewables"}, {"Entity": "Italy", "Year": 2013, "energy": 112, "source": "renewables"}, {"Entity": "Italy", "Year": 2014, "energy": 120.68, "source": "renewables"}, {"Entity": "Italy", "Year": 2015, "energy": 108.89, "source": "renewables"}, {"Entity": "Italy", "Year": 2016, "energy": 108.01, "source": "renewables"}, {"Entity": "Italy", "Year": 2017, "energy": 103.89, "source": "renewables"}, {"Entity": "Italy", "Year": 2018, "energy": 114.41, "source": "renewables"}, {"Entity": "Italy", "Year": 2019, "energy": 115.83, "source": "renewables"}, {"Entity": "Italy", "Year": 2020, "energy": 116.9, "source": "renewables"}, {"Entity": "Japan", "Year": 2000, "energy": 104.16, "source": "renewables"}, {"Entity": "Japan", "Year": 2001, "energy": 101.36, "source": "renewables"}, {"Entity": "Japan", "Year": 2002, "energy": 101.1, "source": "renewables"}, {"Entity": "Japan", "Year": 2003, "energy": 114.18, "source": "renewables"}, {"Entity": "Japan", "Year": 2004, "energy": 114.73, "source": "renewables"}, {"Entity": "Japan", "Year": 2005, "energy": 100.57, "source": "renewables"}, {"Entity": "Japan", "Year": 2006, "energy": 112.07, "source": "renewables"}, {"Entity": "Japan", "Year": 2007, "energy": 100.8, "source": "renewables"}, {"Entity": "Japan", "Year": 2008, "energy": 100.79, "source": "renewables"}, {"Entity": "Japan", "Year": 2009, "energy": 102.28, "source": "renewables"}, {"Entity": "Japan", "Year": 2010, "energy": 113.92, "source": "renewables"}, {"Entity": "Japan", "Year": 2011, "energy": 116.5, "source": "renewables"}, {"Entity": "Japan", "Year": 2012, "energy": 111.09, "source": "renewables"}, {"Entity": "Japan", "Year": 2013, "energy": 121.48, "source": "renewables"}, {"Entity": "Japan", "Year": 2014, "energy": 136.53, "source": "renewables"}, {"Entity": "Japan", "Year": 
2015, "energy": 157.34, "source": "renewables"}, {"Entity": "Japan", "Year": 2016, "energy": 157.7, "source": "renewables"}, {"Entity": "Japan", "Year": 2017, "energy": 175.12, "source": "renewables"}, {"Entity": "Japan", "Year": 2018, "energy": 183.63, "source": "renewables"}, {"Entity": "Japan", "Year": 2019, "energy": 192.72, "source": "renewables"}, {"Entity": "Japan", "Year": 2020, "energy": 205.6, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2000, "energy": 7.53, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2001, "energy": 8.08, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2002, "energy": 8.89, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2003, "energy": 8.62, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2004, "energy": 8.06, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2005, "energy": 7.86, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2006, "energy": 7.77, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2007, "energy": 8.17, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2008, "energy": 7.46, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2009, "energy": 6.88, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2010, "energy": 8.02, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2011, "energy": 7.88, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2012, "energy": 7.64, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2013, "energy": 7.73, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2014, "energy": 8.27, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2015, "energy": 9.45, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2016, "energy": 11.98, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2017, "energy": 11.64, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2018, "energy": 10.91, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2019, 
"energy": 11.09, "source": "renewables"}, {"Entity": "Kazakhstan", "Year": 2020, "energy": 11.94, "source": "renewables"}, {"Entity": "Mexico", "Year": 2000, "energy": 44.51, "source": "renewables"}, {"Entity": "Mexico", "Year": 2001, "energy": 39.56, "source": "renewables"}, {"Entity": "Mexico", "Year": 2002, "energy": 35.67, "source": "renewables"}, {"Entity": "Mexico", "Year": 2003, "energy": 32.11, "source": "renewables"}, {"Entity": "Mexico", "Year": 2004, "energy": 38.19, "source": "renewables"}, {"Entity": "Mexico", "Year": 2005, "energy": 42.29, "source": "renewables"}, {"Entity": "Mexico", "Year": 2006, "energy": 43.63, "source": "renewables"}, {"Entity": "Mexico", "Year": 2007, "energy": 42.14, "source": "renewables"}, {"Entity": "Mexico", "Year": 2008, "energy": 53.22, "source": "renewables"}, {"Entity": "Mexico", "Year": 2009, "energy": 40.59, "source": "renewables"}, {"Entity": "Mexico", "Year": 2010, "energy": 51.37, "source": "renewables"}, {"Entity": "Mexico", "Year": 2011, "energy": 50.7, "source": "renewables"}, {"Entity": "Mexico", "Year": 2012, "energy": 47.2, "source": "renewables"}, {"Entity": "Mexico", "Year": 2013, "energy": 44.67, "source": "renewables"}, {"Entity": "Mexico", "Year": 2014, "energy": 57.46, "source": "renewables"}, {"Entity": "Mexico", "Year": 2015, "energy": 52.42, "source": "renewables"}, {"Entity": "Mexico", "Year": 2016, "energy": 52.97, "source": "renewables"}, {"Entity": "Mexico", "Year": 2017, "energy": 55.88, "source": "renewables"}, {"Entity": "Mexico", "Year": 2018, "energy": 58.78, "source": "renewables"}, {"Entity": "Mexico", "Year": 2019, "energy": 59, "source": "renewables"}, {"Entity": "Mexico", "Year": 2020, "energy": 69.19, "source": "renewables"}, {"Entity": "Poland", "Year": 2000, "energy": 2.33, "source": "renewables"}, {"Entity": "Poland", "Year": 2001, "energy": 2.78, "source": "renewables"}, {"Entity": "Poland", "Year": 2002, "energy": 2.77, "source": "renewables"}, {"Entity": "Poland", "Year": 2003, 
"energy": 2.25, "source": "renewables"}, {"Entity": "Poland", "Year": 2004, "energy": 3.2, "source": "renewables"}, {"Entity": "Poland", "Year": 2005, "energy": 3.85, "source": "renewables"}, {"Entity": "Poland", "Year": 2006, "energy": 4.29, "source": "renewables"}, {"Entity": "Poland", "Year": 2007, "energy": 5.43, "source": "renewables"}, {"Entity": "Poland", "Year": 2008, "energy": 6.61, "source": "renewables"}, {"Entity": "Poland", "Year": 2009, "energy": 8.69, "source": "renewables"}, {"Entity": "Poland", "Year": 2010, "energy": 10.88, "source": "renewables"}, {"Entity": "Poland", "Year": 2011, "energy": 13.13, "source": "renewables"}, {"Entity": "Poland", "Year": 2012, "energy": 16.88, "source": "renewables"}, {"Entity": "Poland", "Year": 2013, "energy": 17.06, "source": "renewables"}, {"Entity": "Poland", "Year": 2014, "energy": 19.85, "source": "renewables"}, {"Entity": "Poland", "Year": 2015, "energy": 22.69, "source": "renewables"}, {"Entity": "Poland", "Year": 2016, "energy": 22.81, "source": "renewables"}, {"Entity": "Poland", "Year": 2017, "energy": 24.13, "source": "renewables"}, {"Entity": "Poland", "Year": 2018, "energy": 21.62, "source": "renewables"}, {"Entity": "Poland", "Year": 2019, "energy": 25.46, "source": "renewables"}, {"Entity": "Poland", "Year": 2020, "energy": 28.23, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2000, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2001, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2002, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2003, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2004, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2005, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2006, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2007, "energy": 0, "source": "renewables"}, {"Entity": "Saudi 
Arabia", "Year": 2008, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2009, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2010, "energy": 0, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2011, "energy": 0.01, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2012, "energy": 0.03, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2013, "energy": 0.04, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2014, "energy": 0.05, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2015, "energy": 0.05, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2016, "energy": 0.05, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2017, "energy": 0.07, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2018, "energy": 0.16, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2019, "energy": 0.21, "source": "renewables"}, {"Entity": "Saudi Arabia", "Year": 2020, "energy": 0.21, "source": "renewables"}, {"Entity": "South Africa", "Year": 2000, "energy": 1.79, "source": "renewables"}, {"Entity": "South Africa", "Year": 2001, "energy": 2.46, "source": "renewables"}, {"Entity": "South Africa", "Year": 2002, "energy": 2.81, "source": "renewables"}, {"Entity": "South Africa", "Year": 2003, "energy": 1.19, "source": "renewables"}, {"Entity": "South Africa", "Year": 2004, "energy": 1.33, "source": "renewables"}, {"Entity": "South Africa", "Year": 2005, "energy": 1.75, "source": "renewables"}, {"Entity": "South Africa", "Year": 2006, "energy": 3.28, "source": "renewables"}, {"Entity": "South Africa", "Year": 2007, "energy": 1.3, "source": "renewables"}, {"Entity": "South Africa", "Year": 2008, "energy": 1.66, "source": "renewables"}, {"Entity": "South Africa", "Year": 2009, "energy": 1.86, "source": "renewables"}, {"Entity": "South Africa", "Year": 2010, "energy": 2.51, "source": "renewables"}, {"Entity": "South Africa", "Year": 2011, "energy": 2.49, 
"source": "renewables"}, {"Entity": "South Africa", "Year": 2012, "energy": 1.66, "source": "renewables"}, {"Entity": "South Africa", "Year": 2013, "energy": 1.62, "source": "renewables"}, {"Entity": "South Africa", "Year": 2014, "energy": 3.38, "source": "renewables"}, {"Entity": "South Africa", "Year": 2015, "energy": 6.09, "source": "renewables"}, {"Entity": "South Africa", "Year": 2016, "energy": 7.69, "source": "renewables"}, {"Entity": "South Africa", "Year": 2017, "energy": 10.04, "source": "renewables"}, {"Entity": "South Africa", "Year": 2018, "energy": 12.22, "source": "renewables"}, {"Entity": "South Africa", "Year": 2019, "energy": 12.57, "source": "renewables"}, {"Entity": "South Africa", "Year": 2020, "energy": 12.83, "source": "renewables"}, {"Entity": "Spain", "Year": 2000, "energy": 34.49, "source": "renewables"}, {"Entity": "Spain", "Year": 2001, "energy": 49.3, "source": "renewables"}, {"Entity": "Spain", "Year": 2002, "energy": 33.17, "source": "renewables"}, {"Entity": "Spain", "Year": 2003, "energy": 55.75, "source": "renewables"}, {"Entity": "Spain", "Year": 2004, "energy": 50.13, "source": "renewables"}, {"Entity": "Spain", "Year": 2005, "energy": 42.27, "source": "renewables"}, {"Entity": "Spain", "Year": 2006, "energy": 52.15, "source": "renewables"}, {"Entity": "Spain", "Year": 2007, "energy": 58.3, "source": "renewables"}, {"Entity": "Spain", "Year": 2008, "energy": 62.15, "source": "renewables"}, {"Entity": "Spain", "Year": 2009, "energy": 74.08, "source": "renewables"}, {"Entity": "Spain", "Year": 2010, "energy": 97.77, "source": "renewables"}, {"Entity": "Spain", "Year": 2011, "energy": 87.53, "source": "renewables"}, {"Entity": "Spain", "Year": 2012, "energy": 86.97, "source": "renewables"}, {"Entity": "Spain", "Year": 2013, "energy": 111.42, "source": "renewables"}, {"Entity": "Spain", "Year": 2014, "energy": 110.26, "source": "renewables"}, {"Entity": "Spain", "Year": 2015, "energy": 97.09, "source": "renewables"}, {"Entity": 
"Spain", "Year": 2016, "energy": 104.63, "source": "renewables"}, {"Entity": "Spain", "Year": 2017, "energy": 87.93, "source": "renewables"}, {"Entity": "Spain", "Year": 2018, "energy": 103.88, "source": "renewables"}, {"Entity": "Spain", "Year": 2019, "energy": 100.99, "source": "renewables"}, {"Entity": "Spain", "Year": 2020, "energy": 113.79, "source": "renewables"}, {"Entity": "Thailand", "Year": 2000, "energy": 6.38, "source": "renewables"}, {"Entity": "Thailand", "Year": 2001, "energy": 6.76, "source": "renewables"}, {"Entity": "Thailand", "Year": 2002, "energy": 8.07, "source": "renewables"}, {"Entity": "Thailand", "Year": 2003, "energy": 8.36, "source": "renewables"}, {"Entity": "Thailand", "Year": 2004, "energy": 7.63, "source": "renewables"}, {"Entity": "Thailand", "Year": 2005, "energy": 7.42, "source": "renewables"}, {"Entity": "Thailand", "Year": 2006, "energy": 9.82, "source": "renewables"}, {"Entity": "Thailand", "Year": 2007, "energy": 10.2, "source": "renewables"}, {"Entity": "Thailand", "Year": 2008, "energy": 8.95, "source": "renewables"}, {"Entity": "Thailand", "Year": 2009, "energy": 9.09, "source": "renewables"}, {"Entity": "Thailand", "Year": 2010, "energy": 8.58, "source": "renewables"}, {"Entity": "Thailand", "Year": 2011, "energy": 11.83, "source": "renewables"}, {"Entity": "Thailand", "Year": 2012, "energy": 13.42, "source": "renewables"}, {"Entity": "Thailand", "Year": 2013, "energy": 12.33, "source": "renewables"}, {"Entity": "Thailand", "Year": 2014, "energy": 13.68, "source": "renewables"}, {"Entity": "Thailand", "Year": 2015, "energy": 13.33, "source": "renewables"}, {"Entity": "Thailand", "Year": 2016, "energy": 15.97, "source": "renewables"}, {"Entity": "Thailand", "Year": 2017, "energy": 19.92, "source": "renewables"}, {"Entity": "Thailand", "Year": 2018, "energy": 25.84, "source": "renewables"}, {"Entity": "Thailand", "Year": 2019, "energy": 28.02, "source": "renewables"}, {"Entity": "Thailand", "Year": 2020, "energy": 24.73, 
"source": "renewables"}, {"Entity": "Ukraine", "Year": 2000, "energy": 11.28, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2001, "energy": 12.05, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2002, "energy": 9.65, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2003, "energy": 9.27, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2004, "energy": 11.78, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2005, "energy": 12.4, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2006, "energy": 12.92, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2007, "energy": 10.47, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2008, "energy": 11.82, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2009, "energy": 12.12, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2010, "energy": 13.39, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2011, "energy": 11.2, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2012, "energy": 11.23, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2013, "energy": 15.11, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2014, "energy": 10.17, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2015, "energy": 7.1, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2016, "energy": 9.25, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2017, "energy": 10.88, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2018, "energy": 13.02, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2019, "energy": 11.87, "source": "renewables"}, {"Entity": "Ukraine", "Year": 2020, "energy": 17.56, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2000, "energy": 9.98, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2001, "energy": 9.56, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2002, "energy": 11.13, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2003, "energy": 10.62, "source": "renewables"}, 
{"Entity": "United Kingdom", "Year": 2004, "energy": 14.14, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2005, "energy": 16.93, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2006, "energy": 18.11, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2007, "energy": 19.69, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2008, "energy": 21.85, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2009, "energy": 25.25, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2010, "energy": 26.18, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2011, "energy": 35.2, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2012, "energy": 41.24, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2013, "energy": 53.21, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2014, "energy": 64.52, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2015, "energy": 82.57, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2016, "energy": 82.99, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2017, "energy": 98.85, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2018, "energy": 110.03, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2019, "energy": 120.48, "source": "renewables"}, {"Entity": "United Kingdom", "Year": 2020, "energy": 131.74, "source": "renewables"}, {"Entity": "United States", "Year": 2000, "energy": 350.93, "source": "renewables"}, {"Entity": "United States", "Year": 2001, "energy": 280.06, "source": "renewables"}, {"Entity": "United States", "Year": 2002, "energy": 336.34, "source": "renewables"}, {"Entity": "United States", "Year": 2003, "energy": 349.18, "source": "renewables"}, {"Entity": "United States", "Year": 2004, "energy": 345.14, "source": "renewables"}, {"Entity": "United States", "Year": 2005, "energy": 353.04, "source": "renewables"}, {"Entity": "United States", "Year": 2006, 
"energy": 381.16, "source": "renewables"}, {"Entity": "United States", "Year": 2007, "energy": 347.91, "source": "renewables"}, {"Entity": "United States", "Year": 2008, "energy": 377.11, "source": "renewables"}, {"Entity": "United States", "Year": 2009, "energy": 415.56, "source": "renewables"}, {"Entity": "United States", "Year": 2010, "energy": 424.48, "source": "renewables"}, {"Entity": "United States", "Year": 2011, "energy": 509.74, "source": "renewables"}, {"Entity": "United States", "Year": 2012, "energy": 492.32, "source": "renewables"}, {"Entity": "United States", "Year": 2013, "energy": 520.38, "source": "renewables"}, {"Entity": "United States", "Year": 2014, "energy": 546.83, "source": "renewables"}, {"Entity": "United States", "Year": 2015, "energy": 556.49, "source": "renewables"}, {"Entity": "United States", "Year": 2016, "energy": 624.91, "source": "renewables"}, {"Entity": "United States", "Year": 2017, "energy": 707.19, "source": "renewables"}, {"Entity": "United States", "Year": 2018, "energy": 733.17, "source": "renewables"}, {"Entity": "United States", "Year": 2019, "energy": 760.76, "source": "renewables"}, {"Entity": "United States", "Year": 2020, "energy": 821.4, "source": "renewables"}], "metadata": {"Entity": {"type": "string", "semanticType": "Location"}, "Year": {"type": "number", "semanticType": "Year"}, "energy": {"type": "number", "semanticType": "Number"}, "source": {"type": "string", "semanticType": "String", "levels": ["fossil fuels", "nuclear", "renewables"]}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Unpivot the energy source columns into long format\n transformed_df = pd.melt(\n df_energy,\n id_vars=['Year', 'Entity'],\n value_vars=[\n 'Electricity from fossil fuels (TWh)',\n 'Electricity from nuclear (TWh)',\n 'Electricity from renewables (TWh)'\n ],\n var_name='source',\n value_name='energy'\n )\n \n # Clean up the source names to be more 
readable\n transformed_df['source'] = transformed_df['source'].str.replace('Electricity from ', '').str.replace(' (TWh)', '')\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. 
\n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) 
to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 158029.9988, 190729.9957, 209229.9957, ..., 6993180.176, 9861099.609, 9874660.156, 10096009.77\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 90.14, 91.79, 92.54, 804.95\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Transform the data into a long format where different energy sources (fossil fuels, nuclear, renewables) are unpivoted into separate rows. Each row should contain the year, entity (country), energy source type, and the corresponding energy value in TWh. 
This will enable visualization of energy generation trends over time by source and country.\",\n \"display_instruction\": \"Show **energy generation** trends by **source** and **Entity** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"source\", \"energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n },\n \"reason\": \"The user wants to visualize energy data across different sources with a line chart faceted by source. The original data has energy sources in separate columns (fossil fuels, nuclear, renewables), so we need to unpivot these columns into a long format with 'source' and 'energy' fields. The chart_encodings provided by the user are appropriate for this visualization goal, using faceting (column) to separate different energy sources.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Unpivot the energy source columns into long format\n transformed_df = pd.melt(\n df_energy,\n id_vars=['Year', 'Entity'],\n value_vars=[\n 'Electricity from fossil fuels (TWh)',\n 'Electricity from nuclear (TWh)',\n 'Electricity from renewables (TWh)'\n ],\n var_name='source',\n value_name='energy'\n )\n \n # Clean up the source names to be more readable\n transformed_df['source'] = transformed_df['source'].str.replace('Electricity from ', '').str.replace(' (TWh)', '')\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "global-energy-20-small.csv", "resultTableId": "table-82", "chart": {"id": "chart-1760738394155", "chartType": "Line Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"channel": "y", "bin": false, "fieldID": "concept-1760738385163"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {"channel": "column", "bin": false, "fieldID": "concept-1760738389404"}, 
"row": {"channel": "row", "bin": false}}, "tableRef": "global-energy-20-small.csv", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "", "displayContent": "Show **energy generation** trends by **source** and **Entity** over **Year**"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "The code performs a data transformation to restructure the energy data from a wide format to a long format:\n\n1. **Unpivot energy columns**: Uses `pd.melt()` to transform the three energy source columns (**Electricity from fossil fuels**, **Electricity from nuclear**, and **Electricity from renewables**) from separate columns into rows, while keeping **Year** and **Entity** as identifier columns.\n\n2. **Create source classification**: Generates a new **source** column that captures which energy type each row represents, and stores the corresponding energy values in an **energy** column.\n\n3. **Clean source labels**: Removes the prefix \"Electricity from \" and suffix \" (TWh)\" from the source names, resulting in cleaner labels: **fossil fuels**, **nuclear**, and **renewables**.\n\n4. **Return restructured data**: The final dataset has each country-year-energy source combination as a separate row, making it easier to analyze and visualize energy mix across countries and time periods.", "concepts": [], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. 
They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. 
If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. 
\n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 190729.9957, 227580.0018, 233600.0061, ..., 5736319.824, 5738290.039, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 90.14, 91.79, 92.54, nan\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Unpivot the 
energy source columns into long format\n transformed_df = pd.melt(\n df_energy,\n id_vars=['Year', 'Entity'],\n value_vars=[\n 'Electricity from fossil fuels (TWh)',\n 'Electricity from nuclear (TWh)',\n 'Electricity from renewables (TWh)'\n ],\n var_name='source',\n value_name='energy'\n )\n \n # Clean up the source names to be more readable\n transformed_df['source'] = transformed_df['source'].str.replace('Electricity from ', '').str.replace(' (TWh)', '')\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\nThe code performs a data transformation to restructure the energy data from a wide format to a long format:\n\n1. **Unpivot energy columns**: Uses `pd.melt()` to transform the three energy source columns (**Electricity from fossil fuels**, **Electricity from nuclear**, and **Electricity from renewables**) from separate columns into rows, while keeping **Year** and **Entity** as identifier columns.\n\n2. **Create source classification**: Generates a new **source** column that captures which energy type each row represents, and stores the corresponding energy values in an **energy** column.\n\n3. **Clean source labels**: Removes the prefix \"Electricity from \" and suffix \" (TWh)\" from the source names, resulting in cleaner labels: **fossil fuels**, **nuclear**, and **renewables**.\n\n4. 
**Return restructured data**: The final dataset has each country-year-energy source combination as a separate row, making it easier to analyze and visualize energy mix across countries and time periods.\n\n[CONCEPTS EXPLANATION]\n\n```json\n[]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-45", "displayId": "renewable-energy", "names": ["Entity", "Year", "renewable_percentage"], "rows": [{"Entity": "Australia", "Year": 2000, "renewable_percentage": 8.6344368187}, {"Entity": "Australia", "Year": 2001, "renewable_percentage": 8.2180135078}, {"Entity": "Australia", "Year": 2002, "renewable_percentage": 8.0833022736}, {"Entity": "Australia", "Year": 2003, "renewable_percentage": 8.6598324205}, {"Entity": "Australia", "Year": 2004, "renewable_percentage": 8.7013045232}, {"Entity": "Australia", "Year": 2005, "renewable_percentage": 9.1562355123}, {"Entity": "Australia", "Year": 2006, "renewable_percentage": 9.6357600837}, {"Entity": "Australia", "Year": 2007, "renewable_percentage": 9.1190310213}, {"Entity": "Australia", "Year": 2008, "renewable_percentage": 8.0548900022}, {"Entity": "Australia", "Year": 2009, "renewable_percentage": 7.8043793133}, {"Entity": "Australia", "Year": 2010, "renewable_percentage": 9.0442152121}, {"Entity": "Australia", "Year": 2011, "renewable_percentage": 11.3454273735}, {"Entity": "Australia", "Year": 2012, "renewable_percentage": 11.4105750279}, {"Entity": "Australia", "Year": 2013, "renewable_percentage": 14.8708583355}, {"Entity": "Australia", "Year": 2014, "renewable_percentage": 14.9621290509}, {"Entity": "Australia", "Year": 2015, "renewable_percentage": 14.3476000693}, {"Entity": "Australia", "Year": 2016, "renewable_percentage": 15.6093794449}, {"Entity": "Australia", "Year": 2017, "renewable_percentage": 16.3138729943}, {"Entity": "Australia", "Year": 2018, "renewable_percentage": 17.145938174}, {"Entity": "Australia", "Year": 2019, 
"renewable_percentage": 21.3759705435}, {"Entity": "Australia", "Year": 2020, "renewable_percentage": 25.5031684668}, {"Entity": "Brazil", "Year": 2000, "renewable_percentage": 90.1307723743}, {"Entity": "Brazil", "Year": 2001, "renewable_percentage": 84.6953615744}, {"Entity": "Brazil", "Year": 2002, "renewable_percentage": 86.0883364189}, {"Entity": "Brazil", "Year": 2003, "renewable_percentage": 87.4561159097}, {"Entity": "Brazil", "Year": 2004, "renewable_percentage": 86.4260041451}, {"Entity": "Brazil", "Year": 2005, "renewable_percentage": 87.6781562721}, {"Entity": "Brazil", "Year": 2006, "renewable_percentage": 87.2842473236}, {"Entity": "Brazil", "Year": 2007, "renewable_percentage": 88.7252098726}, {"Entity": "Brazil", "Year": 2008, "renewable_percentage": 84.8072313004}, {"Entity": "Brazil", "Year": 2009, "renewable_percentage": 89.4172280725}, {"Entity": "Brazil", "Year": 2010, "renewable_percentage": 85.3576882415}, {"Entity": "Brazil", "Year": 2011, "renewable_percentage": 87.6618820986}, {"Entity": "Brazil", "Year": 2012, "renewable_percentage": 83.1164558813}, {"Entity": "Brazil", "Year": 2013, "renewable_percentage": 77.5240022006}, {"Entity": "Brazil", "Year": 2014, "renewable_percentage": 74.0418657409}, {"Entity": "Brazil", "Year": 2015, "renewable_percentage": 75.0231817625}, {"Entity": "Brazil", "Year": 2016, "renewable_percentage": 81.0938046902}, {"Entity": "Brazil", "Year": 2017, "renewable_percentage": 79.9091472228}, {"Entity": "Brazil", "Year": 2018, "renewable_percentage": 82.9198505403}, {"Entity": "Brazil", "Year": 2019, "renewable_percentage": 82.8548799017}, {"Entity": "Brazil", "Year": 2020, "renewable_percentage": 84.6411771408}, {"Entity": "Canada", "Year": 2000, "renewable_percentage": 61.8095917882}, {"Entity": "Canada", "Year": 2001, "renewable_percentage": 59.3287558747}, {"Entity": "Canada", "Year": 2002, "renewable_percentage": 61.1477403113}, {"Entity": "Canada", "Year": 2003, "renewable_percentage": 60.0789685174}, 
{"Entity": "Canada", "Year": 2004, "renewable_percentage": 59.6967771845}, {"Entity": "Canada", "Year": 2005, "renewable_percentage": 60.8208155391}, {"Entity": "Canada", "Year": 2006, "renewable_percentage": 60.8271602855}, {"Entity": "Canada", "Year": 2007, "renewable_percentage": 61.2460642446}, {"Entity": "Canada", "Year": 2008, "renewable_percentage": 62.6520720838}, {"Entity": "Canada", "Year": 2009, "renewable_percentage": 63.8919227732}, {"Entity": "Canada", "Year": 2010, "renewable_percentage": 62.9421470558}, {"Entity": "Canada", "Year": 2011, "renewable_percentage": 64.0922915917}, {"Entity": "Canada", "Year": 2012, "renewable_percentage": 65.098730952}, {"Entity": "Canada", "Year": 2013, "renewable_percentage": 65.4320794066}, {"Entity": "Canada", "Year": 2014, "renewable_percentage": 64.791145907}, {"Entity": "Canada", "Year": 2015, "renewable_percentage": 65.2946239925}, {"Entity": "Canada", "Year": 2016, "renewable_percentage": 66.1890584295}, {"Entity": "Canada", "Year": 2017, "renewable_percentage": 67.5399410579}, {"Entity": "Canada", "Year": 2018, "renewable_percentage": 67.3685700357}, {"Entity": "Canada", "Year": 2019, "renewable_percentage": 67.1741623137}, {"Entity": "Canada", "Year": 2020, "renewable_percentage": 68.7796436354}, {"Entity": "China", "Year": 2000, "renewable_percentage": 16.639126586}, {"Entity": "China", "Year": 2001, "renewable_percentage": 18.9581237042}, {"Entity": "China", "Year": 2002, "renewable_percentage": 17.6185006046}, {"Entity": "China", "Year": 2003, "renewable_percentage": 15.0362717081}, {"Entity": "China", "Year": 2004, "renewable_percentage": 16.2224108273}, {"Entity": "China", "Year": 2005, "renewable_percentage": 16.1731179957}, {"Entity": "China", "Year": 2006, "renewable_percentage": 15.5884036124}, {"Entity": "China", "Year": 2007, "renewable_percentage": 15.2583847828}, {"Entity": "China", "Year": 2008, "renewable_percentage": 19.0253335469}, {"Entity": "China", "Year": 2009, "renewable_percentage": 
17.8857170547}, {"Entity": "China", "Year": 2010, "renewable_percentage": 18.7800759915}, {"Entity": "China", "Year": 2011, "renewable_percentage": 16.8902341543}, {"Entity": "China", "Year": 2012, "renewable_percentage": 20.122965176}, {"Entity": "China", "Year": 2013, "renewable_percentage": 20.2152481955}, {"Entity": "China", "Year": 2014, "renewable_percentage": 22.3502204285}, {"Entity": "China", "Year": 2015, "renewable_percentage": 24.079270189}, {"Entity": "China", "Year": 2016, "renewable_percentage": 25.0007798429}, {"Entity": "China", "Year": 2017, "renewable_percentage": 25.419242299}, {"Entity": "China", "Year": 2018, "renewable_percentage": 25.7747942589}, {"Entity": "China", "Year": 2019, "renewable_percentage": 26.9995671106}, {"Entity": "China", "Year": 2020, "renewable_percentage": 28.2464606924}, {"Entity": "France", "Year": 2000, "renewable_percentage": 12.7117691154}, {"Entity": "France", "Year": 2001, "renewable_percentage": 13.9961372206}, {"Entity": "France", "Year": 2002, "renewable_percentage": 11.3544157067}, {"Entity": "France", "Year": 2003, "renewable_percentage": 10.9783540506}, {"Entity": "France", "Year": 2004, "renewable_percentage": 11.0051305559}, {"Entity": "France", "Year": 2005, "renewable_percentage": 9.6479837153}, {"Entity": "France", "Year": 2006, "renewable_percentage": 10.7235915493}, {"Entity": "France", "Year": 2007, "renewable_percentage": 11.4370075239}, {"Entity": "France", "Year": 2008, "renewable_percentage": 12.7487441615}, {"Entity": "France", "Year": 2009, "renewable_percentage": 12.8776856068}, {"Entity": "France", "Year": 2010, "renewable_percentage": 13.6240072491}, {"Entity": "France", "Year": 2011, "renewable_percentage": 11.63553049}, {"Entity": "France", "Year": 2012, "renewable_percentage": 15.0331522889}, {"Entity": "France", "Year": 2013, "renewable_percentage": 17.2469424928}, {"Entity": "France", "Year": 2014, "renewable_percentage": 16.6074992494}, {"Entity": "France", "Year": 2015, 
"renewable_percentage": 16.002230276}, {"Entity": "France", "Year": 2016, "renewable_percentage": 17.7212924013}, {"Entity": "France", "Year": 2017, "renewable_percentage": 16.6576751547}, {"Entity": "France", "Year": 2018, "renewable_percentage": 19.7315179827}, {"Entity": "France", "Year": 2019, "renewable_percentage": 20.0116665488}, {"Entity": "France", "Year": 2020, "renewable_percentage": 23.7610241821}, {"Entity": "Germany", "Year": 2000, "renewable_percentage": 6.1977983575}, {"Entity": "Germany", "Year": 2001, "renewable_percentage": 6.5132585197}, {"Entity": "Germany", "Year": 2002, "renewable_percentage": 7.6431369854}, {"Entity": "Germany", "Year": 2003, "renewable_percentage": 7.7455438643}, {"Entity": "Germany", "Year": 2004, "renewable_percentage": 9.4989185292}, {"Entity": "Germany", "Year": 2005, "renewable_percentage": 10.3356645637}, {"Entity": "Germany", "Year": 2006, "renewable_percentage": 11.5129959829}, {"Entity": "Germany", "Year": 2007, "renewable_percentage": 14.135471525}, {"Entity": "Germany", "Year": 2008, "renewable_percentage": 14.8894504106}, {"Entity": "Germany", "Year": 2009, "renewable_percentage": 16.2902842395}, {"Entity": "Germany", "Year": 2010, "renewable_percentage": 16.8384989754}, {"Entity": "Germany", "Year": 2011, "renewable_percentage": 20.4967199299}, {"Entity": "Germany", "Year": 2012, "renewable_percentage": 23.056464482}, {"Entity": "Germany", "Year": 2013, "renewable_percentage": 24.1368929731}, {"Entity": "Germany", "Year": 2014, "renewable_percentage": 26.2182434067}, {"Entity": "Germany", "Year": 2015, "renewable_percentage": 29.4721888318}, {"Entity": "Germany", "Year": 2016, "renewable_percentage": 29.4990435013}, {"Entity": "Germany", "Year": 2017, "renewable_percentage": 33.4855497593}, {"Entity": "Germany", "Year": 2018, "renewable_percentage": 35.0976735365}, {"Entity": "Germany", "Year": 2019, "renewable_percentage": 40.0890757144}, {"Entity": "Germany", "Year": 2020, "renewable_percentage": 
44.3324048937}, {"Entity": "India", "Year": 2000, "renewable_percentage": 14.0481982534}, {"Entity": "India", "Year": 2001, "renewable_percentage": 12.9997099422}, {"Entity": "India", "Year": 2002, "renewable_percentage": 11.938193032}, {"Entity": "India", "Year": 2003, "renewable_percentage": 11.695109147}, {"Entity": "India", "Year": 2004, "renewable_percentage": 15.6375300722}, {"Entity": "India", "Year": 2005, "renewable_percentage": 15.2543575768}, {"Entity": "India", "Year": 2006, "renewable_percentage": 17.1352578483}, {"Entity": "India", "Year": 2007, "renewable_percentage": 17.8019742295}, {"Entity": "India", "Year": 2008, "renewable_percentage": 16.768266921}, {"Entity": "India", "Year": 2009, "renewable_percentage": 15.269804822}, {"Entity": "India", "Year": 2010, "renewable_percentage": 15.2122201244}, {"Entity": "India", "Year": 2011, "renewable_percentage": 16.7911025145}, {"Entity": "India", "Year": 2012, "renewable_percentage": 15.1350014654}, {"Entity": "India", "Year": 2013, "renewable_percentage": 16.3941577818}, {"Entity": "India", "Year": 2014, "renewable_percentage": 16.0092550039}, {"Entity": "India", "Year": 2015, "renewable_percentage": 15.3718720687}, {"Entity": "India", "Year": 2016, "renewable_percentage": 14.8548475703}, {"Entity": "India", "Year": 2017, "renewable_percentage": 15.9669920335}, {"Entity": "India", "Year": 2018, "renewable_percentage": 16.6949549709}, {"Entity": "India", "Year": 2019, "renewable_percentage": 18.6915426873}, {"Entity": "India", "Year": 2020, "renewable_percentage": 20.2059243238}, {"Entity": "Indonesia", "Year": 2000, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2001, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2002, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2003, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2004, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2005, "renewable_percentage": null}, {"Entity": 
"Indonesia", "Year": 2006, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2007, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2008, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2009, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2010, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2011, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2012, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2013, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2014, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2015, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2016, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2017, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2018, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2019, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2020, "renewable_percentage": null}, {"Entity": "Italy", "Year": 2000, "renewable_percentage": 18.900241501}, {"Entity": "Italy", "Year": 2001, "renewable_percentage": 20.049431902}, {"Entity": "Italy", "Year": 2002, "renewable_percentage": 17.4555571614}, {"Entity": "Italy", "Year": 2003, "renewable_percentage": 16.4202116476}, {"Entity": "Italy", "Year": 2004, "renewable_percentage": 18.2749380999}, {"Entity": "Italy", "Year": 2005, "renewable_percentage": 16.3769782226}, {"Entity": "Italy", "Year": 2006, "renewable_percentage": 16.5128639906}, {"Entity": "Italy", "Year": 2007, "renewable_percentage": 15.5333485238}, {"Entity": "Italy", "Year": 2008, "renewable_percentage": 18.6112}, {"Entity": "Italy", "Year": 2009, "renewable_percentage": 24.0837332221}, {"Entity": "Italy", "Year": 2010, "renewable_percentage": 25.8400187976}, {"Entity": "Italy", "Year": 2011, "renewable_percentage": 27.6773203443}, {"Entity": "Italy", "Year": 2012, "renewable_percentage": 31.1049649217}, {"Entity": "Italy", 
"Year": 2013, "renewable_percentage": 39.0148744209}, {"Entity": "Italy", "Year": 2014, "renewable_percentage": 43.4976931949}, {"Entity": "Italy", "Year": 2015, "renewable_percentage": 38.7577860829}, {"Entity": "Italy", "Year": 2016, "renewable_percentage": 37.6079387187}, {"Entity": "Italy", "Year": 2017, "renewable_percentage": 35.4174479255}, {"Entity": "Italy", "Year": 2018, "renewable_percentage": 39.8100142663}, {"Entity": "Italy", "Year": 2019, "renewable_percentage": 39.7563068474}, {"Entity": "Italy", "Year": 2020, "renewable_percentage": 42.0397741576}, {"Entity": "Japan", "Year": 2000, "renewable_percentage": 10.5382436261}, {"Entity": "Japan", "Year": 2001, "renewable_percentage": 10.447653504}, {"Entity": "Japan", "Year": 2002, "renewable_percentage": 10.2477294843}, {"Entity": "Japan", "Year": 2003, "renewable_percentage": 11.6993698448}, {"Entity": "Japan", "Year": 2004, "renewable_percentage": 11.4198974767}, {"Entity": "Japan", "Year": 2005, "renewable_percentage": 9.9068127192}, {"Entity": "Japan", "Year": 2006, "renewable_percentage": 10.8554989442}, {"Entity": "Japan", "Year": 2007, "renewable_percentage": 9.3897588285}, {"Entity": "Japan", "Year": 2008, "renewable_percentage": 10.0196834738}, {"Entity": "Japan", "Year": 2009, "renewable_percentage": 10.4667464874}, {"Entity": "Japan", "Year": 2010, "renewable_percentage": 10.5269966826}, {"Entity": "Japan", "Year": 2011, "renewable_percentage": 11.1272421632}, {"Entity": "Japan", "Year": 2012, "renewable_percentage": 10.6143703421}, {"Entity": "Japan", "Year": 2013, "renewable_percentage": 11.7965798852}, {"Entity": "Japan", "Year": 2014, "renewable_percentage": 13.2719619718}, {"Entity": "Japan", "Year": 2015, "renewable_percentage": 15.6586817408}, {"Entity": "Japan", "Year": 2016, "renewable_percentage": 15.6920107068}, {"Entity": "Japan", "Year": 2017, "renewable_percentage": 17.3559698312}, {"Entity": "Japan", "Year": 2018, "renewable_percentage": 18.144181175}, {"Entity": "Japan", 
"Year": 2019, "renewable_percentage": 19.4223288251}, {"Entity": "Japan", "Year": 2020, "renewable_percentage": 21.324925062}, {"Entity": "Kazakhstan", "Year": 2000, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2001, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2002, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2003, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2004, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2005, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2006, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2007, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2008, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2009, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2010, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2011, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2012, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2013, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2014, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2015, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2016, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2017, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2018, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2019, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2020, "renewable_percentage": null}, {"Entity": "Mexico", "Year": 2000, "renewable_percentage": 22.9291160107}, {"Entity": "Mexico", "Year": 2001, "renewable_percentage": 19.6649599841}, {"Entity": "Mexico", "Year": 2002, "renewable_percentage": 17.4220963173}, {"Entity": "Mexico", "Year": 2003, "renewable_percentage": 15.8536585366}, {"Entity": "Mexico", "Year": 2004, "renewable_percentage": 17.3134463687}, {"Entity": "Mexico", 
"Year": 2005, "renewable_percentage": 18.2780827246}, {"Entity": "Mexico", "Year": 2006, "renewable_percentage": 18.4256091896}, {"Entity": "Mexico", "Year": 2007, "renewable_percentage": 17.2761561168}, {"Entity": "Mexico", "Year": 2008, "renewable_percentage": 21.5387105913}, {"Entity": "Mexico", "Year": 2009, "renewable_percentage": 16.5369729069}, {"Entity": "Mexico", "Year": 2010, "renewable_percentage": 19.4281608109}, {"Entity": "Mexico", "Year": 2011, "renewable_percentage": 18.0916357408}, {"Entity": "Mexico", "Year": 2012, "renewable_percentage": 16.5759438104}, {"Entity": "Mexico", "Year": 2013, "renewable_percentage": 15.5492898914}, {"Entity": "Mexico", "Year": 2014, "renewable_percentage": 19.8008201523}, {"Entity": "Mexico", "Year": 2015, "renewable_percentage": 17.5976903451}, {"Entity": "Mexico", "Year": 2016, "renewable_percentage": 17.4806943436}, {"Entity": "Mexico", "Year": 2017, "renewable_percentage": 18.0759526428}, {"Entity": "Mexico", "Year": 2018, "renewable_percentage": 17.703752786}, {"Entity": "Mexico", "Year": 2019, "renewable_percentage": 18.5487927565}, {"Entity": "Mexico", "Year": 2020, "renewable_percentage": 21.2552224134}, {"Entity": "Poland", "Year": 2000, "renewable_percentage": 1.6273222517}, {"Entity": "Poland", "Year": 2001, "renewable_percentage": 1.934316727}, {"Entity": "Poland", "Year": 2002, "renewable_percentage": 1.9439960699}, {"Entity": "Poland", "Year": 2003, "renewable_percentage": 1.4999000067}, {"Entity": "Poland", "Year": 2004, "renewable_percentage": 2.1016681991}, {"Entity": "Poland", "Year": 2005, "renewable_percentage": 2.4830699774}, {"Entity": "Poland", "Year": 2006, "renewable_percentage": 2.673730134}, {"Entity": "Poland", "Year": 2007, "renewable_percentage": 3.4256513785}, {"Entity": "Poland", "Year": 2008, "renewable_percentage": 4.2744438696}, {"Entity": "Poland", "Year": 2009, "renewable_percentage": 5.7515388179}, {"Entity": "Poland", "Year": 2010, "renewable_percentage": 6.9299363057}, 
{"Entity": "Poland", "Year": 2011, "renewable_percentage": 8.0547205693}, {"Entity": "Poland", "Year": 2012, "renewable_percentage": 10.4436057663}, {"Entity": "Poland", "Year": 2013, "renewable_percentage": 10.4081508145}, {"Entity": "Poland", "Year": 2014, "renewable_percentage": 12.5331481248}, {"Entity": "Poland", "Year": 2015, "renewable_percentage": 13.8151485631}, {"Entity": "Poland", "Year": 2016, "renewable_percentage": 13.7335179722}, {"Entity": "Poland", "Year": 2017, "renewable_percentage": 14.1999646913}, {"Entity": "Poland", "Year": 2018, "renewable_percentage": 12.7559148032}, {"Entity": "Poland", "Year": 2019, "renewable_percentage": 15.6157998037}, {"Entity": "Poland", "Year": 2020, "renewable_percentage": 17.9648720886}, {"Entity": "Saudi Arabia", "Year": 2000, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2001, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2002, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2003, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2004, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2005, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2006, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2007, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2008, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2009, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2010, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2011, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2012, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2013, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2014, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2015, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2016, "renewable_percentage": null}, {"Entity": "Saudi Arabia", 
"Year": 2017, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2018, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2019, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2020, "renewable_percentage": null}, {"Entity": "South Africa", "Year": 2000, "renewable_percentage": 0.9110805721}, {"Entity": "South Africa", "Year": 2001, "renewable_percentage": 1.2516536074}, {"Entity": "South Africa", "Year": 2002, "renewable_percentage": 1.3802249619}, {"Entity": "South Africa", "Year": 2003, "renewable_percentage": 0.545271261}, {"Entity": "South Africa", "Year": 2004, "renewable_percentage": 0.5827199439}, {"Entity": "South Africa", "Year": 2005, "renewable_percentage": 0.763458686}, {"Entity": "South Africa", "Year": 2006, "renewable_percentage": 1.3863060017}, {"Entity": "South Africa", "Year": 2007, "renewable_percentage": 0.5267209594}, {"Entity": "South Africa", "Year": 2008, "renewable_percentage": 0.6895692269}, {"Entity": "South Africa", "Year": 2009, "renewable_percentage": 0.8031088083}, {"Entity": "South Africa", "Year": 2010, "renewable_percentage": 1.0330068318}, {"Entity": "South Africa", "Year": 2011, "renewable_percentage": 1.0184465622}, {"Entity": "South Africa", "Year": 2012, "renewable_percentage": 0.6890826069}, {"Entity": "South Africa", "Year": 2013, "renewable_percentage": 0.6792168043}, {"Entity": "South Africa", "Year": 2014, "renewable_percentage": 1.4288129861}, {"Entity": "South Africa", "Year": 2015, "renewable_percentage": 2.6256790549}, {"Entity": "South Africa", "Year": 2016, "renewable_percentage": 3.2586126531}, {"Entity": "South Africa", "Year": 2017, "renewable_percentage": 4.2202606137}, {"Entity": "South Africa", "Year": 2018, "renewable_percentage": 5.1554655529}, {"Entity": "South Africa", "Year": 2019, "renewable_percentage": 5.3589699864}, {"Entity": "South Africa", "Year": 2020, "renewable_percentage": 5.780581212}, {"Entity": "Spain", "Year": 2000, "renewable_percentage": 
15.6119862394}, {"Entity": "Spain", "Year": 2001, "renewable_percentage": 21.1524434719}, {"Entity": "Spain", "Year": 2002, "renewable_percentage": 13.8260180901}, {"Entity": "Spain", "Year": 2003, "renewable_percentage": 21.667314419}, {"Entity": "Spain", "Year": 2004, "renewable_percentage": 18.3190206468}, {"Entity": "Spain", "Year": 2005, "renewable_percentage": 14.8597342333}, {"Entity": "Spain", "Year": 2006, "renewable_percentage": 17.6623992413}, {"Entity": "Spain", "Year": 2007, "renewable_percentage": 19.3347262296}, {"Entity": "Spain", "Year": 2008, "renewable_percentage": 20.0051501593}, {"Entity": "Spain", "Year": 2009, "renewable_percentage": 25.4107639008}, {"Entity": "Spain", "Year": 2010, "renewable_percentage": 32.7922186819}, {"Entity": "Spain", "Year": 2011, "renewable_percentage": 30.0408415417}, {"Entity": "Spain", "Year": 2012, "renewable_percentage": 29.6047928652}, {"Entity": "Spain", "Year": 2013, "renewable_percentage": 39.5850357054}, {"Entity": "Spain", "Year": 2014, "renewable_percentage": 40.1032952644}, {"Entity": "Spain", "Year": 2015, "renewable_percentage": 34.9899091826}, {"Entity": "Spain", "Year": 2016, "renewable_percentage": 38.5818061138}, {"Entity": "Spain", "Year": 2017, "renewable_percentage": 32.220593624}, {"Entity": "Spain", "Year": 2018, "renewable_percentage": 38.2080329557}, {"Entity": "Spain", "Year": 2019, "renewable_percentage": 37.280815091}, {"Entity": "Spain", "Year": 2020, "renewable_percentage": 43.8108805298}, {"Entity": "Thailand", "Year": 2000, "renewable_percentage": 7.1261029822}, {"Entity": "Thailand", "Year": 2001, "renewable_percentage": 7.061527212}, {"Entity": "Thailand", "Year": 2002, "renewable_percentage": 7.9444772593}, {"Entity": "Thailand", "Year": 2003, "renewable_percentage": 7.6718362852}, {"Entity": "Thailand", "Year": 2004, "renewable_percentage": 6.5163549406}, {"Entity": "Thailand", "Year": 2005, "renewable_percentage": 6.0325203252}, {"Entity": "Thailand", "Year": 2006, 
"renewable_percentage": 7.5988547551}, {"Entity": "Thailand", "Year": 2007, "renewable_percentage": 7.7085852479}, {"Entity": "Thailand", "Year": 2008, "renewable_percentage": 6.5625458278}, {"Entity": "Thailand", "Year": 2009, "renewable_percentage": 6.6263303689}, {"Entity": "Thailand", "Year": 2010, "renewable_percentage": 5.7085828343}, {"Entity": "Thailand", "Year": 2011, "renewable_percentage": 8.039961941}, {"Entity": "Thailand", "Year": 2012, "renewable_percentage": 8.5396118358}, {"Entity": "Thailand", "Year": 2013, "renewable_percentage": 7.6765035487}, {"Entity": "Thailand", "Year": 2014, "renewable_percentage": 8.395728489}, {"Entity": "Thailand", "Year": 2015, "renewable_percentage": 7.9949619145}, {"Entity": "Thailand", "Year": 2016, "renewable_percentage": 8.9840234023}, {"Entity": "Thailand", "Year": 2017, "renewable_percentage": 10.9570957096}, {"Entity": "Thailand", "Year": 2018, "renewable_percentage": 14.1900054915}, {"Entity": "Thailand", "Year": 2019, "renewable_percentage": 14.7001731284}, {"Entity": "Thailand", "Year": 2020, "renewable_percentage": 13.7963737796}, {"Entity": "Ukraine", "Year": 2000, "renewable_percentage": 6.5860921352}, {"Entity": "Ukraine", "Year": 2001, "renewable_percentage": 6.9729761009}, {"Entity": "Ukraine", "Year": 2002, "renewable_percentage": 5.5597165409}, {"Entity": "Ukraine", "Year": 2003, "renewable_percentage": 5.1442841287}, {"Entity": "Ukraine", "Year": 2004, "renewable_percentage": 6.4718162839}, {"Entity": "Ukraine", "Year": 2005, "renewable_percentage": 6.6698940347}, {"Entity": "Ukraine", "Year": 2006, "renewable_percentage": 6.68633235}, {"Entity": "Ukraine", "Year": 2007, "renewable_percentage": 5.3380238605}, {"Entity": "Ukraine", "Year": 2008, "renewable_percentage": 6.1377090041}, {"Entity": "Ukraine", "Year": 2009, "renewable_percentage": 6.980762585}, {"Entity": "Ukraine", "Year": 2010, "renewable_percentage": 7.0914098083}, {"Entity": "Ukraine", "Year": 2011, "renewable_percentage": 
5.7450628366}, {"Entity": "Ukraine", "Year": 2012, "renewable_percentage": 5.6614236741}, {"Entity": "Ukraine", "Year": 2013, "renewable_percentage": 7.8003200661}, {"Entity": "Ukraine", "Year": 2014, "renewable_percentage": 5.5885262117}, {"Entity": "Ukraine", "Year": 2015, "renewable_percentage": 4.3924771096}, {"Entity": "Ukraine", "Year": 2016, "renewable_percentage": 5.6797249171}, {"Entity": "Ukraine", "Year": 2017, "renewable_percentage": 7.0457194664}, {"Entity": "Ukraine", "Year": 2018, "renewable_percentage": 8.228528092}, {"Entity": "Ukraine", "Year": 2019, "renewable_percentage": 7.7754487096}, {"Entity": "Ukraine", "Year": 2020, "renewable_percentage": 11.8440577364}, {"Entity": "United Kingdom", "Year": 2000, "renewable_percentage": 2.6657406913}, {"Entity": "United Kingdom", "Year": 2001, "renewable_percentage": 2.5001961451}, {"Entity": "United Kingdom", "Year": 2002, "renewable_percentage": 2.8939157566}, {"Entity": "United Kingdom", "Year": 2003, "renewable_percentage": 2.6854802003}, {"Entity": "United Kingdom", "Year": 2004, "renewable_percentage": 3.6136880575}, {"Entity": "United Kingdom", "Year": 2005, "renewable_percentage": 4.2815234434}, {"Entity": "United Kingdom", "Year": 2006, "renewable_percentage": 4.6029890199}, {"Entity": "United Kingdom", "Year": 2007, "renewable_percentage": 5.0104331009}, {"Entity": "United Kingdom", "Year": 2008, "renewable_percentage": 5.6776842324}, {"Entity": "United Kingdom", "Year": 2009, "renewable_percentage": 6.7679854187}, {"Entity": "United Kingdom", "Year": 2010, "renewable_percentage": 6.9092924441}, {"Entity": "United Kingdom", "Year": 2011, "renewable_percentage": 9.6422505889}, {"Entity": "United Kingdom", "Year": 2012, "renewable_percentage": 11.4273047189}, {"Entity": "United Kingdom", "Year": 2013, "renewable_percentage": 14.9727052732}, {"Entity": "United Kingdom", "Year": 2014, "renewable_percentage": 19.2476358104}, {"Entity": "United Kingdom", "Year": 2015, "renewable_percentage": 
24.6227709191}, {"Entity": "United Kingdom", "Year": 2016, "renewable_percentage": 24.6788390627}, {"Entity": "United Kingdom", "Year": 2017, "renewable_percentage": 29.4986571173}, {"Entity": "United Kingdom", "Year": 2018, "renewable_percentage": 33.2919818457}, {"Entity": "United Kingdom", "Year": 2019, "renewable_percentage": 37.4568630499}, {"Entity": "United Kingdom", "Year": 2020, "renewable_percentage": 42.8603962651}, {"Entity": "United States", "Year": 2000, "renewable_percentage": 9.2298992662}, {"Entity": "United States", "Year": 2001, "renewable_percentage": 7.5132056541}, {"Entity": "United States", "Year": 2002, "renewable_percentage": 8.749216358}, {"Entity": "United States", "Year": 2003, "renewable_percentage": 9.0252110397}, {"Entity": "United States", "Year": 2004, "renewable_percentage": 8.7334100887}, {"Entity": "United States", "Year": 2005, "renewable_percentage": 8.7494640631}, {"Entity": "United States", "Year": 2006, "renewable_percentage": 9.4184742052}, {"Entity": "United States", "Year": 2007, "renewable_percentage": 8.3984096829}, {"Entity": "United States", "Year": 2008, "renewable_percentage": 9.180943292}, {"Entity": "United States", "Year": 2009, "renewable_percentage": 10.547689996}, {"Entity": "United States", "Year": 2010, "renewable_percentage": 10.3180892283}, {"Entity": "United States", "Year": 2011, "renewable_percentage": 12.4665249812}, {"Entity": "United States", "Year": 2012, "renewable_percentage": 12.1841179804}, {"Entity": "United States", "Year": 2013, "renewable_percentage": 12.8311154179}, {"Entity": "United States", "Year": 2014, "renewable_percentage": 13.3503092033}, {"Entity": "United States", "Year": 2015, "renewable_percentage": 13.6261352256}, {"Entity": "United States", "Year": 2016, "renewable_percentage": 15.2880936304}, {"Entity": "United States", "Year": 2017, "renewable_percentage": 17.4515276472}, {"Entity": "United States", "Year": 2018, "renewable_percentage": 17.4499886946}, {"Entity": "United 
States", "Year": 2019, "renewable_percentage": 18.2946243489}, {"Entity": "United States", "Year": 2020, "renewable_percentage": 20.3156921037}], "metadata": {"Entity": {"type": "string", "semanticType": "Location"}, "Year": {"type": "number", "semanticType": "Year"}, "renewable_percentage": {"type": "number", "semanticType": "Percentage"}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage']].copy()\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, 
and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. \n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data 
should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 158029.9988, 190729.9957, 209229.9957, ..., 6993180.176, 9861099.609, 9874660.156, 10096009.77\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 90.14, 91.79, 92.54, 804.95\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Transform the data into a long format where different energy sources (fossil fuels, nuclear, renewables) are unpivoted into separate rows. Each row should contain the year, entity (country), energy source type, and the corresponding energy value in TWh. 
This will enable visualization of energy generation trends over time by source and country.\",\n \"display_instruction\": \"Show **energy generation** trends by **source** and **Entity** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"source\", \"energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n },\n \"reason\": \"The user wants to visualize energy data across different sources with a line chart faceted by source. The original data has energy sources in separate columns (fossil fuels, nuclear, renewables), so we need to unpivot these columns into a long format with 'source' and 'energy' fields. The chart_encodings provided by the user are appropriate for this visualization goal, using faceting (column) to separate different energy sources.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Unpivot the energy source columns into long format\n transformed_df = pd.melt(\n df_energy,\n id_vars=['Year', 'Entity'],\n value_vars=[\n 'Electricity from fossil fuels (TWh)',\n 'Electricity from nuclear (TWh)',\n 'Electricity from renewables (TWh)'\n ],\n var_name='source',\n value_name='energy'\n )\n \n # Clean up the source names to be more readable\n transformed_df['source'] = transformed_df['source'].str.replace('Electricity from ', '').str.replace(' (TWh)', '')\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Entity Year energy source\n0 Australia 2000 181.05 fossil fuels\n1 Australia 2001 194.33 fossil fuels\n2 Australia 2002 197.29 fossil fuels\n3 Australia 2003 195.13 fossil fuels\n4 Australia 2004 203.66 fossil fuels\n5 Australia 2005 195.95 fossil fuels\n6 Australia 2006 198.72 fossil fuels\n7 Australia 2007 208.59 fossil fuels\n8 Australia 2008 211.06 fossil fuels\n9 Australia 2009 216.42 fossil fuels\n......\n\nUpdate the code above 
based on the following instruction:\n\n{\n \"followup_instruction\": \"show renewable percentage for each country\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage for each country and year. The renewable percentage is computed as the ratio of renewable energy to total energy (sum of fossil fuels, nuclear, and renewables) multiplied by 100. The output should contain year, entity (country), and the renewable percentage value.\",\n \"display_instruction\": \"Show **renewable percentage** for each **Entity** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"renewable_percentage\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"renewable_percentage\",\n \"color\": \"Entity\"\n },\n \"reason\": \"The user wants to see renewable percentage trends, which requires calculating the proportion of renewable energy relative to total energy generation. This is a single metric per country per year, so we don't need the 'column' faceting by source anymore. 
The chart_encodings are updated to reflect that we're now showing a percentage metric rather than absolute energy values by source.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage']].copy()\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "table-82", "resultTableId": "table-45", "chart": {"id": "chart-1760738416626", "chartType": "Line Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "concept-1760738385163"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {"fieldID": "concept-1760738389404", "sortBy": "[\"fossil fuels\",\"nuclear\",\"renewables\"]"}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-82", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "show renewable percentage for each country", "displayContent": "Show **renewable percentage** for each **Entity** over **Year**"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "The code transforms global energy data to analyze renewable energy adoption:\n\n- Calculates **total_energy** by summing electricity generation 
from three sources: **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n- Computes **renewable_percentage** by dividing **Electricity from renewables (TWh)** by **total_energy** and multiplying by **100**\n- Handles edge cases by replacing infinite values (from division by zero) with **NaN**\n- Filters the dataset to retain only **Year**, **Entity** (country), and **renewable_percentage** columns", "concepts": [{"explanation": "The proportion of a country's total electricity generation that comes from renewable sources, expressed as a percentage. Calculated as: \\( \\frac{\\text{Electricity from renewables}}{\\text{Total electricity generation}} \\times 100 \\). This metric indicates the degree of renewable energy adoption in each country's energy mix.", "field": "renewable_percentage"}], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. 
Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. 
If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. \n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5738290.039, 9861099.609, 10502929.69, 10707219.73\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., nan, nan, nan, nan\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## 
sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage']].copy()\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\nThe code transforms global energy data to analyze renewable energy adoption:\n\n- Calculates **total_energy** by summing electricity generation from three sources: **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n- Computes **renewable_percentage** by dividing **Electricity from renewables (TWh)** by **total_energy** and multiplying by **100**\n- Handles edge cases by replacing infinite values (from division by zero) with **NaN**\n- Filters the dataset to retain only **Year**, **Entity** (country), and **renewable_percentage** columns\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"renewable_percentage\",\n \"explanation\": \"The proportion of a country's 
total electricity generation that comes from renewable sources, expressed as a percentage. Calculated as: \\\\( \\\\frac{\\\\text{Electricity from renewables}}{\\\\text{Total electricity generation}} \\\\times 100 \\\\). This metric indicates the degree of renewable energy adoption in each country's energy mix.\"\n }\n]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-78", "displayId": "renewable-energy-rank", "names": ["Entity", "Year", "rank", "renewable_percentage"], "rows": [{"Entity": "Australia", "Year": 2000, "rank": 11, "renewable_percentage": 8.6344368187}, {"Entity": "Australia", "Year": 2001, "rank": 10, "renewable_percentage": 8.2180135078}, {"Entity": "Australia", "Year": 2002, "rank": 11, "renewable_percentage": 8.0833022736}, {"Entity": "Australia", "Year": 2003, "rank": 11, "renewable_percentage": 8.6598324205}, {"Entity": "Australia", "Year": 2004, "rank": 12, "renewable_percentage": 8.7013045232}, {"Entity": "Australia", "Year": 2005, "rank": 11, "renewable_percentage": 9.1562355123}, {"Entity": "Australia", "Year": 2006, "rank": 11, "renewable_percentage": 9.6357600837}, {"Entity": "Australia", "Year": 2007, "rank": 11, "renewable_percentage": 9.1190310213}, {"Entity": "Australia", "Year": 2008, "rank": 12, "renewable_percentage": 8.0548900022}, {"Entity": "Australia", "Year": 2009, "rank": 12, "renewable_percentage": 7.8043793133}, {"Entity": "Australia", "Year": 2010, "rank": 12, "renewable_percentage": 9.0442152121}, {"Entity": "Australia", "Year": 2011, "rank": 11, "renewable_percentage": 11.3454273735}, {"Entity": "Australia", "Year": 2012, "rank": 12, "renewable_percentage": 11.4105750279}, {"Entity": "Australia", "Year": 2013, "rank": 11, "renewable_percentage": 14.8708583355}, {"Entity": "Australia", "Year": 2014, "rank": 11, "renewable_percentage": 14.9621290509}, {"Entity": "Australia", "Year": 2015, "rank": 12, "renewable_percentage": 
14.3476000693}, {"Entity": "Australia", "Year": 2016, "rank": 11, "renewable_percentage": 15.6093794449}, {"Entity": "Australia", "Year": 2017, "rank": 12, "renewable_percentage": 16.3138729943}, {"Entity": "Australia", "Year": 2018, "rank": 12, "renewable_percentage": 17.145938174}, {"Entity": "Australia", "Year": 2019, "rank": 8, "renewable_percentage": 21.3759705435}, {"Entity": "Australia", "Year": 2020, "rank": 8, "renewable_percentage": 25.5031684668}, {"Entity": "Brazil", "Year": 2000, "rank": 1, "renewable_percentage": 90.1307723743}, {"Entity": "Brazil", "Year": 2001, "rank": 1, "renewable_percentage": 84.6953615744}, {"Entity": "Brazil", "Year": 2002, "rank": 1, "renewable_percentage": 86.0883364189}, {"Entity": "Brazil", "Year": 2003, "rank": 1, "renewable_percentage": 87.4561159097}, {"Entity": "Brazil", "Year": 2004, "rank": 1, "renewable_percentage": 86.4260041451}, {"Entity": "Brazil", "Year": 2005, "rank": 1, "renewable_percentage": 87.6781562721}, {"Entity": "Brazil", "Year": 2006, "rank": 1, "renewable_percentage": 87.2842473236}, {"Entity": "Brazil", "Year": 2007, "rank": 1, "renewable_percentage": 88.7252098726}, {"Entity": "Brazil", "Year": 2008, "rank": 1, "renewable_percentage": 84.8072313004}, {"Entity": "Brazil", "Year": 2009, "rank": 1, "renewable_percentage": 89.4172280725}, {"Entity": "Brazil", "Year": 2010, "rank": 1, "renewable_percentage": 85.3576882415}, {"Entity": "Brazil", "Year": 2011, "rank": 1, "renewable_percentage": 87.6618820986}, {"Entity": "Brazil", "Year": 2012, "rank": 1, "renewable_percentage": 83.1164558813}, {"Entity": "Brazil", "Year": 2013, "rank": 1, "renewable_percentage": 77.5240022006}, {"Entity": "Brazil", "Year": 2014, "rank": 1, "renewable_percentage": 74.0418657409}, {"Entity": "Brazil", "Year": 2015, "rank": 1, "renewable_percentage": 75.0231817625}, {"Entity": "Brazil", "Year": 2016, "rank": 1, "renewable_percentage": 81.0938046902}, {"Entity": "Brazil", "Year": 2017, "rank": 1, "renewable_percentage": 
79.9091472228}, {"Entity": "Brazil", "Year": 2018, "rank": 1, "renewable_percentage": 82.9198505403}, {"Entity": "Brazil", "Year": 2019, "rank": 1, "renewable_percentage": 82.8548799017}, {"Entity": "Brazil", "Year": 2020, "rank": 1, "renewable_percentage": 84.6411771408}, {"Entity": "Canada", "Year": 2000, "rank": 2, "renewable_percentage": 61.8095917882}, {"Entity": "Canada", "Year": 2001, "rank": 2, "renewable_percentage": 59.3287558747}, {"Entity": "Canada", "Year": 2002, "rank": 2, "renewable_percentage": 61.1477403113}, {"Entity": "Canada", "Year": 2003, "rank": 2, "renewable_percentage": 60.0789685174}, {"Entity": "Canada", "Year": 2004, "rank": 2, "renewable_percentage": 59.6967771845}, {"Entity": "Canada", "Year": 2005, "rank": 2, "renewable_percentage": 60.8208155391}, {"Entity": "Canada", "Year": 2006, "rank": 2, "renewable_percentage": 60.8271602855}, {"Entity": "Canada", "Year": 2007, "rank": 2, "renewable_percentage": 61.2460642446}, {"Entity": "Canada", "Year": 2008, "rank": 2, "renewable_percentage": 62.6520720838}, {"Entity": "Canada", "Year": 2009, "rank": 2, "renewable_percentage": 63.8919227732}, {"Entity": "Canada", "Year": 2010, "rank": 2, "renewable_percentage": 62.9421470558}, {"Entity": "Canada", "Year": 2011, "rank": 2, "renewable_percentage": 64.0922915917}, {"Entity": "Canada", "Year": 2012, "rank": 2, "renewable_percentage": 65.098730952}, {"Entity": "Canada", "Year": 2013, "rank": 2, "renewable_percentage": 65.4320794066}, {"Entity": "Canada", "Year": 2014, "rank": 2, "renewable_percentage": 64.791145907}, {"Entity": "Canada", "Year": 2015, "rank": 2, "renewable_percentage": 65.2946239925}, {"Entity": "Canada", "Year": 2016, "rank": 2, "renewable_percentage": 66.1890584295}, {"Entity": "Canada", "Year": 2017, "rank": 2, "renewable_percentage": 67.5399410579}, {"Entity": "Canada", "Year": 2018, "rank": 2, "renewable_percentage": 67.3685700357}, {"Entity": "Canada", "Year": 2019, "rank": 2, "renewable_percentage": 67.1741623137}, 
{"Entity": "Canada", "Year": 2020, "rank": 2, "renewable_percentage": 68.7796436354}, {"Entity": "China", "Year": 2000, "rank": 5, "renewable_percentage": 16.639126586}, {"Entity": "China", "Year": 2001, "rank": 6, "renewable_percentage": 18.9581237042}, {"Entity": "China", "Year": 2002, "rank": 3, "renewable_percentage": 17.6185006046}, {"Entity": "China", "Year": 2003, "rank": 6, "renewable_percentage": 15.0362717081}, {"Entity": "China", "Year": 2004, "rank": 6, "renewable_percentage": 16.2224108273}, {"Entity": "China", "Year": 2005, "rank": 5, "renewable_percentage": 16.1731179957}, {"Entity": "China", "Year": 2006, "rank": 7, "renewable_percentage": 15.5884036124}, {"Entity": "China", "Year": 2007, "rank": 7, "renewable_percentage": 15.2583847828}, {"Entity": "China", "Year": 2008, "rank": 5, "renewable_percentage": 19.0253335469}, {"Entity": "China", "Year": 2009, "rank": 5, "renewable_percentage": 17.8857170547}, {"Entity": "China", "Year": 2010, "rank": 6, "renewable_percentage": 18.7800759915}, {"Entity": "China", "Year": 2011, "rank": 7, "renewable_percentage": 16.8902341543}, {"Entity": "China", "Year": 2012, "rank": 6, "renewable_percentage": 20.122965176}, {"Entity": "China", "Year": 2013, "rank": 6, "renewable_percentage": 20.2152481955}, {"Entity": "China", "Year": 2014, "rank": 6, "renewable_percentage": 22.3502204285}, {"Entity": "China", "Year": 2015, "rank": 7, "renewable_percentage": 24.079270189}, {"Entity": "China", "Year": 2016, "rank": 6, "renewable_percentage": 25.0007798429}, {"Entity": "China", "Year": 2017, "rank": 7, "renewable_percentage": 25.419242299}, {"Entity": "China", "Year": 2018, "rank": 7, "renewable_percentage": 25.7747942589}, {"Entity": "China", "Year": 2019, "rank": 7, "renewable_percentage": 26.9995671106}, {"Entity": "China", "Year": 2020, "rank": 7, "renewable_percentage": 28.2464606924}, {"Entity": "France", "Year": 2000, "rank": 8, "renewable_percentage": 12.7117691154}, {"Entity": "France", "Year": 2001, "rank": 7, 
"renewable_percentage": 13.9961372206}, {"Entity": "France", "Year": 2002, "rank": 8, "renewable_percentage": 11.3544157067}, {"Entity": "France", "Year": 2003, "rank": 9, "renewable_percentage": 10.9783540506}, {"Entity": "France", "Year": 2004, "rank": 9, "renewable_percentage": 11.0051305559}, {"Entity": "France", "Year": 2005, "rank": 10, "renewable_percentage": 9.6479837153}, {"Entity": "France", "Year": 2006, "rank": 10, "renewable_percentage": 10.7235915493}, {"Entity": "France", "Year": 2007, "rank": 9, "renewable_percentage": 11.4370075239}, {"Entity": "France", "Year": 2008, "rank": 9, "renewable_percentage": 12.7487441615}, {"Entity": "France", "Year": 2009, "rank": 9, "renewable_percentage": 12.8776856068}, {"Entity": "France", "Year": 2010, "rank": 9, "renewable_percentage": 13.6240072491}, {"Entity": "France", "Year": 2011, "rank": 10, "renewable_percentage": 11.63553049}, {"Entity": "France", "Year": 2012, "rank": 9, "renewable_percentage": 15.0331522889}, {"Entity": "France", "Year": 2013, "rank": 7, "renewable_percentage": 17.2469424928}, {"Entity": "France", "Year": 2014, "rank": 9, "renewable_percentage": 16.6074992494}, {"Entity": "France", "Year": 2015, "rank": 9, "renewable_percentage": 16.002230276}, {"Entity": "France", "Year": 2016, "rank": 8, "renewable_percentage": 17.7212924013}, {"Entity": "France", "Year": 2017, "rank": 11, "renewable_percentage": 16.6576751547}, {"Entity": "France", "Year": 2018, "rank": 8, "renewable_percentage": 19.7315179827}, {"Entity": "France", "Year": 2019, "rank": 9, "renewable_percentage": 20.0116665488}, {"Entity": "France", "Year": 2020, "rank": 9, "renewable_percentage": 23.7610241821}, {"Entity": "Germany", "Year": 2000, "rank": 14, "renewable_percentage": 6.1977983575}, {"Entity": "Germany", "Year": 2001, "rank": 14, "renewable_percentage": 6.5132585197}, {"Entity": "Germany", "Year": 2002, "rank": 13, "renewable_percentage": 7.6431369854}, {"Entity": "Germany", "Year": 2003, "rank": 12, 
"renewable_percentage": 7.7455438643}, {"Entity": "Germany", "Year": 2004, "rank": 10, "renewable_percentage": 9.4989185292}, {"Entity": "Germany", "Year": 2005, "rank": 8, "renewable_percentage": 10.3356645637}, {"Entity": "Germany", "Year": 2006, "rank": 8, "renewable_percentage": 11.5129959829}, {"Entity": "Germany", "Year": 2007, "rank": 8, "renewable_percentage": 14.135471525}, {"Entity": "Germany", "Year": 2008, "rank": 8, "renewable_percentage": 14.8894504106}, {"Entity": "Germany", "Year": 2009, "rank": 7, "renewable_percentage": 16.2902842395}, {"Entity": "Germany", "Year": 2010, "rank": 7, "renewable_percentage": 16.8384989754}, {"Entity": "Germany", "Year": 2011, "rank": 5, "renewable_percentage": 20.4967199299}, {"Entity": "Germany", "Year": 2012, "rank": 5, "renewable_percentage": 23.056464482}, {"Entity": "Germany", "Year": 2013, "rank": 5, "renewable_percentage": 24.1368929731}, {"Entity": "Germany", "Year": 2014, "rank": 5, "renewable_percentage": 26.2182434067}, {"Entity": "Germany", "Year": 2015, "rank": 5, "renewable_percentage": 29.4721888318}, {"Entity": "Germany", "Year": 2016, "rank": 5, "renewable_percentage": 29.4990435013}, {"Entity": "Germany", "Year": 2017, "rank": 4, "renewable_percentage": 33.4855497593}, {"Entity": "Germany", "Year": 2018, "rank": 5, "renewable_percentage": 35.0976735365}, {"Entity": "Germany", "Year": 2019, "rank": 3, "renewable_percentage": 40.0890757144}, {"Entity": "Germany", "Year": 2020, "rank": 3, "renewable_percentage": 44.3324048937}, {"Entity": "India", "Year": 2000, "rank": 7, "renewable_percentage": 14.0481982534}, {"Entity": "India", "Year": 2001, "rank": 8, "renewable_percentage": 12.9997099422}, {"Entity": "India", "Year": 2002, "rank": 7, "renewable_percentage": 11.938193032}, {"Entity": "India", "Year": 2003, "rank": 8, "renewable_percentage": 11.695109147}, {"Entity": "India", "Year": 2004, "rank": 7, "renewable_percentage": 15.6375300722}, {"Entity": "India", "Year": 2005, "rank": 6, 
"renewable_percentage": 15.2543575768}, {"Entity": "India", "Year": 2006, "rank": 5, "renewable_percentage": 17.1352578483}, {"Entity": "India", "Year": 2007, "rank": 4, "renewable_percentage": 17.8019742295}, {"Entity": "India", "Year": 2008, "rank": 7, "renewable_percentage": 16.768266921}, {"Entity": "India", "Year": 2009, "rank": 8, "renewable_percentage": 15.269804822}, {"Entity": "India", "Year": 2010, "rank": 8, "renewable_percentage": 15.2122201244}, {"Entity": "India", "Year": 2011, "rank": 8, "renewable_percentage": 16.7911025145}, {"Entity": "India", "Year": 2012, "rank": 8, "renewable_percentage": 15.1350014654}, {"Entity": "India", "Year": 2013, "rank": 8, "renewable_percentage": 16.3941577818}, {"Entity": "India", "Year": 2014, "rank": 10, "renewable_percentage": 16.0092550039}, {"Entity": "India", "Year": 2015, "rank": 11, "renewable_percentage": 15.3718720687}, {"Entity": "India", "Year": 2016, "rank": 13, "renewable_percentage": 14.8548475703}, {"Entity": "India", "Year": 2017, "rank": 13, "renewable_percentage": 15.9669920335}, {"Entity": "India", "Year": 2018, "rank": 13, "renewable_percentage": 16.6949549709}, {"Entity": "India", "Year": 2019, "rank": 11, "renewable_percentage": 18.6915426873}, {"Entity": "India", "Year": 2020, "rank": 13, "renewable_percentage": 20.2059243238}, {"Entity": "Indonesia", "Year": 2000, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2001, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2002, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2003, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2004, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2005, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2006, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2007, "rank": null, "renewable_percentage": null}, {"Entity": 
"Indonesia", "Year": 2008, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2009, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2010, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2011, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2012, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2013, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2014, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2015, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2016, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2017, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2018, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2019, "rank": null, "renewable_percentage": null}, {"Entity": "Indonesia", "Year": 2020, "rank": null, "renewable_percentage": null}, {"Entity": "Italy", "Year": 2000, "rank": 4, "renewable_percentage": 18.900241501}, {"Entity": "Italy", "Year": 2001, "rank": 4, "renewable_percentage": 20.049431902}, {"Entity": "Italy", "Year": 2002, "rank": 4, "renewable_percentage": 17.4555571614}, {"Entity": "Italy", "Year": 2003, "rank": 4, "renewable_percentage": 16.4202116476}, {"Entity": "Italy", "Year": 2004, "rank": 4, "renewable_percentage": 18.2749380999}, {"Entity": "Italy", "Year": 2005, "rank": 4, "renewable_percentage": 16.3769782226}, {"Entity": "Italy", "Year": 2006, "rank": 6, "renewable_percentage": 16.5128639906}, {"Entity": "Italy", "Year": 2007, "rank": 6, "renewable_percentage": 15.5333485238}, {"Entity": "Italy", "Year": 2008, "rank": 6, "renewable_percentage": 18.6112}, {"Entity": "Italy", "Year": 2009, "rank": 4, "renewable_percentage": 24.0837332221}, {"Entity": "Italy", "Year": 2010, "rank": 4, "renewable_percentage": 25.8400187976}, 
{"Entity": "Italy", "Year": 2011, "rank": 4, "renewable_percentage": 27.6773203443}, {"Entity": "Italy", "Year": 2012, "rank": 3, "renewable_percentage": 31.1049649217}, {"Entity": "Italy", "Year": 2013, "rank": 4, "renewable_percentage": 39.0148744209}, {"Entity": "Italy", "Year": 2014, "rank": 3, "renewable_percentage": 43.4976931949}, {"Entity": "Italy", "Year": 2015, "rank": 3, "renewable_percentage": 38.7577860829}, {"Entity": "Italy", "Year": 2016, "rank": 4, "renewable_percentage": 37.6079387187}, {"Entity": "Italy", "Year": 2017, "rank": 3, "renewable_percentage": 35.4174479255}, {"Entity": "Italy", "Year": 2018, "rank": 3, "renewable_percentage": 39.8100142663}, {"Entity": "Italy", "Year": 2019, "rank": 4, "renewable_percentage": 39.7563068474}, {"Entity": "Italy", "Year": 2020, "rank": 6, "renewable_percentage": 42.0397741576}, {"Entity": "Japan", "Year": 2000, "rank": 9, "renewable_percentage": 10.5382436261}, {"Entity": "Japan", "Year": 2001, "rank": 9, "renewable_percentage": 10.447653504}, {"Entity": "Japan", "Year": 2002, "rank": 9, "renewable_percentage": 10.2477294843}, {"Entity": "Japan", "Year": 2003, "rank": 7, "renewable_percentage": 11.6993698448}, {"Entity": "Japan", "Year": 2004, "rank": 8, "renewable_percentage": 11.4198974767}, {"Entity": "Japan", "Year": 2005, "rank": 9, "renewable_percentage": 9.9068127192}, {"Entity": "Japan", "Year": 2006, "rank": 9, "renewable_percentage": 10.8554989442}, {"Entity": "Japan", "Year": 2007, "rank": 10, "renewable_percentage": 9.3897588285}, {"Entity": "Japan", "Year": 2008, "rank": 10, "renewable_percentage": 10.0196834738}, {"Entity": "Japan", "Year": 2009, "rank": 11, "renewable_percentage": 10.4667464874}, {"Entity": "Japan", "Year": 2010, "rank": 10, "renewable_percentage": 10.5269966826}, {"Entity": "Japan", "Year": 2011, "rank": 12, "renewable_percentage": 11.1272421632}, {"Entity": "Japan", "Year": 2012, "rank": 13, "renewable_percentage": 10.6143703421}, {"Entity": "Japan", "Year": 2013, "rank": 
13, "renewable_percentage": 11.7965798852}, {"Entity": "Japan", "Year": 2014, "rank": 13, "renewable_percentage": 13.2719619718}, {"Entity": "Japan", "Year": 2015, "rank": 10, "renewable_percentage": 15.6586817408}, {"Entity": "Japan", "Year": 2016, "rank": 10, "renewable_percentage": 15.6920107068}, {"Entity": "Japan", "Year": 2017, "rank": 10, "renewable_percentage": 17.3559698312}, {"Entity": "Japan", "Year": 2018, "rank": 9, "renewable_percentage": 18.144181175}, {"Entity": "Japan", "Year": 2019, "rank": 10, "renewable_percentage": 19.4223288251}, {"Entity": "Japan", "Year": 2020, "rank": 10, "renewable_percentage": 21.324925062}, {"Entity": "Kazakhstan", "Year": 2000, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2001, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2002, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2003, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2004, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2005, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2006, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2007, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2008, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2009, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2010, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2011, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2012, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2013, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2014, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2015, "rank": null, "renewable_percentage": null}, {"Entity": 
"Kazakhstan", "Year": 2016, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2017, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2018, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2019, "rank": null, "renewable_percentage": null}, {"Entity": "Kazakhstan", "Year": 2020, "rank": null, "renewable_percentage": null}, {"Entity": "Mexico", "Year": 2000, "rank": 3, "renewable_percentage": 22.9291160107}, {"Entity": "Mexico", "Year": 2001, "rank": 5, "renewable_percentage": 19.6649599841}, {"Entity": "Mexico", "Year": 2002, "rank": 5, "renewable_percentage": 17.4220963173}, {"Entity": "Mexico", "Year": 2003, "rank": 5, "renewable_percentage": 15.8536585366}, {"Entity": "Mexico", "Year": 2004, "rank": 5, "renewable_percentage": 17.3134463687}, {"Entity": "Mexico", "Year": 2005, "rank": 3, "renewable_percentage": 18.2780827246}, {"Entity": "Mexico", "Year": 2006, "rank": 3, "renewable_percentage": 18.4256091896}, {"Entity": "Mexico", "Year": 2007, "rank": 5, "renewable_percentage": 17.2761561168}, {"Entity": "Mexico", "Year": 2008, "rank": 3, "renewable_percentage": 21.5387105913}, {"Entity": "Mexico", "Year": 2009, "rank": 6, "renewable_percentage": 16.5369729069}, {"Entity": "Mexico", "Year": 2010, "rank": 5, "renewable_percentage": 19.4281608109}, {"Entity": "Mexico", "Year": 2011, "rank": 6, "renewable_percentage": 18.0916357408}, {"Entity": "Mexico", "Year": 2012, "rank": 7, "renewable_percentage": 16.5759438104}, {"Entity": "Mexico", "Year": 2013, "rank": 9, "renewable_percentage": 15.5492898914}, {"Entity": "Mexico", "Year": 2014, "rank": 7, "renewable_percentage": 19.8008201523}, {"Entity": "Mexico", "Year": 2015, "rank": 8, "renewable_percentage": 17.5976903451}, {"Entity": "Mexico", "Year": 2016, "rank": 9, "renewable_percentage": 17.4806943436}, {"Entity": "Mexico", "Year": 2017, "rank": 8, "renewable_percentage": 18.0759526428}, {"Entity": "Mexico", "Year": 2018, "rank": 
10, "renewable_percentage": 17.703752786}, {"Entity": "Mexico", "Year": 2019, "rank": 12, "renewable_percentage": 18.5487927565}, {"Entity": "Mexico", "Year": 2020, "rank": 11, "renewable_percentage": 21.2552224134}, {"Entity": "Poland", "Year": 2000, "rank": 16, "renewable_percentage": 1.6273222517}, {"Entity": "Poland", "Year": 2001, "rank": 16, "renewable_percentage": 1.934316727}, {"Entity": "Poland", "Year": 2002, "rank": 16, "renewable_percentage": 1.9439960699}, {"Entity": "Poland", "Year": 2003, "rank": 16, "renewable_percentage": 1.4999000067}, {"Entity": "Poland", "Year": 2004, "rank": 16, "renewable_percentage": 2.1016681991}, {"Entity": "Poland", "Year": 2005, "rank": 16, "renewable_percentage": 2.4830699774}, {"Entity": "Poland", "Year": 2006, "rank": 16, "renewable_percentage": 2.673730134}, {"Entity": "Poland", "Year": 2007, "rank": 16, "renewable_percentage": 3.4256513785}, {"Entity": "Poland", "Year": 2008, "rank": 16, "renewable_percentage": 4.2744438696}, {"Entity": "Poland", "Year": 2009, "rank": 16, "renewable_percentage": 5.7515388179}, {"Entity": "Poland", "Year": 2010, "rank": 14, "renewable_percentage": 6.9299363057}, {"Entity": "Poland", "Year": 2011, "rank": 14, "renewable_percentage": 8.0547205693}, {"Entity": "Poland", "Year": 2012, "rank": 14, "renewable_percentage": 10.4436057663}, {"Entity": "Poland", "Year": 2013, "rank": 14, "renewable_percentage": 10.4081508145}, {"Entity": "Poland", "Year": 2014, "rank": 14, "renewable_percentage": 12.5331481248}, {"Entity": "Poland", "Year": 2015, "rank": 13, "renewable_percentage": 13.8151485631}, {"Entity": "Poland", "Year": 2016, "rank": 14, "renewable_percentage": 13.7335179722}, {"Entity": "Poland", "Year": 2017, "rank": 14, "renewable_percentage": 14.1999646913}, {"Entity": "Poland", "Year": 2018, "rank": 15, "renewable_percentage": 12.7559148032}, {"Entity": "Poland", "Year": 2019, "rank": 14, "renewable_percentage": 15.6157998037}, {"Entity": "Poland", "Year": 2020, "rank": 14, 
"renewable_percentage": 17.9648720886}, {"Entity": "Saudi Arabia", "Year": 2000, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2001, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2002, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2003, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2004, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2005, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2006, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2007, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2008, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2009, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2010, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2011, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2012, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2013, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2014, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2015, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2016, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2017, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2018, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2019, "rank": null, "renewable_percentage": null}, {"Entity": "Saudi Arabia", "Year": 2020, "rank": null, "renewable_percentage": null}, {"Entity": "South Africa", "Year": 2000, "rank": 17, "renewable_percentage": 0.9110805721}, {"Entity": "South Africa", "Year": 2001, "rank": 17, 
"renewable_percentage": 1.2516536074}, {"Entity": "South Africa", "Year": 2002, "rank": 17, "renewable_percentage": 1.3802249619}, {"Entity": "South Africa", "Year": 2003, "rank": 17, "renewable_percentage": 0.545271261}, {"Entity": "South Africa", "Year": 2004, "rank": 17, "renewable_percentage": 0.5827199439}, {"Entity": "South Africa", "Year": 2005, "rank": 17, "renewable_percentage": 0.763458686}, {"Entity": "South Africa", "Year": 2006, "rank": 17, "renewable_percentage": 1.3863060017}, {"Entity": "South Africa", "Year": 2007, "rank": 17, "renewable_percentage": 0.5267209594}, {"Entity": "South Africa", "Year": 2008, "rank": 17, "renewable_percentage": 0.6895692269}, {"Entity": "South Africa", "Year": 2009, "rank": 17, "renewable_percentage": 0.8031088083}, {"Entity": "South Africa", "Year": 2010, "rank": 17, "renewable_percentage": 1.0330068318}, {"Entity": "South Africa", "Year": 2011, "rank": 17, "renewable_percentage": 1.0184465622}, {"Entity": "South Africa", "Year": 2012, "rank": 17, "renewable_percentage": 0.6890826069}, {"Entity": "South Africa", "Year": 2013, "rank": 17, "renewable_percentage": 0.6792168043}, {"Entity": "South Africa", "Year": 2014, "rank": 17, "renewable_percentage": 1.4288129861}, {"Entity": "South Africa", "Year": 2015, "rank": 17, "renewable_percentage": 2.6256790549}, {"Entity": "South Africa", "Year": 2016, "rank": 17, "renewable_percentage": 3.2586126531}, {"Entity": "South Africa", "Year": 2017, "rank": 17, "renewable_percentage": 4.2202606137}, {"Entity": "South Africa", "Year": 2018, "rank": 17, "renewable_percentage": 5.1554655529}, {"Entity": "South Africa", "Year": 2019, "rank": 17, "renewable_percentage": 5.3589699864}, {"Entity": "South Africa", "Year": 2020, "rank": 17, "renewable_percentage": 5.780581212}, {"Entity": "Spain", "Year": 2000, "rank": 6, "renewable_percentage": 15.6119862394}, {"Entity": "Spain", "Year": 2001, "rank": 3, "renewable_percentage": 21.1524434719}, {"Entity": "Spain", "Year": 2002, "rank": 6, 
"renewable_percentage": 13.8260180901}, {"Entity": "Spain", "Year": 2003, "rank": 3, "renewable_percentage": 21.667314419}, {"Entity": "Spain", "Year": 2004, "rank": 3, "renewable_percentage": 18.3190206468}, {"Entity": "Spain", "Year": 2005, "rank": 7, "renewable_percentage": 14.8597342333}, {"Entity": "Spain", "Year": 2006, "rank": 4, "renewable_percentage": 17.6623992413}, {"Entity": "Spain", "Year": 2007, "rank": 3, "renewable_percentage": 19.3347262296}, {"Entity": "Spain", "Year": 2008, "rank": 4, "renewable_percentage": 20.0051501593}, {"Entity": "Spain", "Year": 2009, "rank": 3, "renewable_percentage": 25.4107639008}, {"Entity": "Spain", "Year": 2010, "rank": 3, "renewable_percentage": 32.7922186819}, {"Entity": "Spain", "Year": 2011, "rank": 3, "renewable_percentage": 30.0408415417}, {"Entity": "Spain", "Year": 2012, "rank": 4, "renewable_percentage": 29.6047928652}, {"Entity": "Spain", "Year": 2013, "rank": 3, "renewable_percentage": 39.5850357054}, {"Entity": "Spain", "Year": 2014, "rank": 4, "renewable_percentage": 40.1032952644}, {"Entity": "Spain", "Year": 2015, "rank": 4, "renewable_percentage": 34.9899091826}, {"Entity": "Spain", "Year": 2016, "rank": 3, "renewable_percentage": 38.5818061138}, {"Entity": "Spain", "Year": 2017, "rank": 5, "renewable_percentage": 32.220593624}, {"Entity": "Spain", "Year": 2018, "rank": 4, "renewable_percentage": 38.2080329557}, {"Entity": "Spain", "Year": 2019, "rank": 6, "renewable_percentage": 37.280815091}, {"Entity": "Spain", "Year": 2020, "rank": 4, "renewable_percentage": 43.8108805298}, {"Entity": "Thailand", "Year": 2000, "rank": 12, "renewable_percentage": 7.1261029822}, {"Entity": "Thailand", "Year": 2001, "rank": 12, "renewable_percentage": 7.061527212}, {"Entity": "Thailand", "Year": 2002, "rank": 12, "renewable_percentage": 7.9444772593}, {"Entity": "Thailand", "Year": 2003, "rank": 13, "renewable_percentage": 7.6718362852}, {"Entity": "Thailand", "Year": 2004, "rank": 13, "renewable_percentage": 
6.5163549406}, {"Entity": "Thailand", "Year": 2005, "rank": 14, "renewable_percentage": 6.0325203252}, {"Entity": "Thailand", "Year": 2006, "rank": 13, "renewable_percentage": 7.5988547551}, {"Entity": "Thailand", "Year": 2007, "rank": 13, "renewable_percentage": 7.7085852479}, {"Entity": "Thailand", "Year": 2008, "rank": 13, "renewable_percentage": 6.5625458278}, {"Entity": "Thailand", "Year": 2009, "rank": 15, "renewable_percentage": 6.6263303689}, {"Entity": "Thailand", "Year": 2010, "rank": 16, "renewable_percentage": 5.7085828343}, {"Entity": "Thailand", "Year": 2011, "rank": 15, "renewable_percentage": 8.039961941}, {"Entity": "Thailand", "Year": 2012, "rank": 15, "renewable_percentage": 8.5396118358}, {"Entity": "Thailand", "Year": 2013, "rank": 16, "renewable_percentage": 7.6765035487}, {"Entity": "Thailand", "Year": 2014, "rank": 15, "renewable_percentage": 8.395728489}, {"Entity": "Thailand", "Year": 2015, "rank": 15, "renewable_percentage": 7.9949619145}, {"Entity": "Thailand", "Year": 2016, "rank": 15, "renewable_percentage": 8.9840234023}, {"Entity": "Thailand", "Year": 2017, "rank": 15, "renewable_percentage": 10.9570957096}, {"Entity": "Thailand", "Year": 2018, "rank": 14, "renewable_percentage": 14.1900054915}, {"Entity": "Thailand", "Year": 2019, "rank": 15, "renewable_percentage": 14.7001731284}, {"Entity": "Thailand", "Year": 2020, "rank": 15, "renewable_percentage": 13.7963737796}, {"Entity": "Ukraine", "Year": 2000, "rank": 13, "renewable_percentage": 6.5860921352}, {"Entity": "Ukraine", "Year": 2001, "rank": 13, "renewable_percentage": 6.9729761009}, {"Entity": "Ukraine", "Year": 2002, "rank": 14, "renewable_percentage": 5.5597165409}, {"Entity": "Ukraine", "Year": 2003, "rank": 14, "renewable_percentage": 5.1442841287}, {"Entity": "Ukraine", "Year": 2004, "rank": 14, "renewable_percentage": 6.4718162839}, {"Entity": "Ukraine", "Year": 2005, "rank": 13, "renewable_percentage": 6.6698940347}, {"Entity": "Ukraine", "Year": 2006, "rank": 14, 
"renewable_percentage": 6.68633235}, {"Entity": "Ukraine", "Year": 2007, "rank": 14, "renewable_percentage": 5.3380238605}, {"Entity": "Ukraine", "Year": 2008, "rank": 14, "renewable_percentage": 6.1377090041}, {"Entity": "Ukraine", "Year": 2009, "rank": 13, "renewable_percentage": 6.980762585}, {"Entity": "Ukraine", "Year": 2010, "rank": 13, "renewable_percentage": 7.0914098083}, {"Entity": "Ukraine", "Year": 2011, "rank": 16, "renewable_percentage": 5.7450628366}, {"Entity": "Ukraine", "Year": 2012, "rank": 16, "renewable_percentage": 5.6614236741}, {"Entity": "Ukraine", "Year": 2013, "rank": 15, "renewable_percentage": 7.8003200661}, {"Entity": "Ukraine", "Year": 2014, "rank": 16, "renewable_percentage": 5.5885262117}, {"Entity": "Ukraine", "Year": 2015, "rank": 16, "renewable_percentage": 4.3924771096}, {"Entity": "Ukraine", "Year": 2016, "rank": 16, "renewable_percentage": 5.6797249171}, {"Entity": "Ukraine", "Year": 2017, "rank": 16, "renewable_percentage": 7.0457194664}, {"Entity": "Ukraine", "Year": 2018, "rank": 16, "renewable_percentage": 8.228528092}, {"Entity": "Ukraine", "Year": 2019, "rank": 16, "renewable_percentage": 7.7754487096}, {"Entity": "Ukraine", "Year": 2020, "rank": 16, "renewable_percentage": 11.8440577364}, {"Entity": "United Kingdom", "Year": 2000, "rank": 15, "renewable_percentage": 2.6657406913}, {"Entity": "United Kingdom", "Year": 2001, "rank": 15, "renewable_percentage": 2.5001961451}, {"Entity": "United Kingdom", "Year": 2002, "rank": 15, "renewable_percentage": 2.8939157566}, {"Entity": "United Kingdom", "Year": 2003, "rank": 15, "renewable_percentage": 2.6854802003}, {"Entity": "United Kingdom", "Year": 2004, "rank": 15, "renewable_percentage": 3.6136880575}, {"Entity": "United Kingdom", "Year": 2005, "rank": 15, "renewable_percentage": 4.2815234434}, {"Entity": "United Kingdom", "Year": 2006, "rank": 15, "renewable_percentage": 4.6029890199}, {"Entity": "United Kingdom", "Year": 2007, "rank": 15, "renewable_percentage": 
5.0104331009}, {"Entity": "United Kingdom", "Year": 2008, "rank": 15, "renewable_percentage": 5.6776842324}, {"Entity": "United Kingdom", "Year": 2009, "rank": 14, "renewable_percentage": 6.7679854187}, {"Entity": "United Kingdom", "Year": 2010, "rank": 15, "renewable_percentage": 6.9092924441}, {"Entity": "United Kingdom", "Year": 2011, "rank": 13, "renewable_percentage": 9.6422505889}, {"Entity": "United Kingdom", "Year": 2012, "rank": 11, "renewable_percentage": 11.4273047189}, {"Entity": "United Kingdom", "Year": 2013, "rank": 10, "renewable_percentage": 14.9727052732}, {"Entity": "United Kingdom", "Year": 2014, "rank": 8, "renewable_percentage": 19.2476358104}, {"Entity": "United Kingdom", "Year": 2015, "rank": 6, "renewable_percentage": 24.6227709191}, {"Entity": "United Kingdom", "Year": 2016, "rank": 7, "renewable_percentage": 24.6788390627}, {"Entity": "United Kingdom", "Year": 2017, "rank": 6, "renewable_percentage": 29.4986571173}, {"Entity": "United Kingdom", "Year": 2018, "rank": 6, "renewable_percentage": 33.2919818457}, {"Entity": "United Kingdom", "Year": 2019, "rank": 5, "renewable_percentage": 37.4568630499}, {"Entity": "United Kingdom", "Year": 2020, "rank": 5, "renewable_percentage": 42.8603962651}, {"Entity": "United States", "Year": 2000, "rank": 10, "renewable_percentage": 9.2298992662}, {"Entity": "United States", "Year": 2001, "rank": 11, "renewable_percentage": 7.5132056541}, {"Entity": "United States", "Year": 2002, "rank": 10, "renewable_percentage": 8.749216358}, {"Entity": "United States", "Year": 2003, "rank": 10, "renewable_percentage": 9.0252110397}, {"Entity": "United States", "Year": 2004, "rank": 11, "renewable_percentage": 8.7334100887}, {"Entity": "United States", "Year": 2005, "rank": 12, "renewable_percentage": 8.7494640631}, {"Entity": "United States", "Year": 2006, "rank": 12, "renewable_percentage": 9.4184742052}, {"Entity": "United States", "Year": 2007, "rank": 12, "renewable_percentage": 8.3984096829}, {"Entity": 
"United States", "Year": 2008, "rank": 11, "renewable_percentage": 9.180943292}, {"Entity": "United States", "Year": 2009, "rank": 10, "renewable_percentage": 10.547689996}, {"Entity": "United States", "Year": 2010, "rank": 11, "renewable_percentage": 10.3180892283}, {"Entity": "United States", "Year": 2011, "rank": 9, "renewable_percentage": 12.4665249812}, {"Entity": "United States", "Year": 2012, "rank": 10, "renewable_percentage": 12.1841179804}, {"Entity": "United States", "Year": 2013, "rank": 12, "renewable_percentage": 12.8311154179}, {"Entity": "United States", "Year": 2014, "rank": 12, "renewable_percentage": 13.3503092033}, {"Entity": "United States", "Year": 2015, "rank": 14, "renewable_percentage": 13.6261352256}, {"Entity": "United States", "Year": 2016, "rank": 12, "renewable_percentage": 15.2880936304}, {"Entity": "United States", "Year": 2017, "rank": 9, "renewable_percentage": 17.4515276472}, {"Entity": "United States", "Year": 2018, "rank": 11, "renewable_percentage": 17.4499886946}, {"Entity": "United States", "Year": 2019, "rank": 13, "renewable_percentage": 18.2946243489}, {"Entity": "United States", "Year": 2020, "rank": 12, "renewable_percentage": 20.3156921037}], "metadata": {"Entity": {"type": "string", "semanticType": "Location"}, "Year": {"type": "number", "semanticType": "Year"}, "rank": {"type": "number", "semanticType": "Number"}, "renewable_percentage": {"type": "number", "semanticType": "Percentage"}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero 
cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Rank countries by renewable percentage within each year (highest percentage = rank 1)\n df_energy['rank'] = df_energy.groupby('Year')['renewable_percentage'].rank(\n method='dense', \n ascending=False\n )\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage', 'rank']].copy()\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". 
Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. \n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) 
to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 158029.9988, 190729.9957, 209229.9957, ..., 6993180.176, 9861099.609, 9874660.156, 10096009.77\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 90.14, 91.79, 92.54, 804.95\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Transform the data into a long format where different energy sources (fossil fuels, nuclear, renewables) are unpivoted into separate rows. Each row should contain the year, entity (country), energy source type, and the corresponding energy value in TWh. 
This will enable visualization of energy generation trends over time by source and country.\",\n \"display_instruction\": \"Show **energy generation** trends by **source** and **Entity** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"source\", \"energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n },\n \"reason\": \"The user wants to visualize energy data across different sources with a line chart faceted by source. The original data has energy sources in separate columns (fossil fuels, nuclear, renewables), so we need to unpivot these columns into a long format with 'source' and 'energy' fields. The chart_encodings provided by the user are appropriate for this visualization goal, using faceting (column) to separate different energy sources.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Unpivot the energy source columns into long format\n transformed_df = pd.melt(\n df_energy,\n id_vars=['Year', 'Entity'],\n value_vars=[\n 'Electricity from fossil fuels (TWh)',\n 'Electricity from nuclear (TWh)',\n 'Electricity from renewables (TWh)'\n ],\n var_name='source',\n value_name='energy'\n )\n \n # Clean up the source names to be more readable\n transformed_df['source'] = transformed_df['source'].str.replace('Electricity from ', '').str.replace(' (TWh)', '')\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Entity Year energy source\n0 Australia 2000 181.05 fossil fuels\n1 Australia 2001 194.33 fossil fuels\n2 Australia 2002 197.29 fossil fuels\n3 Australia 2003 195.13 fossil fuels\n4 Australia 2004 203.66 fossil fuels\n5 Australia 2005 195.95 fossil fuels\n6 Australia 2006 198.72 fossil fuels\n7 Australia 2007 208.59 fossil fuels\n8 Australia 2008 211.06 fossil fuels\n9 Australia 2009 216.42 fossil fuels\n......\n\nUpdate the code above 
based on the following instruction:\n\n{\n \"followup_instruction\": \"show renewable percentage for each country\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"energy\",\n \"color\": \"Entity\",\n \"column\": \"source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage for each country and year. The renewable percentage is computed as the ratio of renewable energy to total energy (sum of fossil fuels, nuclear, and renewables) multiplied by 100. The output should contain year, entity (country), and the renewable percentage value.\",\n \"display_instruction\": \"Show **renewable percentage** for each **Entity** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"renewable_percentage\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"renewable_percentage\",\n \"color\": \"Entity\"\n },\n \"reason\": \"The user wants to see renewable percentage trends, which requires calculating the proportion of renewable energy relative to total energy generation. This is a single metric per country per year, so we don't need the 'column' faceting by source anymore. 
The chart_encodings are updated to reflect that we're now showing a percentage metric rather than absolute energy values by source.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage']].copy()\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Entity Year renewable_percentage\n0 Australia 2000 8.634437\n1 Australia 2001 8.218014\n2 Australia 2002 8.083302\n3 Australia 2003 8.659832\n4 Australia 2004 8.701305\n5 Australia 2005 9.156236\n6 Australia 2006 9.635760\n7 Australia 2007 9.119031\n8 Australia 2008 8.054890\n9 Australia 2009 7.804379\n......\n\nUpdate the code above based on the following instruction:\n\n{\n \"followup_instruction\": \"rank countries by their renewable percentage\",\n \"chart_type\": \"Line Chart\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"renewable_percentage\",\n \"color\": \"Entity\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage for each country and year, then rank countries based on their renewable percentage. The ranking should be computed for each year, showing which countries have the highest to lowest renewable percentage. 
The output should contain year, entity (country), renewable percentage, and the rank of each country within that year.\",\n \"display_instruction\": \"Show **countries ranked** by **renewable percentage** over **Year**\",\n \"output_fields\": [\"Year\", \"Entity\", \"renewable_percentage\", \"rank\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"renewable_percentage\",\n \"color\": \"Entity\"\n },\n \"reason\": \"The user wants to rank countries by their renewable percentage, which requires adding a rank field that orders countries within each year based on their renewable percentage (highest percentage = rank 1). This allows comparison of countries' relative positions in renewable energy adoption over time. The chart_encodings remain the same as the previous step since we're still visualizing renewable percentage trends, but now with ranking information available.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Rank countries by renewable percentage within each year (highest percentage = rank 1)\n df_energy['rank'] = df_energy.groupby('Year')['renewable_percentage'].rank(\n method='dense', \n ascending=False\n )\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage', 'rank']].copy()\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "table-45", "resultTableId": 
"table-78", "chart": {"id": "chart-1760738441793", "chartType": "Line Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "concept-renewable_percentage-1760738424337"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-45", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "rank countries by their renewable percentage", "displayContent": "Show **countries ranked** by **renewable percentage** over **Year**"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "- Calculate **total_energy** for each country-year by summing electricity generated from **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n- Compute **renewable_percentage** by dividing **Electricity from renewables (TWh)** by **total_energy** and multiplying by **100**\n- Replace any infinite values in **renewable_percentage** with NaN to handle cases where total energy is zero\n- Assign a **rank** to each country within each **Year** based on **renewable_percentage**, where rank **1** represents the highest renewable percentage (using dense ranking method)\n- Return a dataset containing **Year**, **Entity** (country name), **renewable_percentage**, and **rank**", "concepts": [{"explanation": "The proportion of a country's total electricity generation that comes from renewable sources, expressed as a percentage. 
Calculated as: \\[ \\text{renewable\\_percentage} = \\frac{\\text{Electricity from renewables (TWh)}}{\\text{total\\_energy}} \\times 100 \\]", "field": "renewable_percentage"}], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... 
\\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. 
\n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 190729.9957, 227580.0018, 233600.0061, ..., 9282549.805, 9861099.609, 9874660.156, 10096009.77\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 88.76, 90.14, 91.79, 92.54\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # 
Calculate total energy for each row\n df_energy['total_energy'] = (\n df_energy['Electricity from fossil fuels (TWh)'] + \n df_energy['Electricity from nuclear (TWh)'] + \n df_energy['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_energy['renewable_percentage'] = (\n df_energy['Electricity from renewables (TWh)'] / df_energy['total_energy'] * 100\n )\n \n # Handle division by zero cases\n df_energy['renewable_percentage'] = df_energy['renewable_percentage'].replace([np.inf, -np.inf], np.nan)\n \n # Rank countries by renewable percentage within each year (highest percentage = rank 1)\n df_energy['rank'] = df_energy.groupby('Year')['renewable_percentage'].rank(\n method='dense', \n ascending=False\n )\n \n # Select only the required columns\n transformed_df = df_energy[['Year', 'Entity', 'renewable_percentage', 'rank']].copy()\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\n- Calculate **total_energy** for each country-year by summing electricity generated from **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n- Compute **renewable_percentage** by dividing **Electricity from renewables (TWh)** by **total_energy** and multiplying by **100**\n- Replace any infinite values in **renewable_percentage** with NaN to handle cases where total energy is zero\n- Assign a **rank** to each country within each **Year** based on **renewable_percentage**, where rank **1** represents the highest renewable percentage (using dense ranking method)\n- Return a dataset containing **Year**, **Entity** (country name), **renewable_percentage**, and **rank**\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"renewable_percentage\",\n \"explanation\": \"The proportion of a country's total electricity generation that comes from renewable sources, expressed as a percentage. 
Calculated as: \\\\[ \\\\text{renewable\\\\_percentage} = \\\\frac{\\\\text{Electricity from renewables (TWh)}}{\\\\text{total\\\\_energy}} \\\\times 100 \\\\]\"\n }\n]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-97", "displayId": "renewable-elec", "names": ["Electricity from renewables (TWh)", "Entity", "Year"], "rows": [{"Electricity from renewables (TWh)": 17.11, "Entity": "Australia", "Year": "2000"}, {"Electricity from renewables (TWh)": 63.99, "Entity": "Australia", "Year": "2020"}, {"Electricity from renewables (TWh)": 308.77, "Entity": "Brazil", "Year": "2000"}, {"Electricity from renewables (TWh)": 520.01, "Entity": "Brazil", "Year": "2020"}, {"Electricity from renewables (TWh)": 363.7, "Entity": "Canada", "Year": "2000"}, {"Electricity from renewables (TWh)": 429.24, "Entity": "Canada", "Year": "2020"}, {"Electricity from renewables (TWh)": 225.56, "Entity": "China", "Year": "2000"}, {"Electricity from renewables (TWh)": 2184.94, "Entity": "China", "Year": "2020"}, {"Electricity from renewables (TWh)": 67.83, "Entity": "France", "Year": "2000"}, {"Electricity from renewables (TWh)": 125.28, "Entity": "France", "Year": "2020"}, {"Electricity from renewables (TWh)": 35.47, "Entity": "Germany", "Year": "2000"}, {"Electricity from renewables (TWh)": 251.48, "Entity": "Germany", "Year": "2020"}, {"Electricity from renewables (TWh)": 80.27, "Entity": "India", "Year": "2000"}, {"Electricity from renewables (TWh)": 315.76, "Entity": "India", "Year": "2020"}, {"Electricity from renewables (TWh)": 19.6, "Entity": "Indonesia", "Year": "2000"}, {"Electricity from renewables (TWh)": 52.91, "Entity": "Indonesia", "Year": "2020"}, {"Electricity from renewables (TWh)": 50.87, "Entity": "Italy", "Year": "2000"}, {"Electricity from renewables (TWh)": 116.9, "Entity": "Italy", "Year": "2020"}, {"Electricity from renewables (TWh)": 104.16, "Entity": "Japan", "Year": "2000"}, 
{"Electricity from renewables (TWh)": 205.6, "Entity": "Japan", "Year": "2020"}, {"Electricity from renewables (TWh)": 7.53, "Entity": "Kazakhstan", "Year": "2000"}, {"Electricity from renewables (TWh)": 11.94, "Entity": "Kazakhstan", "Year": "2020"}, {"Electricity from renewables (TWh)": 44.51, "Entity": "Mexico", "Year": "2000"}, {"Electricity from renewables (TWh)": 69.19, "Entity": "Mexico", "Year": "2020"}, {"Electricity from renewables (TWh)": 2.33, "Entity": "Poland", "Year": "2000"}, {"Electricity from renewables (TWh)": 28.23, "Entity": "Poland", "Year": "2020"}, {"Electricity from renewables (TWh)": 0, "Entity": "Saudi Arabia", "Year": "2000"}, {"Electricity from renewables (TWh)": 0.21, "Entity": "Saudi Arabia", "Year": "2020"}, {"Electricity from renewables (TWh)": 1.79, "Entity": "South Africa", "Year": "2000"}, {"Electricity from renewables (TWh)": 12.83, "Entity": "South Africa", "Year": "2020"}, {"Electricity from renewables (TWh)": 34.49, "Entity": "Spain", "Year": "2000"}, {"Electricity from renewables (TWh)": 113.79, "Entity": "Spain", "Year": "2020"}, {"Electricity from renewables (TWh)": 6.38, "Entity": "Thailand", "Year": "2000"}, {"Electricity from renewables (TWh)": 24.73, "Entity": "Thailand", "Year": "2020"}, {"Electricity from renewables (TWh)": 11.28, "Entity": "Ukraine", "Year": "2000"}, {"Electricity from renewables (TWh)": 17.56, "Entity": "Ukraine", "Year": "2020"}, {"Electricity from renewables (TWh)": 9.98, "Entity": "United Kingdom", "Year": "2000"}, {"Electricity from renewables (TWh)": 131.74, "Entity": "United Kingdom", "Year": "2020"}, {"Electricity from renewables (TWh)": 350.93, "Entity": "United States", "Year": "2000"}, {"Electricity from renewables (TWh)": 821.4, "Entity": "United States", "Year": "2020"}], "metadata": {"Electricity from renewables (TWh)": {"type": "number", "semanticType": "Number"}, "Entity": {"type": "string", "semanticType": "Location"}, "Year": {"type": "number", "semanticType": "Year"}}, "derive": 
{"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. 
The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. \n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) 
to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5736319.824, 5738290.039, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 61.99, 62.21, 63.02, 63.71\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"compare each countries renewable between 2000 and 2020\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Energy\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Compare renewable energy (Electricity from renewables) for each country between the years 2000 and 2020. 
Filter the data to include only years 2000 and 2020, and prepare the output with country names, year, and renewable energy values for visualization as a grouped bar chart.\",\n \"display_instruction\": \"Compare **renewable energy** by **country** between **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Electricity from renewables (TWh)\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to compare renewable energy across countries between 2000 and 2020. The output field 'Renewable Energy' mentioned in the original chart_encodings maps to 'Electricity from renewables (TWh)' in the dataset. To create a grouped bar chart comparing these two years, we need to filter the data to only include years 2000 and 2020, and output the Entity (country), Year, and the renewable energy values. The chart_encodings are updated to use the actual column name from the dataset.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "global-energy-20-small.csv", "resultTableId": "table-97", "chart": {"id": "chart-1760738769634", "chartType": "Grouped Bar Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "y": {"fieldID": "concept-1760738743125"}, "color": 
{"channel": "color", "bin": false, "fieldID": "original--global-energy-20-small.csv--Year"}, "column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}}, "tableRef": "global-energy-20-small.csv", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "compare each countries renewable between 2000 and 2020", "displayContent": "Compare **renewable energy** by **country** between **2000** and **2020**"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "The code performs a straightforward comparison analysis of renewable energy generation across countries:\n\n- **Filters** the dataset to include only data from years **2000** and **2020**, creating a 20-year comparison window\n- **Selects** three key columns: `Entity` (country name), `Year`, and `Electricity from renewables (TWh)` (renewable energy generation)\n- **Converts** the `Year` field to string format to treat it as a categorical variable for visualization purposes\n- **Sorts** the results by `Entity` and `Year` to organize countries alphabetically with their respective year data grouped together\n- **Returns** a cleaned dataset showing how renewable electricity generation has changed for each country between 2000 and 2020", "concepts": [], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. 
They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. 
If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. 
\n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 10006669.92, 10502929.69, 10707219.73, nan\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 56.18, 61.99, 63.02, 63.71\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data 
for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\nThe code performs a straightforward comparison analysis of renewable energy generation across countries:\n\n- **Filters** the dataset to include only data from years **2000** and **2020**, creating a 20-year comparison window\n- **Selects** three key columns: `Entity` (country name), `Year`, and `Electricity from renewables (TWh)` (renewable energy generation)\n- **Converts** the `Year` field to string format to treat it as a categorical variable for visualization purposes\n- **Sorts** the results by `Entity` and `Year` to organize countries alphabetically with their respective year data grouped together\n- **Returns** a cleaned dataset showing how renewable electricity generation has changed for each country between 2000 and 2020\n\n[CONCEPTS EXPLANATION]\n\n```json\n[]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-27", "displayId": "renewable-energy1", "names": ["Entity", "Renewable Percentage", "Year"], "rows": [{"Entity": "Australia", "Renewable Percentage": 8.6344368187, "Year": "2000"}, {"Entity": "Australia", "Renewable Percentage": 25.5031684668, "Year": "2020"}, {"Entity": "Brazil", "Renewable Percentage": 90.1307723743, "Year": "2000"}, {"Entity": "Brazil", "Renewable Percentage": 84.6411771408, "Year": "2020"}, {"Entity": "Canada", "Renewable 
Percentage": 61.8095917882, "Year": "2000"}, {"Entity": "Canada", "Renewable Percentage": 68.7796436354, "Year": "2020"}, {"Entity": "China", "Renewable Percentage": 16.639126586, "Year": "2000"}, {"Entity": "China", "Renewable Percentage": 28.2464606924, "Year": "2020"}, {"Entity": "France", "Renewable Percentage": 12.7117691154, "Year": "2000"}, {"Entity": "France", "Renewable Percentage": 23.7610241821, "Year": "2020"}, {"Entity": "Germany", "Renewable Percentage": 6.1977983575, "Year": "2000"}, {"Entity": "Germany", "Renewable Percentage": 44.3324048937, "Year": "2020"}, {"Entity": "Global Average", "Renewable Percentage": 16.4213212559, "Year": "2000"}, {"Entity": "Global Average", "Renewable Percentage": 29.2955247263, "Year": "2020"}, {"Entity": "India", "Renewable Percentage": 14.0481982534, "Year": "2000"}, {"Entity": "India", "Renewable Percentage": 20.2059243238, "Year": "2020"}, {"Entity": "Indonesia", "Renewable Percentage": null, "Year": "2000"}, {"Entity": "Indonesia", "Renewable Percentage": null, "Year": "2020"}, {"Entity": "Italy", "Renewable Percentage": 18.900241501, "Year": "2000"}, {"Entity": "Italy", "Renewable Percentage": 42.0397741576, "Year": "2020"}, {"Entity": "Japan", "Renewable Percentage": 10.5382436261, "Year": "2000"}, {"Entity": "Japan", "Renewable Percentage": 21.324925062, "Year": "2020"}, {"Entity": "Kazakhstan", "Renewable Percentage": null, "Year": "2000"}, {"Entity": "Kazakhstan", "Renewable Percentage": null, "Year": "2020"}, {"Entity": "Mexico", "Renewable Percentage": 22.9291160107, "Year": "2000"}, {"Entity": "Mexico", "Renewable Percentage": 21.2552224134, "Year": "2020"}, {"Entity": "Poland", "Renewable Percentage": 1.6273222517, "Year": "2000"}, {"Entity": "Poland", "Renewable Percentage": 17.9648720886, "Year": "2020"}, {"Entity": "Saudi Arabia", "Renewable Percentage": null, "Year": "2000"}, {"Entity": "Saudi Arabia", "Renewable Percentage": null, "Year": "2020"}, {"Entity": "South Africa", "Renewable Percentage": 
0.9110805721, "Year": "2000"}, {"Entity": "South Africa", "Renewable Percentage": 5.780581212, "Year": "2020"}, {"Entity": "Spain", "Renewable Percentage": 15.6119862394, "Year": "2000"}, {"Entity": "Spain", "Renewable Percentage": 43.8108805298, "Year": "2020"}, {"Entity": "Thailand", "Renewable Percentage": 7.1261029822, "Year": "2000"}, {"Entity": "Thailand", "Renewable Percentage": 13.7963737796, "Year": "2020"}, {"Entity": "Ukraine", "Renewable Percentage": 6.5860921352, "Year": "2000"}, {"Entity": "Ukraine", "Renewable Percentage": 11.8440577364, "Year": "2020"}, {"Entity": "United Kingdom", "Renewable Percentage": 2.6657406913, "Year": "2000"}, {"Entity": "United Kingdom", "Renewable Percentage": 42.8603962651, "Year": "2020"}, {"Entity": "United States", "Renewable Percentage": 9.2298992662, "Year": "2000"}, {"Entity": "United States", "Renewable Percentage": 20.3156921037, "Year": "2020"}], "metadata": {"Entity": {"type": "string", "semanticType": "Name"}, "Renewable Percentage": {"type": "number", "semanticType": "Percentage"}, "Year": {"type": "number", "semanticType": "Year"}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n df_filtered = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Calculate total electricity generation\n df_filtered['Total Electricity'] = (\n df_filtered['Electricity from fossil fuels (TWh)'] + \n df_filtered['Electricity from nuclear (TWh)'] + \n df_filtered['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_filtered['Renewable Percentage'] = (\n df_filtered['Electricity from renewables (TWh)'] / df_filtered['Total Electricity'] * 100\n )\n \n # Select required columns for individual countries\n country_data = df_filtered[['Entity', 'Year', 'Renewable Percentage']].copy()\n \n # Calculate global average for each year\n global_avg = df_filtered.groupby('Year').apply(\n lambda 
x: pd.Series({\n 'Entity': 'Global Average',\n 'Year': x['Year'].iloc[0],\n 'Renewable Percentage': (\n x['Electricity from renewables (TWh)'].sum() / x['Total Electricity'].sum() * 100\n )\n })\n ).reset_index(drop=True)\n \n # Combine country data with global average\n transformed_df = pd.concat([country_data, global_avg], ignore_index=True)\n \n # Convert Year to string for better visualization\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. 
The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. \n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) 
to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5736319.824, 5738290.039, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 61.99, 62.21, 63.02, 63.71\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"compare each countries renewable between 2000 and 2020\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Energy\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Compare renewable energy (Electricity from renewables) for each country between the years 2000 and 2020. 
Filter the data to include only years 2000 and 2020, and prepare the output with country names, year, and renewable energy values for visualization as a grouped bar chart.\",\n \"display_instruction\": \"Compare **renewable energy** by **country** between **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Electricity from renewables (TWh)\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to compare renewable energy across countries between 2000 and 2020. The output field 'Renewable Energy' mentioned in the original chart_encodings maps to 'Electricity from renewables (TWh)' in the dataset. To create a grouped bar chart comparing these two years, we need to filter the data to only include years 2000 and 2020, and output the Entity (country), Year, and the renewable energy values. The chart_encodings are updated to use the actual column name from the dataset.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Electricity from renewables (TWh) Entity Year\n0 17.11 Australia 2000\n1 63.99 Australia 2020\n2 308.77 Brazil 2000\n3 520.01 Brazil 2020\n4 363.70 Canada 2000\n5 429.24 Canada 2020\n6 225.56 China 2000\n7 2184.94 China 2020\n8 67.83 
France 2000\n9 125.28 France 2020\n......\n\nUpdate the code above based on the following instruction:\n\n{\n \"followup_instruction\": \"show their renewable percentage instead, and add an entity to show global average\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage (proportion of renewables to total electricity generation from all sources) for each country between 2000 and 2020. Add a 'Global Average' entity that represents the worldwide average renewable percentage for each year. The output should include Entity (with 'Global Average' added), Year, and the renewable percentage.\",\n \"display_instruction\": \"Show **renewable energy percentage** by **country** with **global average** for **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Renewable Percentage\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Percentage\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to see renewable percentage instead of absolute values, and wants to add a global average entity for comparison. Renewable percentage is calculated as the ratio of renewable energy to total electricity generation (fossil fuels + nuclear + renewables). A 'Global Average' entity needs to be computed by aggregating all countries' data for each year. 
The y-axis encoding is updated to reflect the new metric 'Renewable Percentage'.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n df_filtered = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Calculate total electricity generation\n df_filtered['Total Electricity'] = (\n df_filtered['Electricity from fossil fuels (TWh)'] + \n df_filtered['Electricity from nuclear (TWh)'] + \n df_filtered['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_filtered['Renewable Percentage'] = (\n df_filtered['Electricity from renewables (TWh)'] / df_filtered['Total Electricity'] * 100\n )\n \n # Select required columns for individual countries\n country_data = df_filtered[['Entity', 'Year', 'Renewable Percentage']].copy()\n \n # Calculate global average for each year\n global_avg = df_filtered.groupby('Year').apply(\n lambda x: pd.Series({\n 'Entity': 'Global Average',\n 'Year': x['Year'].iloc[0],\n 'Renewable Percentage': (\n x['Electricity from renewables (TWh)'].sum() / x['Total Electricity'].sum() * 100\n )\n })\n ).reset_index(drop=True)\n \n # Combine country data with global average\n transformed_df = pd.concat([country_data, global_avg], ignore_index=True)\n \n # Convert Year to string for better visualization\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "table-97", "resultTableId": "table-27", "chart": {"id": "chart-1760738813383", "chartType": "Grouped Bar Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "y": {"fieldID": "original--global-energy-20-small.csv--Electricity from renewables (TWh)"}, "color": {"fieldID": "original--global-energy-20-small.csv--Year"}, 
"column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-97", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "show their renewable percentage instead, and add an entity to show global average", "displayContent": "Show **renewable energy percentage** by **country** with **global average** for **2000** and **2020**"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "The code performs the following transformation steps:\n\n1. **Filter data** to include only records from **2000** and **2020**\n2. Calculate **Total Electricity** generation by summing three sources: **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n3. Compute **Renewable Percentage** for each country and year by dividing **Electricity from renewables** by **Total Electricity** and multiplying by 100\n4. Extract relevant columns: **Entity** (country name), **Year**, and **Renewable Percentage**\n5. Calculate **Global Average** renewable percentage for each year by:\n - Summing all countries' **renewable electricity** generation\n - Dividing by the sum of all countries' **total electricity** generation\n - Multiplying by 100 to get percentage\n6. Combine individual country data with the **Global Average** statistics\n7. Convert **Year** values to string format\n8. Sort results by **Entity** name and **Year**", "concepts": [{"explanation": "The sum of electricity generated from all three sources (fossil fuels, nuclear, and renewables) measured in terawatt-hours (TWh). 
This represents the total electricity production capacity for each country.", "field": "Total Electricity"}, {"explanation": "The proportion of electricity generated from renewable sources relative to total electricity production, expressed as a percentage: \\( \\text{Renewable Percentage} = \\frac{\\text{Electricity from renewables}}{\\text{Total Electricity}} \\times 100 \\). This metric indicates the extent of renewable energy adoption in a country's electricity mix.", "field": "Renewable Percentage"}], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. 
Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. 
\n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 10096009.77, 10502929.69, 10707219.73, nan\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., nan, nan, nan, nan\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 
2000 and 2020\n df_filtered = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Calculate total electricity generation\n df_filtered['Total Electricity'] = (\n df_filtered['Electricity from fossil fuels (TWh)'] + \n df_filtered['Electricity from nuclear (TWh)'] + \n df_filtered['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_filtered['Renewable Percentage'] = (\n df_filtered['Electricity from renewables (TWh)'] / df_filtered['Total Electricity'] * 100\n )\n \n # Select required columns for individual countries\n country_data = df_filtered[['Entity', 'Year', 'Renewable Percentage']].copy()\n \n # Calculate global average for each year\n global_avg = df_filtered.groupby('Year').apply(\n lambda x: pd.Series({\n 'Entity': 'Global Average',\n 'Year': x['Year'].iloc[0],\n 'Renewable Percentage': (\n x['Electricity from renewables (TWh)'].sum() / x['Total Electricity'].sum() * 100\n )\n })\n ).reset_index(drop=True)\n \n # Combine country data with global average\n transformed_df = pd.concat([country_data, global_avg], ignore_index=True)\n \n # Convert Year to string for better visualization\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\nThe code performs the following transformation steps:\n\n1. **Filter data** to include only records from **2000** and **2020**\n2. Calculate **Total Electricity** generation by summing three sources: **fossil fuels**, **nuclear**, and **renewables** (all measured in TWh)\n3. Compute **Renewable Percentage** for each country and year by dividing **Electricity from renewables** by **Total Electricity** and multiplying by 100\n4. Extract relevant columns: **Entity** (country name), **Year**, and **Renewable Percentage**\n5. 
Calculate **Global Average** renewable percentage for each year by:\n - Summing all countries' **renewable electricity** generation\n - Dividing by the sum of all countries' **total electricity** generation\n - Multiplying by 100 to get percentage\n6. Combine individual country data with the **Global Average** statistics\n7. Convert **Year** values to string format\n8. Sort results by **Entity** name and **Year**\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"Total Electricity\",\n \"explanation\": \"The sum of electricity generated from all three sources (fossil fuels, nuclear, and renewables) measured in terawatt-hours (TWh). This represents the total electricity production capacity for each country.\"\n },\n {\n \"field\": \"Renewable Percentage\",\n \"explanation\": \"The proportion of electricity generated from renewable sources relative to total electricity production, expressed as a percentage: \\\\( \\\\text{Renewable Percentage} = \\\\frac{\\\\text{Electricity from renewables}}{\\\\text{Total Electricity}} \\\\times 100 \\\\). 
This metric indicates the extent of renewable energy adoption in a country's electricity mix.\"\n }\n]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-81", "displayId": "energy-source1", "names": ["Energy", "Energy Source", "Year"], "rows": [{"Energy": 7160.71, "Energy Source": "Fossil Fuels", "Year": 2000}, {"Energy": 1996.65, "Energy Source": "Nuclear", "Year": 2000}, {"Energy": 1742.56, "Energy Source": "Renewables", "Year": 2000}, {"Energy": 7273.89, "Energy Source": "Fossil Fuels", "Year": 2001}, {"Energy": 2037.54, "Energy Source": "Nuclear", "Year": 2001}, {"Energy": 1690.11, "Energy Source": "Renewables", "Year": 2001}, {"Energy": 7621.07, "Energy Source": "Fossil Fuels", "Year": 2002}, {"Energy": 2042.18, "Energy Source": "Nuclear", "Year": 2002}, {"Energy": 1757.63, "Energy Source": "Renewables", "Year": 2002}, {"Energy": 8043.86, "Energy Source": "Fossil Fuels", "Year": 2003}, {"Energy": 1998.52, "Energy Source": "Nuclear", "Year": 2003}, {"Energy": 1804.52, "Energy Source": "Renewables", "Year": 2003}, {"Energy": 8399.72, "Energy Source": "Fossil Fuels", "Year": 2004}, {"Energy": 2095, "Energy Source": "Nuclear", "Year": 2004}, {"Energy": 1952.72, "Energy Source": "Renewables", "Year": 2004}, {"Energy": 8828.43, "Energy Source": "Fossil Fuels", "Year": 2005}, {"Energy": 2094.4, "Energy Source": "Nuclear", "Year": 2005}, {"Energy": 2025.26, "Energy Source": "Renewables", "Year": 2005}, {"Energy": 9183.05, "Energy Source": "Fossil Fuels", "Year": 2006}, {"Energy": 2120.38, "Energy Source": "Nuclear", "Year": 2006}, {"Energy": 2165.94, "Energy Source": "Renewables", "Year": 2006}, {"Energy": 9853.09, "Energy Source": "Fossil Fuels", "Year": 2007}, {"Energy": 2067.04, "Energy Source": "Nuclear", "Year": 2007}, {"Energy": 2256.79, "Energy Source": "Renewables", "Year": 2007}, {"Energy": 9817.15, "Energy Source": "Fossil Fuels", "Year": 2008}, {"Energy": 2043.94, 
"Energy Source": "Nuclear", "Year": 2008}, {"Energy": 2496.03, "Energy Source": "Renewables", "Year": 2008}, {"Energy": 9686.86, "Energy Source": "Fossil Fuels", "Year": 2009}, {"Energy": 2017.25, "Energy Source": "Nuclear", "Year": 2009}, {"Energy": 2563.95, "Energy Source": "Renewables", "Year": 2009}, {"Energy": 10427.03, "Energy Source": "Fossil Fuels", "Year": 2010}, {"Energy": 2083.37, "Energy Source": "Nuclear", "Year": 2010}, {"Energy": 2802.89, "Energy Source": "Renewables", "Year": 2010}, {"Energy": 10974.83, "Energy Source": "Fossil Fuels", "Year": 2011}, {"Energy": 1956, "Energy Source": "Nuclear", "Year": 2011}, {"Energy": 2997.29, "Energy Source": "Renewables", "Year": 2011}, {"Energy": 11277.49, "Energy Source": "Fossil Fuels", "Year": 2012}, {"Energy": 1788.26, "Energy Source": "Nuclear", "Year": 2012}, {"Energy": 3226.1, "Energy Source": "Renewables", "Year": 2012}, {"Energy": 11561.86, "Energy Source": "Fossil Fuels", "Year": 2013}, {"Energy": 1813, "Energy Source": "Nuclear", "Year": 2013}, {"Energy": 3473.9, "Energy Source": "Renewables", "Year": 2013}, {"Energy": 11761.51, "Energy Source": "Fossil Fuels", "Year": 2014}, {"Energy": 1847.87, "Energy Source": "Nuclear", "Year": 2014}, {"Energy": 3753.03, "Energy Source": "Renewables", "Year": 2014}, {"Energy": 11653.61, "Energy Source": "Fossil Fuels", "Year": 2015}, {"Energy": 1886.61, "Energy Source": "Nuclear", "Year": 2015}, {"Energy": 3903.7, "Energy Source": "Renewables", "Year": 2015}, {"Energy": 11785.35, "Energy Source": "Fossil Fuels", "Year": 2016}, {"Energy": 1906.92, "Energy Source": "Nuclear", "Year": 2016}, {"Energy": 4186.83, "Energy Source": "Renewables", "Year": 2016}, {"Energy": 12003.24, "Energy Source": "Fossil Fuels", "Year": 2017}, {"Energy": 1942.94, "Energy Source": "Nuclear", "Year": 2017}, {"Energy": 4500.22, "Energy Source": "Renewables", "Year": 2017}, {"Energy": 12445.24, "Energy Source": "Fossil Fuels", "Year": 2018}, {"Energy": 2016.82, "Energy Source": "Nuclear", 
"Year": 2018}, {"Energy": 4834.65, "Energy Source": "Renewables", "Year": 2018}, {"Energy": 12355.91, "Energy Source": "Fossil Fuels", "Year": 2019}, {"Energy": 2073.87, "Energy Source": "Nuclear", "Year": 2019}, {"Energy": 5146.11, "Energy Source": "Renewables", "Year": 2019}, {"Energy": 12032.75, "Energy Source": "Fossil Fuels", "Year": 2020}, {"Energy": 1974.46, "Energy Source": "Nuclear", "Year": 2020}, {"Energy": 5497.73, "Energy Source": "Renewables", "Year": 2020}], "metadata": {"Energy": {"type": "number", "semanticType": "Number"}, "Energy Source": {"type": "string", "semanticType": "String", "levels": ["Fossil Fuels", "Nuclear", "Renewables"]}, "Year": {"type": "number", "semanticType": "Year"}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Group by Year and sum electricity generation from all countries\n global_energy = df_energy.groupby('Year').agg({\n 'Electricity from fossil fuels (TWh)': 'sum',\n 'Electricity from nuclear (TWh)': 'sum',\n 'Electricity from renewables (TWh)': 'sum'\n }).reset_index()\n \n # Rename columns for clarity\n global_energy.columns = ['Year', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n global_energy,\n id_vars=['Year'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # Sort by Year and Energy Source\n transformed_df = transformed_df.sort_values(['Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and 
\"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. 
\n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) 
to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5736319.824, 5738290.039, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 61.99, 62.21, 63.02, 63.71\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"compare each countries renewable between 2000 and 2020\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Energy\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Compare renewable energy (Electricity from renewables) for each country between the years 2000 and 2020. 
Filter the data to include only years 2000 and 2020, and prepare the output with country names, year, and renewable energy values for visualization as a grouped bar chart.\",\n \"display_instruction\": \"Compare **renewable energy** by **country** between **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Electricity from renewables (TWh)\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to compare renewable energy across countries between 2000 and 2020. The output field 'Renewable Energy' mentioned in the original chart_encodings maps to 'Electricity from renewables (TWh)' in the dataset. To create a grouped bar chart comparing these two years, we need to filter the data to only include years 2000 and 2020, and output the Entity (country), Year, and the renewable energy values. The chart_encodings are updated to use the actual column name from the dataset.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Electricity from renewables (TWh) Entity Year\n0 17.11 Australia 2000\n1 63.99 Australia 2020\n2 308.77 Brazil 2000\n3 520.01 Brazil 2020\n4 363.70 Canada 2000\n5 429.24 Canada 2020\n6 225.56 China 2000\n7 2184.94 China 2020\n8 67.83 
France 2000\n9 125.28 France 2020\n......\n\nUpdate the code above based on the following instruction:\n\n{\n \"followup_instruction\": \"show their renewable percentage instead, and add an entity to show global average\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage (proportion of renewables to total electricity generation from all sources) for each country between 2000 and 2020. Add a 'Global Average' entity that represents the worldwide average renewable percentage for each year. The output should include Entity (with 'Global Average' added), Year, and the renewable percentage.\",\n \"display_instruction\": \"Show **renewable energy percentage** by **country** with **global average** for **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Renewable Percentage\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Percentage\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to see renewable percentage instead of absolute values, and wants to add a global average entity for comparison. Renewable percentage is calculated as the ratio of renewable energy to total electricity generation (fossil fuels + nuclear + renewables). A 'Global Average' entity needs to be computed by aggregating all countries' data for each year. 
The y-axis encoding is updated to reflect the new metric 'Renewable Percentage'.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n df_filtered = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Calculate total electricity generation\n df_filtered['Total Electricity'] = (\n df_filtered['Electricity from fossil fuels (TWh)'] + \n df_filtered['Electricity from nuclear (TWh)'] + \n df_filtered['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_filtered['Renewable Percentage'] = (\n df_filtered['Electricity from renewables (TWh)'] / df_filtered['Total Electricity'] * 100\n )\n \n # Select required columns for individual countries\n country_data = df_filtered[['Entity', 'Year', 'Renewable Percentage']].copy()\n \n # Calculate global average for each year\n global_avg = df_filtered.groupby('Year').apply(\n lambda x: pd.Series({\n 'Entity': 'Global Average',\n 'Year': x['Year'].iloc[0],\n 'Renewable Percentage': (\n x['Electricity from renewables (TWh)'].sum() / x['Total Electricity'].sum() * 100\n )\n })\n ).reset_index(drop=True)\n \n # Combine country data with global average\n transformed_df = pd.concat([country_data, global_avg], ignore_index=True)\n \n # Convert Year to string for better visualization\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Entity Renewable Percentage Year\n0 Australia 8.634437 2000\n1 Australia 25.503168 2020\n2 Brazil 90.130772 2000\n3 Brazil 84.641177 2020\n4 Canada 61.809592 2000\n5 Canada 68.779644 2020\n6 China 16.639127 2000\n7 China 28.246461 2020\n8 France 12.711769 2000\n9 France 23.761024 2020\n......\n\nUpdate the code above 
based on the following instruction:\n\n{\n \"followup_instruction\": \"show global energy composition over time\",\n \"chart_type\": \"Custom Area\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Show the global energy composition over time by aggregating electricity generation from all countries for each year, broken down by energy source (fossil fuels, nuclear, and renewables). The output should be in long format with Year, Energy Source, and Energy values to support a stacked area chart visualization.\",\n \"display_instruction\": \"Show global **energy composition** by **energy source** over time\",\n \"output_fields\": [\"Year\", \"Energy Source\", \"Energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\"\n },\n \"reason\": \"The user wants to visualize global energy composition over time, which requires aggregating electricity generation data across all countries for each year and breaking it down by energy source type. The data needs to be transformed into long format where each row represents a year-energy source combination, with the total energy value for that source globally. 
This follows up from the previous renewable percentage analysis by showing the overall energy mix trends rather than focusing on specific countries or percentages.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Group by Year and sum electricity generation from all countries\n global_energy = df_energy.groupby('Year').agg({\n 'Electricity from fossil fuels (TWh)': 'sum',\n 'Electricity from nuclear (TWh)': 'sum',\n 'Electricity from renewables (TWh)': 'sum'\n }).reset_index()\n \n # Rename columns for clarity\n global_energy.columns = ['Year', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n global_energy,\n id_vars=['Year'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # Sort by Year and Energy Source\n transformed_df = transformed_df.sort_values(['Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "table-27", "resultTableId": "table-81", "chart": {"id": "chart-1760738942326", "chartType": "Custom Area", "encodingMap": {"x": {"channel": "x", "bin": false, "fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"channel": "y", "bin": false, "fieldID": "concept-1760738917347"}, "x2": {"channel": "x2", "bin": false}, "y2": {"channel": "y2", "bin": false}, "color": {"channel": "color", "bin": false, "fieldID": "concept-1760738921927"}, "column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-27", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "show global energy composition over time", "displayContent": "Show global **energy composition** by **energy source** over time"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "- Group 
the energy data by **Year** and aggregate electricity generation values across all countries by summing **Electricity from fossil fuels (TWh)**, **Electricity from nuclear (TWh)**, and **Electricity from renewables (TWh)**\n- Rename the aggregated columns to simplified labels: **Fossil Fuels**, **Nuclear**, and **Renewables**\n- Transform the data from wide format to long format by unpivoting the three energy source columns into two columns: **Energy Source** (containing the type of energy) and **Energy** (containing the generation value in TWh)\n- Sort the resulting dataset by **Year** and **Energy Source** for consistent ordering", "concepts": [], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. 
Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... \\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. 
If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. \n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 158029.9988, 190729.9957, 209229.9957, ..., nan, 4956060.059, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., nan, nan, nan, nan\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year 
Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Group by Year and sum electricity generation from all countries\n global_energy = df_energy.groupby('Year').agg({\n 'Electricity from fossil fuels (TWh)': 'sum',\n 'Electricity from nuclear (TWh)': 'sum',\n 'Electricity from renewables (TWh)': 'sum'\n }).reset_index()\n \n # Rename columns for clarity\n global_energy.columns = ['Year', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n global_energy,\n id_vars=['Year'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # Sort by Year and Energy Source\n transformed_df = transformed_df.sort_values(['Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\n- Group the energy data by **Year** and aggregate electricity generation values across all countries by summing **Electricity from fossil fuels (TWh)**, **Electricity from nuclear (TWh)**, and **Electricity from renewables (TWh)**\n- Rename the aggregated columns to simplified labels: **Fossil Fuels**, **Nuclear**, and **Renewables**\n- Transform the data from wide format to long format by unpivoting the three energy source columns into two columns: **Energy Source** (containing the type of energy) and **Energy** (containing the generation value in TWh)\n- Sort the resulting dataset by **Year** and **Energy Source** for consistent 
ordering\n\n[CONCEPTS EXPLANATION]\n\n```json\n[]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}, {"kind": "table", "id": "table-10", "displayId": "energy-source2", "names": ["Energy", "Energy Source", "Entity", "Year"], "rows": [{"Energy": 1113.3, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2000}, {"Energy": 16.74, "Energy Source": "Nuclear", "Entity": "China", "Year": 2000}, {"Energy": 225.56, "Energy Source": "Renewables", "Entity": "China", "Year": 2000}, {"Energy": 1182.59, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2001}, {"Energy": 17.47, "Energy Source": "Nuclear", "Entity": "China", "Year": 2001}, {"Energy": 280.73, "Energy Source": "Renewables", "Entity": "China", "Year": 2001}, {"Energy": 1337.46, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2002}, {"Energy": 25.13, "Energy Source": "Nuclear", "Entity": "China", "Year": 2002}, {"Energy": 291.41, "Energy Source": "Renewables", "Entity": "China", "Year": 2002}, {"Energy": 1579.96, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2003}, {"Energy": 43.34, "Energy Source": "Nuclear", "Entity": "China", "Year": 2003}, {"Energy": 287.28, "Energy Source": "Renewables", "Entity": "China", "Year": 2003}, {"Energy": 1795.41, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2004}, {"Energy": 50.47, "Energy Source": "Nuclear", "Entity": "China", "Year": 2004}, {"Energy": 357.43, "Energy Source": "Renewables", "Entity": "China", "Year": 2004}, {"Energy": 2042.8, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2005}, {"Energy": 53.09, "Energy Source": "Nuclear", "Entity": "China", "Year": 2005}, {"Energy": 404.37, "Energy Source": "Renewables", "Entity": "China", "Year": 2005}, {"Energy": 2364.16, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2006}, {"Energy": 54.84, "Energy Source": "Nuclear", "Entity": "China", "Year": 2006}, {"Energy": 446.72, "Energy 
Source": "Renewables", "Entity": "China", "Year": 2006}, {"Energy": 2718.7, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2007}, {"Energy": 62.13, "Energy Source": "Nuclear", "Entity": "China", "Year": 2007}, {"Energy": 500.71, "Energy Source": "Renewables", "Entity": "China", "Year": 2007}, {"Energy": 2762.29, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2008}, {"Energy": 68.39, "Energy Source": "Nuclear", "Entity": "China", "Year": 2008}, {"Energy": 665.08, "Energy Source": "Renewables", "Entity": "China", "Year": 2008}, {"Energy": 2980.2, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2009}, {"Energy": 70.05, "Energy Source": "Nuclear", "Entity": "China", "Year": 2009}, {"Energy": 664.39, "Energy Source": "Renewables", "Entity": "China", "Year": 2009}, {"Energy": 3326.19, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2010}, {"Energy": 74.74, "Energy Source": "Nuclear", "Entity": "China", "Year": 2010}, {"Energy": 786.38, "Energy Source": "Renewables", "Entity": "China", "Year": 2010}, {"Energy": 3811.77, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2011}, {"Energy": 87.2, "Energy Source": "Nuclear", "Entity": "China", "Year": 2011}, {"Energy": 792.38, "Energy Source": "Renewables", "Entity": "China", "Year": 2011}, {"Energy": 3869.38, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2012}, {"Energy": 98.32, "Energy Source": "Nuclear", "Entity": "China", "Year": 2012}, {"Energy": 999.56, "Energy Source": "Renewables", "Entity": "China", "Year": 2012}, {"Energy": 4203.77, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2013}, {"Energy": 111.5, "Energy Source": "Nuclear", "Entity": "China", "Year": 2013}, {"Energy": 1093.37, "Energy Source": "Renewables", "Entity": "China", "Year": 2013}, {"Energy": 4345.86, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2014}, {"Energy": 133.22, "Energy Source": "Nuclear", "Entity": "China", "Year": 2014}, {"Energy": 1289.23, 
"Energy Source": "Renewables", "Entity": "China", "Year": 2014}, {"Energy": 4222.76, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2015}, {"Energy": 171.38, "Energy Source": "Nuclear", "Entity": "China", "Year": 2015}, {"Energy": 1393.66, "Energy Source": "Renewables", "Entity": "China", "Year": 2015}, {"Energy": 4355, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2016}, {"Energy": 213.18, "Energy Source": "Nuclear", "Entity": "China", "Year": 2016}, {"Energy": 1522.79, "Energy Source": "Renewables", "Entity": "China", "Year": 2016}, {"Energy": 4643.1, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2017}, {"Energy": 248.1, "Energy Source": "Nuclear", "Entity": "China", "Year": 2017}, {"Energy": 1667.06, "Energy Source": "Renewables", "Entity": "China", "Year": 2017}, {"Energy": 4990.28, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2018}, {"Energy": 295, "Energy Source": "Nuclear", "Entity": "China", "Year": 2018}, {"Energy": 1835.32, "Energy Source": "Renewables", "Entity": "China", "Year": 2018}, {"Energy": 5098.22, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2019}, {"Energy": 348.7, "Energy Source": "Nuclear", "Entity": "China", "Year": 2019}, {"Energy": 2014.57, "Energy Source": "Renewables", "Entity": "China", "Year": 2019}, {"Energy": 5184.13, "Energy Source": "Fossil Fuels", "Entity": "China", "Year": 2020}, {"Energy": 366.2, "Energy Source": "Nuclear", "Entity": "China", "Year": 2020}, {"Energy": 2184.94, "Energy Source": "Renewables", "Entity": "China", "Year": 2020}, {"Energy": 475.35, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2000}, {"Energy": 15.77, "Energy Source": "Nuclear", "Entity": "India", "Year": 2000}, {"Energy": 80.27, "Energy Source": "Renewables", "Entity": "India", "Year": 2000}, {"Energy": 491.01, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2001}, {"Energy": 18.89, "Energy Source": "Nuclear", "Entity": "India", "Year": 2001}, {"Energy": 
76.19, "Energy Source": "Renewables", "Entity": "India", "Year": 2001}, {"Energy": 517.51, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2002}, {"Energy": 19.35, "Energy Source": "Nuclear", "Entity": "India", "Year": 2002}, {"Energy": 72.78, "Energy Source": "Renewables", "Entity": "India", "Year": 2002}, {"Energy": 545.36, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2003}, {"Energy": 18.14, "Energy Source": "Nuclear", "Entity": "India", "Year": 2003}, {"Energy": 74.63, "Energy Source": "Renewables", "Entity": "India", "Year": 2003}, {"Energy": 567.86, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2004}, {"Energy": 21.26, "Energy Source": "Nuclear", "Entity": "India", "Year": 2004}, {"Energy": 109.2, "Energy Source": "Renewables", "Entity": "India", "Year": 2004}, {"Energy": 579.32, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2005}, {"Energy": 17.73, "Energy Source": "Nuclear", "Entity": "India", "Year": 2005}, {"Energy": 107.47, "Energy Source": "Renewables", "Entity": "India", "Year": 2005}, {"Energy": 599.24, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2006}, {"Energy": 17.63, "Energy Source": "Nuclear", "Entity": "India", "Year": 2006}, {"Energy": 127.56, "Energy Source": "Renewables", "Entity": "India", "Year": 2006}, {"Energy": 636.68, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2007}, {"Energy": 17.83, "Energy Source": "Nuclear", "Entity": "India", "Year": 2007}, {"Energy": 141.75, "Energy Source": "Renewables", "Entity": "India", "Year": 2007}, {"Energy": 674.27, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2008}, {"Energy": 15.23, "Energy Source": "Nuclear", "Entity": "India", "Year": 2008}, {"Energy": 138.91, "Energy Source": "Renewables", "Entity": "India", "Year": 2008}, {"Energy": 728.56, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2009}, {"Energy": 16.82, "Energy Source": "Nuclear", "Entity": "India", "Year": 2009}, {"Energy": 
134.33, "Energy Source": "Renewables", "Entity": "India", "Year": 2009}, {"Energy": 771.78, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2010}, {"Energy": 23.08, "Energy Source": "Nuclear", "Entity": "India", "Year": 2010}, {"Energy": 142.61, "Energy Source": "Renewables", "Entity": "India", "Year": 2010}, {"Energy": 828.16, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2011}, {"Energy": 32.22, "Energy Source": "Nuclear", "Entity": "India", "Year": 2011}, {"Energy": 173.62, "Energy Source": "Renewables", "Entity": "India", "Year": 2011}, {"Energy": 893.45, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2012}, {"Energy": 33.14, "Energy Source": "Nuclear", "Entity": "India", "Year": 2012}, {"Energy": 165.25, "Energy Source": "Renewables", "Entity": "India", "Year": 2012}, {"Energy": 924.93, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2013}, {"Energy": 33.31, "Energy Source": "Nuclear", "Entity": "India", "Year": 2013}, {"Energy": 187.9, "Energy Source": "Renewables", "Entity": "India", "Year": 2013}, {"Energy": 1025.29, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2014}, {"Energy": 34.69, "Energy Source": "Nuclear", "Entity": "India", "Year": 2014}, {"Energy": 202.04, "Energy Source": "Renewables", "Entity": "India", "Year": 2014}, {"Energy": 1080.44, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2015}, {"Energy": 38.31, "Energy Source": "Nuclear", "Entity": "India", "Year": 2015}, {"Energy": 203.21, "Energy Source": "Renewables", "Entity": "India", "Year": 2015}, {"Energy": 1155.52, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2016}, {"Energy": 37.9, "Energy Source": "Nuclear", "Entity": "India", "Year": 2016}, {"Energy": 208.21, "Energy Source": "Renewables", "Entity": "India", "Year": 2016}, {"Energy": 1198.85, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2017}, {"Energy": 37.41, "Energy Source": "Nuclear", "Entity": "India", "Year": 2017}, 
{"Energy": 234.9, "Energy Source": "Renewables", "Entity": "India", "Year": 2017}, {"Energy": 1276.32, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2018}, {"Energy": 39.05, "Energy Source": "Nuclear", "Entity": "India", "Year": 2018}, {"Energy": 263.61, "Energy Source": "Renewables", "Entity": "India", "Year": 2018}, {"Energy": 1273.59, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2019}, {"Energy": 45.16, "Energy Source": "Nuclear", "Entity": "India", "Year": 2019}, {"Energy": 303.16, "Energy Source": "Renewables", "Entity": "India", "Year": 2019}, {"Energy": 1202.34, "Energy Source": "Fossil Fuels", "Entity": "India", "Year": 2020}, {"Energy": 44.61, "Energy Source": "Nuclear", "Entity": "India", "Year": 2020}, {"Energy": 315.76, "Energy Source": "Renewables", "Entity": "India", "Year": 2020}, {"Energy": 2697.28, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2000}, {"Energy": 753.89, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2000}, {"Energy": 350.93, "Energy Source": "Renewables", "Entity": "United States", "Year": 2000}, {"Energy": 2678.68, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2001}, {"Energy": 768.83, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2001}, {"Energy": 280.06, "Energy Source": "Renewables", "Entity": "United States", "Year": 2001}, {"Energy": 2727.83, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2002}, {"Energy": 780.06, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2002}, {"Energy": 336.34, "Energy Source": "Renewables", "Entity": "United States", "Year": 2002}, {"Energy": 2756.03, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2003}, {"Energy": 763.73, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2003}, {"Energy": 349.18, "Energy Source": "Renewables", "Entity": "United States", "Year": 2003}, {"Energy": 2818.28, "Energy Source": "Fossil Fuels", 
"Entity": "United States", "Year": 2004}, {"Energy": 788.53, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2004}, {"Energy": 345.14, "Energy Source": "Renewables", "Entity": "United States", "Year": 2004}, {"Energy": 2899.96, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2005}, {"Energy": 781.99, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2005}, {"Energy": 353.04, "Energy Source": "Renewables", "Entity": "United States", "Year": 2005}, {"Energy": 2878.56, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2006}, {"Energy": 787.22, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2006}, {"Energy": 381.16, "Energy Source": "Renewables", "Entity": "United States", "Year": 2006}, {"Energy": 2988.24, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2007}, {"Energy": 806.42, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2007}, {"Energy": 347.91, "Energy Source": "Renewables", "Entity": "United States", "Year": 2007}, {"Energy": 2924.21, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2008}, {"Energy": 806.21, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2008}, {"Energy": 377.11, "Energy Source": "Renewables", "Entity": "United States", "Year": 2008}, {"Energy": 2725.41, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2009}, {"Energy": 798.85, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2009}, {"Energy": 415.56, "Energy Source": "Renewables", "Entity": "United States", "Year": 2009}, {"Energy": 2882.49, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2010}, {"Energy": 806.97, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2010}, {"Energy": 424.48, "Energy Source": "Renewables", "Entity": "United States", "Year": 2010}, {"Energy": 2788.93, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2011}, {"Energy": 790.2, "Energy 
Source": "Nuclear", "Entity": "United States", "Year": 2011}, {"Energy": 509.74, "Energy Source": "Renewables", "Entity": "United States", "Year": 2011}, {"Energy": 2779.02, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2012}, {"Energy": 769.33, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2012}, {"Energy": 492.32, "Energy Source": "Renewables", "Entity": "United States", "Year": 2012}, {"Energy": 2746.21, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2013}, {"Energy": 789.02, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2013}, {"Energy": 520.38, "Energy Source": "Renewables", "Entity": "United States", "Year": 2013}, {"Energy": 2752.01, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2014}, {"Energy": 797.17, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2014}, {"Energy": 546.83, "Energy Source": "Renewables", "Entity": "United States", "Year": 2014}, {"Energy": 2730.32, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2015}, {"Energy": 797.18, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2015}, {"Energy": 556.49, "Energy Source": "Renewables", "Entity": "United States", "Year": 2015}, {"Energy": 2656.96, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2016}, {"Energy": 805.69, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2016}, {"Energy": 624.91, "Energy Source": "Renewables", "Entity": "United States", "Year": 2016}, {"Energy": 2540.17, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2017}, {"Energy": 804.95, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2017}, {"Energy": 707.19, "Energy Source": "Renewables", "Entity": "United States", "Year": 2017}, {"Energy": 2661.3, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2018}, {"Energy": 807.08, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2018}, 
{"Energy": 733.17, "Energy Source": "Renewables", "Entity": "United States", "Year": 2018}, {"Energy": 2588.21, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2019}, {"Energy": 809.41, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2019}, {"Energy": 760.76, "Energy Source": "Renewables", "Entity": "United States", "Year": 2019}, {"Energy": 2431.9, "Energy Source": "Fossil Fuels", "Entity": "United States", "Year": 2020}, {"Energy": 789.88, "Energy Source": "Nuclear", "Entity": "United States", "Year": 2020}, {"Energy": 821.4, "Energy Source": "Renewables", "Entity": "United States", "Year": 2020}], "metadata": {"Energy": {"type": "number", "semanticType": "Number"}, "Energy Source": {"type": "string", "semanticType": "String", "levels": ["Fossil Fuels", "Nuclear", "Renewables"]}, "Entity": {"type": "string", "semanticType": "Location"}, "Year": {"type": "number", "semanticType": "Year"}}, "derive": {"code": "\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total CO2 emissions by country across all years\n total_co2_by_country = df_energy.groupby('Entity')['Value_co2_emissions_kt_by_country'].sum().reset_index()\n \n # Get top 3 CO2 emitting countries\n top3_countries = total_co2_by_country.nlargest(3, 'Value_co2_emissions_kt_by_country')['Entity'].tolist()\n \n # Filter data for top 3 countries\n df_top3 = df_energy[df_energy['Entity'].isin(top3_countries)].copy()\n \n # Select relevant columns and rename for clarity\n df_top3 = df_top3[['Year', 'Entity', 'Electricity from fossil fuels (TWh)', \n 'Electricity from nuclear (TWh)', 'Electricity from renewables (TWh)']].copy()\n \n df_top3.columns = ['Year', 'Entity', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n df_top3,\n id_vars=['Year', 'Entity'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # 
Sort by Entity, Year, and Energy Source\n transformed_df = transformed_df.sort_values(['Entity', 'Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n", "source": ["global-energy-20-small.csv"], "dialog": [{"content": "You are a data scientist to help user to transform data that will be used for visualization.\nThe user will provide you information about what data would be needed, and your job is to create a python function based on the input data summary, transformation instruction and expected fields.\nThe users' instruction includes \"chart_type\" and \"chart_encodings\" that describe the visualization they want, and natural language instructions \"goal\" that describe what data is needed.\n\n**Important:**\n- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state.\n- NEVER create formulas that could be used to discriminate based on age. Ageism of any form (explicit and implicit) is strictly prohibited.\n- If above issue occurs, generate columns with np.nan.\n\nConcretely, you should first refine users' goal and then create a python function in the output section based off the [CONTEXT] and [GOAL]:\n\n 1. First, refine users' [GOAL]. The main objective in this step is to check if \"chart_type\" and \"chart_encodings\" provided by the user are sufficient to achieve their \"goal\". Concretely:\n - based on the user's \"goal\" and \"chart_type\" and \"chart_encodings\", elaborate the goal into a \"detailed_instruction\".\n - \"display_instruction\" is a short verb phrase describing the users' goal. 
\n - it would be a short verbal description of user intent as a verb phrase (<12 words).\n - generate it based on detailed_instruction and the suggested chart_type and chart_encodings, but don't need to mention the chart details.\n - should capture key computation ideas: by reading the display, the user can understand the purpose and what's derived from the data.\n - if the user specification follows up the previous instruction, the 'display_instruction' should only describe how it builds up the previous instruction without repeating information from previous steps.\n - the phrase can be presented in different styles, e.g., question (what's xxx), instruction (show xxx), description, etc.\n - if you mention column names from the input or the output data, highlight the text in **bold**.\n * the column can either be a column in the input data, or a new column that will be computed in the output data.\n * the mention don't have to be exact match, it can be semantically matching, e.g., if you mentioned \"average score\" in the text while the column to be computed is \"Avg_Score\", you should still highlight \"**average score**\" in the text.\n - determine \"output_fields\", the desired fields that the output data should have to achieve the user's goal, it's a good idea to include intermediate fields here.\n - then decide \"chart_encodings\", which maps visualization channels (x, y, color, size, opacity, facet, etc.) 
to a subset of \"output_fields\" that will be visualized, \n - the \"chart_encodings\" should be created to support the user's \"chart_type\".\n - first, determine whether the user has provided sufficient fields in \"chart_encodings\" that are needed to achieve their goal:\n - if the user's \"chart_encodings\" are sufficient, simply copy it.\n - if the user didn't provide sufficient fields in \"chart_encodings\", add missing fields in \"chart_encodings\" (ordered them based on whether the field will be used in x,y axes or legends);\n - \"chart_encodings\" should only include fields that will be visualized (do not include other intermediate fields from \"output_fields\") \n - when adding new fields to \"chart_encodings\", be efficient and add only a minimal number of fields that are needed to achive the user's goal. \n - generally, the total number of fields in \"chart_encodings\" should be no more than 3 for x,y,legend.\n - if the user's \"chart_encodings\" is sufficient but can be optimized, you can reorder encodings to visualize the data more effectively.\n - sometimes, user may provide instruction to update visualizations fields they provided. 
You should leverage the user's goal to resolve the conflict and decide the final \"chart_encodings\"\n - e.g., they may mention \"use B metric instead\" while A metric is in provided fields, in this case, you should update \"chart_encodings\" to update A metric with B metric.\n - guide on statistical analysis:\n - when the user asks for forecasting or regression analysis, you should consider the following:\n - the output should be a long format table where actual x, y pairs and predicted x, y pairs are included in the X, Y columns, they are differentiated with a third column \"is_predicted\" that is a boolean field.\n - i.e., if the user ask for forecasting based on two columns T and Y, the output should be three columns: T, Y, is_predicted, where\n - T, Y columns contain BOTH original values from the data and predicted values from the data.\n - is_predicted is a boolean field to indicate whether the x, y pairs are original values from the data or predicted / regression values from the data.\n - the recommended chart should be line chart (time series) or scatter plot (quantitative x, y)\n - if the user asks for forecasting, it's good to include predicted x, y pairs for both x in the original data and future x values (i.e., combine regression and forecasting results)\n - in this case, is_predicted should be of three values 'original', 'regression', 'forecasting'\n - when the user asks for clustering:\n - the output should be a long format table where actual x, y pairs with a third column \"cluster_id\" that indicates the cluster id of the data point.\n - the recommended chart should be scatter plot (quantitative x, y)\n \n Prepare the result in the following json format:\n\n```\n{\n \"detailed_instruction\": \"...\" // string, elaborate user instruction with details if the user\n \"display_instruction\": \"...\" // string, the short verb phrase describing the users' goal.\n \"output_fields\": [...] 
// string[], describe the desired output fields that the output data should have based on the user's goal, it's a good idea to preserve intermediate fields here (i.e., the goal of transformed data)\n \"chart_encodings\": {\n \"x\": \"\",\n \"y\": \"\",\n \"color\": \"\",\n \"size\": \"\",\n \"opacity\": \"\",\n \"facet\": \"\",\n ... // other visualization channels user used\n } // object: map visualization channels (x, y, color, size, opacity, facet, etc.) to a subset of \"output_fields\" that will be visualized.\n \"reason\": \"...\" // string, explain why this refinement is made\n}\n```\n\n 2. Then, write a python function based on the refined goal, the function input is a dataframe \"df\" (or multiple dataframes based on tables presented in the [CONTEXT] section) and the output is the transformed dataframe \"transformed_df\". \"transformed_df\" should contain all \"output_fields\" from the refined goal.\nThe python function must follow the template provided in [TEMPLATE], do not import any other libraries or modify function name. The function should be as simple as possible and easily readable.\nIf there is no data transformation needed based on \"output_fields\", the transformation function can simply \"return df\".\n\n[TEMPLATE]\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\nfrom sklearn import ... 
# import necessary libraries from sklearn if needed\n\ndef transform_data(df1, df2, ...): \n # complete the template here\n return transformed_df\n```\n\nnote: \n- if the user provided one table, then it should be `def transform_data(df1)`, if the user provided multiple tables, then it should be `def transform_data(df1, df2, ...)` and you should consider the join between tables to derive the output.\n- **VERY IMPORTANT** the number of arguments in the function must match the number of tables provided, and the order of arguments must match the order of tables provided.\n- try to use intuitive table names to refer to the input dataframes, for example, if the user provided two tables city and weather, you can use `transform_data(df_city, df_weather)` to refer to the two dataframes, as long as the number and order of the arguments match the number and order of the tables provided.\n- datetime objects handling:\n - if the output field is year, convert it to number, if it is year-month / year-month-day, convert it to string object (e.g., \"2020-01\" / \"2020-01-01\").\n - if the output is time only: convert hour to number if it's just the hour (e.g., 10), but convert hour:min or h:m:s to string object (e.g., \"10:30\", \"10:30:45\")\n - never return datetime object directly, convert it to either number (if it only contains year) or string so it's readable.\n\n 3. 
The output must only contain a json object representing the refined goal and a python code block representing the transformation code, do not add any extra text explanation.\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5736319.824, 5738290.039, nan, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 61.99, 62.21, 63.02, 63.71\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[GOAL]\n\n{\n \"instruction\": \"compare each countries renewable between 2000 and 2020\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Energy\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Compare renewable energy (Electricity from renewables) for each country between the years 2000 and 2020. 
Filter the data to include only years 2000 and 2020, and prepare the output with country names, year, and renewable energy values for visualization as a grouped bar chart.\",\n \"display_instruction\": \"Compare **renewable energy** by **country** between **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Electricity from renewables (TWh)\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to compare renewable energy across countries between 2000 and 2020. The output field 'Renewable Energy' mentioned in the original chart_encodings maps to 'Electricity from renewables (TWh)' in the dataset. To create a grouped bar chart comparing these two years, we need to filter the data to only include years 2000 and 2020, and output the Entity (country), Year, and the renewable energy values. The chart_encodings are updated to use the actual column name from the dataset.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n transformed_df = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Select only the required columns\n transformed_df = transformed_df[['Entity', 'Year', 'Electricity from renewables (TWh)']]\n \n # Convert Year to string for better visualization as a categorical variable\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year for better readability\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Electricity from renewables (TWh) Entity Year\n0 17.11 Australia 2000\n1 63.99 Australia 2020\n2 308.77 Brazil 2000\n3 520.01 Brazil 2020\n4 363.70 Canada 2000\n5 429.24 Canada 2020\n6 225.56 China 2000\n7 2184.94 China 2020\n8 67.83 
France 2000\n9 125.28 France 2020\n......\n\nUpdate the code above based on the following instruction:\n\n{\n \"followup_instruction\": \"show their renewable percentage instead, and add an entity to show global average\",\n \"chart_type\": \"Grouped Bar Chart\",\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Electricity from renewables (TWh)\",\n \"color\": \"Year\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Calculate the renewable energy percentage (proportion of renewables to total electricity generation from all sources) for each country between 2000 and 2020. Add a 'Global Average' entity that represents the worldwide average renewable percentage for each year. The output should include Entity (with 'Global Average' added), Year, and the renewable percentage.\",\n \"display_instruction\": \"Show **renewable energy percentage** by **country** with **global average** for **2000** and **2020**\",\n \"output_fields\": [\"Entity\", \"Year\", \"Renewable Percentage\"],\n \"chart_encodings\": {\n \"x\": \"Entity\",\n \"y\": \"Renewable Percentage\",\n \"color\": \"Year\"\n },\n \"reason\": \"The user wants to see renewable percentage instead of absolute values, and wants to add a global average entity for comparison. Renewable percentage is calculated as the ratio of renewable energy to total electricity generation (fossil fuels + nuclear + renewables). A 'Global Average' entity needs to be computed by aggregating all countries' data for each year. 
The y-axis encoding is updated to reflect the new metric 'Renewable Percentage'.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Filter data for years 2000 and 2020\n df_filtered = df_energy[df_energy['Year'].isin([2000, 2020])].copy()\n \n # Calculate total electricity generation\n df_filtered['Total Electricity'] = (\n df_filtered['Electricity from fossil fuels (TWh)'] + \n df_filtered['Electricity from nuclear (TWh)'] + \n df_filtered['Electricity from renewables (TWh)']\n )\n \n # Calculate renewable percentage\n df_filtered['Renewable Percentage'] = (\n df_filtered['Electricity from renewables (TWh)'] / df_filtered['Total Electricity'] * 100\n )\n \n # Select required columns for individual countries\n country_data = df_filtered[['Entity', 'Year', 'Renewable Percentage']].copy()\n \n # Calculate global average for each year\n global_avg = df_filtered.groupby('Year').apply(\n lambda x: pd.Series({\n 'Entity': 'Global Average',\n 'Year': x['Year'].iloc[0],\n 'Renewable Percentage': (\n x['Electricity from renewables (TWh)'].sum() / x['Total Electricity'].sum() * 100\n )\n })\n ).reset_index(drop=True)\n \n # Combine country data with global average\n transformed_df = pd.concat([country_data, global_avg], ignore_index=True)\n \n # Convert Year to string for better visualization\n transformed_df['Year'] = transformed_df['Year'].astype(str)\n \n # Sort by Entity and Year\n transformed_df = transformed_df.sort_values(['Entity', 'Year']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Entity Renewable Percentage Year\n0 Australia 8.634437 2000\n1 Australia 25.503168 2020\n2 Brazil 90.130772 2000\n3 Brazil 84.641177 2020\n4 Canada 61.809592 2000\n5 Canada 68.779644 2020\n6 China 16.639127 2000\n7 China 28.246461 2020\n8 France 12.711769 2000\n9 France 23.761024 2020\n......\n\nUpdate the code above 
based on the following instruction:\n\n{\n \"followup_instruction\": \"show global energy composition over time\",\n \"chart_type\": \"Custom Area\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Show the global energy composition over time by aggregating electricity generation from all countries for each year, broken down by energy source (fossil fuels, nuclear, and renewables). The output should be in long format with Year, Energy Source, and Energy values to support a stacked area chart visualization.\",\n \"display_instruction\": \"Show global **energy composition** by **energy source** over time\",\n \"output_fields\": [\"Year\", \"Energy Source\", \"Energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\"\n },\n \"reason\": \"The user wants to visualize global energy composition over time, which requires aggregating electricity generation data across all countries for each year and breaking it down by energy source type. The data needs to be transformed into long format where each row represents a year-energy source combination, with the total energy value for that source globally. 
This follows up from the previous renewable percentage analysis by showing the overall energy mix trends rather than focusing on specific countries or percentages.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Group by Year and sum electricity generation from all countries\n global_energy = df_energy.groupby('Year').agg({\n 'Electricity from fossil fuels (TWh)': 'sum',\n 'Electricity from nuclear (TWh)': 'sum',\n 'Electricity from renewables (TWh)': 'sum'\n }).reset_index()\n \n # Rename columns for clarity\n global_energy.columns = ['Year', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n global_energy,\n id_vars=['Year'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # Sort by Year and Energy Source\n transformed_df = transformed_df.sort_values(['Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}, {"content": "This is the result from the latest python code:\n\n Energy Energy Source Year\n0 7160.71 Fossil Fuels 2000\n1 1996.65 Nuclear 2000\n2 1742.56 Renewables 2000\n3 7273.89 Fossil Fuels 2001\n4 2037.54 Nuclear 2001\n5 1690.11 Renewables 2001\n6 7621.07 Fossil Fuels 2002\n7 2042.18 Nuclear 2002\n8 1757.63 Renewables 2002\n9 8043.86 Fossil Fuels 2003\n......\n\nUpdate the code above based on the following instruction:\n\n{\n \"followup_instruction\": \"show the composition of top 3 CO2 emitting countries\",\n \"chart_type\": \"Custom Area\",\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\"\n }\n}", "role": "user"}, {"content": "```json\n{\n \"detailed_instruction\": \"Identify the top 3 CO2 emitting countries based on total CO2 emissions across all years, then show the energy composition (fossil fuels, nuclear, and renewables) for each of these countries over time. 
The output should be in long format with Year, Entity (country), Energy Source, and Energy values to support faceted area charts.\",\n \"display_instruction\": \"Show **energy composition** of **top 3 CO2 emitting countries** over time\",\n \"output_fields\": [\"Year\", \"Entity\", \"Energy Source\", \"Energy\"],\n \"chart_encodings\": {\n \"x\": \"Year\",\n \"y\": \"Energy\",\n \"color\": \"Energy Source\",\n \"facet\": \"Entity\"\n },\n \"reason\": \"The user wants to see the energy composition specifically for the top 3 CO2 emitting countries over time, following up from the global energy composition view. We need to identify which countries have the highest total CO2 emissions, filter the data for those countries, and transform it into long format by energy source. A facet encoding is added to show separate area charts for each of the top 3 countries, allowing comparison of their individual energy compositions over time.\"\n}\n```\n\n```python\nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # Calculate total CO2 emissions by country across all years\n total_co2_by_country = df_energy.groupby('Entity')['Value_co2_emissions_kt_by_country'].sum().reset_index()\n \n # Get top 3 CO2 emitting countries\n top3_countries = total_co2_by_country.nlargest(3, 'Value_co2_emissions_kt_by_country')['Entity'].tolist()\n \n # Filter data for top 3 countries\n df_top3 = df_energy[df_energy['Entity'].isin(top3_countries)].copy()\n \n # Select relevant columns and rename for clarity\n df_top3 = df_top3[['Year', 'Entity', 'Electricity from fossil fuels (TWh)', \n 'Electricity from nuclear (TWh)', 'Electricity from renewables (TWh)']].copy()\n \n df_top3.columns = ['Year', 'Entity', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n df_top3,\n id_vars=['Year', 'Entity'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n 
# Sort by Entity, Year, and Energy Source\n transformed_df = transformed_df.sort_values(['Entity', 'Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n```", "role": "assistant"}], "trigger": {"tableId": "table-81", "resultTableId": "table-10", "chart": {"id": "chart-1760739075465", "chartType": "Custom Area", "encodingMap": {"x": {"channel": "x", "bin": false, "fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"channel": "y", "bin": false, "fieldID": "concept-1760738917347"}, "x2": {}, "y2": {"channel": "y2", "bin": false}, "color": {"fieldID": "concept-1760738921927", "sortBy": "[\"Fossil Fuels\",\"Nuclear\",\"Renewables\"]"}, "column": {}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-81", "saved": false, "source": "trigger", "unread": false}, "interaction": [{"from": "user", "to": "datatransform-agent", "role": "instruction", "content": "show the composition of top 3 CO2 emitting countries", "displayContent": "Show **energy composition** of **top 3 CO2 emitting countries** over time"}]}, "explanation": {"agent": "CodeExplanationAgent", "code": "The code performs the following high-level transformations:\n\n1. **Identifies top polluters**: Calculates the total `Value_co2_emissions_kt_by_country` across all years for each `Entity` (country), then selects the **top 3** countries with the highest cumulative CO2 emissions.\n\n2. **Filters the dataset**: Retains only records for the **top 3 CO2-emitting countries** from the original energy data.\n\n3. **Simplifies column structure**: Extracts the relevant energy production columns (`Electricity from fossil fuels (TWh)`, `Electricity from nuclear (TWh)`, `Electricity from renewables (TWh)`) and renames them to **Fossil Fuels**, **Nuclear**, and **Renewables** for clarity.\n\n4. 
**Reshapes data to long format**: Transforms the data from wide format (separate columns for each energy source) to long format, where each row represents a specific `Year`, `Entity`, and `Energy Source` combination with its corresponding `Energy` production value.\n\n5. **Sorts the output**: Orders the final dataset by `Entity`, `Year`, and `Energy Source` for systematic presentation.", "concepts": [], "dialog": [{"content": "You are a data scientist to help user explain code, \nso that a non-code can clearly understand what the code is doing, you are provided with a summary of the input data, and the transformation code.\n\nYour goal:\n1. You should generate a good itemized explanation of the code so that the reader can understand high-level steps of what the data transformation is doing.\n - Be very concise, and stay at a high-level. The reader doesn't understand code and does not want to learn exactly what the code is doing. They just want to learn what have been done from a logical level.\n - The explanation should be a markdown string that is a list of bullet points (with new lines), highlight constants, data fields, and important verbs.\n2. Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code.\n - provide a declarative definition that explains the new field, use a mathematical notation if applicable.\n - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score)\n - *DO NOT* explain trivial new fields like \"Decade\" or \"Avg_Rating\", \"US_Sales\" that are self-explanatory.\n - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.)\n - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. Format mathematical expressions using:\n - Inline math: `\\( ... 
\\)` for formulas within text\n - Block math: `\\[ ... \\]` for standalone formulas\n - Examples: `\\( \\frac{\\text{Revenue}}{\\text{Cost}} \\)` for ratios, `\\[ \\text{Score} = \\text{Rating} \\times \\text{Worldwide\\_Gross} \\]` for formulas\n - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\\_`\n - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. \n Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly.\n3. If there are multiple fields that have the similar computation, you can explain them together in one explanation.\n - in \"field\", you can provide a list of fields in format of \"field1, field2, ...\"\n - in \"explanation\", you can provide a single explanation for the computation of the fields.\n - for example, if you have fields like \"Norm_Rating\", \"Norm_Gross\", \"Critical_Commercial_Score\", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation.\n4. If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named \"Statistical Analysis\" in the [CONCEPTS EXPLANATION] section.\n - explain how you model the data, which fields are used, how data processing is done, and what models are used.\n - suggest some other modeling approaches that can be used to analyze the data in the explanation as well.\n \nThe focus is to explain how new fields are computed, don't generate explanation for low-level actions like \"return\", \"load data\" etc. 
\n\nProvide the result in the following two sections:\n - first section is the code explanation that should be a markdown block explaining the code, in the [CODE EXPLANATION] section.\n - remember to highlight constants, data fields, and important verbs in the code explanation.\n - second section is the concepts explanation that should be a json block (start with ```json) in the [CONCEPTS EXPLANATION] section.\n\n[CODE EXPLANATION]\n\n...(explanation of the code)\n\n[CONCEPTS EXPLANATION]\n\n```json\n[\n {\n \"field\": \"...\",\n \"explanation\": \"...\"\n }\n]\n\n```\n", "role": "system"}, {"content": "[CONTEXT]\n\nHere are our datasets, here are their summaries and samples:\n\n# table1 (global_energy_20_small_csv)\n\n## fields\n\t*Year -- type: int64, values: 2000, 2001, 2002, ..., 2017, 2018, 2019, 2020\n\t*Entity -- type: object, values: Australia, Brazil, Canada, ..., Thailand, Ukraine, United Kingdom, United States\n\t*Value_co2_emissions_kt_by_country -- type: float64, values: 117440.0, 146139.9994, 158029.9988, ..., 5593029.785, 5736319.824, 5738290.039, 9861099.609\n\t*Electricity from fossil fuels (TWh) -- type: float64, values: 28.87, 31.62, 33.5, ..., 4643.1, 4990.28, 5098.22, 5184.13\n\t*Electricity from nuclear (TWh) -- type: float64, values: 0.0, 3.24, 4.94, ..., 62.21, 63.02, 63.71, 439.73\n\t*Electricity from renewables (TWh) -- type: float64, values: 0.0, 0.01, 0.03, ..., 1667.06, 1835.32, 2014.57, 2184.94\n\n## sample\n Year Entity Value_co2_emissions_kt_by_country Electricity from fossil fuels (TWh) Electricity from nuclear (TWh) Electricity from renewables (TWh)\n0 2000 Australia 339450.0000 181.05 0 17.11\n1 2001 Australia 345640.0000 194.33 0 17.40\n2 2002 Australia 353369.9951 197.29 0 17.35\n3 2003 Australia 352579.9866 195.13 0 18.50\n4 2004 Australia 365809.9976 203.66 0 19.41\n......\n\n\n\n\n[CODE]\n\nhere is the transformation code: \nimport pandas as pd\nimport collections\nimport numpy as np\n\ndef transform_data(df_energy):\n # 
Calculate total CO2 emissions by country across all years\n total_co2_by_country = df_energy.groupby('Entity')['Value_co2_emissions_kt_by_country'].sum().reset_index()\n \n # Get top 3 CO2 emitting countries\n top3_countries = total_co2_by_country.nlargest(3, 'Value_co2_emissions_kt_by_country')['Entity'].tolist()\n \n # Filter data for top 3 countries\n df_top3 = df_energy[df_energy['Entity'].isin(top3_countries)].copy()\n \n # Select relevant columns and rename for clarity\n df_top3 = df_top3[['Year', 'Entity', 'Electricity from fossil fuels (TWh)', \n 'Electricity from nuclear (TWh)', 'Electricity from renewables (TWh)']].copy()\n \n df_top3.columns = ['Year', 'Entity', 'Fossil Fuels', 'Nuclear', 'Renewables']\n \n # Transform to long format\n transformed_df = pd.melt(\n df_top3,\n id_vars=['Year', 'Entity'],\n value_vars=['Fossil Fuels', 'Nuclear', 'Renewables'],\n var_name='Energy Source',\n value_name='Energy'\n )\n \n # Sort by Entity, Year, and Energy Source\n transformed_df = transformed_df.sort_values(['Entity', 'Year', 'Energy Source']).reset_index(drop=True)\n \n return transformed_df\n\n\n[EXPLANATION]\n", "role": "user"}, {"content": "[CODE EXPLANATION]\n\nThe code performs the following high-level transformations:\n\n1. **Identifies top polluters**: Calculates the total `Value_co2_emissions_kt_by_country` across all years for each `Entity` (country), then selects the **top 3** countries with the highest cumulative CO2 emissions.\n\n2. **Filters the dataset**: Retains only records for the **top 3 CO2-emitting countries** from the original energy data.\n\n3. **Simplifies column structure**: Extracts the relevant energy production columns (`Electricity from fossil fuels (TWh)`, `Electricity from nuclear (TWh)`, `Electricity from renewables (TWh)`) and renames them to **Fossil Fuels**, **Nuclear**, and **Renewables** for clarity.\n\n4. 
**Reshapes data to long format**: Transforms the data from wide format (separate columns for each energy source) to long format, where each row represents a specific `Year`, `Entity`, and `Energy Source` combination with its corresponding `Energy` production value.\n\n5. **Sorts the output**: Orders the final dataset by `Entity`, `Year`, and `Energy Source` for systematic presentation.\n\n[CONCEPTS EXPLANATION]\n\n```json\n[]\n```", "role": "assistant"}], "status": "ok"}, "status": "completed"}, "anchored": false, "attachedMetadata": ""}], "charts": [{"id": "chart-1760739084455", "chartType": "Custom Area", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "concept-1760738917347"}, "x2": {}, "y2": {"channel": "y2", "bin": false}, "color": {"fieldID": "concept-1760738921927"}, "column": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-10", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760739021201", "chartType": "Custom Area", "encodingMap": {"x": {"channel": "x", "bin": false, "fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"channel": "y", "bin": false, "fieldID": "concept-1760738917347"}, "x2": {}, "y2": {"channel": "y2", "bin": false}, "color": {"fieldID": "concept-1760738921927", "sortBy": "[\"Fossil Fuels\",\"Nuclear\",\"Renewables\"]"}, "column": {}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-81", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760738819387", "chartType": "Grouped Bar Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "y": {"fieldID": "concept-Renewable Percentage-1760738820889"}, "column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}, "group": {"fieldID": "original--global-energy-20-small.csv--Year"}}, "tableRef": "table-27", "saved": false, "source": "user", "unread": false}, 
{"id": "chart-1760738770100", "chartType": "Grouped Bar Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "y": {"fieldID": "original--global-energy-20-small.csv--Electricity from renewables (TWh)"}, "column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}, "group": {"fieldID": "original--global-energy-20-small.csv--Year"}}, "tableRef": "table-97", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760738436615", "chartType": "Line Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year", "sortOrder": "ascending"}, "y": {"fieldID": "concept-rank-1760738444550", "sortOrder": "descending"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-78", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760738423852", "chartType": "Line Chart", "config": {"showPoints": true}, "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "concept-renewable_percentage-1760738424337"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "column": {}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-45", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760738400970", "chartType": "Line Chart", "encodingMap": {"x": {"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "concept-1760738385163"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {"fieldID": "concept-1760738389404", "sortBy": "[\"fossil fuels\",\"nuclear\",\"renewables\"]"}, "row": {"channel": "row", "bin": false}}, "tableRef": "table-82", "saved": false, "source": "user", "unread": false}, {"id": "chart-1760738355655", "chartType": "Line Chart", "encodingMap": {"x": 
{"fieldID": "original--global-energy-20-small.csv--Year"}, "y": {"fieldID": "original--global-energy-20-small.csv--Value_co2_emissions_kt_by_country"}, "color": {"fieldID": "original--global-energy-20-small.csv--Entity"}, "opacity": {"channel": "opacity", "bin": false}, "column": {"channel": "column", "bin": false}, "row": {"channel": "row", "bin": false}}, "tableRef": "global-energy-20-small.csv", "saved": false, "source": "user", "unread": false}], "conceptShelfItems": [{"id": "concept-1760738921927", "name": "Energy Source", "type": "auto", "description": "", "source": "custom", "tableRef": "custom"}, {"id": "concept-1760738917347", "name": "Energy", "type": "auto", "description": "", "source": "custom", "tableRef": "custom"}, {"id": "concept-Renewable Percentage-1760738820889", "name": "Renewable Percentage", "type": "auto", "description": "", "source": "custom", "tableRef": "custom", "temporary": true}, {"id": "concept-1760738743125", "name": "Renewable Energy", "type": "auto", "description": "", "source": "custom", "tableRef": "custom"}, {"id": "concept-rank-1760738444550", "name": "rank", "type": "auto", "description": "", "source": "custom", "tableRef": "custom", "temporary": true}, {"id": "concept-renewable_percentage-1760738424337", "name": "renewable_percentage", "type": "auto", "description": "", "source": "custom", "tableRef": "custom", "temporary": true}, {"id": "concept-1760738389404", "name": "source", "type": "auto", "description": "", "source": "custom", "tableRef": "custom"}, {"id": "concept-1760738385163", "name": "energy", "type": "auto", "description": "", "source": "custom", "tableRef": "custom"}, {"id": "original--global-energy-20-small.csv--Year", "name": "Year", "type": "integer", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}, {"id": "original--global-energy-20-small.csv--Entity", "name": "Entity", "type": "string", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}, 
{"id": "original--global-energy-20-small.csv--Value_co2_emissions_kt_by_country", "name": "Value_co2_emissions_kt_by_country", "type": "number", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}, {"id": "original--global-energy-20-small.csv--Electricity from fossil fuels (TWh)", "name": "Electricity from fossil fuels (TWh)", "type": "number", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}, {"id": "original--global-energy-20-small.csv--Electricity from nuclear (TWh)", "name": "Electricity from nuclear (TWh)", "type": "number", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}, {"id": "original--global-energy-20-small.csv--Electricity from renewables (TWh)", "name": "Electricity from renewables (TWh)", "type": "number", "source": "original", "description": "", "tableRef": "global-energy-20-small.csv"}], "messages": [{"timestamp": 1760831081885, "type": "success", "component": "data formulator", "value": "Successfully loaded Global Energy"}], "displayedMessageIdx": 0, "viewMode": "report", "chartSynthesisInProgress": [], "config": {"formulateTimeoutSeconds": 60, "maxRepairAttempts": 1, "defaultChartWidth": 300, "defaultChartHeight": 300}, "dataCleanBlocks": [], "cleanInProgress": false, "generatedReports": [{"id": "report-1760831156182-8277", "content": "# Global Renewable Energy Shift: 2000 to 2020\n\nBetween 2000 and 2020, the world witnessed a notable transformation in renewable energy adoption. Global renewable electricity nearly doubled from 16% to 29% of total generation, signaling meaningful progress in the energy transition.\n\n[IMAGE(chart-1760738819387)]\n\nThe data reveals striking regional variations. Brazil maintained renewable leadership above 84%, while countries like Australia, Germany, and Italy dramatically expanded their renewable capacity—Australia tripling from 9% to 26%. 
However, some nations like Mexico experienced declining renewable shares, highlighting uneven progress across different energy systems.\n\n**In summary**, while the 20-year period shows encouraging momentum toward cleaner energy, the pace and direction vary significantly by country, suggesting that achieving global renewable energy goals will require sustained, coordinated efforts tailored to each nation's unique energy landscape and policy environment.", "style": "short note", "selectedChartIds": ["chart-1760738819387"], "createdAt": 1760831163718, "status": "completed", "title": "Global Renewable Energy Shift: 2000 to 2020", "anchorChartId": "chart-1760738819387"}, {"id": "report-1760831130105-4063", "content": "# The Global Renewable Energy Revolution: Two Decades of Transformation\n\nThe world's energy landscape has undergone a remarkable transformation between 2000 and 2020, with renewable electricity generation emerging as a critical component of the global energy mix. This shift reflects both technological advancement and growing commitment to sustainable energy solutions.\n\n[IMAGE(chart-1760738770100)]\n\nLooking at renewable energy adoption across major economies, the growth has been nothing short of extraordinary. China leads the pack with a staggering increase from 225.56 TWh in 2000 to 2,184.94 TWh in 2020—nearly a tenfold expansion. The United States more than doubled its renewable output from 335.45 TWh to 821.40 TWh, while Brazil grew from 308.77 TWh to 520.01 TWh. Notably, countries like Australia, India, and Germany also demonstrated significant gains, with Australia jumping from just 17.11 TWh to 63.99 TWh during this period.\n\n[IMAGE(chart-1760739084455)]\n\nWhen examining the energy portfolios of the three largest CO2 emitters—China, India, and the United States—a complex picture emerges. 
While China's total energy consumption has grown exponentially, with fossil fuels still dominating, the renewable sector (shown in red) has expanded substantially. The United States shows a more stable total energy consumption, with renewables gradually claiming a larger share. India's energy growth, though significant, remains heavily reliant on fossil fuels, though renewable adoption is accelerating.\n\n**In summary**, the past two decades reveal a global energy transition in progress. While renewable energy has achieved impressive growth worldwide, fossil fuels continue to dominate electricity generation in major economies. Key questions remain: Can this momentum accelerate sufficiently to meet climate goals? What policies will drive faster renewable adoption in emerging economies?", "style": "blog post", "selectedChartIds": ["chart-1760739084455", "chart-1760738770100"], "createdAt": 1760831142289, "status": "completed", "title": "The Global Renewable Energy Revolution: Two Decades of Transformation", "anchorChartId": "chart-1760739084455"}, {"id": "report-1760831094231-2424", "content": "# Global Renewable Energy: A Tale of Leaders and Laggards\n\nThe global energy landscape has undergone significant transformation over the past two decades, with renewable energy emerging as a critical player in the electricity mix. Analyzing data from 21 major economies between 2000 and 2020 reveals striking disparities in how nations have embraced clean energy alternatives.\n\n[IMAGE(chart-1760738423852)]\n\nThe first visualization reveals a fascinating divergence in renewable energy adoption. **Brazil** stands out as a consistent leader, maintaining renewable electricity percentages between 75-90% throughout the entire period, thanks largely to its robust hydroelectric infrastructure. **Canada** follows a similar trajectory, steadily increasing from about 60% to nearly 70% by 2020. 
Meanwhile, **Germany, Spain, Italy, and the United Kingdom** show remarkable growth trajectories, climbing from under 20% in the early 2000s to over 40% by 2020—demonstrating that nations can dramatically reshape their energy portfolios within two decades.\n\nOn the opposite end of the spectrum, **South Africa** remains nearly flat at the bottom, showing minimal renewable adoption despite global trends. **Saudi Arabia** and **Poland** also lag significantly, though both show modest upticks in recent years.\n\n[IMAGE(chart-1760738436615)]\n\nThe ranking chart illustrates the competitive dynamics of renewable energy leadership. **Brazil and Canada** maintain their dominance at ranks 1-2 throughout most years, while European nations like **Germany, Spain, and the UK** engage in a dynamic competition for the 3rd-5th positions, particularly after 2010. The volatility in middle rankings reflects the rapid changes in energy policy and investment across different nations, with countries like **China** climbing from lower ranks to break into the top 7 by 2020.\n\n**In summary**, the data reveals a bifurcated global energy transition: a group of progressive nations have successfully scaled renewable electricity to 40-90% of their mix, while others remain heavily dependent on fossil fuels. These patterns suggest that political will, natural resource endowment, and infrastructure investment are key determinants of renewable energy success. Important follow-up questions include: What policy mechanisms enabled top performers to achieve such high renewable percentages? 
Can lagging nations replicate these successes, or do geographic and economic constraints create insurmountable barriers?", "style": "blog post", "selectedChartIds": ["chart-1760738423852", "chart-1760738436615"], "createdAt": 1760831110064, "status": "completed", "title": "Global Renewable Energy: A Tale of Leaders and Laggards", "anchorChartId": "chart-1760738423852"}], "currentReport": {"id": "report-1760750575650-2619", "content": "# Hollywood's Billion-Dollar Hitmakers\n\n*Avatar* stands alone—earning over $2.5B in profit, dwarfing all competition. Action and Adventure films dominate the most profitable titles, with franchises like *Jurassic Park*, *The Dark Knight*, and *Lord of the Rings* proving blockbuster formulas work.\n\n\"Chart\"\n\nSteven Spielberg leads all directors with $7.2B in total profit across his career, showcasing remarkable consistency with hits spanning decades—from *Jurassic Park* to *E.T.* His nearest competitors trail by billions, underlining his unmatched commercial impact.\n\n\"Chart\"\n\n**In summary**, mega-budget Action and Adventure films generate extraordinary returns when they succeed, and a handful of elite directors—led by Spielberg—have mastered the formula for sustained box office dominance.", "style": "short note", "selectedChartIds": ["chart-1760743347871", "chart-1760743768741"], "chartImages": {}, "createdAt": 1760750584189, "title": "Report - 10/17/2025"}, "activeChallenges": [], "_persist": {"version": -1, "rehydrated": true}, "draftNodes": [], "focusedId": {"type": "report", "reportId": "report-1760831156182-8277"}} \ No newline at end of file diff --git a/py-src/data_formulator/agent_config.py b/py-src/data_formulator/agent_config.py new file mode 100644 index 00000000..bec4c670 --- /dev/null +++ b/py-src/data_formulator/agent_config.py @@ -0,0 +1,152 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. + +""" +Single source of truth for per-agent LLM call configuration. 
+ +Edit values here to tune latency vs. quality for each agent. + +Per-agent overrides can also be set at runtime via environment variables: + + DF_REASONING_EFFORT_DATA_TRANSFORM=medium + DF_REASONING_EFFORT_REPORT_GEN=high + +Tiers +----- +- ``"minimal"`` — fastest. Honoured natively only on the OpenAI GPT-5 + base/mini/nano/5.x family (``gpt-5``, ``gpt-5-mini``, ``gpt-5-nano``, + ``gpt-5.1``, ...). On the GPT-5 ``codex`` / ``pro`` variants + :func:`reasoning_effort_for` maps it to ``"none"`` (their lightest + supported tier). On every other reasoning model (o-series, Claude + extended-thinking, Gemini, ...) it is downgraded to ``"low"``. +- ``"none"`` — only accepted by GPT-5 ``codex`` / ``pro``. Downgraded to + ``"low"`` elsewhere. +- ``"low" | "medium" | "high"`` — portable across all reasoning providers via + LiteLLM's normalisation + (https://docs.litellm.ai/docs/reasoning_content). + +Models that don't support reasoning at all silently drop the parameter +because the client is invoked with ``drop_params=True``. +""" + +from __future__ import annotations + +import os +from typing import Literal + +ReasoningEffort = Literal["none", "minimal", "low", "medium", "high"] + +# --------------------------------------------------------------------------- +# Per-agent reasoning effort +# --------------------------------------------------------------------------- +# +# Pick the lowest tier that produces acceptable quality. Heavy code-generation +# and multi-step agents stay at ``"low"``; trivial single-turn extractors and +# classifiers run at ``"minimal"`` (GPT-5) / ``"low"`` (everything else). 
+ +AGENT_REASONING_EFFORT: dict[str, ReasoningEffort] = { + # ── Heavy: code-gen, multi-step, tool-using ───────────────────────────── + "data_transform": "low", # generates Python transform scripts + "data_rec": "low", # chart / transformation recommendation + "data_agent": "low", # multi-step exploration agent + "report_gen": "low", # narrative + inspect/embed tools + "interactive_explore": "low", # exploration idea agent + "data_loading_chat": "low", # conversational data loading w/ tools + + # ── Light: single-turn extractors / classifiers / formatters ──────────── + "data_load": "minimal", # one-shot type inference + "data_clean": "minimal", # extract tables from text + "experience_distill": "minimal", # summarise an analysis context + "chart_insight": "minimal", # title + 1–3 takeaways from a chart + "chart_restyle": "minimal", # apply style edits to a Vega-Lite spec + "code_explanation": "minimal", # describe derived fields + "sort_data": "minimal", # natural-order sort a small list + "simple": "minimal", # nl_to_filter / workspace_name / intent +} + +DEFAULT_REASONING_EFFORT: ReasoningEffort = "low" + +_VALID_TIERS: frozenset[str] = frozenset(("none", "minimal", "low", "medium", "high")) + + +def get_reasoning_effort(agent_id: str | None) -> ReasoningEffort: + """Return the *configured* tier for ``agent_id``. + + Resolution order: + 1. ``DF_REASONING_EFFORT_`` env var + 2. ``AGENT_REASONING_EFFORT[agent_id]`` + 3. ``DEFAULT_REASONING_EFFORT`` + + Note: this does **not** consider the target model. Use + :func:`reasoning_effort_for` at call time to also apply the + GPT-5-only ``"minimal"`` gating. 
+ """ + if agent_id: + env_key = f"DF_REASONING_EFFORT_{agent_id.upper()}" + env_val = (os.environ.get(env_key) or "").strip().lower() + if env_val in _VALID_TIERS: + return env_val # type: ignore[return-value] + if agent_id in AGENT_REASONING_EFFORT: + return AGENT_REASONING_EFFORT[agent_id] + return DEFAULT_REASONING_EFFORT + + +def _supports_minimal(model: str | None) -> bool: + """``"minimal"`` is only accepted by a subset of OpenAI GPT-5 chat models. + + Supported (per OpenAI API): + ``gpt-5``, ``gpt-5-mini``, ``gpt-5-nano``, ``gpt-5.1``, ``gpt-5.4``, + and future GPT-5.x sub-versions of those base variants. + + NOT supported (these reject ``"minimal"`` but accept ``"none"`` / ``xhigh`` + instead): ``gpt-5-codex``, ``gpt-5-pro``. + + Provider prefixes such as ``openai/gpt-5-mini``, ``azure/gpt-5``, + ``openai/responses/gpt-5.4`` are all covered by the substring check. + """ + if not model: + return False + m = model.lower() + if "gpt-5" not in m: + return False + if "codex" in m or "-pro" in m or "/pro" in m: + return False + return True + + +def _supports_none(model: str | None) -> bool: + """``"none"`` is the lightest tier on the GPT-5 ``codex`` / ``pro`` chat + models (which reject ``"minimal"``). Other providers (Claude, Gemini, + o-series) don't accept ``"none"`` as a reasoning_effort value, so we only + use it for these specific GPT-5 variants. + """ + if not model: + return False + m = model.lower() + if "gpt-5" not in m: + return False + return "codex" in m or "-pro" in m or "/pro" in m + + +def reasoning_effort_for(agent_id: str | None, model: str | None) -> ReasoningEffort: + """Resolve the reasoning_effort to actually send to LiteLLM. + + - Reads the configured tier via :func:`get_reasoning_effort`. + - For configured ``"minimal"``: + * keep ``"minimal"`` on GPT-5 base / mini / nano / 5.x; + * map to ``"none"`` on GPT-5 codex / pro (which support ``"none"`` + but not ``"minimal"``); + * fall back to ``"low"`` on every other reasoning model. 
+ - For configured ``"none"`` on a non-supporting model, fall back to + ``"low"``. + """ + effort = get_reasoning_effort(agent_id) + if effort == "minimal": + if _supports_minimal(model): + return "minimal" + if _supports_none(model): + return "none" + return "low" + if effort == "none" and not _supports_none(model): + return "low" + return effort diff --git a/py-src/data_formulator/agents/agent_chart_insight.py b/py-src/data_formulator/agents/agent_chart_insight.py index 72fd9233..a3ae8aba 100644 --- a/py-src/data_formulator/agents/agent_chart_insight.py +++ b/py-src/data_formulator/agents/agent_chart_insight.py @@ -1,12 +1,16 @@ # Copyright (c) Microsoft Corporation. # Licensed under the MIT License. +from data_formulator.agent_config import reasoning_effort_for from data_formulator.agents.agent_utils import generate_data_summary, extract_json_objects +from data_formulator.agents.agent_language import inject_language_instruction import logging logger = logging.getLogger(__name__) +_AGENT_ID = "chart_insight" + SYSTEM_PROMPT = r'''You are a data analyst helping users understand their visualizations. You are given a chart image along with metadata about the chart type, data fields used, and a summary of the underlying data (including schema, value ranges, and sample rows). 
@@ -92,8 +96,7 @@ def run(self, chart_image_base64, chart_type, field_names, input_tables=None, n= if self._knowledge_store: system_prompt += self._knowledge_store.format_rules_block() - if self.language_instruction: - system_prompt = system_prompt + "\n\n" + self.language_instruction + system_prompt = inject_language_instruction(system_prompt, self.language_instruction) messages = [ {"role": "system", "content": system_prompt}, @@ -103,7 +106,7 @@ def run(self, chart_image_base64, chart_type, field_names, input_tables=None, n= logger.debug(f"ChartInsightAgent: analyzing {chart_type} chart with fields {field_names}") logger.info(f"[ChartInsightAgent] run start | chart_type={chart_type}") - response = self.client.get_completion(messages=messages) + response = self.client.get_completion(messages=messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) candidates = [] for choice in response.choices: diff --git a/py-src/data_formulator/agents/agent_chart_restyle.py b/py-src/data_formulator/agents/agent_chart_restyle.py index d59eb90f..61edc4d6 100644 --- a/py-src/data_formulator/agents/agent_chart_restyle.py +++ b/py-src/data_formulator/agents/agent_chart_restyle.py @@ -23,7 +23,13 @@ import logging from typing import Any +from data_formulator.agent_config import reasoning_effort_for +logger = logging.getLogger(__name__) + +_AGENT_ID = "chart_restyle" + from data_formulator.agents.agent_utils import extract_json_objects +from data_formulator.agents.agent_language import inject_language_instruction logger = logging.getLogger(__name__) @@ -130,9 +136,7 @@ def run( user_text = "\n".join(parts) - system_prompt = SYSTEM_PROMPT - if self.language_instruction: - system_prompt = system_prompt + "\n\n" + self.language_instruction + system_prompt = inject_language_instruction(SYSTEM_PROMPT, self.language_instruction) messages = [ {"role": "system", "content": system_prompt}, @@ -141,7 +145,7 @@ def run( logger.info("[ChartRestyleAgent] run start | 
chart_type=%s", chart_type) - response = self.client.get_completion(messages=messages) + response = self.client.get_completion(messages=messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) for choice in response.choices: content = choice.message.content or "" diff --git a/py-src/data_formulator/agents/agent_code_explanation.py b/py-src/data_formulator/agents/agent_code_explanation.py index 8819de42..43d78535 100644 --- a/py-src/data_formulator/agents/agent_code_explanation.py +++ b/py-src/data_formulator/agents/agent_code_explanation.py @@ -2,39 +2,41 @@ # Licensed under the MIT License. import json +from data_formulator.agent_config import reasoning_effort_for from data_formulator.agents.agent_utils import generate_data_summary, extract_json_objects, extract_code_from_gpt_response +from data_formulator.agents.agent_language import inject_language_instruction import logging logger = logging.getLogger(__name__) +_AGENT_ID = "code_explanation" -SYSTEM_PROMPT = r'''You are a data scientist to help user explain derived data concepts, -so that a non-coder can clearly understand what new fields mean. You are provided with a summary of the input data, and the transformation code. - -Your goal: -1. Generate a list of explanations for new fields (fields not from the input data) that introduce metrics/concepts that are not obvious from the code. - - provide a declarative definition that explains the new field, use a mathematical notation if applicable. - - only include new fields explanation of new metrics that are involved in computation (e.g., ROI, commerical_success_score) - - *DO NOT* explain trivial new fields like "Decade" or "Avg_Rating", "US_Sales" that are self-explanatory. - - Avoid explaining fields that are simple aggregate of fields in the original data (min_score, avg_value, count, etc.) - - When a field involves mathematical computation, you can use LaTeX math notation in the explanation. 
Format mathematical expressions using: - - Inline math: `\( ... \)` for formulas within text - - Block math: `\[ ... \]` for standalone formulas - - Examples: `\( \frac{\text{Revenue}}{\text{Cost}} \)` for ratios, `\[ \text{Score} = \text{Rating} \times \text{Worldwide\_Gross} \]` for formulas - - note: when using underscores as part of the text, you need to escape them with a backslash, e.g., `\_` - - Note: don't use math notation for fields whose computation is trivial (use plain english), it will likely be confusing to the reader. - Only use math notation for fields that can not be easilyexplained in plain english. Use it sparingly. -2. If there are multiple fields that have the similar computation, you can explain them together in one explanation. - - in "field", you can provide a list of fields in format of "field1, field2, ..." - - in "explanation", you can provide a single explanation for the computation of the fields. - - for example, if you have fields like "Norm_Rating", "Norm_Gross", "Critical_Commercial_Score", you can explain Norm_Rating, Norm_Gross together in one explanation and explain Critical_Commercial_Score in another explanation. -3. If the code is about statistical analysis, you should explain the statistical analysis in the explanation as a concept named "Statistical Analysis". - - explain how you model the data, which fields are used, how data processing is done, and what models are used. - - suggest some other modeling approaches that can be used to analyze the data in the explanation as well. - -The focus is to explain how new fields are computed, don't generate explanation for low-level actions like "return", "load data" etc. -If there are no non-trivial new fields/concepts, return an empty list. + +SYSTEM_PROMPT = r'''You help a non-coder understand how newly derived fields are computed. + +For each non-trivial derived field, output: + 1. the field name(s) + 2. a short formula — use actual field names (e.g. 
`Profit = Revenue - Cost`), + and reach for formal math (\sum, \frac, etc.) only when it's the clearest + way to express the computation. + +A brief one-line description before the formula is allowed when it adds clarity +(e.g. "Within each Major\_category:"). Otherwise keep it to just the formula. + +Skip fields whose computation is trivial or obvious from the name +(count/min/max/avg/sum, year/decade extraction, simple rename, etc.). +Group fields that share the same formula shape into one entry +(`"field": "f1, f2, ..."`). + +For statistical-analysis code (regression, clustering, hypothesis tests), +emit a single entry with `"field": "Statistical Analysis"` containing the +model's defining equation(s). + +LaTeX: inline `\( ... \)`, block `\[ ... \]`, escape underscores as `\_`. +Prefer inline for short formulas, block when there's vertical structure. + +If nothing is worth showing, return an empty list. Provide the result as a JSON block (start with ```json) in the [CONCEPTS EXPLANATION] section. @@ -127,15 +129,15 @@ def extract_decade(date_str): ```json [ - { - "field": "Norm_Rating, Norm_Gross", - "explanation": "Normalized values that scale the original values between 0 and 1 using min-max normalization. Formula: -BSLASH-(-BSLASH-text{Normalized} = -BSLASH-frac{-BSLASH-text{Value} - -BSLASH-text{Min}}{-BSLASH-text{Max} - -BSLASH-text{Min}} -BSLASH-)" - }, - { - "field": "Critical_Commercial_Score", - "explanation": "The critical-commercial success score combines **Norm_Rating** and **Norm_Gross** to represent a movie's critical acclaim and commercial performance. 
Formula: -BSLASH-(-BSLASH-text{Critical-BSLASH-_Commercial-BSLASH-_Score} = -BSLASH-text{Norm-BSLASH-_Rating} -BSLASH-times -BSLASH-text{Norm-BSLASH-_Gross} -BSLASH-)" + { + "field": "Norm_Rating, Norm_Gross", + "explanation": "-BSLASH-[ -BSLASH-text{Normalized} = -BSLASH-frac{x - -BSLASH-min(x)}{-BSLASH-max(x) - -BSLASH-min(x)} -BSLASH-]" + }, + { + "field": "Critical_Commercial_Score", + "explanation": "-BSLASH-[ -BSLASH-text{Critical-BSLASH-_Commercial-BSLASH-_Score} = -BSLASH-text{Norm-BSLASH-_Rating} -BSLASH-times -BSLASH-text{Norm-BSLASH-_Gross} -BSLASH-]" } -] +] ''' class CodeExplanationAgent(object): @@ -158,14 +160,12 @@ def run(self, input_tables, code, n=1): logger.debug(user_query) logger.info(f"[CodeExplanationAgent] run start") - system_prompt = SYSTEM_PROMPT - if self.language_instruction: - system_prompt = system_prompt + "\n\n" + self.language_instruction + system_prompt = inject_language_instruction(SYSTEM_PROMPT, self.language_instruction) messages = [{"role":"system", "content": system_prompt}, {"role":"user","content": user_query}] - response = self.client.get_completion(messages = messages) + response = self.client.get_completion(messages = messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) candidates = [] for choice in response.choices: diff --git a/py-src/data_formulator/agents/agent_data_clean_stream.py b/py-src/data_formulator/agents/agent_data_clean_stream.py deleted file mode 100644 index 3fcfe745..00000000 --- a/py-src/data_formulator/agents/agent_data_clean_stream.py +++ /dev/null @@ -1,240 +0,0 @@ -# Copyright (c) Microsoft Corporation. -# Licensed under the MIT License. 
- -from data_formulator.agents.agent_utils import extract_json_objects -from data_formulator.agents.web_utils import download_html_content - -import logging -import re -import json -import traceback - -logger = logging.getLogger(__name__) - - -def parse_table_sections(text): - """Parse [TABLE_START] to [TABLE_END] sections and extract metadata and content.""" - tables = [] - - # Split by [TABLE_START] and process each section - sections = text.strip().split('[TABLE_START]') - - print(sections) - - for i, section in enumerate(sections[1:], 1): # Skip first empty section - - # Find the end of this table section - if '[TABLE_END]' not in section: - continue - - table_section = section.split('[TABLE_END]')[0] - - metadata_index = table_section.find('[METADATA]') - content_index = table_section.find('[CONTENT]') - - # Extract metadata between [METADATA] and ``` - if metadata_index != -1 and content_index != -1: - metadata_block = table_section[metadata_index + len('[METADATA]'):content_index] - metadata_json = extract_json_objects(metadata_block)[0] - - # Extract content between [CONTENT] and end - if content_index != -1: - content_block = table_section[content_index + len('[CONTENT]'):].strip() - - # Create table object - table = { - "name": metadata_json.get('name', 'unknown'), - "context": metadata_json.get('context', ''), - "content": { - "type": metadata_json.get('type', 'csv'), - "value": content_block - } - } - - tables.append(table) - - return tables - - -SYSTEM_PROMPT = '''You are a data scientist to help user to generate, extract data from image, or clean a text input into a structured csv table. - -If there are multiple tables in the raw data, you should extract them all. -Each table can either be a csv block or a url (url of an image that you think contains data). 
-- csv block: a string of csv content (if the content is already available from the input) -- image url: link to an image that contains data - -Based on the raw data provided by the user, extract tables: -- each extracted table should be wrapped in a section, its metadata is a json object describes its name and type in [METADATA] section. -- if the table is a csv block, it should be wrapped in [CONTENT] tags. Do not wrap it in any other tags, just write plain csv content in the [CONTENT]. -- if the table is an image url, [CONTENT] should be the url. -- when there are multiple tables, generate one table at a time. - -Output only extract tables, no other text should be included. Format: - -[TABLE_START] - -[METADATA] - -```json -{ - "name": "...", // suggest a descriptive, meaningful but short name for this dataset, no more than 5 words, if there are duplicate names, add a suffix -1, -2, etc. (e.g., "sales-2024", "us-customer-survey-2024", "seattle-weather-forecast") - "type": "csv" | "image_url", - "context": "..." // a paragraph summarizing the original source document or image this table was extracted from. Include: what the document is about, its main topic, any relevant background, and what the table represents within that context. This summary will be used by downstream agents to understand the data, so be descriptive and informative. (around 50-100 words) -} -``` - -[CONTENT] - -... // the csv block or image url, directly output the content, no other text should be included and don't wrap it in any other tags. - -[TABLE_END] - -**Important:** -- NEVER make assumptions or judgments about a person's gender, biological sex, sexuality, religion, race, nationality, ethnicity, political stance, socioeconomic status, mental health, invisible disabilities, medical conditions, personality type, social impressions, emotional state, and cognitive state. -- NEVER create formulas that could be used to discriminate based on age. 
Ageism of any form (explicit and implicit) is strictly prohibited. - -**Multiple tables:** -- if the raw data contains multiple tables, based on the user's instruction to decide which table to extract. -- if the user doesn't specify which tables to extract, extract all tables. -- if there are multiple tables yet they can be too large, only extract up to 200 rows for each table. - -**Instructions for creating csv blocks:** -* the output should be a structured csv table: - - if the raw data is unstructured, structure it into a csv table. If the table is in other formats, transform it into a csv table. - - if the raw data contain other informations other than the table (e.g., title, subtitle, footer, summary, etc.), remove surrounding texts that does not belong to the table, so that the table conforms to csv format. - - if the raw data contains multiple levels of header, make it a flat table. It's ok to combine multiple levels of headers to form the new header to not lose information. - - the csv table should have the same number of cells for each line, according to the header. If there are some rows with missing values, patch them with empty values. - - if the header row misses some columns, add their corresponding column names. E.g., when the header doesn't have an index column, but every row has an index value, add the missing column header. -* clean up messy column names: - - if the column name contains special characters like "*", "?", "#", "." remove them. -* csv value format: - - if a column is number but some cells has annotations like "*" "?" or brackets, clean them up. - - if values of a column is all numbers but has units like ($, %, s), remove the unit in the value cells, convert them to number, note unit in the header of this column. - - you don't need to convert format of the cell. 
- -**Instructions for creating image url:** -- based on the context provided in the prompt and raw input material, decide which url in the raw data may cotain the data we would like to extract (like an image contains structured data). - -**Instructions for generating synthetic data:** -- NEVER generate data that has implicit bias as noted above, if that happens, neutralize the data. -- If the user doesn't indicate how many rows to be generated, plan in generating a dataset with 20-30 rows depending on the content. - -**IMPORTANT:** -- when the user provide an image and ask to extract data, you should extract data from the image into a csv block. -- get all tables that contain structured data from the raw data, including the csv blocks and image urls. -''' - - -EXAMPLE = ''' -Rank NOC Gold Silver Bronze Total -1 South Korea 5 1 1 7 -2 France* 0 1 1 2 - United States 0 1 1 2 -4 China 0 1 0 1 - Germany 0 1 0 1 -6 Mexico 0 0 1 1 - Turkey 0 0 1 1 -Totals (7 entries) 5 5 5 15 -''' - -class DataCleanAgentStream(object): - - def __init__(self, client, language_instruction=""): - self.client = client - self.language_instruction = language_instruction - - def stream(self, prompt, artifacts=[], dialog=[]): - """derive a new concept based on the raw input data - Args: - prompt (str): the prompt to the agent - artifacts (list): the artifacts to the agent of format - [{"type": "image_url", "content": ...}, {"type": "web_url", "content": ...}, ...] 
- dialog (list): the dialog history - Returns: - generator: the result of the agent - """ - - content = [] - - for artifact in artifacts: - if artifact['type'] == 'image_url': - content.append({ - 'type': 'image_url', - 'image_url': { - "url": artifact['content'], - "detail": "high" - } - }) - elif artifact['type'] == 'web_url': - try: - content.append({ - 'type': 'text', - 'text': f"[HTML CONTENT]\n\n{download_html_content(artifact['content'])}" - }) - except Exception as e: - raise Exception('unable to download html from url ' + artifact['content']) - - content.append({ - 'type': 'text', - 'text': f'''{prompt}''' - }) - - user_prompt = { - 'role': 'user', - 'content': content - } - - logger.debug(user_prompt) - logger.info(f"[DataCleanAgent] run start (streaming)") - - prompt_text = SYSTEM_PROMPT - if self.language_instruction: - prompt_text = prompt_text + "\n\n" + self.language_instruction - - system_message = { - 'role': 'system', - 'content': [ {'type': 'text', 'text': prompt_text}] - } - - messages = [ - system_message, - *[message for message in dialog if message['role'] != 'system'], - user_prompt - ] - - ###### the part that calls open_ai - stream = self.client.get_completion(messages = messages, stream=True) - - accumulated_content = "" - - for part in stream: - if hasattr(part, 'choices') and len(part.choices) > 0: - delta = part.choices[0].delta - if hasattr(delta, 'content') and delta.content: - accumulated_content += delta.content - - # Stream each character for real-time display as JSON - yield delta.content - - # Parse the final content the same way as the non-streaming version - logger.debug("\n=== Python Data Clean Agent Stream ===>\n") - logger.debug(accumulated_content + "\n") - - # Parse table sections from the accumulated content - tables = parse_table_sections(accumulated_content) - - if len(tables) > 0: - # Use the same format as non-streaming version - return the parsed data directly - result = { - 'status': 'ok', - 'content': tables, - } 
- else: - result = {'status': 'other error', 'content': 'unable to extract tables from response', 'content_code': 'agent.unableExtractTables'} - - result['dialog'] = [*messages, {"role": "assistant", "content": accumulated_content}] - result['agent'] = 'DataCleanAgentStream' - logger.info(f"[DataCleanAgent] run done | status={result.get('status', '?')}") - - # add a newline to the beginning of the result to separate it from the previous result - yield '\n' + json.dumps(result, ensure_ascii=False) + '\n' \ No newline at end of file diff --git a/py-src/data_formulator/agents/agent_data_load.py b/py-src/data_formulator/agents/agent_data_load.py index 1be2b397..baffd103 100644 --- a/py-src/data_formulator/agents/agent_data_load.py +++ b/py-src/data_formulator/agents/agent_data_load.py @@ -3,8 +3,10 @@ import json +from data_formulator.agent_config import reasoning_effort_for from data_formulator.agents.agent_utils import extract_json_objects, generate_data_summary from data_formulator.agents.agent_diagnostics import AgentDiagnostics +from data_formulator.agents.agent_language import inject_language_instruction from data_formulator.agents.semantic_types import ( generate_semantic_types_prompt, ) @@ -13,6 +15,8 @@ logger = logging.getLogger(__name__) +_AGENT_ID = "data_load" + SYSTEM_PROMPT = '''You are a data scientist to help user infer data types based off the table provided by the user. 
Given a dataset provided by the user, @@ -166,9 +170,7 @@ def __init__(self, client, workspace, language_instruction="", model_info=None): self.workspace = workspace self.language_instruction = language_instruction - self.system_prompt = SYSTEM_PROMPT - if language_instruction: - self.system_prompt = self.system_prompt + "\n\n" + language_instruction + self.system_prompt = inject_language_instruction(SYSTEM_PROMPT, language_instruction) self._diag = AgentDiagnostics( agent_name="DataLoadAgent", @@ -199,7 +201,7 @@ def run(self, input_data, n=1): messages = [{"role":"system", "content": self.system_prompt}, {"role":"user","content": user_query}] - response = self.client.get_completion(messages = messages) + response = self.client.get_completion(messages = messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) candidates = [] for choice in response.choices: diff --git a/py-src/data_formulator/agents/agent_data_loading_chat.py b/py-src/data_formulator/agents/agent_data_loading_chat.py index 4182edbe..61d3a0e6 100644 --- a/py-src/data_formulator/agents/agent_data_loading_chat.py +++ b/py-src/data_formulator/agents/agent_data_loading_chat.py @@ -3,8 +3,7 @@ """Conversational data loading agent. 
-Replaces the old DataCleanAgentStream with a general-purpose -conversational agent that can: +General-purpose conversational agent that can: - Extract tables from images / text / files - Execute Python code in a sandboxed environment - Show inline table previews @@ -17,14 +16,16 @@ import os import re -import litellm -import openai import pandas as pd -from data_formulator.agents.agent_data_clean_stream import parse_table_sections +from data_formulator.agent_config import reasoning_effort_for +from data_formulator.agents.agent_utils import accumulate_reasoning_content +from data_formulator.datalake.parquet_utils import df_to_safe_records logger = logging.getLogger(__name__) +_AGENT_ID = "data_loading_chat" + # --------------------------------------------------------------------------- # System prompt @@ -34,49 +35,71 @@ You are a data assistant helping users load and prepare data for analysis in Data Formulator. Tools available: -- read_file / write_file / list_directory — workspace filesystem +- read_file / write_file / list_directory — workspace filesystem (scratch/ uploads) - execute_python — run Python (pandas, numpy, DuckDB). All DataFrames are auto-saved to scratch/. 
-- list_sample_datasets — list available built-in datasets with their tables and exact call syntax -- show_user_data_preview — show interactive table preview with Load button -- search_data_candidates — search across all data sources for tables matching a keyword -- read_candidate_metadata — read detailed metadata for a table from a connected source +- list_data — browse the catalog hierarchy of connected sources (cache-only, fast) +- find_data — regex search across cached catalogs (names, descriptions, columns) +- describe_data — read full metadata for one table +- show_user_data_preview — show interactive table preview with Load button (for execute_python results or extracted tables only) - propose_load_plan — propose a multi-table loading plan for user confirmation CRITICAL: You MUST call the show_user_data_preview tool to show data. Do NOT just describe data in text. -Four workflows: - -**Workflow 1 — Sample dataset:** -1. Call list_sample_datasets to see what's available (returns exact dataset_name to use) -2. Call show_user_data_preview(dataset_name="") — ALL tables in the dataset are shown +Three workflows: -**Workflow 2 — Uploaded file or code processing:** +**Workflow 1 — Uploaded file or code processing:** 1. Inspect files with read_file/list_directory 2. Process with execute_python (DataFrames auto-saved to scratch/) 3. Call show_user_data_preview(saved_dfs=["df_name"]) -**Workflow 3 — Unstructured text or image extraction:** +**Workflow 2 — Unstructured text or image extraction:** 1. Extract table into CSV format 2. Call show_user_data_preview(tables=[{{"name": "...", "data": "col1,col2\\n..."}}]) -**Workflow 4 — Find and load data from connected sources:** -1. Call search_data_candidates(query="...", scope="all") to find relevant tables -2. For EACH promising not-imported table, call read_candidate_metadata(source_id, table_key) to inspect columns and understand available values -3. 
Based on the column metadata, decide which columns to filter on and what values to use -4. Call propose_load_plan(candidates=[...], reasoning="...") — the UI shows a confirmation card -5. Keep your text brief after propose_load_plan. The UI handles the rest. +**Workflow 3 — Find and load data from connected sources (including sample datasets):** +1. Call find_data(query="...") to search. The query is a case-insensitive regex — + use alternation for synonyms ("orders|sales|revenue"), anchors ("^fact_"), word + boundaries ("\\border\\b"), or optional groups ("customers?") when helpful. Escape + "." if you mean a literal dot. Pass exclude="_staging|_test" to drop noise. + When search is ambiguous, restrict with scope="" or + scope=":". +2. If find_data returns nothing useful or is ambiguous, fall back to list_data: + - list_data() → which sources exist + - list_data(source_id="...") → top-level folders / tables + - list_data(source_id, path=[...]) → drill in + - Pass filter="..." (plain substring, not regex) when a directory has many entries. + Responses are capped at 200 entries; if truncated:true, narrow with filter or + switch back to find_data with a scope. +3. For EACH promising not-imported table, call describe_data(source_id, table_key) + to inspect columns and understand available values. +4. Based on column metadata, decide which columns to filter on and what values to use. +5. Call propose_load_plan(candidates=[...], reasoning="...") — the UI shows a + confirmation card. +6. Keep your text brief after propose_load_plan. The UI handles the rest. + +Workflow selection rubric (apply in order): +- User pasted/uploaded data, attached an image, or asked to process scratch files → Workflow 1 or 2. +- User asked "what data do you have / what's available / which sources are connected" → call + list_data() — it returns the per-source summary. Drill in with list_data(source_id, ...). + Do NOT rely solely on the summary below; it only shows counts. 
+- Otherwise, if connected data sources are listed below AND the user is describing data they want + to analyze (an entity, metric, time range, region, product, demo data, etc.) → start with + Workflow 3. Try regex variants (English + the user's language, synonyms, table-name fragments, + folder names) with find_data before giving up. The built-in 'sample_datasets' source is + included automatically. +- Only fall back to synthetic data after Workflow 3 returned no plausible matches. Rules: - After show_user_data_preview or propose_load_plan, keep text VERY brief. The UI shows the preview automatically. -- For sample datasets, NEVER use execute_python or write_file to recreate them. +- show_user_data_preview is ONLY for: (a) DataFrames you actually produced with execute_python via saved_dfs=, or (b) tables you literally extracted from a user-provided image or pasted text via tables=. NEVER use show_user_data_preview(tables=...) to narrate, describe, or invent contents of a connector-sourced table. To load ANY table from a connected source (including sample_datasets), you MUST use propose_load_plan. +- For sample datasets, NEVER use execute_python or write_file to recreate them — use Workflow 3. - execute_python auto-saves ALL DataFrames created in code. -- Use Workflow 4 when the user describes an analysis goal and you need to find relevant data from connected sources. -- In propose_load_plan, always pass source_id and table_key exactly from search_data_candidates/read_candidate_metadata. +- In propose_load_plan, always pass source_id and table_key exactly from find_data/describe_data. If propose_load_plan returns an error listing valid source_ids, re-run find_data with a better query and retry — do NOT guess IDs. - Do NOT set row_limit in propose_load_plan; the system applies the user's configured global limit automatically. Filter rules for propose_load_plan: -- You MUST call read_candidate_metadata BEFORE proposing filters. Do NOT guess column names or values. 
-- Use the column names exactly as returned by read_candidate_metadata. Do NOT invent column names. +- You MUST call describe_data BEFORE proposing filters. Do NOT guess column names or values. +- Use the column names exactly as returned by describe_data. Do NOT invent column names. - Filter values must be plain values without SQL wildcards. WRONG: "%奔图%". CORRECT: "奔图". - For partial text matching, use operator ILIKE — the backend adds wildcards automatically. - For exact matching of a known category value, use operator EQ. @@ -86,8 +109,8 @@ Current date and time: {current_time} Currently loaded workspace tables: {table_names} -Connected data sources: {connector_summary} -Sample datasets are available — call list_sample_datasets to see them. +Connected data sources: +{connector_summary} IMPORTANT: - When extracting tables: clean column names, remove units from values (note in headers), flatten multi-level headers. @@ -183,11 +206,26 @@ { "type": "function", "function": { - "name": "list_sample_datasets", - "description": "List available built-in sample datasets with their tables and the exact dataset_name to use with show_user_data_preview.", + "name": "list_data", + "description": ( + "Browse the catalog of connected data sources. Cache-only, fast.\n" + "- No args: per-source summary (source_id, table_count, is_hierarchical).\n" + "- source_id only: top-level entries (folders with table counts, plus root tables).\n" + "- source_id + path: direct children at that hierarchy level.\n" + "- filter: case-insensitive substring on the next path segment / table name (no regex here).\n" + "Workspace tables are already in the system prompt and are not repeated." + ), "parameters": { "type": "object", - "properties": {}, + "properties": { + "source_id": {"type": "string", "description": "Data source identifier. Omit for source-level summary."}, + "path": { + "type": "array", + "items": {"type": "string"}, + "description": "Hierarchy path as an array of segments (e.g. 
['sales', 'fy26']).", + }, + "filter": {"type": "string", "description": "Substring filter on the next path segment / table name."}, + }, "required": [], }, }, @@ -197,18 +235,14 @@ "function": { "name": "show_user_data_preview", "description": ( - "Show interactive table preview(s) with Load button. Three modes (use exactly one):\n" - "1. dataset_name: load built-in sample dataset by name\n" - "2. saved_dfs: reference DataFrames auto-saved by execute_python (by variable name)\n" - "3. tables: inline CSV data for direct extraction from text/images\n" + "Show interactive table preview(s) with Load button. Two modes (use exactly one):\n" + "1. saved_dfs: reference DataFrames auto-saved by execute_python (by variable name)\n" + "2. tables: inline CSV data for direct extraction from text/images\n" + "For tables in a connected source (including sample_datasets), use propose_load_plan instead." ), "parameters": { "type": "object", "properties": { - "dataset_name": { - "type": "string", - "description": "Exact dataset name from list_sample_datasets (e.g. 'Space launches'). All tables in the dataset are shown.", - }, "saved_dfs": { "type": "array", "description": "DataFrame variable names from execute_python (e.g. ['df_clean', 'df_summary'])", @@ -234,13 +268,30 @@ { "type": "function", "function": { - "name": "search_data_candidates", - "description": "Search across all data sources (workspace tables, connected databases, sample datasets) for tables matching a keyword.", + "name": "find_data", + "description": ( + "Regex search across cached catalogs for tables matching a query. " + "Searches table names, table descriptions, column names, and column descriptions.\n" + "- query: case-insensitive regex. Plain keywords work as literals; use alternation " + "(orders|sales|revenue), anchors (^fact_), word boundaries (\\border\\b), and optional " + "groups (customers?) when useful. Escape . 
if you mean a literal dot.\n" + "- scope: 'all' (default), 'workspace', 'connected', '', or ':' " + "to restrict to a subtree (path is /-joined segments).\n" + "- exclude: optional regex on table name to drop hits (e.g. '_staging|_test').\n" + "- fields: subset of ['name','description','columns'] to restrict matching; default is all." + ), "parameters": { "type": "object", "properties": { - "query": {"type": "string", "description": "Search keyword (e.g. 'orders', 'sales')"}, - "scope": {"type": "string", "enum": ["all", "workspace", "connected"], "description": "Search scope. Default: all"}, + "query": {"type": "string", "description": "Case-insensitive regex."}, + "scope": {"type": "string", "description": "Search scope. Default: all"}, + "exclude": {"type": "string", "description": "Optional regex; drops hits whose name matches."}, + "fields": { + "type": "array", + "items": {"type": "string", "enum": ["name", "description", "columns"]}, + "description": "Restrict matching to these fields. Default: all.", + }, + "limit": {"type": "integer", "description": "Max results. Default 50, max 200."}, }, "required": ["query"], }, @@ -249,8 +300,8 @@ { "type": "function", "function": { - "name": "read_candidate_metadata", - "description": "Read detailed metadata (columns, types, description, row count) for a specific table from a connected data source. Use source_id and table_key from search_data_candidates results.", + "name": "describe_data", + "description": "Read full metadata (columns, types, description, row count) for one table. Use source_id + table_key from find_data results.", "parameters": { "type": "object", "properties": { @@ -314,6 +365,84 @@ def _secure_filename(name: str) -> str: return name or "unnamed" + +def _summarize_catalog_shape(tables: list[dict]) -> tuple[int, int]: + """Return ``(table_count, distinct_folder_count)`` for a catalog. 
+ + Folder count is 0 when no table has a hierarchical ``path`` (depth >= 2); + flat catalogs report 0 folders so the summary stays terse. + """ + folders: set[str] = set() + any_hierarchy = False + for t in tables: + path = t.get("path") or [] + if isinstance(path, list) and len(path) >= 2: + any_hierarchy = True + folders.add(str(path[0])) + return len(tables), (len(folders) if any_hierarchy else 0) + + +def _build_connector_summary_block( + user_home, + *, + max_total_chars: int = 1200, +) -> str: + """Render a compact directory of cached connector catalogs. + + Only shows source IDs with table counts (and folder counts when the + catalog is hierarchical). The agent is expected to call ``list_data`` + for full inventory. + Strictly hard-capped at ``max_total_chars``. + """ + if not user_home: + return " none" + try: + from pathlib import Path + + from data_formulator.datalake.catalog_cache import list_cached_sources, load_catalog + except Exception: + logger.debug("connector summary: imports failed", exc_info=True) + return " none" + + try: + source_ids = list_cached_sources(user_home) + except Exception: + logger.debug("connector summary: list_cached_sources failed", exc_info=True) + return " none" + + if not source_ids: + return " none" + + user_home_path = Path(user_home) + lines: list[str] = [] + for sid in sorted(source_ids): + try: + tables = load_catalog(user_home_path, sid) or [] + except Exception: + logger.debug("connector summary: load_catalog failed for %s", sid, exc_info=True) + tables = [] + n, k = _summarize_catalog_shape(tables) + if n == 0: + lines.append(f"- {sid}: 0 tables cached") + elif k > 0: + lines.append( + f"- {sid}: {n} table{'s' if n != 1 else ''} " + f"across {k} folder{'s' if k != 1 else ''}" + ) + else: + lines.append(f"- {sid}: {n} table{'s' if n != 1 else ''}") + + lines.append( + " (call list_data() for sources, list_data(source_id, ...) to drill, " + "or find_data(query=...) 
to search)" + ) + + output = "\n".join(lines) + if len(output) > max_total_chars: + output = output[:max_total_chars].rstrip() + "\n ... (truncated)" + return output + + class DataLoadingAgent: """Conversational agent for data loading and extraction.""" @@ -383,6 +512,7 @@ def _agentic_loop(self, llm_messages, collected_text, actions, max_iterations): # Accumulate streaming response tool_calls_acc = {} # id -> {name, arguments_str} current_text = [] + accumulated_reasoning = None finish_reason = None for chunk in response: @@ -392,6 +522,11 @@ def _agentic_loop(self, llm_messages, collected_text, actions, max_iterations): delta = chunk.choices[0].delta finish_reason = chunk.choices[0].finish_reason + # Accumulate reasoning_content (DeepSeek V4 reasoning models) + accumulated_reasoning = accumulate_reasoning_content( + accumulated_reasoning, delta + ) + # Stream text tokens if hasattr(delta, 'content') and delta.content: collected_text.append(delta.content) @@ -421,6 +556,8 @@ def _agentic_loop(self, llm_messages, collected_text, actions, max_iterations): # Build assistant message with tool calls for LLM context assistant_msg = {"role": "assistant", "content": "".join(current_text) or None} + if accumulated_reasoning is not None: + assistant_msg["reasoning_content"] = accumulated_reasoning assistant_msg["tool_calls"] = [] for idx in sorted(tool_calls_acc.keys()): tc = tool_calls_acc[idx] @@ -495,30 +632,10 @@ def _agentic_loop(self, llm_messages, collected_text, actions, max_iterations): # ------------------------------------------------------------------ def _call_llm(self, messages, stream=True): - """Call the LLM with tool definitions, working around Client.get_completion - not supporting a `tools` parameter.""" - if self.client.endpoint == "openai": - client = openai.OpenAI( - base_url=self.client.params.get("api_base", None), - api_key=self.client.params.get("api_key", ""), - timeout=120, - ) - return client.chat.completions.create( - model=self.client.model, - 
messages=messages, - tools=TOOLS, - stream=stream, - ) - else: - params = self.client.params.copy() - return litellm.completion( - model=self.client.model, - messages=messages, - tools=TOOLS, - drop_params=True, - stream=stream, - **params, - ) + """Call the LLM with tool definitions.""" + return self.client.get_completion_with_tools( + messages, tools=TOOLS, stream=stream, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model), + ) # ------------------------------------------------------------------ # Tool execution @@ -537,14 +654,14 @@ def _execute_tool(self, name, args): return self._tool_list_directory(args, workspace_jail) elif name == "execute_python": return self._tool_execute_python(args) - elif name == "list_sample_datasets": - return self._tool_list_sample_datasets() elif name == "show_user_data_preview": return self._tool_show_user_data_preview(args, scratch_jail) - elif name == "search_data_candidates": - return self._tool_search_data_candidates(args) - elif name == "read_candidate_metadata": - return self._tool_read_candidate_metadata(args) + elif name == "list_data": + return self._tool_list_data(args) + elif name == "find_data": + return self._tool_find_data(args) + elif name == "describe_data": + return self._tool_describe_data(args) elif name == "propose_load_plan": return self._tool_propose_load_plan(args) else: @@ -674,7 +791,7 @@ def _tool_execute_python(self, args): "path": f"scratch/{safe_name}.csv", "rows": len(df), "columns": list(df.columns), - "preview": df.head(3).to_dict(orient="records"), + "preview": df_to_safe_records(df.head(3)), } if saved: @@ -692,19 +809,17 @@ def _tool_execute_python(self, args): return {"stdout": "", "error": "Code execution failed"} def _tool_show_user_data_preview(self, args, scratch_jail): - """Unified data preview with 3 modes.""" - dataset_name = args.get("dataset_name") + """Unified data preview. 
To load from a connected source (including + the built-in 'sample_datasets'), use propose_load_plan instead.""" saved_dfs = args.get("saved_dfs") tables = args.get("tables") - if dataset_name: - return self._preview_sample_dataset(dataset_name) - elif saved_dfs: + if saved_dfs: return self._preview_saved_dfs(saved_dfs, scratch_jail) elif tables: return self._preview_inline_tables(tables, scratch_jail) else: - return {"error": "Provide one of: dataset_name, saved_dfs, or tables."} + return {"error": "Provide one of: saved_dfs or tables. For connected-source tables (including sample_datasets), use propose_load_plan."} def _preview_saved_dfs(self, df_names, scratch_jail): """Preview DataFrames auto-saved by execute_python.""" @@ -729,7 +844,7 @@ def _preview_saved_dfs(self, df_names, scratch_jail): "type": "preview_table", "name": name, "columns": list(df.columns), - "sample_rows": df.head(5).to_dict(orient="records"), + "sample_rows": df_to_safe_records(df.head(5)), "total_rows": len(df), "csv_scratch_path": f"scratch/{safe_name}.csv", }) @@ -756,7 +871,7 @@ def _preview_inline_tables(self, tables, scratch_jail): "type": "preview_table", "name": name, "columns": list(df.columns), - "sample_rows": df.head(5).to_dict(orient="records"), + "sample_rows": df_to_safe_records(df.head(5)), "total_rows": len(df), "csv_scratch_path": f"scratch/{name}.csv", }) @@ -791,7 +906,7 @@ def _preview_scratch_files(self, scratch_files, scratch_dir): "type": "preview_table", "name": table_name, "columns": list(df.columns), - "sample_rows": df.head(5).to_dict(orient="records"), + "sample_rows": df_to_safe_records(df.head(5)), "total_rows": len(df), "csv_scratch_path": file_path, }) @@ -801,107 +916,167 @@ def _preview_scratch_files(self, scratch_files, scratch_dir): return {"actions": actions} - def _tool_list_sample_datasets(self): - """Return structured list of available datasets with call syntax.""" - from data_formulator.example_datasets_config import EXAMPLE_DATASETS + # 
------------------------------------------------------------------ + # Data discovery tools + # ------------------------------------------------------------------ - datasets = [] - for ds in EXAMPLE_DATASETS: - tables = [] - for t in ds.get("tables", []): - url = t.get("url", "") - table_name = url.split("/")[-1].split(".")[0] if url else "table" - sample = t.get("sample", []) - if isinstance(sample, list) and sample: - cols = list(sample[0].keys()) if isinstance(sample[0], dict) else [] - elif isinstance(sample, str) and sample.strip(): - header = sample.strip().split("\n")[0] - sep = "," if t.get("format") == "csv" else "\t" - cols = header.split(sep) - else: - cols = [] - tables.append({"table_name": table_name, "columns_preview": cols[:6]}) - - datasets.append({ - "dataset_name": ds["name"], - "description": ds.get("description", ""), - "tables": tables, - "call": f'show_user_data_preview(dataset_name="{ds["name"]}")', - }) - - return {"datasets": datasets} - - def _preview_sample_dataset(self, dataset_name): - """Build preview actions for a built-in sample dataset (exact match).""" - from data_formulator.example_datasets_config import EXAMPLE_DATASETS - - matched = None - for ds in EXAMPLE_DATASETS: - if ds["name"].lower() == dataset_name.lower().strip(): - matched = ds - break + def _tool_list_data(self, args): + """Browse the catalog hierarchy. - if not matched: - available = ", ".join(ds["name"] for ds in EXAMPLE_DATASETS) - return {"error": f"Dataset '{dataset_name}' not found. Available: {available}. 
Use list_sample_datasets to see exact names."} - - tables_info = [] - for table in matched.get("tables", []): - sample = table.get("sample", []) - if isinstance(sample, list) and len(sample) > 0: - columns = list(sample[0].keys()) if isinstance(sample[0], dict) else [] - tables_info.append({ - "table_url": table.get("url", ""), - "format": table.get("format", "json"), - "columns": columns, - "sample_rows": sample[:5], - "total_sample_rows": len(sample), - }) - elif isinstance(sample, str) and sample.strip(): - try: - df = pd.read_csv(io.StringIO(sample.strip()), - sep="," if table.get("format") == "csv" else "\t") - tables_info.append({ - "table_url": table.get("url", ""), - "format": table.get("format", "csv"), - "columns": list(df.columns), - "sample_rows": df.head(5).to_dict(orient="records"), - "total_sample_rows": len(df), - }) - except Exception: - tables_info.append({ - "table_url": table.get("url", ""), - "format": table.get("format", "csv"), - "columns": [], - "sample_rows": [], - "total_sample_rows": 0, - }) + Three modes: + * no args → per-source summary + * source_id only → top-level entries of that source + * source_id + path → direct children at that level - actions = [{ - "type": "load_sample_dataset", - "name": matched["name"], - "description": matched.get("description", ""), - "live": matched.get("live", False), - "refreshIntervalSeconds": matched.get("refreshIntervalSeconds"), - "tables": tables_info, - }] + Cache-only. Workspace tables are not included; they're already in the + system prompt. See design-docs/32-data-loading-agent-navigation.md §3.1. 
+ """ + from data_formulator.datalake.catalog_cache import ( + list_path_children, + list_sources_summary, + ) - return {"actions": actions} + user_home = getattr(self.workspace, "user_home", None) + if not user_home: + return {"sources": []} - # ------------------------------------------------------------------ - # Data discovery tools - # ------------------------------------------------------------------ + source_id = (args.get("source_id") or "").strip() + if not source_id: + try: + return {"sources": list_sources_summary(user_home)} + except Exception: + logger.debug("list_data: list_sources_summary failed", exc_info=True) + return {"sources": []} - def _tool_search_data_candidates(self, args): - """Search across workspace + connector catalog for matching tables.""" - from data_formulator.agents.context import handle_search_data_tables - query = args.get("query", "") - scope = args.get("scope", "all") - text = handle_search_data_tables(query, scope, self.workspace) - return {"result": text} + path = args.get("path") or [] + if not isinstance(path, list): + return {"error": "path must be an array of strings"} + filter_arg = args.get("filter") + + try: + return list_path_children( + user_home, source_id, path=path, filter=filter_arg, + ) + except Exception as exc: + logger.debug("list_data: list_path_children failed", exc_info=True) + return {"error": f"list_data failed: {exc}"} + + def _tool_find_data(self, args): + """Regex search across cached catalogs. + + ``scope`` accepts: 'all' (default), 'workspace', 'connected', + '', or ':'. The + path-scoped form restricts catalog search to a subtree. + + Workspace tables are searched with a plain substring match (they're + small, regex-on-name has little extra value there). Catalog cache + search is regex-based. See design-docs §3.2. 
+ """ + from data_formulator.datalake.catalog_cache import ( + CatalogSearchError, + search_catalog_cache, + ) + + query = (args.get("query") or "").strip() + if not query: + return {"error": "query is required"} + + scope_raw = (args.get("scope") or "all").strip() + exclude = args.get("exclude") or None + fields = args.get("fields") or None + limit = args.get("limit") + try: + limit = max(1, min(int(limit), 200)) if limit else 50 + except (TypeError, ValueError): + limit = 50 + + # ── Parse scope ─────────────────────────────────────────────── + search_workspace = False + source_ids: list[str] | None = None + path_prefix: list[str] | None = None + + if scope_raw == "all": + search_workspace = True + elif scope_raw == "workspace": + search_workspace = True + source_ids = [] # skip catalog cache entirely + elif scope_raw == "connected": + pass # catalog only, all sources + elif ":" in scope_raw: + sid, _, path_str = scope_raw.partition(":") + source_ids = [sid.strip()] if sid.strip() else [] + path_prefix = [seg for seg in path_str.split("/") if seg] + else: + source_ids = [scope_raw] + + user_home = getattr(self.workspace, "user_home", None) + results: list[dict] = [] + + # ── Workspace search (substring; existing semantics) ───────── + if search_workspace: + try: + ws_meta = self.workspace.get_metadata() + if ws_meta: + ws_hits = ws_meta.search_tables(query, limit=min(limit, 50)) + for hit in ws_hits: + results.append({ + "source": "workspace", + "name": hit["name"], + "description": (hit.get("description") or "")[:120], + "matched_columns": hit.get("matched_columns", []), + "status": "imported", + }) + except Exception: + logger.debug("find_data: workspace search failed", exc_info=True) + + # ── Catalog cache search (regex) ───────────────────────────── + if source_ids != [] and user_home: + try: + imported_names = {r["name"] for r in results} + cache_hits = search_catalog_cache( + user_home, + query, + source_ids=source_ids, + limit_per_source=min(limit, 50), + 
exclude_tables=imported_names, + exclude_pattern=exclude, + fields=fields, + path_prefix=path_prefix, + ) + for hit in cache_hits[:limit]: + results.append({ + "source": hit.get("source_id", "connected"), + "source_id": hit.get("source_id", ""), + "table_key": hit.get("table_key", ""), + "name": hit["name"], + "description": (hit.get("description") or "")[:120], + "matched_columns": hit.get("matched_columns", []), + "status": "not imported", + }) + except CatalogSearchError as exc: + return {"error": str(exc)} + except Exception: + logger.debug("find_data: catalog search failed", exc_info=True) - def _tool_read_candidate_metadata(self, args): - """Read detailed metadata for a specific table from a connected source.""" + if not results: + try: + from data_formulator.datalake.catalog_cache import list_cached_sources + known = sorted(list_cached_sources(user_home) or []) if user_home else [] + except Exception: + known = [] + return { + "results": [], + "valid_source_ids": known, + "note": ( + f"No tables matched query={query!r} scope={scope_raw!r}. " + "Try a broader pattern, alternation (a|b), or list_data to browse." + ), + } + + return {"results": results[:limit], "query": query, "scope": scope_raw} + + def _tool_describe_data(self, args): + """Read detailed metadata for one table. Delegates to context handler.""" from data_formulator.agents.context import handle_read_catalog_metadata source_id = args.get("source_id", "") table_key = args.get("table_key", "") @@ -909,13 +1084,37 @@ def _tool_read_candidate_metadata(self, args): return {"result": text} def _tool_propose_load_plan(self, args): - """Produce a structured load plan action for frontend rendering.""" - candidates = [ - self._normalize_load_plan_candidate(c) - for c in (args.get("candidates", []) or []) - if isinstance(c, dict) - ] + """Produce a structured load plan action for frontend rendering. + + Candidates are validated against the cached catalog before they leave + this turn. 
If *every* candidate fails to resolve, we return a + recoverable error so the model can retry with corrected IDs instead + of emitting a card the user can't actually use. + """ + raw = [c for c in (args.get("candidates", []) or []) if isinstance(c, dict)] + candidates = [self._normalize_load_plan_candidate(c) for c in raw] reasoning = args.get("reasoning", "") + + resolvable = [c for c in candidates if not c.get("resolution_error")] + if candidates and not resolvable: + # All candidates failed. Hand the model the valid IDs and ask it + # to retry. Returning an "error" here keeps the assistant loop + # alive; the frontend never sees a broken card. + hint = self._format_valid_sources_hint() + failures = "; ".join( + f"{c.get('source_id')!r}/{c.get('table_key')!r}: {c.get('resolution_error')}" + for c in candidates + ) + return { + "error": ( + "All proposed candidates failed to resolve against the catalog. " + f"Errors: {failures}. " + "Re-run search_data_candidates and read_candidate_metadata, then " + "call propose_load_plan again with the exact source_id and " + f"table_key from those tools.\n\n{hint}" + ) + } + actions = [{ "type": "load_plan", "candidates": candidates, @@ -929,11 +1128,34 @@ def _normalize_load_plan_candidate(self, candidate): The model sees catalog names and stable table keys, but each loader may require a different opaque import id. Superset, for example, must be loaded by numeric dataset_id, not by the Chinese dataset label. + + If ``source_id`` is not a known cached source or ``table_key`` does + not match any catalog entry, a ``resolution_error`` field is set so + the caller can fail loudly (rather than emit a card that 500s when + the user clicks Load). 
""" result = dict(candidate) source_id = str(result.get("source_id") or "") table_key = str(result.get("table_key") or "") + + resolution_error = None + known_sources = self._known_source_ids() + if not source_id: + resolution_error = "missing source_id" + elif known_sources and source_id not in known_sources: + resolution_error = ( + f"unknown source_id {source_id!r}; " + f"valid: {', '.join(sorted(known_sources)) or 'none'}" + ) + catalog_entry = self._lookup_catalog_entry(source_id, table_key) + if resolution_error is None and not catalog_entry: + if not table_key: + resolution_error = "missing table_key" + else: + resolution_error = ( + f"table_key {table_key!r} not found in source {source_id!r}" + ) metadata = (catalog_entry or {}).get("metadata") or {} display_name = ( @@ -963,9 +1185,30 @@ def _normalize_load_plan_candidate(self, candidate): result["source_table"] = str(import_id) result["source_table_name"] = str(source_name) result["filters"] = self._normalize_load_plan_filters(result.get("filters")) + if resolution_error: + result["resolution_error"] = resolution_error result.pop("row_limit", None) return result + def _known_source_ids(self): + """Return the set of cached source_ids the agent can legitimately use.""" + try: + user_home = getattr(self.workspace, "user_home", None) + if not user_home: + return set() + from data_formulator.datalake.catalog_cache import list_cached_sources + return set(list_cached_sources(user_home) or []) + except Exception: + logger.debug("Could not list cached sources", exc_info=True) + return set() + + def _format_valid_sources_hint(self) -> str: + """Compact directory of valid source_ids for the model retry path.""" + known = self._known_source_ids() + if not known: + return "No connected sources are currently cached." 
+ return "Valid source_ids: " + ", ".join(sorted(known)) + def _lookup_catalog_entry(self, source_id, table_key): if not source_id or not table_key: return None @@ -1066,16 +1309,8 @@ def _build_system_prompt(self, last_user_text: str = ""): message_code="TABLE_LIST_FAILED", ) - connector_summary = "none" - try: - user_home = getattr(self.workspace, "user_home", None) - if user_home: - from data_formulator.datalake.catalog_cache import list_cached_sources - sources = list_cached_sources(user_home) - if sources: - connector_summary = ", ".join(sources) - except Exception: - logger.debug("Could not list cached sources for prompt", exc_info=True) + user_home = getattr(self.workspace, "user_home", None) + connector_summary = _build_connector_summary_block(user_home) from datetime import datetime current_time = datetime.now().strftime("%Y-%m-%d %H:%M (%A)") diff --git a/py-src/data_formulator/agents/agent_data_rec.py b/py-src/data_formulator/agents/agent_data_rec.py index af7981e0..8bd9f054 100644 --- a/py-src/data_formulator/agents/agent_data_rec.py +++ b/py-src/data_formulator/agents/agent_data_rec.py @@ -4,8 +4,10 @@ import json import time -from data_formulator.agents.agent_utils import extract_json_objects, extract_code_from_gpt_response, generate_data_summary, supplement_missing_block, ensure_output_variable_in_code +from data_formulator.agent_config import reasoning_effort_for +from data_formulator.agents.agent_utils import extract_json_objects, extract_code_from_gpt_response, generate_data_summary, supplement_missing_block, ensure_output_variable_in_code, compose_system_prompt from data_formulator.agents.agent_diagnostics import AgentDiagnostics +from data_formulator.datalake.parquet_utils import df_to_safe_records from data_formulator.security.sanitize import sanitize_error_message import pandas as pd @@ -14,108 +16,15 @@ logger = logging.getLogger(__name__) -# ============================================================================= -# Shared prompt 
sections (imported by DataTransformationAgent) -# ============================================================================= - -SHARED_ENVIRONMENT = '''**About the execution environment:** -- You can use BOTH DuckDB SQL and pandas operations in the same script -- The script will run in the workspace data directory (all data files are in the current directory) -- Each table in [CONTEXT] has a **file path** (e.g., `student_exam.parquet`, `sales.csv`). Use EXACTLY that path to load data: - - `.parquet`: `pd.read_parquet('file.parquet')` or DuckDB `read_parquet('file.parquet')` - - `.csv`: `pd.read_csv('file.csv')` or DuckDB `read_csv_auto('file.csv')` - - `.json`: `pd.read_json('file.json')` - - `.xlsx`/`.xls`: `pd.read_excel('file.xlsx')` - - `.txt`: `pd.read_csv('file.txt', sep='\\t')` -- **IMPORTANT:** Use the exact filename from the context — do NOT change the file extension or assume all files are parquet. -- **Allowed libraries:** pandas, numpy, duckdb, math, datetime, json, statistics, collections, re, sklearn, scipy, random, itertools, functools, operator, time -- **Not allowed:** matplotlib, plotly, seaborn, requests, subprocess, os, sys, io, or any other library not listed above. -- File system access (open, write) and network access are also forbidden. - -**When to use DuckDB vs pandas:** -- **Prefer plain pandas** for most tasks — it's simpler and more readable. -- Only use DuckDB when the dataset is very large and you need efficient SQL aggregations, filtering, joins, or window functions. -- You can combine both: DuckDB for initial loading/filtering on large files, then pandas for complex operations. - -**Code structure:** standalone script (no function wrapper), imports at top. **CRITICAL:** The final result DataFrame MUST be assigned to the exact variable name you specified in `"output_variable"` in the JSON spec — the system uses this name to extract the result. 
For example, if your output_variable is `sales_by_region`, the script must contain `sales_by_region = ...`.''' - - -SHARED_SEMANTIC_TYPE_REFERENCE = '''**[SEMANTIC TYPE REFERENCE]** - -Choose the most specific type that fits. Only annotate fields used in chart encodings. - -| Category | Types | -|---|---| -| Temporal | DateTime, Date, Time, Timestamp, Year, Quarter, Month, Week, Day, Hour, YearMonth, YearQuarter, YearWeek, Decade, Duration | -| Monetary measures | Amount, Price | -| Physical measures | Quantity, Temperature | -| Proportion | Percentage | -| Signed/diverging | Profit, PercentageChange, Sentiment, Correlation | -| Generic measures | Count, Number | -| Discrete numeric | Rank, Score | -| Identifier | ID | -| Geographic | Latitude, Longitude, Country, State, City, Region, Address, ZipCode | -| Entity names | Category, Name | -| Coded categorical | Status, Boolean, Direction | -| Binned ranges | Range | -| Fallback | Unknown | - -Key guidelines: -- Use **Amount** for summed monetary totals, **Price** for per-unit prices, **Profit** for values that can be negative. -- Use **Temperature** (not Quantity) for temperature — it has special diverging behavior. 
-- Use **Year** (not Number) for columns like "year" with values 2020, 2021.''' - - -SHARED_CHART_REFERENCE = '''**[CHART TYPE REFERENCE]** - -| chart_type | encodings | config | -|---|---|---| -| Scatter Plot | x, y, color, size, facet | opacity (0.1–1.0) | -| Regression | x, y, color, size, facet | regressionMethod ("linear","log","exp","pow","quad","poly"), polyOrder (2–10) | -| Bar Chart | x, y, color, facet | — | -| Grouped Bar Chart | x, y, group, facet | — | -| Line Chart | x, y, color, strokeDash, facet | interpolate ("linear","monotone","step") | -| Area Chart | x, y, color, facet | — | -| Heatmap | x, y, color, facet | colorScheme ("viridis","blues","reds","oranges","greens","blueorange","redblue") | -| Boxplot | x, y, color, facet | — | -| Pie Chart | size, color, facet | innerRadius (0–100; 0=pie, >0=donut) | -| Lollipop Chart | x, y, color, facet | — | -| Waterfall Chart | x, y, color, facet | — | -| Candlestick Chart | x, open, high, low, close, facet | — | -| World Map | longitude, latitude, color, size | projection ("mercator","equalEarth","naturalEarth1","orthographic"), projectionCenter ([lon,lat]) | -| US Map | longitude, latitude, color, size | — (fixed albersUsa) | - -**Critical chart rules:** -- **Scatter Plot**: good default for relationships/correlations. Use config opacity (0.1–1.0) for dense data instead of encoding opacity. -- **Regression**: automatically overlays a trend line — do NOT compute regression in Python. Use color to get separate trend lines per group. -- **Bar Chart**: x=categorical, y=quantitative (vertical bars). Swap x↔y for horizontal bars. For histograms/distributions, bin the data in the Python step. Same-x rows are auto-stacked. -- **Grouped Bar Chart**: use the group channel (not color) for side-by-side bars. -- **Line Chart**: use strokeDash to differentiate line styles (e.g. actual vs forecast). -- **Pie Chart**: use "size" channel (not "theta") for the wedge values. Avoid when >7–8 categories. 
-- **Lollipop Chart**: like bar but with dot+line — cleaner for ranked comparisons. -- **Waterfall Chart**: cumulative gain/loss — each bar starts where the previous ended. -- **Candlestick Chart**: OHLC financial data — requires open, high, low, close columns. -- **World Map/US Map**: use "longitude"/"latitude" as channel names, not "x"/"y". -- **facet**: available for all chart types; use a categorical field with small cardinality. -- All fields in "encodings" must also appear in "output_fields". Typically use 2–3 channels (x, y, color/size).''' - - -SHARED_STATISTICAL_ANALYSIS = '''**Statistical analysis guide:** -- **Regression**: use chart_type "Regression" — the trend line is automatic, do NOT compute regression values in Python code. Configure method via `{"regressionMethod": "linear"}` (options: "linear", "log", "exp", "pow", "quad", "poly"; for poly add `{"polyOrder": 3}`). -- **Forecasting**: compute predicted future values in Python. Use Line Chart with strokeDash to distinguish actual vs forecast, and color for series grouping. -- **Clustering**: compute cluster assignments in Python. Output [x, y, cluster_id]. Use Scatter Plot with color → cluster_id.''' - - -SHARED_DUCKDB_NOTES = '''**DuckDB notes:** -- Escape single quotes with '' (not \\') -- No Unicode escapes (\\u0400); use character ranges directly: [а-яА-Я] -- Cast date columns explicitly: `CAST(col AS DATE)`, `CAST(col AS TIMESTAMP)` -- For complex datetime operations, load data first then use pandas datetime functions -- Critical identifier quoting rule: - * If a table/column name contains non-ASCII characters (e.g., Chinese, Japanese, Korean, Cyrillic, etc.), spaces, or punctuation, - you MUST wrap it in double quotes, e.g. SELECT "金额" FROM "客户表". 
- * Never output placeholder identifiers like your_table_name, your_column, your_condition.''' +_AGENT_ID = "data_rec" +from data_formulator.agents.chart_creation_guide import ( + SHARED_ENVIRONMENT, + SHARED_SEMANTIC_TYPE_REFERENCE, + SHARED_CHART_REFERENCE, + SHARED_STATISTICAL_ANALYSIS, + SHARED_DUCKDB_NOTES, +) # ============================================================================= # DataRecAgent system prompt @@ -193,29 +102,18 @@ def __init__(self, client, workspace, system_prompt=None, agent_coding_rules="", if system_prompt is not None: self._base_prompt = system_prompt - self.system_prompt = system_prompt else: self._base_prompt = SYSTEM_PROMPT - base_prompt = SYSTEM_PROMPT - if combined_rules: - self.system_prompt = base_prompt + "\n\n[AGENT CODING RULES]\nPlease follow these rules when generating code. Note: if the user instruction conflicts with these rules, you should prioritize user instructions.\n\n" + combined_rules - else: - self.system_prompt = base_prompt - - if language_instruction: - # Insert early (after role definition, before technical sections) - # so the LLM's "last impression" remains chart/code rules, - # reducing recency-bias interference on chart-type selection. - marker = "**About the execution environment:**" - idx = self.system_prompt.find(marker) - if idx > 0: - self.system_prompt = ( - self.system_prompt[:idx] - + language_instruction + "\n\n" - + self.system_prompt[idx:] - ) - else: - self.system_prompt = self.system_prompt + "\n\n" + language_instruction + + # Insert language instruction early (after role definition, before technical + # sections) so the LLM's "last impression" remains chart/code rules, + # reducing recency-bias interference on chart-type selection. 
+ self.system_prompt = compose_system_prompt( + self._base_prompt, + agent_coding_rules=combined_rules if system_prompt is None else "", + language_instruction=language_instruction, + language_marker="**About the execution environment:**", + ) self._diag = AgentDiagnostics( agent_name="DataRecAgent", @@ -349,7 +247,7 @@ def process_gpt_response(self, input_tables, messages, response, t_llm=None): "status": "ok", "code": code, "content": { - 'rows': json.loads(query_output.to_json(orient='records')), + 'rows': df_to_safe_records(query_output), 'virtual': { 'table_name': output_table_name, 'row_count': row_count @@ -461,7 +359,7 @@ def run(self, input_tables, description, n=1, prev_messages: list[dict] = [], pr {"role":"user","content": user_query}] t_llm_start = time.time() - response = self.client.get_completion(messages = messages) + response = self.client.get_completion(messages=messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) t_llm = time.time() - t_llm_start candidates = self.process_gpt_response(input_tables, messages, response, t_llm=t_llm) @@ -496,7 +394,7 @@ def followup(self, input_tables, dialog, latest_data_sample, new_instruction: st "content": f"This is the result from the latest transformation:\n\n{sample_data_str}\n\nUpdate the Python script above based on the following instruction:\n\n{new_instruction}"}] t_llm_start = time.time() - response = self.client.get_completion(messages = messages) + response = self.client.get_completion(messages=messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) t_llm = time.time() - t_llm_start return self.process_gpt_response(input_tables, messages, response, t_llm=t_llm) diff --git a/py-src/data_formulator/agents/agent_data_transform.py b/py-src/data_formulator/agents/agent_data_transform.py index 59be8730..bdd842d2 100644 --- a/py-src/data_formulator/agents/agent_data_transform.py +++ b/py-src/data_formulator/agents/agent_data_transform.py @@ -4,16 +4,17 @@ 
import json import time -from data_formulator.agents.agent_utils import extract_json_objects, extract_code_from_gpt_response, supplement_missing_block, ensure_output_variable_in_code +from data_formulator.agent_config import reasoning_effort_for +from data_formulator.agents.agent_utils import extract_json_objects, extract_code_from_gpt_response, supplement_missing_block, ensure_output_variable_in_code, compose_system_prompt from data_formulator.agents.agent_diagnostics import AgentDiagnostics +from data_formulator.datalake.parquet_utils import df_to_safe_records from data_formulator.security.sanitize import sanitize_error_message -from data_formulator.agents.agent_data_rec import ( +from data_formulator.agents.chart_creation_guide import ( SHARED_ENVIRONMENT, SHARED_SEMANTIC_TYPE_REFERENCE, SHARED_CHART_REFERENCE, SHARED_STATISTICAL_ANALYSIS, SHARED_DUCKDB_NOTES, - ) import pandas as pd @@ -21,6 +22,8 @@ logger = logging.getLogger(__name__) +_AGENT_ID = "data_transform" + SYSTEM_PROMPT = f'''You are a data scientist who transforms data for visualization. Given [CONTEXT] (dataset summaries) and [GOAL] (user intent + chart spec), refine the goal and write a Python script to produce the transformed data. @@ -99,26 +102,15 @@ def __init__(self, client, workspace, system_prompt=None, agent_coding_rules="", if system_prompt is not None: self._base_prompt = system_prompt - self.system_prompt = system_prompt else: self._base_prompt = SYSTEM_PROMPT - base_prompt = SYSTEM_PROMPT - if combined_rules: - self.system_prompt = base_prompt + "\n\n[AGENT CODING RULES]\nPlease follow these rules when generating code. 
Note: if the user instruction conflicts with these rules, you should prioritize user instructions.\n\n" + combined_rules - else: - self.system_prompt = base_prompt - - if language_instruction: - marker = "**About the execution environment:**" - idx = self.system_prompt.find(marker) - if idx > 0: - self.system_prompt = ( - self.system_prompt[:idx] - + language_instruction + "\n\n" - + self.system_prompt[idx:] - ) - else: - self.system_prompt = self.system_prompt + "\n\n" + language_instruction + + self.system_prompt = compose_system_prompt( + self._base_prompt, + agent_coding_rules=combined_rules if system_prompt is None else "", + language_instruction=language_instruction, + language_marker="**About the execution environment:**", + ) self._diag = AgentDiagnostics( agent_name="DataTransformationAgent", @@ -250,7 +242,7 @@ def process_gpt_response(self, response, messages, t_llm=None): "status": "ok", "code": code, "content": { - 'rows': json.loads(query_output.to_json(orient='records')), + 'rows': df_to_safe_records(query_output), 'virtual': { 'table_name': output_table_name, 'row_count': row_count @@ -383,16 +375,16 @@ def run(self, input_tables, description, prev_messages: list[dict] = [], n=1, {"role":"user","content": user_content}] t_llm_start = time.time() - response = self.client.get_completion(messages = messages) + response = self.client.get_completion(messages=messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) t_llm = time.time() - t_llm_start except Exception as e: # Fallback to text-only if model doesn't support images logger.warning(f"Image-based completion failed, falling back to text-only: {e}") - messages = [{"role":"system", "content": self.system_prompt}, + messages = [{'role':'system', 'content': self.system_prompt}, *filtered_prev_messages, - {"role":"user","content": user_query}] + {'role':'user','content': user_query}] t_llm_start = time.time() - response = self.client.get_completion(messages = messages) + response = 
self.client.get_completion(messages=messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) t_llm = time.time() - t_llm_start candidates = self.process_gpt_response(response, messages, t_llm=t_llm) @@ -454,14 +446,14 @@ def followup(self, input_tables, dialog, latest_data_sample, new_instruction: st messages = [*updated_dialog, {"role":"user", "content": user_content}] t_llm_start = time.time() - response = self.client.get_completion(messages = messages) + response = self.client.get_completion(messages=messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) t_llm = time.time() - t_llm_start except Exception as e: # Fallback to text-only if model doesn't support images logger.warning(f"Image-based completion failed, falling back to text-only: {e}") - messages = [*updated_dialog, {"role":"user", "content": followup_text}] + messages = [*updated_dialog, {'role':'user', 'content': followup_text}] t_llm_start = time.time() - response = self.client.get_completion(messages = messages) + response = self.client.get_completion(messages=messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) t_llm = time.time() - t_llm_start candidates = self.process_gpt_response(response, messages, t_llm=t_llm) diff --git a/py-src/data_formulator/agents/agent_experience_distill.py b/py-src/data_formulator/agents/agent_experience_distill.py index ccf9c5c3..cc738495 100644 --- a/py-src/data_formulator/agents/agent_experience_distill.py +++ b/py-src/data_formulator/agents/agent_experience_distill.py @@ -20,13 +20,13 @@ from datetime import datetime, timezone from typing import Any -import litellm -import openai - +from data_formulator.agent_config import reasoning_effort_for from data_formulator.agents.client_utils import Client logger = logging.getLogger(__name__) +_AGENT_ID = "experience_distill" + SYSTEM_PROMPT = """\ You are a knowledge distiller. 
Given the chronological events of a data @@ -386,26 +386,9 @@ def _render_events(cls, events: list[Any]) -> str: def _call_llm(self, messages: list[dict]) -> str: """Single LLM call to generate the experience document.""" - if self.client.endpoint == "openai": - client = openai.OpenAI( - base_url=self.client.params.get("api_base"), - api_key=self.client.params.get("api_key", ""), - timeout=self.timeout_seconds, - ) - resp = client.chat.completions.create( - model=self.client.model, - messages=messages, - ) - else: - params = self.client.params.copy() - params.setdefault("timeout", self.timeout_seconds) - resp = litellm.completion( - model=self.client.model, - messages=messages, - drop_params=True, - **params, - ) - + resp = self.client.get_completion( + messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model), timeout=self.timeout_seconds, + ) return resp.choices[0].message.content or "" @staticmethod diff --git a/py-src/data_formulator/agents/agent_interactive_explore.py b/py-src/data_formulator/agents/agent_interactive_explore.py index dd4c81fa..67847ec2 100644 --- a/py-src/data_formulator/agents/agent_interactive_explore.py +++ b/py-src/data_formulator/agents/agent_interactive_explore.py @@ -7,10 +7,13 @@ import pandas as pd -import litellm -import openai - -from data_formulator.agents.agent_utils import extract_json_objects, generate_data_summary +from data_formulator.agent_config import reasoning_effort_for +from data_formulator.agents.agent_utils import ( + attach_reasoning_content, + extract_json_objects, + generate_data_summary, +) +from data_formulator.agents.agent_language import inject_language_instruction from data_formulator.agents.context import ( build_focused_thread_context, build_lightweight_table_context, @@ -20,6 +23,8 @@ logger = logging.getLogger(__name__) +_AGENT_ID = "interactive_explore" + # ── Tool definition (inspect only) ──────────────────────────────────────── INSPECT_TOOL = { @@ -184,8 +189,7 @@ def run(self, 
input_tables, start_question=None, if self._knowledge_store: system_prompt += self._knowledge_store.format_rules_block() - if self.language_instruction: - system_prompt = system_prompt + "\n\n" + self.language_instruction + system_prompt = inject_language_instruction(system_prompt, self.language_instruction) ctx_elapsed = time.time() - t_ctx logger.info( @@ -222,7 +226,7 @@ def run(self, input_tables, start_question=None, # ── Stream the final response ───────────────────────────────── try: - stream = self.client.get_completion(messages=messages, stream=True) + stream = self.client.get_completion(messages=messages, stream=True, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) except Exception as e: # If image fails, retry without it if current_chart: @@ -303,6 +307,7 @@ def _run_inspect_round(self, messages, input_tables): for tc in tool_calls ], } + attach_reasoning_content(assistant_msg, choice.message) messages.append(assistant_msg) # Execute each tool call @@ -333,45 +338,6 @@ def _run_inspect_round(self, messages, input_tables): def _call_llm_with_tools(self, messages, tools): """Non-streaming LLM call with tool definitions.""" - if self.client.endpoint == "openai": - client = openai.OpenAI( - base_url=self.client.params.get("api_base", None), - api_key=self.client.params.get("api_key", ""), - timeout=120, - ) - try: - return client.chat.completions.create( - model=self.client.model, - messages=messages, - tools=tools, - ) - except Exception as e: - if self.client._is_image_deserialize_error(str(e)): - sanitized = self.client._strip_images_from_messages(messages) - return client.chat.completions.create( - model=self.client.model, - messages=sanitized, - tools=tools, - ) - raise - else: - params = self.client.params.copy() - try: - return litellm.completion( - model=self.client.model, - messages=messages, - tools=tools, - drop_params=True, - **params, - ) - except Exception as e: - if self.client._is_image_deserialize_error(str(e)): - 
sanitized = self.client._strip_images_from_messages(messages) - return litellm.completion( - model=self.client.model, - messages=sanitized, - tools=tools, - drop_params=True, - **params, - ) - raise \ No newline at end of file + return self.client.get_completion_with_tools( + messages, tools=tools, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model), + ) \ No newline at end of file diff --git a/py-src/data_formulator/agents/agent_language.py b/py-src/data_formulator/agents/agent_language.py index 38627d76..e32fe016 100644 --- a/py-src/data_formulator/agents/agent_language.py +++ b/py-src/data_formulator/agents/agent_language.py @@ -112,9 +112,12 @@ def build_language_instruction(language: str, *, mode: str = "full") -> str: ``"full"`` – detailed field-level rules (for text-heavy agents). ``"compact"`` – minimal instruction (for code-generation agents). - Returns ``""`` when *language* is ``"en"`` (or empty / unrecognised). + Returns ``""`` when *language* is ``"en"``, or when it is empty / + whitespace-only (which normalises to the default language, English). + For unrecognised codes (e.g. ``"xx"``), a non-empty instruction block + is still returned using the raw code as the display name. 
""" - lang = (language or DEFAULT_LANGUAGE).strip().lower() + lang = ((language or "").strip().lower()) or DEFAULT_LANGUAGE.lower() if lang == "en": return "" diff --git a/py-src/data_formulator/agents/agent_report_gen.py b/py-src/data_formulator/agents/agent_report_gen.py index 06749fa7..9d936b28 100644 --- a/py-src/data_formulator/agents/agent_report_gen.py +++ b/py-src/data_formulator/agents/agent_report_gen.py @@ -17,11 +17,15 @@ import logging from typing import Any, Generator -import litellm -import openai import pandas as pd -from data_formulator.agents.agent_utils import generate_data_summary +from data_formulator.agent_config import reasoning_effort_for +from data_formulator.agents.agent_utils import ( + attach_reasoning_content, + generate_data_summary, +) +from data_formulator.agents.agent_language import inject_language_instruction +from data_formulator.datalake.parquet_utils import df_to_safe_records from data_formulator.agents.context import ( build_focused_thread_context, build_lightweight_table_context, @@ -38,6 +42,8 @@ logger = logging.getLogger(__name__) +_AGENT_ID = "report_gen" + # ── Tool definitions ────────────────────────────────────────────────────── INSPECT_TOOLS = [ @@ -90,7 +96,11 @@ # ── System prompt ───────────────────────────────────────────────────────── SYSTEM_PROMPT = """\ -You are a data journalist who creates concise, insightful reports based on data explorations. +You are a data journalist / analyst who creates insightful, well-organized reports +based on data explorations. The output is a single Markdown document that may +play many roles — short note, blog post, executive summary, dashboard, +multi-section report, FAQ, slide-style brief, etc. Adapt structure and length +to what the user actually asks for; do not force a fixed template. The user message contains context about the workspace: - **[PRIMARY TABLE(S)]** / **[OTHER AVAILABLE TABLES]**: Lightweight schema of datasets. 
@@ -99,33 +109,54 @@ - **[AVAILABLE CHARTS]**: List of charts with their type, encodings, and table references. ## Phase 1 — Inspect -Before writing, use `inspect_chart` and `inspect_source_data` to gather information about the charts and data you want to include. You don't need to inspect everything — focus on what's relevant to the user's request. +Before writing, use `inspect_chart` and `inspect_source_data` to gather information +about the charts and data you want to include. Inspect only what you actually need +to ground your narrative — don't fetch everything. ## Phase 2 — Write the report -Write a concise report (under 200 words, ~1 minute read). +### Embedding charts (REQUIRED FORMAT — do not change this) To embed a chart image, use markdown image syntax with a `chart://` URL: ![Caption describing the chart](chart://chart_id) Example: `![Monthly trade balance trend](chart://chart-123)` -The chart_id must match one from [AVAILABLE CHARTS]. Place each chart embed on its own line. +The chart_id must match one from [AVAILABLE CHARTS]. Place each chart embed on +its own line (it renders as a block). You can embed the same chart at most +once. Captions are short — one line describing what the chart shows. -For data tables, just write standard markdown tables directly: +### Tables +For data tables, write standard markdown tables directly: | date | value | | --- | --- | | 2020-01 | -43.5 | -Guidelines: -- Start with a `# Title` -- Connect findings into a coherent narrative -- For each chart, briefly explain what it shows and the key insight -- Use chart embeds at appropriate places -- Use markdown tables when you want to show specific data points -- End with a **In summary:** paragraph -- Write in markdown, be concise, respect facts in the data -- Adapt your style to the user's request (blog, executive summary, casual, etc.) 
-- Do NOT make up facts or judgements beyond what the data shows +### Style & structure — adapt to the user's request +The user may ask for any of: +- a short note or social-style summary (a few sentences, maybe one chart), +- a blog post / narrative report (intro → findings → takeaway), +- an executive summary (key numbers up top, then context), +- a KPI dashboard / multi-section overview (headings per topic, multiple charts + arranged with short commentary between them), +- a slide-style brief (compact sections with bullet points and embedded charts), +- a deeper analytical report with sub-sections, methodology notes, and caveats. + +Pick the structure that fits the request and the available material. Reasonable +defaults if the user is vague: +- Start with a `# Title` that reflects the topic. +- Group related findings under `##` (and `###` if useful) headings. +- Around each embedded chart, briefly explain what it shows and the key insight. +- Use bullets / short paragraphs / tables where they help; don't pad. +- Close with a brief takeaway or summary section if the report is more than a + few paragraphs. For very short outputs (notes, single-chart blurbs), a closing + summary is optional. + +### Guardrails +- Write in Markdown. Keep prose tight; let the data and charts carry the weight. +- Stay faithful to the data — do not invent numbers, comparisons, or causation + that the data does not actually support. +- It is fine to flag uncertainty ("based on the sample shown…") when appropriate. +- Embed every chart you discuss; don't reference a chart in prose without showing it. 
""" @@ -183,8 +214,7 @@ def run( # Build system prompt system_prompt = SYSTEM_PROMPT - if self.language_instruction: - system_prompt += "\n\n" + self.language_instruction + system_prompt = inject_language_instruction(system_prompt, self.language_instruction) messages: list[dict] = [ {"role": "system", "content": system_prompt}, @@ -251,6 +281,7 @@ def _run_inspect_phase( for tc in tool_calls ], } + attach_reasoning_content(assistant_msg, choice.message) messages.append(assistant_msg) # Execute each tool @@ -398,7 +429,7 @@ def _resolve_table_data( df = df.head(max_rows) return { "columns": df.columns.tolist(), - "rows": df.to_dict(orient="records"), + "rows": df_to_safe_records(df), } except Exception as e: logger.error(f"[ReportAgent] resolve_table_data error: {e}") @@ -410,84 +441,16 @@ def _resolve_table_data( def _call_llm(self, messages: list[dict], tools: list[dict] | None = None): """Non-streaming LLM call with optional tool definitions.""" - if self.client.endpoint == "openai": - client = openai.OpenAI( - base_url=self.client.params.get("api_base", None), - api_key=self.client.params.get("api_key", ""), - timeout=120, + if tools: + return self.client.get_completion_with_tools( + messages, tools=tools, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model), ) - kwargs: dict[str, Any] = { - "model": self.client.model, - "messages": messages, - } - if tools: - kwargs["tools"] = tools - try: - return client.chat.completions.create(**kwargs) - except Exception as e: - if self.client._is_image_deserialize_error(str(e)): - sanitized = self.client._strip_images_from_messages(messages) - kwargs["messages"] = sanitized - return client.chat.completions.create(**kwargs) - raise - else: - params = self.client.params.copy() - kwargs = { - "model": self.client.model, - "messages": messages, - "drop_params": True, - } - if tools: - kwargs["tools"] = tools - kwargs.update(params) - try: - return litellm.completion(**kwargs) - except Exception as e: - if 
self.client._is_image_deserialize_error(str(e)): - sanitized = self.client._strip_images_from_messages(messages) - kwargs["messages"] = sanitized - return litellm.completion(**kwargs) - raise + return self.client.get_completion(messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) def _call_llm_streaming(self, messages: list[dict], tools: list[dict] | None = None): """Streaming LLM call with optional tool definitions.""" - if self.client.endpoint == "openai": - client = openai.OpenAI( - base_url=self.client.params.get("api_base", None), - api_key=self.client.params.get("api_key", ""), - timeout=120, + if tools: + return self.client.get_completion_with_tools( + messages, tools=tools, stream=True, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model), ) - kwargs: dict[str, Any] = { - "model": self.client.model, - "messages": messages, - "stream": True, - } - if tools: - kwargs["tools"] = tools - try: - return client.chat.completions.create(**kwargs) - except Exception as e: - if self.client._is_image_deserialize_error(str(e)): - sanitized = self.client._strip_images_from_messages(messages) - kwargs["messages"] = sanitized - return client.chat.completions.create(**kwargs) - raise - else: - params = self.client.params.copy() - kwargs = { - "model": self.client.model, - "messages": messages, - "stream": True, - "drop_params": True, - } - if tools: - kwargs["tools"] = tools - kwargs.update(params) - try: - return litellm.completion(**kwargs) - except Exception as e: - if self.client._is_image_deserialize_error(str(e)): - sanitized = self.client._strip_images_from_messages(messages) - kwargs["messages"] = sanitized - return litellm.completion(**kwargs) - raise + return self.client.get_completion(messages, stream=True, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) diff --git a/py-src/data_formulator/agents/agent_simple.py b/py-src/data_formulator/agents/agent_simple.py index b2857fe7..c3fb16c9 100644 --- 
a/py-src/data_formulator/agents/agent_simple.py +++ b/py-src/data_formulator/agents/agent_simple.py @@ -10,11 +10,14 @@ import json import logging +from data_formulator.agent_config import reasoning_effort_for from data_formulator.agents.agent_utils import extract_json_objects from data_formulator.agents.agent_language import inject_language_instruction logger = logging.getLogger(__name__) +_AGENT_ID = "simple" + # --------------------------------------------------------------------------- # System prompts @@ -67,17 +70,25 @@ _CHART_INTENT_SYSTEM_PROMPT = ( "Route a chart edit request to one of two agents.\n" "\n" - "STYLE — refine the current chart, keeping its analytical intent.\n" - " Uses only the fields already on the chart, plus simple derivations\n" - " expressible in Vega-Lite transforms (filter, sort, top-N, bin,\n" - " aggregate over an existing field, simple calculate). Also covers\n" - " any purely visual change: theme, colors, fonts, legend, axes,\n" - " mark size/opacity, donut hole, etc.\n" + "The test: does the request change the set of fields bound to chart\n" + "encodings (x, y, color, size, shape, row, column, facet, theta, etc.)?\n" + "\n" + "STYLE — encoding fields are unchanged. The user is refining the same\n" + "chart that answers the same question. 
This includes:\n" + " - filter / sort / top-N / limit (even on fields not currently encoded,\n" + " as long as the field already exists in the data)\n" + " - layering or overlay on the same encoded fields (trend line, error bars)\n" + " - aggregation / bin changes on an already-encoded field\n" + " - any visual change: theme, colors, fonts, legend, axes, mark\n" + " size/opacity, donut hole, tooltip text\n" "\n" - "DATA — a new analytical intent or a transformation beyond simple\n" - " Vega-Lite: bring in a different field that isn't on the chart,\n" - " add a column from another table, pivot/unpivot,joins,\n" - " or otherwise reshape the data so it answers a different question.\n" + "DATA — encoding fields change, or a new field must be computed/joined:\n" + " - replace, add, or remove an encoded field (e.g. \"color by region\",\n" + " \"use quantity instead of price on y\", \"drop size\")\n" + " - change chart type in a way that requires different fields\n" + " - pivot / unpivot / reshape, bring in a field from another table\n" + " - compute a new derived field beyond a simple Vega-Lite calculate\n" + " (moving average, percentile rank, etc.)\n" "\n" "Requests may be in any language. Reply with one word: STYLE or DATA." 
) @@ -123,7 +134,7 @@ def nl_to_filter(self, columns: list[dict], instruction: str) -> dict: ] logger.info("[SimpleAgents.nl_to_filter] run start") - response = self.client.get_completion(messages=messages) + response = self.client.get_completion(messages=messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) raw = response.choices[0].message.content.strip() # Strip markdown code fences if present @@ -176,7 +187,7 @@ def workspace_name(self, table_names: list[str], user_query: str = "") -> str: ] logger.info("[SimpleAgents.workspace_name] run start") - response = self.client.get_completion(messages=messages) + response = self.client.get_completion(messages=messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) display_name = response.choices[0].message.content.strip().strip("\"'") if len(display_name) > 60: display_name = display_name[:57] + "..." @@ -211,7 +222,7 @@ def classify_chart_intent(self, instruction: str) -> str: ] try: - response = self.client.get_completion(messages=messages) + response = self.client.get_completion(messages=messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) raw = (response.choices[0].message.content or "").strip().upper() except Exception as e: logger.warning("[SimpleAgents.classify_chart_intent] LLM call failed: %s", e) diff --git a/py-src/data_formulator/agents/agent_sort_data.py b/py-src/data_formulator/agents/agent_sort_data.py index f4e8b9cd..9daf82f2 100644 --- a/py-src/data_formulator/agents/agent_sort_data.py +++ b/py-src/data_formulator/agents/agent_sort_data.py @@ -2,6 +2,7 @@ # Licensed under the MIT License. 
import json +from data_formulator.agent_config import reasoning_effort_for from data_formulator.agents.agent_utils import extract_json_objects from data_formulator.agents.agent_language import inject_language_instruction @@ -9,6 +10,8 @@ logger = logging.getLogger(__name__) +_AGENT_ID = "sort_data" + SYSTEM_PROMPT = '''You are a data scientist to help user to sort data. The user will provide list of items in the form of a json object, and your goal is to sort the data in its natural order based on your knowledge. @@ -74,7 +77,7 @@ def run(self, name, values, n=1): input_obj = { 'name': name, - 'value': values + 'values': values } user_query = f"[INPUT]\n\n{json.dumps(input_obj, ensure_ascii=False)}\n\n[OUTPUT]" @@ -90,7 +93,7 @@ def run(self, name, values, n=1): {"role":"user","content": user_query}] ###### the part that calls open_ai - response = self.client.get_completion(messages = messages) + response = self.client.get_completion(messages = messages, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model)) #log = {'messages': messages, 'response': response.model_dump(mode='json')} diff --git a/py-src/data_formulator/agents/agent_utils.py b/py-src/data_formulator/agents/agent_utils.py index 231225f0..f5bcb7e8 100644 --- a/py-src/data_formulator/agents/agent_utils.py +++ b/py-src/data_formulator/agents/agent_utils.py @@ -14,6 +14,98 @@ _logger = logging.getLogger(__name__) +# --------------------------------------------------------------------------- +# system-prompt composition +# --------------------------------------------------------------------------- + +def compose_system_prompt( + base_prompt: str, + *, + agent_coding_rules: str = "", + language_instruction: str = "", + language_marker: str | None = None, +) -> str: + """Assemble a system prompt by appending coding rules and injecting a language block. + + - ``agent_coding_rules`` (already-combined rules text) is appended under an + ``[AGENT CODING RULES]`` preamble when non-empty. 
+ - ``language_instruction`` is inserted before ``language_marker`` if the marker + is found in the resulting prompt, otherwise appended at the end. + """ + # Local import keeps agent_utils dependency-light at module import time. + from data_formulator.agents.agent_language import inject_language_instruction + + prompt = base_prompt + if agent_coding_rules: + prompt = prompt + ( + "\n\n[AGENT CODING RULES]\n" + "Please follow these rules when generating code. " + "Note: if the user instruction conflicts with these rules, " + "you should prioritize user instructions.\n\n" + ) + agent_coding_rules + if language_instruction: + prompt = inject_language_instruction( + prompt, language_instruction, marker=language_marker + ) + return prompt + + +# --------------------------------------------------------------------------- +# reasoning_content helpers +# --------------------------------------------------------------------------- + +def attach_reasoning_content(msg: dict, choice_message) -> dict: + """Attach ``reasoning_content`` from an LLM response to an assistant message dict. + + Some reasoning models (currently DeepSeek V4) return a + ``reasoning_content`` field alongside the regular ``content``. + In multi-turn conversations this field **must** be echoed back in the + assistant message, otherwise the API may reject the request or the + chain-of-thought context is lost. + + For models that do not produce this field the function is a safe no-op. + + Args: + msg: The assistant message dict (mutated in-place and returned). + choice_message: The ``choice.message`` object from an LLM response. + + Returns: + The same *msg* dict, for chaining convenience. 
+ + See: https://api-docs.deepseek.com/guides/reasoning_model + """ + rc = getattr(choice_message, "reasoning_content", None) + if rc is not None: + msg["reasoning_content"] = rc + return msg + + +def accumulate_reasoning_content( + accumulated: str | None, delta +) -> str | None: + """Accumulate ``reasoning_content`` from streaming delta chunks. + + In streaming mode, reasoning models (currently DeepSeek V4) deliver + ``reasoning_content`` as incremental ``delta.reasoning_content`` + chunks, similar to ``delta.content``. This helper concatenates them. + + For non-reasoning models the delta has no such attribute; the + accumulator is returned unchanged. + + Args: + accumulated: The string accumulated so far, or ``None``. + delta: A streaming ``choice.delta`` object. + + Returns: + Updated accumulator (``str`` once the first chunk arrives, + ``None`` if no reasoning_content has been seen). + """ + rc_delta = getattr(delta, "reasoning_content", None) + if rc_delta: + return (accumulated or "") + rc_delta + return accumulated + + def _source_table_matches_catalog_entry( source_table: str, catalog_entry: dict[str, Any], diff --git a/py-src/data_formulator/agents/chart_creation_guide.py b/py-src/data_formulator/agents/chart_creation_guide.py new file mode 100644 index 00000000..20a6ab94 --- /dev/null +++ b/py-src/data_formulator/agents/chart_creation_guide.py @@ -0,0 +1,153 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. + +"""Chart creation guide and shared prompt fragments. + +This module is the canonical source of truth for the prompt fragments +that describe how the agent should write code, choose chart types, +annotate semantic types, run statistical analyses, and use DuckDB. 
+ +The individual ``SHARED_*`` fragments are imported by ``DataRecAgent`` +and ``DataTransformationAgent`` (which weave them into their system +prompts) and are also composed into ``CHART_CREATION_GUIDE``, the +single block injected lazily on the first ``visualize`` tool call. +""" + + +SHARED_ENVIRONMENT = '''**About the execution environment:** +- You can use BOTH DuckDB SQL and pandas operations in the same script +- The script will run in the workspace data directory (all data files are in the current directory) +- Each table in [CONTEXT] has a **file path** (e.g., `student_exam.parquet`, `sales.csv`). Use EXACTLY that path to load data: + - `.parquet`: `pd.read_parquet('file.parquet')` or DuckDB `read_parquet('file.parquet')` + - `.csv`: `pd.read_csv('file.csv')` or DuckDB `read_csv_auto('file.csv')` + - `.json`: `pd.read_json('file.json')` + - `.xlsx`/`.xls`: `pd.read_excel('file.xlsx')` + - `.txt`: `pd.read_csv('file.txt', sep='\\t')` +- **IMPORTANT:** Use the exact filename from the context — do NOT change the file extension or assume all files are parquet. +- **Allowed libraries:** pandas, numpy, duckdb, math, datetime, json, statistics, collections, re, sklearn, scipy, random, itertools, functools, operator, time +- **Not allowed:** matplotlib, plotly, seaborn, requests, subprocess, os, sys, io, or any other library not listed above. +- File system access (open, write) and network access are also forbidden. + +**When to use DuckDB vs pandas:** +- **Prefer plain pandas** for most tasks — it's simpler and more readable. +- Only use DuckDB when the dataset is very large and you need efficient SQL aggregations, filtering, joins, or window functions. +- You can combine both: DuckDB for initial loading/filtering on large files, then pandas for complex operations. + +**Code structure:** standalone script (no function wrapper), imports at top. 
**CRITICAL:** The final result DataFrame MUST be assigned to the exact variable name you specified in `"output_variable"` in the JSON spec — the system uses this name to extract the result. For example, if your output_variable is `sales_by_region`, the script must contain `sales_by_region = ...`.''' + + +SHARED_SEMANTIC_TYPE_REFERENCE = '''**[SEMANTIC TYPE REFERENCE]** + +Choose the most specific type that fits. Only annotate fields used in chart encodings. + +| Category | Types | +|---|---| +| Temporal | DateTime, Date, Time, Timestamp, Year, Quarter, Month, Week, Day, Hour, YearMonth, YearQuarter, YearWeek, Decade, Duration | +| Monetary measures | Amount, Price | +| Physical measures | Quantity, Temperature | +| Proportion | Percentage | +| Signed/diverging | Profit, PercentageChange, Sentiment, Correlation | +| Generic measures | Count, Number | +| Discrete numeric | Rank, Score | +| Identifier | ID | +| Geographic | Latitude, Longitude, Country, State, City, Region, Address, ZipCode | +| Entity names | Category, Name | +| Coded categorical | Status, Boolean, Direction | +| Binned ranges | Range | +| Fallback | Unknown | + +Key guidelines: +- Use **Amount** for summed monetary totals, **Price** for per-unit prices, **Profit** for values that can be negative. +- Use **Temperature** (not Quantity) for temperature — it has special diverging behavior. +- Use **Year** (not Number) for columns like "year" with values 2020, 2021.''' + + +SHARED_CHART_REFERENCE = '''**[CHART TYPE REFERENCE]** + +The `chart_type` value in the `visualize` action MUST be one of the names listed +in the first column below (exact spelling, including capitalization). When a row +lists multiple names, pick whichever fits the "when to use" hint best. 
+ +| chart_type | encodings | config | when to use | +|---|---|---|---| +| Scatter Plot | x, y, color, size, facet | opacity (0.1–1.0) | Relationships between two quantitative fields | +| Regression | x, y, color, size, facet | regressionMethod ("linear","log","exp","pow","quad","poly"), polyOrder (2–10) | Trend line over scatter; one line per color group | +| Bar Chart / Lollipop Chart / Waterfall Chart | x, y, color, facet | — | Bar: default categorical comparison. Lollipop: cleaner for ranked lists / sparse categories. Waterfall: cumulative gain/loss, each bar starts where the previous ended | +| Grouped Bar Chart | x, y, group, facet | — | Side-by-side bars across a second categorical dimension | +| Histogram / Density Plot | x, color, facet | — | Distribution of one quantitative field. Histogram: discrete bins, auto-binned. Density Plot: smooth KDE curve | +| Boxplot | x, y, color, facet | — | Distribution summary (median/quartiles/outliers) by category | +| Ranged Dot Plot | x, y, color, facet | — | Min–max range or two-point comparison per category | +| Line Chart | x, y, color, strokeDash, facet | interpolate ("linear","monotone","step") | Trends over an ordered (usually temporal) x-axis | +| Area Chart | x, y, color, facet | — | Magnitude over ordered x; auto-stacks when color is set | +| Pie Chart | size, color, facet | innerRadius (0–100; 0=pie, >0=donut) | Part-of-whole with ≤7 categories. Wedge value goes on **size**, not **theta** | +| Radar Chart | x, y, color, facet | — | Multi-metric profile/comparison; x = metric name, y = value, color = entity (long-form data) | +| Heatmap | x, y, color, facet | colorScheme ("viridis","blues","reds","oranges","greens","blueorange","redblue") | Matrix / 2D density; color encodes the quantitative cell value | +| Bar Table | x, y, color, facet | — | Ranked horizontal table with inline bars; one row per category. 
y = category, x = value | +| KPI Card | metric, value, goal | — | "Big number" dashboard tile(s); one row per tile. `value` must be pre-aggregated; `goal` is optional | +| Candlestick Chart | x, open, high, low, close, facet | — | OHLC financial data | +| World Map | longitude, latitude, color, size | projection ("mercator","equalEarth","naturalEarth1","orthographic"), projectionCenter ([lon,lat]) | Geographic points/regions on a world projection | +| US Map | longitude, latitude, color, size | — (fixed albersUsa) | US-only points/regions (albersUsa projection) | + +**Critical chart rules:** +- **Scatter Plot**: use config opacity (0.1–1.0) for dense data instead of encoding opacity. +- **Regression**: trend line is automatic — do NOT compute regression coefficients/predictions in Python. Use `color` to get separate trend lines per group. +- **Bar Chart**: x=categorical, y=quantitative (vertical bars). Swap x↔y for horizontal bars. Same-x rows are auto-stacked when `color` is set. +- **Grouped Bar Chart**: use the `group` channel (not `color`) for side-by-side bars. +- **Histogram**: do NOT pre-bin in Python — pass the raw quantitative field on `x` and the chart bins automatically. Pre-aggregating gives wrong bin widths. +- **Line Chart**: use `strokeDash` to differentiate line styles (e.g. actual vs forecast). +- **Pie Chart**: use the `size` channel (not `theta`) for wedge values. Avoid when >7–8 categories. +- **Radar Chart**: data must be long-form — one row per (entity, metric, value). If your data is wide-form (one column per metric), melt it first in the Python step. +- **Bar Table**: y is the category column to rank; x is the quantitative value driving bar length. Don't sort in Python — the template sorts. +- **KPI Card**: channels are `metric`, `value`, `goal` (not x/y). One DataFrame row = one tile. The `value` column must already contain the final number to display (aggregate upstream in the Python step). 
+- **Candlestick Chart**: requires `open`, `high`, `low`, `close` columns. +- **World Map / US Map**: channel names are `longitude` / `latitude`, not `x` / `y`. +- **facet**: available for nearly all chart types; use a low-cardinality categorical field. +- All fields in `encodings` must also appear in `output_fields`. Typically use 2–3 channels (x, y, color/size).''' + + +SHARED_STATISTICAL_ANALYSIS = '''**Statistical analysis guide:** +- **Regression**: use chart_type "Regression" — the trend line is automatic, do NOT compute regression values in Python code. Configure method via `{"regressionMethod": "linear"}` (options: "linear", "log", "exp", "pow", "quad", "poly"; for poly add `{"polyOrder": 3}`). +- **Forecasting**: compute predicted future values in Python. Use Line Chart with strokeDash to distinguish actual vs forecast, and color for series grouping. +- **Clustering**: compute cluster assignments in Python. Output [x, y, cluster_id]. Use Scatter Plot with color → cluster_id.''' + + +SHARED_DUCKDB_NOTES = '''**DuckDB notes:** +- Escape single quotes with '' (not \\') +- No Unicode escapes (\\u0400); use character ranges directly: [а-яА-Я] +- Cast date columns explicitly: `CAST(col AS DATE)`, `CAST(col AS TIMESTAMP)` +- For complex datetime operations, load data first then use pandas datetime functions +- Critical identifier quoting rule: + * If a table/column name contains non-ASCII characters (e.g., Chinese, Japanese, Korean, Cyrillic, etc.), spaces, or punctuation, + you MUST wrap it in double quotes, e.g. SELECT "金额" FROM "客户表". + * Never output placeholder identifiers like your_table_name, your_column, your_condition.''' + + +CHART_CREATION_GUIDE = f"""\ +## Chart Creation Guide + +The following reference material applies when you call the `visualize` tool. + +### A. Code Execution Rules + +{SHARED_ENVIRONMENT} + +{SHARED_DUCKDB_NOTES} + +**Datetime handling:** +- `date` columns contain date-only values (YYYY-MM-DD). 
`datetime` columns contain date+time (ISO 8601). +- `time` columns contain time-only values (HH:mm:ss). `duration` columns are time intervals. +- Year → number. Year-month / year-month-day → string ("2020-01" / "2020-01-01"). +- Hour alone → number. Hour:min or h:m:s → string. Never return raw datetime objects. + +### B. Chart Type Reference + +{SHARED_CHART_REFERENCE} + +### C. Semantic Type Reference + +{SHARED_SEMANTIC_TYPE_REFERENCE} + +### D. Statistical Analysis Guide + +{SHARED_STATISTICAL_ANALYSIS} +""" diff --git a/py-src/data_formulator/agents/client_utils.py b/py-src/data_formulator/agents/client_utils.py index 637a4f75..0069c4b4 100644 --- a/py-src/data_formulator/agents/client_utils.py +++ b/py-src/data_formulator/agents/client_utils.py @@ -1,5 +1,4 @@ import litellm -import openai from azure.identity import DefaultAzureCredential, get_bearer_token_provider @@ -21,7 +20,10 @@ def __init__(self, endpoint, model, api_key=None, api_base=None, api_version=No if api_version is not None and api_version != "": self.params["api_version"] = api_version - if self.endpoint == "gemini": + if self.endpoint == "openai": + if not model.startswith("openai/"): + self.model = f"openai/{model}" + elif self.endpoint == "gemini": if model.startswith("gemini/"): self.model = model else: @@ -85,6 +87,13 @@ def _is_image_deserialize_error(self, error_text: str) -> bool: lowered = error_text.lower() return ("image_url" in lowered and "expected `text`" in lowered) or "unknown variant `image_url`" in lowered + def _is_reasoning_effort_error(self, error_text: str) -> bool: + """Detect provider errors caused by an unsupported ``reasoning_effort`` + value (e.g. ``"minimal"`` on a model that only accepts + ``none/low/medium/high/xhigh``). 
The provider message reliably + mentions the parameter name.""" + return "reasoning_effort" in error_text.lower() + @classmethod def from_config(cls, model_config: dict[str, str]): """ @@ -113,106 +122,72 @@ def ping(self, timeout: int = 10): """Lightweight connectivity check: send a minimal completion with max_tokens=3 and a short timeout. Raises on any failure.""" messages = [{"role": "user", "content": "Reply only 'ok'."}] + params = self.params.copy() + params["timeout"] = timeout + litellm.completion( + model=self.model, messages=messages, + max_tokens=3, drop_params=True, **params, + ) - if self.endpoint == "openai": - client = openai.OpenAI( - base_url=self.params.get("api_base", None), - api_key=self.params.get("api_key", ""), - timeout=timeout, - ) - client.chat.completions.create( - model=self.model, messages=messages, max_tokens=3, - ) - else: - params = self.params.copy() - params["timeout"] = timeout - litellm.completion( - model=self.model, messages=messages, - max_tokens=3, drop_params=True, **params, - ) + def get_completion(self, messages, stream=False, reasoning_effort="low", + **kwargs): + """Send a chat completion request via LiteLLM. - def get_completion(self, messages, stream=False): + All providers (OpenAI, Azure, Anthropic, etc.) are handled uniformly + by LiteLLM. ``drop_params=True`` ensures unsupported parameters + (like ``reasoning_effort`` on non-reasoning models) are silently + ignored rather than causing errors. """ - Returns a LiteLLM client configured for the specified endpoint and model. - Supports OpenAI, Azure, Ollama, and other providers via LiteLLM. 
- """ - # Configure LiteLLM - - if self.endpoint == "openai": - client = openai.OpenAI( - base_url=self.params.get("api_base", None), - api_key=self.params.get("api_key", ""), - timeout=120 + params = self.params.copy() + params["reasoning_effort"] = reasoning_effort + params.update(kwargs) + try: + return litellm.completion( + model=self.model, messages=messages, + drop_params=True, stream=stream, **params, ) - - completion_params = { - "model": self.model, - "messages": messages, - } - - if self.model.startswith("gpt-5") or self.model.startswith("o1") or self.model.startswith("o3"): - completion_params["reasoning_effort"] = "low" - - try: - return client.chat.completions.create(**completion_params, stream=stream) - except Exception as e: - error_text = str(e) - if self._is_image_deserialize_error(error_text): - sanitized_messages = self._strip_images_from_messages(messages) - completion_params["messages"] = sanitized_messages - return client.chat.completions.create(**completion_params, stream=stream) - raise - else: - - params = self.params.copy() - - if (self.model.startswith("gpt-5") or self.model.startswith("o1") or self.model.startswith("o3") - or self.model.startswith("claude-sonnet-4-5") or self.model.startswith("claude-opus-4")): - params["reasoning_effort"] = "low" - - try: + except Exception as e: + err = str(e) + if self._is_reasoning_effort_error(err): + params.pop("reasoning_effort", None) return litellm.completion( - model=self.model, - messages=messages, - drop_params=True, - stream=stream, - **params + model=self.model, messages=messages, + drop_params=True, stream=stream, **params, ) - except Exception as e: - error_text = str(e) - if self._is_image_deserialize_error(error_text): - sanitized_messages = self._strip_images_from_messages(messages) - return litellm.completion( - model=self.model, - messages=sanitized_messages, - drop_params=True, - stream=stream, - **params - ) - raise + if self._is_image_deserialize_error(err): + sanitized = 
self._strip_images_from_messages(messages) + return litellm.completion( + model=self.model, messages=sanitized, + drop_params=True, stream=stream, **params, + ) + raise - - def get_response(self, messages: list[dict], tools: list | None = None): - """ - Returns a response using OpenAI's Response API approach. + def get_completion_with_tools(self, messages, tools, stream=False, + reasoning_effort="low", **kwargs): + """Send a chat completion request with tool definitions via LiteLLM. + + Same as ``get_completion`` but accepts ``tools`` (and optional + ``tool_choice``, ``parallel_tool_calls``, etc. via ``**kwargs``). """ - if self.endpoint == "openai": - client = openai.OpenAI( - base_url=self.params.get("api_base", None), - api_key=self.params.get("api_key", ""), - timeout=120 + params = self.params.copy() + params["reasoning_effort"] = reasoning_effort + try: + return litellm.completion( + model=self.model, messages=messages, tools=tools, + drop_params=True, stream=stream, **params, **kwargs, ) - return client.responses.create( - model=self.model, - input=messages, - tools=tools, - **self.params - ) - else: - return litellm.responses( - model=self.model, - input=messages, - tools=tools, - drop_params=True, - **self.params - ) \ No newline at end of file + except Exception as e: + err = str(e) + if self._is_reasoning_effort_error(err): + params.pop("reasoning_effort", None) + return litellm.completion( + model=self.model, messages=messages, tools=tools, + drop_params=True, stream=stream, **params, **kwargs, + ) + if self._is_image_deserialize_error(err): + sanitized = self._strip_images_from_messages(messages) + return litellm.completion( + model=self.model, messages=sanitized, tools=tools, + drop_params=True, stream=stream, **params, **kwargs, + ) + raise \ No newline at end of file diff --git a/py-src/data_formulator/agents/context.py b/py-src/data_formulator/agents/context.py index b110353c..c132a286 100644 --- a/py-src/data_formulator/agents/context.py +++ 
b/py-src/data_formulator/agents/context.py @@ -26,6 +26,61 @@ TABLE_SAMPLE_CHAR_LIMIT = 1000 +def _ensure_no_auth_catalogs_cached(user_home: Any) -> None: + """Populate the disk catalog cache for any admin connector that has no + required auth parameters and isn't cached yet. + + Used to surface zero-config admin connectors (notably the built-in + ``sample_datasets`` connector) to the agent's search/read tools on + first use, without requiring an explicit "Connect" step in the UI. + Silent on failure — auth-gated connectors will simply remain + un-synced until the user provides credentials through the normal + flow. + """ + if not user_home: + return + try: + from pathlib import Path + from data_formulator.data_connector import ( + DATA_CONNECTORS, + _ADMIN_CONNECTOR_IDS, + ) + from data_formulator.datalake.catalog_cache import save_catalog + + cache_dir = Path(user_home) / "catalog_cache" + for source_id in list(_ADMIN_CONNECTOR_IDS): + cache_path = cache_dir / f"{source_id}.json" + if cache_path.exists(): + continue + connector = DATA_CONNECTORS.get(source_id) + if not connector: + continue + loader_class = connector._loader_class + try: + params = loader_class.list_params() + except Exception: + continue + # Only auto-sync if no params are required (true no-auth case) + if any(p.get("required") for p in params): + continue + try: + loader = loader_class(connector._default_params or {}) + if not loader.test_connection(): + continue + tables = loader.sync_catalog_metadata() + save_catalog(Path(user_home), source_id, tables) + logger.info( + "Auto-synced catalog for '%s' (%d tables)", + source_id, len(tables), + ) + except Exception: + logger.debug( + "Auto-sync failed for '%s'", source_id, exc_info=True, + ) + except Exception: + logger.debug("Catalog auto-sync setup failed", exc_info=True) + + def _get_workspace_metadata_lookups(workspace: Any) -> tuple[dict[str, str], dict[str, dict[str, str]], dict[str, str]]: """Return table descriptions, column descriptions, 
and import options from workspace metadata.""" table_descs: dict[str, str] = {} @@ -284,91 +339,6 @@ def handle_inspect_source_data( return content -def handle_search_data_tables( - query: str, - scope: str, - workspace: Any, -) -> str: - """Handle a search_data_tables tool call. - - Combines workspace metadata search (layer 1) and disk catalog cache - search (layer 2) into a single Level 0 result set. - - The user home directory is resolved from ``workspace.user_home`` - automatically; catalog cache files live under ``/catalog_cache/``. - - Returns a text summary suitable for LLM consumption. Results are - capped to keep context usage low (~3K tokens). - """ - user_home = getattr(workspace, "user_home", None) - if not query or not query.strip(): - return "Please provide a search keyword." - - results: list[dict[str, Any]] = [] - - # ── Layer 1: workspace metadata search ─────────────────────────── - if scope in ("workspace", "all"): - try: - ws_meta = workspace.get_metadata() - if ws_meta: - ws_hits = ws_meta.search_tables(query, limit=50) - for hit in ws_hits: - results.append({ - "source": "workspace", - "name": hit["name"], - "description": (hit.get("description") or "")[:120], - "matched_columns": hit.get("matched_columns", []), - "column_count": hit.get("column_count", 0), - "status": "imported", - }) - except Exception: - logger.debug("Workspace search failed", exc_info=True) - - # ── Layer 2: disk catalog cache search ─────────────────────────── - if scope in ("connected", "all") and user_home: - try: - from data_formulator.datalake.catalog_cache import search_catalog_cache - - imported_names = {r["name"] for r in results} - cache_hits = search_catalog_cache( - user_home, - query, - limit_per_source=20, - exclude_tables=imported_names, - ) - for hit in cache_hits: - results.append({ - "source": hit.get("source_id", "connected"), - "source_id": hit.get("source_id", ""), - "table_key": hit.get("table_key", ""), - "name": hit["name"], - "description": 
(hit.get("description") or "")[:120], - "matched_columns": hit.get("matched_columns", []), - "column_count": hit.get("column_count", 0), - "status": "not imported", - }) - except Exception: - logger.debug("Catalog cache search failed", exc_info=True) - - if not results: - return f"No tables found matching '{query}'." - - lines = [f"Search results for '{query}' ({len(results)} matches):\n"] - for i, r in enumerate(results, 1): - line = f"{i}. [{r['source']}] {r['name']}" - if r["description"]: - line += f" — {r['description']}" - if r["matched_columns"]: - line += f" (matched columns: {', '.join(r['matched_columns'][:5])})" - line += f" [{r['status']}]" - if r.get("source_id") and r.get("table_key"): - line += f" {{source_id: {r['source_id']}, table_key: {r['table_key']}}}" - lines.append(line) - - text = "\n".join(lines) - return text[:3000] + "\n..." if len(text) > 3000 else text - - def handle_read_catalog_metadata( source_id: str, table_key: str, @@ -389,6 +359,9 @@ def handle_read_catalog_metadata( if not user_home: return "Cannot read catalog metadata: user home not available." + # Surface zero-config admin connectors (e.g. sample_datasets) on first use. + _ensure_no_auth_catalogs_cached(user_home) + from pathlib import Path from data_formulator.datalake.catalog_cache import load_catalog diff --git a/py-src/data_formulator/agents/data_agent.py b/py-src/data_formulator/agents/data_agent.py index 1b58bc15..8e9cd39a 100644 --- a/py-src/data_formulator/agents/data_agent.py +++ b/py-src/data_formulator/agents/data_agent.py @@ -7,9 +7,9 @@ - **Tools** (explore, inspect_source_data): Called via OpenAI tool-calling API within a single LLM turn. The agent gathers data silently — these are internal to the agent and not surfaced to the user. - - **Actions** (visualize, clarify, explain, present): Structured JSON output - in the LLM's text response. These are externalized to the user — each - one ends the current turn and produces visible output. 
+ - **Actions** (visualize, clarify, explain, summary, delegate): Structured + JSON output in the LLM's text response. These are externalized to the + user — each one ends the current turn and produces visible output. The server-side while loop handles one action per iteration: 1. Call LLM (with tools) → agent may call tools internally @@ -26,11 +26,11 @@ from pathlib import Path from typing import Any, Generator -import litellm -import openai import pandas as pd +from data_formulator.agent_config import reasoning_effort_for from data_formulator.agents.agent_utils import ( + attach_reasoning_content, ensure_output_variable_in_code, extract_json_objects, generate_data_summary, @@ -40,11 +40,10 @@ build_lightweight_table_context, build_peripheral_thread_context, handle_inspect_source_data, - handle_read_catalog_metadata, - handle_search_data_tables, ) from data_formulator.agents.client_utils import Client -from data_formulator.prompts.chart_creation_guide import CHART_CREATION_GUIDE +from data_formulator.datalake.parquet_utils import df_to_safe_records +from data_formulator.agents.chart_creation_guide import CHART_CREATION_GUIDE from data_formulator.security.code_signing import sign_result from data_formulator.workflows.create_vl_plots import ( assemble_vegailte_chart, @@ -56,8 +55,10 @@ logger = logging.getLogger(__name__) +_AGENT_ID = "data_agent" + # ── Weak-model rescue helpers ───────────────────────────────────────────── -# When a weaker LLM calls visualize/clarify/explain/present as a tool instead +# When a weaker LLM calls visualize/clarify/explain/summary as a tool instead # of outputting JSON in text, these helpers validate and normalise the args # so the action can be rescued without wasting rounds. @@ -65,9 +66,13 @@ "visualize": ["code", "output_variable", "chart"], "clarify": ["questions"], "explain": ["explanation"], - "present": ["summary"], + "summary": ["summary"], + "delegate": ["target", "options"], } +# Valid targets for a `delegate` action. 
+_DELEGATE_TARGETS: tuple[str, ...] = ("data_loading", "report_gen") + def _rescue_unpack_json_strings(data: dict) -> None: """In-place: parse values that are JSON-encoded strings back to objects. @@ -96,27 +101,6 @@ def _rescue_validate_action(data: dict) -> list[str]: # gather data before committing to a user-visible action. TOOLS = [ - { - "type": "function", - "function": { - "name": "think", - "description": ( - "Share your reasoning or findings with the user before taking " - "an action. Use this to explain what you discovered from the " - "data and what you plan to do next." - ), - "parameters": { - "type": "object", - "properties": { - "message": { - "type": "string", - "description": "Your reasoning, findings, or plan.", - }, - }, - "required": ["message"], - }, - }, - }, { "type": "function", "function": { @@ -164,61 +148,6 @@ def _rescue_validate_action(data: dict) -> list[str]: }, }, }, - { - "type": "function", - "function": { - "name": "search_data_tables", - "description": ( - "Search for tables by keyword across workspace and connected " - "data sources. Returns Level 0 summaries: table name, one-line " - "description, matched columns, and — for not-imported tables — " - "source_id and table_key needed by read_catalog_metadata. " - "Use inspect_source_data for imported tables, " - "read_catalog_metadata for not-imported candidates." 
- ), - "parameters": { - "type": "object", - "properties": { - "query": { - "type": "string", - "description": "Keyword to search for (matches table names, descriptions, column names, column descriptions).", - }, - "scope": { - "type": "string", - "enum": ["workspace", "connected", "all"], - "description": "Search scope: 'workspace' (imported tables only), 'connected' (cached catalogs from connected sources), 'all' (both).", - }, - }, - "required": ["query"], - }, - }, - }, - { - "type": "function", - "function": { - "name": "read_catalog_metadata", - "description": ( - "Read detailed metadata for a specific table from a connected " - "data source's cached catalog. Shows columns, types, descriptions " - "(both source and user-annotated), schema, row count, and metadata " - "status. Use after search_data_tables finds a not-imported candidate." - ), - "parameters": { - "type": "object", - "properties": { - "source_id": { - "type": "string", - "description": "The data source / connector ID (from search results).", - }, - "table_key": { - "type": "string", - "description": "The table's unique key (UUID or _source_name, from search results).", - }, - }, - "required": ["source_id", "table_key"], - }, - }, - }, { "type": "function", "function": { @@ -286,30 +215,23 @@ def _rescue_validate_action(data: dict) -> list[str]: ## Tools (internal — for data gathering) -You have tools you can call to gather data and share reasoning: +You have tools you can call to gather data: -- **think(message)** — optionally share your reasoning or findings with - the user before taking an action. Useful for complex analyses where - you want to explain what you found and why you chose this chart. - **explore(code)** — run Python code to inspect data, compute stats, etc. **Important**: each call runs in a fresh namespace — variables do NOT persist between calls. Combine all related operations (loading, transforming, printing) into a single explore() call. 
- **inspect_source_data(table_names)** — get schema, stats, and sample rows for source tables (cheaper than explore for basic inspection). -- **search_data_tables(query, scope)** — search for tables by keyword across - workspace and connected data sources. Returns Level 0 summaries (name, - description, matched columns). Not-imported results include ``source_id`` - and ``table_key`` — pass them to read_catalog_metadata to see full details. - For imported tables, use inspect_source_data instead. -- **read_catalog_metadata(source_id, table_key)** — read full cached metadata - for a specific not-imported table from a connected source (columns, types, - descriptions, schema, row count). Only use with ``source_id`` and - ``table_key`` from search_data_tables results; do NOT fabricate these values. - **search_knowledge(query, categories?)** — search the user's knowledge base (rules, experiences) for relevant entries. - **read_knowledge(category, path)** — read the full content of a knowledge entry. +You analyse data that is **already in the workspace**. If the user's +question requires data that isn't present, do NOT try to find it yourself — +emit a `delegate` action targeting the Data Loading agent and the user +can hand off in one click. + The initial context already includes sample rows and statistics for each table. If the data is straightforward, proceed directly to your action without calling tools. Tool results are returned to you before you @@ -321,18 +243,18 @@ def _rescue_validate_action(data: dict) -> list[str]: **exactly one action** as a JSON object in your text response. Actions are shown to the user and end the current turn. -⚠ **CRITICAL**: `visualize`, `clarify`, `explain`, and `present` are -**actions**, NOT tools. Never call them via function/tool calling — they -MUST appear as a JSON object in your **text reply**. 
Only the items -listed in the Tools section above (`explore`, `inspect_source_data`, -`search_data_tables`, `read_catalog_metadata`, `search_knowledge`, -`read_knowledge`, `think`) may be invoked as tool calls. +⚠ **CRITICAL**: `visualize`, `clarify`, `explain`, `summary`, and +`delegate` are **actions**, NOT tools. Never call them via +function/tool calling — they MUST appear as a JSON object in your **text +reply**. Only the items listed in the Tools section above (`explore`, +`inspect_source_data`, `search_knowledge`, `read_knowledge`) may be +invoked as tool calls. ### `visualize` ```json {{ "action": "visualize", - "display_instruction": "", + "display_instruction": "<≤12 words. State the question or hypothesis the chart investigates — don't recap the chart spec (x/y/color/split are already visible). Bold a **column** if it anchors the question. ✗ 'Plotting price over time, split by fuel, to see trends'>", "input_tables": [""], "code": "", "output_variable": "", @@ -370,9 +292,9 @@ def _rescue_validate_action(data: dict) -> list[str]: ```json {{ "action": "explain", - "explanation": "", + "explanation": "", "followups": [ - "" + "" ] }} ``` @@ -385,32 +307,63 @@ def _rescue_validate_action(data: dict) -> list[str]: should lead to a `visualize` action on the next turn. Omit `followups` entirely if no useful chart-producing follow-ups exist. -### `present` +**Column-name emphasis:** in `explain.explanation`, `followups[]`, and +`clarify.questions[].text` / `options[]`, you may wrap a column name in +`**…**` to render it as a highlighted token in the UI. + +### `summary` ```json {{ - "action": "present", + "action": "summary", "summary": "" }} ``` +Use `summary` to end the run after visualization(s) with a one-sentence +closing remark on the key finding. This is the standard close for any +question you answer with charts. 
+ +### `delegate` +```json +{{ + "action": "delegate", + "target": "data_loading" | "report_gen", + "message": "", + "options": [""] +}} +``` + +Use `delegate` to hand off to a peer agent. Each option becomes a one-click +button (the string is both the button label and the seed prompt). Provide +1–2 options; if two, make them meaningfully distinct (e.g. different search +angles, or executive summary vs. deep-dive). + +Valid `target` values: +- **`data_loading`** — the user's question needs data that isn't in the + workspace. Options are short search phrases (e.g. `'monthly orders 2024'`). + Prefer `clarify` if the workspace tables might already cover the question. +- **`report_gen`** — the user wants a narrative report or write-up over + the charts already produced. Options restate the report style in one + short sentence. + ## Understanding your context {{context_guide}} ## Decision guidelines -- **Start** by understanding the question and data. Use tools if needed, - then `visualize`. If ambiguous, `clarify`. If the user is asking a - conceptual / informational question that does not call for a new chart, - `explain`. -- **After a visualization**, review the observation (data + chart) and: - - `visualize` again to go deeper (drill-down, breakdown, comparison). - - `present` if findings are sufficient. - - `clarify` if the question needs scoping. - - `explain` if the user is just asking about meaning / context. -- **Build a narrative**: overview → drill-down → comparison. -- **Never** repeat a visualization already in the trajectory. -- Present after at most {max_iterations} visualization steps. +- **Classify the question first** (silently) to calibrate effort, not as a hard rule: + - *Conceptual / informational* (asking about meaning, schema, what a field represents, why something is the way it is — no chart needed): use `explain`. + - *Concrete* (one specific answer, e.g. 
"avg price by region", "which sold most"): **1 visualization** → `summary` (one-line takeaway). + - *Progressive* (one question best answered by a small sequence, e.g. "why did revenue drop?", "compare regions"): **2–3 visualizations** → `summary` to tie them together. + - *Open-ended* (explicit exploration, e.g. "explore", "overview", "what's interesting"): **3–5 visualizations** forming a narrative → `summary` to tie them together. + - *Hand-off needed* — use `delegate` as the terminal action when the request is better served by a peer agent: + - *Missing data* (the user's question needs tables not in the workspace): `delegate(target="data_loading")` with short search phrases as `options`. + - *Report request* (e.g. "create a report about X", "write up the findings", "summarize Y as a narrative"): produce any charts the report needs (0–3, judgment-based — if the workspace already has relevant charts you may delegate immediately), then end with `delegate(target="report_gen")`. +- **After each chart**, continue only if the next chart answers a gap *raised* by the previous one — not just another interesting angle. Otherwise close out (`summary`, or `delegate` for hand-off cases) and let the user ask for more. +- If ambiguous, `clarify`. +- **Never** repeat a visualization already in the trajectory or in another thread. +- {max_iterations} visualizations is a **hard ceiling**, not a target.
{agent_exploration_rules} ''' @@ -500,6 +453,7 @@ def run( ``"explore_result"`` – explore code output ``"clarify"`` – clarification question (loop pauses) ``"explain"`` – conversational explanation (loop pauses) + ``"delegate"`` – hand-off to a peer agent (loop terminates) ``"completion"`` – final summary (loop terminates) ``"error"`` – error information """ @@ -609,8 +563,8 @@ def run( "label_code": "agent.clarifyOptionSimplify", }, { - "label": "Present what you have so far", - "label_code": "agent.clarifyOptionPresent", + "label": "Summarize what you have so far", + "label_code": "agent.clarifyOptionSummary", }, ], } @@ -647,7 +601,7 @@ def run( "Here is what was already completed:\n" f"{steps_summary}\n\n" "Please output a JSON action object " - "(visualize / clarify / explain / present) " + "(visualize / clarify / explain / summary / delegate) " "to continue." ), }) @@ -717,8 +671,8 @@ def run( self._log_session_end(rlog, final_status, iteration, total_llm_calls, session_start_time) return - elif action_type == "present": - rlog.log("action_execution", action="present", status="ok", + elif action_type == "summary": + rlog.log("action_execution", action="summary", status="ok", iteration=iteration, total_steps=len(completed_steps)) final_status = "success" yield { @@ -734,6 +688,32 @@ def run( self._log_session_end(rlog, final_status, iteration, total_llm_calls, session_start_time) return + elif action_type == "delegate": + rlog.log("action_execution", action="delegate", status="ok", + iteration=iteration) + final_status = "delegate" + try: + delegate_payload = self._normalize_delegate_action(action) + except ValueError as exc: + final_status = "parse_failed" + yield self._error_event( + iteration, + str(exc) or "delegate action requires non-empty target and options.", + message_code="agent.parseActionFailed", + ) + self._log_session_end(rlog, final_status, iteration, total_llm_calls, session_start_time) + return + yield { + "type": "delegate", + "iteration": 
iteration, + "thought": action.get("thought", ""), + **delegate_payload, + "trajectory": self._strip_images(trajectory), + "completed_step_count": len(completed_steps), + } + self._log_session_end(rlog, final_status, iteration, total_llm_calls, session_start_time) + return + elif action_type == "visualize": code = action.get("code", "") output_variable = action.get("output_variable", "result_df") @@ -812,7 +792,7 @@ def run( "role": "user", "content": ( f"[ERROR] Unknown action '{action_type}'. " - "Please choose one of: visualize, clarify, explain, present." + "Please choose one of: visualize, clarify, explain, summary, delegate." ), }) yield self._error_event(iteration, f"Unknown action: {action_type}", message_code="agent.unknownAction") @@ -935,6 +915,47 @@ def _normalize_explain_action(cls, action: dict[str, Any]) -> dict[str, Any]: question["options"] = options return {"questions": [question]} + @classmethod + def _normalize_delegate_action(cls, action: dict[str, Any]) -> dict[str, Any]: + """Normalize a delegate action. + + The agent emits this when it wants to hand off to a peer agent + (e.g. the Data Loading agent when the workspace lacks needed + data, or the Report Gen agent when the user wants a written + report). The frontend renders each option as a one-click + handoff card. + + Shape: ``{target, message?, options: [str, ...]}`` with 1–2 + options. + """ + target = str(action.get("target", "")).strip() + if target not in _DELEGATE_TARGETS: + raise ValueError( + f"delegate action requires 'target' ∈ {_DELEGATE_TARGETS}, got {target!r}" + ) + + message = str(action.get("message") or "").strip() + + raw_options = action.get("options") + cleaned: list[str] = [] + if isinstance(raw_options, list): + for opt in raw_options: + if isinstance(opt, str): + text = opt.strip() + if text: + cleaned.append(text) + + if not cleaned: + raise ValueError("delegate action requires non-empty 'options[]'") + + # Cap at 2 — keep the user choice cognitively light. 
+ cleaned = cleaned[:2] + + payload: dict[str, Any] = {"target": target, "options": cleaned} + if message: + payload["message"] = message + return payload + # ------------------------------------------------------------------ # Visualize execution (with repair) # ------------------------------------------------------------------ @@ -1184,7 +1205,7 @@ def _run_visualize_code( "status": "ok", "code": code, "content": { - "rows": json.loads(query_output.to_json(orient='records')), + "rows": df_to_safe_records(query_output), "virtual": { "table_name": output_table_name, "row_count": row_count, @@ -1280,7 +1301,7 @@ def _build_system_prompt( ) if has_other_threads: context_lines.append( - "- **[OTHER THREADS]**: Brief summaries of other exploration threads." + "- **[OTHER THREADS]**: Brief summaries of other exploration threads in this workspace. " ) if has_attached_images: context_lines.append( @@ -1596,6 +1617,7 @@ def _tool_loop( "role": "assistant", "content": content or None, } + attach_reasoning_content(assistant_msg, choice.message) assistant_msg["tool_calls"] = [ { "id": tc.id, @@ -1622,17 +1644,13 @@ def _tool_loop( "purpose": tool_args.get("purpose") if tool_name == "explore" else None, "code": tool_args.get("code") if tool_name == "explore" else None, "table_names": tool_args.get("table_names") if tool_name == "inspect_source_data" else None, - "query": tool_args.get("query") if tool_name in ("search_data_tables", "search_knowledge") else None, + "query": tool_args.get("query") if tool_name == "search_knowledge" else None, } tool_t0 = time.time() tool_status = "ok" - if tool_name == "think": - thought_msg = tool_args.get("message", "") - tool_content = "ok" - yield {"type": "thinking_text", "content": thought_msg} - elif tool_name == "explore": + if tool_name == "explore": result = self._run_explore_code( tool_args.get("code", ""), input_tables or [], @@ -1659,30 +1677,6 @@ def _tool_loop( "status": "ok", "stdout": tool_content, } - elif tool_name == 
"search_data_tables": - tool_content = handle_search_data_tables( - query=tool_args.get("query", ""), - scope=tool_args.get("scope", "all"), - workspace=self.workspace, - ) - yield { - "type": "tool_result", - "tool": tool_name, - "status": "ok", - "stdout": tool_content, - } - elif tool_name == "read_catalog_metadata": - tool_content = handle_read_catalog_metadata( - source_id=tool_args.get("source_id", ""), - table_key=tool_args.get("table_key", ""), - workspace=self.workspace, - ) - yield { - "type": "tool_result", - "tool": tool_name, - "status": "ok", - "stdout": tool_content, - } elif tool_name == "search_knowledge": tool_content = self._handle_search_knowledge(tool_args) rlog.log("knowledge_search", @@ -1702,10 +1696,10 @@ def _tool_loop( "status": "ok", "stdout": tool_content, } - elif tool_name in ("visualize", "clarify", "explain", "present", "action"): + elif tool_name in ("visualize", "clarify", "explain", "summary", "delegate", "action"): action_data = dict(tool_args) if "action" not in action_data: - real_name = tool_name if tool_name != "action" else action_data.get("type", "present") + real_name = tool_name if tool_name != "action" else action_data.get("type", "summary") action_data["action"] = real_name _rescue_unpack_json_strings(action_data) @@ -1774,13 +1768,15 @@ def _tool_loop( json_retries += 1 logger.warning("[DataAgent] No JSON found (retry %d/%d), asking LLM to reformat", json_retries, max_json_retries) - messages.append({"role": "assistant", "content": content}) + retry_assistant_msg: dict[str, Any] = {"role": "assistant", "content": content} + attach_reasoning_content(retry_assistant_msg, choice.message) + messages.append(retry_assistant_msg) messages.append({ "role": "user", "content": ( "[FORMAT ERROR] Your previous response did not contain a valid JSON action. " "Please output ONLY a JSON object with one of these actions: " - "visualize, clarify, explain, or present. 
Do NOT repeat your analysis — " + "visualize, clarify, explain, summary, or delegate. Do NOT repeat your analysis — " "just reformat your conclusion as JSON." ), }) @@ -1837,48 +1833,9 @@ def _call_llm(self, messages: list[dict]): def _call_llm_once(self, messages: list[dict]): """Single LLM call (no retry).""" - if self.client.endpoint == "openai": - client = openai.OpenAI( - base_url=self.client.params.get("api_base", None), - api_key=self.client.params.get("api_key", ""), - timeout=120, - ) - try: - return client.chat.completions.create( - model=self.client.model, - messages=messages, - tools=TOOLS, - ) - except Exception as e: - if self.client._is_image_deserialize_error(str(e)): - sanitized = self.client._strip_images_from_messages(messages) - return client.chat.completions.create( - model=self.client.model, - messages=sanitized, - tools=TOOLS, - ) - raise - else: - params = self.client.params.copy() - try: - return litellm.completion( - model=self.client.model, - messages=messages, - tools=TOOLS, - drop_params=True, - **params, - ) - except Exception as e: - if self.client._is_image_deserialize_error(str(e)): - sanitized = self.client._strip_images_from_messages(messages) - return litellm.completion( - model=self.client.model, - messages=sanitized, - tools=TOOLS, - drop_params=True, - **params, - ) - raise + return self.client.get_completion_with_tools( + messages, tools=TOOLS, reasoning_effort=reasoning_effort_for(_AGENT_ID, self.client.model), + ) # ------------------------------------------------------------------ # Observation formatting diff --git a/py-src/data_formulator/app.py b/py-src/data_formulator/app.py index 09d880f5..47d2bda8 100644 --- a/py-src/data_formulator/app.py +++ b/py-src/data_formulator/app.py @@ -4,10 +4,23 @@ import argparse import sys import os +import warnings import mimetypes mimetypes.add_type('application/javascript', '.js') mimetypes.add_type('application/javascript', '.mjs') +# Suppress a noisy pydantic serializer warning 
emitted by litellm/openai +# when a Chat Completions ``usage`` dict (prompt_tokens / completion_tokens) +# gets serialized through a model whose field is typed as the Responses +# API ``ResponseAPIUsage``. The response itself is correct (HTTP 200); +# only the serializer's "may not be as expected" notice is misleading. +warnings.filterwarnings( + "ignore", + message=r"Pydantic serializer warnings", + category=UserWarning, + module=r"pydantic\..*", +) + import flask from flask import Flask, request, send_from_directory from flask import stream_with_context, Response @@ -46,7 +59,7 @@ # Stores SSO tokens + service tokens without hitting the 4 KB cookie limit. _data_home = os.environ.get( 'DATA_FORMULATOR_HOME', - str(Path.home() / '.data-formulator'), + str(Path.home() / '.data_formulator'), ) _session_dir = os.path.join(_data_home, 'sessions') os.makedirs(_session_dir, exist_ok=True) @@ -210,13 +223,15 @@ def _register_blueprints(): from data_formulator.routes.knowledge import knowledge_bp app.register_blueprint(knowledge_bp) - # Auto-register all installed data loaders as DataConnector instances - if not app.config['CLI_ARGS'].get('disable_data_connectors'): - with spinner("Loading data connectors"): - from data_formulator.data_connector import register_data_connectors - register_data_connectors(app) - else: - print(" Data connectors disabled (DISABLE_DATA_CONNECTORS=true)", flush=True) + # Auto-register all installed data loaders as DataConnector instances. + # We always run this so the connectors blueprint and the built-in + # 'sample_datasets' connector are available; the function itself + # honors disable_data_connectors by skipping admin YAML/env specs. 
+ with spinner("Loading data connectors"): + from data_formulator.data_connector import register_data_connectors + register_data_connectors(app) + if app.config['CLI_ARGS'].get('disable_data_connectors'): + print(" External data connectors disabled (DISABLE_DATA_CONNECTORS=true) — sample datasets remain available", flush=True) def _safety_checks(): @@ -240,13 +255,6 @@ def _safety_checks(): _safety_checks() -@app.route('/api/example-datasets') -def get_sample_datasets(): - from data_formulator.example_datasets_config import EXAMPLE_DATASETS - from data_formulator.error_handler import json_ok - return json_ok(EXAMPLE_DATASETS) - - @app.route("/", defaults={"path": ""}) def index_alt(path): logger.info(app.static_folder) diff --git a/py-src/data_formulator/auth/gateways/oidc_gateway.py b/py-src/data_formulator/auth/gateways/oidc_gateway.py index 03227dbf..57b34af4 100644 --- a/py-src/data_formulator/auth/gateways/oidc_gateway.py +++ b/py-src/data_formulator/auth/gateways/oidc_gateway.py @@ -115,6 +115,14 @@ def oidc_callback(): if not is_backend_oidc_mode(): return _error_redirect("backend_oidc_not_enabled") + idp_error = request.args.get("error") + if idp_error: + logger.warning("OIDC callback: IdP returned error=%s", idp_error) + session.pop("_oauth_state", None) + return _error_redirect( + "access_denied" if idp_error == "access_denied" else "token_exchange_failed", + ) + code = request.args.get("code") state = request.args.get("state") diff --git a/py-src/data_formulator/data_connector.py b/py-src/data_formulator/data_connector.py index 3a7ebc0f..f56f4aca 100644 --- a/py-src/data_formulator/data_connector.py +++ b/py-src/data_formulator/data_connector.py @@ -37,7 +37,7 @@ SENSITIVE_PARAMS, ) from data_formulator.data_loader.connector_errors import classify_connector_error -from data_formulator.datalake.parquet_utils import normalize_dtype_to_app_type +from data_formulator.datalake.parquet_utils import normalize_dtype_to_app_type, df_to_safe_records from 
data_formulator.security.path_safety import ConfinedDir logger = logging.getLogger(__name__) @@ -237,8 +237,21 @@ def _visible_connector_items(identity: str | None) -> list[tuple[str, "DataConne Admin connectors are global. User connectors are keyed by identity in the process registry. Raw non-admin entries are treated as legacy/test globals; newly created user connectors should use ``_user_connector_key``. + + When external connectors are disabled (browser-only / hosted mode), only + built-in admin connectors (e.g. ``sample_datasets``) are exposed — + previously-persisted user connectors on disk are hidden so the sidebar + stays clean and consistent with the disabled-add-connector UI. """ - if identity: + from flask import current_app + + try: + disabled = bool(current_app.config.get('CLI_ARGS', {}).get('disable_data_connectors')) + except RuntimeError: + # Outside an app context (e.g. unit tests) — fall back to enabled. + disabled = False + + if identity and not disabled: load_connectors(identity) result = [] @@ -246,6 +259,9 @@ def _visible_connector_items(identity: str | None) -> list[tuple[str, "DataConne for key, connector in DATA_CONNECTORS.items(): if key in _ADMIN_CONNECTOR_IDS: result.append((key, connector, True)) + elif disabled: + # Skip user / legacy connectors entirely when disabled. + continue elif user_prefix and key.startswith(user_prefix): result.append((key, connector, False)) elif not _is_user_connector_key(key): @@ -266,6 +282,12 @@ def _resolve_connector_with_key(data: dict[str, Any]) -> tuple[str, "DataConnect except Exception as exc: raise AppError(ErrorCode.INVALID_REQUEST, "Identity is required") from exc + # Ensure this identity's user connectors are lazily restored from disk + # specs into the in-process registry. Without this, a fresh server + # process can fail with "Connector not found" on the first import/preview + # call when the frontend hasn't yet fetched the connector list. 
+ load_connectors(identity) + # Admin/global connector IDs are public registry keys. if connector_id in _ADMIN_CONNECTOR_IDS and connector_id in DATA_CONNECTORS: return connector_id, DATA_CONNECTORS[connector_id] @@ -687,10 +709,14 @@ def list_data_loaders(): This is the discovery endpoint — tells the frontend what kinds of connectors can be created. """ - from data_formulator.data_loader import DATA_LOADERS, DISABLED_LOADERS + from data_formulator.data_loader import ( + DATA_LOADERS, DISABLED_LOADERS, PLUGIN_LOADERS, PLUGIN_ERRORS, PLUGIN_DIR, + ) + from data_formulator.data_loader import _plugin_scanning_enabled # type: ignore[attr-defined] from data_formulator.auth.identity import is_local_mode loaders = [] + plugin_loaded_summary = [] for key, loader_class in DATA_LOADERS.items(): # local_folder has its own dedicated card — hide from Add Connection list if key == "local_folder": @@ -698,22 +724,39 @@ def list_data_loaders(): params = loader_class.list_params() # Append common table_filter param (same as DataConnector.get_frontend_config) params.append(DataConnector._TABLE_FILTER_PARAM) + plugin_path = PLUGIN_LOADERS.get(key) + display_name = loader_class.DISPLAY_NAME or key.replace("_", " ").title() loaders.append({ "type": key, - "name": key.replace("_", " ").title(), + "name": display_name, "params": params, "hierarchy": _hierarchy_dicts(loader_class.catalog_hierarchy()), "auth_mode": loader_class.auth_mode(), "auth_instructions": loader_class.auth_instructions(), "delegated_login": loader_class.delegated_login_config(), + "source": "plugin" if plugin_path else "builtin", + "source_path": plugin_path, }) + if plugin_path: + plugin_loaded_summary.append({ + "type": key, "name": display_name, "source_path": plugin_path, + }) disabled = { name: {"install_hint": hint} for name, hint in DISABLED_LOADERS.items() } - return json_ok({"loaders": loaders, "disabled": disabled}) + enabled, reason = _plugin_scanning_enabled() + plugins_info = { + "dir": PLUGIN_DIR, + 
"enabled": enabled, + "reason": reason, + "loaded": plugin_loaded_summary, + "errors": list(PLUGIN_ERRORS), + } + + return json_ok({"loaders": loaders, "disabled": disabled, "plugins": plugins_info}) @connectors_bp.route("/api/local/pick-directory", methods=["POST"]) @@ -848,13 +891,18 @@ def list_connectors(): for registry_key, connector, is_admin in _visible_connector_items(identity): has_stored = False connected = False - if identity: + auth_mode = _loader_auth_mode(connector._loader_class) + if auth_mode == "none": + # No-auth connectors (e.g. built-in example datasets) are always + # available — there's no credential to store and no connection + # to establish. + connected = True + elif identity: has_stored = connector.has_stored_credentials(identity) connected = ( connector._get_loader(identity) is not None or has_stored ) - auth_mode = _loader_auth_mode(connector._loader_class) sso_blocked = ( token_store.is_sso_reconnect_blocked(connector._source_id) if token_store else False @@ -1143,6 +1191,21 @@ def connector_connect(): data = request.get_json() or {} source = _resolve_connector(data) + # No-auth connectors (e.g. built-in example datasets) have nothing to + # connect — they're always available. Return a synthetic success + # response so any (legacy) frontend code that still calls connect is + # a no-op rather than an error. + if _loader_auth_mode(source._loader_class) == "none": + loader = source._loader_class() + return json_ok({ + "status": "connected", + "persisted": False, + "params": loader.get_safe_params(), + "hierarchy": _hierarchy_dicts(loader.catalog_hierarchy()), + "effective_hierarchy": _hierarchy_dicts(loader.effective_hierarchy()), + "pinned_scope": loader.pinned_scope(), + }) + try: mode = data.get("mode", "credentials") persist = data.get("persist", True) @@ -1177,17 +1240,29 @@ def connector_connect(): safe = loader.get_safe_params() # Best-effort: seed a lightweight catalog for agent search. 
- # Do not overwrite a richer sync-catalog-metadata snapshot. + # Do not overwrite a richer sync-catalog-metadata snapshot, EXCEPT + # for local-folder sources: filesystem scans are cheap, and the + # cached snapshot otherwise goes stale whenever the user adds/renames + # files in the connected directory — which causes agent search to + # miss files that are clearly visible on disk. try: from data_formulator.datalake.catalog_cache import save_catalog from data_formulator.datalake.workspace import get_user_home + from data_formulator.data_loader.local_folder_data_loader import ( + LocalFolderDataLoader, + ) identity_for_cache = source._get_identity() user_home = get_user_home(identity_for_cache) flat_tables = loader.list_tables() loader.ensure_table_keys(flat_tables) + cache_mode = ( + "replace" + if isinstance(loader, LocalFolderDataLoader) + else "seed_if_missing" + ) save_catalog( user_home, source._source_id, flat_tables, - mode="seed_if_missing", + mode=cache_mode, ) except Exception: logger.debug("Failed to save catalog cache on connect for '%s'", @@ -1225,6 +1300,15 @@ def connector_disconnect(): data = request.get_json() or {} source = _resolve_connector(data) + # No-auth connectors (e.g. built-in example datasets) cannot be + # disconnected — they have no credentials to clear and are intentionally + # always available. + if _loader_auth_mode(source._loader_class) == "none": + raise AppError( + ErrorCode.INVALID_REQUEST, + "This connector is always available and cannot be disconnected.", + ) + try: identity = source._get_identity() source._loaders.pop(identity, None) @@ -1251,6 +1335,18 @@ def connector_get_status(): data = request.get_json() or {} source = _resolve_connector(data) + # No-auth connectors are always connected. 
+ if _loader_auth_mode(source._loader_class) == "none": + loader = source._loader_class() + return json_ok({ + "connected": True, + "persisted": False, + "params": loader.get_safe_params(), + "hierarchy": _hierarchy_dicts(loader.catalog_hierarchy()), + "effective_hierarchy": _hierarchy_dicts(loader.effective_hierarchy()), + "pinned_scope": loader.pinned_scope(), + }) + identity = source._get_identity() loader = source._get_loader(identity) if loader is None: @@ -1359,19 +1455,39 @@ def connector_get_catalog_tree(): name_filter = data.get("filter") from data_formulator.datalake.workspace import get_user_home - from data_formulator.datalake.catalog_cache import _load_catalog_raw + from data_formulator.datalake.catalog_cache import _load_catalog_raw, save_catalog + from data_formulator.data_loader.local_folder_data_loader import ( + LocalFolderDataLoader, + ) identity = source._get_identity() user_home = get_user_home(identity) raw = _load_catalog_raw(user_home, source._source_id) - if raw and raw.get("tables"): + # Local folders: always re-scan from disk. Filesystem listing is cheap + # and the on-disk snapshot otherwise goes stale (e.g. new files added + # after connect are invisible to agent search). + is_local_folder = isinstance(loader, LocalFolderDataLoader) + + if raw and raw.get("tables") and not is_local_folder: flat_tables = _filter_catalog_tables(raw.get("tables", []), name_filter) flat_tables = _merged_catalog_tables(user_home, source._source_id, flat_tables) else: flat_tables = loader.list_tables(table_filter=name_filter) loader.ensure_table_keys(flat_tables) flat_tables = _merged_catalog_tables(user_home, source._source_id, flat_tables) + if is_local_folder and not name_filter: + # Persist the fresh listing so agent search stays in sync. 
+ try: + save_catalog( + user_home, source._source_id, flat_tables, + mode="replace", + ) + except Exception: + logger.debug( + "Failed to refresh local-folder catalog cache for '%s'", + source._source_id, exc_info=True, + ) tree = _catalog_tree_payload(loader, flat_tables) @@ -1671,7 +1787,7 @@ def connector_preview_data(): import_options=import_options, ) df = arrow_table.to_pandas() - rows = _json.loads(df.to_json(orient="records", date_format="iso")) + rows = df_to_safe_records(df) columns = [{"name": col, "type": normalize_dtype_to_app_type(str(df[col].dtype))} for col in df.columns] # Enrich columns with source-level types from loader metadata. @@ -2062,8 +2178,11 @@ def register_data_connectors(app: Flask) -> None: # 1. Register the global management blueprint app.register_blueprint(connectors_bp) - # 2. Load admin connectors - admin_specs = _load_admin_specs() + # 2. Load admin connectors from YAML/env (skipped when external connectors + # are disabled — but the blueprint and built-in sample_datasets + # connector below remain available so users can still load demo data). + disabled = bool(app.config.get('CLI_ARGS', {}).get('disable_data_connectors')) + admin_specs = [] if disabled else _load_admin_specs() for spec in admin_specs: loader_class = DATA_LOADERS.get(spec.loader_type) @@ -2096,3 +2215,19 @@ def register_data_connectors(app: Flask) -> None: for key, reason in DISABLED_LOADERS.items(): if key not in DATA_CONNECTORS: logger.info("Source '%s' not available: %s", key, reason) + + # 3. Always register the built-in sample datasets connector. This is + # the one data source that remains available even in + # ``--disable_database`` mode — it has no auth, no external + # dependency beyond ``requests``, and gives users a zero-config + # way to explore Data Formulator. 
+ sample_loader_class = DATA_LOADERS.get("sample_datasets") + if sample_loader_class and "sample_datasets" not in DATA_CONNECTORS: + DATA_CONNECTORS["sample_datasets"] = DataConnector.from_loader( + sample_loader_class, + source_id="sample_datasets", + display_name="Example Datasets", + icon="dataset", + ) + _ADMIN_CONNECTOR_IDS.add("sample_datasets") + logger.info("Registered built-in 'sample_datasets' connector") diff --git a/py-src/data_formulator/data_loader/__init__.py b/py-src/data_formulator/data_loader/__init__.py index 258be78f..56a6e3b3 100644 --- a/py-src/data_formulator/data_loader/__init__.py +++ b/py-src/data_formulator/data_loader/__init__.py @@ -7,14 +7,33 @@ is independently imported via try/except so that a missing dependency only disables that one loader. -2. **External plugins** — Python files matching ``*_data_loader.py`` found - in the directory pointed to by ``DF_PLUGIN_DIR`` (default: - ``~/.data-formulator/plugins/``). Any ``ExternalDataLoader`` subclass - found in such a file is auto-registered. If a plugin key collides with - a built-in key the plugin wins (override). +2. **External plugins** — Python files matching ``*_data_loader.py`` + found in the plugin directory. Resolution order: + + 1. ``DF_PLUGIN_DIR`` env var — explicit override (useful for + team-shared dirs, read-only mounts, dev iteration). + 2. ``DATA_FORMULATOR_HOME/plugins`` — the default location, + consistent with every other DF artifact. + 3. ``~/.data_formulator/plugins/`` — final fallback when + ``DATA_FORMULATOR_HOME`` is unset. + + Any ``ExternalDataLoader`` subclass found in such a file is + auto-registered. Plugins may not override a built-in key or an + earlier plugin's key; such collisions are rejected into ``PLUGIN_ERRORS``. + + For safety, plugin scanning is enabled only when running in local + single-user mode (``WORKSPACE_BACKEND=local``, the default).
In + multi-user / hosted deployments, set ``DF_ALLOW_PLUGINS=1`` to opt in + explicitly — the plugin directory must be trusted, since loading a + plugin executes arbitrary Python code in the server process. ``DATA_LOADERS`` — loaders that imported successfully. -``DISABLED_LOADERS`` — loaders that failed, with a human-readable install hint. +``DISABLED_LOADERS`` — loaders that failed, with a human-readable hint. +``PLUGIN_LOADERS`` — subset of ``DATA_LOADERS`` registered from plugins, + mapped to the source file path. +``PLUGIN_ERRORS`` — plugins that were rejected (e.g. built-in override + attempt or duplicate key). Each entry is a dict: + ``{"file", "reason", "kind"}``. """ from __future__ import annotations @@ -49,6 +68,7 @@ ("athena", "data_formulator.data_loader.athena_data_loader", "AthenaDataLoader", "boto3"), ("superset", "data_formulator.data_loader.superset_data_loader", "SupersetLoader", "requests"), ("local_folder", "data_formulator.data_loader.local_folder_data_loader", "LocalFolderDataLoader", "pyarrow"), + ("sample_datasets", "data_formulator.data_loader.sample_datasets_loader", "SampleDatasetsLoader", "requests"), ] # --------------------------------------------------------------------------- @@ -56,7 +76,10 @@ # --------------------------------------------------------------------------- DATA_LOADERS: dict[str, type[ExternalDataLoader]] = {} -DISABLED_LOADERS: dict[str, str] = {} # key -> install instruction +DISABLED_LOADERS: dict[str, str] = {} # key -> install / failure hint +PLUGIN_LOADERS: dict[str, str] = {} # key -> absolute source file path +PLUGIN_ERRORS: list[dict] = [] # rejected plugin attempts (override / duplicate) +_BUILTIN_KEYS: frozenset[str] = frozenset(spec[0] for spec in _LOADER_SPECS) def _scan_package_loaders() -> None: """Import built-in loaders from ``_LOADER_SPECS``.""" @@ -78,57 +101,168 @@ def _scan_package_loaders() -> None: # Phase 2: scan external plugin directory # 
--------------------------------------------------------------------------- -_DEFAULT_PLUGIN_DIR = str(Path.home() / ".data-formulator" / "plugins") -PLUGIN_DIR: str = os.environ.get("DF_PLUGIN_DIR", _DEFAULT_PLUGIN_DIR) +def _resolve_plugin_dir() -> str: + """Resolve the plugin directory. + Order: ``DF_PLUGIN_DIR`` (explicit override) > + ``DATA_FORMULATOR_HOME/plugins`` (default) > + ``~/.data_formulator/plugins`` (fallback). + """ + explicit = os.environ.get("DF_PLUGIN_DIR") + if explicit: + return explicit + df_home = os.environ.get("DATA_FORMULATOR_HOME") + base = Path(df_home) if df_home else Path.home() / ".data_formulator" + return str(base / "plugins") + + +PLUGIN_DIR: str = _resolve_plugin_dir() -def _scan_plugin_dir() -> None: - """Scan ``PLUGIN_DIR`` for ``*_data_loader.py`` files. - Each file is loaded as a standalone module. Every public class that - subclasses ``ExternalDataLoader`` is registered. The registry key is - derived from the filename: ``my_custom_data_loader.py`` → ``my_custom``. +def _plugin_scanning_enabled() -> tuple[bool, str]: + """Return ``(enabled, reason)``. - Plugins override built-ins with the same key. + Plugin loading executes arbitrary Python in the server process, so it + is only enabled by default in single-user local mode. Hosted + deployments must opt in via ``DF_ALLOW_PLUGINS=1``. """ - plugin_path = Path(PLUGIN_DIR) - if not plugin_path.is_dir(): + if os.environ.get("DF_ALLOW_PLUGINS", "").lower() in ("1", "true", "yes"): + return True, "DF_ALLOW_PLUGINS opt-in" + backend = os.environ.get("WORKSPACE_BACKEND", "local") + if backend == "local": + return True, "WORKSPACE_BACKEND=local" + return False, f"WORKSPACE_BACKEND={backend} (set DF_ALLOW_PLUGINS=1 to enable)" + + +def _register_plugin_class(key: str, cls: type[ExternalDataLoader], py_file: Path) -> None: + """Register a plugin loader class. + + Plugins are **not** allowed to override built-in loaders or earlier + plugin loaders. 
Silent overrides are a credential-exfiltration risk + (a malicious ``mysql_data_loader.py`` could replace the built-in + MySQL connector and capture every existing MySQL connection's + password). Collisions are recorded in ``PLUGIN_ERRORS`` so the UI + can surface them at the top of the connector picker. + """ + if key in PLUGIN_LOADERS: + reason = ( + f"plugin file '{py_file.name}' would override earlier plugin loader '{key}' " + f"from {PLUGIN_LOADERS[key]} — rename one of the files so their registry keys differ." + ) + _log.error("Rejected plugin %s: %s", py_file.name, reason) + PLUGIN_ERRORS.append({"file": str(py_file), "reason": reason, "kind": "duplicate"}) + return + # Block override of any built-in spec, even if that built-in failed + # to import (otherwise a missing dependency would silently hand the + # registry slot to the plugin — stealth substitution risk). + if key in _BUILTIN_KEYS: + reason = ( + f"plugin file '{py_file.name}' would override built-in loader '{key}' — " + f"this is blocked for security reasons. Rename the plugin file to use a " + f"different prefix (e.g. '{key}_custom_data_loader.py')." + ) + _log.error("Rejected plugin %s: %s", py_file.name, reason) + PLUGIN_ERRORS.append({"file": str(py_file), "reason": reason, "kind": "override_builtin"}) return + DATA_LOADERS[key] = cls + PLUGIN_LOADERS[key] = str(py_file) + # Clear any stale disabled entry from a previous failed scan. + DISABLED_LOADERS.pop(key, None) + - for py_file in sorted(plugin_path.glob("*_data_loader.py")): - key = py_file.stem.removesuffix("_data_loader") - module_name = f"df_plugin_{key}" +def _load_plugin_file(py_file: Path) -> None: + """Load a single ``*_data_loader.py`` plugin file. + + On failure the key is recorded in ``DISABLED_LOADERS`` so the UI can + surface why the plugin is missing. ``sys.modules`` is cleaned up on + failure to avoid leaking a half-initialized module. 
+ """ + key = py_file.stem.removesuffix("_data_loader") + module_name = f"df_plugin_{key}" + try: + spec = importlib.util.spec_from_file_location(module_name, py_file) + if spec is None or spec.loader is None: + DISABLED_LOADERS[key] = f"plugin {py_file.name}: could not create import spec" + return + mod = importlib.util.module_from_spec(spec) + # Registered in sys.modules before exec; popped again if exec raises. try: - spec = importlib.util.spec_from_file_location(module_name, py_file) - if spec is None or spec.loader is None: - continue - mod = importlib.util.module_from_spec(spec) sys.modules[module_name] = mod spec.loader.exec_module(mod) # type: ignore[union-attr] + except BaseException: + sys.modules.pop(module_name, None) + raise + + candidates = [ + obj for _name, obj in inspect.getmembers(mod, inspect.isclass) + if ( + issubclass(obj, ExternalDataLoader) + and obj is not ExternalDataLoader + and obj.__module__ == module_name + ) + ] + if not candidates: + msg = f"plugin {py_file.name}: no ExternalDataLoader subclass found" + _log.warning(msg) + DISABLED_LOADERS[key] = msg + sys.modules.pop(module_name, None) + return + if len(candidates) > 1: + _log.warning( + "Plugin %s defines multiple ExternalDataLoader subclasses (%s); " + "registering '%s'.", + py_file.name, + ", ".join(c.__name__ for c in candidates), + candidates[0].__name__, + ) + _register_plugin_class(key, candidates[0], py_file) + if DATA_LOADERS.get(key) is candidates[0]: + _log.info("Plugin loader '%s' registered from %s", key, py_file.name) + except ImportError as exc: + # Most common failure mode: plugin imports a package the user + # hasn't installed. Surface a useful hint to the UI.
+ missing = getattr(exc, "name", None) or str(exc) + hint = f"plugin {py_file.name}: missing dependency '{missing}' (pip install {missing})" + DISABLED_LOADERS[key] = hint + _log.warning("Failed to load plugin %s: %s", py_file.name, hint, exc_info=True) + except Exception as exc: + DISABLED_LOADERS[key] = f"plugin {py_file.name}: {type(exc).__name__}: {exc}" + _log.warning("Failed to load plugin %s", py_file.name, exc_info=True) + + +def _scan_plugin_dir() -> None: + """Scan ``PLUGIN_DIR`` for ``*_data_loader.py`` files. + + The registry key is derived from the filename: + ``my_custom_data_loader.py`` → ``my_custom``. Key collisions with + built-ins or earlier plugins are rejected (see ``PLUGIN_ERRORS``). + """ + enabled, reason = _plugin_scanning_enabled() + if not enabled: + _log.info("Plugin scanning disabled: %s", reason) + return + + plugin_path = Path(PLUGIN_DIR) + if not plugin_path.is_dir(): + _log.debug("Plugin dir %s does not exist; skipping plugin scan", plugin_path) + return + + files = sorted(plugin_path.glob("*_data_loader.py")) + if not files: + _log.info("Plugin dir %s contains no *_data_loader.py files", plugin_path) + return + + for py_file in files: + _load_plugin_file(py_file) - found = False - for name, obj in inspect.getmembers(mod, inspect.isclass): - if ( - issubclass(obj, ExternalDataLoader) - and obj is not ExternalDataLoader - and obj.__module__ == module_name - ): - if key in DATA_LOADERS: - _log.info( - "Plugin '%s' overrides built-in loader '%s'", - py_file.name, key, - ) - DATA_LOADERS[key] = obj - found = True - _log.info("Plugin loader '%s' registered from %s", key, py_file.name) - break # one class per file - - if not found: - _log.warning( - "Plugin file %s has no ExternalDataLoader subclass", py_file.name, - ) - except Exception as exc: - _log.warning("Failed to load plugin %s: %s", py_file.name, exc) + _log.info( + "Plugin scan complete: %d registered, %d failed, %d rejected (dir=%s, reason=%s)", + len(PLUGIN_LOADERS), + sum(1 for k in DISABLED_LOADERS if k not in
{s[0] for s in _LOADER_SPECS}), + len(PLUGIN_ERRORS), + plugin_path, + reason, + ) _scan_plugin_dir() @@ -162,6 +296,7 @@ def get_available_loaders() -> dict[str, type[ExternalDataLoader]]: "CatalogNode", "DATA_LOADERS", "DISABLED_LOADERS", + "PLUGIN_LOADERS", "PLUGIN_DIR", "get_available_loaders", ] diff --git a/py-src/data_formulator/data_loader/athena_data_loader.py b/py-src/data_formulator/data_loader/athena_data_loader.py index df7b98e2..8791fe0b 100644 --- a/py-src/data_formulator/data_loader/athena_data_loader.py +++ b/py-src/data_formulator/data_loader/athena_data_loader.py @@ -57,6 +57,8 @@ class AthenaDataLoader(ExternalDataLoader): Use ingest_to_workspace() to store results as parquet in the workspace. """ + DISPLAY_NAME = "Athena" + @staticmethod def list_params() -> list[dict[str, Any]]: params_list = [ diff --git a/py-src/data_formulator/data_loader/azure_blob_data_loader.py b/py-src/data_formulator/data_loader/azure_blob_data_loader.py index a2aedb15..e0d1a7bd 100644 --- a/py-src/data_formulator/data_loader/azure_blob_data_loader.py +++ b/py-src/data_formulator/data_loader/azure_blob_data_loader.py @@ -9,11 +9,13 @@ from pyarrow import fs as pa_fs from data_formulator.data_loader.external_data_loader import ExternalDataLoader, CatalogNode, MAX_IMPORT_ROWS, sanitize_table_name +from data_formulator.datalake.parquet_utils import df_to_safe_records from typing import Any logger = logging.getLogger(__name__) class AzureBlobDataLoader(ExternalDataLoader): + DISPLAY_NAME = "Azure Blob" @staticmethod def list_params() -> list[dict[str, Any]]: @@ -200,7 +202,7 @@ def list_tables(self, table_filter: str | None = None) -> list[dict[str, Any]]: 'type': str(sample_df[col].dtype) } for col in sample_df.columns] - sample_rows = json.loads(sample_df.to_json(orient="records")) + sample_rows = df_to_safe_records(sample_df) row_count = self._estimate_row_count(azure_url, blob) table_metadata = { @@ -339,7 +341,7 @@ def get_metadata(self, path: list[str]) -> dict[str, 
Any]: try: sample_df = self._read_sample(azure_url, 5) columns = [{"name": c, "type": str(sample_df[c].dtype)} for c in sample_df.columns] - sample_rows = json.loads(sample_df.to_json(orient="records")) + sample_rows = df_to_safe_records(sample_df) row_count = self._estimate_row_count(azure_url) return {"row_count": row_count, "columns": columns, "sample_rows": sample_rows} except Exception as e: diff --git a/py-src/data_formulator/data_loader/bigquery_data_loader.py b/py-src/data_formulator/data_loader/bigquery_data_loader.py index 3c07e716..d5119742 100644 --- a/py-src/data_formulator/data_loader/bigquery_data_loader.py +++ b/py-src/data_formulator/data_loader/bigquery_data_loader.py @@ -12,7 +12,9 @@ class BigQueryDataLoader(ExternalDataLoader): """BigQuery data loader implementation""" - + + DISPLAY_NAME = "BigQuery" + @staticmethod def list_params() -> list[dict[str, Any]]: return [ diff --git a/py-src/data_formulator/data_loader/cosmosdb_data_loader.py b/py-src/data_formulator/data_loader/cosmosdb_data_loader.py index 355666df..f3326ea3 100644 --- a/py-src/data_formulator/data_loader/cosmosdb_data_loader.py +++ b/py-src/data_formulator/data_loader/cosmosdb_data_loader.py @@ -1,4 +1,3 @@ -import json import logging from datetime import datetime @@ -9,12 +8,14 @@ from azure.cosmos.partition_key import PartitionKey from data_formulator.data_loader.external_data_loader import ExternalDataLoader, CatalogNode, MAX_IMPORT_ROWS, sanitize_table_name +from data_formulator.datalake.parquet_utils import df_to_safe_records from typing import Any logger = logging.getLogger(__name__) class CosmosDBDataLoader(ExternalDataLoader): + DISPLAY_NAME = "Cosmos DB" @staticmethod def list_params() -> list[dict[str, Any]]: @@ -266,7 +267,7 @@ def list_tables(self, table_filter: str | None = None) -> list[dict[str, Any]]: 'type': str(df[col].dtype) } for col in df.columns] - sample_rows = json.loads(df.to_json(orient="records")) + sample_rows = df_to_safe_records(df) else: columns = 
[] sample_rows = [] @@ -338,7 +339,7 @@ def get_metadata(self, path: list[str]) -> dict[str, Any]: if sample: df = self._process_documents(sample) columns = [{"name": c, "type": str(df[c].dtype)} for c in df.columns] - sample_rows = json.loads(df.to_json(orient="records")) + sample_rows = df_to_safe_records(df) else: columns, sample_rows = [], [] return {"row_count": row_count, "columns": columns, "sample_rows": sample_rows} diff --git a/py-src/data_formulator/data_loader/external_data_loader.py b/py-src/data_formulator/data_loader/external_data_loader.py index 26375703..d2963799 100644 --- a/py-src/data_formulator/data_loader/external_data_loader.py +++ b/py-src/data_formulator/data_loader/external_data_loader.py @@ -539,6 +539,13 @@ def auth_instructions() -> str: """Return human-readable authentication instructions.""" pass + #: Human-friendly UI label. When ``None``, the ``/api/data-loaders`` + #: endpoint falls back to title-casing the registry key. Override on + #: the subclass to fix awkward casing (``"SQLite"``, ``"BigQuery"``, + #: ``"MySQL"`` instead of the default ``"Sqlite"`` / ``"Bigquery"`` / + #: ``"Mysql"``). + DISPLAY_NAME: str | None = None + @staticmethod def delegated_login_config() -> dict[str, Any] | None: """Return config for delegated (popup-based) token login, or None. 
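The loader hunks above repeatedly swap `json.loads(df.to_json(orient="records"))` for `df_to_safe_records(...)`. A small pandas sketch makes the difference visible. `safe_records_sketch` is a hypothetical stand-in written only from the documented behavior of `df_to_safe_records` (ISO dates plus `default_handler=str`); the real helper in `data_formulator.datalake.parquet_utils` is not part of this diff and may differ in detail.

```python
import json

import pandas as pd


def safe_records_sketch(df: pd.DataFrame) -> list[dict]:
    """Hypothetical stand-in for df_to_safe_records (not its real source)."""
    # ISO-8601 dates; str() as the fallback for values to_json cannot encode.
    return json.loads(
        df.to_json(orient="records", date_format="iso", default_handler=str)
    )


df = pd.DataFrame({"day": pd.to_datetime(["2026-03-15"]), "n": [1]})

# Old loader pattern. "epoch" is the pandas 2.x default for orient="records";
# it is spelled out here so the sketch behaves the same across versions.
legacy_rows = json.loads(df.to_json(orient="records", date_format="epoch"))
print(legacy_rows)  # [{'day': 1773532800000, 'n': 1}]

# New pattern: the same timestamp comes back as an ISO-8601 string.
safe_rows = safe_records_sketch(df)
print(safe_rows[0]["day"])  # an ISO string beginning with 2026-03-15T00:00:00
```

The epoch integer is the value the frontend was rendering as a comma-grouped number, while the ISO string is something it can recognize as a date.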
diff --git a/py-src/data_formulator/data_loader/kusto_data_loader.py b/py-src/data_formulator/data_loader/kusto_data_loader.py index 2e9b663f..80a4a531 100644 --- a/py-src/data_formulator/data_loader/kusto_data_loader.py +++ b/py-src/data_formulator/data_loader/kusto_data_loader.py @@ -5,6 +5,7 @@ import pyarrow as pa from data_formulator.data_loader.external_data_loader import ExternalDataLoader, CatalogNode, MAX_IMPORT_ROWS, sanitize_table_name +from data_formulator.datalake.parquet_utils import df_to_safe_records from azure.kusto.data import KustoClient, KustoConnectionStringBuilder from azure.kusto.data.helpers import dataframe_from_result_table @@ -12,6 +13,7 @@ logger = logging.getLogger(__name__) class KustoDataLoader(ExternalDataLoader): + DISPLAY_NAME = "Kusto" @staticmethod def list_params() -> list[dict[str, Any]]: @@ -287,7 +289,7 @@ def get_metadata(self, path: list[str]) -> dict[str, Any]: details = self.query(f".show table ['{table_name}'] details").to_dict(orient="records") row_count = int(details[0]["TotalRowCount"]) sample_df = self.query(f"['{table_name}'] | take 5") - sample_rows = json.loads(sample_df.to_json(orient="records", date_format="iso")) + sample_rows = df_to_safe_records(sample_df) result: dict[str, Any] = {"row_count": row_count, "columns": columns, "sample_rows": sample_rows} doc_string = details[0].get("DocString") if doc_string and str(doc_string).strip(): diff --git a/py-src/data_formulator/data_loader/local_folder_data_loader.py b/py-src/data_formulator/data_loader/local_folder_data_loader.py index 557820bb..240771b0 100644 --- a/py-src/data_formulator/data_loader/local_folder_data_loader.py +++ b/py-src/data_formulator/data_loader/local_folder_data_loader.py @@ -9,6 +9,7 @@ import json import logging +import os from pathlib import Path from typing import Any @@ -18,6 +19,7 @@ import pyarrow.parquet as pq from data_formulator.data_loader.external_data_loader import ExternalDataLoader, CatalogNode +from 
data_formulator.datalake.parquet_utils import df_to_safe_records from data_formulator.security.path_safety import ConfinedDir logger = logging.getLogger(__name__) @@ -32,6 +34,8 @@ class LocalFolderDataLoader(ExternalDataLoader): """Browse and import data files from a local directory.""" + DISPLAY_NAME = "Local Folder" + @staticmethod def list_params() -> list[dict[str, Any]]: return [ @@ -78,7 +82,11 @@ def catalog_hierarchy() -> list[dict[str, str]]: def __init__(self, params: dict[str, Any]): self.params = params - self.root_dir = Path(params.get("root_dir", "")).resolve() + raw_root = params.get("root_dir", "") or "" + # Expand ~ and environment variables (e.g. $HOME, %USERPROFILE%) so users + # can paste shell-style paths into the connect dialog. + expanded = os.path.expandvars(os.path.expanduser(raw_root)) + self.root_dir = Path(expanded).resolve() recursive_val = params.get("recursive", True) if isinstance(recursive_val, str): self.recursive = recursive_val.lower() not in ("false", "0", "no") @@ -182,8 +190,7 @@ def get_metadata(self, path: list[str]) -> dict[str, Any]: {"name": c, "type": str(sample_df[c].dtype)} for c in sample_df.columns ] - meta["sample_rows"] = json.loads( - sample_df.to_json(orient="records")) + meta["sample_rows"] = df_to_safe_records(sample_df) meta["row_count"] = meta.get("row_count") or len(sample_df) except Exception as exc: logger.debug("Sample read failed for %s: %s", path, exc) diff --git a/py-src/data_formulator/data_loader/mongodb_data_loader.py b/py-src/data_formulator/data_loader/mongodb_data_loader.py index 978ad470..7f085341 100644 --- a/py-src/data_formulator/data_loader/mongodb_data_loader.py +++ b/py-src/data_formulator/data_loader/mongodb_data_loader.py @@ -1,4 +1,3 @@ -import json import logging from datetime import datetime @@ -8,12 +7,14 @@ from bson import ObjectId from data_formulator.data_loader.external_data_loader import ExternalDataLoader, CatalogNode, MAX_IMPORT_ROWS, sanitize_table_name +from 
data_formulator.datalake.parquet_utils import df_to_safe_records from typing import Any logger = logging.getLogger(__name__) class MongoDBDataLoader(ExternalDataLoader): + DISPLAY_NAME = "MongoDB" @staticmethod def list_params() -> list[dict[str, Any]]: @@ -258,7 +259,7 @@ def list_tables(self, table_filter: str | None = None) -> list[dict[str, Any]]: } for col in df.columns] # Convert sample_data for return - sample_rows = json.loads(df.to_json(orient="records")) + sample_rows = df_to_safe_records(df) else: columns = [] sample_rows = [] @@ -325,7 +326,7 @@ def get_metadata(self, path: list[str]) -> dict[str, Any]: if sample: df = self._process_documents(sample) columns = [{"name": c, "type": str(df[c].dtype)} for c in df.columns] - sample_rows = json.loads(df.to_json(orient="records")) + sample_rows = df_to_safe_records(df) else: columns, sample_rows = [], [] return {"row_count": row_count, "columns": columns, "sample_rows": sample_rows} diff --git a/py-src/data_formulator/data_loader/mssql_data_loader.py b/py-src/data_formulator/data_loader/mssql_data_loader.py index c409a714..bc584a66 100644 --- a/py-src/data_formulator/data_loader/mssql_data_loader.py +++ b/py-src/data_formulator/data_loader/mssql_data_loader.py @@ -7,6 +7,7 @@ import pyodbc from data_formulator.data_loader.external_data_loader import ExternalDataLoader, CatalogNode, MAX_IMPORT_ROWS, sanitize_table_name +from data_formulator.datalake.parquet_utils import df_to_safe_records log = logging.getLogger(__name__) @@ -22,6 +23,8 @@ def _is_nan(value) -> bool: class MSSQLDataLoader(ExternalDataLoader): + DISPLAY_NAME = "SQL Server" + @staticmethod def list_params() -> list[dict[str, Any]]: params_list = [ @@ -669,7 +672,7 @@ def get_metadata(self, path: list[str]) -> dict[str, Any]: sample_df = self._execute_query( f"SELECT TOP 5 {col_list} FROM [{db}].[{schema}].[{table_name}]" ).to_pandas() - sample_rows = json.loads(sample_df.fillna(value=None).to_json(orient="records", date_format="iso", 
default_handler=str)) + sample_rows = df_to_safe_records(sample_df) result: dict[str, Any] = {"row_count": row_count, "columns": columns, "sample_rows": sample_rows} if table_description: result["description"] = table_description diff --git a/py-src/data_formulator/data_loader/mysql_data_loader.py b/py-src/data_formulator/data_loader/mysql_data_loader.py index 888ea810..8ff179d9 100644 --- a/py-src/data_formulator/data_loader/mysql_data_loader.py +++ b/py-src/data_formulator/data_loader/mysql_data_loader.py @@ -15,11 +15,13 @@ _esc_id, _esc_str, ) +from data_formulator.datalake.parquet_utils import df_to_safe_records logger = logging.getLogger(__name__) class MySQLDataLoader(ExternalDataLoader): + DISPLAY_NAME = "MySQL" @staticmethod def list_params() -> list[dict[str, Any]]: @@ -458,7 +460,7 @@ def get_metadata(self, path: list[str]) -> dict[str, Any]: sample_df = self._read_sql( f"SELECT {col_list} FROM {_esc_id(db, '`')}.{_esc_id(table_name, '`')} LIMIT 5" ).to_pandas() - sample_rows = json.loads(sample_df.to_json(orient="records", date_format="iso")) + sample_rows = df_to_safe_records(sample_df) result: dict[str, Any] = { "row_count": row_count, "columns": columns, diff --git a/py-src/data_formulator/data_loader/postgresql_data_loader.py b/py-src/data_formulator/data_loader/postgresql_data_loader.py index 6e44454a..a2f858fc 100644 --- a/py-src/data_formulator/data_loader/postgresql_data_loader.py +++ b/py-src/data_formulator/data_loader/postgresql_data_loader.py @@ -19,11 +19,13 @@ _esc_id, _esc_str, ) +from data_formulator.datalake.parquet_utils import df_to_safe_records logger = logging.getLogger(__name__) class PostgreSQLDataLoader(ExternalDataLoader): + DISPLAY_NAME = "PostgreSQL" @staticmethod def list_params() -> list[dict[str, Any]]: @@ -824,7 +826,7 @@ def get_metadata(self, path: list[str]) -> dict[str, Any]: sample_df = self._read_sql_on( f'SELECT {col_list} FROM {_esc_id(schema, chr(34))}.{_esc_id(table_name, chr(34))} LIMIT 5', db
).to_pandas() - sample_rows = json.loads(sample_df.to_json(orient="records")) + sample_rows = df_to_safe_records(sample_df) result: dict[str, Any] = { "_source_name": full_source, "row_count": row_count, diff --git a/py-src/data_formulator/data_loader/s3_data_loader.py b/py-src/data_formulator/data_loader/s3_data_loader.py index 472a0d36..e096873e 100644 --- a/py-src/data_formulator/data_loader/s3_data_loader.py +++ b/py-src/data_formulator/data_loader/s3_data_loader.py @@ -10,11 +10,13 @@ from pyarrow import fs as pa_fs from data_formulator.data_loader.external_data_loader import ExternalDataLoader, CatalogNode, MAX_IMPORT_ROWS +from data_formulator.datalake.parquet_utils import df_to_safe_records logger = logging.getLogger(__name__) class S3DataLoader(ExternalDataLoader): + DISPLAY_NAME = "Amazon S3" @staticmethod def list_params() -> list[dict[str, Any]]: @@ -145,7 +147,7 @@ def list_tables(self, table_filter: str | None = None) -> list[dict[str, Any]]: 'type': str(sample_df[col].dtype) } for col in sample_df.columns] - sample_rows = json.loads(sample_df.to_json(orient="records")) + sample_rows = df_to_safe_records(sample_df) row_count = self._estimate_row_count(s3_url) table_metadata = { @@ -256,7 +258,7 @@ def get_metadata(self, path: list[str]) -> dict[str, Any]: sample = self._read_sample_arrow(s3_url, 5) sample_df = sample.to_pandas() columns = [{"name": c, "type": str(sample_df[c].dtype)} for c in sample_df.columns] - sample_rows = json.loads(sample_df.to_json(orient="records")) + sample_rows = df_to_safe_records(sample_df) row_count = self._estimate_row_count(s3_url) return {"row_count": row_count, "columns": columns, "sample_rows": sample_rows} except Exception as e: diff --git a/py-src/data_formulator/data_loader/sample_datasets_loader.py b/py-src/data_formulator/data_loader/sample_datasets_loader.py new file mode 100644 index 00000000..6c3267cf --- /dev/null +++ b/py-src/data_formulator/data_loader/sample_datasets_loader.py @@ -0,0 +1,286 @@ +# 
Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. + +"""Sample datasets data loader. + +Exposes the built-in ``EXAMPLE_DATASETS`` catalog as a virtual data +connector that behaves exactly like any other connector. No auth, no +external service of its own — table data is fetched on demand from the +public URLs declared in :mod:`data_formulator.example_datasets_config`. + +The connector is registered unconditionally at startup so that even in +``--disable_database`` mode users still have a zero-config way to load +data and explore Data Formulator. +""" + +from __future__ import annotations + +import io +import json +import logging +import threading +from typing import Any + +import pandas as pd +import pyarrow as pa + +from data_formulator.data_loader.external_data_loader import ExternalDataLoader +from data_formulator.datalake.parquet_utils import df_to_safe_records + +logger = logging.getLogger(__name__) + +# In-process cache for sample dataset DataFrames keyed by (url, format). +# These URLs are static public datasets, so caching is safe and dramatically +# speeds up repeat previews/loads (no network + parse cost on every click). +# Bounded by a soft cap to avoid unbounded memory growth if the catalog ever +# expands; eviction is simple FIFO since access patterns are interactive. +_SAMPLE_CACHE: dict[tuple[str, str], pd.DataFrame] = {} +_SAMPLE_CACHE_ORDER: list[tuple[str, str]] = [] +_SAMPLE_CACHE_LOCK = threading.Lock() +_SAMPLE_CACHE_MAX = 64 + + +class SampleDatasetsLoader(ExternalDataLoader): + """Browse and import the built-in sample datasets.""" + + # ------------------------------------------------------------------ + # Metadata + # ------------------------------------------------------------------ + + @staticmethod + def list_params() -> list[dict[str, Any]]: + return [] + + @staticmethod + def auth_instructions() -> str: + return ( + "Built-in sample datasets are always available. " + "No configuration or credentials required." 
+ ) + + @staticmethod + def auth_mode() -> str: + # ``"none"`` declares that this loader needs no authentication and no + # connection setup. The connector framework treats such loaders as + # always-on: they cannot be connected/disconnected, expose no + # credentials UI, and are always reported as ``connected: true``. + return "none" + + @staticmethod + def catalog_hierarchy() -> list[dict[str, str]]: + return [ + {"key": "dataset", "label": "Dataset"}, + {"key": "table", "label": "Table"}, + ] + + # ------------------------------------------------------------------ + # Lifecycle + # ------------------------------------------------------------------ + + def __init__(self, params: dict[str, Any] | None = None): + self.params = params or {} + + def test_connection(self) -> bool: + return True + + # ------------------------------------------------------------------ + # Internal helpers + # ------------------------------------------------------------------ + + def _datasets(self) -> list[dict[str, Any]]: + from data_formulator.example_datasets_config import EXAMPLE_DATASETS + return EXAMPLE_DATASETS + + @staticmethod + def _table_stem(table_entry: dict[str, Any], idx: int) -> str: + url = table_entry.get("url", "") + last = url.split("/")[-1].split("?")[0] + stem = last.rsplit(".", 1)[0] if "." 
in last else last + return stem or f"table_{idx}" + + def _columns_from_sample(self, sample: Any, fmt: str) -> tuple[list[dict], list[dict]]: + """Infer ``(columns, sample_rows)`` from an embedded preview payload.""" + columns: list[dict] = [] + sample_rows: list[dict] = [] + if isinstance(sample, list) and sample: + first = sample[0] if isinstance(sample[0], dict) else {} + for name, value in first.items(): + ctype = type(value).__name__ if value is not None else "string" + columns.append({"name": str(name), "type": ctype}) + sample_rows = [r for r in sample[:10] if isinstance(r, dict)] + elif isinstance(sample, str) and sample.strip(): + sep = "," if (fmt or "csv").lower() == "csv" else "\t" + try: + df = pd.read_csv(io.StringIO(sample.strip()), sep=sep) + columns = [ + {"name": str(c), "type": str(df[c].dtype)} + for c in df.columns + ] + sample_rows = df_to_safe_records(df.head(10)) + except Exception: + logger.debug("Failed to parse sample CSV preview", exc_info=True) + return columns, sample_rows + + def _resolve(self, source_table: str) -> tuple[dict, dict, int] | None: + """Look up ``(dataset, table_entry, table_idx)`` by ``"Dataset/stem"``. + + Also accepts the bare dataset name when the dataset has a single + table, for convenience. 
+ """ + if not source_table: + return None + parts = source_table.split("/", 1) + ds_name = parts[0] + wanted_stem = parts[1] if len(parts) == 2 else None + for ds in self._datasets(): + if ds.get("name") != ds_name: + continue + tables = ds.get("tables", []) or [] + if wanted_stem is None and len(tables) == 1: + return ds, tables[0], 0 + for idx, t in enumerate(tables): + if self._table_stem(t, idx) == wanted_stem: + return ds, t, idx + return None + + # ------------------------------------------------------------------ + # Catalog + # ------------------------------------------------------------------ + + def list_tables(self, table_filter: str | None = None) -> list[dict[str, Any]]: + needle = (table_filter or "").strip().lower() + results: list[dict[str, Any]] = [] + for ds in self._datasets(): + ds_name = ds["name"] + desc = ds.get("description", "") or "" + tables = ds.get("tables", []) or [] + # Collapse single-table datasets to a single top-level entry so the + # sidebar doesn't render dozens of folders containing one child. + # Multi-table datasets keep the 2-level (dataset / table) hierarchy. 
+ collapse = len(tables) == 1 + for idx, t in enumerate(tables): + stem = self._table_stem(t, idx) + if collapse: + source_id = ds_name + path = [ds_name] + else: + source_id = f"{ds_name}/{stem}" + path = [ds_name, stem] + if needle and needle not in source_id.lower() and needle not in desc.lower(): + continue + fmt = (t.get("format") or "json").lower() + columns, sample_rows = self._columns_from_sample(t.get("sample"), fmt) + results.append({ + "name": source_id, + "table_key": source_id, + "path": path, + "metadata": { + "description": desc, + "columns": columns, + "sample_rows": sample_rows, + "row_count": None, + "_source_name": source_id, + "_format": fmt, + "_url": t.get("url", ""), + "_live": bool(ds.get("live", False)), + "_refresh_interval_seconds": ds.get("refreshIntervalSeconds"), + }, + }) + return results + + def get_column_types(self, source_table: str) -> dict[str, Any]: + resolved = self._resolve(source_table) + if not resolved: + return {} + ds, t, _ = resolved + fmt = (t.get("format") or "json").lower() + columns, _rows = self._columns_from_sample(t.get("sample"), fmt) + return { + "columns": columns, + "description": ds.get("description", ""), + } + + # ------------------------------------------------------------------ + # Data fetch + # ------------------------------------------------------------------ + + def fetch_data_as_arrow( + self, + source_table: str, + import_options: dict[str, Any] | None = None, + ) -> pa.Table: + resolved = self._resolve(source_table) + if not resolved: + raise ValueError(f"Unknown sample table: {source_table!r}") + _ds, t, _idx = resolved + url = t.get("url", "") + fmt = (t.get("format") or "json").lower() + if not url: + raise ValueError(f"Sample table {source_table!r} has no URL configured") + + df = self._load_full_dataframe(url, fmt, source_table) + + # Capture the true total BEFORE any slicing so callers can report + # the real row count even when ``size`` truncates the preview. 
+ self._last_total_rows = len(df) + + opts = import_options or {} + size = opts.get("size") + if isinstance(size, int) and size > 0 and len(df) > size: + df = df.head(size) + + logger.info("Returning %d / %d rows from sample dataset: %s", + len(df), self._last_total_rows, source_table) + return pa.Table.from_pandas(df, preserve_index=False) + + # ------------------------------------------------------------------ + # Internal: cached full-dataset fetch + # ------------------------------------------------------------------ + + def _load_full_dataframe(self, url: str, fmt: str, source_table: str) -> pd.DataFrame: + """Return the full parsed DataFrame for a sample dataset URL. + + Results are cached in-process: sample dataset URLs are static and + small, and previews/loads otherwise re-download + re-parse the + entire file on every click, which is visibly slow for larger + examples (Gapminder, Disasters, ...). + """ + key = (url, fmt) + with _SAMPLE_CACHE_LOCK: + cached = _SAMPLE_CACHE.get(key) + if cached is not None: + # Return a shallow copy so downstream slicing (``.head(size)``) + # doesn't mutate views the cache might re-emit later. + return cached.copy(deep=False) + + import requests + logger.info("Fetching sample dataset over network: %s (%s)", source_table, url) + resp = requests.get(url, timeout=30) + resp.raise_for_status() + text = resp.text + + if fmt == "csv": + df = pd.read_csv(io.StringIO(text)) + elif fmt == "tsv": + df = pd.read_csv(io.StringIO(text), sep="\t") + else: + payload = json.loads(text) + if isinstance(payload, dict): + # Common JSON shapes: {data: [...]}, {rows: [...]}, or a single record + for k in ("data", "rows", "records", "items"): + if isinstance(payload.get(k), list): + payload = payload[k] + break + else: + payload = [payload] + df = pd.DataFrame(payload) + + with _SAMPLE_CACHE_LOCK: + if key not in _SAMPLE_CACHE: + _SAMPLE_CACHE[key] = df + _SAMPLE_CACHE_ORDER.append(key) + # FIFO eviction once we exceed the cap. 
+ while len(_SAMPLE_CACHE_ORDER) > _SAMPLE_CACHE_MAX: + evict = _SAMPLE_CACHE_ORDER.pop(0) + _SAMPLE_CACHE.pop(evict, None) + return df.copy(deep=False) diff --git a/py-src/data_formulator/data_loader/superset_data_loader.py b/py-src/data_formulator/data_loader/superset_data_loader.py index d9c1d010..c5a4f52a 100644 --- a/py-src/data_formulator/data_loader/superset_data_loader.py +++ b/py-src/data_formulator/data_loader/superset_data_loader.py @@ -41,6 +41,8 @@ class SupersetLoader(ExternalDataLoader): "All Datasets" namespace at the root level. """ + DISPLAY_NAME = "Superset" + @staticmethod def list_params() -> list[dict[str, Any]]: return [ diff --git a/py-src/data_formulator/datalake/catalog_cache.py b/py-src/data_formulator/datalake/catalog_cache.py index 6e90c3f1..091f6929 100644 --- a/py-src/data_formulator/datalake/catalog_cache.py +++ b/py-src/data_formulator/datalake/catalog_cache.py @@ -22,6 +22,7 @@ import json import logging import os +import re from datetime import datetime, timezone from pathlib import Path from typing import Any @@ -34,6 +35,14 @@ CATALOG_CACHE_DIR = "catalog_cache" +class CatalogSearchError(ValueError): + """Raised when a catalog search receives a malformed query (e.g. bad regex). + + Agent tools should catch this and surface the message verbatim so the + model can correct its query, instead of returning an empty result set. + """ + + def _cache_dir(workspace_root: Path | str) -> Path: return Path(workspace_root) / CATALOG_CACHE_DIR @@ -103,7 +112,25 @@ def _load_catalog_raw(workspace_root: Path | str, source_id: str) -> dict[str, A def load_catalog(workspace_root: Path | str, source_id: str) -> list[dict[str, Any]] | None: - """Load cached catalog. Returns None if not found or corrupted.""" + """Load cached catalog. Returns None if not found or corrupted. + + In disabled-connectors mode, only admin source_ids (e.g. + ``sample_datasets``) are readable — user catalogs on disk are hidden. 
+ """ + try: + from flask import current_app + disabled = bool( + current_app.config.get('CLI_ARGS', {}).get('disable_data_connectors') + ) + except RuntimeError: + disabled = False + if disabled: + try: + from data_formulator.data_connector import _ADMIN_CONNECTOR_IDS + if source_id not in _ADMIN_CONNECTOR_IDS: + return None + except Exception: + pass raw = _load_catalog_raw(workspace_root, source_id) if raw is None: return None @@ -126,15 +153,55 @@ def delete_catalog(workspace_root: Path | str, source_id: str) -> None: def list_cached_sources(workspace_root: Path | str) -> list[str]: - """Return source IDs (sanitised stems) that have a cached catalog. + """Return the original source IDs that have a cached catalog. + + Each cache file stores the original (un-sanitised) ``source_id`` so that + ``mysql:mysql`` round-trips correctly even though its filename stem is + ``mysql--mysql``. We prefer that stored value here; consumers (agent + context, ``load_catalog``, ``delete_catalog``) all accept the original + id and re-apply ``safe_source_id`` internally when touching the disk. - The returned strings are filename-safe stems, usable as keys for - ``load_catalog`` / ``delete_catalog``. + Falls back to the filename stem if a cache file is missing or corrupt. + + When external connectors are disabled (browser-only / hosted mode), + only built-in admin source IDs (e.g. ``sample_datasets``) are + returned. This keeps the agent's data-discovery tools consistent with + the sidebar — previously-persisted user catalogs on disk stay there + but aren't surfaced. 
""" cache_dir = _cache_dir(workspace_root) if not cache_dir.exists(): return [] - return [p.stem for p in cache_dir.glob("*.json")] + sources: list[str] = [] + for path in cache_dir.glob("*.json"): + original: str | None = None + try: + with open(path, "r", encoding="utf-8") as f: + raw = json.load(f) + if isinstance(raw, dict): + value = raw.get("source_id") + if isinstance(value, str) and value: + original = value + except Exception: + logger.debug("Failed to read source_id from %s", path, exc_info=True) + sources.append(original or path.stem) + + # Filter to admin-only sources when external connectors are disabled. + try: + from flask import current_app + disabled = bool( + current_app.config.get('CLI_ARGS', {}).get('disable_data_connectors') + ) + except RuntimeError: + disabled = False + if disabled: + try: + from data_formulator.data_connector import _ADMIN_CONNECTOR_IDS + allowed = set(_ADMIN_CONNECTOR_IDS) + sources = [s for s in sources if s in allowed] + except Exception: + logger.debug("Failed to filter cached sources by admin set", exc_info=True) + return sources def _search_python( @@ -143,9 +210,30 @@ def _search_python( all_ids: list[str], exclude: set[str], limit_per_source: int, + *, + exclude_pattern: re.Pattern | None = None, + fields: set[str] | None = None, + path_prefix: list[str] | None = None, ) -> list[dict[str, Any]]: - """Python-based structured field search over the on-disk catalog cache.""" + """Structured field search over the on-disk catalog cache. + + ``needle`` is always a regex pattern (case-insensitive). Callers who + want literal substring matching should ``re.escape`` first. Invalid + patterns raise :class:`CatalogSearchError`. 
+ """ + match_fields = fields if fields is not None else {"name", "description", "columns"} + + try: + compiled = re.compile(needle, re.IGNORECASE) + except re.error as exc: + raise CatalogSearchError(f"Invalid query regex: {exc}") from exc + + def _matches(text: str) -> bool: + return bool(text) and compiled.search(text) is not None + results: list[dict[str, Any]] = [] + plen = len(path_prefix) if path_prefix else 0 + prefix = list(path_prefix or []) for sid in all_ids: raw = _load_catalog_raw(workspace_root, sid) @@ -161,36 +249,49 @@ def _search_python( if tname in exclude: continue + # Path-prefix filter + if plen: + tpath = t.get("path") or [] + if not isinstance(tpath, list) or len(tpath) < plen: + continue + if [str(s) for s in tpath[:plen]] != prefix: + continue + + # Exclude pattern (regex on name) + if exclude_pattern is not None and exclude_pattern.search(tname): + continue + score = 0 matched_cols: list[str] = [] match_reasons: list[str] = [] meta = t.get("metadata") or {} table_key = t.get("table_key", "") - if needle in tname.lower(): + if "name" in match_fields and _matches(tname): score += 10 match_reasons.append("table_name") # Source description src_desc = meta.get("description", "") - if src_desc and needle in src_desc.lower(): + if "description" in match_fields and src_desc and _matches(src_desc): score += 5 match_reasons.append("source_description") # Source columns - for col in meta.get("columns", []): - cname = col.get("name", "") - if needle in cname.lower(): - matched_cols.append(cname) - score += 2 - if "column_name" not in match_reasons: - match_reasons.append("column_name") - cdesc = col.get("description", "") - if cdesc and needle in cdesc.lower(): - matched_cols.append(cname) - score += 1 - if "source_column_description" not in match_reasons: - match_reasons.append("source_column_description") + if "columns" in match_fields: + for col in meta.get("columns", []): + cname = col.get("name", "") + if cname and _matches(cname): + 
matched_cols.append(cname) + score += 2 + if "column_name" not in match_reasons: + match_reasons.append("column_name") + cdesc = col.get("description", "") + if cdesc and _matches(cdesc): + matched_cols.append(cname) + score += 1 + if "source_column_description" not in match_reasons: + match_reasons.append("source_column_description") if score > 0: source_hits.append({ @@ -211,122 +312,227 @@ def _search_python( return results -def _search_duckdb( - workspace_root: Path | str, - needle: str, - all_ids: list[str], - exclude: set[str], - limit_per_source: int, -) -> list[dict[str, Any]]: - """DuckDB-based catalog cache search using read_json_auto + SQL.""" - import duckdb - - results: list[dict[str, Any]] = [] - like_pat = f"%{needle}%" - - for sid in all_ids: - path = _cache_file(workspace_root, sid) - if not path.exists(): - continue - - escaped = str(path).replace("'", "''") - conn = duckdb.connect(":memory:") - try: - # Flatten tables array from the JSON cache file - rows = conn.execute(f""" - WITH raw AS ( - SELECT unnest(tables) AS t - FROM read_json_auto('{escaped}', format='newline_delimited', - union_by_name=true, maximum_object_size=104857600) - ), - base AS ( - SELECT - t.name AS tname, - COALESCE(t.metadata.description, '') AS tdesc, - t.metadata.columns AS cols, - CASE WHEN lower(t.name) LIKE ? THEN 10 ELSE 0 END - + CASE WHEN COALESCE(t.metadata.description, '') != '' - AND lower(COALESCE(t.metadata.description, '')) LIKE ? 
- THEN 5 ELSE 0 END AS base_score - FROM raw - ) - SELECT tname, tdesc, cols, base_score - FROM base - WHERE tname NOT IN (SELECT unnest(?::VARCHAR[])) - ORDER BY base_score DESC - """, [like_pat, like_pat, list(exclude)]).fetchall() - - # Determine original source_id and build table_key lookup from raw data - raw = _load_catalog_raw(workspace_root, sid) - original_source_id = raw.get("source_id", sid) if raw else sid - tkey_lookup = { - t.get("name", ""): t.get("table_key", "") - for t in (raw.get("tables", []) if raw else []) - } - - source_hits: list[dict[str, Any]] = [] - for tname, tdesc, cols_raw, base_score in rows: - score = base_score - matched_cols: list[str] = [] - cols = cols_raw if isinstance(cols_raw, list) else [] - for col in cols: - if not isinstance(col, dict): - continue - cname = col.get("name", "") - if needle in cname.lower(): - matched_cols.append(cname) - score += 2 - cdesc = col.get("description", "") - if cdesc and needle in cdesc.lower(): - matched_cols.append(cname) - score += 1 - - if score > 0: - source_hits.append({ - "source_id": original_source_id, - "table_key": tkey_lookup.get(tname, ""), - "name": tname, - "description": tdesc, - "matched_columns": list(dict.fromkeys(matched_cols)), - "score": score, - }) - - source_hits.sort(key=lambda r: -r["score"]) - results.extend(source_hits[:limit_per_source]) - except Exception: - logger.debug("DuckDB search failed for source %s", sid, exc_info=True) - finally: - conn.close() - - results.sort(key=lambda r: -r["score"]) - return results - - def search_catalog_cache( workspace_root: Path | str, query: str, source_ids: list[str] | None = None, limit_per_source: int = 20, exclude_tables: set[str] | None = None, + *, + exclude_pattern: str | None = None, + fields: list[str] | None = None, + path_prefix: list[str] | None = None, ) -> list[dict[str, Any]]: - """Search across cached catalogs for tables matching a keyword. + """Search across cached catalogs for tables matching a regex pattern. 
+ + ``query`` is treated as a case-insensitive regex. Callers passing + user-typed keywords should ``re.escape`` the input first. Invalid + patterns raise :class:`CatalogSearchError`. Returns a flat list of match dicts with fields: ``source_id``, ``table_key``, ``name``, ``description``, ``matched_columns``, ``score``, ``match_reasons``, ``metadata_status``. - Prefers DuckDB for candidate retrieval. Falls back to pure Python - search if DuckDB is unavailable. + ``exclude_pattern``, ``fields``, and ``path_prefix`` further constrain + the search. """ - needle = (query or "").strip().lower() - if not needle: + needle_raw = (query or "").strip() + if not needle_raw: return [] exclude = exclude_tables or set() all_ids = source_ids or list_cached_sources(workspace_root) - try: - return _search_duckdb(workspace_root, needle, all_ids, exclude, limit_per_source) - except Exception: - logger.debug("DuckDB catalog search failed, falling back to Python", exc_info=True) - return _search_python(workspace_root, needle, all_ids, exclude, limit_per_source) + # Compile exclude pattern up-front so a bad pattern surfaces clearly. + excl_re = None + if exclude_pattern: + try: + excl_re = re.compile(exclude_pattern, re.IGNORECASE) + except re.error as exc: + raise CatalogSearchError(f"Invalid exclude regex: {exc}") from exc + + fields_set = set(fields) if fields else None + + return _search_python( + workspace_root, + needle_raw, + all_ids, + exclude, + limit_per_source, + exclude_pattern=excl_re, + fields=fields_set, + path_prefix=list(path_prefix or []), + ) + + +# --------------------------------------------------------------------------- +# Hierarchy navigation (used by the data loading agent's list_data tool) +# --------------------------------------------------------------------------- + +# Hard cap on entries returned in one list_path_children response. See +# design-docs/32-data-loading-agent-navigation.md §5. 
Truncation pushes the +# agent toward find_data or a tighter filter rather than pagination. +LIST_DATA_LIMIT = 200 + + +def list_sources_summary( + workspace_root: Path | str, +) -> list[dict[str, Any]]: + """Return a per-source summary suitable for ``list_data()`` with no args. + + Each entry: ``{source_id, table_count, is_hierarchical}``. Sources whose + cache file is missing or unreadable are skipped silently — the agent + treats the cache as ground truth (see design-docs §8). + """ + out: list[dict[str, Any]] = [] + for sid in list_cached_sources(workspace_root): + raw = _load_catalog_raw(workspace_root, sid) + if not raw: + continue + tables = raw.get("tables", []) or [] + is_hier = False + for t in tables: + p = t.get("path") + if isinstance(p, list) and len(p) >= 2: + is_hier = True + break + out.append({ + "source_id": raw.get("source_id", sid), + "table_count": len(tables), + "is_hierarchical": is_hier, + }) + out.sort(key=lambda r: r["source_id"]) + return out + + +def list_path_children( + workspace_root: Path | str, + source_id: str, + path: list[str] | None = None, + filter: str | None = None, + limit: int = LIST_DATA_LIMIT, +) -> dict[str, Any]: + """List direct children at a hierarchy level within a source's catalog. + + Path semantics: each cached table record has ``path: list[str]``. The + final element is the table's leaf name in the tree view; earlier elements + are folder segments. For a query at depth ``K = len(path)``: + + * **Folders** = distinct ``path[K]`` from records with ``len(path) >= K+2`` + whose first ``K`` segments equal the input path. + * **Tables** = records with ``len(path) == K+1`` whose first ``K`` segments + equal the input path. At depth 0 we additionally surface records with + empty path, using their ``name`` as the leaf. + + ``filter`` is a case-insensitive substring match on the immediate child + segment / table name (the *next* segment after the prefix), equivalent to + ``ls /**``. 
Not a regex — keep this primitive cheap. + + Returns ``{source_id, path, folders, tables, total_folders, total_tables, + truncated, hint?}``. Combined ``folders + tables`` are capped at ``limit`` + (folders take precedence to preserve drill-down). + """ + path = [str(p) for p in (path or [])] + K = len(path) + cap = max(1, min(int(limit or LIST_DATA_LIMIT), LIST_DATA_LIMIT)) + filt = (filter or "").strip().lower() or None + + raw = _load_catalog_raw(workspace_root, source_id) + if not raw: + return { + "source_id": source_id, + "path": path, + "folders": [], + "tables": [], + "total_folders": 0, + "total_tables": 0, + "truncated": False, + } + + original_sid = raw.get("source_id", source_id) + tables_raw = raw.get("tables", []) or [] + + folder_counts: dict[str, int] = {} + leaf_tables: list[dict[str, Any]] = [] + + for t in tables_raw: + tname = t.get("name", "") + tpath = t.get("path") or [] + if not isinstance(tpath, list): + tpath = [] + tpath = [str(s) for s in tpath] + plen = len(tpath) + + # Prefix must match exactly for K elements. + if plen < K: + continue + if tpath[:K] != path: + continue + + # Folder: at least one more segment after the prefix beyond the leaf. + if plen >= K + 2: + seg = tpath[K] + if filt and filt not in seg.lower(): + continue + folder_counts[seg] = folder_counts.get(seg, 0) + 1 + continue + + # Table at this level. + if plen == K + 1: + leaf = tpath[K] + elif plen == K and K == 0: + # Empty-path tables surface only at root. + leaf = tname + else: + continue + + if filt and filt not in leaf.lower(): + continue + + meta = t.get("metadata") or {} + desc = (meta.get("description") or "")[:120] + leaf_tables.append({ + "name": leaf, + "table_key": t.get("table_key", "") or "", + "description": desc, + }) + + # Sort folders by table_count desc then name; tables by name. 
+ folders = [ + {"name": name, "table_count": cnt} + for name, cnt in sorted( + folder_counts.items(), key=lambda kv: (-kv[1], kv[0]) + ) + ] + leaf_tables.sort(key=lambda r: r["name"]) + + total_folders = len(folders) + total_tables = len(leaf_tables) + total = total_folders + total_tables + truncated = total > cap + + # Combined cap: folders first (drill-down has higher value), then tables. + if total_folders >= cap: + folders = folders[:cap] + leaf_tables = [] + else: + leaf_tables = leaf_tables[: cap - total_folders] + + result: dict[str, Any] = { + "source_id": original_sid, + "path": path, + "folders": folders, + "tables": leaf_tables, + "total_folders": total_folders, + "total_tables": total_tables, + "truncated": truncated, + } + if truncated: + remaining = total - len(folders) - len(leaf_tables) + result["hint"] = ( + f"{remaining} more entries not shown. Use list_path_children(filter=...) " + f"to narrow, or find_data(query=..., scope='{original_sid}" + + (":" + "/".join(path) if path else "") + + "') to search this subtree." + ) + return result diff --git a/py-src/data_formulator/datalake/parquet_utils.py b/py-src/data_formulator/datalake/parquet_utils.py index cd5be208..3049b067 100644 --- a/py-src/data_formulator/datalake/parquet_utils.py +++ b/py-src/data_formulator/datalake/parquet_utils.py @@ -10,6 +10,7 @@ """ import hashlib +import json import logging import re from pathlib import Path @@ -89,6 +90,22 @@ def get_sample_rows_from_arrow( return make_json_safe(sample.to_pylist()) +def df_to_safe_records(df: pd.DataFrame) -> list[dict[str, Any]]: + """Convert a pandas DataFrame to a list of JSON-safe record dicts. + + Uses ``date_format='iso'`` so that datetime columns are serialized as + ISO-8601 strings instead of epoch milliseconds. ``default_handler=str`` + provides a safety net for exotic types (Decimal, bytes, etc.). 
+ + All code that converts a DataFrame to records for API responses or + streaming should call this function rather than using ``df.to_json`` + / ``df.to_dict`` directly. + """ + return json.loads( + df.to_json(orient="records", date_format="iso", default_handler=str) + ) + + def normalize_dtype_to_app_type(dtype_str: str) -> str: """Map a pandas/Arrow dtype string to a standardized App Type label. diff --git a/py-src/data_formulator/datalake/workspace_manager.py b/py-src/data_formulator/datalake/workspace_manager.py index 7f3712b0..679452ca 100644 --- a/py-src/data_formulator/datalake/workspace_manager.py +++ b/py-src/data_formulator/datalake/workspace_manager.py @@ -408,21 +408,40 @@ def rename_workspace(self, old_id: str, new_id: str) -> Path: return new_dir def update_display_name(self, workspace_id: str, display_name: str) -> None: - """Update only the displayName in workspace_meta.json (no full state write).""" + """Update the displayName in workspace_meta.json and session_state.json. + + Write-through: both files are updated so they stay consistent + even when the workspace is not currently open in the frontend. 
+ """ safe = self._safe_id(workspace_id) meta_file = self._root / safe / WORKSPACE_META_FILENAME if not meta_file.exists(): self._write_meta(workspace_id, display_name) - return - try: - meta = json.loads(meta_file.read_text(encoding="utf-8")) - except Exception: - meta = {} - meta["displayName"] = display_name - meta["updatedAt"] = datetime.now(tz=timezone.utc).isoformat() - meta_file.write_text( - json.dumps(meta, ensure_ascii=False), encoding="utf-8", - ) + else: + try: + meta = json.loads(meta_file.read_text(encoding="utf-8")) + except Exception: + meta = {} + meta["displayName"] = display_name + meta["updatedAt"] = datetime.now(tz=timezone.utc).isoformat() + meta_file.write_text( + json.dumps(meta, ensure_ascii=False), encoding="utf-8", + ) + + # Write-through: patch session_state.json if it exists + state_file = self._root / safe / SESSION_STATE_FILENAME + if state_file.exists(): + try: + state = json.loads(state_file.read_text(encoding="utf-8")) + aw = state.get("activeWorkspace") + if isinstance(aw, dict): + aw["displayName"] = display_name + state_file.write_text( + json.dumps(state, default=str, ensure_ascii=False), + encoding="utf-8", + ) + except Exception: + pass # best-effort; next auto-save will sync # ── Session state persistence ──────────────────────────────────── diff --git a/py-src/data_formulator/knowledge/store.py b/py-src/data_formulator/knowledge/store.py index ecdfbd3e..0b290093 100644 --- a/py-src/data_formulator/knowledge/store.py +++ b/py-src/data_formulator/knowledge/store.py @@ -31,7 +31,7 @@ _MAX_DEPTH = { "rules": 1, - "experiences": 1, # flat: only "file.md" (no sub-directories) + "experiences": 2, # one sub-dir: "category/file.md" } KNOWLEDGE_LIMITS: dict[str, int] = { diff --git a/py-src/data_formulator/model_registry.py b/py-src/data_formulator/model_registry.py index c2611670..5b901d4c 100644 --- a/py-src/data_formulator/model_registry.py +++ b/py-src/data_formulator/model_registry.py @@ -7,20 +7,6 @@ BUILTIN_PROVIDERS = 
{'openai', 'azure', 'anthropic', 'gemini', 'ollama'} -def is_likely_text_only_model(model_name: str | None) -> bool: - """Return True for known model names that reject image input.""" - return "deepseek-chat" in (model_name or "").lower() - - -def model_supports_vision(model_config: dict | None) -> bool: - """Infer whether a model can receive image input.""" - if not model_config: - return False - if model_config.get("supports_vision") is False: - return False - return not is_likely_text_only_model(model_config.get("model")) - - class ModelRegistry: """ Load global model configurations from environment variables. @@ -92,7 +78,6 @@ def _reload(self) -> None: "api_base": api_base, "api_version": api_version, "provider_display": provider, - "supports_vision": not is_likely_text_only_model(model_name), } def get_config(self, model_id: str) -> Optional[dict]: @@ -111,7 +96,6 @@ def list_public(self) -> list: "model": m["model"], "api_base": m["api_base"], "api_version": m["api_version"], - "supports_vision": m.get("supports_vision", True), "is_global": True, } for m in self._models.values() diff --git a/py-src/data_formulator/prompts/__init__.py b/py-src/data_formulator/prompts/__init__.py deleted file mode 100644 index 59e481eb..00000000 --- a/py-src/data_formulator/prompts/__init__.py +++ /dev/null @@ -1,2 +0,0 @@ -# Copyright (c) Microsoft Corporation. -# Licensed under the MIT License. diff --git a/py-src/data_formulator/prompts/chart_creation_guide.py b/py-src/data_formulator/prompts/chart_creation_guide.py deleted file mode 100644 index 2a201f8d..00000000 --- a/py-src/data_formulator/prompts/chart_creation_guide.py +++ /dev/null @@ -1,49 +0,0 @@ -# Copyright (c) Microsoft Corporation. -# Licensed under the MIT License. - -"""Chart creation guide injected lazily on the first ``visualize`` tool call. 
- -This module consolidates the code-execution rules, chart type reference, -semantic type reference, and statistical analysis guidance that the data -agent needs when writing ``visualize`` tool calls. The content is -extracted from the existing ``DataRecAgent`` / ``DataTransformationAgent`` -prompts and de-duplicated. -""" - -from data_formulator.agents.agent_data_rec import ( - SHARED_CHART_REFERENCE, - SHARED_DUCKDB_NOTES, - SHARED_ENVIRONMENT, - SHARED_SEMANTIC_TYPE_REFERENCE, - SHARED_STATISTICAL_ANALYSIS, -) - -CHART_CREATION_GUIDE = f"""\ -## Chart Creation Guide - -The following reference material applies when you call the `visualize` tool. - -### A. Code Execution Rules - -{SHARED_ENVIRONMENT} - -{SHARED_DUCKDB_NOTES} - -**Datetime handling:** -- `date` columns contain date-only values (YYYY-MM-DD). `datetime` columns contain date+time (ISO 8601). -- `time` columns contain time-only values (HH:mm:ss). `duration` columns are time intervals. -- Year → number. Year-month / year-month-day → string ("2020-01" / "2020-01-01"). -- Hour alone → number. Hour:min or h:m:s → string. Never return raw datetime objects. - -### B. Chart Type Reference - -{SHARED_CHART_REFERENCE} - -### C. Semantic Type Reference - -{SHARED_SEMANTIC_TYPE_REFERENCE} - -### D. 
Statistical Analysis Guide - -{SHARED_STATISTICAL_ANALYSIS} -""" diff --git a/py-src/data_formulator/routes/agents.py b/py-src/data_formulator/routes/agents.py index 41951193..cdbf8808 100644 --- a/py-src/data_formulator/routes/agents.py +++ b/py-src/data_formulator/routes/agents.py @@ -25,17 +25,17 @@ from data_formulator.agents.agent_simple import SimpleAgents from data_formulator.auth.identity import get_identity_id from data_formulator.security.code_signing import sign_result, verify_code, MAX_CODE_SIZE +from data_formulator.datalake.parquet_utils import df_to_safe_records from data_formulator.datalake.workspace import Workspace, get_user_home from data_formulator.workspace_factory import get_workspace from data_formulator.agents.agent_data_load import DataLoadAgent -from data_formulator.agents.agent_data_clean_stream import DataCleanAgentStream from data_formulator.agents.agent_data_loading_chat import DataLoadingAgent from data_formulator.agents.agent_code_explanation import CodeExplanationAgent from data_formulator.agents.agent_chart_insight import ChartInsightAgent from data_formulator.agents.agent_interactive_explore import InteractiveExploreAgent from data_formulator.agents.agent_report_gen import ReportGenAgent from data_formulator.agents.client_utils import Client -from data_formulator.model_registry import model_registry, model_supports_vision +from data_formulator.model_registry import model_registry from data_formulator.knowledge.store import KnowledgeStore from data_formulator.agents.data_agent import DataAgent @@ -114,15 +114,6 @@ def _with_warnings(gen): yield w -def _messages_include_image(messages: list[dict]) -> bool: - """Return True when a chat payload contains user image attachments.""" - for msg in messages: - for att in msg.get("attachments") or []: - if att.get("type") == "image" and att.get("url"): - return True - return False - - @agent_bp.after_request def _set_cors(response): """Set CORS headers from server configuration. 
@@ -305,63 +296,6 @@ def process_data_on_load_request(): raise classify_and_wrap_llm_error(e) from e -@agent_bp.route('/clean-data-stream', methods=['GET', 'POST']) -def clean_data_stream_request(): - from data_formulator.error_handler import stream_error_event - - if not request.is_json: - return stream_preflight_error(AppError(ErrorCode.INVALID_REQUEST, "Invalid request format")) - - content = request.get_json() - client = get_client(content['model']) - - logger.info("# clean-data-stream request") - logger.debug(f" model: {content['model']}") - - language_instruction = get_language_instruction() - prompt = content.get('prompt', '') - artifacts = content.get('artifacts', []) - dialog = content.get('dialog', []) - - def generate(): - agent = DataCleanAgentStream(client=client, language_instruction=language_instruction) - try: - for chunk in agent.stream(prompt, artifacts, dialog): - stripped = chunk.strip() - if stripped.startswith("{"): - try: - result = json.loads(stripped) - except (json.JSONDecodeError, ValueError): - result = None - if isinstance(result, dict): - if result.get("status") == "ok": - yield json.dumps({ - "type": "result", - "data": result, - }, ensure_ascii=False) + "\n" - else: - yield stream_error_event(AppError( - ErrorCode.AGENT_ERROR, - sanitize_error_message(result.get("content", "Unable to extract tables")), - )) - continue - yield json.dumps({"type": "text_delta", "text": chunk}, ensure_ascii=False) + "\n" - except Exception as e: - logger.error("clean-data-stream error", exc_info=e) - if 'unable to download html from url' in str(e): - yield stream_error_event(AppError( - ErrorCode.DATA_LOAD_ERROR, - "This website doesn't allow us to download HTML from URL", - )) - else: - yield stream_error_event(classify_and_wrap_llm_error(e)) - - return Response( - stream_with_context(_with_warnings(generate())), - mimetype='application/x-ndjson', - ) - - @agent_bp.route('/sort-data', methods=['GET', 'POST']) def sort_data_request(): if not 
request.is_json: @@ -738,12 +672,6 @@ def request_chart_insight(): if not model_config: raise AppError(ErrorCode.INVALID_REQUEST, "Model configuration is required") - if not model_supports_vision(model_config): - raise AppError( - ErrorCode.VALIDATION_ERROR, - "The selected model does not support image input. Please switch to a vision-capable model.", - ) - client = get_client(model_config) identity_id = get_identity_id() workspace = get_workspace(identity_id) @@ -1013,10 +941,10 @@ def refresh_derived_data(): else: display_df = result_df display_df = display_df.loc[:, ~display_df.columns.duplicated()] - response_data["rows"] = json.loads(display_df.to_json(orient='records', date_format='iso')) + response_data["rows"] = df_to_safe_records(display_df) else: result_df = result_df.loc[:, ~result_df.columns.duplicated()] - response_data["rows"] = json.loads(result_df.to_json(orient='records', date_format='iso')) + response_data["rows"] = df_to_safe_records(result_df) return json_ok(response_data) else: @@ -1198,7 +1126,7 @@ def chart_restyle(): client = get_client(model_config) try: - agent = ChartRestyleAgent(client=client, language_instruction=get_language_instruction()) + agent = ChartRestyleAgent(client=client, language_instruction=get_language_instruction(mode="compact")) result = agent.run( vl_spec=vl_spec, instruction=instruction, @@ -1278,7 +1206,7 @@ def scratch_serve(filename): # --------------------------------------------------------------------------- -# Conversational data loading agent (replaces old clean-data-stream) +# Conversational data loading agent # --------------------------------------------------------------------------- @agent_bp.route('/data-loading-chat', methods=['POST']) @@ -1296,9 +1224,6 @@ def data_loading_chat(): logger.info("# data-loading-chat request") messages = content.get("messages", []) - if _messages_include_image(messages) and not model_supports_vision(content.get("model")): - return 
stream_preflight_error(AppError(ErrorCode.INVALID_REQUEST, "The selected model does not support image input. Please switch to a vision-capable model or remove the image.")) - client = get_client(content['model']) identity_id = get_identity_id() workspace = get_workspace(identity_id) diff --git a/py-src/data_formulator/routes/sessions.py b/py-src/data_formulator/routes/sessions.py index 4471db77..068fefe3 100644 --- a/py-src/data_formulator/routes/sessions.py +++ b/py-src/data_formulator/routes/sessions.py @@ -260,14 +260,31 @@ def update_workspace_meta(): @session_bp.route("/export", methods=["POST"]) def export_session(): - """Export the active workspace as a zip.""" + """Export a workspace as a zip. + + Body: ``{ "state": {...}, "workspace_id": "session_..." }`` + + ``workspace_id`` identifies which workspace's files to package. + This avoids the need for an ``X-Workspace-Id`` header, allowing + export from the landing page where no workspace is active. + """ + if _is_ephemeral(): + raise AppError(ErrorCode.INVALID_REQUEST, "Export is handled client-side in ephemeral mode") + data = request.get_json(force=True) state: dict = data.get("state") + workspace_id: str = (data.get("workspace_id") or "").strip() if state is None: raise AppError(ErrorCode.INVALID_REQUEST, "State payload is required") + if not workspace_id: + raise AppError(ErrorCode.INVALID_REQUEST, "workspace_id is required") identity_id = get_identity_id() - ws = get_workspace(identity_id) + mgr = get_workspace_manager(identity_id) + if not mgr.workspace_exists(workspace_id): + raise AppError(ErrorCode.TABLE_NOT_FOUND, f"Workspace '{workspace_id}' not found") + + ws = mgr.open_workspace(workspace_id, identity_id) from data_formulator.datalake.workspace_manager import _strip_sensitive clean_state = _strip_sensitive(state) @@ -279,14 +296,34 @@ def export_session(): @session_bp.route("/import", methods=["POST"]) def import_session(): - """Import a workspace from a zip.""" + """Import a workspace from a 
zip. + + The optional ``workspace_id`` form field specifies the target + workspace. If the workspace doesn't exist yet it is created + automatically, so callers can generate a fresh ID client-side. + When omitted, falls back to the ``X-Workspace-Id`` header. + """ + if _is_ephemeral(): + return json_ok({"state": {}, "message": "Import handled client-side in ephemeral mode"}) + if "file" not in request.files: raise AppError(ErrorCode.INVALID_REQUEST, "No file uploaded") file = request.files["file"] + workspace_id = (request.form.get("workspace_id") or "").strip() + try: identity_id = get_identity_id() - ws = get_workspace(identity_id) + mgr = get_workspace_manager(identity_id) + + if workspace_id: + if mgr.workspace_exists(workspace_id): + ws = mgr.open_workspace(workspace_id, identity_id) + else: + ws = mgr.create_and_open_workspace(workspace_id, identity_id) + else: + ws = get_workspace(identity_id) + state = ws.import_session_zip(io.BytesIO(file.read())) return json_ok({"state": state}) except ValueError: diff --git a/py-src/data_formulator/routes/tables.py b/py-src/data_formulator/routes/tables.py index 1b61e0db..04e3baed 100644 --- a/py-src/data_formulator/routes/tables.py +++ b/py-src/data_formulator/routes/tables.py @@ -18,7 +18,7 @@ from data_formulator.auth.identity import get_identity_id from data_formulator.datalake.workspace import Workspace from data_formulator.workspace_factory import get_workspace as _create_workspace -from data_formulator.datalake.parquet_utils import sanitize_table_name as parquet_sanitize_table_name, safe_data_filename, normalize_dtype_to_app_type +from data_formulator.datalake.parquet_utils import sanitize_table_name as parquet_sanitize_table_name, safe_data_filename, normalize_dtype_to_app_type, df_to_safe_records from data_formulator.datalake.file_manager import save_uploaded_file, is_supported_file, get_file_type, normalize_text_encoding from data_formulator.datalake.workspace_metadata import TableMetadata as 
DatalakeTableMetadata, ColumnInfo import re @@ -59,6 +59,205 @@ def _quote_duckdb(col: str) -> str: return '"' + str(col).replace('"', '""') + '"' +def _quote_lit(value) -> str: + """Quote a literal for safe DuckDB SQL interpolation. + + Supports str/int/float/bool/None and pandas/datetime values via ``str()``. + Numeric values are emitted as-is; strings are single-quoted with internal + quotes doubled. Used by the column-filter WHERE builder (design-doc 31). + """ + if value is None: + return "NULL" + if isinstance(value, bool): + return "TRUE" if value else "FALSE" + if isinstance(value, (int, float)): + # Avoid NaN/Inf landing in SQL. + try: + import math + if isinstance(value, float) and (math.isnan(value) or math.isinf(value)): + return "NULL" + except Exception: + pass + return repr(value) + s = str(value).replace("'", "''") + return f"'{s}'" + + +def _column_type_map(columns_info: list[dict]) -> dict[str, str]: + """``get_parquet_schema`` columns → ``{name: TYPE_UPPER}`` lookup.""" + return {c["name"]: str(c.get("type", "")).upper() for c in columns_info} + + +def _extend_eod_if_timestamp(value, col_type_upper: str): + """Promote a bare ``YYYY-MM-DD`` upper bound to end-of-day for timestamp + columns so a date-only ``<=`` filter is naturally inclusive of that day. + """ + if not isinstance(value, str): + return value + if "TIMESTAMP" not in col_type_upper: + return value + if len(value) == 10 and value[4] == "-" and value[7] == "-": + return value + " 23:59:59.999" + return value + + +def _build_filter_where_duckdb( + filters: list | None, + columns: list[str], + column_types: dict[str, str] | None = None, + *, + alias: str = "", +) -> str: + """Build a DuckDB ``WHERE`` clause from the three-op filter vocabulary + (design-doc 31): ``range`` / ``in`` / ``contains``. Returns either an + empty string or ``" WHERE "`` ready to splice into a query. 
+ + Unknown ops, filters on missing columns, and malformed entries are + silently dropped — the route must continue to work even if the + frontend sends stale state. An empty ``in`` list collapses to + ``WHERE FALSE`` for the field so the user sees an empty grid. + """ + if not filters: + return "" + column_types = column_types or {} + prefix = (alias + ".") if alias else "" + clauses: list[str] = [] + for f in filters: + if not isinstance(f, dict): + continue + op = f.get("op") + field = f.get("field") + if not field or field not in columns: + continue + qc = prefix + _quote_duckdb(field) + col_type_upper = column_types.get(field, "") + + if op == "range": + lo = f.get("min") + hi = f.get("max") + include_nulls = bool(f.get("include_nulls")) + parts = [] + if lo is not None: + parts.append(f"{qc} >= {_quote_lit(lo)}") + if hi is not None: + hi = _extend_eod_if_timestamp(hi, col_type_upper) + parts.append(f"{qc} <= {_quote_lit(hi)}") + if not parts: + # No bounds → no constraint, unless caller asks to only show nulls. + if include_nulls: + clauses.append(f"{qc} IS NULL") + continue + bounds_clause = "(" + " AND ".join(parts) + ")" + if include_nulls: + clauses.append(f"({bounds_clause} OR {qc} IS NULL)") + else: + clauses.append(bounds_clause) + + elif op == "in": + values = f.get("values") + if not isinstance(values, list): + continue + if not values: + # Empty checklist → user deselected everything. + clauses.append("FALSE") + continue + has_null = any(v is None for v in values) + non_null = [v for v in values if v is not None] + parts = [] + if non_null: + parts.append( + f"{qc} IN (" + ", ".join(_quote_lit(v) for v in non_null) + ")" + ) + if has_null: + parts.append(f"{qc} IS NULL") + if parts: + clauses.append("(" + " OR ".join(parts) + ")") + + elif op == "contains": + v = f.get("value") + if not isinstance(v, str) or v == "": + continue + # ILIKE escape: %, _, and \ in the user pattern get backslash-escaped. 
+ esc = v.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") + pattern = f"%{esc}%" + clauses.append( + f"CAST({qc} AS VARCHAR) ILIKE {_quote_lit(pattern)} ESCAPE '\\'" + ) + + if not clauses: + return "" + return " WHERE " + " AND ".join(clauses) + + +def _apply_filters_pandas(df: pd.DataFrame, filters: list | None) -> pd.DataFrame: + """Pandas mirror of :func:`_build_filter_where_duckdb`.""" + if not filters: + return df + import pandas as _pd # alias to avoid shadowing + columns = set(df.columns) + mask = _pd.Series([True] * len(df), index=df.index) + for f in filters: + if not isinstance(f, dict): + continue + op = f.get("op") + field = f.get("field") + if not field or field not in columns: + continue + col = df[field] + if op == "range": + lo = f.get("min") + hi = f.get("max") + include_nulls = bool(f.get("include_nulls")) + sub = _pd.Series([True] * len(df), index=df.index) + applied_bound = False + if lo is not None: + try: + sub &= (col >= lo) + applied_bound = True + except Exception: + pass + if hi is not None: + try: + if _pd.api.types.is_datetime64_any_dtype(col): + hi_val = _extend_eod_if_timestamp(hi, "TIMESTAMP") + else: + hi_val = hi + sub &= (col <= hi_val) + applied_bound = True + except Exception: + pass + if not applied_bound: + if include_nulls: + mask &= col.isna() + continue + if include_nulls: + mask &= (sub & col.notna()) | col.isna() + else: + mask &= sub & col.notna() + elif op == "in": + values = f.get("values") + if not isinstance(values, list): + continue + if not values: + mask &= False + continue + has_null = any(v is None for v in values) + non_null = [v for v in values if v is not None] + sub = col.isin(non_null) if non_null else _pd.Series([False] * len(df), index=df.index) + if has_null: + sub = sub | col.isna() + mask &= sub + elif op == "contains": + v = f.get("value") + if not isinstance(v, str) or v == "": + continue + try: + mask &= col.astype(str).str.contains(v, case=False, na=False, regex=False) + except 
Exception: + pass + return df[mask] + + def _dedup_dataframe_columns(df: pd.DataFrame) -> pd.DataFrame: """Remove duplicate columns from a DataFrame, keeping the first occurrence.""" if df.columns.duplicated().any(): @@ -79,14 +278,22 @@ def _build_parquet_sample_sql( order_by_fields: list, sample_size: int, offset: int = 0, + filters: list | None = None, + column_types: dict[str, str] | None = None, ) -> tuple[str, str]: """ Build DuckDB SQL for sampling (and optional aggregation) over parquet. Returns (main_sql, count_sql) where each contains {parquet} placeholder. + + When ``filters`` is provided (design-doc 31), a WHERE clause is added on + the base table — pre-aggregation in the aggregate branch and + pre-``ROW_NUMBER()`` in the non-aggregate branch so the surfaced + ``#rowId`` is contiguous over the filtered slice. """ valid_agg = [(f, fn) for (f, fn) in aggregate_fields_and_functions if f is None or f in columns] valid_select = _dedup_list([f for f in select_fields if f in columns]) valid_order = [f for f in order_by_fields if f in columns] + where_clause = _build_filter_where_duckdb(filters, columns, column_types, alias="t") if valid_agg: select_parts = [] @@ -110,7 +317,7 @@ def _build_parquet_sample_sql( select_parts.append(f"t.{_quote_duckdb(f)}") group_cols = valid_select group_by = f" GROUP BY {', '.join('t.' 
+ _quote_duckdb(c) for c in group_cols)}" if group_cols else "" - inner = f"SELECT {', '.join(select_parts)} FROM {{parquet}} AS t{group_by}" + inner = f"SELECT {', '.join(select_parts)} FROM {{parquet}} AS t{where_clause}{group_by}" count_sql = f"SELECT COUNT(*) FROM ({inner}) AS sub" if method == "random": order_by = " ORDER BY RANDOM()" @@ -124,10 +331,11 @@ def _build_parquet_sample_sql( main_sql = f"SELECT * FROM ({inner}) AS sub{order_by} LIMIT {sample_size}{offset_clause}" return main_sql, count_sql - count_sql = "SELECT COUNT(*) FROM {parquet} AS t" + count_sql = f"SELECT COUNT(*) FROM {{parquet}} AS t{where_clause}" # Wrap the base table with a ROW_NUMBER() so the original row position - # is preserved even after sorting / sampling. - base = "(SELECT ROW_NUMBER() OVER () AS \"#rowId\", t.* FROM {parquet} AS t) AS t" + # is preserved even after sorting / sampling. When filters are active, + # ROW_NUMBER() runs over the filtered set so #rowId is contiguous. + base = f"(SELECT ROW_NUMBER() OVER () AS \"#rowId\", t.* FROM {{parquet}} AS t{where_clause}) AS t" if method == "random": order_by = " ORDER BY RANDOM()" elif method == "head" and valid_order: @@ -241,7 +449,7 @@ def list_tables(): df = workspace.read_data_as_df(table_name) df = df.head(1000) df = _dedup_dataframe_columns(df) - sample_rows = json.loads(df.to_json(orient='records', date_format='iso')) + sample_rows = df_to_safe_records(df) except Exception as e: logger.warning("Could not read sample rows for %s", table_name, exc_info=e) source_metadata = _table_metadata_to_source_metadata(meta) @@ -275,11 +483,14 @@ def _apply_aggregation_and_sample( order_by_fields: list, sample_size: int, offset: int = 0, + filters: list | None = None, ) -> tuple[pd.DataFrame, int]: """ - Apply aggregation (optional), then sample with ordering. + Apply filters (optional), aggregation (optional), then sample with ordering. Returns (sampled_df, total_row_count_after_aggregation). 
""" + if filters: + df = _apply_filters_pandas(df, filters) columns = list(df.columns) valid_agg = [ (f, fn) for (f, fn) in aggregate_fields_and_functions @@ -338,6 +549,34 @@ def _apply_aggregation_and_sample( return work, total_row_count +# Cardinality threshold used by the column-filter popover (design-doc 31): +# columns with distinct_count <= this get a checklist filter; above get keyword. +_COLUMN_STATS_LEVELS_LIMIT = 100 + + +def _fetch_column_levels_duckdb(workspace, table_name: str, column: str) -> tuple[list, list[int]]: + """Top-N value/count pairs (count-desc) for a single low-card column.""" + qc = _quote_duckdb(column) + sql = ( + f"SELECT {qc} AS value, COUNT(*) AS count " + f"FROM {{parquet}} AS t WHERE {qc} IS NOT NULL " + f"GROUP BY {qc} ORDER BY count DESC, value ASC " + f"LIMIT {_COLUMN_STATS_LEVELS_LIMIT}" + ) + df = workspace.run_parquet_sql(table_name, sql) + levels = df["value"].tolist() + level_counts = [int(c) for c in df["count"].tolist()] + return levels, level_counts + + +def _safe_levels(levels: list) -> list: + """Run levels (which may contain pandas/numpy scalars) through df_to_safe_records.""" + if not levels: + return [] + tmp = pd.DataFrame({"value": levels}) + return [r["value"] for r in df_to_safe_records(tmp)] + + @tables_bp.route('/sample-table', methods=['POST']) def sample_table(): """Sample a table from the workspace. 
Uses DuckDB for parquet (no full load).""" @@ -350,11 +589,13 @@ def sample_table(): method = data.get('method', 'random') order_by_fields = data.get('order_by_fields', []) offset = data.get('offset', 0) + filters = data.get('filters') or None workspace = _get_workspace() if _should_use_duckdb(workspace, table_id): schema_info = workspace.get_parquet_schema(table_id) columns = [c["name"] for c in schema_info.get("columns", [])] + column_types = _column_type_map(schema_info.get("columns", [])) main_sql, count_sql = _build_parquet_sample_sql( columns, aggregate_fields_and_functions, @@ -363,6 +604,8 @@ def sample_table(): order_by_fields, sample_size, offset, + filters=filters, + column_types=column_types, ) total_row_count = int(workspace.run_parquet_sql(table_id, count_sql).iloc[0, 0]) result_df = workspace.run_parquet_sql(table_id, main_sql) @@ -376,9 +619,10 @@ def sample_table(): order_by_fields, sample_size, offset, + filters=filters, ) result_df = _dedup_dataframe_columns(result_df) - rows_json = json.loads(result_df.to_json(orient='records', date_format='iso')) + rows_json = df_to_safe_records(result_df) return json_ok({ "rows": rows_json, "total_row_count": total_row_count, @@ -408,14 +652,14 @@ def get_table_data(): ) page_df = _dedup_dataframe_columns(page_df) columns = list(page_df.columns) - rows = json.loads(page_df.to_json(orient='records', date_format='iso')) + rows = df_to_safe_records(page_df) else: df = workspace.read_data_as_df(table_name) df = _dedup_dataframe_columns(df) total_rows = len(df) columns = list(df.columns) page_df = df.iloc[offset : offset + page_size] - rows = json.loads(page_df.to_json(orient='records', date_format='iso')) + rows = df_to_safe_records(page_df) return json_ok({ "table_name": table_name, @@ -617,7 +861,7 @@ def parse_file(): for sheet_name in xls.sheet_names: df = xls.parse(sheet_name) df = df.where(df.notna(), None) - records = df.to_dict(orient='records') + records = df_to_safe_records(df) sheets.append({ 
"sheet_name": sheet_name, "columns": list(df.columns), @@ -629,7 +873,7 @@ def parse_file(): raw = normalize_text_encoding(file.stream.read(), 'csv') df = pd.read_csv(io.BytesIO(raw)) df = df.where(df.notna(), None) - records = df.to_dict(orient='records') + records = df_to_safe_records(df) return json_ok({ "sheets": [{ "sheet_name": "Sheet1", @@ -847,7 +1091,13 @@ def _is_numeric_duckdb_type(col_type: str) -> bool: @tables_bp.route('/analyze', methods=['POST']) def analyze_table(): - """Get basic statistics about a table in the workspace. Uses DuckDB for parquet (no full load).""" + """Get basic statistics about a table in the workspace. Uses DuckDB for parquet (no full load). + + For low-cardinality columns (``unique_count <= _COLUMN_STATS_LEVELS_LIMIT``) + also returns ``levels`` and parallel ``level_counts`` arrays so the data- + grid column filter popover (design-doc 31) can render a checklist + synchronously without a follow-up fetch. + """ try: data = request.get_json() table_name = data.get('table_name') @@ -892,6 +1142,20 @@ def analyze_table(): "unique_count": int(row["unique_count"]), "null_count": int(row["null_count"]), } + # Low-cardinality value list for the filter popover checklist. 
+ uc = stats_dict["unique_count"] + if 0 < uc <= _COLUMN_STATS_LEVELS_LIMIT: + try: + levels, level_counts = _fetch_column_levels_duckdb( + workspace, table_name, col_name + ) + stats_dict["levels"] = _safe_levels(levels) + stats_dict["level_counts"] = level_counts + except Exception as e: + logger.warning( + "analyze: levels pass failed for %s.%s", + table_name, col_name, exc_info=e, + ) stats.append({"column": col_name, "type": col_type, "statistics": stats_dict}) else: df = workspace.read_data_as_df(table_name) @@ -899,15 +1163,26 @@ def analyze_table(): for col_name in df.columns: s = df[col_name] col_type = str(s.dtype) + unique_count = int(s.nunique()) stats_dict = { "count": int(s.count()), - "unique_count": int(s.nunique()), + "unique_count": unique_count, "null_count": int(s.isna().sum()), } if pd.api.types.is_numeric_dtype(s): stats_dict["min"] = float(s.min()) if s.notna().any() else None stats_dict["max"] = float(s.max()) if s.notna().any() else None stats_dict["avg"] = float(s.mean()) if s.notna().any() else None + if 0 < unique_count <= _COLUMN_STATS_LEVELS_LIMIT: + try: + vc = s.dropna().value_counts().head(_COLUMN_STATS_LEVELS_LIMIT) + stats_dict["levels"] = _safe_levels(vc.index.tolist()) + stats_dict["level_counts"] = [int(v) for v in vc.values.tolist()] + except Exception as e: + logger.warning( + "analyze: pandas levels pass failed for %s.%s", + table_name, col_name, exc_info=e, + ) stats.append({"column": col_name, "type": col_type, "statistics": stats_dict}) return json_ok({"table_name": table_name, "statistics": stats}) diff --git a/py-src/data_formulator/workflows/create_vl_plots.py b/py-src/data_formulator/workflows/create_vl_plots.py index 5e352d43..a8ac39a8 100644 --- a/py-src/data_formulator/workflows/create_vl_plots.py +++ b/py-src/data_formulator/workflows/create_vl_plots.py @@ -136,11 +136,6 @@ def detect_field_type(series: pd.Series) -> str: "mark": "line", "channels": ["x", "y", "color", "strokeDash", "opacity", "detail", "column", 
"row"] }, - { - "chart": "Dotted Line Chart", - "mark": {"type": "line", "point": True}, - "channels": ["x", "y", "color", "detail", "column", "row"] - }, { "chart": "Bar Chart", "mark": "bar", @@ -1895,11 +1890,14 @@ def _ensure_mark_obj(s): _ensure_mark_obj(spec) spec["mark"]["cornerRadius"] = int(corner_radius) - elif chart_type in ("Line Chart", "Dotted Line Chart"): + elif chart_type == "Line Chart": interpolate = config.get("interpolate") if interpolate: _ensure_mark_obj(spec) spec["mark"]["interpolate"] = interpolate + if config.get("showPoints"): + _ensure_mark_obj(spec) + spec["mark"]["point"] = True elif chart_type == "Area Chart": interpolate = config.get("interpolate") diff --git a/pyproject.toml b/pyproject.toml index 381a1344..780e8f98 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" [project] name = "data_formulator" -version = "0.7.0a2" +version = "0.7.0" requires-python = ">=3.11" authors = [ diff --git a/src/app/App.tsx b/src/app/App.tsx index 5fc5acd5..17898f0c 100644 --- a/src/app/App.tsx +++ b/src/app/App.tsx @@ -109,6 +109,7 @@ import InfoOutlinedIcon from '@mui/icons-material/InfoOutlined'; import YouTubeIcon from '@mui/icons-material/YouTube'; import PublicIcon from '@mui/icons-material/Public'; import { useTranslation } from 'react-i18next'; +import { syncVegaLocale } from '../lib/vega-locale'; // Discord Icon Component const DiscordIcon: FC<{ sx?: any }> = ({ sx }) => ( @@ -732,6 +733,7 @@ const ErrorBoundaryFallback: React.FC = () => { }; const AUTH_ERROR_MESSAGES: Record = { + access_denied: 'auth.ssoErrorAccessDenied', invalid_state: 'auth.ssoErrorInvalidState', invalid_client: 'auth.ssoErrorInvalidClient', token_exchange_failed: 'auth.ssoErrorTokenExchange', @@ -978,12 +980,19 @@ const AppShell: FC = () => { export const AppFC: FC = function AppFC(appProps) { const dispatch = useDispatch(); - const { t } = useTranslation(); + const { t, i18n } = useTranslation(); const 
rawPaletteKey = useSelector((state: DataFormulatorState) => state.config.paletteKey); const activePaletteKey = (rawPaletteKey && palettes[rawPaletteKey]) ? rawPaletteKey : defaultPaletteKey; const [configLoaded, setConfigLoaded] = useState(false); + useEffect(() => { + syncVegaLocale(); + const onLangChanged = () => syncVegaLocale(); + i18n.on('languageChanged', onLangChanged); + return () => { i18n.off('languageChanged', onLangChanged); }; + }, [i18n]); + useEffect(() => { apiRequest(getUrls().APP_CONFIG) .then(({ data }) => { diff --git a/src/app/OidcCallback.tsx b/src/app/OidcCallback.tsx index 4e5d1f99..cffa3a92 100644 --- a/src/app/OidcCallback.tsx +++ b/src/app/OidcCallback.tsx @@ -36,6 +36,12 @@ export function OidcCallback() { if (!mgr) return; const params = new URLSearchParams(window.location.search); + const idpError = params.get("error"); + if (idpError) { + window.location.href = `/?auth_error=${encodeURIComponent(idpError)}`; + return; + } + if (!params.get("state") && params.get("code")) { // IdP-initiated flow: the SSO redirected here directly // without DF having started the login. Re-initiate a diff --git a/src/app/chartRecommendation.ts b/src/app/chartRecommendation.ts index 2ac86c02..4c74c7f9 100644 --- a/src/app/chartRecommendation.ts +++ b/src/app/chartRecommendation.ts @@ -68,6 +68,12 @@ export const resolveRecommendedChart = (refinedGoal: any, allFields: FieldItem[] if (chartObj['config'] && typeof chartObj['config'] === 'object') { newChart.config = { ...chartObj['config'] }; } + + // Legacy: "Dotted Line Chart" was folded into Line Chart with a `showPoints` config flag. 
+ if (rawChartType === 'Dotted Line Chart' || rawChartType === 'dotted_line') { + newChart.chartType = 'Line Chart'; + newChart.config = { ...(newChart.config || {}), showPoints: true }; + } return newChart; }; diff --git a/src/app/dfSlice.tsx b/src/app/dfSlice.tsx index b1194bee..a3fe5add 100644 --- a/src/app/dfSlice.tsx +++ b/src/app/dfSlice.tsx @@ -15,8 +15,9 @@ import { deleteTablesFromWorkspace } from './workspaceService'; import { getChartPngDataUrl } from './chartCache'; import i18n from '../i18n'; import { Type } from '../data/types'; -import { createTableFromFromObjectArray, inferTypeFromValueArray } from '../data/utils'; +import { createTableFromFromObjectArray, inferTypeFromValueArray, refineTemporalType } from '../data/utils'; import { Identity, IdentityType, getBrowserId } from './identity'; +import { REHYDRATE } from 'redux-persist'; enableMapSet(); @@ -28,11 +29,23 @@ export const generateFreshChart = (tableRef: string, chartType: string, source: chartType: chartType, encodingMap: Object.assign({}, ...getChartChannels(chartType).map((channel) => ({ [channel]: { channel: channel, bin: false } }))), tableRef: tableRef, - saved: false, source: source, } } +/** + * Migrate legacy `chartType: "Dotted Line Chart"` to `Line Chart` with `config.showPoints: true`. + * Dotted Line was removed as a standalone type and folded into a Line Chart property. + */ +const migrateDottedLineChart = (chart: any): any => { + if (chart?.chartType !== 'Dotted Line Chart') return chart; + return { + ...chart, + chartType: 'Line Chart', + config: { ...(chart.config || {}), showPoints: true }, + }; +}; + export interface SSEMessage { type: "heartbeat" | "notification" | "action"; text: string; @@ -84,7 +97,6 @@ export interface ModelConfig { api_key?: string; api_base?: string; api_version?: string; - supports_vision?: boolean; /** True for models configured server-side via .env. Their credentials never leave the server. 
*/ is_global?: boolean; } @@ -136,7 +148,7 @@ export interface DataFormulatorState { /** User-added models, persisted across browser sessions. */ models: ModelConfig[]; selectedModelId: string | undefined; - testedModels: {id: string, status: 'ok' | 'error' | 'testing' | 'unknown' | 'configured', message: string}[]; + testedModels: {id: string, status: 'ok' | 'error' | 'testing' | 'unknown', message: string}[]; tables : DictTable[]; draftNodes: DraftNode[]; @@ -157,6 +169,26 @@ export interface DataFormulatorState { chartSynthesisInProgress: string[]; chartInsightInProgress: string[]; + /** + * Thumbnail PNG data URLs keyed by chart id. Stored in a separate slice + * (rather than on `chart.thumbnail`) so a thumbnail update doesn't + * invalidate the `charts` array reference and trigger a cascade of + * `ChartRenderService` effect re-runs / cancelled render queues. + * Not persisted — thumbnails are re-derived from the module-scoped + * `chartCache` on reload. + */ + chartThumbnails: Record; + + /** + * Monotonically increasing counter bumped whenever the focused canvas + * fetches a fresh display-row sample (see `src/app/displayRowsCache.ts`). + * Background services that render off-screen (e.g. ChartRenderService) + * select this so they re-run when the canvas's richer sample becomes + * available, instead of being stuck rendering against the small preview + * slice that virtual tables ship in `table.rows`. + */ + displayRowsTick: number; + serverConfig: ServerConfig; config: ClientConfig; @@ -170,6 +202,23 @@ export interface DataFormulatorState { // Conversational data loading chat dataLoadingChatMessages: ChatMessage[]; dataLoadingChatInProgress: boolean; + /** + * Monotonic counter bumped whenever the chat is reset externally + * (clearChatMessages). DataLoadingChat watches this to abort any + * in-flight stream and discard partial dispatches that would + * otherwise pollute the freshly-cleared thread. + * Transient — not persisted. 
+ */ + dataLoadingChatResetCounter: number; + /** + * Pending hand-off from the Data Agent to a peer agent. Set by the + * Data Agent's `delegate` action card; consumed by `DataFormulator` + * (for `data_loading` → opens the upload dialog) or + * `SimpleChartRecBox` (for `report_gen` → kicks off the report + * generator) which clear this back to null. Transient — not + * persisted across sessions. + */ + agentHandoffRequest: { target: 'data_loading' | 'report_gen'; prompt: string; images?: string[] } | null; // Generated reports state generatedReports: GeneratedReport[]; @@ -219,6 +268,8 @@ const initialState: DataFormulatorState = { chartSynthesisInProgress: [], chartInsightInProgress: [], + chartThumbnails: {}, + displayRowsTick: 0, serverConfig: { DISABLE_DISPLAY_KEYS: false, @@ -247,6 +298,8 @@ const initialState: DataFormulatorState = { dataLoadingChatMessages: [], dataLoadingChatInProgress: false, + dataLoadingChatResetCounter: 0, + agentHandoffRequest: null, generatedReports: [], @@ -260,9 +313,24 @@ const initialState: DataFormulatorState = { focusedConnectorId: undefined, } +/** + * Non-memoized equivalent of `dfSelectors.getAllCharts` for use inside + * reducers. Reducers receive an Immer draft `state`; passing a draft into + * memoized selectors (createSelector) causes the selector to cache draft + * proxies. Once the reducer completes, those proxies are revoked, and any + * later read from the cached array throws "Cannot perform 'get' on a proxy + * that has been revoked". Always use this helper from reducer code paths. 
+ */ +const collectAllCharts = (state: DataFormulatorState): Chart[] => { + const triggerCharts = state.tables + .filter(t => t.derive?.trigger?.chart) + .map(t => t.derive?.trigger?.chart) as Chart[]; + return [...state.charts, ...triggerCharts]; +}; + let getUnrefedDerivedTableIds = (state: DataFormulatorState) => { // find tables directly referred by charts - let allCharts = dfSelectors.getAllCharts(state); + let allCharts = collectAllCharts(state); let chartRefedTables = allCharts.map(chart => getDataTable(chart, state.tables, allCharts, state.conceptShelfItems)) .filter(t => t != undefined).map(t => t.id); let tableWithDescendants = state.tables.filter(table => state.tables.some(t => t.derive?.trigger.tableId == table.id)).map(t => t.id); @@ -271,19 +339,58 @@ let getUnrefedDerivedTableIds = (state: DataFormulatorState) => { } let deleteChartsRoutine = (state: DataFormulatorState, chartIds: string[]) => { - let charts = state.charts.filter(c => !chartIds.includes(c.id)); let currentFocusedChartId = state.focusedId?.type === 'chart' ? state.focusedId.chartId : undefined; - if (currentFocusedChartId && chartIds.includes(currentFocusedChartId)) { - let leafCharts = charts; - if (leafCharts.length > 0) { - state.focusedId = { type: 'chart', chartId: leafCharts[0].id }; + // Capture context BEFORE filtering so we can pick a sensible new focus. + // When the focused chart is being deleted, we prefer: + // 1. The neighboring sibling on the same table (visually adjacent). + // 2. The table itself, if no sibling remains. + // 3. Any remaining chart, as a final fallback. + let deletedFocusedChart = currentFocusedChartId && chartIds.includes(currentFocusedChartId) + ? 
state.charts.find(c => c.id === currentFocusedChartId) + : undefined; + let focusedTableRef = deletedFocusedChart?.tableRef; + let focusedSiblingIndex = -1; + if (deletedFocusedChart && focusedTableRef) { + const siblings = state.charts.filter(c => c.tableRef === focusedTableRef); + focusedSiblingIndex = siblings.findIndex(c => c.id === currentFocusedChartId); + } + + let charts = state.charts.filter(c => !chartIds.includes(c.id)); + + if (deletedFocusedChart) { + const remainingSiblings = focusedTableRef + ? charts.filter(c => c.tableRef === focusedTableRef) + : []; + if (remainingSiblings.length > 0) { + // Pick the chart just before the deleted one in original order + // (visually "previous"). Clamp so the very first sibling + // falls back to the new first, and tail deletions land on + // the new last. + const targetIdx = Math.min( + Math.max(0, focusedSiblingIndex - 1), + remainingSiblings.length - 1, + ); + state.focusedId = { type: 'chart', chartId: remainingSiblings[targetIdx].id }; + } else if (focusedTableRef && state.tables.some(t => t.id === focusedTableRef)) { + // Last chart on this table — surface the table itself. + state.focusedId = { type: 'table', tableId: focusedTableRef }; + } else if (charts.length > 0) { + state.focusedId = { type: 'chart', chartId: charts[0].id }; } else { state.focusedId = undefined; } } + state.chartSynthesisInProgress = state.chartSynthesisInProgress.filter(s => !chartIds.includes(s)); + // Clean up thumbnail entries for removed charts. 
+ if (state.chartThumbnails) { + for (const id of chartIds) { + delete state.chartThumbnails[id]; + } + } + // update focusedChart and activeThreadChart state.charts = charts; @@ -295,6 +402,42 @@ let deleteChartsRoutine = (state: DataFormulatorState, chartIds: string[]) => { deleteTablesFromWorkspace(tablesToDelete.map(t => t.virtual.tableId)); state.tables = state.tables.filter(t => !tableIdsToDelete.includes(t.id)); + + // If the focus we just set lands on a table that has now been cascade- + // deleted (e.g. an unanchored derived table whose only chart we just + // removed), walk up the derive chain to land on a still-present chart + // — the "previous chart above this table" the user expects. Falls + // through to the parent table itself, then to any remaining chart. + if (state.focusedId?.type === 'table' && !state.tables.some(t => t.id === (state.focusedId as any).tableId)) { + const deletedTablesById = new Map(tablesToDelete.map(t => [t.id, t])); + let cursor: string | undefined = (state.focusedId as any).tableId; + let resolved = false; + while (cursor) { + const removedTable = deletedTablesById.get(cursor); + const parentId: string | undefined = removedTable?.derive?.trigger.tableId; + if (!parentId) break; + if (state.tables.some(t => t.id === parentId)) { + const parentCharts = state.charts.filter(c => c.tableRef === parentId); + if (parentCharts.length > 0) { + state.focusedId = { type: 'chart', chartId: parentCharts[parentCharts.length - 1].id }; + } else { + state.focusedId = { type: 'table', tableId: parentId }; + } + resolved = true; + break; + } + cursor = parentId; + } + if (!resolved) { + if (state.charts.length > 0) { + state.focusedId = { type: 'chart', chartId: state.charts[state.charts.length - 1].id }; + } else if (state.tables.length > 0) { + state.focusedId = { type: 'table', tableId: state.tables[0].id }; + } else { + state.focusedId = undefined; + } + } + } } /** @@ -365,6 +508,29 @@ export const fetchFieldSemanticType = 
createAsyncThunk( } ); +/** + * Fetch backend-computed per-column statistics for a workspace-stored + * (virtual) table and merge them into ``table.metadata``. Powers the + * data-grid column filter popover (design-doc 31): the response carries + * ``unique_count`` (stored as ``distinctCount``) / ``null_count`` for every column and, for + * low-cardinality columns, ``levels`` + parallel ``level_counts``. + * + * ``levels`` is merged with precedence — curated orderings already on the + * table (LLM ``sort_order``, chart-gallery hints) win; the stats-derived + * list only fills when the existing ``levels`` is empty. + */ +export const fetchColumnStats = createAsyncThunk( + "dataFormulatorSlice/fetchColumnStats", + async (table: DictTable) => { + const { data } = await apiRequest(getUrls().GET_COLUMN_STATS, { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ table_name: table.virtual?.tableId || table.id }), + }); + return { tableId: table.id, statistics: data?.statistics || [] }; + } +); + export const fetchCodeExpl = createAsyncThunk( "dataFormulatorSlice/fetchCodeExpl", async (derivedTable: DictTable, { getState }) => { @@ -396,7 +562,7 @@ export const fetchChartInsight = createAsyncThunk( console.log(">>> call agent to generate chart insight <<<"); const state = getState() as DataFormulatorState; - const chart = dfSelectors.getAllCharts(state).find(c => c.id === args.chartId); + const chart = collectAllCharts(state).find(c => c.id === args.chartId); if (!chart) throw new Error(`Chart not found: ${args.chartId}`); // Wait for chart image to be available in cache (replaces fixed 1.5s delay at call site) @@ -553,6 +719,7 @@ export const dataFormulatorSlice = createSlice({ state.dataLoadingChatMessages = []; state.dataLoadingChatInProgress = false; + state.dataLoadingChatResetCounter = (state.dataLoadingChatResetCounter ?? 
0) + 1; state.generatedReports = []; @@ -652,7 +819,18 @@ export const dataFormulatorSlice = createSlice({ } return node; }), - charts: saved.charts || [], + charts: (saved.charts || []).map(migrateDottedLineChart).map((c: Chart) => { + // Legacy sessions stored `thumbnail` on the Chart itself. + // We now keep thumbnails in a sibling slice (see + // `chartThumbnails` in state). Strip the field on load so + // it doesn't get re-persisted; ChartRenderService will + // repopulate the thumbnail slice from the module cache. + if (c && (c as any).thumbnail !== undefined) { + const { thumbnail: _drop, ...rest } = c as any; + return rest as Chart; + } + return c; + }), conceptShelfItems: saved.conceptShelfItems || [], focusedDataCleanBlockId: saved.focusedDataCleanBlockId || undefined, focusedId: saved.focusedId || undefined, @@ -669,6 +847,8 @@ export const dataFormulatorSlice = createSlice({ chartInsightInProgress: [], cleanInProgress: false, dataLoadingChatInProgress: false, + dataLoadingChatResetCounter: 0, + agentHandoffRequest: null, sessionLoading: false, sessionLoadingLabel: '', @@ -676,6 +856,14 @@ export const dataFormulatorSlice = createSlice({ activeWorkspace: saved.activeWorkspace ?? state.activeWorkspace ?? null, dataSourceSidebarOpen: state.dataSourceSidebarOpen, + + // Reset display-rows tick so dependent components re-fetch. + displayRowsTick: 0, + + // Thumbnails are not persisted; ChartRenderService + // repopulates this slice from the module cache / fresh + // renders after load. 
+ chartThumbnails: {}, }; }, setServerConfig: (state, action: PayloadAction) => { @@ -711,7 +899,7 @@ export const dataFormulatorSlice = createSlice({ try { localStorage.removeItem('df_selected_model'); } catch { /* */ } } }, - updateModelStatus: (state, action: PayloadAction<{id: string, status: 'ok' | 'error' | 'testing' | 'unknown' | 'configured', message: string}>) => { + updateModelStatus: (state, action: PayloadAction<{id: string, status: 'ok' | 'error' | 'testing' | 'unknown', message: string}>) => { let id = action.payload.id; let status = action.payload.status; let message = action.payload.message; @@ -768,20 +956,17 @@ export const dataFormulatorSlice = createSlice({ state.tables = state.tables.map(t => { if (t.id == tableId) { - // Update metadata type inference based on new data let newMetadata = { ...t.metadata }; for (let name of t.names) { if (newRows.length > 0 && name in newRows[0]) { + const colVals = newRows.map(r => r[name]); newMetadata[name] = { ...newMetadata[name], - type: inferTypeFromValueArray(newRows.map(r => r[name])), + type: refineTemporalType(colVals, inferTypeFromValueArray(colVals)), }; } } - // Update lastRefreshed timestamp if source exists const updatedSource = t.source ? 
{ ...t.source, lastRefreshed: Date.now() } : undefined; - // Use provided content hash (from backend for virtual/DB tables) or compute locally - // For virtual tables, backend hash reflects full table; for stream tables, compute from actual rows const newContentHash = providedContentHash || computeContentHash(newRows, t.names); const updatedVirtual = { ...t.virtual, rowCount: newRows.length }; return { ...t, rows: newRows, metadata: newMetadata, source: updatedSource, contentHash: newContentHash, virtual: updatedVirtual }; @@ -802,9 +987,10 @@ export const dataFormulatorSlice = createSlice({ let newMetadata = { ...t.metadata }; for (let name of t.names) { if (newRows.length > 0 && name in newRows[0]) { + const colVals = newRows.map(r => r[name]); newMetadata[name] = { ...newMetadata[name], - type: inferTypeFromValueArray(newRows.map(r => r[name])), + type: refineTemporalType(colVals, inferTypeFromValueArray(colVals)), }; } } @@ -863,8 +1049,9 @@ export const dataFormulatorSlice = createSlice({ } let newMetadata = structuredClone(table.metadata); + const inferredColType = refineTemporalType(newValues, inferTypeFromValueArray(newValues)); for (let name of newNames) { - newMetadata[name] = {type: inferTypeFromValueArray(newValues), semanticType: "", levels: []}; + newMetadata[name] = {type: inferredColType, semanticType: "", levels: []}; } // Create new rows with the column positioned after the first parent @@ -942,22 +1129,10 @@ export const dataFormulatorSlice = createSlice({ let chartId = action.payload; let chartCopy = JSON.parse(JSON.stringify(state.charts.find(chart => chart.id == chartId) as Chart)) as Chart; - chartCopy = { ...chartCopy, saved: false } chartCopy.id = `chart-${Date.now()- Math.floor(Math.random() * 10000)}`; state.charts.push(chartCopy); state.focusedId = { type: 'chart', chartId: chartCopy.id }; }, - saveUnsaveChart: (state, action: PayloadAction) => { - let chartId = action.payload; - - state.charts = state.charts.map(chart => { - if 
(chart.id == chartId) { - return { ...chart, saved: !chart.saved }; - } else { - return chart; - } - }) - }, deleteChartById: (state, action: PayloadAction<string>) => { let chartId = action.payload; deleteChartsRoutine(state, [chartId]); @@ -966,13 +1141,13 @@ export const dataFormulatorSlice = createSlice({ let chartId = action.payload.chartId; let chartType = action.payload.chartType; - let chart = dfSelectors.getAllCharts(state).find(c => c.id == chartId); + let chart = collectAllCharts(state).find(c => c.id == chartId); if (chart) { const template = getChartTemplate(chartType) as ChartTemplate; const sourceType = chart.chartType; // Get data table + semantic types for recommendation-based adaptation - let allCharts = dfSelectors.getAllCharts(state); + let allCharts = collectAllCharts(state); let table = getDataTable(chart, state.tables, allCharts, state.conceptShelfItems); const semanticTypes: Record<string, string> = {}; if (table) { @@ -1013,17 +1188,14 @@ export const dataFormulatorSlice = createSlice({ if (field) newEncodingMap[ch as Channel] = { fieldID: field.id }; } chart = { ...chart, chartType, encodingMap: newEncodingMap }; - - // Fill any remaining empty channels via full recommendation - if (table) { - const suggested = vlRecommendEncodings(chartType, table.rows, semanticTypes); - for (const [channel, fieldName] of Object.entries(suggested)) { - if (chart.encodingMap[channel as Channel]?.fieldID == undefined) { - const fieldItem = state.conceptShelfItems.find(f => f.name === fieldName && table!.names.includes(f.name)); - if (fieldItem) chart.encodingMap[channel as Channel] = { fieldID: fieldItem.id }; - } - } - } + + // Intentionally do NOT autofill remaining empty channels via a + // second recommendation pass: the adapter already returns at + // most as many fields as the source had, and re-recommending + // here would (a) re-introduce duplicates (e.g.
`metric` already + // on `y` getting suggested again for `color`) and (b) surprise + // the user by inflating a 2-encoding chart into a 5-encoding + // one on type switch. Empty channels are left for the user. // Chart type changed — any active variant was authored against // the old structure, so step out of it. The variants stay in @@ -1049,7 +1221,7 @@ export const dataFormulatorSlice = createSlice({ let chartId = action.payload.chartId; let key = action.payload.key; let value = action.payload.value; - let chart = dfSelectors.getAllCharts(state).find(c => c.id == chartId); + let chart = collectAllCharts(state).find(c => c.id == chartId); if (chart) { if (!chart.config) { chart.config = {}; @@ -1062,13 +1234,19 @@ export const dataFormulatorSlice = createSlice({ } }, updateChartThumbnail: (state, action: PayloadAction<{chartId: string, thumbnail: string}>) => { - let chart = dfSelectors.getAllCharts(state).find(c => c.id == action.payload.chartId); - if (chart) { - chart.thumbnail = action.payload.thumbnail; - } + // Write to a dedicated slice (not onto the Chart object) so that + // thumbnail updates don't invalidate the `charts` array reference + // — that ref is in the dep list of ChartRenderService's effect, + // and churning it cancels the in-flight render queue on every + // tick (see design discussion on tick performance). 
+ if (!state.chartThumbnails) state.chartThumbnails = {}; + state.chartThumbnails[action.payload.chartId] = action.payload.thumbnail; + }, + bumpDisplayRowsTick: (state) => { + state.displayRowsTick = (state.displayRowsTick || 0) + 1; }, updateChartInsight: (state, action: PayloadAction<{chartId: string, insight: ChartInsight}>) => { - let chart = dfSelectors.getAllCharts(state).find(c => c.id == action.payload.chartId); + let chart = collectAllCharts(state).find(c => c.id == action.payload.chartId); if (chart) { chart.insight = action.payload.insight; } @@ -1080,7 +1258,7 @@ export const dataFormulatorSlice = createSlice({ // the preview reflects whichever variant the user has active. addStyleVariant: (state, action: PayloadAction<{chartId: string, variant: ChartStyleVariant, activate?: boolean}>) => { const { chartId, variant, activate } = action.payload; - const chart = dfSelectors.getAllCharts(state).find(c => c.id === chartId); + const chart = collectAllCharts(state).find(c => c.id === chartId); if (!chart) return; if (!chart.styleVariants) chart.styleVariants = []; chart.styleVariants.push(variant); @@ -1090,13 +1268,13 @@ export const dataFormulatorSlice = createSlice({ }, setActiveVariant: (state, action: PayloadAction<{chartId: string, variantId: string | undefined}>) => { const { chartId, variantId } = action.payload; - const chart = dfSelectors.getAllCharts(state).find(c => c.id === chartId); + const chart = collectAllCharts(state).find(c => c.id === chartId); if (!chart) return; chart.activeVariantId = variantId; }, deleteStyleVariant: (state, action: PayloadAction<{chartId: string, variantId: string}>) => { const { chartId, variantId } = action.payload; - const chart = dfSelectors.getAllCharts(state).find(c => c.id === chartId); + const chart = collectAllCharts(state).find(c => c.id === chartId); if (!chart || !chart.styleVariants) return; chart.styleVariants = chart.styleVariants.filter(v => v.id !== variantId); if (chart.activeVariantId === 
variantId) { @@ -1108,7 +1286,7 @@ export const dataFormulatorSlice = createSlice({ }, renameStyleVariant: (state, action: PayloadAction<{chartId: string, variantId: string, label: string}>) => { const { chartId, variantId, label } = action.payload; - const chart = dfSelectors.getAllCharts(state).find(c => c.id === chartId); + const chart = collectAllCharts(state).find(c => c.id === chartId); const v = chart?.styleVariants?.find(v => v.id === variantId); if (v) v.label = label; }, @@ -1117,7 +1295,7 @@ export const dataFormulatorSlice = createSlice({ // the chip doesn't visibly disappear and re-appear. updateStyleVariant: (state, action: PayloadAction<{chartId: string, variantId: string, vlSpec: any, rationale?: string, encodingFingerprint?: string}>) => { const { chartId, variantId, vlSpec, rationale, encodingFingerprint } = action.payload; - const chart = dfSelectors.getAllCharts(state).find(c => c.id === chartId); + const chart = collectAllCharts(state).find(c => c.id === chartId); const v = chart?.styleVariants?.find(v => v.id === variantId); if (!v) return; v.vlSpec = vlSpec; @@ -1128,7 +1306,7 @@ export const dataFormulatorSlice = createSlice({ let chartId = action.payload.chartId; let channel = action.payload.channel; let encoding = action.payload.encoding; - let chart = dfSelectors.getAllCharts(state).find(c => c.id == chartId); + let chart = collectAllCharts(state).find(c => c.id == chartId); if (chart) { chart.encodingMap[channel] = encoding; // Auto-revert to default whenever the user edits the encoding so @@ -1143,7 +1321,7 @@ export const dataFormulatorSlice = createSlice({ let channel = action.payload.channel; let prop = action.payload.prop; let value = action.payload.value; - let chart = dfSelectors.getAllCharts(state).find(c => c.id == chartId); + let chart = collectAllCharts(state).find(c => c.id == chartId); let table = state.tables.find(t => t.id == chart?.tableRef) as DictTable; if (chart) { @@ -1194,7 +1372,7 @@ export const dataFormulatorSlice 
= createSlice({ let channel1 = action.payload.channel1; let channel2 = action.payload.channel2; - let chart = dfSelectors.getAllCharts(state).find(c => c.id == chartId); + let chart = collectAllCharts(state).find(c => c.id == chartId); if (chart) { let enc1 = chart.encodingMap[channel1]; let enc2 = chart.encodingMap[channel2]; @@ -1221,38 +1399,28 @@ export const dataFormulatorSlice = createSlice({ }, deleteConceptItemByID: (state, action: PayloadAction<string>) => { let conceptID = action.payload; - let allCharts = dfSelectors.getAllCharts(state); + let allCharts = collectAllCharts(state); // remove concepts from encoding maps - if (allCharts.some(chart => chart.saved - && Object.entries(chart.encodingMap).some(([channel, encoding]) => encoding.fieldID && conceptID == encoding.fieldID))) { - console.log("cannot delete!") - } else { - state.conceptShelfItems = state.conceptShelfItems.filter(f => f.id != conceptID); - for (let chart of allCharts) { - for (let [channel, encoding] of Object.entries(chart.encodingMap)) { - if (encoding.fieldID && conceptID == encoding.fieldID) { - // clear the encoding - chart.encodingMap[channel as Channel] = { } - } + state.conceptShelfItems = state.conceptShelfItems.filter(f => f.id != conceptID); + for (let chart of allCharts) { + for (let [channel, encoding] of Object.entries(chart.encodingMap)) { + if (encoding.fieldID && conceptID == encoding.fieldID) { + // clear the encoding + chart.encodingMap[channel as Channel] = { } } } } }, batchDeleteConceptItemByID: (state, action: PayloadAction<string[]>) => { - let allCharts = dfSelectors.getAllCharts(state); + let allCharts = collectAllCharts(state); for (let conceptID of action.payload) { // remove concepts from encoding maps - if (allCharts.some(chart => chart.saved - && Object.entries(chart.encodingMap).some(([channel, encoding]) => encoding.fieldID && conceptID == encoding.fieldID))) { - console.log("cannot delete!") - } else { - 
state.conceptShelfItems = state.conceptShelfItems.filter(field =>
field.id != conceptID); - for (let chart of allCharts) { - for (let [channel, encoding] of Object.entries(chart.encodingMap)) { - if (encoding.fieldID && conceptID == encoding.fieldID) { - // clear the encoding - chart.encodingMap[channel as Channel] = { } - } + state.conceptShelfItems = state.conceptShelfItems.filter(field => field.id != conceptID); + for (let chart of allCharts) { + for (let [channel, encoding] of Object.entries(chart.encodingMap)) { + if (encoding.fieldID && conceptID == encoding.fieldID) { + // clear the encoding + chart.encodingMap[channel as Channel] = { } } } } @@ -1377,7 +1545,7 @@ export const dataFormulatorSlice = createSlice({ }, clearUnReferencedTables: (state) => { // remove all tables that are not referred - let allCharts = dfSelectors.getAllCharts(state); + let allCharts = collectAllCharts(state); let referredTableId = allCharts.map(chart => getDataTable(chart, state.tables, allCharts, state.conceptShelfItems)) .filter(t => t != undefined).map(t => t.id); let tablesToRemove = state.tables.filter(t => t.derive && !referredTableId.some(tableId => tableId == t.id)); @@ -1389,7 +1557,7 @@ export const dataFormulatorSlice = createSlice({ }, clearUnReferencedCustomConcepts: (state) => { let fieldNamesFromTables = state.tables.map(t => t.names).flat(); - let fieldIdsReferredByCharts = dfSelectors.getAllCharts(state).map(c => Object.values(c.encodingMap).map(enc => enc.fieldID).filter(fid => fid != undefined) as string[]).flat(); + let fieldIdsReferredByCharts = collectAllCharts(state).map(c => Object.values(c.encodingMap).map(enc => enc.fieldID).filter(fid => fid != undefined) as string[]).flat(); state.conceptShelfItems = state.conceptShelfItems.filter(field => !(field.source == "custom" && !(fieldNamesFromTables.includes(field.name) || fieldIdsReferredByCharts.includes(field.id)))) @@ -1401,14 +1569,22 @@ export const dataFormulatorSlice = createSlice({ state.displayedMessageIdx = action.payload }, setFocused: (state, action: 
PayloadAction) => { - state.focusedId = action.payload; + const payload = action.payload; + state.focusedId = payload; - if (action.payload?.type === 'chart' && state.viewMode == 'report') { + if (payload?.type === 'chart' && state.viewMode == 'report') { state.viewMode = 'editor'; } - if (action.payload?.type === 'report') { + if (payload?.type === 'report') { state.viewMode = 'report'; } + // Clear the "unread" mark on a chart as soon as the user focuses it. + if (payload?.type === 'chart') { + const focusedChart = state.charts.find(c => c.id === payload.chartId); + if (focusedChart?.unread) { + focusedChart.unread = false; + } + } }, setFocusedDataCleanBlockId: (state, action: PayloadAction<{blockId: string, itemId: number} | undefined>) => { state.focusedDataCleanBlockId = action.payload; @@ -1480,7 +1656,15 @@ export const dataFormulatorSlice = createSlice({ } }, clearChatMessages: (state) => { + // Reset is a coherent operation: clear messages, drop the + // in-progress flag, and bump the reset counter so the chat + // surface aborts its in-flight stream and discards any + // pending dispatches from that stream. Doing all three in + // one reducer avoids interleaving with redux/react render + // cycles that would otherwise let stale messages slip in. state.dataLoadingChatMessages = []; + state.dataLoadingChatInProgress = false; + state.dataLoadingChatResetCounter = (state.dataLoadingChatResetCounter ?? 0) + 1; }, confirmTableLoad: (state, action: PayloadAction<{messageId: string, tableName: string}>) => { const msg = state.dataLoadingChatMessages.find(m => m.id === action.payload.messageId); @@ -1500,6 +1684,25 @@ export const dataFormulatorSlice = createSlice({ setDataLoadingChatInProgress: (state, action: PayloadAction<boolean>) => { state.dataLoadingChatInProgress = action.payload; }, + /** + * Request that the Data Agent hand off to a peer agent + * (Data Loading or Report Gen) seeded with a specific prompt + * (and optional images).
Consumed by `DataFormulator` (for + * `data_loading` — opens the unified upload dialog on the + * 'extract' tab) or `SimpleChartRecBox` (for `report_gen` + * — kicks off the report generator); each clears the + * request after handling. + */ + requestAgentHandoff: (state, action: PayloadAction<{ target: 'data_loading' | 'report_gen'; prompt: string; images?: string[] }>) => { + state.agentHandoffRequest = { + target: action.payload.target, + prompt: action.payload.prompt, + images: action.payload.images, + }; + }, + clearAgentHandoffRequest: (state) => { + state.agentHandoffRequest = null; + }, // Generated reports actions saveGeneratedReport: (state, action: PayloadAction<GeneratedReport>) => { const report = action.payload; @@ -1523,7 +1726,7 @@ export const dataFormulatorSlice = createSlice({ if (wasFocused && report) { const triggerTableId = report.triggerTableId; if (triggerTableId) { - const allCharts = dfSelectors.getAllCharts(state); + const allCharts = collectAllCharts(state); const tableChart = allCharts.find(c => c.tableRef === triggerTableId && c.source === 'user'); if (tableChart) { state.focusedId = { type: 'chart', chartId: tableChart.id }; @@ -1555,6 +1758,56 @@ export const dataFormulatorSlice = createSlice({ }, extraReducers: (builder) => { builder + .addCase(REHYDRATE, (state: any, action: any) => { + // On a normal page refresh, redux-persist replays the persisted + // state directly into the reducer — it does NOT go through our + // `loadState` action. Any draft that was `running` or + // `clarifying` when the tab closed will rehydrate in that + // status, but the SSE stream that was driving it is gone, so + // the UI gets stuck on a "thinking…" banner with a runaway + // elapsed-time counter. Mark those drafts as interrupted and + // clear transient agent flags, mirroring the same cleanup + // `loadState` performs for session loads.
+ const incoming = action.payload; + if (!incoming) return; + if (Array.isArray(incoming.draftNodes)) { + incoming.draftNodes = incoming.draftNodes.map((node: DraftNode) => { + if (node.derive?.status === 'running' || node.derive?.status === 'clarifying') { + return { + ...node, + derive: { + ...node.derive, + status: 'interrupted' as const, + runningPlan: undefined, + trigger: { + ...node.derive.trigger, + interaction: [ + ...(node.derive.trigger.interaction || []), + { + from: 'data-agent' as const, + to: 'user' as const, + role: 'error' as const, + content: 'Interrupted by page refresh. You can retry or delete this step.', + timestamp: Date.now(), + }, + ], + }, + }, + }; + } + return node; + }); + } + // Reset other transient in-progress flags that snuck into the + // persisted blob (chartSynthesisInProgress / chartInsightInProgress + // are already blacklisted in store.ts). + incoming.cleanInProgress = false; + incoming.dataLoadingChatInProgress = false; + incoming.sessionLoading = false; + incoming.sessionLoadingLabel = ''; + incoming.messages = []; + incoming.displayedMessageIdx = -1; + }) .addCase(fetchFieldSemanticType.fulfilled, (state, action) => { let data = action.payload; let tableId = action.meta.arg.id; @@ -1564,12 +1817,20 @@ export const dataFormulatorSlice = createSlice({ let typeMap = data['result'][0]['fields']; for (let name of table.names) { - table.metadata[name] = { - type: typeMap[name]['type'] as Type, - semanticType: typeMap[name]['semantic_type'], - levels: typeMap[name]['sort_order'] || undefined, - intrinsicDomain: typeMap[name]['intrinsic_domain'] || undefined, - unit: typeMap[name]['unit'] || undefined, + const prev = table.metadata[name] || { type: Type.String, semanticType: "", levels: [] }; + const sortOrder = typeMap[name]['sort_order']; + const hasCuratedLevels = Array.isArray(sortOrder) && sortOrder.length > 0; + // Per design-doc 31 precedence: when the agent supplies a + // curated sort_order, drop any data-derived 
levelCounts so + // the popover checklist hides the count column. + table.metadata[name] = { + ...prev, + type: typeMap[name]['type'] as Type, + semanticType: typeMap[name]['semantic_type'], + levels: hasCuratedLevels ? sortOrder : (prev.levels || []), + levelCounts: hasCuratedLevels ? undefined : prev.levelCounts, + intrinsicDomain: typeMap[name]['intrinsic_domain'] || prev.intrinsicDomain, + unit: typeMap[name]['unit'] || prev.unit, }; } @@ -1588,22 +1849,45 @@ export const dataFormulatorSlice = createSlice({ } } }) + .addCase(fetchColumnStats.fulfilled, (state, action) => { + const { tableId, statistics } = action.payload as { + tableId: string; + statistics: Array<{ column: string; statistics: any }>; + }; + const table = state.tables.find(t => t.id === tableId) as DictTable | undefined; + if (!table) return; + for (const entry of statistics) { + const name = entry?.column; + if (!name || !(name in table.metadata)) continue; + const s = entry.statistics || {}; + const prev = table.metadata[name]; + const hasExistingLevels = Array.isArray(prev.levels) && prev.levels.length > 0; + const incomingLevels = Array.isArray(s.levels) ? s.levels : undefined; + const incomingCounts = Array.isArray(s.level_counts) ? s.level_counts : undefined; + table.metadata[name] = { + ...prev, + distinctCount: typeof s.unique_count === 'number' ? s.unique_count : prev.distinctCount, + nullCount: typeof s.null_count === 'number' ? s.null_count : prev.nullCount, + // Curated levels win; only fill when empty. + levels: hasExistingLevels ? prev.levels : (incomingLevels || []), + levelCounts: hasExistingLevels ? prev.levelCounts : incomingCounts, + }; + } + }) .addCase(fetchGlobalModelList.fulfilled, (state, action) => { // Populate globalModels so the UI renders every configured model - // immediately. Status starts as "unknown"; the user can click - // "Test" to verify connectivity, or errors surface on first use. + // immediately. 
Server-configured models are trusted by default: + // they start as "unknown" and are selectable without a connectivity + // check. Users can click "Test" to verify manually if they want. const models: ModelConfig[] = action.payload; state.globalModels = models; - // Reset all global model statuses to "configured" on every app start. - // "configured" means: admin has set this up in .env, ready to use, - // but connectivity has not been verified this session. - // testedModels is persisted by redux-persist, so without this reset - // stale "ok" statuses from a previous session would linger. - // User-added model test results are preserved. + // Reset stale global model statuses on every app start so a previous + // session's "ok"/"error" doesn't linger. User-added model test + // results are preserved. const globalIds = new Set(models.map(m => m.id)); state.testedModels = [ - ...models.map(m => ({ id: m.id, status: 'configured' as const, message: '' })), + ...models.map(m => ({ id: m.id, status: 'unknown' as const, message: '' })), ...state.testedModels.filter(t => !globalIds.has(t.id)), ]; @@ -1633,7 +1917,7 @@ export const dataFormulatorSlice = createSlice({ state.testedModels = [ ...serverModels.map(m => ({ id: m.id, - status: (m.status === 'connected' ? 'ok' : 'error') as 'ok' | 'error' | 'testing' | 'unknown' | 'configured', + status: (m.status === 'connected' ? 'ok' : 'error') as 'ok' | 'error' | 'testing' | 'unknown', message: m.error ?? '', })), ...state.testedModels.filter(t => !serverModels.some(m => m.id === t.id)), @@ -1654,7 +1938,7 @@ export const dataFormulatorSlice = createSlice({ state.testedModels = state.testedModels.map(model => model.status === 'testing' - ? { ...model, status: 'configured' as const, message: '' } + ? 
{ ...model, status: 'unknown' as const, message: '' } : model ); state.messages.push({ @@ -1697,7 +1981,7 @@ export const dataFormulatorSlice = createSlice({ }) .addCase(fetchChartInsight.fulfilled, (state, action) => { let { chartId, insightKey, title, takeaways } = action.payload; - let chart = dfSelectors.getAllCharts(state).find(c => c.id === chartId); + let chart = collectAllCharts(state).find(c => c.id === chartId); if (chart && (title || (takeaways && takeaways.length > 0))) { chart.insight = { title, takeaways: takeaways || [], key: insightKey }; } @@ -1839,7 +2123,7 @@ export const dfSelectors = { if (!state.focusedId) return undefined; if (state.focusedId.type === 'table') return state.focusedId.tableId; // type === 'chart': derive table from the chart's tableRef - let allCharts = dfSelectors.getAllCharts(state); + let allCharts = collectAllCharts(state); let chart = allCharts.find(c => c.id === (state.focusedId as { type: 'chart'; chartId: string }).chartId); return chart?.tableRef; }, @@ -1864,6 +2148,15 @@ export const dfSelectors = { } ), + /** + * Subscribe to a single chart's thumbnail without re-rendering whenever + * any other chart's thumbnail changes. Use as + * `useSelector(dfSelectors.getChartThumbnail(chartId))`. + */ + getChartThumbnail: (chartId: string) => + (state: DataFormulatorState): string | undefined => + state.chartThumbnails?.[chartId], + replaceChart: (state: DataFormulatorState, chart: Chart) => { if (state.charts.find(c => c.id == chart.id)) { // chart is from charts diff --git a/src/app/displayRowsCache.ts b/src/app/displayRowsCache.ts new file mode 100644 index 00000000..a58bbefa --- /dev/null +++ b/src/app/displayRowsCache.ts @@ -0,0 +1,43 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Shared module-level cache for display-row samples used by the focused + * canvas. Lets background services (e.g. 
ChartRenderService) reuse the + * same sample the canvas already fetched, so thumbnails match the main + * chart instead of rendering against the small preview slice that virtual + * tables ship in `table.rows`. + * + * Key shape: `${tableId}-${sortedFields.join('_')}-${contentSuffix}` + * where contentSuffix is `${contentHash.slice(0,8)}` if present, + * otherwise `${table.rows.length}`. + */ + +import { Chart, DictTable, FieldItem } from '../components/ComponentType'; +import { extractFieldsFromEncodingMap } from './utils'; + +export interface DisplayRowsEntry { + rows: any[]; + totalCount: number; +} + +export const displayRowsCache = new Map(); + +/** Build the cache key the canvas uses for a given table + chart. */ +export function computeDisplayRowsCacheKey( + table: DictTable, + chart: Chart, + conceptShelfItems: FieldItem[], +): string { + const { aggregateFields, groupByFields } = extractFieldsFromEncodingMap( + chart.encodingMap, conceptShelfItems, + ); + const sortedFields = [ + ...aggregateFields.map(f => `${f[0]}_${f[1]}`), + ...groupByFields, + ].sort(); + const contentSuffix = table.contentHash + ? 
`-${table.contentHash.slice(0, 8)}` + : `-${table.rows.length}`; + return `${table.id}-${sortedFields.join('_')}${contentSuffix}`; +} diff --git a/src/app/tableThunks.ts b/src/app/tableThunks.ts index 101c68f5..72760816 100644 --- a/src/app/tableThunks.ts +++ b/src/app/tableThunks.ts @@ -15,10 +15,10 @@ import { createAsyncThunk } from '@reduxjs/toolkit'; import { DataSourceConfig, DictTable } from '../components/ComponentType'; import { Type, mapApiTypeToAppType } from '../data/types'; -import { inferTypeFromValueArray } from '../data/utils'; +import { inferTypeFromValueArray, refineTemporalType } from '../data/utils'; import { getUrls, CONNECTOR_ACTION_URLS, computeContentHash, SourceTableRef } from './utils'; import { apiRequest } from './apiClient'; -import { DataFormulatorState, dfActions, fetchFieldSemanticType } from './dfSlice'; +import { DataFormulatorState, dfActions, fetchColumnStats, fetchFieldSemanticType } from './dfSlice'; import { tableDataDB } from './workspaceDB'; import i18n from '../i18n'; @@ -256,14 +256,18 @@ export const loadTable = createAsyncThunk< displayId: table.displayId || table.id, names, rows, - metadata: names.reduce((acc: Record, name: string) => ({ - ...acc, - [name]: { - type: inferTypeFromValueArray(rows.map((r: any) => r[name])), - semanticType: "", - levels: [] - } - }), {}), + metadata: names.reduce((acc: Record, name: string) => { + const colVals = rows.map((r: any) => r[name]); + const inferred = inferTypeFromValueArray(colVals); + return { + ...acc, + [name]: { + type: refineTemporalType(colVals, inferred), + semanticType: "", + levels: [] + } + }; + }, {}), anchored: true, }; } catch (err) { @@ -311,6 +315,12 @@ export const loadTable = createAsyncThunk< // Dispatch the table into Redux state dispatch(dfActions.addTableToStore(finalTable)); dispatch(fetchFieldSemanticType(finalTable)); + // Workspace-stored tables get backend-computed column stats + // (distinct/null counts + low-card value lists) for the grid + // filter 
popover. Browser-only tables skip this in v1. + if (finalTable.virtual) { + dispatch(fetchColumnStats(finalTable)); + } // Notify user about local (browser) truncation only. // Database-side truncation (row limit from import options) is @@ -393,15 +403,22 @@ export function buildDictTableFromWorkspace( id: wsTable.name, displayId: wsTable.name, names: wsTable.columns.map((col: any) => col.name), - metadata: wsTable.columns.reduce((acc: Record, col: any) => ({ - ...acc, - [col.name]: { - type: convertSqlTypeToAppType(col.type), - semanticType: "", - levels: [], - ...(col.description ? { description: col.description } : {}), + metadata: wsTable.columns.reduce((acc: Record, col: any) => { + let colType = convertSqlTypeToAppType(col.type); + if (colType === Type.DateTime && Array.isArray(wsTable.sample_rows) && wsTable.sample_rows.length > 0) { + const vals = wsTable.sample_rows.map((r: any) => r[col.name]); + colType = refineTemporalType(vals, colType); } - }), {}), + return { + ...acc, + [col.name]: { + type: colType, + semanticType: "", + levels: [], + ...(col.description ? { description: col.description } : {}), + } + }; + }, {}), rows: wsTable.sample_rows, virtual: { tableId: wsTable.name, diff --git a/src/app/useAutoSave.tsx b/src/app/useAutoSave.tsx index dc1589a8..755a4918 100644 --- a/src/app/useAutoSave.tsx +++ b/src/app/useAutoSave.tsx @@ -17,6 +17,9 @@ const EXCLUDED_FIELDS = new Set([ // Transient fields that shouldn't trigger or be included in saves 'chartSynthesisInProgress', 'chartInsightInProgress', 'cleanInProgress', 'sessionLoading', 'sessionLoadingLabel', + // Thumbnails are derived from chart specs + table data; re-rendered + // from the module cache on reload, so don't waste bandwidth saving them. + 'chartThumbnails', ]); /** Debounce interval in milliseconds. 
*/ diff --git a/src/app/useFormulateData.ts b/src/app/useFormulateData.ts index 049aa393..dd128b50 100644 --- a/src/app/useFormulateData.ts +++ b/src/app/useFormulateData.ts @@ -184,6 +184,7 @@ export function useFormulateData() { const triggers = getTriggers(table, tables); if (triggers.length === 0) continue; + const STEP_FINDING_CHAR_LIMIT = 200; const steps = triggers.map(trigger => { const instr = trigger.interaction?.find(e => e.role === 'instruction'); const label = instr?.displayContent || instr?.content || trigger.resultTableId; @@ -193,7 +194,16 @@ export function useFormulateData() { const encStr = Object.entries(chartEncodingsByName(chart)) .map(([k, v]) => `${k}: ${v}`) .join(', '); - return `${label}${chartType ? ` → ${chartType}` : ''}${encStr ? ` (${encStr})` : ''}`; + // Per-step agent commentary: the `summary` entry the visualize + // action emits after running this step. + let finding = trigger.interaction?.find( + e => e.role === 'summary', + )?.content?.trim() || ''; + if (finding.length > STEP_FINDING_CHAR_LIMIT) { + finding = finding.slice(0, STEP_FINDING_CHAR_LIMIT - 1).trimEnd() + '…'; + } + const head = `${label}${chartType ? ` → ${chartType}` : ''}${encStr ? ` (${encStr})` : ''}`; + return finding ? 
`${head} — finding: ${finding}` : head; }); const sourceTableId = triggers[0].tableId; diff --git a/src/app/utils.tsx b/src/app/utils.tsx index 492a40af..7688bd80 100644 --- a/src/app/utils.tsx +++ b/src/app/utils.tsx @@ -17,15 +17,12 @@ export function getUrls() { APP_CONFIG: `/api/app-config`, AUTH_INFO_PREFIX: `/api/.auth/`, - EXAMPLE_DATASETS: `/api/example-datasets`, - // these functions involves ai agents LIST_GLOBAL_MODELS: `/api/agent/list-global-models`, CHECK_AVAILABLE_MODELS: `/api/agent/check-available-models`, TEST_MODEL: `/api/agent/test-model`, SORT_DATA_URL: `/api/agent/sort-data`, - CLEAN_DATA_URL: `/api/agent/clean-data-stream`, DATA_LOADING_CHAT_URL: `/api/agent/data-loading-chat`, SCRATCH_UPLOAD_URL: `/api/agent/workspace/scratch/upload`, SCRATCH_BASE_URL: `/api/agent/workspace/scratch`, diff --git a/src/app/workspaceDB.ts b/src/app/workspaceDB.ts index cc0f70db..5af62e0b 100644 --- a/src/app/workspaceDB.ts +++ b/src/app/workspaceDB.ts @@ -311,12 +311,24 @@ export async function importWorkspaceFromZip( if (!stateFile) throw new Error('Invalid workspace zip: missing session_state.json'); const state: Record = JSON.parse(await stateFile.async('string')); - // Read workspace.yaml (JSON format) — may be at root or under workspace/ + // Read workspace.yaml — may be at root or under workspace/. + // Demo zips (built by build_demo_zips.py) write actual YAML here, + // while exported workspaces write JSON for parsing convenience. The + // tableIndex inside is optional metadata used to drive on-demand + // refresh of stream/live tables; demos don't need it because their + // rows are embedded directly in session_state.json/state.json. So + // we try JSON.parse and fall back to an empty tableIndex on any + // parse error. 
let tableIndex: TableIndexEntry[] = []; const yamlFile = zip.file('workspace.yaml') || zip.file('workspace/workspace.yaml'); if (yamlFile) { - const meta = JSON.parse(await yamlFile.async('string')); - tableIndex = meta.tableIndex || []; + const raw = await yamlFile.async('string'); + try { + const meta = JSON.parse(raw); + tableIndex = meta.tableIndex || []; + } catch { + // Non-JSON (YAML demo manifest) — skip; demo state is self-contained. + } } // Save workspace metadata + state @@ -324,6 +336,7 @@ export async function importWorkspaceFromZip( // Read data/*.json or workspace/data/*.json → table_data store let tableCount = 0; + const savedTableIds = new Set(); const dataFolder = zip.folder('data') ?? zip.folder('workspace/data'); if (dataFolder) { const filePromises: Promise[] = []; @@ -333,6 +346,7 @@ export async function importWorkspaceFromZip( filePromises.push( zipEntry.async('string').then(content => { const rows = JSON.parse(content); + savedTableIds.add(tableId); return tableDataDB.save(workspaceId, tableId, rows); }) ); @@ -341,5 +355,25 @@ export async function importWorkspaceFromZip( await Promise.all(filePromises); } + // Fallback for demo zips (build_demo_zips.py): the data/ folder contains + // .parquet files which we can't read in the browser, but the state + // itself carries the first N rows inline under each table's `rows` + // field. Persist those into tableDataDB so ephemeral mode can ship them + // to the server with each request — otherwise the backend has no file + // to read and the agent fails with FileNotFoundError. + const stateTables = Array.isArray((state as any).tables) ? 
(state as any).tables : []; + const inlinePromises: Promise[] = []; + for (const t of stateTables) { + const tableId: string | undefined = t?.id || t?.name; + const rows = t?.rows; + if (!tableId || !Array.isArray(rows) || rows.length === 0) continue; + if (savedTableIds.has(tableId)) continue; + inlinePromises.push(tableDataDB.save(workspaceId, tableId, rows)); + tableCount++; + } + if (inlinePromises.length > 0) { + await Promise.all(inlinePromises); + } + return { state, tableCount }; } diff --git a/src/app/workspaceService.ts b/src/app/workspaceService.ts index 8f4f1286..01eaf00d 100644 --- a/src/app/workspaceService.ts +++ b/src/app/workspaceService.ts @@ -188,7 +188,7 @@ export async function exportWorkspace(id: string): Promise { const exportRes = await fetchWithIdentity(getUrls().SESSION_EXPORT, { method: 'POST', headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify({ state: data.state }), + body: JSON.stringify({ state: data.state, workspace_id: id }), }); await assertDownloadResponseOk(exportRes, 'Export failed'); return exportRes.blob(); @@ -205,9 +205,10 @@ export async function importWorkspace( const { state } = await importWorkspaceFromZip(file, workspaceId, displayName); return state as Record; } - // Server: upload zip + // Server: upload zip with target workspace ID const formData = new FormData(); formData.append('file', file); + formData.append('workspace_id', workspaceId); const { data } = await apiRequest<{ state: any }>(getUrls().SESSION_IMPORT, { method: 'POST', body: formData, diff --git a/src/assets/chart-icon-area-min.png b/src/assets/chart-icon-area-min.png deleted file mode 100644 index 39425d22..00000000 Binary files a/src/assets/chart-icon-area-min.png and /dev/null differ diff --git a/src/assets/chart-icon-area-stacked-min.png b/src/assets/chart-icon-area-stacked-min.png deleted file mode 100644 index 12b9a771..00000000 Binary files a/src/assets/chart-icon-area-stacked-min.png and /dev/null differ diff --git 
a/src/assets/chart-icon-bar-grouped-min.png b/src/assets/chart-icon-bar-grouped-min.png deleted file mode 100644 index 9625a91a..00000000 Binary files a/src/assets/chart-icon-bar-grouped-min.png and /dev/null differ diff --git a/src/assets/chart-icon-bar-min.png b/src/assets/chart-icon-bar-min.png deleted file mode 100644 index 7d1e4259..00000000 Binary files a/src/assets/chart-icon-bar-min.png and /dev/null differ diff --git a/src/assets/chart-icon-bar-stacked-min.png b/src/assets/chart-icon-bar-stacked-min.png deleted file mode 100644 index 18b46f86..00000000 Binary files a/src/assets/chart-icon-bar-stacked-min.png and /dev/null differ diff --git a/src/assets/chart-icon-bar-table.svg b/src/assets/chart-icon-bar-table.svg new file mode 100644 index 00000000..673fbe54 --- /dev/null +++ b/src/assets/chart-icon-bar-table.svg @@ -0,0 +1,32 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/assets/chart-icon-box-plot-min.png b/src/assets/chart-icon-box-plot-min.png deleted file mode 100644 index 05182bde..00000000 Binary files a/src/assets/chart-icon-box-plot-min.png and /dev/null differ diff --git a/src/assets/chart-icon-box-plot.svg b/src/assets/chart-icon-box-plot.svg new file mode 100644 index 00000000..14f89746 --- /dev/null +++ b/src/assets/chart-icon-box-plot.svg @@ -0,0 +1,18 @@ + + + + + + + + + + + + + + + + + + diff --git a/src/assets/chart-icon-bubble-min.png b/src/assets/chart-icon-bubble-min.png deleted file mode 100644 index 25cad3ac..00000000 Binary files a/src/assets/chart-icon-bubble-min.png and /dev/null differ diff --git a/src/assets/chart-icon-column-grouped-min.png b/src/assets/chart-icon-column-grouped-min.png deleted file mode 100644 index 43e0489b..00000000 Binary files a/src/assets/chart-icon-column-grouped-min.png and /dev/null differ diff --git a/src/assets/chart-icon-column-grouped.svg b/src/assets/chart-icon-column-grouped.svg new file mode 100644 index 00000000..60c0f156 --- /dev/null +++ 
b/src/assets/chart-icon-column-grouped.svg @@ -0,0 +1,15 @@ + + + + + + + + + + + + + + + diff --git a/src/assets/chart-icon-column-min.png b/src/assets/chart-icon-column-min.png deleted file mode 100644 index 432ba451..00000000 Binary files a/src/assets/chart-icon-column-min.png and /dev/null differ diff --git a/src/assets/chart-icon-column-stacked-min.png b/src/assets/chart-icon-column-stacked-min.png deleted file mode 100644 index 12e6a52c..00000000 Binary files a/src/assets/chart-icon-column-stacked-min.png and /dev/null differ diff --git a/src/assets/chart-icon-column-stacked.svg b/src/assets/chart-icon-column-stacked.svg new file mode 100644 index 00000000..a3568c45 --- /dev/null +++ b/src/assets/chart-icon-column-stacked.svg @@ -0,0 +1,15 @@ + + + + + + + + + + + + + + + diff --git a/src/assets/chart-icon-column.svg b/src/assets/chart-icon-column.svg new file mode 100644 index 00000000..459c1217 --- /dev/null +++ b/src/assets/chart-icon-column.svg @@ -0,0 +1,11 @@ + + + + + + + + + + + diff --git a/src/assets/chart-icon-custom-area-min.png b/src/assets/chart-icon-custom-area-min.png deleted file mode 100644 index ca339132..00000000 Binary files a/src/assets/chart-icon-custom-area-min.png and /dev/null differ diff --git a/src/assets/chart-icon-custom-area.svg b/src/assets/chart-icon-custom-area.svg new file mode 100644 index 00000000..5bf7c088 --- /dev/null +++ b/src/assets/chart-icon-custom-area.svg @@ -0,0 +1,8 @@ + + + + + + + + diff --git a/src/assets/chart-icon-custom-bar-min.png b/src/assets/chart-icon-custom-bar-min.png deleted file mode 100644 index b9fa5f83..00000000 Binary files a/src/assets/chart-icon-custom-bar-min.png and /dev/null differ diff --git a/src/assets/chart-icon-custom-bar.svg b/src/assets/chart-icon-custom-bar.svg new file mode 100644 index 00000000..53adc117 --- /dev/null +++ b/src/assets/chart-icon-custom-bar.svg @@ -0,0 +1,8 @@ + + + + + + + + diff --git a/src/assets/chart-icon-custom-line-min.png 
b/src/assets/chart-icon-custom-line-min.png deleted file mode 100644 index 28bf2890..00000000 Binary files a/src/assets/chart-icon-custom-line-min.png and /dev/null differ diff --git a/src/assets/chart-icon-custom-line.svg b/src/assets/chart-icon-custom-line.svg new file mode 100644 index 00000000..2a16a88b --- /dev/null +++ b/src/assets/chart-icon-custom-line.svg @@ -0,0 +1,10 @@ + + + + + + + + + + diff --git a/src/assets/chart-icon-custom-point-min.png b/src/assets/chart-icon-custom-point-min.png deleted file mode 100644 index 2570fad6..00000000 Binary files a/src/assets/chart-icon-custom-point-min.png and /dev/null differ diff --git a/src/assets/chart-icon-custom-point.svg b/src/assets/chart-icon-custom-point.svg new file mode 100644 index 00000000..1a8cf826 --- /dev/null +++ b/src/assets/chart-icon-custom-point.svg @@ -0,0 +1,10 @@ + + + + + + + + + + diff --git a/src/assets/chart-icon-custom-rect-min.png b/src/assets/chart-icon-custom-rect-min.png deleted file mode 100644 index 6c3c86b6..00000000 Binary files a/src/assets/chart-icon-custom-rect-min.png and /dev/null differ diff --git a/src/assets/chart-icon-custom-rect.svg b/src/assets/chart-icon-custom-rect.svg new file mode 100644 index 00000000..59a25c7c --- /dev/null +++ b/src/assets/chart-icon-custom-rect.svg @@ -0,0 +1,8 @@ + + + + + + + + diff --git a/src/assets/chart-icon-dot-plot-horizontal-min.png b/src/assets/chart-icon-dot-plot-horizontal-min.png deleted file mode 100644 index 730af418..00000000 Binary files a/src/assets/chart-icon-dot-plot-horizontal-min.png and /dev/null differ diff --git a/src/assets/chart-icon-dot-plot-horizontal.svg b/src/assets/chart-icon-dot-plot-horizontal.svg new file mode 100644 index 00000000..d340a5d4 --- /dev/null +++ b/src/assets/chart-icon-dot-plot-horizontal.svg @@ -0,0 +1,18 @@ + + + + + + + + + + + + + + + + + + diff --git a/src/assets/chart-icon-dot-plot-vertical-min.png b/src/assets/chart-icon-dot-plot-vertical-min.png deleted file mode 100644 index 
ce3c370e..00000000 Binary files a/src/assets/chart-icon-dot-plot-vertical-min.png and /dev/null differ diff --git a/src/assets/chart-icon-dotted-line-min.png b/src/assets/chart-icon-dotted-line-min.png deleted file mode 100644 index 0ad4f471..00000000 Binary files a/src/assets/chart-icon-dotted-line-min.png and /dev/null differ diff --git a/src/assets/chart-icon-dotted-line.svg b/src/assets/chart-icon-dotted-line.svg new file mode 100644 index 00000000..fa9b6d08 --- /dev/null +++ b/src/assets/chart-icon-dotted-line.svg @@ -0,0 +1,20 @@ + + + + + + + + + + + + + + + + + + + + diff --git a/src/assets/chart-icon-heat-map-min.png b/src/assets/chart-icon-heat-map-min.png deleted file mode 100644 index 7319c580..00000000 Binary files a/src/assets/chart-icon-heat-map-min.png and /dev/null differ diff --git a/src/assets/chart-icon-heat-map.svg b/src/assets/chart-icon-heat-map.svg new file mode 100644 index 00000000..c08bf3b5 --- /dev/null +++ b/src/assets/chart-icon-heat-map.svg @@ -0,0 +1,25 @@ + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/assets/chart-icon-histogram-min.png b/src/assets/chart-icon-histogram-min.png deleted file mode 100644 index f1015ada..00000000 Binary files a/src/assets/chart-icon-histogram-min.png and /dev/null differ diff --git a/src/assets/chart-icon-histogram.svg b/src/assets/chart-icon-histogram.svg new file mode 100644 index 00000000..9b4dc990 --- /dev/null +++ b/src/assets/chart-icon-histogram.svg @@ -0,0 +1,14 @@ + + + + + + + + + + + + + + diff --git a/src/assets/chart-icon-kpi-card.svg b/src/assets/chart-icon-kpi-card.svg new file mode 100644 index 00000000..b1d42984 --- /dev/null +++ b/src/assets/chart-icon-kpi-card.svg @@ -0,0 +1,30 @@ + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/assets/chart-icon-line-min.png b/src/assets/chart-icon-line-min.png deleted file mode 100644 index b01484ed..00000000 Binary files a/src/assets/chart-icon-line-min.png and /dev/null differ diff --git 
a/src/assets/chart-icon-line.svg b/src/assets/chart-icon-line.svg new file mode 100644 index 00000000..84ed1d6f --- /dev/null +++ b/src/assets/chart-icon-line.svg @@ -0,0 +1,10 @@ + + + + + + + + + + diff --git a/src/assets/chart-icon-linear-regression-min.png b/src/assets/chart-icon-linear-regression-min.png deleted file mode 100644 index c03a3aa7..00000000 Binary files a/src/assets/chart-icon-linear-regression-min.png and /dev/null differ diff --git a/src/assets/chart-icon-linear-regression.svg b/src/assets/chart-icon-linear-regression.svg new file mode 100644 index 00000000..9c4b8782 --- /dev/null +++ b/src/assets/chart-icon-linear-regression.svg @@ -0,0 +1,15 @@ + + + + + + + + + + + + + + + diff --git a/src/assets/chart-icon-pie-min.png b/src/assets/chart-icon-pie-min.png deleted file mode 100644 index b53957c2..00000000 Binary files a/src/assets/chart-icon-pie-min.png and /dev/null differ diff --git a/src/assets/chart-icon-pie.svg b/src/assets/chart-icon-pie.svg new file mode 100644 index 00000000..7fe15948 --- /dev/null +++ b/src/assets/chart-icon-pie.svg @@ -0,0 +1,9 @@ + + + + + + + + diff --git a/src/assets/chart-icon-pyramid-min.png b/src/assets/chart-icon-pyramid-min.png deleted file mode 100644 index 25b2ffb2..00000000 Binary files a/src/assets/chart-icon-pyramid-min.png and /dev/null differ diff --git a/src/assets/chart-icon-scatter-min.png b/src/assets/chart-icon-scatter-min.png deleted file mode 100644 index 2e5fd004..00000000 Binary files a/src/assets/chart-icon-scatter-min.png and /dev/null differ diff --git a/src/assets/chart-icon-scatter.svg b/src/assets/chart-icon-scatter.svg new file mode 100644 index 00000000..11ca74c9 --- /dev/null +++ b/src/assets/chart-icon-scatter.svg @@ -0,0 +1,13 @@ + + + + + + + + + + + + + diff --git a/src/assets/chart-icon-table-min.png b/src/assets/chart-icon-table-min.png deleted file mode 100644 index 56449ef8..00000000 Binary files a/src/assets/chart-icon-table-min.png and /dev/null differ diff --git 
a/src/components/ChartTemplates.tsx b/src/components/ChartTemplates.tsx index 30b099ab..ac5c8d93 100644 --- a/src/components/ChartTemplates.tsx +++ b/src/components/ChartTemplates.tsx @@ -19,37 +19,37 @@ import InsightsIcon from '@mui/icons-material/Insights'; import React from "react"; // Import all chart icons statically so they are included in the build -import chartIconScatter from '../assets/chart-icon-scatter-min.png'; -import chartIconLinearRegression from '../assets/chart-icon-linear-regression-min.png'; -import chartIconDotPlotHorizontal from '../assets/chart-icon-dot-plot-horizontal-min.png'; -import chartIconBoxPlot from '../assets/chart-icon-box-plot-min.png'; -import chartIconColumn from '../assets/chart-icon-column-min.png'; -import chartIconColumnGrouped from '../assets/chart-icon-column-grouped-min.png'; -import chartIconColumnStacked from '../assets/chart-icon-column-stacked-min.png'; -import chartIconHistogram from '../assets/chart-icon-histogram-min.png'; -import chartIconHeatMap from '../assets/chart-icon-heat-map-min.png'; -import chartIconLine from '../assets/chart-icon-line-min.png'; -import chartIconDottedLine from '../assets/chart-icon-dotted-line-min.png'; -import chartIconCustomPoint from '../assets/chart-icon-custom-point-min.png'; -import chartIconCustomLine from '../assets/chart-icon-custom-line-min.png'; -import chartIconCustomBar from '../assets/chart-icon-custom-bar-min.png'; -import chartIconCustomRect from '../assets/chart-icon-custom-rect-min.png'; -import chartIconCustomArea from '../assets/chart-icon-custom-area-min.png'; +import chartIconScatter from '../assets/chart-icon-scatter.svg'; +import chartIconLinearRegression from '../assets/chart-icon-linear-regression.svg'; +import chartIconDotPlotHorizontal from '../assets/chart-icon-dot-plot-horizontal.svg'; +import chartIconBoxPlot from '../assets/chart-icon-box-plot.svg'; +import chartIconColumn from '../assets/chart-icon-column.svg'; +import chartIconColumnGrouped from 
'../assets/chart-icon-column-grouped.svg'; +import chartIconColumnStacked from '../assets/chart-icon-column-stacked.svg'; +import chartIconHistogram from '../assets/chart-icon-histogram.svg'; +import chartIconHeatMap from '../assets/chart-icon-heat-map.svg'; +import chartIconLine from '../assets/chart-icon-line.svg'; +import chartIconCustomPoint from '../assets/chart-icon-custom-point.svg'; +import chartIconCustomLine from '../assets/chart-icon-custom-line.svg'; +import chartIconCustomBar from '../assets/chart-icon-custom-bar.svg'; +import chartIconCustomRect from '../assets/chart-icon-custom-rect.svg'; +import chartIconCustomArea from '../assets/chart-icon-custom-area.svg'; import chartIconArea from '../assets/chart-icon-area.svg'; import chartIconStreamgraph from '../assets/chart-icon-streamgraph.svg'; import chartIconDensity from '../assets/chart-icon-density.svg'; import chartIconLollipop from '../assets/chart-icon-lollipop.svg'; -import chartIconPie from '../assets/chart-icon-pie-min.png'; +import chartIconPie from '../assets/chart-icon-pie.svg'; import chartIconUSMap from '../assets/chart-icon-us-map-min.png'; import chartIconPyramid from '../assets/chart-icon-pyramid.svg'; import chartIconWorldMap from '../assets/chart-icon-world-map-min.png'; -import chartIconDotPlotVertical from '../assets/chart-icon-dot-plot-vertical-min.png'; import chartIconCandlestick from '../assets/chart-icon-candlestick.svg'; import chartIconWaterfall from '../assets/chart-icon-waterfall.svg'; import chartIconStripPlot from '../assets/chart-icon-strip-plot.svg'; import chartIconRadar from '../assets/chart-icon-radar.svg'; import chartIconBump from '../assets/chart-icon-bump.svg'; import chartIconRose from '../assets/chart-icon-rose.svg'; +import chartIconBarTable from '../assets/chart-icon-bar-table.svg'; +import chartIconKpiCard from '../assets/chart-icon-kpi-card.svg'; // Chart Icon Component using static imports const ChartIcon: React.FC<{ src: string; alt?: string }> = ({ src, 
alt = "" }) => { @@ -67,6 +67,7 @@ export const CHART_ICONS: Record = { "Ranged Dot Plot": , "Boxplot": , "Bar Chart": , + "Bar Table": , "Pyramid Chart": , "Grouped Bar Chart": , "Stacked Bar Chart": , @@ -77,7 +78,6 @@ export const CHART_ICONS: Record = { "Pie Chart": , "Rose Chart": , "Line Chart": , - "Dotted Line Chart": , "Bump Chart": , "Area Chart": , "Streamgraph": , @@ -87,6 +87,7 @@ export const CHART_ICONS: Record = { "Waterfall Chart": , "Strip Plot": , "Radar Chart": , + "KPI Card": , "Custom Point": , "Custom Line": , "Custom Bar": , diff --git a/src/components/ComponentType.tsx b/src/components/ComponentType.tsx index afe1c7d1..3746db42 100644 --- a/src/components/ComponentType.tsx +++ b/src/components/ComponentType.tsx @@ -3,7 +3,7 @@ import { Type } from '../data/types'; import { channels, type ChartTemplateDef } from '../lib/agents-chart'; -import { inferTypeFromValueArray } from '../data/utils'; +import { inferTypeFromValueArray, refineTemporalType } from '../data/utils'; export type FieldSource = "custom" | "original"; @@ -58,15 +58,24 @@ export interface ClarificationResponse { source: 'option' | 'free_text' | 'freeform'; } +export type DelegateTarget = 'data_loading' | 'report_gen'; + export interface InteractionEntry { from: Actor; to: Actor; - role: 'prompt' | 'clarify' | 'instruction' | 'summary' | 'error' | 'explain'; + role: 'prompt' | 'clarify' | 'instruction' | 'summary' | 'error' | 'explain' | 'delegate'; plan?: string; // agent's reasoning / thought for this action content: string; displayContent?: string; inputTableNames?: string[]; // table names actually used for this derivation step clarificationQuestions?: ClarificationQuestion[]; + /** For 'delegate' entries: which peer agent the Data Agent wants to + * hand off to. Rendered as one or two one-click button cards. */ + delegateTarget?: DelegateTarget; + /** For 'delegate' entries: 1–2 hand-off option prompts. 
Each string + * is shown on its own button and used as the seed prompt sent to + * the target agent on click. */ + delegateOptions?: string[]; timestamp?: number; } @@ -155,16 +164,6 @@ export interface PendingTableLoad { csvScratchPath: string; preview: InlineTablePreview; confirmed: boolean; - // For sample dataset loading - sampleDataset?: { - datasetName: string; - tables: Array<{ - tableUrl: string; - format: string; - }>; - live?: boolean; - refreshIntervalSeconds?: number; - }; } export interface LoadPlanCandidate { @@ -177,6 +176,8 @@ export interface LoadPlanCandidate { sortBy?: string; sortOrder?: 'asc' | 'desc'; selected?: boolean; + /** Backend-detected reason this candidate cannot be loaded (unknown source_id, missing table_key, etc.). */ + resolutionError?: string; } export interface LoadPlan { @@ -243,6 +244,15 @@ export interface DictTable { type: Type, semanticType: string, levels: any[], + // Parallel to `levels` (same order); only populated when `levels` + // was filled by the backend column-stats pass (design-doc 31). + // When `levels` is curated (chart-gallery / LLM) this stays undefined + // and the column filter checklist hides the count column. + levelCounts?: number[], + // Total distinct non-null values; drives the filter popover variant + // (≤ 100 → checklist, > 100 → keyword search). 
+ distinctCount?: number, + nullCount?: number, intrinsicDomain?: [number, number], unit?: string, displayName?: string, @@ -305,14 +315,18 @@ export function createDictTable( displayId: `${id}`, names, rows, - metadata: names.reduce((acc, name) => ({ - ...acc, - [name]: { - type: inferTypeFromValueArray(rows.map(r => r[name])), - semanticType: "", - levels: [] - } - }), {}), + metadata: names.reduce((acc, name) => { + const colValues = rows.map(r => r[name]); + const inferred = inferTypeFromValueArray(colValues); + return { + ...acc, + [name]: { + type: refineTemporalType(colValues, inferred), + semanticType: "", + levels: [] + } + }; + }, {}), derive, virtual, anchored, @@ -352,13 +366,12 @@ export type Chart = { chartType: string, encodingMap: EncodingMap, tableRef: string, - saved: boolean, source: "user" | "trigger", config?: Record, // additional chart properties defined by the chart template - thumbnail?: string, // PNG data URL for thumbnail display (managed by ChartRenderService, not persisted) insight?: ChartInsight, // AI-generated insight about the visualization styleVariants?: ChartStyleVariant[], // user-authored style refinements (see ChartStyleVariant) activeVariantId?: string, // id of the variant currently rendered in the focused canvas; undefined = default + unread?: boolean, // true for agent-generated charts the user hasn't focused yet; cleared on focus } /** Compute a string key for insight invalidation: chartType|sortedFieldIds */ @@ -394,7 +407,6 @@ export let duplicateChart = (chart: Chart) : Chart => { chartType: chart.chartType, encodingMap: JSON.parse(JSON.stringify(chart.encodingMap)) as EncodingMap, tableRef: chart.tableRef, - saved: false, source: chart.source, config: chart.config ? 
JSON.parse(JSON.stringify(chart.config)) : undefined, // styleVariants are intentionally NOT copied: they are user-authored diff --git a/src/components/ConnectorTablePreview.tsx b/src/components/ConnectorTablePreview.tsx index 7c2991db..11087653 100644 --- a/src/components/ConnectorTablePreview.tsx +++ b/src/components/ConnectorTablePreview.tsx @@ -19,7 +19,6 @@ import { Box, Button, CircularProgress, - Collapse, IconButton, MenuItem, Stack, @@ -31,8 +30,6 @@ import AddIcon from '@mui/icons-material/Add'; import CloseIcon from '@mui/icons-material/Close'; import RefreshIcon from '@mui/icons-material/Refresh'; import CheckIcon from '@mui/icons-material/Check'; -import KeyboardArrowDownIcon from '@mui/icons-material/KeyboardArrowDown'; -import KeyboardArrowRightIcon from '@mui/icons-material/KeyboardArrowRight'; import { DataFrameTable } from '../views/DataFrameTable'; import { fetchWithIdentity, CONNECTOR_ACTION_URLS, SourceTableRef } from '../app/utils'; @@ -172,7 +169,6 @@ export const ConnectorTablePreview: React.FC = ({ // Preview refresh loading (separate from parent loading) const [refreshing, setRefreshing] = useState(false); - const [metadataExpanded, setMetadataExpanded] = useState(false); const isLoading = loading || refreshing; const effectiveDesc = (tableDescription || sourceDescription || '').trim(); @@ -439,67 +435,126 @@ export const ConnectorTablePreview: React.FC = ({ return ( - {/* Header — name + row count + max rows */} + {/* Header — name + row count */} {displayName} {pathBreadcrumb && ( {pathBreadcrumb} )} - {rowCount != null && ( - - {t('connectorPreview.rowCount', { count: Number(rowCount).toLocaleString(), defaultValue: '{{count}} rows' })} - {sampleRows.length > 0 && ( - - ({t('connectorPreview.previewRowsNotice', { count: sampleRows.length, defaultValue: `Preview shows first ${sampleRows.length} rows only` })}) - - )} - - )} - - + {(() => { + // Row-count line — ALWAYS rendered as a single + // `Typography` so the header height is 
identical + // across loading/loaded states and across tables + // (no wrap, no conditional rendering, no visibility + // toggling between elements). + // + // `rowCount` is shown only when reliable as the + // true table total: + // - it exceeds the preview sample (more rows + // exist than we returned), OR + // - the sample is shorter than the preview cap + // of 10 (we exhausted the table). + // Otherwise we fall back to the "Preview shows + // first N rows" notice, or — during loading — a + // hidden non-breaking space placeholder that + // reserves the same line height. + const PREVIEW_CAP = 10; + const sampleLen = sampleRows.length; + const totalReliable = rowCount != null && (rowCount > sampleLen || sampleLen < PREVIEW_CAP); + const previewNotice = t('connectorPreview.previewRowsNotice', { + count: sampleLen, + defaultValue: `Preview shows first ${sampleLen} rows only`, + }); + + let content: React.ReactNode; + let hidden = false; + if (totalReliable) { + content = ( + <> + {t('connectorPreview.rowCount', { + count: Number(rowCount).toLocaleString(), + defaultValue: '{{count}} rows', + })} + {sampleLen > 0 && rowCount! > sampleLen && ( + + ({previewNotice}) + + )} + + ); + } else if (sampleLen > 0) { + content = previewNotice; + } else { + content = '\u00A0'; + hidden = true; + } - {hasMetadataRow && ( - - setMetadataExpanded(v => !v)} - sx={{ - display: 'flex', alignItems: 'center', gap: 0.5, - py: 0.25, cursor: 'pointer', userSelect: 'none', - '&:hover': { bgcolor: 'action.hover' }, - }} - > - {metadataExpanded - ? - : } - - {t('connectorPreview.sourceMetadata')} - - - - + return ( - - + ); + })()} + + + + {hasMetadataRow && ( + + + {effectiveDesc} + )} - {/* Preview table — sizes to its content (10-row cap upstream). - Scrollbar only kicks in if the parent container is height- - constrained below the table's natural size. */} - - {isLoading ? 
( - - + {/* Preview table — uses a *fixed* height (not minHeight) so the + section is identical across all tables and across the + loading→loaded transition. The value (290px) covers the + worst case: 10 compact rows (~220) + header (~22) + the + "…" continuation row that DataFrameTable renders when the + full table exceeds 10 rows (~22) + horizontal scrollbar + lane for wide tables (~15) + cell borders (~6). + + Overflow is *horizontal only*: content is intrinsically + capped at 10 rows + header + "…" row, so a vertical + scrollbar would never represent real overflow — it would + only appear as a side effect of the horizontal scrollbar + eating into the height. `overflowY: hidden` keeps that + from happening. */} + + {isLoading && sampleRows.length === 0 ? ( + + ) : sampleRows.length > 0 ? ( = ({ {/* Filter conditions — sits *below* the preview, next to the Preview-refresh button so editing filters and applying them - happen in the same place. */} - {enableFilters && columns.length > 0 && !alreadyLoaded && ( + happen in the same place. + + Rendered as soon as `enableFilters` is on and the table + isn't already loaded — we *don't* gate on `columns.length` + so the filter row reserves its own space during loading + (otherwise the popover would grow when columns arrive). + The Add-filter button is disabled until columns are known + and the Preview button uses the same isLoading guard. */} + {enableFilters && !alreadyLoaded && ( {filters.map((f, idx) => { const colMeta = columns.find(c => c.name === f.column); @@ -589,6 +651,7 @@ export const ConnectorTablePreview: React.FC = ({ - - {/* Filter/sort/limit details */} - - {c.filters?.length ? c.filters.map((f, fi) => ( - - )) : ( - - )} - {c.sortBy && ( - - )} - - - - {preview?.loading ? ( - - - - {t('dataLoading.loadPlan.previewing')} - - - ) : preview?.error ? ( - - {preview.error} - - ) : preview && preview.rows.length > 0 ? ( - - ) : preview ? 
( - - {t('connectorPreview.noMatchingRows')} - - ) : null} - - - + + : toggleItem(i)} sx={{ p: 0.25 }} />} + trailing={showSourceLabel + ? ({c.sourceId}) + : undefined} + filterChips={hasFilters ? ( + <> + {c.filters?.map((f, fi) => ( + + ))} + {c.sortBy && ( + + )} + + ) : undefined} + preview={previewData} + expanded={!!preview?.expanded} + onTogglePreview={unresolved ? undefined : () => handlePreview(c, i)} + dim={unresolved} + unresolved={unresolved ? { + message: t('dataLoading.loadPlan.unresolved', { + defaultValue: "Couldn't resolve this table — the agent should rerun search and try again.", + }), + detail: c.resolutionError, + } : undefined} + /> ); })} - {/* Load button */} - {!confirmed && ( - + {/* Footer: action button (unconfirmed) or quiet caption (confirmed). + When every candidate shares one source, surface it once down + here instead of duplicating it on each row. */} + + {!showSourceLabel && sharedSourceId && ( + + {t('dataLoading.loadPlan.fromSource', { defaultValue: 'from' })} {sharedSourceId} + + )} + + {confirmed ? ( + + {t('dataLoading.loadPlan.loadedCount', { + count: plan.candidates.filter(c => !c.resolutionError).length, + defaultValue: '✓ Loaded', + })} + + ) : ( - - )} + )} + + + ); +}; + +// --------------------------------------------------------------------------- +// PendingLoadsCard +// --------------------------------------------------------------------------- +// Renders one or more agent-proposed scratch-CSV table loads using the +// same visual shell as `LoadPlanCard` above, so users see a consistent +// multi-table import UI regardless of whether candidates come from a +// connector plan or a notebook-style extract step. 
+ +interface PendingLoadsCardProps { + pendingLoads: PendingTableLoad[]; + onLoad: (pending: PendingTableLoad) => Promise | void; +} + +export const PendingLoadsCard: React.FC = ({ pendingLoads, onLoad }) => { + const theme = useTheme(); + const { t } = useTranslation(); + + // Confirmed = already loaded earlier; unconfirmed = selectable. + const [selection, setSelection] = useState>( + () => Object.fromEntries(pendingLoads.map((p, i) => [i, !p.confirmed])) + ); + // Auto-expand previews — scratch CSV samples are already inlined + // client-side, so there's no fetch cost to showing them by default. + const [expanded, setExpanded] = useState>( + () => Object.fromEntries(pendingLoads.map((_, i) => [i, true])) + ); + const [loading, setLoading] = useState(false); + + const allConfirmed = pendingLoads.every(p => p.confirmed); + const selectedCount = Object.entries(selection) + .filter(([i, on]) => on && !pendingLoads[Number(i)].confirmed).length; + + const toggleItem = (idx: number) => + setSelection(prev => ({ ...prev, [idx]: !prev[idx] })); + const togglePreview = (idx: number) => + setExpanded(prev => ({ ...prev, [idx]: !prev[idx] })); + + const handleConfirm = async () => { + if (selectedCount === 0) return; + setLoading(true); + try { + for (let i = 0; i < pendingLoads.length; i++) { + if (selection[i] && !pendingLoads[i].confirmed) { + await onLoad(pendingLoads[i]); + } + } + } finally { + setLoading(false); + } + }; + + const isDark = theme.palette.mode === 'dark'; + const borderColorBase = allConfirmed + ? alpha(theme.palette.success.main, 0.3) + : alpha(theme.palette.primary.main, isDark ? 0.25 : 0.15); + const borderColorHover = allConfirmed + ? alpha(theme.palette.success.main, 0.45) + : alpha(theme.palette.primary.main, isDark ? 0.4 : 0.3); + const shadowBase = isDark + ? '0 1px 2px rgba(0,0,0,0.4), 0 1px 3px rgba(0,0,0,0.2)' + : '0 1px 2px rgba(0,0,0,0.04), 0 1px 3px rgba(0,0,0,0.03)'; + const shadowHover = isDark + ? 
'0 2px 4px rgba(0,0,0,0.5), 0 2px 6px rgba(0,0,0,0.3)' + : '0 2px 4px rgba(0,0,0,0.06), 0 2px 6px rgba(0,0,0,0.04)'; + + return ( + + {pendingLoads.map((p, i) => { + const preview = p.preview; + const rowLabel = preview.totalRows > preview.sampleRows.length + ? `${preview.totalRows.toLocaleString()} ${t('dataLoading.rows')}` + : ''; + const meta = [rowLabel, `${preview.columns.length} ${t('dataLoading.cols')}`] + .filter(Boolean).join(' · '); + + const previewData: TablePreviewData = { + state: 'ready', + columns: preview.columns, + rows: preview.sampleRows, + totalRows: preview.totalRows, + }; + + return ( + + : toggleItem(i)} sx={{ p: 0.25 }} />} + preview={previewData} + expanded={!!expanded[i]} + onTogglePreview={preview.sampleRows.length > 0 ? () => togglePreview(i) : undefined} + /> + ); + })} + + + + {allConfirmed ? ( + + {t('dataLoading.loadPlan.loadedCount', { + count: pendingLoads.length, + defaultValue: '✓ Loaded', + })} + + ) : ( + + )} + ); }; diff --git a/src/components/TablePreviewRow.tsx b/src/components/TablePreviewRow.tsx new file mode 100644 index 00000000..aff9d9e6 --- /dev/null +++ b/src/components/TablePreviewRow.tsx @@ -0,0 +1,114 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +import React from 'react'; +import { Box, Button, CircularProgress, Collapse, Typography } from '@mui/material'; +import ErrorOutlineIcon from '@mui/icons-material/ErrorOutline'; +import { useTranslation } from 'react-i18next'; +import { DataFrameTable } from '../views/DataFrameTable'; + +// Shared header+collapsible-preview row used by LoadPlanCard and the +// inline previews in DataLoadingChat. Pure visual; no fetching, no state. 
+ +export interface TablePreviewData { + state: 'idle' | 'loading' | 'error' | 'ready'; + error?: string; + columns?: string[]; + rows?: Record[]; + totalRows?: number; +} + +export interface TablePreviewRowProps { + name: string; + meta?: string; + leading?: React.ReactNode; // checkbox/check icon + trailing?: React.ReactNode; // e.g. source-id caption + filterChips?: React.ReactNode; // optional chip row under header + preview: TablePreviewData; + expanded: boolean; + onTogglePreview?: () => void; + unresolved?: { message: string; detail?: string }; + dim?: boolean; +} + +export const TablePreviewRow: React.FC = ({ + name, meta, leading, trailing, filterChips, + preview, expanded, onTogglePreview, unresolved, dim = false, +}) => { + const { t } = useTranslation(); + const showPreviewButton = !!onTogglePreview && !unresolved; + const isLoading = preview.state === 'loading'; + const indent = leading ? 3.5 : 0; + + const buttonLabel = isLoading + ? t('dataLoading.loadPlan.previewing') + : expanded + ? t('dataLoading.loadPlan.hidePreview', { defaultValue: 'Hide' }) + : t('dataLoading.loadPlan.preview'); + + return ( + + + {leading} + {unresolved && } + {name} + {meta && {meta}} + {trailing} + + {showPreviewButton && ( + + )} + + + {unresolved ? ( + + {unresolved.message} + {unresolved.detail && ( + + {unresolved.detail} + + )} + + ) : ( + <> + {filterChips && ( + + {filterChips} + + )} + + + {preview.state === 'loading' ? ( + + + + {t('dataLoading.loadPlan.previewing')} + + + ) : preview.state === 'error' ? ( + + {preview.error || t('dataLoading.loadPlan.previewFailed')} + + ) : preview.state === 'ready' && (preview.rows?.length ?? 0) > 0 ? ( + + ) : preview.state === 'ready' ? 
( + + {t('connectorPreview.noMatchingRows')} + + ) : null} + + + + )} + + ); +}; diff --git a/src/components/VirtualizedCatalogTree.tsx b/src/components/VirtualizedCatalogTree.tsx index c064101d..4999d5ef 100644 --- a/src/components/VirtualizedCatalogTree.tsx +++ b/src/components/VirtualizedCatalogTree.tsx @@ -15,7 +15,6 @@ import { useTranslation } from 'react-i18next'; import { FixedSizeList, ListChildComponentProps } from 'react-window'; import { Box, Tooltip, Typography, useTheme } from '@mui/material'; import CheckIcon from '@mui/icons-material/Check'; -import FolderOutlinedIcon from '@mui/icons-material/FolderOutlined'; import DashboardOutlinedIcon from '@mui/icons-material/DashboardOutlined'; import InfoOutlinedIcon from '@mui/icons-material/InfoOutlined'; import ExpandMoreIcon from '@mui/icons-material/ExpandMore'; @@ -97,6 +96,27 @@ interface RowContext { // ─── Row component (react-window v1 API) ──────────────────────────────────── +// ── Layout constants (Notion/outliner-style: one glyph per row) ── +// Each row has exactly ONE leading glyph in the "item slot": +// - Namespace (folder-like, no semantic icon): chevron itself acts as the +// slot glyph; its rotation signals expanded vs collapsed. +// - Table leaf: TableIcon. +// - Group: DashboardOutlinedIcon (semantic — distinguishes a multi- +// table dataset from a plain namespace; clickable to toggle). +// No separate chevron-in-gutter, so adjacent folder/table rows automatically +// share the same icon-and-label columns. +// +// | depth*INDENT | slot(16) | GAP | label … +// +// Outer leading inset is provided by the catalog tree's wrapper (in +// DataSourceSidebar.tsx), not by this component, so it can be tuned to align +// with the connector header's icon column. +const INDENT_PER_LEVEL = 12; +const ITEM_SLOT = 16; +const ITEM_LABEL_GAP = 4; +/** Left padding for the row's content (slot + label). 
*/ +const rowPadLeft = (depth: number) => depth * INDENT_PER_LEVEL; + function CatalogRow({ index, style, data }: ListChildComponentProps) { const { rows, loadedMap, onToggle, onItemClick, onLoadMore, onDragStart, renderTableActions, selectedItemId } = data; const row = rows[index]; @@ -111,10 +131,14 @@ function CatalogRow({ index, style, data }: ListChildComponentProps) const isNamespace = node.node_type === 'namespace'; const isExpandable = isNamespace || isGroup; + // Lazy / load-more rows: align the leading text with where a real row's + // label would sit at this depth: pl + slot + gap. + const placeholderPadLeft = `${rowPadLeft(depth) + ITEM_SLOT + ITEM_LABEL_GAP}px`; + if (isLazyPlaceholder) { return (
- + Loading…
@@ -124,7 +148,7 @@ function CatalogRow({ index, style, data }: ListChildComponentProps) if (isLoadMore) { return (
- + ) ) ...(isSelected ? { backgroundColor: theme.palette.action.selected, fontWeight: 500 } : {}), }} > - {/* Expand/collapse arrow */} - {isExpandable ? ( - - {isExpanded - ? - : } - - ) : ( - - )} - {/* Icon */} - {isGroup - ? - : isTable - ? - : - } + {/* Single leading glyph (slot) — chevron for namespaces, + type icon for tables/groups. */} + + {isNamespace + ? (isExpanded + ? + : ) + : isGroup + ? + : isTable + ? + : null} + {/* Label */} {node.name} @@ -321,7 +346,6 @@ export const VirtualizedCatalogTree: React.FC = ({ return ( - {/* @ts-expect-error react-window v1 class component vs React 19 JSX type */} { return types[0]; }; +/** + * Downgrade DateTime to Date when every non-null value has a zero time component. + * Call this on a column AFTER type inference to refine the temporal precision. + */ +export const refineTemporalType = (values: any[], inferredType: Type): Type => { + if (inferredType !== Type.DateTime) return inferredType; + const nonNull = values.filter(v => v != null && v !== ''); + if (nonNull.length === 0) return inferredType; + const allMidnight = nonNull.every(v => { + if (typeof v !== 'string') return false; + const tIdx = v.indexOf('T'); + if (tIdx === -1) return false; + const timePart = v.slice(tIdx + 1).replace(/[Z+-].*$/, ''); + const parts = timePart.split(':'); + return parts.length >= 2 && parts.every(p => parseFloat(p) === 0); + }); + return allMidnight ? 
Type.Date : Type.DateTime; +}; + export const convertTypeToDtype = (type: Type | undefined): string => { switch (type) { case Type.Integer: diff --git a/src/gallery/ChartGallery.tsx b/src/gallery/ChartGallery.tsx index e46ad431..75d625c9 100644 --- a/src/gallery/ChartGallery.tsx +++ b/src/gallery/ChartGallery.tsx @@ -1419,7 +1419,6 @@ const CHART_ICON_ALIAS: Record = { 'Pyramid': 'Pyramid Chart', 'Candlestick': 'Candlestick Chart', 'Waterfall': 'Waterfall Chart', - 'Dotted Line': 'Dotted Line Chart', 'Ranged Dot': 'Ranged Dot Plot', 'Density': 'Density Plot', 'Strip': 'Strip Plot', diff --git a/src/gallery/GallerySidebar.tsx b/src/gallery/GallerySidebar.tsx index 33b86ff7..3f437141 100644 --- a/src/gallery/GallerySidebar.tsx +++ b/src/gallery/GallerySidebar.tsx @@ -65,7 +65,6 @@ const CHART_ICON_ALIAS: Record = { 'Pyramid': 'Pyramid Chart', 'Candlestick': 'Candlestick Chart', 'Waterfall': 'Waterfall Chart', - 'Dotted Line': 'Dotted Line Chart', 'Ranged Dot': 'Ranged Dot Plot', 'Density': 'Density Plot', 'Strip': 'Strip Plot', diff --git a/src/i18n/locales/en/chart.json b/src/i18n/locales/en/chart.json index 1f58a977..a942ab03 100644 --- a/src/i18n/locales/en/chart.json +++ b/src/i18n/locales/en/chart.json @@ -1,6 +1,16 @@ { "chart": { - "derivedConcepts": "Derived Concepts", + "vegaLocale": { + "dateTime": "%x %A %X", + "date": "%-m/%-d/%Y", + "time": "%-I:%M:%S %p", + "periods": ["AM", "PM"], + "days": ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"], + "shortDays": ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"], + "months": ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"], + "shortMonths": ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"] + }, + "derivedConcepts": "Formulas", "dataTransformCode": "Data transformation code", "dataTransformExplanation": "Data transformation explanation", "zoomIn": "zoom in", @@ -25,8 +35,13 
@@ "code": "code", "agentLog": "Agent Log", "explain": "explain", - "concepts": "explanation", - "orStartWithChartType": "or, start with a chart type", + "concepts": "formulas", + "orStartWithChartType": "create a new chart?", + "orCreateYourself": "or create one yourself?", + "emptyStateTitle": "Ready to explore your data?", + "emptyStateSubtitle": "Ask the agent a question in the chat — it can suggest ideas, explain data, transform data, and build charts for you.", + "emptyStateChatHint": "Try the chat input at the bottom-left", + "emptyStateOrPickType": "Or pick a chart type to start manually", "resample": "Resample", "adjustSampleSize": "Adjust sample size: {{sampleSize}} / {{totalSize}} rows", "log": "log", @@ -71,7 +86,6 @@ "lollipopChart": "Lollipop Chart", "pyramidChart": "Pyramid Chart", "lineChart": "Line Chart", - "dottedLineChart": "Dotted Line Chart", "bumpChart": "Bump Chart", "areaChart": "Area Chart", "streamgraph": "Streamgraph", @@ -91,12 +105,12 @@ "customArea": "Custom Area" }, "chartCategoryTip": { - "scatterAndPoint": "Scatter & point-based charts", - "bar": "Bar & column charts", - "lineAndArea": "Line & area charts", - "partToWhole": "Part-to-whole comparison charts", - "statistical": "Statistical analysis charts", - "map": "Geographic map charts", + "points": "Point-based charts (scatter, dot, regression)", + "bars": "Bar & column charts", + "distributions": "Distribution & statistical charts", + "linesAndAreas": "Line & area charts", + "circular": "Radial charts (pie, rose, radar)", + "tablesAndMaps": "Tile, table, KPI & map charts", "custom": "Custom mark types" }, "gallery": { @@ -151,7 +165,6 @@ "histogram": "Histogram", "heatmap": "Heatmap", "lineChart": "Line Chart", - "dottedLineChart": "Dotted Line Chart", "boxplot": "Boxplot", "pieChart": "Pie Chart", "rangedDotPlot": "Ranged Dot Plot", diff --git a/src/i18n/locales/en/common.json b/src/i18n/locales/en/common.json index 7be09ba2..7a52125b 100644 --- 
a/src/i18n/locales/en/common.json +++ b/src/i18n/locales/en/common.json @@ -377,7 +377,8 @@ "partial": "Partial", "unavailable": "Unavailable", "not_synced": "Not synced" - } + }, + "loadInNewSession": "Load in new session" }, "dataThread": { "title": "Data Threads", @@ -415,6 +416,7 @@ "threadIndex": "thread - {{index}}", "continuedFromAbove": "continued", "continuesBelow": "continues", + "usingSources": "Using", "hmm": "hmm...", "oops": "oops...", "completed": "completed", @@ -439,6 +441,10 @@ "pinTable": "Pin table", "refreshSettings": "Refresh settings", "replaceData": "Replace data", + "viewMetadata": "View metadata", + "metadataFor": "Metadata for {{table}}", + "derivationSummary": "Derivation summary", + "noMetadata": "No description available for this table.", "rowsByColumns": "{{rows}}r \u00d7 {{cols}}c", "chartAlt": "{{type}} chart", "streamSourceLabel": "stream", @@ -498,6 +504,7 @@ "interactive": "interactive", "agent": "agent", "getIdeas": "Get Ideas", + "whatsNext": "What's next?", "editor": "Editor", "getIdeasForVisualization": "get ideas for visualization", "differentIdeas": "Different ideas?", @@ -551,6 +558,12 @@ "invalidExplanation": "The agent returned an invalid explanation.", "explanationTitle": "Explanation", "explanationFollowupsLabel": "Possible next steps:", + "delegateTitle": "Suggested next agent", + "delegateMinimize": "Minimize", + "delegateExpand": "Expand", + "delegateDismiss": "Dismiss", + "delegateToDataLoading": "Search in Data Loading", + "delegateToReportGen": "Generate report", "errorDuringExploration": "Error during exploration", "explorationStep": "Exploration step {{step}}: {{question}}", "threadExplorePrompt": "Explore interesting patterns and trends in this data", @@ -570,7 +583,37 @@ "loadedOfTotal": "{{loaded}} / {{total}} rows", "viewRandomRows": "view 10000 random rows from this table", "downloadAsCsv": "Download as CSV", - "downloading": "Downloading..." 
+ "downloading": "Downloading...", + "columnMenu": { + "openMenu": "Column options", + "sortAsc": "Sort ascending", + "sortDesc": "Sort descending", + "clearSort": "Clear sort", + "filter": "Filter…", + "filterActive": "Filter (active)", + "clearFilter": "Clear filter", + "filterComingSoon": "Filter UI coming next." + }, + "filter": { + "from": "From", + "to": "To", + "includeBlanks": "Show blanks", + "showBlanksOnly": "Show blanks only", + "contains": "Contains…", + "blank": "(blank)", + "apply": "Apply", + "clear": "Clear filter", + "search": "Search values", + "selectAll": "(Select all)", + "noMatches": "No matching values", + "distinctHint": "{{count}} distinct values", + "sectionSort": "Sort", + "sectionFilter": "Filter", + "filterApplied": "Filter applied", + "summaryRows": "{{count, number}} rows", + "summaryDistinct": "{{count, number}} distinct", + "summaryBlanks": "{{count, number}} blanks" + } }, "chatDialog": { "noHistory": "There is no conversation history yet", @@ -609,6 +652,7 @@ "completingLogin": "Completing login…", "idpRedirecting": "Redirecting from SSO, please wait…", "callbackFailed": "Login callback failed: {{message}}", + "ssoErrorAccessDenied": "Authorization was cancelled. If you wish to use SSO, please try signing in again.", "ssoErrorInvalidState": "SSO session expired or was interrupted. Please try signing in again.", "ssoErrorInvalidClient": "SSO client credentials are incorrect. Please contact your administrator to verify the configuration.", "ssoErrorTokenExchange": "SSO login failed during token exchange. Please try again or contact your administrator.", @@ -657,7 +701,26 @@ "recoveredSession": "Recovered Session", "errorOccurred": "An error has occurred, please", "refreshSession": "refresh the session", - "errorPersistHint": "If the problem still exists, click close session." 
+ "errorPersistHint": "If the problem still exists, click close session.", + "yourSessions": "Your Sessions", + "rename": "Rename", + "export": "Export", + "importZip": "Import workspace (.zip)", + "importingFile": "Importing {{name}}...", + "deleteTitle": "Delete session?", + "deleteConfirm": "This will permanently delete {{name}} ({{id}}) and all its data.", + "deleteFailed": "Failed to delete workspace", + "renameFailed": "Failed to rename workspace", + "exportFailed": "Failed to export workspace", + "importFailed": "Failed to import workspace", + "sortNewest": "newest", + "sortOldest": "oldest", + "sortRecentlyModified": "recently modified", + "sortName": "name", + "sortNewestFirst": "newest first", + "sortOldestFirst": "oldest first", + "sortRecentlyModifiedFirst": "recently modified", + "sortNameAsc": "name (a–z)" }, "supersetCatalog": { "title": "Superset Datasets", @@ -694,13 +757,17 @@ "interaction": { "askedForClarification": "asked for clarification", "gaveExplanation": "shared an explanation", + "delegatedToDataLoading": "suggested loading more data", + "delegatedToReportGen": "suggested generating a report", + "delegateLabelDataLoading": "Suggested data", + "delegateLabelReportGen": "Suggested report", "clarificationNeeded": "Awaiting actions" }, "concepts": { - "showFewer": "Show fewer concepts", - "showAll": "Show all concepts", - "showFirstN": "Show first {{count}} concepts", - "showAllN": "Show all {{count}} concepts" + "showFewer": "Show fewer formulas", + "showAll": "Show all formulas", + "showFirstN": "Show first {{count}} formulas", + "showAllN": "Show all {{count}} formulas" }, "dataframe": { "columnCount": "{{count}} columns" @@ -731,6 +798,7 @@ "refresh": "Refresh data", "emptyTree": "No tables found", "addConnector": "Add data connector", + "configureConnector": "Edit connection", "linkLocalFolder": "Link local folder", "newSession": "New session", "noSessions": "No saved sessions", @@ -784,7 +852,25 @@ "knowledge": "Knowledge", 
"metadataPartial": "Partial metadata", "metadataUnavailable": "Metadata unavailable", - "saving": "Saving..." + "saving": "Saving...", + "rename": "Rename", + "exportSession": "Export", + "exportFailed": "Failed to export session", + "importFailed": "Failed to import workspace", + "failedRenameSession": "Failed to rename session", + "sortNewest": "newest", + "sortOldest": "oldest", + "sortRecentlyModified": "recently modified", + "sortName": "name", + "sortNewestFirst": "newest first", + "sortOldestFirst": "oldest first", + "sortRecentlyModifiedFirst": "recently modified", + "sortNameAsc": "name (a–z)", + "timeJustNow": "just now", + "timeMinutes": "{{count}}m", + "timeHours": "{{count}}h", + "timeYesterday": "yesterday", + "timeDays": "{{count}}d" }, "knowledge": { "title": "Agent Knowledge", diff --git a/src/i18n/locales/en/dataLoading.json b/src/i18n/locales/en/dataLoading.json index af2a975d..20394e7f 100644 --- a/src/i18n/locales/en/dataLoading.json +++ b/src/i18n/locales/en/dataLoading.json @@ -1,7 +1,16 @@ { "dataLoading": { "title": "Data Loading Assistant", - "subtitle": "Extract data from images, files, webpages, or text. Generate synthetic datasets. 
Browse sample data.", + "subtitle": "I can help you extract, generate, or browse data — or just ask me anything.", + "capabilityAsk": "Ask questions about your connected data sources", + "capabilitySearch": "Search and browse curated sample datasets", + "capabilityExtractImage": "Extract structured data from images", + "capabilityExtractFile": "Extract data from PDFs or pasted text", + "capabilityHint": "Focus the input below to see example prompts.", + "sectionTry": "Try a task", + "sectionChat": "Or just ask", + "chatHint": "", + "chatHintExample": "What data do we have here?", "placeholder": "Describe data to extract, upload, or generate...", "attachTooltip": "Attach file or image", "stopTooltip": "Stop generation", @@ -18,7 +27,6 @@ "stopped": "— stopped", "uploaded": "[Uploaded: {{name}}]", "defaultImageMessage": "Extract data from this image", - "imageModelUnsupported": "The selected model does not support image input. Please switch to a vision-capable model or remove the image.", "syncInProgress": "Syncing catalog metadata…", "syncComplete": "Catalog sync complete", "syncPartial": "Catalog sync partially completed — some metadata may be missing", @@ -27,15 +35,16 @@ "metadataStatusUnavailable": "Unavailable", "metadataStatusNotSynced": "Not synced", "loadPlan": { - "title": "Agent recommends loading", "filters": "Filters", "rowLimit": "Row limit", "loadSelected": "Load selected", - "loaded": "Loaded", + "loadedCount": "✓ Loaded {{count}} table", + "loadedCount_plural": "✓ Loaded {{count}} tables", "preview": "Preview", + "hidePreview": "Hide", "previewing": "Previewing...", "previewFailed": "Preview failed", - "noFilters": "No filters" + "fromSource": "from" }, "toolLabels": { "readingFile": "Reading file", diff --git a/src/i18n/locales/en/messages.json b/src/i18n/locales/en/messages.json index ea29942f..d5f89405 100644 --- a/src/i18n/locales/en/messages.json +++ b/src/i18n/locales/en/messages.json @@ -63,6 +63,7 @@ "clarifyOptionContinue": "Continue 
exploring", "clarifyOptionSimplify": "Simplify the task", "clarifyOptionPresent": "Present what you have so far", + "clarifyOptionSummary": "Summarize what you have so far", "maxIterationsSummary": "Reached the maximum number of exploration steps.", "emptyDataframe": "Output DataFrame is empty (0 rows). Check filters or data loading.", "fieldsNotFound": "Chart encoding fields not found in output DataFrame: {{missing}}. Available columns: {{available}}", diff --git a/src/i18n/locales/en/model.json b/src/i18n/locales/en/model.json index 13bfdd0b..7771faf5 100644 --- a/src/i18n/locales/en/model.json +++ b/src/i18n/locales/en/model.json @@ -46,9 +46,13 @@ "openaiProviderTip": "Use openai provider for OpenAI-compatible APIs.", "loadingModels": "Loading models...", "serverManaged": "Server managed", + "serverChip": "server configured", + "serverConfigured": "server configured", + "serverManagedTooltip": "Managed by administrator", "serverManagedSection": "Server configured models", "serverManagedReadonly": "Read-only", "userManagedSection": "My models", + "testing": "Testing…", "configured": "Configured", "configuredMessage": "Server configured, click to verify connectivity" } diff --git a/src/i18n/locales/en/upload.json b/src/i18n/locales/en/upload.json index c652c507..292c87d0 100644 --- a/src/i18n/locales/en/upload.json +++ b/src/i18n/locales/en/upload.json @@ -2,17 +2,18 @@ "upload": { "title": "Load Data", "sampleDatasets": "Sample Datasets", - "sampleDatasetsDesc": "Explore and load curated example datasets", + "sampleDatasetsDesc": "Curated example datasets", "uploadFile": "Upload File", - "uploadFileDesc": "Upload local files (CSV, TSV, JSON, Excel)", + "uploadFileDesc": "CSV, TSV, JSON, or Excel", "pasteData": "Paste Data", - "pasteDataDesc": "Paste tabular data directly from clipboard", - "extractData": "Extract Data Agent", - "extractDataDesc": "Find, extract, and load data with AI", + "pasteDataDesc": "Paste from clipboard", + "extractData": "Data Loading 
Agent", + "extractDataDesc": "Find and extract data with AI", "loadFromUrl": "Load from URL", - "loadFromUrlDesc": "Load data from a URL with optional auto-refresh", + "loadFromUrlTitle": "Load from URL (live)", + "loadFromUrlDesc": "Fetch data from a remote URL", "database": "Database", - "databaseDesc": "Connect to databases or data services", + "databaseDesc": "Connect to a database or service", "databaseDisabled": "Database connection is disabled in this environment", "dragDrop": "Drag & drop files here", "orBrowse": "or Browse", @@ -35,6 +36,22 @@ "previewMode": "Preview mode: Editing disabled. Click \"Show Full\" to enable editing.", "showPreview": "Show Preview", "showFull": "Show Full", + "dataLoadingAgent": "Data Loading Agent", + "resumePreviousConversation": "Previous conversation →", + "agentChatPlaceholder": "Ask the agent to find datasets, or extract data from an image or text…", + "agentChatTabSuggestion": "What datasets do we have here?", + "agentChatSuggestionsLabel": "Try asking", + "agentChatSendTooltip": "Start chatting with the agent", + "agentChatSuggestion": { + "askConnected": "What datasets do we have from connected sources?", + "findCPI": "Help me load consumer price index data", + "extractFromExcel": "Extract data from an attached Excel file", + "kind": { + "ask": "ask", + "find": "find", + "extract": "extract" + } + }, "uploadData": "Upload Data", "importData": "Import data", "dataConnections": "Data connections", @@ -43,7 +60,7 @@ "localData": "Local data", "orConnectToDataSource": "Or connect to a data source (with optional auto-refresh)", "addConnection": "Connect databases", - "addConnectionDesc": "Create a persistent database connection", + "addConnectionDesc": "Connect to a live database", "connectorConnected": "Connected", "connectorDisconnected": "Click to connect", "pickDataSourceType": "Choose a data source type to create a new connection.", @@ -51,7 +68,7 @@ "connectionName": "Connection name", "createConnection": "Create 
Connection", "creating": "Creating...", - "dataAssistant": "Data Extraction Assistant", + "dataAssistant": "Data Loading Assistant", "addData": "Add Data", "loadDataIn": "Load data in", "browserLabel": "Browser", @@ -91,7 +108,7 @@ "includeSubfolders": "Include subfolders", "localFolder": "Link local folder", "localFolderConnected": "Local folder", - "localFolderDesc": "Connect to a local folder for fast imports", + "localFolderDesc": "Browse files on your computer", "localFolderHint": "Select a folder on your computer to browse and import data files.", "opening": "Opening...", "orTypePath": "or type a path manually", @@ -101,18 +118,19 @@ "storedInBrowser": "Data stays in browser only", "storedOnDisk": "Data stored on disk", "connectorDesc": { - "mysql": "Query tables from a MySQL database", - "postgresql": "Query tables from a PostgreSQL database", - "mssql": "Query tables from Microsoft SQL Server", - "cosmosdb": "Query containers from Azure Cosmos DB", - "mongodb": "Query collections from MongoDB", - "bigquery": "Query datasets from Google BigQuery", - "athena": "Query data via Amazon Athena", - "kusto": "Query data from Azure Data Explorer (Kusto)", - "superset": "Browse datasets from Apache Superset", - "azure_blob": "Load files from Azure Blob Storage", - "s3": "Load files from Amazon S3", - "local_folder": "Browse and import files from a local folder" + "sample_datasets": "Try it out with sample data", + "mysql": "Query MySQL tables", + "postgresql": "Query Postgres tables", + "mssql": "Query SQL Server tables", + "cosmosdb": "Query Cosmos DB containers", + "mongodb": "Query MongoDB collections", + "bigquery": "Query BigQuery datasets", + "athena": "Query Amazon Athena", + "kusto": "Query Azure Data Explorer", + "superset": "Browse Superset datasets", + "azure_blob": "Load Azure Blob files", + "s3": "Load Amazon S3 files", + "local_folder": "Browse local files" }, "localFolderDefaultName": "Local Folder", "errors": { @@ -145,7 +163,10 @@ "copy": "Copy", 
"copied": "Copied", "viewOnGithub": "View on GitHub", - "viewOnPypi": "PyPI package" + "viewOnPypi": "PyPI package", + "requirements": "Requires Python 3.11+ and ", + "requirementsTail": ". Prefer pip, conda, or Docker? See ", + "otherInstallMethods": "other install methods" } } } diff --git a/src/i18n/locales/zh/chart.json b/src/i18n/locales/zh/chart.json index 3ddb6f17..45e10832 100644 --- a/src/i18n/locales/zh/chart.json +++ b/src/i18n/locales/zh/chart.json @@ -1,6 +1,16 @@ { "chart": { - "derivedConcepts": "派生概念", + "vegaLocale": { + "dateTime": "%x %A %X", + "date": "%Y年%-m月%-d日", + "time": "%H:%M:%S", + "periods": ["上午", "下午"], + "days": ["星期日", "星期一", "星期二", "星期三", "星期四", "星期五", "星期六"], + "shortDays": ["周日", "周一", "周二", "周三", "周四", "周五", "周六"], + "months": ["一月", "二月", "三月", "四月", "五月", "六月", "七月", "八月", "九月", "十月", "十一月", "十二月"], + "shortMonths": ["1月", "2月", "3月", "4月", "5月", "6月", "7月", "8月", "9月", "10月", "11月", "12月"] + }, + "derivedConcepts": "公式", "dataTransformCode": "数据转换代码", "dataTransformExplanation": "数据转换说明", "zoomIn": "放大", @@ -25,8 +35,13 @@ "code": "代码", "agentLog": "Agent 日志", "explain": "解释", - "concepts": "解释", - "orStartWithChartType": "或者,从图表类型开始", + "concepts": "公式", + "orStartWithChartType": "创建新图表?", + "orCreateYourself": "或者自己创建一个?", + "emptyStateTitle": "准备好探索你的数据了吗?", + "emptyStateSubtitle": "在聊天框中向智能体提问 —— 它可以提供探索建议、解读数据、转换数据、生成图表。", + "emptyStateChatHint": "试试左下角的聊天输入框", + "emptyStateOrPickType": "或者手动选择一种图表类型", "resample": "重新采样", "adjustSampleSize": "调整样本大小:{{sampleSize}} / {{totalSize}} 行", "log": "日志", @@ -71,7 +86,6 @@ "lollipopChart": "棒棒糖图", "pyramidChart": "金字塔图", "lineChart": "折线图", - "dottedLineChart": "带点折线图", "bumpChart": "排名变化图", "areaChart": "面积图", "streamgraph": "流图", @@ -91,12 +105,12 @@ "customArea": "自定义面积图" }, "chartCategoryTip": { - "scatterAndPoint": "散点和点类图表", - "bar": "柱状图/条形图类", - "lineAndArea": "折线和面积图类", - "partToWhole": "部分与整体比较图类", - "statistical": "统计分析图类", - "map": "地理地图图表", + "points": 
"点类图表(散点、点图、回归)", + "bars": "柱状与条形图类", + "distributions": "分布与统计图类", + "linesAndAreas": "线图与面积图类", + "circular": "环状图类(饼图、玫瑰、雷达)", + "tablesAndMaps": "网格、表格、KPI 与地图类", "custom": "自定义标记类型" }, "gallery": { @@ -151,7 +165,6 @@ "histogram": "直方图", "heatmap": "热力图", "lineChart": "折线图", - "dottedLineChart": "带点折线图", "boxplot": "箱线图", "pieChart": "饼图", "rangedDotPlot": "区间点图", diff --git a/src/i18n/locales/zh/common.json b/src/i18n/locales/zh/common.json index 66ef3d1d..1740e92a 100644 --- a/src/i18n/locales/zh/common.json +++ b/src/i18n/locales/zh/common.json @@ -377,7 +377,8 @@ "partial": "不完整", "unavailable": "不可用", "not_synced": "未同步" - } + }, + "loadInNewSession": "在新会话中加载" }, "dataThread": { "title": "数据线程", @@ -415,6 +416,7 @@ "threadIndex": "线程 - {{index}}", "continuedFromAbove": "续上", "continuesBelow": "续下", + "usingSources": "使用", "hmm": "嗯...", "oops": "出错了...", "completed": "已完成", @@ -439,6 +441,10 @@ "pinTable": "固定表", "refreshSettings": "刷新设置", "replaceData": "替换数据", + "viewMetadata": "查看元数据", + "metadataFor": "{{table}} 的元数据", + "derivationSummary": "派生摘要", + "noMetadata": "该表暂无描述信息。", "rowsByColumns": "{{rows}}行 \u00d7 {{cols}}列", "chartAlt": "{{type}}图表", "streamSourceLabel": "流式源", @@ -460,7 +466,37 @@ "loadedOfTotal": "已加载 {{loaded}} / {{total}} 行", "viewRandomRows": "查看此表的 10000 行随机数据", "downloadAsCsv": "下载为 CSV", - "downloading": "下载中..." 
+ "downloading": "下载中...", + "columnMenu": { + "openMenu": "列选项", + "sortAsc": "升序排序", + "sortDesc": "降序排序", + "clearSort": "清除排序", + "filter": "筛选…", + "filterActive": "筛选(已启用)", + "clearFilter": "清除筛选", + "filterComingSoon": "筛选界面即将推出。" + }, + "filter": { + "from": "起始", + "to": "结束", + "includeBlanks": "显示空值", + "showBlanksOnly": "仅显示空值", + "contains": "包含…", + "blank": "(空值)", + "apply": "应用", + "clear": "清除筛选", + "search": "搜索值", + "selectAll": "(全选)", + "noMatches": "无匹配值", + "distinctHint": "{{count}} 个不同值", + "sectionSort": "排序", + "sectionFilter": "筛选", + "filterApplied": "已启用筛选", + "summaryRows": "{{count, number}} 行", + "summaryDistinct": "{{count, number}} 个不同值", + "summaryBlanks": "{{count, number}} 个空值" + } }, "chatDialog": { "noHistory": "尚无对话记录", @@ -519,6 +555,7 @@ "interactive": "交互式", "agent": "Agent", "getIdeas": "获取灵感", + "whatsNext": "接下来呢?", "editor": "编辑器", "getIdeasForVisualization": "获取画图的灵感", "differentIdeas": "换一些灵感?", @@ -572,6 +609,12 @@ "invalidExplanation": "Agent 返回了无效的解释。", "explanationTitle": "说明", "explanationFollowupsLabel": "可能的下一步:", + "delegateTitle": "建议下一步交给其他 Agent", + "delegateMinimize": "收起", + "delegateExpand": "展开", + "delegateDismiss": "忽略", + "delegateToDataLoading": "在数据加载中搜索", + "delegateToReportGen": "生成报告", "errorDuringExploration": "探索过程中出错", "explorationStep": "探索步骤 {{step}}:{{question}}", "threadExplorePrompt": "探索这份数据中有趣的模式和趋势", @@ -609,6 +652,7 @@ "completingLogin": "正在完成登录…", "idpRedirecting": "正在从 SSO 系统跳转,请稍候…", "callbackFailed": "登录回调失败:{{message}}", + "ssoErrorAccessDenied": "授权已取消。如需使用单点登录,请重新登录。", "ssoErrorInvalidState": "SSO 会话已过期或被中断,请重新登录。", "ssoErrorInvalidClient": "SSO 客户端密钥配置错误,请联系管理员检查配置。", "ssoErrorTokenExchange": "SSO 登录过程中令牌交换失败,请重试或联系管理员。", @@ -657,7 +701,26 @@ "recoveredSession": "已恢复的会话", "errorOccurred": "发生了错误,请", "refreshSession": "刷新会话", - "errorPersistHint": "如果问题仍然存在,请点击关闭会话。" + "errorPersistHint": "如果问题仍然存在,请点击关闭会话。", + "yourSessions": "你的会话", + "rename": "重命名", + "export": "导出", + 
"importZip": "导入工作区 (.zip)", + "importingFile": "正在导入 {{name}}...", + "deleteTitle": "删除会话?", + "deleteConfirm": "将永久删除 {{name}}({{id}})及其所有数据。", + "deleteFailed": "删除工作区失败", + "renameFailed": "重命名工作区失败", + "exportFailed": "导出工作区失败", + "importFailed": "导入工作区失败", + "sortNewest": "最新", + "sortOldest": "最早", + "sortRecentlyModified": "最近修改", + "sortName": "名称", + "sortNewestFirst": "最新优先", + "sortOldestFirst": "最早优先", + "sortRecentlyModifiedFirst": "最近修改优先", + "sortNameAsc": "名称 (a–z)" }, "supersetCatalog": { "title": "Superset 数据集", @@ -694,13 +757,17 @@ "interaction": { "askedForClarification": "请求澄清", "gaveExplanation": "做了说明", + "delegatedToDataLoading": "建议加载更多数据", + "delegatedToReportGen": "建议生成报告", + "delegateLabelDataLoading": "建议加载数据", + "delegateLabelReportGen": "建议生成报告", "clarificationNeeded": "等待操作" }, "concepts": { - "showFewer": "收起概念", - "showAll": "显示所有概念", - "showFirstN": "显示前 {{count}} 个概念", - "showAllN": "显示全部 {{count}} 个概念" + "showFewer": "收起公式", + "showAll": "显示所有公式", + "showFirstN": "显示前 {{count}} 个公式", + "showAllN": "显示全部 {{count}} 个公式" }, "dataframe": { "columnCount": "{{count}} 列" @@ -731,6 +798,7 @@ "refresh": "刷新数据", "emptyTree": "未找到表格", "addConnector": "添加数据连接器", + "configureConnector": "编辑连接", "linkLocalFolder": "链接本地文件夹", "newSession": "新建会话", "noSessions": "暂无已保存的会话", @@ -784,7 +852,25 @@ "knowledge": "知识库", "metadataPartial": "元数据不完整", "metadataUnavailable": "元数据不可用", - "saving": "保存中..." 
+ "saving": "保存中...", + "rename": "重命名", + "exportSession": "导出", + "exportFailed": "导出会话失败", + "importFailed": "导入工作区失败", + "failedRenameSession": "重命名会话失败", + "sortNewest": "最新", + "sortOldest": "最早", + "sortRecentlyModified": "最近修改", + "sortName": "名称", + "sortNewestFirst": "最新优先", + "sortOldestFirst": "最早优先", + "sortRecentlyModifiedFirst": "最近修改优先", + "sortNameAsc": "名称 (a–z)", + "timeJustNow": "刚刚", + "timeMinutes": "{{count}}分钟", + "timeHours": "{{count}}小时", + "timeYesterday": "昨天", + "timeDays": "{{count}}天" }, "knowledge": { "title": "Agent 知识", diff --git a/src/i18n/locales/zh/dataLoading.json b/src/i18n/locales/zh/dataLoading.json index 3875444f..4eab6fcf 100644 --- a/src/i18n/locales/zh/dataLoading.json +++ b/src/i18n/locales/zh/dataLoading.json @@ -1,7 +1,16 @@ { "dataLoading": { "title": "数据加载助手", - "subtitle": "从图片、文件、网页或文本中提取数据。生成合成数据集。浏览示例数据。", + "subtitle": "我可以帮你提取、生成或浏览数据 — 有问题随时问。", + "capabilityAsk": "回答关于已连接数据源的问题", + "capabilitySearch": "搜索并浏览精选示例数据集", + "capabilityExtractImage": "从图片中提取结构化数据", + "capabilityExtractFile": "从 PDF 或粘贴的文本中提取数据", + "capabilityHint": "聚焦下方输入框查看示例提示。", + "sectionTry": "试试这些任务", + "sectionChat": "或直接提问", + "chatHint": "", + "chatHintExample": "这里有什么数据?", "placeholder": "描述要提取、上传或生成的数据...", "attachTooltip": "附加文件或图片", "stopTooltip": "停止生成", @@ -18,7 +27,6 @@ "stopped": "— 已停止", "uploaded": "[已上传:{{name}}]", "defaultImageMessage": "从这张图片中提取数据", - "imageModelUnsupported": "当前选择的模型不支持图片输入。请切换到支持视觉的模型,或移除图片后再发送。", "syncInProgress": "正在同步目录元数据…", "syncComplete": "目录同步完成", "syncPartial": "目录同步部分完成,部分元数据可能缺失", @@ -27,15 +35,15 @@ "metadataStatusUnavailable": "不可用", "metadataStatusNotSynced": "未同步", "loadPlan": { - "title": "Agent 建议载入以下数据", "filters": "筛选条件", "rowLimit": "行数限制", "loadSelected": "加载选中的表", - "loaded": "已加载", + "loadedCount": "✓ 已加载 {{count}} 张表", "preview": "预览", + "hidePreview": "收起", "previewing": "正在预览...", "previewFailed": "预览失败", - "noFilters": "无筛选条件" + "fromSource": "来自" }, "toolLabels": { 
"readingFile": "读取文件", diff --git a/src/i18n/locales/zh/messages.json b/src/i18n/locales/zh/messages.json index c889319e..59185a44 100644 --- a/src/i18n/locales/zh/messages.json +++ b/src/i18n/locales/zh/messages.json @@ -63,6 +63,7 @@ "clarifyOptionContinue": "继续探索", "clarifyOptionSimplify": "简化任务", "clarifyOptionPresent": "展示当前结果", + "clarifyOptionSummary": "总结当前结果", "maxIterationsSummary": "已达到最大探索步数。", "emptyDataframe": "输出数据为空(0 行),请检查筛选条件或数据加载。", "fieldsNotFound": "输出数据中未找到图表编码字段:{{missing}}。可用列:{{available}}", diff --git a/src/i18n/locales/zh/model.json b/src/i18n/locales/zh/model.json index da1c4cf9..330891fd 100644 --- a/src/i18n/locales/zh/model.json +++ b/src/i18n/locales/zh/model.json @@ -46,9 +46,13 @@ "openaiProviderTip": "对于兼容 OpenAI 的 API,请将 提供商 设置为 openai 。", "loadingModels": "模型加载中...", "serverManaged": "由服务端管理", + "serverChip": "服务端配置", + "serverConfigured": "服务端配置", + "serverManagedTooltip": "由管理员配置", "serverManagedSection": "服务端配置的模型", "serverManagedReadonly": "不可删除", "userManagedSection": "我添加的模型", + "testing": "测试中…", "configured": "已配置", "configuredMessage": "服务端已配置,点击可验证连通性" } diff --git a/src/i18n/locales/zh/upload.json b/src/i18n/locales/zh/upload.json index 0a43bdc9..90f42f6e 100644 --- a/src/i18n/locales/zh/upload.json +++ b/src/i18n/locales/zh/upload.json @@ -2,17 +2,18 @@ "upload": { "title": "加载数据", "sampleDatasets": "示例数据集", - "sampleDatasetsDesc": "浏览并加载精选的示例数据集", + "sampleDatasetsDesc": "精选示例数据集", "uploadFile": "上传文件", - "uploadFileDesc": "上传本地文件(CSV、TSV、JSON、Excel)", + "uploadFileDesc": "CSV、TSV、JSON 或 Excel", "pasteData": "粘贴数据", - "pasteDataDesc": "直接从剪贴板粘贴表格数据", - "extractData": "数据提取助手", - "extractDataDesc": "AI 智能查找、提取和加载数据", + "pasteDataDesc": "从剪贴板粘贴", + "extractData": "数据加载助手", + "extractDataDesc": "使用 AI 查找并提取数据", "loadFromUrl": "从 URL 加载", - "loadFromUrlDesc": "从 URL 加载数据,支持自动刷新", + "loadFromUrlTitle": "从 URL 加载(实时)", + "loadFromUrlDesc": "从远程 URL 获取数据", "database": "数据库", - "databaseDesc": "连接数据库或数据服务", + 
"databaseDesc": "连接数据库或服务", "databaseDisabled": "此环境中数据库连接已禁用", "dragDrop": "拖放文件到此处", "orBrowse": "或浏览", @@ -35,6 +36,22 @@ "previewMode": "预览模式:编辑已禁用。点击\"显示完整\"以启用编辑。", "showPreview": "显示预览", "showFull": "显示完整", + "dataLoadingAgent": "数据加载助手", + "resumePreviousConversation": "上次对话 →", + "agentChatPlaceholder": "让助手查找数据集,或从图片、文本中提取数据…", + "agentChatTabSuggestion": "这里有哪些数据集?", + "agentChatSuggestionsLabel": "试试这样问", + "agentChatSendTooltip": "开始与助手对话", + "agentChatSuggestion": { + "askConnected": "已连接的数据源里有哪些数据集?", + "findCPI": "帮我加载消费者价格指数数据", + "extractFromExcel": "从附加的 Excel 文件中提取数据", + "kind": { + "ask": "提问", + "find": "查找", + "extract": "提取" + } + }, "uploadData": "上传数据", "importData": "导入数据", "dataConnections": "数据连接", @@ -43,7 +60,7 @@ "localData": "本地数据", "orConnectToDataSource": "或连接数据源(支持自动刷新)", "addConnection": "连接数据库", - "addConnectionDesc": "创建持久数据库连接", + "addConnectionDesc": "连接到实时数据库", "connectorConnected": "已连接", "connectorDisconnected": "点击连接", "pickDataSourceType": "选择数据源类型以创建新连接。", @@ -51,7 +68,7 @@ "connectionName": "连接名称", "createConnection": "创建连接", "creating": "创建中...", - "dataAssistant": "数据提取助手", + "dataAssistant": "数据加载助手", "addData": "添加数据", "loadDataIn": "数据加载到", "browserLabel": "浏览器", @@ -91,7 +108,7 @@ "includeSubfolders": "包含子文件夹", "localFolder": "链接本地文件夹", "localFolderConnected": "本地文件夹", - "localFolderDesc": "连接本地文件夹快速导入", + "localFolderDesc": "浏览本机文件", "localFolderHint": "选择电脑上的文件夹以浏览和导入数据文件。", "opening": "打开中...", "orTypePath": "或手动输入路径", @@ -101,18 +118,19 @@ "storedInBrowser": "数据仅保留在浏览器中", "storedOnDisk": "数据存储在磁盘上", "connectorDesc": { - "mysql": "从 MySQL 数据库查询表", - "postgresql": "从 PostgreSQL 数据库查询表", - "mssql": "从 Microsoft SQL Server 查询表", - "cosmosdb": "从 Azure Cosmos DB 查询容器", - "mongodb": "从 MongoDB 查询集合", - "bigquery": "从 Google BigQuery 查询数据集", - "athena": "通过 Amazon Athena 查询数据", - "kusto": "从 Azure Data Explorer (Kusto) 查询数据", - "superset": "从 Apache Superset 浏览数据集", - "azure_blob": "从 Azure Blob Storage 加载文件", - 
"s3": "从 Amazon S3 加载文件", - "local_folder": "浏览并导入本地文件夹中的文件" + "sample_datasets": "使用示例数据快速体验", + "mysql": "查询 MySQL 表", + "postgresql": "查询 Postgres 表", + "mssql": "查询 SQL Server 表", + "cosmosdb": "查询 Cosmos DB 容器", + "mongodb": "查询 MongoDB 集合", + "bigquery": "查询 BigQuery 数据集", + "athena": "查询 Amazon Athena", + "kusto": "查询 Azure Data Explorer", + "superset": "浏览 Superset 数据集", + "azure_blob": "加载 Azure Blob 文件", + "s3": "加载 Amazon S3 文件", + "local_folder": "浏览本地文件" }, "localFolderDefaultName": "本地文件夹", "errors": { @@ -145,7 +163,10 @@ "copy": "复制", "copied": "已复制", "viewOnGithub": "在 GitHub 查看", - "viewOnPypi": "PyPI 包" + "viewOnPypi": "PyPI 包", + "requirements": "需要 Python 3.11 以上版本及 ", + "requirementsTail": "。偏好 pip、conda 或 Docker?请查看", + "otherInstallMethods": "其他安装方式" } } } diff --git a/src/icons.tsx b/src/icons.tsx index f9471c45..ea86635c 100644 --- a/src/icons.tsx +++ b/src/icons.tsx @@ -100,14 +100,35 @@ const CONNECTOR_ICON_MAP: Record> = { local_folder: FolderOpenIconMui, }; -/** Category sort order for data source sidebar. Lower = higher in the list. */ +/** Category sort order for data source sidebar. Lower = higher in the list. + * The backend exposes connector type in two shapes depending on the + * endpoint: + * - `/api/connectors` (sidebar): `source_type` is the loader class name + * (`MySQLDataLoader`, `SampleDatasetsLoader`, ...). + * - `/api/data-loaders` (upload dialog): `type` is the loader id + * (`mysql`, `sample_datasets`, ...). + * We index by both so a single comparator works for both call sites. 
*/ const CONNECTOR_CATEGORY_ORDER: Record = { - local_folder: -1, // Local (top) - mysql: 0, mssql: 0, postgresql: 0, // Relational DB - mongodb: 1, cosmosdb: 1, // Document Store - s3: 2, azure_blob: 2, // Cloud Storage - bigquery: 3, kusto: 3, athena: 3, // Query Engine - superset: 4, // Dashboard + // Example Datasets (always top) + sample_datasets: -100, SampleDatasetsLoader: -100, + // Local + local_folder: -1, LocalFolderDataLoader: -1, + // Relational DB + mysql: 0, MySQLDataLoader: 0, + mssql: 0, MSSQLDataLoader: 0, + postgresql: 0, PostgreSQLDataLoader: 0, + // Document Store + mongodb: 1, MongoDBDataLoader: 1, + cosmosdb: 1, CosmosDBDataLoader: 1, + // Cloud Storage + s3: 2, S3DataLoader: 2, + azure_blob: 2, AzureBlobDataLoader: 2, + // Query Engine + bigquery: 3, BigQueryDataLoader: 3, + kusto: 3, KustoDataLoader: 3, + athena: 3, AthenaDataLoader: 3, + // Dashboard + superset: 4, SupersetLoader: 4, }; /** Sort comparator: group by category, then alphabetical within each group. */ diff --git a/src/lib/agents-chart/chartjs/assemble.ts b/src/lib/agents-chart/chartjs/assemble.ts index 8195635d..c77cab0f 100644 --- a/src/lib/agents-chart/chartjs/assemble.ts +++ b/src/lib/agents-chart/chartjs/assemble.ts @@ -178,6 +178,7 @@ export function assembleChartjs(input: ChartAssemblyInput): any { channelSemantics, layout: layoutResult, table: values, + fullTable: convertedData, resolvedEncodings, encodings, chartProperties, diff --git a/src/lib/agents-chart/core/recommendation.ts b/src/lib/agents-chart/core/recommendation.ts index bebe2e3a..9b38249c 100644 --- a/src/lib/agents-chart/core/recommendation.ts +++ b/src/lib/agents-chart/core/recommendation.ts @@ -65,6 +65,17 @@ const FAMILY_XY_STANDARD: ChannelRoleMap = { column: 'facetCol', row: 'facetRow', }; +/** Horizontal x/y charts (Bar Table, etc.): y=category, x=measure. 
Same channel + * names as standard x/y, but axes are swapped — keeping them as x/y means we + * inherit shelf/color/facet plumbing for free, while adaptation knows the + * category lives on `y` so switching to/from a vertical bar chart auto-swaps. */ +const FAMILY_XY_HORIZONTAL: ChannelRoleMap = { + y: 'category', x: 'measure', color: 'series', + opacity: 'auxiliary', size: 'auxiliary', shape: 'auxiliary', + detail: 'auxiliary', group: 'series', + column: 'facetCol', row: 'facetRow', +}; + /** Pie-like charts: color=category, size=measure */ const FAMILY_PIE: ChannelRoleMap = { color: 'category', size: 'measure', @@ -146,13 +157,13 @@ const FAMILY_RANGE: ChannelRoleMap = { const CHART_ROLE_MAP: Record = { // Axis-based (x/y standard) 'Bar Chart': FAMILY_XY_STANDARD, - 'Pyramid Chart': FAMILY_XY_STANDARD, + 'Pyramid Chart': FAMILY_XY_HORIZONTAL, 'Grouped Bar Chart': FAMILY_XY_STANDARD, 'Stacked Bar Chart': FAMILY_XY_STANDARD, 'Lollipop Chart': FAMILY_XY_STANDARD, 'Waterfall Chart': FAMILY_XY_STANDARD, + 'Bar Table': FAMILY_XY_HORIZONTAL, 'Line Chart': FAMILY_XY_STANDARD, - 'Dotted Line Chart': FAMILY_XY_STANDARD, 'Bump Chart': FAMILY_XY_STANDARD, 'Area Chart': FAMILY_XY_STANDARD, 'Streamgraph': FAMILY_XY_STANDARD, @@ -265,17 +276,15 @@ export function adaptChannels( * Edit-distance-based adaptation: find the minimum-cost mapping from existing * field→channel assignments to target chart channels. 
* - * Cost model: - * 0 — field stays in the same channel (compatible type) - * 0.5 — field moves to a channel with the same semantic role - * (e.g., color→group — both "series" role) - * 1 — field moves to a channel with a different but compatible role - * 1.5 — field is dropped (not assigned to any target channel) - * (slightly above move-cost so we prefer keeping fields when tied) + * Cost model (see `assignCost` below): + * 0 — same channel name AND same semantic role + * 0.5 — different channel name, same semantic role + * 1 — same channel name but different role, OR different name + role + * 1.5 — field is dropped * ∞ — field type is incompatible with target channel * - * After the minimum-cost assignment, remaining empty target channels are - * filled via the recommendation engine. + * No autofill: the result contains at most as many fields as the source had. + * Empty target channels are left for the user to fill explicitly. */ function adaptViaRecommendation( sourceType: string, @@ -284,7 +293,7 @@ function adaptViaRecommendation( encodings: Record, data: any[], semanticTypes: Record, - recommendFn: RecommendFn, + _recommendFn: RecommendFn, ): Record { // --- Pre-process: handle facet channels and filter data --- const FACET_CHANNELS = ['column', 'row']; @@ -312,7 +321,12 @@ function adaptViaRecommendation( const st = tv.fieldSemanticType[field] ?? ''; const card = tv.fieldLevels[field]?.length ?? 0; switch (role) { - case 'category': return isDiscreteLike(ft, st, card); + // 'category' is for true discrete axes (nominal/ordinal/temporal). + // Quantitative fields — even low-cardinality ones — must NOT + // satisfy this role, otherwise a measure can land on the + // category axis (e.g. Bar Table y) and push the real discrete + // field onto color. 
+ case 'category': return !isQuantitativeField(ft, st) && isDiscreteLike(ft, st, card); case 'measure': return isQuantitativeField(ft, st); case 'series': return isDiscreteLike(ft, st, card); case 'geo': return isGeoCoordinateType(st) || ft === 'quantitative'; @@ -323,18 +337,32 @@ function adaptViaRecommendation( }; // --- Cost function for assigning field (from srcCh) to targetCh --- + // + // Role match is the dominant signal: a field whose role matches the target + // channel's role is preferred over a field that merely happens to share the + // same channel name. This is important when swapping between strict + // "category × quantitative" chart types (Bar, Pie, Heatmap, …) where the + // semantic axis assignment of the target should win over channel-name + // preservation from the source. const assignCost = (srcCh: string, field: string, targetCh: string): number => { const targetRole = getChannelRole(targetType, targetCh); if (!isFieldCompatibleWithRole(targetRole, field)) return Infinity; - // Same channel name → free - if (srcCh === targetCh) return 0; - - // Same semantic role (e.g., color→group, both "series") → small cost const srcRole = getChannelRole(sourceType, srcCh); + + // Same channel AND same role → free preservation (e.g. Bar→Stacked Bar x→x) + if (srcCh === targetCh && srcRole === targetRole) return 0; + + // Same semantic role, different channel name → small cost + // (e.g. Heatmap.color (measure) → Bar.y (measure)) if (srcRole === targetRole) return 0.5; - // Different role but type-compatible → higher cost + // Same channel name but different role → role mismatch is more costly + // than a role-match move; prevents e.g. src.color (series) clobbering + // tgt.color (category) in Pie. 
+ if (srcCh === targetCh) return 1; + + // Different role and different name, type-compatible → highest move cost return 1; }; @@ -389,20 +417,14 @@ function adaptViaRecommendation( solve(0, 0, {}); // --- Merge pre-pinned facets + solver result --- + // + // Intentionally do NOT auto-fill remaining empty target channels via the + // recommendation engine: when the user already configured a chart with N + // fields and switches type, the adapted chart should also have at most N + // fields. Adding extra fields the user never picked is surprising (e.g. + // a 2-encoding Bar Chart turning into a 5-encoding Scatter Plot on type + // switch). Empty channels are left for the user to fill explicitly. const result: Record = { ...prePinned, ...bestAssignment }; - const usedFields = new Set(Object.values(result)); - - // --- Fill remaining empty channels via recommendation --- - const emptyChannels = targetChannels.filter(ch => !(ch in result)); - if (emptyChannels.length > 0) { - const rec = recommendFn(targetType, buildTableView(facetedData, semanticTypes)); - for (const ch of emptyChannels) { - if (rec[ch] && !usedFields.has(rec[ch])) { - result[ch] = rec[ch]; - usedFields.add(rec[ch]); - } - } - } return result; } diff --git a/src/lib/agents-chart/core/types.ts b/src/lib/agents-chart/core/types.ts index 8f976169..7f2da084 100644 --- a/src/lib/agents-chart/core/types.ts +++ b/src/lib/agents-chart/core/types.ts @@ -23,6 +23,8 @@ export const channels = [ "x", "y", "x2", "y2", "id", "color", "opacity", "size", "shape", "strokeDash", "column", "row", "latitude", "longitude", "radius", "detail", "group", "open", "high", "low", "close", "angle", + // KPI Card: one row per tile, no chart axes. 
+ "metric", "value", "goal", ] as const; export const channelGroups: Record = { @@ -30,6 +32,7 @@ export const channelGroups: Record = { "legends": ["color", "group", "size", "shape", "text", "opacity", "strokeDash"], "price": ["open", "high", "low", "close"], "facets": ["column", "row"], + "kpi": ["metric", "value", "goal"], }; /** @@ -392,6 +395,20 @@ export interface InstantiateContext { /** The data table (array of row objects, post-overflow filtering) */ table: any[]; + /** + * The full data table (array of row objects, BEFORE overflow filtering). + * + * `table` may have categories silently dropped by `filterOverflow` to + * fit the canvas. Templates that need an honest view of the raw data + * — e.g. a "top-N + Others" rollup, an annotation that summarizes + * what wasn't shown, or a sparkline reference — should read from + * `fullTable` instead. + * + * Optional for backwards-compatibility; backends that don't set it + * fall back to `table`. + */ + fullTable?: any[]; + /** Resolved VL encoding objects (built by assembler from Phase 0 decisions) */ resolvedEncodings: Record; diff --git a/src/lib/agents-chart/docs/README.md b/src/lib/agents-chart/docs/README.md index 8e3a8b6c..d4108d69 100644 --- a/src/lib/agents-chart/docs/README.md +++ b/src/lib/agents-chart/docs/README.md @@ -918,25 +918,25 @@ visualization decisions: ## Template Catalog -### Vega-Lite (30 chart types) +### Vega-Lite (29 chart types) | Category | Charts | |----------|--------| | **Scatter & Point** | Scatter Plot, Regression, Ranged Dot Plot, Boxplot, Strip Plot | | **Bar** | Bar Chart, Grouped Bar Chart, Stacked Bar Chart, Histogram, Heatmap, Lollipop Chart, Pyramid Chart | -| **Line & Area** | Line Chart, Dotted Line Chart, Bump Chart, Area Chart, Streamgraph | +| **Line & Area** | Line Chart, Bump Chart, Area Chart, Streamgraph | | **Part-to-Whole** | Pie Chart, Rose Chart, Waterfall Chart | | **Statistical** | Density Plot, Candlestick Chart, Radar Chart | | **Map** | US Map, World 
Map | | **Custom** | Custom Point, Custom Line, Custom Bar, Custom Rect, Custom Area | -### ECharts (28 chart types) +### ECharts (27 chart types) | Category | Charts | |----------|--------| | **Scatter & Point** | Scatter Plot, Regression, Ranged Dot Plot, Boxplot, Strip Plot | | **Bar** | Bar Chart, Grouped Bar Chart, Stacked Bar Chart, Histogram, Heatmap, Lollipop Chart, Pyramid Chart | -| **Line & Area** | Line Chart, Dotted Line Chart, Bump Chart, Area Chart, Streamgraph | +| **Line & Area** | Line Chart, Bump Chart, Area Chart, Streamgraph | | **Part-to-Whole** | Pie Chart, Funnel Chart, Treemap, Sunburst Chart | | **Polar** | Radar Chart, Rose Chart | | **Financial** | Candlestick Chart | diff --git a/src/lib/agents-chart/docs/design-stretch-model.md b/src/lib/agents-chart/docs/design-stretch-model.md index 1f0d7855..1a580a3d 100644 --- a/src/lib/agents-chart/docs/design-stretch-model.md +++ b/src/lib/agents-chart/docs/design-stretch-model.md @@ -499,7 +499,6 @@ When `maintainContinuousAxisRatio` is set, both axes use the maximum of the two |---|---|---|---|---|---| | Scatter | 30 | 30 | 0.3 | 1.5 | — | | Line | 100 | 20 | 0.3 | 1.5 | auto (→ Y) | -| Dotted Line | 100 | 20 | 0.3 | 1.5 | auto (→ Y) | | Area | 100 | 20 | 0.3 | 1.5 | auto (→ Y) | | Streamgraph | 100 | 20 | 0.3 | 1.5 | auto (→ Y) | | Bump | 80 | 20 | 0.3 | 1.5 | auto (→ Y) | diff --git a/src/lib/agents-chart/docs/test_plan.md b/src/lib/agents-chart/docs/test_plan.md index 63ae4a7e..624a2df2 100644 --- a/src/lib/agents-chart/docs/test_plan.md +++ b/src/lib/agents-chart/docs/test_plan.md @@ -23,7 +23,7 @@ UI (`ChartGallery.tsx`) uses `TEST_GENERATORS` and `GALLERY_SECTIONS` from | **Facets** | facet-tests | Column, row, col+row, wrap, clip, overflow faceting | | **Stress/sizing** | stress-tests, gas-pressure-tests, line-area-stretch-tests, discrete-axis-tests | Overflow, elasticity, pressure model, discrete axis sizing | | **Temporal** | date-tests | Year, Month, YearMonth, Decade, DateTime, Hours 
parsing/formatting | -| **Line/area variants** | line-area-tests | Dotted Line, Bump Chart | +| **Line/area variants** | line-area-tests | Bump Chart | --- diff --git a/src/lib/agents-chart/echarts/assemble.ts b/src/lib/agents-chart/echarts/assemble.ts index dd500911..13a2cde7 100644 --- a/src/lib/agents-chart/echarts/assemble.ts +++ b/src/lib/agents-chart/echarts/assemble.ts @@ -233,6 +233,7 @@ export function assembleECharts(input: ChartAssemblyInput): any { channelSemantics, layout: layoutResult, table: values, + fullTable: convertedData, resolvedEncodings, encodings, chartProperties, diff --git a/src/lib/agents-chart/echarts/recommendation.ts b/src/lib/agents-chart/echarts/recommendation.ts index a212447e..e5da4bcf 100644 --- a/src/lib/agents-chart/echarts/recommendation.ts +++ b/src/lib/agents-chart/echarts/recommendation.ts @@ -31,19 +31,22 @@ function ecGetRecommendation(chartType: string, tv: InternalTableView): Record ({ - paramOverrides: { continuousMarkCrossSection: { x: 100, y: 20, seriesCountAxis: 'auto' } }, - }), - instantiate: (spec, ctx) => { - const { channelSemantics, table, chartProperties } = ctx; - const xCS = channelSemantics.x; - const yCS = channelSemantics.y; - const colorField = channelSemantics.color?.field; - - if (!xCS?.field || !yCS?.field) return; - const xField = xCS.field; - const yField = yCS.field; - - const xIsDiscrete = isDiscrete(xCS.type); - const xIsTemporal = xCS.type === 'temporal'; - const categories = xIsDiscrete ? extractCategories(table, xField, getCategoryOrder(ctx, 'x')) : undefined; - - const option: any = { - tooltip: { trigger: 'axis' }, - xAxis: (() => { - const type = xIsDiscrete ? 'category' : xIsTemporal ? 'time' : 'value'; - const base: any = { - type, - name: xField, - nameLocation: 'middle', - nameGap: 30, - ...(categories ? 
{ data: categories } : {}), - }; - if (xIsDiscrete && categories) { - base.axisTick = { show: true, alignWithLabel: true }; - base.axisLabel = { rotate: areCategoriesNumeric(categories) ? 0 : 90 }; - } else if (xIsTemporal) { - base.axisTick = { show: true, alignWithLabel: true }; - base.axisLabel = { rotate: 90 }; - } else { - base.axisTick = { show: true }; - } - return base; - })(), - yAxis: { - type: 'value', - name: yField, - nameLocation: 'middle', - nameGap: 40, - axisTick: { show: true }, - axisLabel: { rotate: 0 }, - }, - series: [], - }; - option._encodingTooltip = { trigger: 'axis', categoryLabel: xField, valueLabel: yField }; - - if (channelSemantics.y?.zero) { - option.yAxis.scale = !channelSemantics.y.zero.zero; - } - - const interpolate = chartProperties?.interpolate; - const smooth = interpolate === 'monotone' || interpolate === 'basis' || - interpolate === 'cardinal' || interpolate === 'catmull-rom'; - - const baseSeriesOpt = { - showSymbol: true, - symbol: 'circle', - symbolSize: 6, - lineStyle: { type: 'dashed' as const }, - smooth: !!smooth, - }; - - if (colorField) { - const groups = groupBy(table, colorField); - option.legend = { data: [...groups.keys()] }; - for (const [name, rows] of groups) { - const seriesData = xIsDiscrete - ? buildCategoryAlignedData(rows, xField, yField, categories!) - : rows.map(r => [r[xField], r[yField]]); - option.series.push({ - name, - type: 'line', - data: seriesData, - ...baseSeriesOpt, - // 颜色由 ecApplyLayoutToSpec 根据 colorDecisions 统一分配 - }); - } - } else { - const seriesData = xIsDiscrete - ? categories!.map(cat => { - const row = table.find(r => String(r[xField]) === cat); - return row ? 
row[yField] : null; - }) - : table.map(r => [r[xField], r[yField]]); - option.series.push({ type: 'line', data: seriesData, ...baseSeriesOpt }); - } - - Object.assign(spec, option); - delete spec.mark; - delete spec.encoding; - }, -}; - /** * Bump Chart — line with points, rank axis reversed when y is rank-like (mirror vegalite/templates/bump.ts). * Use yAxis as category with data ['1','2',...,'maxRank'] and inverse: true so rank 1 is at top without diff --git a/src/lib/agents-chart/gallery/bi-tests.ts b/src/lib/agents-chart/gallery/bi-tests.ts new file mode 100644 index 00000000..3d6532cb --- /dev/null +++ b/src/lib/agents-chart/gallery/bi-tests.ts @@ -0,0 +1,194 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Gallery test cases for BI-style chart prototypes (Vega-Lite only). + * + * KPI Card contract + * ───────────────── + * One row per tile. Channels: + * - `metric` (required) → caption + * - `value` (required) → big number (numeric or pre-formatted string) + * - `goal` (optional) → comparison value (numeric or string) + * + * Formatting is delegated to upstream data transformation. The template + * applies only a trivial `toLocaleString` default to numeric values. + * For currency / SI / percent formatting, format the column upstream + * and pass strings. + * + * Progress bar appears when both `value` and `goal` are finite numbers. + * Otherwise the goal renders as a small "Goal: " line. + */ + +import { Type } from '../../../data/types'; +import { TestCase, makeField, makeEncodingItem } from '../test-data/types'; + +export function genGalleryKpiCardTests(): TestCase[] { + return [ + // ── Single tile, numeric ────────────────────────────────────────── + { + title: 'KPI: Single tile (Numeric)', + description: + 'One row, only `value` bound. Caption defaults to the value field name. 
' + + 'Numeric value rendered via `toLocaleString`.', + tags: ['gallery', 'bi', 'kpi'], + chartType: 'KPI Card', + data: [{ Revenue: 1_184_320 }], + fields: [makeField('Revenue')], + metadata: { + Revenue: { type: Type.Number, semanticType: 'Quantity', levels: [] }, + }, + encodingMap: { value: makeEncodingItem('Revenue') }, + chartProperties: { layout: 'horizontal' }, + }, + + // ── Single tile, pre-formatted string ───────────────────────────── + { + title: 'KPI: Single tile (pre-formatted string)', + description: + 'Agent formatted upstream. Template renders verbatim.', + tags: ['gallery', 'bi', 'kpi', 'preformatted'], + chartType: 'KPI Card', + data: [{ Metric: 'Revenue', Display: '$1.18M' }], + fields: [makeField('Metric'), makeField('Display')], + metadata: { + Metric: { type: Type.String, semanticType: 'Category', levels: [] }, + Display: { type: Type.String, semanticType: 'Category', levels: [] }, + }, + encodingMap: { + metric: makeEncodingItem('Metric'), + value: makeEncodingItem('Display'), + }, + chartProperties: { layout: 'horizontal' }, + }, + + // ── Multi-tile, numeric ─────────────────────────────────────────── + { + title: 'KPI: Multi-tile (Numeric quantities)', + description: + 'Four tiles, plain numeric values with default `toLocaleString`.', + tags: ['gallery', 'bi', 'kpi', 'multi-metric'], + chartType: 'KPI Card', + data: [ + { Metric: 'Active Users', Value: 12_402 }, + { Metric: 'New Signups', Value: 1_182 }, + { Metric: 'Churn', Value: 214 }, + { Metric: 'Power Users', Value: 876 }, + ], + fields: [makeField('Metric'), makeField('Value')], + metadata: { + Metric: { type: Type.String, semanticType: 'Category', levels: [] }, + Value: { type: Type.Number, semanticType: 'Quantity', levels: [] }, + }, + encodingMap: { + metric: makeEncodingItem('Metric'), + value: makeEncodingItem('Value'), + }, + chartProperties: { layout: 'horizontal' }, + }, + + // ── Multi-tile, pre-formatted heterogeneous ─────────────────────── + { + title: 'KPI: Multi-tile 
(heterogeneous, pre-formatted)', + description: + 'Per-tile units handled by the agent formatting each `Value` ' + + 'upstream. Template renders verbatim.', + tags: ['gallery', 'bi', 'kpi', 'multi-metric', 'preformatted'], + chartType: 'KPI Card', + data: [ + { Metric: 'Revenue', Value: '$1.23M' }, + { Metric: 'Orders', Value: '5,682' }, + { Metric: 'Avg Cart', Value: '$217.45' }, + { Metric: 'Refunds', Value: '312' }, + ], + fields: [makeField('Metric'), makeField('Value')], + metadata: { + Metric: { type: Type.String, semanticType: 'Category', levels: [] }, + Value: { type: Type.String, semanticType: 'Category', levels: [] }, + }, + encodingMap: { + metric: makeEncodingItem('Metric'), + value: makeEncodingItem('Value'), + }, + chartProperties: { layout: 'horizontal' }, + }, + + // ── Multi-tile with numeric goal (progress bar) ─────────────────── + { + title: 'KPI: With goal (progress bar)', + description: + 'Both value and goal are numeric → small "X% of goal" line + ' + + 'progress bar beneath the big number.', + tags: ['gallery', 'bi', 'kpi', 'goal', 'progress'], + chartType: 'KPI Card', + data: [ + { Metric: 'Q1 Revenue', Value: 1_184_320, Goal: 1_500_000 }, + { Metric: 'Signups', Value: 1_182, Goal: 2_000 }, + { Metric: 'NPS', Value: 47, Goal: 60 }, + { Metric: 'Stretch', Value: 128, Goal: 100 }, // overshoot + ], + fields: [makeField('Metric'), makeField('Value'), makeField('Goal')], + metadata: { + Metric: { type: Type.String, semanticType: 'Category', levels: [] }, + Value: { type: Type.Number, semanticType: 'Quantity', levels: [] }, + Goal: { type: Type.Number, semanticType: 'Quantity', levels: [] }, + }, + encodingMap: { + metric: makeEncodingItem('Metric'), + value: makeEncodingItem('Value'), + goal: makeEncodingItem('Goal'), + }, + chartProperties: { layout: 'horizontal' }, + }, + + // ── Multi-tile with string goal (no progress bar) ───────────────── + { + title: 'KPI: With goal (string, no progress bar)', + description: + 'String value or string 
goal → progress bar suppressed; ' + + 'goal renders as a "Goal: …" line.', + tags: ['gallery', 'bi', 'kpi', 'goal'], + chartType: 'KPI Card', + data: [ + { Metric: 'Revenue', Value: '$1.18M', Goal: '$1.50M' }, + { Metric: 'Headcount',Value: '142', Goal: '160' }, + ], + fields: [makeField('Metric'), makeField('Value'), makeField('Goal')], + metadata: { + Metric: { type: Type.String, semanticType: 'Category', levels: [] }, + Value: { type: Type.String, semanticType: 'Category', levels: [] }, + Goal: { type: Type.String, semanticType: 'Category', levels: [] }, + }, + encodingMap: { + metric: makeEncodingItem('Metric'), + value: makeEncodingItem('Value'), + goal: makeEncodingItem('Goal'), + }, + chartProperties: { layout: 'horizontal' }, + }, + + // ── Vertical layout ─────────────────────────────────────────────── + { + title: 'KPI: Vertical layout', + description: 'Same data as multi-tile numeric, stacked vertically.', + tags: ['gallery', 'bi', 'kpi', 'multi-metric', 'vertical'], + chartType: 'KPI Card', + data: [ + { Metric: 'Active Users', Value: 12_402 }, + { Metric: 'New Signups', Value: 1_182 }, + { Metric: 'Churn', Value: 214 }, + { Metric: 'Power Users', Value: 876 }, + ], + fields: [makeField('Metric'), makeField('Value')], + metadata: { + Metric: { type: Type.String, semanticType: 'Category', levels: [] }, + Value: { type: Type.Number, semanticType: 'Quantity', levels: [] }, + }, + encodingMap: { + metric: makeEncodingItem('Metric'), + value: makeEncodingItem('Value'), + }, + chartProperties: { layout: 'vertical' }, + }, + ]; +} diff --git a/src/lib/agents-chart/gallery/index.ts b/src/lib/agents-chart/gallery/index.ts index ac923962..a77b5b11 100644 --- a/src/lib/agents-chart/gallery/index.ts +++ b/src/lib/agents-chart/gallery/index.ts @@ -25,6 +25,8 @@ export { genGalleryRegionalSurveyRoseTests, } from './regional-survey-tests'; +export { genGalleryKpiCardTests } from './bi-tests'; + /** Keys registered in `TEST_GENERATORS` for the Regional Survey gallery 
tab. */ export const GALLERY_REGIONAL_SURVEY_GENERATOR_KEYS = [ 'Gallery: Scatter', diff --git a/src/lib/agents-chart/gofish/assemble.ts b/src/lib/agents-chart/gofish/assemble.ts index 0b569742..470fd6e5 100644 --- a/src/lib/agents-chart/gofish/assemble.ts +++ b/src/lib/agents-chart/gofish/assemble.ts @@ -430,6 +430,7 @@ export function assembleGoFish(input: ChartAssemblyInput): GoFishSpec { channelSemantics, layout: layoutResult, table: values, + fullTable: convertedData, resolvedEncodings, encodings, chartProperties, diff --git a/src/lib/agents-chart/gofish/templates/area.ts b/src/lib/agents-chart/gofish/templates/area.ts index 9744ac3e..7c742a0e 100644 --- a/src/lib/agents-chart/gofish/templates/area.ts +++ b/src/lib/agents-chart/gofish/templates/area.ts @@ -10,7 +10,7 @@ * chart(select("points")).mark(area({opacity: 0.8})), * ]) * - * Multi-series (stacked area) also needs group() — marked TODO. + * Multi-series (stacked area) uses group() for per-series area rendering. */ import { ChartTemplateDef } from '../../core/types'; diff --git a/src/lib/agents-chart/gofish/templates/line.ts b/src/lib/agents-chart/gofish/templates/line.ts index 903db60b..3aa0fdb8 100644 --- a/src/lib/agents-chart/gofish/templates/line.ts +++ b/src/lib/agents-chart/gofish/templates/line.ts @@ -11,7 +11,7 @@ * ]) * * For categorical x-axis we use spread() instead of scatter(). - * Multi-series (color) requires group() which needs layer naming — marked TODO. + * Multi-series (color) uses group() to render one line per series. 
*/ import { ChartTemplateDef } from '../../core/types'; diff --git a/src/lib/agents-chart/test-data/gallery-tree.ts b/src/lib/agents-chart/test-data/gallery-tree.ts index 3fa8dd96..879668dd 100644 --- a/src/lib/agents-chart/test-data/gallery-tree.ts +++ b/src/lib/agents-chart/test-data/gallery-tree.ts @@ -69,7 +69,6 @@ const VEGALITE_CHART_TYPES: GalleryPage[] = [ { id: 'histogram', label: 'Histogram', generatorKeys: ['Histogram'] }, { id: 'heatmap', label: 'Heatmap', generatorKeys: ['Heatmap'] }, { id: 'line-chart', label: 'Line Chart', generatorKeys: ['Line Chart'] }, - { id: 'dotted-line', label: 'Dotted Line Chart', generatorKeys: ['Dotted Line Chart'] }, { id: 'boxplot', label: 'Boxplot', generatorKeys: ['Boxplot'] }, { id: 'pie-chart', label: 'Pie Chart', generatorKeys: ['Pie Chart'] }, { id: 'ranged-dot-plot', label: 'Ranged Dot Plot', generatorKeys: ['Ranged Dot Plot'] }, @@ -80,10 +79,12 @@ const VEGALITE_CHART_TYPES: GalleryPage[] = [ { id: 'bump-chart', label: 'Bump Chart', generatorKeys: ['Bump Chart'] }, { id: 'candlestick', label: 'Candlestick Chart', generatorKeys: ['Candlestick Chart'] }, { id: 'waterfall', label: 'Waterfall Chart', generatorKeys: ['Waterfall Chart'] }, + { id: 'bar-table', label: 'Bar Table', generatorKeys: ['Bar Table'] }, { id: 'strip-plot', label: 'Strip Plot', generatorKeys: ['Strip Plot'] }, { id: 'radar-chart', label: 'Radar Chart', generatorKeys: ['Radar Chart'] }, { id: 'pyramid-chart', label: 'Pyramid Chart', generatorKeys: ['Pyramid Chart'] }, { id: 'rose-chart', label: 'Rose Chart', generatorKeys: ['Rose Chart'] }, + { id: 'kpi-card', label: 'KPI Card', generatorKeys: ['Gallery: KPI Card'] }, { id: 'custom-charts', label: 'Custom Charts', generatorKeys: ['Custom Charts'] }, ].map(p => ({ ...p, render: 'single' as const, library: 'vegalite' as const })); diff --git a/src/lib/agents-chart/test-data/index.ts b/src/lib/agents-chart/test-data/index.ts index 3208f991..93f5f5ef 100644 --- 
a/src/lib/agents-chart/test-data/index.ts +++ b/src/lib/agents-chart/test-data/index.ts @@ -20,11 +20,11 @@ export { genScatterTests, genRegressionTests } from './scatter-tests'; export { genBarTests, genStackedBarTests, genGroupedBarTests } from './bar-tests'; export { genHistogramTests, genBoxplotTests, genDensityTests, genStripPlotTests } from './distribution-tests'; export { genLineTests } from './line-tests'; -export { genDottedLineTests, genBumpChartTests } from './line-area-tests'; +export { genBumpChartTests } from './line-area-tests'; export { genAreaTests, genStreamgraphTests } from './area-tests'; export { genHeatmapTests, genPieTests, genRangedDotPlotTests, genLollipopTests, - genCustomTests, genWaterfallTests, genCandlestickTests, genRadarTests, genPyramidTests, + genCustomTests, genWaterfallTests, genBarTableTests, genCandlestickTests, genRadarTests, genPyramidTests, genRoseTests, } from './specialized-tests'; export { FACET_SIZES, DISCRETE_SIZES, genFacetColumnTests, genFacetRowTests, genFacetColRowTests, genFacetSmallTests, genFacetWrapTests, genFacetClipTests, genFacetOverflowedColTests, genFacetOverflowedColRowTests, genFacetOverflowedRowTests, genFacetDenseLineTests } from './facet-tests'; @@ -86,11 +86,11 @@ import { genScatterTests, genRegressionTests } from './scatter-tests'; import { genBarTests, genStackedBarTests, genGroupedBarTests } from './bar-tests'; import { genHistogramTests, genBoxplotTests, genDensityTests, genStripPlotTests } from './distribution-tests'; import { genLineTests } from './line-tests'; -import { genDottedLineTests, genBumpChartTests } from './line-area-tests'; +import { genBumpChartTests } from './line-area-tests'; import { genAreaTests, genStreamgraphTests } from './area-tests'; import { genHeatmapTests, genPieTests, genRangedDotPlotTests, genLollipopTests, - genCustomTests, genWaterfallTests, genCandlestickTests, genRadarTests, genPyramidTests, + genCustomTests, genWaterfallTests, genBarTableTests, 
genCandlestickTests, genRadarTests, genPyramidTests, genRoseTests, } from './specialized-tests'; import { genFacetColumnTests, genFacetRowTests, genFacetColRowTests, genFacetSmallTests, genFacetWrapTests, genFacetClipTests, genFacetOverflowedColTests, genFacetOverflowedColRowTests, genFacetOverflowedRowTests, genFacetDenseLineTests } from './facet-tests'; @@ -115,6 +115,7 @@ import { genGalleryRegionalSurveyRadarTests, genGalleryRegionalSurveyRoseTests, } from '../gallery/regional-survey-tests'; +import { genGalleryKpiCardTests } from '../gallery/bi-tests'; import { genOmniVizGroupedBarTests, genOmniVizLineTests, @@ -135,7 +136,6 @@ export const TEST_GENERATORS: Record TestCase[]> = { 'Histogram': genHistogramTests, 'Heatmap': genHeatmapTests, 'Line Chart': genLineTests, - 'Dotted Line Chart': genDottedLineTests, 'Bump Chart': genBumpChartTests, 'Boxplot': genBoxplotTests, 'Pie Chart': genPieTests, @@ -146,6 +146,7 @@ export const TEST_GENERATORS: Record TestCase[]> = { 'Density Plot': genDensityTests, 'Candlestick Chart': genCandlestickTests, 'Waterfall Chart': genWaterfallTests, + 'Bar Table': genBarTableTests, 'Strip Plot': genStripPlotTests, 'Radar Chart': genRadarTests, 'Pyramid Chart': genPyramidTests, @@ -219,6 +220,7 @@ export const TEST_GENERATORS: Record TestCase[]> = { 'Gallery: Histogram': genGalleryRegionalSurveyHistogramTests, 'Gallery: Radar': genGalleryRegionalSurveyRadarTests, 'Gallery: Rose': genGalleryRegionalSurveyRoseTests, + 'Gallery: KPI Card': genGalleryKpiCardTests, 'Omni: Line': genOmniVizLineTests, 'Omni: Grouped Bar': genOmniVizGroupedBarTests, 'Omni: Waterfall': genOmniVizWaterfallTests, diff --git a/src/lib/agents-chart/test-data/line-area-tests.ts b/src/lib/agents-chart/test-data/line-area-tests.ts index 204e5af6..57c3eee5 100644 --- a/src/lib/agents-chart/test-data/line-area-tests.ts +++ b/src/lib/agents-chart/test-data/line-area-tests.ts @@ -8,32 +8,6 @@ import { seededRandom, genDates, genYears, genMonths, genCategories } from 
'./ge // Line Chart tests have been moved to line-tests.ts (matrix-driven). // Area Chart & Streamgraph tests have been moved to area-tests.ts (matrix-driven). -// ------ Dotted Line Chart ------ -export function genDottedLineTests(): TestCase[] { - const tests: TestCase[] = []; - const rand = seededRandom(650); - - { - const months = genMonths(12); - const data = months.map(m => ({ Month: m, Temperature: Math.round(10 + rand() * 25) })); - tests.push({ - title: 'Ordinal × Quant (dotted line)', - description: 'Monthly temperatures with point markers', - tags: ['ordinal', 'quantitative', 'small'], - chartType: 'Dotted Line Chart', - data, - fields: [makeField('Month'), makeField('Temperature')], - metadata: { - Month: { type: Type.String, semanticType: 'Month', levels: months }, - Temperature: { type: Type.Number, semanticType: 'Temperature', levels: [] }, - }, - encodingMap: { x: makeEncodingItem('Month'), y: makeEncodingItem('Temperature') }, - }); - } - - return tests; -} - // ------ Bump Chart ------ export function genBumpChartTests(): TestCase[] { const tests: TestCase[] = []; diff --git a/src/lib/agents-chart/test-data/specialized-tests.ts b/src/lib/agents-chart/test-data/specialized-tests.ts index 4f66ad61..b7708215 100644 --- a/src/lib/agents-chart/test-data/specialized-tests.ts +++ b/src/lib/agents-chart/test-data/specialized-tests.ts @@ -4,7 +4,7 @@ import { Type } from '../../../data/types'; import { Channel, EncodingItem } from '../../../components/ComponentType'; import { TestCase, makeField, makeEncodingItem, buildMetadata } from './types'; -import { seededRandom, genDates, genMonths, genCategories } from './generators'; +import { seededRandom, genDates, genMonths, genCategories, genRandomNames } from './generators'; // ------ Heatmap ------ export function genHeatmapTests(): TestCase[] { @@ -627,6 +627,547 @@ export function genWaterfallTests(): TestCase[] { return tests; } +// ------ Bar Table ------ +export function genBarTableTests(): TestCase[] { 
+ const tests: TestCase[] = []; + + // 1. Mirrors the Chinese-BI screenshot: top categories contributing to total GMV. + { + const data = [ + { Category: '电子产品', Contribution: 331207.65 }, + { Category: '自行车', Contribution: 89774.50 }, + { Category: '香水和古龙水', Contribution: 57668.30 }, + { Category: '服装', Contribution: 48210.10 }, + { Category: '家具', Contribution: 32104.55 }, + { Category: '玩具', Contribution: 20157.80 }, + { Category: '其他', Contribution: 12998.40 }, + ]; + tests.push({ + title: 'GMV Contribution (7 categories)', + description: 'Ranked contribution table — bar + value + % share, like Chinese BI dashboards', + tags: ['nominal', 'small', 'gradient'], + chartType: 'Bar Table', + data, + fields: [makeField('Category'), makeField('Contribution')], + metadata: { + Category: { type: Type.String, semanticType: 'Category', levels: data.map(d => d.Category) }, + Contribution: { type: Type.Number, semanticType: 'Amount', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Category'), x: makeEncodingItem('Contribution') }, + }); + } + + // 2. Sales by product — larger N, English labels + { + const products = genCategories('Product', 12); + const rand = seededRandom(2024); + const data = products.map(p => ({ Product: p, Sales: Math.round(rand() * 9000 + 500) })); + tests.push({ + title: 'Sales by Product (12 rows)', + description: 'Medium N — verify row density / band sizing', + tags: ['nominal', 'medium', 'gradient'], + chartType: 'Bar Table', + data, + fields: [makeField('Product'), makeField('Sales')], + metadata: { + Product: { type: Type.String, semanticType: 'Category', levels: products }, + Sales: { type: Type.Number, semanticType: 'Quantity', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Product'), x: makeEncodingItem('Sales') }, + }); + } + + // 3. 
With color grouping override (region) instead of gradient-by-value + { + const regions = ['North', 'South', 'East', 'West']; + const products = genCategories('SKU', 8); + const rand = seededRandom(77); + const data = products.map((p, i) => ({ + SKU: p, + Revenue: Math.round(rand() * 5000 + 1000), + Region: regions[i % regions.length], + })); + tests.push({ + title: 'Revenue by SKU, colored by Region', + description: 'color channel overrides default gradient — categorical hue per row', + tags: ['nominal', 'color', 'small'], + chartType: 'Bar Table', + data, + fields: [makeField('SKU'), makeField('Revenue'), makeField('Region')], + metadata: { + SKU: { type: Type.String, semanticType: 'Category', levels: products }, + Revenue: { type: Type.Number, semanticType: 'Quantity', levels: [] }, + Region: { type: Type.String, semanticType: 'Category', levels: regions }, + }, + encodingMap: { + y: makeEncodingItem('SKU'), + x: makeEncodingItem('Revenue'), + color: makeEncodingItem('Region'), + }, + }); + } + + // 4. Non-additive measure — % column should be auto-suppressed. + // Field name "Avg Rating" + values that don't represent a share-of-whole. 
+ { + const data = [ + { Product: 'Alpha', 'Avg Rating': 4.7 }, + { Product: 'Beta', 'Avg Rating': 4.4 }, + { Product: 'Gamma', 'Avg Rating': 4.1 }, + { Product: 'Delta', 'Avg Rating': 3.9 }, + { Product: 'Epsilon', 'Avg Rating': 3.6 }, + { Product: 'Zeta', 'Avg Rating': 3.2 }, + ]; + tests.push({ + title: 'Avg Rating (% auto-hidden)', + description: 'Score is intensive (aggRole≠additive) → template auto-suppresses the % column', + tags: ['nominal', 'small', 'score'], + chartType: 'Bar Table', + data, + fields: [makeField('Product'), makeField('Avg Rating')], + metadata: { + Product: { type: Type.String, semanticType: 'Category', levels: data.map(d => d.Product) }, + 'Avg Rating': { type: Type.Number, semanticType: 'Score', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Product'), x: makeEncodingItem('Avg Rating') }, + }); + } + + // 5. Mixed-sign values (budget variance) — % auto-suppressed because + // "share of a whole" is ill-defined when signs mix. + // Profit semantic type → diverging palette anchored at 0. + { + const data = [ + { Department: 'Engineering', Variance: 120 }, + { Department: 'Sales', Variance: -45 }, + { Department: 'Marketing', Variance: -80 }, + { Department: 'Operations', Variance: 35 }, + { Department: 'HR', Variance: -20 }, + { Department: 'Finance', Variance: 15 }, + ]; + tests.push({ + title: 'Budget Variance (mixed signs → diverging palette)', + description: 'Profit (signed-additive, diverging:conditional) → palette anchored at 0; % auto-hidden', + tags: ['nominal', 'small', 'diverging', 'mixed-sign'], + chartType: 'Bar Table', + data, + fields: [makeField('Department'), makeField('Variance')], + metadata: { + Department: { type: Type.String, semanticType: 'Category', levels: data.map(d => d.Department) }, + Variance: { type: Type.Number, semanticType: 'Profit', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Department'), x: makeEncodingItem('Variance') }, + }); + } + + // 6. 
x is already a Percentage — values shown as-is (0.92, 0.87, …). + // No reformatting unless the framework explicitly resolves a + // format (it only does so when intrinsicDomain disambiguates 0–1 + // vs 0–100). The redundant % column is still auto-hidden. + { + const data = [ + { Team: 'Alpha', completion_rate: 0.92 }, + { Team: 'Beta', completion_rate: 0.87 }, + { Team: 'Gamma', completion_rate: 0.78 }, + { Team: 'Delta', completion_rate: 0.65 }, + { Team: 'Epsilon', completion_rate: 0.54 }, + ]; + tests.push({ + title: 'Completion Rate (raw 0–1 values shown as-is)', + description: 'Percentage without intrinsicDomain → raw values preserved; % column auto-hidden (redundant)', + tags: ['nominal', 'small', 'percentage'], + chartType: 'Bar Table', + data, + fields: [makeField('Team'), makeField('completion_rate')], + metadata: { + Team: { type: Type.String, semanticType: 'Category', levels: data.map(d => d.Team) }, + completion_rate: { type: Type.Number, semanticType: 'Percentage', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Team'), x: makeEncodingItem('completion_rate') }, + }); + } + + // ── Stress tests ──────────────────────────────────────────────── + + // 7. Many categories (50 unique rows) — triggers Top-N + "Others" + // rollup (default maxRows=20 → displays top 19 + Others(+31)). 
+ { + const rand = seededRandom(909); + const employees = genRandomNames(50, 909); + const data = employees.map(e => ({ Employee: e, Sales: Math.round(rand() * 12000 + 100) })); + tests.push({ + title: 'Many rows (50 employees → Others rollup)', + description: '50 unique categories → top 19 kept, remaining 31 rolled into "Others (+31)"', + tags: ['nominal', 'large', 'rollup', 'stress'], + chartType: 'Bar Table', + data, + fields: [makeField('Employee'), makeField('Sales')], + metadata: { + Employee: { type: Type.String, semanticType: 'Name', levels: employees }, + Sales: { type: Type.Number, semanticType: 'Amount', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Employee'), x: makeEncodingItem('Sales') }, + }); + } + + // 7b. Same 50 rows but maxRows=0 — disables rollup, exercises the + // "render all rows" code path (no panel-level data override). + { + const rand = seededRandom(909); + const employees = genRandomNames(50, 909); + const data = employees.map(e => ({ Employee: e, Sales: Math.round(rand() * 12000 + 100) })); + tests.push({ + title: 'Many rows (50, rollup disabled)', + description: 'maxRows=0 → render all 50 rows; tests row density at scale', + tags: ['nominal', 'large', 'no-rollup', 'stress'], + chartType: 'Bar Table', + data, + fields: [makeField('Employee'), makeField('Sales')], + metadata: { + Employee: { type: Type.String, semanticType: 'Name', levels: employees }, + Sales: { type: Type.Number, semanticType: 'Amount', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Employee'), x: makeEncodingItem('Sales') }, + chartProperties: { maxRows: 0 }, + }); + } + + // 8. Very long category names — verifies labelLimit + the + // `labelAlign:'left' + labelPadding` trick under truncation. 
+ { + const longNames = [ + 'Strategic Cloud Infrastructure & Platform Modernization Initiative', + 'Customer-Facing Conversational AI Assistant Rollout (Phase II)', + 'Cross-Functional Data Governance and Quality Improvement Program', + 'Next-Generation Identity & Access Management Migration', + 'Enterprise-Wide Endpoint Detection and Response Deployment', + 'Global Privacy Compliance & Regional Data-Residency Project', + 'Supply Chain Visibility and Real-Time Telemetry Initiative', + ]; + const rand = seededRandom(411); + const data = longNames.map(n => ({ Initiative: n, Budget: Math.round(rand() * 800000 + 50000) })); + tests.push({ + title: 'Very long category labels (truncation)', + description: 'Long y-axis labels → exercises labelLimit/labelPadding clamp at 220px', + tags: ['nominal', 'small', 'long-labels', 'stress'], + chartType: 'Bar Table', + data, + fields: [makeField('Initiative'), makeField('Budget')], + metadata: { + Initiative: { type: Type.String, semanticType: 'Category', levels: longNames }, + Budget: { type: Type.Number, semanticType: 'Amount', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Initiative'), x: makeEncodingItem('Budget') }, + }); + } + + // 9. Huge numbers (billions) — exercises width measurement for + // very wide value strings; ensures bar panel doesn't collapse. 
+ { + const rand = seededRandom(7); + const countries = ['United States', 'China', 'Japan', 'Germany', 'India', 'United Kingdom', 'France', 'Italy', 'Brazil', 'Canada']; + const data = countries.map((c, i) => ({ + Country: c, + GDP: Math.round((rand() + 0.5) * 5e12) - i * 2e11, + })); + tests.push({ + title: 'Huge values (GDP in raw dollars)', + description: 'Trillion-scale numbers → wide value column, panel-width budget pressure', + tags: ['nominal', 'medium', 'huge-values', 'stress'], + chartType: 'Bar Table', + data, + fields: [makeField('Country'), makeField('GDP')], + metadata: { + Country: { type: Type.String, semanticType: 'Country', levels: countries }, + GDP: { type: Type.Number, semanticType: 'Amount', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Country'), x: makeEncodingItem('GDP') }, + }); + } + + // 10. Tiny values (sub-1) — exercises raw-as-is display for small + // decimals; no formatting should kick in. + { + const data = [ + { Sensor: 'A', Reading: 0.0023 }, + { Sensor: 'B', Reading: 0.0019 }, + { Sensor: 'C', Reading: 0.0017 }, + { Sensor: 'D', Reading: 0.0011 }, + { Sensor: 'E', Reading: 0.0008 }, + { Sensor: 'F', Reading: 0.0005 }, + ]; + tests.push({ + title: 'Tiny decimal values', + description: 'Sub-1 measurements → ensure raw values shown, no spurious rounding', + tags: ['nominal', 'small', 'tiny-values', 'stress'], + chartType: 'Bar Table', + data, + fields: [makeField('Sensor'), makeField('Reading')], + metadata: { + Sensor: { type: Type.String, semanticType: 'Category', levels: data.map(d => d.Sensor) }, + Reading: { type: Type.Number, semanticType: 'Quantity', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Sensor'), x: makeEncodingItem('Reading') }, + }); + } + + // 11. Single-row edge case — % must auto-hide (always 100%). 
+ { + const data = [{ Region: 'APAC', Revenue: 4_250_000 }]; + tests.push({ + title: 'Single row (% auto-hidden)', + description: 'n=1 → "% of total" is always 100%, column should be suppressed', + tags: ['nominal', 'edge', 'single-row', 'stress'], + chartType: 'Bar Table', + data, + fields: [makeField('Region'), makeField('Revenue')], + metadata: { + Region: { type: Type.String, semanticType: 'Region', levels: ['APAC'] }, + Revenue: { type: Type.Number, semanticType: 'Amount', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Region'), x: makeEncodingItem('Revenue') }, + }); + } + + // 12. All-zero values — total=0 should auto-hide % (divide-by-zero). + { + const data = [ + { Quarter: 'Q1', Profit: 0 }, + { Quarter: 'Q2', Profit: 0 }, + { Quarter: 'Q3', Profit: 0 }, + { Quarter: 'Q4', Profit: 0 }, + ]; + tests.push({ + title: 'All zeros (% auto-hidden)', + description: 'total=0 → divide-by-zero guard suppresses % column; bar panel collapses to 0-width but spec stays valid', + tags: ['nominal', 'edge', 'zero-total', 'stress'], + chartType: 'Bar Table', + data, + fields: [makeField('Quarter'), makeField('Profit')], + metadata: { + Quarter: { type: Type.String, semanticType: 'Quarter', levels: ['Q1', 'Q2', 'Q3', 'Q4'] }, + Profit: { type: Type.Number, semanticType: 'Profit', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Quarter'), x: makeEncodingItem('Profit') }, + }); + } + + // 13. Mixed CJK + Latin labels — exercises CJK width heuristic. 
+ { + const data = [ + { 项目: '人工智能 AI Platform', 收入: 5_420_000 }, + { 项目: '云计算 Cloud Services', 收入: 3_180_000 }, + { 项目: '数据分析 Analytics', 收入: 2_650_000 }, + { 项目: '物联网 IoT Solutions', 收入: 1_840_000 }, + { 项目: '网络安全 Security', 收入: 1_220_000 }, + { 项目: '区块链 Blockchain Lab', 收入: 480_000 }, + ]; + tests.push({ + title: 'Mixed CJK + Latin labels', + description: 'Bi-script category labels → tests CJK 2× width heuristic in label-column sizing', + tags: ['nominal', 'small', 'cjk', 'mixed-script', 'stress'], + chartType: 'Bar Table', + data, + fields: [makeField('项目'), makeField('收入')], + metadata: { + '项目': { type: Type.String, semanticType: 'Category', levels: data.map(d => d.项目) }, + '收入': { type: Type.Number, semanticType: 'Amount', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('项目'), x: makeEncodingItem('收入') }, + }); + } + + // 14. Power-law (Pareto) distribution — verifies the top-1 bar + // doesn't visually crush the rest into invisibility. + { + const cats = genCategories('Vendor', 15); + const rand = seededRandom(31); + const data = cats.map((c, i) => ({ + Vendor: c, + Spend: Math.round(Math.pow(0.55, i) * 1_000_000 + rand() * 1000), + })); + tests.push({ + title: 'Power-law distribution (top dominates)', + description: 'Top row ~1M vs tail rows ~1K → bar lengths span about 3 orders of magnitude', + tags: ['nominal', 'medium', 'power-law', 'stress'], + chartType: 'Bar Table', + data, + fields: [makeField('Vendor'), makeField('Spend')], + metadata: { + Vendor: { type: Type.String, semanticType: 'Category', levels: cats }, + Spend: { type: Type.Number, semanticType: 'Amount', levels: [] }, + }, + encodingMap: { y: makeEncodingItem('Vendor'), x: makeEncodingItem('Spend') }, + }); + } + + // 15. Multi-row-per-category with color grouping (stacked bars). + // Regression test: text panels must aggregate per-category, not + // render one mark per input row (which would overlap and show + // per-row ≈0% percents instead of category totals).
+ { + const months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']; + const channels = ['Music','Games','Entertainment','Education','People','Sports','Film','News','Comedy']; + const rand = seededRandom(1234); + const data: any[] = []; + for (const m of months) { + for (const c of channels) { + data.push({ + created_month: m, + channel_type: c, + views: Math.round(rand() * 5e8 + 1e7), + }); + } + } + tests.push({ + title: 'Stacked: views by month × channel_type', + description: 'Multi-row per y-category + color partition → text panels must show per-category totals (regression)', + tags: ['nominal', 'medium', 'stacked', 'multi-row', 'stress'], + chartType: 'Bar Table', + data, + fields: [makeField('created_month'), makeField('channel_type'), makeField('views')], + metadata: { + created_month: { type: Type.String, semanticType: 'Month', levels: months }, + channel_type: { type: Type.String, semanticType: 'Category', levels: channels }, + views: { type: Type.Number, semanticType: 'Amount', levels: [] }, + }, + encodingMap: { + y: makeEncodingItem('created_month'), + x: makeEncodingItem('views'), + color: makeEncodingItem('channel_type'), + }, + }); + } + + // 16. Column-faceted launch leaders with color = facet field and + // % of total enabled. Mirrors the app scenario where hconcat + // Bar Table needs top-level faceting and per-facet denominators. 
+ { + const agencyTypes = ['private', 'startup', 'state']; + const data = [ + { Agency: 'Arianespace', 'Agency Type': 'private', 'Launch Count': 258 }, + { Agency: 'ILS-K', 'Agency Type': 'private', 'Launch Count': 97 }, + { Agency: 'ULA/LMA', 'Agency Type': 'private', 'Launch Count': 70 }, + { Agency: 'MDSSC', 'Agency Type': 'private', 'Launch Count': 62 }, + { Agency: 'OSC-Fairfax', 'Agency Type': 'private', 'Launch Count': 60 }, + { Agency: 'ULA/Boeing', 'Agency Type': 'private', 'Launch Count': 58 }, + { Agency: 'Boeing', 'Agency Type': 'private', 'Launch Count': 56 }, + { Agency: 'LMA', 'Agency Type': 'private', 'Launch Count': 43 }, + { Agency: 'SpaceX', 'Agency Type': 'startup', 'Launch Count': 65 }, + { Agency: 'Rocket Lab', 'Agency Type': 'startup', 'Launch Count': 2 }, + { Agency: 'RVSN', 'Agency Type': 'state', 'Launch Count': 1528 }, + { Agency: 'UNKS', 'Agency Type': 'state', 'Launch Count': 904 }, + { Agency: 'NASA', 'Agency Type': 'state', 'Launch Count': 469 }, + { Agency: 'USAF', 'Agency Type': 'state', 'Launch Count': 388 }, + { Agency: 'AFSC', 'Agency Type': 'state', 'Launch Count': 247 }, + { Agency: 'VKS RVSN', 'Agency Type': 'state', 'Launch Count': 200 }, + { Agency: 'CALT', 'Agency Type': 'state', 'Launch Count': 181 }, + { Agency: 'Roskosmos', 'Agency Type': 'state', 'Launch Count': 128 }, + ]; + tests.push({ + title: 'Faceted launch leaders (% within agency type)', + description: 'column + color both use Agency Type; % of total should be computed within each facet', + tags: ['nominal', 'facet', 'column', 'color', 'percentage', 'regression'], + chartType: 'Bar Table', + data, + fields: [makeField('Agency'), makeField('Agency Type'), makeField('Launch Count')], + metadata: { + Agency: { type: Type.String, semanticType: 'Category', levels: data.map(d => d.Agency) }, + 'Agency Type': { type: Type.String, semanticType: 'Category', levels: agencyTypes }, + 'Launch Count': { type: Type.Number, semanticType: 'Quantity', levels: [] }, + }, + 
encodingMap: { + y: makeEncodingItem('Agency'), + x: makeEncodingItem('Launch Count'), + color: makeEncodingItem('Agency Type'), + column: makeEncodingItem('Agency Type'), + }, + chartProperties: { showPercent: true }, + }); + } + + // 17. Many column facets with compact row counts. Exercises column + // wrapping, adaptive subplot height, and facet-aware fonts. + { + const markets = ['Americas', 'Europe', 'Middle East', 'Africa', 'East Asia', 'Oceania', 'South Asia']; + const agencies = ['Atlas', 'Beacon', 'Cosmos', 'Delta']; + const rand = seededRandom(8181); + const data: any[] = []; + for (const market of markets) { + for (const agency of agencies) { + data.push({ + Market: market, + Agency: `${agency} ${market}`, + Launches: Math.round(rand() * 180 + 20), + }); + } + } + tests.push({ + title: 'Wrapped market facets (7 panels)', + description: 'Seven column facets with four rows each — mini-table height and font size should adapt after wrapping', + tags: ['nominal', 'facet', 'wrap', 'small-multiple', 'layout'], + chartType: 'Bar Table', + data, + fields: [makeField('Market'), makeField('Agency'), makeField('Launches')], + metadata: { + Market: { type: Type.String, semanticType: 'Category', levels: markets }, + Agency: { type: Type.String, semanticType: 'Category', levels: data.map(d => d.Agency) }, + Launches: { type: Type.Number, semanticType: 'Quantity', levels: [] }, + }, + encodingMap: { + y: makeEncodingItem('Agency'), + x: makeEncodingItem('Launches'), + column: makeEncodingItem('Market'), + }, + chartProperties: { showPercent: true }, + }); + } + + // 18. Row + column faceting. Ensures Bar Table's hconcat spec also + // hoists correctly for two-dimensional facet grids. 
+ { + const regions = ['US', 'EU']; + const eras = ['Historic', 'Recent']; + const agencies = ['National', 'Commercial', 'Defense']; + const data: any[] = []; + for (const era of eras) { + for (const region of regions) { + for (let i = 0; i < agencies.length; i++) { + data.push({ + Era: era, + Region: region, + Agency: `${region} ${agencies[i]}`, + Missions: (era === 'Historic' ? 120 : 70) + i * 35 + (region === 'US' ? 40 : 0), + }); + } + } + } + tests.push({ + title: 'Row + column facets (region × era)', + description: 'Two-dimensional faceting with percent totals computed per region/era panel', + tags: ['nominal', 'facet', 'row', 'column', 'percentage', 'layout'], + chartType: 'Bar Table', + data, + fields: [makeField('Era'), makeField('Region'), makeField('Agency'), makeField('Missions')], + metadata: { + Era: { type: Type.String, semanticType: 'Category', levels: eras }, + Region: { type: Type.String, semanticType: 'Region', levels: regions }, + Agency: { type: Type.String, semanticType: 'Category', levels: data.map(d => d.Agency) }, + Missions: { type: Type.Number, semanticType: 'Quantity', levels: [] }, + }, + encodingMap: { + y: makeEncodingItem('Agency'), + x: makeEncodingItem('Missions'), + column: makeEncodingItem('Region'), + row: makeEncodingItem('Era'), + }, + chartProperties: { showPercent: true }, + }); + } + + return tests; +} + // ------ Candlestick Chart ------ export function genCandlestickTests(): TestCase[] { const tests: TestCase[] = []; diff --git a/src/lib/agents-chart/vegalite/README.md b/src/lib/agents-chart/vegalite/README.md index 5cac2cb0..e614bbe6 100644 --- a/src/lib/agents-chart/vegalite/README.md +++ b/src/lib/agents-chart/vegalite/README.md @@ -36,7 +36,7 @@ vegalite/ index.ts – template registry (27 templates, 7 categories) scatter.ts – Scatter Plot, Linear Regression bar.ts – Bar, Grouped Bar, Stacked Bar, Histogram, Lollipop, Pyramid - line.ts – Line, Dotted Line, Bump Chart + line.ts – Line, Bump Chart area.ts – Area, 
Streamgraph pie.ts – Pie Chart rose.ts – Rose Chart @@ -58,7 +58,7 @@ vegalite/ |----------|--------| | Scatter & Point | Scatter Plot, Linear Regression, Boxplot, Strip Plot | | Bar | Bar Chart, Grouped Bar, Stacked Bar, Histogram, Lollipop, Pyramid | -| Line & Area | Line Chart, Dotted Line, Bump Chart, Area Chart, Streamgraph | +| Line & Area | Line Chart, Bump Chart, Area Chart, Streamgraph | | Part-to-Whole | Pie Chart, Rose Chart, Heatmap, Waterfall Chart | | Statistical | Density Plot, Ranged Dot Plot, Radar Chart, Candlestick Chart | | Map | US Map, World Map | diff --git a/src/lib/agents-chart/vegalite/assemble.ts b/src/lib/agents-chart/vegalite/assemble.ts index 071bffe2..8ba2e559 100644 --- a/src/lib/agents-chart/vegalite/assemble.ts +++ b/src/lib/agents-chart/vegalite/assemble.ts @@ -303,6 +303,7 @@ export function assembleVegaLite(input: ChartAssemblyInput): any { channelSemantics, layout: layoutResult, table: values, + fullTable: convertedData, resolvedEncodings, encodings, chartProperties, @@ -425,7 +426,7 @@ export function assembleVegaLite(input: ChartAssemblyInput): any { // RESULT // ═══════════════════════════════════════════════════════════════════════ - const result: any = { ...vgObj, data: { values } }; + const result: any = { ...vgObj, data: vgObj.data ?? 
{ values } }; if (warnings.length > 0) { result._warnings = warnings; } @@ -739,6 +740,32 @@ function restructureFacets( facetGrid?: { columns: number; rows: number }, ): void { + const isConcatSpec = () => Array.isArray(vgObj.hconcat) || Array.isArray(vgObj.vconcat) || Array.isArray(vgObj.concat); + + const hoistConcatIntoFacet = (facetDef: any, wrapColumns?: number) => { + const childSpec: any = {}; + for (const key of ['hconcat', 'vconcat', 'concat', 'resolve', 'spacing', 'align', 'bounds', 'center'] as const) { + if (vgObj[key] !== undefined) { + childSpec[key] = vgObj[key]; + delete vgObj[key]; + } + } + if (vgObj.encoding && Object.keys(vgObj.encoding).length > 0) { + childSpec.encoding = vgObj.encoding; + delete vgObj.encoding; + } + + vgObj.facet = facetDef; + if (wrapColumns != null) { + vgObj.columns = wrapColumns; + } + vgObj.spec = childSpec; + vgObj.resolve = { + ...(vgObj.resolve || {}), + scale: { ...(vgObj.resolve?.scale || {}), y: 'independent' }, + }; + }; + if (vgObj.encoding?.column != undefined && vgObj.encoding?.row == undefined) { vgObj.encoding.facet = vgObj.encoding.column; @@ -754,6 +781,20 @@ function restructureFacets( delete vgObj.encoding.column; + // Faceting a concat spec must use top-level `facet` + child + // `spec`. Inline `encoding.facet` is ignored/invalid for + // hconcat/vconcat, which is the structure used by Bar Table. + if (isConcatSpec()) { + const facetDef = { ...vgObj.encoding.facet }; + delete facetDef.columns; + delete vgObj.encoding.facet; + if (Object.keys(vgObj.encoding).length === 0) { + delete vgObj.encoding; + } + hoistConcatIntoFacet(facetDef, numCols); + return; + } + // For layered specs, VL doesn't support encoding.facet inline — // restructure to top-level facet + spec. 
// IMPORTANT: In top-level facet mode, `columns` must be a sibling @@ -780,6 +821,24 @@ function restructureFacets( return; } + // For concat specs with row-only or column+row facets + if (isConcatSpec() && (vgObj.encoding?.column || vgObj.encoding?.row)) { + const facetDef: any = {}; + if (vgObj.encoding.column) { + facetDef.column = vgObj.encoding.column; + delete vgObj.encoding.column; + } + if (vgObj.encoding.row) { + facetDef.row = vgObj.encoding.row; + delete vgObj.encoding.row; + } + if (Object.keys(vgObj.encoding).length === 0) { + delete vgObj.encoding; + } + hoistConcatIntoFacet(facetDef); + return; + } + // For layered specs with row-only or column+row facets if (vgObj.layer && Array.isArray(vgObj.layer) && (vgObj.encoding?.column || vgObj.encoding?.row)) { diff --git a/src/lib/agents-chart/vegalite/recommendation.ts b/src/lib/agents-chart/vegalite/recommendation.ts index 23d79e70..9312e3f3 100644 --- a/src/lib/agents-chart/vegalite/recommendation.ts +++ b/src/lib/agents-chart/vegalite/recommendation.ts @@ -5,7 +5,7 @@ * Vega-Lite recommendation & adaptation wrappers. * * Extends core/recommendation.ts with VL-only chart types (Regression, - * Ranged Dot Plot, Pyramid, Lollipop, Dotted Line, Bump, Density, Waterfall, + * Ranged Dot Plot, Pyramid, Lollipop, Bump, Density, Waterfall, * Strip, US/World Map) and filters results to VL-valid channels. */ @@ -71,7 +71,6 @@ function vlGetRecommendation(chartType: string, tv: InternalTableView): Record { + // Bar tables split the plot width into 3 horizontal panels + // (bar | % | value), so they need a wider canvas than a basic + // bar chart at the same row count. We also want at least a + // moderately tall canvas so 20+ rows don't squish vertically. + // + // Estimate displayed row count after the Top-N rollup so the + // canvas is sized for what the user will actually see, not for + // the (possibly huge) raw row count. 
+ const yField = cs.y?.field; + const facetFields = [cs.column?.field, cs.row?.field].filter(Boolean) as string[]; + const rawRowCount = (() => { + if (!yField) return 0; + if (facetFields.length === 0) { + return new Set((table ?? []).map((r: any) => r[yField])).size; + } + const perFacetRows = new Map<string, Set<any>>(); + for (const r of table ?? []) { + const key = facetFields.map(f => String(r[f] ?? '')).join('\x00'); + const rows = perFacetRows.get(key) ?? new Set<any>(); + rows.add(r[yField]); + perFacetRows.set(key, rows); + } + return Math.max(0, ...Array.from(perFacetRows.values()).map(rows => rows.size)); + })(); + const maxRows = Math.max(0, Number(chartProperties?.maxRows ?? 20)); + const displayedRows = maxRows > 0 + ? Math.min(rawRowCount, maxRows) + : rawRowCount; + const minSubplotSize = displayedRows >= 30 ? 360 : 280; + + return { + axisFlags: { y: { banded: true } }, + paramOverrides: { + // Wider per-row band than the basic 20: leaves room for + // both the bar and the two text columns. + defaultBandSize: 24, + // Floor on overall subplot size — scales up when rows + // are dense so the bar column doesn't collapse below + // legibility. + minSubplotSize, + // Lengthen the continuous axis (bar) relative to the + // step height. Without this, a tall narrow canvas + // (many rows) leaves bars only a sliver wide. + targetBandAR: 280, + }, + }; + }, + instantiate: (spec, ctx) => { + const { x, y, color, column, row } = ctx.resolvedEncodings; + const config = ctx.chartProperties; + // ── Source of truth: full pre-overflow data ───────────────── + // + // We read from `ctx.fullTable` (pre-overflow) rather than + // `ctx.table` (post-overflow). The framework's filterOverflow + // step silently drops categories that don't fit pixel-budget + // assumptions — for a Bar Table, those budgets are wrong + // (we override panelHeight ourselves) AND lossy (the whole + // design is "top-N + Others", which only works if we can see + // the actual tail).
Falling back to `ctx.table` keeps the + // template usable in tests / standalone callers that don't + // populate fullTable. + const table = ctx.fullTable ?? ctx.table ?? []; + const canvasSize = ctx.canvasSize; + + const xField: string = x?.field || 'Value'; + const yField: string = y?.field || 'Category'; + const colorField: string | undefined = color?.field; + const facetFields = [column?.field, row?.field].filter(Boolean) as string[]; + const hasFacet = facetFields.length > 0; + const scopeKeyOf = (r: any) => facetFields.map(f => String(r[f] ?? '')).join('\x00'); + const scopeValuesOf = (r: any) => Object.fromEntries(facetFields.map(f => [f, r[f]])); + + // ── Channel semantics (resolved by framework, may be undefined + // when the chart is rendered without a full pipeline) ─────── + const xCS: ChannelSemantics | undefined = ctx.channelSemantics?.x; + const yCS: ChannelSemantics | undefined = ctx.channelSemantics?.y; + const xEntry = getRegistryEntry(xCS?.semanticAnnotation?.semanticType ?? 'Unknown'); + + // Sign profile of x values — used by the diverging-palette check. + let hasNegative = false; + let hasPositive = false; + for (const r of table) { + const v = r[xField]; + if (typeof v === 'number' && isFinite(v)) { + if (v < 0) hasNegative = true; + else if (v > 0) hasPositive = true; + } + } + + // showPercent: off by default (safer — avoids misleading shares + // for intensive measures, already-percent values, mixed-sign + // data, etc.). The agent or the user can flip it on when the + // measure is genuinely additive. + const showPercent = config?.showPercent === true; + + // ── Per-category aggregate (built once on the input table) ── + // Used for both the Top-N trim decision and the panel-width + // estimation. `aggValue` collapses one group's sum/count into + // the single number the chart will display. + const useMeanForDisplay = xCS?.aggregationDefault === 'average'; + const aggValue = (g: { sum: number; n: number }) => + useMeanForDisplay ? 
g.sum / Math.max(1, g.n) : g.sum; + + const scopedCategoryAgg = new Map<string, { + facetValues: Record<string, any>; + categories: Map<any, { sum: number; n: number }>; + }>(); + for (const r of table) { + const v = r[xField]; + if (typeof v !== 'number' || !isFinite(v)) continue; + const scopeKey = hasFacet ? scopeKeyOf(r) : ''; + let scope = scopedCategoryAgg.get(scopeKey); + if (!scope) { + scope = { facetValues: hasFacet ? scopeValuesOf(r) : {}, categories: new Map() }; + scopedCategoryAgg.set(scopeKey, scope); + } + const g = scope.categories.get(r[yField]) ?? { sum: 0, n: 0 }; + g.sum += v; g.n += 1; + scope.categories.set(r[yField], g); + } + const scopes = Array.from(scopedCategoryAgg.entries()); + const globalCategoryAgg = new Map(); + for (const { categories } of scopedCategoryAgg.values()) { + for (const [cat, g] of categories.entries()) { + const total = globalCategoryAgg.get(cat) ?? { sum: 0, n: 0 }; + total.sum += g.sum; + total.n += g.n; + globalCategoryAgg.set(cat, total); + } + } + const uniqueCats = Array.from(globalCategoryAgg.keys()); + + // ── Top-N + "Others" rollup ────────────────────────────────── + // + // Long bar tables become unreadable past ~20 rows. We rank + // categories by their per-category aggregate, keep the top + // (maxRows − 1), and roll the rest into one synthetic + // "Others (+N)" row pinned to the bottom. + // + // Skip when: + // * y has a canonical ordinal order (Month, Rank, …) — top-N + // would break the natural sequence. + // * the user disables it via `maxRows: 0`. + // + // When a color field is bound (stacked bars), kept categories + // retain all their original (color-split) rows so VL can still + // stack; the Others row carries no color value and renders gray. + const maxRows: number = Math.max(0, Number(config?.maxRows ??
20)); + const ySortOrderForTrim: string[] | undefined = yCS?.ordinalSortOrder; + const maxScopedCategoryCount = Math.max(0, ...scopes.map(([, scope]) => scope.categories.size)); + const canTrim = maxRows > 0 + && !(ySortOrderForTrim && ySortOrderForTrim.length > 0) + && maxScopedCategoryCount > maxRows; + + const sortRowsByValue = (items: Array<{ cat: any; value: number }>) => items + .sort((a, b) => yCS?.reversed ? a.value - b.value : b.value - a.value); + + let displayTable: any[] = []; + let othersCatLabel: string | undefined; + let keptCatOrder: any[] | undefined; + let perCatAggValues: number[] = []; + const perScopeAggValues: number[][] = []; + let maxDisplayRowsPerScope = 0; + + if (canTrim) { + const keepN = Math.max(1, maxRows - 1); + const displayRows: any[] = []; + for (const [scopeKey, scope] of scopes) { + const sorted = sortRowsByValue(Array.from(scope.categories.entries()) + .map(([cat, g]) => ({ cat, value: aggValue(g) }))); + const keptItems = sorted.slice(0, keepN); + const rest = sorted.slice(keepN); + if (!hasFacet) { + keptCatOrder = keptItems.map(a => a.cat); + } + const keptCats = new Set(keptItems.map(a => a.cat)); + + if (colorField) { + const keptRanks = new Map(keptItems.map((a, idx) => [a.cat, idx])); + for (const r of table) { + if ((hasFacet ? scopeKeyOf(r) : '') === scopeKey && keptCats.has(r[yField])) { + displayRows.push({ ...r, __bt_sort: keptRanks.get(r[yField]) ?? 0, __bt_others: false, __bt_others_num: 0 }); + } + } + } else { + keptItems.forEach((a, idx) => { + displayRows.push({ ...scope.facetValues, [yField]: a.cat, [xField]: a.value, __bt_sort: idx, __bt_others: false, __bt_others_num: 0 }); + }); + } + + const restSum = rest.reduce((s, a) => s + a.value, 0); + const othersValue = useMeanForDisplay && rest.length > 0 ? restSum / rest.length : restSum; + const scopeOthersLabel = `Others (+${rest.length})`; + othersCatLabel = othersCatLabel ?? 
scopeOthersLabel; + displayRows.push({ + ...scope.facetValues, + [yField]: scopeOthersLabel, + [xField]: othersValue, + __bt_sort: keptItems.length, + __bt_others: true, + __bt_others_num: 1, + }); + const scopeAggValues = [...keptItems.map(a => a.value), othersValue]; + perCatAggValues.push(...scopeAggValues); + perScopeAggValues.push(scopeAggValues); + maxDisplayRowsPerScope = Math.max(maxDisplayRowsPerScope, keptItems.length + 1); + } + displayTable = displayRows; + } + + if (!canTrim) { + const sortRanksByScope = new Map<string, Map<any, number>>(); + for (const [scopeKey, scope] of scopes) { + const sorted = sortRowsByValue(Array.from(scope.categories.entries()) + .map(([cat, g]) => ({ cat, value: aggValue(g) }))); + sortRanksByScope.set(scopeKey, new Map(sorted.map((a, idx) => [a.cat, idx]))); + const scopeAggValues = sorted.map(a => a.value); + perCatAggValues.push(...scopeAggValues); + perScopeAggValues.push(scopeAggValues); + maxDisplayRowsPerScope = Math.max(maxDisplayRowsPerScope, sorted.length); + } + displayTable = table.map(r => { + const scopeKey = hasFacet ? scopeKeyOf(r) : ''; + return { ...r, __bt_sort: sortRanksByScope.get(scopeKey)?.get(r[yField]) ?? 0, __bt_others: false, __bt_others_num: 0 }; + }); + } + + // ── Column header labels ───────────────────────────────────── + // Derived directly from field names; no override knobs. + const categoryHeader = yField; + const percentHeader = '%'; + const valueHeader = xField; + // headerStyle.fontSize is set below once the responsive + // `fontSize` constant is available. + + // ── Format derivation (from ChannelSemantics) ──────────────── + // + // Policy: don't over-process user input. The framework's + // `resolveFormat` already follows a "only override when the raw + // number would be misleading" rule (e.g., 0–1 Percentage with an + // intrinsicDomain, currency with a known unit symbol).
When + // `cs.format` is undefined, the raw value is already readable — + // we show it as-is and let VL apply its default number rendering. + // + // The %-share column (panel 1) is a different story: it's a + // *derived* 0..1 ratio computed by us, so it always needs `%` + // formatting. That's `pctPattern` below. + const valueFmt: FormatSpec | undefined = xCS?.format; + const pctPattern = '.1%'; + + // ── Text-panel transforms ──────────────────────────────────── + // + // The bar panel naturally handles stacked / grouped data via VL. + // The text panels (% and value) must show ONE row per category; + // when the input has multiple rows per y-category (e.g. stacked + // by `color`), we first aggregate per-category, then derive the + // share. Without this, each input row would render its own text + // mark (overlapping) and each share would be value/grand-total + // (~0% per row instead of the per-category percent). + const sortOp: 'sum' | 'mean' = (xCS?.aggregationDefault === 'average') ? 'mean' : 'sum'; + const uniqueGroupby = (fields: string[]) => Array.from(new Set(fields)); + const textGroupby = hasFacet ? uniqueGroupby([...facetFields, yField]) : [yField]; + const textPanelTransform: any[] = [ + { aggregate: [{ op: sortOp, field: xField, as: '__bt_val' }, { op: 'min', field: '__bt_sort', as: '__bt_sort' }, { op: 'max', field: '__bt_others_num', as: '__bt_others_num' }], groupby: textGroupby }, + ]; + if (showPercent) { + const totalTransform: any = { joinaggregate: [{ op: 'sum', field: '__bt_val', as: '__bt_total' }] }; + if (hasFacet) { + totalTransform.groupby = facetFields; + } + textPanelTransform.push( + totalTransform, + { calculate: `datum.__bt_total === 0 ? null : datum.__bt_val / datum.__bt_total`, as: '__bt_pct' }, + ); + } + + const uniqueFacetValueCount = (field?: string) => field + ? 
new Set(displayTable.map(r => r[field])).size + : 0; + const columnFacetCount = uniqueFacetValueCount(column?.field); + const rowFacetCount = uniqueFacetValueCount(row?.field); + const layoutFacetColumns = ctx.layout?.facet?.columns ?? (columnFacetCount || 1); + const facetColsForSizing = hasFacet + ? Math.max(1, Math.min(layoutFacetColumns, columnFacetCount || 1)) + : 1; + const facetRowsForSizing = hasFacet + ? Math.max(1, rowFacetCount || Math.ceil(Math.max(1, columnFacetCount) / facetColsForSizing)) + : 1; + const subplotWidth = hasFacet ? (ctx.layout?.subplotWidth ?? canvasSize?.width) : canvasSize?.width; + const layoutSubplotHeight = hasFacet ? (ctx.layout?.subplotHeight ?? canvasSize?.height) : canvasSize?.height; + const facetHeightBudget = hasFacet && facetRowsForSizing > 1 + ? (() => { + const maxStretch = ctx.assembleOptions?.maxStretch ?? 2; + const facetElasticity = ctx.assembleOptions?.facetElasticity ?? 0.3; + const fixH = ctx.assembleOptions?.facetFixedPadding?.height ?? 0; + const gap = ctx.layout?.effectiveFacetGap ?? ctx.assembleOptions?.facetGap ?? 0; + const stretch = Math.min(maxStretch, Math.pow(facetRowsForSizing, facetElasticity)); + return Math.max(0, Math.round((canvasSize.height * stretch - fixH) / facetRowsForSizing - gap)); + })() + : layoutSubplotHeight; + + // ── Sizing constants (responsive to row density) ──────── + // `displayCount` is the number of rows the chart will actually + // render (post-rollup). As it grows, we shrink fonts so labels + // don't crowd. In facets, shrink again when each mini-table has + // substantially less plot budget than the full canvas. + const displayCount = maxDisplayRowsPerScope || uniqueCats.length; + // 0 (sparse, ≤12 rows) … 1 (dense, ≥52 rows) — font/density curve. + const density = Math.min(1, Math.max(0, (displayCount - 12) / 40)); + const lerp = (a: number, b: number) => Math.round(a + (b - a) * density); + const subplotWidthRatio = hasFacet && canvasSize?.width + ? 
Math.min(1, Math.max(0, (subplotWidth ?? canvasSize.width) / canvasSize.width)) + : 1; + const subplotHeightRatio = hasFacet && canvasSize?.height + ? Math.min(1, Math.max(0, (facetHeightBudget ?? canvasSize.height) / canvasSize.height)) + : 1; + const facetFontDrop = hasFacet + ? Math.round((1 - Math.min(subplotWidthRatio, subplotHeightRatio)) * 3) + : 0; + + const fontSize = Math.max(9, lerp(12, 10) - facetFontDrop); // text panels + const labelFontSize = Math.max(9, lerp(13, 10) - facetFontDrop); // y-axis tick labels + + // ── Bar geometry: capped thickness, proportional gap ───────── + // + // Design intent: the bar is the signal — never let it get + // stretched into a fat rectangle (un-bar-like) and never let + // the gap dominate it. Vega-Lite default bars sit around 18px; + // we cap a touch tighter at 16 to match BI "data bar" feel. + // + // Bars stay at `barCap` until row count exceeds `compressStart`, + // then shrink linearly to `barMin` by `compressEnd`. Below + // `barMin` the mark becomes a hairline and stops reading as + // a bar, so we hold the floor. + // + // Gap = max(`gapMin`, bar × `gapRatio`) — proportional so the + // bar:gap ratio stays roughly constant (≈4–5×) across densities, + // with a 2px floor so rows never visually merge. 
+ const barCap = 16, barMin = 8; + const gapMin = 2, gapRatio = 0.2; + const compressStart = 30, compressEnd = 80; + const compressT = Math.min(1, Math.max(0, + (displayCount - compressStart) / (compressEnd - compressStart))); + const barPx = Math.round(barCap - (barCap - barMin) * compressT); + const gapPx = Math.max(gapMin, Math.round(barPx * gapRatio)); + const rowStep = barPx + gapPx; + const barBandRatio = +(barPx / rowStep).toFixed(3); + + const charPx = fontSize * 0.6; + const textPad = 12; + const minTextPanel = 36; + const maxTextPanel = 140; + const cjkRe = /[\u4E00-\u9FFF\u3000-\u303F]/; + + const headerStyle = { + fontSize: fontSize, + fontWeight: 'normal' as const, + color: '#999', + }; + + // ── Shared y encoding ──────────────────────────────────────── + // Sort: honor canonical ordinal order if the y field has one + // (e.g. Month, Day-of-week, Rank); otherwise rank by aggregated x. + // When we trimmed, pin the synthetic "Others" row to the bottom + // by using an explicit sort array (kept categories in rank order + // followed by the Others label). + // + // Important: we use an explicit category array (not a + // `{field, op}` sort) even in the un-trimmed case. The y scale + // is `resolve: shared` across 3 panels, but the % / value + // panels run an `aggregate` transform that renames `xField` to + // `__bt_val`. With a field-based sort, VL can't resolve + // `xField` post-transform and falls back to alphabetical + // domain order — which silently breaks the ranking. + const ySortOrder: string[] | undefined = yCS?.ordinalSortOrder; + const rankedCatOrder = (() => { + if (canTrim && keptCatOrder && othersCatLabel) { + return [...keptCatOrder, othersCatLabel]; + } + return uniqueCats + .map(cat => ({ cat, value: aggValue(globalCategoryAgg.get(cat)!) })) + .sort((a, b) => yCS?.reversed ? a.value - b.value : b.value - a.value) + .map(a => a.cat); + })(); + const ySort: any = ySortOrder && ySortOrder.length > 0 + ? ySortOrder + : hasFacet ? 
{ field: '__bt_sort', op: 'min', order: 'ascending' } : rankedCatOrder; + + // Labels are left-aligned and pushed flush with the panel's left + // edge so they line up under the "Category" column header. + const categoryLabelWidth = (() => { + const maxChars = displayTable.reduce((m, r) => { + const s = String(r[yField] ?? ''); + const w = [...s].reduce((a, ch) => a + (cjkRe.test(ch) ? 2 : 1), 0); + return Math.max(m, w); + }, 0); + return Math.min(220, Math.max(60, Math.round(maxChars * labelFontSize * 0.55 + 12))); + })(); + const yEncWithLabels: any = { + field: yField, + type: 'nominal', + sort: ySort, + axis: { + title: null, + domain: false, + ticks: false, + labelFontSize, + labelAlign: 'left', + labelPadding: categoryLabelWidth, + labelLimit: categoryLabelWidth, + }, + }; + const yEncNoLabels: any = { ...yEncWithLabels, axis: null }; + + // ── Color: gradient by value (default) or grouped by field ─── + // Diverging types (Profit/Correlation) and mixed-sign data get a + // diverging palette anchored at 0; otherwise a sequential ramp. + const isDiverging = !colorField && ( + xEntry.diverging === 'inherent' + || (xEntry.diverging === 'conditional' && hasNegative && hasPositive) + ); + const colorEnc = colorField + ? (() => { + // When we trimmed, the synthetic Others row has no value + // for colorField, which would surface as an "undefined" + // entry in the legend. Restrict the scale domain to the + // actual values present in the kept rows. + const base: any = { ...color }; + if (canTrim) { + const vals = Array.from(new Set( + displayTable + .filter(r => !r.__bt_others) + .map(r => r[colorField]) + .filter(v => v !== undefined && v !== null) + )); + base.scale = { ...(base.scale || {}), domain: vals }; + } + return base; + })() + : isDiverging + ? 
{ + field: xField, + type: 'quantitative', + legend: null, + scale: { scheme: 'redyellowgreen', domainMid: 0 }, + } + : { + field: xField, + type: 'quantitative', + legend: null, + scale: { range: ['#cdebd3', '#41a25f'] }, + }; + + // ── Dynamic panel widths from longest formatted label ──────── + // + // Approximates the framework's d3-format output well enough for + // panel sizing. When no format is resolved, we just show the raw + // value via `String(v)` and measure that. + const approxFormat = (v: number): string => { + if (!Number.isFinite(v)) return ''; + if (!valueFmt) return String(v); + const p = valueFmt.pattern || ''; + let body: string; + if (p.includes('%')) { + const dec = /\.(\d+)/.exec(p)?.[1]; + body = (v * 100).toFixed(dec ? parseInt(dec) : 1) + '%'; + } else if (p.includes('d')) { + body = Math.round(v).toLocaleString('en-US'); + } else if (/~s|s$/.test(p)) { + body = Math.abs(v) >= 1e6 ? (v / 1e6).toFixed(1) + 'M' + : Math.abs(v) >= 1e3 ? (v / 1e3).toFixed(1) + 'K' + : v.toFixed(0); + } else if (p) { + const dec = /\.(\d+)/.exec(p)?.[1]; + body = v.toLocaleString('en-US', { + minimumFractionDigits: dec ? parseInt(dec) : 0, + maximumFractionDigits: dec ? parseInt(dec) : 2, + }); + } else { + body = String(v); + } + return (valueFmt.prefix ?? '') + body + (valueFmt.suffix ?? ''); + }; + const approxPct = (v: number) => + Number.isFinite(v) ? `${(v * 100).toFixed(1)}%` : ''; + + const measure = (strs: string[]) => { + const maxChars = strs.reduce((m, s) => Math.max(m, s.length), 0); + return Math.min(maxTextPanel, Math.max(minTextPanel, Math.round(maxChars * charPx + textPad))); + }; + + // ── Header wrap / truncate strategy ────────────────────────── + // Long field names like `video_views_for_the_last_30_days` blow + // past panel width. We try to fit them by: + // 1. measuring single-line width and reserving that as a + // minimum for the panel (capped at maxTextPanel); + // 2. 
wrapping on `_` / space boundaries into up to two lines + // when the single line still doesn't fit; + // 3. handing VL a `title.limit` so it ellipsizes anything + // that still overflows. + const headerPad = 4; + const headerWidthOf = (s: string) => Math.round(s.length * charPx) + headerPad; + const wrapHeader = (label: string, maxPx: number): { text: string | string[]; widthPx: number } => { + const single = headerWidthOf(label); + if (single <= maxPx) return { text: label, widthPx: single }; + const tokens = label.split(/[_\s]+/).filter(Boolean); + if (tokens.length < 2) return { text: label, widthPx: single }; + // Greedy balanced two-line split on token boundaries. + const totalLen = tokens.reduce((a, t) => a + t.length, 0); + let acc = 0, splitAt = 1; + for (let i = 0; i < tokens.length - 1; i++) { + acc += tokens[i].length; + if (acc >= totalLen / 2) { splitAt = i + 1; break; } + } + const line1 = tokens.slice(0, splitAt).join('_'); + const line2 = tokens.slice(splitAt).join('_'); + return { text: [line1, line2], widthPx: Math.max(headerWidthOf(line1), headerWidthOf(line2)) }; + }; + + // Per-category aggregate values used for panel-width sizing. + // Built above as `perCatAggValues` — reuse directly. + + // Wrap headers up to the panel's max width budget; the wrap + // result's `widthPx` then acts as a floor on the actual panel + // width (so the title doesn't get truncated when the data is + // narrower than the header). + const valueHeaderWrap = wrapHeader(valueHeader, maxTextPanel - headerPad); + const percentHeaderWrap = wrapHeader(percentHeader, maxTextPanel - headerPad); + + const valuePanelDataWidth = measure(perCatAggValues.map(approxFormat)); + const valuePanelWidth = Math.min( + maxTextPanel, + Math.max(valuePanelDataWidth, valueHeaderWrap.widthPx + headerPad, minTextPanel), + ); + const pctValuesForSizing = perScopeAggValues.flatMap(values => { + const scopeTotal = values.reduce((a, b) => a + b, 0); + return Math.abs(scopeTotal) > 1e-9 ? 
values.map(v => v / scopeTotal) : []; + }); + const percentPanelWidth = showPercent && pctValuesForSizing.length > 0 + ? Math.min( + maxTextPanel, + Math.max( + measure(pctValuesForSizing.map(approxPct)), + percentHeaderWrap.widthPx + headerPad, + minTextPanel, + ), + ) + : 0; + + const totalWidth = subplotWidth ?? 480; + const interPanelGap = 8; + const reservedForText = valuePanelWidth + interPanelGap + + (showPercent ? percentPanelWidth + interPanelGap : 0); + // Bar panel needs a meaningful min width — a 3-panel layout + // squeezes the bar column more than a basic bar chart, and the + // bar IS the chart, so it should never collapse below ~45% of + // the plot budget. Faceted small multiples get a smaller + // absolute floor so each mini-table can shrink like other charts. + const minBarPanelWidth = hasFacet + ? Math.max(80, Math.round(totalWidth * 0.45)) + : Math.max(180, Math.round(totalWidth * 0.45)); + const barPanelWidth = Math.max(minBarPanelWidth, totalWidth - reservedForText - categoryLabelWidth); + + const yCard = Math.max(1, maxDisplayRowsPerScope || new Set(displayTable.map(r => r[yField])).size); + const panelHeight = Math.max(facetHeightBudget ?? 0, yCard * rowStep); + + // ── Helpers to build a text encoding honoring prefix/suffix ── + // + // - No fmt resolved → show the raw field, VL default rendering. + // - Pattern only → VL `format` shortcut. + // - With affixes → calculate transform that concats prefix + // + format(value, pattern) + suffix. 
+ const buildTextEncoding = ( + sourceField: string, + fmt: FormatSpec | undefined, + transformsOut: any[], + outFieldHint: string, + ): any => { + if (!fmt || (!fmt.pattern && !fmt.prefix && !fmt.suffix)) { + return { field: sourceField, type: 'quantitative' }; + } + const hasAffix = !!(fmt.prefix || fmt.suffix); + if (!hasAffix) { + return { field: sourceField, type: 'quantitative', format: fmt.pattern }; + } + // Escape backslashes first (so we don't double-escape the ones + // we add next), then escape single quotes for safe embedding in + // the Vega expression string literal. + const escPfx = (fmt.prefix ?? '').replace(/\\/g, "\\\\").replace(/'/g, "\\'"); + const escSfx = (fmt.suffix ?? '').replace(/\\/g, "\\\\").replace(/'/g, "\\'"); + const formatExpr = fmt.pattern + ? `format(datum['${sourceField}'], '${fmt.pattern}')` + : `datum['${sourceField}']`; + transformsOut.push({ + calculate: `'${escPfx}' + ${formatExpr} + '${escSfx}'`, + as: outFieldHint, + }); + return { field: outFieldHint, type: 'nominal' }; + }; + + // ── X-scale: anchor bars at 0 for diverging measures ───────── + const barXScale: any = { nice: false }; + if (isDiverging) barXScale.domainMid = 0; + + // ── Per-panel data: always register `displayTable` as a named + // dataset and reference it from every panel. The Bar Table + // is self-contained — we read from `ctx.fullTable` and + // derive the exact rows we want to render, so we must NOT + // fall back to the framework's root data injection (which + // is the post-overflow filtered table and would silently + // drop categories behind our back). + const datasetName = '__bt_displayTable'; + spec.datasets = { ...(spec.datasets || {}), [datasetName]: displayTable }; + if (hasFacet) { + spec.data = { name: datasetName }; + } + const withData = (panel: any) => hasFacet ? 
panel : ({ data: { name: datasetName }, ...panel }); + + // ── Others row: gray out across panels ─────────────────────── + // Text panels aggregate rows, so carry a numeric flag through + // the aggregate instead of relying on facet-specific labels. + const othersGray = '#bdbdbd'; + const othersTextTest = canTrim + ? `datum.__bt_others_num === 1 || datum.__bt_others === true` + : undefined; + + // ── Panel 0: bar (with y-axis labels) ──────────────────────── + // + // For additive measures (sortOp = 'sum'), we let VL natively + // stack raw rows — the bar's length equals the row sum, which + // matches what the value text panel displays. This also keeps + // per-segment detail (gradient stripes / colored sub-groups). + // + // For non-additive measures (sortOp = 'mean'), stacking raw + // rows would silently encode bar length = SUM(values), which + // contradicts the MEAN we display in the value column. In that + // case we aggregate the bar data the same way the text panel + // does so the bar's length matches the displayed number. + const barAggregate = useMeanForDisplay; + const barTransform: any[] | undefined = barAggregate + ? [{ + aggregate: [{ op: sortOp, field: xField, as: '__bt_val' }, { op: 'min', field: '__bt_sort', as: '__bt_sort' }, { op: 'max', field: '__bt_others_num', as: '__bt_others_num' }], + groupby: uniqueGroupby([...facetFields, yField, ...(colorField ? [colorField] : [])]), + }] + : undefined; + const barXField = barAggregate ? '__bt_val' : xField; + + // Gradient-by-value color (no user color field) must reference + // the same field the bar's x encoding uses, otherwise the scale + // can't resolve post-aggregate. + const barColorBase = !colorField && barAggregate + ? (isDiverging + ? 
{ field: '__bt_val', type: 'quantitative', legend: null, scale: { scheme: 'redyellowgreen', domainMid: 0 } } + : { field: '__bt_val', type: 'quantitative', legend: null, scale: { range: ['#cdebd3', '#41a25f'] } }) + : colorEnc; + + // Others-row detection: when we aggregate, the boolean `__bt_others` + // flag is dropped, so fall back to the numeric `__bt_others_num` + // test (carried through the aggregate via `max`) that the text + // panels also use. + const barOthersTest = barAggregate ? othersTextTest : 'datum.__bt_others'; + const barColorEnc: any = canTrim && barOthersTest + ? { condition: { test: barOthersTest, value: othersGray }, ...barColorBase } + : barColorBase; + + const barPanel: any = withData({ + width: barPanelWidth, + height: panelHeight, + // No `limit` here — the category header is allowed to + // overflow the (narrow) y-label gutter into the bar area + // so long field names stay legible. + title: { text: categoryHeader, anchor: 'start', offset: 6, ...headerStyle }, + ...(barTransform ? { transform: barTransform } : {}), + mark: { + type: 'bar', + height: { band: barBandRatio }, + }, + encoding: { + y: yEncWithLabels, + x: { + field: barXField, + type: 'quantitative', + axis: null, + scale: barXScale, + }, + color: barColorEnc, + }, + }); + + const panels: any[] = [barPanel]; + + // ── Panel 1: % share (right-aligned text column) ───────────── + // Uses the per-category aggregate (__bt_pct) so it shows the + // category's share of the grand total, not a per-row fraction. + if (showPercent) { + const pctColor: any = othersTextTest + ?
{ condition: { test: othersTextTest, value: othersGray }, value: '#41a25f' } + : { value: '#41a25f' }; + panels.push(withData({ + width: percentPanelWidth, + height: panelHeight, + transform: textPanelTransform, + title: { text: percentHeaderWrap.text, anchor: 'end', offset: 6, limit: Math.max(20, percentPanelWidth - headerPad), ...headerStyle }, + mark: { + type: 'text', + align: 'right', + baseline: 'middle', + fontSize, + }, + encoding: { + y: yEncNoLabels, + x: { datum: 1, axis: null, scale: { type: 'linear', domain: [0, 1] } }, + text: { field: '__bt_pct', type: 'quantitative', format: pctPattern }, + color: pctColor, + }, + })); + } + + // ── Panel 2: aggregated value (right-aligned text column) ──── + // Displays __bt_val (per-category total/mean) — one mark per + // category, regardless of how many input rows existed. + { + const valueTransforms: any[] = [...textPanelTransform]; + const textEnc = buildTextEncoding('__bt_val', valueFmt, valueTransforms, '__bt_val_str'); + const valColor: any = othersTextTest + ? { condition: { test: othersTextTest, value: othersGray }, value: '#666' } + : { value: '#666' }; + panels.push(withData({ + width: valuePanelWidth, + height: panelHeight, + transform: valueTransforms, + title: { text: valueHeaderWrap.text, anchor: 'end', offset: 6, limit: Math.max(20, valuePanelWidth - headerPad), ...headerStyle }, + mark: { + type: 'text', + align: 'right', + baseline: 'middle', + fontSize, + }, + encoding: { + y: yEncNoLabels, + x: { datum: 1, axis: null, scale: { type: 'linear', domain: [0, 1] } }, + text: textEnc, + color: valColor, + }, + })); + } + + spec.spacing = interPanelGap; + spec.hconcat = panels; + + // Facets (column/row) live on the outer spec. 
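The affix branch of `buildTextEncoding`, defined earlier in this hunk, builds a Vega `calculate` expression by splicing the prefix and suffix into single-quoted string literals. The following minimal sketch shows why backslashes must be escaped before quotes. `FormatSpecSketch`, `escapeForVegaLiteral`, and `affixCalculateExpr` are hypothetical names, not exports of the template.

```typescript
// Hypothetical stand-in for the repo's FormatSpec type.
interface FormatSpecSketch {
  pattern?: string;
  prefix?: string;
  suffix?: string;
}

// Escape backslashes first, then single quotes, so the text can sit
// inside a single-quoted Vega expression string literal.
function escapeForVegaLiteral(s: string): string {
  return s.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
}

// Builds the same shape of expression as the hunk:
// '<prefix>' + format(datum['field'], '<pattern>') + '<suffix>'
function affixCalculateExpr(sourceField: string, fmt: FormatSpecSketch): string {
  const pfx = escapeForVegaLiteral(fmt.prefix ?? '');
  const sfx = escapeForVegaLiteral(fmt.suffix ?? '');
  const valueExpr = fmt.pattern
    ? `format(datum['${sourceField}'], '${fmt.pattern}')`
    : `datum['${sourceField}']`;
  return `'${pfx}' + ${valueExpr} + '${sfx}'`;
}

console.log(affixCalculateExpr('__bt_val', { pattern: ',.0f', prefix: '$', suffix: ' ppl' }));
// → '$' + format(datum['__bt_val'], ',.0f') + ' ppl'
```

Escaping quotes first would turn `'` into `\'`, and the backslash pass would then double that new backslash, leaving the quote unescaped. Like the original, the sketch does not escape `sourceField` or `fmt.pattern`, so both are assumed to be free of single quotes.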
+ if (column || row) { + spec.encoding = spec.encoding || {}; + if (column) spec.encoding.column = column; + if (row) spec.encoding.row = row; + } + }, + properties: [ + { key: 'maxRows', label: 'Max Rows', type: 'continuous', min: 5, max: 100, step: 1, defaultValue: 20 }, + // Off by default — safer for arbitrary measures. The agent (or + // the user) can flip it on when a "% of total" share is + // meaningful (additive, single-sign, non-zero total). + { key: 'showPercent', label: 'Show % of Total', type: 'binary', defaultValue: false }, + ] as ChartPropertyDef[], +}; diff --git a/src/lib/agents-chart/vegalite/templates/index.ts b/src/lib/agents-chart/vegalite/templates/index.ts index e01b6694..28102d33 100644 --- a/src/lib/agents-chart/vegalite/templates/index.ts +++ b/src/lib/agents-chart/vegalite/templates/index.ts @@ -14,7 +14,7 @@ import { ChartTemplateDef } from '../../core/types'; // --- Individual chart imports --- import { scatterPlotDef, regressionDef, rangedDotPlotDef, boxplotDef } from './scatter'; import { barChartDef, pyramidChartDef, groupedBarChartDef, stackedBarChartDef, histogramDef, heatmapDef } from './bar'; -import { lineChartDef, dottedLineChartDef } from './line'; +import { lineChartDef } from './line'; import { bumpChartDef } from './bump'; import { areaChartDef, streamgraphDef } from './area'; import { pieChartDef } from './pie'; @@ -23,23 +23,29 @@ import { densityPlotDef } from './density'; import { stripPlotDef } from './jitter'; import { candlestickChartDef } from './candlestick'; import { waterfallChartDef } from './waterfall'; +import { barTableDef } from './bar-table'; import { radarChartDef } from './radar'; import { roseChartDef } from './rose'; import { usMapDef, worldMapDef } from './map'; import { customPointDef, customLineDef, customBarDef, customRectDef, customAreaDef } from './custom'; +import { kpiCardDef } from './kpi-card'; /** * All chart template definitions, grouped by category. 
* Keys are category names shown in the UI, values are arrays of template definitions. + * + * Categories are organized by *mark family* — charts in the same group share + * their dominant visual primitive (point, bar, line/area, etc.). This keeps + * placement objective and the picker readable. */ export const vlTemplateDefs: { [key: string]: ChartTemplateDef[] } = { - "Scatter & Point": [scatterPlotDef, regressionDef, boxplotDef, stripPlotDef], - "Bar": [barChartDef, groupedBarChartDef, stackedBarChartDef, histogramDef, lollipopChartDef, pyramidChartDef], - "Line & Area": [lineChartDef, dottedLineChartDef, bumpChartDef, areaChartDef, streamgraphDef], - "Part-to-Whole": [pieChartDef, roseChartDef, heatmapDef, waterfallChartDef], - "Statistical": [densityPlotDef, rangedDotPlotDef, radarChartDef, candlestickChartDef], - "Map": [usMapDef, worldMapDef], - "Custom": [customPointDef, customLineDef, customBarDef, customRectDef, customAreaDef], + "Points": [scatterPlotDef, regressionDef, rangedDotPlotDef, stripPlotDef], + "Bars": [barChartDef, groupedBarChartDef, stackedBarChartDef, lollipopChartDef, waterfallChartDef], + "Distributions": [histogramDef, densityPlotDef, boxplotDef, pyramidChartDef, candlestickChartDef], + "Lines & Areas": [lineChartDef, bumpChartDef, areaChartDef, streamgraphDef], + "Circular": [pieChartDef, roseChartDef, radarChartDef], + "Tables & Maps": [heatmapDef, barTableDef, kpiCardDef, usMapDef, worldMapDef], + "Custom": [customPointDef, customLineDef, customBarDef, customRectDef, customAreaDef], }; /** diff --git a/src/lib/agents-chart/vegalite/templates/kpi-card.ts b/src/lib/agents-chart/vegalite/templates/kpi-card.ts new file mode 100644 index 00000000..a09c7d63 --- /dev/null +++ b/src/lib/agents-chart/vegalite/templates/kpi-card.ts @@ -0,0 +1,551 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. 
+ +import { ChartTemplateDef, ChartPropertyDef } from '../../core/types'; + +/** + * KPI Card — "big number" dashboard tile, one row per tile. + * + * Data shape + * ────────── + * The input table is interpreted as a list of tiles. Each row produces + * one tile. + * + * { metric: string, // tile caption (required, via `metric` channel) + * value: number | string, // big number (required, via `value` channel, + * // pre-aggregated upstream) + * goal?: number | string, // optional comparison value (via `goal` channel) + * } + * + * Channels + * ──────── + * - `metric` (required): caption field. + * - `value` (required): big-number field. + * - `goal` (optional): comparison/target field. + * + * Formatting + * ────────── + * Formatting is delegated upstream. The template applies only a trivial + * default (`toLocaleString`) to numeric values so a raw shelf binding + * doesn't show `1184320.0`. If you want `"$1.18M"`, write that string + * into the `value` column in the data prep step. + * + * Progress bar + * ──────────── + * If both `value` and `goal` are numeric and finite (and `goal` is + * non-zero), a thin progress bar appears beneath the big number showing + * `value / goal`. The fill is capped at the full track width; overshoot + * is signalled by the "<pct>% of <goal>" label and the success color. + * Otherwise `goal` is shown as a small "Goal: <goal>" line. + */ + +type Layout = 'horizontal' | 'vertical' | 'grid'; + +const PROGRESS_TRACK = '#e6e9ef'; +const PROGRESS_ON_TRACK = '#5b8def'; // < 100% of goal (in progress) +const PROGRESS_EXCEEDED = '#22a06b'; // ≥ 100% of goal (success) +const PROGRESS_BEHIND = '#e07a3c'; // < behindThreshold of goal (default 50%, well short) + +// Card frame — drawn behind every tile so each KPI reads as a discrete +// card rather than free-floating text. Sized from the tile geometry (see +// cardTop / cardBot below); since tile height is derived from tile width, +// a tall panel never produces a tall empty card.
+const CARD_FILL = '#ffffff'; +const CARD_STROKE = '#e6e9ef'; +const CARD_RADIUS = 8; + +export const kpiCardDef: ChartTemplateDef = { + chart: "KPI Card", + template: { layer: [] }, + channels: ["metric", "value", "goal"], + markCognitiveChannel: 'position', + instantiate: (spec, ctx) => { + const { metric, value, goal } = ctx.resolvedEncodings; + const config = ctx.chartProperties || {}; + + const metricField: string | undefined = metric?.field; + const valueField: string | undefined = value?.field; + const goalField: string | undefined = goal?.field; + + // Behind/on-track cutoff (see properties below). + const rawBehind = Number(config.behindThreshold); + const behindThreshold = Number.isFinite(rawBehind) + ? Math.min(1, Math.max(0, rawBehind)) + : 0.5; + + const sourceTable = ctx.fullTable ?? ctx.table ?? []; + + // ── Collect tiles ────────────────────────────────────────────────── + type Tile = { + caption: string; + valueText: string; + goalText?: string; + // Progress is shown only when both value & goal are numeric. + progress?: { fraction: number; valueNum: number; goalNum: number }; + }; + + const tiles: Tile[] = []; + if (valueField) { + for (const row of sourceTable) { + if (!row) continue; + const rawValue = row[valueField]; + if (rawValue == null) continue; + + const caption = metricField + ? (row[metricField] != null ? String(row[metricField]) : '') + : valueField; + + const rawGoal = goalField ? row[goalField] : undefined; + + const valueText = renderScalar(rawValue); + const goalText = rawGoal != null ? 
renderScalar(rawGoal) : undefined; + + let progress: Tile['progress']; + if ( + typeof rawValue === 'number' && Number.isFinite(rawValue) && + typeof rawGoal === 'number' && Number.isFinite(rawGoal) && + rawGoal !== 0 + ) { + progress = { + fraction: rawValue / rawGoal, + valueNum: rawValue, + goalNum: rawGoal, + }; + } + + tiles.push({ caption, valueText, goalText, progress }); + } + } + + if (tiles.length === 0) { + tiles.push({ caption: 'Value', valueText: '—' }); + } + + // ── Layout ───────────────────────────────────────────────────────── + const baseW = ctx.canvasSize.width; + // baseH unused: tile height is derived from tile width via the + // target aspect ratio (see TARGET_ASPECT below) rather than from + // the canvas height. + const n = tiles.length; + + const requestedLayout = (config.layout as Layout) || 'auto' as any; + const layout: Layout = + requestedLayout === 'horizontal' || requestedLayout === 'vertical' || requestedLayout === 'grid' + ? requestedLayout + : 'grid'; + + let cols: number; + let rows: number; + if (layout === 'horizontal') { + cols = n; rows = 1; + } else if (layout === 'vertical') { + cols = 1; rows = n; + } else { + cols = Math.ceil(Math.sqrt(n)); + rows = Math.ceil(n / cols); + } + + const spacing = 4; + // Sizing strategy + // ─────────────── + // Cards target an aspect ratio (W:H) in the 1.2–1.5 range, which + // is what Tableau / Power BI / Looker scorecards converge on. + // Width is driven by the panel + tile-count budget; height is + // *derived* from width via TARGET_ASPECT so cards stay in shape + // whether they're compressed or expanded. + // + // Each tile *wants* to be TARGET_TILE_W wide. The canvas may + // stretch up to baseW × MAX_STRETCH (the "budget"). If granting + // every tile its wish would exceed that, tiles compress to share + // what the budget allows. If even that compression would push + // them below MIN_TILE_W, we let the canvas grow past the budget + // — readability wins. 
tileH is then derived from tileW. + const MAX_STRETCH = 1.6; + const TARGET_ASPECT = 1.4; // card W:H — within Tableau/Power BI range + const TARGET_TILE_W = 220; + const MIN_TILE_W = 130; + const MIN_TILE_H = Math.round(MIN_TILE_W / TARGET_ASPECT); // ~93 + + const wishW = cols * TARGET_TILE_W + (cols - 1) * spacing; + const budgetW = baseW * MAX_STRETCH; + const minRequiredW = cols * MIN_TILE_W + (cols - 1) * spacing; + + const W = Math.max( + minRequiredW, + Math.min(budgetW, Math.max(baseW, wishW)), + ); + + // Tile dimensions follow the (possibly stretched) canvas; tileH + // is derived from tileW via the target aspect ratio so cards + // never go wider than ~1.5:1 or taller than ~1.2:1. + const tileW = Math.max(MIN_TILE_W, Math.floor((W - spacing * (cols - 1)) / cols)); + const tileH = Math.max(MIN_TILE_H, Math.round(tileW / TARGET_ASPECT)); + const H = rows * tileH + (rows - 1) * spacing; + + // Card horizontal inset (must match cardLeft below) and inner + // horizontal padding for text. Computed before font sizing so we + // can constrain valueFont to "longest value text fits inside the + // card" — otherwise long numbers like 32,799,314 overflow the + // card frame at small tile widths. + const cardLeftInset = Math.max(0.5, Math.floor(tileW * 0.04)); + // Inner text padding scales with tile but has a sane floor; this + // keeps a consistent card aspect ratio across tile counts. + const cardInnerPadX = Math.max(8, Math.floor(tileW * 0.06)); + const cardInnerW = Math.max(20, tileW - 2 * cardLeftInset - 2 * cardInnerPadX); + + // Estimate widest text per layer so fonts can be shrunk to fit + // inside the card. Without this, long captions ("Massachusetts") + // or sub-lines ("111% of 761,723") overflow the card border at + // small tile widths. + // - Value is bold ⇒ ~0.66em per glyph on average (conservative; + // real digit widths in the default sans bold land near 0.6, + // but we leave headroom so the number never kisses the border). 
+ // - Caption / sub are regular weight ⇒ ~0.58em. + const CHAR_W_BOLD = 0.66; + const CHAR_W_REGULAR = 0.58; + + const maxValueChars = tiles.reduce((m, t) => Math.max(m, t.valueText.length), 1); + const maxCaptionChars = tiles.reduce((m, t) => Math.max(m, t.caption.length), 1); + // Sub-line text is either "<pct>% of <goal>" (when both value and + // goal are numeric) or "Goal: <goal>" (otherwise). Predict the + // longest possible form per tile so the font shrinks accordingly. + const maxSubChars = tiles.reduce((m, t) => { + if (t.progress) { + const pct = Math.round(t.progress.fraction * 100); + const text = `${pct}% of ${t.goalText ?? ''}`; + return Math.max(m, text.length); + } + if (t.goalText != null) return Math.max(m, (`Goal: ${t.goalText}`).length); + return m; + }, 1); + + const fontFitsWidth = (chars: number, charW: number) => + Math.floor(cardInnerW / Math.max(1, chars * charW)); + + const valueFontByWidth = fontFitsWidth(maxValueChars, CHAR_W_BOLD); + const captionFontByWidth = fontFitsWidth(maxCaptionChars, CHAR_W_REGULAR); + const subFontByWidth = fontFitsWidth(maxSubChars, CHAR_W_REGULAR); + + // Detect sub-line presence early — used both to size value (more + // vertical room when there's no sub) and to lay out vertically below. + const hasSubLine = tiles.some(t => t.progress || t.goalText != null); + const hasProgress = tiles.some(t => !!t.progress); + + // Typographic hierarchy (matches Material / Tableau / Power BI / + // Looker scorecard conventions): + // - value is the hero (~hero number). + // - caption ≈ value / 3 (industry range 0.30–0.40). + // - sub ≈ caption (same size; hierarchy is color/weight, not + // size — the sub-line carries data like "78% of 1.2M"). + // Vertical cap on value is loosened when there's no sub-line, so a + // single-metric card uses more of its real estate. + const valueHCap = hasSubLine ?
tileH / 2.6 : tileH / 2.1; + const valueFont = Math.min( + 80, + Math.max(10, Math.floor(Math.min(tileW / 5.0, valueHCap, valueFontByWidth))), + ); + const captionFont = Math.max(11, Math.min(22, Math.floor(Math.min(valueFont / 3.0, captionFontByWidth)))); + const subFont = Math.max(10, Math.min(18, Math.floor(Math.min(captionFont, subFontByWidth)))); + + const padTop = Math.max(4, Math.floor(captionFont * 0.55)); + const padBot = Math.max(4, Math.floor(subFont * 0.6)); + const gapCV = Math.max(6, Math.floor(captionFont * 0.55)); // caption → value + const gapVS = Math.max(8, Math.floor(subFont * 1.0)); // value → sub-line + const gapSB = Math.max(4, Math.floor(subFont * 0.55)); // sub-line → bar + const barHeight = Math.max(2, Math.floor(subFont * 0.4)); + + const captionTop = padTop; + const captionBot = captionTop + captionFont; + const valueTop = captionBot + gapCV; + const valueMid = valueTop + Math.floor(valueFont / 2); + const valueBot = valueTop + valueFont; + const subTop = valueBot + gapVS; + const subBot = subTop + subFont; + const barTop = subBot + gapSB; + const barBot = barTop + barHeight; + + const contentBot = hasProgress ? barBot : hasSubLine ? subBot : valueBot; + const slack = Math.max(0, tileH - (contentBot + padBot)); + const yOffset = Math.floor(slack / 2); + + const captionY = captionTop + yOffset; + const valueY = valueMid + yOffset; + const subY = subTop + yOffset; + const barY = barTop + yOffset; + + const barPad = Math.max(4, Math.floor(tileW * 0.1)); + const barLeft = barPad; + const barRight = tileW - barPad; + const barWidth = Math.max(12, barRight - barLeft); + + // ── Card frame geometry ──────────────────────────────────────────── + // Card fills the tile minus a small outer margin so the card's + // visible aspect ratio tracks the tile's (driven by TARGET_ASPECT + // above). Content stays vertically centered inside via yOffset. 
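The KPI tile-sizing rules in this file can be checked in isolation: grid shape from `Math.ceil(Math.sqrt(n))`, a width budget of `baseW * MAX_STRETCH`, a per-tile floor of `MIN_TILE_W`, and height derived from width via `TARGET_ASPECT`. This sketch copies the constants from the hunk. `kpiTileGeometry` and its return shape are hypothetical.

```typescript
type LayoutSketch = 'horizontal' | 'vertical' | 'grid';

interface TileGeometry {
  cols: number;
  rows: number;
  W: number; // total canvas width
  H: number; // total canvas height
  tileW: number;
  tileH: number;
}

// Hypothetical standalone version of the KPI Card sizing logic.
function kpiTileGeometry(n: number, baseW: number, layout: LayoutSketch = 'grid'): TileGeometry {
  const spacing = 4;
  const MAX_STRETCH = 1.6;
  const TARGET_ASPECT = 1.4; // card W:H
  const TARGET_TILE_W = 220;
  const MIN_TILE_W = 130;
  const MIN_TILE_H = Math.round(MIN_TILE_W / TARGET_ASPECT); // 93

  let cols: number;
  let rows: number;
  if (layout === 'horizontal') {
    cols = n; rows = 1;
  } else if (layout === 'vertical') {
    cols = 1; rows = n;
  } else {
    cols = Math.ceil(Math.sqrt(n));
    rows = Math.ceil(n / cols);
  }

  // Wish for TARGET_TILE_W per tile, cap at the stretch budget, but never
  // go below what MIN_TILE_W tiles need (readability wins).
  const wishW = cols * TARGET_TILE_W + (cols - 1) * spacing;
  const budgetW = baseW * MAX_STRETCH;
  const minRequiredW = cols * MIN_TILE_W + (cols - 1) * spacing;
  const W = Math.max(minRequiredW, Math.min(budgetW, Math.max(baseW, wishW)));

  // Height follows width through the target aspect ratio.
  const tileW = Math.max(MIN_TILE_W, Math.floor((W - spacing * (cols - 1)) / cols));
  const tileH = Math.max(MIN_TILE_H, Math.round(tileW / TARGET_ASPECT));
  const H = rows * tileH + (rows - 1) * spacing;
  return { cols, rows, W, H, tileW, tileH };
}

console.log(kpiTileGeometry(5, 400));
// → { cols: 3, rows: 2, W: 640, H: 304, tileW: 210, tileH: 150 }
```

With five tiles on a 400 px canvas, the 3-column grid wants 668 px but is capped at the 640 px budget, giving 210 by 150 px tiles. Eight tiles in one horizontal row cannot fit at 130 px each inside the budget, so the canvas grows to 1068 px instead.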
+ const cardOuterPadY = Math.max(4, Math.floor(tileH * 0.06)); + const cardLeft = cardLeftInset; + const cardRight = tileW - cardLeftInset; + const cardTop = Math.max(0.5, cardOuterPadY); + const cardBot = Math.min(tileH - 0.5, tileH - cardOuterPadY); + + // Card style toggle — see properties[] below. When false, the + // frame layer is skipped and tiles render as plain text. + const showCardFrame = config.style !== false; + + // ── Per-tile spec builder ────────────────────────────────────────── + const buildTile = (t: Tile): any => { + const layers: any[] = []; + + // Card frame (bottom layer): fills the tile minus an outer margin; content is centered inside it. + if (showCardFrame) { + layers.push({ + data: { values: [{}] }, + mark: { + type: 'rect', + fill: CARD_FILL, + stroke: CARD_STROKE, + strokeWidth: 1, + cornerRadius: CARD_RADIUS, + tooltip: null, + }, + encoding: { + x: { value: cardLeft }, + x2: { value: cardRight }, + y: { value: cardTop }, + y2: { value: cardBot }, + }, + }); + } + + // Caption + layers.push({ + data: { values: [{}] }, + mark: { + type: 'text', + fontSize: captionFont, + fontWeight: 500, + fill: '#4a4a4a', + align: 'center', + baseline: 'top', + text: t.caption, + tooltip: null, + }, + encoding: { + x: { value: tileW / 2 }, + y: { value: captionY }, + }, + }); + + // Big number + layers.push({ + data: { values: [{}] }, + mark: { + type: 'text', + fontSize: valueFont, + fontWeight: 'bold', + fill: '#1a1a1a', + align: 'center', + baseline: 'middle', + text: t.valueText, + tooltip: null, + }, + encoding: { + x: { value: tileW / 2 }, + y: { value: valueY }, + }, + }); + + // Optional goal / progress line + if (t.progress) { + // Numeric value + numeric goal → "<pct>% of <goal>" + bar. + const pct = clamp(t.progress.fraction, 0, 1.5); + const pctText = `${Math.round(t.progress.fraction * 100)}% of ${t.goalText}`; + + // Status color: behind / on-track / exceeded.
Assumes + // higher-is-better; lower-is-better metrics should be + // handled by the agent inverting the value/goal pair (or + // by a future `direction` chart property). + const isExceeded = t.progress.fraction >= 1; + const isBehind = t.progress.fraction < behindThreshold; + const fillColor = isExceeded + ? PROGRESS_EXCEEDED + : isBehind + ? PROGRESS_BEHIND + : PROGRESS_ON_TRACK; + + layers.push({ + data: { values: [{}] }, + mark: { + type: 'text', + fontSize: subFont, + fontWeight: isExceeded ? 600 : 400, + fill: isExceeded ? PROGRESS_EXCEEDED : '#666', + align: 'center', + baseline: 'top', + text: pctText, + tooltip: null, + }, + encoding: { + x: { value: tileW / 2 }, + y: { value: subY }, + }, + }); + + // Track + layers.push({ + data: { values: [{}] }, + mark: { + type: 'rect', + fill: PROGRESS_TRACK, + cornerRadius: barHeight / 2, + tooltip: null, + }, + encoding: { + x: { value: barLeft }, + x2: { value: barRight }, + y: { value: barY }, + y2: { value: barY + barHeight }, + }, + }); + // Fill — clamped to track width; overshoot capped visually + // at 100% of the track, but the % label and color reveal + // that the goal was exceeded. + const fillEnd = barLeft + Math.min(1, pct) * barWidth; + layers.push({ + data: { values: [{}] }, + mark: { + type: 'rect', + fill: fillColor, + cornerRadius: barHeight / 2, + tooltip: null, + }, + encoding: { + x: { value: barLeft }, + x2: { value: fillEnd }, + y: { value: barY }, + y2: { value: barY + barHeight }, + }, + }); + } else if (t.goalText != null) { + // Non-numeric goal (or non-numeric value) → just show "Goal: …". 
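The status coloring above compares the fraction `value / goal` against the clamped `behindThreshold` property (default 0.5). The following small sketch of that classification assumes higher-is-better, as the hunk does. `progressStatus`, `normalizeBehindThreshold`, and `barFillFraction` are illustrative names, not exports of the template.

```typescript
type ProgressStatus = 'behind' | 'on-track' | 'exceeded';

// Mirrors the hunk: clamp the property into [0, 1], falling back to 0.5
// when it is missing or not a finite number.
function normalizeBehindThreshold(raw: unknown): number {
  const n = Number(raw);
  return Number.isFinite(n) ? Math.min(1, Math.max(0, n)) : 0.5;
}

// Progress exists only for a finite numeric value and a finite, non-zero goal.
function progressStatus(value: unknown, goal: unknown, behindThreshold = 0.5): ProgressStatus | undefined {
  if (typeof value !== 'number' || !Number.isFinite(value)) return undefined;
  if (typeof goal !== 'number' || !Number.isFinite(goal) || goal === 0) return undefined;
  const fraction = value / goal;
  if (fraction >= 1) return 'exceeded';
  return fraction < behindThreshold ? 'behind' : 'on-track';
}

// Width of the filled portion of the track, as a fraction of the track.
function barFillFraction(fraction: number): number {
  const pct = Math.max(0, Math.min(1.5, fraction)); // clamp(fraction, 0, 1.5)
  return Math.min(1, pct); // the fill never extends past the track
}

console.log(progressStatus(80, 100)); // → on-track
```

`barFillFraction` also shows that the `[0, 1.5]` clamp has no visible effect on the bar, because the fill is capped again at 1. Overshoot surfaces only through the percentage label and the success color.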
+ layers.push({ + data: { values: [{}] }, + mark: { + type: 'text', + fontSize: subFont, + fill: '#666', + align: 'center', + baseline: 'top', + text: `Goal: ${t.goalText}`, + tooltip: null, + }, + encoding: { + x: { value: tileW / 2 }, + y: { value: subY }, + }, + }); + } + + return { + width: tileW, + height: tileH, + layer: layers, + resolve: { scale: { x: 'independent', y: 'independent' } }, + }; + }; + + const tileSpecs = tiles.map(buildTile); + + if (tileSpecs.length === 1) { + const tile = tileSpecs[0]; + spec.width = tile.width; + spec.height = tile.height; + spec.layer = tile.layer; + spec.resolve = tile.resolve; + return; + } + + delete spec.layer; + delete spec.encoding; + if (layout === 'horizontal') { + spec.hconcat = tileSpecs; + spec.spacing = spacing; + } else if (layout === 'vertical') { + spec.vconcat = tileSpecs; + spec.spacing = spacing; + } else { + const grid: any[] = []; + for (let r = 0; r < rows; r++) { + const rowTiles = tileSpecs.slice(r * cols, (r + 1) * cols); + if (rowTiles.length === 0) continue; + grid.push({ hconcat: rowTiles, spacing }); + } + spec.vconcat = grid; + spec.spacing = spacing; + } + }, + properties: [ + { + key: 'layout', + label: 'Layout', + type: 'discrete', + options: [ + { value: 'horizontal', label: 'Horizontal' }, + { value: 'vertical', label: 'Vertical' }, + { value: 'grid', label: 'Grid' }, + ], + defaultValue: 'grid', + }, + { + // When on (default), each tile renders inside a subtle + // rounded card frame (white fill + 1px border). When off, + // the tile is plain text — useful for single hero numbers + // or when the surrounding panel already provides framing. + key: 'style', + label: 'Card style', + type: 'binary', + defaultValue: true, + }, + { + // Progress fraction below this threshold is considered + // "behind" (amber). Between threshold and 1 is "on track" + // (blue). >= 1 is "exceeded" (green). Only applies when a + // goal channel is bound and both value and goal are numeric. 
+ key: 'behindThreshold', + label: 'Behind threshold', + type: 'continuous', + min: 0, + max: 1, + step: 0.05, + defaultValue: 0.5, + visibleWhen: { channels: ['goal'] }, + }, + ] as ChartPropertyDef[], +}; + +// ─── Helpers ──────────────────────────────────────────────────────────────── + +/** + * Trivial scalar → display string. + * + * - Numbers: `toLocaleString` with at most 2 fraction digits. Tiny + * floating-point noise near zero is snapped so "-0" never appears. + * - Strings / everything else: pass through via `String(...)`. + * + * Any richer formatting (currency symbols, SI abbreviations, percent, + * locale-specific patterns) should be produced by the upstream data + * transformation — the template intentionally does not parse format + * patterns. + */ +function renderScalar(v: any): string { + if (typeof v === 'number' && Number.isFinite(v)) { + // Snap -0 / FP noise. + if (Math.abs(v) < 1e-9) v = 0; + return Number.isInteger(v) + ? v.toLocaleString() + : v.toLocaleString(undefined, { maximumFractionDigits: 2 }); + } + return String(v); +} + +function clamp(n: number, lo: number, hi: number): number { + return Math.max(lo, Math.min(hi, n)); +} diff --git a/src/lib/agents-chart/vegalite/templates/line.ts b/src/lib/agents-chart/vegalite/templates/line.ts index d13e9196..34a2a93e 100644 --- a/src/lib/agents-chart/vegalite/templates/line.ts +++ b/src/lib/agents-chart/vegalite/templates/line.ts @@ -18,11 +18,20 @@ const interpolateConfigProperty: ChartPropertyDef = { ], }; +const showPointsProperty: ChartPropertyDef = { + key: "showPoints", label: "Show points", type: "binary", defaultValue: false, +}; + function applyInterpolate(vgSpec: any, config?: Record<string, any>): void { if (!config?.interpolate) return; vgSpec.mark = setMarkProp(vgSpec.mark, 'interpolate', config.interpolate); } +function applyShowPoints(vgSpec: any, config?: Record<string, any>): void { + if (!config?.showPoints) return; + vgSpec.mark = setMarkProp(vgSpec.mark, 'point', true); +} + export const
lineChartDef: ChartTemplateDef = { chart: "Line Chart", template: { mark: "line", encoding: {} }, @@ -34,21 +43,7 @@ export const lineChartDef: ChartTemplateDef = { instantiate: (spec, ctx) => { defaultBuildEncodings(spec, ctx.resolvedEncodings); applyInterpolate(spec, ctx.chartProperties); + applyShowPoints(spec, ctx.chartProperties); }, - properties: [interpolateConfigProperty], -}; - -export const dottedLineChartDef: ChartTemplateDef = { - chart: "Dotted Line Chart", - template: { mark: { type: "line", point: true }, encoding: {} }, - channels: ["x", "y", "color", "detail", "column", "row"], - markCognitiveChannel: 'position', - declareLayoutMode: () => ({ - paramOverrides: { continuousMarkCrossSection: { x: 100, y: 20, seriesCountAxis: 'auto' }, facetAspectRatioResistance: 0.5 }, - }), - instantiate: (spec, ctx) => { - defaultBuildEncodings(spec, ctx.resolvedEncodings); - applyInterpolate(spec, ctx.chartProperties); - }, - properties: [interpolateConfigProperty], + properties: [interpolateConfigProperty, showPointsProperty], }; diff --git a/src/lib/vega-locale.ts b/src/lib/vega-locale.ts new file mode 100644 index 00000000..f0004d64 --- /dev/null +++ b/src/lib/vega-locale.ts @@ -0,0 +1,46 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Synchronise Vega's time-format locale with the current i18n language. + * + * Locale data lives in the i18n translation files under "chart.vegaLocale". + * To support a new language, add the d3-time-format locale definition to + * that language's chart.json — no code changes needed here. + * + * Call {@link syncVegaLocale} once at startup and on every language change. 
+ */ + +import { defaultLocale } from 'vega'; +import i18n from '../i18n'; + +const D3_DEFAULT_NUMBER = { + decimal: '.', + thousands: ',', + grouping: [3], + currency: ['$', ''], +}; + +const D3_DEFAULT_TIME = { + dateTime: '%x, %X', + date: '%-m/%-d/%Y', + time: '%-I:%M:%S %p', + periods: ['AM', 'PM'], + days: ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'], + shortDays: ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'], + months: ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'], + shortMonths: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], +}; + +function readTimeLocale(): Record<string, unknown> | null { + const obj = i18n.t('chart.vegaLocale', { returnObjects: true }); + if (obj && typeof obj === 'object' && 'months' in (obj as Record<string, unknown>)) { + return obj as Record<string, unknown>; + } + return null; +} + +export function syncVegaLocale(): void { + const timeLocale = readTimeLocale(); + defaultLocale(D3_DEFAULT_NUMBER as any, (timeLocale ?? D3_DEFAULT_TIME) as any); +} diff --git a/src/views/AgentChatInput.tsx b/src/views/AgentChatInput.tsx new file mode 100644 index 00000000..44ce111a --- /dev/null +++ b/src/views/AgentChatInput.tsx @@ -0,0 +1,531 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. +// +// Shared chat-style input box for agent surfaces. Renders a rounded +// border with focus glow, an inline image-preview row, a file-attach +// affordance, a multiline `InputBase`, and a send/stop button. Used by +// both the in-chat `DataLoadingChat` and the landing-page Data Loading +// Agent quick-start box so they look and behave identically (paste +// image, drag attach, Shift+Enter, etc.).
+ +import * as React from 'react'; +import { useRef, useState } from 'react'; +import { + Box, + IconButton, + InputBase, + Tooltip, + Typography, + alpha, + useTheme, +} from '@mui/material'; +import AddIcon from '@mui/icons-material/Add'; +import CloseIcon from '@mui/icons-material/Close'; +import InsertDriveFileOutlinedIcon from '@mui/icons-material/InsertDriveFileOutlined'; +import ArrowUpwardRoundedIcon from '@mui/icons-material/ArrowUpwardRounded'; +import StopIcon from '@mui/icons-material/Stop'; +import { useTranslation } from 'react-i18next'; +import { borderColor, transition } from '../app/tokens'; + +export interface AgentChatInputProps { + value: string; + onChange: (v: string) => void; + images: string[]; + onImagesChange: React.Dispatch>; + onSend: () => void; + onStop?: () => void; + inProgress?: boolean; + placeholder?: string; + autoFocus?: boolean; + /** + * What `` accepts. Defaults to images + common + * tabular text files. + */ + fileAccept?: string; + /** + * Called when the user attaches a non-image file. If omitted, + * non-image files are silently ignored (image-only mode). + */ + onNonImageFile?: (file: File) => void; + /** + * Optional list of attached non-image files (e.g. uploaded Excel/CSV). + * Rendered as removable chips above the input — mirrors the + * image-preview row. The parent owns the array and handles + * upload/removal side effects (e.g. stripping the matching + * `[Uploaded: name]` mention from the prompt). + */ + attachments?: string[]; + onAttachmentsChange?: (names: string[]) => void; + sendTooltip?: string; + stopTooltip?: string; + attachTooltip?: string; + inputRef?: React.Ref; + /** Min visible rows for the text area. Defaults to 1. */ + minRows?: number; + /** Max visible rows for the text area. Defaults to 8. */ + maxRows?: number; + /** + * Optional leading slot rendered to the left of the attach button — + * used by surfaces (e.g. 
landing page) that want a branded icon + * instead of, or in addition to, the attach affordance. + */ + leadingSlot?: React.ReactNode; + /** + * If false, the attach button is hidden (paste of images still works). + */ + showAttachButton?: boolean; + /** + * Layout style. + * - 'inline' (default): leading slot, attach, input, send share a single row. + * - 'stacked': input occupies its own row; the leading slot + attach button + * sit in a bottom-left toolbar, send button in bottom-right. + * Recommended when `minRows > 1`. + */ + layout?: 'inline' | 'stacked'; + /** + * Optional content rendered above the input (e.g. a chip bar of + * available data sources). Only used in `'stacked'` layout. + */ + topSlot?: React.ReactNode; + /** + * When set and the input is empty, pressing Tab fills the input + * with this string (acts as an accept-suggestion shortcut). + */ + tabSuggestion?: string; + /** + * When provided and the input is focused & empty, surfaces these + * prompts as a Google-style overlay dropdown below the input. + * Each item's `onClick` is invoked when the user picks it — the + * caller is responsible for filling text / attaching images so + * suggestions can hand off arbitrary state (e.g. a sample image + * plus a long prompt). Does not push surrounding content. + */ + focusSuggestions?: Array<{ label: string; onClick: () => void; kind?: string; icon?: React.ReactNode }>; + /** + * Optional header label shown above the focus-suggestion list. + * Defaults to "Try asking". + */ + focusSuggestionsLabel?: string; + /** + * Where to anchor the focus-suggestion overlay relative to the input. + * - 'bottom' (default): drops down below the input. + * - 'top': pops up above the input. Use when the input is pinned to + * the bottom of its container and downward overlays would clip. 
+ */ + focusSuggestionsPlacement?: 'top' | 'bottom'; + sx?: any; +} + +export const AgentChatInput: React.FC = ({ + value, + onChange, + images, + onImagesChange, + onSend, + onStop, + inProgress = false, + placeholder, + autoFocus = false, + fileAccept = 'image/*,.csv,.json,.xlsx,.xls,.txt,.tsv', + onNonImageFile, + attachments, + onAttachmentsChange, + sendTooltip, + stopTooltip, + attachTooltip, + inputRef, + minRows, + maxRows = 8, + leadingSlot, + showAttachButton = true, + layout = 'inline', + topSlot, + tabSuggestion, + focusSuggestions, + focusSuggestionsLabel, + focusSuggestionsPlacement = 'bottom', + sx, +}) => { + const theme = useTheme(); + const { t } = useTranslation(); + const fileInputRef = useRef(null); + const localInputRef = useRef(null); + const actualInputRef = (inputRef as React.RefObject) || localInputRef; + const [focused, setFocused] = useState(false); + const showFocusSuggestions = focused + && value.length === 0 + && !!focusSuggestions + && focusSuggestions.length > 0; + + React.useEffect(() => { + if (autoFocus) actualInputRef.current?.focus(); + // eslint-disable-next-line react-hooks/exhaustive-deps + }, []); + + const canSend = (value.trim().length > 0 || images.length > 0) && !inProgress; + + const handlePaste = (e: React.ClipboardEvent) => { + if (e.clipboardData?.files?.length) { + const imageFiles = Array.from(e.clipboardData.files).filter(f => f.type.startsWith('image/')); + if (imageFiles.length > 0) { + e.preventDefault(); + imageFiles.forEach(file => { + const reader = new FileReader(); + reader.onload = () => { + if (reader.result) onImagesChange(prev => [...prev, reader.result as string]); + }; + reader.readAsDataURL(file); + }); + } + } + }; + + const handleFileUpload = (e: React.ChangeEvent) => { + const file = e.target.files?.[0]; + if (!file) return; + if (file.type.startsWith('image/')) { + const reader = new FileReader(); + reader.onload = () => { + if (reader.result) onImagesChange(prev => [...prev, reader.result as 
string]); + }; + reader.readAsDataURL(file); + } else if (onNonImageFile) { + onNonImageFile(file); + } + if (fileInputRef.current) fileInputRef.current.value = ''; + }; + + const handleKeyDown = (e: React.KeyboardEvent) => { + if (e.key === 'Enter' && !e.shiftKey) { + e.preventDefault(); + if (canSend) onSend(); + return; + } + if (e.key === 'Tab' && !e.shiftKey && tabSuggestion && value.length === 0) { + e.preventDefault(); + onChange(tabSuggestion); + } + }; + + const attachButton = showAttachButton ? ( + + fileInputRef.current?.click()} + disabled={inProgress} + sx={{ color: 'text.secondary' }}> + + + + ) : null; + + const sendButton = inProgress && onStop ? ( + + + + + + ) : ( + + + + + + + + ); + + const hiddenFileInput = ( + + ); + + const inputField = ( + onChange(e.target.value)} + onKeyDown={handleKeyDown} + onPaste={handlePaste} + onFocus={() => setFocused(true)} + // Delay blur so a mousedown on a suggestion can fire first. + // Suggestion items also call preventDefault on mousedown, so + // in practice the textarea stays focused while we fill it. + onBlur={() => window.setTimeout(() => setFocused(false), 120)} + placeholder={placeholder} + disabled={inProgress} + sx={{ + flex: 1, + width: '100%', + px: 1, + py: 0.75, + fontSize: 13, + lineHeight: 1.5, + alignItems: 'flex-start', + '& .MuiInputBase-input': { width: '100%' }, + }} + /> + ); + + + return ( + + + {/* Top slot (e.g. data-source chip bar) sits flush with the + input area below — no divider, same background. 
*/} + {topSlot && ( + + {topSlot} + + )} + + {/* Image previews */} + {images.length > 0 && ( + + {images.map((img, i) => ( + + + onImagesChange(prev => prev.filter((_, idx) => idx !== i))} + sx={{ + position: 'absolute', top: -4, right: -4, + width: 18, height: 18, + bgcolor: 'rgba(0,0,0,0.55)', color: 'white', + '&:hover': { bgcolor: 'rgba(0,0,0,0.75)' }, + }}> + + + + ))} + + )} + + {/* Attached non-image file chips (Excel, CSV, JSON, …) */} + {attachments && attachments.length > 0 && ( + + {attachments.map((name, i) => ( + + + + {name} + + {onAttachmentsChange && ( + onAttachmentsChange(attachments.filter((_, idx) => idx !== i))} + sx={{ width: 16, height: 16, p: 0, color: 'text.disabled', + '&:hover': { color: 'text.primary', bgcolor: alpha(theme.palette.text.primary, 0.06) } }}> + + + )} + + ))} + + )} + + {hiddenFileInput} + + {layout === 'stacked' ? ( + <> + {/* Input takes its own row so multi-line text aligns naturally. */} + + {inputField} + + {/* Bottom toolbar: leading slot + attach on the left, send on the right. */} + + + {leadingSlot} + {attachButton} + + + {sendButton} + + + + ) : ( + /* Inline layout: everything on a single row. */ + + {leadingSlot} + {showAttachButton && ( + {attachButton} + )} + {inputField} + {sendButton} + + )} + + + {/* Google-style suggestion overlay. Anchored to the outer + relative wrapper so it overlays content below instead of + pushing layout. */} + {showFocusSuggestions && ( + + + {focusSuggestionsLabel ?? 'Try asking'} + + {focusSuggestions!.map((s, i) => ( + { + // Prevent the textarea from blurring so the + // overlay doesn't disappear mid-click. + e.preventDefault(); + s.onClick(); + actualInputRef.current?.focus(); + }} + sx={{ + display: 'flex', + alignItems: 'center', + gap: 1, + px: 1.5, + py: 0.5, + cursor: 'pointer', + color: 'text.secondary', + '&:hover': { bgcolor: 'action.hover', color: 'text.primary' }, + }} + > + {s.icon ? 
( + + {s.icon} + + ) : null} + + {s.label} + + {s.kind && ( + + {s.kind} + + )} + + ))} + + )} + + ); +}; diff --git a/src/views/AgentPausePanel.tsx b/src/views/AgentPausePanel.tsx new file mode 100644 index 00000000..f902752e --- /dev/null +++ b/src/views/AgentPausePanel.tsx @@ -0,0 +1,546 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Unified UI for "agent pause" panels that sit above the chat input in + * `SimpleChartRecBox`. All three variants share the same chrome (accent + * background, collapsible header with dismiss + minimize, body area) and + * differ only in their body content and the callback wired to the primary + * action: + * + * - `ClarificationPanel` — agent asks a question (warning palette). + * - `ExplanationPanel` — agent gives an answer with follow-ups (primary palette). + * (rendered by `ClarificationPanel` with `variant="explain"`) + * - `DelegatePanel` — agent recommends handing off to a peer + * agent (Data Loading or Report Gen). + * + * Keeping them in one file makes shared styling/layout tweaks (header + * spacing, palette use, collapse animation) trivial to maintain.
+ */ + +import React, { FC, ReactNode, useEffect, useRef, useState } from 'react'; +import { + Box, Collapse, IconButton, Tooltip, Typography, useTheme, +} from '@mui/material'; +import { alpha } from '@mui/material/styles'; +import SearchIcon from '@mui/icons-material/Search'; +import DescriptionOutlinedIcon from '@mui/icons-material/DescriptionOutlined'; +import DeleteOutlineIcon from '@mui/icons-material/DeleteOutline'; +import UnfoldLessIcon from '@mui/icons-material/UnfoldLess'; +import UnfoldMoreIcon from '@mui/icons-material/UnfoldMore'; +import { useTranslation } from 'react-i18next'; +import { useDispatch } from 'react-redux'; +import { dfActions } from '../app/dfSlice'; +import { AgentToyIcon } from './AgentToyIcon'; +import { + ClarificationQuestion, + ClarificationResponse, + DelegateTarget, +} from '../components/ComponentType'; +import { renderFieldHighlights } from './InteractionEntryCard'; + +// --------------------------------------------------------------------------- +// Shared shell +// --------------------------------------------------------------------------- + +interface AgentPauseShellProps { + /** Localized header label shown both when expanded and minimized. */ + title: string; + /** Short preview text shown next to the title when the panel is minimized. */ + minimizedPreview?: string; + /** Tooltip + behavior for the dismiss (×) icon. */ + dismissTooltip: string; + /** Tooltip labels for the minimize/expand toggle. */ + minimizeTooltip: string; + expandTooltip: string; + /** + * Icon glyph rendered in the header. Callers pass a fully-styled + * `AgentToyIcon` (or any node) so the shell stays agnostic of icon + * variants and colors. The shell only fades the icon when minimized. + */ + icon: ReactNode; + /** + * Optional accent color driving the panel's chrome (bg fill, border, + * hover). When omitted the panel uses neutral greyscale chrome. 
Each + * pause variant passes its own semantic hue so clarify / explain / + * suggest panels read as visually distinct moments in the timeline + * (parallel to the tinted bubbles in `InteractionEntryCard`). + */ + accentColor?: string; + /** Called when the user dismisses the pause. */ + onCancel: () => void; + /** Reset minimized state whenever this value changes (e.g. new questions). */ + resetKey?: unknown; + children: ReactNode; +} + +const AgentPauseShell: FC = ({ + title, + minimizedPreview, + dismissTooltip, + minimizeTooltip, + expandTooltip, + icon, + accentColor, + onCancel, + resetKey, + children, +}) => { + const theme = useTheme(); + const [minimized, setMinimized] = useState(false); + + // Chrome is either neutral greyscale (no accent) or a soft tinted fill + // in the variant's semantic hue (clarify=warning, explain/suggest=primary). + // The tint is intentionally faint so the panel sits quietly above the + // chat input — interactive affordances (option chips, CTAs) still carry + // the strongest color. + const tinted = !!accentColor; + const panelBg = tinted + ? alpha(accentColor!, 0.05) + : alpha(theme.palette.text.primary, 0.03); + const panelBorder = tinted + ? alpha(accentColor!, 0.18) + : alpha(theme.palette.text.primary, 0.10); + const panelHover = tinted + ? alpha(accentColor!, 0.09) + : alpha(theme.palette.text.primary, 0.04); + const primaryColor = theme.palette.primary.main; + + // Reset minimize when the underlying pause changes so a brand-new + // pause shows up expanded by default. + useEffect(() => { setMinimized(false); }, [resetKey]); + + return ( + + setMinimized(prev => !prev)} + sx={{ + display: 'flex', alignItems: 'center', gap: '6px', minHeight: 16, + cursor: 'pointer', + // Stretch hover background to the panel's full content + // width by extending past the parent's px: 0.5 padding, + // then re-add it on the inside. Header owns the top + // spacing so the hover bg fills cleanly to the panel's + // rounded top edge. 
+ px: 0.5, mx: -0.5, pt: '8px', pb: '6px', + '&:hover': { backgroundColor: panelHover }, + }} + > + + {icon} + + {minimized ? ( + + + {title} + + {minimizedPreview && ( + + {minimizedPreview.slice(0, 120)} + + )} + + ) : ( + + {title} + + )} + + { e.stopPropagation(); onCancel(); }} + sx={{ + p: 0, width: 16, height: 16, + color: theme.palette.text.disabled, + '&:hover': { color: theme.palette.error.main }, + }} + > + + + + + + {minimized + ? + : } + + + + + + {children} + + + ); +}; + +// --------------------------------------------------------------------------- +// ClarificationPanel (also handles `variant="explain"`) +// --------------------------------------------------------------------------- + +interface ClarificationPanelProps { + questions: ClarificationQuestion[]; + /** + * 'clarify' (default) — agent is asking the user a question (warning palette). + * 'explain' — agent gave an answer; options are suggested chart + * follow-ups the user can click (primary palette). + */ + variant?: 'clarify' | 'explain'; + /** + * Optional. Currently selected answer per question (keyed by question + * index). When provided together with `onSelectAnswer`, the panel will + * route option clicks (and free-text Enter) through `onSelectAnswer` + * instead of submitting immediately. The parent decides when to submit + * (e.g. after all questions are answered).
+ */ + selectedAnswers?: Record; + onSelectAnswer?: (questionIndex: number, response: ClarificationResponse) => void; + onSubmit: (responses: ClarificationResponse[]) => void; + onCancel: () => void; +} + +export const ClarificationPanel: FC = ({ + questions, + variant = 'clarify', + selectedAnswers, + onSelectAnswer, + onSubmit, + onCancel, +}) => { + const theme = useTheme(); + const { t } = useTranslation(); + const submittedRef = useRef(false); + + useEffect(() => { submittedRef.current = false; }, [questions]); + + const submitResponses = (responses: ClarificationResponse[]) => { + if (responses.length === 0 || submittedRef.current) return; + submittedRef.current = true; + onSubmit(responses); + }; + + /** + * Handle a single answer (option click or free-text Enter). When the + * parent provides `onSelectAnswer`, the panel defers to it and the + * parent decides when to actually submit. Otherwise we fall back to + * the legacy "click = submit immediately" behavior. + */ + const handleAnswer = (response: ClarificationResponse) => { + if (onSelectAnswer) { + onSelectAnswer(response.question_index, response); + return; + } + submitResponses([response]); + }; + + const isExplain = variant === 'explain'; + // Per-variant accent color drives both the panel chrome (bg/border) and + // the option chip affordances (border, hover, selection) so a clarify + // panel reads entirely in the warning hue and an explain panel entirely + // in the primary hue — no cross-color clashes between chrome and chips. + const chromeAccent = isExplain + ? theme.palette.primary.main + : theme.palette.warning.main; + // Field highlights (`**name**` tokens in question/option text) also use + // the variant accent so the underline color matches the panel. + const accentColor = chromeAccent; + const title = t(isExplain ? 
'chartRec.explanationTitle' : 'chartRec.clarificationTitle'); + + return ( + } + accentColor={chromeAccent} + title={title} + minimizedPreview={questions[0]?.text || ''} + dismissTooltip={t('chartRec.cancelClarification')} + minimizeTooltip={t('chartRec.minimizeClarification')} + expandTooltip={t('chartRec.expandClarification')} + onCancel={onCancel} + resetKey={questions} + > + + {questions.map((question, questionIndex) => ( + + {/* Text portion is height-bounded and scrollable so very + long explanations don't push options off-screen. + Options remain fixed below the scrolling region. */} + + + {!isExplain && questions.length > 1 && ( + <> + {t('chartRec.clarificationQuestionLabel', { index: questionIndex + 1 })}{' '} + + )} + {renderFieldHighlights(question.text, accentColor)} + + + + {question.responseType === 'free_text' ? ( + // Free-text questions don't render their own input. + // The user types the answer in the main chat box + // below and hits Send (or Enter). + + {t('chartRec.freeTextClarificationHint')} + + ) : ( + + {isExplain && (question.options || []).length > 0 && ( + + {t('chartRec.explanationFollowupsLabel')} + + )} + + {(question.options || []).map((option, optionIndex) => { + const selected = selectedAnswers?.[questionIndex]; + const isSelected = !!selected && selected.answer === option.label; + return ( + + handleAnswer({ + question_index: questionIndex, + answer: option.label, + source: 'option', + })} + sx={{ + position: 'relative', zIndex: 1, + px: '8px', py: '4px', + borderRadius: '6px', + border: `1px solid ${isSelected ? alpha(accentColor, 0.6) : alpha(theme.palette.text.primary, 0.12)}`, + backgroundColor: isSelected ? alpha(accentColor, 0.12) : theme.palette.background.paper, + cursor: 'pointer', + fontSize: 11, + fontWeight: isSelected ? 
600 : 400, + display: 'inline-block', + whiteSpace: 'normal', + wordBreak: 'break-word', + lineHeight: 1.4, + color: theme.palette.text.primary, + textAlign: 'left', + fontFamily: theme.typography.fontFamily, + '&:hover': { backgroundColor: alpha(accentColor, isSelected ? 0.16 : 0.08) }, + }} + > + {renderFieldHighlights(option.label, accentColor)} + + + ); + })} + + + )} + + ))} + + + ); +}; + +// --------------------------------------------------------------------------- +// DelegatePanel +// --------------------------------------------------------------------------- + +interface DelegatePanelProps { + /** Which peer agent the Data Agent recommends handing off to. */ + target: DelegateTarget; + /** Short user-facing message from the Data Agent. */ + message: string; + /** One or two hand-off option prompts (cards). Each string is shown + * on the button and used as the seed prompt for the target agent. */ + options: string[]; + /** Dismiss the suggestion (treated as cancelling the pause). */ + onCancel: () => void; +} + +/** + * Renders when the Data Agent emits a `delegate` action. The card shows + * a short message plus 1–2 one-click hand-off buttons. Picking one + * dispatches an `agentHandoffRequest` to Redux; the matching consumer + * (Data Formulator for `data_loading`, SimpleChartRecBox for + * `report_gen`) picks it up and starts the target agent with the + * selected option as its seed prompt. + */ +export const DelegatePanel: FC = ({ + target, + message, + options, + onCancel, +}) => { + const theme = useTheme(); + const { t } = useTranslation(); + const dispatch = useDispatch(); + + const handleHandoff = (prompt: string) => { + const cleanPrompt = prompt.trim(); + if (!cleanPrompt) return; + dispatch(dfActions.requestAgentHandoff({ target, prompt: cleanPrompt })); + // Hand off — the user's attention moves to the target agent + // and the data-agent run is done. 
+ onCancel(); + }; + + const isReport = target === 'report_gen'; + const ctaCaption = isReport + ? t('chartRec.delegateToReportGen') + : t('chartRec.delegateToDataLoading'); + const CtaIcon = isReport ? DescriptionOutlinedIcon : SearchIcon; + + const validOptions = (options || []) + .map(o => (o || '').trim()) + .filter(o => o.length > 0) + .slice(0, 2); + + return ( + } + accentColor={theme.palette.primary.main} + title={t('chartRec.delegateTitle')} + minimizedPreview={message} + dismissTooltip={t('chartRec.delegateDismiss')} + minimizeTooltip={t('chartRec.delegateMinimize')} + expandTooltip={t('chartRec.delegateExpand')} + onCancel={onCancel} + resetKey={`${target}|${validOptions.join('||')}`} + > + + {message && ( + + + {message} + + + )} + + + {ctaCaption} + + + {validOptions.map((prompt, idx) => ( + + handleHandoff(prompt)} + title={prompt} + sx={{ + position: 'relative', zIndex: 1, + width: '100%', + px: '8px', py: '6px', + borderRadius: '6px', + border: `1px solid ${alpha(theme.palette.text.primary, 0.12)}`, + backgroundColor: theme.palette.background.paper, + cursor: 'pointer', + fontSize: 11, + fontWeight: 400, + display: 'inline-flex', + alignItems: 'center', + gap: '6px', + whiteSpace: 'normal', + wordBreak: 'break-word', + lineHeight: 1.4, + color: theme.palette.text.primary, + textAlign: 'left', + fontFamily: theme.typography.fontFamily, + '&:hover': { + backgroundColor: alpha(theme.palette.primary.main, 0.08), + }, + }} + > + + {prompt} + + + ))} + + + + + ); +}; diff --git a/src/views/AgentToyIcon.tsx b/src/views/AgentToyIcon.tsx new file mode 100644 index 00000000..ea0b028e --- /dev/null +++ b/src/views/AgentToyIcon.tsx @@ -0,0 +1,187 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. 
+ +import React from 'react'; +import { Box, SvgIcon, SvgIconProps, useTheme } from '@mui/material'; + +export type AgentToyVariant = 'thinking' | 'summary' | 'clarify' | 'explain' | 'default'; + +interface AgentToyIconProps extends SvgIconProps { + variant?: AgentToyVariant; +} + +/** + * A custom robot-head icon with a swappable mouth and optional corner badge, + * used to convey agent role/state in the data-thread gutter. + * + * thinking → smile mouth, no badge + * summary → smile mouth + checkmark badge + * clarify → flat mouth + `?` badge in warning color + * explain → open-`o` mouth + `i` badge in info color + * default → flat mouth, no badge + * + * The face inherits `color` via `currentColor`, so it follows the surrounding + * highlight/dim state. The corner badge uses fixed semantic colors so role + * information survives dimming. + */ +export const AgentToyIcon: React.FC = ({ variant = 'default', ...rest }) => { + const theme = useTheme(); + + const renderMouth = () => { + const common = { fill: 'none', stroke: 'currentColor', strokeLinecap: 'round' as const }; + switch (variant) { + case 'thinking': + case 'summary': + // standard smile — neutral/positive resting state + return ; + case 'explain': + // open `o` — reads as "talking, narrating" + return ; + case 'clarify': + case 'default': + default: + // flat neutral + return ; + } + }; + + const renderBadge = () => { + const cx = 19.5, cy = 4.5; + if (variant === 'clarify') { + // larger `?` badge — clarify needs strong attention + const r = 5; + return ( + + + + ?
+ + ); + } + if (variant === 'explain') { + const r = 5; + return ( + + + + i + + ); + } + if (variant === 'summary') { + // checkmark badge — "presented, done" + const r = 5; + return ( + + + + + + ); + } + return null; + }; + + return ( + + {/* antenna */} + + + {/* head */} + + {/* ears */} + + + {/* eyes */} + + + {/* mouth (variant-specific) */} + {renderMouth()} + {/* corner badge (clarify / explain / summary) */} + {renderBadge()} + + ); +}; + +/** + * A "live" version of the robot face for hero/landing surfaces. Adds three + * subtle ambient animations on top of the base face so it feels like the + * agent is idling, not frozen: + * + * - antenna dot: gentle pulsing opacity + radius + * - eyes: rare blink (squeeze on the Y axis), staggered slightly + * - mouth: slow cycle through smile → little-o → smile → neutral, so + * the agent looks like it's quietly humming along + * + * Everything inherits `color` via `currentColor`. The head itself does NOT + * move — earlier feedback was that head-bob/tilt felt unnatural. + */ +export const AnimatedAgentToyIcon: React.FC = (props) => { + const mouths = React.useMemo( + () => ['smile', 'o', 'smile', 'flat'] as const, + [], + ); + const [mouthIdx, setMouthIdx] = React.useState(0); + React.useEffect(() => { + const id = setInterval(() => { + setMouthIdx(i => (i + 1) % mouths.length); + }, 1600); + return () => clearInterval(id); + }, [mouths.length]); + const mouth = mouths[mouthIdx]; + + const mouthCommon = { fill: 'none', stroke: 'currentColor', strokeLinecap: 'round' as const }; + + // Blink: very rapid scaleY squeeze, mostly held open. Stagger the right + // eye by a tiny delay so blinks don't feel mechanically synchronized.
+ const eyeBlinkSx = (delay: string) => ({ + transformOrigin: 'center', + transformBox: 'fill-box' as const, + animation: `df-agent-blink 5.4s ${delay} ease-in-out infinite`, + '@keyframes df-agent-blink': { + '0%, 92%, 100%': { transform: 'scaleY(1)' }, + '95%': { transform: 'scaleY(0.1)' }, + '97%': { transform: 'scaleY(1)' }, + }, + }); + + return ( + + {/* antenna dot — gentle pulse */} + + + + + + {/* head */} + + {/* ears */} + + + {/* eyes — blink via CSS keyframes */} + + + {/* mouth — cycles through expressions */} + {mouth === 'smile' && ( + + )} + {mouth === 'o' && ( + + )} + {mouth === 'flat' && ( + + )} + + ); +}; diff --git a/src/views/ChartRecBox.tsx b/src/views/ChartRecBox.tsx index 1303b0fb..872dc3c1 100644 --- a/src/views/ChartRecBox.tsx +++ b/src/views/ChartRecBox.tsx @@ -1,24 +1,20 @@ // Copyright (c) Microsoft Corporation. // Licensed under the MIT License. -import { FC, useEffect, useState, useRef } from 'react' +import { FC, useState, useRef } from 'react' import { useTranslation } from 'react-i18next'; import { transition } from '../app/tokens'; import { useSelector, useDispatch } from 'react-redux' -import { DataFormulatorState, dfActions, generateFreshChart } from '../app/dfSlice'; +import { DataFormulatorState, dfActions, dfSelectors, generateFreshChart } from '../app/dfSlice'; import { AppDispatch } from '../app/store'; import { Box, - Typography, - IconButton, Tooltip, - TextField, - Card, + Typography, SxProps, LinearProgress, - CircularProgress, alpha, useTheme, Theme, @@ -33,10 +29,10 @@ import '../scss/EncodingShelf.scss'; import { resolveRecommendedChart } from '../app/utils'; import { useFormulateData } from '../app/useFormulateData'; -import { AgentIcon as PrecisionManufacturing } from '../icons'; -import TipsAndUpdatesIcon from '@mui/icons-material/TipsAndUpdates'; +import { TableIcon } from '../icons'; import { renderTextWithEmphasis } from './EncodingShelfCard'; -import { ThinkingBufferEffect } from 
'../components/FunComponents'; +import { getChartTemplate } from '../components/ChartTemplates'; +import { generateChartSkeleton } from './ChartUtils'; // when this is set to true, the new chart will be focused automatically const AUTO_FOCUS_NEW_CHART = false; @@ -123,30 +119,18 @@ export const ChartRecBox: FC = function ({ tableId, placeHolde // reference to states const tables = useSelector((state: DataFormulatorState) => state.tables); - const { streamIdeas, formulateData } = useFormulateData(); + const allCharts = useSelector(dfSelectors.getAllCharts); + // Thumbnails live in their own slice; reading the whole map is fine here + // because the recommendation strip already re-renders when its chart list + // changes. The map identity only changes when some thumbnail is updated. + const chartThumbnails = useSelector((state: DataFormulatorState) => state.chartThumbnails) || {}; + const { formulateData } = useFormulateData(); const focusNextChartRef = useRef(true); - - const modeColor = theme.palette.secondary.main; - - const [prompt, setPrompt] = useState(""); - const [isFormulating, setIsFormulating] = useState(false); - const [ideas, setIdeas] = useState<{text: string, goal: string, tag: string}[]>([]); - - const [thinkingBuffer, setThinkingBuffer] = useState(""); - let thinkingBufferEffect = ; - - // Add state for loading ideas - const [isLoadingIdeas, setIsLoadingIdeas] = useState(false); - const [ideaPhase, setIdeaPhase] = useState(''); - const [ideaElapsed, setIdeaElapsed] = useState(0); + const modeColor = theme.palette.secondary.main; - useEffect(() => { - if (!isLoadingIdeas) { setIdeaElapsed(0); setIdeaPhase(''); return; } - const timer = setInterval(() => setIdeaElapsed(e => e + 1), 1000); - return () => clearInterval(timer); - }, [isLoadingIdeas]); + const [isFormulating, setIsFormulating] = useState(false); // Use the provided tableId and find additional available tables for multi-table operations const currentTable = tables.find(t => t.id === 
tableId); @@ -160,48 +144,6 @@ export const ChartRecBox: FC = function ({ tableId, placeHolde ...priorityIds.filter(id => rootTables.some(t => t.id === id)), ...rootTables.map(t => t.id).filter(id => !priorityIds.includes(id)) ]; - - const getDefaultPrompt = (): string => { - return "show something interesting about the data"; - }; - - const getQuestionPlaceholder = (): string => { - return t('chartRec.defaultInterestingPromptPlaceholder'); - }; - - // Function to get ideas from the interactive explore agent - const getIdeasFromAgent = async (startQuestion?: string) => { - if (!currentTable || isLoadingIdeas) return; - - await streamIdeas({ - actionTableIds: selectedTableIds, - currentTable, - onIdeas: setIdeas, - onThinkingBuffer: setThinkingBuffer, - onLoadingChange: setIsLoadingIdeas, - onProgress: setIdeaPhase, - startQuestion, - }); - setThinkingBuffer(""); - }; - - useEffect(() => { - setIdeas([]); - }, [tableId]); - - // Handle tab key press for auto-completion - const handleKeyDown = (event: React.KeyboardEvent) => { - if (event.key === 'Tab' && !event.shiftKey) { - event.preventDefault(); - if (prompt.trim() === "") { - setPrompt(getDefaultPrompt()); - } - } else if (event.key === 'Enter' && prompt.trim() !== "") { - event.preventDefault(); - focusNextChartRef.current = true; - deriveDataFromNL(prompt.trim()); - } - }; const deriveDataFromNL = (instruction: string) => { @@ -255,7 +197,6 @@ export const ChartRecBox: FC = function ({ tableId, placeHolde "type": "success", "value": `Data formulation: "${displayInstruction}"` })); - setPrompt(""); }, onError: () => { }, @@ -269,163 +210,654 @@ export const ChartRecBox: FC = function ({ tableId, placeHolde }; return ( - - - - {isFormulating && ( - - )} - - + {isFormulating && ( + + )} + {currentTable && (() => { + // Unified provenance ribbon + chart strip: + // row 1 = the trigger chain (… ▸ grandparent ▸ parent ▸ THIS ▸ child1, child2 ▸ …) + // row 2 = a chart-thumbnail cluster directly under each + // table 
label that owns charts. + // + // We use a single CSS grid with one column per ribbon item so + // the cluster for table X is always horizontally aligned with + // X's label. The entire grid is then centered inside the + // container, so the ribbon as a whole reads as balanced + // regardless of whether the current table sits near one end + // of the chain (e.g. a root like "gas-prices"). + const parent = currentTable.derive?.trigger?.tableId + ? tables.find(t => t.id === currentTable.derive!.trigger.tableId) + : undefined; + const grandparent = parent?.derive?.trigger?.tableId + ? tables.find(t => t.id === parent.derive!.trigger.tableId) + : undefined; + const hasGreatGrandparent = !!grandparent?.derive?.trigger?.tableId; + const children = tables.filter(t => t.derive?.trigger?.tableId === currentTable.id); + + const ancestors = [grandparent, parent].filter(Boolean) as typeof tables; + + // Symmetric reach: when the current node sits at an end of + // the lineage, extend further into the available direction + // so we always show up to 3 neighbours total. + // + // • At the root (no ancestors) with a single child: also + // surface the grandchild(ren) as additional right-chain + // entries. This turns "Movie Performance → Movie + // Budgets Gross → …" into "Movie Performance → Movie + // Budgets Gross → Genre ROI Summary". + // • At a leaf (no children) we already display two + // ancestors; if there's only a parent, also surface + // the great-grandparent so we still show 3 nodes. + let extraDescendants: typeof tables = []; + if (ancestors.length === 0 && children.length === 1) { + extraDescendants = tables.filter(t => t.derive?.trigger?.tableId === children[0].id); + } + const greatGrandparent = hasGreatGrandparent + ? 
tables.find(t => t.id === grandparent!.derive!.trigger.tableId) + : undefined; + if (children.length === 0 && ancestors.length === 1 && greatGrandparent) { + ancestors.unshift(greatGrandparent); + } + // Is there still an unseen node above our topmost ancestor? + const topAncestor = ancestors[0]; + const hasHiddenAncestor = !!topAncestor?.derive?.trigger?.tableId; + + if (ancestors.length === 0 && children.length === 0) return null; + + // ── chart filtering ──────────────────────────────────────── + // Drop: + // • the empty-canvas placeholder chart that's rendering us, + // • trigger-source stubs (virtual metadata merged in by + // `selectTriggerCharts` — they have no real thumbnail), + // • placeholder chart types that never render to PNG. + const chartsForTable = (tid: string) => allCharts.filter(c => + c.tableRef === tid + && c.id !== placeHolderChartId + && c.source !== 'trigger' + && !['Auto', '?'].includes(c.chartType) + ); + + // ── ribbon atoms ─────────────────────────────────────────── + const TableRef: FC<{ table: typeof currentTable, current?: boolean }> = ({ table, current }) => ( + dispatch(dfActions.setFocused({ type: 'table', tableId: table.id }))} sx={{ - flex: 1, - "& .MuiInput-input": { fontSize: '14px', lineHeight: 1.5 }, - "& .MuiInput-underline:before": { borderBottom: 'none' }, - "& .MuiInput-underline:hover:not(.Mui-disabled):before": { borderBottom: 'none' }, - "& .MuiInput-underline:not(.Mui-disabled):before": { borderBottom: 'none' }, - "& .MuiInput-underline.Mui-disabled:before": { borderBottom: 'none' }, - "& .MuiInput-underline.Mui-disabled:after": { borderBottom: 'none' }, - "& .MuiInput-underline:after": { borderBottom: 'none' }, + display: 'inline-flex', alignItems: 'center', gap: current ? '6px' : '3px', + border: 'none', background: 'transparent', p: 0, + fontFamily: theme.typography.fontFamily, + fontSize: current ? 16 : 11, lineHeight: 1.4, + color: current ? 'primary.main' : 'text.secondary', + fontWeight: current ? 
600 : 400, + cursor: current ? 'default' : 'pointer', + whiteSpace: 'nowrap', + transition: transition.fast, + '&:hover': current ? undefined : { color: 'primary.main' }, }} - disabled={isFormulating || isLoadingIdeas} - onChange={(event) => setPrompt(event.target.value)} - onKeyDown={handleKeyDown} - slotProps={{ - inputLabel: { shrink: true }, - input: { - endAdornment: - - - { - focusNextChartRef.current = true; - deriveDataFromNL(prompt.trim()); - }} - > - {isFormulating ? - - : } - - - + > + + {table.displayId} + + ); + const Sep = () => ( + // Solid 1px connector line — mirrors the timeline guide + // lines used in DataThread to express "this flows into + // that" rather than a generic "next item" arrow. + + ); + const Ellipsis = () => ( + + ); + const Comma = () => ( + , + ); + + // ── progressive truncation (same heuristic as before) ────── + // Estimates per-item width and sheds entries from the longer + // chain until the whole ribbon fits in BUDGET. + const charW = 7; + const currentCharW = 10; + const ITEM_OVERHEAD = 22; + const CURRENT_OVERHEAD = 28; + const SEP_W = 16; + const ELLIPSIS_W = 12; + const BUDGET = 680; + + const estW = (table: typeof currentTable) => + ITEM_OVERHEAD + (table?.displayId.length ?? 0) * charW; + const currentW = CURRENT_OVERHEAD + currentTable.displayId.length * currentCharW; + + // When there are 2+ children we abandon the inline + // comma-chain and render them as a vertical fan to the + // right of the current node — each branch sits on its own + // short rail with its label + inline stack chip. + const useChildrenFan = children.length >= 2; + // The fan is a vertical stack of branches, so its width is + // governed by the LONGEST single branch — not the sum of + // children. Estimate: elbow stub + label padding + label + // glyphs + optional grandchild ellipsis affordance. This + // replaces the old fixed FAN_W = 280 which overestimated + // and caused ancestors to be shed unnecessarily. 
+ const fanBranchW = (t: typeof currentTable) => { + const labelW = (t?.displayId.length ?? 0) * charW; + const grandchildAffordance = + tables.some(tt => tt.derive?.trigger?.tableId === t.id) ? 24 : 0; + // 22 elbow + 14 pl + label + ~10 right padding + return 22 + 14 + labelW + grandchildAffordance + 10; + }; + const FAN_W = useChildrenFan + ? Math.max(...children.map(fanBranchW)) + : 0; + + let leftChain = [...ancestors]; + let rightChain = useChildrenFan ? [] as typeof tables : [...children, ...extraDescendants]; + let leftEllipsis = hasHiddenAncestor; + let rightTruncated = false; + + const totalW = () => { + let w = currentW; + if (leftEllipsis) w += ELLIPSIS_W + SEP_W; + for (const a of leftChain) w += estW(a) + SEP_W; + if (useChildrenFan) { + w += SEP_W + FAN_W; + } else { + if (rightChain.length > 0) w += SEP_W; + rightChain.forEach((c, i) => { w += estW(c) + (i > 0 ? 8 : 0); }); + if (rightTruncated) w += 8 + ELLIPSIS_W; + } + return w; + }; + + while (totalW() > BUDGET) { + // In fan mode we never shed children — the fan owns its + // own vertical real estate. Just shed ancestors. + if (useChildrenFan) { + if (leftChain.length > 0) { + leftChain.shift(); + leftEllipsis = true; + } else { break; } + } else if (rightChain.length > leftChain.length && rightChain.length > 0) { + rightChain.pop(); + rightTruncated = true; + } else if (leftChain.length > 0) { + leftChain.shift(); + leftEllipsis = true; + } else if (rightChain.length > 0) { + rightChain.pop(); + rightTruncated = true; + } else { + break; + } + } + + // ── build a flat sequence of grid items ──────────────────── + // Each item is one of: + // • connector — occupies row 1 of its own column + // • table — label in row 1, cluster in row 2 + // • children-fan — a single cell spanning both rows that + // renders the children as a vertical + // stack of branch rows. 
+ type Connector = { kind: 'connector', key: string, node: React.ReactNode }; + type TableItem = { + kind: 'table', + key: string, + label: React.ReactNode, + charts: Chart[], + current?: boolean, + }; + type FanItem = { + kind: 'children-fan', + key: string, + branches: typeof tables, + }; + const items: (Connector | TableItem | FanItem)[] = []; + + if (leftEllipsis) { + items.push({ kind: 'connector', key: 'lell', node: }); + items.push({ kind: 'connector', key: 'lell-sep', node: }); + } + leftChain.forEach((a, i) => { + items.push({ + kind: 'table', key: `a-${a.id}`, + label: , + charts: chartsForTable(a.id), + }); + items.push({ kind: 'connector', key: `a-${a.id}-sep`, node: }); + }); + items.push({ + kind: 'table', key: `c-${currentTable.id}`, + label: , + charts: chartsForTable(currentTable.id), + current: true, + }); + if (useChildrenFan) { + // The fan draws its own entry stub at its vertical + // midpoint, so no separate row-1 connector is needed + // (a Sep here would dangle from the label baseline + // and never meet the trunk). + items.push({ kind: 'children-fan', key: 'fan', branches: children }); + } else { + rightChain.forEach((c, i) => { + const prev = i === 0 ? currentTable : rightChain[i - 1]; + const isDescendant = c.derive?.trigger?.tableId === prev.id; + // Sep = chain continuation (parent→child). Comma = + // sibling enumeration under the same parent. + items.push({ + kind: 'connector', + key: `c-${c.id}-sep`, + node: isDescendant ? : , + }); + items.push({ + kind: 'table', key: `r-${c.id}`, + label: , + charts: chartsForTable(c.id), + }); + // "…" affordance: only when c has children AND the + // next ribbon entry isn't one of them (otherwise the + // chain already exposes the descendant). 
+ const cChildren = tables.filter(t => t.derive?.trigger?.tableId === c.id); + const nextInChain = rightChain[i + 1]; + const nextIsChild = !!nextInChain && cChildren.some(cc => cc.id === nextInChain.id); + if (cChildren.length > 0 && !nextIsChild) { + items.push({ kind: 'connector', key: `r-${c.id}-sep2`, node: }); + items.push({ kind: 'connector', key: `r-${c.id}-ell`, node: }); + } + }); + if (rightTruncated) { + items.push({ kind: 'connector', key: 'rell-comma', node: }); + items.push({ kind: 'connector', key: 'rell', node: }); + } + } + + // Renders a chart-thumbnail cluster for a single ribbon + // column. Two presentations: + // • strip (focused / current table) — N thumbnails laid + // out side-by-side, auto-scaled to chart count. + // • stacked (neighbour tables) — a constant-width "paper + // stack" card: the first chart on top, up to 2 faint + // layers peeking out behind, and a ×N badge when there's + // more than one. Keeps non-focused slots a uniform + // width so the ribbon stays compact. + const renderCluster = ( + charts: Chart[], + opts: { scale: number, maxVisible: number, dim?: boolean, stacked?: boolean }, + ) => { + if (charts.length === 0) return null; + const { scale, maxVisible, dim, stacked } = opts; + const imgMaxW = Math.round(140 * scale); + const imgMaxH = Math.round(96 * scale); + const boxMinW = Math.round(88 * scale); + const boxMinH = Math.round(68 * scale); + const skeletonPx = Math.round(44 * scale); + + const renderThumb = (chart: Chart) => { + const tpl = getChartTemplate(chart.chartType); + const label = chart.chartType; + const thumb = chartThumbnails[chart.id]; + const content = thumb ? 
( + {label} + ) : ( + + {generateChartSkeleton(tpl?.icon, skeletonPx, skeletonPx, 0.4)} + + ); + return ( + dispatch(dfActions.setFocused({ type: 'chart', chartId: chart.id }))} + sx={{ + display: 'inline-flex', alignItems: 'center', justifyContent: 'center', + minWidth: boxMinW, minHeight: boxMinH, + p: 0.5, + border: `1px solid ${alpha(theme.palette.text.primary, 0.08)}`, + borderRadius: '6px', background: theme.palette.background.paper, + cursor: 'pointer', transition: transition.fast, + '&:hover': { + borderColor: 'primary.main', + boxShadow: '0 0 6px rgba(25, 118, 210, 0.25)', + }, + }} + > + {content} + + ); + }; + + if (stacked) { + const front = charts[0]; + const behindCount = Math.min(charts.length - 1, 2); + const offset = 4; // px per buried layer peeks out + // Fixed card dimensions so the front fully covers the + // behind layers — otherwise a wide thumbnail can + // outgrow the paper and the stack falls apart. + const cardW = Math.max(boxMinW, imgMaxW) + 8; + const cardH = Math.max(boxMinH, imgMaxH) + 8; + const totalW = cardW + behindCount * offset; + const totalH = cardH + behindCount * offset; + const cardSx = { + width: cardW, height: cardH, + border: `1px solid ${alpha(theme.palette.text.primary, 0.08)}`, + borderRadius: '6px', + background: theme.palette.background.paper, + boxSizing: 'border-box' as const, + }; + return ( + + {Array.from({ length: behindCount }).map((_, i) => { + // Farthest layer drawn first so the + // front lands on top. Slight rotation + // pivoting from the buried corner + // sells the "pile of paper" feel. + const reverseIdx = behindCount - i; + const off = reverseIdx * offset; + const angle = (reverseIdx % 2 === 0 ? 1 : -1) * (reverseIdx * 1.2); + return ( + + ); + })} + {/* Front card: a fixed-size, fully opaque + slot that buries the layers below. + The thumbnail/skeleton is clipped to + fit so nothing overflows. 
*/} + dispatch(dfActions.setFocused({ type: 'chart', chartId: front.id }))} + sx={{ + ...cardSx, + position: 'absolute', left: 0, top: 0, + p: 0.5, m: 0, + display: 'flex', alignItems: 'center', justifyContent: 'center', + overflow: 'hidden', + cursor: 'pointer', + transition: transition.fast, + '&:hover': { + borderColor: 'primary.main', + boxShadow: '0 0 6px rgba(25, 118, 210, 0.25)', + }, + }} + > + {(() => { + const frontThumb = chartThumbnails[front.id]; + return frontThumb ? ( + {front.chartType} + ) : ( + generateChartSkeleton( + getChartTemplate(front.chartType)?.icon, + skeletonPx, skeletonPx, 0.4, + ) + ); + })()} + + {charts.length > 1 && ( + + {`×${charts.length}`} + + )} + ); + } + + // Strip mode (focused / current table). + const visible = charts.slice(0, maxVisible); + const overflow = charts.length - visible.length; + + return ( + + {visible.map(chart => ( + + {renderThumb(chart)} + + ))} + {overflow > 0 && ( + + + {`+${overflow}`} + + + )} + + ); + }; + + // Center cluster auto-scales with chart count; neighbour + // clusters are halved and dimmed to read as context. + const centerN = Math.min(chartsForTable(currentTable.id).length, 8); + const centerScale = centerN <= 3 ? 1 : centerN <= 5 ? 0.82 : 0.66; + const sideScale = 0.5; + + return ( + + {items.map(item => { + if (item.kind === 'children-fan') { + // Vertical fan-out: branches read top-down, + // with the FIRST branch aligned with the + // current node's label row. Trunk is drawn + // as a single absolutely-positioned line so + // the inter-row flex gap doesn't break it. + const MAX_BRANCHES = 4; + const shown = item.branches.slice(0, MAX_BRANCHES); + const hidden = item.branches.length - shown.length; + const totalRows = shown.length + (hidden > 0 ? 
1 : 0); + const ELBOW_W = 22; + const LINE_COLOR = 'rgba(0,0,0,0.22)'; + const ROW_MIN_H = 22; + const ROW_GAP = 6; + const HALF = ROW_MIN_H / 2; // y-offset of any row's centerline + return ( + + {/* Entry stub from the current node: + sized to the column gap so it + sits cleanly in the whitespace + between the focused label and + the fan trunk, without bleeding + into the label glyphs. */} + + {/* Continuous trunk from first row's + centerline down to last row's + centerline, spanning the row + gaps so the connector reads as + one line. */} + {totalRows >= 2 && ( + + )} + {shown.map((c) => { + const hasGrandchildren = tables.some(t => t.derive?.trigger?.tableId === c.id); + return ( + + {/* Horizontal elbow stub from + trunk to this branch label. */} + + + + {hasGrandchildren && ( + <> + + + + )} + + + ); + })} + {hidden > 0 && ( + + + + +{hidden} more + + + )} + + ); } - }} - value={prompt} - placeholder={getQuestionPlaceholder()} - fullWidth - multiline - maxRows={4} - minRows={1} - /> - - - 0 ? t('chartRec.regenerateIdeas') : t('chartRec.getIdeas')}> - - getIdeasFromAgent()} - > - {isLoadingIdeas ? - - : } - - - - - {(ideas.length > 0 || thinkingBuffer || (isLoadingIdeas && ideaPhase)) && ( - - {ideas.map((idea, index) => ( - { - focusNextChartRef.current = true; - setPrompt(idea.text); - deriveDataFromNL(idea.text); - }} - disabled={isFormulating} - sx={{ - width: 'calc(50% - 16px)', - }} - /> - ))} - {isLoadingIdeas && thinkingBuffer && thinkingBufferEffect} - {isLoadingIdeas && !thinkingBuffer && ideaPhase && ( - - {ideaPhase === 'building_context' ? t('chartRec.progressBuildingContext') - : ideaPhase === 'generating' ? t('chartRec.progressGenerating') - : t('chartRec.generatingIdeas')} - {ideaElapsed > 0 ? ` (${ideaElapsed}s)` : ''} - - )} - - )} + return ( + + + {item.kind === 'connector' ? item.node : item.label} + + {item.kind === 'table' && ( + + {renderCluster(item.charts, item.current + ? 
{ scale: centerScale, maxVisible: 8, stacked: true } + : { scale: sideScale, maxVisible: 3, dim: true, stacked: true })} + + )} + + ); + })} + + ); + })()} ); }; \ No newline at end of file diff --git a/src/views/ChartRenderService.tsx b/src/views/ChartRenderService.tsx index bae30946..efdf4d38 100644 --- a/src/views/ChartRenderService.tsx +++ b/src/views/ChartRenderService.tsx @@ -10,7 +10,9 @@ * * Results are stored in: * 1. Module-level chartCache (SVG + PNG) — for VisualizationView to read - * 2. chart.thumbnail in Redux (PNG data URL) — for DataThread tags + * 2. `state.chartThumbnails[chartId]` in Redux (PNG data URL) — for + * DataThread / ChartRecBox / etc. tags. Kept in its own slice + * so thumbnail updates don't invalidate the `charts` array reference. * * This eliminates redundant DOM-based Vega rendering in DataThread * and EncodingShelfThread, replacing heavy / embed() calls @@ -25,6 +27,7 @@ import { assembleVegaChart, prepVisTable } from '../app/utils'; import { buildEmbeddedDataForChart } from '../app/restyle'; import { getDataTable, checkChartAvailability } from './ChartUtils'; import { getCachedChart, setCachedChart, computeCacheKey, invalidateChart, ChartCacheEntry } from '../app/chartCache'; +import { displayRowsCache, computeDisplayRowsCacheKey } from '../app/displayRowsCache'; import { compile } from 'vega-lite'; import { parse, View } from 'vega'; import _ from 'lodash'; @@ -117,6 +120,14 @@ export const ChartRenderService: FC = () => { const conceptShelfItems = useSelector((state: DataFormulatorState) => state.conceptShelfItems); const chartSynthesisInProgress = useSelector((state: DataFormulatorState) => state.chartSynthesisInProgress); const maxStretchFactor = useSelector((state: DataFormulatorState) => state.config.maxStretchFactor); + // Re-run when the focused canvas caches a fresh display-row sample so + // thumbnails can use the same richer data the main chart is rendering. 
+ const displayRowsTick = useSelector((state: DataFormulatorState) => state.displayRowsTick); + // Read the thumbnails map via a ref so we can check current values inside + // the effect without adding the map to the dep list (the dispatch we + // issue below mutates it, and including it would re-enter the effect). + const chartThumbnailsRef = useRef>({}); + chartThumbnailsRef.current = useSelector((state: DataFormulatorState) => state.chartThumbnails) || {}; // Track which charts are currently being rendered to avoid duplicates const renderingRef = useRef>(new Set()); @@ -133,9 +144,17 @@ export const ChartRenderService: FC = () => { try { // --- Prepare data (mirror MemoizedChartObject's pipeline) --- - // Use all rows so the thumbnail faithfully matches the main chart's - // appearance (sort order, aggregation, overflow filtering, etc.). - let visTableRows: any[] = structuredClone(table.rows); + // Prefer the same sample VisualizationView fetched for the + // focused canvas — for virtual tables `table.rows` is only a + // small preview slice, so rendering from it produces a + // thumbnail that doesn't match the main chart. The canvas + // populates `displayRowsCache` with up to 1000 server-sampled + // rows; reuse that when present. + const dispKey = computeDisplayRowsCacheKey(table, chart, items); + const cachedDisplay = displayRowsCache.get(dispKey); + let visTableRows: any[] = cachedDisplay + ? structuredClone(cachedDisplay.rows) + : structuredClone(table.rows); // Pre-aggregate for the encoding map visTableRows = prepVisTable(visTableRows, items, chart.encodingMap); @@ -160,8 +179,6 @@ export const ChartRenderService: FC = () => { chart, visTableRows, table.metadata, items, ); fullSpec.data = { values: variantValues }; - fullSpec.width = FULL_WIDTH; - fullSpec.height = FULL_HEIGHT; } else { fullSpec = assembleVegaChart( chart.chartType, @@ -245,12 +262,21 @@ export const ChartRenderService: FC = () => { const activeVariant = chart.activeVariantId ? 
chart.styleVariants?.find(v => v.id === chart.activeVariantId) : undefined; + // Mix in the canvas's display-row sample fingerprint so a + // thumbnail rendered from the preview slice gets re-rendered + // once the richer server sample arrives in `displayRowsCache`. + const displayEntry = displayRowsCache.get( + computeDisplayRowsCacheKey(table, chart, conceptShelfItems), + ); + const displayFingerprint = displayEntry + ? `disp:${displayEntry.rows.length}/${displayEntry.totalCount}` + : `preview:${table.rows.length}`; const cacheKey = computeCacheKey( chart.chartType, chart.encodingMap, chart.config, table.rows.length, - table.contentHash, + `${table.contentHash || ''}|${displayFingerprint}`, table.id, table.metadata, activeVariant?.id, @@ -259,9 +285,11 @@ export const ChartRenderService: FC = () => { const cached = getCachedChart(chart.id); if (cached && cached.specKey === cacheKey) { - // Already up-to-date — but ensure Redux thumbnail is set - // (e.g., after page reload where module cache is cleared but Redux persisted) - if (!chart.thumbnail || chart.thumbnail !== cached.thumbnailDataUrl) { + // Already up-to-date — but ensure the Redux thumbnail slice + // matches (e.g. after a page reload where the module cache + // is repopulated but the slice was reset to {}). 
+ const current = chartThumbnailsRef.current[chart.id]; + if (!current || current !== cached.thumbnailDataUrl) { dispatch(dfActions.updateChartThumbnail({ chartId: chart.id, thumbnail: cached.thumbnailDataUrl, @@ -291,7 +319,7 @@ export const ChartRenderService: FC = () => { return () => { cancelled = true; }; } - }, [charts, tables, conceptShelfItems, chartSynthesisInProgress, processChart, dispatch]); + }, [charts, tables, conceptShelfItems, chartSynthesisInProgress, displayRowsTick, processChart, dispatch]); // This component renders nothing return null; diff --git a/src/views/ChartUtils.tsx b/src/views/ChartUtils.tsx index 6cdad248..18ab1dd4 100644 --- a/src/views/ChartUtils.tsx +++ b/src/views/ChartUtils.tsx @@ -46,10 +46,7 @@ export let getDataTable = (chart: Chart, tables: DictTable[], charts: Chart[], return activeFields.every(f => t.names.includes(f.name)); }); - let confirmedTableCandidates = workingTableCandidates.filter(t => !charts.some(c => c.saved && c.tableRef == t.id)); - if(confirmedTableCandidates.length > 0) { - return confirmedTableCandidates[0]; - } else if (workingTableCandidates.length > 0) { + if(workingTableCandidates.length > 0) { return workingTableCandidates[0]; } else { // sort base tables based on how many active fields are covered by existing tables diff --git a/src/views/ClarificationPanel.tsx b/src/views/ClarificationPanel.tsx deleted file mode 100644 index 92e33878..00000000 --- a/src/views/ClarificationPanel.tsx +++ /dev/null @@ -1,301 +0,0 @@ -import React, { FC, useEffect, useRef, useState } from 'react'; -import { Box, Collapse, IconButton, Tooltip, Typography, useTheme } from '@mui/material'; -import { alpha } from '@mui/material/styles'; -import SmartToyOutlinedIcon from '@mui/icons-material/SmartToyOutlined'; -import DeleteOutlineIcon from '@mui/icons-material/DeleteOutline'; -import UnfoldLessIcon from '@mui/icons-material/UnfoldLess'; -import UnfoldMoreIcon from '@mui/icons-material/UnfoldMore'; -import { useTranslation } 
from 'react-i18next'; -import { - ClarificationQuestion, - ClarificationResponse, -} from '../components/ComponentType'; -import { renderFieldHighlights } from './InteractionEntryCard'; - -interface ClarificationPanelProps { - questions: ClarificationQuestion[]; - /** - * 'clarify' (default) — agent is asking the user a question (warning palette). - * 'explain' — agent gave an answer; options are suggested chart - * follow-ups the user can click (info palette). - * - * Both variants share the same simplified layout: a small header with a - * close (×) icon, the question/explanation text, and clickable options. - * Long-form replies happen in the main chat box below the panel. - */ - variant?: 'clarify' | 'explain'; - /** - * Optional. Currently selected answer per question (keyed by question - * index). When provided together with `onSelectAnswer`, the panel will - * route option clicks (and free-text Enter) through `onSelectAnswer` - * instead of submitting immediately. The parent decides when to submit - * (e.g. after all questions are answered). - */ - selectedAnswers?: Record; - onSelectAnswer?: (questionIndex: number, response: ClarificationResponse) => void; - onSubmit: (responses: ClarificationResponse[]) => void; - onCancel: () => void; -} - -export const ClarificationPanel: FC = ({ - questions, - variant = 'clarify', - selectedAnswers, - onSelectAnswer, - onSubmit, - onCancel, -}) => { - const theme = useTheme(); - const { t } = useTranslation(); - const submittedRef = useRef(false); - // Local minimize state — collapses the panel to a single header row - // (still visible so the user can come back and answer). Distinct from - // dismiss/cancel (the X icon) which actually drops the pause. - const [minimized, setMinimized] = useState(false); - - // Reset minimize whenever the underlying questions change so a brand-new - // clarify pause shows up expanded by default. 
- useEffect(() => { setMinimized(false); }, [questions]); - - const submitResponses = (responses: ClarificationResponse[]) => { - if (responses.length === 0 || submittedRef.current) return; - submittedRef.current = true; - onSubmit(responses); - }; - - /** - * Handle a single answer (option click or free-text Enter). When the - * parent provides `onSelectAnswer`, the panel defers to it and the - * parent decides when to actually submit (e.g. after all questions - * have been answered). Otherwise we fall back to the legacy "click = - * submit immediately" behavior. - */ - const handleAnswer = (response: ClarificationResponse) => { - if (onSelectAnswer) { - onSelectAnswer(response.question_index, response); - return; - } - submitResponses([response]); - }; - - useEffect(() => { - submittedRef.current = false; - }, [questions]); - - // Variant-specific styling. Explain uses the info palette; clarify uses warning. - const isExplain = variant === 'explain'; - const accentColor = isExplain ? theme.palette.info.main : theme.palette.warning.main; - const headerKey = isExplain ? 'chartRec.explanationTitle' : 'chartRec.clarificationTitle'; - - return ( - - setMinimized(prev => !prev)} - sx={{ - display: 'flex', alignItems: 'center', gap: '6px', minHeight: 16, - cursor: 'pointer', - // Stretch hover background to the panel's full content - // width by extending past the parent's px: 0.5 padding, - // then re-add it on the inside. - px: 0.5, mx: -0.5, py: '6px', - '&:hover': { backgroundColor: alpha(accentColor, 0.06) }, - }} - > - - {minimized ? ( - // Collapsed header: muted variant label + a short preview of - // the first question/explanation so the user keeps context. 
- - - {t(headerKey)} - - - {(questions[0]?.text || '').slice(0, 120)} - - - ) : ( - - {t(headerKey)} - - )} - - { e.stopPropagation(); onCancel(); }} - sx={{ - p: 0, width: 16, height: 16, - color: theme.palette.text.disabled, - '&:hover': { color: theme.palette.error.main }, - }} - > - - - - - - {minimized - ? - : } - - - - - - - {questions.map((question, questionIndex) => ( - - {/* Text portion is height-bounded and scrollable so very - long explanations don't push options off-screen. - Options remain fixed below the scrolling region. - Bounds adapt to viewport: ~8 lines min, ~16 max, - scaling with viewport height in between (1 line ≈ 18px - at 12px / 1.5 line-height). */} - - - {!isExplain && questions.length > 1 && ( - <> - {t('chartRec.clarificationQuestionLabel', { index: questionIndex + 1 })}{' '} - - )} - {renderFieldHighlights(question.text, alpha(accentColor, 0.06))} - - - - {question.responseType === 'free_text' ? ( - // Free-text questions don't render their own input. - // The user types the answer in the main chat box - // below and hits Send (or Enter). We show a small - // hint so the affordance is clear. - - {t('chartRec.freeTextClarificationHint')} - - ) : ( - - {isExplain && (question.options || []).length > 0 && ( - - {t('chartRec.explanationFollowupsLabel')} - - )} - - {(question.options || []).map((option, optionIndex) => { - const selected = selectedAnswers?.[questionIndex]; - const isSelected = !!selected && selected.answer === option.label; - return ( - - handleAnswer({ - question_index: questionIndex, - answer: option.label, - source: 'option', - })} - sx={{ - position: 'relative', zIndex: 1, - px: '8px', py: '4px', - borderRadius: '6px', - border: `1px solid ${isSelected ? alpha(accentColor, 0.6) : alpha(theme.palette.text.primary, 0.12)}`, - backgroundColor: isSelected ? alpha(accentColor, 0.12) : theme.palette.background.paper, - cursor: 'pointer', - fontSize: 11, - fontWeight: isSelected ? 
600 : 400, - // Inline-block so options can flex-wrap and size to content. - // Long labels still wrap inside the button if needed. - display: 'inline-block', - whiteSpace: 'normal', - wordBreak: 'break-word', - lineHeight: 1.4, - color: theme.palette.text.primary, - textAlign: 'left', - fontFamily: theme.typography.fontFamily, - '&:hover': { backgroundColor: alpha(accentColor, isSelected ? 0.16 : 0.08) }, - }} - > - {option.label} - - - ); - })} - - - )} - - ))} - - - - ); -}; diff --git a/src/views/ColumnFilterPopover.tsx b/src/views/ColumnFilterPopover.tsx new file mode 100644 index 00000000..98bbff7d --- /dev/null +++ b/src/views/ColumnFilterPopover.tsx @@ -0,0 +1,630 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Column filter popover for `SelectableDataGrid` (design-doc 31). + * + * The popover variant is chosen synchronously from `metadata.distinctCount`: + * - numeric / temporal column → range form + * - low-cardinality column (`distinctCount <= LEVELS_LIMIT`) → checklist + * - everything else → keyword (case-insensitive substring) form + * + * Apply submits the new filter (or `undefined` to clear) and closes the + * popover. Clear clears in place. + * + * The three wire shapes returned via `onApply` match the backend vocabulary + * (see `_build_filter_where_duckdb` / `_apply_filters_pandas`). + */ + +import * as React from 'react'; +import { useTranslation } from 'react-i18next'; +import { + Box, Button, Checkbox, Divider, FormControlLabel, ListItemButton, + Popover, Stack, TextField, Typography, +} from '@mui/material'; +import ArrowUpwardIcon from '@mui/icons-material/ArrowUpward'; +import ArrowDownwardIcon from '@mui/icons-material/ArrowDownward'; +import { Type } from '../data/types'; +import { formatCellValue } from './ViewUtils'; + +/** Wire shapes — must match the backend filter vocabulary. 
*/ +export type RangeFilter = { + op: 'range'; + field: string; + min?: number | string | null; + max?: number | string | null; + include_nulls?: boolean; +}; +export type InFilter = { + op: 'in'; + field: string; + values: Array; +}; +export type ContainsFilter = { + op: 'contains'; + field: string; + value: string; +}; +export type ColumnFilter = RangeFilter | InFilter | ContainsFilter; + +const LEVELS_LIMIT = 100; + +// Shared styling for the section-title dividers (label sits on the rule, +// uppercase + muted to read as a section break, not a heading). +const SECTION_DIVIDER_SX = { + mx: 1.25, + my: 0.5, + '&::before, &::after': { borderColor: 'divider' }, + '& .MuiDivider-wrapper': { px: 0.75 }, +} as const; + +const isNumericType = (t: Type | undefined): boolean => + t === Type.Integer || t === Type.Number; +const isTemporalType = (t: Type | undefined): boolean => + t === Type.Date || t === Type.DateTime; + +/** Decide which sub-form to show for a column. */ +function pickVariant( + dataType: Type | undefined, + distinctCount: number | undefined, +): 'range' | 'checklist' | 'keyword' { + if (isNumericType(dataType) || isTemporalType(dataType)) return 'range'; + if (distinctCount !== undefined && distinctCount > 0 && distinctCount <= LEVELS_LIMIT) { + return 'checklist'; + } + return 'keyword'; +} + +interface ColumnFilterPopoverProps { + anchor: HTMLElement | null; + open: boolean; + onClose: () => void; + columnId: string; + columnLabel: string; + dataType: Type | undefined; + rowCount?: number; + distinctCount?: number; + nullCount?: number; + levels?: any[]; + levelCounts?: number[]; + currentFilter?: ColumnFilter; + onApply: (filter: ColumnFilter | undefined) => void; + // Sort surface (rendered above the filter form as a unified column panel). 
+ isSorted: boolean; + sortOrder: 'asc' | 'desc'; + onSortAsc: () => void; + onSortDesc: () => void; + onClearSort: () => void; +} + +export const ColumnFilterPopover: React.FC = ({ + anchor, open, onClose, columnId, columnLabel, dataType, + rowCount, distinctCount, nullCount, levels, levelCounts, currentFilter, onApply, + isSorted, sortOrder, onSortAsc, onSortDesc, onClearSort, +}) => { + const { t } = useTranslation(); + const variant = pickVariant(dataType, distinctCount); + + const close = () => onClose(); + + // Build the column-summary line ("N rows · N distinct · N blanks"). + const summaryParts: string[] = []; + if (rowCount !== undefined) { + summaryParts.push(t('dataGrid.filter.summaryRows', { count: rowCount })); + } + if (distinctCount !== undefined) { + summaryParts.push(t('dataGrid.filter.summaryDistinct', { count: distinctCount })); + } + if (nullCount !== undefined && nullCount > 0) { + summaryParts.push(t('dataGrid.filter.summaryBlanks', { count: nullCount })); + } + const summary = summaryParts.join(' · '); + + return ( + + {/* Section 1 — column summary. */} + + + {columnLabel} + + {summary && ( + + {summary} + + )} + + + + {t('dataGrid.filter.sectionSort')} + + + {/* Section 2 — sort actions. */} + + } + label={t('dataGrid.columnMenu.sortAsc')} + selected={isSorted && sortOrder === 'asc'} + onClick={() => { onSortAsc(); close(); }} + /> + } + label={t('dataGrid.columnMenu.sortDesc')} + selected={isSorted && sortOrder === 'desc'} + onClick={() => { onSortDesc(); close(); }} + /> + { onClearSort(); close(); }} + /> + + + + {t('dataGrid.filter.sectionFilter')} + + + {/* Section 3 — filter form. 
*/} + + {variant === 'range' && ( + { onApply(f); close(); }} + /> + )} + {variant === 'checklist' && ( + { onApply(f); close(); }} + /> + )} + {variant === 'keyword' && ( + { onApply(f); close(); }} + /> + )} + + + ); +}; + + +// ────────────────────────────────────────────────────────────────────────── +// Range form (numeric / date) +// ────────────────────────────────────────────────────────────────────────── + +interface RangeFormProps { + columnId: string; + dataType: Type | undefined; + nullCount?: number; + currentFilter?: ColumnFilter; + onApply: (filter: ColumnFilter | undefined) => void; +} + +const RangeFilterForm: React.FC = ({ + columnId, dataType, nullCount, currentFilter, onApply, +}) => { + const { t } = useTranslation(); + const isDate = dataType === Type.Date; + const isDateTime = dataType === Type.DateTime; + const isNumeric = isNumericType(dataType); + // For numeric columns we use type="text" + inputMode="decimal" instead of + // type="number" — the native spinner buttons are visually awkward in a + // narrow popover and add no value when bounds can be typed directly. + const inputType = isDate ? 'date' : (isDateTime ? 'datetime-local' : 'text'); + const inputMode: React.HTMLAttributes['inputMode'] | undefined = + isNumeric ? 'decimal' : undefined; + + const rangeFilter = currentFilter?.op === 'range' ? currentFilter : undefined; + const isBlanksFilter = + currentFilter?.op === 'in' && + currentFilter.values.length === 1 && + currentFilter.values[0] === null; + + const [minStr, setMinStr] = React.useState( + rangeFilter?.min != null ? String(rangeFilter.min) : '', + ); + const [maxStr, setMaxStr] = React.useState( + rangeFilter?.max != null ? String(rangeFilter.max) : '', + ); + // "Show blanks only" checkbox — when checked, the filter is an `in [null]` + // and the min/max bounds are ignored. 
+ const [blanksOnly, setBlanksOnly] = React.useState(isBlanksFilter); + + const buildFilter = (): ColumnFilter | undefined => { + if (blanksOnly) { + return { op: 'in', field: columnId, values: [null] }; + } + const hasMin = minStr.trim() !== ''; + const hasMax = maxStr.trim() !== ''; + if (!hasMin && !hasMax) return undefined; + let minVal: number | string | undefined; + let maxVal: number | string | undefined; + if (isNumeric) { + // type="text" + inputMode="decimal" allows arbitrary strings — + // drop a bound silently if it isn't a finite number. + const n1 = hasMin ? Number(minStr) : NaN; + const n2 = hasMax ? Number(maxStr) : NaN; + minVal = Number.isFinite(n1) ? n1 : undefined; + maxVal = Number.isFinite(n2) ? n2 : undefined; + } else { + minVal = hasMin ? minStr : undefined; + maxVal = hasMax ? maxStr : undefined; + } + if (minVal === undefined && maxVal === undefined) return undefined; + return { + op: 'range', + field: columnId, + ...(minVal !== undefined ? { min: minVal as any } : {}), + ...(maxVal !== undefined ? { max: maxVal as any } : {}), + }; + }; + + return ( + + + setMinStr(e.target.value)} + onKeyDown={(e) => { if (e.key === 'Enter') onApply(buildFilter()); }} + autoComplete="off" + slotProps={{ + inputLabel: { shrink: true }, + htmlInput: { + style: { fontSize: 12 }, + autoComplete: 'off', + name: `df-filter-from-${columnId}`, + ...(inputMode ? { inputMode } : {}), + }, + }} + sx={{ '& .MuiInputBase-root': { fontSize: 12 }, '& .MuiInputLabel-root': { fontSize: 11 } }} + /> + setMaxStr(e.target.value)} + onKeyDown={(e) => { if (e.key === 'Enter') onApply(buildFilter()); }} + autoComplete="off" + slotProps={{ + inputLabel: { shrink: true }, + htmlInput: { + style: { fontSize: 12 }, + autoComplete: 'off', + name: `df-filter-to-${columnId}`, + ...(inputMode ? 
{ inputMode } : {}), + }, + }} + sx={{ '& .MuiInputBase-root': { fontSize: 12 }, '& .MuiInputLabel-root': { fontSize: 11 } }} + /> + + {(nullCount === undefined || nullCount > 0) && ( + setBlanksOnly(e.target.checked)} + sx={{ p: 0.25 }} + /> + } + label={ + + {t('dataGrid.filter.showBlanksOnly')} + {nullCount !== undefined ? ` (${nullCount.toLocaleString()})` : ''} + + } + sx={{ ml: 0, mt: 0.25 }} + /> + )} + onApply(buildFilter())} + onClear={() => onApply(undefined)} + clearDisabled={!currentFilter} + /> + + ); +}; + + +// ────────────────────────────────────────────────────────────────────────── +// Checklist form (categorical, ≤ 100 distinct values) +// ────────────────────────────────────────────────────────────────────────── + +interface ChecklistFormProps { + columnId: string; + dataType: Type | undefined; + nullCount?: number; + levels: any[]; + levelCounts?: number[]; + currentFilter?: InFilter; + onApply: (filter: InFilter | undefined) => void; +} + +const ChecklistFilterForm: React.FC = ({ + columnId, dataType, nullCount, levels, levelCounts, currentFilter, onApply, +}) => { + const { t } = useTranslation(); + const showCounts = Array.isArray(levelCounts) && levelCounts.length === levels.length; + const includeNullRow = nullCount === undefined || nullCount > 0; + + // Initial selection: all levels (and blank row when shown) selected when + // no filter is active, else honour the existing filter. + const initial = React.useMemo>(() => { + if (currentFilter && Array.isArray(currentFilter.values)) { + return new Set(currentFilter.values); + } + const s = new Set(levels); + if (includeNullRow) s.add(null); + return s; + }, [currentFilter, levels, includeNullRow]); + + const [selected, setSelected] = React.useState>(initial); + + const allSelected = selected.size === levels.length + (includeNullRow ? 
1 : 0); + + const toggle = (val: any) => { + setSelected(prev => { + const next = new Set(prev); + if (next.has(val)) next.delete(val); else next.add(val); + return next; + }); + }; + const toggleAll = () => { + if (allSelected) { + setSelected(new Set()); + } else { + const all = new Set(levels); + if (includeNullRow) all.add(null); + setSelected(all); + } + }; + + const buildFilter = (): InFilter | undefined => { + if (allSelected) return undefined; + return { + op: 'in', + field: columnId, + values: Array.from(selected), + }; + }; + + return ( + + + {t('dataGrid.filter.selectAll')}} + checked={allSelected} + indeterminate={!allSelected && selected.size > 0} + onToggle={toggleAll} + /> + {includeNullRow && ( + {t('dataGrid.filter.blank')}} + count={showCounts ? nullCount : undefined} + checked={selected.has(null)} + onToggle={() => toggle(null)} + /> + )} + {levels.map((v, idx) => { + const c = showCounts ? levelCounts![idx] : undefined; + return ( + toggle(v)} + /> + ); + })} + + onApply(buildFilter())} + onClear={() => onApply(undefined)} + clearDisabled={!currentFilter} + /> + + ); +}; + +const ChecklistRow: React.FC<{ + label: React.ReactNode; + count?: number; + checked: boolean; + indeterminate?: boolean; + onToggle: () => void; +}> = ({ label, count, checked, indeterminate, onToggle }) => ( + + e.stopPropagation()} + onChange={onToggle} + /> + + {label} + + {count !== undefined && ( + + {count.toLocaleString()} + + )} + +); + + +// ────────────────────────────────────────────────────────────────────────── +// Keyword form (case-insensitive substring) +// ────────────────────────────────────────────────────────────────────────── + +interface KeywordFormProps { + columnId: string; + currentFilter?: ContainsFilter; + onApply: (filter: ContainsFilter | undefined) => void; + +} + +const KeywordFilterForm: React.FC = ({ + columnId, currentFilter, onApply, +}) => { + const { t } = useTranslation(); + const [value, setValue] = React.useState(currentFilter?.value 
|| ''); + + const buildFilter = (): ContainsFilter | undefined => { + const v = value.trim(); + if (!v) return undefined; + return { op: 'contains', field: columnId, value: v }; + }; + + return ( + + setValue(e.target.value)} + onKeyDown={(e) => { if (e.key === 'Enter') onApply(buildFilter()); }} + autoComplete="off" + slotProps={{ htmlInput: { style: { fontSize: 12 }, autoComplete: 'off', name: `df-filter-contains-${columnId}` } }} + sx={{ '& .MuiInputBase-root': { fontSize: 12 } }} + /> + onApply(buildFilter())} + onClear={() => onApply(undefined)} + clearDisabled={!currentFilter} + /> + + ); +}; + + +// ────────────────────────────────────────────────────────────────────────── +// Sort action row and Apply / Clear footer +// ────────────────────────────────────────────────────────────────────────── + +const SortActionRow: React.FC<{ + icon?: React.ReactNode; + label: string; + onClick: () => void; + selected?: boolean; + disabled?: boolean; +}> = ({ icon, label, onClick, selected, disabled }) => ( + + + {icon} + + + {label} + + +); + +const FilterActions: React.FC<{ + onApply: () => void; + onClear: () => void; + clearDisabled?: boolean; +}> = ({ onApply, onClear, clearDisabled }) => { + const { t } = useTranslation(); + return ( + + + + + + ); +}; diff --git a/src/views/DataFormulator.tsx b/src/views/DataFormulator.tsx index 2c4e53c8..2c21c15b 100644 --- a/src/views/DataFormulator.tsx +++ b/src/views/DataFormulator.tsx @@ -90,6 +90,7 @@ export const DataFormulatorFC = ({ }) => { const viewMode = useSelector((state: DataFormulatorState) => state.viewMode); const serverConfig = useSelector((state: DataFormulatorState) => state.serverConfig); const identityKey = useSelector((state: DataFormulatorState) => `${state.identity.type}:${state.identity.id}`); + const dataLoadingChatMessages = useSelector((state: DataFormulatorState) => state.dataLoadingChatMessages); const theme = useTheme(); const dispatch = useDispatch(); @@ -160,17 +161,18 @@ export const DataFormulatorFC = ({ }) => {
return onWorkspaceListChanged(fetchWorkspaces); }, [fetchWorkspaces]); - const handleOpenWorkspace = useCallback(async (name: string) => { - dispatch(dfActions.setSessionLoading({ loading: true, label: `Opening workspace...` })); + const handleOpenWorkspace = useCallback(async (name: string, metaDisplayName?: string) => { + dispatch(dfActions.setSessionLoading({ loading: true, label: t('workspace.openingWorkspace') })); try { const result = await loadWorkspace(name); if (result && Object.keys(result.state).length > 0) { - dispatch(dfActions.loadState({ ...result.state, activeWorkspace: { id: name, displayName: result.displayName } })); + const displayName = metaDisplayName || result.displayName; + dispatch(dfActions.loadState({ ...result.state, activeWorkspace: { id: name, displayName } })); } else { - dispatch(dfActions.setActiveWorkspace({ id: name, displayName: 'Untitled Session' })); + dispatch(dfActions.setActiveWorkspace({ id: name, displayName: metaDisplayName || 'Untitled Session' })); } } catch { - dispatch(dfActions.setActiveWorkspace({ id: name, displayName: 'Untitled Session' })); + dispatch(dfActions.setActiveWorkspace({ id: name, displayName: metaDisplayName || 'Untitled Session' })); } dispatch(dfActions.setSessionLoading({ loading: false })); - }, [dispatch]); + }, [dispatch, t]); @@ -182,7 +184,7 @@ export const DataFormulatorFC = ({ }) => { } catch { dispatch(dfActions.addMessages({ timestamp: Date.now(), type: 'error', - component: 'workspace', value: 'Failed to delete workspace', + component: 'workspace', value: t('workspace.deleteFailed'), })); } setConfirmDeleteWs(null); @@ -219,42 +221,50 @@ export const DataFormulatorFC = ({ }) => { } catch { dispatch(dfActions.addMessages({ timestamp: Date.now(), type: 'error', - component: 'workspace', value: 'Failed to rename workspace', + component: 'workspace', value: t('workspace.renameFailed'), })); // On failure, refetch so the UI returns to the server's truth.
fetchWorkspaces(); } - }, [renamingWs, renameDraft, savedWorkspaces, cancelRenameWorkspace, dispatch, fetchWorkspaces]); + }, [renamingWs, renameDraft, savedWorkspaces, cancelRenameWorkspace, dispatch, fetchWorkspaces, t]); - const handleExportWorkspace = useCallback(async (name: string) => { + const handleExportWorkspace = useCallback(async (id: string) => { try { - const blob = await exportWorkspace(name); + const blob = await exportWorkspace(id); + const ws = savedWorkspaces.find(w => w.id === id); + const fileName = ws?.display_name || id; const a = document.createElement('a'); a.href = URL.createObjectURL(blob); - a.download = `${name}.zip`; + a.download = `${fileName}.zip`; a.click(); URL.revokeObjectURL(a.href); } catch (e) { console.warn('Failed to export workspace:', e); } - }, []); + }, [savedWorkspaces]); const importRef = useRef(null); const handleImportWorkspace = useCallback(async (event: React.ChangeEvent) => { const file = event.target.files?.[0]; if (!file) return; - dispatch(dfActions.setSessionLoading({ loading: true, label: `Importing ${file.name}...` })); + dispatch(dfActions.setSessionLoading({ loading: true, label: t('workspace.importingFile', { name: file.name }) })); try { const wsName = file.name.replace(/\.zip$/, '') || 'imported'; const wsId = generateSessionId(); const state = await importWorkspace(file, wsId, wsName); - dispatch(dfActions.loadState({ ...state, activeWorkspace: { id: wsId, displayName: wsName } })); + const restoredName = (state as any).activeWorkspace?.displayName || wsName; + dispatch(dfActions.loadState({ ...state, activeWorkspace: { id: wsId, displayName: restoredName } })); } catch (e) { console.warn('Failed to import workspace:', e); + dispatch(dfActions.addMessages({ + timestamp: Date.now(), type: 'error', + component: 'workspace', + value: t('workspace.importFailed'), + })); } dispatch(dfActions.setSessionLoading({ loading: false })); if (importRef.current) importRef.current.value = ''; - }, [dispatch]); + }, [dispatch, t]); // Sorted view of saved workspaces. We don't mutate the underlying
We don't mutate the underlying // list (the backend's response is the source of truth); we just @@ -291,20 +301,39 @@ export const DataFormulatorFC = ({ }) => { // State for unified data upload dialog const [uploadDialogOpen, setUploadDialogOpen] = useState(false); const [uploadDialogInitialTab, setUploadDialogInitialTab] = useState('menu'); + const [uploadDialogInitialChatPrompt, setUploadDialogInitialChatPrompt] = useState(undefined); + const [uploadDialogInitialChatImages, setUploadDialogInitialChatImages] = useState(undefined); // Loading state for sessions (from Redux, shared with App.tsx) const sessionLoading = useSelector((state: DataFormulatorState) => state.sessionLoading); const sessionLoadingLabel = useSelector((state: DataFormulatorState) => state.sessionLoadingLabel); - const openUploadDialog = (tab: UploadTabType) => { + const openUploadDialog = (tab: UploadTabType, initialChatPrompt?: string, initialChatImages?: string[]) => { // If no workspace is active, generate an ID (backend creates folder lazily on first data op) if (!activeWorkspace) { dispatch(dfActions.setActiveWorkspace({ id: generateSessionId(), displayName: 'Untitled Session' })); } setUploadDialogInitialTab(tab); + setUploadDialogInitialChatPrompt(initialChatPrompt); + setUploadDialogInitialChatImages(initialChatImages); setUploadDialogOpen(true); }; + // Honor cross-component requests to hand off to the Data Loading + // chat seeded with a prompt (e.g. Data Agent's `delegate` card with + // target='data_loading'). Hand-offs targeting other agents (e.g. + // `report_gen`) are consumed elsewhere — we only clear our own. 
+ const agentHandoffRequest = useSelector((state: DataFormulatorState) => state.agentHandoffRequest); + useEffect(() => { + if (agentHandoffRequest && agentHandoffRequest.target === 'data_loading') { + openUploadDialog('extract', agentHandoffRequest.prompt, agentHandoffRequest.images); + dispatch(dfActions.clearAgentHandoffRequest()); + } + // openUploadDialog is stable enough for this purpose; we only react + // to changes in the handoff request itself. + // eslint-disable-next-line react-hooks/exhaustive-deps + }, [agentHandoffRequest]); + const handleLoadExampleSession = async (session: ExampleSession) => { dispatch(dfActions.setSessionLoading({ loading: true, label: t('messages.loadingExample', { title: session.title }) })); @@ -701,8 +730,10 @@ export const DataFormulatorFC = ({ }) => { openUploadDialog(`connector:${conn.id}` as UploadTabType); } }} + onStartChat={(prompt, images) => openUploadDialog('extract', prompt, images)} + hasPriorConversation={dataLoadingChatMessages.length > 0} + onResumeChat={() => openUploadDialog('extract')} serverConfig={serverConfig} - variant="page" connectors={pageConnectors} /> @@ -735,7 +766,7 @@ export const DataFormulatorFC = ({ }) => { - Your Sessions + {t('workspace.yourSessions')} {/* Sort control — placed in the upper-right of the section @@ -760,18 +791,18 @@ export const DataFormulatorFC = ({ }) => { }} renderValue={(v) => { const labels: Record = { - created_desc: 'newest', - created_asc: 'oldest', - updated_desc: 'recently modified', - name_asc: 'name', + created_desc: t('workspace.sortNewest'), + created_asc: t('workspace.sortOldest'), + updated_desc: t('workspace.sortRecentlyModified'), + name_asc: t('workspace.sortName'), }; return labels[v as typeof wsSort]; }} > - newest first - oldest first - recently modified - name (a–z) + {t('workspace.sortNewestFirst')} + {t('workspace.sortOldestFirst')} + {t('workspace.sortRecentlyModifiedFirst')} + {t('workspace.sortNameAsc')} { {sortedSavedWorkspaces.map(w => { const 
isRenaming = renamingWs === w.id; return ( - handleOpenWorkspace(w.id)} sx={{ + handleOpenWorkspace(w.id, w.display_name)} sx={{ position: 'relative', textAlign: 'left', cursor: isRenaming ? 'default' : 'pointer', '&:hover': isRenaming ? {} : { transform: 'translateY(-2px)', backgroundColor: 'action.hover' }, @@ -827,19 +858,19 @@ export const DataFormulatorFC = ({ }) => { opacity: 0, transition: 'opacity 0.15s', }}> - + { e.stopPropagation(); startRenameWorkspace(w.id, w.display_name); }}> - + { e.stopPropagation(); handleExportWorkspace(w.id); }}> - + { e.stopPropagation(); setConfirmDeleteWs(w.id); }}> @@ -858,24 +889,26 @@ export const DataFormulatorFC = ({ }) => { '&:hover': { transform: 'translateY(-2px)', backgroundColor: 'action.hover' }, }}> - Import workspace (.zip) + {t('workspace.importZip')} {/* ── Delete workspace confirmation ────────────────────── */} setConfirmDeleteWs(null)}> - Delete session? + {t('workspace.deleteTitle')} - - This will permanently delete {savedWorkspaces.find(w => w.id === confirmDeleteWs)?.display_name || confirmDeleteWs}{' '} - ({confirmDeleteWs}) and all its data. - + w.id === confirmDeleteWs)?.display_name || confirmDeleteWs, + id: confirmDeleteWs, + }), + }} /> - + @@ -897,8 +930,18 @@ export const DataFormulatorFC = ({ }) => { )} { setUploadDialogOpen(false); refreshPageConnectors(); }} + onClose={() => { + setUploadDialogOpen(false); + // Clear one-shot seed values so the next dialog + // open (e.g. via the upload button) doesn't + // re-fire the agent with a stale prompt/image.
+ setUploadDialogInitialChatPrompt(undefined); + setUploadDialogInitialChatImages(undefined); + refreshPageConnectors(); + }} initialTab={uploadDialogInitialTab} + initialChatPrompt={uploadDialogInitialChatPrompt} + initialChatImages={uploadDialogInitialChatImages} onConnectorsChanged={handleConnectorsChanged} /> {/* Loading overlay for session loading */} diff --git a/src/views/DataFrameTable.tsx b/src/views/DataFrameTable.tsx index b17f0dc4..dd32ecc9 100644 --- a/src/views/DataFrameTable.tsx +++ b/src/views/DataFrameTable.tsx @@ -42,7 +42,9 @@ export interface DataFrameTableProps { * When true, columns size to content (CSS `tableLayout: auto`, * `width: max-content`) instead of stretching to fill the container. * Use for previews inside containers that should adapt to the table's - * natural width rather than dictate it. + * natural width rather than dictate it. The table still stretches to + * `min-width: 100%` of its container, so a narrow table doesn't leave + * empty space when the container has a minimum width of its own. */ autoWidth?: boolean; } @@ -92,6 +94,7 @@ export const DataFrameTable: React.FC = ({ fontSize, fontFamily: CODE_FONT, width: autoWidth ? 'max-content' : '100%', + minWidth: autoWidth ? '100%' : undefined, tableLayout: autoWidth ? 
'auto' : 'fixed', '& th, & td': { px: 0.75, py: 0.3, textAlign: 'left', diff --git a/src/views/DataLoadingChat.tsx b/src/views/DataLoadingChat.tsx index 1c2e4029..379e29fb 100644 --- a/src/views/DataLoadingChat.tsx +++ b/src/views/DataLoadingChat.tsx @@ -8,49 +8,36 @@ import Markdown from 'react-markdown'; import { Box, Button, Chip, CircularProgress, IconButton, Paper, Stack, Tooltip, Typography, - alpha, useTheme, Collapse, InputBase, + alpha, useTheme, Collapse, } from '@mui/material'; -import SendIcon from '@mui/icons-material/Send'; -import StopIcon from '@mui/icons-material/Stop'; import AttachFileIcon from '@mui/icons-material/AttachFile'; -import CloseIcon from '@mui/icons-material/Close'; +import InsertDriveFileOutlinedIcon from '@mui/icons-material/InsertDriveFileOutlined'; import CheckCircleIcon from '@mui/icons-material/CheckCircle'; +import ErrorOutlineIcon from '@mui/icons-material/ErrorOutline'; +import CheckIcon from '@mui/icons-material/Check'; import ExpandMoreIcon from '@mui/icons-material/ExpandMore'; import ExpandLessIcon from '@mui/icons-material/ExpandLess'; import LanguageIcon from '@mui/icons-material/Language'; -import ImageIcon from '@mui/icons-material/Image'; -import TextFieldsIcon from '@mui/icons-material/TextFields'; -import DatasetIcon from '@mui/icons-material/Dataset'; import TerminalIcon from '@mui/icons-material/Terminal'; -import AddIcon from '@mui/icons-material/Add'; +import QuestionAnswerOutlinedIcon from '@mui/icons-material/QuestionAnswerOutlined'; +import SearchIcon from '@mui/icons-material/Search'; +import ImageOutlinedIcon from '@mui/icons-material/ImageOutlined'; +import DescriptionOutlinedIcon from '@mui/icons-material/DescriptionOutlined'; import { useTranslation } from 'react-i18next'; import { useDispatch, useSelector } from 'react-redux'; import { AppDispatch } from '../app/store'; import { DataFormulatorState, dfActions, dfSelectors } from '../app/dfSlice'; -import type { ModelConfig } from '../app/dfSlice'; 
import { borderColor, transition, radius, shadow } from '../app/tokens'; -import exampleImageTable from '../assets/example-image-table.png'; +import { buildDataLoadingSuggestions } from './dataLoadingSuggestions'; import { getUrls, fetchWithIdentity } from '../app/utils'; import { apiRequest, streamRequest } from '../app/apiClient'; import { ChatMessage, ChatAttachment, InlineTablePreview, CodeExecution, PendingTableLoad, LoadPlan, LoadPlanCandidate } from '../components/ComponentType'; import { createTableFromText } from '../data/utils'; -import { createTableFromFromObjectArray } from '../data/utils'; import { loadTable } from '../app/tableThunks'; -import { TableIcon } from '../icons'; -import { DataFrameTable } from './DataFrameTable'; -import { LoadPlanCard } from '../components/LoadPlanCard'; - -/** Returns true when the model name suggests it does not support image input. */ -export function checkIsLikelyTextOnlyModel(modelName: string | undefined): boolean { - return (modelName || '').toLowerCase().includes('deepseek-chat'); -} - -export function checkModelSupportsImageInput(model: Pick | undefined): boolean { - if (!model) return false; - if (model.supports_vision === false) return false; - return !checkIsLikelyTextOnlyModel(model.model); -} +import { LoadPlanCard, PendingLoadsCard } from '../components/LoadPlanCard'; +import { TablePreviewRow, TablePreviewData } from '../components/TablePreviewRow'; +import { AgentChatInput } from './AgentChatInput'; // --------------------------------------------------------------------------- // Helper: generate table name @@ -196,85 +183,72 @@ const InlineTablePreviewView: React.FC<{ const { t } = useTranslation(); const [expanded, setExpanded] = useState(true); - const allCols = preview.columns; - const rowLabel = preview.totalRows > preview.sampleRows.length ? 
`${preview.totalRows.toLocaleString()} ${t('dataLoading.rows')}` : ''; - const meta = [rowLabel, `${allCols.length} ${t('dataLoading.cols')}`].filter(Boolean).join(' · '); - - // Pill colors - const pillBg = confirmed - ? alpha(theme.palette.success.main, 0.08) - : alpha(theme.palette.primary.main, 0.07); - const pillColor = confirmed - ? theme.palette.success.main - : theme.palette.text.primary; + const meta = [rowLabel, `${preview.columns.length} ${t('dataLoading.cols')}`].filter(Boolean).join(' · '); + + const isDark = theme.palette.mode === 'dark'; + const borderColorBase = confirmed + ? alpha(theme.palette.success.main, 0.3) + : alpha(theme.palette.primary.main, isDark ? 0.25 : 0.15); + const borderColorHover = confirmed + ? alpha(theme.palette.success.main, 0.45) + : alpha(theme.palette.primary.main, isDark ? 0.4 : 0.3); + const shadowBase = isDark + ? '0 1px 2px rgba(0,0,0,0.4), 0 1px 3px rgba(0,0,0,0.2)' + : '0 1px 2px rgba(0,0,0,0.04), 0 1px 3px rgba(0,0,0,0.03)'; + const shadowHover = isDark + ? '0 2px 4px rgba(0,0,0,0.5), 0 2px 6px rgba(0,0,0,0.3)' + : '0 2px 4px rgba(0,0,0,0.06), 0 2px 6px rgba(0,0,0,0.04)'; return ( - - {/* Pill row: pill + Load button inline */} - - {/* Soft pill — click to expand/collapse */} - setExpanded(!expanded)} - sx={{ - display: 'inline-flex', alignItems: 'center', gap: 0.6, - px: 1.25, py: 0.4, - borderRadius: '99px', - bgcolor: pillBg, - cursor: 'pointer', - transition: transition.fast, - '&:hover': { bgcolor: confirmed - ? alpha(theme.palette.success.main, 0.14) - : alpha(theme.palette.primary.main, 0.12), - }, - userSelect: 'none', - }} - > - {confirmed - ? - : - } - - {preview.name} - - {meta && ( - - {meta} + + : undefined} + preview={{ + state: 'ready', + columns: preview.columns, + rows: preview.sampleRows, + totalRows: preview.totalRows, + }} + expanded={expanded} + onTogglePreview={preview.sampleRows.length > 0 ? 
() => setExpanded(!expanded) : undefined} + /> + {/* Footer: matches LoadPlanCard — right-aligned contained + Load button (unconfirmed) or quiet "Loaded" caption. */} + {(onLoad || confirmed) && ( + + + {confirmed ? ( + + {t('dataLoading.loadPlan.loadedCount', { count: 1, defaultValue: '✓ Loaded' })} - )} - {preview.sampleRows.length > 0 && ( - expanded - ? - : - )} + ) : onLoad ? ( + + ) : null} - - {/* Load button next to pill */} - {onLoad && !confirmed && ( - - )} - - - {/* Collapsible table rows */} - - {preview.sampleRows.length > 0 && ( - - - - )} - + )} ); }; @@ -301,7 +275,10 @@ const CodeBlockView: React.FC<{ block: CodeExecution }> = ({ block }) => { {t('dataLoading.ranPythonCode')} - {block.error && } + {block.error + ? + : + } {expanded ? : } @@ -322,10 +299,10 @@ const CodeBlockView: React.FC<{ block: CodeExecution }> = ({ block }) => { )} {block.error && ( - + {block.error} @@ -354,33 +331,7 @@ const ChatBubble: React.FC<{ const handleLoadTable = async (pending: PendingTableLoad) => { const unique = getUniqueTableName(pending.name, existingNames); try { - if (pending.sampleDataset) { - const ds = pending.sampleDataset; - for (const tableInfo of ds.tables) { - const res = await fetch(tableInfo.tableUrl); - const textData = await res.text(); - const tableName = tableInfo.tableUrl.split('/').pop()?.split('.')[0]?.split('?')[0] || unique; - let dictTable; - if (tableInfo.format === 'csv' || tableInfo.format === 'tsv') { - dictTable = createTableFromText(tableName, textData); - } else { - dictTable = createTableFromFromObjectArray(tableName, JSON.parse(textData), true); - } - if (dictTable) { - if (ds.live) { - dictTable.source = { - type: 'stream', url: tableInfo.tableUrl, - autoRefresh: true, refreshIntervalSeconds: ds.refreshIntervalSeconds || 60, - lastRefreshed: Date.now(), - }; - } else { - dictTable.source = { type: 'example', url: tableInfo.tableUrl }; - } - await dispatch(loadTable({ table: dictTable })); - } - } - 
dispatch(dfActions.confirmTableLoad({ messageId: message.id, tableName: pending.name })); - } else if (pending.csvScratchPath) { + if (pending.csvScratchPath) { const scratchUrl = `${getUrls().SCRATCH_BASE_URL}/${pending.csvScratchPath.replace(/^scratch\//, '')}`; const res = await fetchWithIdentity(scratchUrl); if (!res.ok) throw new Error(`Failed to fetch: ${res.status}`); @@ -415,10 +366,26 @@ const ChatBubble: React.FC<{ sx={{ maxWidth: '100%', maxHeight: 160, borderRadius: 1, objectFit: 'contain' }} /> ))} - {/* File attachments */} + {/* File attachments — match the muted chip style used + in the input area before send, so visual identity + carries through from compose to history. */} {message.attachments?.filter(a => a.type !== 'image').map((att, i) => ( - } - sx={{ mb: 0.5, mr: 0.5, fontSize: 11 }} /> + + + + {att.name} + + ))} {message.content && ( @@ -447,30 +414,23 @@ const ChatBubble: React.FC<{ {message.content && } {message.codeBlocks?.map((block, i) => )} {message.tables?.map((table, i) => )} - {message.pendingLoads?.map((pending, i) => ( - 1) ? (() => handleLoadTable(pending)) : undefined} /> - ))} + {message.pendingLoads && message.pendingLoads.length > 0 && ( + + )} - {/* Prominent load button at bottom — always shown when there are unloaded tables */} - {message.pendingLoads && message.pendingLoads.some(p => !p.confirmed) && ( - - + {/* Load plan card — Agent-proposed multi-table import. + The plan's reasoning is rendered *above* the card as + plain assistant text so it reads as a continuation of + the agent's voice rather than a callout inside the + card's visual container. */} + {message.loadPlan?.reasoning && ( + + )} - - {/* Load plan card — Agent-proposed multi-table import */} {message.loadPlan && ( = const theme = useTheme(); return ( - {content ? : null} - - {/* Tool call steps */} + {/* Tool call steps are rendered FIRST. 
Tool calls always + happen before the agent's final text, so showing them + above the text matches actual temporal order and avoids + a confusing "text first, then checkmarks below" layout. */} {toolSteps.length > 0 && ( - + {toolSteps.map((step, i) => ( = )} + {content ? : null} + {/* Bouncing dots when no tool is running and no text yet */} {toolSteps.every(s => s.status === 'done') && ( = ); }; -// --------------------------------------------------------------------------- -// Sample task list item for empty state -// --------------------------------------------------------------------------- - -const SampleTaskItem: React.FC<{ - icon: React.ReactElement; - title: string; - example: string; - onClickExample: () => void; -}> = ({ icon, title, example, onClickExample }) => { - const theme = useTheme(); - return ( - - {icon} - - - {title} - {' — '} - - - {example} - - - - ); -}; - // --------------------------------------------------------------------------- // Main chat component // --------------------------------------------------------------------------- -export const DataLoadingChat: React.FC = () => { +export interface DataLoadingChatProps { + /** + * Optional initial text to pre-fill the chat input when the component + * mounts (or when the value changes). Used by external entry points + * (e.g. landing page quick-chat box) that want to hand off a prompt + * to the agent. + */ + initialPrompt?: string; + /** + * Optional images (data URLs) to seed alongside `initialPrompt` — + * used when an external surface (e.g. landing-page agent box) has + * already collected pasted/attached images and is handing them off. + */ + initialImages?: string[]; + /** + * If true, automatically send the `initialPrompt` once on mount/change. + * Otherwise the prompt is only pre-filled and the user presses Enter. 
+ */ + autoSendInitialPrompt?: boolean; +} + +export const DataLoadingChat: React.FC = ({ + initialPrompt, + initialImages, + autoSendInitialPrompt, +}) => { const theme = useTheme(); const { t } = useTranslation(); const dispatch = useDispatch(); const chatMessages = useSelector((state: DataFormulatorState) => state.dataLoadingChatMessages); const chatInProgress = useSelector((state: DataFormulatorState) => state.dataLoadingChatInProgress); + // External reset signal — bumped by `clearChatMessages` (manual reset + // button, new menu-level query, full session reset). When it changes + // we abort any in-flight stream, drop partial UI state, and re-seed + // from props if the parent provided a new prompt/images. Without + // this, an in-flight stream's eventual dispatches would leak into + // the freshly-cleared thread. + const chatResetCounter = useSelector((state: DataFormulatorState) => state.dataLoadingChatResetCounter ?? 0); const existingTables = useSelector((state: DataFormulatorState) => state.tables); const activeModel = useSelector(dfSelectors.getActiveModel); const frontendRowLimit = useSelector((state: DataFormulatorState) => state.config?.frontendRowLimit ?? 2_000_000); @@ -671,13 +631,19 @@ export const DataLoadingChat: React.FC = () => { const [prompt, setPrompt] = useState(''); const [userImages, setUserImages] = useState([]); + const [userAttachments, setUserAttachments] = useState([]); const [streamingContent, setStreamingContent] = useState(''); const [streamingToolSteps, setStreamingToolSteps] = useState([]); const [debugEvents, setDebugEvents] = useState([]); const [showDebugPanel] = useState(false); const abortControllerRef = useRef(null); + // Monotonic session token. Bumped on every external reset; the + // currently-running `sendMessage` captures the value at the time + // it started and discards any state/dispatch updates if the token + // has moved on (i.e. the user reset / restarted the chat mid-stream). 
+ const sessionRef = useRef(0); + const lastResetRef = useRef(chatResetCounter); const messagesEndRef = useRef(null); - const fileInputRef = useRef(null); const inputRef = useRef(null); // Auto-scroll to bottom @@ -688,85 +654,140 @@ export const DataLoadingChat: React.FC = () => { // Auto-focus input useEffect(() => { inputRef.current?.focus(); }, []); - const canSend = (prompt.trim().length > 0 || userImages.length > 0) && !chatInProgress; + // ---- External initial prompt handling ------------------------------- + // Pre-fill the input (and optionally auto-send) when `initialPrompt` + // is provided. Used by external surfaces (e.g. landing-page quick chat + // box) to hand off text to the agent. Auto-send only fires for a + // fresh conversation — we never auto-resend on remount mid-chat. + const hasExistingMessages = chatMessages.length > 0; + const [pendingAutoSend, setPendingAutoSend] = useState(false); + useEffect(() => { + // Detect external reset: abort, invalidate in-flight session, + // and clear all local UI state before re-seeding. Including + // `chatResetCounter` in the dep list also guarantees that an + // identical-prompt re-submission (same `initialPrompt` string) + // still triggers a fresh auto-send — otherwise the deps would + // be unchanged and the effect would skip. 
+ const isReset = chatResetCounter !== lastResetRef.current; + if (isReset) { + lastResetRef.current = chatResetCounter; + sessionRef.current += 1; + abortControllerRef.current?.abort(); + abortControllerRef.current = null; + setStreamingContent(''); + setStreamingToolSteps([]); + setPrompt(''); + setUserImages([]); + setUserAttachments([]); + setPendingAutoSend(false); + } - // ---- Paste handler (images + text) ---- - const handlePaste = (e: React.ClipboardEvent) => { - if (e.clipboardData?.files?.length) { - const imageFiles = Array.from(e.clipboardData.files).filter(f => f.type.startsWith('image/')); - if (imageFiles.length > 0) { - e.preventDefault(); - imageFiles.forEach(file => { - const reader = new FileReader(); - reader.onload = () => { - if (reader.result) setUserImages(prev => [...prev, reader.result as string]); - }; - reader.readAsDataURL(file); - }); + // Extract `[Uploaded: name]` mentions from the seeded prompt and + // surface them as chips. The mention template is locale-aware, + // so we build the regex from the current i18n value rather than + // hard-coding the English form. 
+ const mentionTemplate = t('dataLoading.uploaded', { name: '__DF_NAME__' }); + const mentionPattern = mentionTemplate + .replace(/[.*+?^${}()|[\]\\]/g, '\\$&') + .replace('__DF_NAME__', '(.+?)'); + const mentionRegex = new RegExp(mentionPattern, 'g'); + let seededPrompt = initialPrompt || ''; + const extractedNames: string[] = []; + if (seededPrompt) { + let match: RegExpExecArray | null; + while ((match = mentionRegex.exec(seededPrompt)) !== null) { + extractedNames.push(match[1]); + } + if (extractedNames.length > 0) { + seededPrompt = seededPrompt + .replace(new RegExp(`\\n?${mentionPattern}`, 'g'), '') + .trim(); } } - }; - // ---- File upload handler ---- - const handleFileUpload = (e: React.ChangeEvent) => { - const file = e.target.files?.[0]; - if (!file) return; - if (file.type.startsWith('image/')) { - const reader = new FileReader(); - reader.onload = () => { - if (reader.result) setUserImages(prev => [...prev, reader.result as string]); - }; - reader.readAsDataURL(file); - } else { - const formData = new FormData(); - formData.append('file', file); - apiRequest(getUrls().SCRATCH_UPLOAD_URL, { - method: 'POST', body: formData, - }).then(() => { - setPrompt(prev => prev + (prev ? '\n' : '') + t('dataLoading.uploaded', { name: file.name })); - }).catch(err => console.error('Upload failed:', err)); + const hasText = seededPrompt.trim().length > 0; + const hasImages = !!initialImages && initialImages.length > 0; + const hasAttachments = extractedNames.length > 0; + // Skip re-seeding the input on a user-initiated reset — the + // reset is meant to restore a clean slate, not re-populate the + // input with the prompt the user just cleared. + if (!isReset) { + if (hasText) setPrompt(seededPrompt); + if (hasAttachments) setUserAttachments(extractedNames); + if (hasImages) { + // Always replace, never append. The prop is a "seed" — each + // change represents a fresh handoff from the parent, not an + // additive update. 
Appending caused the same image to stack + // up every time the parent re-rendered with a new array ref. + setUserImages([...initialImages!]); + } } - if (fileInputRef.current) fileInputRef.current.value = ''; - }; + // Auto-send only on a genuinely fresh open (no prior messages, + // and not a user-initiated reset). Resetting means the user wants + // a clean slate — re-running the seeded prompt against their will + // would defeat the purpose of the reset button. + if (autoSendInitialPrompt && !isReset && (hasText || hasImages || hasAttachments) && !hasExistingMessages) { + setPendingAutoSend(true); + } + // eslint-disable-next-line react-hooks/exhaustive-deps + }, [initialPrompt, initialImages, autoSendInitialPrompt, chatResetCounter]); const stopGeneration = () => { abortControllerRef.current?.abort(); }; // ---- Send message ---- const sendMessage = useCallback(() => { const text = prompt.trim(); - if (!text && userImages.length === 0) return; + if (!text && userImages.length === 0 && userAttachments.length === 0) return; if (chatInProgress) return; - if (userImages.length > 0 && !checkModelSupportsImageInput(activeModel)) { - dispatch(dfActions.addMessages({ - timestamp: Date.now(), - type: 'warning', - component: t('dataLoading.title'), - value: t('dataLoading.imageModelUnsupported'), - })); - return; - } - - const attachments: ChatAttachment[] = userImages.map((url, i) => ({ + const imageAttachments: ChatAttachment[] = userImages.map((url, i) => ({ type: 'image' as const, name: `image-${i + 1}`, url, })); + const fileAttachments: ChatAttachment[] = userAttachments.map(name => ({ + type: 'file' as const, name, + })); + const attachments: ChatAttachment[] = [...imageAttachments, ...fileAttachments]; + + // The visible bubble keeps the user's original text plus file + // chips (rendered from `attachments`). The agent payload below + // re-injects `[Uploaded: name]` mentions so the backend still + // sees the file references inline. 
+ const displayText = text || (userImages.length > 0 ? t('dataLoading.defaultImageMessage') : ''); const userMsg: ChatMessage = { id: `msg-${Date.now()}-user`, role: 'user', - content: text || (userImages.length > 0 ? t('dataLoading.defaultImageMessage') : ''), + content: displayText, attachments: attachments.length > 0 ? attachments : undefined, timestamp: Date.now(), }; + // Capture the session token at send-time so that, if the user + // resets the chat mid-stream, post-await dispatches below can + // detect they are stale and bail without polluting the fresh + // (now-cleared) thread. + const mySession = sessionRef.current; + const isCurrent = () => mySession === sessionRef.current; + dispatch(dfActions.addChatMessage(userMsg)); dispatch(dfActions.setDataLoadingChatInProgress(true)); setPrompt(''); setUserImages([]); + setUserAttachments([]); setStreamingContent(''); setStreamingToolSteps([]); - const allMessages = [...chatMessages, userMsg].map(m => ({ - role: m.role, content: m.content, attachments: m.attachments, - })); + const allMessages = [...chatMessages, userMsg].map(m => { + // Re-hydrate `[Uploaded: name]` mentions from file attachments + // so the backend still sees them as text references, while + // the chat UI shows clean text + chips. + const fileNames = (m.attachments || []) + .filter(a => a.type === 'file' || a.type === 'text_file') + .map(a => a.name); + const mentions = fileNames.map(name => t('dataLoading.uploaded', { name })).join('\n'); + const augmented = mentions + ? (m.content ? 
`${m.content}\n${mentions}` : mentions) + : m.content; + return { role: m.role, content: augmented, attachments: m.attachments }; + }); const controller = new AbortController(); abortControllerRef.current = controller; @@ -799,28 +820,6 @@ export const DataLoadingChat: React.FC = () => { csvScratchPath: action.csv_scratch_path || '', preview, confirmed: false, }); - } else if (action.type === 'load_sample_dataset') { - const dsLive = action.live; - const dsRefresh = action.refreshIntervalSeconds; - for (const tbl of (action.tables || [])) { - const tableName = tbl.table_url?.split('/').pop()?.split('.')[0]?.split('?')[0] || action.name || 'table'; - const cols = tbl.columns || (tbl.sample_rows?.[0] ? Object.keys(tbl.sample_rows[0]) : []); - const sampleRows = tbl.sample_rows || []; - const preview: InlineTablePreview = { - name: tableName, columns: cols, - sampleRows: sampleRows.slice(0, 5), - totalRows: sampleRows.length, - }; - tables.push(preview); - pendingLoads.push({ - name: tableName, csvScratchPath: '', preview, confirmed: false, - sampleDataset: { - datasetName: action.name || tableName, - tables: [{ tableUrl: tbl.table_url, format: tbl.format || 'json' }], - live: dsLive, refreshIntervalSeconds: dsRefresh, - }, - }); - } } else if (action.type === 'load_plan') { loadPlanRef = { candidates: (action.candidates || []).map((c: any) => ({ @@ -832,7 +831,8 @@ export const DataLoadingChat: React.FC = () => { filters: c.filters, sortBy: c.sort_by, sortOrder: c.sort_order, - selected: true, + resolutionError: c.resolution_error, + selected: !c.resolution_error, })), reasoning: action.reasoning, }; @@ -850,6 +850,10 @@ export const DataLoadingChat: React.FC = () => { row_limit: frontendRowLimit, }), }, controller.signal)) { + // If a reset has happened while we were awaiting, drop + // all further events on the floor. We avoid `break` so + // the underlying iterator gets a chance to clean up. 
+ if (!isCurrent()) continue; // Log all events for debug panel if (event.type !== 'text_delta') { rawEvents.push(event); @@ -908,6 +912,10 @@ export const DataLoadingChat: React.FC = () => { } } + // Stream finished. If a reset happened in the meantime, don't + // commit a final assistant message into the new thread. + if (!isCurrent()) return; + const assistantMsg: ChatMessage = { id: `msg-${Date.now()}-assistant`, role: 'assistant', content: fullText, @@ -921,6 +929,11 @@ export const DataLoadingChat: React.FC = () => { setStreamingContent(''); setStreamingToolSteps([]); } catch (error: any) { + // A reset (which calls controller.abort()) will trigger + // AbortError here. Discard everything in that case — the + // user wants a fresh thread, not the dying gasps of the + // previous one. + if (!isCurrent()) return; const partialContent = streamingContent; if (error.name === 'AbortError') { if (partialContent) { @@ -942,62 +955,47 @@ export const DataLoadingChat: React.FC = () => { setStreamingContent(''); setStreamingToolSteps([]); } finally { - dispatch(dfActions.setDataLoadingChatInProgress(false)); - abortControllerRef.current = null; + // Only clear the in-progress flag if we still own the + // session. The reset reducer already cleared it; a stale + // dispatch here would flip it back to false after a + // legitimate new stream had set it true. 
+ if (isCurrent()) { + dispatch(dfActions.setDataLoadingChatInProgress(false)); + } + if (abortControllerRef.current === controller) { + abortControllerRef.current = null; + } } })(); }, [prompt, userImages, chatInProgress, chatMessages, activeModel, existingTables, dispatch, streamingContent, t]); - const handleKeyDown = (e: React.KeyboardEvent) => { - if (e.key === 'Enter' && !e.shiftKey) { - e.preventDefault(); - if (canSend) sendMessage(); - } - }; - - const sampleTasks = [ - { - icon: , - title: t('dataLoading.examples.extractFromImage'), - example: t('dataLoading.examples.extractFromImageExample'), - action: () => { - fetch(exampleImageTable) - .then(res => res.blob()) - .then(blob => { - const reader = new FileReader(); - reader.onload = () => { - if (reader.result) { - setUserImages([reader.result as string]); - setPrompt(t('dataLoading.examples.extractFromImageExample')); - setTimeout(() => inputRef.current?.focus(), 50); - } - }; - reader.readAsDataURL(blob); - }); - }, - }, - { - icon: , - title: t('dataLoading.examples.extractFromText'), - example: t('dataLoading.examples.extractFromTextExample'), - action: () => { setPrompt(t('dataLoading.examples.extractFromTextPrompt')); setTimeout(() => inputRef.current?.focus(), 50); }, - }, - { - icon: , - title: t('dataLoading.examples.generateSynthetic'), - example: t('dataLoading.examples.generateSyntheticExample'), - action: () => { setPrompt(t('dataLoading.examples.generateSyntheticExample')); setTimeout(() => inputRef.current?.focus(), 50); }, - }, - { - icon: , - title: t('dataLoading.examples.browseSamples'), - example: t('dataLoading.examples.browseSamplesExample'), - action: () => { setPrompt(t('dataLoading.examples.browseSamplesExample')); setTimeout(() => inputRef.current?.focus(), 50); }, - }, - ]; + // Auto-send the initial prompt once it has been applied to state. 
+ useEffect(() => { + if (!pendingAutoSend) return; + if (chatInProgress) return; + if (prompt.trim().length === 0 && userImages.length === 0) return; + setPendingAutoSend(false); + sendMessage(); + }, [pendingAutoSend, prompt, userImages, chatInProgress, sendMessage]); + + // Reuse the shared sample-task list so this in-session panel stays in + // sync with the upload-dialog entry point (`UnifiedDataUploadDialog`). + const focusSuggestions = React.useMemo(() => buildDataLoadingSuggestions({ + t, + setInput: setPrompt, + setImages: setUserImages, + setAttachments: setUserAttachments, + }), [t]); const isEmpty = chatMessages.length === 0 && !streamingContent; + const capabilities = [ + { icon: , text: t('dataLoading.capabilityAsk') }, + { icon: , text: t('dataLoading.capabilitySearch') }, + { icon: , text: t('dataLoading.capabilityExtractImage') }, + { icon: , text: t('dataLoading.capabilityExtractFile') }, + ]; + return ( { ...(isEmpty ? { flex: 1, justifyContent: 'center', alignItems: 'center' } : {}), }}> {isEmpty ? 
( - + {t('dataLoading.title')} - + {t('dataLoading.subtitle')} - - {sampleTasks.map((task, i) => ( - + + {capabilities.map((cap, i) => ( + + + {cap.icon} + + + {cap.text} + + ))} + + {t('dataLoading.capabilityHint')} + ) : ( <> @@ -1044,104 +1059,37 @@ export const DataLoadingChat: React.FC = () => { {/* ── Input area ─────────────────────────────────────── */} - - - {/* Image previews inside the input box */} - {userImages.length > 0 && ( - - {userImages.map((img, i) => ( - - - setUserImages(prev => prev.filter((_, idx) => idx !== i))} - sx={{ - position: 'absolute', top: -4, right: -4, - width: 18, height: 18, - bgcolor: 'rgba(0,0,0,0.55)', color: 'white', - '&:hover': { bgcolor: 'rgba(0,0,0,0.75)' }, - }}> - - - - ))} - - )} - - {/* Text input row */} - - - - fileInputRef.current?.click()} - disabled={chatInProgress} - sx={{ mb: 0.25, color: 'text.secondary' }}> - - - - - setPrompt(e.target.value)} - onKeyDown={handleKeyDown} - onPaste={handlePaste} - placeholder={t('dataLoading.placeholder')} - disabled={chatInProgress} - sx={{ flex: 1, px: 1, py: 0.75, fontSize: 13, lineHeight: 1.5 }} - /> - - {chatInProgress ? ( - - - - - - ) : ( - - - - - - - - )} - + + { + const formData = new FormData(); + formData.append('file', file); + apiRequest(getUrls().SCRATCH_UPLOAD_URL, { + method: 'POST', body: formData, + }).then(() => { + setUserAttachments(prev => [...prev, file.name]); + }).catch(err => console.error('Upload failed:', err)); + }} + attachments={userAttachments} + onAttachmentsChange={setUserAttachments} + focusSuggestions={isEmpty ? 
focusSuggestions : undefined} + focusSuggestionsLabel={t('dataLoading.sectionTry')} + focusSuggestionsPlacement="top" + /> + + {t('dataLoading.shiftEnterHint')} + - - - {t('dataLoading.shiftEnterHint')} - - ); diff --git a/src/views/DataSourceSidebar.tsx b/src/views/DataSourceSidebar.tsx index 392e2449..a8c4715d 100644 --- a/src/views/DataSourceSidebar.tsx +++ b/src/views/DataSourceSidebar.tsx @@ -11,6 +11,7 @@ import React, { useState, useCallback, useEffect, useMemo, useRef } from 'react'; import { useSelector, useDispatch } from 'react-redux'; import { useTranslation } from 'react-i18next'; +import i18n from '../i18n'; import { Box, Typography, @@ -18,6 +19,7 @@ import { Tooltip, Collapse, CircularProgress, + Fade, Popover, Button, Dialog, @@ -49,7 +51,6 @@ import ChevronLeftIcon from '@mui/icons-material/ChevronLeft'; import ExpandMoreIcon from '@mui/icons-material/ExpandMore'; import ChevronRightIcon from '@mui/icons-material/ChevronRight'; import RefreshIcon from '@mui/icons-material/Refresh'; -import ExploreOutlinedIcon from '@mui/icons-material/ExploreOutlined'; import ContentPasteOutlinedIcon from '@mui/icons-material/ContentPasteOutlined'; import SmartToyOutlinedIcon from '@mui/icons-material/SmartToyOutlined'; import LinkOutlinedIcon from '@mui/icons-material/LinkOutlined'; @@ -71,7 +72,7 @@ import { extractErrorMessage } from '../app/errorHandler'; import { LoadableState, errorLoadable, loadingLoadable, successLoadable } from '../app/loadableState'; import { getConnectorIcon, connectorSortOrder, RelationalDBIcon } from '../icons'; import { loadTable } from '../app/tableThunks'; -import { listWorkspaces, loadWorkspace, deleteWorkspace, updateWorkspaceMeta, onWorkspaceListChanged } from '../app/workspaceService'; +import { listWorkspaces, loadWorkspace, deleteWorkspace, exportWorkspace, importWorkspace, updateWorkspaceMeta, onWorkspaceListChanged } from '../app/workspaceService'; import type { WorkspaceSummary } from '../app/workspaceService'; import { 
borderColor } from '../app/tokens'; @@ -89,8 +90,8 @@ import { ResizeHandle } from '../components/ResizeHandle'; // ─── Constants ─────────────────────────────────────────────────────────────── const RAIL_WIDTH = 40; -const DEFAULT_PANEL_WIDTH = 260; -const MIN_PANEL_WIDTH = 200; +const DEFAULT_PANEL_WIDTH = 280; +const MIN_PANEL_WIDTH = 240; const MAX_PANEL_WIDTH = 450; const SIDEBAR_WIDTH_KEY = 'df-sidebar-panel-width'; @@ -101,19 +102,19 @@ const SIDEBAR_WIDTH_KEY = 'df-sidebar-panel-width'; function formatCompactTime(iso: string | null | undefined): string { if (!iso) return ''; const then = new Date(iso); - const t = then.getTime(); - if (Number.isNaN(t)) return ''; + const ts = then.getTime(); + if (Number.isNaN(ts)) return ''; const now = Date.now(); - const diffSec = Math.max(0, Math.round((now - t) / 1000)); - if (diffSec < 60) return 'just now'; + const diffSec = Math.max(0, Math.round((now - ts) / 1000)); + if (diffSec < 60) return i18n.t('sidebar.timeJustNow'); const diffMin = Math.round(diffSec / 60); - if (diffMin < 60) return `${diffMin}m`; + if (diffMin < 60) return i18n.t('sidebar.timeMinutes', { count: diffMin }); const diffHr = Math.round(diffMin / 60); - if (diffHr < 24) return `${diffHr}h`; + if (diffHr < 24) return i18n.t('sidebar.timeHours', { count: diffHr }); const startOfToday = new Date(); startOfToday.setHours(0, 0, 0, 0); const diffDays = Math.floor((startOfToday.getTime() - then.getTime()) / 86400000) + 1; - if (diffDays === 1) return 'yesterday'; - if (diffDays < 7) return `${diffDays}d`; + if (diffDays === 1) return i18n.t('sidebar.timeYesterday'); + if (diffDays < 7) return i18n.t('sidebar.timeDays', { count: diffDays }); const sameYear = then.getFullYear() === new Date().getFullYear(); return then.toLocaleDateString(undefined, sameYear ? 
{ month: 'short', day: 'numeric' } @@ -154,13 +155,11 @@ export const DataSourceSidebar: React.FC<{ const toggle = () => dispatch(dfActions.setDataSourceSidebarOpen(!isOpen)); - // When connectors are disabled (browser-only / hosted mode) we land - // users on 'upload' instead of 'sources' so they don't open the panel - // straight onto an upgrade message — but the sources tab remains - // available so users can learn what local install unlocks. - const [initialTab, setInitialTab] = useState<'upload' | 'sources' | 'sessions' | 'knowledge'>( - disableConnectors ? 'upload' : 'sources', - ); + // Default landing tab is 'sources' — even in browser-only mode the + // built-in sample_datasets connector is shown there, giving users + // something useful to explore immediately. The upgrade message only + // appears when they try to add a new connector or link a folder. + const [initialTab, setInitialTab] = useState<'upload' | 'sources' | 'sessions' | 'knowledge'>('sources'); // External callers (e.g. SaveExperienceButton on success) can ask the // sidebar to open and switch to a specific tab. @@ -492,6 +491,9 @@ const DataSourceSidebarPanel: React.FC<{ setSessions(prev => prev.map(s => (s.id === id ? 
{ ...s, display_name: next } : s)), ); + if (activeWorkspace?.id === id) { + dispatch(dfActions.setActiveWorkspace({ id, displayName: next })); + } cancelRenameSession(); try { await updateWorkspaceMeta(id, next); @@ -499,11 +501,55 @@ const DataSourceSidebarPanel: React.FC<{ dispatch(dfActions.addMessages({ timestamp: Date.now(), type: 'error', component: 'data-source-sidebar', - value: 'Failed to rename session', + value: t('sidebar.failedRenameSession'), })); refreshSessions(); } - }, [renamingSession, renameSessionDraft, sessions, cancelRenameSession, dispatch, refreshSessions]); + }, [renamingSession, renameSessionDraft, sessions, activeWorkspace, cancelRenameSession, dispatch, refreshSessions]); + + const handleExportSession = useCallback(async (id: string, displayName: string) => { + try { + const blob = await exportWorkspace(id); + const a = document.createElement('a'); + a.href = URL.createObjectURL(blob); + a.download = `${displayName || id}.zip`; + a.click(); + URL.revokeObjectURL(a.href); + } catch (e) { + dispatch(dfActions.addMessages({ + timestamp: Date.now(), type: 'error', + component: 'data-source-sidebar', + value: t('sidebar.exportFailed'), + })); + } + }, [dispatch, t]); + + const importRef = useRef(null); + const handleImportWorkspace = useCallback(async (event: React.ChangeEvent) => { + const file = event.target.files?.[0]; + if (!file) return; + dispatch(dfActions.setSessionLoading({ loading: true, label: t('workspace.importingFile', { name: file.name }) })); + try { + const wsName = file.name.replace(/\.zip$/, '') || 'imported'; + const now = new Date(); + const date = `${now.getFullYear()}${String(now.getMonth() + 1).padStart(2, '0')}${String(now.getDate()).padStart(2, '0')}`; + const time = `${String(now.getHours()).padStart(2, '0')}${String(now.getMinutes()).padStart(2, '0')}${String(now.getSeconds()).padStart(2, '0')}`; + const short = generateUUID().slice(0, 4); + const wsId = `session_${date}_${time}_${short}`; + const state = await 
importWorkspace(file, wsId, wsName); + const restoredName = (state as any).activeWorkspace?.displayName || wsName; + dispatch(dfActions.loadState({ ...state, activeWorkspace: { id: wsId, displayName: restoredName } })); + } catch (e) { + console.warn('Failed to import workspace:', e); + dispatch(dfActions.addMessages({ + timestamp: Date.now(), type: 'error', + component: 'data-source-sidebar', + value: t('sidebar.importFailed'), + })); + } + dispatch(dfActions.setSessionLoading({ loading: false })); + if (importRef.current) importRef.current.value = ''; + }, [dispatch, t]); useEffect(() => { refreshSessions(); @@ -527,17 +573,18 @@ const DataSourceSidebarPanel: React.FC<{ return parts.length > 0 ? parts.join(' · ') : t('sidebar.clickToOpen'); }, [t]); - const handleOpenSession = useCallback(async (sessionId: string) => { + const handleOpenSession = useCallback(async (sessionId: string, metaDisplayName?: string) => { dispatch(dfActions.setSessionLoading({ loading: true, label: t('sidebar.openingWorkspace') })); try { const result = await loadWorkspace(sessionId); if (result && Object.keys(result.state).length > 0) { - dispatch(dfActions.loadState({ ...result.state, activeWorkspace: { id: sessionId, displayName: result.displayName } })); + const displayName = metaDisplayName || result.displayName; + dispatch(dfActions.loadState({ ...result.state, activeWorkspace: { id: sessionId, displayName } })); } else { - dispatch(dfActions.setActiveWorkspace({ id: sessionId, displayName: 'Untitled Session' })); + dispatch(dfActions.setActiveWorkspace({ id: sessionId, displayName: metaDisplayName || 'Untitled Session' })); } } catch { - dispatch(dfActions.setActiveWorkspace({ id: sessionId, displayName: 'Untitled Session' })); + dispatch(dfActions.setActiveWorkspace({ id: sessionId, displayName: metaDisplayName || 'Untitled Session' })); } dispatch(dfActions.setSessionLoading({ loading: false })); }, [dispatch]); @@ -553,7 +600,7 @@ const DataSourceSidebarPanel: React.FC<{ const 
deletedIndex = prev.findIndex(s => s.id === sessionId); const nextSession = updated[Math.min(deletedIndex, updated.length - 1)]; if (nextSession) { - handleOpenSession(nextSession.id); + handleOpenSession(nextSession.id, nextSession.display_name); } else { // No sessions left — start fresh const now = new Date(); @@ -585,7 +632,9 @@ const DataSourceSidebarPanel: React.FC<{ // ── Connector list ─────────────────────────────────────────────────────── const fetchConnectors = useCallback(() => { - if (disableConnectors) return; + // Even when external connectors are disabled, the backend still + // exposes the built-in `sample_datasets` connector — fetch the + // list so users in browser-only mode still see example datasets. setLoadingConnectors(true); apiRequest(CONNECTOR_URLS.LIST, { method: 'GET' }) .then(({ data }) => { @@ -594,7 +643,7 @@ const DataSourceSidebarPanel: React.FC<{ }) .catch(() => { /* connector list is best-effort */ }) .finally(() => setLoadingConnectors(false)); - }, [disableConnectors]); + }, []); // Fetch on mount and whenever identity changes. useEffect(() => { @@ -870,11 +919,16 @@ const DataSourceSidebarPanel: React.FC<{ }); }, [fetchCatalogTree]); - // Auto-expand the first connected connector on panel open so the sidebar - // isn't an all-collapsed list. Tracked per identity/refresh so a user - // collapse stays collapsed across re-renders within the same panel mount. - // The panel re-mounts when the sidebar is reopened, which naturally - // re-triggers this effect. + // Auto-expand only when there's a single connected connector — for a + // fresh user that's just the built-in sample_datasets, so the sidebar + // isn't an empty-looking collapsed list. Once the user has added their + // own connectors, we leave everything collapsed; expansion then happens + // only via explicit user click or a focus handoff (e.g. from the upload + // dialog / front page). 
+ // + // Tracked per identity/refresh so a user collapse stays collapsed across + // re-renders within the same panel mount. The panel re-mounts when the + // sidebar is reopened, which naturally re-triggers this effect. // // Suppressed when an explicit focus request is pending — otherwise that // focus would compete with an unrelated expansion. @@ -886,12 +940,13 @@ const DataSourceSidebarPanel: React.FC<{ const key = `${identityKey}:${connectorRefreshKey}`; if (autoExpandedRef.current === key) return; if (focusedConnectorId) return; - const first = sortedConnectors.find(c => c.connected); - if (!first) return; + const connected = sortedConnectors.filter(c => c.connected); + if (connected.length !== 1) return; + const only = connected[0]; autoExpandedRef.current = key; - setExpandedConnectorId(prev => prev ?? first.id); - if (!catalogCacheRef.current[first.id]) { - fetchCatalogTree(first.id); + setExpandedConnectorId(prev => prev ?? only.id); + if (!catalogCacheRef.current[only.id]) { + fetchCatalogTree(only.id); } }, [sortedConnectors, identityKey, connectorRefreshKey, fetchCatalogTree, focusedConnectorId]); @@ -933,6 +988,29 @@ const DataSourceSidebarPanel: React.FC<{ const ref = buildSourceTableRef(node); const nodeMeta = node.metadata || {}; + // Fast path: when the catalog node already carries an embedded + // preview (columns + sample_rows in metadata, as the sample-datasets + // connector emits via list_tables), skip the network round-trip and + // render the popover instantly. The real data is only fetched when + // the user clicks "Load Table". rowCount is intentionally left + // null — for embedded previews we don't know the true total without + // downloading the URL, and the preview UI handles that gracefully. + const embeddedSampleRows = Array.isArray(nodeMeta.sample_rows) ? nodeMeta.sample_rows : null; + const embeddedColumns = Array.isArray(nodeMeta.columns) ? 
nodeMeta.columns : null; + if (embeddedSampleRows && embeddedSampleRows.length > 0 && embeddedColumns && embeddedColumns.length > 0) { + setPreview({ + connectorId, + node, + columns: embeddedColumns as any, + sampleRows: embeddedSampleRows, + rowCount: nodeMeta.row_count ?? null, + tableDescription: nodeMeta.source_description || nodeMeta.description, + loading: false, + }); + setPreviewAnchor(anchorEl); + return; + } + setPreview({ connectorId, node, @@ -958,11 +1036,22 @@ const DataSourceSidebarPanel: React.FC<{ setPreview(prev => { if (!prev) return null; const newCols = (data.columns as typeof prev.columns); + const sampleLen = (data.rows || []).length; + // Only treat `total_row_count` as authoritative when + // it's strictly greater than the returned sample, or + // when the sample is short of the preview cap (10) — + // both indicate the loader actually knows the total + // rather than falling back to `len(rows)`. Otherwise + // keep whatever the catalog metadata already gave us. + const total = data.total_row_count; + const totalReliable = + total != null && + (total > sampleLen || sampleLen < 10); return { ...prev, columns: newCols.length > 0 ? newCols : prev.columns, sampleRows: data.rows || [], - rowCount: data.total_row_count ?? prev.rowCount, + rowCount: totalReliable ? total : prev.rowCount, tableDescription: data.description ?? 
prev.tableDescription, loading: false, }; @@ -1218,7 +1307,6 @@ const DataSourceSidebarPanel: React.FC<{ {[ - { icon: , label: t('upload.sampleDatasets', { defaultValue: 'Sample datasets' }), tab: 'explore' }, { icon: , label: t('upload.uploadFile', { defaultValue: 'Upload file' }), tab: 'upload' }, { icon: , label: t('upload.pasteData', { defaultValue: 'Paste data' }), tab: 'paste' }, { icon: , label: t('upload.extractData', { defaultValue: 'Data Assistant' }), tab: 'extract' }, @@ -1237,29 +1325,12 @@ const DataSourceSidebarPanel: React.FC<{ )} - {/* ── Data Connectors tab ── */} - {activeTab === 'sources' && disableConnectors && ( - - - - {t('sidebar.dataConnectorsTitle', { defaultValue: 'Data Connectors' })} - - - - - - - - - - - - )} - - {/* ── Data Connectors tab ── */} - {activeTab === 'sources' && !disableConnectors && ( + {/* ── Data Connectors tab ── + Sample datasets remain available even when external + connectors are disabled; the Add Connector / Link Folder + actions route through the upload dialog, which renders + the LocalInstallUpgradePanel in disabled mode. */} + {activeTab === 'sources' && ( - {/* Source header */} + {/* Source header — uses the same chevron-in-gutter + layout primitives as the catalog tree below, so + the whole sidebar reads as one continuous tree. + Layout (matches VirtualizedCatalogTree): + LEADING_PAD(6) | chevron(12) | GAP(4) | icon … + The catalog wrapper below then offsets its tree + by exactly one INDENT_PER_LEVEL (12px), so the + first catalog row's chevron sits one step right + of the connector header's chevron. */} { - if (connector.connected) { + // No-auth connectors (auth_mode = 'none') + // are always available — clicking the + // header just toggles expansion, never + // opens a credentials dialog. 
+ const isAlwaysOn = connector.auth_mode === 'none'; + if (connector.connected || isAlwaysOn) { toggleSource(connector.id); } else { // Not connected — open config dialog for this connector @@ -1405,10 +1489,12 @@ const DataSourceSidebarPanel: React.FC<{ } }} sx={{ + position: 'relative', display: 'flex', alignItems: 'center', - gap: 0.75, - px: 1.5, + gap: 0.5, + pl: '22px', // LEADING_PAD(6) + CHEVRON_WIDTH(12) + CHEVRON_GAP(4) + pr: 0.5, py: 0.75, cursor: 'pointer', '&:hover': { bgcolor: 'action.hover' }, @@ -1416,18 +1502,39 @@ const DataSourceSidebarPanel: React.FC<{ userSelect: 'none', }} > - {connector.connected - ? (isExpanded - ? - : ) - : - } + + {(connector.connected || connector.auth_mode === 'none') && isExpanded + ? + : } + {getConnectorIcon(connector.icon || connector.source_type, { sx: { fontSize: 16, opacity: 0.7 } })} - - + {/* Status dot — green for live connections + and for always-on built-ins (which are + ready by definition), warning for + disconnected. */} + + {connector.display_name} - {connector.connected && ( + {(connector.connected || connector.auth_mode === 'none') && ( + )} {connector.deletable ? ( - ) : connector.connected && ( + ) : connector.connected && connector.auth_mode !== 'none' && ( /* Admin connector: surface Disconnect in place of Delete. Only meaningful when there's an active session/credentials to clear; if already disconnected, "Edit connection" is - the path to re-authenticate. */ + the path to re-authenticate. + No-auth connectors have nothing to disconnect. */ {/* Catalog tree — only for connected sources. - Wrapper pl is intentionally one indent step (8px) - deeper than the connector header's `px: 1.5`, - so the first-level namespace icon sits one - step right of the database icon — same step as - namespace → leaf. Keeps the icon column visually - uniform between connector → namespace → leaf. */} + Aligned at the chevron column (the row's + primary structural column). 
The connector's + type icon (folder / cylinder / etc.) is treated + as part of the connector's name decoration, not + a separate structural column — so depth-0 rows + sit directly under the connector's chevron. + + connector chevron col @ x = 6 + catalog depth 0 slot @ x = 6 (same column) + catalog depth 1 slot @ x = 18 (+12) + catalog depth 2 slot @ x = 30 (+12) */} {connector.connected && ( - + {!displayCache && isLoading && ( @@ -1627,18 +1744,18 @@ const DataSourceSidebarPanel: React.FC<{ }} renderValue={(v) => { const labels: Record = { - created_desc: 'newest', - created_asc: 'oldest', - updated_desc: 'recently modified', - name_asc: 'name', + created_desc: t('sidebar.sortNewest'), + created_asc: t('sidebar.sortOldest'), + updated_desc: t('sidebar.sortRecentlyModified'), + name_asc: t('sidebar.sortName'), }; return labels[v as SessionSortKey]; }} > - newest first - oldest first - recently modified - name (a–z) + {t('sidebar.sortNewestFirst')} + {t('sidebar.sortOldestFirst')} + {t('sidebar.sortRecentlyModifiedFirst')} + {t('sidebar.sortNameAsc')} @@ -1648,31 +1765,38 @@ const DataSourceSidebarPanel: React.FC<{ - {/* New session action */} - { - const now = new Date(); - const date = `${now.getFullYear()}${String(now.getMonth() + 1).padStart(2, '0')}${String(now.getDate()).padStart(2, '0')}`; - const time = `${String(now.getHours()).padStart(2, '0')}${String(now.getMinutes()).padStart(2, '0')}${String(now.getSeconds()).padStart(2, '0')}`; - const short = generateUUID().slice(0, 4); - const wsId = `session_${date}_${time}_${short}`; - dispatch(dfActions.loadState({ tables: [], charts: [], draftNodes: [], conceptShelfItems: [], activeWorkspace: { id: wsId, displayName: 'Untitled Session' } })); - }} - sx={{ - display: 'flex', - alignItems: 'center', - gap: 0.75, - px: 1.5, - py: 0.75, - cursor: 'pointer', - '&:hover': { bgcolor: 'action.hover' }, - userSelect: 'none', - }} - > - - - {t('sidebar.newSession', { defaultValue: 'New session' })} - + {/* New session + 
import actions */} + + { + const now = new Date(); + const date = `${now.getFullYear()}${String(now.getMonth() + 1).padStart(2, '0')}${String(now.getDate()).padStart(2, '0')}`; + const time = `${String(now.getHours()).padStart(2, '0')}${String(now.getMinutes()).padStart(2, '0')}${String(now.getSeconds()).padStart(2, '0')}`; + const short = generateUUID().slice(0, 4); + const wsId = `session_${date}_${time}_${short}`; + dispatch(dfActions.loadState({ tables: [], charts: [], draftNodes: [], conceptShelfItems: [], activeWorkspace: { id: wsId, displayName: 'Untitled Session' } })); + }} + sx={{ + display: 'flex', + alignItems: 'center', + gap: 0.75, + flex: 1, + cursor: 'pointer', + borderRadius: 0.5, + '&:hover': { bgcolor: 'action.hover' }, + }} + > + + + {t('sidebar.newSession', { defaultValue: 'New session' })} + + + + importRef.current?.click()} sx={{ p: 0.25, color: 'text.disabled', '&:hover': { color: 'text.secondary' } }}> + + + + {sessions.length === 0 ? ( @@ -1696,7 +1820,7 @@ const DataSourceSidebarPanel: React.FC<{ enterDelay={400} > { if (!isRenaming && activeWorkspace?.id !== s.id) handleOpenSession(s.id); }} + onClick={() => { if (!isRenaming && activeWorkspace?.id !== s.id) handleOpenSession(s.id, s.display_name); }} sx={{ position: 'relative', display: 'flex', @@ -1779,19 +1903,28 @@ const DataSourceSidebarPanel: React.FC<{ gap: 0.25, }} > - + { e.stopPropagation(); startRenameSession(s.id, s.display_name); }} - sx={{ p: 0.25, color: 'primary.main' }} + sx={{ p: 0.25, color: 'text.disabled', '&:hover': { color: 'primary.main' } }} > + + { e.stopPropagation(); handleExportSession(s.id, s.display_name); }} + sx={{ p: 0.25, color: 'text.disabled', '&:hover': { color: 'text.primary' } }} + > + + + handleDeleteSession(s.id, e)} - sx={{ p: 0.25, color: 'warning.main' }} + sx={{ p: 0.25, color: 'text.disabled', '&:hover': { color: 'warning.main' } }} > @@ -1832,19 +1965,24 @@ const DataSourceSidebarPanel: React.FC<{ onClose={closePreview} anchorOrigin={{ 
vertical: 'center', horizontal: 'right' }} transformOrigin={{ vertical: 'center', horizontal: 'left' }} + // Use Fade instead of the default Grow transition. Grow + // scales the paper from 75% → 100%, which reads as a + // "small → large" pop even when the paper layout is + // pixel-stable. Fade only crossfades opacity, so the + // popover appears at its final size in one shot. + slots={{ transition: Fade }} slotProps={{ + transition: { timeout: 120 } as any, paper: { sx: { - // Size to the table's natural width (DataFrameTable - // is rendered with autoWidth in the connector - // preview), bounded so very narrow tables still - // give the header room and very wide tables - // don't dominate the viewport. Users can still - // resize the popover larger via the resize handle. - width: 'auto', - minWidth: 480, - maxWidth: 'min(1100px, 75vw)', - minHeight: 300, + // Fixed width so the popover doesn't grow when + // the table renders; height is content-driven + // because the preview is capped at 10 rows and + // therefore intrinsically stable. The table area + // itself reserves space for ~10 rows during + // loading (see ConnectorTablePreview) so the + // popover opens at its final height. 
+ width: 'min(640px, 70vw)', maxHeight: '85vh', display: 'flex', flexDirection: 'column', overflow: 'hidden', resize: 'both', diff --git a/src/views/DataThread.tsx b/src/views/DataThread.tsx index fa2b0b40..ae0cc9f1 100644 --- a/src/views/DataThread.tsx +++ b/src/views/DataThread.tsx @@ -41,10 +41,11 @@ import { CATALOG_TABLE_ITEM } from '../components/DndTypes'; import type { CatalogTableDragItem } from '../components/DndTypes'; import { loadTable } from '../app/tableThunks'; import { AppDispatch } from '../app/store'; +import { deleteWorkspace } from '../app/workspaceService'; import DeleteIcon from '@mui/icons-material/Delete'; -import StarIcon from '@mui/icons-material/Star'; import PersonIcon from '@mui/icons-material/Person'; +import ForumOutlinedIcon from '@mui/icons-material/ForumOutlined'; import { TableIcon, AnchorIcon, InsightIcon, StreamIcon, AgentIcon } from '../icons'; @@ -81,6 +82,7 @@ import { AgentRulesDialog } from './AgentRulesDialog'; import CheckCircleOutlineIcon from '@mui/icons-material/CheckCircleOutline'; import SmartToyOutlinedIcon from '@mui/icons-material/SmartToyOutlined'; +import { AgentToyIcon } from './AgentToyIcon'; import AutoAwesomeIcon from '@mui/icons-material/AutoAwesome'; import ArticleIcon from '@mui/icons-material/Article'; import TerminalIcon from '@mui/icons-material/Terminal'; @@ -89,26 +91,79 @@ import WarningAmberIcon from '@mui/icons-material/WarningAmber'; import InfoOutlinedIcon from '@mui/icons-material/InfoOutlined'; import SearchIcon from '@mui/icons-material/Search'; import AutoGraphIcon from '@mui/icons-material/AutoGraph'; +import CallMergeIcon from '@mui/icons-material/CallMerge'; import { ViewBorderStyle, ComponentBorderStyle, transition, radius, borderColor } from '../app/tokens'; import { SimpleChartRecBox } from './SimpleChartRecBox'; -import { InteractionEntryCard, getEntryGutterIcon, getDefaultGutterIcon, PlanStepsView } from './InteractionEntryCard'; +import { InteractionEntryCard, 
ResolvedConversationCard, getEntryGutterIcon, getDefaultGutterIcon, PlanStepsView } from './InteractionEntryCard'; /** Pick the icon component for a step line based on known prefixes. */ // Re-exported from InteractionEntryCard — kept here for backward compat with gutter icon logic -/** Render a multi-step thinking banner as a single block with sectioned steps. */ -export const ThinkingStepsBanner = (steps: string[], sx?: SxProps) => { +/** Live elapsed-time hint in whole seconds (`5s`, `12s`). + * Ticks once per second — fast enough to read as live, slow enough that the + * digit stays readable and doesn't pull peripheral attention. Liveness is + * also conveyed by the banner's shimmer animation. + * When `startTime` is omitted, anchors to the component's mount time — + * useful for places where we don't have a meaningful upstream anchor. + * When `resetKey` changes (e.g. the active step transitions from "thinking" + * to "running code"), the anchor is reset to *now* so the timer reflects + * the duration of the **current** action rather than the cumulative wait. */ +const LiveStatus: React.FC<{ startTime?: number; resetKey?: string }> = ({ startTime, resetKey }) => { + const anchorRef = useRef(startTime ?? Date.now()); + const lastResetKeyRef = useRef(resetKey); + const lastStartTimeRef = useRef(startTime); + if (resetKey !== lastResetKeyRef.current) { + anchorRef.current = Date.now(); + } else if (startTime != null && startTime !== lastStartTimeRef.current) { + anchorRef.current = startTime; + } + lastResetKeyRef.current = resetKey; lastStartTimeRef.current = startTime; + const [now, setNow] = useState(() => Date.now()); + useEffect(() => { + const id = setInterval(() => setNow(Date.now()), 1000); + return () => clearInterval(id); + }, []); + const secs = Math.max(0, Math.floor((now - anchorRef.current) / 1000)); + const label = secs < 60 + ? `${secs}s` + : `${Math.floor(secs / 60)}m${secs % 60}s`; + return ( + + {label} + + ); +}; + +/** Render a multi-step thinking banner as a single block with sectioned steps.
+ * When `startTime` is provided, the live timer is appended *inline* next to + * the active (last) step's text — same alignment grammar as the single-line + * ThinkingBanner — rather than right-flushed in a separate column. + * The timer resets whenever the active step changes so it shows the time + * spent on the **current** action, not the cumulative wait. */ +export const ThinkingStepsBanner = (steps: string[], sx?: SxProps, startTime?: number) => { + const activeStep = steps.length > 0 ? steps[steps.length - 1] : ''; return ( - - + + : undefined} + /> ); }; /** Simple single-message thinking banner (used when no step breakdown is available). */ -export const ThinkingBanner = (message: string, sx?: SxProps, active: boolean = true) => { +export const ThinkingBanner = (message: string, sx?: SxProps, active: boolean = true, showTimer: boolean = false, startTime?: number) => { return ( {message} + {showTimer && } ); }; @@ -782,6 +838,7 @@ let SingleThreadGroupView: FC<{ }) { let tables = useSelector((state: DataFormulatorState) => state.tables); + const activeWorkspace = useSelector((state: DataFormulatorState) => state.activeWorkspace); const { t } = useTranslation(); const { manualRefresh } = useDataRefresh(); const tableById = useMemo(() => new Map(tables.map(t => [t.id, t])), [tables]); @@ -847,10 +904,11 @@ let SingleThreadGroupView: FC<{ const ids = new Map(); for (const d of draftNodes) { if (d.derive?.status === 'clarifying') { - // The pause entry is either 'clarify' or 'explain'; both shape - // the timeline the same way. + // The pause entry is one of clarify / explain / + // delegate; all three shape the timeline the + // same way (an attention row above the input box). 
const pauseEntry = d.derive.trigger.interaction - ?.filter(e => e.role === 'clarify' || e.role === 'explain').pop(); + ?.filter(e => e.role === 'clarify' || e.role === 'explain' || e.role === 'delegate').pop(); ids.set(d.derive.trigger.tableId, { question: pauseEntry?.content || '' }); } } @@ -1119,9 +1177,44 @@ let SingleThreadGroupView: FC<{ }); // Build a flat sequence of timeline items: [trigger, table, charts, trigger, table, charts, ...] - type TimelineItem = { key: string; element: React.ReactNode; type: 'used-table' | 'trigger' | 'table' | 'chart' | 'leaf-trigger' | 'leaf-table' | 'report'; highlighted: boolean; tableId?: string; chartType?: string; isRunning?: boolean; isClarifying?: boolean; isCompleted?: boolean; interactionEntry?: InteractionEntry; reportId?: string; stepLabel?: string }; + type TimelineItem = { key: string; element: React.ReactNode; type: 'used-table' | 'trigger' | 'table' | 'chart' | 'leaf-trigger' | 'leaf-table' | 'report' | 'merge'; highlighted: boolean; tableId?: string; chartType?: string; isRunning?: boolean; isClarifying?: boolean; isCompleted?: boolean; interactionEntry?: InteractionEntry; reportId?: string; stepLabel?: string; gutterIcon?: React.ReactNode }; let timelineItems: TimelineItem[] = []; + // Each running/clarifying draft should produce at most ONE banner per + // render pass. The same draft can be reachable from multiple + // pushAgentDraftItems call sites (the trigger-table loop *and* the + // leaf-table loop both call it for whichever tableId they're rendering), + // and after a `visualize` event the draft's `trigger.tableId` flips to + // the freshly-created child — which is then visited again as a leaf, + // so without deduping we get a duplicate "working..." banner. + const renderedDraftIds = new Set(); + + // Provenance tracker: the set of source-table IDs currently in scope for + // this thread. 
A merge node is emitted whenever an instruction's input + // table set differs from this — covering joins (set grows), narrowings + // (set shrinks), and substitutions (set changes). Initialised to the + // **root computation parents** of the thread's anchor so the first + // derivation against the same roots stays silent. + // + // We compare on table IDs rather than display names: names are derived + // from `displayId || stripExt(sid)` and can drift between sides. + // + // Why "root parents" instead of `parentTable.id`: `derive.source` + // contains *root/anchored* table IDs (computation parents), while + // `parentTable` may itself be a derived intermediate. Comparing the + // intermediate's own id against an instruction's root-id source set + // would always mismatch and emit a redundant merge node on the very + // first derivation in the thread. + const sourceSetKey = (ids: string[]): string => [...ids].sort().join('\x1F'); + const initialSourceIds: string[] = (() => { + if (!parentTable) return []; + // If parentTable is a root (no derive) or anchored, it IS the source. + const src = parentTable.derive?.source as string[] | undefined; + if (!src || src.length === 0) return [parentTable.id]; + return src; + })(); + let prevSourceKey: string | null = initialSourceIds.length > 0 ? sourceSetKey(initialSourceIds) : null; + // ── Shared helpers for building timeline items from interaction entries ── /** Push visible interaction entries as timeline items. @@ -1165,7 +1258,57 @@ let SingleThreadGroupView: FC<{ ? { ...entry, inputTableNames: deriveSourceNames } : entry; - const isResolved = (entry.role === 'clarify' || entry.role === 'explain') + // ── Resolved Q&A folding ── + // When a clarify/explain/delegate has been resolved + // by a following user reply, fold the pair into a single + // compact "conversation" timeline item. Consecutive resolved + // pairs are accumulated into ONE item so a back-and-forth of + // multiple rounds collapses to one trace. 
+ const isPauseRole = entry.role === 'clarify' + || entry.role === 'explain' + || entry.role === 'delegate'; + if (isPauseRole && entry.from !== 'user') { + const pairs: { agentEntry: InteractionEntry; userEntry: InteractionEntry }[] = []; + let cursor = ei; + while (cursor < entries.length) { + const ag = entries[cursor]; + const agIsPause = ag.role === 'clarify' || ag.role === 'explain' || ag.role === 'delegate'; + if (!agIsPause || ag.from === 'user') break; + // Find the next user entry to pair with this agent question. + let userIdx = -1; + for (let j = cursor + 1; j < entries.length; j++) { + if (entries[j].from === 'user') { userIdx = j; break; } + // Stop searching if we hit another agent pause without + // an intervening user reply — that pause is still + // unresolved and shouldn't fold. + const r = entries[j].role; + if (r === 'clarify' || r === 'explain' || r === 'delegate') break; + } + if (userIdx < 0) break; + pairs.push({ agentEntry: ag, userEntry: entries[userIdx] }); + cursor = userIdx + 1; + } + if (pairs.length > 0) { + timelineItems.push({ + key: `${keyPrefix}-conv-${tableId}-${ei}`, + type: triggerType, + highlighted, + element: , + interactionEntry: pairs[pairs.length - 1].userEntry, + gutterIcon: ( + + ), + ...extraProps, + }); + ei = cursor - 1; // for-loop's ++ will advance past the last consumed user entry + continue; + } + } + + const isResolved = (entry.role === 'clarify' || entry.role === 'explain' || entry.role === 'delegate') && entries.slice(ei + 1).some(e => e.from === 'user'); timelineItems.push({ key: `${keyPrefix}-${entry.role}-${tableId}-${ei}`, @@ -1175,6 +1318,57 @@ let SingleThreadGroupView: FC<{ interactionEntry: entry, ...extraProps, }); + + // Emit a structural "merge node" between the instruction and its + // result table whenever the set of source tables CHANGES from the + // previously-active set in this thread — covers joining-in new + // sources, narrowing the set, or substituting one source for + // another. 
Repeated derivations against the same source set stay + // silent (no chrome). + // + // Compare on table IDs (from `derive.source`) for stability; + // names are only used for display. + const mergeNames = enrichedEntry.inputTableNames; + const mergeIds = derivedTable?.derive?.source as string[] | undefined; + if (entry.role === 'instruction' && mergeNames && mergeNames.length > 0 && mergeIds && mergeIds.length > 0) { + const nextKey = sourceSetKey(mergeIds); + // eslint-disable-next-line no-console + console.log('[merge-node check]', { + tableId, + parentTableId: parentTable?.id, + initialSourceIds, + prevSourceKey, + mergeIds, + mergeNames, + nextKey, + fires: nextKey !== prevSourceKey, + }); + if (nextKey !== prevSourceKey) { + const mergeColor = highlighted ? theme.palette.primary.main : theme.palette.text.secondary; + timelineItems.push({ + key: `${keyPrefix}-merge-${tableId}-${ei}`, + type: 'merge', + highlighted, + element: ( + + + {t('dataThread.usingSources')} + + {mergeNames.map((name, idx) => ( + + + + {name} + + + ))} + + ), + ...extraProps, + }); + prevSourceKey = nextKey; + } + } } }; @@ -1207,7 +1401,19 @@ let SingleThreadGroupView: FC<{ isRunning: boolean, keyPrefix: string, ) => { - const pauseIdx = interaction.findIndex(e => e.role === 'clarify' || e.role === 'explain'); + // For the live banner, anchor elapsed-time to the most recent + // user-side entry so resuming after a clarify resets the counter + // (the agent's *current* cycle started then, not the original + // prompt). Falls back to the first interaction timestamp. 
+ const lastUserTs = (() => { + for (let i = interaction.length - 1; i >= 0; i--) { + if (interaction[i].from === 'user' && interaction[i].timestamp) { + return interaction[i].timestamp as number; + } + } + return interaction[0]?.timestamp; + })(); + const pauseIdx = interaction.findIndex(e => e.role === 'clarify' || e.role === 'explain' || e.role === 'delegate'); if (pauseIdx < 0) { // No pause — render all entries then ThinkingStepsBanner pushInteractionEntries(interaction, tableId, triggerType, highlighted, keyPrefix); @@ -1217,7 +1423,7 @@ let SingleThreadGroupView: FC<{ type: triggerType, highlighted, isRunning, - element: ThinkingStepsBanner(planLines, { px: 1, py: 0.5 }), + element: ThinkingStepsBanner(planLines, { px: 1, py: 0.5 }, isRunning ? lastUserTs : undefined), }); return; } @@ -1257,13 +1463,17 @@ let SingleThreadGroupView: FC<{ type: triggerType, highlighted, isRunning: true, - element: ThinkingStepsBanner(planLines, { px: 1, py: 0.5 }), + element: ThinkingStepsBanner(planLines, { px: 1, py: 0.5 }, lastUserTs), }); } }; if (runningAgentTableIds.has(tableId)) { const runningDraft = draftNodes.find(d => d.derive?.status === 'running' && d.derive.trigger.tableId === tableId); + if (runningDraft && renderedDraftIds.has(runningDraft.id)) { + return; + } + if (runningDraft) renderedDraftIds.add(runningDraft.id); const draftInteraction = runningDraft?.derive?.trigger?.interaction; if (draftInteraction && draftInteraction.length > 0) { renderSplitByClarity( @@ -1280,11 +1490,15 @@ let SingleThreadGroupView: FC<{ type: 'chart', highlighted, isRunning: true, - element: ThinkingBanner(message, { px: 1, py: 0.5 }), + element: ThinkingBanner(message, { px: 1, py: 0.5 }, true, true), }); } } else if (clarifyAgentTableIds.has(tableId)) { const clarifyDraft = draftNodes.find(d => d.derive?.status === 'clarifying' && d.derive.trigger.tableId === tableId); + if (clarifyDraft && renderedDraftIds.has(clarifyDraft.id)) { + return; + } + if (clarifyDraft) 
renderedDraftIds.add(clarifyDraft.id); const clarifyInteraction = clarifyDraft?.derive?.trigger?.interaction; if (clarifyInteraction && clarifyInteraction.length > 0) { renderSplitByClarity( @@ -1294,7 +1508,7 @@ let SingleThreadGroupView: FC<{ 'agent-clarify-entry', ); const lastItem = timelineItems[timelineItems.length - 1]; - if (lastItem?.interactionEntry?.role === 'clarify' || lastItem?.interactionEntry?.role === 'explain') { + if (lastItem?.interactionEntry?.role === 'clarify' || lastItem?.interactionEntry?.role === 'explain' || lastItem?.interactionEntry?.role === 'delegate') { lastItem.isClarifying = true; } } else { @@ -1558,14 +1772,15 @@ let SingleThreadGroupView: FC<{ // entry card's palette. const getClarifyIcon = (item: typeof timelineItems[0]) => { const role = item.interactionEntry?.role; - const color = role === 'explain' ? theme.palette.info.main : theme.palette.warning.main; - return ; }; @@ -1661,6 +1876,7 @@ let SingleThreadGroupView: FC<{ const isTrigger = item.type === 'trigger' || item.type === 'leaf-trigger'; const isTable = item.type === 'table' || item.type === 'leaf-table' || item.type === 'used-table'; const isChart = item.type === 'chart'; + const isMerge = item.type === 'merge'; const dashedColor = item.highlighted ? alpha(theme.palette.primary.main, 0.6) : 'rgba(0,0,0,0.1)'; const dashedWidth = '2px'; const dashedStyle = 'solid'; @@ -1672,6 +1888,32 @@ let SingleThreadGroupView: FC<{ // No dimming or background — rely on timeline color + card border for highlighting const rowHighlightSx = {}; + // Merge nodes: a confluence glyph in the gutter + inline list of + // joined source tables. Communicates provenance changes (join, narrow, + // or substitute) — rendered with stronger weight than ambient chrome + // since it conveys meaningful lineage information. 
+ if (isMerge) { + return ( + + + + + + + {!isLast && } + {isLast && hasContinuationBelow && } + {isLast && !hasContinuationBelow && } + + + {item.element} + + + ); + } + // Triggers: icon based on interaction entry's `from` actor if (isTrigger) { const entry = item.interactionEntry; @@ -1700,9 +1942,11 @@ let SingleThreadGroupView: FC<{ ? getClarifyIcon(item) : item.isCompleted && item.stepLabel ? getStepIcon(item.stepLabel, iconColor) - : entry - ? getEntryGutterIcon(entry, iconColor) - : getDefaultGutterIcon(iconColor); + : item.gutterIcon + ? item.gutterIcon + : entry + ? getEntryGutterIcon(entry, iconColor) + : getDefaultGutterIcon(iconColor); // Clarification rows are clickable to bring the agent's pause // back into focus. Prefer the latest chart on the associated @@ -2028,7 +2272,27 @@ let SingleThreadGroupView: FC<{ onClick={(e) => { e.stopPropagation(); if (selectedTableForMenu) { + // If this is the last source table, also wipe the + // session itself — a workspace with no source + // table is effectively empty and the user would + // otherwise be left staring at a blank thread. + const isSource = !selectedTableForMenu.derive; + const remainingSources = tables.filter(t => !t.derive && t.id !== selectedTableForMenu.id); + const shouldDeleteSession = isSource && remainingSources.length === 0; + const wsToDelete = shouldDeleteSession ? activeWorkspace?.id : undefined; dispatch(dfActions.deleteTable(selectedTableForMenu.id)); + if (shouldDeleteSession && wsToDelete) { + (async () => { + try { + await deleteWorkspace(wsToDelete); + } catch { + // best effort — user can still manually delete from the sidebar + } + // Drop into the unsessioned landing state instead of + // auto-creating a new "Untitled Session" workspace. 
+ dispatch(dfActions.resetState()); + })(); + } } handleCloseTableMenu(); }} @@ -2072,25 +2336,11 @@ const ChartThumbnail: FC<{ table: DictTable; status: 'available' | 'pending' | 'unavailable'; onChartClick: (chartId: string, tableId: string) => void; - onDelete: (chartId: string) => void; -}> = ({ chart, table, status, onChartClick, onDelete }) => { +}> = ({ chart, table, status, onChartClick }) => { const { t } = useTranslation(); - - let deleteButton = - { event.stopPropagation(); onDelete(chart.id); }}> - - - + // Thumbnails live in a dedicated slice so updating one chart's preview + // doesn't invalidate the `charts` array reference for every consumer. + const thumbnail = useSelector(dfSelectors.getChartThumbnail(chart.id)); const pendingOverlay = status == 'pending' ? {pendingOverlay} - {deleteButton} ; } @@ -2122,13 +2371,12 @@ const ChartThumbnail: FC<{ {generateChartSkeleton(chartTemplate?.icon, 32, 32, chart.chartType == 'Table' ? 1 : 0.5)} - {deleteButton} ; } // ---- Thumbnail path: use cached PNG from ChartRenderService ---- - if (chart.thumbnail) { + if (thumbnail) { return ( onChartClick(chart.id, table.id)} @@ -2136,15 +2384,11 @@ const ChartThumbnail: FC<{ style={{ width: "100%", position: "relative", cursor: "pointer" }} > - {chart.saved && - - } {pendingOverlay} - {deleteButton} {t('dataThread.chartAlt', @@ -2171,15 +2415,11 @@ const ChartThumbnail: FC<{ style={{ width: "100%", position: "relative", cursor: "pointer" }} > - {chart.saved && - - } {pendingOverlay} - {deleteButton} = function ({ sx }) { onChartClick={() => { dispatch(dfActions.setFocused({ type: 'chart', chartId: chart.id })); }} - onDelete={() => {dispatch(dfActions.deleteChartById(chart.id))}} />; - return { chartId: chart.id, tableId: table.id, element }; + return { + chartId: chart.id, + tableId: table.id, + element, + onDelete: () => { dispatch(dfActions.deleteChartById(chart.id)); }, + deleteTooltip: t('dataThread.deleteChart'), + unread: !!chart.unread, + }; }); }, [charts, 
tables, conceptShelfItems, chartSynthesisInProgress]); diff --git a/src/views/DataThreadCards.tsx b/src/views/DataThreadCards.tsx index eb249370..76660495 100644 --- a/src/views/DataThreadCards.tsx +++ b/src/views/DataThreadCards.tsx @@ -12,6 +12,7 @@ import { Tooltip, ButtonGroup, useTheme, + alpha, } from '@mui/material'; import { dfActions } from '../app/dfSlice'; @@ -29,22 +30,75 @@ import { ComponentBorderStyle, shadow, transition } from '../app/tokens'; // ─── Chart Card ────────────────────────────────────────────────────────────── export let buildChartCard = ( - chartElement: { tableId: string, chartId: string, element: any }, + chartElement: { tableId: string, chartId: string, element: any, onDelete?: () => void, deleteTooltip?: string, unread?: boolean }, focusedChartId?: string, ) => { let selectedClassName = focusedChartId == chartElement.chartId ? 'selected-card' : ''; - return - {chartElement.element} - + + {chartElement.element} + {isUnread && ( + + )} + + {chartElement.onDelete && ( + + { event.stopPropagation(); chartElement.onDelete?.(); }} + > + + + + )} + } // ─── Trigger Card Wrapper ──────────────────────────────────────────────────── diff --git a/src/views/EncodingShelfCard.tsx b/src/views/EncodingShelfCard.tsx index 666001de..39faebd2 100644 --- a/src/views/EncodingShelfCard.tsx +++ b/src/views/EncodingShelfCard.tsx @@ -99,7 +99,7 @@ const chartNameToI18nKey: Record = { "Bar Chart": "barChart", "Grouped Bar Chart": "groupedBarChart", "Stacked Bar Chart": "stackedBarChart", "Histogram": "histogram", "Lollipop Chart": "lollipopChart", "Pyramid Chart": "pyramidChart", - "Line Chart": "lineChart", "Dotted Line Chart": "dottedLineChart", + "Line Chart": "lineChart", "Bump Chart": "bumpChart", "Area Chart": "areaChart", "Streamgraph": "streamgraph", "Pie Chart": "pieChart", "Rose Chart": "roseChart", "Heatmap": "heatmap", "Waterfall Chart": "waterfallChart", @@ -111,9 +111,13 @@ const chartNameToI18nKey: Record = { }; const chartCategoryToI18nKey: 
Record = { - "Scatter & Point": "scatterAndPoint", "Bar": "bar", - "Line & Area": "lineAndArea", "Part-to-Whole": "partToWhole", - "Statistical": "statistical", "Map": "map", "Custom": "custom", + "Points": "points", + "Bars": "bars", + "Distributions": "distributions", + "Lines & Areas": "linesAndAreas", + "Circular": "circular", + "Tables & Maps": "tablesAndMaps", + "Custom": "custom", }; import { TableIcon, AgentIcon as PrecisionManufacturing } from '../icons'; import ChangeCircleOutlinedIcon from '@mui/icons-material/ChangeCircleOutlined'; @@ -330,13 +334,6 @@ const STYLE_PRESETS: StylePreset[] = [ instruction: 'Restyle this chart in the FiveThirtyEight (538) blog style.', }, - { - key: 'minimal', - label: 'Minimal', - description: 'Clean and pared-back', - instruction: - 'Restyle this chart in a minimal, pared-back modernist style.', - }, { key: 'dark', label: 'Dark Mode', @@ -351,6 +348,13 @@ const STYLE_PRESETS: StylePreset[] = [ instruction: 'Restyle this chart for a slide-deck presentation, optimized for being viewed at a distance.', }, + { + key: 'comic', + label: 'Comic', + description: 'Hand-drawn comic book look', + instruction: + 'Restyle this chart in a comic style.', + }, ]; @@ -485,7 +489,11 @@ export const EncodingShelfCard: FC = function ({ chartId useEffect(() => { if (!isLoadingIdeas) { setIdeaElapsed(0); return; } - const timer = setInterval(() => setIdeaElapsed(e => e + 1), 1000); + // Tick once per second — fast enough to read as live, slow enough to + // stay readable; the loading indicator carries the liveness cue. + // Anchor to a start timestamp so delayed or throttled ticks don't accumulate drift.
+ const t0 = Date.now(); + const timer = setInterval(() => setIdeaElapsed(Math.floor((Date.now() - t0) / 1000)), 1000); return () => clearInterval(timer); }, [isLoadingIdeas]); @@ -700,7 +708,6 @@ export const EncodingShelfCard: FC = function ({ chartId newChart = structuredClone(chart) as Chart; newChart.source = "user"; newChart.id = `chart-${Date.now() - Math.floor(Math.random() * 10000)}`; - newChart.saved = false; newChart.tableRef = candidateTable.id; // Style variants belong to the chart they were authored // against — don't carry them over to a follow-up chart. @@ -1271,8 +1278,8 @@ export const EncodingShelfCard: FC = function ({ chartId }}> {/* Left group: one-click style presets. Clicking the palette icon opens a menu of curated "style sheets" (NYT, Economist, - FiveThirtyEight, minimal, dark mode, presentation). Each - preset sends a detailed style instruction straight to the + FiveThirtyEight, dark mode, presentation, comic). + Each preset sends a detailed style instruction straight to the restyle agent — bypassing the intent classifier since these are guaranteed style-only changes.
The user can still type freeform instructions in the textbox above; the menu's diff --git a/src/views/EncodingShelfThread.tsx b/src/views/EncodingShelfThread.tsx index 33c1df98..da3d4175 100644 --- a/src/views/EncodingShelfThread.tsx +++ b/src/views/EncodingShelfThread.tsx @@ -37,6 +37,7 @@ export let ChartElementFC: FC<{ boxWidth?: number, boxHeight?: number}> = function({chart, tableRows, tableMetadata, boxWidth, boxHeight}) { const conceptShelfItems = useSelector((state: DataFormulatorState) => state.conceptShelfItems); + const thumbnail = useSelector(dfSelectors.getChartThumbnail(chart.id)); let WIDTH = boxWidth || 120; let HEIGHT = boxHeight || 80; @@ -58,12 +59,12 @@ export let ChartElementFC: FC<{ } // Use cached thumbnail from ChartRenderService when available - if (chart.thumbnail) { + if (thumbnail) { return ( + backgroundColor: "white" }}> {`${chart.chartType} diff --git a/src/views/ExplComponents.tsx b/src/views/ExplComponents.tsx index 10c264a9..7856545a 100644 --- a/src/views/ExplComponents.tsx +++ b/src/views/ExplComponents.tsx @@ -126,7 +126,8 @@ const renderWithMath = (text: string) => { const ConceptExplanationCard = styled(Box, { shouldForwardProp: (prop) => prop !== 'secondary', })<{ secondary: boolean }>(({ theme, secondary }) => ({ - padding: '8px 12px', + padding: '8px 10px', + minWidth: 0, borderLeft: `3px solid ${secondary ? theme.palette.secondary.main : theme.palette.primary.light}`, borderRadius: '2px', backgroundColor: alpha(theme.palette.background.paper, 0.5), @@ -151,14 +152,23 @@ const ConceptName = styled(Typography, { const ConceptExplanation = styled(Typography)(({ theme }) => ({ fontSize: '11px', lineHeight: 1.4, - overflow: 'auto', + minWidth: 0, color: theme.palette.text.primary, '& .katex': { - fontSize: '12px', + fontSize: '11px', lineHeight: 1.2, }, + // KaTeX block-math defaults to `overflow-x: auto` with vertical padding + // that reserves room for a scrollbar even when the formula fits. 
Drop + // the bottom padding and only show the scrollbar if it's actually needed + // (and hide its track to keep the card clean). '& .katex-display': { margin: '4px 0', + paddingBottom: 0, + overflowX: 'auto', + overflowY: 'hidden', + scrollbarWidth: 'none', + '&::-webkit-scrollbar': { display: 'none' }, }, })); @@ -189,13 +199,15 @@ export const ConceptExplCards: FC = ({ return ( - - {/* Concepts Grid */} - + {/* Formulas grid — reflows to one column when there isn't room for two + side-by-side, so long formulas (\sum, fractions) don't overflow. */} + {displayConcepts.map((concept, index) => { let secondary = concept.field == "Statistical Analysis"; diff --git a/src/views/InteractionEntryCard.tsx b/src/views/InteractionEntryCard.tsx index 41251f64..8cfd0000 100644 --- a/src/views/InteractionEntryCard.tsx +++ b/src/views/InteractionEntryCard.tsx @@ -8,6 +8,7 @@ import { Box, Collapse, Typography, useTheme } from '@mui/material'; import { alpha } from '@mui/material/styles'; import PersonIcon from '@mui/icons-material/Person'; import SmartToyOutlinedIcon from '@mui/icons-material/SmartToyOutlined'; +import { AgentToyIcon, AgentToyVariant } from './AgentToyIcon'; import TerminalIcon from '@mui/icons-material/Terminal'; import SearchIcon from '@mui/icons-material/Search'; import AutoGraphIcon from '@mui/icons-material/AutoGraph'; @@ -17,7 +18,7 @@ import ErrorOutlineIcon from '@mui/icons-material/ErrorOutline'; import WarningAmberIcon from '@mui/icons-material/WarningAmber'; import InfoOutlinedIcon from '@mui/icons-material/InfoOutlined'; import { InteractionEntry } from '../components/ComponentType'; -import { AgentIcon, TableIcon } from '../icons'; +import { AgentIcon } from '../icons'; import { radius, borderColor } from '../app/tokens'; /** Pick the icon component for a step line based on known prefixes. 
*/ @@ -38,7 +39,8 @@ export const getStepIconComponent = (line: string) => { const PlanStepItem: React.FC<{ step: string; showShimmer: boolean; -}> = ({ step, showShimmer }) => { + trailing?: React.ReactNode; +}> = ({ step, showShimmer, trailing }) => { const [expanded, setExpanded] = useState(false); const isChecked = step.startsWith('✓'); const isFailed = step.startsWith('✗'); @@ -87,6 +89,7 @@ const PlanStepItem: React.FC<{ }}> {displayLine} + {trailing} ); }; @@ -98,7 +101,9 @@ export const PlanStepsView: React.FC<{ steps: string[]; activeLastStep?: boolean; filterCreatingChart?: boolean; -}> = ({ steps, activeLastStep = false, filterCreatingChart = false }) => { + /** Inline node appended after the text of the last (active) step — used for live timers. */ + trailing?: React.ReactNode; +}> = ({ steps, activeLastStep = false, filterCreatingChart = false, trailing }) => { const filtered = filterCreatingChart ? steps.filter(l => { const stripped = l.startsWith('✓') ? l.slice(2) : l; @@ -113,7 +118,7 @@ export const PlanStepsView: React.FC<{ const isLast = idx === filtered.length - 1; const isChecked = step.startsWith('✓'); const showShimmer = activeLastStep && isLast && !isChecked; - return ; + return ; })} ); @@ -166,18 +171,28 @@ export const CompactMarkdown: React.FC<{ content: string; color: string }> = ({ ); -/** Render text with **field** markers as styled spans with subtle background. */ -export function renderFieldHighlights(text: string, bgColor: string): React.ReactNode { +/** Render text with `**field**` markers as styled spans. The marker is + * rendered as a flat "highlighter underline" — a thin colored bar sitting + * just below the text baseline. Text weight, size, and color stay + * unchanged so the cue scales gracefully with marker density (one or many) + * without dominating the prose. + * + * `accentColor` is the solid base color; alpha is applied internally. 
*/ +export function renderFieldHighlights(text: string, accentColor: string): React.ReactNode { const parts = text.split(/(\*\*[^*]+\*\*)/g); return parts.map((part, i) => { const match = part.match(/^\*\*(.+)\*\*$/); if (match) { return ( {match[1]} @@ -187,6 +202,13 @@ export function renderFieldHighlights(text: string, bgColor: string): React.Reac }); } +/** Strip `**field**` markers from text, leaving plain inline text. Used in + * agent prose where the field-highlight chip is suppressed but the raw + * markers should not leak through. */ +export function stripFieldMarkers(text: string): string { + return text.replace(/\*\*([^*]+)\*\*/g, '$1'); +} + export interface InteractionEntryCardProps { entry: InteractionEntry; highlighted?: boolean; @@ -206,37 +228,26 @@ export const InteractionEntryCard: React.FC = memo(({ // User prompts and user instructions — card with custom palette if (entry.from === 'user' && (entry.role === 'prompt' || entry.role === 'instruction')) { const palette = theme.palette.custom; - const fieldBg = alpha(palette.main, 0.08); - const userInputTablesSuffix = entry.inputTableNames && entry.inputTableNames.length > 0 ? ( - - {entry.inputTableNames.map((name, idx) => ( - - {idx > 0 && ,} - - - - {name} - - - - ))} - - ) : null; + // Provenance for multi-input derivations is rendered as a structural + // "merge node" in the timeline gutter (see DataThread), so the + // instruction card itself stays free of chip-strip chrome. return ( - {renderFieldHighlights(text, fieldBg)} + {renderFieldHighlights(text, palette.main)} - {userInputTablesSuffix} ); } @@ -246,6 +257,10 @@ export const InteractionEntryCard: React.FC = memo(({ if (entry.from !== 'user') { const fieldBg = alpha(theme.palette.primary.main, 0.05); + const displayText = stripFieldMarkers(entry.role === 'instruction' + ? 
(entry.displayContent || entry.content) + : text); + // Role-specific color: secondary for content, semantic colors for status let color: string; let collapsedLabel: string | null = null; @@ -254,21 +269,32 @@ export const InteractionEntryCard: React.FC = memo(({ color = theme.palette.text.secondary; break; case 'clarify': - color = resolved ? theme.palette.text.secondary : theme.palette.warning.main; - if (resolved) collapsedLabel = t('interaction.askedForClarification'); + // Active conversational entries (clarify / explain / + // suggest_data_search) use neutral text — the semantic cue + // is carried by the icon, not by recoloring whole paragraphs. + color = resolved ? theme.palette.text.secondary : theme.palette.text.primary; + if (resolved) collapsedLabel = (displayText || t('interaction.askedForClarification')).replace(/\s+/g, ' ').trim(); break; case 'explain': { - color = resolved ? theme.palette.text.secondary : theme.palette.info.main; + color = resolved ? theme.palette.text.secondary : theme.palette.text.primary; if (resolved) { - collapsedLabel = t('interaction.gaveExplanation'); + collapsedLabel = (displayText || t('interaction.gaveExplanation')).replace(/\s+/g, ' ').trim(); + } + break; + } + case 'delegate': { + color = resolved ? theme.palette.text.secondary : theme.palette.text.primary; + if (resolved) { + const target = entry.delegateTarget || 'data_loading'; + const defaultLabel = target === 'report_gen' + ? t('interaction.delegatedToReportGen') + : t('interaction.delegatedToDataLoading'); + collapsedLabel = (displayText || defaultLabel).replace(/\s+/g, ' ').trim(); } - // Active explain renders inline in the info color (same - // treatment as an active clarify) so it reads as primary - // content, not a disabled preview. 
break; } case 'summary': - color = theme.palette.text.secondary; + color = theme.palette.text.primary; break; case 'error': color = theme.palette.error.main; @@ -277,9 +303,6 @@ export const InteractionEntryCard: React.FC = memo(({ color = theme.palette.text.secondary; } - const displayText = entry.role === 'instruction' - ? (entry.displayContent || entry.content) - : text; const hasPlan = !!entry.plan && entry.plan !== displayText; // Active clarify/explain entries are read in the ClarificationPanel @@ -287,7 +310,7 @@ export const InteractionEntryCard: React.FC = memo(({ // click). Their truncated preview here should always stay clamped — // no in-place expand, to avoid duplicating the panel content. const isActiveAgentPause = !resolved - && (entry.role === 'clarify' || entry.role === 'explain'); + && (entry.role === 'clarify' || entry.role === 'explain' || entry.role === 'delegate'); // Auto-clamp very long agent text bubbles. Tied to the same // `expanded` state as thinking — one parent click reveals both — @@ -303,22 +326,73 @@ export const InteractionEntryCard: React.FC = memo(({ const isCollapsible = hasPlan || !!collapsedLabel || canClampText; const [expanded, setExpanded] = useState(false); - // Render input table names suffix if available - const inputTablesSuffix = entry.inputTableNames && entry.inputTableNames.length > 0 ? ( - - {entry.inputTableNames.map((name, idx) => ( - - {idx > 0 && ,} - - - - {name} - - - - ))} - - ) : null; + // Provenance for multi-input derivations is rendered as a structural + // "merge node" in the timeline gutter (see DataThread), so the + // instruction card itself stays free of chip-strip chrome. + + // Conversational agent entries (instruction / clarify / explain / + // summary) all read as "the agent talking" — wrap them in a bordered + // bubble matching the user's instruction card so the timeline reads + // as a sibling pair of cards. 
Active clarify/explain and error keep + // their semantic color via a left-border accent rather than a + // tinted fill. `summary` entries are the agent's findings/conclusions, + // so they get a distinct soft info-tinted fill (boxed color only — + // same border/shape as other bubbles) to read as "insight" rather + // than "in-progress discussion". + const isConversational = entry.role === 'instruction' + || entry.role === 'clarify' + || entry.role === 'explain' + || entry.role === 'delegate' + || entry.role === 'summary'; + // Bubble chrome stays close to neutral, but the special states earn + // a soft tinted fill in their per-variant semantic hue. The hues + // here match `AgentPausePanel` so a paused entry and its panel + // above the input read as the same color family: + // clarify → warning ("you're being asked") + // explain / suggest → primary ("here's an answer / handoff") + // summary → secondary ("agent's finding") + // error → error + const isActiveClarify = entry.role === 'clarify' && !resolved; + const isActiveExplain = (entry.role === 'explain' + || entry.role === 'delegate') && !resolved; + const isSummary = entry.role === 'summary'; + // Resolved clarify / explain / delegate entries collapse + // into a "light timeline trace" — no card chrome, just a faded + // one-line note. They still expand on click (the full text is + // preserved via `collapsedLabel`/`displayText`), but at rest the + // data thread foregrounds charts/data instead of back-and-forth. + const isResolvedPause = resolved + && (entry.role === 'clarify' + || entry.role === 'explain' + || entry.role === 'delegate'); + const bubbleAccent = entry.role === 'error' + ? theme.palette.error.main + : isSummary + ? theme.palette.primary.main + : isActiveClarify + ? theme.palette.warning.main + : isActiveExplain + ? theme.palette.primary.main + : null; + const bubbleBg = bubbleAccent + ? 
alpha(bubbleAccent, 0.05) + : alpha(theme.palette.text.primary, 0.03); + const bubbleHover = bubbleAccent + ? alpha(bubbleAccent, 0.09) + : alpha(theme.palette.text.primary, 0.05); + // Conversational bubbles get card chrome, except resolved pauses + // which render as a chrome-less compact trace. + const bubbleSx = (isConversational && !isResolvedPause) ? { + py: 0.5, px: 1, + borderRadius: radius.sm, + backgroundColor: bubbleBg, + border: `1px solid ${borderColor.component}`, + } : isResolvedPause ? { + // Minimal trace: just inline padding so the text aligns with + // the gutter icon and adjacent bubbles. No bg, no border. + py: '2px', px: '4px', + opacity: 0.7, + } : {}; return ( = memo(({ // refocus — show pointer here too so the affordance // reads consistently across icon, gutter, and text. cursor: (isCollapsible || isActiveAgentPause) ? 'pointer' : 'default', - ...(isCollapsible ? { + ...bubbleSx, + ...(isCollapsible && !isConversational ? { borderRadius: '4px', px: '2px', mx: '-2px', '&:hover': { backgroundColor: 'rgba(0,0,0,0.03)' }, } : {}), + ...(isCollapsible && isConversational ? { + '&:hover': { backgroundColor: bubbleHover }, + } : {}), }} onClick={() => isCollapsible && setExpanded(!expanded)} > @@ -349,13 +427,11 @@ export const InteractionEntryCard: React.FC = memo(({ {hasPlan && } {collapsedLabel && ( - {entry.role === 'instruction' || entry.role === 'explain' - ? renderFieldHighlights(displayText, fieldBg) - : displayText} + {displayText} )} @@ -363,21 +439,25 @@ export const InteractionEntryCard: React.FC = memo(({ {collapsedLabel ? ( !expanded && ( {collapsedLabel} ) ) : entry.role === 'summary' ? ( - + ) : ( = memo(({ overflow: 'hidden', } : {}), }}> - {(entry.role === 'clarify' || entry.role === 'explain') && !resolved && ( + {(entry.role === 'clarify' || entry.role === 'explain' || entry.role === 'delegate') && !resolved && ( - ({t('interaction.clarificationNeeded')}) + ({entry.role === 'delegate' + ? 
(entry.delegateTarget === 'report_gen' + ? t('interaction.delegateLabelReportGen') + : t('interaction.delegateLabelDataLoading')) + : t('interaction.clarificationNeeded')}) )} - {entry.role === 'instruction' || entry.role === 'explain' - ? renderFieldHighlights(displayText, fieldBg) + {/* Active conversational bubbles render `**field**` + markers as highlights tinted in the bubble's own + accent color, so the underline matches the + bubble bg (and the matching panel above the input). */} + {(isActiveClarify || isActiveExplain) + ? renderFieldHighlights(text, bubbleAccent ?? theme.palette.primary.main) : displayText} )} - {inputTablesSuffix} ); } @@ -420,16 +507,147 @@ export const InteractionEntryCard: React.FC = memo(({ ); }); +export interface ResolvedConversationCardProps { + pairs: { agentEntry: InteractionEntry; userEntry: InteractionEntry }[]; + highlighted?: boolean; +} + +/** Render one or more resolved clarify/explain/suggest_data_search + * exchanges (each followed by a user reply) folded together into a + * single compact "conversation" timeline item. Collapsed by default to + * a one-line trace prefixed with a chat-bubble glyph; clicking expands + * to show every Q & A in order as paired bubbles. + * + * This declutters the data thread once a back-and-forth is resolved — + * the timeline foregrounds charts/data; the exchange recedes into a + * hinted "💬 conversation happened here" marker that stays openable + * for context. + */ +export const ResolvedConversationCard: React.FC = memo(({ pairs }) => { + const theme = useTheme(); + const { t } = useTranslation(); + const [expanded, setExpanded] = useState(false); + + if (pairs.length === 0) return null; + + // Preview uses the LAST user reply (most recent resolution); fall back + // to the last agent question if that reply is empty. 
+ const lastPair = pairs[pairs.length - 1]; + const lastUserText = stripFieldMarkers(lastPair.userEntry.displayContent || lastPair.userEntry.content).replace(/\s+/g, ' ').trim(); + const lastAgentText = stripFieldMarkers(lastPair.agentEntry.displayContent || lastPair.agentEntry.content).replace(/\s+/g, ' ').trim(); + const previewText = lastUserText || lastAgentText; + + const dim = theme.palette.text.secondary; + const customPalette = theme.palette.custom; + const turnCount = pairs.length; + + return ( + setExpanded(v => !v)} + sx={{ + cursor: 'pointer', + py: '2px', + px: '4px', + borderRadius: '4px', + '&:hover': { backgroundColor: 'rgba(0,0,0,0.03)' }, + }} + > + {!expanded ? ( + + {turnCount > 1 && ( + + ×{turnCount} + + )} + + {previewText} + + + ) : ( + + {pairs.map((p, idx) => ( + + + + {renderFieldHighlights(p.agentEntry.displayContent || p.agentEntry.content, theme.palette.primary.main)} + + + + + {renderFieldHighlights(p.userEntry.displayContent || p.userEntry.content, customPalette.main)} + + + + ))} + + )} + + ); +}); +ResolvedConversationCard.displayName = 'ResolvedConversationCard'; + /** Returns the appropriate gutter icon for an InteractionEntry. */ export function getEntryGutterIcon(entry: InteractionEntry, color: string): React.ReactNode { - const iconSx = { width: 14, height: 14, color }; + const iconSx = { width: 18, height: 18, color }; if (entry.from === 'user') { return ; } - return ; + // Pick a role-specific variant of the agent toy so the gutter conveys + // state at a glance (thinking / summary / clarify / explain). + const variant: AgentToyVariant = (() => { + switch (entry.role) { + case 'clarify': return 'clarify'; + case 'explain': return 'explain'; + case 'delegate': return 'explain'; + case 'summary': return 'summary'; + case 'instruction': return 'thinking'; + default: return 'default'; + } + })(); + return ; } /** Returns the appropriate gutter icon when no entry is available (fallback). 
*/ export function getDefaultGutterIcon(color: string): React.ReactNode { - return ; + return ; } diff --git a/src/views/ModelSelectionDialog.tsx b/src/views/ModelSelectionDialog.tsx index 2249acb6..ccc23238 100644 --- a/src/views/ModelSelectionDialog.tsx +++ b/src/views/ModelSelectionDialog.tsx @@ -58,7 +58,6 @@ import CheckCircleOutlineIcon from '@mui/icons-material/CheckCircleOutline'; import ErrorOutlineIcon from '@mui/icons-material/ErrorOutline'; import HelpOutlineIcon from '@mui/icons-material/HelpOutline'; import InfoOutlinedIcon from '@mui/icons-material/InfoOutlined'; -import SettingsOutlinedIcon from '@mui/icons-material/SettingsOutlined'; import { getUrls } from '../app/utils'; import { apiRequest, ApiRequestError } from '../app/apiClient'; @@ -336,16 +335,19 @@ export const ModelSelectionButton: React.FC<{}> = ({ }) => { /** Render a single model row. isGlobal controls delete button and key display. */ const renderModelRow = (model: ModelConfig, isGlobal: boolean) => { const status = getStatus(model.id); + // Server-configured models in 'unknown' are trusted by default and + // displayed as "server configured" instead of an untested "Test" row. + const serverConfigured = isGlobal && status === 'unknown'; const statusIcon = - status === 'configured' ? : - status === 'unknown' ? : - status === 'testing' ? : - status === 'ok' ? : - ; + serverConfigured ? : + status === 'unknown' ? : + status === 'testing' ? : + status === 'ok' ? 
: + ; let message = t('model.modelReadyMessage'); - if (status === 'configured') { + if (serverConfigured) { message = t('model.configuredMessage', 'Server configured, click to verify connectivity'); } else if (status === 'unknown') { message = t('model.clickToTestModel'); @@ -354,7 +356,9 @@ export const ModelSelectionButton: React.FC<{}> = ({ }) => { message = t('model.errorMessage', { message: decodeHtmlEntities(rawMessage) }); } - const selectable = status === 'ok' || status === 'configured' || (isGlobal && status !== 'error'); + // Selectable when verified ('ok'), or when it's a server-configured + // model in 'unknown' state (trusted by default, no test required). + const selectable = status === 'ok' || serverConfigured; const isSelected = tempSelectedModelId === model.id; return ( @@ -363,7 +367,7 @@ export const ModelSelectionButton: React.FC<{}> = ({ }) => { sx={{ cursor: selectable ? 'pointer' : 'default', // Don't dim error rows so the Retest button and error message remain clearly clickable. - opacity: selectable || status === 'error' ? 1 : 0.5, + opacity: selectable || status === 'error' || status === 'testing' ? 1 : 0.5, backgroundColor: isSelected ? alpha(theme.palette.primary.main, 0.04) : 'transparent', outline: isSelected ? `2px solid ${theme.palette.primary.main}` : 'none', outlineOffset: -2, @@ -374,11 +378,37 @@ export const ModelSelectionButton: React.FC<{}> = ({ }) => { )} > - {model.model} + + + {model.model} + + {isGlobal && ( + + + {t('model.serverChip', 'server configured')} + + + )} + {isGlobal - ? {t('model.serverManaged', 'Server managed')} + ? null : model.api_key ? (showKeys ? {model.api_key} @@ -390,34 +420,40 @@ export const ModelSelectionButton: React.FC<{}> = ({ }) => { {model.endpoint} - {model.api_base - ? {model.api_base} - : {t('model.default')} + {isGlobal + ? null + : model.api_base + ? {model.api_base} + : {t('model.default')} } - {model.api_version - ? model.api_version - : {t('model.default')} + {isGlobal + ? 
null + : model.api_version + ? model.api_version + : {t('model.default')} } - + - - {!isGlobal && ( + {!isGlobal && ( + = ({ }) => { - )} - + + )} {status === 'error' && ( @@ -448,16 +484,6 @@ export const ModelSelectionButton: React.FC<{}> = ({ }) => { ); }; - const sectionHeader = (label: string) => ( - - - - {label} - - - - ); - let modelTable =
= ({ }) => { - {/* Global / server-managed models */} - {globalModels.length > 0 && sectionHeader( - t('model.serverManagedSection', 'Server configured models'), - )} + {/* Server-configured models first, then user-added models. */} {globalModels.map(model => renderModelRow(model, true))} - - {/* User-added models */} - {sectionHeader(t('model.userManagedSection', 'My models'))} {models.map(model => renderModelRow(model, false))} {newModelEntry} @@ -510,12 +530,14 @@ export const ModelSelectionButton: React.FC<{}> = ({ }) => { const allModels = [...globalModels, ...models]; - // A model is "ready" if tested ok, or if it's a server-configured model - // (status "configured") that the admin has set up and is trusted to work. + // A model is "ready" to use when it's been verified ('ok') or when it's a + // server-configured model in 'unknown' state (trusted by default). const isModelReady = (id: string | undefined): boolean => { if (!id) return false; const status = getStatus(id); - return status === 'ok' || status === 'configured'; + if (status === 'ok') return true; + const isGlobal = globalModels.some(m => m.id === id); + return isGlobal && status === 'unknown'; }; let modelNotReady = !isModelReady(tempSelectedModelId); diff --git a/src/views/ReportView.tsx b/src/views/ReportView.tsx index d5b1241b..a01f9b1a 100644 --- a/src/views/ReportView.tsx +++ b/src/views/ReportView.tsx @@ -34,6 +34,8 @@ export const ReportView: FC = () => { const allGeneratedReports = useSelector(dfSelectors.getAllGeneratedReports); const serverConfig = useSelector((state: DataFormulatorState) => state.serverConfig); const focusedId = useSelector((state: DataFormulatorState) => state.focusedId); + // Thumbnails live in their own slice so updates don't churn `state.charts`. + const chartThumbnails = useSelector((state: DataFormulatorState) => state.chartThumbnails) || {}; const focusedChartId = focusedId?.type === 'chart' ? 
focusedId.chartId : undefined; const theme = useTheme(); const { t } = useTranslation(); @@ -462,9 +464,9 @@ ${styles} const blob = new Blob([cached.svg], { type: 'image/svg+xml;charset=utf-8' }); const blobUrl = URL.createObjectURL(blob); updateCachedReportImages(chartId, blobUrl, config.defaultChartWidth, config.defaultChartHeight); - } else if (chart.thumbnail) { + } else if (chartThumbnails[chartId]) { // Fall back to thumbnail - updateCachedReportImages(chartId, chart.thumbnail, config.defaultChartWidth, config.defaultChartHeight); + updateCachedReportImages(chartId, chartThumbnails[chartId], config.defaultChartWidth, config.defaultChartHeight); } }); } @@ -561,8 +563,8 @@ ${styles} if (cached?.svg) { const blob = new Blob([cached.svg], { type: 'image/svg+xml;charset=utf-8' }); updateCachedReportImages(chart.id, URL.createObjectURL(blob), config.defaultChartWidth, config.defaultChartHeight); - } else if (chart.thumbnail) { - updateCachedReportImages(chart.id, chart.thumbnail, config.defaultChartWidth, config.defaultChartHeight); + } else if (chartThumbnails[chart.id]) { + updateCachedReportImages(chart.id, chartThumbnails[chart.id], config.defaultChartWidth, config.defaultChartHeight); } }); } diff --git a/src/views/SelectableDataGrid.tsx b/src/views/SelectableDataGrid.tsx index 8b2e46dc..a400d4ab 100644 --- a/src/views/SelectableDataGrid.tsx +++ b/src/views/SelectableDataGrid.tsx @@ -30,12 +30,14 @@ import { TableIcon } from '../icons'; import CasinoIcon from '@mui/icons-material/Casino'; import ArrowUpwardIcon from '@mui/icons-material/ArrowUpward'; import ArrowDownwardIcon from '@mui/icons-material/ArrowDownward'; -import UnfoldMoreIcon from '@mui/icons-material/UnfoldMore'; +import MoreVertIcon from '@mui/icons-material/MoreVert'; +import FilterAltIcon from '@mui/icons-material/FilterAlt'; import { getUrls, fetchWithIdentity } from '../app/utils'; import { apiRequest, assertDownloadResponseOk } from '../app/apiClient'; import { useDrag } from 'react-dnd'; 
import { useSelector } from 'react-redux'; import { DataFormulatorState } from '../app/dfSlice'; +import { ColumnFilter, ColumnFilterPopover } from './ColumnFilterPopover'; export interface ColumnDef { id: string; @@ -100,14 +102,27 @@ interface DraggableHeaderProps { columnDef: ColumnDef; orderBy: string | undefined; order: 'asc' | 'desc'; - onSortClick: () => void; + onSortAsc: () => void; + onSortDesc: () => void; + onClearSort: () => void; + onOpenFilter: (anchor: HTMLElement) => void; + hasFilter: boolean; tableId: string; } const DraggableHeader: React.FC = ({ - columnDef, orderBy, order, onSortClick, tableId + columnDef, orderBy, order, onSortAsc, onSortDesc, onClearSort, + onOpenFilter, hasFilter, tableId }) => { const { t } = useTranslation(); + const filterButtonRef = React.useRef(null); + const kebabButtonRef = React.useRef(null); + const isSorted = orderBy === columnDef.id; + const cycleSort = () => { + if (!isSorted) { onSortAsc(); return; } + if (order === 'asc') { onSortDesc(); return; } + onClearSort(); + }; const theme = useTheme(); const conceptShelfItems = useSelector((state: DataFormulatorState) => state.conceptShelfItems); const semanticType = useSelector( @@ -152,16 +167,6 @@ const DraggableHeader: React.FC = ({ const hoverBackgroundColor = field ? alpha(getColorForFieldSource(field.source, theme), 0.1) : backgroundColor; - - // Determine sort icon - const getSortIcon = () => { - if (orderBy !== columnDef.id) { - return ; - } - return order === 'asc' - ? - : ; - }; return ( = ({ display: 'none', }, }} - active={orderBy === columnDef.id} - direction={orderBy === columnDef.id ? order : 'asc'} + active={isSorted} + direction={isSorted ? order : 'asc'} onClick={(e) => { // Prevent sort when dragging if (!isDragging) { e.stopPropagation(); - onSortClick(); + cycleSort(); } }} > @@ -238,27 +243,60 @@ const DraggableHeader: React.FC = ({ {columnDef.label} + {isSorted && ( + + {order === 'asc' + ? 
: } + )} - {/* Separate sort handler button */} - {t('dataGrid.sortBy', { label: columnDef.label })}}> + {/* Inline filter status hint, only rendered when a filter is active on this column. + Clicking reopens the filter popover. */} + {hasFilter && ( + {t('dataGrid.columnMenu.filterActive')}}> + { + e.stopPropagation(); + onOpenFilter(e.currentTarget); + }} + sx={{ + padding: '2px', + marginLeft: '2px', + color: theme.palette.primary.main, + '&:hover': { + backgroundColor: alpha(theme.palette.action.hover, 0.2), + }, + }} + > + + + + )} + {/* Column kebab — single entry point that opens the unified + sort + filter popover (design-doc 31 §4.2). */} + {t('dataGrid.columnMenu.openMenu')}}> { e.stopPropagation(); - onSortClick(); + onOpenFilter(e.currentTarget); }} sx={{ padding: '2px', - marginLeft: '4px', + marginLeft: '1px', marginRight: '2px', - opacity: orderBy === columnDef.id ? 1 : 0.5, + color: alpha(theme.palette.text.primary, 0.55), '&:hover': { - opacity: 1, + color: theme.palette.text.primary, backgroundColor: alpha(theme.palette.action.hover, 0.2), }, }} > - {getSortIcon()} + @@ -290,6 +328,30 @@ export const SelectableDataGrid: React.FC = React.memo( const [orderBy, setOrderBy] = React.useState(undefined); const [order, setOrder] = React.useState<'asc' | 'desc'>('asc'); + // Column filter state — keyed by column id. Presence of an entry means + // the column has an active filter (design-doc 31). Server-side pushdown + // is handled by fetchVirtualData; non-virtual tables ignore filters in v1. + const [columnFilters, setColumnFilters] = React.useState>({}); + const [filterPopover, setFilterPopover] = React.useState<{ columnId: string; anchor: HTMLElement } | null>(null); + + // Lookup of per-column metadata (distinctCount/nullCount/levels/levelCounts/type) + // to drive the filter popover's variant selection synchronously.
+ const tableMetadata = useSelector( + (state: DataFormulatorState) => state.tables.find(t => t.id === tableId)?.metadata, + ); + + // Ref-based bridge to fetchVirtualData (declared further below); lets stable + // sort handlers call into the latest fetch function without re-creating themselves. + const fetchVirtualDataRef = React.useRef<((sortByColumnIds: string[], sortOrder: 'asc' | 'desc', offset?: number, append?: boolean) => void) | null>(null); + + const applySort = React.useCallback((newOrderBy: string | undefined, newOrder: 'asc' | 'desc') => { + setOrder(newOrder); + setOrderBy(newOrderBy); + if (virtual) { + fetchVirtualDataRef.current?.(newOrderBy ? [newOrderBy] : [], newOrder); + } + }, [virtual]); + let theme = useTheme(); const [rowsToDisplay, setRowsToDisplay] = React.useState(rows); @@ -377,6 +439,13 @@ export const SelectableDataGrid: React.FC = React.memo( } }; + // Hold the live filters in a ref so fetchVirtualData stays stable yet + // always reads the latest set when called from sort/scroll handlers. + const filtersRef = React.useRef([]); + React.useEffect(() => { + filtersRef.current = Object.values(columnFilters); + }, [columnFilters]); + const fetchVirtualData = React.useCallback(( sortByColumnIds: string[], sortOrder: 'asc' | 'desc', @@ -391,6 +460,7 @@ export const SelectableDataGrid: React.FC = React.memo( const currentFetchId = ++fetchIdRef.current; + const activeFilters = filtersRef.current; const message: Record = { table: tableId, size: PAGE_SIZE, @@ -399,6 +469,7 @@ export const SelectableDataGrid: React.FC = React.memo( ? (sortOrder === 'asc' ? 'head' : 'bottom') : 'head', order_by_fields: sortByColumnIds.length > 0 ? sortByColumnIds : ['#rowId'], + ...(activeFilters.length > 0 ? { filters: activeFilters } : {}), }; apiRequest(getUrls().SAMPLE_TABLE, { @@ -428,6 +499,25 @@ export const SelectableDataGrid: React.FC = React.memo( }); }, [tableId, rowCount]); + // Keep the ref in sync so stable handlers (e.g. 
applySort) can call the latest version. + React.useEffect(() => { + fetchVirtualDataRef.current = fetchVirtualData; + }, [fetchVirtualData]); + + // Refetch from offset 0 whenever filters change (virtual tables only). + // Skip the initial mount; the first load happens via the existing + // virtual-data effect. + const didMountFiltersRef = React.useRef(false); + React.useEffect(() => { + if (!virtual) return; + if (!didMountFiltersRef.current) { + didMountFiltersRef.current = true; + return; + } + fetchVirtualData(orderBy ? [orderBy] : [], order, 0, false); + // eslint-disable-next-line react-hooks/exhaustive-deps + }, [columnFilters]); + const handleEndReached = React.useCallback(() => { if (!virtual || !hasMore || isLoadingMore || isLoading) return; fetchVirtualData( @@ -523,25 +613,11 @@ export const SelectableDataGrid: React.FC = React.memo( orderBy={orderBy} order={order} tableId={tableId} - onSortClick={() => { - let newOrder: 'asc' | 'desc' = 'asc'; - let newOrderBy : string | undefined = columnDef.id; - if (orderBy === columnDef.id && order === 'asc') { - newOrder = 'desc'; - } else if (orderBy === columnDef.id && order === 'desc') { - newOrder = 'asc'; - newOrderBy = undefined; - } else { - newOrder = 'asc'; - } - - setOrder(newOrder); - setOrderBy(newOrderBy); - - if (virtual) { - fetchVirtualData(newOrderBy ? [newOrderBy] : [], newOrder); - } - }} + hasFilter={Boolean(columnFilters[columnDef.id])} + onSortAsc={() => applySort(columnDef.id, 'asc')} + onSortDesc={() => applySort(columnDef.id, 'desc')} + onClearSort={() => applySort(undefined, 'asc')} + onOpenFilter={(anchor) => setFilterPopover({ columnId: columnDef.id, anchor })} /> )} @@ -632,6 +708,41 @@ export const SelectableDataGrid: React.FC = React.memo( + {/* Column filter popover — variant chosen synchronously from metadata + (design-doc 31). Server-side pushdown via fetchVirtualData. 
*/} + {filterPopover && (() => { + const meta = tableMetadata?.[filterPopover.columnId]; + const colDef = columnDefs.find(c => c.id === filterPopover.columnId); + return ( + setFilterPopover(null)} + columnId={filterPopover.columnId} + columnLabel={colDef?.label || filterPopover.columnId} + dataType={colDef?.dataType} + distinctCount={meta?.distinctCount} + nullCount={meta?.nullCount} + levels={meta?.levels} + levelCounts={meta?.levelCounts} + currentFilter={columnFilters[filterPopover.columnId]} + onApply={(f) => { + setColumnFilters(prev => { + const next = { ...prev }; + if (f) next[filterPopover.columnId] = f; + else delete next[filterPopover.columnId]; + return next; + }); + }} + rowCount={rowCount} + isSorted={orderBy === filterPopover.columnId} + sortOrder={order} + onSortAsc={() => applySort(filterPopover.columnId, 'asc')} + onSortDesc={() => applySort(filterPopover.columnId, 'desc')} + onClearSort={() => applySort(undefined, 'asc')} + /> + ); + })()} ); }); diff --git a/src/views/SimpleChartRecBox.tsx b/src/views/SimpleChartRecBox.tsx index c7212318..c96dec65 100644 --- a/src/views/SimpleChartRecBox.tsx +++ b/src/views/SimpleChartRecBox.tsx @@ -32,19 +32,20 @@ import { normalizeClarifyEvent, formatClarificationResponses } from '../app/clar import { alpha } from '@mui/material/styles'; import { WritingPencil } from '../components/FunComponents'; -import SendIcon from '@mui/icons-material/Send'; +import ArrowUpwardRoundedIcon from '@mui/icons-material/ArrowUpwardRounded'; import CloseIcon from '@mui/icons-material/Close'; import AddIcon from '@mui/icons-material/Add'; import TipsAndUpdatesIcon from '@mui/icons-material/TipsAndUpdates'; -import StopCircleOutlinedIcon from '@mui/icons-material/StopCircleOutlined'; +import StopIcon from '@mui/icons-material/Stop'; import AutoGraphIcon from '@mui/icons-material/AutoGraph'; import DescriptionOutlinedIcon from '@mui/icons-material/DescriptionOutlined'; import { UnifiedDataUploadDialog } from 
'./UnifiedDataUploadDialog'; +import { transition } from '../app/tokens'; import { Theme } from '@mui/material/styles'; import { useTranslation } from 'react-i18next'; import { shouldAutoFocusGeneratedChart } from '../app/agentInteractionPolicy'; -import { ClarificationPanel } from './ClarificationPanel'; +import { ClarificationPanel, DelegatePanel } from './AgentPausePanel'; const AgentWorkingOverlay: FC<{ message?: string; elapsed?: number; theme: Theme; onCancel?: () => void; color?: 'primary' | 'warning' }> = ({ message, elapsed, theme, onCancel, color = 'primary' }) => { const { t } = useTranslation(); @@ -67,30 +68,41 @@ const AgentWorkingOverlay: FC<{ message?: string; elapsed?: number; theme: Theme px: 2, overflow: 'hidden', }}> - - - + + + {t('chartRec.agentWorking')} {onCancel && ( - - - + + + + + )} - {latestMessage}{elapsedSuffix} @@ -113,6 +125,7 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ const config = useSelector((state: DataFormulatorState) => state.config); const activeModel = useSelector(dfSelectors.getActiveModel); const draftNodes = useSelector((state: DataFormulatorState) => state.draftNodes); + const chartThumbnails = useSelector((state: DataFormulatorState) => state.chartThumbnails) || {}; const theme = useTheme(); const { t } = useTranslation(); @@ -142,11 +155,18 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ const agentAbortRef = useRef(null); const userChartFocusLockedRef = useRef(false); const lastAutoFocusedChartIdRef = useRef(null); + // Whether we've already auto-focused an artifact during the current + // agent run. We only jump focus once per run (to the FIRST generated + // chart), so the user isn't yanked around as further charts stream in. + // Subsequent artifacts rely on the "freshly created" highlight + NEW + // tag for discoverability instead. 
+ const firstFocusedThisRunRef = useRef(false); useEffect(() => { if (!isChatFormulating) { userChartFocusLockedRef.current = false; lastAutoFocusedChartIdRef.current = null; + firstFocusedThisRunRef.current = false; return; } if (focusedId?.type === 'chart') { @@ -317,15 +337,26 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ // Extract the active structured clarification (or explanation) from // DraftNode interaction log. Both are stored as ClarificationQuestion[] // — the entry's role ('clarify' vs 'explain') is what differs. + // `delegate` pauses share the same slot but render a different panel + // (a one-click handoff to the target peer agent). const clarificationQuestions = React.useMemo(() => { if (!pendingClarification?.draftId) return null; const draft = draftNodes.find(d => d.id === pendingClarification.draftId); const interaction = draft?.derive?.trigger?.interaction || []; - // Find the most recent clarify or explain entry. + // Find the most recent pause entry (clarify / explain / delegate). for (let i = interaction.length - 1; i >= 0; i--) { const entry = interaction[i]; + if (entry.role === 'delegate') { + return { + kind: 'delegate' as const, + target: entry.delegateTarget || 'data_loading', + message: entry.content || '', + options: entry.delegateOptions || [], + }; + } if (entry.role === 'clarify' || entry.role === 'explain') { return { + kind: 'clarification' as const, questions: entry.clarificationQuestions || null, variant: entry.role === 'explain' ? 'explain' as const : 'clarify' as const, }; @@ -357,6 +388,14 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ const actionId = isResume ? clarificationContext!.actionId : `exploreDataFromNL_${String(Date.now())}`; const actionTables = selectedTableIds.map(id => tables.find(t => t.id === id) as DictTable); + // Seed the auto-focus baseline with whatever chart the user is + // currently looking at. 
Otherwise the lock effect would compare the + // current focused chart to `null` on run-start and trip immediately, + // blocking the first-artifact auto-focus. + lastAutoFocusedChartIdRef.current = focusedId?.type === 'chart' ? focusedId.chartId : null; + firstFocusedThisRunRef.current = false; + userChartFocusLockedRef.current = false; + setIsChatFormulating(true); // DraftNode handles status @@ -421,8 +460,8 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ }; // Include chart thumbnail for the focused leaf table (the one the user is looking at) - if (walkTable.id === focusedTableId && resolvedChart?.thumbnail) { - step.chart_thumbnail = resolvedChart.thumbnail; + if (walkTable.id === focusedTableId && resolvedChart && chartThumbnails[resolvedChart.id]) { + step.chart_thumbnail = chartThumbnails[resolvedChart.id]; } focusedSteps.unshift(step); @@ -448,6 +487,7 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ const triggers = getTriggers(leaf, tables); if (triggers.length === 0) continue; + const STEP_FINDING_CHAR_LIMIT = 200; const steps: string[] = []; for (const trig of triggers) { const tt = tables.find(t2 => t2.id === trig.resultTableId); @@ -466,7 +506,16 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ }) .join(', ') : ''; - steps.push(`${label}${chartType ? ` → ${chartType}` : ''}${encStr ? ` (${encStr})` : ''}`); + // Per-step agent commentary: the `summary` entry that the + // visualize action emits after running this step. + let finding = trig.interaction?.find( + (e: InteractionEntry) => e.role === 'summary', + )?.content?.trim() || ''; + if (finding.length > STEP_FINDING_CHAR_LIMIT) { + finding = finding.slice(0, STEP_FINDING_CHAR_LIMIT - 1).trimEnd() + '…'; + } + const head = `${label}${chartType ? ` → ${chartType}` : ''}${encStr ? ` (${encStr})` : ''}`; + steps.push(finding ? 
`${head} — finding: ${finding}` : head); } if (steps.length > 0) { @@ -623,8 +672,7 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ } // ── tool_start: agent is calling a tool (explore/inspect) ── - // (think tool is handled via thinking_text event, not here) - if (result.type === "tool_start" && result.tool !== "think") { + if (result.type === "tool_start") { // Show pending thought as a visible step before the tool step if (pendingThought) { thinkingSteps.push(pendingThought); @@ -654,8 +702,7 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ } // ── tool_result: mark the last tool step as done ── - // (skip for think tool — it doesn't add steps) - if (result.type === "tool_result" && result.tool !== "think") { + if (result.type === "tool_result") { const isError = result.status === "error" || !!result.error; for (let i = thinkingSteps.length - 1; i >= 0; i--) { if (!thinkingSteps[i].startsWith('✓') && !thinkingSteps[i].startsWith('✗')) { @@ -805,9 +852,14 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ const currentConcepts = [...conceptShelfItems.filter(c => names.includes(c.name)), ...allNewConcepts, ...conceptsToAdd]; let newChart = resolveRecommendedChart(refinedGoal, currentConcepts, candidateTable); + // Mark as unread by default; cleared below if we auto-focus it + // (i.e. it's the first artifact this run) or by setFocused when + // the user clicks the card. 
+ newChart.unread = true; createdCharts.push(newChart); dispatch(dfActions.addChart(newChart)); - if (shouldAutoFocusGeneratedChart(userChartFocusLockedRef.current)) { + if (!firstFocusedThisRunRef.current && shouldAutoFocusGeneratedChart(userChartFocusLockedRef.current)) { + firstFocusedThisRunRef.current = true; lastAutoFocusedChartIdRef.current = newChart.id; dispatch(dfActions.setFocused({ type: 'chart', chartId: newChart.id })); } @@ -891,6 +943,54 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ isCompleted = true; } + // ── delegate: agent hands off to a peer agent ── + // The data agent has decided the conversation is better + // served by another agent (data loading when the workspace + // lacks needed data; report gen when the user wants a + // narrative). We render the rationale + a one-click handoff + // card. Shares the 'clarifying' status / pending-clarification + // slot with the clarify/explain pauses so the panel renders in + // the same UI position above the input box. + if (result.type === "delegate") { + const message = String(result.message || '').trim(); + const rawOptions = Array.isArray(result.options) ? result.options : []; + const options: string[] = rawOptions + .map((o: any) => (typeof o === 'string' ? o.trim() : '')) + .filter((s: string) => s.length > 0) + .slice(0, 2); + const target = (result.target === 'report_gen' ? 
'report_gen' : 'data_loading') as 'data_loading' | 'report_gen'; + if (currentDraftId) { + const priorSteps = thinkingSteps.filter(s => s.trim()).join('\n'); + thinkingSteps = []; + pendingThought = ''; + dispatch(dfActions.updateDraftRunningPlan({ draftId: currentDraftId, plan: '' })); + + const pauseEntry: InteractionEntry = { + from: 'data-agent', to: 'user', + role: 'delegate', + plan: priorSteps || result.thought || undefined, + content: message, + delegateTarget: target, + delegateOptions: options, + timestamp: Date.now(), + }; + dispatch(dfActions.appendDraftInteraction({ draftId: currentDraftId, entry: pauseEntry })); + currentDraftInteraction.push(pauseEntry); + dispatch(dfActions.updateDeriveStatus({ nodeId: currentDraftId, status: 'clarifying' })); + dispatch(dfActions.updateDraftClarification({ draftId: currentDraftId, pendingClarification: { + trajectory: result.trajectory || [], + completedStepCount: result.completed_step_count || 0, + lastCreatedTableId, + }})); + } + setIsChatFormulating(false); + agentAbortRef.current = null; + clearTimeout(timeoutId); + setChatPrompt(""); + setAttachedImages([]); + isCompleted = true; + } + // ── completion: final summary ── if (result.type === "completion") { if (lastCreatedTableId) { @@ -972,7 +1072,7 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ allResults.push(data); processStreamingResult(data); - if (data.type === "completion" || data.type === "clarify" || data.type === "explain") { + if (data.type === "completion" || data.type === "clarify" || data.type === "explain" || data.type === "delegate") { handleCompletion(); return; } @@ -1020,6 +1120,9 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ const cleanPrompt = prompt.trim() || 'Create a report summarizing the exploration.'; setChatPrompt(''); + lastAutoFocusedChartIdRef.current = focusedId?.type === 'chart' ? 
focusedId.chartId : null; + firstFocusedThisRunRef.current = false; + userChartFocusLockedRef.current = false; setIsChatFormulating(true); // Build available charts list @@ -1083,6 +1186,35 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ agentAbortRef.current = controller; let accumulatedMarkdown = ''; + // Coalesce per-token updates: dispatching on every text_delta forces the + // Tiptap editor to re-parse the entire document each time, which makes + // the stream feel chunky / non-streaming. Batch updates on a short + // timer so the editor refreshes ~10×/sec while the wire still streams. + const FLUSH_INTERVAL_MS = 90; + let lastDispatched = ''; + let flushTimer: ReturnType | null = null; + const flushNow = () => { + if (flushTimer) { + clearTimeout(flushTimer); + flushTimer = null; + } + if (accumulatedMarkdown === lastDispatched) return; + lastDispatched = accumulatedMarkdown; + const titleMatch = accumulatedMarkdown.match(/^#\s+(.+)$/m); + dispatch(dfActions.updateGeneratedReportContent({ + id: reportId, + content: accumulatedMarkdown, + title: titleMatch ? titleMatch[1].trim() : undefined, + })); + }; + const scheduleFlush = () => { + if (flushTimer) return; + flushTimer = setTimeout(() => { + flushTimer = null; + flushNow(); + }, FLUSH_INTERVAL_MS); + }; + try { for await (const event of streamRequest(getUrls().GENERATE_REPORT_CHAT, { method: 'POST', @@ -1091,6 +1223,7 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ }, controller.signal)) { if (event.type === 'text_delta') { accumulatedMarkdown += (event as any).content; + scheduleFlush(); } else if (event.type === 'error') { const errMsg = event.error ? 
getErrorMessage(event.error) : t('messages.error'); accumulatedMarkdown += `\n\n**Error:** ${errMsg}`; @@ -1098,6 +1231,7 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ timestamp: Date.now(), type: 'error', component: 'report-agent', value: errMsg, })); + flushNow(); } else if (event.type === 'warning') { dispatch(dfActions.addMessages({ timestamp: Date.now(), type: 'warning', @@ -1105,15 +1239,11 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ value: (event as any).warning?.message ?? 'Warning from server', })); } - const titleMatch = accumulatedMarkdown.match(/^#\s+(.+)$/m); - dispatch(dfActions.updateGeneratedReportContent({ - id: reportId, - content: accumulatedMarkdown, - title: titleMatch ? titleMatch[1].trim() : undefined, - })); } - // Final update with completed status + // Final update with completed status — make sure the latest content + // is in state before we mark it complete. + flushNow(); const titleMatch = accumulatedMarkdown.match(/^#\s+(.+)$/m); dispatch(dfActions.updateGeneratedReportContent({ id: reportId, @@ -1130,11 +1260,31 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ })); } } finally { + if (flushTimer) { + clearTimeout(flushTimer); + flushTimer = null; + } agentAbortRef.current = null; setIsChatFormulating(false); } }, [focusedTableId, charts, tables, selectedTableIds, primaryTableIds, conceptShelfItems, activeModel, dispatch]); + // Honor cross-component handoff requests targeting the Report Gen + // agent (e.g. Data Agent's `delegate` card with target='report_gen'). + // Hand-offs targeting other agents (e.g. `data_loading`) are consumed + // elsewhere — we only react to ours. 
+ const agentHandoffRequest = useSelector((state: DataFormulatorState) => state.agentHandoffRequest); + useEffect(() => { + if (agentHandoffRequest && agentHandoffRequest.target === 'report_gen') { + const promptText = agentHandoffRequest.prompt; + dispatch(dfActions.clearAgentHandoffRequest()); + // Fire-and-forget: reportFromChat manages its own streaming + // state via Redux dispatches. + reportFromChat(promptText); + } + // eslint-disable-next-line react-hooks/exhaustive-deps + }, [agentHandoffRequest]); + // ── Unified submit handler ─────────────────────────────────────── const submitChat = useCallback((prompt: string, clarificationCtx?: any, displayPrompt?: string) => { if (selectedAgent === 'report') { @@ -1237,25 +1387,74 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ ? `linear-gradient(135deg, ${alpha(theme.palette.warning.main, 0.3)}, ${alpha(theme.palette.warning.dark, 0.25)})` : `linear-gradient(135deg, ${alpha(theme.palette.primary.main, 0.3)}, ${alpha(theme.palette.secondary.main, 0.25)})`; + // Landing / "no thread yet" highlight: when the user has loaded data + // but hasn't started an exploration on the focused table (no real + // charts AND the table isn't part of a derivation chain), gently pulse + // a colored ring around the input card to anchor the eye here. + // Suppressed once they start typing, while an agent is running, or + // while a clarification is pending — anything that already draws focus. + const focusedTableHasCharts = !!focusedTableId && charts.some(c => + c.tableRef === focusedTableId + && c.chartType !== '?' + && c.chartType !== 'Auto' + && c.source !== 'trigger' + ); + const focusedTableObj = focusedTableId ? 
tables.find(t => t.id === focusedTableId) : undefined; + const focusedTableHasDerivation = !!focusedTableObj && ( + focusedTableObj.derive !== undefined + || tables.some(t => t.derive?.trigger?.tableId === focusedTableId) + ); + const isLandingHighlight = ( + !!focusedTableId + && !focusedTableHasCharts + && !focusedTableHasDerivation + && !isChatFormulating + && !pendingClarification + && chatPrompt.trim() === '' + ); + const inputBox = ( void }> = function ({ }, }} > - {clarificationQuestions?.questions && pendingClarification && !isChatFormulating && ( + {clarificationQuestions?.kind === 'clarification' && clarificationQuestions.questions && pendingClarification && !isChatFormulating && ( void }> = function ({ onCancel={cancelAgent} /> )} + {clarificationQuestions?.kind === 'delegate' && pendingClarification && !isChatFormulating && ( + + )} {/* Input area wrapper */} {/* @-mention table chips and image attachments. @@ -1456,8 +1663,8 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ } fullWidth multiline - minRows={2} - maxRows={4} + minRows={3} + maxRows={6} /> {/* Action buttons */} @@ -1466,11 +1673,14 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ { e.stopPropagation(); setUploadDialogOpen(true); }} - sx={{ p: 0, width: 18, height: 18, color: theme.palette.text.secondary, + sx={{ + p: 0.5, + color: theme.palette.text.secondary, borderRadius: '4px', - '&:hover': { color: theme.palette.primary.main, borderColor: alpha(theme.palette.primary.main, 0.5) } }} + '&:hover': { color: theme.palette.primary.main, backgroundColor: alpha(theme.palette.primary.main, 0.06) }, + }} > - + {/* Agent mode toggle */} @@ -1480,22 +1690,22 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ onClick={() => setSelectedAgent(prev => prev === 'explore' ? 
'report' : 'explore')} sx={{ textTransform: 'none', - fontSize: 10, + fontSize: 11, minWidth: 0, - px: 0.75, + px: 0.875, py: 0, - height: 20, + height: 26, color: isReportMode ? theme.palette.warning.main : theme.palette.primary.main, borderRadius: '4px', display: 'flex', alignItems: 'center', - gap: '2px', + gap: '3px', '&:hover': { backgroundColor: alpha(isReportMode ? theme.palette.warning.main : theme.palette.primary.main, 0.08) }, }} > {selectedAgent === 'explore' - ? - : } + ? + : } {selectedAgent === 'explore' ? t('chartRec.modeExplore') : t('chartRec.modeReport')} @@ -1523,8 +1733,6 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ { if (pendingClarification) { @@ -1533,8 +1741,26 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ submitChat(chatPrompt); } }} + // When the user has typed text, promote + // the arrow to a contained primary + // affordance so it reads as the active + // submit action. Otherwise stay as a + // quiet outlined icon button. + sx={{ + p: 0, + width: 28, height: 28, + bgcolor: canSend ? 'primary.main' : 'transparent', + color: canSend ? 'common.white' : 'primary.main', + '&:hover': { + bgcolor: canSend ? 'primary.dark' : 'transparent', + }, + '&.Mui-disabled': { + bgcolor: 'transparent', + color: 'text.disabled', + }, + }} > - + @@ -1564,7 +1790,6 @@ export const SimpleChartRecBox: FC<{ onInputFocus?: () => void }> = function ({ open={uploadDialogOpen} onClose={() => setUploadDialogOpen(false)} initialTab="menu" - hideSampleDatasets /> ); diff --git a/src/views/TableSelectionView.tsx b/src/views/TableSelectionView.tsx deleted file mode 100644 index 8449265f..00000000 --- a/src/views/TableSelectionView.tsx +++ /dev/null @@ -1,188 +0,0 @@ -// Copyright (c) Microsoft Corporation. -// Licensed under the MIT License. 
- -import * as React from 'react'; -import { useEffect, useState, useMemo } from 'react'; -import { useTranslation } from 'react-i18next'; - -import Typography from '@mui/material/Typography'; -import Box from '@mui/material/Box'; -import { Button, Chip } from '@mui/material'; -import { borderColor } from '../app/tokens'; -import { StreamIcon } from '../icons'; -import { createTableFromFromObjectArray } from '../data/utils'; -import { MultiTablePreview } from './MultiTablePreview'; -import { DictTable } from '../components/ComponentType'; - -// Update the interface to support multiple tables per dataset -export interface DatasetMetadata { - name: string; - description: string; - source: string; - tables: { - table_name: string; - url: string; - format: string; - sample: any[]; - }[]; - // Live/streaming dataset properties - live?: boolean; - refreshIntervalSeconds?: number; -} - -export interface DatasetSelectionViewProps { - datasets: DatasetMetadata[]; - handleSelectDataset: (datasetMetadata: DatasetMetadata) => void; - handleSelectDatasetNewSession?: (datasetMetadata: DatasetMetadata) => void; - hideRowNum?: boolean; -} - -export const DatasetSelectionView: React.FC = function DatasetSelectionView({ datasets, handleSelectDataset, handleSelectDatasetNewSession, hideRowNum }) { - - const { t } = useTranslation(); - const [selectedDatasetName, setSelectedDatasetName] = useState(undefined); - const [tableActiveIndex, setTableActiveIndex] = useState(0); - - useEffect(() => { - if (datasets.length > 0) { - setSelectedDatasetName(datasets[0].name); - } - }, [datasets]); - - // Reset table active index when dataset changes - useEffect(() => { - setTableActiveIndex(0); - }, [selectedDatasetName]); - - const handleDatasetSelect = (index: number) => { - setSelectedDatasetName(datasets[index].name); - }; - - let datasetTitles : string[] = []; - for (let i = 0; i < datasets.length; i ++) { - let k = 0; - let title = datasets[i].name; - while (datasetTitles.includes(title)) { 
- k = k + 1; - title = `${title}_${k}`; - } - datasetTitles.push(title); - } - - // Convert dataset tables to DictTable objects for the preview - const selectedDataset = datasets.find(d => d.name === selectedDatasetName); - const previewTables: DictTable[] = useMemo(() => { - if (!selectedDataset) return []; - return selectedDataset.tables.map((table) => { - const dictTable = createTableFromFromObjectArray(table.table_name, table.sample, true); - // Use the table name from URL as displayId for better labeling - const displayName = table.url.split("/").pop()?.split(".")[0]?.split("?")[0] || table.table_name; - return { - ...dictTable, - displayId: displayName, - }; - }); - }, [selectedDataset]); - - return ( - - {/* Button navigation */} - - - {datasetTitles.map((title, i) => ( - - ))} - - - - {/* Content area */} - - - {selectedDataset && ( - - - - {selectedDataset.description} {t('tableSelection.fromSource', { source: selectedDataset.source })} - - - - - - - - {handleSelectDatasetNewSession && ( - - )} - - - )} - - - - ); -} diff --git a/src/views/TiptapReportEditor.tsx b/src/views/TiptapReportEditor.tsx index 655a8245..b9083511 100644 --- a/src/views/TiptapReportEditor.tsx +++ b/src/views/TiptapReportEditor.tsx @@ -6,6 +6,10 @@ import { useTranslation } from 'react-i18next'; import { useEditor, EditorContent, NodeViewWrapper, NodeViewProps, ReactNodeViewRenderer } from '@tiptap/react'; import StarterKit from '@tiptap/starter-kit'; import Image from '@tiptap/extension-image'; +import { Table } from '@tiptap/extension-table'; +import { TableRow } from '@tiptap/extension-table-row'; +import { TableHeader } from '@tiptap/extension-table-header'; +import { TableCell } from '@tiptap/extension-table-cell'; import { Markdown } from 'tiptap-markdown'; import { Box, Button, IconButton, Menu, MenuItem, Tooltip, Divider, useTheme, Typography } from '@mui/material'; import { alpha } from '@mui/material/styles'; @@ -194,6 +198,13 @@ export const TiptapReportEditor: FC = ({ 
ResizableImage.configure({ inline: false, }), + Table.configure({ + resizable: true, + HTMLAttributes: { class: 'report-table' }, + }), + TableRow, + TableHeader, + TableCell, Markdown.configure({ html: true, transformPastedText: true, @@ -584,6 +595,41 @@ export const TiptapReportEditor: FC = ({ borderTop: `1px solid ${alpha(theme.palette.divider, 0.5)}`, margin: '1.5em 0', }, + '& table': { + borderCollapse: 'collapse', + margin: '0.75em 0', + width: 'auto', + tableLayout: 'auto', + fontSize: '0.82em', + lineHeight: 1.35, + color: 'rgb(73, 73, 73)', + }, + '& table td, & table th': { + border: `1px solid ${alpha(theme.palette.divider, 0.55)}`, + padding: '4px 8px', + verticalAlign: 'top', + position: 'relative', + minWidth: '1em', + }, + '& table th': { + backgroundColor: alpha(theme.palette.text.primary, 0.04), + fontWeight: 600, + textAlign: 'left', + color: 'rgb(55, 53, 47)', + }, + '& .tableWrapper': { + overflowX: 'auto', + margin: '0.75em 0', + }, + '& .column-resize-handle': { + position: 'absolute', + right: -2, + top: 0, + bottom: -2, + width: 4, + backgroundColor: alpha(theme.palette.primary.main, 0.35), + pointerEvents: 'none', + }, }, }}> diff --git a/src/views/UnifiedDataUploadDialog.tsx b/src/views/UnifiedDataUploadDialog.tsx index 3cead17a..bd7167f8 100644 --- a/src/views/UnifiedDataUploadDialog.tsx +++ b/src/views/UnifiedDataUploadDialog.tsx @@ -2,9 +2,11 @@ // Licensed under the MIT License. 
import * as React from 'react'; -import { useState, useCallback, useEffect, useRef } from 'react'; +import { useState, useCallback, useEffect, useMemo, useRef } from 'react'; import { borderColor, transition, radius } from '../app/tokens'; import { + Alert, + AlertTitle, Box, Button, Chip, @@ -26,26 +28,25 @@ import UploadFileIcon from '@mui/icons-material/UploadFile'; import ContentPasteIcon from '@mui/icons-material/ContentPaste'; import LinkIcon from '@mui/icons-material/Link'; import { StreamIcon, getConnectorIcon, connectorSortOrder } from '../icons'; -import SmartToyOutlinedIcon from '@mui/icons-material/SmartToyOutlined'; -import ExploreIcon from '@mui/icons-material/Explore'; import RestartAltIcon from '@mui/icons-material/RestartAlt'; import ArrowBackIcon from '@mui/icons-material/ArrowBack'; import AddIcon from '@mui/icons-material/Add'; import Paper from '@mui/material/Paper'; import CircularProgress from '@mui/material/CircularProgress'; -import Backdrop from '@mui/material/Backdrop'; import { useDispatch, useSelector } from 'react-redux'; import { DataFormulatorState, dfActions } from '../app/dfSlice'; import { AppDispatch } from '../app/store'; -import { generateUUID } from '../app/identity'; import { loadTable } from '../app/tableThunks'; import { DataSourceConfig, DictTable, ConnectorInstance } from '../components/ComponentType'; import { createTableFromFromObjectArray, createTableFromText, loadTextDataWrapper, loadBinaryDataWrapper, readFileText } from '../data/utils'; import { DataLoadingChat } from './DataLoadingChat'; -import { DatasetSelectionView, DatasetMetadata } from './TableSelectionView'; -import { getUrls, fetchWithIdentity, CONNECTOR_URLS } from '../app/utils'; +import { AnimatedAgentToyIcon } from './AgentToyIcon'; +import { AgentChatInput } from './AgentChatInput'; +import { buildDataLoadingSuggestions } from './dataLoadingSuggestions'; +import { getUrls, CONNECTOR_URLS } from '../app/utils'; import { apiRequest } from 
'../app/apiClient'; +import { generateUUID } from '../app/identity'; import { DataLoaderForm } from './DBTableManager'; import { MultiTablePreview } from './MultiTablePreview'; import { @@ -59,7 +60,7 @@ import LanguageIcon from '@mui/icons-material/Language'; import { useTranslation } from 'react-i18next'; import { LocalInstallUpgradePanel } from './LocalInstallUpgradePanel'; -export type UploadTabType = 'menu' | 'upload' | 'paste' | 'url' | 'database' | 'extract' | 'explore' | 'local-folder' | 'add-connection' | `connector:${string}`; +export type UploadTabType = 'menu' | 'upload' | 'paste' | 'url' | 'database' | 'extract' | 'local-folder' | 'add-connection' | `connector:${string}`; interface TabPanelProps { children?: React.ReactNode; @@ -84,6 +85,27 @@ function TabPanel(props: TabPanelProps) { ); } +// Collapse a filesystem path's home prefix to `~` for privacy / readability +// (so screenshots and shared workspaces don't expose usernames). Recognizes +// macOS (/Users//), Linux (/home//), and Windows (C:\Users\\). +// If `tail` is given, also keeps only the last N path segments, prefixed +// with `…/` when truncated. +const displayPath = (p: string, tail?: number): string => { + if (!p) return ''; + let s = p.replace(/^\/Users\/[^/]+/, '~') + .replace(/^\/home\/[^/]+/, '~') + .replace(/^[A-Za-z]:\\Users\\[^\\]+/, '~'); + if (tail && tail > 0) { + const sep = s.includes('\\') ? '\\' : '/'; + const parts = s.split(sep).filter(Boolean); + if (parts.length > tail) { + const prefix = s.startsWith('~') ? '~' + sep : ''; + s = (prefix || '…' + sep) + parts.slice(-tail).join(sep); + } + } + return s; +}; + // Data source menu card component interface DataSourceCardProps { icon: React.ReactNode; @@ -98,6 +120,8 @@ interface DataSourceCardProps { */ variant?: 'data' | 'action'; badge?: React.ReactNode; + /** Optional hover tooltip; useful when `description` is truncated. 
*/ + tooltip?: React.ReactNode; } const DataSourceCard: React.FC = ({ @@ -108,6 +132,7 @@ const DataSourceCard: React.FC = ({ disabled = false, variant = 'data', badge, + tooltip, }) => { const theme = useTheme(); const isAction = variant === 'action'; @@ -180,7 +205,9 @@ const DataSourceCard: React.FC = ({ ); - return card; + return tooltip + ? {card} + : card; }; const getUniqueTableName = (baseName: string, existingNames: Set): string => { @@ -382,6 +409,7 @@ export { type ConnectorInstance } from '../components/ComponentType'; // Map connector source_type (class name) to i18n key suffix const CONNECTOR_TYPE_KEY_MAP: Record = { + SampleDatasetsLoader: 'sample_datasets', MySQLDataLoader: 'mysql', PostgreSQLDataLoader: 'postgresql', MSSQLDataLoader: 'mssql', @@ -418,31 +446,65 @@ export interface DataLoadMenuProps { * instead of opening an in-dialog catalog tab. */ onSelectConnector?: (connector: ConnectorInstance) => void; + /** + * Called when the user submits a prompt from the top-level Data Loading + * Agent chat box. Implementations should open the agent chat surface + * with the prompt (and optional pasted/attached images) pre-filled — + * typically auto-sent. If not provided, the chat box falls back to + * `onSelectTab('extract')`. + */ + onStartChat?: (prompt: string, images?: string[]) => void; + /** + * True when a prior data-loading agent conversation exists in + * state. When set together with `onResumeChat`, the menu renders + * a small "Resume previous" affordance next to the agent label so + * the user can re-open the previous thread instead of being forced + * to overwrite it with a new query. + */ + hasPriorConversation?: boolean; + /** + * Called when the user clicks the "Resume previous" affordance. + * Should open the agent chat surface without clearing the existing + * message history and without auto-sending anything. 
+ */ + onResumeChat?: () => void; serverConfig?: { WORKSPACE_BACKEND?: string; IS_LOCAL_MODE?: boolean }; - variant?: 'dialog' | 'page'; // 'dialog' uses smaller cards, 'page' uses larger cards - hideSampleDatasets?: boolean; connectors?: ConnectorInstance[]; } export const DataLoadMenu: React.FC = ({ onSelectTab, onSelectConnector, + onStartChat, + hasPriorConversation = false, + onResumeChat, serverConfig = { WORKSPACE_BACKEND: 'local' }, - variant = 'dialog', - hideSampleDatasets = false, connectors = [], }) => { const theme = useTheme(); const { t } = useTranslation(); - // Data source configurations + const dispatch = useDispatch(); + const activeWorkspace = useSelector((state: DataFormulatorState) => state.activeWorkspace); + // Backend requires an active workspace (X-Workspace-Id header) for + // scratch uploads and chat. The menu can be opened on the entry + // surface before any workspace has been picked, so we lazily mint + // one here — the parent's `openUploadDialog` does the same when it + // can, but we cover the path where this menu is rendered directly. + const ensureActiveWorkspace = (): string => { + if (activeWorkspace?.id) return activeWorkspace.id; + const now = new Date(); + const date = `${now.getFullYear()}${String(now.getMonth() + 1).padStart(2, '0')}${String(now.getDate()).padStart(2, '0')}`; + const time = `${String(now.getHours()).padStart(2, '0')}${String(now.getMinutes()).padStart(2, '0')}${String(now.getSeconds()).padStart(2, '0')}`; + const wsId = `session_${date}_${time}_${generateUUID().slice(0, 4)}`; + dispatch(dfActions.setActiveWorkspace({ id: wsId, displayName: 'Untitled Session' })); + return wsId; + }; + // Data source configurations (upload-style entries — file, paste, + // URL). The "Data Loading Agent" entry is surfaced separately as a + // chat box at the top of the menu. 
Sample datasets are no longer + // listed here — they're now exposed as the built-in `sample_datasets` + // connector in the Data Connectors section below. const regularDataSources = [ - { - value: 'explore' as UploadTabType, - title: t('upload.sampleDatasets'), - description: t('upload.sampleDatasetsDesc'), - icon: , - disabled: false - }, { value: 'upload' as UploadTabType, title: t('upload.uploadFile'), @@ -457,16 +519,9 @@ export const DataLoadMenu: React.FC = ({ icon: , disabled: false }, - { - value: 'extract' as UploadTabType, - title: t('upload.extractData'), - description: t('upload.extractDataDesc'), - icon: , - disabled: false - }, { value: 'url' as UploadTabType, - title: t('upload.loadFromUrl'), + title: t('upload.loadFromUrlTitle', { defaultValue: 'Load from URL (live)' }), description: t('upload.loadFromUrlDesc'), icon: , disabled: false, @@ -476,24 +531,29 @@ export const DataLoadMenu: React.FC = ({ '100%': { opacity: 1 }, } }} />, }, - ].filter(source => !(hideSampleDatasets && source.value === 'explore')); + ]; // Data connections — persistent configured sources (databases, services, etc.) - const connectionSources: Array<{ value: UploadTabType; title: string; description: string; icon: React.ReactNode; disabled: boolean; variant?: 'data' | 'action' }> = [ + const connectionSources: Array<{ value: UploadTabType; title: string; description: string; icon: React.ReactNode; disabled: boolean; variant?: 'data' | 'action'; tooltip?: React.ReactNode }> = [ // Per-connector cards — all instances ...connectors.map((conn) => { const isLocalFolder = conn.source_type === 'LocalFolderDataLoader' || conn.id.startsWith('local_folder'); const folderPath = isLocalFolder ? (conn.pinned_params?.root_dir || '') : ''; + // Show only the tail of the path on the card (privacy-friendly + // for screenshots), with the home-collapsed full path on hover. 
+ const folderDisplay = displayPath(folderPath, 2); + const folderTooltip = displayPath(folderPath); return { value: `connector:${conn.id}` as UploadTabType, title: conn.display_name, description: isLocalFolder - ? (folderPath || t('upload.localFolderConnected', { defaultValue: 'Local folder' })) + ? (folderDisplay || t('upload.localFolderConnected', { defaultValue: 'Local folder' })) : getConnectorTypeDescription(conn.source_type, conn.connected, t), icon: isLocalFolder ? : getConnectorIcon(conn.icon || conn.source_type), disabled: false, + tooltip: isLocalFolder && folderTooltip ? folderTooltip : undefined, }; }), // "Local Folder" card (action variant, local mode only) @@ -532,30 +592,168 @@ export const DataLoadMenu: React.FC = ({ onSelectTab(sourceValue); }; - if (variant === 'page') { - // Page variant: two sections stacked, local data in 3 columns, live sources in 2 columns with wrap - return ( - - {/* Local Data Sources */} + // ------------------------------------------------------------------ + // Data Loading Agent quick-chat box. Surfaced at the top of the menu + // so users can start a conversation with the agent directly. Pressing + // Enter (or the send button) hands the prompt/images off to the chat + // surface via `onStartChat`, which opens the agent and auto-sends. + // Falls back to `onSelectTab('extract')` if no handler is provided. + // ------------------------------------------------------------------ + const [agentInput, setAgentInput] = useState(''); + const [agentImages, setAgentImages] = useState([]); + const [agentAttachments, setAgentAttachments] = useState([]); + const submitAgentChat = () => { + const text = agentInput.trim(); + if (text.length === 0 && agentImages.length === 0 && agentAttachments.length === 0) { + // Empty submission — just open the chat surface. 
+ if (onStartChat) onStartChat('', []); + else onSelectTab('extract'); + return; + } + // Augment the outgoing prompt with `[Uploaded: name]` lines so the + // agent sees attachments as text references, without polluting + // the editable input the user sees. + const mentions = agentAttachments + .map(name => t('dataLoading.uploaded', { name })) + .join('\n'); + const finalText = mentions + ? (text ? `${text}\n${mentions}` : mentions) + : text; + if (onStartChat) { + onStartChat(finalText, agentImages); + } else { + onSelectTab('extract'); + } + setAgentInput(''); + setAgentImages([]); + setAgentAttachments([]); + }; + + // Suggestions surfaced as a focus-time dropdown — sourced from a shared + // factory so the in-session `DataLoadingChat` panel renders the exact + // same list. See `dataLoadingSuggestions.ts`. + const agentChatSuggestions = useMemo(() => buildDataLoadingSuggestions({ + t, + setInput: setAgentInput, + setImages: setAgentImages, + setAttachments: setAgentAttachments, + ensureActiveWorkspace, + }), [t]); + const agentChatBox = ( + + + + + {t('upload.dataLoadingAgent', { defaultValue: 'Data Loading Agent' })} + + {hasPriorConversation && onResumeChat && ( + + + {t('upload.resumePreviousConversation', { defaultValue: 'Previous conversation →' })} + + + )} + + { + // Upload non-image files (Excel, CSV, JSON, …) to the + // session scratch space. The filename is shown as a + // chip; the `[Uploaded: name]` mention is appended to + // the outgoing prompt at send-time so the editable + // input stays clean. 
+ ensureActiveWorkspace(); + const formData = new FormData(); + formData.append('file', file); + apiRequest(getUrls().SCRATCH_UPLOAD_URL, { + method: 'POST', body: formData, + }).then(() => { + setAgentAttachments(prev => [...prev, file.name]); + }).catch(err => console.error('Upload failed:', err)); + }} + attachments={agentAttachments} + onAttachmentsChange={setAgentAttachments} + minRows={1} + tabSuggestion={t('upload.agentChatTabSuggestion', { + defaultValue: 'What dataset do we have here?', + })} + focusSuggestionsLabel={t('upload.agentChatSuggestionsLabel', { defaultValue: 'Try asking' })} + focusSuggestions={agentChatSuggestions} + placeholder={t('upload.agentChatPlaceholder', { + defaultValue: 'Ask the agent to find datasets, or extract data from an image or text…', + })} + sendTooltip={t('upload.agentChatSendTooltip', { defaultValue: 'Start chatting with the agent' })} + /> + + ); + + return ( + + {/* Data Loading Agent quick-chat */} + {agentChatBox} + + {/* Upload data */} + - {t('upload.importData')} + {t('upload.uploadData', { defaultValue: 'Upload data' })} = ({ /> ))} + - {/* Data Connections */} + {/* Data Connections */} + = ({ onClick={() => handleConnectionClick(source.value)} disabled={source.disabled} variant={source.variant} + tooltip={source.tooltip} /> ))} - ); - } - - // Dialog variant: two-section layout - return ( - - {/* Import Data */} - - {t('upload.importData')} - - - - {regularDataSources.map((source) => ( - onSelectTab(source.value)} - disabled={source.disabled} - badge={source.badge} - /> - ))} - - - {/* Data Connections */} - - - {t('upload.dataConnections')} - - - - {connectionSources.map((source) => ( - handleConnectionClick(source.value)} - disabled={source.disabled} - variant={source.variant} - /> - ))} - ); }; @@ -721,6 +832,16 @@ interface LoaderType { auth_mode?: string; auth_instructions?: string; delegated_login?: { login_url: string; label?: string } | null; + source?: 'plugin' | 'builtin'; + source_path?: string | null; 
+} + +interface PluginsInfo { + dir: string; + enabled: boolean; + reason: string; + loaded: Array<{ type: string; name: string; source_path: string }>; + errors: Array<{ file: string; reason: string; kind: string }>; } const AddConnectionPanel: React.FC<{ @@ -732,6 +853,7 @@ const AddConnectionPanel: React.FC<{ ); const [loaderTypes, setLoaderTypes] = useState([]); const [disabledLoaders, setDisabledLoaders] = useState>({}); + const [pluginsInfo, setPluginsInfo] = useState(null); const [selectedType, setSelectedType] = useState(''); const [displayName, setDisplayName] = useState(''); const dispatch = useDispatch(); @@ -750,6 +872,7 @@ const AddConnectionPanel: React.FC<{ .then(({ data }) => { setLoaderTypes(data.loaders || []); setDisabledLoaders(data.disabled || {}); + setPluginsInfo(data.plugins || null); if (data.loaders?.length > 0) { setSelectedType(data.loaders[0].type); setDisplayName(data.loaders[0].name); @@ -853,17 +976,47 @@ const AddConnectionPanel: React.FC<{ }}> {t('upload.dataSourceTypes', { defaultValue: 'Data Sources' })} - {[...loaderTypes].sort((a, b) => connectorSortOrder(a.type, b.type)).map((loader) => ( - - ))} + {[...loaderTypes].sort((a, b) => connectorSortOrder(a.type, b.type)).map((loader) => { + const isPlugin = loader.source === 'plugin'; + const btn = ( + + ); + return isPlugin ? ( + + {btn} + + ) : btn; + })} {Object.entries(disabledLoaders).sort(([a], [b]) => connectorSortOrder(a, b)).map(([name, { install_hint }]) => ( @@ -886,6 +1039,23 @@ const AddConnectionPanel: React.FC<{ {/* Right panel: display name + DataLoaderForm (or simplified Local Folder panel) */} + {/* Plugin rejection banner — surfaces plugins that failed to load + so users notice broken extensions. Successful loads are indicated + by the "plugin" tag next to the loader name in the sidebar. */} + {pluginsInfo && pluginsInfo.errors.length > 0 && ( + + + + {pluginsInfo.errors.length} plugin{pluginsInfo.errors.length === 1 ? 
'' : 's'} rejected + + {pluginsInfo.errors.map((e, i) => ( + + {e.file.split('/').pop()}: {e.reason} + + ))} + + + )} {selectedLoader && selectedType === 'local_folder' ? ( /* Simplified Local Folder panel — no connection name, no form tiers */ void; initialTab?: UploadTabType; - hideSampleDatasets?: boolean; + /** + * Optional initial prompt to hand off to the Data Loading Agent. When + * non-empty and `initialTab === 'extract'`, the prompt is pre-filled + * and auto-sent in the chat panel. + */ + initialChatPrompt?: string; + /** Optional images (data URLs) to seed the chat alongside `initialChatPrompt`. */ + initialChatImages?: string[]; onConnectorsChanged?: () => void; } @@ -950,7 +1127,8 @@ export const UnifiedDataUploadDialog: React.FC = ( open, onClose, initialTab = 'menu', - hideSampleDatasets = false, + initialChatPrompt, + initialChatImages, onConnectorsChanged, }) => { const theme = useTheme(); @@ -965,6 +1143,21 @@ export const UnifiedDataUploadDialog: React.FC = ( const existingNames = new Set(existingTables.map(t => t.id)); const [activeTab, setActiveTab] = useState(initialTab === 'menu' ? 'menu' : initialTab); + // Prompt to seed the agent chat with. Sourced from the `initialChatPrompt` + // prop when the dialog opens directly on 'extract', or set internally + // when the user submits the in-menu agent chat box. + const [seededChatPrompt, setSeededChatPrompt] = useState( + initialTab === 'extract' ? initialChatPrompt : undefined, + ); + const [seededChatImages, setSeededChatImages] = useState( + initialTab === 'extract' ? 
initialChatImages : undefined, + ); + const [autoSendSeededPrompt, setAutoSendSeededPrompt] = useState( + initialTab === 'extract' && ( + (!!initialChatPrompt && initialChatPrompt.trim().length > 0) + || (!!initialChatImages && initialChatImages.length > 0) + ), + ); const fileInputRef = useRef(null); const urlInputRef = useRef(null); @@ -982,8 +1175,27 @@ export const UnifiedDataUploadDialog: React.FC = ( if (open) { setConnectorInstances([]); refreshConnectors(); + // Re-seed chat prompt/images from props each time the dialog opens. + if (initialTab === 'extract') { + setSeededChatPrompt(initialChatPrompt); + setSeededChatImages(initialChatImages); + const hasText = !!initialChatPrompt && initialChatPrompt.trim().length > 0; + const hasImages = !!initialChatImages && initialChatImages.length > 0; + setAutoSendSeededPrompt(hasText || hasImages); + // Opening the dialog with a fresh prompt/images means the + // user wants a new data-loading conversation; clear any + // stale messages from a previous session so the new query + // isn't appended to an unrelated thread. 
+ if ((hasText || hasImages) && dataLoadingChatMessages.length > 0) { + dispatch(dfActions.clearChatMessages()); + } + } else { + setSeededChatPrompt(undefined); + setSeededChatImages(undefined); + setAutoSendSeededPrompt(false); + } } - }, [open, refreshConnectors, identityKey]); + }, [open, refreshConnectors, identityKey, initialTab, initialChatPrompt, initialChatImages]); // Storage is determined by backend config — no user toggle const isEphemeral = serverConfig.WORKSPACE_BACKEND === 'ephemeral'; @@ -1014,16 +1226,9 @@ export const UnifiedDataUploadDialog: React.FC = ( // Example URLs state const [exampleUrls, setExampleUrls] = useState>([]); - // Sample datasets state - const [datasetPreviews, setDatasetPreviews] = useState([]); - // Loading state for table loading (file/URL/paste) const [tableLoading, setTableLoading] = useState(false); - // Loading state for dataset loading - const [datasetLoading, setDatasetLoading] = useState(false); - const [datasetLoadingLabel, setDatasetLoadingLabel] = useState(''); - // Constants const MAX_DISPLAY_LINES = 20; const LARGE_CONTENT_THRESHOLD = 50000; @@ -1056,81 +1261,29 @@ export const UnifiedDataUploadDialog: React.FC = ( }, [open, activeTab, connectorInstances]); - // Load sample datasets + // Load example URLs for the URL tab. The demo-stream/info endpoint + // returns plain JSON (no {status:'success', data:...} envelope), so we + // bypass apiRequest's envelope check and call fetch directly. useEffect(() => { - if (open && activeTab === 'explore') { - apiRequest(`${getUrls().EXAMPLE_DATASETS}`) - .then(({ data: result }) => { - let datasets: DatasetMetadata[] = result.map((info: any) => { - let tables = info["tables"].map((table: any) => { - if (table["format"] == "json") { - return { - table_name: table["name"], - url: table["url"], - format: table["format"], - sample: table["sample"], - } - } - else if (table["format"] == "csv" || table["format"] == "tsv") { - const delimiter = table["format"] === "csv" ? 
"," : "\t"; - const rows = table["sample"] - .split("\n") - .map((row: string) => row.split(delimiter)); - - if (rows.length > 0) { - const headers = rows[0]; - const dataRows = rows.slice(1); - const sampleData = dataRows.map((row: string[]) => { - const obj: any = {}; - headers.forEach((header: string, index: number) => { - obj[header] = row[index] || ''; - }); - return obj; - }); - - return { - table_name: table["name"], - url: table["url"], - format: table["format"], - sample: sampleData, - }; - } - - return { - table_name: table["name"], - url: table["url"], - format: table["format"], - sample: [], - }; - } - }) - return { - tables: tables, - name: info["name"], - source: info["source"], - live: info["live"], - refreshIntervalSeconds: info["refreshIntervalSeconds"] - } - }).filter((t: DatasetMetadata | undefined) => t != undefined); - setDatasetPreviews(datasets); - }); - } else if (open && activeTab === 'url') { - apiRequest(`${window.location.origin}/api/demo-stream/info`) - .then(({ data }) => { - const demoExamples = data.demo_examples + if (!(open && activeTab === 'url')) return; + let cancelled = false; + fetch(`${window.location.origin}/api/demo-stream/info`) + .then(r => r.ok ? 
r.json() : Promise.reject(new Error(`HTTP ${r.status}`))) + .then((data) => { + if (cancelled) return; + const demoExamples = (data.demo_examples || []) .map((ex: any) => ({ label: ex.name, url: ex.url, refreshSeconds: ex.refresh_seconds || 60, resetUrl: ex.reset_url || undefined, - })); - + })); setExampleUrls(demoExamples); }) .catch((err) => { - console.error('Failed to load examples:', err); + if (!cancelled) console.error('Failed to load demo examples:', err); }); - } + return () => { cancelled = true; }; }, [open, activeTab]); const handleClose = useCallback(() => { @@ -1569,6 +1722,9 @@ export const UnifiedDataUploadDialog: React.FC = ( const hasMultipleUrlTables = (urlPreviewTables?.length || 0) > 1; const showFilePreview = filePreviewLoading || !!filePreviewError || (filePreviewTables && filePreviewTables.length > 0); const showUrlPreview = urlPreviewLoading || !!urlPreviewError || (urlPreviewTables && urlPreviewTables.length > 0); + const showExamples = exampleUrls.length > 0 + && (!urlPreviewTables || urlPreviewTables.length === 0) + && !urlPreviewLoading; const hasPasteContent = (pasteContent || '').trim() !== ''; // Get current tab title for header @@ -1583,7 +1739,6 @@ export const UnifiedDataUploadDialog: React.FC = ( } const tabTitles: Record = { 'menu': t('upload.title'), - 'explore': t('upload.sampleDatasets'), 'upload': t('upload.uploadFile'), 'paste': t('upload.pasteData'), 'extract': t('upload.dataAssistant'), @@ -1679,6 +1834,7 @@ export const UnifiedDataUploadDialog: React.FC = ( {/* Main Menu */} + setActiveTab(tab)} onSelectConnector={(conn) => { @@ -1692,11 +1848,35 @@ export const UnifiedDataUploadDialog: React.FC = ( setActiveTab(`connector:${conn.id}` as UploadTabType); } }} + onStartChat={(prompt, images) => { + const hasText = prompt.trim().length > 0; + const hasImages = !!images && images.length > 0; + // If a prior conversation exists, treat a + // new query from the menu as a fresh data + // reload and reset the chat. 
Without this + // the new prompt would be appended onto an + // unrelated thread, confusing the agent. + if ((hasText || hasImages) && dataLoadingChatMessages.length > 0) { + dispatch(dfActions.clearChatMessages()); + } + setSeededChatPrompt(prompt); + setSeededChatImages(images); + setAutoSendSeededPrompt(hasText || hasImages); + setActiveTab('extract'); + }} + hasPriorConversation={dataLoadingChatMessages.length > 0} + onResumeChat={() => { + // Reopen the existing thread without + // clearing messages or auto-sending. + setSeededChatPrompt(undefined); + setSeededChatImages(undefined); + setAutoSendSeededPrompt(false); + setActiveTab('extract'); + }} serverConfig={serverConfig} - variant="dialog" - hideSampleDatasets={hideSampleDatasets} connectors={connectorInstances} /> + @@ -1815,10 +1995,19 @@ export const UnifiedDataUploadDialog: React.FC = ( p: 2, justifyContent: showUrlPreview ? 'flex-start' : 'center', }}> - - {/* URL Input */} - - + + {/* Left column: URL input + Watch Mode */} + + = ( onChange={(e) => setTableURL((e.target.value || '').trim())} inputRef={urlInputRef} error={tableURL !== "" && !hasValidUrl} - helperText={tableURL !== "" && !hasValidUrl ? t('upload.helperText.urlInvalid') : undefined} + helperText={tableURL !== "" && !hasValidUrl + ? t('upload.helperText.urlInvalid') + : t('upload.urlFormatHint')} size="small" - sx={{ + sx={{ flex: 1, - '& .MuiInputBase-input': { - fontSize: '0.875rem', - }, - '& .MuiInputBase-input::placeholder': { - fontSize: '0.875rem', - }, + '& .MuiInputBase-input, & .MuiInputBase-input::placeholder': { fontSize: '0.875rem' }, + '& .MuiFormHelperText-root': { fontSize: '0.75rem', ml: 0.5 }, }} /> - - {t('upload.urlFormatHint')} - - - - {/* Watch/Auto-refresh options - always visible */} - - + + {/* Watch mode — single inline row, no card wrapper. */} + = ( } label={ - {t('upload.watchMode')} + {t('upload.autoRefresh')} } + sx={{ mr: 0.25 }} /> - {urlAutoRefresh ? 
( - - - {t('upload.checkUpdatesEvery')} + {urlAutoRefresh && ( + <> + + every {[ { seconds: 5, label: '5s' }, @@ -1885,118 +2068,110 @@ export const UnifiedDataUploadDialog: React.FC = ( { seconds: 1800, label: '30m' }, { seconds: 3600, label: '1h' }, { seconds: 86400, label: '24h' }, - ].map((opt) => ( - setUrlRefreshInterval(opt.seconds)} - sx={{ - cursor: 'pointer', - fontSize: '0.7rem', - height: 24, - }} - /> - ))} - - ) : - {t('upload.watchHint')} - } - + ].map((opt) => { + const selected = urlRefreshInterval === opt.seconds; + return ( + setUrlRefreshInterval(opt.seconds)} + sx={{ + cursor: 'pointer', + fontSize: '0.7rem', + height: 20, + border: 'none', + bgcolor: selected ? 'primary.main' : 'transparent', + color: selected ? 'primary.contrastText' : 'text.secondary', + fontWeight: selected ? 600 : 400, + '& .MuiChip-label': { px: 0.75 }, + '&:hover': { bgcolor: selected ? 'primary.dark' : 'action.hover' }, + }} + /> + ); + })} + + )} - + - {/* Example APIs - Compact List */} - {(!urlPreviewTables || urlPreviewTables.length === 0) && !urlPreviewLoading && ( - - + {/* Right column: example APIs. 
*/} + {showExamples && ( + + {t('upload.tryExamples')} - - {exampleUrls.map((example) => ( - ( + + { - console.log('example', example); - if (example.url) { - - setTableURL(example.url); - setUrlAutoRefresh(true); - setUrlRefreshInterval(example.refreshSeconds || 60); - handleURLPreview(example.url); - } + if (!example.url) return; + setTableURL(example.url); + setUrlAutoRefresh(true); + setUrlRefreshInterval(example.refreshSeconds || 60); + handleURLPreview(example.url); }} sx={{ + fontSize: '0.75rem', + color: 'primary.main', cursor: 'pointer', - '&::before': { - content: '"• "', - color: 'text.secondary', - marginRight: 0.5, - } + overflow: 'hidden', + textOverflow: 'ellipsis', + whiteSpace: 'nowrap', + '&:hover': { textDecoration: 'underline' }, }} > - + {example.resetUrl && ( + { + e.stopPropagation(); + apiRequest(`${window.location.origin}${example.resetUrl}`, { method: 'POST' }) + .then(() => { + dispatch(dfActions.addMessages({ + timestamp: Date.now(), type: 'success', + component: 'data upload', value: 'Example data reset successful', + })); + }) + .catch(() => { + dispatch(dfActions.addMessages({ + timestamp: Date.now(), type: 'error', + component: 'data upload', value: 'Failed to reset example data', + })); + }); + }} + sx={{ + fontSize: '0.7rem', + color: 'text.secondary', + cursor: 'pointer', + display: 'inline-flex', + alignItems: 'center', + gap: 0.25, + '&:hover': { color: 'warning.main' }, }} > - {example.label} + + {t('upload.resetLabel')} - {example.resetUrl && ( - { - e.stopPropagation(); - apiRequest(`${window.location.origin}${example.resetUrl}`, { method: 'POST' }) - .then(() => { - dispatch(dfActions.addMessages({ - timestamp: Date.now(), type: 'success', - component: 'data upload', value: 'Example data reset successful', - })); - }) - .catch(() => { - dispatch(dfActions.addMessages({ - timestamp: Date.now(), type: 'error', - component: 'data upload', value: 'Failed to reset example data', - })); - }); - }} - sx={{ - fontSize: '0.7rem', - 
color: 'text.secondary', - ml: 1, - cursor: 'pointer', - display: 'inline-flex', - alignItems: 'center', - gap: 0.25, - '&:hover': { color: 'warning.main' }, - }} - > - - {t('upload.resetLabel')} - - )} - - ))} - + )} + + ))} )} @@ -2228,7 +2403,11 @@ export const UnifiedDataUploadDialog: React.FC = ( {/* Extract Data Tab */} - + {/* Local Folder Tab */} @@ -2250,147 +2429,7 @@ export const UnifiedDataUploadDialog: React.FC = ( )} - {/* Explore Sample Datasets Tab */} - - - { - // Check if this is a live dataset - const isLiveDataset = dataset.live === true; - - setDatasetLoading(true); - setDatasetLoadingLabel(t('upload.loadingDataset', { name: dataset.name })); - - try { - const loadPromises = dataset.tables.map(async (table) => { - // For live datasets with relative URLs, construct full URL - let fullUrl = table.url; - if (table.url.startsWith('/')) { - fullUrl = window.location.origin + table.url; - } - - const res = await fetch(fullUrl); - const textData = await res.text(); - let tableName = table.url.split("/").pop()?.split(".")[0]?.split("?")[0] || 'table-' + Date.now().toString().substring(0, 8); - let dictTable; - if (table.format == "csv") { - dictTable = createTableFromText(tableName, textData); - } else if (table.format == "json") { - dictTable = createTableFromFromObjectArray(tableName, JSON.parse(textData), true); - } - if (dictTable) { - // For live datasets, set up as stream source with auto-refresh - if (isLiveDataset) { - dictTable.source = { - type: 'stream', - url: fullUrl, - autoRefresh: true, - refreshIntervalSeconds: dataset.refreshIntervalSeconds || 60, - lastRefreshed: Date.now() - }; - } else { - // Regular example data - dictTable.source = { type: 'example', url: table.url }; - } - await dispatch(loadTable({ table: dictTable })); - } - }); - await Promise.all(loadPromises); - } catch (error) { - console.error('Failed to load dataset:', error); - } finally { - setDatasetLoading(false); - setDatasetLoadingLabel(''); - } - handleClose(); - 
}} - handleSelectDatasetNewSession={activeWorkspace ? async (dataset) => { - const now = new Date(); - const date = `${now.getFullYear()}${String(now.getMonth() + 1).padStart(2, '0')}${String(now.getDate()).padStart(2, '0')}`; - const time = `${String(now.getHours()).padStart(2, '0')}${String(now.getMinutes()).padStart(2, '0')}${String(now.getSeconds()).padStart(2, '0')}`; - const short = generateUUID().slice(0, 4); - const wsId = `session_${date}_${time}_${short}`; - dispatch(dfActions.resetForNewWorkspace({ id: wsId, displayName: dataset.name })); - - const isLiveDataset = dataset.live === true; - - setDatasetLoading(true); - setDatasetLoadingLabel(t('upload.loadingDataset', { name: dataset.name })); - - try { - const loadPromises = dataset.tables.map(async (table) => { - let fullUrl = table.url; - if (table.url.startsWith('/')) { - fullUrl = window.location.origin + table.url; - } - - const res = await fetch(fullUrl); - const textData = await res.text(); - let tableName = table.url.split("/").pop()?.split(".")[0]?.split("?")[0] || 'table-' + Date.now().toString().substring(0, 8); - let dictTable; - if (table.format == "csv") { - dictTable = createTableFromText(tableName, textData); - } else if (table.format == "json") { - dictTable = createTableFromFromObjectArray(tableName, JSON.parse(textData), true); - } - if (dictTable) { - if (isLiveDataset) { - dictTable.source = { - type: 'stream', - url: fullUrl, - autoRefresh: true, - refreshIntervalSeconds: dataset.refreshIntervalSeconds || 60, - lastRefreshed: Date.now() - }; - } else { - dictTable.source = { type: 'example', url: table.url }; - } - await dispatch(loadTable({ table: dictTable })); - } - }); - await Promise.all(loadPromises); - } catch (error) { - console.error('Failed to load dataset:', error); - } finally { - setDatasetLoading(false); - setDatasetLoadingLabel(''); - } - handleClose(); - } : undefined} - /> - - - - - {/* Loading overlay for dataset loading */} - theme.zIndex.drawer + 1, - 
backgroundColor: 'rgba(255, 255, 255, 0.85)', - backdropFilter: 'blur(4px)', - display: 'flex', - flexDirection: 'column', - gap: 2, - }} - > - - - {datasetLoadingLabel || t('upload.loadingData')} - - - ); }; diff --git a/src/views/ViewUtils.tsx b/src/views/ViewUtils.tsx index 368dea4a..e54b9cad 100644 --- a/src/views/ViewUtils.tsx +++ b/src/views/ViewUtils.tsx @@ -14,6 +14,24 @@ import CommitIcon from '@mui/icons-material/Commit'; import { DictTable } from '../components/ComponentType'; +/** + * Compact MUI Menu/Select slotProps for dense list surfaces (column filter + * popover, table-header kebab). Keeps menu rows visually consistent across + * the data-grid filter UI (design-doc 31). + */ +export const DENSE_MENU_SLOT_PROPS = { + paper: { + sx: { + '& .MuiMenuItem-root': { + fontSize: 12, + minHeight: 28, + py: 0.25, + px: 1, + }, + }, + }, +} as const; + export const groupConceptItems = (conceptShelfItems: FieldItem[], tables: DictTable[]) => { // group concepts based on which source table they belongs to return conceptShelfItems.map(f => { @@ -62,7 +80,14 @@ export const getIconFromType = (t: Type | undefined): JSX.Element => { export const formatCellValue = (value: any, dataType?: Type, semanticType?: string): string => { if (value == null) return ''; - if (typeof value === 'number' && dataType !== Type.Duration) { + if (dataType === Type.DateTime || dataType === Type.Date || dataType === Type.Time) { + return formatTemporalValue(value, dataType); + } + if (dataType === Type.Duration) { + return formatDuration(value); + } + + if (typeof value === 'number') { if (!Number.isFinite(value)) return String(value); if (shouldDisplayNumericSemanticAsPlainText(semanticType)) return String(value); if (Number.isInteger(value)) { @@ -74,13 +99,6 @@ export const formatCellValue = (value: any, dataType?: Type, semanticType?: stri }); } - if (dataType === Type.DateTime || dataType === Type.Date || dataType === Type.Time) { - return formatTemporalValue(value, dataType); - } 
- if (dataType === Type.Duration) { - return formatDuration(value); - } - if (typeof value === 'boolean') return String(value); if (typeof value === 'object') return String(value); return String(value); @@ -110,6 +128,10 @@ const shouldDisplayNumericSemanticAsPlainText = (semanticType?: string): boolean const formatTemporalValue = (value: any, dataType: Type): string => { if (dataType === Type.Time) { + if (typeof value === 'number') { + const d = new Date(value); + if (!isNaN(d.getTime())) return d.toLocaleTimeString(); + } const d = new Date(`1970-01-01T${value}`); if (isNaN(d.getTime())) return String(value); return d.toLocaleTimeString(); diff --git a/src/views/VisualizationView.tsx b/src/views/VisualizationView.tsx index b6ab396c..96d91d8c 100644 --- a/src/views/VisualizationView.tsx +++ b/src/views/VisualizationView.tsx @@ -46,6 +46,7 @@ import '../scss/DataView.scss'; import { useDispatch, useSelector } from 'react-redux'; import { DataFormulatorState, dfActions, fetchChartInsight } from '../app/dfSlice'; import { assembleVegaChart, extractFieldsFromEncodingMap, getUrls, prepVisTable, fetchWithIdentity } from '../app/utils'; +import { displayRowsCache } from '../app/displayRowsCache'; import { buildEmbeddedDataForChart } from '../app/restyle'; import { apiRequest } from '../app/apiClient'; import embed from 'vega-embed'; @@ -54,17 +55,16 @@ import { DictTable } from "../components/ComponentType"; import AddchartIcon from '@mui/icons-material/Addchart'; import DeleteIcon from '@mui/icons-material/Delete'; -import StarIcon from '@mui/icons-material/Star'; import TerminalIcon from '@mui/icons-material/Terminal'; -import StarBorderIcon from '@mui/icons-material/StarBorder'; import QuestionAnswerIcon from '@mui/icons-material/QuestionAnswer'; import ContentCopyIcon from '@mui/icons-material/ContentCopy'; import ZoomInIcon from '@mui/icons-material/ZoomIn'; import ZoomOutIcon from '@mui/icons-material/ZoomOut'; -import AutoStoriesIcon from 
'@mui/icons-material/AutoStories'; +import FunctionsIcon from '@mui/icons-material/Functions'; import CasinoIcon from '@mui/icons-material/Casino'; import SaveAltIcon from '@mui/icons-material/SaveAlt'; import OpenInNewIcon from '@mui/icons-material/OpenInNew'; +import { AgentToyIcon, AnimatedAgentToyIcon } from './AgentToyIcon'; import { CHART_TEMPLATES, getChartTemplate } from '../components/ChartTemplates'; @@ -239,11 +239,6 @@ export let SampleSizeEditor: FC<{ } -/** - * Module-level cache: avoids re-fetching server data when switching back to a chart. - */ -const displayRowsCache = new Map(); - /** Main chart uses vega-embed (interactive tooltips). Static toSVG() removes hover behavior. */ const VegaChartRenderer: FC<{ chart: Chart; @@ -255,8 +250,9 @@ const VegaChartRenderer: FC<{ scaleFactor: number; maxStretchFactor?: number; chartUnavailable: boolean; + insightTitle?: string; onSpecReady?: (spec: any | null) => void; -}> = React.memo(({ chart, conceptShelfItems, visTableRows, tableMetadata, chartWidth, chartHeight, scaleFactor, maxStretchFactor, chartUnavailable, onSpecReady }) => { +}> = React.memo(({ chart, conceptShelfItems, visTableRows, tableMetadata, chartWidth, chartHeight, scaleFactor, maxStretchFactor, chartUnavailable, insightTitle, onSpecReady }) => { const dispatch = useDispatch(); const elementId = `focused-chart-element-${chart.id}`; @@ -294,8 +290,7 @@ const VegaChartRenderer: FC<{ chart, visTableRows, tableMetadata, conceptShelfItems, ); spec.data = { values: variantValues }; - spec.width = chartWidth; - spec.height = chartHeight; + } else { spec = assembleVegaChart( chart.chartType, @@ -329,6 +324,24 @@ const VegaChartRenderer: FC<{ } spec['background'] = 'white'; + + // Inject the insight title into the Vega-Lite spec instead of rendering + // it as outside HTML. 
Vega-Lite anchors the title against the plot group + // (frame: 'group'), so it stays centered over the actual chart area even + // when a legend pushes the embed wrapper off-center. We don't override a + // title already supplied by a style variant. + if (insightTitle && !spec.title) { + const faceted = !!(chart.encodingMap.column?.fieldID || chart.encodingMap.row?.fieldID); + spec.title = { + text: insightTitle, + anchor: 'middle', + fontWeight: 500, + fontSize: 13, + color: '#555', + offset: 12, + }; + } + onSpecReady?.(spec); const el = document.getElementById(elementId); @@ -359,7 +372,7 @@ const VegaChartRenderer: FC<{ el.innerHTML = ''; }; - }, [chart.id, chart.chartType, chart.encodingMap, chart.config, chart.activeVariantId, chart.styleVariants, conceptShelfItems, visTableRows, tableMetadata, chartWidth, chartHeight, scaleFactor, maxStretchFactor, chartUnavailable, onSpecReady, elementId]); + }, [chart.id, chart.chartType, chart.encodingMap, chart.config, chart.activeVariantId, chart.styleVariants, conceptShelfItems, visTableRows, tableMetadata, chartWidth, chartHeight, scaleFactor, maxStretchFactor, chartUnavailable, insightTitle, onSpecReady, elementId]); if (chart.chartType === "Auto") { return @@ -387,12 +400,13 @@ const VegaChartRenderer: FC<{ } return ( - + @@ -571,6 +585,7 @@ export const ChartEditorFC: FC<{}> = function ChartEditorFC({}) { setVisTableTotalRowCount(data.total_row_count); setDataVersion(versionId); displayRowsCache.set(versionId, { rows: data.rows, totalCount: data.total_row_count }); + dispatch(dfActions.bumpDisplayRowsTick()); } // Else: this response is stale, ignore it }) @@ -593,6 +608,7 @@ export const ChartEditorFC: FC<{}> = function ChartEditorFC({}) { setDataVersion(versionId); // Cache for instant reuse on chart revisit displayRowsCache.set(versionId, { rows: clonedRows, totalCount: table.rows.length }); + dispatch(dfActions.bumpDisplayRowsTick()); } } @@ -665,21 +681,6 @@ export const ChartEditorFC: FC<{}> = function 
ChartEditorFC({}) { }, }; - let saveButton = ( - - - { - if (!chartUnavailable) { - dispatch(dfActions.saveUnsaveChart(focusedChart.id)); - } - }}> - {focusedChart.saved ? : } - - - - ); - let deleteButton = ( @@ -724,6 +725,7 @@ export const ChartEditorFC: FC<{}> = function ChartEditorFC({}) { backgroundColor: active ? 'rgba(25, 118, 210, 0.08)' : 'transparent', transition: 'all 0.15s ease', minWidth: 'auto', + '& .MuiButton-startIcon': { mr: 0.5 }, '&:hover': { backgroundColor: 'rgba(25, 118, 210, 0.08)', color: 'primary.main', @@ -734,7 +736,7 @@ export const ChartEditorFC: FC<{}> = function ChartEditorFC({}) { ); @@ -743,14 +745,14 @@ export const ChartEditorFC: FC<{}> = function ChartEditorFC({}) { , ...(hasConcepts ? [ ] : []), @@ -772,13 +774,10 @@ export const ChartEditorFC: FC<{}> = function ChartEditorFC({}) { sx={toggleBtnSx(bottomTab === 'insight')} startIcon={insightLoading ? : } onClick={() => { - setBottomTab(prev => { - if (prev === 'insight') return ''; - if (!insightFresh && !insightLoading) { - dispatch(fetchChartInsight({ chartId: focusedChart.id, tableId: table.id }) as any); - } - return 'insight'; - }); + if (bottomTab !== 'insight' && !insightFresh && !insightLoading) { + dispatch(fetchChartInsight({ chartId: focusedChart.id, tableId: table.id }) as any); + } + setBottomTab('insight'); }}> {t('chart.insight')} @@ -790,7 +789,6 @@ export const ChartEditorFC: FC<{}> = function ChartEditorFC({}) { ...derivedTableItems, , logButton, - saveButton, // vegaEditorButton, deleteButton, ] @@ -849,15 +847,23 @@ export const ChartEditorFC: FC<{}> = function ChartEditorFC({}) { let focusedElement = - {insightFresh && focusedChart.insight?.title && ( - - {focusedChart.insight.title} - - )} - + {/* + Chart container chrome + ────────────────────── + - pt: 40 → reserves a strip at the top so the absolutely + positioned zoom-slider overlay (chartResizer, ~32px tall + anchored top-left) never covers chart content. 
Without this, + full-width charts like KPI grids run right up under the slider. + - pr: 28 → reserves a strip on the right for vega-embed's + actions menu ("..."), which floats at the top-right of the + Vega canvas and can otherwise hug / extend past the panel edge. + - minHeight: 280 → guarantees the Vega actions menu and its + dropdown have vertical room to render even when a chart's + intrinsic height is very small (e.g. one row of compact cards). + These are view-level concerns and intentionally NOT solved per + chart template. + */} + = function ChartEditorFC({}) { scaleFactor={localScaleFactor} maxStretchFactor={config.maxStretchFactor} chartUnavailable={chartUnavailable} + insightTitle={insightFresh && focusedChart.insight?.title ? focusedChart.insight.title : undefined} onSpecReady={handleSpecReady} /> @@ -1101,6 +1108,37 @@ export const ChartEditorFC: FC<{}> = function ChartEditorFC({}) { } +// Landing / empty-state hero shown when no chart is focused AND there is +// no existing thread (no charts, no derived/ancestor tables) for the +// focused table — i.e., the very first moment after data is loaded. +// Leads with a friendly welcome and points the user toward the chat input +// at the bottom-left, then presents the manual chart palette below an +// "or" divider so it's still one click away. +const EmptyStateHero: FC<{ chartSelectionBox: React.ReactNode }> = ({ chartSelectionBox }) => { + const { t } = useTranslation(); + return ( + + + + + {t('chart.emptyStateSubtitle')} + + + {/* "or" divider + manual chart picker — always visible on the + fresh-start landing so a user who'd rather start manually + isn't gated behind an extra click. 
*/} + + + {t('chart.orCreateYourself')} + + + + {chartSelectionBox} + + + ); +}; + export const VisualizationViewFC: FC = function VisualizationView({ }) { const { t } = useTranslation(); @@ -1125,19 +1163,30 @@ export const VisualizationViewFC: FC = function VisualizationView // when there is no result and synthesis is running, just show the waiting panel if (!focusedChart || focusedChart?.chartType == "?") { - let chartSelectionBox = + let chartSelectionBox = {Object.entries(CHART_TEMPLATES) .filter(([category, templates]) => category !== "Custom" && templates.some(t => t.chart !== "Auto")) .map(([category, templates]) => ( - + + {category} {templates .filter(t => t.chart !== "Auto") .map((t, index) => ( )) } @@ -1163,14 +1214,45 @@ export const VisualizationViewFC: FC = function VisualizationView - - {focusedTableId ? : null} - - - {t('chart.orStartWithChartType')} - - - {chartSelectionBox} + + {(() => { + // "Has thread" = the focused table is + // already part of an exploration: it has + // real charts, or it's derived from / + // feeds into another table. In that case + // we keep the original compact layout + // (provenance ribbon + "or" + palette). + // Otherwise (fresh start, just-loaded + // data) we show the welcoming hero with + // a chat pointer. + const hasRealCharts = !!focusedTableId && allCharts.some(c => + c.tableRef === focusedTableId + && c.chartType !== '?' + && c.chartType !== 'Auto' + && c.source !== 'trigger' + ); + const focusedTable = focusedTableId ? tables.find(t => t.id === focusedTableId) : undefined; + const hasDerivation = !!focusedTable && ( + focusedTable.derive !== undefined + || tables.some(t => t.derive?.trigger?.tableId === focusedTableId) + ); + const hasThread = hasRealCharts || hasDerivation; + + if (hasThread) { + return ( + <> + {focusedTableId ? 
: null} + + + {t('chart.orStartWithChartType')} + + + {chartSelectionBox} + + ); + } + return ; + })()} {focusedId?.type === 'table' && focusedTableId && (() => { diff --git a/src/views/dataLoadingSuggestions.ts b/src/views/dataLoadingSuggestions.ts new file mode 100644 index 00000000..8d91b92b --- /dev/null +++ b/src/views/dataLoadingSuggestions.ts @@ -0,0 +1,145 @@ +// Shared sample-task suggestions for the Data Loading Agent. Both the +// upload-dialog entry point (`UnifiedDataUploadDialog`) and the in-session +// chat panel (`DataLoadingChat`) render these via `AgentChatInput`'s +// `focusSuggestions` dropdown. Keep this single source of truth so the +// two surfaces stay in sync. + +import { TFunction } from 'i18next'; +import React from 'react'; +import QuestionAnswerOutlinedIcon from '@mui/icons-material/QuestionAnswerOutlined'; +import SearchIcon from '@mui/icons-material/Search'; +import ImageOutlinedIcon from '@mui/icons-material/ImageOutlined'; +import DescriptionOutlinedIcon from '@mui/icons-material/DescriptionOutlined'; +import TableChartOutlinedIcon from '@mui/icons-material/TableChartOutlined'; +import { apiRequest } from '../app/apiClient'; +import { getUrls } from '../app/utils'; +import exampleImageTable from '../assets/example-image-table.png'; + +export interface DataLoadingSuggestion { + kind: string; + label: string; + icon?: React.ReactNode; + onClick: () => void; +} + +export interface BuildSuggestionsArgs { + t: TFunction; + setInput: (value: string) => void; + setImages: (images: string[]) => void; + setAttachments: (names: string[]) => void; + /** Optional hook that workspaces use to make sure a session exists before uploading. 
*/ + ensureActiveWorkspace?: () => void; +} + +const EXCEL_SAMPLE_NAME = 'climate-gas-indicator.xlsx'; + +export function buildDataLoadingSuggestions( + { t, setInput, setImages, setAttachments, ensureActiveWorkspace }: BuildSuggestionsArgs, +): DataLoadingSuggestion[] { + const kindAsk = t('upload.agentChatSuggestion.kind.ask', { defaultValue: 'ask' }); + const kindFind = t('upload.agentChatSuggestion.kind.find', { defaultValue: 'find' }); + const kindExtract = t('upload.agentChatSuggestion.kind.extract', { defaultValue: 'extract' }); + + const askLabel = t('upload.agentChatSuggestion.askConnected', { + defaultValue: 'What datasets do we have from connected sources?', + }); + const findLabel = t('upload.agentChatSuggestion.findCPI', { + defaultValue: 'Help me load consumer price index data', + }); + const extractExcelLabel = t('upload.agentChatSuggestion.extractFromExcel', { + defaultValue: 'Extract data from an attached Excel file', + }); + const extractImageLabel = t('dataLoading.examples.extractFromImageExample', { + defaultValue: 'Extract revenue data from this image', + }); + const extractTextLabel = t('dataLoading.examples.extractFromTextExample', { + defaultValue: 'Extract revenue growth data from this text: Business Highlights ...', + }); + const extractTextPrompt = t('dataLoading.examples.extractFromTextPrompt', { + defaultValue: extractTextLabel, + }); + + const iconSx = { fontSize: 14 }; + + return [ + { + kind: kindAsk, + label: askLabel, + icon: React.createElement(QuestionAnswerOutlinedIcon, { sx: iconSx }), + onClick: () => { + setImages([]); + setAttachments([]); + setInput(askLabel); + }, + }, + { + kind: kindFind, + label: findLabel, + icon: React.createElement(SearchIcon, { sx: iconSx }), + onClick: () => { + setImages([]); + setAttachments([]); + setInput(findLabel); + }, + }, + { + kind: kindExtract, + label: extractExcelLabel, + icon: React.createElement(TableChartOutlinedIcon, { sx: iconSx }), + onClick: () => { + // Surface the attachment 
chip synchronously so it is + // always present when the user hits send, even if the + // upload below is still mid-flight. The chip is what + // gets serialised into the outgoing `[Uploaded: name]` + // mention and ultimately the chat bubble. + setImages([]); + setAttachments([EXCEL_SAMPLE_NAME]); + setInput(extractExcelLabel); + ensureActiveWorkspace?.(); + fetch(`/${EXCEL_SAMPLE_NAME}`) + .then(res => res.blob()) + .then(blob => { + const file = new File([blob], EXCEL_SAMPLE_NAME, { + type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', + }); + const formData = new FormData(); + formData.append('file', file); + return apiRequest(getUrls().SCRATCH_UPLOAD_URL, { + method: 'POST', body: formData, + }); + }) + .catch(err => console.error('Sample Excel upload failed:', err)); + }, + }, + { + kind: kindExtract, + label: extractImageLabel, + icon: React.createElement(ImageOutlinedIcon, { sx: iconSx }), + onClick: () => { + fetch(exampleImageTable) + .then(res => res.blob()) + .then(blob => { + const reader = new FileReader(); + reader.onload = () => { + if (reader.result) { + setImages([reader.result as string]); + setAttachments([]); + setInput(extractImageLabel); + } + }; + reader.readAsDataURL(blob); + }); + }, + }, + { + kind: kindExtract, + label: extractTextLabel, + icon: React.createElement(DescriptionOutlinedIcon, { sx: iconSx }), + onClick: () => { + setImages([]); + setAttachments([]); + setInput(extractTextPrompt); + }, + }, + ]; +} diff --git a/tests/backend/agents/test_agent_language.py b/tests/backend/agents/test_agent_language.py new file mode 100644 index 00000000..9dd6c474 --- /dev/null +++ b/tests/backend/agents/test_agent_language.py @@ -0,0 +1,234 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. + +"""Unit tests for data_formulator.agents.agent_language. 
+ +Covers: +- build_language_instruction: English returns empty string, non-English + returns a non-empty instruction, mode='compact' vs mode='full', + unknown language codes, empty/None input, extra rules injection (zh/ja). +- inject_language_instruction: appending, marker-based insertion, + empty instruction is a no-op, marker not found falls back to append. +- LANGUAGE_DISPLAY_NAMES and LANGUAGE_EXTRA_RULES registry sanity. +""" + +from __future__ import annotations + +import pytest + +from data_formulator.agents.agent_language import ( + LANGUAGE_DISPLAY_NAMES, + LANGUAGE_EXTRA_RULES, + DEFAULT_LANGUAGE, + build_language_instruction, + inject_language_instruction, +) + +pytestmark = [pytest.mark.backend] + + +# --------------------------------------------------------------------------- +# LANGUAGE_DISPLAY_NAMES registry +# --------------------------------------------------------------------------- + +class TestLanguageRegistry: + def test_english_in_registry(self): + assert "en" in LANGUAGE_DISPLAY_NAMES + + def test_common_languages_present(self): + for lang in ["zh", "ja", "ko", "fr", "de", "es", "pt"]: + assert lang in LANGUAGE_DISPLAY_NAMES, f"{lang!r} missing from LANGUAGE_DISPLAY_NAMES" + + def test_display_names_are_non_empty_strings(self): + for code, name in LANGUAGE_DISPLAY_NAMES.items(): + assert isinstance(name, str) and name.strip(), \ + f"Display name for {code!r} is blank or not a string" + + def test_default_language_is_en(self): + assert DEFAULT_LANGUAGE == "en" + + def test_extra_rules_values_are_strings(self): + for code, rule in LANGUAGE_EXTRA_RULES.items(): + assert isinstance(rule, str), f"Extra rule for {code!r} is not a string" + + def test_extra_rules_codes_are_subset_of_display_names(self): + """Every code that has extra rules must also have a display name.""" + for code in LANGUAGE_EXTRA_RULES: + assert code in LANGUAGE_DISPLAY_NAMES, \ + f"{code!r} has extra rules but no display name" + + +# 
--------------------------------------------------------------------------- +# build_language_instruction — English / empty input +# --------------------------------------------------------------------------- + +class TestBuildLanguageInstructionEnglish: + def test_english_returns_empty_string(self): + assert build_language_instruction("en") == "" + + def test_english_compact_also_returns_empty(self): + assert build_language_instruction("en", mode="compact") == "" + + def test_empty_string_defaults_to_en_returns_empty(self): + assert build_language_instruction("") == "" + + def test_none_coerced_to_default_returns_empty(self): + # None is cast via (language or DEFAULT_LANGUAGE) — equivalent to "en" + assert build_language_instruction(None) == "" # type: ignore[arg-type] + + def test_whitespace_only_returns_empty(self): + assert build_language_instruction(" ") == "" + + def test_case_insensitive_en(self): + assert build_language_instruction("EN") == "" + assert build_language_instruction("En") == "" + + +# --------------------------------------------------------------------------- +# build_language_instruction — non-English, full mode +# --------------------------------------------------------------------------- + +class TestBuildLanguageInstructionFull: + @pytest.mark.parametrize("lang", ["zh", "ja", "ko", "fr", "de", "es"]) + def test_non_english_returns_non_empty(self, lang): + result = build_language_instruction(lang) + assert result, f"Expected non-empty instruction for {lang!r}" + + def test_result_contains_language_marker(self): + result = build_language_instruction("zh") + assert "[LANGUAGE INSTRUCTION]" in result + + def test_result_contains_display_name(self): + result = build_language_instruction("zh") + assert "Simplified Chinese" in result + + def test_full_mode_is_default(self): + assert build_language_instruction("zh") == build_language_instruction("zh", mode="full") + + def test_full_mode_mentions_user_visible_fields(self): + result = 
build_language_instruction("ja") + assert "title" in result + assert "display_instruction" in result + + def test_full_mode_mentions_internal_fields(self): + result = build_language_instruction("fr") + assert "output_variable" in result or "JSON" in result + + def test_zh_extra_rules_injected(self): + result = build_language_instruction("zh") + assert "Simplified Chinese" in result + assert "Traditional" in result # warns not to use Traditional Chinese + + def test_ja_extra_rules_injected(self): + result = build_language_instruction("ja") + assert "です" in result or "ます" in result + + def test_lang_without_extra_rules_has_no_extra_block(self): + """Languages without LANGUAGE_EXTRA_RULES should not error.""" + # French has no extra rules + result = build_language_instruction("fr") + assert result # just must not be empty and must not raise + + +# --------------------------------------------------------------------------- +# build_language_instruction — compact mode +# --------------------------------------------------------------------------- + +class TestBuildLanguageInstructionCompact: + def test_compact_returns_non_empty_for_non_english(self): + result = build_language_instruction("zh", mode="compact") + assert result + + def test_compact_contains_language_marker(self): + result = build_language_instruction("zh", mode="compact") + assert "[LANGUAGE INSTRUCTION]" in result + + def test_compact_shorter_than_full(self): + full = build_language_instruction("zh", mode="full") + compact = build_language_instruction("zh", mode="compact") + assert len(compact) < len(full) + + def test_compact_mentions_display_instruction(self): + result = build_language_instruction("ko", mode="compact") + assert "display_instruction" in result + + def test_compact_instructs_english_for_code(self): + result = build_language_instruction("de", mode="compact") + assert "English" in result + + def test_compact_zh_extra_rules_present(self): + result = build_language_instruction("zh", 
mode="compact") + assert "Simplified" in result + + +# --------------------------------------------------------------------------- +# build_language_instruction — unknown language codes +# --------------------------------------------------------------------------- + +class TestBuildLanguageInstructionUnknown: + def test_unknown_code_returns_non_empty(self): + """An unknown code should still produce some instruction (graceful degradation).""" + result = build_language_instruction("xx") + assert result # should not be empty since "xx" != "en" + + def test_unknown_code_uses_raw_code_as_display_name(self): + result = build_language_instruction("xx") + # When not in LANGUAGE_DISPLAY_NAMES, the code itself is used as name + assert "xx" in result + + +# --------------------------------------------------------------------------- +# inject_language_instruction +# --------------------------------------------------------------------------- + +class TestInjectLanguageInstruction: + BASE = "You are a helpful assistant.\n[RULES]\nFollow the rules." + INSTRUCTION = "[LANGUAGE INSTRUCTION]\nRespond in French." 
+ + def test_empty_instruction_is_noop(self): + result = inject_language_instruction(self.BASE, "") + assert result == self.BASE + + def test_non_empty_instruction_appended_by_default(self): + result = inject_language_instruction(self.BASE, self.INSTRUCTION) + assert result.endswith(self.INSTRUCTION) + assert result.startswith("You are a helpful assistant.") + + def test_instruction_appended_after_base(self): + result = inject_language_instruction("base prompt", "language block") + assert result == "base prompt\n\nlanguage block" + + def test_marker_found_inserts_before_marker(self): + result = inject_language_instruction(self.BASE, self.INSTRUCTION, marker="[RULES]") + # Instruction should appear before "[RULES]" + idx_instr = result.index(self.INSTRUCTION) + idx_rules = result.index("[RULES]") + assert idx_instr < idx_rules + + def test_marker_not_found_falls_back_to_append(self): + result = inject_language_instruction(self.BASE, self.INSTRUCTION, marker="[NONEXISTENT]") + assert result.endswith(self.INSTRUCTION) + + def test_marker_at_start_of_string_not_inserted(self): + """If marker is at position 0, idx > 0 is False, so fall back to append.""" + result = inject_language_instruction("[RULES]\nrest", self.INSTRUCTION, marker="[RULES]") + # marker at 0 → append + assert result.endswith(self.INSTRUCTION) + + def test_original_prompt_is_preserved_in_output(self): + result = inject_language_instruction(self.BASE, self.INSTRUCTION) + assert self.BASE in result + + def test_marker_insertion_preserves_rest_of_prompt(self): + result = inject_language_instruction(self.BASE, self.INSTRUCTION, marker="[RULES]") + assert "[RULES]" in result + assert "Follow the rules." in result + + def test_round_trip_with_build_and_inject(self): + """build_language_instruction + inject_language_instruction integration.""" + instruction = build_language_instruction("zh") + base = "System prompt here.\n[BEGIN]\nDo things."
+ result = inject_language_instruction(base, instruction, marker="[BEGIN]") + assert "Simplified Chinese" in result + assert "[BEGIN]" in result + assert "Do things." in result diff --git a/tests/backend/agents/test_client_image_strip.py b/tests/backend/agents/test_client_image_strip.py index e7447e0c..ddfac240 100644 --- a/tests/backend/agents/test_client_image_strip.py +++ b/tests/backend/agents/test_client_image_strip.py @@ -203,16 +203,15 @@ def test_no_retry_on_success(self, mock_litellm, client): # --------------------------------------------------------------------------- -# get_completion retry logic (openai path) +# get_completion retry logic (openai endpoint — now also via litellm) # --------------------------------------------------------------------------- class TestGetCompletionRetryOpenAI: - @patch("data_formulator.agents.client_utils.openai") - def test_retries_on_image_error(self, mock_openai_mod, openai_client): - mock_client = MagicMock() - mock_openai_mod.OpenAI.return_value = mock_client + """OpenAI endpoint now uses litellm.completion like all other providers.""" - mock_client.chat.completions.create.side_effect = [ + @patch("data_formulator.agents.client_utils.litellm") + def test_retries_on_image_error(self, mock_litellm, openai_client): + mock_litellm.completion.side_effect = [ Exception('image_url is not supported, expected `text`'), MagicMock(name="success_response"), ] @@ -225,18 +224,14 @@ def test_retries_on_image_error(self, mock_openai_mod, openai_client): ] result = openai_client.get_completion(messages) - assert mock_client.chat.completions.create.call_count == 2 - retry_kwargs = mock_client.chat.completions.create.call_args_list[1][1] - retry_msgs = retry_kwargs.get("messages", retry_kwargs) + assert mock_litellm.completion.call_count == 2 assert result is not None - @patch("data_formulator.agents.client_utils.openai") - def test_raises_unrelated_error(self, mock_openai_mod, openai_client): - mock_client = MagicMock() - 
mock_openai_mod.OpenAI.return_value = mock_client - mock_client.chat.completions.create.side_effect = Exception("Unauthorized") + @patch("data_formulator.agents.client_utils.litellm") + def test_raises_unrelated_error(self, mock_litellm, openai_client): + mock_litellm.completion.side_effect = Exception("Unauthorized") with pytest.raises(Exception, match="Unauthorized"): openai_client.get_completion([{"role": "user", "content": "hi"}]) - assert mock_client.chat.completions.create.call_count == 1 + assert mock_litellm.completion.call_count == 1 diff --git a/tests/backend/agents/test_client_utils.py b/tests/backend/agents/test_client_utils.py new file mode 100644 index 00000000..02872a56 --- /dev/null +++ b/tests/backend/agents/test_client_utils.py @@ -0,0 +1,235 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. + +"""Unit tests for data_formulator.agents.client_utils.Client. + +Tests cover the pure-logic parts that don't require a live LLM: +- Model name prefixing for gemini / anthropic / ollama +- Ollama api_base normalisation (trailing /api stripping) +- Image block stripping helpers (_strip_image_blocks, _strip_images_from_messages) +- Image deserialise error detection (_is_image_deserialize_error) +- Client.from_config constructor +""" + +from __future__ import annotations + +import pytest + +from data_formulator.agents.client_utils import Client + +pytestmark = [pytest.mark.backend] + + +# --------------------------------------------------------------------------- +# Model name prefixing +# --------------------------------------------------------------------------- + +class TestModelNamePrefixing: + def test_gemini_prefix_added_when_missing(self): + c = Client("gemini", "gemini-1.5-pro", api_key="k") + assert c.model == "gemini/gemini-1.5-pro" + + def test_gemini_prefix_not_doubled(self): + c = Client("gemini", "gemini/gemini-1.5-pro", api_key="k") + assert c.model == "gemini/gemini-1.5-pro" + + def 
test_anthropic_prefix_added_when_missing(self): + c = Client("anthropic", "claude-3-opus-20240229", api_key="k") + assert c.model == "anthropic/claude-3-opus-20240229" + + def test_anthropic_prefix_not_doubled(self): + c = Client("anthropic", "anthropic/claude-3", api_key="k") + assert c.model == "anthropic/claude-3" + + def test_ollama_prefix_added_when_missing(self): + c = Client("ollama", "llama3", api_base="http://localhost:11434") + assert c.model == "ollama/llama3" + + def test_ollama_prefix_not_doubled(self): + c = Client("ollama", "ollama/llama3", api_base="http://localhost:11434") + assert c.model == "ollama/llama3" + + def test_openai_model_prefixed(self): + c = Client("openai", "gpt-4o", api_key="k") + assert c.model == "openai/gpt-4o" + + +# --------------------------------------------------------------------------- +# Ollama api_base normalisation +# --------------------------------------------------------------------------- + +class TestOllamaApiBaseNormalisation: + def test_trailing_slash_stripped(self): + c = Client("ollama", "llama3", api_base="http://localhost:11434/") + assert not c.params["api_base"].endswith("/") + + def test_trailing_api_stripped(self): + """Users sometimes copy-paste the URL ending in /api — we strip it.""" + c = Client("ollama", "llama3", api_base="http://localhost:11434/api") + assert not c.params["api_base"].endswith("/api") + assert c.params["api_base"] == "http://localhost:11434" + + def test_trailing_api_slash_stripped(self): + c = Client("ollama", "llama3", api_base="http://localhost:11434/api/") + assert c.params["api_base"] == "http://localhost:11434" + + def test_non_api_suffix_preserved(self): + c = Client("ollama", "llama3", api_base="http://myserver:11434/ollama") + assert c.params["api_base"] == "http://myserver:11434/ollama" + + def test_default_base_when_none(self): + c = Client("ollama", "llama3") + assert c.params["api_base"] == "http://localhost:11434" + + +# 
--------------------------------------------------------------------------- +# _strip_image_blocks +# --------------------------------------------------------------------------- + +class TestStripImageBlocks: + def setup_method(self): + self.client = Client("openai", "gpt-4o", api_key="k") + + def test_string_content_unchanged(self): + result = self.client._strip_image_blocks("hello") + assert result == "hello" + + def test_image_url_blocks_removed(self): + content = [ + {"type": "text", "text": "Describe this"}, + {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}, + ] + result = self.client._strip_image_blocks(content) + assert len(result) == 1 + assert result[0]["type"] == "text" + + def test_non_image_blocks_preserved(self): + content = [ + {"type": "text", "text": "Hello"}, + {"type": "text", "text": "World"}, + ] + result = self.client._strip_image_blocks(content) + assert len(result) == 2 + + def test_mixed_list_with_non_dict_preserved(self): + content = [ + "plain string", + {"type": "image_url", "image_url": {}}, + {"type": "text", "text": "keep"}, + ] + result = self.client._strip_image_blocks(content) + assert "plain string" in result + assert any(isinstance(r, dict) and r.get("type") == "text" for r in result) + assert not any(isinstance(r, dict) and r.get("type") == "image_url" for r in result) + + def test_all_images_removed_returns_empty_list(self): + content = [{"type": "image_url"}, {"type": "image_url"}] + result = self.client._strip_image_blocks(content) + assert result == [] + + +# --------------------------------------------------------------------------- +# _strip_images_from_messages +# --------------------------------------------------------------------------- + +class TestStripImagesFromMessages: + def setup_method(self): + self.client = Client("openai", "gpt-4o", api_key="k") + + def _multimodal_messages(self): + return [ + {"role": "system", "content": "You are helpful."}, + {"role": "user", "content": [ + {"type": 
"text", "text": "What is in this image?"}, + {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}, + ]}, + ] + + def test_system_message_unchanged(self): + msgs = self._multimodal_messages() + result = self.client._strip_images_from_messages(msgs) + assert result[0]["content"] == "You are helpful." + + def test_image_blocks_removed_from_user_message(self): + msgs = self._multimodal_messages() + result = self.client._strip_images_from_messages(msgs) + user_content = result[1]["content"] + assert all( + not (isinstance(b, dict) and b.get("type") == "image_url") + for b in user_content + ) + + def test_text_blocks_preserved_in_user_message(self): + msgs = self._multimodal_messages() + result = self.client._strip_images_from_messages(msgs) + user_content = result[1]["content"] + assert any(isinstance(b, dict) and b.get("type") == "text" for b in user_content) + + def test_original_messages_not_mutated(self): + msgs = self._multimodal_messages() + original_len = len(msgs[1]["content"]) + self.client._strip_images_from_messages(msgs) + assert len(msgs[1]["content"]) == original_len + + def test_non_dict_messages_preserved(self): + msgs = ["plain string message"] + result = self.client._strip_images_from_messages(msgs) + assert result == ["plain string message"] + + +# --------------------------------------------------------------------------- +# _is_image_deserialize_error +# --------------------------------------------------------------------------- + +class TestIsImageDeserializeError: + def setup_method(self): + self.client = Client("openai", "gpt-4o", api_key="k") + + def test_image_url_expected_text_detected(self): + err = "Error: image_url content part was sent but expected `text` content" + assert self.client._is_image_deserialize_error(err) is True + + def test_unknown_variant_image_url_detected(self): + err = "unknown variant `image_url`, expected one of `text`, `audio`" + assert self.client._is_image_deserialize_error(err) is True + + def 
test_unrelated_error_not_detected(self): + assert self.client._is_image_deserialize_error("rate limit exceeded") is False + + def test_empty_string_not_detected(self): + assert self.client._is_image_deserialize_error("") is False + + def test_partial_match_image_url_without_expected(self): + """'image_url' alone without 'expected `text`' should NOT match.""" + assert self.client._is_image_deserialize_error("received image_url block") is False + + +# --------------------------------------------------------------------------- +# Client.from_config +# --------------------------------------------------------------------------- + +class TestFromConfig: + def test_creates_client_from_dict(self): + cfg = {"endpoint": "openai", "model": "gpt-4o", "api_key": "mykey"} + c = Client.from_config(cfg) + assert c.endpoint == "openai" + assert c.model == "openai/gpt-4o" + assert c.params["api_key"] == "mykey" + + def test_strips_whitespace_from_values(self): + cfg = {"endpoint": " openai ", "model": " gpt-4o ", "api_key": " key "} + c = Client.from_config(cfg) + assert c.endpoint == "openai" + assert c.model == "openai/gpt-4o" + assert c.params["api_key"] == "key" + + def test_optional_fields_absent_when_empty(self): + cfg = {"endpoint": "openai", "model": "gpt-4o", "api_key": "k"} + c = Client.from_config(cfg) + # api_base and api_version should not be set when absent + assert "api_base" not in c.params or c.params.get("api_base", "") == "" + + def test_gemini_prefix_applied_via_from_config(self): + cfg = {"endpoint": "gemini", "model": "gemini-pro", "api_key": "k"} + c = Client.from_config(cfg) + assert c.model.startswith("gemini/") diff --git a/tests/backend/agents/test_data_agent_clarification.py b/tests/backend/agents/test_data_agent_clarification.py index 65789a55..16107e6c 100644 --- a/tests/backend/agents/test_data_agent_clarification.py +++ b/tests/backend/agents/test_data_agent_clarification.py @@ -117,3 +117,125 @@ def fake_get_next_action(trajectory, input_tables, 
outer_iteration=0): assert "id" not in q for opt in q.get("options", []): assert "id" not in opt + + +class TestDataAgentDelegate: + """Tests for the delegate action.""" + + def test_emits_delegate_event_for_data_loading(self, monkeypatch) -> None: + agent = _agent() + + def fake_get_next_action(trajectory, input_tables, outer_iteration=0): + yield { + "type": "agent_action", + "action_data": { + "action": "delegate", + "thought": "User asked about Q4 sales but no sales table is loaded.", + "target": "data_loading", + "message": "I don't see a sales table loaded — want to import one?", + "options": ["quarterly sales 2024"], + }, + "reason": "ok", + "llm_calls": 1, + } + + monkeypatch.setattr(agent, "_get_next_action", fake_get_next_action) + + events = list(agent.run([], "", trajectory=[{"role": "system", "content": "test"}])) + + evt = events[-1] + assert evt["type"] == "delegate" + assert evt["target"] == "data_loading" + assert evt["options"] == ["quarterly sales 2024"] + assert evt["message"] == "I don't see a sales table loaded — want to import one?" + assert evt["thought"] == "User asked about Q4 sales but no sales table is loaded." 
+ assert "trajectory" in evt + assert evt["completed_step_count"] == 0 + + def test_emits_delegate_event_for_report_gen(self, monkeypatch) -> None: + agent = _agent() + + def fake_get_next_action(trajectory, input_tables, outer_iteration=0): + yield { + "type": "agent_action", + "action_data": { + "action": "delegate", + "target": "report_gen", + "message": "Pick an angle for the write-up:", + "options": [ + "Write a 200-word executive summary of regional trends.", + "Create a detailed analytical report on regional trends with category breakdowns.", + ], + }, + "reason": "ok", + "llm_calls": 1, + } + + monkeypatch.setattr(agent, "_get_next_action", fake_get_next_action) + + events = list(agent.run([], "", trajectory=[{"role": "system", "content": "test"}])) + + evt = events[-1] + assert evt["type"] == "delegate" + assert evt["target"] == "report_gen" + assert len(evt["options"]) == 2 + assert evt["options"][0] == "Write a 200-word executive summary of regional trends." + assert evt["options"][1].startswith("Create a detailed") + + def test_missing_prompt_yields_parse_error(self, monkeypatch) -> None: + agent = _agent() + + def fake_get_next_action(trajectory, input_tables, outer_iteration=0): + yield { + "type": "agent_action", + "action_data": { + "action": "delegate", + "target": "data_loading", + "message": "missing", + "options": [""], + }, + "reason": "ok", + "llm_calls": 1, + } + + monkeypatch.setattr(agent, "_get_next_action", fake_get_next_action) + + events = list(agent.run([], "", trajectory=[{"role": "system", "content": "test"}])) + + # Last event should be an error event (not a delegate). 
+ assert events[-1]["type"] != "delegate" + + def test_normalizer_validates_fields(self) -> None: + with pytest.raises(ValueError): + DataAgent._normalize_delegate_action( + {"target": "", "options": ["x"]} + ) + with pytest.raises(ValueError): + DataAgent._normalize_delegate_action( + {"target": "unknown", "options": ["x"]} + ) + with pytest.raises(ValueError): + DataAgent._normalize_delegate_action( + {"target": "data_loading", "options": []} + ) + with pytest.raises(ValueError): + DataAgent._normalize_delegate_action( + {"target": "data_loading", "options": [" "]} + ) + # Normal multi-option report_gen payload. + out = DataAgent._normalize_delegate_action({ + "target": " report_gen ", + "message": " pick one ", + "options": [" Brief recap. ", " Full report. "], + }) + assert out == { + "target": "report_gen", + "message": "pick one", + "options": ["Brief recap.", "Full report."], + } + # Message is optional; >2 options are truncated to 2. + out2 = DataAgent._normalize_delegate_action({ + "target": "report_gen", + "options": ["A", "B", "C"], + }) + assert out2 == {"target": "report_gen", "options": ["A", "B"]} diff --git a/tests/backend/agents/test_data_loading_discovery_tools.py b/tests/backend/agents/test_data_loading_discovery_tools.py index 3b88b8a5..61e65de2 100644 --- a/tests/backend/agents/test_data_loading_discovery_tools.py +++ b/tests/backend/agents/test_data_loading_discovery_tools.py @@ -1,15 +1,18 @@ -"""Tests for the data discovery tools added to DataLoadingAgent. +"""Tests for the data discovery tools on DataLoadingAgent. -These tools (search_data_candidates, read_candidate_metadata, propose_load_plan) -enable the agent to find and recommend data from connected sources. +Covers the navigation surface introduced by design-docs/32: +``list_data``, ``find_data``, ``describe_data``, and the existing +``propose_load_plan`` / ``_normalize_load_plan_filters`` helpers. 
""" from __future__ import annotations -from unittest.mock import patch, MagicMock +from pathlib import Path +from unittest.mock import patch import pytest from data_formulator.agents.agent_data_loading_chat import DataLoadingAgent +from data_formulator.datalake.catalog_cache import CatalogSearchError, save_catalog pytestmark = [pytest.mark.backend] @@ -27,72 +30,222 @@ def user_home(self): def list_tables(self): return [] + def get_metadata(self): + return None + + +# ------------------------------------------------------------------ +# list_data +# ------------------------------------------------------------------ + +_SAMPLE_TABLES = [ + { + "name": "monthly_orders", + "table_key": "k_orders", + "path": ["Sales", "monthly_orders"], + "metadata": {"description": "Monthly orders", "columns": []}, + }, + { + "name": "monthly_returns", + "table_key": "k_returns", + "path": ["Sales", "monthly_returns"], + "metadata": {"description": "Monthly returns", "columns": []}, + }, + { + "name": "customers", + "table_key": "k_customers", + "path": ["customers"], + "metadata": {"description": "Customer dimension", "columns": []}, + }, +] + + +class TestListData: + def test_no_args_returns_sources_summary(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", _SAMPLE_TABLES) + save_catalog(tmp_path, "flat_src", [{"name": "t1", "table_key": "k1", "metadata": {}}]) + + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(tmp_path)) + result = agent._tool_list_data({}) + + assert "sources" in result + by_id = {s["source_id"]: s for s in result["sources"]} + assert by_id["pg_prod"]["table_count"] == 3 + assert by_id["pg_prod"]["is_hierarchical"] is True + assert by_id["flat_src"]["is_hierarchical"] is False + + def test_no_user_home_returns_empty_sources(self) -> None: + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(None)) + assert agent._tool_list_data({}) == {"sources": []} -class TestSearchDataCandidates: - def 
test_delegates_to_handle_search_data_tables(self) -> None: - agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace()) - with patch( - "data_formulator.agents.context.handle_search_data_tables", - return_value="Search results for 'orders' (2 matches):\n1. [workspace] orders\n2. [pg_prod] public.orders", - ) as mock_search: - result = agent._tool_search_data_candidates({"query": "orders", "scope": "all"}) + def test_source_id_at_root(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", _SAMPLE_TABLES) + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(tmp_path)) - mock_search.assert_called_once_with("orders", "all", agent.workspace) - assert "result" in result - assert "orders" in result["result"] + result = agent._tool_list_data({"source_id": "pg_prod"}) - def test_default_scope_is_all(self) -> None: - agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace()) - with patch( - "data_formulator.agents.context.handle_search_data_tables", - return_value="No tables found.", - ) as mock_search: - agent._tool_search_data_candidates({"query": "sales"}) + assert result["source_id"] == "pg_prod" + folder_names = {f["name"] for f in result["folders"]} + table_names = {t["name"] for t in result["tables"]} + assert "Sales" in folder_names + assert "customers" in table_names + + def test_source_id_with_path_drills_into_folder(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", _SAMPLE_TABLES) + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(tmp_path)) + + result = agent._tool_list_data({"source_id": "pg_prod", "path": ["Sales"]}) + + assert result["folders"] == [] + table_names = {t["name"] for t in result["tables"]} + assert table_names == {"monthly_orders", "monthly_returns"} + + def test_filter_narrows_tables(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", _SAMPLE_TABLES) + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(tmp_path)) + + result = agent._tool_list_data({ + 
"source_id": "pg_prod", + "path": ["Sales"], + "filter": "orders", + }) - mock_search.assert_called_once_with("sales", "all", agent.workspace) + table_names = {t["name"] for t in result["tables"]} + assert table_names == {"monthly_orders"} - def test_empty_query_passes_through(self) -> None: + def test_invalid_path_type_returns_error(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", _SAMPLE_TABLES) + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(tmp_path)) + + result = agent._tool_list_data({"source_id": "pg_prod", "path": "Sales"}) + assert "error" in result + + +# ------------------------------------------------------------------ +# find_data +# ------------------------------------------------------------------ + +class TestFindData: + def test_empty_query_returns_error(self) -> None: agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace()) - with patch( - "data_formulator.agents.context.handle_search_data_tables", - return_value="Please provide a search keyword.", - ) as mock_search: - result = agent._tool_search_data_candidates({"query": ""}) + result = agent._tool_find_data({"query": ""}) + assert "error" in result + + def test_searches_catalog_with_regex(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", _SAMPLE_TABLES) + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(tmp_path)) + + result = agent._tool_find_data({ + "query": "monthly_(orders|returns)", + "scope": "connected", + }) + + names = {r["name"] for r in result["results"]} + assert names == {"monthly_orders", "monthly_returns"} + for r in result["results"]: + assert r["source_id"] == "pg_prod" + assert r["status"] == "not imported" + + def test_scope_with_source_id(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", _SAMPLE_TABLES) + save_catalog(tmp_path, "other", [{ + "name": "monthly_orders", + "table_key": "kx", + "path": ["monthly_orders"], + "metadata": {}, + }]) + agent = DataLoadingAgent(client=None, 
workspace=_FakeWorkspace(tmp_path)) + + result = agent._tool_find_data({"query": "monthly", "scope": "pg_prod"}) + + source_ids = {r["source_id"] for r in result["results"]} + assert source_ids == {"pg_prod"} + + def test_scope_with_path_prefix(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", _SAMPLE_TABLES) + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(tmp_path)) + + result = agent._tool_find_data({ + "query": "customers|monthly", + "scope": "pg_prod:Sales", + }) + + names = {r["name"] for r in result["results"]} + # ``customers`` lives at the root, not under Sales — must be excluded. + assert "customers" not in names + assert names == {"monthly_orders", "monthly_returns"} + + def test_scope_workspace_skips_catalog(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", _SAMPLE_TABLES) + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(tmp_path)) - assert "result" in result + result = agent._tool_find_data({"query": "monthly", "scope": "workspace"}) + # Workspace metadata is empty in the stub → no results, with a note. 
+ assert result["results"] == [] + assert "note" in result + def test_bad_regex_returns_error(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", _SAMPLE_TABLES) + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(tmp_path)) -class TestReadCandidateMetadata: + result = agent._tool_find_data({"query": "(", "scope": "connected"}) + assert "error" in result + + def test_no_match_returns_note_and_valid_sources(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", _SAMPLE_TABLES) + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(tmp_path)) + + result = agent._tool_find_data({ + "query": "zzz_no_such_thing", + "scope": "connected", + }) + assert result["results"] == [] + assert "pg_prod" in result["valid_source_ids"] + assert "note" in result + + +# ------------------------------------------------------------------ +# describe_data +# ------------------------------------------------------------------ + +class TestDescribeData: def test_delegates_to_handle_read_catalog_metadata(self) -> None: agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace("/tmp/home")) with patch( "data_formulator.agents.context.handle_read_catalog_metadata", - return_value="## orders\nSource: pg_prod\nColumns (5):\n - id (integer)\n - amount (float)", + return_value="## orders\nColumns (1):\n - id (int)", ) as mock_read: - result = agent._tool_read_candidate_metadata({ + result = agent._tool_describe_data({ "source_id": "pg_prod", - "table_key": "public.orders", + "table_key": "k_orders", }) - mock_read.assert_called_once_with("pg_prod", "public.orders", agent.workspace) - assert "result" in result + mock_read.assert_called_once_with("pg_prod", "k_orders", agent.workspace) assert "orders" in result["result"] - def test_missing_params_still_calls(self) -> None: + def test_missing_params_still_calls_with_empty_strings(self) -> None: agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace()) with patch( 
"data_formulator.agents.context.handle_read_catalog_metadata", return_value="Both source_id and table_key are required.", ) as mock_read: - result = agent._tool_read_candidate_metadata({}) + result = agent._tool_describe_data({}) mock_read.assert_called_once_with("", "", agent.workspace) assert "required" in result["result"] +# ------------------------------------------------------------------ +# propose_load_plan (unchanged behavior) +# ------------------------------------------------------------------ + class TestProposeLoadPlan: - def test_returns_load_plan_action(self) -> None: - agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace()) + def test_returns_load_plan_action(self, tmp_path: Path) -> None: + save_catalog(tmp_path, "pg_prod", [ + {"name": "orders", "table_key": "public.orders", "metadata": {}}, + {"name": "customers", "table_key": "public.customers", "metadata": {}}, + ]) + agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(tmp_path)) result = agent._tool_propose_load_plan({ "candidates": [ { @@ -108,35 +261,21 @@ def test_returns_load_plan_action(self) -> None: "source_table": "public.customers", }, ], - "reasoning": "Orders for last quarter + customer dimension", + "reasoning": "Orders + customer dimension", }) assert "actions" in result - assert len(result["actions"]) == 1 action = result["actions"][0] assert action["type"] == "load_plan" assert len(action["candidates"]) == 2 - assert action["candidates"][0]["source_id"] == "pg_prod" - assert "row_limit" not in action["candidates"][0] - assert action["reasoning"] == "Orders for last quarter + customer dimension" + assert action["reasoning"] == "Orders + customer dimension" def test_empty_candidates_returns_empty_action(self) -> None: agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace()) result = agent._tool_propose_load_plan({"candidates": []}) - assert result["actions"][0]["type"] == "load_plan" assert result["actions"][0]["candidates"] == [] - def 
test_action_flows_through_actions_pipeline(self) -> None: - """Verify propose_load_plan output is structured for the actions pipeline.""" - agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace()) - result = agent._tool_propose_load_plan({ - "candidates": [{"source_id": "s", "table_key": "k", "display_name": "n", "source_table": "t"}], - }) - - assert "actions" in result - assert result["actions"][0]["type"] == "load_plan" - def test_resolves_superset_dataset_id_from_catalog(self) -> None: agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace("/tmp/home")) catalog = [{ @@ -148,7 +287,10 @@ def test_resolves_superset_dataset_id_from_catalog(self) -> None: }, }] - with patch("data_formulator.datalake.catalog_cache.load_catalog", return_value=catalog): + with patch( + "data_formulator.datalake.catalog_cache.load_catalog", + return_value=catalog, + ): result = agent._tool_propose_load_plan({ "candidates": [{ "source_id": "superset", @@ -162,13 +304,17 @@ def test_resolves_superset_dataset_id_from_catalog(self) -> None: candidate = result["actions"][0]["candidates"][0] assert candidate["source_table"] == "136" assert candidate["source_table_name"] == "product_periodic_sales_trend" - assert candidate["filters"] == [{"column": "brand", "operator": "EQ", "value": "Pantum"}] + assert candidate["filters"] == [ + {"column": "brand", "operator": "EQ", "value": "Pantum"}, + ] assert "row_limit" not in candidate -class TestNormalizeLoadPlanFilters: - """Test _normalize_load_plan_filters sanitization logic.""" +# ------------------------------------------------------------------ +# _normalize_load_plan_filters (unchanged behavior) +# ------------------------------------------------------------------ +class TestNormalizeLoadPlanFilters: def test_strips_wildcards_and_upgrades_eq_to_ilike(self) -> None: filters = [{"column": "brand", "operator": "EQ", "value": "%奔图%"}] result = DataLoadingAgent._normalize_load_plan_filters(filters) @@ -196,9 +342,7 @@ def 
test_symbol_operators_mapped(self) -> None: ] result = DataLoadingAgent._normalize_load_plan_filters(filters) assert result[0]["operator"] == "GTE" - assert result[0]["value"] == 10 assert result[1]["operator"] == "NEQ" - assert result[1]["value"] == "closed" def test_contains_mapped_to_ilike(self) -> None: filters = [{"column": "name", "operator": "CONTAINS", "value": "printer"}] @@ -225,11 +369,18 @@ def test_non_list_returns_empty(self) -> None: assert DataLoadingAgent._normalize_load_plan_filters("bad") == [] def test_missing_column_skipped(self) -> None: - filters = [{"operator": "EQ", "value": "x"}, {"column": "", "operator": "EQ", "value": "y"}] + filters = [ + {"operator": "EQ", "value": "x"}, + {"column": "", "operator": "EQ", "value": "y"}, + ] result = DataLoadingAgent._normalize_load_plan_filters(filters) assert result == [] +# ------------------------------------------------------------------ +# _build_system_prompt: connector summary block +# ------------------------------------------------------------------ + class TestBuildSystemPromptConnectorSummary: def test_includes_connector_summary_when_sources_exist(self) -> None: agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace("/tmp/home")) @@ -250,12 +401,12 @@ def test_shows_none_when_no_sources(self) -> None: ): prompt = agent._build_system_prompt("test query") - assert "Connected data sources: none" in prompt + assert "Connected data sources:\n none" in prompt def test_graceful_when_user_home_missing(self) -> None: agent = DataLoadingAgent(client=None, workspace=_FakeWorkspace(None)) prompt = agent._build_system_prompt("test query") - assert "Connected data sources: none" in prompt + assert "Connected data sources:\n none" in prompt def test_includes_current_date_and_time(self) -> None: from datetime import datetime diff --git a/tests/backend/agents/test_duckdb_notes_prompt.py b/tests/backend/agents/test_duckdb_notes_prompt.py index 87b6f5e8..385313f3 100644 --- 
a/tests/backend/agents/test_duckdb_notes_prompt.py +++ b/tests/backend/agents/test_duckdb_notes_prompt.py @@ -7,7 +7,7 @@ import pytest -from data_formulator.agents.agent_data_rec import SHARED_DUCKDB_NOTES +from data_formulator.agents.chart_creation_guide import SHARED_DUCKDB_NOTES pytestmark = [pytest.mark.backend] diff --git a/tests/backend/agents/test_model_registry.py b/tests/backend/agents/test_model_registry.py index d7bef630..2f46990b 100644 --- a/tests/backend/agents/test_model_registry.py +++ b/tests/backend/agents/test_model_registry.py @@ -13,8 +13,6 @@ from data_formulator.model_registry import ( ModelRegistry, BUILTIN_PROVIDERS, - is_likely_text_only_model, - model_supports_vision, ) pytestmark = [pytest.mark.backend] @@ -111,7 +109,6 @@ def test_public_fields_are_complete(self): assert "id" in model assert "endpoint" in model assert "model" in model - assert "supports_vision" in model assert "is_global" in model assert model["is_global"] is True @@ -157,26 +154,3 @@ def test_custom_provider_defaults_to_openai_endpoint(self): assert config["endpoint"] == "openai" -# --------------------------------------------------------------------------- -# Tests: image input capability hints -# --------------------------------------------------------------------------- - -class TestVisionCapability: - @pytest.mark.parametrize("model_name", ["deepseek-chat", "provider/deepseek-chat-v2", "DeepSeek-Chat"]) - def test_known_text_only_models_are_detected(self, model_name): - assert is_likely_text_only_model(model_name) is True - - @pytest.mark.parametrize("model_name", ["gpt-4o", "claude-sonnet-4-20250514", "deepseek-reasoner", None]) - def test_other_models_are_not_marked_text_only(self, model_name): - assert is_likely_text_only_model(model_name) is False - - @patch.dict(os.environ, SAMPLE_ENV, clear=True) - def test_public_listing_marks_deepseek_chat_as_not_vision_capable(self): - registry = ModelRegistry() - by_id = {m["id"]: m for m in registry.list_public()} - - 
assert by_id["global-deepseek-deepseek-chat"]["supports_vision"] is False - assert by_id["global-openai-gpt-4o"]["supports_vision"] is True - - def test_explicit_supports_vision_false_wins(self): - assert model_supports_vision({"model": "gpt-4o", "supports_vision": False}) is False diff --git a/tests/backend/agents/test_reasoning_content_helpers.py b/tests/backend/agents/test_reasoning_content_helpers.py new file mode 100644 index 00000000..b6cb0763 --- /dev/null +++ b/tests/backend/agents/test_reasoning_content_helpers.py @@ -0,0 +1,102 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. + +"""Tests for attach_reasoning_content / accumulate_reasoning_content helpers.""" + +import pytest + +from data_formulator.agents.agent_utils import ( + attach_reasoning_content, + accumulate_reasoning_content, +) + + +# --------------------------------------------------------------------------- +# Helpers to fake LLM response objects +# --------------------------------------------------------------------------- + +class _FakeMessage: + """Minimal stand-in for ``choice.message``.""" + def __init__(self, content="hello", reasoning_content=None): + self.content = content + if reasoning_content is not None: + self.reasoning_content = reasoning_content + + +class _FakeDelta: + """Minimal stand-in for a streaming ``choice.delta``.""" + def __init__(self, content=None, reasoning_content=None): + self.content = content + if reasoning_content is not None: + self.reasoning_content = reasoning_content + + +# --------------------------------------------------------------------------- +# attach_reasoning_content +# --------------------------------------------------------------------------- + +class TestAttachReasoningContent: + def test_present(self): + msg = {"role": "assistant", "content": "hi"} + result = attach_reasoning_content(msg, _FakeMessage(reasoning_content="think")) + assert result is msg + assert msg["reasoning_content"] == "think" + + def 
test_absent(self): + msg = {"role": "assistant", "content": "hi"} + attach_reasoning_content(msg, _FakeMessage()) + assert "reasoning_content" not in msg + + def test_none_value_not_attached(self): + """If the attribute exists but is explicitly None, don't add it.""" + fake = _FakeMessage() + fake.reasoning_content = None + msg = {"role": "assistant", "content": "hi"} + attach_reasoning_content(msg, fake) + assert "reasoning_content" not in msg + + def test_empty_string_attached(self): + """An empty string is a valid value — it should still be attached.""" + msg = {"role": "assistant", "content": "hi"} + attach_reasoning_content(msg, _FakeMessage(reasoning_content="")) + assert msg["reasoning_content"] == "" + + +# --------------------------------------------------------------------------- +# accumulate_reasoning_content +# --------------------------------------------------------------------------- + +class TestAccumulateReasoningContent: + def test_first_chunk(self): + result = accumulate_reasoning_content(None, _FakeDelta(reasoning_content="a")) + assert result == "a" + + def test_subsequent_chunks(self): + acc = "ab" + result = accumulate_reasoning_content(acc, _FakeDelta(reasoning_content="c")) + assert result == "abc" + + def test_no_attr(self): + result = accumulate_reasoning_content(None, _FakeDelta()) + assert result is None + + def test_no_attr_preserves_accumulator(self): + result = accumulate_reasoning_content("existing", _FakeDelta()) + assert result == "existing" + + def test_none_value_no_change(self): + delta = _FakeDelta() + delta.reasoning_content = None + result = accumulate_reasoning_content("existing", delta) + assert result == "existing" + + def test_empty_string_delta_no_change(self): + """Empty-string delta is falsy — accumulator should not change.""" + result = accumulate_reasoning_content(None, _FakeDelta(reasoning_content="")) + assert result is None + + def test_multiple_chunks_sequence(self): + acc = None + for chunk_text in ["Think", 
"ing ", "step"]: + acc = accumulate_reasoning_content(acc, _FakeDelta(reasoning_content=chunk_text)) + assert acc == "Thinking step" diff --git a/tests/backend/agents/test_semantic_types.py b/tests/backend/agents/test_semantic_types.py new file mode 100644 index 00000000..9e86fdd0 --- /dev/null +++ b/tests/backend/agents/test_semantic_types.py @@ -0,0 +1,348 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. + +"""Unit tests for data_formulator.agents.semantic_types. + +Covers: +- ALL_SEMANTIC_TYPES completeness +- Classification helpers: is_measure_type, is_timeseries_type, + is_categorical_type, is_ordinal_type, is_geo_type, + is_non_measure_numeric, is_signed_measure +- get_vl_type: VL-type mapping for every registered semantic type +- infer_vl_type_from_name: name-based heuristic inference +- generate_semantic_types_prompt: output shape and content +""" + +from __future__ import annotations + +import pytest + +from data_formulator.agents.semantic_types import ( + # Constants + AMOUNT, ADDRESS, BOOLEAN, CATEGORY, CITY, CORRELATION, COUNT, + COUNTRY, DATE, DATETIME, DAY, DECADE, DIRECTION, DURATION, HOUR, + ID, LATITUDE, LONGITUDE, MONTH, NAME, NUMBER, PERCENTAGE, + PERCENTAGE_CHANGE, PRICE, PROFIT, QUANTITY, QUARTER, RANGE, RANK, + REGION, SCORE, SENTIMENT, STATE, STATUS, TEMPERATURE, TIME, + TIMESTAMP, UNKNOWN, WEEK, YEAR, YEAR_MONTH, YEAR_QUARTER, YEAR_WEEK, + ZIP_CODE, + # Sets / lists + ALL_SEMANTIC_TYPES, MEASURE_TYPES, TIMESERIES_X_TYPES, + CATEGORICAL_TYPES, ORDINAL_TYPES, GEO_TYPES, SIGNED_MEASURE_TYPES, + NON_MEASURE_NUMERIC_TYPES, SEMANTIC_TYPE_CATEGORIES, + VL_TYPE_MAP, + # Functions + is_measure_type, is_timeseries_type, is_categorical_type, + is_ordinal_type, is_geo_type, is_non_measure_numeric, is_signed_measure, + get_vl_type, infer_vl_type_from_name, generate_semantic_types_prompt, +) + +pytestmark = [pytest.mark.backend] + + +# --------------------------------------------------------------------------- +# 
ALL_SEMANTIC_TYPES completeness +# --------------------------------------------------------------------------- + +class TestAllSemanticTypes: + def test_no_duplicates(self): + """Duplicate entries in ALL_SEMANTIC_TYPES would cause silent bugs.""" + assert len(ALL_SEMANTIC_TYPES) == len(set(ALL_SEMANTIC_TYPES)) + + def test_includes_known_types(self): + for t in [DATETIME, DATE, AMOUNT, COUNT, CATEGORY, COUNTRY, UNKNOWN]: + assert t in ALL_SEMANTIC_TYPES + + def test_every_vl_map_key_is_in_all_types(self): + """VL_TYPE_MAP should not reference types absent from ALL_SEMANTIC_TYPES.""" + for key in VL_TYPE_MAP: + assert key in ALL_SEMANTIC_TYPES, f"{key!r} in VL_TYPE_MAP but not in ALL_SEMANTIC_TYPES" + + def test_vl_map_values_are_valid(self): + valid = {"quantitative", "ordinal", "nominal", "temporal"} + for key, val in VL_TYPE_MAP.items(): + assert val in valid, f"{key!r} maps to unexpected VL type {val!r}" + + def test_all_types_covered_in_semantic_categories(self): + """Every type in ALL_SEMANTIC_TYPES should appear in at least one category group.""" + all_in_cats = {t for types in SEMANTIC_TYPE_CATEGORIES.values() for t in types} + for t in ALL_SEMANTIC_TYPES: + assert t in all_in_cats, f"{t!r} missing from SEMANTIC_TYPE_CATEGORIES" + + +# --------------------------------------------------------------------------- +# is_measure_type +# --------------------------------------------------------------------------- + +class TestIsMeasureType: + @pytest.mark.parametrize("t", [ + AMOUNT, PRICE, QUANTITY, TEMPERATURE, PERCENTAGE, + PROFIT, PERCENTAGE_CHANGE, SENTIMENT, CORRELATION, + COUNT, NUMBER, DURATION, + ]) + def test_measure_types_return_true(self, t): + assert is_measure_type(t) is True + + @pytest.mark.parametrize("t", [ + CATEGORY, NAME, COUNTRY, DATE, DATETIME, RANK, SCORE, ID, UNKNOWN, + ]) + def test_non_measure_types_return_false(self, t): + assert is_measure_type(t) is False + + def test_unknown_string_returns_false(self): + assert 
is_measure_type("NotAType") is False + + +# --------------------------------------------------------------------------- +# is_timeseries_type +# --------------------------------------------------------------------------- + +class TestIsTimeseriesType: + @pytest.mark.parametrize("t", [ + DATETIME, DATE, TIME, TIMESTAMP, + YEAR_MONTH, YEAR_QUARTER, YEAR_WEEK, + YEAR, QUARTER, MONTH, WEEK, DAY, HOUR, DECADE, + ]) + def test_timeseries_types_return_true(self, t): + assert is_timeseries_type(t) is True + + @pytest.mark.parametrize("t", [ + AMOUNT, CATEGORY, COUNTRY, COUNT, RANK, UNKNOWN, + ]) + def test_non_timeseries_return_false(self, t): + assert is_timeseries_type(t) is False + + +# --------------------------------------------------------------------------- +# is_categorical_type +# --------------------------------------------------------------------------- + +class TestIsCategoricalType: + @pytest.mark.parametrize("t", [ + NAME, CATEGORY, STATUS, BOOLEAN, DIRECTION, + COUNTRY, STATE, CITY, REGION, ADDRESS, ZIP_CODE, RANGE, + ]) + def test_categorical_types_return_true(self, t): + assert is_categorical_type(t) is True + + @pytest.mark.parametrize("t", [ + AMOUNT, DATE, COUNT, RANK, LATITUDE, LONGITUDE, UNKNOWN, + ]) + def test_non_categorical_return_false(self, t): + assert is_categorical_type(t) is False + + +# --------------------------------------------------------------------------- +# is_ordinal_type +# --------------------------------------------------------------------------- + +class TestIsOrdinalType: + @pytest.mark.parametrize("t", [ + YEAR, QUARTER, MONTH, WEEK, DAY, HOUR, DECADE, + RANK, SCORE, RANGE, DIRECTION, + ]) + def test_ordinal_types_return_true(self, t): + assert is_ordinal_type(t) is True + + @pytest.mark.parametrize("t", [ + AMOUNT, DATETIME, CATEGORY, COUNTRY, UNKNOWN, + ]) + def test_non_ordinal_return_false(self, t): + assert is_ordinal_type(t) is False + + +# --------------------------------------------------------------------------- +# 
is_geo_type +# --------------------------------------------------------------------------- + +class TestIsGeoType: + @pytest.mark.parametrize("t", [ + LATITUDE, LONGITUDE, + COUNTRY, STATE, CITY, REGION, ADDRESS, ZIP_CODE, + ]) + def test_geo_types_return_true(self, t): + assert is_geo_type(t) is True + + @pytest.mark.parametrize("t", [ + AMOUNT, DATE, CATEGORY, RANK, UNKNOWN, + ]) + def test_non_geo_return_false(self, t): + assert is_geo_type(t) is False + + +# --------------------------------------------------------------------------- +# is_non_measure_numeric +# --------------------------------------------------------------------------- + +class TestIsNonMeasureNumeric: + @pytest.mark.parametrize("t", [ + RANK, SCORE, YEAR, MONTH, DAY, HOUR, LATITUDE, LONGITUDE, + ]) + def test_non_measure_numerics_return_true(self, t): + assert is_non_measure_numeric(t) is True + + @pytest.mark.parametrize("t", [ + AMOUNT, CATEGORY, DATETIME, UNKNOWN, + ]) + def test_others_return_false(self, t): + assert is_non_measure_numeric(t) is False + + +# --------------------------------------------------------------------------- +# is_signed_measure +# --------------------------------------------------------------------------- + +class TestIsSignedMeasure: + @pytest.mark.parametrize("t", [PROFIT, PERCENTAGE_CHANGE, SENTIMENT, CORRELATION]) + def test_signed_measures_return_true(self, t): + assert is_signed_measure(t) is True + + @pytest.mark.parametrize("t", [AMOUNT, COUNT, CATEGORY, DATE, UNKNOWN]) + def test_non_signed_return_false(self, t): + assert is_signed_measure(t) is False + + +# --------------------------------------------------------------------------- +# get_vl_type — VL type mapping +# --------------------------------------------------------------------------- + +class TestGetVlType: + @pytest.mark.parametrize("semantic, expected_vl", [ + (DATETIME, "temporal"), + (DATE, "temporal"), + (TIME, "temporal"), + (TIMESTAMP, "temporal"), + (YEAR_MONTH, "temporal"), + 
(YEAR_QUARTER, "temporal"), + (YEAR_WEEK, "temporal"), + (YEAR, "temporal"), + (QUARTER, "ordinal"), + (MONTH, "ordinal"), + (WEEK, "ordinal"), + (DAY, "ordinal"), + (HOUR, "ordinal"), + (DECADE, "ordinal"), + (DURATION, "quantitative"), + (AMOUNT, "quantitative"), + (PRICE, "quantitative"), + (QUANTITY, "quantitative"), + (TEMPERATURE, "quantitative"), + (PERCENTAGE, "quantitative"), + (PROFIT, "quantitative"), + (PERCENTAGE_CHANGE, "quantitative"), + (SENTIMENT, "quantitative"), + (CORRELATION, "quantitative"), + (COUNT, "quantitative"), + (NUMBER, "quantitative"), + (RANK, "ordinal"), + (SCORE, "quantitative"), + (ID, "nominal"), + (LATITUDE, "quantitative"), + (LONGITUDE, "quantitative"), + (COUNTRY, "nominal"), + (STATE, "nominal"), + (CITY, "nominal"), + (REGION, "nominal"), + (ADDRESS, "nominal"), + (ZIP_CODE, "nominal"), + (NAME, "nominal"), + (CATEGORY, "nominal"), + (STATUS, "nominal"), + (BOOLEAN, "nominal"), + (DIRECTION, "nominal"), + (RANGE, "ordinal"), + (UNKNOWN, "nominal"), + ]) + def test_vl_type_mapping(self, semantic, expected_vl): + assert get_vl_type(semantic) == expected_vl + + def test_unknown_type_returns_none(self): + assert get_vl_type("NotARegisteredType") is None + + +# --------------------------------------------------------------------------- +# infer_vl_type_from_name — name-based heuristic inference +# --------------------------------------------------------------------------- + +class TestInferVlTypeFromName: + # Temporal heuristics + @pytest.mark.parametrize("name", [ + "date", "created_at", "updated_at", "started_at", "ended_at", + "order_date", "timestamp", "datetime", "time", "year", + ]) + def test_temporal_names(self, name): + assert infer_vl_type_from_name(name) == "temporal", f"Expected temporal for {name!r}" + + # Ordinal heuristics + @pytest.mark.parametrize("name", [ + "month", "quarter", "week", "day", "hour", "decade", + "year_month", "year_quarter", + "rank", "ranking", "user_rank", "priority_level", "tier", + ]) + 
def test_ordinal_names(self, name): + assert infer_vl_type_from_name(name) == "ordinal", f"Expected ordinal for {name!r}" + + # Quantitative heuristics + @pytest.mark.parametrize("name", [ + "revenue_sum", "total_sales", "avg_price", "count_orders", + "mean_temperature", "max_score", "min_cost", "profit_change", + "growth_rate", "pct_change", "lat", "lon", "latitude", "longitude", + ]) + def test_quantitative_names(self, name): + assert infer_vl_type_from_name(name) == "quantitative", f"Expected quantitative for {name!r}" + + # Nominal heuristics + @pytest.mark.parametrize("name", [ + "user_name", "product_category", "status", "group_id", + "country", "city", "region", "brand", "company", + ]) + def test_nominal_names(self, name): + assert infer_vl_type_from_name(name) == "nominal", f"Expected nominal for {name!r}" + + # No-signal names — should return None + @pytest.mark.parametrize("name", [ + "x", "value", "col_a", "data", "info", + ]) + def test_no_signal_names_return_none(self, name): + assert infer_vl_type_from_name(name) is None, f"Expected None for {name!r}" + + def test_temporal_takes_priority_over_ordinal(self): + """'year' should match temporal before ordinal.""" + assert infer_vl_type_from_name("year") == "temporal" + + def test_case_insensitive(self): + assert infer_vl_type_from_name("Created_At") == "temporal" + assert infer_vl_type_from_name("REVENUE_SUM") == "quantitative" + + +# --------------------------------------------------------------------------- +# generate_semantic_types_prompt +# --------------------------------------------------------------------------- + +class TestGenerateSemanticTypesPrompt: + def test_returns_non_empty_string(self): + prompt = generate_semantic_types_prompt() + assert isinstance(prompt, str) + assert len(prompt) > 100 + + def test_contains_category_headers(self): + prompt = generate_semantic_types_prompt() + assert "Temporal" in prompt + assert "Numeric" in prompt + assert "Geographic" in prompt + assert 
"Categorical" in prompt + + def test_contains_type_names(self): + prompt = generate_semantic_types_prompt() + for t in [DATETIME, AMOUNT, COUNTRY, CATEGORY, UNKNOWN]: + assert t in prompt, f"{t!r} missing from semantic types prompt" + + def test_contains_guidelines(self): + prompt = generate_semantic_types_prompt() + assert "Guidelines" in prompt + + def test_all_registered_types_appear_in_prompt(self): + """Every type in ALL_SEMANTIC_TYPES must appear somewhere in the prompt.""" + prompt = generate_semantic_types_prompt() + for t in ALL_SEMANTIC_TYPES: + assert t in prompt, f"Type {t!r} not present in generated prompt" diff --git a/tests/backend/agents/test_sort_data_agent.py b/tests/backend/agents/test_sort_data_agent.py new file mode 100644 index 00000000..d09d0007 --- /dev/null +++ b/tests/backend/agents/test_sort_data_agent.py @@ -0,0 +1,199 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. + +"""Unit tests for SortDataAgent. + +Focuses on things we can test without a real LLM: +- The input dict key must be 'values' (not 'value') to match the system prompt. +- The user query is serialised as valid JSON. +- The agent correctly handles choices that return valid JSON blocks. +- The agent handles choices whose content cannot be parsed as JSON. +- The 'agent' field on each candidate is set to 'SortDataAgent'. +- The 'dialog' field includes both the initial messages and the LLM reply. 
+""" + +from __future__ import annotations + +import json +from types import SimpleNamespace +from unittest.mock import MagicMock + +import pytest + +from data_formulator.agents.agent_sort_data import SortDataAgent + +pytestmark = [pytest.mark.backend] + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +def _make_choice(content: str) -> SimpleNamespace: + """Build a fake response choice with the given message content.""" + msg = SimpleNamespace(role="assistant", content=content) + return SimpleNamespace(message=msg) + + +def _make_response(*contents: str) -> SimpleNamespace: + """Build a fake completion response with one choice per content string.""" + return SimpleNamespace(choices=[_make_choice(c) for c in contents]) + + +def _make_agent(response) -> SortDataAgent: + """Build a SortDataAgent whose client returns *response* on get_completion.""" + client = MagicMock() + client.get_completion.return_value = response + return SortDataAgent(client) + + +# --------------------------------------------------------------------------- +# Input construction +# --------------------------------------------------------------------------- + +class TestInputConstruction: + def test_input_key_is_values_not_value(self): + """Regression for the 'value' → 'values' key name bug. + + The system prompt examples use 'values'; sending 'value' would confuse + the LLM. This test captures the call to get_completion and inspects + the serialised user message. 
+ """ + client = MagicMock() + client.get_completion.return_value = _make_response('{"name": "x", "sorted_values": ["a"], "reason": "ok"}') + agent = SortDataAgent(client) + + agent.run("month", ["March", "January", "February"]) + + # Extract the user message that was sent to the LLM + call_args = client.get_completion.call_args + messages = call_args.kwargs.get("messages") or call_args.args[0] + user_content = next(m["content"] for m in messages if m["role"] == "user") + + # The serialised JSON block must contain the key "values", not "value" + assert '"values"' in user_content, ( + f"Expected key 'values' in user content but got:\n{user_content}" + ) + # Parse out the JSON block from [INPUT]\n\n{...}\n\n[OUTPUT] + json_part = user_content.split("[INPUT]")[1].split("[OUTPUT]")[0].strip() + parsed = json.loads(json_part) + assert "values" in parsed + assert "value" not in parsed + + def test_input_name_is_preserved(self): + """The 'name' field in the serialised input must match what was passed.""" + client = MagicMock() + client.get_completion.return_value = _make_response( + '{"name": "grades", "sorted_values": ["A", "B"], "reason": "ok"}' + ) + agent = SortDataAgent(client) + agent.run("grades", ["B", "A"]) + + messages = client.get_completion.call_args.kwargs.get("messages") or \ + client.get_completion.call_args.args[0] + user_content = next(m["content"] for m in messages if m["role"] == "user") + json_part = user_content.split("[INPUT]")[1].split("[OUTPUT]")[0].strip() + parsed = json.loads(json_part) + assert parsed["name"] == "grades" + + def test_input_values_are_preserved(self): + """Every value passed in must appear in the serialised JSON.""" + client = MagicMock() + client.get_completion.return_value = _make_response( + '{"name": "x", "sorted_values": [], "reason": "ok"}' + ) + agent = SortDataAgent(client) + values = [">=60", "10", "20", "50"] + agent.run("grades", values) + + messages = client.get_completion.call_args.kwargs.get("messages") or \ + 
client.get_completion.call_args.args[0] + user_content = next(m["content"] for m in messages if m["role"] == "user") + json_part = user_content.split("[INPUT]")[1].split("[OUTPUT]")[0].strip() + parsed = json.loads(json_part) + assert parsed["values"] == values + + def test_unicode_values_serialised_correctly(self): + """Non-ASCII values must not be escaped as \\uXXXX in the user query + (ensure_ascii=False is set so the LLM sees readable text).""" + client = MagicMock() + client.get_completion.return_value = _make_response( + '{"name": "month", "sorted_values": ["一月", "二月"], "reason": "ok"}' + ) + agent = SortDataAgent(client) + agent.run("month", ["二月", "一月"]) + + messages = client.get_completion.call_args.kwargs.get("messages") or \ + client.get_completion.call_args.args[0] + user_content = next(m["content"] for m in messages if m["role"] == "user") + # Must appear as literal Chinese characters, not escaped sequences + assert "一月" in user_content or "二月" in user_content + + +# --------------------------------------------------------------------------- +# Response parsing +# --------------------------------------------------------------------------- + +class TestResponseParsing: + def test_valid_json_block_in_content_returns_ok(self): + payload = '{"name": "month", "sorted_values": ["January", "February"], "reason": "natural order"}' + response = _make_response(payload) + agent = _make_agent(response) + + candidates = agent.run("month", ["February", "January"]) + + assert len(candidates) == 1 + assert candidates[0]["status"] == "ok" + assert candidates[0]["content"]["sorted_values"] == ["January", "February"] + + def test_json_wrapped_in_text_is_extracted(self): + """The agent uses extract_json_objects, so JSON embedded in prose should work.""" + payload = 'Here is the result:\n{"name": "x", "sorted_values": ["a", "b"], "reason": "r"}\nDone.' 
+ response = _make_response(payload) + agent = _make_agent(response) + + candidates = agent.run("x", ["b", "a"]) + + assert candidates[0]["status"] == "ok" + + def test_unparseable_content_returns_error_status(self): + response = _make_response("Sorry, I cannot sort this data.") + agent = _make_agent(response) + + candidates = agent.run("x", ["b", "a"]) + + assert len(candidates) == 1 + assert candidates[0]["status"] != "ok" + + def test_multiple_choices_produce_multiple_candidates(self): + good = '{"name": "x", "sorted_values": ["a"], "reason": "ok"}' + bad = "unparseable" + response = _make_response(good, bad) + agent = _make_agent(response) + + candidates = agent.run("x", ["a"]) + + assert len(candidates) == 2 + + def test_agent_field_is_set(self): + response = _make_response('{"name": "x", "sorted_values": [], "reason": "r"}') + agent = _make_agent(response) + + candidates = agent.run("x", []) + assert candidates[0]["agent"] == "SortDataAgent" + + def test_dialog_includes_system_and_user_and_assistant(self): + content = '{"name": "x", "sorted_values": [], "reason": "r"}' + response = _make_response(content) + agent = _make_agent(response) + + candidates = agent.run("x", []) + dialog = candidates[0]["dialog"] + + roles = [m["role"] for m in dialog] + assert "system" in roles + assert "user" in roles + assert "assistant" in roles + # Assistant reply must be the last message + assert dialog[-1]["role"] == "assistant" + assert dialog[-1]["content"] == content diff --git a/tests/backend/auth/test_oidc_gateway.py b/tests/backend/auth/test_oidc_gateway.py index 77b94eea..f9370cdc 100644 --- a/tests/backend/auth/test_oidc_gateway.py +++ b/tests/backend/auth/test_oidc_gateway.py @@ -193,6 +193,48 @@ def test_callback_disabled_without_secret(self, app, monkeypatch): assert resp.status_code == 302 assert "auth_error=backend_oidc_not_enabled" in resp.headers["Location"] + def test_callback_access_denied_returns_access_denied_error(self, client): + """When user cancels at 
IdP, the error param is recognized.""" + with client.session_transaction() as sess: + sess["_oauth_state"] = "test-state" + resp = client.get("/auth/callback?error=access_denied&state=test-state") + assert resp.status_code == 302 + assert "auth_error=access_denied" in resp.headers["Location"] + + def test_callback_access_denied_clears_oauth_state(self, client): + """Cancelled auth should clean up the pending oauth state.""" + with client.session_transaction() as sess: + sess["_oauth_state"] = "test-state" + client.get("/auth/callback?error=access_denied&state=test-state") + with client.session_transaction() as sess: + assert "_oauth_state" not in sess + + def test_callback_idp_server_error_returns_token_exchange_failed(self, client): + """Non-access_denied IdP errors map to token_exchange_failed.""" + with client.session_transaction() as sess: + sess["_oauth_state"] = "test-state" + resp = client.get("/auth/callback?error=server_error&state=test-state") + assert resp.status_code == 302 + assert "auth_error=token_exchange_failed" in resp.headers["Location"] + + def test_callback_access_denied_preserves_existing_sso_session(self, client): + """Cancelling re-auth must NOT invalidate the existing SSO session.""" + import time + with client.session_transaction() as sess: + sess["_oauth_state"] = "test-state" + sess["sso"] = { + "access_token": "existing-tok", + "refresh_token": None, + "expires_at": time.time() + 3600, + "user": {"name": "Alice"}, + } + resp = client.get("/auth/callback?error=access_denied&state=test-state") + assert resp.status_code == 302 + assert "auth_error=access_denied" in resp.headers["Location"] + with client.session_transaction() as sess: + assert "sso" in sess + assert sess["sso"]["access_token"] == "existing-tok" + # ================================================================== # oidc_bp: /api/auth/oidc/status diff --git a/tests/backend/data/test_catalog_cache.py b/tests/backend/data/test_catalog_cache.py index 64cf8f8b..7efac880 100644 
--- a/tests/backend/data/test_catalog_cache.py +++ b/tests/backend/data/test_catalog_cache.py @@ -27,7 +27,6 @@ save_catalog, search_catalog_cache, _search_python, - _search_duckdb, ) pytestmark = [pytest.mark.backend, pytest.mark.plugin] @@ -157,6 +156,26 @@ def test_returns_source_ids(self, tmp_path: Path) -> None: def test_returns_empty_for_missing_dir(self, tmp_path: Path) -> None: assert list_cached_sources(tmp_path / "nonexistent") == [] + def test_returns_canonical_id_with_colon(self, tmp_path: Path) -> None: + # ``mysql:mysql`` is sanitised to ``mysql--mysql.json`` on disk, but + # callers (agents, frontend) need the canonical id back so that + # ``connector_id`` lookups against the in-memory registry succeed. + # Regression: previously this returned the filename stem. + save_catalog(tmp_path, "mysql:mysql", []) + save_catalog(tmp_path, "postgresql:prod-db", []) + assert set(list_cached_sources(tmp_path)) == { + "mysql:mysql", + "postgresql:prod-db", + } + + def test_falls_back_to_stem_when_source_id_missing(self, tmp_path: Path) -> None: + # Corrupted / legacy files without a ``source_id`` field still surface + # something usable rather than silently dropping the source. 
+ cache_dir = tmp_path / "catalog_cache" + cache_dir.mkdir() + (cache_dir / "legacy_stem.json").write_text("{}", encoding="utf-8") + assert list_cached_sources(tmp_path) == ["legacy_stem"] + # ================================================================== # Tests: search_catalog_cache @@ -210,7 +229,7 @@ def test_search_respects_limit_per_source(self) -> None: # ================================================================== -# Tests: DuckDB search and Python fallback consistency +# Tests: structured-field search produces match_reasons and scoring # ================================================================== RICH_TABLES: list[dict[str, Any]] = [ @@ -241,62 +260,230 @@ def test_search_respects_limit_per_source(self) -> None: ] -class TestDuckDBSearchConsistency: - """Verify DuckDB and Python search produce equivalent results.""" +class TestStructuredFieldSearch: + """Verify per-field scoring + match_reasons reported by ``_search_python``.""" @pytest.fixture(autouse=True) def _setup_cache(self, tmp_path: Path) -> None: self.user_home = tmp_path save_catalog(tmp_path, "pg_prod", RICH_TABLES) - def _compare(self, query: str, exclude: set[str] | None = None) -> None: - exc = exclude or set() + def _run(self, query: str, **kwargs: Any) -> list[dict[str, Any]]: ids = list_cached_sources(self.user_home) - needle = query.strip().lower() - py_results = _search_python(self.user_home, needle, ids, exc, 20) - duck_results = _search_duckdb(self.user_home, needle, ids, exc, 20) - assert len(py_results) == len(duck_results), f"Count mismatch for '{query}'" - for pr, dr in zip(py_results, duck_results): - assert pr["name"] == dr["name"], f"Name mismatch for '{query}'" - assert pr["score"] == dr["score"], f"Score mismatch for '{query}': {pr} vs {dr}" - assert set(pr["matched_columns"]) == set(dr["matched_columns"]) - - def test_table_name_match(self) -> None: - self._compare("orders") + return _search_python(self.user_home, query, ids, set(), 20, **kwargs) + + def 
test_table_name_match_reports_table_name_reason(self) -> None: + results = self._run("orders") + assert len(results) >= 1 + assert results[0]["name"] == "public.orders" + assert "table_name" in results[0]["match_reasons"] def test_table_description_match(self) -> None: - self._compare("订单") + results = self._run("订单") + assert len(results) >= 1 + assert results[0]["name"] == "public.orders" + assert "source_description" in results[0]["match_reasons"] def test_column_name_match(self) -> None: - self._compare("customer_name") + results = self._run("customer_name") + assert len(results) >= 1 + assert "customer_name" in results[0]["matched_columns"] + assert "column_name" in results[0]["match_reasons"] def test_column_description_match(self) -> None: - self._compare("Primary key") + results = self._run("Primary key") + assert len(results) >= 1 + assert "source_column_description" in results[0]["match_reasons"] - def test_no_match(self) -> None: - self._compare("zzz_nonexistent_zzz") + def test_no_match_returns_empty(self) -> None: + assert self._run("zzz_nonexistent_zzz") == [] - def test_exclude_tables(self) -> None: - exc = {"public.orders"} + def test_exclude_tables_drops_matches(self) -> None: ids = list_cached_sources(self.user_home) - needle = "orders" - py_results = _search_python(self.user_home, needle, ids, exc, 20) - duck_results = _search_duckdb(self.user_home, needle, ids, exc, 20) - assert all(r["name"] != "public.orders" for r in py_results) - assert all(r["name"] != "public.orders" for r in duck_results) - - def test_search_catalog_cache_uses_duckdb_by_default(self) -> None: - """Top-level search_catalog_cache should still work end-to-end.""" + results = _search_python( + self.user_home, "orders", ids, {"public.orders"}, 20, + ) + assert all(r["name"] != "public.orders" for r in results) + + def test_search_catalog_cache_end_to_end(self) -> None: results = search_catalog_cache(self.user_home, "product") - assert len(results) >= 1 assert any(r["name"] == 
"public.products" for r in results) - def test_fallback_when_duckdb_fails(self) -> None: - with patch("data_formulator.datalake.catalog_cache._search_duckdb", - side_effect=RuntimeError("DuckDB broken")): - results = search_catalog_cache(self.user_home, "orders") - assert len(results) >= 1 - assert results[0]["name"] == "public.orders" + def test_regex_query_alternation(self) -> None: + results = self._run("orders|products") + names = {r["name"] for r in results} + assert {"public.orders", "public.products"} <= names + + +# ================================================================== +# Tests: list_sources_summary / list_path_children (design-docs/32) +# ================================================================== + +_HIER_TABLES: list[dict[str, Any]] = [ + { + "name": "monthly_orders", + "table_key": "k_orders", + "path": ["Sales", "monthly_orders"], + "metadata": {"description": "Monthly orders", "columns": []}, + }, + { + "name": "monthly_returns", + "table_key": "k_returns", + "path": ["Sales", "monthly_returns"], + "metadata": {"description": "Monthly returns", "columns": []}, + }, + { + "name": "fy24", + "table_key": "k_fy24", + "path": ["Sales", "Archive", "fy24"], + "metadata": {"description": "FY24 archive", "columns": []}, + }, + { + "name": "customers", + "table_key": "k_customers", + "path": ["customers"], + "metadata": {"description": "Customer dimension", "columns": []}, + }, +] + + +class TestListSourcesSummary: + def test_flat_and_hierarchical(self, tmp_path: Path) -> None: + from data_formulator.datalake.catalog_cache import list_sources_summary + + save_catalog(tmp_path, "pg_prod", _HIER_TABLES) + save_catalog(tmp_path, "flat_src", [ + {"name": "t1", "table_key": "k1", "metadata": {}}, + {"name": "t2", "table_key": "k2", "metadata": {}}, + ]) + + summary = list_sources_summary(tmp_path) + by_id = {s["source_id"]: s for s in summary} + assert by_id["pg_prod"]["table_count"] == 4 + assert by_id["pg_prod"]["is_hierarchical"] is True + 
assert by_id["flat_src"]["table_count"] == 2 + assert by_id["flat_src"]["is_hierarchical"] is False + + def test_empty_when_no_cache(self, tmp_path: Path) -> None: + from data_formulator.datalake.catalog_cache import list_sources_summary + + assert list_sources_summary(tmp_path) == [] + + +class TestListPathChildren: + @pytest.fixture(autouse=True) + def _setup(self, tmp_path: Path) -> None: + self.user_home = tmp_path + save_catalog(tmp_path, "pg_prod", _HIER_TABLES) + + def test_root_lists_folders_and_top_level_tables(self) -> None: + from data_formulator.datalake.catalog_cache import list_path_children + + result = list_path_children(self.user_home, "pg_prod") + + folder_names = {f["name"] for f in result["folders"]} + table_names = {t["name"] for t in result["tables"]} + assert folder_names == {"Sales"} + assert table_names == {"customers"} + assert result["total_folders"] == 1 + assert result["total_tables"] == 1 + assert result["truncated"] is False + + def test_drill_into_folder(self) -> None: + from data_formulator.datalake.catalog_cache import list_path_children + + result = list_path_children(self.user_home, "pg_prod", path=["Sales"]) + + folder_names = {f["name"] for f in result["folders"]} + table_names = {t["name"] for t in result["tables"]} + assert folder_names == {"Archive"} + assert table_names == {"monthly_orders", "monthly_returns"} + + def test_filter_narrows_results(self) -> None: + from data_formulator.datalake.catalog_cache import list_path_children + + result = list_path_children( + self.user_home, "pg_prod", path=["Sales"], filter="orders", + ) + assert {t["name"] for t in result["tables"]} == {"monthly_orders"} + assert result["folders"] == [] + + def test_missing_source_returns_empty(self, tmp_path: Path) -> None: + from data_formulator.datalake.catalog_cache import list_path_children + + result = list_path_children(tmp_path, "missing_src") + assert result["folders"] == [] + assert result["tables"] == [] + assert result["truncated"] is 
False + + def test_truncation_includes_hint(self) -> None: + from data_formulator.datalake.catalog_cache import list_path_children + + # 5 leaves at root, cap to 2 → truncated with hint. + many_root = [ + {"name": f"t{i}", "table_key": f"k{i}", "path": [f"t{i}"], "metadata": {}} + for i in range(5) + ] + save_catalog(self.user_home, "many_src", many_root) + result = list_path_children(self.user_home, "many_src", limit=2) + + assert result["truncated"] is True + assert len(result["tables"]) == 2 + assert "hint" in result + assert result["total_tables"] == 5 + + +# ================================================================== +# Tests: search_catalog_cache regex / exclude / fields / path_prefix +# ================================================================== + +class TestSearchCatalogCacheExtended: + @pytest.fixture(autouse=True) + def _setup(self, tmp_path: Path) -> None: + self.user_home = tmp_path + save_catalog(tmp_path, "pg_prod", _HIER_TABLES) + + def test_regex_alternation_matches_two_tables(self) -> None: + results = search_catalog_cache( + self.user_home, "monthly_(orders|returns)", + ) + names = {r["name"] for r in results} + assert names == {"monthly_orders", "monthly_returns"} + + def test_exclude_pattern_filters_out_matches(self) -> None: + results = search_catalog_cache( + self.user_home, "monthly", exclude_pattern="returns", + ) + names = {r["name"] for r in results} + assert names == {"monthly_orders"} + + def test_path_prefix_scopes_search(self) -> None: + results = search_catalog_cache( + self.user_home, "customers|monthly_orders", + path_prefix=["Sales"], + ) + names = {r["name"] for r in results} + # ``customers`` is at the root → must be excluded by the prefix. + assert names == {"monthly_orders"} + + def test_fields_restricts_search_surface(self) -> None: + # ``archive`` appears only in the FY24 description; the leaf name + # is ``fy24``. Restricting to ``name`` should miss; ``description`` + # should hit. 
+ name_only = search_catalog_cache( + self.user_home, "archive", fields=["name"], + ) + desc_only = search_catalog_cache( + self.user_home, "archive", fields=["description"], + ) + assert name_only == [] + assert {r["name"] for r in desc_only} == {"fy24"} + + def test_bad_regex_raises_catalog_search_error(self) -> None: + from data_formulator.datalake.catalog_cache import CatalogSearchError + + with pytest.raises(CatalogSearchError): + search_catalog_cache(self.user_home, "(") # ================================================================== diff --git a/tests/backend/data/test_data_connector_config.py b/tests/backend/data/test_data_connector_config.py index f0cf1542..8205ea15 100644 --- a/tests/backend/data/test_data_connector_config.py +++ b/tests/backend/data/test_data_connector_config.py @@ -517,6 +517,7 @@ def test_create_connector_persists_only_non_auth_params(self, app, tmp_path): assert entry["default_params"] == { "host": "db.local", "database": "analytics", + "user": "alice", } diff --git a/tests/backend/data/test_data_connector_framework.py b/tests/backend/data/test_data_connector_framework.py index eef8b47a..dde99993 100644 --- a/tests/backend/data/test_data_connector_framework.py +++ b/tests/backend/data/test_data_connector_framework.py @@ -342,6 +342,7 @@ def test_pinned_params_do_not_expose_auth_or_sensitive_values(self): assert cfg["pinned_params"] == { "host": "db.local", "database": "analytics", + "user": "alice", } form_names = {f["name"] for f in cfg["params_form"]} assert "user" not in form_names diff --git a/tests/backend/data/test_df_to_safe_records.py b/tests/backend/data/test_df_to_safe_records.py new file mode 100644 index 00000000..177aa569 --- /dev/null +++ b/tests/backend/data/test_df_to_safe_records.py @@ -0,0 +1,79 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. 
+ +"""Tests for df_to_safe_records — the single entry-point for +DataFrame → JSON-safe records conversion.""" + +import math +from datetime import date, datetime + +import numpy as np +import pandas as pd +import pytest + +pytestmark = [pytest.mark.backend] + + +from data_formulator.datalake.parquet_utils import df_to_safe_records + + +class TestDatetimeSerialization: + """datetime64 columns must produce ISO-8601 strings, not epoch numbers.""" + + def test_datetime_column_returns_iso_string(self): + df = pd.DataFrame({"ts": pd.to_datetime(["2026-03-15", "2026-04-20"])}) + records = df_to_safe_records(df) + assert records[0]["ts"] == "2026-03-15T00:00:00.000" + assert records[1]["ts"] == "2026-04-20T00:00:00.000" + + def test_datetime_with_time_component(self): + df = pd.DataFrame({"ts": pd.to_datetime(["2026-03-15 14:30:00"])}) + records = df_to_safe_records(df) + assert "14:30:00" in records[0]["ts"] + + def test_nat_becomes_null(self): + df = pd.DataFrame({"ts": pd.to_datetime(["2026-01-01", None])}) + records = df_to_safe_records(df) + assert records[0]["ts"] is not None + assert records[1]["ts"] is None + + +class TestMixedTypes: + """DataFrames with mixed column types should serialize correctly.""" + + def test_int_string_datetime_mixed(self): + df = pd.DataFrame({ + "id": [1, 2], + "name": ["Alice", "Bob"], + "created": pd.to_datetime(["2026-01-01", "2026-06-15"]), + }) + records = df_to_safe_records(df) + assert records[0]["id"] == 1 + assert records[0]["name"] == "Alice" + assert isinstance(records[0]["created"], str) + assert records[0]["created"].startswith("2026-01-01") + + def test_float_with_nan(self): + df = pd.DataFrame({"val": [1.5, float("nan"), 3.0]}) + records = df_to_safe_records(df) + assert records[0]["val"] == 1.5 + assert records[1]["val"] is None + assert records[2]["val"] == 3.0 + + +class TestEdgeCases: + """Empty DataFrames and exotic types.""" + + def test_empty_dataframe(self): + df = pd.DataFrame({"a": pd.Series([], dtype="int64")}) + 
assert df_to_safe_records(df) == [] + + def test_empty_dataframe_no_columns(self): + df = pd.DataFrame() + assert df_to_safe_records(df) == [] + + def test_default_handler_catches_exotic_types(self): + df = pd.DataFrame({"val": [np.int64(42), np.float64(3.14)]}) + records = df_to_safe_records(df) + assert records[0]["val"] == 42 + assert abs(records[1]["val"] - 3.14) < 0.001 diff --git a/tests/backend/data/test_phase5_agent_metadata.py b/tests/backend/data/test_phase5_agent_metadata.py index 0c52187b..a616753d 100644 --- a/tests/backend/data/test_phase5_agent_metadata.py +++ b/tests/backend/data/test_phase5_agent_metadata.py @@ -320,133 +320,6 @@ def test_lookups_graceful_without_user_home(self): assert col_metas == {} -# ── handle_search_data_tables ───────────────────────────────────────── - -class TestSearchDataTablesTool: - """Tests for the search_data_tables agent tool handler.""" - - def test_workspace_scope_returns_results(self): - from data_formulator.agents.context import handle_search_data_tables - - ws = MagicMock() - ws_meta = WorkspaceMetadata.create_new() - ws_meta.add_table(TableMetadata( - name="orders", - source_type="data_loader", - filename="orders.parquet", - file_type="parquet", - created_at=datetime.now(timezone.utc), - description="Order data", - columns=[ColumnInfo("order_id", "int64")], - )) - ws.get_metadata.return_value = ws_meta - - result = handle_search_data_tables("order", "workspace", ws) - assert "orders" in result - assert "imported" in result - - def test_all_scope_combines_workspace_and_cache(self): - from data_formulator.agents.context import handle_search_data_tables - from data_formulator.datalake.catalog_cache import save_catalog - with tempfile.TemporaryDirectory() as tmp: - ws = MagicMock() - ws_meta = WorkspaceMetadata.create_new() - ws_meta.add_table(TableMetadata( - name="orders", - source_type="data_loader", - filename="orders.parquet", - file_type="parquet", - created_at=datetime.now(timezone.utc), - 
description="imported order table", - )) - ws.get_metadata.return_value = ws_meta - - save_catalog(tmp, "pg", [ - {"name": "remote_orders", "table_key": "uuid-remote-1", - "metadata": {"description": "Remote order data", "columns": []}}, - ]) - - ws.user_home = Path(tmp) - result = handle_search_data_tables("order", "all", ws) - assert "orders" in result - assert "remote_orders" in result - - def test_not_imported_results_include_source_id_and_table_key(self): - """Not-imported search results must include source_id and table_key for read_catalog_metadata.""" - from data_formulator.agents.context import handle_search_data_tables - from data_formulator.datalake.catalog_cache import save_catalog - with tempfile.TemporaryDirectory() as tmp: - ws = MagicMock() - ws.get_metadata.return_value = None - - save_catalog(tmp, "superset_prod", [ - {"name": "monthly_orders", "table_key": "uuid-42", - "metadata": {"description": "Monthly order aggregation", "columns": []}}, - ]) - - ws.user_home = Path(tmp) - result = handle_search_data_tables("order", "all", ws) - assert "source_id: superset_prod" in result - assert "table_key: uuid-42" in result - - def test_search_then_read_catalog_metadata_roundtrip(self): - """search_data_tables output provides source_id/table_key for read_catalog_metadata.""" - from data_formulator.agents.context import handle_search_data_tables, handle_read_catalog_metadata - from data_formulator.datalake.catalog_cache import save_catalog - with tempfile.TemporaryDirectory() as tmp: - ws = MagicMock() - ws.get_metadata.return_value = None - - save_catalog(tmp, "pg_analytics", [ - {"name": "revenue_summary", "table_key": "rev-uuid-1", - "metadata": { - "description": "Monthly revenue rollup", - "columns": [ - {"name": "month", "type": "DATE"}, - {"name": "total_revenue", "type": "DECIMAL"}, - ], - "source_metadata_status": "synced", - }}, - ]) - - ws.user_home = Path(tmp) - search_result = handle_search_data_tables("revenue", "all", ws) - assert "source_id: 
pg_analytics" in search_result - assert "table_key: rev-uuid-1" in search_result - - read_result = handle_read_catalog_metadata( - "pg_analytics", "rev-uuid-1", workspace=ws, - ) - assert "revenue_summary" in read_result - assert "total_revenue" in read_result - assert "DECIMAL" in read_result - assert "synced" in read_result - - def test_empty_query_returns_message(self): - from data_formulator.agents.context import handle_search_data_tables - ws = MagicMock() - result = handle_search_data_tables("", "all", ws) - assert "keyword" in result.lower() or "provide" in result.lower() - - def test_results_do_not_contain_credentials(self): - """Search results must not leak loader_params, tokens, or connection strings.""" - from data_formulator.agents.context import handle_search_data_tables - from data_formulator.datalake.catalog_cache import save_catalog - with tempfile.TemporaryDirectory() as tmp: - ws = MagicMock() - ws.get_metadata.return_value = None - save_catalog(tmp, "pg", [ - {"name": "users", "metadata": { - "description": "User table", - "columns": [{"name": "email"}], - }}, - ]) - ws.user_home = Path(tmp) - result = handle_search_data_tables("user", "all", ws) - for sensitive in ("password", "api_key", "secret", "token", "connection_string"): - assert sensitive not in result.lower() - - # ── _merge_source_metadata — empty description clears ───────────────── class TestMergeSourceMetadataEmptyClear: diff --git a/tests/backend/data/test_plugin_scanner.py b/tests/backend/data/test_plugin_scanner.py new file mode 100644 index 00000000..fef14ddd --- /dev/null +++ b/tests/backend/data/test_plugin_scanner.py @@ -0,0 +1,281 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. + +"""Tests for the data-loader plugin scanner. + +The scanner runs at module import time, so each test sets up env vars and +plugin files first, then reloads the ``data_formulator.data_loader`` +module to trigger a fresh scan. 
+""" + +from __future__ import annotations + +import importlib +import os +import sys +from pathlib import Path + +import pytest + +pytestmark = [pytest.mark.backend] + + +GOOD_PLUGIN = """\ +from data_formulator.data_loader.external_data_loader import ExternalDataLoader + + +class DemoLoader(ExternalDataLoader): + def __init__(self, params=None): + self.params = params or {} +""" + +BROKEN_IMPORT_PLUGIN = "import nonexistent_pkg_xyz_for_test # noqa: F401\n" + +NO_SUBCLASS_PLUGIN = "x = 1\n" + +OVERRIDE_BUILTIN_PLUGIN = """\ +from data_formulator.data_loader.external_data_loader import ExternalDataLoader + + +class OverrideMysql(ExternalDataLoader): + def __init__(self, params=None): + self.params = params or {} +""" + +MULTI_CLASS_PLUGIN = """\ +from data_formulator.data_loader.external_data_loader import ExternalDataLoader + + +class AaaLoader(ExternalDataLoader): + def __init__(self, params=None): + self.params = params or {} + + +class BbbLoader(ExternalDataLoader): + def __init__(self, params=None): + self.params = params or {} +""" + + +def _write(dir_: Path, name: str, body: str) -> Path: + p = dir_ / name + p.write_text(body) + return p + + +def _reload_scanner(): + """Purge cached scanner state and reimport so the scan re-runs.""" + for mod_name in list(sys.modules): + if mod_name == "data_formulator.data_loader" or mod_name.startswith("df_plugin_"): + sys.modules.pop(mod_name, None) + return importlib.import_module("data_formulator.data_loader") + + +# ── gating ──────────────────────────────────────────────────────────────── + + +def test_scanner_disabled_in_hosted_mode(tmp_path, monkeypatch): + """Plugin scanning is off when WORKSPACE_BACKEND != local and no opt-in.""" + _write(tmp_path, "demo_data_loader.py", GOOD_PLUGIN) + monkeypatch.setenv("DF_PLUGIN_DIR", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "azure_blob") + monkeypatch.delenv("DF_ALLOW_PLUGINS", raising=False) + + dl = _reload_scanner() + + assert dl.PLUGIN_LOADERS == {} + 
assert "demo" not in dl.DATA_LOADERS + + +def test_scanner_enabled_in_local_mode(tmp_path, monkeypatch): + _write(tmp_path, "demo_data_loader.py", GOOD_PLUGIN) + monkeypatch.setenv("DF_PLUGIN_DIR", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + + dl = _reload_scanner() + + assert "demo" in dl.PLUGIN_LOADERS + assert dl.PLUGIN_LOADERS["demo"].endswith("demo_data_loader.py") + assert "demo" in dl.DATA_LOADERS + assert dl.DATA_LOADERS["demo"].__name__ == "DemoLoader" + + +def test_scanner_opt_in_overrides_hosted_gate(tmp_path, monkeypatch): + _write(tmp_path, "demo_data_loader.py", GOOD_PLUGIN) + monkeypatch.setenv("DF_PLUGIN_DIR", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "azure_blob") + monkeypatch.setenv("DF_ALLOW_PLUGINS", "1") + + dl = _reload_scanner() + + assert "demo" in dl.PLUGIN_LOADERS + + +# ── failure paths surface in DISABLED_LOADERS ───────────────────────────── + + +def test_missing_dependency_recorded_with_pip_hint(tmp_path, monkeypatch): + _write(tmp_path, "broken_data_loader.py", BROKEN_IMPORT_PLUGIN) + monkeypatch.setenv("DF_PLUGIN_DIR", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + + dl = _reload_scanner() + + assert "broken" not in dl.DATA_LOADERS + assert "broken" not in dl.PLUGIN_LOADERS + assert "broken" in dl.DISABLED_LOADERS + msg = dl.DISABLED_LOADERS["broken"] + assert "nonexistent_pkg_xyz_for_test" in msg + assert "pip install" in msg + + +def test_no_subclass_recorded_in_disabled(tmp_path, monkeypatch): + _write(tmp_path, "empty_data_loader.py", NO_SUBCLASS_PLUGIN) + monkeypatch.setenv("DF_PLUGIN_DIR", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + + dl = _reload_scanner() + + assert "empty" not in dl.DATA_LOADERS + assert "empty" in dl.DISABLED_LOADERS + assert "no ExternalDataLoader subclass" in dl.DISABLED_LOADERS["empty"] + + +def test_broken_plugin_does_not_leak_sys_modules(tmp_path, monkeypatch): + """A failed exec_module should not leave a 
half-initialized module behind.""" + _write(tmp_path, "broken_data_loader.py", BROKEN_IMPORT_PLUGIN) + monkeypatch.setenv("DF_PLUGIN_DIR", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + + _reload_scanner() + + assert "df_plugin_broken" not in sys.modules + + +# ── override semantics ──────────────────────────────────────────────────── + + +def test_plugin_overriding_builtin_is_rejected(tmp_path, monkeypatch): + """Security: a plugin keyed ``mysql`` must NOT replace the built-in. + + Silent override would let a malicious plugin capture credentials for + every existing MySQL connection. The scanner rejects the override and + records a structured entry in PLUGIN_ERRORS so the UI can surface it. + """ + _write(tmp_path, "mysql_data_loader.py", OVERRIDE_BUILTIN_PLUGIN) + monkeypatch.setenv("DF_PLUGIN_DIR", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + + dl = _reload_scanner() + + # Plugin must NOT be registered. + assert "mysql" not in dl.PLUGIN_LOADERS + # If the built-in is present (pymysql installed), it must remain the + # original class — definitely not the plugin's class. + if "mysql" in dl.DATA_LOADERS: + assert dl.DATA_LOADERS["mysql"].__name__ != "OverrideMysql" + # Error must surface in PLUGIN_ERRORS with the right shape. + errors = [e for e in dl.PLUGIN_ERRORS if e["kind"] == "override_builtin"] + assert len(errors) == 1 + assert errors[0]["file"].endswith("mysql_data_loader.py") + assert "mysql" in errors[0]["reason"] + + +def test_duplicate_plugin_keys_are_rejected(tmp_path, monkeypatch): + """Two plugins claiming the same registry key: the second is rejected.""" + # The registry key is the filename minus "_data_loader", so two files in + # one directory can never share a key. To simulate a collision, scan the + # directory once (registering "demo"), then call _load_plugin_file on the + # same file again; the second registration must be recorded in + # PLUGIN_ERRORS with kind "duplicate". 
+ _write(tmp_path, "demo_data_loader.py", GOOD_PLUGIN)
+ monkeypatch.setenv("DF_PLUGIN_DIR", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + dl = _reload_scanner() + assert "demo" in dl.PLUGIN_LOADERS + + # Re-invoke the loader on the same file to simulate a key collision. + dl._load_plugin_file(tmp_path / "demo_data_loader.py") # type: ignore[attr-defined] + dups = [e for e in dl.PLUGIN_ERRORS if e["kind"] == "duplicate"] + assert len(dups) >= 1 + + + +# ── multiple subclasses ─────────────────────────────────────────────────── + + +def test_multiple_subclasses_registers_first_alphabetically(tmp_path, monkeypatch): + _write(tmp_path, "multi_data_loader.py", MULTI_CLASS_PLUGIN) + monkeypatch.setenv("DF_PLUGIN_DIR", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + + dl = _reload_scanner() + + assert "multi" in dl.PLUGIN_LOADERS + # inspect.getmembers returns members sorted by name, so AaaLoader wins.
+ assert dl.DATA_LOADERS["multi"].__name__ == "AaaLoader" + + +# ── empty / missing plugin dir ──────────────────────────────────────────── + + +def test_missing_plugin_dir_is_silent(tmp_path, monkeypatch): + missing = tmp_path / "does_not_exist" + monkeypatch.setenv("DF_PLUGIN_DIR", str(missing)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + + dl = _reload_scanner() + + assert dl.PLUGIN_LOADERS == {} + + +def test_empty_plugin_dir_is_silent(tmp_path, monkeypatch): + monkeypatch.setenv("DF_PLUGIN_DIR", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + + dl = _reload_scanner() + + assert dl.PLUGIN_LOADERS == {} + + +# ── plugin dir resolution ───────────────────────────────────────────────── + + +def test_plugin_dir_defaults_to_data_formulator_home(tmp_path, monkeypatch): + """``DATA_FORMULATOR_HOME/plugins`` is the default when DF_PLUGIN_DIR unset.""" + plugins = tmp_path / "plugins" + plugins.mkdir() + _write(plugins, "demo_data_loader.py", GOOD_PLUGIN) + + monkeypatch.delenv("DF_PLUGIN_DIR", raising=False) + monkeypatch.setenv("DATA_FORMULATOR_HOME", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + + dl = _reload_scanner() + + assert dl.PLUGIN_DIR == str(plugins) + assert "demo" in dl.PLUGIN_LOADERS + + +def test_df_plugin_dir_overrides_data_formulator_home(tmp_path, monkeypatch): + """Explicit DF_PLUGIN_DIR wins over DATA_FORMULATOR_HOME/plugins.""" + df_home = tmp_path / "home" + (df_home / "plugins").mkdir(parents=True) + explicit = tmp_path / "explicit" + explicit.mkdir() + _write(explicit, "demo_data_loader.py", GOOD_PLUGIN) + + monkeypatch.setenv("DATA_FORMULATOR_HOME", str(df_home)) + monkeypatch.setenv("DF_PLUGIN_DIR", str(explicit)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + + dl = _reload_scanner() + + assert dl.PLUGIN_DIR == str(explicit) + assert "demo" in dl.PLUGIN_LOADERS diff --git a/tests/backend/data/test_workspace_manager.py b/tests/backend/data/test_workspace_manager.py index 
d159324d..e2a00eb7 100644 --- a/tests/backend/data/test_workspace_manager.py +++ b/tests/backend/data/test_workspace_manager.py @@ -128,6 +128,38 @@ def test_sensitive_fields_stripped(self, manager): def test_load_nonexistent(self, manager): assert manager.load_session_state("nope") is None + def test_update_display_name_patches_session_state(self, manager): + """update_display_name writes both meta and session_state.""" + manager.create_workspace("ws") + manager.save_session_state("ws", { + "tables": [], + "activeWorkspace": {"id": "ws", "displayName": "Old Name"}, + }) + + manager.update_display_name("ws", "New Name") + + meta = json.loads( + (manager.get_workspace_path("ws") / WORKSPACE_META_FILENAME) + .read_text(encoding="utf-8") + ) + assert meta["displayName"] == "New Name" + + state = manager.load_session_state("ws") + assert state["activeWorkspace"]["displayName"] == "New Name" + + def test_update_display_name_skips_missing_session_state(self, manager): + """update_display_name does not error when session_state.json is absent.""" + manager.create_workspace("ws") + # No save_session_state call — file does not exist + + manager.update_display_name("ws", "Some Name") + + meta = json.loads( + (manager.get_workspace_path("ws") / WORKSPACE_META_FILENAME) + .read_text(encoding="utf-8") + ) + assert meta["displayName"] == "Some Name" + def test_save_to_nonexistent_workspace_raises(self, manager): with pytest.raises(ValueError, match="does not exist"): manager.save_session_state("nope", {"tables": []}) diff --git a/tests/backend/routes/test_chart_insight_route.py b/tests/backend/routes/test_chart_insight_route.py index 28cdece4..f6ba3c41 100644 --- a/tests/backend/routes/test_chart_insight_route.py +++ b/tests/backend/routes/test_chart_insight_route.py @@ -85,17 +85,6 @@ def test_missing_model_returns_error(self, client) -> None: assert data["status"] == "error" assert data["error"]["code"] == ErrorCode.INVALID_REQUEST - 
@patch("data_formulator.routes.agents.model_supports_vision", return_value=False) - def test_non_vision_model_returns_error(self, _mock_vision, client) -> None: - resp = client.post( - "/api/agent/chart-insight", - json=_valid_body(), - ) - data = resp.get_json() - assert data["status"] == "error" - assert data["error"]["code"] == ErrorCode.VALIDATION_ERROR - assert "vision" in data["error"]["message"].lower() - # --------------------------------------------------------------------------- # Success path @@ -103,7 +92,6 @@ def test_non_vision_model_returns_error(self, _mock_vision, client) -> None: class TestChartInsightSuccess: - @patch("data_formulator.routes.agents.model_supports_vision", return_value=True) @patch("data_formulator.routes.agents._get_knowledge_store") @patch("data_formulator.routes.agents.get_workspace") @patch("data_formulator.routes.agents.get_identity_id", return_value="test-user") @@ -116,7 +104,6 @@ def test_success_returns_title_and_takeaways( mock_get_identity, mock_get_workspace, mock_get_ks, - mock_vision, client, ) -> None: agent_instance = MagicMock() @@ -141,14 +128,13 @@ def test_success_returns_title_and_takeaways( class TestChartInsightAgentErrors: - @patch("data_formulator.routes.agents.model_supports_vision", return_value=True) @patch("data_formulator.routes.agents._get_knowledge_store") @patch("data_formulator.routes.agents.get_workspace") @patch("data_formulator.routes.agents.get_identity_id", return_value="test-user") @patch("data_formulator.routes.agents.get_client") @patch("data_formulator.routes.agents.ChartInsightAgent") def test_empty_candidates_returns_agent_error( - self, MockAgent, mock_client, mock_id, mock_ws, mock_ks, mock_vision, client, + self, MockAgent, mock_client, mock_id, mock_ws, mock_ks, client, ) -> None: MockAgent.return_value.run.return_value = [] @@ -157,14 +143,13 @@ def test_empty_candidates_returns_agent_error( assert data["status"] == "error" assert data["error"]["code"] == ErrorCode.AGENT_ERROR - 
@patch("data_formulator.routes.agents.model_supports_vision", return_value=True) @patch("data_formulator.routes.agents._get_knowledge_store") @patch("data_formulator.routes.agents.get_workspace") @patch("data_formulator.routes.agents.get_identity_id", return_value="test-user") @patch("data_formulator.routes.agents.get_client") @patch("data_formulator.routes.agents.ChartInsightAgent") def test_candidate_status_not_ok_returns_agent_error( - self, MockAgent, mock_client, mock_id, mock_ws, mock_ks, mock_vision, client, + self, MockAgent, mock_client, mock_id, mock_ws, mock_ks, client, ) -> None: MockAgent.return_value.run.return_value = [{"status": "error", "content": "parse fail"}] @@ -173,14 +158,13 @@ def test_candidate_status_not_ok_returns_agent_error( assert data["status"] == "error" assert data["error"]["code"] == ErrorCode.AGENT_ERROR - @patch("data_formulator.routes.agents.model_supports_vision", return_value=True) @patch("data_formulator.routes.agents._get_knowledge_store") @patch("data_formulator.routes.agents.get_workspace") @patch("data_formulator.routes.agents.get_identity_id", return_value="test-user") @patch("data_formulator.routes.agents.get_client") @patch("data_formulator.routes.agents.ChartInsightAgent") def test_llm_exception_returns_classified_error( - self, MockAgent, mock_client, mock_id, mock_ws, mock_ks, mock_vision, client, + self, MockAgent, mock_client, mock_id, mock_ws, mock_ks, client, ) -> None: exc = Exception("Error code: 401 - Unauthorized, invalid api key") MockAgent.return_value.run.side_effect = exc diff --git a/tests/backend/routes/test_data_loaders_discovery.py b/tests/backend/routes/test_data_loaders_discovery.py new file mode 100644 index 00000000..52ab084e --- /dev/null +++ b/tests/backend/routes/test_data_loaders_discovery.py @@ -0,0 +1,200 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. + +"""Integration test: plugin loaders appear in /api/data-loaders. 
+ +This is the end-to-end "drop a file → connector card appears" contract. +It guards against future refactors of the discovery endpoint silently +dropping plugins. +""" + +from __future__ import annotations + +import importlib +import sys +from pathlib import Path + +import flask +import pytest + +pytestmark = [pytest.mark.backend] + + +PLUGIN_BODY = '''\ +from data_formulator.data_loader.external_data_loader import ExternalDataLoader +import pyarrow as pa + + +class FakeWarehouseLoader(ExternalDataLoader): + DISPLAY_NAME = "Fake Warehouse" + + @staticmethod + def list_params(): + return [ + {"name": "endpoint", "type": "string", "required": True, + "tier": "connection", "description": "Server URL"}, + ] + + @staticmethod + def auth_instructions(): + return "Test plugin — no auth required." + + def __init__(self, params=None): + self.params = params or {} + + def list_tables(self, table_filter=None): + return [] + + def fetch_data_as_arrow(self, source_table, import_options=None): + return pa.table({}) +''' + + +def _reload_data_loader_module(): + """Purge cached state and reimport so the scanner runs against the + current env vars.""" + for mod_name in list(sys.modules): + if mod_name == "data_formulator.data_loader" or mod_name.startswith("df_plugin_"): + sys.modules.pop(mod_name, None) + return importlib.import_module("data_formulator.data_loader") + + +@pytest.fixture() +def client_with_plugin(tmp_path, monkeypatch): + """Spin up a Flask app whose data_loader registry includes a fake plugin.""" + plugins_dir = tmp_path / "plugins" + plugins_dir.mkdir() + (plugins_dir / "fake_warehouse_data_loader.py").write_text(PLUGIN_BODY) + + monkeypatch.setenv("DATA_FORMULATOR_HOME", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + monkeypatch.delenv("DF_PLUGIN_DIR", raising=False) + + dl = _reload_data_loader_module() + assert "fake_warehouse" in dl.PLUGIN_LOADERS, "plugin scanner didn't pick up fixture" + + # Mount just the discovery blueprint on a 
minimal Flask app. + from data_formulator.data_connector import connectors_bp + from data_formulator.error_handler import register_error_handlers + + test_app = flask.Flask(__name__) + test_app.config["TESTING"] = True + test_app.register_blueprint(connectors_bp) + register_error_handlers(test_app) + return test_app.test_client() + + +def _loaders_by_type(payload): + """Normalize the response envelope and return loaders keyed by type.""" + body = payload.get("data", payload) if isinstance(payload, dict) else payload + return {ldr["type"]: ldr for ldr in body["loaders"]} + + +def test_plugin_appears_in_discovery_endpoint(client_with_plugin): + resp = client_with_plugin.get("/api/data-loaders") + assert resp.status_code == 200 + + loaders = _loaders_by_type(resp.get_json()) + assert "fake_warehouse" in loaders + + plugin = loaders["fake_warehouse"] + # DISPLAY_NAME class attr overrides the title-cased default. + assert plugin["name"] == "Fake Warehouse" + # Source attribution: plugin file path surfaces in the API. + assert plugin["source"] == "plugin" + assert plugin["source_path"].endswith("fake_warehouse_data_loader.py") + # User-declared params come through (plus the common table_filter row). + param_names = {p["name"] for p in plugin["params"]} + assert "endpoint" in param_names + assert "table_filter" in param_names + # Auth instructions surface verbatim. + assert "Test plugin" in plugin["auth_instructions"] + + +def test_builtin_loader_marked_as_builtin(client_with_plugin): + """Built-ins must be tagged source='builtin' (regression guard).""" + resp = client_with_plugin.get("/api/data-loaders") + loaders = _loaders_by_type(resp.get_json()) + + # Find any built-in that's available in the test env. 
+ builtin_candidates = ["sample_datasets", "mysql", "postgresql", "s3"] + builtin = next((loaders[k] for k in builtin_candidates if k in loaders), None) + assert builtin is not None, "no built-in loader available in test env" + + assert builtin["source"] == "builtin" + assert builtin["source_path"] is None + + +def test_display_name_default_titlecases_registry_key(tmp_path, monkeypatch): + """A plugin without DISPLAY_NAME gets title-cased registry-key as name.""" + plugins_dir = tmp_path / "plugins" + plugins_dir.mkdir() + (plugins_dir / "my_thing_data_loader.py").write_text( + PLUGIN_BODY.replace('DISPLAY_NAME = "Fake Warehouse"', "") + .replace("FakeWarehouseLoader", "MyThingLoader") + ) + + monkeypatch.setenv("DATA_FORMULATOR_HOME", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + monkeypatch.delenv("DF_PLUGIN_DIR", raising=False) + + _reload_data_loader_module() + + from data_formulator.data_connector import connectors_bp + from data_formulator.error_handler import register_error_handlers + + test_app = flask.Flask(__name__) + test_app.config["TESTING"] = True + test_app.register_blueprint(connectors_bp) + register_error_handlers(test_app) + + resp = test_app.test_client().get("/api/data-loaders") + loaders = _loaders_by_type(resp.get_json()) + + assert loaders["my_thing"]["name"] == "My Thing" + + +def test_plugins_block_surfaces_loaded_and_rejected(tmp_path, monkeypatch): + """The discovery endpoint must expose a ``plugins`` summary containing + both successfully-loaded plugins and rejected override attempts. The + frontend uses this to render the security-status alert.""" + plugins_dir = tmp_path / "plugins" + plugins_dir.mkdir() + # One legitimate plugin and one that tries to override a built-in. 
+ (plugins_dir / "fake_warehouse_data_loader.py").write_text(PLUGIN_BODY) + (plugins_dir / "mysql_data_loader.py").write_text( + PLUGIN_BODY.replace("FakeWarehouseLoader", "BadMysql") + .replace('DISPLAY_NAME = "Fake Warehouse"', "") + ) + + monkeypatch.setenv("DATA_FORMULATOR_HOME", str(tmp_path)) + monkeypatch.setenv("WORKSPACE_BACKEND", "local") + monkeypatch.delenv("DF_PLUGIN_DIR", raising=False) + + _reload_data_loader_module() + + from data_formulator.data_connector import connectors_bp + from data_formulator.error_handler import register_error_handlers + + test_app = flask.Flask(__name__) + test_app.config["TESTING"] = True + test_app.register_blueprint(connectors_bp) + register_error_handlers(test_app) + + resp = test_app.test_client().get("/api/data-loaders") + body = resp.get_json() + body = body.get("data", body) + assert "plugins" in body + plugins = body["plugins"] + assert plugins["enabled"] is True + assert plugins["dir"].endswith("plugins") + # Loaded list contains the legitimate plugin. + loaded_types = {p["type"] for p in plugins["loaded"]} + assert "fake_warehouse" in loaded_types + # The mysql override attempt must NOT appear in loaded. + assert "mysql" not in loaded_types + # ...and must appear in the errors list with the override_builtin kind. + err_files = [Path(e["file"]).name for e in plugins["errors"]] + assert "mysql_data_loader.py" in err_files + kinds = {e["kind"] for e in plugins["errors"]} + assert "override_builtin" in kinds diff --git a/tests/backend/routes/test_data_loading_chat_route.py b/tests/backend/routes/test_data_loading_chat_route.py new file mode 100644 index 00000000..07f26108 --- /dev/null +++ b/tests/backend/routes/test_data_loading_chat_route.py @@ -0,0 +1,158 @@ +"""Tests for /api/agent/data-loading-chat route. + +Validates input validation, message forwarding (including image attachments), +and agent error handling via NDJSON streaming. 
+""" +from __future__ import annotations + +import json +from unittest.mock import MagicMock, patch + +import flask +import pytest + +from data_formulator.errors import AppError, ErrorCode + +pytestmark = [pytest.mark.backend] + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + +@pytest.fixture() +def app(): + test_app = flask.Flask(__name__) + test_app.config["TESTING"] = True + + from data_formulator.error_handler import register_error_handlers + from data_formulator.routes.agents import agent_bp + test_app.register_blueprint(agent_bp) + register_error_handlers(test_app) + + return test_app + + +@pytest.fixture() +def client(app): + return app.test_client() + + +def _valid_body(**overrides): + body = { + "model": {"provider": "openai", "model": "gpt-4o", "name": "gpt-4o"}, + "messages": [{"role": "user", "content": "load some data"}], + } + body.update(overrides) + return body + + +# --------------------------------------------------------------------------- +# Input validation +# --------------------------------------------------------------------------- + +class TestDataLoadingChatValidation: + + def test_non_json_request_returns_error(self, client) -> None: + resp = client.post( + "/api/agent/data-loading-chat", + data="not json", + content_type="text/plain", + ) + data = resp.get_json() + assert data["status"] == "error" + assert data["error"]["code"] == ErrorCode.INVALID_REQUEST + + +# --------------------------------------------------------------------------- +# Success path — messages (with and without images) reach the agent +# --------------------------------------------------------------------------- + +class TestDataLoadingChatSuccess: + + @patch("data_formulator.routes.agents._get_knowledge_store") + @patch("data_formulator.routes.agents.get_language_instruction", return_value="") + 
@patch("data_formulator.routes.agents.get_workspace") + @patch("data_formulator.routes.agents.get_identity_id", return_value="test-user") + @patch("data_formulator.routes.agents.get_client") + @patch("data_formulator.routes.agents.DataLoadingAgent") + def test_messages_forwarded_to_agent( + self, MockAgent, mock_client, mock_id, mock_ws, mock_lang, mock_ks, client, + ) -> None: + """Ensures the `messages` variable is properly parsed and passed to + agent.stream(). This is the regression test for the NameError bug where + `messages` was accidentally deleted.""" + agent_instance = MagicMock() + agent_instance.stream.return_value = iter([ + {"type": "status", "content": "done"}, + ]) + MockAgent.return_value = agent_instance + + body = _valid_body() + resp = client.post("/api/agent/data-loading-chat", json=body) + assert resp.status_code == 200 + + agent_instance.stream.assert_called_once() + forwarded_messages = agent_instance.stream.call_args[0][0] + assert forwarded_messages == body["messages"] + + @patch("data_formulator.routes.agents._get_knowledge_store") + @patch("data_formulator.routes.agents.get_language_instruction", return_value="") + @patch("data_formulator.routes.agents.get_workspace") + @patch("data_formulator.routes.agents.get_identity_id", return_value="test-user") + @patch("data_formulator.routes.agents.get_client") + @patch("data_formulator.routes.agents.DataLoadingAgent") + def test_image_messages_are_forwarded( + self, MockAgent, mock_client, mock_id, mock_ws, mock_lang, mock_ks, client, + ) -> None: + """Image attachments in messages must reach the agent without being + blocked by any pre-flight vision check.""" + agent_instance = MagicMock() + agent_instance.stream.return_value = iter([ + {"type": "status", "content": "done"}, + ]) + MockAgent.return_value = agent_instance + + messages_with_image = [{ + "role": "user", + "content": "extract data from this", + "attachments": [{"type": "image", "name": "img.png", "url": 
"data:image/png;base64,abc"}], + }] + resp = client.post( + "/api/agent/data-loading-chat", + json=_valid_body(messages=messages_with_image), + ) + assert resp.status_code == 200 + + forwarded = agent_instance.stream.call_args[0][0] + assert forwarded[0]["attachments"][0]["type"] == "image" + + +# --------------------------------------------------------------------------- +# Agent errors stream correctly +# --------------------------------------------------------------------------- + +class TestDataLoadingChatErrors: + + @patch("data_formulator.routes.agents._get_knowledge_store") + @patch("data_formulator.routes.agents.get_language_instruction", return_value="") + @patch("data_formulator.routes.agents.get_workspace") + @patch("data_formulator.routes.agents.get_identity_id", return_value="test-user") + @patch("data_formulator.routes.agents.get_client") + @patch("data_formulator.routes.agents.DataLoadingAgent") + def test_agent_exception_streams_error_event( + self, MockAgent, mock_client, mock_id, mock_ws, mock_lang, mock_ks, client, + ) -> None: + agent_instance = MagicMock() + agent_instance.stream.side_effect = Exception( + "Error code: 401 - Unauthorized" + ) + MockAgent.return_value = agent_instance + + resp = client.post("/api/agent/data-loading-chat", json=_valid_body()) + assert resp.status_code == 200 + + lines = [l for l in resp.data.decode().strip().split("\n") if l] + assert len(lines) >= 1 + event = json.loads(lines[-1]) + assert event.get("type") == "error" diff --git a/tests/backend/routes/test_session_export_import.py b/tests/backend/routes/test_session_export_import.py new file mode 100644 index 00000000..7b926138 --- /dev/null +++ b/tests/backend/routes/test_session_export_import.py @@ -0,0 +1,228 @@ +"""Unit tests for the export / import session routes. + +Verifies that the endpoints correctly route to the specified workspace_id +instead of relying on the X-Workspace-Id header. 
+""" + +from __future__ import annotations + +import io +import json +import zipfile +from unittest.mock import MagicMock, patch + +import flask +import pytest + +from data_formulator.routes.sessions import session_bp + +pytestmark = [pytest.mark.backend] + + +@pytest.fixture +def app(): + from data_formulator.error_handler import register_error_handlers + + _app = flask.Flask(__name__) + _app.config["TESTING"] = True + _app.register_blueprint(session_bp) + register_error_handlers(_app) + return _app + + +@pytest.fixture +def client(app): + return app.test_client() + + +def _make_zip_bytes(state: dict | None = None) -> bytes: + """Build a minimal session zip with a state.json entry.""" + buf = io.BytesIO() + with zipfile.ZipFile(buf, "w") as zf: + zf.writestr("state.json", json.dumps(state or {"tables": []})) + buf.seek(0) + return buf.read() + + +# ── Export ──────────────────────────────────────────────────────────────── + + +class TestExportSession: + def test_export_uses_workspace_id_from_body(self, client): + ws = MagicMock() + zip_buf = io.BytesIO(_make_zip_bytes()) + ws.export_session_zip.return_value = zip_buf + + mgr = MagicMock() + mgr.workspace_exists.return_value = True + mgr.open_workspace.return_value = ws + + with ( + patch("data_formulator.routes.sessions._is_ephemeral", return_value=False), + patch("data_formulator.routes.sessions.get_identity_id", return_value="user:alice"), + patch("data_formulator.routes.sessions.get_workspace_manager", return_value=mgr), + patch("data_formulator.datalake.workspace_manager._strip_sensitive", side_effect=lambda s: s), + ): + resp = client.post( + "/api/sessions/export", + json={"state": {"tables": []}, "workspace_id": "ws-123"}, + ) + + assert resp.status_code == 200 + assert resp.content_type.startswith("application/zip") + mgr.open_workspace.assert_called_once_with("ws-123", "user:alice") + + def test_export_rejects_missing_workspace_id(self, client): + with ( + 
patch("data_formulator.routes.sessions._is_ephemeral", return_value=False), + patch("data_formulator.routes.sessions.get_identity_id", return_value="user:alice"), + ): + resp = client.post( + "/api/sessions/export", + json={"state": {"tables": []}}, + ) + + assert resp.status_code == 200 + body = resp.get_json() + assert body["status"] == "error" + assert body["error"]["code"] == "INVALID_REQUEST" + assert "workspace_id" in body["error"]["message"] + + def test_export_rejects_missing_state(self, client): + with ( + patch("data_formulator.routes.sessions._is_ephemeral", return_value=False), + patch("data_formulator.routes.sessions.get_identity_id", return_value="user:alice"), + ): + resp = client.post( + "/api/sessions/export", + json={"workspace_id": "ws-123"}, + ) + + assert resp.status_code == 200 + body = resp.get_json() + assert body["status"] == "error" + assert body["error"]["code"] == "INVALID_REQUEST" + + def test_export_returns_error_for_unknown_workspace(self, client): + mgr = MagicMock() + mgr.workspace_exists.return_value = False + + with ( + patch("data_formulator.routes.sessions._is_ephemeral", return_value=False), + patch("data_formulator.routes.sessions.get_identity_id", return_value="user:alice"), + patch("data_formulator.routes.sessions.get_workspace_manager", return_value=mgr), + ): + resp = client.post( + "/api/sessions/export", + json={"state": {"tables": []}, "workspace_id": "nonexistent"}, + ) + + assert resp.status_code == 200 + body = resp.get_json() + assert body["status"] == "error" + assert body["error"]["code"] == "TABLE_NOT_FOUND" + + def test_export_rejected_in_ephemeral_mode(self, client): + with ( + patch("data_formulator.routes.sessions._is_ephemeral", return_value=True), + ): + resp = client.post( + "/api/sessions/export", + json={"state": {"tables": []}, "workspace_id": "ws-1"}, + ) + + assert resp.status_code == 200 + body = resp.get_json() + assert body["status"] == "error" + assert body["error"]["code"] == "INVALID_REQUEST" + 
+ +# ── Import ──────────────────────────────────────────────────────────────── + + +class TestImportSession: + def test_import_creates_workspace_when_not_existing(self, client): + ws = MagicMock() + ws.import_session_zip.return_value = { + "tables": [], + "activeWorkspace": {"id": "ws-new", "displayName": "MySession"}, + } + + mgr = MagicMock() + mgr.workspace_exists.return_value = False + mgr.create_and_open_workspace.return_value = ws + + with ( + patch("data_formulator.routes.sessions._is_ephemeral", return_value=False), + patch("data_formulator.routes.sessions.get_identity_id", return_value="user:bob"), + patch("data_formulator.routes.sessions.get_workspace_manager", return_value=mgr), + ): + data = {"file": (io.BytesIO(_make_zip_bytes()), "session.zip")} + resp = client.post( + "/api/sessions/import", + data={**data, "workspace_id": "ws-new"}, + content_type="multipart/form-data", + ) + + assert resp.status_code == 200 + body = resp.get_json() + assert body["status"] == "success" + mgr.create_and_open_workspace.assert_called_once_with("ws-new", "user:bob") + + def test_import_opens_existing_workspace(self, client): + ws = MagicMock() + ws.import_session_zip.return_value = {"tables": []} + + mgr = MagicMock() + mgr.workspace_exists.return_value = True + mgr.open_workspace.return_value = ws + + with ( + patch("data_formulator.routes.sessions._is_ephemeral", return_value=False), + patch("data_formulator.routes.sessions.get_identity_id", return_value="user:bob"), + patch("data_formulator.routes.sessions.get_workspace_manager", return_value=mgr), + ): + data = {"file": (io.BytesIO(_make_zip_bytes()), "session.zip")} + resp = client.post( + "/api/sessions/import", + data={**data, "workspace_id": "ws-existing"}, + content_type="multipart/form-data", + ) + + assert resp.status_code == 200 + mgr.open_workspace.assert_called_once_with("ws-existing", "user:bob") + + def test_import_falls_back_to_active_workspace(self, client): + ws = MagicMock() + 
ws.import_session_zip.return_value = {"tables": []} + + with ( + patch("data_formulator.routes.sessions._is_ephemeral", return_value=False), + patch("data_formulator.routes.sessions.get_identity_id", return_value="user:bob"), + patch("data_formulator.routes.sessions.get_workspace", return_value=ws), + ): + data = {"file": (io.BytesIO(_make_zip_bytes()), "session.zip")} + resp = client.post( + "/api/sessions/import", + data=data, + content_type="multipart/form-data", + ) + + assert resp.status_code == 200 + body = resp.get_json() + assert body["status"] == "success" + + def test_import_rejects_missing_file(self, client): + with ( + patch("data_formulator.routes.sessions._is_ephemeral", return_value=False), + ): + resp = client.post( + "/api/sessions/import", + data={"workspace_id": "ws-1"}, + content_type="multipart/form-data", + ) + + assert resp.status_code == 200 + body = resp.get_json() + assert body["status"] == "error" + assert body["error"]["code"] == "INVALID_REQUEST" diff --git a/tests/frontend/unit/app/OidcCallback.test.tsx b/tests/frontend/unit/app/OidcCallback.test.tsx new file mode 100644 index 00000000..14db85ea --- /dev/null +++ b/tests/frontend/unit/app/OidcCallback.test.tsx @@ -0,0 +1,98 @@ +/** + * Tests for OidcCallback component error-parameter handling. + * + * When the IdP redirects back with ?error=access_denied (user cancelled), + * the component must redirect to /?auth_error=access_denied instead of + * attempting signinRedirectCallback(). 
+ */ +import React from "react"; +import "@testing-library/jest-dom/vitest"; +import { describe, it, expect, vi, beforeEach } from "vitest"; +import { render, waitFor } from "@testing-library/react"; +import { ThemeProvider, createTheme } from "@mui/material/styles"; + +const mocks = vi.hoisted(() => ({ + getUserManager: vi.fn(), + signinRedirectCallback: vi.fn(), + signinRedirect: vi.fn(), +})); + +vi.mock("../../../../src/app/oidcConfig", () => ({ + getUserManager: () => mocks.getUserManager(), +})); + +vi.mock("react-i18next", () => ({ + useTranslation: () => ({ + t: (key: string, opts?: any) => opts?.message ? `${key}: ${opts.message}` : key, + }), +})); + +import { OidcCallback } from "../../../../src/app/OidcCallback"; + +const theme = createTheme(); +let locationHref = "/auth/callback"; + +function renderCallback() { + return render( + + + , + ); +} + +beforeEach(() => { + vi.clearAllMocks(); + + const mgr = { + signinRedirectCallback: mocks.signinRedirectCallback, + signinRedirect: mocks.signinRedirect, + }; + mocks.getUserManager.mockResolvedValue(mgr); + mocks.signinRedirectCallback.mockResolvedValue({}); + + locationHref = "/auth/callback"; + Object.defineProperty(window, "location", { + writable: true, + configurable: true, + value: { + get href() { return locationHref; }, + set href(v: string) { locationHref = v; }, + search: "", + origin: "http://localhost:3000", + }, + }); +}); + +describe("OidcCallback error handling", () => { + it("redirects to /?auth_error=access_denied when IdP returns error", async () => { + (window.location as any).search = "?error=access_denied&state=abc"; + + renderCallback(); + + await waitFor(() => { + expect(locationHref).toBe("/?auth_error=access_denied"); + }); + expect(mocks.signinRedirectCallback).not.toHaveBeenCalled(); + }); + + it("redirects with encoded error for other IdP error values", async () => { + (window.location as any).search = "?error=consent_required&state=abc"; + + renderCallback(); + + await waitFor(() 
=> { + expect(locationHref).toBe("/?auth_error=consent_required"); + }); + expect(mocks.signinRedirectCallback).not.toHaveBeenCalled(); + }); + + it("proceeds with signinRedirectCallback when no error param", async () => { + (window.location as any).search = "?code=auth-code&state=abc"; + + renderCallback(); + + await waitFor(() => { + expect(mocks.signinRedirectCallback).toHaveBeenCalled(); + }); + }); +}); diff --git a/tests/frontend/unit/lib/agents-chart/vegalite/barTableFacet.test.ts b/tests/frontend/unit/lib/agents-chart/vegalite/barTableFacet.test.ts new file mode 100644 index 00000000..65524e97 --- /dev/null +++ b/tests/frontend/unit/lib/agents-chart/vegalite/barTableFacet.test.ts @@ -0,0 +1,121 @@ +import { describe, expect, it } from 'vitest'; +import { compile } from 'vega-lite'; +import { assembleVegaLite } from '../../../../../../src/lib/agents-chart'; + +const canvasSize = { width: 400, height: 300 }; + +const encoding = { + y: { field: 'agency' }, + x: { field: 'launches' }, + column: { field: 'agency_type' }, +}; + +const semanticTypes = { + agency: 'Category', + launches: 'Quantity', + agency_type: 'Category', +}; + +describe('Vega-Lite Bar Table facets', () => { + it('hoists column facets around the hconcat bar table and wraps them', () => { + const data = [ + { agency_type: 'state', agency: 'RVSN', launches: 1528 }, + { agency_type: 'state', agency: 'UNKS', launches: 904 }, + { agency_type: 'state', agency: 'NASA', launches: 469 }, + { agency_type: 'private', agency: 'Arianespace', launches: 258 }, + { agency_type: 'private', agency: 'ILS-K', launches: 97 }, + { agency_type: 'startup', agency: 'SpaceX', launches: 65 }, + ]; + + const spec = assembleVegaLite({ + data: { values: data }, + semantic_types: semanticTypes, + chart_spec: { + chartType: 'Bar Table', + encodings: encoding, + canvasSize, + }, + }); + + expect(spec.facet).toEqual({ field: 'agency_type', type: 'nominal', sort: null }); + expect(spec.columns).toBe(2); + 
expect(spec.hconcat).toBeUndefined(); + expect(spec.spec.hconcat).toHaveLength(2); + expect(spec.spec.hconcat[0].data).toBeUndefined(); + expect(spec.data).toEqual({ name: '__bt_displayTable' }); + expect(spec.resolve.scale.y).toBe('independent'); + expect(spec.spec.resolve.scale.y).toBe('shared'); + expect(spec.spec.hconcat[0].width).toBeLessThan(canvasSize.width); + expect(spec.spec.hconcat[0].height).toBeLessThan(canvasSize.height); + expect(spec.spec.hconcat[0].encoding.y.axis.labelFontSize).toBeLessThan(13); + expect(spec.spec.hconcat[1].mark.fontSize).toBeLessThan(12); + + expect(() => compile(spec)).not.toThrow(); + }); + + it('rolls rows up within each facet without creating an undefined facet', () => { + const data = [ + { agency_type: 'state', agency: 'RVSN', launches: 100 }, + { agency_type: 'state', agency: 'UNKS', launches: 90 }, + { agency_type: 'state', agency: 'NASA', launches: 80 }, + { agency_type: 'state', agency: 'USAF', launches: 70 }, + { agency_type: 'private', agency: 'Arianespace', launches: 60 }, + { agency_type: 'private', agency: 'ILS-K', launches: 50 }, + { agency_type: 'private', agency: 'ULA', launches: 40 }, + { agency_type: 'private', agency: 'Boeing', launches: 30 }, + ]; + + const spec = assembleVegaLite({ + data: { values: data }, + semantic_types: semanticTypes, + chart_spec: { + chartType: 'Bar Table', + encodings: encoding, + canvasSize, + chartProperties: { maxRows: 3 }, + }, + }); + + const displayRows = spec.datasets.__bt_displayTable; + expect(displayRows).toHaveLength(6); + expect(displayRows.filter((row: any) => row.__bt_others)).toEqual([ + expect.objectContaining({ agency_type: 'state', agency: 'Others (+2)', launches: 150 }), + expect.objectContaining({ agency_type: 'private', agency: 'Others (+2)', launches: 70 }), + ]); + expect(displayRows.every((row: any) => row.agency_type === 'state' || row.agency_type === 'private')).toBe(true); + }); + + it('computes percentage totals within each facet', () => { + const data = 
[ + { agency_type: 'state', agency: 'RVSN', launches: 100 }, + { agency_type: 'state', agency: 'UNKS', launches: 50 }, + { agency_type: 'private', agency: 'Arianespace', launches: 40 }, + { agency_type: 'private', agency: 'ILS-K', launches: 10 }, + ]; + + const spec = assembleVegaLite({ + data: { values: data }, + semantic_types: semanticTypes, + chart_spec: { + chartType: 'Bar Table', + encodings: { ...encoding, color: { field: 'agency_type' } }, + canvasSize, + chartProperties: { showPercent: true }, + }, + }); + + expect(spec.spec.hconcat).toHaveLength(3); + const percentPanel = spec.spec.hconcat[1]; + expect(percentPanel.transform[0].groupby).toEqual(['agency_type', 'agency']); + expect(percentPanel.transform[1]).toEqual({ + joinaggregate: [{ op: 'sum', field: '__bt_val', as: '__bt_total' }], + groupby: ['agency_type'], + }); + expect(percentPanel.transform[2]).toEqual({ + calculate: 'datum.__bt_total === 0 ? null : datum.__bt_val / datum.__bt_total', + as: '__bt_pct', + }); + + expect(() => compile(spec)).not.toThrow(); + }); +}); diff --git a/tests/frontend/unit/views/ClarificationPanel.test.tsx b/tests/frontend/unit/views/ClarificationPanel.test.tsx index 9b22c156..86274f34 100644 --- a/tests/frontend/unit/views/ClarificationPanel.test.tsx +++ b/tests/frontend/unit/views/ClarificationPanel.test.tsx @@ -2,9 +2,13 @@ import React from 'react'; import '@testing-library/jest-dom/vitest'; import { fireEvent, render, screen } from '@testing-library/react'; import { describe, expect, it, vi } from 'vitest'; -import { ClarificationPanel } from '../../../../src/views/ClarificationPanel'; +import { ClarificationPanel } from '../../../../src/views/AgentPausePanel'; vi.mock('react-i18next', () => ({ + // The panel now lives in `AgentPausePanel.tsx` which transitively pulls + // in `dfSlice` → `i18n/index` → `.use(initReactI18next)`. Provide a no-op + // plugin shim so the i18n init code path succeeds under the mock. 
+ initReactI18next: { type: '3rdParty', init: () => {} }, useTranslation: () => ({ t: (key: string, params?: Record) => { const labels: Record = { diff --git a/tests/frontend/unit/views/checkIsLikelyTextOnlyModel.test.ts b/tests/frontend/unit/views/checkIsLikelyTextOnlyModel.test.ts deleted file mode 100644 index 5e32b2e4..00000000 --- a/tests/frontend/unit/views/checkIsLikelyTextOnlyModel.test.ts +++ /dev/null @@ -1,58 +0,0 @@ -import { describe, it, expect } from 'vitest'; -import { checkIsLikelyTextOnlyModel, checkModelSupportsImageInput } from '../../../../src/views/DataLoadingChat'; - -describe('checkIsLikelyTextOnlyModel', () => { - it('returns true for deepseek-chat', () => { - expect(checkIsLikelyTextOnlyModel('deepseek-chat')).toBe(true); - }); - - it('returns true for DeepSeek-Chat (case-insensitive)', () => { - expect(checkIsLikelyTextOnlyModel('DeepSeek-Chat')).toBe(true); - }); - - it('returns true when deepseek-chat is a substring', () => { - expect(checkIsLikelyTextOnlyModel('provider/deepseek-chat-v2')).toBe(true); - }); - - it('returns false for gpt-4o (multimodal)', () => { - expect(checkIsLikelyTextOnlyModel('gpt-4o')).toBe(false); - }); - - it('returns false for claude-sonnet-4-20250514', () => { - expect(checkIsLikelyTextOnlyModel('claude-sonnet-4-20250514')).toBe(false); - }); - - it('returns false for gemini-2.5-pro', () => { - expect(checkIsLikelyTextOnlyModel('gemini-2.5-pro')).toBe(false); - }); - - it('returns false for deepseek-reasoner (vision-capable)', () => { - expect(checkIsLikelyTextOnlyModel('deepseek-reasoner')).toBe(false); - }); - - it('returns false for undefined', () => { - expect(checkIsLikelyTextOnlyModel(undefined)).toBe(false); - }); - - it('returns false for empty string', () => { - expect(checkIsLikelyTextOnlyModel('')).toBe(false); - }); -}); - -describe('checkModelSupportsImageInput', () => { - it('returns false when the model is explicitly marked text-only', () => { - expect(checkModelSupportsImageInput({ model: 
'gpt-4o', supports_vision: false })).toBe(false); - }); - - it('returns false for known text-only model names', () => { - expect(checkModelSupportsImageInput({ model: 'deepseek-chat' })).toBe(false); - }); - - it('returns true for likely multimodal models', () => { - expect(checkModelSupportsImageInput({ model: 'gpt-4o' })).toBe(true); - }); - - it('returns false without an active model', () => { - expect(checkModelSupportsImageInput(undefined)).toBe(false); - }); -}); diff --git a/tests/test_plan.md b/tests/test_plan.md index 1ed6ddc0..4f485cd9 100644 --- a/tests/test_plan.md +++ b/tests/test_plan.md @@ -223,7 +223,6 @@ multi-user + no sandbox → `logger.critical` warning, safe configs → no warni - `DataRecAgent` — given a mock LLM response, produces correct Vega-Lite spec + code - `DataTransformationAgent` — generates valid Python transformation code - `DataLoadAgent` — infers semantic types and suggests table names from raw data -- `DataCleanAgentStream` — streams cleaning suggestions - `CodeExplanationAgent` — produces explanation from code input - `ChartInsightAgent` — generates insight from chart spec - `DataAgent` — observe→think→act loop terminates correctly diff --git a/uv.lock b/uv.lock index 9e9e72d8..6a9c320e 100644 --- a/uv.lock +++ b/uv.lock @@ -579,7 +579,7 @@ wheels = [ [[package]] name = "data-formulator" -version = "0.7.0a2" +version = "0.7.0" source = { editable = "." 
} dependencies = [ { name = "azure-cosmos" }, @@ -1274,11 +1274,11 @@ wheels = [ [[package]] name = "idna" -version = "3.11" +version = "3.15" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/6f/6d/0703ccc57f3a7233505399edb88de3cbd678da106337b9fcde432b65ed60/idna-3.11.tar.gz", hash = "sha256:795dafcc9c04ed0c1fb032c2aa73654d8e8c5023a7df64a53f39190ada629902", size = 194582, upload-time = "2025-10-12T14:55:20.501Z" } +sdist = { url = "https://files.pythonhosted.org/packages/82/77/7b3966d0b9d1d31a36ddf1746926a11dface89a83409bf1483f0237aa758/idna-3.15.tar.gz", hash = "sha256:ca962446ea538f7092a95e057da437618e886f4d349216d2b1e294abfdb65fdc", size = 199245, upload-time = "2026-05-12T22:45:57.011Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/0e/61/66938bbb5fc52dbdf84594873d5b51fb1f7c7794e9c0f5bd885f30bc507b/idna-3.11-py3-none-any.whl", hash = "sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea", size = 71008, upload-time = "2025-10-12T14:55:18.883Z" }, + { url = "https://files.pythonhosted.org/packages/d2/23/408243171aa9aaba178d3e2559159c24c1171a641aa83b67bdd3394ead8e/idna-3.15-py3-none-any.whl", hash = "sha256:048adeaf8c2d788c40fee287673ccaa74c24ffd8dcf09ffa555a2fbb59f10ac8", size = 72340, upload-time = "2026-05-12T22:45:55.733Z" }, ] [[package]] diff --git a/yarn.lock b/yarn.lock index 08783c24..f44d9bf0 100644 --- a/yarn.lock +++ b/yarn.lock @@ -131,10 +131,10 @@ "@babel/helper-string-parser" "^7.27.1" "@babel/helper-validator-identifier" "^7.28.5" -"@base-ui/utils@^0.2.6": - version "0.2.7" - resolved "https://registry.npmjs.org/@base-ui/utils/-/utils-0.2.7.tgz#a7a57d08af6d02a905c0ca5e87b6dde85597046d" - integrity sha512-nXYKhiL/0JafyJE8PfcflipGftOftlIwKd72rU15iZ1M5yqgg5J9P8NHU71GReDuXco5MJA/eVQqUT5WRqX9sA== +"@base-ui/utils@^0.2.8": + version "0.2.9" + resolved "https://registry.npmjs.org/@base-ui/utils/-/utils-0.2.9.tgz" + integrity 
sha512-x/PDDCYzoqPpjrdyb3VcyylTI2IjUXEtYDGi5foh7KsnmNJIIaVwA2GLgDH1dps1GgXiJbA60hM+AyuTfQzIvw== dependencies: "@babel/runtime" "^7.29.2" "@floating-ui/utils" "^0.2.11" @@ -290,132 +290,132 @@ "@esbuild/aix-ppc64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/aix-ppc64/-/aix-ppc64-0.27.7.tgz#82b74f92aa78d720b714162939fb248c90addf53" + resolved "https://registry.npmjs.org/@esbuild/aix-ppc64/-/aix-ppc64-0.27.7.tgz" integrity sha512-EKX3Qwmhz1eMdEJokhALr0YiD0lhQNwDqkPYyPhiSwKrh7/4KRjQc04sZ8db+5DVVnZ1LmbNDI1uAMPEUBnQPg== "@esbuild/android-arm64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/android-arm64/-/android-arm64-0.27.7.tgz#f78cb8a3121fc205a53285adb24972db385d185d" + resolved "https://registry.npmjs.org/@esbuild/android-arm64/-/android-arm64-0.27.7.tgz" integrity sha512-62dPZHpIXzvChfvfLJow3q5dDtiNMkwiRzPylSCfriLvZeq0a1bWChrGx/BbUbPwOrsWKMn8idSllklzBy+dgQ== "@esbuild/android-arm@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/android-arm/-/android-arm-0.27.7.tgz#593e10a1450bbfcac6cb321f61f468453bac209d" + resolved "https://registry.npmjs.org/@esbuild/android-arm/-/android-arm-0.27.7.tgz" integrity sha512-jbPXvB4Yj2yBV7HUfE2KHe4GJX51QplCN1pGbYjvsyCZbQmies29EoJbkEc+vYuU5o45AfQn37vZlyXy4YJ8RQ== "@esbuild/android-x64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/android-x64/-/android-x64-0.27.7.tgz#453143d073326033d2d22caf9e48de4bae274b07" + resolved "https://registry.npmjs.org/@esbuild/android-x64/-/android-x64-0.27.7.tgz" integrity sha512-x5VpMODneVDb70PYV2VQOmIUUiBtY3D3mPBG8NxVk5CogneYhkR7MmM3yR/uMdITLrC1ml/NV1rj4bMJuy9MCg== "@esbuild/darwin-arm64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/darwin-arm64/-/darwin-arm64-0.27.7.tgz#6f23000fb9b40b7e04b7d0606c0693bd0632f322" + resolved "https://registry.npmjs.org/@esbuild/darwin-arm64/-/darwin-arm64-0.27.7.tgz" integrity 
sha512-5lckdqeuBPlKUwvoCXIgI2D9/ABmPq3Rdp7IfL70393YgaASt7tbju3Ac+ePVi3KDH6N2RqePfHnXkaDtY9fkw== "@esbuild/darwin-x64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/darwin-x64/-/darwin-x64-0.27.7.tgz#27393dd18bb1263c663979c5f1576e00c2d024be" + resolved "https://registry.npmjs.org/@esbuild/darwin-x64/-/darwin-x64-0.27.7.tgz" integrity sha512-rYnXrKcXuT7Z+WL5K980jVFdvVKhCHhUwid+dDYQpH+qu+TefcomiMAJpIiC2EM3Rjtq0sO3StMV/+3w3MyyqQ== "@esbuild/freebsd-arm64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/freebsd-arm64/-/freebsd-arm64-0.27.7.tgz#22e4638fa502d1c0027077324c97640e3adf3a62" + resolved "https://registry.npmjs.org/@esbuild/freebsd-arm64/-/freebsd-arm64-0.27.7.tgz" integrity sha512-B48PqeCsEgOtzME2GbNM2roU29AMTuOIN91dsMO30t+Ydis3z/3Ngoj5hhnsOSSwNzS+6JppqWsuhTp6E82l2w== "@esbuild/freebsd-x64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/freebsd-x64/-/freebsd-x64-0.27.7.tgz#9224b8e4fea924ce2194e3efc3e9aebf822192d6" + resolved "https://registry.npmjs.org/@esbuild/freebsd-x64/-/freebsd-x64-0.27.7.tgz" integrity sha512-jOBDK5XEjA4m5IJK3bpAQF9/Lelu/Z9ZcdhTRLf4cajlB+8VEhFFRjWgfy3M1O4rO2GQ/b2dLwCUGpiF/eATNQ== "@esbuild/linux-arm64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/linux-arm64/-/linux-arm64-0.27.7.tgz#4f5d1c27527d817b35684ae21419e57c2bda0966" + resolved "https://registry.npmjs.org/@esbuild/linux-arm64/-/linux-arm64-0.27.7.tgz" integrity sha512-RZPHBoxXuNnPQO9rvjh5jdkRmVizktkT7TCDkDmQ0W2SwHInKCAV95GRuvdSvA7w4VMwfCjUiPwDi0ZO6Nfe9A== "@esbuild/linux-arm@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/linux-arm/-/linux-arm-0.27.7.tgz#b9e9d070c8c1c0449cf12b20eac37d70a4595921" + resolved "https://registry.npmjs.org/@esbuild/linux-arm/-/linux-arm-0.27.7.tgz" integrity sha512-RkT/YXYBTSULo3+af8Ib0ykH8u2MBh57o7q/DAs3lTJlyVQkgQvlrPTnjIzzRPQyavxtPtfg0EopvDyIt0j1rA== "@esbuild/linux-ia32@0.27.7": version "0.27.7" - resolved 
"https://registry.npmjs.org/@esbuild/linux-ia32/-/linux-ia32-0.27.7.tgz#3f80fb696aa96051a94047f35c85b08b21c36f9e" + resolved "https://registry.npmjs.org/@esbuild/linux-ia32/-/linux-ia32-0.27.7.tgz" integrity sha512-GA48aKNkyQDbd3KtkplYWT102C5sn/EZTY4XROkxONgruHPU72l+gW+FfF8tf2cFjeHaRbWpOYa/uRBz/Xq1Pg== "@esbuild/linux-loong64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/linux-loong64/-/linux-loong64-0.27.7.tgz#9be1f2c28210b13ebb4156221bba356fe1675205" + resolved "https://registry.npmjs.org/@esbuild/linux-loong64/-/linux-loong64-0.27.7.tgz" integrity sha512-a4POruNM2oWsD4WKvBSEKGIiWQF8fZOAsycHOt6JBpZ+JN2n2JH9WAv56SOyu9X5IqAjqSIPTaJkqN8F7XOQ5Q== "@esbuild/linux-mips64el@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/linux-mips64el/-/linux-mips64el-0.27.7.tgz#4ab5ee67a3dfcbcb5e8fd7883dae6e735b1163b8" + resolved "https://registry.npmjs.org/@esbuild/linux-mips64el/-/linux-mips64el-0.27.7.tgz" integrity sha512-KabT5I6StirGfIz0FMgl1I+R1H73Gp0ofL9A3nG3i/cYFJzKHhouBV5VWK1CSgKvVaG4q1RNpCTR2LuTVB3fIw== "@esbuild/linux-ppc64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/linux-ppc64/-/linux-ppc64-0.27.7.tgz#dac78c689f6499459c4321e5c15032c12307e7ea" + resolved "https://registry.npmjs.org/@esbuild/linux-ppc64/-/linux-ppc64-0.27.7.tgz" integrity sha512-gRsL4x6wsGHGRqhtI+ifpN/vpOFTQtnbsupUF5R5YTAg+y/lKelYR1hXbnBdzDjGbMYjVJLJTd2OFmMewAgwlQ== "@esbuild/linux-riscv64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/linux-riscv64/-/linux-riscv64-0.27.7.tgz#050f7d3b355c3a98308e935bc4d6325da91b0027" + resolved "https://registry.npmjs.org/@esbuild/linux-riscv64/-/linux-riscv64-0.27.7.tgz" integrity sha512-hL25LbxO1QOngGzu2U5xeXtxXcW+/GvMN3ejANqXkxZ/opySAZMrc+9LY/WyjAan41unrR3YrmtTsUpwT66InQ== "@esbuild/linux-s390x@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/linux-s390x/-/linux-s390x-0.27.7.tgz#d61f715ce61d43fe5844ad0d8f463f88cbe4fef6" + resolved 
"https://registry.npmjs.org/@esbuild/linux-s390x/-/linux-s390x-0.27.7.tgz" integrity sha512-2k8go8Ycu1Kb46vEelhu1vqEP+UeRVj2zY1pSuPdgvbd5ykAw82Lrro28vXUrRmzEsUV0NzCf54yARIK8r0fdw== "@esbuild/linux-x64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/linux-x64/-/linux-x64-0.27.7.tgz#ca8e1aa478fc8209257bf3ac8f79c4dc2982f32a" + resolved "https://registry.npmjs.org/@esbuild/linux-x64/-/linux-x64-0.27.7.tgz" integrity sha512-hzznmADPt+OmsYzw1EE33ccA+HPdIqiCRq7cQeL1Jlq2gb1+OyWBkMCrYGBJ+sxVzve2ZJEVeePbLM2iEIZSxA== "@esbuild/netbsd-arm64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/netbsd-arm64/-/netbsd-arm64-0.27.7.tgz#1650f2c1b948deeb3ef948f2fc30614723c09690" + resolved "https://registry.npmjs.org/@esbuild/netbsd-arm64/-/netbsd-arm64-0.27.7.tgz" integrity sha512-b6pqtrQdigZBwZxAn1UpazEisvwaIDvdbMbmrly7cDTMFnw/+3lVxxCTGOrkPVnsYIosJJXAsILG9XcQS+Yu6w== "@esbuild/netbsd-x64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/netbsd-x64/-/netbsd-x64-0.27.7.tgz#65772ab342c4b3319bf0705a211050aac1b6e320" + resolved "https://registry.npmjs.org/@esbuild/netbsd-x64/-/netbsd-x64-0.27.7.tgz" integrity sha512-OfatkLojr6U+WN5EDYuoQhtM+1xco+/6FSzJJnuWiUw5eVcicbyK3dq5EeV/QHT1uy6GoDhGbFpprUiHUYggrw== "@esbuild/openbsd-arm64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/openbsd-arm64/-/openbsd-arm64-0.27.7.tgz#37ed7cfa66549d7955852fce37d0c3de4e715ea1" + resolved "https://registry.npmjs.org/@esbuild/openbsd-arm64/-/openbsd-arm64-0.27.7.tgz" integrity sha512-AFuojMQTxAz75Fo8idVcqoQWEHIXFRbOc1TrVcFSgCZtQfSdc1RXgB3tjOn/krRHENUB4j00bfGjyl2mJrU37A== "@esbuild/openbsd-x64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/openbsd-x64/-/openbsd-x64-0.27.7.tgz#01bf3d385855ef50cb33db7c4b52f957c34cd179" + resolved "https://registry.npmjs.org/@esbuild/openbsd-x64/-/openbsd-x64-0.27.7.tgz" integrity 
sha512-+A1NJmfM8WNDv5CLVQYJ5PshuRm/4cI6WMZRg1by1GwPIQPCTs1GLEUHwiiQGT5zDdyLiRM/l1G0Pv54gvtKIg== "@esbuild/openharmony-arm64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/openharmony-arm64/-/openharmony-arm64-0.27.7.tgz#6c1f94b34086599aabda4eac8f638294b9877410" + resolved "https://registry.npmjs.org/@esbuild/openharmony-arm64/-/openharmony-arm64-0.27.7.tgz" integrity sha512-+KrvYb/C8zA9CU/g0sR6w2RBw7IGc5J2BPnc3dYc5VJxHCSF1yNMxTV5LQ7GuKteQXZtspjFbiuW5/dOj7H4Yw== "@esbuild/sunos-x64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/sunos-x64/-/sunos-x64-0.27.7.tgz#4b0dd17ae0a6941d2d0fd35a906392517071a90d" + resolved "https://registry.npmjs.org/@esbuild/sunos-x64/-/sunos-x64-0.27.7.tgz" integrity sha512-ikktIhFBzQNt/QDyOL580ti9+5mL/YZeUPKU2ivGtGjdTYoqz6jObj6nOMfhASpS4GU4Q/Clh1QtxWAvcYKamA== "@esbuild/win32-arm64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/win32-arm64/-/win32-arm64-0.27.7.tgz#34193ab5565d6ff68ca928ac04be75102ccb2e77" + resolved "https://registry.npmjs.org/@esbuild/win32-arm64/-/win32-arm64-0.27.7.tgz" integrity sha512-7yRhbHvPqSpRUV7Q20VuDwbjW5kIMwTHpptuUzV+AA46kiPze5Z7qgt6CLCK3pWFrHeNfDd1VKgyP4O+ng17CA== "@esbuild/win32-ia32@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/win32-ia32/-/win32-ia32-0.27.7.tgz#eb67f0e4482515d8c1894ede631c327a4da9fc4d" + resolved "https://registry.npmjs.org/@esbuild/win32-ia32/-/win32-ia32-0.27.7.tgz" integrity sha512-SmwKXe6VHIyZYbBLJrhOoCJRB/Z1tckzmgTLfFYOfpMAx63BJEaL9ExI8x7v0oAO3Zh6D/Oi1gVxEYr5oUCFhw== "@esbuild/win32-x64@0.27.7": version "0.27.7" - resolved "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.27.7.tgz#8fe30b3088b89b4873c3a6cc87597ae3920c0a8b" + resolved "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.27.7.tgz" integrity sha512-56hiAJPhwQ1R4i+21FVF7V8kSD5zZTdHcVuRFMW0hn753vVfQN8xlx4uOPT4xoGH0Z/oVATuR82AiqSTDIpaHg== "@eslint-community/eslint-utils@^4.8.0", 
"@eslint-community/eslint-utils@^4.9.1": @@ -681,14 +681,14 @@ "@mui/types@^9.0.0": version "9.0.0" - resolved "https://registry.npmjs.org/@mui/types/-/types-9.0.0.tgz#92d8c64e72cb863ee59108cb20cc476d648a3ab9" + resolved "https://registry.npmjs.org/@mui/types/-/types-9.0.0.tgz" integrity sha512-i1cuFCAWN44b3AJWO7mh7tuh1sqbQSeVr/94oG0TX5uXivac8XalgE4/6fQZcmGZigzbQ35IXxj/4jLpRIBYZg== dependencies: "@babel/runtime" "^7.29.2" "@mui/utils@9.0.0": version "9.0.0" - resolved "https://registry.npmjs.org/@mui/utils/-/utils-9.0.0.tgz#25b563ccbf537feba5f89c37a00cb8e6eea45ad0" + resolved "https://registry.npmjs.org/@mui/utils/-/utils-9.0.0.tgz" integrity sha512-bQcqyg/gjULUqTuyUjSAFr6LQGLvtkNtDbJerAtoUn9kGZ0hg5QJiN1PLHMLbeFpe3te1831uq7GFl2ITokGdg== dependencies: "@babel/runtime" "^7.29.2" @@ -710,25 +710,25 @@ prop-types "^15.8.1" react-is "^19.2.3" -"@mui/x-internals@^9.0.0": - version "9.0.0" - resolved "https://registry.npmjs.org/@mui/x-internals/-/x-internals-9.0.0.tgz#8851a058e09b719690b4f398319805239e923855" - integrity sha512-E/4rdg69JjhyybpPGypCjAKSKLLnSdCFM+O6P/nkUg47+qt3uftxQEhjQO53rcn6ahHl6du/uNZ9BLgeY6kYxQ== +"@mui/x-internals@^9.1.0": + version "9.1.0" + resolved "https://registry.npmjs.org/@mui/x-internals/-/x-internals-9.1.0.tgz" + integrity sha512-fVezTa1lU+Hb3y9UMI8D/iWXADhs0I8PaZqoh2LOUXjGEUJmKqwsRD19ZXInZsH2yu+YS0dqYMPDvzjYTTyo+Q== dependencies: - "@babel/runtime" "^7.28.6" + "@babel/runtime" "^7.29.2" "@mui/utils" "9.0.0" reselect "^5.1.1" use-sync-external-store "^1.6.0" "@mui/x-tree-view@^9.0.1": - version "9.0.1" - resolved "https://registry.npmjs.org/@mui/x-tree-view/-/x-tree-view-9.0.1.tgz#0bc1feb262ef5648ec44f00b6e1125ddb34bcbb8" - integrity sha512-vUSxqg5dAbaHJsifc/PjFhf57vQWjMSK9cJKPy1SXcAFHWhXSJ1oCHpUwfQ7hjKvkeqf8emKxtCVMXNVRRhIWw== + version "9.1.0" + resolved "https://registry.npmjs.org/@mui/x-tree-view/-/x-tree-view-9.1.0.tgz" + integrity sha512-Pht6/VS5cmKBCKr7MfbvS+XY4vPKpzDEKZVaSpUiv4ClnSsm9kJeSbzziA1MM1yeEmJx40XyNqZKUCslLO76kw== 
dependencies: - "@babel/runtime" "^7.28.6" - "@base-ui/utils" "^0.2.6" + "@babel/runtime" "^7.29.2" + "@base-ui/utils" "^0.2.8" "@mui/utils" "9.0.0" - "@mui/x-internals" "^9.0.0" + "@mui/x-internals" "^9.1.0" "@types/react-transition-group" "^4.4.12" clsx "^2.1.1" prop-types "^15.8.1" @@ -830,7 +830,7 @@ "@react-dnd/asap@^4.0.0": version "4.0.1" - resolved "https://registry.yarnpkg.com/@react-dnd/asap/-/asap-4.0.1.tgz#5291850a6b58ce6f2da25352a64f1b0674871aab" + resolved "https://registry.npmjs.org/@react-dnd/asap/-/asap-4.0.1.tgz" integrity sha512-kLy0PJDDwvwwTXxqTFNAAllPHD73AycE9ypWeln/IguoGBEbvFcPDbCV03G52bEcC5E+YgupBE0VzHGdC8SIXg== "@react-dnd/asap@^5.0.1": @@ -840,7 +840,7 @@ "@react-dnd/invariant@^2.0.0": version "2.0.0" - resolved "https://registry.yarnpkg.com/@react-dnd/invariant/-/invariant-2.0.0.tgz#09d2e81cd39e0e767d7da62df9325860f24e517e" + resolved "https://registry.npmjs.org/@react-dnd/invariant/-/invariant-2.0.0.tgz" integrity sha512-xL4RCQBCBDJ+GRwKTFhGUW8GXa4yoDfJrPbLblc3U09ciS+9ZJXJ3Qrcs/x2IODOdIE5kQxvMmE2UKyqUictUw== "@react-dnd/invariant@^4.0.1": @@ -850,7 +850,7 @@ "@react-dnd/shallowequal@^2.0.0": version "2.0.0" - resolved "https://registry.yarnpkg.com/@react-dnd/shallowequal/-/shallowequal-2.0.0.tgz#a3031eb54129f2c66b2753f8404266ec7bf67f0a" + resolved "https://registry.npmjs.org/@react-dnd/shallowequal/-/shallowequal-2.0.0.tgz" integrity sha512-Pc/AFTdwZwEKJxFJvlxrSmGe/di+aAOBn60sremrpLo6VI/6cmiUYNNwlI5KNYttg7uypzA3ILPMPgxB2GYZEg== "@react-dnd/shallowequal@^4.0.1": @@ -883,135 +883,135 @@ resolved "https://registry.npmjs.org/@rolldown/pluginutils/-/pluginutils-1.0.0-beta.27.tgz" integrity sha512-+d0F4MKMCbeVUJwG96uQ4SgAznZNSq93I3V+9NHA4OpvqG8mRCpGdKmK8l/dl02h2CCDHwW2FqilnTyDcAnqjA== -"@rollup/rollup-android-arm-eabi@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-android-arm-eabi/-/rollup-android-arm-eabi-4.60.3.tgz#31503ca40424374cd6c5198031cf4d5a73de9727" - integrity 
sha512-x35CNW/ANXG3hE/EZpRU8MXX1JDN86hBb2wMGAtltkz7pc6cxgjpy1OMMfDosOQ+2hWqIkag/fGok1Yady9nGw== - -"@rollup/rollup-android-arm64@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-android-arm64/-/rollup-android-arm64-4.60.3.tgz#7cbc30c88507013d0f982cfeb8884337ba1e0bb2" - integrity sha512-xw3xtkDApIOGayehp2+Rz4zimfkaX65r4t47iy+ymQB2G4iJCBBfj0ogVg5jpvjpn8UWn/+q9tprxleYeNp3Hw== - -"@rollup/rollup-darwin-arm64@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-darwin-arm64/-/rollup-darwin-arm64-4.60.3.tgz#bc341a93bb2111326a2865f55d1d23baedecf40c" - integrity sha512-vo6Y5Qfpx7/5EaamIwi0WqW2+zfiusVihKatLvtN1VFVy3D13uERk/6gZLU1UiHRL6fDXqj/ELIeVRGnvcTE1g== - -"@rollup/rollup-darwin-x64@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-darwin-x64/-/rollup-darwin-x64-4.60.3.tgz#dfa0236581c55ecc0bcaeb2ea1f2e800c58dc3e2" - integrity sha512-D+0QGcZhBzTN82weOnsSlY7V7+RMmPuF1CkbxyMAGE8+ZHeUjyb76ZiWmBlCu//AQQONvxcqRbwZTajZKqjuOw== - -"@rollup/rollup-freebsd-arm64@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-freebsd-arm64/-/rollup-freebsd-arm64-4.60.3.tgz#4c5977413b87808a13b5edd524e46fafddb85b52" - integrity sha512-6HnvHCT7fDyj6R0Ph7A6x8dQS/S38MClRWeDLqc0MdfWkxjiu1HSDYrdPhqSILzjTIC/pnXbbJbo+ft+gy/9hQ== - -"@rollup/rollup-freebsd-x64@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-freebsd-x64/-/rollup-freebsd-x64-4.60.3.tgz#5cb2cee62ffee3ada4a0b44353e96cf98cfc7c3c" - integrity sha512-KHLgC3WKlUYW3ShFKnnosZDOJ0xjg9zp7au3sIm2bs/tGBeC2ipmvRh/N7JKi0t9Ue20C0dpEshi8WUubg+cnA== - -"@rollup/rollup-linux-arm-gnueabihf@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-arm-gnueabihf/-/rollup-linux-arm-gnueabihf-4.60.3.tgz#04700cad36dd43ae81044fe7ee73e925845c4b85" - integrity sha512-DV6fJoxEYWJOvaZIsok7KrYl0tPvga5OZ2yvKHNNYyk/2roMLqQAbGhr78EQ5YhHpnhLKJD3S1WFusAkmUuV5g== - 
-"@rollup/rollup-linux-arm-musleabihf@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-arm-musleabihf/-/rollup-linux-arm-musleabihf-4.60.3.tgz#548ebf3997b3a6dcc7cdd7da813ff0c46000ac0a" - integrity sha512-mQKoJAzvuOs6F+TZybQO4GOTSMUu7v0WdxEk24krQ/uUxXoPTtHjuaUuPmFhtBcM4K0ons8nrE3JyhTuCFtT/w== - -"@rollup/rollup-linux-arm64-gnu@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-arm64-gnu/-/rollup-linux-arm64-gnu-4.60.3.tgz#0264608f504b33725639ebe93be02c40e71a35c1" - integrity sha512-Whjj2qoiJ6+OOJMGptTYazaJvjOJm+iKHpXQM1P3LzGjt7Ff++Tp7nH4N8J/BUA7R9IHfDyx4DJIflifwnbmIA== - -"@rollup/rollup-linux-arm64-musl@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-arm64-musl/-/rollup-linux-arm64-musl-4.60.3.tgz#147cf4889502cd3b331a800b8ca6741f87873079" - integrity sha512-4YTNHKqGng5+yiZt3mg77nmyuCfmNfX4fPmyUapBcIk+BdwSwmCWGXOUxhXbBEkFHtoN5boLj/5NON+u5QC9tg== - -"@rollup/rollup-linux-loong64-gnu@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-loong64-gnu/-/rollup-linux-loong64-gnu-4.60.3.tgz#0c27c6b5258dcb3d0290e3bd04ba6277c9d7e541" - integrity sha512-SU3kNlhkpI4UqlUc2VXPGK9o886ZsSeGfMAX2ba2b8DKmMXq4AL7KUrkSWVbb7koVqx41Yczx6dx5PNargIrEA== - -"@rollup/rollup-linux-loong64-musl@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-loong64-musl/-/rollup-linux-loong64-musl-4.60.3.tgz#f0f18075ea0bfa2c992f8e3933b39b6ef91f7799" - integrity sha512-6lDLl5h4TXpB1mTf2rQWnAk/LcXrx9vBfu/DT5TIPhvMhRWaZ5MxkIc8u4lJAmBo6klTe1ywXIUHFjylW505sg== - -"@rollup/rollup-linux-ppc64-gnu@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-ppc64-gnu/-/rollup-linux-ppc64-gnu-4.60.3.tgz#149bb5cb8893589ffaa1924b4eac4282e9fa4c69" - integrity sha512-BMo8bOw8evlup/8G+cj5xWtPyp93xPdyoSN16Zy90Q2QZ0ZYRhCt6ZJSwbrRzG9HApFabjwj2p25TUPDWrhzqQ== - -"@rollup/rollup-linux-ppc64-musl@4.60.3": - version 
"4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-ppc64-musl/-/rollup-linux-ppc64-musl-4.60.3.tgz#200a063e298b05f996917d2aa53de749d54c0ca0" - integrity sha512-E0L8X1dZN1/Rph+5VPF6Xj2G7JJvMACVXtamTJIDrVI44Y3K+G8gQaMEAavbqCGTa16InptiVrX6eM6pmJ+7qA== - -"@rollup/rollup-linux-riscv64-gnu@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-gnu/-/rollup-linux-riscv64-gnu-4.60.3.tgz#6d6d6eb996197ba86f95f9a6c442bc862f0756d4" - integrity sha512-oZJ/WHaVfHUiRAtmTAeo3DcevNsVvH8mbvodjZy7D5QKvCefO371SiKRpxoDcCxB3PTRTLayWBkvmDQKTcX/sw== - -"@rollup/rollup-linux-riscv64-musl@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-musl/-/rollup-linux-riscv64-musl-4.60.3.tgz#9deb86001785cfcbc761457f50cd7c112fda0df9" - integrity sha512-Dhbyh7j9FybM3YaTgaHmVALwA8AkUwTPccyCQ79TG9AJUsMQqgN1DDEZNr4+QUfwiWvLDumW5vdwzoeUF+TNxQ== - -"@rollup/rollup-linux-s390x-gnu@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-s390x-gnu/-/rollup-linux-s390x-gnu-4.60.3.tgz#d8228720c6e42da190d96c31a3495d70cf8284b9" - integrity sha512-cJd1X5XhHHlltkaypz1UcWLA8AcoIi1aWhsvaWDskD1oz2eKCypnqvTQ8ykMNI0RSmm7NkTdSqSSD7zM0xa6Ig== - -"@rollup/rollup-linux-x64-gnu@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-x64-gnu/-/rollup-linux-x64-gnu-4.60.3.tgz#df6bb38617a66a842bd2aeac9560cd729d084258" - integrity sha512-DAZDBHQfG2oQuhY7mc6I3/qB4LU2fQCjRvxbDwd/Jdvb9fypP4IJ4qmtu6lNjes6B531AI8cg1aKC2di97bUxA== +"@rollup/rollup-android-arm-eabi@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-android-arm-eabi/-/rollup-android-arm-eabi-4.60.0.tgz#7e158ddfc16f78da99c0d5ccbae6cae403ef3284" + integrity sha512-WOhNW9K8bR3kf4zLxbfg6Pxu2ybOUbB2AjMDHSQx86LIF4rH4Ft7vmMwNt0loO0eonglSNy4cpD3MKXXKQu0/A== + +"@rollup/rollup-android-arm64@4.60.0": + version "4.60.0" + resolved 
"https://registry.npmjs.org/@rollup/rollup-android-arm64/-/rollup-android-arm64-4.60.0.tgz#49f4ae0e22b6f9ffbcd3818b9a0758fa2d10b1cd" + integrity sha512-u6JHLll5QKRvjciE78bQXDmqRqNs5M/3GVqZeMwvmjaNODJih/WIrJlFVEihvV0MiYFmd+ZyPr9wxOVbPAG2Iw== + +"@rollup/rollup-darwin-arm64@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-darwin-arm64/-/rollup-darwin-arm64-4.60.0.tgz" + integrity sha512-qEF7CsKKzSRc20Ciu2Zw1wRrBz4g56F7r/vRwY430UPp/nt1x21Q/fpJ9N5l47WWvJlkNCPJz3QRVw008fi7yA== + +"@rollup/rollup-darwin-x64@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-darwin-x64/-/rollup-darwin-x64-4.60.0.tgz#1bf7a92b27ebdd5e0d1d48503c7811160773be1a" + integrity sha512-WADYozJ4QCnXCH4wPB+3FuGmDPoFseVCUrANmA5LWwGmC6FL14BWC7pcq+FstOZv3baGX65tZ378uT6WG8ynTw== + +"@rollup/rollup-freebsd-arm64@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-freebsd-arm64/-/rollup-freebsd-arm64-4.60.0.tgz#5ccf537b99c5175008444702193ad0b1c36f7f16" + integrity sha512-6b8wGHJlDrGeSE3aH5mGNHBjA0TTkxdoNHik5EkvPHCt351XnigA4pS7Wsj/Eo9Y8RBU6f35cjN9SYmCFBtzxw== + +"@rollup/rollup-freebsd-x64@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-freebsd-x64/-/rollup-freebsd-x64-4.60.0.tgz#1196ecd7bf4e128624ef83cd1f9d785114474a77" + integrity sha512-h25Ga0t4jaylMB8M/JKAyrvvfxGRjnPQIR8lnCayyzEjEOx2EJIlIiMbhpWxDRKGKF8jbNH01NnN663dH638mA== + +"@rollup/rollup-linux-arm-gnueabihf@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-arm-gnueabihf/-/rollup-linux-arm-gnueabihf-4.60.0.tgz#cc147633a4af229fee83a737bf2334fbac3dc28e" + integrity sha512-RzeBwv0B3qtVBWtcuABtSuCzToo2IEAIQrcyB/b2zMvBWVbjo8bZDjACUpnaafaxhTw2W+imQbP2BD1usasK4g== + +"@rollup/rollup-linux-arm-musleabihf@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-arm-musleabihf/-/rollup-linux-arm-musleabihf-4.60.0.tgz#3559f9f060153ea54594a42c3b87a297bedcc26e" + 
integrity sha512-Sf7zusNI2CIU1HLzuu9Tc5YGAHEZs5Lu7N1ssJG4Tkw6e0MEsN7NdjUDDfGNHy2IU+ENyWT+L2obgWiguWibWQ== + +"@rollup/rollup-linux-arm64-gnu@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-arm64-gnu/-/rollup-linux-arm64-gnu-4.60.0.tgz#e91f887b154123485cfc4b59befe2080fcd8f2df" + integrity sha512-DX2x7CMcrJzsE91q7/O02IJQ5/aLkVtYFryqCjduJhUfGKG6yJV8hxaw8pZa93lLEpPTP/ohdN4wFz7yp/ry9A== + +"@rollup/rollup-linux-arm64-musl@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-arm64-musl/-/rollup-linux-arm64-musl-4.60.0.tgz#660752f040df9ba44a24765df698928917c0bf21" + integrity sha512-09EL+yFVbJZlhcQfShpswwRZ0Rg+z/CsSELFCnPt3iK+iqwGsI4zht3secj5vLEs957QvFFXnzAT0FFPIxSrkQ== + +"@rollup/rollup-linux-loong64-gnu@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-loong64-gnu/-/rollup-linux-loong64-gnu-4.60.0.tgz#cb0e939a5fa479ccef264f3f45b31971695f869c" + integrity sha512-i9IcCMPr3EXm8EQg5jnja0Zyc1iFxJjZWlb4wr7U2Wx/GrddOuEafxRdMPRYVaXjgbhvqalp6np07hN1w9kAKw== + +"@rollup/rollup-linux-loong64-musl@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-loong64-musl/-/rollup-linux-loong64-musl-4.60.0.tgz#42f86fbc82cd1a81be2d346476dd3231cf5ee442" + integrity sha512-DGzdJK9kyJ+B78MCkWeGnpXJ91tK/iKA6HwHxF4TAlPIY7GXEvMe8hBFRgdrR9Ly4qebR/7gfUs9y2IoaVEyog== + +"@rollup/rollup-linux-ppc64-gnu@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-ppc64-gnu/-/rollup-linux-ppc64-gnu-4.60.0.tgz#39776a647a789dc95ea049277c5ef8f098df77f9" + integrity sha512-RwpnLsqC8qbS8z1H1AxBA1H6qknR4YpPR9w2XX0vo2Sz10miu57PkNcnHVaZkbqyw/kUWfKMI73jhmfi9BRMUQ== + +"@rollup/rollup-linux-ppc64-musl@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-ppc64-musl/-/rollup-linux-ppc64-musl-4.60.0.tgz#466f20029a8e8b3bb2954c7ddebc9586420cac2c" + integrity 
sha512-Z8pPf54Ly3aqtdWC3G4rFigZgNvd+qJlOE52fmko3KST9SoGfAdSRCwyoyG05q1HrrAblLbk1/PSIV+80/pxLg== + +"@rollup/rollup-linux-riscv64-gnu@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-gnu/-/rollup-linux-riscv64-gnu-4.60.0.tgz#cff9877c78f12e7aa6246f6902ad913e99edb2b7" + integrity sha512-3a3qQustp3COCGvnP4SvrMHnPQ9d1vzCakQVRTliaz8cIp/wULGjiGpbcqrkv0WrHTEp8bQD/B3HBjzujVWLOA== + +"@rollup/rollup-linux-riscv64-musl@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-riscv64-musl/-/rollup-linux-riscv64-musl-4.60.0.tgz#9a762fb99b5a82a921017f56491b7e892b9fb17d" + integrity sha512-pjZDsVH/1VsghMJ2/kAaxt6dL0psT6ZexQVrijczOf+PeP2BUqTHYejk3l6TlPRydggINOeNRhvpLa0AYpCWSQ== + +"@rollup/rollup-linux-s390x-gnu@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-s390x-gnu/-/rollup-linux-s390x-gnu-4.60.0.tgz#9d25ad8ac7dab681935baf78ac5ea92d14629cdf" + integrity sha512-3ObQs0BhvPgiUVZrN7gqCSvmFuMWvWvsjG5ayJ3Lraqv+2KhOsp+pUbigqbeWqueGIsnn+09HBw27rJ+gYK4VQ== + +"@rollup/rollup-linux-x64-gnu@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-x64-gnu/-/rollup-linux-x64-gnu-4.60.0.tgz#5e5139e11819fa38a052368da79422cb4afcf466" + integrity sha512-EtylprDtQPdS5rXvAayrNDYoJhIz1/vzN2fEubo3yLE7tfAw+948dO0g4M0vkTVFhKojnF+n6C8bDNe+gDRdTg== "@rollup/rollup-linux-x64-gnu@^4.24.4": - version "4.60.1" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-x64-gnu/-/rollup-linux-x64-gnu-4.60.1.tgz#56a6a0d9076f2a05a976031493b24a20ddcc0e77" - integrity sha512-77PpsFQUCOiZR9+LQEFg9GClyfkNXj1MP6wRnzYs0EeWbPcHs02AXu4xuUbM1zhwn3wqaizle3AEYg5aeoohhg== - -"@rollup/rollup-linux-x64-musl@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-linux-x64-musl/-/rollup-linux-x64-musl-4.60.3.tgz#75e3e72849266b4fdd65f2da6c62423051e35636" - integrity 
sha512-cRxsE8c13mZOh3vP+wLDxpQBRrOHDIGOWyDL93Sy0Ga8y515fBcC2pjUfFwUe5T7tqvTvWbCpg1URM/AXdWIXA== - -"@rollup/rollup-openbsd-x64@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-openbsd-x64/-/rollup-openbsd-x64-4.60.3.tgz#e1080f0efb8b15cda39b3e62de5fb806079ab6e9" - integrity sha512-QaWcIgRxqEdQdhJqW4DJctsH6HCmo5vHxY0krHSX4jMtOqfzC+dqDGuHM87bu4H8JBeibWx7jFz+h6/4C8wA5Q== - -"@rollup/rollup-openharmony-arm64@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-openharmony-arm64/-/rollup-openharmony-arm64-4.60.3.tgz#1fbda2d95c29dbfceb62785431754cd5aab86c72" - integrity sha512-AaXwSvUi3QIPtroAUw1t5yHGIyqKEXwH54WUocFolZhpGDruJcs8c+xPNDRn4XiQsS7MEwnYsHW2l0MBLDMkWg== - -"@rollup/rollup-win32-arm64-msvc@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-win32-arm64-msvc/-/rollup-win32-arm64-msvc-4.60.3.tgz#deab3470815f97996f1d0d3608549cf1b7e4ffc2" - integrity sha512-65LAKM/bAWDqKNEelHlcHvm2V+Vfb8C6INFxQXRHCvaVN1rJfwr4NvdP4FyzUaLqWfaCGaadf6UbTm8xJeYfEg== - -"@rollup/rollup-win32-ia32-msvc@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-win32-ia32-msvc/-/rollup-win32-ia32-msvc-4.60.3.tgz#817acae2ed4572960b59235ff2322381b6d82f26" - integrity sha512-EEM2gyhBF5MFnI6vMKdX1LAosE627RGBzIoGMdLloPZkXrUN0Ckqgr2Qi8+J3zip/8NVVro3/FjB+tjhZUgUHA== - -"@rollup/rollup-win32-x64-gnu@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-win32-x64-gnu/-/rollup-win32-x64-gnu-4.60.3.tgz#48129be99b0250d76b9c6d0ac983bef563a1c48a" - integrity sha512-E5Eb5H/DpxaoXH++Qkv28RcUJboMopmdDUALBczvHMf7hNIxaDZqwY5lK12UK1BHacSmvupoEWGu+n993Z0y1A== - -"@rollup/rollup-win32-x64-msvc@4.60.3": - version "4.60.3" - resolved "https://registry.npmjs.org/@rollup/rollup-win32-x64-msvc/-/rollup-win32-x64-msvc-4.60.3.tgz#cc6f094a3ffe5556bb4a831ee6fb572b8cd81a75" - integrity sha512-hPt/bgL5cE+Qp+/TPHBqptcAgPzgj46mPcg/16zNUmbQk0j+mOEQV/+Lqu8QRtDV3Ek95Q6FeFITpuhl6OTsAA== + 
version "4.60.4" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-x64-gnu/-/rollup-linux-x64-gnu-4.60.4.tgz#23c9bf79771d804fb87415eb0767569f273261e5" + integrity sha512-Boiz5+MsaROEWDf+GGEwF8VMHGhlUoQMtIPjOgA5fv4osupqTVnJteQNKJwUcnUog2G55jYXH7KZFFiJe0TEzQ== + +"@rollup/rollup-linux-x64-musl@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-linux-x64-musl/-/rollup-linux-x64-musl-4.60.0.tgz#b6211d46e11b1f945f5504cc794fce839331ed08" + integrity sha512-k09oiRCi/bHU9UVFqD17r3eJR9bn03TyKraCrlz5ULFJGdJGi7VOmm9jl44vOJvRJ6P7WuBi/s2A97LxxHGIdw== + +"@rollup/rollup-openbsd-x64@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-openbsd-x64/-/rollup-openbsd-x64-4.60.0.tgz#e6e09eebaa7012bb9c7331b437a9e992bd94ca35" + integrity sha512-1o/0/pIhozoSaDJoDcec+IVLbnRtQmHwPV730+AOD29lHEEo4F5BEUB24H0OBdhbBBDwIOSuf7vgg0Ywxdfiiw== + +"@rollup/rollup-openharmony-arm64@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-openharmony-arm64/-/rollup-openharmony-arm64-4.60.0.tgz#f7d99ae857032498e57a5e7259fb7100fd24a87e" + integrity sha512-pESDkos/PDzYwtyzB5p/UoNU/8fJo68vcXM9ZW2V0kjYayj1KaaUfi1NmTUTUpMn4UhU4gTuK8gIaFO4UGuMbA== + +"@rollup/rollup-win32-arm64-msvc@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-win32-arm64-msvc/-/rollup-win32-arm64-msvc-4.60.0.tgz#41e392f5d9f3bf1253fdaf2f6d6f6b1bfc452856" + integrity sha512-hj1wFStD7B1YBeYmvY+lWXZ7ey73YGPcViMShYikqKT1GtstIKQAtfUI6yrzPjAy/O7pO0VLXGmUVWXQMaYgTQ== + +"@rollup/rollup-win32-ia32-msvc@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-win32-ia32-msvc/-/rollup-win32-ia32-msvc-4.60.0.tgz#f41b0490be0e5d3cf459b4dc076a192b532adea9" + integrity sha512-SyaIPFoxmUPlNDq5EHkTbiKzmSEmq/gOYFI/3HHJ8iS/v1mbugVa7dXUzcJGQfoytp9DJFLhHH4U3/eTy2Bq4w== + +"@rollup/rollup-win32-x64-gnu@4.60.0": + version "4.60.0" + resolved 
"https://registry.npmjs.org/@rollup/rollup-win32-x64-gnu/-/rollup-win32-x64-gnu-4.60.0.tgz#0fcf9f1fcb750f0317b13aac3b3231687e6397a5" + integrity sha512-RdcryEfzZr+lAr5kRm2ucN9aVlCCa2QNq4hXelZxb8GG0NJSazq44Z3PCCc8wISRuCVnGs0lQJVX5Vp6fKA+IA== + +"@rollup/rollup-win32-x64-msvc@4.60.0": + version "4.60.0" + resolved "https://registry.npmjs.org/@rollup/rollup-win32-x64-msvc/-/rollup-win32-x64-msvc-4.60.0.tgz#3afdb30405f6d4248df5e72e1ca86c5eab55fab8" + integrity sha512-PrsWNQ8BuE00O3Xsx3ALh2Df8fAj9+cvvX9AIA6o4KpATR98c9mud4XtDWVvsEuyia5U4tVSTKygawyJkjm60w== "@standard-schema/spec@^1.1.0": version "1.1.0" @@ -1263,6 +1263,26 @@ resolved "https://registry.npmjs.org/@tiptap/extension-strike/-/extension-strike-3.22.2.tgz" integrity sha512-YFC3elKU1L8PiGbcB6tqd/7vWPF5IbydJz0POJpHzSjstX+VfT8VsvS7ubxVuSIWQ11kGkH3mzX6LX8JHsHZxg== +"@tiptap/extension-table-cell@^3.23.6": + version "3.23.6" + resolved "https://registry.npmjs.org/@tiptap/extension-table-cell/-/extension-table-cell-3.23.6.tgz" + integrity sha512-hS9TmmvRlT9/ikT+0ukACS+hmJuii4zQaH47cg3oJkz/Fv7O7tL7GZniKtK6l2OUZGPhY+4SV2RkDB6bD7DXfw== + +"@tiptap/extension-table-header@^3.23.6": + version "3.23.6" + resolved "https://registry.npmjs.org/@tiptap/extension-table-header/-/extension-table-header-3.23.6.tgz" + integrity sha512-D6o0a1cJXUU0xWakainBFGPnGHinQkPcdu1YqGd/PoFANY38lnuZt/NW2O/OLfLXu5LXDRfpqF1+dsKww27dUA== + +"@tiptap/extension-table-row@^3.23.6": + version "3.23.6" + resolved "https://registry.npmjs.org/@tiptap/extension-table-row/-/extension-table-row-3.23.6.tgz" + integrity sha512-OauWVzkyRQg0rKOqM/a3PuKPc1S7YXMb1LRN7Nh8Ytvglvd7GFRTbl1lVqdZRaz4Jzopag4PQnriIZfMPUpxWw== + +"@tiptap/extension-table@^3.23.6": + version "3.23.6" + resolved "https://registry.npmjs.org/@tiptap/extension-table/-/extension-table-3.23.6.tgz" + integrity sha512-XbhZXjhsS6AP7ThoZxjAnNs+NiR81YRori25l6E+ORqB7quiPkIXOAi5h4AIpkn/CYIqze6ere11lWsYpDjtaQ== + "@tiptap/extension-text@^3.22.2": version "3.22.2" resolved 
"https://registry.npmjs.org/@tiptap/extension-text/-/extension-text-3.22.2.tgz" @@ -1569,7 +1589,7 @@ "@types/debug@^4.0.0": version "4.1.13" - resolved "https://registry.npmjs.org/@types/debug/-/debug-4.1.13.tgz#22d1cc9d542d3593caea764f974306ab36286ee7" + resolved "https://registry.npmjs.org/@types/debug/-/debug-4.1.13.tgz" integrity sha512-KSVgmQmzMwPlmtljOomayoR89W4FynCAi3E8PPs7vmDVPe84hT+vGPKkJfThkmXs0x0jAaa9U8uW8bbfyS2fWw== dependencies: "@types/ms" "*" @@ -1588,7 +1608,7 @@ "@types/estree-jsx@^1.0.0": version "1.0.5" - resolved "https://registry.npmjs.org/@types/estree-jsx/-/estree-jsx-1.0.5.tgz#858a88ea20f34fe65111f005a689fa1ebf70dc18" + resolved "https://registry.npmjs.org/@types/estree-jsx/-/estree-jsx-1.0.5.tgz" integrity sha512-52CcUVNFyfb1A2ALocQw/Dd1BQFNmSdkuC3BkZ6iqhdMfQz7JWOFRuJFloOzjk+6WijU56m9oKXFAXc7o3Towg== dependencies: "@types/estree" "*" @@ -1605,7 +1625,7 @@ "@types/hast@^3.0.0": version "3.0.4" - resolved "https://registry.npmjs.org/@types/hast/-/hast-3.0.4.tgz#1d6b39993b82cea6ad783945b0508c25903e15aa" + resolved "https://registry.npmjs.org/@types/hast/-/hast-3.0.4.tgz" integrity sha512-WPs+bbQw5aCj+x6laNGWLH3wviHtoCv/P3+otBhbOhJgG8qtpdAMlTCxLtsTWA7LH1Oh/bFCHsBn0TPS5m30EQ== dependencies: "@types/unist" "*" @@ -1655,7 +1675,7 @@ "@types/mdast@^4.0.0": version "4.0.4" - resolved "https://registry.npmjs.org/@types/mdast/-/mdast-4.0.4.tgz#7ccf72edd2f1aa7dd3437e180c64373585804dd6" + resolved "https://registry.npmjs.org/@types/mdast/-/mdast-4.0.4.tgz" integrity sha512-kGaNbPh1k7AFzgpud/gMdvIm5xuECykRR+JnWKQno9TAXVa6WIVCGTPvYGekIDL4uwCZQSYbUxNBSb1aUo79oA== dependencies: "@types/unist" "*" @@ -1672,7 +1692,7 @@ "@types/ms@*": version "2.1.0" - resolved "https://registry.npmjs.org/@types/ms/-/ms-2.1.0.tgz#052aa67a48eccc4309d7f0191b7e41434b90bb78" + resolved "https://registry.npmjs.org/@types/ms/-/ms-2.1.0.tgz" integrity sha512-GsCCIZDE/p3i96vtEqx+7dBUGXrc7zeSK3wwPHIaRThS+9OhWIXRqzs4d6k1SVU8g91DrNRWxWUGhp5KXQb2VA== "@types/node@^14.0.1": @@ -1721,19 
+1741,12 @@ "@types/react-window@^1.8": version "1.8.8" - resolved "https://registry.yarnpkg.com/@types/react-window/-/react-window-1.8.8.tgz#c20645414d142364fbe735818e1c1e0a145696e3" + resolved "https://registry.npmjs.org/@types/react-window/-/react-window-1.8.8.tgz" integrity sha512-8Ls660bHR1AUA2kuRvVG9D/4XpRC6wjAaPT9dil7Ckc76eP9TKWZwwmgfq8Q1LANX3QNDnoU4Zp48A3w+zK69Q== dependencies: "@types/react" "*" -"@types/react@*": - version "19.2.14" - resolved "https://registry.npmjs.org/@types/react/-/react-19.2.14.tgz" - integrity sha512-ilcTH/UniCkMdtexkoCN0bI7pMcJDvmQFPvuPvmEaYA/NSfFTAgdUSLAoVjaRJm7+6PvcM+q1zYOwS4wTYMF9w== - dependencies: - csstype "^3.2.2" - -"@types/react@^18.3.3": +"@types/react@*", "@types/react@^18.3.3": version "18.3.28" resolved "https://registry.npmjs.org/@types/react/-/react-18.3.28.tgz" integrity sha512-z9VXpC7MWrhfWipitjNdgCauoMLRdIILQsAEV+ZesIzBq/oUlxk0m3ApZuMFCXdnS4U7KrI+l3WRUEGQ8K1QKw== @@ -1748,12 +1761,12 @@ "@types/unist@*", "@types/unist@^3.0.0": version "3.0.3" - resolved "https://registry.npmjs.org/@types/unist/-/unist-3.0.3.tgz#acaab0f919ce69cce629c2d4ed2eb4adc1b6c20c" + resolved "https://registry.npmjs.org/@types/unist/-/unist-3.0.3.tgz" integrity sha512-ko/gIFJRv177XgZsZcBwnqJN5x/Gien8qNOn0D5bQU/zAzVf9Zt3BlcUiLqhV9y4ARk0GbT3tnUiPNgnTXzc/Q== "@types/unist@^2.0.0": version "2.0.11" - resolved "https://registry.npmjs.org/@types/unist/-/unist-2.0.11.tgz#11af57b127e32487774841f7a4e54eab166d03c4" + resolved "https://registry.npmjs.org/@types/unist/-/unist-2.0.11.tgz" integrity sha512-CmBKiL6NNo/OqgmMn95Fk9Whlp2mtvIv+KNpQKN2F4SjvrEesubTRWGYSg+BnWZOnlCaSTU1sMpsBOzgbYhnsA== "@types/use-sync-external-store@^0.0.3": @@ -1868,9 +1881,9 @@ eslint-visitor-keys "^5.0.0" "@ungap/structured-clone@^1.0.0": - version "1.3.0" - resolved "https://registry.npmjs.org/@ungap/structured-clone/-/structured-clone-1.3.0.tgz#d06bbb384ebcf6c505fde1c3d0ed4ddffe0aaff8" - integrity 
sha512-WmoN8qaIAo7WTYWbAZuG8PYEhn5fkz7dZrqTBZ7dtt//lL2Gwms1IcnQ5yHqjDfX8Ft5j4YzDM23f87zBfDe9g== + version "1.3.1" + resolved "https://registry.npmjs.org/@ungap/structured-clone/-/structured-clone-1.3.1.tgz" + integrity sha512-mUFwbeTqrVgDQxFveS+df2yfap6iuP20NAKAsBt5jDEoOTDew+zwLAOilHCeQJOVSvmgCX4ogqIrA0mnyr08yQ== "@vitejs/plugin-react-swc@^3.7.0": version "3.11.0" @@ -2187,7 +2200,7 @@ babel-plugin-macros@^3.1.0: bail@^2.0.0: version "2.0.2" - resolved "https://registry.npmjs.org/bail/-/bail-2.0.2.tgz#d26f5cd8fe5d6f832a31517b9f7c356040ba6d5d" + resolved "https://registry.npmjs.org/bail/-/bail-2.0.2.tgz" integrity sha512-0xO6mYd7JB2YesxDKplafRpsiOzPt9V02ddPCLbY1xYGPOX24NTyN50qnUxgCPcSoYMhKpAuBTjQoRZCAkUDRw== balanced-match@^1.0.0: @@ -2335,7 +2348,7 @@ canvas@^3.2.1: ccount@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/ccount/-/ccount-2.0.1.tgz#17a3bf82302e0870d6da43a01311a8bc02a3ecf5" + resolved "https://registry.npmjs.org/ccount/-/ccount-2.0.1.tgz" integrity sha512-eyrF0jiFpY+3drT6383f1qhkbGsLSifNAjA61IUjZjmLCWjItY6LB9ft9YhoDgwfmclB2zhu51Lc7+95b8NRAg== chai@^6.2.2: @@ -2360,22 +2373,22 @@ chalk@^4.0.0: character-entities-html4@^2.0.0: version "2.1.0" - resolved "https://registry.npmjs.org/character-entities-html4/-/character-entities-html4-2.1.0.tgz#1f1adb940c971a4b22ba39ddca6b618dc6e56b2b" + resolved "https://registry.npmjs.org/character-entities-html4/-/character-entities-html4-2.1.0.tgz" integrity sha512-1v7fgQRj6hnSwFpq1Eu0ynr/CDEw0rXo2B61qXrLNdHZmPKgb7fqS1a2JwF0rISo9q77jDI8VMEHoApn8qDoZA== character-entities-legacy@^3.0.0: version "3.0.0" - resolved "https://registry.npmjs.org/character-entities-legacy/-/character-entities-legacy-3.0.0.tgz#76bc83a90738901d7bc223a9e93759fdd560125b" + resolved "https://registry.npmjs.org/character-entities-legacy/-/character-entities-legacy-3.0.0.tgz" integrity sha512-RpPp0asT/6ufRm//AJVwpViZbGM/MkjQFxJccQRHmISF/22NBtsHqAWmL+/pmkPWoIUJdWyeVleTl1wydHATVQ== character-entities@^2.0.0: version "2.0.2" - resolved 
"https://registry.npmjs.org/character-entities/-/character-entities-2.0.2.tgz#2d09c2e72cd9523076ccb21157dff66ad43fcc22" + resolved "https://registry.npmjs.org/character-entities/-/character-entities-2.0.2.tgz" integrity sha512-shx7oQ0Awen/BRIdkjkvz54PnEEI/EjwXDSIZp86/KKdbafHh1Df/RYGBhn4hbe2+uKC9FnT5UCEdyPz3ai9hQ== character-reference-invalid@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/character-reference-invalid/-/character-reference-invalid-2.0.1.tgz#85c66b041e43b47210faf401278abf808ac45cb9" + resolved "https://registry.npmjs.org/character-reference-invalid/-/character-reference-invalid-2.0.1.tgz" integrity sha512-iBZ4F4wRbyORVsu0jPV7gXkOsGYjGHPmAyv+HiHG8gi5PtC9KI2j1+v8/tlibRvjoWX027ypmG/n0HtO5t7unw== chart.js@^4.5.1: @@ -2435,7 +2448,7 @@ color-name@~1.1.4: comma-separated-tokens@^2.0.0: version "2.0.3" - resolved "https://registry.npmjs.org/comma-separated-tokens/-/comma-separated-tokens-2.0.3.tgz#4e89c9458acb61bc8fef19f4529973b2392839ee" + resolved "https://registry.npmjs.org/comma-separated-tokens/-/comma-separated-tokens-2.0.3.tgz" integrity sha512-Fu4hJdvzeylCfQPp9SGWidpzrMs7tTrlu6Vb8XGaRGck8QSNZJJp538Wrb60Lax4fPwR64ViY468OIUTbRlGZg== commander@2: @@ -2863,7 +2876,7 @@ decimal.js@^10.6.0: decode-named-character-reference@^1.0.0: version "1.3.0" - resolved "https://registry.npmjs.org/decode-named-character-reference/-/decode-named-character-reference-1.3.0.tgz#3e40603760874c2e5867691b599d73a7da25b53f" + resolved "https://registry.npmjs.org/decode-named-character-reference/-/decode-named-character-reference-1.3.0.tgz" integrity sha512-GtpQYB283KrPp6nRw50q3U9/VfOutZOe103qlN7BPP6Ad27xYnOIWv4lPzo8HCAL+mMZofJ9KEy30fq6MfaK6Q== dependencies: character-entities "^2.0.0" @@ -2922,14 +2935,14 @@ detect-libc@^2.0.0, detect-libc@^2.0.3: devlop@^1.0.0, devlop@^1.1.0: version "1.1.0" - resolved "https://registry.npmjs.org/devlop/-/devlop-1.1.0.tgz#4db7c2ca4dc6e0e834c30be70c94bbc976dc7018" + resolved "https://registry.npmjs.org/devlop/-/devlop-1.1.0.tgz" 
integrity sha512-RWmIqhcFf1lRYBvNmr7qTNuyCt/7/ns2jbpp1+PalgE/rDQcBT0fioSMUpJ93irlUhC5hrg4cYqe6U+0ImW0rA== dependencies: dequal "^2.0.0" dnd-core@14.0.1: version "14.0.1" - resolved "https://registry.yarnpkg.com/dnd-core/-/dnd-core-14.0.1.tgz#76d000e41c494983210fb20a48b835f81a203c2e" + resolved "https://registry.npmjs.org/dnd-core/-/dnd-core-14.0.1.tgz" integrity sha512-+PVS2VPTgKFPYWo3vAFEA8WPbTf7/xo43TifH9G8S1KqnrQu0o77A3unrF5yOugy4mIz7K5wAVFHUcha7wsz6A== dependencies: "@react-dnd/asap" "^4.0.0" @@ -2970,10 +2983,17 @@ dom-helpers@^5.0.1: "@babel/runtime" "^7.8.7" csstype "^3.0.2" -dompurify@*, dompurify@^3.4.0, dompurify@^3.4.2: - version "3.4.2" - resolved "https://registry.npmjs.org/dompurify/-/dompurify-3.4.2.tgz#f0ff81be682c485505097ba8195a058d8f575218" - integrity sha512-lHeS9SA/IKeIFFyYciHBr2n0v1VMPlSj843HdLOwjb2OxNwdq9Xykxqhk+FE42MzAdHvInbAolSE4mhahPpjXA== +dompurify@*, dompurify@^3.4.2: + version "3.4.7" + resolved "https://registry.npmjs.org/dompurify/-/dompurify-3.4.7.tgz#e2702ea4fd5d83467f1baef62309466ce7d44a82" + integrity sha512-2jBxDJY4RR06tQNy4w5FlFH7kfxsQZlufd0sbv+chfHCxeJwrFw2baUDsSwvBISD4K4RDbd0PTfy3uNXsR6siA== + optionalDependencies: + "@types/trusted-types" "^2.0.7" + +dompurify@^3.4.0: + version "3.4.5" + resolved "https://registry.npmjs.org/dompurify/-/dompurify-3.4.5.tgz" + integrity sha512-OrwIBKsdNSVEeubdJ1HBv/wNENRM9ytAVCv7YXt//A3vPdVMNuACRqK9mXCGCBW2ln7BT/A4X0jXHo2Gu89miA== optionalDependencies: "@types/trusted-types" "^2.0.7" @@ -3168,7 +3188,7 @@ es-to-primitive@^1.3.0: esbuild@^0.27.0: version "0.27.7" - resolved "https://registry.npmjs.org/esbuild/-/esbuild-0.27.7.tgz#bcadce22b2f3fd76f257e3a64f83a64986fea11f" + resolved "https://registry.npmjs.org/esbuild/-/esbuild-0.27.7.tgz" integrity sha512-IxpibTjyVnmrIQo5aqNpCgoACA/dTKLTlhMHihVHhdkxKyPO1uBBthumT0rdHmcsk9uMonIWS0m4FljWzILh3w== optionalDependencies: "@esbuild/aix-ppc64" "0.27.7" @@ -3346,7 +3366,7 @@ estraverse@^5.1.0, estraverse@^5.2.0, estraverse@^5.3.0: 
estree-util-is-identifier-name@^3.0.0: version "3.0.0" - resolved "https://registry.npmjs.org/estree-util-is-identifier-name/-/estree-util-is-identifier-name-3.0.0.tgz#0b5ef4c4ff13508b34dcd01ecfa945f61fce5dbd" + resolved "https://registry.npmjs.org/estree-util-is-identifier-name/-/estree-util-is-identifier-name-3.0.0.tgz" integrity sha512-hFtqIDZTIUZ9BXLb8y4pYGyk6+wekIivNVTcmvk8NoOh+VeRn5y6cEHzbURrWbfp1fIqdVipilzj+lfaadNZmg== estree-walker@^3.0.3: @@ -3393,7 +3413,7 @@ expect-type@^1.3.0: extend@^3.0.0: version "3.0.2" - resolved "https://registry.npmjs.org/extend/-/extend-3.0.2.tgz#f8b1136b4071fbd8eb140aff858b1019ec2915fa" + resolved "https://registry.npmjs.org/extend/-/extend-3.0.2.tgz" integrity sha512-fjquC59cD7CyW6urNXK0FBufkZcoiGG80wTuPujX590cB5Ttln20E2UB4S/WARVqhXffZl2LNgS+gQdPIIim/g== fast-csv@^4.3.1: @@ -3681,7 +3701,7 @@ hasown@^2.0.2: hast-util-to-jsx-runtime@^2.0.0: version "2.3.6" - resolved "https://registry.npmjs.org/hast-util-to-jsx-runtime/-/hast-util-to-jsx-runtime-2.3.6.tgz#ff31897aae59f62232e21594eac7ef6b63333e98" + resolved "https://registry.npmjs.org/hast-util-to-jsx-runtime/-/hast-util-to-jsx-runtime-2.3.6.tgz" integrity sha512-zl6s8LwNyo1P9uw+XJGvZtdFF1GdAkOg8ujOw+4Pyb76874fLps4ueHXDhXWdk6YHQ6OgUtinliG7RsYvCbbBg== dependencies: "@types/estree" "^1.0.0" @@ -3702,7 +3722,7 @@ hast-util-to-jsx-runtime@^2.0.0: hast-util-whitespace@^3.0.0: version "3.0.0" - resolved "https://registry.npmjs.org/hast-util-whitespace/-/hast-util-whitespace-3.0.0.tgz#7778ed9d3c92dd9e8c5c8f648a49c21fc51cb621" + resolved "https://registry.npmjs.org/hast-util-whitespace/-/hast-util-whitespace-3.0.0.tgz" integrity sha512-88JUN06ipLwsnv+dVn+OIYOvAuvBMy/Qoi6O7mQHxdPXpjy+Cd6xRkWwux7DKO+4sYILtLBRIKgsdpS2gQc7qw== dependencies: "@types/hast" "^3.0.0" @@ -3730,7 +3750,7 @@ html-parse-stringify@^3.0.1: html-url-attributes@^3.0.0: version "3.0.1" - resolved "https://registry.npmjs.org/html-url-attributes/-/html-url-attributes-3.0.1.tgz#83b052cd5e437071b756cd74ae70f708870c2d87" + 
resolved "https://registry.npmjs.org/html-url-attributes/-/html-url-attributes-3.0.1.tgz" integrity sha512-ol6UPyBWqsrO6EJySPz2O7ZSr856WDrEzM5zMqp+FJJLGMW35cLYmmZnl0vztAZxRUoNZJFTCohfjuIJ8I4QBQ== html2canvas@^1.4.1: @@ -3830,7 +3850,7 @@ ini@~1.3.0: inline-style-parser@0.2.7: version "0.2.7" - resolved "https://registry.npmjs.org/inline-style-parser/-/inline-style-parser-0.2.7.tgz#b1fc68bfc0313b8685745e4464e37f9376b9c909" + resolved "https://registry.npmjs.org/inline-style-parser/-/inline-style-parser-0.2.7.tgz" integrity sha512-Nb2ctOyNR8DqQoR0OwRG95uNWIC0C1lCgf5Naz5H6Ji72KZ8OcFZLz2P5sNgwlyoJ8Yif11oMuYs5pBQa86csA== internal-slot@^1.1.0: @@ -3849,12 +3869,12 @@ internal-slot@^1.1.0: is-alphabetical@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/is-alphabetical/-/is-alphabetical-2.0.1.tgz#01072053ea7c1036df3c7d19a6daaec7f19e789b" + resolved "https://registry.npmjs.org/is-alphabetical/-/is-alphabetical-2.0.1.tgz" integrity sha512-FWyyY60MeTNyeSRpkM2Iry0G9hpr7/9kD40mD/cGQEuilcZYS4okz8SN2Q6rLCJ8gbCt6fN+rC+6tMGS99LaxQ== is-alphanumerical@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/is-alphanumerical/-/is-alphanumerical-2.0.1.tgz#7c03fbe96e3e931113e57f964b0a368cc2dfd875" + resolved "https://registry.npmjs.org/is-alphanumerical/-/is-alphanumerical-2.0.1.tgz" integrity sha512-hmbYhX/9MUMF5uh7tOXyK/n0ZvWpad5caBA17GsC6vyuCqaWliRG5K1qS9inmUhEMaOBIW7/whAnSwveW/LtZw== dependencies: is-alphabetical "^2.0.0" @@ -3931,7 +3951,7 @@ is-date-object@^1.0.5, is-date-object@^1.1.0: is-decimal@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/is-decimal/-/is-decimal-2.0.1.tgz#9469d2dc190d0214fd87d78b78caecc0cc14eef7" + resolved "https://registry.npmjs.org/is-decimal/-/is-decimal-2.0.1.tgz" integrity sha512-AAB9hiomQs5DXWcRB1rqsxGUstbRroFOPPVAomNk/3XHR5JyEZChOyTWe2oayKnsSsr/kcGqF+z6yuH6HHpN0A== is-extglob@^2.1.1: @@ -3966,7 +3986,7 @@ is-glob@^4.0.0, is-glob@^4.0.3: is-hexadecimal@^2.0.0: version "2.0.1" - resolved 
"https://registry.npmjs.org/is-hexadecimal/-/is-hexadecimal-2.0.1.tgz#86b5bf668fca307498d319dfc03289d781a90027" + resolved "https://registry.npmjs.org/is-hexadecimal/-/is-hexadecimal-2.0.1.tgz" integrity sha512-DgZQp241c8oO6cA1SbTEWiXeoxV42vlcJxgH+B3hi1AiqqKruZR3ZGF8In3fj4+/y/7rHvlOZLZtgJ/4ttYGZg== is-map@^2.0.3: @@ -3989,7 +4009,7 @@ is-number-object@^1.1.1: is-plain-obj@^4.0.0: version "4.1.0" - resolved "https://registry.npmjs.org/is-plain-obj/-/is-plain-obj-4.1.0.tgz#d65025edec3657ce032fd7db63c97883eaed71f0" + resolved "https://registry.npmjs.org/is-plain-obj/-/is-plain-obj-4.1.0.tgz" integrity sha512-+Pgi+vMuUNkJyExiMBt5IlFoMyKnr5zhJ4Uspz58WOhBF5QoIZkFyNHIbBAtHwzVAgk5RtndVNsDRN61/mmDqg== is-potential-custom-element-name@^1.0.1: @@ -4362,12 +4382,12 @@ lodash.uniq@^4.5.0: lodash@^4.17.21, lodash@^4.18.1: version "4.18.1" - resolved "https://registry.npmjs.org/lodash/-/lodash-4.18.1.tgz#ff2b66c1f6326d59513de2407bf881439812771c" + resolved "https://registry.npmjs.org/lodash/-/lodash-4.18.1.tgz" integrity sha512-dMInicTPVE8d1e5otfwmmjlxkZoUpiVLwyeTdUsi/Caj/gfzzblBcCE5sRHV/AsjuCmxWrte2TNGSYuCeCq+0Q== longest-streak@^3.0.0: version "3.1.0" - resolved "https://registry.npmjs.org/longest-streak/-/longest-streak-3.1.0.tgz#62fa67cd958742a1574af9f39866364102d90cd4" + resolved "https://registry.npmjs.org/longest-streak/-/longest-streak-3.1.0.tgz" integrity sha512-9Ri+o0JYgehTaVBBDoMqIl8GXtbWg711O3srftcHhZ0dqnETqLaoIK0x17fUw9rFSlK/0NlsKe0Ahhyl5pXE2g== loose-envify@^1.1.0, loose-envify@^1.4.0: @@ -4423,7 +4443,7 @@ math-intrinsics@^1.1.0: mdast-util-from-markdown@^2.0.0: version "2.0.3" - resolved "https://registry.npmjs.org/mdast-util-from-markdown/-/mdast-util-from-markdown-2.0.3.tgz#c95822b91aab75f18a4cbe8b2f51b873ed2cf0c7" + resolved "https://registry.npmjs.org/mdast-util-from-markdown/-/mdast-util-from-markdown-2.0.3.tgz" integrity sha512-W4mAWTvSlKvf8L6J+VN9yLSqQ9AOAAvHuoDAmPkz4dHf553m5gVj2ejadHJhoJmcmxEnOv6Pa8XJhpxE93kb8Q== dependencies: "@types/mdast" "^4.0.0" @@ 
-4441,7 +4461,7 @@ mdast-util-from-markdown@^2.0.0: mdast-util-mdx-expression@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/mdast-util-mdx-expression/-/mdast-util-mdx-expression-2.0.1.tgz#43f0abac9adc756e2086f63822a38c8d3c3a5096" + resolved "https://registry.npmjs.org/mdast-util-mdx-expression/-/mdast-util-mdx-expression-2.0.1.tgz" integrity sha512-J6f+9hUp+ldTZqKRSg7Vw5V6MqjATc+3E4gf3CFNcuZNWD8XdyI6zQ8GqH7f8169MM6P7hMBRDVGnn7oHB9kXQ== dependencies: "@types/estree-jsx" "^1.0.0" @@ -4453,7 +4473,7 @@ mdast-util-mdx-expression@^2.0.0: mdast-util-mdx-jsx@^3.0.0: version "3.2.0" - resolved "https://registry.npmjs.org/mdast-util-mdx-jsx/-/mdast-util-mdx-jsx-3.2.0.tgz#fd04c67a2a7499efb905a8a5c578dddc9fdada0d" + resolved "https://registry.npmjs.org/mdast-util-mdx-jsx/-/mdast-util-mdx-jsx-3.2.0.tgz" integrity sha512-lj/z8v0r6ZtsN/cGNNtemmmfoLAFZnjMbNyLzBafjzikOM+glrjNHPlf6lQDOTccj9n5b0PPihEBbhneMyGs1Q== dependencies: "@types/estree-jsx" "^1.0.0" @@ -4471,7 +4491,7 @@ mdast-util-mdx-jsx@^3.0.0: mdast-util-mdxjs-esm@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/mdast-util-mdxjs-esm/-/mdast-util-mdxjs-esm-2.0.1.tgz#019cfbe757ad62dd557db35a695e7314bcc9fa97" + resolved "https://registry.npmjs.org/mdast-util-mdxjs-esm/-/mdast-util-mdxjs-esm-2.0.1.tgz" integrity sha512-EcmOpxsZ96CvlP03NghtH1EsLtr0n9Tm4lPUJUBccV9RwUOneqSycg19n5HGzCf+10LozMRSObtVr3ee1WoHtg== dependencies: "@types/estree-jsx" "^1.0.0" @@ -4483,7 +4503,7 @@ mdast-util-mdxjs-esm@^2.0.0: mdast-util-phrasing@^4.0.0: version "4.1.0" - resolved "https://registry.npmjs.org/mdast-util-phrasing/-/mdast-util-phrasing-4.1.0.tgz#7cc0a8dec30eaf04b7b1a9661a92adb3382aa6e3" + resolved "https://registry.npmjs.org/mdast-util-phrasing/-/mdast-util-phrasing-4.1.0.tgz" integrity sha512-TqICwyvJJpBwvGAMZjj4J2n0X8QWp21b9l0o7eXyVJ25YNWYbJDVIyD1bZXE6WtV6RmKJVYmQAKWa0zWOABz2w== dependencies: "@types/mdast" "^4.0.0" @@ -4491,7 +4511,7 @@ mdast-util-phrasing@^4.0.0: mdast-util-to-hast@^13.0.0: version "13.2.1" 
- resolved "https://registry.npmjs.org/mdast-util-to-hast/-/mdast-util-to-hast-13.2.1.tgz#d7ff84ca499a57e2c060ae67548ad950e689a053" + resolved "https://registry.npmjs.org/mdast-util-to-hast/-/mdast-util-to-hast-13.2.1.tgz" integrity sha512-cctsq2wp5vTsLIcaymblUriiTcZd0CwWtCbLvrOzYCDZoWyMNV8sZ7krj09FSnsiJi3WVsHLM4k6Dq/yaPyCXA== dependencies: "@types/hast" "^3.0.0" @@ -4506,7 +4526,7 @@ mdast-util-to-hast@^13.0.0: mdast-util-to-markdown@^2.0.0: version "2.1.2" - resolved "https://registry.npmjs.org/mdast-util-to-markdown/-/mdast-util-to-markdown-2.1.2.tgz#f910ffe60897f04bb4b7e7ee434486f76288361b" + resolved "https://registry.npmjs.org/mdast-util-to-markdown/-/mdast-util-to-markdown-2.1.2.tgz" integrity sha512-xj68wMTvGXVOKonmog6LwyJKrYXZPvlwabaryTjLh9LuvovB/KAH+kvi8Gjj+7rJjsFi23nkUxRQv1KqSroMqA== dependencies: "@types/mdast" "^4.0.0" @@ -4521,7 +4541,7 @@ mdast-util-to-markdown@^2.0.0: mdast-util-to-string@^4.0.0: version "4.0.0" - resolved "https://registry.npmjs.org/mdast-util-to-string/-/mdast-util-to-string-4.0.0.tgz#7a5121475556a04e7eddeb67b264aae79d312814" + resolved "https://registry.npmjs.org/mdast-util-to-string/-/mdast-util-to-string-4.0.0.tgz" integrity sha512-0H44vDimn51F0YwvxSJSm0eCDOJTRlmN0R1yBh4HLj9wiV1Dn0QoXGbvFAWj2hSItVTlCmBF1hqKlIyUBVFLPg== dependencies: "@types/mdast" "^4.0.0" @@ -4538,12 +4558,12 @@ mdurl@^2.0.0: "memoize-one@>=3.1.1 <6": version "5.2.1" - resolved "https://registry.yarnpkg.com/memoize-one/-/memoize-one-5.2.1.tgz#8337aa3c4335581839ec01c3d594090cebe8f00e" + resolved "https://registry.npmjs.org/memoize-one/-/memoize-one-5.2.1.tgz" integrity sha512-zYiwtZUcYyXKo/np96AGZAckk+FWWsUdJ3cHGGmld7+AhvcWmQyGCYUh1hc4Q/pkOhb65dQR/pqCyK0cOaHz4Q== micromark-core-commonmark@^2.0.0: version "2.0.3" - resolved "https://registry.npmjs.org/micromark-core-commonmark/-/micromark-core-commonmark-2.0.3.tgz#c691630e485021a68cf28dbc2b2ca27ebf678cd4" + resolved "https://registry.npmjs.org/micromark-core-commonmark/-/micromark-core-commonmark-2.0.3.tgz" 
integrity sha512-RDBrHEMSxVFLg6xvnXmb1Ayr2WzLAWjeSATAoxwKYJV94TeNavgoIdA0a9ytzDSVzBy2YKFK+emCPOEibLeCrg== dependencies: decode-named-character-reference "^1.0.0" @@ -4565,7 +4585,7 @@ micromark-core-commonmark@^2.0.0: micromark-factory-destination@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-factory-destination/-/micromark-factory-destination-2.0.1.tgz#8fef8e0f7081f0474fbdd92deb50c990a0264639" + resolved "https://registry.npmjs.org/micromark-factory-destination/-/micromark-factory-destination-2.0.1.tgz" integrity sha512-Xe6rDdJlkmbFRExpTOmRj9N3MaWmbAgdpSrBQvCFqhezUn4AHqJHbaEnfbVYYiexVSs//tqOdY/DxhjdCiJnIA== dependencies: micromark-util-character "^2.0.0" @@ -4574,7 +4594,7 @@ micromark-factory-destination@^2.0.0: micromark-factory-label@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-factory-label/-/micromark-factory-label-2.0.1.tgz#5267efa97f1e5254efc7f20b459a38cb21058ba1" + resolved "https://registry.npmjs.org/micromark-factory-label/-/micromark-factory-label-2.0.1.tgz" integrity sha512-VFMekyQExqIW7xIChcXn4ok29YE3rnuyveW3wZQWWqF4Nv9Wk5rgJ99KzPvHjkmPXF93FXIbBp6YdW3t71/7Vg== dependencies: devlop "^1.0.0" @@ -4584,7 +4604,7 @@ micromark-factory-label@^2.0.0: micromark-factory-space@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-factory-space/-/micromark-factory-space-2.0.1.tgz#36d0212e962b2b3121f8525fc7a3c7c029f334fc" + resolved "https://registry.npmjs.org/micromark-factory-space/-/micromark-factory-space-2.0.1.tgz" integrity sha512-zRkxjtBxxLd2Sc0d+fbnEunsTj46SWXgXciZmHq0kDYGnck/ZSGj9/wULTV95uoeYiK5hRXP2mJ98Uo4cq/LQg== dependencies: micromark-util-character "^2.0.0" @@ -4592,7 +4612,7 @@ micromark-factory-space@^2.0.0: micromark-factory-title@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-factory-title/-/micromark-factory-title-2.0.1.tgz#237e4aa5d58a95863f01032d9ee9b090f1de6e94" + resolved 
"https://registry.npmjs.org/micromark-factory-title/-/micromark-factory-title-2.0.1.tgz" integrity sha512-5bZ+3CjhAd9eChYTHsjy6TGxpOFSKgKKJPJxr293jTbfry2KDoWkhBb6TcPVB4NmzaPhMs1Frm9AZH7OD4Cjzw== dependencies: micromark-factory-space "^2.0.0" @@ -4602,7 +4622,7 @@ micromark-factory-title@^2.0.0: micromark-factory-whitespace@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-factory-whitespace/-/micromark-factory-whitespace-2.0.1.tgz#06b26b2983c4d27bfcc657b33e25134d4868b0b1" + resolved "https://registry.npmjs.org/micromark-factory-whitespace/-/micromark-factory-whitespace-2.0.1.tgz" integrity sha512-Ob0nuZ3PKt/n0hORHyvoD9uZhr+Za8sFoP+OnMcnWK5lngSzALgQYKMr9RJVOWLqQYuyn6ulqGWSXdwf6F80lQ== dependencies: micromark-factory-space "^2.0.0" @@ -4612,7 +4632,7 @@ micromark-factory-whitespace@^2.0.0: micromark-util-character@^2.0.0: version "2.1.1" - resolved "https://registry.npmjs.org/micromark-util-character/-/micromark-util-character-2.1.1.tgz#2f987831a40d4c510ac261e89852c4e9703ccda6" + resolved "https://registry.npmjs.org/micromark-util-character/-/micromark-util-character-2.1.1.tgz" integrity sha512-wv8tdUTJ3thSFFFJKtpYKOYiGP2+v96Hvk4Tu8KpCAsTMs6yi+nVmGh1syvSCsaxz45J6Jbw+9DD6g97+NV67Q== dependencies: micromark-util-symbol "^2.0.0" @@ -4620,14 +4640,14 @@ micromark-util-character@^2.0.0: micromark-util-chunked@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-util-chunked/-/micromark-util-chunked-2.0.1.tgz#47fbcd93471a3fccab86cff03847fc3552db1051" + resolved "https://registry.npmjs.org/micromark-util-chunked/-/micromark-util-chunked-2.0.1.tgz" integrity sha512-QUNFEOPELfmvv+4xiNg2sRYeS/P84pTW0TCgP5zc9FpXetHY0ab7SxKyAQCNCc1eK0459uoLI1y5oO5Vc1dbhA== dependencies: micromark-util-symbol "^2.0.0" micromark-util-classify-character@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-util-classify-character/-/micromark-util-classify-character-2.0.1.tgz#d399faf9c45ca14c8b4be98b1ea481bced87b629" + resolved 
"https://registry.npmjs.org/micromark-util-classify-character/-/micromark-util-classify-character-2.0.1.tgz" integrity sha512-K0kHzM6afW/MbeWYWLjoHQv1sgg2Q9EccHEDzSkxiP/EaagNzCm7T/WMKZ3rjMbvIpvBiZgwR3dKMygtA4mG1Q== dependencies: micromark-util-character "^2.0.0" @@ -4636,7 +4656,7 @@ micromark-util-classify-character@^2.0.0: micromark-util-combine-extensions@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-util-combine-extensions/-/micromark-util-combine-extensions-2.0.1.tgz#2a0f490ab08bff5cc2fd5eec6dd0ca04f89b30a9" + resolved "https://registry.npmjs.org/micromark-util-combine-extensions/-/micromark-util-combine-extensions-2.0.1.tgz" integrity sha512-OnAnH8Ujmy59JcyZw8JSbK9cGpdVY44NKgSM7E9Eh7DiLS2E9RNQf0dONaGDzEG9yjEl5hcqeIsj4hfRkLH/Bg== dependencies: micromark-util-chunked "^2.0.0" @@ -4644,14 +4664,14 @@ micromark-util-combine-extensions@^2.0.0: micromark-util-decode-numeric-character-reference@^2.0.0: version "2.0.2" - resolved "https://registry.npmjs.org/micromark-util-decode-numeric-character-reference/-/micromark-util-decode-numeric-character-reference-2.0.2.tgz#fcf15b660979388e6f118cdb6bf7d79d73d26fe5" + resolved "https://registry.npmjs.org/micromark-util-decode-numeric-character-reference/-/micromark-util-decode-numeric-character-reference-2.0.2.tgz" integrity sha512-ccUbYk6CwVdkmCQMyr64dXz42EfHGkPQlBj5p7YVGzq8I7CtjXZJrubAYezf7Rp+bjPseiROqe7G6foFd+lEuw== dependencies: micromark-util-symbol "^2.0.0" micromark-util-decode-string@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-util-decode-string/-/micromark-util-decode-string-2.0.1.tgz#6cb99582e5d271e84efca8e61a807994d7161eb2" + resolved "https://registry.npmjs.org/micromark-util-decode-string/-/micromark-util-decode-string-2.0.1.tgz" integrity sha512-nDV/77Fj6eH1ynwscYTOsbK7rR//Uj0bZXBwJZRfaLEJ1iGBR6kIfNmlNqaqJf649EP0F3NWNdeJi03elllNUQ== dependencies: decode-named-character-reference "^1.0.0" @@ -4661,31 +4681,31 @@ micromark-util-decode-string@^2.0.0: 
micromark-util-encode@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-util-encode/-/micromark-util-encode-2.0.1.tgz#0d51d1c095551cfaac368326963cf55f15f540b8" + resolved "https://registry.npmjs.org/micromark-util-encode/-/micromark-util-encode-2.0.1.tgz" integrity sha512-c3cVx2y4KqUnwopcO9b/SCdo2O67LwJJ/UyqGfbigahfegL9myoEFoDYZgkT7f36T0bLrM9hZTAaAyH+PCAXjw== micromark-util-html-tag-name@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-util-html-tag-name/-/micromark-util-html-tag-name-2.0.1.tgz#e40403096481986b41c106627f98f72d4d10b825" + resolved "https://registry.npmjs.org/micromark-util-html-tag-name/-/micromark-util-html-tag-name-2.0.1.tgz" integrity sha512-2cNEiYDhCWKI+Gs9T0Tiysk136SnR13hhO8yW6BGNyhOC4qYFnwF1nKfD3HFAIXA5c45RrIG1ub11GiXeYd1xA== micromark-util-normalize-identifier@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-util-normalize-identifier/-/micromark-util-normalize-identifier-2.0.1.tgz#c30d77b2e832acf6526f8bf1aa47bc9c9438c16d" + resolved "https://registry.npmjs.org/micromark-util-normalize-identifier/-/micromark-util-normalize-identifier-2.0.1.tgz" integrity sha512-sxPqmo70LyARJs0w2UclACPUUEqltCkJ6PhKdMIDuJ3gSf/Q+/GIe3WKl0Ijb/GyH9lOpUkRAO2wp0GVkLvS9Q== dependencies: micromark-util-symbol "^2.0.0" micromark-util-resolve-all@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-util-resolve-all/-/micromark-util-resolve-all-2.0.1.tgz#e1a2d62cdd237230a2ae11839027b19381e31e8b" + resolved "https://registry.npmjs.org/micromark-util-resolve-all/-/micromark-util-resolve-all-2.0.1.tgz" integrity sha512-VdQyxFWFT2/FGJgwQnJYbe1jjQoNTS4RjglmSjTUlpUMa95Htx9NHeYW4rGDJzbjvCsl9eLjMQwGeElsqmzcHg== dependencies: micromark-util-types "^2.0.0" micromark-util-sanitize-uri@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-util-sanitize-uri/-/micromark-util-sanitize-uri-2.0.1.tgz#ab89789b818a58752b73d6b55238621b7faa8fd7" + resolved 
"https://registry.npmjs.org/micromark-util-sanitize-uri/-/micromark-util-sanitize-uri-2.0.1.tgz" integrity sha512-9N9IomZ/YuGGZZmQec1MbgxtlgougxTodVwDzzEouPKo3qFWvymFHWcnDi2vzV1ff6kas9ucW+o3yzJK9YB1AQ== dependencies: micromark-util-character "^2.0.0" @@ -4694,7 +4714,7 @@ micromark-util-sanitize-uri@^2.0.0: micromark-util-subtokenize@^2.0.0: version "2.1.0" - resolved "https://registry.npmjs.org/micromark-util-subtokenize/-/micromark-util-subtokenize-2.1.0.tgz#d8ade5ba0f3197a1cf6a2999fbbfe6357a1a19ee" + resolved "https://registry.npmjs.org/micromark-util-subtokenize/-/micromark-util-subtokenize-2.1.0.tgz" integrity sha512-XQLu552iSctvnEcgXw6+Sx75GflAPNED1qx7eBJ+wydBb2KCbRZe+NwvIEEMM83uml1+2WSXpBAcp9IUCgCYWA== dependencies: devlop "^1.0.0" @@ -4704,17 +4724,17 @@ micromark-util-subtokenize@^2.0.0: micromark-util-symbol@^2.0.0: version "2.0.1" - resolved "https://registry.npmjs.org/micromark-util-symbol/-/micromark-util-symbol-2.0.1.tgz#e5da494e8eb2b071a0d08fb34f6cefec6c0a19b8" + resolved "https://registry.npmjs.org/micromark-util-symbol/-/micromark-util-symbol-2.0.1.tgz" integrity sha512-vs5t8Apaud9N28kgCrRUdEed4UJ+wWNvicHLPxCa9ENlYuAY31M0ETy5y1vA33YoNPDFTghEbnh6efaE8h4x0Q== micromark-util-types@^2.0.0: version "2.0.2" - resolved "https://registry.npmjs.org/micromark-util-types/-/micromark-util-types-2.0.2.tgz#f00225f5f5a0ebc3254f96c36b6605c4b393908e" + resolved "https://registry.npmjs.org/micromark-util-types/-/micromark-util-types-2.0.2.tgz" integrity sha512-Yw0ECSpJoViF1qTU4DC6NwtC4aWGt1EkzaQB8KPPyCRR8z9TWeV0HbEFGTO+ZY1wB22zmxnJqhPyTpOVCpeHTA== micromark@^4.0.0: version "4.0.2" - resolved "https://registry.npmjs.org/micromark/-/micromark-4.0.2.tgz#91395a3e1884a198e62116e33c9c568e39936fdb" + resolved "https://registry.npmjs.org/micromark/-/micromark-4.0.2.tgz" integrity sha512-zpe98Q6kvavpCr1NPVSCMebCKfD7CA2NqZ+rykeNhONIJBpc1tFKt9hucLGwha3jNTNI8lHpctWJWoimVF4PfA== dependencies: "@types/debug" "^4.0.0" @@ -4960,7 +4980,7 @@ parent-module@^1.0.0: 
parse-entities@^4.0.0: version "4.0.2" - resolved "https://registry.npmjs.org/parse-entities/-/parse-entities-4.0.2.tgz#61d46f5ed28e4ee62e9ddc43d6b010188443f159" + resolved "https://registry.npmjs.org/parse-entities/-/parse-entities-4.0.2.tgz" integrity sha512-GG2AQYWoLgL877gQIKeRPGO1xF9+eG1ujIb5soS5gPvLQ1y2o8FL90w2QWNdf9I361Mpp7726c+lj3U0qK1uGw== dependencies: "@types/unist" "^2.0.0" @@ -5034,9 +5054,9 @@ possible-typed-array-names@^1.0.0: integrity sha512-/+5VFTchJDoVj3bhoqi6UeymcD00DAwb1nJwamzPvHEszJ4FpF6SNNbUbOS8yI56qHzdV8eK0qEfOSiodkTdxg== postcss@^8.5.6: - version "8.5.14" - resolved "https://registry.npmjs.org/postcss/-/postcss-8.5.14.tgz#a66c2d7808fadf69ebb5b84a03f8bafd76c4919c" - integrity sha512-SoSL4+OSEtR99LHFZQiJLkT59C5B1amGO1NzTwj7TT1qCUgUO6hxOvzkOYxD+vMrXBM3XJIKzokoERdqQq/Zmg== + version "8.5.8" + resolved "https://registry.npmjs.org/postcss/-/postcss-8.5.8.tgz" + integrity sha512-OW/rX8O/jXnm82Ey1k44pObPtdblfiuWnrd8X7GJ7emImCOstunGbXUpp7HdBrFQX6rJzn3sPT397Wp5aCwCHg== dependencies: nanoid "^3.3.11" picocolors "^1.1.1" @@ -5105,7 +5125,7 @@ prop-types@^15.6.2, prop-types@^15.8.1: property-information@^7.0.0: version "7.1.0" - resolved "https://registry.npmjs.org/property-information/-/property-information-7.1.0.tgz#b622e8646e02b580205415586b40804d3e8bfd5d" + resolved "https://registry.npmjs.org/property-information/-/property-information-7.1.0.tgz" integrity sha512-TwEZ+X+yCJmYfL7TPUOcvBZ4QfoT5YenQiJuX//0th53DE6w0xxLEtfK3iyryQFddXuvkIk51EEgrJQ0WJkOmQ== prosemirror-changeset@^2.3.0: @@ -5300,10 +5320,10 @@ react-animate-on-change@^2.2.0: resolved "https://registry.npmjs.org/react-animate-on-change/-/react-animate-on-change-2.2.0.tgz" integrity sha512-cM0YHbsxIh8fshX/U24+pk4nDG7Ike9NsEy21reqJPqVt6xRA+6oYkaQHEggINKjYEMbztwK40Ro0/EHZ5naVQ== -react-arborist@^3.5.0: - version "3.5.0" - resolved "https://registry.yarnpkg.com/react-arborist/-/react-arborist-3.5.0.tgz#e8cb5b6e6c03a0aafb93b84b208776dbf2d4fede" - integrity 
sha512-FdXOICSt7P2h+Pxin1ULN02b4qrXJznNcshgwwWVtuYMLWSJcD245PQ4HOSj/Lr2T1uEegmnEm5Lbns2hUUsqg== +react-arborist@3.7.0: + version "3.7.0" + resolved "https://registry.npmjs.org/react-arborist/-/react-arborist-3.7.0.tgz#b39156f1fe4bb31477118c57905827a3ba4d0ec5" + integrity sha512-gh2SoO0eXQVSP6zxXMGqFeXF+l2uabDGBVn0+RKqy/s7mrG5xGnfM5mhyB67cMVobC3vWYLqe6HGh7ZEZadW/w== dependencies: react-dnd "^14.0.3" react-dnd-html5-backend "^14.0.3" @@ -5313,7 +5333,7 @@ react-arborist@^3.5.0: react-dnd-html5-backend@^14.0.3: version "14.1.0" - resolved "https://registry.yarnpkg.com/react-dnd-html5-backend/-/react-dnd-html5-backend-14.1.0.tgz#b35a3a0c16dd3a2bfb5eb7ec62cf0c2cace8b62f" + resolved "https://registry.npmjs.org/react-dnd-html5-backend/-/react-dnd-html5-backend-14.1.0.tgz" integrity sha512-6ONeqEC3XKVf4eVmMTe0oPds+c5B9Foyj8p/ZKLb7kL2qh9COYxiBHv3szd6gztqi/efkmriywLUVlPotqoJyw== dependencies: dnd-core "14.0.1" @@ -5327,7 +5347,7 @@ react-dnd-html5-backend@^16.0.1: react-dnd@^14.0.3: version "14.0.5" - resolved "https://registry.yarnpkg.com/react-dnd/-/react-dnd-14.0.5.tgz#ecf264e220ae62e35634d9b941502f3fca0185ed" + resolved "https://registry.npmjs.org/react-dnd/-/react-dnd-14.0.5.tgz" integrity sha512-9i1jSgbyVw0ELlEVt/NkCUkxy1hmhJOkePoCH713u75vzHGyXhPDm28oLfc2NMSBjZRM1Y+wRjHXJT3sPrTy+A== dependencies: "@react-dnd/invariant" "^2.0.0" @@ -5385,9 +5405,9 @@ react-is@^19.2.3: integrity sha512-W+EWGn2v0ApPKgKKCy/7s7WHXkboGcsrXE+2joLyVxkbyVQfO3MUEaUQDHoSmb8TFFrSKYa9mw64WZHNHSDzYA== react-is@^19.2.4: - version "19.2.5" - resolved "https://registry.npmjs.org/react-is/-/react-is-19.2.5.tgz#7e7b54143e9313fed787b23fd4295d5a23872ad9" - integrity sha512-Dn0t8IQhCmeIT3wu+Apm1/YVsJXsGWi6k4sPdnBIdqMVtHtv0IGi6dcpNpNkNac0zB2uUAqNX3MHzN8c+z2rwQ== + version "19.2.6" + resolved "https://registry.npmjs.org/react-is/-/react-is-19.2.6.tgz" + integrity sha512-XjBR15BhXuylgWGuslhDKqlSayuqvqBX91BP8pauG8kd1zY8kotkNWbXksTCNRarse4kuGbe2kIY05ARtwNIvw== react-katex@^3.1.0: version "3.1.0" @@ -5398,7 
+5418,7 @@ react-katex@^3.1.0: react-markdown@^10.1.0: version "10.1.0" - resolved "https://registry.npmjs.org/react-markdown/-/react-markdown-10.1.0.tgz#e22bc20faddbc07605c15284255653c0f3bad5ca" + resolved "https://registry.npmjs.org/react-markdown/-/react-markdown-10.1.0.tgz" integrity sha512-qKxVopLT/TyA6BX3Ue5NwabOsAzm0Q7kAPwq6L+wWDwisYs7R8vZ0nRXqq6rkueboxpkjvLGU9fWifiX/ZZFxQ== dependencies: "@types/hast" "^3.0.0" @@ -5477,7 +5497,7 @@ react-virtuoso@^4.3.10: react-window@^1.8.11: version "1.8.11" - resolved "https://registry.yarnpkg.com/react-window/-/react-window-1.8.11.tgz#a857b48fa85bd77042d59cc460964ff2e0648525" + resolved "https://registry.npmjs.org/react-window/-/react-window-1.8.11.tgz" integrity sha512-+SRbUVT2scadgFSWx+R1P754xHPEqvcfSfVX10QYg6POOz+WNgkN48pS+BtZNIMGiL1HYrSEiCkwsMS15QogEQ== dependencies: "@babel/runtime" "^7.0.0" @@ -5551,7 +5571,7 @@ redux@^4.1.1, redux@^4.2.0, redux@^4.2.1: redux@^5.0.0: version "5.0.1" - resolved "https://registry.yarnpkg.com/redux/-/redux-5.0.1.tgz#97fa26881ce5746500125585d5642c77b6e9447b" + resolved "https://registry.npmjs.org/redux/-/redux-5.0.1.tgz" integrity sha512-M9/ELqF6fy8FwmkpnF0S3YKOqMyoWJ4+CS5Efg2ct3oY9daQvd/Pc71FpGZsVsbl3Cpb+IIcjBDUnnyBdQbq4w== reflect.getprototypeof@^1.0.6, reflect.getprototypeof@^1.0.9: @@ -5582,7 +5602,7 @@ regexp.prototype.flags@^1.5.3, regexp.prototype.flags@^1.5.4: remark-parse@^11.0.0: version "11.0.0" - resolved "https://registry.npmjs.org/remark-parse/-/remark-parse-11.0.0.tgz#aa60743fcb37ebf6b069204eb4da304e40db45a1" + resolved "https://registry.npmjs.org/remark-parse/-/remark-parse-11.0.0.tgz" integrity sha512-FCxlKLNGknS5ba/1lmpYijMUzX2esxW5xQqjWxw2eHFfS2MSdaHVINFmhjo+qN1WhZhNimq0dZATN9pH0IDrpA== dependencies: "@types/mdast" "^4.0.0" @@ -5592,7 +5612,7 @@ remark-parse@^11.0.0: remark-rehype@^11.0.0: version "11.1.2" - resolved "https://registry.npmjs.org/remark-rehype/-/remark-rehype-11.1.2.tgz#2addaadda80ca9bd9aa0da763e74d16327683b37" + resolved 
"https://registry.npmjs.org/remark-rehype/-/remark-rehype-11.1.2.tgz" integrity sha512-Dh7l57ianaEoIpzbp0PC9UKAdCSVklD8E5Rpw7ETfbTl3FqcOOgq5q2LVDhgGCkaBv7p24JXikPdvhhmHvKMsw== dependencies: "@types/hast" "^3.0.0" @@ -5612,9 +5632,9 @@ reselect@^4.1.8: integrity sha512-ab9EmR80F/zQTMNeneUr4cv+jSwPJgIlvEmVwLerwrWVbpLlBuls9XHzIeTFy4cegU2NHBp3va0LKOzU5qFEYQ== reselect@^5.1.1: - version "5.1.1" - resolved "https://registry.npmjs.org/reselect/-/reselect-5.1.1.tgz#c766b1eb5d558291e5e550298adb0becc24bb72e" - integrity sha512-K/BG6eIky/SBpzfHZv/dd+9JBFiS4SWV7FIujVyJRux6e45+73RaUHXLmIR1f7WOMaQ0U1km6qwklRQxpJJY0w== + version "5.2.0" + resolved "https://registry.npmjs.org/reselect/-/reselect-5.2.0.tgz" + integrity sha512-AgZ3UOZm3YndfrJ4OYjgrT7bmCm/1iqkjvEfH/oYjzh6PD2qw4QuT3jjnXIrpdt4MTpMXclMT3lXbmRY+XRakw== resolve-from@^4.0.0: version "4.0.0" @@ -5655,37 +5675,37 @@ robust-predicates@^3.0.2: integrity sha512-NS3levdsRIUOmiJ8FZWCP7LG3QpJyrs/TE0Zpf1yvZu8cAJJ6QMW92H1c7kWpdIHo8RvmLxN/o2JXTKHp74lUA== rollup@^4.43.0: - version "4.60.3" - resolved "https://registry.npmjs.org/rollup/-/rollup-4.60.3.tgz#789258d41d090687d0ca7e80e8583d733711ddd3" - integrity sha512-pAQK9HalE84QSm4Po3EmWIZPd3FnjkShVkiMlz1iligWYkWQ7wHYd1PF/T7QZ5TVSD6uSTon5gBVMSM4JfBV+A== + version "4.60.0" + resolved "https://registry.npmjs.org/rollup/-/rollup-4.60.0.tgz" + integrity sha512-yqjxruMGBQJ2gG4HtjZtAfXArHomazDHoFwFFmZZl0r7Pdo7qCIXKqKHZc8yeoMgzJJ+pO6pEEHa+V7uzWlrAQ== dependencies: "@types/estree" "1.0.8" optionalDependencies: - "@rollup/rollup-android-arm-eabi" "4.60.3" - "@rollup/rollup-android-arm64" "4.60.3" - "@rollup/rollup-darwin-arm64" "4.60.3" - "@rollup/rollup-darwin-x64" "4.60.3" - "@rollup/rollup-freebsd-arm64" "4.60.3" - "@rollup/rollup-freebsd-x64" "4.60.3" - "@rollup/rollup-linux-arm-gnueabihf" "4.60.3" - "@rollup/rollup-linux-arm-musleabihf" "4.60.3" - "@rollup/rollup-linux-arm64-gnu" "4.60.3" - "@rollup/rollup-linux-arm64-musl" "4.60.3" - "@rollup/rollup-linux-loong64-gnu" "4.60.3" - 
"@rollup/rollup-linux-loong64-musl" "4.60.3" - "@rollup/rollup-linux-ppc64-gnu" "4.60.3" - "@rollup/rollup-linux-ppc64-musl" "4.60.3" - "@rollup/rollup-linux-riscv64-gnu" "4.60.3" - "@rollup/rollup-linux-riscv64-musl" "4.60.3" - "@rollup/rollup-linux-s390x-gnu" "4.60.3" - "@rollup/rollup-linux-x64-gnu" "4.60.3" - "@rollup/rollup-linux-x64-musl" "4.60.3" - "@rollup/rollup-openbsd-x64" "4.60.3" - "@rollup/rollup-openharmony-arm64" "4.60.3" - "@rollup/rollup-win32-arm64-msvc" "4.60.3" - "@rollup/rollup-win32-ia32-msvc" "4.60.3" - "@rollup/rollup-win32-x64-gnu" "4.60.3" - "@rollup/rollup-win32-x64-msvc" "4.60.3" + "@rollup/rollup-android-arm-eabi" "4.60.0" + "@rollup/rollup-android-arm64" "4.60.0" + "@rollup/rollup-darwin-arm64" "4.60.0" + "@rollup/rollup-darwin-x64" "4.60.0" + "@rollup/rollup-freebsd-arm64" "4.60.0" + "@rollup/rollup-freebsd-x64" "4.60.0" + "@rollup/rollup-linux-arm-gnueabihf" "4.60.0" + "@rollup/rollup-linux-arm-musleabihf" "4.60.0" + "@rollup/rollup-linux-arm64-gnu" "4.60.0" + "@rollup/rollup-linux-arm64-musl" "4.60.0" + "@rollup/rollup-linux-loong64-gnu" "4.60.0" + "@rollup/rollup-linux-loong64-musl" "4.60.0" + "@rollup/rollup-linux-ppc64-gnu" "4.60.0" + "@rollup/rollup-linux-ppc64-musl" "4.60.0" + "@rollup/rollup-linux-riscv64-gnu" "4.60.0" + "@rollup/rollup-linux-riscv64-musl" "4.60.0" + "@rollup/rollup-linux-s390x-gnu" "4.60.0" + "@rollup/rollup-linux-x64-gnu" "4.60.0" + "@rollup/rollup-linux-x64-musl" "4.60.0" + "@rollup/rollup-openbsd-x64" "4.60.0" + "@rollup/rollup-openharmony-arm64" "4.60.0" + "@rollup/rollup-win32-arm64-msvc" "4.60.0" + "@rollup/rollup-win32-ia32-msvc" "4.60.0" + "@rollup/rollup-win32-x64-gnu" "4.60.0" + "@rollup/rollup-win32-x64-msvc" "4.60.0" fsevents "~2.3.2" rope-sequence@^1.3.0: @@ -5926,7 +5946,7 @@ source-map@^0.5.7: space-separated-tokens@^2.0.0: version "2.0.2" - resolved "https://registry.npmjs.org/space-separated-tokens/-/space-separated-tokens-2.0.2.tgz#1ecd9d2350a3844572c3f4a312bceb018348859f" + resolved 
"https://registry.npmjs.org/space-separated-tokens/-/space-separated-tokens-2.0.2.tgz" integrity sha512-PEGlAwrG8yXGXRjW32fGbg66JAlOAwbObuqVoJpv/mRgoWDQfgH1wDPvtzWyUSNAXBGSk8h755YDbbcEy3SH2Q== spectral.js@^2.0.2: @@ -6045,7 +6065,7 @@ string_decoder@~1.1.1: stringify-entities@^4.0.0: version "4.0.4" - resolved "https://registry.npmjs.org/stringify-entities/-/stringify-entities-4.0.4.tgz#b3b79ef5f277cc4ac73caeb0236c5ba939b3a4f3" + resolved "https://registry.npmjs.org/stringify-entities/-/stringify-entities-4.0.4.tgz" integrity sha512-IwfBptatlO+QCJUo19AqvrPNqlVMpW9YEL2LIVY+Rpv2qsjCGxaDLNRgeGsQWJhfItebuJhsGSLjaBbNSQ+ieg== dependencies: character-entities-html4 "^2.0.0" @@ -6077,14 +6097,14 @@ strip-json-comments@~2.0.1: style-to-js@^1.0.0: version "1.1.21" - resolved "https://registry.npmjs.org/style-to-js/-/style-to-js-1.1.21.tgz#2908941187f857e79e28e9cd78008b9a0b3e0e8d" + resolved "https://registry.npmjs.org/style-to-js/-/style-to-js-1.1.21.tgz" integrity sha512-RjQetxJrrUJLQPHbLku6U/ocGtzyjbJMP9lCNK7Ag0CNh690nSH8woqWH9u16nMjYBAok+i7JO1NP2pOy8IsPQ== dependencies: style-to-object "1.0.14" style-to-object@1.0.14: version "1.0.14" - resolved "https://registry.npmjs.org/style-to-object/-/style-to-object-1.0.14.tgz#1d22f0e7266bb8c6d8cae5caf4ec4f005e08f611" + resolved "https://registry.npmjs.org/style-to-object/-/style-to-object-1.0.14.tgz" integrity sha512-LIN7rULI0jBscWQYaSswptyderlarFkjQ+t79nzty8tcIAceVomEVlLzH5VP4Cmsv6MtKhs7qaAiwlcp+Mgaxw== dependencies: inline-style-parser "0.2.7" @@ -6185,9 +6205,9 @@ tldts@^7.0.5: tldts-core "^7.0.27" tmp@^0.2.0: - version "0.2.5" - resolved "https://registry.npmjs.org/tmp/-/tmp-0.2.5.tgz" - integrity sha512-voyz6MApa1rQGUxT3E+BK7/ROe8itEx7vD8/HEvt4xwXucvQ5G5oeEiHkmHZJuBO21RpOf+YYm9MOivj709jow== + version "0.2.7" + resolved "https://registry.yarnpkg.com/tmp/-/tmp-0.2.7.tgz#26f4db11d1601ce8012dcb8a798ece1c06a99059" + integrity sha512-e0votIpp4Uo2AJYSzVHV6xCcawuiez3DzqDAbrTc3YxBkplN6e+dM13ZeIcZnDg/QpSuU2zfZ3rzwY8ukEnaXw== 
topojson-client@^3.1.0: version "3.1.0" @@ -6217,12 +6237,12 @@ tr46@^6.0.0: trim-lines@^3.0.0: version "3.0.1" - resolved "https://registry.npmjs.org/trim-lines/-/trim-lines-3.0.1.tgz#d802e332a07df861c48802c04321017b1bd87338" + resolved "https://registry.npmjs.org/trim-lines/-/trim-lines-3.0.1.tgz" integrity sha512-kRj8B+YHZCc9kQYdWfJB2/oUl9rA99qbowYYBtr4ui4mZyAQ2JpvVBd/6U2YloATfqBhBTSMhTpgBHtU0Mf3Rg== trough@^2.0.0: version "2.2.0" - resolved "https://registry.npmjs.org/trough/-/trough-2.2.0.tgz#94a60bd6bd375c152c1df911a4b11d5b0256f50f" + resolved "https://registry.npmjs.org/trough/-/trough-2.2.0.tgz" integrity sha512-tmMpK00BjZiUyVyvrBK7knerNgmgvcV/KLVyuma/SC+TQN167GrMRciANTz09+k3zW8L8t60jWO1GpfkZdjTaw== ts-api-utils@^2.4.0: @@ -6341,7 +6361,7 @@ undici@^7.24.5: unified@^11.0.0: version "11.0.5" - resolved "https://registry.npmjs.org/unified/-/unified-11.0.5.tgz#f66677610a5c0a9ee90cab2b8d4d66037026d9e1" + resolved "https://registry.npmjs.org/unified/-/unified-11.0.5.tgz" integrity sha512-xKvGhPWw3k84Qjh8bI3ZeJjqnyadK+GEFtazSfZv/rKeTkTjOJho6mFqh2SM96iIcZokxiOpg78GazTSg8+KHA== dependencies: "@types/unist" "^3.0.0" @@ -6354,28 +6374,28 @@ unified@^11.0.0: unist-util-is@^6.0.0: version "6.0.1" - resolved "https://registry.npmjs.org/unist-util-is/-/unist-util-is-6.0.1.tgz#d0a3f86f2dd0db7acd7d8c2478080b5c67f9c6a9" + resolved "https://registry.npmjs.org/unist-util-is/-/unist-util-is-6.0.1.tgz" integrity sha512-LsiILbtBETkDz8I9p1dQ0uyRUWuaQzd/cuEeS1hoRSyW5E5XGmTzlwY1OrNzzakGowI9Dr/I8HVaw4hTtnxy8g== dependencies: "@types/unist" "^3.0.0" unist-util-position@^5.0.0: version "5.0.0" - resolved "https://registry.npmjs.org/unist-util-position/-/unist-util-position-5.0.0.tgz#678f20ab5ca1207a97d7ea8a388373c9cf896be4" + resolved "https://registry.npmjs.org/unist-util-position/-/unist-util-position-5.0.0.tgz" integrity sha512-fucsC7HjXvkB5R3kTCO7kUjRdrS0BJt3M/FPxmHMBOm8JQi2BsHAHFsy27E0EolP8rp0NzXsJ+jNPyDWvOJZPA== dependencies: "@types/unist" "^3.0.0" 
unist-util-stringify-position@^4.0.0: version "4.0.0" - resolved "https://registry.npmjs.org/unist-util-stringify-position/-/unist-util-stringify-position-4.0.0.tgz#449c6e21a880e0855bf5aabadeb3a740314abac2" + resolved "https://registry.npmjs.org/unist-util-stringify-position/-/unist-util-stringify-position-4.0.0.tgz" integrity sha512-0ASV06AAoKCDkS2+xw5RXJywruurpbC4JZSm7nr7MOt1ojAzvyyaO+UxZf18j8FCF6kmzCZKcAgN/yu2gm2XgQ== dependencies: "@types/unist" "^3.0.0" unist-util-visit-parents@^6.0.0: version "6.0.2" - resolved "https://registry.npmjs.org/unist-util-visit-parents/-/unist-util-visit-parents-6.0.2.tgz#777df7fb98652ce16b4b7cd999d0a1a40efa3a02" + resolved "https://registry.npmjs.org/unist-util-visit-parents/-/unist-util-visit-parents-6.0.2.tgz" integrity sha512-goh1s1TBrqSqukSc8wrjwWhL0hiJxgA8m4kFxGlQ+8FYQ3C/m11FcTs4YYem7V664AhHVvgoQLk890Ssdsr2IQ== dependencies: "@types/unist" "^3.0.0" @@ -6383,7 +6403,7 @@ unist-util-visit-parents@^6.0.0: unist-util-visit@^5.0.0: version "5.1.0" - resolved "https://registry.npmjs.org/unist-util-visit/-/unist-util-visit-5.1.0.tgz#9a2a28b0aa76a15e0da70a08a5863a2f060e2468" + resolved "https://registry.npmjs.org/unist-util-visit/-/unist-util-visit-5.1.0.tgz" integrity sha512-m+vIdyeCOpdr/QeQCu2EzxX/ohgS8KbnPDgFni4dQsfSCtpz8UqDyY5GjRru8PDKuYn7Fq19j1CQ+nJSsGKOzg== dependencies: "@types/unist" "^3.0.0" @@ -6846,7 +6866,7 @@ vega@^6.2.0: vfile-message@^4.0.0: version "4.0.3" - resolved "https://registry.npmjs.org/vfile-message/-/vfile-message-4.0.3.tgz#87b44dddd7b70f0641c2e3ed0864ba73e2ea8df4" + resolved "https://registry.npmjs.org/vfile-message/-/vfile-message-4.0.3.tgz" integrity sha512-QTHzsGd1EhbZs4AsQ20JX1rC3cOlt/IWJruk893DfLRr57lcnOeMaWG4K0JrRta4mIJZKth2Au3mM3u03/JWKw== dependencies: "@types/unist" "^3.0.0" @@ -6854,7 +6874,7 @@ vfile-message@^4.0.0: vfile@^6.0.0: version "6.0.3" - resolved "https://registry.npmjs.org/vfile/-/vfile-6.0.3.tgz#3652ab1c496531852bf55a6bac57af981ebc38ab" + resolved 
"https://registry.npmjs.org/vfile/-/vfile-6.0.3.tgz" integrity sha512-KzIbH/9tXat2u30jf+smMwFCsno4wHVdNmzFyL+T/L3UGqqk6JKfVqOFOZEpZSHADH1k40ab6NUIXZq422ov3Q== dependencies: "@types/unist" "^3.0.0" @@ -6862,7 +6882,7 @@ vfile@^6.0.0: "vite@^6.0.0 || ^7.0.0 || ^8.0.0", vite@^7.3.3: version "7.3.3" - resolved "https://registry.npmjs.org/vite/-/vite-7.3.3.tgz#d7e07a52b5873fb86f902a3f4b3d17410337450f" + resolved "https://registry.npmjs.org/vite/-/vite-7.3.3.tgz" integrity sha512-/4XH147Ui7OGTjg3HbdWe5arnZQSbfuRzdr9Ec7TQi5I7R+ir0Rlc9GIvD4v0XZurELqA035KVXJXpR61xhiTA== dependencies: esbuild "^0.27.0" @@ -7088,5 +7108,5 @@ zrender@6.0.0: zwitch@^2.0.0: version "2.0.4" - resolved "https://registry.npmjs.org/zwitch/-/zwitch-2.0.4.tgz#c827d4b0acb76fc3e685a4c6ec2902d51070e9d7" + resolved "https://registry.npmjs.org/zwitch/-/zwitch-2.0.4.tgz" integrity sha512-bXE4cR/kVZhKZX/RjPEflHaKVhUVl85noU3v6b8apfQEc1x4A+zBxjZ4lN8LqGd6WZ3dl98pY4o717VFmoPp+A==