Move knowledge of how data status works to db layer by PGijsbers · Pull Request #324 · openml/server-api

PGijsbers · 2026-05-06T08:34:11Z

Knowing that a lack of rows for dataset status implies being 'in preparation' is a database concern.
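A minimal sketch of the idea. It assumes `DatasetStatus` is a string enum with the three members shown; the project's real enum may differ. `StatusRow` and `status_from_row` are hypothetical names used only for illustration. The database layer maps "no row" to `IN_PREPARATION`, so callers always receive an enum member and never see a missing row.

```python
from enum import Enum
from typing import NamedTuple, Optional


class DatasetStatus(str, Enum):
    # Assumed members; check the project's own enum for the exact values.
    ACTIVE = "active"
    DEACTIVATED = "deactivated"
    IN_PREPARATION = "in_preparation"


class StatusRow(NamedTuple):
    # Hypothetical stand-in for a `dataset_status` row; only `status` is used.
    status: str


def status_from_row(row: Optional[StatusRow]) -> DatasetStatus:
    """Map the most recent status row to an enum, defaulting when none exists."""
    # No row in `dataset_status` is interpreted as "in preparation".
    return DatasetStatus(row.status) if row is not None else DatasetStatus.IN_PREPARATION


print(status_from_row(None).value)                 # in_preparation
print(status_from_row(StatusRow("active")).value)  # active
```

Because the default lives next to the query, routers no longer need to know what an empty result means.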

coderabbitai · 2026-05-06T08:34:23Z

Warning

Rate limit exceeded

@PGijsbers has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 30 minutes and 33 seconds before requesting another review.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d43161ad-4657-4f42-8840-db4a71b1bd99

📥 Commits

Reviewing files that changed from the base of the PR and between f62af09 and 0971055.

📒 Files selected for processing (1)

src/database/datasets.py

Walkthrough

The changes normalize dataset status handling across the database and router layers. The `get_status` function in `src/database/datasets.py` now returns a `DatasetStatus` enum value directly instead of a raw database row, falling back to `IN_PREPARATION` when no status row exists. The `update_dataset_status` endpoint in `src/routers/openml/datasets.py` is refactored to treat the current status as an enum value, so it no longer reads a `.status` attribute from a row object. Status transition logic and logging now compare and use enum values directly. The `get_dataset` endpoint drops its intermediate status derivation and passes the status value directly to the response metadata.
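On the router side, the practical effect is that the current status can be compared with the requested one as enum members, without any `.status` attribute access. The sketch below is hypothetical: `check_transition` and its single rule (reject a change to the status the dataset already has) are illustrative only and are not the endpoint's actual transition rules. The `DatasetStatus` members are likewise assumed.

```python
from enum import Enum


class DatasetStatus(str, Enum):
    # Assumed members, mirroring the database layer's enum.
    ACTIVE = "active"
    DEACTIVATED = "deactivated"
    IN_PREPARATION = "in_preparation"


def check_transition(current: DatasetStatus, requested: DatasetStatus) -> str:
    """Hypothetical guard: compare enum members directly and return a log message."""
    if current == requested:
        # Illustrative rule only; the real endpoint may apply different checks.
        raise ValueError(f"dataset is already {current.value}")
    return f"status changed from {current.value} to {requested.value}"


print(check_transition(DatasetStatus.IN_PREPARATION, DatasetStatus.ACTIVE))
# status changed from in_preparation to active
```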

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main refactoring effort: moving status logic from the router layer to the database layer. |
| Description check | ✅ Passed | The description explains the key motivation behind the changes: treating missing status rows as a database concern rather than a router concern. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

sourcery-ai

Hey - I've found 2 issues, and left some high-level feedback:

- The `get_status` function now returns a `DatasetStatus` enum rather than a `Row | None`, so its return type annotation should be updated accordingly to avoid confusing or misleading type hints.
- Since `get_status` now encapsulates the status mapping logic, consider renaming it (e.g., to `get_dataset_status_value`) or documenting its behavior to make clear that it returns a `DatasetStatus` enum, not a DB row.

## Individual Comments

### Comment 1
<location path="src/database/datasets.py" line_range="106-115" />
<code_context>

 async def get_status(id_: int, connection: AsyncConnection) -> Row | None:
     """Get most recent status for the dataset."""
-    row = await connection.execute(
-        text(
-            """
+    row = (
+        await connection.execute(
+            text(
+                """
     SELECT *
     FROM dataset_status
     WHERE did = :dataset_id
     ORDER BY status_date DESC
+    LIMIT 1
     """,
-        ),
-        parameters={"dataset_id": id_},
-    )
-    return row.first()
+            ),
+            parameters={"dataset_id": id_},
+        )
+    ).first()
+    return DatasetStatus(row.status) if row else DatasetStatus.IN_PREPARATION


</code_context>
<issue_to_address>
**issue (bug_risk):** Update the return type annotation of `get_status` to reflect it now returns `DatasetStatus` instead of a DB row or `None`.

The function now always returns a `DatasetStatus` (defaulting to `IN_PREPARATION` when no row is found), but the signature still says `Row | None`. This inaccurate type can mislead callers and type checkers, especially where a row with `.status` is still expected. Please update the return annotation to `-> DatasetStatus` (or a more precise union if you intend to support multiple types).
</issue_to_address>

### Comment 2
<location path="src/database/datasets.py" line_range="112" />
<code_context>
+        await connection.execute(
+            text(
+                """
     SELECT *
     FROM dataset_status
     WHERE did = :dataset_id
     ORDER BY status_date DESC
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid `SELECT *` here and fetch only the `status` column that is actually used.

`get_status` only reads `row.status`, so selecting all columns is unnecessary. Restrict the query to `SELECT status` (or the minimal required columns) to better match the function’s intent and avoid extra data transfer.

```suggestion
    SELECT status
```
</issue_to_address>
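Taken together, the two comments point toward a function along these lines. This is a sketch only: it uses a synchronous `sqlite3` connection as a stand-in for SQLAlchemy's `AsyncConnection`, and the `DatasetStatus` members and the minimal table layout are assumptions modeled on the query above.

```python
import sqlite3
from enum import Enum


class DatasetStatus(str, Enum):
    # Assumed members; the project's enum may define different values.
    ACTIVE = "active"
    DEACTIVATED = "deactivated"
    IN_PREPARATION = "in_preparation"


def get_status(id_: int, connection: sqlite3.Connection) -> DatasetStatus:
    """Return the most recent status of the dataset.

    Always returns a DatasetStatus; if the dataset has no rows in
    `dataset_status`, it defaults to DatasetStatus.IN_PREPARATION.
    """
    row = connection.execute(
        """
        SELECT status
        FROM dataset_status
        WHERE did = :dataset_id
        ORDER BY status_date DESC
        LIMIT 1
        """,
        {"dataset_id": id_},
    ).fetchone()
    # sqlite3 returns plain tuples, so the single selected column is row[0].
    return DatasetStatus(row[0]) if row is not None else DatasetStatus.IN_PREPARATION


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_status (did INTEGER, status TEXT, status_date TEXT)")
conn.executemany(
    "INSERT INTO dataset_status VALUES (?, ?, ?)",
    [(1, "active", "2024-01-01"), (1, "deactivated", "2024-02-01")],
)
print(get_status(1, conn).value)  # deactivated (the latest row wins)
print(get_status(2, conn).value)  # in_preparation (no rows, so the default applies)
```

Selecting only `status` and adding `LIMIT 1` keeps the query aligned with the single value the function returns.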

codecov · 2026-05-06T08:36:26Z

Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.87%. Comparing base (3e7c4fc) to head (0971055).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/database/datasets.py | 83.33% | 0 Missing and 1 partial ⚠️ |
| src/routers/openml/datasets.py | 75.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #324      +/-   ##
==========================================
+ Coverage   93.69%   93.87%   +0.18%     
==========================================
  Files          68       69       +1     
  Lines        3154     3248      +94     
  Branches      223      227       +4     
==========================================
+ Hits         2955     3049      +94     
  Misses        139      139              
  Partials       60       60
```

coderabbitai

Actionable comments posted: 1

Inline comments:
In `@src/database/datasets.py`:
- Around lines 106-122: the `get_status` function signature and docstring are out of sync. Update the return type annotation of `get_status` to `DatasetStatus` (not `Row | None`) and adjust the docstring to state that it always returns a `DatasetStatus`, defaulting to `DatasetStatus.IN_PREPARATION` when no row is found. Also remove or update any now-unused `Row | None` imports or annotations around `get_status` so the types stay consistent and type checkers can identify the concrete return type.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e8d691b6-f9e5-4249-8993-08a4ffa63839

📥 Commits

Reviewing files that changed from the base of the PR and between 3014e44 and f62af09.

📒 Files selected for processing (2)

src/database/datasets.py
src/routers/openml/datasets.py

Move knowledge of how data status works to db layer

f62af09

PGijsbers added the maintenance label (improvements or changes to existing systems) on May 6, 2026

sourcery-ai Bot reviewed May 6, 2026

Comment thread src/database/datasets.py Outdated

Comment thread src/database/datasets.py Outdated

coderabbitai Bot reviewed May 6, 2026

Comment thread src/database/datasets.py Outdated

fix minor oversights

0971055

PGijsbers merged 2 commits from status-refactor into main as commit 82ab6e9 on May 6, 2026
8 of 9 checks passed

PGijsbers deleted the status-refactor branch May 6, 2026 09:10
