Move knowledge of how data status works to db layer#324
Conversation
|
Warning Rate limit exceeded
To continue reviewing without waiting, purchase usage credits in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThe changes normalize dataset status handling across the database and router layers. The 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The
get_statusfunction now returns aDatasetStatusenum rather than aRow | None, so its return type annotation should be updated accordingly to avoid confusing or misleading type hints. - Since
get_statusnow encapsulates the status mapping logic, consider renaming it (e.g., toget_dataset_status_value) or documenting its behavior to make clear that it returns aDatasetStatusenum, not a DB row.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `get_status` function now returns a `DatasetStatus` enum rather than a `Row | None`, so its return type annotation should be updated accordingly to avoid confusing or misleading type hints.
- Since `get_status` now encapsulates the status mapping logic, consider renaming it (e.g., to `get_dataset_status_value`) or documenting its behavior to make clear that it returns a `DatasetStatus` enum, not a DB row.
## Individual Comments
### Comment 1
<location path="src/database/datasets.py" line_range="106-115" />
<code_context>
async def get_status(id_: int, connection: AsyncConnection) -> Row | None:
"""Get most recent status for the dataset."""
- row = await connection.execute(
- text(
- """
+ row = (
+ await connection.execute(
+ text(
+ """
SELECT *
FROM dataset_status
WHERE did = :dataset_id
ORDER BY status_date DESC
+ LIMIT 1
""",
- ),
- parameters={"dataset_id": id_},
- )
- return row.first()
+ ),
+ parameters={"dataset_id": id_},
+ )
+ ).first()
+ return DatasetStatus(row.status) if row else DatasetStatus.IN_PREPARATION
</code_context>
<issue_to_address>
**issue (bug_risk):** Update the return type annotation of `get_status` to reflect it now returns `DatasetStatus` instead of a DB row or `None`.
The function now always returns a `DatasetStatus` (defaulting to `IN_PREPARATION` when no row is found), but the signature still says `Row | None`. This inaccurate type can mislead callers and type checkers, especially where a row with `.status` is still expected. Please update the return annotation to `-> DatasetStatus` (or a more precise union if you intend to support multiple types).
</issue_to_address>
### Comment 2
<location path="src/database/datasets.py" line_range="112" />
<code_context>
+ await connection.execute(
+ text(
+ """
SELECT *
FROM dataset_status
WHERE did = :dataset_id
ORDER BY status_date DESC
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid `SELECT *` here and fetch only the `status` column that is actually used.
`get_status` only reads `row.status`, so selecting all columns is unnecessary. Restrict the query to `SELECT status` (or the minimal required columns) to better match the function’s intent and avoid extra data transfer.
```suggestion
SELECT status
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #324 +/- ##
==========================================
+ Coverage 93.69% 93.87% +0.18%
==========================================
Files 68 69 +1
Lines 3154 3248 +94
Branches 223 227 +4
==========================================
+ Hits 2955 3049 +94
Misses 139 139
Partials 60 60 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/database/datasets.py`:
- Around line 106-122: The get_status function signature and docstring are out
of sync: update the return type annotation of get_status to return DatasetStatus
(not Row | None) and adjust the docstring to state it always returns a
DatasetStatus and that it defaults to DatasetStatus.IN_PREPARATION when no row
is found; also remove or update any now-unused Row | None imports/annotations
surrounding get_status to keep types consistent and help type-checkers identify
the concrete return type.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e8d691b6-f9e5-4249-8993-08a4ffa63839
📒 Files selected for processing (2)
src/database/datasets.pysrc/routers/openml/datasets.py
Knowing that a lack of rows for dataset status implies being 'in preparation' is a database concern.