Skip to content

Issue 8124 8126 fix 7660 7682 2 6266 after review#8154

Open
melton-jason wants to merge 49 commits into
v7_12_0_7_basefrom
issue-8124-8126-fix-7660-7682-2-after-review-2
Open

Issue 8124 8126 fix 7660 7682 2 6266 after review#8154
melton-jason wants to merge 49 commits into
v7_12_0_7_basefrom
issue-8124-8126-fix-7660-7682-2-after-review-2

Conversation

@melton-jason
Copy link
Copy Markdown
Contributor

@melton-jason melton-jason commented Jun 1, 2026

Taken from #8142 (comment) by @CarolineDenis:

Fixes #8124
Fixes #8126
Fixes #8101
Fixes #8065

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add pr to documentation list
  • Add automated tests
  • Add a reverse migration if a migration is present in the PR
  • Add migration function to
    def fix_schema_config(stdout: WriteToStdOut | None = None):

Testing instructions

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed catalogNumber uniqueness constraint handling in data migrations
    • Corrected RelativeAgeCitation field reference in schema configuration
    • Improved tectonic unit tree definition discipline link initialization
    • Fixed duplicate uniqueness rule removal logic
  • Performance Improvements

    • Optimized bulk database operations for taxonomic tree definitions and tectonic units
    • Refactored permission assignment queries for efficiency
  • Chores

    • Debug SQL logging now respects production environment settings

Fix: compare full rule definitions before deleting anything in uniqueness

(cherry picked from commit c61a530)
(cherry picked from commit b1f0346)
(cherry picked from commit d4c0231)
(cherry picked from commit bab5083)
(cherry picked from commit 11b6608)
(cherry picked from commit af4c963)
(cherry picked from commit 57bca64)
(cherry picked from commit e07aaeb)
(cherry picked from commit b183031)
(cherry picked from commit 2e9e4ad)
(cherry picked from commit 2254fc4)
(cherry picked from commit d8bd82d)
@grantfitzsimmons
Copy link
Copy Markdown
Member

grantfitzsimmons commented Jun 1, 2026

PR review by deepseek-v4-pro:


Summary by Category

🛡️ Defensive / Safety Improvements

File Change Notes
backend/stored_queries/execution.py (×2) Guarded log_sqlalchemy_query(query) behind if settings.DEBUG: Prevents unconditional SQL logging in production in two call sites (execute and series_post_query). Good catch.
specify/api/utils.py Moved settings.DEBUG guard into log_sqlalchemy_query itself + removed literal_binds=True Centralizes the guard so all callers are protected. Removing literal_binds avoids parameter-binding errors on queries that use % or other special characters. ✅
specify/management/commands/run_key_migration_functions.py Changed logger.error(f"An error occurred: {e}")logger.exception("An error occurred while running key migrations") logger.exception auto-includes the traceback; the old code threw away the stack trace. Also removed the unused f-string (broke static analysis). ✅

🧩 Migration Correctness / Bug Fixes

File Change Notes
backend/businessrules/migration_utils.py catnum_rule_uneditable: now tracks matching rule IDs and calls .update(isDatabaseConstraint=True) on them when one already exists Previously: if a catalog-number uniqueness rule already existed for a discipline, it was silently ignored (no-op). Now it ensures the existing rule is a DB constraint. This is a meaningful behavioral fix — ensures pre-existing rules are enforced at the DB level. ✅
backend/businessrules/uniqueness_rules.py fix_global_default_rules: replaced the two-query Exists/OuterRef subquery approach with a Python-set-signature comparison Old code deleted discipline-level UniquenessRuleField rows that matched any global field, then cleaned up orphaned rules. New code builds a frozen set of (fieldPath, isScope) signatures per global rule and per discipline rule and compares them in memory, only deleting the discipline rule + its fields when there's an exact signature match. This is more deterministic and avoids the complex chained subquery approach that could have subtle Django ORM bugs. ⚠️ Risk: If two rules have the same fields but in a different order (which frozenset handles), consider edge cases where fields are not the only distinguishing factor (e.g., rule-level metadata like title on the UniquenessRule). Should still be fine since the goal is to deduplicate discipline-level duplicates of global rules.
specify/migration_utils/update_schema_config.py (multiple sites) get_or_create / except MultipleObjectsReturned → explicit .filter().order_by('id').first() + .create() if None Issue: get_or_create with discipline + name + schematype can return MultipleObjectsReturned when duplicates exist from prior migrations. The old except MultipleObjectsReturned path used .filter(...).first(), which returned an arbitrary row. Fix: Always use .filter(...).order_by('id').first() + create-if-None. Deterministic row selection. ✅
specify/migration_utils/update_schema_config.py name=field_name.lower()name__iexact=field_name (×7 locations) This changes exact lowercase matching to case-insensitive exact matching. The old code assumed field_name.lower() would match the DB value; the new code properly handles mixed-case names like collectionObjectType. ⚠️ Verify: __iexact depends on database collation. With case-insensitive collation, __iexact is equivalent to __exact, so this should be safe. ✅
specify/migration_utils/update_schema_config.py update_cog_type_fields: container__name="CollectionObject""collectionobject" Fixes a case-sensitivity mismatch for Splocalecontainer names. ✅
specify/migration_utils/update_schema_config.py update_0034_schema_config_field_desc: if not desc_str or not name_str: continue → separate if desc_str is not None + if name_str is not None blocks Old code: skipped the entire iteration if either was falsy, meaning in a partial-data scenario (one text string missing), neither got updated. New code: updates each independently. This is a correctness fix for partially populated locale data. ✅
specify/migration_utils/update_schema_config.py reverse_hide_component_fields: items.update(ishidden=True)items.update(ishidden=False) This is a behavioral reversal fix. The "reverse" function was setting ishidden=True (hiding items) instead of ishidden=False (unhiding them). Clear logic error. ✅
specify/migration_utils/update_schema_config.py update_schema_config_field_desc_for_components: added language="en" filter to .update() calls Prevents accidentally overwriting non-English locale strings with English defaults. Important for multilingual collections. ✅
specify/migration_utils/sp7_schemaconfig.py RelativeAgeCitation: absoluteAgeCitationIdrelativeAgeCitationId Bug fix. The schema config listed the wrong field ID for RelativeAgeCitation — it was using AbsoluteAgeCitation's ID. This would cause locale/schema configuration issues for relative age citations. ✅
specify/migrations/0008_ageCitations_fix.py revert_migration: usc.update_relative_age_fieldsusc.revert_relative_age_fields The revert function was calling the forward migration function instead of the reverse one. Rollback would have been a no-op or incorrectly applied. ✅
specify/migrations/0009_tectonic_ranks.py Reversed rollback order: now revert_create_root_tectonic_node before revert_default_tectonic_ranks Critical ordering fix. Tectonic unit tree nodes depend on the tree def items (which are ranks), so you must delete the child (tree entries) before the parent (ranks). The old order was reversed and would violate FK constraints on rollback. ✅

🔗 Tectonic Unit Linking / Determinism

File Change Notes
specify/migration_utils/default_cots.py fix_tectonic_unit_treedef_discipline_links: replaced ad-hoc pairwise linking with a deterministic zip() Old code: iterated disciplines and tree-defs independently, using .first() and .last() calls that could produce non-deterministic pairings across database engines. New code: list() + .order_by('id') + zip(empty_disciplines, empty_tectonic_unit_treedefs, strict=False). Pairing is now deterministic by ID ordering. Excess disciplines (more than tree-defs) get new tree-defs created. ✅
specify/migration_utils/tectonic_ranks.py create_default_tectonic_ranks: simplified query + always get_or_create Removed the TectonicTreeDef.objects.values_list('discipline_id', flat=True) exclusion since .filter(tectonicunittreedef__isnull=True) already filters those out. Simpler and more correct. ✅
specify/migration_utils/tectonic_ranks.py create_root_tectonic_node: added upsert logic — if a tree-def item with name="Root" already exists, update its rankid, parent, and isenforced Previously only created if (treedef, rankid=0, parent=None) didn't exist. Now also handles the case where a Root node exists but has stale field values. ✅
specify/migration_utils/tectonic_ranks.py revert_create_root_tectonic_node: scoped the TectonicUnit delete to name="Root", definition=tectonic_tree_def, parent__isnull=True + reversed order (delete Units before TreeDefItems) Old code deleted all TreeDefItems for the treedef and all TectonicUnits named "Root" globally (no discipline/definition scope). New code is properly scoped. ✅

🔐 Permissions / User Initialization

File Change Notes
backend/permissions/initialize.py create_admins: removed early-return guard, replaced with per-user UserPolicy check Old code: if any admin policy existed, all admin creation was skipped entirely (regardless of which user). New code: each user is individually checked for an existing admin UserPolicy. If a user already has one, they're skipped; others still get policies created. This ensures new users added after the first admin are not missed. is_sp6_user_permissions_migrated removal: that function's logic is now inlined as the UserPolicy.objects.filter(...).exists() check. Verify this is semantically equivalent — the old function may have checked collection-scoped policies. ✅
backend/permissions/initialize.py assign_users_to_roles: raw SQL changed from NOT IN (subquery) to NOT EXISTS (correlated subquery) Old code had two separate NOT IN blocks — one for user/role and one for collection. The second was c.UserGroupScopeId NOT IN (SELECT DISTINCT r.collection_id ...) which excluded any collection that had any user-role assignment, not just the current user's. New code is a single correlated NOT EXISTS that correctly checks whether this specific user has a role in this specific collection. This is a significant correctness fix for collections with multiple users using different roles. ✅

🗄️ Database Alias / Multi-DB Support

File Change Notes
backend/patches/migration_utils.py update_is_accepted and update_coordinates: added .using(db_alias) routing Uses schema_editor.connection.alias to target the correct database in multi-DB setups. Previously all operations defaulted to "default". Also switched from tree_model.objects to tree_model._base_manager (standard in migration functions to avoid filtered managers). ✅
specify/migrations/0002_geo.py Removed create_default_collection_types import from specify.api.utils, removed using=db_alias kwarg, imported from local migration utils instead The using parameter was being passed but create_default_collection_types may not accept it (or it wasn't needed). The import is now from the migration-level utils where it belongs. ✅

🔍 Items That Deserve Extra Attention

  1. fix_global_default_rules refactor — The shift from ORM subqueries to in-memory set comparison is cleaner, but verify this against a real database with multiple disciplines and overlapping global rules. A migration integration test would provide confidence.

  2. is_sp6_user_permissions_migrated removal — This function was completely removed in favor of a direct UserPolicy filter. Double-check that the new inline check covers all cases the old function did (especially legacy SP6 migrated users who might have collection-scoped admin policies rather than global ones).

  3. __iexact vs __exact + .lower() — On MySQL with utf8mb4_general_ci (case-insensitive collation, which is standard for Specify), __iexact and .lower() behave identically. However, if any database uses a case-sensitive collation, __iexact is actually more correct than the old .lower() approach. This change is an improvement.

  4. revert_create_root_tectonic_node ordering — The fix reverses the delete order (units before tree-def items) which is correct for FK constraints. However, verify the parent tree-def items also get cleaned up — the function deletes TectonicUnitTreeDefItem.objects.filter(treedef=tectonic_tree_def).delete() which would cascade, but explicit ordering is still preferred.


✅ Overall Assessment

This PR is a well-scoped collection of fixes addressing review feedback from #8142. The changes fall primarily into:

  • Bug fixes (wrong field IDs, wrong revert functions, wrong rollback ordering)
  • Correctness improvements (per-user admin seeding, deterministic tectonic linking, locale string handling)
  • Safety improvements (debug logging guards, DB alias support, better error logging)
  • Code quality (simpler ORM patterns, deterministic queries)

No new features are introduced. All changes are in migration utilities, schema config helpers, and permission initialization — code that runs during migrations and setup, not hot-path request handling.

The changes are well-contained, each targeting a specific problem from the linked issues. I'd recommend testing the tectonic unit rollback (0009_tectonic_ranks revert) and the admin creation flow with multiple users to verify the per-user behavior.

@grantfitzsimmons grantfitzsimmons self-requested a review June 1, 2026 18:07
Copy link
Copy Markdown
Member

@grantfitzsimmons grantfitzsimmons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make my contribution more worthwhile, I am looking closely at the things deepseek suggested need extra attention. It seems like most of the changes seem obvious and should have been done in the first place!

  1. fix_global_default_rules refactor — The shift from ORM subqueries to in-memory set comparison is cleaner, but verify this against a real database with multiple disciplines and overlapping global rules. A migration integration test would provide confidence.

To me it seems the new code is actually a total improvement (the old code over-deleted partial Exists matches, but the new code only deletes on exact full-signature match). Happy with how this looks.

  1. is_sp6_user_permissions_migrated removal — This function was completely removed in favor of a direct UserPolicy filter. Double-check that the new inline check covers all cases the old function did (especially legacy SP6 migrated users who might have collection-scoped admin policies rather than global ones).

This is still here despite not being used by any runtime code, and the new solution is more straightforward. Should be removed IMO.

def is_sp6_user_permissions_migrated(user, apps=apps) -> bool:
UserPolicy = apps.get_model('permissions', 'UserPolicy')
UserRole = apps.get_model('permissions', 'UserRole')
return UserRole.objects.filter(specifyuser=user).exists() or \
UserPolicy.objects.filter(specifyuser=user).exists()

  1. __iexact vs __exact + .lower() — On MySQL with utf8mb4_general_ci (case-insensitive collation, which is standard for Specify), __iexact and .lower() behave identically. However, if any database uses a case-sensitive collation, __iexact is actually more correct than the old .lower() approach. This change is an improvement.

Happy with this!

  1. revert_create_root_tectonic_node ordering — The fix reverses the delete order (units before tree-def items) which is correct for FK constraints. However, verify the parent tree-def items also get cleaned up — the function deletes TectonicUnitTreeDefItem.objects.filter(treedef=tectonic_tree_def).delete() which would cascade, but explicit ordering is still preferred.

Cascade is desirable here IMO.


Should we do some integration testing to make sure all is good in a multi-db environment (maybe using the scheme defined in docker-compositions/docker/specifycloud)?

@CarolineDenis
Copy link
Copy Markdown
Contributor

@grantfitzsimmons, yes integration testing is a must I think.

@melton-jason
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
specifyweb/backend/permissions/initialize.py (1)

89-92: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Duplicate UserPolicy assignment.

UserPolicy is assigned on line 89 and again on line 91. The second assignment is redundant.

🧹 Suggested fix
 def assign_users_to_roles(apps=apps) -> None:
     Role = apps.get_model('permissions', 'Role')
     UserPolicy = apps.get_model('permissions', 'UserPolicy')
     Agent = apps.get_model('specify', 'Agent')
-    UserPolicy = apps.get_model('permissions', 'UserPolicy')
     UserRole = apps.get_model('permissions', 'UserRole')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/backend/permissions/initialize.py` around lines 89 - 92, Remove
the redundant reassignment of UserPolicy: the code calls
apps.get_model('permissions', 'UserPolicy') twice (once assigned to UserPolicy
on the same block alongside Agent and UserRole); delete the duplicate line so
UserPolicy is only assigned once and keep the existing Agent and UserRole
assignments intact to avoid shadowing or confusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specifyweb/specify/migration_utils/tectonic_ranks.py`:
- Around line 97-106: Tighten root-node detection by requiring both parent=None
and rankid=0 when identifying roots: update the Exists filter on TectonicUnit
(used in the TectonicUnitTreeDef.objects.annotate root_node_exists) to include
rankid=0 alongside parent=None and definition=OuterRef("pk"), and make the same
change in the revert/delete path that currently selects root candidates by name
(the logic around lines referenced for deletion) so both forward and revert
paths use parent=None AND rankid=0 when locating root TectonicUnit rows.

In `@specifyweb/specify/migrations/0002_geo.py`:
- Line 75: The migration call to create_default_collection_types currently omits
the migration DB alias and thus writes to the default DB; change the call in the
RunPython migration in 0002_geo.py to pass the migration alias (use
schema_editor.connection.alias, e.g. create_default_collection_types(apps,
schema_editor.connection.alias)), then update the
create_default_collection_types function signature to accept a db_alias
parameter and ensure all DB operations inside (queries/creates/updates)
explicitly use .using(db_alias) or the equivalent to avoid cross-database
writes.

---

Outside diff comments:
In `@specifyweb/backend/permissions/initialize.py`:
- Around line 89-92: Remove the redundant reassignment of UserPolicy: the code
calls apps.get_model('permissions', 'UserPolicy') twice (once assigned to
UserPolicy on the same block alongside Agent and UserRole); delete the duplicate
line so UserPolicy is only assigned once and keep the existing Agent and
UserRole assignments intact to avoid shadowing or confusion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1694a881-7df6-4fad-8cb1-edb41d3c353b

📥 Commits

Reviewing files that changed from the base of the PR and between 3202d36 and 39ec012.

📒 Files selected for processing (16)
  • specifyweb/backend/businessrules/migration_utils.py
  • specifyweb/backend/businessrules/migrations/0004_catnum_uniquerule.py
  • specifyweb/backend/businessrules/uniqueness_rules.py
  • specifyweb/backend/patches/migration_utils.py
  • specifyweb/backend/permissions/initialize.py
  • specifyweb/backend/stored_queries/execution.py
  • specifyweb/backend/stored_queries/utils.py
  • specifyweb/specify/api/utils.py
  • specifyweb/specify/management/commands/run_key_migration_functions.py
  • specifyweb/specify/migration_utils/default_cots.py
  • specifyweb/specify/migration_utils/sp7_schemaconfig.py
  • specifyweb/specify/migration_utils/tectonic_ranks.py
  • specifyweb/specify/migration_utils/update_schema_config.py
  • specifyweb/specify/migrations/0002_geo.py
  • specifyweb/specify/migrations/0008_ageCitations_fix.py
  • specifyweb/specify/migrations/0009_tectonic_ranks.py

Comment on lines +97 to +106
trees_missing_root_node = TectonicUnitTreeDef.objects.annotate(
root_node_exists=Exists(
TectonicUnit.objects.filter(
parent=None,
definition=tectonic_tree_def,
definitionitem=tectonic_tree_def_item
definition=OuterRef("pk")
)
)
).filter(
root_node_exists=False
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tighten root-node identification to include rankid=0 in both forward and revert paths.

Current root detection/deletion only keys on parentless rows (and name in revert), which can mis-handle malformed trees that have parentless non-root entries.

Proposed fix
-    trees_missing_root_node = TectonicUnitTreeDef.objects.annotate(
+    trees_missing_root_node = TectonicUnitTreeDef.objects.annotate(
         root_node_exists=Exists(
             TectonicUnit.objects.filter(
                 parent=None,
+                rankid=0,
                 definition=OuterRef("pk")
             )
         )
     ).filter(
         root_node_exists=False
@@
             TectonicUnit.objects.filter(
                 name="Root",
                 definition=tectonic_tree_def,
                 parent__isnull=True,
+                rankid=0,
             ).delete()

Based on learnings: root lookups in this module should combine parent=None and rankid=0 to handle malformed-tree edge cases safely.

Also applies to: 143-147

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/specify/migration_utils/tectonic_ranks.py` around lines 97 - 106,
Tighten root-node detection by requiring both parent=None and rankid=0 when
identifying roots: update the Exists filter on TectonicUnit (used in the
TectonicUnitTreeDef.objects.annotate root_node_exists) to include rankid=0
alongside parent=None and definition=OuterRef("pk"), and make the same change in
the revert/delete path that currently selects root candidates by name (the logic
around lines referenced for deletion) so both forward and revert paths use
parent=None AND rankid=0 when locating root TectonicUnit rows.

def consolidated_python_django_migration_operations(apps, schema_editor):
db_alias = schema_editor.connection.alias or 'migrator'
create_default_collection_types(apps, using=db_alias)
create_default_collection_types(apps)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass the migration DB alias into create_default_collection_types to avoid cross-database writes.

This call now executes on the default connection, while the migration is alias-aware elsewhere; on non-default aliases, this can populate/update records in the wrong database.

Proposed fix (cross-file)
--- a/specifyweb/specify/migrations/0002_geo.py
+++ b/specifyweb/specify/migrations/0002_geo.py
@@
-        create_default_collection_types(apps)
+        create_default_collection_types(apps, using=db_alias)
--- a/specifyweb/specify/migration_utils/default_cots.py
+++ b/specifyweb/specify/migration_utils/default_cots.py
@@
-def create_default_collection_types(apps):
+def create_default_collection_types(apps, using='default'):
@@
-    for collection in Collection.objects.filter(collectionobjecttype__isnull=True):
+    for collection in Collection.objects.using(using).filter(collectionobjecttype__isnull=True):
@@
-        cot, created = Collectionobjecttype.objects.get_or_create(
+        cot, created = Collectionobjecttype.objects.using(using).get_or_create(
@@
-        Collectionobject.objects.filter(
+        Collectionobject.objects.using(using).filter(
             collection=collection).update(collectionobjecttype=cot)
         collection.collectionobjecttype = cot
-        collection.save()
+        collection.save(using=using)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/specify/migrations/0002_geo.py` at line 75, The migration call to
create_default_collection_types currently omits the migration DB alias and thus
writes to the default DB; change the call in the RunPython migration in
0002_geo.py to pass the migration alias (use schema_editor.connection.alias,
e.g. create_default_collection_types(apps, schema_editor.connection.alias)),
then update the create_default_collection_types function signature to accept a
db_alias parameter and ensure all DB operations inside (queries/creates/updates)
explicitly use .using(db_alias) or the equivalent to avoid cross-database
writes.

@github-project-automation github-project-automation Bot moved this from 📋Back Log to Dev Attention Needed in General Tester Board Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Dev Attention Needed

3 participants