fix: Redact SSO PII before deletion by ktyagiapphelix2u · Pull Request #38425 · openedx/openedx-platform

ktyagiapphelix2u · 2026-04-23T09:43:45Z

Description

Implements automatic PII redaction for UserSocialAuth records before deletion to prevent personally identifiable information from persisting after records are removed.

Jira Ticket

https://2u-internal.atlassian.net/browse/BOMS-514

ktyagiapphelix2u · 2026-05-06T10:56:18Z

@robrap We’re dealing with multiple ways SSO records can get deleted: through Django admin, user actions like unlinking accounts, and bulk retirement scripts. The challenge is that we don’t control all of these paths, so we can’t reliably add PII redaction directly into each one.

Instead, we’ve set up a two-layer approach.

The first layer is a Django pre_delete signal that runs automatically right before any SSO record is deleted. This acts as a safety net. No matter how the deletion is triggered, whether from admin or a user action, the signal ensures sensitive fields like the UID and extra data are redacted. It’s centralized and consistent, and it is idempotent, so it won’t cause issues if it runs more than once.

The second layer is used only in cases we fully control, like user retirement flows. There, we proactively run a bulk redaction step before deleting records. This is much faster because it redacts all of a user's records with a single bulk database update rather than one update per record. When the delete happens afterward, the signal still fires, but it detects that the data is already redacted and simply exits without doing extra work.

Together, these two layers cover both safety and performance. The signal guarantees we never miss redaction, even in code we don’t control, while the explicit bulk step keeps large-scale operations efficient.
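The two layers can be illustrated with a small, framework-free sketch. It is not the PR's code: `FakeSocialAuthStore`, `redact_on_pre_delete`, and the list of hooks standing in for Django's pre_delete signal are hypothetical, and the uid format is the `redacted-before-delete-<pk>@safe.com` one discussed later in this review. It shows an uncontrolled delete being redacted by the hook, and a controlled delete being bulk-redacted first so the hook exits without writing.

```python
# Illustrative sketch only; none of these names come from the PR.
REDACTED_SOCIAL_AUTH_UID_PREFIX = 'redacted-before-delete-'
REDACTED_SOCIAL_AUTH_UID_SUFFIX = '@safe.com'


def get_redacted_social_auth_uid(pk):
    """Build the per-record redacted uid (also acts as the 'already redacted' flag)."""
    return f'{REDACTED_SOCIAL_AUTH_UID_PREFIX}{pk}{REDACTED_SOCIAL_AUTH_UID_SUFFIX}'


class FakeSocialAuthStore:
    """Hypothetical in-memory stand-in for the UserSocialAuth table."""

    def __init__(self):
        self.rows = {}               # pk -> {'user_id': ..., 'uid': ..., 'extra_data': ...}
        self.pre_delete_hooks = []   # plays the role of the pre_delete signal
        self.hook_writes = 0         # redactions performed by the safety net
        self.deleted_snapshots = []  # what a downstream copy would keep

    def delete(self, pk):
        # Layer 1: every deletion path runs the hooks first.
        for hook in self.pre_delete_hooks:
            hook(self, pk)
        self.deleted_snapshots.append(self.rows.pop(pk))

    def bulk_redact(self, user_id):
        # Layer 2: redact all of a user's rows in one pass (one UPDATE in SQL).
        for pk, row in self.rows.items():
            if row['user_id'] == user_id:
                row['uid'] = get_redacted_social_auth_uid(pk)
                row['extra_data'] = {}


def redact_on_pre_delete(store, pk):
    """Safety net: redact unless the row is already redacted (idempotent)."""
    row = store.rows[pk]
    if row['extra_data'] or row['uid'] != get_redacted_social_auth_uid(pk):
        row['uid'] = get_redacted_social_auth_uid(pk)
        row['extra_data'] = {}
        store.hook_writes += 1


store = FakeSocialAuthStore()
store.pre_delete_hooks.append(redact_on_pre_delete)
store.rows[1] = {'user_id': 7, 'uid': 'alice@example.com', 'extra_data': {'name': 'Alice'}}
store.rows[2] = {'user_id': 7, 'uid': 'alice-github', 'extra_data': {}}

store.delete(1)        # uncontrolled path (e.g. admin): the hook redacts
store.bulk_redact(7)   # controlled path (e.g. retirement): redact first...
store.delete(2)        # ...so the hook finds nothing to do
```

Only the first delete causes a write from the safety net; both deleted snapshots contain redacted values.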

robrap · 2026-05-06T17:31:53Z

+
+    try:
+        update_fields = {}
+        redacted_uid = f'redacted_{instance.pk}@retired.invalid'


We should use something more generic, like redact-before-delete@safe.com (or whatever we've come up with). This is not a retired email in all cases.

Since we are using this email as a flag between the bulk retirement and here, we should be using a constant that both pieces of code make use of, to ensure they stay in sync.

I'd comment the short-circuit code below to mention why we are doing it. Something like:

These fields may have already been redacted as part of a bulk retirement, so we skip the update if it is already done to reduce query count.

[for future] I wonder if we should have generic code for models with annotated PII that automatically introduces this redaction into the pre_delete signal?

robrap · 2026-05-07T14:44:27Z

+                    social_auth_records = list(UserSocialAuth.objects.filter(user_id=user.id))
+                    for auth in social_auth_records:
+                        auth.uid = get_redacted_social_auth_uid(auth.pk)
+                        auth.extra_data = {}
+                    UserSocialAuth.objects.bulk_update(social_auth_records, ['uid', 'extra_data'])


See "Using F() Expressions" in https://websitehurdles.com/django-bulk-update/ as an example of how you could refer to the pk without an extra call.

Doing so may make it impossible to use the shared method, but you could use the constants you had started with.
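As a rough illustration of the idea, assuming SQLite via Python's standard sqlite3 module rather than the Django ORM, a single UPDATE can build each row's redacted uid from that row's own id, so no per-row fetch is needed. The `social_auth` table and the sample rows are made up; the PR's Concat/Cast expression is expected to compile to something similar, but the exact SQL Django emits may differ by database backend.

```python
import sqlite3

# Assumed constants, matching the values used later in this PR.
PREFIX = 'redacted-before-delete-'
SUFFIX = '@safe.com'

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE social_auth '
    '(id INTEGER PRIMARY KEY, user_id INTEGER, uid TEXT, extra_data TEXT)'
)
conn.executemany(
    'INSERT INTO social_auth (id, user_id, uid, extra_data) VALUES (?, ?, ?, ?)',
    [
        (10, 7, 'alice@example.com', '{"name": "Alice"}'),
        (11, 7, 'alice-github', '{"token": "x"}'),
        (12, 8, 'bob@example.com', '{}'),
    ],
)

# One UPDATE for all of user 7's rows. Each uid is built from the row's own id
# inside the database (the role F()/Cast('id') plays in Django), so there is
# no SELECT-then-loop in Python.
conn.execute(
    "UPDATE social_auth SET uid = ? || CAST(id AS TEXT) || ?, extra_data = '{}' "
    'WHERE user_id = ?',
    (PREFIX, SUFFIX, 7),
)
rows = conn.execute('SELECT id, uid, extra_data FROM social_auth ORDER BY id').fetchall()
```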

robrap · 2026-05-07T14:57:42Z

                    UserRetirementStatus.create_retirement(user)
-                    # Unlink LMS social auth accounts
+                    # Redact and unlink LMS social auth accounts.
+                    social_auth_records = list(UserSocialAuth.objects.filter(user_id=user.id))


Is it possible to put UserSocialAuth.objects.filter(user_id=user.id) in a variable and call the bulk_update and delete off of this? It might make it more clear that we're working from the same set.

robrap · 2026-05-07T19:40:43Z

+    social_auth_queryset = UserSocialAuth.objects.filter(user_id=user.id)
+    social_auth_queryset.update(
+        uid=Concat(
+            Value(REDACTED_SOCIAL_AUTH_UID_PREFIX),
+            Cast('id', output_field=CharField()),
+            Value(REDACTED_SOCIAL_AUTH_UID_SUFFIX),
+        ),
+        extra_data={},
+    )
+    social_auth_queryset.delete()


Move to new method in utils called redact_and_delete_social_auth(user_id). This can be called from both locations, rather than duplicating code. Docstring can remind why we are redacting before deleting.
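A minimal sketch of what such a shared helper might look like, again using sqlite3 in place of the ORM. The connection-taking signature, the `social_auth` table, and the `deleted_copy` trigger (a stand-in for a downstream system that keeps the last state of deleted rows) are assumptions for illustration. One WHERE clause is reused for both statements, which also reflects the earlier point about working from the same set.

```python
import sqlite3

# Assumed constants; sqlite3 stands in for the Django ORM in this sketch.
REDACTED_SOCIAL_AUTH_UID_PREFIX = 'redacted-before-delete-'
REDACTED_SOCIAL_AUTH_UID_SUFFIX = '@safe.com'


def redact_and_delete_social_auth(conn, user_id):
    """
    Redact PII from all social auth rows for ``user_id``, then delete them.

    Downstream copies of data may use soft-deletes, and redacting before
    deleting ensures the last values they see contain no PII.
    """
    where = 'WHERE user_id = ?'  # one filter shared by both statements
    params = (user_id,)
    with conn:  # commit both statements together, or roll back both
        conn.execute(
            "UPDATE social_auth SET uid = ? || CAST(id AS TEXT) || ?, extra_data = '{}' " + where,
            (REDACTED_SOCIAL_AUTH_UID_PREFIX, REDACTED_SOCIAL_AUTH_UID_SUFFIX) + params,
        )
        conn.execute('DELETE FROM social_auth ' + where, params)


conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE social_auth (id INTEGER PRIMARY KEY, user_id INTEGER, uid TEXT, extra_data TEXT);
    -- Hypothetical downstream copy that keeps the last state of deleted rows.
    CREATE TABLE deleted_copy (id INTEGER, uid TEXT, extra_data TEXT);
    CREATE TRIGGER keep_deleted BEFORE DELETE ON social_auth
    BEGIN
        INSERT INTO deleted_copy VALUES (OLD.id, OLD.uid, OLD.extra_data);
    END;
""")
conn.executemany(
    'INSERT INTO social_auth VALUES (?, ?, ?, ?)',
    [(1, 7, 'alice@example.com', '{"name": "Alice"}'), (2, 8, 'bob@example.com', '{}')],
)
conn.commit()
redact_and_delete_social_auth(conn, 7)
```

The trigger only ever sees the redacted row, while user 8's link is untouched.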

robrap · 2026-05-07T19:41:10Z

+                    # Redact and unlink LMS social auth accounts.
+                    social_auth_queryset = UserSocialAuth.objects.filter(user_id=user.id)
+                    social_auth_queryset.update(
+                        uid=Concat(
+                            Value(REDACTED_SOCIAL_AUTH_UID_PREFIX),
+                            Cast('id', output_field=CharField()),
+                            Value(REDACTED_SOCIAL_AUTH_UID_SUFFIX),
+                        ),
+                        extra_data={},
+                    )
+                    social_auth_queryset.delete()


See related comment about redact_and_delete_social_auth.

robrap · 2026-05-07T20:11:59Z

+    and clears extra_data.
+    Blocks deletion if redaction fails to prevent PII leaks to downstream systems.
+    """
+    if not instance or not instance.pk:


I want to greatly simplify this safety-net method to something like the following:

    redacted_uid = get_redacted_social_auth_uid(instance.pk)
    # Safety-net in case the record wasn't redacted before delete.
    if instance.extra_data or instance.uid != redacted_uid:
        logger.warning('Social auth link for ... was deleted without first being redacted.')
        redact_and_delete_social_auth(instance.user_id, skip_delete=True)

Reuses existing call (see other comments) and sends optional argument to skip the delete step.

The optional skip_delete is a little hacky, but the docstring for the method could note that it is only to be used with the delete signal, where delete was already called.

It is also a little hacky that the first link for the user will trigger redaction for all the remaining links (because it takes user_id and not pk), but it greatly simplifies the code to reuse it.

I'd also hope that we wouldn't need any additional exception handling.
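The suggested handler can be exercised with plain Python objects. In this sketch `ROWS`, the `SimpleNamespace` records, and the in-memory body of `redact_and_delete_social_auth` are hypothetical stand-ins for the ORM; only the control flow (check, warn, then redact all of the user's links with `skip_delete=True`) follows the suggestion above. Redacting the first link also redacts the user's second link, which is the trade-off mentioned in the review.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger('social_auth_handler_sketch')

REDACTED_SOCIAL_AUTH_UID_PREFIX = 'redacted-before-delete-'
REDACTED_SOCIAL_AUTH_UID_SUFFIX = '@safe.com'

# Hypothetical in-memory table: pk -> SimpleNamespace(pk, user_id, provider, uid, extra_data).
ROWS = {}


def get_redacted_social_auth_uid(pk):
    return f'{REDACTED_SOCIAL_AUTH_UID_PREFIX}{pk}{REDACTED_SOCIAL_AUTH_UID_SUFFIX}'


def redact_and_delete_social_auth(user_id, skip_delete=False):
    """
    Redact all of a user's social auth links, then delete them.

    skip_delete=True is only for the pre_delete handler, where the delete
    has already been requested by the caller.
    """
    user_rows = [row for row in ROWS.values() if row.user_id == user_id]
    for row in user_rows:
        row.uid = get_redacted_social_auth_uid(row.pk)
        row.extra_data = {}
    if not skip_delete:
        for row in user_rows:
            del ROWS[row.pk]


def redact_social_auth_before_delete(sender, instance, **kwargs):
    """Safety-net pre_delete handler (receiver signature as in Django)."""
    redacted_uid = get_redacted_social_auth_uid(instance.pk)
    # These fields may already have been redacted as part of a bulk retirement,
    # so skip the update if it is already done to reduce query count.
    if instance.extra_data or instance.uid != redacted_uid:
        logger.warning(
            'Social auth link for user_id=%s, provider=%s was deleted without '
            'first being redacted. Redacting in pre_delete.',
            instance.user_id, instance.provider,
        )
        redact_and_delete_social_auth(instance.user_id, skip_delete=True)


ROWS[1] = SimpleNamespace(pk=1, user_id=7, provider='google-oauth2',
                          uid='alice@example.com', extra_data={'name': 'Alice'})
ROWS[2] = SimpleNamespace(pk=2, user_id=7, provider='github',
                          uid='alice-github', extra_data={})

# Deleting link 1 through an unredacted path redacts both of user 7's links.
redact_social_auth_before_delete(sender=None, instance=ROWS[1])
```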

robrap · 2026-05-07T20:16:17Z

+# Prefix and suffix used to build a per-record redacted uid for UserSocialAuth.
+REDACTED_SOCIAL_AUTH_UID_PREFIX = 'redacted-before-delete-'
+REDACTED_SOCIAL_AUTH_UID_SUFFIX = '@safe.com'


It may make more sense for these to live in utils.py instead.

robrap · 2026-05-07T20:18:20Z

+
+def get_redacted_social_auth_uid(pk):
+    """
+    Return the redacted uid for a UserSocialAuth record. Single source of truth for this format.


The following is a possible update for this docstring.

Suggested change

Return the redacted uid for a UserSocialAuth record. Single source of truth for this format.

Return the redacted uid for a UserSocialAuth record.

This must match the format used in redact_and_delete_social_auth.
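Because the Python helper and the database expression must agree, a small test could compare them directly. This sketch assumes SQLite, with a hand-written `||` concatenation standing in for Django's Concat/Cast; the test class name is hypothetical.

```python
import sqlite3
import unittest

REDACTED_SOCIAL_AUTH_UID_PREFIX = 'redacted-before-delete-'
REDACTED_SOCIAL_AUTH_UID_SUFFIX = '@safe.com'


def get_redacted_social_auth_uid(pk):
    """
    Return the redacted uid for a UserSocialAuth record.

    This must match the format used in redact_and_delete_social_auth.
    """
    return f'{REDACTED_SOCIAL_AUTH_UID_PREFIX}{pk}{REDACTED_SOCIAL_AUTH_UID_SUFFIX}'


class RedactedUidFormatTest(unittest.TestCase):
    def test_sql_expression_matches_python_helper(self):
        conn = sqlite3.connect(':memory:')
        self.addCleanup(conn.close)
        conn.execute('CREATE TABLE social_auth (id INTEGER PRIMARY KEY, uid TEXT)')
        conn.executemany(
            'INSERT INTO social_auth (id, uid) VALUES (?, ?)',
            [(1, 'alice@example.com'), (42, 'bob@example.com')],
        )
        # The same concatenation the bulk path performs inside the database.
        conn.execute(
            'UPDATE social_auth SET uid = ? || CAST(id AS TEXT) || ?',
            (REDACTED_SOCIAL_AUTH_UID_PREFIX, REDACTED_SOCIAL_AUTH_UID_SUFFIX),
        )
        for pk, uid in conn.execute('SELECT id, uid FROM social_auth'):
            self.assertEqual(uid, get_redacted_social_auth_uid(pk))
```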

robrap · 2026-05-08T12:50:59Z

+    # Safety-net in case the record wasn't redacted before delete.
+    if instance.extra_data or instance.uid != redacted_uid:
+        logger.warning(
+            'Social auth link for user_id=%s, provider=%s was deleted without first being redacted.',


I guess we should make it less scary, since we are fixing the issue.

Suggested change

'Social auth link for user_id=%s, provider=%s was deleted without first being redacted.',

'Social auth link for user_id=%s, provider=%s was deleted without first being redacted. Redacting in pre_delete.',

robrap

I still need to look at tests, but some minor comments. Looking good so far.

robrap

Mostly test clean-up comments at this point.

robrap · 2026-05-08T18:53:25Z

+                'extra_data': dict(instance.extra_data) if instance.extra_data else {},
+            })
+
+        from django.db.models.signals import pre_delete


Is this needed in the method for a reason, rather than with the other imports at the top of the file?

robrap · 2026-05-08T19:13:07Z

+        captured_states = []
+
+        def capture_state_before_delete(sender, instance, **kwargs):  # pylint: disable=unused-argument
+            instance.refresh_from_db()
+            captured_states.append({
+                'id': instance.id,
+                'uid': instance.uid,
+                'extra_data': dict(instance.extra_data) if instance.extra_data else {},
+            })


I think using a `pre_delete` signal for testing a `pre_delete` signal makes this confusing. Is that what is being done here? How do you know in what order the `pre_delete` signals will get called? I'd rather it wasn't confusing in this way, and that you used some other mechanism to test, like checking that there is an appropriate UPDATE query before the DELETE query, as we did in the earlier PR. You can retain the "not exists" assertion at the end.

Also, if this were needed, you've got a lot of code redundancy. You could use setUpClass or setUp and tearDownClass or tearDown, or helper functions, to keep things DRY (Don't Repeat Yourself).
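One way to assert that an UPDATE precedes the DELETE, without a second pre_delete receiver, is to record the SQL statements issued. In a Django test this would typically be done with `django.test.utils.CaptureQueriesContext`; the sketch below uses sqlite3's `Connection.set_trace_callback` instead, with a hypothetical `redact_and_delete` function as the code under test, and keeps shared fixture setup in setUp/tearDown.

```python
import sqlite3
import unittest


def redact_and_delete(conn, user_id):
    """Hypothetical code under test: redact the user's rows, then delete them."""
    conn.execute(
        "UPDATE social_auth SET uid = 'redacted-before-delete-' || CAST(id AS TEXT) "
        "|| '@safe.com', extra_data = '{}' WHERE user_id = ?",
        (user_id,),
    )
    conn.execute('DELETE FROM social_auth WHERE user_id = ?', (user_id,))


class RedactBeforeDeleteQueryTest(unittest.TestCase):
    def setUp(self):
        # Shared fixture, so individual tests don't repeat table setup (DRY).
        self.conn = sqlite3.connect(':memory:')
        self.conn.execute(
            'CREATE TABLE social_auth '
            '(id INTEGER PRIMARY KEY, user_id INTEGER, uid TEXT, extra_data TEXT)'
        )
        self.conn.execute(
            'INSERT INTO social_auth VALUES (?, ?, ?, ?)',
            (1, 7, 'alice@example.com', '{"name": "Alice"}'),
        )
        # Record every SQL statement issued from here on.
        self.statements = []
        self.conn.set_trace_callback(self.statements.append)

    def tearDown(self):
        self.conn.set_trace_callback(None)
        self.conn.close()

    def test_update_is_issued_before_delete(self):
        redact_and_delete(self.conn, 7)
        verbs = [sql.split()[0].upper() for sql in self.statements]
        self.assertLess(verbs.index('UPDATE'), verbs.index('DELETE'))

    def test_rows_no_longer_exist(self):
        redact_and_delete(self.conn, 7)
        count = self.conn.execute(
            'SELECT COUNT(*) FROM social_auth WHERE user_id = 7'
        ).fetchone()[0]
        self.assertEqual(count, 0)
```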

robrap · 2026-05-08T19:14:47Z

+    Safety-net signal handler that redacts PII on any UserSocialAuth before deletion.
+
+    Records deleted via ``redact_and_delete_social_auth`` will already be redacted;
+    this handler is a fallback for any other deletion path.


Suggested change

this handler is a fallback for any other deletion path.

this handler is a fallback for any missed deletion path.

robrap · 2026-05-08T19:17:16Z

+    Redaction happens before deletion so that any observers see only sanitised data.
+    Downstream copies of data may use soft-deletes, and redacting before deleting
+    ensures PII for retired users (or future retirements) is not retained.
+    The uid format matches ``get_redacted_social_auth_uid()``.


Moving this below...

Suggested change

The uid format matches ``get_redacted_social_auth_uid()``.

robrap · 2026-05-08T19:18:40Z

+    """
+    social_auth_queryset = UserSocialAuth.objects.filter(user_id=user_id)
+    social_auth_queryset.update(
+        uid=Concat(


Moved (and edited) comment:

Suggested change

uid=Concat(

# Important: this redacted uid must match the format used by ``get_redacted_social_auth_uid()``.

uid=Concat(

robrap · 2026-05-08T19:19:28Z

+    """
+    Redact PII from all UserSocialAuth records for the given user, then delete them.
+
+    Redaction happens before deletion so that any observers see only sanitised data.


Consider dropping this comment. The comment about soft-deletes is probably enough.

Suggested change

Redaction happens before deletion so that any observers see only sanitised data.

robrap · 2026-05-08T19:23:38Z

+
+
+@skip_unless_lms
+class RedactUserSocialAuthPIITest(TestCase):


The signal tests belong in a test_signals.py file with an appropriate class name. Some reasonable signal tests:

Does the signal warn and redact if not already redacted?

Does the signal skip warning (and redaction) if already redacted?

Optional: Using mock, confirm redact_and_delete_social_auth is called with skip_delete=True.

For utils tests of direct calls to redact_and_delete_social_auth, you can cover any items you didn't cover in signals (like maybe test_delete_redacts_multiple_sso_providers), and this shouldn't require signal setup and teardown.

Note: You have much of what you need, so hopefully this is minor refactoring and clean-up.

robrap · 2026-05-08T20:15:16Z

+
+    captured_states = []
+
+    def capture_state_before_delete(sender, instance, **kwargs):  # pylint: disable=unused-argument


You may want to use the same UPDATE/DELETE query assertion you set up for the other test. See other comment for details.

You'll also want to ensure that the real receiver you set up is not interfering with this test. For example, if you deleted the redaction from retire_user.py, would this test still pass because the signal is taking care of the redaction for you? One way to fix this would be to disconnect that signal in setUpClass (with an appropriate comment) and to reconnect it in tearDownClass. An alternative is to mock logging and ensure that there is no warning logged by the signal (about redacting). You can test that these assertions work by temporarily removing the redaction you are testing.

ktyagiapphelix2u added 4 commits April 23, 2026 09:42

fix: Redact SSO PII before deletion

9a178e3

fix: Redact SSO PII before deletion

8d57698

fix: Redact SSO PII before deletion

2688ac8

fix: Redact SSO PII before deletion

ff4b57e

ktyagiapphelix2u marked this pull request as ready for review April 23, 2026 11:29

ktyagiapphelix2u requested a review from a team as a code owner April 23, 2026 11:29

vgulati-apphelix reviewed Apr 27, 2026

Comment thread openedx/core/djangoapps/user_api/accounts/tests/test_utils.py Outdated

vgulati-apphelix reviewed Apr 27, 2026

Comment thread openedx/core/djangoapps/user_api/accounts/tests/test_utils.py Outdated

Akanshu-2u reviewed Apr 27, 2026

Comment thread openedx/core/djangoapps/user_api/accounts/utils.py Outdated

Akanshu-2u reviewed Apr 27, 2026

Comment thread openedx/core/djangoapps/user_api/accounts/signals.py Outdated

Akanshu-2u reviewed Apr 27, 2026

Comment thread openedx/core/djangoapps/user_api/management/tests/test_retire_user.py

ktyagiapphelix2u added 2 commits April 28, 2026 05:45

fix: Redact SSO PII before deletion

417aa3d

fix: Redact SSO PII before deletion

542b5be

Akanshu-2u mentioned this pull request Apr 28, 2026

fix: retirement PII leaks by redacting pending secondary email/name data #38427

Open

Akanshu-2u approved these changes Apr 28, 2026

robrap reviewed Apr 29, 2026

Comment thread openedx/core/djangoapps/user_api/management/commands/retire_user.py Outdated

ktyagiapphelix2u added 2 commits May 4, 2026 06:14

fix: Redact SSO PII before deletion

1b46be6

fix: Redact SSO PII before deletion

74d655b

robrap reviewed May 5, 2026

Comment thread openedx/core/djangoapps/user_api/accounts/signals.py Outdated

ktyagiapphelix2u added 2 commits May 6, 2026 06:18

fix: Redact SSO PII before deletion

08b491f

fix: Redact SSO PII before deletion

bbb5643

robrap reviewed May 6, 2026

ktyagiapphelix2u added 3 commits May 7, 2026 05:45

fix: Redact SSO PII before deletion

07b82ff

fix: Redact SSO PII before deletion

15bcdc0

fix: Redact SSO PII before deletion

2a9fba8

robrap reviewed May 7, 2026

fix: Redact SSO PII before deletion

dd7ac9c

robrap reviewed May 7, 2026

ktyagiapphelix2u added 2 commits May 8, 2026 06:23

fix: Redact SSO PII before deletion

5ca020f

fix: Redact SSO PII before deletion

cdb49a2

robrap reviewed May 8, 2026

fix: Redact SSO PII before deletion

bd3c108

robrap reviewed May 8, 2026

Comment thread openedx/core/djangoapps/user_api/accounts/utils.py Outdated

fix: Redact SSO PII before deletion

7528c08

robrap reviewed May 8, 2026

ktyagiapphelix2u force-pushed the ktyagi/SSOPII branch 3 times, most recently from 3f3977a to 667de73, May 11, 2026 07:46

ktyagiapphelix2u closed this May 11, 2026

ktyagiapphelix2u reopened this May 11, 2026

ktyagiapphelix2u force-pushed the ktyagi/SSOPII branch 3 times, most recently from 7fc7ec0 to ebb2f96, May 11, 2026 10:06

fix: Redact SSO PII before deletion

2af3cb4

ktyagiapphelix2u force-pushed the ktyagi/SSOPII branch from 2fa49b0 to 2af3cb4, May 11, 2026 11:24

fix: Redact SSO PII before deletion

9a8ba84

ktyagiapphelix2u force-pushed the ktyagi/SSOPII branch from 373d581 to 9a8ba84, May 11, 2026 11:32

Merge branch 'master' into ktyagi/SSOPII

a75fb7f

4 participants
