Skip to content

Invisible unread comments issue#4467

Open
hlbmtc wants to merge 10 commits intomainfrom
fix/unread-comments
Open

Invisible unread comments issue#4467
hlbmtc wants to merge 10 commits intomainfrom
fix/unread-comments

Conversation

@hlbmtc
Copy link
Contributor

@hlbmtc hlbmtc commented Mar 10, 2026

  • Prioritize rendering root comments that have unread replies, so unread comments are not buried deep in the pagination

part of #4464

Summary by CodeRabbit

  • New Features
    • Feed now uses your last-viewed timestamp to surface and prioritize threads with new comments since you last looked.
  • Bug Fixes
    • Posts are marked as read once comment loading finishes (replacing a fixed-delay approach) for more accurate read status.
  • Tests
    • Added tests to ensure unread threads are prioritized correctly across paginated comment views.
  • Chores
    • Consolidated user spam, soft-delete, and data-removal workflows for more consistent account handling.

@hlbmtc hlbmtc marked this pull request as draft March 10, 2026 22:09
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds unread-thread prioritization to comment feeds (backend + serializer + tests), replaces timing-based mark-as-read in comment feed frontend with a load-completion-triggered mark, and moves user lifecycle methods from models into new service functions (soft-delete, spam, and data-cleanup), with corresponding admin/view updates.

Changes

Cohort / File(s) Summary
Comment Feed (frontend)
front_end/src/components/comment_feed/index.tsx
Removed the inline 200ms-delayed markPostAsRead(postId) in the comments fetching effect; added a separate effect that marks the post read once initial comments finish loading, guarded by hasMarkedAsRead.
Comment filter & feed (backend)
comments/serializers/common.py, comments/services/feed.py
Added optional DateTimeField last_viewed_at to the comment filter serializer; get_comments_feed gains last_viewed_at arg, annotates threads with has_unread_thread for comments created after that timestamp, orders to prioritize unread threads, and adjusts insertion point for -is_focused_comment.
Backend tests
tests/unit/test_comments/test_views.py
Added TestUnreadThreadPrioritization with setup and test to assert unread threads prioritized across paginated comment views and behavior after mark-as-read.
Posts model defaults
posts/models.py
Consolidated timezone.now() usage into a single now variable in update_last_forecast_date; create_defaults now initializes last_forecast_date, comments_count, and viewed_at with that now.
User lifecycle: models → services
users/models.py, users/services/common.py, users/admin.py, users/views.py
Removed heavy lifecycle methods (mark_as_spam, soft_delete, clean_user_data_delete) from User model; introduced soft_delete_user, mark_user_as_spam, and clean_user_data_delete in users/services/common.py (transactional, handles soft/hard deletes and credential cleanup); updated admin and view modules to call the service functions and added necessary imports.
Manifest
package.json
Minor manifest metadata/dependency edits (lines changed small).

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend as CommentFeedComponent
    participant Backend as CommentsService
    participant DB as Database

    User->>Frontend: Open post / comments
    Frontend->>Backend: GET /comments?post=ID (& optional last_viewed_at)
    Backend->>DB: Query threads, filter, compute unread (created > last_viewed_at), annotate has_unread_thread
    DB-->>Backend: Annotated threads
    Backend-->>Frontend: Respond with comment threads (includes has_unread_thread)
    Frontend->>Frontend: set isLoading = false
    Frontend->>Backend: POST /posts/:id/mark_read (triggered once when load completes)
    Backend->>DB: Persist user's last_viewed_at for post
    DB-->>Backend: Ack
    Backend-->>Frontend: Ack mark-read
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through code both spry and bright,

Moved the mark to when loads light,
Unread threads now peek and prance,
Users cleaned with a careful dance,
A twitch, a hop — the feed's just right.

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The PR title 'Invisible unread comments issue' is vague and does not clearly convey the main change, which is to prioritize rendering root comments with unread replies to prevent them from being buried in pagination. Revise title to be more specific about the primary change, such as 'Prioritize unread comment threads in feed pagination' or 'Surface root comments with unread replies'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/unread-comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
front_end/src/components/comment_feed/index.tsx (1)

247-254: Consider tracking the marked postId instead of a boolean flag for robustness.

The current implementation uses a boolean hasMarkedAsRead ref that never resets. If postId ever changes without the component remounting (e.g., shallow routing or future refactoring), the new post won't be marked as read.

A more robust approach would be to track which postId was marked:

Suggested improvement
   // Mark post as read after initial comments load completes
-  const hasMarkedAsRead = useRef(false);
+  const markedPostId = useRef<number | null>(null);
   useEffect(() => {
-    if (!isLoading && postId && user?.id && !hasMarkedAsRead.current) {
-      hasMarkedAsRead.current = true;
+    if (!isLoading && postId && user?.id && markedPostId.current !== postId) {
+      markedPostId.current = postId;
       markPostAsRead(postId);
     }
   }, [isLoading, postId, user?.id]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@front_end/src/components/comment_feed/index.tsx` around lines 247 - 254, The
boolean ref hasMarkedAsRead should be replaced with a ref that stores the
lastMarkedPostId so the effect marks the current post even if the component
doesn't remount; update the useEffect (which currently depends on isLoading,
postId, user?.id) to check if !isLoading && postId && user?.id &&
lastMarkedPostIdRef.current !== postId then call markPostAsRead(postId) and set
lastMarkedPostIdRef.current = postId; keep the same dependency array and
preserve existing early-return conditions (useEffect, postId, user?.id,
markPostAsRead).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@front_end/src/components/comment_feed/index.tsx`:
- Around line 247-254: The boolean ref hasMarkedAsRead should be replaced with a
ref that stores the lastMarkedPostId so the effect marks the current post even
if the component doesn't remount; update the useEffect (which currently depends
on isLoading, postId, user?.id) to check if !isLoading && postId && user?.id &&
lastMarkedPostIdRef.current !== postId then call markPostAsRead(postId) and set
lastMarkedPostIdRef.current = postId; keep the same dependency array and
preserve existing early-return conditions (useEffect, postId, user?.id,
markPostAsRead).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fbd1caee-b825-49e9-948f-1f7236b202df

📥 Commits

Reviewing files that changed from the base of the PR and between 97144b1 and f2ffb6e.

📒 Files selected for processing (1)
  • front_end/src/components/comment_feed/index.tsx

@hlbmtc
Copy link
Contributor Author

hlbmtc commented Mar 10, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
comments/services/feed.py (1)

18-18: Fix implicit Optional type hint.

Per PEP 484, use explicit union type instead of relying on the default value to imply optionality.

Proposed fix
-    last_viewed_at: datetime = None,
+    last_viewed_at: datetime | None = None,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comments/services/feed.py` at line 18, The parameter annotation
last_viewed_at currently relies on a default None to imply optionality; change
its type hint to an explicit Optional (e.g., Optional[datetime] or
Union[datetime, None]) and add the corresponding import from typing (Optional or
Union) at the top of comments/services/feed.py so the function signature (the
parameter named last_viewed_at) uses the explicit union type rather than relying
on the default value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@comments/services/feed.py`:
- Line 18: The parameter annotation last_viewed_at currently relies on a default
None to imply optionality; change its type hint to an explicit Optional (e.g.,
Optional[datetime] or Union[datetime, None]) and add the corresponding import
from typing (Optional or Union) at the top of comments/services/feed.py so the
function signature (the parameter named last_viewed_at) uses the explicit union
type rather than relying on the default value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d4790b21-c2a3-4c44-b8bf-76501a00f5ac

📥 Commits

Reviewing files that changed from the base of the PR and between f2ffb6e and fb8e2a5.

📒 Files selected for processing (2)
  • comments/serializers/common.py
  • comments/services/feed.py

@github-actions
Copy link
Contributor

github-actions bot commented Mar 10, 2026

🚀 Preview Environment

Your preview environment is ready!

Resource Details
🌐 Preview URL https://metaculus-pr-4467-fix-unread-comments-preview.mtcl.cc
📦 Docker Image ghcr.io/metaculus/metaculus:fix-unread-comments-df7d7f8
🗄️ PostgreSQL NeonDB branch preview/pr-4467-fix-unread-comments
Redis Fly Redis mtc-redis-pr-4467-fix-unread-comments

Details

  • Commit: d276a6cedfc963f1320d9e8f3f1fdf36de3e4fec
  • Branch: fix/unread-comments
  • Fly App: metaculus-pr-4467-fix-unread-comments

ℹ️ Preview Environment Info

Isolation:

  • PostgreSQL and Redis are fully isolated from production
  • Each PR gets its own database branch and Redis instance
  • Changes pushed to this PR will trigger a new deployment

Limitations:

  • Background workers and cron jobs are not deployed in preview environments
  • If you need to test background jobs, use Heroku staging environments

Cleanup:

  • This preview will be automatically destroyed when the PR is closed

@hlbmtc
Copy link
Contributor Author

hlbmtc commented Mar 11, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 11, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@hlbmtc hlbmtc requested review from lsabor and ncarazon March 11, 2026 15:33
@hlbmtc hlbmtc marked this pull request as ready for review March 11, 2026 15:36
Copy link
Contributor

@elisescu elisescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
comments/services/feed.py (1)

46-70: Consider performance for posts with many unread comments.

The current implementation loads all unread comment IDs into memory and uses them in an IN clause. For posts with a very large number of unread comments, this could become memory-intensive and generate a lengthy SQL query.

For typical usage, this is likely fine, but if posts can accumulate thousands of unread comments, consider using a subquery annotation instead:

unread_subquery = Comment.objects.filter(
    on_post=post,
    created_at__gt=last_viewed_at,
    is_soft_deleted=False,
    is_private=False,
).filter(
    Q(root_id=OuterRef("pk")) | Q(pk=OuterRef("pk"))
)
qs = qs.annotate(
    has_unread_thread=Case(
        When(Exists(unread_subquery), then=Value(1)),
        default=Value(0),
        output_field=IntegerField(),
    ),
)

This would push the work to the database and avoid the intermediate Python set.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comments/services/feed.py` around lines 46 - 70, The current code builds a
Python set unread_root_ids from Comment for use in an IN query which can be
memory- and SQL-size intensive for posts with many unread comments; replace this
client-side ID collection with a DB-side subquery/exists annotation: construct a
Comment queryset filtered by on_post=post, created_at__gt=last_viewed_at,
is_soft_deleted=False, is_private=False and then annotate qs (the post queryset)
with has_unread_thread using Exists (or an OuterRef filter testing root_id or pk
against the outer post pk) inside the Case/When so the database evaluates thread
membership instead of materializing unread_root_ids in Python.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@comments/services/feed.py`:
- Around line 46-70: The current code builds a Python set unread_root_ids from
Comment for use in an IN query which can be memory- and SQL-size intensive for
posts with many unread comments; replace this client-side ID collection with a
DB-side subquery/exists annotation: construct a Comment queryset filtered by
on_post=post, created_at__gt=last_viewed_at, is_soft_deleted=False,
is_private=False and then annotate qs (the post queryset) with has_unread_thread
using Exists (or an OuterRef filter testing root_id or pk against the outer post
pk) inside the Case/When so the database evaluates thread membership instead of
materializing unread_root_ids in Python.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2ee0598a-2102-4900-be87-03410a3959dd

📥 Commits

Reviewing files that changed from the base of the PR and between 4079b85 and c9c8610.

📒 Files selected for processing (1)
  • comments/services/feed.py

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@posts/models.py`:
- Around line 1051-1055: The create_defaults in update_last_forecast_date() is
seeding comments_count from post.get_comment_count(), which causes
annotate_unread_comment_count() to compute 0 unread for new snapshots; change
the create_defaults logic in update_last_forecast_date() so it does NOT copy the
post's comment count for users without a snapshot—either omit comments_count or
set it to 0 (and keep viewed_at unset) so annotate_unread_comment_count() treats
existing comments as unread until the user actually goes through
update_viewed_at()/the real view path; update the create_defaults block (the
dict with "last_forecast_date"/"comments_count"/"viewed_at") accordingly.

In `@users/services/common.py`:
- Around line 392-400: The deletion loop currently only checks post.comments and
post.forecasts before calling hard_delete_post(post); update the guard to also
treat coauthorship as ownership by checking post.coauthors for any coauthor
other than the current user (consistent with get_users() logic) and skip
deletion if such coauthors exist; locate the block using the variables/posts and
the hard_delete_post function and add a condition like "if
post.coauthors.exclude(author=user).exists(): continue" (or equivalent) so posts
with other coauthors are preserved.
- Around line 343-379: The GDPR delete path in clean_user_data_delete leaves PII
behind because user.metadata is not cleared and user.profile_picture only drops
the DB reference without removing the file from storage; update
clean_user_data_delete to (1) clear the metadata field (e.g. set user.metadata
to an empty dict/None consistent with the model) and (2) remove the stored
profile image file from storage before nullling the DB field (use the file
field's delete method or the storage backend via
user.profile_picture.delete(save=False)), then save the user; keep these changes
inside the existing `@transaction.atomic` block so the DB and storage removal are
coordinated.
- Around line 323-340: soft_delete_user() and mark_user_as_spam() deactivate the
user but never revoke API keys/tokens; update soft_delete_user(user: User) to
call revoke_all_user_tokens(user) after setting user.is_active = False (so
mark_user_as_spam inherits the behavior), ensuring all tokens and API keys are
immediately invalidated; alternatively, if you prefer auth-side protection, add
an is_active check in FallbackTokenAuthentication.authenticate_credentials() to
reject inactive users, but primary fix is to invoke revoke_all_user_tokens from
soft_delete_user.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f8127746-8577-4724-98c9-c54224958669

📥 Commits

Reviewing files that changed from the base of the PR and between c9c8610 and 00fa864.

📒 Files selected for processing (5)
  • posts/models.py
  • users/admin.py
  • users/models.py
  • users/services/common.py
  • users/views.py

Comment on lines +1051 to 1055
create_defaults={
"last_forecast_date": now,
"comments_count": post.get_comment_count(),
"viewed_at": now,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't mark existing comments as read when creating a forecast snapshot.

Line 1053 seeds comments_count from the current post state inside update_last_forecast_date(). For users who don't have a snapshot yet, annotate_unread_comment_count() on Lines 245-247 will then compute 0 unread comments immediately, even though they never went through the real view path in posts/views.py:420-440 / update_viewed_at(). That defeats the unread-thread prioritization this PR is trying to improve for first-time forecasters on posts that already have comments.

Suggested fix
         cls.objects.update_or_create(
             user=user,
             post=post,
             defaults={
                 "last_forecast_date": now,
             },
             create_defaults={
                 "last_forecast_date": now,
-                "comments_count": post.get_comment_count(),
-                "viewed_at": now,
             },
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@posts/models.py` around lines 1051 - 1055, The create_defaults in
update_last_forecast_date() is seeding comments_count from
post.get_comment_count(), which causes annotate_unread_comment_count() to
compute 0 unread for new snapshots; change the create_defaults logic in
update_last_forecast_date() so it does NOT copy the post's comment count for
users without a snapshot—either omit comments_count or set it to 0 (and keep
viewed_at unset) so annotate_unread_comment_count() treats existing comments as
unread until the user actually goes through update_viewed_at()/the real view
path; update the create_defaults block (the dict with
"last_forecast_date"/"comments_count"/"viewed_at") accordingly.

Comment on lines +323 to +340
@transaction.atomic
def soft_delete_user(user: User) -> None:
user.is_active = False

comments = user.comment_set.filter(is_soft_deleted=False).select_related("on_post")
for comment in comments:
# Soft-delete comments and update counters
soft_delete_comment(comment)

user.posts.update(curation_status=Post.CurationStatus.DELETED)

user.save()


@transaction.atomic
def mark_user_as_spam(user: User) -> None:
user.is_spam = True
soft_delete_user(user)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== API key authentication ==\n'
rg -n -C3 'ApiKey|authenticate_credentials|Authentication' authentication

printf '\n== JWT/session revocation checks ==\n'
rg -n -C3 'auth_revoked_at|revoke_all_user_tokens|is_active|JWT|SimpleJWT|authenticate' authentication metaculus_web users

Repository: Metaculus/metaculus

Length of output: 38087


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check for USER_AUTHENTICATION_RULE configuration
printf '\n== USER_AUTHENTICATION_RULE configuration ==\n'
rg -n 'USER_AUTHENTICATION_RULE' -t py

# Check what DRF TokenAuthentication does - look at how ApiKey model is used
printf '\n== ApiKey model and any related auth methods ==\n'
rg -n -A 10 'class ApiKey' authentication/models.py

# Check FallbackTokenAuthentication more carefully
printf '\n== FallbackTokenAuthentication full implementation ==\n'
sed -n '37,70p' authentication/auth.py

Repository: Metaculus/metaculus

Length of output: 1609


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if DRF TokenAuthentication is customized or if there's an is_active check in ApiKey lookup
printf '\n== Check for any is_active validation in token lookup ==\n'
rg -n 'get_user|is_active' authentication/auth.py

# Check what USER_AUTHENTICATION_RULE defaults to (this is SimpleJWT config)
printf '\n== SimpleJWT settings and any custom USER_AUTHENTICATION_RULE ==\n'
rg -n -A 30 'SIMPLE_JWT' metaculus_web/settings.py | head -50

# Check if the ApiKey model has any custom authentication logic
printf '\n== Full ApiKey model definition ==\n'
sed -n '8,50p' authentication/models.py

Repository: Metaculus/metaculus

Length of output: 2169


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Let's look at the complete SessionJWTAuthentication.get_user implementation
printf '\n== SessionJWTAuthentication.get_user full implementation ==\n'
sed -n '19,35p' authentication/auth.py

# Check if there's any is_active validation in the auth.py file
printf '\n== Full auth.py file to see all authentication logic ==\n'
cat -n authentication/auth.py

# Check if DRF TokenAuthentication does is_active check - let's look for where user is retrieved
printf '\n== Search for user model retrieval in ApiKey lookup ==\n'
rg -n 'def authenticate_credentials|user_model|get_user' authentication/

Repository: Metaculus/metaculus

Length of output: 3584


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check what happens in the jwt_session.py at line 176 where USER_AUTHENTICATION_RULE is checked
printf '\n== jwt_session.py around USER_AUTHENTICATION_RULE ==\n'
sed -n '170,185p' authentication/jwt_session.py

# Check if there's any custom implementation of USER_AUTHENTICATION_RULE
printf '\n== Search for USER_AUTHENTICATION_RULE in full settings ==\n'
rg -n 'USER_AUTHENTICATION_RULE' metaculus_web/settings.py

# Let's check the SimpleJWT default behavior by looking at how refresh works
printf '\n== Full refresh_refresh_token_view and related functions ==\n'
sed -n '145,185p' authentication/jwt_session.py

# Most importantly: Does DRF TokenAuthentication check is_active by default?
# Let's search for any override of user_can_authenticate
printf '\n== Search for user_can_authenticate ==\n'
rg -n 'user_can_authenticate' authentication/

Repository: Metaculus/metaculus

Length of output: 719


🏁 Script executed:

#!/bin/bash
set -euo pipefde

# SimpleJWT's USER_AUTHENTICATION_RULE defaults to checking is_active
# But let's verify it's the default being used (no custom override)
printf '\n== Check if USER_AUTHENTICATION_RULE is customized ==\n'
grep -r 'USER_AUTHENTICATION_RULE' . --include='*.py' | grep -v '.pyc'

# Now the critical question: does SessionJWTAuthentication call USER_AUTHENTICATION_RULE on every request?
# Let's trace through: super().get_user() -> SimpleJWT's JWTAuthentication.get_user()
# SimpleJWT checks USER_AUTHENTICATION_RULE, which defaults to checking is_active

# For FallbackTokenAuthentication (API key), the parent is DRF's TokenAuthentication
# Let's verify if there's a custom authenticate_credentials that checks is_active
printf '\n== Check if TokenAuthentication is customized in rest_framework ==\n'
python3 << 'EOF'
import inspect
from rest_framework.authentication import TokenAuthentication

# Try to check what authenticate_credentials does
source = inspect.getsource(TokenAuthentication.authenticate_credentials)
print("TokenAuthentication.authenticate_credentials implementation:")
print(source)
EOF

Repository: Metaculus/metaculus

Length of output: 115


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# SimpleJWT's USER_AUTHENTICATION_RULE defaults to checking is_active
# Let's verify it's the default being used (no custom override)
printf '\n== Check if USER_AUTHENTICATION_RULE is customized ==\n'
grep -r 'USER_AUTHENTICATION_RULE' . --include='*.py' | grep -v '.pyc' | head -20

# Now let's check what the SimpleJWT default is for USER_AUTHENTICATION_RULE
# by looking at its source or finding documentation
python3 << 'EOF'
import inspect
from rest_framework_simplejwt.settings import api_settings

# Get the default USER_AUTHENTICATION_RULE
print("USER_AUTHENTICATION_RULE from SimpleJWT:")
print(api_settings.USER_AUTHENTICATION_RULE)
print("\nSource of the function:")
try:
    source = inspect.getsource(api_settings.USER_AUTHENTICATION_RULE)
    print(source)
except:
    print("Could not get source - likely built-in default")

# Also check what TokenAuthentication does
from rest_framework.authentication import TokenAuthentication
print("\n\nTokenAuthentication.authenticate_credentials:")
try:
    source = inspect.getsource(TokenAuthentication.authenticate_credentials)
    print(source)
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: Metaculus/metaculus

Length of output: 359


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Let's check the SimpleJWT documentation or look at what USER_AUTHENTICATION_RULE should be
# SimpleJWT's default USER_AUTHENTICATION_RULE checks is_active
# But let's verify by checking if soft_delete_user calls revoke_all_user_tokens
printf '\n== soft_delete_user and clean_user_data_delete implementations ==\n'
sed -n '323,365p' users/services/common.py

# Check if mark_user_as_spam calls soft_delete_user
printf '\n== mark_user_as_spam implementation ==\n'
rg -n -A 5 'def mark_user_as_spam' users/services/common.py

# Check if soft_delete_user or mark_user_as_spam calls revoke_all_user_tokens
printf '\n== Check for revoke_all_user_tokens calls ==\n'
rg -n 'revoke_all_user_tokens' users/services/common.py

Repository: Metaculus/metaculus

Length of output: 1636


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Let's check what happens at those line numbers where revoke_all_user_tokens is called
printf '\n== Lines 180-195 in users/services/common.py ==\n'
sed -n '180,195p' users/services/common.py

printf '\n== Lines 260-275 in users/services/common.py ==\n'
sed -n '260,275p' users/services/common.py

# Now check if there's any documentation or checking in TokenAuthentication
# Let's look for where is_active is checked in DRF
printf '\n== Search for is_active checks in all auth-related files ==\n'
rg -n 'is_active.*False|is_active.*True' authentication/ users/services/

# Check if DRF TokenAuthentication validates is_active - look for user_can_authenticate
printf '\n== Check AuthLoginBackend for user_can_authenticate ==\n'
sed -n '8,30p' authentication/backends.py

Repository: Metaculus/metaculus

Length of output: 2605


API keys for soft-deleted accounts will remain usable after deactivation.

soft_delete_user() and mark_user_as_spam() set user.is_active = False but do not call revoke_all_user_tokens(). This is inconsistent with change_user_password() and email changes, which do revoke tokens.

JWT authentication is protected: SimpleJWT's default USER_AUTHENTICATION_RULE checks is_active and rejects inactive users on every request. However, API key authentication via FallbackTokenAuthentication does not validate is_active—it only authenticates the user associated with the ApiKey, allowing inactive users with existing API keys to continue making authenticated requests indefinitely.

Either add an is_active check to FallbackTokenAuthentication.authenticate_credentials(), or call revoke_all_user_tokens(user) in soft_delete_user() to immediately invalidate all existing tokens and keys.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@users/services/common.py` around lines 323 - 340, soft_delete_user() and
mark_user_as_spam() deactivate the user but never revoke API keys/tokens; update
soft_delete_user(user: User) to call revoke_all_user_tokens(user) after setting
user.is_active = False (so mark_user_as_spam inherits the behavior), ensuring
all tokens and API keys are immediately invalidated; alternatively, if you
prefer auth-side protection, add an is_active check in
FallbackTokenAuthentication.authenticate_credentials() to reject inactive users,
but primary fix is to invoke revoke_all_user_tokens from soft_delete_user.

Comment on lines +343 to +379
@transaction.atomic
def clean_user_data_delete(user: User) -> None:
# Update User object
user.is_active = False
user.bio = ""
user.old_usernames = []
user.website = None
user.twitter = None
user.linkedin = None
user.facebook = None
user.github = None
user.good_judgement_open = None
user.kalshi = None
user.manifold = None
user.infer = None
user.hypermind = None
user.occupation = None
user.location = None
user.profile_picture = None
user.unsubscribed_mailing_tags = []
user.language = None
user.username = "deleted_user-" + str(user.id)
user.first_name = ""
user.last_name = ""
user.email = ""
user.set_password(None)
user.save()

# Comments
user.comment_set.filter(is_private=True).delete()
# don't touch public comments

# Token
ApiKey.objects.filter(user=user).delete()

# Social Auth login credentials
UserSocialAuth.objects.filter(user=user).delete()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Finish scrubbing stored PII in the GDPR delete path.

This still leaves user data behind: metadata is never cleared, and user.profile_picture = None only removes the database reference, not the file in storage. For an action presented as GDPR-compliant, both need to be deleted here.

🧹 Suggested fix
 `@transaction.atomic`
 def clean_user_data_delete(user: User) -> None:
+    if user.profile_picture:
+        user.profile_picture.delete(save=False)
+
     # Update User object
     user.is_active = False
     user.bio = ""
     user.old_usernames = []
@@
     user.location = None
     user.profile_picture = None
     user.unsubscribed_mailing_tags = []
     user.language = None
+    user.metadata = None
     user.username = "deleted_user-" + str(user.id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@users/services/common.py` around lines 343 - 379, The GDPR delete path in
clean_user_data_delete leaves PII behind because user.metadata is not cleared
and user.profile_picture only drops the DB reference without removing the file
from storage; update clean_user_data_delete to (1) clear the metadata field
(e.g. set user.metadata to an empty dict/None consistent with the model) and (2)
remove the stored profile image file from storage before nullling the DB field
(use the file field's delete method or the storage backend via
user.profile_picture.delete(save=False)), then save the user; keep these changes
inside the existing `@transaction.atomic` block so the DB and storage removal are
coordinated.

Comment on lines +392 to +400
posts = user.posts.all()
for post in posts:
# keep if there is at least one non-author comment
if post.comments.exclude(author=user).exists():
continue
# keep if there is at least one non-author forecast
if post.forecasts.exclude(author=user).exists():
continue
hard_delete_post(post)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't hard-delete posts that still have coauthors.

The retention check only looks at comments and forecasts. If one of these posts is coauthored, this will permanently delete content that another user still owns. get_users() in this same module already treats post.coauthors as authorship, so they need to be part of the keep/delete decision too.

👥 Suggested guard
     posts = user.posts.all()
     for post in posts:
+        if post.coauthors.exclude(pk=user.pk).exists():
+            continue
         # keep if there is at least one non-author comment
         if post.comments.exclude(author=user).exists():
             continue
         # keep if there is at least one non-author forecast
         if post.forecasts.exclude(author=user).exists():
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
posts = user.posts.all()
for post in posts:
# keep if there is at least one non-author comment
if post.comments.exclude(author=user).exists():
continue
# keep if there is at least one non-author forecast
if post.forecasts.exclude(author=user).exists():
continue
hard_delete_post(post)
posts = user.posts.all()
for post in posts:
if post.coauthors.exclude(pk=user.pk).exists():
continue
# keep if there is at least one non-author comment
if post.comments.exclude(author=user).exists():
continue
# keep if there is at least one non-author forecast
if post.forecasts.exclude(author=user).exists():
continue
hard_delete_post(post)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@users/services/common.py` around lines 392 - 400, The deletion loop currently
only checks post.comments and post.forecasts before calling
hard_delete_post(post); update the guard to also treat coauthorship as ownership
by checking post.coauthors for any coauthor other than the current user
(consistent with get_users() logic) and skip deletion if such coauthors exist;
locate the block using the variables/posts and the hard_delete_post function and
add a condition like "if post.coauthors.exclude(author=user).exists(): continue"
(or equivalent) so posts with other coauthors are preserved.

Copy link
Contributor

@ncarazon ncarazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants