Skip to content

feat: make whitelist configgurable#46

Merged
dewabisma merged 7 commits into
mainfrom
feat/configurable-whitelist
Jan 4, 2026
Merged

feat: make whitelist configgurable#46
dewabisma merged 7 commits into
mainfrom
feat/configurable-whitelist

Conversation

@dewabisma

Copy link
Copy Markdown
Contributor

Summary

Make whitelist targets for tweet aggregation configurable from admin panel.

Changes

  • add flag ignore to tweet author
  • remove whitelist from config
  • add endpoints for managing whitelist

- add flag ignore to tweet author
- remove whitelist from config
- add endpoints for managing whitelist
@dewabisma dewabisma requested review from illuzen and n13 December 22, 2025 11:20

@n13 n13 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Feat - Configurable Whitelist

The PR introduces database-driven whitelist management for tweet authors, replacing the static configuration file. This is a good move for flexibility.

Here are my findings and suggestions:

1. get_whitelist Logic Issue

In src/repositories/tweet_author.rs, the get_whitelist method retrieves authors who are not ignored (is_ignored = false).

pub async fn get_whitelist(&self) -> Result<Vec<String>, DbError> {
    let authors = sqlx::query_as::<_, TweetAuthor>("SELECT * FROM tweet_authors WHERE is_ignored = false")
        .fetch_all(&self.pool)
        .await?;
    // ...
}

However, TweetAuthor represents all authors encountered by the system, including those from random tweets or raids. By filtering only on is_ignored = false, you might be inadvertently whitelisting every author the system knows about (since is_ignored likely defaults to false).

Recommendation:
You need a specific flag for "whitelisted" status, or explicit logic.

  • If is_ignored is the only flag: Does false mean "Active/Whitelisted" and true mean "Ignored/Blacklisted"?
  • If so, when a random user is inserted (e.g., via a raid submission), do they default to false? If yes, they become part of the whitelist sync loop, which might be unintended.
  • Consider adding an is_whitelisted boolean column instead, or clarify that only manually added authors should be in this table (which seems unlikely given upsert_many usage).

2. Efficiency in get_whitelist

The current implementation fetches the entire TweetAuthor struct for every whitelisted user just to map their IDs.

let authors = sqlx::query_as::<_, TweetAuthor>("SELECT * FROM tweet_authors ...")
// ...
let whitelist: Vec<String> = authors.iter().map(|f| f.id.clone()).collect();

Optimization: Select only the id column directly.

let ids = sqlx::query_scalar::<_, String>("SELECT id FROM tweet_authors WHERE is_ignored = false")
    .fetch_all(&self.pool)
    .await?;
Ok(ids)

3. Handle upsert Return Type

In handle_create_tweet_author, you use state.db.tweet_authors.upsert.
The repository method returns DbResult<String> (the ID), but the handler wraps it in SuccessResponse::new(create_response).

Just double-check that returning just the ID string is sufficient for the frontend, or if they need the full author object back (which the repo fetches anyway via RETURNING *).

4. Naming Consistency

  • Endpoints: /tweet-authors/:id/ignore vs /tweet-authors/:id/watch
  • Repository: make_ignored_from_whitelist vs make_watched_in_whitelist

The terms "ignore" and "watch" are good, but "whitelist" in the function name make_ignored_from_whitelist might be confusing if the concept is just "active status".
Suggestion: set_ignore_status(id, status: bool) could replace both methods and reduce code duplication.

5. Tests

The tests look good and cover the new endpoints.

  • Ensure that test_create_tweet_author_success verifies that the new author defaults to the correct is_ignored state (presumably false).

Summary

The main concern is point #1: ensuring that "all authors in DB" != "whitelist". If tweet_authors table accumulates thousands of random users from raids, get_whitelist will return all of them, causing the TweetSynchronizerService to sync thousands of users unnecessarily.

Fix: Add is_whitelisted column (default false) and only set it to true for manually added authors.

By making is_ignored true by default, we can make sure only manually whitelisted author will be tracked

changes:
- Update db migration
- Efficiency in get_whitelist
- Efficiency in returning of upsert tweet authors
- Refactor change author whitelist status repository method

@n13 n13 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR 46 Review: Make Whitelist Configurable

Summary

This PR introduces a configurable whitelist for the TweetSynchronizerService, allowing the application to filter tweets based on a list of specific Twitter usernames defined in the configuration files (default.toml, etc.).

Changes Identified

  1. Configuration:
    • Added whitelist field (Vec<String>) to TweetSyncConfig in src/config.rs.
    • Updated default.toml, example.toml, and test.toml to include a whitelist array in the [tweet_sync] section.
  2. Service Logic:
    • Updated src/services/tweet_synchronizer_service.rs to use self.config.tweet_sync.whitelist via SearchParams::build_batched_whitelist_queries.
  3. Testing:
    • src/services/tweet_synchronizer_service.rs tests reflect the usage of the whitelist (implicitly via load_test_env which loads test.toml).

Code Review

Correctness

  • Config Loading: The configuration is correctly defined and loaded. The whitelist vector maps correctly to the TOML array.
  • Service Integration: The service correctly accesses the whitelist from the config and passes it to the rusx library's build_batched_whitelist_queries helper.
  • Empty Whitelist: If the whitelist is empty (default in code), build_batched_whitelist_queries presumably yields no queries, which effectively disables the sync (or syncs nothing), which is safe behavior.

Code Quality

  • DRY: The changes reuse existing config patterns and do not introduce duplication.
  • Conciseness: The implementation is minimal and focused.
  • Style: Code follows the existing project style.

Considerations & Recommendations

1. Global High-Water Mark (since_id)

The service uses a global since_id (the newest tweet ID in the database) for the search query:

let last_id = self.db.relevant_tweets.get_newest_tweet_id().await?;
// ...
if let Some(id) = last_id.clone() {
    params.since_id = Some(id.clone());
}

Potential Issue: If you add a new user to the whitelist who has posted tweets older than the current last_id in your database, those older tweets will be missed because the API query will filter them out based on the global since_id.
Recommendation: If historical backfill for new whitelist additions is required, consider tracking since_id per user or per batch, although this increases complexity significantly. For now, just be aware that adding a user will only pick up their future tweets (or tweets newer than the global high-water mark).

2. Validation

There is currently no validation for the whitelist strings (e.g., ensuring they are valid Twitter usernames, starting with @ or not, etc.).
Recommendation: Ensure that the rusx library or the configuration loader handles invalid usernames gracefully.

Conclusion

The changes look solid and correctly implement the requested feature. The "configurable" aspect is handled via the standard configuration files, which fits the project structure.

Status: ✅ Approved (with note on since_id behavior)

Base automatically changed from feat/simplify-raid-submission to main January 4, 2026 10:02
@dewabisma dewabisma merged commit 8086df5 into main Jan 4, 2026
@dewabisma dewabisma deleted the feat/configurable-whitelist branch January 4, 2026 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants