feat: make whitelist configgurable#46
Conversation
- add flag ignore to tweet author - remove whitelist from config - add endpoints for managing whitelist
n13
left a comment
There was a problem hiding this comment.
Code Review: Feat - Configurable Whitelist
The PR introduces database-driven whitelist management for tweet authors, replacing the static configuration file. This is a good move for flexibility.
Here are my findings and suggestions:
1. get_whitelist Logic Issue
In src/repositories/tweet_author.rs, the get_whitelist method retrieves authors who are not ignored (is_ignored = false).
pub async fn get_whitelist(&self) -> Result<Vec<String>, DbError> {
let authors = sqlx::query_as::<_, TweetAuthor>("SELECT * FROM tweet_authors WHERE is_ignored = false")
.fetch_all(&self.pool)
.await?;
// ...
}However, TweetAuthor represents all authors encountered by the system, including those from random tweets or raids. By filtering only on is_ignored = false, you might be inadvertently whitelisting every author the system knows about (since is_ignored likely defaults to false).
Recommendation:
You need a specific flag for "whitelisted" status, or explicit logic.
- If
is_ignoredis the only flag: Doesfalsemean "Active/Whitelisted" andtruemean "Ignored/Blacklisted"? - If so, when a random user is inserted (e.g., via a raid submission), do they default to
false? If yes, they become part of the whitelist sync loop, which might be unintended. - Consider adding an
is_whitelistedboolean column instead, or clarify that only manually added authors should be in this table (which seems unlikely givenupsert_manyusage).
2. Efficiency in get_whitelist
The current implementation fetches the entire TweetAuthor struct for every whitelisted user just to map their IDs.
let authors = sqlx::query_as::<_, TweetAuthor>("SELECT * FROM tweet_authors ...")
// ...
let whitelist: Vec<String> = authors.iter().map(|f| f.id.clone()).collect();Optimization: Select only the id column directly.
let ids = sqlx::query_scalar::<_, String>("SELECT id FROM tweet_authors WHERE is_ignored = false")
.fetch_all(&self.pool)
.await?;
Ok(ids)3. Handle upsert Return Type
In handle_create_tweet_author, you use state.db.tweet_authors.upsert.
The repository method returns DbResult<String> (the ID), but the handler wraps it in SuccessResponse::new(create_response).
Just double-check that returning just the ID string is sufficient for the frontend, or if they need the full author object back (which the repo fetches anyway via RETURNING *).
4. Naming Consistency
- Endpoints:
/tweet-authors/:id/ignorevs/tweet-authors/:id/watch - Repository:
make_ignored_from_whitelistvsmake_watched_in_whitelist
The terms "ignore" and "watch" are good, but "whitelist" in the function name make_ignored_from_whitelist might be confusing if the concept is just "active status".
Suggestion: set_ignore_status(id, status: bool) could replace both methods and reduce code duplication.
5. Tests
The tests look good and cover the new endpoints.
- Ensure that
test_create_tweet_author_successverifies that the new author defaults to the correctis_ignoredstate (presumablyfalse).
Summary
The main concern is point #1: ensuring that "all authors in DB" != "whitelist". If tweet_authors table accumulates thousands of random users from raids, get_whitelist will return all of them, causing the TweetSynchronizerService to sync thousands of users unnecessarily.
Fix: Add is_whitelisted column (default false) and only set it to true for manually added authors.
By making is_ignored true by default, we can make sure only manually whitelisted author will be tracked changes: - Update db migration - Efficiency in get_whitelist - Efficiency in returning of upsert tweet authors - Refactor change author whitelist status repository method
n13
left a comment
There was a problem hiding this comment.
PR 46 Review: Make Whitelist Configurable
Summary
This PR introduces a configurable whitelist for the TweetSynchronizerService, allowing the application to filter tweets based on a list of specific Twitter usernames defined in the configuration files (default.toml, etc.).
Changes Identified
- Configuration:
- Added
whitelistfield (Vec<String>) toTweetSyncConfiginsrc/config.rs. - Updated
default.toml,example.toml, andtest.tomlto include awhitelistarray in the[tweet_sync]section.
- Added
- Service Logic:
- Updated
src/services/tweet_synchronizer_service.rsto useself.config.tweet_sync.whitelistviaSearchParams::build_batched_whitelist_queries.
- Updated
- Testing:
src/services/tweet_synchronizer_service.rstests reflect the usage of the whitelist (implicitly viaload_test_envwhich loadstest.toml).
Code Review
Correctness
- Config Loading: The configuration is correctly defined and loaded. The
whitelistvector maps correctly to the TOML array. - Service Integration: The service correctly accesses the whitelist from the config and passes it to the
rusxlibrary'sbuild_batched_whitelist_querieshelper. - Empty Whitelist: If the whitelist is empty (default in code),
build_batched_whitelist_queriespresumably yields no queries, which effectively disables the sync (or syncs nothing), which is safe behavior.
Code Quality
- DRY: The changes reuse existing config patterns and do not introduce duplication.
- Conciseness: The implementation is minimal and focused.
- Style: Code follows the existing project style.
Considerations & Recommendations
1. Global High-Water Mark (since_id)
The service uses a global since_id (the newest tweet ID in the database) for the search query:
let last_id = self.db.relevant_tweets.get_newest_tweet_id().await?;
// ...
if let Some(id) = last_id.clone() {
params.since_id = Some(id.clone());
}Potential Issue: If you add a new user to the whitelist who has posted tweets older than the current last_id in your database, those older tweets will be missed because the API query will filter them out based on the global since_id.
Recommendation: If historical backfill for new whitelist additions is required, consider tracking since_id per user or per batch, although this increases complexity significantly. For now, just be aware that adding a user will only pick up their future tweets (or tweets newer than the global high-water mark).
2. Validation
There is currently no validation for the whitelist strings (e.g., ensuring they are valid Twitter usernames, starting with @ or not, etc.).
Recommendation: Ensure that the rusx library or the configuration loader handles invalid usernames gracefully.
Conclusion
The changes look solid and correctly implement the requested feature. The "configurable" aspect is handled via the standard configuration files, which fits the project structure.
Status: ✅ Approved (with note on since_id behavior)
Summary
Make whitelist targets for tweet aggregation configurable from admin panel.
Changes