
Add APIs to prune completed state for LSPS1/LSPS2/LSPS5#4526

Open
f3r10 wants to merge 5 commits into
lightningdevkit:mainfrom
f3r10:feat/add_apis_to_prune_completed_state

Conversation

@f3r10

@f3r10 f3r10 commented Mar 31, 2026

Closes #4444.

Add explicit pruning APIs to LSPS1ServiceHandler, LSPS2ServiceHandler, and LSPS5ServiceHandler so LSP operators can reclaim memory from completed historical state.

@ldk-reviews-bot

ldk-reviews-bot commented Mar 31, 2026

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

///
/// Returns an [`APIError::APIMisuseError`] if the counterparty has no state, the order is
/// unknown, or the order is in a non-terminal state.
pub async fn prune_order(
Collaborator

Hmm, is the right API to require downstream code list the specific orders to prune or should we have some kind of "prune all failed orders older than X" API? @tnull wdyt?

Author

You are right @TheBlueMatt, it would be much better to have something like this:
handler.prune_orders(client_id, Duration::from_secs(30 * 24 * 3600)).await?;

I am going to update that part. Thanks for the early review
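The interval-based shape discussed above can be sketched as follows. This is a simplified, illustrative model only; `Order`, `OrderState`, and `should_prune` are sketch names, not the actual LDK handler API:

```rust
use std::time::Duration;

// Illustrative model of an age-filtered prune over terminal orders,
// mirroring the proposed `prune_orders(counterparty_node_id, max_age)` shape.
#[derive(Clone, Copy)]
enum OrderState { Pending, CompletedAndChannelOpened, FailedAndRefunded }

struct Order { state: OrderState, age: Duration }

fn should_prune(order: &Order, max_age: Duration) -> bool {
    let terminal = matches!(
        order.state,
        OrderState::CompletedAndChannelOpened | OrderState::FailedAndRefunded
    );
    // `Duration::ZERO` prunes every terminal order regardless of age.
    terminal && (max_age == Duration::ZERO || order.age >= max_age)
}

fn prune_orders(orders: &mut Vec<Order>, max_age: Duration) -> usize {
    let before = orders.len();
    // Single-pass removal; non-terminal orders are always retained.
    orders.retain(|o| !should_prune(o, max_age));
    before - orders.len()
}
```

Returning the removal count lets the caller decide whether a persist cycle is needed.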

Contributor

Yeah, I agree, allowing to prune older entries in an interval is great. At some point we still might want to expose the LSPS{1,2} service state via the API, at which point allowing to drop specific orders would make sense, but not sure if that's not better done in a dedicated follow-up.

Author

I think it would be better in a follow-up when service state listing exists

Author

The new method using an interval is ready for review @tnull @TheBlueMatt

@f3r10 f3r10 force-pushed the feat/add_apis_to_prune_completed_state branch from 25a876b to ebaa7ed Compare April 8, 2026 17:40
@f3r10
Author

f3r10 commented Apr 8, 2026

Trying to add the API for LSPS2 using the same strategy as LSPS1 (intervals), I realized it is not quite possible because OutboundJITChannel has no created_at timestamp.
It would only be possible to create a function like prune_channel(counterparty_node_id, intercept_scid), but that would require the operator to list all scids.

The solution would be to add created_at: LSPSDateTime to OutboundJITChannel (new TLV field — backwards-compatible since missing fields default on read).

My question is, should this change be done in this same PR or in a follow-up @tnull @TheBlueMatt wdyt?

@f3r10 f3r10 requested review from TheBlueMatt and tnull April 8, 2026 17:49
@TheBlueMatt
Collaborator

Yea, makes sense to add a created-at timestamp in LSPS2 requests imo.

@f3r10 f3r10 marked this pull request as ready for review April 16, 2026 16:00
(4, opening_fee_params, required),
(6, payment_size_msat, option),
(8, trust_model, required),
(10, created_at, (default_value, LSPSDateTime::new_from_duration_since_epoch(Duration::ZERO))),
Collaborator

Bug: The default_value of epoch (Duration::ZERO = 1970-01-01T00:00:00Z) means that any pre-existing OutboundJITChannel deserialized from old persisted state (before this field existed) will have created_at = epoch. When prune_channels is called, now.duration_since(epoch) will return ~54 years, so any max_age shorter than that will prune all pre-upgrade terminal channels. This silently breaks the age-filter semantics for operators who upgrade and then call prune_channels expecting only old-enough channels to be pruned.

Consider using a sentinel value (e.g., Option<LSPSDateTime>) and skipping the age check for channels without a created_at, or document this prominently so operators know the first post-upgrade prune with max_age > 0 will still prune all legacy terminal channels.

Author

I agree with the suggestion. I am going to convert that field to Option<LSPSDateTime>
In this context, None would mean "unknown age" (pre-upgrade data). In prune_terminal_channels:

  • Duration::ZERO → prunes all terminal channels including legacy ones
  • Any positive max_age → skips channels with created_at = None, only prunes those with a known created_at old enough
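The `None`-as-unknown-age semantics above can be sketched as a standalone predicate. Names and types here are illustrative simplifications, not the actual handler code:

```rust
use std::time::Duration;

// Sketch of the agreed semantics: a `created_at` of `None` marks
// pre-upgrade data of unknown age.
fn should_prune_channel(
	is_terminal: bool, age: Option<Duration>, max_age: Duration,
) -> bool {
	if !is_terminal {
		return false;
	}
	if max_age == Duration::ZERO {
		// Explicit "prune everything" request: legacy entries included.
		return true;
	}
	match age {
		Some(a) => a >= max_age, // known age: apply the filter
		None => false,           // unknown age: never prune under a positive max_age
	}
}
```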

Comment on lines +684 to +685
let should_prune = max_age == Duration::ZERO
|| now.duration_since(&channel.created_at) >= max_age;
Collaborator

Bug (pre-existing, but used by new code): LSPSDateTime::duration_since uses abs_diff, so it returns the absolute time difference. If now < created_at (e.g., due to clock skew or time adjustments), this returns a positive duration equal to the gap, which could exceed max_age and cause a channel to be incorrectly pruned even though it was just created.

For age-based pruning, a saturating subtraction (return Duration::ZERO when now < created_at) would be correct — an entry created "in the future" relative to now should never be considered old enough to prune. Same issue applies to the LSPS1 usage in prune_terminal_orders.
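A minimal sketch of the difference, with timestamps reduced to seconds since epoch (not the actual `LSPSDateTime` API):

```rust
use std::time::Duration;

// `abs_diff` vs. saturating subtraction for an age computation.
fn age_abs_diff(now_secs: u64, created_at_secs: u64) -> Duration {
	// Returns the gap even when `created_at` is in the apparent future,
	// which can spuriously exceed `max_age` under clock skew.
	Duration::from_secs(now_secs.abs_diff(created_at_secs))
}

fn age_saturating(now_secs: u64, created_at_secs: u64) -> Duration {
	// Clamps to zero: a "future" entry is never old enough to prune.
	Duration::from_secs(now_secs.saturating_sub(created_at_secs))
}
```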

let mut outer_state_lock = self.per_peer_state.write().unwrap();
match outer_state_lock.get_mut(&counterparty_node_id) {
Some(peer_state) => {
if !peer_state.remove_webhook(app_name) {
Collaborator

Nit: remove_webhook unconditionally sets needs_persist |= true (line 822 of peer state) even when no webhook is found. When prune_webhook is called with a non-existent app_name, remove_webhook returns false and this method correctly returns an error — but the needs_persist flag has already been set, causing a needless persistence cycle. Consider moving the needs_persist assignment inside the retain's removal branch in remove_webhook.
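A sketch of the suggested fix, with a simplified stand-in for the peer state (the real `remove_webhook` operates on LSPS5 webhook entries, not string pairs):

```rust
// Flag persistence only when an entry is actually removed.
struct PeerState {
	webhooks: Vec<(String, String)>, // (app_name, url)
	needs_persist: bool,
}

impl PeerState {
	fn remove_webhook(&mut self, app_name: &str) -> bool {
		let before = self.webhooks.len();
		self.webhooks.retain(|(name, _)| name != app_name);
		let removed = self.webhooks.len() != before;
		// Setting the flag only on removal avoids a needless
		// persistence cycle when the app_name is unknown.
		self.needs_persist |= removed;
		removed
	}
}
```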

@ldk-claude-review-bot
Collaborator

ldk-claude-review-bot commented Apr 16, 2026

I've thoroughly reviewed the entire PR diff. All three issues from my previous review have been addressed:

  1. created_at is Option<LSPSDateTime> with TLV option — backward compatible
  2. duration_since uses saturating_sub + .max(0) — clock skew safe
  3. remove_webhook only sets needs_persist when a webhook is actually removed

I've verified:

  • Lock ordering is consistent (no deadlock risk in prune_channels: per_peer_state is released before acquiring peer_by_intercept_scid/peer_by_channel_id)
  • LSPS1 created_at is a required field, so no None-handling needed (unlike LSPS2)
  • Duration::ZERO semantics work correctly in both LSPS1 (via duration_since >= Duration::ZERO always true) and LSPS2 (explicit fast-path check)
  • Quota accounting is correct: pruning FailedAndRefunded orders frees quota, pruning CompletedAndChannelOpened only frees memory
  • Type parameter threading through LiquidityManager, sync wrappers, and LSPSProtocolMessageHandler impl is consistent
  • Auxiliary map cleanup in prune_terminal_channels is correct (partial borrows on separate struct fields inside retain)
  • Integration tests properly exercise both terminal and non-terminal paths

No issues found.

@codecov

codecov Bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 85.55858% with 53 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.42%. Comparing base (47122e8) to head (9e9f96b).
⚠️ Report is 215 commits behind head on main.

Files with missing lines Patch % Lines
lightning-liquidity/src/lsps1/service.rs 0.00% 39 Missing ⚠️
lightning-liquidity/src/lsps2/service.rs 93.06% 8 Missing and 4 partials ⚠️
lightning-liquidity/src/lsps1/peer_state.rs 99.14% 0 Missing and 1 partial ⚠️
lightning-liquidity/src/manager.rs 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4526      +/-   ##
==========================================
+ Coverage   86.18%   86.42%   +0.24%     
==========================================
  Files         160      158       -2     
  Lines      108410   109651    +1241     
==========================================
+ Hits        93433    94770    +1337     
+ Misses      12344    12330      -14     
+ Partials     2633     2551      -82     
Flag Coverage Δ
fuzzing-fake-hashes 5.07% <0.00%> (?)
fuzzing-real-hashes 22.77% <0.00%> (?)
tests 86.16% <85.55%> (-0.03%) ⬇️


@ldk-reviews-bot

🔔 1st Reminder

Hey @valentinewallace! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.


@TheBlueMatt TheBlueMatt removed the request for review from valentinewallace April 24, 2026 16:19
Collaborator

@TheBlueMatt TheBlueMatt left a comment


Please respond (either fix or leave a reply why it doesn't matter/can't be fixed) all the AI review feedback and fix CI.

@f3r10
Author

f3r10 commented Apr 29, 2026

Please respond (either fix or leave a reply why it doesn't matter/can't be fixed) all the AI review feedback and fix CI.

Yes, I am already fixing the issues reported by Claude

@f3r10 f3r10 force-pushed the feat/add_apis_to_prune_completed_state branch from e56052d to 572648e Compare May 4, 2026 19:37
Comment on lines +3126 to +3136
// Simulate what prune_channel does internally.
peer_state.outbound_channels_by_intercept_scid.remove(&intercept_scid);
peer_state.intercept_scid_by_channel_id.retain(|_, iscid| *iscid != intercept_scid);
peer_state.intercept_scid_by_user_channel_id.retain(|_, iscid| *iscid != intercept_scid);
peer_state.needs_persist = true;

// All maps must be empty after pruning.
assert!(peer_state.outbound_channels_by_intercept_scid.is_empty());
assert!(peer_state.intercept_scid_by_channel_id.is_empty());
assert!(peer_state.intercept_scid_by_user_channel_id.is_empty());
assert!(peer_state.needs_persist);
Collaborator

Nit: This test manually reimplements the pruning logic (remove + retain + set flag) instead of calling prune_terminal_channels. If the actual method has a bug — e.g., forgetting to clean up one of the auxiliary maps — this test will still pass because it tests a re-implementation, not the real method. Consider calling peer_state.prune_terminal_channels(&now, Duration::ZERO) and then asserting the maps are empty. The same applies to test_peer_state_prune_channel_non_terminal_rejected below which checks the guard condition manually rather than calling the method.
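The point can be illustrated with a simplified stand-in: the test calls the method under test rather than re-implementing its body. Here channels are reduced to `(intercept_scid, is_terminal)` pairs and the age filter is elided, so this is a sketch of the testing pattern, not the real `PeerState`:

```rust
use std::time::Duration;

struct PeerState {
	channels: Vec<(u64, bool)>, // (intercept_scid, is_terminal)
	needs_persist: bool,
}

impl PeerState {
	// Stand-in with the discussed signature; the age filter is elided.
	fn prune_terminal_channels(&mut self, _max_age: Duration) -> Vec<u64> {
		let mut removed = Vec::new();
		self.channels.retain(|&(scid, terminal)| {
			if terminal {
				removed.push(scid);
				false
			} else {
				true
			}
		});
		// Flag persistence only when something was actually removed.
		self.needs_persist |= !removed.is_empty();
		removed
	}
}
```

If the real method forgot to clean up one of the auxiliary maps, a test written this way would catch it, whereas a re-implementation would not.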

@f3r10 f3r10 force-pushed the feat/add_apis_to_prune_completed_state branch 3 times, most recently from e443713 to 7cf9926 Compare May 16, 2026 17:34
f3r10 added 3 commits May 16, 2026 12:42
Add `prune_orders(counterparty_node_id, max_age: Duration)` to both
`LSPS1ServiceHandler` and `LSPS1ServiceHandlerSync`. It removes all
terminal orders (`CompletedAndChannelOpened` / `FailedAndRefunded`) for
a given peer that are at least `max_age` old, persists the updated state,
and returns the number of entries removed.

Passing `Duration::ZERO` prunes all terminal orders immediately regardless
of age, which is the recommended approach to unblock a client that has
hit the per-peer request limit due to accumulated failed orders.

On the `PeerState` layer, `prune_terminal_orders(now, max_age)` uses
`retain` for a single-pass removal and sets `needs_persist` only when
at least one entry is removed.
Add a `created_at: LSPSDateTime` field to `OutboundJITChannel` to
record when each JIT channel was created (i.e., when the buy request
was accepted by the LSP). This timestamp is needed to implement
time-based bulk pruning of completed channel state.

The field is persisted as TLV type 10 with a `default_value` of
Unix epoch, ensuring old serialized data (without TLV 10) is read
back successfully with the epoch sentinel rather than failing
deserialization.
Add `LSPS2ServiceHandler::prune_channels` that lets the LSP operator
remove all channels in the `PaymentForwarded` terminal state whose
`created_at` timestamp is at least `max_age` old. Passing `Duration::ZERO`
prunes all terminal channels regardless of age. All associated state is
cleaned up atomically:

- per-peer `intercept_scid_by_channel_id` and `intercept_scid_by_user_channel_id`
- handler-level `peer_by_intercept_scid` and `peer_by_channel_id`

A new `PeerState::prune_terminal_channels` helper handles the intra-peer
map cleanup and returns the removed `(scid, channel_id)` pairs for the handler to clean up the outer maps.

Integration tests cover: non-terminal channels not pruned, unknown
counterparty errors, age filtering, and successful bulk prune.
f3r10 added 2 commits May 16, 2026 12:42
Add `LSPS5ServiceHandler::prune_webhook` that removes a single webhook
entry identified by `counterparty_node_id` and `LSPS5AppName`. The
method is synchronous, consistent with all other public `notify_*`
methods on this handler: it marks the peer state as dirty and relies on
the normal `LiquidityManager::persist` loop to flush the change to
the KVStore.

The method reuses the existing private `PeerState::remove_webhook`
helper and returns an `APIError::APIMisuseError` if the counterparty
has no registered state or the given `app_name` is not found.
…skew

Replace abs_diff with a saturating subtraction so that duration_since
returns Duration::ZERO when the reference timestamp is in the apparent
future (e.g. due to clock skew or monotonicity violations). Callers
using the result as an age — prune_terminal_orders (LSPS1),
prune_terminal_channels (LSPS2), and the stale-webhook and cooldown
checks (LSPS5) — no longer risk spuriously large values triggering
premature pruning or incorrect cooldown expiry.

Two unit tests are added to document the contract.
@f3r10 f3r10 force-pushed the feat/add_apis_to_prune_completed_state branch from 7cf9926 to 9e9f96b Compare May 16, 2026 17:43
@f3r10 f3r10 requested review from TheBlueMatt and tnull May 16, 2026 18:06