Skip to content

fix: Prevent native code abort crashes #98

@ovitrif

Description

@ovitrif

Context

Android Vitals traces for the production crash show app-owned native frames in libbitkitcore.so ending in Android's bionic/libc.so abort. That proves native code reached a fatal process-abort path; it does not, by itself, prove the exact mechanism was a Rust panic unwinding across the FFI boundary.

PR #97 addressed the immediate AccountInfoError blocking-task path from #96, but the same safety goal needs to apply across the codebase, not just account info.

This issue tracks hardening app-facing native boundaries so recoverable failures are converted into typed errors, watcher events, logs, or safe fallbacks instead of terminating the Android/iOS process. Rust panics remain a major target, but the scope also includes Tokio join failures, lock poisoning, invalid persisted data, watcher/callback failures, and any non-test unwrap / expect path that can reasonably be mapped to an app-facing error.

Scope

Exported async task joins

Replace rt.spawn(...).await.unwrap() / .unwrap()? in exported functions with typed join-error handling:

  • src/lib.rs: decode
  • src/lib.rs: get_lnurl_invoice
  • src/lib.rs: lnurl_auth
  • src/lib.rs: check_sweepable_balances
  • src/lib.rs: scan_legacy_rn_native_segwit_recovery_funds
  • src/lib.rs: prepare_legacy_rn_native_segwit_recovery_sweep
  • src/lib.rs: prepare_sweep_transaction
  • src/lib.rs: broadcast_sweep_transaction
  • src/lib.rs: blocktank_remove_all_orders
  • src/lib.rs: blocktank_remove_all_cjit_entries
  • src/lib.rs: blocktank_wipe_all
  • src/lib.rs: wipe_all_databases

Prefer a small shared helper per error shape, rather than repeating ad-hoc unwrap_or_else blocks.

Blocking task abort boundaries

Generalize the run_account_info_blocking pattern to other onchain blocking tasks:

  • src/modules/onchain/implementation.rs: legacy RN recovery scan/sweep blocking work
  • src/modules/onchain/implementation.rs: sweep balance sync
  • src/modules/onchain/implementation.rs: sweep transaction sync/client setup
  • src/modules/onchain/implementation.rs: sweep broadcast pre-sync and broadcast
  • src/modules/onchain/implementation.rs: broadcast_raw_tx

These should catch panic payloads inside blocking closures where possible and map failure paths to SweepError / BroadcastError instead of letting task failures become process aborts.

Watcher thread and event callbacks

Harden the long-lived watcher path:

  • Wrap the std::thread::Builder::spawn watcher body in catch_unwind.
  • Make post-init watcher panics visible via logging and/or WatcherEvent::Error where possible.
  • Route every listener.on_event(...) call through a safe callback wrapper.
  • Replace expect("blockchain set above") with a typed AccountInfoError.

Native Trezor callbacks

Native callback calls should not be able to terminate the app process:

  • TransportCallback adapter methods in src/modules/trezor/implementation.rs
  • UiCallbackAdapter methods in src/modules/trezor/implementation.rs
  • Direct mobile callback calls such as enumerate_devices, close_device, and save_thp_credential

Map callback panics/failures into TrezorError or the callback result error shape used by trezor-connect-rs.

Locks and data parsing

Convert non-test panic sources into typed errors:

  • src/lib.rs: poisoned DB mutex unwraps
  • src/lib.rs: init_db global state unwraps where practical
  • src/lib.rs: wipe_all_databases lock unwrap
  • src/modules/blocktank/db.rs: state.parse().unwrap() / state2.parse().unwrap() should become database conversion errors

Acceptance Criteria

  • No app-facing exported function aborts the process because a spawned Tokio task panicked, was cancelled, or returned a join error.
  • Onchain blocking closures catch Rust panics where practical and return typed errors with useful messages.
  • Watcher thread failures and watcher/native callback failures cannot crash the Android JVM or iOS app process.
  • DB lock poisoning and invalid stored enum values return typed errors instead of panicking.
  • No known non-test app-facing unwrap / expect path remains where a typed error can reasonably be returned instead.
  • Regression tests cover at least the shared async join helper and each new blocking panic helper.
  • cargo fmt --check, focused regression tests, and cargo test --lib -- --skip modules::blocktank pass.
  • If public/generated bindings change, bump the crate/package version, regenerate mobile bindings, and create a new release.
  • AI agents rules should ensure future changes are reviewed to avoid native process-abort hazards at mobile bindings.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions