
Commit 677ec26

therealaleph and claude committed
fix: v1.9.8 — Android disconnect crash + UI test-button gate for non-apps_script modes
Android (#666 from @ilok67 with full root cause):
- MainActivity.onStop was sending ACTION_STOP via startService() AND immediately calling stopService() on the same service. ACTION_STOP runs teardown() on a background thread that calls stopSelf() at the end; the redundant stopService() triggered onDestroy() in parallel, racing the lifecycle and crashing on every Disconnect tap.
- Removed the stopService(); ACTION_STOP alone is sufficient for both the live-service and the zombie-after-process-death cases. The tornDown AtomicBoolean already guards against double-teardown of native state, but it couldn't protect against the OS-level stopSelf vs. stopService race.

UI (#665 from @cmptrnb):
- The Test Relay button was showing a red "test result: fail" status when used in full or direct mode. The underlying test_cmd::run deliberately refuses in those modes, because probing Apps Script directly while the data plane goes via tunnel-node would give a misleading result, but the refuse path was being translated to a generic "test failed".
- The UI now checks the mode before running and shows a mode-specific explainer for full/direct (pointing users at https://whatismyipaddress.com in the browser, via the proxy, as the right way to verify).

Includes already-merged PR #674 from @yyoyoian-pixel: drop client coalesce_step + tunnel-node straggler settle_step from 40 ms → 10 ms, and raise tunnel-node settle max from 500 ms → 1000 ms. Asymmetric tuning: fast-fire when nothing else is queued, but adaptive coalesce on bursts. Backwards compatible: existing configs with an explicit `coalesce_step_ms: 40` keep the old behavior.

Tests: 179 lib + 33 tunnel-node green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 994dd0b commit 677ec26

5 files changed

Lines changed: 76 additions & 21 deletions


Cargo.lock

Lines changed: 1 addition & 1 deletion

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "mhrv-rs"
-version = "1.9.7"
+version = "1.9.8"
 edition = "2021"
 description = "Rust port of MasterHttpRelayVPN -- DPI bypass via Google Apps Script relay with domain fronting"
 license = "MIT"

android/app/src/main/java/com/therealaleph/mhrv/MainActivity.kt

Lines changed: 25 additions & 19 deletions
@@ -173,30 +173,36 @@ class MainActivity : AppCompatActivity() {
 }
 },
 onStop = {
-// Three-step teardown. Each step is defensive against a
-// different failure mode we've actually hit in testing:
+// Single-step graceful teardown. ACTION_STOP delivered via
+// startService() reaches MhrvVpnService.onStartCommand,
+// which spawns the `mhrv-teardown` background thread that
+// tears down tun2proxy + the Rust runtime and then calls
+// stopSelf() at the end of teardown. Service stops on its
+// own — we don't need (and must not) follow up with
+// stopService().
 //
-// 1. ACTION_STOP — graceful path. The service receives it,
-// runs its teardown (stops tun2proxy, closes the TUN
-// fd, shuts down the Rust runtime) and stopSelf()'s.
-// This is what we want 99% of the time.
+// History (#666 from @ilok67): we used to call stopService()
+// immediately after startService(stopAction), as belt-and-
+// suspenders against a "force-closed then reopened zombie"
+// case. That second call was firing onDestroy() while the
+// mhrv-teardown thread was still running, racing two threads
+// through the lifecycle and crashing on tap-to-disconnect.
+// The teardown thread's idempotency guard (tornDown
+// AtomicBoolean) protects against double-teardown of native
+// state, but it can't protect against OS-level lifecycle
+// races on stopSelf vs stopService. ACTION_STOP alone is
+// enough for both the live-service and zombie cases —
+// startService creates a fresh service in the new process
+// for zombies, runs teardown (no-op on already-clean state)
+// and stops it.
 //
-// 2. stopService() — covers the "force-closed then
-// reopened" zombie case. Android may auto-restart our
-// START_STICKY service in a fresh process after the
-// user swipes us away from Recents, and the user's
-// next Stop tap needs to actually unbind even if our
-// in-memory TUN fd reference is gone. stopService is
-// idempotent so it's safe to follow the graceful path.
-//
-// 3. We do NOT touch the VpnService permission — that's
-// the OS-wide VPN grant and the user approved it
-// deliberately. Revoking it would force a re-prompt
-// on next Start, which is worse UX.
+// We do NOT touch the VpnService permission — that's the
+// OS-wide VPN grant and the user approved it deliberately.
+// Revoking it would force a re-prompt on next Start, which
+// is worse UX.
 val stopAction = Intent(this, MhrvVpnService::class.java)
 .setAction(MhrvVpnService.ACTION_STOP)
 startService(stopAction)
-stopService(Intent(this, MhrvVpnService::class.java))
 },
 onInstallCaConfirmed = {
 // The flow is (1) export cert, (2) copy it to Downloads so
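The `tornDown` guard referenced in the new comment is not part of this diff. Assuming it behaves like a compare-and-set flag (Kotlin's `AtomicBoolean.compareAndSet(false, true)` would play that role), the Rust sketch below shows what such a guard does and does not cover: however many threads call `teardown()`, the native cleanup body runs exactly once. `NativeState`, `teardown` and `race_teardown` are hypothetical names. What the sketch cannot express is the OS side: the flag only coordinates callers inside the app and has no say over when Android delivers `onDestroy()`, which is why dropping the extra `stopService()` was the actual fix.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the service's native state (tun2proxy, TUN fd, runtime).
struct NativeState {
    /// Plays the role of the `tornDown` flag: false until teardown has started.
    torn_down: AtomicBool,
    /// Counts how many times the cleanup body actually executed.
    cleanup_runs: AtomicUsize,
}

impl NativeState {
    fn new() -> Self {
        NativeState {
            torn_down: AtomicBool::new(false),
            cleanup_runs: AtomicUsize::new(0),
        }
    }

    /// Idempotent teardown: only the caller that flips the flag from
    /// false to true runs the cleanup; every other caller returns false.
    fn teardown(&self) -> bool {
        if self
            .torn_down
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_err()
        {
            return false; // another caller already started teardown
        }
        // Real code would stop tun2proxy, close the TUN fd and shut down the runtime here.
        self.cleanup_runs.fetch_add(1, Ordering::SeqCst);
        true
    }
}

/// Spawns `callers` threads that all race to call `teardown()` and returns
/// (number of callers that won the flag, number of times cleanup ran).
fn race_teardown(callers: usize) -> (usize, usize) {
    let state = Arc::new(NativeState::new());
    let handles: Vec<_> = (0..callers)
        .map(|_| {
            let s = Arc::clone(&state);
            thread::spawn(move || s.teardown())
        })
        .collect();
    let winners = handles
        .into_iter()
        .map(|h| h.join().expect("teardown thread panicked"))
        .filter(|&won| won)
        .count();
    (winners, state.cleanup_runs.load(Ordering::SeqCst))
}
```

Calling `race_teardown(8)` yields `(1, 1)`: one winner, one cleanup, regardless of thread scheduling.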

docs/changelog/v1.9.8.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+<!-- see docs/changelog/v1.1.0.md for the file format: Persian, then `---`, then English. -->
+• Fix v1.9.7 Android: crash on tapping Disconnect ([#666](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/issues/666) from @ilok67, with the full root cause + fix): `MainActivity.onStop` was also calling `stopService()` immediately after `startService(ACTION_STOP)`. Inside `MhrvVpnService`, ACTION_STOP creates a background thread named `mhrv-teardown` that runs `teardown()` (closing tun2proxy, the TUN fd and the runtime) and calls `stopSelf()` when it finishes. But `stopService()` immediately triggered `onDestroy()` on the same service: two threads were passing through the lifecycle at the same time, and the OS killed the service process before teardown finished. The crash after tapping Disconnect was reproducible in about 99% of tests. `stopService()` has now been removed; `ACTION_STOP` alone is sufficient (both for a live service and for the zombie case). The `tornDown` AtomicBoolean idempotency guard already existed, but it did not protect against the OS-level lifecycle race. Thanks to @ilok67 for the excellent triage.
+• Fix v1.9.7 UI: the Test Relay button showed a red "test result: fail" in `full` (and `direct`) mode ([#665](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/issues/665) from @cmptrnb). `mhrv-rs test` is wired only for apps_script mode; in `full` mode it deliberately refuses, because probing Apps Script directly while the data plane goes through tunnel-node is misleading. But the UI translated the refusal message into a test failure, and users concluded the proxy was broken. The UI now checks the mode before running the test and, for modes where the test does not apply, shows an explainer message instead of a red fail:
+> Test Relay is wired only for apps_script mode. In full mode the data plane is the tunnel-node — to verify it end-to-end, start the proxy and load https://whatismyipaddress.com in your browser via 127.0.0.1:8085. The IP shown should be your tunnel-node's VPS IP.
+
+• Tune adaptive batch coalesce (PR [#674](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/pull/674) from @yyoyoian-pixel): the client coalesce step and the tunnel-node straggler settle step go from 40 ms → **10 ms**; the tunnel-node settle max goes from 500 ms → **1000 ms**. Asymmetric logic: when no other op is pending, fire fast (10 ms is enough to catch ops that arrive in the same event-loop tick, such as 6 parallel browser connections); but when both sides have data (uploads, page loads), the adaptive reset still batches up to the 1 s cap. In short: "don't wait when there is nothing to wait for; when there is, batch as hard as you can." Backwards compatible: users with `coalesce_step_ms: 40` in config.json keep the old behavior.
+• Tests: 179 lib + 33 tunnel-node tests, all passing.
+---
+• Fix Android crash on tap-Disconnect from v1.9.7 ([#666](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/issues/666) by @ilok67 with full root cause + fix): `MainActivity.onStop` was calling `stopService()` immediately after `startService(ACTION_STOP)`. ACTION_STOP inside `MhrvVpnService` spawns the `mhrv-teardown` background thread that runs `teardown()` (stops tun2proxy, closes TUN fd, shuts down the Rust runtime) and then calls `stopSelf()` at the end. But `stopService()` immediately triggered `onDestroy()` on the same service — two threads racing through the lifecycle, and the OS would kill the process before teardown finished. Crash on every Disconnect tap, ~99% reproducible. Removed the `stopService()` call — `ACTION_STOP` alone is sufficient for both the live-service and the zombie-after-process-death cases. The existing `tornDown` AtomicBoolean idempotency guard protects against double-teardown of native state, but it can't protect against OS-level lifecycle races on stopSelf vs stopService. Thanks @ilok67 for the precise triage.
+• Fix UI showing "test result: fail" red status for `full` (and `direct`) modes from v1.9.7 ([#665](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/issues/665) by @cmptrnb). `mhrv-rs test` is wired only for the apps_script relay path — it deliberately refuses in `full` mode because probing Apps Script directly while the actual data plane goes via tunnel-node would give a misleading green result. But the refuse path was getting translated by the UI as a generic "test failed" with red status, scaring users into thinking their proxy was broken. Now the UI checks mode before running the test and shows a friendly explainer for `full`/`direct`:
+> Test Relay is wired only for apps_script mode. In full mode the data plane is the tunnel-node — to verify it end-to-end, start the proxy and load https://whatismyipaddress.com in your browser via 127.0.0.1:8085. The IP shown should be your tunnel-node's VPS IP.
+
+• Tune adaptive batch coalesce (PR [#674](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/pull/674) from @yyoyoian-pixel): client coalesce step + tunnel-node straggler settle step from 40 ms → **10 ms**, tunnel-node settle max from 500 ms → **1000 ms**. The asymmetric design — small step, generous max — picks up "fire-and-forget when nothing else is queued" without giving up batching on bursts. The 10 ms still catches ops that arrive in the same event-loop tick (e.g. a browser opening 6 parallel connections on page load), so we don't degenerate into single-op batches; but on a download where the client is just waiting for the next chunk, the per-batch dead-air shrinks by ~30 ms. Backwards-compatible: existing configs with explicit `coalesce_step_ms: 40` keep the old behaviour.
+• Tests: 179 lib + 33 tunnel-node tests all passing.
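The asymmetric tuning from PR #674 can be illustrated with a small deterministic model. This is a sketch under an assumed rule, not the client or tunnel-node code: the first queued op opens a batch, every op arriving at or before the current deadline joins it and pushes the deadline out by one step, and the deadline never moves past the first op's arrival plus the max. `simulate_batch` is a hypothetical name; times are integer milliseconds so the results are exact.

```rust
/// Models one adaptive coalesce window (hypothetical helper, not the
/// project's API). `arrivals_ms` must be sorted ascending; the first entry
/// opens the batch. Returns `(fire_time_ms, ops_in_batch)`, or `None` if
/// nothing was queued.
fn simulate_batch(arrivals_ms: &[u64], step_ms: u64, max_ms: u64) -> Option<(u64, usize)> {
    let (&start, rest) = arrivals_ms.split_first()?;
    // Hard cap: the batch fires no later than start + max, even under a steady stream.
    let cap = start + max_ms;
    // With nothing else queued, the batch fires after a single step.
    let mut deadline = (start + step_ms).min(cap);
    let mut count = 1;
    for &t in rest {
        if t > deadline {
            break; // arrived after the batch fired; belongs to the next batch
        }
        count += 1;
        // Each straggler resets the settle timer by one step, up to the cap.
        deadline = (t + step_ms).min(cap);
    }
    Some((deadline, count))
}
```

Under this model a lone op fires at 10 ms instead of 40 ms (`simulate_batch(&[0], 10, 1000)` gives `Some((10, 1))` versus `Some((40, 1))` with a 40 ms step), which is the ~30 ms of per-batch dead air mentioned above. Six ops opened 1 ms apart still share one batch (`simulate_batch(&[0, 1, 2, 3, 4, 5], 10, 1000)` gives `Some((15, 6))`), and a continuous stream is cut at the 1000 ms cap rather than the old 500 ms one.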

src/bin/ui.rs

Lines changed: 35 additions & 0 deletions
@@ -2171,6 +2171,41 @@ fn background_thread(shared: Arc<Shared>, rx: Receiver<Cmd>) {
 
 Ok(Cmd::Test(cfg)) => {
 let shared2 = shared.clone();
+// Short-circuit modes where `test_cmd::run` deliberately
+// refuses (full mode, direct mode). Those return false
+// even when the proxy is healthy, which surfaced as
+// "Test failed" + alarming red status — see #665. Show
+// a friendly notice instead and skip the test path.
+let mode_kind = cfg.mode_kind().ok();
+let mode_explainer = match mode_kind {
+Some(mhrv_rs::config::Mode::Full) => Some(
+"Test Relay is wired only for apps_script mode. \
+In full mode the data plane is the tunnel-node — \
+to verify it end-to-end, start the proxy and load \
+https://whatismyipaddress.com in your browser \
+via 127.0.0.1:8085. The IP shown should be your \
+tunnel-node's VPS IP. Tracking a real Full-mode \
+test in #160."
+),
+Some(mhrv_rs::config::Mode::Direct) => Some(
+"Test Relay is wired only for apps_script mode. \
+In direct mode there is no Apps Script relay — \
+every request goes through the SNI-rewrite tunnel \
+straight to Google's edge. Verify by loading \
+https://www.google.com via the proxy."
+),
+_ => None,
+};
+if let Some(msg) = mode_explainer {
+{
+let mut st = shared.state.lock().unwrap();
+st.last_test_ok = None;
+st.last_test_msg = msg.into();
+st.last_test_msg_at = Some(Instant::now());
+}
+push_log(&shared, &format!("[ui] test skipped: {}", msg));
+continue;
+}
 push_log(&shared, "[ui] running test...");
 rt.spawn(async move {
 let ok = test_cmd::run(&cfg).await;
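The gate added to `background_thread` boils down to a pure function from the configured mode to either "run the test" or "show this notice". The sketch below isolates that decision so it can be unit-tested. The `Mode` enum is a simplified stand-in for `mhrv_rs::config::Mode` (the `AppsScript` variant name is an assumption), and the explainer strings are shortened. It keeps the diff's fallback: when the mode cannot be determined (`mode_kind()` returned an error, modelled as `None`), the test still runs and `test_cmd::run` reports the problem. The `Status` mapping is likewise an assumption about how the UI renders `last_test_ok`, with `None` shown as neutral rather than red.

```rust
/// Simplified stand-in for `mhrv_rs::config::Mode` (variant names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    AppsScript,
    Full,
    Direct,
}

/// What the UI should do when the user presses Test Relay.
#[derive(Debug, PartialEq, Eq)]
enum TestAction {
    /// Spawn the real relay probe.
    Run,
    /// Skip the probe and show this notice instead.
    Explain(&'static str),
}

/// Mirrors the `match mode_kind { ... }` in the diff: only modes in which
/// the probe would be meaningless are short-circuited.
fn test_action(mode: Option<Mode>) -> TestAction {
    match mode {
        Some(Mode::Full) => TestAction::Explain(
            "Test Relay is wired only for apps_script mode; in full mode, check that the browser sees the tunnel-node's VPS IP.",
        ),
        Some(Mode::Direct) => TestAction::Explain(
            "Test Relay is wired only for apps_script mode; in direct mode, load a Google page via the proxy.",
        ),
        // apps_script, or a mode that failed to parse: run the real test.
        _ => TestAction::Run,
    }
}

/// How a status line could be coloured from `last_test_ok` (assumed mapping).
#[derive(Debug, PartialEq, Eq)]
enum Status {
    Pass,
    Fail,
    Neutral,
}

fn status_for(last_test_ok: Option<bool>) -> Status {
    match last_test_ok {
        Some(true) => Status::Pass,
        Some(false) => Status::Fail,
        // The skip path sets `last_test_ok = None`, so no red "fail" is shown.
        None => Status::Neutral,
    }
}
```

Keeping the decision separate from the locking and logging makes the #665 regression easy to pin down in a test: `test_action(Some(Mode::Full))` must never lead to `Status::Fail`.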
