Mitra availability: read paths respect require_mitra_ping=false

When the operator sets require_mitra_ping=false, the auto-offline sweep
early-returns (by design: "don't gate online status on heartbeat
freshness"). The three Valkey read paths still gated on heartbeat
freshness anyway, which trapped the system: the sweep won't remove a
mitra from mitras:online, but readers reject that mitra as stale. The
customer CTA stayed permanently disabled with no recovery.

Fix all three to skip the heartbeat-freshness check when require_ping is
off, matching the sweep's contract:

- computeAvailabilityFromValkey (customer beacon)
- isMitraReachable (extension service)
- findAvailableMitrasFromValkey (pairing candidate finder)

The Postgres fallbacks already did the right thing (is_online only, no
heartbeat compare); this aligns the Valkey hot path.

Also: PATCH /internal/config/mitra-ping now publishes config:invalidate
for require_mitra_ping and mitra_stale_after_seconds, and the subscriber
in mitra-status.service was widened to listen for both. Flipping the
toggle in CC now busts the 10s availability snapshot immediately instead
of waiting out the TTL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 22:09:41 +08:00
parent 3052f7b799
commit d60c048776
4 changed files with 100 additions and 13 deletions
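
A minimal, dependency-free sketch of the counting rule the commit message describes, for readers who want to check the behavior without a Valkey instance. The function name `countAvailable`, its parameter names, and the injected `now` are hypothetical and not part of the codebase. `results` is assumed to mimic the `[err, value]` pairs returned by a pipeline `exec()`, with one slot per candidate when the ping requirement is off and two when it is on.

```javascript
// Hypothetical model of the stride-based parsing; not the service code itself.
// results: array of [err, value] pairs, laid out per candidate as
//   requirePing=true  -> [capacity, heartbeat]
//   requirePing=false -> [capacity]
function countAvailable({ candidates, results, requirePing, maxCustomers, staleAfterSeconds, now }) {
  const stride = requirePing ? 2 : 1
  const cutoff = now - staleAfterSeconds * 1000
  let count = 0
  for (let i = 0; i < candidates.length; i++) {
    // A missing capacity key is treated as zero active customers.
    const capacity = Number(results[i * stride][1] ?? 0)
    if (capacity >= maxCustomers) continue
    if (requirePing) {
      // Freshness gate applies only while liveness is being tracked.
      const heartbeat = results[i * stride + 1][1]
      if (!heartbeat || Date.parse(heartbeat) < cutoff) continue
    }
    count++
  }
  return { available: count > 0, count }
}
```

With the ping requirement on, a candidate whose heartbeat is older than the cutoff is skipped; with it off, only the capacity gate applies, which is the same answer the Postgres fallback gives.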
--- a/backend/src/services/mitra-status.service.js
+++ b/backend/src/services/mitra-status.service.js
@@ -74,14 +74,23 @@ export const invalidateAvailabilityCache = async () => {
  }
 }

-// Bust the shared cache when CC changes max_customers_per_mitra (any instance).
+// Bust the shared cache when CC changes any config that the beacon snapshots
+// over: max_customers_per_mitra (capacity gate), require_mitra_ping (whether
+// stale heartbeats exclude candidates), mitra_stale_after_seconds (the gate's
+// threshold itself).
+const AVAILABILITY_CACHE_INVALIDATING_KEYS = new Set([
+  'max_customers_per_mitra',
+  'require_mitra_ping',
+  'mitra_stale_after_seconds',
+])
+
 let _subscribed = false
 const ensureSubscribed = () => {
  if (_subscribed) return
  _subscribed = true
  try {
    subscribe('config:invalidate', (msg) => {
-      if (msg?.key === 'max_customers_per_mitra') {
+      if (msg?.key && AVAILABILITY_CACHE_INVALIDATING_KEYS.has(msg.key)) {
        invalidateAvailabilityCache()
      }
    })
@@ -349,7 +358,7 @@ export const mirrorHeartbeatsToPostgres = async () => {
 */
 const computeAvailabilityFromValkey = async () => {
  const { max_customers_per_mitra } = await getMaxCustomersPerMitra()
-  const { stale_after_seconds } = await getMitraPingConfig()
+  const { require_ping, stale_after_seconds } = await getMitraPingConfig()

  const candidates = await valkey.sdiff(VK_MITRAS_ONLINE, VK_MITRAS_DEACTIVATED)
  if (!candidates.length) return { available: false, count: 0 }
@@ -357,17 +366,26 @@ const computeAvailabilityFromValkey = async () => {
  const pipe = valkey.pipeline()
  for (const id of candidates) {
    pipe.get(vkCapacityKey(id))
-    pipe.get(vkHeartbeatKey(id))
+    if (require_ping) pipe.get(vkHeartbeatKey(id))
  }
  const results = await pipe.exec()
+  const stride = require_ping ? 2 : 1

  const cutoff = Date.now() - stale_after_seconds * 1000
  let count = 0
  for (let i = 0; i < candidates.length; i++) {
-    const capacity = Number(results[i * 2][1] ?? 0)
-    const heartbeat = results[i * 2 + 1][1]
+    const capacity = Number(results[i * stride][1] ?? 0)
    if (capacity >= max_customers_per_mitra) continue
-    if (!heartbeat || Date.parse(heartbeat) < cutoff) continue
+    // When the operator has turned `require_mitra_ping` off, the auto-offline
+    // sweep is also a no-op (see autoOfflineStaleMitras early-return). Mitras
+    // stay in `mitras:online` until they explicitly toggle offline, so reading
+    // a stale heartbeat here doesn't mean "unreachable" — it means "we aren't
+    // tracking liveness." Skip the freshness gate to stay consistent with the
+    // sweep, and to match what the Postgres fallback returns (is_online only).
+    if (require_ping) {
+      const heartbeat = results[i * stride + 1][1]
+      if (!heartbeat || Date.parse(heartbeat) < cutoff) continue
+    }
    count++
  }
  return { available: count > 0, count }
@@ -409,14 +427,19 @@ export const countAvailableMitrasFromCache = async () => {
 * Falls back to a Postgres `is_online` read if Valkey is unreachable; the
 * fallback skips the heartbeat-freshness check (sweep takes care of stale rows
 * within `stale_after_seconds + sweep_cadence`).
+ *
+ * When `require_mitra_ping=false`, both the auto-offline sweep AND this check
+ * skip the heartbeat gate so the read path matches the sweep's contract: a
+ * mitra stays "reachable" until they explicitly toggle offline.
 */
 export const isMitraReachable = async (mitraId) => {
  try {
    const inSet = await valkey.sismember(VK_MITRAS_ONLINE, mitraId)
    if (!inSet) return false
+    const { require_ping, stale_after_seconds } = await getMitraPingConfig()
+    if (!require_ping) return true
    const heartbeat = await valkey.get(vkHeartbeatKey(mitraId))
    if (!heartbeat) return false
-    const { stale_after_seconds } = await getMitraPingConfig()
    return Date.parse(heartbeat) >= Date.now() - stale_after_seconds * 1000
  } catch (err) {
    console.warn('[isMitraReachable] valkey unavailable, falling back to DB:', err.message)