Mitra availability: read paths respect require_mitra_ping=false

When the operator sets require_mitra_ping=false, the auto-offline sweep
early-returns by design ("don't gate online status on heartbeat
freshness"). The three Valkey read paths still gated on heartbeat
freshness anyway, which trapped the system: the sweep never removed a
stale mitra from mitras:online, but the readers rejected it as stale.
The customer CTA stayed permanently disabled with no recovery.

Fix all three to skip the heartbeat-freshness check when
require_mitra_ping is off (require_ping in getMitraPingConfig),
matching the sweep's contract:
- computeAvailabilityFromValkey (customer beacon)
- isMitraReachable (extension service)
- findAvailableMitrasFromValkey (pairing candidate finder)
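
The rule shared by the three read paths can be sketched as one small
predicate. This is only an illustration: `passesHeartbeatGate` is a
hypothetical helper name, and the commit inlines the equivalent check in
each function rather than extracting it.

```javascript
// Hypothetical helper (not in the commit): decides whether a mitra that is
// already in mitras:online should be treated as reachable.
const passesHeartbeatGate = (heartbeat, { require_ping, stale_after_seconds }, now = Date.now()) => {
  // With require_ping off, liveness is not tracked, so set membership decides.
  if (!require_ping) return true
  if (!heartbeat) return false
  return Date.parse(heartbeat) >= now - stale_after_seconds * 1000
}

const now = Date.parse('2026-05-25T12:00:00Z')
const staleBeat = '2026-05-25T11:00:00Z' // one hour old

// Gate on: the stale heartbeat excludes the mitra.
console.log(passesHeartbeatGate(staleBeat, { require_ping: true, stale_after_seconds: 90 }, now)) // false
// Gate off: the same heartbeat no longer matters.
console.log(passesHeartbeatGate(staleBeat, { require_ping: false, stale_after_seconds: 90 }, now)) // true
```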

The Postgres fallbacks already did the right thing (is_online only,
no heartbeat compare); this aligns the Valkey hot path.
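
On the hot path, the beacon count reads a flat pipeline result, so
skipping the heartbeat GET changes how many entries each candidate
occupies. A minimal sketch of that stride indexing follows, written as a
pure function for illustration. `countAvailable` is a hypothetical name,
and the `[error, value]` result shape is an assumption based on the
`[1]` indexing in the diff.

```javascript
// Hypothetical pure version of the counting loop. `results` mimics what
// pipe.exec() resolves to: one [error, value] pair per queued command.
const countAvailable = (candidates, results, cfg, now) => {
  const { require_ping, stale_after_seconds, max_customers_per_mitra } = cfg
  // Two entries per candidate (capacity, heartbeat) when the gate is on,
  // one entry (capacity only) when it is off.
  const stride = require_ping ? 2 : 1
  const cutoff = now - stale_after_seconds * 1000
  let count = 0
  for (let i = 0; i < candidates.length; i++) {
    const capacity = Number(results[i * stride][1] ?? 0)
    if (capacity >= max_customers_per_mitra) continue
    if (require_ping) {
      const heartbeat = results[i * stride + 1][1]
      if (!heartbeat || Date.parse(heartbeat) < cutoff) continue
    }
    count++
  }
  return count
}

const at = Date.parse('2026-05-25T12:00:00Z')
const freshBeat = '2026-05-25T11:59:30Z'
const base = { stale_after_seconds: 90, max_customers_per_mitra: 3 }

// Gate on: m1 passes, m2 is at capacity, m3 has no heartbeat.
const gated = [[null, '1'], [null, freshBeat], [null, '3'], [null, freshBeat], [null, '0'], [null, null]]
console.log(countAvailable(['m1', 'm2', 'm3'], gated, { ...base, require_ping: true }, at)) // 1

// Gate off: only capacity is read, so m3 now counts as well.
const ungated = [[null, '1'], [null, '3'], [null, null]]
console.log(countAvailable(['m1', 'm2', 'm3'], ungated, { ...base, require_ping: false }, at)) // 2
```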

Also: PATCH /internal/config/mitra-ping now publishes config:invalidate
for require_mitra_ping and mitra_stale_after_seconds, and the subscriber
in mitra-status.service was widened to listen for both. Flipping the
toggle in CC now busts the 10s availability snapshot immediately instead
of waiting out the TTL.
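
The widened subscriber can be sketched with an in-process stand-in for
the pub/sub channel. The `publish` and `subscribe` functions below are
simplified assumptions, as is the `{ key }` message shape and the
`snapshot` variable; only the key set and the channel name come from
this commit.

```javascript
// In-process stand-in for the pub/sub layer (assumption: the real
// subscribe helper delivers parsed messages shaped like { key }).
const handlers = []
const subscribe = (channel, handler) => handlers.push({ channel, handler })
const publish = (channel, msg) => {
  for (const h of handlers) if (h.channel === channel) h.handler(msg)
}

const AVAILABILITY_CACHE_INVALIDATING_KEYS = new Set([
  'max_customers_per_mitra',
  'require_mitra_ping',
  'mitra_stale_after_seconds',
])

// Hypothetical cached snapshot; null means "recompute on next read".
let snapshot = { available: false, count: 0 }
const invalidateAvailabilityCache = () => { snapshot = null }

subscribe('config:invalidate', (msg) => {
  if (msg?.key && AVAILABILITY_CACHE_INVALIDATING_KEYS.has(msg.key)) {
    invalidateAvailabilityCache()
  }
})

publish('config:invalidate', { key: 'some_unrelated_key' })
console.log(snapshot !== null) // true: unrelated keys leave the snapshot alone

publish('config:invalidate', { key: 'require_mitra_ping' })
console.log(snapshot === null) // true: the toggle busts the snapshot at once
```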

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 22:09:41 +08:00
parent 3052f7b799
commit d60c048776
4 changed files with 100 additions and 13 deletions


@@ -74,14 +74,23 @@ export const invalidateAvailabilityCache = async () => {
   }
 }
-// Bust the shared cache when CC changes max_customers_per_mitra (any instance).
+// Bust the shared cache when CC changes any config that the beacon snapshots
+// over: max_customers_per_mitra (capacity gate), require_mitra_ping (whether
+// stale heartbeats exclude candidates), mitra_stale_after_seconds (the gate's
+// threshold itself).
+const AVAILABILITY_CACHE_INVALIDATING_KEYS = new Set([
+  'max_customers_per_mitra',
+  'require_mitra_ping',
+  'mitra_stale_after_seconds',
+])
 let _subscribed = false
 const ensureSubscribed = () => {
   if (_subscribed) return
   _subscribed = true
   try {
     subscribe('config:invalidate', (msg) => {
-      if (msg?.key === 'max_customers_per_mitra') {
+      if (msg?.key && AVAILABILITY_CACHE_INVALIDATING_KEYS.has(msg.key)) {
         invalidateAvailabilityCache()
       }
     })
@@ -349,7 +358,7 @@ export const mirrorHeartbeatsToPostgres = async () => {
  */
 const computeAvailabilityFromValkey = async () => {
   const { max_customers_per_mitra } = await getMaxCustomersPerMitra()
-  const { stale_after_seconds } = await getMitraPingConfig()
+  const { require_ping, stale_after_seconds } = await getMitraPingConfig()
   const candidates = await valkey.sdiff(VK_MITRAS_ONLINE, VK_MITRAS_DEACTIVATED)
   if (!candidates.length) return { available: false, count: 0 }
@@ -357,17 +366,26 @@ const computeAvailabilityFromValkey = async () => {
   const pipe = valkey.pipeline()
   for (const id of candidates) {
     pipe.get(vkCapacityKey(id))
-    pipe.get(vkHeartbeatKey(id))
+    if (require_ping) pipe.get(vkHeartbeatKey(id))
   }
   const results = await pipe.exec()
+  const stride = require_ping ? 2 : 1
   const cutoff = Date.now() - stale_after_seconds * 1000
   let count = 0
   for (let i = 0; i < candidates.length; i++) {
-    const capacity = Number(results[i * 2][1] ?? 0)
-    const heartbeat = results[i * 2 + 1][1]
+    const capacity = Number(results[i * stride][1] ?? 0)
     if (capacity >= max_customers_per_mitra) continue
-    if (!heartbeat || Date.parse(heartbeat) < cutoff) continue
+    // When the operator has turned `require_mitra_ping` off, the auto-offline
+    // sweep is also a no-op (see autoOfflineStaleMitras early-return). Mitras
+    // stay in `mitras:online` until they explicitly toggle offline, so reading
+    // a stale heartbeat here doesn't mean "unreachable" — it means "we aren't
+    // tracking liveness." Skip the freshness gate to stay consistent with the
+    // sweep, and to match what the Postgres fallback returns (is_online only).
+    if (require_ping) {
+      const heartbeat = results[i * stride + 1][1]
+      if (!heartbeat || Date.parse(heartbeat) < cutoff) continue
+    }
     count++
   }
   return { available: count > 0, count }
@@ -409,14 +427,19 @@ export const countAvailableMitrasFromCache = async () => {
  * Falls back to a Postgres `is_online` read if Valkey is unreachable; the
  * fallback skips the heartbeat-freshness check (sweep takes care of stale rows
  * within `stale_after_seconds + sweep_cadence`).
+ *
+ * When `require_mitra_ping=false`, both the auto-offline sweep AND this check
+ * skip the heartbeat gate so the read path matches the sweep's contract: a
+ * mitra stays "reachable" until they explicitly toggle offline.
  */
 export const isMitraReachable = async (mitraId) => {
   try {
     const inSet = await valkey.sismember(VK_MITRAS_ONLINE, mitraId)
     if (!inSet) return false
+    const { require_ping, stale_after_seconds } = await getMitraPingConfig()
+    if (!require_ping) return true
     const heartbeat = await valkey.get(vkHeartbeatKey(mitraId))
     if (!heartbeat) return false
-    const { stale_after_seconds } = await getMitraPingConfig()
     return Date.parse(heartbeat) >= Date.now() - stale_after_seconds * 1000
   } catch (err) {
     console.warn('[isMitraReachable] valkey unavailable, falling back to DB:', err.message)