halobestie-clone/backend/src/plugins/valkey.js
Ramadhan Sjamsani 553dbac52f Phase 6: Valkey availability mirror — move read path off Postgres
Mitra-availability state (online flag, deactivated flag, per-mitra session
count, heartbeat liveness) mirrored into Valkey so the customer beacon
+ pairing blast + dashboard counts no longer hit Postgres on the hot path.
Postgres remains the durable source of truth; Valkey state is fully
derivable via seedFromPostgres on startup + reconnect.

Schema
- mitras:online           SET    — mirror of is_online
- mitras:deactivated      SET    — mirror of is_active=false
- mitra:capacity:<id>     STRING — active+pending_payment session count
- mitra:heartbeat:<id>    STRING — ISO timestamp of last ping
- availability:snapshot   JSON   — beacon cache, TTL 10s, cluster-shared
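
The key layout above could be captured in a small helper module. This is a minimal sketch, not the repository's code: `KEYS`, `isHeartbeatFresh`, and the 90-second staleness window are all assumptions made for illustration.

```javascript
// Hypothetical key builders mirroring the schema listed above.
const KEYS = {
  online: 'mitras:online',
  deactivated: 'mitras:deactivated',
  capacity: (id) => `mitra:capacity:${id}`,
  heartbeat: (id) => `mitra:heartbeat:${id}`,
  snapshot: 'availability:snapshot',
}

// Assumed threshold; the commit message does not state the real staleness window.
const ASSUMED_MAX_HEARTBEAT_AGE_MS = 90_000

// True when the stored ISO timestamp is recent enough to count as alive.
const isHeartbeatFresh = (
  isoTimestamp,
  nowMs = Date.now(),
  maxAgeMs = ASSUMED_MAX_HEARTBEAT_AGE_MS
) => {
  if (!isoTimestamp) return false // key missing: never pinged or already deleted
  const pingedAt = Date.parse(isoTimestamp)
  if (Number.isNaN(pingedAt)) return false // unparseable value is treated as stale
  return nowMs - pingedAt <= maxAgeMs
}
```

Keeping the key names in one place avoids typos drifting between the write paths and the read paths.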

Write paths (Postgres first, best-effort Valkey)
- setOnline/setOffline mirror SADD/SREM + heartbeat SET/DEL
- updateMitraStatus mirrors mitras:deactivated AND revokes auth_sessions
  on deactivate (bounds the "ghost online" window to access-token TTL)
- heartbeat is Valkey-only on the hot path; the per-ping Postgres UPDATE
  on last_heartbeat_at is eliminated (was 1,200 ops/min at prod scale)
- chat_session lifecycle (accept/end/reroute/extension/expiry) calls
  recomputeCapacityForMitra after each UPDATE — derive-from-truth avoids
  the bookkeeping risk of per-transition INCR/DECR
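
The derive-from-truth idea for capacity can be sketched as follows. This is an illustration under stated assumptions, not the actual recomputeCapacityForMitra: the in-memory `sessions` array stands in for the Postgres COUNT query, and `store` is a hypothetical object exposing an async `set(key, value)`.

```javascript
// Statuses that occupy capacity, per the schema note on mitra:capacity:<id>.
const COUNTED_STATUSES = new Set(['active', 'pending_payment'])

// Hypothetical stand-in for the Postgres COUNT(*) query: derive the count from rows.
const countOpenSessions = (sessions, mitraId) =>
  sessions.filter((s) => s.mitraId === mitraId && COUNTED_STATUSES.has(s.status)).length

// Recompute and overwrite. Running it twice yields the same value (idempotent),
// whereas a missed or duplicated INCR/DECR would leave permanent drift.
const recomputeCapacity = async (sessions, mitraId, store) => {
  const count = countOpenSessions(sessions, mitraId)
  try {
    await store.set(`mitra:capacity:${mitraId}`, String(count))
  } catch (err) {
    // Best-effort: Postgres already holds the truth, so log and continue.
    console.error('[valkey] capacity mirror failed:', err.message)
  }
  return count
}
```

Because the value is always overwritten with a freshly derived count, a failed mirror write is healed by the next lifecycle transition or reseed.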

Read paths (Valkey-first, Postgres fallback on Valkey error)
- isMitraReachable: SISMEMBER mitras:online + heartbeat freshness
- findAvailableMitras: SDIFF + pipelined GETs, filter by capacity + heartbeat
- countAvailableMitrasFromCache: Valkey-driven, cached cluster-wide 10s TTL
- dashboard online count: SCARD
- Each reader wraps Valkey ops in try/catch → Postgres fallback on outage
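
The Valkey-first read pattern might look roughly like this. It is a sketch only: `readWithFallback` and `filterAvailable` are hypothetical names, the two reader callbacks are injected so the example runs without a server, and treating a missing capacity key as zero sessions is an assumption.

```javascript
// Hypothetical wrapper: run the Valkey-backed reader, fall back to Postgres on any error.
const readWithFallback = async (label, fromValkey, fromPostgres) => {
  try {
    return await fromValkey()
  } catch (err) {
    console.warn(`[valkey] ${label} failed, using Postgres fallback:`, err.message)
    return fromPostgres()
  }
}

// Pure filter step for an availability lookup: keep candidates that are under
// capacity and have a fresh heartbeat. maxSessions and maxAgeMs are assumed inputs.
const filterAvailable = (candidates, { maxSessions, maxAgeMs, nowMs }) =>
  candidates
    .filter(({ capacity, heartbeat }) => {
      const count = Number(capacity ?? 0) // assumption: missing key means zero sessions
      const pingedAt = Date.parse(heartbeat ?? '')
      return count < maxSessions && !Number.isNaN(pingedAt) && nowMs - pingedAt <= maxAgeMs
    })
    .map(({ id }) => id)
```

A caller would pass the SDIFF-plus-pipeline lookup as `fromValkey` and the existing JOIN query as `fromPostgres`, so both paths return the same shape.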

Heartbeat path on /api/mitra/status/heartbeat
- resolveMitra preHandler replaced with heartbeatGuard: SISMEMBER on
  mitras:deactivated (~0 DB hits per ping). Falls back to full DB
  resolveMitra if Valkey is unreachable so a Valkey outage doesn't
  silently accept heartbeats from deactivated mitras.
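
A guard of this shape could be sketched as below. The names are hypothetical: `isDeactivated(mitraId)` stands in for SISMEMBER on mitras:deactivated, and `fullDbGuard(mitraId)` stands in for the original resolveMitra check, assumed here to resolve to true when the mitra may send heartbeats.

```javascript
// Hypothetical guard factory; both dependencies are injected for illustration.
const makeHeartbeatGuard = ({ isDeactivated, fullDbGuard }) => async (mitraId) => {
  try {
    // Fast path: one set-membership check, no database hit.
    return !(await isDeactivated(mitraId))
  } catch (err) {
    // Outage path: never accept blindly, defer to the full database check.
    console.warn('[valkey] guard lookup failed, using DB guard:', err.message)
    return fullDbGuard(mitraId)
  }
}
```

The fallback matters because failing open on a Valkey error would let a deactivated mitra keep refreshing its heartbeat.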

Three sweeps, env-configurable cadences
- MITRA_AUTO_OFFLINE_SWEEP_SECONDS (30) — Valkey-driven stale detection
- HEARTBEAT_MIRROR_INTERVAL_SECONDS (60) — batched UPSERT writes
  Valkey timestamps to Postgres last_heartbeat_at via UNNEST (1 statement
  per cycle, idempotent across instances)
- VALKEY_ONLINE_MIRROR_SWEEP_SECONDS (300) — periodic reseed heals drift
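
One way the batched heartbeat mirror could build its single statement is sketched below. The real SQL is not shown in this commit, so treat everything here as assumed: the table name `mitras`, the uuid `id` column, and the choice of UPDATE ... FROM UNNEST instead of an INSERT ... ON CONFLICT upsert. Only the `last_heartbeat_at` column name comes from the message above. The returned `{ text, values }` object is the query-config shape accepted by node-postgres.

```javascript
// Build one parameterized statement for a batch of heartbeats.
// `heartbeats` maps mitra id -> ISO timestamp read from Valkey (or null).
const buildHeartbeatMirrorQuery = (heartbeats) => {
  const ids = []
  const timestamps = []
  for (const [id, iso] of Object.entries(heartbeats)) {
    if (!iso) continue // skip mitras with no heartbeat key
    ids.push(id)
    timestamps.push(iso)
  }
  if (ids.length === 0) return null // nothing to write this cycle
  return {
    // Two parallel arrays are unnested into (id, ts) rows and joined to the table.
    text: `UPDATE mitras AS m
              SET last_heartbeat_at = v.ts
             FROM UNNEST($1::uuid[], $2::timestamptz[]) AS v(id, ts)
            WHERE m.id = v.id`,
    values: [ids, timestamps],
  }
}
```

Writing the same timestamp twice leaves the row unchanged, which is why several instances running the same cycle do not conflict.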

Startup
- restoreActiveTimers → seedFromPostgres → bind listeners
- onValkeyReady re-runs the seed on every reconnect (cold start + reseed
  on Valkey restart, no manual intervention)

Failure semantics
- Read fallback: every Valkey read is wrapped and falls back to the existing
  Postgres JOIN query, so the system stays correct during a Valkey outage;
  performance degrades but nothing breaks
- Write best-effort: Postgres write commits before Valkey is touched;
  Valkey errors log + continue; reconciliation sweep heals drift
- Auto-offline sweep aborts entirely on Valkey error (does NOT mass-
  offline via Postgres scan during Valkey hiccup)
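
The abort-on-error rule for the auto-offline sweep can be illustrated with a small sketch. All names are hypothetical: `valkey` is any object exposing async `smembers(key)` and `get(key)`, `setOffline(id)` stands in for the Postgres-first offline write, and treating a missing heartbeat key as stale is an assumption.

```javascript
// Hypothetical sweep: collect stale mitras from Valkey, or abort if Valkey fails.
const runAutoOfflineSweep = async ({ valkey, setOffline, nowMs, maxAgeMs }) => {
  let stale
  try {
    const online = await valkey.smembers('mitras:online')
    const beats = await Promise.all(
      online.map((id) => valkey.get(`mitra:heartbeat:${id}`))
    )
    stale = online.filter((id, i) => {
      const pingedAt = Date.parse(beats[i] ?? '')
      return Number.isNaN(pingedAt) || nowMs - pingedAt > maxAgeMs
    })
  } catch (err) {
    // Abort: without Valkey we cannot tell stale from fresh, so offline nobody.
    console.warn('[sweep] Valkey unavailable, skipping cycle:', err.message)
    return { aborted: true, offlined: [] }
  }
  for (const id of stale) await setOffline(id)
  return { aborted: false, offlined: stale }
}
```

Reading everything before writing anything means a failure midway through the reads never produces a partial sweep.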

Tests
- New: 32 integration tests in mitra-status.valkey-mirror.test.js
  covering seed, write-through, fallbacks, capacity lifecycle,
  auto-offline sweep, heartbeat mirror, deactivation flow, beacon cache
- Updated: fixtures.js seeds Valkey alongside Postgres when isOnline=true
- Updated: helpers/db.js resetDb also flushes test Valkey
- Fixed 2 pre-existing session-timer flakes (string IDs failed uuid
  parse; vi.advanceTimersByTimeAsync raced real Postgres I/O)
- All 124/124 backend tests pass (was 90/92)

Docs
- requirement/valkey-online-mirror-plan.md — canonical plan
- requirement/valkey-online-mirror-testing.md — manual E2E checklist
- requirement/deployment.md — infra + Valkey persistence guidance for
  prod (Memorystore Standard tier recommended; migration from
  self-hosted Valkey is zero-downtime via reseed-from-Postgres)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 18:07:55 +08:00

import Redis from 'ioredis'
let pub
let sub
let client
// 'ready' listeners (registered before connect; fire on initial connect AND each
// reconnect). Used by services that need to reseed Valkey state from Postgres.
const readyListeners = new Set()

const attachReadyHandler = (instance) => {
  instance.on('ready', () => {
    for (const fn of readyListeners) {
      // Fire-and-forget; each listener owns its own error handling.
      Promise.resolve()
        .then(() => fn())
        .catch((err) => console.error('[valkey] ready listener failed:', err))
    }
  })
}

// Register a listener; the returned function unregisters it.
export const onValkeyReady = (fn) => {
  readyListeners.add(fn)
  return () => readyListeners.delete(fn)
}
export const getValkeyClient = () => {
  if (!client) {
    const url = process.env.VALKEY_URL || 'redis://localhost:6379'
    client = new Redis(url)
    attachReadyHandler(client)
  }
  return client
}

export const getValkeyPub = () => {
  if (!pub) {
    const url = process.env.VALKEY_URL || 'redis://localhost:6379'
    pub = new Redis(url)
  }
  return pub
}

export const getValkeySub = () => {
  if (!sub) {
    const url = process.env.VALKEY_URL || 'redis://localhost:6379'
    sub = new Redis(url)
  }
  return sub
}
export const publish = async (channel, data) => {
  const pubClient = getValkeyPub()
  const numReceivers = await pubClient.publish(channel, JSON.stringify(data))
  console.log(`[valkey] publish to ${channel}: ${numReceivers} receiver(s)`)
}
export const subscribe = (channel, callback) => {
  const subClient = getValkeySub()
  // subscribe() returns a promise; log a rejection instead of leaving it unhandled.
  subClient
    .subscribe(channel)
    .catch((err) => console.error(`[valkey] subscribe to ${channel} failed:`, err))
  console.log(`[valkey] subscribed to ${channel}`)
  const handler = (ch, message) => {
    if (ch === channel) {
      console.log(`[valkey] received on ${channel}`)
      callback(JSON.parse(message))
    }
  }
  subClient.on('message', handler)
  // The returned function unsubscribes and detaches this handler.
  return () => {
    subClient.unsubscribe(channel)
    subClient.removeListener('message', handler)
    console.log(`[valkey] unsubscribed from ${channel}`)
  }
}
// --- Thin wrappers used by the mitra-availability mirror ---
//
// Each wrapper uses the shared `client` (separate ioredis instance from pub/sub
// to keep subscribe state isolated). Callers in services/* wrap these in
// try/catch and fall back to Postgres on error — see the plan doc.
export const sadd = (key, ...members) => getValkeyClient().sadd(key, ...members)
export const srem = (key, ...members) => getValkeyClient().srem(key, ...members)
export const sismember = async (key, member) =>
  (await getValkeyClient().sismember(key, member)) === 1
export const smembers = (key) => getValkeyClient().smembers(key)
export const sdiff = (...keys) => getValkeyClient().sdiff(...keys)
export const scard = (key) => getValkeyClient().scard(key)
export const set = (key, value) => getValkeyClient().set(key, value)
export const get = (key) => getValkeyClient().get(key)
export const del = (...keys) => getValkeyClient().del(...keys)
export const incr = (key) => getValkeyClient().incr(key)
export const decr = (key) => getValkeyClient().decr(key)
export const exists = (key) => getValkeyClient().exists(key)
export const pipeline = () => getValkeyClient().pipeline()
export const multi = () => getValkeyClient().multi()