halobestie-clone/backend/test/helpers/db.js
Ramadhan Sjamsani 553dbac52f Phase 6: Valkey availability mirror — move read path off Postgres
Mitra-availability state (online flag, deactivated flag, per-mitra session
count, heartbeat liveness) mirrored into Valkey so the customer beacon
+ pairing blast + dashboard counts no longer hit Postgres on the hot path.
Postgres remains the durable source of truth; Valkey state is fully
derivable via seedFromPostgres on startup + reconnect.

Schema
- mitras:online           SET    — mirror of is_online
- mitras:deactivated      SET    — mirror of is_active=false
- mitra:capacity:<id>     STRING — active+pending_payment session count
- mitra:heartbeat:<id>    STRING — ISO timestamp of last ping
- availability:snapshot   JSON   — beacon cache, TTL 10s, cluster-shared
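
A minimal sketch of how every per-mitra key in this schema can be derived from Postgres alone, which is what makes a reseed safe. It assumes mitra rows shaped `{ id, is_online, is_active, last_heartbeat_at }` and session rows shaped `{ mitra_id, status }`. `deriveMirrorState` is a hypothetical name, and it returns in-memory sets and strings instead of writing to Valkey; it is not the project's seedFromPostgres.

```javascript
// Key names from the schema above.
const ONLINE_KEY = 'mitras:online'
const DEACTIVATED_KEY = 'mitras:deactivated'
const capacityKey = (id) => `mitra:capacity:${id}`
const heartbeatKey = (id) => `mitra:heartbeat:${id}`

// Session statuses that occupy capacity (active + pending_payment).
const CAPACITY_STATUSES = new Set(['active', 'pending_payment'])

// Hypothetical helper: derive the mirror state from Postgres-shaped rows.
// Returns in-memory sets/strings instead of issuing Valkey commands.
function deriveMirrorState(mitraRows, sessionRows) {
  const sets = { [ONLINE_KEY]: new Set(), [DEACTIVATED_KEY]: new Set() }
  const strings = new Map()

  for (const m of mitraRows) {
    if (m.is_online) sets[ONLINE_KEY].add(m.id)
    if (m.is_active === false) sets[DEACTIVATED_KEY].add(m.id)
    // Every known mitra starts at 0 so "no sessions" is explicit, not a missing key.
    strings.set(capacityKey(m.id), 0)
    // Assumption: only online mitras get a heartbeat key.
    if (m.is_online && m.last_heartbeat_at) {
      strings.set(heartbeatKey(m.id), new Date(m.last_heartbeat_at).toISOString())
    }
  }

  for (const s of sessionRows) {
    const key = capacityKey(s.mitra_id)
    // Sessions whose mitra isn't in mitraRows are ignored.
    if (CAPACITY_STATUSES.has(s.status) && strings.has(key)) {
      strings.set(key, strings.get(key) + 1)
    }
  }
  return { sets, strings }
}

const state = deriveMirrorState(
  [
    { id: 'a', is_online: true, is_active: true, last_heartbeat_at: '2026-05-25T10:00:00Z' },
    { id: 'b', is_online: false, is_active: false, last_heartbeat_at: null },
  ],
  [
    { mitra_id: 'a', status: 'active' },
    { mitra_id: 'a', status: 'ended' },
    { mitra_id: 'a', status: 'pending_payment' },
  ],
)
console.log([...state.sets[ONLINE_KEY]]) // [ 'a' ]
console.log(state.strings.get(capacityKey('a'))) // 2
```

Seeding heartbeats only for online mitras, and skipping sessions for unknown mitras, are choices made for this sketch; the real seed may differ.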

Write paths (Postgres first, best-effort Valkey)
- setOnline/setOffline mirror SADD/SREM + heartbeat SET/DEL
- updateMitraStatus mirrors mitras:deactivated AND revokes auth_sessions
  on deactivate (bounds the "ghost online" window to access-token TTL)
- heartbeat is Valkey-only on the hot path; the per-ping Postgres UPDATE
  on last_heartbeat_at is eliminated (was 1,200 ops/min at prod scale)
- chat_session lifecycle (accept/end/reroute/extension/expiry) calls
  recomputeCapacityForMitra after each UPDATE — derive-from-truth avoids
  the bookkeeping risk of per-transition INCR/DECR
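
The write-path ordering can be sketched as follows. Everything here is an assumption for illustration: `pg.query(text, params)` resolving `{ rows }`, a `valkey` client with ioredis-style lowercase `sadd`/`set`, and the table and column names in the SQL. The two points it shows come from the list above: Postgres commits before Valkey is touched, and capacity is recounted from chat_sessions instead of being adjusted with INCR/DECR.

```javascript
// Hypothetical clients: pg.query(text, params) resolves { rows }; valkey has
// ioredis-style sadd/set methods. Table/column names in the SQL are assumed.
async function setOnline(mitraId, { pg, valkey, log, now = () => new Date() }) {
  // 1. Durable write first. If it throws, Valkey is never touched.
  await pg.query('UPDATE mitras SET is_online = true WHERE id = $1', [mitraId])
  // 2. Best-effort mirror. Errors are logged and swallowed; the periodic
  //    reseed sweep heals whatever drift this leaves behind.
  try {
    await valkey.sadd('mitras:online', mitraId)
    await valkey.set(`mitra:heartbeat:${mitraId}`, now().toISOString())
  } catch (err) {
    log(`valkey mirror failed for ${mitraId}: ${err.message}`)
  }
}

// Derive-from-truth: recount open sessions after every lifecycle UPDATE instead
// of INCR/DECR per transition, so a missed or doubled transition can't leave
// the counter permanently wrong.
async function recomputeCapacityForMitra(mitraId, { pg, valkey, log }) {
  const { rows } = await pg.query(
    `SELECT COUNT(*)::int AS n
       FROM chat_sessions
      WHERE mitra_id = $1 AND status IN ('active', 'pending_payment')`,
    [mitraId],
  )
  const n = rows[0].n
  try {
    await valkey.set(`mitra:capacity:${mitraId}`, String(n))
  } catch (err) {
    log(`valkey capacity mirror failed for ${mitraId}: ${err.message}`)
  }
  return n
}
```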

Read paths (Valkey-first, Postgres fallback on Valkey error)
- isMitraReachable: SISMEMBER mitras:online + heartbeat freshness
- findAvailableMitras: SDIFF + pipelined GETs, filter by capacity + heartbeat
- countAvailableMitrasFromCache: Valkey-driven, cached cluster-wide 10s TTL
- dashboard online count: SCARD
- Each reader wraps Valkey ops in try/catch → Postgres fallback on outage
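
A sketch of the findAvailableMitras filter, written against a hypothetical client exposing `sdiff` and `get`. `maxPerMitra` would presumably come from the `max_customers_per_mitra` config value, and `staleMs` (the heartbeat freshness window) is an assumed parameter because the commit doesn't state the threshold. The Postgres fallback from the last bullet is omitted to keep the filter logic visible.

```javascript
// Hypothetical client with ioredis-style sdiff/get. A real implementation would
// send the GETs as one pipeline (or MGET); Promise.all keeps the sketch small.
async function findAvailableMitras(valkey, { maxPerMitra, staleMs, now = Date.now() }) {
  // Online minus deactivated, in a single server-side command.
  const candidates = await valkey.sdiff('mitras:online', 'mitras:deactivated')
  const values = await Promise.all(
    candidates.flatMap((id) => [
      valkey.get(`mitra:capacity:${id}`),
      valkey.get(`mitra:heartbeat:${id}`),
    ]),
  )
  return candidates.filter((id, i) => {
    const used = Number(values[2 * i] ?? 0) // assumption: missing key = no sessions
    const lastPing = values[2 * i + 1]
    if (lastPing == null) return false // no heartbeat recorded: treat as unreachable
    const fresh = now - Date.parse(lastPing) <= staleMs
    return fresh && used < maxPerMitra
  })
}
```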

Heartbeat path on /api/mitra/status/heartbeat
- resolveMitra preHandler replaced with heartbeatGuard: SISMEMBER on
  mitras:deactivated (~0 DB hits per ping). Falls back to full DB
  resolveMitra if Valkey is unreachable so a Valkey outage doesn't
  silently accept heartbeats from deactivated mitras.
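
A framework-agnostic sketch of the heartbeatGuard decision. `valkey.sismember` (ioredis-style, resolving 1 or 0) and `resolveMitraFromDb` are hypothetical stand-ins; the real guard is a route preHandler and presumably rejects the request rather than returning a flag.

```javascript
// Hypothetical inputs: valkey.sismember resolves 1 or 0 (ioredis-style), and
// resolveMitraFromDb(id) resolves { is_active } or null. Assumes the request
// was already authenticated, so only deactivation is checked here.
async function heartbeatGuard(mitraId, { valkey, resolveMitraFromDb, log }) {
  try {
    const isDeactivated = (await valkey.sismember('mitras:deactivated', mitraId)) === 1
    return { allowed: !isDeactivated, source: 'valkey' }
  } catch (err) {
    // Valkey unreachable: pay for one DB lookup rather than silently accept
    // heartbeats from a mitra whose deactivation we can no longer see.
    log(`heartbeatGuard: Valkey error (${err.message}), checking Postgres`)
    const mitra = await resolveMitraFromDb(mitraId)
    return { allowed: Boolean(mitra && mitra.is_active), source: 'postgres' }
  }
}
```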

Three sweeps, env-configurable cadences
- MITRA_AUTO_OFFLINE_SWEEP_SECONDS (30) — Valkey-driven stale detection
- HEARTBEAT_MIRROR_INTERVAL_SECONDS (60) — batched UPSERT writes
  Valkey timestamps to Postgres last_heartbeat_at via UNNEST (1 statement
  per cycle, idempotent across instances)
- VALKEY_ONLINE_MIRROR_SWEEP_SECONDS (300) — periodic reseed heals drift
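
Stale detection with the abort-on-error rule might look like this. `readSeconds`, `autoOfflineSweep`, `markOffline` and `staleMs` are hypothetical; only the env var name and its default of 30 come from the list above. The sweep computes the stale set entirely from Valkey and returns early, offlining nobody, if any Valkey call fails.

```javascript
// Hypothetical helper: read a positive integer number of seconds from the env,
// falling back to the documented default when unset or invalid.
function readSeconds(env, name, fallback) {
  const n = Number.parseInt(env[name] ?? '', 10)
  return Number.isFinite(n) && n > 0 ? n : fallback
}

// Sketch of stale detection. `valkey` (ioredis-style smembers/get), `markOffline`
// (assumed to run the Postgres-first setOffline path) and `staleMs` are assumptions.
async function autoOfflineSweep({ valkey, markOffline, staleMs, now = Date.now() }) {
  let stale
  try {
    const online = await valkey.smembers('mitras:online')
    const pings = await Promise.all(online.map((id) => valkey.get(`mitra:heartbeat:${id}`)))
    // A missing heartbeat key counts as stale in this sketch.
    stale = online.filter((id, i) => pings[i] == null || now - Date.parse(pings[i]) > staleMs)
  } catch {
    // Abort the whole cycle: a Valkey hiccup must not read as "every heartbeat
    // is stale" and mass-offline everyone.
    return { aborted: true, offlined: [] }
  }
  for (const id of stale) await markOffline(id)
  return { aborted: false, offlined: stale }
}

const sweepSeconds = readSeconds(process.env, 'MITRA_AUTO_OFFLINE_SWEEP_SECONDS', 30)
console.log(`auto-offline sweep every ${sweepSeconds}s`)
```

Scheduling (for example `setInterval` at `sweepSeconds * 1000`) is left out so the sketch doesn't keep a process alive.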

Startup
- restoreActiveTimers → seedFromPostgres → bind listeners
- onValkeyReady re-runs the seed on every reconnect (cold start + reseed
  on Valkey restart, no manual intervention)
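
A sketch of the reconnect reseed, assuming a client with an EventEmitter-style `on(event, fn)`. Clients such as ioredis typically emit `'ready'` once a connection is usable, including after a reconnect. The `onValkeyReady` and `seedFromPostgres` signatures here are illustrative, not the project's. One design choice is added: reconnects that arrive while a seed is running are coalesced into a single follow-up seed, so a restart mid-seed still ends with a complete mirror.

```javascript
// `client` is anything with an EventEmitter-style on(event, fn).
// seedFromPostgres and log are passed in; their real signatures are assumed.
function onValkeyReady(client, seedFromPostgres, log = console.error) {
  let running = false
  let pending = false

  const run = async () => {
    running = true
    do {
      pending = false
      try {
        await seedFromPostgres()
      } catch (err) {
        // Log and continue; the periodic reconciliation sweep retries later.
        log(`reseed failed: ${err.message}`)
      }
    } while (pending) // a reconnect arrived mid-seed: seed once more
    running = false
  }

  client.on('ready', () => {
    if (running) {
      pending = true // coalesce any number of mid-seed reconnects into one rerun
      return
    }
    run()
  })
}
```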

Failure semantics
- Read fallback: every Valkey read wrapped, falls back to existing
  Postgres JOIN query — system stays correct during Valkey outage,
  performance degrades not breaks
- Write best-effort: Postgres write commits before Valkey is touched;
  Valkey errors log + continue; reconciliation sweep heals drift
- Auto-offline sweep aborts entirely on Valkey error (does NOT mass-
  offline via Postgres scan during Valkey hiccup)
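
The read-fallback rule reduces to a small wrapper. `readWithFallback` is a hypothetical helper name; what it illustrates is that Valkey errors are caught and logged while Postgres errors propagate, so an outage of both stores still surfaces to the caller.

```javascript
// Hypothetical helper: try the Valkey reader; on any error, log and run the
// Postgres reader. Postgres errors are NOT caught here.
async function readWithFallback(label, fromValkey, fromPostgres, log = console.warn) {
  try {
    // `return await` (not plain `return`) so a rejected promise lands in catch.
    return await fromValkey()
  } catch (err) {
    log(`${label}: Valkey read failed (${err.message}); falling back to Postgres`)
    return fromPostgres()
  }
}
```

For instance, the dashboard count could be `readWithFallback('onlineCount', () => valkey.scard('mitras:online'), countOnlineFromPostgres)`, where `countOnlineFromPostgres` stands in for the existing Postgres query.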

Tests
- New: 32 integration tests in mitra-status.valkey-mirror.test.js
  covering seed, write-through, fallbacks, capacity lifecycle,
  auto-offline sweep, heartbeat mirror, deactivation flow, beacon cache
- Updated: fixtures.js seeds Valkey alongside Postgres when isOnline=true
- Updated: helpers/db.js resetDb also flushes test Valkey
- Fixed 2 pre-existing session-timer flakes (string IDs failed uuid
  parse; vi.advanceTimersByTimeAsync raced real Postgres I/O)
- All 124/124 backend tests pass (was 90/92)

Docs
- requirement/valkey-online-mirror-plan.md — canonical plan
- requirement/valkey-online-mirror-testing.md — manual E2E checklist
- requirement/deployment.md — infra + Valkey persistence guidance for
  prod (Memorystore Standard tier recommended; migration from
  self-hosted Valkey is zero-downtime via reseed-from-Postgres)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 18:07:55 +08:00


import { getDb } from '../../src/db/client.js'
import { flushTestDb } from './valkey.js'

/**
 * Single shared sql client used by tests. Same singleton the services use, since
 * setup.js has already rewritten DATABASE_URL to point at the test schema.
 */
export const db = () => getDb()

/**
 * Truncate Phase 3.7-relevant tables between tests.
 *
 * These tables are FK-linked (pairing_failures → payment_requests;
 * chat_request_notifications → chat_sessions; customer_transactions → chat_sessions;
 * etc.). Truncating them in a single statement means their order in the list doesn't
 * matter, and CASCADE also clears any table that later gains an FK to one of them, so
 * we don't have to maintain a topological order when tables get added.
 *
 * We deliberately do NOT truncate roles / control_center_users / mitras / customers.
 * Those are seeded once per test file by fixtures, and re-truncating them would
 * force every test to re-create users (slow + noisy).
 */
const TRUNCATE_TABLES = [
  'pairing_failures',
  'payment_requests',
  'chat_request_notifications',
  'session_extensions',
  'session_closures',
  'session_sensitivity_log',
  'chat_messages',
  'customer_transactions',
  'chat_sessions',
  'auth_sessions',
  'otp_requests',
  'mitra_online_logs',
  'mitra_online_status',
]

export const resetDb = async () => {
  const sql = db()
  // RESTART IDENTITY is a no-op for UUID PKs but cheap; CASCADE handles any future FK additions.
  await sql.unsafe(`TRUNCATE TABLE ${TRUNCATE_TABLES.join(', ')} RESTART IDENTITY CASCADE`)
  // Flush Valkey availability state so each test starts hermetic. Fixtures
  // (createMitra etc.) re-seed Valkey alongside their Postgres writes.
  await flushTestDb()
}

/**
 * Wipe the slow-changing tables too. Call sparingly (a single test that needs to
 * verify "no users" semantics, or in afterAll teardown).
 */
export const resetDbHard = async () => {
  const sql = db()
  await sql.unsafe(
    `TRUNCATE TABLE ${TRUNCATE_TABLES.join(', ')}, mitras, customers, control_center_users, roles RESTART IDENTITY CASCADE`
  )
}

/**
 * Restore the configurable app_config rows to their canonical defaults. Rows are
 * upserted, so test-mutated values are overwritten. Tests that mutate config (e.g.
 * flipping pricing_promotions.enabled) call this in afterEach.
 *
 * Note: the first-session discount config no longer lives in app_config (Stage 5
 * deleted those legacy keys). It now lives in the `pricing_promotions` table, which
 * is also reset here back to the seed defaults that match migrate.js + the
 * DEFAULT_DISCOUNT in pricing.service.js.
 */
export const resetAppConfig = async () => {
  const sql = db()
  // Restore the same defaults the migration sets. Using ON CONFLICT … DO UPDATE so a
  // test-mutated row gets clobbered back, not just left alone.
  const defaults = [
    ['anonymity', { enabled: false }],
    ['max_customers_per_mitra', { value: 3 }],
    ['extension_timeout_seconds', { value: 60 }],
    ['early_end_mitra_enabled', { value: false }],
    ['early_end_customer_enabled', { value: false }],
    ['payment_request_timeout_minutes', { value: 20 }],
    ['returning_chat_confirmation_timeout_seconds', { value: 20 }],
    ['extension_default_action_on_timeout', { value: 'auto_approve' }],
    ['pairing_blast_timeout_seconds', { value: 60 }],
    ['three_minute_warning_enabled', { value: true }],
  ]
  for (const [key, value] of defaults) {
    await sql`
      INSERT INTO app_config (key, value, updated_at)
      VALUES (${key}, ${sql.json(value)}, NOW())
      ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value, updated_at = NOW()
    `
  }

  // Reset pricing_promotions to canonical Phase 4 defaults. The Stage 1 backfill
  // gates on "table empty", so we can't rely on migrate.js to restore values after
  // a test mutates them; this UPDATE is the test-side reset hook.
  await sql`
    UPDATE pricing_promotions
    SET enabled = true,
        actual_price_idr = 2000,
        gimmick_price_idr = 12000,
        duration_minutes = 12,
        modes = ${['chat']},
        updated_at = NOW()
    WHERE eligibility = 'first_session'
  `
}
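
`resetDb` and `resetDbHard` interpolate table names into `sql.unsafe` because identifiers can't be bound as query parameters. A hypothetical helper like the one below would build both statements in one place and refuse anything that isn't a plain lowercase identifier; the allow-list pattern is an assumption that happens to match every table name currently in the list.

```javascript
// Hypothetical guard: plain lowercase SQL identifiers only.
const IDENT = /^[a-z_][a-z0-9_]*$/

function buildTruncateSql(tables) {
  if (tables.length === 0) throw new Error('no tables to truncate')
  for (const t of tables) {
    if (!IDENT.test(t)) throw new Error(`refusing to interpolate identifier: ${t}`)
  }
  // One statement truncates every listed table together, so list order doesn't
  // matter; CASCADE also truncates tables holding FKs into any listed table.
  return `TRUNCATE TABLE ${tables.join(', ')} RESTART IDENTITY CASCADE`
}

// Same shape of statement resetDbHard builds (list shortened for the example).
console.log(buildTruncateSql(['chat_sessions', 'mitras']))
// TRUNCATE TABLE chat_sessions, mitras RESTART IDENTITY CASCADE
```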