Files
halobestie-clone/backend/src/services/extension.service.js
Ramadhan Sjamsani 553dbac52f Phase 6: Valkey availability mirror — move read path off Postgres
Mitra-availability state (online flag, deactivated flag, per-mitra session
count, heartbeat liveness) mirrored into Valkey so the customer beacon
+ pairing blast + dashboard counts no longer hit Postgres on the hot path.
Postgres remains the durable source of truth; Valkey state is fully
derivable via seedFromPostgres on startup + reconnect.

Schema
- mitras:online           SET    — mirror of is_online
- mitras:deactivated      SET    — mirror of is_active=false
- mitra:capacity:<id>     STRING — active+pending_payment session count
- mitra💓<id>    STRING — ISO timestamp of last ping
- availability:snapshot   JSON   — beacon cache, TTL 10s, cluster-shared

Write paths (Postgres first, best-effort Valkey)
- setOnline/setOffline mirror SADD/SREM + heartbeat SET/DEL
- updateMitraStatus mirrors mitras:deactivated AND revokes auth_sessions
  on deactivate (bounds the "ghost online" window to access-token TTL)
- heartbeat is Valkey-only on the hot path; the per-ping Postgres UPDATE
  on last_heartbeat_at is eliminated (was 1,200 ops/min at prod scale)
- chat_session lifecycle (accept/end/reroute/extension/expiry) calls
  recomputeCapacityForMitra after each UPDATE — derive-from-truth avoids
  the bookkeeping risk of per-transition INCR/DECR

Read paths (Valkey-first, Postgres fallback on Valkey error)
- isMitraReachable: SISMEMBER mitras:online + heartbeat freshness
- findAvailableMitras: SDIFF + pipelined GETs, filter by capacity + heartbeat
- countAvailableMitrasFromCache: Valkey-driven, cached cluster-wide 10s TTL
- dashboard online count: SCARD
- Each reader wraps Valkey ops in try/catch → Postgres fallback on outage

Heartbeat path on /api/mitra/status/heartbeat
- resolveMitra preHandler replaced with heartbeatGuard: SISMEMBER on
  mitras:deactivated (~0 DB hits per ping). Falls back to full DB
  resolveMitra if Valkey is unreachable so a Valkey outage doesn't
  silently accept heartbeats from deactivated mitras.

Three sweeps, env-configurable cadences
- MITRA_AUTO_OFFLINE_SWEEP_SECONDS (30) — Valkey-driven stale detection
- HEARTBEAT_MIRROR_INTERVAL_SECONDS (60) — batched UPSERT writes
  Valkey timestamps to Postgres last_heartbeat_at via UNNEST (1 statement
  per cycle, idempotent across instances)
- VALKEY_ONLINE_MIRROR_SWEEP_SECONDS (300) — periodic reseed heals drift

Startup
- restoreActiveTimers → seedFromPostgres → bind listeners
- onValkeyReady re-runs the seed on every reconnect (cold start + reseed
  on Valkey restart, no manual intervention)

Failure semantics
- Read fallback: every Valkey read wrapped, falls back to existing
  Postgres JOIN query — system stays correct during Valkey outage,
  performance degrades not breaks
- Write best-effort: Postgres write commits before Valkey is touched;
  Valkey errors log + continue; reconciliation sweep heals drift
- Auto-offline sweep aborts entirely on Valkey error (does NOT mass-
  offline via Postgres scan during Valkey hiccup)

Tests
- New: 32 integration tests in mitra-status.valkey-mirror.test.js
  covering seed, write-through, fallbacks, capacity lifecycle,
  auto-offline sweep, heartbeat mirror, deactivation flow, beacon cache
- Updated: fixtures.js seeds Valkey alongside Postgres when isOnline=true
- Updated: helpers/db.js resetDb also flushes test Valkey
- Fixed 2 pre-existing session-timer flakes (string IDs failed uuid
  parse; vi.advanceTimersByTimeAsync raced real Postgres I/O)
- All 124/124 backend tests pass (was 90/92)

Docs
- requirement/valkey-online-mirror-plan.md — canonical plan
- requirement/valkey-online-mirror-testing.md — manual E2E checklist
- requirement/deployment.md — infra + Valkey persistence guidance for
  prod (Memorystore Standard tier recommended; migration from
  self-hosted Valkey is zero-downtime via reseed-from-Postgres)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 18:07:55 +08:00

353 lines
14 KiB
JavaScript

import { getDb } from '../db/client.js'
import { sendToSessionParticipant, isUserOnlineWs } from '../plugins/websocket.js'
import { extendSessionTimer, clearClosureGraceTimer, startClosureGraceTimer } from './session-timer.service.js'
import { isMitraReachable, recomputeCapacityBySession } from './mitra-status.service.js'
import { consumePaymentSession, failPaymentSession, getPaymentSession } from './payment.service.js'
import { sendPushNotification } from './notification.service.js'
import {
getExtensionTimeoutConfig,
getExtensionDefaultActionOnTimeout,
} from './config.service.js'
import {
UserType,
SessionStatus,
ExtensionStatus,
TransactionType,
WsMessage,
PaymentRequestStatus,
ExtensionTimeoutAction,
PairingFailureCause,
} from '../constants.js'
const sql = getDb()
// Extension timeout map: extensionId → timeoutId
const extensionTimeouts = new Map()
const getExtensionTimeoutMs = async () => {
const { extension_timeout_seconds } = await getExtensionTimeoutConfig()
return extension_timeout_seconds * 1000
}
const getExtensionTimeoutAction = async () => {
const { extension_default_action_on_timeout } = await getExtensionDefaultActionOnTimeout()
return Object.values(ExtensionTimeoutAction).includes(extension_default_action_on_timeout)
? extension_default_action_on_timeout
: ExtensionTimeoutAction.AUTO_APPROVE
}
/**
* Customer requests an extension.
*
* `extension_payment_request_id` is REQUIRED. The payment session must:
* - belong to this customer
* - be in `confirmed` status (not yet consumed)
* - have `is_extension = true`
* - have `is_first_session_discount = false` (extensions never use the first-session discount)
*
* The payment session is NOT consumed at request time. It is consumed at approval moment
* (mitra explicit accept OR auto-approve fires).
*/
export const requestExtension = async (sessionId, customerId, { duration_minutes, price, extension_payment_request_id }) => {
// Verify session belongs to customer and is in an extendable state.
// customer_display_name is pulled along for the FCM body when the mitra
// misses the WS frame.
const [session] = await sql`
SELECT cs.id, cs.customer_id, cs.mitra_id, cs.status, cs.topic_sensitivity,
c.display_name AS customer_display_name
FROM chat_sessions cs
INNER JOIN customers c ON c.id = cs.customer_id
WHERE cs.id = ${sessionId} AND cs.customer_id = ${customerId}
AND cs.status IN (${SessionStatus.ACTIVE}, ${SessionStatus.CLOSING})
`
if (!session) {
throw Object.assign(new Error('Session not found or already ended'), { code: 'SESSION_NOT_ACTIVE', statusCode: 409 })
}
// Validate extension payment session
if (!extension_payment_request_id) {
throw Object.assign(new Error('extension_payment_request_id is required'), {
code: 'VALIDATION_ERROR', statusCode: 422,
})
}
const payRequest = await getPaymentSession(extension_payment_request_id)
if (!payRequest) {
throw Object.assign(new Error('Payment session not found'), { code: 'NOT_FOUND', statusCode: 404 })
}
if (payRequest.customer_id !== customerId) {
throw Object.assign(new Error('Payment session does not belong to this customer'), {
code: 'FORBIDDEN', statusCode: 403,
})
}
if (payRequest.status !== PaymentRequestStatus.CONFIRMED) {
throw Object.assign(new Error(`Payment session is ${payRequest.status}, must be confirmed`), {
code: 'INVALID_STATE', statusCode: 409,
})
}
if (!payRequest.is_extension) {
throw Object.assign(new Error('Payment session is not flagged as an extension payment'), {
code: 'INVALID_STATE', statusCode: 409,
})
}
if (payRequest.is_first_session_discount) {
throw Object.assign(new Error('First-session discount is not available for extensions'), {
code: 'FIRST_SESSION_DISCOUNT_NOT_ALLOWED', statusCode: 400,
})
}
// Create extension record (linked to its payment session)
const [extension] = await sql`
INSERT INTO session_extensions (session_id, requested_duration_minutes, requested_price, status, payment_request_id)
VALUES (${sessionId}, ${duration_minutes}, ${price}, ${ExtensionStatus.PENDING}, ${extension_payment_request_id})
RETURNING id, session_id, requested_duration_minutes, requested_price, status, requested_at, payment_request_id
`
// Pause the session
await sql`UPDATE chat_sessions SET status = ${SessionStatus.EXTENDING} WHERE id = ${sessionId}`
await recomputeCapacityBySession(sessionId)
// Resolve timeout once so we can both surface it in the WS payload and start the server-side timer.
const timeoutMs = await getExtensionTimeoutMs()
const timeoutSeconds = Math.round(timeoutMs / 1000)
// Notify mitra — include current topic sensitivity so UI can highlight.
// If the mitra isn't on this session's chat WS (on Home/Undangan, in
// another chat, or app backgrounded), fall back to FCM. The session-
// scoped WS is the only channel that reaches the in-chat `_buildExtensionView`
// in real time; FCM gets them to /chat/session/:id, where chat connect
// restores the pending extension state via /chat/:sessionId/info.
const wsSent = sendToSessionParticipant(sessionId, UserType.MITRA, {
type: WsMessage.EXTENSION_REQUEST,
extension_id: extension.id,
session_id: sessionId,
duration_minutes,
price,
topic_sensitivity: session.topic_sensitivity,
timeout_seconds: timeoutSeconds,
})
if (!wsSent) {
await sendPushNotification(UserType.MITRA, session.mitra_id, {
title: 'Permintaan Perpanjang',
body: `${session.customer_display_name} mau lanjut +${duration_minutes} menit`,
data: {
type: WsMessage.EXTENSION_REQUEST,
session_id: sessionId,
extension_id: extension.id,
duration_minutes,
price,
timeout_seconds: timeoutSeconds,
action: 'open_extension',
},
})
}
// Notify customer that chat is paused
sendToSessionParticipant(sessionId, UserType.CUSTOMER, {
type: WsMessage.SESSION_PAUSED,
session_id: sessionId,
reason: 'extension_pending',
})
const timeoutId = setTimeout(async () => {
try {
await timeoutExtension(extension.id, sessionId, session.mitra_id)
} catch (err) {
console.error('timeoutExtension failed', { extensionId: extension.id, sessionId, err })
}
}, timeoutMs)
extensionTimeouts.set(extension.id, timeoutId)
return extension
}
export const respondToExtension = async (extensionId, sessionId, mitraId, accepted) => {
// Verify session belongs to this mitra
const [session] = await sql`
SELECT id FROM chat_sessions WHERE id = ${sessionId} AND mitra_id = ${mitraId}
`
if (!session) {
throw Object.assign(new Error('Session not found'), { code: 'FORBIDDEN', statusCode: 403 })
}
return finalizeExtension(extensionId, sessionId, accepted, /* viaTimeout */ false)
}
/**
* Internal: applies the accepted/rejected outcome. Used by both explicit response
* and the data-driven timeout path.
*/
const finalizeExtension = async (extensionId, sessionId, accepted, viaTimeout) => {
const status = accepted ? ExtensionStatus.ACCEPTED : ExtensionStatus.REJECTED
const [extension] = await sql`
UPDATE session_extensions
SET status = ${status}, responded_at = NOW()
WHERE id = ${extensionId} AND session_id = ${sessionId} AND status = ${ExtensionStatus.PENDING}
RETURNING id, session_id, requested_duration_minutes, requested_price, status, payment_request_id
`
if (!extension) {
if (viaTimeout) return null // race: already resolved before timer fired
throw Object.assign(new Error('Extension not found or already resolved'), {
code: 'EXTENSION_RESOLVED', statusCode: 409,
})
}
// Clear timeout
const timeoutId = extensionTimeouts.get(extensionId)
if (timeoutId) {
clearTimeout(timeoutId)
extensionTimeouts.delete(extensionId)
}
if (accepted) {
// Charge fires AT approval moment (explicit OR auto-approve).
if (extension.payment_request_id) {
await consumePaymentSession(extension.payment_request_id)
}
// Clear any pending grace timer from the previous expiry
clearClosureGraceTimer(sessionId)
// Extend the session
const extended = await extendSessionTimer(extension.session_id, extension.requested_duration_minutes)
// Resume session
await sql`UPDATE chat_sessions SET status = ${SessionStatus.ACTIVE} WHERE id = ${extension.session_id}`
await recomputeCapacityBySession(extension.session_id)
// Record transaction
await sql`
INSERT INTO customer_transactions (customer_id, session_id, type, amount)
SELECT customer_id, id, ${TransactionType.EXTENSION}, ${extension.requested_price}
FROM chat_sessions WHERE id = ${extension.session_id}
`
// Notify both parties. Include the freshly-extended `expires_at` so the
// customer's local seconds-left ticker can resume immediately — without it,
// the client has to wait until the next 60s SESSION_TIMER ping to pick up
// the new deadline, leaving the floating expired banner stuck on-screen.
sendToSessionParticipant(sessionId, UserType.CUSTOMER, {
type: WsMessage.EXTENSION_RESPONSE,
accepted: true,
duration_minutes: extension.requested_duration_minutes,
expires_at: extended?.expires_at ?? null,
via_timeout: viaTimeout,
})
sendToSessionParticipant(sessionId, UserType.CUSTOMER, {
type: WsMessage.SESSION_RESUMED,
session_id: sessionId,
})
sendToSessionParticipant(sessionId, UserType.MITRA, {
type: WsMessage.SESSION_RESUMED,
session_id: sessionId,
})
} else {
// Rejected — no charge. Fail the extension payment session if present.
// viaTimeout=false here means an explicit mitra reject (the timer path goes through
// timeoutExtension which never enters this branch with viaTimeout=true for reject).
if (extension.payment_request_id) {
await failPaymentSession(extension.payment_request_id, PairingFailureCause.EXTENSION_REJECTED)
}
await sql`UPDATE chat_sessions SET status = ${SessionStatus.CLOSING} WHERE id = ${extension.session_id}`
await recomputeCapacityBySession(extension.session_id)
sendToSessionParticipant(sessionId, UserType.CUSTOMER, {
type: WsMessage.EXTENSION_RESPONSE,
accepted: false,
via_timeout: viaTimeout,
})
sendToSessionParticipant(sessionId, UserType.MITRA, {
type: WsMessage.SESSION_CLOSING,
session_id: sessionId,
})
sendToSessionParticipant(sessionId, UserType.CUSTOMER, {
type: WsMessage.SESSION_CLOSING,
session_id: sessionId,
})
startClosureGraceTimer(sessionId)
}
return extension
}
/**
* Data-driven timeout handler.
*
* - read `extension_default_action_on_timeout` config:
* - 'auto_approve': check mitra reachability (WS + Valkey online). If both OK → approve.
* If either is offline/disconnected → fall back to reject (no charge).
* - 'auto_reject' (back-compat flag): reject regardless.
*/
const timeoutExtension = async (extensionId, sessionId, mitraId) => {
extensionTimeouts.delete(extensionId)
// Confirm extension is still pending (race with explicit response)
const [pending] = await sql`
SELECT id FROM session_extensions
WHERE id = ${extensionId} AND status = ${ExtensionStatus.PENDING}
`
if (!pending) return
const action = await getExtensionTimeoutAction()
// Track WHY we ended up rejecting so the failed-pairings audit row gets the right tag.
// Default: configured policy is auto_reject → use EXTENSION_REJECTED.
let causeTag = PairingFailureCause.EXTENSION_REJECTED
let reasonForClient = 'timeout'
if (action === ExtensionTimeoutAction.AUTO_APPROVE) {
// Safeguard: mitra must be reachable (online in Valkey AND connected via WS).
// Never use "in-session" as a proxy for "online".
const wsConnected = isUserOnlineWs(UserType.MITRA, mitraId)
const onlineFlag = await isMitraReachable(mitraId)
if (wsConnected && onlineFlag) {
// Approve via the same path as explicit accept.
await finalizeExtension(extensionId, sessionId, /* accepted */ true, /* viaTimeout */ true)
return
}
// Safeguard tripped — treat as auto-reject (no charge), but tag the audit row distinctly
// so CC operators can see this was a system-safety decision, not a mitra reject or a
// configured auto-reject policy decision.
causeTag = PairingFailureCause.EXTENSION_SAFEGUARD_TRIPPED
reasonForClient = 'safeguard'
}
// auto_reject (configured) OR auto_approve-with-safeguard-tripped — both end with
// the extension marked TIMEOUT, no charge, session moves to CLOSING. The cause_tag
// distinguishes them in the failed-pairings audit log. RETURNING guards against a race
// with explicit accept/decline that landed between the pending check above and here —
// if no row was matched, the extension is no longer ours to time out.
const [timedOut] = await sql`
UPDATE session_extensions
SET status = ${ExtensionStatus.TIMEOUT}, responded_at = NOW()
WHERE id = ${extensionId} AND status = ${ExtensionStatus.PENDING}
RETURNING id, payment_request_id
`
if (!timedOut) return
if (timedOut.payment_request_id) {
await failPaymentSession(timedOut.payment_request_id, causeTag)
}
// Move session to closing & notify both parties (matches the explicit-reject UX).
await sql`UPDATE chat_sessions SET status = ${SessionStatus.CLOSING} WHERE id = ${sessionId}`
await recomputeCapacityBySession(sessionId)
sendToSessionParticipant(sessionId, UserType.CUSTOMER, {
type: WsMessage.EXTENSION_RESPONSE,
accepted: false,
reason: reasonForClient,
})
sendToSessionParticipant(sessionId, UserType.MITRA, {
type: WsMessage.SESSION_CLOSING,
session_id: sessionId,
})
sendToSessionParticipant(sessionId, UserType.CUSTOMER, {
type: WsMessage.SESSION_CLOSING,
session_id: sessionId,
})
startClosureGraceTimer(sessionId)
}