Commit Graph

7 Commits

Author SHA1 Message Date
d60c048776 Mitra availability: read paths respect require_mitra_ping=false
When the operator sets require_mitra_ping=false, the auto-offline sweep
early-returns (by design — "don't gate online status on heartbeat
freshness"). The three Valkey read paths still gated on heartbeat
freshness anyway, which trapped the system: sweep won't remove the
mitra from mitras:online, but readers reject them as stale. The customer
CTA stayed permanently disabled with no recovery.

Fix all three to skip the heartbeat-freshness check when require_ping
is off, matching the sweep's contract:
- computeAvailabilityFromValkey (customer beacon)
- isMitraReachable (extension service)
- findAvailableMitrasFromValkey (pairing candidate finder)

The Postgres fallbacks already did the right thing (is_online only,
no heartbeat compare); this aligns the Valkey hot path.

Also: PATCH /internal/config/mitra-ping now publishes config:invalidate
for require_mitra_ping and mitra_stale_after_seconds, and the subscriber
in mitra-status.service was widened to listen for both. Flipping the
toggle in CC now busts the 10s availability snapshot immediately instead
of waiting out the TTL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 22:09:41 +08:00
553dbac52f Phase 6: Valkey availability mirror — move read path off Postgres
Mitra-availability state (online flag, deactivated flag, per-mitra session
count, heartbeat liveness) mirrored into Valkey so the customer beacon
+ pairing blast + dashboard counts no longer hit Postgres on the hot path.
Postgres remains the durable source of truth; Valkey state is fully
derivable via seedFromPostgres on startup + reconnect.

Schema
- mitras:online           SET    — mirror of is_online
- mitras:deactivated      SET    — mirror of is_active=false
- mitra:capacity:<id>     STRING — active+pending_payment session count
- mitra💓<id>    STRING — ISO timestamp of last ping
- availability:snapshot   JSON   — beacon cache, TTL 10s, cluster-shared

Write paths (Postgres first, best-effort Valkey)
- setOnline/setOffline mirror SADD/SREM + heartbeat SET/DEL
- updateMitraStatus mirrors mitras:deactivated AND revokes auth_sessions
  on deactivate (bounds the "ghost online" window to access-token TTL)
- heartbeat is Valkey-only on the hot path; the per-ping Postgres UPDATE
  on last_heartbeat_at is eliminated (was 1,200 ops/min at prod scale)
- chat_session lifecycle (accept/end/reroute/extension/expiry) calls
  recomputeCapacityForMitra after each UPDATE — derive-from-truth avoids
  the bookkeeping risk of per-transition INCR/DECR

Read paths (Valkey-first, Postgres fallback on Valkey error)
- isMitraReachable: SISMEMBER mitras:online + heartbeat freshness
- findAvailableMitras: SDIFF + pipelined GETs, filter by capacity + heartbeat
- countAvailableMitrasFromCache: Valkey-driven, cached cluster-wide 10s TTL
- dashboard online count: SCARD
- Each reader wraps Valkey ops in try/catch → Postgres fallback on outage

Heartbeat path on /api/mitra/status/heartbeat
- resolveMitra preHandler replaced with heartbeatGuard: SISMEMBER on
  mitras:deactivated (~0 DB hits per ping). Falls back to full DB
  resolveMitra if Valkey is unreachable so a Valkey outage doesn't
  silently accept heartbeats from deactivated mitras.

Three sweeps, env-configurable cadences
- MITRA_AUTO_OFFLINE_SWEEP_SECONDS (30) — Valkey-driven stale detection
- HEARTBEAT_MIRROR_INTERVAL_SECONDS (60) — batched UPSERT writes
  Valkey timestamps to Postgres last_heartbeat_at via UNNEST (1 statement
  per cycle, idempotent across instances)
- VALKEY_ONLINE_MIRROR_SWEEP_SECONDS (300) — periodic reseed heals drift

Startup
- restoreActiveTimers → seedFromPostgres → bind listeners
- onValkeyReady re-runs the seed on every reconnect (cold start + reseed
  on Valkey restart, no manual intervention)

Failure semantics
- Read fallback: every Valkey read wrapped, falls back to existing
  Postgres JOIN query — system stays correct during Valkey outage,
  performance degrades not breaks
- Write best-effort: Postgres write commits before Valkey is touched;
  Valkey errors log + continue; reconciliation sweep heals drift
- Auto-offline sweep aborts entirely on Valkey error (does NOT mass-
  offline via Postgres scan during Valkey hiccup)

Tests
- New: 32 integration tests in mitra-status.valkey-mirror.test.js
  covering seed, write-through, fallbacks, capacity lifecycle,
  auto-offline sweep, heartbeat mirror, deactivation flow, beacon cache
- Updated: fixtures.js seeds Valkey alongside Postgres when isOnline=true
- Updated: helpers/db.js resetDb also flushes test Valkey
- Fixed 2 pre-existing session-timer flakes (string IDs failed uuid
  parse; vi.advanceTimersByTimeAsync raced real Postgres I/O)
- All 124/124 backend tests pass (was 90/92)

Docs
- requirement/valkey-online-mirror-plan.md — canonical plan
- requirement/valkey-online-mirror-testing.md — manual E2E checklist
- requirement/deployment.md — infra + Valkey persistence guidance for
  prod (Memorystore Standard tier recommended; migration from
  self-hosted Valkey is zero-downtime via reseed-from-Postgres)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 18:07:55 +08:00
a8c20d929e Mitra ping: decouple stale-after from app cadence
Splits the single mitra_ping_interval_seconds config (which conflated
"how often the app pings" with "how long until offline" through a
hidden ×3 multiplier) into two orthogonal knobs:

- mitra_stale_after_seconds (CC-tunable, app_config DB row): the
  operator-facing offline threshold. What you set is what you get —
  no multiplier. Default 45s (preserves today's effective grace at
  the legacy 15s ping default).
- MITRA_HEARTBEAT_CADENCE_SECONDS (env var, default 30s): how often
  the mitra app sends a heartbeat. Backend-fixed per deployment;
  surfaced to the mitra app via /api/mitra/status.

Backend:
- config.service: getMitraPingConfig returns the new tuple
  {require_ping, stale_after_seconds, heartbeat_cadence_seconds}.
  Env parser handles blank/non-numeric → 30 fallback.
- mitra-status.service::autoOfflineStaleMitras drops the *3 and uses
  stale_after_seconds directly.
- mitra-status.service::getStatus returns heartbeat_cadence_seconds
  instead of ping_interval_seconds.
- /internal/config/mitra-ping PATCH validates
  stale_after_seconds >= cadence, returns 422 with a clear message
  ("stale_after_seconds must be a number >= heartbeat cadence (30s)").
- migrate.js: adds mitra_stale_after_seconds default 45. The old
  mitra_ping_interval_seconds key is left in place (vestigial) —
  no live code reads it; safe to drop after one release.

Mitra app:
- status_notifier reads heartbeat_cadence_seconds, uses it directly
  as the Timer.periodic interval. Defaults to 30s if missing (older
  backend safety).

Control center:
- SettingsPage: renames "Interval Ping" → "Ambang offline", input
  min={heartbeat_cadence_seconds}, shows the cadence as a read-only
  value with explanation that it's env-controlled.

Verified end-to-end on dev backend:
- GET /api/mitra/status returns {…, heartbeat_cadence_seconds: 30}
- GET /internal/config/mitra-ping returns {require_ping,
  stale_after_seconds: 45, heartbeat_cadence_seconds: 30}
- PATCH with stale_after_seconds=20 → 422 with cadence message
- PATCH with stale_after_seconds=120 → 200, persisted
- Env override (=60, blank, "foo") parses correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 20:39:59 +08:00
d09e50af55 Phase 3.7: paid pairing flow + returning chat + extension flip
- Backend: payment_sessions + pairing_failures tables; payment.service.js
  and pairing-failure.service.js (new); rewritten pairing.service.js
  (payment-gated blast + targeted "Curhat lagi" + cancel + fallback);
  rewritten extension.service.js (data-driven auto-approve with offline
  safeguard, charge-at-approval); pricing.service.js (extension tiers
  without free trial); mitra-status.service.js (countAvailableMitras
  cached path); 60s sweeper for stale payment sessions
- Backend routes: client.payment.routes, client.mitra-availability.routes,
  internal/failed-pairings.routes; client.chat.routes rewritten for
  payment-gated start + /returning + /cancel + /fallback-to-blast;
  internal/config.routes adds 4 new keys with Valkey invalidate publish
- client_app: mitra-availability poll, payment screen + notifier, pairing
  notifier rewrite (PairingTargetedWaiting + PairingFailed states),
  targeted-waiting overlay + bestie-unavailable dialog, "Curhat lagi"
  CTA, failed-pairing terminal, extension via payment-session
- mitra_app: PairingRequestType enum, returning-chat 20s countdown
  auto-dismiss, extension card "otomatis disetujui" copy
- control_center: 4 new config rows in Settings, Failed Pairings page
  (filter + paginate + action menu), sidebar + route registered
- Test infrastructure: Vitest backend (7/7 pass), Playwright CC (4/4
  pass), Maestro mobile scaffold (CLI install pending)
- Bugs found via Playwright + fixed: LoginPage labels not associated
  with inputs (a11y); backend internal CORS missing PATCH/PUT/DELETE
  in allow-methods (silent settings breakage in browsers since Stage 4)
- Docs: phase3.7.md PRD, phase3.7-plan.md, phase3.7-questions.md (Q&A),
  phase3.7-testing.md (E2E checklist), phase3.7-test-run-2026-05-03.md
  (today's run results)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 23:02:49 +08:00
ed765d230c Phase 3.1 WS2: Backend FCM fallback, ping config, unread API
- Add require_mitra_ping + mitra_ping_interval_seconds config keys (migration)
- Add getMitraPingConfig/setMitraPingConfig to config service
- Add GET/PATCH /internal/config/mitra-ping routes for control center
- Update mitra status service: honor ping config in auto-offline sweep,
  include ping config in GET /api/mitra/status response
- Enhance pairing FCM payload with action: 'open_accept' for deep-link
- Add FCM fallback to closure.service (initiateEarlyEnd, completeSession)
- Add FCM fallback to session-timer.service (onSessionExpired)
- Add unread count queries (getActiveSessionByCustomerWithUnread,
  getActiveSessionsByMitraWithUnread)
- Add GET /api/client/chat/session/active-with-unread route
- Add GET /api/mitra/chat-requests/sessions/active-with-unread route

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:22:41 +08:00
b0502ac92b Phase 3 testing fixes: Fastify 5, SSE→WebSocket+FCM, enums, security, session lifecycle
- Upgrade Fastify 4→5 with all plugins (@fastify/websocket 11, cors 11, sensible 6)
- Migrate all SSE endpoints to WebSocket + FCM push (mitra chat requests, customer pairing status)
- Add flutter_local_notifications for foreground push notifications with sound
- Add splash screen to both apps (hide auth loading flash)
- Introduce constants/enums across entire codebase (no raw string literals)
- Move price tiers from hardcoded array to app_config DB (data-driven, includes 1-min test tier)
- Add session ownership validation on all shared chat routes
- Add ownership checks on endSession, respondToExtension, requestExtension
- Fix session timer: auto-complete expired/stale sessions on server restart
- Add 5-min grace period for abandoned closing sessions
- Fix extension flow: proper session_resumed handling, clearExtensionRequest, closure grace timer cleanup
- Fix chat screens: ConnectChat in initState, session status check on connect
- Fix customer expired view: 5-min countdown, closure state priority over expired state
- Fix mitra extension UI: loading spinner, disable buttons, handle EXTENSION_RESOLVED error
- Fix GoRouter navigation consistency (no more Navigator.pushNamed)
- Fix goodbye view keyboard overflow (SingleChildScrollView)
- Add active session card on customer home screen with refresh on navigate back
- Fix PricingBottomSheet extension mode (RequestExtension instead of new pairing)
- Send session_resumed to both parties on extension accept

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 00:17:25 +08:00
d668112edd Phase 2 scaffold: mitra online status & pairing logic
Add mitra online/offline status with heartbeat-based auto-offline,
customer-mitra pairing via Valkey pub/sub blast, session management,
and control center dashboard with real-time stats.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:17:49 +08:00