6 Commits

Author SHA1 Message Date
3052f7b799 Xendit webhook: metadata.app routing + survival audit log + rolling fallback file
Every Xendit invoice now carries metadata: { app: 'halobestie_v2' } so an
external webhook router (no DB access) can fan out v1/v2 traffic purely off
the echoed payload.
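
A minimal sketch of how such a router could branch on the echoed field. The function name, the target labels, and the choice to send payloads without metadata to v1 are assumptions for illustration, not the deployed router.

```javascript
// Hypothetical router sketch: picks a downstream target from the echoed metadata.
// Target labels and the "v1" default for payloads without metadata are assumptions.
function pickTarget(payload) {
  const app = payload && payload.metadata ? payload.metadata.app : undefined;
  return app === 'halobestie_v2' ? 'v2' : 'v1';
}
```

Because the decision only reads the payload, the router needs no database lookup.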

Every inbound webhook lands in a new webhook_logs table BEFORE auth or
business logic, so a forensic row survives 401/409/unknown/exception paths.
Primary fields are parsed as columns; raw_body keeps the full payload
verbatim. The handler captures outcome in closure-scoped vars and stamps
http_status/processing_result/processing_error in a single update before
the lone reply.send() — Fastify flushes reply.send() immediately, which
defeated the original finally-block stamp.
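
The ordering described above (log first, decide, stamp once, reply once) can be sketched without Fastify. Everything here is a stand-in: `logs` is an in-memory array instead of webhook_logs, the result strings are made up, and the plain token comparison is only for brevity.

```javascript
// In-memory sketch of "log before auth, stamp before the single reply".
// `logs` stands in for the webhook_logs table; result names are hypothetical.
function handleWebhook(request, logs, expectedToken) {
  // 1. Insert the forensic row before auth, so rejected calls still leave a trace.
  const row = {
    raw_body: request.rawBody,
    http_status: null,
    processing_result: null,
    processing_error: null,
  };
  logs.push(row);

  // 2. Track the outcome in local variables while the handler decides.
  let status = 200;
  let result = 'processed';
  let error = null;
  try {
    if (request.token !== expectedToken) {
      status = 401;
      result = 'unauthorized';
    } else {
      JSON.parse(request.rawBody); // stands in for the business logic
    }
  } catch (err) {
    status = 500;
    result = 'exception';
    error = String(err.message);
  }

  // 3. Stamp the row in one update, then produce the only reply.
  Object.assign(row, {
    http_status: status,
    processing_result: result,
    processing_error: error,
  });
  return { statusCode: status, body: { result } };
}
```

Stamping before the reply, rather than in a finally block, keeps the log update ahead of the response flush.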

A non-UUID external_id no longer crashes the Postgres cast; it ACKs with
ignored_non_uuid_external_id so Xendit stops retrying legacy old-app IDs.
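
A sketch of the guard, assuming the ACK is a 200 and that a strict hyphenated-UUID pattern is acceptable (Postgres itself accepts a few more spellings). The function name and the `null` "proceed" signal are illustrative.

```javascript
// Hypothetical pre-check run before external_id is used in a uuid-typed query.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function checkExternalId(externalId) {
  if (typeof externalId !== 'string' || !UUID_RE.test(externalId)) {
    // ACK so the sender stops retrying; the 200 status code is an assumption.
    return { statusCode: 200, result: 'ignored_non_uuid_external_id' };
  }
  return null; // null means "looks like a UUID, continue with normal processing"
}
```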

When the DB log itself fails, an optional rolling JSONL file sink absorbs
the event. Disabled by default — opt in via XENDIT_WEBHOOK_FALLBACK_ENABLED.
Naming: <NAME>-YYYY-MM-DD.jsonl in XENDIT_WEBHOOK_FALLBACK_DIR (default
./logs), basename XENDIT_WEBHOOK_FALLBACK_NAME (default
xendit-webhook-fallback). No stdout fallback by design.
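
The naming rule can be sketched as a small helper. The function name is hypothetical, and two details are assumptions: the literal value `'true'` enables the sink, and the date is taken in UTC.

```javascript
const path = require('node:path');

// Returns the JSONL file for a given day, or null when the sink is disabled.
function fallbackFilePath(env, now) {
  if (env.XENDIT_WEBHOOK_FALLBACK_ENABLED !== 'true') return null; // assumed opt-in value
  const dir = env.XENDIT_WEBHOOK_FALLBACK_DIR || './logs';
  const name = env.XENDIT_WEBHOOK_FALLBACK_NAME || 'xendit-webhook-fallback';
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD, UTC (assumption)
  return path.join(dir, `${name}-${day}.jsonl`);
}
```

Deriving the file name from the date on every write is what makes the file roll over without a separate rotation job.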

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 22:09:14 +08:00
553dbac52f Phase 6: Valkey availability mirror — move read path off Postgres
Mitra-availability state (online flag, deactivated flag, per-mitra session
count, heartbeat liveness) mirrored into Valkey so the customer beacon
+ pairing blast + dashboard counts no longer hit Postgres on the hot path.
Postgres remains the durable source of truth; Valkey state is fully
derivable via seedFromPostgres on startup + reconnect.

Schema
- mitras:online           SET    — mirror of is_online
- mitras:deactivated      SET    — mirror of is_active=false
- mitra:capacity:<id>     STRING — active+pending_payment session count
- mitra:heartbeat:<id>    STRING — ISO timestamp of last ping
- availability:snapshot   JSON   — beacon cache, TTL 10s, cluster-shared

Write paths (Postgres first, best-effort Valkey)
- setOnline/setOffline mirror SADD/SREM + heartbeat SET/DEL
- updateMitraStatus mirrors mitras:deactivated AND revokes auth_sessions
  on deactivate (bounds the "ghost online" window to access-token TTL)
- heartbeat is Valkey-only on the hot path; the per-ping Postgres UPDATE
  on last_heartbeat_at is eliminated (was 1,200 ops/min at prod scale)
- chat_session lifecycle (accept/end/reroute/extension/expiry) calls
  recomputeCapacityForMitra after each UPDATE — derive-from-truth avoids
  the bookkeeping risk of per-transition INCR/DECR
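
A sketch of the "Postgres first, best-effort Valkey" ordering for setOnline. The `pg.markOnline` helper, the injected `valkey` client shape, and the heartbeat key name `mitra:heartbeat:<id>` are assumptions made for the example.

```javascript
// Hypothetical write-through: the durable write must succeed, the mirror may fail.
async function setOnlineSketch(mitraId, pg, valkey, log) {
  await pg.markOnline(mitraId); // a failure here propagates to the caller
  try {
    await valkey.sadd('mitras:online', mitraId);
    await valkey.set(`mitra:heartbeat:${mitraId}`, new Date().toISOString());
  } catch (err) {
    // Mirror errors are logged and swallowed; the periodic reseed heals the drift.
    log(`valkey mirror failed for ${mitraId}: ${err.message}`);
  }
}
```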

Read paths (Valkey-first, Postgres fallback on Valkey error)
- isMitraReachable: SISMEMBER mitras:online + heartbeat freshness
- findAvailableMitras: SDIFF + pipelined GETs, filter by capacity + heartbeat
- countAvailableMitrasFromCache: Valkey-driven, cached cluster-wide 10s TTL
- dashboard online count: SCARD
- Each reader wraps Valkey ops in try/catch → Postgres fallback on outage
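
The filtering step behind findAvailableMitras can be illustrated on plain in-memory data. This is a sketch only: `Set` and `Map` stand in for the Valkey keys, and the `maxSessions` and `staleMs` thresholds are made-up parameters.

```javascript
// In-memory model of: SDIFF online deactivated, then filter by capacity + heartbeat.
function findAvailableSketch(state, nowMs, opts) {
  // Equivalent of SDIFF mitras:online mitras:deactivated
  const candidates = [...state.online].filter((id) => !state.deactivated.has(id));
  return candidates.filter((id) => {
    const sessions = Number(state.capacity.get(id) || 0);
    const beat = Date.parse(state.heartbeat.get(id) || '');
    const fresh = Number.isFinite(beat) && nowMs - beat <= opts.staleMs;
    return sessions < opts.maxSessions && fresh;
  });
}
```

In the real read path each Valkey call would sit inside the try/catch that falls back to the Postgres query.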

Heartbeat path on /api/mitra/status/heartbeat
- resolveMitra preHandler replaced with heartbeatGuard: SISMEMBER on
  mitras:deactivated (~0 DB hits per ping). Falls back to full DB
  resolveMitra if Valkey is unreachable so a Valkey outage doesn't
  silently accept heartbeats from deactivated mitras.
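
A sketch of the guard's two branches, assuming an injected client with a `sismember` method and a hypothetical `resolveMitraFromDb` function that returns the mitra row.

```javascript
// Hypothetical guard: cheap set lookup first, full DB check only when Valkey fails.
async function heartbeatGuardSketch(mitraId, valkey, resolveMitraFromDb) {
  try {
    const deactivated = await valkey.sismember('mitras:deactivated', mitraId);
    return { allowed: !deactivated, source: 'valkey' };
  } catch (err) {
    // Valkey unreachable: do not wave the heartbeat through, ask Postgres instead.
    const mitra = await resolveMitraFromDb(mitraId);
    return { allowed: Boolean(mitra && mitra.is_active), source: 'postgres' };
  }
}
```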

Three sweeps, env-configurable cadences
- MITRA_AUTO_OFFLINE_SWEEP_SECONDS (30) — Valkey-driven stale detection
- HEARTBEAT_MIRROR_INTERVAL_SECONDS (60) — batched UPSERT writes
  Valkey timestamps to Postgres last_heartbeat_at via UNNEST (1 statement
  per cycle, idempotent across instances)
- VALKEY_ONLINE_MIRROR_SWEEP_SECONDS (300) — periodic reseed heals drift
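
For the heartbeat mirror sweep, a single statement over two parallel arrays could look like the sketch below. The table name `mitras`, the `UPDATE ... FROM UNNEST` form, and the "only move forward" guard are assumptions; the actual statement may differ. The returned object uses the `{ text, values }` shape that node-postgres accepts.

```javascript
// Builds one batched statement from a { mitraId: isoTimestamp } map.
function buildHeartbeatMirrorQuery(heartbeats) {
  const ids = Object.keys(heartbeats);
  const stamps = ids.map((id) => heartbeats[id]);
  const text = `
    UPDATE mitras AS m
       SET last_heartbeat_at = v.ts
      FROM UNNEST($1::uuid[], $2::timestamptz[]) AS v(id, ts)
     WHERE m.id = v.id
       AND (m.last_heartbeat_at IS NULL OR m.last_heartbeat_at < v.ts)`;
  return { text, values: [ids, stamps] };
}
```

The timestamp guard is what would make the statement safe to run from several instances: replaying the same batch changes nothing.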

Startup
- restoreActiveTimers → seedFromPostgres → bind listeners
- onValkeyReady re-runs the seed on every reconnect (cold start + reseed
  on Valkey restart, no manual intervention)

Failure semantics
- Read fallback: every Valkey read is wrapped and falls back to the existing
  Postgres JOIN query, so the system stays correct during a Valkey outage;
  performance degrades but nothing breaks
- Write best-effort: Postgres write commits before Valkey is touched;
  Valkey errors log + continue; reconciliation sweep heals drift
- Auto-offline sweep aborts entirely on Valkey error (does NOT mass-
  offline via Postgres scan during Valkey hiccup)
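
The abort-on-error rule for the auto-offline sweep can be sketched as follows. The client shape, the `mitra:heartbeat:<id>` key name, the `staleMs` parameter, and treating a missing heartbeat as stale are all assumptions.

```javascript
// Hypothetical sweep: returns the stale mitra IDs, or aborts if Valkey cannot be read.
async function autoOfflineSweepSketch(valkey, nowMs, staleMs) {
  let online;
  let beats;
  try {
    online = await valkey.smembers('mitras:online');
    beats = await Promise.all(online.map((id) => valkey.get(`mitra:heartbeat:${id}`)));
  } catch (err) {
    // No Postgres scan here: during a Valkey hiccup nobody is marked offline.
    return { aborted: true, stale: [] };
  }
  const stale = online.filter((id, i) => {
    const t = Date.parse(beats[i] || '');
    return !Number.isFinite(t) || nowMs - t > staleMs;
  });
  return { aborted: false, stale };
}
```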

Tests
- New: 32 integration tests in mitra-status.valkey-mirror.test.js
  covering seed, write-through, fallbacks, capacity lifecycle,
  auto-offline sweep, heartbeat mirror, deactivation flow, beacon cache
- Updated: fixtures.js seeds Valkey alongside Postgres when isOnline=true
- Updated: helpers/db.js resetDb also flushes test Valkey
- Fixed 2 pre-existing session-timer flakes (string IDs failed uuid
  parse; vi.advanceTimersByTimeAsync raced real Postgres I/O)
- All 124/124 backend tests pass (was 90/92)

Docs
- requirement/valkey-online-mirror-plan.md — canonical plan
- requirement/valkey-online-mirror-testing.md — manual E2E checklist
- requirement/deployment.md — infra + Valkey persistence guidance for
  prod (Memorystore Standard tier recommended; migration from
  self-hosted Valkey is zero-downtime via reseed-from-Postgres)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 18:07:55 +08:00
3fff4b1c6e Phase 5 Xendit: Stages 1-7 (XENDIT_ENABLED=false; Stage 8 pending creds)
Backend
- payment_sessions → payment_requests rename across DB schema + 29 files
- payment.service.js becomes product-agnostic owner: EventEmitter +
  Xendit wrapper + requestPayment / confirmPayment public API; legacy
  aliases retained for existing chat callers
- Webhook handler at POST /api/shared/payment/webhooks/xendit, with
  constant-time token verification (8 vitest cases)
- Server-driven pairing: payment.service emits
  payment_request.confirmed → pairing subscriber starts the blast.
  Legacy POST /chat/request still works during the cutover.
- Reconciliation sweeper extended (re-emits events for confirmed rows
  with no chat session)
- SIGTERM drain + startup reconciliation pass in server.js
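
The constant-time token verification mentioned in the webhook bullet can be sketched with Node's `crypto.timingSafeEqual`. Hashing both values first is one common way to satisfy its equal-length requirement; the function name is illustrative and this is not necessarily how the handler does it.

```javascript
const crypto = require('node:crypto');

// Compares a received webhook token with the configured one in constant time.
function tokenMatches(received, expected) {
  if (typeof received !== 'string' || typeof expected !== 'string') return false;
  // Hash both sides so the buffers always have equal length, as timingSafeEqual requires.
  const a = crypto.createHash('sha256').update(received).digest();
  const b = crypto.createHash('sha256').update(expected).digest();
  return crypto.timingSafeEqual(a, b);
}
```

A plain `===` comparison would work functionally but can leak, through timing, how many leading characters matched.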

Customer app
- waiting_payment_screen opens xendit_invoice_url via
  LaunchMode.inAppBrowserView
- searching / no-bestie / targeted-waiting / pairing-notifier updated
  to consume the new payment_request_id contract
- pending_payments_provider + bestie-unavailable dialog migrated

Dev / testing
- XENDIT_ENABLED=false is the safe default; .env.example documents the
  four new vars
- backend/.dev/xendit-fake-webhook.sh exercises the handler without
  ngrok
- 90/92 backend tests pass (two pre-existing session-timer flakes,
  unrelated); client_app analyzer clean
- requirement/phase5-xendit-plan.md is the canonical reference

Stage 8 (live E2E) blocked on Xendit test-mode keys. The dashboard's
single-webhook-URL constraint will be worked around via a self-poll
script next session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 12:52:33 +08:00
1c9d81d81d Pricing: migrate from app_config JSON to relational tables
Replaces the two `pricing_*_tiers_json` blobs and five `first_session_discount_*`
keys in app_config with dedicated `pricing_tiers` and `pricing_promotions`
tables plus matching `_history` audit tables. UUID PKs, UNIQUE(mode, minutes)
natural-key constraint, optimistic-lock via `updated_at` token returning 409
STALE_WRITE on conflicts. Every mutation writes a history row capturing the
operator (changed_by from request.auth.userId) and change_kind.
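
The optimistic-lock contract can be illustrated with an in-memory table. In this sketch a `Map` replaces `pricing_tiers`, the function name and the 404 branch are assumptions, and the history-row write is omitted.

```javascript
// The caller sends back the updated_at it last read; a mismatch means someone else wrote first.
function updateTierSketch(table, id, patch, expectedUpdatedAt, now) {
  const row = table.get(id);
  if (!row) return { statusCode: 404, code: 'NOT_FOUND' };
  if (row.updated_at !== expectedUpdatedAt) {
    return { statusCode: 409, code: 'STALE_WRITE' };
  }
  const next = { ...row, ...patch, updated_at: now };
  table.set(id, next);
  return { statusCode: 200, row: next };
}
```

With two operators editing the same row from the same snapshot, the first save succeeds and the second receives 409 STALE_WRITE until it reloads.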

CC SettingsPage replaces the JSON-textarea editors with per-row tables —
add / edit / soft-delete / reactivate / reorder, plus a buffered first-session
discount form with the same optimistic-lock contract. `minutes` and `mode` are
read-only on edit since they form the natural key; operators soft-delete and
recreate to change duration.

Stage 5 fixes a latent leak: `client.payment.routes.js` had its own local
`readDiscountConfig` that still read from app_config and would have silently
fallen back to hardcoded defaults once the legacy rows were deleted. It now reads from
pricing_promotions via the shared service helper, so CC edits to the first-
session discount affect actual payment pricing on the next request.

Customer-facing GET /api/client/chat/pricing shape unchanged (id values are
now UUIDs instead of "5"/"12"/"60" but lookups happen by (mode, minutes), so
no app changes needed). 27 new backend tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 00:12:11 +08:00
d33d4419ea Phase 4 Stage 1: backend foundation (additive endpoints + schema)
Schema (idempotent migration):
- payment_sessions.is_free_trial -> is_first_session_discount (data copied)
- payment_sessions.mode TEXT NOT NULL DEFAULT 'chat' CHECK (chat|call)
- chat_sessions.topics TEXT[] for ESP picks (info-only)

New endpoints:
- GET /api/client/onboarding-state (drives verif sheet + S6 paywall gate)
- GET /api/client/chat-pricing (rewrite: chat+call groups + first-session
  discount block, per-customer eligibility)
- GET /api/shared/auth-providers (env-probed; replaces ENABLE_SOCIAL_AUTH
  build flag — frontend cutover lands in stage 2)
- GET /api/client/support-handles (Tanya Admin handles, CC-config-driven)

session_warning WS event fires once at 180s remaining.

app_config seeds (mock pricing tiers, first-session discount, support
handles, payment method order, end-session 2-step toggle).

CC SettingsPage: 3 new sections (first-session discount, pricing tiers
JSON editors, support handles).

15/15 Vitest passing. chat_sessions.is_free_trial also renamed for
consistency (plan only specified payment_sessions; pairing.service.js
read both).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 15:56:28 +08:00
d09e50af55 Phase 3.7: paid pairing flow + returning chat + extension flip
- Backend: payment_sessions + pairing_failures tables; payment.service.js
  and pairing-failure.service.js (new); rewritten pairing.service.js
  (payment-gated blast + targeted "Curhat lagi" + cancel + fallback);
  rewritten extension.service.js (data-driven auto-approve with offline
  safeguard, charge-at-approval); pricing.service.js (extension tiers
  without free trial); mitra-status.service.js (countAvailableMitras
  cached path); 60s sweeper for stale payment sessions
- Backend routes: client.payment.routes, client.mitra-availability.routes,
  internal/failed-pairings.routes; client.chat.routes rewritten for
  payment-gated start + /returning + /cancel + /fallback-to-blast;
  internal/config.routes adds 4 new keys with Valkey invalidate publish
- client_app: mitra-availability poll, payment screen + notifier, pairing
  notifier rewrite (PairingTargetedWaiting + PairingFailed states),
  targeted-waiting overlay + bestie-unavailable dialog, "Curhat lagi"
  CTA, failed-pairing terminal, extension via payment-session
- mitra_app: PairingRequestType enum, returning-chat 20s countdown
  auto-dismiss, extension card "otomatis disetujui" copy
- control_center: 4 new config rows in Settings, Failed Pairings page
  (filter + paginate + action menu), sidebar + route registered
- Test infrastructure: Vitest backend (7/7 pass), Playwright CC (4/4
  pass), Maestro mobile scaffold (CLI install pending)
- Bugs found via Playwright + fixed: LoginPage labels not associated
  with inputs (a11y); backend internal CORS missing PATCH/PUT/DELETE
  in allow-methods (silent settings breakage in browsers since Stage 4)
- Docs: phase3.7.md PRD, phase3.7-plan.md, phase3.7-questions.md (Q&A),
  phase3.7-testing.md (E2E checklist), phase3.7-test-run-2026-05-03.md
  (today's run results)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 23:02:49 +08:00