halobestie-clone/requirement/deployment.md
Ramadhan Sjamsani 91bdbd5289 build(backend): Dockerize for self-hosted deploy + deploy/log docs
Backend deploy target is self-hosted Docker (VPS / Kubernetes / Docker
Engine), not Cloud Run. Add a multi-stage Dockerfile (Node 20, bcrypt
compiled in build stage, non-root runtime), .dockerignore, a staging
docker-compose, and DEPLOY.md covering install, build, migrate, run, and
log mapping/rotation. Pin engines.node>=20. Update deployment.md runbook
and backend/CLAUDE.md infra line off Cloud Run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 15:10:59 +08:00


Deployment notes

Operational decisions and dependency configuration for staging/production. Keep this updated as we make infra choices; cross-link from feature plans when a deploy-time setting matters.

Infrastructure summary

| Component | Service | Tier / Notes |
|---|---|---|
| Backend (public + internal) | Self-hosted Docker (VPS / Kubernetes / Docker Engine) | NOT Cloud Run. Container from backend/Dockerfile; horizontal scaling via replicas; SIGTERM trapped for graceful drain (server.js) |
| Database | GCP Cloud SQL (PostgreSQL) | Source of truth for all durable state |
| Pub/sub + cache | Valkey | Self-hosted on VM today; Memorystore Standard (HA) recommended for prod (see § Valkey) |
| Networking | GCP VPC | Internal listener (port 3001) never exposed; CC reaches it via VPN |
| Payment | Xendit | See phase5-xendit-plan.md for keys / webhook URL setup |
| Auth | Self-managed JWT + FCM-only Firebase | See backend/CLAUDE.md |
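The SIGTERM drain matters on every target listed here. docker stop sends SIGTERM and then SIGKILL after a grace period: 10 seconds by default, and Kubernetes defaults to 30. Below is a minimal JavaScript sketch of the drain bookkeeping. createDrainer, timeoutMs, and the begin/end hooks are hypothetical names, not the actual server.js code.

```javascript
// Hypothetical sketch of graceful-drain bookkeeping; not the real server.js implementation.
function createDrainer({ timeoutMs = 10000 } = {}) {
  let inFlight = 0;
  let draining = false;
  let onIdle = null;

  return {
    // Call when a request starts. Returns false once draining so the caller can answer 503.
    begin() {
      if (draining) return false;
      inFlight += 1;
      return true;
    },
    // Call when a request finishes, whether it succeeded or failed.
    end() {
      inFlight = Math.max(0, inFlight - 1);
      if (draining && inFlight === 0 && onIdle) onIdle();
    },
    // Stop admitting work; resolve "drained" once idle, or "timeout" if requests hang.
    shutdown() {
      draining = true;
      return new Promise((resolve) => {
        const timer = setTimeout(() => resolve("timeout"), timeoutMs);
        onIdle = () => {
          clearTimeout(timer);
          resolve("drained");
        };
        if (inFlight === 0) onIdle();
      });
    },
    get inFlight() {
      return inFlight;
    },
  };
}

// Example wiring (hypothetical):
// const drainer = createDrainer({ timeoutMs: 8000 });
// process.once("SIGTERM", async () => { await drainer.shutdown(); process.exit(0); });
```

Keep timeoutMs below the orchestrator's grace period. That way the process exits on its own instead of being killed mid-request.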

Valkey

Valkey is used for two distinct purposes:

  1. Pub/sub — cross-instance event fan-out (chat messages, session lifecycle, config invalidation). See backend/src/plugins/valkey.js.
  2. Availability mirror: mitras:online, mitras:deactivated, mitra:capacity:<id>, mitra:heartbeat:<id>, and availability:snapshot per valkey-online-mirror-plan.md. Postgres remains the durable source of truth; Valkey is the hot read path.

Persistence — required or optional?

Not required. All durable state lives in Postgres; Valkey is a cache + ephemeral liveness layer that fully rebuilds via seedFromPostgres() on backend reconnect.

What's actually in Valkey, and what happens if it's wiped:

| Key | Derivable from Postgres? | Cost of loss |
|---|---|---|
| mitras:online | yes | reseeded on reconnect |
| mitras:deactivated | yes | reseeded on reconnect |
| mitra:capacity:<id> | yes (COUNT(*) FROM chat_sessions) | reseeded on reconnect |
| mitra:heartbeat:<id> | no (pure transient liveness) | seed writes NOW; ≤ a few seconds of fuzz on last_heartbeat_at forensics |
| availability:snapshot | recomputable | next beacon poll repopulates |
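The reseed described above can be pictured as a pure function that turns Postgres rows into Valkey writes. Below is a minimal JavaScript sketch. The input shape, the command-tuple output, and the data types are assumptions: it treats mitras:online and mitras:deactivated as sets and everything else as plain strings. The real seedFromPostgres() may be structured differently.

```javascript
// Hypothetical sketch of a Postgres -> Valkey reseed plan. Key names come from this
// document; the data types (sets, plain strings) and input shape are assumptions.
function buildSeedPlan({ onlineIds, deactivatedIds, activeSessionCounts, nowMs }) {
  // Replace the sets wholesale so members left over from before the wipe disappear.
  const cmds = [["DEL", "mitras:online", "mitras:deactivated"]];

  // SADD with no members is an error, so only emit it when there is something to add.
  if (onlineIds.length > 0) cmds.push(["SADD", "mitras:online", ...onlineIds]);
  if (deactivatedIds.length > 0) cmds.push(["SADD", "mitras:deactivated", ...deactivatedIds]);

  for (const [id, count] of Object.entries(activeSessionCounts)) {
    // Derived from COUNT(*) FROM chat_sessions per mitra.
    cmds.push(["SET", `mitra:capacity:${id}`, String(count)]);
  }
  for (const id of onlineIds) {
    // Not derivable from Postgres: stamp "now" and let real heartbeats overwrite it.
    cmds.push(["SET", `mitra:heartbeat:${id}`, String(nowMs)]);
  }
  // availability:snapshot is intentionally skipped; the next beacon poll recomputes it.
  return cmds;
}
```

Between the DEL and the SADD, the sets are briefly empty. Sending the tuples inside MULTI/EXEC would close that window.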

Reader code in services/* has explicit Postgres fallbacks for every Valkey op, so the cold-cache window during a restart degrades performance, not correctness.
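Below is a minimal JavaScript sketch of that reader pattern. readWithFallback, fromValkey, and fromPostgres are hypothetical names, not the helpers in services/*. One caveat the sketch glosses over: for membership checks, an empty cold cache looks the same as "not a member", so a real reader needs another signal (such as key existence) to tell the two apart.

```javascript
// Hypothetical sketch: try the Valkey hot path, fall back to Postgres on error or miss.
// fromValkey and fromPostgres are caller-supplied async functions.
async function readWithFallback(fromValkey, fromPostgres, { onFallback = () => {} } = {}) {
  try {
    const cached = await fromValkey();
    // A miss is treated like an outage: Postgres is the source of truth.
    if (cached !== null && cached !== undefined) return cached;
    onFallback("miss");
  } catch (err) {
    // Valkey down or restarting: degrade to the slower but correct path.
    onFallback("error", err);
  }
  return fromPostgres();
}
```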

Persistence recommendation by environment

| Environment | Setting | Reason |
|---|---|---|
| Dev / local | No persistence (--save "" --appendonly no; the stock default also works, it only adds occasional RDB snapshots) | Restarts wipe state; reseed handles it cleanly; zero disk overhead |
| Staging | AOF on (--appendonly yes) | Verifies prod-like behavior; tiny disk cost |
| Production | AOF on, optionally RDB too (--appendonly yes --save 60 1000) | Eliminates cold-cache window after restart; trivial disk footprint (few MB) |

The application code is identical across all three — persistence is a deploy-time knob, not a code-level concern.

Self-hosted Valkey (current state, dev/staging)

Docker container on the existing VM. Reference config:

valkey:
  image: valkey/valkey:7-alpine
  command: valkey-server --appendonly yes --save 60 1000
  volumes:
    - valkey-data:/data
  ports:
    - "6379:6379"
  restart: unless-stopped

Backend reaches it via VALKEY_URL=redis://<vm-ip>:6379 in backend/.env (or backend/.env.staging / the container's environment).
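A malformed VALKEY_URL is easier to catch at startup than as a stream of reconnect errors. Below is a minimal JavaScript check using the WHATWG URL class built into Node. parseValkeyUrl is hypothetical, and the backend's real parsing in plugins/valkey.js may differ.

```javascript
// Hypothetical startup check for VALKEY_URL; the backend's real parsing may differ.
function parseValkeyUrl(raw) {
  if (!raw) throw new Error("VALKEY_URL is not set");
  const url = new URL(raw); // throws TypeError on malformed input such as "<vm-ip>"
  if (url.protocol !== "redis:" && url.protocol !== "rediss:") {
    throw new Error(`VALKEY_URL must use redis:// or rediss://, got ${url.protocol}`);
  }
  return {
    host: url.hostname,
    // Non-special schemes keep the port exactly as written; fall back to the Valkey default.
    port: url.port ? Number(url.port) : 6379,
    tls: url.protocol === "rediss:",
  };
}
```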

Memorystore migration (when going to prod)

The reseed-from-Postgres flow makes migration trivial — Valkey state is never load-bearing:

  1. Provision Memorystore for Valkey, Standard tier (HA with replica) in the same VPC + region as the backend hosts.
    • Smallest available size (~1 GB) is plenty; actual data footprint is well under 1 MB.
    • Cost: ~$50/month at minimum sizing in asia-southeast2.
  2. Update the backend's environment: VALKEY_URL=redis://<memorystore-internal-ip>:6379.
  3. Roll the backend replicas (rolling restart, or a Kubernetes rolling update) so new instances pick up the new VALKEY_URL and seed Memorystore from Postgres; old instances drain on old Valkey.
  4. Shut down old Valkey once traffic has migrated.

Zero downtime. No data migration needed (state is derivable). The cold-cache window on new instances is handled by the existing Postgres-fallback reader paths.

Tier choice rationale

| Tier | When to use | Failover behavior |
|---|---|---|
| Self-hosted Docker | Dev, staging | Manual restart; backend reseeds when Valkey comes back |
| Memorystore Basic | Cost-sensitive single-AZ staging | ~15 min outage per maintenance event; backend handles via Postgres fallback |
| Memorystore Standard (HA) | Production | ~30s automatic failover; replica keeps data live |

The system is correct on any tier — HA reduces customer-visible latency spikes during Valkey events from minutes to seconds.
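On every tier, recovery depends on the backend reseeding when Valkey comes back. Flapping connections can fire several reconnects in a row, so a single-flight guard keeps only one reseed running at a time. Below is a JavaScript sketch. createReseeder is a hypothetical name, and it assumes the Valkey client exposes some ready/reconnect event. The exact event depends on the client library in plugins/valkey.js, which this document does not name.

```javascript
// Hypothetical single-flight wrapper around the reseed: if Valkey reconnects several
// times in quick succession, only one seedFromPostgres() runs at a time.
function createReseeder(seedFromPostgres) {
  let running = null;
  return function onValkeyReady() {
    if (!running) {
      running = Promise.resolve()
        .then(() => seedFromPostgres())
        .finally(() => {
          running = null; // allow the next reconnect to trigger a fresh seed
        });
    }
    return running; // concurrent callers share the in-progress seed; errors propagate
  };
}
```

The trade-off is that a reconnect arriving mid-seed is coalesced into the current run. A stricter version would queue exactly one more run after it.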

Backend runtime (self-hosted Docker)

(Placeholder for prod tuning: fill in as we make decisions about region, replica count, per-container resource limits, and secrets wiring. Operational detail lives in backend/DEPLOY.md.)

Manual staging deploy runbook

Goal: stand up a staging backend so the Android staging flavor (com.mybestie.staging) has a real API_BASE_URL to talk to. Done manually for now (no CI/CD yet — see open ops).

Deploy target: self-hosted Docker (VPS / Kubernetes / Docker Engine) — not Cloud Run. The backend ships a multi-stage backend/Dockerfile (Node 20, non-root runtime, native bcrypt compiled in the build stage). Build with docker build -t halobestie-backend ./backend.

Full operational runbook — install Docker, build/push, migrate, run (Docker + Compose + k8s), and log mapping/rotation — lives in backend/DEPLOY.md. The steps below are the staging-bring-up summary.

A1 — Provision the staging database (Cloud SQL Postgres)

  1. Create a Cloud SQL Postgres instance (or a separate halobestie_staging DB on a shared instance). Pin the same region as the backend hosts.
  2. Capture its connection string for DATABASE_URL (private IP if the Docker hosts sit inside the VPC; otherwise connect through the Cloud SQL Auth Proxy).
  3. Run migrations + seed against it:
    cd backend
    DATABASE_URL=postgresql://... npm run db:migrate
    DATABASE_URL=postgresql://... npm run db:seed
    

A2 — Provision staging Valkey — self-hosted Docker on the VM is fine for staging (--appendonly yes, see § Valkey). Note the VALKEY_URL.

A3 — Staging Firebase Admin creds — the app's staging google-services.json / GoogleService-Info.plist point at Firebase project my-bestie-876ec. The backend's FIREBASE_SERVICE_ACCOUNT must be a service-account key from that same project, or FCM push + token verification will silently target the wrong project. Mount it as a secret and set FIREBASE_SERVICE_ACCOUNT_PATH (or switch to a Secret Manager mount).
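A cheap guard against the wrong-project mistake is to read the mounted key before boot. Its project_id, a standard field in Google service-account JSON keys, can then be compared with my-bestie-876ec. Below is a JavaScript sketch. checkServiceAccount and the point where you call it are assumptions, not existing backend code.

```javascript
// Hypothetical preflight: confirm the mounted service-account key belongs to the
// Firebase project the staging app is configured for.
const EXPECTED_FIREBASE_PROJECT = "my-bestie-876ec";

function checkServiceAccount(jsonText, expectedProject = EXPECTED_FIREBASE_PROJECT) {
  const sa = JSON.parse(jsonText);
  if (sa.type !== "service_account") {
    throw new Error(`expected a service-account key, got type=${sa.type}`);
  }
  if (sa.project_id !== expectedProject) {
    throw new Error(`service account is for ${sa.project_id}, app expects ${expectedProject}`);
  }
  return sa.client_email; // handy for a startup log line
}

// In the backend (hypothetical):
// checkServiceAccount(require("node:fs").readFileSync(process.env.FIREBASE_SERVICE_ACCOUNT_PATH, "utf8"));
```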

A4 — Build the image + run migrations, then start the container.

Build (on a build host or in CI), then push to your registry:

docker build -t <registry>/halobestie-backend:staging ./backend
docker push <registry>/halobestie-backend:staging

Run migrations as a one-off before (re)starting the service — never auto-migrate on boot (replica race):

docker run --rm --env-file backend/.env.staging \
  <registry>/halobestie-backend:staging node src/db/migrate.js
# first deploy only:
docker run --rm --env-file backend/.env.staging \
  <registry>/halobestie-backend:staging node src/db/seed.js
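Running migrations as a one-off keeps schema changes in exactly one process. If every replica migrated on boot, two replicas starting together could both try to apply the same migration.

If you also want replicas to notice a skipped migration step, one option is a read-only boot check. It compares the migration files bundled in the image with those recorded as applied, and refuses to become ready instead of applying them. Below is a JavaScript sketch. The helper names and file names are hypothetical, and this document does not describe how src/db/migrate.js records applied migrations or orders them.

```javascript
// Hypothetical read-only boot check: report pending migrations, never apply them.
function pendingMigrations(bundled, applied) {
  const done = new Set(applied);
  // Lexical order is an assumption about how migrate.js sequences files.
  return [...bundled].sort().filter((name) => !done.has(name));
}

function assertSchemaCurrent(bundled, applied) {
  const pending = pendingMigrations(bundled, applied);
  if (pending.length > 0) {
    throw new Error(
      `pending migrations: ${pending.join(", ")}; run src/db/migrate.js as a one-off first`
    );
  }
}
```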

Run the service (plain Docker Engine example; k8s = Deployment + Service with the same env/secrets and liveness/readiness probes on :3000):

docker run -d --name halobestie-staging \
  --env-file backend/.env.staging \
  -p 3000:3000 \
  -v /path/to/firebase-sa.json:/secrets/firebase-sa.json:ro \
  --restart unless-stopped \
  <registry>/halobestie-backend:staging
  • Publish only port 3000. The internal listener (3001) stays bound to 127.0.0.1 inside the container — do not map it.
  • FIREBASE_SERVICE_ACCOUNT_PATH must point at the mounted path (e.g. /secrets/firebase-sa.json), not a baked-in file.
  • Put a TLS-terminating reverse proxy (Nginx / Traefik / Caddy) in front for https://staging-api.halobestie.com.
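Publishing only 3000 is enough because of how Docker forwards ports. Published ports go to the container's network interface, so a listener bound to 127.0.0.1 inside the container cannot be reached through -p at all. The public listener, by contrast, must bind 0.0.0.0. Below is a sketch with Node's built-in node:http module. The PORT / INTERNAL_PORT variable names and the handler map are hypothetical, not taken from server.js.

```javascript
const http = require("node:http");

// Bind addresses per listener. 0.0.0.0 lets Docker's -p 3000:3000 reach the public
// listener; 127.0.0.1 keeps the internal listener unreachable from outside the container.
function listenerPlan(env = process.env) {
  return [
    { name: "public", host: "0.0.0.0", port: Number(env.PORT || 3000) },
    { name: "internal", host: "127.0.0.1", port: Number(env.INTERNAL_PORT || 3001) },
  ];
}

// handlers is a map like { public: (req, res) => ..., internal: (req, res) => ... }.
function startListeners(handlers, env = process.env) {
  return listenerPlan(env).map(({ name, host, port }) => {
    const server = http.createServer(handlers[name]);
    server.listen(port, host, () => console.log(`${name} listener on ${host}:${port}`));
    return server;
  });
}
```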

Staging-specific env values (backend/.env.staging; see backend/.env.example for the full list):

| Var | Staging value |
|---|---|
| AUTH_JWT_SECRET | a fresh secret, not the prod one |
| XENDIT_ENABLED | false until you wire test-mode keys + webhook |
| XENDIT_SECRET_KEY / XENDIT_WEBHOOK_TOKEN | Xendit test credentials |
| XENDIT_SUCCESS/FAILURE_REDIRECT_URL | staging backend's /payment/return/* URLs |
| FAZPASS_ENABLED | false (test-user OTP bypass path) unless testing real OTP |
| CC_ORIGIN | staging control-center origin (if deployed) |
| ADMIN_EMAIL / ADMIN_PASSWORD | staging control-center login |
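It can help to lint backend/.env.staging before passing it to docker run --env-file. Below is a rough JavaScript sketch. The parser follows the simple rules --env-file uses: one KEY=VALUE per line, lines starting with # ignored, quotes kept literally. The checks cover only the rules in the table above plus the FIREBASE_SERVICE_ACCOUNT_PATH note. Both helper names are hypothetical.

```javascript
// Hypothetical parser approximating docker run --env-file: KEY=VALUE per line,
// "#" comment lines and blank lines skipped, values kept verbatim (quotes included).
function parseEnvFile(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trimStart();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // Docker would copy the host's value here; ignored in this sketch
    env[line.slice(0, eq)] = line.slice(eq + 1);
  }
  return env;
}

// Hypothetical staging lint. It cannot check "not the prod secret" without the prod value.
function stagingEnvProblems(env) {
  const problems = [];
  if (!env.AUTH_JWT_SECRET) problems.push("AUTH_JWT_SECRET is empty");
  if (env.XENDIT_ENABLED === "true") {
    for (const key of ["XENDIT_SECRET_KEY", "XENDIT_WEBHOOK_TOKEN"]) {
      if (!env[key]) problems.push(`${key} is required when XENDIT_ENABLED=true`);
    }
  }
  if (!env.FIREBASE_SERVICE_ACCOUNT_PATH) problems.push("FIREBASE_SERVICE_ACCOUNT_PATH is empty");
  return problems;
}
```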

Public listener only. The internal listener (port 3001, control center) must stay off the public internet; don't publish it from this container or host. CC for staging, if needed, goes behind the VPC/VPN per the root architecture rules.

A5 — Capture the URL. Point a DNS record (e.g. staging-api.halobestie.com) at the host/reverse proxy and terminate TLS there. This HTTPS URL is the value the app needs in Phase B.

App handoff (Phase B) — once A5 gives a URL

  1. Put the real URL in client_app/env/staging.json + mitra_app/env/staging.json (API_BASE_URL), and remove the _TODO key from the client file.
  2. Build the staging APK:
    cd client_app
    flutter build apk --flavor staging -t lib/main_staging.dart --dart-define-from-file=env/staging.json
    
    Output: build/app/outputs/flutter-apk/app-staging-release.apk.
  3. Distribute via Firebase App Distribution (debug-signed APK is accepted — no upload keystore needed for staging) or share the APK directly. com.mybestie.staging installs side-by-side with prod.
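A quick pre-build check for step 1 can be a few lines of JavaScript run with Node against each staging.json. stagingJsonProblems is a hypothetical helper. Only the API_BASE_URL and _TODO keys come from this runbook; the HTTPS requirement follows from A5.

```javascript
// Hypothetical pre-build check for client_app/env/staging.json and mitra_app/env/staging.json.
function stagingJsonProblems(jsonText) {
  const cfg = JSON.parse(jsonText);
  const problems = [];
  if ("_TODO" in cfg) problems.push("remove the _TODO key");
  let url;
  try {
    url = new URL(cfg.API_BASE_URL); // throws if missing or malformed
  } catch {
    problems.push("API_BASE_URL is missing or not a valid URL");
    return problems;
  }
  // A5 terminates TLS at the reverse proxy, so the app should talk HTTPS.
  if (url.protocol !== "https:") problems.push("API_BASE_URL must be https");
  return problems;
}
```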

Release signing is still debug keys (client_app/android/app/build.gradle.kts release { ... }). Fine for Firebase App Distribution / direct APK. A real upload keystore is only required if you later publish staging to a Play Store internal-testing track. iOS staging is not wired yet (only one Runner.xcscheme — no per-flavor schemes/build-configs).

Cloud SQL

(Placeholder — pool size, machine type, HA flag, backup retention.)

Xendit

See phase5-xendit-plan.md for credential setup and webhook URL configuration. Stage 8 (live E2E) is currently blocked on test-mode keys.

Open ops decisions

  • Confirm Memorystore Standard tier for prod deploy (recommended in § Valkey).
  • Pin GCP region for backend + Cloud SQL + Memorystore (all must match for sub-ms internal latency).
  • Secrets manager (GCP Secret Manager vs plain env files / Docker or Kubernetes secrets) for AUTH_JWT_SECRET, XENDIT_SECRET_KEY, etc.
  • Backup retention policy for Cloud SQL.
  • CI/CD pipeline for building/pushing the backend image and rolling it out to the Docker hosts.