diff --git a/backend/.dockerignore b/backend/.dockerignore
new file mode 100644
index 0000000..9f7cfe7
--- /dev/null
+++ b/backend/.dockerignore
@@ -0,0 +1,22 @@
+# Deps are reinstalled inside the image via `npm ci`
+node_modules
+npm-debug.log*
+
+# Secrets / local env — mounted at runtime, never baked in
+.env
+.env.*
+!.env.example
+
+# VCS + tooling
+.git
+.gitignore
+.vscode
+.idea
+
+# Tests, coverage, and Compose files (not needed in the runtime image)
+coverage
+docker-compose*.yml
+**/*.test.js
+
+# Docs
+*.md
diff --git a/backend/CLAUDE.md b/backend/CLAUDE.md
index f5ab271..a534bf8 100644
--- a/backend/CLAUDE.md
+++ b/backend/CLAUDE.md
@@ -10,7 +10,7 @@ Fastify.js REST API serving both mobile apps and the internal control center.
 - **Database:** PostgreSQL via GCP Cloud SQL
 - **Auth:** Self-managed JWT (HS256 access, 1h) + opaque refresh token (30d, rotated, bcrypt-hashed in `auth_sessions`). Firebase Auth removed in Phase 3.4 (commit `f860ab6`). `firebase-admin` is kept but only for FCM messaging.
 - **Payment:** Xendit
-- **Infra:** GCP Cloud Run
+- **Infra:** Self-hosted Docker (VPS / Kubernetes / Docker Engine) — **not** Cloud Run. Multi-stage [Dockerfile](Dockerfile); deploy + log runbook in [DEPLOY.md](DEPLOY.md). DB is PostgreSQL (managed or self-hosted).
 
 ## Two Listeners
 
diff --git a/backend/DEPLOY.md b/backend/DEPLOY.md
new file mode 100644
index 0000000..235eda5
--- /dev/null
+++ b/backend/DEPLOY.md
@@ -0,0 +1,209 @@
+# Backend — Docker Deployment Guide
+
+Operational guide for building, deploying, and observing the Halo Bestie backend as a **Docker container** on self-hosted infra (VPS, Docker Engine, or Kubernetes). **Not** Cloud Run / serverless.
+
+> Architecture / env-var reference: [../requirement/deployment.md](../requirement/deployment.md) · [.env.example](.env.example)
+
+---
+
+## 1. What gets deployed
+
+A single image (multi-stage [Dockerfile](Dockerfile)) running `node src/server.js`, which starts **two listeners** ([src/server.js](src/server.js)):
+
+| Listener | Bind | Port | Exposed? |
+|---|---|---|---|
+| Public API (`client_app` + `mitra_app`) | `0.0.0.0` | `PUBLIC_PORT` (default **3000**) | **Yes** — publish this |
+| Internal API (control center) | `INTERNAL_HOST` (default `127.0.0.1`) | `INTERNAL_PORT` (default 3001) | **No** — loopback only, never publish |
+
+Runtime image: Node 20 (bookworm-slim), prod-only deps, runs as non-root `node`, native `bcrypt` precompiled in the build stage.
+
+---
+
+## 2. Install Docker (one-time, on the host)
+
+### Ubuntu / Debian VPS
+```bash
+# Install Docker Engine with Docker's official convenience script
+curl -fsSL https://get.docker.com | sh
+
+# Run docker as your user without sudo (log out/in afterward)
+sudo usermod -aG docker "$USER"
+
+# Verify
+docker version
+docker compose version   # Compose v2 ships as a plugin with modern Docker
+```
+
+### Kubernetes
+No Docker Engine needed on nodes — just push the image to a registry your cluster can pull from (see §4) and apply your manifests (§7).
+
+---
+
+## 3. Configure environment
+
+Create `backend/.env.staging` (or `.env.production`) from the template — **never commit it**:
+```bash
+cp .env.example .env.staging
+```
+Fill in at minimum (full list in [.env.example](.env.example)):
+
+| Var | Notes |
+|---|---|
+| `PUBLIC_PORT` | `3000` (keep default unless your proxy expects otherwise) |
+| `INTERNAL_HOST` / `INTERNAL_PORT` | leave default `127.0.0.1:3001` — keeps control center private |
+| `DATABASE_URL` | Postgres connection string |
+| `VALKEY_URL` | `redis://:6379` |
+| `SERVER_TZ` | `UTC` |
+| `AUTH_JWT_SECRET` | **fresh per environment** — never reuse prod's |
+| `FIREBASE_SERVICE_ACCOUNT_PATH` | path to the **mounted** SA JSON, e.g. `/secrets/firebase-sa.json`. Must be from the env's Firebase project (staging = `my-bestie-876ec`) |
+| `XENDIT_ENABLED` | `false` until test keys + webhook are wired |
+| `CC_ORIGIN`, `ADMIN_EMAIL`, `ADMIN_PASSWORD` | control-center access |
+
+> Secrets (`.env`, the Firebase SA JSON) are provided at **runtime** via `--env-file` / volume mounts / k8s Secrets. They are **not** baked into the image (`.dockerignore` excludes `.env*`).
+
+---
+
+## 4. Build & push the image
+
+```bash
+# From the repo root
+docker build -t /halobestie-backend:staging ./backend
+
+# Push to your registry (Docker Hub, GHCR, GCP Artifact Registry, self-hosted, …)
+docker push /halobestie-backend:staging
+```
+For a purely single-host setup you can skip the registry and build directly on the host.
+
+---
+
+## 5. Run database migrations (one-off)
+
+Run **before** (re)starting the service. Never auto-migrate on container boot — concurrent replicas would race.
+```bash
+# Migrate (every deploy that includes new migrations)
+docker run --rm --env-file backend/.env.staging \
+  /halobestie-backend:staging node src/db/migrate.js
+
+# Seed (first deploy only)
+docker run --rm --env-file backend/.env.staging \
+  /halobestie-backend:staging node src/db/seed.js
+```
+
+---
+
+## 6. Deploy — plain Docker Engine
+
+```bash
+docker run -d --name halobestie-staging \
+  --env-file backend/.env.staging \
+  -p 3000:3000 \
+  -v /opt/halobestie/secrets/firebase-sa.json:/secrets/firebase-sa.json:ro \
+  --restart unless-stopped \
+  --log-driver json-file --log-opt max-size=10m --log-opt max-file=5 \
+  /halobestie-backend:staging
+```
+- Publish **only** `3000`. Do **not** map `3001`.
+- `--log-opt` enables log rotation — see §8.
+- Put a TLS-terminating reverse proxy (Nginx / Traefik / Caddy) in front for `https://staging-api.halobestie.com`. WebSocket upgrade must be proxied (the apps use `/api/shared/ws`).
+
+### Or with Docker Compose
+A ready-to-use [docker-compose.staging.yml](docker-compose.staging.yml) is included (backend only — Postgres/Valkey are expected via `DATABASE_URL`/`VALKEY_URL`). It publishes only `3000`, mounts the Firebase SA + log volume, and sets json-file rotation. Point it at your image via the `BACKEND_IMAGE` env var (or uncomment `build: .` to build on the host):
+```bash
+cd backend
+cp .env.example .env.staging   # then fill it in
+BACKEND_IMAGE=/halobestie-backend:staging \
+  docker compose -f docker-compose.staging.yml up -d
+```
+
+---
+
+## 7. Deploy — Kubernetes (sketch)
+
+- **Deployment** with the image, `envFrom` a Secret/ConfigMap, the Firebase SA JSON mounted from a Secret volume at `/secrets/firebase-sa.json`, and container liveness/readiness probes using `tcpSocket: { port: 3000 }` (there is no HTTP health route; a TCP probe matches the Dockerfile HEALTHCHECK).
+- **Service** exposing only container port `3000`.
+- Migrations: a one-off **Job** (`command: ["node","src/db/migrate.js"]`) run before rolling out, not an initContainer on every pod.
+- Logs: pods write to stdout (§8) — your cluster's node agent (Fluent Bit / Loki / Cloud Logging) collects them automatically.
+
+---
+
+## 8. Logs — where they go and how to map them
+
+### 8a. Application logs → **stdout/stderr** (the Docker-native way)
+The backend uses Fastify's pino logger (`logger: true` in [src/app.public.js](src/app.public.js) / [src/app.internal.js](src/app.internal.js)), emitting **structured JSON to stdout**, plus a few `console.log` lifecycle lines. It does **not** write its own app-log files. So:
+
+```bash
+# Tail live
+docker logs -f halobestie-staging
+
+# Last 200 lines
+docker logs --tail 200 halobestie-staging
+
+# Pretty-print the JSON (pino output is one JSON object per line)
+docker logs -f halobestie-staging | jq .
+
+# Compose equivalent
+docker compose -f docker-compose.staging.yml logs -f backend
+```
+
+**Rotation (important — default json-file logs grow unbounded):**
+set it per-container as in §6 (`--log-opt max-size=10m --log-opt max-file=5`), or globally in `/etc/docker/daemon.json`:
+```json
+{
+  "log-driver": "json-file",
+  "log-opts": { "max-size": "10m", "max-file": "5" }
+}
+```
+then `sudo systemctl restart docker`.
+
+**Ship logs off-host (optional):** point Docker at a log driver instead of/alongside json-file — e.g. `--log-driver=loki`, `--log-driver=fluentd`, `--log-driver=syslog`, or `--log-driver=gcplogs`. On k8s, stdout is collected by the node logging agent; no per-container config needed.
+
+**Persist raw stdout to a host file (simple VPS option):**
+```bash
+# `docker run -d` prints only the container ID, so follow `docker logs` into the file instead
+docker logs -f halobestie-staging >> /var/log/halobestie/backend.log 2>&1 &
+# better: use the json-file driver (above) and read /var/lib/docker/containers//-json.log,
+# or redirect via your reverse proxy / a sidecar. Prefer a real log driver for rotation.
+```
+
+### 8b. Xendit webhook fallback JSONL → **needs a volume** (only if enabled)
+The one component that writes a **file** is the optional webhook fallback sink ([src/services/webhook-log.service.js](src/services/webhook-log.service.js)), **off by default**. When `XENDIT_WEBHOOK_FALLBACK_ENABLED=true`, it writes rolling JSONL to `XENDIT_WEBHOOK_FALLBACK_DIR` (default `./logs` → `/app/logs` in the container). To keep those across restarts, **mount a volume**:
+```bash
+docker run -d ... \
+  -e XENDIT_WEBHOOK_FALLBACK_ENABLED=true \
+  -e XENDIT_WEBHOOK_FALLBACK_DIR=/app/logs \
+  -v /opt/halobestie/logs:/app/logs \
+
+```
+(The Compose example in §6 already declares the `backend-logs` volume for this.) If the fallback stays disabled, you don't need this volume — everything is on stdout.
+
+---
+
+## 9. Health, upgrade, rollback
+
+```bash
+# Health — the image has a built-in TCP HEALTHCHECK; check it:
+docker inspect --format '{{.State.Health.Status}}' halobestie-staging
+
+# Upgrade to a new image
+docker pull /halobestie-backend:staging
+docker run ... node src/db/migrate.js            # if new migrations
+docker stop halobestie-staging && docker rm halobestie-staging
+docker run -d --name halobestie-staging ...      # re-run with the new image
+
+# Rollback = re-run the previous tag/digest. Graceful shutdown is handled:
+# server.js traps SIGTERM and drains the listeners before exit.
+```
+
+---
+
+## 10. Quick reference
+
+| Task | Command |
+|---|---|
+| Build | `docker build -t ./backend` |
+| Migrate | `docker run --rm --env-file .env.staging node src/db/migrate.js` |
+| Run | `docker run -d --name halobestie-staging --env-file .env.staging -p 3000:3000 --restart unless-stopped ` |
+| Logs (live) | `docker logs -f halobestie-staging` |
+| Logs (pretty) | `docker logs -f halobestie-staging \| jq .` |
+| Health | `docker inspect --format '{{.State.Health.Status}}' halobestie-staging` |
+| Shell in | `docker exec -it halobestie-staging sh` |
diff --git a/backend/Dockerfile b/backend/Dockerfile
new file mode 100644
index 0000000..1cf6f24
--- /dev/null
+++ b/backend/Dockerfile
@@ -0,0 +1,44 @@
+# syntax=docker/dockerfile:1
+
+# ---------------------------------------------------------------------------
+# Stage 1 — builder: install production deps, compiling native addons (bcrypt)
+# ---------------------------------------------------------------------------
+FROM node:20-bookworm-slim AS builder
+WORKDIR /app
+
+# Toolchain required to compile native modules (bcrypt) when no prebuilt
+# binary matches the platform. Lives only in this stage.
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    python3 make g++ \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install against the lockfile for reproducible builds. Copy manifests first
+# so this layer caches until deps actually change.
+COPY package.json package-lock.json ./
+RUN npm ci --omit=dev
+
+# ---------------------------------------------------------------------------
+# Stage 2 — runtime: slim image with only prod node_modules + app source
+# ---------------------------------------------------------------------------
+FROM node:20-bookworm-slim AS runtime
+ENV NODE_ENV=production
+WORKDIR /app
+
+# Compiled node_modules (same base image → ABI-compatible bcrypt .node binary).
+COPY --from=builder /app/node_modules ./node_modules
+COPY package.json ./
+COPY src ./src
+
+# Drop privileges — node:* images ship a non-root `node` user.
+USER node
+
+# Public listener only. INTERNAL_PORT (3001) binds to 127.0.0.1 inside the
+# container by default and is intentionally NOT published.
+EXPOSE 3000
+
+# No HTTP health route exists — probe the TCP port directly so the check is
+# route-agnostic. Orchestrators (k8s) can override with their own probes.
+HEALTHCHECK --interval=30s --timeout=3s --start-period=20s --retries=3 \
+  CMD node -e "require('net').connect({port: process.env.PUBLIC_PORT||3000, host:'127.0.0.1'}).on('connect',()=>process.exit(0)).on('error',()=>process.exit(1))"
+
+CMD ["node", "src/server.js"]
diff --git a/backend/docker-compose.staging.yml b/backend/docker-compose.staging.yml
new file mode 100644
index 0000000..168659a
--- /dev/null
+++ b/backend/docker-compose.staging.yml
@@ -0,0 +1,39 @@
+# Staging deploy for the Halo Bestie backend (self-hosted Docker).
+# Usage:
+#   cd backend
+#   docker compose -f docker-compose.staging.yml up -d
+#
+# Prereqs: a populated .env.staging (cp .env.example .env.staging) and the
+# Firebase service-account JSON at the mounted host path below. See DEPLOY.md.
+#
+# This runs ONLY the backend. Postgres + Valkey are expected to be reachable
+# via DATABASE_URL / VALKEY_URL in .env.staging (managed/self-hosted elsewhere).
+# TLS termination + the public hostname are handled by a reverse proxy in front.
+
+services:
+  backend:
+    image: ${BACKEND_IMAGE:-halobestie-backend:staging}
+    # To build on the host instead of pulling a pushed image, comment out
+    # `image:` above and uncomment:
+    # build: .
+    container_name: halobestie-staging
+    env_file: .env.staging
+    ports:
+      - "3000:3000"   # public listener only — never publish 3001
+    volumes:
+      # Firebase service-account JSON (must match the env's Firebase project,
+      # staging = my-bestie-876ec). FIREBASE_SERVICE_ACCOUNT_PATH in .env.staging
+      # must equal the in-container path on the right.
+      - /opt/halobestie/secrets/firebase-sa.json:/secrets/firebase-sa.json:ro
+      # Optional: only needed if XENDIT_WEBHOOK_FALLBACK_ENABLED=true (writes
+      # rolling JSONL to /app/logs). App logs themselves go to stdout — see DEPLOY.md §8.
+      - backend-logs:/app/logs
+    restart: unless-stopped
+    logging:
+      driver: json-file
+      options:
+        max-size: "10m"
+        max-file: "5"
+
+volumes:
+  backend-logs:
diff --git a/backend/package.json b/backend/package.json
index a2a61bc..5d065e5 100644
--- a/backend/package.json
+++ b/backend/package.json
@@ -4,6 +4,9 @@
   "description": "Halo Bestie backend API",
   "main": "src/server.js",
   "type": "module",
+  "engines": {
+    "node": ">=20"
+  },
   "scripts": {
     "dev": "node --watch src/server.js",
     "start": "node src/server.js",
diff --git a/requirement/deployment.md b/requirement/deployment.md
index df18cb9..0e116a7 100644
--- a/requirement/deployment.md
+++ b/requirement/deployment.md
@@ -6,7 +6,7 @@ Operational decisions and dependency configuration for staging/production. Keep
 
 | Component | Service | Tier / Notes |
 |---|---|---|
-| Backend (public + internal) | GCP Cloud Run | Horizontal scaling; SIGTERM trapped for graceful drain ([server.js](../backend/src/server.js)) |
+| Backend (public + internal) | Self-hosted Docker (VPS / Kubernetes / Docker Engine) | NOT Cloud Run. Container from [backend/Dockerfile](../backend/Dockerfile); horizontal scaling via replicas; SIGTERM trapped for graceful drain ([server.js](../backend/src/server.js)) |
 | Database | GCP Cloud SQL (PostgreSQL) | Source of truth for all durable state |
 | Pub/sub + cache | Valkey | Self-hosted on VM today; Memorystore Standard (HA) recommended for prod (see [§ Valkey](#valkey)) |
 | Networking | GCP VPC | Internal listener (port 3001) never exposed; CC reaches it via VPN |
@@ -88,7 +88,86 @@ The system is correct on any tier — HA reduces customer-visible latency spikes
 
 ## Cloud Run
 
-(Placeholder — fill in as we make decisions about region, min/max instances, concurrency, secrets manager wiring.)
+(Placeholder for prod tuning — fill in as we make decisions about region, min/max instances, concurrency, secrets manager wiring.)
+
+### Manual staging deploy runbook
+
+Goal: stand up a staging backend so the Android **staging** flavor (`com.mybestie.staging`) has a real `API_BASE_URL` to talk to. Done manually for now (no CI/CD yet — see open ops).
+
+> **Deploy target: self-hosted Docker** (VPS / Kubernetes / Docker Engine) — not Cloud Run. The backend ships a multi-stage [backend/Dockerfile](../backend/Dockerfile) (Node 20, non-root runtime, native `bcrypt` compiled in the build stage). Build with `docker build -t halobestie-backend ./backend`.
+>
+> **Full operational runbook — install Docker, build/push, migrate, run (Docker + Compose + k8s), and log mapping/rotation — lives in [backend/DEPLOY.md](../backend/DEPLOY.md).** The steps below are the staging-bring-up summary.
+
+**A1 — Provision the staging database (Cloud SQL Postgres)**
+1. Create a Cloud SQL Postgres instance (or a separate `halobestie_staging` DB on a shared instance). Pin the **same region** as the Docker host.
+2. Capture its connection string for `DATABASE_URL` (connect from the Docker host over private IP on the VPC, or through the Cloud SQL Auth Proxy).
+3. Run migrations + seed against it:
+   ```bash
+   cd backend
+   DATABASE_URL=postgresql://... npm run db:migrate
+   DATABASE_URL=postgresql://... npm run db:seed
+   ```
+
+**A2 — Provision staging Valkey** — self-hosted Docker on the VM is fine for staging (`--appendonly yes`, see [§ Valkey](#valkey)). Note the `VALKEY_URL`.
+
+**A3 — Staging Firebase Admin creds** — the app's staging `google-services.json` / `GoogleService-Info.plist` point at Firebase project **`my-bestie-876ec`**. The backend's Firebase service account **must be a key from that same project**, or FCM push will silently target the wrong project. Mount it as a secret and set `FIREBASE_SERVICE_ACCOUNT_PATH` (or switch to a Secret Manager mount).
+
+**A4 — Build the image + run migrations, then start the container.**
+
+Build (on a build host or in CI), then push to your registry:
+```bash
+docker build -t /halobestie-backend:staging ./backend
+docker push /halobestie-backend:staging
+```
+
+Run migrations as a **one-off** before (re)starting the service — never auto-migrate on boot (replica race):
+```bash
+docker run --rm --env-file backend/.env.staging \
+  /halobestie-backend:staging node src/db/migrate.js
+# first deploy only:
+docker run --rm --env-file backend/.env.staging \
+  /halobestie-backend:staging node src/db/seed.js
+```
+
+Run the service (plain Docker Engine example; k8s = Deployment + Service with the same env/secrets and liveness/readiness probes on `:3000`):
+```bash
+docker run -d --name halobestie-staging \
+  --env-file backend/.env.staging \
+  -p 3000:3000 \
+  -v /path/to/firebase-sa.json:/secrets/firebase-sa.json:ro \
+  --restart unless-stopped \
+  /halobestie-backend:staging
+```
+- Publish **only** port 3000. The internal listener (3001) stays bound to `127.0.0.1` inside the container — do not map it.
+- `FIREBASE_SERVICE_ACCOUNT_PATH` must point at the mounted path (e.g. `/secrets/firebase-sa.json`), not a baked-in file.
+- Put a TLS-terminating reverse proxy (Nginx / Traefik / Caddy) in front for `https://staging-api.halobestie.com`.
+
+Staging-specific env values (`backend/.env.staging`; see [backend/.env.example](../backend/.env.example) for the full list):
+| Var | Staging value |
+|---|---|
+| `AUTH_JWT_SECRET` | a fresh secret — **not** the prod one |
+| `XENDIT_ENABLED` | `false` until you wire test-mode keys + webhook |
+| `XENDIT_SECRET_KEY` / `XENDIT_WEBHOOK_TOKEN` | Xendit **test** credentials |
+| `XENDIT_SUCCESS/FAILURE_REDIRECT_URL` | staging backend's `/payment/return/*` URLs |
+| `FAZPASS_ENABLED` | `false` (test-user OTP bypass path) unless testing real OTP |
+| `CC_ORIGIN` | staging control-center origin (if deployed) |
+| `ADMIN_EMAIL` / `ADMIN_PASSWORD` | staging control-center login |
+
+> **Public listener only.** The internal listener (port 3001, control center) must stay off the public internet. Do not publish it from the Docker host. CC for staging, if needed, goes behind the VPC/VPN per the root architecture rules.
+
+**A5 — Capture the URL.** Point a DNS record (e.g. `staging-api.halobestie.com`) at the host/reverse proxy and terminate TLS there. **This HTTPS URL is the value the app needs** in Phase B.
+
+### App handoff (Phase B) — once A5 gives a URL
+1. Put the real URL in [`client_app/env/staging.json`](../client_app/env/staging.json) + [`mitra_app/env/staging.json`](../mitra_app/env/staging.json) (`API_BASE_URL`), and remove the `_TODO` key from the client file.
+2. Build the staging APK:
+   ```bash
+   cd client_app
+   flutter build apk --flavor staging -t lib/main_staging.dart --dart-define-from-file=env/staging.json
+   ```
+   Output: `build/app/outputs/flutter-apk/app-staging-release.apk`.
+3. Distribute via **Firebase App Distribution** (debug-signed APK is accepted — no upload keystore needed for staging) or share the APK directly. `com.mybestie.staging` installs side-by-side with prod.
+
+> **Release signing is still debug keys** ([client_app/android/app/build.gradle.kts](../client_app/android/app/build.gradle.kts) `release { ... }`). Fine for Firebase App Distribution / direct APK. A real upload keystore is only required if you later publish staging to a Play Store internal-testing track. iOS staging is **not** wired yet (only one `Runner.xcscheme` — no per-flavor schemes/build-configs).
 
 ## Cloud SQL
 