build(backend): Dockerize for self-hosted deploy + deploy/log docs

Backend deploy target is self-hosted Docker (VPS / Kubernetes / Docker Engine), not Cloud Run. Add a multi-stage Dockerfile (Node 20, bcrypt compiled in build stage, non-root runtime), .dockerignore, a staging docker-compose, and DEPLOY.md covering install, build, migrate, run, and log mapping/rotation. Pin engines.node>=20. Update deployment.md runbook and backend/CLAUDE.md infra line off Cloud Run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 15:10:59 +08:00
parent be20eee16b
commit 91bdbd5289
7 changed files with 399 additions and 3 deletions
--- a/backend/.dockerignore
+++ b/backend/.dockerignore
@@ -0,0 +1,22 @@
+# Deps are reinstalled inside the image via `npm ci`
+node_modules
+npm-debug.log*
+
+# Secrets / local env — mounted at runtime, never baked in
+.env
+.env.*
+!.env.example
+
+# VCS + tooling
+.git
+.gitignore
+.vscode
+.idea
+
+# Tests, coverage + Compose files (not needed in the runtime image)
+coverage
+docker-compose*.yml
+**/*.test.js
+
+# Docs
+*.md
--- a/backend/CLAUDE.md
+++ b/backend/CLAUDE.md
@@ -10,7 +10,7 @@ Fastify.js REST API serving both mobile apps and the internal control center.
 - **Database:** PostgreSQL via GCP Cloud SQL
 - **Auth:** Self-managed JWT (HS256 access, 1h) + opaque refresh token (30d, rotated, bcrypt-hashed in `auth_sessions`). Firebase Auth removed in Phase 3.4 (commit `f860ab6`). `firebase-admin` is kept but only for FCM messaging.
 - **Payment:** Xendit
- **Infra:** GCP Cloud Run
+- **Infra:** Self-hosted Docker (VPS / Kubernetes / Docker Engine) — **not** Cloud Run. Multi-stage [Dockerfile](Dockerfile); deploy + log runbook in [DEPLOY.md](DEPLOY.md). DB is PostgreSQL (managed or self-hosted).

 ## Two Listeners

--- a/backend/DEPLOY.md
+++ b/backend/DEPLOY.md
@@ -0,0 +1,209 @@
+# Backend — Docker Deployment Guide
+
+Operational guide for building, deploying, and observing the Halo Bestie backend as a **Docker container** on self-hosted infra (VPS, Docker Engine, or Kubernetes). **Not** Cloud Run / serverless.
+
+> Architecture / env-var reference: [../requirement/deployment.md](../requirement/deployment.md) · [.env.example](.env.example)
+
+---
+
+## 1. What gets deployed
+
+A single image (multi-stage [Dockerfile](Dockerfile)) running `node src/server.js`, which starts **two listeners** ([src/server.js](src/server.js)):
+
+| Listener | Bind | Port | Exposed? |
+|---|---|---|---|
+| Public API (`client_app` + `mitra_app`) | `0.0.0.0` | `PUBLIC_PORT` (default **3000**) | **Yes** — publish this |
+| Internal API (control center) | `INTERNAL_HOST` (default `127.0.0.1`) | `INTERNAL_PORT` (default 3001) | **No** — loopback only, never publish |
+
+Runtime image: Node 20 (bookworm-slim), prod-only deps, runs as non-root `node`, native `bcrypt` compiled in the build stage.
+
+---
+
+## 2. Install Docker (one-time, on the host)
+
+### Ubuntu / Debian VPS
+```bash
+# Install Docker Engine with Docker's official convenience script
+curl -fsSL https://get.docker.com | sh
+
+# Run docker as your user without sudo (log out/in afterward)
+sudo usermod -aG docker "$USER"
+
+# Verify
+docker version
+docker compose version   # Compose v2 ships as a plugin with modern Docker
+```
+
+### Kubernetes
+No Docker Engine needed on nodes — just push the image to a registry your cluster can pull from (see §4) and apply your manifests (§7).
+
+---
+
+## 3. Configure environment
+
+Create `backend/.env.staging` (or `.env.production`) from the template — **never commit it**:
+```bash
+cp backend/.env.example backend/.env.staging   # from the repo root, as in §4 and §5
+```
+Fill in at minimum (full list in [.env.example](.env.example)):
+
+| Var | Notes |
+|---|---|
+| `PUBLIC_PORT` | `3000` (keep default unless your proxy expects otherwise) |
+| `INTERNAL_HOST` / `INTERNAL_PORT` | leave default `127.0.0.1:3001` — keeps control center private |
+| `DATABASE_URL` | Postgres connection string |
+| `VALKEY_URL` | `redis://<host>:6379` |
+| `SERVER_TZ` | `UTC` |
+| `AUTH_JWT_SECRET` | **fresh per environment** — never reuse prod's |
+| `FIREBASE_SERVICE_ACCOUNT_PATH` | path to the **mounted** SA JSON, e.g. `/secrets/firebase-sa.json`. Must be from the env's Firebase project (staging = `my-bestie-876ec`) |
+| `XENDIT_ENABLED` | `false` until test keys + webhook are wired |
+| `CC_ORIGIN`, `ADMIN_EMAIL`, `ADMIN_PASSWORD` | control-center access |
+
+> Secrets (`.env`, the Firebase SA JSON) are provided at **runtime** via `--env-file` / volume mounts / k8s Secrets. They are **not** baked into the image (`.dockerignore` excludes `.env*`).
+
+---
+
+## 4. Build & push the image
+
+```bash
+# From the repo root
+docker build -t <registry>/halobestie-backend:staging ./backend
+
+# Push to your registry (Docker Hub, GHCR, GCP Artifact Registry, self-hosted, …)
+docker push <registry>/halobestie-backend:staging
+```
+For a purely single-host setup you can skip the registry and build directly on the host.
+
+---
+
+## 5. Run database migrations (one-off)
+
+Run **before** (re)starting the service. Never auto-migrate on container boot — concurrent replicas would race.
+```bash
+# Migrate (every deploy that includes new migrations)
+docker run --rm --env-file backend/.env.staging \
+  <registry>/halobestie-backend:staging node src/db/migrate.js
+
+# Seed (first deploy only)
+docker run --rm --env-file backend/.env.staging \
+  <registry>/halobestie-backend:staging node src/db/seed.js
+```
+
+---
+
+## 6. Deploy — plain Docker Engine
+
+```bash
+docker run -d --name halobestie-staging \
+  --env-file backend/.env.staging \
+  -p 3000:3000 \
+  -v /opt/halobestie/secrets/firebase-sa.json:/secrets/firebase-sa.json:ro \
+  --restart unless-stopped \
+  --log-driver json-file --log-opt max-size=10m --log-opt max-file=5 \
+  <registry>/halobestie-backend:staging
+```
+- Publish **only** `3000`. Do **not** map `3001`.
+- `--log-opt` enables log rotation — see §8.
+- Put a TLS-terminating reverse proxy (Nginx / Traefik / Caddy) in front for `https://staging-api.halobestie.com`. WebSocket upgrade must be proxied (the apps use `/api/shared/ws`).
+
+### Or with Docker Compose
+A ready-to-use [docker-compose.staging.yml](docker-compose.staging.yml) is included (backend only — Postgres/Valkey are expected via `DATABASE_URL`/`VALKEY_URL`). It publishes only `3000`, mounts the Firebase SA + log volume, and sets json-file rotation. Point it at your image via the `BACKEND_IMAGE` env var (or uncomment `build: .` to build on the host):
+```bash
+cd backend
+cp .env.example .env.staging          # then fill it in
+BACKEND_IMAGE=<registry>/halobestie-backend:staging \
+  docker compose -f docker-compose.staging.yml up -d
+```
+
+---
+
+## 7. Deploy — Kubernetes (sketch)
+
+- **Deployment** with the image, `envFrom` a Secret/ConfigMap, the Firebase SA JSON mounted from a Secret volume at `/secrets/firebase-sa.json`.
+- **Service** exposing only container port `3000`. Liveness/readiness probes: `tcpSocket: { port: 3000 }` (no HTTP health route — TCP probe matches the Dockerfile HEALTHCHECK).
+- Migrations: a one-off **Job** (`command: ["node","src/db/migrate.js"]`) run before rolling out, not an initContainer on every pod.
+- Logs: pods write to stdout (§8) — your cluster's node agent (Fluent Bit / Loki / Cloud Logging) collects them automatically.
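+
+The bullets above could translate into manifests along these lines. This is a minimal sketch, not a tested configuration: the resource names (`halobestie-backend`, `halobestie-backend-env`, `firebase-sa`, `halobestie-migrate`), the replica count, and the probe delay are placeholders chosen for illustration, and `<registry>` must be replaced with your own.
+
+```yaml
+# Migration Job: run to completion before rolling out the Deployment.
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: halobestie-migrate              # hypothetical name
+spec:
+  backoffLimit: 0                       # assumption: fail fast, do not retry migrations
+  template:
+    spec:
+      restartPolicy: Never
+      containers:
+        - name: migrate
+          image: <registry>/halobestie-backend:staging
+          command: ["node", "src/db/migrate.js"]
+          envFrom:
+            - secretRef:
+                name: halobestie-backend-env   # hypothetical Secret holding the .env values
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: halobestie-backend              # hypothetical name
+spec:
+  replicas: 2                           # assumption: size to your load
+  selector:
+    matchLabels: { app: halobestie-backend }
+  template:
+    metadata:
+      labels: { app: halobestie-backend }
+    spec:
+      containers:
+        - name: backend
+          image: <registry>/halobestie-backend:staging
+          ports:
+            - containerPort: 3000       # public listener only; 3001 stays on loopback
+          envFrom:
+            - secretRef:
+                name: halobestie-backend-env
+          volumeMounts:
+            - name: firebase-sa
+              mountPath: /secrets       # key firebase-sa.json appears as /secrets/firebase-sa.json
+              readOnly: true
+          readinessProbe:
+            tcpSocket: { port: 3000 }
+          livenessProbe:
+            tcpSocket: { port: 3000 }
+            initialDelaySeconds: 20     # assumption: mirrors the Dockerfile start period
+      volumes:
+        - name: firebase-sa
+          secret:
+            secretName: firebase-sa     # hypothetical Secret with key firebase-sa.json
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: halobestie-backend
+spec:
+  selector: { app: halobestie-backend }
+  ports:
+    - port: 3000
+      targetPort: 3000
+```
+
+Apply the Job first and wait for it to complete (for example with `kubectl wait --for=condition=complete job/halobestie-migrate`) before applying the Deployment and Service.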
+
+---
+
+## 8. Logs — where they go and how to map them
+
+### 8a. Application logs → **stdout/stderr** (the Docker-native way)
+The backend uses Fastify's pino logger (`logger: true` in [src/app.public.js](src/app.public.js) / [src/app.internal.js](src/app.internal.js)), emitting **structured JSON to stdout**, plus a few `console.log` lifecycle lines. It does **not** write its own app-log files. So:
+
+```bash
+# Tail live
+docker logs -f halobestie-staging
+
+# Last 200 lines
+docker logs --tail 200 halobestie-staging
+
+# Pretty-print the JSON (pino output is one JSON object per line)
+docker logs -f halobestie-staging | jq .
+
+# Compose equivalent
+docker compose -f docker-compose.staging.yml logs -f backend
+```
+
+**Rotation (important — default json-file logs grow unbounded):**
+set it per-container as in §6 (`--log-opt max-size=10m --log-opt max-file=5`), or globally in `/etc/docker/daemon.json`:
+```json
+{
+  "log-driver": "json-file",
+  "log-opts": { "max-size": "10m", "max-file": "5" }
+}
+```
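+then `sudo systemctl restart docker`. The daemon-level default applies only to containers created afterward; existing containers keep their old log settings until they are recreated.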
+
+**Ship logs off-host (optional):** point Docker at a different log driver in place of json-file, e.g. `--log-driver=fluentd`, `--log-driver=syslog`, `--log-driver=gcplogs`, or `--log-driver=loki` (the Loki driver is a plugin that must be installed first). A container uses one log driver at a time. On k8s, stdout is collected by the node logging agent; no per-container config needed.
+
+**Persist raw stdout to a host file (simple VPS option):**
+```bash
+# `docker run -d` prints only the container ID, so redirect `docker logs` instead
+docker logs -f halobestie-staging >> /var/log/halobestie/backend.log 2>&1 &
+# better: use the json-file driver (above) and read /var/lib/docker/containers/<id>/<id>-json.log,
+# or ship logs with a real log driver, which also handles rotation.
+```
+
+### 8b. Xendit webhook fallback JSONL → **needs a volume** (only if enabled)
+The one component that writes a **file** is the optional webhook fallback sink ([src/services/webhook-log.service.js](src/services/webhook-log.service.js)), **off by default**. When `XENDIT_WEBHOOK_FALLBACK_ENABLED=true`, it writes rolling JSONL to `XENDIT_WEBHOOK_FALLBACK_DIR` (default `./logs` → `/app/logs` in the container). To keep those across restarts, **mount a volume**:
+```bash
+docker run -d ... \
+  -e XENDIT_WEBHOOK_FALLBACK_ENABLED=true \
+  -e XENDIT_WEBHOOK_FALLBACK_DIR=/app/logs \
+  -v /opt/halobestie/logs:/app/logs \
+  <image>
+```
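+(The Compose example in §6 already declares the `backend-logs` volume for this.) If the fallback stays disabled, you don't need this volume, because everything is on stdout. Because the container runs as the non-root `node` user (UID 1000 in the official Node images), the mounted directory or volume must be writable by that user.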
+
+---
+
+## 9. Health, upgrade, rollback
+
+```bash
+# Health — the image has a built-in TCP HEALTHCHECK; check it:
+docker inspect --format '{{.State.Health.Status}}' halobestie-staging
+
+# Upgrade to a new image
+docker pull <registry>/halobestie-backend:staging
+docker run ... node src/db/migrate.js          # if new migrations
+docker stop halobestie-staging && docker rm halobestie-staging
+docker run -d --name halobestie-staging ...    # re-run with the new image
+
+# Rollback = re-run the previous tag/digest. Graceful shutdown is handled:
+# server.js traps SIGTERM and drains the listeners before exit.
+```
+
+---
+
+## 10. Quick reference
+
+| Task | Command |
+|---|---|
+| Build | `docker build -t <img> ./backend` |
+| Migrate | `docker run --rm --env-file backend/.env.staging <img> node src/db/migrate.js` |
+| Run (minimal; add the §6 mount and log options) | `docker run -d --name halobestie-staging --env-file backend/.env.staging -p 3000:3000 --restart unless-stopped <img>` |
+| Logs (live) | `docker logs -f halobestie-staging` |
+| Logs (pretty) | `docker logs -f halobestie-staging \| jq .` |
+| Health | `docker inspect --format '{{.State.Health.Status}}' halobestie-staging` |
+| Shell in | `docker exec -it halobestie-staging sh` |
--- a/backend/Dockerfile
+++ b/backend/Dockerfile
@@ -0,0 +1,44 @@
+# syntax=docker/dockerfile:1
+
+# ---------------------------------------------------------------------------
+# Stage 1 — builder: install production deps, compiling native addons (bcrypt)
+# ---------------------------------------------------------------------------
+FROM node:20-bookworm-slim AS builder
+WORKDIR /app
+
+# Toolchain required to compile native modules (bcrypt) when no prebuilt
+# binary matches the platform. Lives only in this stage.
+RUN apt-get update && apt-get install -y --no-install-recommends \
+      python3 make g++ \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install against the lockfile for reproducible builds. Copy manifests first
+# so this layer caches until deps actually change.
+COPY package.json package-lock.json ./
+RUN npm ci --omit=dev
+
+# ---------------------------------------------------------------------------
+# Stage 2 — runtime: slim image with only prod node_modules + app source
+# ---------------------------------------------------------------------------
+FROM node:20-bookworm-slim AS runtime
+ENV NODE_ENV=production
+WORKDIR /app
+
+# Compiled node_modules (same base image → ABI-compatible bcrypt .node binary).
+COPY --from=builder /app/node_modules ./node_modules
+COPY package.json ./
+COPY src ./src
+
+# Drop privileges — node:* images ship a non-root `node` user.
+USER node
+
+# Public listener only. INTERNAL_PORT (3001) binds to 127.0.0.1 inside the
+# container by default and is intentionally NOT published.
+EXPOSE 3000
+
+# No HTTP health route exists — probe the TCP port directly so the check is
+# route-agnostic. Orchestrators (k8s) can override with their own probes.
+HEALTHCHECK --interval=30s --timeout=3s --start-period=20s --retries=3 \
+  CMD node -e "require('net').connect({port: process.env.PUBLIC_PORT||3000, host:'127.0.0.1'}).on('connect',()=>process.exit(0)).on('error',()=>process.exit(1))"
+
+CMD ["node", "src/server.js"]
--- a/backend/docker-compose.staging.yml
+++ b/backend/docker-compose.staging.yml
@@ -0,0 +1,39 @@
+# Staging deploy for the Halo Bestie backend (self-hosted Docker).
+# Usage:
+#   cd backend
+#   docker compose -f docker-compose.staging.yml up -d
+#
+# Prereqs: a populated .env.staging (cp .env.example .env.staging) and the
+# Firebase service-account JSON at the mounted host path below. See DEPLOY.md.
+#
+# This runs ONLY the backend. Postgres + Valkey are expected to be reachable
+# via DATABASE_URL / VALKEY_URL in .env.staging (managed/self-hosted elsewhere).
+# TLS termination + the public hostname are handled by a reverse proxy in front.
+
+services:
+  backend:
+    image: ${BACKEND_IMAGE:-halobestie-backend:staging}
+    # To build on the host instead of pulling a pushed image, comment out
+    # `image:` above and uncomment:
+    # build: .
+    container_name: halobestie-staging
+    env_file: .env.staging
+    ports:
+      - "3000:3000"            # public listener only — never publish 3001
+    volumes:
+      # Firebase service-account JSON (must match the env's Firebase project,
+      # staging = my-bestie-876ec). FIREBASE_SERVICE_ACCOUNT_PATH in .env.staging
+      # must equal the in-container path on the right.
+      - /opt/halobestie/secrets/firebase-sa.json:/secrets/firebase-sa.json:ro
+      # Optional: only needed if XENDIT_WEBHOOK_FALLBACK_ENABLED=true (writes
+      # rolling JSONL to /app/logs). App logs themselves go to stdout — see DEPLOY.md §8.
+      - backend-logs:/app/logs
+    restart: unless-stopped
+    logging:
+      driver: json-file
+      options:
+        max-size: "10m"
+        max-file: "5"
+
+volumes:
+  backend-logs:
--- a/backend/package.json
+++ b/backend/package.json
@@ -4,6 +4,9 @@
  "description": "Halo Bestie backend API",
  "main": "src/server.js",
  "type": "module",
+  "engines": {
+    "node": ">=20"
+  },
  "scripts": {
    "dev": "node --watch src/server.js",
    "start": "node src/server.js",