Per-tenant container now boots all five processes behind a single exposed port (:8080), with the Core/Platform boundary enforced at the filesystem level (two Linux users, mode 0700 on cial-core).

- @cial/edge: http-proxy edge (HTTP+WS) + node supervisor (PID 1 under tini, spawns each service via gosu as the right user)
- Routes: /.cial/api/* -> back (prefix stripped), /.cial/* -> core front (basePath kept), /* -> platform front. Platform Back is internal-only for v1.
- Dockerfile: multi-stage (builder + runtime). Builds protocol/sdk/back/edge/front/platform-back/platform-front. Runtime installs tini+gosu, creates cial:1000 / agent:1001, locks down cial-core to 0700.
- Placeholder pages now render TENANT_ID at request time so the smoke can verify per-tenant env propagation end-to-end.
- scripts/smoke-tenant.mjs: docker-driven L0 acceptance — boots the image, polls healthz, probes the four route classes, and asserts the agent user cannot read /opt/cial-monorepo/cial-core.
- PLAN-LOCAL.md: phased local-mode roadmap (L0..L6).

Verify on a host with docker:

```sh
docker build -f cial-core/docker/Dockerfile -t cial-tenant:dev .
pnpm smoke:tenant
```

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Local mode — full multi-tenant Cial on your laptop
Goal:
`pnpm local:up` brings up the entire Cial platform locally with Docker standing in for Fly Machines. You sign up as the owner in the App admin, create a "client", a real Docker container is spawned for that tenant, you click the tenant URL, and you land in their per-tenant Cial — SSO'd in, already authenticated.
This document scopes "plumbing v1" — same code path as production, only the orchestrator's driver swaps (Docker ↔ Fly). It pulls Phase 2 of PLAN.md (real container entrypoint) and parts of Phases 11–13 (App auth, tenant CRUD, orchestrator, router) forward.
Decisions (locked in)
| concern | choice |
|---|---|
| Tenant URL | {slug}.localhost:8080 (modern browsers resolve *.localhost → 127.0.0.1; no /etc/hosts edits) |
| Signup | owner-only (you log into admin and create tenants) |
| Scope | plumbing v1 — spawn tenant, route to it, SSO works, tenant container responds with an identifiable placeholder |
| Orchestration | docker compose for App+Postgres+Router; the App spawns tenant containers via the host Docker socket (DooD pattern) |
Tech picks
| concern | choice | why |
|---|---|---|
| App DB | postgres:16 in compose | matches prod |
| App ORM | drizzle-orm + drizzle-kit | already in deps, lightweight |
| App Auth | better-auth | locked-in choice from infra deck |
| Orchestrator driver | dockerode | typed, supports the full Docker API; no shelling out |
| Router proxy | tiny Node service using http-proxy (lives in cial-app/router) | dedicated process, separates concerns, matches @cial/app-router package we already scaffolded |
| SSO | HS256 JWT (shared secret in .env.local) for v1; bumps to RS256 in prod | smaller blast radius for a dev mode |
| Tenant data | named docker volume per tenant: cial-tenant-{id}-data | survives docker stop, gone with the tenant on destroy |
Topology
host (your laptop)
┌──────────────────────────────────────────────────────────────────┐
│ │
│ browser → http://acme.localhost:8080 │
│ │ │
│ ▼ │
│ ┌────────────┐ │
│ │ @cial/app- │ reads tenants.{slug, container_port} from PG │
│ │ router │ forwards to 172.x.x.x:port │
│ │ (:8080) │ │
│ └────────────┘ │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │
│ │ @cial/app- │ │ postgres │ │ tenant containers │ │
│ │ api │◄──►│ :5432 │ │ (one per client) │ │
│ │ (:3100) │ └────────────┘ │ cial-tenant:dev │ │
│ └────────────┘ └────────────────────┘ │
│ │ │
│ │ spawns/stops via /var/run/docker.sock (DooD) │
│ ▼ │
│ host Docker daemon │
│ │
└──────────────────────────────────────────────────────────────────┘
Phasing
Each phase is independently runnable and ends with a verifiable check.
L0 — Tenant image runs the full Core + Platform stack
No scope reduction. The tenant image runs all five processes (the edge supervisor plus the four services) with proper user separation and a single exposed port. This de-risks the hardest piece of the infra (multi-process container, internal edge, two-user model) before any of the orchestration / routing is built. Chat / agent / deploy logic stays as 501 stubs — those come back via PLAN.md phases — but the plumbing is real.
Container topology (single exposed port: 8080):
```
external :8080
    │
    ▼
@cial/edge (Node, user cial) — PID 1, supervises and routes
    │
    ├─ /.cial/api/* ─► @cial/back (Express)                    user cial   :4000
    │                    data: /var/lib/cial (cial:0700)
    ├─ /.cial/*     ─► @cial/front (Next, basePath=/.cial)     user cial   :4001
    │
    │  /api/p/*     ─► @cial/platform-back                     user agent  :3001
    │                    (internal — NOT exposed for v1)
    │
    └─ /* (everything else)
                    ─► @cial/platform-front (Next, basePath=/) user agent  :3000
```
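The route classes above can be sketched as a pure dispatch function (hypothetical shape — the real logic lives in `@cial/edge`; ports per the diagram):

```javascript
// Hypothetical sketch of the edge's dispatch table. Note the asymmetry:
// /.cial/api/* is prefix-stripped before reaching back, while /.cial/* keeps
// its basePath for core front; everything else falls through to platform front.
export function route(path) {
  if (path === '/.cial/api' || path.startsWith('/.cial/api/')) {
    const stripped = path.slice('/.cial/api'.length) || '/';
    return { target: 'http://127.0.0.1:4000', path: stripped }; // @cial/back
  }
  if (path === '/.cial' || path.startsWith('/.cial/')) {
    return { target: 'http://127.0.0.1:4001', path }; // @cial/front, basePath kept
  }
  return { target: 'http://127.0.0.1:3000', path }; // @cial/platform-front
}
```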
Key decisions (calling these now to avoid round-trips):
- Supervisor: small Node script as PID 1 wrapped by `tini`. It spawns the four children, pipes their stdout/stderr with prefixes, and crashes the container if any child dies (so Docker restarts it). No s6/supervisord — keeps the runtime stack pure Node.
- Internal edge: tiny Node `http-proxy` server in a new package `cial-core/edge` (~100 LOC). Handles HTTP and WebSocket upgrade. The only process bound to `:8080`.
- Two-user model preserved: `cial` user owns `/opt/cial-core` (mode 0700), runs edge + back + front; `agent` user owns `/opt/cial-platform`, runs platform-back + platform-front. Edge is `cial`-owned so the agent process can never bind the public port.
- Platform Back exposure: internal-only for v1 — reachable from Platform Front server-side via `http://localhost:3001`, not from the browser. Avoids fighting Next.js for the `/api/*` namespace. We pick a public mount path later when we know the convention.
- Next.js mode: production builds with `output: 'standalone'` so each Next app ships only what it needs (smaller image, faster cold start).
- No watch / dev mode in the container — that's purely for local dev outside the container (where `pnpm smoke` already works).
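The supervisor decision can be sketched in a few lines (a hypothetical shape, not the real `supervisor.mjs`; per-child user switching via gosu/setuid is omitted here):

```javascript
// Sketch: spawn every child, prefix its output, and reject — which the real
// entrypoint turns into a container exit — as soon as any child dies.
import { spawn } from 'node:child_process';

export function supervise(children) {
  return new Promise((_, reject) => {
    for (const { name, cmd, args = [], env = {} } of children) {
      const child = spawn(cmd, args, {
        env: { ...process.env, ...env },
        stdio: ['ignore', 'pipe', 'pipe'],
      });
      const log = (chunk) =>
        chunk.toString().split('\n').filter(Boolean)
          .forEach((line) => console.log(`[${name}] ${line}`));
      child.stdout.on('data', log);
      child.stderr.on('data', log);
      // Any child dying takes the whole container down so Docker restarts it.
      child.on('exit', (code) => reject(new Error(`${name} exited with code ${code}`)));
    }
  });
}
```

In the real entrypoint the rejection would be wired to `process.exit(1)` so Docker's restart policy kicks in.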
Tasks:
- Add
cial-core/edgepackage: Node +http-proxy+ ws upgrade handling. - Add
cial-core/supervisor(or a smallbin/supervisor.mjsinsideedge): spawns the four children with the right user viaprocess.setuid(or viasu-execin the entrypoint). - Add
output: 'standalone'+outputFileTracingRootto both Next apps (@cial/front,@cial/platform-front) so monorepo symlinks resolve in the standalone bundle. - Add a tiny placeholder route to each surface so we can prove routing:
@cial/platform-front/→<h1>Platform · Tenant {TENANT_ID}</h1>@cial/front/.cial→<h1>Cial Core · Tenant {TENANT_ID}</h1>@cial/back/.cial/api/healthalready exists (/healthzmounted there)@cial/platform-back/healthalready exists (internal only)
- Rewrite
cial-core/docker/Dockerfileas a real multi-stage build that produces the runtime image with all four services + the supervisor + correct ownership/permissions. Entry:tini -- node /opt/cial-core/edge/bin/supervisor.mjs. - Add
cial-core/docker/.dockerignore(already there).
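An illustrative shape for that multi-stage Dockerfile — base image, stage layout, and copy paths are assumptions, not the real file:

```dockerfile
# Illustrative only: exact build commands and output paths are placeholders.
FROM node:22-slim AS builder
WORKDIR /src
COPY . .
RUN corepack enable pnpm && pnpm install --frozen-lockfile && pnpm -r build

FROM node:22-slim AS runtime
# tini = PID-1 signal handling; gosu = drop privileges per child process
RUN apt-get update && apt-get install -y --no-install-recommends tini gosu \
 && rm -rf /var/lib/apt/lists/* \
 && useradd --uid 1000 cial && useradd --uid 1001 agent
COPY --from=builder --chown=cial:cial /src/cial-core /opt/cial-core
COPY --from=builder --chown=agent:agent /src/cial-platform /opt/cial-platform
# 0700 on cial-core is the filesystem enforcement of the Core/Platform boundary
RUN chmod 0700 /opt/cial-core
EXPOSE 8080
ENTRYPOINT ["tini", "--", "node", "/opt/cial-core/edge/bin/supervisor.mjs"]
```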
Verify (acceptance for L0):
```sh
docker build -f cial-core/docker/Dockerfile -t cial-tenant:dev .
docker run --rm -e TENANT_ID=demo -p 9000:8080 cial-tenant:dev

# In another shell:
curl http://localhost:9000/                   # → "Platform · Tenant demo"
curl http://localhost:9000/.cial              # → "Cial Core · Tenant demo"
curl http://localhost:9000/.cial/api/healthz  # → {"status":"ok",…}
docker exec <id> ls -la /opt/cial-core        # → owner cial, mode 0700
docker exec <id> gosu agent cat /opt/cial-core/back/dist/index.js
                                              # → permission denied (proves boundary)
```
If all five checks pass, L0 is done. The container is then a real, production-shaped artifact — every later phase just plugs into it.
L1 — docker compose + Postgres + App API + owner signup
- `docker-compose.yml` at repo root with services: `postgres`, `app-api`
- `cial-app/api`:
  - Drizzle schema: `users`, `tenants` (slug, name, container_id, container_port, state, owner_id)
  - `drizzle-kit generate` + on-boot migrate
  - Better-Auth wired at `/api/auth/[...all]` — email+password, no verification in dev
  - `/admin/login` and `/admin/signup` pages (signup disabled after first user; first user is owner)
- `app-api` Dockerfile updated to actually run `next start` on :3100
- Verify: `docker compose up`, browser to `http://localhost:3100/admin/signup`, create owner account, redirected to `/admin` (empty tenant list).
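A minimal compose sketch for this phase — service names come from this plan, but the env values and port mappings are placeholders:

```yaml
# Illustrative fragment; the docker.sock mount arrives in L3, the router in L4.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: cial
    ports: ["5432:5432"]
  app-api:
    build: cial-app/api
    depends_on: [postgres]
    ports: ["3100:3100"]
```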
L2 — Admin tenant CRUD
- `/admin` page: list tenants from DB
- "New tenant" form (slug, name) → POST `/api/admin/tenants`
- Validates slug (`^[a-z0-9-]{2,40}$`), inserts row with `state='provisioning'`
- For now, no orchestrator call — just DB row + UI confirmation
- Verify: create a tenant in UI, see it in the list with `state=provisioning`.
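The slug rule as a checkable helper (hypothetical name — the real check sits in the POST handler):

```javascript
// 2–40 chars, lowercase alphanumerics and hyphens only, per the plan's regex.
const SLUG_RE = /^[a-z0-9-]{2,40}$/;
export const isValidSlug = (slug) => SLUG_RE.test(slug);
```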
L3 — Orchestrator Docker driver
- `cial-app/orchestrator/src/drivers/docker.ts`: implements `Orchestrator` via `dockerode`
  - `create(tenant)`: `docker create` from `cial-tenant:dev` with env `TENANT_ID`, label `cial.tenant=<id>`, named volume, exposed port mapped to a free host port. Sets `state=starting`.
  - `start`/`stop`/`destroy`/`status`: trivial dockerode calls
- `app-api` calls `orchestrator.create()` after the DB insert from L2; records `container_id` and `container_port`; flips state to `running` once the container's `/healthz` responds.
- App container needs `/var/run/docker.sock` mounted in compose.
- Verify: create a tenant in the UI → admin list shows `state=running` with a port; `docker ps` shows the container; `curl localhost:<port>` returns the L0 placeholder page.
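The `create(tenant)` call can be sketched as the options object the driver would hand to dockerode's `docker.createContainer()`. Field names follow the Docker Engine API; the helper name is hypothetical, and the `/var/lib/cial` mount path matches the L0 data dir:

```javascript
// Hypothetical builder for the container spec; the real driver would pass
// this to dockerode and then call container.start().
export function tenantContainerSpec(tenant, hostPort) {
  return {
    Image: 'cial-tenant:dev',
    name: `cial-tenant-${tenant.id}`,
    Env: [`TENANT_ID=${tenant.id}`],
    Labels: { 'cial.tenant': tenant.id }, // lets local:down prune by label
    ExposedPorts: { '8080/tcp': {} },
    HostConfig: {
      PortBindings: { '8080/tcp': [{ HostPort: String(hostPort) }] },
      Binds: [`cial-tenant-${tenant.id}-data:/var/lib/cial`], // named volume
    },
  };
}
```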
L4 — Router reverse proxy on :8080
- `cial-app/router/src/index.ts`: small Node HTTP+WS server on :8080 using `http-proxy`
- Reads tenant routes from Postgres on each request (cache with 5s TTL)
- `Host` header `acme.localhost` → `tenants WHERE slug='acme' AND state='running'` → forward to `localhost:<container_port>`
- 404 if unknown slug; 503 if state ≠ `running`
- Add `router` service to compose
- Verify: `http://acme.localhost:8080` shows the tenant's page (Tenant acme).
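The Host-header lookup can be sketched as a small helper (hypothetical name; the real router then hits Postgres with the extracted slug):

```javascript
// Pull the tenant slug out of a {slug}.localhost[:port] Host header;
// anything else (bare localhost, foreign domains) returns null → 404.
export function slugFromHost(host) {
  const m = /^([a-z0-9-]+)\.localhost(?::\d+)?$/.exec(host ?? '');
  return m ? m[1] : null;
}
```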
L5 — SSO handoff App → tenant Core
- App admin: tenant detail page has an Open button → `/api/admin/tenants/:slug/open`
- That endpoint mints a short-lived (60s) HS256 JWT `{ sub: ownerId, tenant: slug, role: 'owner' }` signed with `CIAL_SSO_SECRET`, then redirects the browser to `http://{slug}.localhost:8080/.cial/sso?token=...`
- Core Back: `/.cial/sso` validates the token, sets a session cookie scoped to the tenant subdomain, redirects to `/`. (For v1 the cookie is the marker; Phase 3 of PLAN.md replaces this with real Better-Auth in Core.)
- The tenant `/` page now shows `Tenant {id} · signed in as {sub}`.
- Verify: from admin, click "Open acme" → land on `acme.localhost:8080` with the tenant page showing your owner email.
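The mint/verify pair can be sketched with only `node:crypto` — a sketch of the token flow above, not the production implementation (a real deployment would likely reach for a vetted JWT library):

```javascript
import { createHmac, timingSafeEqual } from 'node:crypto';

const b64url = (s) => Buffer.from(s).toString('base64url');

// Mint { sub, tenant, role } + exp, HS256-signed with the shared secret.
export function mintSsoToken(payload, secret, ttlSeconds = 60) {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify({
    ...payload,
    exp: Math.floor(Date.now() / 1000) + ttlSeconds,
  }));
  const sig = createHmac('sha256', secret).update(`${header}.${body}`).digest('base64url');
  return `${header}.${body}.${sig}`;
}

// Returns the payload on success, null on bad signature or expiry.
export function verifySsoToken(token, secret) {
  const [header, body, sig] = token.split('.');
  const expected = createHmac('sha256', secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(sig ?? '', 'base64url');
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const payload = JSON.parse(Buffer.from(body, 'base64url').toString());
  return payload.exp > Math.floor(Date.now() / 1000) ? payload : null;
}
```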
L6 — pnpm local:up wraps everything
- One-shot script that:
  - builds the `cial-tenant:dev` image
  - runs `docker compose up --build -d`
  - tails logs and prints `→ http://localhost:3100/admin` when ready
- `pnpm local:down` tears down compose + prunes any orphan tenant containers (label-selected: `label=cial.tenant`)
- A new `scripts/smoke-local.mjs` end-to-end:
  - local:up
  - sign up owner via API
  - create tenant via API
  - probe `acme.localhost:8080` until 200
  - local:down
- Verify: green E2E smoke from a clean state.
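The "probe until 200" step can lean on a small retry helper (hypothetical, sketched for `scripts/smoke-local.mjs`):

```javascript
// Retry an async probe until it resolves or the deadline passes, then
// surface the last failure so the smoke's error message is meaningful.
export async function waitFor(probe, { timeoutMs = 60_000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      return await probe();
    } catch (err) {
      if (Date.now() >= deadline) throw err;
      await new Promise((r) => setTimeout(r, intervalMs));
    }
  }
}
```

In the smoke, the probe would be a `fetch('http://acme.localhost:8080/')` that throws on any non-200 status.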
Working agreement
- Same as PLAN.md: implement → self-test → commit `phase(L<n>): …` → push.
- Each phase ends with a green check. If a phase's verify step fails, we fix before moving on.
- Anything bigger than L0–L6 (e.g., custom domains, per-tenant resource limits) is a follow-up scoped after L6 is green.
Open questions to revisit later (NOT for v1)
- Real Platform editing experience — the L0 image already runs all five processes, but chat / agent / deploy logic stays stubbed until the relevant PLAN.md phases
- Trigger fabric / scheduler — needed once tenants have cron triggers
- Custom domains for tenants
- Per-tenant resource limits (memory/CPU caps via Docker)
- Tenant suspend (Docker stop) ↔ cold start time on first request