Reorganize the dev/prod tenant container so the agent runs in the monorepo
root with a clear, semantic directory tree:
/cial/core/ — runtime (back, front, edge, ui, sdk, protocol, scripts,
docker). Locked down to the cial linux user (mode 0700
in prod; :ro bind mount in restricted dev).
/cial/platform/ — agent-editable surface (back, front).
/cial/app/ — App control plane sources, present in workspace but
never built or run inside the tenant container.
/cial/docs/ — architecture + ops reference.
/cial/.claude/ — project skills/agents/commands (symlinked into the
harness HOME by the dev entrypoint).
/cial/data/ — persistent state (sqlite, deploy-logs, agent home).
Concrete changes:
- git mv cial-core → core, cial-platform → platform, cial-app → app,
scripts → core/scripts.
- pnpm-workspace.yaml: packages now core/*, platform/*, app/*.
- Bulk path rewrites across 250+ source / docker / docs files.
- core/scripts/dev-tenant.mjs: ROOT path fix, rw mount of repo + ro
overlay of /cial/core when --unrestricted is not set (FS-level
trust boundary, defense in depth).
- core/edge/src/supervisor.{ts,dev.ts}: cwd + CLAUDE_HOME relocated to
/cial/data/home; agent runs from /cial root so skill discovery picks
up /cial/.claude/skills automatically.
- core/back providers/claude.ts: HOME defaults to /cial/data/home, cwd
defaults to /cial.
- core/docker/{Dockerfile,Dockerfile.dev,dev-entrypoint.sh}: COPY +
WORKDIR + ENTRYPOINT updated; .claude → harness symlink.
- app/docker/{Dockerfile,Dockerfile.router}: COPY core, COPY app
(instead of cial-core / cial-app).
- New docs/file-structure.md — single canonical map of the runtime
layout. cial:self-edit SKILL.md mandates reading it first.
- cial:build SKILL.md: scope notes updated to platform/* and core/*.
- root package.json: smoke / dev:tenant scripts now under core/scripts/.
- core/scripts/smoke.mjs: cial-core.db → cial.db.
Externals preserved as-is by intent:
- JWT issuer string 'cial-app' in core/back/src/modules/sso/index.ts +
app/api/src/lib/sso.ts is an external contract — NOT renamed.
- @cial/back / @cial/edge / @cial/protocol / @cial/sdk / @cial/front
package names kept stable to minimize blast radius.
Verified:
- pnpm install --prod=false → ok
- turbo run build for protocol, sdk, back, edge, front, platform-back,
platform-front → all 7 successful (Next builds + tsc clean).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Local mode — full multi-tenant Cial on your laptop
Goal:
`pnpm local:up` brings up the entire Cial platform locally with Docker standing in for Fly Machines. You sign up as the owner in the App admin, create a "client", a real Docker container is spawned for that tenant, you click the tenant URL, and you land in their per-tenant Cial — SSO'd in, already authenticated.
This document scopes "plumbing v1" — same code path as production, only the orchestrator's driver swaps (Docker ↔ Fly). It pulls Phase 2 of PLAN.md (real container entrypoint) and parts of Phases 11–13 (App auth, tenant CRUD, orchestrator, router) forward.
Decisions (locked in)
| concern | choice |
|---|---|
| Tenant URL | `{slug}.localhost:8080` (modern browsers resolve `*.localhost` → 127.0.0.1; no `/etc/hosts` edits) |
| Signup | owner-only (you log into admin and create tenants) |
| Scope | plumbing v1 — spawn tenant, route to it, SSO works, tenant container responds with an identifiable placeholder |
| Orchestration | docker compose for App+Postgres+Router; the App spawns tenant containers via the host Docker socket (DooD pattern) |
Tech picks
| concern | choice | why |
|---|---|---|
| App DB | `postgres:16` in compose | matches prod |
| App ORM | `drizzle-orm` + `drizzle-kit` | already in deps, lightweight |
| App Auth | `better-auth` | locked-in choice from infra deck |
| Orchestrator driver | `dockerode` | typed, supports the full Docker API; no shelling out |
| Router proxy | tiny Node service using `http-proxy` (lives in `app/router`) | dedicated process, separates concerns, matches the `@cial/app-router` package we already scaffolded |
| SSO | HS256 JWT (shared secret in `.env.local`) for v1; bumps to RS256 in prod | smaller blast radius for a dev mode |
| Tenant data | named docker volume per tenant: `cial-tenant-{id}-data` | survives `docker stop`, gone with the tenant on destroy |
Topology
host (your laptop)
┌──────────────────────────────────────────────────────────────────┐
│ │
│ browser → http://acme.localhost:8080 │
│ │ │
│ ▼ │
│ ┌────────────┐ │
│ │ @cial/app- │ reads tenants.{slug, container_port} from PG │
│ │ router │ forwards to 172.x.x.x:port │
│ │ (:8080) │ │
│ └────────────┘ │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │
│ │ @cial/app- │ │ postgres │ │ tenant containers │ │
│ │ api │◄──►│ :5432 │ │ (one per client) │ │
│ │ (:3100) │ └────────────┘ │ cial-tenant:dev │ │
│ └────────────┘ └────────────────────┘ │
│ │ │
│ │ spawns/stops via /var/run/docker.sock (DooD) │
│ ▼ │
│ host Docker daemon │
│ │
└──────────────────────────────────────────────────────────────────┘
Phasing
Each phase is independently runnable and ends with a verifiable check.
L0 — Tenant image runs the full Core + Platform stack
No scope reduction. The tenant image runs all four processes with proper user separation and a single exposed port. This derisks the hardest piece of the infra (multi-process container, internal edge, two-user model) before any of the orchestration / routing is built. Chat / agent / deploy logic stays as 501 stubs — those come back via PLAN.md phases — but the plumbing is real.
Container topology (single exposed port: 8080):
external :8080
│
▼
┌──────────────────┐
│ @cial/edge │ PID 1 supervises and routes
│ (Node, user cial)│
└──────────────────┘
│ │ │ │
/.cial/api/* │ │ │
▼ │ │ │
@cial/back │ │ │ user: cial :4000
(Express) │ │ │ data: /cial/data (cial:0700)
/.cial/* │ │
▼ │ │
@cial/front │ │ user: cial :4001
(Next, basePath=/.cial)
│ │
/api/p/* │ (internal — NOT exposed for v1)
▼ │
@cial/platform-back :3001 user: agent
│
│ /* (everything else)
▼
@cial/platform-front :3000 user: agent
(Next, basePath=/)
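The routing table in that diagram is small enough to sketch. Here is a minimal version of the edge process, assuming the `http-proxy` package from the Tech picks table; ports and path prefixes are the ones shown above, and everything else — logging, timeouts, health checks — is left out:

```ts
// core/edge — routing sketch. Path prefixes and ports mirror the diagram above.
import http from 'node:http';
import httpProxy from 'http-proxy';

const proxy = httpProxy.createProxyServer({ ws: true });

// First match wins; /api/p/* is deliberately absent — internal-only for v1.
function target(url: string): string {
  if (url.startsWith('/.cial/api')) return 'http://localhost:4000'; // @cial/back
  if (url.startsWith('/.cial'))     return 'http://localhost:4001'; // @cial/front
  return 'http://localhost:3000';                                   // @cial/platform-front
}

const server = http.createServer((req, res) => {
  proxy.web(req, res, { target: target(req.url ?? '/') }, () => {
    res.statusCode = 502;
    res.end('upstream unavailable');
  });
});

// WebSocket upgrades go through the same table.
server.on('upgrade', (req, socket, head) => {
  proxy.ws(req, socket, head, { target: target(req.url ?? '/') });
});

server.listen(8080);
```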
Key decisions (calling these now to avoid round-trips):
- Supervisor: small Node script as PID 1 wrapped by `tini`. It spawns the four children, pipes their stdout/stderr with prefixes, and crashes the container if any child dies (so Docker restarts it). No s6/supervisord — keeps the runtime stack pure Node. A sketch follows this list.
- Internal edge: tiny Node `http-proxy` server in a new package `core/edge` (~100 LOC). Handles HTTP and WebSocket upgrade. The only process bound to `:8080`.
- Two-user model preserved: the `cial` user owns `/cial/core` (mode 0700) and runs edge + back + front; the `agent` user owns `/cial/platform` and runs platform-back + platform-front. Edge is `cial`-owned so the agent process can never bind the public port.
- Platform Back exposure: internal-only for v1 — reachable from Platform Front server-side via `http://localhost:3001`, not from the browser. Avoids fighting Next.js for the `/api/*` namespace. We pick a public mount path later when we know the convention.
- Next.js mode: production builds with `output: 'standalone'` so each Next app ships only what it needs (smaller image, faster cold start).
- No watch / dev mode in the container — that's purely for local dev outside the container (where `pnpm smoke` already works).
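A minimal sketch of that supervisor — the spawn/prefix/crash loop only. The entrypoint paths are illustrative, and the real version also drops privileges per child (`spawn` accepts numeric `uid`/`gid` options, or the entrypoint uses `su-exec`):

```js
// Sketch of bin/supervisor.mjs (PID 1 under tini): spawn the four children,
// prefix their output, and exit when any child dies so Docker restarts us.
import { spawn } from 'node:child_process';

// Illustrative entrypoints — the real ones are the built package outputs.
const services = [
  { name: 'back',           cmd: ['node', '/cial/core/back/dist/index.js'] },
  { name: 'front',          cmd: ['node', '/cial/core/front/server.js'] },
  { name: 'platform-back',  cmd: ['node', '/cial/platform/back/dist/index.js'] },
  { name: 'platform-front', cmd: ['node', '/cial/platform/front/server.js'] },
];

for (const { name, cmd } of services) {
  const [bin, ...args] = cmd;
  // The real version also passes numeric uid/gid here for the two-user split.
  const child = spawn(bin, args, { stdio: ['ignore', 'pipe', 'pipe'] });
  const prefix = (chunk) =>
    chunk.toString().split('\n').filter(Boolean).map((l) => `[${name}] ${l}\n`).join('');
  child.stdout.on('data', (d) => process.stdout.write(prefix(d)));
  child.stderr.on('data', (d) => process.stderr.write(prefix(d)));
  child.on('exit', (code) => {
    console.error(`[supervisor] ${name} exited (${code}) — taking the container down`);
    process.exit(code ?? 1);
  });
}
// The edge server itself (see the routing sketch above) runs in this same process.
```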
Tasks:
- Add `core/edge` package: Node + `http-proxy` + ws upgrade handling.
- Add `core/supervisor` (or a small `bin/supervisor.mjs` inside `edge`): spawns the four children with the right user via `process.setuid` (or via `su-exec` in the entrypoint).
- Add `output: 'standalone'` + `outputFileTracingRoot` to both Next apps (`@cial/front`, `@cial/platform-front`) so monorepo symlinks resolve in the standalone bundle (config sketch after this list).
- Add a tiny placeholder route to each surface so we can prove routing:
  - `@cial/platform-front` `/` → `<h1>Platform · Tenant {TENANT_ID}</h1>`
  - `@cial/front` `/.cial` → `<h1>Cial Core · Tenant {TENANT_ID}</h1>`
  - `@cial/back` `/.cial/api/health` already exists (`/healthz` mounted there)
  - `@cial/platform-back` `/health` already exists (internal only)
- Rewrite `core/docker/Dockerfile` as a real multi-stage build that produces the runtime image with all four services + the supervisor + correct ownership/permissions. Entry: `tini -- node /cial/core/edge/bin/supervisor.mjs`.
- Add `core/docker/.dockerignore` (already there).
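The standalone-build config from the third task, sketched for one of the apps. One caveat: depending on the Next.js version, `outputFileTracingRoot` is either top-level (recent releases) or sits under `experimental` — check the version in use:

```js
// next.config.mjs — sketch for @cial/front / @cial/platform-front.
import path from 'node:path';
import { fileURLToPath } from 'node:url';

const here = path.dirname(fileURLToPath(import.meta.url));

/** @type {import('next').NextConfig} */
export default {
  output: 'standalone',
  // Trace from the monorepo root so pnpm's symlinked deps land in the bundle.
  outputFileTracingRoot: path.join(here, '../..'),
};
```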
Verify (acceptance for L0):
```sh
docker build -f core/docker/Dockerfile -t cial-tenant:dev .
docker run --rm -e TENANT_ID=demo -p 9000:8080 cial-tenant:dev
# In another shell:
curl http://localhost:9000/                  # → "Platform · Tenant demo"
curl http://localhost:9000/.cial             # → "Cial Core · Tenant demo"
curl http://localhost:9000/.cial/api/healthz # → {"status":"ok",…}
docker exec <id> ls -la /cial/core           # → owner cial, mode 0700
docker exec <id> sudo -u agent cat /cial/core/back/dist/index.js
                                             # → permission denied (proves boundary)
```
If all five checks pass, L0 is done. The container is then a real, production-shaped artifact — every later phase just plugs into it.
L1 — docker compose + Postgres + App API + owner signup
- `docker-compose.yml` at repo root with services: `postgres`, `app-api`
- `app/api`:
  - Drizzle schema: `users`, `tenants` (slug, name, container_id, container_port, state, owner_id) — sketch after this list
  - `drizzle-kit generate` + on-boot migrate
  - Better-Auth wired at `/api/auth/[...all]` — email+password, no verification in dev
  - `/admin/login` and `/admin/signup` pages (signup disabled after first user; first user is owner)
- `app-api` Dockerfile updated to actually run `next start` on :3100
- Verify: `docker compose up`, browser to `http://localhost:3100/admin/signup`, create owner account, redirected to `/admin` (empty tenant list).
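A sketch of that schema in Drizzle, matching the column list above — the concrete types (serial vs text ids, enum vs text state) are assumptions, not locked in:

```ts
// app/api/src/db/schema.ts — sketch; column types are assumptions.
import { pgTable, serial, text, integer, timestamp } from 'drizzle-orm/pg-core';

export const users = pgTable('users', {
  id: serial('id').primaryKey(),
  email: text('email').notNull().unique(),
});

export const tenants = pgTable('tenants', {
  id: serial('id').primaryKey(),
  slug: text('slug').notNull().unique(),
  name: text('name').notNull(),
  containerId: text('container_id'),
  containerPort: integer('container_port'),
  state: text('state').notNull().default('provisioning'),
  ownerId: integer('owner_id').notNull().references(() => users.id),
  createdAt: timestamp('created_at').notNull().defaultNow(),
});
```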
L2 — Admin tenant CRUD
- `/admin` page: list tenants from DB
- "New tenant" form (slug, name) → POST `/api/admin/tenants` (handler sketch after this list)
- Validates slug (`^[a-z0-9-]{2,40}$`), inserts row with `state='provisioning'`
- For now, no orchestrator call — just DB row + UI confirmation
- Verify: create a tenant in the UI, see it in the list with state=provisioning.
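The shape of that handler, with framework wiring elided; the import paths and error convention are hypothetical:

```ts
// app/api — POST /api/admin/tenants, validation + insert only (L3 adds the
// orchestrator call right after the insert).
import { db } from '../db';              // hypothetical path to the Drizzle instance
import { tenants } from '../db/schema';

const SLUG_RE = /^[a-z0-9-]{2,40}$/;

export async function createTenant(body: { slug: string; name: string }, ownerId: number) {
  if (!SLUG_RE.test(body.slug)) {
    throw Object.assign(new Error('invalid slug'), { status: 400 });
  }
  const [row] = await db
    .insert(tenants)
    .values({ slug: body.slug, name: body.name, ownerId, state: 'provisioning' })
    .returning();
  return row; // admin UI lists it with state=provisioning
}
```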
L3 — Orchestrator Docker driver
- `app/orchestrator/src/drivers/docker.ts`: implements `Orchestrator` via `dockerode` (driver sketch after this list)
  - `create(tenant)`: `docker create` from `cial-tenant:dev` with env `TENANT_ID`, label `cial.tenant=<id>`, a named volume, and the exposed port mapped to a free host port. Sets state=starting.
  - `start` / `stop` / `destroy` / `status`: trivial dockerode calls
- `app-api` calls `orchestrator.create()` after the DB insert in L2; records `container_id` and `container_port`; flips state to `running` once the container's `/healthz` responds.
- App container needs `/var/run/docker.sock` mounted in compose.
- Verify: create a tenant in the UI → admin list shows state=running with a port; `docker ps` shows the container; `curl localhost:<port>` returns the L0 placeholder page.
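A sketch of `create()` with dockerode. The capitalized option names are Docker Engine API fields that dockerode passes through verbatim; `HostPort: '0'` asks Docker for a free host port, which we read back from `inspect()`:

```ts
// app/orchestrator/src/drivers/docker.ts — create() sketch.
import Docker from 'dockerode';

const docker = new Docker({ socketPath: '/var/run/docker.sock' });

export async function create(tenant: { id: string }) {
  const container = await docker.createContainer({
    Image: 'cial-tenant:dev',
    Env: [`TENANT_ID=${tenant.id}`],
    Labels: { 'cial.tenant': tenant.id },
    ExposedPorts: { '8080/tcp': {} },
    HostConfig: {
      Binds: [`cial-tenant-${tenant.id}-data:/cial/data`], // named volume per tenant
      PortBindings: { '8080/tcp': [{ HostPort: '0' }] },   // '0' = any free host port
    },
  });
  await container.start();
  const info = await container.inspect();
  const hostPort = Number(info.NetworkSettings.Ports['8080/tcp'][0].HostPort);
  return { containerId: container.id, containerPort: hostPort }; // caller flips state
}
```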
L4 — Router reverse proxy on :8080
- `app/router/src/index.ts`: small Node HTTP+WS server on :8080 using `http-proxy` (sketch after this list)
- Reads tenant routes from Postgres on each request (cached with a 5s TTL)
- `Host` header `acme.localhost` → `tenants WHERE slug='acme' AND state='running'` → forward to `localhost:<container_port>`
- 404 if unknown slug; 503 if state ≠ running
- Add `router` service to compose
- Verify: `http://acme.localhost:8080` shows the tenant's page (Tenant acme).
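The router's lookup path, sketched; `lookupTenant` stands in for the Postgres query, and the 404/503 split is collapsed to 404 here:

```ts
// app/router/src/index.ts — host-based routing sketch.
import http from 'node:http';
import httpProxy from 'http-proxy';

// Hypothetical: SELECT container_port FROM tenants WHERE slug=$1 AND state='running'
declare function lookupTenant(slug: string): Promise<{ containerPort: number } | null>;

const proxy = httpProxy.createProxyServer({ ws: true });
const cache = new Map<string, { port: number; at: number }>(); // 5s TTL

async function resolve(host?: string): Promise<number | null> {
  const slug = host?.split('.')[0];               // acme.localhost:8080 → acme
  if (!slug) return null;
  const hit = cache.get(slug);
  if (hit && Date.now() - hit.at < 5_000) return hit.port;
  const row = await lookupTenant(slug);
  if (!row) return null;                          // real version: 404 vs 503 by state
  cache.set(slug, { port: row.containerPort, at: Date.now() });
  return row.containerPort;
}

http.createServer(async (req, res) => {
  const port = await resolve(req.headers.host);
  if (!port) { res.statusCode = 404; res.end('unknown tenant'); return; }
  proxy.web(req, res, { target: `http://localhost:${port}` });
}).listen(8080);
```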
L5 — SSO handoff App → tenant Core
- App admin: tenant detail page has an Open button → `/api/admin/tenants/:slug/open`
- That endpoint mints a short-lived (60s) HS256 JWT `{ sub: ownerId, tenant: slug, role: 'owner' }` signed with `CIAL_SSO_SECRET`, then redirects the browser to `http://{slug}.localhost:8080/.cial/sso?token=...` (token sketch after this list)
- Core Back: `/.cial/sso` validates the token, sets a session cookie scoped to the tenant subdomain, and redirects to `/`. (For v1 the cookie is the marker; Phase 3 of PLAN.md replaces this with real Better-Auth in Core.)
- The tenant `/` page now shows `Tenant {id} · signed in as {sub}`.
- Verify: from admin, click "Open acme" → land on `acme.localhost:8080` with the tenant page showing your owner email.
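Both halves of the handoff, sketched with the `jsonwebtoken` package (the library choice here is an assumption; only HS256 and the claim shape are locked in):

```ts
// SSO token sketch — shared between App (mint) and Core Back (verify).
import jwt from 'jsonwebtoken';

const SECRET = process.env.CIAL_SSO_SECRET!;

// App side: /api/admin/tenants/:slug/open builds this redirect URL.
export function openTenantUrl(ownerId: string, slug: string): string {
  const token = jwt.sign({ sub: ownerId, tenant: slug, role: 'owner' }, SECRET, {
    algorithm: 'HS256',
    expiresIn: 60, // seconds — short-lived by design
  });
  return `http://${slug}.localhost:8080/.cial/sso?token=${encodeURIComponent(token)}`;
}

// Core Back side: /.cial/sso calls this before setting the session cookie.
export function verifySsoToken(token: string) {
  return jwt.verify(token, SECRET, { algorithms: ['HS256'] }) as {
    sub: string; tenant: string; role: string;
  };
}
```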
L6 — `pnpm local:up` wraps everything
- One-shot script that:
  - builds the `cial-tenant:dev` image
  - runs `docker compose up --build -d`
  - tails logs, prints `→ http://localhost:3100/admin` when ready
- `pnpm local:down` tears down compose + prunes any orphan tenant containers (label-selected: `label=cial.tenant`)
- A new `scripts/smoke-local.mjs` end-to-end (probe sketch after this list):
  - local:up
  - sign up owner via API
  - create tenant via API
  - probe `acme.localhost:8080` until 200
  - local:down
- Verify: green E2E smoke from a clean state.
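The probe step is the only non-obvious part of the smoke script: Node's resolver doesn't necessarily treat `*.localhost` the way browsers do, so the sketch below hits the router on 127.0.0.1 and sets the `Host` header explicitly:

```js
// scripts/smoke-local.mjs — probe step sketch.
import http from 'node:http';

function probeOnce() {
  return new Promise((resolve) => {
    const req = http.get(
      { host: '127.0.0.1', port: 8080, path: '/', headers: { Host: 'acme.localhost' } },
      (res) => { res.resume(); resolve(res.statusCode === 200); },
    );
    req.on('error', () => resolve(false)); // router or tenant still starting
  });
}

const deadline = Date.now() + 120_000;
while (!(await probeOnce())) {
  if (Date.now() > deadline) throw new Error('tenant never came up');
  await new Promise((r) => setTimeout(r, 1_000));
}
console.log('acme is serving — tearing down');
```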
Working agreement
- Same as PLAN.md: implement → self-test → commit `phase(L<n>): …` → push.
- Each phase ends with a green check. If a phase's verify step fails, we fix before moving on.
- Anything bigger than L0–L6 (e.g., bringing Core Front + Platform processes inside the tenant image) is a follow-up scoped after L6 is green.
Open questions to revisit later (NOT for v1)
- Multi-process tenant (Core Front + Platform Back + Platform Front inside the container) — needed for the real Platform editing experience
- Trigger fabric / scheduler — needed once tenants have cron triggers
- Custom domains for tenants
- Per-tenant resource limits (memory/CPU caps via Docker)
- Tenant suspend (Docker stop) ↔ cold start time on first request