refactor(layout): consolidate workspace under /cial — core/, platform/, app/

Reorganize the dev/prod tenant container so the agent runs in the monorepo
root with a clear, semantic directory tree:

  /cial/core/      — runtime (back, front, edge, ui, sdk, protocol, scripts,
                     docker). Locked down to the cial linux user (mode 0700
                     in prod; :ro bind mount in restricted dev).
  /cial/platform/  — agent-editable surface (back, front).
  /cial/app/       — App control plane sources, present in workspace but
                     never built or run inside the tenant container.
  /cial/docs/      — architecture + ops reference.
  /cial/.claude/   — project skills/agents/commands (symlinked into the
                     harness HOME by the dev entrypoint).
  /cial/data/      — persistent state (sqlite, deploy-logs, agent home).

Concrete changes:
- git mv cial-core → core, cial-platform → platform, cial-app → app,
  scripts → core/scripts.
- pnpm-workspace.yaml: packages now core/*, platform/*, app/*.
- Bulk path rewrites across 250+ source / docker / docs files.
- core/scripts/dev-tenant.mjs: ROOT path fix, rw mount of repo + ro
  overlay of /cial/core when --unrestricted is not set (FS-level
  trust boundary, defense in depth).
- core/edge/src/supervisor.{ts,dev.ts}: cwd + CLAUDE_HOME relocated to
  /cial/data/home; agent runs from /cial root so skill discovery picks
  up /cial/.claude/skills automatically.
- core/back providers/claude.ts: HOME defaults to /cial/data/home, cwd
  defaults to /cial.
- core/docker/{Dockerfile,Dockerfile.dev,dev-entrypoint.sh}: COPY +
  WORKDIR + ENTRYPOINT updated; .claude → harness symlink.
- app/docker/{Dockerfile,Dockerfile.router}: COPY core, COPY app
  (instead of cial-core / cial-app).
- New docs/file-structure.md — single canonical map of the runtime
  layout. cial:self-edit SKILL.md mandates reading it first.
- cial:build SKILL.md: scope notes updated to platform/* and core/*.
- root package.json: smoke / dev:tenant scripts now under core/scripts/.
- core/scripts/smoke.mjs: cial-core.db → cial.db.
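The rw-repo + ro-overlay mount arrangement from the dev-tenant.mjs change above can be sketched roughly as follows. This is an illustrative assumption, not the script's actual contents — the function name and flag layout are invented; only the `/cial`, `/cial/core`, and `:ro` semantics come from the commit:

```typescript
// Hypothetical sketch of the dev-tenant mount logic (not the real dev-tenant.mjs).
function buildMounts(repoRoot: string, unrestricted: boolean): string[] {
  // Whole repo mounted read-write at /cial so the agent can edit platform/.
  const args = ["-v", `${repoRoot}:/cial:rw`];
  if (!unrestricted) {
    // Read-only bind of core/ layered over the rw repo mount: even if the
    // agent bypasses API-level checks, the kernel refuses writes to /cial/core.
    args.push("-v", `${repoRoot}/core:/cial/core:ro`);
  }
  return args;
}
```

The point of the overlay is that the trust boundary holds at the filesystem level, independent of any application-level 403.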
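The HOME/cwd defaulting described for core/back providers/claude.ts amounts to something like the sketch below. The function name and the `CIAL_ROOT` variable are hypothetical; only `CLAUDE_HOME` and the `/cial/data/home` and `/cial` defaults are stated by this commit:

```typescript
// Illustrative sketch only — resolveSpawnDefaults and CIAL_ROOT are invented names.
type Env = Record<string, string | undefined>;

function resolveSpawnDefaults(env: Env): { home: string; cwd: string } {
  return {
    // Agent home lives on the persistent data volume.
    home: env.CLAUDE_HOME ?? "/cial/data/home",
    // Running from the repo root lets skill discovery pick up /cial/.claude/skills.
    cwd: env.CIAL_ROOT ?? "/cial",
  };
}
```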

Externals preserved as-is by intent:
- JWT issuer string 'cial-app' in core/back/src/modules/sso/index.ts +
  app/api/src/lib/sso.ts is an external contract — NOT renamed.
- @cial/back / @cial/edge / @cial/protocol / @cial/sdk / @cial/front
  package names kept stable to minimize blast radius.
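Why the issuer string cannot be renamed: both sides of the SSO handshake must agree on `'cial-app'` byte-for-byte, or tokens minted by app/api stop verifying in core/back. A minimal HS256 sketch (helper names invented for illustration — this is not the real sso.ts code):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical verifier showing the issuer contract; not the actual implementation.
function b64url(buf: Buffer): string {
  return buf.toString("base64url");
}

function verifyIssuer(token: string, secret: string, issuer: string): boolean {
  const [header, payload, sig] = token.split(".");
  const expected = b64url(
    createHmac("sha256", secret).update(`${header}.${payload}`).digest()
  );
  if (expected.length !== sig.length) return false;
  if (!timingSafeEqual(Buffer.from(expected), Buffer.from(sig))) return false;
  // Renaming cial-app/ to app/ must NOT touch this string: it is part of the
  // wire contract, not of the directory layout.
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString());
  return claims.iss === issuer;
}
```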

Verified:
- pnpm install --prod=false → ok
- turbo run build for protocol, sdk, back, edge, front, platform-back,
  platform-front → all 7 successful (Next builds + tsc clean).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eliot M 2026-04-29 13:04:45 +00:00
parent fd165fdea9
commit c50cc2b5fb
244 changed files with 515 additions and 325 deletions


@@ -8,8 +8,8 @@ description: Build Cial inside the dev container — types, bundles, dist. Use a
Trigger a build via the self-edit endpoint and wait for completion.
## When to use
- After any edit to TypeScript/TSX in `cial-platform/*`.
- After any edit in `cial-core/*`, `cial-core/edge/*`, `protocol/*`, or `sdk/*` (only with `--unrestricted`; otherwise the request returns 403 `unrestricted_required`).
- After any edit to TypeScript/TSX in `platform/*`.
- After any edit in `core/back`, `core/front`, `core/edge`, `core/protocol`, `core/sdk`, or `core/ui` (only with `--unrestricted`; otherwise the request returns 403 `unrestricted_required`).
- Before invoking `cial:restart` on freshly-edited code.
## Endpoint


@@ -1,39 +1,46 @@
---
name: cial:self-edit
description: Master skill for editing Cial itself from inside the dev container. Loads the relevant docs/ entries before any edit so the agent understands the architecture, trust boundary, and build/restart loop. Use this BEFORE making any change to cial-core, cial-platform, or supervisor/edge code.
description: Master skill for editing Cial itself from inside the dev container. Loads the relevant docs/ entries before any edit so the agent understands the file layout, trust boundary, and build/restart loop. Use this BEFORE making any change to core/, platform/, or supervisor/edge code.
---
# cial:self-edit
You are about to edit the Cial codebase from inside its own dev tenant. Before touching any file, ground yourself in the architecture by reading the docs.
You are about to edit the Cial codebase from inside its own dev tenant. Before touching any file, ground yourself in the layout and architecture by reading the docs.
## Mandatory pre-read (always)
1. `docs/README.md` — index of everything.
2. `docs/architecture/overview.md` — 3-layer model + container shape.
3. `docs/architecture/core-vs-platform.md` — what's editable in which mode.
4. `docs/self-edit/unrestricted-mode.md` — what `--unrestricted` unlocks.
## Mandatory pre-read (always — in this order)
1. **`docs/file-structure.md`** — the single source of truth for where everything lives in the running container. Tells you what `/cial/core`, `/cial/platform`, `/cial/data`, `/cial/.claude`, `/cial/docs` are, what's editable in which mode, and which paths to use in code.
2. `docs/README.md` — index of everything else.
3. `docs/architecture/overview.md` — 3-layer model + container shape.
4. `docs/architecture/core-vs-platform.md` — what's editable in which mode.
5. `docs/self-edit/unrestricted-mode.md` — what `--unrestricted` unlocks.
## Conditional pre-read (read if relevant to the task)
- Editing UI / chat / sidebar → `docs/ui/design-system.md` + `docs/ui/components.md`.
- Editing deploy / build / supervisor → `docs/architecture/deploy-pipeline.md` + `docs/ops/supervisor.md`.
- Editing dev-tenant launcher / Dockerfile / entrypoint → `docs/architecture/dev-tenant.md`.
- Adding a new self-edit endpoint or skill → `docs/self-edit/api.md` + `docs/self-edit/recipes.md`.
## Workflow
1. **Understand first.** Read the docs above + the actual files you'll touch. Use Read; never guess at paths.
2. **Check scope.** Confirm `CIAL_UNRESTRICTED=1` is set if you need to touch `cial-core/*`, `edge/*`, `protocol/*`, or `sdk/*`. If not, stop and ask Eliot to relaunch with `pnpm dev:tenant --unrestricted`.
1. **Understand first.** Read `docs/file-structure.md` and any other docs above + the actual files you'll touch. Use Read; never guess at paths.
2. **Check scope.** Confirm `CIAL_UNRESTRICTED=1` is set if you need to touch `core/*`, `core/edge/*`, `core/protocol/*`, or `core/sdk/*`. If not, stop and ask Eliot to relaunch with `pnpm dev:tenant --unrestricted`.
3. **Edit.** Use Edit/Write. Keep changes minimal and on-task; do not refactor adjacent code.
4. **Build.** Invoke `cial:build`. If it fails, fix the errors and re-run until ok.
5. **Restart.** Invoke `cial:restart`. If `edgeRestart: true`, wait ~10s then health-check.
6. **Verify.** Hit the affected route or open the affected page and confirm the change is live.
## Hard rules
- Never edit anything under `cial-app/*` — that lives outside the container and you can't deploy it from here.
- Never edit anything under `app/*` — that's the App control plane, lives outside the tenant container, and can't be deployed from here.
- Never bypass the build step. `cial:restart` without a successful prior `cial:build` will just bounce stale code.
- Never `process.exit` or kill the supervisor manually — go through `/api/v1/self/restart`.
- Never write secrets to source files. All secrets live in the Tool Vault.
## Reference
- File layout: `docs/file-structure.md`
- Endpoints: `docs/self-edit/api.md`
- Common flows: `docs/self-edit/recipes.md`
- Why these constraints exist: `docs/architecture/core-vs-platform.md`


@@ -1,6 +1,6 @@
# Phase 5 — AI Sessions Engine
> Port the legacy Cial sessions system (`/app/server/src/core`) into `cial-core/back` and wire it through `@cial/protocol`, `@cial/sdk`, and `@cial/core-ui`. Goal: real Claude chat with streaming, persistence, and survival across server restarts.
> Port the legacy Cial sessions system (`/app/server/src/core`) into `core/back` and wire it through `@cial/protocol`, `@cial/sdk`, and `@cial/core-ui`. Goal: real Claude chat with streaming, persistence, and survival across server restarts.
Reference architecture deep-dive: see internal scout report (legacy `/app/server/src/core/services/{claude,gemini,kimi}-process.ts` + `core/ws/{stream,chat-handlers,handler-registry,session-handlers}.ts`).
@@ -35,7 +35,7 @@ Reference architecture deep-dive: see internal scout report (legacy `/app/server
- Drizzle migration creating two tables (legacy column names):
- `sessions` (id, user_id, name, claude_session_id, model, status, created_at, updated_at, streaming_state, archived_at, provider, parent_session_id, effort, cwd, group_id) — + reserved nullable columns for ghost/sandbox/team to avoid future migrations
- `messages` (id auto, session_id FK CASCADE, role, content, metadata JSON, created_at)
- `cial-core/back/src/modules/sessions/`
- `core/back/src/modules/sessions/`
- `repository.ts` — Drizzle queries
- `service.ts` — auth-scoped CRUD
- `index.ts` — Express router: `POST /`, `GET /`, `GET /:id`, `PATCH /:id` (rename), `DELETE /:id`, `GET /:id/messages?before=N&limit=N`
@@ -54,11 +54,11 @@ Reference architecture deep-dive: see internal scout report (legacy `/app/server
**Goal:** Real WS at `/ws` with auth + multi-tab sync of session list.
**Deliverables**
- `cial-core/back/src/infrastructure/ws/`
- `core/back/src/infrastructure/ws/`
- `server.ts` — upgrade handler that authenticates the WS via Better-Auth session cookie (reject anonymous)
- `handler-registry.ts``register(type, handler)` / dispatch table, ported from legacy pattern
- `clients.ts` — per-user socket pool; `broadcastToUser(userId, msg)`, `sessionFocus` map
- `cial-core/back/src/modules/sessions/ws-handlers.ts`
- `core/back/src/modules/sessions/ws-handlers.ts`
- Inbound: `session.list`, `session.create`, `session.rename`, `session.delete`, `session.focus`, `session.watch`, `session.unwatch`, `session.history`
- Outbound: `session.created`, `session.updated`, `session.deleted`, `session.history`, `session.list`
- WS message envelope: `{ type: string, payload: unknown, requestId?: string }` (matches legacy)
@@ -75,7 +75,7 @@ Reference architecture deep-dive: see internal scout report (legacy `/app/server
**Goal:** Detached `claude` CLI processes that survive server restart, streaming events.
**Deliverables**
- `cial-core/back/src/modules/sessions/process/`
- `core/back/src/modules/sessions/process/`
- `types.ts``CliProcessEvents` interface (init, text, thinking, tool_use, tool_result, usage, rate_limit, compact, question, result, cancelled, timeout, heartbeat, close, error) + provider-agnostic shapes (`Tool`, `TurnUsage`, `TurnStats`, `ProcessSnapshot`, `ToolConfigs`)
- `harness-provider.ts`**abstract `HarnessProvider` interface**: `name: 'claude' | 'gemini' | 'kimi'`, `buildSpawnArgs(opts) → { args, env, cwd }`, `parseLine(line, ctx) → CliEvent[]`, `hasLocalState(harnessSessionId) → boolean`, `homeDir(): string`. The tail loop, snapshot/restore, lifecycle, kill flow all live in the generic engine and call into the provider only for these provider-specific bits.
- `providers/claude.ts``ClaudeProvider implements HarnessProvider` (the only one wired in 5c; Gemini/Kimi later just drop in next to it)
@@ -105,11 +105,11 @@ Reference architecture deep-dive: see internal scout report (legacy `/app/server
**Goal:** WS-driven chat — send a message, watch tokens stream in, get the final result persisted.
**Deliverables**
- `cial-core/back/src/modules/sessions/ws-stream.ts`
- `core/back/src/modules/sessions/ws-stream.ts`
- `wireProcessToSocket(proc, ws, sessionId)` — subscribes to all CliProcessEvents, broadcasts as `stream.*` messages to focused/watched sockets
- `rewireProcessToSocket(proc, ws, sessionId)` — replays `accumulatedText`, `accumulatedThinking`, `collectedTools`, `latestUsage` to a newly connecting socket
- 500ms debounced `streamingState` save to `sessions.streaming_state` for UI recovery
- `cial-core/back/src/modules/sessions/chat-handlers.ts`
- `core/back/src/modules/sessions/chat-handlers.ts`
- `message {sessionId, content}` — guard on busy, INSERT user message, `processManager.getOrCreate`, `wireProcessToSocket`, `proc.start(...)`
- `cancel {sessionId}``processManager.cancel`
- `answer {sessionId, content}` — for interactive `question` events
@@ -173,7 +173,7 @@ Reference architecture deep-dive: see internal scout report (legacy `/app/server
| Risk | Mitigation |
|---|---|
| `claude` binary not on PATH inside container | **5c also lands the Dockerfile change** in `cial-core/docker/` to install `@anthropic-ai/claude-code` globally; dev:tenant fails loud with install instructions if host `claude` missing |
| `claude` binary not on PATH inside container | **5c also lands the Dockerfile change** in `core/docker/` to install `@anthropic-ai/claude-code` globally; dev:tenant fails loud with install instructions if host `claude` missing |
| Per-tenant `~/.claude` state collisions across containers | Each tenant has own `~/.claude` mounted from its volume; **dev:tenant**`scripts/dev-tenant.mjs` copies `~/.claude/` (auth tokens, settings) from host into `./.dev-tenant/claude-home/` on first boot only (re-copy if `--reset-claude` flag) and exports `HOME` to point there for the spawned children |
| Process state file race (two boots overlap) | Atomic write (`.tmp` + rename) + read-then-delete-on-startup pattern from legacy |
| Tail loop CPU cost | 500ms poll matches legacy; can switch to `fs.watch` later if needed |


@@ -37,13 +37,13 @@ the tenant container, with rollback as a safety net. Same shape as the
| `fast` mode | Opt-in per-tenant flag (`tenant_settings.deploy_mode = 'fast' \| 'stable'`) | Same toggle PLAN.md §6 calls for. v1 wires the flag and a no-op fast path; real HMR keep-warm lands in 6f if we want it. |
| Restart mechanism | **Supervisor IPC over Unix socket** at `/run/cial-supervisor.sock` (mode `0660`, group `cial`) | Lets Core Back (`cial`) ask the supervisor (PID 1, `cial`) to bounce a specific child without touching the others. Agent (`agent`) cannot reach the socket. |
| Build queue | Single in-flight build per tenant; new requests with same payload are coalesced, different requests queue (max depth 1) | Matches `/app` behaviour; avoids racing pnpm. |
| Build target | `pnpm --filter @cial/platform-front build && pnpm --filter @cial/platform-back build` | Same as legacy. Run from `/opt/cial-monorepo` (where the workspace lives). |
| Build artifacts | In-place at `/opt/cial-monorepo/cial-platform/{front,back}/` (Next standalone output already configured). No atomic-swap for v1. | Bounce-in-place is fine because the supervisor restart drains old workers cleanly. |
| Build target | `pnpm --filter @cial/platform-front build && pnpm --filter @cial/platform-back build` | Same as legacy. Run from `/cial` (where the workspace lives). |
| Build artifacts | In-place at `/cial/platform/{front,back}/` (Next standalone output already configured). No atomic-swap for v1. | Bounce-in-place is fine because the supervisor restart drains old workers cleanly. |
| Git auto-commit granularity | **Per turn** — one commit per assistant message that mutated `/platform/` | Cleanest rollback UX. Matches `/app` flow. |
| Git branch | `instance/<tenant_slug>` on the existing `/platform` repo (initialised on first boot if not a repo) | Same convention as `instance/eliot`. No remote in v1 — local commits only. |
| Rollback strategy | `git revert <sha>..HEAD` (creates a new commit), then redeploy | Non-destructive; preserves history. Matches `/app` deploy reference. |
| Auth | All deploy/git endpoints require Better-Auth tenant session **and** owner role | Same as Vault / DB-proxy in Phase 4. |
| Logging | Build stdout/stderr line-buffered → WS `deploy.log` events; full transcript also written to `/var/lib/cial/deploy-logs/<deployId>.log` (rotated, last 50 kept) | Visible live in chat + auditable later. |
| Logging | Build stdout/stderr line-buffered → WS `deploy.log` events; full transcript also written to `/cial/data/deploy-logs/<deployId>.log` (rotated, last 50 kept) | Visible live in chat + auditable later. |
| Cancellation | `POST /.cial/api/deploy/:id/cancel``kill -SIGTERM` the build child group | Same pattern as session cancel from Phase 5c. |
| Concurrency with sessions | Build does **not** block chat. Restart bounces only platform-* children — Core Back keeps the WS open and emits `deploy.restart.done` so the UI knows when the platform endpoints are reachable again. | Mirrors prod where chat stays alive across deploys. |
@@ -65,7 +65,7 @@ the tenant container, with rollback as a safety net. Same shape as the
### Supervisor IPC (new)
`cial-core/edge/src/supervisor.ts` gains a JSONL line-protocol Unix socket:
`core/edge/src/supervisor.ts` gains a JSONL line-protocol Unix socket:
```
{ type: 'restart', service: 'platform-front' | 'platform-back' | 'platform' }
@@ -104,10 +104,10 @@ indicator works across sessions, like `/app`).
| Path | Owner | Mode | Touched by |
|---|---|---|---|
| `/opt/cial-monorepo/cial-platform/` | `agent:agent` | 0755 | Agent (writes), Core Back (reads to spawn `pnpm`) |
| `/opt/cial-monorepo/cial-platform/.git/` | `agent:agent` | 0755 | Agent + Core Back's git engine (both run as their respective users; we use a small `gosu agent git ...` shim from Core Back) |
| `/cial/platform/` | `agent:agent` | 0755 | Agent (writes), Core Back (reads to spawn `pnpm`) |
| `/cial/platform/.git/` | `agent:agent` | 0755 | Agent + Core Back's git engine (both run as their respective users; we use a small `gosu agent git ...` shim from Core Back) |
| `/run/cial-supervisor.sock` | `cial:cial` | 0660 | Core Back ↔ supervisor only (agent cannot reach it) |
| `/var/lib/cial/deploy-logs/` | `cial:cial` | 0700 | Core Back only |
| `/cial/data/deploy-logs/` | `cial:cial` | 0700 | Core Back only |
The agent **cannot** call deploy/restart directly via filesystem tricks; it
must go through the Core Back HTTP API (which is what its system prompt
@@ -122,7 +122,7 @@ tells it to do). This keeps the privilege boundary intact.
**Goal:** Supervisor can bounce a single child on request, without exiting.
**Deliverables**
- `cial-core/edge/src/supervisor.ts`: add Unix socket server at
- `core/edge/src/supervisor.ts`: add Unix socket server at
`/run/cial-supervisor.sock`, JSONL protocol above.
- Track `restartRequested: Set<string>` so the existing `on('exit')` handler
knows whether to respawn or shut the container down.
@@ -143,11 +143,11 @@ tells it to do). This keeps the privilege boundary intact.
bounce it, gated by auth.
**Deliverables**
- `cial-core/back/src/modules/deploy/`
- `core/back/src/modules/deploy/`
- `repository.ts``deploys` table (id, tenant-implicit, mode, status, started_at, ended_at, exit_code, error_summary, log_path, requested_by_user_id)
- `runner.ts``BuildRunner` class:
- Single-flight queue (max depth 1)
- Spawns `pnpm` with `cwd = /opt/cial-monorepo`, `stdio: ['ignore','pipe','pipe']`, `detached: true` so we can SIGTERM the process group
- Spawns `pnpm` with `cwd = /cial`, `stdio: ['ignore','pipe','pipe']`, `detached: true` so we can SIGTERM the process group
- Line-buffered log → emits to `BuildEvents` listener + appends to file
- Reports `ok` only if both platform-front and platform-back exit 0
- `supervisor-client.ts` — tiny UDS client for the IPC defined in 6a
@@ -171,7 +171,7 @@ bounce it, gated by auth.
`instance/<slug>`; one call rolls back to any prior sha.
**Deliverables**
- `cial-core/back/src/modules/git/`
- `core/back/src/modules/git/`
- `engine.ts`:
- `ensureRepo()` — on boot, `git init` + `git checkout -b instance/<slug>` if `/platform/.git` missing; configure `user.email = agent@<slug>.cial.local`
- `commitTurn({ turnId, subject, sessionId })``git add -A && git diff --cached --quiet || git commit -m "<subject>\n\nturn:<turnId>\nsession:<sessionId>"`
@@ -226,7 +226,7 @@ mirroring `/app`'s UX.
backend plumbing sits unused.
**Deliverables**
- `cial-core/back/src/modules/sessions/process/providers/claude.ts`: append a
- `core/back/src/modules/sessions/process/providers/claude.ts`: append a
Phase-6 block to the base system prompt (kept short — every token costs):
- "Your edits to `/platform/**` only become live after `POST /.cial/api/deploy` returns `ok:true`, then `POST /.cial/api/deploy/restart`."
- "Always run a deploy after edits. If the build fails, fix the errors and redeploy until it passes. Never skip the build."
@@ -273,7 +273,7 @@ backend plumbing sits unused.
| Agent calls deploy on every tiny edit, wasting CPU | System prompt says "deploy after a logical change, not after every edit"; debouncer in `BuildRunner` queues at most 1 pending build, coalesces same-target requests within 2s |
| Build child outlives the request that started it (client disconnects) | Build is decoupled from the HTTP request lifecycle (returns `deployId` and runs detached); WS subscribers get events whenever they reconnect; full log on disk |
| `/platform/.git` written by `agent` but Core Back runs as `cial` | All git invocations go through `gosu agent git ...`; never read the working tree from Core Back directly |
| Stable mode bounce drops in-flight Platform requests | Edge proxy (already in `cial-core/edge`) gets a brief drain mode: hold incoming requests for up to 2s while platform restarts, fail fast after; UI shows "Restarting…" pill |
| Stable mode bounce drops in-flight Platform requests | Edge proxy (already in `core/edge`) gets a brief drain mode: hold incoming requests for up to 2s while platform restarts, fail fast after; UI shows "Restarting…" pill |
| Supervisor IPC abused if socket leaks | Socket is `0660 cial:cial`; agent is not in `cial` group. Sanity-check on every IPC line: only allow `service ∈ {platform-front, platform-back, platform}` |
---


@@ -30,7 +30,7 @@ tenant CRUD, orchestrator, router) forward.
| App ORM | `drizzle-orm` + `drizzle-kit` | already in deps, lightweight |
| App Auth | `better-auth` | locked-in choice from infra deck |
| Orchestrator driver | `dockerode` | typed, supports the full Docker API; no shelling out |
| Router proxy | tiny Node service using `http-proxy` (lives in `cial-app/router`) | dedicated process, separates concerns, matches `@cial/app-router` package we already scaffolded |
| Router proxy | tiny Node service using `http-proxy` (lives in `app/router`) | dedicated process, separates concerns, matches `@cial/app-router` package we already scaffolded |
| SSO | HS256 JWT (shared secret in `.env.local`) for v1; bumps to RS256 in prod | smaller blast radius for a dev mode |
| Tenant data | named docker volume per tenant: `cial-tenant-{id}-data` | survives `docker stop`, gone with the tenant on destroy |
@@ -90,7 +90,7 @@ of the orchestration / routing is built. Chat / agent / deploy logic stays as
/.cial/api/* │ │ │
▼ │ │ │
@cial/back │ │ │ user: cial :4000
(Express) │ │ │ data: /var/lib/cial (cial:0700)
(Express) │ │ │ data: /cial/data (cial:0700)
/.cial/* │ │
▼ │ │
@cial/front │ │ user: cial :4001
@@ -113,10 +113,10 @@ of the orchestration / routing is built. Chat / agent / deploy logic stays as
the container if any child dies (so Docker restarts it). No s6/supervisord
— keeps the runtime stack pure Node.
- **Internal edge**: tiny Node `http-proxy` server in a new package
`cial-core/edge` (~100 LOC). Handles HTTP and WebSocket upgrade. The only
`core/edge` (~100 LOC). Handles HTTP and WebSocket upgrade. The only
process bound to `:8080`.
- **Two-user model preserved**: `cial` user owns `/opt/cial-core` (mode 0700),
runs edge + back + front; `agent` user owns `/opt/cial-platform`, runs
- **Two-user model preserved**: `cial` user owns `/cial/core` (mode 0700),
runs edge + back + front; `agent` user owns `/cial/platform`, runs
platform-back + platform-front. Edge is `cial`-owned so the agent process
can never bind the public port.
- **Platform Back exposure**: **internal-only for v1** — reachable from
@@ -130,7 +130,7 @@ of the orchestration / routing is built. Chat / agent / deploy logic stays as
**Tasks:**
1. Add `cial-core/edge` package: Node + `http-proxy` + ws upgrade handling.
1. Add `core/edge` package: Node + `http-proxy` + ws upgrade handling.
2. Add `cial-core/supervisor` (or a small `bin/supervisor.mjs` inside `edge`):
spawns the four children with the right user via `process.setuid` (or via
`su-exec` in the entrypoint).
@@ -142,22 +142,22 @@ of the orchestration / routing is built. Chat / agent / deploy logic stays as
- `@cial/front` `/.cial``<h1>Cial Core · Tenant {TENANT_ID}</h1>`
- `@cial/back` `/.cial/api/health` already exists (`/healthz` mounted there)
- `@cial/platform-back` `/health` already exists (internal only)
5. Rewrite `cial-core/docker/Dockerfile` as a real multi-stage build that
5. Rewrite `core/docker/Dockerfile` as a real multi-stage build that
produces the runtime image with all four services + the supervisor +
correct ownership/permissions. Entry: `tini -- node /opt/cial-core/edge/bin/supervisor.mjs`.
6. Add `cial-core/docker/.dockerignore` (already there).
   correct ownership/permissions. Entry: `tini -- node /cial/core/edge/bin/supervisor.mjs`.
6. Add `core/docker/.dockerignore` (already there).
**Verify (acceptance for L0):**
```bash
docker build -f cial-core/docker/Dockerfile -t cial-tenant:dev .
docker build -f core/docker/Dockerfile -t cial-tenant:dev .
docker run --rm -e TENANT_ID=demo -p 9000:8080 cial-tenant:dev
# In another shell:
curl http://localhost:9000/ # → "Platform · Tenant demo"
curl http://localhost:9000/.cial # → "Cial Core · Tenant demo"
curl http://localhost:9000/.cial/api/healthz # → {"status":"ok",…}
docker exec <id> ls -la /opt/cial-core # → owner cial, mode 0700
docker exec <id> sudo -u agent cat /opt/cial-core/back/dist/index.js
docker exec <id> ls -la /cial/core # → owner cial, mode 0700
docker exec <id> sudo -u agent cat /cial/core/back/dist/index.js
# → permission denied (proves boundary)
```
@@ -167,7 +167,7 @@ production-shaped artifact — every later phase just plugs into it.
### L1 — docker compose + Postgres + App API + owner signup
- `docker-compose.yml` at repo root with services: `postgres`, `app-api`
- `cial-app/api`:
- `app/api`:
- Drizzle schema: `users`, `tenants` (slug, name, container_id, container_port, state, owner_id)
- `drizzle-kit generate` + on-boot migrate
- Better-Auth wired at `/api/auth/[...all]` — email+password, no verification in dev
@@ -186,7 +186,7 @@ production-shaped artifact — every later phase just plugs into it.
### L3 — Orchestrator Docker driver
- `cial-app/orchestrator/src/drivers/docker.ts`: implements `Orchestrator` via
- `app/orchestrator/src/drivers/docker.ts`: implements `Orchestrator` via
`dockerode`
- `create(tenant)`: `docker create` from `cial-tenant:dev` with env
`TENANT_ID`, label `cial.tenant=<id>`, named volume, exposed port mapped
@@ -202,7 +202,7 @@ production-shaped artifact — every later phase just plugs into it.
### L4 — Router reverse proxy on :8080
- `cial-app/router/src/index.ts`: small Node HTTP+WS server on :8080 using
- `app/router/src/index.ts`: small Node HTTP+WS server on :8080 using
`http-proxy`
- Reads tenant routes from Postgres on each request (cache with 5s TTL)
- `Host` header `acme.localhost``tenants WHERE slug='acme' AND state='running'`

PLAN.md (24 changed lines)

@@ -59,11 +59,11 @@ These need answers (or explicit defer-to-later) before any code:
**Goal:** The contract every other piece will import.
**Deliverables**
- `cial-core/protocol/` — `@cial/protocol`
- `core/protocol/` — `@cial/protocol`
- All shared types: chat messages, tool calls, deploy events, trigger schemas, DB row shapes the SDK exposes, error envelope
- Zod schemas alongside TS types for runtime validation
- **No runtime code beyond schemas.** Pure types.
- `cial-core/sdk/` — `@cial/sdk`
- `core/sdk/` — `@cial/sdk`
- HTTP + WS client to Core Back from Platform code
- Auth helpers (read tenant session)
- Triggers: `cial.triggers.register({ schedule, handler })`
@@ -82,7 +82,7 @@ These need answers (or explicit defer-to-later) before any code:
**Goal:** A runnable HTTP+WS server with health, basic routes, and DB. No AI, no auth yet.
**Deliverables**
- `cial-core/back/` — Express 4 app
- `core/back/` — Express 4 app
- `/healthz`, `/readyz`
- WebSocket server at `/ws`
- Per-tenant SQLite via `better-sqlite3` (DB at `/core-data/cial.db`)
@@ -177,11 +177,11 @@ These need answers (or explicit defer-to-later) before any code:
**Goal:** The default Next.js + Node project cloned into every tenant container.
**Deliverables**
- `cial-platform/front/` — Next.js 16 app
- `platform/front/` — Next.js 16 app
- Blank SaaS shell: sidebar, top bar, empty home
- Embedded Cial chat panel (uses `@cial/sdk`)
- Auth pages (use Core Back via SDK)
- `cial-platform/back/` — Node service
- `platform/back/` — Node service
- Example trigger registration
- Example custom route
- `cial-platform/README.md` — "this is YOUR project, edit anything"
@@ -196,8 +196,8 @@ These need answers (or explicit defer-to-later) before any code:
**Goal:** The production image with the Linux user separation.
**Deliverables**
- Multi-stage Dockerfile in `cial-core/docker/`
- Builds Core Back → installs into `/opt/cial-core/` owned `cial:cial` mode `0700`
- Multi-stage Dockerfile in `core/docker/`
- Builds Core Back → installs into `/cial/core/` owned `cial:cial` mode `0700`
- Creates `agent:agent` user with full shell access
- Creates `/platform/` owned `agent:agent`
- Creates `/core-data/`, `/secrets/` owned `cial:cial` mode `0700`
@@ -207,7 +207,7 @@ These need answers (or explicit defer-to-later) before any code:
- Healthcheck script
- Image size budget: <500 MB
**Done when:** `docker run` the image, exec in as `agent`, confirm `cat /opt/cial-core/foo` is denied, `curl --unix-socket /run/cial-core.sock` works.
**Done when:** `docker run` the image, exec in as `agent`, confirm `cat /cial/core/foo` is denied, `curl --unix-socket /run/cial-core.sock` works.
---
@@ -216,12 +216,12 @@ These need answers (or explicit defer-to-later) before any code:
**Goal:** Onboarding, billing, and the subdomain router. No Fly integration yet.
**Deliverables**
- `cial-app/api/` — Next.js 16 (App Router) for owner-facing UI + API
- `app/api/` — Next.js 16 (App Router) for owner-facing UI + API
- Better-Auth for owner accounts
- Signup → tenant record in App's Postgres
- Stripe checkout + webhooks
- Admin dashboard (list tenants, status, usage)
- `cial-app/router/` — subdomain proxy: `*.cial.app` → tenant Machine ID
- `app/router/` — subdomain proxy: `*.cial.app` → tenant Machine ID
- Initially: in-memory map; resolves to a stub URL
- "Open my platform" button → mints JWT → redirects to tenant's `/.cial/sso`
@@ -234,7 +234,7 @@ These need answers (or explicit defer-to-later) before any code:
**Goal:** Real per-tenant container provisioning.
**Deliverables**
- `cial-app/orchestrator/` — Fly Machines client
- `app/orchestrator/` — Fly Machines client
- On signup: create volume + Machine from current Core image, register in router
- On Core release: rolling `fly machines update`
- Suspend/wake handling already native to Fly
@@ -251,7 +251,7 @@ These need answers (or explicit defer-to-later) before any code:
**Goal:** Scheduled jobs work even when tenant Machine is suspended.
**Deliverables**
- `cial-app/scheduler/` — central cron loop (one process, Postgres-backed queue)
- `app/scheduler/` — central cron loop (one process, Postgres-backed queue)
- Tenant SDK call `cial.triggers.register(...)` syncs to App via internal API
- Scheduler fires HTTP POST to `acme.cial.app/.cial/triggers/run` at the right cadence
- Webhook wakes Fly Machine; trigger handler runs; Machine suspends after


@@ -9,26 +9,34 @@ Closed Core. Editable Platform. One container per tenant. Multi-tenant App layer
## Repository layout
```
cial/
├─ cial-core/ CLOSED — the harness shipped as a Docker image to every tenant
│ ├─ back/ Express + WS · AI sessions · auth · git engine · vault · DB proxy
│ ├─ front/ Next.js · the rescue UI served at /.cial/*
│ ├─ sdk/ @cial/sdk — Platform code talks to Core through this
│ ├─ protocol/ @cial/protocol — shared TS types + Zod schemas
│ └─ docker/ Multi-stage Dockerfile · two Linux users (cial + agent)
cial/ ← repo root, mounted at /cial inside the tenant container
├─ core/ CLOSED — the harness shipped as a Docker image to every tenant
│ ├─ back/ Express + WS · AI sessions · auth · git engine · vault · DB proxy
│ ├─ front/ Next.js · the rescue UI served at /.cial/*
│ ├─ ui/ @cial/core-ui — shared React components
│ ├─ sdk/ @cial/sdk — Platform code talks to Core through this
│ ├─ protocol/ @cial/protocol — shared TS types + Zod schemas
│ ├─ edge/ @cial/edge — edge proxy + supervisor (PID 1)
│ ├─ scripts/ dev-tenant.mjs · smoke.mjs · smoke-process.mjs
│ └─ docker/ Multi-stage Dockerfile · two Linux users (cial + agent)
├─ cial-platform/ OPEN — starter cloned into /platform/ of every tenant container
│ ├─ front/ Next.js · the editable user-owned frontend
│ └─ back/ Node · the editable user-owned backend
├─ platform/ OPEN — starter cloned into the tenant container, agent-editable
│ ├─ front/ Next.js · the editable user-owned frontend
│ └─ back/ Node · the editable user-owned backend
└─ cial-app/ CLOSED — multi-tenant ops layer
├─ api/ Next.js · owner signup, billing, admin
├─ orchestrator/ Fly Machines provisioning
├─ router/ Subdomain → tenant Machine ID
├─ scheduler/ Central cron / trigger fabric
└─ docker/ Dockerfile for App itself
├─ app/ CLOSED — multi-tenant ops layer (lives OUTSIDE the tenant container)
│ ├─ api/ Next.js · owner signup, billing, admin
│ ├─ orchestrator/ Fly Machines provisioning
│ ├─ router/ Subdomain → tenant Machine ID
│ ├─ scheduler/ Central cron / trigger fabric
│ └─ docker/ Dockerfile for App itself
├─ docs/ Architecture, ops, self-edit refs (start at file-structure.md)
└─ .claude/ Project skills (cial:self-edit, cial:build, cial:restart)
```
> See [`docs/file-structure.md`](./docs/file-structure.md) for the canonical, runtime-accurate map of paths inside the container.
---
## Stack


@@ -3,7 +3,7 @@
  *
  * Verifies the caller is the signed-in owner, mints a 60s HS256 JWT, and
  * 302-redirects the browser to `http://{slug}.localhost:8080/.cial/api/sso?token=…`
- * which the tenant's Core Back validates (see cial-core/back modules/sso).
+ * which the tenant's Core Back validates (see core/back modules/sso).
  *
  * Implementation note: PLAN-LOCAL.md describes the path as `/.cial/sso`;
  * we use `/.cial/api/sso` because the per-tenant edge routes `/.cial/api/*`
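The validating side of this handshake is what `core/back`'s sso module has to do before creating a session. A minimal sketch, with `verifySsoToken` as a hypothetical name and the claim names (`sub`, `slug`) as assumptions; only "60s HS256 JWT" is from the comment above.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical verifier for the 60s HS256 SSO token minted by App.
export function verifySsoToken(
  token: string,
  secret: string,
  now = Date.now(),
): { sub: string; slug: string } | null {
  const [h, p, sig] = token.split(".");
  if (!h || !p || !sig) return null;
  const expected = createHmac("sha256", secret).update(`${h}.${p}`).digest();
  const got = Buffer.from(sig, "base64url");
  // Constant-time compare so the signature check doesn't leak timing info.
  if (got.length !== expected.length || !timingSafeEqual(got, expected)) return null;
  const payload = JSON.parse(Buffer.from(p, "base64url").toString());
  // `exp` is in seconds per JWT convention; reject anything stale.
  if (typeof payload.exp !== "number" || payload.exp * 1000 < now) return null;
  return { sub: payload.sub, slug: payload.slug };
}
```

Returning `null` (rather than throwing) lets the route map every failure mode to the same 401, which avoids giving an attacker an oracle for *why* a token was rejected.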


@@ -19,14 +19,14 @@ RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
 RUN apt-get update \
   && apt-get install -y --no-install-recommends ca-certificates git \
   && rm -rf /var/lib/apt/lists/*
-WORKDIR /workspace
+WORKDIR /cial
 FROM base AS builder
 # devDependencies (typescript, @types/*) are required to build Next + tsc.
 ENV NODE_ENV=development
 COPY pnpm-workspace.yaml pnpm-lock.yaml* package.json turbo.json tsconfig.base.json ./
-COPY cial-core ./cial-core
-COPY cial-app ./cial-app
+COPY core ./core
+COPY app ./app
 RUN --mount=type=cache,id=pnpm,target=/pnpm/store \
   pnpm install --frozen-lockfile=false
 RUN pnpm turbo run build \
@@ -48,9 +48,9 @@ RUN userdel --remove node 2>/dev/null || true \
   && mkdir -p /opt/cial-app \
   && chown -R cial:cial /opt/cial-app
-COPY --from=builder --chown=cial:cial /workspace /opt/cial-app
+COPY --from=builder --chown=cial:cial /cial /opt/cial-app
 USER cial
-WORKDIR /opt/cial-app/cial-app/api
+WORKDIR /opt/cial-app/app/api
 EXPOSE 3100
 CMD ["node", "node_modules/next/dist/bin/next", "start", "-p", "3100"]


@@ -13,13 +13,13 @@ FROM node:${NODE_VERSION}-bookworm-slim AS base
 ENV PNPM_HOME=/pnpm \
   PATH=/pnpm:$PATH
 RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
-WORKDIR /workspace
+WORKDIR /cial
 FROM base AS builder
 ENV NODE_ENV=development
 COPY pnpm-workspace.yaml pnpm-lock.yaml* package.json turbo.json tsconfig.base.json ./
-COPY cial-core ./cial-core
-COPY cial-app ./cial-app
+COPY core ./core
+COPY app ./app
 RUN --mount=type=cache,id=pnpm,target=/pnpm/store \
   pnpm install --frozen-lockfile=false
 RUN pnpm turbo run build --filter @cial/app-router
@@ -34,9 +34,9 @@ RUN userdel --remove node 2>/dev/null || true \
   && mkdir -p /opt/cial-router \
   && chown -R cial:cial /opt/cial-router
-COPY --from=builder --chown=cial:cial /workspace /opt/cial-router
+COPY --from=builder --chown=cial:cial /cial /opt/cial-router
 USER cial
-WORKDIR /opt/cial-router/cial-app/router
+WORKDIR /opt/cial-router/app/router
 EXPOSE 8080
 CMD ["node", "dist/index.js"]


@@ -6,7 +6,7 @@ The control-plane image. One deploy for the whole platform.
 ```
 docker build \
-  -f cial-app/docker/Dockerfile \
+  -f app/docker/Dockerfile \
   -t cial-app:dev \
   .
 ```


@@ -8,7 +8,7 @@
  * - Container name: cial-tenant-<slug>
  * - Image: cial-tenant:dev (overridable via TENANT_IMAGE)
  * - Labels: cial.tenant=<id>, cial.slug=<slug>
- * - Volume: cial-tenant-<id>-data → /var/lib/cial
+ * - Volume: cial-tenant-<id>-data → /cial/data
  * - Env: TENANT_ID=<id>
  * - Ports: container :8080 → ephemeral host port (Docker-assigned)
  */
@@ -84,7 +84,7 @@ export function createDockerOrchestrator(
       HostConfig: {
         // Empty HostPort → docker daemon picks a free ephemeral port.
         PortBindings: { [TENANT_INTERNAL_PORT]: [{ HostPort: '' }] },
-        Binds: [`${volumeName}:/var/lib/cial`],
+        Binds: [`${volumeName}:/cial/data`],
         // Don't auto-restart in local mode; lifecycle is owner-driven.
         RestartPolicy: { Name: 'no' },
       },


@@ -7,7 +7,7 @@
     "declaration": true,
     "tsBuildInfoFile": "dist/.tsbuildinfo"
   },
-  "references": [{ "path": "../../cial-core/protocol" }],
+  "references": [{ "path": "../../core/protocol" }],
   "include": ["src/**/*"],
   "exclude": ["node_modules", "dist"]
 }


@@ -7,7 +7,7 @@
     "declaration": true,
     "tsBuildInfoFile": "dist/.tsbuildinfo"
   },
-  "references": [{ "path": "../../cial-core/protocol" }],
+  "references": [{ "path": "../../core/protocol" }],
   "include": ["src/**/*"],
   "exclude": ["node_modules", "dist"]
 }


@@ -45,17 +45,17 @@ const ConfigSchema = z.object({
   devAutoLogin: z.coerce.boolean().default(false),
   // ── Phase 6 deploy controller ────────────────────────────────────────
   /** Workspace root that pnpm runs builds against. */
-  monorepoRoot: z.string().default('/opt/cial-monorepo'),
-  /** Unix socket the supervisor (cial-core/edge) listens on. */
+  monorepoRoot: z.string().default('/cial'),
+  /** Unix socket the supervisor (core/edge) listens on. */
   supervisorSocket: z.string().default('/run/cial-supervisor.sock'),
   /** Where the deploy controller persists per-build log files. */
-  deployLogDir: z.string().default('/var/lib/cial/deploy-logs'),
+  deployLogDir: z.string().default('/cial/data/deploy-logs'),
   /** Path to the platform working tree (Phase 6c git engine). */
-  platformRoot: z.string().default('/opt/cial-monorepo/cial-platform'),
+  platformRoot: z.string().default('/cial/platform'),
   /**
    * `--unrestricted` mode: when true, the self-edit endpoints can build and
    * restart everything in the container (core, sdk, protocol, edge, platform)
-   * instead of just `cial-platform`. Set by `pnpm dev:tenant --unrestricted`
+   * instead of just `platform/`. Set by `pnpm dev:tenant --unrestricted`
    * via the CIAL_UNRESTRICTED env var. Never enable in production.
    */
   unrestricted: z.coerce.boolean().default(false),
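One footgun worth flagging with `z.coerce.boolean()`: Zod's boolean coercion is plain JavaScript `Boolean(input)`, so any non-empty string parses as `true`, including `"0"` and `"false"`. The only way to keep `unrestricted` off is to leave `CIAL_UNRESTRICTED` unset (or empty), which is consistent with the flag only ever being set by `pnpm dev:tenant --unrestricted`. A stdlib-only sketch of the same coercion (the `coerceFlag` helper is illustrative, not part of the codebase):

```typescript
// Mirrors what z.coerce.boolean() does to an env var: Boolean(input).
const coerceFlag = (v: string | undefined): boolean => Boolean(v);

coerceFlag(undefined); // false — var unset, restricted mode
coerceFlag("");        // false — empty string also stays off
coerceFlag("1");       // true  — how dev:tenant enables unrestricted mode
coerceFlag("false");   // true  — footgun: any non-empty string coerces true
```

So "never enable in production" operationally means "never *set* the variable in production", not "set it to false".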


@@ -7,8 +7,9 @@
  * Default scope is `platform` (mirrors the historical /deploy contract).
  * With `CIAL_UNRESTRICTED=1`, scope `all` also works, which lets the agent
  * build + restart core, sdk, protocol, and edge from inside its own
- * container. `cial-app` is never reachable from here (it lives outside
- * the container and isn't part of any pnpm build target we expose).
+ * container. The App control plane (`/cial/app/*`) is never reachable
+ * from here; it lives outside the tenant container and isn't part of
+ * any pnpm build target we expose.
  */
 import type { Router } from 'express';
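The scope contract in that comment reduces to a one-line guard. A minimal sketch under assumed names (`BuildScope`, `scopeAllowed` are invented; the real check lives somewhere in the self-edit routes):

```typescript
// Hypothetical distillation of the scope rule described above:
// `platform` always builds; `all` requires the unrestricted dev flag.
type BuildScope = "platform" | "all";

export function scopeAllowed(scope: BuildScope, unrestricted: boolean): boolean {
  return scope === "platform" || unrestricted;
}

// e.g. a route handler would 403 when the guard returns false:
// if (!scopeAllowed(scope, config.unrestricted)) res.status(403).end();
```

App never appears in `BuildScope` at all, which is the cleanest form of "not reachable": there is no value the agent could send to target it.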


@@ -57,7 +57,14 @@ export class ClaudeProvider implements HarnessProvider {
   constructor(opts: ClaudeProviderOpts = {}) {
     this.binary = opts.binaryPath ?? 'claude';
-    this.home = opts.homeDir ?? process.env.CLAUDE_HOME ?? process.env.HOME ?? '/root';
+    // In tenant containers HOME is `/cial/data/home` (set by the supervisor);
+    // for ad-hoc local runs without a CLAUDE_HOME we still want a sensible
+    // fallback that points at the per-tenant volume layout.
+    this.home =
+      opts.homeDir ??
+      process.env.CLAUDE_HOME ??
+      process.env.HOME ??
+      '/cial/data/home';
   }
 
   homeDir(): string {
@@ -125,7 +132,11 @@ export class ClaudeProvider implements HarnessProvider {
       command: this.binary,
       args,
       env,
-      cwd: opts.cwd || this.home,
+      // Default cwd is the monorepo root `/cial`. The agent runs IN the
+      // workspace it edits; skill discovery walks up from cwd looking for
+      // `.claude/skills/`, so this is what makes `/cial:self-edit` etc.
+      // visible without per-session bookkeeping.
+      cwd: opts.cwd || '/cial',
     };
   }

Some files were not shown because too many files have changed in this diff.