Audit pass over docs/ + adjacent code following the cial-* → core/platform/app
layout consolidation.
Bug fix:
- core/back BuildRunner ALL_FILTERS referenced @cial/core-back and
@cial/core-front, which no longer exist (the packages are @cial/back +
@cial/front). Self-edit deploys with scope=all would have silently
skipped those packages. Filters corrected.
Docs aligned with reality:
- docs/README.md — promotes file-structure.md to the start-here entry.
- architecture/dev-tenant.md — full rewrite: paths now /cial/* throughout,
documents the read-only :ro overlay of /cial/core, the new
--config.confirm-modules-purge=false install flag, the symlink dance for
project skills, and the agent's cwd=/cial + HOME=/cial/data/home setup.
- architecture/deploy-pipeline.md — package-name fix for ALL_FILTERS.
- architecture/core-vs-platform.md — package-name fix for the build list.
- ops/supervisor.md — drops stale "added in Phase 7" annotation.
- ops/deploy-logs.md — example log line uses @cial/back.
- self-edit/recipes.md — protocol path and dependency chain naming.
- design/self-edit-unrestricted.md — banner clarifying it's the original
design record (pre-rename) so an agent doesn't follow stale paths from it.
Tiny code touch:
- core/edge/src/supervisor.dev.ts — comment on CIAL_MONOREPO_ROOT no longer
contradicts itself ("not /cial" → "the bind-mounted repo at /cial").
Build verified: turbo run build for @cial/back still passes (cache miss
re-executed cleanly with the updated runner.ts).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Self-Edit & --unrestricted Mode — Design
Status: implemented + superseded. This is the original design record from before the layout consolidation. References to cial-core/, cial-platform/, cial-app/, scripts/dev-tenant.mjs, @cial/core-back, @cial/core-front describe the pre-rename layout. For current paths and package names, see:
- docs/file-structure.md — current container layout
- docs/architecture/core-vs-platform.md — current edit boundaries
- docs/architecture/dev-tenant.md — current dev launcher behavior
- docs/self-edit/{api,recipes,unrestricted-mode}.md — current API contract
Kept as a design record. Do not use as a path/name reference.
Goal
Let the agent inside a pnpm dev:tenant container build and restart Cial itself, without any host-side intervention. By default the agent can only touch cial-platform. With --unrestricted it can touch everything that runs in the container (cial-core, cial-platform, protocol, sdk, edge) — but never cial-app (the multi-tenant orchestrator, which lives outside the container anyway).
Locked decisions
| Knob | Value |
|---|---|
| Flag | --unrestricted on pnpm dev:tenant |
| Env var | CIAL_UNRESTRICTED=1 (single source of truth, read by core-back + supervisor) |
| Build endpoint | POST /api/v1/self/deploy (mounted on core-back, exposed via edge proxy) |
| Restart endpoint | POST /api/v1/self/restart |
| Default scope | platform-front + platform-back only (current behavior, preserved) |
| Unrestricted scope | platform + core-front + core-back + core-ui + protocol + sdk + edge |
| Edge restart | process.exit(0) on supervisor → Docker restart-policy bounces container |
| Auth | Localhost-only (Express trust-proxy off + req.socket.remoteAddress check) + optional ?sessionId= correlation |
| Docs | docs/ at repo root, copied into container at /cial/docs |
| Skills | .claude/skills/cial:self-edit + cial:build + cial:restart at repo root |
Trust-boundary inventory (current state)
The agent-vs-platform boundary is enforced today in 3 spots only (no FS-level ACL in dev):
1. core/back/src/modules/deploy/runner.ts:160-168 — hardcoded build filter --filter @cial/platform-front --filter @cial/platform-back.
2. core/back/src/modules/deploy/service.ts:129,209 — hardcoded supervisor.restart('platform').
3. core/edge/src/supervisor-ipc.ts:22-27 — RESTARTABLE_SERVICES = Set(['platform-front', 'platform-back']) — and the prod supervisor enforces it at supervisor.ts:239.
In dev mode there is also a 4th pseudo-boundary: supervisor.dev.ts has no IPC server at all, so POST /deploy/restart silently fails today (caught at service.ts:140-143). The dev container relies on tsx watch + next dev --turbopack for hot-reload of platform/core; only protocol, sdk, edge need an explicit rebuild, and only edge needs an explicit restart (which kills the supervisor).
(1) Module-by-module file change list
scripts/dev-tenant.mjs
- Parse --unrestricted flag alongside existing --reset.
- When set: dockerArgs.push('-e', 'CIAL_UNRESTRICTED=1') and print a banner so it's obvious from the host log.
- No other changes — bind-mount layout stays identical.
core/back/src/config/index.ts
- Add unrestricted: z.coerce.boolean().default(false) to ConfigSchema.
- Wire env in loadConfig: unrestricted: process.env.CIAL_UNRESTRICTED === '1' || process.env.CIAL_UNRESTRICTED === 'true'.
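The explicit string comparison in loadConfig matters because z.coerce.boolean() applies JavaScript's Boolean() to its input, so any non-empty string (including '0' or 'false') would coerce to true if the raw env value reached the schema. A minimal sketch of the env check follows; parseUnrestricted is a hypothetical helper name used for illustration, not something from the codebase.

```typescript
// Hypothetical helper (name is an assumption): maps the raw env value to the
// boolean that would be handed to ConfigSchema.
function parseUnrestricted(raw: string | undefined): boolean {
  // Only the two spellings named in the design count as "on".
  return raw === '1' || raw === 'true';
}

// Contrast with plain Boolean coercion, which is what a coercing schema does:
// every non-empty string is truthy, so '0' would switch the flag on.
function naiveCoerce(raw: string | undefined): boolean {
  return Boolean(raw);
}
```

With this shape, CIAL_UNRESTRICTED=0 and an unset variable both leave the flag off, whereas the naive coercion would treat '0' as enabled.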
core/back/src/modules/deploy/runner.ts
- Add unrestricted: boolean to BuildRunnerOpts.
- In startBuild, swap the hardcoded filter for a computed args array:
  - unrestricted = false → current platform-only filter.
  - unrestricted = true → see scope table in section (3).
- No other lifecycle changes.
core/back/src/modules/deploy/service.ts
- Add unrestricted: boolean to DeployServiceOpts.
- In restartOnly and the done handler, replace supervisor.restart('platform') with supervisor.restart('all' | 'platform') based on the flag. Update broadcast targets accordingly.
- New public method restartAll() used by the new self/restart endpoint when unrestricted=true.
core/back/src/modules/deploy/supervisor-client.ts
- Widen the restart() signature: service: 'platform-front' | 'platform-back' | 'platform' | 'core-front' | 'core-back' | 'all'.
- 'all' expands on the supervisor side; client just forwards.
- Update RestartResult.service union accordingly.
core/edge/src/supervisor-ipc.ts
- Export getRestartableServices(unrestricted: boolean): ReadonlySet<string> instead of (or in addition to) the static set.
- Widen RestartTarget union to include core-front, core-back, edge.
- Update parseIpcLine to accept the wider service names; rejection happens later against the runtime set.
- Update expandRestartTargets so 'all' returns every restartable name in the current set.
core/edge/src/supervisor.ts (prod supervisor)
- Read CIAL_UNRESTRICTED at startup once.
- Build the runtime restartable set via getRestartableServices(unrestricted).
- For edge, the restart handler does NOT spawn a replacement — it acks, then process.exit(0) so Docker's restart policy recreates the container.
- Status reply unchanged.
core/edge/src/supervisor.dev.ts (dev supervisor)
- Add the same Unix-socket IPC server as prod — currently missing.
- Restart semantics in dev:
  - platform-*/core-*: kill the child; supervisor respawns. Note: tsx watch and next dev already hot-reload, so a forced restart is rarely needed but still works.
  - edge: process.exit(0).
- Print a clear banner at boot when CIAL_UNRESTRICTED=1.
core/back/src/modules/self/ (NEW module)
- New module mirroring deploy/ but with a tighter, agent-facing surface.
- Files:
  - index.ts — createSelfModule({ deployService, config, logger }) returns a Router.
  - router.ts — two routes (see section 2).
  - localhost-only.ts — middleware that 403s anything not from 127.0.0.1 / ::1 / Unix socket.
- Mounted in app.ts at /api/v1/self. Edge proxy already forwards /api/v1/* to core-back unchanged.
core/docker/Dockerfile.dev
- Add COPY docs /cial/docs (read at build time so docs are inside the image even if the bind mount is missing).
- Add COPY .claude /cial/.claude for the skills.
- (Both also visible via the bind mount on pnpm dev:tenant; the COPY is a fallback so the image is self-contained.)
core/docker/Dockerfile (prod)
- Same COPY docs + COPY .claude so docs ship in prod images too.
docs/ (NEW)
- See section (4).
.claude/skills/cial:{self-edit,build,restart}/SKILL.md (NEW)
- See section (5).
(2) Endpoint contracts
Both endpoints live on core-back at /api/v1/self/* and are reachable from inside the container at http://127.0.0.1:4000 directly, or via edge at http://127.0.0.1:8080/api/v1/self/*.
Auth & rate limiting
- Localhost-only middleware (network namespace = container = trusted). Requests from anywhere else → 403 not_localhost.
- Optional ?sessionId=<uuid> query param is correlated into the deploy row for traceability (mirrors old-Cial behavior). Not used for auth.
- Single-flight: BuildRunner already enforces 1 build at a time + 1 queued slot. Restart is awaited, no overlap possible.
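As a rough illustration of the localhost gate, the sketch below checks the socket's remote address against the loopback forms. It uses minimal structural types instead of importing Express so it stays self-contained. The IPv4-mapped form '::ffff:127.0.0.1', the choice to treat an undefined address as a Unix-socket peer, and the JSON error body shape are all assumptions, not details confirmed by this design.

```typescript
// Minimal structural stand-ins for the Express request/response types.
interface ReqLike { socket: { remoteAddress?: string } }
interface ResLike { status(code: number): ResLike; json(body: unknown): unknown }

// Loopback spellings accepted by the gate. The IPv4-mapped IPv6 form is an
// assumption: dual-stack listeners commonly report 127.0.0.1 this way.
const LOOPBACK: ReadonlySet<string> = new Set(['127.0.0.1', '::1', '::ffff:127.0.0.1']);

function isLoopback(addr: string | undefined): boolean {
  // Assumption: an undefined address means a Unix-socket peer, which the
  // design treats as trusted. A real implementation should confirm this.
  if (addr === undefined) return true;
  return LOOPBACK.has(addr);
}

// Middleware-shaped function: pass loopback callers through, 403 the rest.
function localhostOnly(req: ReqLike, res: ResLike, next: () => void): void {
  if (isLoopback(req.socket.remoteAddress)) {
    next();
    return;
  }
  // Error body shape is illustrative; only the 403 + not_localhost code
  // comes from the design.
  res.status(403).json({ ok: false, error: 'not_localhost' });
}
```

Reading req.socket.remoteAddress directly (rather than req.ip) is what makes the "trust-proxy off" decision matter: a forwarded header cannot make a remote caller look local.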
POST /api/v1/self/deploy
Request body (all optional):
{
"mode": "fast" | "stable", // default: current persisted mode
"sessionId": "uuid", // for log correlation
"scope": "auto" | "platform" | "all" // default: "auto"
}
scope=auto → all if CIAL_UNRESTRICTED=1, else platform. Explicit scope=all when CIAL_UNRESTRICTED=0 returns 403 unrestricted_required.
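The scope rule can be expressed as a small pure function. This is a sketch under the stated rules only; resolveScope and the ScopeResult shape are hypothetical names introduced for illustration.

```typescript
type RequestedScope = 'auto' | 'platform' | 'all';
type ResolvedScope = 'platform' | 'all';

// Hypothetical result type: either a resolved scope or the 403 described above.
type ScopeResult =
  | { ok: true; scope: ResolvedScope }
  | { ok: false; status: 403; error: 'unrestricted_required' };

function resolveScope(requested: RequestedScope | undefined, unrestricted: boolean): ScopeResult {
  const scope = requested ?? 'auto'; // "auto" is the documented default
  if (scope === 'auto') {
    return { ok: true, scope: unrestricted ? 'all' : 'platform' };
  }
  if (scope === 'all' && !unrestricted) {
    return { ok: false, status: 403, error: 'unrestricted_required' };
  }
  // Explicit 'platform' is always allowed; explicit 'all' only when unrestricted.
  return { ok: true, scope };
}
```

Keeping this logic in one function lets /deploy and /restart share it, which matches the note that both endpoints follow the same scope=auto rules.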
Success (202 Accepted):
{
"ok": true,
"deployId": "uuid",
"status": "queued" | "building",
"scope": "platform" | "all",
"logPath": "/cial/data/deploy-logs/<uuid>.log",
"buildFilter": ["@cial/platform-front", "@cial/platform-back"]
}
Errors:
- 403 not_localhost — request not from loopback.
- 403 unrestricted_required — scope=all but flag off.
- 409 build_in_progress — only when caller passes ?wait=false; otherwise the runner's coalesce/queue applies.
Streaming logs continue to flow over the existing WS bridge (DeployWsType.DeployLog). The skill polls for completion (see SKILL.md outline).
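The polling side might look like the sketch below. The status fetcher is injected so the loop carries no HTTP details; in practice it would wrap a GET on the deploy status route named in the cial:build skill outline. waitForDeploy, its option names, and the attempt cap are assumptions for illustration; the status values and the 2s interval are taken from this document.

```typescript
type DeployStatus = 'queued' | 'building' | 'ok' | 'error' | 'cancelled';

// Statuses after which polling stops.
const TERMINAL: ReadonlySet<DeployStatus> = new Set<DeployStatus>(['ok', 'error', 'cancelled']);

function isTerminal(status: DeployStatus): boolean {
  return TERMINAL.has(status);
}

// Hypothetical polling helper. fetchStatus is supplied by the caller.
async function waitForDeploy(
  fetchStatus: () => Promise<DeployStatus>,
  opts: { intervalMs?: number; maxAttempts?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<DeployStatus> {
  const intervalMs = opts.intervalMs ?? 2000; // the skill outline polls every 2s
  const maxAttempts = opts.maxAttempts ?? 300; // assumed cap so the loop cannot spin forever
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (isTerminal(status)) return status;
    await sleep(intervalMs);
  }
  throw new Error(`deploy did not finish after ${maxAttempts} polls`);
}
```

Injecting sleep keeps the loop testable without real timers, and the attempt cap gives the caller a clear failure instead of an indefinite wait.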
POST /api/v1/self/restart
Request body (all optional):
{
"sessionId": "uuid",
"scope": "auto" | "platform" | "all"
}
Same scope=auto rules as /deploy.
Success (200):
{
"ok": true,
"scope": "platform" | "all",
"restarted": [
{ "service": "platform-front", "pid": 1234, "durationMs": 412 },
{ "service": "platform-back", "pid": 1235, "durationMs": 388 }
],
"edgeRestart": false // true ⇒ container is bouncing, response sent right before exit(0)
}
When scope=all and unrestricted, the response is sent before the edge restart fires; the client should expect the connection to drop ~100ms after receiving the 200.
Errors:
- 403 not_localhost.
- 403 unrestricted_required.
- 503 supervisor_unreachable — IPC socket missing or timed out.
(3) Widened build filter & restart sets
Build filter
function buildArgs(unrestricted: boolean): string[] {
if (!unrestricted) {
return ['--filter', '@cial/platform-front', '--filter', '@cial/platform-back', 'build'];
}
return [
'--filter', '@cial/protocol',
'--filter', '@cial/sdk',
'--filter', '@cial/core-ui',
'--filter', '@cial/core-back',
'--filter', '@cial/core-front',
'--filter', '@cial/edge',
'--filter', '@cial/platform-back',
'--filter', '@cial/platform-front',
'build',
];
}
Excluded on purpose: @cial/app (out-of-container), tests, eslint, scripts. pnpm's filter graph will pull in any local workspace deps automatically; the explicit list documents intent.
Restart targets
const PLATFORM_RESTARTABLES = new Set(['platform-front', 'platform-back']);
const ALL_RESTARTABLES = new Set([
'platform-front', 'platform-back',
'core-front', 'core-back',
'edge', // edge does process.exit(0); container restart-policy bounces
]);
function getRestartableServices(unrestricted: boolean) {
return unrestricted ? ALL_RESTARTABLES : PLATFORM_RESTARTABLES;
}
Expansion shorthand:
- 'platform' → ['platform-front', 'platform-back']
- 'core' → ['core-front', 'core-back'] (only valid when unrestricted)
- 'all' → entire set (only valid when unrestricted)
edge is intentionally addressable as a single name only; 'all' includes it last so platform/core restarts stream their restart.done events first.
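One way the expansion could be written, with edge placed last for 'all' as described. This is a sketch, not the actual expandRestartTargets in supervisor-ipc.ts: the signature and the use of null to signal a rejected target are assumptions.

```typescript
const PLATFORM: readonly string[] = ['platform-front', 'platform-back'];
const CORE: readonly string[] = ['core-front', 'core-back'];
const EDGE = 'edge';

// Returns the concrete services to restart, in order, or null when the
// target is not allowed in the current mode (null is an assumed convention).
function expandRestartTargets(target: string, unrestricted: boolean): string[] | null {
  switch (target) {
    case 'platform':
      return [...PLATFORM];
    case 'core':
      return unrestricted ? [...CORE] : null;
    case 'all':
      // edge goes last so platform/core restart events are emitted first.
      return unrestricted ? [...PLATFORM, ...CORE, EDGE] : null;
    default: {
      // Single service names are checked against the mode's restartable set.
      const allowed: readonly string[] = unrestricted ? [...PLATFORM, ...CORE, EDGE] : PLATFORM;
      return allowed.includes(target) ? [target] : null;
    }
  }
}
```

For example, expandRestartTargets('all', true) yields the platform pair, then the core pair, then 'edge', while expandRestartTargets('core', false) is rejected.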
(4) docs/ outline
docs/
├── README.md # index + when to read what
├── architecture/
│ ├── overview.md # 3-layer model, container shape, ports
│ ├── core-vs-platform.md # trust boundary explained
│ ├── deploy-pipeline.md # BuildRunner → Supervisor IPC → restart, with sequence diagram
│ └── dev-tenant.md # how `pnpm dev:tenant` works end-to-end
├── self-edit/
│ ├── unrestricted-mode.md # what --unrestricted unlocks + risks
│ ├── api.md # /api/v1/self/* reference (mirrors section 2 here)
│ └── recipes.md # common edit→build→restart flows
├── ui/
│ ├── design-system.md # tailwind tokens, motion, lucide conventions
│ └── components.md # core/ui component map
├── ops/
│ ├── supervisor.md # IPC protocol, restart semantics, edge process.exit
│ └── deploy-logs.md # where logs live, how to tail
└── design/
└── self-edit-unrestricted.md # this file
docs/README.md is the entry point and links every other file. The .claude/skills/* SKILL.md files reference specific docs by relative path so the agent always has somewhere to look up details.
(5) .claude/skills/cial:* SKILL.md outlines
.claude/skills/cial:self-edit/SKILL.md
---
name: cial:self-edit
description: Master skill for editing Cial itself from inside the dev container. Loads the relevant docs/ entries before any edit so the agent understands the architecture, trust boundary, and build/restart loop. Use this BEFORE making any change to cial-core, cial-platform, or supervisor/edge code.
---
# cial:self-edit
You are about to edit the Cial codebase from inside its own dev tenant. Before touching any file, ground yourself in the architecture by reading the docs.
## Mandatory pre-read (always)
1. `docs/README.md` — index of everything.
2. `docs/architecture/overview.md` — 3-layer model + container shape.
3. `docs/architecture/core-vs-platform.md` — what's editable in which mode.
4. `docs/self-edit/unrestricted-mode.md` — what `--unrestricted` unlocks.
## Conditional pre-read (read if relevant to the task)
- Editing UI / chat / sidebar → `docs/ui/design-system.md` + `docs/ui/components.md`.
- Editing deploy / build / supervisor → `docs/architecture/deploy-pipeline.md` + `docs/ops/supervisor.md`.
- Editing dev-tenant launcher / Dockerfile / entrypoint → `docs/architecture/dev-tenant.md`.
- Adding a new self-edit endpoint or skill → `docs/self-edit/api.md` + `docs/self-edit/recipes.md`.
## Workflow
1. **Understand first.** Read the docs above + the actual files you'll touch (use Read, never guess at paths).
2. **Check scope.** Confirm `CIAL_UNRESTRICTED` is set if you need to touch `cial-core/*`, `edge/*`, `protocol/*`, `sdk/*`. If not, stop and ask Eliot to relaunch with `pnpm dev:tenant --unrestricted`.
3. **Edit.** Use Edit/Write. Keep changes minimal and on-task; do not refactor adjacent code.
4. **Build.** Invoke `cial:build`. If it fails, fix the errors and re-run until ok.
5. **Restart.** Invoke `cial:restart`. If `edgeRestart: true`, wait ~10s then health-check.
6. **Verify.** Hit the affected route or open the affected page and confirm the change is live.
## Hard rules
- Never edit anything under `cial-app/*` — that lives outside the container and you can't deploy it from here.
- Never bypass the build step. `cial:restart` without a successful prior `cial:build` will just bounce stale code.
- Never `process.exit` or kill the supervisor manually — go through `/api/v1/self/restart`.
- Never write secrets to source files. All secrets live in the Tool Vault.
## Reference
- Endpoints: `docs/self-edit/api.md`
- Common flows: `docs/self-edit/recipes.md`
- Why these constraints exist: `docs/architecture/core-vs-platform.md`
.claude/skills/cial:build/SKILL.md
---
name: cial:build
description: Build Cial inside the dev container — types, bundles, dist. Use after editing platform code, or core code if --unrestricted is on.
---
# cial:build
Trigger a build via the self-edit endpoint and wait for completion.
## When to use
- After any edit to TypeScript/TSX in cial-platform.
- After any edit in cial-core/* (only with --unrestricted; otherwise this will 403).
- Before invoking `cial:restart` on freshly-edited code.
## Steps
1. POST http://127.0.0.1:8080/api/v1/self/deploy with `{ "scope": "auto", "sessionId": "$CIAL_SESSION_ID" }`.
2. Capture the returned `deployId` and `logPath`.
3. Poll GET /api/v1/deploy/{deployId} every 2s until `status` ∈ {`ok`, `error`, `cancelled`}.
4. On error, read the last ~50 lines of `logPath` and surface the failure summary.
## Auth
None — endpoint is localhost-gated. Just curl it.
## See also
- docs/self-edit/api.md
- docs/architecture/deploy-pipeline.md
.claude/skills/cial:restart/SKILL.md
---
name: cial:restart
description: Restart Cial services inside the dev container. Default = platform only. With --unrestricted, can restart core + edge (edge restart bounces the whole container).
---
# cial:restart
Bounce one or more services via the self-edit endpoint.
## When to use
- After `cial:build` returned ok and you need the new code live.
- When a process is stuck and needs a kick.
## Steps
1. POST http://127.0.0.1:8080/api/v1/self/restart with `{ "scope": "auto", "sessionId": "$CIAL_SESSION_ID" }`.
2. Read the response to confirm `restarted[]` includes the services you expected.
3. If `edgeRestart: true`, expect the connection to drop — wait ~10s, then health-check `GET /healthz`.
## Auth
None — localhost-gated.
## Notes
- Without --unrestricted, restarting core or edge returns 403 `unrestricted_required`.
- After core-back restart, any open WS connections (including the agent's own deploy log stream) will be dropped and need to reconnect.
## See also
- docs/self-edit/api.md
- docs/ops/supervisor.md
Open question for implementation phase
Edge restart timing. When scope=all and unrestricted, the response sequence is:
- Restart platform (acks + dones via WS).
- Restart core (acks + dones via WS).
- Send 200 response with edgeRestart: true.
- ~50ms setTimeout → supervisor process.exit(0).
Risk: if the container takes >5s to come back, the agent thinks the restart hung. Mitigation: SKILL.md tells the agent to expect connection drop + retry /healthz. This is how the old Cial does it too.
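The ordering above can be sketched with every side effect injected, which makes the sequence easy to verify without a real supervisor. All names here (RestartDeps, restartAllThenExit) are hypothetical, the response body is trimmed to the fields relevant to ordering, and the 50ms default mirrors the delay listed in the sequence.

```typescript
// Injected side effects; in a real service these would wrap the supervisor
// IPC client, the HTTP response, process.exit, and setTimeout.
interface RestartDeps {
  restart(group: 'platform' | 'core'): Promise<void>;
  respond(body: { ok: true; scope: 'all'; edgeRestart: true }): void;
  exit(code: number): void;
  schedule(fn: () => void, delayMs: number): void;
}

async function restartAllThenExit(deps: RestartDeps, exitDelayMs = 50): Promise<void> {
  await deps.restart('platform'); // 1. platform first
  await deps.restart('core');     // 2. then core
  // 3. answer the caller while the process is still alive
  deps.respond({ ok: true, scope: 'all', edgeRestart: true });
  // 4. exit shortly after, leaving time for the response to flush
  deps.schedule(() => deps.exit(0), exitDelayMs);
}
```

Deferring the exit through schedule rather than calling it inline is the point of the sketch: the caller receives the 200 first and then sees the connection drop.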
Implementation order
- Plumbing-only: add config flag + dev-tenant.mjs flag + Dockerfile copies. No behavior change yet.
- Widen supervisor-ipc.ts + supervisor.ts + supervisor-client.ts (still rejected at runtime if flag off).
- Add IPC server to supervisor.dev.ts.
- Add self module + endpoints, behind localhost guard.
- Add docs/ and .claude/skills/cial:*.
- End-to-end smoke: pnpm dev:tenant --unrestricted → from agent shell, edit a core file → cial:build → cial:restart → see change.