cial/docs/design/self-edit-unrestricted.md
Eliot M a0c9973412 docs+fix: refresh /docs to match current container layout, fix BuildRunner filters
Audit pass over docs/ + adjacent code following the cial-* → core/platform/app
layout consolidation.

Bug fix:
- core/back BuildRunner ALL_FILTERS referenced @cial/core-back and
  @cial/core-front, which no longer exist (the packages are @cial/back +
  @cial/front). Self-edit deploys with scope=all would have silently
  skipped those packages. Filters corrected.

Docs aligned with reality:
- docs/README.md       — promotes file-structure.md to the start-here entry.
- architecture/dev-tenant.md  — full rewrite: paths now /cial/* throughout,
  documents the read-only :ro overlay of /cial/core, the new
  --config.confirm-modules-purge=false install flag, the symlink dance for
  project skills, and the agent's cwd=/cial + HOME=/cial/data/home setup.
- architecture/deploy-pipeline.md  — package-name fix for ALL_FILTERS.
- architecture/core-vs-platform.md — package-name fix for the build list.
- ops/supervisor.md     — drops stale "added in Phase 7" annotation.
- ops/deploy-logs.md    — example log line uses @cial/back.
- self-edit/recipes.md  — protocol path and dependency chain naming.
- design/self-edit-unrestricted.md — banner clarifying it's the original
  design record (pre-rename) so an agent doesn't follow stale paths from it.

Tiny code touch:
- core/edge/src/supervisor.dev.ts — comment on CIAL_MONOREPO_ROOT no longer
  contradicts itself ("not /cial" → "the bind-mounted repo at /cial").

Build verified: turbo run build for @cial/back still passes (cache miss
re-executed cleanly with the updated runner.ts).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 13:17:19 +00:00

Self-Edit & --unrestricted Mode — Design

Status: implemented + superseded. This is the original design record from before the layout consolidation. References to cial-core/, cial-platform/, cial-app/, scripts/dev-tenant.mjs, @cial/core-back, @cial/core-front describe the pre-rename layout. For current paths and package names, see:

  • docs/file-structure.md — current container layout
  • docs/architecture/core-vs-platform.md — current edit boundaries
  • docs/architecture/dev-tenant.md — current dev launcher behavior
  • docs/self-edit/{api,recipes,unrestricted-mode}.md — current API contract

Kept as a design record. Do not use as a path/name reference.

Goal

Let the agent inside a pnpm dev:tenant container build and restart Cial itself, without any host-side intervention. By default the agent can only touch cial-platform. With --unrestricted it can touch everything that runs in the container (cial-core, cial-platform, protocol, sdk, edge) — but never cial-app (the multi-tenant orchestrator, which lives outside the container anyway).

Locked decisions

| Knob | Value |
|---|---|
| Flag | --unrestricted on pnpm dev:tenant |
| Env var | CIAL_UNRESTRICTED=1 (single source of truth, read by core-back + supervisor) |
| Build endpoint | POST /api/v1/self/deploy (mounted on core-back, exposed via edge proxy) |
| Restart endpoint | POST /api/v1/self/restart |
| Default scope | platform-front + platform-back only (current behavior, preserved) |
| Unrestricted scope | platform + core-front + core-back + core-ui + protocol + sdk + edge |
| Edge restart | process.exit(0) on supervisor → Docker restart-policy bounces container |
| Auth | Localhost-only (Express trust-proxy off + req.socket.remoteAddress check) + optional ?sessionId= correlation |
| Docs | docs/ at repo root, copied into container at /cial/docs |
| Skills | .claude/skills/cial:self-edit + cial:build + cial:restart at repo root |

Trust-boundary inventory (current state)

The agent-vs-platform boundary is enforced today in 3 spots only (no FS-level ACL in dev):

  1. core/back/src/modules/deploy/runner.ts:160-168 — hardcoded build filter --filter @cial/platform-front --filter @cial/platform-back.
  2. core/back/src/modules/deploy/service.ts:129,209 — hardcoded supervisor.restart('platform').
  3. core/edge/src/supervisor-ipc.ts:22-27 defines RESTARTABLE_SERVICES = Set(['platform-front', 'platform-back']), and the prod supervisor enforces it at supervisor.ts:239.

In dev mode there is also a 4th pseudo-boundary: supervisor.dev.ts has no IPC server at all, so POST /deploy/restart silently fails today (caught at service.ts:140-143). The dev container relies on tsx watch + next dev --turbopack for hot-reload of platform/core; only protocol, sdk, edge need an explicit rebuild, and only edge needs an explicit restart (which kills the supervisor).


(1) Module-by-module file change list

scripts/dev-tenant.mjs

  • Parse --unrestricted flag alongside existing --reset.
  • When set: dockerArgs.push('-e', 'CIAL_UNRESTRICTED=1') and print a banner so it's obvious from the host log.
  • No other changes — bind-mount layout stays identical.
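
The flag handling above can be sketched as two small pure functions. This is an illustration only, written in TypeScript for consistency with the rest of this record (the launcher itself is an .mjs script). `parseLauncherFlags` and `envArgsFor` are hypothetical names; only `--reset`, `--unrestricted`, and `CIAL_UNRESTRICTED=1` come from the design.

```typescript
// Hypothetical helper names; only the two flags and the env var are from the design.
interface LauncherFlags {
  reset: boolean;
  unrestricted: boolean;
}

// Parse the two boolean flags the launcher understands; other args are ignored here.
function parseLauncherFlags(argv: readonly string[]): LauncherFlags {
  return {
    reset: argv.includes('--reset'),
    unrestricted: argv.includes('--unrestricted'),
  };
}

// Extra `docker run` arguments contributed by the flags (empty when restricted).
function envArgsFor(flags: LauncherFlags): string[] {
  return flags.unrestricted ? ['-e', 'CIAL_UNRESTRICTED=1'] : [];
}
```

In the launcher, the result of `envArgsFor` would be appended to the existing `dockerArgs` array, leaving the bind-mount arguments untouched.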

core/back/src/config/index.ts

  • Add unrestricted: z.coerce.boolean().default(false) to ConfigSchema.
  • Wire env in loadConfig: unrestricted: process.env.CIAL_UNRESTRICTED === '1' || process.env.CIAL_UNRESTRICTED === 'true'.
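
The explicit string comparison in loadConfig matters because z.coerce.boolean() applies JavaScript truthiness, so a raw env string such as "0" or "false" would coerce to true if it reached the schema unnormalized. A minimal sketch of the normalization step, assuming a hypothetical helper named `readUnrestricted` that takes the environment as a parameter so it can be tested without touching process.env:

```typescript
// Hypothetical helper; the env is passed in rather than read from process.env.
type Env = Readonly<Record<string, string | undefined>>;

function readUnrestricted(env: Env): boolean {
  const raw = env['CIAL_UNRESTRICTED'];
  // Only the two spellings named in the design enable the mode; anything else is off.
  return raw === '1' || raw === 'true';
}
```

The boolean returned here is what would be handed to ConfigSchema, so the schema only ever sees a real boolean.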

core/back/src/modules/deploy/runner.ts

  • Add unrestricted: boolean to BuildRunnerOpts.
  • In startBuild, swap the hardcoded filter for a computed args array:
    • unrestricted = false → current platform-only filter.
    • unrestricted = true → see scope table in section (3).
  • No other lifecycle changes.

core/back/src/modules/deploy/service.ts

  • Add unrestricted: boolean to DeployServiceOpts.
  • In restartOnly and the done handler, replace supervisor.restart('platform') with supervisor.restart('all' | 'platform') based on the flag. Update broadcast targets accordingly.
  • New public method restartAll() used by the new self/restart endpoint when unrestricted=true.

core/back/src/modules/deploy/supervisor-client.ts

  • Widen the restart() signature: service: 'platform-front' | 'platform-back' | 'platform' | 'core-front' | 'core-back' | 'all'.
  • 'all' expands on the supervisor side; client just forwards.
  • Update RestartResult.service union accordingly.

core/edge/src/supervisor-ipc.ts

  • Export getRestartableServices(unrestricted: boolean): ReadonlySet<string> instead of (or in addition to) the static set.
  • Widen RestartTarget union to include core-front, core-back, edge.
  • Update parseIpcLine to accept the wider service names; rejection happens later against the runtime set.
  • Update expandRestartTargets so 'all' returns every restartable name in the current set.

core/edge/src/supervisor.ts (prod supervisor)

  • Read CIAL_UNRESTRICTED at startup once.
  • Build the runtime restartable set via getRestartableServices(unrestricted).
  • For edge, the restart handler does NOT spawn a replacement — it acks, then process.exit(0) so Docker's restart policy recreates the container.
  • Status reply unchanged.

core/edge/src/supervisor.dev.ts (dev supervisor)

  • Add the same Unix-socket IPC server as prod — currently missing.
  • Restart semantics in dev:
    • platform-* / core-*: kill the child; supervisor respawns. Note: tsx watch and next dev already hot-reload, so a forced restart is rarely needed but still works.
    • edge: process.exit(0).
  • Print a clear banner at boot when CIAL_UNRESTRICTED=1.

core/back/src/modules/self/ (NEW module)

  • New module mirroring deploy/ but with a tighter, agent-facing surface.
  • Files:
    • index.ts: createSelfModule({ deployService, config, logger }) returns a Router.
    • router.ts — two routes (see section 2).
    • localhost-only.ts — middleware that 403s anything not from 127.0.0.1 / ::1 / Unix socket.
  • Mounted in app.ts at /api/v1/self. Edge proxy already forwards /api/v1/* to core-back unchanged.

core/docker/Dockerfile.dev

  • Add COPY docs /cial/docs (copied at build time so docs are inside the image even if the bind mount is missing).
  • Add COPY .claude /cial/.claude for the skills.
  • (Both also visible via the bind mount on pnpm dev:tenant; the COPY is a fallback so the image is self-contained.)

core/docker/Dockerfile (prod)

  • Same COPY docs + COPY .claude so docs ship in prod images too.

docs/ (NEW)

  • See section (4).

.claude/skills/cial:{self-edit,build,restart}/SKILL.md (NEW)

  • See section (5).

(2) Endpoint contracts

Both endpoints live on core-back at /api/v1/self/* and are reachable from inside the container at http://127.0.0.1:4000 directly, or via edge at http://127.0.0.1:8080/api/v1/self/*.

Auth & rate limiting

  • Localhost-only middleware (network namespace = container = trusted). Requests from anywhere else → 403 not_localhost.
  • Optional ?sessionId=<uuid> query param is correlated into the deploy row for traceability (mirrors old-Cial behavior). Not used for auth.
  • Single-flight: BuildRunner already enforces 1 build at a time + 1 queued slot. Restart is awaited, no overlap possible.
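
The localhost gate can be sketched as a pure check on the socket's remote address, which the Express middleware would then wrap. The names `isLocalRequest` and `localhostGuard` and the `viaUnixSocket` flag are hypothetical. Accepting the IPv4-mapped form `::ffff:127.0.0.1` is an assumption added here, since Node reports that form when an IPv4 client connects to a dual-stack listener.

```typescript
// Addresses treated as loopback. The IPv4-mapped entry is an assumption, not from the design.
const LOOPBACK_ADDRESSES: ReadonlySet<string> = new Set([
  '127.0.0.1',
  '::1',
  '::ffff:127.0.0.1',
]);

// `viaUnixSocket` is a hypothetical flag set by the caller when the listener is a
// Unix socket, where there is no remote IP to inspect.
function isLocalRequest(remoteAddress: string | undefined, viaUnixSocket = false): boolean {
  if (viaUnixSocket) return true;
  return remoteAddress !== undefined && LOOPBACK_ADDRESSES.has(remoteAddress);
}

type GuardResult =
  | { allowed: true }
  | { allowed: false; status: 403; error: 'not_localhost' };

// Maps the check onto the error contract described in this section.
function localhostGuard(remoteAddress: string | undefined, viaUnixSocket = false): GuardResult {
  return isLocalRequest(remoteAddress, viaUnixSocket)
    ? { allowed: true }
    : { allowed: false, status: 403, error: 'not_localhost' };
}
```

A missing address is rejected unless the caller explicitly marks the connection as a Unix socket, so a destroyed or half-open TCP socket does not pass the gate by accident.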

POST /api/v1/self/deploy

Request body (all optional):

{
  "mode": "fast" | "stable",   // default: current persisted mode
  "sessionId": "uuid",          // for log correlation
  "scope": "auto" | "platform" | "all"  // default: "auto"
}

scope=auto → all if CIAL_UNRESTRICTED=1, else platform. Explicit scope=all when CIAL_UNRESTRICTED=0 returns 403 unrestricted_required.
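
The scope rules can be written as a single pure function. This is a sketch under the contract stated here; `resolveScope` and the result type are hypothetical names, not the module's actual API.

```typescript
type RequestedScope = 'auto' | 'platform' | 'all';
type EffectiveScope = 'platform' | 'all';

type ScopeResult =
  | { ok: true; scope: EffectiveScope }
  | { ok: false; status: 403; error: 'unrestricted_required' };

// Hypothetical helper: resolves the request's scope against the container-wide flag.
function resolveScope(requested: RequestedScope | undefined, unrestricted: boolean): ScopeResult {
  const scope = requested ?? 'auto'; // the body field is optional and defaults to "auto"
  if (scope === 'auto') {
    return { ok: true, scope: unrestricted ? 'all' : 'platform' };
  }
  if (scope === 'all' && !unrestricted) {
    return { ok: false, status: 403, error: 'unrestricted_required' };
  }
  return { ok: true, scope };
}
```

An explicit scope=platform is always accepted, even when the flag is on, which lets an unrestricted agent still request the narrower build.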

Success (202 Accepted):

{
  "ok": true,
  "deployId": "uuid",
  "status": "queued" | "building",
  "scope": "platform" | "all",
  "logPath": "/cial/data/deploy-logs/<uuid>.log",
  "buildFilter": ["@cial/platform-front", "@cial/platform-back"]
}

Errors:

  • 403 not_localhost — request not from loopback.
  • 403 unrestricted_required when scope=all but the flag is off.
  • 409 build_in_progress — only when caller passes ?wait=false; otherwise the runner's coalesce/queue applies.

Streaming logs continue to flow over the existing WS bridge (DeployWsType.DeployLog). The skill polls for completion (see SKILL.md outline).
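
A minimal sketch of the polling loop the skill would run after receiving a deployId. The status fetcher and sleep function are injected, so the loop does not assume any particular HTTP client. `waitForDeploy`, `PollDeps`, and the `maxPolls` cap are assumptions for illustration. The terminal statuses (ok, error, cancelled) and the 2s interval are taken from the cial:build outline later in this record.

```typescript
type DeployStatus = 'queued' | 'building' | 'ok' | 'error' | 'cancelled';

const TERMINAL: ReadonlySet<DeployStatus> = new Set<DeployStatus>(['ok', 'error', 'cancelled']);

interface PollDeps {
  // Hypothetical: returns the current status, e.g. from GET /api/v1/deploy/{deployId}.
  getStatus: (deployId: string) => Promise<DeployStatus>;
  sleep: (ms: number) => Promise<void>;
}

async function waitForDeploy(
  deployId: string,
  deps: PollDeps,
  intervalMs = 2000,
  maxPolls = 300, // assumed cap so a stuck build cannot poll forever
): Promise<DeployStatus> {
  for (let i = 0; i < maxPolls; i++) {
    const status = await deps.getStatus(deployId);
    if (TERMINAL.has(status)) return status;
    await deps.sleep(intervalMs);
  }
  throw new Error(`deploy ${deployId} did not finish after ${maxPolls} polls`);
}
```

On a returned status of error, the caller would then read the tail of logPath to surface the failure.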

POST /api/v1/self/restart

Request body (all optional):

{
  "sessionId": "uuid",
  "scope": "auto" | "platform" | "all"
}

Same scope=auto rules as /deploy.

Success (200):

{
  "ok": true,
  "scope": "platform" | "all",
  "restarted": [
    { "service": "platform-front", "pid": 1234, "durationMs": 412 },
    { "service": "platform-back",  "pid": 1235, "durationMs": 388 }
  ],
  "edgeRestart": false              // true ⇒ container is bouncing, response sent right before exit(0)
}

When scope=all and unrestricted, the response is sent before the edge restart fires; the client should expect the connection to drop ~100ms after receiving the 200.

Errors:

  • 403 not_localhost.
  • 403 unrestricted_required.
  • 503 supervisor_unreachable — IPC socket missing or timed out.

(3) Widened build filter & restart sets

Build filter

function buildArgs(unrestricted: boolean): string[] {
  if (!unrestricted) {
    return ['--filter', '@cial/platform-front', '--filter', '@cial/platform-back', 'build'];
  }
  return [
    '--filter', '@cial/protocol',
    '--filter', '@cial/sdk',
    '--filter', '@cial/core-ui',
    '--filter', '@cial/core-back',
    '--filter', '@cial/core-front',
    '--filter', '@cial/edge',
    '--filter', '@cial/platform-back',
    '--filter', '@cial/platform-front',
    'build',
  ];
}

Excluded on purpose: @cial/app (out-of-container), tests, eslint, scripts. pnpm's filter graph will pull in any local workspace deps automatically; the explicit list documents intent.

Restart targets

const PLATFORM_RESTARTABLES = new Set(['platform-front', 'platform-back']);
const ALL_RESTARTABLES     = new Set([
  'platform-front', 'platform-back',
  'core-front', 'core-back',
  'edge',                  // edge does process.exit(0); container restart-policy bounces
]);

function getRestartableServices(unrestricted: boolean): ReadonlySet<string> {
  return unrestricted ? ALL_RESTARTABLES : PLATFORM_RESTARTABLES;
}

Expansion shorthand:

  • 'platform' → ['platform-front', 'platform-back']
  • 'core' → ['core-front', 'core-back'] (only valid when unrestricted)
  • 'all' → entire set (only valid when unrestricted)

edge is intentionally addressable as a single name only; 'all' includes it last so platform/core restarts stream their restart.done events first.
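
The expansion rules can be sketched as follows. The ordering constant and the error string are hypothetical. The sketch assumes that 'all' returns every name in the current runtime set (as described for expandRestartTargets in section 1), with the 403 for scope=all under a disabled flag handled earlier at the endpoint. Groups and single names outside the runtime set are rejected.

```typescript
type Shorthand = 'platform' | 'core' | 'all';

// Fixed order with edge last, so platform/core restart events are emitted
// before the container bounces.
const RESTART_ORDER = ['platform-front', 'platform-back', 'core-front', 'core-back', 'edge'] as const;
type ServiceName = (typeof RESTART_ORDER)[number];

const GROUPS: Record<Exclude<Shorthand, 'all'>, readonly ServiceName[]> = {
  platform: ['platform-front', 'platform-back'],
  core: ['core-front', 'core-back'],
};

// `allowed` is the runtime set, e.g. the result of getRestartableServices(unrestricted).
function expandRestartTargets(
  target: Shorthand | ServiceName,
  allowed: ReadonlySet<string>,
): ServiceName[] {
  if (target === 'all') {
    // Every restartable name in the current set, in the fixed order above.
    return RESTART_ORDER.filter((name) => allowed.has(name));
  }
  const names: readonly ServiceName[] =
    target === 'platform' || target === 'core' ? GROUPS[target] : [target];
  const denied = names.filter((name) => !allowed.has(name));
  if (denied.length > 0) {
    throw new Error(`unrestricted_required: ${denied.join(', ')}`);
  }
  return [...names];
}
```

Because 'all' is derived from one ordered list, edge stays last regardless of how the runtime set was constructed.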


(4) docs/ outline

docs/
├── README.md                    # index + when to read what
├── architecture/
│   ├── overview.md              # 3-layer model, container shape, ports
│   ├── core-vs-platform.md      # trust boundary explained
│   ├── deploy-pipeline.md       # BuildRunner → Supervisor IPC → restart, with sequence diagram
│   └── dev-tenant.md            # how `pnpm dev:tenant` works end-to-end
├── self-edit/
│   ├── unrestricted-mode.md     # what --unrestricted unlocks + risks
│   ├── api.md                   # /api/v1/self/* reference (mirrors section 2 here)
│   └── recipes.md               # common edit→build→restart flows
├── ui/
│   ├── design-system.md         # tailwind tokens, motion, lucide conventions
│   └── components.md            # core/ui component map
├── ops/
│   ├── supervisor.md            # IPC protocol, restart semantics, edge process.exit
│   └── deploy-logs.md           # where logs live, how to tail
└── design/
    └── self-edit-unrestricted.md  # this file

docs/README.md is the entry point and links every other file. The .claude/skills/* SKILL.md files reference specific docs by relative path so the agent always has somewhere to look up details.


(5) .claude/skills/cial:* SKILL.md outlines

.claude/skills/cial:self-edit/SKILL.md

---
name: cial:self-edit
description: Master skill for editing Cial itself from inside the dev container. Loads the relevant docs/ entries before any edit so the agent understands the architecture, trust boundary, and build/restart loop. Use this BEFORE making any change to cial-core, cial-platform, or supervisor/edge code.
---

# cial:self-edit

You are about to edit the Cial codebase from inside its own dev tenant. Before touching any file, ground yourself in the architecture by reading the docs.

## Mandatory pre-read (always)
1. `docs/README.md` — index of everything.
2. `docs/architecture/overview.md` — 3-layer model + container shape.
3. `docs/architecture/core-vs-platform.md` — what's editable in which mode.
4. `docs/self-edit/unrestricted-mode.md` — what `--unrestricted` unlocks.

## Conditional pre-read (read if relevant to the task)
- Editing UI / chat / sidebar → `docs/ui/design-system.md` + `docs/ui/components.md`.
- Editing deploy / build / supervisor → `docs/architecture/deploy-pipeline.md` + `docs/ops/supervisor.md`.
- Editing dev-tenant launcher / Dockerfile / entrypoint → `docs/architecture/dev-tenant.md`.
- Adding a new self-edit endpoint or skill → `docs/self-edit/api.md` + `docs/self-edit/recipes.md`.

## Workflow
1. **Understand first.** Read the docs above + the actual files you'll touch (use Read, never guess at paths).
2. **Check scope.** Confirm `CIAL_UNRESTRICTED` is set if you need to touch `cial-core/*`, `edge/*`, `protocol/*`, `sdk/*`. If not, stop and ask Eliot to relaunch with `pnpm dev:tenant --unrestricted`.
3. **Edit.** Use Edit/Write. Keep changes minimal and on-task; do not refactor adjacent code.
4. **Build.** Invoke `cial:build`. If it fails, fix the errors and re-run until ok.
5. **Restart.** Invoke `cial:restart`. If `edgeRestart: true`, wait ~10s then health-check.
6. **Verify.** Hit the affected route or open the affected page and confirm the change is live.

## Hard rules
- Never edit anything under `cial-app/*` — that lives outside the container and you can't deploy it from here.
- Never bypass the build step. `cial:restart` without a successful prior `cial:build` will just bounce stale code.
- Never `process.exit` or kill the supervisor manually — go through `/api/v1/self/restart`.
- Never write secrets to source files. All secrets live in the Tool Vault.

## Reference
- Endpoints: `docs/self-edit/api.md`
- Common flows: `docs/self-edit/recipes.md`
- Why these constraints exist: `docs/architecture/core-vs-platform.md`

.claude/skills/cial:build/SKILL.md

---
name: cial:build
description: Build Cial inside the dev container — types, bundles, dist. Use after editing platform code, or core code if --unrestricted is on.
---

# cial:build

Trigger a build via the self-edit endpoint and wait for completion.

## When to use
- After any edit to TypeScript/TSX in cial-platform.
- After any edit in cial-core/* (only with --unrestricted; otherwise this will 403).
- Before invoking `cial:restart` on freshly-edited code.

## Steps
1. POST http://127.0.0.1:8080/api/v1/self/deploy with `{ "scope": "auto", "sessionId": "$CIAL_SESSION_ID" }`.
2. Capture the returned `deployId` and `logPath`.
3. Poll GET /api/v1/deploy/{deployId} every 2s until `status` ∈ {`ok`, `error`, `cancelled`}.
4. On error, read the last ~50 lines of `logPath` and surface the failure summary.

## Auth
None — endpoint is localhost-gated. Just curl it.

## See also
- docs/self-edit/api.md
- docs/architecture/deploy-pipeline.md

.claude/skills/cial:restart/SKILL.md

---
name: cial:restart
description: Restart Cial services inside the dev container. Default = platform only. With --unrestricted, can restart core + edge (edge restart bounces the whole container).
---

# cial:restart

Bounce one or more services via the self-edit endpoint.

## When to use
- After `cial:build` returned ok and you need the new code live.
- When a process is stuck and needs a kick.

## Steps
1. POST http://127.0.0.1:8080/api/v1/self/restart with `{ "scope": "auto", "sessionId": "$CIAL_SESSION_ID" }`.
2. Read the response to confirm `restarted[]` includes the services you expected.
3. If `edgeRestart: true`, expect the connection to drop — wait ~10s, then health-check `GET /healthz`.

## Auth
None — localhost-gated.

## Notes
- Without --unrestricted, restarting core or edge returns 403 `unrestricted_required`.
- After core-back restart, any open WS connections (including the agent's own deploy log stream) will be dropped and need to reconnect.

## See also
- docs/self-edit/api.md
- docs/ops/supervisor.md

Open question for implementation phase

Edge restart timing. When scope=all and unrestricted, the response sequence is:

  1. Restart platform (acks + dones via WS).
  2. Restart core (acks + dones via WS).
  3. Send 200 response with edgeRestart: true.
  4. ~50ms setTimeout → supervisor process.exit(0).
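
The ordering of these four steps can be sketched with injected dependencies. This collapses the core-back and supervisor roles into one function purely to show the sequence. `restartAllThenEdge`, `RestartDeps`, and the constant name are hypothetical; the 50ms delay is the value from step 4.

```typescript
interface RestartDeps {
  // Resolves once the group's restarts have completed (acks + dones observed).
  restartGroup: (group: 'platform' | 'core') => Promise<void>;
  sendResponse: (body: { ok: true; scope: 'all'; edgeRestart: true }) => void;
  // Defers a callback, e.g. a thin wrapper around setTimeout.
  schedule: (ms: number, fn: () => void) => void;
  // Terminates the supervisor, e.g. process.exit.
  exit: (code: number) => void;
}

const EDGE_EXIT_DELAY_MS = 50;

async function restartAllThenEdge(deps: RestartDeps): Promise<void> {
  await deps.restartGroup('platform');                                // step 1
  await deps.restartGroup('core');                                    // step 2
  deps.sendResponse({ ok: true, scope: 'all', edgeRestart: true });   // step 3
  deps.schedule(EDGE_EXIT_DELAY_MS, () => deps.exit(0));              // step 4
}
```

The exit is scheduled rather than called inline so the response has a chance to be flushed before the process goes away.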

Risk: if the container takes >5s to come back, the agent thinks the restart hung. Mitigation: SKILL.md tells the agent to expect connection drop + retry /healthz. This is how the old Cial does it too.
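
The mitigation amounts to a bounded retry loop on the client side. A minimal sketch, assuming an injected `probe` that resolves true when GET /healthz answers successfully. The function name, the attempt count, and the interval are illustrative values, not part of the design.

```typescript
interface HealthDeps {
  // Hypothetical probe: true when /healthz responds successfully; may throw while the container is down.
  probe: () => Promise<boolean>;
  sleep: (ms: number) => Promise<void>;
}

async function waitForHealthy(
  deps: HealthDeps,
  attempts = 15,     // assumed: 15 tries
  intervalMs = 2000, // assumed: 2s apart, roughly a 30s window
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      if (await deps.probe()) return true;
    } catch {
      // Connection refused or reset is expected while the container is bouncing.
    }
    if (i < attempts - 1) await deps.sleep(intervalMs);
  }
  return false;
}
```

Treating thrown connection errors the same as an unhealthy response is what keeps the agent from concluding the restart hung on the first dropped connection.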

Implementation order

  1. Plumbing-only: add config flag + dev-tenant.mjs flag + Dockerfile copies. No behavior change yet.
  2. Widen supervisor-ipc.ts + supervisor.ts + supervisor-client.ts (still rejected at runtime if flag off).
  3. Add IPC server to supervisor.dev.ts.
  4. Add self module + endpoints, behind localhost guard.
  5. Add docs/ and .claude/skills/cial:*.
  6. End-to-end smoke: pnpm dev:tenant --unrestricted → from agent shell, edit a core file → cial:build → cial:restart → see change.