# Phase 5 — AI Sessions Engine
Port the legacy Cial sessions system (`/app/server/src/core`) into `cial-core/back` and wire it through `@cial/protocol`, `@cial/sdk`, and `@cial/core-ui`. Goal: real Claude chat with streaming, persistence, and survival across server restarts.
Reference architecture deep-dive: see the internal scout report (legacy `/app/server/src/core/services/{claude,gemini,kimi}-process.ts` + `core/ws/{stream,chat-handlers,handler-registry,session-handlers}.ts`).
## Scope decisions (lock these before sub-phase 5a)
| Decision | Choice | Why |
|---|---|---|
| Providers in v1 | Claude only, but behind a `HarnessProvider` abstraction | Match legacy 80/20; Gemini/Kimi land in 5f/5g by implementing the same interface — no engine rewrite |
| Tool approval | Auto-approve (`--dangerously-skip-permissions`) | No approval UI in v1; matches the dev autologin trust model |
| Tools available | Built-in Claude CLI tools (Read/Write/Edit/Bash/Glob/Grep) | No MCP in v1 |
| Dataworlds / presets / agent teams / ghost / sandbox / voice / special commands | Deferred | Land the core engine first |
| FTS search | Deferred (LIKE fallback for now) | Schema slot reserved |
| Session groups | Deferred | Schema slot reserved |
| Process state file | `/core-data/.process-state.json` per tenant | Same volume as SQLite |
| CLI working dir | `/platform/` per tenant (matches Phase 8 design) | Native dev mode uses `process.cwd()` of dev-tenant |
| `claude` binary | Installed in the tenant Dockerfile as a dep (`npm i -g @anthropic-ai/claude-code` or equivalent); for `dev:tenant`, the host's `claude` is reused | Self-contained image, no PATH surprises |
| `~/.claude` state | Per-tenant volume mount (production); for `dev:tenant`, `scripts/dev-tenant.mjs` copies `~/.claude/` from the host into the dev-tenant data dir on first boot so Claude is already authed | No re-login per dev session; matches the legacy auth flow |
| Schema shape | Mirror legacy column names | Future-proofs migrating real legacy data; easier port |
| WS protocol | Reuse legacy message-type strings verbatim | Lets us copy client code with minimal renaming |
## Sub-phase split — each independently shippable
### 5a — Schema + REST CRUD

**Goal:** Sessions and messages persist in tenant SQLite, owned by the Better-Auth user.

**Deliverables**
- Drizzle migration creating two tables (legacy column names):
  - `sessions(id, user_id, name, claude_session_id, model, status, created_at, updated_at, streaming_state, archived_at, provider, parent_session_id, effort, cwd, group_id)` — plus reserved nullable columns for ghost/sandbox/team to avoid future migrations
  - `messages(id auto, session_id FK CASCADE, role, content, metadata JSON, created_at)`
- `cial-core/back/src/modules/sessions/`:
  - `repository.ts` — Drizzle queries
  - `service.ts` — auth-scoped CRUD
  - `index.ts` — Express router: `POST /`, `GET /`, `GET /:id`, `PATCH /:id` (rename), `DELETE /:id`, `GET /:id/messages?before=N&limit=N`
- All routes guarded by the Better-Auth session; users can only see their own sessions
- `@cial/protocol/src/sessions.ts` — full Zod schemas (Session, Message, Tool, TurnStats, TurnUsage, SubTurn) ported from legacy
**Done when**
- Curl can create → list → fetch messages → delete a session via SSO cookie
- Two users see disjoint session lists
- Vitest covers the repository
### 5b — WS handler registry + session handlers

**Goal:** Real WS at `/ws` with auth + multi-tab sync of the session list.

**Deliverables**
- `cial-core/back/src/infrastructure/ws/`:
  - `server.ts` — upgrade handler that authenticates the WS via the Better-Auth session cookie (reject anonymous)
  - `handler-registry.ts` — `register(type, handler)` / dispatch table, ported from the legacy pattern
  - `clients.ts` — per-user socket pool; `broadcastToUser(userId, msg)`, `sessionFocus` map
- `cial-core/back/src/modules/sessions/ws-handlers.ts`:
  - Inbound: `session.list`, `session.create`, `session.rename`, `session.delete`, `session.focus`, `session.watch`, `session.unwatch`, `session.history`
  - Outbound: `session.created`, `session.updated`, `session.deleted`, `session.history`, `session.list`
- WS message envelope: `{ type: string, payload: unknown, requestId?: string }` (matches legacy)
- Heartbeat / ping every 30s
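The registry plus envelope can be sketched as a plain dispatch table keyed by the envelope's `type` string. Handler signature and error behavior here are assumptions; the real module is ported from the legacy pattern.

```typescript
// Sketch of handler-registry.ts: a Map from message-type string to handler.
type Envelope = { type: string; payload: unknown; requestId?: string };
type Handler = (payload: unknown, requestId?: string) => unknown;

const handlers = new Map<string, Handler>();

export function register(type: string, handler: Handler): void {
  handlers.set(type, handler);
}

// Called with the raw text of an incoming WS frame.
export function dispatch(raw: string): unknown {
  const msg = JSON.parse(raw) as Envelope;
  const handler = handlers.get(msg.type);
  if (!handler) throw new Error(`unknown message type: ${msg.type}`);
  return handler(msg.payload, msg.requestId);
}
```

Because the legacy message-type strings are reused verbatim, the same table serves 5b's `session.*` handlers and 5d's `message`/`cancel`/`answer` handlers.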
**Done when**
- Two browser tabs open as same user → create in tab A → list updates in tab B within 100ms
- Disconnecting one tab doesn't kill events in the other
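The multi-tab behavior above falls out of the per-user socket pool in `clients.ts`; a minimal sketch (function names assumed) where one user maps to a set of sockets, so a broadcast fans out to every open tab and removing one socket leaves the others live:

```typescript
// Sketch of clients.ts: per-user socket pool with fan-out broadcast.
interface SocketLike { send(data: string): void }

const pool = new Map<string, Set<SocketLike>>();

export function addClient(userId: string, ws: SocketLike): void {
  if (!pool.has(userId)) pool.set(userId, new Set());
  pool.get(userId)!.add(ws);
}

export function removeClient(userId: string, ws: SocketLike): void {
  pool.get(userId)?.delete(ws); // other tabs keep receiving events
}

export function broadcastToUser(userId: string, msg: object): void {
  const data = JSON.stringify(msg);
  for (const ws of pool.get(userId) ?? []) ws.send(data);
}
```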
### 5c — ClaudeProcess (the engine)

**Goal:** Detached `claude` CLI processes that survive server restart, streaming events.

**Deliverables**
- `cial-core/back/src/modules/sessions/process/`:
  - `types.ts` — `CliProcessEvents` interface (init, text, thinking, tool_use, tool_result, usage, rate_limit, compact, question, result, cancelled, timeout, heartbeat, close, error) + provider-agnostic shapes (`Tool`, `TurnUsage`, `TurnStats`, `ProcessSnapshot`, `ToolConfigs`)
  - `harness-provider.ts` — abstract `HarnessProvider` interface: `name: 'claude' | 'gemini' | 'kimi'`, `buildSpawnArgs(opts) → { args, env, cwd }`, `parseLine(line, ctx) → CliEvent[]`, `hasLocalState(harnessSessionId) → boolean`, `homeDir(): string`. The tail loop, snapshot/restore, lifecycle, and kill flow all live in the generic engine and call into the provider only for these provider-specific bits.
  - `providers/claude.ts` — `ClaudeProvider implements HarnessProvider` (the only one wired in 5c; Gemini/Kimi later just drop in next to it)
  - `cli-process.ts` — generic class with `start(message, isResume, toolConfigs)`, `cancel()`, `_startTailing()`, `_processLine()`, snapshot/restore — provider-agnostic
  - `process-manager.ts` — `getOrCreate(sessionId, provider)`, `cancel(sessionId)`, `saveState()`, `restoreState()` — called from `instrumentation.ts` on boot
  - `state-file.ts` — atomic write/read of `/core-data/.process-state.json` (snapshot includes a `provider` field so restore picks the right `HarnessProvider` impl)
- Spawn args ported from legacy:
  - `--dangerously-skip-permissions`, `--output-format stream-json`, `--include-partial-messages`, `--verbose`
  - `--resume {claudeSessionId}` if local state exists, else `--session-id {claudeSessionId}`
  - `--model`, `--append-system-prompt` (single base prompt for v1)
  - `-p {message}` last
- Spawn opts: `detached: true`, `stdio: ['ignore', fd, fd]`, `unref()`
- Log file path: `/core-data/process-logs/{sessionId}-{ts}.jsonl`
- 500ms tail poll, heartbeat 5s, inactivity timeout 24h
- Cancel: `process.kill(-pid, 'SIGTERM')`, then `SIGKILL` after 5s
- Snapshot fields: pid, sessionId, claudeSessionId, model, logFile, byteOffset, accumulatedText, accumulatedThinking, collectedTools, latestUsage, turnStats, resultEmitted
- Restore on boot: `kill -0 pid` check → resume tailing OR drain remaining bytes for the trailing `result` event
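The spawn-arg rules above could be sketched as the `buildSpawnArgs` half of `ClaudeProvider`. The option-bag shape is an assumption; only the flags themselves and their ordering come from the plan:

```typescript
// Sketch of ClaudeProvider.buildSpawnArgs (opts shape assumed).
interface SpawnOpts {
  message: string;
  claudeSessionId: string;
  model: string;
  isResume: boolean;      // true when hasLocalState() found ~/.claude state
  systemPrompt?: string;  // single base prompt in v1
}

export function buildSpawnArgs(o: SpawnOpts): string[] {
  const args = [
    "--dangerously-skip-permissions",
    "--output-format", "stream-json",
    "--include-partial-messages",
    "--verbose",
  ];
  // Resume when harness-local state exists, else pin a fresh session id.
  args.push(o.isResume ? "--resume" : "--session-id", o.claudeSessionId);
  args.push("--model", o.model);
  if (o.systemPrompt) args.push("--append-system-prompt", o.systemPrompt);
  args.push("-p", o.message); // message goes last
  return args;
}
```

The generic engine would then pass these to `spawn` with `detached: true` and both stdio fds pointed at the JSONL log file, so the child keeps writing after the server dies.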
**Done when**

- Vitest spec spawns a real `claude` process with a stub message, captures `init` → `text` (partial + final) → `result` events
- Kill the test runner mid-stream, restart, recover the same process by PID, see the remaining events through `result`
### 5d — wireProcessToSocket + chat lifecycle

**Goal:** WS-driven chat — send a message, watch tokens stream in, get the final result persisted.

**Deliverables**
- `cial-core/back/src/modules/sessions/ws-stream.ts`:
  - `wireProcessToSocket(proc, ws, sessionId)` — subscribes to all CliProcessEvents, broadcasts them as `stream.*` messages to focused/watched sockets
  - `rewireProcessToSocket(proc, ws, sessionId)` — replays `accumulatedText`, `accumulatedThinking`, `collectedTools`, `latestUsage` to a newly connecting socket
  - 500ms debounced `streamingState` save to `sessions.streaming_state` for UI recovery
- `cial-core/back/src/modules/sessions/chat-handlers.ts`:
  - `message {sessionId, content}` — guard on busy, INSERT the user message, `processManager.getOrCreate`, `wireProcessToSocket`, `proc.start(...)`
  - `cancel {sessionId}` — `processManager.cancel`
  - `answer {sessionId, content}` — for interactive `question` events
- Outbound `stream.*` types verbatim from legacy: `stream.init`, `stream.text`, `stream.thinking`, `stream.tool_use`, `stream.tool_result`, `stream.usage`, `stream.rate_limit`, `stream.heartbeat`, `stream.compact`, `stream.done`, `stream.cancelled`, `stream.error`; plus `question` (interactive ask) and `session.status` (idle/busy/error broadcast)
- On `result`: INSERT the assistant message with `metadata = {tools, stats, thinking, turnHistory}`, clear `streaming_state`, `status = 'idle'`
- On `cancelled`: INSERT the assistant message with an `_(cancelled)_` suffix
- On `close` with a non-zero code and no prior `result`: status `error`, INSERT a crash placeholder
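The three terminal cases above reduce to a small pure function, sketched here (type and field names are assumptions) as the mapping from a terminal process event to the assistant row and session status to persist:

```typescript
// Sketch of the turn-finalization rules in chat-handlers.ts.
type Terminal =
  | { kind: "result"; text: string; tools: unknown[]; stats: unknown }
  | { kind: "cancelled"; text: string }
  | { kind: "close"; code: number };

export function finalizeTurn(ev: Terminal): {
  status: "idle" | "error";
  content: string;
  metadata: Record<string, unknown> | null;
} {
  switch (ev.kind) {
    case "result":
      // Happy path: persist the full turn, session returns to idle.
      return { status: "idle", content: ev.text, metadata: { tools: ev.tools, stats: ev.stats } };
    case "cancelled":
      // User-initiated stop: keep partial text with the legacy suffix.
      return { status: "idle", content: `${ev.text} _(cancelled)_`, metadata: null };
    case "close":
      // Non-zero exit without a prior result → crash placeholder.
      return {
        status: ev.code === 0 ? "idle" : "error",
        content: "_(process exited unexpectedly)_",
        metadata: null,
      };
  }
}
```

Keeping this pure makes the 5d "Done when" cases straightforward to unit-test without spawning a real process.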
**Done when**

- Send a message from a curl WS test, see streaming text events, get a final assistant message in the DB
- Cancel mid-stream → `_(cancelled)_` message persisted
- Restart the server mid-stream → reconnect WS → see the remaining events + the final result
### 5e — Wire UI + SDK

**Goal:** The polished AppShell talks to the real backend.

**Deliverables**
- `@cial/sdk/src/modules/chat.ts` — real `subscribe(sessionId, onMsg)` and `send(sessionId, content)` over WS
- `@cial/sdk/src/modules/sessions.ts` — REST helpers: `list()`, `create(name?)`, `delete(id)`, `rename(id, name)`, `history(id, opts?)`
- `@cial/sdk/src/ws-client.ts` — auto-reconnect with exponential backoff (max 10s), outbound queue during outage, replay on reconnect, typed message dispatch
- `@cial/core-ui/src/store.ts` — replace the localStorage mock with the real implementation:
  - On mount: `sdk.sessions.list()` → seed
  - On `session.focus(id)`: `sdk.sessions.history(id)` → load messages
  - WS dispatcher: `stream.text` → append delta to `streamingBySession[id]`, `stream.tool_use` → add to streaming.tools, `stream.done` → finalize message + clear streaming, `session.status` → update session row
  - `sendMessage(text)` → `sdk.chat.send(activeId, text)`
  - `cancel()` → new method, sends the `cancel` WS message
- `@cial/core-ui/src/SessionView.tsx` — render streaming text live (typing animation), render tool calls inline (collapsed, expandable), show a "stop" button while busy
- Reconnect indicator in the sidebar (small dot: green/yellow/red)
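The ws-client's reconnect delay can be sketched as a one-liner; the 10s cap comes from the plan, while the base delay and doubling factor are assumptions:

```typescript
// Sketch of the ws-client backoff: delay doubles per attempt, capped at 10s.
// baseMs is an assumed starting value, only the maxMs cap is from the plan.
export function reconnectDelayMs(attempt: number, baseMs = 500, maxMs = 10_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

While disconnected, outbound messages go into the queue; on reconnect the client replays the queue and the store re-fetches `session.history` to resync, so a dropped socket never loses user input.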
**Done when**

- `pnpm dev:tenant` → log in via autologin → type "Say hi in 3 words" → see streaming tokens → message persists → reload the page → message is still there
- Mid-stream, kill the dev-tenant process, restart → page reconnects → streaming continues from where it was
- Stop button cancels mid-stream cleanly
## Non-goals for Phase 5 (parking lot)
- Multi-provider (Gemini → 5f, Kimi → 5g)
- Tool approval UI (5h)
- MCP tool plumbing (Phase 4 work — vault feeds MCP env)
- Dataworld/preset injection (separate phase)
- Agent teams, ghost mode, sandboxing
- Voice input/transcribe
- Special commands (`/orchestrate`, `/crm`, etc.)
- Account rotation
- FTS5 + search
- Session groups (sidebar folders)
- Branching / forking sessions
- Cost dashboards (data captured, not displayed)
## Risks & mitigations
| Risk | Mitigation |
|---|---|
| `claude` binary not on PATH inside the container | 5c also lands the Dockerfile change in `cial-core/docker/` to install `@anthropic-ai/claude-code` globally; `dev:tenant` fails loud with install instructions if the host `claude` is missing |
| Per-tenant `~/.claude` state collisions across containers | Each tenant has its own `~/.claude` mounted from its volume; for `dev:tenant`, `scripts/dev-tenant.mjs` copies `~/.claude/` (auth tokens, settings) from the host into `./.dev-tenant/claude-home/` on first boot only (re-copy with `--reset-claude`) and exports `HOME` to point there for spawned children |
| Process state file race (two boots overlap) | Atomic write (`.tmp` + rename) + the read-then-delete-on-startup pattern from legacy |
| Tail loop CPU cost | 500ms poll matches legacy; can switch to `fs.watch` later if needed |
| WS auth on upgrade is awkward with Better-Auth | Read the session cookie from the upgrade headers, call `auth.api.getSession({ headers })` before accepting |
| Streaming-state debounce drops the final 500ms on crash | Same as legacy — the UI re-fetches `session.history` on reconnect and gets the authoritative DB state |
| `--dangerously-skip-permissions` running tools that modify the FS | v1 cwd is `/platform/`; that IS the editable tenant code by design (matches legacy, where Claude edits `/app`) |
## Working agreement (from PLAN.md)

Each sub-phase: implement → self-test against the "Done when" criteria → commit `phase(5x): <summary>` → push → notify Eliot for sign-off → next sub-phase.