1-2-3-4 Architecture Model

The 1-2-3-4 model is the central organizing principle of Aleph's engineering architecture. It answers one question: how does a self-hosted AI assistant bridge the gap between pure reasoning and physical/digital action?

The answer is a strict separation of concerns across four numbered layers:

1 Core (the Brain) -- reasoning, state, routing
2 Faces (the Interfaces) -- how users interact with Aleph
3 Limbs (the Execution Systems) -- how Aleph acts on the world
4 Nerves (the Communication Protocols) -- how each layer talks to the Core

                          ┌───────────────────────┐
                          │     2. FACES           │
                          │  ┌─────────────────┐   │
                          │  │  Unified Panel   │   │
    User ────────────────▶│  │ (Leptos / WASM)  │   │
                          │  └────────┬────────┘   │
                          │           │ WebSocket   │
                          │  ┌────────┴────────┐   │
    Social Platforms ────▶│  │  Social Gateway  │   │
                          │  │ (Telegram, etc.) │   │
                          │  └────────┬────────┘   │
                          └───────────┼────────────┘
                    gRPC/NATS ────────┤
                    WebSocket/RPC ────┤
                                      │
                          ┌───────────┴────────────┐
                          │      1. CORE            │
                          │  ┌─────────────────┐   │
                          │  │   Reasoning      │   │
                          │  │   State Mgmt     │   │
                          │  │   Routing         │   │
                          │  └────────┬────────┘   │
                          └───────────┼────────────┘
                                      │
              ┌───────────────────────┼───────────────────────┐
              │                       │                       │
    ┌─────────┴──────────┐  ┌────────┴────────┐  ┌──────────┴─────────┐
    │  Native (Muscles)  │  │   MCP (Tools)   │  │  Skills (Expertise) │
    │  Desktop Bridge    │  │  Playwright,    │  │  PPT Expert,       │
    │  Shell, OCR        │  │  Google Maps    │  │  Code Review       │
    └────────────────────┘  └─────────────────┘  └────────────────────┘
              │                       │                       │
              └───────────────────────┴───────────────────────┘
                              3. LIMBS

1 Core -- The Brain

The Rust Core is Aleph's soul. It is responsible for exactly three things and nothing else:

Responsibility	Description
Reasoning	Decide what to do next. The Core orchestrates the Harness (Observe-Think-Act-Feedback), invokes LLM providers, and runs the Prompt System for goal-oriented execution.
State Management	Maintain conversation context, task graphs, session keys, and the memory system.
Routing	Dispatch work to the correct execution system -- native tools, MCP servers, or skill plugins -- based on task requirements.
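The routing responsibility can be pictured as a pure function from a task's requirements to one of the three execution systems. The sketch below is illustrative only: `TaskRequirement`, `Limb`, and `route` are hypothetical names introduced here, not Aleph's actual types.

```rust
/// Hypothetical description of what a task needs (not an Aleph type).
#[derive(Debug, Clone, PartialEq)]
pub enum TaskRequirement {
    /// Needs direct control of the local machine (screenshot, click, shell).
    LocalControl,
    /// Needs an external tool exposed by an MCP server, identified by name.
    ExternalTool(String),
    /// Needs domain expertise packaged as a skill or plugin.
    DomainExpertise(String),
}

/// The three execution systems the Core can dispatch to.
#[derive(Debug, Clone, PartialEq)]
pub enum Limb {
    Native,
    Mcp { server: String },
    Skill { name: String },
}

/// Map a requirement to the limb that should handle it.
/// The match is exhaustive, so a new requirement kind fails to compile
/// until a route is chosen for it.
pub fn route(req: &TaskRequirement) -> Limb {
    match req {
        TaskRequirement::LocalControl => Limb::Native,
        TaskRequirement::ExternalTool(server) => Limb::Mcp { server: server.clone() },
        TaskRequirement::DomainExpertise(name) => Limb::Skill { name: name.clone() },
    }
}
```

Keeping the routing decision free of side effects means it can be unit-tested without starting the Desktop Bridge, an MCP server, or a plugin runtime.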

What the Core does NOT do

The Core never renders UI, never calls platform-specific APIs (AppKit, Vision, CoreGraphics), and never bundles heavy third-party libraries for niche functionality. These are architectural redlines (R1, R3) that keep the Core lightweight and portable.

// The Core defines capability contracts as traits.
// Physical implementations live in the Desktop Bridge.
//
// Example: the Core can request a screenshot, but the
// actual CGWindowListCreateImage call happens in Tauri.
pub trait DesktopCapability: Send + Sync {
    async fn screenshot(&self, region: Option<ScreenRegion>) -> Result<Vec<u8>>;
    async fn click(&self, x: f64, y: f64, button: MouseButton) -> Result<()>;
    async fn type_text(&self, text: &str) -> Result<()>;
}

2 Faces -- The Interfaces

Aleph presents two faces to the world. Both are pure I/O layers -- they convert user input into JSON-RPC calls and render responses back to the user. Neither face contains business logic (redline R4).

Face 1: Unified Panel (Leptos/WASM)

A single Leptos codebase compiled to WASM serves as the UI across all platforms:

Platform	Host	Notes
Web	Browser	Served by the Aleph server
macOS	Tauri shell	Menu-bar-first, Halo floating window
Windows	Tauri shell	System tray integration
Linux	Tauri shell	Standard desktop window

The native shell (Tauri) provides only the window container, system tray, and native animations. All complex UI -- settings pages, conversation views, debug panels -- lives in the WASM layer (redline R2).

Face 2: Social Gateway

The Gateway connects Aleph to messaging platforms as a persistent background intelligence:

Telegram -- bot interface via teloxide
Discord -- bot interface via serenity
iMessage -- bridged via AppleScript/Shortcuts

The Gateway translates platform-specific message formats into Aleph's unified JSON-RPC protocol. A single Aleph Core can serve multiple social channels simultaneously, each with independent session routing.
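Independent session routing can be sketched with the session key that appears in the end-to-end data flow, `agent:main:telegram:dm:user123`. The field names below (`agent_id`, `platform`, `chat_kind`, `peer_id`) are guesses inferred from that single example, and `SessionKey` is a hypothetical type, not Aleph's implementation.

```rust
/// Hypothetical breakdown of a session key such as
/// `agent:main:telegram:dm:user123`. Field names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SessionKey {
    pub agent_id: String,
    pub platform: String,
    pub chat_kind: String,
    pub peer_id: String,
}

impl SessionKey {
    /// Render the key in the colon-separated form shown in the data flow.
    pub fn render(&self) -> String {
        format!(
            "agent:{}:{}:{}:{}",
            self.agent_id, self.platform, self.chat_kind, self.peer_id
        )
    }

    /// Parse a key. Returns None unless it has exactly five
    /// colon-separated parts and starts with the literal "agent".
    /// Note: this sketch does not support ':' inside a field.
    pub fn parse(s: &str) -> Option<SessionKey> {
        let parts: Vec<&str> = s.split(':').collect();
        match parts.as_slice() {
            ["agent", agent_id, platform, chat_kind, peer_id] => Some(SessionKey {
                agent_id: agent_id.to_string(),
                platform: platform.to_string(),
                chat_kind: chat_kind.to_string(),
                peer_id: peer_id.to_string(),
            }),
            _ => None,
        }
    }
}
```

Because the platform is part of the key, the same user on Telegram and on Discord would map to two different keys and therefore two independent sessions.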

3 Limbs -- The Execution Systems

Limbs are how Aleph acts on the world. There are three categories, ordered by proximity to the system:

Limb 1: Native Capabilities (The Muscles)

Direct system control through the Desktop Bridge and shell execution:

Capability	Implementation	Protocol
Screenshots / OCR	Desktop Bridge (Tauri)	UDS JSON-RPC
Mouse / Keyboard	Desktop Bridge (Tauri)	UDS JSON-RPC
Window Management	Desktop Bridge (Tauri)	UDS JSON-RPC
Canvas Overlay	Desktop Bridge (Tauri)	UDS JSON-RPC
Shell Commands	Built-in Exec system	Direct

The Desktop Bridge runs as a separate Tauri process. The Core communicates with it over a Unix Domain Socket using JSON-RPC 2.0, which preserves Brain-Limb separation (redline R1).
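A minimal sketch of one such call, using only the Rust standard library. The function names are hypothetical, and the framing is an assumption: the client writes one request, half-closes the socket to mark the end of the request, and reads until the bridge closes the connection. The real bridge may frame messages differently.

```rust
use std::io::{self, Read, Write};
use std::net::Shutdown;
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Build a JSON-RPC 2.0 request string.
/// `params_json` must already be valid JSON (for example "{}").
pub fn build_request(id: u64, method: &str, params_json: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params_json}}}"#)
}

/// Send one request over a fresh connection and return the raw response.
pub fn call_bridge(socket: &Path, request: &str) -> io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(request.as_bytes())?;
    // Assumed framing: closing the write half signals "request complete".
    stream.shutdown(Shutdown::Write)?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Calling `build_request(1, "desktop.screenshot", "{}")` produces `{"jsonrpc":"2.0","id":1,"method":"desktop.screenshot","params":{}}`, which `call_bridge` would then send to the bridge socket.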

Limb 2: MCP (The External Tools)

The Model Context Protocol lets Aleph leverage the broader tool ecosystem:

Core ──JSON-RPC──▶ MCP Server (Playwright)
Core ──JSON-RPC──▶ MCP Server (Google Maps)
Core ──JSON-RPC──▶ MCP Server (GitHub)

MCP servers run as separate processes. Aleph's mcp/ module handles server lifecycle, capability discovery, and JSON-RPC transport.
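Capability discovery over the stdio transport amounts to a short sequence of newline-delimited JSON-RPC messages. The sketch below builds the client side of that sequence. The method names follow the public MCP specification; the function name, the protocol version string, and the client name and version are placeholders, and nothing here reflects how Aleph's mcp/ module is actually written.

```rust
/// Client messages needed to reach tool discovery on an MCP server.
/// Each entry is one line on the server's stdin. In a real client the
/// second message is sent only after the `initialize` response arrives.
pub fn discovery_handshake(client_name: &str, client_version: &str) -> Vec<String> {
    vec![
        // 1. Request: negotiate protocol version and capabilities.
        format!(
            r#"{{"jsonrpc":"2.0","id":1,"method":"initialize","params":{{"protocolVersion":"2024-11-05","capabilities":{{}},"clientInfo":{{"name":"{client_name}","version":"{client_version}"}}}}}}"#
        ),
        // 2. Notification (no id): the client is ready.
        r#"{"jsonrpc":"2.0","method":"notifications/initialized"}"#.to_string(),
        // 3. Request: list the tools this server offers.
        r#"{"jsonrpc":"2.0","id":2,"method":"tools/list"}"#.to_string(),
    ]
}
```

The response to `tools/list` is what lets the Core register the server's tools alongside native tools and skills for routing.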

Limb 3: Skills and Plugins (The Expertise)

Skills represent domain knowledge and specialized capabilities:

Skill Type	Runtime	Example
WASM Plugin	Extism sandbox	Data transformation, custom logic
Node.js Plugin	IPC subprocess	Complex integrations
Python Skill	Managed subprocess	Data analysis, ML pipelines
Bash Skill	Shell execution	System administration

Skills are Aleph's main growth vector going forward. The architecture is the skeleton; skills supply the expertise that fills it out.

4 Nerves -- The Communication Protocols

Every interaction between the Core and its Faces or Limbs travels through one of four nerve channels:

Nerve	Channel	Protocol	Purpose
N1	Core -- UI	WebSocket + JSON-RPC	Drive the Unified Panel. Bidirectional: the UI sends user input, the Core streams responses and events.
N2	Core -- Desktop Bridge	UDS + JSON-RPC 2.0	Drive desktop control. One request-response per connection over `~/.aleph/run/desktop-bridge.sock`.
N3	Core -- Gateway	gRPC / NATS	Drive social bots. High-throughput, supports fan-out for multi-channel delivery.
N4	Core -- MCP	JSON-RPC over stdio/SSE	Drive external tools. Standard MCP transport, one server per tool domain.
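The table above can be mirrored in code as a closed enum, so that every channel is accounted for wherever nerves are handled. This is a sketch with hypothetical names; only the peer and protocol strings come from the table.

```rust
/// The four nerve channels, as listed in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Nerve {
    N1,
    N2,
    N3,
    N4,
}

impl Nerve {
    pub const ALL: [Nerve; 4] = [Nerve::N1, Nerve::N2, Nerve::N3, Nerve::N4];

    /// What the Core talks to over this nerve.
    pub fn peer(self) -> &'static str {
        match self {
            Nerve::N1 => "UI",
            Nerve::N2 => "Desktop Bridge",
            Nerve::N3 => "Gateway",
            Nerve::N4 => "MCP",
        }
    }

    /// The protocol used on this nerve.
    pub fn protocol(self) -> &'static str {
        match self {
            Nerve::N1 => "WebSocket + JSON-RPC",
            Nerve::N2 => "UDS + JSON-RPC 2.0",
            Nerve::N3 => "gRPC / NATS",
            Nerve::N4 => "JSON-RPC over stdio/SSE",
        }
    }
}
```

Because both `match` expressions are exhaustive, adding a fifth nerve would not compile until its peer and protocol are specified.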

Protocol Selection Rationale

┌──────────────────────────────────────────────────────────────────────┐
│                        PROTOCOL MAP                                  │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  N1 (WebSocket)     Best for: real-time bidirectional streaming     │
│  ────────────────   Why: UI needs live token streaming,             │
│                     event-driven updates, low latency               │
│                                                                      │
│  N2 (UDS/IPC)       Best for: local high-frequency control          │
│  ────────────────   Why: Desktop Bridge is co-located,              │
│                     UDS avoids TCP overhead, sub-ms latency         │
│                                                                      │
│  N3 (gRPC/NATS)     Best for: structured service-to-service        │
│  ────────────────   Why: Gateway may run on different host,         │
│                     needs schema evolution, fan-out                  │
│                                                                      │
│  N4 (JSON-RPC)      Best for: standardized tool integration         │
│  ────────────────   Why: MCP ecosystem uses JSON-RPC over           │
│                     stdio or SSE, Aleph follows the standard        │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

Architectural Redlines

The 1-2-3-4 model is enforced by seven architectural redlines. Changes that violate a redline are not merged:

Redline	Rule	Consequence
R1	Brain-Limb Separation	`core/src` never imports platform APIs. The Core defines traits; the Desktop Bridge implements them.
R2	Single Source of UI Truth	Tauri shell has no business-logic UI. All complex views are in Leptos/WASM.
R3	Core Minimalism	No heavy third-party crates for niche functionality. Prefer Skills or MCP.
R4	I/O-Only Interfaces	Faces never persist data, retrieve memories, or plan tasks.
R5	Menu Bar First	macOS defaults to menu-bar mode with Halo overlay. Full windows only when needed.
R6	AI Comes to You	Reduce context switching. Halo, notifications, inline suggestions.
R7	One Core, Many Shells	Rust Core is the single brain. UI is unified via WASM. Native shells only wrap.
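A redline such as R1 lends itself to an automated check. The document does not say how redlines are enforced, so the following is purely a hypothetical sketch: it scans Rust source text for `use` statements that pull in platform crates. The crate list is an illustrative assumption, not Aleph's actual policy.

```rust
/// Illustrative list of platform crates the Core should not import.
/// This list is an assumption made for the sketch.
const FORBIDDEN_CRATES: &[&str] = &["cocoa", "core_graphics", "objc", "windows"];

/// Return (line number, crate name) for every `use <crate>::` line that
/// imports a forbidden crate. Line numbers are 1-based.
/// Only plain `use` lines are checked; `pub use` and fully qualified
/// paths are out of scope for this sketch.
pub fn r1_violations(source: &str) -> Vec<(usize, &'static str)> {
    let mut hits = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let line = line.trim_start();
        for &krate in FORBIDDEN_CRATES {
            if line.starts_with(&format!("use {krate}::")) {
                hits.push((idx + 1, krate));
            }
        }
    }
    hits
}
```

A check like this could run in CI over the Core's source files and fail the build when the returned list is non-empty.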

Data Flow: End-to-End Request

Here is how a user message flows through the 1-2-3-4 model:

1. User sends message via Telegram
   │
   ▼
2. Gateway (Face 2) receives message, resolves route
   │  → SessionKey: agent:main:telegram:dm:user123
   │
   ▼
3. Core (Brain) receives JSON-RPC request
   │  → Agent Loop: Observe → Think → Act → Feedback
   │  → Thinker calls LLM provider
   │  → LLM decides: "take a screenshot, then OCR it"
   │
   ▼
4. Core dispatches to Desktop Bridge (Limb 1, Nerve N2)
   │  → UDS JSON-RPC: desktop.screenshot → base64 image
   │  → UDS JSON-RPC: desktop.ocr → extracted text
   │
   ▼
5. Core continues reasoning with OCR results
   │  → LLM generates final response
   │
   ▼
6. Core sends response back through Gateway (Nerve N3)
   │
   ▼
7. Gateway delivers response to Telegram
   │
   ▼
8. User sees the reply in their chat
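Steps 3 through 5 form a loop: the Core asks the LLM what to do, executes the requested action, feeds the result back, and repeats until the LLM produces a final response. A minimal sketch of that loop follows, with the LLM and the limbs replaced by closures. `Decision` and `run_harness` are hypothetical names, and the step budget is an assumption added to keep the loop bounded.

```rust
/// What the "think" step decides on each iteration (hypothetical type).
pub enum Decision {
    /// Run an action, e.g. "desktop.screenshot", and observe its result.
    Act(String),
    /// Stop and return this text to the user.
    Respond(String),
}

/// Drive the loop until a response is produced or the budget runs out.
/// `think` sees all results observed so far; `act` executes one action
/// and returns its result, which becomes feedback for the next turn.
pub fn run_harness(
    mut think: impl FnMut(&[String]) -> Decision,
    mut act: impl FnMut(&str) -> String,
    max_steps: usize,
) -> Option<String> {
    let mut observations: Vec<String> = Vec::new();
    for _ in 0..max_steps {
        match think(&observations) {
            Decision::Act(call) => observations.push(act(&call)),
            Decision::Respond(text) => return Some(text),
        }
    }
    None // budget exhausted without a final response
}
```

In the Telegram example, `think` would return `Act("desktop.screenshot")`, then `Act("desktop.ocr")`, then `Respond(...)` once the OCR text is available.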

Directory Mapping

Each layer of the 1-2-3-4 model maps to specific directories in the codebase:

aleph/
├── src/                         # 1. CORE (Brain)
│   ├── harness/                 #    Reasoning: Observe-Think-Act-Feedback
│   ├── thinker/                 #    Reasoning: LLM interaction
│   ├── dispatcher/              #    Reasoning: Task orchestration
│   ├── routing/                 #    Routing: Session keys
│   ├── memory/                  #    State: Memory system
│   ├── resilience/              #    State: Task persistence
│   ├── gateway/                 #    2. FACE 2 (Gateway server)
│   │   ├── handlers/            #       RPC method handlers
│   │   ├── interfaces/          #       Telegram, Discord, iMessage
│   │   └── security/            #       Auth, pairing, devices
│   ├── desktop/                 #    4. NERVE N2 (Bridge client)
│   ├── mcp/                     #    4. NERVE N4 (MCP client)
│   ├── tools/                   #    3. LIMB (Tool definitions)
│   ├── builtin_tools/           #    3. LIMB 1 (Native tools)
│   └── extension/               #    3. LIMB 3 (Plugin runtime)
├── apps/
│   ├── desktop/                 #    3. LIMB 1 (Desktop Bridge)
│   └── cli/                     #    2. FACE 1 (CLI client)
└── web/                         #    2. FACE 1 (Leptos/WASM Panel)

Design Philosophy

The 1-2-3-4 model reflects a core belief: an AI assistant should be a brain with swappable limbs, not a monolith. By separating reasoning from execution, Aleph can:

Add new channels (Face) without touching the Core
Add new capabilities (Limb) without touching the UI
Upgrade communication protocols (Nerve) independently
Run the Core on a headless server while the Faces and Limbs run elsewhere

This is not microservices for the sake of it. It is a pragmatic separation that keeps a complex system maintainable by a small team. The architecture is now stable -- future work fills in capabilities rather than restructuring the skeleton.
