Desktop Bridge

UDS JSON-RPC 2.0 protocol for desktop capabilities

The Desktop Bridge gives Aleph physical-world capabilities: taking screenshots, reading screen text via OCR, clicking buttons, typing text, managing windows, and rendering canvas overlays. It is the primary implementation of the "Muscles" limb in the 1-2-3-4 model.

Architecture

The bridge follows the Brain-Limb separation principle (redline R1): the Rust Core never calls platform-specific APIs directly. Instead, it sends JSON-RPC requests over a Unix Domain Socket to a Tauri-based desktop application that implements the actual system calls.

┌──────────────────────┐         UDS          ┌──────────────────────┐
│     Aleph Core       │  ──JSON-RPC 2.0──▶   │   Desktop Bridge     │
│     (Brain)          │                       │   (Tauri App)        │
│                      │                       │                      │
│  DesktopBridgeClient │  ◀──JSON-RPC──────   │  AppKit / Vision /   │
│  (src/desktop/)      │    (response)         │  CoreGraphics / ...  │
└──────────────────────┘                       └──────────────────────┘
        Rust Core                                  Native Platform

This separation means:

  • The Core can run on a headless server with no desktop capabilities
  • Desktop capabilities can be upgraded independently of the Core
  • Platform-specific code is isolated in the Tauri app, not in src

Socket Path Resolution

The bridge supports two deployment modes, each with a different socket path:

| Mode | Socket path | Description |
|---|---|---|
| Managed | ~/.aleph/run/desktop-bridge.sock | Bridge launched by BridgeSupervisor (daemon mode) |
| Standalone | ~/.aleph/bridge.sock | Bridge launched manually by the user |

DesktopBridgeClient::new() checks for the managed socket first and falls back to the standalone path when the managed socket does not exist:

pub fn new() -> Self {
    let home = dirs::home_dir().expect("cannot resolve home directory");
    let managed = home.join(".aleph/run/desktop-bridge.sock");
    let standalone = home.join(".aleph/bridge.sock");

    // Prefer managed socket when it exists
    let socket_path = if managed.exists() { managed } else { standalone };
    Self { socket_path }
}
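Because new() reads the real home directory, the preference order is hard to exercise in a test. The sketch below keeps the same two paths and the same rule but takes the home directory as an argument; resolve_socket_path is a hypothetical name, not part of the documented client.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, test-friendly variant of the path selection in
/// `DesktopBridgeClient::new()`: the home directory is passed in
/// instead of being read via `dirs::home_dir()`.
fn resolve_socket_path(home: &Path) -> PathBuf {
    let managed = home.join(".aleph/run/desktop-bridge.sock");
    let standalone = home.join(".aleph/bridge.sock");

    // Same rule as new(): use the managed socket only if it exists on disk,
    // otherwise fall back to the standalone path (even if that is missing too).
    if managed.exists() {
        managed
    } else {
        standalone
    }
}
```

As in new(), the fallback is unconditional: if neither socket exists, the standalone path is still returned, and the missing bridge only shows up when a connection is attempted.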

Protocol

Every interaction follows the JSON-RPC 2.0 protocol over a newline-delimited UDS stream. Each request opens a fresh connection, sends one request, reads one response, and closes.
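The one-request-per-connection lifecycle can be sketched with the standard library's blocking UnixStream. This is an illustration, not the actual DesktopBridgeClient code (which is async); exchange is a hypothetical helper, and the caller is assumed to pass an already-serialized, single-line JSON request.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::time::Duration;

/// Hypothetical helper: one request/response exchange over a fresh
/// Unix domain socket connection, using newline-delimited framing.
/// `request` must be a single line of JSON; compact JSON serializers
/// escape newlines inside strings, so their output satisfies this.
fn exchange(socket: &Path, request: &str, timeout: Duration) -> io::Result<String> {
    // Fresh connection per request.
    let mut stream = UnixStream::connect(socket)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;

    // Send exactly one JSON document followed by '\n'.
    stream.write_all(request.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;

    // Read exactly one line back: the JSON-RPC response.
    let mut reader = BufReader::new(stream);
    let mut line = String::new();
    if reader.read_line(&mut line)? == 0 {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "bridge closed the connection without responding",
        ));
    }
    // The connection closes when `reader` is dropped on return.
    Ok(line.trim_end().to_string())
}
```

A client built this way would call exchange(&socket_path, &line, Duration::from_secs(30)) once per request, matching the 30-second request timeout.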

Request Format

{
    "jsonrpc": "2.0",
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "method": "desktop.screenshot",
    "params": {
        "region": null
    }
}
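The envelope is simple enough to assemble by hand. A minimal sketch, assuming params are already serialized to JSON text (the real client presumably serializes DesktopRequest with a JSON library; build_request is a hypothetical name):

```rust
/// Builds one JSON-RPC 2.0 request in the shape shown above (compact, single line).
/// `params_json` must already be valid JSON, e.g. `{"region":null}`.
fn build_request(id: &str, method: &str, params_json: &str) -> String {
    // `id` and `method` are inserted verbatim. That is only safe for values that
    // need no JSON escaping, such as UUIDs and `desktop.*` method names.
    fn needs_escaping(s: &str) -> bool {
        s.chars().any(|c| c == '"' || c == '\\' || c.is_control())
    }
    assert!(
        !needs_escaping(id) && !needs_escaping(method),
        "id/method require JSON escaping"
    );
    format!(r#"{{"jsonrpc":"2.0","id":"{id}","method":"{method}","params":{params_json}}}"#)
}
```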

Response Format

{
    "jsonrpc": "2.0",
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "result": {
        "image_base64": "iVBORw0KGgo...",
        "width": 2560,
        "height": 1600
    }
}

Error Format

{
    "jsonrpc": "2.0",
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "error": {
        "code": -32000,
        "message": "No focused window"
    }
}
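A response carries either result or error, never both, and its id echoes the request's. One plausible way to map a decoded response onto the client's error type is sketched below; RpcResponse, RpcError, and into_result are hypothetical, DesktopError mirrors the enum listed under Error Handling, and folding the error code into the Operation message is an assumption.

```rust
/// Local mirror of the documented error enum.
#[derive(Debug)]
enum DesktopError {
    AppNotRunning,
    ConnectionFailed(std::io::Error),
    Protocol(String),
    Operation(String),
}

/// Hypothetical decoded response envelope (a real client would deserialize it).
struct RpcResponse {
    id: String,
    result: Option<String>, // raw JSON text of `result`, if present
    error: Option<RpcError>,
}

struct RpcError {
    code: i64,
    message: String,
}

/// Maps one response onto the client's Result type.
fn into_result(expected_id: &str, resp: RpcResponse) -> Result<String, DesktopError> {
    // JSON-RPC 2.0: the response id must echo the request id.
    if resp.id != expected_id {
        return Err(DesktopError::Protocol(format!(
            "response id {} does not match request id {}",
            resp.id, expected_id
        )));
    }
    match (resp.result, resp.error) {
        (Some(result), None) => Ok(result),
        // The bridge reported a failure: surface it as an operation error.
        (None, Some(err)) => Err(DesktopError::Operation(format!(
            "{} (code {})",
            err.message, err.code
        ))),
        // The spec requires exactly one of `result` and `error`.
        _ => Err(DesktopError::Protocol(
            "response must contain exactly one of result or error".to_string(),
        )),
    }
}
```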

Timeout

All requests have a 30-second timeout. If the bridge does not respond within this window, the client returns a DesktopError::Protocol error.
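With a blocking std socket, a read timeout is set through set_read_timeout, and an expired timeout surfaces as an I/O error. A minimal sketch of detecting it (ReadError and read_response are hypothetical names; the real client would turn ReadError::Timeout into DesktopError::Protocol):

```rust
use std::io::{self, BufRead, BufReader};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// The documented per-request timeout.
const REQUEST_TIMEOUT: Duration = Duration::from_secs(30);

/// Hypothetical outcome of waiting for a response line.
#[derive(Debug)]
enum ReadError {
    /// No response within the limit.
    Timeout(Duration),
    /// Any other I/O failure.
    Io(io::Error),
}

/// Waits up to `limit` for one newline-terminated response line.
/// Returns an empty string if the bridge closed the connection first.
fn read_response(stream: UnixStream, limit: Duration) -> Result<String, ReadError> {
    stream.set_read_timeout(Some(limit)).map_err(ReadError::Io)?;
    let mut line = String::new();
    match BufReader::new(stream).read_line(&mut line) {
        Ok(_) => Ok(line),
        // std reports an expired read timeout as WouldBlock on Unix
        // (TimedOut on some other platforms), so both kinds count as a timeout.
        Err(e) if matches!(e.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut) => {
            Err(ReadError::Timeout(limit))
        }
        Err(e) => Err(ReadError::Io(e)),
    }
}
```

In the client this would be called as read_response(stream, REQUEST_TIMEOUT).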

Capabilities

The bridge provides capabilities in four categories: Perception, Action, Canvas, and Internal.

Perception Methods

These methods let Aleph "see" the screen:

| Method | Parameters | Returns | Description |
|---|---|---|---|
| desktop.screenshot | region?: ScreenRegion | Base64 image | Capture the full screen or a specific region |
| desktop.ocr | image_base64?: String | Extracted text | Run OCR on a provided image or on the current screen |
| desktop.ax_tree | app_bundle_id?: String | Accessibility tree JSON | Get the accessibility hierarchy of an app |
| desktop.snapshot | app_bundle_id?: String, max_depth?: u32, include_non_interactive?: bool | Snapshot with elements | Get a UI snapshot with resolved elements |
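For desktop.screenshot, omitting the region corresponds to "region": null on the wire, as in the request example. The sketch below builds the params object by hand; it assumes ScreenRegion serializes with its Rust field names (serde's default) and that coordinates are finite, since NaN or infinity would produce invalid JSON. screenshot_params is a hypothetical helper.

```rust
/// Mirror of the documented ScreenRegion (pixels).
struct ScreenRegion {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// Builds the `params` object for `desktop.screenshot`.
fn screenshot_params(region: Option<&ScreenRegion>) -> String {
    match region {
        // Full-screen capture, exactly as in the request example.
        None => r#"{"region":null}"#.to_string(),
        // Region capture; f64 Display prints 800.0 as "800", which is valid JSON.
        Some(r) => format!(
            r#"{{"region":{{"x":{},"y":{},"width":{},"height":{}}}}}"#,
            r.x, r.y, r.width, r.height
        ),
    }
}
```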

Action Methods

These methods let Aleph "act" on the desktop:

| Method | Parameters | Description |
|---|---|---|
| desktop.click | ref?: String, x?: f64, y?: f64, button: String | Click at coordinates or on a ref element |
| desktop.double_click | ref?: String, x?: f64, y?: f64, button: String | Double-click at coordinates or on a ref element |
| desktop.type_text | ref?: String, text: String | Type text into the focused element or a ref element |
| desktop.key_combo | keys: [String] | Press a key combination (e.g., ["cmd", "c"]) |
| desktop.scroll | ref?: String, x?: f64, y?: f64, delta_x: f64, delta_y: f64 | Scroll at a position or ref element |
| desktop.drag | start_ref?: String, start_x?: f64, start_y?: f64, end_ref?: String, end_x?: f64, end_y?: f64, duration_ms?: u64 | Drag from a start point or ref to an end point or ref |
| desktop.hover | ref?: String, x?: f64, y?: f64 | Move the mouse to a position or ref element |
| desktop.paste | text: String | Paste text via the clipboard |
| desktop.launch_app | bundle_id: String | Launch an application |
| desktop.window_list | (none) | List all visible windows |
| desktop.focus_window | window_id: u32 | Bring a window to the front |
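Pointer actions such as click, double_click, scroll, and hover accept either a ref or a coordinate pair. The rule for mixed or incomplete input is not specified here, so the check below is an assumption: it accepts a ref alone or both x and y, and rejects anything else before a request is sent. Target and resolve_target are hypothetical names.

```rust
/// Hypothetical normalized target for click, double_click, scroll, and hover.
#[derive(Debug, PartialEq)]
enum Target {
    /// An element ref from a prior desktop.snapshot, e.g. "e1".
    Ref(String),
    /// Absolute screen coordinates in pixels.
    Point { x: f64, y: f64 },
}

/// Assumed rule: exactly one targeting mode per request.
fn resolve_target(ref_id: Option<String>, x: Option<f64>, y: Option<f64>) -> Result<Target, String> {
    match (ref_id, x, y) {
        (Some(r), None, None) => Ok(Target::Ref(r)),
        (None, Some(x), Some(y)) => Ok(Target::Point { x, y }),
        (Some(_), _, _) => Err("give either ref or coordinates, not both".to_string()),
        (None, _, _) => Err("coordinate targeting needs both x and y".to_string()),
    }
}
```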

Canvas Methods

Canvas provides an HTML overlay window for rich visual output:

| Method | Parameters | Description |
|---|---|---|
| desktop.canvas_show | html: String, position: CanvasPosition | Show an HTML overlay at a screen position |
| desktop.canvas_hide | (none) | Hide the canvas overlay |
| desktop.canvas_update | patch: JSON | Incrementally update the canvas content |
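Because requests are newline-delimited, any HTML passed to desktop.canvas_show must be JSON-escaped so that its quotes and newlines cannot break the framing. A minimal sketch of that escaping (per RFC 8259) and of the params object, assuming CanvasPosition serializes with its Rust field names; json_string and canvas_show_params are hypothetical helpers.

```rust
use std::fmt::Write;

/// Mirror of the documented CanvasPosition.
struct CanvasPosition {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// Encodes `s` as a JSON string literal: quotes, backslashes, and all
/// control characters (U+0000..U+001F) are escaped, so the result never
/// contains a raw newline.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => {
                // Remaining control characters use the \uXXXX form.
                write!(out, "\\u{:04x}", c as u32).unwrap();
            }
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Builds the `params` object for `desktop.canvas_show`.
fn canvas_show_params(html: &str, pos: &CanvasPosition) -> String {
    format!(
        r#"{{"html":{},"position":{{"x":{},"y":{},"width":{},"height":{}}}}}"#,
        json_string(html),
        pos.x,
        pos.y,
        pos.width,
        pos.height
    )
}
```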

Internal Methods

| Method | Parameters | Description |
|---|---|---|
| desktop.ping | (none) | Health check, returns "pong" |

Request Types

All request variants are defined in the DesktopRequest enum:

pub enum DesktopRequest {
    // Perception
    Screenshot { region: Option<ScreenRegion> },
    Ocr { image_base64: Option<String> },
    AxTree { app_bundle_id: Option<String> },
    Snapshot { app_bundle_id: Option<String>, max_depth: Option<u32>,
              include_non_interactive: Option<bool> },

    // Action (coordinate or ref-based)
    Click { ref_id: Option<String>, x: Option<f64>, y: Option<f64>,
            button: MouseButton },
    DoubleClick { ref_id: Option<String>, x: Option<f64>, y: Option<f64>,
                  button: MouseButton },
    TypeText { ref_id: Option<String>, text: String },
    KeyCombo { keys: Vec<String> },
    Scroll { ref_id: Option<String>, x: Option<f64>, y: Option<f64>,
             delta_x: f64, delta_y: f64 },
    Drag { start_ref: Option<String>, start_x: Option<f64>,
           start_y: Option<f64>, end_ref: Option<String>,
           end_x: Option<f64>, end_y: Option<f64>,
           duration_ms: Option<u64> },
    Hover { ref_id: Option<String>, x: Option<f64>, y: Option<f64> },
    Paste { text: String },
    LaunchApp { bundle_id: String },
    WindowList,
    FocusWindow { window_id: u32 },

    // Canvas
    CanvasShow { html: String, position: CanvasPosition },
    CanvasHide,
    CanvasUpdate { patch: serde_json::Value },

    // Internal
    Ping,
}
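Each variant corresponds to one method name from the capability tables. The real client presumably derives this mapping with serde attributes; the explicit match below is an illustrative sketch. To stay self-contained it redeclares the documented types and stores CanvasUpdate.patch as a raw JSON String instead of serde_json::Value; the method() name is hypothetical.

```rust
// Local mirrors of the documented types.
pub struct ScreenRegion { pub x: f64, pub y: f64, pub width: f64, pub height: f64 }
pub struct CanvasPosition { pub x: f64, pub y: f64, pub width: f64, pub height: f64 }
pub enum MouseButton { Left, Right, Middle }

pub enum DesktopRequest {
    Screenshot { region: Option<ScreenRegion> },
    Ocr { image_base64: Option<String> },
    AxTree { app_bundle_id: Option<String> },
    Snapshot { app_bundle_id: Option<String>, max_depth: Option<u32>,
               include_non_interactive: Option<bool> },
    Click { ref_id: Option<String>, x: Option<f64>, y: Option<f64>, button: MouseButton },
    DoubleClick { ref_id: Option<String>, x: Option<f64>, y: Option<f64>, button: MouseButton },
    TypeText { ref_id: Option<String>, text: String },
    KeyCombo { keys: Vec<String> },
    Scroll { ref_id: Option<String>, x: Option<f64>, y: Option<f64>, delta_x: f64, delta_y: f64 },
    Drag { start_ref: Option<String>, start_x: Option<f64>, start_y: Option<f64>,
           end_ref: Option<String>, end_x: Option<f64>, end_y: Option<f64>,
           duration_ms: Option<u64> },
    Hover { ref_id: Option<String>, x: Option<f64>, y: Option<f64> },
    Paste { text: String },
    LaunchApp { bundle_id: String },
    WindowList,
    FocusWindow { window_id: u32 },
    CanvasShow { html: String, position: CanvasPosition },
    CanvasHide,
    CanvasUpdate { patch: String }, // serde_json::Value in the real type
    Ping,
}

impl DesktopRequest {
    /// JSON-RPC method name for each variant, per the capability tables.
    pub fn method(&self) -> &'static str {
        use DesktopRequest::*;
        match self {
            Screenshot { .. } => "desktop.screenshot",
            Ocr { .. } => "desktop.ocr",
            AxTree { .. } => "desktop.ax_tree",
            Snapshot { .. } => "desktop.snapshot",
            Click { .. } => "desktop.click",
            DoubleClick { .. } => "desktop.double_click",
            TypeText { .. } => "desktop.type_text",
            KeyCombo { .. } => "desktop.key_combo",
            Scroll { .. } => "desktop.scroll",
            Drag { .. } => "desktop.drag",
            Hover { .. } => "desktop.hover",
            Paste { .. } => "desktop.paste",
            LaunchApp { .. } => "desktop.launch_app",
            WindowList => "desktop.window_list",
            FocusWindow { .. } => "desktop.focus_window",
            CanvasShow { .. } => "desktop.canvas_show",
            CanvasHide => "desktop.canvas_hide",
            CanvasUpdate { .. } => "desktop.canvas_update",
            Ping => "desktop.ping",
        }
    }
}
```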

Element References

Action methods support two targeting modes: coordinate-based and ref-based. Ref-based targeting uses element IDs from a prior desktop.snapshot call:

pub struct ResolvedElement {
    pub ref_id: RefId,       // e.g., "e1", "e12"
    pub role: String,        // UI role (button, text field, etc.)
    pub label: Option<String>, // Accessible label
    pub frame: ScreenRegion, // Bounding box on screen
}

The workflow:

1. desktop.snapshot → { elements: [
     { ref_id: "e1", role: "button", label: "Submit", frame: {...} },
     { ref_id: "e2", role: "textField", label: "Email", frame: {...} },
   ]}

2. desktop.click { ref: "e1" }   → clicks the Submit button
3. desktop.type_text { ref: "e2", text: "user@example.com" }

This is more reliable than raw coordinates because:

  • Elements can be targeted by semantic identity
  • The bridge resolves the ref to current coordinates at click time
  • Layout changes between snapshot and action are handled automatically
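A sketch of ref resolution, assuming RefId is a String alias and that a ref-targeted pointer action lands at the center of the element's frame. In the bridge, the frame would be re-read from the live accessibility tree at action time; here a table built from the snapshot stands in for that lookup. index_snapshot and click_point are hypothetical names.

```rust
use std::collections::HashMap;

/// Assumption: RefId is a plain string such as "e1".
type RefId = String;

#[derive(Clone, Copy, Debug, PartialEq)]
struct ScreenRegion { x: f64, y: f64, width: f64, height: f64 }

/// Mirror of the documented ResolvedElement.
struct ResolvedElement {
    ref_id: RefId,
    role: String,
    label: Option<String>,
    frame: ScreenRegion,
}

/// Stand-in for the bridge's ref lookup: snapshot elements keyed by ref.
fn index_snapshot(elements: Vec<ResolvedElement>) -> HashMap<RefId, ResolvedElement> {
    elements.into_iter().map(|e| (e.ref_id.clone(), e)).collect()
}

/// Assumed rule: target the center of the element's frame.
/// Returns None for an unknown ref (for example, one from an older snapshot).
fn click_point(table: &HashMap<RefId, ResolvedElement>, ref_id: &str) -> Option<(f64, f64)> {
    let f = table.get(ref_id)?.frame;
    Some((f.x + f.width / 2.0, f.y + f.height / 2.0))
}
```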

Supporting Types

/// A rectangular region on screen (pixels)
pub struct ScreenRegion {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Mouse button variants
pub enum MouseButton {
    Left,
    Right,
    Middle,
}

/// Canvas overlay position and size
pub struct CanvasPosition {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Statistics about a UI snapshot
pub struct SnapshotStats {
    pub total_elements: u32,
    pub interactive: u32,
    pub max_depth: u32,
}

Client Usage

The DesktopBridgeClient provides the Rust-side API:

let client = DesktopBridgeClient::new();

// Check if bridge is available
if !client.is_available() {
    return Err(DesktopError::AppNotRunning);
}

// Take a screenshot
let result = client.send(DesktopRequest::Screenshot {
    region: None,
}).await?;

// Click a button
client.send(DesktopRequest::Click {
    ref_id: Some("e7".into()),
    x: None,
    y: None,
    button: MouseButton::Left,
}).await?;

// Type text
client.send(DesktopRequest::TypeText {
    ref_id: None,
    text: "Hello, world!".into(),
}).await?;

Error Handling

pub enum DesktopError {
    /// Bridge socket does not exist -- app not running
    AppNotRunning,

    /// Connection was refused -- app crashed or socket stale
    ConnectionFailed(std::io::Error),

    /// Protocol error (timeout, malformed response)
    Protocol(String),

    /// Operation error (bridge returned an error)
    Operation(String),
}
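The first two variants can be told apart by the error that connect(2) returns: a missing socket file yields NotFound, while a socket file with no listener behind it (a crashed bridge leaving a stale socket) yields ConnectionRefused. A minimal sketch of that mapping with std's UnixStream; connect is a hypothetical helper and DesktopError is redeclared locally.

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Local mirror of the documented error enum.
#[derive(Debug)]
enum DesktopError {
    AppNotRunning,
    ConnectionFailed(io::Error),
    Protocol(String),
    Operation(String),
}

/// Connects to the bridge socket, classifying failures per the variant docs.
fn connect(socket: &Path) -> Result<UnixStream, DesktopError> {
    UnixStream::connect(socket).map_err(|e| match e.kind() {
        // No socket file: the bridge app is not running.
        io::ErrorKind::NotFound => DesktopError::AppNotRunning,
        // Refused (crashed app, stale socket) or any other I/O failure.
        _ => DesktopError::ConnectionFailed(e),
    })
}
```

Protocol and Operation only arise after a connection succeeds, from timeouts, malformed responses, or an error object in the response.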

Module Structure

src/desktop/
├── mod.rs      # Module entry, re-exports
├── types.rs    # DesktopRequest, DesktopResponse, ScreenRegion, etc.
├── client.rs   # DesktopBridgeClient: UDS connection and JSON-RPC
└── error.rs    # DesktopError types
