Desktop Bridge
UDS JSON-RPC 2.0 protocol for desktop capabilities
The Desktop Bridge gives Aleph physical-world capabilities: taking screenshots, reading screen text via OCR, clicking buttons, typing text, managing windows, and rendering canvas overlays. It is the primary implementation of the "Muscles" limb in the 1-2-3-4 model.
Architecture
The bridge follows the Brain-Limb separation principle (redline R1): the Rust Core never calls platform-specific APIs directly. Instead, it sends JSON-RPC requests over a Unix Domain Socket to a Tauri-based desktop application that implements the actual system calls.
┌──────────────────────┐        UDS        ┌──────────────────────┐
│ Aleph Core           │ ──JSON-RPC 2.0──▶ │ Desktop Bridge       │
│ (Brain)              │                   │ (Tauri App)          │
│                      │                   │                      │
│ DesktopBridgeClient  │ ◀──JSON-RPC────── │ AppKit / Vision /    │
│ (src/desktop/)       │     (response)    │ CoreGraphics / ...   │
└──────────────────────┘                   └──────────────────────┘
       Rust Core                               Native Platform
This separation means:
- The Core can run on a headless server with no desktop capabilities
- Desktop capabilities can be upgraded independently of the Core
- Platform-specific code is isolated in the Tauri app, not in src
Socket Path Resolution
The bridge supports two deployment modes, each with a different socket path:
| Mode | Socket Path | Description |
|---|---|---|
| Managed | ~/.aleph/run/desktop-bridge.sock | Bridge launched by BridgeSupervisor (daemon mode) |
| Standalone | ~/.aleph/bridge.sock | Bridge launched manually by the user |
DesktopBridgeClient::new() uses the managed socket if it exists and otherwise falls back to the standalone path:
pub fn new() -> Self {
    let home = dirs::home_dir().expect("cannot resolve home directory");
    let managed = home.join(".aleph/run/desktop-bridge.sock");
    let standalone = home.join(".aleph/bridge.sock");
    // Prefer managed socket when it exists
    let socket_path = if managed.exists() { managed } else { standalone };
    Self { socket_path }
}
Protocol
Every interaction follows the JSON-RPC 2.0 protocol over a newline-delimited UDS stream. Each request opens a fresh connection, sends one request, reads one response, and closes.
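The one-request-per-connection exchange described above can be sketched with the standard library alone. This is a blocking illustration, not the actual client (which is async, judging by the `.await` calls under Client Usage); the function name `round_trip` is hypothetical, and the request is passed in as an already-serialized JSON string.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Open a fresh connection, send one newline-terminated request,
/// read one newline-terminated response, then close.
/// (Hypothetical helper; the real client API may differ.)
fn round_trip(socket_path: &Path, request_json: &str) -> io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;

    // One JSON document per line: the trailing newline marks the end of the request.
    stream.write_all(request_json.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;

    // Read exactly one line back; that line is the whole response.
    let mut reader = BufReader::new(stream);
    let mut line = String::new();
    reader.read_line(&mut line)?;

    // The stream is dropped here, which closes the connection.
    Ok(line.trim_end().to_string())
}
```

Because every call opens and closes its own connection, no state is shared between requests, and a crashed bridge only affects the request in flight.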
Request Format
{
  "jsonrpc": "2.0",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "method": "desktop.screenshot",
  "params": {
    "region": null
  }
}
Response Format
{
  "jsonrpc": "2.0",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "result": {
    "image_base64": "iVBORw0KGgo...",
    "width": 2560,
    "height": 1600
  }
}
Error Format
{
  "jsonrpc": "2.0",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "error": {
    "code": -32000,
    "message": "No focused window"
  }
}
Timeout
All requests have a 30-second timeout. If the bridge does not respond within this window, the client returns a DesktopError::Protocol error.
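One way to get this behavior with blocking sockets is a read timeout on the stream, mapped to the Protocol variant. The sketch below is an assumption about how such a mapping could look, not the client's actual code: `send_with_timeout` is a hypothetical name, the `DesktopError` here is a reduced stand-in with only two variants, and `set_read_timeout` bounds each read call rather than the request as a whole.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::time::Duration;

/// Reduced stand-in for the document's DesktopError (two variants only).
#[derive(Debug)]
enum DesktopError {
    ConnectionFailed(io::Error),
    Protocol(String),
}

/// Send one request line and wait at most `timeout` for the response line.
/// (Hypothetical helper; a real client would pass Duration::from_secs(30).)
fn send_with_timeout(
    socket_path: &Path,
    request_json: &str,
    timeout: Duration,
) -> Result<String, DesktopError> {
    let mut stream = UnixStream::connect(socket_path).map_err(DesktopError::ConnectionFailed)?;
    stream
        .set_read_timeout(Some(timeout))
        .map_err(DesktopError::ConnectionFailed)?;

    writeln!(stream, "{request_json}")
        .map_err(|e| DesktopError::Protocol(format!("write failed: {e}")))?;

    let mut line = String::new();
    match BufReader::new(stream).read_line(&mut line) {
        // Zero bytes means the peer closed the connection without answering.
        Ok(0) => Err(DesktopError::Protocol("connection closed without a response".into())),
        Ok(_) => Ok(line.trim_end().to_string()),
        // An expired read timeout surfaces as WouldBlock or TimedOut, depending on the platform.
        Err(e) if matches!(e.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut) => {
            Err(DesktopError::Protocol(format!("no response within {timeout:?}")))
        }
        Err(e) => Err(DesktopError::Protocol(e.to_string())),
    }
}
```

Reporting a timeout as Protocol rather than ConnectionFailed keeps the distinction from the Error Handling section: the socket was reachable, but the exchange did not complete.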
Capabilities
The bridge provides capabilities in four categories: Perception, Action, Canvas, and Internal.
Perception Methods
These methods let Aleph "see" the screen:
| Method | Parameters | Returns | Description |
|---|---|---|---|
| desktop.screenshot | region?: ScreenRegion | Base64 image | Capture full screen or a specific region |
| desktop.ocr | image_base64?: String | Extracted text | Run OCR on a provided image or current screen |
| desktop.ax_tree | app_bundle_id?: String | Accessibility tree JSON | Get the accessibility hierarchy of an app |
| desktop.snapshot | app_bundle_id?: String, max_depth?: u32, include_non_interactive?: bool | Snapshot with elements | Get a UI snapshot with resolved elements |
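To make the optional region parameter concrete, here is a dependency-free sketch that builds the request line for desktop.screenshot. The JSON is formatted by hand only to keep the example self-contained; a real client would more likely use a serialization library. The function name `screenshot_request` is hypothetical, and the assumption that a region is encoded as an object with x, y, width, and height keys is inferred from the ScreenRegion struct, not confirmed by the protocol description.

```rust
/// Mirrors the fields of the document's ScreenRegion struct.
struct ScreenRegion {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// Build a JSON-RPC request for `desktop.screenshot`.
/// `id` is assumed to be a plain identifier such as a UUID (no escaping is done),
/// and the region values are assumed to be finite numbers.
fn screenshot_request(id: &str, region: Option<&ScreenRegion>) -> String {
    // A missing region is sent as JSON null, meaning "capture the full screen".
    let region_json = match region {
        Some(r) => format!(
            r#"{{"x":{},"y":{},"width":{},"height":{}}}"#,
            r.x, r.y, r.width, r.height
        ),
        None => "null".to_string(),
    };
    format!(
        r#"{{"jsonrpc":"2.0","id":"{id}","method":"desktop.screenshot","params":{{"region":{region_json}}}}}"#
    )
}
```

Calling `screenshot_request("1", None)` yields the same shape as the request shown under Request Format, with "region" set to null.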
Action Methods
These methods let Aleph "act" on the desktop:
| Method | Parameters | Description |
|---|---|---|
| desktop.click | ref?: String, x?: f64, y?: f64, button: String | Click at coordinates or a ref element |
| desktop.double_click | ref?: String, x?: f64, y?: f64, button: String | Double-click |
| desktop.type_text | ref?: String, text: String | Type text into focused element or ref |
| desktop.key_combo | keys: [String] | Press a key combination (e.g., ["cmd", "c"]) |
| desktop.scroll | ref?: String, x?: f64, y?: f64, delta_x: f64, delta_y: f64 | Scroll at position or ref |
| desktop.drag | start_ref?, start_x?, start_y?, end_ref?, end_x?, end_y?, duration_ms? | Drag from start to end |
| desktop.hover | ref?: String, x?: f64, y?: f64 | Move mouse to position |
| desktop.paste | text: String | Paste text via clipboard |
| desktop.launch_app | bundle_id: String | Launch an application |
| desktop.window_list | (none) | List all visible windows |
| desktop.focus_window | window_id: u32 | Bring a window to front |
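Several of the methods above accept either a ref or an x/y pair, all optional. The table does not say what happens when both or neither are supplied, so the sketch below shows one possible strict policy that a caller could apply before sending a request. The `Target` enum and `resolve_target` function are hypothetical and are not part of the bridge API.

```rust
/// Hypothetical normalized form of the optional ref / x / y parameters.
#[derive(Debug, PartialEq)]
enum Target {
    /// Element reference from a prior snapshot, e.g. "e7".
    Ref(String),
    /// Absolute screen coordinates.
    Point { x: f64, y: f64 },
}

/// Accept exactly one targeting mode: a ref alone, or both coordinates alone.
/// This strictness is an assumption; the bridge itself may be more lenient.
fn resolve_target(
    ref_id: Option<String>,
    x: Option<f64>,
    y: Option<f64>,
) -> Result<Target, String> {
    match (ref_id, x, y) {
        (Some(r), None, None) => Ok(Target::Ref(r)),
        (None, Some(x), Some(y)) => Ok(Target::Point { x, y }),
        (Some(_), _, _) => Err("pass either a ref or coordinates, not both".to_string()),
        (None, _, _) => Err("need a ref or both x and y".to_string()),
    }
}
```

Validating on the Core side turns an ambiguous request into an immediate local error instead of a round trip to the bridge.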
Canvas Methods
Canvas provides an HTML overlay window for rich visual output:
| Method | Parameters | Description |
|---|---|---|
| desktop.canvas_show | html: String, position: CanvasPosition | Show an HTML overlay at a screen position |
| desktop.canvas_hide | (none) | Hide the canvas overlay |
| desktop.canvas_update | patch: JSON | Incrementally update the canvas content |
Internal Methods
| Method | Parameters | Description |
|---|---|---|
| desktop.ping | (none) | Health check, returns "pong" |
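Across all four categories, each method name in the tables matches the corresponding DesktopRequest variant (listed under Request Types) converted to snake_case and prefixed with "desktop.". The sketch below spells out that naming pattern. It is an observation about the tables, not the client's actual serialization code, and `method_name` is a hypothetical helper.

```rust
/// Derive the wire method name from a variant name,
/// e.g. "DoubleClick" becomes "desktop.double_click".
/// Assumes ASCII CamelCase input, as in the DesktopRequest variants.
fn method_name(variant: &str) -> String {
    let mut out = String::from("desktop.");
    for (i, ch) in variant.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            // Every capital after the first starts a new snake_case word.
            if i > 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}
```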
Request Types
All request variants are defined in the DesktopRequest enum:
pub enum DesktopRequest {
    // Perception
    Screenshot { region: Option<ScreenRegion> },
    Ocr { image_base64: Option<String> },
    AxTree { app_bundle_id: Option<String> },
    Snapshot { app_bundle_id: Option<String>, max_depth: Option<u32>,
               include_non_interactive: Option<bool> },
    // Action (coordinate or ref-based)
    Click { ref_id: Option<String>, x: Option<f64>, y: Option<f64>,
            button: MouseButton },
    DoubleClick { ref_id: Option<String>, x: Option<f64>, y: Option<f64>,
                  button: MouseButton },
    TypeText { ref_id: Option<String>, text: String },
    KeyCombo { keys: Vec<String> },
    Scroll { ref_id: Option<String>, x: Option<f64>, y: Option<f64>,
             delta_x: f64, delta_y: f64 },
    Drag { start_ref: Option<String>, start_x: Option<f64>,
           start_y: Option<f64>, end_ref: Option<String>,
           end_x: Option<f64>, end_y: Option<f64>,
           duration_ms: Option<u64> },
    Hover { ref_id: Option<String>, x: Option<f64>, y: Option<f64> },
    Paste { text: String },
    LaunchApp { bundle_id: String },
    WindowList,
    FocusWindow { window_id: u32 },
    // Canvas
    CanvasShow { html: String, position: CanvasPosition },
    CanvasHide,
    CanvasUpdate { patch: serde_json::Value },
    // Internal
    Ping,
}
Element References
Action methods support two targeting modes: coordinate-based and ref-based. Ref-based targeting uses element IDs from a prior desktop.snapshot call:
pub struct ResolvedElement {
    pub ref_id: RefId,          // e.g., "e1", "e12"
    pub role: String,           // UI role (button, text field, etc.)
    pub label: Option<String>,  // Accessible label
    pub frame: ScreenRegion,    // Bounding box on screen
}
The workflow:
1. desktop.snapshot → { elements: [
     { ref_id: "e1", role: "button", label: "Submit", frame: {...} },
     { ref_id: "e2", role: "textField", label: "Email", frame: {...} },
   ]}
2. desktop.click { ref: "e1" } → clicks the Submit button
3. desktop.type_text { ref: "e2", text: "[email protected]" }
This is more reliable than raw coordinates because:
- Elements can be targeted by semantic identity
- The bridge resolves the ref to current coordinates at click time
- Layout changes between snapshot and action are handled automatically
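Resolving a ref to coordinates can be pictured as a lookup followed by a point calculation. The sketch below is an illustration only: the bridge performs the real resolution internally, `click_point` is a hypothetical name, the plain String stands in for the RefId type, and targeting the center of the bounding box is an assumption.

```rust
/// Mirrors the document's ScreenRegion.
#[derive(Debug, Clone, PartialEq)]
struct ScreenRegion {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// Simplified ResolvedElement; `ref_id` is a String here instead of RefId.
struct ResolvedElement {
    ref_id: String,
    role: String,
    label: Option<String>,
    frame: ScreenRegion,
}

/// Look up an element by ref and return the center of its frame,
/// or None if the ref is not present in the snapshot.
fn click_point(elements: &[ResolvedElement], ref_id: &str) -> Option<(f64, f64)> {
    elements
        .iter()
        .find(|e| e.ref_id == ref_id)
        .map(|e| {
            (
                e.frame.x + e.frame.width / 2.0,
                e.frame.y + e.frame.height / 2.0,
            )
        })
}
```

Because the lookup runs against the element's current frame, a button that moved since the snapshot is still hit, whereas stored coordinates would point at its old position.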
Supporting Types
/// A rectangular region on screen (pixels)
pub struct ScreenRegion {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}
/// Mouse button variants
pub enum MouseButton {
    Left,
    Right,
    Middle,
}
/// Canvas overlay position and size
pub struct CanvasPosition {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}
/// Statistics about a UI snapshot
pub struct SnapshotStats {
    pub total_elements: u32,
    pub interactive: u32,
    pub max_depth: u32,
}
Client Usage
The DesktopBridgeClient provides the Rust-side API:
let client = DesktopBridgeClient::new();
// Check if bridge is available
if !client.is_available() {
    // Use the crate's error type so this early return is compatible
    // with the `?` operators below
    return Err(DesktopError::AppNotRunning);
}
// Take a screenshot
let result = client.send(DesktopRequest::Screenshot {
    region: None,
}).await?;
// Click a button
client.send(DesktopRequest::Click {
    ref_id: Some("e7".into()),
    x: None,
    y: None,
    button: MouseButton::Left,
}).await?;
// Type text
client.send(DesktopRequest::TypeText {
    ref_id: None,
    text: "Hello, world!".into(),
}).await?;
Error Handling
pub enum DesktopError {
    /// Bridge socket does not exist -- app not running
    AppNotRunning,
    /// Connection was refused -- app crashed or socket stale
    ConnectionFailed(std::io::Error),
    /// Protocol error (timeout, malformed response)
    Protocol(String),
    /// Operation error (bridge returned an error)
    Operation(String),
}
Module Structure
src/desktop/
├── mod.rs # Module entry, re-exports
├── types.rs # DesktopRequest, DesktopResponse, ScreenRegion, etc.
├── client.rs # DesktopBridgeClient: UDS connection and JSON-RPC
└── error.rs # DesktopError types