Intent Detection
Three-layer intent classification and execution routing that determines whether user input triggers task execution or conversational response.
Intent Detection is the decision-making subsystem that classifies every user message before it reaches the LLM. Its core question is simple: should this input trigger an executable action, or is it a conversational message? Because this question is answered upfront, prompts only need to describe how to execute, never whether to execute.
Three-Layer Architecture
Classification proceeds through three layers of increasing latency and sophistication. The pipeline short-circuits as soon as any layer produces a confident match.
```text
                          User Input
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ L1: Regex Matching (<5ms, confidence = 1.0)                 │
│ Fast pattern matching for explicit commands                 │
│ Example: "整理文件夹里的文件" → FileOrganize                │
│   ("organize the files in the folder")                      │
└─────────────────────────────────────────────────────────────┘
                               │ no match
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ L2: Keyword Matching (<20ms, confidence = 0.5-0.95)         │
│ KeywordIndex with weighted scoring + CJK tokenization       │
│ Configurable via KeywordPolicy in config.toml               │
│ Fallback: static keyword sets                               │
└─────────────────────────────────────────────────────────────┘
                               │ no match
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ L3: AI Classification (optional, 1-3s)                      │
│ LLM-based detection for complex/ambiguous cases             │
│ Language-agnostic, extracts parameters (path, etc.)         │
└─────────────────────────────────────────────────────────────┘
                               │ no match or AI disabled
                               ▼
                ExecutionIntent::Conversational
```
L1: Regex Matching
The fastest layer uses pre-compiled regex patterns to match explicit task commands in both Chinese and English. Matches carry full confidence (1.0) since regex patterns only fire on unambiguous requests.
```rust
use once_cell::sync::Lazy; // std::sync::LazyLock is the std equivalent since Rust 1.80
use regex::Regex;

// Patterns are compiled once, on first access.
pub static EXECUTABLE_PATTERNS: Lazy<Vec<(Regex, TaskCategory)>> = Lazy::new(|| {
    vec![
        // FileOrganize: organize/sort/classify + file
        (
            Regex::new(r"(?i)(整理|归类|分类|organize|sort|classify).*(文件|files?|folder)")
                .unwrap(),
            TaskCategory::FileOrganize,
        ),
        // FileTransfer: move/copy/transfer + to
        (
            Regex::new(r"(?i)(移动|复制|拷贝|转移|move|copy|transfer).*(到|to)")
                .unwrap(),
            TaskCategory::FileTransfer,
        ),
        // CodeExecution: run/execute + script/code
        (
            Regex::new(r"(?i)(运行|执行|跑一下|run|execute).*(脚本|代码|script|code)")
                .unwrap(),
            TaskCategory::CodeExecution,
        ),
        // ... additional patterns for FileCleanup, DocumentGeneration, etc.
    ]
});
```
L1 also extracts file paths from input using a dedicated path regex that handles Unix paths (/path, ~/path) and Windows paths (C:\path).
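The regex table depends on the regex and once_cell crates. To show the same idea without dependencies, here is a standard-library-only sketch, assuming a trimmed TaskCategory with three variants. ordered_match emulates a (?i)(verb|...).*(noun|...) pattern by requiring a verb followed later by a noun, and extract_path is a hypothetical, simplified stand-in for the path regex that returns the first whitespace-delimited token shaped like /path, ~/path, or C:\path. The function names, the first-match-wins ordering, and the token heuristic are assumptions, not the project's code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskCategory {
    FileOrganize,
    FileTransfer,
    CodeExecution,
}

/// (verbs, nouns, category): each entry mirrors one regex of the form
/// (?i)(verb|...).*(noun|...) from EXECUTABLE_PATTERNS.
const RULES: &[(&[&str], &[&str], TaskCategory)] = &[
    (
        &["整理", "归类", "分类", "organize", "sort", "classify"],
        &["文件", "file", "folder"],
        TaskCategory::FileOrganize,
    ),
    (
        &["移动", "复制", "拷贝", "转移", "move", "copy", "transfer"],
        &["到", "to"],
        TaskCategory::FileTransfer,
    ),
    (
        &["运行", "执行", "跑一下", "run", "execute"],
        &["脚本", "代码", "script", "code"],
        TaskCategory::CodeExecution,
    ),
];

/// True when some verb occurs and some noun occurs after it (case-insensitive).
fn ordered_match(input: &str, verbs: &[&str], nouns: &[&str]) -> bool {
    let text = input.to_lowercase();
    verbs.iter().any(|verb| {
        let verb = verb.to_lowercase();
        // The earliest occurrence leaves the longest tail, so it is the only one to check.
        match text.find(verb.as_str()) {
            Some(start) => {
                let tail = &text[start + verb.len()..];
                nouns.iter().any(|noun| tail.contains(noun.to_lowercase().as_str()))
            }
            None => false,
        }
    })
}

/// Hypothetical stand-in for the path regex: first token that looks like a path.
fn extract_path(input: &str) -> Option<&str> {
    input.split_whitespace().find(|token| {
        let bytes = token.as_bytes();
        let unix = token.starts_with('/') || token.starts_with("~/");
        // Windows drive path such as C:\Users: letter, colon, backslash.
        let windows = bytes.len() >= 3
            && bytes[0].is_ascii_alphabetic()
            && bytes[1] == b':'
            && bytes[2] == b'\\';
        unix || windows
    })
}

/// L1-style classification (assumed first-match-wins), with confidence 1.0.
fn classify_l1(input: &str) -> Option<(TaskCategory, f32, Option<&str>)> {
    RULES
        .iter()
        .find(|(verbs, nouns, _)| ordered_match(input, verbs, nouns))
        .map(|(_, _, category)| (*category, 1.0, extract_path(input)))
}
```

Because extract_path splits on whitespace, it misses paths glued to CJK text (for example 整理/tmp里的文件, "organize the files in /tmp") and would also accept a slash command such as /agent as a path; a real path regex has to handle both cases.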
L2: Keyword Matching
When regex fails, the keyword layer uses two strategies:
- Static keyword sets: predefined verb-noun combinations for each task category. Both a verb and a noun must match for the category to fire (confidence 0.85).
- KeywordIndex (enhanced): a configurable index loaded from KeywordPolicy in config.toml. Each rule carries weighted keywords and a minimum score threshold:
```rust
let rule = KeywordRule::new("file_organize", "FileOrganize")
    .with_keyword("整理", 1.0) // "organize" (Chinese)
    .with_keyword("文件", 0.8) // "file" (Chinese)
    .with_keyword("organize", 1.0)
    .with_match_mode(KeywordMatchMode::Weighted)
    .with_min_score(0.5);
```
Match modes include Any (first keyword wins), All (every keyword required), and Weighted (sum of matched keyword weights against threshold).
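The match modes and the static fallback can be sketched with plain substring search, which works on CJK text without a tokenizer. This KeywordRule is a hypothetical, field-based stand-in for the builder shown above. The scores returned for Any and All (the first hit's weight and the summed weight) are assumptions; the Weighted threshold rule and the static fallback's fixed 0.85 confidence come from the description.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeywordMatchMode {
    Any,
    All,
    Weighted,
}

/// Simplified, hypothetical stand-in for the project's KeywordRule.
struct KeywordRule {
    keywords: Vec<(String, f32)>, // (keyword, weight)
    mode: KeywordMatchMode,
    min_score: f32,
}

impl KeywordRule {
    /// Returns Some(score) when the rule fires, None otherwise.
    fn evaluate(&self, input: &str) -> Option<f32> {
        let text = input.to_lowercase();
        // Weights of the keywords that occur anywhere in the input.
        let hits: Vec<f32> = self
            .keywords
            .iter()
            .filter(|(kw, _)| text.contains(kw.to_lowercase().as_str()))
            .map(|(_, weight)| *weight)
            .collect();
        match self.mode {
            // Any: one matching keyword is enough; report its weight (assumed).
            KeywordMatchMode::Any => hits.first().copied(),
            // All: every keyword must be present; report the summed weight (assumed).
            KeywordMatchMode::All => {
                (hits.len() == self.keywords.len()).then(|| hits.iter().sum::<f32>())
            }
            // Weighted: the summed weight must reach min_score.
            KeywordMatchMode::Weighted => {
                let score: f32 = hits.iter().sum();
                (score >= self.min_score).then_some(score)
            }
        }
    }
}

/// Static fallback: fires with confidence 0.85 only when both a verb and a noun occur.
fn static_match(input: &str, verbs: &[&str], nouns: &[&str]) -> Option<f32> {
    let text = input.to_lowercase();
    let contains_any =
        |words: &[&str]| words.iter().any(|w| text.contains(w.to_lowercase().as_str()));
    (contains_any(verbs) && contains_any(nouns)).then_some(0.85)
}
```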
L3: AI Classification
For ambiguous inputs that escape L1 and L2, an optional LLM-based detector (AiIntentDetector) provides language-agnostic classification. It returns structured results including the intent type, confidence score, and extracted parameters:
```rust
use std::collections::HashMap;

pub struct AiIntentResult {
    pub intent: String,                  // e.g., "file_organize"
    pub confidence: f64,                 // 0.0 - 1.0
    pub params: HashMap<String, String>, // e.g., {"path": "/Downloads"}
    pub missing: Vec<String>,            // parameters still needed
}
```
L3 maps its output to the same TaskCategory enum used by L1 and L2, so downstream routing is uniform regardless of which layer classified the input.
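The following sketch shows one way to fold L3 output into TaskCategory. The snake_case intent strings, the trimmed enum, and the to_category and interpret helpers are illustrative assumptions. Unknown intents are treated as no match, so the pipeline can fall through to Conversational.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskCategory {
    FileOrganize,
    FileTransfer,
    WebSearch,
}

struct AiIntentResult {
    intent: String,
    confidence: f64,
    params: HashMap<String, String>,
    missing: Vec<String>,
}

/// Hypothetical mapping from the detector's intent string to TaskCategory.
fn to_category(intent: &str) -> Option<TaskCategory> {
    match intent {
        "file_organize" => Some(TaskCategory::FileOrganize),
        "file_transfer" => Some(TaskCategory::FileTransfer),
        "web_search" => Some(TaskCategory::WebSearch),
        _ => None, // unknown intent: treated as no match
    }
}

/// Category, confidence, and extracted path, when the intent is recognized.
fn interpret(result: &AiIntentResult) -> Option<(TaskCategory, f64, Option<&str>)> {
    let category = to_category(&result.intent)?;
    let path = result.params.get("path").map(String::as_str);
    Some((category, result.confidence, path))
}
```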
Exclusion Patterns
A critical safety feature: inputs containing analysis or understanding verbs are excluded from agent mode before any layer runs. This prevents requests like "analyze this file" or "summarize this document" from triggering destructive file operations.
Exclusion verbs include:
- Chinese: 分析 (analyze), 理解 (understand), 解释 (explain), 总结 (summarize), 摘要 (abstract), 描述 (describe), 概括 (sum up), 看看 (take a look)
- English: analyze, understand, explain, summarize, describe, review
The exclusion check is a fast pre-filter and takes priority over all classification layers.
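Here is a minimal sketch of the pre-filter using the verb lists above. Case-insensitive substring matching is an assumption; the real check may tokenize differently.

```rust
/// Exclusion verbs from the lists above (Chinese and English).
const EXCLUSION_VERBS: &[&str] = &[
    "分析", "理解", "解释", "总结", "摘要", "描述", "概括", "看看",
    "analyze", "understand", "explain", "summarize", "describe", "review",
];

/// True when the input must never enter agent mode (assumed substring match).
fn is_excluded(input: &str) -> bool {
    let text = input.to_lowercase();
    EXCLUSION_VERBS.iter().any(|verb| text.contains(*verb))
}
```

Substring matching catches inflections such as "analyzes" or "reviewed". It also flags words like "overview", so a production check would likely need word boundaries for English.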
TaskCategory
The TaskCategory enum represents the categories of executable tasks that Aleph can handle. Each category maps to specific tools and prompt templates:
```rust
pub enum TaskCategory {
    General,            // Explicit /agent command
    FileOrganize,       // Sort, classify files
    FileOperation,      // Read, write, search
    FileTransfer,       // Move, copy
    FileCleanup,        // Delete, archive
    CodeExecution,      // Run scripts/commands
    AppLaunch,          // Open applications
    AppAutomation,      // UI automation
    DocumentGeneration, // Create documents
    ImageGeneration,    // Generate images
    VideoGeneration,    // Generate video
    AudioGeneration,    // Generate audio
    SpeechGeneration,   // Text-to-speech
    WebSearch,          // Search the web
    WebFetch,           // Fetch page content
    SystemInfo,         // System queries
    MediaDownload,      // YouTube, etc.
    TextProcessing,     // Translation, summarization
    DataProcess,        // Data transformation
}
```
Categories expose semantic helpers: is_file_related(), is_generation(), and is_read_only() that downstream systems use for risk assessment and tool filtering.
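The source names the helpers but does not say which categories each one covers. The sketch below assumes that membership in is_file_related and is_generation follows the variant names, and it treats WebSearch, WebFetch, and SystemInfo as the read-only set. These groupings are guesses for illustration only.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskCategory {
    General, FileOrganize, FileOperation, FileTransfer, FileCleanup,
    CodeExecution, AppLaunch, AppAutomation, DocumentGeneration,
    ImageGeneration, VideoGeneration, AudioGeneration, SpeechGeneration,
    WebSearch, WebFetch, SystemInfo, MediaDownload, TextProcessing, DataProcess,
}

impl TaskCategory {
    /// Assumed: the four File* categories.
    pub fn is_file_related(&self) -> bool {
        matches!(
            self,
            Self::FileOrganize | Self::FileOperation | Self::FileTransfer | Self::FileCleanup
        )
    }

    /// Assumed: the *Generation categories.
    pub fn is_generation(&self) -> bool {
        matches!(
            self,
            Self::DocumentGeneration
                | Self::ImageGeneration
                | Self::VideoGeneration
                | Self::AudioGeneration
                | Self::SpeechGeneration
        )
    }

    /// Assumed: categories that only read information and never modify state.
    pub fn is_read_only(&self) -> bool {
        matches!(self, Self::WebSearch | Self::WebFetch | Self::SystemInfo)
    }
}
```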
ExecutionIntent
The three-valued classification result that drives all downstream behavior:
```rust
pub enum ExecutionIntent {
    /// Trigger Agent mode with tools
    Executable(ExecutableTask),
    /// Need one clarifying question
    Ambiguous { task_hint: String, clarification: String },
    /// Normal conversational response
    Conversational,
}

pub struct ExecutableTask {
    pub category: TaskCategory,
    pub action: String,
    pub target: Option<String>, // extracted path or object
    pub confidence: f32,        // 0.0 - 1.0
}
```
ExecutionIntentDecider
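The ExecutionIntentDecider is a higher-level decision system that extends classification into routing. It determines not just whether to execute but how, routing each input to the appropriate execution mode. Its layer numbering (L0 to L4) is its own and does not correspond to the classifier layers above; for example, its L2 uses context signals rather than keywords: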
```text
                          User Input
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ ExecutionIntentDecider                                      │
│                                                             │
│ L0: Slash Commands (/screenshot, /ocr) → DirectTool         │
│ L1: Regex Patterns → Execute(category)                      │
│ L2: Context Signals (selected file) → Execute(category)     │
│ L3: Semantic Analysis (optional LLM) → Execute | Converse   │
│ L4: Default Fallback → Execute (bias toward action)         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
                         ExecutionMode
```
ExecutionMode
The decider routes to one of six execution modes:
| Mode | Description | Example |
|---|---|---|
| DirectTool | Built-in tool invocation | /screenshot, /search |
| Skill | Skill with injected instructions | /knowledge-graph |
| Mcp | MCP server tool execution | /git status |
| Custom | Custom command with system prompt | /translate |
| Execute | AI with tools (by TaskCategory) | "organize my files" |
| Converse | Pure conversation, no tools | "what is machine learning?" |
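Here is a condensed sketch of the decider's layer order. ExecutionMode payloads are simplified to strings. The DIRECT_TOOLS table, the callback parameters, and the General category used by the fallback are hypothetical; only the L0 to L4 ordering and the bias toward Execute come from the description above. Routing to Skill, Mcp, and Custom is not modeled.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ExecutionMode {
    DirectTool(String),
    Skill(String),
    Mcp(String),
    Custom(String),
    Execute(String), // category name, simplified to a string here
    Converse,
}

/// Hypothetical built-in tools reachable by slash command.
const DIRECT_TOOLS: &[&str] = &["screenshot", "ocr", "search"];

fn decide(
    input: &str,
    context_category: Option<&str>,
    regex_category: impl Fn(&str) -> Option<String>,
    semantic: Option<&dyn Fn(&str) -> Option<ExecutionMode>>,
) -> ExecutionMode {
    let input = input.trim();
    // L0: a slash command naming a built-in tool goes straight to that tool.
    if let Some(command) = input.strip_prefix('/') {
        let name = command.split_whitespace().next().unwrap_or("");
        if DIRECT_TOOLS.iter().any(|tool| *tool == name) {
            return ExecutionMode::DirectTool(name.to_string());
        }
    }
    // L1: explicit task patterns.
    if let Some(category) = regex_category(input) {
        return ExecutionMode::Execute(category);
    }
    // L2: ambient context (e.g. a selected file) implies a category.
    if let Some(category) = context_category {
        return ExecutionMode::Execute(category.to_string());
    }
    // L3: optional semantic analysis may choose Execute or Converse.
    if let Some(mode) = semantic.and_then(|analyze| analyze(input)) {
        return mode;
    }
    // L4: default fallback biased toward action ("General" is an assumed category).
    ExecutionMode::Execute("General".to_string())
}
```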
Context Signals
The L2 layer of ExecutionIntentDecider uses ambient context to inform routing without requiring explicit commands:
```rust
pub struct ContextSignals {
    pub selected_file: Option<String>,  // file selected in UI
    pub active_app: Option<String>,     // current application
    pub ui_mode: Option<String>,        // panel/mode state
    pub clipboard_type: Option<String>, // "image", "text", etc.
}
```
For example, if the user has selected a .jpg file and says "process this", the context layer routes to ImageGeneration without any keyword matching.
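Here is a sketch of the context layer turning a selected file into a category hint. Only the .jpg to ImageGeneration case comes from the example above; the extension list and the category_from_context name are assumptions.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskCategory {
    ImageGeneration,
}

#[derive(Default)]
struct ContextSignals {
    selected_file: Option<String>,
    active_app: Option<String>,
    ui_mode: Option<String>,
    clipboard_type: Option<String>,
}

/// Hypothetical rule: an image file selected in the UI hints at ImageGeneration.
fn category_from_context(signals: &ContextSignals) -> Option<TaskCategory> {
    let file = signals.selected_file.as_deref()?;
    // Compare extensions case-insensitively, so photo.JPG also counts.
    let ext = Path::new(file).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "jpg" | "jpeg" | "png" | "gif" | "webp" => Some(TaskCategory::ImageGeneration),
        _ => None,
    }
}
```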
IntentCache
The IntentCache provides fast-path routing for repeated patterns. It uses an LRU cache with time-based confidence decay and success/failure tracking:
```rust
pub struct IntentCache {
    cache: Arc<RwLock<LruCache<u64, CachedIntent>>>,
    config: CacheConfig,
    metrics: Arc<RwLock<CacheMetrics>>,
}
```
Key behaviors:
- Confidence decay: Entries lose confidence over time through exponential decay with a configurable half-life (default: 1 hour)
- Adaptive learning: record_success() and record_failure() track execution outcomes, adjusting the effective confidence via adjusted_confidence = decayed_confidence * success_rate
- Auto-eviction: Entries with more than 3 failures and zero successes are automatically removed
- Input normalization: Keys are hashes of the input after lowercasing, trimming, and truncating to the first 100 characters, so inputs that differ only in case or surrounding whitespace share an entry
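These rules can be sketched with the standard library alone. The sketch makes three simplifications. It substitutes a plain HashMap for the LRU, so capacity-based eviction is omitted. It takes elapsed time as an explicit argument to stay deterministic. It assumes a success rate of 1.0 before any outcome is recorded. CacheEntry, cache_key, insert, and lookup are hypothetical names. The normalization steps, half-life decay, adjusted-confidence formula, eviction rule, and defaults (3600 s half-life, 0.5 minimum confidence) follow the description.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Cache key: hash of the lowercased, trimmed, first-100-character input.
fn cache_key(input: &str) -> u64 {
    let normalized: String = input.to_lowercase().trim().chars().take(100).collect();
    let mut hasher = DefaultHasher::new();
    normalized.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical entry type (the project's is CachedIntent).
struct CacheEntry {
    category: String,
    confidence: f32,
    successes: u32,
    failures: u32,
}

impl CacheEntry {
    /// Exponential decay: confidence * 0.5^(elapsed / half_life).
    fn decayed_confidence(&self, elapsed_secs: f32, half_life_secs: f32) -> f32 {
        self.confidence * 0.5_f32.powf(elapsed_secs / half_life_secs)
    }

    /// adjusted = decayed * success_rate; success_rate is assumed 1.0 with no history.
    fn adjusted_confidence(&self, elapsed_secs: f32, half_life_secs: f32) -> f32 {
        let total = self.successes + self.failures;
        let success_rate = if total == 0 { 1.0 } else { self.successes as f32 / total as f32 };
        self.decayed_confidence(elapsed_secs, half_life_secs) * success_rate
    }
}

struct IntentCache {
    entries: HashMap<u64, CacheEntry>, // plain map: no LRU capacity limit
    half_life_secs: f32,
    min_confidence: f32,
}

impl IntentCache {
    fn new() -> Self {
        // Defaults from CacheConfig: 1-hour half-life, 0.5 minimum confidence.
        Self { entries: HashMap::new(), half_life_secs: 3600.0, min_confidence: 0.5 }
    }

    fn insert(&mut self, input: &str, category: &str, confidence: f32) {
        let entry = CacheEntry { category: category.to_string(), confidence, successes: 0, failures: 0 };
        self.entries.insert(cache_key(input), entry);
    }

    /// Cached category, if its adjusted confidence is at or above the minimum.
    fn lookup(&self, input: &str, elapsed_secs: f32) -> Option<&str> {
        let entry = self.entries.get(&cache_key(input))?;
        let confidence = entry.adjusted_confidence(elapsed_secs, self.half_life_secs);
        (confidence >= self.min_confidence).then(|| entry.category.as_str())
    }

    fn record_success(&mut self, input: &str) {
        if let Some(entry) = self.entries.get_mut(&cache_key(input)) {
            entry.successes += 1;
        }
    }

    /// Records a failure; evicts entries with more than 3 failures and no successes.
    fn record_failure(&mut self, input: &str) {
        let key = cache_key(input);
        let evict = match self.entries.get_mut(&key) {
            Some(entry) => {
                entry.failures += 1;
                entry.failures > 3 && entry.successes == 0
            }
            None => false,
        };
        if evict {
            self.entries.remove(&key);
        }
    }
}
```

DefaultHasher is fine for an in-memory cache, but its algorithm is not guaranteed to stay the same across Rust releases, so keys produced this way should not be persisted.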
Configuration
```rust
pub struct CacheConfig {
    pub capacity: usize,     // default: 1000
    pub half_life_secs: f32, // default: 3600 (1 hour)
    pub min_confidence: f32, // default: 0.5
    pub enabled: bool,       // default: true
}
```
Confidence Calibration
The ConfidenceCalibrator adjusts raw classification scores using historical execution data. It tracks which routing layers (L1, L2, L3) produce reliable results for specific tool types and applies learned adjustments:
```rust
pub struct CalibratedSignal {
    pub intent_type: String,
    pub tool_name: String,
    pub raw_confidence: f32,
    pub calibrated_confidence: f32,
    pub layer: RoutingLayer,
}
```
The IntentAggregator then combines calibrated signals into a final AggregatedIntent with an action recommendation: Execute, Confirm, Clarify, or GeneralChat.
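The source does not give the calibration formula or the aggregator's cut-offs. The sketch below rests on three assumptions. First, calibration scales the raw score by the observed success rate of each (layer, tool) pair, mirroring the cache's adjusted-confidence rule. Second, aggregation takes the strongest calibrated signal. Third, recommendations use cut-offs of 0.8, 0.6, and 0.4. Every number and helper name here is hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum RoutingLayer {
    L1,
    L2,
    L3,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Execute,
    Confirm,
    Clarify,
    GeneralChat,
}

/// Hypothetical calibrator: per (layer, tool) counts of (successes, attempts).
#[derive(Default)]
struct ConfidenceCalibrator {
    history: HashMap<(RoutingLayer, String), (u32, u32)>,
}

impl ConfidenceCalibrator {
    fn record(&mut self, layer: RoutingLayer, tool: &str, success: bool) {
        let counts = self.history.entry((layer, tool.to_string())).or_insert((0, 0));
        counts.1 += 1;
        if success {
            counts.0 += 1;
        }
    }

    /// Assumed rule: scale by the pair's success rate; unseen pairs pass through unchanged.
    fn calibrate(&self, layer: RoutingLayer, tool: &str, raw: f32) -> f32 {
        match self.history.get(&(layer, tool.to_string())) {
            Some(&(successes, attempts)) if attempts > 0 => {
                raw * successes as f32 / attempts as f32
            }
            _ => raw,
        }
    }
}

/// Hypothetical cut-offs mapping a calibrated score to an action.
fn recommend(score: f32) -> Action {
    if score >= 0.8 {
        Action::Execute
    } else if score >= 0.6 {
        Action::Confirm
    } else if score >= 0.4 {
        Action::Clarify
    } else {
        Action::GeneralChat
    }
}

/// Assumed aggregation: act on the strongest calibrated signal.
fn aggregate(calibrated: &[f32]) -> Action {
    let best = calibrated.iter().copied().fold(0.0_f32, f32::max);
    recommend(best)
}
```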
Classification Flow Summary
```text
User Input
│
├─ length < 3 chars ───────────────────► Conversational
│
├─ check IntentCache ──── hit ─────────► Cached route
│
├─ L1 Regex (<5ms) ──── match ─────────► Executable (1.0)
│
├─ L2 Enhanced Keywords ── match ──────► Executable (0.5-0.95)
│
├─ L2 Static Keywords ── match ────────► Executable (0.85)
│
├─ L3 AI Detector ──── match ──────────► Executable (AI conf.)
│
└─ no match ───────────────────────────► Conversational
```
After classification, the IntentRouter wraps the result for the Agent Loop: slash commands and direct tools become DirectRoute (bypassing LLM thinking entirely), while category-matched and ambiguous inputs become NeedsThinking with optional category hints that guide tool filtering and prompt selection.
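The summary describes a short-circuiting chain, which maps naturally onto a sequence of early returns. In this sketch each layer is a closure supplied by the caller. The Route enum, the Layers struct, and the use of a character count for the length check are assumptions. The order of the checks and the fixed confidences (1.0 for regex, 0.85 for static keywords) follow the summary. The exclusion pre-filter is left out for brevity.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Route {
    Cached(String),
    Executable { category: String, confidence: f32 },
    Conversational,
}

/// Hypothetical bundle of layer callbacks; each returns Some on a match.
struct Layers<'a> {
    cache: &'a dyn Fn(&str) -> Option<String>,
    regex: &'a dyn Fn(&str) -> Option<String>,
    enhanced_keywords: &'a dyn Fn(&str) -> Option<(String, f32)>, // score in 0.5-0.95
    static_keywords: &'a dyn Fn(&str) -> Option<String>,
    ai: Option<&'a dyn Fn(&str) -> Option<(String, f32)>>, // None when AI is disabled
}

fn classify(input: &str, layers: &Layers<'_>) -> Route {
    // Very short inputs are conversational outright (length assumed in characters).
    if input.chars().count() < 3 {
        return Route::Conversational;
    }
    if let Some(category) = (layers.cache)(input) {
        return Route::Cached(category);
    }
    if let Some(category) = (layers.regex)(input) {
        return Route::Executable { category, confidence: 1.0 };
    }
    if let Some((category, confidence)) = (layers.enhanced_keywords)(input) {
        return Route::Executable { category, confidence };
    }
    if let Some(category) = (layers.static_keywords)(input) {
        return Route::Executable { category, confidence: 0.85 };
    }
    if let Some((category, confidence)) = layers.ai.and_then(|ai| ai(input)) {
        return Route::Executable { category, confidence };
    }
    Route::Conversational
}
```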