Generation
Media generation provider abstraction supporting images, video, audio, and speech through a unified trait-based interface with multiple backend providers.
The generation module defines a unified interface for media generation backends. Whether the agent needs an image, a video clip, background music, or text-to-speech, all providers implement the same GenerationProvider trait for consistent interaction.
Design Philosophy
The generation system follows three principles:
- Trait-based abstraction: All backends (DALL-E, Stable Diffusion, ElevenLabs, etc.) implement the same interface
- Type-safe requests: GenerationRequest carries type, prompt, and parameters in a single struct
- Graceful degradation: Optional features (progress checking, cancellation, image editing) have default "not supported" implementations
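The graceful degradation principle can be illustrated with default trait methods. The following is a minimal sketch, not the module's actual code: `SketchProvider`, `SketchError`, `BasicProvider`, and `CancellableProvider` are hypothetical names, and the trait is synchronous to keep the example dependency-free.

```rust
/// Hypothetical error type; the real `GenerationError` may look different.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    UnsupportedFeature(&'static str),
}

/// Simplified, synchronous stand-in for `GenerationProvider` (assumption).
pub trait SketchProvider {
    fn name(&self) -> &str;

    /// Optional feature: providers that cannot cancel inherit this default.
    fn cancel(&self, _job_id: &str) -> Result<(), SketchError> {
        Err(SketchError::UnsupportedFeature("cancel"))
    }
}

/// Relies on the default, so `cancel` reports "not supported".
pub struct BasicProvider;

impl SketchProvider for BasicProvider {
    fn name(&self) -> &str {
        "basic"
    }
}

/// Overrides the default to opt in to cancellation.
pub struct CancellableProvider;

impl SketchProvider for CancellableProvider {
    fn name(&self) -> &str {
        "cancellable"
    }

    fn cancel(&self, _job_id: &str) -> Result<(), SketchError> {
        Ok(())
    }
}
```

Callers can invoke `cancel` on any provider and handle the error case uniformly, rather than checking capabilities up front.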
Core Types
GenerationRequest
pub struct GenerationRequest {
    pub generation_type: GenerationType,
    pub prompt: String,
    pub params: GenerationParams,
    pub request_id: Option<String>,
    pub user_id: Option<String>,
}
Builder pattern:
let request = GenerationRequest::image("A sunset over mountains")
    .with_params(
        GenerationParams::builder()
            .width(1024)
            .height(1024)
            .quality("hd")
            .build()
    )
    .with_request_id("req-001");
GenerationType
pub enum GenerationType {
    Image,
    Video,
    Audio,
    Speech,
    Transcription,
}
GenerationOutput
pub struct GenerationOutput {
    pub generation_type: GenerationType,
    pub data: GenerationData,
    pub metadata: GenerationMetadata,
    pub request_id: Option<String>,
}
Data variants:
- GenerationData::Url(String): Remote URL
- GenerationData::Base64(String): Inline base64 data
- GenerationData::Bytes(Vec<u8>): Raw bytes
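Consumers of a GenerationOutput typically branch on the data variant. The sketch below redeclares the enum with the three variants listed above so it compiles on its own; `describe` is a hypothetical helper, not part of the module.

```rust
/// Mirrors the three variants listed above.
pub enum GenerationData {
    Url(String),
    Base64(String),
    Bytes(Vec<u8>),
}

/// Hypothetical helper: a short, log-friendly summary of a payload.
pub fn describe(data: &GenerationData) -> String {
    match data {
        GenerationData::Url(url) => format!("remote URL: {url}"),
        // Base64 text is ASCII, so the byte length equals the character count.
        GenerationData::Base64(encoded) => {
            format!("inline base64 ({} encoded bytes)", encoded.len())
        }
        GenerationData::Bytes(bytes) => format!("raw bytes ({} bytes)", bytes.len()),
    }
}
```

Because the `match` is exhaustive, adding a fourth variant later would produce a compile error at every call site that needs updating.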
GenerationProvider Trait
pub trait GenerationProvider: Send + Sync {
    fn generate(
        &self,
        request: GenerationRequest,
    ) -> Pin<Box<dyn Future<Output = GenerationResult<GenerationOutput>> + Send + '_>>;
    fn name(&self) -> &str;
    fn supported_types(&self) -> Vec<GenerationType>;
    fn supports(&self, gen_type: GenerationType) -> bool;
    fn color(&self) -> &str; // Brand color for UI
    fn default_model(&self) -> Option<&str>;
    // Optional features with default "not supported" implementations
    fn check_progress(
        &self,
        _job_id: &str,
    ) -> Pin<Box<dyn Future<... >> { /* default: Err(UnsupportedFeature) */ }
    fn cancel(
        &self,
        _job_id: &str,
    ) -> Pin<Box<dyn Future<... >> { /* default: Err(UnsupportedFeature) */ }
    fn edit_image(
        &self,
        _request: GenerationRequest,
    ) -> Pin<Box<dyn Future<... >> { /* default: Err(UnsupportedFeature) */ }
    fn list_voices(&self) -> Vec<VoiceInfo> { vec![] }
}
Thread safety: The trait extends Send + Sync so providers can be stored in Arc<dyn GenerationProvider> and shared across async tasks.
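The thread-safety point can be demonstrated with a reduced example. This sketch uses OS threads from the standard library as a stand-in for async tasks, and `SketchProvider`, `StaticProvider`, and `names_seen_by_workers` are hypothetical names introduced only for illustration.

```rust
use std::sync::Arc;
use std::thread;

/// Simplified stand-in for `GenerationProvider`; only the
/// `Send + Sync` supertrait bound matters for this example.
pub trait SketchProvider: Send + Sync {
    fn name(&self) -> &str;
}

pub struct StaticProvider(pub &'static str);

impl SketchProvider for StaticProvider {
    fn name(&self) -> &str {
        self.0
    }
}

/// Clones the `Arc` into several threads and collects what each one observed.
pub fn names_seen_by_workers(provider: Arc<dyn SketchProvider>, workers: usize) -> Vec<String> {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each worker gets its own reference-counted handle.
            let provider = Arc::clone(&provider);
            thread::spawn(move || provider.name().to_string())
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

Without the `Send + Sync` supertraits, `Arc<dyn SketchProvider>` could not be moved into `thread::spawn`, and the same restriction would apply to spawning tasks on a multi-threaded async runtime.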
Provider Registry
pub struct GenerationProviderRegistry {
    providers: HashMap<String, Arc<dyn GenerationProvider>>,
}
impl GenerationProviderRegistry {
    pub fn register(
        &mut self,
        provider: Arc<dyn GenerationProvider>,
    ) { /* ... */ }
    pub fn get(
        &self,
        name: &str,
    ) -> Option<Arc<dyn GenerationProvider>> { /* ... */ }
    pub fn names_for_type(
        &self,
        gen_type: GenerationType,
    ) -> Vec<String> { /* ... */ }
}
Determinism: names() and names_for_type() return sorted vectors.
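One plausible way to get deterministic output from a HashMap-backed registry is to sort the collected names before returning them. The sketch below assumes that approach; the elided method bodies above may do something different, and `SketchProvider`, `SketchRegistry`, and `FixedProvider` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GenerationType {
    Image,
    Video,
    Audio,
    Speech,
    Transcription,
}

/// Simplified stand-in for `GenerationProvider` (assumption).
pub trait SketchProvider: Send + Sync {
    fn name(&self) -> &str;
    fn supported_types(&self) -> Vec<GenerationType>;
}

#[derive(Default)]
pub struct SketchRegistry {
    providers: HashMap<String, Arc<dyn SketchProvider>>,
}

impl SketchRegistry {
    /// Keys the provider by its own name; a later registration replaces an earlier one.
    pub fn register(&mut self, provider: Arc<dyn SketchProvider>) {
        self.providers.insert(provider.name().to_string(), provider);
    }

    pub fn get(&self, name: &str) -> Option<Arc<dyn SketchProvider>> {
        self.providers.get(name).cloned()
    }

    /// HashMap iteration order is unspecified, so sort before returning.
    pub fn names_for_type(&self, gen_type: GenerationType) -> Vec<String> {
        let mut names: Vec<String> = self
            .providers
            .values()
            .filter(|p| p.supported_types().contains(&gen_type))
            .map(|p| p.name().to_string())
            .collect();
        names.sort();
        names
    }
}

/// Test double with a fixed name and capability list.
pub struct FixedProvider {
    pub name: &'static str,
    pub types: Vec<GenerationType>,
}

impl SketchProvider for FixedProvider {
    fn name(&self) -> &str {
        self.name
    }

    fn supported_types(&self) -> Vec<GenerationType> {
        self.types.clone()
    }
}
```

Sorting on read keeps registration cheap; a BTreeMap would be an alternative if ordered iteration were needed everywhere.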
Supported Providers
| Provider | Image | Video | Audio | Speech |
|---|---|---|---|---|
| DALL-E | ✅ | — | — | — |
| Stable Diffusion | ✅ | — | — | — |
| Midjourney | ✅ | — | — | — |
| Runway | — | ✅ | — | — |
| Pika | — | ✅ | — | — |
| Sora | — | ✅ | — | — |
| Suno | — | — | ✅ | — |
| Udio | — | — | ✅ | — |
| ElevenLabs | — | — | — | ✅ |
| OpenAI TTS | — | — | — | ✅ |
Response Parsing
The module includes a parser for extracting generation requests from agent responses:
pub fn parse_generation_requests(
    text: &str,
) -> ParseResult<Vec<ParsedGenerationRequest>>
This allows the agent to express generation intents in natural language, which are then parsed into structured GenerationRequest objects.
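The grammar that the parser accepts is not shown here, so the following is only an illustration of the general idea. It assumes a hypothetical `[generate:<type>] <prompt>` line marker and uses plain string methods instead of regex patterns; `ParsedSketch` and `parse_sketch` are invented names, and the real parser may work quite differently.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum GenerationType {
    Image,
    Video,
    Audio,
    Speech,
    Transcription,
}

/// Hypothetical reduced form of `ParsedGenerationRequest`.
#[derive(Debug, PartialEq)]
pub struct ParsedSketch {
    pub generation_type: GenerationType,
    pub prompt: String,
}

/// Scans each line for the assumed `[generate:<type>] <prompt>` marker.
/// Lines without a marker, with an unknown type, or with an empty prompt are skipped.
pub fn parse_sketch(text: &str) -> Vec<ParsedSketch> {
    text.lines()
        .filter_map(|line| {
            let rest = line.trim().strip_prefix("[generate:")?;
            let (kind, prompt) = rest.split_once(']')?;
            let generation_type = match kind.trim().to_ascii_lowercase().as_str() {
                "image" => GenerationType::Image,
                "video" => GenerationType::Video,
                "audio" => GenerationType::Audio,
                "speech" => GenerationType::Speech,
                "transcription" => GenerationType::Transcription,
                _ => return None,
            };
            let prompt = prompt.trim();
            if prompt.is_empty() {
                return None;
            }
            Some(ParsedSketch {
                generation_type,
                prompt: prompt.to_string(),
            })
        })
        .collect()
}
```

Skipping malformed lines rather than failing the whole parse is one design option; returning an error per bad line, as a `ParseResult` type suggests, is another.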
Mock Provider
For testing:
let provider = MockGenerationProvider::new("mock")
    .with_color("#10a37f")
    .with_types(vec![GenerationType::Image]);
let request = GenerationRequest::image("test");
let output = provider.generate(request).await.unwrap();
Safety Properties
- No .expect() on user paths: MidjourneyProvider::new() returns a Result instead of panicking
- Division by zero guarded: determine_aspect_ratio() checks h == 0
- No silent truncation: u32::try_from(s).ok() for seed conversion (returns None for out-of-range values)
- No lock issues: Module has no shared mutable state
- No static mut: Uses LazyLock for regex patterns
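Two of the properties above, the division-by-zero guard and the checked seed conversion, can be sketched as follows. The signatures are assumptions: the real `determine_aspect_ratio()` may take different arguments or return a different type, and the source type of the seed is assumed to be `i64`. `seed_to_u32` is a hypothetical helper name.

```rust
/// Assumed signature: returns `None` when the height is zero
/// instead of dividing by zero.
pub fn determine_aspect_ratio(w: u32, h: u32) -> Option<f64> {
    if h == 0 {
        return None;
    }
    Some(f64::from(w) / f64::from(h))
}

/// Converts a wider seed into `u32`. Out-of-range values (negative or
/// above `u32::MAX`) yield `None` rather than being truncated, which is
/// what an `as u32` cast would do silently.
pub fn seed_to_u32(s: i64) -> Option<u32> {
    u32::try_from(s).ok()
}
```

Returning `Option` in both cases pushes the decision about how to handle bad input to the caller, which fits the no-panic goal listed above.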
Code Location
- src/generation/mod.rs: GenerationProvider trait and mock provider
- src/generation/types.rs: Request/output types and builder
- src/generation/error.rs: GenerationError typed errors
- src/generation/registry.rs: Provider registry
- src/generation/response_parser.rs: Natural language request parsing
- src/generation/providers/: Provider implementations
See Also
- Builtin Tools — How generation tools are exposed to the agent
- Media Processing — How generated media is processed