SIARE System Architecture
Version: 2.0 (Consolidated) Status: Production Reference
This document consolidates the complete system architecture for SIARE (Self-Improving Agentic RAG Engine).
Open-Core Architecture: SIARE follows an open-core model:
- siare (this package): MIT-licensed core with evolution engine, gene pool, QD grid, execution, evaluation
- siare-cloud (enterprise): Proprietary features including auth, billing, audit, approval workflows
Sections marked with 🔒 describe enterprise features not included in this open-source package.
1. Overview & Design Principles
SIARE treats RAG pipeline configuration as a searchable space and uses AI-driven evolution to discover optimal strategies.
Core Principles
- Config over code: Pipelines defined as data (SOP configs), not hard-coded flows
- Separation of concerns: Execution, Evaluation, Optimization, Domain adapters
- Multi-objective by default: Accuracy, safety, cost, latency are explicit metrics
- Human-in-the-loop: Humans approve which SOPs are promoted and deployed
- Evolution as first-class: Built to iterate on its own configurations
- Quality-Diversity: Maintain diverse, high-performing strategies (MAP-Elites)
Data Flow
ProcessConfig (SOP) → ExecutionEngine → EvaluationVector → Director → Mutated SOP → GenePool
2. Core Components
2.1 Config Store
Persists and versions all configuration artifacts:
- ProcessConfig (SOPs): Multi-agent pipeline definitions
- PromptGenome: Evolvable prompts per role
- MetaConfig: Director/judge prompts
- MetricConfig: Evaluation metrics
- ToolConfig: Tool adapter configurations
2.2 Execution Engine (Guild Runner)
Builds and runs multi-agent DAGs:
- Nodes = RoleConfigs bound with RolePrompts
- Edges = GraphEdge entries with optional conditions
- Conditional execution via safe eval (regex-validated, keyword blacklist); see the sketch below
- Emits execution traces with node inputs/outputs, timings, tokens
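For illustration, a minimal sketch of how a conditional edge check might be guarded with an allowlist regex plus a keyword blacklist. The function names and the exact allowed grammar here are assumptions, not the internal SIARE validator:

```python
import re

# Illustrative guard for edge conditions; the shipped validator may differ.
_ALLOWED = re.compile(r"^[\w\s<>=!.()'\[\]+-]*$")          # permit simple comparisons only
_BLACKLIST = {"import", "exec", "eval", "__", "open", "os", "sys", "lambda"}

def is_safe_condition(expr: str) -> bool:
    """Reject expressions that fall outside a narrow comparison grammar."""
    if not _ALLOWED.match(expr):
        return False
    return not any(token in expr for token in _BLACKLIST)

def evaluate_condition(expr: str, context: dict) -> bool:
    """Evaluate a validated condition against node outputs, with no builtins exposed."""
    if not is_safe_condition(expr):
        raise ValueError(f"Unsafe edge condition: {expr!r}")
    return bool(eval(expr, {"__builtins__": {}}, context))

# Example: route to a reviewer role only when confidence is low
print(evaluate_condition("confidence < 0.7", {"confidence": 0.55}))  # True
```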
2.3 Evaluation Service
Runs metrics over execution traces:
- LLM Judge: Model-based evaluation with textual critiques
- Programmatic: Custom functions (accuracy, compliance checks)
- Runtime: Latency, tokens, cost derived from traces
- Human: Slots for expert ratings
Statistical Aggregation:
- Bootstrap confidence intervals (95% CI)
- Benjamini-Hochberg correction for multiple comparisons
- Configurable per metric: mean, median, min, max, p95
- Task weighting support
2.4 Director Service
AI brain that evolves SOPs through diagnosis and mutation:
Diagnostician:
- Analyzes EvaluationVector scores + artifacts
- Produces Diagnosis with weaknesses, strengths, root causes
Architect:
- Generates SOPMutations based on diagnosis
- Validates constraints before applying mutations
- Supports 6 mutation types
2.5 Gene Pool
Stores SOP versions and performance:
- Ancestry tracking (parent → child lineage)
- Semantic versioning (patch/minor/major)
- Aggregated metrics per SOP
- QD grid integration
2.6 QD Grid Manager (MAP-Elites)
Maintains diverse population of high-performing SOPs:
- Discretizes feature space into cells
- Maintains highest-quality SOP per occupied cell
- Enables diversity-aware sampling
2.7 Evolution Scheduler
Orchestrates the evolution loop:
- Manages job lifecycle and phase transitions
- Selects parents using configurable strategies
- Coordinates execution → evaluation → mutation cycle
- Tracks budget and convergence
3. Evolution Engine
3.1 Evolution Loop
Generation N
│
▼
1. Parent Selection (from QD Grid + Pareto)
│
▼
2. Mutation (Director generates SOP variants)
│
▼
3. Evaluation (Run on task set, compute metrics)
│
▼
4. Gene Pool Update (Add to pool, update QD Grid & Pareto)
│
▼
5. Termination Check (Budget, convergence, max generations)
│
▼
Next Generation
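A compact sketch of one pass through this loop, assuming simplified interfaces for the services described above (the method names are illustrative, not the exact SIARE API):

```python
def run_generation(scheduler, director, engine, evaluator, gene_pool, qd_grid, tasks):
    """One generation: select -> mutate -> evaluate -> update -> check termination."""
    # 1. Parent selection from the QD grid and Pareto frontier
    parents = scheduler.select_parents(qd_grid, gene_pool.pareto_frontier())

    # 2. Director proposes mutated SOP variants
    children = [director.mutate_sop(parent) for parent in parents]

    # 3. Execute and evaluate each variant on the task set
    results = []
    for sop in children:
        traces = [engine.run(sop, task) for task in tasks]
        results.append((sop, evaluator.evaluate(traces)))

    # 4. Update gene pool, QD grid, and Pareto frontier
    for sop, evaluation in results:
        gene_pool.add(sop, evaluation)
        qd_grid.insert(sop, evaluation)

    # 5. Budget / convergence check decides whether another generation runs
    return scheduler.should_continue(gene_pool)
```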
3.2 Selection Strategies (8 Implemented)
| Strategy | Description |
|---|---|
| Tournament | k-way tournament selection |
| Roulette Wheel | Probability proportional to fitness |
| Rank-Based | Linear ranking with selection pressure |
| Elitist | Always include top performers |
| Fitness Proportionate | Selection proportional to fitness |
| Truncation | Select top k% only |
| Stochastic Universal Sampling | Evenly spaced selection points |
| Lexicase | Filter by random metric order |
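As an illustration of the first strategy in the table, a generic k-way tournament selection might look like the following; the fitness accessor is an assumption, not the gene-pool interface:

```python
import random

def tournament_select(population, fitness, k=3, num_parents=2):
    """Pick each parent as the fittest of k randomly drawn candidates."""
    parents = []
    for _ in range(num_parents):
        candidates = random.sample(population, k)
        parents.append(max(candidates, key=fitness))
    return parents

# Usage sketch: select 2 parents from SOPs scored by aggregate accuracy
# parents = tournament_select(sop_genes, fitness=lambda g: g.metrics["accuracy"], k=3)
```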
3.3 Mutation Types (6 Implemented)
| Type | Scope | Description |
|---|---|---|
| PROMPT_CHANGE | Prompts | Modify role prompts |
| PARAM_TWEAK | Parameters | Adjust hyperparameters (temperature, max_tokens) |
| ADD_ROLE | Topology | Add new specialist role |
| REMOVE_ROLE | Topology | Remove underperforming role |
| REWIRE_GRAPH | Topology | Change edge connections |
| CROSSOVER | Structure | Combine two SOPs (stubbed) |
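To make the idea concrete, here is a hypothetical PROMPT_CHANGE proposal carrying a rationale plus a targeted edit; the field names are illustrative assumptions, not the SOPMutation schema:

```python
# Hypothetical shape of a Director mutation proposal (illustrative only)
mutation = {
    "type": "prompt_change",
    "targetRole": "retriever",
    "rationale": "Diagnosis: retrieved passages often miss the clause cited in the question.",
    "change": {
        "promptSection": "instructions",
        "newText": "Prioritize passages that quote the specific clause or section number in the query.",
    },
}
```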
3.4 Multi-Phase Evolution
phases = [
{
"name": "Phase 1: Prompt Refinement",
"allowedMutationTypes": ["prompt_change", "param_tweak"],
"maxGenerations": 20
},
{
"name": "Phase 2: Structural Exploration",
"allowedMutationTypes": ["add_role", "rewire_graph", "prompt_change"],
"maxGenerations": 30
},
{
"name": "Phase 3: Fine-Tuning",
"allowedMutationTypes": ["param_tweak", "prompt_change"],
"maxGenerations": 10
}
]
4. Quality-Diversity Grid
4.1 Feature Space
Each SOP characterized by:
Complexity (normalized 0-1):
complexity = 0.4 * numRoles + 0.3 * avgChainDepth + 0.2 * numEdges + 0.1 * avgPromptLength
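Since the raw counts are unbounded, each term is presumably normalized before weighting so the score stays in [0, 1]; a sketch under that assumption (the caps are illustrative):

```python
def complexity_score(num_roles, avg_chain_depth, num_edges, avg_prompt_length,
                     max_roles=10, max_depth=8, max_edges=20, max_prompt_len=4000):
    """Weighted sum of normalized structural features, clipped to [0, 1]."""
    norm = lambda value, cap: min(value / cap, 1.0)
    return (0.4 * norm(num_roles, max_roles)
            + 0.3 * norm(avg_chain_depth, max_depth)
            + 0.2 * norm(num_edges, max_edges)
            + 0.1 * norm(avg_prompt_length, max_prompt_len))

# Example: 4 roles, depth 3, 5 edges, ~1200-character prompts -> ~0.35
print(round(complexity_score(4, 3, 5, 1200), 2))
```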
Diversity Embedding (384-dim):
- Sentence embedding of concatenated prompts
- Captures semantic diversity
Domain Features (configurable):
- Domain-specific metrics as features
- Example: ethics_focus, regulatory_coverage, safety_emphasis
4.2 Grid Operations
- Cell Assignment: Map SOP features to grid cell
- Elite Replacement: Higher quality replaces current elite
- Sampling: Uniform, curiosity-driven, or quality-weighted
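A minimal MAP-Elites insertion sketch, assuming a dict-backed grid keyed by discretized feature bins (illustrative, not the actual grid manager):

```python
class SimpleQDGrid:
    """Keep only the best SOP per cell of a discretized feature space."""

    def __init__(self, bins_per_dim=10):
        self.bins = bins_per_dim
        self.cells = {}  # cell key -> (quality, sop)

    def cell_key(self, features):
        # Features are assumed pre-normalized to [0, 1]
        return tuple(min(int(f * self.bins), self.bins - 1) for f in features)

    def insert(self, sop, features, quality):
        key = self.cell_key(features)
        incumbent = self.cells.get(key)
        if incumbent is None or quality > incumbent[0]:
            self.cells[key] = (quality, sop)   # elite replacement
            return True
        return False

# grid = SimpleQDGrid(); grid.insert(sop, features=[0.35, 0.7], quality=0.82)
```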
4.3 Pareto Frontier
Computes non-dominated SOPs across metrics:
- Incremental update for efficiency
- Periodic full recomputation
- Integration with QD grid for selection
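A sketch of the non-dominated filter, assuming all metrics are maximized (cost and latency would be negated first):

```python
def dominates(a: dict, b: dict, metrics: list[str]) -> bool:
    """a dominates b if it is no worse on every metric and strictly better on at least one."""
    no_worse = all(a[m] >= b[m] for m in metrics)
    strictly_better = any(a[m] > b[m] for m in metrics)
    return no_worse and strictly_better

def pareto_frontier(candidates: list[dict], metrics: list[str]) -> list[dict]:
    """Keep candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c, metrics) for other in candidates if other is not c)]

# Example with accuracy maximized and cost negated so "higher is better" holds for both
sops = [{"accuracy": 0.90, "cost": -0.02}, {"accuracy": 0.85, "cost": -0.01}, {"accuracy": 0.80, "cost": -0.03}]
print(pareto_frontier(sops, ["accuracy", "cost"]))  # first two survive
```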
5. Metric Aggregation & Statistics
5.1 Aggregation Methods
| Method | Use Case |
|---|---|
| mean | Average performance (default) |
| median | Robust to outliers |
| min | Worst-case performance |
| max | Best-case performance |
| p95 | 95th-percentile (tail) performance |
5.2 Statistical Rigor
Bootstrap Confidence Intervals:
# 10,000 bootstrap samples
# 95% confidence interval (2.5th, 97.5th percentile)
ci_low, ci_high = bootstrap_ci(scores, n_bootstrap=10000)
Multiple Comparison Correction:
# Benjamini-Hochberg procedure for FDR control
adjusted_p_values = benjamini_hochberg(raw_p_values)
Task Weighting:
- Tasks with a weight field contribute proportionally
- Higher weight = more influence on the aggregate
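The bootstrap_ci and benjamini_hochberg calls above are standard statistical routines; a minimal sketch of what they might compute, assuming NumPy is available (the shipped implementations may differ in detail):

```python
import numpy as np

def bootstrap_ci(scores, n_bootstrap=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of a list of scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores)
    means = rng.choice(scores, size=(n_bootstrap, len(scores)), replace=True).mean(axis=1)
    return np.quantile(means, alpha / 2), np.quantile(means, 1 - alpha / 2)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values for FDR control."""
    p = np.asarray(p_values, dtype=float)
    order = np.argsort(p)
    m = len(p)
    adjusted = np.empty(m)
    running_min = 1.0
    for rank_idx in range(m - 1, -1, -1):      # walk from the largest p-value down
        i = order[rank_idx]
        running_min = min(running_min, p[i] * m / (rank_idx + 1))
        adjusted[i] = running_min
    return adjusted
```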
6. Error Handling & Fault Tolerance
6.1 Implemented Components
| Component | Purpose |
|---|---|
| CircuitBreaker | 3-state pattern (CLOSED, OPEN, HALF_OPEN) |
| RetryHandler | Exponential backoff with jitter |
| CheckpointManager | Job checkpoint save/restore |
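A minimal sketch of the retry pattern named in the table, exponential backoff with jitter; the callable and its exception handling are placeholders, and the shipped RetryHandler may expose a different interface:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=3, base_delay=1.0, max_delay=30.0):
    """Call fn, retrying on failure with exponentially growing, jittered delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise                       # out of retry budget: surface the error
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            time.sleep(delay * random.uniform(0.5, 1.5))   # jitter to avoid thundering herd

# Usage sketch: result = retry_with_backoff(lambda: engine.run(sop, task), max_attempts=3)
```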
6.2 Transaction Semantics
- SOPGene insertion: Atomic via database transaction
- EvaluationVector writes: Batch-wise with all-or-nothing
- Failed runs stored with status: "failed"
- Gene Pool write failures: Rollback entire generation
6.3 Recovery Strategy
| Failure Type | Response |
|---|---|
| Execution failure | Retry 3x with exponential backoff |
| Evaluation failure | Mark run as failed, continue batch |
| Director failure | Halt generation, preserve state, alert |
| Gene Pool write failure | Rollback entire generation |
7. Safety & Governance 🔒
Enterprise Feature: Full safety governance is available in siare-cloud. The open-source core includes basic safety validation.
7.1 Deployment Gates
Multi-Stage Approval:
- Auto-validation (schema checks, safety thresholds)
- Domain expert review (performance, risk)
- Production approval (authorized stakeholders)
7.2 Safety Validator
- Toxicity detection
- Bias detection
- PII detection
- Constraint validation (mustNotChange)
7.3 Emergency Controls
- Kill Switch: Emergency SOP deactivation
- Auto-Rollback: Revert to last known good version
- Runtime Monitoring: Continuous safety metric tracking
8. Governance & Deployment 🔒
Enterprise Feature: Approval workflows, deployment services, and auth are available in siare-cloud.
8.1 Approval Workflow Engine
Multi-stage approval for SOP deployments:
Deployment Request
│
▼
Stage 1: Auto-Validation (safety checks, schema)
│
▼
Stage 2: Domain Expert Review
│
▼
Stage 3: Production Approval
│
▼
Deployed
Approval Types:
- AUTO: Automated validation (safety thresholds, schema checks)
- SINGLE: One approver from required roles
- MULTI: Multiple approvers (quorum-based)
- UNANIMOUS: All required approvers must approve
Key Features:
- Configurable approval chains per environment
- Timeout and escalation policies
- Conditional approvals with comments
- Approval history and audit trail
8.2 Deployment Service
Manages SOP lifecycle across environments:
Deployment States:
CREATED → PENDING → VALIDATING → APPROVED → DEPLOYING → DEPLOYED → ACTIVE
                        ↓            ↓
                    REJECTED      DENIED
Environment Support:
- development: Unrestricted deployment
- staging: Domain expert approval required
- production: Full approval workflow + safety validation
Version Management:
- Immutable deployment versions
- Rollback to previous versions
- A/B deployment support (planned)
8.3 Authentication & Authorization
Authentication Service:
- API key authentication (SIARE_API_KEY environment variable)
- JWT token support (configurable)
- OAuth/SAML integration (planned)
- Backward compatible demo mode
Authorization Service:
- Role-based access control (RBAC)
- Scope-based permissions (domain, resource ID)
- Conditional permissions (time, MFA)
- Role inheritance
Predefined Roles:
| Role | Capabilities |
|---|---|
| admin | Full system access |
| domain_owner | Full access to assigned domains |
| researcher | Run evolution, no deployment |
| viewer | Read-only access |
8.4 Prompt Evolution Framework
Three optimization strategies for automatic prompt improvement:
TextGrad Strategy:
- Textual gradient descent with LLM-generated critiques
- Analyzes failure patterns and generates targeted improvements
- Best for: Iterative refinement of specific issues
EvoPrompt Strategy:
- Evolutionary algorithms (GA/DE) for prompt populations
- Maintains population diversity through crossover and mutation
- Best for: Exploring diverse prompt variations
MetaPrompt Strategy:
- LLM meta-analysis for targeted improvements
- High-level structural changes based on failure analysis
- Best for: Significant prompt restructuring
Adaptive Strategy Selector:
- Auto-selects optimal strategy based on failure patterns
- Considers: failure type, iteration count, improvement rate
- Falls back to ensemble approach for complex failures
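A sketch of how such a selector could route failures to a strategy, under assumed failure-type labels and thresholds (illustrative decision logic only, not the actual selector):

```python
def select_strategy(failure_type: str, iteration: int, improvement_rate: float) -> str:
    """Map observed failure characteristics to a prompt-optimization strategy."""
    if improvement_rate < 0.01 and iteration > 5:
        return "metaprompt"    # stalled progress: attempt structural rewrites
    if failure_type in {"format_error", "missing_constraint"}:
        return "textgrad"      # localized issues: targeted critique-driven edits
    if failure_type == "low_diversity":
        return "evoprompt"     # explore a wider population of prompt variants
    return "ensemble"          # complex or mixed failures fall back to all strategies
```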
Key Components:
- PromptParser: Section-based prompt parsing (markdown/LLM)
- FeedbackExtractor: Extract actionable feedback from critiques
- SectionMutator: Apply targeted mutations to prompt sections
- ConstraintValidator: Ensure mutations respect mustNotChange constraints
9. Observability & Operations
Note: Audit Service (9.1) and Safety Monitor (9.2) are enterprise features 🔒. Checkpoint Manager (9.3) is included in the open-source core.
9.1 Audit Service 🔒
Comprehensive logging of system operations:
Audit Events:
- Authentication events (login, logout, token refresh)
- Authorization decisions (permitted/denied)
- SOP modifications (create, update, deploy)
- Evolution job lifecycle events
- Approval workflow events
Event Classification:
- SECURITY: Authentication, authorization, access
- OPERATION: CRUD operations, job management
- SYSTEM: Health, errors, recovery
9.2 Safety Monitor 🔒
Runtime safety metric tracking:
Monitored Metrics:
- Response toxicity scores
- Bias detection results
- PII exposure risk
- Safety threshold violations
Alert Conditions:
- Metric degradation (>10% below baseline)
- Threshold violation (configurable)
- Anomaly detection (statistical)
9.3 Checkpoint Manager
Job checkpointing for fault tolerance:
Checkpoint Contents:
- Current generation state
- Gene pool snapshot
- QD grid state
- Evolution job configuration
Recovery Semantics:
- Resume from last checkpoint
- Partial generation recovery
- Automatic checkpoint on phase completion
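A minimal sketch of saving and resuming such a checkpoint, assuming JSON-serializable state (the actual CheckpointManager persists richer objects):

```python
import json
from pathlib import Path

def save_checkpoint(path: str, generation: int, gene_pool_state: dict,
                    qd_grid_state: dict, job_config: dict) -> None:
    """Write the evolution job state so it can be resumed after a failure."""
    payload = {
        "generation": generation,
        "gene_pool": gene_pool_state,
        "qd_grid": qd_grid_state,
        "job_config": job_config,
    }
    Path(path).write_text(json.dumps(payload))

def load_checkpoint(path: str) -> dict | None:
    """Return the last saved state, or None if no checkpoint exists."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else None
```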
10. API Reference Summary 🔒
Enterprise Feature: The REST API server is available in siare-cloud. The open-source core provides CLI (siare init, siare evolve, siare run) and Python library interfaces.
10.1 Core Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /v1/run | POST | Execute single SOP |
| /v1/run/batch | POST | Batch execution |
| /v1/jobs | POST | Start evolution job |
| /v1/jobs/{id} | GET | Job status |
| /v1/jobs/{id}/stream | WebSocket | Real-time updates |
| /v1/gene-pool | GET | Query SOPs |
| /v1/frontier | GET | Pareto-optimal SOPs |
| /v1/qd-grid | GET | QD grid data |
10.2 Configuration Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /v1/sops | GET/POST | SOP management |
| /v1/prompts | GET/POST | PromptGenome management |
| /v1/metrics | GET/POST | MetricConfig management |
| /v1/tools | GET/POST | ToolConfig management |
10.3 Governance Endpoints 🔒
| Endpoint | Method | Description |
|---|---|---|
| /v1/deployments | POST | Initiate promotion |
| /v1/deployments/{id}/approve | POST | Approve stage |
| /v1/deployments/{id}/rollback | POST | Rollback |
| /v1/kill-switch/{sopId} | POST | Emergency stop |
10.4 Authentication Endpoints 🔒
| Endpoint | Method | Description |
|---|---|---|
| /v1/auth/validate | GET | Validate API key |
| /v1/auth/me | GET | Current user info |
10.5 Prompt Evolution Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /v1/prompts/{id}/optimize | POST | Trigger prompt optimization |
| /v1/prompts/{id}/history | GET | Prompt evolution history |
11. Data Models Reference
See docs/architecture/DATA_MODELS.md for complete Pydantic model definitions.
Key Models
- ProcessConfig: SOP definition with roles, graph, tools
- RoleConfig: Individual agent configuration
- GraphEdge: DAG edge with optional condition
- PromptGenome: Evolvable prompts per role
- MetricConfig: Evaluation metric definition
- EvaluationVector: Scores + artifacts from evaluation
- SOPGene: Gene pool entry with ancestry
- Diagnosis: Director analysis output
- SOPMutation: Mutation proposal with rationale
12. Extension Points
Adding a Tool Adapter
from siare.adapters.base import ToolAdapter, register_adapter

@register_adapter("my_tool")
class MyToolAdapter(ToolAdapter):
    def initialize(self) -> None:
        # Acquire clients/connections once before first use
        self.is_initialized = True

    def execute(self, inputs: dict) -> dict:
        # Perform the tool call and return its outputs
        return {"result": "..."}

    def validate_inputs(self, inputs: dict) -> list[str]:
        # Return a list of validation error messages (empty list = valid)
        return []
Adding a Metric
def my_metric(trace: ExecutionTrace, task_data: dict) -> float:
    return 0.85  # Score between 0 and 1

evaluation_service.register_metric_function("my_metric", my_metric)
Adding a Mutation Type
- Add to MutationType enum in siare/core/models.py
- Implement _apply_<type>_mutation() in siare/services/director.py
- Add to mutation dispatch in mutate_sop()
- Write tests
Last Updated: 2025-12-17