Implementing Domain Use Cases in SIARE
This guide explains how to implement domain-specific use cases in SIARE, using the clinical trial design use case as a concrete example. It provides a general pattern that can be applied to any domain requiring autonomous, self-improving multi-agent RAG systems.
Table of Contents
- Understanding SIARE’s Architecture
- The Clinical Trials Use Case
- General Pattern for Use Case Implementation
- Step-by-Step Implementation Guide
- Best Practices
- Example Use Cases
Understanding SIARE’s Architecture
SIARE (Self-Improving Agentic RAG Engine) consists of two main loops:
Inner Loop: Multi-Agent Execution
Task Input
│
↓
ProcessConfig (SOP)
│
├─→ Agent 1 (uses tools, produces outputs)
├─→ Agent 2 (uses outputs from Agent 1)
├─→ Agent 3 (parallel with Agent 2)
│ │
│ ↓
└─→ Agent 4 (synthesizes all outputs)
│
↓
Final Output
Key components:
- ProcessConfig (SOP): Defines agent roles, tools, and collaboration graph
- Roles: Individual agents with specific responsibilities
- Tools: Adapters for external knowledge (vector DB, SQL, web search, APIs)
- Graph: DAG structure with optional conditional edges
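To make these pieces concrete, here is a minimal sketch of a two-role ProcessConfig. It assumes RoleConfig, GraphEdge, and ProcessConfig are importable from siare.core.models, like the other models used later in this guide; the tool and prompt IDs are placeholders.

from siare.core.models import GraphEdge, ProcessConfig, RoleConfig

# Minimal two-role SOP: a retriever feeds a synthesizer.
# "doc_search" and the prompt IDs are placeholder names for this sketch.
minimal_sop = ProcessConfig(
    id="minimal_sop",
    version="1.0.0",
    description="Retrieve evidence, then synthesize an answer",
    models={"default": "gpt-5-mini"},
    tools=["doc_search"],
    roles=[
        RoleConfig(
            id="retriever",
            model="gpt-5-mini",
            tools=["doc_search"],
            promptRef="retriever_prompt",
            inputs=[{"from": "user_input"}],
            outputs=["evidence"],
        ),
        RoleConfig(
            id="synthesizer",
            model="gpt-5",
            tools=[],
            promptRef="synthesizer_prompt",
            inputs=[{"from": "retriever"}],
            outputs=["answer"],
        ),
    ],
    graph=[
        GraphEdge(from_="user_input", to="retriever"),
        GraphEdge(from_="retriever", to="synthesizer"),
    ],
)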
Outer Loop: Self-Evolution
Evaluation Results (multi-dimensional metrics)
│
↓
Director Service (AI-powered)
│
├─→ Diagnostician: Analyzes performance, identifies weaknesses
├─→ Architect: Proposes mutations targeting gaps
└─→ Validator: Ensures constraints are met
│
↓
Mutated SOPs
│
↓
Re-evaluate on task set
│
↓
Gene Pool (stores all versions with ancestry)
│
↓
QD Grid (maintains diversity)
│
↓
Pareto Frontier (multi-objective optimization)
│
↓
Human Decision Maker (selects optimal configuration)
Key components:
- Director: AI agent that evolves SOPs (mutations, diagnosis, constraints)
- Gene Pool: Version control for SOPs with performance history
- QD Grid: Quality-Diversity grid for maintaining diverse solutions
- Metrics: Multi-dimensional evaluation (accuracy, cost, latency, domain-specific)
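Conceptually, each Gene Pool entry links one SOP version to its parent, the mutation that produced it, and its multi-dimensional scores. The sketch below is a simplified, hypothetical record used only to illustrate those relationships; it is not SIARE's actual data model.

from dataclasses import dataclass, field

@dataclass
class GenePoolEntry:
    """Hypothetical, simplified view of a gene pool record."""
    sop_id: str                 # e.g. "clinical_trial_sop"
    version: str                # e.g. "1.3.0"
    parent_version: str | None  # ancestry: which SOP version this was mutated from
    mutation: str | None        # e.g. "prompt_change:data_analyst"
    scores: dict[str, float] = field(default_factory=dict)  # metric_id -> 0-1 score

entry = GenePoolEntry(
    sop_id="clinical_trial_sop",
    version="1.3.0",
    parent_version="1.2.0",
    mutation="prompt_change:data_analyst",
    scores={"scientific_accuracy": 0.88, "feasibility": 0.78},
)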
The Clinical Trials Use Case
Problem Statement
Manual tuning of clinical trial design systems is extremely difficult. The system must balance:
- Scientific rigor (evidence-based design)
- Regulatory compliance (FDA requirements)
- Ethical adequacy (patient safety, informed consent)
- Feasibility (recruitment, budget, timeline)
Each trial is unique, and configurations that work in testing often fail with unseen production data.
Solution Architecture
Inner Loop: Specialist Agent Guild
Trial Request (indication, intervention, phase, population)
│
├─→ Medical Researcher
│ ↓ (searches PubMed)
│ Evidence Base, Literature Analysis
│ │
│ ├─→ Regulatory Specialist
│ │ ↓ (queries FDA guidelines)
│ │ Compliance Checklist, Requirements
│ │
│ ├─→ Ethics Officer (parallel)
│ │ ↓ (references ethics frameworks)
│ │ Safety Measures, Informed Consent
│ │
│ └─→ Data Analyst (parallel)
│ ↓ (queries clinical database)
│ Feasibility Data, Recruitment Projections
│ │
│ ↓
└──────────→ Trial Coordinator
↓ (synthesizes all inputs)
Trial Design, Protocol Draft
│
↓ (conditional on complexity)
Quality Reviewer
↓
Final Protocol
Key insight: Specialists collaborate as a “guild,” each contributing domain expertise.
Outer Loop: Performance-Driven Evolution
Scenario: After 10 evaluations, the Diagnostician notices:
- Scientific Accuracy: 0.88 ✓
- Regulatory Compliance: 0.90 ✓
- Ethical Adequacy: 0.89 ✓
- Feasibility: 0.65 ⚠️ ← Weakest dimension
Diagnosis: Recruitment projections are overly optimistic, leading to unrealistic timelines.
Mutations proposed:
- PROMPT_CHANGE for Data Analyst: Add emphasis on historical recruitment challenges
- PARAM_TWEAK: Reduce the `min_patient_count` threshold
- ADD_ROLE: Insert a Feasibility Reviewer for independent validation
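Expressed as data, such proposals might look roughly like this (a hypothetical shape for illustration; the real mutation schema is owned by the Director):

proposed_mutations = [
    {
        "type": "prompt_change",
        "target": "data_analyst",
        "change": "Add emphasis on historical recruitment challenges",
    },
    {
        "type": "param_tweak",
        "target": "hyperparameters.min_patient_count",
        "change": "Lower the threshold used for feasibility projections",
    },
    {
        "type": "add_role",
        "target": "feasibility_reviewer",
        "change": "Insert an independent reviewer between data_analyst and trial_coordinator",
    },
]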
Result: After evolution, 3 Pareto-optimal configurations emerge:
- Config A: Max scientific rigor (accuracy 0.92, feasibility 0.70)
- Config B: Balanced (accuracy 0.88, feasibility 0.78)
- Config C: Max feasibility (accuracy 0.85, feasibility 0.85, cost 0.8×)
Human decision: Selects Config C for a resource-constrained trial.
Why This Works in SIARE
- Modularity: Each specialist is an independent role, easy to modify
- Composability: Graph structure allows flexible collaboration patterns
- Evolvability: Director can mutate any aspect (prompts, tools, topology)
- Multi-objective: Pareto frontier respects real-world trade-offs
- Safety: Constraints prevent removal of critical components (mandatory roles, immutable prompt keywords)
General Pattern for Use Case Implementation
Follow this pattern for any domain:
Phase 1: Domain Analysis
- Identify the complex task that requires multiple perspectives or knowledge sources
- Decompose into specialist roles based on expertise areas
- Map knowledge sources to tool adapters (APIs, databases, documents)
- Define evaluation dimensions reflecting domain priorities
- Establish safety constraints (mandatory components, immutable principles)
Phase 2: Architecture Design
- Design the Inner Loop
  - Define specialist agent roles with clear responsibilities
  - Map dependencies between roles (who needs whose outputs?)
  - Select appropriate LLM models for each role (complex vs. simple tasks)
  - Identify conditional execution paths (when should certain roles run?)
- Design the Outer Loop
  - Define multi-dimensional metrics aligned with domain goals
  - Set metric weights reflecting domain priorities
  - Specify evolution constraints (what must not change?)
  - Plan evolution phases (optimize specialists → refine coordination → balance trade-offs)
Phase 3: Implementation
- Create tool adapters for domain-specific knowledge sources
- Implement ProcessConfig (SOP) with roles, graph, prompts
- Implement evaluation metrics (programmatic or LLM-based)
- Create PromptGenome with role prompts and constraints
- Assemble DomainPackage tying everything together
- Write evaluation tasks representing realistic scenarios
Phase 4: Validation
- Run baseline evaluation on task set
- Execute evolution over multiple generations
- Analyze Pareto frontier for meaningful trade-offs
- Validate with domain experts
- Deploy and monitor in production
Step-by-Step Implementation Guide
Step 1: Define Your Domain Use Case
Questions to answer:
- What complex task requires multiple perspectives?
- What are the key dimensions of quality?
- What knowledge sources are needed?
- What are the safety-critical constraints?
Example (Clinical Trials):
- Task: Design clinical trial protocols
- Dimensions: Scientific accuracy, regulatory compliance, ethical adequacy, feasibility
- Knowledge: PubMed, FDA guidelines, ethics frameworks, clinical databases
- Constraints: Cannot remove patient safety, FDA compliance, informed consent requirements
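It often helps to capture these answers in a small structured summary before writing any SIARE components. The sketch below is just a plain planning dictionary, not a SIARE data structure:

domain_definition = {
    "task": "Design clinical trial protocols",
    "quality_dimensions": [
        "scientific_accuracy",
        "regulatory_compliance",
        "ethical_adequacy",
        "feasibility",
    ],
    "knowledge_sources": ["PubMed", "FDA guidelines", "ethics frameworks", "clinical databases"],
    "constraints": [
        "patient safety measures must not be removed",
        "FDA compliance checks are mandatory",
        "informed consent requirements are immutable",
    ],
}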
Step 2: Design Specialist Agents
Template:
RoleConfig(
    id="<role_name>",
    model="<llm_model>",  # gpt-5, gpt-5-mini, etc.
    tools=["<tool_id>"],  # Tools this role can use
    promptRef="<prompt_id>",
    inputs=[{"from": "<upstream_role_or_user_input>"}],
    outputs=["<output_1>", "<output_2>"],  # What this role produces
)
Best practices:
- Single responsibility: Each role has one clear job
- Clear outputs: Name outputs descriptively (e.g., `literature_analysis`, not `result`)
- Appropriate models: Use cheaper models (gpt-5-mini) for straightforward tasks
- Tool selection: Only assign tools the role actually needs
Example:
RoleConfig(
    id="medical_researcher",
    model="gpt-5",
    tools=["pubmed_search"],
    promptRef="medical_researcher_prompt",
    inputs=[{"from": "user_input"}],
    outputs=["literature_analysis", "evidence_base", "research_gaps"],
)
Step 3: Design the Collaboration Graph
Template:
graph = [
    # Sequential dependency
    GraphEdge(from_="role_a", to="role_b"),

    # Multiple inputs
    GraphEdge(from_=["role_a", "role_b"], to="role_c"),

    # Conditional execution
    GraphEdge(
        from_="role_a",
        to="role_b",
        condition="role_a.complexity == 'high'"
    ),
]
Best practices:
- Minimize sequential dependencies: Allow parallel execution where possible
- Use conditionals wisely: Skip expensive roles when not needed
- Validate DAG structure: Ensure no cycles, all roles reachable
Example:
graph = [
    # Parallel specialist analysis
    GraphEdge(from_="user_input", to="medical_researcher"),
    GraphEdge(from_=["user_input", "medical_researcher"], to="regulatory_specialist"),
    GraphEdge(from_=["user_input", "medical_researcher"], to="ethics_officer"),
    GraphEdge(from_=["user_input", "medical_researcher"], to="data_analyst"),

    # Synthesis
    GraphEdge(
        from_=["user_input", "medical_researcher", "regulatory_specialist",
               "ethics_officer", "data_analyst"],
        to="trial_coordinator"
    ),

    # Conditional final review
    GraphEdge(
        from_=["trial_coordinator", "regulatory_specialist", "ethics_officer"],
        to="quality_reviewer",
        condition="trial_coordinator.complexity == 'high'"
    ),
]
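Following the DAG validation practice above, a minimal standalone check for cycles and unreachable roles might look like this; it assumes the graph has been flattened into simple (from, to) string pairs first, so multi-input GraphEdge entries contribute one pair per upstream role:

from collections import defaultdict

def validate_dag(edges: list[tuple[str, str]], entry: str = "user_input") -> list[str]:
    """Return a list of problems: cycles, or roles unreachable from the entry node."""
    adjacency = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        adjacency[src].append(dst)
        nodes.update((src, dst))

    problems = []

    # Cycle detection via DFS with colors: 0 = unvisited, 1 = in progress, 2 = done
    state = {node: 0 for node in nodes}

    def visit(node: str) -> bool:
        state[node] = 1
        for nxt in adjacency[node]:
            if state[nxt] == 1 or (state[nxt] == 0 and visit(nxt)):
                return True
        state[node] = 2
        return False

    if any(state[n] == 0 and visit(n) for n in nodes):
        problems.append("graph contains a cycle")

    # Reachability: every role should be reachable from the entry node
    reachable, stack = set(), [entry]
    while stack:
        node = stack.pop()
        if node not in reachable:
            reachable.add(node)
            stack.extend(adjacency[node])
    for node in nodes - reachable:
        problems.append(f"role '{node}' is not reachable from '{entry}'")

    return problems

# Usage: flatten GraphEdge definitions into (from, to) pairs first.
print(validate_dag([("user_input", "medical_researcher"),
                    ("medical_researcher", "trial_coordinator")]))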
Step 4: Create Tool Adapters
Template:
from typing import Any

import requests

from siare.adapters.base import ToolAdapter, register_adapter


@register_adapter("my_tool")
class MyToolAdapter(ToolAdapter):
    def __init__(self, config: dict[str, Any]):
        super().__init__(config)
        # Extract config parameters
        self.api_key = config.get("api_key")
        self.base_url = config.get("base_url")
        self.session: requests.Session | None = None

    def initialize(self) -> None:
        """Set up connections, load resources"""
        self.session = requests.Session()
        self.is_initialized = True

    def execute(self, inputs: dict[str, Any]) -> dict[str, Any]:
        """Main execution logic"""
        query = inputs.get("query")
        # Call external API, query database, etc.
        results = self._call_api(query)
        return {
            "results": results,
            "status": "success",
        }

    def validate_inputs(self, inputs: dict[str, Any]) -> list[str]:
        """Validate inputs before execution"""
        errors = []
        if "query" not in inputs:
            errors.append("Missing required field: query")
        return errors

    def cleanup(self) -> None:
        """Close connections, free resources"""
        if self.session:
            self.session.close()
Best practices:
- Error handling: Return error messages, don’t raise exceptions
- Rate limiting: Respect API rate limits (use `time.sleep()` if needed)
- Caching: Cache expensive API calls when appropriate
- Validation: Check inputs before expensive operations
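As an illustration of the caching and rate-limiting practices, an adapter's execute path could be extended along these lines. This is a standalone sketch with a stubbed API call and an arbitrary one-second minimum interval; in practice the logic would live inside your ToolAdapter subclass:

import time
from typing import Any


class CachedSearchAdapter:
    """Sketch: adapter variant with an in-memory cache and naive rate limiting."""

    def __init__(self, min_interval_seconds: float = 1.0):
        self._cache: dict[str, dict[str, Any]] = {}
        self._min_interval = min_interval_seconds
        self._last_call = 0.0

    def execute(self, inputs: dict[str, Any]) -> dict[str, Any]:
        query = inputs.get("query", "")

        # Caching: serve repeated queries without hitting the API again.
        if query in self._cache:
            return self._cache[query]

        # Rate limiting: sleep until the minimum interval since the last call has passed.
        elapsed = time.monotonic() - self._last_call
        if elapsed < self._min_interval:
            time.sleep(self._min_interval - elapsed)

        results = self._call_api(query)  # placeholder for the real API call
        self._last_call = time.monotonic()

        response = {"results": results, "status": "success"}
        self._cache[query] = response
        return response

    def _call_api(self, query: str) -> list[str]:
        # Stubbed out for this sketch.
        return [f"result for {query}"]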
Step 5: Implement Evaluation Metrics
Template:
from typing import Any

from siare.core.models import MetricResult


def calculate_my_metric(
    output_a: dict[str, Any],
    output_b: dict[str, Any],
    ground_truth: dict[str, Any] | None = None,
) -> MetricResult:
    """
    Calculate metric score

    Args:
        output_a: Output from role A
        output_b: Output from role B
        ground_truth: Expected results (if available)

    Returns:
        MetricResult with score 0-1
    """
    score = 0.0
    details = {}

    # Scoring logic

    # Component 1 (weight: 0.4)
    component_1_score = calculate_component_1(output_a)
    score += component_1_score * 0.4
    details["component_1"] = component_1_score

    # Component 2 (weight: 0.3)
    if ground_truth:
        component_2_score = calculate_component_2(output_b, ground_truth)
        score += component_2_score * 0.3
        details["component_2"] = component_2_score
    else:
        score += 0.15  # Partial credit if no ground truth

    # Component 3 (weight: 0.3)
    component_3_score = calculate_component_3(output_a, output_b)
    score += component_3_score * 0.3
    details["component_3"] = component_3_score

    return MetricResult(
        metricId="my_metric",
        score=round(score, 3),
        source="programmatic",
        details=details,
    )
Best practices:
- Weighted components: Break metrics into sub-components with explicit weights
- Ground truth optional: Support evaluation with and without ground truth
- Details dictionary: Include diagnostic information for debugging
- Score range: Always return 0-1 (SIARE normalizes to this range)
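A quick sanity check along these lines helps catch weighting mistakes early. The sample outputs are placeholders, and it assumes the component helpers from the template have been implemented:

# Hypothetical sample outputs for a quick sanity check of the metric.
sample_a = {"literature_analysis": "...", "evidence_base": ["PMID:1", "PMID:2"]}
sample_b = {"compliance_checklist": ["item 1", "item 2"]}
truth = {"expected_items": ["item 1", "item 2", "item 3"]}

for gt in (truth, None):  # exercise both the with- and without-ground-truth paths
    result = calculate_my_metric(sample_a, sample_b, ground_truth=gt)
    assert 0.0 <= result.score <= 1.0, "metric scores must stay in the 0-1 range"
    print(result.metricId, result.score, result.details)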
Step 6: Write Role Prompts with Constraints
Template:
from siare.core.models import RolePrompt, PromptConstraints
prompt = RolePrompt(
    id="specialist_prompt",
    content="""You are a [ROLE TITLE] specializing in [DOMAIN].

Your responsibilities:
1. [Responsibility 1]
2. [Responsibility 2]
3. [Responsibility 3]

Given [INPUTS], provide:
- [output_1]: [Description of what this output should contain]
- [output_2]: [Description]

Use the [TOOL_NAME] tool to [PURPOSE].

Focus on [KEY PRIORITIES].

CRITICAL: [Safety-critical requirements]
""",
    constraints=PromptConstraints(
        mustNotChange=["critical", "phrase", "list"],
        domainTips=[
            "Specific guidance for this role",
            "Best practices to emphasize",
            "Common pitfalls to avoid",
        ],
    ),
)
Best practices:
- Clear structure: Use numbered lists for responsibilities and outputs
- Explicit tool usage: Tell the agent which tools to use and why
- Safety-critical keywords: Mark with “CRITICAL:” prefix
- Constraints: Use `mustNotChange` for phrases that ensure safety/compliance
- Domain tips: Guide the Director on how to improve this prompt
Step 7: Assemble the Domain Package
Template:
# The model classes used below (ProcessConfig, PromptGenome, MetricConfig, ToolConfig,
# TaskSet, Task, BudgetLimit, etc.) are assumed to be importable from siare.core.models
# as well; adjust the imports to match your installation.
from siare.core.models import (
    BudgetLimit,
    DomainConfig,
    DomainPackage,
    EvolutionConstraints,
    MetricConfig,
    MetricType,
    ProcessConfig,
    PromptConstraints,
    PromptGenome,
    Task,
    TaskMetadata,
    TaskSet,
    ToolConfig,
    ToolType,
)


def create_my_domain_package() -> DomainPackage:
    # 1. Define SOPs (ProcessConfigs)
    my_sop = ProcessConfig(
        id="my_sop",
        version="1.0.0",
        description="SOP for my domain",
        models={"default": "gpt-5", "analysis": "gpt-5-mini"},
        tools=["tool_1", "tool_2"],
        roles=[...],  # From Step 2
        graph=[...],  # From Step 3
        hyperparameters={...},
    )

    # 2. Define Prompts (PromptGenome)
    my_prompts = PromptGenome(
        id="my_prompts",
        version="1.0.0",
        rolePrompts={...},  # From Step 6
    )

    # 3. Define Metrics
    metrics = [
        MetricConfig(
            id="metric_1",
            type=MetricType.PROGRAMMATIC,
            fnRef="my_module.calculate_metric_1",
            inputs=["output_a", "output_b"],
            weight=0.4,
        ),
        MetricConfig(
            id="metric_2",
            type=MetricType.LLM_JUDGE,
            model="gpt-5",
            promptRef="metric_2_judge",
            inputs=["output_c"],
            weight=0.3,
        ),
    ]

    # 4. Define Tools
    tools = [
        ToolConfig(
            id="tool_1",
            type=ToolType.CUSTOM,
            config={...},
        ),
    ]

    # 5. Define Tasks
    tasks = TaskSet(
        id="eval_tasks",
        domain="my_domain",
        version="1.0.0",
        tasks=[
            Task(
                id="task_1",
                input={...},
                groundTruth={...},
                metadata=TaskMetadata(
                    category="easy",
                    difficulty="medium",
                    importance=1.0,
                ),
            ),
        ],
    )

    # 6. Define Evolution Strategy
    domain_config = DomainConfig(
        defaultEvolutionConfig={
            "phases": [
                {
                    "name": "optimize_specialists",
                    "allowedMutationTypes": ["prompt_change", "param_tweak"],
                    "selectionStrategy": "qd_uniform",
                    "parentsPerGeneration": 4,
                    "maxGenerations": 15,
                },
                {
                    "name": "refine_coordination",
                    "allowedMutationTypes": ["prompt_change", "rewire_graph"],
                    "selectionStrategy": "pareto",
                    "parentsPerGeneration": 3,
                    "maxGenerations": 10,
                },
            ],
        },
        aggregationConfig={
            "quality_weights": {
                "metric_1": 0.4,
                "metric_2": 0.3,
                "metric_3": 0.3,
            },
        },
        recommendedConstraints=EvolutionConstraints(
            budgetLimit=BudgetLimit(
                maxEvaluations=100,
                maxLLMCalls=500,
                maxCost=50.0,
            ),
            maxRoles=10,
            maxEdges=20,
            allowedTools=["tool_1", "tool_2"],
            mandatoryRoles=["critical_role_1", "critical_role_2"],
            promptConstraints=PromptConstraints(
                mustNotChange=["safety", "critical", "keywords"],
            ),
        ),
    )

    # 7. Assemble Package
    return DomainPackage(
        id="my_domain",
        name="My Domain Name",
        version="1.0.0",
        description="Package for [domain description]",
        sopTemplates=["my_sop"],
        promptGenomes=["my_prompts"],
        metaConfigs=["my_meta"],
        toolConfigs=["tool_1", "tool_2"],
        metricConfigs=["metric_1", "metric_2"],
        evaluationTasks=["eval_tasks"],
        domainConfig=domain_config,
        maintainer="Your Name",
        documentation="...",
        tags=["domain", "tag1", "tag2"],
    )
Step 8: Create Demonstration Script
Template:
import asyncio
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# create_my_domain_package comes from Step 7; the other helpers
# (create_example_tasks, run_inner_loop, display_results, demonstrate_*)
# are assumed to be defined elsewhere in your demo module.


async def run_demo():
    # 1. Setup
    package = create_my_domain_package()
    tasks = create_example_tasks()

    # 2. Inner Loop Demo
    logger.info("INNER LOOP: Specialist Agent Collaboration")
    for task in tasks[:1]:  # Demo with first task
        results = await run_inner_loop(package, task)
        display_results(results)

    # 3. Outer Loop Demo
    logger.info("OUTER LOOP: Self-Evolution")
    demonstrate_diagnosis(results)
    demonstrate_mutations(results, package)

    # 4. Human-in-the-Loop Demo
    logger.info("HUMAN-IN-THE-LOOP: Pareto Frontier")
    demonstrate_pareto_frontier()


if __name__ == "__main__":
    asyncio.run(run_demo())
Best Practices
Architecture Design
- Start simple, iterate: Begin with 3-4 roles, add complexity as needed
- Parallel when possible: Minimize sequential dependencies for speed
- Conditional execution: Skip expensive roles when not needed
- Clear interfaces: Define explicit inputs/outputs between roles
Prompt Engineering
- Structure over prose: Use numbered lists, clear sections
- Explicit tool usage: Tell agents which tools to use and when
- Output format: Specify exactly what each output should contain
- Safety first: Mark critical requirements with “CRITICAL:” or “IMPORTANT:”
Evaluation Metrics
- Multi-dimensional: Don’t rely on a single metric
- Weighted components: Make trade-offs explicit
- Ground truth optional: Support evaluation without perfect labels
- Programmatic + LLM: Mix fast programmatic checks with nuanced LLM judges
Evolution Strategy
- Phase-based evolution: Start with specialists, then coordination, then global optimization
- Conservative constraints: Protect safety-critical components
- Diverse selection: Use QD Grid to maintain diversity
- Human oversight: Always provide Pareto frontier for human decision
Testing
- Unit test adapters: Test tool adapters in isolation
- Integration test SOP: Test full pipeline execution
- Metric validation: Verify metrics produce expected scores
- Evolution simulation: Test mutation logic without full evolution
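For example, unit tests for the adapter from Step 4 might look like the following, assuming a hypothetical my_domain.adapters module and an implemented _call_api helper; real tests would typically also mock the external API:

from my_domain.adapters import MyToolAdapter  # hypothetical module path


def test_validate_inputs_flags_missing_query():
    adapter = MyToolAdapter(config={"api_key": "test", "base_url": "http://localhost"})
    errors = adapter.validate_inputs({})
    assert "Missing required field: query" in errors


def test_execute_returns_success_status(monkeypatch):
    adapter = MyToolAdapter(config={"api_key": "test", "base_url": "http://localhost"})
    adapter.initialize()
    # Avoid hitting a real API by stubbing out the internal call.
    monkeypatch.setattr(adapter, "_call_api", lambda query: ["stubbed result"], raising=False)
    output = adapter.execute({"query": "anything"})
    assert output["status"] == "success"
    assert output["results"] == ["stubbed result"]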
Example Use Cases
Use Case 1: Legal Contract Review
Problem: Review complex contracts for risks, compliance, and negotiation opportunities.
Specialist Agents:
- Legal Analyst: Identifies legal risks and compliance issues
- Financial Analyst: Assesses financial implications
- Risk Manager: Evaluates liability and mitigation strategies
- Domain Expert: Provides industry-specific context
- Contract Coordinator: Synthesizes recommendations
Knowledge Sources:
- Legal precedent database (SQL)
- Regulatory text corpus (vector search)
- Company contract templates (vector search)
- Financial models (API)
Metrics:
- Risk identification completeness (0.3)
- Regulatory compliance accuracy (0.3)
- Negotiation recommendation quality (0.2)
- Financial impact assessment (0.2)
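Mapped onto the DomainConfig aggregation settings from Step 7, those weights would translate to something like this (the metric IDs are hypothetical):

aggregation_config = {
    "quality_weights": {
        "risk_identification_completeness": 0.3,
        "regulatory_compliance_accuracy": 0.3,
        "negotiation_recommendation_quality": 0.2,
        "financial_impact_assessment": 0.2,
    },
}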
Use Case 2: Software Architecture Design
Problem: Design optimal software architecture for complex requirements.
Specialist Agents:
- Requirements Analyst: Extracts functional and non-functional requirements
- Security Architect: Identifies security requirements and patterns
- Performance Engineer: Designs for scalability and performance
- Integration Specialist: Plans external system integrations
- Architecture Coordinator: Creates coherent architecture
Knowledge Sources:
- Architecture pattern library (vector search)
- Security best practices (vector search)
- Performance benchmarks (SQL database)
- API documentation (web search)
Metrics:
- Requirements coverage (0.25)
- Security posture (0.25)
- Scalability potential (0.20)
- Implementation feasibility (0.20)
- Cost efficiency (0.10)
Use Case 3: Financial Report Analysis
Problem: Analyze financial reports for investment decisions.
Specialist Agents:
- Financial Analyst: Analyzes financial statements and ratios
- Market Analyst: Assesses market position and competition
- Risk Analyst: Evaluates financial risks
- Industry Expert: Provides sector-specific insights
- Investment Advisor: Synthesizes investment recommendation
Knowledge Sources:
- Financial statements database (SQL)
- Market data API (real-time)
- News and filings (web search)
- Industry reports (vector search)
Metrics:
- Financial analysis accuracy (0.3)
- Risk assessment completeness (0.25)
- Market context quality (0.20)
- Recommendation clarity (0.15)
- Timeliness (0.10)
Use Case 4: Scientific Literature Review
Problem: Conduct comprehensive literature reviews for research.
Specialist Agents:
- Search Strategist: Designs optimal search strategy
- Paper Screener: Filters relevant papers
- Methodology Critic: Assesses research quality
- Data Synthesizer: Extracts and synthesizes findings
- Review Coordinator: Creates structured review
Knowledge Sources:
- Academic databases (PubMed, arXiv, etc.)
- Citation graph (network database)
- Research methodology guides (vector search)
Metrics:
- Search comprehensiveness (0.25)
- Paper relevance (0.25)
- Methodological rigor (0.20)
- Synthesis quality (0.20)
- Citation completeness (0.10)
Summary: The SIARE Use Case Pattern
Every SIARE use case follows this pattern:
1. ANALYZE DOMAIN
├─ Complex task decomposition
├─ Identify knowledge sources
├─ Define quality dimensions
└─ Establish safety constraints
2. DESIGN ARCHITECTURE
├─ Specialist agents (3-6 roles)
├─ Collaboration graph (DAG)
├─ Tool adapters
└─ Evaluation metrics
3. IMPLEMENT COMPONENTS
├─ ProcessConfig (SOP)
├─ PromptGenome (role prompts)
├─ Tool adapters
├─ Metric functions
└─ DomainPackage
4. EVOLVE AND OPTIMIZE
├─ Run baseline evaluation
├─ Execute evolution (Director)
├─ Analyze Pareto frontier
└─ Select optimal configuration
5. DEPLOY AND MONITOR
├─ API deployment
├─ Production monitoring
├─ Continuous evolution
└─ Human feedback loop
The key insight: Treat the RAG pipeline configuration as a searchable space, and use AI-driven evolution to discover optimal strategies.
Next Steps
- Choose your domain: Identify a complex task in your organization
- Follow the pattern: Use this guide to implement your use case
- Start simple: Begin with 3-4 agents, iterate based on results
- Evolve: Let SIARE discover optimal configurations
- Deploy: Use the API to integrate into production systems
For more examples, see:
- `siare/domains/rag_package.py` - Generic RAG use case
- `siare/domains/clinical_trials_package.py` - Clinical trials use case
- `examples/` - Demonstration scripts and metrics
Questions? Check the main documentation:
- `docs/architecture/SYSTEM_ARCHITECTURE.md` - System architecture
- `docs/PRD.md` - Product requirements
- `CLAUDE.md` - Development guide