Configuration Guide¶
This document describes the SAGE-X configuration system, including all configuration fields, their purposes, and how to write configuration files.
Overview¶
SAGE-X uses TOML (Tom's Obvious, Minimal Language) format for configuration files. The configuration system supports:
- Template Variables: Use `${VAR_NAME}` syntax for reusable values
- Nested Sections: Organize related settings into logical groups
- Environment Variable Support: Template variables can reference environment variables
- Type Safety: Automatic conversion to Python dataclasses with type checking
Configuration File Location¶
Configuration files are loaded in the following order:
- Default Configuration: `src/aigise/templates/configs/default_config.toml` (used when no config is specified)
- Custom Configuration: Path specified via the `config_path` parameter when creating `AigiseSession`
Configuration Structure¶
The configuration is organized into several main sections:
```toml
# Top-level template variables (optional)
VARIABLE_NAME = "value"

# Root-level fields
task_name = "my_task"
src_dir_in_sandbox = "/shared/code"
default_host = "127.0.0.1"
auto_cleanup = true

# Section-based configuration
[neo4j]
# Neo4j database configuration

[sandbox]
# Sandbox configuration

[llm]
# LLM model configuration

[history]
# History and tool response configuration

[plugins]
# Plugin configuration

[agent_ensemble]
# Agent ensemble configuration

[build]
# Build and execution configuration

[mcp]
# Model Context Protocol services configuration
```
Template Variables¶
SAGE-X supports template variable expansion using ${VAR_NAME} syntax.
Rules:¶
- Top-level UPPERCASE variables automatically become template variables
- Variables can be referenced anywhere using `${VAR_NAME}`
- Variables are expanded recursively throughout the configuration
- Undefined variables cause an error at load time
Example:¶
```toml
# Define template variables (UPPERCASE)
DEFAULT_IMAGE = "ubuntu:20.04"
MAIN_MODEL = "openai/gpt-4"
NEO4J_PASSWORD = "mypassword123"

# Use template variables
[sandbox.sandboxes.main]
image = "${DEFAULT_IMAGE}"

[llm.model_configs.main]
model_name = "${MAIN_MODEL}"

[neo4j]
password = "${NEO4J_PASSWORD}"
```
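The expansion rules above can be sketched in a few lines of Python. This is a hypothetical illustration of the behavior, not SAGE-X's implementation; the error message matches the one shown in the Troubleshooting section.

```python
# Hypothetical sketch of ${VAR_NAME} expansion over a parsed config tree.
import re

def expand(value, variables):
    """Expand ${VAR} references in strings, recursing into dicts and lists."""
    if isinstance(value, str):
        def sub(match):
            name = match.group(1)
            if name not in variables:
                raise KeyError(f"Template variable '{name}' not found")
            return variables[name]
        return re.sub(r"\$\{([A-Z_][A-Z0-9_]*)\}", sub, value)
    if isinstance(value, dict):
        return {k: expand(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [expand(v, variables) for v in value]
    return value  # numbers, booleans, etc. pass through unchanged

config = {"sandbox": {"image": "${DEFAULT_IMAGE}"}}
print(expand(config, {"DEFAULT_IMAGE": "ubuntu:20.04"}))
# {'sandbox': {'image': 'ubuntu:20.04'}}
```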
Configuration Sections¶
Root-Level Fields¶
These fields are defined at the top level of the configuration file:
| Field | Type | Description | Default |
|---|---|---|---|
task_name | string | Name identifier for the current task/session | None |
src_dir_in_sandbox | string | Path to source code directory within sandbox containers | "/shared/code" |
agent_storage_path | string | Path where dynamically created agents are stored | None |
default_host | string | Default hostname for services (used by Neo4j and MCP services) | None (falls back to 127.0.0.1) |
auto_cleanup | boolean | Whether to automatically cleanup resources when session ends | true |
Example:
```toml
task_name = "vulnerability_analysis"
src_dir_in_sandbox = "/shared/code"
agent_storage_path = "/tmp/agents"
default_host = "localhost"
auto_cleanup = true
```
Neo4j Configuration¶
Configures the Neo4j graph database connection.
Section: [neo4j]
| Field | Type | Description | Default |
|---|---|---|---|
user | string | Neo4j username | None |
password | string | Neo4j password | None |
bolt_port | integer | Neo4j Bolt protocol port | 7687 |
neo4j_http_port | integer | Neo4j HTTP port | 7474 |
Note: The `uri` property is dynamically constructed as `neo4j://{default_host}:{bolt_port}`. If `default_host` is not set, it defaults to `127.0.0.1`.
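A minimal `[neo4j]` fragment, mirroring the values used in the Complete Example later in this guide (`NEO4J_PASSWORD` is a top-level template variable; the shown value is a placeholder):

```toml
[neo4j]
user = "neo4j"
password = "${NEO4J_PASSWORD}"  # defined as an UPPERCASE top-level variable
bolt_port = 7687
neo4j_http_port = 7474
```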
Sandbox Configuration¶
Configures sandbox environments (Docker containers or Kubernetes pods).
Section: [sandbox]
Top-Level Sandbox Settings¶
| Field | Type | Description | Default |
|---|---|---|---|
default_image | string | Default Docker image for sandboxes | None |
backend | string | Sandbox backend type: "native" (Docker) or "k8s" (Kubernetes) | "native" |
project_relative_shared_data_path | string | Path relative to project root for shared data (will be mounted as /shared in containers) | None |
absolute_shared_data_path | string | Absolute path for shared data | None |
tolerations | list[dict] | Kubernetes tolerations applied to all pods | None |
Per-Sandbox Configuration¶
Each sandbox type is configured under [sandbox.sandboxes.<sandbox_type>]:
Common Sandbox Types:
- main: Primary analysis sandbox
- joern: Joern static analysis sandbox
- codeql: CodeQL analysis sandbox
- neo4j: Neo4j database container
- gdb_mcp: GDB debugger MCP service
- pdb_mcp: PDB debugger MCP service
- fuzz: Fuzzing environment
Container Configuration Fields:
| Field | Type | Description | Default |
|---|---|---|---|
image | string | Docker image name/tag | None |
container_id | string | Connect to existing container (instead of creating new) | None |
timeout | integer | Container operation timeout in seconds | 300 |
project_relative_dockerfile_path | string | Path to Dockerfile relative to project root | None |
absolute_dockerfile_path | string | Absolute path to Dockerfile | None |
command | string | Override container command (empty string = use Dockerfile default, None = use bash) | None |
platform | string | Platform architecture (e.g., "linux/amd64") | None |
network | string | Docker network name | None |
privileged | boolean | Run container in privileged mode | false |
security_opt | list[string] | Security options | [] |
cap_add | list[string] | Additional capabilities | [] |
gpus | string | GPU allocation (e.g., "all" or "device=GPU-UUID") | None |
shm_size | string | Shared memory size (e.g., "2g") | None |
mem_limit | string | Memory limit (e.g., "4g") | None |
cpus | string | CPU limit (e.g., "2") | None |
user | string | User to run as (e.g., "1000:1000") | None |
working_dir | string | Working directory in container | None |
Build Configuration:
| Field | Type | Description |
|---|---|---|
build_args | dict[string, string] | Docker build arguments |
using_cached | boolean | Whether to use cached image (internal flag) |
Environment, Volumes, and Ports:
| Field | Type | Description |
|---|---|---|
environment | dict[string, any] | Environment variables |
volumes | list[string] | Volume mounts in format "/host:/container:ro" |
mounts | list[string] | Docker mount specifications |
ports | dict[string, int\|string] | Port mappings in format {"port/tcp" = host_port} |
docker_args | list[string] | Raw arguments passed through to Docker CLI |
Extra Configuration:
| Field | Type | Description |
|---|---|---|
extra | dict[string, any] | Additional custom configuration (e.g., initializer_timeout_sec) |
Kubernetes-Specific Fields:
| Field | Type | Description |
|---|---|---|
pod_name | string | Connect to existing Pod instead of creating new |
container_name | string | Name of container within the Pod |
Example:
```toml
[sandbox]
backend = "native"
project_relative_shared_data_path = "data/my_project.tar.gz"

[sandbox.sandboxes.main]
image = "ubuntu:20.04"
project_relative_dockerfile_path = "dockerfiles/main/Dockerfile"
timeout = 300

[sandbox.sandboxes.main.build_args]
BASE_IMAGE = "ubuntu:20.04"

[sandbox.sandboxes.main.environment]
PYTHONPATH = "/shared/code"

[sandbox.sandboxes.main.ports]
"8080/tcp" = 8080

[sandbox.sandboxes.main.extra]
initializer_timeout_sec = 1800

[sandbox.sandboxes.joern]
image = "aigise/joern"
project_relative_dockerfile_path = "dockerfiles/joern/Dockerfile"
command = ""

[sandbox.sandboxes.joern.environment]
JAVA_OPTS = "-Xmx16G -Xms4G"

[sandbox.sandboxes.joern.ports]
"8081/tcp" = 18087
```
LLM Configuration¶
Configures language models used by agents.
Section: [llm]
Models are configured under [llm.model_configs.<model_name>]:
Common Model Names:
- main: Primary model for agent reasoning
- summarize: Model for summarization and context compression
- flag_claims: Model for flag claims processing
Model Configuration Fields:
| Field | Type | Description | Default |
|---|---|---|---|
model_name | string | Model identifier (e.g., "openai/gpt-4", "anthropic/claude-3") | Required |
temperature | float | Sampling temperature (0.0-2.0) | None |
max_tokens | integer | Maximum tokens in response | None |
rpm | integer | Rate limit: requests per minute | None |
tpm | integer | Rate limit: tokens per minute | None |
Example:
```toml
[llm]

[llm.model_configs.main]
model_name = "openai/gpt-4"
temperature = 0.7
max_tokens = 4096
rpm = 60
tpm = 60000

[llm.model_configs.summarize]
model_name = "openai/gpt-3.5-turbo"
temperature = 0.3
max_tokens = 2048
rpm = 30
tpm = 30000
```
History Configuration¶
Configures tool response handling and event history management.
Section: [history]
| Field | Type | Description | Default |
|---|---|---|---|
max_tool_response_length | integer | Maximum length of a single tool response before special handling | 10000 |
enable_quota_countdown | boolean | Show remaining LLM call quota after each tool response | false |
Events Compaction Configuration:
Section: [history.events_compaction]
| Field | Type | Description | Default |
|---|---|---|---|
max_history_summary_length | integer | Character budget threshold for triggering compaction | 100000 |
compaction_percent | integer | Percentage of history to compress (0-100) | 50 |
Example:
```toml
[history]
max_tool_response_length = 10000
enable_quota_countdown = true

[history.events_compaction]
max_history_summary_length = 100000
compaction_percent = 50
```
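As a rough illustration of how these two knobs interact (not the exact algorithm, which lives inside SAGE-X): once the serialized history exceeds `max_history_summary_length` characters, the oldest `compaction_percent` of events become candidates for summarization.

```python
# Illustrative sketch of the compaction trigger; real event selection and
# summarization are handled by SAGE-X internally.
def events_to_compact(events, max_history_summary_length=100_000, compaction_percent=50):
    total = sum(len(e) for e in events)
    if total <= max_history_summary_length:
        return []  # under the character budget: nothing to compact
    cutoff = len(events) * compaction_percent // 100
    return events[:cutoff]  # the oldest slice gets summarized

history = ["x" * 30_000 for _ in range(5)]  # 150,000 chars total, over budget
print(len(events_to_compact(history)))  # 2 of 5 events selected
```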
Plugins Configuration¶
Configures which plugins are enabled.
Section: [plugins]
| Field | Type | Description | Default |
|---|---|---|---|
enabled | list[string] | List of enabled plugin names | [] |
Common Plugins:
- history_summarizer_plugin: Summarizes long conversation history
- tool_response_summarizer_plugin: Summarizes long tool responses
- quota_after_tool_plugin: Shows quota countdown after tools
Example:
```toml
[plugins]
enabled = [
    "history_summarizer_plugin",
    "tool_response_summarizer_plugin",
    "quota_after_tool_plugin",
]
```
Agent Ensemble Configuration¶
Configures multi-agent ensemble execution.
Section: [agent_ensemble]
| Field | Type | Description | Default |
|---|---|---|---|
thread_safe_tools | list[string] | List of tool names that are thread-safe (can be called in parallel) | [] |
available_models_for_ensemble | list[string] or string | List of model names available for ensemble (can be comma-separated string) | [] |
Example:
```toml
[agent_ensemble]
thread_safe_tools = ["google_search", "read_file"]
available_models_for_ensemble = ["openai/gpt-4", "anthropic/claude-3"]
```
Or as a comma-separated string:
```toml
[agent_ensemble]
thread_safe_tools = ["google_search", "read_file"]
available_models_for_ensemble = "openai/gpt-4,anthropic/claude-3"
```
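Since the schema accepts either form, consumers presumably normalize the value to a list. A hypothetical sketch of that normalization (`normalize_models` is an illustrative name, not a SAGE-X API):

```python
# Hypothetical normalization for available_models_for_ensemble, which the
# schema accepts as either a list or a comma-separated string.
def normalize_models(value):
    if isinstance(value, str):
        return [m.strip() for m in value.split(",") if m.strip()]
    return list(value)

print(normalize_models("openai/gpt-4, anthropic/claude-3"))
print(normalize_models(["openai/gpt-4"]))
```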
Build Configuration¶
Configures build and execution commands for target programs.
Section: [build]
| Field | Type | Description | Default |
|---|---|---|---|
poc_dir | string | Directory path for proof-of-concept code | None |
compile_command | string | Command to compile the target program | None |
run_command | string | Command to run the target program | None |
target_type | string | Type of target (e.g., "default", "binary") | None |
target_binary | string | Path to target binary | None |
Example:
```toml
[build]
poc_dir = "/tmp/poc"
compile_command = "gcc -o target target.c"
run_command = "./target"
target_type = "binary"
target_binary = "/tmp/poc/target"
```
MCP Configuration¶
Configures Model Context Protocol (MCP) services.
Section: [mcp]
MCP services are configured under [mcp.services.<service_name>]:
Common Service Names:
- gdb_mcp: GDB debugger MCP service
- pdb_mcp: PDB debugger MCP service
MCP Service Configuration Fields:
| Field | Type | Description |
|---|---|---|
sse_port | integer | Server-Sent Events (SSE) server port |
sse_host | string | SSE server host (if None, uses default_host from root config) |
Note: The sse_host property dynamically uses default_host from the root configuration if not explicitly set.
Example:
```toml
[mcp]

[mcp.services.gdb_mcp]
sse_port = 1111

[mcp.services.pdb_mcp]
sse_port = 1112
sse_host = "localhost"  # Optional, defaults to root config's default_host
```
Complete Example¶
Here's a complete configuration file example:
```toml
# Template Variables
DEFAULT_IMAGE = "ubuntu:20.04"
MAIN_MODEL = "openai/gpt-4"
NEO4J_PASSWORD = "secure_password"
TASK_NAME = "security_analysis"

# Root Configuration
task_name = "${TASK_NAME}"
src_dir_in_sandbox = "/shared/code"
default_host = "localhost"
auto_cleanup = true

# Neo4j Configuration
[neo4j]
user = "neo4j"
password = "${NEO4J_PASSWORD}"
bolt_port = 7687
neo4j_http_port = 7474

# Sandbox Configuration
[sandbox]
backend = "native"
project_relative_shared_data_path = "data/project.tar.gz"

[sandbox.sandboxes.main]
image = "${DEFAULT_IMAGE}"
project_relative_dockerfile_path = "dockerfiles/main/Dockerfile"
timeout = 300

[sandbox.sandboxes.main.environment]
PYTHONPATH = "/shared/code"

[sandbox.sandboxes.joern]
image = "aigise/joern"
project_relative_dockerfile_path = "dockerfiles/joern/Dockerfile"
command = ""

[sandbox.sandboxes.joern.ports]
"8081/tcp" = 18087

# LLM Configuration
[llm]

[llm.model_configs.main]
model_name = "${MAIN_MODEL}"
temperature = 0.7
max_tokens = 4096

[llm.model_configs.summarize]
model_name = "${MAIN_MODEL}"
temperature = 0.3
max_tokens = 2048

# History Configuration
[history]
max_tool_response_length = 10000
enable_quota_countdown = true

[history.events_compaction]
max_history_summary_length = 100000
compaction_percent = 50

# Plugins Configuration
[plugins]
enabled = [
    "history_summarizer_plugin",
    "tool_response_summarizer_plugin",
]

# Agent Ensemble Configuration
[agent_ensemble]
thread_safe_tools = ["google_search"]
available_models_for_ensemble = "${MAIN_MODEL}"

# Build Configuration
[build]
compile_command = "make"
run_command = "./target"

# MCP Configuration
[mcp]

[mcp.services.gdb_mcp]
sse_port = 1111
```
Loading Configuration in Code¶
Using Default Configuration¶
```python
from aigise.session import AigiseSession

# Uses default config from src/aigise/templates/configs/default_config.toml
session = AigiseSession(aigise_session_id="my_session")
```
Using Custom Configuration¶
```python
from aigise.session import AigiseSession

# Load custom configuration file
session = AigiseSession(
    aigise_session_id="my_session",
    config_path="/path/to/my_config.toml",
)
```
Accessing Configuration¶
```python
# Access configuration through session
config = session.config

# Access specific sections
neo4j_config = config.neo4j
sandbox_config = config.sandbox
llm_config = config.llm

# Access nested configurations
main_sandbox = config.get_sandbox_config("main")
main_model = config.get_llm_config("main")
```
Best Practices¶
- Use Template Variables: Define reusable values as UPPERCASE template variables at the top
- Organize by Section: Group related settings into logical sections
- Document Custom Fields: Add comments for non-standard or custom configuration
- Version Control: Keep configuration files in version control, but exclude sensitive values (passwords, API keys)
- Environment-Specific Configs: Create separate config files for development, testing, and production
- Validate Early: Test configuration files before deploying to catch errors early
Troubleshooting¶
Template Variable Not Found¶
If you see `KeyError: Template variable 'VAR_NAME' not found`, ensure:
- The variable is defined as an UPPERCASE top-level variable
- The variable name matches exactly (case-sensitive)
- There are no typos in `${VAR_NAME}` references
Configuration Not Loading¶
- Verify the TOML file syntax is correct
- Check file path is correct (use absolute paths if relative paths don't work)
- Ensure all required fields are present (check error messages)
Dynamic Host Resolution¶
If default_host is not set, services like Neo4j and MCP will default to 127.0.0.1. Set default_host at the root level for Kubernetes deployments or remote services.
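The fallback described above amounts to a simple precedence chain, sketched here for clarity (illustrative code; only the field names come from this guide):

```python
# Sketch of host resolution: a per-service host (e.g. an MCP service's
# sse_host) wins, then the root default_host, then the documented fallback.
def resolve_host(default_host=None, service_host=None):
    return service_host or default_host or "127.0.0.1"

print(resolve_host())                         # 127.0.0.1
print(resolve_host(default_host="k8s-node"))  # k8s-node
```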
Related Documentation¶
- Getting Started - Initial setup guide
- Architecture - System architecture overview
- Core Concepts - Core concepts including sessions
- Adding Sandboxes - Guide to adding new sandbox types