Operator Runbook
Reference procedures for operating the LYDOS server: startup, health verification, incident response, log access, and backup. Keep this page bookmarked for on-call use.
Server Restart
The LYDOS server is a single FastAPI process on port 8888. To restart it, activate the virtual environment and run server.py from the project root. The server loads all modules, registers Q-engine routers, and initialises agents at startup — this takes 2–5 seconds on a typical machine.
# 1. Activate the virtual environment
source ~/.ailydian-venv/bin/activate
# 2. Navigate to the project root
cd ~/Masaüstü/AILYDIAN-AGENT-ORCHESTRATOR
# 3. Start the server (runs in foreground — Ctrl+C to stop)
python3 server.py
# Expected startup output (abridged):
# INFO: LYDOS Agent OS v12.0.0 starting on http://0.0.0.0:8888
# INFO: Loaded 29 modules
# INFO: 109 agents registered
# INFO: MCP server: 162 tools available
# INFO: Application startup complete.
Background service (systemd)
For persistent operation without a terminal, install the systemd unit. The install script creates the unit file and enables it on boot.
# Install as a systemd service (run once)
bash scripts/install_service.sh
# Check service status
systemctl status lydos
# Start / stop / restart
systemctl start lydos
systemctl stop lydos
systemctl restart lydos
# Follow logs
journalctl -u lydos -f
Auto-start via session-start hook
When working inside an AI IDE (Claude Code, Cursor, etc.), the hooks/session-start.sh hook starts the server automatically at the beginning of each session. The hook also writes the current health status to .lydos_status.json for fast offline reads.
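Because the cached file is only as current as the last hook run, a reader may want to check its age before trusting it. The following is a minimal sketch, not part of LYDOS itself: the function name `read_cached_status` and the 300-second freshness window are assumptions chosen for illustration, and no particular schema is assumed for the file contents.

```python
import json
import time
from pathlib import Path


def read_cached_status(path=".lydos_status.json", max_age_seconds=300):
    """Return (status, is_fresh); (None, False) if the file is missing or invalid.

    max_age_seconds is an illustrative threshold, not a LYDOS setting.
    """
    status_file = Path(path)
    try:
        status = json.loads(status_file.read_text(encoding="utf-8"))
        # File modification time approximates when the hook last ran.
        age = time.time() - status_file.stat().st_mtime
    except (OSError, ValueError):
        return None, False
    return status, age <= max_age_seconds
```

If `is_fresh` comes back false, querying the live health endpoint is the safer choice.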
# Manually run the session-start hook (useful for debugging)
bash hooks/session-start.sh
# Read the cached status without hitting the live server
cat .lydos_status.json | python3 -m json.tool
Health Check
The health endpoint runs a 29-module diagnostic and returns a composite score out of 100. A score of 90+ means the system is fully operational. The one expected degraded module is analysis_api (score 70) when no Analysis Provider key is configured — this is normal and does not affect core operations.
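The interpretation rule above (90 or higher is fully operational, analysis_api may be degraded without concern) can be sketched as a small helper. This is an illustration only: the function name `assess_health` is hypothetical, and the field names `score`, `degraded_modules`, and `failed_modules` are taken from the examples in this section.

```python
# Modules that this runbook describes as normally degraded.
EXPECTED_DEGRADED = {"analysis_api"}


def assess_health(payload, healthy_threshold=90):
    """Summarise a health payload (a dict parsed from /api/health)."""
    score = payload.get("score", 0)
    degraded = set(payload.get("degraded_modules", []))
    return {
        # 90+ is described above as fully operational.
        "operational": score >= healthy_threshold,
        "failed": list(payload.get("failed_modules", [])),
        # Degraded modules other than the expected one deserve a look.
        "unexpected_degraded": sorted(degraded - EXPECTED_DEGRADED),
    }
```

An empty `unexpected_degraded` list together with `operational` set to true corresponds to the normal state described above.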
# Full health check — human-readable
curl -s http://localhost:8888/api/health | python3 -m json.tool
# Minimal check — just the score
curl -s http://localhost:8888/api/health | python3 -c '
import sys, json
d = json.load(sys.stdin)
print("Score: {}/100 Status: {}".format(d["score"], d["status"]))
print("Modules: {}/{} healthy".format(d["modules_healthy"], d.get("modules_total", 29)))
if d.get("modules_failed", 0) > 0:
    print("ALERT: failed modules:", d.get("failed_modules", []))
'
# Status endpoint (broader system overview)
curl -s http://localhost:8888/api/status | python3 -m json.tool
# Via lydos CLI
lydos health
lydos health --json
Expected healthy response (abridged):
{
"status": "excellent",
"score": 94,
"modules_healthy": 28,
"modules_degraded": 1,
"modules_failed": 0,
"modules_total": 29,
"total_agents": 109,
"active_agents": 109,
"degraded_modules": ["analysis_api"],
"uptime_seconds": 14400,
"version": "12.0.0"
}
API Not Responding
If curl http://localhost:8888/api/health fails or hangs, work through this checklist in order.
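Before stepping through the manual checks, two of them (is anything listening on port 8888, and are the API keys present in .env) can be scripted. The sketch below is a hedged illustration rather than a LYDOS tool: the function names are hypothetical, the key names come from the grep command in this section, and the .env parser is deliberately simple (it does not handle `export` prefixes or quoting).

```python
import socket
from pathlib import Path


def port_is_open(host="127.0.0.1", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_env_keys(env_path=".env",
                     required=("PRIMARY_API_KEY", "BILINGUAL_API_KEY")):
    """Return the required keys that are absent or empty in a .env file."""
    try:
        lines = Path(env_path).read_text(encoding="utf-8").splitlines()
    except OSError:
        return list(required)  # missing or unreadable file: everything is missing
    present = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip():
            present.add(key.strip())
    return [key for key in required if key not in present]
```

A closed port points at steps 1 and 2 of the checklist; a non-empty result from `missing_env_keys` points at step 3.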
1. Check if the process is running
# Check for the server process
ps aux | grep "python3 server.py" | grep -v grep
# If no output, the server is not running — start it
source ~/.ailydian-venv/bin/activate
cd ~/Masaüstü/AILYDIAN-AGENT-ORCHESTRATOR
python3 server.py &
# Check the process started
ps aux | grep "python3 server.py" | grep -v grep
2. Check the port
# Verify something is listening on port 8888
ss -tlnp | grep 8888
# Expected: LISTEN 0 128 0.0.0.0:8888 ...
# Or with lsof
lsof -i :8888
# If port is in use by another process, find it and kill it
fuser 8888/tcp
fuser -k 8888/tcp # Force-kill the process holding port 8888
3. Check the .env file
# Verify required variables are set
grep -E "^(PRIMARY_API_KEY|BILINGUAL_API_KEY)" .env
# If .env is missing or empty, copy from the example
cp .env.example .env
# Then add your API keys
# Check the server can read .env
python3 -c "import dotenv; dotenv.load_dotenv(); import os; print(os.getenv('PRIMARY_API_KEY', 'MISSING')[:8] + '...')"
# Expected: gsk_abcd... (first 8 chars of your key)
4. Check the venv
# Verify the venv is activated and intact
which python3
# Expected: /home/user/.ailydian-venv/bin/python3
# If not activated
source ~/.ailydian-venv/bin/activate
# If venv is missing or broken, recreate it
python3 -m venv ~/.ailydian-venv
source ~/.ailydian-venv/bin/activate
pip install -r requirements.txt
5. Check the startup logs
# Run in foreground to see all startup errors
source ~/.ailydian-venv/bin/activate
cd ~/Masaüstü/AILYDIAN-AGENT-ORCHESTRATOR
python3 server.py 2>&1 | head -60
# Look for lines containing "ERROR" or "CRITICAL"
python3 server.py 2>&1 | grep -E "(ERROR|CRITICAL|ImportError|ModuleNotFound)"
# Run with debug logging
LOG_LEVEL=DEBUG python3 server.py
Common Issues
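# Free port 8888, then start the server
fuser -k 8888/tcp && python3 server.py
# Append the primary API key to .env (replace the placeholder value)
echo "PRIMARY_API_KEY=gsk_your_key_here" >> .env
# Activate the virtual environment
source ~/.ailydian-venv/bin/activate
# Stop the running server process and start it again
pkill -f 'python3 server.py' && sleep 1 && python3 server.py
# List failed modules reported by the health endpoint
curl -s http://localhost:8888/api/health | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('failed_modules',[]))"
# Run the MCP server directly over stdio
~/.ailydian-venv/bin/python3 core/infrastructure/mcp_server.py --transport stdio
Incident Severity Matrix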
Use this matrix to classify incidents and decide the appropriate response speed. LYDOS is a local development tool — there are no SLAs — but tracking severity helps prioritise when multiple issues arise.
| Severity | Condition | Impact | Response |
|---|---|---|---|
| P1 | Server completely down — port 8888 not reachable | All agents unavailable, MCP disconnected | Immediate restart. Check process, port, venv, .env. |
| P2 | Server up but health score < 80 or 2+ modules failed | Degraded agent performance, some engines unavailable | Identify failed modules via /api/health, check per-module logs. |
| P3 | Single non-critical engine failed (e.g. analysis_api) | One provider or engine unavailable, fallback active | Check API key for that provider. Non-urgent — investigate at next opportunity. |
| P4 | Cosmetic issues: slow response, UI glitch, log noise | No functional impact | Log the issue, investigate during regular maintenance window. |
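The matrix can be expressed as a small decision function, shown here as a sketch under stated assumptions. The name `classify_severity` is hypothetical, and the mapping of "single non-critical engine failed" to "exactly one failed module" is a simplification; P4 issues are cosmetic and cannot be detected from health data, so the function returns None when nothing measurable is wrong.

```python
def classify_severity(reachable, score=None, failed_modules=()):
    """Map health observations onto the P1 to P3 rows of the matrix.

    Returns None when no problem is measurable (P4 needs human judgement).
    """
    if not reachable:
        return "P1"  # server completely down
    failed = list(failed_modules)
    if (score is not None and score < 80) or len(failed) >= 2:
        return "P2"  # degraded: low score or multiple failed modules
    if len(failed) == 1:
        return "P3"  # assumption: one failed module is treated as non-critical
    return None
```

For example, `classify_severity(True, 92, ["analysis_api"])` yields "P3", while an unreachable server yields "P1" regardless of the other arguments.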
Log Locations
LYDOS writes structured JSON logs via Python logging. By default logs go to stdout, which systemd captures into the journal. The CLI also maintains its own operation log.
| Location | Contents | How to read |
|---|---|---|
| stdout / systemd journal | Server startup, request routing, module errors | journalctl -u lydos -f (systemd) or terminal output |
| ~/.config/lydos/logs/cli.log | CLI commands, auth events, agent task invocations | tail -f ~/.config/lydos/logs/cli.log |
| ~/.config/lydos/audit.jsonl | Q48 Kavach audit trail: every approved/rejected action | python3 -m json.tool --json-lines ~/.config/lydos/audit.jsonl \| head -40 |
| .lydos/history/ | Per-task agent run logs (one JSON file per task) | ls .lydos/history/ \| sort -r \| head -5 |
| .lydos_status.json | Cached health status written by session-start hook | python3 -m json.tool .lydos_status.json |
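Since the audit trail is a JSON Lines file (one JSON object per line), it is often easier to inspect the most recent entries programmatically. The helper below is a minimal sketch: `tail_jsonl` is a hypothetical name, and no field names are assumed for the audit entries.

```python
import json
from collections import deque
from pathlib import Path


def tail_jsonl(path, n=10):
    """Return the last n parseable JSON objects from a JSON Lines file."""
    entries = deque(maxlen=n)  # keeps only the newest n entries
    with Path(path).expanduser().open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # skip partially written or corrupt lines
    return list(entries)
```

Calling `tail_jsonl("~/.config/lydos/audit.jsonl", n=5)` would return the five most recent entries as dictionaries, assuming the file exists.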
Changing the log level
# Set log level via environment variable before starting
LOG_LEVEL=DEBUG python3 server.py
# Or add to .env for permanent change
echo "LOG_LEVEL=DEBUG" >> .env
# Filter server logs to errors only (useful in production)
LOG_LEVEL=ERROR python3 server.py
Backup and Restore
LYDOS stores state in three locations: configuration files, the goal store (hedefler/), and memory files. Back up all three to ensure full recovery after a disk failure or migration.
What to back up
| Path | Contents | Frequency |
|---|---|---|
| config/ | agents.yaml, kernel.yaml, master_prompt.md | On every change |
| hedefler/ | Goal storage: long-term goals, sprints, milestones | Daily |
| ~/.config/lydos/ | auth.json, config.yaml, CLI logs, audit trail | Daily (exclude logs if large) |
| .env | API keys; store encrypted in a password manager | On every change (encrypt before storing) |
| .lydos/ | Project config, agent run history | Weekly or on project milestones |
Backup script
#!/usr/bin/env bash
# Minimal LYDOS state backup script
set -euo pipefail
BACKUP_DIR="$HOME/lydos-backup-$(date +%Y%m%d-%H%M%S)"
LYDOS_ROOT="$HOME/Masaüstü/AILYDIAN-AGENT-ORCHESTRATOR"
mkdir -p "$BACKUP_DIR"
# Config files (no secrets)
cp -r "$LYDOS_ROOT/config/" "$BACKUP_DIR/config/"
# Goal storage
cp -r "$LYDOS_ROOT/hedefler/" "$BACKUP_DIR/hedefler/"
# CLI auth and global config (contains API key, so the archive is encrypted below)
cp -r "$HOME/.config/lydos/" "$BACKUP_DIR/lydos-config/"
# Archive with relative paths so extraction yields lydos-backup-<timestamp>/
tar czf "$BACKUP_DIR.tar.gz" -C "$(dirname "$BACKUP_DIR")" "$(basename "$BACKUP_DIR")"
# Encrypt symmetrically with the passphrase file (GnuPG 2.1+ needs loopback pinentry for this)
gpg --symmetric --batch --pinentry-mode loopback --passphrase-file "$HOME/.lydos-backup-passphrase" "$BACKUP_DIR.tar.gz"
rm -rf "$BACKUP_DIR" "$BACKUP_DIR.tar.gz"
echo "Backup created: $BACKUP_DIR.tar.gz.gpg"
Restore procedure
# 1. Stop the server
systemctl stop lydos # or Ctrl+C if running in terminal
# 2. Decrypt the backup and unpack it into a scratch directory
gpg --decrypt lydos-backup-20260328-120000.tar.gz.gpg > backup.tar.gz
mkdir -p backup
tar xzf backup.tar.gz -C backup
# Locate the extracted lydos-backup-<timestamp> directory, however deeply the archive nested it
SRC="$(find backup -type d -name 'lydos-backup-*' | head -n 1)"
# Make sure the restore targets exist
mkdir -p ~/Masaüstü/AILYDIAN-AGENT-ORCHESTRATOR/config ~/Masaüstü/AILYDIAN-AGENT-ORCHESTRATOR/hedefler ~/.config/lydos
# 3. Restore config files (the trailing /. copies the contents instead of nesting config/config)
cp -r "$SRC/config/." ~/Masaüstü/AILYDIAN-AGENT-ORCHESTRATOR/config/
# 4. Restore goal storage
cp -r "$SRC/hedefler/." ~/Masaüstü/AILYDIAN-AGENT-ORCHESTRATOR/hedefler/
# 5. Restore global config
cp -r "$SRC/lydos-config/." ~/.config/lydos/
# 6. Restore .env (from your password manager; do not store it in the backup as plaintext)
# Re-create .env manually with your API keys
# 7. Restart the server and verify
source ~/.ailydian-venv/bin/activate
cd ~/Masaüstü/AILYDIAN-AGENT-ORCHESTRATOR
python3 server.py &
sleep 3
curl -s http://localhost:8888/api/health | python3 -m json.tool
Before copying data/lydos.db (created at first run), run the WAL checkpoint command to ensure consistency: sqlite3 data/lydos.db "PRAGMA wal_checkpoint(FULL);"
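As an alternative to checkpointing and then copying the file by hand, Python's standard `sqlite3` module can produce a consistent copy of data/lydos.db. The following is a sketch, not a LYDOS utility: the function name `backup_sqlite` and the destination path are illustrative, and it combines the checkpoint mentioned above with SQLite's online backup API.

```python
import os
import sqlite3


def backup_sqlite(src_path, dest_path):
    """Checkpoint the WAL, then copy the database with SQLite's backup API."""
    if not os.path.exists(src_path):
        # sqlite3.connect would otherwise silently create an empty database.
        raise FileNotFoundError(src_path)
    src = sqlite3.connect(src_path)
    try:
        # Flush write-ahead-log contents into the main database file.
        src.execute("PRAGMA wal_checkpoint(FULL);")
        dest = sqlite3.connect(dest_path)
        try:
            src.backup(dest)  # page-by-page consistent copy
        finally:
            dest.close()
    finally:
        src.close()
```

A call such as `backup_sqlite("data/lydos.db", "lydos.db.bak")` could run before the archive step of a backup, with the resulting copy added to the backup directory.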