Automated SRE: The Automations Engine
Define rules to automatically manage runaway processes. OmniMon's new automations engine checks per-process CPU and RAM usage against thresholds you set and runs a configurable action when a threshold is exceeded.
Proactive System Management
Monitoring is reactive by nature: you see a problem, then you fix it. OmniMon’s new Automations Engine flips this model. Define rules once, and the engine acts on violations for you.
How Rules Work
Each automation rule has five parameters:
| Parameter | Description | Example |
|---|---|---|
| Process Pattern | Name or substring to match | chrome, node, python |
| Metric | What to monitor | cpu or ram |
| Threshold | Trigger value | 80% CPU, 1024 MB RAM |
| Duration | How long the violation must persist | 30 seconds |
| Action | What to do when triggered | kill or alert |
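The five parameters map naturally onto a small data type. The following is a minimal sketch, not OmniMon's actual definition: the names `Rule`, `Metric`, `Action`, and `matches` are hypothetical, and case-insensitive substring matching is an assumption.

```rust
use std::time::Duration;

/// Which metric a rule watches (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Metric {
    /// CPU usage, in percent.
    Cpu,
    /// Memory usage, in megabytes.
    Ram,
}

/// What to do once a rule triggers (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Kill,
    Alert,
}

/// One automation rule with the five parameters from the table, plus an id.
#[derive(Debug, Clone, PartialEq)]
pub struct Rule {
    pub id: u32,
    pub pattern: String,
    pub metric: Metric,
    pub threshold: f64,
    pub duration: Duration,
    pub action: Action,
}

impl Rule {
    /// Substring match on the process name.
    /// Ignoring case is an assumption made for this sketch.
    pub fn matches(&self, process_name: &str) -> bool {
        process_name
            .to_lowercase()
            .contains(&self.pattern.to_lowercase())
    }
}
```

With this shape, the Chrome example later in this post would be `Rule { pattern: "chrome".into(), metric: Metric::Ram, threshold: 2048.0, duration: Duration::from_secs(60), action: Action::Kill, .. }`.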
Duration-Based Tracking
OmniMon doesn’t trigger on momentary spikes. The engine tracks how long a process has exceeded its threshold using a HashMap<(rule_id, pid), Instant>. Actions only fire after the violation has persisted for the configured duration.
This prevents false positives from:
- Brief CPU spikes during compilation
- Momentary memory allocation peaks
- Transient process startup bursts
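Duration tracking with a map keyed by `(rule_id, pid)` can be sketched as follows. This is an illustration under assumptions, not the engine's real code: `ViolationTracker`, `observe`, and `reset` are hypothetical names, and the sketch assumes the timer restarts whenever the process drops back under its threshold.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Remembers when each (rule_id, pid) pair first exceeded its threshold.
pub struct ViolationTracker {
    first_seen: HashMap<(u32, u32), Instant>,
}

impl ViolationTracker {
    pub fn new() -> Self {
        Self { first_seen: HashMap::new() }
    }

    /// Records one observation and returns true once the violation has
    /// lasted at least `required`. `now` is passed in to keep this testable.
    pub fn observe(
        &mut self,
        rule_id: u32,
        pid: u32,
        over_threshold: bool,
        required: Duration,
        now: Instant,
    ) -> bool {
        let key = (rule_id, pid);
        if !over_threshold {
            // Back under the threshold: forget the violation (assumption).
            self.first_seen.remove(&key);
            return false;
        }
        // Keep the earliest timestamp; insert `now` only on first sight.
        let start = *self.first_seen.entry(key).or_insert(now);
        now.duration_since(start) >= required
    }

    /// Clears the entry, for example after an action has fired.
    pub fn reset(&mut self, rule_id: u32, pid: u32) {
        self.first_seen.remove(&(rule_id, pid));
    }
}
```

A brief spike never reaches the required duration because the entry is removed as soon as the process recovers, so the next violation starts counting from zero.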
Example Rules
Kill Chrome tabs eating too much RAM:
Pattern: chrome
Metric: RAM
Threshold: 2048 MB
Duration: 60 seconds
Action: Kill
Alert when Node.js exceeds CPU threshold:
Pattern: node
Metric: CPU
Threshold: 80%
Duration: 30 seconds
Action: Alert
Safety Guarantees
The automations engine inherits OmniMon’s process safety system:
- Protected processes (kernel, launchd, smss.exe, etc.) cannot be killed by automation rules
- All kill actions go through kill_process_safe(), which enforces OS-specific blocklists
- Native desktop notifications inform you of every action taken
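A blocklist guard of this kind can be sketched in a few lines. This is not the implementation of kill_process_safe(): the names `PROTECTED`, `is_protected`, and `guarded_kill` are hypothetical, the list contains only the examples named above rather than a real per-OS list, and exact case-insensitive name matching is an assumption.

```rust
/// Hypothetical blocklist, using only the example names from the text.
/// A real implementation would select a list per operating system.
pub const PROTECTED: &[&str] = &["kernel", "launchd", "smss.exe"];

/// Returns true if `name` is on the blocklist.
/// Exact, ASCII case-insensitive matching is an assumption.
pub fn is_protected(name: &str) -> bool {
    PROTECTED.iter().any(|p| p.eq_ignore_ascii_case(name))
}

/// Checks the blocklist first, then delegates to the supplied `kill` function.
/// Passing the kill step as a closure keeps the sketch free of OS calls.
pub fn guarded_kill<F>(name: &str, pid: u32, kill: F) -> Result<(), String>
where
    F: FnOnce(u32) -> Result<(), String>,
{
    if is_protected(name) {
        return Err(format!(
            "refusing to kill protected process {name} (PID {pid})"
        ));
    }
    kill(pid)
}
```

Because the check runs before the kill step is ever called, a rule with an overly broad pattern cannot reach a protected process.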
Notifications
When a rule triggers:
Kill action:
“Killed Chrome (PID 1234) for exceeding 2048.0 MB RAM”
Alert action:
“Process node (PID 5678) exceeded 80.0% CPU”
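The two message shapes above can be reproduced with plain string formatting. The sketch below is illustrative only: `Metric`, `describe`, `kill_message`, and `alert_message` are hypothetical names, and one decimal place is inferred from the sample messages.

```rust
/// Which metric triggered the rule (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Metric {
    Cpu,
    Ram,
}

/// Formats a threshold with one decimal place and the unit for its metric.
fn describe(metric: Metric, threshold: f64) -> String {
    match metric {
        Metric::Cpu => format!("{threshold:.1}% CPU"),
        Metric::Ram => format!("{threshold:.1} MB RAM"),
    }
}

/// Text for a kill notification.
pub fn kill_message(name: &str, pid: u32, metric: Metric, threshold: f64) -> String {
    format!(
        "Killed {name} (PID {pid}) for exceeding {}",
        describe(metric, threshold)
    )
}

/// Text for an alert notification.
pub fn alert_message(name: &str, pid: u32, metric: Metric, threshold: f64) -> String {
    format!(
        "Process {name} (PID {pid}) exceeded {}",
        describe(metric, threshold)
    )
}
```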
Evaluation Loop
The engine runs as a background worker thread, evaluating all rules every 5 seconds:
Every 5 seconds:
1. Read current rules (thread-safe RwLock)
2. Fetch cached system state
3. For each rule:
   a. Match process names against pattern
   b. Check metric vs threshold
   c. Track violation duration
   d. Execute action if sustained
4. Send notifications
5. Reset violation tracker after action
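One pass of the steps above can be sketched as a single function. Everything here is an assumption made for illustration: the types `Rule` and `ProcessInfo`, the function `evaluate_once`, and the choice to return the fired actions instead of killing or notifying directly. Matching is a plain case-sensitive substring test to keep the sketch short.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Metric { Cpu, Ram }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action { Kill, Alert }

pub struct Rule {
    pub id: u32,
    pub pattern: String,
    pub metric: Metric,
    pub threshold: f64,
    pub duration: Duration,
    pub action: Action,
}

/// One process entry from the cached system state (hypothetical shape).
pub struct ProcessInfo {
    pub pid: u32,
    pub name: String,
    pub cpu_percent: f64,
    pub ram_mb: f64,
}

/// Runs one evaluation pass and returns the (pid, action) pairs that fired.
pub fn evaluate_once(
    rules: &RwLock<Vec<Rule>>,
    processes: &[ProcessInfo],
    tracker: &mut HashMap<(u32, u32), Instant>,
    now: Instant,
) -> Vec<(u32, Action)> {
    let mut fired = Vec::new();
    // Step 1: read the current rules under a shared lock.
    let guard = rules.read().unwrap();
    for rule in guard.iter() {
        for p in processes {
            // Step 3a: match the process name against the pattern.
            if !p.name.contains(rule.pattern.as_str()) {
                continue;
            }
            // Step 3b: compare the chosen metric with the threshold.
            let value = match rule.metric {
                Metric::Cpu => p.cpu_percent,
                Metric::Ram => p.ram_mb,
            };
            let key = (rule.id, p.pid);
            if value <= rule.threshold {
                tracker.remove(&key);
                continue;
            }
            // Step 3c: remember when this violation started.
            let start = *tracker.entry(key).or_insert(now);
            // Step 3d: fire only if the violation has been sustained.
            if now.duration_since(start) >= rule.duration {
                fired.push((p.pid, rule.action));
                // Step 5: reset the tracker entry after the action.
                tracker.remove(&key);
            }
        }
    }
    fired
}
```

A background worker would call a function like this every 5 seconds with a fresh snapshot (step 2) and then perform the kill or alert and send notifications (step 4) for each returned pair. Holding only a read lock during the pass lets the UI take the write lock between passes to add or edit rules.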
Creating Rules
Open OmniMon → Navigate to the Automations panel → Click “New Rule” → Configure parameters → Save. Rules take effect immediately.