Automated SRE: The Automations Engine

Define rules that automatically manage runaway processes. OmniMon’s new Automations Engine checks per-process CPU and RAM usage against thresholds you set and takes a configurable action when they are exceeded.

Proactive System Management

Monitoring is reactive by nature: you see a problem, then you fix it. OmniMon’s new Automations Engine flips this model. Define rules once, and the engine acts whenever a process violates them.

How Rules Work

Each automation rule has five parameters:

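| Parameter       | Description                         | Example              |
|-----------------|-------------------------------------|----------------------|
| Process Pattern | Name or substring to match          | chrome, node, python |
| Metric          | What to monitor                     | cpu or ram           |
| Threshold       | Trigger value                       | 80% CPU, 1024 MB RAM |
| Duration        | How long the violation must persist | 30 seconds           |
| Action          | What to do when triggered           | kill or alert        |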
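
As a rough illustration, the five parameters could map onto a Rust type like the one below. This is a sketch, not OmniMon's actual definition: the names Rule, Metric, Action, and matches are hypothetical, the numeric id field is assumed from the rule_id key mentioned later, and case-insensitive matching is an assumption.

```rust
use std::time::Duration;

/// Which metric a rule watches (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Metric {
    Cpu, // threshold is a percentage
    Ram, // threshold is in megabytes
}

/// What happens once the rule has been violated for long enough.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Kill,
    Alert,
}

/// One automation rule: the five parameters plus an id (assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct Rule {
    pub id: u32,
    pub pattern: String,    // name or substring to match
    pub metric: Metric,     // cpu or ram
    pub threshold: f64,     // e.g. 80.0 (% CPU) or 1024.0 (MB RAM)
    pub duration: Duration, // how long the violation must persist
    pub action: Action,     // kill or alert
}

impl Rule {
    /// Substring match on the process name.
    /// Ignoring case is an assumption made for this sketch.
    pub fn matches(&self, process_name: &str) -> bool {
        process_name
            .to_lowercase()
            .contains(&self.pattern.to_lowercase())
    }
}
```

With this shape, a rule whose pattern is chrome would match a process named "Google Chrome Helper" but not one named "node".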

Duration-Based Tracking

OmniMon doesn’t trigger on momentary spikes. The engine tracks how long each process has exceeded its threshold in a HashMap<(rule_id, pid), Instant>, with one entry per rule and process ID. An action fires only after the violation has persisted for the rule’s configured duration.

This prevents false positives from:

  • Brief CPU spikes during compilation
  • Momentary memory allocation peaks
  • Transient process startup bursts
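
The tracking described above can be sketched roughly as follows. Only the HashMap<(rule_id, pid), Instant> shape comes from the description. The names ViolationTracker, observe, and reset are hypothetical, and clearing the entry as soon as usage drops back under the threshold is an assumption about how spikes are filtered out.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Remembers when each (rule_id, pid) pair first exceeded its threshold.
pub struct ViolationTracker {
    first_seen: HashMap<(u32, u32), Instant>,
}

impl ViolationTracker {
    pub fn new() -> Self {
        Self { first_seen: HashMap::new() }
    }

    /// Record one observation taken at `now`.
    /// Returns true once the violation has lasted at least `required`.
    pub fn observe(
        &mut self,
        rule_id: u32,
        pid: u32,
        exceeded: bool,
        required: Duration,
        now: Instant,
    ) -> bool {
        let key = (rule_id, pid);
        if !exceeded {
            // Assumption: dropping below the threshold restarts the clock.
            self.first_seen.remove(&key);
            return false;
        }
        // Keep the earliest timestamp for an ongoing violation.
        let start = *self.first_seen.entry(key).or_insert(now);
        now.duration_since(start) >= required
    }

    /// Forget a pair, for example after its action has run.
    pub fn reset(&mut self, rule_id: u32, pid: u32) {
        self.first_seen.remove(&(rule_id, pid));
    }
}
```

Passing the current time in as an argument, rather than calling Instant::now() inside, is a choice made here so the behavior can be exercised without waiting in real time.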

Example Rules

Kill Chrome tabs eating too much RAM:

Pattern: chrome
Metric: RAM
Threshold: 2048 MB
Duration: 60 seconds
Action: Kill

Alert when Node.js exceeds CPU threshold:

Pattern: node
Metric: CPU
Threshold: 80%
Duration: 30 seconds
Action: Alert

Safety Guarantees

The automations engine inherits OmniMon’s process safety system:

  • Protected processes (kernel, launchd, smss.exe, etc.) cannot be killed by automation rules
  • All kill actions go through kill_process_safe(), which enforces OS-specific blocklists
  • Native desktop notifications inform you of every action taken
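
To make the guard idea concrete, here is a minimal sketch of a blocklist check placed in front of a kill. It is not the real kill_process_safe(), whose signature and blocklists are not shown in this post. PROTECTED, is_protected, kill_if_allowed, and KillOutcome are hypothetical names, the list holds only the three examples named above, and the kill itself is passed in as a closure so the sketch never terminates anything.

```rust
/// Hypothetical blocklist containing only the examples named in the text.
/// The real lists are OS-specific and longer.
const PROTECTED: &[&str] = &["kernel", "launchd", "smss.exe"];

#[derive(Debug, PartialEq)]
pub enum KillOutcome {
    Killed,
    Refused,
}

/// True if the process name is on the blocklist (ASCII case ignored).
pub fn is_protected(name: &str) -> bool {
    PROTECTED.iter().any(|p| name.eq_ignore_ascii_case(p))
}

/// Check the blocklist first and only then invoke `kill`.
/// `kill` stands in for whatever OS call would end the process.
pub fn kill_if_allowed<F: FnOnce(u32)>(name: &str, pid: u32, kill: F) -> KillOutcome {
    if is_protected(name) {
        return KillOutcome::Refused;
    }
    kill(pid);
    KillOutcome::Killed
}
```

The point of routing every kill through one function is that an automation rule cannot bypass the check, however broad its pattern is.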

Notifications

When a rule triggers:

Kill action:

“Killed Chrome (PID 1234) for exceeding 2048.0 MB RAM”

Alert action:

“Process node (PID 5678) exceeded 80.0% CPU”
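
The two message shapes above could be produced by a small formatting helper such as the following sketch. The function name notification_text and the Metric and Action enums are hypothetical. Only the wording and the one-decimal number format are taken from the examples.

```rust
pub enum Metric {
    Cpu,
    Ram,
}

pub enum Action {
    Kill,
    Alert,
}

/// Build the notification text for a triggered rule.
pub fn notification_text(
    action: Action,
    name: &str,
    pid: u32,
    metric: Metric,
    threshold: f64,
) -> String {
    // One decimal place, matching "2048.0 MB RAM" and "80.0% CPU".
    let limit = match metric {
        Metric::Cpu => format!("{:.1}% CPU", threshold),
        Metric::Ram => format!("{:.1} MB RAM", threshold),
    };
    match action {
        Action::Kill => format!("Killed {} (PID {}) for exceeding {}", name, pid, limit),
        Action::Alert => format!("Process {} (PID {}) exceeded {}", name, pid, limit),
    }
}
```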

Evaluation Loop

The engine runs as a background worker thread, evaluating all rules every 5 seconds:

Every 5 seconds:
  1. Read current rules (thread-safe RwLock)
  2. Fetch cached system state
  3. For each rule:
     a. Match process names against pattern
     b. Check metric vs threshold
     c. Track violation duration
     d. Execute action if sustained
  4. Send notifications
  5. Reset violation tracker after action
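
One pass of the steps above might look like the sketch below. It is an illustration under several assumptions, not the engine's code: Rule, ProcessSample, and evaluate_once are hypothetical names, the cached system state is modeled as a plain slice, matching is a case-sensitive substring test, and step 5 is read as clearing only the entry for the rule and process that just triggered. The function returns the actions that became due, leaving the kill, alert, and notification work to the caller.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Metric { Cpu, Ram }

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Action { Kill, Alert }

#[derive(Clone)]
pub struct Rule {
    pub id: u32,
    pub pattern: String,
    pub metric: Metric,
    pub threshold: f64,
    pub duration: Duration,
    pub action: Action,
}

/// One process in the cached system state (hypothetical shape).
pub struct ProcessSample {
    pub pid: u32,
    pub name: String,
    pub cpu_percent: f64,
    pub ram_mb: f64,
}

/// Run steps 1 to 3 and 5 once; returns the (action, pid) pairs now due.
pub fn evaluate_once(
    rules: &RwLock<Vec<Rule>>,
    processes: &[ProcessSample],
    violations: &mut HashMap<(u32, u32), Instant>,
    now: Instant,
) -> Vec<(Action, u32)> {
    // Step 1: shared read lock; panics if a writer poisoned the lock.
    let rules = rules.read().unwrap();
    let mut due = Vec::new();
    for rule in rules.iter() {
        // Step 3a: substring match on the process name.
        for p in processes.iter().filter(|p| p.name.contains(&rule.pattern)) {
            // Step 3b: pick the metric the rule watches.
            let value = match rule.metric {
                Metric::Cpu => p.cpu_percent,
                Metric::Ram => p.ram_mb,
            };
            let key = (rule.id, p.pid);
            if value <= rule.threshold {
                violations.remove(&key); // back under the limit
                continue;
            }
            // Step 3c: remember when the violation started.
            let since = *violations.entry(key).or_insert(now);
            // Step 3d: due only if sustained for the rule's duration.
            if now.duration_since(since) >= rule.duration {
                due.push((rule.action, p.pid));
                violations.remove(&key); // step 5: reset after the action
            }
        }
    }
    due
}
```

A worker thread would call a function like this in a loop, sleeping for 5 seconds between passes with std::thread::sleep and then executing and announcing whatever it returns.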

Creating Rules

Open OmniMon → Navigate to the Automations panel → Click “New Rule” → Configure parameters → Save. Rules take effect immediately.