uptime-monitor

Continuous HTTP/HTTPS uptime monitoring with instant downtime alerts

monitoring, uptime, alerts, devops, infrastructure, health-check, sre
by openhatch, about 2 months ago

Quick Start

# Install and run
openhatch run @openhatch/uptime-monitor

About

Uptime Monitor Agent

Never miss a downtime event. This OpenClaw agent continuously monitors your websites, APIs, and services, alerting you the moment something goes down.

What It Does

The Uptime Monitor agent performs health checks on HTTP/HTTPS endpoints every 5 minutes by default, tracking:

  • Availability — Is the endpoint responding?
  • Response time — How fast is it?
  • Status codes — 200 OK, 404, 500, timeouts
  • SSL certificates — Expiration warnings
  • State changes — Only alerts when status actually changes (up→down or down→up)

It's designed to be quiet when everything is fine and loud when something breaks.

Key Features

Multi-endpoint monitoring — Track unlimited URLs
Smart alerting — Only notify on state changes, not every check
Response time tracking — Historical performance data
SSL expiration warnings — Get notified 30 days and 7 days before a certificate expires
Custom status page — Generate markdown status reports
Configurable intervals — Default 5 min, customize per endpoint
Timeout handling — Configurable timeouts (default 10s)
Lightweight — Uses Claude Haiku for fast, cheap checks

Quick Start

1. Install

hatchery run @openclaw/uptime-monitor

2. Configure Endpoints

Edit memory/endpoints.json in your workspace:

{
  "endpoints": [
    {
      "name": "Production API",
      "url": "https://api.example.com/health",
      "method": "GET",
      "expectedStatus": 200,
      "timeout": 10,
      "checkInterval": 300
    },
    {
      "name": "Marketing Site",
      "url": "https://example.com",
      "method": "GET",
      "expectedStatus": 200,
      "timeout": 15,
      "checkInterval": 600
    }
  ]
}

3. Start Monitoring

The agent automatically begins checking endpoints on startup. You'll receive:

  • Immediate alerts when an endpoint goes down
  • Recovery notifications when it comes back up
  • SSL warnings 30 days and 7 days before expiration
  • Daily status summaries (optional)

Configuration

Environment Variables

None required. The agent works out of the box. The following variables are optional:

# Optional: Set your timezone for time-aware reporting
USER_TIMEZONE=America/New_York

# Optional: Slack/Discord webhook for alerts (in addition to agent DM)
ALERT_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
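
As an illustration of what an alert sent to ALERT_WEBHOOK_URL could look like, here is a minimal Python sketch. It assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names `build_alert_request` and `send_alert` are hypothetical and are not part of this template.

```python
import json
import os
import urllib.request


def build_alert_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a Slack-style webhook POST carrying the alert text."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(message: str) -> bool:
    """Send the alert if ALERT_WEBHOOK_URL is set; return False when it is not."""
    webhook_url = os.environ.get("ALERT_WEBHOOK_URL")
    if not webhook_url:
        return False  # webhook is optional, so a missing variable is not an error
    request = build_alert_request(webhook_url, message)
    with urllib.request.urlopen(request, timeout=10) as response:
        return 200 <= response.status < 300
```

Separating request construction from sending keeps the payload easy to inspect without touching the network.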

Endpoint Configuration

Each endpoint in memory/endpoints.json supports:

  • name (required) — Human-readable identifier
  • url (required) — Full URL to check
  • method (optional) — HTTP method, default GET
  • expectedStatus (optional) — Expected HTTP status code, default 200
  • timeout (optional) — Request timeout in seconds, default 10
  • checkInterval (optional) — Seconds between checks, default 300 (5 min)
  • headers (optional) — Custom headers object
  • body (optional) — Request body for POST/PUT
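
The defaults listed above can be applied mechanically when the file is loaded. The following Python sketch shows one way to do that. The helper names `normalize_endpoint` and `load_endpoints` are hypothetical, and the template's own loader may behave differently.

```python
import json

# Defaults copied from the field list above.
DEFAULTS = {
    "method": "GET",
    "expectedStatus": 200,
    "timeout": 10,         # seconds
    "checkInterval": 300,  # seconds (5 min)
}
REQUIRED = ("name", "url")


def normalize_endpoint(raw: dict) -> dict:
    """Return a copy of one endpoint entry with the documented defaults filled in."""
    missing = [key for key in REQUIRED if not raw.get(key)]
    if missing:
        raise ValueError(f"endpoint is missing required field(s): {', '.join(missing)}")
    return {**DEFAULTS, **raw}  # explicit values override defaults


def load_endpoints(text: str) -> list:
    """Parse the contents of endpoints.json and normalize every entry."""
    config = json.loads(text)
    return [normalize_endpoint(entry) for entry in config.get("endpoints", [])]


example = '{"endpoints": [{"name": "Status", "url": "https://status.example.com"}]}'
endpoints = load_endpoints(example)
print(endpoints[0]["timeout"], endpoints[0]["checkInterval"])  # prints "10 300"
```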

Alert Preferences

Edit TOOLS.md to customize:

  • Quiet hours — Don't alert between 11 PM and 7 AM unless critical
  • Alert channels — Direct message, Slack, Discord, email
  • Escalation rules — Who to notify after X minutes of downtime
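
A quiet-hours window such as 11 PM to 7 AM wraps past midnight, which is easy to get wrong. The Python sketch below shows the comparison under the assumption that times are already converted to the user's local timezone. `in_quiet_hours` and `should_alert` are hypothetical names; the agent itself reads these preferences from TOOLS.md.

```python
from datetime import time

QUIET_START = time(23, 0)  # 11 PM
QUIET_END = time(7, 0)     # 7 AM


def in_quiet_hours(local_time: time, start: time = QUIET_START, end: time = QUIET_END) -> bool:
    """True when local_time falls in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= local_time < end
    # Window wraps midnight: late evening OR early morning.
    return local_time >= start or local_time < end


def should_alert(local_time: time, critical: bool) -> bool:
    """Critical alerts always go out; others are held during quiet hours."""
    return critical or not in_quiet_hours(local_time)
```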

Usage Examples

Check Current Status

You: What's the current status of all endpoints?

Agent: 📊 Uptime Status Report

✅ Production API
   https://api.example.com/health
   Status: UP (200 OK)
   Response time: 124ms
   Uptime: 99.97% (last 30 days)

✅ Marketing Site
   https://example.com
   Status: UP (200 OK)
   Response time: 456ms
   Uptime: 100.00%

🟢 All systems operational

Downtime Alert

Agent: 🚨 DOWNTIME ALERT

Production API is DOWN
https://api.example.com/health

Error: Connection timeout after 10s
Last successful check: 5 minutes ago
Previous uptime: 45 days

I'll notify you when it recovers.

Recovery Notification

Agent: ✅ RECOVERED

Production API is back UP
https://api.example.com/health

Downtime duration: 8 minutes
Status: 200 OK
Response time: 132ms

SSL Expiration Warning

Agent: ⚠️ SSL Certificate Expiring Soon

https://example.com
Certificate expires in 28 days (March 15, 2024)

Please renew before expiration to avoid downtime.

Add New Endpoint

You: Add monitoring for https://status.example.com

Agent: Added new endpoint:

✅ status.example.com
   URL: https://status.example.com
   Check interval: 5 minutes
   Timeout: 10s
   Expected status: 200

First check in progress...
✅ UP (200 OK, 234ms)

I've updated memory/endpoints.json

How It Works

Architecture

  1. HEARTBEAT.md defines the check schedule (every 5 minutes)
  2. skills/uptime-check/check.sh performs the actual HTTP requests
  3. memory/endpoints.json stores endpoint configurations
  4. memory/uptime-state.json tracks current state and history
  5. Agent compares current state to previous state
  6. Alerts sent only on state transitions (up→down or down→up)
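
Because the heartbeat fires every 5 minutes while individual endpoints may set a longer checkInterval, each heartbeat has to decide which endpoints are due. The Python sketch below shows one plausible way to make that decision from the two JSON files. It assumes lastCheck is a Unix timestamp in seconds; `due_endpoints` is a hypothetical helper, not the template's actual scheduler.

```python
def due_endpoints(endpoints: list, state: dict, now: int) -> list:
    """Return the endpoints whose checkInterval has elapsed since their last check."""
    due = []
    for endpoint in endpoints:
        record = state.get("endpoints", {}).get(endpoint["url"], {})
        last_check = record.get("lastCheck")
        interval = endpoint.get("checkInterval", 300)
        # Never-checked endpoints are always due.
        if last_check is None or now - last_check >= interval:
            due.append(endpoint)
    return due


endpoints = [
    {"name": "Production API", "url": "https://api.example.com/health", "checkInterval": 300},
    {"name": "Marketing Site", "url": "https://example.com", "checkInterval": 600},
]
state = {
    "endpoints": {
        "https://api.example.com/health": {"lastCheck": 1000},
        "https://example.com": {"lastCheck": 1000},
    }
}
# 300 seconds later only the 5-minute endpoint is due.
print([e["name"] for e in due_endpoints(endpoints, state, now=1300)])
```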

State Tracking

The agent maintains state in memory/uptime-state.json:

{
  "endpoints": {
    "https://api.example.com/health": {
      "status": "up",
      "lastCheck": 1708128000,
      "lastStatusChange": 1704412800,
      "consecutiveFailures": 0,
      "uptimePercentage": 99.97,
      "responseTimeHistory": [124, 132, 118, 145]
    }
  }
}
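
To show how this record can be read, the Python sketch below derives two figures from the sample above: how long the endpoint has been in its current state, and its mean response time. It assumes the timestamps are Unix epoch seconds and the response times are milliseconds, which matches the example values but is not stated explicitly.

```python
import json
from statistics import mean

state_json = """
{
  "endpoints": {
    "https://api.example.com/health": {
      "status": "up",
      "lastCheck": 1708128000,
      "lastStatusChange": 1704412800,
      "consecutiveFailures": 0,
      "uptimePercentage": 99.97,
      "responseTimeHistory": [124, 132, 118, 145]
    }
  }
}
"""

record = json.loads(state_json)["endpoints"]["https://api.example.com/health"]

# Time spent in the current state, in whole days.
days_in_state = (record["lastCheck"] - record["lastStatusChange"]) // 86400
# Mean of the recorded response times.
avg_response_ms = mean(record["responseTimeHistory"])

print(days_in_state, avg_response_ms)  # prints "43 129.75"
```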

Check Logic

# For each endpoint:
1. Load last known state from memory/uptime-state.json
2. Execute HTTP request with timeout
3. Record response time and status code
4. Compare to expected status
5. If state changed (up→down or down→up): ALERT
6. If state unchanged: Update metrics silently
7. Save new state to memory/uptime-state.json
8. Output HEARTBEAT_OK if nothing to report
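
Steps 4 to 7 amount to a small state machine. The Python sketch below illustrates it using the field names from memory/uptime-state.json. `apply_check` is a hypothetical function, and treating an endpoint with no recorded status as previously "up" is an assumption made here so that a first failing check still raises an alert.

```python
from typing import Optional, Tuple


def apply_check(previous: dict, is_up: bool, now: int) -> Tuple[dict, Optional[str]]:
    """Fold one check result into an endpoint's state record.

    Returns the new record and an alert kind ("down" or "recovered"),
    or None when the status did not change.
    """
    old_status = previous.get("status", "up")  # assumption: unknown counts as up
    new_status = "up" if is_up else "down"
    failures = 0 if is_up else previous.get("consecutiveFailures", 0) + 1
    state = {
        **previous,
        "status": new_status,
        "lastCheck": now,
        "consecutiveFailures": failures,
    }
    if old_status == new_status:
        return state, None  # unchanged: update metrics silently
    state["lastStatusChange"] = now
    return state, ("recovered" if is_up else "down")


record = {"status": "up", "lastCheck": 0, "lastStatusChange": 0, "consecutiveFailures": 0}
record, alert = apply_check(record, is_up=False, now=300)
print(alert)  # prints "down"
record, alert = apply_check(record, is_up=False, now=600)
print(alert)  # prints "None": still down, so no repeat alert
```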

Troubleshooting

Agent isn't checking endpoints

  • Verify memory/endpoints.json exists and is valid JSON
  • Check that URLs are accessible from the agent's network
  • Look for errors in the agent's session logs

Too many alerts

  • Increase checkInterval to reduce check frequency
  • Adjust timeout if endpoints are legitimately slow
  • Enable quiet hours in TOOLS.md

Missing recovery notifications

  • The agent only notifies on state changes
  • If you restarted the agent, it may have lost state
  • Check memory/uptime-state.json for correct state tracking

SSL warnings not appearing

  • SSL checks only happen once per day (not every heartbeat)
  • Warnings appear at 30 days and 7 days before expiration
  • Check that the endpoint uses HTTPS
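
When debugging missing warnings, it can help to compute the remaining days yourself. The Python sketch below works from a certificate's `notAfter` string in the format returned by `ssl.SSLSocket.getpeercert()`. The helper names and the rule for picking a threshold are assumptions for illustration, not the agent's actual logic.

```python
import ssl
from typing import Optional

WARNING_DAYS = (30, 7)  # thresholds named in this README


def days_until_expiry(not_after: str, now: float) -> int:
    """Whole days between `now` (epoch seconds) and the certificate's notAfter time."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    return int((expires_at - now) // 86400)


def warning_due(days_left: int, already_sent: set) -> Optional[int]:
    """Return the tightest crossed threshold that has not been reported yet."""
    crossed = [t for t in WARNING_DAYS if days_left <= t and t not in already_sent]
    return min(crossed) if crossed else None


now = ssl.cert_time_to_seconds("Feb 16 12:00:00 2024 GMT")
days_left = days_until_expiry("Mar 15 12:00:00 2024 GMT", now)
print(days_left, warning_due(days_left, already_sent=set()))  # prints "28 30"
```

Tracking which thresholds were already reported is what keeps a once-per-day check from repeating the same warning.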

Advanced Usage

Custom Headers (Authentication)

{
  "name": "Authenticated API",
  "url": "https://api.example.com/private",
  "headers": {
    "Authorization": "Bearer YOUR_TOKEN",
    "X-Custom-Header": "value"
  }
}

POST Health Checks

{
  "name": "POST Endpoint",
  "url": "https://api.example.com/webhook",
  "method": "POST",
  "body": "{\"ping\":\"health\"}",
  "expectedStatus": 200
}
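
Both advanced configurations above reduce to building one HTTP request from the endpoint entry and comparing the response code to expectedStatus. The Python sketch below shows that mapping with the standard library. The real check lives in skills/uptime-check/check.sh and may differ; `build_check_request` and `run_check` are hypothetical names, and the `opener` parameter exists only so the function can be exercised without a network.

```python
import time
import urllib.error
import urllib.request


def build_check_request(endpoint: dict) -> urllib.request.Request:
    """Translate one endpoints.json entry into an HTTP request."""
    body = endpoint.get("body")
    return urllib.request.Request(
        endpoint["url"],
        data=body.encode("utf-8") if body is not None else None,
        headers=endpoint.get("headers", {}),
        method=endpoint.get("method", "GET"),
    )


def run_check(endpoint: dict, opener=urllib.request.urlopen) -> dict:
    """Perform one check and report status code, latency, and up/down."""
    started = time.monotonic()
    try:
        request = build_check_request(endpoint)
        with opener(request, timeout=endpoint.get("timeout", 10)) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code  # 4xx/5xx responses still carry a status code
    except OSError:
        status = None  # timeouts, DNS failures, refused connections
    elapsed_ms = round((time.monotonic() - started) * 1000)
    return {
        "up": status == endpoint.get("expectedStatus", 200),
        "statusCode": status,
        "responseTimeMs": elapsed_ms,
    }
```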

Status Page Generation

You: Generate a status page

Agent: [Creates memory/status-page.md with current status of all endpoints]

Status page generated at memory/status-page.md
You can publish this to your website or share with your team.
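
For a sense of what such a report might contain, the Python sketch below renders a markdown table from the state file. The layout is an assumption; the page the agent actually writes to memory/status-page.md may look different, and `render_status_page` is a hypothetical name.

```python
def render_status_page(state: dict) -> str:
    """Render a markdown status table from an uptime-state.json style dict."""
    lines = [
        "# Status",
        "",
        "| Endpoint | Status | Uptime |",
        "| --- | --- | --- |",
    ]
    for url, record in sorted(state["endpoints"].items()):
        status = record["status"].upper()
        uptime = f"{record['uptimePercentage']:.2f}%"
        lines.append(f"| {url} | {status} | {uptime} |")
    return "\n".join(lines) + "\n"


sample_state = {
    "endpoints": {
        "https://api.example.com/health": {"status": "up", "uptimePercentage": 99.97},
        "https://example.com": {"status": "up", "uptimePercentage": 100.0},
    }
}
print(render_status_page(sample_state))
```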

Best Practices

Start with critical endpoints — Don't monitor everything at once
Set reasonable timeouts — Match your actual SLAs
Use check intervals wisely — 5 min for critical, 15-30 min for non-critical
Monitor health endpoints — Dedicated /health routes are better than homepage checks
Test your endpoints — Make sure they're reachable from the agent's network
Review weekly — Check uptime percentages and response time trends

Model & Cost

This agent uses Claude Haiku for:

  • Speed — Quick decisions on up/down state
  • 💰 Cost efficiency — A check every 5 min is 8,640 checks per endpoint in a 30-day month
  • 🎯 Appropriate complexity — Simple boolean logic doesn't need Sonnet/Opus

Estimated cost: ~$2-5/month for 10 endpoints checked every 5 minutes.
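
The check volume behind this estimate is simple arithmetic, sketched below in Python so you can plug in your own intervals. It assumes a 30-day month; `checks_per_month` is a hypothetical helper, and no per-check price is modeled because the README does not state one.

```python
SECONDS_PER_DAY = 86_400


def checks_per_month(interval_seconds: int, endpoints: int = 1, days: int = 30) -> int:
    """Number of individual endpoint checks in a month at a fixed interval."""
    return endpoints * days * SECONDS_PER_DAY // interval_seconds


print(checks_per_month(300))                # prints "8640"
print(checks_per_month(300, endpoints=10))  # prints "86400"
print(checks_per_month(900, endpoints=10))  # prints "28800"
```

Moving non-critical endpoints from 5-minute to 15-minute intervals cuts their check volume to a third.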

Contributing

Found a bug? Have a feature request? Open an issue or submit a PR!

License

MIT License - use freely in personal and commercial projects.


Made with OpenClaw — The self-hosted AI agent runtime.
Learn more at docs.openclaw.ai

Stats

Downloads: 0
Deployments: 0
Latest Version: 1.0.0
Runtime Support: Any
Size: 19.2 KB

Versions

1.0.0, about 2 months ago