A distributed health checking system for monitoring various services, databases, and endpoints.
- 15+ check types — HTTP, TCP, ICMP, SSH, DNS, Redis, MongoDB, SSL/TLS, SMTP, gRPC, WebSocket, Domain Expiry, Passive, and full MySQL/PostgreSQL suites
- 9 alert channels — Slack, Discord, Telegram, Email, PagerDuty, OpsGenie, Microsoft Teams, ntfy, and Webhooks
- Rich App integrations for Slack, Discord, and Telegram with incident threading and interactive buttons
- Web-based monitoring dashboard
- Extensible architecture for adding new check types and alert channels

Requirements:
- Go 1.24.0 or later
- PostgreSQL (for storing check configurations and results)
- Access to monitored services
# Clone the repository
git clone https://github.com/yourusername/checker-github.git
cd checker-github
# Build the binary
go build -o checker ./cmd/app
# Run the checker
./checker -config config.yaml

checker-edge is the lightweight agent that runs inside your network and reports back to the Ensafely SaaS. It is distributed as a Helm chart via GitHub Pages.
helm repo add ensafely https://imcitius.github.io/checker
helm repo update

helm install checker-edge ensafely/checker-edge \
  --set apiKey=ck_YOUR_KEY \
  --set region=office-london

Replace ck_YOUR_KEY with your API key (create one at app.ensafely.com → API Keys) and set region to a label that identifies this deployment (e.g. us-east-k8s, office-london).
See charts/checker-edge/values.yaml for the full list of configurable parameters, including resource limits, node selectors, tolerations, and pod annotations.
Avoid putting your API key in plain text on the command line. Create a Kubernetes Secret first, then reference it:
kubectl create secret generic checker-edge-secret \
  --from-literal=api-key=ck_YOUR_KEY

helm install checker-edge ensafely/checker-edge \
  --set existingSecret.name=checker-edge-secret \
  --set existingSecret.key=api-key \
  --set region=office-london

Configuration is provided via YAML files. A basic example:
defaults:
  duration: 10s
  alerts_channel: telegram
  maintenance_duration: 15m
db:
  protocol: postgres
  host: localhost:5432
  username: checker-dev
  database: checker_dev
  password: password
alerts:
  telegram:
    type: telegram
    bot_token: YOUR_BOT_TOKEN
    critical_channel: CHANNEL_ID
    noncritical_channel: CHANNEL_ID
  slack:
    type: slack
    webhook_url: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
    channel: "#general"
projects:
  example:
    parameters:
      duration: 30s
    healthchecks:
      api_checks:
        parameters:
          duration: 30s
        checks:
          Google:
            type: http
            url: https://google.com
            timeout: 5s

HTTP checks verify that web endpoints are responding correctly.
Google:
  type: http
  url: https://google.com
  timeout: 5s
  code: [200] # Expected status codes
  answer: "Google" # Expected content in response
  skip_check_ssl: false
  ssl_expiration_period: "720h" # Warn if SSL cert expires within 30 days
  stop_follow_redirects: false
  auth:
    user: admin
    password: secret
  headers:
    - Authorization: "Bearer token"

TCP checks verify connectivity to a specific port.
Database:
  type: tcp
  host: db.example.com
  port: 5432
  timeout: 3s

ICMP checks verify that a host responds to ping requests.
ServerPing:
  type: icmp
  host: server.example.com
  count: 3
  timeout: 5s

SSH checks verify that an SSH server is reachable and optionally validate its banner string.
GitServer:
  type: ssh
  host: git.example.com
  port: 22
  timeout: 5s
  expect_banner: "OpenSSH" # Optional: verify the SSH banner contains this string

DNS checks verify that a domain resolves correctly for a given record type.
DNSLookup:
  type: dns
  domain: example.com
  record_type: A # A, AAAA, MX, TXT, NS, CNAME
  expected: "93.184.216.34" # Optional: expected value in results
  host: 8.8.8.8 # Optional: custom DNS resolver
  timeout: 5s

Redis checks verify connectivity to a Redis instance using a PING command.
CacheServer:
  type: redis
  host: redis.example.com
  port: 6379
  password: secret # Optional
  db: 0 # Optional: database number
  timeout: 5s

MongoDB checks verify connectivity to a MongoDB instance.
DocumentStore:
  type: mongodb
  uri: "mongodb://user:pass@mongo.example.com:27017/mydb"
  timeout: 5s

Domain expiry checks monitor domain registration expiration via WHOIS lookups.
DomainRenewal:
  type: domain_expiry
  domain: example.com
  expiry_warning_days: 30 # Warn when domain expires within this many days
  timeout: 10s

SSL certificate checks monitor certificate expiration and optionally validate the certificate chain.
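The expiry-warning logic can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the names leafNotAfter, expiresWithin, and demoNow are hypothetical, and the sketch assumes the warning fires when the leaf certificate's NotAfter date falls within the configured number of days.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"net"
	"time"
)

// demoNow is a fixed reference time so the example is deterministic.
var demoNow = time.Date(2025, time.January, 1, 0, 0, 0, 0, time.UTC)

// leafNotAfter (hypothetical helper) connects to addr ("host:port") over TLS
// and returns the expiry date of the first certificate the server presents.
func leafNotAfter(addr string, timeout time.Duration) (time.Time, error) {
	conn, err := tls.DialWithDialer(&net.Dialer{Timeout: timeout}, "tcp", addr, nil)
	if err != nil {
		return time.Time{}, err
	}
	defer conn.Close()

	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return time.Time{}, errors.New("server presented no certificates")
	}
	return certs[0].NotAfter, nil
}

// expiresWithin reports whether notAfter is already past or falls within
// warningDays of now. Assumption: this mirrors expiry_warning_days.
func expiresWithin(notAfter, now time.Time, warningDays int) bool {
	deadline := now.AddDate(0, 0, warningDays)
	return !notAfter.After(deadline)
}

func main() {
	// A certificate that expires 12 days after the reference time.
	notAfter := demoNow.AddDate(0, 0, 12)
	fmt.Println("warn at 30 days:", expiresWithin(notAfter, demoNow, 30))
	fmt.Println("warn at 7 days: ", expiresWithin(notAfter, demoNow, 7))
}
```

In a live check, the value returned by leafNotAfter("example.com:443", 5*time.Second) would be passed to expiresWithin together with time.Now(). The configuration for this check type follows.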
CertCheck:
  type: ssl_cert
  host: example.com
  port: 443
  expiry_warning_days: 30 # Warn when cert expires within this many days
  validate_chain: true # Optional: verify the full certificate chain
  timeout: 5s

SMTP checks verify that a mail server is accepting connections.
MailServer:
  type: smtp
  host: mail.example.com
  port: 587
  starttls: true # Optional: use STARTTLS
  username: alerts@example.com # Optional
  password: secret # Optional
  timeout: 5s

gRPC health checks use the standard gRPC health checking protocol to verify service availability.
PaymentService:
  type: grpc_health
  host: "grpc.example.com:50051" # host:port format
  use_tls: true # Optional: connect with TLS
  timeout: 5s

WebSocket checks verify that a WebSocket endpoint accepts connections and can optionally send a message and check the response.
LiveFeed:
  type: websocket
  url: "wss://ws.example.com/feed" # ws:// or wss://
  send_message: "ping" # Optional: message to send after connecting
  expect_message: "pong" # Optional: expected response content
  timeout: 5s

Passive checks wait for external signals rather than actively testing. An alert fires if no signal is received within the timeout.
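The passive model can be sketched in Go as a small heartbeat tracker. This is an illustrative sketch only: the PassiveCheck type, its methods, and demoStart are hypothetical names, and how the real system receives signals is not described here.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// demoStart is a fixed reference time so the example is deterministic.
var demoStart = time.Date(2025, time.January, 1, 12, 0, 0, 0, time.UTC)

// PassiveCheck (hypothetical) remembers when the last signal arrived and
// reports whether the next one is overdue.
type PassiveCheck struct {
	mu      sync.Mutex
	Timeout time.Duration
	last    time.Time
}

// NewPassiveCheck parses a timeout such as "10m" and starts the clock at start.
func NewPassiveCheck(timeout string, start time.Time) (*PassiveCheck, error) {
	d, err := time.ParseDuration(timeout)
	if err != nil {
		return nil, fmt.Errorf("invalid timeout %q: %w", timeout, err)
	}
	return &PassiveCheck{Timeout: d, last: start}, nil
}

// Signal records that an external signal (for example, a cron job
// reporting completion) arrived at the given time.
func (p *PassiveCheck) Signal(at time.Time) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.last = at
}

// Overdue reports whether more than Timeout has elapsed since the last signal.
func (p *PassiveCheck) Overdue(now time.Time) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return now.Sub(p.last) > p.Timeout
}

func main() {
	p, err := NewPassiveCheck("10m", demoStart)
	if err != nil {
		fmt.Println("bad timeout:", err)
		return
	}
	fmt.Println("overdue after 5m:", p.Overdue(demoStart.Add(5*time.Minute)))
	p.Signal(demoStart.Add(8 * time.Minute))
	fmt.Println("overdue 12m after last signal:", p.Overdue(demoStart.Add(20*time.Minute)))
}
```

The mutex is there because signals would typically arrive on a different goroutine from the one that evaluates the check. The configuration for this check type follows.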
CronJob:
  type: passive
  timeout: 10m # Alert if no signal received within this timeframe

MySQL query checks run a query to verify database connectivity and operation.
MySQL Basic Query:
  type: mysql_query
  host: db.example.com
  port: 3306
  timeout: 5s
  username: dbuser
  password: dbpassword
  dbname: mydatabase
  query: "SELECT 1;"
  response: "1" # Optional expected response

MySQL time checks verify that the database server's time is synchronized within a specified tolerance.
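The tolerance comparison can be sketched in Go as follows. This is a minimal illustration, not the project's implementation: the function name skewWithin is hypothetical, and the sketch assumes the server's Unix timestamp is compared against the checker host's own clock, which the description above does not state explicitly.

```go
package main

import (
	"fmt"
	"time"
)

// skewWithin reports whether the database clock (serverUnix, in seconds since
// the epoch) is within the allowed difference of the local clock (localUnix).
// The difference string uses Go duration syntax, for example "10s".
func skewWithin(serverUnix, localUnix int64, difference string) (bool, error) {
	limit, err := time.ParseDuration(difference)
	if err != nil {
		return false, fmt.Errorf("invalid difference %q: %w", difference, err)
	}

	skew := time.Unix(serverUnix, 0).Sub(time.Unix(localUnix, 0))
	if skew < 0 {
		skew = -skew // the server may be ahead of or behind the local clock
	}
	return skew <= limit, nil
}

func main() {
	now := time.Now().Unix()

	// Pretend the query returned a timestamp three seconds behind local time.
	ok, err := skewWithin(now-3, now, "10s")
	fmt.Println("within tolerance:", ok, "error:", err)
}
```

In a real check, serverUnix would be the value returned by the configured query. The configuration for this check type follows.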
MySQL Time Check:
  type: mysql_query_unixtime
  host: db.example.com
  port: 3306
  timeout: 5s
  username: dbuser
  password: dbpassword
  dbname: mydatabase
  query: "SELECT UNIX_TIMESTAMP();"
  difference: "10s" # Maximum allowed time difference

MySQL replication checks monitor replication by inserting test data on the master and verifying that it appears on the replicas.
MySQL Replication:
  type: mysql_replication
  host: master-db.example.com
  port: 3306
  timeout: 5s
  username: repluser
  password: replpassword
  dbname: test_db
  table_name: replication_test # Table must exist on all servers
  lag: "5s" # Maximum allowed replication lag
  server_list:
    - "replica1.example.com"
    - "replica2.example.com:3307"

PostgreSQL query checks run a query to verify database connectivity and operation.
PostgreSQL Basic Query:
  type: pgsql_query
  host: db.example.com
  port: 5432
  timeout: 5s
  username: dbuser
  password: dbpassword
  dbname: mydatabase
  sslmode: require # Optional: disable, require, verify-ca, verify-full
  query: "SELECT 1;"
  response: "1" # Optional expected response

PostgreSQL time checks verify that the database server's time is synchronized.
PostgreSQL Time Check:
  type: pgsql_query_unixtime # or pgsql_query_timestamp
  host: db.example.com
  port: 5432
  timeout: 5s
  username: dbuser
  password: dbpassword
  dbname: mydatabase
  query: "SELECT CAST(EXTRACT(EPOCH FROM NOW()) AS INTEGER);" # for unixtime
  difference: "10s" # Maximum allowed time difference

PostgreSQL replication checks monitor replication by inserting test data on the master and verifying that it appears on the replicas.
PostgreSQL Replication:
  type: pgsql_replication
  host: master-db.example.com
  port: 5432
  timeout: 5s
  username: repluser
  password: replpassword
  dbname: test_db
  sslmode: require
  table_name: replication_test
  lag: "5s"
  server_list:
    - "replica1.example.com"
  analytic_replicas: # Optional: replicas with higher lag tolerance
    - "analytics.example.com"

PostgreSQL replication status checks verify replication health by querying PostgreSQL's built-in replication status views instead of inserting test data.
PostgreSQL Replication Status:
  type: pgsql_replication_status
  host: master-db.example.com
  port: 5432
  timeout: 5s
  username: repluser
  password: replpassword
  dbname: mydatabase
  sslmode: require
  lag: "30s"
  server_list:
    - "replica1.example.com"

Alert channels define how you are notified when checks fail or recover. Configure them in the alerts section of your config file.

Slack: webhook-based alerting, or full Slack App integration with incident threading, interactive buttons, and silence commands.
alerts:
  slack:
    type: slack # or slack_webhook
    webhook_url: https://hooks.slack.com/services/YOUR/WEBHOOK/URL

App integration: when configured as a Slack App (bot token + channel), it provides threaded incident tracking, interactive action buttons, and /silence commands.

Full Discord App integration with rich embeds, interactive buttons, and thread-based incident tracking.
alerts:
  discord:
    type: discord
    bot_token: YOUR_BOT_TOKEN
    channel_id: "123456789012345678"

Full Telegram Bot integration with message threading, error snapshots, and inline keyboards.
alerts:
  telegram:
    type: telegram
    bot_token: YOUR_BOT_TOKEN
    critical_channel: CHANNEL_ID
    noncritical_channel: CHANNEL_ID

Email: SMTP-based alerting with HTML templates.
alerts:
  email:
    type: email
    smtp_host: smtp.example.com
    smtp_port: 587
    username: alerts@example.com
    password: secret
    from: alerts@example.com
    to:
      - team@example.com

PagerDuty: Events API v2 integration with automatic resolve and severity mapping.
alerts:
  pagerduty:
    type: pagerduty
    routing_key: YOUR_EVENTS_API_V2_ROUTING_KEY

OpsGenie: alert trigger and resolve with priority mapping (P1–P3).
alerts:
  opsgenie:
    type: opsgenie
    api_key: YOUR_API_KEY

Microsoft Teams: webhook-based alerting using the MessageCard format.
alerts:
  teams:
    type: teams
    webhook_url: https://outlook.office.com/webhook/YOUR/WEBHOOK/URL

ntfy: push notification service with priority mapping and action buttons.
alerts:
  ntfy:
    type: ntfy
    topic: checker-alerts
    server: https://ntfy.sh # Optional, defaults to https://ntfy.sh
    token: YOUR_ACCESS_TOKEN # Optional

Webhooks: generic HTTP POST notifications with a Go template body and HMAC-SHA256 signing for payload verification.
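To illustrate how a templated body and an HMAC-SHA256 signature fit together, here is a minimal Go sketch. It is not the project's code: the Alert type and the renderPayload, sign, and verify functions are hypothetical, and the way the signature is transmitted (for example, which HTTP header carries it) and where the shared secret is configured are not specified here, so the sketch leaves them out.

```go
package main

import (
	"bytes"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"text/template"
)

// Alert holds the fields referenced by the payload template (hypothetical type).
type Alert struct {
	CheckName string
	Status    string
}

// renderPayload executes a Go text/template against the alert.
// Note: text/template does not JSON-escape values, so field values
// containing quotes would need escaping in a real payload.
func renderPayload(tmpl string, a Alert) ([]byte, error) {
	t, err := template.New("payload").Parse(tmpl)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, a); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// sign returns the hex-encoded HMAC-SHA256 of body under the shared secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify is what a receiver would run: recompute the MAC over the raw body
// and compare it to the received signature in constant time.
func verify(secret, body []byte, signature string) bool {
	want, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	const payloadTemplate = `{"check": "{{.CheckName}}", "status": "{{.Status}}"}`
	secret := []byte("shared-secret") // placeholder value

	body, err := renderPayload(payloadTemplate, Alert{CheckName: "api", Status: "down"})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	sig := sign(secret, body)
	fmt.Println(string(body))
	fmt.Println("signature valid:", verify(secret, body, sig))
}
```

The receiver must verify the signature over the exact bytes it received, before parsing the JSON, since any re-serialization would change the MAC. The configuration for this channel follows.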
alerts:
  custom_webhook:
    type: webhook
    url: https://api.example.com/alerts
    method: POST
    headers:
      Content-Type: application/json
    payload: '{"check": "{{.CheckName}}", "status": "{{.Status}}"}'

To add a new check type:
- Define the check type in the pkg/checks package
- Add a config struct in pkg/models/check_types.go
- Register the check in pkg/checks/factory.go
- Add UI components in the frontend
- Create tests in pkg/checks/your_check_test.go

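The define-and-register steps above might look roughly like the following Go sketch. Every name in it (Checker, Config, Register, New, echoCheck) is hypothetical and chosen for illustration; the actual interfaces in pkg/checks and the factory in pkg/checks/factory.go may be shaped differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Checker is a hypothetical interface that every check type implements.
type Checker interface {
	Run() error
}

// Config is a simplified stand-in for a per-check config struct.
type Config map[string]string

// constructor builds a Checker from its configuration.
type constructor func(cfg Config) (Checker, error)

// registry maps the "type" value from the YAML config to a constructor.
var registry = map[string]constructor{}

// Register makes a check type available to the factory.
func Register(checkType string, c constructor) {
	registry[checkType] = c
}

// New is the factory: it looks up the check type and builds the check.
func New(checkType string, cfg Config) (Checker, error) {
	c, ok := registry[checkType]
	if !ok {
		return nil, fmt.Errorf("unknown check type %q", checkType)
	}
	return c(cfg)
}

// echoCheck is a toy check that passes when "want" is configured.
type echoCheck struct{ want string }

func (e echoCheck) Run() error {
	if e.want == "" {
		return errors.New("echo: nothing to check")
	}
	return nil
}

func init() {
	// Registration step for the new check type.
	Register("echo", func(cfg Config) (Checker, error) {
		return echoCheck{want: cfg["want"]}, nil
	})
}

func main() {
	check, err := New("echo", Config{"want": "pong"})
	if err != nil {
		fmt.Println("factory error:", err)
		return
	}
	fmt.Println("check result:", check.Run())
}
```

Keeping construction behind a registry means the scheduler only needs the type string from the config, and a new check type does not require changes to the code that runs checks.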
# Run unit tests
go test ./...
# Run integration tests (requires services to be available)
INTEGRATION_TESTS=true go test ./...
# Run MySQL integration tests
INTEGRATION_TESTS=true TEST_MYSQL_USERNAME=root TEST_MYSQL_PASSWORD=password go test ./pkg/checks -run=^TestMySQL
# Note: When running individual test files, use the package approach instead of the file approach
# Correct: go test ./pkg/checks -run=^TestMySQL
# Incorrect: go test ./pkg/checks/mysql_test.go

This project is licensed under the Business Source License 1.1 (BSL 1.1).
Permitted:
- Self-hosting for internal use
- Modifying the source code
- Using the software for personal or internal business purposes
- Non-competing commercial use (e.g. running checks for your own infrastructure)

Not permitted:
- Offering this software as a managed monitoring, health-checking, or uptime-tracking service to third parties (i.e. you cannot build a SaaS product on top of this software that competes with Ensafely)

On May 1, 2031, the license automatically converts to the Apache License 2.0, making the software fully open source.
For alternative licensing arrangements, please contact Ensafely.
