Overview

This guide covers production-grade deployment of Blocklight, including high availability, horizontal scaling, comprehensive monitoring, and disaster recovery.

Pre-Deployment Checklist

Before deploying to production, ensure you have:
  • Validated all detection rules with blocklight config check
  • Configured .env with production API keys
  • Set up monitoring infrastructure (Prometheus, Grafana, Loki)
  • Configured alerting channels (Slack, PagerDuty, etc.)
  • Planned backup and disaster recovery strategy
  • Reviewed security best practices
  • Load tested with expected transaction volume
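
These checks can be scripted into a single pre-flight gate. A minimal sketch, assuming .env lives in the working directory (the config check subcommand is the one listed above; the grep checks are illustrative):
#!/bin/bash
# preflight.sh — sketch of a pre-deployment gate
set -euo pipefail

# Validate all detection rules and configuration
blocklight config check

# Confirm production API keys are present in .env
for var in ALCHEMY_API_KEY INFURA_API_KEY; do
  grep -q "^${var}=" .env || { echo "missing ${var} in .env"; exit 1; }
done

echo "pre-flight checks passed"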

Docker Compose Production Deployment

1. Configuration

Create a production-ready config.yaml:
chains:
  ethereum:
    enabled: true
    rpc_url: https://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}
    ws_url: wss://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}
    workers: 8
    batch_size: 100
    start_block: latest

go_core:
  port: 50051
  batch_size: 100
  channel_buffer: 10000  # Higher for production
  max_condition_depth: 20
  connection_pool_size: 50

aggregation:
  enabled: true
  window_seconds: 60
  min_count: 3
  preserve_individual: false

storage:
  max_findings: 50000

logging:
  level: INFO
  format: json
  console:
    enabled: true
  file:
    enabled: true
    path: /app/logs/blocklight.log
    max_size_mb: 100
    max_backups: 10

2. Environment Variables

Create .env for production:
# API Keys
ALCHEMY_API_KEY=your_production_key_here
INFURA_API_KEY=your_backup_key_here

# Logging
LOG_LEVEL=INFO

# Grafana
GRAFANA_USER=admin
GRAFANA_PASSWORD=secure_password_here

# Prometheus
PROMETHEUS_PORT=9090

# API
API_PORT=8000
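
Because .env holds production credentials, lock down its permissions and keep it out of version control (standard hygiene, not Blocklight-specific):
chmod 600 .env
grep -qx '.env' .gitignore || echo '.env' >> .gitignore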

3. Deploy with Observability Stack

# Start all services with observability stack (Vector, Loki, Grafana)
docker-compose --profile observability up -d

# Verify all services are healthy
docker-compose ps

# Check logs
docker-compose logs -f blocklight-core
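
Once the containers report healthy, you can also hit the HTTP health endpoint directly (the /health route and port 8000 come from the API settings in this guide; adjust if you changed API_PORT):
curl -fsS http://localhost:8000/health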

Resource Limits

Configure resource limits in docker-compose.yml:
services:
  blocklight-core:
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 1G
    restart: unless-stopped

High Availability Setup

Active-Passive Configuration

Deploy two Blocklight instances:
  1. Primary Instance: Actively processing transactions
  2. Standby Instance: Ready to take over on failure
Use a load balancer (HAProxy, Nginx) with health checks:
upstream blocklight {
    server blocklight-primary:50051 max_fails=3 fail_timeout=30s;
    server blocklight-standby:50051 backup;
}

server {
    listen 50051 http2;  # gRPC requires HTTP/2
    location / {
        grpc_pass grpc://blocklight;
    }
}

Health Checks

Configure comprehensive health checks:
healthcheck:
  test: ["CMD", "grpc_health_probe", "-addr=:50051"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
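
The same probe can be run by hand from the host to confirm the gRPC server is serving (assumes grpc_health_probe is installed locally):
grpc_health_probe -addr=localhost:50051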

Scaling Considerations

Vertical Scaling

For high-throughput chains:
  • CPU: 4-8 cores for Ethereum mainnet
  • Memory: 4-8GB RAM
  • Disk: SSD with 100GB+ for logs and findings
  • Network: 1Gbps+ for WebSocket connections

Horizontal Scaling

Deploy multiple instances per chain:
services:
  blocklight-ethereum-1:
    # Instance 1 configuration
    
  blocklight-ethereum-2:
    # Instance 2 configuration
Use different start_block values to partition workload.
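
One way to wire up the partitioning, assuming start_block is interpolated from a START_BLOCK environment variable (an assumption; adapt to however your config templating works):
# Run two partitions of the same chain from different start blocks
echo "START_BLOCK=19000000" > .env.ethereum-1
echo "START_BLOCK=19500000" > .env.ethereum-2

docker-compose -p blocklight-eth-1 --env-file .env.ethereum-1 up -d
docker-compose -p blocklight-eth-2 --env-file .env.ethereum-2 up -d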

Monitoring and Observability

Prometheus Metrics

Key metrics to monitor:
# Finding rate
rate(blocklight_findings_total[5m])

# Transaction processing rate
rate(blocklight_transactions_processed_total[5m])

# Evaluation latency
histogram_quantile(0.95, rate(blocklight_evaluation_duration_seconds_bucket[5m]))

# Event bus queue size
blocklight_event_bus_queue_size

# Memory usage
process_resident_memory_bytes{job="blocklight"}

Grafana Dashboards

Import the pre-configured dashboard:
  1. Open Grafana at http://localhost:3000
  2. Navigate to Dashboards → Import
  3. Upload config/grafana/dashboards/blocklight-detections.json
  4. Select Loki and Prometheus data sources

Alerting Rules

Configure Prometheus alerts:
groups:
  - name: blocklight
    interval: 30s
    rules:
      - alert: HighFindingRate
        expr: rate(blocklight_findings_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High finding rate detected"
          
      - alert: EventBusQueueFull
        expr: blocklight_event_bus_queue_size > 8000
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Event bus queue near capacity"
          
      - alert: NoTransactionsProcessed
        expr: rate(blocklight_transactions_processed_total[5m]) == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "No transactions processed in 10 minutes"

Backup and Disaster Recovery

Configuration Backup

Backup critical files daily:
#!/bin/bash
# backup.sh
BACKUP_DIR="/backups/blocklight/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Backup configuration
cp config/config.yaml "$BACKUP_DIR/"
cp .env "$BACKUP_DIR/"

# Backup rules
tar -czf "$BACKUP_DIR/rules.tar.gz" rules/

# Backup findings (last 7 days only; assumes GNU find/tar in the container)
docker exec blocklight-core sh -c \
  'find /app/data/findings -type f -mtime -7 | tar -czf - -T -' > "$BACKUP_DIR/findings.tar.gz"

# Keep a "latest" symlink so the recovery steps below can reference it
ln -sfn "$BACKUP_DIR" /backups/blocklight/latest
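
To run the script daily, a crontab entry such as the following works (the install and log paths are assumptions):
# Daily at 02:00; append output to a log for auditing
0 2 * * * /opt/blocklight/backup.sh >> /var/log/blocklight-backup.log 2>&1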

Disaster Recovery Plan

  1. RTO (Recovery Time Objective): < 15 minutes
  2. RPO (Recovery Point Objective): < 1 hour
Recovery Steps:
# 1. Restore configuration
cp /backups/blocklight/latest/config.yaml config/
cp /backups/blocklight/latest/.env .

# 2. Restore rules
tar -xzf /backups/blocklight/latest/rules.tar.gz

# 3. Restart services
docker-compose --profile observability up -d

# 4. Verify health
docker-compose ps
curl http://localhost:8000/health

Security Best Practices

Network Security

# docker-compose.yml
networks:
  blocklight-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16

  • Use internal networks for service communication
  • Expose only necessary ports
  • Implement firewall rules
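
As a concrete example of the firewall point, a minimal ufw policy might look like this (ports taken from this guide; SSH access and the choice of ufw are assumptions):
ufw default deny incoming
ufw allow 22/tcp      # SSH management access (assumed)
ufw allow 8000/tcp    # Blocklight API
ufw allow 3000/tcp    # Grafana
ufw enable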

Secrets Management

Use Docker secrets or external secret managers:
secrets:
  alchemy_api_key:
    external: true

services:
  blocklight-core:
    secrets:
      - alchemy_api_key
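
An external secret must exist before the stack starts; with Docker Swarm mode (which external secrets require) it can be created from an environment variable like this:
printf '%s' "$ALCHEMY_API_KEY" | docker secret create alchemy_api_key -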

API Authentication

Enable authentication for the gRPC API:
security:
  enabled: true
  api_key: ${API_KEY}
  tls:
    enabled: true
    cert_file: /certs/server.crt
    key_file: /certs/server.key
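
For a quick test of the TLS settings, a self-signed certificate can be generated with openssl (use CA-issued certificates in production; the CN is a placeholder):
mkdir -p certs
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -subj "/CN=blocklight" \
  -keyout certs/server.key -out certs/server.crt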

Performance Tuning

Optimize Configuration

# High-throughput configuration
go_core:
  batch_size: 200
  channel_buffer: 20000
  connection_pool_size: 100

chains:
  ethereum:
    workers: 16
    batch_size: 200

Database Optimization

For Loki:
limits_config:
  ingestion_rate_mb: 20
  max_streams_per_user: 20000
  max_global_streams_per_user: 20000

Cache Configuration

analysis:
  contract:
    cache_ttl_seconds: 3600
  transaction:
    cache_ttl_seconds: 300

Troubleshooting

Common Issues

Issue: Event channel full
# Increase buffer size
go_core:
  channel_buffer: 20000

Issue: High memory usage
# Reduce cache TTL and max findings
storage:
  max_findings: 10000

analysis:
  contract:
    cache_ttl_seconds: 1800

Issue: Slow transaction processing
# Increase workers and batch size
chains:
  ethereum:
    workers: 16
    batch_size: 200

Debug Mode

Enable debug logging temporarily:
docker-compose exec blocklight-core \
  /app/blocklight start --config /app/config/config.yaml --log-level DEBUG

Maintenance

Rolling Updates

# 1. Pull latest image
docker-compose pull blocklight-core

# 2. Restart with zero downtime (if using multiple instances)
docker-compose up -d --no-deps --scale blocklight-core=2
docker-compose up -d --no-deps --scale blocklight-core=1
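
Between the scale-up and scale-down steps, wait for the new replica to report healthy; a sketch using the HTTP health endpoint from this guide:
until curl -fsS http://localhost:8000/health >/dev/null; do
  echo "waiting for healthy replica..."
  sleep 5
done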

Log Rotation

Configure log rotation:
logging:
  file:
    enabled: true
    max_size_mb: 100
    max_backups: 30
    max_age_days: 90

Database Maintenance

Compact Loki data:
docker-compose exec loki /usr/bin/loki \
  -config.file=/etc/loki/local-config.yaml \
  -target=compactor

Cost Optimization

RPC Provider Optimization

  • Use caching to reduce RPC calls
  • Implement request batching
  • Consider running your own node for high volumes

Resource Right-Sizing

Monitor actual usage and adjust:
# Check resource usage
docker stats blocklight-core

# Adjust limits based on actual usage

Support and Monitoring

Health Endpoints

  • Basic Health: GET /health
  • Detailed Health: GET /health/detailed
  • Metrics: GET /metrics
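
Quick checks against each endpoint (port 8000 per the API settings in this guide; jq is assumed for pretty-printing):
curl -s http://localhost:8000/health
curl -s http://localhost:8000/health/detailed | jq .
curl -s http://localhost:8000/metrics | head -n 20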

Logging

All logs are structured JSON for easy parsing:
{
  "level": "info",
  "timestamp": "2024-01-15T10:30:00Z",
  "message": "Finding detected",
  "rule": "high_value_transfer",
  "severity": "CRITICAL",
  "chain": "ethereum",
  "tx_hash": "0x..."
}
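
Because the format is one JSON object per line, it pipes cleanly into jq. For example, to tail only CRITICAL findings (log path from the configuration above; jq assumed on the host):
docker-compose exec -T blocklight-core tail -f /app/logs/blocklight.log \
  | jq -c 'select(.severity == "CRITICAL")'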
