Overview

This guide covers production-grade deployment of Blocklight, including high availability, horizontal scaling, comprehensive monitoring, and disaster recovery.

Pre-Deployment Checklist

Before deploying to production, ensure you have:
  • Validated all detection rules with blocklight config check
  • Configured .env with production API keys
  • Set up monitoring infrastructure (Prometheus, Grafana, Loki)
  • Configured alerting channels (Slack, PagerDuty, etc.)
  • Planned backup and disaster recovery strategy
  • Reviewed security best practices
  • Load tested with expected transaction volume
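
These checks can be scripted into a single pre-flight gate. A minimal sketch, assuming .env lives in the working directory (the config check subcommand is the one listed above; the grep checks are illustrative):
#!/bin/bash
# preflight.sh — sketch of a pre-deployment gate
set -euo pipefail

# Validate all detection rules and configuration
blocklight config check

# Confirm production API keys are present in .env
for var in ALCHEMY_API_KEY INFURA_API_KEY; do
  grep -q "^${var}=" .env || { echo "missing ${var} in .env"; exit 1; }
done

echo "pre-flight checks passed"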

Docker Compose Production Deployment

1. Configuration

Create a production-ready config.yaml:
chains:
  ethereum:
    enabled: true
    rpc_url: https://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}
    ws_url: wss://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}
    workers: 8
    batch_size: 100
    start_block: latest

go_core:
  port: 50051
  batch_size: 100
  channel_buffer: 10000  # Higher for production
  max_condition_depth: 20
  connection_pool_size: 50

aggregation:
  enabled: true
  window_seconds: 60
  min_count: 3
  preserve_individual: false

storage:
  max_findings: 50000

logging:
  level: INFO
  format: json
  console:
    enabled: true
  file:
    enabled: true
    path: /app/logs/blocklight.log
    max_size_mb: 100
    max_backups: 10

2. Environment Variables

Create .env for production:
# API Keys
ALCHEMY_API_KEY=your_production_key_here
INFURA_API_KEY=your_backup_key_here

# Logging
LOG_LEVEL=INFO

# Grafana
GRAFANA_USER=admin
GRAFANA_PASSWORD=secure_password_here

# Prometheus
PROMETHEUS_PORT=9090

# API
API_PORT=8000
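
Because .env holds production credentials, lock down its permissions and keep it out of version control (standard hygiene, not Blocklight-specific):
chmod 600 .env
grep -qx '.env' .gitignore || echo '.env' >> .gitignore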

3. Deploy with Observability Stack

# Start all services with observability stack (Vector, Loki, Grafana)
docker-compose --profile observability up -d

# Verify all services are healthy
docker-compose ps

# Check logs
docker-compose logs -f blocklight-core
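
Once the containers report healthy, you can also hit the HTTP health endpoint directly (the /health route and port 8000 come from the API settings in this guide; adjust if you changed API_PORT):
curl -fsS http://localhost:8000/health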

Resource Limits

Configure resource limits in docker-compose.yml:
services:
  blocklight-core:
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 1G
    restart: unless-stopped

High Availability Setup

Active-Passive Configuration

Deploy two Blocklight instances:
  1. Primary Instance: Actively processing transactions
  2. Standby Instance: Ready to take over on failure
Use a load balancer (HAProxy, Nginx) with health checks:
upstream blocklight {
    server blocklight-primary:50051 max_fails=3 fail_timeout=30s;
    server blocklight-standby:50051 backup;
}

server {
    listen 50051 http2;  # gRPC requires HTTP/2
    location / {
        grpc_pass grpc://blocklight;
    }
}

Health Checks

Configure comprehensive health checks:
healthcheck:
  test: ["CMD", "grpc_health_probe", "-addr=:50051"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
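
The same probe can be run by hand from the host to confirm the gRPC server is serving (assumes grpc_health_probe is installed locally):
grpc_health_probe -addr=localhost:50051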

Scaling Considerations

Vertical Scaling

For high-throughput chains:
  • CPU: 4-8 cores for Ethereum mainnet
  • Memory: 4-8GB RAM
  • Disk: SSD with 100GB+ for logs and findings
  • Network: 1Gbps+ for WebSocket connections

Horizontal Scaling

Deploy multiple instances per chain:
services:
  blocklight-ethereum-1:
    # Instance 1 configuration
    
  blocklight-ethereum-2:
    # Instance 2 configuration
Use different start_block values to partition workload.
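
One way to wire up the partitioning, assuming start_block is interpolated from a START_BLOCK environment variable (an assumption; adapt to however your config templating works):
# Run two partitions of the same chain from different start blocks
echo "START_BLOCK=19000000" > .env.ethereum-1
echo "START_BLOCK=19500000" > .env.ethereum-2

docker-compose -p blocklight-eth-1 --env-file .env.ethereum-1 up -d
docker-compose -p blocklight-eth-2 --env-file .env.ethereum-2 up -d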

Monitoring and Observability

Prometheus Metrics

Key metrics to monitor:
# Finding rate
rate(blocklight_findings_total[5m])

# Transaction processing rate
rate(blocklight_transactions_processed_total[5m])

# Evaluation latency
histogram_quantile(0.95, rate(blocklight_evaluation_duration_seconds_bucket[5m]))

# Event bus queue size
blocklight_event_bus_queue_size

# Memory usage
process_resident_memory_bytes{job="blocklight"}

Grafana Dashboards

Import the pre-configured dashboard:
  1. Open Grafana at http://localhost:3000
  2. Navigate to Dashboards → Import
  3. Upload config/grafana/dashboards/blocklight-detections.json
  4. Select Loki and Prometheus data sources

Alerting Rules

Configure Prometheus alerts:
groups:
  - name: blocklight
    interval: 30s
    rules:
      - alert: HighFindingRate
        expr: rate(blocklight_findings_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High finding rate detected"
          
      - alert: EventBusQueueFull
        expr: blocklight_event_bus_queue_size > 8000
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Event bus queue near capacity"
          
      - alert: NoTransactionsProcessed
        expr: rate(blocklight_transactions_processed_total[5m]) == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "No transactions processed in 10 minutes"

Backup and Disaster Recovery

Configuration Backup

Backup critical files daily:
#!/bin/bash
# backup.sh
BACKUP_DIR="/backups/blocklight/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Backup configuration
cp config/config.yaml "$BACKUP_DIR/"
cp .env "$BACKUP_DIR/"

# Backup rules
tar -czf "$BACKUP_DIR/rules.tar.gz" rules/

# Backup findings (last 7 days only; assumes GNU find/tar in the container)
docker exec blocklight-core sh -c \
  'find /app/data/findings -type f -mtime -7 | tar -czf - -T -' > "$BACKUP_DIR/findings.tar.gz"

# Keep a "latest" symlink so the recovery steps below can reference it
ln -sfn "$BACKUP_DIR" /backups/blocklight/latest
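
To run the script daily, a crontab entry such as the following works (the install and log paths are assumptions):
# Daily at 02:00; append output to a log for auditing
0 2 * * * /opt/blocklight/backup.sh >> /var/log/blocklight-backup.log 2>&1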

Disaster Recovery Plan

  1. RTO (Recovery Time Objective): < 15 minutes
  2. RPO (Recovery Point Objective): < 1 hour
Recovery Steps:
# 1. Restore configuration
cp /backups/blocklight/latest/config.yaml config/
cp /backups/blocklight/latest/.env .

# 2. Restore rules
tar -xzf /backups/blocklight/latest/rules.tar.gz

# 3. Restart services
docker-compose --profile observability up -d

# 4. Verify health
docker-compose ps
curl http://localhost:8000/health

Security Best Practices

Network Security

# docker-compose.yml
networks:
  blocklight-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16

  • Use internal networks for service communication
  • Expose only necessary ports
  • Implement firewall rules
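
As a concrete example of the firewall point, a minimal ufw policy might look like this (ports taken from this guide; SSH access and the choice of ufw are assumptions):
ufw default deny incoming
ufw allow 22/tcp      # SSH management access (assumed)
ufw allow 8000/tcp    # Blocklight API
ufw allow 3000/tcp    # Grafana
ufw enable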

Secrets Management

Use Docker secrets or external secret managers:
secrets:
  alchemy_api_key:
    external: true

services:
  blocklight-core:
    secrets:
      - alchemy_api_key
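
An external secret must exist before the stack starts; with Docker Swarm mode (which external secrets require) it can be created from an environment variable like this:
printf '%s' "$ALCHEMY_API_KEY" | docker secret create alchemy_api_key -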

API Authentication

Enable authentication for the gRPC API:
security:
  enabled: true
  api_key: ${API_KEY}
  tls:
    enabled: true
    cert_file: /certs/server.crt
    key_file: /certs/server.key
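
For a quick test of the TLS settings, a self-signed certificate can be generated with openssl (use CA-issued certificates in production; the CN is a placeholder):
mkdir -p certs
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -subj "/CN=blocklight" \
  -keyout certs/server.key -out certs/server.crt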

Performance Tuning

Optimize Configuration

# High-throughput configuration
go_core:
  batch_size: 200
  channel_buffer: 20000
  connection_pool_size: 100

chains:
  ethereum:
    workers: 16
    batch_size: 200

Database Optimization

For Loki:
limits_config:
  ingestion_rate_mb: 20
  max_streams_per_user: 20000
  max_global_streams_per_user: 20000

Cache Configuration

analysis:
  contract:
    cache_ttl_seconds: 3600
  transaction:
    cache_ttl_seconds: 300

Troubleshooting

Common Issues

Issue: Event channel full
# Increase buffer size
go_core:
  channel_buffer: 20000

Issue: High memory usage
# Reduce cache TTL and max findings
storage:
  max_findings: 10000

analysis:
  contract:
    cache_ttl_seconds: 1800

Issue: Slow transaction processing
# Increase workers and batch size
chains:
  ethereum:
    workers: 16
    batch_size: 200

Debug Mode

Enable debug logging temporarily:
docker-compose exec blocklight-core \
  /app/blocklight start --config /app/config/config.yaml --log-level DEBUG

Maintenance

Rolling Updates

# 1. Pull latest image
docker-compose pull blocklight-core

# 2. Restart with zero downtime (if using multiple instances)
docker-compose up -d --no-deps --scale blocklight-core=2
docker-compose up -d --no-deps --scale blocklight-core=1
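
Between the scale-up and scale-down steps, wait for the new replica to report healthy; a sketch using the HTTP health endpoint from this guide:
until curl -fsS http://localhost:8000/health >/dev/null; do
  echo "waiting for healthy replica..."
  sleep 5
done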

Log Rotation

Configure log rotation:
logging:
  file:
    enabled: true
    max_size_mb: 100
    max_backups: 30
    max_age_days: 90

Database Maintenance

Compact Loki data:
docker-compose exec loki /usr/bin/loki \
  -config.file=/etc/loki/local-config.yaml \
  -target=compactor

Cost Optimization

RPC Provider Optimization

  • Use caching to reduce RPC calls
  • Implement request batching
  • Consider running your own node for high volumes

Resource Right-Sizing

Monitor actual usage and adjust:
# Check resource usage
docker stats blocklight-core

# Adjust limits based on actual usage

Support and Monitoring

Health Endpoints

  • Basic Health: GET /health
  • Detailed Health: GET /health/detailed
  • Metrics: GET /metrics
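
Quick checks against each endpoint (port 8000 per the API settings in this guide; jq is assumed for pretty-printing):
curl -s http://localhost:8000/health
curl -s http://localhost:8000/health/detailed | jq .
curl -s http://localhost:8000/metrics | head -n 20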

Logging

All logs are structured JSON for easy parsing:
{
  "level": "info",
  "timestamp": "2024-01-15T10:30:00Z",
  "message": "Finding detected",
  "rule": "high_value_transfer",
  "severity": "CRITICAL",
  "chain": "ethereum",
  "tx_hash": "0x..."
}
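
Because the format is one JSON object per line, it pipes cleanly into jq. For example, to tail only CRITICAL findings (log path from the configuration above; jq assumed on the host):
docker-compose exec -T blocklight-core tail -f /app/logs/blocklight.log \
  | jq -c 'select(.severity == "CRITICAL")'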
