## Overview

This guide covers production-grade deployment of Blocklight with enterprise features including high availability, horizontal scaling, comprehensive monitoring, and disaster recovery.
## Pre-Deployment Checklist

Before deploying to production, ensure you have:

- Production RPC credentials (e.g. Alchemy, plus a backup provider such as Infura)
- A reviewed `config/config.yaml` and `.env` (see below)
- Resource limits sized for your target chains
- The observability stack (Vector, Loki, Grafana) and alerting rules in place
- A tested backup and disaster recovery procedure
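Much of this checklist can be automated. A minimal preflight sketch, assuming the file and variable names used elsewhere in this guide (adjust the lists to your setup):

```shell
#!/bin/sh
# preflight.sh — minimal pre-deployment sanity check (sketch).
preflight() {
  rc=0
  # required environment variables
  for var in ALCHEMY_API_KEY GRAFANA_PASSWORD; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing environment variable: $var" >&2
      rc=1
    fi
  done
  # required files
  for f in config/config.yaml .env; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      rc=1
    fi
  done
  return $rc
}
```

Run it before `docker-compose up` and abort the deploy on a non-zero exit code.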
## Docker Compose Production Deployment

### 1. Configuration

Create a production-ready `config.yaml`:
```yaml
chains:
  ethereum:
    enabled: true
    rpc_url: https://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}
    ws_url: wss://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}
    workers: 8
    batch_size: 100
    start_block: latest

go_core:
  port: 50051
  batch_size: 100
  channel_buffer: 10000  # Higher for production
  max_condition_depth: 20
  connection_pool_size: 50

storage:
  max_findings: 50000

logging:
  level: INFO
  format: json
  console:
    enabled: true
  file:
    enabled: true
    path: /app/logs/blocklight.log
    max_size_mb: 100
    max_backups: 10
```
### 2. Environment Variables

Create `.env` for production:
```bash
# API Keys
ALCHEMY_API_KEY=your_production_key_here
INFURA_API_KEY=your_backup_key_here

# Logging
LOG_LEVEL=INFO

# Grafana
GRAFANA_USER=admin
GRAFANA_PASSWORD=secure_password_here

# Prometheus
PROMETHEUS_PORT=9090

# API
API_PORT=8000
```
### 3. Deploy with Observability Stack

```bash
# Start all services with the observability stack (Vector, Loki, Grafana)
docker-compose --profile observability up -d

# Verify all services are healthy
docker-compose ps

# Check logs
docker-compose logs -f blocklight-core
```
### Resource Limits

Configure resource limits in `docker-compose.yml`:
```yaml
services:
  blocklight-core:
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 1G
    restart: unless-stopped
```
## High Availability Setup

### Active-Passive Configuration

Deploy two Blocklight instances:

- **Primary instance**: actively processing transactions
- **Standby instance**: ready to take over on failure

Use a load balancer (HAProxy, Nginx) with health checks:
```nginx
upstream blocklight {
    server blocklight-primary:50051 max_fails=3 fail_timeout=30s;
    server blocklight-standby:50051 backup;
}

server {
    # gRPC requires HTTP/2 on the listening socket
    listen 50051 http2;

    location / {
        grpc_pass grpc://blocklight;
    }
}
```
### Health Checks

Configure comprehensive health checks:

```yaml
healthcheck:
  test: ["CMD", "grpc_health_probe", "-addr=:50051"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```
## Scaling Considerations

### Vertical Scaling

For high-throughput chains:

- **CPU**: 4-8 cores for Ethereum mainnet
- **Memory**: 4-8 GB RAM
- **Disk**: SSD with 100 GB+ for logs and findings
- **Network**: 1 Gbps+ for WebSocket connections
### Horizontal Scaling

Deploy multiple instances per chain:

```yaml
services:
  blocklight-ethereum-1:
    # Instance 1 configuration
  blocklight-ethereum-2:
    # Instance 2 configuration
```

Use different `start_block` values to partition the workload.
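For example, one sketch of partitioning, assuming each instance mounts its own `config.yaml` (the block number is illustrative):

```yaml
# blocklight-ethereum-1: backfill a historical range
chains:
  ethereum:
    start_block: 19000000
---
# blocklight-ethereum-2: follow the chain head
chains:
  ethereum:
    start_block: latest
```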
## Monitoring and Observability

### Prometheus Metrics

Key metrics to monitor:

```promql
# Finding rate
rate(blocklight_findings_total[5m])

# Transaction processing rate
rate(blocklight_transactions_processed_total[5m])

# Evaluation latency (p95; quantiles are computed over the histogram buckets)
histogram_quantile(0.95, rate(blocklight_evaluation_duration_seconds_bucket[5m]))

# Event bus queue size
blocklight_event_bus_queue_size

# Memory usage
process_resident_memory_bytes{job="blocklight"}
```
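These queries can also be run from the command line against Prometheus's standard instant-query API; a small helper sketch (the host and port assume the `.env` above):

```shell
# prom_query: URL-encode a PromQL expression and send it to the
# Prometheus instant-query endpoint (/api/v1/query).
prom_query() {
  curl -fsS --get "http://localhost:${PROMETHEUS_PORT:-9090}/api/v1/query" \
       --data-urlencode "query=$1"
}

# usage: prom_query 'rate(blocklight_findings_total[5m])'
```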
### Grafana Dashboards

Import the pre-configured dashboard:

1. Open Grafana at `http://localhost:3000`
2. Navigate to **Dashboards → Import**
3. Upload `config/grafana/dashboards/blocklight-detections.json`
4. Select the Loki and Prometheus data sources
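Dashboards can also be provisioned declaratively instead of imported by hand, using Grafana's standard file-based provisioning format (the paths here are assumptions to adapt to your layout):

```yaml
# e.g. mounted at /etc/grafana/provisioning/dashboards/blocklight.yaml
apiVersion: 1
providers:
  - name: blocklight
    type: file
    options:
      # directory the dashboard JSON files are mounted into
      path: /var/lib/grafana/dashboards
```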
### Alerting Rules

Configure Prometheus alerts:

```yaml
groups:
  - name: blocklight
    interval: 30s
    rules:
      - alert: HighFindingRate
        expr: rate(blocklight_findings_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High finding rate detected"

      - alert: EventBusQueueFull
        expr: blocklight_event_bus_queue_size > 8000
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Event bus queue near capacity"

      - alert: NoTransactionsProcessed
        expr: rate(blocklight_transactions_processed_total[5m]) == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "No transactions processed in 10 minutes"
```
## Backup and Disaster Recovery

### Configuration Backup

Back up critical files daily:

```bash
#!/bin/bash
# backup.sh
set -euo pipefail

BACKUP_DIR="/backups/blocklight/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Backup configuration
cp config/config.yaml "$BACKUP_DIR/"
cp .env "$BACKUP_DIR/"

# Backup rules
tar -czf "$BACKUP_DIR/rules.tar.gz" rules/

# Backup findings
docker exec blocklight-core tar -czf - /app/data/findings/ > "$BACKUP_DIR/findings.tar.gz"
```
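To run the script daily, schedule it with cron (the install and log paths are assumptions):

```cron
# crontab entry: run the backup every day at 02:00
0 2 * * * /opt/blocklight/backup.sh >> /var/log/blocklight-backup.log 2>&1
```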
### Disaster Recovery Plan

- **RTO (Recovery Time Objective)**: < 15 minutes
- **RPO (Recovery Point Objective)**: < 1 hour

Recovery steps:

```bash
# 1. Restore configuration
cp /backups/blocklight/latest/config.yaml config/
cp /backups/blocklight/latest/.env .

# 2. Restore rules
tar -xzf /backups/blocklight/latest/rules.tar.gz

# 3. Restart services
docker-compose --profile observability up -d

# 4. Verify health
docker-compose ps
curl http://localhost:8000/health
```
## Security Best Practices

### Network Security

```yaml
# docker-compose.yml
networks:
  blocklight-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
```

- Use internal networks for service communication
- Expose only necessary ports
- Implement firewall rules
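For example, to expose the HTTP API to the host only (a sketch; the port assumes the `.env` above):

```yaml
services:
  blocklight-core:
    ports:
      # bind the HTTP API to the loopback interface only
      - "127.0.0.1:8000:8000"
```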
### Secrets Management

Use Docker secrets or external secret managers:

```yaml
secrets:
  alchemy_api_key:
    external: true

services:
  blocklight-core:
    secrets:
      - alchemy_api_key
```
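Docker mounts each secret as a file under `/run/secrets/`. A sketch of reading one into the environment variable the config expects (the `SECRETS_DIR` override is for illustration, so the helper can be exercised outside a container):

```shell
# load_secret: print the contents of a named Docker secret file.
SECRETS_DIR="${SECRETS_DIR:-/run/secrets}"

load_secret() {
  cat "$SECRETS_DIR/$1"
}

# e.g. in an entrypoint script:
# export ALCHEMY_API_KEY="$(load_secret alchemy_api_key)"
```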
### API Authentication

Enable authentication for the gRPC API:

```yaml
security:
  enabled: true
  api_key: ${API_KEY}
  tls:
    enabled: true
    cert_file: /certs/server.crt
    key_file: /certs/server.key
```
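For testing, a self-signed certificate and key can be generated with standard OpenSSL (use CA-issued certificates in production; the subject name is illustrative):

```shell
# generate a self-signed certificate and key for TLS testing
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout server.key -out server.crt \
  -days 365 -subj "/CN=blocklight"
```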
## Optimize Configuration

```yaml
# High-throughput configuration
go_core:
  batch_size: 200
  channel_buffer: 20000
  connection_pool_size: 100

chains:
  ethereum:
    workers: 16
    batch_size: 200
```
## Database Optimization

For Loki:

```yaml
limits_config:
  ingestion_rate_mb: 20
  max_streams_per_user: 20000
  max_global_streams_per_user: 20000
```
## Cache Configuration

```yaml
analysis:
  transaction:
    cache_ttl_seconds: 300
```
## Troubleshooting

### Common Issues

**Issue: Event channel full**

```yaml
# Increase buffer size
go_core:
  channel_buffer: 20000
```

**Issue: High memory usage**

```yaml
# Reduce cache TTL and max findings
storage:
  max_findings: 10000

analysis:
  transaction:
    cache_ttl_seconds: 180
```

**Issue: Slow transaction processing**

```yaml
# Increase workers and batch size
chains:
  ethereum:
    workers: 16
    batch_size: 200
```
### Debug Mode

Enable debug logging temporarily:

```bash
docker-compose exec blocklight-core \
  /app/blocklight start --config /app/config/config.yaml --log-level DEBUG
```
## Maintenance

### Rolling Updates

```bash
# 1. Pull the latest image
docker-compose pull blocklight-core

# 2. Restart with zero downtime (if running multiple instances)
docker-compose up -d --no-deps --scale blocklight-core=2
docker-compose up -d --no-deps --scale blocklight-core=1
```
### Log Rotation

Configure log rotation:

```yaml
logging:
  file:
    enabled: true
    max_size_mb: 100
    max_backups: 30
    max_age_days: 90
```
### Database Maintenance

Compact Loki data:

```bash
docker-compose exec loki /usr/bin/loki \
  -config.file=/etc/loki/local-config.yaml \
  -target=compactor
```
## Cost Optimization

### RPC Provider Optimization

- Use caching to reduce RPC calls
- Implement request batching
- Consider running your own node for high volumes
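Request batching here refers to the standard JSON-RPC 2.0 batch form, where several calls share one HTTP request and one round trip; the methods shown are illustrative:

```json
[
  { "jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": [] },
  { "jsonrpc": "2.0", "id": 2, "method": "eth_chainId", "params": [] }
]
```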
### Resource Right-Sizing

Monitor actual usage and adjust:

```bash
# Check resource usage
docker stats blocklight-core

# Adjust limits based on actual usage
```
## Support and Monitoring

### Health Endpoints

- **Basic health**: `GET /health`
- **Detailed health**: `GET /health/detailed`
- **Metrics**: `GET /metrics`
### Logging

All logs are structured JSON for easy parsing:

```json
{
  "level": "info",
  "timestamp": "2024-01-15T10:30:00Z",
  "message": "Finding detected",
  "rule": "high_value_transfer",
  "severity": "CRITICAL",
  "chain": "ethereum",
  "tx_hash": "0x..."
}
```
## Next Steps