Transport Timeout
Severity: High
Elasticsearch Version: 8.5.0
Problem
Transport timeout exceptions during cluster communication
Root Cause
Network latency or instability causing slow node-to-node communication
How to Detect
Symptoms
- Transport timeout exceptions in logs (for example ReceiveTimeoutTransportException or ConnectTransportException)
- High latency in cluster health API responses
- Increased request failures
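To quantify the first symptom, the log can be scanned for transport-level exceptions. The following is a minimal sketch: the function name count_transport_timeouts and the log path in the usage comment are assumptions, so adjust them to the path.logs and cluster name of your installation.

```shell
# Count log lines that mention transport-level timeout or connection errors.
# Usage: count_transport_timeouts LOGFILE
count_transport_timeouts() {
  # grep -c prints the number of matching lines. It exits 1 when that
  # number is 0, so "|| true" keeps the function usable under "set -e".
  grep -cE 'ReceiveTimeoutTransportException|ConnectTransportException|NodeNotConnectedException' "$1" || true
}

# Example (assumed default log location for a package install):
# count_transport_timeouts /var/log/elasticsearch/elasticsearch.log
```

A count that keeps growing between runs suggests the problem is ongoing rather than a one-off blip.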
Commands
curl -XGET 'localhost:9200/_cluster/health?pretty'
curl -XGET 'localhost:9200/_cat/nodes?v&h=ip,heapPercent,ramPercent,load_1m,nodeRole,master'
curl -XGET 'localhost:9200/_nodes/stats/http,transport?pretty'
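To put a number on "high latency in cluster health API responses", the request can be timed with curl's built-in timing output. This is a sketch under assumptions: the helper names health_latency and classify_latency, the ES_URL variable, and the 1.0 second threshold in the example are all illustrative, not values recommended by Elasticsearch.

```shell
# Base URL of any node in the cluster (assumed; override via the environment).
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the total request time, in seconds, for the cluster health endpoint.
# Usage: health_latency BASE_URL [MAX_SECONDS]
health_latency() {
  curl -s -o /dev/null --max-time "${2:-10}" -w '%{time_total}\n' "$1/_cluster/health"
}

# Print SLOW if the measured time exceeds the threshold, otherwise OK.
# awk is used because POSIX sh has no floating-point comparison.
# Usage: classify_latency SECONDS THRESHOLD_SECONDS
classify_latency() {
  awk -v t="$1" -v max="$2" 'BEGIN { if (t + 0 > max + 0) print "SLOW"; else print "OK" }'
}

# Example:
# classify_latency "$(health_latency "$ES_URL")" 1.0
```

Running this from several nodes can help show whether the delay is cluster-wide or limited to particular hosts.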
Remediation Steps
- Verify network connectivity between nodes
- Check for network congestion or packet loss
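- If the network cannot be made faster, increase the transport timeout in elasticsearch.yml (e.g., transport.connect_timeout, default 30s; the older name transport.tcp.connect_timeout was removed in 8.0)
- Restart affected nodes one at a time after the network has stabilized (static settings in elasticsearch.yml only take effect after a restart)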
- Monitor cluster logs for recurring timeout errors
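The connectivity check in the first step can be sketched as a TCP probe of each peer's transport port. Assumptions here: the function name check_transport_port, the host names in the example, and port 9300 (the default start of the transport port range; use your transport.port value if it differs). The probe relies on bash's /dev/tcp pseudo-device and the coreutils timeout command.

```shell
# Check that a peer's transport port accepts TCP connections.
# Usage: check_transport_port HOST [PORT] [TIMEOUT_SECONDS]
check_transport_port() {
  host="$1"; port="${2:-9300}"; wait_s="${3:-3}"
  # Opening /dev/tcp/HOST/PORT in bash attempts a TCP connection;
  # timeout bounds the attempt so a silent packet drop does not hang the check.
  if timeout "$wait_s" bash -c 'exec 3<>/dev/tcp/"$1"/"$2"' _ "$host" "$port" 2>/dev/null; then
    echo "OK $host:$port"
  else
    echo "UNREACHABLE $host:$port"
    return 1
  fi
}

# Example (hypothetical host names), run from each node in turn:
# for h in es-node-1 es-node-2 es-node-3; do check_transport_port "$h" 9300; done
```

A successful probe only shows that the port is reachable. It says nothing about latency or packet loss, which still need tools such as ping or mtr.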
Prevention
- Implement network quality of service (QoS) policies
- Ensure low latency and reliable network links between nodes
- Configure appropriate timeout settings based on network performance
- Regularly monitor network health and cluster metrics
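When reviewing timeout settings, it is worth confirming that the configuration does not still use pre-8.0 setting names: transport.tcp.connect_timeout, transport.tcp.port, and transport.tcp.compress were removed in 8.0 in favor of transport.connect_timeout, transport.port, and transport.compress, and a node will not start with an unknown setting in elasticsearch.yml. A minimal sketch of such a check follows; the function name and the config path in the example are assumptions.

```shell
# List uncommented lines in a config file that use removed transport.tcp.* names.
# Usage: find_removed_transport_settings CONFIG_FILE
find_removed_transport_settings() {
  # -n prefixes each match with its line number. The pattern is anchored at
  # the start of the line (after optional whitespace), so commented-out
  # entries beginning with "#" are not reported.
  grep -nE '^[[:space:]]*transport\.tcp\.(connect_timeout|port|compress)[[:space:]]*:' "$1"
}

# Example (assumed default config location for a package install):
# find_removed_transport_settings /etc/elasticsearch/elasticsearch.yml
```

The function prints nothing and returns a non-zero status when no removed names are found.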
Production Example
curl -XGET 'localhost:9200/_cluster/health?pretty'