Transport Timeout
Severity:
High
Elasticsearch Version:
8.5.0
Problem
Transport timeout exceptions due to slow or unstable cluster communication
Root Cause
Network latency or instability affecting inter-node communication, leading to increased response times and timeouts
How to Detect
Symptoms
- Transport timeout exceptions in logs (e.g., ReceiveTimeoutTransportException or ConnectTransportException)
- High latency in cluster communication
- Increased request latency and failures
Commands
curl -XGET 'localhost:9200/_cluster/health?pretty'
curl -XGET 'localhost:9200/_nodes/stats/transport'
curl -XGET 'localhost:9200/_cluster/stats'
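When these commands run from a script, it helps to pull out just the fields that change when nodes drop off the cluster. The following is a minimal sketch, not an official tool: the helper names health_status and health_number are hypothetical, and it assumes the flat JSON returned by _cluster/health, read with sed rather than a JSON parser.

```shell
# Hypothetical helpers: read a _cluster/health response on stdin.

# Print the cluster status (green, yellow or red).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Print a numeric field by name, e.g. number_of_nodes.
health_number() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}
```

Example use: curl -s 'localhost:9200/_cluster/health' | health_number number_of_nodes. A node count below the expected cluster size, together with timeout exceptions in the logs, suggests nodes are being dropped rather than merely responding slowly.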
Remediation Steps
- Verify network connectivity and latency between nodes
- Check cluster logs for network errors or dropped connections
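- Increase the transport connect timeout in elasticsearch.yml ('transport.connect_timeout', default 30s; the older name 'transport.tcp.connect_timeout' was removed in 8.0)
- Restart affected nodes once network issues are resolved, so they rejoin the cluster and pick up any elasticsearch.yml changes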
- Monitor cluster health and communication stability post-remediation
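Before restarting a node, it can be worth confirming which timeout setting name its configuration file actually uses, since the pre-8.0 name 'transport.tcp.connect_timeout' is no longer accepted on 8.x. The sketch below is a hypothetical helper (check_connect_timeout is not part of Elasticsearch) and assumes the setting is written as a flat dotted key on its own line, not as nested YAML.

```shell
# Hypothetical helper: report how the connect timeout is set in an elasticsearch.yml.
# Prints one of: legacy, current, default.
check_connect_timeout() {
  config_file="$1"
  if grep -Eq '^[[:space:]]*transport\.tcp\.connect_timeout[[:space:]]*:' "$config_file"; then
    echo "legacy"    # pre-8.0 name; rename it to transport.connect_timeout
  elif grep -Eq '^[[:space:]]*transport\.connect_timeout[[:space:]]*:' "$config_file"; then
    echo "current"   # 8.x name is present
  else
    echo "default"   # not set (or commented out); the built-in default applies
  fi
}
```

Example use: check_connect_timeout /etc/elasticsearch/elasticsearch.yml (the path is an assumption and depends on how the node was installed).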
Prevention
- Implement network redundancy and quality of service (QoS) policies
- Regularly monitor network latency and node health
- Configure appropriate timeout settings based on network performance
- Ensure cluster nodes are geographically close or connected via reliable links
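Regular latency monitoring can start with something as simple as timing a request to each node and comparing it to a threshold. This is a rough sketch under stated assumptions: probe_seconds and is_slow are hypothetical helper names, the 10-second cap is an arbitrary choice, and the probe measures HTTP round-trip time on the REST port, which is only a proxy for latency on the transport layer between nodes.

```shell
# Hypothetical helpers for a cron-style latency check.

# Time one request to a node's base URL; prints curl's time_total in seconds.
probe_seconds() {
  curl -s -o /dev/null -m 10 -w '%{time_total}' "$1/_cluster/health"
}

# Succeed (exit 0) when the measured time exceeds the limit, both in seconds.
is_slow() {
  awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t + 0 > limit + 0) }'
}
```

Example use: t=$(probe_seconds http://localhost:9200); is_slow "$t" 0.5 && echo "slow response: ${t}s". The 0.5 second limit is a placeholder; a sensible value depends on the latency the network normally shows.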
Production Example
curl -XGET 'localhost:9200/_cluster/health?pretty'