Disk Watermark Exceeded
Severity: High
Elasticsearch Version: 7.17.0
Problem
Shard allocation is refused because a disk watermark has been exceeded.
Root Cause
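Disk usage on one or more data nodes exceeded the disk-based shard allocation watermarks, which Elasticsearch enforces to avoid exhausting disk space. Above the low watermark (default 85%), Elasticsearch stops allocating shards to the node, except primaries of newly created indices. Above the high watermark (default 90%), it allocates no shards to the node and tries to relocate shards away from it. At the flood-stage watermark (default 95%), every index with a shard on the node gets a read-only block (index.blocks.read_only_allow_delete).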
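To see which watermark values are in effect, the cluster settings API can return built-in defaults alongside any overrides. The helper below is a minimal sketch, assuming compact (non-pretty) JSON from `_cluster/settings?include_defaults=true&flat_settings=true`; the function name `show_watermarks` is hypothetical. Check whether each value appears under persistent, transient or defaults to tell overrides from built-in values.

```shell
# Hypothetical helper: print every disk watermark key/value pair found in the
# flat JSON returned by GET _cluster/settings?include_defaults=true&flat_settings=true.
# Reads the JSON on stdin and prints each match on its own line.
show_watermarks() {
  grep -o '"cluster\.routing\.allocation\.disk\.watermark\.[a-z_.]*" *: *"[^"]*"'
}

# Usage (assumes a node reachable on localhost:9200):
#   curl -s 'localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true' | show_watermarks
```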
How to Detect
Symptoms
- Elasticsearch cluster health shows red or yellow status
- Shard allocation failures in logs
- The allocation explain API reports unassigned shards with a NO decision from the disk_threshold decider
Commands
curl -X GET 'localhost:9200/_cluster/health?pretty'
curl -X GET 'localhost:9200/_cat/shards?v'
curl -X GET 'localhost:9200/_cluster/allocation/explain?pretty'
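The `_cat/shards` output can be narrowed to unassigned shards and the reason they became unassigned. A minimal sketch, assuming the `h=index,shard,prirep,state,unassigned.reason` column selection shown in the usage comment; `unassigned_shards` is a hypothetical helper name. The unassigned reason (for example NODE_LEFT) records why a shard became unassigned, not the disk decision itself, so the `_cluster/allocation/explain` call is still needed to confirm a disk watermark is the blocker.

```shell
# Hypothetical helper: keep the header row plus every row whose 4th column
# (state) is UNASSIGNED. Expects _cat/shards output with the columns
# index, shard, prirep, state, unassigned.reason (in that order) on stdin.
unassigned_shards() {
  awk 'NR == 1 || $4 == "UNASSIGNED"'
}

# Usage (assumes a node reachable on localhost:9200):
#   curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | unassigned_shards
```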
Remediation Steps
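- Identify nodes with high disk usage using the _cat/allocation API: curl -X GET 'localhost:9200/_cat/allocation?v'
- Free disk space by deleting indices that are no longer needed; deleting old snapshots only frees space on data nodes if the snapshot repository sits on the same disk
- As a short-term measure, raise the watermarks, keeping low no higher than high and high no higher than the flood-stage watermark (default 95%), for example: curl -X PUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"persistent": {"cluster.routing.allocation.disk.watermark.low": "90%", "cluster.routing.allocation.disk.watermark.high": "93%"}}'
- Elasticsearch already relocates shards away from nodes above the high watermark; to move a specific shard manually (the move is still checked by the allocation deciders, so the target node must have enough free disk): curl -X POST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{"commands": [{"move": {"index": "<index_name>", "shard": <shard_number>, "from_node": "<node_name>", "to_node": "<target_node>"}}]}'
- Monitor shard relocation and disk usage: curl -X GET 'localhost:9200/_cat/recovery?v&active_only=true' and curl -X GET 'localhost:9200/_cat/allocation?v'
- Once disk space is freed, reset the watermarks to their defaults by setting them to null: curl -X PUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"persistent": {"cluster.routing.allocation.disk.watermark.low": null, "cluster.routing.allocation.disk.watermark.high": null}}'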
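To shortlist nodes for cleanup or manual shard moves, the `_cat/allocation` output can be filtered by disk usage. A minimal sketch, assuming headerless output with the `h=node,disk.percent` column selection and node names without spaces; the helper name `nodes_over_threshold` and its default threshold of 85 (the default low watermark) are illustrative choices.

```shell
# Hypothetical helper: print "node disk.percent" for every node at or above a
# disk-usage threshold (default 85). Expects headerless _cat/allocation output
# with the columns node and disk.percent on stdin; assumes node names contain
# no spaces. Rows with an empty disk.percent (such as the UNASSIGNED row) are skipped.
nodes_over_threshold() {
  local threshold="${1:-85}"
  awk -v t="$threshold" '$2 != "" && $2 + 0 >= t + 0 { print $1, $2 }'
}

# Usage (assumes a node reachable on localhost:9200):
#   curl -s 'localhost:9200/_cat/allocation?h=node,disk.percent' | nodes_over_threshold 90
```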
Prevention
- Implement regular disk usage monitoring and alerts
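- Size disk watermarks to node capacity; on large disks, absolute free-space values (for example "100gb") may suit better than percentages, but percentages and byte values cannot be mixed across the watermark settings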
- Schedule routine cleanup of old indices and snapshots
- Use index lifecycle management (ILM) policies to automate data retention
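On 7.17, retention can be delegated to an ILM policy. A minimal sketch, assuming a hypothetical helper `ilm_policy_body` that emits a policy which rolls indices over in the hot phase and deletes them after a configurable age; the rollover thresholds (50gb per primary shard, 7d) are illustrative, not recommendations, and the policy name in the usage comment is hypothetical. The rollover action only applies to data streams or indices written through a rollover alias, and once an index has rolled over, the delete phase's min_age is counted from the rollover time.

```shell
# Hypothetical helper: print an ILM policy body that rolls indices over in the
# hot phase and deletes them once they reach the given age (default 30d).
# The rollover thresholds below are illustrative values only.
ilm_policy_body() {
  local delete_after="${1:-30d}"
  cat <<EOF
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "50gb", "max_age": "7d" }
        }
      },
      "delete": {
        "min_age": "${delete_after}",
        "actions": { "delete": {} }
      }
    }
  }
}
EOF
}

# Usage (assumes a node reachable on localhost:9200; "logs-retention" is a hypothetical policy name):
#   curl -X PUT 'localhost:9200/_ilm/policy/logs-retention' -H 'Content-Type: application/json' -d "$(ilm_policy_body 90d)"
```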
Production Example
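Conservative watermarks that stop allocation and start relocating shards earlier, leaving more headroom (low must not exceed high):
curl -X PUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"persistent": {"cluster.routing.allocation.disk.watermark.low": "80%", "cluster.routing.allocation.disk.watermark.high": "85%"}}'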
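Before applying custom percentages, it may help to check them against the ordering Elasticsearch enforces: low no higher than high, and high no higher than flood stage. A minimal sketch, assuming integer percentages; `validate_watermarks` is a hypothetical helper name, and unspecified arguments fall back to the 7.x defaults of 85, 90 and 95.

```shell
# Hypothetical helper: check that low <= high <= flood_stage for integer
# percentage watermarks. Missing arguments fall back to the 7.x defaults
# (85, 90, 95). Returns 0 if the ordering holds, 1 otherwise.
validate_watermarks() {
  local low="${1:-85}" high="${2:-90}" flood="${3:-95}"
  if [ "$low" -le "$high" ] && [ "$high" -le "$flood" ]; then
    return 0
  fi
  echo "invalid watermarks: low=${low}% high=${high}% flood_stage=${flood}%" >&2
  return 1
}

# Usage: run validate_watermarks 80 85 and only send the settings update if it succeeds.
```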