Cluster Red Unassigned Shards

Severity: Critical
Elasticsearch Version: 8.5.0

Problem

Cluster health is RED because one or more primary shards remain unassigned after a node failure

Root Cause

A node failed while holding primary shards that had no in-sync replica copy on another node, so the cluster has no copy to promote and those primaries stay unassigned

How to Detect

Symptoms

  • Cluster health status is RED
  • Primary shards (prirep = p) shown as UNASSIGNED in _cat/shards
  • Elasticsearch logs showing a node leaving the cluster, followed by shard allocation failures

Commands

curl -X GET "localhost:9200/_cluster/health"
curl -X GET "localhost:9200/_cat/shards?v"
curl -X GET "localhost:9200/_cluster/allocation/explain"
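On a cluster with many indices the full _cat/shards listing is noisy. The following is a minimal sketch for narrowing it to unassigned primaries only. The function name unassigned_primaries and the ES_URL variable are illustrative assumptions, not part of Elasticsearch, and the filter assumes the column selection shown in the usage comment.

```shell
# Hypothetical helper: keep only unassigned primary shards.
# Expects _cat/shards output on stdin with these columns, in this order:
#   index shard prirep state unassigned.reason
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2, $5 }'
}

# Usage (ES_URL is an assumed variable, e.g. localhost:9200):
#   curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | unassigned_primaries
```

Each output line holds the index name, shard number, and the reason the shard is unassigned (for example NODE_LEFT), which is usually enough to decide which shard to pass to _cluster/allocation/explain.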

Remediation Steps

  1. Identify unassigned primary shards using _cat/shards
  2. Check shard allocation explanations with _cluster/allocation/explain
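  3. Restore the failed node if possible so its shard copies rejoin the cluster, then retry failed allocations with POST /_cluster/reroute?retry_failed=true
  4. If no valid copy can be recovered, manually allocate the primary with the cluster reroute API using allocate_stale_primary or allocate_empty_primary; both require "accept_data_loss": true and can lose data
  5. Verify that cluster health returns to YELLOW or GREEN after the reroute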
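The non-destructive part of the steps above (retry, then verify) could be scripted roughly as follows. This is a sketch under assumptions: the function names and the ES_URL variable are hypothetical, and retry_failed only helps when allocation was attempted and gave up after repeated failures, not when no shard copy exists at all.

```shell
ES_URL="${ES_URL:-localhost:9200}"   # assumed endpoint; adjust for your cluster

# Ask the cluster to retry shards whose allocation previously failed
# and hit the retry limit.
retry_failed_allocations() {
  curl -s -X POST "$ES_URL/_cluster/reroute?retry_failed=true"
}

# Wait until the cluster is at least YELLOW (all primaries assigned)
# or the timeout expires. First argument: timeout, default 60s.
wait_for_primaries() {
  curl -s -X GET "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=${1:-60s}"
}

# Usage:
#   retry_failed_allocations && wait_for_primaries 120s
```

If the health response reports "timed_out": true, the primaries are still unassigned and the allocation explanation should be checked again before considering a manual allocation.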

Prevention

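  • Keep at least one replica per index (index.number_of_replicas of 1 or more) and enable shard allocation awareness so copies are spread across failure domains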
  • Ensure sufficient node capacity and disk space
  • Use shard allocation filtering and per-node shard limits (cluster.routing.allocation.total_shards_per_node) to avoid concentrating shards on a few nodes
  • Regularly monitor cluster health and shard status
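As one way to watch disk space, the check below flags nodes that are approaching the disk watermarks. It is a sketch only: the function name nodes_over_disk_threshold and the ES_URL variable are hypothetical, node names are assumed to contain no spaces, and the default threshold of 85 is chosen to match Elasticsearch's default low disk watermark of 85%.

```shell
# Hypothetical helper: print nodes whose disk usage is at or above a threshold.
# Expects _cat/allocation output on stdin with columns: node disk.percent
# First argument: threshold in percent (default 85).
nodes_over_disk_threshold() {
  awk -v limit="${1:-85}" '$2 ~ /^[0-9]+$/ && $2 + 0 >= limit + 0 { print $1, $2 }'
}

# Usage (ES_URL is an assumed variable, e.g. localhost:9200):
#   curl -s "$ES_URL/_cat/allocation?h=node,disk.percent" | nodes_over_disk_threshold 80
```

Rows without a numeric disk percentage, such as the UNASSIGNED summary row, are skipped.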

Production Example

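# Last resort: promotes a possibly out-of-date shard copy on node_name to primary; "accept_data_loss": true is required
curl -X POST "localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d '{"commands": [{"allocate_stale_primary": {"index": "my_index", "shard": 0, "node": "node_name", "accept_data_loss": true}}]}'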