Top Five Tips and Tricks to Manage Your Elasticsearch Cluster

Posted on April 16, 2020

Elasticsearch is the engine of choice for many companies looking for a distributed, RESTful search and analytics solution. At CloudHero, we deploy Elasticsearch on Kubernetes and use it quite a lot for storing and analyzing data. Using our hands-on experience, we compiled a cheat sheet containing the top five most helpful commands that you can use to manage your Elasticsearch cluster.

1. Elasticsearch 403 Forbidden

If you don’t delete Elasticsearch indices from time to time, you will eventually run out of disk space. By default, once a node reaches 85% disk utilization it hits the low disk watermark and Elasticsearch stops allocating new shards to it; at 95% it hits the flood-stage watermark and marks the indices read-only, so writes are rejected with the 403 Forbidden error from the title and log ingest stops. To fix this, you first need to delete some indices:

curl -XDELETE <elasticsearch_ip>:9200/<index_name>

or

curl -XDELETE <elasticsearch_ip>:9200/<index_prefix>*
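
To check how much disk space each node is actually using (and how much you have freed), the cat allocation API gives a quick per-node overview:

curl '<elasticsearch_ip>:9200/_cat/allocation?v'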

When enough disk space is released, we can go on and unblock all current indices:

curl -X PUT "<elasticsearch_ip>:9200/_settings" -H 'Content-Type: application/json' -d'
{
    "index": {
        "blocks": {
            "read_only_allow_delete": "false"
        }
    }
}
'

As a permanent fix, make sure you install Elasticsearch Curator, a tool that helps you curate (that is, manage) your indices, for example by deleting old ones on a schedule.
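
As a minimal sketch of such a cleanup job (assuming Curator’s single-action CLI, curator_cli, is installed; the index prefix and the 30-day retention below are placeholders to adapt):

curator_cli --host <elasticsearch_ip> delete_indices --filter_list '[
  {"filtertype": "pattern", "kind": "prefix", "value": "<index_prefix>-"},
  {"filtertype": "age", "source": "creation_date", "direction": "older", "unit": "days", "unit_count": 30}
]'

Run from cron or a Kubernetes CronJob, this deletes every index matching the prefix that is older than 30 days.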

2. How to Create Elasticsearch templates

Sometimes it is necessary to create Elasticsearch templates to override default settings. In the example below, we create an Elasticsearch template which sets the number of shards to 1 and the number of replicas to 0 (great settings for a single node cluster):

curl -X POST "<elasticsearch_ip>:9200/_template/default" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["*"],
  "order": -1,
  "settings": {
    "number_of_shards": "1",
    "number_of_replicas": "0"
  }
}
'
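
To double-check that the template was stored, you can fetch it back:

curl '<elasticsearch_ip>:9200/_template/default?pretty'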

Keep in mind that this template will only take effect for newly created indices. If you need to apply these settings to all your indices, you will need to reindex your data.

3. How to Reindex your data

As mentioned earlier, if you would like to apply template settings to old indices, you need to reindex your data. Reindexing copies the documents from an old index into a new index with a different name. The _reindex API only writes to a single destination index, so there is no single command that reindexes every index into its own new copy, but we can work around that with a simple loop:

#!/bin/bash
set -euo pipefail

# Split the output of _cat/indices on newlines only; column 3 is the index name.
IFS=$'\n'
arr=( $(curl -s <elasticsearch_ip>:9200/_cat/indices | tr -s ' ' | cut -d' ' -f3) )

# Reindex every index into a copy named <index>-reindexed.
for index in "${arr[@]}"; do
  curl -H 'Content-Type: application/json' -XPOST "<elasticsearch_ip>:9200/_reindex?slices=auto&refresh&pretty" -d'{
    "source": {
      "index": "'$index'"
    },
    "dest": {
      "index": "'$index'-reindexed"
    }
  }'
done

Notice the slices=auto parameter: it lets Elasticsearch decide how many slices to split each reindex into so the copy can run in parallel, typically one slice per shard of the source index.
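
Because each curl call blocks until its reindex finishes, it can be handy to watch progress from another terminal. The task management API lists all running reindex tasks:

curl '<elasticsearch_ip>:9200/_tasks?detailed=true&actions=*reindex&pretty'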

After reindexing your data, you are left with old indices which contain redundant data. One trick to remove them is:

#!/bin/bash
# Keep only the indices that do not end in -reindexed and are not Kibana indices.
IFS=$'\n'
arr=( $(curl -s <elasticsearch_ip>:9200/_cat/indices | tr -s ' ' | cut -d' ' -f3 | grep -v '\-reindexed$' | grep -v kibana) )
for index in "${arr[@]}"; do
  curl -H 'Content-Type: application/json' -XDELETE "<elasticsearch_ip>:9200/$index"
done

The script selects every index that does not have the -reindexed suffix and is not a Kibana index, then deletes it.
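
Before running it, you can preview exactly which indices exist (and sanity-check what would be deleted) with the cat indices API:

curl '<elasticsearch_ip>:9200/_cat/indices?v'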

4. How to Migrate Elasticsearch Data Using Snapshots

Migrating data from one Elasticsearch cluster to another can be pretty frustrating, but luckily we can create snapshots on the source cluster and restore them on the destination one. The easiest method we found is to use an AWS S3 bucket to hold the snapshots, but you need to make sure the repository-s3 plugin is installed on every node of both clusters.

To set up an AWS S3 bucket as a snapshot repository:

curl -XPUT 'http://<elasticsearch_ip>:9200/_snapshot/<repository_name>?pretty' -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "<s3_bucket>",
    "region": "<s3_region>"
  }
}'
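
Before taking a snapshot, it is worth confirming that every node can actually reach the bucket; the repository verification API does exactly that:

curl -XPOST 'http://<elasticsearch_ip>:9200/_snapshot/<repository_name>/_verify?pretty'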

To push a snapshot of your cluster to your configured snapshot repository:

curl -XPUT 'http://<elasticsearch_ip>:9200/_snapshot/<repository_name>/snapshot_1?wait_for_completion=true&pretty'

Notice the wait_for_completion=true flag: the request blocks until the snapshot finishes, so you know the upload is done the moment curl returns, without having to poll Elasticsearch for this information. Because the request runs in the foreground, you can also check from another terminal with the ps tool whether the curl is still running.
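
If you do prefer to poll, or you started the snapshot without wait_for_completion, the snapshot status API reports its progress:

curl 'http://<elasticsearch_ip>:9200/_snapshot/<repository_name>/snapshot_1/_status?pretty'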

On the destination cluster, after registering the same S3 repository there as well:

curl -X POST "<elasticsearch_ip>:9200/_snapshot/<repository_name>/snapshot_1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": true,
  "rename_pattern": ".kibana",
  "rename_replacement": "restored_.kibana"
}
'

Notice the rename_pattern: it restores the snapshot’s Kibana index under a new name so that the Kibana data already present on the destination cluster is not overwritten.
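
You can follow the restore as it happens with the cat recovery API, which lists shards while they are being restored:

curl '<elasticsearch_ip>:9200/_cat/recovery?v'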

5. How to fix GeoIP mapping from Logstash to Elasticsearch

When migrating to Logstash, we wanted to use its GeoIP capabilities for Nginx access logs. The geoip filter takes a field from your logs that contains IP addresses and uses its GeoIP database to add another field with geolocation information (country, city, coordinates, and so on).

Unfortunately, when using this feature, Elasticsearch’s dynamic mapping does not recognise the coordinates as geo data and maps the fields as plain numbers or strings instead. In turn, this makes it impossible to build map dashboards in Kibana, which require the geo_point data type. The easiest fix for this problem is to create another index template:

curl -X POST "<elasticsearch_ip>:9200/_template/<template_name>" -H 'Content-Type: application/json' -d'  
{
  "index_patterns" : "<index_prefix>-*",
  "version" : 60001,
  "settings" : {
    "index.refresh_interval" : "5s",
    "number_of_shards": 1
  },
  "mappings" : {
    "dynamic_templates" : [ {
      "message_field" : {
        "path_match" : "message",
        "match_mapping_type" : "string",
        "mapping" : {
          "type" : "text",
          "norms" : false
        }
      }
    }, {
      "string_fields" : {
        "match" : "*",
        "match_mapping_type" : "string",
        "mapping" : {
          "type" : "text", "norms" : false,
          "fields" : {
            "keyword" : { "type": "keyword", "ignore_above": 256 }
          }
        }
      }
    } ],
    "properties" : {
      "@timestamp": { "type": "date"},
      "@version": { "type": "keyword"},
      "geoip"  : {
        "dynamic": true,
        "properties" : {
          "ip": { "type": "ip" },
          "location" : { "type" : "geo_point" },
          "latitude" : { "type" : "half_float" },
          "longitude" : { "type" : "half_float" }
        }
      }
    }
  }
}
'

The index prefix is whatever you configured in Filebeat, and the settings block can be adjusted or skipped; the important part is the geoip section at the end, which maps geoip.location as a geo_point.
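
Once a new index has been created with this template in place, you can confirm that the field was mapped correctly with the field mapping API:

curl '<elasticsearch_ip>:9200/<index_prefix>-*/_mapping/field/geoip.location?pretty'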

Bonus: Create Kibana index patterns from the command line

The last missing piece of a fully automated logging pipeline is creating Kibana index patterns from the command line instead of from the UI.

Here is a script which can help you with this task:

#!/bin/bash
set -euo pipefail

# $1 is the Kibana host and port (e.g. <kibana_ip>:5601), $2 is the index prefix.
url="http://$1"
id="$2"
index_pattern="$id-*"
time_field="@timestamp"

# Create the index pattern through the Kibana saved objects API.
# curl -f makes the script fail on HTTP errors; the kbn-xsrf header is required by Kibana.
curl -f -XPOST -H "Content-Type: application/json" -H "kbn-xsrf: anything" \
  "$url/api/saved_objects/index-pattern/$id" \
  -d"{\"attributes\":{\"title\":\"$index_pattern\",\"timeFieldName\":\"$time_field\"}}"

I hope that all these tips will help you fix problems which may arise when using Elasticsearch.
