Clone and ReIndex perform similar operations in Elasticsearch, but there are some fundamental differences between them. Let us look at how the clone and reindex operations work.

What is a Clone operation?

A clone operation copies an existing index into a new index, where each original primary shard is cloned into a new primary shard in the new index. In other words, it creates a new index with the same properties and settings as the original index.

The following steps happen internally as part of the clone operation.

  • First, it creates a new target index with the same definition as the source index.
  • Then it hard-links segments from the source index into the target index. (If the file system doesn’t support hard-linking, all segments are copied into the new index, which is a much more time-consuming process.)
  • Finally, it recovers the target index as though it were a closed index which had just been re-opened.

Clone is useful when we need an exact copy of an index under a new name. The target index keeps the same number of primary shards, the same mapping, and the same settings as the source index.

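The API request for the clone operation is given below. Note that the source index must be blocked for writes (index.blocks.write set to true) before it can be cloned.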

POST /my_source_index/_clone/my_target_index
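As a rough illustration, the sketch below builds the two requests a clone typically involves: a settings update that blocks writes on the source index, followed by the clone call itself. It only prepares and prints the requests in console style; it does not contact a cluster. The helper names clone_requests and to_console are made up for this example and are not part of any Elasticsearch client.

```python
import json


def clone_requests(source, target):
    """Return the (method, path, body) steps for cloning `source` into `target`."""
    return [
        # The source index has to be blocked for writes before it is cloned.
        ("PUT", f"/{source}/_settings", {"settings": {"index.blocks.write": True}}),
        # The clone call itself; no request body is required.
        ("POST", f"/{source}/_clone/{target}", None),
    ]


def to_console(method, path, body):
    """Render one step in the console style used in this post."""
    lines = [f"{method} {path}"]
    if body is not None:
        lines.append(json.dumps(body, indent=2))
    return "\n".join(lines)


if __name__ == "__main__":
    for step in clone_requests("my_source_index", "my_target_index"):
        print(to_console(*step))
        print()
```

Once the clone has finished, the write block can be removed from either index with another settings update if writes are needed again.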

What is a ReIndex operation?

A reindex operation copies the documents of a source index and writes them to a target index. It copies only the data; it does not copy the index settings or mapping. We therefore need to create the target index up front with the required settings and mapping before running the reindex. The source and destination can be any pre-existing index, index alias, or data stream. However, the source and destination must be different.

A sample reindex request is given below.

POST _reindex
{
  "source": {
    "index": "source_index"
  },
  "dest": {
    "index": "target_index"
  }
}

ReIndex is suitable for cases that require changing the number of shards, the mapping, or other index settings. I usually perform a reindex to update the mapping.
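To make the "create the target first, then reindex" flow concrete, here is a minimal sketch that builds both requests: one that creates the target index with a different shard count and mapping, and one that copies the data. The function name reindex_with_new_layout, the shard count of 3, and the title field are assumptions chosen for illustration only; the sketch prints the requests and does not send them.

```python
import json


def reindex_with_new_layout(source, target, shards, properties):
    """Return the (method, path, body) steps for reindexing into a reshaped index."""
    # Step 1: create the target index up front with the desired settings and mapping.
    create_body = {
        "settings": {"number_of_shards": shards},
        "mappings": {"properties": properties},
    }
    # Step 2: copy the documents from the source into the new index.
    reindex_body = {
        "source": {"index": source},
        "dest": {"index": target},
    }
    return [
        ("PUT", f"/{target}", create_body),
        ("POST", "/_reindex", reindex_body),
    ]


if __name__ == "__main__":
    # Hypothetical example: 3 primary shards and a keyword-typed `title` field.
    steps = reindex_with_new_layout(
        "source_index", "target_index", 3, {"title": {"type": "keyword"}}
    )
    for method, path, body in steps:
        print(f"{method} {path}")
        print(json.dumps(body, indent=2))
        print()
```

The order matters here: if the target index does not exist when the reindex runs, Elasticsearch creates it with default settings and a dynamically inferred mapping, which defeats the purpose of the exercise.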

A reindex operation can be run in the background by adding the following query parameter to the request:

wait_for_completion=false
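When the parameter is set, the reindex call returns immediately with a task identifier instead of the final result, and progress can then be checked through the tasks API. The sketch below shows how the two request paths could be put together. The helper names and the sample task id are made up for illustration, and nothing is sent to a cluster.

```python
from urllib.parse import quote, urlencode


def background_reindex_path():
    """Path for a reindex request that returns without waiting for completion."""
    return "/_reindex?" + urlencode({"wait_for_completion": "false"})


def task_status_path(reindex_response):
    """Build the tasks API path from the body returned by a background reindex."""
    # The response body carries the identifier under the "task" key,
    # in the form "<node_id>:<task_number>".
    task_id = reindex_response["task"]
    return "/_tasks/" + quote(task_id, safe=":")


if __name__ == "__main__":
    print("POST " + background_reindex_path())
    # Hypothetical response body from the request above.
    sample_response = {"task": "node1:42"}
    print("GET " + task_status_path(sample_response))
```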

I hope this explanation is clear and useful.
