Elasticsearch Beyonder

Welcome to the Elasticsearch Beyonder project.

This project historically comes from the spring-elasticsearch project.

The goal of this project is to provide a simple Java library which helps to create indices, mappings, etc. when you start your application.

Versions

| elasticsearch-beyonder | elasticsearch | Release date |
|------------------------|---------------|--------------|
| 8.6-SNAPSHOT | 8.x | |
| 7.16 | 7.x | 2022-01-13 |
| 7.15 | 7.x | 2021-10-14 |
| 7.13.2 | 7.x | 2021-07-22 |
| 7.13.1 | 7.x | 2021-06-21 |
| 7.13 | 7.x | 2021-06-03 |
| 7.5 | 7.x | 2020-01-15 |
| 7.0 | 7.0 -> 7.x | 2019-04-04 |
| 6.5 | 6.5 -> 6.x | 2019-01-04 |
| 6.3 | 6.3 -> 6.4 | 2018-07-21 |
| 6.0 | 6.0 -> 6.2 | 2018-02-05 |
| 5.1 | 5.x, 6.x | 2017-07-12 |
| 5.0 | 5.x, 6.x | 2017-07-11 |
| 2.1.0 | 2.0, 2.1 | 2015-11-25 |
| 2.0.0 | 2.0 | 2015-10-24 |
| 1.5.0 | 1.5 | 2015-03-27 |
| 1.4.1 | 1.4 | 2015-03-02 |
| 1.4.0 | 1.4 | 2015-02-27 |

Documentation

  • For 8.x elasticsearch versions, you are reading the latest documentation.
  • For 7.x elasticsearch versions, look at es-7.x branch.
  • For 6.x elasticsearch versions, look at es-6.x branch.
  • For 5.x elasticsearch versions, look at es-5.x branch.
  • For 2.x elasticsearch versions, look at es-2.1 branch.

Build Status

(Maven Central and Build Status badges)

Release notes

8.6

  • Update project to Elasticsearch 8.6.2.
  • Remove the deprecated Transport Client
  • _pipeline dir is not supported anymore. Use _pipelines dir.
  • _template and _templates dir are not supported anymore. Use _index_templates and _component_templates dirs.
  • method start(RestClient client, String root, boolean merge, boolean force) is now start(RestClient client, String root, boolean force).
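
As a quick illustration of the signature change, here is a sketch of the migration, using the default elasticsearch root and force set to false:

// 7.x signature, the merge parameter has been removed in 8.x:
// ElasticsearchBeyonder.start(client, "elasticsearch", true, false);
// 8.x signature:
ElasticsearchBeyonder.start(client, "elasticsearch", false);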

7.16

  • Update Log4J (optional) dependency to 2.17.1.

7.15

  • Add support for Index Lifecycles.

7.13.2

  • Added back support for Java 8

7.13.1

  • The _pipeline dir has been deprecated in favor of the _pipelines dir.
  • The _template dir has been deprecated in favor of the _templates dir.
  • The force parameter is not applied to pipelines anymore, so pipelines are always updated.
  • The force parameter is not applied to templates, component templates and index templates anymore, so they are always updated.
  • The method start(RestClient client, String root, boolean merge, boolean force) is now deprecated as the merge parameter is not used anymore. Use the start(RestClient client, String root, boolean force) method instead.
  • Support for the aliases API has been added.

Getting Started

Maven dependency

Import elasticsearch-beyonder in your project pom.xml file:

<dependency>
  <groupId>fr.pilato.elasticsearch</groupId>
  <artifactId>elasticsearch-beyonder</artifactId>
  <version>8.6-SNAPSHOT</version>
</dependency>

You also need to import the elasticsearch client you want to use by adding one of the following dependencies to your pom.xml file.

For example, here is how to import the REST Client to your project:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>8.6.2</version>
</dependency>

The Transport Client is deprecated and has been removed in Elasticsearch 8.x (see the 8.6 release notes above), so it can no longer be used with this version of Beyonder. If you still depend on it, use a 7.x release of Beyonder and refer to the es-7.x branch documentation.

Adding Beyonder to your client

Elasticsearch provides a Low Level Rest Client. You can create it like this:

RestClient client = RestClient.builder(HttpHost.create("http://127.0.0.1:9200")).build();

Once you have the client, you can use it to manage automatic creation of indices, mappings, templates and aliases. To activate those features, you only need to pass the Rest Client instance to Beyonder:

ElasticsearchBeyonder.start(client);

By default, Beyonder will try to locate resources in the elasticsearch directory within your classpath. We will use this default value for the rest of the documentation.

But you can change this using:

ElasticsearchBeyonder.start(client, "models/myelasticsearch");

In that case, Beyonder will search for resources from models/myelasticsearch.

There is also a more complete version of the start method:

ElasticsearchBeyonder.start(client, "models/myelasticsearch", true);

This last parameter is known as force. When set to true, it removes any existing index managed by Beyonder. This is very useful for integration testing but very dangerous in production.
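
For instance, in an integration test you could let Beyonder recreate everything before each test. This is only a sketch, assuming JUnit 5 and a RestClient field named client built elsewhere:

// Hypothetical JUnit 5 setup method: force=true drops and recreates the indices
// managed by Beyonder, so every test starts from a clean state.
// Never run this against a production cluster.
@BeforeEach
void resetElasticsearch() throws Exception {
    ElasticsearchBeyonder.start(client, "models/myelasticsearch", true);
}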

For the record, when your cluster is secured, you can use Basic Authentication, for example:

CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials("elastic", "changeme"));
RestClient client = RestClient.builder(HttpHost.create("http://127.0.0.1:9200"))
        .setHttpClientConfigCallback(hcb -> hcb.setDefaultCredentialsProvider(credentialsProvider)).build();
ElasticsearchBeyonder.start(client);

Managing indices

When Beyonder starts, it tries to find index names and settings in the classpath.

If you add in your classpath a directory named elasticsearch/twitter, the twitter index will be automatically created at startup if it does not exist yet.

If you add in your classpath a file named elasticsearch/twitter/_settings.json, it will be automatically applied to define the settings and mappings for your twitter index.

For example, create the following file src/main/resources/elasticsearch/twitter/_settings.json in your project:

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "message": { "type": "text" },
      "foo": { "type": "text" }
    }
  }
}
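
If you want to check that the index was actually created once Beyonder has run, a minimal sketch using the same Low Level REST Client could look like this (Request and Response come from org.elasticsearch.client; the HEAD call is plain Elasticsearch API, not a Beyonder feature):

// After ElasticsearchBeyonder.start(client), the twitter index should exist.
Request request = new Request("HEAD", "/twitter");
Response response = client.performRequest(request);
// 200 means the index exists; a missing index would make performRequest throw a ResponseException.
System.out.println(response.getStatusLine().getStatusCode());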

By default, Beyonder will not overwrite an index if it already exists. This can be overridden by setting force to true in the expanded factory method ElasticsearchBeyonder.start().

You can also provide a file named _update_settings.json to update your index settings and a file named _update_mapping.json if you want to update an existing mapping. Note that Elasticsearch does not allow updating all settings and mappings.

You can, for example, add a new field or change the search_analyzer for a given field, but you cannot modify the field type.

Following the previous example, you can create an elasticsearch/twitter/_update_settings.json file to update the number of replicas:

{
    "number_of_replicas" : 1
}

And you can create elasticsearch/twitter/_update_mapping.json:

{
  "properties": {
    "message" : {"type" : "text", "search_analyzer": "keyword" },
    "bar" : { "type" : "text" }
  }
}

This will change the search_analyzer for the message field and will add a new field named bar. All other existing fields (like foo in the previous example) won't be changed.
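
You can then ask Elasticsearch for the resulting mapping to verify the merge, for example with the Low Level REST Client (EntityUtils comes from the Apache HTTP libraries that the REST client already pulls in):

// Fetch the current mapping of the twitter index after Beyonder has applied _update_mapping.json.
Request request = new Request("GET", "/twitter/_mapping");
Response response = client.performRequest(request);
System.out.println(EntityUtils.toString(response.getEntity()));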

Managing aliases

This feature lets you add or remove an alias on a given index. You could use index templates to define aliases automatically when an index is created, but you can also provide a file elasticsearch/_aliases.json:

{
  "actions" : [
    { "remove": { "index": "test_1", "alias": "test" } },
    { "add":  { "index": "test_2", "alias": "test" } }
  ]
}

When Beyonder starts, it will automatically send the content to the Aliases API.
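
To see which index the alias finally points to, you can query the aliases API yourself, for example:

// Resolve the test alias; with the actions above it should now point to test_2.
Request request = new Request("GET", "/_alias/test");
Response response = client.performRequest(request);
System.out.println(EntityUtils.toString(response.getEntity()));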

Managing index templates (aka templates V2)

Since version 7.13, the new index template management API is supported. It allows defining both component templates and index templates.

Component templates

To define component templates, you can create json files within the elasticsearch/_component_templates/ dir.

Let's first create a elasticsearch/_component_templates/component1.json:

{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        }
      }
    }
  }
}

Then a second component template as elasticsearch/_component_templates/component2.json:

{
  "template": {
    "mappings": {
      "runtime": {
        "day_of_week": {
          "type": "keyword",
          "script": {
            "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
          }
        }
      }
    }
  }
}

When Beyonder starts, it will create two component templates in elasticsearch, named component1 and component2 respectively.

Index templates

To define index templates, you can create json files within the elasticsearch/_index_templates/ dir.

Let's create a elasticsearch/_index_templates/template_1.json:

{
  "index_patterns": ["te*", "bar*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    },
    "aliases": {
      "mydata": { }
    }
  },
  "priority": 500,
  "composed_of": ["component1", "component2"],
  "version": 3,
  "_meta": {
    "description": "my custom"
  }
}

When Beyonder starts, it will create the index template named template_1 in elasticsearch. Note that this index template references 2 component templates, which must either be available before Beyonder starts or be defined within the _component_templates dir as we saw just before.
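
To check how an index matching one of the patterns would be built from this template and its components, you can use the simulate index API, for example:

// Ask Elasticsearch which settings, mappings and aliases an index named test-001 would get.
Request request = new Request("POST", "/_index_template/_simulate_index/test-001");
Response response = client.performRequest(request);
System.out.println(EntityUtils.toString(response.getEntity()));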

Managing pipelines

A pipeline is a definition of a series of processors that are executed, in the order they are declared, while documents are being indexed. Please note that this feature is only supported when you use the REST client, not the Transport client.

For example, to set one field's value based on another field by using a Set Processor, you can add a file named elasticsearch/_pipelines/set_field_processor.json to your project:

{
  "description" : "Twitter pipeline",
  "processors" : [
    {
      "set" : {
        "field": "copy",
        "value": "{{otherField}}"
      }
    }
  ]
}
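
Assuming Beyonder registers the pipeline under its file name (set_field_processor here, as it does for templates), you can then reference it when indexing a document, for example:

// Index a document through the pipeline: the set processor copies otherField into copy.
Request request = new Request("PUT", "/twitter/_doc/1?pipeline=set_field_processor");
request.setJsonEntity("{\"otherField\": \"some value\"}");
Response response = client.performRequest(request);
System.out.println(EntityUtils.toString(response.getEntity()));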

Index lifecycles

To define an index lifecycle, you can create json files within the elasticsearch/_index_lifecycles/ dir.

Let's create a elasticsearch/_index_lifecycles/my_lifecycle.json:

{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "10d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

When Beyonder starts, it will create the index lifecycle policy named my_lifecycle in elasticsearch.
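
You can verify that the policy has been created with the ILM API, for example:

// Fetch the my_lifecycle policy created by Beyonder.
Request request = new Request("GET", "/_ilm/policy/my_lifecycle");
Response response = client.performRequest(request);
System.out.println(EntityUtils.toString(response.getEntity()));

Note that the policy only takes effect on indices that reference it, typically through the index.lifecycle.name index setting or an index template.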

Tests

This project comes with unit tests and integration tests. You can disable running them by using the skipTests option as follows:

mvn clean install -DskipTests

Unit Tests

If you only want to disable running unit tests, use the skipUnitTests option:

mvn clean install -DskipUnitTests

Integration Tests

Integration tests launch a Docker instance, so you need to have Docker installed.

If you want to disable running integration tests, use the skipIntegTests option:

mvn clean install -DskipIntegTests

If you wish to run integration tests against a cluster which is already running externally, you can configure the following settings to locate your cluster:

| setting | default |
|---------|---------|
| tests.cluster | http://127.0.0.1:9400 |

For example:

mvn clean install -Dtests.cluster=http://127.0.0.1:9200

If you want to run your tests against an Elastic Cloud instance, you can use something like:

mvn clean install \
    -Dtests.cluster=https://CLUSTERID.eu-west-1.aws.found.io:9243 \
    -Dtests.cluster.user=elastic \
    -Dtests.cluster.pass=GENERATEDPASSWORD

When user and password are set, only the REST tests are run.

Why this name?

I was looking for a cool name in the Marvel characters list and found that Beyonder is actually a very powerful character.

This project gives some features beyond elasticsearch itself. :)

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2011-2023 David Pilato

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.