Image by [DaPino](http://www.iconarchive.com/show/fishing-equipment-icons-by-dapino/backpack-icon.html)
[CC Attribution-Noncommercial 3.0](http://creativecommons.org/licenses/by-nc/3.0/)

# Elasticsearch Knapsack Plugin

Knapsack is a "Swiss army knife" export/import plugin for [Elasticsearch](http://github.com/elasticsearch/elasticsearch).
It uses archive formats (tar, zip, cpio) as well as the Elasticsearch bulk format, with
compression algorithms (gzip, bzip2, lzf, xz).

A pull or push of indexes or search hits with stored fields across clusters is also supported.

The knapsack actions can be executed via HTTP REST or from Java using the Java API.

In archive files, the following index information is encoded:

- index settings
- index mappings
- index aliases

When importing archive files again, this information is reapplied.

## Versions

| Elasticsearch | Plugin    | Release date |
| ------------- | --------- | ------------ |
| 1.3.2         | 1.3.2.0   | Sep 28, 2014 |
| 1.2.1         | 1.2.1.0   | Jun 13, 2014 |
| 1.2.0         | 1.2.0.0   | May 23, 2014 |
| 1.1.0         | 1.1.0.0   | May 25, 2014 |
| 1.0.0         | 1.0.0.1   | Feb 16, 2014 |
| 0.90.11       | 0.90.11.2 | Feb 16, 2014 |
| 0.20.6        | 0.20.6.2  | Feb 16, 2014 |
| 0.19.11       | 0.19.11.3 | Feb 15, 2014 |
| 0.19.8        | 0.19.8.1  | Feb 15, 2014 |

## Installation

    ./bin/plugin -install knapsack -url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-knapsack/1.3.2.0/elasticsearch-knapsack-1.3.2.0-plugin.zip

Do not forget to restart the node after installation.

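To check that the plugin was picked up after the restart, you can ask the nodes info API for the plugin list (a sketch; the endpoint shape is assumed for Elasticsearch 1.x):

```shell
# "knapsack" should appear in each node's plugins list
curl 'localhost:9200/_nodes/plugins?pretty'
```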
## Project docs

The Maven project site is available at [Github](http://jprante.github.io/elasticsearch-knapsack).

## Overview

# Example

Let's go through a simple example:

    curl -XDELETE localhost:9200/test
    curl -XPUT localhost:9200/test/test/1 -d '{"key":"value 1"}'
    curl -XPUT localhost:9200/test/test/2 -d '{"key":"value 2"}'

# Export

You can export this Elasticsearch index with

    curl -XPOST localhost:9200/test/test/_export
    {"running":true,"state":{"mode":"export","started":"2014-09-28T19:18:27.447Z","path":"file:///Users/es/elasticsearch-1.3.2/test_test.tar.gz","node_name":"Slither"}}

The result is a file in the Elasticsearch folder

    -rw-r--r--  1 joerg  staff  343 28 Sep 21:18 test_test.tar.gz

Checking with the tar utility shows that the settings and the mapping are exported as well

    tar ztvf test_test.tar.gz
    -rw-r--r--  0 joerg  0  133 28 Sep 21:18 test/_settings/null/null
    -rw-r--r--  0 joerg  0   49 28 Sep 21:18 test/test/_mapping/null
    -rw-r--r--  0 joerg  0   17 28 Sep 21:18 test/test/1/_source
    -rw-r--r--  0 joerg  0   17 28 Sep 21:18 test/test/2/_source

Also, you can export a whole index with

    curl -XPOST localhost:9200/test/_export

with the result file `test.tar.gz`, or even all cluster indices with

    curl -XPOST 'localhost:9200/_export'

to the file `_all.tar.gz`.

By default, the archive format is `tar` with compression `gz` (gzip).
You can also export to `zip`, `cpio`, or `bulk` archives, or use another compression scheme.
Available are `bz2` (bzip2), `xz` (XZ), and `lzf` (LZF).

Note: the `bulk` format produces files in the Elasticsearch bulk format.

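Assuming the archive format and compression are derived from the file suffix of the target archive (given here via the `path` parameter described below; the path itself is hypothetical), an export to a bzip2-compressed cpio archive might look like:

```shell
# Hypothetical: format (cpio) and compression (bz2) taken from the path suffix
curl -XPOST 'localhost:9200/test/_export?path=/tmp/test.cpio.bz2'
```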
## Export search results

You can add a query to the `_export` endpoint just like you would for searching in Elasticsearch.

    curl -XPOST 'localhost:9200/test/test/_export' -d '{
       "query" : {
           "match_phrase" : {
               "key" : "value 1"
           }
       },
       "fields" : [ "_parent", "_source" ]
    }'

## Export to an archive with a given path name

You can configure an archive path with the parameter `path`

    curl -XPOST 'localhost:9200/test/_export?path=/tmp/myarchive.zip'

If Elasticsearch cannot write to the path, an error message will appear and no export will take place.
If the archive file already exists, you can force overwriting it with the parameter `overwrite=true`.

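A sketch of an overwriting export to the same (hypothetical) path, assuming an archive already exists there:

```shell
# overwrite=true replaces an existing archive at the given path
curl -XPOST 'localhost:9200/test/_export?path=/tmp/myarchive.zip&overwrite=true'
```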
## Renaming indexes and index types

You can rename indexes and index types by adding a `map` parameter that contains a JSON
object with old and new index (and index/type) names.

    curl -XPOST 'localhost:9200/test/type/_export?map=\{"test":"testcopy","test/type":"testcopy/typecopy"\}'

Note the backslashes before the curly braces: since the URL is already single-quoted, they are needed to keep curl from treating the braces as URL globbing patterns.

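Alternatively, curl's `-g` (`--globoff`) option disables URL globbing entirely, so the braces need no escaping:

```shell
# -g / --globoff turns off curl's {} and [] URL globbing
curl -g -XPOST 'localhost:9200/test/type/_export?map={"test":"testcopy","test/type":"testcopy/typecopy"}'
```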
## Push or pull indices from one cluster to another

If you want to push or pull indices from one cluster to another, Knapsack is your friend.

You can copy an index within the local cluster or to a remote cluster with the `_push` or the `_pull` endpoint.
This works if both clusters run the same Java JVM version and the same Elasticsearch version.

Example for a local cluster copy of the index `test` to `testcopy`

    curl -XPOST 'localhost:9200/test/_push?map=\{"test":"testcopy"\}'

Example for a remote cluster copy of the index `test`, using the parameters `cluster`, `host`, and `port`

    curl -XPOST 'localhost:9200/test/_push?cluster=remote&host=127.0.0.1&port=9201'

This is a complete example that illustrates how to filter an index by timestamp and copy the
matching documents to another index

    curl -XDELETE 'localhost:9200/test'
    curl -XDELETE 'localhost:9200/testcopy'
    curl -XPUT 'localhost:9200/test/' -d '
    {
        "mappings" : {
            "_default_": {
                "_timestamp" : { "enabled" : true, "store" : true, "path" : "date" }
            }
        }
    }
    '
    curl -XPUT 'localhost:9200/test/doc/1' -d '
    {
        "date" : "2014-01-01T00:00:00",
        "sentence" : "Hi!",
        "value" : 1
    }
    '
    curl -XPUT 'localhost:9200/test/doc/2' -d '
    {
        "date" : "2014-01-02T00:00:00",
        "sentence" : "Hello World!",
        "value" : 2
    }
    '
    curl -XPUT 'localhost:9200/test/doc/3' -d '
    {
        "date" : "2014-01-03T00:00:00",
        "sentence" : "Welcome!",
        "value" : 3
    }
    '
    curl 'localhost:9200/test/_refresh'
    curl -XPOST 'localhost:9200/test/_push?map=\{"test":"testcopy"\}' -d '
    {
        "fields" : [ "_timestamp", "_source" ],
        "query" : {
            "filtered" : {
                "query" : {
                    "match_all" : {
                    }
                },
                "filter" : {
                    "range": {
                        "_timestamp" : {
                            "from" : "2014-01-02"
                        }
                    }
                }
            }
        }
    }
    '
    curl '0:9200/test/_search?fields=_timestamp&pretty'
    curl '0:9200/testcopy/_search?fields=_timestamp&pretty'

# Import

You can import the file with the `_import` endpoint

    curl -XPOST 'localhost:9200/test/test/_import'

Knapsack does not delete or overwrite data by default.
Set the parameter `createIndex=false` to allow indexing into indexes that already exist.

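For example, assuming the index `test` already exists, you can import into it without Knapsack attempting to create it:

```shell
# createIndex=false skips index creation and imports into the existing index
curl -XPOST 'localhost:9200/test/test/_import?createIndex=false'
```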
When importing, you can map your indexes or index/types to your favorite ones.

    curl -XPOST 'localhost:9200/test/_import?map=\{"test":"testcopy"\}'

## Modifying settings and mappings

You can overwrite the settings and mappings when importing by using parameters of the form
`<index>_settings=<filename>` or `<index>_<type>_mapping=<filename>`.

General example:

    curl -XPOST 'localhost:9200/myindex/mytype/_import?myindex_settings=/my/new/mysettings.json&myindex_mytype_mapping=/my/new/mapping.json'

The following statements demonstrate how you can change the number of shards from the default `5` to `1`
and the number of replicas from `1` to `0` for an index `test`

    curl -XDELETE localhost:9200/test
    curl -XPUT 'localhost:9200/test/test/1' -d '{"key":"value 1"}'
    curl -XPUT 'localhost:9200/test/test/2' -d '{"key":"value 2"}'
    curl -XPUT 'localhost:9200/test2/foo/1' -d '{"key":"value 1"}'
    curl -XPUT 'localhost:9200/test2/bar/1' -d '{"key":"value 1"}'
    curl -XPOST 'localhost:9200/test/_export'
    tar zxvf test.tar.gz test/_settings
    echo '{"index.number_of_shards":"1","index.number_of_replicas":"0"}' > test/_settings/null/null
    curl -XDELETE 'localhost:9200/test'
    curl -XPOST 'localhost:9200/test/_import?test_settings=test/_settings/null/null'
    curl -XGET 'localhost:9200/test/_settings?pretty'
    curl -XPOST 'localhost:9200/test/_search?q=*&pretty'

The result is a search on an index with just one shard.

    {
      "took" : 19,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "test",
          "_type" : "test",
          "_id" : "1",
          "_score" : 1.0,
          "_source":{"key":"value 1"}
        }, {
          "_index" : "test",
          "_type" : "test",
          "_id" : "2",
          "_score" : 1.0,
          "_source":{"key":"value 2"}
        } ]
      }
    }

## State of knapsack import/export actions

While exports or imports are running, you can check their state with

    curl -XPOST 'localhost:9200/_export/state'

or

    curl -XPOST 'localhost:9200/_import/state'

## Aborting knapsack actions

If you want to abort all running knapsack exports and imports, you can do so with

    curl -XPOST 'localhost:9200/_export/abort'

or

    curl -XPOST 'localhost:9200/_import/abort'

# Java API

Knapsack implements all actions as Java transport actions in Elasticsearch.

You can consult the JUnit tests to find out how to use the API. To give you an impression,
here is an example of a minimal export/import cycle using the `bulk` archive format.

    client.index(new IndexRequest().index("index1").type("test1").id("doc1")
            .source("content","Hello World").refresh(true)).actionGet();

    File exportFile = File.createTempFile("minimal-import-", ".bulk");
    Path exportPath = Paths.get(URI.create("file:" + exportFile.getAbsolutePath()));
    KnapsackExportRequestBuilder requestBuilder = new KnapsackExportRequestBuilder(client.admin().indices())
            .setPath(exportPath)
            .setOverwriteAllowed(false);
    KnapsackExportResponse knapsackExportResponse = requestBuilder.execute().actionGet();

    KnapsackStateRequestBuilder knapsackStateRequestBuilder =
            new KnapsackStateRequestBuilder(client.admin().indices());
    KnapsackStateResponse knapsackStateResponse = knapsackStateRequestBuilder.execute().actionGet();

    Thread.sleep(1000L);

    client.admin().indices().delete(new DeleteIndexRequest("index1")).actionGet();

    KnapsackImportRequestBuilder knapsackImportRequestBuilder = new KnapsackImportRequestBuilder(client.admin().indices())
            .setPath(exportPath);
    KnapsackImportResponse knapsackImportResponse = knapsackImportRequestBuilder.execute().actionGet();

# Caution

Knapsack is very simple and works without locks or snapshots. This means that if Elasticsearch
is allowed to write to the data you are exporting while the export runs, the archive may be
incomplete or inconsistent. It is up to you to organize safe exports and imports with this plugin.

If you need a more robust solution, use snapshot/restore, which is the standard
procedure for saving and restoring data in Elasticsearch:

http://www.elasticsearch.org/blog/introducing-snapshot-restore/

# Credits

Knapsack contains derived work of Apache Commons Compress
http://commons.apache.org/proper/commons-compress/

The code in this component has many origins:
The bzip2, tar and zip support came from Avalon's Excalibur, but originally
from Ant, as far as life in Apache goes. The tar package is originally Tim Endres'
public domain package. The bzip2 package is based on the work done by Keiron Liddle as
well as Julian Seward's libbzip2. It has migrated via:
Ant -> Avalon-Excalibur -> Commons-IO -> Commons-Compress.
The cpio package has been contributed by Michael Kuss and the jRPM project.

Thanks to [nicktgr15](https://github.com/nicktgr15) for extending Knapsack to support Amazon S3.

# License

Elasticsearch Knapsack Plugin

Copyright (C) 2012 Jörg Prante

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.