Commit d5501ef

Knapsack 1.3.2.0
- new: support for Elasticsearch 1.3.2
- new: all knapsack actions are reimplemented as Java API transport actions
- new: Elasticsearch bulk format support
- new: byte progress watcher, splitting into more than one archive file by byte size
- new: _push action for copying indices
- new: _pull endpoint for fetching indices
- new: index aliases recorded in archive file
- new: archive codec API
- removed S3 support - use ES snapshot/restore for this
- added numerous JUnit tests
- switch to bzip2 implementation of https://code.google.com/p/jbzip2/
- switch to JDK ZIP archive implementation
- cleaned up tar implementation
- _state action overhaul
- _abort action overhaul
- optional URI encoding of archive name entries
- new diagram
- added CREDITS
- added Apache License in source files
- tests: switched from log4j to log4j2
1 parent 6221f4f commit d5501ef

253 files changed (+20674 additions, -19993 deletions)


CREDITS.txt

Lines changed: 30 additions & 0 deletions
Knapsack contains derived work of Apache Commons Compress http://commons.apache.org/proper/commons-compress/

The code in this component has many origins:
The bzip2, tar and zip support came from Avalon's Excalibur, but originally
from Ant, as far as life in Apache goes. The tar package is originally Tim Endres'
public domain package. The bzip2 package is based on the work done by Keiron Liddle as well
as Julian Seward's libbzip2. It has migrated via:
Ant -> Avalon-Excalibur -> Commons-IO -> Commons-Compress.

The cpio package has been contributed by Michael Kuss and the jRPM project.

Thanks to `nicktgr15 <https://github.com/nicktgr15>` for extending Knapsack to support Amazon S3.

The original URI class loader implementation was taken from
Apache Geronimo org/apache/geronimo/kernel/classloader/ (Apache 2.0)

The original bzip2 implementation was taken from http://code.google.com/p/jbzip2/ (MIT)

The original LZF implementation was taken from https://github.com/ning/compress (Apache 2.0)

The original XZ implementation was taken from http://tukaani.org/xz/java.html (Public Domain)

The original tar implementation was taken from
Apache Commons Compress http://commons.apache.org/proper/commons-compress/tar.html (Apache 2.0)

The original cpio implementation was taken from the jRPM project (http://jrpm.sourceforge.net) (Apache 2.0)

The original BytesProgressWatcher is from JetS3t (Apache 2.0)

README.md

Lines changed: 363 additions & 0 deletions
![Knapsack](https://github.com/jprante/elasticsearch-knapsack/raw/master/src/site/resources/knapsack.png)

Image by [DaPino](http://www.iconarchive.com/show/fishing-equipment-icons-by-dapino/backpack-icon.html)
[CC Attribution-Noncommercial 3.0](http://creativecommons.org/licenses/by-nc/3.0/)

# Elasticsearch Knapsack Plugin

Knapsack is a "Swiss army knife" export/import plugin for [Elasticsearch](http://github.com/elasticsearch/elasticsearch).
It uses archive formats (tar, zip, cpio), as well as the Elasticsearch bulk format, with
compression algorithms (gzip, bzip2, lzf, xz).

Pushing or pulling indices, or search hits with stored fields, across clusters is also supported.

Knapsack actions can be executed via HTTP REST or from Java using the Java API.

In archive files, the following index information is encoded:

- index settings
- index mappings
- index aliases

When importing archive files, this information is reapplied.

## Versions

![Travis](https://travis-ci.org/jprante/elasticsearch-knapsack.png)
| Elasticsearch  | Plugin    | Release date |
| -------------- | --------- | ------------ |
| 1.3.2          | 1.3.2.0   | Sep 28, 2014 |
| 1.2.1          | 1.2.1.0   | Jun 13, 2014 |
| 1.2.0          | 1.2.0.0   | May 23, 2014 |
| 1.1.0          | 1.1.0.0   | May 25, 2014 |
| 1.0.0          | 1.0.0.1   | Feb 16, 2014 |
| 0.90.11        | 0.90.11.2 | Feb 16, 2014 |
| 0.20.6         | 0.20.6.2  | Feb 16, 2014 |
| 0.19.11        | 0.19.11.3 | Feb 15, 2014 |
| 0.19.8         | 0.19.8.1  | Feb 15, 2014 |
## Installation

    ./bin/plugin -install knapsack -url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-knapsack/1.3.2.0/elasticsearch-knapsack-1.3.2.0-plugin.zip

Do not forget to restart the node after installation.

## Project docs

The Maven project site is available on [GitHub](http://jprante.github.io/elasticsearch-knapsack).

## Overview

![Diagram](https://github.com/jprante/elasticsearch-knapsack/raw/master/src/site/resources/knapsack-diagram-2.png)
# Example

Let's go through a simple example:

    curl -XDELETE localhost:9200/test
    curl -XPUT localhost:9200/test/test/1 -d '{"key":"value 1"}'
    curl -XPUT localhost:9200/test/test/2 -d '{"key":"value 2"}'

# Export

You can export this Elasticsearch index with

    curl -XPOST localhost:9200/test/test/_export
    {"running":true,"state":{"mode":"export","started":"2014-09-28T19:18:27.447Z","path":"file:///Users/es/elasticsearch-1.3.2/test_test.tar.gz","node_name":"Slither"}}

The result is a file in the Elasticsearch folder

    -rw-r--r--  1 joerg  staff  343 28 Sep 21:18 test_test.tar.gz

Checking with the tar utility shows that the settings and the mapping are also exported

    tar ztvf test_test.tar.gz
    -rw-r--r--  0 joerg  0  133 28 Sep 21:18 test/_settings/null/null
    -rw-r--r--  0 joerg  0   49 28 Sep 21:18 test/test/_mapping/null
    -rw-r--r--  0 joerg  0   17 28 Sep 21:18 test/test/1/_source
    -rw-r--r--  0 joerg  0   17 28 Sep 21:18 test/test/2/_source

You can also export a whole index with

    curl -XPOST localhost:9200/test/_export

which writes the file `test.tar.gz`, or even all cluster indices with

    curl -XPOST 'localhost:9200/_export'

which writes the file `_all.tar.gz`.

By default, the archive format is `tar` with compression `gz` (gzip).
You can also export to a `zip`, `cpio`, or `bulk` archive, or use another compression scheme.
Available compression schemes are `bz2` (bzip2), `xz` (XZ), and `lzf` (LZF).

Note: with the `bulk` format, the archive content is written in Elasticsearch bulk format.
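One way to pick a different format is through the archive file name. A sketch, assuming Knapsack derives the archive format and compression from the suffix of the `path` parameter (described below):

    # bzip2-compressed tar archive (assuming suffix-based detection)
    curl -XPOST 'localhost:9200/test/_export?path=/tmp/test.tar.bz2'

    # gzip-compressed Elasticsearch bulk format (assuming suffix-based detection)
    curl -XPOST 'localhost:9200/test/_export?path=/tmp/test.bulk.gz'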
## Export search results

You can add a query to the `_export` endpoint just like you would do for searching in Elasticsearch.

    curl -XPOST 'localhost:9200/test/test/_export' -d '{
       "query" : {
           "match_phrase" : {
               "key" : "value 1"
           }
       },
       "fields" : [ "_parent", "_source" ]
    }'
## Export to an archive with a given path name

You can configure an archive path with the parameter `path`

    curl -XPOST 'localhost:9200/test/_export?path=/tmp/myarchive.zip'

If Elasticsearch cannot write to the path, an error message will appear, and no export will take place.
You can force overwriting an existing archive with the parameter `overwrite=true`.
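For example, to repeat an export over an archive that already exists, combine both parameters:

    # overwrite the archive left behind by a previous export
    curl -XPOST 'localhost:9200/test/_export?path=/tmp/myarchive.zip&overwrite=true'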
## Renaming indexes and index types

You can rename indexes and index types by adding a `map` parameter that contains a JSON
object with old and new index (or index/type) names.

    curl -XPOST 'localhost:9200/test/type/_export?map=\{"test":"testcopy","test/type":"testcopy/typecopy"\}'

Note the backslashes, which are required to escape the curly braces from shell interpretation.
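If the backslash escaping gets awkward, for instance in scripts, an alternative sketch is to URL-encode the braces instead, assuming the query string is URL-decoded as usual:

    # %7B and %7D are the URL-encoded forms of { and }
    curl -XPOST 'localhost:9200/test/type/_export?map=%7B"test":"testcopy","test/type":"testcopy/typecopy"%7D'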
## Push or pull indices from one cluster to another

If you want to push or pull indices from one cluster to another, Knapsack is your friend.

You can copy an index within the local cluster or to a remote cluster with the `_push` or the `_pull` endpoint.
This works as long as both sides run the same Java JVM version and the same Elasticsearch version.

Example for a local cluster copy of the index `test` to `testcopy`

    curl -XPOST 'localhost:9200/test/_push?map=\{"test":"testcopy"\}'

Example for a remote cluster copy of the index `test`, using the parameters `cluster`, `host`, and `port`

    curl -XPOST 'localhost:9200/test/_push?cluster=remote&host=127.0.0.1&port=9201'
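The `_pull` endpoint works in the opposite direction and fetches an index from another cluster into the local one. A sketch, assuming `_pull` accepts the same `cluster`, `host`, and `port` parameters as `_push`:

    # fetch the index `test` from the remote cluster (assumed parameter set)
    curl -XPOST 'localhost:9200/test/_pull?cluster=remote&host=127.0.0.1&port=9201'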
This is a complete example that illustrates how to filter an index by timestamp and copy that part to
another index

    curl -XDELETE 'localhost:9200/test'
    curl -XDELETE 'localhost:9200/testcopy'
    curl -XPUT 'localhost:9200/test/' -d '
    {
        "mappings" : {
            "_default_": {
                "_timestamp" : { "enabled" : true, "store" : true, "path" : "date" }
            }
        }
    }
    '
    curl -XPUT 'localhost:9200/test/doc/1' -d '
    {
        "date" : "2014-01-01T00:00:00",
        "sentence" : "Hi!",
        "value" : 1
    }
    '
    curl -XPUT 'localhost:9200/test/doc/2' -d '
    {
        "date" : "2014-01-02T00:00:00",
        "sentence" : "Hello World!",
        "value" : 2
    }
    '
    curl -XPUT 'localhost:9200/test/doc/3' -d '
    {
        "date" : "2014-01-03T00:00:00",
        "sentence" : "Welcome!",
        "value" : 3
    }
    '
    curl 'localhost:9200/test/_refresh'
    curl -XPOST 'localhost:9200/test/_push?map=\{"test":"testcopy"\}' -d '
    {
        "fields" : [ "_timestamp", "_source" ],
        "query" : {
            "filtered" : {
                "query" : {
                    "match_all" : {
                    }
                },
                "filter" : {
                    "range": {
                        "_timestamp" : {
                            "from" : "2014-01-02"
                        }
                    }
                }
            }
        }
    }
    '
    curl 'localhost:9200/test/_search?fields=_timestamp&pretty'
    curl 'localhost:9200/testcopy/_search?fields=_timestamp&pretty'
# Import

You can import the file with the `_import` endpoint

    curl -XPOST 'localhost:9200/test/test/_import'

Knapsack does not delete or overwrite data by default.
You can use the parameter `createIndex=false` to allow indexing into indices that already exist.
When importing, you can also map index or index/type names to new ones.

    curl -XPOST 'localhost:9200/test/_import?map=\{"test":"testcopy"\}'
## Modifying settings and mappings

You can overwrite the settings and mapping when importing by using parameters of the form
`<index>_settings=<filename>` or `<index>_<type>_mapping=<filename>`.

General example:

    curl -XPOST 'localhost:9200/myindex/mytype/_import?myindex_settings=/my/new/mysettings.json&myindex_mytype_mapping=/my/new/mapping.json'

The following statements demonstrate how you can change the number of shards from the default `5` to `1`
and the number of replicas from `1` to `0` for an index `test`

    curl -XDELETE localhost:9200/test
    curl -XPUT 'localhost:9200/test/test/1' -d '{"key":"value 1"}'
    curl -XPUT 'localhost:9200/test/test/2' -d '{"key":"value 2"}'
    curl -XPUT 'localhost:9200/test2/foo/1' -d '{"key":"value 1"}'
    curl -XPUT 'localhost:9200/test2/bar/1' -d '{"key":"value 1"}'
    curl -XPOST 'localhost:9200/test/_export'
    tar zxvf test.tar.gz test/_settings
    echo '{"index.number_of_shards":"1","index.number_of_replicas":"0"}' > test/_settings/null/null
    curl -XDELETE 'localhost:9200/test'
    curl -XPOST 'localhost:9200/test/_import?test_settings=test/_settings/null/null'
    curl -XGET 'localhost:9200/test/_settings?pretty'
    curl -XPOST 'localhost:9200/test/_search?q=*&pretty'
The result is a search on an index with just one shard.

    {
      "took" : 19,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "test",
          "_type" : "test",
          "_id" : "1",
          "_score" : 1.0,
          "_source":{"key":"value 1"}
        }, {
          "_index" : "test",
          "_type" : "test",
          "_id" : "2",
          "_score" : 1.0,
          "_source":{"key":"value 2"}
        } ]
      }
    }
## State of knapsack import/export actions

While exports or imports are running, you can check the state with

    curl -XPOST 'localhost:9200/_export/state'

or

    curl -XPOST 'localhost:9200/_import/state'

## Aborting knapsack actions

If you want to abort all running knapsack exports or imports, you can do so with

    curl -XPOST 'localhost:9200/_export/abort'

or

    curl -XPOST 'localhost:9200/_import/abort'
# Java API

Knapsack implements all actions as Java transport actions in Elasticsearch.

You can consult the JUnit tests to find out how to use the API. To give you an impression,
here is an example of a very minimal export/import cycle using the `bulk` archive format.

    // index a test document
    client.index(new IndexRequest().index("index1").type("test1").id("doc1")
            .source("content", "Hello World").refresh(true)).actionGet();

    // export to a temporary file in bulk format
    File exportFile = File.createTempFile("minimal-import-", ".bulk");
    Path exportPath = Paths.get(URI.create("file:" + exportFile.getAbsolutePath()));
    KnapsackExportRequestBuilder requestBuilder = new KnapsackExportRequestBuilder(client.admin().indices())
            .setPath(exportPath)
            .setOverwriteAllowed(false);
    KnapsackExportResponse knapsackExportResponse = requestBuilder.execute().actionGet();

    // check the state of the export
    KnapsackStateRequestBuilder knapsackStateRequestBuilder =
            new KnapsackStateRequestBuilder(client.admin().indices());
    KnapsackStateResponse knapsackStateResponse = knapsackStateRequestBuilder.execute().actionGet();

    // give the export some time to complete
    Thread.sleep(1000L);

    // drop the index, then restore it from the archive
    client.admin().indices().delete(new DeleteIndexRequest("index1")).actionGet();

    KnapsackImportRequestBuilder knapsackImportRequestBuilder = new KnapsackImportRequestBuilder(client.admin().indices())
            .setPath(exportPath);
    KnapsackImportResponse knapsackImportResponse = knapsackImportRequestBuilder.execute().actionGet();
# Caution

Knapsack is very simple and works without locks or snapshots. This means that if Elasticsearch
is allowed to write to your data while an export is running, you may lose data in the export.
It is up to you to organize safe exports and imports with this plugin.

If you want a more robust mechanism, please use snapshot/restore, which is the standard
procedure for saving and restoring data in Elasticsearch:

http://www.elasticsearch.org/blog/introducing-snapshot-restore/
# Credits

Knapsack contains derived work of Apache Commons Compress
http://commons.apache.org/proper/commons-compress/

The code in this component has many origins:
The bzip2, tar and zip support came from Avalon's Excalibur, but originally
from Ant, as far as life in Apache goes. The tar package is originally Tim Endres'
public domain package. The bzip2 package is based on the work done by Keiron Liddle as
well as Julian Seward's libbzip2. It has migrated via:
Ant -> Avalon-Excalibur -> Commons-IO -> Commons-Compress.
The cpio package has been contributed by Michael Kuss and the jRPM project.

Thanks to `nicktgr15 <https://github.com/nicktgr15>` for extending Knapsack to support Amazon S3.
# License

Elasticsearch Knapsack Plugin

Copyright (C) 2012 Jörg Prante

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
