forked from wikimedia/wikidata-query-blazegraph
-
Notifications
You must be signed in to change notification settings - Fork 0
/
RELEASE_0.81b.txt
66 lines (47 loc) · 2.9 KB
/
RELEASE_0.81b.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
This is a bigdata (R) snapshot release. This release is capable of loading 1B
triples in under one hour on a 15 node cluster and has been used to load up to
13B triples on the same cluster. JDK 1.6 is required.
See [1] for instructions on installing bigdata(R), [2] for the javadoc and [3]
and [4] for news, questions, and the latest developments. There are a number
of dependencies which are only required for the scale-out architecture (jini
and zookeeper) or compressed unicode sort keys (ICU). There are notes on the
wiki[1] if you want to exclude these dependencies from your installation.
Please note that we recommend checking out the code from SVN using the tag for
this release. The code will build automatically under eclipse. You can also
build the code using the ant script. The cluster installer requires the use of
the ant script. You can checkout this release from the following URL:
https://bigdata.svn.sourceforge.net/svnroot/bigdata/tags/RELEASE_0.81b
This tag corresponds to revision 2267.
New features:
- Support for quads.
- The B+Tree data record is now the same on the disk and in memory, eliminating
de-serialization costs for immutable B+Tree records.
- Shared LRU for all B+Tree instances in the same JVM. This provides competition
across those B+Tree instances for RAM. The LRU is configured by default to use
10% of the JVM heap, but that value may be increased to 20% even on very heavy
workloads with large heaps.
- Parallel iterator for distributed access paths.
- 3x improvement in distributed query performance. We have only just begun to
optimize distributed query. This performance improvement is mainly due to the
parallel access path iterator, the shared LRU, and the selection of a better
chunk size for distributed query. We expect substantion improvements in query
performance over the next several months.
- Some query hotspot elimination.
The roadmap for the next release includes:
- Full transactions for the SAIL.
- Record level compression.
- Query optimizations.
For more information, please see the following links:
[1] http://bigdata.wiki.sourceforge.net/GettingStarted
[2] http://www.bigdata.com/bigdata/docs/api/
[3] http://sourceforge.net/projects/bigdata/
[4] http://www.bigdata.com/blog
About bigdata:
Bigdata® is a horizontally-scaled, general purpose storage and computing fabric
for ordered data (B+Trees), designed to operate on either a single server or a
cluster of commodity hardware. Bigdata® uses dynamically partitioned key-range
shards in order to remove any realistic scaling limits - in principle, bigdata®
may be deployed on 10s, 100s, or even thousands of machines and new capacity may
be added incrementally without requiring the full reload of all data. The bigdata®
RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL),
and datum level provenance.