RocksJava Performance on Flash Storage

RocksJava is a project we launched in April 2014 to build a high-performance Java driver for RocksDB. This page reports its performance on flash storage. We first summarize the results; the experimental setup and the commands used to run the benchmarks are covered in later sections.

Result Summary

We repeated the flash-storage benchmarks described in [3] and compared the performance of RocksJava with that of RocksDB C++. In this benchmark, the database holds one billion key-value pairs, with 16-byte keys and 800-byte values. The results are summarized below:

Table 1. Performance comparison over 1TB database on flash storage.

Benchmark	RocksJava	RocksDB C++	Difference (%)	Details
1 Sequential Writer	369K wps	371K wps	< 1%	Seq Bulk Load
32 Random Readers	270K rps	303K rps	-10.8%
32 Random Readers w/ 1 Random Writer	206K rps	336K rps	-38.5%
32 Sequential Readers	2.12M rps	6.84M rps	-69.0%

Here we further discuss the 32 sequential readers benchmark, where RocksJava is about 69% slower than RocksDB C++.

Sequential read is a relatively high-QPS operation compared to the operations in the other benchmarks, so the overhead on the Java side becomes more noticeable. In addition, in the current implementation, each read in RocksJava's sequential reader involves four JNI calls (next(), isValid(), key(), and value()), while the operations used in the other benchmarks involve only one JNI call:

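    @Override public void runTask() throws RocksDBException {
      org.rocksdb.Iterator iter = db_.newIterator();
      long i;
      // Each iteration crosses the JNI boundary four times:
      // isValid(), next(), key(), and value().
      for (iter.seekToFirst(), i = 0;
           iter.isValid() && i < numEntries_;
           iter.next(), ++i) {
        stats_.found_++;
        // key() and value() each copy data from native memory into a Java byte[].
        stats_.finishedSingleOp(iter.key().length + iter.value().length);
        if (isFinished()) {
          return;
        }
      }
    }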

This can be improved by introducing a better API that combines some of these functions (such as boolean nextValid()).
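To make the call-count argument concrete, here is a minimal sketch that counts boundary crossings for the two loop shapes. It does not use RocksDB at all: CountingIterator is a hypothetical in-memory stand-in in which every method call is treated as one JNI crossing, and nextValid() is the combined call suggested above, not an existing RocksJava method.

```java
import java.util.ArrayList;
import java.util.List;

public class JniCallCountSketch {

    /** Hypothetical stand-in for a native iterator; each method models one JNI crossing. */
    static final class CountingIterator {
        private final List<byte[]> keys = new ArrayList<>();
        private final List<byte[]> values = new ArrayList<>();
        private int pos = -1;
        long crossings = 0;

        CountingIterator(int entries) {
            for (int i = 0; i < entries; i++) {
                keys.add(new byte[16]);    // 16-byte keys, as in the benchmark
                values.add(new byte[800]); // 800-byte values, as in the benchmark
            }
        }

        void seekToFirst() { crossings++; pos = 0; }
        boolean isValid()  { crossings++; return pos >= 0 && pos < keys.size(); }
        void next()        { crossings++; pos++; }
        byte[] key()       { crossings++; return keys.get(pos); }
        byte[] value()     { crossings++; return values.get(pos); }

        /** Assumed combined call: advance and report validity in a single crossing. */
        boolean nextValid() { crossings++; pos++; return pos < keys.size(); }
    }

    /** Same shape as the benchmark loop: isValid(), key(), value(), next() per entry. */
    static long scanWithSeparateCalls(int entries) {
        CountingIterator it = new CountingIterator(entries);
        long bytes = 0;
        for (it.seekToFirst(); it.isValid(); it.next()) {
            bytes += it.key().length + it.value().length;
        }
        checkBytes(bytes, entries);
        return it.crossings;
    }

    /** Variant using the combined call: key(), value(), nextValid() per entry. */
    static long scanWithCombinedCall(int entries) {
        CountingIterator it = new CountingIterator(entries);
        long bytes = 0;
        it.seekToFirst();
        boolean valid = it.isValid();
        while (valid) {
            bytes += it.key().length + it.value().length;
            valid = it.nextValid();
        }
        checkBytes(bytes, entries);
        return it.crossings;
    }

    private static void checkBytes(long bytes, int entries) {
        if (bytes != 816L * entries) {
            throw new IllegalStateException("unexpected byte count: " + bytes);
        }
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("separate calls: " + scanWithSeparateCalls(n) + " crossings");
        System.out.println("combined call:  " + scanWithCombinedCall(n) + " crossings");
    }
}
```

In this model, a scan over n entries costs 4n + 2 crossings with separate calls and 3n + 2 with the combined call. Merging key() and value() into one call as well would reduce the per-entry cost further; the sketch only models call counts and says nothing about the actual cost of each crossing.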

Setup

We tried to reuse the settings used in []. The important settings and differences in our benchmark are:

Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 40 cores.
25 MB CPU cache, 144 GB RAM
CentOS release 5.2 (Final)
Java version "1.7.0_55" (Java(TM) SE Runtime Environment (build 1.7.0_55-b13), Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode))
Test with 1 billion key / value pairs. Each key is 16 bytes, and each value is 800 bytes.
Snappy compression is used.
For the 32 readers w/ 1 writer benchmark, the writer performs 10K writes per second.
jemalloc is not used.
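
As a quick check on the dataset size implied by these settings, the raw (uncompressed) payload can be computed from the entry count and the key and value sizes. This is only a back-of-the-envelope sketch: the class and method names are illustrative, and the figure ignores Snappy compression, index and filter blocks, and other on-disk overhead, so it is not the actual database size.

```java
public class DatasetSizeSketch {

    /** Raw payload in bytes: entries * (key size + value size). */
    static long rawBytes(long entries, int keyBytes, int valueBytes) {
        return entries * ((long) keyBytes + valueBytes);
    }

    public static void main(String[] args) {
        // 1 billion pairs, 16-byte keys, 800-byte values (from the setup above).
        long bytes = rawBytes(1_000_000_000L, 16, 800);
        System.out.printf("raw payload: %d bytes (%.0f GB)%n", bytes, bytes / 1e9);
    }
}
```

The raw payload comes to 816 GB (decimal), which is in line with the roughly 1TB database named in the caption of Table 1.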

Bulk Load of keys in Sequential Order

Random Read performance

Multi-Threaded Random Read and Single-Threaded Write Performance

Sequential Read Performance
