
TokuMX Bulk Loading


Introduction

We wish to improve the user experience of importing bulk data. Straight inserts into TokuMX collections are fast, but the fractal tree bulk loader is faster because it is multithreaded, especially when secondary indexes are defined before the import.

Design

Introduce a bulk load mode to the Client object. A Client enters bulk load mode with a list of collections. Bulk load mode begins a multi-statement transaction (if one is not already in progress) and takes a table lock on each collection. Once locked, empty collections will use a bulk loader for subsequent inserts from the calling Client; they may not be updated, queried, or deleted from until the load is committed or aborted. Non-empty collections will not use a bulk loader and have no query/update/delete restrictions. Importing into non-empty collections is not recommended, but it should be supported to preserve existing workflows. It is up to Tokutek and friends to convince users to bulk load into empty collections for performance.

API

db.runCommand({ 'beginLoad' : [ 'coll1' , ... ] })

...

db.runCommand({ 'commitLoad' : 1 })

OR

db.runCommand({ 'abortLoad' : 1 })
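
These commands should return standard command result documents, so a caller can detect failure before continuing. A minimal sketch of driving a load from the shell and bailing out on error (how exactly these proposed commands report errors is an assumption, not settled behavior):

var res = db.runCommand({ 'beginLoad' : [ 'coll1' ] });
if (res.ok !== 1) {
    print('beginLoad failed: ' + tojson(res)); // e.g. a collection is already being loaded
} else {
    // ... perform the inserts ...
    var commit = db.runCommand({ 'commitLoad' : 1 });
    if (commit.ok !== 1) {
        db.runCommand({ 'abortLoad' : 1 }); // give up and roll the load back
    }
}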

Example workflow

db.coll1.drop();

db.coll2.insert({ z: 1 })

db.coll1.ensureIndex({ a: 1 })

db.coll1.ensureIndex({ b: 1 })

db.runCommand({ 'beginLoad' : [ 'coll1', 'coll2' ] })

db.coll2.insert({ z: 100 }) // does not utilize a loader, just uses inserts. queries are ok.

db.coll1.insert({ a: 101, b: 102 }) // utilizes a loader for all indexes. queries are not ok.

db.runCommand({ 'commitLoad' : 1 })

// queries are now ok for both
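
The abort path rolls everything back. The sketch below assumes the restrictions described in the design surface as errors on the client and that abortLoad discards every insert made since beginLoad; the exact error messages are not specified here:

db.coll1.drop()
db.coll1.ensureIndex({ a: 1 })
db.runCommand({ 'beginLoad' : [ 'coll1' ] })
db.coll1.insert({ a: 1, b: 2 }) // goes through the loader
db.coll1.find() // expected to fail: coll1 is not queryable mid-load
db.runCommand({ 'abortLoad' : 1 }) // throws away the loader and rolls back the transaction
db.coll1.count() // 0 -- the insert above never became visible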

Issues

  • Collections undergoing a bulk load must not be the target of an ensureIndex (a sketch of the expected behavior follows this list).
    • Proposed solution: Maintain a bit in the NamespaceDetails indicating that a bulk load is in progress; index build attempts will fail while it is set. This bit will be protected by the DBWrite lock.
  • Big transactional imports can cause slave delay: on commit, there will be a massive oplog entry to process, and newer transactions will not be applied on secondaries until the big transaction finishes (which can take a very long time).
    • Proposed solution: Send uncommitted load inserts to the slave, where they will be processed by a local loader; on commit, the secondary commits its load as well. Aborting is still cheap (that is, there is no massive transaction to roll back on the secondary).
  • Loads into a sharded environment will need to run the begin/abort load commands on each shard in the cluster, since client inserts will be routed to any one of the shards by the mongos process. Further, the balancer will need to be disabled during the load (see the sharded-load sketch after this list).
    • Proposed solution: We'll just need to make those changes to mongos.
  • If we load a very large collection into one chunk and it doesn't get a chance to split until after the load is done, this causes two problems:
    • splitVector can only return 8192 split points, so if the collection is larger than (chunk size) * 8k, it'll fail (same problem as trying to shard an existing large collection).
    • It will be expensive to do all the migrations that will be necessary after the load finishes, so it would be better to pre-split the collection before the load happens (as in the sharded-load sketch after this list). We may want to enforce this somehow. Pre-splitting probably only makes sense as a way to restore a copy of an already sharded collection, since the chunk boundaries are then known up front. For an initial load of a very large collection into a sharded cluster, it would almost certainly be better to just do normal inserts and let balancing happen.
  • Replication write concern won't work on a per-insert basis if everything is done in one transaction. This matters because the mongoimport and mongorestore tools allow users to specify what the write concern should be during load.
    • This probably doesn't matter too much, since a bulk load would be an all-or-nothing operation anyway. Write concern makes sense in vanilla MongoDB, where there are no transactions.
  • The mongoimport and mongorestore tools allow users to upsert backed up data into collections that already have data. The loader can't be used here since the tree isn't searchable until it's finished building.
    • We can easily detect whether or not upserts are going to be used and not use a loader in this case. It's up to Tokutek and friends to advise against upsert mode.
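
For the ensureIndex restriction above, the intent is that an index build attempted mid-load fails cleanly instead of racing with the loader. A sketch of the expected client-visible behavior (the exact error reporting is an assumption):

db.coll1.drop()
db.coll1.ensureIndex({ a: 1 })
db.runCommand({ 'beginLoad' : [ 'coll1' ] })
db.coll1.ensureIndex({ c: 1 }) // should be rejected: the bulk-load bit is set for coll1
db.getLastError() // expected to report the failed index build
db.runCommand({ 'commitLoad' : 1 })
db.coll1.ensureIndex({ c: 1 }) // fine once the load has committed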
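
Sharded-load sketch: until mongos learns to fan the load commands out itself, a sharded load would look roughly like the following: disable the balancer, pre-split along known chunk boundaries, then run beginLoad/commitLoad against each shard directly while the inserts go through mongos. The host names, database name, split points, and shard name below are hypothetical:

// on a mongos
sh.stopBalancer()
sh.enableSharding('test')
sh.shardCollection('test.coll1', { a: 1 })
sh.splitAt('test.coll1', { a: 1000 }) // pre-split at chunk boundaries known from the source data
sh.splitAt('test.coll1', { a: 2000 })
sh.moveChunk('test.coll1', { a: 1000 }, 'shard0001') // place each range on its target shard (hypothetical shard name)

// directly on each shard (hypothetical hosts), before routing inserts through mongos
var shardDBs = [ 'shard0.example.com:27018', 'shard1.example.com:27018' ].map(
    function(h) { return new Mongo(h).getDB('test'); });
shardDBs.forEach(function(d) { d.runCommand({ 'beginLoad' : [ 'coll1' ] }); })

// ... run the import through mongos ...

shardDBs.forEach(function(d) { d.runCommand({ 'commitLoad' : 1 }); })
sh.setBalancerState(true) // re-enable the balancer once the load is done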

Testing

  • TODO: Figure this out

Action items

  • Enhance the ydb put multiple API to support multiple secondary keys per primary row
  • Modify mongoimport / mongorestore.
  • Later: More parallelism in the fractal tree bulk loader? Performance measurements will tell.