Best practices for implementing DB splitting #157
Comments
Maybe you mean "sharding"? Searching for that keyword might turn up more. Here's also a project that performs sharding at the application level (https://github.com/uber/ringpop-go). Or maybe go all in on Raft and multi-Raft instead of Kafka in this project. I believe this topic is out of scope here, because people normally shard on top of different replica groups, and you can also see that the Rockset white paper doesn't say much about this.
If you want to share sst files between two instances, you can disable file deletion for both instances and then switch on the purger at https://github.com/rockset/rocksdb-cloud/blob/master/include/rocksdb/cloud/cloud_env_options.h#L282. We do not use this feature in production, so I am not sure whether this code path is production ready.
Thx! I am now trying to implement SplitDB by means of checkpoint+deleteRange, and then regularly scanning the manifest of the old DB to determine whether a file has expired. I will also find out whether the method you mentioned is suitable.
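A minimal sketch of the expiry check mentioned above, not actual rocksdb-cloud code. It assumes the live-file list of every DB that still shares the old DB's files has already been read from its manifest and is available as a plain list; the function name and file names are hypothetical.

```python
def expired_shared_files(shared_files, live_manifests):
    """Return shared SST files that no live manifest references any more."""
    still_referenced = set()
    for live_files in live_manifests.values():
        still_referenced |= set(live_files)
    return sorted(set(shared_files) - still_referenced)


# Files inherited from the old DB at split time (hypothetical names).
shared = ["000010.sst", "000011.sst", "000012.sst"]

# Live files per DB, as a periodic manifest scan might report them.
manifests = {
    "db_left": ["000010.sst", "000020.sst"],
    "db_right": ["000012.sst", "000021.sst"],
}

# Only 000011.sst has been compacted away in every DB.
print(expired_shared_files(shared, manifests))
```

A file counts as expired only when it is missing from every manifest, so a single DB that has not yet compacted a shared file keeps it alive for all of them.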
@dhruba Hi! Another question I have: when we switch on the purger, we read all the sst files from the current manifest and take out all the sst files under the old DB, so as to select the files to be deleted. Is there a risk that this batch of files to be deleted may still be referenced by a request or snapshot?
Yes, agreed that a query could be accessing a file F that is not part of the manifest. The reason is that the query started when F was part of the database, but a simultaneous compaction process has since removed the file from the DB.
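One way to picture guarding against that race is to have readers pin the files they use, and to let the purger skip anything still pinned. The sketch below is only an illustration under that assumption; `SstPinTracker` and its methods are hypothetical and are not part of rocksdb-cloud.

```python
from collections import Counter


class SstPinTracker:
    """Tracks which SST files in-flight readers still hold (hypothetical helper)."""

    def __init__(self):
        self._pins = Counter()

    def pin(self, files):
        # Called when a query or snapshot starts using these files.
        for name in files:
            self._pins[name] += 1

    def unpin(self, files):
        # Called when that query or snapshot finishes.
        for name in files:
            self._pins[name] -= 1
            if self._pins[name] <= 0:
                del self._pins[name]

    def safe_to_delete(self, candidates):
        # A candidate is deletable only if no reader still pins it.
        return sorted(f for f in candidates if f not in self._pins)


tracker = SstPinTracker()
tracker.pin(["F.sst"])                    # query starts while F is in the manifest
obsolete = ["F.sst", "G.sst"]             # compaction then drops F and G
print(tracker.safe_to_delete(obsolete))   # F is still pinned by the query
tracker.unpin(["F.sst"])                  # query finishes
print(tracker.safe_to_delete(obsolete))   # now both can go
```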
@LucienXian did you implement it? You could also have just created hard links to those SSTs. That's the easiest approach I can think of.
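A rough sketch of the hard-link idea, assuming the SST files sit on a single local filesystem (hard links cannot span filesystems). The directory layout, file name, and `hard_link_ssts` helper are made up for illustration.

```python
import os
import tempfile


def hard_link_ssts(src_dir, dst_dir):
    """Hard-link every *.sst file from src_dir into dst_dir; return linked names."""
    os.makedirs(dst_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(src_dir)):
        if name.endswith(".sst"):
            os.link(os.path.join(src_dir, name), os.path.join(dst_dir, name))
            linked.append(name)
    return linked


with tempfile.TemporaryDirectory() as root:
    old_db = os.path.join(root, "old_db")
    new_db = os.path.join(root, "new_db")
    os.makedirs(old_db)
    with open(os.path.join(old_db, "000010.sst"), "wb") as f:
        f.write(b"fake sst bytes")

    hard_link_ssts(old_db, new_db)
    os.remove(os.path.join(old_db, "000010.sst"))  # old DB drops its name
    with open(os.path.join(new_db, "000010.sst"), "rb") as f:
        print(f.read())  # the data is still reachable through the second link
```

With hard links the filesystem does the reference counting: the underlying data is freed only after every DB directory has removed its own link.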
Problem Description
I'm trying to use rocksdb-cloud, but I'm having some issues with how to implement DB instance splitting.
Suppose the current DB key range is [0, 100]. To split the DB into 2 smaller DBs according to certain rules, the 2 new DBs might be [0, 50] and [51, 100]. What is a good way to do this?
In this process, the split has to be as quick as possible to reduce the time during which writes are stopped. All I can think of is to share the sst files when splitting, while the other files quickly become independent. When a new DB is opened, the WAL of the old DB is replayed, and the replay calls DeleteRange to delete the keys outside the new DB's range.
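The split described above can be pictured with a toy model. This is only a sketch: plain dicts with integer keys stand in for the DB and its checkpoints, and `delete_range` and `split_db` are hypothetical helpers. The one property borrowed from RocksDB is that DeleteRange covers a half-open range, with the end key excluded.

```python
def delete_range(db, begin, end):
    """Remove keys in [begin, end), mirroring DeleteRange's half-open range."""
    for key in [k for k in db if begin <= k < end]:
        del db[key]


def split_db(old_db, lower, split_key, upper):
    """Split old_db into [lower, split_key] and [split_key + 1, upper]."""
    left = dict(old_db)   # stand-in for a checkpoint of the old DB
    right = dict(old_db)  # stand-in for a second checkpoint
    delete_range(left, split_key + 1, upper + 1)  # drop the upper half
    delete_range(right, lower, split_key + 1)     # drop the lower half
    return left, right


old = {k: f"value-{k}" for k in range(0, 101)}
left, right = split_db(old, 0, 50, 100)
print(min(left), max(left), min(right), max(right))
```

Because the end key is excluded, the inclusive bounds [0, 50] and [51, 100] translate to delete calls ending at 51 and 101.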
But here is a question: can we keep the sst files shared between the two new DBs until neither DB needs them any longer because of subsequent compaction?
Or is there a better solution?