Kvrocks & IngestExternalFile #2458

Open
2 tasks done
matan129 opened this issue Jul 31, 2024 · 1 comment
Labels
enhancement type enhancement help wanted Good for newcomers

Comments

matan129 commented Jul 31, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Motivation

Hi folks,

First of all - I just wanted to say that this is an awesome project 🙂

Secondly -

I wondered whether it's possible to load data to Kvrocks via RocksDB's IngestExternalFile.

The use case is real-world.

I currently work on a system that relies on (non-distributed) RocksDB, and we'd like to possibly start using Kvrocks instead.
Every once in a while, we use an offline, "bulky" Spark process which essentially generates a complete view of the RocksDB database.
This is done by creating SST files directly, which is pretty cool*.
The system then downloads these files locally and just points RocksDB to use them.
This way, we can leverage Spark's super-scalable compute to create a dataset (of ~20B tiny records) which would otherwise take a long, long time to write to an empty RocksDB database.

Q:
Since Kvrocks uses RocksDB as its backend, I wondered - how hard would it be to do something like this?

Thanks!


* Technically: we create a Spark dataframe with the data in two columns, and we sort the data globally by key.
  Then, a custom Spark module we've written creates an SST file using the RocksJava binding.
  Each dataframe partition is turned into a separate SST file.
  Since the data is sorted and keys are strictly unique, the SST files are non-overlapping, and thus are ingested into the bottommost level of RocksDB.
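To make the non-overlap argument in the footnote concrete, here is a minimal pure-Python sketch. It does not call RocksDB or Spark; it only models the property described above. The helper names `partition_ranges` and `ranges_overlap` are hypothetical, and the sketch assumes keys are compared bytewise, as with RocksDB's default comparator.

```python
from typing import List, Sequence, Tuple

# (smallest key, largest key) covered by one SST file; a modelling assumption.
KeyRange = Tuple[bytes, bytes]


def partition_ranges(sorted_keys: Sequence[bytes], partition_size: int) -> List[KeyRange]:
    """Split globally sorted keys into consecutive partitions and
    return the key range each partition (one SST file) would cover."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    ranges: List[KeyRange] = []
    for start in range(0, len(sorted_keys), partition_size):
        chunk = sorted_keys[start:start + partition_size]
        ranges.append((chunk[0], chunk[-1]))
    return ranges


def ranges_overlap(ranges: Sequence[KeyRange]) -> bool:
    """Return True if any two key ranges share at least one key.

    After sorting by smallest key, checking adjacent pairs is enough:
    if no adjacent pair overlaps, no other pair can.
    """
    ordered = sorted(ranges)
    return any(prev[1] >= cur[0] for prev, cur in zip(ordered, ordered[1:]))


# Unique keys, sorted globally, then cut into partitions of 4 keys each.
keys = sorted({b"user:%04d" % i for i in range(10)})
ranges = partition_ranges(keys, partition_size=4)
```

With unique, globally sorted keys, `ranges_overlap(ranges)` is false for any partition size. If a key were duplicated across a partition boundary, two files would share that key and the check would report an overlap.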

Solution

I assume that a solution would involve the following components:

  1. An offline library to create Kvrocks-compatible SSTs (i.e. conforming to this)
  2. A server API which can be given a list of SSTs to download and create a Kvrocks set from, using RocksDB's IngestExternalFile.

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@matan129 matan129 added the enhancement type enhancement label Jul 31, 2024
git-hulk (Member) commented Aug 1, 2024

Hi @matan129 Thanks for raising this discussion.

Yes, some users have also proposed supporting ingestion of external files: #1301, #1628. The solution you describe is the right way to implement this feature. But AFAIK, no community volunteer is working on it for now. You're welcome to contribute if you're willing to do that.

@PragmaTwice PragmaTwice added the help wanted Good for newcomers label Nov 30, 2024

3 participants