Hi, we love using Hollow, it is very nice. I want to know if there is a proper way to produce data in batches. For example, I have 10 million objects to produce, and I want to divide them into 10 parts and produce 1 million objects at a time. I need to produce data in batches because my VM does not have enough memory to hold all 10 million objects at once.
I am using Incremental together with withNumStatesBetweenSnapshots so that it publishes a snapshot only at the beginning and at the end, which lets it run "in batches". But I ran into a problem: sometimes Incremental does not publish a state because a batch does not change the dataset.
I have forked hollow-reference-implementation and added 2 test cases to show what we are looking for. You can check my test cases: ProducerTest
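For reference, here is a minimal sketch of the batched setup described above, assuming a HollowProducer built with withNumStatesBetweenSnapshots and the HollowIncrementalProducer API; getPublisher, getAnnouncer, loadBatch, and the Data type are hypothetical placeholders, not part of the library:

```java
// Snapshot only every 10th state; the states in between are published as deltas.
HollowProducer producer = HollowProducer
        .withPublisher(getPublisher())        // hypothetical helper returning a Publisher
        .withAnnouncer(getAnnouncer())        // hypothetical helper returning an Announcer
        .withNumStatesBetweenSnapshots(10)
        .build();
producer.initializeDataModel(Data.class);     // register the record type up front

HollowIncrementalProducer incremental = new HollowIncrementalProducer(producer);

for (int batch = 0; batch < 10; batch++) {
    for (Data d : loadBatch(batch)) {         // hypothetical: loads ~1 million objects per batch
        incremental.addOrModify(d);           // stage each record for this cycle
    }
    incremental.runCycle();                   // publish one state per batch
}
```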
@prasoon16081994 @shyam4u Not yet.
For now we are still using Incremental to get close to this. The "dataset not changed" issue can be avoided if you check the version after each produce call. Example below:
List<Data> datas = getMillionDatas();
Incremental incremental = getIncremental();
long lastVersion = 0;
for (...) {
    long version = incremental.produce(...);
    // check whether the version changed; if not, the dataset did not change
    if (version == lastVersion) {
        throw new RuntimeException("dataset not changed!");
    }
    lastVersion = version;
}
@Q-Bug4
Hey, I've finally developed a halfway decent understanding of Hollow.
Isn't it the correct behavior to not write a delta if the dataset didn't change?
Perhaps, if each of the records is needed, a field could be added to the record that is unique across all records and marked as the primary key.
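A record type with such a field might look like the sketch below, assuming Hollow's @HollowPrimaryKey annotation from the object mapper; the Data type and its fields are illustrative only:

```java
import com.netflix.hollow.core.write.objectmapper.HollowPrimaryKey;

// Hypothetical record type: "id" is unique across all records and marked
// as the primary key, so incremental updates can be matched to existing records.
@HollowPrimaryKey(fields = "id")
public class Data {
    long id;       // unique across all records
    String value;  // payload
}
```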