[HIVEMALL-75] Support Sparse Vector Format as the input of RandomForest #51
Need to reduce memory usage for a large sparse input. WIP...
To reduce memory consumption, I'm considering replacing
Is there a good algorithm or library for fast integer-sequence compression/decompression? JavaFastPFOR does not support streaming decompression.
As another option, you could use bitshuffle + snappy in … That is, AFAIK there is no existing library for your case.
@maropu In this case, random access is not a requirement. One option is to use Deflate with DataInputStream/DataOutputStream. That said, deserialization should be as fast as possible. Compressing int[] to byte[] and decompressing the compressed byte[] by sequential
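The Deflate option can be sketched in a few lines, assuming the data to compress is a plain int[] (for example, feature indices). The array is written through a DataOutputStream wrapped around java.util.zip's DeflaterOutputStream and read back one value at a time through a DataInputStream over an InflaterInputStream, so a caller can stream the values without first rebuilding the whole array. The class and method names (DeflateIntCodec, compress, forEach, decompress) are hypothetical and not part of Hivemall.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.function.IntConsumer;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper: Deflate-compressed int[] with sequential (streaming) decoding. */
public final class DeflateIntCodec {

    private DeflateIntCodec() {}

    /** Writes the length followed by each value, all through a Deflate stream. */
    public static byte[] compress(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the DataOutputStream closes the DeflaterOutputStream, which finishes the compressed stream.
        try (DataOutputStream out = new DataOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        }
        return bytes.toByteArray();
    }

    /** Decodes values one at a time and passes each to the consumer; no int[] is allocated. */
    public static void forEach(byte[] compressed, IntConsumer consumer) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                consumer.accept(in.readInt());
            }
        }
    }

    /** Convenience method: decodes the whole array at once. */
    public static int[] decompress(byte[] compressed) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        int[] indices = new int[10_000];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i * 3; // sorted, regularly spaced indices
        }
        byte[] packed = compress(indices);
        long[] sum = {0L};
        forEach(packed, v -> sum[0] += v); // single streaming pass over the compressed bytes
        System.out.println("raw bytes: " + indices.length * 4 + ", compressed bytes: " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(indices, decompress(packed)));
        System.out.println("sum: " + sum[0]);
    }
}
```

Deflate alone does not exploit integer structure; delta-encoding sorted indices before writing them, or switching to a variable-length int encoding, would likely shrink the output further at the cost of extra code.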
@maropu introduced compressed … and IntReservoirSampler
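The thread does not describe how IntReservoirSampler works. A common way to keep a fixed-size uniform sample of an int stream in bounded memory is reservoir sampling (Algorithm R), sketched below under that assumption. IntReservoirSketch is a hypothetical name; this is not Hivemall's IntReservoirSampler, and it assumes fewer than Integer.MAX_VALUE values are offered.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical fixed-capacity uniform sample of an int stream (reservoir sampling, Algorithm R).
 * Not Hivemall's IntReservoirSampler; assumes fewer than Integer.MAX_VALUE values are offered.
 */
public final class IntReservoirSketch {
    private final int[] reservoir;
    private final Random rnd;
    private int seen = 0; // number of values offered so far

    public IntReservoirSketch(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive: " + capacity);
        }
        this.reservoir = new int[capacity];
        this.rnd = new Random(seed);
    }

    public void add(int value) {
        if (seen < reservoir.length) {
            reservoir[seen] = value; // the first `capacity` values fill the reservoir
        } else {
            // Keep the new value with probability capacity / (seen + 1) by picking a slot in [0, seen].
            int slot = rnd.nextInt(seen + 1);
            if (slot < reservoir.length) {
                reservoir[slot] = value;
            }
        }
        seen++;
    }

    /** Returns a copy of the current sample (shorter than capacity if fewer values were seen). */
    public int[] sample() {
        return Arrays.copyOf(reservoir, Math.min(seen, reservoir.length));
    }

    public static void main(String[] args) {
        IntReservoirSketch sampler = new IntReservoirSketch(5, 42L);
        for (int i = 0; i < 1_000; i++) {
            sampler.add(i);
        }
        System.out.println(Arrays.toString(sampler.sample()));
    }
}
```

Memory stays at O(capacity) no matter how long the stream is, which would fit the memory-reduction goal of this PR, though the thread does not say where the sampler is used.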
Need to sync with Smile's current prediction scheme for better accuracy: haifengl/smile@444d8bb
Merged. Pull requests for documentation and model conversion will follow in [HIVEMALL-75-2].
What changes were proposed in this pull request?
Added support for sparse vectors as the input of RandomForest.
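What a sparse input representation buys can be sketched as follows, assuming (the PR does not say so) that each row's features arrive as "index:value" strings. Only the supplied entries are stored, in two parallel arrays sorted by index, and a lookup during a split test falls back to 0.0 when a feature is absent. A dense double[] would need one slot per possible index (100,001 for the usage below), whereas this stores three. SparseFeatureVector and its methods are hypothetical, not Hivemall's classes.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sparse feature vector for tree training/prediction.
 * Assumes features arrive as "index:value" strings; indices must be non-negative and unique.
 */
public final class SparseFeatureVector {
    private final int[] indices;   // sorted ascending
    private final double[] values; // values[i] belongs to indices[i]

    private SparseFeatureVector(int[] indices, double[] values) {
        this.indices = indices;
        this.values = values;
    }

    public static SparseFeatureVector parse(List<String> features) {
        int n = features.size();
        int[] idx = new int[n];
        double[] val = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            String f = features.get(i);
            int sep = f.indexOf(':');
            if (sep <= 0) {
                throw new IllegalArgumentException("expected index:value but got " + f);
            }
            idx[i] = Integer.parseInt(f.substring(0, sep));
            if (idx[i] < 0) {
                throw new IllegalArgumentException("negative feature index in " + f);
            }
            val[i] = Double.parseDouble(f.substring(sep + 1));
            order[i] = i;
        }
        // Sort positions by feature index so lookups can use binary search.
        Arrays.sort(order, (a, b) -> Integer.compare(idx[a], idx[b]));
        int[] sortedIdx = new int[n];
        double[] sortedVal = new double[n];
        for (int i = 0; i < n; i++) {
            sortedIdx[i] = idx[order[i]];
            sortedVal[i] = val[order[i]];
            if (i > 0 && sortedIdx[i] == sortedIdx[i - 1]) {
                throw new IllegalArgumentException("duplicate feature index " + sortedIdx[i]);
            }
        }
        return new SparseFeatureVector(sortedIdx, sortedVal);
    }

    /** Value of the given feature; features that were not supplied are treated as 0.0. */
    public double get(int index) {
        int pos = Arrays.binarySearch(indices, index);
        return pos >= 0 ? values[pos] : 0.0;
    }

    /** Number of explicitly stored entries. */
    public int size() {
        return indices.length;
    }

    public static void main(String[] args) {
        SparseFeatureVector v = parse(Arrays.asList("7:0.5", "2:1.25", "100000:3.0"));
        System.out.println(v.get(2) + " " + v.get(3) + " " + v.get(100000) + " size=" + v.size());
    }
}
```

Binary search keeps each lookup at O(log nnz); treating absent features as 0.0 is the usual sparse convention, but whether RandomForest here handles missing features that way is an assumption.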
What type of PR is it?
Improvement
What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-75
How was this patch tested?
Unit tests and manual testing.