add sorting policy to ChunkDataset #17

xzhu1900 · 2019-07-18T22:41:44Z

Add a sorting policy to ChunkDataset.

This is an advanced parameter for developers who want to apply a 'sorting policy' to the chunk data before it is sampled into minibatches.

Unlike the collate method, this policy is applied at the chunk level. When a chunk of data is loaded (or multiple chunks, if cross_chunk_shuffle_count_ is greater than 1), the policy is applied to the full loaded data. This is useful when developers want to pre-process the chunk data (for example, bucketing) before the example sampler samples from it.
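To make the ordering concrete, here is a minimal standalone sketch of a chunk-level policy. It does not use the real ChunkDataset API: `ChunkData`, `bucket_by_length`, and `make_batches` are hypothetical names invented for illustration, and the sequential slicing stands in for the example sampler. The only point it shows is that the policy sees the whole loaded chunk once, before any minibatch is formed.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the data a chunk reader returns for one chunk.
using ChunkData = std::vector<std::string>;

// Example policy: reorder the loaded chunk in place so that examples of
// similar length end up next to each other (a simple form of bucketing).
inline void bucket_by_length(ChunkData& chunk) {
  std::stable_sort(chunk.begin(), chunk.end(),
                   [](const std::string& a, const std::string& b) {
                     return a.size() < b.size();
                   });
}

// Hypothetical helper that mimics the order of operations described above:
// the policy runs once on the full loaded chunk, and only afterwards is the
// chunk cut into minibatches. Sequential slicing replaces the real sampler.
inline std::vector<ChunkData> make_batches(
    ChunkData chunk, std::size_t batch_size,
    const std::function<void(ChunkData&)>& policy) {
  if (policy) {
    policy(chunk);  // chunk-level step, not per-batch
  }
  std::vector<ChunkData> batches;
  if (batch_size == 0) {
    return batches;
  }
  for (std::size_t i = 0; i < chunk.size(); i += batch_size) {
    const std::size_t end = std::min(i + batch_size, chunk.size());
    batches.emplace_back(chunk.begin() + static_cast<std::ptrdiff_t>(i),
                         chunk.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return batches;
}
```

A collate-style hook would instead run on each of the returned batches separately, so it could not group similar-length examples across the whole chunk.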

thiagocrepaldi · 2019-07-19T00:39:17Z

test/cpp/api/dataloader.cpp

+    size_t chunk_count_;
+  };
+
+  auto sorting_policy = [](std::vector<int>& raw_batch_data){


Would D::BatchType& be more flexible than std::vector<int>&? It would stay backward compatible if the implementation of D changes to another type in the future.

Yes, it would be. I'll make the change in the next iteration.
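The suggestion can be sketched roughly as follows. This is an illustration only, not the code from the PR: `IntChunkReader`, `PolicyFor`, and `make_ascending_policy` are hypothetical names, and the sketch assumes only that the dataset type `D` exposes a `BatchType` alias, as the review comment implies.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical dataset-like type; the only assumption is a BatchType alias.
struct IntChunkReader {
  using BatchType = std::vector<int>;
};

// Policy signature expressed through D::BatchType instead of a hard-coded
// std::vector<int>, so it follows D if its batch type changes later.
template <typename D>
using PolicyFor = std::function<void(typename D::BatchType&)>;

// Builds a policy that sorts the loaded data in ascending order, in place.
template <typename D>
PolicyFor<D> make_ascending_policy() {
  return [](typename D::BatchType& raw_batch_data) {
    std::sort(raw_batch_data.begin(), raw_batch_data.end());
  };
}
```

With this shape, changing `IntChunkReader::BatchType` to another sortable container would not require touching the policy's declared parameter type.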

xzhu1900 added 2 commits July 18, 2019 15:32

add sorting policy

10a3c2a

add comment

1cdd167

thiagocrepaldi reviewed Jul 19, 2019

xzhu1900 and others added 2 commits July 24, 2019 10:32

rename sorting_policy to preprocessing_policy

cbe62a0

Merge remote-tracking branch 'origin/master' into HEAD

c85a5ae
