-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Remote Vector Index Build] Add download + indexOuput#write implementation to RemoteIndexBuildStrategy #2554
Conversation
1a11805
to
3674aa2
Compare
src/main/java/org/opensearch/knn/index/codec/nativeindex/remote/RemoteIndexBuildStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/nativeindex/remote/RemoteIndexBuildStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/nativeindex/remote/RemoteIndexBuildStrategy.java
Outdated
Show resolved
Hide resolved
3674aa2
to
6d950eb
Compare
src/main/java/org/opensearch/knn/index/codec/nativeindex/remote/RemoteIndexBuildStrategy.java
Outdated
Show resolved
Hide resolved
bc0f66a
to
c945c88
Compare
src/main/java/org/opensearch/knn/index/codec/nativeindex/remote/RemoteIndexBuildStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/store/IndexOutputWithBuffer.java
Outdated
Show resolved
Hide resolved
c945c88
to
6b6c3b1
Compare
src/main/java/org/opensearch/knn/index/codec/nativeindex/remote/RemoteIndexBuildStrategy.java
Outdated
Show resolved
Hide resolved
6b6c3b1
to
e7afbc4
Compare
@VisibleForTesting | ||
public void readFromRepository(String path, IndexOutputWithBuffer indexOutputWithBuffer) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be public if you are adding @VisibleForTesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's an artifact from rebasing. Since we're implementing an interface now, it has to be public. I will make the VectorRepositoryAccessor
package private though.
…ategy Signed-off-by: Jay Deng <[email protected]>
e7afbc4
to
6187dfb
Compare
* @see IndexOutputWithBuffer#writeFromStreamWithBuffer(InputStream, byte[]) | ||
*/ | ||
public void writeFromStreamWithBuffer(InputStream inputStream) throws IOException { | ||
writeFromStreamWithBuffer(inputStream, this.buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Not sure why we need to wrap the method again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove this if you think it would be better, but the idea is that the buffer we use for downloading graph files for remote index builds is not necessarily the same buffer as the one we use to buffer bytes out of native memory.
As a part of #2464 we will tune this buffer size (as well as the other ones used in the upload path), but for now we just use the same buffer as a starting point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous discussion here: #2554 (comment)
Description
As a follow-up to #2550, this PR includes the implementation + testing for downloading a graph file from the remote repository and then writing to an
IndexOutput
.The key change here is that in order to manage memory consumption, we use a buffer of size 64kb, the same as used in the
IndexOutputWithBuffer
class, to buffer the graph file from the remote repository into theIndexOutput
, else we will go OOM on the download of larger graph files.For now, we are only using sequential download for the graph file. Parallel download of the graph file has several difficulties associated with it, both on the download side as well as on the writing to
IndexOutput
side. For more details on these, see: #2464. We take on parallel blob download as a future improvement.Related Issues
Relates #2465
Relates #2392
Relates #2391
Check List
- [ ] New functionality has been documented.- [ ] API changes companion pull request created.--signoff
.- [ ] Public documentation issue/PR created.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.