feat: OpenAI embeddings with GPU based KNN #2157

Merged
merged 27 commits into from
Feb 9, 2024

Conversation

vonodiripsa
Contributor

Added a new OpenAI embeddings Quickstart demo with GPU-based KNN using NVIDIA RAPIDS.

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

This PR adds a new docs notebook demonstrating the use of NVIDIA RAPIDS KNN on a GPU.

How is this patch tested?

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly. The notebook depends on GPU-based compute with an init script that installs NVIDIA RAPIDS KNN.

Does this PR add a new feature? If so, have you added samples on website?

  • No. You can skip this section.
  • Yes.

Added OpenAI embeddings with GPU based KNN using NVIDIA Rapids

Hey @vonodiripsa 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

  • fix: Fix LightGBM crashes with empty partitions
  • feat: Make HTTP on Spark back-offs configurable
  • docs: Update Spark Serving usage
  • build: Add codecov support
  • perf: improve LightGBM memory usage
  • refactor: make python code generation rely on classes
  • style: Remove nulls from CNTKModel
  • test: Add test coverage for CNTKModel

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

@mhamilton723
Collaborator

Please clear the output in this notebook before checking it in so that the diff is minimal

@mhamilton723
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723
Collaborator

We'll also want to get this init script to run on the Databricks clusters we spin up so that the notebook is tested properly. Can you add the init script to a file in, say, the tools/init_scripts directory? That way we can just link people to it, and we can upload it during the build. We'll also want to add this notebook to the GPU tests on Databricks; see the nbtest folder for pointers to the GPU Databricks test runner.

@codecov-commenter

codecov-commenter commented Jan 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (fa9ba2e) 84.49% compared to head (85e1094) 84.47%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2157      +/-   ##
==========================================
- Coverage   84.49%   84.47%   -0.03%     
==========================================
  Files         325      325              
  Lines       16959    16959              
  Branches     1524     1524              
==========================================
- Hits        14330    14326       -4     
- Misses       2629     2633       +4     

☔ View full report in Codecov by Sentry.

Added init script to install rapids ml using cuda 11.8
@vonodiripsa vonodiripsa changed the title OpenAI embeddings with GPU based KNN doc: OpenAI embeddings with GPU based KNN Jan 18, 2024
@vonodiripsa vonodiripsa changed the title doc: OpenAI embeddings with GPU based KNN docs: OpenAI embeddings with GPU based KNN Jan 18, 2024
@vonodiripsa
Contributor Author

Corrected the semantic prefix and added init script

@vonodiripsa vonodiripsa changed the title docs: OpenAI embeddings with GPU based KNN feat: OpenAI embeddings with GPU based KNN Jan 18, 2024
@mhamilton723
Collaborator

can you remove output from the notebook please?

Removed outputs
@bvonodiripsa
Contributor

@microsoft-github-policy-service agree company="Microsoft"

With GPU KNN notebook test code
Added GPU test code to OpenAI with KNN notebook
@bvonodiripsa
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@bvonodiripsa
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@vonodiripsa
Contributor Author

@microsoft-github-policy-service agree company="NVIDIA"

@bvonodiripsa
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@bvonodiripsa
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Fixed style errors
Suggested by Mark
@bvonodiripsa
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines 104 to 109
-     .filterNot(_.getAbsolutePath.contains("Fine-tune"))
+     .filterNot(_.getAbsolutePath.contains("GPU"))
      .filterNot(_.getAbsolutePath.contains("Explanation Dashboard")) // TODO Remove this exclusion

-   val GPUNotebooks: Seq[File] = ParallelizableNotebooks.filter(_.getAbsolutePath.contains("Fine-tune"))
+   val GPUNotebooks: Seq[File] = ParallelizableNotebooks
+     .filter(file =>
+       file.getAbsolutePath.contains("GPU"))
Collaborator

Please keep Fine-tune in the filterNots; you had it right last time.

Collaborator

you need to set this back to the || expression you had
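
For readers following the thread, here is a minimal sketch of one plausible reading of the `||` expression the reviewer refers to: a notebook is routed to the GPU test set if its path contains either "Fine-tune" or "GPU", and excluded from the CPU set by the same predicate. The object name `NotebookFilters`, the helper `needsGpu`, and the notebook paths are hypothetical and exist only for illustration; the actual expression in the PR is not shown in this conversation.

```scala
import java.io.File

object NotebookFilters {
  // Hypothetical notebook paths, used only for illustration.
  val ParallelizableNotebooks: Seq[File] = Seq(
    new File("/notebooks/Quickstart - Fine-tune a Text Classifier.ipynb"),
    new File("/notebooks/Quickstart - OpenAI Embedding and GPU based KNN.ipynb"),
    new File("/notebooks/Quickstart - Your First Models.ipynb")
  )

  // Assumed predicate: a notebook needs a GPU cluster if its path
  // mentions either marker string.
  def needsGpu(file: File): Boolean = {
    val path = file.getAbsolutePath
    path.contains("Fine-tune") || path.contains("GPU")
  }

  // Using the same predicate for both sets keeps them complementary.
  val GPUNotebooks: Seq[File] = ParallelizableNotebooks.filter(needsGpu)
  val CPUNotebooks: Seq[File] = ParallelizableNotebooks.filterNot(needsGpu)

  def main(args: Array[String]): Unit = {
    println(s"GPU: ${GPUNotebooks.map(_.getName).mkString(", ")}")
    println(s"CPU: ${CPUNotebooks.map(_.getName).mkString(", ")}")
  }
}
```

Deriving both sequences from a single predicate means a notebook cannot end up in both test runs or in neither, which appears to be the concern behind both review comments.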

@bvonodiripsa
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@bvonodiripsa
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Create cluster using init script
@bvonodiripsa
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).


class DatabricksRapidsTests extends DatabricksTestHelper {

val clusterId: String = createClusterInPool(GPUClusterName, AdbGpuRuntime, 2, GpuPoolId, RapidsInitScripts)
Collaborator

Suggested change
- val clusterId: String = createClusterInPool(GPUClusterName, AdbGpuRuntime, 2, GpuPoolId, RapidsInitScripts)
+ val clusterId: String = createClusterInPool(GPUClusterName, AdbGpuRuntime, 1, GpuPoolId, RapidsInitScripts)

Reduced number of nodes to 1
@bvonodiripsa
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@bvonodiripsa
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 merged commit 2836cf3 into microsoft:master Feb 9, 2024
66 of 68 checks passed
4 participants