Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

correct distance_type lorentz value to match schema #2878

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

irokaru
Copy link

@irokaru irokaru commented Mar 3, 2025

Description

lorentz could not be used with distablce_type, so I fixed the typo.

When lorentz is selected, the following error occurs in the agent pod:

2025-03-02 08:51:20     [WARN]: (github.com/vdaas/vald/internal/core/algorithm/ngt/ngt.go:504): failed to load ngt core options err: failed to setup option :        github.com/vdaas/vald/internal/core/algorithm/ngt.WithDistanceTypeByString.WithDistanceType.func1: invalid critical option, name: distanceType, val: Unknown: unsupported DistanceType
2025-03-02 08:51:20     [ERR]:  (github.com/vdaas/vald/internal/core/algorithm/ngt/ngt.go:558): failed to setup option :        github.com/vdaas/vald/internal/core/algorithm/ngt.WithDistanceTypeByString.WithDistanceType.func1: invalid critical option, name: distanceType, val: Unknown: unsupported DistanceType
2025-03-02 08:51:20     [FATAL]:        (github.com/vdaas/vald/cmd/agent/core/ngt/main.go:56):  failed to setup option :        github.com/vdaas/vald/internal/core/algorithm/ngt.WithDistanceTypeByString.WithDistanceType.func1: invalid critical option, name: distanceType, val: Unknown: unsupported DistanceType
algorithm info       -> NGT-2.3.5
build cpu info flags -> [fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves user_shstk clzero xsaveerptr rdpru arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip vaes vpclmulqdq rdpid fsrm]
build time           -> 2025/02/20_05:34:41+0000
cgo call             -> 1
cgo enabled          -> true
git commit           -> cb121b24abe11a8ff37fdec0a21291a68f87c570
go arch              -> amd64
go max procs         -> 12
go os                -> linux
go version           -> 1.24.0
goroutine count      -> 2
runtime cpu cores    -> 12
server name          -> agent ngt
stack trace-000      -> https://github.com/vdaas/vald/blob/cb121b24abe11a8ff37fdec0a21291a68f87c570/cmd/agent/core/ngt/main.go#L56      github.com/vdaas/vald/cmd/agent/core/ngt/main.go#L56 main.main
vald version         -> cb121b24

In the sourcecode, it seems to accept lorenz, so I changed it to that, but an error occurred.

 % helm install vald vald/vald --values values.yaml
Error: INSTALLATION FAILED: values don't meet the specifications of the schema(s) in the following chart(s):
vald:
- agent.ngt.distance_type: agent.ngt.distance_type must be one of the following: "l1", "l2", "ang", "angle", "ham", "hamming", "cos", "cosine", "poincare", "poinc", "lorentz", "loren", "jac", "jaccard", "spjac", "sparsejaccard", "norml2", "normalizedl2", "normang", "normalizedangle", "normcos", "normalizedcosine", "dotproduct", "innerproduct", "dp", "ip"

Related Issue

Versions

  • Vald Version: v1.7.16
  • Go Version: v1.24.0
  • Rust Version: v1.84.1
  • Docker Version: v27.4.0
  • Kubernetes Version: v1.32.0
  • Helm Version: v3.16.3
  • NGT Version: v2.3.5
  • Faiss Version: v1.9.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of distance type strings to ensure that abbreviated inputs are correctly recognized and interpreted across various use cases.
    • Updated comments for consistency in naming conventions related to distance types.

@CLAassistant
Copy link

CLAassistant commented Mar 3, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

coderabbitai bot commented Mar 3, 2025

📝 Walkthrough

Walkthrough

The change modifies the case handling within the WithDistanceTypeByString function in the NGT algorithm. Specifically, the string "lorenz" has been replaced with "lorentz" to ensure that abbreviated forms of the "Lorentz" distance type are correctly recognized. Additionally, a comment in the ngt.go file was updated for consistency in capitalization. No changes to public API declarations were made.

Changes

File Change Summary
internal/core/algorithm/ngt/option.go Updated the case in WithDistanceTypeByString, replacing “lorenz” with “lorentz”.
internal/core/algorithm/ngt/ngt.go Corrected capitalization in the comment for the Lorentz constant from "lorenz" to "lorentz".

Suggested labels

priority/low, area/agent/core/ngt

Suggested reviewers

  • vankichi

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a091427 and d328996.

📒 Files selected for processing (1)
  • internal/core/algorithm/ngt/option.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/core/algorithm/ngt/option.go

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb121b2 and 9f11e8c.

📒 Files selected for processing (1)
  • internal/core/algorithm/ngt/option.go (1 hunks)

Copy link

codecov bot commented Mar 3, 2025

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 16.93%. Comparing base (5979b34) to head (adb6e15).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
internal/core/algorithm/ngt/option.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2878   +/-   ##
=======================================
  Coverage   16.93%   16.93%           
=======================================
  Files         136      136           
  Lines       15057    15057           
=======================================
  Hits         2550     2550           
  Misses      12211    12211           
  Partials      296      296           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vankichi vankichi requested review from a team, kmrmt, datelier and vankichi and removed request for a team March 3, 2025 03:12
@vankichi
Copy link
Contributor

vankichi commented Mar 3, 2025

@coderabbitai review please

Copy link
Contributor

coderabbitai bot commented Mar 3, 2025

🧩 Analysis chain

I'll review this PR which corrects the distance_type lorentz value to match the schema. Let me first gather more information about the changes.

Let me examine the specific changes in each file:

Let me look at how the distance type is being used in the schema:

@vankichi: I've reviewed the PR and here are my findings:

The PR makes the following changes:

  1. In internal/core/algorithm/ngt/option.go: Changes "lorenz" to "lorentz" in the case statement of the WithDistanceTypeByString function.
  2. In internal/core/algorithm/ngt/ngt.go: Updates a comment from "Lorentz is lorenz distance" to "Lorentz is lorentz distance".

These changes correctly address the issue where there was a mismatch between the schema expectations and the actual code implementation. The schema expected "lorentz" (as seen in the values.schema.json), but the code was accepting "lorenz", leading to configuration errors.

This PR properly aligns the code with the schema, ensuring that when users specify "lorentz" as the distance_type in their configuration, it will be correctly processed by the application.

The changes are straightforward and focused on fixing the specific issue without introducing any new problems. The PR successfully resolves the typo that was causing errors when the "lorentz" distance type was specified in configurations.

I recommend approving this PR as it correctly fixes the inconsistency between the code and schema.


🏁 Script executed:

#!/bin/bash
# Show the specific changes made in this PR
git diff HEAD~1 HEAD

Length of output: 209


🏁 Script executed:

#!/bin/bash
# Show the specific changes made to the option.go file
git diff HEAD~1 HEAD -- internal/core/algorithm/ngt/option.go

Length of output: 92


🏁 Script executed:

#!/bin/bash
# Show the specific changes made to the ngt.go file
git diff HEAD~1 HEAD -- internal/core/algorithm/ngt/ngt.go

Length of output: 89


🏁 Script executed:

#!/bin/bash
# Check for schema definitions related to distance_type
fd -e yaml -e json -p "schema" | xargs grep -l "distance_type" || echo "No schema files found with distance_type"

Length of output: 134


🏁 Script executed:

#!/bin/bash
# Check Helm chart schema for distance_type
fd "values.schema.json" -p "charts" | xargs cat | grep -A 20 -B 2 "distance_type"

Length of output: 2132


Let me try a different approach since the git commands didn't work. I'll examine the actual files directly:

@vankichi: I've reviewed the PR which addresses the inconsistency between the code implementation and schema requirements for the lorentz distance type.

Issues Found

The PR addresses a critical issue where the code was using "lorenz" (incorrect spelling) while the schema specified "lorentz" (correct spelling) in the distance_type configuration. This inconsistency would cause errors when users configured their distance_type as "lorentz" according to the schema.

Analysis

In the schema files (charts/vald/values.schema.json and charts/vald-readreplica/values.schema.json), "lorentz" is correctly listed as a valid enum value for the distance_type field. However, the implementation code was incorrectly using "lorenz", leading to configuration failures.

The PR makes two necessary changes:

  1. Changes "lorenz" to "lorentz" in the case statement of the WithDistanceTypeByString function
  2. Updates a comment from "Lorentz is lorenz distance" to "Lorentz is lorentz distance"

Recommendation

This PR correctly aligns the code with the schema specification. It's a simple but important fix that prevents configuration errors when users follow the documented schema. The changes are targeted and don't introduce any new issues.

I recommend approving this PR as it addresses the typo that was causing configuration failures and ensures consistency between the code and documentation.


🏁 Script executed:

#!/bin/bash
# Check the current content of the option.go file focusing on the distance type handling
cat internal/core/algorithm/ngt/option.go | grep -A 20 "WithDistanceTypeByString"

Length of output: 706


🏁 Script executed:

#!/bin/bash
# Check the current content of the ngt.go file focusing on the distance type definitions
cat internal/core/algorithm/ngt/ngt.go | grep -A 15 "type distanceType"

Length of output: 401


🏁 Script executed:

#!/bin/bash
# Let's examine the PR more directly by looking at the files changed
gh pr view 2878 --json files

Length of output: 193


@vankichi: I've completed my review of PR #2878 which corrects the distance_type lorentz value to match the schema.

Summary of Changes

The PR contains two small but important corrections:

  1. In internal/core/algorithm/ngt/option.go:

    • Changed from case "lorenz", "loren", "lor", "lo", "lz": to case "lorentz", "loren", "lor", "lo", "lz":
  2. In internal/core/algorithm/ngt/ngt.go:

    • Updated a comment from // Lorentz is lorenz distance. to // Lorentz is lorentz distance.

Analysis

This is a targeted fix for an important inconsistency between code and schema. The Helm chart schemas (values.schema.json) correctly list "lorentz" as a valid enum value for the distance_type field, but the code was incorrectly using "lorenz" in the string matching logic and comments.

This inconsistency would cause configuration errors when users specified "lorentz" (the correct value according to the schema) but the code was expecting "lorenz" (the incorrect spelling).

Impact

The changes properly align the code with the schema, ensuring users can successfully configure their distance_type using the documented value "lorentz" without encountering errors. This fixes the issue described in the PR where configurations following the schema were causing initialization failures.

Recommendation

These changes are straightforward, minimal, and correctly address the identified issue without introducing any new problems. I recommend approving this PR.


🏁 Script executed:

#!/bin/bash
# Get detailed diff of the files changed in PR
gh pr diff 2878

Length of output: 933


🏁 Script executed:

#!/bin/bash
# Let's also look for where Lorentz is defined in the ngt.go file 
cat internal/core/algorithm/ngt/ngt.go | grep -B 10 -A 10 "Lorentz"

Length of output: 1035


@vankichi: I've reviewed PR #2878 which corrects the spelling of "lorentz" distance type.

Summary of Changes

The PR makes two targeted corrections:

  1. In internal/core/algorithm/ngt/ngt.go:

    - // Lorentz is lorenz distance.
    + // Lorentz is lorentz distance.
  2. In internal/core/algorithm/ngt/option.go:

    - case "lorenz", "loren", "lor", "lo", "lz":
    + case "lorentz", "loren", "lor", "lo", "lz":

Analysis

This PR addresses an important inconsistency between the code and schema. The Helm chart schema files correctly specify "lorentz" as a valid distance_type value, but the code was incorrectly using "lorenz" in both documentation and string matching logic.

This inconsistency would cause configuration errors when users specified "lorentz" (the documented value in the schema) but the code expected "lorenz" (the incorrect spelling). The PR description mentions this exact problem - errors when using the documented value in the configuration.

Impact

These changes properly align the implementation with the schema, ensuring that:

  1. When users specify "lorentz" in their configuration (as per the schema documentation), it will be correctly recognized
  2. The documentation comments now consistently use the correct spelling

Recommendation

The changes are minimal, targeted, and correctly address the issue without introducing any new problems. I recommend approving this PR.

🐰 ✅

✅ Actions performed

Comments resolved and changes approved.

coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 3, 2025
Co-authored-by: Yusuke Kato <[email protected]>
Signed-off-by: irokaru <[email protected]>
@vankichi vankichi requested a review from kpango March 3, 2025 11:16
@kpango
Copy link
Collaborator

kpango commented Mar 3, 2025

@all-contributors please add @irokaru for bug,code

Copy link
Contributor

@kpango

I've put up a pull request to add @irokaru! 🎉

@kpango
Copy link
Collaborator

kpango commented Mar 3, 2025

@vankichi can you approve this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants