feat: Sec. index on json #3330

Open: wants to merge 28 commits into develop


islamaliev
Contributor

Relevant issue(s)

Resolves #2280

Description

Enables indexing of JSON fields.

The JSON interface has been extended so that it can be traversed with different configurations.

Indexing of documents has been refactored. Instead of branching on whether the index description contains a special field (such as an array or JSON field), we now assign a field-specific generator to every field, so each field is responsible for generating its own values for the index key. For example, for a composite index made up of an int, an array and a JSON field (a complex composite index), we generate all possible combinations: the int generator always produces 1 value, the array generator produces a value for every element, and the JSON generator produces a value for every JSON node.

Added JSON encoding/decoding to our encoding package.

@islamaliev islamaliev added area/query Related to the query component perf Performance issue or suggestion labels Dec 16, 2024
@islamaliev islamaliev self-assigned this Dec 16, 2024
@islamaliev islamaliev requested review from a team and pradhanashutosh December 16, 2024 16:08

codecov bot commented Dec 16, 2024

Codecov Report

Attention: Patch coverage is 83.94161% with 132 lines in your changes missing coverage. Please review.

Project coverage is 78.20%. Comparing base (363fe9c) to head (06e78e4).
Report is 1 commit behind head on develop.

Files with missing lines Patch % Lines
internal/db/fetcher/indexer_matchers.go 67.42% 75 Missing and 12 partials ⚠️
internal/db/index.go 81.05% 12 Missing and 6 partials ⚠️
client/json.go 96.15% 6 Missing and 2 partials ⚠️
internal/encoding/json.go 92.59% 5 Missing and 3 partials ⚠️
internal/encoding/field_value.go 87.88% 3 Missing and 1 partial ⚠️
internal/encoding/errors.go 62.50% 2 Missing and 1 partial ⚠️
internal/db/fetcher/errors.go 0.00% 2 Missing ⚠️
internal/db/fetcher/indexer.go 0.00% 1 Missing ⚠️
internal/db/fetcher/indexer_iterators.go 98.48% 1 Missing ⚠️
Additional details and impacted files


@@             Coverage Diff             @@
##           develop    #3330      +/-   ##
===========================================
+ Coverage    78.05%   78.20%   +0.15%     
===========================================
  Files          388      391       +3     
  Lines        35398    35602     +204     
===========================================
+ Hits         27629    27842     +213     
+ Misses        6133     6126       -7     
+ Partials      1636     1634       -2     
Flag Coverage Δ
all-tests 78.20% <83.94%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
client/normal_util.go 96.12% <100.00%> (+0.12%) ⬆️
internal/encoding/bool.go 100.00% <100.00%> (ø)
internal/encoding/encoding.go 100.00% <ø> (ø)
internal/encoding/null.go 100.00% <100.00%> (ø)
internal/encoding/type.go 100.00% <100.00%> (+11.76%) ⬆️
internal/planner/scan.go 90.04% <100.00%> (+0.16%) ⬆️
internal/db/fetcher/indexer.go 83.69% <0.00%> (ø)
internal/db/fetcher/indexer_iterators.go 85.09% <98.48%> (+10.14%) ⬆️
internal/db/fetcher/errors.go 22.22% <0.00%> (-1.78%) ⬇️
internal/encoding/errors.go 82.76% <62.50%> (-7.72%) ⬇️
... and 5 more

... and 16 files with indirect coverage changes



Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 363fe9c...06e78e4.

@islamaliev islamaliev removed the request for review from pradhanashutosh December 16, 2024 17:28
@ashutosh-src ashutosh-src left a comment

Approved; please make the suggested doc changes.

Collaborator

@fredcarle fredcarle left a comment

A few comments to start. There are a couple of areas I want to go over again, but in general I would say this PR looks really good. The JSON index feature is really powerful and will certainly be a selling point for a lot of devs who need to handle JSON metadata.

Comment on lines +125 to +126
// JSONVisitor is a function that processes a JSON value at a given path.
// path represents the location of the value in the JSON tree.
Collaborator

question: There is no path in the function signature. Was this left over from a previous implementation?

edit: I see that it refers to the JSON parameter, which carries the path internally. Maybe just rephrase the comment to reflect that instead. "at a given path" makes the reader believe that a path parameter should be provided, which looks like a mistake.

Collaborator

praise: Thank you for writing this documentation. It's very informative and will be useful both to help write the external documentation and for future devs working on indexes.

if cond.op == compOpAny || cond.op == compOpAll || cond.op == compOpNone {
subCondMap := filterVal.(map[connor.FilterKey]any)
for subKey, subVal := range subCondMap {
// TODO: check what happens with _any: {_eq: [1, 2]}
Collaborator

todo: I assume this should be done before merging.

@@ -691,7 +459,15 @@ func (f *IndexFetcher) createIndexIterator() (indexIterator, error) {
} else if fieldConditions[0].op == opIn && fieldConditions[0].arrOp != compOpNone {
iter, err = f.newInIndexIterator(fieldConditions, matchers)
} else {
iter, err = f.newPrefixIterator(f.newIndexDataStoreKey(), matchers, &f.execInfo), nil
key := f.newIndexDataStoreKey()
// TODO: can we test fieldConditions[not 0]?
Collaborator

todo: Please resolve this or create an issue for it.

fieldsDescs []client.SchemaFieldDescription
collection client.Collection
desc client.IndexDescription
// fieldsDescs is a slice of field descriptions for the fields that are indexed by the index
Collaborator

suggestion: "for the fields that form the index"

@@ -186,6 +241,47 @@ func (index *collectionBaseIndex) Description() client.IndexDescription {
return index.desc
}

func (index *collectionBaseIndex) generateIndexKeys(
Collaborator

suggestion: The names of this method and its ForField version are both misleading, as they don't just generate index keys. The function f introduces a side effect that makes the code confusing to read: the name suggests that we want to generate index keys, yet no keys are returned. I know this is internal, but what is happening is so non-obvious that documentation would be very helpful.

b = append(b, jsonMarker)
for _, part := range v.GetPath() {
pathBytes := unsafeConvertStringToBytes(part)
//b = encodeBytesAscendingWithTerminator(b, pathBytes, ascendingBytesEscapes.escapedTerm)
Collaborator

todo: Remove commented line.

@@ -1313,7 +1313,7 @@ func TestQueryWithUniqueCompositeIndex_AfterUpdateOnNilFields_ShouldFetch(t *tes
},
},
},
testUtils.Request{
/*testUtils.Request{
Collaborator

todo: Uncomment or remove commented code.

Member

@nasdf nasdf left a comment

The implementation and documentation are both excellent. There are two test gaps in the index matcher functions that could be fixed, but no blockers from me.

GetPath() []string

// accept calls the visitor function for the JSON value at the given path.
accept(visitor JSONVisitor, path []string, opts traverseJSONOptions) error
Member

suggestion: renaming this function to "visit" or "traverse" would be a bit more intuitive to me.

value time.Time
}

func (m *timeMatcher) Match(value client.NormalValue) (bool, error) {
Member

suggestion: this function could use some additional test coverage if possible

isEq bool
}

func (m *boolMatcher) Match(value client.NormalValue) (bool, error) {
Member

suggestion: this function could use some additional test coverage if possible

if descending {
return EncodeVarintDescending(b, boolInt)
return EncodeBoolDescending(b, v.Value())
Member

nitpick: This is not hit with testing

@@ -89,3 +93,17 @@ func NewErrInvalidUvarintLength(b []byte, length int) error {
func NewErrVarintOverflow(b []byte, value uint64) error {
return errors.New(errVarintOverflow, errors.NewKV("Buffer", b), errors.NewKV("Value", value))
}

// NewErrInvalidJSONPayload returns a new error indicating that the buffer
Member

todo: The comment seems cut off (indicating that the buffer....)


// DecodeBoolDescending decodes a boolean value encoded in descending order.
func DecodeBoolDescending(b []byte) ([]byte, bool, error) {
leftover, v, err := DecodeBoolAscending(b)
Member

nitpick: rename to leftOver

Development

Successfully merging this pull request may close these issues.

Sec. Index: enable indexing of key-values of JSON fields
5 participants