Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort labels #527

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

ericpromislow
Copy link
Contributor

Follow up to #46333: I didn't implement sorting by arbitrary labels.

@ericpromislow ericpromislow requested a review from a team as a code owner February 25, 2025 05:13
@ericpromislow ericpromislow removed the request for review from a team February 25, 2025 05:22
@ericpromislow ericpromislow marked this pull request as draft February 25, 2025 06:56
@ericpromislow
Copy link
Contributor Author

Setting this to draft because I think if you're sorting on a particular label but don't mention that label in a query we should add a query specifying the label exists. Otherwise the results don't sort on that label. I haven't figured out why. The simple but probably wrong answer is that the sqlite engine ignores the field because it isn't being queried or selected. The more probably correct answer that I haven't worked out is that the select distinct on non-label values does exactly what we tell it to, and it ignores the sorted values for some other ordering.

So my solution to give the user what they probably expect is for each label that we're sorting on that isn't mentioned in a query, we add an EXISTS query on it. And then I think the answers will be as expected (if not strictly correct from a SQL point of view).

This comes from a random result I noticed:

# Unsorted list:
curl -skL https://localhost:5111/v1/events'?filter=metadata.labels.bar'| jq '.data[].metadata | [.name, .labels.bar]'
[
  "my-custom-event2",
  "vanillay"
]
[
  "my-custom-event4",
  "chocolate"
]
[
  "my-custom-event7",
  "chocolate"
]

# Sorted by label:

$ curl -skL https://localhost:5111/v1/events'?filter=metadata.labels.bar&sort=metadata.labels.bar' | jq '.data[].metadata | [.name, .labels.bar]'
[
  "my-custom-event4",
  "chocolate"
]
[
  "my-custom-event7",
  "chocolate"
]
[
  "my-custom-event2",
  "vanillay"
]

# Sort on the bar label without referencing it doesn't sort as expected (although the underlying SQL looked fine)
$  curl -skL https://localhost:5111/v1/events'?sort=metadata.labels.bar' | jq '.data[].metadata | [.name, .labels.bar]'
[
  "my-custom-event2",
  "vanillay"
]
[
  "my-custom-event3",
  null
]
[
  "my-custom-event4",
  "chocolate"
]
[
  "my-custom-event7",
  "chocolate"
]
[
  "my-custom-event11",
  null
]

The underlying SQL for the last query:

SELECT DISTINCT o.object, o.objectnonce, o.dekid FROM "_v1_Event" o
  JOIN "_v1_Event_fields" f ON o.key = f.key
  LEFT OUTER JOIN "_v1_Event_labels" lt1 ON o.key = lt1.key
  ORDER BY (CASE lt1.label WHEN "bar" THEN lt1.value ELSE NULL END) ASC NULLS LAST

Getting deeper into the SQL weeds shows why we get the ordering by name:

sqlite> select distinct o.key, lt1.label, lt1.value FROM "_v1_Event" o JOIN "_v1_Event_fields" f ON o.key = f.key LEFT OUTER JOIN "_v1_Event_labels" lt1 ON o.key = lt1.key ORDER BY (CASE lt1.label WHEN "bar" THEN lt1.value ELSE NULL END) ASC NULLS LAST;
default/my-custom-event4|bar|chocolate
default/my-custom-event5|bar|chocolate
default/my-custom-event6|bar|chocolate
default/my-custom-event7|bar|chocolate
default/my-custom-event2|bar|vanillay
default/my-custom-event11|rancher.io/label01|app1
default/my-custom-event11|rancher.io/label02|app2
default/my-custom-event2|app|app2
default/my-custom-event2|cat|crampee
default/my-custom-event3||
default/my-custom-event4|app|app1
default/my-custom-event5|app|app1
default/my-custom-event5|cat|morris
default/my-custom-event5|dog|zipher
default/my-custom-event6|app|app1
default/my-custom-event6|yok|egg
default/my-custom-event7|app|app1
sqlite> select distinct o.key FROM "_v1_Event" o JOIN "_v1_Event_fields" f ON o.key = f.key LEFT OUTER JOIN "_v1_Event_labels" lt1 ON o.key = lt1.key ORDER BY (CASE lt1.label WHEN "bar" THEN lt1.value ELSE NULL END) ASC NULLS LAST;
default/my-custom-event11
default/my-custom-event2
default/my-custom-event3
default/my-custom-event4
default/my-custom-event5
default/my-custom-event6

The last bit of SQL is what we're having sqlite do. Adding an existence test for lt1.label == X solves the problem.

@ericpromislow ericpromislow marked this pull request as ready for review February 25, 2025 22:41
The key to doing this is if we want to sort on, say, `metadata.labels.foo`, we need to
search for all rows with a label of the name `foo` in all the various
join tables we create for each label the query references.

We ignore nulls by giving them lowest priority using "NULLS LAST"
("NULLS FIRST" if sorting in descending order).
@ericpromislow
Copy link
Contributor Author

ericpromislow commented Feb 25, 2025

I implemented "forced selection" in the second commit (Ensure labels that are mentioned only in sort params are still selected.), but I'm wondering now if this should be an error message instead.

Also, I'm generating more code than needed when more than one label is referenced in a filter or query, but I don't think damage is actually done. I'm thinking that it would be better to track which join table lt<N> is for which label name.... Might do that later.

@ericpromislow ericpromislow marked this pull request as draft February 26, 2025 17:00
If we don't do this -- say we sort on metadata.labels.foo but never
make a test on it, the sort resuilts are ignored.
@ericpromislow ericpromislow marked this pull request as ready for review February 27, 2025 00:22
@ericpromislow
Copy link
Contributor Author

Last change: I realized that my label-sort command had a sql injection vulnerability -- (CASE lt1.label WHEN <labelName> THEN lt1.value ELSE NULL) has a possible vul if someone does something like curl...sort=metadata.labels[oops" ELSE NULL ; DROP ALL TABLES ; <and then start another sort stmt up to the ELSE>). I didn't test for this, just replaced <labelName> with a ? and let the sql engine do the proper escaping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant