Add columns to support sort ordering on label columns #5013

Open
roed314 opened this issue Feb 8, 2022 · 9 comments
Labels: Artin reps (Artin representations), Belyi, ECQ (Elliptic curves over Q), finite groups

Comments

@roed314
Contributor

roed314 commented Feb 8, 2022

Labels are stored as text, so they sort as strings, but we'd like them to be sorted numerically (lexicographically on their integer components). There's probably some fancy thing we could do with custom PostgreSQL types, but the simpler solution is to just add some more columns with the numerical components. The following sort orders were suggested by @AndrewVSutherland in #3109, and were initially added in #4991 but were disabled because they sorted incorrectly.
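To illustrate the problem, here is a minimal Python sketch. The labels are made-up dot-separated values rather than real LMFDB data, and `label_key` is a hypothetical helper, not a function from the codebase.

```python
def label_key(label):
    """Split a dot-separated label into a tuple of integers."""
    return tuple(int(part) for part in label.split("."))

# Hypothetical labels, chosen only to show the difference in ordering.
labels = ["10.2", "9.1", "100.3", "10.10", "2.1"]

text_order = sorted(labels)                    # what a text column gives
numeric_order = sorted(labels, key=label_key)  # what we actually want

print(text_order)     # ['10.10', '10.2', '100.3', '2.1', '9.1']
print(numeric_order)  # ['2.1', '9.1', '10.2', '10.10', '100.3']
```

Storing the output of something like `label_key` in extra columns would let the database sort on the numbers directly.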

Here's a list of text columns that we would need to split up into numerical columns (or possibly single columns consisting of a list of integers) in order to have proper sorting.

  • artin_reps: Container
  • belyi_galmaps_fixed: base_field_label
  • ec_curvedata: Ciso
  • g2c_curves: st_label
  • gps_groups: center_label, commutator_label, central_quotient, abelian_quotient
  • gps_st: st0_label

There are also several cases where we have enough parts of the label that it sorts sensibly, but the text label is used as a tiebreaker. These mostly look okay.

@AndrewVSutherland added the Artin reps, ECQ, finite groups, Genus 2, ST groups, and Belyi labels on Feb 8, 2022
@AndrewVSutherland
Member

I'm going to add a column st_label_components to g2c_curves of type integer[] that is a list of the 6 integers that make up the Sato-Tate group label. Question for @roed314: would it make sense to also add a label_components column to the gps_st table? Currently we sort with an index on the six columns that make up the components, but we could instead sort on a single column. The same question applies to all of the object tables: should any table that has a label column also have a label_components column of type integer[] (or numeric[] if there is any possibility of values greater than 2^31, as with number fields and Artin reps)? Conceivably this could save on indexes, but it might not, because we may also be using the same index for queries that involve only some of the columns in the label.
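As a rough sketch of the integer[] versus numeric[] choice, the check below tests whether every component fits in PostgreSQL's 32-bit integer range. The function name `array_column_type` and the sample rows are assumptions made for illustration only.

```python
# Range of PostgreSQL's 4-byte "integer" type.
INT4_MIN, INT4_MAX = -2**31, 2**31 - 1

def array_column_type(rows):
    """Return 'integer[]' if every component fits in 32 bits, else 'numeric[]'."""
    fits = all(INT4_MIN <= c <= INT4_MAX for row in rows for c in row)
    return "integer[]" if fits else "numeric[]"

# Hypothetical component lists, not real label data.
small = [[1, 4, 0, 1, 1, 0], [1, 4, 1, 2, 1, 0]]
large = [[2, 2**31, 1]]

print(array_column_type(small))  # integer[]
print(array_column_type(large))  # numeric[]
```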

@roed314
Contributor Author

roed314 commented Feb 8, 2022

I'm not 100% sure what is more efficient. I suspect that it's better to keep them as separate columns when they might occur in search queries, since otherwise postgres doesn't know that they're connected (unless you use more advanced statistics for the query planner than we have enabled). Of course, for gps_st there are no efficiency concerns since it's so small.

In general, I think a lot of the component columns already exist since the labels have mathematically meaningful parts. The issues usually arise with the "tiebreaker" parts of the labels. I think my inclination would be to make the changes as minimally intrusive as possible, and just add numerical versions of these tiebreaker parts.
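A small sketch of what a numerical version of a letter tiebreaker could look like, assuming a base-26 convention (a = 0, ..., z = 25, ba = 26). That convention and the helper name `letters_to_int` are assumptions for illustration; the thread does not say how any particular tiebreaker is encoded.

```python
def letters_to_int(code):
    """Read a lowercase letter code as a base-26 number (a=0, ..., z=25)."""
    n = 0
    for ch in code:
        n = 26 * n + (ord(ch) - ord("a"))
    return n

# Hypothetical tiebreaker values.
codes = ["ba", "z", "b", "a"]

print(sorted(codes))                      # ['a', 'b', 'ba', 'z']
print(sorted(codes, key=letters_to_int))  # ['a', 'b', 'z', 'ba']
```

Sorting as text puts "ba" before "z", while the integer key keeps the intended order.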

@AndrewVSutherland
Member

@roed314 But for tables that are using the labels as references (e.g. g2c_curves referring to a Sato-Tate group), surely we don't want a column for each component of the ST group label; I think an array is better.

For the label column itself, I'm not immediately convinced that it is less invasive to add numeric tiebreaker columns. For example, in the gps_st table I would need to add two columns (one for the identity component letter and another for the tiebreaker at the end), or I could just add one column whose values I need to compute anyway. None of these columns is going to be used by the code for anything other than sorting, and it's less code to just specify a single sort key than a compound one. Am I missing something here?
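To make the comparison concrete, here is a minimal Python sketch of a compound sort key next to a single list-valued one. The row layout and the column names `c1`, `c2`, and `components` are hypothetical; Python lists compare element by element, which is used here as a stand-in for sorting on an array column.

```python
# Hypothetical rows: two scalar component columns plus one array column
# holding the same values.
rows = [
    {"label": "x", "c1": 2, "c2": 1, "components": [2, 1]},
    {"label": "y", "c1": 1, "c2": 10, "components": [1, 10]},
    {"label": "z", "c1": 1, "c2": 2, "components": [1, 2]},
]

# Compound key: every component column has to be listed.
compound = sorted(rows, key=lambda r: (r["c1"], r["c2"]))

# Single key: one column carries the whole ordering.
single = sorted(rows, key=lambda r: r["components"])

print([r["label"] for r in compound])  # ['z', 'y', 'x']
print([r["label"] for r in single])    # ['z', 'y', 'x']
```

Both keys give the same order; the difference is only in how many columns the sort has to name.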

@AndrewVSutherland
Member

I've added the column st_label_components to g2c_curves.

@roed314
Contributor Author

roed314 commented Feb 8, 2022

I agree: for columns referring to an external label, just having an array is fine.

There's some data duplication (which is negligible in most cases). I think if there is only one part that needs to be numeric, I'd go with a scalar column, but in the gps_st table where there are two I'm fine with an array column.

@roed314
Contributor Author

roed314 commented Feb 9, 2022

I'm using the st_label_components for the sort in genus 2 curves now. But I discovered that the Sato-Tate knowls are broken: click on one here.

@AndrewVSutherland
Member

This must have broken when I was addressing a merge conflict a few days ago. It's fixed in #5016 (the change is just two characters).

@AndrewVSutherland removed the Genus 2 and ST groups labels on Feb 11, 2022
@AndrewVSutherland
Member

I've added a label_components column to gps_st0

@jenpaulhus
Contributor

Is this something we are still hoping to do with finite groups?

@roed314 roed314 modified the milestones: v1.4, v1.3 Nov 8, 2024