languages: copy over latest version from sourcegraph #828

keegancsmith · 2024-09-17T10:20:21Z

I realised we haven't been updating this package as we updated the package in the sourcegraph repo.

We don't need all the functionality it has, but its easier to just copy paste everything.

Test Plan: go test

I realised we haven't been updating this package as we updated the package in the sourcegraph repo. We don't need all the functionality it has, but its easier to just copy paste everything. Test Plan: go test

github-actions · 2024-09-17T10:21:13Z

Fuzz test failed on commit 78a7ea4. To troubleshoot locally, use the GitHub CLI to download the seed corpus with

gh run download 10901106986 -n testdata

keegancsmith · 2024-09-17T13:50:56Z

Ok there are issues here. The issue is boils down to how we changed normalization between the two versions of the package.

I actually have bigger concerns around this now. We just lookup by query.Language.Language and do no normalization on it. We expect normalization to be done before. This then relies on the normalization being the same at index time as the time we construct the query. I think this is quite easy to mess up, so I need to read more code to see how this can be made more robust.

But essentially I feel like when we read in index data from disk we should normalize by the rules of the running process. Then we do the same normilization when executing a query. There is one other issue where we directly look into the go-enry maps to get out extensions. That code also seems very fragile.

cc @jtibshirani who I think looked at all this code recently.

varungandhi-src

Let's move the lib/codeintel/languages package to a public repo and add a dependency on that? Otherwise, having to maintain this copy separately is going to get tedious quickly, and reviewing changes is also not that simple.

Wrt the normalization, I don't fully understand the requirements -- do we need to maintain some kind of compatibility between successive versions of the package?

keegancsmith · 2024-09-18T08:30:19Z

Let's move the lib/codeintel/languages package to a public repo and add a dependency on that? Otherwise, having to maintain this copy separately is going to get tedious quickly, and reviewing changes is also not that simple.

Agreed, this makes sense to be shared. However, there is a tricky requirement around testing since the implementation of this package is tied directly to the version of go-enry. Maybe what we can do is expose a package which contains the tests, then as part of CI for sourcegraph and zoekt we run those tests to ensure we are compatible with the resolved version of go-enry?

Wrt the normalization, I don't fully understand the requirements -- do we need to maintain some kind of compatibility between successive versions of the package?

We do case-sensitive string comparisons on the language value. I noticed that while doing this upgrade the casing we chose for some stuff changed. I didn't exactly track it down, but I believe it has something to do with things migrating between being inside of go-enry vs our custom support (I believe the code made different decisions around deciding to ToLower a string). I need to look into it more to give you a definite answer, but I saw enough to mark this PR as draft :)

jtibshirani · 2024-09-18T20:22:02Z

Let's move the lib/codeintel/languages package to a public repo and add a dependency on that?

+1 let's do that! Could we commit to languages being our only interface for language info (including the IsVendor, etc. methods we added)? Then we could completely remove a direct dependency on go-enry.

jtibshirani · 2025-01-24T23:52:51Z

@varungandhi-src @keegancsmith I'd like to revisit this PR. What's your thinking on moving the lib/codeintel/languages package to a public repo so we can add a dependency on that?

varungandhi-src · 2025-01-25T00:41:38Z

@jtibshirani Yes, let's do it.

I'm OOO for Lunar New Year next week.

If you want to do this in the fastest way possible, feel free to copy the code to a separate public repo. (I can work on merging it with other code later.)

If you want to keep it with other code, I recommend creating a bindings/go/langur directory in the langur repo (see brief dev docs, fixing the CI (I think that needs an extra flag to Bazel CLI or temporarily pin the Bazel version to something older), copying the code there, and updating the callers.

If you don't have time the coming week, I can follow up on it after coming back.

You should be able to get repo maintainer/admin level access through Entitle.

languages: copy over latest version from sourcegraph

78a7ea4

I realised we haven't been updating this package as we updated the package in the sourcegraph repo. We don't need all the functionality it has, but its easier to just copy paste everything. Test Plan: go test

keegancsmith requested review from mmanela, varungandhi-src and a team September 17, 2024 10:20

cla-bot bot added the cla-signed label Sep 17, 2024

keegancsmith marked this pull request as draft September 17, 2024 13:47

varungandhi-src reviewed Sep 17, 2024

View reviewed changes

mmanela removed their request for review January 7, 2025 14:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

languages: copy over latest version from sourcegraph #828

languages: copy over latest version from sourcegraph #828

keegancsmith commented Sep 17, 2024

github-actions bot commented Sep 17, 2024

keegancsmith commented Sep 17, 2024

varungandhi-src left a comment

keegancsmith commented Sep 18, 2024

jtibshirani commented Sep 18, 2024

jtibshirani commented Jan 24, 2025

varungandhi-src commented Jan 25, 2025 •

edited

Loading

languages: copy over latest version from sourcegraph #828

Are you sure you want to change the base?

languages: copy over latest version from sourcegraph #828

Conversation

keegancsmith commented Sep 17, 2024

github-actions bot commented Sep 17, 2024

keegancsmith commented Sep 17, 2024

varungandhi-src left a comment

Choose a reason for hiding this comment

keegancsmith commented Sep 18, 2024

jtibshirani commented Sep 18, 2024

jtibshirani commented Jan 24, 2025

varungandhi-src commented Jan 25, 2025 • edited Loading

varungandhi-src commented Jan 25, 2025 •

edited

Loading