Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It's not possible to extract terms from HighFreqTerms #959

Closed
1 task done
VahidN opened this issue Sep 17, 2024 · 3 comments · Fixed by #963
Closed
1 task done

It's not possible to extract terms from HighFreqTerms #959

VahidN opened this issue Sep 17, 2024 · 3 comments · Fixed by #963

Comments

@VahidN
Copy link

VahidN commented Sep 17, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

It's not possible to GetTermText() from HighFreqTerms, because it's defined as an internal method.

Expected Behavior

Make the GetTermText() method public.

Steps To Reproduce

No response

Exceptions (if any)

No response

Lucene.NET Version

4.8.0-beta00016

.NET Version

.NET 8x

Operating System

No response

Anything else?

No response

@VahidN VahidN added the is:bug label Sep 17, 2024
@paulirwin
Copy link
Contributor

@VahidN Thanks for the feedback. It looks like even in the latest Java Lucene that method is still package-private (kinda-sorta how Java spells "internal") from its default visibility: https://github.com/apache/lucene/blob/b59a357e58611b86e4cc1643794e05dd17d71b08/lucene/misc/src/java/org/apache/lucene/misc/TermStats.java#L35

However, the termtext field is public in the upstream code (including in 4.8.0), so we incorrectly ported that as internal: https://github.com/apache/lucene/blob/b59a357e58611b86e4cc1643794e05dd17d71b08/lucene/misc/src/java/org/apache/lucene/misc/TermStats.java#L23

It seems like the most correct fix here is to make the field public (and possibly make it a property like the ones below it), and leave the visibility of GetTermText() as internal, although making it public too does seem like a harmless change that could be Lucene.Net-specific.

@NightOwl888
Copy link
Contributor

HighFreqTerms is not intended to be copied into another app to use. In Java, you can simply call it on the command line by providing the name of the class to find an entry point on. However, in .NET we cannot do that. A DLL can either be defined as a class library or have an entry point, not both.

So, we provide a wrapper CLI tool, lucene-cli to execute the commands that Lucene provides as command line tools (except for a few that didn't appear to be useful). The list-high-freq-terms command can be used directly by end users, so there is generally no need to copy and paste the HighFreqTerms file into a console project to run.

Of course, as @paulirwin pointed out, the termtext field should be made into a public property to match Lucene, so I am marking this issue as a bug so we can track it.

@NightOwl888 NightOwl888 added pri:normal up-for-grabs This issue is open to be worked on by anyone labels Sep 18, 2024
@NightOwl888
Copy link
Contributor

NightOwl888 commented Sep 18, 2024

It seems like it would also be a good idea to search the codebase for Main( and add a comment on each of those methods as well as the header of the class (if present and if the file is only to be used as a CLI tool) that these commands are available through lucene-cli with a link to the NuGet package so we don't keep getting reports like this one. Except for the demos - those can be copied directly out of the codebase to run (or you can use the CLI to export the files to your local disk, write them to the console, or run them).

@rclabo rclabo self-assigned this Sep 19, 2024
@rclabo rclabo removed the up-for-grabs This issue is open to be worked on by anyone label Sep 19, 2024
rclabo added a commit to rclabo/lucenenet that referenced this issue Sep 25, 2024
paulirwin pushed a commit to rclabo/lucenenet that referenced this issue Oct 22, 2024
@paulirwin paulirwin added this to the 4.8.0-beta00017 milestone Oct 22, 2024
paulirwin added a commit that referenced this issue Oct 22, 2024
…LI bug fix (#963)

* Make TermStats.TermText public (#959)

* SWEEP: Added XML comments for Main(args) methods that are only intended for command line  CLI usage.

* BUG: Lucene,Net.Cli.BenchmarkFindQualityQueriesCommand: fixed mapping to QualityQueriesFinder.Main(args)

* Better approach to fixing TermStats.TermText to fix one failing unit test.

* Address PR feedback

---------

Co-authored-by: Paul Irwin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants