Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Lucene.Net.Grouping to eliminate usage of LINQ, covariant interfaces, and non-generic collection interfaces #1059

Open
7 of 9 tasks
NightOwl888 opened this issue Dec 5, 2024 · 20 comments · May be fixed by #1066
Open
7 of 9 tasks
Assignees
Labels

Comments

@NightOwl888
Copy link
Contributor

NightOwl888 commented Dec 5, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Task description

There were several workarounds done during the initial port of Lucene.Net.Grouping that make it less usable and even less performant than the grouping module in Lucene 4.8.1.

  1. It is using IEnumerable<T> in places where Lucene used the Java equivalent of ICollection<T>
  2. Since we are using IEnumeable<T>, we have to use LINQ Count() and Any() in a few places
  3. We are using non-generic collection interfaces to get around the fact that .NET doesn't support wildcard generics
  4. The Collector abstract class in Lucene.Net.Queries was replaced with an ICollector interface to make it covariant

The fundamental problem with Lucene.Net.Grouping is that C# doesn't support wildcard generics and the Lucene design requires them. When I ported it, it took several tries just to find something that would compile and this is essentially our first (failed) attempt. It functions, but it has problems that rippled throughout the search namespace. It has bottlenecks that didn't exist in Java where we use LINQ and extra collection allocations. These bottlenecks will not be fixable after the Lucene.NET 4.8.0 release because they require breaking API changes, both to Lucene.Net.Queries and Lucene.Net.Grouping.

Through the high-level analysis work that @rclabo did, we learned that the crux of the issue is with the GroupingSearch class. It is a God Object that the user can see all of the grouping options on. But since it has fields that need to be multiple different generic types, it doesn't work in C#. Ron refactored the search method into different types of search, but I think it still requires covariance to exist. That is where we are today.

I suspect we can fix all of the incompatibles by correcting the God Object design and following the Single Responsibility Principle. There are 3 different types of grouping searches (each defined by a separate constructor):

  • Group by Field - groups documents by index terms using the FieldCache
  • Group by Function - groups documents by function using a ValueSource
  • Group by Doc Block - can only be used when documents belonging in a group are indexed in one block

My thought is to start by refactoring the GroupingSearch class into FieldGroupingSearch, FunctionGroupingSearch, and DocBlockGroupingSearch, separating all of the fields into their respective classes. If there is any shared functionality remaining, create a new class named AbstractGroupingSearch to put this shared state and members in. This removes the need to have anything resembling Java wildcard generics because we will have separate classes with their own generic types.

This will ripple changes through the Lucene.Net.Grouping project, but we should aim to remove the extra interfaces we created just to be able to declare types as covariant (which was a workaround we used for C#'s lack of wildcard generics).

  • IAbstractAllGroupsCollector<out TGroupValue>
  • AbstractDistinctValuesCollector.IGroupCount<out TGroupValue>
  • IAbstractDistinctValuesCollector<out GC>
  • IAbstractFirstPassGroupingCollector<out TGroupValue>
  • IAbstractSecondPassGroupingCollector<out TGroupValue>
  • IGroupDocs<out TGroupValue>
  • ISearchGroup<out TGroupValue>
  • ITopGroups<out TGroupValue>
  • ICollector (in Lucene.Net.Queries). Note that we could keep this interface, but we should use the original Collector abstract class design. It has a constant declaration that cannot be put into an interface, so we have to have a class, anyway.

NOTE: We may still require non-generic interfaces to work around issues with calling members on generic types without referring to a generic closing type, but the goal here is to remove the need for covariance.

Once we remove the interfaces, the violations to the SRP, and fix the 4 issues above, we can then use the class GroupingSearch to either delegate the call to the specialized GroupingSearch classes (preferred - this would give us a similar UX as Lucene) or to add static factory methods to create instances of them. This is going to require some creativity. I am not sure the former is possible because of the different return types for each search method (which was the primary reason we needed covariance to port it), but separating the functionality will likely reveal more options than we previously had.

The idea is that we will ultimately "put it all in one place" in GroupingSearch to make it somewhat similar to Lucene, even though all we are doing is delegating calls or creating a single jump point to assist with creating the right class instance to do the real work.

Deliverables

  • Factor out the need for anything resembling Java wildcard generics
  • Remove the extra covariant interfaces
  • Restore Collector back to its original abstract class design as it was in Lucene 4.8.1
  • Restore IEnumerable<T> declarations to ICollection<T> as they were in Lucene 4.8.1
  • Remove the calls to LINQ methods on IEnuerable<T> and restore ICollection<T>.Count as it was in Lucene 4.8.1
  • Restore generic collections where we are currently using ICollection and IDictionary
  • Remove extra collection allocations (see the LUCENENET TODO: Try to work out how to do this without an O(n) operation lines)
  • Create a UX as similar as possible to Lucene 4.8.1 and update the documentation to show usage
@NightOwl888 NightOwl888 added up-for-grabs This issue is open to be worked on by anyone design pri:high is:task A chore to be done labels Dec 5, 2024
@NightOwl888 NightOwl888 added this to the 4.8.0-beta00018 milestone Dec 5, 2024
@paulirwin
Copy link
Contributor

This is awesome analysis and well-written, thanks @NightOwl888. I might take this one after I finish up a few others I'm working on, if no one else jumps on it in the meantime.

@paulirwin paulirwin self-assigned this Dec 12, 2024
@paulirwin
Copy link
Contributor

I spent some time looking at this, and playing around with the APIs in both Lucene and Lucene.NET beta 17, and I've got a pretty good grasp on it now.

The problem boils down to not as much that the Lucene Java code uses wildcard generics, but that we're trying to force a reified generics design on this API, when in Java the generics are type-erased. The generic methods where only the generic type parameter is used in the return type lets the caller pretend like it's any type they want, but it will fail unless it's the exact type that is hydrated at runtime. For example, Lucene/Java happily let you write unintelligible code like the following:

TopGroups<System> topGroups = groupingSearch.search(searcher, query, 0, topNGroups);

(Yes, that System.) Since these types are erased, there's nothing telling groupingSearch.search what type to make (or expect to try to make) the group values. If the API took a Class<T> parameter, that would allow us to match it with .NET reified generics, but since it doesn't, our generics really can only just help ensure we don't have to cast everywhere to use the API as they're just doing the casting internally. i.e. if you try to pass string where it internally expects it to be a BytesRef it will fail, but that's not really different from what would happen in Lucene. It very easily could just be a non-generic API if we wanted to do that, where the group value would be of type object, but I think @NightOwl888's plan above is going to work.

I've got the GroupingSearch class broken down into those three classes with a common abstract base class. The only failing test I'm chasing down now is that the Function tests are failing to cast from MutableValue to MutableValueStr (or Int32), so I have to figure out a solution to that.

@NightOwl888
Copy link
Contributor Author

I've got the GroupingSearch class broken down into those three classes with a common abstract base class. The only failing test I'm chasing down now is that the Function tests are failing to cast from MutableValue to MutableValueStr (or Int32), so I have to figure out a solution to that.

It sounds like we probably need a TGroupSearch generic type parameter added to the FunctionGroupingSearch class and possibly elsewhere with a constraint that it must subclass MutableValue.

paulirwin added a commit to paulirwin/lucene.net that referenced this issue Dec 15, 2024
This still needs XML doc comments, but this breaks the GroupingSearch
"god class" into three subclasses that implement a common abstract
class. This should allow us to not need covariant interfaces for the
return types.

In order to randomly switch between these classes with incompatible
generic type arguments, the test shows how you can use a delegate to
work around this limitation.
@paulirwin
Copy link
Contributor

An update: I have the GroupingSearch class split out, and many of the interfaces and covariance removed, I just have one failing unit test that I am having difficulty tracking down. You can track my progress at my branch: https://github.com/paulirwin/lucene.net/tree/issue/1059 - or I welcome if anyone can pull that down and see what the problem is.

The tests are a bit wonky as I have to create types that use object as the generic type parameter to mimic the previous covariance, since the tests randomly create either Term or Function versions based on a random boolean. Having an object-based wrapper class helps me get around needing covariance. The types BytesRef and MutableValue are already reference types, so converting them to object just for the tests is not a problem, it's just mildly annoying to have to cast them back out when needed.

The GroupingSearch class now has three helper factory methods, but you also could use the split-out classes directly if you wanted to. I'll add XML doc comments before I publish the PR, for now I'm focused on getting it working.

Once I get the failing test working, I'll finish removing the interfaces, then focus on converting IEnumerable back to ICollection and the other items on the list.

@paulirwin
Copy link
Contributor

paulirwin commented Dec 15, 2024

Re: "Restore generic collections where we are currently using ICollection and IDictionary", ICollection is no longer there, but IDictionary still is, because that requires Lucene.Net.Queries.Function.ValueSource to have a generic dictionary context argument to GetValues, and that would be a much larger change. I think that can be broken out as its own issue/PR as this is big enough as it is, and this being non-generic only affects the Function types, so the scope is pretty limited.

I think we can safely replace IDictionary with IDictionary<object, object> in that case. Still loosely typed to object, but we kinda have to because it's designed to just be a form of a cache that is passed around. Some implementations (like ScaleFloatFunction) use non-string keys as well, so it's not safe to make it IDictionary<string, object>.

@paulirwin
Copy link
Contributor

Also note that I had to add an extra TGroupValue generic type parameter to AbstractDistinctValuesCollector due to removing the interfaces/covariance. The signature is now:

public abstract class AbstractDistinctValuesCollector<GC, TGroupValue> : ICollector
    where GC : AbstractDistinctValuesCollector.GroupCount<TGroupValue>

(Note that I haven't yet looked at the Collector changes, still using ICollector for now.)

This is required because we can't make abstract types covariant, and the Function implementation uses MutableValue while Term uses BytesRef, and if you try to do something like where GC : AbstractDistinctValuesCollector.GroupCount<object>, that doesn't work because of the covariance issue. The only alternative is to add back the IGroupCount interface... I could go either way on that.

Since most people will probably be using the Function or Term implementations and not creating their own, this likely won't affect much of anyone except for those that are needing to create their own grouping search classes.

@NightOwl888
Copy link
Contributor Author

An update: I have the GroupingSearch class split out, and many of the interfaces and covariance removed, I just have one failing unit test that I am having difficulty tracking down. You can track my progress at my branch: https://github.com/paulirwin/lucene.net/tree/issue/1059 - or I welcome if anyone can pull that down and see what the problem is.

The tests are a bit wonky as I have to create types that use object as the generic type parameter to mimic the previous covariance, since the tests randomly create either Term or Function versions based on a random boolean. Having an object-based wrapper class helps me get around needing covariance. The types BytesRef and MutableValue are already reference types, so converting them to object just for the tests is not a problem, it's just mildly annoying to have to cast them back out when needed.

The GroupingSearch class now has three helper factory methods, but you also could use the split-out classes directly if you wanted to. I'll add XML doc comments before I publish the PR, for now I'm focused on getting it working.

Once I get the failing test working, I'll finish removing the interfaces, then focus on converting IEnumerable back to ICollection and the other items on the list.

I took a look and this seems to be a common problem that we have had when porting generic types from Java. C# sees FirstPassGroupingCollector<BytesRef> and FirstPassGroupingCollector<MutableValue> as 2 distinct types, but in Java both types can be stored in a variable of type FirstPassGroupingCollector<?>. The solution we commonly use is to create a non-generic abstraction for FirstPassGroupingCollector that the generic type implements/subclasses.

public abstract class FirstPassGroupingCollector<T> : FirstPassGroupingCollector
{
    // implementation
}

public abstract class FirstPassGroupingCollector // Note this could be an interface or an abstract class
     : ICollector
{
    private protected FirstPassGroupingCollector() {} // Only allow internal use

    // implementation
}

This would allow the type to be referenced without being aware of its generic type.

        private AbstractFirstPassGroupingCollector CreateFirstPassCollector<T>(string groupField, Sort groupSort, int topDocs, AbstractFirstPassGroupingCollector firstPassGroupingCollector)
        {
            // LUCENENET specific - we need to look for our wrapper types
            if (typeof(ObjectFirstPassGroupingCollector<BytesRef>).IsAssignableFrom(firstPassGroupingCollector.GetType()))
            {
                ValueSource vs = new BytesRefFieldSource(groupField);
                return new FunctionFirstPassGroupingCollector<MutableValue>(vs, new Hashtable(), groupSort, topDocs);
            }
            else if (typeof(ObjectFirstPassGroupingCollector<MutableValue>).IsAssignableFrom(firstPassGroupingCollector.GetType()))
            {
                return new TermFirstPassGroupingCollector(groupField, groupSort, topDocs);
            }

            fail();
            return null;
        }

If you think about it, since we are having trouble testing it this way, it is very likely that users will need to be able to store these distinct types in a common variable type (after all, Java allows this), so making this a first class feature rather than a test mock is necessary.

This would eliminate the need to create an object wrapper type. There may still be the need to cast somewhere, but there may be ways to mitigate that cast (perhaps by calling a method or property on a class that already is aware of the generic type and can handle the cast internally).

@NightOwl888
Copy link
Contributor Author

Re: "Restore generic collections where we are currently using ICollection and IDictionary", ICollection is no longer there, but IDictionary still is, because that requires Lucene.Net.Queries.Function.ValueSource to have a generic dictionary context argument to GetValues, and that would be a much larger change. I think that can be broken out as its own issue/PR as this is big enough as it is, and this being non-generic only affects the Function types, so the scope is pretty limited.

I think we can safely replace IDictionary with IDictionary<object, object> in that case. Still loosely typed to object, but we kinda have to because it's designed to just be a form of a cache that is passed around. Some implementations (like ScaleFloatFunction) use non-string keys as well, so it's not safe to make it IDictionary<string, object>.

Would it make sense to add the closing types of IDictionary<TKey, TValue> to the classes that wrap them? It looks like we could cascade that down from AbstractGroupingSearch<T, TKey, TValue> down to the others like AbstractFirstPassGroupingCollector<T, TKey, TValue>, etc.

@NightOwl888
Copy link
Contributor Author

Re: "Restore generic collections where we are currently using ICollection and IDictionary", ICollection is no longer there, but IDictionary still is, because that requires Lucene.Net.Queries.Function.ValueSource to have a generic dictionary context argument to GetValues, and that would be a much larger change. I think that can be broken out as its own issue/PR as this is big enough as it is, and this being non-generic only affects the Function types, so the scope is pretty limited.
I think we can safely replace IDictionary with IDictionary<object, object> in that case. Still loosely typed to object, but we kinda have to because it's designed to just be a form of a cache that is passed around. Some implementations (like ScaleFloatFunction) use non-string keys as well, so it's not safe to make it IDictionary<string, object>.

Would it make sense to add the closing types of IDictionary<TKey, TValue> to the classes that wrap them? It looks like we could cascade that down from AbstractGroupingSearch<T, TKey, TValue> down to the others like AbstractFirstPassGroupingCollector<T, TKey, TValue>, etc.

Wait, it looks like that is only for FunctionGroupingSearch, so maybe FunctionGroupingSearch<T, TKey, TValue>? Of course, we should have generic type names than this.

@paulirwin
Copy link
Contributor

If you think about it, since we are having trouble testing it this way, it is very likely that users will need to be able to store these distinct types in a common variable type (after all, Java allows this), so making this a first class feature rather than a test mock is necessary.

I'm not so sure it's necessary, this is only a problem with using the low-level Abstract*Collector types. I don't think many people are going to be using those low-level types (instead using the types in the GroupingSearch file), and those that are further still are probably not going to need to be switching between them in the same variable like the test does, and further still if they really need to do that, they can do a workaround like we're doing here.

What I don't want to have to do is expose object-based properties that are shadowed in the generic classes (or worse, i.e. ICollection<object> Groups for a non-generic AbstractAllGroupsCollector base class). That seems like a cure that is worse than the disease, and I don't even know if it's feasible.

Would it make sense to add the closing types of IDictionary<TKey, TValue> to the classes that wrap them? It looks like we could cascade that down from AbstractGroupingSearch<T, TKey, TValue> down to the others like AbstractFirstPassGroupingCollector<T, TKey, TValue>, etc.

No, because ValueSource implementations treat it as IDictionary<object, object>. Different implementations use different types for the keys and values. Object is the only common type. We also don't know which particular ValueSource implementations will be used, so that's pretty much impossible. This dictionary is truly just a bag of key/value pairs, strongly typing it would not help anything and would require a significant deviation from upstream.

@paulirwin
Copy link
Contributor

Also note that a value source might even store/retrieve different key and value types in the context within the same method.

@NightOwl888
Copy link
Contributor Author

If you think about it, since we are having trouble testing it this way, it is very likely that users will need to be able to store these distinct types in a common variable type (after all, Java allows this), so making this a first class feature rather than a test mock is necessary.

I'm not so sure it's necessary, this is only a problem with using the low-level Abstract*Collector types. I don't think many people are going to be using those low-level types (instead using the types in the GroupingSearch file), and those that are further still are probably not going to need to be switching between them in the same variable like the test does, and further still if they really need to do that, they can do a workaround like we're doing here.

What I don't want to have to do is expose object-based properties that are shadowed in the generic classes (or worse, i.e. ICollection<object> Groups for a non-generic AbstractAllGroupsCollector base class). That seems like a cure that is worse than the disease, and I don't even know if it's feasible.

But it is how we do it elsewhere (for example, in FieldComparer). Having the ability to store a collection of AbstractXXXCollector<T> without knowing the generic closing type is a useful thing to have. Furthermore, it is supported by Java, so there may be examples on blogs and elsewhere that are doing it that way.

That said, there doesn't seem to be a use case where it makes sense to do this for AbstractAllGroupsCollector. It looks like you neatly solved access to the GroupCount field using IGroupCountCollector, but the tests don't access the Groups property so there is no need to find a solution to cast it AFAICT.

Would it make sense to add the closing types of IDictionary<TKey, TValue> to the classes that wrap them? It looks like we could cascade that down from AbstractGroupingSearch<T, TKey, TValue> down to the others like AbstractFirstPassGroupingCollector<T, TKey, TValue>, etc.

No, because ValueSource implementations treat it as IDictionary<object, object>. Different implementations use different types for the keys and values. Object is the only common type. We also don't know which particular ValueSource implementations will be used, so that's pretty much impossible. This dictionary is truly just a bag of key/value pairs, strongly typing it would not help anything and would require a significant deviation from upstream.

Gotcha.

Perhaps we are better off sticking with IDictionary, then. This allows the user to either use IDictionary<object, object> or have it strongly typed without having to do some weird cast or conversion to pass it in.

Need to think about this some more.

@paulirwin
Copy link
Contributor

It looks like you neatly solved access to the GroupCount field using IGroupCountCollector, but the tests don't access the Groups property so there is no need to find a solution to cast it AFAICT.

I actually already removed that in lieu of having this do the same approach as the other collectors in the tests, so that we're not adding an unnecessary interface just for testing purposes. It's unlikely that someone would need just the group counts in the real world without also accessing the groups, so I figured it would be best to not pollute the public API with that interface when it was only used (read: not needed, just used as one approach) for test purposes.

@paulirwin
Copy link
Contributor

I just had an idea, based on that last comment, if you're open to it. We could create non-generic interfaces (not abstract classes) that do not expose collections, but give you APIs to retrieve items. By using an interface instead of an abstract class, we wouldn't be forced to convert to/from object except for in the explicitly implemented methods. It also would not change the type hierarchy. For example:

public interface IAllGroupsCollector : ICollector // NOTE: we'd need to keep this interface
{
    int GroupCount { get; }
    object GetGroup(int index);
}

We would then explicitly implement the interface for the GetGroup method. I think that would be cleaner than exposing a non-generic ICollection, although I could be swayed on explicitly implementing that too. Thoughts?

@NightOwl888
Copy link
Contributor Author

It looks like you neatly solved access to the GroupCount field using IGroupCountCollector, but the tests don't access the Groups property so there is no need to find a solution to cast it AFAICT.

I actually already removed that in lieu of having this do the same approach as the other collectors in the tests, so that we're not adding an unnecessary interface just for testing purposes. It's unlikely that someone would need just the group counts in the real world without also accessing the groups, so I figured it would be best to not pollute the public API with that interface when it was only used (read: not needed, just used as one approach) for test purposes.


I just had an idea, based on that last comment, if you're open to it. We could create non-generic interfaces (not abstract classes) that do not expose collections, but give you APIs to retrieve items. By using an interface instead of an abstract class, we wouldn't be forced to convert to/from object except for in the explicitly implemented methods. It also would not change the type hierarchy. For example:

public interface IAllGroupsCollector : ICollector // NOTE: we'd need to keep this interface
{
    int GroupCount { get; }
    object GetGroup(int index);
}

We would then explicitly implement the interface for the GetGroup method. I think that would be cleaner than exposing a non-generic ICollection, although I could be swayed on explicitly implementing that too. Thoughts?

That is the reason why I said "abstraction" rather than "abstract class", because either would probably be fine. I just used abstract classes in most other places because it meant keeping the same syntax as the upstream code. For example:

Java

List<AbstractAllGroupsCollector> collectors = new ArrayList();
ValueSource vs = new BytesRefFieldSource(groupField);
collectors.add(new FunctionAllGroupHeadsCollector(vs, new Map(), sortWithinGroup));
collectors.add(TermAllGroupHeadsCollector.Create(groupField, sortWithinGroup));

C#

List<AbstractAllGroupsCollector> collectors = new List<AbstractAllGroupsCollector>();
ValueSource vs = new BytesRefFieldSource(groupField);
collectors.Add(new FunctionAllGroupHeadsCollector(vs, new Hashtable(), sortWithinGroup));
collectors.Add(TermAllGroupHeadsCollector.Create(groupField, sortWithinGroup));

Where if we used an interface, it would be declared a bit different:

List<IAllGroupsCollector> collectors = new List<IAllGroupsCollector>();

The main reason for doing this in other places was for being able to access constants and fields without having to specify the non-generic type (in which case the syntax stays the same as upstream).

It would also allow the use of the Collector abstract class, so the user could cast to that type just as they could in Java. Being that there are no methods on ICollector that wouldn't also exist on Collector, that is not such a big deal, though.

That being said, using an interface does have an advantage we may be able to take advantage of: it can be declared internal and used just for testing purposes. If we use an abstract class, it will have to be declared public in order for a public type to inherit it. Although, I am not entirely convinced that getting the GroupCount without having strongly-typed access to the groups is not a valid use case.

After mulling this over a bit more, if the user needed to create a collection of mixed collectors, they could just declare it object or ICollector and then pass it through a set of cast tests upon usage. However, they would need to know all possible types that it could be cast to at compile time, which might not be possible for every use case.

//List<AbstractGroupingSearch> collectors = new List<AbstractGroupingSearch>();
List<object> collectors = new List<object>();

collectors.Add(GroupingSearch.ByField(...));
collectors.Add(GroupingSearch.ByFunction<MutableValueStr>(...));
collectors.Add(GroupingSearch.ByFunction<MutableValueInt32>(...));
collectors.Add(GroupingSearch.ByDocBlock<string>(...));
collectors.Add(GroupingSearch.ByDocBlock<int>(...));

foreach (object collector in collectors)
{
    if (collector is FieldGroupingSearch fieldGroupingSearch)
        // do something with the collector
    else if (collector is FunctionGroupingSearch<MutableValueStr> functionOnString)
        // do something with the collector
    else if (collector is FunctionGroupingSearch<MutableValueInt32> functionOnInt32)
        // do something with the collector
    else if (collector is DocBlockGroupingSearch<string> docBlockOnString)
        // do something with the collector
    else if (collector is DocBlockGroupingSearch<int> docBlockOnInt32)
         // do something with the collector
}

This would be very cumbersome if all they want to do is read a field that is common among all of these types and even moreso if they wanted to do that for any custom subclass (which would probably require reflection).

I am not opposed to keeping ICollector. The only thing that is kind of smelly about it is that there is one constant that is declared on Collector because interfaces in c# don't support constants. So, we need the Collector class whether it is made abstract or not. Since it is currently a static class, users could be confused that they cannot cast a collector type to Collector as is the case in Java.

@paulirwin
Copy link
Contributor

That being said, using an interface does have an advantage we may be able to take advantage of: it can be declared internal and used just for testing purposes.

I think if you're saying that users have a valid use case of needing to store multiple of these different collectors in a variable or collection with different type parameters, we might as well make the interface public then. But as I noted above, I think interfaces instead of abstract classes have another advantage: being able to explicitly implement them in the generic abstract classes.

If we take the abstract class approach, you can't explicitly implement an abstract class. It will require shadowing the non-generic properties if they have the same name. I think an interface lets us have the best of both worlds: we can allow for this use case, while simplifying our testing code. I'll give it a try and see how it comes out.

@NightOwl888
Copy link
Contributor Author

We would then explicitly implement the interface for the GetGroup method. I think that would be cleaner than exposing a non-generic ICollection, although I could be swayed on explicitly implementing that too. Thoughts?

Yeah, I definitely would stay away from a non-generic ICollection.

That being said, using an interface does have an advantage we may be able to take advantage of: it can be declared internal and used just for testing purposes.

I think if you're saying that users have a valid use case of needing to store multiple of these different collectors in a variable or collection with different type parameters, we might as well make the interface public then. But as I noted above, I think interfaces instead of abstract classes have another advantage: being able to explicitly implement them in the generic abstract classes.

If we take the abstract class approach, you can't explicitly implement an abstract class. It will require shadowing the non-generic properties if they have the same name. I think an interface lets us have the best of both worlds: we can allow for this use case, while simplifying our testing code. I'll give it a try and see how it comes out.

Agreed. I often use interfaces to separate members of the same name that only differ by return type. In fact, I made a fluent API generator that creates a builder class that implements interfaces to enforce complex business rules, such as only allowing a specific setting to be applied up to 3 times (after which it will return the builder instance cast to an interface that no longer declares the method for the setting).

I would also like to see how that approach turns out. And I agree that there is more utility for this than just testing.

@paulirwin
Copy link
Contributor

This is going well so far, I'll have a commit soon. I'm creating adapter classes to adapt i.e. an IList<GC> to IList<IGroupCount> for use in AbstractDistinctValuesCollector and such, so that we can use the generic interfaces in a non-generic interface instead of having to use the non-generic ICollection/IList/etc, and without having to use LINQ to create new collections just to cast those values.

@paulirwin
Copy link
Contributor

@NightOwl888 This is pushed up to my branch, let me know what you think. This allowed me to remove the wrapping types so the test code is closer to Lucene. It required adding those "Casting" types to Lucene.Net.Support (as internal classes) to do the casting for the explicit interface implementations without having to allocate new collections (and perform O(n) operations to do so) or use LINQ.

Still has one failing test, but I think it'll be easier to figure out what's wrong from the diff now that the test code has less changes.

@paulirwin
Copy link
Contributor

Of course, as always happens, I realized what the problem was as soon as I posted that. Fixed, and the unit test passes now!

@paulirwin paulirwin linked a pull request Dec 15, 2024 that will close this issue
4 tasks
@paulirwin paulirwin removed the up-for-grabs This issue is open to be worked on by anyone label Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants