Add support for certain non-trivial Stream searching #34

runeflobakk · 2024-01-05T13:28:09Z

By non-trivial I mean searching which uses some kind of "contextual" condition in addition to a matching predicate.

The feature currently offered is to find a certain element, which is traditionally done using .filter(..), but with the additional constraint that the element must not be followed by a certain other element.

The usage can look something like this:

Optional<Event> failedEvent = stream.collect(find(FAILED).keepFirstNotFollowedBy(RESTORED));

(see "Example - analyzing events" below for a more complete example)

API design

The search is realized with a Collector which is built by first expressing the subject element you are interested in, using DiggCollectors.find(Predicate). The returned object offers the API to compose the full compound search condition.

Currently, and introduced in this PR, the returned object offers methods to specify that the element you search for can not be followed by a certain other element. In addition you also specify if you want the first or last encountered element, should there be multiple applicable elements which is not followed by any "cancelling" element.

This should allow for further extending the API with other search cases. For example it should be possible to extract a sub-collection of successive events given the initial subject predicate, and then specifying the condition for when to end the extraction.

It follows that the use cases for this typically requires Streams which are ordered and sequential, which must be ensured by the developer, or you may get unpredictable results. One example of such source of elements may be a chronological stream of events, and you are interested in finding an event but only if it has not been "voided" by a certain following event.

Tradeoffs

The existing standard collectors offered by the JDK and in widespread use (e.g. .toList(), .groupingBy(..), etc) generally communicate that the entire stream will be traversed to produce a result. A collector has no facilities to send a "halting signal", i.e. to decide that the result is complete at any point, and cancel any further elements to be offered to the collector for processing/collecting. It can of course choose to discard any offered element at any point.

Currently, I consider the API to communicate that the entirety of the Stream must be traversed, because of the condition of the "any following element" which may affect the result. But an API such as this may evolve to communicate that once a result is obtained, the rest of the Stream is discarded and not processed, and this is not the case. As with all uses of collectors, the Stream pipeline should be forged to filter only relevant elements, and must be finite.

This feature would probably be a better fit as a Gatherer, currently in preview in not-yet-released OpenJDK 22, as it would allow termination of the Stream traversal. And having it as an intermediate operation could also enable more flexibility for further processing after the find-operation.

The integrator function [of a Gatherer] (...) can also terminate processing before reaching the end of the input stream; for example, a gatherer searching for the largest of a stream of integers can terminate if it detects Integer.MAX_VALUE.

Example - analyzing events

Given this model of events:

record Event(Type type, LocalDateTime at) {
    enum Type implements Predicate<Event> { INIT, MOVING_ALONG, FAILED, RESTORED;
        @Override public boolean test(Event e) {
            return this == e.type();
        }
    }
}

And the following events happened resolved in chronnological order:

var now = LocalDateTime.of(2024, 1, 4, 8, 8);
var firstFailure = now.plusSeconds(1);
var firstFailureAfterRestore = now.plusSeconds(10);
var yetAnotherFailure = firstFailureAfterRestore.plusSeconds(10);

var events = List.of(
    new Event(Event.Type.INIT, now),
    new Event(Event.Type.FAILED, firstFailure),
    new Event(Event.Type.RESTORED, firstFailure.plusSeconds(1)),
    new Event(Event.Type.MOVING_ALONG, firstFailure.plusSeconds(2)),
    new Event(Event.Type.FAILED, firstFailureAfterRestore),
    new Event(Event.Type.MOVING_ALONG, firstFailureAfterRestore.plusSeconds(1)),
    new Event(Event.Type.FAILED, yetAnotherFailure),
    new Event(Event.Type.MOVING_ALONG, yetAnotherFailure.plusSeconds(20)));

The following tests shows resolving the first applicable failure event, and the last applicable failure event, both happening after the restoring which happened as the third event.

Lastly, we append yet another restored-event, which voids all the contained failures.

Optional<Event> failure; // failure may or may not be the result

failure = events.stream()
    .collect(find(Event.Type.FAILED).keepFirstNotFollowedBy(Event.Type.RESTORED));
assertThat(failure.get(), is(new Event(Event.Type.FAILED, firstFailureAfterRestore)));

failure = events.stream()
        .collect(find(Event.Type.FAILED).keepLastNotFollowedBy(Event.Type.RESTORED));
assertThat(failure.get(), is(new Event(Event.Type.FAILED, yetAnotherFailure)));

failure = Stream
        .concat(events.stream(), Stream.of(new Event(Event.Type.RESTORED, yetAnotherFailure.plusDays(1))))
        .collect(find(Event.Type.FAILED).keepLastNotFollowedBy(Event.Type.RESTORED));
assertTrue(failure.isEmpty());

hernil

Looks neat. The case and test outlined in the PR could also be part of the tests as part of the documentation? It's a bit more verbose in a good way that helps illustrate potential use cases imo 😄

runeflobakk force-pushed the stream-compound-search branch 2 times, most recently from ca3a5a7 to 8086ac4 Compare January 5, 2024 15:36

Add support for non-trivial Stream searching

9ec20f0

runeflobakk force-pushed the stream-compound-search branch from 8086ac4 to 9ec20f0 Compare January 5, 2024 23:00

runeflobakk marked this pull request as ready for review January 6, 2024 15:44

runeflobakk requested a review from hernil January 8, 2024 09:18

hernil approved these changes Jan 9, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for certain non-trivial Stream searching #34

Add support for certain non-trivial Stream searching #34

runeflobakk commented Jan 5, 2024 •

edited

Loading

hernil left a comment

Add support for certain non-trivial Stream searching #34

Are you sure you want to change the base?

Add support for certain non-trivial Stream searching #34

Conversation

runeflobakk commented Jan 5, 2024 • edited Loading

API design

Tradeoffs

Example - analyzing events

hernil left a comment

Choose a reason for hiding this comment

runeflobakk commented Jan 5, 2024 •

edited

Loading