
Add support for certain non-trivial Stream searching #34

Open
wants to merge 1 commit into main from stream-compound-search
Conversation

@runeflobakk (Member) commented Jan 5, 2024

By non-trivial I mean searching which uses some kind of "contextual" condition in addition to a matching predicate.

The feature currently offered is to find a certain element, which is traditionally done using .filter(..), but with the additional constraint that the element must not be followed by a certain other element.

The usage can look something like this:

Optional<Event> failedEvent = stream.collect(find(FAILED).keepFirstNotFollowedBy(RESTORED));

(see "Example - analyzing events" below for a more complete example)

API design

The search is realized with a Collector which is built by first expressing the subject element you are interested in, using DiggCollectors.find(Predicate). The returned object offers the API to compose the full compound search condition.

Currently, and introduced in this PR, the returned object offers methods to specify that the element you search for cannot be followed by a certain other element. In addition, you also specify whether you want the first or the last encountered element, should there be multiple applicable elements which are not followed by any "cancelling" element.
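
To illustrate the shape of this API, here is a rough sketch of the type that find(Predicate) could return. The type name ElementSearch and the exact generic signatures are assumptions made purely for this illustration; only find(..), keepFirstNotFollowedBy(..) and keepLastNotFollowedBy(..) are taken from the PR itself. The relevant point is that both methods yield a Collector producing an Optional result.

import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collector;

// hypothetical sketch of the object returned by DiggCollectors.find(Predicate);
// the name ElementSearch is invented for illustration only
interface ElementSearch<T> {

    // keep the first matching element which is not followed by a "cancelling" element
    Collector<T, ?, Optional<T>> keepFirstNotFollowedBy(Predicate<? super T> cancelling);

    // keep the last matching element which is not followed by a "cancelling" element
    Collector<T, ?, Optional<T>> keepLastNotFollowedBy(Predicate<? super T> cancelling);
}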

This should allow for further extending the API with other search cases. For example, it should be possible to extract a sub-collection of successive events given the initial subject predicate, and then specify the condition for when to end the extraction.
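
As a purely hypothetical illustration of such an extension (no such method exists in this PR, and the name collectUntil is invented here), extracting a sequence of failures could perhaps look something like:

// hypothetical, not part of this PR: collect the successive events starting at
// the first FAILED event, ending the extraction when a RESTORED event is seen
List<Event> failureSequence = events.stream()
    .collect(find(Event.Type.FAILED).collectUntil(Event.Type.RESTORED));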

It follows that the use cases for this typically require Streams which are ordered and sequential. This must be ensured by the developer, or you may get unpredictable results. One example of such a source of elements is a chronological stream of events, where you are interested in finding an event, but only if it has not been "voided" by a certain following event.
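
As a small sketch of that (snippet only, reusing the Event model and find(..) from the example further down), a source which is not already chronologically ordered can be brought into a known order before collecting:

// ensure chronological order before applying the search collector
Optional<Event> failure = events.stream()
    .sorted(Comparator.comparing(Event::at))
    .collect(find(Event.Type.FAILED).keepFirstNotFollowedBy(Event.Type.RESTORED));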

Tradeoffs

The existing standard collectors offered by the JDK and in widespread use (e.g. .toList(), .groupingBy(..)) generally communicate that the entire stream will be traversed to produce a result. A collector has no facility to send a "halting signal", i.e. to decide at some point that the result is complete and cancel any further elements from being offered to the collector for processing/collecting. It can of course choose to discard any offered element at any point.

Currently, I consider the API to communicate that the entirety of the Stream must be traversed, because the "any following element" condition may affect the result. But an API such as this may evolve to seemingly communicate that once a result is obtained, the rest of the Stream is discarded and not processed, which is not the case. As with all uses of collectors, the Stream pipeline should be composed to filter down to only the relevant elements, and it must be finite.
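
To make that constraint concrete, here is a minimal sketch (not the actual implementation in this PR) of how the keepFirstNotFollowedBy semantics can be expressed as a plain Collector. Every element of the Stream is offered to the accumulator; all it can do is keep, overwrite, or reset its candidate, never stop the traversal:

import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;
import java.util.stream.Collector;

class FindCollectorSketch {

    // sketch: keep the first element matching 'subject' which is not followed
    // by any element matching 'cancelling' later in the stream
    static <T> Collector<T, ?, Optional<T>> keepFirstNotFollowedBy(
            Predicate<? super T> subject, Predicate<? super T> cancelling) {

        return Collector.of(
            () -> new AtomicReference<T>(),
            (candidate, element) -> {
                if (cancelling.test(element)) {
                    candidate.set(null);      // a following "cancelling" element voids any held candidate
                } else if (candidate.get() == null && subject.test(element)) {
                    candidate.set(element);   // keep the first applicable element
                }
            },
            (left, right) -> { throw new UnsupportedOperationException("sequential streams only"); },
            candidate -> Optional.ofNullable(candidate.get()));
    }
}

The keepLast variant would only differ in overwriting the candidate for every applicable element instead of keeping the first one.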

This feature would probably be a better fit as a Gatherer, currently in preview in the not-yet-released OpenJDK 22, as that would allow terminating the Stream traversal. Having it as an intermediate operation could also enable more flexibility for further processing after the find operation.

The integrator function [of a Gatherer] (...) can also terminate processing before reaching the end of the input stream; for example, a gatherer searching for the largest of a stream of integers can terminate if it detects Integer.MAX_VALUE.
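
As a rough sketch of that quoted example (not related to the code in this PR, and relying on the preview Gatherer API in JDK 22, so subject to change), a short-circuiting search could look along these lines:

import java.util.stream.Gatherer;
import java.util.stream.Stream;

class LargestGatherer {

    // sketch of the quoted example: track the largest integer seen so far, and
    // stop traversal early if Integer.MAX_VALUE is encountered
    static Gatherer<Integer, ?, Integer> largest() {
        return Gatherer.ofSequential(
            () -> new int[] { Integer.MIN_VALUE },              // state: largest value seen so far
            (state, element, downstream) -> {
                if (element > state[0]) {
                    state[0] = element;
                }
                // returning false signals that no more elements are needed
                return state[0] != Integer.MAX_VALUE;
            },
            (state, downstream) -> downstream.push(state[0]));  // emit the result at the end
    }

    public static void main(String[] args) {
        Stream.of(3, 7, Integer.MAX_VALUE, 5, 1)
              .gather(largest())                                // Stream.gather is also preview in JDK 22
              .forEach(System.out::println);                    // prints 2147483647
    }
}

Unlike a Collector, the integrator can return false to stop receiving elements, which is what makes early termination possible.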

Example - analyzing events

Given this model of events:

record Event(Type type, LocalDateTime at) {
    enum Type implements Predicate<Event> { INIT, MOVING_ALONG, FAILED, RESTORED;
        @Override public boolean test(Event e) {
            return this == e.type();
        }
    }
}

And the following events happened, in chronological order:

var now = LocalDateTime.of(2024, 1, 4, 8, 8);
var firstFailure = now.plusSeconds(1);
var firstFailureAfterRestore = now.plusSeconds(10);
var yetAnotherFailure = firstFailureAfterRestore.plusSeconds(10);

var events = List.of(
    new Event(Event.Type.INIT, now),
    new Event(Event.Type.FAILED, firstFailure),
    new Event(Event.Type.RESTORED, firstFailure.plusSeconds(1)),
    new Event(Event.Type.MOVING_ALONG, firstFailure.plusSeconds(2)),
    new Event(Event.Type.FAILED, firstFailureAfterRestore),
    new Event(Event.Type.MOVING_ALONG, firstFailureAfterRestore.plusSeconds(1)),
    new Event(Event.Type.FAILED, yetAnotherFailure),
    new Event(Event.Type.MOVING_ALONG, yetAnotherFailure.plusSeconds(20)));

The following tests show resolving the first applicable failure event and the last applicable failure event, both happening after the RESTORED event which occurred as the third event.

Lastly, we append yet another RESTORED event, which voids all the contained failures.

Optional<Event> failure; // failure may or may not be the result

failure = events.stream()
    .collect(find(Event.Type.FAILED).keepFirstNotFollowedBy(Event.Type.RESTORED));
assertThat(failure.get(), is(new Event(Event.Type.FAILED, firstFailureAfterRestore)));

failure = events.stream()
    .collect(find(Event.Type.FAILED).keepLastNotFollowedBy(Event.Type.RESTORED));
assertThat(failure.get(), is(new Event(Event.Type.FAILED, yetAnotherFailure)));

failure = Stream
    .concat(events.stream(), Stream.of(new Event(Event.Type.RESTORED, yetAnotherFailure.plusDays(1))))
    .collect(find(Event.Type.FAILED).keepLastNotFollowedBy(Event.Type.RESTORED));
assertTrue(failure.isEmpty());

runeflobakk force-pushed the stream-compound-search branch 2 times, most recently from ca3a5a7 to 8086ac4 on January 5, 2024 15:36
runeflobakk force-pushed the stream-compound-search branch from 8086ac4 to 9ec20f0 on January 5, 2024 23:00
runeflobakk marked this pull request as ready for review on January 6, 2024 15:44
runeflobakk requested a review from hernil on January 8, 2024 09:18
@hernil left a comment

Looks neat. Could the case and tests outlined in the PR description also be included in the test suite, as part of the documentation? They're a bit more verbose in a good way that helps illustrate potential use cases imo 😄
