Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PathExtraction initinal implementaion #1

Merged
merged 13 commits into from
Oct 18, 2018
Merged

Conversation

raganhan
Copy link
Contributor

  • Path Extraction implementation that exposes the IonReader to
    registered callbacks
  • Package setup, e.g. gradle, checkstyle, copyright

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

* Path Extraction implementation that exposes the IonReader to
registered callbacks
* Package setup, e.g. gradle, checkstyle, copyright
@raganhan raganhan requested a review from tgregg October 17, 2018 02:41
Copy link
Contributor

@tgregg tgregg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. Good tests. Some comments below.

*/
class PathComponentParser {

private static final IonSystem ION_SYSTEM = IonSystemBuilder.standard().build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend replacing this with an IonReaderBuilder and an IonTextWriterBuilder; a full IonSystem is no longer required for constructing readers and writers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, will change it

* language governing permissions and limitations under the License.
*/

package software.amazon.com.ionpathextraction;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The com is superfluous. For example, ion-java uses software.amazon.ion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for noticing that, I'll submit the change in namespace as a separate PR to avoid polluting this one, #2


private boolean isWildcard(final IonReader reader) {
if (reader.stringValue().equals(Wildcard.TEXT)) {
for (final Iterator<String> iter = reader.iterateTypeAnnotations(); iter.hasNext(); ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with ion-c, this should only check to see if the first annotation is WILDCARD_ESCAPE_ANNOTATION. (Processing of local symbol tables serves as precedent for this--only structs whose first annotation is $ion_symbol_table are treated as local symbol tables.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

private static final boolean DEFAULT_MATCH_RELATIVE_PATHS = false;
private static final boolean DEFAULT_CASE_INSENSITIVE = false;
private final List<SearchPath> searchPaths = new ArrayList<>();
private final List<Function<IonReader, Integer>> callbacks = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think there is value in giving the user the ability to pass through some context to the callback. The first adopter of the ion-c path extractor is using this feature. That would make the callback function a BiFunction<IonReader, Object, Integer>, and you'd probably have to add a class to hold the pair of the function and Object context.

Then again, maybe this can be achieved in Java with no loss of functionality if the author of the callback just refers to external variables containing any desired context. If that's the case, then we can keep it as-is, and we should change the developers' guide to make it clear that presence of user context in the callback is is left to the implementation based on what's easiest in the particular language.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Java the user can use the lambda closure or a concrete Function implementation as the user context. The callback I use in the tests uses the lambda closure for this so it can accumulate all results in a IonList

Will add that as an example in the documentation


private static final String WILDCARD_ESCAPE_ANNOTATION = "$ion_extractor_field";

List<PathComponent> parse(final String ionPathExpression) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to make all these methods static and not require an actual instance of PathComponentParser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really.

I was thinking of having interfaces for all these classes to make it easier to swap implementations eventually but since PathComponentParser is not public this is not necessary. I should do that for PathExtractor at least though

Will change PathComponentParser and add an interface to PathExtractor

}

int getCurrentDepth() {
return stack.size() - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this could cause problems when there are no search paths. Maybe we should do a quick return in match when that's the case?

Or we can require at least one search path, but currently ion-c and the documentation don't require that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep parity with ion-c here as it can be useful to have a no-op PathExtractor, i.e. with zero search paths

I added some tests and it works with zero paths.
Take a look at PathExtractor#matchRecursive, even though currentDepth will be -1 it is not used by anything as there are no search paths, and thus no active paths skipping the for body where currentDepth is used

Agree that a short circuit is good here as it makes the code clearer and avoids unnecessary work so will add that.


assertEquals(ION.singleValue("[1,2,3]"), out)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test case for calling match with zero registered search paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And one for returning an integer from a matched callback that is greater than the reader's depth.

Copy link
Contributor Author

@raganhan raganhan Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do. I'm assuming that the callback return should not be higher than the reader current relative depth to account for matchRelativePaths

example

data {foo: {bar: 1}}
search path: (bar) 
return 2 // max

edit: was supposed to say "should not be higher" instead of "should be higher"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct.


private int invokeCallback(final IonReader reader, final SearchPath searchPath) {
int previousReaderDepth = reader.getDepth();
int stepOutTimes = callbacks.get(searchPath.getId()).apply(reader);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fail if stepOutTimes is greater than the reader's depth (as ion-c does)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't, will fix

private static final boolean DEFAULT_MATCH_RELATIVE_PATHS = false;
private static final boolean DEFAULT_CASE_INSENSITIVE = false;
private final List<SearchPath> searchPaths = new ArrayList<>();
private final List<Function<IonReader, Integer>> callbacks = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having two parallel ArrayLists, how about modifying the SearchPath class to hold a List<PathComponent> and the callback Function (instead of an Integer id, which is only used as an index into the callbacks list).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do, see my answers to your comments on SearchPath above for more context

"reader must be at depth zero, it was at:" + reader.getDepth());

// marks all search paths as active
tracker.reset(searchPaths);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted PathExtractor to be thread-safe (with the caveat that each thread would need to use a different IonReader), it looks like it would be as simple as instantiating a new Tracker on each invocation of match.

Copy link
Contributor Author

@raganhan raganhan Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Trade off is more instantiation every match call. TBH I'm not sure how big that trade off is so erred on making it faster instead of thread safe as you can pool PathExtractors easily.

In any case should be to refactor so we can have both versions and let the user decide. Would wait on a use case before benchmarking both versions and/or providing both versions

Showing how you can use the lambda closure as ion-c user context
making the wildcard escape annotation valid only as the first
annotation. Added a test to show that
PathExtractorImpl is now package private to make easier to add new
implementations in the future if necessary
Already worked, but short circuiting makes it clearer and avoids
unecessary work.

Also adds missing tests for this use case
and some other spelling and missing comments
* Moving callback to SearchPath
* Simplifying partial match detection logic
* Fixing needToStepIn logic
* Adding tests for nested search paths
@raganhan
Copy link
Contributor Author

Tried to address each comment in a separate commit to make it easier for the reviewer.

I also renamed PathExtractor to PathExtractorImpl and made PathExtractor an interface. I can undo this change if it makes it easier to review and re-apply before merging

README.md Outdated
.register("(baz 1)", callback)
.build();

IonReader ionReader = ion.newReader("{foo: 1}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please help me teach users to use an IonReaderBuilder for constructing IonReaders. I dream of deprecating all of the miscellaneous non-DOM-related utility methods from IonSystem one day.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. Also added a test for the example to avoid breaking in the future, which already happened...

Also added a test for the example to make sure it works
Copy link
Contributor

@tgregg tgregg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before you merge, if you're going to make the first commit part of the history, don't forget to change it from "PathExtraction initinal implementaion" to "PathExtraction initial implementation".

README.md Outdated
```java
PathExtractorBuilder.standard()
.withMatchCaseInsensitive(true)
.register("(foo)", (reader) -> { ... })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withSearchPath

README.md Outdated

### Notification
Each time the `PathExtractor` encounters a value that matches a registered search path it will invoke the respective
callback passing the reader positioned at the current value. See `PathExtractorBuilder#register` methods for more
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withSearchPath

@raganhan raganhan merged commit 864def4 into master Oct 18, 2018
@raganhan raganhan deleted the initial-implementation branch October 18, 2018 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants