Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserialization of "empty" Records as root values fails #177

Closed
wants to merge 1 commit into from
Closed

Deserialization of "empty" Records as root values fails #177

wants to merge 1 commit into from

Conversation

marcospassos
Copy link
Contributor

@marcospassos marcospassos commented Aug 21, 2019

@cowtowncoder Could you please give me some guidance on how to fix this issue? I'd like to fix it in time for 2.10.

The unit test spots the issue: empty records fail on deserialization.

@cowtowncoder
Copy link
Member

Quick question here: is this related to how Apache Avro library generates schema, or can this be easily replicated with textual schema and/or Avro module's schema generator?

Just figuring out where to add the test: if it's related to apache lib, interop is fine; but if not I try to keep depenencies to it to minimum.

I will try to help with this issue over the weekend: thank you for all your work on Avro module & apologies for limited time I have to spend to help keep you unblocked.

@marcospassos
Copy link
Contributor Author

Quick question here: is this related to how Apache Avro library generates the schema, or can this be easily replicated with textual schema and/or Avro module's schema generator?

I don't believe that is related to schema generation, but rather to the serialization logic that expects a non-empty list of field, mainly because I’m providing the schema in the test.

Thank you for your support. I'm glad to contribute to a library that helps us that much.

@cowtowncoder
Copy link
Member

@marcospassos Ok so the test can be simplified to use (f.ex) embedded schema it sounds like.

@cowtowncoder
Copy link
Member

Did not have time to dig into this today: it's at top of my list and (aside from adding comments which I prioritize high to try to help everyone), and should be able to at least get test added, have a look at exception tomorrow,

cowtowncoder added a commit that referenced this pull request Aug 26, 2019
@cowtowncoder
Copy link
Member

@marcospasson Ok yes, I can see the problem. Will see if I can easily fix this corner case.

@cowtowncoder
Copy link
Member

Hmmh. So, end-of-input detection by reader notices we are out of content (which is true), but reader probably would need to synthesize START_OBJECT/END_OBJECT for special case of empty Object(s). Possibly/probably only affects root values, not nested ones.

@cowtowncoder
Copy link
Member

This may be difficult thing to fix. The problem is that at root level decision needs to be made regarding whether more content is available or not, and that is determined by checking if more content is available. In this case there isn't, and usually that signals end-of-content. But since "empty" Record takes no space, in this instance expectation is that handling should sort of materialize empty structure (START_OBJECT, END_OBJECT) out of no content.

So: the idea would be to detect the case of Record with no fields being read, and if so, avoid indicating end-of-content. With that RecordReader should be able to properly return START_OBJECT/END_OBJECT sequence as expected.

@cowtowncoder
Copy link
Member

And just to test above: if I pass new byte[1], test passes: reader is able to get empty record decoded.
So the trick is just to figure out how to find this special case in RootReader, either during its initialization, or, possibly, within nextToken() if possible.

@marcospassos
Copy link
Contributor Author

marcospassos commented Aug 26, 2019

Hmmm, got it. How about checking the schema for emptiness?

@cowtowncoder cowtowncoder changed the title [Help needed] Fix empty record serialization [Help needed] Fix empty record deserialization Aug 26, 2019
@cowtowncoder cowtowncoder changed the title [Help needed] Fix empty record deserialization Deserialization of "empty" Records as root values fails Aug 26, 2019
@cowtowncoder
Copy link
Member

That is sort of what I ended up doing, having a look at fields found, if any. I suspect some edges might still cause problems (maybe for "constant" values), but this case works now.

Well, it does for 2.10; master has something else remaining that I need to look into (due to optimized field-name lookups).

@marcospassos
Copy link
Contributor Author

Thank you!

@cowtowncoder
Copy link
Member

Np -- thank you for reporting it. Let me know if you find other edge cases.

cowtowncoder added a commit that referenced this pull request Aug 27, 2019
@marcospassos marcospassos deleted the empty-record-bug branch August 27, 2019 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants