
Recycled buffers in XMLStreamWriter are colliding between multiple writer instances when used in an Akka stream #82

Closed
jphelp32 opened this issue Sep 12, 2019 · 16 comments

@jphelp32

I'll start with the caveat that I have not been able to produce a unit test case to illustrate this issue. But what I'm observing in our application is that the recycled char buffers (pretty sure it's the small copy buffers, but it could potentially be the others) are being used concurrently by multiple threads in an Akka stream, and the result is corrupted output, in my application specifically in the namespace writing.
I've so far been unable to reproduce the problem using Aalto, but I'm not certain it isn't also affected, given that it also uses the ThreadLocal recycled buffers.
I'm hoping you might have an inkling as to how to attempt a reproduction outside of Akka and my application. Or perhaps this is already a known issue, and Woodstox should not be used under fork-join parallelism? (However, I did try changing my Akka configuration to use a fixed thread-pool dispatcher, and the problem still occurs.)
I've locally recompiled Woodstox with some debug logging added, and I have some log files that demonstrate the problem. I don't want to presume upon you to review them, but I mention it in case it might help.

@jphelp32 changed the title from "Recycled buffers in XMLStreamWriter are colliding when multiple when used in an Akka stream" to "Recycled buffers in XMLStreamWriter are colliding between multiple writer instances when used in an Akka stream" on Sep 12, 2019
@jphelp32 (Author)

I'm currently using woodstox-core:5.3.0, but I also tested 5.2.1 and 6.0.0, and they are affected as well.

@cowtowncoder (Member)

Ok. Unfortunate timing on one hand (since I just pushed 6.0.0 recently), but possibly fortunate, since I think I went through a similar problem with Jackson.

So: one assumption with the current buffer recycling is that parser/generator instances are only ever used from a single thread: not just at any given time (i.e. single-threaded access), but over the whole life-cycle of the parser/generator instance. If that holds, ThreadLocal suffices.
In the case of Jackson this was generally true, except that when async parsing was implemented the usage pattern changed and the problem was found. The solution was to use atomic references in addition to ThreadLocal/SoftReference, and I think that should solve the issue (the full fix will be in 2.10.0).
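
For reference, a minimal sketch of the ThreadLocal/SoftReference recycling pattern being described here; the class and method names below are simplified placeholders, not the actual Woodstox or Jackson internals:

    import java.lang.ref.SoftReference;

    // Sketch of per-thread buffer recycling (illustrative only).
    final class SketchRecycler {
        char[] mediumCBuffer; // cached buffer, reused by writer instances on this thread
    }

    final class SketchRecyclerPool {
        // One recycler per thread; the SoftReference lets the GC reclaim it under memory pressure.
        private static final ThreadLocal<SoftReference<SketchRecycler>> CACHE = new ThreadLocal<>();

        static SketchRecycler threadLocalRecycler() {
            SoftReference<SketchRecycler> ref = CACHE.get();
            SketchRecycler recycler = (ref == null) ? null : ref.get();
            if (recycler == null) {
                recycler = new SketchRecycler();
                CACHE.set(new SoftReference<>(recycler));
            }
            // The scheme assumes every writer that ends up holding this recycler is used
            // from this thread only, for the writer's whole life-cycle.
            return recycler;
        }
    }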

I was actually wondering if I should port the changes to Woodstox's recycler (which is where Jackson originally got its implementation from), but since I had not heard of any issues, I hadn't.

But it sounds like there is now a use case.

Now: do you think it would be easy for you to test a fix, if I come up with what I think should work? Does not have to be automated.

@jphelp32 (Author)

Yeah, I'm able to consistently reproduce it within my application using an aggressive Gatling script.
I'm curious though: with Aalto, I cannot reproduce it with that same script, and that library is using the same ThreadLocal/SoftReference strategy as far as I can tell. Any thoughts as to why that might be?

@cowtowncoder (Member) commented Sep 12, 2019

Hmmh. That is interesting -- the code in BufferRecycler looks identical, so I would expect the behavior to be similar. I assume the access pattern for either obtaining or releasing the buffers might differ.

Finding out which buffer is causing problems might help.

@cowtowncoder (Member)

One thing wrt Aalto being less prone could be that it does fewer copies in general and bundles the local-name/namespace-URI combination, so there may simply be less opportunity for a collision. So perhaps corruption is just significantly less common there, but Aalto is probably not immune.

@jphelp32 (Author)

That's sorta what I thought might be the case. And why I didn't just switch over to aalto and move on.

@cowtowncoder (Member)

:)

It'd be good to resolve the problem, still.

One more question: how is the shared access from multiple threads done? As in, are parsers/generators (the problem is on the generator side, right?) being shared or passed across processing threads? Since both operations are sequential, I assume there is some hand-over, or...?

@jphelp32 (Author)

The problem is on the generator side, yes. The handoff is handled entirely by the Akka streams/actors framework; I'm not exactly sure how it assigns threads to actors under the covers.
I tried to write a unit test to reproduce it, taking Akka out of the equation, but could not reproduce the problem. Though I'm not familiar enough with Java concurrency and ExecutorService to know whether or not my test was a good representation of what Akka is doing.
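
For concreteness, here is a rough sketch of that kind of ExecutorService-based harness using the plain StAX API; the structure is illustrative only, and whether it reproduces the corruption likely depends on how closely it mimics the way Akka migrates writer instances between threads:

    import java.io.StringWriter;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamWriter;

    public class WriterHandoffStressTest {
        public static void main(String[] args) throws Exception {
            // Picks up Woodstox if it is on the classpath.
            XMLOutputFactory factory = XMLOutputFactory.newFactory();
            ExecutorService pool = Executors.newFixedThreadPool(8);
            List<Future<String>> results = new ArrayList<>();

            for (int i = 0; i < 10_000; i++) {
                String expected = "attr-" + ThreadLocalRandom.current().nextLong();
                StringWriter out = new StringWriter();
                // Create the writer on the submitting thread, then use it on a pool thread,
                // to mimic the kind of hand-off an Akka stream performs.
                XMLStreamWriter xw = factory.createXMLStreamWriter(out);
                results.add(pool.submit(() -> {
                    xw.writeStartElement("root");
                    xw.writeAttribute("value", expected);
                    xw.writeEndElement();
                    xw.close();
                    // Return the document only if the attribute value got corrupted.
                    return out.toString().contains(expected) ? null : out.toString();
                }));
            }
            for (Future<String> f : results) {
                String corrupted = f.get();
                if (corrupted != null) {
                    System.err.println("Corrupted output: " + corrupted);
                }
            }
            pool.shutdown();
        }
    }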

@jphelp32 (Author) commented Sep 12, 2019

To answer your earlier question, it appears to be the medium char buffer (CMed) that I'm having issues with.
I turned your commented-out System.err.println calls in BufferRecycler into log entries, and I added some additional logging to BufferingXmlWriter.writeAttrValue(), as follows:

    private final void writeAttrValue(char[] value, int offset, int len)
        throws IOException
    {
        // Local debug logging: log the attribute value being written, plus the identity
        // of the backing char[] (value.toString() prints something like "[C@321c270").
        String val = new String(value, offset, len);
        logger.debug(this.toString() + " writeAttrValue: " + val);
        logger.debug(this.toString() + " value buf " + value.toString() + " offset=" + offset + " len=" + len + " content=" + new String(value));

and I can see the problem pretty clearly in the log:

00:43:45.999+0000 DEBUG [wws.dispatcher-231] [c.c.w.i.BufferRecycler] - com.ctc.wstx.io.BufferRecycler@4b4234c1 Alloc CMed: [C@321c270
00:43:45.999+0000 DEBUG [wws.dispatcher-223] [c.c.w.i.BufferRecycler] - com.ctc.wstx.io.BufferRecycler@4b4234c1 Alloc CMed: [C@321c270
00:43:45.999+0000 DEBUG [wws.dispatcher-231] [c.c.w.s.BufferingXmlWriter] - BufferingXmlWriter@3d8d0790 writeAttrValue: aN8sHgwZXIjW6o1Ff1kaa8eRFEHZZTaT2qQLdI5bwNtbWXjZ
00:43:45.999+0000 DEBUG [wws.dispatcher-223] [c.c.w.s.BufferingXmlWriter] - BufferingXmlWriter@734af638 writeAttrValue: aN8sHgwZXIjW6o1Ff1kaa8eRFEHZZTaT2qQLdI5bwNtbWXjZTFkjHazVj
00:43:45.999+0000 DEBUG [wws.dispatcher-223] [c.c.w.s.BufferingXmlWriter] - BufferingXmlWriter@734af638 value buf [C@321c270 offset=0 len=57 content=aN8sHgwZXIjW6o1Ff1kaa8eRFEHZZTaT2qQLdI5bwNtbWXjZTFkjHazVjr,VPRept)��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
00:43:45.999+0000 DEBUG [wws.dispatcher-231] [c.c.w.s.BufferingXmlWriter] - BufferingXmlWriter@3d8d0790 value buf [C@321c270 offset=0 len=48 content=aN8sHgwZXIjW6o1Ff1kaa8eRFEHZZTaT2qQLdI5bwNtbWXjZTFkjHazVjr,VPRept)��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

dispatcher-231 and dispatcher-223 are two different threads. The test case generates random character data for the attribute values, so they ought to be completely different between the two threads. (You can see that the original lengths of the two random values differ: 57 vs 48.) But by the time the char buffer is logged inside writeAttrValue(), the last write has won and both threads are dealing with the same buffer, handed out by the same BufferRecycler instance (@4b4234c1).
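
That log is consistent with an allocation path that returns the cached array without removing it from the recycler, so two writers sharing one thread-local recycler can both be handed the same char[]. A simplified illustration of that suspected failure mode, not the actual Woodstox code:

    // Simplified illustration of the suspected failure mode; not the actual Woodstox code.
    final class NonClearingRecycler {
        private char[] mediumCBuffer;

        char[] allocMediumCBuffer(int minSize) {
            char[] buf = mediumCBuffer;
            if (buf == null || buf.length < minSize) {
                buf = new char[minSize];
            }
            // The cached reference is returned but not cleared, so a second writer
            // allocating from this same recycler can receive the very same array.
            return buf;
        }

        void releaseMediumCBuffer(char[] buf) {
            mediumCBuffer = buf;
        }
    }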

@cowtowncoder (Member)

Thank you for the additional information.

@jphelp32 (Author)

Do you know what sort of timeline you'd be looking at for publishing fixes to both Woodstox and Aalto? I'm currently looking into how to incorporate a temporary fork with the ThreadLocals disabled into my project, but if you expect a quick and easy fix to be available soon, I'd probably hold off on that.
Thanks

@cowtowncoder (Member)

This is a bit difficult to say, unfortunately, as I am busy trying to finish up Jackson 2.10.
If I find the time to actually dig into this, it could be quick, with not much delay for the release(s).
It's the fixing part that is unpredictable.

@cowtowncoder (Member)

Ahhhhh. One simple, obvious thing that I must have done for Jackson a while ago was to actually clear the buffer references. That alone would probably remove the obvious failures.
But I am a bit wary that if it leaves a rare (but not "never") failure case, it'd be even gnarlier.

I think I'll dig into this a bit more, and that should result in a fix this week -- it's more a question of how to make a small enough refactoring. This is internal functionality, so API changes should not be problematic.
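
A sketch of what clearing the reference on allocation could look like; using an atomic get-and-clear also covers the case where the same recycler is reached from more than one thread. This is simplified and illustrative, not the exact change that went into Woodstox:

    import java.util.concurrent.atomic.AtomicReference;

    // Sketch of a clear-on-allocate recycler (illustrative; not the exact Woodstox fix).
    final class ClearingRecycler {
        private final AtomicReference<char[]> mediumCBuffer = new AtomicReference<>();

        char[] allocMediumCBuffer(int minSize) {
            // Atomically take the cached array out of the recycler, so no two callers
            // can ever be handed the same instance.
            char[] buf = mediumCBuffer.getAndSet(null);
            if (buf == null || buf.length < minSize) {
                buf = new char[minSize];
            }
            return buf;
        }

        void releaseMediumCBuffer(char[] buf) {
            mediumCBuffer.set(buf);
        }
    }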

@jphelp32 (Author)

I'll give that a test run and let you know how it goes.

@jphelp32 (Author)

Several test runs produced no errors. Looks good. Thanks!

@cowtowncoder cowtowncoder added this to the 6.0.0.pr1 milestone Sep 27, 2019
@cowtowncoder (Member)

This was included in the 6.0.1 release.

@cowtowncoder cowtowncoder modified the milestones: 6.0.0.pr1, 6.0.1 Sep 27, 2019