Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to interrupt hanging thread? #46

Open
KnIfER opened this issue Jan 3, 2020 · 7 comments
Open

how to interrupt hanging thread? #46

KnIfER opened this issue Jan 3, 2020 · 7 comments

Comments

@KnIfER
Copy link

KnIfER commented Jan 3, 2020

call thread.interrupt() and nothing happened. so how to stop the hanging thread?

Charset _charset = Charset.forName("GB18030");
/* text containing irregular binary data will make thread hang */
Thread thread = new Thread(new Runnable() {
	@Override
	public void run() {
		try {
			String key = "a"; // any character
			byte[] pattern = key.getBytes(_charset);

			Regex regex = new Regex(pattern, 0, pattern.length, Option.IGNORECASE, GB18030Encoding.INSTANCE);

			byte[] source = new byte[]{0x2f, 0x2f, (byte) 0xaf}; // text content.
			/* Encoded by GB18030, It reads "//�" where � means that "0xaf" is wrong or unsupported? */
			System.out.println(new String(source, _charset));

			Matcher matcher = regex.matcher(source);
			// search Interruptible ?
			int idx=matcher.searchInterruptible(0, source.length, Option.DEFAULT);
			System.out.println(idx+"");
		} catch (InterruptedException e) {
			System.out.println("InterruptedException");
			e.printStackTrace();
		}
	}
});

thread.start();

new Thread(new Runnable() {
	@Override
	public void run() {
		try {
			Thread.sleep(500);
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
		System.out.println("interrupt !!! ");
		thread.interrupt();  // called but not working. 
	}
}).start();

v2.1.30

@headius
Copy link
Member

headius commented Feb 10, 2020

Hmmm I would have expected that to work. Trying locally.

@headius
Copy link
Member

headius commented Feb 10, 2020

Ok, so a bit of background here. The original intent of the interruptibility was to ensure long-running regular expressions could be stopped. By long running here we mean things like regular expressions with complex alternations... not regular expressions that get stuck because of bad character encoding.

This case, and likely many others, are not interruptible because they get stuck in a tight inner loop trying endlessly to advance a byte offset for a broken multibyte characters. In this case, the code get stuck here:

while (s < end) {
if (lowerCaseMatch(target, targetP, targetEnd, text, s, textEnd, enc, buf, regex.caseFoldFlag)) return s;
s += enc.length(text, s, textEnd);
}

Adding an interrupt check here would add a lot of overhead to this search logic, so I'm reluctant to do that. We have discussed making joni more robust in the presence of bad character data, however, since there's several places like this that can get stuck.

So unfortunately right now there's no workaround to make interrupting work for your case. It should really be an error. We'll discuss a bit and see if we can figure out a good path forward.

@KnIfER
Copy link
Author

KnIfER commented Feb 11, 2020

what about this ?

    int max = enc.maxLength();
    if(max==1) {
        while (s < end) {
            if (lowerCaseMatch(target, targetP, targetEnd, text, s, textEnd, enc, buf, regex.caseFoldFlag)) return s;
            s += enc.length(text, s, textEnd);
        }
    } else {
        int count = s;
        while (s < textRange) {
            if (lowerCaseMatch(target, targetP, targetEnd, text, s, textEnd, enc, buf, regex.caseFoldFlag)) return s;
            int val = enc.length(text, s, textEnd);
            if (val > 0) {
                s += val;
                count += val;
            } else {
                if (max + s + val < count) {
                    s += 1;
                    count = s;
                } else { //I believe it shouldn't lag too much behind ?
                    s += val;
                    count += 1;
                }
            }
        }
    }

besides SLOW_IC_FORWARD, I've also modified MAP_FORWARD,they are similiar.
I wonder is it bad though.

@headius
Copy link
Member

headius commented Feb 11, 2020

This looks like a decent change! We were discussing on the JRuby matrix how to fix this, and I believe a similar fix was suggested. @lopex What do you think?

@lopex
Copy link
Contributor

lopex commented Feb 11, 2020

I'm reluctant to such changes since they will introduce inconsistencies and diverge us from onigmo even more. The problem is explained more thoroughly in jruby/jcodings#26

@lopex
Copy link
Contributor

lopex commented Feb 11, 2020

Also, jruby/jcodings#26 shows that those infinite loops will exist in other places and it would not be ideal to randomly patch susceptible callsites.

@lopex
Copy link
Contributor

lopex commented Feb 11, 2020

For this very case, a quick fix would be to add an unsafe GB18030 version in the same fashion https://github.com/jruby/jcodings/tree/unsafe-encoding aims to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants