invalid continuation byte during reduction #167

Open
maroneze opened this issue Feb 26, 2025 · 4 comments

@maroneze

While trying to reduce a C file, I'm periodically getting some crashes, such as:

00:12:58 INFO ===< LinesPass::1 >===
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 166: invalid continuation byte
Traceback (most recent call last):
  File "/usr/bin/cvise", line 313, in <module>
    reducer.reduce(pass_group, skip_initial=args.skip_initial_passes)
  File "/usr/share/cvise/cvise.py", line 149, in reduce
    self._run_additional_passes(pass_group['first'])
  File "/usr/share/cvise/cvise.py", line 172, in _run_additional_passes
    self.test_manager.run_pass(p)
  File "/usr/share/cvise/utils/testing.py", line 529, in run_pass
    success_env = self.run_parallel_tests()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/cvise/utils/testing.py", line 445, in run_parallel_tests
    quit_loop = self.process_done_futures()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/cvise/utils/testing.py", line 397, in process_done_futures
    assert test_env.exitcode
           ^^^^^^^^^^^^^^^^^
00:49:02 INFO ===< LinesPass::1 >===
Unexpected TestEnvironment::run failure: 'utf-8' codec can't decode byte 0xe2 in position 164: invalid continuation byte
Traceback (most recent call last):
  File "/usr/share/cvise/utils/testing.py", line 107, in run
    self.exitcode = self.run_test(False)
                    ^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/cvise/utils/testing.py", line 120, in run_test
    stdout, stderr, returncode = ProcessEventNotifier(self.pid_queue).run_process(self.test_script, shell=True)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/cvise/passes/abstract.py", line 132, in run_process
    stdout, stderr = proc.communicate()
                     ^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 1209, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 2153, in _communicate
    stdout = self._translate_newlines(stdout,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 1086, in _translate_newlines
    data = data.decode(encoding, errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 164: invalid continuation byte
Traceback (most recent call last):
  File "/usr/bin/cvise", line 313, in <module>
    reducer.reduce(pass_group, skip_initial=args.skip_initial_passes)
  File "/usr/share/cvise/cvise.py", line 149, in reduce
    self._run_additional_passes(pass_group['first'])
  File "/usr/share/cvise/cvise.py", line 172, in _run_additional_passes
    self.test_manager.run_pass(p)
  File "/usr/share/cvise/utils/testing.py", line 529, in run_pass
    success_env = self.run_parallel_tests()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/cvise/utils/testing.py", line 445, in run_parallel_tests
    quit_loop = self.process_done_futures()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/cvise/utils/testing.py", line 397, in process_done_futures
    assert test_env.exitcode
           ^^^^^^^^^^^^^^^^^
AssertionError
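
The inner traceback shows the failure at the point where `subprocess` decodes the captured output of the test script. The following is a minimal sketch of that behavior in isolation, not cvise's actual code: the child command `CHILD` and the helper `run_and_capture` are hypothetical names made up for this illustration. It suggests that strict UTF-8 decoding raises on a lone `0xe2` byte, while the standard `errors='replace'` handler substitutes U+FFFD instead.

```python
import subprocess
import sys

# Hypothetical child process: writes a 0xe2 lead byte followed by an ASCII
# space, which is not a valid UTF-8 continuation byte.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'\\xe2 char *s;\\n')",
]


def run_and_capture(errors):
    """Run CHILD and decode its stdout as UTF-8 with the given error handler."""
    with subprocess.Popen(
        CHILD,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",
        errors=errors,
    ) as proc:
        stdout, _stderr = proc.communicate()
    return stdout, proc.returncode


try:
    run_and_capture("strict")
    strict_error = None
except UnicodeDecodeError as exc:
    # Same exception class as in the log above.
    strict_error = exc

# With a lenient handler the invalid byte becomes U+FFFD and decoding succeeds.
lenient_stdout, lenient_code = run_and_capture("replace")
```

Whether a lenient error handler is the right fix for cvise is a separate design question; the sketch only shows where the exception comes from.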

I'm using the Fedora 40 package (cvise 2.9.0 (cc76e98d)). Would it be useful to update to the latest release from GitHub and retry, to see whether the crashes still occur?

They're fairly infrequent (about once every 30 minutes), so I'm not sure how to reproduce them quickly. But if the issue is not already known or fixed, I'll try to gather more data about it.

@intrigus-lgtm

Did cvise generate a folder containing information related to the bug?
That is, a cvise_* folder in the directory from which you started cvise?

@maroneze
Author

No. I did see such directories when I left cvise running and put the laptop to sleep (this caused timeouts when the laptop woke up, and those did create cvise_extra_xxx directories). But whenever this Unicode issue arises, the Python process quits and there are no cvise_* directories.

I can still make progress, though. As the reduction advances, each run takes less time, so the crashes now occur more frequently. Is there a way to fix the random seed to make the process completely deterministic?

I just compiled the master branch of cvise and tried using it instead (cvise --version returns cvise 2.11.0 (eac3b8c)) and the issue still happens occasionally.

Both the tool used by my reduction script and the code being analyzed are open source, but installing them is non-trivial. I can try to produce a Dockerfile so that others can reproduce the issue, or I can add some debugging flags and see if I can provide more details directly.

By the way, here's the message I had with the latest crash, which is slightly different from the 2.9 version:

00:03:13 INFO (3.9%, 285923 bytes, 509 lines)
Unexpected TestEnvironment::run failure: 'utf-8' codec can't decode byte 0xe2 in position 186: invalid continuation byte
Traceback (most recent call last):
  File "/home/user/bin/share/cvise/utils/testing.py", line 113, in run
    self.exitcode = self.run_test(False)
                    ^^^^^^^^^^^^^^^^^^^^
  File "/home/user/bin/share/cvise/utils/testing.py", line 126, in run_test
    stdout, stderr, returncode = ProcessEventNotifier(self.pid_queue).run_process(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/bin/share/cvise/passes/abstract.py", line 139, in run_process
    stdout, stderr = proc.communicate()
                     ^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 1209, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 2153, in _communicate
    stdout = self._translate_newlines(stdout,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/subprocess.py", line 1086, in _translate_newlines
    data = data.decode(encoding, errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 186: invalid continuation byte
Traceback (most recent call last):
  File "/home/user/bin/bin/cvise", line 439, in <module>
    reducer.reduce(pass_group, skip_initial=args.skip_initial_passes)
  File "/home/user/bin/share/cvise/cvise.py", line 163, in reduce
    self._run_additional_passes(pass_group['first'])
  File "/home/user/bin/share/cvise/cvise.py", line 186, in _run_additional_passes
    self.test_manager.run_pass(p)
  File "/home/user/bin/share/cvise/utils/testing.py", line 587, in run_pass
    success_env = self.run_parallel_tests()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/bin/share/cvise/utils/testing.py", line 497, in run_parallel_tests
    quit_loop = self.process_done_futures()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/bin/share/cvise/utils/testing.py", line 419, in process_done_futures
    outcome = self.check_pass_result(test_env)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/bin/share/cvise/utils/testing.py", line 463, in check_pass_result
    assert test_env.exitcode
           ^^^^^^^^^^^^^^^^^
AssertionError

@maroneze
Author

By the way, I just noticed that the very likely cause of this is the presence of non-ASCII characters in C comments, namely ℤ, ∀, ≡, ∧, ⇒, etc.

ℤ in particular is encoded in UTF-8 as the bytes 0xe2 0x84 0xa4. So I think that cvise is splitting the comments in a way that breaks these characters.

Here's an example of a comment present in the code I'm reducing:

/*@ axiomatic MemCmp {
logic ℤ memcmp{L1, L2}
(char *s1, char *s2, ℤ n)      reads \at(*(s1 + (0 .. n - 1)),L1), \at(*(s2 + (0 .. n - 1)),L2);
}
*/
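
The byte sequence quoted above can be checked directly in Python. The sketch below only illustrates the suspected mechanism, under the assumption that a reduction step removes bytes from the middle of the character; it does not reproduce what cvise's passes actually do. The sample line and the variable names are chosen for the example.

```python
# UTF-8 encoding of U+2124 (ℤ) is a three-byte sequence starting with 0xe2.
encoded = "ℤ".encode("utf-8")
assert encoded == b"\xe2\x84\xa4"

line = "logic ℤ memcmp{L1, L2}".encode("utf-8")
lead = line.index(b"\xe2")  # offset of the lead byte of ℤ

# Assumed failure mode: the two continuation bytes are removed, leaving the
# lead byte 0xe2 directly followed by an ASCII space.
damaged = line[: lead + 1] + line[lead + 3:]

try:
    damaged.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The resulting message has the same shape as the one in the logs (byte 0xe2, "invalid continuation byte"), which is consistent with a lead byte being separated from its continuation bytes. If the data were instead cut off right after 0xe2, at the very end of the input, Python would report "unexpected end of data".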

@maroneze
Author

I managed to get a similar (not entirely identical, but possibly related) issue with the following command:

cvise --skip-initial-passes --start-with-pass ClexPass::rm-toks-1 ./grep.sh file.c

And the following files:

In file.c:

∀ char *s;

In grep.sh:

#!/bin/bash

grep "∀ char \*s;" file.c

The main difference is that, while I do get Python error messages related to UnicodeDecodeError, cvise keeps running, whereas in my original case the process stopped due to an AssertionError.
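
For convenience, the two files above can be generated with a short script. This is a sketch under a few assumptions: the helper name `write_repro` and the temporary directory prefix are made up, and the script only creates the files and prints the cvise command from this comment rather than running it.

```python
import stat
import tempfile
from pathlib import Path


def write_repro(directory):
    """Create file.c and grep.sh as described above, encoded as UTF-8."""
    c_file = directory / "file.c"
    script = directory / "grep.sh"
    # write_bytes avoids any platform-specific newline translation.
    c_file.write_bytes("∀ char *s;\n".encode("utf-8"))
    script.write_bytes(
        '#!/bin/bash\n\ngrep "∀ char \\*s;" file.c\n'.encode("utf-8")
    )
    # The interestingness test must be executable.
    script.chmod(script.stat().st_mode | stat.S_IXUSR)
    return c_file, script


workdir = Path(tempfile.mkdtemp(prefix="cvise-repro-"))
c_file, script = write_repro(workdir)
print(
    f"cd {workdir} && cvise --skip-initial-passes "
    "--start-with-pass ClexPass::rm-toks-1 ./grep.sh file.c"
)
```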
