Utf8 decode error #100

Open
childenProtos opened this issue Mar 15, 2023 · 2 comments

Comments

@childenProtos

I am comparing two ARM GCC ELF files. If I do not specify bin_dir, the report is generated successfully, but I get the "Unable to read assembly from binary" warning.

If I specify the correct bin_dir and bin_prefix, the warning disappears, and instead I get the following output (with a UTF-8 decode error):

py -m elf_diff --bin_dir tools\arm-gcc\bin --bin_prefix "arm-none-eabi-" --html_dir report2 [OLD].elf [NEW].elf
Tools:
   objdump: tools\arm-gcc\bin\arm-none-eabi-objdump.exe
   nm:      tools\arm-gcc\bin\arm-none-eabi-nm.exe
   readelf:      tools\arm-gcc\bin\arm-none-eabi-readelf.exe
   size:    tools\arm-gcc\bin\arm-none-eabi-size.exe
Verifying config keys...
Symbol selection regex:
   old binary: 'None'
   new binary: 'None'
Symbol exclusion regex:
   old binary: 'None'
   new binary: 'None'
Parsing symbols of old binary ([OLD].elf)
File format of binary [OLD].elf: elf32-littlearm
Extracting symbols
100% (5577 of 5577) |#####################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
Gathering instructions
100% (223307 of 223307) |#################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
Parsing symbols of new binary ([NEW].elf)
File format of binary [NEW].elf: elf32-littlearm
Extracting symbols
100% (5564 of 5564) |#####################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
Gathering instructions
================================================================================

Traceback (most recent call last):
  File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\__main__.py", line 124, in main
    exportDocument(settings)
  File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\__main__.py", line 66, in exportDocument
    document: ValueTreeNode = generateDocument(settings)
  File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\pair_report_document.py", line 1167, in generateDocument
    meta_document.configureValueTree(value_tree, settings=settings)
  File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\pair_report_document.py", line 976, in configureValueTree
    self.binary_pair = BinaryPair(
  File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\binary_pair.py", line 103, in __init__
    self.new_binary = Binary(
  File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\binary.py", line 78, in __init__
    self._initSymbols()
  File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\binary.py", line 122, in _initSymbols
    self._gatherSymbolInstructions()
  File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\binary.py", line 108, in _gatherSymbolInstructions
    instruction_collector.gatherSymbolInstructions(
  File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\instruction_collector.py", line 136, in gatherSymbolInstructions
    objdump_output: str = runSystemCommand(
  File "C:\[...]\Python\Python310\lib\site-packages\elf_diff\system_command.py", line 33, in runSystemCommand
    output: str = o.decode("utf8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 12242097: invalid start byte

================================================================================
 elf_diff is unconsolable :-( Something went wrong
================================================================================

 Error:  'utf-8' codec can't decode byte 0xfc in position 12242097: invalid start byte

================================================================================
 Don't let this take you down! Have a nice hot coffee and start over.
================================================================================

Is there any way I can debug the source of the error and find out what is producing the invalid UTF-8 output?

@noseglasses
Owner

Sorry for the late answer. I am currently too busy to work on this project.
You might want to try replacing the decode call on line 33 of system_command.py with output: str = o.decode("utf8", errors="ignore"). I am not sure, though, which character causes the decoding to fail.
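
For anyone weighing that workaround, here is a minimal sketch of what the different `errors` handlers of `bytes.decode` do with a byte like the 0xfc from the traceback. The sample bytes are made up for illustration and are not taken from actual objdump output.

```python
# Made-up sample: 0xfc is not a valid UTF-8 start byte (in CP1252 it is "ü").
raw = b"// M\xfcller\n"

ignored = raw.decode("utf8", errors="ignore")    # drops the bad byte silently
replaced = raw.decode("utf8", errors="replace")  # marks it with U+FFFD
as_cp1252 = raw.decode("cp1252")                 # only correct if the bytes really are CP1252

print(repr(ignored))
print(repr(replaced))
print(repr(as_cp1252))
```

`errors="ignore"` keeps the run going but loses characters without a trace, while `errors="replace"` at least leaves a visible marker in the report.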

@me21

me21 commented Apr 4, 2024

Here's my two cents on this issue:

I replaced that call with

    try:
        output: str = o.decode("utf8")
    except UnicodeDecodeError:
        # Dump the raw subprocess output so the offending bytes can be inspected.
        with open("subprocess_output.txt", "wb") as f:
            f.write(o)
        raise

and got a text file containing the problematic output. In my case it was a section sign (0xA7) in a line containing source code. It appears my sources are encoded as CP1252 rather than UTF-8. After replacing the codec in the decode call, elf_diff ran smoothly.
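
If dumping the whole output is too unwieldy, the exception itself carries the offset of the bad byte. Below is a rough sketch that uses the `start` and `end` attributes of `UnicodeDecodeError` to pull out just the surrounding bytes. The helper name `find_bad_byte` and the sample input are hypothetical and not part of elf_diff.

```python
def find_bad_byte(raw: bytes, encoding: str = "utf8", context: int = 20):
    """Return (offset, byte value, surrounding bytes) for the first
    undecodable byte in raw, or None if raw decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        lo = max(0, err.start - context)
        hi = min(len(raw), err.end + context)
        return err.start, raw[err.start], raw[lo:hi]
    return None


# Made-up sample standing in for a source line that objdump echoed back.
sample = b"; see \xa7 4.2\n"
hit = find_bad_byte(sample)
if hit is not None:
    offset, value, snippet = hit
    print(f"offset {offset}: byte 0x{value:02x} in {snippet!r}")
```

The surrounding bytes are usually enough to recognize which source line the byte came from.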

It would be nice to add a source file encoding option to the elf_diff command. The encoding may also differ between the first and the second ELF file.
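
To illustrate the idea, here is a sketch of a decode helper that tries a list of encodings in order. It is only an assumption about how such an option could be wired in: `decode_tool_output` and its `encodings` parameter are hypothetical names, and elf_diff does not provide them.

```python
from typing import Sequence


def decode_tool_output(raw: bytes, encodings: Sequence[str] = ("utf8", "cp1252")) -> str:
    """Decode raw with the first encoding that accepts it.

    If none of them does, fall back to the first encoding with
    errors="replace" so the caller still gets a string.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode(encodings[0], errors="replace")


# Hypothetical per-binary use: each ELF file could get its own encoding list.
old_text = decode_tool_output(b"\xa7 4.2", encodings=("utf8", "cp1252"))
new_text = decode_tool_output("\u00a7 4.2".encode("utf8"), encodings=("utf8",))
print(old_text, new_text)
```

Trying UTF-8 first is deliberate: valid UTF-8 rarely occurs by accident, whereas CP1252 accepts almost any byte sequence and would happily mis-decode UTF-8 input.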
