Skip to content
This repository has been archived by the owner on Jan 10, 2023. It is now read-only.

Workaroud for UnicodeDecodeError #171

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

matthieuxyz
Copy link

Why:

Fix a random UnicodeDecodeError, when using AdbCommands.Logcat.

Steps to reproduce

  • Add multi-bytes characters to log file
  • Try to use AdbCommands.Logcat

Expected result

Output result as unicode string (or str in python3)

Actual result

UnicodeDecodeError is sometime raised with the following message:

ERROR: 'utf-8' codec can't decode byte 0xe3 in position 4095: unexpected end of data

The exception is raised from the method AdbMessage.StreamingCommand() from the module adb_protocol.py:

    @classmethod
    def StreamingCommand(cls, usb, service, command='', timeout_ms=None):
        """
        ...
        """
        if not isinstance(command, bytes):
            command = command.encode('utf8')
        connection = cls.Open(
            usb, destination=b'%s:%s' % (service, command),
            timeout_ms=timeout_ms)
        for data in connection.ReadUntilClose():
            yield data.decode('utf-8')  # <---- HERE

Investigation

When streaming output of command, a max length is provided to UsbHandle.BulkRead. This length is (if I understood correctly) given in bytes.

So the end of a bulk of bytes data can append at any given bytes.

But in utf-8, characters may be encoded on multiples bytes (ex: あ (U+3042) is encoded as 0xE3 0x81 0x82 in utf-8).

Since, the bulk may end at any given bytes, it is possible for a multi-byte sequence to be split between two bulks (ex: [b'...\xe3', b'\x81\x82...'] ).

But, decoding will fail if a multi-byte sequence is incomplete (try to run b'AAA\xe3'.decode('utf-8') in a python console).

What:

This pull request contains...

  • A test case with a @skip decorator to illustrate the problem.

  • New "Bytes" methods that implements a "workaround" (ie. using bytes and decode only when stream is complete)

  • New tests to cover the new "Bytes" methods

There is probably a better solution, but this is what worked for me. And I hope that it will help to illustrate the problem with StreamingCommand.

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@coveralls
Copy link

coveralls commented Nov 4, 2019

Coverage Status

Coverage increased (+1.2%) to 44.888% when pulling 5f45f7c on matthieuxyz:master into f4e597f on google:master.

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@JeffLIrion
Copy link
Contributor

@matthieuxyz thanks for the detailed explanation! I implemented this fix for the adb-shell package, although I didn't add a specific test for this scenario. Could you please take a look and let me know if the change looks correct?

JeffLIrion/adb_shell#34

@matthieuxyz
Copy link
Author

Seems good to me. I made a pull request with additional tests to make sure:

JeffLIrion/adb_shell#35

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants