This project validates UTF-8 encoding in Python using bitwise operations and the UTF-8 encoding rules. The goal is to determine whether a given data set, a list of integers that each stand for one byte, represents valid UTF-8 encoded data.
- 💡 Bitwise Operations: Use bitwise manipulation for handling data at the byte level.
- 📜 UTF-8 Encoding Scheme: Understand the structure of UTF-8 encoding for 1-4 byte characters.
- 🗂️ Data Representation: Efficiently represent data at the byte level.
- 🔄 Boolean Logic: Apply logical reasoning to validate encoding patterns.
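
To make the 1-4 byte structure concrete, the short sketch below encodes a few sample characters with Python's built-in `str.encode` and shows each byte in binary. The helper name `utf8_bits` and the sample characters are illustrative assumptions, not part of the project files.

```python
#!/usr/bin/python3
"""Show the bit patterns of 1- to 4-byte UTF-8 sequences (illustration)."""


def utf8_bits(char):
    """Return the UTF-8 bytes of `char` as 8-bit binary strings."""
    return [format(byte, "08b") for byte in char.encode("utf-8")]


if __name__ == "__main__":
    # Sample characters that need 1, 2, 3, and 4 bytes respectively.
    for char in ["A", "é", "€", "😀"]:
        print(char, utf8_bits(char))
```

The first byte of each sequence starts with `0`, `110`, `1110`, or `11110`, and every following byte starts with `10`. Those prefixes are what a validator has to check.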
- Python files are interpreted on Ubuntu 20.04 LTS using `python3` (version 3.4.3).
- Each file should end with a new line and start with `#!/usr/bin/python3`.
- Code must follow the PEP 8 style guide.
- Ensure all files are executable.
- Bitwise Operations:
  - Understanding AND (`&`), OR (`|`), XOR (`^`), NOT (`~`), and shifts (`<<`, `>>`).
- UTF-8 Encoding Scheme:
  - Understanding how characters are represented in 1-4 bytes.
  - Recognizing valid UTF-8 patterns.
- Data Representation:
  - Handling the least significant bits (LSB) of integers to simulate byte-level data.
- List Manipulation in Python:
  - Iterating, accessing elements, and using list comprehensions.
- Boolean Logic:
  - Applying logical conditions to validate data.
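
As a rough illustration of how these ideas combine, the sketch below masks an integer down to its 8 least significant bits and then counts the leading 1 bits with a shifting mask. The helper name `leading_ones` is hypothetical and is not one of the project files.

```python
#!/usr/bin/python3
"""Count the leading 1 bits of a byte (illustrative helper)."""


def leading_ones(value):
    """Count the leading 1 bits in the 8 least significant bits of `value`."""
    byte = value & 0xFF  # keep only the 8 LSB, so 256 becomes 0
    mask = 0b10000000    # start at the most significant bit of the byte
    count = 0
    while mask & byte:
        count += 1
        mask >>= 1       # move on to the next lower bit
    return count
```

Under this scheme a count of 0 marks a 1-byte character, 1 marks a continuation byte, and 2, 3, or 4 mark the first byte of a 2-, 3-, or 4-byte character. For example, `leading_ones(229)` gives 3 because 229 is `11100101` in binary.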
- Objective: Write a method to check if a data set represents a valid UTF-8 encoding.
- Prototype: `def validUTF8(data: List[int]) -> bool`
- Return: `True` if `data` is valid UTF-8; otherwise, `False`.
data = [65]
print(validUTF8(data)) # True
data = [229, 65, 127, 256]
print(validUTF8(data)) # False
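
One possible way to implement the prototype is sketched below. It is not necessarily the code in `0-validate_utf8.py`. It assumes the `typing` module is available for the `List[int]` annotation, and it checks only the byte-prefix structure, not overlong encodings or out-of-range code points.

```python
#!/usr/bin/python3
"""A possible UTF-8 validator (sketch, not the reference solution)."""
from typing import List


def validUTF8(data: List[int]) -> bool:
    """Return True if `data` follows the UTF-8 byte-prefix structure."""
    remaining = 0  # continuation bytes still expected

    for value in data:
        byte = value & 0xFF  # only the 8 least significant bits count

        if remaining:
            # Continuation bytes must look like 10xxxxxx.
            if byte >> 6 != 0b10:
                return False
            remaining -= 1
        elif byte >> 7 == 0b0:      # 0xxxxxxx: 1-byte character
            continue
        elif byte >> 5 == 0b110:    # 110xxxxx: 2-byte character
            remaining = 1
        elif byte >> 4 == 0b1110:   # 1110xxxx: 3-byte character
            remaining = 2
        elif byte >> 3 == 0b11110:  # 11110xxx: 4-byte character
            remaining = 3
        else:
            return False            # stray continuation or invalid lead

    # A truncated multi-byte character leaves `remaining` above zero.
    return remaining == 0
```

In the second example above, 229 (`11100101`) announces a 3-byte character, but the next value, 65 (`01000001`), is not a continuation byte, so the function returns `False`.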
- `0-validate_utf8.py`: Contains the UTF-8 validation function.
- `0-main.py`: Test file to validate the implementation.
- Clone the repository:
git clone https://github.com/Alogyn/alx-interview
cd alx-interview/0x04-utf8_validation
- Run the test file:
./0-main.py
Mohamed Derfoufi 📧 [email protected] | 🌐 LinkedIn