#393
UTF-8 Validation
MediumArrayBit ManipulationBit ManipulationArray
Approaches
Brute ForceOptimal
Complexity Comparison
| Brute Force | Optimal Solution★ | |
|---|---|---|
| Time | O(n²) | O(n) |
| Space | O(1) | O(1) |
💡
Intuition
Time O(n)Space O(1)
The optimal solution processes each byte in a single pass, checking the leading bits to determine the number of bytes in the current character and validating the continuation bytes accordingly. This reduces the time complexity significantly.
⚙️
Algorithm
5 steps- 1Step 1: Initialize an index to track the current byte being processed.
- 2Step 2: Loop through each byte in the data array.
- 3Step 3: Determine the number of bytes for the current character based on the leading bits.
- 4Step 4: Validate the required continuation bytes based on the number of bytes determined.
- 5Step 5: If any validation fails, return false. If all bytes are processed successfully, return true.
solution.py27 lines
1# Full working Python code
2
3def validUtf8(data):
4 i = 0
5 while i < len(data):
6 byte = data[i]
7 if (byte & 0x80) == 0:
8 # 1-byte character
9 i += 1
10 elif (byte & 0xE0) == 0xC0:
11 # 2-byte character
12 if i + 1 >= len(data) or (data[i + 1] & 0xC0) != 0x80:
13 return False
14 i += 2
15 elif (byte & 0xF0) == 0xE0:
16 # 3-byte character
17 if i + 2 >= len(data) or (data[i + 1] & 0xC0) != 0x80 or (data[i + 2] & 0xC0) != 0x80:
18 return False
19 i += 3
20 elif (byte & 0xF8) == 0xF0:
21 # 4-byte character
22 if i + 3 >= len(data) or (data[i + 1] & 0xC0) != 0x80 or (data[i + 2] & 0xC0) != 0x80 or (data[i + 3] & 0xC0) != 0x80:
23 return False
24 i += 4
25 else:
26 return False
27 return Trueℹ
Complexity note: The complexity is O(n) because we only traverse the array once, checking each byte and its continuation bytes in a single pass.
- 1Understanding the bit manipulation is crucial for identifying the number of bytes in UTF-8 encoding.
- 2The leading bits of each byte dictate how many continuation bytes are expected.
Solutions and explanations are original Tejav content. Problem titles © LeetCode — use the LeetCode button above for the full problem statement.