#609
Find Duplicate File in System
MediumArrayHash TableStringHash MapArray
Approaches
Brute ForceOptimal
Complexity Comparison
| Brute Force | Optimal Solution★ | |
|---|---|---|
| Time | O(n²) | O(n) |
| Space | O(1) | O(n) |
💡
Intuition
Time O(n)Space O(n)
The optimal approach uses a hash map to group files by their content. This drastically reduces the number of comparisons needed, allowing us to efficiently find duplicates.
⚙️
Algorithm
4 steps- 1Step 1: Initialize a hash map to store file contents as keys and their paths as values.
- 2Step 2: For each directory, extract the files and their contents.
- 3Step 3: For each file, add its path to the hash map under its content key.
- 4Step 4: Collect all paths from the hash map that have more than one entry (duplicates).
solution.py12 lines
1# Full working Python code
2from collections import defaultdict
3
4def findDuplicate(paths):
5 content_map = defaultdict(list)
6 for path in paths:
7 parts = path.split(' ')
8 dir_path = parts[0]
9 for file in parts[1:]:
10 name, content = file.split('(')
11 content_map[content].append(f'{dir_path}/{name}')
12 return [group for group in content_map.values() if len(group) > 1]ℹ
Complexity note: The time complexity is O(n) because we process each file exactly once. The space complexity is O(n) due to the storage of file paths in the hash map.
- 1Using a hash map allows us to efficiently group files by content.
- 2Identifying duplicates requires careful extraction of file names and contents.
Solutions and explanations are original Tejav content. Problem titles © LeetCode — use the LeetCode button above for the full problem statement.