#609

Find Duplicate File in System

Medium
ArrayHash TableStringHash MapArray
LeetCode ↗

Approaches

Brute ForceOptimal
Complexity Comparison
Brute ForceOptimal Solution
Time
O(n²)
O(n)
Space
O(1)
O(n)
💡

Intuition

Time O(n)Space O(n)

The optimal approach uses a hash map to group files by their content. This drastically reduces the number of comparisons needed, allowing us to efficiently find duplicates.

⚙️

Algorithm

4 steps
  1. 1Step 1: Initialize a hash map to store file contents as keys and their paths as values.
  2. 2Step 2: For each directory, extract the files and their contents.
  3. 3Step 3: For each file, add its path to the hash map under its content key.
  4. 4Step 4: Collect all paths from the hash map that have more than one entry (duplicates).
solution.py12 lines
1# Full working Python code
2from collections import defaultdict
3
4def findDuplicate(paths):
5    content_map = defaultdict(list)
6    for path in paths:
7        parts = path.split(' ')
8        dir_path = parts[0]
9        for file in parts[1:]:
10            name, content = file.split('(')
11            content_map[content].append(f'{dir_path}/{name}')
12    return [group for group in content_map.values() if len(group) > 1]

Complexity note: The time complexity is O(n) because we process each file exactly once. The space complexity is O(n) due to the storage of file paths in the hash map.

  • 1Using a hash map allows us to efficiently group files by content.
  • 2Identifying duplicates requires careful extraction of file names and contents.

Solutions and explanations are original Tejav content. Problem titles © LeetCode — use the LeetCode button above for the full problem statement.