#196
Delete Duplicate Emails
EasyDatabaseHash MapArray
Approaches
Brute ForceOptimal
Complexity Comparison
| Brute Force | Optimal Solution★ | |
|---|---|---|
| Time | O(n²) | O(n) |
| Space | O(1) | O(n) |
💡
Intuition
Time O(n)Space O(n)
The optimal solution uses a HashMap to track the smallest id for each email. This allows us to efficiently identify duplicates and keep the one with the smallest id.
⚙️
Algorithm
4 steps- 1Step 1: Create a HashMap to store the smallest id for each email.
- 2Step 2: Iterate through the Person table and populate the HashMap with the email as the key and the smallest id as the value.
- 3Step 3: Create a list of ids to delete, which are those not in the HashMap's values.
- 4Step 4: Delete the rows with the ids collected in the previous step.
solution.py12 lines
1# Full working Python code
2import pandas as pd
3
4def delete_duplicates(person):
5 email_map = {}
6 for index, row in person.iterrows():
7 if row['email'] not in email_map:
8 email_map[row['email']] = row['id']
9 else:
10 email_map[row['email']] = min(email_map[row['email']], row['id'])
11 to_delete = person[~person['id'].isin(email_map.values())]
12 person.drop(index=to_delete.index, inplace=True)ℹ
Complexity note: This complexity is due to the single pass through the Person table and the use of a HashMap for storage, which allows for efficient lookups.
- 1Using a HashMap allows for efficient tracking of duplicates.
- 2Always keep track of the smallest id when dealing with duplicates.
Solutions and explanations are original Tejav content. Problem titles © LeetCode — use the LeetCode button above for the full problem statement.