#196

Delete Duplicate Emails

Easy
DatabaseHash MapArray
LeetCode ↗

Approaches

Brute ForceOptimal
Complexity Comparison
Brute ForceOptimal Solution
Time
O(n²)
O(n)
Space
O(1)
O(n)
💡

Intuition

Time O(n)Space O(n)

The optimal solution uses a HashMap to track the smallest id for each email. This allows us to efficiently identify duplicates and keep the one with the smallest id.

⚙️

Algorithm

4 steps
  1. 1Step 1: Create a HashMap to store the smallest id for each email.
  2. 2Step 2: Iterate through the Person table and populate the HashMap with the email as the key and the smallest id as the value.
  3. 3Step 3: Create a list of ids to delete, which are those not in the HashMap's values.
  4. 4Step 4: Delete the rows with the ids collected in the previous step.
solution.py12 lines
1# Full working Python code
2import pandas as pd
3
4def delete_duplicates(person):
5    email_map = {}
6    for index, row in person.iterrows():
7        if row['email'] not in email_map:
8            email_map[row['email']] = row['id']
9        else:
10            email_map[row['email']] = min(email_map[row['email']], row['id'])
11    to_delete = person[~person['id'].isin(email_map.values())]
12    person.drop(index=to_delete.index, inplace=True)

Complexity note: This complexity is due to the single pass through the Person table and the use of a HashMap for storage, which allows for efficient lookups.

  • 1Using a HashMap allows for efficient tracking of duplicates.
  • 2Always keep track of the smallest id when dealing with duplicates.

Solutions and explanations are original Tejav content. Problem titles © LeetCode — use the LeetCode button above for the full problem statement.