How do you solve Delete Duplicate Emails on LeetCode?

Delete Duplicate Emails (LeetCode #196) can be solved with a brute-force approach or with an optimal hash-map approach. The optimal approach runs in O(n) time and O(n) space. Key insight: a HashMap from each email to the smallest id seen so far identifies every duplicate in a single pass.

What algorithm or data structure is used to solve Delete Duplicate Emails?

Delete Duplicate Emails is tagged Database. The recommended patterns to study are Hash Map and Array.
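The problem is tagged Database, so the keep-the-smallest-id rule can also be written as a single SQL statement. The sketch below is a minimal illustration. It assumes an in-memory SQLite database created through Python's standard `sqlite3` module and a `Person(id, email)` schema, and the helper name is hypothetical. Some engines, MySQL among them, may reject a subquery that reads from the table being deleted from, so the statement might need adjusting there.

```python
import sqlite3


def delete_duplicate_emails_sql(rows):
    """Hypothetical helper: load (id, email) rows into SQLite and dedupe with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
    # Keep only the smallest id per email and delete every other row.
    conn.execute(
        "DELETE FROM Person "
        "WHERE id NOT IN (SELECT MIN(id) FROM Person GROUP BY email)"
    )
    remaining = conn.execute("SELECT id, email FROM Person ORDER BY id").fetchall()
    conn.close()
    return remaining


rows = [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")]
print(delete_duplicate_emails_sql(rows))  # [(1, 'john@example.com'), (2, 'bob@example.com')]
```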

What are the interview tips for Delete Duplicate Emails?

Always clarify whether you must keep the row with the smallest id or whether any one copy of each email will do. Discuss your thought process out loud to show your understanding of the problem. Consider edge cases, such as all emails being unique or all being duplicates.

What are common mistakes when solving Delete Duplicate Emails?

Resorting to nested loops instead of a HashMap, which makes the solution O(n²). Failing to keep the row with the smallest id when duplicates are found.

Delete Duplicate Emails — LeetCode #196 (Easy)

Tags: Database

Related patterns: Hash Map, Array

Brute Force approach

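Time complexity: O(n²). Space complexity: O(1) auxiliary space for a pure pairwise comparison (not counting the ids marked for deletion). The seen-list variant in the steps below uses O(n) for the list.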

The brute force approach compares each email against all the others to find duplicates. This is straightforward but inefficient, because every row may be compared with every other row.

This complexity arises because for each email, we potentially check all previous emails, leading to a nested loop.
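The pairwise comparison described above can be sketched in plain Python. This is a minimal illustration that assumes the Person table is represented as a list of (id, email) tuples, and the function name is hypothetical. It needs no seen list, which is why its extra space is constant apart from the ids it reports.

```python
def brute_force_ids_to_delete(rows):
    """Return ids of duplicate rows by comparing every row with every other row.

    rows is a list of (id, email) tuples standing in for the Person table.
    Runs in O(n^2) time; apart from the output list it uses O(1) extra space.
    """
    to_delete = []
    for i, (id_i, email_i) in enumerate(rows):
        for j, (id_j, email_j) in enumerate(rows):
            # Another row with the same email and a smaller id means row i is a duplicate.
            if i != j and email_i == email_j and id_j < id_i:
                to_delete.append(id_i)
                break  # one smaller match is enough; move on to the next row
    return to_delete


rows = [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")]
print(brute_force_ids_to_delete(rows))  # [3]
```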

Step 1: Create a list to keep track of emails that have been seen.
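Step 2: For each row of the Person table, in ascending id order, check whether its email has been seen before. Processing rows by id guarantees that the first occurrence of each email is the one with the smallest id.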
Step 3: If it has been seen, mark the current row for deletion; if not, add it to the seen list.
Step 4: After processing all emails, delete the marked rows from the Person table.

1. Initial state: seen = [], to_delete = []
2. Process 'john@example.com' (id 1): not seen yet, so seen = ['john@example.com']
3. Process 'bob@example.com' (id 2): not seen yet, so seen = ['john@example.com', 'bob@example.com']
4. Process 'john@example.com' (id 3): already seen, so to_delete = [3]
5. Final state before deletion: to_delete = [3]
6. Delete rows: the remaining rows are 'john@example.com' (id 1) and 'bob@example.com' (id 2).
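Steps 1 to 4 and the walkthrough above can be reproduced with a short Python sketch. It assumes the rows arrive as (id, email) tuples sorted by ascending id, and the function name is hypothetical. Because `seen` is a plain list, each `in` check scans it linearly, and that scan is the source of the O(n²) time.

```python
def seen_list_ids_to_delete(rows):
    """Follow Steps 1-4: keep a list of seen emails and mark repeats for deletion.

    Assumes rows are (id, email) tuples sorted by ascending id, so the first
    occurrence of each email is the one with the smallest id.
    """
    seen = []  # Step 1: emails seen so far (a plain list, so each lookup is O(n))
    to_delete = []
    for row_id, email in rows:  # Step 2: visit rows in id order
        if email in seen:  # linear scan of the list
            to_delete.append(row_id)  # Step 3: mark the duplicate row
        else:
            seen.append(email)
    return to_delete  # Step 4: the caller deletes these ids


rows = [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")]
ids = seen_list_ids_to_delete(rows)
remaining = [row for row in rows if row[0] not in ids]
print(ids)  # [3]
print(remaining)  # [(1, 'john@example.com'), (2, 'bob@example.com')]
```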

Optimal Solution approach

Time complexity: O(n). Space complexity: O(n).

The optimal solution uses a HashMap to track the smallest id for each email. This allows us to efficiently identify duplicates and keep the one with the smallest id.

The time is O(n) because the Person table is scanned once and each HashMap lookup or update takes O(1) time on average. The space is O(n) because the map can hold one entry per distinct email.

Step 1: Create a HashMap to store the smallest id for each email.
Step 2: Iterate through the Person table and populate the HashMap with the email as the key and the smallest id as the value.
Step 3: Create a list of ids to delete, which are those not in the HashMap's values.
Step 4: Delete the rows with the ids collected in the previous step.
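The four steps above can be sketched in plain Python with a dict. This sketch assumes the same (id, email) tuple representation and unique ids (id is the table's primary key), and the function name is hypothetical. Unlike the seen-list version, it does not depend on the rows being sorted, because it compares ids rather than trusting the first occurrence.

```python
def hashmap_ids_to_delete(rows):
    """Steps 1-4 with a dict mapping each email to the smallest id seen so far."""
    email_map = {}  # Step 1
    for row_id, email in rows:  # Step 2: single pass, average O(1) dict operations
        if email not in email_map or row_id < email_map[email]:
            email_map[email] = row_id
    keep = set(email_map.values())
    # Step 3: any id that is not the kept id for its email belongs to a duplicate.
    return [row_id for row_id, _ in rows if row_id not in keep]


rows = [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")]
print(hashmap_ids_to_delete(rows))  # [3]
```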

1. Initial state: emailMap = {}
2. Process 'john@example.com' (id 1): emailMap = {'john@example.com': 1}
3. Process 'bob@example.com' (id 2): emailMap = {'john@example.com': 1, 'bob@example.com': 2}
4. Process 'john@example.com' (id 3): 3 is larger than the stored id 1, so emailMap is unchanged: {'john@example.com': 1, 'bob@example.com': 2}
5. Final emailMap: {'john@example.com': 1, 'bob@example.com': 2}
6. Delete rows whose id is not among emailMap's values (here, id 3): the remaining rows are 'john@example.com' (id 1) and 'bob@example.com' (id 2).

Key Insights

Using a HashMap allows for efficient tracking of duplicates.
Always keep track of the smallest id when dealing with duplicates.

Common Mistakes

Resorting to nested loops instead of a HashMap, which makes the solution O(n²).
Failing to keep the row with the smallest id when duplicates are found.
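The second mistake is easy to reproduce when rows are not stored in id order. The hypothetical sketch below contrasts two approaches on made-up rows listed out of order. The first keeps the first id it encounters for each email, and the second keeps the smallest id.

```python
def first_seen_keep(rows):
    """Buggy variant: keeps the first id encountered, whatever its value."""
    kept = {}
    for row_id, email in rows:
        kept.setdefault(email, row_id)
    return kept


def smallest_id_keep(rows):
    """Correct variant: keeps the smallest id for each email."""
    kept = {}
    for row_id, email in rows:
        kept[email] = min(kept.get(email, row_id), row_id)
    return kept


# Rows deliberately out of id order (hypothetical data).
rows = [(3, "john@example.com"), (1, "john@example.com"), (2, "bob@example.com")]
print(first_seen_keep(rows))  # {'john@example.com': 3, 'bob@example.com': 2}
print(smallest_id_keep(rows))  # {'john@example.com': 1, 'bob@example.com': 2}
```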

Interview Tips

Always clarify whether you must keep the row with the smallest id or whether any one copy of each email will do.
Discuss your thought process out loud to show your understanding of the problem.
Consider edge cases, such as all emails being unique or all being duplicates.
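A quick way to cover these edge cases is to run them through a small keep-the-smallest-id helper. This sketch uses hypothetical data and a hypothetical function name, with (id, email) tuples in place of the Person table.

```python
def ids_to_keep(rows):
    """Return the sorted ids that survive deduplication (smallest id per email)."""
    smallest = {}
    for row_id, email in rows:
        if email not in smallest or row_id < smallest[email]:
            smallest[email] = row_id
    return sorted(smallest.values())


# Edge cases worth raising in an interview (hypothetical data).
all_unique = [(1, "a@x.com"), (2, "b@x.com"), (3, "c@x.com")]
all_duplicates = [(5, "a@x.com"), (2, "a@x.com"), (9, "a@x.com")]
empty = []

print(ids_to_keep(all_unique))  # [1, 2, 3]: nothing is deleted
print(ids_to_keep(all_duplicates))  # [2]: every row except id 2 is deleted
print(ids_to_keep(empty))  # []
```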


Complexity Comparison

Approach	Time	Space
Brute Force	O(n²)	O(1)
Optimal Solution	O(n)	O(n)


solution.py

import pandas as pd

def delete_duplicates(person: pd.DataFrame) -> None:
    # Step 1: map each email to the smallest id seen so far.
    email_map = {}
    # Step 2: single pass over the id and email columns.
    for row_id, email in zip(person['id'], person['email']):
        if email not in email_map:
            email_map[email] = row_id
        else:
            email_map[email] = min(email_map[email], row_id)
    # Step 3: rows whose id is not a kept (smallest) id are duplicates.
    to_delete = person[~person['id'].isin(set(email_map.values()))]
    # Step 4: delete them, modifying the DataFrame in place.
    person.drop(index=to_delete.index, inplace=True)
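For comparison, pandas can express the same rule without an explicit loop. This vectorized variant is an alternative to the HashMap approach and is not part of the original solution. It sorts by id and relies on `DataFrame.duplicated(subset=..., keep='first')` to flag every later copy of an email. The function name is hypothetical.

```python
import pandas as pd


def delete_duplicate_emails_vectorized(person: pd.DataFrame) -> None:
    # Sort by id so the first occurrence of each email is the smallest id.
    ordered = person.sort_values("id")
    # duplicated() marks every repeat after the first occurrence as True.
    dup_mask = ordered.duplicated(subset="email", keep="first")
    # Drop the flagged rows (by their original index labels) in place.
    person.drop(index=dup_mask[dup_mask].index, inplace=True)


person = pd.DataFrame(
    {"id": [1, 2, 3], "email": ["john@example.com", "bob@example.com", "john@example.com"]}
)
delete_duplicate_emails_vectorized(person)
print(person)
```

Sorting costs O(n log n), so this version trades the strict O(n) bound for shorter, loop-free code.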


Solutions and explanations are original Tejav content. Problem titles © LeetCode — use the LeetCode button above for the full problem statement.