#3475

DNA Pattern Recognition

Medium
DatabaseHash MapArray
LeetCode ↗

Approaches

Brute ForceOptimal
Complexity Comparison
Brute ForceOptimal Solution
Time
O(n²)
O(n)
Space
O(1)
O(n)
💡

Intuition

Time O(n)Space O(n)

The optimal solution processes each DNA sequence only once, checking for all patterns in a single pass. This reduces the time complexity significantly and is more efficient for larger datasets.

⚙️

Algorithm

4 steps
  1. 1Step 1: Initialize an empty result list to store the output.
  2. 2Step 2: For each DNA sequence, check for all patterns in a single iteration.
  3. 3Step 3: Use string methods to check for the start codon, stop codons, motif, and consecutive G's.
  4. 4Step 4: Append the results to the result list and return it ordered by sample_id.
solution.py10 lines
1def identify_patterns(samples):
2    result = []
3    for sample in samples:
4        sample_id, dna_sequence, species = sample
5        has_start = dna_sequence.startswith('ATG')
6        has_stop = any(dna_sequence.endswith(stop) for stop in ['TAA', 'TAG', 'TGA'])
7        has_atat = 'ATAT' in dna_sequence
8        has_consecutive_g = 'GGG' in dna_sequence
9        result.append((sample_id, dna_sequence, species, has_start, has_stop, has_atat, has_consecutive_g))
10    return sorted(result, key=lambda x: x[0])

Complexity note: The time complexity is O(n) because we only traverse each DNA sequence once, checking all patterns in a single pass. The space complexity is O(n) due to the storage of results.

  • 1Patterns can be checked in a single pass for efficiency.
  • 2Using built-in string methods can simplify the logic.

Solutions and explanations are original Tejav content. Problem titles © LeetCode — use the LeetCode button above for the full problem statement.