Group Anagrams

Problem Statement

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

In this problem, we will focus on how to effectively solve the Group Anagram challenge.

Example 1:

Input: strs = ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]
Explanation: 
- "eat", "tea", "ate" are anagrams
- "tan", "nat" are anagrams
- "bat" is alone

Example 2:

Input: strs = [""]
Output: [[""]]

Example 3:

Input: strs = ["a"]
Output: [["a"]]

Constraints:

  • 1 <= strs.length <= 10^4
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters

Approach

Initial Thoughts

Anagrams have the same characters with the same frequencies. We need a way to identify anagrams and group them together. The key insight is finding a unique “signature” or “key” that represents all anagrams of the same group.

Strategy

Approach 1 (Sorted String as Key): Sort each string and use the sorted version as a key in a hash map. All anagrams will have the same sorted string.

Approach 2 (Character Count as Key): Count character frequencies and use the count tuple as a key. This avoids sorting overhead.

Both approaches use a hash map where:

  • Key: A unique identifier for anagram groups (sorted string or character count)
  • Value: List of strings that belong to this anagram group

Key Insights

  • Anagrams will produce the same key (sorted string or character count)
  • Hash map allows O(1) grouping of strings with the same key
  • Can use tuple of character counts as hash key (tuples are hashable, lists are not)
  • Sorting approach is simpler to code but slightly slower O(k log k) per string
  • Character count approach is O(k) per string but requires more code
  • Final step is extracting all values from hash map

Algorithm Steps

Sorted String Approach:

  1. Create empty hash map anagram_groups to store {sorted_string: [original_strings]}
  2. For each string in strs:
    • Sort the string to get its key
    • Add the original string to the list at anagram_groups[key]
  3. Return all values from the hash map as a list

Character Count Approach:

  1. Create empty hash map anagram_groups
  2. For each string in strs:
    • Count character frequencies (array of size 26)
    • Convert count array to tuple (to use as hash key)
    • Add the original string to the list at anagram_groups[tuple]
  3. Return all values from the hash map as a list

Complexity Analysis

Sorted String Approach:

Time Complexity: O(n * k log k)

  • n = number of strings
  • k = maximum length of a string
  • Sorting each string takes O(k log k)

Space Complexity: O(n * k)

  • Storing all strings in hash map
  • Hash map can have at most n groups

Character Count Approach:

Time Complexity: O(n * k)

  • n = number of strings
  • k = maximum length of a string
  • Counting characters takes O(k) per string
  • Asymptotically better than sorting

Space Complexity: O(n * k)

  • Same as sorted approach

Code

Solution 1: Sorted String as Key (Simpler & Clean)

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        anagram_groups = {}

        for s in strs:
            # Sort the string to get its key
            key = ''.join(sorted(s))

            # Add string to its anagram group
            if key not in anagram_groups:
                anagram_groups[key] = []
            anagram_groups[key].append(s)

        return list(anagram_groups.values())

Solution 2: Using defaultdict (Cleaner)

from collections import defaultdict

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        anagram_groups = defaultdict(list)

        for s in strs:
            key = ''.join(sorted(s))
            anagram_groups[key].append(s)

        return list(anagram_groups.values())

Solution 3: Character Count as Key (Optimal Time)

from collections import defaultdict

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        anagram_groups = defaultdict(list)

        for s in strs:
            # Count character frequencies
            count = [0] * 26
            for char in s:
                count[ord(char) - ord('a')] += 1

            # Use tuple of counts as key (lists aren't hashable)
            key = tuple(count)
            anagram_groups[key].append(s)

        return list(anagram_groups.values())

Solution 4: Character Count – More Concise

from collections import defaultdict

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        anagram_groups = defaultdict(list)

        for s in strs:
            count = [0] * 26
            for c in s:
                count[ord(c) - ord('a')] += 1
            anagram_groups[tuple(count)].append(s)

        return list(anagram_groups.values())

Code Walkthrough

Sorted String Approach (Solution 1):

Line 3: Initialize empty hash map to store anagram groups

Line 5: Iterate through each string in the input array

Line 7: Create the key by sorting the string

  • sorted(s) returns a list of characters: ['a', 'e', 't']
  • ''.join(...) converts it back to string: "aet"
  • All anagrams will produce the same sorted string

Line 10-12: Add string to its corresponding anagram group

  • Check if key exists in map; if not, create empty list
  • Append original string to the list

Line 14: Return all groups (values from hash map)

  • anagram_groups.values() gives dict_values object
  • Convert to list for the return type

Character Count Approach (Solution 3):

Line 8-10: Build character frequency array

  • Array of size 26 for lowercase letters ‘a’ to ‘z’
  • ord(char) - ord('a') maps ‘a’→0, ‘b’→1, …, ‘z’→25
  • Increment count for each character

Line 13: Convert count array to tuple

  • Tuples are immutable and hashable (can be used as dict keys)
  • Lists are not hashable and cannot be used as keys
  • Example: [1, 0, 0, ..., 1, 1] becomes (1, 0, 0, ..., 1, 1)

Line 14: Add string to group identified by this count signature

Line 16: Extract all anagram groups from the hash map


Edge Cases to Consider

  1. Empty string: strs = [""][[""]]
  2. Single string: strs = ["a"][["a"]]
  3. No anagrams: strs = ["abc", "def", "ghi"][["abc"], ["def"], ["ghi"]]
  4. All same anagrams: strs = ["eat", "tea", "ate"][["eat", "tea", "ate"]]
  5. All identical strings: strs = ["a", "a", "a"][["a", "a", "a"]]
  6. Single character strings: strs = ["a", "b", "a"][["a", "a"], ["b"]]
  7. Different lengths:strs = ["a", "ab", "abc"][["a"], ["ab"], ["abc"]]
    • Different lengths → cannot be anagrams
  8. Empty array: Not possible per constraints (length >= 1)
  9. Long strings: Strings up to length 100
  10. Many groups: Up to 10^4 strings

Similar Problems

  • LC 49: Group Anagrams (this problem)
  • LC 242: Valid Anagram
  • LC 438: Find All Anagrams in a String
  • LC 567: Permutation in String
  • LC 249: Group Shifted Strings
  • LC 1152: Analyze User Website Visit Pattern
  • LC 890: Find and Replace Pattern
  • LC 760: Find Anagram Mappings

Personal Notes

  • This is a classic hash map grouping problem – very common pattern in interviews
  • Sorted string approach is recommended for interviews:
    • Simpler to code (one line for key generation)
    • Easier to explain
    • Performance difference rarely matters in practice
    • Shows you understand the core concept
  • Character count approach:
    • Asymptotically better time complexity
    • Good to mention as an optimization
    • More code, more room for bugs in interviews
  • Key insight: Finding the right “signature” for anagram groups
    • Sorted string: Simple and intuitive
    • Character count: More efficient but complex
    • Both work because anagrams share the same character distribution
  • defaultdict vs regular dict:
    • defaultdict(list) eliminates need for existence checks
    • Cleaner code, fewer lines
    • Good to know for interviews
  • Why tuple for character count:
    • Lists are mutable and not hashable
    • Tuples are immutable and hashable
    • Can use tuple as dictionary key, not list
  • Common mistakes:
    • Forgetting to convert sorted result back to string
    • Trying to use list as hash key (not hashable)
    • Not converting dict_values to list for return
  • Interview strategy:
    1. Start with sorted approach (easier to code)
    2. Mention character count as optimization
    3. Code the sorted approach cleanly with defaultdict
  • Return order doesn’t matter – any grouping order is acceptable
  • This problem combines multiple concepts: hash maps, sorting, character counting

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top