Top K Frequent Elements

Problem Statement

Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Example 1:

Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]
Explanation: 1 appears 3 times, 2 appears 2 times, 3 appears 1 time.
The 2 most frequent are 1 and 2.

Example 2:

Input: nums = [1], k = 1
Output: [1]

Constraints:

  • 1 <= nums.length <= 10^5
  • -10^4 <= nums[i] <= 10^4
  • k is in the range [1, the number of unique elements in the array]
  • It is guaranteed that the answer is unique

Follow-up:

Your algorithm’s time complexity must be better than O(n log n), where n is the array’s size.


Approach

Initial Thoughts

We need to find the k elements that appear most frequently. The naive approach would be to count frequencies, sort by frequency, and take the top k. But the follow-up asks for better than O(n log n).

Strategy

Approach 1 (Heap/Priority Queue): Count frequencies, then use a min-heap of size k to keep track of the k most frequent elements. Time: O(n log k).

Approach 2 (Bucket Sort): Count frequencies, then use bucket sort where index represents frequency. This achieves O(n) time.

Approach 3 (Sorting – doesn’t meet follow-up): Count frequencies and sort by frequency. Time: O(n log n).

The bucket sort approach is optimal for the follow-up constraint.

Key Insights

  • Must count frequencies first – use hash map for O(n) time
  • Bucket sort works perfectly here because frequencies are bounded by array length
  • Create buckets where bucket[i] contains all numbers with frequency i
  • Iterate through buckets from highest frequency to lowest
  • Can stop once we’ve collected k elements
  • Heap approach is good but O(n log k) – bucket sort is better at O(n)
  • Order of output doesn’t matter – makes problem easier

Algorithm Steps

Bucket Sort Approach (Optimal):

  1. Count frequencies: Create hash map {number: frequency}
  2. Create buckets: Array of size n+1 where bucket[i] = list of numbers with frequency i
  3. Fill buckets: For each number and its frequency, add number to appropriate bucket
  4. Collect top k: Iterate through buckets from highest frequency (n) to lowest (1)
    • Add numbers from each bucket to result
    • Stop when we have k elements
  5. Return result

Heap Approach (Alternative):

  1. Count frequencies with hash map
  2. Create min-heap of size k
  3. For each number and frequency:
    • If heap size < k, add to heap
    • Else if current frequency > min frequency in heap, pop min and add current
  4. Extract all elements from heap

Complexity Analysis

Bucket Sort Approach:

Time Complexity: O(n)

  • O(n) to count frequencies
  • O(n) to fill buckets
  • O(n) to collect top k elements (worst case visit all buckets)
  • Overall: O(n) – meets follow-up requirement

Space Complexity: O(n)

  • O(n) for frequency hash map
  • O(n) for buckets array
  • O(k) for result

Heap Approach:

Time Complexity: O(n log k)

  • O(n) to count frequencies
  • O(n log k) for heap operations (at most n insertions, each O(log k))
  • Better than O(n log n) but not as good as bucket sort

Space Complexity: O(n + k)

  • O(n) for frequency map
  • O(k) for heap

Sorting Approach:

Time Complexity: O(n log n) – does NOT meet follow-up requirement

Space Complexity: O(n)


Code

Solution 1: Bucket Sort (Optimal – O(n))

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # Step 1: Count frequencies
        freq_map = {}
        for num in nums:
            freq_map[num] = freq_map.get(num, 0) + 1

        # Step 2: Create buckets (index = frequency)
        # bucket[i] contains all numbers with frequency i
        buckets = [[] for _ in range(len(nums) + 1)]

        # Step 3: Fill buckets
        for num, freq in freq_map.items():
            buckets[freq].append(num)

        # Step 4: Collect top k elements from highest frequency
        result = []
        for i in range(len(buckets) - 1, 0, -1):  # From highest to lowest
            for num in buckets[i]:
                result.append(num)
                if len(result) == k:
                    return result

        return result

Solution 2: Using Counter (Cleaner)

from collections import Counter

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # Count frequencies
        freq_map = Counter(nums)

        # Create buckets
        buckets = [[] for _ in range(len(nums) + 1)]

        # Fill buckets
        for num, freq in freq_map.items():
            buckets[freq].append(num)

        # Collect top k
        result = []
        for i in range(len(buckets) - 1, 0, -1):
            for num in buckets[i]:
                result.append(num)
                if len(result) == k:
                    return result

        return result

Solution 3: Min Heap (O(n log k))

import heapq
from collections import Counter

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # Count frequencies
        freq_map = Counter(nums)

        # Use min heap of size k
        # Store tuples of (frequency, number)
        min_heap = []

        for num, freq in freq_map.items():
            heapq.heappush(min_heap, (freq, num))
            if len(min_heap) > k:
                heapq.heappop(min_heap)  # Remove least frequent

        # Extract numbers from heap
        return [num for freq, num in min_heap]

Solution 4: Using most_common() (Pythonic but O(n log n))

from collections import Counter

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        freq_map = Counter(nums)
        return [num for num, freq in freq_map.most_common(k)]

Code Walkthrough

Bucket Sort Solution (Solution 1):

Lines 4-6: Count frequencies using hash map

  • Build {number: frequency} mapping
  • O(n) time

Line 10: Create buckets array of size n + 1

  • Index represents frequency (0 to n)
  • Each bucket is a list of numbers with that frequency
  • Size is n + 1 because max frequency is n (all elements same)

Lines 13-14: Fill buckets

  • For each number and its frequency
  • Add the number to the bucket at index frequency
  • Example: if 1 appears 3 times, add 1 to buckets[3]

Line 17: Start from highest frequency

  • range(len(buckets) - 1, 0, -1) goes from n down to 1
  • We want most frequent elements first

Lines 18-21: Collect elements from each bucket

  • Add all numbers from current frequency bucket
  • Once we have k elements, return immediately
  • Order doesn’t matter per problem statement

Why this works:

  • Buckets are naturally sorted by frequency (index)
  • Iterating from high to low gives us most frequent first
  • No explicit sorting needed – O(n) time!

Heap Solution (Solution 3):

Line 8: Initialize empty min heap

Lines 11-14: Process each number

  • Push (frequency, number) tuple onto heap
  • Heap orders by first element (frequency)
  • If heap exceeds size k, remove minimum
  • Maintains only k most frequent elements

Line 17: Extract numbers from heap

  • Heap contains k most frequent elements
  • Extract just the numbers (not frequencies)

Edge Cases to Consider

  1. Single element: nums = [1], k = 1[1]
  2. All same elements: nums = [1,1,1,1], k = 1[1]
  3. All unique elements: nums = [1,2,3,4], k = 2 → any 2 elements
  4. k equals unique count: nums = [1,1,2,2,3,3], k = 3[1,2,3]
  5. k = 1: Only need single most frequent element
  6. Large array: Up to 10^5 elements
  7. Negative numbers: nums = [-1,-1,2,2,2], k = 1[2]
  8. Two elements same frequency:nums = [1,1,2,2,3], k = 2[1,2] or [2,1]
    • Order doesn’t matter when frequencies are equal
  9. Maximum k: k = number of unique elements

Similar Problems

  • LC 347: Top K Frequent Elements (this problem)
  • LC 692: Top K Frequent Words
  • LC 973: K Closest Points to Origin
  • LC 215: Kth Largest Element in an Array
  • LC 451: Sort Characters By Frequency
  • LC 1481: Least Number of Unique Integers after K Removals
  • LC 658: Find K Closest Elements
  • LC 373: Find K Pairs with Smallest Sums

Personal Notes

  • This problem tests multiple concepts: hash maps, heaps, bucket sort, and optimization
  • Bucket sort approach is the key insight:
    • Most people think of heap first (O(n log k))
    • Bucket sort achieves O(n) by using frequency as index
    • This is the “aha!” moment for optimal solution
  • Interview strategy:
    1. Start by explaining frequency counting
    2. Mention heap approach (O(n log k))
    3. Then optimize to bucket sort (O(n)) if time permits
    4. Show you understand the trade-offs
  • Why bucket sort works here:
    • Frequencies are bounded: 0 to n
    • Small range makes bucket sort perfect
    • No need for comparison-based sorting
  • Heap approach is still valuable:
    • More intuitive for many people
    • Better for streaming data
    • Good enough for most practical cases
    • Shows you know about heaps and priority queues
  • Counter.most_common() is elegant but doesn’t meet follow-up
    • Uses quickselect internally (O(n) average case)
    • But can be O(n log n) worst case
    • Good to mention for Python knowledge
  • Common mistakes:
    • Not recognizing bucket sort opportunity
    • Creating buckets of wrong size (need n+1, not n)
    • Iterating buckets in wrong direction (low to high)
    • Forgetting to stop when k elements collected
  • Max heap vs Min heap:
    • We use min heap to maintain k elements
    • Keep k largest frequencies by removing minimum
  • The problem guarantees unique answer – no ties to worry about
  • Output order doesn’t matter – simplifies the problem
  • Bucket sort is a classic example of trading space for time efficiency

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top