
Advanced Data Structures

July 22, 2025

Advanced Data Structures

Binary Search Tree (BST)

Definition: A Binary Search Tree is a special type of Binary Tree that maintains a specific ordering of its elements. For any given node, all values in its left subtree are less than the node's value, and all values in its right subtree are greater than the node's value. This property, known as the BST invariant, must hold true for every node in the tree.

Time Complexity (for a Binary Search Tree):
  • Access (by value): O(log N) on average. O(N) in the worst case.
  • Search (by value): O(log N) on average. O(N) in the worst case.
  • Insertion: O(log N) on average. O(N) in the worst case.
  • Deletion: O(log N) on average. O(N) in the worst case.

The O(log N) average-case performance is the primary advantage of a BST. However, this relies on the tree being reasonably balanced. A skewed tree degrades performance to O(N).

Space Complexity: O(N) to store N nodes. The space required for recursion during operations is O(H), where H is the height of the tree (O(log N) for a balanced tree, O(N) for a skewed tree).

Strengths:
  • Keeps Data Sorted: The BST invariant ensures that the data is always structured in a sorted manner, which allows for very efficient searching.
  • Efficient Average-Case Operations: Searching, inserting, and deleting elements are significantly faster on average (O(log N)) than in unsorted arrays or lists.
  • Sorted Traversal: A key property of a BST is that an in-order traversal visits the nodes in ascending sorted order.
Weaknesses:
  • Unbalanced Performance: The main weakness is that performance degrades to O(N) if the tree becomes unbalanced from non-random insertions (e.g., inserting already sorted data). This is why self-balancing BSTs (like AVL or Red-Black trees) are used in production systems.
  • No O(1) Access: There is no constant-time way to access an element by its position or index.

Python Example: This example defines a TreeNode and performs a recursive search.

# Definition for a binary tree node.
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

# Manually create a simple BST:
#      8
#    /   \
#   3     10
#  / \      \
# 1   6      14
root = TreeNode(8)
root.left = TreeNode(3, TreeNode(1), TreeNode(6))
root.right = TreeNode(10, None, TreeNode(14))

def search_bst(node, target):
    """Recursively searches for a target value in a BST."""
    if not node or node.val == target:
        return node
    
    # If target is smaller, go left. Otherwise, go right.
    if target < node.val:
        return search_bst(node.left, target)
    else:
        return search_bst(node.right, target)

# Search for a value
found_node = search_bst(root, 6)
print(f"Searching for 6... Found: {'Yes' if found_node else 'No'}")
# Expected Output: Searching for 6... Found: Yes

Java Example: This example defines a TreeNode and performs the same recursive search.

public class BstExample {

    // Definition for a binary tree node.
    public static class TreeNode {
        int val;
        TreeNode left;
        TreeNode right;

        TreeNode(int val) { this.val = val; }
        TreeNode(int val, TreeNode left, TreeNode right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Recursively searches for a target value in a BST.
     */
    public static TreeNode searchBst(TreeNode node, int target) {
        if (node == null || node.val == target) {
            return node;
        }

        // Based on the BST property, we only need to search one subtree.
        if (target < node.val) {
            return searchBst(node.left, target);
        } else {
            return searchBst(node.right, target);
        }
    }

    public static void main(String[] args) {
        // Manually create a simple BST:
        //      8
        //    /   \
        //   3     10
        TreeNode root = new TreeNode(8, new TreeNode(3), new TreeNode(10));
        
        // Search for a value
        TreeNode foundNode = searchBst(root, 10);
        System.out.println("Searching for 10... Found: " + (foundNode != null));
        // Expected Output: Searching for 10... Found: true
    }
}
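The sorted-traversal strength noted above can be demonstrated directly. The following self-contained sketch (its TreeNode mirrors the one in the examples) builds the same tree and shows that an in-order traversal visits the values in ascending order:

```python
class TreeNode:
    """Minimal BST node: a value plus left/right children."""
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def inorder(node, out):
    """Visit left subtree, then the node, then right subtree.
    In a BST this yields the values in ascending order."""
    if node:
        inorder(node.left, out)
        out.append(node.val)
        inorder(node.right, out)
    return out

# Same tree as the search example: 8 at the root, 3/10 below, 1/6/14 as leaves.
root = TreeNode(8, TreeNode(3, TreeNode(1), TreeNode(6)),
                TreeNode(10, None, TreeNode(14)))
print(inorder(root, []))
# Expected Output: [1, 3, 6, 8, 10, 14]
```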
Common Patterns and Example Problems:
  • Validation: A very common pattern is checking if a given binary tree is a valid BST. This requires passing down min/max constraints to recursively validate the BST invariant for all nodes.
    • Validate Binary Search Tree (#98)
  • Search, Insertion, and Deletion: These are the fundamental operations. You must be able to implement them while preserving the BST property.
    • Search in a Binary Search Tree (#700)
    • Insert into a Binary Search Tree (#701)
    • Delete Node in a BST (#450)
  • Leveraging the Sorted Order: Many problems rely on the fact that an in-order traversal of a BST yields a sorted sequence. This can be used to find elements by their rank or to find the successor/predecessor.
    • Kth Smallest Element in a BST (#230)
    • Lowest Common Ancestor of a Binary Search Tree (#235)
    • Convert Sorted Array to Binary Search Tree (#108)
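The min/max-constraint idea behind the validation pattern can be sketched as follows. This is a minimal self-contained illustration (the class and helper names are just for this sketch, not tied to any problem's exact interface):

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def is_valid_bst(node, low=float('-inf'), high=float('inf')):
    """Every node must fall strictly between the bounds inherited
    from its ancestors, not merely be ordered relative to its parent."""
    if not node:
        return True
    if not (low < node.val < high):
        return False
    # Left children must stay below this node's value; right children above it.
    return (is_valid_bst(node.left, low, node.val) and
            is_valid_bst(node.right, node.val, high))

valid = TreeNode(8, TreeNode(3), TreeNode(10))
# 6 sits in the right subtree of 8 but is smaller than 8: a parent-only
# check would miss this, while the inherited bounds catch it.
invalid = TreeNode(8, TreeNode(3), TreeNode(10, TreeNode(6), None))
print(is_valid_bst(valid))    # Expected Output: True
print(is_valid_bst(invalid))  # Expected Output: False
```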

Priority Queue

Definition: A Priority Queue is an abstract data type similar to a regular queue, but where each element has an associated "priority." Elements with higher priority are served before elements with lower priority. If two elements have the same priority, their relative order is typically undefined. The most common and efficient way to implement a Priority Queue is by using a Heap.

Time Complexity (when implemented with a Heap):
  • Insertion (enqueue/add): O(log N)
  • Deletion of Highest Priority (dequeue/poll): O(log N)
  • Get Highest Priority (peek): O(1)
  • Search (for an arbitrary element): O(N)

Space Complexity: O(N) to store N elements.

Strengths:
  • Prioritized Processing: Its core strength is efficiently managing a dynamic collection of items where you always need to access or process the one with the highest priority next.
  • Efficient Operations: The O(log N) time for insertions and deletions is very efficient, making it suitable for algorithms that frequently update a collection of prioritized tasks.
  • Flexible Prioritization: The definition of "priority" is flexible. It can be the smallest value, the largest value, or a custom-calculated score.
Weaknesses:
  • Slow Search: It is not designed for fast lookups of arbitrary elements. Finding an element that is not the highest-priority item requires an O(N) search.
  • No Direct Indexing: You cannot access elements by their position or index.

Python Example: This example uses the queue.PriorityQueue class, which is thread-safe and explicitly represents the abstract data type.

from queue import PriorityQueue

# PriorityQueue implements a min-priority queue by default.
# It stores tuples of (priority, data).
pq = PriorityQueue()

# Enqueue items with their priorities
pq.put((2, 'Task B')) # Priority 2
pq.put((1, 'Task A')) # Priority 1 (higher)
pq.put((3, 'Task C')) # Priority 3

print(f"Is the queue empty? {pq.empty()}")
# Expected Output: Is the queue empty? False

# Dequeue the highest-priority item
highest_priority_item = pq.get()
print(f"Dequeued item: {highest_priority_item}")
# Expected Output: Dequeued item: (1, 'Task A')

# Peek at the next item's priority (pq.queue is the underlying heap list;
# inspecting it is fine for illustration, but it is not part of the public API).
print(f"Next item's priority: {pq.queue[0][0]}")
# Expected Output: Next item's priority: 2

Java Example: This example uses PriorityQueue, which is Java's built-in heap-based implementation.

import java.util.PriorityQueue;

public class PriorityQueueExample {
    public static void main(String[] args) {
        // PriorityQueue is a min-priority queue by default.
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        pq.add(30); // Lower priority
        pq.add(10); // Higher priority
        pq.add(20);

        // Peek at the highest-priority element
        System.out.println("Highest-priority element: " + pq.peek());
        // Expected Output: Highest-priority element: 10

        // Dequeue the highest-priority element
        System.out.println("Dequeued: " + pq.poll());
        // Expected Output: Dequeued: 10

        System.out.println("Next highest-priority element: " + pq.peek());
        // Expected Output: Next highest-priority element: 20
    }
}
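Because the relative order of equal-priority elements is implementation-dependent, a common trick is to pair each entry with a monotonically increasing counter so that equal-priority items dequeue in insertion order. A minimal heapq-based sketch (the push helper is just for this illustration):

```python
import heapq
from itertools import count

entries = []           # a plain list used as the underlying heap
tie_breaker = count()  # strictly increasing: 0, 1, 2, ...

def push(priority, task):
    # (priority, counter, task): the counter decides ties and also keeps
    # tasks from ever being compared directly.
    heapq.heappush(entries, (priority, next(tie_breaker), task))

push(1, 'first urgent task')
push(2, 'routine task')
push(1, 'second urgent task')

while entries:
    priority, _, task = heapq.heappop(entries)
    print(priority, task)
# Expected Output:
# 1 first urgent task
# 1 second urgent task
# 2 routine task
```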
Common Patterns and Example Problems:
  • Top 'K' Elements: Use a priority queue of size K to keep track of the K largest or smallest elements seen so far in a collection. This is more memory-efficient than sorting the entire collection.
    • Kth Largest Element in an Array (#215)
    • Top K Frequent Elements (#347)
  • Best-First Pathfinding Algorithms: Priority queues are the core of algorithms like Dijkstra's (for shortest paths) and A* search. The priority queue always stores the next most promising node to visit, ensuring the optimal path is found efficiently.
    • Find the City With the Smallest Number of Neighbors at a Threshold Distance (#1334)
  • Event Simulation / Task Scheduling: Used in systems to manage events or tasks. The event with the earliest timestamp (highest priority) is always processed next.
    • Meeting Rooms II (#253)
  • Merging Sorted Streams: A priority queue is perfect for merging K sorted lists. You add the head of each list to the queue, and repeatedly extract the minimum, adding the next element from that same list back into the queue.
    • Merge k Sorted Lists (#23)
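The merging pattern above can be sketched with heapq. This simplified version merges plain Python lists rather than linked lists:

```python
import heapq

def merge_sorted_lists(lists):
    """Merge k sorted lists by always popping the globally smallest head."""
    heap = []
    for i, lst in enumerate(lists):
        if lst:
            # (value, list index, element index): the indices break ties
            # and record where the next candidate comes from.
            heap.append((lst[0], i, 0))
    heapq.heapify(heap)

    merged = []
    while heap:
        val, i, j = heapq.heappop(heap)
        merged.append(val)
        if j + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return merged

print(merge_sorted_lists([[1, 4, 7], [2, 5], [3, 6, 8]]))
# Expected Output: [1, 2, 3, 4, 5, 6, 7, 8]
```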

Heap

Definition: A Heap is a specialized tree-based data structure that satisfies the heap property. It is typically implemented as a complete binary tree, which allows it to be stored efficiently in an array. There are two main types:

  • Min-Heap: The value of each parent node is less than or equal to the values of its children. The minimum element is always at the root.
  • Max-Heap: The value of each parent node is greater than or equal to the values of its children. The maximum element is always at the root.

Heaps are the most common way to implement the Priority Queue abstract data type.

Time Complexity:
  • Insertion (push): O(log N)
  • Deletion of Min/Max (pop): O(log N)
  • Get Min/Max (peek): O(1)
  • Search (for an arbitrary element): O(N)

The O(log N) complexity for insertion and deletion is due to the "sift-up" and "sift-down" operations required to maintain the heap property in a balanced tree.
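To make the sift-down step concrete, here is a minimal sketch for a min-heap stored in a plain list (an illustration of the idea, not the library's implementation):

```python
def sift_down(heap, i):
    """Restore the min-heap property at index i, assuming both subtrees
    below i are already valid heaps."""
    n = len(heap)
    while True:
        smallest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return  # heap property holds; nothing left to fix
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest  # continue one level down

# The root (9) is out of place; sifting it down restores the heap.
h = [9, 1, 2, 3, 4]
sift_down(h, 0)
print(h)
# Expected Output: [1, 3, 2, 9, 4]
```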

Space Complexity: O(N) to store N elements.

Strengths:
  • Fast Access to Min/Max: Heaps provide O(1) access to the minimum or maximum element, which is their primary advantage.
  • Efficient Insertions and Deletions: O(log N) time for these operations is very efficient, making heaps ideal for problems where the collection of elements changes frequently.
  • Partial Ordering: A heap is a "partially ordered" structure. It does less work than a fully sorted array or BST, making it faster for problems where you only need priority-based access.
Weaknesses:
  • Slow Search: Searching for an element that is not the min/max requires checking every node, an O(N) operation. A BST is much better for general-purpose searching.
  • No Sorted Traversal: Iterating through a heap does not yield the elements in sorted order.

Python Example: This example uses the heapq module, which implements a min-heap.

import heapq

# The heapq module provides min-heap functionality.
min_heap = [5, 3, 7, 1]
heapq.heapify(min_heap) # Transform a list into a heap in-place

print(f"Min-heap: {min_heap}")
# Expected Output: Min-heap: [1, 3, 7, 5] (The structure, not sorted order)

# Push an element
heapq.heappush(min_heap, 0)
print(f"Min-heap after pushing 0: {min_heap}")
# Expected Output: Min-heap after pushing 0: [0, 1, 7, 5, 3]

# Pop the smallest element
smallest = heapq.heappop(min_heap)
print(f"Popped smallest element: {smallest}")
# Expected Output: Popped smallest element: 0

# Max-Heap using heapq (negate values approach)
# Since heapq only provides min-heap, we negate values to simulate max-heap
max_heap_values = [5, 3, 7, 1]
max_heap = [-x for x in max_heap_values]  # Negate all values
heapq.heapify(max_heap)

print(f"Max-heap (negated values): {max_heap}")
# Expected Output: Max-heap (negated values): [-7, -3, -5, -1]

# Push an element to max-heap
heapq.heappush(max_heap, -9)  # Push -9 to represent 9
print(f"Max-heap after pushing 9: {max_heap}")
# Expected Output: Max-heap after pushing 9: [-9, -7, -5, -1, -3]

# Pop the largest element from max-heap
largest = -heapq.heappop(max_heap)  # Negate to get original value
print(f"Popped largest element: {largest}")
# Expected Output: Popped largest element: 9

Java Example: This example uses PriorityQueue to create a min-heap and a max-heap.

import java.util.PriorityQueue;
import java.util.Collections;

public class HeapExample {
    public static void main(String[] args) {
        // PriorityQueue is a min-heap by default.
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        minHeap.add(5);
        minHeap.add(3);
        minHeap.add(7);
        minHeap.add(1);

        System.out.println("Min-heap peek: " + minHeap.peek());
        // Expected Output: Min-heap peek: 1

        System.out.println("Popped from min-heap: " + minHeap.poll());
        // Expected Output: Popped from min-heap: 1

        // To create a max-heap, provide a reverse order comparator.
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        maxHeap.add(5);
        maxHeap.add(3);
        maxHeap.add(7);
        maxHeap.add(1);

        System.out.println("Max-heap peek: " + maxHeap.peek());
        // Expected Output: Max-heap peek: 7

        System.out.println("Popped from max-heap: " + maxHeap.poll());
        // Expected Output: Popped from max-heap: 7
    }
}
Common Patterns and Example Problems:
  • Top 'K' Elements: The most common heap pattern. Use a heap of size K to find the Kth largest/smallest element, or the top K most frequent items. For the "Kth largest," you use a min-heap to easily discard elements smaller than the current K largest.
    • Kth Largest Element in an Array (#215)
    • Top K Frequent Elements (#347)
  • Finding the Median: A classic problem using two heaps: a max-heap to store the smaller half of the numbers and a min-heap to store the larger half. This keeps the median(s) at the top of the heaps, accessible in O(1) time.
    • Find Median from Data Stream (#295)
  • Merging and Pathfinding: Heaps are crucial for efficiently merging K sorted lists. They are also used in graph algorithms like Dijkstra's (shortest path) and Prim's (minimum spanning tree) to always select the next best node or edge to process.
    • Merge k Sorted Lists (#23)
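The Top-K pattern described above can be sketched by keeping a min-heap of size K, so its root is always the Kth largest element seen so far:

```python
import heapq

def kth_largest(nums, k):
    """Maintain a min-heap of the k largest elements; its root is the answer."""
    heap = []
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            # Discard the smallest of the k+1 candidates.
            heapq.heappop(heap)
    return heap[0]

print(kth_largest([3, 2, 1, 5, 6, 4], 2))
# Expected Output: 5
```

This uses O(N log K) time and O(K) extra space, versus O(N log N) for sorting the whole collection.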

Graph

Definition: A Graph is a non-linear data structure consisting of a finite set of vertices (or nodes) and a set of edges that connect these vertices. Unlike trees, graphs can have cycles, and there is no concept of a root node. They are used to model networks and relationships, such as social networks, computer networks, and maps.

Graph Types & Terminology

Graphs can be categorized in several ways:

  • Directed vs. Undirected: In an undirected graph, edges are two-way (an edge between A and B means you can go from A to B and B to A). In a directed graph (or digraph), edges have a direction (an edge from A to B does not imply an edge from B to A).
  • Weighted vs. Unweighted: In a weighted graph, each edge has a "cost" or "weight" associated with it. In an unweighted graph, all edges are considered equal.
  • Cyclic vs. Acyclic: A graph is cyclic if it contains a path that starts and ends at the same vertex. A Directed Acyclic Graph (DAG) is a special type of directed graph with no cycles, often used to model dependencies or tasks.
Graph Representation
  • Adjacency List: The most common representation in interviews. Each vertex has a list of all the vertices it is connected to. It is space-efficient for sparse graphs (few edges). Space complexity is O(V + E).
  • Adjacency Matrix: A V x V matrix where matrix[i][j] is 1 (or the edge weight) if there is an edge from vertex i to j. It's faster for checking if an edge exists between two specific vertices but uses more space. Space complexity is O(V^2).
Time Complexity (using Adjacency List):
  • Add Vertex: O(1)
  • Add Edge: O(1)
  • Search/Traversal (BFS, DFS): O(V + E) where V is the number of vertices and E is the number of edges.

Space Complexity (using Adjacency List): O(V + E)

Strengths:
  • Extreme Versatility: Graphs can model a vast array of real-world problems, including social networks, mapping and navigation, internet routing, and dependency chains.
  • Rich Set of Algorithms: A deep and well-understood set of algorithms exists for traversal, pathfinding, and analyzing graph structures.
Weaknesses:
  • Complexity: Graph algorithms can be more complex to understand and implement correctly compared to those for linear or tree-based data structures.
  • High Time Complexity: Many graph algorithms are computationally intensive. For dense graphs, complexities can approach O(V^2) or higher, which can be slow for large inputs.

Python Example: This example uses a dictionary to build an adjacency list and performs a DFS traversal.

from collections import defaultdict

class Graph:
    def __init__(self):
        # Use a defaultdict to easily handle new vertices
        self.adj_list = defaultdict(list)

    def add_edge(self, u, v):
        self.adj_list[u].append(v)

    def dfs(self, start_node, visited=None):
        if visited is None:
            visited = set()
        
        visited.add(start_node)
        print(start_node, end=" ")
        
        for neighbor in self.adj_list[start_node]:
            if neighbor not in visited:
                self.dfs(neighbor, visited)

# BFS example using the same graph (defined at module level, outside the class)
from collections import deque

def bfs(graph, start_node):
    visited = set()
    queue = deque([start_node])
    visited.add(start_node)
    result = []

    while queue:
        vertex = queue.popleft()
        result.append(vertex)

        for neighbor in graph.adj_list[vertex]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)

    return result

# Create a directed graph: A->B, A->C, B->D, B->E, C->F, C->G
#       A
#     /   \
#    B     C
#   / \   / \
#  D   E F   G
g = Graph()
g.add_edge('A', 'B')
g.add_edge('A', 'C')
g.add_edge('B', 'D')
g.add_edge('B', 'E')
g.add_edge('C', 'F')
g.add_edge('C', 'G')

print("DFS starting from vertex A:")
g.dfs('A')
# Expected Output: DFS starting from vertex A: A B D E C F G

print("\nBFS starting from vertex A:")
bfs_result = bfs(g, 'A')
print(' '.join(bfs_result))
# Expected Output: BFS starting from vertex A: A B C D E F G 

Java Example: This example uses a Map to build an adjacency list and performs DFS.

import java.util.*;

public class GraphExample {
    private Map<Character, List<Character>> adjList = new HashMap<>();

    public void addEdge(char u, char v) {
        // Ensure the vertex is in the map, then add the neighbor
        this.adjList.computeIfAbsent(u, k -> new ArrayList<>()).add(v);
    }

    public void dfs(char startNode) {
        Set<Character> visited = new HashSet<>();
        dfsRecursive(startNode, visited);
    }
    
    private void dfsRecursive(char v, Set<Character> visited) {
        visited.add(v);
        System.out.print(v + " ");

        // Get neighbors of the current vertex. If none, use an empty list.
        List<Character> neighbors = this.adjList.getOrDefault(v, Collections.emptyList());
        for (Character neighbor : neighbors) {
            if (!visited.contains(neighbor)) {
                dfsRecursive(neighbor, visited);
            }
        }
    }

    public void bfs(char startNode) {
        Set<Character> visited = new HashSet<>();
        Queue<Character> queue = new LinkedList<>();
        
        queue.add(startNode);
        visited.add(startNode);
        
        while (!queue.isEmpty()) {
            char vertex = queue.poll();
            System.out.print(vertex + " ");
            
            List<Character> neighbors = this.adjList.getOrDefault(vertex, Collections.emptyList());
            for (Character neighbor : neighbors) {
                if (!visited.contains(neighbor)) {
                    visited.add(neighbor);
                    queue.add(neighbor);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Create a directed graph: A->B, A->C, B->D, B->E, C->F, C->G
        //       A
        //     /   \
        //    B     C
        //   / \   / \
        //  D   E F   G
        GraphExample g = new GraphExample();
        g.addEdge('A', 'B');
        g.addEdge('A', 'C');
        g.addEdge('B', 'D');
        g.addEdge('B', 'E');
        g.addEdge('C', 'F');
        g.addEdge('C', 'G');
        
        System.out.println("DFS starting from vertex A:");
        g.dfs('A');
        // Expected Output: DFS starting from vertex A: A B D E C F G 
        
        // BFS example using the same graph
        System.out.println("\nBFS starting from vertex A:");
        g.bfs('A');
        // Expected Output: BFS starting from vertex A: A B C D E F G
    }
}
Common Patterns and Example Problems:
  • Graph Traversal (BFS & DFS): The foundation of nearly all graph problems. Used to determine connectivity, find paths, and detect cycles.

    • Depth-First Search (DFS): Explores as far as possible down each branch before backtracking. Often implemented with recursion or a stack.
    • Breadth-First Search (BFS): Explores all neighbors at the present depth before moving to the next level. Implemented with a queue. It's the standard way to find the shortest path in an unweighted graph.
    • Example Problems: Number of Islands (#200), Clone Graph (#133), Rotting Oranges (#994)
  • Topological Sort: For Directed Acyclic Graphs (DAGs), this provides a linear ordering of vertices such that for every directed edge from u to v, u comes before v. It's used for scheduling tasks with dependencies.

    • Example Problems: Course Schedule (#207), Course Schedule II (#210)
  • Shortest Path Algorithms: For finding the shortest path in weighted graphs.

    • Dijkstra's Algorithm: Finds the shortest path from a single source to all other vertices in a graph with non-negative edge weights. It uses a priority queue.
    • Example Problems: Network Delay Time (#743)
  • Minimum Spanning Tree (MST): For a weighted, undirected graph, an MST is a subset of the edges that connects all vertices together with the minimum possible total edge weight.

    • Example Problems: Min Cost to Connect All Points (#1584)
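As the pathfinding bullets note, Dijkstra's algorithm pairs a graph with a priority queue. A minimal sketch, assuming the graph is a dict mapping each vertex to a list of (neighbor, weight) pairs:

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source in a graph with non-negative edge weights.

    graph: dict mapping each vertex to a list of (neighbor, weight) pairs.
    """
    dist = {source: 0}
    pq = [(0, source)]  # (distance so far, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue  # stale entry; a shorter path was already found
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

graph = {
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 6)],
    'C': [('D', 3)],
}
print(dijkstra(graph, 'A'))
# Expected Output: {'A': 0, 'B': 1, 'C': 3, 'D': 6}
```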

Graph (Adjacency Matrix)

Definition: An Adjacency Matrix is an alternative representation for graphs that uses a 2D matrix to store edge information. For a graph with V vertices, it creates a V×V matrix where matrix[i][j] indicates whether there is an edge from vertex i to vertex j. This representation provides different trade-offs compared to adjacency lists.

Time Complexity:
  • Add Edge: O(1)
  • Remove Edge: O(1)
  • Check if Edge Exists: O(1)
  • Get All Neighbors: O(V)

Space Complexity: O(V²) regardless of the number of edges. This is the main trade-off: even a sparse graph uses the same amount of space as a dense one.

Strengths:
  • Fast Edge Queries: Checking if an edge exists between two vertices is O(1), which is faster than adjacency lists for dense graphs.
  • Simple Implementation: The matrix representation is straightforward and easy to understand.
  • Cache Locality: Good memory access patterns for certain algorithms that process edges systematically.
  • Algorithm Compatibility: Some algorithms (like Floyd-Warshall) are naturally implemented with matrices.
Weaknesses:
  • Space Inefficient for Sparse Graphs: Uses O(V²) space even when the graph has very few edges.
  • Slow Neighbor Iteration: Finding all neighbors of a vertex requires scanning an entire row, taking O(V) time.

Python Example: This example implements an adjacency matrix with the same graph structure as the adjacency list example.

class GraphMatrix:
    def __init__(self, vertices):
        self.V = len(vertices)
        self.vertex_map = {v: i for i, v in enumerate(vertices)}
        self.vertices = vertices
        # Initialize matrix with all 0s (no edges)
        self.matrix = [[0] * self.V for _ in range(self.V)]
    
    def add_edge(self, u, v):
        i, j = self.vertex_map[u], self.vertex_map[v]
        self.matrix[i][j] = 1  # For directed graph
        # For undirected graph, also add: self.matrix[j][i] = 1
    
    def has_edge(self, u, v):
        i, j = self.vertex_map[u], self.vertex_map[v]
        return self.matrix[i][j] == 1
    
    def get_neighbors(self, u):
        i = self.vertex_map[u]
        neighbors = []
        for j in range(self.V):
            if self.matrix[i][j] == 1:
                neighbors.append(self.vertices[j])
        return neighbors
    
    def print_matrix(self):
        print("   ", end="")
        for v in self.vertices:
            print(f"{v:3}", end="")
        print()
        for i, v in enumerate(self.vertices):
            print(f"{v}: ", end="")
            for j in range(self.V):
                print(f"{self.matrix[i][j]:3}", end="")
            print()

# Create the same graph: A->B, A->C, B->D, B->E, C->F, C->G
vertices = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
g_matrix = GraphMatrix(vertices)
g_matrix.add_edge('A', 'B')
g_matrix.add_edge('A', 'C')
g_matrix.add_edge('B', 'D')
g_matrix.add_edge('B', 'E')
g_matrix.add_edge('C', 'F')
g_matrix.add_edge('C', 'G')

print("Adjacency Matrix representation:")
g_matrix.print_matrix()
# Expected Output:
#      A  B  C  D  E  F  G
# A:   0  1  1  0  0  0  0
# B:   0  0  0  1  1  0  0
# C:   0  0  0  0  0  1  1
# D:   0  0  0  0  0  0  0
# E:   0  0  0  0  0  0  0
# F:   0  0  0  0  0  0  0
# G:   0  0  0  0  0  0  0

print(f"\nDoes edge A->B exist? {g_matrix.has_edge('A', 'B')}")
# Expected Output: Does edge A->B exist? True
print(f"Neighbors of A: {g_matrix.get_neighbors('A')}")
# Expected Output: Neighbors of A: ['B', 'C']

Java Example: This example implements an adjacency matrix using a 2D array.

import java.util.*;

public class GraphMatrixExample {
    
    static class GraphMatrix {
        private int[][] matrix;
        private Map<Character, Integer> vertexMap;
        private char[] vertices;
        private int size;
        
        public GraphMatrix() {
            this.vertices = new char[]{'A', 'B', 'C', 'D', 'E', 'F', 'G'};
            this.size = vertices.length;
            this.matrix = new int[size][size];
            this.vertexMap = new HashMap<>();
            
            // Initialize vertex mapping
            for (int i = 0; i < size; i++) {
                vertexMap.put(vertices[i], i);
            }
        }
        
        public void addEdge(char u, char v) {
            int i = vertexMap.get(u);
            int j = vertexMap.get(v);
            matrix[i][j] = 1; // For directed graph
            // For undirected graph, also add: matrix[j][i] = 1;
        }
        
        public boolean hasEdge(char u, char v) {
            int i = vertexMap.get(u);
            int j = vertexMap.get(v);
            return matrix[i][j] == 1;
        }
        
        public List<Character> getNeighbors(char u) {
            int i = vertexMap.get(u);
            List<Character> neighbors = new ArrayList<>();
            for (int j = 0; j < size; j++) {
                if (matrix[i][j] == 1) {
                    neighbors.add(vertices[j]);
                }
            }
            return neighbors;
        }
        
        public void printMatrix() {
            // Print header
            System.out.print("   ");
            for (char v : vertices) {
                System.out.printf("%3c", v);
            }
            System.out.println();
            
            // Print matrix rows
            for (int i = 0; i < size; i++) {
                System.out.print(vertices[i] + ": ");
                for (int j = 0; j < size; j++) {
                    System.out.printf("%3d", matrix[i][j]);
                }
                System.out.println();
            }
        }
    }

    public static void main(String[] args) {
        // Create the same graph: A->B, A->C, B->D, B->E, C->F, C->G
        GraphMatrix gMatrix = new GraphMatrix();
        gMatrix.addEdge('A', 'B');
        gMatrix.addEdge('A', 'C');
        gMatrix.addEdge('B', 'D');
        gMatrix.addEdge('B', 'E');
        gMatrix.addEdge('C', 'F');
        gMatrix.addEdge('C', 'G');
        
        System.out.println("Adjacency Matrix representation:");
        gMatrix.printMatrix();
        
        System.out.println("\nDoes edge A->B exist? " + gMatrix.hasEdge('A', 'B'));
        // Expected Output: Does edge A->B exist? true
        System.out.println("Neighbors of A: " + gMatrix.getNeighbors('A'));
        // Expected Output: Neighbors of A: [B, C]
    }
}
Common Use Cases:
  • Dense Graphs: When the number of edges approaches V², the space overhead becomes acceptable and the O(1) edge queries become valuable.
  • Floyd-Warshall Algorithm: All-pairs shortest path algorithms naturally work with matrix representation.
  • Graph Powers: Computing A², A³, etc. (matrix multiplication) to find paths of specific lengths.
  • Small Graphs: For graphs with a small number of vertices, the O(V²) space is manageable and the implementation simplicity is valuable.
Example Problems:
  • Number of Provinces (#547)
    • The input is an n x n matrix isConnected where isConnected[i][j] == 1 means city i and city j are connected. This is a direct adjacency matrix representation of an undirected graph. You can run a traversal (like DFS or BFS) directly on the matrix to find all connected components (provinces).
  • Find the City With the Smallest Number of Neighbors at a Threshold Distance (#1334)
    • In this Medium problem, n is at most 100. You need to find the shortest path from each city to every other city to see how many are within a distance threshold. This is a perfect scenario for building an adjacency matrix and running the Floyd-Warshall algorithm.
  • Min Cost to Connect All Points (#1584)
    • This problem is a perfect example of using an adjacency matrix to represent a graph. The graph is dense (all points are connected), and you need to find the minimum cost to connect all points.
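Floyd-Warshall, mentioned above, operates directly on a distance matrix. A minimal sketch, using float('inf') for missing edges:

```python
INF = float('inf')

def floyd_warshall(dist):
    """All-pairs shortest paths, computed in place on a V x V distance matrix."""
    n = len(dist)
    for k in range(n):          # allow paths that pass through vertex k
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Directed edges: 0 -> 1 (weight 5), 1 -> 2 (weight 2), 0 -> 2 (weight 9)
dist = [
    [0, 5, 9],
    [INF, 0, 2],
    [INF, INF, 0],
]
floyd_warshall(dist)
print(dist[0][2])  # 0 -> 1 -> 2 is cheaper than the direct edge
# Expected Output: 7
```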

Union-Find (Disjoint Set)

Definition: A Union-Find (or Disjoint Set) is a data structure that tracks a collection of elements partitioned into a number of non-overlapping (disjoint) subsets. It is specifically designed to be highly efficient at two operations:

  • find: Determine which subset an element belongs to by finding the "representative" or root of that set.
  • union: Merge two subsets into a single subset.
Core Concepts & Optimizations

A Union-Find data structure is typically implemented with an array and is made extremely efficient through two key optimizations:

  • Path Compression: During a find operation, every node on the path from the element to the root is made to point directly to the root. This dramatically flattens the structure for future find calls on any of those elements.
  • Union by Rank/Size: When merging two sets (represented as trees), the root of the shorter tree is attached to the root of the taller tree. This prevents the trees from becoming unnecessarily deep, keeping find operations fast. "Rank" is an upper bound on the tree's height (path compression may make the actual height smaller), while "Size" refers to the number of nodes.
Time Complexity:
  • Find: O(α(N)) amortized with both optimizations.
  • Union: O(α(N)) amortized with both optimizations.
  • Connected Check: O(α(N)) amortized.

The function α(N) is the inverse Ackermann function, which grows so slowly that it is less than 5 for any conceivable input size N. For all practical purposes, the time complexity can be considered nearly constant.

Space Complexity: O(N) to store the parent and rank/size arrays for N elements.

Strengths:
  • Extremely Fast: The near-constant time for find and union operations makes it one of the most efficient data structures for its specific purpose.
  • Simple to Implement: A basic Union-Find implementation is relatively straightforward, requiring only a parent array (plus an optional rank/size array).
Weaknesses:
  • Highly Specialized: It only answers questions about set membership and connectivity. It does not store the structure of the graph or the path between elements.
  • No "un-union": There is no easy way to undo a union operation.

Python Example: This example implements Union-Find with path compression and union by rank.

class UnionFind:
    def __init__(self, size):
        # parent[i] stores the parent of element i
        self.parent = list(range(size))
        # rank[i] is an upper bound on the height of the tree rooted at i
        self.rank = [0] * size

    def find(self, i):
        # Path Compression: If i is not its own parent,
        # recursively find the root and set i's parent to it.
        if self.parent[i] != i:
            self.parent[i] = self.find(self.parent[i])
        return self.parent[i]

    def union(self, i, j):
        # Union by Rank: Attach the smaller rank tree
        # under the root of the higher rank tree.
        root_i = self.find(i)
        root_j = self.find(j)
        
        if root_i != root_j:
            if self.rank[root_i] > self.rank[root_j]:
                self.parent[root_j] = root_i
            elif self.rank[root_i] < self.rank[root_j]:
                self.parent[root_i] = root_j
            else:
                self.parent[root_j] = root_i
                self.rank[root_i] += 1
                
    def connected(self, i, j):
        return self.find(i) == self.find(j)

# Example usage with 5 elements (0 to 4)
uf = UnionFind(5)
uf.union(0, 1) # {0, 1}, {2}, {3}, {4}
uf.union(1, 2) # {0, 1, 2}, {3}, {4}
uf.union(3, 4) # {0, 1, 2}, {3, 4}

print(f"Are 0 and 2 connected? {uf.connected(0, 2)}")
# Expected Output: Are 0 and 2 connected? True
print(f"Are 1 and 4 connected? {uf.connected(1, 4)}")
# Expected Output: Are 1 and 4 connected? False

Java Example: This example implements Union-Find with path compression and union by rank.

public class UnionFindExample {

    static class UnionFind {
        private int[] parent;
        private int[] rank;

        public UnionFind(int size) {
            parent = new int[size];
            rank = new int[size];
            for (int i = 0; i < size; i++) {
                parent[i] = i;
                rank[i] = 0;
            }
        }

        public int find(int i) {
            if (parent[i] != i) {
                parent[i] = find(parent[i]); // Path compression
            }
            return parent[i];
        }

        public void union(int i, int j) {
            int root_i = find(i);
            int root_j = find(j);

            if (root_i != root_j) {
                // Union by rank
                if (rank[root_i] > rank[root_j]) {
                    parent[root_j] = root_i;
                } else if (rank[root_i] < rank[root_j]) {
                    parent[root_i] = root_j;
                } else {
                    parent[root_j] = root_i;
                    rank[root_i]++;
                }
            }
        }
        
        public boolean connected(int i, int j) {
            return find(i) == find(j);
        }
    }

    public static void main(String[] args) {
        UnionFind uf = new UnionFind(5);
        uf.union(0, 1);
        uf.union(1, 2);
        uf.union(3, 4);

        System.out.println("Are 0 and 2 connected? " + uf.connected(0, 2));
        // Expected Output: Are 0 and 2 connected? true
        System.out.println("Are 1 and 4 connected? " + uf.connected(1, 4));
        // Expected Output: Are 1 and 4 connected? false
    }
}
Visual Example
Common Patterns and Example Problems:
  • Dynamic Connectivity / Number of Provinces: The canonical use case. Given a set of items, you process connections one by one and use Union-Find to efficiently determine how many disjoint groups or connected components exist.

    • Number of Connected Components in an Undirected Graph (#323)
    • Number of Provinces (#547)
  • Cycle Detection in Undirected Graphs: You can detect a cycle by iterating through the edges of a graph. For each edge (u, v), if u and v are already in the same set (find(u) == find(v)), adding this edge would form a cycle.

    • Redundant Connection (#684)
    • Graph Valid Tree (#261)
  • Kruskal's Algorithm for Minimum Spanning Tree: Union-Find is the core data structure used to implement Kruskal's algorithm. Edges are processed in increasing order of weight, and an edge is added to the MST only if it connects two previously disconnected components.

    • Min Cost to Connect All Points (#1584)
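The cycle-detection pattern for Redundant Connection (#684) can be sketched with a compact Union-Find. This version uses an iterative find with path halving instead of the recursive path compression shown above; the function name is illustrative:

```python
def find_redundant_connection(edges):
    # Nodes are 1-indexed in the problem, so size the array accordingly.
    parent = list(range(len(edges) + 1))

    def find(x):
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            # u and v are already in the same set, so this edge
            # would close a cycle: it is the redundant one.
            return [u, v]
        parent[root_u] = root_v
    return []
```

The same check (find(u) == find(v) before a union) is what Kruskal's algorithm uses to decide whether an edge may be added to the minimum spanning tree.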

Trie (Prefix Tree)

Definition: A Trie, also known as a Prefix Tree, is a specialized tree-like data structure designed for the efficient storage and retrieval of strings. Each node in the Trie represents a single character. A path from the root node to any other node represents a prefix. A special marker is used within a node to signify that the path to it forms a complete word.

Core Concepts
  • Trie Node: The fundamental building block of a Trie. A node must contain two key pieces of information:
    1. Children Pointers: A collection of pointers to its child nodes, one for each possible character. This is typically implemented as a hash map or a fixed-size array (e.g., an array of size 26 for lowercase English letters).
    2. End-of-Word Marker: A boolean flag that indicates if the path from the root to this node represents the end of a complete word that has been inserted.

Time Complexity: Let M be the length of the string (word or prefix) being operated on.

  • Insertion: O(M)
  • Search (for a full word): O(M)
  • Search (for a prefix): O(M)

The key advantage of a Trie is that its operation times depend only on the length of the string, not on the number of words already in the Trie.

Space Complexity: O(C * L), where C is the size of the character set (e.g., 26) and L is the sum of the lengths of all words in the Trie. In the worst case, if there are no shared prefixes, this can be large, but it's very efficient when many words share common prefixes.

Strengths:
  • Fast Prefix-Based Operations: Tries are faster than hash tables for prefix-based searches (e.g., startsWith). A hash table cannot efficiently find all keys with a given prefix.
  • Efficient Storage for Shared Prefixes: Words with common starting sequences (like "car", "care", and "cart") share the same prefix nodes, which can save a significant amount of space.
  • Natural Autocomplete Structure: The structure is a natural fit for building autocomplete and search suggestion systems.
Weaknesses:
  • High Memory Usage: Tries can consume a lot of memory, especially if the character set is large or if words do not share many prefixes. Each node might store many null pointers.
  • Specialized Use Case: It is highly specialized for string and prefix-based problems and is not a general-purpose data structure.

Python Example: This example implements a Trie from scratch with insert, search, and prefix search methods.

class TrieNode:
    def __init__(self):
        self.children = {} # Maps character to TrieNode
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True

    def search(self, word: str) -> bool:
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end_of_word

    def startsWith(self, prefix: str) -> bool:
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

# Example Usage
trie = Trie()
trie.insert("apple")
print(f"Search for 'apple': {trie.search('apple')}")      # Expected Output: Search for 'apple': True
print(f"Search for 'app': {trie.search('app')}")        # Expected Output: Search for 'app': False
print(f"Starts with 'app': {trie.startsWith('app')}")    # Expected Output: Starts with 'app': True
trie.insert("app")
print(f"Search for 'app' after insert: {trie.search('app')}") # Expected Output: Search for 'app' after insert: True

Java Example: This example implements a Trie from scratch with insert, search, and prefix search methods.

import java.util.HashMap;
import java.util.Map;

class TrieNode {
    Map<Character, TrieNode> children = new HashMap<>();
    boolean isEndOfWord = false;
}

class Trie {
    private TrieNode root;

    public Trie() {
        root = new TrieNode();
    }

    public void insert(String word) {
        TrieNode node = root;
        for (char ch : word.toCharArray()) {
            node.children.computeIfAbsent(ch, k -> new TrieNode());
            node = node.children.get(ch);
        }
        node.isEndOfWord = true;
    }

    public boolean search(String word) {
        TrieNode node = root;
        for (char ch : word.toCharArray()) {
            if (!node.children.containsKey(ch)) {
                return false;
            }
            node = node.children.get(ch);
        }
        return node.isEndOfWord;
    }

    public boolean startsWith(String prefix) {
        TrieNode node = root;
        for (char ch : prefix.toCharArray()) {
            if (!node.children.containsKey(ch)) {
                return false;
            }
            node = node.children.get(ch);
        }
        return true;
    }
}

public class TrieExample {
    public static void main(String[] args) {
        Trie trie = new Trie();
        trie.insert("apple");
        System.out.println("Search for 'apple': " + trie.search("apple"));
        // Expected Output: Search for 'apple': true
        System.out.println("Search for 'app': " + trie.search("app"));
        // Expected Output: Search for 'app': false
        System.out.println("Starts with 'app': " + trie.startsWith("app"));
        // Expected Output: Starts with 'app': true
        trie.insert("app");
        System.out.println("Search for 'app' after insert: " + trie.search("app"));
        // Expected Output: Search for 'app' after insert: true
    }
}
Visual Example
Common Patterns and Example Problems:
  • Dictionary Implementation: The canonical use case, requiring the implementation of insert, search, and startsWith methods.
    • Implement Trie (Prefix Tree) (#208)
  • Autocomplete and Prefix Matching: Problems that require finding words with a common prefix or suggesting replacements from a dictionary.
    • Replace Words (#648)
  • Wildcard Searches: A common variation where the search pattern can include wildcard characters (e.g., '.') that match any character. This typically requires modifying the search function to use recursion or backtracking.
    • Design Add and Search Words Data Structure (#211)
  • Grid-Based Word Searching: Using a Trie to efficiently search for a large list of dictionary words on a 2D grid of characters. The Trie helps to prune the search space of the grid traversal (DFS) early.
    • Word Search II (#212)
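The wildcard variation from #211 can be sketched by extending the search method with backtracking: on a '.' character, the search branches into every child. Method names follow the problem's API (addWord, search); the rest mirrors the Trie above:

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # Maps character to TrieNode
        self.is_end_of_word = False

class WordDictionary:
    def __init__(self):
        self.root = TrieNode()

    def addWord(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True

    def search(self, word: str) -> bool:
        # Backtracking search: '.' matches any single character,
        # so it must try every child at that position.
        def dfs(node, i):
            if i == len(word):
                return node.is_end_of_word
            char = word[i]
            if char == '.':
                return any(dfs(child, i + 1)
                           for child in node.children.values())
            if char not in node.children:
                return False
            return dfs(node.children[char], i + 1)
        return dfs(self.root, 0)
```

A literal search still runs in O(M), but a pattern of all wildcards can fan out to every node at each level, which is the price of the added flexibility.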

Binary Indexed Tree (Fenwick Tree)

TBD


Segment Tree

TBD


Balanced Binary Search Tree (AVL Tree, Red-Black Tree)

TBD


Remember: The goal isn't just to memorize implementations, but to understand the underlying principles and trade-offs that make each data structure suitable for specific problems.

For a guide to the core data structures, see Core Data Structures.

For a comprehensive guide to algorithms and algorithmic techniques, check out Core Algorithms and Patterns Part 1 and Core Algorithms and Patterns Part 2.