Skip to content

Data Structures

Avery Lieu edited this page Dec 10, 2016 · 19 revisions

Linked Lists

Definition: A linear collection of nodes in which each node links to the next node.


Singly Linked List

Each node has a single pointer to the next node.

When to Use:

  • Need constant time insertion or deletion
    • Only at head (front of list)
    • Inserting elements is easier in singly linked list that in doubly linked list
  • Unknown number of elements
  • No need for random access
  • Need linked list and memory is a concern
    • Requires less memory than doubly linked list

Time Complexity:

Average Worst
Access θ(n) O(n)
Search θ(n) O(n)
Insertion θ(1) O(1)
Deletion θ(1) O(1)

Doubly Linked List

Each node has two pointers, one to the previous node and one to the next node.

When to Use:

  • Need constant time insertion or deletion
    • Only at head or tail (front or back of list)
    • Doubly linked list deletion is faster that singly linked list deletion, except when deleting the head
  • Unknown number of elements
  • No need for random access

Time Complexity:

Average Worst
Access θ(n) O(n)
Search θ(n) O(n)
Insertion θ(1) O(1)
Deletion θ(1) O(1)


Stacks

Definition: An abstract data type that supports a last in, first out (LIFO) policy. A push method is used to add an element onto the stack and a pop method is used to remove the most recent element of the stack.

When to Use:

  • List reversal
  • Parsing
  • Expression conversion and evaluation
    • Balancing delimiters
  • Tower of Hanoi

Time Complexity:

Average Worst
Access θ(n) O(n)
Search θ(n) O(n)
Insertion θ(1) O(1)
Deletion θ(1) O(1)


Queues

Definition: Elements within this abstract data type are ordered by a specified principle. An enqueue method adds an element to the back of the queue, while a dequeue method removes an element from the front of the queue.


FIFO

The traditional queue policy is first in, first out (FIFO), just like a line for a service. Elements which are added to the queue more recently are accessible sooner than elements added to the queue later on.

When to Use:

  • Need to maintain order
  • Packet transmission
  • Providing a service in which clients are all valued equally

Time Complexity:

Average Worst
Access θ(n) O(n)
Search θ(n) O(n)
Insertion θ(1) O(1)
Deletion θ(1) O(1)

Deque

Elements can be inserted and removed from both the front and back of this type of queue.

When to Use:

  • Need access to old and new elements
  • Providing a service that is likely to require clients to return
    • A fast food cashier serves a customer their order but forgets to include ketchup packets
      • The customer is allowed to cut the line to get ketchup packets.

Time Complexity:

Average Worst
Access θ(n) O(n)
Search θ(n) O(n)
Insertion θ(1) O(1)
Deletion θ(1) O(1)

Priority Queue

Similar to a queue, except elements are assigned priorities, and these priorities determine the element's position in the queue. A priority queue is typically implemented with a min or max heap.

When to Use:

  • Event-driven scheduling
  • Data compression
  • Graphing algorithms (Dijkstra, Prim)
  • Bandwidth management

Time Complexity:

Average Worst
Access θ(1) O(1)
Search θ(n) O(n)
Insertion θ(nlog(n)) O(nlog(n))
Deletion θ(nlog(n)) O(nlog(n))


Trees

Definition: A hierarchical structure of parent-child relationships between nodes. Each parent will one level higher on the structure than its children, respectively. Trees are non-cyclic but may be linear.


BST

Ordered upon the principle that a node's left child must have a value less than the parent's value and its right child must have a value greater than the parent's value. It is common for repeated nodes to be disallowed; however, in the case of a BST which allows repeated nodes, either the left child condition or the right child condition can be modified to handle equal values.

When to Use:

  • Goal of average case of log(n) for all operations
  • Need memory efficient solution to storing data
  • Expression evaluation
  • Data compression

Time Complexity:

Average Worst
Access θ(log(n)) O(n)
Search θ(log(n)) O(n)
Insertion θ(log(n)) O(n)
Deletion θ(log(n)) O(n)

AVL

Self-balancing binary search tree in which the difference in height between left and right subtrees cannot exceed one for all nodes. If a node is inserted or deleted such that the height condition is violated, an AVL tree will reorder itself to meet its height conditions and standard BST conditions.

When to Use:

  • Lookup time is priority
    • Fastest of the trees in terms of lookup time
  • Not concerned with insertion and deletion times
    • Slow insertions and deletions due to strict height requirements
    • Requires rebalancing often

Time Complexity:

Average Worst
Access θ(log(n)) O(log(n))
Search θ(log(n)) O(log(n))
Insertion θ(log(n)) O(log(n))
Deletion θ(log(n)) O(log(n))

Red-Black

Self balancing binary search tree with each node being assigned a color, either red or black. It is similar to an AVL tree; however, balancing is not as rigid (the farthest leaf from the root can be up to twice as far as the nearest leaf from the root) and reorganizing the tree is done in a more efficient manner. The following are the requirements for a red-black tree:

  1. Nodes are either red or black
  2. Root node is black
  3. Leaves are black
  4. Red nodes have black children
  5. The number of black nodes between any node and its leaves is the same

When to Use:

  • Guaranteed O(log(n)) for all operations
  • Balance between fast look ups and efficient insertions and deletions

Time Complexity:

Average Worst
Access θ(log(n)) O(log(n))
Search θ(log(n)) O(log(n))
Insertion θ(log(n)) O(log(n))
Deletion θ(log(n)) O(log(n))


Hash Tables

Definition: An associative structure with mappings between keys and values. A hash function is used on a key to compute an index for a given value. Furthermore, once a hash table has been filled passed a predefined load factor, which is a ratio of the number of filled positions to the number of total positions, the hash table is resized. Ideally, hash tables have the following properties:

  • Size of hash table is prime at initialization and upon resizing
  • Values are evenly distributed
  • Hash function minimizes the number of collisions (keys hashing to same index)

In the event of a collision, either open addressing or separate chaining can be used to resolve the collision.


Open Addressing

Resolves collisions by probing alternative locations for availability. There are three main types of probing:

  1. Linear probing
  2. Quadratic probing
  3. Double hashing

In linear probing the interval of the probe is fixed at a constant value, so if the constant was set to 1 and a collision occurred at position i, then the probe would follow the sequence i+1, i+2, i+3, ..., i+n, until an open position was found.

In quadratic probing the interval of the probe is incremented linearly and then squared, so if the constant was set to 1 and a collision occurred at position i, then the probe would follow the sequence i+12, i+22, i+32, ..., i+n2, until an open position was found.

In double hashing the interval of the probe is fixed at a constant value and run through a second hash function. The resulting value is added to the initial index in which the collision occurred, so if the constant was set to 1 and there is a second hash function H2(x), then the probe would follow the sequence i+H2(1), i+H2(2), i+H2(3), ..., i+H2(n), until an open position was found.

When to Use:

  • Fast look ups
    • Typically faster than separate chaining
  • Small load factor
    • Outperforms separate chaining when load factor is small
  • Ordering is not relevant

Time Complexity:

Average Worst
Access N/A N/A
Search θ(1) O(n)
Insertion θ(1) O(n)
Deletion θ(1) O(n)

Separate Chaining

Each position in the hash table contains a bucket of some sort (linked list, binary trees). Whenever a collision occurs, the value is inserted into that bucket. If multiple values are contained in a bucket and the index for that bucket is called, then that data structures find method must be invoked. Typically, only one value should be associated with each bucket, and three values at upper limit; otherwise, the average case for lookup times will approach nearer to θ(n).

When to Use:

  • Fast look ups
  • Large load factor
    • Outperforms open addressing as load factor increases
  • Ordering is not relevant
  • Need simplicity
    • Easier to implement than open addressing hash tables

Time Complexity:

Average Worst
Access N/A N/A
Search θ(1) O(n)
Insertion θ(1) O(n)
Deletion θ(1) O(n)


Clone this wiki locally