Skip to content

Latest commit

 

History

History
296 lines (187 loc) · 11.7 KB

8. graph_algorithms.md

File metadata and controls

296 lines (187 loc) · 11.7 KB

Graphs

A graph is a data structure designer to show relationships between objects, modelling relationships between elements.

A graph objects are referred to as nodes or vertexes with the lines connected them edges.

Edges can store data, often about the strength of a connection or distance within the graph space.

Why are they useful

Not only do graphs capture relationships, but they support inference, we can reason about them. We're trying to build computational models - build models in code that simulate a problem and allow us to test or predict what will happen.

The world is full of networks that are based on relationships that can be captured and represented by a graph.

  • Computer networks - moving content on the web
  • Transportation networks - moving people and things
  • Financial networks - moving money
  • Sewers, Water...

Sometimes you want to maximise flow across the network, at other times you want to find the fastest route.

Graphs capture interesting relationships.

Four common questions we can ask;

(1) Finding sequences - is there a path from A to B?

(2) Find the lest expensive path - what is the shortest path from A to Z?

  • Shortest path problem

(3) Partitioning the graph - nodes may not have some connection to all other nodes. Can I find and separate the graph into sets of connected elements?

  • Graph partition problem

(4) Min-cut/Max-flow - are there separate sets of connected clusters within my graph?

  • Min-cut/Max-flow problem

The first reported use of graph theory was in the early 1700s, called "the Bridges of Keonigsberg". There are seven bridges that connect various islands. The question posed was "is it possible to take a walk that traverses each of the seven bridges exactly once?"

Euler solved this problem

  • Make each island a node
  • Make each bridge an undirected edge

The answer is no

Properties

Directions

Edges can have a direction. This means that the relation between two nodes applies only one way (in maths speak, the relationship is not transitive).

When edges have a sense of direction the graphs is a directed graph.

For example a graph that represents travel plans. You must start at a node and traverse from node to node, there is a start and end node. If you do a reverse trip, you will need new edges to represent that direction.

An undirected graph has edges with no sense of direction. No sense of dependency.

Cycles

A graph can have cycles. A cycle is when you can follow the edges back to a node (loop back to where you started).

Note, a Tree graph cannot have a cycle. It is acyclic.

Connectivity

A connected graph has no disconnected vertices. A graph is said to be connected if every pair of vertices in the graph is connected.

In Graph Theory connectivity can be measured. The concept of connectivity is the minimum number of elements (nodes or edges) that need to be removed for the graph to become disconnected.

For example, say the image below represents two friendship groups. If you were to remove just one connection in the left group, the graph would becomes disconnected.

Graph representations

OOP

Graphs can be functionally the same but built and represented in a number of different ways.

Object orientated languages might use Vertex and Edge objects to represent a graph.

class Vertex(object):
    id: str
    name: str
    edges: List[Edge]

class Edge(object):
    id: str 
    strength: int
    nodes: List[Vertex]

From `MIT's Computational Thinking;

class Node(object):
    def __init__(self, name):
    """Assuming the name is a string"""
    self.name = name
    def getName(self):
        return self.name
    def __str__(self):
        return self.name

class Edge(object):
    def __init__(self, src, dest):
    """Assumes the arguments src and dest are nodes, not names, but instances of nodes"""
        self.src = src
        self.dest = dest
    def getSource(self):
        return self.src
    def getDestination(self):
        return self.dest
    def __str__(self):
    # printing the representation of an edge
    # name of source, arrow and then destination
    # this is why we need instances of the nodes, 
    # so we can call the method getName on each
    return self.src.getName() + '->'\
            + self.dest.getName()

However, this can make traversing through graphs inconvenient.

Edge List

An edge list is simply a list of edges.

The edges are represented as a list of two elements (typically numbers that represent vertices) that have an edge between them.

Adjacency List

In an adjacency list vertices have an ID number that corresponds to the index in an array. Node 0 is represented as adjacent_list[0], Node 1 is represented as adjacent_list[1]

Each element of the adjacent list is used to store a list of nodes that the node of that ID is adjacent to, i.e. can be reached with a single edge.

In the example below

  • index 0 shows that node 0 is adjacent to node 1
  • index 1 shows that node 1 is adjacent to nodes 0, 2, 3
  • index 2 shows that node 2 is adjacent to nodes 1, 3
  • index 3 shows that node 3 is adjacent to nodes 1, 2

Adjacency Matrix

A matrix is a 2D array (a list of lists), where the lists inside are all the same length.

  • The rows are all the sources,
  • The columns all the destinations,
  • If there is an edge between a source and a destination insert a 1, else 0

adj_matrix[0][1] == 1  # node 1 is adjacent to node 0

Every edge will appear twice in the matrix. However, if we're creating a directed graph, it's not symmetric. There may be a one between S and D but not between D and S.

It's not the most efficient way of representing things. You have to go into the matrix to look things up, and there could be a huge matrix with lots of zeros, essentially holding redundant information.

Some examples

A Tree is a type of Graph with certain properties. A tree is a special kind of directed graph - information flows from parents to children

A tree is an undirected graph G that satisfies any of the following equivalent conditions [ref]:

  • G is connected and acyclic (contains no cycles).
  • G is acyclic, and a simple cycle is formed if any edge is added to G.
  • G is connected, but would become disconnected if any single edge is removed from G.
  • G is connected and the 3-vertex complete graph K3 is not a minor of G.
  • Any two vertices in G can be connected by a unique simple path.

A DAG (directed acyclic graph) is another example (some Tree graphs are DAGs)

A directed graph or digraph is a graph in which edges have orientations.

It consists of vertices and edges, with each edge directed from one vertex to another, such that following those directions will never form a closed loop. A directed graph is a DAG if and only if it can be topologically ordered, by arranging the vertices as a linear ordering that is consistent with all edge directions.

Graph Traversal

There are two basic methods for traversing graphs. There are the basis for many more advanced algorithms.

Depth first - always follow the next available edge until I get stuck

Breadth first - always follow the exploring the next equal length option

(1) Depth First Search

Follow a path "down" the graph, before moving "across".

Similar to take the left most depth most method on a search tree, but because you're using graphs there are the ability for loops, therefore you need to keep track of what's in the path and don't go back to a node in the path.

  • Start with the source node
  • Look at all the edge that leave that node, in some order
  • Follow the first edge
  • Check to see if you're at the right location, if yes then done, if not then follow the first edge out of that node.
  • Keep doing that loop until you find the goal node or run out of options
  • When you hit a dead-end, work back one at a time until you hit a node where you could have taken another route.

DFS typically uses a stack (LIFO) to keep track of nodes visited.

  • If there are more edges and unseen nodes, keep repeating the step picking an edge, follow it and mark the node as seen
  • If you hit a node you've seen before, go back to the previous nodes and try another edge
  • If you run out of edges with new nodes, pop the current node from the stack and go back to the one before it
  • Follow this approach until you find the node you were looking for, or you've popped everything from the stack

Alternatively, you could use recursion, as opposed to a stack.

  • The base case is there are no more unseen nodes.
  • You move back to the last level of recursion, which will be the last seen node.

We are visiting every vortex once and every edge twice (as we have to go back). Runtime of DFS;

$$ O(|E| + |V|) $$

(2) Breadth First Search

Look at nodes adjacent, before moving to the next level.

  • Start with an initial node
  • Follow the first edge as before
  • If I'm not at the right place, I stay at the initial node at try one of the other options I could have taken.
  • Explore all paths of length one, before exploring paths of length two.
  • Once I find a solution I know I can stop, because any future path will be longer.

BFS typically uses a queue (FIFO) to keep track of nodes visited.

  • Start with a node and visit every node adjacent to it
  • Mark each node as seen and add them to a queue
  • When we run out of edges we de-queue a node from the queue, and use that as our next starting point
  • We do this until we've exhausted our options of new nodes

BFS is equivalent to creating a tree out of a graph.

  • The node we started with becomes the root
  • The group of adjacent nodes is the next level in the tree

The runtime is the same as we're visiting every edge and vertex during our traversal

$$ O(|E| + |V|) $$

The shortest path problem

The shortest path is the one where the sum of the edges is as small as possible. If the edge have no weights, then it is the path with the least number of edges that have to be traversed.

The solution of the problem varies greatly depending on the type of graph. For an unweighted graph, this is just breadth first search.

For weighted undirected graphs, we can use Dijkstra's algorithm.

  • The node we start with has a distance of 0
  • We place all other nodes in a min priority queue, each with a placeholder weighting of infinity

  • We remove our starting node from the queue
  • We follow each edge of our node and update the values in the queue

We run extract min on the queue to choose our next node to follow (this is a greedy algorithm as we are picking the option that looks best at the moment).

We repeat the process until the node we are looking for has been extracted from the queue or we have a path of infinity (i.e. the node doesn't exist).

The run time is the number of vertices squared, as the worst case is visiting each element once or twice and having to search through the queue to find the minimum element.

$$ O(|V|^2) $$

Note

Dijkstra's algorithm can solve the single-source shortest path only when the edges have non-negative weights. In other words, Dijkstra's algorithm can not work if:

  • There are edges with negative weights.
  • There are negative cycles (in directed graphs). A negative cycle is one that has negative weights. It can reduce the cost of the "shortest path" every time the cycle is traversed. However, the algorithm works fine with positive cycles.