Skip to content

Latest commit

 

History

History
267 lines (197 loc) · 12.2 KB

7. greedy_algorithms.md

File metadata and controls

267 lines (197 loc) · 12.2 KB

Greedy

A greedy algorithm choses the "best" solution first, then the next best, and continues until reaching the limit.

greedy technique, we follow the best possible solution at each step of the algorithm

The following Knapsack problem from MIT's "Introduction to computation and programming" provides a good illustration;

A burglar has a knapsack that can hold at most 20 pounds of loot. They break into a house and find the following items, how do they decide what to take and what to leave behind?

Value Weight Value/Weight
Clock 175 10 17.5
Painting 90 9 10
Radio 20 4 5
Vase 50 2 25
Book 10 1 10
Computer 200 20 10

The thief would have to decide what "best" should mean. Though greedy-by-density yields the best result in this case (start by taking the vase, then the clock, then painting, use value to break the tie when value/weight is equal) there is no guarantee that a greedy-by-density algorithm is always the best approach.

More generally, there is no guarantee that any solution to this kind of knapsack problem that is found by a greedy algorithm will be optimal - _there is probably some deep moral lesson to be extracted "greed is not good".

Because we are not considering future scenarios (and are only concerned with the best choice at each step), a greedy solution might not be the most effective solution for the problem.

In "algorithms to live by", the authors describe greedy algorithms as "myopic algorithms": one that short-sightedly takes the best thing available every step of the way.

Famous greedy algorithms

  1. Dijkstra's Shortest Path Algorithm
  2. A* search Algorithm
  3. Prim's algorithm for Minimal Spanning Tree
  4. Kruskal's algorithm for Minimal Spanning Tree
  5. Knapsack Problem
  6. Travelling Salesman Problem

Dijkstra's Shortest Path Algorithm

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a graph;

For a given source node in the graph, the algorithm finds the shortest path between that node and every other. It can also be used for finding the shortest paths from a single node to a single destination node by stopping the algorithm once the shortest path to the destination node has been determined.

Implementation

Find the shortest path to all the nodes starting from a given single source node.

  1. Create a result dictionary. At the end of the program, result will have the shortest distance (value) for all nodes (key) in the graph. For our example, it will become as {'A': 0, 'B': 5, 'C': 3, 'D': 2, 'F': 6, 'E': 4}

  2. Start with the source node. Distance from source to source itself is 0.

  3. The distance to all other nodes from the source is unknown initially, therefore set the initial distance to infinity.

  4. Create a set unvisited containing nodes that have not been visited. Initially, it will have all nodes of the graph.

  5. Create a path dictionary that keeps track of the previous node (value) that can lead to the current node (key). At the end of the program, for our example, it will become as {'B': 'A', 'C': 'D', 'D': 'A', 'F': 'C', 'E': 'C'}.

  6. As long as unvisited is non-empty, repeat the following:
  • Find the unvisited node having smallest known distance from the source node.

  • For the current node, find all the unvisited neighbours. For this, you have calculate the distance of each unvisited neighbour.

  • If the calculated distance of the unvisited neighbour is less than the already known distance in result dictionary, update the shortest distance in the result dictionary.

  • If there is an update in the result dictionary, you need to update the path dictionary as well for the same key.

  • Remove the current node from the unvisited set.

Note - This implementation of the Dijkstra's algorithm is not very efficient. Currently it has a O(n^2) time complexity. We will see a better version in the next lesson - "Graph Algorithms" with O(nlogn) time complexity.

import sys
from collections import defaultdict

class Graph:
    def __init__(self):
        self.nodes = set()                   # A set cannot contain duplicate nodes
        self.neighbours = defaultdict(list)  # Defaultdict is a child class of Dictionary that provides a default value for a key that does not exists.
        self.distances = {}                  # Dictionary. An example record as ('A', 'B'): 6 shows the distance between 'A' to 'B' is 6 units

    def add_node(self, value):
        self.nodes.add(value)

    def add_edge(self, from_node, to_node, distance):
        self.neighbours[from_node].append(to_node)
        self.neighbours[to_node].append(from_node)
        self.distances[(from_node, to_node)] = distance
        self.distances[(to_node, from_node)] = distance    # lets make the graph undirected / bidirectional 
        
    def print_graph(self):
        print("Set of Nodes are: ", self.nodes)
        print("Neighbours are: ", self.neighbours)
        print("Distances are: ", self.distances)


def dijkstra(graph, source):
    """Find the shortest path from the source node to every other node in the given graph"""

    # Declare and initialize result, unvisited, and path
    result = {i: sys.maxsize if i != source else 0 for i in graph.nodes}  # placeholder, by default set distance to maxsize

    path = dict()

    unvisited = set(graph.nodes)

    while unvisited:  # As long as unvisited is non-empty
        min_node = None
        # Find the unvisited node having smallest known distance from the source node.
        for node in unvisited:
            if min_node is None:  # base case
                min_node = node
            elif result[node] < result[min_node]:
                min_node = node  # switch the nodes, so start with source, then next lowest...

            """tried to be fancy"""
            # d = {i[0][1]: i[1] for i in graph.distances.items() if i[0][0] == node}
            # min_node = min(d, key=d.get)
            # result[min_node] = d[min_node]

        current_distance = result[min_node]

        # For the current node, find all the unvisited neighbours. 
        # For this, you have calculate the distance of each unvisited neighbour.

        # unvisited_neighbours =  unvisited.intersection(graph.neighbours[min_node])  does not work, might not be a path between nodes
        for neighbour in graph.neighbours[min_node]:
            if neighbour in unvisited:
                distance = current_distance + graph.distances[(min_node, neighbour)]

            # If the calculated distance of the unvisited neighbour is less than the already known distance in result dictionary,
            # update the shortest distance in the result dictionary.        
                if distance < result[neighbour]:
                    result[neighbour] = distance
                    path[neighbour] = min_node
                    
        # Remove the current node from the unvisited set.
        unvisited.remove(min_node)

    # should do an ASSERT to check no values in result dict equal sys.maxsize

    return result

In some fields, artificial intelligence in particular, Dijkstra's algorithm or a variant of it is known as uniform cost search and formulated as an instance of the more general idea of best-first search.

Minimum Platforms

Given arrival and departure times of trains on a single day in a railway platform,find out the minimum number of platforms required so that no train has to wait forthe other(s) to leave. In other words, when a train is about to arrive, at least oneplatform must be available to accommodate it.

You will be given arrival and departure times both in the form of a list. The size of both the lists will be equal, with each common index representing the same train. Note: Time hh:mm would be written as integer hhmm for e.g. 9:30 would be written as 930. Similarly, 13:45 would be given as 1345

Example:
Input: A schedule of 6 trains:

arrival = [900,  940, 950,  1100, 1500, 1800]
departure = [910, 1200, 1120, 1130, 1900, 2000]

Expected output: Minimum number of platforms required = 3

The greedy approach

Sort the schedule, and make sure when a train arrives or depart, keep track of the required number of platforms. We will have iterator i and j traversing the arrival and departure lists respectively. At any moment, the difference (i - j) will provide us the required number of platforms.

At the time of either arrival or departure of a train, if i^th arrival is scheduled before the j^th departure, increment the platform_required and i as well. Otherwise, decrement platform_required count, and increase j. Keep track of the max value of platform_required ever, as the expected result.

"""
Given arrival and departure times of trains on a single day in a railway platform,
find out the minimum number of platforms required
so that no train has to wait for the other(s) to leave.
In other words, when a train is about to arrive, at least one platform must be available to accommodate it.

You will be given arrival and departure times both in the form of a list. 
The size of both the lists will be equal, with each common index representing the same train. 
Note: Time `hh:mm` would be written as integer `hhmm` for e.g. `9:30` would be written as `930`.

Example
Input:  A schedule of 6 trains:
arrival = [900,  940, 950,  1100, 1500, 1800]
departure = [910, 1200, 1120, 1130, 1900, 2000]

Expected output: Minimum number of platforms required = 3
"""
from typing import List

def min_platforms(arrival: List[int], departure: List[int]) -> int:
    if(len(arrival) != len(departure)): # Mismatch in the length of the lists
        return -1

    # Sort both the lists.    
    arrival.sort()
    departure.sort()
    
    platform_required = 1              # Count of platforms required at the moment when comparing i^th arrival and j^th departure
    max_platform_required = 1          # Keep track of the max value of platform_required

    # Iterate such that (i-j) will represent platform_required at that moment
    i = 1
    j = 0

    # Traverse the arrival list with iterator `i`, and departure with iterator `j`.
    while i < len(arrival) and j < len(arrival):
        
        # if i^th arrival is scheduled before than j^th departure, 
        # increment platform_required and i as well.
        print(f"new arrival: {arrival[i]}, departure: {departure[j]}")
        if arrival[i] < departure[j]:
            platform_required += 1
            i += 1

            # Update the max value of platform_required
            if platform_required > max_platform_required:
                max_platform_required = platform_required
         
        # Otherwise, decrement platform_required count, and increase j.
        else:
            platform_required -= 1
            j += 1

    return max_platform_required

arrival1 = [200, 210, 300, 320, 350, 500]
departure1 = [230, 340, 320, 430, 400, 520]  
assert min_platforms(arrival1, departure1) == 2


arrival2 = [900,  940, 950,  1100, 1500, 1800]
departure2 = [910, 1200, 1120, 1130, 1900, 2000]
assert min_platforms(arrival2, departure2) == 3

Minimum Operations

Starting from the number 0, find the minimum number of operations required to reach a given positive target number. You can only use the following two operations:

  1. Add 1
  2. Double the number

For Target = 18, output = 6, because it takes at least 6 steps shown below to reach the target

  • start = 0
  • step 1 ==> 0 + 1 = 1
  • step 2 ==> 1 * 2 = 2 # or 1 + 1 = 2
  • step 3 ==> 2 * 2 = 4
  • step 4 ==> 4 * 2 = 8
  • step 5 ==> 8 + 1 = 9
  • step 6 ==> 9 * 2 = 18
def min_operations(target: int) -> int:
    """
    Return number of steps taken to reach a target number
    input: target number (as an integer)
    output: number of steps (as an integer)
    """
    num_steps = 0
    
    # start backwards from the target
    # if target is odd --> subtract 1
    # if target is even --> divide by 2
    while target != 0:
        if target % 2 == 0:
            target = target // 2

        else:
            target = target - 1
        num_steps += 1
    return num_steps