diff --git a/src/algorithm_practice/Datastructure_Algorithms/Graph/roadsAndLibraries/README.md b/src/algorithm_practice/Datastructure_Algorithms/Graph/roadsAndLibraries/README.md index b5caadfe..f2d95a5f 100644 --- a/src/algorithm_practice/Datastructure_Algorithms/Graph/roadsAndLibraries/README.md +++ b/src/algorithm_practice/Datastructure_Algorithms/Graph/roadsAndLibraries/README.md @@ -1,84 +1,92 @@ -Source: https://www.hackerrank.com/challenges/torque-and-development/ +# Roads and Libraries -TODO(domfarolino): revise this post. +Source: https://www.hackerrank.com/challenges/torque-and-development/ This is a pretty interesting graph problem. It vexed me for a bit until I made some cruicial realizations. # Divide the problem into connected components -When starting with this problem I fumbled around quite a bit. Eventually I came to some good realizations: +When starting with this problem I fumbled around quite a bit, but eventually I came to some realizations revolving +around focusing on each connected component in the given graph: - - We'll need at least one library per connected component - - In each component, there are two extremes: + - We'll need at least one library per connected component in the graph + - In each connected component, there are two extremes: - Every city in a connected component has a library - Only one city in a connected component has a library -My next thought was that the naive solution would be to find all possible combinations of library/road -allocations in between the extremes, which seems combinatorially explosive. For example, what if there -were not the extreme `n` libraries and `0` roads in a component, but instead `n - 1` libraries and `1` -road, or `n - 2` libraries and `2` roads. How many different ways can we -[*choose*](https://en.wikipedia.org/wiki/Binomial_coefficient) how to allocate which cities have libraries -and which cities to connect, and more importantly, does the choosing of these actually affect the cost? -Determining the number of possible choices we can make when allocating libraries to cities is actually pretty -easy (it's just the summation of binomial coefficients, [see here](https://math.stackexchange.com/questions/519832/)), -it would just be combinatorially explosive to go through each one; was it necessary? +My next thought was that the naïve solution would be to find all possible combinations of allocated libraries +and built roads between the aforementioned extremes, which seems combinatorially explosive. For example, what +if there were not the extreme `n` libraries and `0` roads in a component, but instead `n - 1` libraries and `1` +road, or maybe `n - 2` libraries and `2` roads. It is obvious to determine how many different ways we can +[*choose*](https://en.wikipedia.org/wiki/Binomial_coefficient) to allocate `n - k` libraries amongst `n` cities +in a component. But how many different ways can we decide what roads to build given a particular allocation? Do +the different choices of which roads to build for some given library allocation actually affect the cost? + +When looking at an example graph with five (once-) connected cities, I realized that the allocation of libraries +doesn't matter at all and won't affect the cost. (I had considered the idea that perhaps the degree of each city +might have an affect on, or indicate priority of library assignment). Which cities we choose to build libraries +in is irrelevant as long as we don't waste a road connecting two library-bearing cities when we could use it to +connect a non-library-bearing city to a library-bearing one. -When looking at an example graph with five (once-) connected cities I realized that the allocation of libraries -doesn't matter at all and won't affect the cost. (I was considering the idea that perhaps the degree of each city -might have an affect on, or indicate priority of library assignment). The allocation makes no difference as long -as we don't waste a road connecting two library-bearing cities, because why would we do that? +> Warning, entering a tangent: -[Enter a tangent]... +![entering a tagent](https://apps.azdot.gov/files/traffic/moas/thumbgif/w/w01-010.gif) The whole reason this accidentally-connecting-two-library-bearing-cities issue came up is because I was examining a -quite feasible 5-city graph with a cycle trying to allocate `3` libraries and `2` roads. I wondered if I could choose -a "bad" allocation of libraries and roads, namely one that doesn't actually connect each city in the component. This is -certainly possible in a graph with cycles when only dealing with `numberOfCities` resources (`3` libraries and `2` roads). +quite feasible 5-city graph with a cycle, trying to allocate `3` libraries and `2` roads. I wondered what would happen +if I chose to waste the building of a road on connecting two cities that both bore libraries, instead of building a road +from a city that was not connected to a library to a city in the same component that was. This "bad allocation" can happen +because of this cycle (effectively wasting a road on a group of cities that don't need another built). -I was then worried about making sure my implementation would not accidentally theoretically waste a road on two -library-bearing cities, and then I realized well yeah, if the allocation doesn't matter, we just have to know that -some working allocation exists, and that will be the minimum total cost for such choices of the number of libraries -and roads for that connected component. +I was wondering how I could ensure my implementation would not accidentally do this, but then I realized it wouldn't be +necessary. *Some* proper allocation of built roads exists, and as long as I know it exists, I don't have to worry about +accidentally choosing a "bad allocation", because I'll be using the same number of roads either way! And I know that by +using that number of roads, it *is* possible to connect all cities, therefore the exact layout is meaningless. # A connected component is at least a tree -The "choice" of which roads to build dissolves when you realize that the connected component by definition is at least a -tree, and thus always has valid allocations of libraries and roads in the form of: +The reason there is definitely *a* working allocation of roads to build is true is because we can ignore cycles that would +otherwise form a "bad allocation", because the connected component contains *at least* the roads necessary to connect all +cities without roads that form cycles. In other words, the connected component is at least a tree, so mathematically *some* +non-wasteful layout of roads exists. + +A group of connectable cities can therefore have all of its cities connected to libraries in `N` different ways, where `N` +is the number of cities in the group: `N - K` libraries + `K` roads, `∀ K < N` (remember, we need at least one library). -This means each connected component had `N` possible solutions, and for each of the values of `K`, we needed to choose the -minumum one. Going through some examples I realized the best answer always seemed to be one of the extreme allocations, namely -an allocation with all `N` libraries or only `1` library. I tried to find an example where one of the middleground less -extreme allocations could be more optimal, but I came to the conclusion that that will never be the case, because we greedily -want to choose to employ as many of the cheapest resource (either libraries or roads) as possible. In other words, if roads were -cheaper to build then libraries, and there exist the possible roads to repair to connect the entire component (the definition!), -then we'd want to only build `1` library, and as many remaining roads as we'd need. We could build two libraries, and one less -road, but that would give us the same connected result but with a higher cost, unnecessarily. +This means each connected component has `N` possible solutions, and for each of the values of `K` (ranging from `0` to `N - 1`), +we need to choose the most cost-efficient one. Going through some examples I realized the best answer always seemed to be one of +the extremes, namely a layout containing all `N` libraries and `0` roads, or only `1` library and `N - 1` roads. Trying to find +an example in which one of the middleground distributions was more optimal, I eventually came to the conclusino that this will never +be the case, because we greedily want to choose to employ as many of the cheapest resource (libraries or roads) as possible! + +In other words, if roads were cheaper to build then libraries, and there exist the possible roads to repair to connect the entire +component (the definition!), then we'd want to only build `1` library and as many remaining roads as we'd need. We could build two +libraries, and one less road, but that would give us the same connected result but with a higher cost, unnecessarily. # Implementation design When thinking about the implementation, I knew the number of connected components was relevant to this problem. I also knew -we could get an entire connected component (but more importantly its size) using a trivial-to-implement BFS algorithm. I figured +we could get an entire connected component (but more importantly, its size) using good ole BFS (DFS suffices too). I figured I'd use an adjecency list to store the graph, since I wasn't going to perform any operations that a matrix would be more suited for. The necessary steps were something like this: - Build the graph's adjacency list - - For each connected component + - For each connected component: - Get the size of the component - - Minimal cost of connecting this component was `min(a, b)` where: - - `a = numCities * costLib` - - `b = costLib + (numCities - 1) * costRoad` - - With the minimal cost of the component in hand, add the value to the running some, and perform the same operation for the next component. + - Compute the minimal cost of connecting this component (rebuilding existing roads), which is `min(a, b)` where: + - `a = componentSize * costLib` + - `b = costLib + (componentSize - 1) * costRoad` + - With the minimal cost of the component in hand, add the value to a running sum, and perform the same operation for the next component. -Moving from component-to-component is as easy as just using BFS with some sort of global visitation store. +Moving from component-to-component is as easy as just using BFS with some sort of global visitation data structure. We can try to find a connected component from each given city. The first time we run BFS, we'll mark *all* nodes in the discovered component as visited. Then in the next given city, we'll try to find another connected component *if* -the city has not already been visited (does not exist as a part of an already-discovered connected component). We keep -a running sum, adding to it the minimum cost required to connect a once-connected component, and eventually return the -final value. +the city has not already been visited (does not exist as a part of an already-discovered connected component). Eventually +we'll return the value of a running sum we've kept (as mentioned above). -Time complexity: O(n) (by marking nodes as visited, we're repeating ourselves) +Time complexity: O(n) Space complexity: O(n) *It should be noted that the complexity of this algorithm could easily by O(n^2) (due to edge processing in the complete