In my last post I explained the usefulness of Hidden Markov Models for predicting the outcome of a padel match from only a few observations. There I also showed how easy it was to implement everything in Python, but I left out the most important part: the HMM itself. Today we are going to learn how to design an HMM to predict the result of a point. This is going to be an iterative process. The final model, as you will see, is a monstrosity. But step by step we are going to build it successfully. Rome wasn't built in a day.
Before diving into the details, let me explain how I tackled this problem. I think the creative process is worth mentioning; if you just want the details, you can skip this section. Before starting to code or thinking about a solution on my own, I always look for similar problems that are already solved, so that I can get some inspiration. In this case, it turned out that somebody had already designed an HMM for tennis. In this article, they present an HMM for following the state of a tennis match. The observations in that article were poorly defined, but the hidden states were very clear. I could get an idea of what I had to do by just looking at this image.
This graph simply depicts the rules of tennis visually. In our case we just have to capture the rules of padel with a similar graph. If you look carefully you will see that the white boxes represent hidden states with an obvious observation. For padel there will be more of them because, apart from bouncing on the ground, the ball can bounce on the walls, which makes the graph more complex; but the idea is the same. Another helpful diagram in the same article represented the same graph, but organised into several subgraphs.
What's interesting here is that those four subgraphs will be the same for padel. The high-level view of the process (top) is exactly the same. That part of the job is already done. We just have to change the internal representation of those four subgraphs and maintain the interconnections.
What about designing the graphs? What software did I use for that, you may ask? Well, it was actually a pen and a lot of paper. No matter how well developed graph visualization software is, drawing a simple graph by hand will always be faster than coding it. Of course, when the project keeps growing and the graphs become massive, you will need the software tools I mention later. But at the beginning, just take a pen and start drawing. In the following sections I will present diagrams made with a computer because they are visually more pleasant and easier to understand. Nevertheless, here is one of the graphs I drew by hand, in case you are curious.
For an HMM we need the transition and emission matrices. The transition matrix is going to be the adjacency matrix of the transition graph. That graph is simply a representation of the rules of the game. I am going to distinguish two main parts in those rules: the rules for the serve and the rules for the normal game. The reason for creating two distinct graphs is that the effect of the ball going out or touching the net is different at the beginning of the point.
What happens when a player serves? The ball can go in, it can go out, or it can touch the net. And if it touches the net, it can then go in or out. Let's ignore the case where it goes in without touching the net for the moment. How would you represent the 1st serve? Like this?
Did you think about the init state? Remember, this model is a realistic one. In practice you don't know when the point is starting. Therefore, you need a special state for waiting until you have enough evidence that the point has begun. If you look closely, you will see that there is a self-loop on the init state. That is how we represent a waiting state in an HMM: with a self-loop. Now, let's move on to the 2nd serve. How would you design it?
Exactly the same as the first serve. The simpler, the better. We are ignoring the case where the ball goes in and the game continues; we are just focusing on when the ball goes out or into the net. The rest of the details will be added later on. For now let's just focus on what we have: a graph with several nodes, each representing a state. What do we need? Consistent labels across the whole graph. One of the limitations of an HMM is that you have to fulfill the Markov property, which means that the state representing going out after the 1st serve is different from the state of going out after the 2nd serve. So they need different names. In my case, I just added a suffix number when that happened. That way, going out on the first serve is 'out1' and on the second is 'out2'. Another state that repeats a lot across the graph is the 'in' state. For that one, adding a suffix number is not enough, so I added a more descriptive suffix. For instance, going in after touching the net on the first serve is 'in-net1'. This decision could have been made in many ways, but this is how I chose to make it.
Okay, let's now talk about what happens when the ball actually goes in and the game continues. To keep it simple, let's focus on what happens before any player hits the ball, and let's call this the ace model. As the name states, one of the things that can happen is an ace. What characterizes an ace? The fact that the ball touches the ground again before any player hits it. Try to draw the scheme for the ace model. Keep in mind that before touching the ground again, the ball can hit the walls. And also bear in mind that there are two types of walls. What are the connections among those states? Which combinations are valid and which are not? Here is my solution to that problem, omitting the connections to the states Point-server and Point-receiver that represent the end of the point.
Did you think of the 'time-out' state? Again, this is a realistic model, so it has to deal with real problems. One of them is missing the observation that characterizes the end of the point. If that happens, you can only know the point has finished by time. That's why you need a state to represent the end of the point by time. Later on, when defining the emissions, it will become clearer why this state is needed. Most of the extra states are created so that when an observation is wrong, there is still a path in the graph to the end. Otherwise, the model would fail and return nothing. Observe also that there are two 'in' states. Can you imagine why? The state 'in1' is when the ball goes in on the first serve, and 'in2' on the second. Since we have to maintain the Markov property, those two states are different even though they have the same emissions and are identical to us.
Before going to the next block, which is when a player hits the ball and the rally actually starts, let me explain how I made these pictures and how you can code graphs in Python. With the library networkx you can do many things with graphs; it implements almost every graph algorithm that exists. In this project we only use it to define the graphs. The syntax is pretty straightforward: you define a DiGraph, which stands for directed graph, and add nodes and edges with the functions add_nodes_from and add_edges_from. After that, you can save the model in gml format and open it with Gephi. That's it. Here is the code for the three models presented above.
import networkx as nx

folder_path = 'graphs/'

""" First serve model """
first = nx.DiGraph()
first.add_nodes_from([
    ("init", {"hidden": True}),
    ("1st-serve", {"hidden": True}),
    ("net1", {"hidden": True}),
    ("out1", {"hidden": True}),
    ("in-net1", {"hidden": True}),
])
first.add_edges_from([
    ("init", "1st-serve"), ("init", "init"),
    ("1st-serve", "net1"), ("1st-serve", "out1"),
    ("net1", "in-net1"), ("net1", "out1"),
    ("in-net1", "1st-serve")
])
nx.write_gml(first, folder_path + 'first.gml')

""" Second serve model """
second = nx.DiGraph()
second.add_nodes_from([
    ("2nd-serve", {"hidden": True}),
    ("net2", {"hidden": True}),
    ("out2", {"hidden": True}),
    ("in-net2", {"hidden": True}),
])
second.add_edges_from([
    ("2nd-serve", "net2"), ("2nd-serve", "out2"),
    ("net2", "in-net2"), ("net2", "out2"),
    ("in-net2", "2nd-serve")
])
nx.write_gml(second, folder_path + 'second.gml')

""" Ace model """
ace = nx.DiGraph()
ace.add_nodes_from([
    ("in1", {"hidden": True}),
    ("in2", {"hidden": True}),
    ("time-out", {"hidden": True}),
    ("ground", {"hidden": True}),
    ("wall-outer1", {"hidden": True}),
    ("wall-outer2", {"hidden": True}),
    ("wall-inner1", {"hidden": True}),
    ("wall-inner2", {"hidden": True}),
])
ace.add_edges_from([
    ("in1", "time-out"), ("in1", "ground"), ("in1", "wall-outer1"), ("in1", "wall-inner1"),
    ("in2", "time-out"), ("in2", "ground"), ("in2", "wall-outer1"), ("in2", "wall-inner2"),
    ("wall-outer1", "time-out"), ("wall-outer1", "ground"), ("wall-outer1", "wall-outer2"),
    ("wall-outer2", "time-out"), ("wall-outer2", "ground"),
])
nx.write_gml(ace, folder_path + "ace.gml")
Gephi has some handy features that make it possible to visualize big graphs. Specifically, you can use the Force Atlas layout to rearrange the nodes by simulating forces proportional to the number of edges each one has. It has many parameters you can try, but I normally just click on run and wait a few seconds for convergence.
Then, in the Preview tab, you can create the diagrams I showed you. It has many options, like curved edges. I don't use that feature in this post because for complex graphs it can get messy, but for some graphs I think they look prettier with curvy edges. Other things to adjust are the font size and the size of the arrows. With a bit of practice you can create nice figures quite fast.
The rally model is a bit more complex than the ones presented above. There are two ways to design it, depending on which observations you have. If you only have an observation for bouncing on the ground, anywhere, then the rally model only has one 'HIT' state and the rest is similar to the ace model. However, in practice you can have more information than that. Suppose you have an image of the game and you know the location of the ball, together with the fact that it is a bounce on the ground. You could, potentially, distinguish whether the ball is on the server's side or on the receiver's side. You just need to segment the court and see if the ball is on the upper side or not. This is not trivial, but it is possible to achieve. For that reason, we are going to have two 'HIT' states: one for the server side and one for the receiver side. The rest is identical to the ace model, with one exception. Instead of ending the point or returning to the beginning, the states point to the other 'HIT' state, similar to what happens when you watch a match: your head goes from one side to the other, and here the hidden state moves from one model to the other. Finally, the diagram.
If you understood the ace model, you just need to focus on the arrows that cross from left to right and vice versa. The rest is just the standard mechanics of a padel game. One more thing to notice is that several arrows are missing, specifically the arrows between the models and the arrows that point to the absorbing states, that is, those which end the point. In the next section, I will describe the connections between the models and show you the (almost) full picture of the HMM transition graph. Also, below is the code for this model.
""" Rally model """
+rally = nx.DiGraph()
+rally.add_nodes_from([
+ ("HIT1", {"hidden": True}),
+ ("net-HIT1", {"hidden": True}),
+ ("in-HIT1", {"hidden": True}),
+ ("out-HIT1", {"hidden": True}),
+ ("time-out-HIT1", {"hidden": True}),
+ ("ground-HIT1", {"hidden": True}),
+ ("wall-inner-HIT1", {"hidden": True}),
+ ("wall-outer1-HIT1", {"hidden": True}),
+ ("wall-outer2-HIT1", {"hidden": True}),
+
+ ("HIT2", {"hidden": True}),
+ ("net-HIT2", {"hidden": True}),
+ ("in-HIT2", {"hidden": True}),
+ ("out-HIT2", {"hidden": True}),
+ ("time-out-HIT2", {"hidden": True}),
+ ("ground-HIT2", {"hidden": True}),
+ ("wall-inner-HIT2", {"hidden": True}),
+ ("wall-outer1-HIT2", {"hidden": True}),
+ ("wall-outer2-HIT2", {"hidden": True}),
+])
+rally.add_edges_from([
+ ("HIT1", "out-HIT1"), ("HIT1", "in-HIT1"), ("HIT1", "net-HIT1"),
+ ("in-HIT1", "HIT2"), ("wall-inner-HIT1", "HIT2"), ("wall-outer1-HIT1", "HIT2"), ("wall-outer2-HIT1", "HIT2"),
+ ("net-HIT1", "in-HIT1"), ("net-HIT1", "out-HIT1"),
+ ("in-HIT1", "time-out-HIT1"), ("in-HIT1", "ground-HIT1"), ("in-HIT1", "wall-inner-HIT1"), ("in-HIT1", "wall-outer1-HIT1"),
+ ("wall-inner-HIT1", "time-out-HIT1"), ("wall-inner-HIT1", "ground-HIT1"), ("wall-inner-HIT1", "wall-outer2-HIT1"),
+ ("wall-outer1-HIT1", "time-out-HIT1"), ("wall-outer1-HIT1", "ground-HIT1"), ("wall-outer1-HIT1", "wall-outer2-HIT1"),
+ ("wall-outer2-HIT1", "time-out-HIT1"), ("wall-outer2-HIT1", "ground-HIT1"),
+
+ ("HIT1", "HIT2"), ("HIT2", "HIT1"),
+
+ ("HIT2", "out-HIT2"), ("HIT2", "in-HIT2"), ("HIT2", "net-HIT2"),
+ ("in-HIT2", "HIT1"), ("wall-inner-HIT2", "HIT1"), ("wall-outer1-HIT2", "HIT1"), ("wall-outer2-HIT2", "HIT1"),
+ ("net-HIT2", "in-HIT2"), ("net-HIT2", "out-HIT2"),
+ ("in-HIT2", "time-out-HIT2"), ("in-HIT2", "ground-HIT2"), ("in-HIT2", "wall-inner-HIT2"), ("in-HIT2", "wall-outer1-HIT2"),
+ ("wall-inner-HIT2", "time-out-HIT2"), ("wall-inner-HIT2", "ground-HIT2"), ("wall-inner-HIT2", "wall-outer2-HIT2"),
+ ("wall-outer1-HIT2", "time-out-HIT2"), ("wall-outer1-HIT2", "ground-HIT2"), ("wall-outer1-HIT2", "wall-outer2-HIT2"),
+ ("wall-outer2-HIT2", "time-out-HIT2"), ("wall-outer2-HIT2", "ground-HIT2")
+
+])
+nx.write_gml(rally, folder_path + "rally.gml")
+
Let's go model by model, edge by edge, starting with the 1st serve. It has two outgoing connections: one to the 'in1' state of the ace model and one to the 2nd serve. The latter comes from the 'out1' state; if the ball goes out, you get a second serve, those are the rules. The 2nd-serve model is similar: one connection to the 'in2' state and one to the absorbing state 'Point-receiver'. If you fail your second serve, you lose the point. Simple, concise and clear. Let's continue with the ace model. The 'ground' and 'time-out' states point to 'Point-server'. This is the representation of an ace: if the ball hits the ground twice, it's an ace and the point goes to the server side. The rest of the connections go to the 'HIT1' state, which means that the receiver has hit the ball and the game continues. And that's it. The only remaining connections are between the rally model and the absorbing states, just like in the ace model. If the ball bounces twice, or if it goes out of the court after bouncing in, the last player to hit wins. If it goes out without bouncing in, the last player to hit loses. For those of you who know how to play padel, this should be no surprise. There are many cases to deal with because the ball can hit the walls and the net, but more or less it is summarized like that. Here's the full picture.
The code for this part is different. This time we have to join the four models. To do so we are going to use the function union_all, which creates a new graph with all the nodes and edges from before but without any connections between the subgraphs. To add those connections we use the add_edge function, and for the absorbing states, add_nodes_from. This is the result.
""" Graph union plus connection edges """
+union = nx.union_all((first, second, ace, rally))
+union.add_edge("out1", "2nd-serve")
+union.add_edge("1st-serve", "in1")
+union.add_edge("2nd-serve", "in2")
+union.add_edge("wall-inner1", "2nd-serve")
+
+""" Absorbing states """
+union.add_nodes_from([
+ ("Point-receiver", {"hidden": True}),
+ ("Point-server", {"hidden": True})
+])
+union.add_edge("wall-inner2", "Point-receiver")
+union.add_edge("out2", "Point-receiver")
+union.add_edge("time-out", "Point-server")
+union.add_edge("ground", "Point-server")
+union.add_edge("out-HIT1", "Point-server")
+union.add_edge("time-out-HIT1", "Point-receiver")
+union.add_edge("ground-HIT1", "Point-receiver")
+union.add_edge("out-HIT2", "Point-receiver")
+union.add_edge("time-out-HIT2", "Point-server")
+union.add_edge("ground-HIT2", "Point-server")
+
+""" Connection between ace and rally """
+union.add_edge("in1", "HIT1")
+union.add_edge("in2", "HIT1")
+union.add_edge("wall-outer1", "HIT1")
+union.add_edge("wall-outer2", "HIT1")
+nx.write_gml(union, folder_path + "union.gml")
+
Previously I said this was almost the full picture. The reason is that the model so far does not take into account that the ball is flying between bounces. In an ideal world this would be irrelevant: with just the bounces we could predict the outcome of the point. But in practice that is not true. Do you remember the reason for the 'time-out' state? This is similar. Imagine you detect the same bounce twice by error. If you only look at bounces, you will assume the point has finished. However, if you detect the same bounce twice, the two detections will be very close in time, in contrast to what happens when the two bounces are both real. Therefore, if you somehow take the time between observations into account, you can fix those kinds of errors. The way to do that is by adding 'flying' states. Between any two connected states you include a 'flying' state, and in the emissions you consider 'flying' a plausible observation. This gives you a way of measuring time: the more 'flying' observations you have, the more time has passed between states. The key here is that the 'flying' state has a self-loop, similar to the 'init' state. You don't know how much time will pass between states, so you add a self-loop to stay in that state until there is enough evidence that you are not flying anymore.
For this part there is no diagram. As you may have guessed, adding states for every edge makes the model huge and very complicated to deal with. At this step, I mostly work with the code; adding all the edges by hand would be a nightmare, so I let Python do that for me. Before showing you the code, there is one more design decision to explain. I said that the reason for the 'flying' state is to correct duplicate observations. But to do that, more than one 'flying' state per edge is needed. Why? Because the first 'flying' states are going to be corrective states without a self-loop; only the last one is a waiting state. The corrective states can emit the same observations as the state they follow, while the waiting state can only emit the 'flying' observation. The reason for this is mostly empirical: using only one 'flying' state yielded unsatisfactory results. In my experiments I ended up using five 'flying' states: four corrective ones and one waiting state. Finally, the code for that.
""" Add flying states """
+current_edges = list(union.edges)
+fly_err_len = 5
+for (u,v) in current_edges:
+ if u == "init":
+ continue
+ union.add_nodes_from([("flying-"+u+'-'+str(k), {"hidden": True}) for k in range(fly_err_len+1)])
+ union.remove_edge(u, v)
+ union.add_edge(u, "flying-"+u+"-0")
+ union.add_edge(u, "flying-"+u+"-"+str(fly_err_len))
+ for k in range(fly_err_len):
+ union.add_edge("flying-"+u+'-'+str(k), "flying-"+u+'-'+str(k+1))
+ union.add_edge("flying-"+u+'-'+str(k), "flying-"+u+'-'+str(fly_err_len))
+ union.add_edge("flying-"+u+'-'+str(fly_err_len), "flying-"+u+'-'+str(fly_err_len))
+ union.add_edge("flying-"+u+'-'+str(fly_err_len), v)
+
The last part of the code converts the 'Point-server' and 'Point-receiver' states into waiting states. The reason for this is numerical. When creating the transition and emission matrices the values need to be normalized, and if you don't add these connections you get rows of all zeros that cause errors; it is easier to solve the problem like this. These edges are added after creating the 'flying' states because they don't need any 'flying' states attached to them; they are simply a trick for the computation.
""" Self-loops for absorbing states """
+union.add_edge("Point-receiver", "Point-receiver")
+union.add_edge("Point-server", "Point-server")
+
Now let's move on to the emissions. First, we define the observations.
""" Possible observations """
+obs = nx.DiGraph()
+obs.add_nodes_from([
+ ("player-hit", {"hidden": False}),
+ ("bounce-ground-receiver", {"hidden": False}),
+ ("bounce-ground-server", {"hidden": False}),
+ ("bounce-net", {"hidden": False}),
+ ("bounce-wall-inner", {"hidden": False}),
+ ("bounce-wall-outer", {"hidden": False}),
+ ("flying", {"hidden": False}),
+ ("end", {"hidden": False})
+])
+
This is all we can observe, at least automatically with a camera. We can observe the ball bouncing anywhere: on the walls, on the net, or on the ground. And we can observe any player hitting the ball. Notice that we don't have to distinguish which player hits the ball; that job is done by distinguishing where the ball bounces. The reason is that in practice it is quite difficult to detect when a player hits the ball, and which player it is, due to the camera projection. A single observation representing any player's hit is enough to solve the problem. Keep in mind that this is for real cases, and the model has to reflect the limitations of the detections. There is one special observation called 'end'. The HMM presented here can only deal with isolated points, and the 'end' observation, emitted only by the absorbing states, is a way of forcing the HMM to reach a conclusion. When dealing with more than one point, you need to detect when the point has finished.
As with the 'flying' states, I didn't add all the emissions by hand. There are a lot of nodes in the transition graph, and the emission graph is a bipartite graph with hidden states on one side and observations on the other. That is a lot of edges. But in the end, it is no more than a string-matching problem. The states have descriptive names: any state with 'ground' in its name is going to emit either 'bounce-ground-receiver' or 'bounce-ground-server', and the 'flying' states emit the 'flying' observation. There are no fancy ideas here, just tedious work. I leave the code here for you. There are better ways to code this for sure, but this works and it's mine, so I like it.
There are two more things to mention: the flying probability and the missing probability. As I said before, there are corrective states. Those corrective states can emit the same observation as the state they are attached to, but with a smaller probability. If the probability weren't lower, we would not be taking into account the fact that the further apart two observations are, the more likely they are both correct. Thus, the corrective 'flying' states have some probability of emitting 'flying' and some probability of emitting other things. The missing probability is similar, but for the other states. If instead of a duplicate you miss an observation, the graph still needs to find a path to the end. For that reason every state has some small probability of emitting 'flying'. There are many other little changes that were added in the process of creating this matrix. The justification behind most of the strange things you will see in the code is empirical: you start with a simple model, find a case where it doesn't work, change the model, and repeat. After several iterations you arrive at this.
""" Create bipartite graph representing emissions """
+G = union.copy()
+G.clear_edges()
+U = nx.union(G, obs)
+flying_prob = 0.9
+err_miss_prob = 0.001
+eps = 1e-12
+for u in G.nodes():
+ if "Point" in u:
+ #U.add_weighted_edges_from([(u, "flying", 1)])
+ U.add_weighted_edges_from([(u, "end", 1)])
+ continue
+ if "flying" in u and "init" not in u:
+ U.add_weighted_edges_from([(u, "flying", flying_prob)])
+ U.add_weighted_edges_from([(u, "bounce-ground-server", eps)])
+ U.add_weighted_edges_from([(u, "bounce-ground-receiver", eps)])
+ if fly_err_len > 0 and fly_err_len == int(u.split('-')[-1]):
+ continue
+ elif u not in ["net1", "net2", "wall-inner1", "wall-inner2"]:
+ U.add_weighted_edges_from([(u, "flying", err_miss_prob)])
+
+ if "init" in u:
+ U.add_weighted_edges_from([(u, "bounce-ground-receiver", (1-flying_prob)/5)])
+ U.add_weighted_edges_from([(u, "bounce-ground-server", (1-flying_prob)/5)])
+ U.add_weighted_edges_from([(u, "bounce-wall-inner", (1-flying_prob)/5)])
+ U.add_weighted_edges_from([(u, "bounce-wall-outer", (1-flying_prob)/5)])
+ U.add_weighted_edges_from([(u, "bounce-net", (1-flying_prob)/5)])
+ # U.add_weighted_edges_from([(u, "player-hit", eps)])
+ U.add_weighted_edges_from([(u, "flying", flying_prob)])
+ elif "serve" in u or u == "HIT1" or u == "HIT2" or "flying-HIT1-" in u or "flying-HIT2-" in u:
+ U.add_weighted_edges_from([(u, "player-hit", 1-flying_prob)])
+ U.add_weighted_edges_from([(u, "bounce-ground-receiver", eps)])
+ U.add_weighted_edges_from([(u, "bounce-ground-server", eps)])
+ U.add_weighted_edges_from([(u, "bounce-wall-inner", eps)])
+ U.add_weighted_edges_from([(u, "bounce-wall-outer", eps)])
+ elif "time" in u:
+ U.add_weighted_edges_from([(u, "flying", 1)])
+ elif "wall" in u: # this must be before in and out
+ if "inner" in u:
+ U.add_weighted_edges_from([(u, "bounce-wall-inner", 1-flying_prob)])
+ elif "outer" in u:
+ U.add_weighted_edges_from([(u, "bounce-wall-outer", (1-flying_prob)/2)])
+ if "HIT1" in u:
+ U.add_weighted_edges_from([(u, "bounce-ground-server", (1-flying_prob)/2)])
+ elif "HIT2" in u:
+ U.add_weighted_edges_from([(u, "bounce-ground-receiver", (1-flying_prob)/2)])
+ else: assert(False)
+ U.add_weighted_edges_from([(u, "player-hit", eps)])
+ elif ("in" in u or "ground" in u) and "flying-net" not in u and "flying-out" not in u:
+ if "in1" in u or "in2" in u or "in-HIT2" in u or "ground-HIT2" in u\
+ or "in-net1" in u or "in-net2" in u:
+ U.add_weighted_edges_from([(u, "bounce-ground-receiver", 1-flying_prob)])
+ if "ground-HIT2" in u:
+ U.add_weighted_edges_from([(u, "bounce-ground-server", eps)])
+ elif "in-HIT1" in u or "ground-HIT1" in u:
+ U.add_weighted_edges_from([(u, "bounce-ground-server", 1-flying_prob)])
+ if "ground-HIT1" in u:
+ U.add_weighted_edges_from([(u, "bounce-ground-receiver", eps)])
+ elif "ground" in u:
+ U.add_weighted_edges_from([(u, "bounce-ground-receiver", 1-flying_prob)])
+ U.add_weighted_edges_from([(u, "bounce-ground-server", eps)])
+ else: assert(False)
+ U.add_weighted_edges_from([(u, "player-hit", eps)])
+ elif "out" in u:
+ if "out-HIT1" in u:
+ U.add_weighted_edges_from([(u, "bounce-ground-receiver", (1-flying_prob) / 4)])
+ elif "out-HIT2" in u or "out1" in u or "out2" in u:
+ U.add_weighted_edges_from([(u, "bounce-ground-server", (1-flying_prob) / 4)])
+ U.add_weighted_edges_from([(u, "bounce-wall-inner", (1-flying_prob) / 4)])
+ U.add_weighted_edges_from([(u, "bounce-wall-outer", (1-flying_prob) / 4)])
+ U.add_weighted_edges_from([(u, "bounce-net", (1-flying_prob) / 4)])
+ U.add_weighted_edges_from([(u, "player-hit", eps)])
+ elif "net" in u:
+ U.add_weighted_edges_from([(u, "bounce-net", 1-flying_prob)])
+ #U.add_weighted_edges_from([(u, "player-hit", eps)])
+
+
Okay, we have the graphs, but what about the matrices? We need those for the hmmkay library. How do we generate them? NetworkX provides a function for generating adjacency matrices (adjacency_matrix). However, we cannot use those matrices as they are; we need to normalize them so that each row sums to one. Remember, they represent probabilities. After normalization, we can save the result using pandas.
import pandas as pd

""" Emission matrix """
V1 = len(G.nodes())
B = nx.adjacency_matrix(U).toarray()[:V1, V1:]
err_change = 0
B += err_change
B = B / B.sum(axis=1).reshape((-1, 1))
B_df = pd.DataFrame(B, columns=obs.nodes(), index=G.nodes())
B_df.to_csv(folder_path + 'B.csv')

""" Transition matrix """
A = nx.adjacency_matrix(union).toarray()
A = A / A.sum(axis=1).reshape((-1, 1))
A_df = pd.DataFrame(A, columns=G.nodes(), index=G.nodes())
A_df.to_csv(folder_path + 'A.csv')
The err_change variable adds noise to the emissions so that every state can emit every observation with a small probability. In my experience it doesn't work well, but I leave it there in case you want to experiment with it.
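As a minimal sketch of that experiment (the value 1e-6 is an illustrative choice of mine, not something tuned in the original runs), you would set err_change to a small positive constant before normalizing:

# Hypothetical smoothing experiment: a tiny uniform probability mass is added
# to every state-observation pair before row normalization.
err_change = 1e-6
B = nx.adjacency_matrix(U).toarray()[:V1, V1:].astype(float)
B += err_change
B = B / B.sum(axis=1).reshape((-1, 1))  # rows are probability distributions again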
In this post we have seen how to properly design an HMM for following the result of a padel match. We have learnt to use the NetworkX library to create the graphs and the Gephi program to visualize them. In the next post of this series we will learn how to actually test whether the HMM works. Stay tuned.
In part I and part II of this series I talked about what a Hidden Markov Model is, why it is useful for modelling a padel match, and how to design it properly. Today the topic is testing the model. Testing a machine learning model is the basis for deploying it. Here I will explain how to test in Python and guide you through our HMM example.
Testing in Python can be done with several libraries. One of the most popular is pytest. There are many good tutorials out there about how to use it, like this one. And if you have a project on your hands with specific needs, you can always have a look at their documentation. In this post I will only briefly cover the functionalities that we need for testing the HMM. In pytest you have files that contain the word 'test' in their names. Those files contain functions that also have the string 'test' in their names. And those functions must contain assertions that pass or fail depending on whether the code behaves as expected. To run the tests you just execute pytest on the command line and the library does the rest of the job for you. This is the big picture; let's now define what our tests are going to be.
Our tests are going to take a sequence of observations as input, and we are going to check the predicted result of the point. That is the final goal of the HMM. The difference is that here we deal with simple scenarios where we know the result for sure. That way we can check that the HMM is at least well implemented. We cannot check that the design will work in a real-world scenario, but we can check that it works in ideal ones. If your model breaks in an environment where you control the input, then something is clearly wrong in the code. However, if it breaks in production, it may be that the model's hypotheses about the world are wrong. That's why you need these tests: to detect bugs prior to analysing the correctness of the model itself.
Once the tests are designed, it's time to implement them. For that, pytest comes with two main features that make the process easier. In pytest you can parameterize the input. When testing a function, several input-output pairs are used to ensure that the function produces the desired result. Writing a different test function for each input-output pair would be an inconvenience. Instead, you can specify several input-output pairs for a given test. Consider the following test, where we are interested in knowing whether the HMM correctly detects an ace:
def test_ace(sequence, indexers, hmm, hidden_states):
    indexer_hidden, indexer_obs = indexers
    sequences = [[indexer_obs[obs] for obs in sequence]]
    decoded_seq = hmm.decode(sequences)
    decoded_seq = [hidden_states[idx] for idx in decoded_seq[0]]
    assert('Point-server' in decoded_seq)
The parameter sequence is a list of observations that corresponds to an ace. Without parameterization we would have to write a different test function for each sequence that represents an ace. With parameterization we can reuse the function. In pytest that feature is implemented with decorators. For those of you who don't know Python well, a decorator is basically a function of functions. Here it takes the test function as input and creates a parameterized test function as output. Syntactically it looks like this:
@pytest.mark.parametrize("sequence", [
    ### Ace examples
])
def test_ace(sequence, indexers, hmm, hidden_states):
    indexer_hidden, indexer_obs = indexers
    sequences = [[indexer_obs[obs] for obs in sequence]]
    decoded_seq = hmm.decode(sequences)
    decoded_seq = [hidden_states[idx] for idx in decoded_seq[0]]
    assert('Point-server' in decoded_seq)
You only have to add the decorator on top of the function. If you recall the problem at hand, which sequences could possibly represent an ace? Let's see some examples:
### ace in first service ###
['player-hit', *['flying'] * 10, 'bounce-ground-receiver', *['flying'] * 10,
 'bounce-ground-receiver', *['flying'] * 100, 'end'],
# Bounce on wall
['player-hit', *['flying'] * 10, 'bounce-ground-receiver', *['flying'] * 10,
 'bounce-wall-outer', *['flying'] * 10, 'end'],
# Bounce on wall twice
['player-hit', *['flying'] * 10, 'bounce-ground-receiver', *['flying'] * 10,
 'bounce-wall-outer', *['flying'] * 10, 'bounce-wall-outer', *['flying'] * 10, 'end'],

### ace in second service ###
# Bounce on wall
['player-hit', *['flying'] * 10, 'bounce-ground-server', *['flying'] * 10,
 'player-hit', *['flying'] * 10, 'bounce-ground-receiver', *['flying'] * 10,
 'bounce-wall-outer', *['flying'] * 10, 'end'],
# Bounce on wall twice
['player-hit', *['flying'] * 10, 'bounce-ground-server', *['flying'] * 10,
 'player-hit', *['flying'] * 10, 'bounce-ground-receiver', *['flying'] * 10,
 'bounce-wall-outer', *['flying'] * 10, 'bounce-wall-outer', *['flying'] * 10, 'end']
Basically, whenever there are two consecutive bounces on the other side after the first player has hit the ball, we have an ace. When an ace happens the server wins, so we have to check that the decoded sequence ends in the hidden state corresponding to the server winning. If that doesn't happen, then our HMM isn't working. Passing this test doesn't prove that the HMM works, but by adding more and more tests we at least know that our model is robust to all those cases.
Have you wondered how you pass parameters to a test? In the previous section I talked about parameterizing with a decorator. But what about the rest of the parameters? Not every parameter is part of the input; some are part of the test machinery itself, for instance the parameter hmm. How does pytest know where to look for that object? At the beginning I said that you just have to run pytest on the command line, so you don't pass parameters to the internal test functions directly. Instead, you use fixtures. A fixture is another decorator provided by pytest. In this case, you decorate a function that returns the parameter you want to use later on. Let's look at an example by specifying the fixture for the hmm object. Suppose that you have a function in your code (not your tests, your real code) that initializes the hmm and returns it. Then, you would convert that function to a fixture this way:
@pytest.fixture()
def hmm():
    return read_hmm()
That's everything you have to do for every other test function to know where to look for the hmm object. The same is needed for the indexers and hidden_states fixtures, which in my case are just dictionaries to convert from the string names of states to the internal identifiers that the HMM uses.
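For completeness, here is a sketch of what those two fixtures could look like, reusing the CSV matrices from the previous post. The exact bodies are my assumption; only the fixture names come from the tests above.

import pandas as pd
import pytest

@pytest.fixture()
def indexers():
    # Map state/observation names to the integer ids the HMM works with
    transition_probas = pd.read_csv('graphs/A.csv', index_col=0)
    emission_probas = pd.read_csv('graphs/B.csv', index_col=0)
    indexer_hidden = {col: k for k, col in enumerate(transition_probas.columns)}
    indexer_obs = {col: k for k, col in enumerate(emission_probas.columns)}
    return indexer_hidden, indexer_obs

@pytest.fixture()
def hidden_states():
    # Inverse mapping: position k holds the name of hidden state k
    transition_probas = pd.read_csv('graphs/A.csv', index_col=0)
    return list(transition_probas.columns)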
To end this post I'll show you some concrete tests I designed for my HMM. They are a bit different from the rest of the tests. When introducing the tests I said that we give ideal scenarios as input. But it is possible to give noisy scenarios too, if you control the noise. There are no rules for writing tests; they just serve to check that your code does what you want it to do. And if I want my HMM to be robust to noise, I can test for it.
Noise in this problem means missing an observation or having repeated observations for the same hidden state. We designed our model so that it would work even in those cases. So if I provide a series of noisy observations, it must still predict the result correctly. For example, the following code tests two cases where the server fails the second serve. The first hit is duplicated and some bounces are repeated too. This series of observations is not an ideal one, but the model should still predict the result correctly.
@pytest.mark.parametrize("sequence", [
    ### Fail in second service ###
    # Bounce in and then on inner wall
    ['player-hit', 'flying', 'flying', 'player-hit', *['flying'] * 13,
     'bounce-ground-server', 'flying', 'bounce-ground-server', *['flying'] * 9,
     'player-hit', *['flying'] * 10, 'bounce-ground-receiver', *['flying'] * 4,
     'bounce-wall-inner', *['flying'] * 10, 'end'],
    # Bounce out
    ['player-hit', *['flying'] * 15, 'bounce-ground-server', *['flying'] * 10,
     'player-hit', 'player-hit', *['flying'] * 12, 'bounce-ground-server', *['flying'] * 10, 'end']
])
def test_fail_noise(sequence, indexers, hmm, hidden_states):
    indexer_hidden, indexer_obs = indexers
    sequences = [[indexer_obs[obs] for obs in sequence]]
    decoded_seq = hmm.decode(sequences)
    decoded_seq = [hidden_states[idx] for idx in decoded_seq[0]]
    assert('Point-receiver' in decoded_seq)
You can also have tests that fail on purpose. I call this one 'impossible_noise_test':
""" Test designed to make the model fail """
+@pytest.mark.parametrize("sequence", [
+ [*['flying'] * 3, 'bounce-ground-server', *['flying'] * 3,
+ 'bounce-ground-server', *['flying'] * 3,'player-hit', # Bounces in the ground badly detected as player-hit
+ *['flying'] * 3, 'bounce-ground-server', # Badly detected bounce
+ 'bounce-ground-receiver', *['flying'] * 5, 'player-hit', *['flying'] * 4, # Well detected serve
+ 'bounce-ground-server', 'bounce-ground-server', 'bounce-ground-server', # Same bounce
+ *['flying'] * 5, 'player-hit', # Well detected response
+ *['player-hit', *['flying'] * 5]*10, # Normal rally (now the ball is for receiver)
+ 'bounce-ground-receiver', 'end'] # It goes to net and out
+])
+def test_impossible_noise(sequence, indexers, hmm, hidden_states):
+ indexer_hidden, indexer_obs = indexers
+ sequences = [[indexer_obs[obs] for obs in sequence]]
+ decoded_seq = hmm.decode(sequences)
+ decoded_seq = [hidden_states[idx] for idx in decoded_seq[0]]
+ assert("Point-server" in decoded_seq)
+
This type of test lets you know the limits of the model. There must be a limit, and if you don't manage to create a test that fails, that is also an indicator that something is wrong. Maybe you are not being creative enough with the cases, or you have some bug that makes all the tests pass (that has happened to me). So it is also a good idea to have a test that fails, just in case.
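As a side note, if you keep such a test in the suite, pytest can be told to expect the failure, so the run stays green and you get alerted if the test unexpectedly starts passing. A minimal sketch, stacked on top of the parametrized test above:

# With strict=True, an unexpected pass is reported as an error instead of being silently ignored.
@pytest.mark.xfail(strict=True, reason="noise designed to exceed the model's limits")
@pytest.mark.parametrize("sequence", [
    # ... the impossible sequences from above ...
])
def test_impossible_noise(sequence, indexers, hmm, hidden_states):
    ...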
With this post the Hidden Markov Models for Padel Modelling series comes to an end. We have learned everything needed to deploy an HMM, from the theory behind the model to testing it extensively. Those are the steps to put any machine learning model into production: learning the basics, designing the model and testing it. The only thing left is to create a pipeline and integrate it into the final product, but that is a topic for another day. I hope you liked the process as much as I did. See you in my next series.
Imagine you are at a padel match. You watch the ball go from one side to the other, hit the wall, hit the ground, the net, and after some time someone wins the point. In your head you are following the state of the match until the end to know the winner. Imagine now that you were distracted by a fly and lost track of the players. You don't know what happened while you were distracted, but you managed to watch the last part of the point, so you still know who the winner is. How can we replicate this situation with a model? Which model can handle situations where you lose track in the middle but, by watching the last part, still know the result? The property of only needing to know the last part to know the result is called the Markov property. So a suitable model for this task is a Hidden Markov Model.
In this series I will describe how to properly design a Hidden Markov Model (HMM from now on) to keep track of the state of a padel match. I will also provide functional Python code that you can play with. And as extra material I will talk about unit testing and how to use unit tests to incrementally build a model like this one. Let's begin.
First of all, a quick introduction to HMMs and how to code them. A Hidden Markov Model has two main elements: hidden states and observations. The hidden states are the internal representation of the model; in the padel case a hidden state can be the first serve, an ace, or the second serve. The observations are what you can observe directly; continuing the analogy, an observation can be the ball hitting the ground. What the HMM does is infer the hidden states based only on a sequence of observations. If you watch the first player hit the ball and then the ball hits the ground twice on the other side, you can infer that the sequence of states is first serve and then ace. A more abstract scheme is shown below.
Every arrow represents a transition, either between hidden states or between a hidden state and an observation. Each transition is nothing more than a probability. For example, the probability of going from the second serve to the first serve is zero, because the first serve always comes first. The transitions from hidden states to observations are called emissions. One emission could be that the probability of observing the ball hit the ground after the first serve is $0.5$. The transition and emission probabilities are represented as matrices. Below you can see the transitions of a toy HMM.
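To make the matrix idea concrete, here is an illustrative toy example (the numbers are made up for the sake of the example): two hidden states and two observations, with one row per state, where each row is a probability distribution.

import numpy as np

# Toy transition matrix A: A[i, j] = P(next hidden state j | current state i)
A = np.array([[0.7, 0.3],    # state 0 usually stays in state 0
              [0.0, 1.0]])   # state 1 is absorbing
# Toy emission matrix B: B[i, k] = P(observation k | hidden state i)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
# Every row sums to one because rows are probability distributions
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)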
Once you have an HMM there are three things you can do with it. Given a sequence of observations, you can decode the most probable sequence of hidden states with the Viterbi algorithm. You can estimate the internal probabilities of the HMM from several sequences of observations with the Baum-Welch algorithm. Or you can sample a new sequence of observations. We are only interested in the first one, decoding. The transition and emission probabilities will be designed to model the rules of a padel match.
Now that we have seen the theory, let's see how we can decode sequences in Python. As you may have guessed, there are libraries that implement all the previously mentioned algorithms. My favorite one so far is hmmkay. It is quite easy to use and is fast enough for my use cases. You can create an HMM with a few lines of code. See below.
import pandas as pd
import numpy as np
from hmmkay import HMM

folder_path = 'graphs/'
transition_probas = pd.read_csv(folder_path + 'A.csv', index_col=0)
emission_probas = pd.read_csv(folder_path + 'B.csv', index_col=0)
hidden_states = emission_probas.shape[0]
init_probas = np.zeros(hidden_states)
init_probas[0] = 1
hmm = HMM(init_probas, transition_probas, emission_probas)
The important call is HMM(init_probas, transition_probas, emission_probas). Given the transition and emission matrices, plus some initial probabilities for the hidden states, it returns an object that can decode any sequence. The details of how to create the matrices will be described in later posts of the series, but I can give you a preview of the process. You first draw a graph on paper with the transitions you like, you then translate that paper graph into a digital format (.gml), and finally you use the adjacency matrices of the graph as the probability matrices. For now, let's assume we already have those files. How can we decode a sequence? Like this.
decoded_seq = hmm.decode(sequences)
The only thing to bear in mind is that both the input and the output are numbers starting from zero. You need to map those values to something meaningful. I normally use the column names of the matrices to build dictionaries for translating the sequences.
indexer_hidden = dict()
for k, col in enumerate(transition_probas.columns):
    indexer_hidden[col] = k
indexer_obs = dict()
for k, col in enumerate(emission_probas.columns):
    indexer_obs[col] = k
Putting everything together, the program to decode a sequence would look something like this.
### ace in first service ###
sequence = ['player-hit', *['flying'] * 10, 'bounce-ground-receiver', *['flying'] * 10,
            'bounce-ground-receiver', *['flying'] * 10, 'end']
sequences = [[indexer_obs[obs] for obs in sequence]]
decoded_seq = hmm.decode(sequences)
decoded_seq = [hidden_states[idx] for idx in decoded_seq[0]]
print(decoded_seq)
# Result: ['1st-serve', 'flying-1st-serve-0', 'flying-1st-serve-1', 'flying-1st-serve-2', 'flying-1st-serve-3',
# 'flying-1st-serve-4', 'flying-1st-serve-5', 'flying-1st-serve-5', 'flying-1st-serve-5', 'flying-1st-serve-5',
# 'flying-1st-serve-5', 'in1', 'flying-in1-0', 'flying-in1-1', 'flying-in1-2', 'flying-in1-3', 'flying-in1-4',
# 'flying-in1-5', 'flying-in1-5', 'flying-in1-5', 'flying-in1-5', 'flying-in1-5', 'ground', 'flying-ground-0',
# 'flying-ground-1', 'flying-ground-2', 'flying-ground-3', 'flying-ground-4', 'flying-ground-5',
# 'flying-ground-5', 'flying-ground-5', 'flying-ground-5', 'flying-ground-5', 'Point-server']
Don't worry if you don't understand all these fuzzy names; they are the names I chose for the hidden states and observations. In my next post I will explain in detail what everything means. For now I just want you to notice that the HMM is correctly identifying the winner of this point. The sequence of observations corresponds to an ace on the first serve: the player hits the ball and it bounces twice on the other side. Any person watching the match would automatically identify that as an ace. Here, the model gives even more information than that: it recognizes the instant at which the ace is achieved. The hidden state in1 means the ball has correctly entered the other side, and the state ground means that it has bounced again on the ground. After a while of the ball flying, the model outputs the state Point-server, correctly giving the victory to the server.
In my next post I will talk about the process of designing the transition and emission matrices. I will also discuss what is reasonable to define as an observation and which hidden states are needed. And in a later post I will cover a lesson on noisy decoding. One important feature of Hidden Markov Models is that they can decode a sequence of observations even if it is wrong at some point. As I said at the beginning, you could ignore the match for some time and still be able to recognize the winner. HMMs can go even further. Imagine that you are not watching the match and are just listening to it on the radio. If the commentator makes a mistake, you may hear something impossible, like a player hitting the ball twice without the point finishing. HMMs can decode the sequence correctly even in those situations, because they work with probabilities. If a player hits the ball twice and the game continues, the HMM will identify that second hit as a mistake and ignore it. Of course, if the sequence is completely wrong, the result will be wrong too. But HMMs are quite robust to noise in the input if you design them carefully. See you in my next post to learn how to design Hidden Markov Models.
In the previous article we discussed how we can use reinforcement learning to design simple architectures, like some types of convolutional neural networks. Today I am bringing you the explanation of how to design more complex architectures. Before diving into how to modify the controller, let's introduce another way of thinking about recurrent networks. Typically, LSTM and GRU cells are explained through formulas or diagrams like the one I showed in the previous article. However, the NAS paper introduced another way of thinking about them: a graph representation in which states are nodes and the edges represent ways of merging states. For instance, an edge can mean applying a sigmoid function, another can mean summing two hidden states, and so on. Below is a simple example visualized.
Here the input is $x_t$, the hidden state from the previous step is $h_{t-1}$, and the cell state $c_{t-1}$ is used as memory. As you can see, the states are combined using either multiplication or addition, and then some activation functions are applied. Now, think for a moment how you could represent this same graph in a linear way, as a sequence of operations. Got it? Well, one possible way would be [Add, Tanh, Multiply, ReLU, Multiply, sigmoid, Add, ReLU, 1, 0]. Don't worry, there are many ways to represent the above graph sequentially; this one is just the one provided by the authors of NAS. To understand it, look at the next picture.
Let's analyze it step by step. The process is split into five parts. The first three describe how to combine $x_t$ and $h_{t-1}$ to produce $h_t$. However, that description is not complete, because the cell state can be injected into any tree index. The last two numbers of the sequence indicate where: in this example there is a 1 and a 0, which means that the cell state is injected into tree index 0 and that the new cell state is the value at tree index 1. The cell inject part describes how the injection is done. Let's recap. Tree index 0 is $a_0 = \text{tanh}(W_1 \cdot x_t + W_2 \cdot h_{t-1})$, located at the right of the graph above. Tree index 1 is $a_1 = \text{ReLU}(W_3 \cdot x_t \odot W_4 \cdot h_{t-1})$, located at the left. This is the simple part. Now things get complicated. The number at the end tells us into which tree index to inject the cell state, in this case 0. So we have to update $a_0$ by $a_0 \leftarrow \text{ReLU}(a_0 + c_{t-1})$. Note that there are no learnable parameters in this step. Having done that, we can now compute tree index 2: $a_2 = \text{sigmoid}(a_0 \odot a_1)$. And this is the new hidden state, $h_t \leftarrow a_2$. There is just one thing left: what is the new cell state? The value at tree index 1, which is the number we haven't used yet. The new cell state is the value at tree index 1 prior to its activation, so $c_t \leftarrow W_3 \cdot x_t \odot W_4 \cdot h_{t-1}$.
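To make the decoding concrete, here is a small NumPy sketch of one step of this exact example cell. The weight names $W_1, \dots, W_4$ follow the formulas above; everything else (shapes, the absence of biases) is a simplifying assumption of mine.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def example_cell_step(x_t, h_prev, c_prev, W1, W2, W3, W4):
    """One step of the cell encoded by [Add, Tanh, Multiply, ReLU, Multiply, sigmoid, Add, ReLU, 1, 0]."""
    a0 = np.tanh(W1 @ x_t + W2 @ h_prev)   # tree index 0: Add, then Tanh
    pre1 = (W3 @ x_t) * (W4 @ h_prev)      # tree index 1 before its activation
    a1 = relu(pre1)                        # tree index 1: Multiply, then ReLU
    a0 = relu(a0 + c_prev)                 # inject c_{t-1} into tree index 0 (no parameters)
    h_t = sigmoid(a0 * a1)                 # tree index 2: Multiply, then sigmoid
    c_t = pre1                             # new cell state: tree index 1 pre-activation
    return h_t, c_t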
It is a mess at first, but once you understand it, it is awesome. You can represent any combination with a sequence, and so you can learn to generate the optimal sequence. The irony here is that we are using recurrent networks to design recurrent networks. And although the authors didn't try it, it could be interesting to iterate the process: use an RNN to design a better RNN, then use that new RNN to design another one, and so on. My guess is that it would converge, but who knows, maybe you get an infinitely better network.
Okay, we have learned a way to represent RNNs. So what does the LSTM look like with this new representation? It looks like this.
If you are interested, you can go through the graph step by step to check that the formulas are the same.
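For reference, these are the standard LSTM update equations (biases omitted) that the graph should reproduce:

$$f_t = \sigma(W_f x_t + U_f h_{t-1}), \quad i_t = \sigma(W_i x_t + U_i h_{t-1}), \quad o_t = \sigma(W_o x_t + U_o h_{t-1})$$

$$\tilde{c}_t = \text{tanh}(W_c x_t + U_c h_{t-1}), \quad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad h_t = o_t \odot \text{tanh}(c_t)$$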
Finally, the moment we were all waiting for: the new and better recurrent cell found by the authors of NAS, the so-called NASCell (you can find it in TensorFlow under that name).
Finding it required a lot of computation. This RNN is supposed to be better at language tasks than the standard LSTM. However, since the article came out before big transformers, this recurrent cell didn't get much attention and was soon forgotten. Nevertheless, it is interesting to know that there are many possible RNNs, not only the LSTM and the GRU. So the next time you want to try a simple RNN instead of a big transformer, you can think of using the NASCell.
If you liked this, then you are going to enjoy the last part of this series. In the next and last chapter I will explain another modification of the controller to include residual connections, making the controller capable of designing architectures such as ResNet or EfficientNet. Stay tuned!
We have seen in Part 1 and Part 2 that neural architecture search can be used to find hyperparameters and to design recurrent networks. Is there any more use to it? The answer is obviously yes (otherwise there would be no point in this post). Let's recap: NAS is a method where you learn something that is not learnable by backpropagation using trial and error, aka reinforcement learning. The key to using it in more complex settings is to have a well-defined space of parameters. In the first part that space was simply a real range for each hyperparameter. In the second part the search space included every recurrent network; since that is a pretty broad space, we narrowed it down by only working with recurrent networks that could be generated by a tree-like structure. In this third part the search space is going to be the set of all convolutional neural networks. And to reduce that space to something manageable, we are only going to consider networks generated by residual connections. A residual connection is typically just an addition operation, like this.
The good property of residual connections is that they prevent vanishing gradients in very deep networks. Their discovery made it possible to increase the number of layers. ResNet achieved state-of-the-art results when it was first created, and big transformer-based architectures also use residual connections. Normally the residual connection is between one layer and the next one. Can you think of a way of generalising this? Is there a way of creating a bigger set of possible architectures? Similar to what we did for recurrent networks, we could break the restriction of only connecting to the next layer. This way we could end up with something like the architecture presented in the original NAS paper.
Again, the real question is how to encode that architecture, because a picture is not quite computer-friendly. Think of the ways of representing a graph: you use the adjacency matrix, or adjacency lists. Here it is similar: for each layer you just need to know its inputs. In other words, the architecture is saved as adjacency lists of incoming vertices, as illustrated below. That is simple enough for the controller to generate. For each layer the controller has an array of previous layers and simply selects one or more indices from it, representing the incoming residual connections. The rest of the hyperparameters are generated afterwards. See the diagram.
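As an illustration of that encoding (the indices here are invented), a five-layer network could be stored as one list of incoming layer indices per layer:

# Layer i receives the outputs of the listed layers (invented for illustration)
incoming = [
    [],        # layer 0: no predecessors, will read the input image
    [0],       # layer 1: a normal sequential connection
    [0, 1],    # layer 2: sequential plus a skip connection from layer 0
    [1],       # layer 3: skips layer 2
    [2, 3],    # layer 4: merges two branches
]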
What the anchor point box is doing is just that: sampling indices representing the incoming layers. That box may be implemented as a feed-forward network with several classes. It could seem that the number of classes varies and so we cannot fix the last layer to a constant size, and that is true, but for a given layer it is always the same. Therefore, fixing the total number of layers also fixes the number of output classes for each anchor point, and so we can train them. Another reasonable way to implement the anchor box is as a multinomial variable. That way you only need to learn the probabilities, maybe conditioned on something, although that poses another problem, since a new way of applying backpropagation needs to be designed, probably using some reparameterisation trick. The original implementation was done through attention: they gave a hidden state to each anchor point and used the Bahdanau attention mechanism to select the most similar layers (according to that learnable hidden state) to use as residual connections. This solves the problem of variable-sized input, since you can use the same attention mechanism for every anchor point, thus reducing the total number of parameters while making all the connections related to each other.
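A rough sketch of that attention-style selection follows. The score form, a sigmoid of $v^T \text{tanh}(W_{prev} h_j + W_{curr} h_i)$, is the content-based attention the NAS paper describes; the function and variable names here are my own.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def skip_connection_probs(h_layers, h_anchor, W_prev, W_curr, v):
    """For each previous layer j, the probability that it feeds the current layer:
    P(j -> current) = sigmoid(v^T tanh(W_prev h_j + W_curr h_anchor)).
    One independent Bernoulli per previous layer, so any subset can be selected."""
    return np.array([sigmoid(v @ np.tanh(W_prev @ h_j + W_curr @ h_anchor))
                     for h_j in h_layers])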
+But simply sampling residual connections like this has one issue, can you spot it? There could be layers that receive no input, or layers whose output is never used. The solution is simple: layers whose output is not used anywhere are connected to the last layer, and layers that don’t receive any input take the input image itself, and not any other layer. This way, many complex architectures can emerge, not just linear ones. And there you have it: you can use reinforcement learning to explore the search space of convolutional neural networks.
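+A minimal sketch of those two fix-up rules, applied to the adjacency-list encoding from before:
+
+def fix_architecture(architecture, num_layers):
+    # Rule 1: layers with no incoming connection take the input image
+    for layer, inputs in architecture.items():
+        if not inputs:
+            architecture[layer] = ["image"]
+    # Rule 2: layers whose output is never used get wired to the last layer
+    used = {j for inputs in architecture.values() for j in inputs}
+    for layer in range(num_layers - 1):
+        if layer not in used:
+            architecture[num_layers - 1].append(layer)
+    return architecture
+
+print(fix_architecture({0: [], 1: [], 2: [1]}, 3))
+# {0: ['image'], 1: ['image'], 2: [1, 0]}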
+Here I have presented three ways of using neural architecture search to better design neural network architectures. If you are feeling short of ideas for creating networks, try designing a broad search space and training a NAS controller on a toy dataset. And don’t think it is an outdated technique: Google is still using it, see this article. Its main disadvantage is that it requires a lot of computational power, but as with everything else, you can try it on your own with reduced capabilities on smaller datasets. Maybe you find the next transformer this way, who knows?
+Today we are going to dive into an idea that some may fear and others may praise: AI training itself. Well, in reality the idea is a bit different from an AI training itself: neural architecture search consists of using a network to design other networks, in a similar way a human would do it, but automatically. The process can be described as follows.
+The blue box is the network we want; it can be a convolutional neural network to classify images or a recurrent neural network to do sentiment analysis. On the other side, the red box is the network that is going to design the solution to our problem for us. However, the controller is not going to output the full code solving your problem, it is not that smart. Instead, it is going to generate the hyperparameters of your network: the filter size, the number of channels of your convolutional layers, or the number of layers of your LSTM. I’ll discuss in more detail later which hyperparameters can be predicted.
+But first, how are we going to train the controller? Training the supervised model is easy: you throw some data at it and apply gradient descent. However, the controller is not a supervised model. There is no data about the hyperparameters, and there is no loss function between the values it gives and the optimal values because, of course, we don’t know the optimal values. Nevertheless, we do have a reward function, the accuracy of the child network, and so we can apply the reinforcement learning paradigm. There is still one problem: the accuracy is a reward function, but it is not differentiable, and we don’t know how it relates to the hyperparameters, so it seems that we cannot compute the gradient and therefore can’t apply gradient descent. The commonly used solution to that problem was invented by Williams in 1992, who derived the following formula for the gradient:
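+$$\nabla_{\theta_c} J(\theta_c) = \sum_{t=1}^{T} \mathbb{E}_{P(a_{1:T};\,\theta_c)}\left[\nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1};\, \theta_c)\, R\right]$$
+
+Here $a_{1:T}$ is the sequence of actions (hyperparameter choices) sampled by the controller with parameters $\theta_c$, and $R$ is the reward, i.e. the validation accuracy of the trained child network.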
+That formula deserves its own post, but for now just bear in mind that this is the gradient used to train the controller. The process is fairly simple: sample an architecture from the controller, train that architecture in a supervised manner and get a validation accuracy. Use that accuracy as the reward and train the controller using the above gradient. Repeat sufficiently many times and voilà, your controller has learnt to design architectures. The process is illustrated below. Several controllers are trained in parallel due to the high number of attempts the controller needs in order to achieve good performance. Remember, the controller is learning by trial and error.
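+To make the loop concrete, here is a minimal sketch in Python; sample_architecture and train_child_network are stand-in stubs for the two expensive steps, not real library calls:
+
+import random
+
+def sample_architecture():
+    # Stand-in for sampling hyperparameters from the controller
+    return {"num_layers": random.choice([2, 4, 6])}
+
+def train_child_network(architecture):
+    # Stand-in for training the child network; returns validation accuracy
+    return random.random()
+
+baseline = 0.0
+for step in range(100):
+    architecture = sample_architecture()        # sample from the controller
+    reward = train_child_network(architecture)  # validation accuracy as reward
+    baseline = 0.9 * baseline + 0.1 * reward    # moving average to reduce variance
+    advantage = reward - baseline
+    # here the controller's parameters would be updated with the gradient
+    # above, weighting the log-probabilities of the sampled actions by the advantage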
+Now, the details. What is the controller, exactly? In the original paper it was an LSTM; however, any architecture that handles sequential data could be used, like a transformer. But the NAS article was published around the same time as the transformer paper, so the authors could only try the LSTM, since the transformer did not exist yet. More precisely, this is the scheme they present in their paper:
+As you can see, the controller is predicting very simple parameters: the parameters of a CNN that we normally ignore and leave at their defaults. But don’t be fooled by this simplicity, it can get quite complex. Imagine we want to design a recurrent neural network similar to the LSTM or the GRU. There are hidden states and hidden memory cells but, what are the connections? For the LSTM, the connections are the ones shown below.
+So think about all those arrows: what if I told you that with neural architecture search we can learn which arrows are optimal? If you want to know how, wait for the next part of this series.
+Following the philosophy of my blog, this will be a very specific post. You can find many resources on the internet about how to deploy a web app, and I will even be referencing many of them here, but the main difference is that mine is going to be straight to the point. The following is a tutorial on how to create a web app from scratch. The backend will be Django and the database will be PostgreSQL. The app will run on AWS, and to deploy it there we will create a Docker image. Last but not least, I’ll explain how to buy a domain and link it to the AWS IP address. Let’s get our hands dirty! Also, don’t worry about the code, it is all available here.
+Let’s start by creating a minimal Python environment with just Django. You can do it either with plain Python (venv) or with conda. For reproducibility, please use Python 3.10 and Django 4.2.2. Open a terminal and run the following:
python3.10 -m venv .venv
+source .venv/bin/activate # For Windows use: .\.venv\Scripts\activate
+pip install django==4.2.2 psycopg2-binary
+
For the Conda installation:
conda create --name .venv python=3.10
+conda activate .venv
+pip install django==4.2.2 psycopg2-binary
+
The next step is to set up a PostgreSQL server. To install PostgreSQL go to the official page. Once installed, you need to start the server on your system:
mkdir /usr/local/var/postgres # Create folder if it does not exist
+initdb -D /usr/local/var/postgres # Initialize database cluster
+pg_ctl -D /usr/local/var/postgres start # Start server
+
This will start the server and save everything into the /usr/local/var/postgres
folder. For Windows users, replace pg_ctl
and initdb
with the path to the pg_ctl.exe
and initdb.exe
binaries, which may be something similar to "C:\Program Files\PostgreSQL\14\bin\pg_ctl.exe"
and use any data directory you want.
Once the server is running we need to create a database. For that, open a psql shell in a terminal and execute the relevant SQL code:
psql postgres # Start SQL shell
+postgres=# CREATE DATABASE mydatabase;
+postgres=# CREATE USER myuser WITH PASSWORD 'mypassword';
+postgres=# GRANT ALL PRIVILEGES ON DATABASE mydatabase TO myuser;
+postgres=# exit;
+
The commands are self-explanatory, just replace mydatabase
, myuser
and mypassword
with what you deem appropriate. Now you have the database ready to use locally; you can connect to it with any database management tool you want (I recommend DBeaver). The connection is through localhost on port 5432. Later on we will see how to automate this process, but for now, this is how it is done locally.
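+Since we already installed psycopg2-binary, a quick sanity check from Python could look like this (with the server running and the credentials you just chose):
+
+import psycopg2
+
+conn = psycopg2.connect(
+    dbname="mydatabase",
+    user="myuser",
+    password="mypassword",
+    host="localhost",
+    port=5432,
+)
+with conn.cursor() as cur:
+    cur.execute("SELECT version();")
+    print(cur.fetchone())  # prints the PostgreSQL server version
+conn.close()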
With the database in place, we need a web app on top. With the Python environment activated, run the following, using any project name you want:
django-admin startproject myprojectname
+
This will create the basic skeleton for a Django project. We now have to configure the database and create a simple app to store data. Go to settings.py
and locate the INSTALLED_APPS
variable. Append 'django.db.backends.postgresql',
to the list, so that it looks like this:
INSTALLED_APPS = [
+ 'django.contrib.admin',
+ 'django.contrib.auth',
+ 'django.contrib.contenttypes',
+ 'django.contrib.sessions',
+ 'django.contrib.messages',
+ 'django.contrib.staticfiles',
+ 'django.db.backends.postgresql',
+]
+
Also locate the DATABASES
variable and modify it to contain the information necessary to connect to the PostgreSQL database:
DATABASES = {
+ 'default': {
+ 'ENGINE': 'django.db.backends.postgresql',
+ 'NAME': 'mydatabase',
+ 'USER': 'myuser',
+ 'PASSWORD': 'mypassword',
+ 'HOST': 'localhost',
+ 'PORT': '5432',
+ }
+}
+
Since storing passwords in plain text is normally not a good idea, I recommend using environment variables for that:
DATABASES = {
+ 'default': {
+ 'ENGINE': 'django.db.backends.postgresql',
+ 'NAME': 'mydatabase',
+ 'USER': 'myuser',
+ 'PASSWORD': os.environ.get('DB_PASSWORD', ''), # Don't forget to import os
+ 'HOST': 'localhost',
+ 'PORT': '5432',
+ }
+}
+
You can now provide the password through environment variables:
export DB_PASSWORD='mypassword' # Unix
+set DB_PASSWORD=mypassword # CMD Windows
+$env:DB_PASSWORD="mypassword" # PowerShell Windows
+
Finally, let’s create a simple page. Start by typing:
python manage.py startapp simpleapp
+
Now, modify settings.py
to include it by adding it to the INSTALLED_APPS
variable:
INSTALLED_APPS = [
+ 'django.contrib.admin',
+ 'django.contrib.auth',
+ 'django.contrib.contenttypes',
+ 'django.contrib.sessions',
+ 'django.contrib.messages',
+ 'django.contrib.staticfiles',
+ 'django.db.backends.postgresql',
+ 'simpleapp',
+]
+
After that, we will add a very simple model and view to handle data. Our model will only contain names of users. Go to simpleapp/models.py
and add this:
from django.db import models  # already present at the top of the generated models.py
+
+class User(models.Model):
+ name = models.CharField(max_length=50)
+
+ def __str__(self):
+ return self.name
+
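+Once the migrations are applied (we will run them in a moment), you can try the model from the Django shell (python manage.py shell):
+
+from simpleapp.models import User
+
+# Insert a row and read it back
+User.objects.create(name="Alice")
+print(User.objects.all())                   # <QuerySet [<User: Alice>]>
+print(User.objects.get(name="Alice").name)  # Alice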
Then, create simpleapp/forms.py
and add this:
from django import forms
+from .models import User
+
+class UserForm(forms.ModelForm):
+ class Meta:
+ model = User
+ fields = ['name']
+
Next, modify simpleapp/views.py
to include the following:
from django.views.generic.edit import CreateView
+from .models import User
+from .forms import UserForm
+
+class UserCreateView(CreateView):
+ model = User
+ form_class = UserForm
+ template_name = 'user_form.html'
+ success_url = '/simpleapp/create_user/'
+
You also need to create the template in a templates folder. Create simpleapp/templates/user_form.html
and insert this:
<form method="post">
+ {% csrf_token %}
+ {{ form.as_p }}
+ <button type="submit">Create</button>
+</form>
+
Last, we need to link the urls for everything to work properly. Create the file simpleapp/urls.py
and write:
from django.urls import path
+from .views import UserCreateView
+
+urlpatterns = [
+ path('create_user/', UserCreateView.as_view(), name='create_user'),
+]
+
Go to the main myprojectname/urls.py
and edit it to be like this:
from django.contrib import admin
+from django.urls import path
+from django.urls import include
+
+urlpatterns = [
+ path('admin/', admin.site.urls),
+ path('simpleapp/', include('simpleapp.urls')),
+]
+
Now you are good to go. We can finally start adding rows to the database. To do so, apply the migrations and start the webapp:
python manage.py makemigrations
+python manage.py migrate
+python manage.py runserver
+
Open a browser, go to http://127.0.0.1:8000/simpleapp/create_user/
and you will be able to input users’ names. If it is your first time using Django, this is a whole lot, I know. This is a simple example using Django’s class-based views; things can get very, very complex. The aim of this tutorial is to set up a minimal working webapp. For more information on Django, you can go to the official documentation. Okay, close everything and let’s start our Docker journey. To stop the webapp press ctrl+C
and to stop the PostgreSQL server run:
pg_ctl -D /usr/local/var/postgres stop
+
Setting everything up from scratch is time consuming, but if you only need to do it once, it is affordable. The problem comes when you want to migrate to other machines or scale. Having to go through the whole process above every time is annoying. As I mentioned before, it would be nice to automate it. That’s where Docker comes into play. It is a way to pack everything up so that it can run on your machine, the cloud, or a microwave, as long as it has Docker installed, of course. A Docker setup basically consists of a few configuration files that are used to build an image that does whatever you want, in our case handling data through a web app. Having introduced the concept, let’s build a Docker image for our web app.
+This section is mostly inspired by this other tutorial. Head over there if you feel curious. Also, I recommend taking a look at the official Docker beginner tutorial for more information on how to set up Docker and learn the basics.
+To start, create a Dockerfile inside the Django project directory with the following content:
FROM python:3.10.2-slim-bullseye
+
+WORKDIR /code
+
+COPY . .
+RUN pip install -r requirements.txt
+
You also need to create a requirements.txt
with this:
django==4.2.2
+psycopg2-binary
+
+
+You could simply install everything with pip, but when the project grows you will be thankful to have it all in a requirements.txt
file. So, that’s the container for the webapp. Simple, right? However, we still need to connect it to a postgres database. For that we will use docker compose to run another container with the database and connect the two. Create a docker-compose.yml
file with the following:
version: "3"
+
+services:
+ web:
+ build: .
+ command: sh start.sh
+ environment:
+ - DB_PASSWORD
+ volumes:
+ - .:/code
+ ports:
+ - 8000:8000
+ depends_on:
+ db:
+ condition: service_healthy
+ db:
+ image: postgres:14
+ restart: always
+ ports:
+ - 5432:5432
+ environment:
+ POSTGRES_DB: mydatabase
+ POSTGRES_USER: myuser
+ POSTGRES_PASSWORD: ${DB_PASSWORD}
+ volumes:
+ - ./postgres_data:/var/lib/postgresql/data/
+ healthcheck:
+ test: pg_isready -U myuser -d mydatabase
+ interval: 1s
+ timeout: 10s
+ retries: 10
+ start_period: 30s
+
+
+You will need to substitute myuser
and mydatabase
with whatever you like. To explain a bit what is happening here: we run a postgres container, perform health checks to make sure the database is up, and after that we launch the webapp container. You could also hardcode the password there, but for security reasons it is better to provide it through an environment variable, just like before. The database is stored in the folder postgres_data
locally, so that whenever you kill the container you don’t lose the data. The port 5432 is forwarded locally so you can connect to the database from your machine when the container is running and see the data.
Wait! We have not finished yet. We need to create the start.sh
and modify the myprojectname/settings.py
file. The DATABASES
variable should look like this:
DATABASES = {
+ 'default': {
+ 'ENGINE': 'django.db.backends.postgresql',
+ 'NAME': 'mydatabase',
+ 'USER': 'myuser',
+ 'PASSWORD': os.environ.get('DB_PASSWORD', ''),
+ 'HOST': 'db',
+ 'PORT': '5432',
+ }
+}
+
The only change is on HOST
. It is now set to db
which is the name of the postgres service, so the webapp container can reach the database by that hostname. And the start.sh
script is the following:
python manage.py makemigrations
+python manage.py migrate
+python manage.py runserver 0.0.0.0:8000
+
It creates and applies all the migrations needed to set up the Django models in postgres, and then starts the server. Finally, just run the containers:
docker compose up
+
To stop it, do ctrl+C
and then docker compose rm
. That’s it, that’s all you need to do to launch your webapp from anywhere. You can now take your code to any machine and you won’t need to set up postgres, Python and Django from scratch. Just install docker and run docker compose up
. Also, it is a good practice to include a .dockerignore
, just like .gitignore
. For this simple app I have
postgres_data/
+Dockerfile
+docker-compose.yml
+*/__pycache__/
+*/*/__pycache__/
+*/*/*/__pycache__/
+
+
+This way I don’t load any unnecessary files into the container, which keeps it fast. So now that we have everything packed up in our bag, let’s travel: let’s deploy the web app to AWS for others to use.
+Amazon Web Services is a way to deploy your code onto servers that you don’t need to manage. This way, instead of paying the cost of setting up a whole server, you just pay for the hours used. Nevertheless, you won’t save yourself the cost of configuring everything: even though configuring AWS may be simpler than configuring a server, it is still a significant investment of time. For that reason, I will provide here the bare minimum to make our webapp work on AWS. You will still need to read the AWS docs extensively; there are many tutorials online, but Amazon keeps changing the interface every so often, and the only page that is guaranteed to be up to date is the official AWS documentation.
+Before we can start configuring the server, we need to configure an account, and for that you will need access keys. You can create access keys for your root account, but it is not recommended: AWS recommends creating a role with fewer permissions than your root account (especially without billing permissions) and using the access keys of that role. In the past this was done with the Identity and Access Management (IAM) app; now it is being migrated to IAM Identity Center. Both methods still work as of this writing, but I will explain the second one, which is more up to date. The following is a reduced version of this tutorial. Go to the AWS console and look for the IAM Identity Center. Once there you will need to create a user, create a permission set and link the two. In the Users section click Add user and fill in the necessary information. Then, under Permission sets, click Create permission set and create the predefined role AdministratorAccess. After that, go to AWS accounts, select the account under root and click Assign users or groups. Select your created user, click next, select the role, click next, review everything and click submit. Finally, to activate that user, open the email you provided and register the user with some password. Before you continue, go to Dashboard and save your AWS access portal URL; that is the URL you need to use to log in with that user. Now, click that URL and sign in. Once you are logged in you should see your user and two links on the right: one for Management console and one for Command line or programmatic access. Click the latter and you will see your access keys.
+The next step is to install and configure the AWS CLI. Go here and follow the steps for the installation. Once installed, execute aws configure
and provide the access key and secret access key obtained previously. It will also ask for a region; I will be using us-east-1
. If you choose a different one you may encounter problems later on because the free tiers differ across regions. And for the output format choose json
. You are now (almost) ready to start launching instances.
Having created our account, it is time to create an instance on which to deploy our webapp. If your page gets too large you may be interested in moving storage to S3 buckets, but for now I will keep code and data on the same instance. You can find the docs for EC2 here. As before, I will summarize just what we need. First, create the instance: go to the EC2 console and, under the Instances section, click Launch instances. Give it a name, select the OS and architecture (I recommend Ubuntu and x86-64), select the instance type (I will be using t2.micro because it is free-tier eligible), select a key pair or create one, since you probably don’t have any, and leave everything else as default. Once you have launched it you can access your machine through ssh. In the Instances section, click on your created instance and then on Connect, and it will give you instructions on how to connect. The next steps are to install Docker, copy your webapp to the instance, change the firewall of the instance to allow http and postgres traffic, and finally deploy the app.
+If you have chosen Ubuntu as your OS you can follow the instructions here. You basically just need to execute the following commands after accessing the machine:
sudo apt-get update -y
+sudo apt-get install ca-certificates curl gnupg -y
+sudo install -m 0755 -d /etc/apt/keyrings
+curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
+sudo chmod a+r /etc/apt/keyrings/docker.gpg
+echo \
+ "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
+ "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
+ sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+sudo apt-get update -y
+sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
+
And if you want to check that everything is working, just run
sudo docker run hello-world
+
You can copy your entire directory recursively into your EC2 instance with scp:
scp -r -i YOUR_KEY ./* ubuntu@YOUR_EC2_ADDRESS:.
+
YOUR_KEY
is the key pair previously created and YOUR_EC2_ADDRESS
may look something like ec2-30-29-46-221.compute-1.amazonaws.com
. It is the same address that you use to ssh into your machine.
In AWS, instances have security rules that control the inbound and outbound traffic of your app. By default all ports are closed except for 22, the ssh port. We will need to open ports 80 and 5432, the http and postgres ports. If you have an SSL certificate you could also open port 443 for https, but we will just use those two for now. Go back to the EC2 console and open Security Groups. Click Create security group, give it a name and add two inbound rules; you can just select HTTP and PostgreSQL in the dropdown menu and it will set the ports for you automatically. Then, as the source, select Anywhere IPv4 and click Create security group. Now go back to your created instance and click Actions > Security > Change security groups. There, simply add the newly created security group and you are good to go.
+ +In order to deploy our app we need to make one change to our docker-compose.yml
. Initially we were mapping port 8000 to 8000; we are now going to map it to 80, which is the http port. The line to change will end up like this
ports:
+ - 80:8000
+
+
+Finally, ssh into your machine, with Docker already installed, and execute
sudo DB_PASSWORD=... docker compose up -d
+
Remember that you have to pass the database password as an environment variable. Okay, the app is running but, how can we access it? Well, we cannot yet; we still need to make some changes. Stop the container and let’s finish this:
sudo DB_PASSWORD=... docker compose down
+
The first thing to know is which IP we can use to access the page. In the AWS console, when you open your instance it displays a “Public IPv4 address” somewhere. That is the IP of your app. However, if you were to visit it, Django would not let you in, because you first need to allow that host. For that, go to the settings.py
of your app and add it:
ALLOWED_HOSTS = ['YOUR_IP']
+
Also, even after changing this, when you access your IP you won’t see the page yet. That is because the base URL is not pointing anywhere, but we can fix that. Create a view (for example in myprojectname/views.py, next to settings.py) that only performs a redirection:
from django.shortcuts import redirect
+
+def redirect_to_create_user(request):
+ return redirect('/simpleapp/create_user')
+
Then, in your main urls.py
add path('', redirect_to_create_user)
, it will end up like this:
from django.contrib import admin
+from django.urls import path
+from django.urls import include
+from .views import redirect_to_create_user
+
+urlpatterns = [
+ path('', redirect_to_create_user),
+ path('admin/', admin.site.urls),
+ path('simpleapp/', include('simpleapp.urls')),
+]
+
Now, copy all the files into your machine again and deploy the webapp:
sudo DB_PASSWORD=... docker compose up -d
+
Let’s see how we can access the server database from your own machine. Open your favourite DB program (mine is DBeaver) and create a new connection. This time you will have to provide the instance address instead of localhost; everything else is the same as when you did it locally. The port is 5432, and the user and database name are whatever you chose. If you have configured the EC2 security group properly, you can now access your remote database locally.
+Nice, we have our fantastic webapp up and running, but wait, are you going to share with your friends a page called 50.283.48.100? Obviously not; you need a fancy domain like myawesomepage.com or something that describes your project. To achieve that you first need to buy a domain and then link that domain to your IP. Domains that are not in high demand typically cost around $10 to $20. You can buy them on Namecheap. Once you have it you need to do several things on the AWS side. You will need to fix the IP so that it doesn’t change, otherwise the DNS redirection will break over time. After that you need to create a hosted zone, whose nameservers will route your domain to that IP. Let’s go step by step. To fix the IP, go to the Elastic IPs section in the left bar of the EC2 menu. Create an Elastic IP and then, under actions, associate it with your instance. Once you have done that, you will need to create the hosted zone. For that, search in AWS for the Route 53 service. Once there, click on Create hosted zone, insert your domain and create it. Before we continue, two more records need to be created. Create one of type A with your previous Elastic IP in the value section, everything else as default. Repeat, but this time add ‘www’ as the subdomain, so that your page can be accessed either by its bare domain or with www at the beginning. Once you have done that, go to your domain on Namecheap and click on Manage. Select Custom DNS and enter the four nameservers that were created with the hosted zone. If something was unclear, you can check the tutorials I followed for both the AWS and the Namecheap part. DNS redirection may take up to 48 hours. There is one last thing to modify; remember that you had to include the IP in ALLOWED_HOSTS
? Well, you also need to include your domain there. Change that and you will have your marvelous webpage running.
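+With a made-up domain, the final setting would look like this:
+
+ALLOWED_HOSTS = ['YOUR_IP', 'myawesomepage.com', 'www.myawesomepage.com']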
Congratulations! You have managed to reach the end of this tutorial. If you followed the steps carefully you now know how to create your own web apps. The first time is quite tiresome, but once you know how to do it you can get your millionaire idea up and running in no time. Let’s recap: first, you create your Django app; then, you create a Docker setup to launch it easily; after that, you create an AWS instance and deploy your app there; finally, you link your domain to the instance IP. Once you are done you can enjoy your creation!
+This blog was born from my need to explain everything that I have learnt during my degrees and to show the world the results of my research in a more accessible way.
+The name of the blog was inspired by the love I have for the place I come from, Granada, and the love I have for Data Science. As you may have noticed, the footer image is of La Alhambra, as it couldn’t be otherwise.
+About me, I can say that I always have many things in mind, but seldom have time to put them into practice. Below is a photo of me surrounded by olive trees; you cannot see it, but behind the camera there was a rally. I hope you enjoy my blog.