forked from Sudhanshu-Dubey14/Blogs_MD
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathnginx_regex.html
76 lines (76 loc) · 7.21 KB
/
nginx_regex.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>nginx_regex</title>
<style type="text/css">
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
</style>
</head>
<body>
<h1 id="nginx-x-regex">Nginx x Regex</h1>
<p>Too many x’s in the title, right. Well that’s kind of required and interesting. In this post, we will be talking about <a href="https://www.nginx.com/">Nginx</a> the web server, <a href="https://en.wikipedia.org/wiki/Regular_expression">Regex</a> the legend’s way of searching and how I have combined the power of the two to get my work done.</p>
<p>What was my work? Well, to put it simply, it was to enable <a href="https://en.wikipedia.org/wiki/Common_Gateway_Interface">CGI</a> for every user. It sounds simple but it wasn’t. The details of how I completed my task will be discussed in another post and here today we will kind of build the base for it by discussing the background stuff.</p>
<h2 id="so-what-is-nginx">So what is Nginx?</h2>
<blockquote>
<p>nginx [engine x] is an HTTP and reverse proxy server, a mail proxy server, and a generic TCP/UDP proxy server, originally written by <a href="http://sysoev.ru/en/">Igor Sysoev</a>.</p>
</blockquote>
<p>The above is how Nginx <a href="https://nginx.org/en/">define themself</a>. For us simple folks, its a <a href="https://en.wikipedia.org/wiki/Web_server">web server</a>. Now people who know their networking stuff will ask “Why Nginx?”, “Why are you going for all the trouble when you can easily have <a href="https://httpd.apache.org">Apache</a> do the job for you?”. Well, here are my reasons:</p>
<ol type="1">
<li>Because <a href="https://www.digitalocean.com/community/tutorials/apache-vs-nginx-practical-considerations">some researches</a> indicate that nginx is faster and more efficient than apache.</li>
<li>Because it is much more challenging and fun than just doing <code>a2enmod</code>.</li>
</ol>
<p>So now that we are clear as to why I am using Nginx, let’s move on.</p>
<h2 id="what-is-regex">What is Regex?</h2>
<p>Expanding to <a href="https://en.wikipedia.org/wiki/Regular_expression">Regular Expression</a>, it is a sequence of characters that define a search pattern. Usually such patterns are used by string searching algorithms for “find” or “find and replace” operations on strings, or for input validation. It’s concept was given by <a href="https://en.wikipedia.org/wiki/Stephen_Cole_Kleene">Stephen Cole Kleene</a>.</p>
<p>Recently, I have found a mention of regex in our Theory of Computation classes. But honestly, I like it’s practical applications more than it’s theory.</p>
<p>Anyway, if you casually look at a regex it’s just a strange collection of characters. But so is programming a strange collection of words and numbers and we all the know how incredibly powerful it is. Similarly, regex are incredibly powerful if someone know how to handle them. <a href="https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285">Here</a> is a cheatsheet of regex if you want to get a taste of it.</p>
<p>Now comes the important thing, how regex is used in Nginx. Well, without getting much technical here I will just show you a simple demo cause the bulk of it will be dicussed in the next post. So this is what you add in your site’s configuration in Nginx for enabling <a href="http://publib.boulder.ibm.com/httpserv/manual70/howto/public_html.html">per-user web directories</a>:</p>
<pre><code>location ~ ^/~(.+?)(/.*)?$ {
alias /home/$1/public_html$2;
index index.html index.htm;
autoindex on;
}</code></pre>
<p>Now what is happening here??? First of all, focus only on the first two line cause the last two aren’t important right now. So, basically it is telling Nginx to look for the pattern <code>^/~(.+?)(/.*)?$</code> (this is the regex here and remember regex are patterns) in the URL and if a pattern is found then name that pattern <code>/home/$1/public_html$2</code>.</p>
<p>Thus, if you pass a URL like <code>https://code.gdy.club/~1715081/cgi-bin/info.php</code>, then it will extract this <code>/~1715081/cgi-bin/info.php</code> as the pattern and try to see if it matches the given regex or not. Now this one matches (believe me!) and to see how, we need to understand the given regex which we will be dissecting now (easier for those who have gone through the cheatsheet):</p>
<ol type="1">
<li>^ : This signifies the start of the regex.</li>
<li>/ and ~ : These are simple characters, meant to be matched exactly.</li>
<li>(): These are used for grouping, both of the times.</li>
<li>. : Meant to match any single character.</li>
<li><ul>
<li>: Meant to match one or more of the previous character (Here it’s ‘.’, i.e any single character)</li>
</ul></li>
<li>? : Meant to match zero or one of the previous character (Here it’s “.+”, i.e any character sequence with 1 or more characters).</li>
<li><strong>(.+?)</strong> : Thus this is a group having the smallest sequence of any possible characters (except null terminators).</li>
<li><ul>
<li>: Meant to match zero or more of the previous characters (Here it’s ‘.’, i.e any single character)</li>
</ul></li>
<li>(/.*)? : This is a group of characters that starts with ‘/’ and then goes on till the end.</li>
<li>$ : This signifies the end of the regex.</li>
</ol>
<p>Now if we make a one-to-matching between the pattern and the regex, we get:</p>
<ul>
<li>/~ = /~</li>
<li>(.+?) = 1715081</li>
<li>(/.*)? = /cgi-bin/info.php</li>
</ul>
<p>And since we have used grouping:</p>
<ul>
<li>$1 = (.+?) = 1715081</li>
<li>$2 = (/.*)? = /cgi-bin/info.php</li>
</ul>
<p>Finally replacing the variables $1 and $2 are replced and we get <code>/home/1715081/public_html/cgi-bin/info.php</code> which is the full path of the file on the server and now that Nginx has the address of the file show, it can show it properly.</p>
<p>* drinks a glass of water * “sigh!” Now that was quite long and sorry if I couldn’t make it completely clear to you. Regex is a matter of practise after all. Now in the next post, we will get into more details of it and how I got CGI to work on my Nginx server.</p>
<p>And since you have endured this long with me to learn regex, let’s save the world with it:</p>
<p><img src="https://imgs.xkcd.com/comics/regular_expressions.png" /></p>
<p>Ah, but I have used it before too, so here is another:</p>
<p><img src="https://www.explainxkcd.com/wiki/images/7/7b/regex_golf.png" /></p>
<p>And if you want more of this comic and regex, well <a href="https://www.explainxkcd.com/wiki/index.php/1313:_Regex_Golf">enjoy!</a></p>
</body>
</html>