MOM CA: What happens when an http request comes in?
Work in progress. Bernhard is on it
We are using the eXist native XML database, which is Java-based under the hood. By default it ships with a lightweight server called Jetty. Jetty is both a web server and a Java servlet container: the web server part communicates with the outside world, and the servlet container part provides a habitat for Java servlets ("servlet" like "applet", only living on the server). By default, and also in our case, eXist itself is deployed as a Java servlet.
When an HTTP request comes in, it is handled by the Jetty server. The web application running inside Jetty is configured by a deployment descriptor called web.xml, which in the case of mom-ca is located at mom.XRX/localhost/webapp/WEB-INF/web.xml.
In this file you can see all the servlets that live inside the Jetty servlet container, each handling a different task. Look around the web.xml file: there is a servlet for XForms, one for database monitoring called JMXServlet, one for the WebDAV interface called "milton", an XSLTServlet, which "applies an XSLT transformation to its input stream", one for SOAP access, and more.
And then there is the XQueryURLRewrite servlet, which is essential for getting HTTP requests to the correct place.
In the comments of the web.xml it says:
<!--
IMPORTANT: the XQueryURLRewrite servlet filter does now serve as a single
entry point into the web application. All eXist-related URL mappings are
handled by XQueryURLRewrite (see controller-config.xml).
[...]
-->
<!-- XQuery URL rewriter -->
<servlet-mapping>
<servlet-name>XQueryURLRewrite</servlet-name>
<url-pattern>/*</url-pattern>
</servlet-mapping>
The url-pattern /* ("everything") is what makes it the "single entry point".
As can be seen from other parts of the web.xml file, the actual controller-config.xml is located at mom.XRX/localhost/webapp/WEB-INF/controller-config.xml.
It is strongly recommended to take a look inside this file! Since this is the single entry-point into the web application, everything within its realm can be seen here. All the basic path mappings and URL rewriting rules can be found in this file.
In the first half of the controller-config.xml you find rules to forward requests to the different servlets based on the URL pattern.
Next comes information about which database collections are accessed when a specific pattern is encountered. Among the entries are these:
<exist:root pattern="/fs" path="/" xmlns:exist="http://exist.sourceforge.net/NS/exist"/>
<exist:root xmlns:exist="http://exist.sourceforge.net/NS/exist" pattern="/mom/" path="xmldb:exist:///db/XRX.live/mom/"/>
<exist:root pattern=".*" path="/" xmlns:exist="http://exist.sourceforge.net/NS/exist"/>
I am not quite sure yet what exactly the mappings pattern="/fs" -> path="/" and pattern=".*" -> path="/" are needed for.
But pattern="/mom/" -> path="xmldb:exist:///db/XRX.live/mom/" means that every request whose path matches /mom/ ends up in the /db/XRX.live/mom/ collection in the database. There, eXist's default behaviour is to look inside the collection and, if a controller.xql file is present, to execute it. The /db/XRX.live/mom/controller.xql file checks whether the URL contains specific patterns (mostly whether it starts or ends with a specific string). Most of these checks resemble the following snippet:
else if(ends-with($exist:resource, '.jpg') or ends-with($exist:resource, '.png') or ends-with($exist:resource, '.gif')) then
<dispatch xmlns="http://exist.sourceforge.net/NS/exist">
<forward url="{ concat('res/img/', $exist:resource) }"/>
</dispatch>
This forwards the request to the appropriate collection in the database, in this case for example /db/XRX.live/mom/res/img/foo.jpg.
If the URL matches none of the tested patterns (mostly images, JavaScript files, fonts), the last statement kicks in:
(: main dispatcher :)
else
<dispatch xmlns="http://exist.sourceforge.net/NS/exist">
<forward url="{ $xrx-relative-path }mom/app/xrx/main.xql"/>
<cache-control cache="no"/>
</dispatch>
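Putting both branches together, the decision made in controller.xql can be modelled in a few lines of Python. This is only an illustrative sketch, not the mom-ca code: the function name `dispatch`, the returned dictionary and the default value of `xrx_relative_path` are assumptions, and only the image branch and the fallback shown above are covered.

```python
# Illustrative model of the controller.xql decision; not the actual mom-ca code.
IMAGE_SUFFIXES = (".jpg", ".png", ".gif")


def dispatch(resource: str, xrx_relative_path: str = "") -> dict:
    """Return the forward target for a requested resource name.

    `resource` plays the role of $exist:resource, `xrx_relative_path` the role
    of $xrx-relative-path (its real value is not known here, hence the
    empty default, which is an assumption).
    """
    if resource.endswith(IMAGE_SUFFIXES):
        # Image branch: forward to the image collection.
        return {"forward": "res/img/" + resource, "cache": None}
    # Fallback ("main dispatcher"): everything else goes to main.xql, uncached.
    return {"forward": xrx_relative_path + "mom/app/xrx/main.xql", "cache": "no"}


print(dispatch("foo.jpg"))
print(dispatch("charters"))
```

The real controller tests more patterns (JavaScript files, fonts and so on) before it reaches the fallback; they would simply be further branches ahead of the last return.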
And this is where we enter the inner XRX++ circle for the first time and where the magic begins.
So, we move inside the /db/XRX.live/mom/app/xrx/main.xql file.
The importance of this main.xql file cannot be overstated. It is hidden so deep inside the collection hierarchy that you might easily overlook it, but it is the de facto heart of the entire application. Once you thoroughly understand what is going on in this file, the "black box" gains transparency.
Go ahead and open the eXide IDE from the eXist Dashboard, then open the /db/XRX.live/mom/app/xrx/main.xql file.
First of all, in main.xql you will find namespace declarations. A lot of them. Don't be intimidated. Reminder: namespaces in XML are just names, nothing more. They are only used to unambiguously identify the "members" belonging to a namespace. For example, when you have two functions with the same name, you put a different namespace prefix in front of each and can then clearly tell them apart.
The namespaces are grouped. Look at them, they are well commented.
Then there is a huge bulk of uncommented namespaces. It originates from the #DECLARE_XRX_NAMESPACES instruction in the source file mom.XRX/my/XRX/src/core/app/xrx/main.xql, which is expanded at compile time into the bulk of namespaces you see.
Next come the imports of all the modules that are used within the application context.
The expression import module namespace foo = "http://bar.com/whatever"; means that you import the module identified by the namespace "http://bar.com/whatever" and make it accessible inside your own code via the foo prefix. So if a function x() is present in this module, you can now call it as foo:x() inside your own code.
First come four ordinary, pretty standard modules, which eXist knows how to find: kwic, meaning "keywords in context" (used, for example, for highlighting keywords in returned search results), json, excel and datetime. Everyday modules.
Next comes the huge bulk of XRX modules (the source code says #IMPORT_XRX_MODULES, which is expanded at compile time). Because eXist does not know where to find them, the location of each module is specified with the keyword "at" followed by a file path. Look at the file paths. They follow a pattern:
All of them look like this: "../foo/bar.xqm"
So in every single XRX module import, we go one collection upwards in the database, which puts us in the /db/XRX.live/mom/app collection. Then we go into the foo collection and import the bar.xqm sitting there. ".xqm" is a naming convention for XQuery modules, so everyone knows it is a module.
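The path arithmetic can be checked with a small Python sketch. It assumes that the relative location is resolved against the collection containing main.xql; `resolve_import` is a hypothetical helper name, and foo/bar.xqm are the same placeholders as in the text.

```python
import posixpath


def resolve_import(importing_file: str, location: str) -> str:
    """Resolve a relative module location against the importing file's collection."""
    base_collection = posixpath.dirname(importing_file)
    # normpath collapses the ".." step, i.e. it goes one collection upwards.
    return posixpath.normpath(posixpath.join(base_collection, location))


main_xql = "/db/XRX.live/mom/app/xrx/main.xql"
print(resolve_import(main_xql, "../foo/bar.xqm"))
# prints /db/XRX.live/mom/app/foo/bar.xqm
```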
At this point, every module defined in the collections under /db/XRX.live/mom/app/ is operational! In eXide, look at the left side of the user interface. Under "Outline" you see a staggering number of functions from the imported modules. Scroll through the list and compare the names to catch a glimpse of all the functionality that is now callable!
Now we go on to see how all this power is harnessed.
Same file as before: go to the last line of code in /db/XRX.live/mom/app/xrx/main.xql.
The smileys, (: and :), are just the XQuery way of commenting.
(:
we resolve the incoming URI and
call the xrx mode we found in the
project's URI resolver
:)
local:main()
We call the local:main() function, which is declared right above the call. Inside it we enter a kind of switch, where the action to take is decided based on $xrx:mode. Where is this mode coming from? We got it ourselves when we imported the module bound to the "xrx" prefix earlier. $xrx:mode also shows up in the eXide Outline mentioned earlier.
Now the fun part: when we look inside /db/XRX.live/mom/app/xrx/xrx.xqm (remember, we specified this location with the "at" clause of the import module instruction), we see that the resolver mentioned in the comment above is defined here:
(: the resolver :)
declare variable $xrx:resolver :=
resolver:resolve($xrx:live-project-db-base-collection);
(: the mode defined in the resolver :)
declare variable $xrx:mode :=
$xrx:resolver/xrx:mode/text();
To shorten this a little: based on the request URL, the resolver dives into the right collection, reads the foo.app.xml "manifest" files, and extracts the content of the <xrx:mode> element inside the <xrx:map> element whose <xrx:uripattern> matches the current request's URL. If you like, look at the resolver:resolve() function inside the /db/XRX.live/mom/app/xrx/resolver.xqm file. All it does is loop through all the <xrx:uripattern> elements and see which one matches. The <xrx:mode> elements specify which mode applies in the context of the respective foo.app.xml (for example atom, mainWidget, service etc.).
See the Resolving section of this wiki for more information.
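The "loop through the patterns and take the one that matches" idea can be sketched in Python as follows. This is a simplified model, not the real resolver:resolve(): the entries in `MAPS` are invented stand-ins for <xrx:map> elements, the URLs are made up, and treating <xrx:uripattern> as a regular expression matched against the whole URL is an assumption.

```python
import re
from typing import Optional

# Hypothetical stand-ins for the <xrx:map> entries of the foo.app.xml manifests.
MAPS = [
    {"uripattern": r"/mom/atom/.*", "mode": "atom"},
    {"uripattern": r"/mom/service/.*", "mode": "service"},
    {"uripattern": r"/mom/.*", "mode": "mainWidget"},
]


def resolve_mode(url: str, maps=MAPS) -> Optional[str]:
    """Return the mode of the first map whose pattern matches the URL, else None."""
    for entry in maps:
        if re.fullmatch(entry["uripattern"], url):
            return entry["mode"]
    return None


print(resolve_mode("/mom/atom/feed"))
print(resolve_mode("/mom/home"))
```

In this sketch the first match wins, so the order of the entries matters; how the real resolver ranks several matching patterns is not covered here.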
Based on the mode that was defined inside the foo.app.xml the application now acts in the appropriate way and calls the respective mode-function. Look at the different mode functions and see what they do.
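The switch on the mode can be pictured as a lookup table from mode name to mode function. The Python sketch below only illustrates that shape: the handler names and their return values are made up, and only the three mode names mentioned in this page (atom, mainWidget, service) are used.

```python
# Hypothetical mode functions; the real ones live in main.xql and do real work.
def mode_atom(url: str) -> str:
    return f"atom response for {url}"


def mode_main_widget(url: str) -> str:
    return f"rendered widget page for {url}"


def mode_service(url: str) -> str:
    return f"service result for {url}"


MODE_HANDLERS = {
    "atom": mode_atom,
    "mainWidget": mode_main_widget,
    "service": mode_service,
}


def main(mode: str, url: str) -> str:
    """Call the function that belongs to the resolved mode."""
    handler = MODE_HANDLERS.get(mode)
    if handler is None:
        raise ValueError(f"unknown mode: {mode}")
    return handler(url)


print(main("atom", "/mom/atom/feed"))
```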
With the huge bulk of imported modules and their functionality (including already declared variables) at our disposal, combined with the information obtained from the respective "manifest" foo.app.xml file (xrx:uripattern and everything), we have everything we need to control how the request is handled.
Just follow the flow of the main.xql file and look inside the "../foo/bar.xqm" files if you don't know what a function does or what a variable contains. The namespace prefixes mentioned earlier are your best friends, because they immediately tell you in which module (the respective bar.xqm) you need to look.
So, this is how the HTTP request is handled! There is a lot going on under the hood, and every single request sets off this entire chain. Many of the variables are defined differently depending on the current request's URL: a different URL may mean a different $xrx:mode. This is how things are done within mom-ca.
Now some people might wonder why there is no huge performance hit when this enormous amount of code is executed every single time a request comes in. The reason is the caching facilities of eXist: the queries you run are cached in memory in an optimized form, hidden from the average programmer, and they execute very fast once they are cached.
You may have noticed that after you (re-)start your database instance, the first query takes ages to load. This is because the query has not yet been cached at that point. eXist's considerable hunger for memory is connected to this as well.
Quick teaser for what is contained in our own xrx vocabulary: the xrx.xsd file in core/app/xrx/xsd/ only defines the format in which to feed information to main.xql and all the imported modules. Use it like a dictionary when in doubt about what the application framework expects or what to tell it.
A dictionary "xrx -> English" will follow soon.