Low level MongoDB support
{block}[WARNING] This chapter is about advanced uses of MongoDB in Opa and details low-level access to MongoDB. Most applications do not need this level of access and should rely on the higher-level database support described elsewhere in the manual instead. {block}
In this chapter, we describe the current state of support for MongoDB in the Opa standard library. We assume some familiarity with MongoDB concepts and particularly with the MongoDB shell. This familiarization can be gained by reading the MongoDB tutorial.
MongoDB is a server-based, document-oriented, non-relational database intended to be scalable and fast. Documents are stored in a binary JSON-like format called BSON. Although BSON has a richer set of types than JSON, any JSON document can be represented in BSON. For speed, MongoDB does not implement joins; instead it provides a powerful query language of its own, and almost anything that can be done with a relational database can be implemented in MongoDB with a little effort (see MongoDB's page on SQL compatibility).
In addition, MongoDB allows multiple indices into its data although these are not automatic and have to be initiated in client code. MongoDB is intended to be deployed in reliable large-scale web-based applications and thus has features which facilitate scalability such as sharding and master-slave arrangements of servers along with features for reliability such as replicated servers with fail-over.
Backups of MongoDB data are usually done either offline on a slave server in the network using external tools or to redundant nodes in the MongoDB server network.
If you are not familiar with the MongoDB database, here are some quick instructions to get you going. Firstly, make sure that you have MongoDB installed on your system:
% which mongod
Note that MongoDB does not yet ship with major distributions such as Ubuntu, but installation is trivial: download the latest version from the MongoDB downloads site and unpack the files locally. You should then just have to add the bin directory to your path and you should be up and running.
To run a MongoDB server, you first have to create a directory to store the database files. In fact, you need a directory for each node you wish to run; see the MongoDB documentation for how to create replica sets, sharding etc. At its simplest, start a mongod server with:
% mkdir -p ~/mongodata/master
% mongod --rest --oplogSize 500 --noprealloc --master --dbpath ~/mongodata/master > ~/mongodata/master/log.txt 2>&1 &
Use the --oplogSize and --noprealloc options to limit the initial allocated disk space (the default is about 1 GB). The --rest option allows you to monitor your database via the HTTP interface (found at the port number plus 1000). If you wish to run the server on a different port, use the --port option; the default MongoDB server port is 27017. Note, however, that to run the MongoDB shell against a non-default port you also need the --port option:
% mongo --port 27017
MongoDB shell version: 2.0.1
connecting to: test
>
For the Opa MongoDB driver we recommend MongoDB version 1.6.0 or greater, since much of the current functionality was mature by that version. We always recommend the current MongoDB stable version (2.0.2 at the time of writing), but for the most part the driver is quite stable with respect to backwards compatibility.
The Opa support for MongoDB consists of a hierarchy of modules leading to successively higher-level programming.
Support for the BSON binary format is in the form of the Bson module; all other modules are built on top of this one. In general, BSON values are handled by the Bson.document Opa data-type but we also provide the Bson.opa2doc and Bson.doc2opa functions to allow conversion between Opa types and BSON documents.
The MongoCommon module contains general support routines for dealing with replies from the MongoDB server. These include:
- printing results to meaningful strings
- testing results for error status
- handling tag lists instead of bit-mapped integers
- extracting fields and Opa types from MongoDB replies
The code which talks to the MongoDB server is in the private MongoDriver module. This includes support for replica sets with automatic reconnection on fail-over and cursors, but for programming at this level we provide a single all-purpose module called MongoConnection.
Advanced programmers wishing to use some of the more obscure features of MongoDB can use the driver code directly but this is not recommended. MongoDB has a complex API involving over 70 functions and many of the simple access commands have numerous options. Our intention with this driver is to make accessing MongoDB databases as simple and logical as possible while still exposing the power and flexibility of the MongoDB engine.
As an adjunct to the low-level programming interface we provide a module called MongoCommands containing a large (but still incomplete) number of the MongoDB command set. These encompass most functions that will be required for meta-programming the MongoDB database, such as dropDatabase, repairDatabase, createCollection and so on, plus functions associated with normal database access operations such as getLastError. The more advanced MongoDB functionality is also supported here, including findAndModify and the very powerful mapReduce function.
These commands occur in two flavors: those which return Bson.document values and those which convert their results into Opa types. If you are only looking for a single value out of a large and complex reply document then using the Bson module access functions on the raw BSON may be more efficient. If you intend complex analysis of the reply then the Opa types may be more convenient. At the present time only partial support is provided for Opa types. Some command results may never be treated this way because they include arbitrary field names which we can't safely convert into Opa types.
The MongoCollection module represents a type-safe view of the low-level routines in MongoConnection. Here, we insist upon Opa types as arguments and results from MongoDB operations. This necessarily limits what we can put into the database since the BSON documents stored in the database have to be consistent with the Opa types they represent. To achieve this, we have implemented the MongoSelect and MongoUpdate modules, which enforce a type discipline upon the arguments to, for example, MongoCollection.insert. The type safety is implemented as run-time type checks so there is a significant performance penalty for using these routines. In the future, however, we will provide fully type-safe compile-time type checks along the lines of the Opa internal database.
Here, we provide some notes on programming with the Opa MongoDB driver. The full interface is too large for complete coverage here; refer to the online Opa API documentation for detailed notes on each function.
The full Opa BSON data-type is as follows:
/**
* A BSON value encapsulates the types used by MongoDB.
**/
type Bson.value =
{ float Double }
or { string String }
or { Bson.document Document }
or { Bson.document Array }
or { string Binary }
or { string ObjectID }
or { bool Boolean }
or { Date.date Date }
or { Null }
or { (string, string) Regexp }
or { string Code }
or { string Symbol }
or { (string, Bson.document) CodeScope }
or { int Int32 }
or { int32 RealInt32 }
or { (int, int) Timestamp }
or { int Int64 }
or { int64 RealInt64 }
or { Min }
or { Max }
/**
* A BSON element is a named value.
**/
type Bson.element = { string name, Bson.value value }
/**
* The main exported type, a BSON document is just a list of elements.
*/
type Bson.document = list(Bson.element)
While values of this type can be constructed manually:
doc = Bson.document
[{name: "$eval", value: {Code:"function(x,y) \{return x*y;}"}},
{name: "args", value:{Array:[{name:"0", value:{Int32:6}},
{name:"1", value:{Int32:7}}]}}]
there are two more convenient ways of constructing BSON values.
Firstly, we provide a set of abbreviations in the Bson.Abbrevs module:
H = Bson.Abbrevs
doc = Bson.document [H.code("$eval","function(x,y) \{return x*y;}"),
H.valarr("args",[{Int32:6},{Int32:7}])]
Secondly, we can construct the values in Opa and use Bson.opa2doc:
doc = Bson.opa2doc({`$eval`:(Bson.code "function(x,y) \{return x*y;}"),
args:(list(Bson.int32) [6,7])})
Notice that to get a field with non-alphanumeric characters we have to back-quote the field name in the Opa value and that to control the representation in the BSON type we can apply helper types; for example, Bson.code is just a string but it instructs Bson.opa2doc to treat it as code. Remember also to escape curly brackets in strings.
Note that to get Int32 values you need the Bson.int32 type; the default for int is actually Bson.int64.
There are several such types provided by the Bson module but some merit special mention:
- Optional types have a special significance with respect to Bson.doc2opa in that if a field value is missing in the document it will appear in the Opa type as {none}. The reverse direction does not apply: {none} values are represented in the BSON document as { none : null }.
- We take this one step further, however, with the Bson.register type, which behaves much as option('a) except that when we call Bson.opa2doc any {absent} values are omitted from the resulting document altogether. Note that there is a module Bson.Register which provides the same functionality for Bson.register as the Option module does for type option.
type Bson.register('a) = {'a present} or {absent}
- Care should be taken in dealing with integer values which may have been placed into the database outside of Opa. Opa uses, internally, the OCaml integer representation int, which is actually 31 bits wide on 32-bit systems and 63 bits wide on 64-bit systems (the spare bit is reserved by the garbage collector). MongoDB, however, uses full 32-bit and 64-bit integers, which means that it is possible to find an integer value in a MongoDB database which is too large for the Opa representation (all values generated by Opa and stored in the database are guaranteed to be within range). Currently, Opa only has 32-bit and 64-bit integers as abstract values. Such values can be stored in Opa as an external type (int32 and int64) but no operations are possible on these values (they are sometimes needed by external libraries). We handle this situation in the MongoDB driver by automatically detecting overflow values and storing them as RealInt32 and RealInt64 when returning Bson.document types from the driver. While these values may appear to be invisible to the Bson module functions such as find_int, you can detect overflows by inspecting the document values:
match (value) {
case {RealInt32:_}: error("overflow");
case {Int32:i}: i;
default: error("not an int");
}
- The Bson.meta type is intended to support situations where MongoDB can return a field of different types depending upon the nature of the command executed. A good example of this is the out option to the mapReduce function, which can be either a string or a document type. We cast the parameter as Bson.meta, which allows us to control the type at the function's application. We can also apply this trick to the result type from mapReduce calls:
mr = MC.mapReduceSimple(mongodb,map,reduce,{String:"example1"})
/* or */
mr = MC.mapReduceSimple(mongodb,map,reduce,{Document:[H.str("reduce","session_stat")]})
- Two other cases should be mentioned. Both list and intmap are mapped onto Array values in BSON. The difference is that list is mapped to consecutive-numbered elements in the Array document whereas intmap allows sparse arrays.
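The overflow check described in the list above can be applied to a whole document rather than a single value. The following is a minimal sketch, assuming the Bson.value and Bson.element types exactly as listed earlier and the standard List.exists function; has_overflow is a hypothetical helper name, not part of the Bson module, and it only inspects top-level elements.

```opa
/* Hypothetical helper (not part of the Bson module):
   true if any top-level element holds an overflowed integer */
function bool has_overflow(Bson.document doc) {
  List.exists(
    function(element) {
      match (element.value) {
        case {RealInt32:_}: true
        case {RealInt64:_}: true
        default: false
      }
    },
    doc
  )
}
```

A caller could run this on a reply before using accessors such as find_int, which may not see the overflowed values.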
As a rough guide to Bson.opa2doc and Bson.doc2opa, the following simple schema shows the mapping:
/* We use a "natural" mapping of constant types */
float <-> Double
string <-> String
Bson.binary <-> Binary
Bson.oid <-> ObjectID
bool <-> Boolean
Date.date <-> Date
void <-> Null
Bson.regexp <-> Regexp
Bson.code <-> Code
Bson.symbol <-> Symbol
Bson.codescope <-> CodeScope
Bson.int32 <-> Int32
Bson.realint32 <-> Int32
Bson.timestamp <-> Timestamp
Bson.realint64 <-> Int64
Bson.min <-> Min
Bson.max <-> Max
/* Basic record scheme */
{a:'a; b:'b} <-> { a: 'a, b: 'b }
/* Sum types */
{a:'a} / {b:'b} <-> { a: 'a } <or> { b: 'b }
/* Non-record types are called "value" */
'a <-> { value: 'a }
/* Special cases */
/* Default for int is Int64 */
int <-> Int64
/* Overflow */
Bson.realint32 <- Int32 /* when integer exceeds range */
Bson.realint64 <- Int64 /* when integer exceeds range */
/* Options */
option('a):
{some=a} <-> { some : 'a }
{none} <-> { none : null }
{none} <- { }
/* Registers */
Bson.register('a):
{present=a} <-> { present : 'a }
{absent} <- { absent : null }
{absent} <-> { }
/* Lists are consecutive arrays */
list('a) <-> { Array=(<label>,{ 0:'a; 1:'a; ... }) }
/* Intmaps are non-consecutive arrays */
ordered_map(int,'a) <or>
intmap('a) <-> { Array=(<label>,{ 1:'a; 3:'a; ... }) }
/* Bson.document is treated verbatim (including labels) */
Bson.document <-> Bson.document
/* Bson.meta is treated as a variable type */
int:Bson.meta <-> { Int64:int }
string:Bson.meta <-> { String:string }
bool:Bson.meta <-> { Boolean:bool }
etc.
Notes:
- For ObjectID values, there are a couple of routines which convert between (hex value) strings and the BSON representation, Bson.oid_of_string and Bson.oid_to_string. You can also create a BSON-style OID value with Bson.new_oid.
- Bson.document types are completely write-through, i.e. they are not processed at all.
- In case you're wondering, Min and Max are used in sharded databases to indicate infimum and supremum bounds on sharding regions, respectively.
Connecting to and using the low-level drivers should be done using the MongoConnection module. This gathers together various low-level features in a single module. The preferred method is to use the system of named connections, which can be defined from the command line or set up internally using the Mongo.param type and the MongoConnection.add_named_connection function. Initially, there is one default connection (called ''default'') which is set to localhost:27017, the default port for MongoDB servers on the local machine. To open this connection use:
mongodb =
match (MongoConnection.open("default")) {
case {success:mongodb}: mongodb
case {~failure}: ... /* take action on error */
}
/* or */
mongodb = MongoConnection.openfatal("default")
The MongoConnection.open function returns an outcome of either the connection or the standard Mongo.failure type, whereas the MongoConnection.openfatal function returns just the connection but treats a failed connection as a fatal error.
To set up the connection from the command line the following options are defined:
{table}
{* Option | Abbreviation and type | Description *}
{| --mongo-name | (--mn) <string> | Name for the MongoDB server connection |}
{| --mongo-repl-name | (--mr) <string> | Replica set name for the MongoDB server |}
{| --mongo-buf-size | (--mb) <int> | Hint for initial MongoDB connection buffer size |}
{| --mongo-socket-pool | (--mp) <int> | Number of sockets in socket pool (>=2 enables socket pool) |}
{| --mongo-seed | (--ms) <host>{:<port>} | Add a seed to a replica set, allows multiple seeds |}
{| --mongo-host | (--mh) <host>{:<port>} | Host name of a MongoDB server, overwrites any previous hosts |}
{| --mongo-log | (--ml) <bool> | Enable MongoLog logging |}
{| --mongo-log-type | (--mt) <string> | Type of logging: stdout, stderr, logger, none |}
{| --mongo-auth | (--ma) <user:pwd@dbname> | Define user name and password for database dbname |}
{table}
So, for example, to point the default connection at machinexyz:12345 you would use:
% prog.js --mh machinexyz:12345
This remains a single connection. To connect to a replica set you also need to define a name for the replica set plus some seeds:
% prog.js --mn blort --mr blort --ms machinexyz:27017 --ms machineuvw:27017
Here we have defined a connection called ''blort'' to a replica set also called ''blort'' with two seed machines. Remember that you only really need one seed which is active in the set: the connection logic queries the seeds for the actual host list and then polls the hosts until it finds the current primary server. From then on reconnection will be attempted if the current primary goes down.
Note that you can define as many named connections as you like, this example still retains the default connection.
Note also that you can clone a connection such that the connection itself will not be closed until all clones have already been closed.
Handling concurrency within an Opa program is done by a socket pool. This means that a pool of open connections is maintained to the same server such that blocking only occurs if there are no more available connections in the pool (set with --mp 2, for example). If you ensure that the pool size is at least as big as the number of threads in your code then no blocking will occur.
Named connections can also be defined within the program:
MongoConnection.add_named_connection({
name: "blort",
replname: {some: "blort"},
bufsize: 50*1024,
pool_max: 2,
log: false,
seeds:[("localhost",10001),("localhost",10002)],
auth:[{dbname:"mydb",user:"me",password:"secret"}]
})
mongodb2 = MongoConnection.openfatal("blort")
Once a connection has been opened, it can be pointed to different databases and collections using a functional interface. The default database is ''db'' and the default collection is ''collection'' but we can make a connection to a different collection without re-opening the connection as follows:
mongodb_wiki = MongoConnection.namespace(mongodb,"db","wiki")
This mechanism also applies to the flags that some of the MongoDB operations can take, for example to set the Upsert flag on a connection:
mongodb3 = MongoConnection.upsert(mongodb)
This method is quite flexible since you can define these flags once when the connection is made, making the flags globally persistent, or you can add these function calls at the point of calling the operation, i.e. locally defined flags (there are examples below). All of the MongoDB flags are supported in this way.
One particular flag is worth mentioning: the log flag, which can be set on the command line and can actually be overridden in this way, allowing you to generate logs for targeted sections of code. In fact, you can change any of the command line options this way but bear in mind that some of them, for example seed lists, will not take effect until the connection is reconnected.
As you can see, you can add the MongoDB authentication parameters for a given database either on the command line using the --mongo-auth argument, which is of the form user:password@database_name, or by placing the authentication parameters in the auth field in the add_named_connection function argument. Alternatively, you can call the MongoCommands.authenticate function to perform an additional, external authentication. Note that if you are connecting to a replica set then the driver needs to re-authenticate after connecting to the new host, so the authentication parameters are built into the low-level Mongo datatype. This means that if you call MongoCommands.authenticate you should perform all subsequent operations on the returned Mongo datatype, not on the original, which won't have the parameters built in.
Remember that authentication in MongoDB is to a database, not to a connection, so you can have multiple user names and passwords associated with a single connection. If you want to authenticate with all of the databases over a connection you need to authenticate with the admin database, which acts a bit like ''root'' access for databases.
The basic database access operations are the same as the MongoDB protocol operations, i.e. insert, update, query, get_more, delete, kill_cursors and msg. So, for example, to insert a document:
/* A couple of documents */
p1 = [H.str("name","Joe1"), H.i32("age",44)]
p2 = [H.str("name","Joe2"), H.i32("age",55)]
/* Insert the documents */
MongoConnection.insert(mongodb,p1)
MongoConnection.insert_batch(mongodb,[p1,p2])
The basic write operations come in three types:
- insert is the write-and-forget operation where the insert message is sent and a boolean value is returned which simply states whether the correct number of bytes was written to the socket.
- inserte is a ''safe'' operation where the insert message has a getlasterror query piggy-backed onto it and then the raw optional reply is returned.
- insert_result does an inserte and then analyzes the reply, turning it into a standard Mongo.result type.
All of the basic write operations have these three forms.
The Mongo.result type is an outcome of either success as a Bson.document type or failure as a Mongo.failure type. The Mongo.failure type looks like:
type Mongo.failure =
{OK}
or {string Error}
or {Bson.document DocError}
or {Incomplete}
or {NotFound}
This defines either a raw document error {DocError:doc}, which is an error as reported by the MongoDB server, a driver error {Error:str}, which is a message generated by the Opa driver, or a few special-purpose errors returned under specific circumstances ({OK} simply denotes a connection that has never been used).
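Since Mongo.result is an ordinary outcome, it can also be inspected by pattern matching. The following is a minimal sketch, assuming the Mongo.failure type exactly as listed above; describe_result is a hypothetical helper name, and the fall-through case relies on the MongoCommon.string_of_failure function used elsewhere in this chapter.

```opa
/* Hypothetical helper: turn a Mongo.result into a short description */
function string describe_result(Mongo.result result) {
  match (result) {
    case {success:_}: "success"
    case {failure:{NotFound}}: "no matching document"
    case {failure:{Error:msg}}: "driver error: {msg}"
    /* Any other failure, e.g. DocError, Incomplete or OK */
    case {~failure}: "failure: {MongoCommon.string_of_failure(failure)}"
  }
}
```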
Post-processing of results may include checking for errors:
error = MongoConnection.insert_result(MongoConnection.upsert(mongodb),[H.i32("i",n)])
println("insert error={MongoCommon.is_error(error)}")
or extracting specific fields from the reply:
println("errmsg={MongoCommon.result_string(error,"errmsg")}")
noting that we also support the MongoDB dot notation syntax:
println("indexSizes._id_={MongoCommon.dotresult_int(collStats,"indexSizes._id_")}")
Closing a connection is as simple as:
MongoConnection.close(mongodb)
Remember that the connection will only close once all of the clones have also been closed.
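Putting the pieces of this section together, a complete low-level session might look like the following minimal sketch. It uses only calls shown earlier in this chapter; the function name demo, the ''wiki'' collection and the sample values are illustrative assumptions.

```opa
H = Bson.Abbrevs

/* Illustrative session: open, select a namespace, insert safely, close */
function demo() {
  mongodb = MongoConnection.openfatal("default")
  /* Point the same connection at the db.wiki collection */
  wiki = MongoConnection.namespace(mongodb,"db","wiki")
  /* Safe insert: the reply to getlasterror is returned as a Mongo.result */
  result = MongoConnection.insert_result(wiki,[H.str("name","Joe1"), H.i32("age",44)])
  println("insert error={MongoCommon.is_error(result)}")
  MongoConnection.close(mongodb)
}
```

Using insert_result rather than insert costs an extra round trip but lets the caller see server-side errors before the connection is closed.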
Handling queries in MongoDB has the complication that, for efficiency, cursors are stored on the server, which entails tracking them at the client side. While the bare MongoConnection.query and MongoConnection.get_more operations can be used to handle queries in conjunction with the reply support code in MongoCommon, they are a bit inconvenient. For this purpose we have defined cursor operations in the MongoCursor module and re-exported the most important ones into the MongoConnection.Cursor module.
A cursor object itself contains all the parameters needed to manage the cursor at the server side and, in fact, duplicates some of the information in the connection object. Using the re-exported functions reduces the number of parameters to the basic functions since this information can be lifted from the connection into the cursor object. Here is an example of a low-level cursor dialog:
Here is an example of a low-level cursor dialog:
cursor = MongoConnection.Cursor.init(mongodb)
cursor = MongoConnection.Cursor.set_query(cursor,{some:[H.str("name","Joe")]})
cursor = MongoConnection.Cursor.set_limit(cursor,3)
cursor = MongoConnection.Cursor.set_fields(cursor,{some:[H.i32("_id",0)]})
cursor = MongoConnection.Cursor.next(cursor)
result = MongoConnection.Cursor.check_cursor_error(cursor)
println("result 1 = {MongoCommon.pretty_of_result(result)}")
println("valid 1 ={MongoConnection.Cursor.valid(cursor)}")
cursor = MongoConnection.Cursor.next(cursor)
result = MongoConnection.Cursor.check_cursor_error(cursor)
println("result 2 = {MongoCommon.pretty_of_result(result)}")
println("valid 2 = {MongoConnection.Cursor.valid(cursor)}")
MongoConnection.Cursor.reset(cursor)
The cursor is initialized with init and then the parameters for the query are set up. The next function generates the query (or get_more) call to the server and places the next document internally in the cursor object along with any error status. The check_cursor_error function is a convenient way of extracting either the current document or the error as a Mongo.result. Subsequent calls to next will either return the next document from the previous reply or issue a get_more call to re-populate the cursor.
The end of the matching documents (or the case where no document matches) is signaled with NotFound, and if you try to read past the end of matching documents you will get an ''end of data'' error from the driver. The valid function is used to poll whether there is any remaining data. Finally, the call to reset is important here because it doesn't just end the query: it will issue a kill_cursors operation to the server to tell it to delete the cursor (cursors time out after 10 minutes by default on the MongoDB server).
This method works fine but this logic has been wrapped up into some convenience functions:
- find_one returns the first matching document as a Mongo.result
- find_all gives all the matches as a list of documents (use the limit function to limit the number of replies).
For example:
/* Find all objects in db.session, excluding the _id field */
mongo_session_no_id =
MongoConnection.fields(MongoConnection.namespace(mongodb,"db","session"),{some:[H.i32("_id",0)]})
println("findAll: {MongoCommon.pretty_of_results(MongoConnection.Cursor.find_all(mongo_session_no_id,[]))}")
You can also define custom loops over the matches using start (or find) in conjunction with next and valid. (Note that you must use the MongoConnection.Cursor.for loop instead of the more usual for function in the Opa stdlib: you need to check valid and only call next if the cursor is still valid at that point, otherwise you will miss the last document in the list of matches.)
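For the single-document case, find_one can be sketched in the same way as the find_all example. This is only a sketch: the text above does not show the arguments of find_one, so the call shape below (a connection followed by a select document, as for find_all) is an assumption, as is the sample select document.

```opa
H = Bson.Abbrevs

/* Assumed call shape: same (connection, select document) arguments as find_all */
result = MongoConnection.Cursor.find_one(mongodb,[H.str("name","Joe1")])
/* find_one returns a Mongo.result, so the usual reply helpers apply */
println("find_one: {MongoCommon.pretty_of_result(result)}")
```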
While you can achieve anything that MongoDB is capable of using the low-level drivers, there are no guarantees of type safety while converting between BSON documents and Opa values. You can of course base your entire project around BSON values and eliminate the need for converting between MongoDB's documents and Opa types altogether but this may not be very convenient depending upon what is happening elsewhere in your application. Secondly, to use the low-level drivers requires an investment in learning MongoDB's powerful but rather complex interface (which may be new to users of relational databases) in order to exploit what MongoDB has to offer. Finally, basing your application on MongoDB's API will tie your application to MongoDB and you may at some point in the future wish to migrate to other database solutions.
Ultimately, the intention is to provide an abstract view of the database which is general enough to encompass several of the existing database solutions, of which MongoDB is an important player, and support this with compiler-generated syntax in the manner of the Opa inbuilt database. This support is not yet available but we can offer an intermediate layer of programming MongoDB whereby we assume collections of Opa types and support type-safety by performing run-time type-checks on operations over these collections. This support is in the form of the MongoCollection module plus some support modules for generating values suitable to be applied to these functions.
The central idea in the MongoCollection module is a collection (in the MongoDB terminology sense) of Opa values. This is embodied in the Mongo.collection type, which is extremely simple: it's just a MongoConnection value cast to the specific type of the values to be stored in the collection:
type Mongo.collection('a) = {
Mongo.mongodb db /* the mongodb connection */
}
When a value is stored in the collection it is automatically converted from its Opa type into a matching BSON document and vice versa for queries.
While this sounds simple there are a number of pitfalls to watch out for. We assume that any offline modifications of the collection will not create any incompatible values. If, for example, we add or delete a field from a record then the entry can no longer be represented as an Opa type.
To overcome this problem we place checks in the code to verify the suitability of documents read from the collection and an error will be generated if any such values are found. We also provide features to allow handling of this situation in some specific circumstances; for example, if you type a field in the collection as Bson.register it will allow you to successfully read in values with missing fields, but this is not recommended for collections. Ultimately, it is up to the maintainer of the database to ensure that the values stored there are consistent with the application's usage of the collection.
Despite these provisos, using a collection is very simple and gives the programmer the ability to integrate Opa types with the MongoDB system without having to understand the underlying complexity of the database and with a modest level of type-safety. The cost, for the moment, is the overhead of the run-time type-checks which will slow down database operations.
A simple dialog for creating and manipulating a collection might be as follows:
/* The type of our first collection */
type t = {int i}
/* Create a collection of type t */
Mongo.collection(t) c1 = MongoCollection.openfatal("default","db","collection")
/* Put a single value into the collection */
result = MongoCollection.insert_result(c1,{i:0})
/* Finally, destroy the collection */
MongoCollection.destroy(c1)
We define a type for the collection (type t) so that when we open a connection to the database we can cast the resulting collection object and thus install the correct run-time representation of the type. The openfatal function returns a collection and treats a connection failure as fatal. There are several variants of the open function. A collection is a pointer to a specific collection in the database (here, db.collection) and we create a connection to the MongoDB server using the connection name (in this instance, default).
Inserting a value into the collection is trivial: the value is simply passed as it is to the insert function (here we use the safe insert_result function, which also returns the result of a getlasterror call). The insert has exactly the same effect as a call to MongoConnection.insert but with the value automatically converted into a BSON document using the scheme outlined above. The call to MongoCollection.destroy should not be forgotten because this closes the underlying connection.
While the insert function is trivial, we need more care with update and delete. The problem is that to maintain our level of type-safety we need to match select (and update) documents with the type of the collection they are applied to. We do this with a system of run-time type-checks applied to the select documents. For example:
/* Create pre-typed select and update generation functions */
MongoSelect.create createst = Bson.document -> Mongo.select(t)
MongoUpdate.create createut = Bson.document -> Mongo.update(t)
/* Generate the select documents */
select = createst(MongoSelectUpdate.int64(MongoSelectUpdate.empty(),"i",0))
update = createut(MongoSelectUpdate.inc(MongoSelectUpdate.int64(MongoSelectUpdate.empty(),"i",1)))
/* We can now apply update to these documents */
result = MongoCollection.update_result(c1,select,update)
Firstly, we use the MongoSelectUpdate module to generate the basic documents. Note that we could also have used the Bson.opa2doc function to achieve the same result:
select = createst(Bson.opa2doc({i:0}))
update = createut(Bson.opa2doc({`$inc`:{i:1}}))
The choice between these two styles may depend upon the type of document being
generated.
The Opa type-based versions are more readable but the MongoSelectUpdate
ones
are much faster since no conversion is required.
The select documents have to be correctly typed for the collection they apply to
so we generate a couple of convenience functions createst
and createut
to do
the casting for us.
Secondly, once we have these documents we can apply the update
function to
them but note that although a select document is just a typed Bson.document
it
triggers a set of suitability tests.
These tests are complex and probably do not cover all possible MongoDB
operations. Briefly, the select document is scanned against a knowledge base of
MongoDB operators: for example, $inc only applies to updates, $and only applies
to selects, whereas $comment can apply to both.
Once the status (select/update/both) is determined, the type of the resulting
values is determined from the select document and is verified to be a subtype of
the type of the collection.
So, for example, {int a}
is a subtype of {int a, string b}
but {int a, bool c}
is not.
Presently, we only print a suitable warning, but in future, once these routines
have fully matured, we may return an error value.
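The two checks described above can be modelled in a few lines. The sketch below is written in Python rather than Opa and does not reproduce the driver's code. It assumes a small operator table (the three operators named in the text plus a few well-known ones), treats unknown operators as valid in both kinds of document, and reduces record types to dictionaries from field name to type name. The names OPERATOR_STATUS, merge, classify and is_subtype are hypothetical.

```python
# Illustrative model only: not the Opa driver's implementation.
# Which kind of document each MongoDB operator may appear in.
OPERATOR_STATUS = {
    "$inc": "update",
    "$set": "update",
    "$and": "select",
    "$gt": "select",
    "$lt": "select",
    "$comment": "both",
}

def merge(a, b):
    """Combine two statuses; mixing select-only and update-only is a conflict."""
    if a == "conflict" or b == "conflict":
        return "conflict"
    if a == "both":
        return b
    if b == "both" or a == b:
        return a
    return "conflict"

def classify(doc):
    """Return 'select', 'update', 'both' or 'conflict' for a document."""
    status = "both"
    for key, value in doc.items():
        if key.startswith("$"):
            # Assumption: operators missing from the table are allowed anywhere.
            status = merge(status, OPERATOR_STATUS.get(key, "both"))
        # Operators may be nested inside sub-documents or lists of them.
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                status = merge(status, classify(child))
    return status

def is_subtype(fields, collection_type):
    """True if every field is present in the collection type with the same type."""
    return all(collection_type.get(name) == typ for name, typ in fields.items())
```

With this model, the record {int a} passes against {int a, string b} because its only field is present with the same type, while {int a, bool c} fails because the collection type has no field c.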
All of the basic database write operations occur in both send-and-forget and
send-with-getlasterror forms: insert, insert_result, insert_batch,
insert_batch_result, update, update_result, delete and delete_result.
As an aside, notice that we use a similar functional interface for flags as for the low-level code:
MongoCollection.delete(MongoCollection.singleRemove(c1),createst(Bson.opa2doc({i:104})))
The select mechanism applies to queries as well but in this case we have to be careful what types we return from the database:
result = MongoCollection.find_one(c1,createst(Bson.opa2doc({`$where`:(Bson.code "this.i > 106")})))
match (result) {
case {success:{~i}}: println("i={i}")
case {~failure}: println("error={MongoCommon.string_of_failure(failure)}")
}
This example returns the first value in the collection for which i is greater
than 106; it expresses the select as a JavaScript expression.
Many of the MongoDB query methods, such as the $where example here, are
perfectly safe with collections. Some are not, in that they return documents
which contain fields other than those in the Opa type. A good example is the
$explain documents (http://www.mongodb.org/display/DOCS/Explain), which are a
set of statistical data concerning the given query (see the Mongo.explainType
type in MongoCommands).
In general, we attempt to support such features with special purpose functions
rather than via the normal database operations.
The usual simplified query functions are present in MongoCollection: find_one
and find_all.
There are also two functions which return the bare Bson.document
representation of the result, find_one_doc and find_all_doc, which may be
useful in the above situation where the result of the query is not compatible
with Opa types.
For more general query scanning, the cursor-based routines are available.
For example, the following code scans the results of a MongoCollection query:
query = createst(Bson.opa2doc({i:{`$gt`:102, `$lt`:106}}))
match (MongoCollection.query(MongoCollection.limit(c1,0),query)) {
case {success:cc1}:
  /* Step through the cursor until it is exhausted or a read fails. */
  cc1 =
    while(cc1,(function(cc1) {
      match (MongoCollection.next(cc1)) {
      case (cc1,{success:{~i}}):
        println("i={i}")
        (cc1,MongoCollection.has_more(cc1))
      case (cc1,{~failure}):
        println("error={MongoCommon.string_of_failure(failure)}")
        (cc1,false)
      }
    }))
  /* Tell the server to destroy the cursor. */
  MongoCollection.kill(cc1)
case {~failure}:
  println("error={MongoCommon.string_of_failure(failure)}")
}
In this code, we create a Mongo.collection_cursor
object using
MongoCollection.query
to which we can then apply the collection-specific cursor
functions MongoCollection.next
and MongoCollection.has_more
.
This allows arbitrary processing of collection queries.
Remember, as with the low-level cursors above, that the MongoCollection.kill
function does not just end the scan, it also sends a kill_cursors
message to
the MongoDB server to tell it to destroy the cursor.
Another aside in this code is that we set the limit value to 0, which means
"use the default number of documents per reply".
If we had set this to 1, we would only ever get one document in the reply
because MongoDB treats this as a special case, i.e. "just return one document".
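The special cases for the limit value can be summarised as a small decision function. This is a Python sketch, not driver code. It assumes the limit is passed through unchanged as the numberToReturn field of the MongoDB wire protocol's OP_QUERY message, where a negative value (or 1, which the server treats as -1) means "return that many documents and close the cursor". The name reply_policy is hypothetical.

```python
def reply_policy(number_to_return):
    """Model how the server treats a limit value.

    Returns (documents_per_reply, cursor_stays_open); documents_per_reply
    is None when the server's default batch size applies.
    """
    if number_to_return == 0:
        # Use the default number of documents per reply.
        return (None, True)
    if number_to_return == 1 or number_to_return < 0:
        # 1 is treated as -1: send that many documents, then close the cursor.
        return (abs(number_to_return), False)
    # Any other positive value limits the reply size but keeps the cursor.
    return (number_to_return, True)
```

Under this model a limit of 1 can never be used to page through results one at a time, because no cursor remains open after the first reply.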
Again, to help with the situation where return values may be incompatible with
Opa types, we provide the _unsafe variants of the query functions.
These, for example query_unsafe, take an additional boolean flag,
ignore_incomplete, which instructs the driver to simply ignore any returned
documents which have missing fields and are thus not compatible with Opa types.
MongoDB will actually return partial documents if the document matches the query
document but does not contain all of the fields (an exception is the _id field,
which is always returned unless specifically excluded with the return field
selector document).
These functions should be used with care.
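The effect of ignore_incomplete can be pictured as a filter applied while decoding replies. The Python sketch below is only a model under stated assumptions: the text does not say what happens to an incomplete document when the flag is false, so raising an error here is a guess, as is projecting each document onto the expected fields. The name decode_results is hypothetical.

```python
def decode_results(docs, required_fields, ignore_incomplete):
    """Keep only documents that carry every field of the expected type."""
    results = []
    for doc in docs:
        missing = [f for f in required_fields if f not in doc]
        if not missing:
            # Assumption: extra fields such as _id are dropped on decoding.
            results.append({f: doc[f] for f in required_fields})
        elif not ignore_incomplete:
            # Assumption: without the flag, a partial document is an error.
            raise ValueError(f"document missing fields: {missing}")
        # Otherwise the partial document is silently skipped.
    return results
```

The silent skip is the reason for the warning above: a caller cannot tell an empty result from one in which every document was incomplete.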
Apart from the support described here the MongoCollection
module also provides
a few convenience functions such as creating indexes using collection objects
and some direct support for some of the aggregation functions (count
,
distinct
and group
).
Finally, two variants of the open function, openpkg and openpkgfatal, supply a
set of pre-cast versions of MongoSelect.create and MongoUpdate.create.
In this section, we describe how to convert the hello_wiki example described
in the previous chapter to use the MongoDB database.
This is a straightforward process which uses MongoDB as a simple key-value
storage database.
The first task is to open a connection to the database.
We are going to use collections and in fact, we will use the version of open
which also gives us the casting functions for selects:
/**
* The basic info. about the database and table location.
*/
type page = {
string _id,
Bson.int32 _rev,
string content
}
/**
* We work at level 1, run-time type-checked storage of a collection of Opa values.
* The Mongo.pkg type provides convenience functions for building select and update documents.
**/
Mongo.pkg(page) (wiki_collection,wiki_pkg) = MongoCollection.openpkgfatal("default","db","wiki");
function pageselect(v) { wiki_pkg.select(Bson.opa2doc(v)); }
function pageupdate(v) { wiki_pkg.update(Bson.opa2doc(v)); }
The _rev field has been cast to Bson.int32 so we can use 32-bit integers for
this field (it is unlikely we will ever have more than 2 billion revisions of
any value in the database!).
We then open our connection using the default named connection and connect to
the collection db.wiki
.
This returns a collection object plus a package of values which we use to build
our select documents.
Next we are actually going to search for documents including the _rev
field so
we can't just use the default index for our collection (the _id
field):
/**
* Indexes aren't automatic in MongoDB apart from the non-removable _id index.
* Since we're searching on _rev as well, we need a separate index.
**/
MongoCollection.create_index(wiki_collection, "db.wiki", Bson.opa2doc({_id:1, _rev:1}), 0)
The get_content
function can then be modified using a simple call to
MongoCollection.find_one
:
function get_content(docid) {
default_page = "This page is empty. Double-click to edit."
function extract_content(page record) { record.content }
/* Order by reverse _rev to get highest numbered _rev. */
orderby = {some:Bson.opa2doc({_rev:-1})}
match (MongoCollection.find_one(MongoCollection.orderby(wiki_collection,orderby),pageselect({_id:docid}))) {
case {success:page}: extract_content(page)
case {failure:{NotFound}}: default_page
case {~failure}:
jlog("hello_wiki_mongo: failure={MongoCommon.string_of_failure(failure)}")
default_page
}
}
We search the database for the given _id value but we want the
highest-numbered _rev field, so we sort in descending order on that field (the
default ordering for numerical fields is increasing).
A missing document is signaled by the NotFound failure condition; other failure
values are errors.
Finally, the save_source
function becomes a call to
MongoCollection.update_result
:
exposed function save_source(topic, source) {
select = pageselect({_id:topic})
update = pageupdate({`$set`:{content:source}, `$inc`:{_rev:(Bson.int32 1)}})
/* Upsert this so we create it if it isn't there */
result = MongoCollection.update_result(MongoCollection.upsert(wiki_collection),select,update);
if (MongoCommon.is_error(result)) {
<>Error: {MongoCommon.pretty_of_result(result)}</>
} else {
load_rendered(topic)
}
}
In this case, we select only on the _id field and we update the document by
setting the content field and incrementing the _rev field.
Note that we use the Upsert flag, which tells MongoDB to insert the document if
it isn't already present in the collection.
We test the result for errors using the safe update operation, but apart from
that the code is identical to the existing "Hello wiki" example.
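For readers unfamiliar with upserts, the behaviour save_source relies on can be modelled with an in-memory list standing in for the collection. This is a Python sketch of the MongoDB semantics for the $set and $inc operators only, not of the Opa driver. The function name upsert and the equality-only matching are simplifying assumptions.

```python
def upsert(collection, select, update):
    """Apply $set/$inc to the first matching document, inserting one if needed."""
    for doc in collection:
        if all(doc.get(k) == v for k, v in select.items()):
            target = doc
            break
    else:
        # No match: seed a new document from the equality fields of the select.
        target = dict(select)
        collection.append(target)
    for field, value in update.get("$set", {}).items():
        target[field] = value
    for field, amount in update.get("$inc", {}).items():
        # $inc on a missing field sets it to the increment.
        target[field] = target.get(field, 0) + amount
    return target

# Saving the same topic twice creates the page once, then bumps its revision.
wiki = []
upsert(wiki, {"_id": "home"}, {"$set": {"content": "v1"}, "$inc": {"_rev": 1}})
upsert(wiki, {"_id": "home"}, {"$set": {"content": "v2"}, "$inc": {"_rev": 1}})
```

After the two calls the list holds a single document for "home" whose content is "v2" and whose _rev is 2, which is why the first save of a page needs no separate insert.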