-
Notifications
You must be signed in to change notification settings - Fork 126
Hello, reCaptcha, and the rest of the world
In this chapter, we will see how to plug an external API to the Opa platform -- here, Google reCaptcha, an API used to protect forms against spammers by attempting to determine whether the person filling the form is actually a human being. Along the way, we will introduce the Opa Binding System Library (or BSL) and some of the mechanisms provided by Opa to permit modular, safe programming.
As we will be interacting with JavaScript, some notions of the JavaScript syntax are useful to understand this chapter.
As in previous chapters, let us start with a picture of the application we will develop in this chapter:
This simple application prompts users to recognize words that are considered too difficult to read by a computer in the current state of the art of character recognition -- typically because Google's own resources have failed to make sense of these words -- and determines that a user is indeed a human being if the answer corresponds to that of sufficient other users through the world. The feature is provided by Google, as an API called reCaptcha. This API involves some code that must be executed on the client (to display the user interface and offer some interactivity) and some code that must be executed on the server (to contact Google's servers and check the validity of the answer).
If you are curious, this is the full source code of our application.
Of course, since the features are provided by Google reCaptcha, the most interesting aspects of the code are not to be found in the source of the application itself, but in how we define the binding of reCaptcha for Opa.
This is done in two parts. At high-level, we find the Opa package.
At low level, we find the JavaScript binding.
In the rest of the chapter, we will walk you through all the concepts and constructions introduced in these listings.
The documentation of the reCaptcha API
details five methods of a JavaScript object called Recaptcha
,
that should be called at distinct stages of the use of the API. Our first step
will therefore be to bind each of these methods to Opa, through the mechanism of
the Binding System Library.
TIP: About the BSL
The Binding System Library was developed to allow binding external features to Opa. The mechanism is used at low-level in Opa itself, to bind Opa to browser features, but also to server features.
At the time of this writing, the BSL provides native bindings with JavaScript and OCaml, and can be used to bind to most languages, including C.
This chapter shows how to plug Opa with some JavaScript client code.
For this purpose, we will create a file called recaptcha.js
, and that we
will populate with Opa-to-JavaScript bindings.
Let us start with initialization. The documentation states that any use of
the JavaScript API must initialize the Recaptcha
object as follows:
Recaptcha.create(pubkey,
id,
{
theme: theme,
callback: Recaptcha.focus_response_field
}
);
where pubkey
is a public key obtained from the reCaptcha admin site,
id
is the identifier of the UI component that will host the reCaptcha
and theme
is the name of the visual theme.
For our purpose, as we are binding the API, these three values should be function arguments, for a function accepting three character strings and returning nothing meaningful.
We specify this as follows:
##register init: string, string, string -> void
##args(id, pubkey, theme)
{
Recaptcha.create(pubkey,
id,
{
theme: theme,
callback: Recaptcha.focus_response_field
}
);
}
The first line registers a function called init
(we could have called it
create
, as the JavaScript method, but init
fits better with the Opa
conventions). This function is implemented in JavaScript and its type
is specified as string, string, string -> void
.
The second line gives names to arguments, respectively id
, pubkey
and theme
. If you are familiar with JavaScript, you can think of
BSL keyword ##args
as a counterpart to function
, with stricter
checks.
The rest of the extract is regular JavaScript, copied directly from the documentation of reCaptcha.
CAUTION: About
void
The only surprise may be that there is no final
return
. Indeed, in regular JavaScript, functions with noreturn
value actually returnundefined
. By opposition, Opa is stricter and does not allowundefined
values. Since our definition states thatinit
always returns avoid
, the BSL will apply an automatic transformation of the Javascrit code so that once bined in Opa, the function returnsvoid
.In Opa, functions always return a value, even if this value is
void
. This is true even of functions implemented in JavaScript.Therefore, if a BSL JavaScript function has type
... -> void
, once in Opa, it returnsvoid
.
The rest of the API is quite similar. Functions reload
and destroy
,
which serve respectively to display a new challenge or to clean the
screen once the reCaptcha has become useless, are bound as follows:
##register reload: -> void
##args()
{
Recaptcha.reload();
}
##register destroy: -> void
##args()
{
Recaptcha.destroy();
}
These bindings should not surprise you. Simply note that we write ##args()
if a function does not take any argument.
Binding functions get_challenge
and get_response
is quite similar. The
first of these functions returns an opaque string that can be used by the
reCaptcha server to determine which image has been sent to the user. The
second returns the text entered by the user.
##register get_challenge: -> string
##args()
{
return (Recaptcha.get_challenge()||"")
}
##register get_response: -> string
##args()
{
return (Recaptcha.get_response()||"")
}
Note that we do not return simply Recaptcha.get_challenge()
or
Recaptcha.get_response()
. Indeed, experience with the reCaptcha API shows
that, in some (undocumented) cases, these functions return value null
,
which is definitely not a valid Opa string
. For this purpose, we normalize
the null
value to the empty string ""
.
CAUTION: About
null
In Opa, the JavaScript value
null
is meaningless. An Opa function implemented as JavaScript and which returnsnull
(or an object in which some fields arenull
) is a programmer error.
We are now done with JavaScript.
The next step is to connect the BSL to the server component and wrap the features as a nice Opa API.
Looking at the documentation of reCaptcha, we may see that a reCaptcha accepts exactly three meaningful arguments:
- a private key, which we need to obtain manually from reCaptcha, and which should never be transmitted to the client, for security reasons;
- a public key, which we obtain along with the private key;
- an optional theme name.
We group these arguments as a record type, as follows:
type Recaptcha.config =
{
{
string privkey
}
cfg_private,
{
string pubkey,
option(string) theme
}
cfg_public
}
By convention, records which simply serve to group arguments under meaningful
names are called configurations and their name ends with .config
. Here,
in order to avoid confusions, we have split this record in two subrecords,
called respectively cfg_private
, for information that we want to keep on the
server, and cfg_public
, for information that can leave the client without
breaching security.
Also, looking at the documentation of the reCaptcha server-side API, we find out that communications with the reCaptcha server can yield the following results:
- a success;
- a failure, which may either mean that the user failed to identify the text, or that some other issue took place, including a communication error between Google servers.
Two additional error cases may appear:
- a communication error between your application and Google (typically, due to a network outage);
- a message returned by Google which does not match the documentation (unlikely but possible);
- an empty answer provided by the user, in which case communication should not even take place.
While these distinct results are represented as strings in the API, in Opa, we will prefer a sum type, which we define as follows:
type Recaptcha.success = {captcha_solved} /**The captcha is correct.*/
type Recaptcha.failure =
{ WebClient.failure captcha_not_reachable } /**Could not reach the distant server.*/
or { string upstream } /**Upstream failure. Could be a user error, but the code is not meant to be exploited.*/
or { list(string) unknown } /**Server could be reached, but produced an error that doesn't match the specifications
provided by Google. Possible cause: proxy problem.*/
or { empty_answer } /**Recaptcha guidelines mention that we should never send answers that are empty.*/
type Recaptcha.result = { Recaptcha.success success } or { Recaptcha.failure failure }
And finally, we define one last type, that of the reCaptcha implementation, as follows:
type Recaptcha.implementation = {
/**Place a request to the reCaptcha server to verify that user entry is correct.
@param challenge
@param response
@param callback*/
(string, string, (Recaptcha.result -> void) -> void) validate,
/**Reload the reCaptcha, displaying a different challenge*/
(-> void) reload,
/**Destroy the reCaptcha*/
(-> void) destroy
}
The role of this type is to encapsulate the functions of the reCaptcha after construction.
This type offers three fields. The first two will map respectively to the
reload
method we have bound earlier and to the destroy
method we have bound
earlier. The third, validate
, is a more powerful function whose role will be
to send to the reCaptcha server the challenge, the response entered by the user,
to wait for the server response and to trigger a function with the result.
TIP: Objects in Opa
Opa is not an Object-Oriented Programming Language in the traditional sense of the term. However, it is a Higher-Order Programming Language, which means among other things that all the major features of Object-Oriented Programming can be found in Opa, and more.
In our listing, values of type
Recaptcha.implementation
contain fields, some of which are functions -- not unlike Objects in OO languages may contain fields and methods.Although this is a slight misues of the term, we often call such records, that is records containing fields, some of which are functions, Objects.
Now that the types are ready, we can write the functions that manipulate them.
The documentation of reCaptcha mentions a JavaScript library, provided by reCaptcha, which needs to be loaded
prior to initializing the reCaptcha. We handle this in a function onready
which we will use shortly:
function void onready(string id, string pubkey, string theme)
{
Client.Script.load_uri_then(path_js_uri,
function()
{
(%% Recaptcha.init %%)(id, pubkey, theme)
}
)
}
This function makes use of Client.Script.load_uri_then
, a function provided by
the library to load a distant script and, once loading is complete, to invoke a
second function. First argument, path_js_uri
, is a constant representing the
URI at which the JS is available. We will define it a bit later. The second
argument is a function that makes use of a construction you have never seen:
(%% Recaptcha.init %%)
This is simply function init
, as we defined it a few minutes ago in JavaScript.
TIP: Calling the BSL
To use an Opa value defined in the BSL (that is, in JavaScript, C, OCaml, or any other language), the syntax is
(%% NameOfThePlugin.name_of_the_value %%)
Replace
NameOfThePlugin
by the name of the file you have passed toopa-plugin-builder
andname_of_the_value
by the name you have registered with##register
. Capitalization of plug-in names is ignored.The contains between the two
%%
is called the key of the external primitive.
In other words, this function onready
loads the JavaScript provided by Google,
then initializes the reCaptcha. To call this function, we will make use of a
Recaptcha.config
and a ready
event, to ensure that it is called only on a
browser that actually makes use of the reCaptcha. We will need this
initialization in our constructor for Recaptcha.implementation
.
TIP: Multiple loads
Module
Client.Script
provides several functions for loading JavaScript. These functions ensure that a JavaScript URI will only be loaded once. In other words, you do not have to worry about the same JavaScript being loaded and executed multiple times.
The second important function, which we will also need to build our
Recaptcha.implementation
, is the validation. We implement it as follows:
function validate(challenge, response, (Recaptcha.result -> void) callback)
{
//By convention, do not even post a request if the data is empty
if (String.is_empty(challenge) || String.is_empty(response))
{
callback({failure: {empty_answer}})
}
else
{
/**POST request, formatted as per API specifications*/
data = [ ("privatekey", privkey)
, ("remoteip", "{HttpRequest.get_ip()?(127.0.0.1)}")
, ("challenge", challenge)
, ("response", response)
]
/**Handle POST failures, decode reCaptcha responses, convert this to [reCaptcha.result].*/
function with_result(res)
{
match (res)
{
case ~{failure}:
callback({failure: {captcha_not_reachable: failure}})
case ~{success}:
details = String.explode("\n", success.content)
match (details)
{
case ["true" | _]: callback({success: {captcha_solved}})
case ["false", code | _]: callback({failure: {upstream: code}})
default: callback({failure: {unknown: details}})
}
}
}
/**Encode arguments, POST them*/
WebClient.Post.try_post_with_options_async(path_validate_uri,
WebClient.Post.of_form({WebClient.Post.default_options with content: {some: data}}),
with_result)
}
}
Although this listing uses several functions that you have not seen yet, it should not surprise you.
The first part responds immediately if the challenge or the response is negative. This complies with
the specification of reCaptcha, in addition to saving precious resources on your server. We then
build data
, a list of association of the arguments expected by the reCaptcha API, and with_result
,
a function that handles the return of our {post}
request by analyzing the resulting string
.
Note the use of function WebClient.Post.of_form
, which converts a request using a list of
associations, as are built by HTML forms, into a raw, string
-based request. Also note function
String.explode
, which splits a string in a list of substrings.
From validate
, as well as our JavaScript implementations of reload
and destroy
, we may now
construct our Recaptcha.implementation
, as follows:
function Recaptcha.implementation make_implementation(string privkey)
{
function validate(challenge, response, (Recaptcha.result -> void) callback)
{
//By convention, do not even post a request if the data is empty
if (String.is_empty(challenge) || String.is_empty(response))
{
callback({failure: {empty_answer}})
}
else
{
/**POST request, formatted as per API specifications*/
data = [ ("privatekey", privkey)
, ("remoteip", "{HttpRequest.get_ip()?(127.0.0.1)}")
, ("challenge", challenge)
, ("response", response)
]
/**Handle POST failures, decode reCaptcha responses, convert this to [reCaptcha.result].*/
function with_result(res)
{
match (res)
{
case ~{failure}:
callback({failure: {captcha_not_reachable: failure}})
case ~{success}:
details = String.explode("\n", success.content)
match (details)
{
case ["true" | _]: callback({success: {captcha_solved}})
case ["false", code | _]: callback({failure: {upstream: code}})
default: callback({failure: {unknown: details}})
}
}
}
/**Encode arguments, POST them*/
WebClient.Post.try_post_with_options_async(path_validate_uri,
WebClient.Post.of_form({WebClient.Post.default_options with content: {some: data}}),
with_result)
}
}
{~validate, reload:cl_reload, destroy:cl_destroy}
}
With this function, we implement a new function make
, to construct both the
Recaptcha.implementation
and the user interface xhtml
component that connects
to this implementation:
function (Recaptcha.implementation, xhtml) make(Recaptcha.config config) {
id = Dom.fresh_id();
xhtml = <div id={id} onready={function(_) { onready(id, config.cfg_public.pubkey, config.cfg_public.theme?"red") }}/>;
(make_implementation(config.cfg_private.privkey), xhtml);
}
One more utility function will be useful, to read the user interface and clean it up immediately:
function {string challenge, string response} get_token() {
result = ({ challenge: (%%Recaptcha.get_challenge%%)()
, response: (%%Recaptcha.get_response%%)()
});
(%%Recaptcha.destroy%%)();
result;
}
With this function, we now have exposed all the features we need to use a reCaptcha. However, at this stage, we are exposing a great deal of the implementation. Consequently, our next step will be to make the implementation abstract and to encapsulate the features in a module.
It is generally a good idea to split large (or even small) projects into packages, as follows:
package tutorial.recaptcha
TIP: About packages
In Opa, a package is a unit of compilation and abstraction. Packages can be distributed separately as compiled code, packages can hold private values, abstract types, etc.
The following declaration states that the current file is part of package
package_name
:package package_name
Conversely, the following declaration states that the current file makes use of package
package_name
:import package_name
Generally, in Opa, packages are assembled into package hierarchies, using reverse domain notation.
Now that we have placed our code in a package, we may decide that
some of our types are abstract, by adding an @abstract
directive as follows:
abstract type Recaptcha.implementation = { ... }
TIP: About abstract types
An abstract type is a type whose definition can only be used in the package in which it was defined, although its name might be used outside of the package.
This feature provides a very powerful mechanism for avoiding errors and ensuring that data invariants remain unbroken, but also to ensure that third-party developers do not base their work on characteristics that may change at a later stage.
Consider the following example:
package example abstract type cost = floatIn this example, type
cost
is defined as a synonym offloat
. In packageexample
, anyfloat
can be used as acost
and reciprocally. However, outside of packageexample
,cost
andfloat
are two distinct types. In particular, we have ensured that only the code of packageexample
can create new values of typecost
. If our specifications expect that acost
is always strictly positive, we have succesfully restriced the perimeter of the code we need to check to ensure that this invariant remains unbroken: only packageexample
needs to be verified.
With this change, methods validate
, destroy
and reload
can now be called only
from our package. As these methods offer important features, we certainly wish to
provide some form of access to the methods, as follows:
function void validate(Recaptcha.implementation implementation, (Recaptcha.result -> void) callback) {
t = get_token();
implementation.validate(t.challenge, t.response, callback);
}
function void reload(Recaptcha.implementation implementation) {
implementation.reload();
}
function void destroy(Recaptcha.implementation implementation) {
implementation.destroy();
}
For further modularization, we will group all our functions as a module, and take the opportunity to hide the function and constants that should not be called from the outside of the module:
module Recaptcha
{
function (Recaptcha.implementation, xhtml) make(Recaptcha.config config) {
id = Dom.fresh_id();
xhtml = <div id={id} onready={function(_) { onready(id, config.cfg_public.pubkey, config.cfg_public.theme?"red") }}/>;
(make_implementation(config.cfg_private.privkey), xhtml);
}
function void validate(Recaptcha.implementation implementation, (Recaptcha.result -> void) callback) {
t = get_token();
implementation.validate(t.challenge, t.response, callback);
}
function void reload(Recaptcha.implementation implementation) {
implementation.reload();
}
function void destroy(Recaptcha.implementation implementation) {
implementation.destroy();
}
private path_validate_uri =
Option.get(Parser.try_parse(Uri.uri_parser, "http://www.google.com/recaptcha/api/verify"));
private path_js_uri =
Option.get(Parser.try_parse(Uri.uri_parser, "http://www.google.com/recaptcha/api/js/recaptcha_ajax.js"));
private function void onready(string id, string pubkey, string theme) {
Client.Script.load_uri_then(path_js_uri,
function()
{
(%% Recaptcha.init %%)(id, pubkey, theme)
}
);
}
private cl_reload =
%%Recaptcha.reload%% /**Implementation of [reload]*/
private cl_destroy =
%%Recaptcha.destroy%% /**Implementation of [destroy]*/
private function Recaptcha.implementation make_implementation(string privkey) {
function validate(challenge, response, (Recaptcha.result -> void) callback) {
//By convention, do not even post a request if the data is empty
if (String.is_empty(challenge) || String.is_empty(response)) {
callback({failure: {empty_answer}})
} else {
/**POST request, formatted as per API specifications*/
data = [ ("privatekey", privkey)
, ("remoteip", "{HttpRequest.get_ip()?(127.0.0.1)}")
, ("challenge", challenge)
, ("response", response)
]
/**Handle POST failures, decode reCaptcha responses, convert this to [reCaptcha.result].*/
function with_result(res) {
match (res) {
case ~{failure}:
callback({failure: {captcha_not_reachable: failure}})
case ~{success}:
details = String.explode("\n", success.content)
match (details) {
case ["true" | _]: callback({success: {captcha_solved}})
case ["false", code | _]: callback({failure: {upstream: code}})
default: callback({failure: {unknown: details}})
}
}
}
/**Encode arguments, POST them*/
WebClient.Post.try_post_with_options_async(path_validate_uri,
WebClient.Post.of_form({WebClient.Post.default_options with content: {some: data}}),
with_result)
}
}
{~validate, reload:cl_reload, destroy:cl_destroy}
}
private function {string challenge, string response} get_token() {
result = ({ challenge: (%%Recaptcha.get_challenge%%)()
, response: (%%Recaptcha.get_response%%)()
});
(%%Recaptcha.destroy%%)();
result;
}
}
TIP: About modules
A module is an extended form of record introduced with
module ModuleName { ... }
instead of{ ... }
.Modules offer a few syntactic enrichments:
- any field may be declared as
private
, which forbids from using it outside of the module;- fields can be declared in any order, even if they have dependencies (including circular dependencies).
In addition, modules are typed slightly differently from regular records. We will detail this in another chapter.
And with this, our binding is complete.
We may compile our package with
opa recaptcha.js hello_recaptcha.opa
Let us recapitulate the Opa source code: Source
As mentioned, we have defined Recaptcha.implementation
as an object. This is a
good reflex when extending the Opa platform through additional BSL bindings that
use data structures can be implemented only on one side.
In Opa, data can be transmitted transparently between the client and the server.
This is impossible for data that is meaningful only on the client. This is the
case here, as JavaScript object Recaptcha
, by definition, exists only on the
client. However, wrapping the JavaScript data structure and the functions that
manipulate it as an object ensures that the user only ever needs to access
methods -- and such methods can always be transmitted from the client to the
server, making the object data structure side-independent.
TIP: Making objects
When a BSL extension to the Opa platform introduces a data structure implemented only on one side, the user must never manipulate this data structure directly. Always hide this data structure behind an object, whose only fields are functions.
- A tour of Opa
- Getting started
- Hello, chat
- Hello, wiki
- Hello, web services
- Hello, web services -- client
- Hello, database
- Hello, reCaptcha, and the rest of the world
- The core language
- Developing for the web
- Xml parsers
- The database
- Low-level MongoDB support
- Running Executables
- The type system
- Filename extensions