The xmlresolver project provides an advanced implementation of the SAX
EntityResolver
, the Transformer URIResolver
, and a new
NamespaceResolver
. The implementation uses the OASIS XML Catalogs V1.1
Standard to provide a mapping from public identifiers to local
resources.
The xmlresolver can be found on Maven Central and has the coordinates:
<groupId>org.xmlresolver</groupId>
<artifactId>xmlresolver</artifactId>
In addition to enhanced support for RDDL-based namespace resolution, the implementation supports automatic local caching of resources. This provides the advantages of the catalog specification without requiring users to manage the mapping by hand.
Applications can use the resolver directly or they can instantiate one of a set of convenience classes to access parsers that automatically implement these resolvers.
The goal of this project is to produce a clean, reasonably simple API and a robust, thread-safe implementation.
See also: https://xmlresolver.org/
For guidelines about how to migrate your application from the Apache Commons resolver to this resolver, see documentation and examples in https://github.com/xmlresolver/resolver-migration
Version 6.x is a significant refactoring and is not (wholly) backwards compatible with version 5.x. (The underlying functionality is the same, but the API is slightly different.) The version 5.x sources are now in the legacy_v5 branch. Important bug fixes will be applied to the 5.x release for some time, but new development is focused on the 6.x release.
Three main considerations drove the refactoring:
- Correcting design errors. For example, using a
javax.xml.transform.Source
to return a non-XML resource. - Simplification of the design (removing the caching feature, for example)
- Bringing the Java and C# implementations into better alignment.
Where previously you would have instantiated an
org.xmlresolver.Resolver
and used it as the entity resolver for SAX
(and other) APIs, you should now instantiate an
org.xmlresolver.XMLResolver
. This new object has methods for
performing catalog lookup and resource resolution. It also has methods that
return resolver APIs. See Using an XML Resolver.
Behind the scenes, the API has been reworked so that most operations
consist of constructing a request for some resource, asking the XMLResolver
to either
(just) look it up in the catalog or resolve it, and returning a response.
The XML Resolver API is often integrated into other projects and products. On the one hand, this means that it’s valuable to publish new releases early so that integrators can test them. On the other hand, integrators quite reasonably want to make production releases with only the most stable versions.
In an effort to make this easier, starting with version 6.x, the XML Resolver releases will use an even/odd pattern version number strategy to identify development and stable branches.
If the second number in the verion is even, that’s a work-in-progress, stabalization release. Please test it, and report bugs. If the second number is odd, that’s a stable release. (Test that and report bugs too, obviously!)
In other words 6.0.x are stabalization releases. When the API is deemed stable, there will be a 6.1.0 release. If more features are developed or significant changes are undertaken, those will be published in a series of 6.2.x releases before stabalizing in a 6.3.0 release. Etc.
This version introduces a new API for registering a scheme resolver. This will allow a resolver to be configured, for example, to handle custom URI schemes as are sometimes found in products.
Several classes and interfaces are marked as deprecated. They were removed for several early 6.0.x releases but have been restored for binary backwards compatibility.
CatalogResover
,Resolver
,ResourceResolver
,StAXResolver
, andXercesResolver
are replaced by methods onXMLResolver
.Resource
,ResolvedResource
andResolvedResourceImpl
are replaced, effectively, byResourceRequest
andResourceResponse
.- All the classes related to caching.
The two main classes for users are XMLResolverConfiguration
(largely
unchanged) and XMLResolver
.
The new XMLResolver
object has methods for querying the catalog and
resolving resources. It also has methods that return resolvers for
different APIs.
getURIResolver()
returns ajavax.xml.transform.URIResolver
getLSResourceResolver()
returns aorg.w3c.dom.ls.LSResourceResolver
getEntityResolver()
returns aorg.xml.sax.EntityResolver
getEntityResolver2()
returns aorg.xml.sax.ext.EntityResolver2
getXMLResolver()
returns ajavax.xml.stream.XMLResolver
The standard contract for the Java resolver APIs is that they return
null
if the resolver doesn’t find a match. But on the modern web, lots
of URIs redirect (from http:
to https:
especially), and some
parsers don’t follow redirects. That causes the parse to fail in ways
that may not be easy for the user to fix.
By default, the XML Resolver will always resolve resources, follow redirects, and return a stream. This deprives the parser of the option to try something else, but means that redirects don’t cause the parse to fail.
If your implementation wants to explicitly just check the catalog, at
the Java API level, you can use the lookup
methods on XMLResolver
.
The resolver class can be configured with either system properties or a properties file. If a property is specified in both places, the system property wins.
It is now possible to use data:
URIs in the catalog. Data URIs are defined by
RFC 2397. For example, this catalog entry:
<uri name="http://example.com/example.xml"
uri="data:application/xml;base64,PGRvYz5JIHdhcyBhIGRhdGEgVVJJPC9kb2M+Cg=="/>
maps the URI http://example.com/example.xml
to a short XML
document defined by that data URI (<doc>I was a data URI</doc>
).
It is now possible to use classpath:
URIs in the catalog. It is also
possible to use classpath:
URIs in the catalog list. The
classpath:
URI scheme seems to be defined
somewhat informally by
the Spring framework.
In brief, a classpath:
URI is resolved by attempting to find a
document with the specified path in the classpath, including within
JAR files on the classpath.
This catalog entry:
<uri name="http://example.com/example.xml"
uri="classpath:path/example-doc.xml"/>
maps the URI http://example.com/example.xml
to a document with the path path/example-doc.xml
on
the classpath. (Searches always begin at the root of the classpath segments, so
path/example-doc.xml
and /path/example-doc.xml
are equivalent.)
Suppose, for example, that your classpath
includes /home/ndw/java/libs/example.jar
:
$ jar vtf /home/ndw/java/libs/example.jar
0 Wed May 05 14:51:50 BST 2021 META-INF/
25 Wed May 05 14:51:50 BST 2021 META-INF/MANIFEST.MF
0 Wed May 05 14:51:48 BST 2021 org/
0 Wed May 05 14:51:50 BST 2021 org/example/
3262 Wed May 05 14:51:50 BST 2021 org/example/DWIM.class
1831 Wed May 05 14:51:50 BST 2021 path/example-doc.xml
219 Wed May 05 14:51:50 BST 2021 path/something-else.txt
Assuming that this JAR file is the first place on your classpath where
path/example-doc.xml
occurs, then that’s the document that will be returned.
At this point, we expect the resolver to return that resource with a base URI of
classpath:path/example-doc.xml
. Unfortunately, if we do that, any attempt to resolve
a URI against that document’s base URI (for example, if example-doc.xml
contains an
XInclude with a relative href
value), will immediately fail. It fails constructing the
URI long before it calls the resolver to attempt to retrieve it.
To avoid this, the resolver lies. It returns the resource with the base URI set to the
resolved location, jar:file:///home/ndw/java/libs/example.jar!path/example-doc.xml
.
The URI class doesn’t resolve relative URIs against that base URI either, but at least
it doesn’t throw an exception.
The practical consequence of this is that the resolver never gets asked to resolve URIs made absolute against either of these forms of URI. If you put a document in a JAR file, make sure that all of its relative references (includes, imports, etc.) will resolve correctly. You can’t re-interpret them in the resolver.
If a project uses a particular schema, or set of schemas, it may be useful to add an additional catalog (or catalogs) to the user’s default catalog file path. That’s not currently practical: if both system properties and a property file are used to configure a resolver, and the same setting appears in both places, either the system property value is used (the default in 2.x) or the property file value is used (the default in 1.x).
So in order to add additional catalog files, you’d have to work out
the current value, construct a new value incorporating both that
default and your new catalog(s), and specify that new value in the
xml.catalog.files
system property.
To make this easier, see the xml.catalog.additions
system property,
and the catalog-additions
property file key. Both properties take
a list of catalog files. Those files will be added to the list defined
by the normal catalog settings.
The resolver is tolerant of errors in catalog files by design. A production application shouldn’t fall over because someone adds a bad catalog file to the catalog path. Instead, the errant file is simply ignored.
During development, it may be useful to take a more restrictive view.
Putting a typo in a catalog file is one common source of resolution
failures. More than once, I’ve spent time trying to track down a
resolver bug only to discover that I’d typed systemid
instead of
systemId
in a catalog file, or made some other similar error.
Obviously, editing your catalog files with a validating editor is one simply remedy to this problem, but experience suggests that’s not always what actually happens.
You can use the
catalog-loader-class property
to specify an alternate catalog loader. If you specify,
org.xmlresolver.loaders.ValidatingXmlLoader
, the catalog files will be validated
as they are loaded and validation errors will raise an exception.
In order to use this feature, you must have Jing version 20181222 on your classpath.
Starting in version 5.2.0, it is possible to construct a catalog directly from a stream of SAX events, rather than by parsing an input stream. (Shout out to @adamretter for providing this patch.)
This means that the catalog can be, for example, stored in a database where it may not be conveniently accessible as a character stream, or even stored in a completely novel format.
See the JavaDoc API for more details.
First clone the Git Repository:
$ git clone https://github.com/xmlresolver/xmlresolver.git
then enter the project folder, and pull in the Git sub-module data
:
$ cd xmlresolver
$ git submodule sync
$ git submodule update --init
To compile the project and check its tests you must have Java 21. It is
easiest to do with the provided Gradle Wrapper (i.e. ./gradlew
), but
you can use your own installed version of Gradle if it’s compatible
with Gradle 8.5 (the version of Gradle used by this project at the
time of this writing).
to build the project:
$ ./gradlew build
In order to run the tests, you must have Docker installed and you will need to build the container that runs the web server.
More details T.B.D.