XSDInferencer is an automatic XML Schema generation tool: given a set of positive XML examples, it generates XSD files against which those examples validate. It also gathers statistics from the content of those documents.
The XSDs generated by this tool are intended to be as comprehensible as possible, and take advantage of many seldom-used XSD features to achieve this. They are useful to formally describe the common structure of the input documents and to find structural errors in other similar documents.
Statistics go further and complement the structural information provided by the XSD with information about content: which content is more likely to occur in similar files and which is not. The combination of XSD and statistics makes XSDInferencer a suitable tool for XML analysis.
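As an illustration of how a generated schema can be used to find structural errors in similar documents, here is a minimal sketch based on the standard JAXP validation API (javax.xml.validation). It is not part of XSDInferencer: the class name ValidateSketch, the method firstError and the sample schema are hypothetical, and the schema is hand-written here to stand in for one the tool might produce.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ValidateSketch {

    // Hand-written sample schema (an assumption, standing in for a generated XSD).
    static final String SAMPLE_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "  <xs:element name=\"item\">"
      + "    <xs:complexType>"
      + "      <xs:sequence>"
      + "        <xs:element name=\"price\" type=\"xs:decimal\"/>"
      + "      </xs:sequence>"
      + "      <xs:attribute name=\"id\" type=\"xs:integer\" use=\"required\"/>"
      + "    </xs:complexType>"
      + "  </xs:element>"
      + "</xs:schema>";

    /** Returns null if xml validates against xsd, or the first error message otherwise. */
    static String firstError(String xsd, String xml) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema;
        try {
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // The schema itself could not be compiled.
            throw new IllegalArgumentException("Invalid schema: " + e.getMessage(), e);
        }
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            // With the default error handler, the first validation error is thrown.
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<item id=\"1\"><price>9.5</price></item>";
        String bad = "<item id=\"1\"><cost>9.5</cost></item>";
        System.out.println("good: " + firstError(SAMPLE_XSD, good));
        System.out.println("bad:  " + firstError(SAMPLE_XSD, bad));
    }
}
```

In practice the schema and the documents would be read from files (for example with new StreamSource(new File(...))) rather than from in-memory strings.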
Main benefits of XSDInferencer:
- 100% conformant with XML and XSD specifications, including correct namespace handling.
- Highly configurable, so that generated schemas better fit the user's needs.
- Modular and easily extensible.
- It can infer simple and complex types based not only on node names, but also on the context in which the nodes occur.
- Predefined XSD simple types can be detected: xs:boolean, xs:integer, xs:decimal and xs:string.
- Enumerated simple types can be inferred, if desired.
- Optional and required attributes are detected.
- Complex admissible child structures can be inferred, including (but not limited to):
  - Choices
  - Sequences
  - Unordered elements (xs:all)
  - Optional elements
  - Repeated elements
  - Any valid combination of the previous ones
- Many XSD representation options.
- Complete statistics, which are not provided by other automatic XSD generators.
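To make the detection of predefined simple types mentioned in the list above more concrete, the following is a rough sketch of one way to pick the narrowest of xs:boolean, xs:integer, xs:decimal and xs:string for a set of observed values. It is an assumption for illustration only, not the algorithm XSDInferencer actually uses: the class SimpleTypeSketch and the method inferSimpleType are hypothetical, and the choice to test xs:boolean before xs:integer (so that values consisting only of 0 and 1 are reported as boolean) is arbitrary.

```java
import java.util.Arrays;
import java.util.List;

public class SimpleTypeSketch {

    // Lexical forms accepted by xs:boolean.
    static boolean isBoolean(String v) {
        return v.equals("true") || v.equals("false") || v.equals("1") || v.equals("0");
    }

    // Optional sign followed by digits (lexical form of xs:integer).
    static boolean isInteger(String v) {
        return v.matches("[+-]?[0-9]+");
    }

    // Optional sign, digits, optional fractional part (lexical form of xs:decimal).
    static boolean isDecimal(String v) {
        return v.matches("[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)");
    }

    /** Returns the narrowest of the four types that accepts every observed value. */
    static String inferSimpleType(List<String> values) {
        if (values.isEmpty()) {
            return "xs:string"; // nothing observed: fall back to the most general type
        }
        // trim() approximates XSD whitespace collapsing for these types.
        if (values.stream().map(String::trim).allMatch(SimpleTypeSketch::isBoolean)) {
            return "xs:boolean";
        }
        if (values.stream().map(String::trim).allMatch(SimpleTypeSketch::isInteger)) {
            return "xs:integer";
        }
        if (values.stream().map(String::trim).allMatch(SimpleTypeSketch::isDecimal)) {
            return "xs:decimal";
        }
        return "xs:string";
    }

    public static void main(String[] args) {
        System.out.println(inferSimpleType(Arrays.asList("true", "false", "1")));
        System.out.println(inferSimpleType(Arrays.asList("1", "-20", "+3")));
        System.out.println(inferSimpleType(Arrays.asList("1", "2.5")));
        System.out.println(inferSimpleType(Arrays.asList("1", "abc")));
    }
}
```

A real inferencer would also have to decide what to do with empty values and with types outside these four; this sketch simply falls back to xs:string.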
This project uses Maven as its build automation tool. To build it, install Maven (version 3.0.3 or higher) and run:
mvn package
(You can also run mvn install to install XSDInferencer to the local Maven repository.)
Once the project is built, three compressed files will appear in the target directory:
XSDInferencer-version.zip
XSDInferencer-version.tar.gz
XSDInferencer-version.tar.bz2
(where 'version' is replaced by the current version of the tool)
To install the tool, take any one of the three files (they all have the same contents) and extract it to the installation directory. To run the tool, execute:
XSDInferencer-version.bat
or
XSDInferencer-version.sh
depending on your operating system (where 'version' is replaced by the current version of the tool).
Please ensure that the JAR and all the other extracted files (including the 'lib' subdirectory) are present in the installation directory, at the same relative paths, before running the tool.
The design, implementation and theoretical background of the tool, along with everything else there is to know about it, are described in depth in this BEA dissertation, published in the Archivo Digital UPM digital repository. It is recommended reading for anyone who wants to know how the tool works, who is going to modify its code, or who wants to fine-tune the inference configuration.
Some modifications have been made to the tool since the BEA dissertation was published, so have a look at the wiki for details.