## Description
Generate swagger specs from raw data of the utilization of the API rather than from the annotations on the API.
Typically swagger spec is written and published by the API owners via annotation on the source code of their API. The problem is that not all providers bother to use Swagger to describe their API.
Raw2swagger intends to create an accurate swagger specs from raw data using unsupervised learning on raw data that you, an API consumer, can collect. For instance instrumenting the API plugin or storing the requests logs against the API that you want to built the swagger spec for.
Swagger is a specification (JSON-based) to describe REST APIs.
The swagger spec can be used to automatically generate beautiful documentation for your API that can be used interactively by your users.
Many API management solutions like 3scale accept swagger-specs to describe your API so that documentation looks like this
Swagger offers a framework to build the API description as annotation on your source code for JAVA family languages (using JAX). There are also tools like source2swagger that are language agnostic.
To install from rubygems
gem install raw2swagger
from source
git clone https://github.com/solso/raw2swagger.git
gem build raw2swagger.gemspec
gem install raw2swagger
To create the feeder object,
require 'raw2swagger'
feeder = Raw2Swagger::Feeder.new()
Add a new raw entry,
feeder.process(entry)
An entry is a Hash
that can be generated from logs, plugins, etc. Up to you. The hash looks like this:
{
"method" => "GET",
"path" => "/admin/api/accounts/34333.xml",
"status" => 200,
"query_string" => "provider_key=foo&page=30%per_page=10",
"body" => "",
"host" => "raw2swagger.3scale.net",
"port" => 80,
"headers" => {}
}
Another example on an entry:
{
"method" => "POST",
"path" => "/resources.json",
"status" => 200,
"query_string" => "",
"body" => {"id" => 10, "foo" => "bar"},
"host" => "raw2swagger.3scale.net",
"port" => 80,
"headers" => {"Content-Type" => "application/json"}
}
The query_string
and request body
fields can either be String
or a Hash
. The headers
field must always be a Hash
.
The field Host
is used to distinguish between APIs. You can use the same Feeder for multiple API's.
After adding entries, raw2swagger will start figuring out the swagger spec, you can get it anytime with
swg = feeder.spec("raw2swagger.3scale.net").to_swagger()
Since Feeder can hold different specs you must use the Host
field to access the spec of the API that you want.
You only need to save it a file
f = File.new("my_autogenerated_swagger_spec.json", "w")
f.puts swg.to_json()
f.close
The file sample_api_traffic.log contains +2000 entries from 3scale Account Management API that are used for tests.
After creating a Feeder object, you can start by processing a single entry
GET /accounts/42.xml?key=foo
At this point raw2swagger will think that the API has only one resource (1 end-point) with one operation (a GET):
/accounts/42.xml
with a GET operation
parameters: [key]
If you feed two more entries like these
GET /accounts/54.xml?key=bar
POST /users.xml
raw2swagger will output an improved spec that has 2 resources,
/accounts/{account_id}.xml
with a GET operation
parameters: [account_id, key]
and
/users.xml
with a POST operation
parameters: []
Note that it has learned that there is an account_id
parameter. Therefore the previous entry /accounts/42.xml
is subsumed since
/accounts/{accounts_id}.xml
is more general. Also, it knows that account_id
is a required parameter.
The more entries you feed to raw2swagger, the better the spec will become.
Feeder has one optional parameter
f = Feeder.new(:occurrences_threshold => 0.20)
The default of :occurrences_threshold
is 1.0, which will generalize paths as soon as possible on an eager fashion.
If you never want to generalize paths use 0.0.
Typically 1.0 works very well but for some API's it can cause an over-generalizations, for instance:
GET /api/users.xml
GET /api/applications.xml
GET /api/features.xml
Might end up having
GET /api/{api_id}.xml
To avoid such events, we recommend tuning the :occurrences_threshold
value until you are ok with the results, 0.2 works pretty well.
The 0.2 means that it will only generalize two segments of a path if both segments are 5 times less frequent (1/0.2) that the most frequent path segment. This is an heuristic that helps distinguish those path segments that are parameters from those who are constants.
Fork the project and send pull requests.