Arbitrary WebAPI JS instrumentation (#642)
* Add mdn-browser-compat-data

* js_instrument_modules as list

* Add mdn-browser-compat

* Pass a list of instrumentingFunctions

* Script to generate api data

* Working give or take

Getting errors like

OpenWPM: Error name: TypeError post_request_ajax.html:237:17
OpenWPM: Error message: can't redefine non-configurable property
"UNSENT" post_request_ajax.html:238:17

* Small naming cleanup

* Handle non-configurable properties

* Lint

* Add aspirational API

* Begin migration to new JSInstrumentationRequest interface.

* We build and mandate LogSettings.
* We have a new JSInstrumentationRequest that everything runs through
* Preset, fingerprinting, will be specified in JSON

* Continue making progress

Enum for Operation

* Begin implementing jsModuleRequest validation.

Changing my mind - all validation and construction to be done python
side.
This will reduce JS overhead at runtime.

* Big cleanout- js-instrumentation work moving to python.

* Continue update to python js-instrumentation

* Lint

Can't do all the things I want to with typing due to scope
when content is loaded into page.

* noqa on wip jsinstrumentation file

* Begin updating existing js instrument tests.

* Small cleanups

* Fix naming in calling instrumentJS

* No display mode native for testing

* Restore py test file to orig.

* Support null propertiesToInstrument

* Re-work instrumentObject tests

* Clean-up text in test page.

* Add default to getLogSettings function

* Don't re-assign logSettings.propertiesToInstrument

* Revert "Don't re-assign logSettings.propertiesToInstrument"

This reverts commit 87ccdab.

* Better assign propertiesToInstrument

* Small cleanup

* Make new logSettings object

* Prettify

* Small clean

* -- BREAK -- JS Rework complete

With this commit, the JS side of this PR is complete.
Tests are still failing as fingerprinting implementation has not been
completed on the python side, but all test_js_* tests are passing due
to the core JS API rework being in place.

* Write-out mdn compat data to js_instrumentation .py

* Dry out js test code

* Consolidate JS tests

* Finish missing renames, and add test js via browser_params

* pep8

* New files and failing tests.

* Add a json schema for js_instrument_modules

* Latest py tests

* Flake8

* Ongoing progress.

* More code, more tests.

* flake

* Rename mdn file

* Add latest tests - just implement fingerprinting.json

* flake8

* Add fingerprinting.json (incomplete)

Mimetypes and plugins

* Correct logSettings property name

* Restore create_xpi as function

Needed by manual_test

* Make explicit option for logging to console

* Process browser_params in task manager

* Start being able to pass browser_params to selenium

Also update manual_test to use click

* Revert "Make explicit option for logging to console"

This reverts commit c840fbc.

* Get manual_test working with browser_params

From toplevel directory run:

`python -m test.manual_test --selenium --browser-params --browser-params-file=debug_params.json`

* More robust test for simple fingerprinting output

Can't guarantee order of string output

* Add timing information when testing

* Make recheck really fast.

You'll never hit this recheck as it all happens before page load.

* Handle all inputs properly

* Debug with all window params instrumented

* Load xpi we just built

* Check for ff version support

* Save a bunch of properties

* Relax constraints on what we can instrument.

Let failing happen during instrumentation by using subscript notation.
Don't restrict to MDN list.

* Correct stringifying

* Better name example params, fix some bugs, sample a_f

Some example browser_params - a_f is just working - but crashes
on a page like google.com. g_l and m_z haven't been vetted yet.

* flake8

* Move example browser_params file out of harm's way

* Add failing test for regression I introduced.

* Fix for regression.

* Add simple mimeTypes and plugins to fingerprinting.

* Lint JS

* Rm mdn_browser_compat stuff no longer needed

* Remove example_browser_params

They're not used in tests, were just for my testing.

* Load JS_INSTRUMENT_MODULES from JSON string

* Rename JS_INSTRUMENT_MODULES to JS_INSTRUMENT_SETTINGS

* Fixes #28 - Instrument all window.navigator properties.

* Finish removing unused mdn-compat pieces.

* EventID as a shadow variable

* Flake8

* Remove $ prefix and rename

$instrumentionRequests -> jsInstrumentationSettings

* Rename jsInstrumentationRequests->jsInstrumentationSettings

* TS Lint

* Remove use of "request".

Rename python side as per discussion with @englehardt.
Privatize most methods
Numpy docstrings for public methods

* Convert assertions to ValueErrors

* Rename file/folder and fingerprinting -> collection_fingerprinting

file JSInstrumentation.py -> js_instrumentation/__init__.py
collections have their own folder

* Clean-up naming in schema

* Add processing of json schema to documentation

* Rename js_instrumentation again and ref schema location

* Pass JSON not a js string

* Do copying to xpi in npm postbuild step

* Fix import in manual_test

* Revert "Pass JSON not a js string"

This reverts commit 8eb4edb.

* Add titles to schema pieces

* Add docs for js_instrument_settings

* Bit more README cleanup

* Update README.md

Co-authored-by: Steven Englehardt <[email protected]>

* Move updating schema docs section

* Add title
* Fix typo in mac-osx hyperlink

* Make the single-key dictionary clearer

* Remove versions from npm package files

* Clean up instrument_existing_window_property.html and js

We're not using the js in two htmls now, so unify like other test files

* Fix pyside instrumentation test, add more clarification to README

* pyside test must instrument browser apis
* add more to readme to clarify instrumenting

* Use example.com and example.org as localDomains

* context-manage open, and flake8

Co-authored-by: Steven Englehardt <[email protected]>
birdsarah and englehardt authored Jul 8, 2020
1 parent 627f440 commit aefa048
Showing 62 changed files with 6,862 additions and 4,077 deletions.
74 changes: 66 additions & 8 deletions README.md
@@ -34,7 +34,8 @@ Table of Contents <!-- omit in toc -->
* [Debugging the platform](#debugging-the-platform)
* [Managing requirements](#managing-requirements)
* [Running tests](#running-tests)
* [Mac OSX](#mac-osx-limited-support-for-developers)
* [Mac OSX](#mac-osx)
* [Updating schema docs](#updating-schema-docs)
* [Troubleshooting](#troubleshooting)
* [Docker Deployment for OpenWPM](#docker-deployment-for-openwpm)
* [Building the Docker Container](#building-the-docker-container)
@@ -80,9 +81,9 @@ After running the install script, activate your conda environment by running:
### Developer instructions

Dev dependencies are installed by using the main `environment.yaml` (which
is used by `./install.sh` script.
is used by `./install.sh` script).

You can install the pre-commit hooks by running `pre-commit install` to
lint all the changes before you make a commit.

### Troubleshooting
@@ -173,8 +174,22 @@ available [below](#output-format).
with the exception of images.
See: [Bug 634073](https://bugzilla.mozilla.org/show_bug.cgi?id=634073).
* Javascript Calls
* Records all method calls (with arguments) and property accesses for APIs
of potential fingerprinting interest:
* Records all method calls (with arguments) and property accesses for configured APIs
* Set `browser_params['js_instrument'] = True`
* Configure `browser_params['js_instrument_settings']` to desired settings.
* Data is saved to the `javascript` table.
* The full specification for `js_instrument_settings` is defined by a JSON schema.
Details of that schema are available in [docs/schemas/README.md](docs/schemas/README.md).
In summary, you pass a list of JS objects to instrument, along with details about how
each object should be instrumented. The `js_instrument_settings` you pass to `browser_params`
are validated on the Python side against the JSON schema before the crawl starts.
* A number of shortcuts are available to make writing `js_instrument_settings` less
cumbersome than spelling out the full schema. These shortcuts are converted to a full
specification by the `clean_js_instrumentation_settings` method in
[automation/js_instrumentation.py](automation/js_instrumentation.py).
* The first shortcut is the fingerprinting collection, specified by
`collection_fingerprinting`. This was the default prior to v0.11.0. It contains a collection
of APIs of potential fingerprinting interest:
* HTML5 Canvas
* HTML5 WebRTC
* HTML5 Audio
@@ -184,8 +199,43 @@ available [below](#output-format).
and `window.name` access.
* Navigator properties (e.g. `appCodeName`, `oscpu`, `userAgent`, ...)
* Window properties (via `window.screen`)
* Set `browser_params['js_instrument'] = True`
* Data is saved to the `javascript` table.
* `collection_fingerprinting` is the default if `js_instrument` is `True`.
* The fingerprinting collection is specified by the JSON file
[fingerprinting.json](automation/js_instrumentation_collections/fingeprinting.json).
This file is also a nice reference example for specifying your own APIs using the other
shortcuts.
* Shortcuts:
* Specifying just a string will instrument
the whole API with the [default log settings](docs/schemas/js_instrument_settings-settings-objects-properties-log-settings.md)
* A plain string can name a [Web API](https://developer.mozilla.org/en-US/docs/Web/API)
such as `XMLHttpRequest`, or an instance on `window`, e.g. `window.document`.
* Alternatively, you can specify a single-key dictionary that maps an API name to the properties / settings you'd
like to use. The key of this dictionary can be an instance on `window` or a Web API.
The value of this dictionary can be:
* A list - this is a shortcut for `propertiesToInstrument` (see [log settings](docs/schemas/js_instrument_settings-settings-objects-properties-log-settings.md))
* A dictionary - with non-default log settings. Items missing from this dictionary
will be filled in with the default log settings.
* Here are some examples:
```
// Collections
"collection_fingerprinting",
// APIs, with or without settings details
"Storage",
"XMLHttpRequest",
{"XMLHttpRequest": {"excludedProperties": ["send"]}},
// APIs with shortcut to propertiesToInstrument
{"Prop1": ["hi"], "Prop2": ["hi2"]},
{"XMLHttpRequest": ["send"]},
// Specific instances on window
{"window.document": ["cookie", "referrer"]},
{"window": ["name", "localStorage", "sessionStorage"]}
```
* Note that only the properties of the named key / string are instrumented. That is, to instrument the
`window.fetch` function you must specify `{"window": ["fetch"]}`. If you specify just `window.fetch`, the
instrumentation will try to instrument sub-properties of `window.fetch` (which won't work, as `fetch` is a
function). As another example, to instrument `window.document.cookie`, you must use `{"window.document": ["cookie"]}`.
Some properties, such as `fetch`, can be called in JavaScript code through an alias (`fetch`) as well as through
`window.fetch`; the instrumentation `{"window": ["fetch"]}` will pick up calls to both `fetch()` and `window.fetch()`.
* Response body content
* Saves all files encountered during the crawl to a `LevelDB`
database de-duplicated by the md5 hash of the content.
@@ -537,7 +587,7 @@ in the test directory to run all tests:
$ cd test
$ py.test -vv

See the [pytest docs](https://docs.pytest.org/en/latest/) for more information on selecting
specific tests and various pytest options.

### Mac OSX
@@ -552,6 +602,14 @@ Running Firefox with xvfb on OSX is untested and will require the user to instal
an X11 server. We suggest [XQuartz](https://www.xquartz.org/). This setup has not
been tested; we welcome feedback as to whether it works.

### Updating schema docs

In the rare instance that you need to create schema docs
(after updating or adding files to the `schemas` folder), run `npm install`
from the OpenWPM top-level directory. Then run `npm run render_schema_docs`. This will update the
`docs/schemas` folder. You may want to clean out the `docs/schemas` folder before doing this
in case files have been renamed.


Troubleshooting
---------------
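The shortcut rules described in the README changes above (a plain string, a single-key dictionary with a list, or a single-key dictionary with log settings) can be illustrated with a small Python sketch. This is not the `clean_js_instrumentation_settings` implementation from this commit: `expand_shortcut` and `DEFAULT_LOG_SETTINGS` are hypothetical names, the default values are assumed from the `logSettings` literal in `feature.js/index.js`, the `object` naming rule (prototype for Web APIs, the instance itself for `window...` names) is an assumption based on that same example, and collections such as `collection_fingerprinting` are not handled.

```python
import copy

# Assumed defaults: copied from the logSettings literal in this commit's
# feature.js/index.js. The authoritative values live in the JSON schema.
DEFAULT_LOG_SETTINGS = {
    "propertiesToInstrument": [],
    "nonExistingPropertiesToInstrument": [],
    "excludedProperties": [],
    "logCallStack": False,
    "logFunctionsAsStrings": False,
    "logFunctionGets": False,
    "preventSets": False,
    "recursive": False,
    "depth": 5,
}


def expand_shortcut(item):
    """Hypothetical helper: expand one shortcut into a full settings object."""
    if isinstance(item, str):
        name, overrides = item, {}
    elif isinstance(item, dict) and len(item) == 1:
        name, value = next(iter(item.items()))
        if isinstance(value, list):
            # A list is shorthand for propertiesToInstrument.
            overrides = {"propertiesToInstrument": value}
        elif isinstance(value, dict):
            # A dictionary carries non-default log settings.
            overrides = value
        else:
            raise ValueError(f"Unsupported value for {name!r}: {value!r}")
    else:
        raise ValueError(f"Expected a string or single-key dict, got {item!r}")

    if name.startswith("collection_"):
        raise ValueError("Collections are not handled by this sketch")
    unknown = set(overrides) - set(DEFAULT_LOG_SETTINGS)
    if unknown:
        raise ValueError(f"Unknown log settings: {sorted(unknown)}")

    # Missing items are filled in from the defaults.
    log_settings = copy.deepcopy(DEFAULT_LOG_SETTINGS)
    log_settings.update(copy.deepcopy(overrides))

    # Assumption: instances on window are used directly, while Web API names
    # are instrumented through their prototype (as in the index.js example).
    is_instance = name == "window" or name.startswith("window.")
    return {
        "object": name if is_instance else f"window.{name}.prototype",
        "instrumentedName": name,
        "logSettings": log_settings,
    }


settings = [
    "XMLHttpRequest",
    {"window.document": ["cookie", "referrer"]},
    {"window": ["fetch"]},  # instruments fetch() and window.fetch()
    {"XMLHttpRequest": {"excludedProperties": ["send"]}},
]
full = [expand_shortcut(s) for s in settings]
```

Rejecting malformed input with `ValueError` follows the "Convert assertions to ValueErrors" item in the commit message; the sketch accepts only single-key dictionaries, as the README prose describes.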
22 changes: 20 additions & 2 deletions automation/Extension/firefox/feature.js/index.js
@@ -20,7 +20,25 @@ async function main() {
navigation_instrument:true,
cookie_instrument:true,
js_instrument:true,
js_instrument_modules:"fingerprinting",
js_instrument_settings: `
[
{
object: window.CanvasRenderingContext2D.prototype,
instrumentedName: "CanvasRenderingContext2D",
logSettings: {
propertiesToInstrument: [],
nonExistingPropertiesToInstrument: [],
excludedProperties: [],
logCallStack: false,
logFunctionsAsStrings: false,
logFunctionGets: false,
preventSets: false,
recursive: false,
depth: 5,
}
},
]`,
http_instrument:true,
callstack_instrument:true,
save_content:false,
@@ -51,7 +69,7 @@ async function main() {
loggingDB.logDebug("Javascript instrumentation enabled");
let jsInstrument = new JavascriptInstrument(loggingDB);
jsInstrument.run(config['crawl_id']);
await jsInstrument.registerContentScript(config['testing'], config['js_instrument_modules']);
await jsInstrument.registerContentScript(config['testing'], config['js_instrument_settings']);
}

if (config['http_instrument']) {
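In the `index.js` change above, `js_instrument_settings` is a string of JavaScript source rather than JSON: `object` is a bare expression (`window.CanvasRenderingContext2D.prototype`), which is consistent with the reverted "Pass JSON not a js string" commit. The following Python sketch shows one way such a string could be produced from full settings objects. `render_settings_as_js` is a hypothetical helper, not a function from this commit, and the assumption that the extension evaluates `object` as an expression in the page is an inference from the example, not something the diff confirms.

```python
import json


def render_settings_as_js(settings):
    """Hypothetical helper: render full settings objects as a JS array literal."""
    rendered = []
    for item in settings:
        # Everything except `object` is plain data, so its JSON encoding is
        # also a valid JavaScript object literal body.
        data = {key: value for key, value in item.items() if key != "object"}
        body = json.dumps(data, indent=2)
        # `object` is emitted unquoted so that it stays a JS expression
        # instead of becoming a string (assumed to be what the extension needs).
        rendered.append('{\n  "object": ' + item["object"] + "," + body[1:])
    return "[\n" + ",\n".join(rendered) + "\n]"


full_settings = [
    {
        "object": "window.CanvasRenderingContext2D.prototype",
        "instrumentedName": "CanvasRenderingContext2D",
        "logSettings": {"propertiesToInstrument": [], "depth": 5},
    }
]
js_source = render_settings_as_js(full_settings)
print(js_source)
```

Keeping the data portion as JSON means only the `object` expression needs special handling, which keeps the Python side free of any JavaScript-specific quoting logic.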
20 changes: 17 additions & 3 deletions automation/Extension/firefox/package-lock.json

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions automation/Extension/firefox/package.json
@@ -1,7 +1,6 @@
{
"name": "OpenWPM",
"description": "OpenWPM Client extension",
"version": "1.0.0",
"author": "Mozilla",
"dependencies": {
"openwpm-webext-instrumentation": "../webext-instrumentation"
@@ -35,11 +34,12 @@
"private": true,
"repository": {
"type": "git",
"url": "https://github.com/mozilla/openwpm-firefox-webext"
"url": "git+https://github.com/mozilla/OpenWPM.git"
},
"scripts": {
"prebuild": "cd ../webext-instrumentation && npm run build && cd - && webpack",
"postinstall": "cd ../webext-instrumentation && npm install",
"postbuild": "cp dist/openwpm-1.0.zip openwpm.xpi",
"build": "web-ext build",
"eslint": "eslint . --ext jsm,js,json",
"lint": "npm-run-all lint:*",