Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor validator specification integration #4566

Closed
schlessera opened this issue Apr 11, 2020 · 2 comments
Closed

Refactor validator specification integration #4566

schlessera opened this issue Apr 11, 2020 · 2 comments
Assignees
Labels
Epic Sanitizers Validation WS:Perf Work stream for Metrics, Performance and Optimizer

Comments

@schlessera
Copy link
Collaborator

schlessera commented Apr 11, 2020

Some of the validation we are doing in our sanitizers is based on the official validator specification found in https://github.com/ampproject/amphtml.

Current Process

  1. Clone https://github.com/ampproject/amphtml repository.
  2. Use Python to assemble all protoascii files (main + extensions).
  3. Use Python to parse assembled protoascii file into a PHP class containing the spec in array form ( => https://github.com/ampproject/amp-wp/blob/1.4.4/includes/sanitizers/class-amp-allowed-tags-generated.php).
  4. Load PHP class into plugin and traverse provided arrays in several ways to power the sanitizer logic.

Problems with this process that need solving

  • The project cloned in step 1. is really big, so this uses a lot of time and bandwidth.
  • Steps 2. and 3. require Python, so this adds an entire additional ecosystem to the dependencies of the project just for the sake of parsing the spec.
  • Step 4. is an all-or-nothing approach of loading the spec. To validate a single value, the complete spec (more specifically, the subset we converted to PHP) needs to be loaded into memory.
  • Step 4. is a huge list of arrays within arrays that contain repeated strings and other values that could be normalized.
  • As step 4. is based on arrays, which are efficient on key-based retrieval but inefficient on value-based retrieval, we end up with a structure that is mostly optimized towards one pre-determined dimension, and is ill-suited to cover any other dimensions (that might even only come up in future requirements). (see Refactor generated spec data to facilitate looking up by spec name #3817 for an example)
@westonruter
Copy link
Member

In regards to step 1, it's not so bad now because we actually grab the tarball of the source code for the latest release rather than doing a full clone. Still, this involves downloading much more than we need. It would better to just download the protoascii files alone, or rather to use the JSON representation as described in #2769.

@kmyram kmyram added the WS:Perf Work stream for Metrics, Performance and Optimizer label Aug 5, 2020
@westonruter
Copy link
Member

westonruter commented Dec 16, 2020

@schlessera as part of your work, please make sure that there's a way to look up a spec by spec_name (see #4597). For example, a validation error object should store the spec_name(s) for the validator spec that it violates. The UI should then be able to look up that validator spec to provide additional information about the rule, including most importantly the spec_url associated with the rule. This is key for #1401.

For example, looking up a spec by name of amp-next-page with src attribute should return:

array(
'attr_spec_list' => array(
'deep-parsing' => array(),
'max-pages' => array(),
'src' => array(
'disallowed_value_regex' => '__amp_source_origin',
'mandatory' => true,
'value_url' => array(
'allow_relative' => false,
'protocol' => array(
'https',
),
),
),
'xssi-prefix' => array(),
),
'tag_spec' => array(
'reference_points' => array(
'AMP-NEXT-PAGE > SCRIPT[type=application/json]' => array(
'mandatory' => false,
'unique' => true,
),
'AMP-NEXT-PAGE > [footer]' => array(
'mandatory' => false,
'unique' => true,
),
'AMP-NEXT-PAGE > [recommendation-box]' => array(
'mandatory' => false,
'unique' => true,
),
'AMP-NEXT-PAGE > [separator]' => array(
'mandatory' => false,
'unique' => true,
),
),
'requires_extension' => array(
'amp-next-page',
),
'spec_name' => 'amp-next-page with src attribute',
'spec_url' => 'https://amp.dev/documentation/components/amp-next-page/',
'unique' => true,
),
),

I'm going to close #3817 in favor of your work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Epic Sanitizers Validation WS:Perf Work stream for Metrics, Performance and Optimizer
Projects
Archived in project
Development

No branches or pull requests

3 participants