
Understanding Ansor

Gaurav Verma edited this page Dec 16, 2023 · 3 revisions

Overview:

  • Ansor explores a large optimization search space. It samples high-performing programs from a "hierarchical representation"[A] of the search space.
  • It further fine-tunes the sampled programs using evolutionary search and a learned cost model trained with gradient boosting (e.g., XGBoost), resulting in the best-performing tensor programs.
  • It explores comprehensive coverage of program optimizations.
  • Ansor dynamically prioritizes sub-graphs of the DNNs that are more likely to improve the overall performance.
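
The hierarchical sampling idea (expand a space of program skeletons with derivation rules, then sample randomly) can be illustrated with a toy sketch. The rules, tokens, and `derive` function below are invented for illustration and are not Ansor's actual implementation.

```python
import random

# Toy hierarchical sampling: a "sketch" is a list of structure tokens, and
# derivation rules rewrite a partial sketch into new ones. Rules and tokens
# here are hypothetical stand-ins, not Ansor's real derivation rules.

def derive(sketch, rules, depth=0, max_depth=3):
    """Recursively apply derivation rules to expand the sketch space."""
    if depth == max_depth:
        return [sketch]
    results = [sketch]  # a partial sketch is itself a valid terminal choice
    for rule in rules:
        for new_sketch in rule(sketch):
            results.extend(derive(new_sketch, rules, depth + 1, max_depth))
    return results

# Two toy rules: add a tiling level, or fuse an elementwise stage.
def rule_tile(sketch):
    return [sketch + ["tile"]] if sketch.count("tile") < 2 else []

def rule_fuse(sketch):
    return [sketch + ["fuse"]] if "fuse" not in sketch else []

sketches = derive(["conv2d"], [rule_tile, rule_fuse])
sampled = random.choice(sketches)  # random sampling of a complete skeleton
```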

Ansor is evaluated on three classes of hardware: Intel CPU, ARM CPU, and NVIDIA GPU.

Definition of a Task:

A task is a process performed to generate high-performance programs for a subgraph.

Design Approach:

Ansor uses the operator fusion algorithm from Relay to convert DNNs from input model formats to partitioned subgraphs.

  1. Program Sampler:
    • constructs large search space with two levels: sketch and annotation
    • samples diverse programs from it
    • expands the search space by recursively applying the derivation rules
    • randomly samples complete programs
  2. Performance Tuner:
    • fine-tunes the performance of the sampled programs from (1) above.
    • It uses re-sampled new programs and good programs from the previous iterations to start the evolutionary search (ES)
    • ES fine-tunes the programs by mutation and crossover (Questionable to me!)
  3. Task Scheduler:
    • allocates time and resources for optimizing multiple subgraphs in the DNNs.
    • Uses a gradient-descent-based scheduling algorithm to prioritize the most promising candidates (subgraphs).
    • A task is picked based on the gradient-based algorithm maximizing the objective function in a given time budget.
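
Ansor's actual scheduler optimizes an objective over end-to-end DNN latency; the simplified stand-in below uses each task's most recent latency improvement as an approximate gradient and picks the steepest one. All names here are hypothetical.

```python
# Toy gradient-based task picker: each tuning task keeps a history of its best
# measured latency after each round; the task whose latency dropped the most
# in the last round (largest approximate gradient) is tuned next.

def pick_task(latency_history):
    """latency_history: {task_id: [best latency after each round, ...]}"""
    best_task, best_gain = None, float("-inf")
    for task, hist in latency_history.items():
        if len(hist) < 2:
            gain = float("inf")  # unexplored tasks get priority first
        else:
            gain = hist[-2] - hist[-1]  # recent improvement ~ -gradient
        if gain > best_gain:
            best_task, best_gain = task, gain
    return best_task

history = {"conv1": [9.0, 8.8, 8.79], "conv2": [5.0, 4.2, 3.6]}
# conv2 improved by 0.6 last round vs 0.01 for conv1, so it is chosen next.
print(pick_task(history))  # → conv2
```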

Sketch Generation:

  • To generate a sketch for a DAG: topologically traverse all the nodes and construct the structure iteratively.
  • The sketch includes the basic tile and fusion structure for compute-intensive computation nodes with data-reuse opportunities (e.g., Conv2D).
  • Element-wise nodes are inlined (e.g., ReLU)
  • Multiple sketches are generated using a derivation-based recursive approach
  • Derivation rules will differ as per the underlying hardware. Example: multi-level tiling:
    • CPU: SSRSRS, where S is one tile level of the space loops and R is one tile level of the reduction loops
    • GPU: SSSRRSRS to match the architecture of the GPU
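
As a concrete illustration of the CPU SSRSRS structure, the loop nest below tiles a plain matmul: each space loop (i, j) is split into four levels and the reduction loop k into two, interleaved as S S R S R S. The tile sizes are arbitrary placeholders, not tuned values.

```python
# Illustrative SSRSRS multi-level tiling for C[i][j] += A[i][k] * B[k][j].
N, K = 16, 16
A = [[1.0] * K for _ in range(N)]
B = [[1.0] * N for _ in range(K)]
C = [[0.0] * N for _ in range(N)]

T = 2   # tile size at each space level: 2 * 2 * 2 * 2 = 16 = N
TK = 4  # inner reduction tile: K = (K // TK) * TK

for i0 in range(N // T**3):                 # S: outermost space tile
    for j0 in range(N // T**3):
        for i1 in range(T):                 # S
            for j1 in range(T):
                for k0 in range(K // TK):   # R: outer reduction tile
                    for i2 in range(T):     # S
                        for j2 in range(T):
                            for k1 in range(TK):      # R: inner reduction
                                for i3 in range(T):   # S: innermost (vectorizable)
                                    for j3 in range(T):
                                        i = ((i0 * T + i1) * T + i2) * T + i3
                                        j = ((j0 * T + j1) * T + j2) * T + j3
                                        k = k0 * TK + k1
                                        C[i][j] += A[i][k] * B[k][j]
```

With all-ones inputs, every entry of C ends up equal to K, which makes it easy to check that the tiled nest visits each (i, j, k) exactly once.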

Random Annotation:

  • Sketches are annotated to make them complete programs for fine-tuning and evaluation
    • E.g., fill out tile size, unroll an inner loop, etc.
  • Users can provide custom hints to adjust the annotation policy.
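
A minimal sketch of the annotation step, assuming a split node with loop extent 1024: randomly fill in concrete tile sizes whose product equals the extent, and randomly decide whether to unroll the innermost loop. The function names are hypothetical, not the TVM/Ansor API.

```python
import random

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def annotate(extent, levels, rng):
    """Sample `levels` tile sizes whose product is exactly `extent`."""
    sizes, remaining = [], extent
    for _ in range(levels - 1):
        d = rng.choice(divisors(remaining))  # random valid tile size
        sizes.append(d)
        remaining //= d
    sizes.append(remaining)  # last level absorbs whatever factor is left
    return {"tile_sizes": sizes, "unroll_inner": rng.random() < 0.5}

ann = annotate(1024, 4, random.Random(0))
```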

Fine-Tuning:

  • Iterative process
  • First, use ES to find high-performing programs using a cost model (probabilistic selection)
  • Get actual measurements on hardware
  • Retrain the cost model
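
The iterative loop above can be sketched as follows. The cost model (a 1-nearest-neighbor toy), the "hardware" measurement, and the candidate generator are all illustrative stand-ins, not TVM/Ansor APIs.

```python
import random

class CostModel:
    """Toy learned cost model: 1-nearest-neighbor over measured programs."""
    def __init__(self):
        self.data = []  # list of (program, measured latency)
    def predict(self, prog):
        if not self.data:
            return 0.0
        return min(self.data, key=lambda pl: abs(pl[0] - prog))[1]
    def retrain(self, pairs):
        self.data.extend(pairs)

def measure(prog):
    return 1.0 / (1 + prog)  # fake hardware latency; lower is better

rng = random.Random(0)
model = CostModel()
best = None
for _ in range(3):
    # stand-in for evolutionary-search output: programs encoded as ints
    candidates = [rng.randint(0, 100) for _ in range(8)]
    topk = sorted(candidates, key=model.predict)[:4]  # cost-model selection
    measured = [(p, measure(p)) for p in topk]        # real measurements
    model.retrain(measured)                           # update the cost model
    pool = measured + ([best] if best else [])
    best = min(pool, key=lambda pl: pl[1])
```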

Evolutionary Search

  • meta-heuristic algorithm
  • The following types of mutations are applied:
    • tile size mutation
    • parallel mutation
    • pragma mutation
    • computation location mutation
    • node-based crossover
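
A toy evolutionary search over tile-size vectors shows how the tile-size mutation and crossover operators fit together. The fitness function is a made-up stand-in for the cost model, and the operators only mirror the kinds listed above, not Ansor's exact ones.

```python
import random

EXTENT = 64  # loop extent; every genome's tile sizes must multiply to this

def fitness(tiles):
    # pretend an innermost tile of 8 is fastest (illustrative only)
    return -abs(tiles[-1] - 8)

def mutate_tile(tiles, rng):
    # tile-size mutation: move a factor of 2 between two adjacent levels,
    # preserving the product of all tile sizes
    i = rng.randrange(len(tiles) - 1)
    if tiles[i] % 2 == 0:
        return tiles[:i] + [tiles[i] // 2, tiles[i + 1] * 2] + tiles[i + 2:]
    return tiles

def crossover(a, b, rng):
    # one-point crossover, then repair the last gene so the product of the
    # child's tile sizes still equals the loop extent
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    prod = 1
    for t in child[:-1]:
        prod *= t
    if EXTENT % prod == 0:
        child[-1] = EXTENT // prod
        return child
    return a  # invalid combination; fall back to a parent

rng = random.Random(1)
pop = [[4, 4, 4], [2, 8, 4], [16, 2, 2], [8, 4, 2]]  # all multiply to 64
for _ in range(20):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:2]
    children = [mutate_tile(parents[0], rng),
                crossover(parents[0], parents[1], rng)]
    pop = parents + children
best = max(pop, key=fitness)
```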

[A] Hierarchical representation: generation of high-level structures (sketches) and low-level details (annotations).