records_and_issues.txt

### Issues and TODOs
- The matched results from egg's search of a pattern can contain nodes that cannot be found
in the Egraph using egraph.lookup()

See comments in rewrites.rs:contains_blacklist(). That is where this seems to be happening.

"This placed shouldn't be reached in any case. The pat and subst passed
in as arguments are from the results of searching pat in the Egraph.
So all the nodes in pat should be present in the EGraph. But if we run
bert with 1 iteration of multi and more than ~5/6 iterations of single
rules, this place is reached. Seems that the searched matches returned
by egg can have some nodes in pat that cannot be found in the Egraph.

Right now, we simply treat the pattern as not containing blacklisted
nodes if this happens. Since we do cycle filtering in the end of each
iteration, this should be fine."

TODO: look into this. Ask Max to understand if this is expected

- We use the the correct way to treat concat ops with all inputs being weight tensors as zero
cost in tamago. But TASO doesn't (they treat all new concat as zero cost).

TODO: correct TASO side, and see how much speedup is actually due to the cost model

- I tried treating other ops with all weights input as zero cost as well, but it doesn't work
well with ILP solver. I didn't spend time to understand why

TODO: understand this and see if we can make it work, since this seems to be the more accurate
cost model.

- The naive way for cycle filtering (check cycle each time we apply a match) is only applied
in multi-pattern rewrite, since when I write it I wasn't aware that single rules can also
introduce cycles. But since we are now using efficient cycle filtering, I didn't update the
naive approach to be applied to single rules as well. If it needs to be used again for some
reason,
TODO: do naive cycle filtering for single rules as well.

- Right now, I count the number of graphs explored in TASO by the graphs that ever appears
in the candidate set. We should include the graphs which TASO found the cost goes up and not
added to the candidate set as well.

TODO: do this more proper count on TASO side


### Records

- One factor for speedup in nasnet_a:

TASO's original graph contains duplicated ops (e.g. 2 maxpool2d/avgpool2d ops on a same input
tensor). TASO cannot identify them as identical, so the duplicated ops will be kept. egg's
equality analysis naturally identifies such cases, since a same op with the same input
eclasses will be identified as equivalent. So egg is able to extract graphs without such
duplicated nodes. Due to the way I measure runtime of the original graph from egg side,
the original graph will also do this deduplication, which makes the measured runtime of the
original graph different between egg side and TASO side.


- Algorithm description of efficient filtering:

    - Pre-filter: pass before run_one, record descendents. During application, do filter
    - Record a HashSet of blacklist nodes. Everytime before doing a search pattern (beginning
    of run_one outside if, end of run_one in if), update the blacklist with egraph.find
    - Add a check before applying a rewrite (both multi and single), on if the src graph
    contain any blacklist node
    - Check blacklist in get descendents
    - within run_one, record the order that nodes are added (vec of nodes). After rebuild,
    update the nodes with egraph.find, and construct a hashmap of node -> order
    - At the end of run_one, pass over egraph by dfs (not considering blacklisted nodes on
    children relationship), passing for each eclass: path_from_root (eclass, enode). for each
    enode in this eclass, if any children eclass occurs in path_from_root, record the set of
    enodes as a cycle. After the pass, for each cycle, select most recent added node and add
    to blacklist; if a cycle already contain blacklisted node, pass. Do this until there is
    no more cycles.
    - Add filtering on single rules, by having order in analysis
    - In the end, pass this blacklist to prep_ilp_data.