diff --git a/src/paper.md b/src/paper.md
index cea3d83..c355410 100644
--- a/src/paper.md
+++ b/src/paper.md
@@ -387,7 +387,7 @@ Faceted Feature Visualization

- The reader may be curious why we do not maximize {"f(g(x)) + w^Tg(x)"} instead. We have found that, in practice, the former objective produces far higher quality feature visualizations; we believe this is because the \nabla f(g(x)) acts as a acts as a filter, downweighting the irrelevant components of {"g(x)"} that do not contribute to the objective f \circ g(x). We have found, too, replacing the diversity term on the intermediate activations {"g"} in with g(x)\odot \nabla f(g(x)) improves the quality of resulting visualizations dramatically.
+ The reader may be curious why we do not maximize {"f(g(x)) + w^Tg(x)"} instead. We have found that, in practice, the former objective produces far higher quality feature visualizations; we believe this is because the \nabla f(g(x)) acts as a filter, downweighting the irrelevant components of {"g(x)"} that do not contribute to the objective f \circ g(x). We have found, too, replacing the diversity term on the intermediate activations {"g"} in with g(x)\odot \nabla f(g(x)) improves the quality of resulting visualizations dramatically.
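
The filtering effect described in the changed paragraph can be illustrated with a small sketch. It assumes that the "former objective" has the form f(g(x)) + w^T (g(x) \odot \nabla f(g(x))), which this hunk does not show explicitly. The function `f`, its weights `v`, the facet direction `w`, and the activation vectors `h_a` and `h_b` (standing in for g(x)) are hypothetical toy choices, not the model or code from the paper.

```python
import math

def f(h, v):
    """Hypothetical downstream objective: tanh of a weighted sum of activations."""
    return math.tanh(sum(vi * hi for vi, hi in zip(v, h)))

def grad_f(h, v):
    """Analytic gradient of f with respect to h: (1 - tanh^2(v.h)) * v."""
    scale = 1.0 - f(h, v) ** 2
    return [scale * vi for vi in v]

def plain_objective(h, v, w):
    # f(g(x)) + w^T g(x): every component of h contributes to the facet term.
    return f(h, v) + sum(wi * hi for wi, hi in zip(w, h))

def filtered_objective(h, v, w):
    # f(g(x)) + w^T (g(x) * grad f(g(x))): components of h whose gradient
    # is zero are removed from the facet term.
    gf = grad_f(h, v)
    return f(h, v) + sum(wi * hi * gi for wi, hi, gi in zip(w, h, gf))

# Toy setup (assumed): f depends only on the first activation.
v = [0.5, 0.0, 0.0]
w = [1.0, 1.0, 1.0]
h_a = [1.0, 2.0, 3.0]
h_b = [1.0, 2.0, 10.0]  # differs only in a component that f ignores

plain_gap = plain_objective(h_b, v, w) - plain_objective(h_a, v, w)
filtered_gap = filtered_objective(h_b, v, w) - filtered_objective(h_a, v, w)
print(plain_gap, filtered_gap)
```

In this toy case the plain objective rewards growth in an activation that has no influence on f, while the gradient-weighted term leaves it out. Whether the gradient is treated as a constant during optimization is not stated in this hunk, so the sketch only evaluates the objectives and does not optimize them.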