The following peer review was solicited as part of the Distill review process.
The reviewer chose to keep anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
Advancing the Dialogue
What's the overall story here?
multimodal neurons that fire for related concepts, that are interestingly abstract
new adversarial attacks for certain types of neural models
ways of exploring what neurons associate with
How significant are these contributions? 4/5
Ultimately, this is just about one type of model which, to my knowledge, isn't that widely used.
That being said, it's quite cool info about that type of model.
Outstanding Communication
Article structure: 4/5
Basically great, except for how under-explained the face visualizations are.
Writing style: 4/5
Quite readable, but there were quite a few typos.
Diagram & interface style: 3/5
Cool diagrams, but could be better explained, and often had parts obscured in my browser (Firefox on Ubuntu).
How readable is the paper, accounting for difficulty? 4/5
Scientific Correctness & Integrity
Are the claims in the article well supported? Are experiments in the article well designed, and interpreted fairly? 3/5
I think a fair number of claims about neurons were made without showing the evidence for those claims, but instead qualitatively summarizing that evidence, which is problematic for a subject area that is inherently somewhat subjective.
Also, neurons often responded to things that seemed incongruous with their stated meaning.
Basically, I'm sold on most of the main claims, but I have some quibbles with some individual neurons.
Does the article critically evaluate its limitations? How easily would a lay person understand them? 4/5
Limitations:
Only applies to one type of model
Allegedly didn't work on larger ResNets
Needs special attack texts
I think it does a fine job at evaluating its limitations.
How easy would it be to replicate (or falsify) the results? 2-4/5.
Depends a great deal on how well the training method is explained in the other paper. The very basics of the training method were described, but it was hard to see the details of how the dataset was collected.
It would also be a lot of work to look at each neuron.
Does the article cite relevant work? 4/5.
AFAIK
Considering all factors, does the article exhibit strong intellectual honesty and scientific hygiene? 3/5
I think that epistemic hygiene would be improved by allowing readers to see all of the visualizations of neurons that were used to guide inferences: for instance, all of the faceted feature visualizations (rather than just the face-like ones), or the images of the words 'banana' and 'lemon' that the 'yellow neuron' mentioned in the introduction responded to.
Also, some aspects of the paper are not well explained; see the 'questions about how things worked' section in the miscellaneous notes.
Miscellaneous notes
Questions about how things worked:
What was the source of the (image, caption) pairs used to train CLIP? The authors reveal that they were collected in 2019, but only in a somewhat obscure footnote.
I'm surprised that neurons reliably had a text facet, face facet, architecture facet, and logo facet. Did these reliably appear, or was some technique used to deliberately get these kinds of facets?
In figure 14, the granny smith apple isn't confidently classified as any of the displayed categories using linear probe classification. Do we know what it is classified as?
Also, how is 'zero-shot' ImageNet classification working?
Thoughts about neurons:
The dataset images that most strongly activate the 'West Africa neuron' (1,257 in 4/5/Add_6) are of humans and gorillas, but not other animals. This is plausibly a reflection of crude stereotypes. It also responds to text about South Sudan and Zambia, which are not in West Africa.
The 'Hitler neuron' (309 in 4/5/Add_6) also responds to images of .de domains.
Among the Australian region neurons, one (1,522 in 4/2/Add_6 in Multimodal ResNet-50) responds to Namibia on the map, but all the dataset examples that it responds to relate to Australia or NZ. It would be interesting to see if that neuron responds to any Namibian content.
Interestingly, the 4% of regional neurons the network devotes to Africa is pretty close to the 3% of world GDP Africa represents. Would be interesting to see whether this fit is true for other regions.
How do we know that the "2000-2012 neuron" in fact relates to that date range?
The mental illness neuron seems to respond to 'anxiety', 'depression', and 'bipolar', but it would be nice to see responses to a wider swath of mental illness. One way to explore this could be the three clusters of personality disorders.
The 'dice + poet + ?' neuron (878 in 4/2/Add_6) kind of makes sense to me based on the dataset examples: it has one facet for games, especially dice and card games, and one facet for poetry, bleeding into literature and fiction. I can sort of tell a story of how they're unified, since poetry is probably the most game-like written form.
Doubts:
"Pepe the frog, a symbol of white nationalism in the United States allegedly promoted by Russia." - not how I understand Pepe's primary use, see the Wikipedia article
"the jealousy emotion is success + grumpy" - well, success + grumpy + hug - crying
Typos and typo-likes:
It appears that Radford et al [3] is meant to be anonymous at this stage of the review process.
Also, figures 9 and 10 have some plus and minus signs that go nowhere.
Footnote 38: "to find the neurons that maximially discriminate between the attribution vectors". 'maximially' -> maximally
Footnotes 39 and 40 seem incomplete.
Footnote 41 in text corresponds to the footnote labelled 10 at the bottom of the page, and from then on the footnote numbering is off in the text.
""Pressured" detects Asian culture. (disrespected)" - should '(disrespected)' be there?
"However, this can incent the visualizations" - 'incent' should be 'incentivize'
"Faceted feature visualiziation allows us to see some of this diversity. Hover on a neuron to isolate acitvations." - 'visualiziation' -> visualization, 'acitvations' -> activations
Footnote 15: "There are other {mref("neurons", 672)} which weakly respond to Hillary Clinton"
The footnote about the 'angel neuron' doesn't actually link to the angel neuron.
Complaints about graphs etc:
Microscope takes over 30 seconds to load for me.
The 'synthetic tuning' section of microscope never loaded any content for me.
I wish the y-axis was labelled on figures 2 and 5, which would have made it easier to understand those figures.
Figures 4 and 15 were glitchy for me, using Firefox on Ubuntu, in a way that made it hard to understand all the relevant information. See this imgur gallery.
I wish figure 8 explained what the rows and columns meant more clearly - it took me a while to figure out which were the ImageNet categories and which were the neuron labels.
Figure 3: "the range over the bar shows the standard deviation of the activations of the person's photos." - this is kind of weird; usually those ranges are confidence intervals, right? I kind of wish they were drawn differently, maybe without the bar-lines at the ends.
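To make the Figure 3 concern concrete, here is a minimal sketch of how far a standard-deviation range and a confidence interval for the mean can differ on the same data. The activation values are made up for illustration and are not taken from the article; the function name `error_bar_ranges` and the normal-approximation multiplier `z = 1.96` are likewise assumptions.

```python
import math
import statistics


def error_bar_ranges(activations, z=1.96):
    """Return the mean, the mean +/- one standard deviation range, and an
    approximate 95% confidence interval for the mean (normal approximation)."""
    n = len(activations)
    mean = statistics.fmean(activations)
    sd = statistics.stdev(activations)   # sample standard deviation
    sem = sd / math.sqrt(n)              # standard error of the mean
    sd_range = (mean - sd, mean + sd)
    ci_range = (mean - z * sem, mean + z * sem)
    return mean, sd_range, ci_range


# Hypothetical activations of one neuron on eight photos of one person.
activations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, sd_range, ci_range = error_bar_ranges(activations)

print(f"mean:            {mean:.2f}")
print(f"std-dev range:   {sd_range[0]:.2f} to {sd_range[1]:.2f}")
print(f"approx. 95% CI:  {ci_range[0]:.2f} to {ci_range[1]:.2f}")
```

The standard-deviation range describes the spread of individual photos and stays roughly the same width as more photos are added, whereas the confidence interval describes uncertainty about the mean and narrows as the sample grows. Drawing both with the same whisker style invites readers to confuse the two.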
The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the scale meaning is consistently "higher is better", please read the explanations for our expectations for each score—we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.
Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest
What type of contributions does this article make?: Explanation of existing results
Advancing the Dialogue
How significant are these contributions? 4/5
Outstanding Communication
Article Structure: 4/5
Writing Style: 4/5
Diagram & Interface Style: 3/5
Impact of diagrams / interfaces / tools for thought? 4/5
Readability: 4/5
Scientific Correctness & Integrity
Are claims in the article well supported? 3/5
Does the article critically evaluate its limitations? How easily would a lay person understand them? 4/5
How easy would it be to replicate (or falsify) the results? 2-4/5
Does the article cite relevant work? 4/5
Does the article exhibit strong intellectual honesty and scientific hygiene? 3/5
Distill employs a reviewer worksheet to help reviewers.