About grad-cam visualization #2

itsqyh · 2024-06-19T02:53:56Z

Your work is excellent, and I have seen the Grad-CAM visualization results you provided. Could you please share the code for the Grad-CAM visualization? I would be very grateful.

gordonhu608 · 2024-06-19T04:38:47Z

Thank you for your interest in our work. We referred to the Grad-CAM visualization code in this work: https://github.com/salesforce/ALBEF/blob/main/visualization.ipynb. If you have further questions about Grad-CAM visualization, I will also be glad to help you.

itsqyh · 2024-06-19T05:04:31Z

Thank you very much for your effective response! I had previously referred to the work you provided, but due to my limited coding skills, I didn't know where to start making modifications. Although I feel this might be asking for something without much effort on my part, I still want to ask if you could share your source code. This would greatly help me understand and apply the concepts in the future. Of course, if it's inconvenient for you to share, that's perfectly fine. And thank you very much for your response :)

Oscar860601 · 2024-06-21T07:44:12Z

@gordonhu608 Following up on the Grad-CAM questions.
I just reproduced the Grad-CAM visualization of LLaVA 1.5 on some public datasets.
I want to confirm three details:

1. Which layer of the model did you use to get the results in the paper? Vision encoder layer 22? The projector layer?
2. If the output is a sequence, how did you aggregate the Grad-CAM results from each output token? By choosing some keywords, or by simply averaging the gradients and activations?
3. Did you notice any odd results in the LLaVA 1.5 Grad-CAM visualization? In my own experiments, most of the Grad-CAM heatmaps produced by LLaVA 1.5 look noisy.

I will reproduce Grad-CAM on your brilliant work next week, and I want to make sure my pipeline matches yours.
Thanks!

gordonhu608 · 2024-06-21T20:33:11Z

1. For our work, we chose the cross-attention layer, so I assume for LLaVA it would be the projection layer.
2. We tested on a QA example and computed the loss on the output tokens of the 'answer' part.
3. I happened to test LLaVA's Grad-CAM results, and they are sometimes noisy. I conjecture that each of the 576 visual tokens attends to very different image information and learns some kind of relations.
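For readers unsure what "loss on the answer tokens" means in point 2, here is a minimal sketch in plain Python, with lists standing in for tensors so the arithmetic is visible. It is not the authors' code. The function names are hypothetical, the logits and labels are assumed to be already aligned position by position, and `-100` is used as the ignore label because that is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
import math

IGNORE_INDEX = -100  # positions carrying this label are excluded from the loss


def log_softmax(scores):
    """Numerically stable log-softmax over one list of logits."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_norm for s in scores]


def answer_only_loss(logits, labels):
    """Mean negative log-likelihood over positions whose label is not ignored.

    logits: one list of vocabulary scores per position.
    labels: one target token id per position, IGNORE_INDEX for prompt tokens.
    """
    losses = [
        -log_softmax(row)[target]
        for row, target in zip(logits, labels)
        if target != IGNORE_INDEX
    ]
    if not losses:
        raise ValueError("no answer tokens to score")
    return sum(losses) / len(losses)


# Toy example: 3 positions, vocabulary of 2; only the last two are answer tokens.
logits = [[5.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
labels = [IGNORE_INDEX, 0, 1]
loss = answer_only_loss(logits, labels)
```

In a real pipeline this scalar is what gets backpropagated to the chosen layer, so the prompt tokens contribute nothing to the resulting gradients.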

Oscar860601 · 2024-06-22T06:01:01Z

Thanks for the explanation!
Just to confirm: by "cross-attention layer", do you mean the multi-head attention in the Resampler in MQT?

gordonhu608 · 2024-06-23T03:36:18Z

Yes, correct.
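With the layer settled, the remaining step is turning that layer's attention weights and their gradients into a heatmap. The sketch below shows one attention-weighted Grad-CAM variant: multiply each attention weight by its positive gradient, average over heads and queries, and reshape the image tokens into a grid. Whether this matches the authors' exact pipeline is an assumption, and `attention_grad_cam` is a hypothetical name. Plain lists stand in for tensors. In practice `attn` and `grad` would be captured with hooks on the attention module, and for the 576 visual tokens mentioned above the grid would be 24 x 24.

```python
def attention_grad_cam(attn, grad, grid_h, grid_w):
    """Attention-weighted Grad-CAM over image tokens.

    attn, grad: nested lists indexed [head][query][image_token]; grad is the
    gradient of the scalar target with respect to attn.
    Returns a grid_h x grid_w heatmap scaled so that its maximum is 1.
    """
    num_tokens = grid_h * grid_w
    total = [0.0] * num_tokens
    count = 0
    for head_attn, head_grad in zip(attn, grad):
        for query_attn, query_grad in zip(head_attn, head_grad):
            if len(query_attn) != num_tokens:
                raise ValueError("token count does not match the grid size")
            for i in range(num_tokens):
                # Clamp negative gradients to zero before weighting.
                total[i] += query_attn[i] * max(query_grad[i], 0.0)
            count += 1
    if count == 0:
        raise ValueError("no attention maps given")
    mean = [t / count for t in total]
    peak = max(mean)
    if peak > 0:
        mean = [m / peak for m in mean]
    # Reshape the flat token list into rows of the image grid.
    return [mean[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


# Toy example: 1 head, 2 queries, a 2 x 2 grid of image tokens.
attn = [[[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]]
grad = [[[1.0, -1.0, 0.0, 2.0], [0.0, 1.0, -2.0, 0.0]]]
heatmap = attention_grad_cam(attn, grad, 2, 2)
```

The sign convention matters here: if the scalar target is a loss rather than a score or log-likelihood, clamping positive gradients highlights regions that raise the loss, so the sign may need to be flipped before calling the function.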

SakuraTroyChen · 2024-08-16T03:39:55Z

@Oscar860601 Hello! I am struggling with Grad-CAM visualization. Could you please share your code for visualizing LLaVA 1.5? It would be a great help.

g-h-chen · 2024-09-04T22:07:17Z

@Oscar860601 Same here! Looking for a piece of code!

Yingshu-Li · 2024-09-30T02:22:13Z

@Oscar860601 Same here! Looking for a piece of code!

liuxuannan · 2024-11-16T06:41:37Z

How do I get the output logit distribution of the LLaVA model? Something is wrong with the code below:

    import torch

    with torch.inference_mode():
        output = model(
            input_ids,
            images=images_tensor,
            image_sizes=image_sizes,
            use_cache=True,
        )
    # output.logits holds raw (unnormalized) scores, not probabilities
    prob = output.logits
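The post does not say what the error was, so the following only addresses the logits-versus-probabilities part. Logits become a distribution once a softmax is applied over the vocabulary dimension; with tensors that would be something like `torch.softmax(output.logits[:, -1, :], dim=-1)` for the next-token distribution. The plain-Python sketch below shows the same arithmetic on toy numbers, which are invented for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one list of logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


# Toy logits for one sequence: 2 positions, vocabulary of 3.
logits = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
# The last position's logits give the distribution over the next token.
next_token_probs = softmax(logits[-1])
```

Note that `torch.inference_mode()` disables gradient tracking. That is fine for reading logits, but it would have to be removed if the same forward pass is meant to feed a Grad-CAM backward pass.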

cyj95 · 2025-01-09T03:26:14Z

@Oscar860601 Same here! Looking for a piece of code!
