About grad-cam visualization #2

Open
itsqyh opened this issue Jun 19, 2024 · 11 comments

@itsqyh

itsqyh commented Jun 19, 2024

Your work is excellent, and I have seen the Grad-CAM visualization results you provided. Could you please share the code for the Grad-CAM visualization? I would be very grateful.

@gordonhu608
Owner

Thank you for your interest in our work. We referred to the Grad-CAM visualization code in this work: https://github.com/salesforce/ALBEF/blob/main/visualization.ipynb. If you have further questions about Grad-CAM visualization, I will also be glad to help you.

@itsqyh
Author

itsqyh commented Jun 19, 2024

> Thank you for your interest in our work. We referred to the Grad-CAM visualization code in this work: https://github.com/salesforce/ALBEF/blob/main/visualization.ipynb. If you have further questions about Grad-CAM visualization, I will also be glad to help you.

Thank you very much for your helpful response! I had already looked at the work you pointed to, but because of my limited coding skills I did not know where to start making modifications. I realize this may come across as asking for something without much effort on my part, but I would still like to ask whether you could share your source code. It would greatly help me understand and apply the concepts in the future. Of course, if it is inconvenient for you to share, that is perfectly fine. Thank you again for your response :)

@Oscar860601

@gordonhu608 Following up on the Grad-CAM questions.
I have just reproduced the Grad-CAM visualization for LLaVA 1.5 on some public datasets.
I would like to confirm three details:

  1. Which layer of the model did you use to obtain the results in the paper? Vision encoder layer 22? The projector layer?
  2. If the output is a sequence, how did you aggregate the Grad-CAM results from the individual output tokens? By choosing some keywords, or by simply averaging the gradients and activations?
  3. Did you notice any odd results in the LLaVA 1.5 Grad-CAM visualizations? In my own experiments, most of the Grad-CAM heatmaps produced by LLaVA 1.5 look noisy.

I plan to reproduce Grad-CAM on your brilliant work next week, and I want to make sure my pipeline matches yours.
Thanks!

@gordonhu608
Owner

  1. For our work, we chose the cross-attention layer, so I assume that for LLaVA it would be the projection layer.
  2. We tested on a QA example and computed the loss on the output tokens of the 'answer' part.
  3. I happened to test LLaVA's Grad-CAM results, and they are sometimes noisy. I conjecture that each of the 576 visual tokens attends to very different image information and learns some kind of relations.
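
For readers looking for a starting point, here is a minimal sketch of the recipe described in this reply, written in the style of the ALBEF notebook linked earlier in the thread: take the attention map of one cross-attention layer, backpropagate a score computed on the answer tokens, weight the attention by its positive gradients, and reshape the 576 visual positions into a 24 x 24 grid. It is not the authors' code. The tiny attention layer, the random tensors, and the helper name `gradcam_from_attention` are all hypothetical stand-ins for the real resampler and language model, and using the answer log-likelihood (the negative loss) as the score is an assumption made so that positive gradients mean evidence for the answer.

```python
import torch
import torch.nn.functional as F


def gradcam_from_attention(attn, score, grid_size):
    """Hypothetical helper: Grad-CAM over image patches from one attention map.

    attn:  (batch, heads, queries, patches) attention probabilities that
           `score` was computed from.
    score: scalar tensor, e.g. the log-likelihood of the answer tokens.
    """
    grads = torch.autograd.grad(score, attn)[0]
    cam = attn * grads.clamp(min=0)      # keep only positive influence
    cam = cam.mean(dim=1).mean(dim=1)    # average over heads, then over queries
    cam = cam.reshape(-1, grid_size, grid_size)
    cam = cam - cam.amin(dim=(1, 2), keepdim=True)
    cam = cam / cam.amax(dim=(1, 2), keepdim=True).clamp(min=1e-8)
    return cam.detach()


# Toy stand-in for "learned queries cross-attend to 576 patch features".
torch.manual_seed(0)
heads, queries, grid, dim, vocab = 4, 8, 24, 32, 50
image_feats = torch.randn(1, grid * grid, dim)   # 576 visual positions
query_emb = torch.randn(1, queries, dim)
to_q, to_k, to_v = (torch.nn.Linear(dim, dim) for _ in range(3))
lm_head = torch.nn.Linear(dim, vocab)            # toy replacement for the LLM
answer_ids = torch.tensor([3, 17, 42])           # made-up 'answer' tokens
answer_pos = torch.randn(len(answer_ids), dim)


def split_heads(x):
    # (1, n, dim) -> (1, heads, n, dim // heads)
    return x.view(1, -1, heads, dim // heads).transpose(1, 2)


q = split_heads(to_q(query_emb))
k = split_heads(to_k(image_feats))
v = split_heads(to_v(image_feats))
attn = torch.softmax(q @ k.transpose(-1, -2) / (dim // heads) ** 0.5, dim=-1)
pooled = (attn @ v).transpose(1, 2).reshape(1, queries, dim).mean(dim=1)
logits = lm_head(pooled + answer_pos)            # one row per answer token

# Score on the answer tokens only: summed log-probability (negative loss).
score = -F.cross_entropy(logits, answer_ids, reduction="sum")
heatmap = gradcam_from_attention(attn, score, grid)
print(heatmap.shape)
```

In a real model the attention map would have to be captured inside the chosen layer (for example with a forward hook or by modifying the layer to store it), and the heatmap would then be upsampled to the image resolution before overlaying it.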

@Oscar860601

> 1. For our work, we chose the cross-attention layer, so I assume that for LLaVA it would be the projection layer.
> 2. We tested on a QA example and computed the loss on the output tokens of the 'answer' part.
> 3. I happened to test LLaVA's Grad-CAM results, and they are sometimes noisy. I conjecture that each of the 576 visual tokens attends to very different image information and learns some kind of relations.

Thanks for the explanation!
Just to confirm: by "cross-attention layer", are you referring to the multi-head attention in the Resampler in MQT?

@gordonhu608
Owner

Yes, correct.

@SakuraTroyChen

@Oscar860601 Hello! I am struggling with Grad-CAM visualization. Could you please share your code for visualizing LLaVA 1.5? It would be a great help to me.

@g-h-chen

g-h-chen commented Sep 4, 2024

@Oscar860601 Same here! Looking for a piece of code!

@Yingshu-Li

@Oscar860601 Same here! Looking for a piece of code!

@liuxuannan

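How can I get the output logit distribution of the LLaVA model? Something is wrong with the code below:

    import torch

    with torch.inference_mode():
        output = model(
            input_ids,
            images=images_tensor,
            image_sizes=image_sizes,
            use_cache=True,
        )
    # output.logits holds unnormalized scores of shape (batch, seq_len, vocab_size)
    logits = output.logits
    # softmax over the vocabulary axis turns them into a probability distribution
    prob = torch.softmax(logits, dim=-1)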
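
One point that may matter for the Grad-CAM use case discussed in this thread: a forward pass inside `torch.inference_mode()` records no autograd graph, so its logits cannot be backpropagated. The sketch below illustrates the difference with a small linear layer as a hypothetical stand-in for the real model. It does not diagnose the error reported above, which the post does not specify.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)   # hypothetical stand-in for the real model
x = torch.randn(2, 4)

# Fine for plain inference: fast, but no gradients can flow afterwards.
with torch.inference_mode():
    logits_no_grad = model(x)

# Needed for Grad-CAM: run the forward pass with autograd enabled.
logits_with_grad = model(x)
probs = torch.softmax(logits_with_grad, dim=-1)  # each row sums to 1
score = probs[:, 0].log().sum()                  # log-probability of class 0
score.backward()                                 # populates model.weight.grad

print(logits_no_grad.requires_grad, logits_with_grad.requires_grad)
```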

@cyj95

cyj95 commented Jan 9, 2025

@Oscar860601 Same here! Looking for a piece of code!
