CIDEr Score Mismatch #14

Open
dina-adel opened this issue Jul 17, 2024 · 2 comments

dina-adel commented Jul 17, 2024

Hello,

Thanks for sharing your work!

I am trying to replicate your results using the shared checkpoints, but I am not sure that I am using the correct metric. I followed the pycocoevalcap repo for calculating the CIDEr score. My results were 0.618 on COCO-Val and 1.42 on IU-Xray, which does not make sense compared to the results reported in the paper.

Could you please guide me here or share your code for evaluation?

ckzbullbullet (Contributor) commented Jul 18, 2024

We used this repo for evaluation: https://github.com/EvolvingLMMs-Lab/lmms-eval.
Basically, we implemented a dragonfly model class under their framework; you can then choose different tasks for evaluation (a sketch of a typical launch is below).
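
For anyone replicating this, here is a minimal sketch of launching an lmms-eval run, wrapped in Python to keep it self-contained. The registered model name "dragonfly", the checkpoint path, and the task name are assumptions for illustration; the flags themselves (--model, --model_args, --tasks, --batch_size, --output_path) are standard lmms-eval CLI options.

import subprocess

# Minimal sketch of an lmms-eval launch. The model name, checkpoint path,
# and task below are placeholders; substitute whatever name the dragonfly
# class registers and the task you actually want to evaluate.
subprocess.run(
    [
        "python", "-m", "lmms_eval",
        "--model", "dragonfly",                # hypothetical registered name
        "--model_args", "pretrained=/path/to/dragonfly-checkpoint",  # placeholder
        "--tasks", "coco_cap",                 # task name may differ
        "--batch_size", "1",
        "--output_path", "./logs/",
    ],
    check=True,
)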

Regarding the biomedical eval, we are also using pycocoevalcap. A few things to keep in mind: make sure you are using the med version of the model for the biomedical evaluations, and make sure to use the correct prompt format (Llama 3); we have examples in the README (a rough sketch of the template is below). It may also be worth trying other biomedical tasks to see whether you get similarly low scores.
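
For concreteness, this is a rough sketch of the standard Llama 3 instruct template; the exact Dragonfly prompt (system message, image-token placement) should be taken from the README examples, so treat the helper below as illustrative only.

# Rough sketch of the standard Llama 3 instruct template. The exact
# Dragonfly prompt (system message, image-token placement) should follow
# the README examples; this helper is illustrative only.
def format_llama3_prompt(question: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(format_llama3_prompt("Provide a detailed description of the given chest x-ray."))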

Please feel free to reply if you still see the issue.

dina-adel (Author) commented

Thanks for getting back to me.

I am still running lmms-eval on the COCO dataset now.

I ran the CIDEr evaluation on the IU-Xray dataset again and got the same score. Am I doing it wrong?

from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer

# Tokenize both the references and the model outputs with the PTB tokenizer.
p_tokenizer = PTBTokenizer()
reference_captions_tokenized = p_tokenizer.tokenize(reference_captions)
generated_captions_tokenized = p_tokenizer.tokenize(generated_captions)

# Note: Cider.compute_score(gts, res) takes the reference (ground-truth)
# captions first and the generated captions second, with exactly one
# generated caption per image id.
cider_scorer = Cider()
score, scores = cider_scorer.compute_score(reference_captions_tokenized, generated_captions_tokenized)
print(f'CIDEr Score: {score}')
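
In case the input format is the issue, here is a self-contained sketch of the dict shapes pycocoevalcap expects. The image ids and captions are made up for illustration, and PTBTokenizer shells out to Java, so a JVM needs to be on the PATH.

from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer

# PTBTokenizer expects {image_id: [{"caption": str}, ...]} and returns
# {image_id: [str, ...]}. The ids and captions here are illustrative only.
reference_captions = {
    "img_0": [{"caption": "The lungs are clear."},
              {"caption": "No acute cardiopulmonary abnormality."}],
}
generated_captions = {
    "img_0": [{"caption": "Lungs are clear without acute abnormality."}],  # exactly one per id
}

tokenizer = PTBTokenizer()
gts = tokenizer.tokenize(reference_captions)
res = tokenizer.tokenize(generated_captions)

# References (gts) first, generated captions (res) second.
score, per_image_scores = Cider().compute_score(gts, res)
print(f"CIDEr: {score:.3f}")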
