Hello,

Thanks for sharing your work!

I am trying to replicate your results using the shared checkpoints, but I am not sure I am using the correct metric. I followed this repo for computing the CIDEr score and got 0.618 on COCO-Val and 1.42 on IU-XRay, which does not line up with the results reported in the paper.

Could you please guide me here, or share your evaluation code?
We used this repo for evaluation: https://github.com/EvolvingLMMs-Lab/lmms-eval
Basically, we implemented a dragonfly class under their framework; you can then choose different tasks for evaluation.
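For orientation, a hypothetical lmms-eval invocation might look like the sketch below. The model name `dragonfly`, the `pretrained=` argument, and the task name are assumptions on my side; check the registered model class and `lmms_eval --tasks list` in your install for the exact names:

```bash
# Hypothetical invocation: "dragonfly" as the registered model class and
# "coco2017_cap_val" as an illustrative caption task name.
python -m lmms_eval \
    --model dragonfly \
    --model_args pretrained=<checkpoint-path-or-hub-id> \
    --tasks coco2017_cap_val \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```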
Regarding biomedical eval, we are also using pycocoevalcap. A few things to keep in mind: make sure you are using the med version of the model for biomedical evaluations, and make sure to use the correct prompt format (llama3); we have examples in the README. You might also try other biomedical tasks and see whether you get similarly low scores.
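For reference, here is a minimal sketch of how CIDEr is typically computed with pycocoevalcap. The toy captions and image ids are hypothetical stand-ins for your references and model outputs:

```python
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer

# Hypothetical toy data: both dicts map image_id -> list of {"caption": str}.
gts = {
    "img1": [{"caption": "a chest x-ray showing no acute disease"}],
    "img2": [{"caption": "a cat sitting on a wooden table"}],
}
res = {
    "img1": [{"caption": "chest x-ray with no acute findings"}],
    "img2": [{"caption": "a cat on a table"}],
}

# PTBTokenizer (needs Java on PATH) normalizes references and candidates
# identically, which matters for n-gram metrics like CIDEr.
tokenizer = PTBTokenizer()
gts = tokenizer.tokenize(gts)  # -> {image_id: ["tokenized caption", ...]}
res = tokenizer.tokenize(res)

# compute_score returns the corpus-level score plus per-image scores.
# CIDEr's tf-idf weights are estimated over the whole evaluation set,
# so score the full split at once rather than image by image.
score, per_image = Cider().compute_score(gts, res)
print(f"CIDEr: {score:.3f}")
```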
Please feel free to reply if you still see issues.