Submission #8
Hi @vtjeng, for this round I just created the functionality for TeX tables from scratch and wrote the main.tex importing these tables manually. In case you want to use the same, I've shared a Dropbox folder with the relevant code for the TeX tables and plots. The input to the "write_table" function is statuses/times/img_idx/epsilons in the same shape as the final LaTeX table. The tables should be at most 50 rows to fit on one page, so for benchmarks with more than 50 cases I suggest splitting them up into several columns. Edit: the tables and plots may not be suitable for the ACAS and convolutional Oval benchmarks.
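(For illustration only: a minimal sketch of the kind of per-instance table such a helper might emit. The column names and values are my own placeholders, not the actual output of write_table.)

```latex
% Hypothetical layout only; the real write_table output may differ.
\documentclass{article}
\begin{document}
\begin{table}[ht]
  \centering
  \begin{tabular}{r r l r}
    \hline
    Img & $\epsilon$ & Result & Time (s) \\
    \hline
    0 & 0.02 & UNSAT   & 1.3  \\
    1 & 0.02 & SAT     & 12.7 \\
    2 & 0.02 & timeout & --   \\
    \hline
  \end{tabular}
  \caption{Per-instance verification results (illustrative values only).}
\end{table}
\end{document}
```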
This is fine, modulo a suggestion below to put things into categories. If participating in only one category, please include things in that category folder. If participating in more than one, putting any additional requirements (code, etc.) shared across categories at the root is fine.
Please put the scripts for the respective category in
CSV or similar is also acceptable, but of course having it easily integrated into the report with the source LaTeX would be ideal. Also note the comment above regarding category, so this would go to e.g.
There is a section in the overleaf to add this information.
Yes, I've shared this with some people; if you or your collaborators don't have access, please email me with your overleaf account email, as I didn't want to publicly share an editable link.
That is fine.
In the results section, what section type do you want to use under
I made some LaTeX code to allow people to define their tool results in separate files in the overleaf. They will get added as separate columns in the same table. This way we don't need an external processing step that requires everyone's results, just a single data file per tool/benchmark. See
Thanks! So, I think this is partly problematic as currently set up. The individual sat/unsat results are not included per tool, so how do we know whether an individual tool was right or wrong, or whether results differed across tools? It probably would have been better to include a CSV or similar for each tool consisting of the result and time instead of just the time (as I think we had previously discussed). If such files already exist, please point them out.
Yes, you're right. Right now it's up to the tool authors to check for consistency with the results column. I think the […] For now, you can define two columns in each .dat file, since the code is just pulling out the first column ("[index] 0"). See
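(To make that concrete, here is a hedged sketch of what a two-column per-tool data file and a call that pulls out only the first column could look like, assuming the tables are built with pgfplotstable; the macro name, data, and option keys are illustrative, not the exact code in the overleaf.)

```latex
% Hedged sketch, assuming pgfplotstable; names, data, and keys are illustrative.
\documentclass{article}
\usepackage{pgfplotstable}
\begin{document}

% A per-tool data file could hold two whitespace-separated columns:
% runtime in seconds, then the reported result (sat/unsat/timeout).
\pgfplotstableread[header=false]{
1.3    unsat
12.7   sat
300.0  timeout
}\mytoolresults

% Typeset only the first column ("[index]0"), i.e. the runtimes,
% so the result column is stored in the data file but not yet displayed.
\pgfplotstabletypeset[
  columns={[index]0},
  display columns/0/.style={column name={Time (s)}},
]\mytoolresults

\end{document}
```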
I made some scripts to plot the results as discussed in #2 (comment). I'm still tweaking things, but it should be easy to make it work for each category using the data files used in the tables. From a quick Google search, it looks like cactus plots usually have the timeout on the y axis. Is this correct? It seems strange to me, but it's easy enough to switch.
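(For reference, a minimal sketch of a cactus plot, assuming pgfplots and made-up runtimes: solve times are sorted in ascending order, the instance count goes on the x axis, and the per-instance time, capped at the timeout, goes on the y axis. This is illustrative only, not the actual competition plotting script.)

```latex
% Illustrative cactus-plot sketch (assumed pgfplots setup, made-up data).
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.16}
\begin{document}
\begin{tikzpicture}
\begin{axis}[
  xlabel={Instances solved},
  ylabel={Time (s)},
  ymode=log,            % runtimes typically span several orders of magnitude
  legend pos=north west,
]
  % Each coordinate is (k, t_k), where t_k is the k-th smallest runtime,
  % so the curve shows how many instances are solved within a given time budget.
  \addplot coordinates {(1,0.5) (2,0.9) (3,2.1) (4,7.4) (5,30.2) (6,120.0)};
  \addlegendentry{tool-a (made-up data)}
\end{axis}
\end{tikzpicture}
\end{document}
```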
@ttj I've been working off of @pat676's suggestion to generate a PDF report for each benchmark. Should I instead now be generating a CSV with one column, as suggested by @stanleybak? What is the convention for the ordering of the rows for each of the benchmarks, to make sure that the results are compatible?
Please take a look at the current overleaf draft; if anyone needs access, please let me know the email to invite. In the interest of time, I would say either is okay for now, but if possible it would be nice to have these aggregated as in the existing columns that appear in the current draft, as this will make it easier to compare things (and the ordering is defined there).
@stanleybak this is a minor issue, but I haven't been able to figure out how to right-align my results. Let me know whether I need to change something about my input data.
I've just been manually setting each column with
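(In case it helps, a hedged sketch of one way to right-align columns, again assuming pgfplotstable and reusing the illustrative \mytoolresults table from the sketch above; the keys and column indices are placeholders, not necessarily what the report actually uses.)

```latex
% Assumed pgfplotstable setup; column indices and keys are illustrative.
\pgfplotstabletypeset[
  display columns/0/.style={column type=r, column name={Time (s)}},  % right-align the numeric time column
  display columns/1/.style={column type=l, column name={Result},
                            string type},                            % left-align the textual result column
]\mytoolresults
```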
I'm not sure I'll manage to get my code working in time on the CIFAR10 CNN networks from #3 that @GgnDpSngh proposed. How should I report it in that case?
I would say just skip it in the tables and maybe report in the description which ones you analyzed, although @ttj can confirm. It's an imperfect comparison for a number of reasons (different evaluation platforms, for one). We can discuss how to better arrange things for next year during the workshop.
Yes, that's fine, and I'll make a note of this in the presentation as well.
@FlorianJaeckle could you please add the computer details (CPU/RAM) used to the participant info on Oval in the report?
@ttj someone left a comment about providing details about the participants; where should we do this? (In my case it was just me).
I added that comment to hopefully avoid failing to acknowledge anyone in the talk, and for finalizing the report. I added placeholders in each tool's description subsection and updated those I knew offhand; please check/update as needed.
Thanks @ttj. @stanleybak: I have some results that I just added for MNIST 0.3 and CIFAR 8/255 for the GGN-CNN networks after finally fixing my bug. I hope you can update the cactus plot with that information tomorrow!
Done! I'll probably do one last update with the latest data at noon today (the deadline Taylor set), in case anyone else made changes. Also, if people see mistakes in the plots, let me know. The code to make the plots is in
Thanks for the instructions @stanleybak. @ttj, where will the final copy of the report be made available? (I'd like to share the report publicly when it is appropriate to do so).
Hi there, as @vtjeng mentioned, it would be good to release the report soon. Also, in the report we should rename the "GGN-CNN" benchmarks to "COLT-CNN", as they are taken from a project in which I was not involved. I can make such a change if needed. Cheers,
I agree an "official" release of the report would be useful. Also @GgnDpSngh, do you have the scripts to run ERAN on the VNN-COMP benchmarks? I didn't see them in the repository.
Hi Stanley, yes, I will send a pull request by Thursday. Does it have to be a Docker image, or do the scripts to run ERAN on the benchmarks suffice? Cheers,
I think the original instructions said to use a Dockerfile for the setup, although @ttj can answer if there were any changes.
There seems to be an issue with using the Gurobi license inside the Docker image, which I could not fix yet (maybe someone here knows how to resolve it), but I have created a version of ERAN for reproducing the competition results here: https://github.com/GgnDpSngh/ERAN-VNN-COMP. I can send a pull request if @ttj is fine with it. Cheers,
I've submitted #18 containing the code that I used for this submission. (No substantial changes to the logic; I primarily did some cleanups to the README, and removed some initial code I had to generate reports.)
Thanks, and sorry for the slow follow-up; I have been swamped getting ready for classes. I merged #18. If any changes are needed to the report, such as updating the benchmark names mentioned above, please make them; otherwise I will add some conclusions and finalize it in the next 1-2 weeks. There are no official proceedings, so if you need to cite things, just cite this repository or the VNN website.
Is a copy of the report PDF available?
Hi,
Sorry if this information has already been provided, but I can't find detailed submission instructions. If no details have been published yet, my suggestion is this:
Fork the repo and create a folder vnn-comp/<toolkit_name>
The <toolkit_name> folder should contain:
Also, two more questions: