Submission #8

Open · pat676 opened this issue Jul 13, 2020 · 29 comments
@pat676

pat676 commented Jul 13, 2020

Hi,

Sorry if this information has already been provided, but I can't find detailed submission instructions. If no details have been published yet, my suggestion is this:

  1. Fork the repo and create a folder vnn-comp/<toolkit_name>

  2. The <toolkit_name> folder should contain:

  • A readme with instructions on how to set up the environment.
  • A shell script, run_benchmarks, to run all the reported benchmarks.

  3. The output of the run_benchmarks script should be a LaTeX-compiled PDF file, vnn-comp/<toolkit_name>/results.pdf, containing:

  • 1-2 paragraphs about the toolkit, the testing environment (processor, RAM), the operating system, and other relevant information.
  • Tables listing, for each case, the image number, verification result, time spent, and epsilon.
  • Preferably one plot for each network with the number of safe, unsafe, and total cases solved vs. time. (ref: Category Proposal: Piecewise Linear Activations (ReLU) #2 (comment))

  4. Create a pull request.

Also, two more questions:

  1. The VNN-COMP website mentions a shared Overleaf project; has this been created yet?
  2. We do not currently have authorisation to deposit the toolkit's source code on GitHub. We plan to simply add a readme with instructions for downloading it from Imperial's website. Please let us know if there are issues with this.
@vtjeng
Contributor

vtjeng commented Jul 14, 2020

@pat676 how are you planning to programmatically generate the .tex files? I don't know any options in Julia (which is what my verification tool is written in) so I was planning to simply use PyLaTeX, but I'm wondering whether you know of anything that works better.

@pat676
Author

pat676 commented Jul 14, 2020

> @pat676 how are you planning to programmatically generate the .tex files? I don't know any options in Julia (which is what my verification tool is written in) so I was planning to simply use PyLaTeX, but I'm wondering whether you know of anything that works better.

Hi @vtjeng,

For this round, I just created the functionality for TeX tables from scratch and wrote the main.tex importing these tables manually. In case you want to do the same, I've shared a Dropbox folder with the relevant code for the TeX tables and plots. The input to the "write_table" function is statuses/times/img_idx/epsilons in the same shape as the final LaTeX table. The tables should be at most 50 rows to fit on one page, so for benchmarks with more than 50 cases I suggest splitting them up into several columns.

*** Edit: The tables and plots may not be suitable for the ACAS and convolutional oval benchmarks.
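
For anyone writing the tables by hand instead, here is a minimal sketch of the tabular layout described above (one row per case with image index, result, time, and epsilon); the column names and values are purely illustrative and not taken from the shared code:

```latex
\begin{table}[h]
  \centering
  \caption{Example results table (illustrative values only).}
  \begin{tabular}{rlrr}
    \hline
    Image & Result & Time (s) & $\epsilon$ \\
    \hline
    0 & Safe    & 1.23   & 0.020 \\
    1 & Unsafe  & 0.45   & 0.020 \\
    2 & Timeout & 900.00 & 0.020 \\
    \hline
  \end{tabular}
\end{table}
```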

@ttj
Contributor

ttj commented Jul 16, 2020

> Sorry if this information has already been provided, but I can't find detailed submission instructions. If no details have been published yet, my suggestion is this:
>
>   1. Fork the repo and create a folder vnn-comp/<toolkit_name>

This is fine, modulo a suggestion below to put things into categories. If participating in only one category, please include things in that category folder. If participating in more than one, putting any additional requirements (code, etc.) shared across categories at the root is fine.

>   2. The <toolkit_name> folder should contain:
>   • A readme with instructions on how to set up the environment.
>   • A shell script, run_benchmarks, to run all the reported benchmarks.

Please put the scripts under the corresponding category, i.e., in vnn-comp/<category>/<toolkit_name>, e.g., `vnn-comp/CNN/nnv/` for NNV's batch scripts for the CNN category.

>   3. The output of the run_benchmarks script should be a LaTeX-compiled PDF file, vnn-comp/<toolkit_name>/results.pdf, containing:

CSV or similar is also acceptable, but of course having it easily integrated into the report via the source LaTeX would be ideal. Also note the comment above regarding categories, so this would go to, e.g., vnn-comp/<category>/<toolkit_name>/results.pdf (or .tex/.csv if those are also produced), concretely vnn-comp/CNN/nnv/results.pdf for the CNN category results.

>   • 1-2 paragraphs about the toolkit, the testing environment (processor, RAM), the operating system, and other relevant information.

There is a section in the overleaf to add this information.

>   • Tables listing, for each case, the image number, verification result, time spent, and epsilon.
>   • Preferably one plot for each network with the number of safe, unsafe, and total cases solved vs. time. (ref: #2 (comment))
>
>   4. Create a pull request.

> Also, two more questions:
>
>   1. The VNN-COMP website mentions a shared Overleaf project; has this been created yet?

Yes, I've shared this with some participants. If you or your collaborators don't have access, please email me with your Overleaf account email, as I didn't want to publicly share an editable link.

>   2. We do not currently have authorisation to deposit the toolkit's source code on GitHub. We plan to simply add a readme with instructions for downloading it from Imperial's website. Please let us know if there are issues with this.

That is fine.

@vtjeng
Contributor

vtjeng commented Jul 16, 2020

In the results section, what section type do we want to use under subsubsection (e.g. under 4.2.1)? It doesn't look like there's a consistent answer on how to handle that.

@stanleybak

stanleybak commented Jul 17, 2020

I made some latex code to allow people to define their tool results in separate files in the overleaf. They will get added as separate columns in the same table. This way we don't need an external processing step that requires everyone's results, just a single data file per tool/benchmark. See data/acasxu-all-nnenum.txt for an example data file.
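
To illustrate how such a per-tool data file can be pulled into the shared table with pgfplotstable, here is a minimal sketch; the file name, macro names, and column name are hypothetical and do not reproduce the exact Overleaf setup:

```latex
% In the preamble:
\usepackage{pgfplotstable}

% Hypothetical per-tool data file, e.g. data/acasxu-all-mytool.txt,
% with one runtime (in seconds) per benchmark instance:
%   3.12
%   0.87
%   45.60

% Read the tool's data file into a table macro ...
\pgfplotstableread{data/acasxu-all-mytool.txt}{\dataAcasAllMytool}

% ... and copy its first column into the shared benchmark table
% \dataAcasAll as a new column named "mytool".
\pgfplotstablecreatecol
  [copy column from table={\dataAcasAllMytool}{[index] 0}]
  {mytool}{\dataAcasAll}
```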

@ttj
Contributor

ttj commented Jul 18, 2020

> I made some latex code to allow people to define their tool results in separate files in the overleaf. They will get added as separate columns in the same table. This way we don't need an external processing step that requires everyone's results, just a single data file per tool/benchmark. See data/acasxu-all-nnenum.txt for an example data file.

Thanks! So, I think this is partly problematic as it's set up. The individual sat/unsat results are not included per tool, so how do we know if an individual tool was right/wrong, or whether results differed across tools? It probably would have been better to include a CSV or similar for each tool consisting of the result and the time instead of just the time (as I think we had previously discussed). If such files already exist, please point them out.

@stanleybak

stanleybak commented Jul 18, 2020

Yes, you're right. Right now it's up to the tool authors to check for consistency with the results column.

I think the pgfplotstable package is powerful enough to produce the results column automatically, assuming each tool provides two columns (time / result), and to detect inconsistencies automatically (it supports simple computations, for example; see here), but I didn't have time to play with the package enough to do this. It should also be able to auto-format the times to two decimal places or significant digits for us, rather than our having to do it ourselves.

For now, you can define two columns in each .dat file, since the code is just pulling out the first column ("[index] 0"): `\pgfplotstablecreatecol[copy column from table={\dataAcasHardNnenum}{[index] 0}] {nnenum} {\dataAcasHard}`

See data/acasxu-hard-nnenum.txt for an example of this. It should be backwards compatible with people who just have a single column. Note that you will have to add (unused) column names in the first row of the data file if you do this, since the first row is no longer just numerical (the result column is a string); otherwise pgfplotstable will omit the first row of the data file, assuming it's the column names. There are probably some configuration options to change this.
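
Extending the sketch above to the two-column format, a hypothetical data file and the corresponding column extraction might look like this (the file name, header names, and macro names are again illustrative):

```latex
% Hypothetical two-column data file, e.g. data/acasxu-hard-mytool.txt.
% Because the result column is a string, the first row gives (unused)
% column names so pgfplotstable does not mistake a data row for a header:
%   time result
%   3.12 unsat
%   0.87 sat
%   45.60 timeout

% Read the file and copy each column into the shared table \dataAcasHard:
\pgfplotstableread{data/acasxu-hard-mytool.txt}{\dataAcasHardMytool}

\pgfplotstablecreatecol
  [copy column from table={\dataAcasHardMytool}{[index] 0}]
  {mytool-time}{\dataAcasHard}

\pgfplotstablecreatecol
  [copy column from table={\dataAcasHardMytool}{[index] 1}]
  {mytool-result}{\dataAcasHard}
```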

@stanleybak

I made some scripts to plot the results as discussed in #2 (comment). I'm still tweaking things but it should be easy to make it work for each category using the data files used in the tables.

Doing a quick Google search, it looks like cactus plots usually have the timeout on the y-axis. Is this correct? It seems strange to me, but it's easy enough to switch.
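
For reference, a cactus plot just plots each tool's per-instance runtimes, sorted in increasing order, against the number of instances solved. Below is a minimal pgfplots sketch with time on the y-axis as discussed above; this is not the gnuplot pipeline used for the report, the table macro and column names are hypothetical, and the runtimes are assumed to be pre-sorted with timed-out instances removed:

```latex
\begin{tikzpicture}
  \begin{axis}[
      xlabel={Instances solved},
      ylabel={Time (s)},
      ymode=log,
      legend pos=north west,
    ]
    % \dataAcasHard is assumed to hold one row per solved instance,
    % with the tool's runtimes in column "mytool-time", sorted ascending.
    \addplot table[x expr=\coordindex+1, y=mytool-time]{\dataAcasHard};
    \addlegendentry{mytool}
  \end{axis}
\end{tikzpicture}
```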

@vtjeng
Contributor

vtjeng commented Jul 18, 2020

@ttj I've been working off of @pat676's suggestion to generate a PDF report for each benchmark. Should I be generating a CSV with one column as suggested by @stanleybak instead now? What is the convention for the ordering of the rows for each of the benchmarks, to make sure that the results are compatible?

@ttj
Contributor

ttj commented Jul 18, 2020

> @ttj I've been working off of @pat676's suggestion to generate a PDF report for each benchmark. Should I be generating a CSV with one column as suggested by @stanleybak instead now? What is the convention for the ordering of the rows for each of the benchmarks, to make sure that the results are compatible?

Please take a look at the current Overleaf draft; if anyone needs access, please let me know the email to invite. In the interest of time, I would say either is okay for now, but if possible, it would be nice to have these aggregated as in the existing columns in the current draft, as this will make it easier to compare things (and the ordering is defined there).

@vtjeng
Contributor

vtjeng commented Jul 19, 2020

@stanleybak this is a minor issue but I haven't been able to figure out how to right-align my results. Let me know whether I need to change something about my input data.

@stanleybak

I've just been manually setting each column with `display columns/4/.style={column type = {r}}` in the table, where 4 is the column number and 'r' means right-align. There's probably a way to set a default, but I couldn't find it right away and this works well enough.
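
In context, this might look roughly as follows (the table macro and column names are hypothetical; the alignment key is the one quoted above):

```latex
% Typeset the shared results table, right-aligning the fifth displayed
% column (display column indices start at 0, so 4 is the fifth column).
\pgfplotstabletypeset[
    columns={net, prop, mytool-result, mytool-time, othertool-time},
    display columns/4/.style={column type = {r}},
  ]{\dataAcasHard}
```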

@vtjeng
Contributor

vtjeng commented Jul 19, 2020

I'm not sure I'll manage to get my code working in time on the CIFAR10 CNN networks from #3 that @GgnDpSngh proposed. How should I report it in that case?

@stanleybak

stanleybak commented Jul 19, 2020

I would say just skip it in the tables and maybe report in the description which ones you analyzed, although @ttj can confirm. It's an imperfect comparison for a number of reasons (different evaluation platforms, for one). We can discuss how to better arrange things for next year during the workshop.

@ttj
Contributor

ttj commented Jul 19, 2020

> I would say just skip it in the tables and maybe report in the description which ones you analyzed, although @ttj can confirm. It's an imperfect comparison for a number of reasons (different evaluation platforms, for one). We can discuss how to better arrange things for next year during the workshop.

Yes, that's fine, and I'll make a note of this in the presentation as well.

@ttj
Contributor

ttj commented Jul 19, 2020

@FlorianJaeckle could you please add the computer details (CPU/RAM) used to the participant info for Oval in the report?

@vtjeng
Contributor

vtjeng commented Jul 20, 2020

@ttj someone left a comment about providing details about the participants; where should we do this? (In my case it was just me).

@ttj
Contributor

ttj commented Jul 20, 2020

> @ttj someone left a comment about providing details about the participants; where should we do this? (In my case it was just me).

I added that comment to make sure we don't fail to acknowledge anyone in the talk, and to help finalize the report. I added placeholders in each tool's description subsection and updated the ones I knew offhand; please check/update as needed.

@vtjeng
Contributor

vtjeng commented Jul 20, 2020

Thanks @ttj. @stanleybak --- I have some results that I just added for MNIST 0.3 and CIFAR 8/255 for the GGN-CNN networks after finally fixing my bug --- hope that you can update the cactus plot with that information tomorrow!

@stanleybak

Done! I'll probably do one last update with the latest data at noon today---the deadline Taylor set---in case anyone else made changes. Also, if people see mistakes in the plots let me know.

The code to make the plots is in figure/plot_code if anyone is interested. You can download the project .zip file (from the history tab in Overleaf), then go to the plot_code directory and run `python3 make_gnuplot_inputs.py ../../data/` followed by `gnuplot make_plots.gnuplot`.

@vtjeng
Contributor

vtjeng commented Jul 28, 2020

Thanks for the instructions @stanleybak.

@ttj, where will the final copy of the report be made available? (I'd like to share the report publicly when it is appropriate to do so).

@GgnDpSngh

Hi there,

As @vtjeng mentioned, it would be good to release the report soon. Also, in the report, we should rename the "GGN-CNN" benchmarks to "COLT-CNN", as they are taken from a project in which I was not involved. I can make this change if needed.

Cheers,
Gagandeep Singh

@stanleybak

stanleybak commented Aug 24, 2020

I agree an "official" release of the report would be useful.

Also @GgnDpSngh, do you have the scripts to run ERAN on the VNN-COMP benchmarks? I didn't see them in the repository.

@GgnDpSngh

Hi Stanley,

Yes, I will send a pull request by Thursday. Does it have to be a Docker image, or do the scripts to run ERAN on the benchmarks suffice?

Cheers,
Gagandeep Singh

@stanleybak

I think the original instructions said to use a Dockerfile for the setup, although @ttj can answer if there were any changes.

@GgnDpSngh

There seems to be an issue with using the Gurobi license inside the Docker image which I could not fix yet (maybe someone here knows how to resolve it), but I have created a version of ERAN for reproducing the competition results here:

https://github.com/GgnDpSngh/ERAN-VNN-COMP

I can send a pull request if @ttj is fine with it.

Cheers,
Gagandeep Singh

@vtjeng
Contributor

vtjeng commented Aug 29, 2020

I've submitted #18 containing the code that I used for this submission. (No substantial changes to the logic; I primarily did some cleanups to the README, and removed some initial code I had to generate reports.)

@ttj
Contributor

ttj commented Aug 30, 2020

Thanks, and sorry for the slow follow-up; I have been swamped getting ready for classes. I merged #18. If any changes are needed to the report, such as updating the benchmark names mentioned above, please make them; otherwise I will add some conclusions and finalize it in the next 1-2 weeks. There are no official proceedings, so if you need to cite things, just cite this repository or the VNN website.

@stanleybak

Is a copy of the report PDF available?
