Submission #8

Open · pat676 opened this issue Jul 13, 2020 · 29 comments
@pat676

pat676 commented Jul 13, 2020

Hi,

Sorry if this information has already been provided, but I can't find detailed submission instructions. If no details have been published yet, my suggestion is this:

  1. Fork the repo and create a folder vnn-comp/<toolkit_name>

  2. The <toolkit_name> folder should contain:

  • A readme with instructions on how to set up the environment.
  • A shell script, run_benchmarks, to run all the reported benchmarks.

  3. The output of the run_benchmarks script should be a LaTeX-compiled PDF file, vnn-comp/<toolkit_name>/results.pdf, containing:

  • 1-2 paragraphs about the toolkit, the testing environment (processor, RAM), the operating system, and other relevant information.
  • Tables listing, for each case, the image number, verification result, time spent, and epsilon.
  • Preferably one plot for each network with the number of safe, unsafe, and total cases solved vs. time. (ref: Category Proposal: Piecewise Linear Activations (ReLU) #2 (comment))

  4. Create a pull request.

Also, two more questions:

  1. The VNN-COMP website mentions a shared Overleaf project; has this been created yet?
  2. We do not currently have authorisation to deposit the toolkit's source code on GitHub. We plan to simply add a readme with instructions for downloading it from Imperial's website. Please let us know if there are issues with this.
@vtjeng
Contributor

vtjeng commented Jul 14, 2020

@pat676 how are you planning to programmatically generate the .tex files? I don't know any options in Julia (which is what my verification tool is written in) so I was planning to simply use PyLaTeX, but I'm wondering whether you know of anything that works better.

@pat676
Author

pat676 commented Jul 14, 2020

> @pat676 how are you planning to programmatically generate the .tex files? I don't know any options in Julia (which is what my verification tool is written in) so I was planning to simply use PyLaTeX, but I'm wondering whether you know of anything that works better.

Hi @vtjeng,

For this round, I just created the functionality for TeX tables from scratch and wrote the main.tex importing these tables manually. In case you want to do the same, I've shared a Dropbox folder with the relevant code for the TeX tables and plots. The input to the "write_table" function is statuses/times/img_idx/epsilons in the same shape as the final LaTeX table. The tables should be at most 50 rows to fit on one page, so for benchmarks with more than 50 cases I suggest splitting them up into several columns.

*** Edit: The tables and plots may not be suitable for the ACAS and convolutional oval benchmarks.
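
For anyone writing the tables by hand instead, here is a minimal sketch of the tabular layout described above (one row per case with image index, result, time, and epsilon); the column names and values are purely illustrative and not taken from the shared code:

```latex
\begin{table}[h]
  \centering
  \caption{Example results table (illustrative values only).}
  \begin{tabular}{rlrr}
    \hline
    Image & Result & Time (s) & $\epsilon$ \\
    \hline
    0 & Safe    & 1.23   & 0.020 \\
    1 & Unsafe  & 0.45   & 0.020 \\
    2 & Timeout & 900.00 & 0.020 \\
    \hline
  \end{tabular}
\end{table}
```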

@ttj
Contributor

ttj commented Jul 16, 2020

> Sorry if this information has already been provided, but I can't find detailed submission instructions. If no details have been published yet, my suggestion is this:
>
>   1. Fork the repo and create a folder vnn-comp/<toolkit_name>

This is fine, modulo a suggestion below to put things into categories. If participating in only one category, please include things in that category folder. If participating in more than one, putting any additional requirements (code, etc.) shared across categories at the root is fine.

>   2. The <toolkit_name> folder should contain:
>   • A readme with instructions on how to set up the environment.
>   • A shell script, run_benchmarks, to run all the reported benchmarks.

Please put the scripts under the corresponding category, i.e., in vnn-comp/<category>/<toolkit_name>, e.g., `vnn-comp/CNN/nnv/` for NNV's batch scripts for the CNN category.

>   3. The output of the run_benchmarks script should be a LaTeX-compiled PDF file, vnn-comp/<toolkit_name>/results.pdf, containing:

CSV or similar is also acceptable, but of course having it easily integrated into the report via the source LaTeX would be ideal. Also note the comment above regarding categories, so this would go to, e.g., vnn-comp/<category>/<toolkit_name>/results.pdf (or .tex/.csv if those are also produced), concretely vnn-comp/CNN/nnv/results.pdf for the CNN category results.

>   • 1-2 paragraphs about the toolkit, the testing environment (processor, RAM), the operating system, and other relevant information.

There is a section in the overleaf to add this information.

>   • Tables listing, for each case, the image number, verification result, time spent, and epsilon.
>   • Preferably one plot for each network with the number of safe, unsafe, and total cases solved vs. time. (ref: #2 (comment))
>
>   4. Create a pull request.

> Also, two more questions:
>
>   1. The VNN-COMP website mentions a shared Overleaf project; has this been created yet?

Yes, I've shared this with some participants. If you or your collaborators don't have access, please email me with your Overleaf account email, as I didn't want to publicly share an editable link.

>   2. We do not currently have authorisation to deposit the toolkit's source code on GitHub. We plan to simply add a readme with instructions for downloading it from Imperial's website. Please let us know if there are issues with this.

That is fine.

@vtjeng
Contributor

vtjeng commented Jul 16, 2020

In the results section, what section type do we want to use under subsubsection (e.g. under 4.2.1)? It doesn't look like there's a consistent answer on how to handle that.

@stanleybak

stanleybak commented Jul 17, 2020

I made some latex code to allow people to define their tool results in separate files in the overleaf. They will get added as separate columns in the same table. This way we don't need an external processing step that requires everyone's results, just a single data file per tool/benchmark. See data/acasxu-all-nnenum.txt for an example data file.
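
To illustrate how such a per-tool data file can be pulled into the shared table with pgfplotstable, here is a minimal sketch; the file name, macro names, and column name are hypothetical and do not reproduce the exact Overleaf setup:

```latex
% In the preamble:
\usepackage{pgfplotstable}

% Hypothetical per-tool data file, e.g. data/acasxu-all-mytool.txt,
% with one runtime (in seconds) per benchmark instance:
%   3.12
%   0.87
%   45.60

% Read the tool's data file into a table macro ...
\pgfplotstableread{data/acasxu-all-mytool.txt}{\dataAcasAllMytool}

% ... and copy its first column into the shared benchmark table
% \dataAcasAll as a new column named "mytool".
\pgfplotstablecreatecol
  [copy column from table={\dataAcasAllMytool}{[index] 0}]
  {mytool}{\dataAcasAll}
```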

@ttj
Contributor

ttj commented Jul 18, 2020

> I made some latex code to allow people to define their tool results in separate files in the overleaf. They will get added as separate columns in the same table. This way we don't need an external processing step that requires everyone's results, just a single data file per tool/benchmark. See data/acasxu-all-nnenum.txt for an example data file.

Thanks! So, I think this is partly problematic as it's set up. The individual sat/unsat results are not included per tool, so how do we know if an individual tool was right/wrong, or whether results differed across tools? It probably would have been better to include a CSV or similar for each tool consisting of the result and the time instead of just the time (as I think we had previously discussed). If such files already exist, please point them out.

@stanleybak

stanleybak commented Jul 18, 2020

Yes, you're right. Right now it's up to the tool authors to check for consistency with the results column.

I think the pgfplotstable package is powerful enough to produce the results column automatically, assuming each tool provides two columns (time / result), and to detect inconsistencies automatically (it supports simple computations, for example; see here), but I didn't have time to play with the package enough to do this. It should also be able to auto-format the times to two decimal places or significant digits for us, rather than our having to do it ourselves.

For now, you can define two columns in each .dat file, since the code is just pulling out the first column ("[index] 0"): `\pgfplotstablecreatecol[copy column from table={\dataAcasHardNnenum}{[index] 0}] {nnenum} {\dataAcasHard}`

See data/acasxu-hard-nnenum.txt for an example of this. It should be backwards compatible with people who just have a single column. Note that you will have to add (unused) column names in the first row of the data file if you do this, since the first row is no longer just numerical (the result column is a string); otherwise pgfplotstable will omit the first row of the data file, assuming it's the column names. There are probably some configuration options to change this.
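
Extending the sketch above to the two-column format, a hypothetical data file and the corresponding column extraction might look like this (the file name, header names, and macro names are again illustrative):

```latex
% Hypothetical two-column data file, e.g. data/acasxu-hard-mytool.txt.
% Because the result column is a string, the first row gives (unused)
% column names so pgfplotstable does not mistake a data row for a header:
%   time result
%   3.12 unsat
%   0.87 sat
%   45.60 timeout

% Read the file and copy each column into the shared table \dataAcasHard:
\pgfplotstableread{data/acasxu-hard-mytool.txt}{\dataAcasHardMytool}

\pgfplotstablecreatecol
  [copy column from table={\dataAcasHardMytool}{[index] 0}]
  {mytool-time}{\dataAcasHard}

\pgfplotstablecreatecol
  [copy column from table={\dataAcasHardMytool}{[index] 1}]
  {mytool-result}{\dataAcasHard}
```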

@stanleybak

I made some scripts to plot the results as discussed in #2 (comment). I'm still tweaking things but it should be easy to make it work for each category using the data files used in the tables.

Doing a quick Google search, it looks like cactus plots usually have the timeout on the y-axis. Is this correct? It seems strange to me, but it's easy enough to switch.
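
For reference, a cactus plot just plots each tool's per-instance runtimes, sorted in increasing order, against the number of instances solved. Below is a minimal pgfplots sketch with time on the y-axis as discussed above; this is not the gnuplot pipeline used for the report, the table macro and column names are hypothetical, and the runtimes are assumed to be pre-sorted with timed-out instances removed:

```latex
\begin{tikzpicture}
  \begin{axis}[
      xlabel={Instances solved},
      ylabel={Time (s)},
      ymode=log,
      legend pos=north west,
    ]
    % \dataAcasHard is assumed to hold one row per solved instance,
    % with the tool's runtimes in column "mytool-time", sorted ascending.
    \addplot table[x expr=\coordindex+1, y=mytool-time]{\dataAcasHard};
    \addlegendentry{mytool}
  \end{axis}
\end{tikzpicture}
```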

@vtjeng
Contributor

vtjeng commented Jul 18, 2020

@ttj I've been working off of @pat676's suggestion to generate a PDF report for each benchmark. Should I be generating a CSV with one column as suggested by @stanleybak instead now? What is the convention for the ordering of the rows for each of the benchmarks, to make sure that the results are compatible?

@ttj
Contributor

ttj commented Jul 18, 2020

> @ttj I've been working off of @pat676's suggestion to generate a PDF report for each benchmark. Should I be generating a CSV with one column as suggested by @stanleybak instead now? What is the convention for the ordering of the rows for each of the benchmarks, to make sure that the results are compatible?

Please take a look at the current Overleaf draft; if anyone needs access, please let me know the email to invite. In the interest of time, I would say either is okay for now, but if possible, it would be nice to have these aggregated as in the existing columns in the current draft, as this will make it easier to compare things (and the ordering is defined there).

@vtjeng
Contributor

vtjeng commented Jul 19, 2020

@stanleybak this is a minor issue but I haven't been able to figure out how to right-align my results. Let me know whether I need to change something about my input data.

@stanleybak

I've just been manually setting each column with `display columns/4/.style={column type = {r}}` in the table, where 4 is the column number and 'r' means right-align. There's probably a way to set a default, but I couldn't find it right away and this works well enough.
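
In context, this might look roughly as follows (the table macro and column names are hypothetical; the alignment key is the one quoted above):

```latex
% Typeset the shared results table, right-aligning the fifth displayed
% column (display column indices start at 0, so 4 is the fifth column).
\pgfplotstabletypeset[
    columns={net, prop, mytool-result, mytool-time, othertool-time},
    display columns/4/.style={column type = {r}},
  ]{\dataAcasHard}
```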

@vtjeng
Contributor

vtjeng commented Jul 19, 2020

I'm not sure I'll manage to get my code working in time on the CIFAR10 CNN networks from #3 that @GgnDpSngh proposed. How should I report it in that case?

@stanleybak

stanleybak commented Jul 19, 2020

I would say just skip it in the tables and maybe report in the description which ones you analyzed, although @ttj can confirm. It's an imperfect comparison for a number of reasons (different evaluation platforms, for one). We can discuss how to better arrange things for next year during the workshop.

@ttj
Contributor

ttj commented Jul 19, 2020

> I would say just skip it in the tables and maybe report in the description which ones you analyzed, although @ttj can confirm. It's an imperfect comparison for a number of reasons (different evaluation platforms, for one). We can discuss how to better arrange things for next year during the workshop.

Yes, that's fine, and I'll make a note of this in the presentation as well.

@ttj
Contributor

ttj commented Jul 19, 2020

@FlorianJaeckle could you please add the computer details (CPU/RAM) used to the participant info for Oval in the report?

@vtjeng
Contributor

vtjeng commented Jul 20, 2020

@ttj someone left a comment about providing details about the participants; where should we do this? (In my case it was just me).

@ttj
Contributor

ttj commented Jul 20, 2020

> @ttj someone left a comment about providing details about the participants; where should we do this? (In my case it was just me).

I added that comment to make sure we don't fail to acknowledge anyone in the talk, and to help finalize the report. I added placeholders in each tool's description subsection and updated the ones I knew offhand; please check/update as needed.

@vtjeng
Contributor

vtjeng commented Jul 20, 2020

Thanks @ttj. @stanleybak --- I have some results that I just added for MNIST 0.3 and CIFAR 8/255 for the GGN-CNN networks after finally fixing my bug --- hope that you can update the cactus plot with that information tomorrow!

@stanleybak

Done! I'll probably do one last update with the latest data at noon today---the deadline Taylor set---in case anyone else made changes. Also, if people see mistakes in the plots let me know.

The code to make the plots is in figure/plot_code if anyone is interested. You can download the project .zip file (from the history tab in Overleaf), then go to the plot_code directory and run `python3 make_gnuplot_inputs.py ../../data/` followed by `gnuplot make_plots.gnuplot`.

@vtjeng
Contributor

vtjeng commented Jul 28, 2020

Thanks for the instructions @stanleybak.

@ttj, where will the final copy of the report be made available? (I'd like to share the report publicly when it is appropriate to do so).

@GgnDpSngh

Hi there,

As @vtjeng mentioned, it would be good to release the report soon. Also, in the report, we should rename the "GGN-CNN" benchmarks to "COLT-CNN", as they are taken from a project in which I was not involved. I can make this change if needed.

Cheers,
Gagandeep Singh

@stanleybak

stanleybak commented Aug 24, 2020

I agree an "official" release of the report would be useful.

Also @GgnDpSngh, do you have the scripts to run ERAN on the VNN-COMP benchmarks? I didn't see them in the repository.

@GgnDpSngh

Hi Stanley,

Yes, I will send a pull request by Thursday. Does it have to be a Docker image, or do the scripts to run ERAN on the benchmarks suffice?

Cheers,
Gagandeep Singh

@stanleybak

I think the original instructions said to use a Dockerfile for the setup, although @ttj can answer if there were any changes.

@GgnDpSngh

There seems to be an issue with using the Gurobi license inside the Docker image which I could not fix yet (maybe someone here knows how to resolve it), but I have created a version of ERAN for reproducing the competition results here:

https://github.com/GgnDpSngh/ERAN-VNN-COMP

I can send a pull request if @ttj is fine with it.

Cheers,
Gagandeep Singh

@vtjeng
Contributor

vtjeng commented Aug 29, 2020

I've submitted #18 containing the code that I used for this submission. (No substantial changes to the logic; I primarily did some cleanups to the README, and removed some initial code I had to generate reports.)

@ttj
Contributor

ttj commented Aug 30, 2020

Thanks, and sorry for the slow follow-up; I have been swamped getting ready for classes. I merged #18. If any changes are needed to the report, such as updating the benchmark names mentioned above, please make them; otherwise I will add some conclusions and finalize it in the next 1-2 weeks. There are no official proceedings, so if you need to cite things, just cite this repository or the VNN website.

@stanleybak

Is a copy of the report PDF available?
