
Category Proposal: General Nonlinear Activations (tanh, sigmoid, etc) #1

Open · ttj opened this issue Apr 6, 2020 · 37 comments
Labels: category-proposal (Proposed category)

@ttj
Contributor

ttj commented Apr 6, 2020

Networks composed of tanh, sigmoid, etc. and general nonlinear activations

Representative benchmark(s): control theory controllers

Questions: allow or disallow other nonlinearities as a sort of "combination" category (e.g., ReLU, piecewise linear, purely linear, etc.), or keep this category as consisting only of tanh/sigmoid, etc.?

@ttj ttj added the category-proposal Proposed category label Apr 6, 2020
@alessiolomuscio

Activations: I think we need to pin down which activations we would like. tanh and sigmoid seem sufficient to me.

Benchmarks: we could have general classifiers too, including some with high dimensions.

I'd stick with tanh/sigmoid here.

Architectures: Do we need to fix the architecture?

@ttj
Contributor Author

ttj commented Apr 8, 2020

> Activations: I think we need to pin down which activations we would like. tanh and sigmoid seem sufficient to me.
>
> Benchmarks: we could have general classifiers too, including some with high dimensions.
>
> I'd stick with tanh/sigmoid here.
>
> Architectures: Do we need to fix the architecture?

Thanks for the feedback; agreed. We will work on fixing the activations, presumably just these two, unless we get further feedback from others on additional activations to allow.

If you have some ideas for the larger benchmarks, please let us know what you're thinking (e.g., if you know of classifiers that work with only these activations).

Regarding architecture: yes, for the benchmarks, I'm imagining we will provide the explicit networks, so the architecture would in essence be fixed. Until then, I believe it is fairly open, depending on the application.

@souradeep-111

Hi everyone, sorry for joining the party a bit late. :)
I am assuming general nonlinear activations would include ReLUs as well. Is that correct?
Which tools do we have in this category?

@ttj
Contributor Author

ttj commented Apr 15, 2020

> Hi everyone, sorry for joining the party a bit late. :)
> I am assuming general nonlinear activations would include ReLUs as well. Is that correct?
> Which tools do we have in this category?

We're thinking networks with only ReLUs would be in the piecewise-linear category, whereas this one would cover nonlinear activations that are not piecewise linear. If it makes sense with some benchmarks (please provide them if you have any), though, we could also consider a combination category.

We're awaiting feedback from tool authors on which categories they'll participate in, by commenting on these issues. We'll send an email to everyone who expressed interest soon to remind them.

@souradeep-111

This is to follow up on Taylor's request.
I would like to sign up for this category with Sherlock.
https://github.com/souradeep-111/sherlock

@vtjeng
Contributor

vtjeng commented Apr 24, 2020

My MIPVerify tool won't be participating in this category, but I'd like to suggest that the benchmark we select be one where piecewise-linear networks have been shown to perform poorly. That would better motivate why we want to work with networks with general nonlinear activations at all.

@GgnDpSngh

We would like to sign up for this category with ERAN
https://github.com/eth-sri/eran

Following up on what Vincent just said: tanh and sigmoid are essential components of LSTM architectures; are we considering those?

Cheers,
Gagandeep Singh

@ttj
Contributor Author

ttj commented Apr 27, 2020

> Following up on what Vincent just said: tanh and sigmoid are essential components of LSTM architectures; are we considering those?

For this category, the benchmarks I had in mind were mostly control-theory controllers (e.g., along the lines of the feedforward controllers in the case studies from Verisig: https://github.com/Verisig/verisig ), although these aren't easily parameterizable. We're certainly open to other benchmarks, so if you or anyone else have any in mind for this category, please let us know. One issue I would imagine with LSTMs is that few (if any?) methods support these layers directly, but if there's sufficient interest and support for them, we can certainly consider it. We earlier discussed an RNN category but haven't seen much interest in it so far; again, we can reconsider.

@ttj
Contributor Author

ttj commented May 2, 2020

General Nonlinear Category Participants:

ERAN
NNV
Sherlock

If anyone else plans to join this category, please add a comment soon, as the participants in this category need to decide on the benchmarks soon (by about May 15).

@pat676

pat676 commented May 4, 2020

Hi,

We would like to enter VeriNet https://vas.doc.ic.ac.uk/software/neural/.

The toolkit supports:

  • Local robustness properties.

  • Fully connected and convolutional layers.

  • ReLU, Sigmoid and Tanh activation functions.

Regards,
Patrick Henriksen

@ttj
Contributor Author

ttj commented May 8, 2020

Finalized General Nonlinear Category Participants:

ERAN
NNV
Sherlock
VeriNet

@pat676

pat676 commented May 12, 2020

Following up on the previous comment, I think we should have some general classifiers here. I can train and release some fully connected Sigmoid and Tanh networks on the MNIST dataset.

@ttj
Contributor Author

ttj commented May 29, 2020

We have created some feedforward networks with tanh/sigmoid activations for classification on MNIST, and will share those soon. If there are any other proposals, please let us know.

@ttj
Contributor Author

ttj commented Jun 4, 2020

Some ONNX MNIST classifiers with tanh/sigmoid, created by @Neelanjana314, are here; please let us know of any problems loading, etc. We'll then centralize things in this repository after we agree on which of these to use:

https://github.com/Neelanjana314/VNN_COMP_2020/tree/master/Networks/General%20Non-Linear%20activation%20functions

@pat676

pat676 commented Jul 1, 2020

> Some ONNX MNIST classifiers with tanh/sigmoid, created by @Neelanjana314, are here; please let us know of any problems loading, etc. We'll then centralize things in this repository after we agree on which of these to use:
>
> https://github.com/Neelanjana314/VNN_COMP_2020/tree/master/Networks/General%20Non-Linear%20activation%20functions

@Neelanjana314 have we decided on verification parameters (input images / epsilons / timeout) for these networks? Also, there are a total of 12 networks; depending on the number of input images and the timeout setting, this may take a long time. Should we use all of them or choose a subset?

@ttj
Contributor Author

ttj commented Jul 1, 2020

> @Neelanjana314 have we decided on verification parameters (input images / epsilons / timeout) for these networks? Also, there are a total of 12 networks; depending on the number of input images and the timeout setting, this may take a long time. Should we use all of them or choose a subset?

@Neelanjana314 can provide feedback on a subset of the networks to use (I'd suggest a variety of sizes and the different activation types, maybe ~4-6 total). For the inputs/specifications, I would suggest using what's done for the MNIST examples in the other categories (probably the same as in the ReLU/piecewise-linear one) unless there are a priori reasons to use something different.
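For reference, the MNIST specifications used in the other categories are L-infinity local robustness properties. Here is a minimal sketch of the input region such a property defines (the function name and pixel range are illustrative assumptions, not from this thread):

```python
import numpy as np

# Sketch of an L-infinity local robustness specification: the input set is an
# epsilon-ball around the image, clipped to the valid pixel range. A verifier
# must show every input in [lower, upper] keeps the network's predicted label.
def linf_bounds(image: np.ndarray, eps: float, lo: float = 0.0, hi: float = 255.0):
    """Element-wise lower/upper bounds of the perturbation region."""
    return np.clip(image - eps, lo, hi), np.clip(image + eps, lo, hi)

# Usage on a dummy 28x28 image with an epsilon later discussed in this thread.
img = np.full((28, 28), 128.0)
lower, upper = linf_bounds(img, eps=5.0)
print(lower.min(), upper.max())  # 123.0 133.0
```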

@Neelanjana314
Contributor

> @Neelanjana314 have we decided on verification parameters (input images / epsilons / timeout) for these networks? Also, there are a total of 12 networks; depending on the number of input images and the timeout setting, this may take a long time. Should we use all of them or choose a subset?

I would suggest using two networks of each activation type, X_200_100_50.onnx and X_200_50.onnx, and we can use the same images as for MNIST_ReLU (the 50 images provided by @pat676, or just the first 25). X_200_100_50.onnx should predict 49/50 (or 25/25) correctly, whereas the other should predict all of the images correctly. No normalization is needed for the input images.

For the epsilon and timeout, I can provide an update by tonight.

@GgnDpSngh

Hi @Neelanjana314, are these fully connected networks or convolutional? I get "Conv" operations when translating.

Cheers,
Gagandeep Singh

@Neelanjana314
Contributor

> Hi @Neelanjana314, are these fully connected networks or convolutional? I get "Conv" operations when translating.
>
> Cheers,
> Gagandeep Singh

Hi @GgnDpSngh, these are fully connected networks. I will check and upload them once again, though.

@pat676

pat676 commented Jul 2, 2020

Hi @Neelanjana314,

I'm getting this graph for the logsig_200_100_50_onnx.onnx network:

[network graph image]

All layers seem to be convolutional. Also, we could skip the softmax at the end; limiting the number of nodes makes it somewhat easier to convert to PyTorch.

@Neelanjana314
Contributor

> Hi @Neelanjana314,
>
> I'm getting this graph for the logsig_200_100_50_onnx.onnx network:
>
> [network graph image]
>
> All layers seem to be convolutional. Also, we could skip the softmax at the end; limiting the number of nodes makes it somewhat easier to convert to PyTorch.

@pat676 These should be fully connected layers instead. Can you and @GgnDpSngh check the new file and let me know if it works in PyTorch?

https://www.dropbox.com/s/kq4841shswb89fc/tansig_200_100_50_onnx.onnx?dl=0

@Neelanjana314
Contributor

Neelanjana314 commented Jul 2, 2020

> @Neelanjana314 can provide feedback on a subset of the networks to use (I'd suggest a variety of sizes and the different activation types, maybe ~4-6 total). For the inputs/specifications, I would suggest using what's done for the MNIST examples in the other categories (probably the same as in the ReLU/piecewise-linear one) unless there are a priori reasons to use something different.

We can go ahead with these specifications for the testing.

@pat676

pat676 commented Jul 2, 2020

> @pat676 These should be fully connected layers instead. Can you and @GgnDpSngh check the new file and let me know if it works in PyTorch?
>
> https://www.dropbox.com/s/kq4841shswb89fc/tansig_200_100_50_onnx.onnx?dl=0

Hi @Neelanjana314, I'm still getting Conv layers with the Dropbox file. How are you creating the ONNX files? If you use PyTorch, you can set verbose=True to print all layers during conversion.

@Neelanjana314
Contributor

Neelanjana314 commented Jul 2, 2020

> Hi @Neelanjana314, I'm still getting Conv layers with the Dropbox file. How are you creating the ONNX files? If you use PyTorch, you can set verbose=True to print all layers during conversion.

So, we are converting a MATLAB file to an ONNX file, and the issue is in the conversion (MATLAB -> ONNX -> PyTorch). I think ONNX stores minimal operations to describe layers, and I guess the exporter prefers Conv layers over FC layers. There are ways to make a Conv layer behave like an equivalent fully connected layer, but this might cause performance differences when checking robustness.

We are looking into whether we can change the conversion process (I don't think we can change the ONNX translator) and will update ASAP.

Alternatively, you can try converting the Conv layers to FC layers after the network is created.
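The Conv/FC equivalence mentioned here can be checked numerically: a "valid" convolution whose kernel covers the entire input computes exactly one dot product per output channel, i.e., a fully connected layer whose weight matrix is the flattened kernel. A small sketch (the shapes are hypothetical, chosen to match a 784-input MNIST layer):

```python
import numpy as np

rng = np.random.default_rng(0)
out_ch, in_ch, kH, kW = 200, 1, 28, 28  # hypothetical layer dimensions
W_conv = rng.standard_normal((out_ch, in_ch, kH, kW))
b = rng.standard_normal(out_ch)
x = rng.standard_normal((in_ch, kH, kW))

# "Valid" convolution with a kernel the size of the input: the output is a
# single value per output channel, namely the full dot product with the input.
conv_out = np.array([(W_conv[o] * x).sum() for o in range(out_ch)]) + b

# Equivalent fully connected layer: flatten kernel and input the same way.
W_fc = W_conv.reshape(out_ch, -1)
fc_out = W_fc @ x.reshape(-1) + b

print(np.allclose(conv_out, fc_out))  # True
```

This is why one can rewrite such Conv nodes as Gemm/Linear nodes after export without changing the network's behavior.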

@pat676

pat676 commented Jul 7, 2020

> So, we are converting a MATLAB file to an ONNX file, and the issue is in the conversion (MATLAB -> ONNX -> PyTorch). I think ONNX stores minimal operations to describe layers, and I guess the exporter prefers Conv layers over FC layers. There are ways to make a Conv layer behave like an equivalent fully connected layer, but this might cause performance differences when checking robustness.
>
> We are looking into whether we can change the conversion process (I don't think we can change the ONNX translator) and will update ASAP.
>
> Alternatively, you can try converting the Conv layers to FC layers after the network is created.

@Neelanjana314 I believe I have successfully converted the convolutional layers to fully connected layers now; however, the results are somewhat strange. Should the input values be in the range [0, 1] or [0, 255]? Also, have you found reasonable epsilons and timeouts?

@Neelanjana314
Contributor

> @Neelanjana314 I believe I have successfully converted the convolutional layers to fully connected layers now; however, the results are somewhat strange. Should the input values be in the range [0, 1] or [0, 255]? Also, have you found reasonable epsilons and timeouts?

Hi @pat676, the range is [0, 255], and the epsilon values (i.e., 0.02 and 0.05) translate to 5 and 12, respectively. Also, a timeout of ~15 minutes should be fine.

@pat676

pat676 commented Jul 8, 2020

@Neelanjana314, sorry, but I'm still getting some strange results and am unsure whether my conversion from Conv to FC succeeded. Could you post your verification results for one of the network-epsilon combinations for me to use as a sanity check?

@Neelanjana314
Contributor

> @Neelanjana314, sorry, but I'm still getting some strange results and am unsure whether my conversion from Conv to FC succeeded. Could you post your verification results for one of the network-epsilon combinations for me to use as a sanity check?

Hi @pat676, what about the classification results with zero epsilon? Were you able to get 25/25 predictions correct? I will update the network-epsilon combinations ASAP.

@Neelanjana314
Contributor

> @Neelanjana314, sorry, but I'm still getting some strange results and am unsure whether my conversion from Conv to FC succeeded. Could you post your verification results for one of the network-epsilon combinations for me to use as a sanity check?

Hi @pat676, please find the final-layer (after softmax) output for image 1 and tansig_200_50.onnx below:
0.000115960517742989
9.12453041114710e-06
3.89569815818540e-05
9.95785548319003e-05
1.07316381560933e-06
2.03678675334422e-06
7.14853401880313e-08
0.999422106024639
1.52424218336877e-05
0.000295849533050314

Also, the labels are in the 1-10 range.
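As a sanity check on the numbers above: the predicted class is the arg-max of the softmax output, and with 1-indexed labels the digit at 0-based index 7 becomes label 8. A quick check in plain Python:

```python
# Softmax outputs posted above for image 1 on tansig_200_50.onnx.
outputs = [
    0.000115960517742989, 9.12453041114710e-06, 3.89569815818540e-05,
    9.95785548319003e-05, 1.07316381560933e-06, 2.03678675334422e-06,
    7.14853401880313e-08, 0.999422106024639, 1.52424218336877e-05,
    0.000295849533050314,
]
# Arg-max of the scores, shifted to the 1-10 label range used here.
predicted = max(range(len(outputs)), key=outputs.__getitem__) + 1
print(predicted)  # 8, i.e., the digit 7 under 1-indexed labels
```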

@pat676

pat676 commented Jul 9, 2020

> Hi @pat676, please find the final-layer (after softmax) output for image 1 and tansig_200_50.onnx below:
>
> 0.000115960517742989
> 9.12453041114710e-06
> 3.89569815818540e-05
> 9.95785548319003e-05
> 1.07316381560933e-06
> 2.03678675334422e-06
> 7.14853401880313e-08
> 0.999422106024639
> 1.52424218336877e-05
> 0.000295849533050314
>
> Also, the labels are in the 1-10 range.

Thanks, that helped. Everything works now.

@Neelanjana314
Contributor

As time is a constraint now, we can reduce the number of images from 25 to the first 16; that might save a couple of hours.

@pat676

pat676 commented Jul 10, 2020

> As time is a constraint now, we can reduce the number of images from 25 to the first 16; that might save a couple of hours.

I verified all 25 this round, but I can report only the first 16 if that's preferred. The problem seemed surprisingly difficult for such small networks, especially for eps=5. Maybe we should consider adding an easier (smaller) epsilon value for next year's competition?

@Neelanjana314
Contributor

Neelanjana314 commented Jul 10, 2020

> I verified all 25 this round, but I can report only the first 16 if that's preferred. The problem seemed surprisingly difficult for such small networks, especially for eps=5. Maybe we should consider adding an easier (smaller) epsilon value for next year's competition?

Hi @pat676, you are right. I was about to mention that too (i.e., to use epsilons 1 and 3, maybe), but I guess everyone has already spent time on it, so we left it for this year. The original idea was to match the specifications for the piecewise-linear and nonlinear cases.

@pat676

pat676 commented Jul 10, 2020

> Hi @pat676, you are right. I was about to mention that too (i.e., to use epsilons 1 and 3, maybe), but I guess everyone has already spent time on it, so we left it for this year. The original idea was to match the specifications for the piecewise-linear and nonlinear cases.

Hi @Neelanjana314, I'm trying a run with epsilon=3 for logsig_200_50, and the results are more interesting. How about we say that if people have the time, they can also run and report eps=3; if not, that's no problem for this round?

@Neelanjana314
Contributor

> I verified all 25 this round, but I can report only the first 16 if that's preferred. The problem seemed surprisingly difficult for such small networks, especially for eps=5. Maybe we should consider adding an easier (smaller) epsilon value for next year's competition?

It will be fine if everyone agrees.

@GgnDpSngh

Hi all,

So what eps are we using for the networks in the end?

Cheers,

@Neelanjana314
Contributor

Neelanjana314 commented Jul 17, 2020

> Hi all,
>
> So what eps are we using for the networks in the end?
>
> Cheers,

@GgnDpSngh I think we can go ahead with 5 and 12, as decided previously. Adding new epsilons may create confusion among others.

Thanks
Neelanjana
