-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question: Support for non-straight network shapes? #12
Comments
Hey there! Right now, we have support for arbitrary computational graphs via Dense layers, which take in as input all previous layers as input. This technically allows for any sort of computational graph, and we used it to implement residual networks. While this does mean it can support something like the split network example, it's not the cleanest way to approach that. So your notebook example could be implemented as follows (there might be a fix necessary to allow dense networks to operate directly on the input, which I haven't tested before; I believe there's currently a check to make sure Dense networks only refer back to other Dense layers, which isn't strictly necessary): netsplit = nn.DenseSequential(
# first split
nn.Linear(1,20),
nn.ReLU(),
nn.Linear(20,5),
nn.ReLU(),
# second split, skipping the previous 4 layers
Dense(nn.Linear(1,3), None, None, None, None)
nn.ReLU(),
nn.Linear(3,5),
nn.ReLU(),
# combine the two split networks
Dense(nn.Linear(5,10), None, None, None, nn.Linear(5,10)),
nn.ReLU(),
nn.Linear(10,10),
nn.ReLU(),
nn.Linear(10,20)
) However, due to the modularity of our most recent approach, I believe it would be fairly straightforward to implement the split network with one change: modifying the DualNetwork class to conform to the DualLayer signature, which would allow us to call it recursively. Then, the example in your notebook would work like the following: netsplit = nn.DenseSequential(
# first split
nn.Sequential(
nn.Linear(1,20),
nn.ReLU(),
nn.Linear(20,5),
nn.ReLU(),
),
# second split, skipping the previous output
Dense(
nn.Sequential(
nn.Linear(1,3),
nn.ReLU(),
nn.Linear(3,5),
nn.ReLU(),
),
None
)
# combine the two split networks
Dense(nn.Linear(5,10), nn.Linear(5,10)),
nn.ReLU(),
nn.Linear(10,10),
nn.ReLU(),
nn.Linear(10,20)
) which I think is the better approach. This modification I think actually makes a lot of sense: a dual network should really just be another dual layer, there's no reason for it to be a different class since it ends up implementing the same functionality, so I'll include it in the current refactor. EDIT: second example needed a Dense layer to skip the previous network |
Hi @riceric22, Thanks a lot for your help first of all! I've tried both your first and your second proposed approach, but couldn't get any of them to run (yet). I've uploaded a Jupyter notebook showing the exceptions of using the second approach to: The network learns well for the non-robust version, but then aborts adversarial learning with the error message "TypeError: object of type 'NoneType' has no len()" in the transpose_all function. The code also refers to the "toAffineTranspose" function, which seems to have been thrown out during your refactoring. Thanks! |
Hey @progirep , So I realized the problem you had was slightly different: it was not just having two different paths from the input, but the input itself was also split in half. It's not the prettiest solution, but you can achieve the split using existing functionality by using fixed linear operators on the input. I've pushed an update to master branch which fixes the error you came up with, see e747af7. In order to make the split network example work, I also had to add some None cases for when layers are non-applicable (e.g. applying a layer in one split to the other split is meaningless, and is now indicated by returning None; this happens when a Dense layer's non-None operations are applied to only None inputs). The split example now works on master branch, using the following: eye = torch.eye(2)
split1 = nn.Linear(2,1, bias=False)
split1.weight = nn.Parameter(eye[:1])
split1.weight.requires_grad = False
split2 = nn.Linear(2,1, bias=False)
split2.weight = nn.Parameter(eye[1:])
split2.weight.requires_grad = False
netSplit = convex_adversarial.DenseSequential(
# add a no-op mapping for Dense support, since Dense operators don't currently work directly on input layers
Dense(nn.Sequential()),
# split the input into two different layers
Dense(split1),
Dense(split2, None),
# operate on split2
nn.Linear(1,20),
nn.ReLU(),
nn.Linear(20,5),
nn.ReLU(),
# operate on split1, skipping the previous 5 layers
Dense(nn.Linear(1,3), None, None, None, None, None),
nn.ReLU(),
nn.Linear(3,5),
nn.ReLU(),
# combine the two split networks
Dense(nn.Linear(5,10), None, None, None, nn.Linear(5,10)),
nn.ReLU(),
nn.Linear(10,10),
nn.ReLU(),
nn.Linear(10,20)
)
opt = optim.Adam(filter(lambda p: p.requires_grad,netSplit.parameters()), lr=1e-3) This should work for now until I implement the recursive case for the sequential operator. |
Whoops, accidentally closed; I'm going to leave this open until the recursive case is implemented |
Hi Eric. Again, thanks for your support. I will try out your latest version very soon. |
Hi Eric. Your example worked very well! I am still a bit unsure if I am using the dense layers correctly, as I frequently obtain errors such as "AttributeError: 'int' object has no attribute 'dim'" in dual_layers.py I've stripped all details from my non-working examples to one in which only linear layers are used, which you can find at: The architecture learns correctly in the non-robust case, but the notebook shows the runtime error for the robust case. The example does not make sense since it only consists of linear layers, but it examplifies the problems I have when trying to modify your example. Thanks! |
Thanks for the detailed examples! There was a bug in the Dense layer which would return 0 for a forward pass when there are no applicable connections when it should have returned None. I've patched this in 797ddbb on the master branch and the network in your example runs properly now. |
Dear LocusLab members,
how difficult would it to add support for network shapes that are not a simple chaint? There are a few applications in which they make sense (at least for experimentation). I'm pretty sure that your adversarial learning method would be able to support this, but from a software engineering point of view, adding this feature looks non-trivial as your code structure does not seem to be made for supporting nestings of "nn.Sequential" layers.
As an example, I've uploaded a corresponding Jupyter Notebook to: https://github.com/progirep/convex_adversarial/blob/master/examples/SplitExample.ipynb
The example network splits up into two separate parts, which are then joined again. I've added a "SplitNetwork" class to my fork of your project.
Such network architectures could make sense for applications in which you expect many dependencies between some groups of input variables, but limited interaction between these groups for the overall classification.
The text was updated successfully, but these errors were encountered: