Recurrent neural networks #185
Comments
I've worked on the current neural network code, so I think I have some comments that could be helpful for you working with this codebase. (Disclaimer: I haven't read through the paper/Python code you link, so they might already address some of the potential issues I mention. Also, I'm thinking through this as I type, so it's likely to contain errors; please be skeptical of any issues/requirements I claim.)

Unfortunately, I suspect there will be some annoyances with implementing RNNs (although it shouldn't be impossible) arising from the fact that RNNs have an internal state that gets updated as they process data. Generally speaking, the existing code treats each layer as a pure function of its inputs and its parameters, with no state carried over between calls.

More specifically about working with the neural network code: I don't know if you've taken a look, but it's set up around the idea of stacking together multiple "NetLayers", so to implement RNNs you just need to write a recurrent NetLayer. One important thing to know is that individual layers don't actually store their own parameters; all parameters for an entire network are stored in one contiguous array, and each layer receives a slice of that array to compute its function. (You need the parameters all in one place like this to optimize by gradient descent.)

So the million-dollar question is how to represent the state of a recurrent layer. You can't add the state to the entire Network's parameter array, because everything in that array is treated as a trainable weight by the optimizer, and the state isn't something gradient descent should be updating. I could be wrong, but I think you will end up having to (minimally) modify the existing neural network code to allow for a concept of state in layers.

One rough idea that might work is to have your recurrent layer output its state as well as the function it actually computes in one matrix. The layer's forward pass during training might then look roughly like this:

```rust
let mut index = 0;
for (i, layer) in self.layers.iter().enumerate() {
    let shape = layer.param_shape();
    // Each layer gets a view into the single contiguous parameter array.
    let slice = unsafe {
        MatrixSlice::from_raw_parts(weights.as_ptr().offset(index as isize),
                                    shape.0,
                                    shape.1,
                                    shape.1)
    };
    // The first layer sees the raw inputs; later layers see the previous activations.
    let layer_input = if i == 0 { inputs } else { activations.last().unwrap() };
    // You might be wondering why we append the state to the parameter slice
    // instead of passing it separately. This is just to keep modifications to
    // the NetLayer interface minimal, since most layers shouldn't have any
    // state. (`append` and `output_cutoff` don't exist yet; they're part of
    // the sketch.)
    let both = layer.forward(layer_input, slice.append(states.last())).unwrap();
    // Split `both` by rows into the true output (rows 0..output_cutoff)
    // and the layer's new state (rows output_cutoff..both.rows()).
    let (output, state) = both.split_at(layer.output_cutoff(), Axes::Row);
    activations.push(output);
    params.push(slice);
    states.push(state);
    index += layer.num_params();
}
let output = activations.last().unwrap();
```

The actual changes would probably look different from the above, but the point is to keep track of states as you forward propagate, without directly modifying the Network or its layers; you'll then have them stored for computing gradients via BPTT. Finally, I'm not claiming that something like what I've sketched is the only or best way to add RNNs to this codebase, but hopefully some of this comment has been informative.
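As a postscript, here is a minimal, self-contained sketch of the bookkeeping that BPTT relies on: forward propagation of a vanilla recurrent step over a sequence while recording every hidden state. It deliberately avoids the rusty-machine/rulinalg types, and all of the names (`recurrent_step`, `forward_sequence`, and so on) are made up for illustration; it is not a drop-in NetLayer implementation.

```rust
// Hypothetical, self-contained sketch (not rusty-machine code): a vanilla
// recurrent step h_new = tanh(W_xh * x + W_hh * h_prev), plus a forward pass
// over a whole sequence that keeps every hidden state around for BPTT.

fn recurrent_step(w_xh: &[Vec<f64>],
                  w_hh: &[Vec<f64>],
                  x: &[f64],
                  h_prev: &[f64]) -> Vec<f64> {
    let hidden = w_xh.len();
    let mut h_new = vec![0.0; hidden];
    for i in 0..hidden {
        let mut acc = 0.0;
        // Input-to-hidden contribution.
        for (j, &xj) in x.iter().enumerate() {
            acc += w_xh[i][j] * xj;
        }
        // Hidden-to-hidden (recurrent) contribution.
        for (j, &hj) in h_prev.iter().enumerate() {
            acc += w_hh[i][j] * hj;
        }
        h_new[i] = acc.tanh();
    }
    h_new
}

/// Forward-propagate a whole sequence and return every hidden state,
/// which is exactly what backpropagation through time needs to reuse.
fn forward_sequence(w_xh: &[Vec<f64>],
                    w_hh: &[Vec<f64>],
                    inputs: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let hidden = w_xh.len();
    let mut states = vec![vec![0.0; hidden]]; // h_0 = 0
    for x in inputs {
        let h_new = recurrent_step(w_xh, w_hh, x, states.last().unwrap());
        states.push(h_new);
    }
    states
}

fn main() {
    // Toy dimensions: 2 inputs, 3 hidden units, 4 time steps.
    let w_xh = vec![vec![0.10, -0.20], vec![0.05, 0.30], vec![-0.40, 0.25]];
    let w_hh = vec![vec![0.10; 3], vec![-0.10; 3], vec![0.05; 3]];
    let inputs = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0], vec![0.5, -0.5]];
    let states = forward_sequence(&w_xh, &w_hh, &inputs);
    println!("kept {} hidden states for BPTT", states.len());
}
```

In an integrated version, that `states` vector is essentially what the training loop above accumulates, and BPTT would walk back over it from the last time step to the first.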
A couple of comments from a regular user of RNNs. TL;DR: you cannot get it right, I cannot get it right, most of us cannot get it right. Based on that, I would really suggest leaving the neural net implementation to specialized libraries (wrappers around TensorFlow, CNTK, and so on) and focusing on the rest of the API.
Heya,
As I understand it, RNNs are not yet in the project. If there's no current work on them, I'd like to look into implementing them with rusty-machine. I'm still fairly new to both machine learning and Rust - I've only done FNNs and linear regressions so far - but I have a specific project in mind that I'd like to try RNNs on.
A start would be to set up an architecture for creating an RNN topology and forward-propagating through it. Then I'd start working on BPTT and later LSTM, which, if I've understood correctly, is central to making RNNs work in practice, at least when working with data that spans more than a handful of timesteps.
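For reference, the forward pass of a vanilla RNN is a single recurrence over time, usually written roughly as `s_t = tanh(U x_t + W s_{t-1})` with output `o_t = softmax(V s_t)`, where U, W, and V are the input-to-hidden, hidden-to-hidden, and hidden-to-output weights. BPTT is then ordinary backpropagation applied to this recurrence unrolled over the time steps, which is why the forward pass has to keep every `s_t` around; LSTM cells replace the plain tanh update with gated updates so that gradients survive over longer spans.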
For implementation sources, I currently have this paper: https://arxiv.org/pdf/1610.02583.pdf and this tutorial with some actual example code in Python: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
If there already is code for this, then that's all the better, of course!