
-- convergence: nb of training epochs, ... #10

Open · jc-Rosichini opened this issue Mar 5, 2017 · 7 comments

@jc-Rosichini

Hi,
I just tried to run your model, but unfortunately I'm far from getting the same accuracy.

Could you please provide some additional information on the following?

  • How many epochs (on the CamVid database) were run in order to get your results (for DenseNet 103)?
  • Is floatX set to float32 or float64?
  • What about the flag `optimizer_including=fusion`? I can't find any description of it in the Theano docs.
  • Are the settings RMSprop, initial learning rate 1e-3, exponential decay 0.995, weight decay 1e-4, and dropout rate 0.2 the ones you used consistently? (A Lasagne sketch of this setup follows below.)
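For reference, a minimal sketch of what that setup might look like in Lasagne; the function names and the `network`/`loss` arguments are illustrative assumptions, not code from this repository:

```python
import numpy as np
import theano
import lasagne

def build_training_updates(network, loss):
    """Sketch: RMSprop (lr 1e-3) with L2 weight decay (1e-4); the
    learning rate lives in a shared variable so it can be decayed."""
    params = lasagne.layers.get_all_params(network, trainable=True)

    # Weight decay of 1e-4, added to the segmentation loss.
    l2_penalty = 1e-4 * lasagne.regularization.regularize_network_params(
        network, lasagne.regularization.l2)
    total_loss = loss + l2_penalty

    learning_rate = theano.shared(np.float32(1e-3))
    updates = lasagne.updates.rmsprop(total_loss, params,
                                      learning_rate=learning_rate)
    return updates, learning_rate

def decay_learning_rate(learning_rate, decay=0.995):
    """Exponential decay of 0.995, applied once per epoch."""
    learning_rate.set_value(np.float32(learning_rate.get_value() * decay))
```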

Thanks for your feedback.

@felixgwu

Hi jc-Rosichini,

I also faced difficulty in reproducing the results.
In short, I couldn't match the accuracy with my own PyTorch implementation, and when I tried this code I got RuntimeError: error getting worksize: CUDNN_STATUS_BAD_PARAM.
(For details, please see #11.)
I am curious which versions of Theano, Lasagne, CUDA, and cuDNN you used.
Would you mind sharing your implementation of the data loading procedure?

Thanks.

@jc-Rosichini
Author

Hi Felix,
Sure!

Here is the config:

```
Using cuDNN version 5105
device cuda0: GeForce GTX 980M
theano: 0.9.0rc3.dev-7e40dae76680552680ae4ff7cbf0d56713bcd2cb
numpy: 1.11.0
pygpu: 0.6.1
lasagne: 0.2.dev1
```

As for the CUDNN_STATUS_BAD_PARAM error, I ran into it too; it was fixed by installing the latest NVIDIA driver.

For the data loader, I'm simply using the CamVid dataset with the RGB values scaled to the range [0, 1].
I'll check and try a normalized approach, (value - mean) / std_deviation, along the lines of the sketch below, but I'm not sure this will give better results.
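A minimal sketch of that normalization, assuming the images are stored as a float numpy array of shape (N, 3, H, W) already scaled to [0, 1]:

```python
import numpy as np

def normalize(images, mean=None, std=None):
    """Per-channel (value - mean) / std normalization for (N, 3, H, W) arrays."""
    if mean is None:
        mean = images.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, 3, 1, 1)
    if std is None:
        std = images.std(axis=(0, 2, 3), keepdims=True)
    return (images - mean) / (std + 1e-8)  # epsilon guards against division by zero
```

In practice the mean and std would be computed once over the training set and reused for the validation and test images.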

I tend to think some pre-training is necessary (using an auto-encoder approach?) in order to have any hope of reaching the accuracy Simon refers to.
Moreover, to get high scores, the loss should use class weighting, since the classes are not balanced (e.g. sky versus bicyclist); see the weighting sketch below.
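One common way to derive such weights is median-frequency balancing (used by SegNet on CamVid); whether that exact scheme is the right choice here is an assumption:

```python
import numpy as np

def median_frequency_weights(label_maps, n_classes):
    """Compute per-class weights from integer ground-truth label maps.

    Rare classes (e.g. bicyclist) get weights > 1, frequent ones (sky) < 1.
    """
    counts = np.bincount(label_maps.ravel(), minlength=n_classes).astype(np.float64)
    freq = counts / counts.sum()
    return np.median(freq[freq > 0]) / np.maximum(freq, 1e-12)
```

The resulting weights can then multiply the per-pixel cross-entropy terms before averaging.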

I'll put some more thinking into it when I have some time.

@felixgwu

Hi jc-Rosichini,

Thanks for your quick reply.
After fixing the CUDNN_STATUS_BAD_PARAM problem, an out-of-memory issue popped up.
It's quite weird, since I am simply running the testing code with batch_size = 1 (the input shape is (1, 3, 360, 480)).
The GPU I used has 12 GB of memory, which should be fine.
I wonder how much memory you used for testing.

Also, I tried balancing the classes as well. It works better for mIoU but slightly worse for accuracy. The numbers I reported were in fact the ones with class balancing.

Though pre-training sounds reasonable to me, I remember that the authors emphasized in the paper that there is neither further post-processing nor pre-training.

@jc-Rosichini
Author

Hi Felix,
I've been using cropped images of shape (1, 3, 256, 256) and (1, 3, 288, 320) with batch_size = 1 on a GTX 980M with barely 4 GB, using the new Theano backend (cuda). A simple cropping sketch is below.
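A simple version of that cropping, as a sketch (not this repo's actual loader):

```python
import numpy as np

def random_crop(image, label, crop_h=256, crop_w=256):
    """Random (crop_h, crop_w) crop of a (3, H, W) image and its (H, W) label map."""
    _, h, w = image.shape
    top = np.random.randint(0, h - crop_h + 1)
    left = np.random.randint(0, w - crop_w + 1)
    return (image[:, top:top + crop_h, left:left + crop_w],
            label[top:top + crop_h, left:left + crop_w])
```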
In terms of unsupervised pre-training, one may certainly argue that it would help avoid getting stuck in 'bad' regions of the parameter space, and thus lead to better accuracy...

@SimJeg
Owner

SimJeg commented Mar 15, 2017

Hi, thanks a lot for your message; it's very important feedback, as the method needs to be validated on different datasets. Unfortunately I don't have a lot of time to work on FC-DenseNets these days, but I'll try to answer.

We used float32 and easily got convergence on the CamVid dataset. We made a first attempt on PascalVOC, and it did not converge until we used the Adam optimizer, but it ended with poor results. The `optimizer_including=fusion` flag won't change the results; it's just a way to add an optimization during the compilation of the Theano graph (the fast_run mode doesn't work because of the large number of skip connections, which Theano hadn't taken into account so far). A typical way to pass it is shown below.
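For instance, a typical invocation via THEANO_FLAGS might look like the following (`train.py` is a placeholder for whatever script you run):

```
THEANO_FLAGS='device=cuda0,floatX=float32,optimizer_including=fusion' python train.py
```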

Sorry, I don't remember the number of epochs needed for convergence. Pre-training would be great, sure! Please consider doing more experiments on the batch norm layer (maybe try batch renormalization, because our batch size is small? a rough sketch follows below). As you saw, we don't use it in a standard way, and several authors in segmentation don't use moving averages either.
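For reference, the batch renormalization correction (Ioffe, 2017) looks roughly like this; a generic numpy sketch, not code from this repository:

```python
import numpy as np

def batch_renorm(x, mu_moving, var_moving, gamma, beta,
                 r_max=3.0, d_max=5.0, eps=1e-5):
    """Batch renormalization for a (N, C, H, W) tensor; gamma, beta and the
    moving statistics are assumed broadcastable to shape (1, C, 1, 1)."""
    mu_b = x.mean(axis=(0, 2, 3), keepdims=True)
    sigma_b = np.sqrt(x.var(axis=(0, 2, 3), keepdims=True) + eps)
    sigma_m = np.sqrt(var_moving + eps)
    # r and d pull the batch statistics toward the moving averages; during
    # training they are treated as constants (no gradient flows through them).
    r = np.clip(sigma_b / sigma_m, 1.0 / r_max, r_max)
    d = np.clip((mu_b - mu_moving) / sigma_m, -d_max, d_max)
    x_hat = (x - mu_b) / sigma_b * r + d
    return gamma * x_hat + beta
```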

@jc-Rosichini
Author

Thanks Simon,
I'll keep you posted whenever some progress is made ;-)
By the way, are you still at ECP?
Jean-Claude

@ahundt

ahundt commented Apr 20, 2017

@SimJeg What Adam parameters did you use? And though you mentioned the results were poor, do you have the numbers?
