Model not training #6

Open
DmitriyALoza opened this issue May 16, 2023 · 6 comments

Comments

@DmitriyALoza

Whenever I try to train the model, I get this error:

WARNING:tensorflow:Model was constructed with shape (None, 512, 512, 3) for input KerasTensor(type_spec=TensorSpec(shape=(None, 512, 512, 3), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'"), but it was called on an input with incompatible shape (None, None, None).
Traceback (most recent call last):
File "C:\Users\Documents\OrganoID-master\OrganoID.py", line 33, in <module>
program.RunProgram(args)
File "C:\Users\Documents\OrganoID-master\CommandLine\Train.py", line 99, in RunProgram
TrainModel(model, parserArgs.learningRate, parserArgs.patience, parserArgs.epochs,
File "C:\Users\Documents\OrganoID-master\Core\Model.py", line 82, in TrainModel
model.fit(x=ImageGenerator(trainingData, batchSize, model),
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\AppData\Local\Temp_autograph_generated_filetbjvh430.py", line 15, in tf__train_function
retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
ValueError: in user code:

File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\engine\training.py", line 1249, in train_function  *
    return step_function(self, iterator)
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\engine\training.py", line 1233, in step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\engine\training.py", line 1222, in run_step  **
    outputs = model.train_step(data)
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\engine\training.py", line 1023, in train_step
    y_pred = self(x, training=True)
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
File "C:\Users\Anaconda3\envs\OrganoID\lib\site-packages\keras\engine\input_spec.py", line 250, in assert_input_compatibility
    raise ValueError(

ValueError: Exception encountered when calling layer 'model' (type Functional).

Input 0 of layer "conv2d" is incompatible with the layer: expected min_ndim=4, found ndim=3. Full shape received: (None, None, None)

Call arguments received by layer 'model' (type Functional):
  • inputs=tf.Tensor(shape=(None, None, None), dtype=uint8)
  • training=True
  • mask=None 

I have used both the preexisting Augment.py script and my own script to augment the images. Both augmenters use an image-resize function that resizes the images to (1984, 1984, 3) and also to (512, 512, 3). I do not understand how to fix this error, and any help would be greatly appreciated!

@schmoogol

Try changing line 17 of Model.py from:
inputs = tf.keras.layers.Input((imageSize[0], imageSize[1], 3))
to
inputs = tf.keras.layers.Input((imageSize[0], imageSize[1], 1))

That seems to allow the training to run for me, although I have yet to determine whether it generates a valid model.
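
For context, the traceback above reports a rank-3 uint8 input, (None, None, None), reaching a conv2d layer that expects rank 4: (batch, height, width, channels). The following is a minimal sketch, not taken from the OrganoID code, of how a batch of grayscale images can be given an explicit single-channel axis with NumPy. The helper name `ensure_channel_axis` is hypothetical, and whether OrganoID's own image generator should be changed this way is an assumption that has not been checked against the repository.

```python
import numpy as np


def ensure_channel_axis(batch):
    """Return the batch as (N, H, W, C), adding C=1 if the channel axis is missing.

    Hypothetical helper for illustration; not part of OrganoID.
    """
    batch = np.asarray(batch)
    if batch.ndim == 3:
        # (N, H, W) grayscale batch: append a trailing channel axis.
        return batch[..., np.newaxis]
    if batch.ndim == 4:
        # Already (N, H, W, C): leave untouched.
        return batch
    raise ValueError(f"expected 3 or 4 dimensions, got {batch.ndim}")


# Example: a grayscale batch like the one implied by the traceback.
gray_batch = np.zeros((8, 512, 512), dtype=np.uint8)
print(ensure_channel_axis(gray_batch).shape)
```

A batch shaped this way would match an input layer declared with one channel, as in the line 17 change suggested above; it would still not match the original three-channel input.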

@DmitriyALoza
Author

I tried your approach and training produced a model, but it is smaller than the "OptimizedModel". It creates a folder containing a saved_model.pb that is 711 KB. It also does not segment the images anymore. I would love to know whether I am missing anything when training the model. I also get an error when tracking:

File "________________\Tracking.py", line 64, in Track
m2 = np.zeros(np.max(mapping[:,0])+1)
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

I'm not sure if it's:

  1. An issue with the way I trained the model (following your earlier advice to change line 17 of Model.py)
  2. Not using the right trained model (the sizes of the models are different, and I'm using the saved_model.pb, which is also located in the "TrainableModel" folder)
  3. Some other issue

Any help would be great!
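
The thread does not establish why `mapping` is 1-dimensional at that line. One possibility, given that the retrained model no longer segments anything, is that the mapping is empty, since an empty NumPy array built from an empty list is 1-D. The sketch below only reproduces that kind of IndexError and shows a generic guard; the helper name `as_pair_table` and the two-column layout are assumptions for illustration, not facts about Tracking.py.

```python
import numpy as np


def as_pair_table(mapping):
    """Coerce a mapping into shape (n, 2) so that mapping[:, 0] is valid.

    Hypothetical helper; assumes the mapping is meant to hold two columns.
    """
    return np.asarray(mapping).reshape(-1, 2)


# An empty mapping is 1-D with shape (0,), so two-index slicing fails.
empty = np.array([])
try:
    empty[:, 0]
except IndexError as err:
    print("IndexError:", err)

# After coercion the array is 2-D even when it has no rows.
table = as_pair_table(empty)
print(table.shape, table.ndim)
```

Note that `np.max` on an empty column would still fail, so code following this pattern needs a separate check for the no-detections case.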

@schmoogol

It sounds like you aren't using a proper model file. Make sure you include '--lite' in the train command to generate the correct .tflite file for running the identification. Once generated, use that file instead of 'OptimizedModel' in the run command. Make sure you include the .tflite extension in the command (e.g. newmodel.tflite), or just remove that extension from the file if you want.

@Djul0

Djul0 commented Nov 8, 2023

Changing line 17 seems to solve the training error, but I still have the same issue as @WinterMedved. I've tried @schmoogol's solution with the addition of "--lite". The training is very fast (around 20 epochs), and at the end these messages are shown:

2023-11-08 11:20:49.590241: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:378] Ignored output_format.
2023-11-08 11:20:49.590277: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:381] Ignored drop_control_dependency.
2023-11-08 11:20:49.590677: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /var/folders/1s/nwdhlk99679c3wlc3tb25d880000gn/T/tmp12457uj7
2023-11-08 11:20:49.594018: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2023-11-08 11:20:49.594025: I tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: /var/folders/1s/nwdhlk99679c3wlc3tb25d880000gn/T/tmp12457uj7
2023-11-08 11:20:49.600012: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
2023-11-08 11:20:49.602312: I tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2023-11-08 11:20:49.706858: I tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: /var/folders/1s/nwdhlk99679c3wlc3tb25d880000gn/T/tmp12457uj7
2023-11-08 11:20:49.735001: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 144325 microseconds.
2023-11-08 11:20:49.773980: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.

When I use the newmodel.tflite, it doesn't seem to change anything, and it does not segment the images anymore. I'm pretty sure it's a training problem. Here is the command I used to start the training:

python OrganoID.py train /path/to/trainimg/images /path/to/new/model MODELNAME -M TrainableModel --lite

Am I missing something?

@schmoogol

Have you tried carrying out the training with the dataset linked in the readme to rule out any problems with your dataset? Presumably you have included the correct paths to your training images and the trainable model and have not used the command above verbatim?

@Djul0

Djul0 commented Nov 9, 2023

Don't worry, I did not use the command above verbatim :)

Your suggestion was correct. I tried the linked dataset and it works, so I knew the problem was in my image format. Indeed, my images were in ".tif" format; I converted them to ".png" and now it works!
Thank you so much for your help @schmoogol!!!
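
The thread does not say which tool was used for the conversion. As one possible approach, here is a minimal sketch using Pillow that writes a PNG copy of every .tif/.tiff file in a folder. The function name `convert_tifs_to_png` and the folder arguments are hypothetical, and the sketch assumes single-frame TIFFs in a pixel mode that PNG supports; multi-page or unusual bit-depth files may need extra handling.

```python
from pathlib import Path

from PIL import Image


def convert_tifs_to_png(src_dir, dst_dir):
    """Write a PNG copy of each .tif/.tiff in src_dir into dst_dir.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.iterdir()):
        if src.suffix.lower() not in {".tif", ".tiff"}:
            continue  # skip anything that is not a TIFF file
        dst = dst_dir / (src.stem + ".png")
        with Image.open(src) as img:
            # Only the first frame of a multi-frame TIFF is saved here.
            img.save(dst, format="PNG")
        written.append(dst)
    return written
```

Calling `convert_tifs_to_png("my_tifs", "my_pngs")` (both folder names are placeholders) leaves the originals untouched, so the converted folder can be passed to the train command while the TIFF files are kept.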
