This repository has been archived by the owner on Jun 9, 2021. It is now read-only.

model.evaluate and model.predict conflict #252

Open
parkesb opened this issue May 4, 2021 · 6 comments

parkesb commented May 4, 2021

I ran into a strange issue while running an example from Laurence Moroney's "AI and Machine Learning for Coders...". When I run the following code on an M1 MacBook Air

import tensorflow as tf

mnist = tf.keras.datasets.fashion_mnist

(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

training_images = training_images / 255.0
test_images = test_images / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=5)

model.evaluate(test_images, test_labels)

classifications = model.predict(test_images)
print(classifications[0])
print(test_labels[0])

I get the following output:

2021-05-04 16:55:00.006592: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-04 16:55:00.006877: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
Epoch 1/5
1875/1875 [==============================] - 1s 371us/step - loss: 0.6283 - accuracy: 0.7843
Epoch 2/5
1875/1875 [==============================] - 1s 362us/step - loss: 0.3812 - accuracy: 0.8641
Epoch 3/5
1875/1875 [==============================] - 1s 360us/step - loss: 0.3384 - accuracy: 0.8760
Epoch 4/5
1875/1875 [==============================] - 1s 358us/step - loss: 0.3089 - accuracy: 0.8882
Epoch 5/5
1875/1875 [==============================] - 1s 351us/step - loss: 0.2940 - accuracy: 0.8925
313/313 [==============================] - 0s 250us/step - loss: 0.3407 - accuracy: 0.8773
2021-05-04 16:55:03.721625: I tensorflow/compiler/tf2mlcompute/kernels/mlc_subgraph_op.cc:326] Compute: Failed in processing TensorFlow graph sequential/MLCSubgraphOp_2_0 with frame_id = 0 and iter_id = 0 with error: Internal: ExecuteMLCInferenceGraph: Failed to execute MLC inference graph. (error will be reported 5 times unless TF_MLC_LOGGING=1).
2021-05-04 16:55:03.722229: F tensorflow/core/framework/op_kernel.cc:983] Check failed: outputs_[index].tensor == nullptr (0x155827f80 vs. nullptr)

whereas on a 2017 Intel MBP, I get:

2021-05-04 16:54:06.839207: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-04 16:54:06.839455: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-04 16:54:07.030005: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.6239 - accuracy: 0.7835
Epoch 2/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.3826 - accuracy: 0.8624
Epoch 3/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.3382 - accuracy: 0.8761
Epoch 4/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.3124 - accuracy: 0.8851
Epoch 5/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2951 - accuracy: 0.8907
[7.2896250e-06 1.5256417e-09 3.3264627e-07 3.5464927e-09 1.3898362e-07
 1.7050464e-02 1.0498255e-06 8.5028261e-03 7.9395040e-06 9.7443002e-01]
9

Also, if I remove either the model.predict call or the model.evaluate call, the code produces correct output with no errors.

I'm using regular Python virtual environments on the MBP and Miniforge on the MacBook Air.

TensorFlow package differences (< is the Intel MBP, > is the M1 MacBook Air) are as follows:

44,46c67,68
< tensorboard==2.5.0
< tensorboard-data-server==0.6.0
---
> tensorboard==2.4.1
48,50c70
< tensorflow==2.4.1
< tensorflow-addons==0.12.1
< tensorflow-datasets==4.2.0
---
> tensorflow-addons-macos==0.1a3
52c72
< tensorflow-metadata==0.30.0
---
> tensorflow-macos==0.1a3

lsw9803 commented May 6, 2021

Hello, I have run into the same issue. Have you found a solution?

parkesb commented May 9, 2021

No, but I haven't looked into it yet. It also fails if you switch the order (i.e. predict and then evaluate).

ongtw commented May 11, 2021

Same problem on my M1 Mac:

(m1) $ python tf_m1_eval_predict_test.py 
loading data...
creating model...
model.fit()...
2021-05-11 16:23:49.023920: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-11 16:23:49.025618: W tensorflow/core/platform/profile_utils/cpu_utils.cc:126] Failed to get CPU frequency: 0 Hz
Epoch 1/5
1875/1875 [==============================] - 1s 331us/step - loss: 0.6303 - accuracy: 0.7810
Epoch 2/5
1875/1875 [==============================] - 1s 326us/step - loss: 0.3850 - accuracy: 0.8613
Epoch 3/5
1875/1875 [==============================] - 1s 324us/step - loss: 0.3423 - accuracy: 0.8760
Epoch 4/5
1875/1875 [==============================] - 1s 323us/step - loss: 0.3148 - accuracy: 0.8864
Epoch 5/5
1875/1875 [==============================] - 1s 321us/step - loss: 0.2973 - accuracy: 0.8904
model.evaluate()...
313/313 [==============================] - 0s 234us/step - loss: 0.3342 - accuracy: 0.8781
model.predict()...
2021-05-11 16:23:52.478149: I tensorflow/compiler/tf2mlcompute/kernels/mlc_subgraph_op.cc:326] Compute: Failed in processing TensorFlow graph sequential/MLCSubgraphOp_2_0 with frame_id = 0 and iter_id = 0 with error: Internal: ExecuteMLCInferenceGraph: Failed to execute MLC inference graph. (error will be reported 5 times unless TF_MLC_LOGGING=1).
2021-05-11 16:23:52.480338: F tensorflow/core/framework/op_kernel.cc:983] Check failed: outputs_[index].tensor == nullptr (0x14b617f70 vs. nullptr)
Abort trap: 6

If I run only one of model.evaluate() or model.predict(), it works fine:

1875/1875 [==============================] - 1s 325us/step - loss: 0.2924 - accuracy: 0.8936
model.predict()...
classifications: [2.3991666e-05 1.9442258e-07 2.4191124e-06 3.0609449e-06 2.1939153e-05
 2.4558472e-02 5.9402762e-05 7.7641018e-02 2.1327533e-04 8.9747626e-01]
1875/1875 [==============================] - 1s 330us/step - loss: 0.2968 - accuracy: 0.8891
model.evaluate()...
313/313 [==============================] - 0s 235us/step - loss: 0.3537 - accuracy: 0.8694
test_labels: 9

It looks like there is a bug when both functions are called in sequence.
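
Since each call works on its own, one possible stopgap is to call only model.predict() and derive the loss and accuracy that model.evaluate() would report from the predicted probabilities. The sketch below is a minimal plain-Python illustration that has not been tried on an M1 machine. The helper name metrics_from_predictions and the epsilon clipping value are hypothetical choices for this example, not part of Keras, and the results may differ slightly from what Keras prints.

```python
import math


def metrics_from_predictions(probabilities, labels, epsilon=1e-7):
    """Return (loss, accuracy) computed from per-class probabilities.

    probabilities: one row of class probabilities per sample
    labels: the integer class label of each sample
    epsilon: lower clip for probabilities so log() never sees zero
             (an assumed value for this sketch)
    """
    correct = 0
    total_loss = 0.0
    for row, label in zip(probabilities, labels):
        row = list(row)
        label = int(label)
        # Accuracy: the predicted class is the index of the largest probability.
        if row.index(max(row)) == label:
            correct += 1
        # Sparse categorical cross-entropy: -log(probability of the true class).
        true_class_prob = min(max(float(row[label]), epsilon), 1.0)
        total_loss -= math.log(true_class_prob)
    count = len(labels)
    return total_loss / count, correct / count
```

With the script from the report, this would be used as loss, accuracy = metrics_from_predictions(model.predict(test_images), test_labels), with the model.evaluate() line removed.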

alessio-ca commented May 13, 2021

I'm experiencing the same problem (MacBook Pro 13-inch, 2020, quad-core Intel Core i5).
The problem occurs on both the CPU and the GPU (the latter selected with mlcompute.set_mlc_device(device_name="gpu")).
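
For anyone who wants to repeat this per-device check, here is a hedged sketch. The TF_MLC_LOGGING name comes from the crash log earlier in this thread; setting it before TensorFlow is imported is an assumption about when it is read. The import path for mlcompute is written from memory of the tensorflow_macos fork and should also be treated as an assumption. The helper select_mlc_device is a hypothetical name, and it simply reports False when the module is not available.

```python
import os

# The crash log says the error is reported only 5 times unless TF_MLC_LOGGING=1.
# Assumption: the variable needs to be set before TensorFlow is imported.
os.environ["TF_MLC_LOGGING"] = "1"


def select_mlc_device(device_name):
    """Try to pin ML Compute to a device such as "cpu" or "gpu".

    Returns True if the mlcompute module was found and the device was set,
    False if the module is missing (e.g. stock TensorFlow, or no TensorFlow).
    """
    try:
        # Assumed import path from the tensorflow_macos fork.
        from tensorflow.python.compiler.mlcompute import mlcompute
    except ImportError:
        return False
    mlcompute.set_mlc_device(device_name=device_name)
    return True
```

Calling select_mlc_device("cpu") or select_mlc_device("gpu") at the top of the script, before the model is built, would then run the same reproduction on each device.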

devnev39 commented

#266 (comment)
Check this issue and its solution; it might help.

edavidk7 commented May 24, 2021

The culprit seems to be the activation function specified for the output layer. Once this parameter is removed, the code works fine.
Edit: a linear activation on the output layer works fine; sigmoid doesn't.
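
A sketch of what that change could look like, under stated assumptions: the output layer is left linear so the model emits raw scores (logits), the loss is told to expect logits via from_logits=True, and the softmax is applied to the predictions afterwards in plain Python. The names build_model and softmax are hypothetical helpers for this example, and this has not been verified on an M1 machine.

```python
import math


def softmax(logits):
    """Convert one row of raw scores (logits) into probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(float(x) - float(peak)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_model():
    """Same network as in the report, but with a linear output layer."""
    import tensorflow as tf  # imported lazily; calling this requires TensorFlow

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10),  # no activation: the outputs are logits
    ])
    model.compile(
        optimizer='adam',
        # from_logits=True moves the softmax into the loss computation.
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy'],
    )
    return model
```

After training, softmax(model.predict(test_images)[0]) would give probabilities comparable to the classifications[0] printed in the original script.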
