
How to tune the number of epochs and batch_size? #122

Closed
ogreyesp opened this issue Oct 21, 2019 · 23 comments
Labels: documentation (Improvements or additions to documentation)


ogreyesp commented Oct 21, 2019

Hi,

How can I tune the number of epochs and the batch size?

The provided examples always assume fixed values for these two hyperparameters.


omalleyt12 commented Oct 21, 2019

@ogreyesp Thanks for the issue!

This comment was updated by @haifeng-jin because it was out of date.
The following is the latest recommended way of doing it:

This is barebones code for tuning the batch size.
The *args and **kwargs are the ones you pass from tuner.search().

import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

class MyHyperModel(kt.HyperModel):
    def build(self, hp):
        model = keras.Sequential()
        model.add(layers.Flatten())
        model.add(
            layers.Dense(
                units=hp.Int("units", min_value=32, max_value=512, step=32),
                activation="relu",
            )
        )
        model.add(layers.Dense(10, activation="softmax"))
        model.compile(
            optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"],
        )
        return model

    def fit(self, hp, model, *args, **kwargs):
        return model.fit(
            *args,
            batch_size=hp.Choice("batch_size", [16, 32]),
            **kwargs,
        )

tuner = kt.RandomSearch(
    MyHyperModel(),
    objective="val_accuracy",
    max_trials=3,
    overwrite=True,
    directory="my_dir",
    project_name="tune_hypermodel",
)

For epochs specifically, I'd alternatively recommend looking at using early stopping during training via passing in the tf.keras.callbacks.EarlyStopping callback if it's applicable to your use case. This can be configured to stop your training as soon as the validation loss stops improving. You can pass Keras callbacks like this to search:

# Will stop training if the "val_loss" hasn't improved in 3 epochs.
tuner.search(x, y, epochs=30, callbacks=[tf.keras.callbacks.EarlyStopping('val_loss', patience=3)])

For n-fold cross validation, you can also just do it in HyperModel.fit() and return the result as a dictionary like {"val_accuracy": 0.3}, where the key is the name of the objective.
Please follow this guide for more details.
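In case a concrete example helps, here is a minimal, hedged sketch of that idea, assuming x and y are NumPy arrays passed through tuner.search(x, y); the 3-fold split and the KFold import are illustrative and not part of the official guide:

import numpy as np
from sklearn.model_selection import KFold

class MyCVHyperModel(MyHyperModel):
    def fit(self, hp, model, x, y, **kwargs):
        # Cross-validate manually inside fit() and report the mean objective.
        val_accuracies = []
        for train_idx, val_idx in KFold(n_splits=3, shuffle=True).split(x):
            fold_model = self.build(hp)  # fresh weights for every fold
            fold_model.fit(
                x[train_idx], y[train_idx],
                batch_size=hp.Choice("batch_size", [16, 32]),
                **kwargs,
            )
            _, acc = fold_model.evaluate(x[val_idx], y[val_idx], verbose=0)
            val_accuracies.append(acc)
        # The key must match the tuner's objective name ("val_accuracy").
        return {"val_accuracy": float(np.mean(val_accuracies))}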

omalleyt12 self-assigned this Oct 21, 2019
omalleyt12 added the documentation label Oct 21, 2019
ogreyesp (Author) commented:

Thanks @omalleyt12.

Your response is very helpful.


ogreyesp commented Oct 22, 2019

This project is very important and useful for me. However, the lack of documentation and tutorials is hampering its use.

For example, how can I determine the best subset of hyperparameters by conducting a cross validation?


omalleyt12 commented Oct 22, 2019

This comment was updated by @haifeng-jin because it was out of date.
Please use the code snippets above instead.

omalleyt12 (Contributor) commented:

Please see the pending PR with a tutorial: #136


pickfire commented Feb 9, 2020

Is it possible to do tuning without creating a class?
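Not the maintainers' answer, but for simple cases a plain model-building function can be passed to a tuner instead of a HyperModel subclass; a minimal sketch (the layer sizes and search settings are illustrative):

import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Flatten())
    model.add(layers.Dense(hp.Int("units", 32, 512, step=32), activation="relu"))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=3)

Tuning batch_size or epochs this way still needs either a HyperModel with a custom fit() (as above) or a custom run_trial(), since a bare build function only sees model-building hyperparameters.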


VincBar commented May 2, 2020

Thanks for the explanation on batch size. However, when I retrieve the parameters of the best model with tuner.get_best_hyperparameters()[0] and look at the values via .get_config()["values"], the batch_size is not listed there.
How can I retrieve the "batch_size" hyperparameter when doing the search in the way described here?
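For anyone with the same question: with the HyperModel.fit() approach above, batch_size is registered like any other hyperparameter, so after the search it should be readable from the best HyperParameters object (a hedged sketch using the public KerasTuner accessors):

best_hp = tuner.get_best_hyperparameters()[0]
print(best_hp.values)              # dict of all recorded values, should include "batch_size"
print(best_hp.get("batch_size"))   # or fetch it directly

If batch_size still isn't there, it usually means no completed trial ever registered that hyperparameter (for example, it was only created inside code that never ran).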

tolandwehr commented:

@omalleyt12 @VincBar Was this issue resolved? I'm using KerasTuner for epochs and batch_size right now, too. Not very keen on having invisible results after 10 hours of running.


VincBar commented Aug 25, 2020

@tolandwehr hey, I don't know if the direct way is solved, but I worked around it by including the batch_size hyperparameter in the hypermodel and saving it to self.batch_size (or, in my case, a dictionary with some other stuff), and defining a fit function in my hypermodel that then takes this (and whatever else the fit might need).
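A rough sketch of that workaround, with illustrative names (BatchSizeHyperModel and the tiny Dense model stand in for the real code):

import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

class BatchSizeHyperModel(kt.HyperModel):
    def build(self, hp):
        # Register batch_size as a hyperparameter and stash it for fit().
        self.batch_size = hp.Choice("batch_size", [16, 32, 64])
        model = keras.Sequential([layers.Dense(10, activation="softmax")])
        model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
        return model

    def fit(self, hp, model, *args, **kwargs):
        # Use the value recorded during build() when actually fitting.
        return model.fit(*args, batch_size=self.batch_size, **kwargs)

Because batch_size goes through hp.Choice, it also shows up in the trial summaries and in get_best_hyperparameters().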

tolandwehr commented:

@VincBar Sounds interesting. Could you share the code, if it's still available? ^^'

tolandwehr commented:

@omalleyt12

Other issue: I got a NaN/Inf error after some hours of iterations... which is strange, because I double-checked the dataset with

.isnull().sum().sum()

and there were no NaNs

ValueError                                Traceback (most recent call last)
<ipython-input-666-7713a18234fe> in <module>
----> 1 tuner.search(X_train, y_train, epochs=40, validation_split=0.1, callbacks=[tf.keras.callbacks.EarlyStopping('val_loss', patience=3)])

~\Anaconda3\envs\Tensorflow\lib\site-packages\kerastuner\engine\base_tuner.py in search(self, *fit_args, **fit_kwargs)
    118         self.on_search_begin()
    119         while True:
--> 120             trial = self.oracle.create_trial(self.tuner_id)
    121             if trial.status == trial_module.TrialStatus.STOPPED:
    122                 # Oracle triggered exit.

~\Anaconda3\envs\Tensorflow\lib\site-packages\kerastuner\engine\oracle.py in create_trial(self, tuner_id)
    147             values = None
    148         else:
--> 149             response = self._populate_space(trial_id)
    150             status = response['status']
    151             values = response['values'] if 'values' in response else None

~\Anaconda3\envs\Tensorflow\lib\site-packages\kerastuner\tuners\bayesian.py in _populate_space(self, trial_id)
    101         x, y = self._vectorize_trials()
    102         try:
--> 103             self.gpr.fit(x, y)
    104         except exceptions.ConvergenceWarning:
    105             # If convergence of the GPR fails, create a random trial.

~\Anaconda3\envs\Tensorflow\lib\site-packages\sklearn\gaussian_process\_gpr.py in fit(self, X, y)
    232             optima = [(self._constrained_optimization(obj_func,
    233                                                       self.kernel_.theta,
--> 234                                                       self.kernel_.bounds))]
    235 
    236             # Additional runs are performed from log-uniform chosen initial

~\Anaconda3\envs\Tensorflow\lib\site-packages\sklearn\gaussian_process\_gpr.py in _constrained_optimization(self, obj_func, initial_theta, bounds)
    501             opt_res = scipy.optimize.minimize(
    502                 obj_func, initial_theta, method="L-BFGS-B", jac=True,
--> 503                 bounds=bounds)
    504             _check_optimize_result("lbfgs", opt_res)
    505             theta_opt, func_min = opt_res.x, opt_res.fun

~\AppData\Roaming\Python\Python36\site-packages\scipy\optimize\_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    608     elif meth == 'l-bfgs-b':
    609         return _minimize_lbfgsb(fun, x0, args, jac, bounds,
--> 610                                 callback=callback, **options)
    611     elif meth == 'tnc':
    612         return _minimize_tnc(fun, x0, args, jac, bounds, callback=callback,

~\AppData\Roaming\Python\Python36\site-packages\scipy\optimize\lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, **unknown_options)
    343             # until the completion of the current minimization iteration.
    344             # Overwrite f and g:
--> 345             f, g = func_and_grad(x)
    346         elif task_str.startswith(b'NEW_X'):
    347             # new iteration

~\AppData\Roaming\Python\Python36\site-packages\scipy\optimize\lbfgsb.py in func_and_grad(x)
    293     else:
    294         def func_and_grad(x):
--> 295             f = fun(x, *args)
    296             g = jac(x, *args)
    297             return f, g

~\AppData\Roaming\Python\Python36\site-packages\scipy\optimize\optimize.py in function_wrapper(*wrapper_args)
    325     def function_wrapper(*wrapper_args):
    326         ncalls[0] += 1
--> 327         return function(*(wrapper_args + args))
    328 
    329     return ncalls, function_wrapper

~\AppData\Roaming\Python\Python36\site-packages\scipy\optimize\optimize.py in __call__(self, x, *args)
     63     def __call__(self, x, *args):
     64         self.x = numpy.asarray(x).copy()
---> 65         fg = self.fun(x, *args)
     66         self.jac = fg[1]
     67         return fg[0]

~\Anaconda3\envs\Tensorflow\lib\site-packages\sklearn\gaussian_process\_gpr.py in obj_func(theta, eval_gradient)
    223                 if eval_gradient:
    224                     lml, grad = self.log_marginal_likelihood(
--> 225                         theta, eval_gradient=True, clone_kernel=False)
    226                     return -lml, -grad
    227                 else:

~\Anaconda3\envs\Tensorflow\lib\site-packages\sklearn\gaussian_process\_gpr.py in log_marginal_likelihood(self, theta, eval_gradient, clone_kernel)
    474             y_train = y_train[:, np.newaxis]
    475 
--> 476         alpha = cho_solve((L, True), y_train)  # Line 3
    477 
    478         # Compute log-likelihood (compare line 7)

~\AppData\Roaming\Python\Python36\site-packages\scipy\linalg\decomp_cholesky.py in cho_solve(c_and_lower, b, overwrite_b, check_finite)
    194     (c, lower) = c_and_lower
    195     if check_finite:
--> 196         b1 = asarray_chkfinite(b)
    197         c = asarray_chkfinite(c)
    198     else:

~\Anaconda3\envs\Tensorflow\lib\site-packages\numpy\lib\function_base.py in asarray_chkfinite(a, dtype, order)
    497     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
    498         raise ValueError(
--> 499             "array must not contain infs or NaNs")
    500     return a
    501 

ValueError: array must not contain infs or NaNs

saranyaprakash2012 commented:

I would like to use Bayesian optimization tuner to tune epochs and batch size for a BLSTM model. My data is passed in using a custom data generator, which takes batch size as input. How do I use the Keras tuner in this case?
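Not an official answer, but one way to combine the HyperModel.fit() override shown earlier with a custom data pipeline is to build the batched dataset inside fit() using the tuned batch size; a hedged sketch (tf.data is used here purely for illustration, and the same idea applies to a generator that takes batch_size as an argument):

import tensorflow as tf
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

class BLSTMHyperModel(kt.HyperModel):
    def build(self, hp):
        model = keras.Sequential([
            layers.Bidirectional(layers.LSTM(hp.Int("lstm_units", 32, 128, step=32))),
            layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        return model

    def fit(self, hp, model, x, y, **kwargs):
        # The dataset is created here, so the tuned batch size is the one that
        # actually reaches training; x is assumed to be (samples, timesteps, features).
        batch_size = hp.Choice("batch_size", [16, 32, 64])
        train_ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size)
        kwargs.pop("batch_size", None)  # ignored for datasets anyway
        kwargs.pop("epochs", None)      # avoid clashing with an epochs passed to search()
        return model.fit(train_ds, epochs=hp.Int("epochs", 10, 30), **kwargs)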

LIKHITA12 commented:

@ogreyesp Thanks for the issue!

This can be done by subclassing the Tuner class you are using and overriding run_trial. (Note that Hyperband sets the epochs to train for via its own logic, so if you're using Hyperband you shouldn't tune the epochs). Here's an example with kt.tuners.BayesianOptimization:

class MyTuner(kerastuner.tuners.BayesianOptimization):
  def run_trial(self, trial, *args, **kwargs):
    # You can add additional HyperParameters for preprocessing and custom training loops
    # via overriding `run_trial`
    kwargs['batch_size'] = trial.hyperparameters.Int('batch_size', 32, 256, step=32)
    kwargs['epochs'] = trial.hyperparameters.Int('epochs', 10, 30)
    super(MyTuner, self).run_trial(trial, *args, **kwargs)

# Uses same arguments as the BayesianOptimization Tuner.
tuner = MyTuner(...)
# Don't pass epochs or batch_size here, let the Tuner tune them.
tuner.search(...)

For epochs specifically, I'd alternatively recommend looking at using early stopping during training via passing in the tf.keras.callbacks.EarlyStopping callback if it's applicable to your use case. This can be configured to stop your training as soon as the validation loss stops improving. You can pass Keras callbacks like this to search:

# Will stop training if the "val_loss" hasn't improved in 3 epochs.
tuner.search(x, y, epochs=30, callbacks=[tf.keras.callbacks.EarlyStopping('val_loss', patience=3)])

Hello @ogreyesp,
I have implemented this with the Hyperband Keras tuner. I have a doubt: why is the batch_size not included in the first trial, but only from the second trial onwards? Why is that? Is there any way to include batch_size in the first trial itself? Please let me know.


JoepC commented Apr 22, 2021

I used the following code to optimise the number of epochs and batch size:

class MyTuner(kerastuner.tuners.BayesianOptimization):
    def run_trial(self, trial, *args, **kwargs):
        # You can add additional HyperParameters for preprocessing and custom training loops
        # via overriding run_trial
        kwargs['batch_size'] = trial.hyperparameters.Int('batch_size', 32, 256, step=32)
        kwargs['epochs'] = trial.hyperparameters.Int('epochs', 10, 30)
        super(MyTuner, self).run_trial(trial, *args, **kwargs)

Now I want to save the number of epochs and batch size for the best trial that the tuner found.

I tried using the following code suggested by @fredshu, but I could not get it working:

values['batch_size'] = best_trial.batch_size

How is 'best_trial' defined? I use best_model = tuner.get_best_models()[0] to get the best model for making predictions afterwards; if I replace best_trial with best_model it does not work.
I used with redirect_stdout(f): tuner.results_summary() to save the full summary to a text file, but now I only want the number of epochs and batch size of the best trial.

So how do I save the number of epochs and batch size of the best trial to separate variables? If possible, I would also like to save the other optimised hyperparameters.
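Not sure how @fredshu's snippet was meant, but since the run_trial override above registers batch_size and epochs through trial.hyperparameters, one way that should work is reading them back from the best HyperParameters object (a sketch, not tied to a specific KerasTuner version):

best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]

best_batch_size = best_hp.get("batch_size")
best_epochs = best_hp.get("epochs")
all_best_values = best_hp.values  # dict of every tuned hyperparameter for the best trial

print(best_batch_size, best_epochs, all_best_values)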

sukrit2018 commented:

I am new to Keras and TensorFlow. I want to simultaneously explore the number of epochs and the CV for my project. Can you please help me write a custom Tuner?

21kc-caracol commented:

@saranyaprakash2012
Did you manage to use a Keras training generator with a Keras Tuner that tunes the batch_size?

Can anyone give a code snippet that does that?

The example @omalleyt12 gave above didn't change the actual batch size that the training generator (ImageDataGenerator) used.

I mean that the Keras Tuner log printed as if the batch size was taken into consideration, but the actual log also showed that the training generator ignored the Keras Tuner batch_size and just took a predefined value...

Examples:
The actual batch size was 128 on a debug dataset of ~150 samples, so we had 2 batches:

2/2 [==============================] - ETA: 0s - loss: 1.6348 - accuracy: 0.4901
2/2 [==============================] - 8s 4s/step - loss: 1.6348 - accuracy: 0.4901 - val_loss: 12738311.0000 - val_accuracy: 0.5000

but in the hyperparameters of the tuner it showed:

Hyperparameter |Value |Best Value So Far
learning_rate  |0.5   |0.5
decay          |0.01  |0.01
momentum       |0     |0
batch_size     |2     |1

(I only had 1 and 2 as the batch size options inside the tuner.)
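For anyone hitting the same thing: when training data comes from a generator, Keras ignores the batch_size argument to model.fit(), so the tuned value has to be applied where the generator is created, for example inside HyperModel.fit(). A hedged sketch with ImageDataGenerator (the path, image size, and layer sizes are placeholders):

import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

class ImageHyperModel(kt.HyperModel):
    def build(self, hp):
        model = keras.Sequential([
            layers.Flatten(input_shape=(128, 128, 3)),
            layers.Dense(hp.Int("units", 32, 128, step=32), activation="relu"),
            layers.Dense(2, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
        return model

    def fit(self, hp, model, *args, **kwargs):
        # batch_size must be given to the generator, not to model.fit(),
        # otherwise the generator keeps its own predefined value.
        batch_size = hp.Choice("batch_size", [1, 2])
        train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
            "data/train",            # placeholder path
            target_size=(128, 128),  # placeholder size
            batch_size=batch_size,
        )
        # Data comes from the generator, so nothing is passed via tuner.search().
        return model.fit(train_gen, **kwargs)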

saranyaprakash2012 commented:

Try something like this:

# Create the model-building function

def create_hypermodel(hp):
    learning_rate = 0.0001
    K.clear_session()
    inputs_pose_gaze = Input(POSE_GAZE_INPUT_SHAPE)
    blstm1_pose_gaze = Bidirectional(LSTM(200, return_sequences=True, recurrent_dropout=0, activation='tanh'))(inputs_pose_gaze)
    max_pooled_poze = GlobalMaxPooling1D()(blstm1_pose_gaze)
    output = Dense(1, activation='sigmoid')(max_pooled_poze)
    model = Model(inputs=[inputs_pose_gaze], outputs=output)
    model.compile(optimizer=optimizers.Adam(learning_rate),
                  loss=losses.BinaryCrossentropy())
    print(model.summary())

    return model

#Tuner class

class MyTuner2(BayesianOptimization):
    def run_trial(self, trial, *args, **kwargs):
        # You can add additional HyperParameters for preprocessing and custom training loops
        # via overriding run_trial
        hp = trial.hyperparameters

        kwargs['batch_size'] = hp.Int('batch_size', 4, 16, step=4)
        # kwargs['val_batch_size'] = hp.Int('val_batch_size', 1, 4, step=1)
        kwargs['epochs'] = hp.Int('epochs', 10, 25, step=5)

        train_data_gen = video_batch_generator(train_upsample_files, hp.Int('batch_size', 4, 16, step=4))
        print(f"batch_size: {hp.Int('batch_size', 4, 16, step=4)}")
        val_data_gen = video_batch_generator(val_files, hp.Int('val_batch_size', 1, 4, step=1))

        steps_per_epoch = math.floor(len(train_upsample_files) / hp.Int('batch_size', 4, 16, step=4))
        val_steps_per_epoch = math.floor(len(val_files) / VAL_BATCH_SIZE)
        early_stopping = tf.keras.callbacks.EarlyStopping(
            monitor='auc',
            verbose=1,
            patience=2,
            mode='max',
            restore_best_weights=True)

        model = self.hypermodel.build(hp)
        model.fit(train_data_gen, steps_per_epoch=steps_per_epoch, epochs=hp.Int('epochs', 5, 20, step=5), callbacks=[early_stopping])
        val_metrics = model.evaluate(val_data_gen, steps=val_steps_per_epoch, return_dict=True)
        print(f"Evaluation val_metrics: {val_metrics}")
        self.oracle.update_trial(trial.trial_id, {'val_auc': val_metrics['auc']})
        self.save_model(trial.trial_id, model)



# Uses same arguments as the BayesianOptimization Tuner.
tuner = MyTuner2(create_hypermodel,
    objective=Objective("val_auc", direction="max"),
    max_trials=6,
    executions_per_trial=1,
    directory=os.path.normpath('keras_tuning_blstm_video'),
    project_name='kerastuner_bayesian_lstm_video',overwrite=True)



# Don't pass epochs or batch_size here, let the Tuner tune them.

tuner.search_space_summary()
    
tuner.search()
model_best_model_epoch_batch_size = tuner.get_best_models(num_models=1)
# model_tuned = model_best_model_epoch_batch_size[0]
print(tuner.get_best_hyperparameters()[0].get_config()["values"])
# filepath_best_model ="video_best_batch_model"
# model_best_model_epoch_batch_size.save(filepath_best_model)

21kc-caracol commented:

Try something like this :
@saranyaprakash2012

Could you make it clearer which code goes into which block?
The indentation is a bit confusing.

Thanks!

haifeng-jin (Collaborator) commented:

This guide is out of date.
Please follow this guide instead.


vaxherra commented Feb 10, 2022

I had some problems with the version below. Namely, I couldn't make it run with a custom objective.

class MyTuner(kerastuner.tuners.BayesianOptimization):
  def run_trial(self, trial, *args, **kwargs):
    # You can add additional HyperParameters for preprocessing and custom training loops
    # via overriding `run_trial`
    kwargs['batch_size'] = trial.hyperparameters.Int('batch_size', 32, 256, step=32)
    kwargs['epochs'] = trial.hyperparameters.Int('epochs', 10, 30)
    super(MyTuner, self).run_trial(trial, *args, **kwargs)

I added the return statement and it fixed that (presumably because newer KerasTuner versions use run_trial's return value to report the objective):

class MyTuner(kerastuner.tuners.BayesianOptimization):
  def run_trial(self, trial, *args, **kwargs):
    # You can add additional HyperParameters for preprocessing and custom training loops
    # via overriding `run_trial`
    kwargs['batch_size'] = trial.hyperparameters.Int('batch_size', 32, 256, step=32)
    kwargs['epochs'] = trial.hyperparameters.Int('epochs', 10, 30)
    return super(MyTuner, self).run_trial(trial, *args, **kwargs)


davidwanner-8451 commented Jun 10, 2022

@ogreyesp Thanks for the issue!

This comment was updated by @haifeng-jin because it was out of date. The following is the latest recommended way of doing it:

This is barebones code for tuning the batch size. The *args and **kwargs are the ones you pass from tuner.search().

class MyHyperModel(kt.HyperModel):
    def build(self, hp):
        model = keras.Sequential()
        model.add(layers.Flatten())
        model.add(
            layers.Dense(
                units=hp.Int("units", min_value=32, max_value=512, step=32),
                activation="relu",
            )
        )
        model.add(layers.Dense(10, activation="softmax"))
        model.compile(
            optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"],
        )
        return model

    def fit(self, hp, model, *args, **kwargs):
        return model.fit(
            *args,
            batch_size=hp.Choice("batch_size", [16, 32]),
            **kwargs,
        )

tuner = kt.RandomSearch(
    MyHyperModel(),
    objective="val_accuracy",
    max_trials=3,
    overwrite=True,
    directory="my_dir",
    project_name="tune_hypermodel",
)

For epochs specifically, I'd alternatively recommend looking at using early stopping during training via passing in the tf.keras.callbacks.EarlyStopping callback if it's applicable to your use case. This can be configured to stop your training as soon as the validation loss stops improving. You can pass Keras callbacks like this to search:

# Will stop training if the "val_loss" hasn't improved in 3 epochs.
tuner.search(x, y, epochs=30, callbacks=[tf.keras.callbacks.EarlyStopping('val_loss', patience=3)])

For n-fold cross validation, you can also just do it in HyperModel.fit() and return the result as a dictionary like {"val_accuracy": 0.3}, where the key is the name of the objective. Please follow this guide for more details.

Curious: is this considered the proper approach for tuning batch_size? It looks like this comment was edited in Feb 2022, so my assumption is yes, but I have not seen this approach in the docs (I could be missing it).

haifeng-jin (Collaborator) commented:

Yes, this is the officially recommended approach. Thanks


muriloasouza commented Mar 14, 2023

I am also trying to tune the batch_size and could use some help here please:

class MyHyperModel(keras_tuner.HyperModel):
    def build(self, hp):
        model = Sequential(name='Conv1D_Model') 
        model.add(InputLayer((timesteps, input_dim), name='input_layer'))
        for j in range(hp.Int("num_conv_layers", 1, 2)):
            model.add(Conv1D(filters=hp.Int(f'filters_{j}', min_value=32, max_value=256, step=32),
                             kernel_size=hp.Int('kernel_size', min_value=2, max_value=6, step=2),
                             activation='tanh',
                             name=f'{j}_conv_layer'))
            model.add(MaxPooling1D(pool_size=1))
        model.add(Flatten())
        if hp.Boolean("dropout"):
            model.add(Dropout(rate=0.25))

        for k in range(hp.Int("num_layers", 1, 3)):
            model.add(Dense(units=hp.Int(f'units_{k}', min_value=24, max_value=72, step=24),
                            activation='tanh',
                            name=f'{k}_dense'))
        model.add(Dense(units=1,
                        activation='tanh',
                        name='output_layer'))
        model.compile(optimizer='adam',
                      loss='mean_squared_error')
        return model

    def fit(self, hp, model, *args, batch_size=32,
            **kwargs):
        return model.fit(
            *args,
            batch_size=hp.Choice("batch_size", [16, 32, 64]),
            **kwargs,
        )

But in the search space i got this:

Search space summary
Default search space size: 6
num_conv_layers (Int)
{'default': None, 'conditions': [], 'min_value': 1, 'max_value': 2, 'step': 1, 'sampling': None}
filters_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 256, 'step': 32, 'sampling': None}
kernel_size (Int)
{'default': None, 'conditions': [], 'min_value': 2, 'max_value': 6, 'step': 2, 'sampling': None}
dropout (Boolean)
{'default': False, 'conditions': []}
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 1, 'max_value': 3, 'step': 1, 'sampling': None}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 24, 'max_value': 72, 'step': 24, 'sampling': None}
None

Here is the first trial:

Search: Running Trial #1
Value             |Best Value So Far |Hyperparameter
1                 |?                 |num_conv_layers
160               |?                 |filters_0
4                 |?                 |kernel_size
False             |?                 |dropout
1                 |?                 |num_layers
72                |?                 |units_0

Shouldn't the batch size appear both in the search space and the trial report? How do I know which batch size is being used?
