Describe the bug
When finetuning chronos-t5-small on the ETTh1 and ETTh2 datasets respectively, the performance drops compared to the zeroshot performance. Could that be because the prediction_length is recommended to be <= 64?
Expected behavior
If chronos-t5-small is finetuned on, say, the ETTh1 dataset only, the finetuned model should yield better MAE and MSE than the zeroshot model.
How to reproduce
This example focuses on the ETTh1 dataset; for the ETTh2 dataset, the procedure is identical. Note that in my experiments finetuning and evaluation are done individually for each dataset, so the model is never finetuned and evaluated on both datasets at once.
Standardize the ETTh dataset and convert it into Arrow format
```python
from pathlib import Path
from typing import List, Union

import numpy as np
import pandas as pd
from gluonts.dataset.arrow import ArrowWriter
from sklearn.preprocessing import StandardScaler


def convert_to_arrow(
    path: Union[str, Path],
    time_series: Union[List[np.ndarray], np.ndarray],
    compression: str = "lz4",
):
    """
    Store a given set of series into Arrow format at the specified path.

    Input data can be either a list of 1D numpy arrays, or a single 2D
    numpy array of shape (num_series, time_length).
    """
    assert isinstance(time_series, list) or (
        isinstance(time_series, np.ndarray) and time_series.ndim == 2
    )

    # Set an arbitrary start time
    start = np.datetime64("2016-07-01 00:00:00", "s")

    dataset = [{"start": start, "target": ts} for ts in time_series]

    ArrowWriter(compression=compression).write_to_file(
        dataset,
        path=path,
    )
```
```python
if __name__ == "__main__":
    # Load and preprocess the dataset
    df = pd.read_csv("/path/to/dataset")
    df = df[0:12194]

    # Ensure the time column is in datetime format
    time_column = "date"
    df[time_column] = pd.to_datetime(df[time_column])

    # Define feature columns
    feature_columns = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL", "OT"]

    # Standardize the feature columns
    scaler = StandardScaler()
    df[feature_columns] = scaler.fit_transform(df[feature_columns])

    df["id"] = 0

    # Create the structured DataFrame
    structured_df = df[["id", time_column] + feature_columns].rename(
        columns={time_column: "timestamp"}
    )

    # Extract the time series (one univariate series per feature) and start times
    time_series = [structured_df[col].to_numpy() for col in feature_columns]
    start_times = [np.datetime64(structured_df["timestamp"].iloc[0], "s")] * len(feature_columns)
```
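The snippet stops before actually writing the Arrow file, so presumably the final step is a call to convert_to_arrow at the end of the __main__ block. A minimal sketch (the output path is a placeholder; note that convert_to_arrow uses its own hard-coded start timestamp and ignores start_times):

```python
    # Write the standardized series to an Arrow file for training
    # (output path is a placeholder)
    convert_to_arrow(path="/path/to/etth1.arrow", time_series=time_series)
```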
Finetune the model
Use the training pipeline implemented in chronos-forecasting/scripts/training/train.py, as shown in the tutorial, with the following config chronos-t5-small.yaml:
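The config itself was not captured here. For orientation, a fine-tuning config for chronos-t5-small might look roughly like the sketch below; every path and hyperparameter value is an illustrative assumption, not the reporter's actual setting:

```yaml
# Illustrative sketch only -- paths and values are assumptions
training_data_paths:
  - "/path/to/etth1.arrow"
probability:
  - 1.0
context_length: 512
prediction_length: 64
max_steps: 1000               # fine-tuning needs far fewer steps than pretraining
learning_rate: 0.001          # see the discussion below: lowering this fixed the issue
per_device_train_batch_size: 32
model_id: amazon/chronos-t5-small
model_type: seq2seq
random_init: false            # start from the pretrained checkpoint
tie_embeddings: true
output_dir: ./output/
```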
Evaluate the finetuned model on the final checkpoint with the evaluation pipeline implemented in chronos-forecasting/scripts/evaluation/evaluate.py, with the following modifications in the load_and_split_dataset() function and with MAE and MSE as the metrics:
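The modified function was not captured either. A plausible sketch of the kind of change meant here is to load the locally generated Arrow file instead of pulling the dataset from the HuggingFace Hub; the path and frequency below are assumptions:

```python
from pathlib import Path

from gluonts.dataset.common import FileDataset
from gluonts.dataset.split import split


def load_and_split_dataset(backtest_config: dict):
    prediction_length = backtest_config["prediction_length"]
    offset = backtest_config["offset"]
    num_rolls = backtest_config["num_rolls"]

    # Load the locally generated, standardized Arrow dataset instead of
    # downloading it from the HuggingFace Hub (path is a placeholder)
    gts_dataset = FileDataset(Path("/path/to/etth1.arrow"), freq="h")

    # Everything after `offset` is held out; test windows are generated from it
    _, test_template = split(gts_dataset, offset=offset)
    test_data = test_template.generate_instances(prediction_length, windows=num_rolls)

    return test_data
```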
The evaluation is performed on the test section of the standardized ETTh1 dataset, hence the offset. For the evaluation pipeline, use the following config:
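The exact backtest config was likewise not captured; evaluate.py takes a YAML list of entries roughly like the following, where the offset and prediction_length values are placeholders rather than the reporter's actual ones:

```yaml
# Illustrative sketch only -- values are assumptions
- name: ETTh
  offset: -1728            # start of the held-out test section
  prediction_length: 64
  num_rolls: 1
```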
I get the following results:

Zeroshot ETTh1

```
dataset,model,MAE[0.5],MSE[mean]
ETTh,amazon/chronos-t5-small,0.5081954018184918,0.560689815315581
```

Zeroshot ETTh2

```
dataset,model,MAE[0.5],MSE[mean]
ETTh,amazon/chronos-t5-small,0.2625630043626757,0.1391419442914831
```

Finetuned ETTh1

```
dataset,model,MAE[0.5],MSE[mean]
ETTh,/path/to/checkpoint-final,0.7746078180721628,1.1865953634689008
```

Finetuned ETTh2

```
dataset,model,MAE[0.5],MSE[mean]
ETTh,/path/to/checkpoint-final,0.35415831866543424,0.2516080298962922
```

As you can see, MAE and MSE are worse for the finetuned checkpoints than for the default model. That shouldn't be the case.

Environment description

Operating system: Ubuntu 22.04.4 LTS
Python version: 3.10.14
CUDA version: 12.4
PyTorch version: 2.4.0
HuggingFace transformers version: 4.44.2
HuggingFace accelerate version: 0.33.0

Thank you for opening this, although this is not exactly a bug. When it comes to fine-tuning, there is no one-size-fits-all set of hyperparameters that works for all datasets. Therefore, it is entirely possible that due to specific settings such as a large learning_rate or max_steps, the model's performance worsens upon fine-tuning (e.g., due to over-fitting). I would encourage you to try different fine-tuning hyperparameters.

Thanks, drastically reducing the initial learning rate solves the problem (the finetuned performance is better than the zeroshot performance for both datasets).
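For anyone reproducing the fix: train.py accepts command-line overrides for the config values, so a lower learning rate can be tried without editing the YAML, along the lines of the repo's fine-tuning example. The learning-rate value shown here is illustrative, not the one that was actually used:

```
python training/train.py --config /path/to/config.yaml \
    --model-id amazon/chronos-t5-small \
    --no-random-init \
    --max-steps 1000 \
    --learning-rate 0.0001
```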