Skip to content

Commit

Permalink
update timegan & explanation
Browse files Browse the repository at this point in the history
  • Loading branch information
euisuk-chung committed Dec 13, 2021
1 parent be749f9 commit 2e56cef
Show file tree
Hide file tree
Showing 5 changed files with 133 additions and 48 deletions.
125 changes: 93 additions & 32 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,84 +1,145 @@
# Time Series Generation
Github Repo for timeseries generation (시계열 생성)

## Purpose
## 1. Purpose
Time Series Generation can be used for multiple purposes

For example :
- Time Series Data Augmentations
- Generating Simulations


## How to use
1. TimeGAN을 이용한 시계열 생성
## 2. How to use

```python
python run_timegan.py
- You can check detail about the argument at `4. Model Parameters`

### 2.1. Time Series Generation using TimeGAN

```python
# Example
python run_timegan.py --file_name test_data --cols_to_remove Time MNG_NO --time_gap 500 --emb_epochs 10 --sup_epochs 10 --gan_epochs 10 --window_size 5
```

2. VRAE를 이용한 시계열 생성
### 2.2. VRAE를 이용한 시계열 생성

```python
python run_vrae.py

# Example
python run_vrae.py --file_name test_data --cols_to_remove Time MNG_NO --time_gap 500 --n_epochs 10 --window_size 5

```

## Models Used
### TimeGAN
## 3. Models Used
### 3.1. TimeGAN

<img src = 'https://github.com/euisuk-chung/timeseries-generation/blob/main/image/TimeGAN_architecture.PNG?raw=true' width="650" height="400">

- pyTorch implementation for `TimeGAN`
- Code Reference : https://github.com/d9n13lt4n/timegan-pytorch

### Variational Recurrent AutoEncoder (VRAE)
### 3.2. Variational Recurrent AutoEncoder (VRAE)

<img src = 'https://github.com/euisuk-chung/timeseries-generation/blob/main/image/LSTM_VAE_architecture.png?raw=true' width="650" height="400">

- pyTorch implementation for `VRAE`
- Code Reference : https://github.com/tejaslodaya/timeseries-clustering-vae

## CAUTIONS!
## 4. Model Arguments

- Indeed, TimeGAN and VRAE have shared parameters, however there are also lot of parameters that are not shared.
- Therefore, Arguments for each model is in `config_timegan.py` and `config_vrae.py`

### 4.1. Shared Arguments

Training method for each model are the same, which uses dataset that is loaded by moving sliding window(default=30) with certain stride(default=1).

<img src = 'https://github.com/euisuk-chung/timeseries-generation/blob/main/image/train_test_image.png?raw=true' width="650" height="400">

However, the generation method for each model are different! See below for more detail.
There are few things you need to know before implementing our code :

- The generation method for each model are different :
- TimeGAN generates window sized timeseries from `random noise` (without any input)
- VRAE generates window sized timeseries from `given timeseries` and `trained latent space` (with input)

- The query for train/test split in my code is currently used for my side-project.
- If you want to use train/test you need to go to `utils.custom_dataset` and change the query.
- For generation purpose, you don't have to worry about train/test `split (defalut = False)`

Here are the following arguments:
```
--file_name file_name # 분석에 사용할 파일이름
--cols_to_remove Var1 Var2 # 분석에서 제외할 변수이름 (Ex. time var, idx var)
--time_gap 500 # 데이터 수집 GAP(텀)
--window_size 10 # 학습에 사용할 윈도우 크기
```

### 4.2. TimeGAN Arguments

TimeGAN has following modes (for more check `config_timegan.py`) :

### TimeGAN
TimeGAN has 2 Modes, which is used to decide whether to train or generate :
1. is_train (default = True) : train model with loaded train data (window_size=30, stride=1)
2. is_generate (default = True) : generate multiple(num_generation) sequences of window (window_size=30)

```
# Mode 1 : Train mode
--is_train # train timeGAN
--is_train `True` or `False` # Train 데이터를 이용한 학습
--num_generation 100 # 생성할 데이터 윈도우의 개수
--is_generate `True` or `False` # 학습된 latent space를 바탕으로 데이터 생성
# Mode 2 : Generation mode
--is_generate # generate window size sequences
--num_generation # number of sequences to make
--emb_epochs 3000 # AutoEncoder 학습 Epoch 수
--sup_epochs 3000 # Supervisor 학습 Epoch 수
--gan_epochs 3000 # GAN 모델 학습 Epoch 수
```

### Variational Recurrent AutoEncoder (VRAE)
VRAE has 3 Modes, which is used to decide whether to train or generate(train) or generate(test) :
1. is_train (default = True) : train model with loaded train data (window_size=30, stride=1)
2. is_generate_train (default = True) : generate train dataset loaded sequentially (window_size=stride)
3. is_generate_test (default = False) : generate test dataset loaded sequentially (window_size=stride)
### 4.3. Variational Recurrent AutoEncoder (VRAE) Arguments

✨ The query for train/test split in my code is currently used for my side-project.
✨ If you want to use train/test you need to go to `utils.custom_dataset` and change the query.
VRAE has following modes (for more check `config_vrae.py`) :

1. is_train (default = True) : train model with loaded train data (window_size=30, stride=1)
2. is_generate_train (default = True) : generate train dataset loaded sequentially (stride=window_size)
3. is_generate_test (default = False) : generate test dataset loaded sequentially (stride=window_size)

```
# Mode 1 : Train mode
--is_train # train VRAE
--is_train `True` or `False` # Train 데이터를 이용한 학습
--is_generate_train `True` or `False` # 학습된 latent space와 Train Data를 바탕으로 데이터 생성
# Mode 2 : Train Generation mode
--is_generate_train # generate train dataset
--is_generate_test `True` or `False` # 학습된 latent space와 Test 데이터 생성 (실험용 : 데이터 생성 시 사용할 필요 X)
# Mode 3 : Test Generation mode
--is_generate_test # generate test dataset
--n_epochs 2000 # 모델 학습할 Epoch 수
```

## Repository Structure
```
├── data
│ ├── Data You Want to Use (in pkl)
├── gen_data_gan
│ └── where GAN Generated Data are saved
├── gen_data_vae
│ └── where VAE Generated Data are saved
├── models
│ ├── TimeGAN.py
│ └── vrae.py
├── save_model
│ └── where model parameters get saved
├── utils
│ ├── TSTR.py # TSTR(TRTS) code
│ ├── custom_dataset.py # dataloading code
│ ├── utils_timegan.py # util function for timegan
│ ├── utils_vrae.py # util function for vrae
│ ├── visual_timegan.py # visualization function for timegan
│ └── visual_vrae.py # visualization function for vrae
├── run_timegan.py
├── run_vrae.py
├── config_timegan.py
├── config_vrae.py
```
17 changes: 16 additions & 1 deletion config_timegan.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,21 @@ def parser_setting(parser):
'--file_name',
default='netis',
type=str)
parser.add_argument(
'--cols_to_remove',
default='Time',
type=str,
nargs='*',
help = 'Columns to Remove')
parser.add_argument(
"--split",
type=str2bool,
default=False,
help = 'Argument for Train/Test split')
parser.add_argument(
'--time_gap',
default=100,
type=int)

# train/generate argument
parser.add_argument(
Expand Down Expand Up @@ -113,7 +128,7 @@ def parser_setting(parser):
default=1e-3,
type=float)
parser.add_argument(
'--model_path',
'--dload',
default="save_model",
type=str)
return parser
Expand Down
25 changes: 17 additions & 8 deletions run_timegan.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,31 +28,40 @@
from utils.custom_dataset import *
from utils.utils_timegan import *

args = config.get_config() # argument 호출
fix_seed(args.seed) # seed 고정
# argument 호출
args = config.get_config()

# seed 고정
fix_seed(args.seed)

file_name = args.file_name # 데이터 파일명
WINDOW_SIZE = args.window_size # Window size
scale_type = args.scale_type # Scaler Type 설정 ('Standard' or 'MinMax' or 'Robust')
undo = args.undo # reconstruction 시 unscale 수행여부
cols_to_remove = args.cols_to_remove # 제거할 변수
split = args.split # TRAIN/TEST Split 여부
time_gap = args.time_gap # 데이터수집단위


# Load & Scale data
TRAIN_DF, TEST_DF, TRAIN_SCALED, TEST_SCALED, TRAIN_Time, TEST_Time, cols, scaler = load_gen_data(file_name = file_name, scale_type = scale_type)
TRAIN_DF, TEST_DF, TRAIN_SCALED, TEST_SCALED, TRAIN_Time, TEST_Time, cols, scaler = load_gen_data(file_name = file_name, \
scale_type = scale_type,\
cols_to_remove = cols_to_remove,\
split = split)

# under custom_dataset.py
## Train dataset with stride
train_dataset = NetisDataset(data = TRAIN_SCALED, timestamps = TRAIN_Time, window_size = WINDOW_SIZE, stride =1)

## Test dataset with no window collapse (for generation window size must be WINDOW_SIZE)
test_dataset = NetisDataset(data = TEST_SCALED, timestamps = TEST_Time, window_size = WINDOW_SIZE, stride = WINDOW_SIZE)
train_dataset = TimeSeriesDataset(data = TRAIN_SCALED, timestamps = TRAIN_Time, window_size = WINDOW_SIZE, stride =1, time_gap = time_gap)

# SET DEVICE
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# SET ARGUMENTS
args.feature_dim = train_dataset[0].size(1)
args.Z_dim = train_dataset[0].size(1)
args.model_path = "save_model"
args.dload = "./save_model"

# model_path
# DEFINE MODEl
model = TimeGAN(args)
model = model.to(device)
Expand Down
2 changes: 1 addition & 1 deletion run_vrae.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@
train_dataset = TimeSeriesDataset(data = TRAIN_SCALED, timestamps = TRAIN_Time, window_size = WINDOW_SIZE, stride =1, time_gap = time_gap)

# SET ARGUMENTS
args.dload = "./save_model_test" # CHANGE BACK TO NORMAL AFTER TESTING
args.dload = "./save_model"
args.sequence_length = WINDOW_SIZE
args.number_of_features = train_dataset[0].shape[1]

Expand Down
12 changes: 6 additions & 6 deletions utils/utils_timegan.py
Original file line number Diff line number Diff line change
Expand Up @@ -235,9 +235,9 @@ def timegan_trainer(model, dataset, args):

# Save model, args, and hyperparameters
print('SAVING TRAINED MODEL...')
torch.save(args, f"{args.model_path}/args.pickle")
torch.save(model.state_dict(), f"{args.model_path}/model.pt")
print(f"\nSaved at path: {args.model_path}")
torch.save(args, f"{args.dload}/args.pickle")
torch.save(model.state_dict(), f"{args.dload}/model.pt")
print(f"\nSaved at path: {args.dload}")

# timeGAN generator
def timegan_generator(model, Num, args):
Expand All @@ -250,14 +250,14 @@ def timegan_generator(model, Num, args):
- generated_data (np.ndarray): The synthetic data generated by the model
"""
# Load model for inference
if not os.path.exists(args.model_path):
if not os.path.exists(args.dload):
raise ValueError(f"Model directory not found...")

# Load arguments and model
with open(f"{args.model_path}/args.pickle", "rb") as fb:
with open(f"{args.dload}/args.pickle", "rb") as fb:
args = torch.load(fb)

model.load_state_dict(torch.load(f"{args.model_path}/model.pt"))
model.load_state_dict(torch.load(f"{args.dload}/model.pt"))

print("\nGenerating Data...")
# Initialize model to evaluation mode and run without gradients
Expand Down

0 comments on commit 2e56cef

Please sign in to comment.