The Difficulties of Managing ML-Based Experiments

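During my master's program, I worked on a study that applied contrastive learning to a recommender system in order to increase the diversity of product recommendations. Let's take a look at the config file I actually used in that research, along with a template of the model training code!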

<Config file>

# config.yaml
learning_rate: 0.001      # learning rate for model
dropout_rate: 0.1         # dropout rate for model
lambda_for_cl: 0.1        # controls the loss for the encoder model (float, range: 0 to 1)
substitution_rate: 0.3    # ratio for item substitution in batch
masking_rate: 0.3         # ratio for item masking in batch
cropping_rate: 0.3        # ratio for item cropping in batch
batch_size: 256
sequence_len: 1           # use 1 when run_mode == 'diversity'; otherwise 30 works well
embedding_dim: 128        # embedding size
early_stop_patience: 20   # control early stopping steps
warmup_steps: 10000       # control warmup steps for learning rate scheduling
epochs: 200              
clip_norm: 5.0            # gradient clipping norm
log_interval: 10          # interval for logging for loguru and wandb
temperature: 0.1          # temperature for NT-Xent Loss

try_num: '9'              # experiment number
run_mode: 'test'          # running mode for main.py: ['train', 'test', 'diversity']
data_type: 'RentTheRunway'  # 'RentTheRunway'
train_type: 'NCF'         # model to train: ['CL', 'REC', 'ONLY_REC', 'NCF']
aug_mode: 'substitute'    # item augmentation mode: ['substitute', 'crop', 'mask']
rec_num_dense_layer: 2
model_trainable: False    # True, False

k: 20                     # top-K item
sparsity: 0.5             # transaction count sparsity for checking diversity

<Sample model training code>

# train.py (Template code)

from omegaconf import OmegaConf

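# Load the YAML file and merge it onto the Cfg dataclass schema
# (Cfg is a dataclass defined elsewhere in the project)
def load_config(path: str) -> Cfg:
    schema = OmegaConf.structured(Cfg())

    # Duck typing: merge() actually returns a DictConfig, but annotating it
    # as Cfg lets us access the dataclass fields as attributes
    config: Cfg = OmegaConf.merge(schema, OmegaConf.load(path))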

    return config

# load config 
config = load_config("config.yaml")

# load dataset
dataset = load_dataset(config)

# load dataloader
dataloader = load_dataloader(config)

# initialize model
model = Model(config)

# train model
model.train(dataset, dataloader, model)
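
The `Cfg` type used in `load_config` is not shown in the template. As a rough illustration only, a dataclass schema of that kind might look like the sketch below. The choice of fields, their types, and their defaults are assumptions inferred from the values in `config.yaml` above; this is not the actual class from the research code.

```python
from dataclasses import asdict, dataclass


@dataclass
class Cfg:
    # Hypothetical schema: only a handful of keys from config.yaml are shown,
    # with types inferred from the values in that file.
    learning_rate: float = 0.001
    dropout_rate: float = 0.1
    batch_size: int = 256
    sequence_len: int = 1
    epochs: int = 200
    try_num: str = "9"
    run_mode: str = "test"
    train_type: str = "NCF"
    aug_mode: str = "substitute"
    model_trainable: bool = False


if __name__ == "__main__":
    cfg = Cfg()
    # Every field is reachable as an attribute, e.g. cfg.learning_rate
    print(asdict(cfg))
```

A class shaped like this is what `OmegaConf.structured(Cfg())` takes as its schema in `load_config`, with the values from the YAML file merged on top of these defaults. A real `Cfg` would most likely need a field for every key in `config.yaml`, since a structured schema is expected to reject keys it does not declare.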

When I wrote this code I was still inexperienced in many ways, so I was satisfied with having met the experiment's goals and moved on.

Later, though, while reviewing the code, I realized there were a great many places that needed fixing when it came to the experiment configuration 😥

Of those, I narrowed the key issues down to the following three 🦾