Run a model with config files
Note
The tutorial is for FuxiCTR v1.0.
This tutorial shows how to use YAML config files to define dataset and model hyper-parameters, and then run the model.
We take DeepFM_with_config.py in the demo directory as an example. The config files are located in the demo/demo_config folder.
The dataset config dataset_config.yaml is as follows:
### Tiny data for tests only
taobao_tiny_data:
    data_root: ../data/
    data_format: csv
    train_data: ../data/tiny_data/train_sample.csv
    valid_data: ../data/tiny_data/valid_sample.csv
    test_data: ../data/tiny_data/test_sample.csv
    min_categr_count: 1
    feature_cols:
        - {name: ["userid","adgroup_id","pid","cate_id","campaign_id","customer","brand","cms_segid",
                  "cms_group_id","final_gender_code","age_level","pvalue_level","shopping_level","occupation"],
           active: True, dtype: str, type: categorical}
    label_col: {name: clk, dtype: float}
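In the dataset config above, a single feature_cols entry lists many column names that share the same attributes (active, dtype, type). The following is a minimal sketch of how such an entry could be expanded into one spec per column. The helper name expand_feature_cols is hypothetical and is used only for illustration; it is not presented as FuxiCTR's own code.

```python
# Illustrative sketch only: expand_feature_cols is a hypothetical helper,
# not a FuxiCTR API.
def expand_feature_cols(feature_cols):
    """Turn entries whose `name` is a list into one spec per column name."""
    expanded = []
    for col in feature_cols:
        names = col["name"] if isinstance(col["name"], list) else [col["name"]]
        for name in names:
            spec = dict(col)      # copy the shared attributes
            spec["name"] = name   # replace the name list with a single name
            expanded.append(spec)
    return expanded


# A shortened version of the feature_cols entry from dataset_config.yaml
feature_cols = [
    {"name": ["userid", "adgroup_id", "pid"],
     "active": True, "dtype": "str", "type": "categorical"},
]

for spec in expand_feature_cols(feature_cols):
    print(spec)
```

Grouping names this way keeps the YAML short when many columns share the same type and dtype.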
The model config model_config.yaml is as follows:
# The `Base` can be shared by different expid settings
Base:
    model_root: '../checkpoints/'
    workers: 3
    verbose: 1
    patience: 2
    pickle_feature_encoder: True
    use_hdf5: True
    save_best_only: True
    every_x_epochs: 1
    debug: False

# The expid should be unique among all settings
DeepFM_test:
    model: DeepFM
    dataset_id: taobao_tiny_data # each expid corresponds to a dataset_id
    loss: 'binary_crossentropy'
    metrics: ['logloss', 'AUC']
    task: binary_classification
    optimizer: adam
    hidden_units: [64, 32]
    hidden_activations: relu
    net_regularizer: 0
    embedding_regularizer: 1.e-8
    learning_rate: 1.e-3
    batch_norm: False
    net_dropout: 0
    batch_size: 128
    embedding_dim: 4
    epochs: 1
    shuffle: True
    seed: 2019
    monitor: 'AUC'
    monitor_mode: 'max'
We use the Base section to hold common hyper-parameters that can be shared by different expid settings. If you prefer, you can instead merge all the key-value pairs in Base into DeepFM_test.
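The relationship between Base and an expid section can be pictured as a dictionary merge. The sketch below uses plain Python dicts in place of the parsed YAML, and it assumes that expid-specific values take precedence over shared ones. The function resolve_params and the extra patience value under DeepFM_test are hypothetical and exist only for illustration; this is not presented as the implementation of load_config.

```python
# Illustrative sketch only: resolve_params is a hypothetical helper, and the
# override order (expid over Base) is an assumption made for this example.
model_config = {
    "Base": {"model_root": "../checkpoints/", "verbose": 1, "patience": 2},
    "DeepFM_test": {
        "model": "DeepFM",
        "dataset_id": "taobao_tiny_data",
        "patience": 3,  # hypothetical override, not in the tutorial's config
    },
}


def resolve_params(config, expid):
    """Overlay the settings of one expid on top of the shared Base settings."""
    if expid not in config:
        raise KeyError(f"expid {expid!r} not found in model config")
    params = dict(config.get("Base", {}))  # start from a copy of the shared values
    params.update(config[expid])           # expid-specific values win on conflict
    return params


params = resolve_params(model_config, "DeepFM_test")
print(params["model"], params["patience"], params["verbose"])
```

With this layout, settings such as model_root only need to be written once, while each expid lists just the values that distinguish it.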
Note that the names dataset_config and model_config must be kept unchanged. The dataset config and the model config must also be kept in the same directory, in one of two layouts: 1) put dataset_config.yaml and model_config.yaml directly in the directory, as shown in ./demo/demo_config, or 2) put the YAML files in dataset_config and model_config subfolders, as shown in ./config, which is convenient when there are many config files.
import sys
import os
from fuxictr.datasets import data_generator
from fuxictr.datasets.taobao import FeatureEncoder
from datetime import datetime
from fuxictr.utils import set_logger, print_to_json, load_config
import logging
from fuxictr.pytorch.models import DeepFM
from fuxictr.pytorch.utils import seed_everything

if __name__ == '__main__':
    # Load params from config files
    config_dir = 'demo_config'
    experiment_id = 'DeepFM_test'
    params = load_config(config_dir, experiment_id)

    # Set up logger and random seed
    set_logger(params)
    logging.info(print_to_json(params))
    seed_everything(seed=params['seed'])

    # Set up feature encoder
    feature_encoder = FeatureEncoder(**params)
    feature_encoder.fit(train_data=params['train_data'],
                        min_categr_count=params['min_categr_count'])

    # Build train/validation/test data generators
    train_gen, valid_gen, test_gen = data_generator(feature_encoder,
                                                    train_data=params['train_data'],
                                                    valid_data=params['valid_data'],
                                                    test_data=params['test_data'],
                                                    batch_size=params['batch_size'],
                                                    shuffle=params['shuffle'],
                                                    use_hdf5=params['use_hdf5'])

    # Build a DeepFM model
    model = DeepFM(feature_encoder.feature_map, **params)
    model.fit_generator(train_gen, validation_data=valid_gen, epochs=params['epochs'],
                        verbose=params['verbose'])

    # Reload the weights of the best checkpoint
    model.load_weights(model.checkpoint)

    # Evaluation on validation
    logging.info('***** validation results *****')
    model.evaluate_generator(valid_gen)

    # Evaluation on test
    logging.info('***** test results *****')
    model.evaluate_generator(test_gen)
For convenience, we also provide a tool script, run_expid.py, to run FuxiCTR models based on YAML config files. It accepts the following arguments:
--config: The directory that contains the dataset and model config files.
--expid: The expid that identifies the experimental settings to run.
--gpu: The index of the GPU to use for the experiment, or -1 for CPU.
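As a rough picture of how these three arguments fit together, the sketch below parses them with argparse and maps the GPU index to a device string. It is a minimal illustration, not the source of run_expid.py: the default values, the help strings, and the device naming are assumptions made for this example.

```python
import argparse


# Illustrative sketch only: defaults and help texts are assumptions,
# not taken from run_expid.py.
def build_parser():
    parser = argparse.ArgumentParser(description="Run an experiment by expid (sketch)")
    parser.add_argument("--config", type=str, default="../config",
                        help="directory holding the dataset and model config files")
    parser.add_argument("--expid", type=str, required=True,
                        help="experiment id that selects the settings to run")
    parser.add_argument("--gpu", type=int, default=-1,
                        help="GPU index, or -1 to run on CPU")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--config", "../demo/demo_config", "--expid", "DeepFM_test", "--gpu", "0"]
)
device = "cpu" if args.gpu < 0 else f"cuda:{args.gpu}"
print(args.config, args.expid, device)
```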
Try the following examples:
# switch to the benchmarks directory (%cd persists across notebook cells, !cd does not)
%cd benchmarks
# run the demo config
!python run_expid.py --config ../demo/demo_config --expid DeepFM_test --gpu 0
# run DeepFM_test, located in config/model_config/tests.yaml
!python run_expid.py --config ../config --expid DeepFM_test --gpu 0