Train a Reinforcement Learning Baseline
RL algorithms are common baselines for measuring AI agent performance in Human-AI teaming research. Once CREW is installed and the crew conda environment is activated, navigate to crew-algorithms and run:
python crew_algorithms/ddpg envs=hide_and_seek_1v1 collector.frames_per_batch=240 batch_size=240 train_batches=30
The script first opens and closes a dummy environment to read the environment specs needed to create an agent. It then launches a server instance of the 1v1 Hide and Seek game and trains a DDPG agent as the seeker.
By default, CREW uses wandb for experiment logging. If you have not used wandb before, you can create an account by following the prompt that appears after running the script. You can also disable wandb logging by prepending WANDB_MODE=disabled to the python command:
WANDB_MODE=disabled python crew_algorithms/ddpg envs=hide_and_seek_1v1 collector.frames_per_batch=240 batch_size=240 train_batches=30
The above command trains the agent for 60 minutes; you should be able to observe decent performance after 20-30 minutes.
By default, the model weights are saved after every training batch to CREW/Data/00/ddpg/<exp name>, where <exp name> is an automatically generated experiment name containing the time, environment, algorithm, and seed of the experiment. To evaluate and visualize trained models, use crew_algorithms/ddpg/eval.py with the arguments exp_path and eval_weights to choose the experiment and weights to evaluate and visualize. exp_path starts with the subject ID, which is 00 by default, and eval_weights is a list of integers specifying which saved weights you wish to evaluate. For instance:
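A sketch of such a command follows. The exact override syntax for exp_path and eval_weights, and the experiment directory name shown, are assumptions for illustration; substitute the <exp name> generated by your own training run:

```shell
# Hypothetical example: evaluate the weights saved after training batches 10, 20, and 30
# of an experiment stored under CREW/Data/00/ddpg/<exp name> (directory name is illustrative).
python crew_algorithms/ddpg/eval.py exp_path=00/ddpg/<exp name> eval_weights="[10,20,30]"
```

The evaluation script should then load each listed checkpoint in turn and visualize the seeker's behavior in the Hide and Seek environment.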