Agent Learning Framework (ALF) is a reinforcement learning framework that emphasizes flexibility and ease of implementing complex algorithms involving many different components. ALF is built on PyTorch. Development of the previous version, based on TensorFlow 2.1, stopped as of Feb 2020.
A draft tutorial can be accessed on RTD. This tutorial is still under construction and some chapters are not yet finished.
Read the ALF documentation here.
The following installation has been tested on Ubuntu 22.04 and Ubuntu 24.04 with CUDA 11.8.
Python 3.10-3.12 is currently supported by ALF. Note that some pip packages (e.g., pybullet) need the Python development files, so make sure the python3-dev package matching your Python version is installed:
sudo apt install -y python3.11 python3.11-dev
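Before installing, it can help to confirm that the interpreter in use falls inside the supported range. The sketch below is only an illustration: the helper name `is_supported_python` is hypothetical and not part of ALF, and the bounds are simply the 3.10 to 3.12 range stated above.

```python
import sys

# Supported range taken from the text above (Python 3.10 through 3.12).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported_python(version=None):
    """Return True if (major, minor) lies within the supported range.

    Hypothetical helper for illustration; ALF does not ship this function.
    """
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:2])
    status = "supported" if is_supported_python() else "outside the tested range"
    print(f"Python {current} is {status}")
```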
We also require the following packages:
sudo apt install libboost-all-dev # required by ALF for fast parallel environments.
sudo apt install ninja-build # required to build modules via torch.utils.cpp_extension
sudo apt install swig # required to build box2d-py
sudo apt install xvfb # for running headless training jobs locally
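A quick way to see which of these system packages may still be missing is to look for their executables on PATH. This is a rough sketch under stated assumptions: the mapping from executable name to apt package (ninja, swig, xvfb-run) is our assumption about what those packages install, the helper `find_missing_packages` is hypothetical, and libboost-all-dev is not checked because it provides libraries rather than a command.

```python
import shutil

# Executable names assumed to be provided by the apt packages listed above:
# ninja-build -> ninja, swig -> swig, xvfb -> xvfb-run.
REQUIRED_TOOLS = {
    "ninja": "ninja-build",
    "swig": "swig",
    "xvfb-run": "xvfb",
}


def find_missing_packages(which=shutil.which):
    """Return apt package names whose executable is not found on PATH.

    The `which` argument exists only so the lookup can be replaced in tests.
    """
    return sorted(pkg for exe, pkg in REQUIRED_TOOLS.items() if which(exe) is None)


if __name__ == "__main__":
    missing = find_missing_packages()
    if missing:
        print("Consider installing:", " ".join(missing))
    else:
        print("All checked tools were found on PATH.")
```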
Virtualenv is recommended for the installation. After creating and activating a virtual env, you can run the following commands to install ALF:
git clone https://github.com/HorizonRobotics/alf
cd alf
uv can manage an isolated environment for ALF without creating a virtualenv yourself. A typical workflow looks like this:
- Install uv (once per machine):
    curl -Ls https://astral.sh/uv/install.sh | sh
  Restart your shell or source the profile snippet the installer prints.
- Bootstrap the project environment (creates .venv/ and uv.lock automatically):
    uv sync
- Launch Python inside that environment:
    uv run python
  Any command prefixed with uv run uses the synced environment, e.g. uv run python -m alf.bin.train --conf=....
- Add or update packages:
    uv add some-package
    uv add pandas==2.2.3
  uv records the changes in pyproject.toml and regenerates uv.lock.
- Remove packages when they are no longer needed:
    uv remove some-package
- To refresh dependencies after editing pyproject.toml by hand, rerun uv sync.
- Commit dependency changes (keep pyproject.toml and uv.lock in version control):
    git add pyproject.toml uv.lock
    git commit -m "Update dependencies"
    git push origin <your-branch>
Everything stays local to the repository; no system-wide packages or pre-existing virtualenv are required.
pip install -e .
There is a built-in Nix-based development environment defined in flake.nix. To activate it, run
$ nix develop
in the root of your local repository.
We also provide a docker image of ALF for convenience. In order to use this image, you need to have docker and nvidia-docker (for ALF GPU usage) installed first.
docker run --gpus all -it horizonrobotics/cuda:11.8.0-py3.11-torch2.2-ubuntu22.04 /bin/bash
This will give you a shell that has ALF and all its dependencies pre-installed.
The current docker image contains the ALF version as of Feb 21, 2024. Regular version updates are expected in the future.
You can train any _conf.py file under alf/examples as follows:
python -m alf.bin.train --conf=CONF_FILE --root_dir=LOG_DIR
- CONF_FILE is the path to your conf file, which follows the ALF configuration file format (basically Python).
- LOG_DIR is the directory where you want to store the training results. Note that if you want to train from scratch, LOG_DIR must point to a location that doesn't exist. Otherwise, training is assumed to resume from a previous checkpoint (if any).
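The LOG_DIR rule above can be checked before launching a job. The following is a minimal sketch, assuming that the existence of the directory is the only thing that matters; the helper `describe_run_mode` is hypothetical and does not inspect checkpoints the way the trainer itself would.

```python
import tempfile
from pathlib import Path


def describe_run_mode(root_dir):
    """Classify a LOG_DIR as a fresh run or a possible resume.

    Hypothetical helper: it only checks whether the path exists and does
    not look for checkpoints inside it.
    """
    if Path(root_dir).exists():
        return "resume"
    return "from_scratch"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(describe_run_mode(tmp))                    # existing directory
        print(describe_run_mode(Path(tmp) / "new_run"))  # not created yet
```

Running such a check first makes it less likely that a run intended to start from scratch silently picks up an old checkpoint.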
During training, we use tensorboard to show the progress of training:
tensorboard --logdir=LOG_DIR
After training, you can evaluate the trained model and visualize environment frames using the following command:
python -m alf.bin.play --root_dir=LOG_DIR
To launch single-node multi-gpu training, set the --distributed argument to multi-gpu:
python -m alf.bin.train --conf=CONF_FILE --root_dir=LOG_DIR --distributed multi-gpu
To launch multi-node multi-gpu training, we use the torch distributed launch module. The 'local_rank' for each process can be obtained from the 'PerProcessContext' class, which can be used to assign a GPU to your environment if you wish. For details on how PyTorch assigns 'local_rank' and 'ddp_rank', please refer to the documentation. To start training, run the following command on the host machine:
export NCCL_SOCKET_IFNAME=SOCKET # find in ifconfig
export NCCL_IB_DISABLE=1
torchrun \
--nproc_per_node=NGPU_ON_NODE \
--nnodes=NUMBER_OF_NODES \
--node_rank=NODE_RANK \
--master_addr=HOST_IP \
--master_port=12345 \
./alf/bin/train.py \
--conf=CONF_FILE \
--root_dir=LOG_DIR \
--distributed multi-node-multi-gpu
and simultaneously run the same command on each worker machine. For each worker machine, assign a NODE_RANK to it and update NGPU_ON_NODE if the number of GPUs is different from the host. Please make sure that all machines have the same copy of the codebase.
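Because the command differs between machines only in NODE_RANK (and possibly NGPU_ON_NODE), it can be generated rather than edited by hand. The sketch below assumes the host takes node rank 0, which is the usual torchrun convention but is not stated above; `build_torchrun_command` is a hypothetical helper that only mirrors the flags shown in the command, and the address and GPU counts in the demo are placeholders.

```python
import shlex


def build_torchrun_command(node_rank, nproc_per_node, nnodes, master_addr,
                           conf_file, root_dir, master_port=12345):
    """Assemble the torchrun argument list for one machine.

    Hypothetical helper: it reproduces the flags from the command above
    and validates only the node rank.
    """
    if not 0 <= node_rank < nnodes:
        raise ValueError("node_rank must be in [0, nnodes)")
    return [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        "./alf/bin/train.py",
        f"--conf={conf_file}",
        f"--root_dir={root_dir}",
        "--distributed", "multi-node-multi-gpu",
    ]


if __name__ == "__main__":
    # Placeholder setup: two machines with 4 GPUs each.
    for rank in range(2):
        cmd = build_torchrun_command(rank, 4, 2, "10.0.0.1", "CONF_FILE", "LOG_DIR")
        print(shlex.join(cmd))
```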
An older version of ALF used gin
for job configuration. Its syntax is not as flexible as ALF conf (e.g., you can't easily
do math computation in a gin file). There are still some examples with .gin
under alf/examples. We are in the process of converting all .gin examples to _conf.py
examples.
You can train any .gin file under alf/examples using the following command:
cd alf/examples; python -m alf.bin.train --gin_file=GIN_FILE --root_dir=LOG_DIR
- GIN_FILE is the path to the gin conf (some .gin files under alf/examples might be invalid; they have not been converted to use the latest PyTorch version of ALF).
- LOG_DIR has the same meaning as in the ALF conf example above.
Warning: When using gin, ALF has to be launched in the same directory as the gin file(s). If an error says that no configuration file is found, then you have probably launched ALF from the wrong directory.
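The launch-directory requirement can be verified up front. This is a minimal sketch, assuming the warning means the gin file must sit directly in the current working directory; `launched_next_to_gin_file` is a hypothetical pre-flight check and not part of ALF.

```python
import tempfile
from pathlib import Path


def launched_next_to_gin_file(gin_file, cwd=None):
    """Return True if gin_file exists and sits directly in the launch directory.

    Hypothetical pre-flight check; `cwd` defaults to the current directory.
    """
    launch_dir = (Path.cwd() if cwd is None else Path(cwd)).resolve()
    gin_path = (launch_dir / gin_file).resolve()
    return gin_path.is_file() and gin_path.parent == launch_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "example.gin").write_text("# placeholder\n")
        print(launched_next_to_gin_file("example.gin", cwd=tmp))     # True
        print(launched_next_to_gin_file("../example.gin", cwd=tmp))  # False
```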
All the examples below were trained on a single machine with an Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz (32 CPUs) and one RTX 2080Ti GPU.
- Cart pole. The training score took only 30 seconds to reach 200, using 8 environments.
- Atari games. Need to install the Python package atari-py for the Atari game environments. The evaluation score (by taking argmax of the policy) took 1.5 hours to reach 800 on Breakout, using 64 environments.
- Simple navigation with visual input. Follow the instructions at SocialRobot to install the environment.
- PR2 grasping state only. Follow the instructions at SocialRobot to install the environment.
- Humanoid. Learning to walk using the pybullet Humanoid environment. Need to install the Python package pybullet>=2.5.0 for the environment. The evaluation score reaches 3k in 50M steps, using 96 parallel environments.
- procgen. Game "bossfight" as an example. Need to install the Python package procgen.
- MetaDrive. Learning to drive on randomly generated maps with interaction in the MetaDrive simulator, with BEV as input. Need to install the Python package metadrive-simulator.
- DDQN on Atari. Game "Q*Bert" performance.
- FetchSlide (sparse rewards). Need to install the MuJoCo simulator first. This example reproduces the performance of vanilla DDPG reported in OpenAI's Robotics environment paper. Our implementation doesn't use MPI, but obtains (evaluation) performance on par with the original implementation. (The original MPI implementation has 19 workers, each worker containing 2 environments for rollout and sampling a minibatch of size 256 from its replay buffer for computing gradients. All the workers' gradients are summed together for a centralized optimizer step. Our implementation simply samples a minibatch of size 5000 from a common replay buffer per optimizer step.) The training took about 1 hour with 38 (19*2) parallel environments on a single GPU.
- FetchReach (sparse rewards). Need to install the MuJoCo simulator first. The training took about 20 minutes with 20 parallel environments on a single GPU.
- FetchSlide (sparse rewards). Need to install the MuJoCo simulator first. This is the same task as the DDPG example above, but with SAC as the learning algorithm. Also it has only 20 (instead of 38) parallel environments to improve sample efficiency. The training took about 2 hours on a single GPU.
- Fetch Environments (sparse rewards) w/ Action Repeat. We are able to achieve even better performance than reported by DDPG + Hindsight Experience Replay in some cases simply by using SAC + Action Repeat with length 3 timesteps. See this note to view learning curves, videos, and more details.
- Super Mario. Playing Super Mario using only intrinsic reward. The Python package gym-retro>=0.7.0 is required for this experiment, and a suitable SuperMarioBros-Nes rom should also be obtained and imported (roms are not included in gym-retro). See this doc on how to import roms.
- Montezuma's Revenge. Training the hard exploration game Montezuma's Revenge with intrinsic rewards generated by RND. A lucky agent can get an episodic score of 6600 in 160M frames (40M steps with frame_skip=4). A normal agent would get an episodic score of 4000~6000 in the same number of frames. The training took about 6.5 hours with 128 parallel environments on a single GPU.
- Pendulum. Learning diverse skills without external rewards.
- Pendulum. Learning a control policy from offline demonstrations.
- Collect Good Objects. Learn to collect good objects and avoid bad objects. DeepmindLab is required. Follow the instructions at DeepmindLab to install the environment.
- 6x6 Go. It took about a day to train a reasonable agent to play 6x6 Go using one GPU.
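The figures quoted in the FetchSlide (DDPG) and Montezuma's Revenge entries above can be cross-checked with a little arithmetic. The snippet is only a worked calculation using the numbers from those entries; the observation that 19 summed minibatches of 256 come close to the single minibatch of 5000 is our reading and is not stated in the text.

```python
# Figures quoted in the FetchSlide (DDPG) entry for the original MPI setup.
workers = 19
envs_per_worker = 2
minibatch_per_worker = 256

# Total rollout environments across all workers.
parallel_envs = workers * envs_per_worker

# Samples contributing to one summed-gradient optimizer step.
summed_batch = workers * minibatch_per_worker

# Montezuma's Revenge: environment steps times frame_skip gives frames.
steps = 40_000_000
frame_skip = 4
frames = steps * frame_skip

print(parallel_envs)  # 38
print(summed_batch)   # 4864
print(frames)         # 160000000
```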
If you use ALF for research and find it useful, please consider citing:
@software{Xu2021ALF,
title={{{ALF}: Agent Learning Framework}},
author={Xu, Wei and Yu, Haonan and Zhang, Haichao and Hong, Yingxiang and Yang, Break and Zhao, Le and Bai, Jerry and ALF contributors},
url={https://github.com/HorizonRobotics/alf},
year={2021}
}
You are welcome to contribute to ALF. Please follow the guideline here.