Tasks
Task name |
---|
maze2d-open-v0 |
maze2d-umaze-v1 |
maze2d-medium-v1 |
maze2d-large-v1 |
maze2d-open-dense-v0 |
maze2d-umaze-dense-v1 |
maze2d-medium-dense-v1 |
maze2d-large-dense-v1 |
The Maze2D domain involves moving a force-actuated ball (along the X and Y axes) to a fixed target location. The observation consists of the ball's (x, y) location and velocities.
The four maze layouts are shown below (from left to right: open, umaze, medium, large):
The four environments maze2d-open-v0, maze2d-umaze-v0, maze2d-medium-v0, and maze2d-large-v0 use a sparse reward, which has a value of 1.0 when the agent (light green ball) is within a 0.5 unit radius of the target (light red ball).
Each environment also has a dense-reward version, which instead uses the exponentiated negative distance to the target as the reward.
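The two reward variants can be sketched as plain functions of the ball and target positions. This is a minimal illustration, not the environment's own code: the function names are hypothetical, treating the 0.5 radius as inclusive (`<=`) is an assumption, and "exponentiated negative distance" is read as exp(-d).

```python
import math

# Radius (in maze units) within which the sparse reward fires, per the text above.
SPARSE_RADIUS = 0.5


def sparse_reward(agent_xy, target_xy):
    """Return 1.0 if the ball is within SPARSE_RADIUS of the target, else 0.0.

    Treating the boundary as inclusive (<=) is an assumption.
    """
    return 1.0 if math.dist(agent_xy, target_xy) <= SPARSE_RADIUS else 0.0


def dense_reward(agent_xy, target_xy):
    """Exponentiated negative distance exp(-d): 1.0 at the target, decaying toward 0."""
    return math.exp(-math.dist(agent_xy, target_xy))


# Usage: a ball 5 units away earns no sparse reward but a small dense one.
print(sparse_reward((0.0, 0.0), (3.0, 4.0)), dense_reward((0.0, 0.0), (3.0, 4.0)))
```

The dense variant gives a learning signal everywhere in the maze, while the sparse variant is zero until the ball reaches the target region.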
Task name |
---|
antmaze-umaze-v0 |
antmaze-umaze-diverse-v0 |
antmaze-medium-diverse-v0 |
antmaze-medium-play-v0 |
antmaze-large-diverse-v0 |
antmaze-large-play-v0 |
The AntMaze domain uses the same umaze, medium, and large mazes from the Maze2D domain, but replaces the agent with the "Ant" robot from the OpenAI Gym MuJoCo benchmark.
The dataset in 'antmaze-umaze-v0' is generated by commanding a fixed goal location from a fixed starting location (on opposite sides of the wall in the umaze).
For harder tasks, the "diverse" dataset is generated by commanding random goal locations in the maze and navigating the ant to them. The "play" dataset is generated by commanding specific hand-picked goal locations from hand-picked initial positions.
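The difference between the "diverse" and "play" commands can be sketched on a toy grid. Everything here is hypothetical for illustration: the `UMAZE` layout, the cell coordinates in `PLAY_PAIRS`, and the helper names are not taken from the actual dataset-generation code, and the low-level ant navigation itself is omitted.

```python
import random

# Hypothetical U-shaped layout: '#' = wall, 'O' = open cell.
UMAZE = [
    "#####",
    "#OOO#",
    "###O#",
    "#OOO#",
    "#####",
]

# Hypothetical hand-picked (start, goal) cell pairs for "play"-style commands.
# (1, 1) and (3, 1) sit on opposite sides of the inner wall.
PLAY_PAIRS = [((1, 1), (3, 1)), ((3, 3), (1, 1)), ((1, 3), (3, 1))]


def open_cells(layout):
    """All (row, col) cells the ant can occupy."""
    return [(r, c) for r, row in enumerate(layout) for c, ch in enumerate(row) if ch == "O"]


def diverse_goal(layout, rng):
    """'diverse'-style: command a goal drawn uniformly from every open cell."""
    return rng.choice(open_cells(layout))


def play_command(rng):
    """'play'-style: command one of a few hand-picked (start, goal) pairs."""
    return rng.choice(PLAY_PAIRS)


rng = random.Random(0)
print(diverse_goal(UMAZE, rng), play_command(rng))
```

The "diverse" data therefore covers goals anywhere in the maze, while the "play" data concentrates on a small set of deliberately chosen routes.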
Task name |
---|
minigrid-fourrooms-v0 |
minigrid-fourrooms-random-v0 |
The Minigrid domain is a discrete analog of Maze2D.
Two datasets are provided: minigrid-fourrooms-v0, which is generated by a controller that randomly samples goal locations and navigates to them, and minigrid-fourrooms-random-v0, which samples actions uniformly at random.
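The two data-collection strategies can be contrasted on a small grid. This sketch makes several simplifying assumptions: the two-room `GRID` is hypothetical, the agent moves up/down/left/right (real MiniGrid agents turn and move forward instead), and breadth-first search stands in for the goal-reaching controller.

```python
import random
from collections import deque

# Hypothetical two-room grid: '#' = wall, '.' = free cell (border is all walls).
GRID = [
    "#######",
    "#..#..#",
    "#.....#",
    "#..#..#",
    "#######",
]

# Simplified action set (an assumption): 0=up, 1=down, 2=left, 3=right.
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def free_cells(grid):
    return [(r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == "."]


def step(grid, pos, action):
    """Move one cell; bumping into a wall leaves the agent where it is."""
    dr, dc = MOVES[action]
    nxt = (pos[0] + dr, pos[1] + dc)
    return pos if grid[nxt[0]][nxt[1]] == "#" else nxt


def replay(grid, start, actions):
    """Final position after executing a sequence of actions from start."""
    pos = start
    for action in actions:
        pos = step(grid, pos, action)
    return pos


def shortest_actions(grid, start, goal):
    """Breadth-first search for a shortest action sequence from start to goal."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        pos = queue.popleft()
        if pos == goal:
            break
        for action in MOVES:
            nxt = step(grid, pos, action)
            if nxt not in parent:
                parent[nxt] = (pos, action)
                queue.append(nxt)
    actions, node = [], goal
    while parent[node] is not None:  # walk back from the goal to the start
        node, action = parent[node]
        actions.append(action)
    return actions[::-1]


def controller_episode(grid, start, rng):
    """Controller-style data: sample a random goal, then navigate to it."""
    goal = rng.choice(free_cells(grid))
    return goal, shortest_actions(grid, start, goal)


def random_episode(rng, length):
    """Random-style data: actions sampled uniformly at random."""
    return [rng.randrange(len(MOVES)) for _ in range(length)]


rng = random.Random(1)
goal, actions = controller_episode(GRID, (1, 1), rng)
print(goal, actions, random_episode(rng, 5))
```

Controller trajectories are purposeful and end at their sampled goals, whereas random trajectories mostly jitter around the start, which is why the two datasets differ so much in quality.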
Task name |
---|
pen-demos-v0 |
pen-cloned-v0 |
pen-expert-v0 |
hammer-demos-v0 |
hammer-cloned-v0 |
hammer-expert-v0 |
door-demos-v0 |
door-cloned-v0 |
door-expert-v0 |
relocate-demos-v0 |
relocate-cloned-v0 |
relocate-expert-v0 |
The Adroit domain involves controlling a 24-DoF robotic hand. There are 4 tasks, taken from the hand_dapg repository. Clockwise from the top left, they are pen (aligning a pen with a target orientation), door (opening a door), relocate (moving a ball to a target position), and hammer (hammering a nail into a board).
There are 3 datasets for each environment.
- Demos uses the 25 human demonstrations provided in the DAPG repository.
- Cloned uses a 50-50 split between demonstration data and 2500 trajectories sampled from a behavior-cloned policy trained on the demonstrations. The demonstration trajectories are copied to match the number of behavior-cloned trajectories.
- Expert uses 5000 trajectories sampled from an expert that solves the task, provided in the DAPG repository.
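The "copied to match" step of the Cloned dataset can be sketched as tiling the demonstrations until their count equals the number of behavior-cloned trajectories. The helper name is hypothetical, and the exact copying scheme (tile, then truncate) is an assumption; with 25 demonstrations and 2500 cloned trajectories, each demonstration ends up repeated 100 times.

```python
def build_cloned_dataset(demo_trajs, bc_trajs):
    """Mix demonstrations and behavior-cloned rollouts 50-50 by trajectory count.

    Demonstrations are repeated (tiled) until they match the number of
    BC trajectories; tiling then truncating is an assumed copying scheme.
    """
    if not demo_trajs:
        raise ValueError("need at least one demonstration")
    n_bc = len(bc_trajs)
    repeats = -(-n_bc // len(demo_trajs))  # ceiling division
    copied_demos = (list(demo_trajs) * repeats)[:n_bc]
    return copied_demos + list(bc_trajs)


# Usage with placeholder trajectory labels standing in for real rollouts.
demos = [f"demo{i}" for i in range(25)]
bc = [f"bc{i}" for i in range(2500)]
dataset = build_cloned_dataset(demos, bc)
print(len(dataset))
```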
Task name |
---|
halfcheetah-random-v0 |
halfcheetah-medium-v0 |
halfcheetah-expert-v0 |
halfcheetah-mixed-v0 |
halfcheetah-medium-expert-v0 |
walker2d-random-v0 |
walker2d-medium-v0 |
walker2d-expert-v0 |
walker2d-mixed-v0 |
walker2d-medium-expert-v0 |
hopper-random-v0 |
hopper-medium-v0 |
hopper-expert-v0 |
hopper-mixed-v0 |
hopper-medium-expert-v0 |
Each of the three locomotion environments (halfcheetah, walker2d, and hopper) has 5 datasets:
- Random uses 1M samples from a randomly initialized policy.
- Expert uses 1M samples from a policy trained to completion with SAC.
- Medium uses 1M samples from a policy trained to approximately 1/3 the performance of the expert.
- Mixed uses the replay buffer of a policy trained up to the performance of the medium agent.
- Medium-Expert uses a 50-50 split of medium and expert data.
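A Medium-Expert style mix can be sketched as concatenating the two datasets field by field. The dict-of-lists layout and field names such as "observations" and "rewards" are assumptions for illustration; truncating the larger input guarantees an even split, and is a no-op when both sources hold the same number of samples (1M each, as described above).

```python
def medium_expert(medium, expert):
    """Build a 50-50 mix of two datasets stored as dicts of equal-length lists.

    Both inputs are assumed to share the same keys (e.g. "observations",
    "actions", "rewards", "terminals"). The larger dataset is truncated so
    each source contributes the same number of transitions.
    """
    if medium.keys() != expert.keys():
        raise ValueError("datasets must have the same fields")
    n = min(len(medium["rewards"]), len(expert["rewards"]))
    return {key: medium[key][:n] + expert[key][:n] for key in medium}


# Usage with tiny placeholder datasets.
med = {"observations": [[0.0]] * 3, "rewards": [1.0] * 3}
exp = {"observations": [[1.0]] * 5, "rewards": [10.0] * 5}
print(medium_expert(med, exp)["rewards"])
```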
Task name |
---|
flow-ring-random-v0 |
flow-ring-controller-v0 |
flow-merge-random-v0 |
flow-merge-controller-v0 |
The Flow environment involves controlling the acceleration of autonomous vehicles (1 in ring, up to 5 in merge) to maximize traffic flow. The two road configurations we include are shown above: a single-lane ring environment and a highway merge intersection. We also include two types of datasets: the "random" data consists of random accelerations commanded to the autonomous vehicles, and the "controller" data uses an intelligent driver model (IDM) to command accelerations.
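The two kinds of commanded accelerations can be sketched as follows. `idm_accel` implements the standard Intelligent Driver Model equations; the parameter defaults are common illustrative values, not necessarily those used to generate the datasets, and the acceleration bounds in `random_accel` are hypothetical.

```python
import math
import random


def idm_accel(v, v_lead, gap, v0=30.0, T=1.0, a=1.0, b=1.5, delta=4.0, s0=2.0):
    """Intelligent Driver Model acceleration in m/s^2.

    v, v_lead: own and leader speeds (m/s); gap: bumper-to-bumper distance (m, > 0).
    v0: desired speed, T: time headway, a: max acceleration,
    b: comfortable deceleration, delta: exponent, s0: minimum gap.
    """
    dv = v - v_lead  # closing speed; positive when approaching the leader
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a * b)))
    return a * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def random_accel(rng, max_accel=1.0, max_decel=1.0):
    """Random-style command: uniform in [-max_decel, max_accel] (bounds hypothetical)."""
    return rng.uniform(-max_decel, max_accel)


# Usage: free road from standstill vs. following closely at the same speed.
print(idm_accel(0.0, 0.0, 1e9), idm_accel(10.0, 10.0, 5.0))
```

On an open road the IDM accelerates toward its desired speed `v0`; when the gap falls below the desired gap `s_star`, the interaction term dominates and the vehicle brakes.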
Task name |
---|
kitchen-complete-v0 |
kitchen-partial-v0 |
kitchen-mixed-v0 |
The goal of the FrankaKitchen environment is to interact with the various objects in order to reach a desired state configuration. Possible interactions include moving the kettle, flipping the light switch, opening and closing the microwave and cabinet doors, and sliding the other cabinet door. The desired goal configuration for all 3 tasks is to complete 4 subtasks: open the microwave, move the kettle, flip the light switch, and slide open the cabinet door. Three datasets are included:
- The complete dataset includes demonstrations of all 4 target subtasks being completed, in order.
- The partial dataset includes other tasks being performed, but there are subtrajectories where the 4 target subtasks are completed in sequence.
- The mixed dataset contains various subtasks being performed, but the 4 target subtasks are never completed in sequence together.
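The distinction between the partial and mixed datasets can be sketched as a check on the ordered list of subtasks completed in a trajectory. The subtask labels and helper name are hypothetical, and reading "in sequence" as "back-to-back in the listed order" is an assumption.

```python
# Hypothetical labels for the 4 target subtasks, in the order listed above.
TARGET = ("microwave", "kettle", "light switch", "slide cabinet")


def contains_target_sequence(events, target=TARGET):
    """True if the target subtasks appear back-to-back, in order, in `events`."""
    n = len(target)
    return any(tuple(events[i:i + n]) == target for i in range(len(events) - n + 1))


# A partial-style trajectory contains the run; a mixed-style one breaks it up.
print(contains_target_sequence(["bottom burner", *TARGET]))
print(contains_target_sequence(["microwave", "kettle", "bottom burner", "light switch", "slide cabinet"]))
```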
Task name |
---|
carla-lane-v0 |
carla-town-v0 |
We include tasks based on two map layouts within the CARLA simulator. Observations are provided as a 6912-dimensional vector, which can be reshaped into an RGB image of shape (48, 48, 3).
- carla-lane-v0 is a lane-keeping task on a figure-8 road layout (CARLA Town04). The dataset is collected with a hand-coded lane-keeping controller that drives continuously while avoiding crashes with other vehicles.
- carla-town-v0 is a navigation task, where the vehicle must navigate to a target location. The dataset is collected with the same lane-keeping controller from carla-lane-v0, except that the vehicle makes random turns at intersections.
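Since 48 × 48 × 3 = 6912, each flat observation maps exactly onto one image. The sketch below assumes the vector is flattened in row-major, height-width-channel order (the same layout `numpy.reshape(obs, (48, 48, 3))` produces with its default C order); the helper name is hypothetical and only the standard library is used.

```python
HEIGHT, WIDTH, CHANNELS = 48, 48, 3


def to_image(obs):
    """Reshape a flat observation into nested [row][col][channel] lists.

    Assumes row-major (height, width, channel) flattening.
    """
    expected = HEIGHT * WIDTH * CHANNELS  # 6912
    if len(obs) != expected:
        raise ValueError(f"expected {expected} values, got {len(obs)}")
    return [
        [list(obs[(r * WIDTH + c) * CHANNELS:(r * WIDTH + c + 1) * CHANNELS]) for c in range(WIDTH)]
        for r in range(HEIGHT)
    ]


# Usage: pixel (row 0, col 1) holds the 4th-6th values of the flat vector.
image = to_image(list(range(6912)))
print(image[0][1])
```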
There is some additional setup required for CARLA, which can be found here