
Feature/vec env #239

Merged
12 commits merged into develop on Mar 24, 2021

Conversation

benblack769
Collaborator

Why this exists:

  • Adds support for Gym/Stable Baselines vector environments.
  • Adds support for the multiprocessing-based environment parallelization that comes with them.
  • Adds support for parameter sharing in PettingZoo environments, using the pettingzoo_env_to_vec_env_v0 vectorization trick (see the SuperSuit docs and tutorial).

What this changes:

  • Adds a VectorEnvironment abstract base class to ALL.
  • Adds a DuplicatedEnv, which simply duplicates an arbitrary ALL environment (with correct masking; see Support for Vector Envs #220 for context).
  • The .duplicate() method on environments now returns a DuplicatedEnv rather than a list of environments.
  • Adds a GymVectorEnvironment, which wraps a Gym or Stable Baselines vector environment to make an ALL vector environment.
  • ParallelEnvExperiment now accepts either a vector environment or a regular environment (duplicating it as necessary).

How this is tested:

  • The ParallelEnvExperiment test is left mostly unchanged to test backwards compatibility.
  • DuplicatedEnv has a number of unit tests similar to those for GymEnvironment.
  • GymVectorEnvironment is tested to return exactly the same results as DuplicatedEnv, up to the autoreset problem (Support for Vector Envs #220).

Additionally, I ran a manual test using the following script:

import gym
import supersuit
from all.environments import GymVectorEnvironment, GymEnvironment
from all.presets.classic_control import a2c
from all.experiments import ParallelEnvExperiment

def test_vec_env(vec_env):
    # Wrap any supported vector env in the new ALL GymVectorEnvironment.
    vec_env = GymVectorEnvironment(vec_env, "CartPole-v0")
    preset = a2c.env(vec_env).device("cpu").hyperparameters(n_envs=vec_env.num_envs).build()
    experiment = ParallelEnvExperiment(preset, vec_env)
    experiment.train(episodes=400)
    experiment.save()
    returns = experiment.test(episodes=5)
    print(returns)
    assert sum(returns) / 5 > 40

n_envs = 4
# Three flavors of vector environment, all constructed via SuperSuit.
gym_env = supersuit.gym_vec_env_v0(gym.make('CartPole-v0'), n_envs)
stable_baselines_env = supersuit.stable_baselines3_vec_env_v0(gym.make('CartPole-v0'), n_envs)
supersuit_env = supersuit.concat_vec_envs_v0(gym.make('CartPole-v0'), n_envs)

test_vec_env(gym_env)
test_vec_env(stable_baselines_env)
test_vec_env(supersuit_env)

And it passes; the vector environments seem to learn pretty well in 400 episodes.

@benblack769
Collaborator Author

Ok, I updated the test script to include the DuplicatedEnv and it doesn't train. Let me debug this a bit.

@benblack769
Collaborator Author

False alarm: my test just didn't use the environment correctly. Fixing the integration tests now.

@benblack769
Collaborator Author

@cpnota The integration tests are failing because the test agents are single Agents that expect a single observation and emit a single action.

The problem is that in the new parallel experiment, the environment is still a vector. In theory, you can just give dummy actions to the other environments, but this is (1) inefficient in general and (2) does not work with the PettingZoo vector env trick.

I assume there was a reason you did not want the test agents to be parallel agents, but beyond the above problem, there are potentially significant performance improvements from having a parallel agent during evaluation: it can take advantage of the multiprocessing environment, and of batch parallelization on a GPU.
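
The performance point can be illustrated with stand-ins; the `SingleAgent`/`ParallelAgent` classes below are toy sketches, not the ALL classes.

```python
# Toy sketch: why a parallel test agent can be faster. A single agent sees
# one observation per call, so evaluating a vector env means n separate
# calls; a parallel agent consumes the whole batch at once, which on a GPU
# becomes a single batched forward pass instead of a Python loop.

class SingleAgent:
    def act(self, observation):
        return observation * 2                 # one observation -> one action


class ParallelAgent:
    def act(self, observations):
        return [o * 2 for o in observations]   # whole batch in one call


observations = [1, 2, 3, 4]                    # one observation per sub-env

looped = [SingleAgent().act(o) for o in observations]  # n separate calls
batched = ParallelAgent().act(observations)            # one batched call
print(looped, batched)                                 # same actions
```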

@cpnota
Owner

cpnota commented Mar 23, 2021

I made the TestAgent for ParallelAgents a regular Agent (1) to make evaluation a little easier (there are some subtle issues, like needing to consider the first 100 episodes that are started, not the first 100 that are finished), but mainly (2) so that a non-parallel agent can be loaded once the agent is trained, for things like the watch script.

For this PR, I would try to figure out the minimal fix and go with that (e.g., find a way to extract a single env from the vector env). However, I think there is a use for both parallel and non-parallel test agents, so it could be nice to extend the ParallelPreset API:

from abc import abstractmethod

import torch


class ParallelPreset():
    """
    A Preset ParallelAgent factory.

    This is the ParallelAgent version of all.presets.Preset.
    This class allows the user to instantiate preconfigured ParallelAgents and test Agents.
    All Agents constructed by the ParallelPreset share a network model and parameters.
    However, other objects, such as ReplayBuffers, are independently created for each Agent.
    The ParallelPreset can be saved and loaded from disk.
    """

    def __init__(self, name, device, hyperparameters):
        self.name = name
        self.device = device
        self.hyperparameters = hyperparameters

    @abstractmethod
    def agent(self, writer=None, train_steps=float('inf')):
        """
        Instantiate a training-mode ParallelAgent with the existing model.

        Args:
            writer (all.logging.Writer, optional): A Writer object for logging training information.
            train_steps (int, optional): The number of steps for which the agent will be trained.

        Returns:
            all.agents.ParallelAgent: The instantiated Agent.
        """
        pass

    @abstractmethod
    def test_agent(self):
        """
        Instantiate a test-mode Agent with the existing model.

        Returns:
            all.agents.Agent: The instantiated test Agent.
        """
        pass

    @abstractmethod
    def test_parallel_agent(self):
        """
        Instantiate a test-mode ParallelAgent with the existing model.

        Returns:
            all.agents.ParallelAgent: The instantiated test ParallelAgent.
        """
        pass

    @property
    def n_envs(self):
        return self.hyperparameters['n_envs']

    def save(self, filename):
        """
        Save the preset and the contained model to disk.

        The preset can later be loaded using torch.load(filename), allowing
        a test mode agent to be instantiated for evaluation or other purposes.

        Args:
            filename (str): The path where the preset should be saved.
        """
        return torch.save(self, filename)

Thoughts?
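
If adopted, an experiment could then pick the appropriate test agent for its setting. The following is a hypothetical usage sketch: only the two method names come from the snippet above; `StubPreset` and the selection logic are made up.

```python
class StubPreset:
    """Minimal stand-in implementing the two proposed test-agent methods."""

    def test_agent(self):
        return "single test agent"       # e.g. for the watch script

    def test_parallel_agent(self):
        return "parallel test agent"     # e.g. for ParallelEnvExperiment


def make_test_agent(preset, vector_eval):
    # Batched evaluation when running inside a parallel experiment,
    # single-env evaluation otherwise.
    return preset.test_parallel_agent() if vector_eval else preset.test_agent()


print(make_test_agent(StubPreset(), vector_eval=True))
print(make_test_agent(StubPreset(), vector_eval=False))
```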

@benblack769
Collaborator Author

Yeah, that approach of making parallel test agents seems fairly simple and well conceived.

As you suggested, I just had the parallel env experiment evaluate the first environment in the vector environment, to keep this PR relatively simple and independent.
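
The minimal fix amounts to something like the sketch below. The `ToyVecEnv` and the dummy-action padding convention are assumptions for illustration; only the `reset()[0]` idea comes from this PR.

```python
class ToyVecEnv:
    """Trivial stand-in for a vector environment."""
    def __init__(self, n):
        self.num_envs = n
        self._states = [0] * n

    def reset(self):
        self._states = [0] * self.num_envs
        return list(self._states)

    def step(self, actions):
        self._states = [s + a for s, a in zip(self._states, actions)]
        return list(self._states)


def evaluate_first_env(vec_env, test_agent_act, steps):
    # Take the first state from the batched reset...
    state = vec_env.reset()[0]
    for _ in range(steps):
        action = test_agent_act(state)
        # ...and pad the remaining sub-envs with a dummy action.
        actions = [action] + [0] * (vec_env.num_envs - 1)
        state = vec_env.step(actions)[0]
    return state


result = evaluate_first_env(ToyVecEnv(3), lambda s: 1, steps=5)
print(result)  # 5: five +1 steps on sub-environment 0
```

As discussed above, stepping the other sub-environments with dummy actions is wasteful, which is why the parallel test agent is the better long-term fix.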

@cpnota (Owner) left a comment


Looks good! Good job on the tests, also. Just the one minor comment.

So the next steps on this are:

  1. Add the parallel test agents and make the parallel env experiment use them instead.
  2. Add multi-process support (I guess we technically have this now, in that you can pass in a Gym vector environment that uses multiprocessing, but it's not natively supported).

Is that correct?

My only other concern is that there seems to be tension between the terminology Parallel/Vector. Should it just be a VectorAgent? Renaming could be rolled into step 1 above.

env2.seed(42)
state1 = env1.reset()
state2 = env2.reset()
assert env1.name == env2.name
Owner


self.assertEqual() on all of these?

Collaborator Author


Fixed.
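
The fixed assertions presumably look something like this (a sketch; the env objects are trivial stand-ins for the real test fixtures):

```python
import unittest


class EnvEqualityTest(unittest.TestCase):
    """Sketch of the suggested change: self.assertEqual instead of bare
    asserts, so a failure reports both values."""

    def test_names_match(self):
        # Trivial stand-ins for the env1/env2 fixtures in the real test.
        env1 = type("Env", (), {"name": "CartPole-v0"})()
        env2 = type("Env", (), {"name": "CartPole-v0"})()
        self.assertEqual(env1.name, env2.name)


suite = unittest.TestLoader().loadTestsFromTestCase(EnvEqualityTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```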

def _fps(self, i):
    end_time = timer()
    return (self._frame - self._episode_start_frames[i]) / (end_time - self._episode_start_times[i])

first_state = self._env.reset()[0]
Owner


Will definitely have to come back to this soon, as discussed.

Collaborator Author


Yes, this is really hacky.

@benblack769
Collaborator Author

As for renaming Parallel to Vector: I do like the terminology vector agent, vector environment, vector experiment, etc., but I am a C++ person, so I am a bit biased. I know that physics people tend to dislike the vector terminology, since they think vectors should have linear algebra properties. I also don't really see a problem with just renaming the environment to ParallelEnvironment, GymParallelEnvironment, etc. I'm just not really picky about these things.

As for native multiprocessing, I would say that is a problem for much later, if at all. I don't think ALL needs to do everything natively.

@cpnota (Owner) left a comment


Thanks also for the detailed info in the PR!

@cpnota cpnota merged commit 5ee29ea into develop Mar 24, 2021
@cpnota cpnota mentioned this pull request Jun 12, 2021
@cpnota cpnota deleted the feature/vec_env branch April 12, 2022 21:39