
Feature/vec env #239

Merged
12 commits merged into develop on Mar 24, 2021

Conversation

benblack769
Collaborator

Why this exists:

  • Adds support for Gym/Stable Baselines vector environments.
  • Adds support for the multiprocessing-based environment parallelization that comes with them.
  • Adds support for parameter sharing in PettingZoo environments, using the pettingzoo_env_to_vec_env_v0 vectorization trick (see the SuperSuit docs and tutorial).

What this changes:

  • Adds a VectorEnvironment abstract base class to ALL.
  • Adds a DuplicatedEnv, which simply duplicates an arbitrary ALL environment (with correct masking; see Support for Vector Envs #220 for context).
  • The .duplicate() method on environments now returns a DuplicatedEnv rather than a list of environments.
  • Adds a GymVectorEnvironment, which wraps a Gym or Stable Baselines vector environment to make an ALL vector environment.
  • ParallelEnvExperiment now accepts either a vector environment or a regular environment (duplicating it as necessary).

How this is tested:

  • The ParallelEnvExperiment test is left mostly unchanged to test backwards compatibility.
  • DuplicatedEnv has a number of unit tests similar to those for GymEnvironment.
  • GymVectorEnvironment is tested to return exactly the same results as DuplicatedEnv, up to the autoreset problem (Support for Vector Envs #220).

Additionally, I ran a manual test using the following script:

import gym
import supersuit
from all.environments import GymVectorEnvironment, GymEnvironment
from all.presets.classic_control import a2c
from all.experiments import ParallelEnvExperiment

def test_vec_env(vec_env):
    # Wrap any supported vector env in the new ALL GymVectorEnvironment.
    vec_env = GymVectorEnvironment(vec_env, "CartPole-v0")
    preset = a2c.env(vec_env).device("cpu").hyperparameters(n_envs=vec_env.num_envs).build()
    experiment = ParallelEnvExperiment(preset, vec_env)
    experiment.train(episodes=400)
    experiment.save()
    returns = experiment.test(episodes=5)
    print(returns)
    assert sum(returns) / 5 > 40

n_envs = 4
# Three flavors of vector environment, all constructed via SuperSuit.
gym_env = supersuit.gym_vec_env_v0(gym.make('CartPole-v0'), n_envs)
stable_baselines_env = supersuit.stable_baselines3_vec_env_v0(gym.make('CartPole-v0'), n_envs)
supersuit_env = supersuit.concat_vec_envs_v0(gym.make('CartPole-v0'), n_envs)

test_vec_env(gym_env)
test_vec_env(stable_baselines_env)
test_vec_env(supersuit_env)

And it passes; the vector environments seem to learn pretty well in 400 episodes.

@benblack769
Collaborator Author

Ok, I updated the test script to include the DuplicatedEnv and it doesn't train. Let me debug this a bit.

@benblack769
Collaborator Author

False alarm: my test just didn't use the environment correctly. Fixing the integration tests now.

@benblack769
Collaborator Author

@cpnota The integration tests are failing because the test agents are single Agents that expect a single observation and emit a single action.

The problem is that in the new parallel experiment, the environment is still a vector. In theory, you can just give dummy actions to the other environments, but this is (1) inefficient in general and (2) does not work with the PettingZoo vector env trick.

I assume there was a reason you did not want the test agents to be parallel agents, but beyond the above problem, there are potentially significant performance improvements from having a parallel agent during evaluation: it can take advantage of the multiprocessing environment, and of batch parallelization on a GPU.
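
The performance point can be illustrated with stand-ins; the `SingleAgent`/`ParallelAgent` classes below are toy sketches, not the ALL classes.

```python
# Toy sketch: why a parallel test agent can be faster. A single agent sees
# one observation per call, so evaluating a vector env means n separate
# calls; a parallel agent consumes the whole batch at once, which on a GPU
# becomes a single batched forward pass instead of a Python loop.

class SingleAgent:
    def act(self, observation):
        return observation * 2                 # one observation -> one action


class ParallelAgent:
    def act(self, observations):
        return [o * 2 for o in observations]   # whole batch in one call


observations = [1, 2, 3, 4]                    # one observation per sub-env

looped = [SingleAgent().act(o) for o in observations]  # n separate calls
batched = ParallelAgent().act(observations)            # one batched call
print(looped, batched)                                 # same actions
```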

@cpnota
Owner

cpnota commented Mar 23, 2021

I made the TestAgent for ParallelAgents a regular Agent (1) to make evaluation a little easier (there are some subtle issues, like needing to consider the first 100 episodes that are started, not the first 100 that are finished), but mainly (2) so that a non-parallel agent can be loaded once the agent is trained, for things like the watch script.

For this PR, I would try to figure out the minimal fix and go with that (e.g., find a way to extract a single env from the vector env). However, I think there is a use for both parallel and non-parallel test agents, so it could be nice to extend the ParallelPreset API:

from abc import abstractmethod

import torch


class ParallelPreset():
    """
    A Preset ParallelAgent factory.

    This is the ParallelAgent version of all.presets.Preset.
    This class allows the user to instantiate preconfigured ParallelAgents and test Agents.
    All Agents constructed by the ParallelPreset share a network model and parameters.
    However, other objects, such as ReplayBuffers, are independently created for each Agent.
    The ParallelPreset can be saved and loaded from disk.
    """

    def __init__(self, name, device, hyperparameters):
        self.name = name
        self.device = device
        self.hyperparameters = hyperparameters

    @abstractmethod
    def agent(self, writer=None, train_steps=float('inf')):
        """
        Instantiate a training-mode ParallelAgent with the existing model.

        Args:
            writer (all.logging.Writer, optional): A Writer object for logging training information.
            train_steps (int, optional): The number of steps for which the agent will be trained.

        Returns:
            all.agents.ParallelAgent: The instantiated Agent.
        """
        pass

    @abstractmethod
    def test_agent(self):
        """
        Instantiate a test-mode Agent with the existing model.

        Returns:
            all.agents.Agent: The instantiated test Agent.
        """
        pass

    @abstractmethod
    def test_parallel_agent(self):
        """
        Instantiate a test-mode ParallelAgent with the existing model.

        Returns:
            all.agents.ParallelAgent: The instantiated test ParallelAgent.
        """
        pass

    @property
    def n_envs(self):
        return self.hyperparameters['n_envs']

    def save(self, filename):
        """
        Save the preset and the contained model to disk.

        The preset can later be loaded using torch.load(filename), allowing
        a test mode agent to be instantiated for evaluation or other purposes.

        Args:
            filename (str): The path where the preset should be saved.
        """
        return torch.save(self, filename)

Thoughts?
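
If adopted, an experiment could then pick the appropriate test agent for its setting. The following is a hypothetical usage sketch: only the two method names come from the snippet above; `StubPreset` and the selection logic are made up.

```python
class StubPreset:
    """Minimal stand-in implementing the two proposed test-agent methods."""

    def test_agent(self):
        return "single test agent"       # e.g. for the watch script

    def test_parallel_agent(self):
        return "parallel test agent"     # e.g. for ParallelEnvExperiment


def make_test_agent(preset, vector_eval):
    # Batched evaluation when running inside a parallel experiment,
    # single-env evaluation otherwise.
    return preset.test_parallel_agent() if vector_eval else preset.test_agent()


print(make_test_agent(StubPreset(), vector_eval=True))
print(make_test_agent(StubPreset(), vector_eval=False))
```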

@benblack769
Collaborator Author

Yeah, that approach of making parallel test agents seems fairly simple and well conceived.

As you suggested, I just had the parallel env experiment evaluate the first environment in the vector environment, to keep this PR relatively simple and independent.
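
The minimal fix amounts to something like the sketch below. The `ToyVecEnv` and the dummy-action padding convention are assumptions for illustration; only the `reset()[0]` idea comes from this PR.

```python
class ToyVecEnv:
    """Trivial stand-in for a vector environment."""
    def __init__(self, n):
        self.num_envs = n
        self._states = [0] * n

    def reset(self):
        self._states = [0] * self.num_envs
        return list(self._states)

    def step(self, actions):
        self._states = [s + a for s, a in zip(self._states, actions)]
        return list(self._states)


def evaluate_first_env(vec_env, test_agent_act, steps):
    # Take the first state from the batched reset...
    state = vec_env.reset()[0]
    for _ in range(steps):
        action = test_agent_act(state)
        # ...and pad the remaining sub-envs with a dummy action.
        actions = [action] + [0] * (vec_env.num_envs - 1)
        state = vec_env.step(actions)[0]
    return state


result = evaluate_first_env(ToyVecEnv(3), lambda s: 1, steps=5)
print(result)  # 5: five +1 steps on sub-environment 0
```

As discussed above, stepping the other sub-environments with dummy actions is wasteful, which is why the parallel test agent is the better long-term fix.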

@cpnota (Owner) left a comment


Looks good! Good job on the tests, also. Just the one minor comment.

So the next steps on this are:

  1. Add the parallel test agents and make the parallel env experiment use them instead.
  2. Add multi-process support (I guess we technically have this now, in that you can pass in a Gym vector environment that uses multiprocessing, but it's not natively supported).

Is that correct?

My only other concern is that there seems to be tension between the terminology Parallel/Vector. Should it just be a VectorAgent? Renaming could be rolled into step 1 above.

env2.seed(42)
state1 = env1.reset()
state2 = env2.reset()
assert env1.name == env2.name
Owner


self.assertEqual() on all of these?

Collaborator Author


Fixed.
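
The fixed assertions presumably look something like this (a sketch; the env objects are trivial stand-ins for the real test fixtures):

```python
import unittest


class EnvEqualityTest(unittest.TestCase):
    """Sketch of the suggested change: self.assertEqual instead of bare
    asserts, so a failure reports both values."""

    def test_names_match(self):
        # Trivial stand-ins for the env1/env2 fixtures in the real test.
        env1 = type("Env", (), {"name": "CartPole-v0"})()
        env2 = type("Env", (), {"name": "CartPole-v0"})()
        self.assertEqual(env1.name, env2.name)


suite = unittest.TestLoader().loadTestsFromTestCase(EnvEqualityTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```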

def _fps(self, i):
    end_time = timer()
    return (self._frame - self._episode_start_frames[i]) / (end_time - self._episode_start_times[i])

first_state = self._env.reset()[0]
Owner


Will definitely have to come back to this soon, as discussed.

Collaborator Author


Yes, this is really hacky.

@benblack769
Collaborator Author

As for renaming Parallel to Vector: I do like the terminology vector agent, vector environment, vector experiment, etc., but I am a C++ person, so I am a bit biased. I know that physics people tend to dislike the vector terminology, since they think vectors should have linear algebra properties. I also don't really see a problem with just renaming the environment to ParallelEnvironment, GymParallelEnvironment, etc. I'm just not really picky about these things.

As for native multiprocessing, I would say that is a problem for much later, if at all. I don't think ALL needs to do everything natively.

@cpnota (Owner) left a comment


Thanks also for the detailed info in the PR!

@cpnota cpnota merged commit 5ee29ea into develop Mar 24, 2021
@cpnota cpnota mentioned this pull request Jun 12, 2021
@cpnota cpnota deleted the feature/vec_env branch April 12, 2022 21:39