Feature/vec env #239
Conversation
Ok, I updated the test script to include the `DuplicatedEnv` and it doesn't train. Let me debug this a bit.
False alarm: my test just didn't use the environment correctly. Fixing the integration tests now.
@cpnota The reason the integration tests are failing is that the test agents are single Agents that expect a single observation and emit a single action. The problem is that in the new parallel experiment, the environment is still a vector. In theory, you can just give a dummy action for the other environments, but this is (1) inefficient in general, and (2) does not work with the pettingzoo vector env trick. I assume there was a reason you did not want the test agents to be parallel agents, but in addition to the above problem, there are potentially significant performance improvements to having a parallel agent during evaluation: it can take advantage of the multiprocessing environment and of batch parallelization on a GPU.
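The dummy-action workaround described above can be sketched as follows. This is a toy illustration, not the actual ALL classes: `VectorEnv` and `single_agent_step` are hypothetical stand-ins.

```python
class VectorEnv:
    """Toy stand-in for a vector environment over n copies of a counter env."""
    def __init__(self, n):
        self.n = n
        self.states = [0] * n

    def reset(self):
        self.states = [0] * self.n
        return list(self.states)

    def step(self, actions):
        # Each toy sub-environment just accumulates its action.
        assert len(actions) == self.n
        self.states = [s + a for s, a in zip(self.states, actions)]
        return list(self.states)


def single_agent_step(env, agent_action, noop=0):
    """Step a vector env with a single agent's action in slot 0,
    feeding dummy (no-op) actions to the remaining sub-environments."""
    actions = [agent_action] + [noop] * (env.n - 1)
    return env.step(actions)[0]  # only the first observation matters


env = VectorEnv(4)
env.reset()
obs = single_agent_step(env, agent_action=3)
# obs == 3; the other three sub-environments only received no-ops
```

The inefficiency is visible here: every step still advances all `n` sub-environments even though only the first one is being evaluated.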
For this PR, I would try to figure out the minimal fix and go with that (e.g., find a way to extract a single env from the vector env). However, I think there is a use for both parallel and non-parallel test agents, so it could be nice to extend the class:

```python
from abc import abstractmethod

import torch


class ParallelPreset():
    """
    A Preset ParallelAgent factory.

    This is the ParallelAgent version of all.presets.Preset.
    This class allows the user to instantiate preconfigured ParallelAgents and test Agents.
    All Agents constructed by the ParallelPreset share a network model and parameters.
    However, other objects, such as ReplayBuffers, are independently created for each Agent.
    The ParallelPreset can be saved and loaded from disk.
    """

    def __init__(self, name, device, hyperparameters):
        self.name = name
        self.device = device
        self.hyperparameters = hyperparameters

    @abstractmethod
    def agent(self, writer=None, train_steps=float('inf')):
        """
        Instantiate a training-mode ParallelAgent with the existing model.

        Args:
            writer (all.logging.Writer, optional): A Writer object used for logging.
            train_steps (int, optional): The number of steps for which the agent will be trained.

        Returns:
            all.agents.ParallelAgent: The instantiated Agent.
        """
        pass

    @abstractmethod
    def test_agent(self):
        """
        Instantiate a test-mode Agent with the existing model.

        Returns:
            all.agents.Agent: The instantiated test Agent.
        """
        pass

    @abstractmethod
    def test_parallel_agent(self):
        """
        Instantiate a test-mode ParallelAgent with the existing model.

        Returns:
            all.agents.ParallelAgent: The instantiated test ParallelAgent.
        """
        pass

    @property
    def n_envs(self):
        return self.hyperparameters['n_envs']

    def save(self, filename):
        """
        Save the preset and the contained model to disk.

        The preset can later be loaded using torch.load(filename), allowing
        a test mode agent to be instantiated for evaluation or other purposes.

        Args:
            filename (str): The path where the preset should be saved.
        """
        return torch.save(self, filename)
```

Thoughts?
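To make the proposed interface concrete, here is a minimal sketch of a subclass. Everything below is illustrative: `ParallelPresetSketch` is a trimmed-down stand-in for the interface above (no torch dependency), and the placeholder return values stand in for real ALL agents.

```python
from abc import ABC, abstractmethod


class ParallelPresetSketch(ABC):
    """Trimmed-down stand-in for the ParallelPreset interface above."""

    def __init__(self, name, device, hyperparameters):
        self.name = name
        self.device = device
        self.hyperparameters = hyperparameters

    @abstractmethod
    def agent(self, writer=None, train_steps=float('inf')): ...

    @abstractmethod
    def test_agent(self): ...

    @abstractmethod
    def test_parallel_agent(self): ...

    @property
    def n_envs(self):
        return self.hyperparameters['n_envs']


class DummyParallelPreset(ParallelPresetSketch):
    """Concrete preset returning placeholder agents."""

    def agent(self, writer=None, train_steps=float('inf')):
        return ('train_agent', self.n_envs)

    def test_agent(self):
        # Single-observation agent, for evaluating one environment.
        return 'single_test_agent'

    def test_parallel_agent(self):
        # Acts on all n_envs observations at once, so evaluation can
        # use the whole vector environment.
        return ('parallel_test_agent', self.n_envs)


preset = DummyParallelPreset('dummy', 'cpu', {'n_envs': 4})
```

The key design point is that both `test_agent` and `test_parallel_agent` share the same underlying model; the caller picks whichever matches the evaluation environment.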
Yeah, that approach of making parallel test agents seems fairly simple and well conceived. Like you suggested, I just had the parallel env experiment evaluate the first environment in the vector environment to keep this PR relatively simple and independent.
Looks good! Good job on the tests, also. Just the one minor comment.
So the next steps on this are:
1. Add the parallel test agents and make the parallel env experiment use them instead.
2. Add multi-process support (I guess we technically have this now, in that you can pass in a Gym vector environment that uses multiprocessing, but it's not natively supported).

Is that correct?
My only other concern is that there seems to be tension between the Parallel/Vector terminology. Should it just be a VectorAgent? Renaming could be rolled into 1. above.
`all/environments/vector_env_test.py` (Outdated)

```python
env2.seed(42)
state1 = env1.reset()
state2 = env2.reset()
assert env1.name == env2.name
```
`self.assertEqual()` on all of these?
Fixed.
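For reference, the suggested change replaces bare `assert` statements with unittest assertion methods, which report both values on failure. The stub classes and names below are illustrative, not the actual test file:

```python
import unittest


class FakeEnv:
    """Stand-in for an ALL environment exposing a name attribute."""
    def __init__(self, name):
        self.name = name


class VectorEnvNameTest(unittest.TestCase):
    def test_names_match(self):
        env1 = FakeEnv('CartPole-v0')
        env2 = FakeEnv('CartPole-v0')
        # Instead of: assert env1.name == env2.name
        self.assertEqual(env1.name, env2.name)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(VectorEnvNameTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

On a mismatch, `assertEqual` prints something like `'CartPole-v0' != 'CartPole-v1'`, whereas a bare `assert` gives only an AssertionError with no values.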
```python
def _fps(self, i):
    end_time = timer()
    return (self._frame - self._episode_start_frames[i]) / (end_time - self._episode_start_times[i])
```

```python
first_state = self._env.reset()[0]
```
Will definitely have to come back to this soon, as discussed.
Yes, this is really hacky.
As for the idea of renaming parallel to Vector. I do like the terminology vector agent and vector environment, vector experiment, etc, but I am a C++ person so I am a bit biased. I know that physics people tend to hate the vector terminology, as they think that vectors should have linear algebra properties. I also don't really see a problem with just renaming the environment to a ParallelEnvironment, GymParallelEnvironment, etc. I'm just not really picky about these things. As for the idea of native multiprocessing, I would say that is a problem for much later, if at all. I guess I don't think that ALL needs to do everything natively. |
Thanks also for the detailed info in the PR!
Why this exists:
- The `pettingzoo_env_to_vec_env_v0` vectorization trick (see the supersuit docs and tutorial).

What this changes:
- Adds a `VectorEnvironment` abstract base class.
- Adds a `DuplicatedEnv`, which just duplicates an arbitrary ALL environment (with correct masking; see this issue for context: Support for Vector Envs #220).
- Parallel experiments use the `DuplicatedEnv` rather than a list of environments.
- Adds a `GymVectorEnvironment`, which wraps a gym or stable baselines vector environment to make an ALL vector environment.

How this is tested:
- The `ParallelEnvExperiment` test is left mostly unchanged to test backwards compatibility.
- The `DuplicatedEnv` has a number of unit tests similar to the `GymEnvironment`.
- The `GymVectorEnvironment` is tested to return the same exact thing as the `DuplicatedEnv`, up to the autoreset problem (Support for Vector Envs #220).

Additionally, I ran a manual test using the following script:

And it passes; the vector environments seem to learn pretty well in 400 episodes.
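As context for the duplication idea above, here is a minimal pure-Python sketch of turning one environment into a batched, vector-like interface. This is an illustration only, not the actual `DuplicatedEnv`, which also handles masking and the autoreset behavior discussed in #220; `TinyEnv` and `DuplicatedEnvSketch` are hypothetical names.

```python
import copy


class TinyEnv:
    """Toy single environment: state counts steps; episode ends at 3."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += 1
        done = self.state >= 3
        return self.state, done


class DuplicatedEnvSketch:
    """Deep-copy one env n times and expose a batched interface."""
    def __init__(self, env, n):
        self._envs = [copy.deepcopy(env) for _ in range(n)]
        self.n = n

    def reset(self):
        return [e.reset() for e in self._envs]

    def step(self, actions):
        results = [e.step(a) for e, a in zip(self._envs, actions)]
        states = [s for s, _ in results]
        dones = [d for _, d in results]
        return states, dones


venv = DuplicatedEnvSketch(TinyEnv(), 4)
states = venv.reset()                    # [0, 0, 0, 0]
states, dones = venv.step([0, 0, 0, 0])  # [1, 1, 1, 1], no env done yet
```

Deep-copying rather than keeping a list of caller-supplied environments is what lets the experiment treat the batch as a single environment object.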