Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Vector Envs #220

Closed
cpnota opened this issue Jan 22, 2021 · 4 comments
Closed

Support for Vector Envs #220

cpnota opened this issue Jan 22, 2021 · 4 comments
Assignees

Comments

@cpnota
Copy link
Owner

cpnota commented Jan 22, 2021

No description provided.

@benblack769
Copy link
Collaborator

@cpnota So I worked on this quite a bit, but after almost finishing I figured out that the Gym vector API is completely incompatible with ALL.

The problem is what happens to the observation during reset.

ALL's parallel experiment (and any sane system) expects this:

O1, r1, d1, i1
O2, r2, d2, i2
...
Ot, rt, dt, it
O1, r1, d1, i1

Unfortunately, the gym vector API is not sane, so it does this:

O1, r1, d1, i1
O2, r2, d2, i2
...
O1, rt, dt, it
O2, r2, d2, i2

Since the environments are interweaved and reset at different times (they autoreset when done), this does not work with ALL's notion of a mask on Ot.

There are a few options:

  1. Ignore the problem. The agent will not train on the first action, only the 2nd and later actions.
  2. I have my own implementations of vector environments in SuperSuit. So in theory, I can just make them compatible with ALL via an argument. But this means that ALL will not have perfect interoperability with gym and stable baseline's vector environments. Not clear that this is a big problem, considering how bad interoperability is between stable baselines and Gym's vector environments right now.

Thoughts?

@cpnota
Copy link
Owner Author

cpnota commented Mar 22, 2021

Oh I see, instead of giving a terminal "observation", it provides the final reward etc. during the first timestep of the previous episode. In their defense, I see why they did it that way (to avoid masking). But that is sort of frustrating.

Are 1. and 2. incompatible? For most of the current environments, I'm guessing that it won't make really any difference, but it could be nice to have the option. I would probably go with the minimum viable fix for now.

@benblack769
Copy link
Collaborator

No, the two options are perfectly compatible. I realized that after sending the last message.

@cpnota
Copy link
Owner Author

cpnota commented Jul 2, 2021

Closed by #240

@cpnota cpnota closed this as completed Jul 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants