Support for Vector Envs #220

cpnota · 2021-01-22T21:25:16Z

No description provided.

benblack769 · 2021-03-22T22:04:14Z

@cpnota So I worked on this quite a bit, but after almost finishing I figured out that the Gym vector API is completely incompatible with ALL.

The problem is what happens to the observation during reset.

ALL's parallel experiment (and any sane system) expects this:

O1, r1, d1, i1
O2, r2, d2, i2
...
Ot, rt, dt, it
O1, r1, d1, i1

Unfortunately, the gym vector API is not sane, so it does this:

O1, r1, d1, i1
O2, r2, d2, i2
...
O1, rt, dt, it
O2, r2, d2, i2

Since the environments are interleaved and reset at different times (they autoreset when done), this does not work with ALL's notion of a mask on Ot.
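To make the difference concrete, here is a minimal sketch in plain Python. It does not use Gym or ALL: `CountingEnv`, `explicit_terminal_stream`, and `autoreset_stream` are hypothetical names made up for illustration, and the info dict is omitted. The toy environment's observation is just the timestep index, so the rows can be read as O1, O2, ..., Ot.

```python
class CountingEnv:
    """Toy environment (hypothetical): the observation is the timestep index."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 1
        return self.t

    def step(self):
        self.t += 1
        done = self.t == self.horizon
        reward = 1.0 if done else 0.0
        return self.t, reward, done


def explicit_terminal_stream(env, n_rows):
    """The terminal observation Ot is emitted with done=True; O1 follows on the next row."""
    rows = [(env.reset(), 0.0, False)]
    while len(rows) < n_rows:
        if rows[-1][2]:
            # The previous row was terminal, so start a fresh episode.
            rows.append((env.reset(), 0.0, False))
        else:
            rows.append(env.step())
    return rows


def autoreset_stream(env, n_rows):
    """On done, Ot is replaced by the next episode's O1, but the reward and done flag are kept."""
    rows = [(env.reset(), 0.0, False)]
    while len(rows) < n_rows:
        obs, reward, done = env.step()
        if done:
            obs = env.reset()  # the terminal observation is discarded
        rows.append((obs, reward, done))
    return rows


if __name__ == "__main__":
    for row in explicit_terminal_stream(CountingEnv(3), 5):
        print("explicit ", row)
    for row in autoreset_stream(CountingEnv(3), 5):
        print("autoreset", row)
```

In the autoreset stream the row flagged `done=True` carries the observation 1 rather than 3, which is why a terminal mask applied to that row ends up masking the first observation of the new episode.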

There are a few options:

1. Ignore the problem. The agent will not train on the first action, only the 2nd and later actions.
2. I have my own implementations of vector environments in SuperSuit. So in theory, I can just make them compatible with ALL via an argument. But this means that ALL will not have perfect interoperability with Gym's and Stable Baselines' vector environments. It's not clear that this is a big problem, considering how bad interoperability is between Stable Baselines' and Gym's vector environments right now.
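One way to read the first option, sketched in plain Python under assumptions: if the row flagged done is treated as terminal even though its observation is really the next episode's O1, then no transition is built starting from that row, so the first action of each later episode never appears in training data. The function name `transitions_dropping_first_action` and the `(obs, reward, done)` row layout are hypothetical and are not ALL's or Gym's API.

```python
def transitions_dropping_first_action(rows):
    """Build (obs, reward, next_obs, done) tuples from an autoreset-style stream.

    rows: list of (obs, reward, done) for a single environment, where a
    done=True row already holds the next episode's first observation.
    """
    transitions = []
    for prev, nxt in zip(rows, rows[1:]):
        prev_obs, _, prev_done = prev
        next_obs, reward, done = nxt
        if prev_done:
            # prev is flagged terminal, so nothing is learned from the action
            # taken on it, even though prev_obs is really a fresh O1.
            continue
        transitions.append((prev_obs, reward, next_obs, done))
    return transitions


# Autoreset-style stream for a 3-step toy episode (observation = timestep index).
stream = [(1, 0.0, False), (2, 0.0, False), (1, 1.0, True), (2, 0.0, False), (1, 1.0, True)]
result = transitions_dropping_first_action(stream)
```

In this sketch, `next_obs` on a done transition is the wrong observation, but a terminal mask means it would not be used for bootstrapping. The second episode contributes only its transition from observation 2 onward.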

Thoughts?

cpnota · 2021-03-22T22:50:35Z

Oh I see, instead of giving a terminal "observation", it provides the final reward etc. during the first timestep of the next episode. In their defense, I see why they did it that way (to avoid masking). But that is sort of frustrating.

Are 1. and 2. incompatible? For most of the current environments, I'm guessing that it won't really make any difference, but it could be nice to have the option. I would probably go with the minimum viable fix for now.

benblack769 · 2021-03-23T13:52:07Z

No, the two options are perfectly compatible. I realized that after sending the last message.

cpnota · 2021-07-02T15:47:02Z

Closed by #240

cpnota assigned cpnota and benblack769 Jan 22, 2021

benblack769 mentioned this issue Mar 23, 2021

Feature/vec env #239

Merged

cpnota mentioned this issue Mar 30, 2021

Feature/parallel test agent #240

Merged

cpnota closed this as completed Jul 2, 2021
