Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty objects should be data, Far should be marked as such #2018

Closed
michaelfig opened this issue Nov 13, 2020 · 29 comments · Fixed by #3525
Closed

Empty objects should be data, Far should be marked as such #2018

michaelfig opened this issue Nov 13, 2020 · 29 comments · Fixed by #3525
Assignees
Labels
bug Something isn't working marshal package: marshal

Comments

@michaelfig
Copy link
Member

michaelfig commented Nov 13, 2020

Describe the bug
The current specified behaviour of @agoric/marshal is to treat empty objects as Remotable. This is handy for creating object handles which are === to themselves when returned to the vat that created them. However, this convenience comes at a cost of inadvertently creating Presences when only an empty data object is intended.

To Reproduce
Steps to reproduce the behavior:

  1. Send an empty object ({}) as an argument to a method defined in another vat (via @agoric/marshal)
  2. console.log out the received argument.
  3. Note that it displays as a Presence, not just an empty object.

Expected behavior
It would be clearer to have empty data objects treated as pass-by-copy.

Platform Environment

  • what OS are you using? what version of Node.js? n/a
  • is there anything special/unusual about your platform? n/a
  • what version of the Agoric-SDK are you using? latest master as of 2020-11-13

Additional context
See #804 for some discussion

As a migration step, requiring people to explicitly mark empty objects as either Remotable or Data would be useful to detect up-front when they haven't considered what the object is intended to be. During this intermediate step, the Data marking could be applied automatically to empty objects that are received as pass-by-copy from another vat to prevent false positives (only empty objects constructed from within the vat would need explicit marking).

Data can be made optional in a later step when the default for unmarked empty objects is pass-by-copy.

@erights
Copy link
Member

erights commented Jan 15, 2021

We thought we could get by treating empty objects as remotables even though they sometimes just represent empty data objects. We missed that there is no semantics for sameStructure that works for both interpretations of an empty object.

const x1 = harden({ foo: 1 });
const y1 = harden({ foo: 1 });
safeStructure(x0, y0); // true

but

const x0 = harden({});
const y0 = harden({});
safeStructure(x0, y0); // false

So we need to make progress on #804 urgently in order to fix this one.
Some progress at #2178

@warner
Copy link
Member

warner commented Jan 21, 2021

In today's kernel meeting, we settled on a transition plan:

  • Introduce a function, probably named Data(), which records the intention to be pass-by-copy. This lives next to the existing Far() and Remotable() functions.
  • Change marshal to warn when it serializes an empty object that is not marked as either Remotable/Far or Data.
  • Using the warning as a guide, locate and mark all empty objects as either Far or Data, until the warnings stop.
  • Change marshal to throw an error when asked to serialize an unmarked empty object.
  • Wait a while, make sure we've caught all the cases.
  • Change marshal to treat empty objects as pass-by-copy data, even without the mark.
  • Remove the Data() markers.

The end/goal state is that pass-by-data is the default (I think we had agreement on this), a Data marker is neither needed nor available (I think we mostly had agreement on this), and that all pass-by-reference objects should be marked with Far/Remotable (I'm not sure we had agreement on this).

We certainly want unmarked empty objects to be pass-by-copy, because e.g. @FUDCo 's option-bag use case is best handled that way.

Our current behavior is that unmarked empty objects are pass-by-reference. In one sense this is ok, because they'll arrive as Presences, which look mostly like empty objects, so any code which e.g. enumerates over their properties, or splats them into another object (x = { a: 1, ...remoteRecordWhichHappensToBeAPresence }) will accidentally do the right thing. But sameStructure treats Presences as distinct, so comparing two records that contain empty records won't work correctly. @dtribble noted that since pass-by-reference empty objects lack type information (because the only way to attach type information is with Far/Remotable), we could work around this temporarily by changing sameStructure to treat untyped Presences as equal. But I think we decided that we should execute the plan above instead of using this workaround.

@dtribble dtribble assigned warner and unassigned erights and michaelfig Jan 22, 2021
@warner
Copy link
Member

warner commented Jan 22, 2021

@erights mentioned that he's happy with this plan

@michaelfig
Copy link
Member Author

For future reference, I think it will be simpler just to mark the actual empty-object literals (the exact {} occurance, and possibly the return value from Object.fromEntries(...) or Object.create(...)), but leave them mutable.

That's both more systematic and more tolerant than trying to determine when the object is "ready for delivery", especially since we are currently auto-hardening arguments to eventual sends.

@erights
Copy link
Member

erights commented Jan 22, 2021

As an interim step, sure. As a place to land, that would make me very sad.

@erights
Copy link
Member

erights commented Jan 22, 2021

Likewise, having our lower level harden objects without even a warning is masking bugs. They should at least warn, and eventually reject unhardened arguments.

@warner
Copy link
Member

warner commented Jan 24, 2021

work is happening in /~https://github.com/Agoric/agoric-sdk/tree/2018-empty-objects

I count 216 warnings in the swingset tests (probably far fewer call sites, each run multiple times).

@warner
Copy link
Member

warner commented Jan 24, 2021

I fixed all the occurrences in the swingset tests. I'm now looking at the rest of the tree, currently there are 768 warnings.

@warner warner added the marshal package: marshal label Jan 25, 2021
@warner
Copy link
Member

warner commented Jan 26, 2021

I ran into an old problem with the definition of Data: I followed the existing pattern used by Far/Remotable (a WeakMap/WeakSet defined at the top level of marshal.js):

const registeredData = new WeakSet();

function isPassByCopyRecord(val) {
  ...
  const descEntries = Object.entries(Object.getOwnPropertyDescriptors(val));
  if (descEntries.length === 0) {
    // empty non-array objects must be registered with Data or Far/Remotable
    if (!remotableToInterface.has(val) && !registeredData.has(val)) {
      console.log(`--- @@marshal: empty object without Data/Far/Remotable`);
    }
  ...
}

export function Data(obj) {
  harden(obj);
  registeredData.add(obj);
  return obj;
};

but, our kernel bundle (which includes the copy of liveslots.js which is used by all local vats) includes one copy of marshal.js, while each vat bundle includes its own (separate) copy of marshal.js, and they don't share the same WeakSet table. So if vats use import { Data } from '@agoric/marshal.js' to get Data, their markers won't be recognized by liveslots when it goes to serialize the arguments. This is a form of the "identity discontinuity" problem that made using multiple Realms difficult.

We've seen this problem before, when we first introduced Remotable. The question is how to share a singleton with vat code. The wrinkle is that we have code defined in libraries that want to run both inside a vat (for regular operation) and outside of a vat (for unit tests). There are basically three answers:

  • 1: import it from a module. Given our current bundleSource/rollup/importBundle approach, this has the identity-discontinuity problem, which could be addressed by:
    • tell rollup that the module in question is an "external", so rollup will leave a require() in place, and then provide a special require global to vats, which is wired to a fake module that returns the singleton
    • add the singleton as a global, change the module to only define the singleton if it is "first on the scene", deferring to the global if it already exists
  • 2: fetch it from a global
  • 3: pass it as an argument (the vatPowers to buildRootObject), threaded through the entire call stack down to the code that needs it

Importing from a module is the most ergonomic answer. Fetching it from a global is easy to code against, but unit tests would need some externally-supplied shim to inject the global first. Passing it as as argument is really painful, because when buildRootObject calls A, A calls B, B calls C, and C wants to use Data, all of A/B/C must include Data as an argument, which bloats the function signatures and is awfully hard to maintain.

We currently manage harden() as a global, such that all code using harden is declared to require a "SES environment". Unit tests achieve this by importing @agoric/install-ses before any other imports, while vats bundles are evaluated in a SES Compartment which has a harden global as the SES spec defines.

The Remotable function is currently managed as an argument, passed into vatPowers. It is used in captp, but no other user-level code like ERTP or Zoe.

The recently-added Far function (which is implemented in terms of Remotable) is managed as an imported module, however this suffers from the identity-discontinutity problem, so I don't think it's actually functional in real vats right now (cc @erights).

Marking with a Symbol instead of registering in a WeakMap

A fourth approach would be to mark objects as Data or Remotable by adding a non-enumerable property whose "name" is a registered Symbol. For Data, the property's value could just be true, while for Remotable it could be the interface name.

This would address the identity discontinuity problem because any two acts of registering the same name will yield Symbols that share an object identity: Symbol.for('@agoric/marshal:Data') === Symbol.for('@agoric/marshal:Data'), no matter what bundles or acts of evaluation contain the two sides. The left side can be the copy of marshal.js that is imported by liveslots in the kernel bundle, while the right side could be the other copy of marshal.js that is imported into a vat bundle.

(Unregistered Symbols, in contrast, are unforgeable: Symbol('data') !== Symbol('data'), and the string is merely a diagnostic description, rather than an identity. Using an unregistered Symbol would cause the same problem that the unforgeable WeakMap/WeakSet has)

(Yet another option that @FUDCo pointed out would be to make the WeakMap a global. This wouldn't cause any direct authority problems, because WeakMaps are unenumerable, but could still provide an illicit communication channel because widely-held objects like Object and Number could be added/removed by otherwise mutually-incommunicado code)

By making the marker property be unenumerable, casual users of the object won't be able to tell that it is marked (they would have to use Object.getOwnProperties or similar to see the symbol property). Also, because our serialization layer uses JSON.stringify, the serialized form will ignore the marker property too, so remote users won't be able to tell either (even if they do look deeper).

Note that this makes Data and Far/Remotable forgeable in a way they weren't previously: not only could code use Object.getOwnPropertySymbols to extract the Data symbol and attach it to a different object, but it can also just synthesize a new copy of the symbol itself, with Symbol.for. But we don't need these markers to be unforgeable, so this isn't an issue.

One other difference is that, since we're adding a property, we cannot apply Data to an object that is already hardened. As @erights pointed out, this is actually a feature: if I create an object as pass-by-reference, you should not be able to cause my object to be treated as pass-by-copy. I can prevent this by merely hardening the object ahead of time, which I ought to be doing anyways. Most code will apply the Data or Far marker immediately surrounding the object literal:

const empty = Data({});
const handle = Far('interface name', {});

@FUDCo 's "cumulative option bag" use-case defers applying the marker until after the bag is full, but before the object is revealed outside the construction function:

const options = {};
for (let opt of argv) {
  ...
  options[name] = value;
}
return Data(options);

So my new plan is to have marshal.js contain something like:

const DataSymbol = Symbol.for('@agoric/marshal:Data');
function Data(obj) {
  Object.defineProperty(obj, DataSymbol, { enumerable: false, value: true });
  return harden(obj);
}

and change the passStyleOf code to use obj[DataSymbol] (or maybe Object.getOwnPropertyDescriptor(obj, DataSymbol).value) to sense it.

I think I need to change Remotable and Far to do the same thing (using the interface name instead of true as the property's value), because I think Far is currently broken.

@michaelfig I wanted to check this plan with you, I know you and @erights circled around this issue a lot, and settled on the current vatPowers.Remotable approach with good reason. If I'm missing something, please let me know.

@warner
Copy link
Member

warner commented Jan 27, 2021

It looks like I need to change the way isPassByCopyRecord works, to ignore non-enumerable properties (it currently looks at properties with Object.getOwnPropertyDescriptors, which includes the non-enumerable ones), and to tolerate Symbol-named properties (it has code which checks for these, but I think it's broken/bypassed, because it uses Object.entries around the descriptors, which ignores Symbols, but then the actual marshalling code uses Reflect.ownKeys().sort(), which fails because Symbols can't be auto-stringified, so they can't be sorted).

@erights
Copy link
Member

erights commented Jan 27, 2021

to ignore non-enumerable properties

Symbol-named properties

No, that's not at all what I have in mind. We need to fix the bug you identified in isPassByCopyRecord so it is less tolerant. We should tolerate only the symbol properties we're introducing to communicate with marshal -- marking something remotable or data. Any other symbols, and any other non-enumerable properties, must cause rejection. These special properties should not be serialized as properties. Rather, their only affect on serialization is how the object is classified, and thus how or whether the other properties are serialized.

@warner
Copy link
Member

warner commented Jan 27, 2021

Got it. Out of curiosity, what's the rationale for rejecting (rather than ignoring) those other properties? Is it to ensure that the round-tripped object is the same as the original (specifically that Object.getOwnPropertyDescriptors() of the copy is deep-equal to that of the original)?

That reminds me, I think we need to also change the deserializer to add the same Data marking as the original. We want to make sure that an empty (Data) object can be passed into a method as an argument, and then passed back out again (as the return value, or in an argument of an outbound message), without the code needing to add its own Data(obj) in the middle.

@erights
Copy link
Member

erights commented Jan 27, 2021

That reminds me, I think we need to also change the deserializer to add the same Data marking as the original.

Yes. In the same way that the unserializer adds the same Remotable marking as the original. Data does not have the same third party problem since everyone is equally authoritative over their own copy, so the way we're currently restoring the Remotable marking will actually be correct for Data.

@erights
Copy link
Member

erights commented Jan 27, 2021

Got it. Out of curiosity, what's the rationale for rejecting (rather than ignoring) those other properties? Is it to ensure that the round-tripped object is the same as the original (specifically that Object.getOwnPropertyDescriptors() of the copy is deep-equal to that of the original)?

  • The semantics of our distributed object system is in terms of a much simpler base semantics, and a closer-to-language-independent one, than JS itself.

  • Our own code processing Data objects we receive sometimes uses primitives that enumerate all own properties. Sometimes uses primitives that enumerate only enumerable own properties. "..." both for expressions and destructuring patterns is one of the most convenient ways of manipulating objects, but operates only on enumerable own properties. Our code currently assumes all of these agree. I would rather make that code correct by removing the difference than force it to deal with the complexities of these operations being different. I would also like to relieve our users of that hazard.

@erights
Copy link
Member

erights commented Jan 27, 2021

without the code needing to add its own Data(obj) in the middle.

Yes. But the code in the middle can't anyway because the object is hardened.

@warner
Copy link
Member

warner commented Jan 29, 2021

@erights FYI, I found that we're currently adding a Symbol.toStringTag to a few objects, which the marshal code needs to tolerate (ignore, do not serialize). It occurs here:

const FakeWeakRefPrototype = {
deref() {
return weakRefTarget.get(this);
},
[Symbol.toStringTag]: 'WeakRef',
};

Soon, liveslots will need access to WeakRef and FinalizationRegistry to perform its semi-determinstic GC work. This work is half-complete, but liveslots is currently passed both, and there are a few tests to exercise their use. As a measure of compatibility with Node.js v12 (which does not provide either unless you pass an unstable CLI --flag), src/weakref.js defines a fake WeakRef (which actually holds a strong ref) when it's not otherwise available. To make this fake WeakRef behave mostly like the real one, I found I needed to add a Symbol.toStringTag to the instances.

(Hm, apparently I wrote the first version of the fake WeakRef, but then you made a cleanup pass, so toStringTag might be in code that you wrote)

It's not a big deal to have marshal ignore this particular property when prohibiting other Symbol-named properties (the same code will ignore our Data/Far marker properties), but if we don't like this, we could either find a different way to build the fake WeakRef, or drop compatibility with the older Node.js and always pass a real WeakRef in.

@warner
Copy link
Member

warner commented Feb 1, 2021

@erights I was completely wrong about the source of the toStringTag. It's actually coming from marshal.js itself, when it creates a Remotable: a new prototype is inserted between the Remotable and its original prototype, the inserted one includes both toString and Symbol.toStringTag, and mustPassByRemote() checks that the prototype is passByRemote too:

Object.create(oldRemotableProto, {
toString: {
value: toString,
},
[Symbol.toStringTag]: {
value: allegedName,
},
}),

I was tricked by the fact that the failure only showed up in the SwingSet tests: apparently the marshal tests don't ever subject a Remotable to mustPassByPresence (it can be passed to serialize without complaint, but mustPassByPresence/mustPassByRemote complains). Swingset does do that: liveslots tests the return value of buildRootObject against mustPassByPresence to fail fast if the user code does something wrong.

I don't understand the "is the prototype passByRemote" check. I also don't understand why serialize tolerate this: I see comments in mustPassByRemote that suggest it wouldn't be called if the object is registered in the Remotable table, but I'm not sure what might be doing such a check, or whether the check is unnecessary in that situation.

My current plan is to update test-marshal.js to include a mustPassByPresence(Far(..)), to trigger this situation, and then make serialization tolerate (ignore, do not serialize) the Symbol.toStringTag property on both pass-by-reference (which is causing the problem) and pass-by-copy (which is not, currently, but it seems like a good idea to keep them in sync).

@warner
Copy link
Member

warner commented Feb 1, 2021

that first PR merely adds tests of the existing behavior, without changing that behavior

@warner
Copy link
Member

warner commented Feb 3, 2021

I should record the fact that our current plan is to make Far/Remotable mandatory for all pass-by-reference objects. When we're done with the transition process, it will be an error to serialize e.g. harden({ foo: () => 0 }). The rules will be:

  • everything must be hardened
  • if it's marked with Far/Remotable, it will be pass-by-reference
  • else it's pass-by-copy, and must only contain data
    • if/when we implement pass-by-reference of bare Functions, someday, it may contain functions too
  • there is no Data() marker

warner added a commit that referenced this issue Mar 5, 2021
When #2018 changes the interpretation of `harden({})` to become pass-by-copy,
any code which was previously using that to make a "handle" will break,
because the things they send will be treated as pass-by-copy. To catch cases
where this is happening, we forbid such keys from being used in
store/weakstore. Clients should use `Far('interfacename')` to create handles,
and these will be accepted by store/weakstore as keys.
warner added a commit that referenced this issue Mar 10, 2021
When #2018 changes the interpretation of `harden({})` to become pass-by-copy,
any code which was previously using that to make a "handle" will break,
because the things they send will be treated as pass-by-copy. To catch cases
where this is happening, we forbid such keys from being used in
store/weakstore. Clients should use `Far('interfacename')` to create handles,
and these will be accepted by store/weakstore as keys.
warner added a commit to Agoric/dapp-fungible-faucet that referenced this issue Mar 10, 2021
warner added a commit that referenced this issue Mar 11, 2021
Annotate cosmic-swingset objects with Far or Data. We didn't try to mark everything. Instead, we marked everything we could easily find and which seemed like it was important to mark (ambiguous empty objects). Also, we changed `store` to reject empty pass-by-copy objects as keys, and we marked enough empty objects to survive that check, which ought to be enough for the immediate goal (changing `{}` to be pass-by-copy). We'll defer the other half of #2018 (requiring Far on all pass-by-reference objects) for later.

refs #2018
warner added a commit that referenced this issue Mar 11, 2021
When #2018 changes the interpretation of `harden({})` to become pass-by-copy,
any code which was previously using that to make a "handle" will break,
because the things they send will be treated as pass-by-copy. To catch cases
where this is happening, we forbid such keys from being used in
store/weakstore. Clients should use `Far('interfacename')` to create handles,
and these will be accepted by store/weakstore as keys.
warner added a commit that referenced this issue Mar 12, 2021
Finally change the serialization of a plain empty object, i.e. `harden({})`,
from a pass-by-reference "marker" to plain pass-by-copy data. We think we've
identified all the places where an empty object was used represent a marker,
and changed them to use `Far(iface, {})` instead, which triggers
pass-by-reference.

refs #2018
warner added a commit that referenced this issue Mar 12, 2021
Finally change the serialization of a plain empty object, i.e. `harden({})`,
from a pass-by-reference "marker" to plain pass-by-copy data. We think we've
identified all the places where an empty object was used represent a marker,
and changed them to use `Far(iface, {})` instead, which triggers
pass-by-reference.

refs #2018
@warner
Copy link
Member

warner commented Mar 12, 2021

Current status: harden({}) is now pass-by-copy, Far() is recommended but not required (no warnings are produced, but you could uncomment lines in marshal.js to enable warnings and/or hard errors), Data() is still available but not required.

The next step is to strip out Data (#2632). Then establish a different ticket (#2634) for promoting/requiring the use of Far.

warner added a commit that referenced this issue Mar 13, 2021
Now that `harden({})` is pass-by-copy, we no longer need the Data() marker.

refs #2018
warner added a commit that referenced this issue Mar 13, 2021
warner added a commit that referenced this issue Mar 16, 2021
Now that `harden({})` is pass-by-copy, we no longer need the Data() marker.

refs #2018
warner added a commit that referenced this issue Mar 16, 2021
warner added a commit that referenced this issue Mar 16, 2021
Now that `harden({})` is pass-by-copy, we no longer need the Data() marker.

refs #2018
warner added a commit that referenced this issue Mar 16, 2021
warner added a commit that referenced this issue Mar 17, 2021
Now that `harden({})` is pass-by-copy, we no longer need the Data() marker.

refs #2018
warner added a commit that referenced this issue Mar 17, 2021
warner added a commit that referenced this issue Mar 17, 2021
Now that `harden({})` is pass-by-copy, we no longer need the Data() marker.

refs #2018
warner added a commit that referenced this issue Mar 17, 2021
@erights
Copy link
Member

erights commented Jul 29, 2021

See also #3558

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working marshal package: marshal
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants