Language independent hashing mechanism for float and integers (Proposal) #52

Open
weigandf opened this issue Feb 23, 2019 · 3 comments

@weigandf

weigandf commented Feb 23, 2019

Depending on the programming language used to implement ObjectHash, the behavior and the resulting hash can differ considerably. One big issue I see is the distinction between a float and an integer in the case of integer-valued floats.

An example (taken from the test cases) is:

(1) ["foo", {"bar":["baz", null, 1, 1.5, 0.0001, 1000, 2, -23.1234, 2]}]
-and-
(2) ["foo", {"bar":["baz", null, 1.0, 1.5, 0.0001, 1000.0, 2.0, -23.1234, 2.0]}]

In Python the results are:
(1) 726e7ae9e3fadf8a2228bf33e505a63df8db1638fa4f21429673d387dbd1c52a
-and-
(2) 783a423b094307bcb28d005bc2f026ff44204442ef3513585e7e73b66e3c2213
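The difference comes from the parser, not from the hash function: Python's json module yields an int for 1 and a float for 1.0, and a type-tagged hash then sees two different inputs. A minimal sketch of that effect follows. The tagged_digest helper is hypothetical and deliberately simplified; it does not reproduce the real ObjectHash float encoding and will not produce the digests listed above.

```python
import hashlib
import json


def tagged_digest(value):
    """Simplified stand-in for a type-tagged hash; NOT the real ObjectHash encoding."""
    if isinstance(value, bool):
        raise TypeError("booleans are out of scope for this sketch")
    if isinstance(value, int):
        # Integers get their own type tag and a decimal payload.
        tag, payload = b"i", str(value).encode()
    elif isinstance(value, float):
        # Floats get a different tag, so 1.0 can never collide with 1.
        tag, payload = b"f", repr(value).encode()
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return hashlib.sha256(tag + payload).hexdigest()


as_int = json.loads("1")      # parsed as int
as_float = json.loads("1.0")  # parsed as float

print(type(as_int).__name__, type(as_float).__name__)    # int float
print(tagged_digest(as_int) == tagged_digest(as_float))  # False
```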

The Go implementation introduced a CommonJSON object using the Go marshalling function to address this issue:

json.Marshal(o)

I would like to suggest a different, language-independent solution that follows the JSON Schema recommendation in:

http://json-schema.org/draft-04/json-schema-core.html#rfc.section.5.5

It is acknowledged by this specification that some programming languages, and their associated parsers, use different internal representations for floating point numbers and integers, while others do not.

As a consequence, for interoperability reasons, JSON values used in the context of JSON Schema, whether that JSON be a JSON Schema or an instance, SHOULD ensure that mathematical integers be represented as integers as defined by this specification.

In my opinion this can be achieved simply by adding a case distinction:

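case Type.Float:
{
  // Integer-valued floats (1.0, 1000.0, ...) are hashed as integers.
  // Note: the (int) cast overflows for values outside the int range.
  if ((float)val % 1.0 == 0.0)
  {
    HashInt((int)val);
  }
  else
  {
    HashFloat((float)val);
  }
  break;
}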

Whether zero should be excluded from this case distinction is open to discussion; that would mean using the condition:
(float)val % 1.0 == 0.0 && (float)val != 0.0
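For comparison, here is a rough Python sketch of the same case distinction. The function name normalize_number and the exclude_zero flag are made up for illustration; they are not part of any ObjectHash implementation. Python integers have arbitrary precision, so the overflow concern of a fixed-width (int) cast does not show up here.

```python
import math


def normalize_number(value, exclude_zero=False):
    """Return an int for integer-valued floats, otherwise the value unchanged."""
    if isinstance(value, float) and math.isfinite(value) and value.is_integer():
        if exclude_zero and value == 0.0:
            # Optional variant from the discussion: leave 0.0 (and -0.0) as a float.
            return value
        return int(value)
    # Non-integral floats, NaN, infinities and non-floats pass through untouched.
    return value


print([normalize_number(v) for v in [1.0, 1.5, 1000.0, -23.1234, 2]])
# [1, 1.5, 1000, -23.1234, 2]
```

Note that without exclude_zero, -0.0 becomes 0 and loses its sign, which may or may not be acceptable.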

In my opinion it would be really great for the ObjectHash project to have a common understanding of this issue and for all implementations to follow the recommendation.

@KellerFuchs

@weigandf That's already addressed in the README

@KellerFuchs

Regarding your actual proposal, there are 3 major issues:

  • it's backwards incompatible, i.e. the hash of some objects will change, which requires changing existing implementations;
  • it's not implementable in constant time, even when the schema and layout of the object are known;
  • x mod 1 == 0 is a poor test of integerness, as it is subject to floating-point rounding effects that may be platform dependent, e.g. some platforms round subnormal numbers down to 0, or may use a different bit width for their float type (f32 vs. f64, ...).
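The bit-width point can be illustrated with a small Python sketch. It assumes IEEE 754 binary32 and binary64 and emulates an f32 platform by round-tripping through struct; the helper names to_f32 and looks_integral are invented for this example.

```python
import struct


def to_f32(x):
    """Round a Python float (f64) to the nearest f32 value and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def looks_integral(x):
    """The test from the proposal: x mod 1 == 0."""
    return x % 1.0 == 0.0


x = 16777217.5  # exactly representable in f64, but not in f32

print(looks_integral(x))          # False: f64 keeps the fractional part
print(looks_integral(to_f32(x)))  # True: f32 rounds the value to 16777218.0
print(looks_integral(1e300))      # True, yet far outside any 64-bit integer range
```

The same JSON number would therefore be hashed as a float on one platform and as an integer on another, and the last case passes the test even though a fixed-width integer cast cannot hold it.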

@weigandf
Author

weigandf commented Mar 20, 2019

Hello @KellerFuchs, first of all, thanks for your answer.

Please consider my comments:

  • Compatibility is a big issue, and that is why I opened this issue. Unfortunately, if you look at the other implementations of ObjectHash (Java, Go, Python, ...), you will see that integer-valued floats are not handled consistently (see the Python example in the issue description).
  • I agree that testing for an integer with (float)val % 1.0 == 0.0 is a poor test, even if it were written as (float)val % 1.0 < ɛ. The choice of ɛ would still depend on the language-specific implementation and on the float type (as you said). So I agree that this is not a good solution either, because it is too difficult to get right.
  • I agree with the README and your comment that it would be better to introduce a function that generates a common JSON before hashing (as done in the Go reference implementation). It would just be great to have a clear definition of this function, as there is currently none, or I am not able to find it. Using Go's json.Marshal(o) looks like a black box to me, which makes it quite hard to reimplement in other languages.

Currently I use this function:

case Type.Integer:
{
    if (Settings.COMMON_JSONIFY)
    {
        HashFloat((float)value);
    }
    else
    {
        HashInt((int)value);
    }
    break;
}

I am not sure whether this is enough to fully reproduce what Go's json.Marshal(o) does for integer-valued floats.
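As a point of reference, Go's encoding/json decodes every JSON number into a float64 when unmarshalling into an untyped interface{} value. Assuming the Go helper works by marshalling and then unmarshalling that way, the effect on numbers can be approximated in Python as sketched below. The name common_jsonify is my own choice here, and this only covers the number handling, not every detail of Go's marshalling.

```python
import json


def common_jsonify(obj):
    """Serialize and re-parse so that every JSON number comes back as a float."""
    # parse_int=float makes the decoder turn integer literals into floats,
    # mirroring the "all numbers are float64" behaviour described above.
    return json.loads(json.dumps(obj), parse_int=float)


a = common_jsonify(["foo", {"bar": ["baz", None, 1, 1.5, 1000, 2]}])
b = common_jsonify(["foo", {"bar": ["baz", None, 1.0, 1.5, 1000.0, 2.0]}])

print(all(isinstance(n, float) for n in a[1]["bar"][2:]))  # True
print(all(isinstance(n, float) for n in b[1]["bar"][2:]))  # True
```

After this step both inputs contain only floats, so a float-only hashing path treats them identically. The trade-off is that integers above 2^53 lose precision when converted.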
