-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Implement Int.from_bytes()
and Int.as_bytes()
#3795
Changes from 8 commits
fc608a7
cd91096
3eca741
b9f189d
cc97ffe
b580bc9
569e6ff
239bba6
21384ec
bc22870
63fa5b4
5b41e1f
b5e09c5
7766ea3
30027df
eed9463
7f142a8
3b93cd6
b0bc485
776b4df
265b648
0e5d293
6e144d5
3211d24
8caad41
dd29814
694693a
40b9fc0
98c29fc
8db9a83
3af72c5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -17,6 +17,7 @@ These are Mojo built-ins, so you don't need to import them. | |||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
from collections import KeyElement | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
from bit import byte_swap | ||||||||||||||||||||||||||||||||||||||||
from math import Ceilable, CeilDivable, Floorable, Truncable | ||||||||||||||||||||||||||||||||||||||||
from hashlib.hash import _hash_simd | ||||||||||||||||||||||||||||||||||||||||
from hashlib._hasher import _HashableWithHasher, _Hasher | ||||||||||||||||||||||||||||||||||||||||
|
@@ -27,12 +28,12 @@ from collections.string import ( | |||||||||||||||||||||||||||||||||||||||
) | ||||||||||||||||||||||||||||||||||||||||
from python import Python, PythonObject | ||||||||||||||||||||||||||||||||||||||||
from python._cpython import Py_ssize_t | ||||||||||||||||||||||||||||||||||||||||
from memory import UnsafePointer | ||||||||||||||||||||||||||||||||||||||||
from memory import memcpy, UnsafePointer | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
from utils import Writable, Writer | ||||||||||||||||||||||||||||||||||||||||
from utils import Span, Writable, Writer | ||||||||||||||||||||||||||||||||||||||||
from utils._visualizers import lldb_formatter_wrapping_type | ||||||||||||||||||||||||||||||||||||||||
from utils._select import _select_register_value as select | ||||||||||||||||||||||||||||||||||||||||
from sys import is_nvidia_gpu, bitwidthof | ||||||||||||||||||||||||||||||||||||||||
from sys import is_big_endian, is_nvidia_gpu, bitwidthof | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
# ===----------------------------------------------------------------------=== # | ||||||||||||||||||||||||||||||||||||||||
# Indexer | ||||||||||||||||||||||||||||||||||||||||
|
@@ -1194,6 +1195,64 @@ struct Int( | |||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
writer.write(self) | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
@staticmethod | ||||||||||||||||||||||||||||||||||||||||
fn from_bytes[ | ||||||||||||||||||||||||||||||||||||||||
type: DType, big_endian: Bool = False | ||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm in favor of adding a I also don't think we want to add a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was adding these methods because they are very convenient for IO or networking protocols. For example I am needing these for a pure Mojo implementation of websockets. If we don't have the
Python implementation of If we want to implement something IO-related, like WebSockets, we already know the endianness of the target we want to read from or write to, so it's faster if we use parameters. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If anything, these comments make me think that
I agree :) that's why I still said it should be a parameter, but my comments were questioning what the parameter should look like, because I'm not sold on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah if we follow Python's lead, I think
Suggested change
As for this, I personally never liked There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Completely agree with @martinvuyk . We are already diverging from the Python |
||||||||||||||||||||||||||||||||||||||||
](bytes: Span[Byte]) raises -> Self: | ||||||||||||||||||||||||||||||||||||||||
"""Converts a byte array to an integer. | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
Args: | ||||||||||||||||||||||||||||||||||||||||
bytes: The byte array to convert. | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
Parameters: | ||||||||||||||||||||||||||||||||||||||||
type: The type of the integer. | ||||||||||||||||||||||||||||||||||||||||
big_endian: Whether the byte array is big-endian. | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
Returns: | ||||||||||||||||||||||||||||||||||||||||
The integer value. | ||||||||||||||||||||||||||||||||||||||||
""" | ||||||||||||||||||||||||||||||||||||||||
if type.sizeof() != len(bytes): | ||||||||||||||||||||||||||||||||||||||||
raise Error("Byte array size does not match the integer size.") | ||||||||||||||||||||||||||||||||||||||||
var ptr: UnsafePointer[Byte] = UnsafePointer.address_of(bytes[0]) | ||||||||||||||||||||||||||||||||||||||||
var type_ptr: UnsafePointer[Scalar[type]] = ptr.bitcast[Scalar[type]]() | ||||||||||||||||||||||||||||||||||||||||
var value = type_ptr[] | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
@parameter | ||||||||||||||||||||||||||||||||||||||||
if is_big_endian() and not big_endian: | ||||||||||||||||||||||||||||||||||||||||
value = byte_swap(value) | ||||||||||||||||||||||||||||||||||||||||
elif not is_big_endian() and big_endian: | ||||||||||||||||||||||||||||||||||||||||
value = byte_swap(value) | ||||||||||||||||||||||||||||||||||||||||
return int(value) | ||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if is_big_endian() == big_endian:
...
else:
... |
||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
fn as_bytes[type: DType, big_endian: Bool = False](self) -> List[Byte]: | ||||||||||||||||||||||||||||||||||||||||
"""Convert the integer to a byte array. | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
Parameters: | ||||||||||||||||||||||||||||||||||||||||
type: The type of the integer. | ||||||||||||||||||||||||||||||||||||||||
big_endian: Whether the byte array should be big-endian. | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
Returns: | ||||||||||||||||||||||||||||||||||||||||
The byte array. | ||||||||||||||||||||||||||||||||||||||||
""" | ||||||||||||||||||||||||||||||||||||||||
alias type_len = type.sizeof() | ||||||||||||||||||||||||||||||||||||||||
var value = Scalar[type](self) | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
@parameter | ||||||||||||||||||||||||||||||||||||||||
if is_big_endian() and not big_endian: | ||||||||||||||||||||||||||||||||||||||||
value = byte_swap(value) | ||||||||||||||||||||||||||||||||||||||||
elif not is_big_endian() and big_endian: | ||||||||||||||||||||||||||||||||||||||||
value = byte_swap(value) | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
var ptr: UnsafePointer[Scalar[type]] = UnsafePointer.address_of(value) | ||||||||||||||||||||||||||||||||||||||||
var byte_ptr: UnsafePointer[Byte] = ptr.bitcast[Byte]() | ||||||||||||||||||||||||||||||||||||||||
var list = List[Byte](capacity=type_len) | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
# TODO: Maybe this can be a List.extend(ptr, count) method | ||||||||||||||||||||||||||||||||||||||||
memcpy(list.unsafe_ptr(), byte_ptr, type_len) | ||||||||||||||||||||||||||||||||||||||||
list.size = type_len | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we should add more
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed. Changed here: msaelices@30027df |
||||||||||||||||||||||||||||||||||||||||
return list^ | ||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||
@always_inline("nodebug") | ||||||||||||||||||||||||||||||||||||||||
fn __mlir_index__(self) -> __mlir_type.index: | ||||||||||||||||||||||||||||||||||||||||
"""Convert to index. | ||||||||||||||||||||||||||||||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,8 @@ from testing import assert_equal, assert_true, assert_false, assert_raises | |
from python import PythonObject | ||
from memory import UnsafePointer | ||
|
||
alias Bytes = List[Byte] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: This definition is not decided over yet. could you take this into the scope of your test? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure! Done: msaelices@7766ea3 |
||
|
||
|
||
def test_properties(): | ||
assert_equal(Int.MAX, (1 << bitwidthof[DType.index]() - 1) - 1) | ||
|
@@ -246,6 +248,56 @@ def test_conversion_from_python(): | |
assert_equal(Int.try_from_python(PythonObject(-1)), -1) | ||
|
||
|
||
def test_from_bytes(): | ||
assert_equal(Int.from_bytes[DType.int16, big_endian=True](Bytes(0, 16)), 16) | ||
assert_equal( | ||
Int.from_bytes[DType.int16, big_endian=False](Bytes(0, 16)), 4096 | ||
) | ||
assert_equal( | ||
Int.from_bytes[DType.int16, big_endian=True](Bytes(252, 0)), -1024 | ||
) | ||
assert_equal( | ||
Int.from_bytes[DType.uint16, big_endian=True](Bytes(252, 0)), 64512 | ||
) | ||
assert_equal( | ||
Int.from_bytes[DType.int16, big_endian=False](Bytes(252, 0)), 252 | ||
) | ||
assert_equal( | ||
Int.from_bytes[DType.int32, big_endian=True](Bytes(0, 0, 0, 1)), 1 | ||
) | ||
assert_equal( | ||
Int.from_bytes[DType.int32, big_endian=False](Bytes(0, 0, 0, 1)), | ||
16777216, | ||
) | ||
assert_equal( | ||
Int.from_bytes[DType.int32, big_endian=True](Bytes(1, 0, 0, 0)), | ||
16777216, | ||
) | ||
assert_equal( | ||
Int.from_bytes[DType.int32, big_endian=True](Bytes(1, 0, 0, 1)), | ||
16777217, | ||
) | ||
assert_equal( | ||
Int.from_bytes[DType.int32, big_endian=False](Bytes(1, 0, 0, 1)), | ||
16777217, | ||
) | ||
assert_equal( | ||
Int.from_bytes[DType.int32, big_endian=True](Bytes(255, 0, 0, 0)), | ||
-16777216, | ||
) | ||
for x_ref in List[Int](10, 100, -12, 0, 1, -1, 1000, -1000): | ||
x = x_ref[] | ||
|
||
@parameter | ||
for b in range(2): | ||
assert_equal( | ||
Int.from_bytes[DType.int16, big_endian=b]( | ||
Int(x).as_bytes[DType.int16, big_endian=b]() | ||
), | ||
x, | ||
) | ||
|
||
|
||
def main(): | ||
test_properties() | ||
test_add() | ||
|
@@ -269,3 +321,4 @@ def main(): | |
test_int_uint() | ||
test_float_conversion() | ||
test_conversion_from_python() | ||
test_from_bytes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think my comment before was ever properly addressed. I think this method should be on
SIMD
, notInt
. That way theDType
makes more sense. We can think more about what it would look like onInt
later, and doingint(UInt64.from_bytes(span))
doesn't seem like too much of a hit to me.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed it. Done:
msaelices@776b4df
msaelices@265b648
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for all the change requests, but to be clear I am comfortable merging these changes on
SIMD
, but not onInt
. I think theSIMD
API makes a ton of sense, but theint
API would need some more thought since taking aDType
feels wrong.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I think we should/could land the uncontroversial part (
from/to_byte
onSIMD
) first and iterate. WDYT @msaelices?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, actually the whole point of this PR was to help Python developers who need to migrate Python code using int.from_bytes() and int.to_bytes(). So, to me, it's easier if we have it on the
Int
struct too, and iterate it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with having it on the Int struct too because we would either:
I'm also not compelled by the idea that the goal is to mirror a Python API considering your earlier argument in favor of breaking away from the Python API using a parameter. As I mentioned before, I think
int(UInt64.from_bytes(my_span))
is fine for now