Commit
Auto merge of #110083 - saethlin:encode-hashes-as-bytes, r=cjgillot
Encode hashes as bytes, not varint

In a few places, we store hashes as `u64` or `u128` and then apply `derive(Decodable, Encodable)` to the enclosing struct/enum. It is more efficient to encode hashes directly than to try to apply some varint encoding. This PR adds two new types, `Hash64` and `Hash128`, which are produced by `StableHasher` and replace every use of storing a `u64` or `u128` that represents a hash.

Distribution of the byte lengths of leb128 encodings, from `x build --stage 2` with `incremental = true`

Before:
```
(  1) 373418203 (53.7%, 53.7%): 1
(  2) 196240113 (28.2%, 81.9%): 3
(  3) 108157958 (15.6%, 97.5%): 2
(  4)  17213120 ( 2.5%, 99.9%): 4
(  5)    223614 ( 0.0%,100.0%): 9
(  6)    216262 ( 0.0%,100.0%): 10
(  7)     15447 ( 0.0%,100.0%): 5
(  8)      3633 ( 0.0%,100.0%): 19
(  9)      3030 ( 0.0%,100.0%): 8
( 10)      1167 ( 0.0%,100.0%): 18
( 11)      1032 ( 0.0%,100.0%): 7
( 12)      1003 ( 0.0%,100.0%): 6
( 13)        10 ( 0.0%,100.0%): 16
( 14)        10 ( 0.0%,100.0%): 17
( 15)         5 ( 0.0%,100.0%): 12
( 16)         4 ( 0.0%,100.0%): 14
```

After:
```
(  1) 372939136 (53.7%, 53.7%): 1
(  2) 196240140 (28.3%, 82.0%): 3
(  3) 108014969 (15.6%, 97.5%): 2
(  4)  17192375 ( 2.5%,100.0%): 4
(  5)       435 ( 0.0%,100.0%): 5
(  6)        83 ( 0.0%,100.0%): 18
(  7)        79 ( 0.0%,100.0%): 10
(  8)        50 ( 0.0%,100.0%): 9
(  9)         6 ( 0.0%,100.0%): 19
```

The remaining 9- or 10-byte and 18- or 19-byte encodings are `u64` and `u128` values, respectively, that have the high bits set. As far as I can tell these are coming primarily from `SwitchTargets`.
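To illustrate why hash-like values land in the 9/10 and 18/19 byte buckets above, here is a minimal sketch of unsigned LEB128 written from the general definition of the format (seven payload bits per byte). It is not rustc's encoder; `write_leb128` and `leb128_len` are hypothetical names used only for this illustration.

```rust
/// Appends the unsigned LEB128 encoding of `value` to `out`:
/// seven payload bits per byte, with the high bit set on every byte
/// except the last one.
pub fn write_leb128(mut value: u128, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: continuation bit stays clear.
            out.push(byte);
            return;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Number of bytes the LEB128 encoding of `value` occupies.
pub fn leb128_len(value: u128) -> usize {
    let mut buf = Vec::new();
    write_leb128(value, &mut buf);
    buf.len()
}
```

Under this scheme a small integer such as `5` takes one byte, but a `u64` whose top bit is set needs 10 bytes and a `u128` whose top bits are set needs 19, compared with the fixed 8 or 16 bytes of the raw value. Since a well-mixed hash has its high bits set about as often as not, varint encoding tends to cost hashes extra bytes rather than save any.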
Showing 38 changed files with 288 additions and 137 deletions.
@@ -0,0 +1,132 @@
//! rustc encodes a lot of hashes. If hashes are stored as `u64` or `u128`, a `derive(Encodable)`
//! will apply varint encoding to the hashes, which is less efficient than directly encoding the 8
//! or 16 bytes of the hash.
//!
//! The types in this module represent 64-bit or 128-bit hashes produced by a `StableHasher`.
//! `Hash64` and `Hash128` expose some utility functions to encourage users to not extract the inner
//! hash value as an integer type and accidentally apply varint encoding to it.
//!
//! In contrast with `Fingerprint`, users of these types cannot and should not attempt to construct
//! and decompose these types into constituent pieces. The point of these types is only to
//! connect the fact that they can only be produced by a `StableHasher` to their
//! `Encode`/`Decode` impls.
use crate::stable_hasher::{StableHasher, StableHasherResult};
use rustc_serialize::{Decodable, Decoder, Encodable, Encoder};
use std::fmt;
use std::ops::BitXorAssign;

#[derive(Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord, Default)]
pub struct Hash64 {
    inner: u64,
}

impl Hash64 {
    pub const ZERO: Hash64 = Hash64 { inner: 0 };

    #[inline]
    pub(crate) fn new(n: u64) -> Self {
        Self { inner: n }
    }

    #[inline]
    pub fn as_u64(self) -> u64 {
        self.inner
    }
}

impl BitXorAssign<u64> for Hash64 {
    #[inline]
    fn bitxor_assign(&mut self, rhs: u64) {
        self.inner ^= rhs;
    }
}

impl<S: Encoder> Encodable<S> for Hash64 {
    #[inline]
    fn encode(&self, s: &mut S) {
        s.emit_raw_bytes(&self.inner.to_le_bytes());
    }
}

impl<D: Decoder> Decodable<D> for Hash64 {
    #[inline]
    fn decode(d: &mut D) -> Self {
        Self { inner: u64::from_le_bytes(d.read_raw_bytes(8).try_into().unwrap()) }
    }
}

impl StableHasherResult for Hash64 {
    #[inline]
    fn finish(hasher: StableHasher) -> Self {
        Self { inner: hasher.finalize().0 }
    }
}

impl fmt::Debug for Hash64 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.inner.fmt(f)
    }
}

impl fmt::LowerHex for Hash64 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::LowerHex::fmt(&self.inner, f)
    }
}

#[derive(Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord, Default)]
pub struct Hash128 {
    inner: u128,
}

impl Hash128 {
    #[inline]
    pub fn truncate(self) -> Hash64 {
        Hash64 { inner: self.inner as u64 }
    }

    #[inline]
    pub fn wrapping_add(self, other: Self) -> Self {
        Self { inner: self.inner.wrapping_add(other.inner) }
    }

    #[inline]
    pub fn as_u128(self) -> u128 {
        self.inner
    }
}

impl<S: Encoder> Encodable<S> for Hash128 {
    #[inline]
    fn encode(&self, s: &mut S) {
        s.emit_raw_bytes(&self.inner.to_le_bytes());
    }
}

impl<D: Decoder> Decodable<D> for Hash128 {
    #[inline]
    fn decode(d: &mut D) -> Self {
        Self { inner: u128::from_le_bytes(d.read_raw_bytes(16).try_into().unwrap()) }
    }
}

impl StableHasherResult for Hash128 {
    #[inline]
    fn finish(hasher: StableHasher) -> Self {
        let (_0, _1) = hasher.finalize();
        Self { inner: u128::from(_0) | (u128::from(_1) << 64) }
    }
}

impl fmt::Debug for Hash128 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.inner.fmt(f)
    }
}

impl fmt::LowerHex for Hash128 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::LowerHex::fmt(&self.inner, f)
    }
}
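
The fixed-width round trip used by the `Encodable`/`Decodable` impls above can be tried outside the compiler with a small sketch. Everything here is an assumption made for illustration: `ByteSink` and `ByteSource` are hypothetical stand-ins that mimic only the `emit_raw_bytes` and `read_raw_bytes` calls seen in the diff, and `combine`/`truncate` are free functions mirroring the arithmetic in `Hash128::finish` and `Hash128::truncate`. None of this is the `rustc_serialize` API itself.

```rust
use std::convert::TryInto;

/// Hypothetical stand-in for an encoder: it only collects raw bytes.
#[derive(Default)]
pub struct ByteSink {
    pub bytes: Vec<u8>,
}

impl ByteSink {
    pub fn emit_raw_bytes(&mut self, bytes: &[u8]) {
        self.bytes.extend_from_slice(bytes);
    }
}

/// Hypothetical stand-in for a decoder: it hands back slices of a buffer.
pub struct ByteSource<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> ByteSource<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    /// Returns the next `len` bytes and advances the read position.
    pub fn read_raw_bytes(&mut self, len: usize) -> &'a [u8] {
        let slice = &self.data[self.pos..self.pos + len];
        self.pos += len;
        slice
    }
}

/// Writes a 64-bit value as exactly 8 little-endian bytes.
pub fn encode_u64(value: u64, sink: &mut ByteSink) {
    sink.emit_raw_bytes(&value.to_le_bytes());
}

/// Reads exactly 8 bytes and reassembles the 64-bit value.
pub fn decode_u64(source: &mut ByteSource<'_>) -> u64 {
    u64::from_le_bytes(source.read_raw_bytes(8).try_into().unwrap())
}

/// Joins two 64-bit halves: `lo` fills bits 0..64, `hi` fills bits 64..128.
pub fn combine(lo: u64, hi: u64) -> u128 {
    u128::from(lo) | (u128::from(hi) << 64)
}

/// Keeps only the low 64 bits, as an `as u64` cast does.
pub fn truncate(hash: u128) -> u64 {
    hash as u64
}
```

In this sketch every 64-bit value occupies 8 bytes in the sink regardless of its magnitude, and truncating a combined value returns the first (low) half that went into it.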