Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace chrono::now() with Instant::now() #5097

Merged
merged 3 commits into from
Oct 29, 2021
Merged

Replace chrono::now() with Instant::now() #5097

merged 3 commits into from
Oct 29, 2021

Conversation

pmnoxx
Copy link
Contributor

@pmnoxx pmnoxx commented Oct 28, 2021

We shouldn't be using non monotonous time with chrono::Now()

Closes #5096

Moving changes out of #5089

@pmnoxx pmnoxx self-assigned this Oct 28, 2021
@pmnoxx pmnoxx linked an issue Oct 28, 2021 that may be closed by this pull request
@pmnoxx pmnoxx marked this pull request as ready for review October 28, 2021 21:04
@pmnoxx pmnoxx requested a review from matklad October 28, 2021 21:05
@pmnoxx
Copy link
Contributor Author

pmnoxx commented Oct 28, 2021

@bowenwang1996 @mfornet can you take a look?

@matklad
Copy link
Contributor

matklad commented Oct 29, 2021

https://passcod.name/technical/no-time-for-chrono.html also feels relevant.

Can we go as far (in subsequent PRs) as removing chrono from Cargo.toml altogether? That is, as a system, do we care about calendar time at all? If not, I feel we should try to expunge chrono from Cargo.tomls, and, ideally, from Cargo.lock as well. Not caring about system time is nice!

chain/network/src/routing.rs Outdated Show resolved Hide resolved
@pmnoxx pmnoxx merged commit 1cb995f into near:master Oct 29, 2021
@pmnoxx pmnoxx deleted the piotr-remove-usage-of-chrono branch October 29, 2021 21:59
@pmnoxx
Copy link
Contributor Author

pmnoxx commented Oct 29, 2021

@matklad If you look at the documentation of Instant: https://doc.rust-lang.org/src/std/time.rs.html#241
It's mentioned that in practice instant is not-monotonic, and that can cause panics.

Maybe, we should switch back to chrono:Datatime<Utc>?

@matklad
Copy link
Contributor

matklad commented Nov 1, 2021

Yeah, that's a good point.

To clarify, the doc say that the underlying OS functionality doesn't always live up to its promise, and so, for that reason, on certain platforms Rust's stdlib tries to paper over this in user-space.

The problem with that strategy though is that, with virtualization, you don't actually know which platform you are running on top. Eg, x86 Linux implements monotonic time correctly, but if it is run in a VM in windows host, the windows can break things. This was observed in tokio-rs/tokio#3619 and is being fixed in rust-lang/rust#89926. Technically, that is a bug in Linux kernel and/or hypervisor implementations, but this indeed is observed in practice .

I don't think using chrono::DateTime is a solution here -- that is decidedly non-monotonic (as it changes when you change system time, or when ntpd makes adjustments), and is generally a wrong kind of time. We shouldn't be caring about timezones, etc. If we want to work-around it, I suggest to avoid using Instance::now/elapsed, and instead define our own wrappers which force the thing to be monotonic

pub fn now() -> Instance {
    // furious hand-waving
    static LAST: Mutex<Instance> = ...
    let now = Instance::now();
    let now = now.max(*LAST.lock());
    *LAST.lock() = now;
    now
}

Practically though, it seems that this problem shouldn't be something that we care about. That is, the amount of setups where time goes backwards despite stdlib's mitigations should be petty small, and, if time is unreliable in your setup, you should fix the setup, as we do rely on time significantly.

@bowenwang1996 do you remember more details about https://near.zulipchat.com/#narrow/stream/295302-dev-general/topic/general? Was it some production environment, or just some weird ad-hoc setup?

@pmnoxx
Copy link
Contributor Author

pmnoxx commented Nov 1, 2021

Yeah, that's a good point.

To clarify, the doc say that the underlying OS functionality doesn't always live up to its promise, and so, for that reason, on certain platforms Rust's stdlib tries to paper over this in user-space.

The problem with that strategy though is that, with virtualization, you don't actually know which platform you are running on top. Eg, x86 Linux implements monotonic time correctly, but if it is run in a VM in windows host, the windows can break things. This was observed in tokio-rs/tokio#3619 and is being fixed in rust-lang/rust#89926. Technically, that is a bug in Linux kernel and/or hypervisor implementations, but this indeed is observed in practice .

I don't think using chrono::DateTime is a solution here -- that is decidedly non-monotonic (as it changes when you change system time, or when ntpd makes adjustments), and is generally a wrong kind of time. We shouldn't be caring about timezones, etc. If we want to work-around it, I suggest to avoid using Instance::now/elapsed, and instead define our own wrappers which force the thing to be monotonic

pub fn now() -> Instance {
    // furious hand-waving
    static LAST: Mutex<Instance> = ...
    let now = Instance::now();
    let now = now.max(*LAST.lock());
    *LAST.lock() = now;
    now
}

Practically though, it seems that this problem shouldn't be something that we care about. That is, the amount of setups where time goes backwards despite stdlib's mitigations should be petty small, and, if time is unreliable in your setup, you should fix the setup, as we do rely on time significantly.

@bowenwang1996 do you remember more details about https://near.zulipchat.com/#narrow/stream/295302-dev-general/topic/general? Was it some production environment, or just some weird ad-hoc setup?

@matklad The main reason we use DateTime is because it has nice functions to convert time to unix timestamp, and back from it. I don't see any such function inside the same crate as Instant.

For some reason UNIX_EPOCH is available as SystemTime, but not as Instant. It would be easy to write down such conversion function if we had some way to convert UNIX_EPOCH to Instant, or do something equivalent like initializing Instant with value of UNIX_EPOCH.

Any ideas?

#[stable(feature = "time2", since = "1.8.0")] pub const UNIX_EPOCH: SystemTime = SystemTime(time::UNIX_EPOCH);

@pmnoxx
Copy link
Contributor Author

pmnoxx commented Nov 1, 2021

Yeah, that's a good point.

To clarify, the doc say that the underlying OS functionality doesn't always live up to its promise, and so, for that reason, on certain platforms Rust's stdlib tries to paper over this in user-space.

The problem with that strategy though is that, with virtualization, you don't actually know which platform you are running on top. Eg, x86 Linux implements monotonic time correctly, but if it is run in a VM in windows host, the windows can break things. This was observed in tokio-rs/tokio#3619 and is being fixed in rust-lang/rust#89926. Technically, that is a bug in Linux kernel and/or hypervisor implementations, but this indeed is observed in practice .

I don't think using chrono::DateTime is a solution here -- that is decidedly non-monotonic (as it changes when you change system time, or when ntpd makes adjustments), and is generally a wrong kind of time. We shouldn't be caring about timezones, etc. If we want to work-around it, I suggest to avoid using Instance::now/elapsed, and instead define our own wrappers which force the thing to be monotonic

pub fn now() -> Instance {
    // furious hand-waving
    static LAST: Mutex<Instance> = ...
    let now = Instance::now();
    let now = now.max(*LAST.lock());
    *LAST.lock() = now;
    now
}

Practically though, it seems that this problem shouldn't be something that we care about. That is, the amount of setups where time goes backwards despite stdlib's mitigations should be petty small, and, if time is unreliable in your setup, you should fix the setup, as we do rely on time significantly.

@bowenwang1996 do you remember more details about https://near.zulipchat.com/#narrow/stream/295302-dev-general/topic/general? Was it some production environment, or just some weird ad-hoc setup?

Here is a part of documentation from Instant:

/// https://doc.rust-lang.org/src/std/time.rs.html#60

/// Instants are opaque types that can only be compared to one another. There is
/// no method to get "the number of seconds" from an instant. Instead, it only
/// allows measuring the duration between two instants (or comparing two
/// instants).

It looks like you are meant to use instant just to compare a relative time, which is completely fine, but we need to be able to convert time, for example to unix timestamp and back. That's not what Instant is meant to do.

I think we can't simply replace Datetime with Instant for that reason. I'm not suggesting to keep DateTime either, but maybe there is some other way to represent time that doesn't have drawback of either of those. One valid solution would be do define our own type that wraps around the UNIX timestamp, or find some other library.

@matklad What do you think?

@bowenwang1996
Copy link
Collaborator

bowenwang1996 commented Nov 1, 2021

Yeah, that's a good point.

To clarify, the doc say that the underlying OS functionality doesn't always live up to its promise, and so, for that reason, on certain platforms Rust's stdlib tries to paper over this in user-space.

The problem with that strategy though is that, with virtualization, you don't actually know which platform you are running on top. Eg, x86 Linux implements monotonic time correctly, but if it is run in a VM in windows host, the windows can break things. This was observed in tokio-rs/tokio#3619 and is being fixed in rust-lang/rust#89926. Technically, that is a bug in Linux kernel and/or hypervisor implementations, but this indeed is observed in practice .

I don't think using chrono::DateTime is a solution here -- that is decidedly non-monotonic (as it changes when you change system time, or when ntpd makes adjustments), and is generally a wrong kind of time. We shouldn't be caring about timezones, etc. If we want to work-around it, I suggest to avoid using Instance::now/elapsed, and instead define our own wrappers which force the thing to be monotonic

pub fn now() -> Instance {
    // furious hand-waving
    static LAST: Mutex<Instance> = ...
    let now = Instance::now();
    let now = now.max(*LAST.lock());
    *LAST.lock() = now;
    now
}

Practically though, it seems that this problem shouldn't be something that we care about. That is, the amount of setups where time goes backwards despite stdlib's mitigations should be petty small, and, if time is unreliable in your setup, you should fix the setup, as we do rely on time significantly.

@bowenwang1996 do you remember more details about https://near.zulipchat.com/#narrow/stream/295302-dev-general/topic/general? Was it some production environment, or just some weird ad-hoc setup?

@matklad Are you sure the link is correct? It does not link to an issue related to time

@matklad
Copy link
Contributor

matklad commented Nov 2, 2021

@matklad
Copy link
Contributor

matklad commented Nov 2, 2021

For some reason UNIX_EPOCH is available as SystemTime, but not as Instant.

Yeah, that's what I am getting at: Instant doesn't have have this functionality in a fundamental way, and that is a feature. Basically, there are three kinds of time in computers:

  • Number of seconds since hardware timer started. This is monotonic time / std::time::Instant. It is reliably tracked by hardware, allows for measuring time intervals within the process precisely, but can't tell anything about time in the absolute terms. Hardware timer can't know how much time elapsed in total since the unix epoch. This time never goes backwards.
  • Number of seconds since unix epoch. This is absolute time / std::time::SystemTime. This can't be tracked purely by the hardware (as the clock would drift). The way it works is there's a system-level state which stores "current time the last time it was adjusted", and the hardware adds a relative time to that. The config can be changed by a human, or by other computers. Usually, when the computer boots, and then periodically, it sends network requests to time servers asking for time, and then trusts this info. This kind of time is significantly more complicated than the previous one -- the time can go backwards after upgrade, or the computer can be tricked into thinking wrong about time, if time servers are compromised. One extra bit of complexity is that this time, most of the time, is roughly the same on different computers, but this isn't reliable: there's always at least some amount of small drifts, and occasionally big diff due to misconfigurations.
  • Calendar date/wall clock time in the room where the computer is situated. This is chrono::Instant<TimeZone>. This time depends on the "time since epoch", plus the current physical location of the computer (to get the time zone), plus the time zone database (as the meaning of timezone itself changes over time, when, eg, government decides that daylight savings time is not a great idea, and then changes its mind back couple of years later). This time has all the complexity of SystemTime plus a lot of edge-cases related to human's notion of timezone plus fun things like leap seconds.

I feel relatively strongly that we need to stick to the simplest model of time we can, that is, to Instant or, if we must, to SystemTime.

but we need to be able to convert time, for example to unix timestamp and back

Yeah, if we need that, we do need SystemTime in those parts that use such functionality. Could you point where we need this, though? I want to understand whether we fundamentally need to know absolute time, or whether this is something which can be replaced by Instant? Ie, here:

/// Timestamp at which the block was built (number of non-leap-nanoseconds since January 1, 1970 0:00:00 UTC).
pub timestamp: u64,
/// Hash of the next epoch block producers set

we definitely want timestamp (so, SystemTime), because we want to coordinate this value across different machines. Though, I'd say we should be using SystemTime, and not chrono, as the source of time there.

But for, eg

pub first_seen: u64,
pub last_seen: u64,
}

it seems that we want to use Instant, as that is machine-local state.

@matklad
Copy link
Contributor

matklad commented Nov 2, 2021

Oh, while looking at it, I found a case where we very well may panic:

let now = Utc::now();
let mut to_remove = vec![];
for (peer_id, peer_status) in self.peer_states.iter() {
let diff = (now - peer_status.last_seen()).to_std()?;
if peer_status.status != KnownPeerStatus::Connected

Here, the ? will return an Error if the difference is negative. But Utc::now() can return time in the past, and that can happen during the normal operation, is system time is adjusted just before the call. Practically, I think systems try relatively hard to make system time monotonic, but that's best effort, and can definitely break in practice if you, eg, manually adjust the clock.

If this code returns an error, we will panic at the call-site, where we unwrap the thing.

@pmnoxx
Copy link
Contributor Author

pmnoxx commented Nov 2, 2021

@matklad Some data structures that we store in DB have timestamps. That's the main reason why we can't use instant.


#[derive(Debug, Clone)]
pub struct KnownPeerState {
    pub peer_info: PeerInfo,
    pub status: KnownPeerStatus,
    pub first_seen: u64,
    pub last_seen: u64,
}

    pub fn peer_connected(
        &mut self,
        peer_info: &PeerInfo,
    ) -> Result<(), Box<dyn std::error::Error>> {
        self.add_trusted_peer(peer_info.clone(), TrustLevel::Signed)?;
        let entry = self.peer_states.get_mut(&peer_info.id).unwrap();
        entry.last_seen = to_timestamp(Utc::now());
        entry.status = KnownPeerStatus::Connected;
        let mut store_update = self.store.store_update();
        store_update.set_ser(ColPeers, &peer_info.id.try_to_vec()?, entry)?;
        store_update.commit().map_err(|err| err.into())
    }

@pmnoxx
Copy link
Contributor Author

pmnoxx commented Nov 2, 2021

For some reason UNIX_EPOCH is available as SystemTime, but not as Instant.

Yeah, that's what I am getting at: Instant doesn't have have this functionality in a fundamental way, and that is a feature. Basically, there are three kinds of time in computers:

  • Number of seconds since hardware timer started. This is monotonic time / std::time::Instant. It is reliably tracked by hardware, allows for measuring time intervals within the process precisely, but can't tell anything about time in the absolute terms. Hardware timer can't know how much time elapsed in total since the unix epoch. This time never goes backwards.
  • Number of seconds since unix epoch. This is absolute time / std::time::SystemTime. This can't be tracked purely by the hardware (as the clock would drift). The way it works is there's a system-level state which stores "current time the last time it was adjusted", and the hardware adds a relative time to that. The config can be changed by a human, or by other computers. Usually, when the computer boots, and then periodically, it sends network requests to time servers asking for time, and then trusts this info. This kind of time is significantly more complicated than the previous one -- the time can go backwards after upgrade, or the computer can be tricked into thinking wrong about time, if time servers are compromised. One extra bit of complexity is that this time, most of the time, is roughly the same on different computers, but this isn't reliable: there's always at least some amount of small drifts, and occasionally big diff due to misconfigurations.
  • Calendar date/wall clock time in the room where the computer is situated. This is chrono::Instant<TimeZone>. This time depends on the "time since epoch", plus the current physical location of the computer (to get the time zone), plus the time zone database (as the meaning of timezone itself changes over time, when, eg, government decides that daylight savings time is not a great idea, and then changes its mind back couple of years later). This time has all the complexity of SystemTime plus a lot of edge-cases related to human's notion of timezone plus fun things like leap seconds.

I feel relatively strongly that we need to stick to the simplest model of time we can, that is, to Instant or, if we must, to SystemTime.

but we need to be able to convert time, for example to unix timestamp and back

Yeah, if we need that, we do need SystemTime in those parts that use such functionality. Could you point where we need this, though? I want to understand whether we fundamentally need to know absolute time, or whether this is something which can be replaced by Instant? Ie, here:

/// Timestamp at which the block was built (number of non-leap-nanoseconds since January 1, 1970 0:00:00 UTC).
pub timestamp: u64,
/// Hash of the next epoch block producers set

we definitely want timestamp (so, SystemTime), because we want to coordinate this value across different machines. Though, I'd say we should be using SystemTime, and not chrono, as the source of time there.

This struct is part of our protocol. We need to serialize it and send it over the network. SystemTime doesn't implement BorshSerialize, BorshDeserialize, Serialize; therefore it can't be used.


#[derive(BorshSerialize, BorshDeserialize, Serialize, Debug, Clone, Eq, PartialEq)]
pub struct BlockHeaderInnerLite {
...
    /// Timestamp at which the block was built (number of non-leap-nanoseconds since January 1, 1970 0:00:00 UTC).
    pub timestamp: u64,
...
}```

Although I agree, we shouldn't have timestamp as part of our protocol. 

@matklad
Copy link
Contributor

matklad commented Nov 3, 2021

Some data structures that we store in DB have timestamps

Doh, I suspected something like this, and specifically looked for it, and somehow still missed #[BorsSerialize] 🤦 Thanks for pointing this out.

Yeah, so this is the case where we need SystemTime. We still don't need chrono here, as we don't care about timezones.

SystemTime doesn't implement BorshSerialize, BorshDeserialize, Serialize; therefore it can't be used.

SystemTime does implement Serialize:

https://docs.rs/serde/1.0.130/src/serde/ser/impls.rs.html#611-625

It can implement BorshSerialize as well, although that would need to use u128, rather than u64, as Durations can hold more than u64 nanos, and panicking/truncating would be wrong for the borsh-style library. It seems that we are just using u64

So my current understanding is:

  • for in-process parts which don't involve serialization/IPC we should prefer monotonic time Instant.
  • for parts which are stored, send to other processes or serialized in other way, we should prefer SystemTime, but be careful with monotonicity assumptions
  • we don't need timezones, so we don't need chrono.

@matklad matklad mentioned this pull request Nov 3, 2021
pmnoxx added a commit that referenced this pull request Nov 20, 2021
* Replace chrono::now() with Instant::now()

* refactor

* refactor

Co-authored-by: Piotr Mikulski <piotr@near.org>
@pmnoxx pmnoxx added A-network Area: Network C-enhancement Category: An issue proposing an enhancement or a PR with one. C-housekeeping Category: Refactoring, cleanups, code quality P-low Priority: low labels Dec 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-network Area: Network C-enhancement Category: An issue proposing an enhancement or a PR with one. C-housekeeping Category: Refactoring, cleanups, code quality P-low Priority: low
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stop using non monotous chrono::DateTime
4 participants