Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent datetime.*.strftime() year padding behavior across %y, %Y, %F and %C #122272

Closed
mgorny opened this issue Jul 25, 2024 · 21 comments
Closed
Assignees
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes type-bug An unexpected behavior, bug, or error

Comments

@mgorny
Copy link
Contributor

mgorny commented Jul 25, 2024

Bug report

Bug description:

#120713 changed the way %y and %Y format works with strftime() to pad "short" years with leading zeros. However, the %F and %C formats were left alone, creating inconsistency:

>>> from datetime import date
>>> date(909, 9, 9).strftime("%Y")
'0909'
>>> date(909, 9, 9).strftime("%y")
'09'
>>> date(909, 9, 9).strftime("%C")
'9'
>>> date(909, 9, 9).strftime("%F")
'909-09-09'

While I don't have a strong opinion about %C, the change to %Y now means that %Y-%m-%d no longer matches %F.

Notably, this change broke the logic in Django that assumed that if %Y returns a zero-padded year number, no padding needs to be done for all formats:

/~https://github.com/django/django/blob/0e94f292cda632153f2b3d9a9037eb0141ae9c2e/django/utils/formats.py#L266-L273

I will report a bug there as well, but I think it would be beneficial to preserve consistency here.

Django bug: https://code.djangoproject.com/ticket/35630

CC @blhsing @martinvuyk

CPython versions tested on:

3.13, CPython main branch

Operating systems tested on:

Linux

Linked PRs

@mgorny mgorny added the type-bug An unexpected behavior, bug, or error label Jul 25, 2024
@martinvuyk
Copy link

Hi,

>>> date(909, 9, 9).strftime("%y")
'09'

seems consistent with %y | Year without century as a zero-padded decimal number. | 00, 01, …, 99 from the docs

I can't find any mention of "%C" or "%F" on the docs though

@mgorny
Copy link
Contributor Author

mgorny commented Jul 25, 2024

The doc has:

The following is a list of all the format codes that the 1989 C standard requires, and these work on all platforms with a standard C implementation.

%F is C99, and %C is POSIX. It's supported by a vast majority (if not all) platforms supported by CPython, and projects (such as Django) clearly are relying on them.

@martinvuyk
Copy link

martinvuyk commented Jul 25, 2024

I'm not a contributor or a member of the Python community really so I don't know their rules etc. So take this as only my words.

To me this seems like padding implementation problems for the libc standard, and if it's not in the documentation it shouldn't be assumed to be supported. And if you go around what is explicitly supported and the underlying C implementation is wrong it shouldn't be padded over, otherwise we could be discussing issues of wrong implementations of the C standards forever for every edge case where a developer decided to do it their own way and bypass what is explicitly supported by Python.

And in the specific case of %F and %C, that functionality is covered by %Y and %y as you yourself pointed out. And realistically people that use Python only use the latter; also, many years have passed and nobody reported a bug with %F or %C. I only found the problem for %Y because I was testing for bugs for my own implementation in another language and I was using Python as a reference (using years less than a 1000 is very uncommon).

@mgorny
Copy link
Contributor Author

mgorny commented Jul 25, 2024

To me this seems like padding implementation problems for the libc standard, and if it's not in the documentation it shouldn't be assumed to be supported. And if you go around what is explicitly supported and the underlying C implementation is wrong it shouldn't be padded over, otherwise we could be discussing issues of wrong implementations of the C standards forever for every edge case where a developer decided to do it their own way and bypass what is explicitly supported by Python.

This argument is not internally consistent. Either you believe Python should just follow the C89 standard, in which case it shouldn't be padding things over, or it shouldn't, in which case there is no reason not to be consistent. The way things are right now is the worst outcome possible, it goes against the principle of the least surprise, it was changed very late in the beta testing phase of Python 3.13 and it actually breaks high-profile projects (I dare say Django is one).

And in the specific case of %F and %C, that functionality is covered by %Y and %y as you yourself pointed out.

I don't see how %C is covered by them.

And realistically people that use Python only use the latter;

This is a bold claim that you're making. "No bug reported" is hardly any evidence. Did you actually, say, grep Python projects from PyPI? AFAIR they had archives of "top N projects" for that purpose.

also, many years have passed and nobody reported a bug with %F or %C. I only found the problem for %Y because I was testing for bugs for my own implementation in another language and I was using Python as a reference (using years less than a 1000 is very uncommon).

As the linked Django code points out, the problem has been reported back in 2011.

@thesamesam
Copy link
Contributor

cc @serhiy-storchaka and @erlend-aasland as reviewers too

@martinvuyk
Copy link

I dare say Django is one

Yes, Django is important.

I don't see how %C is covered by them.

Yeah, misread some docs. It's different

This is a bold claim that you're making. "No bug reported" is hardly any evidence. Did you actually, say, grep Python projects from PyPI? AFAIR they had archives of "top N projects" for that purpose.

Tone down your way of writing. This is a civilized place. I did not, I could have.

Anyway, In My Oppinion It is not in the Python docs. It is not explicitly supported.

And this code is the exact example of what I mean that users of a language should patch when they bypass what is explicitly stated to be supported and use platform specific hacks. It quite literally says it in the docstrings, this is a Django issue since they were patching over Python errors. Python fixed one of those errors, they should just patch their code plain and simple (In My Oppinion).

def sanitize_strftime_format(fmt):
    """
    Ensure that certain specifiers are correctly padded with leading zeros.

    For years < 1000 specifiers %C, %F, %G, and %Y don't work as expected for
    strftime provided by glibc on Linux as they don't pad the year or century
    with leading zeros. Support for specifying the padding explicitly is
    available, however, which can be used to fix this issue.

    FreeBSD, macOS, and Windows do not support explicitly specifying the
    padding, but return four digit years (with leading zeros) as expected.

    This function checks whether the %Y produces a correctly padded string and,
    if not, makes the following substitutions:

    - %C → %02C
    - %F → %010F
    - %G → %04G
    - %Y → %04Y

    See https://bugs.python.org/issue13305 for more details.
    """
    if datetime.date(1, 1, 1).strftime("%Y") == "0001":
        return fmt
    mapping = {"C": 2, "F": 10, "G": 4, "Y": 4}
    return re.sub(
        r"((?:^|[^%])(?:%%)*)%([CFGY])",
        lambda m: r"%s%%0%s%s" % (m[1], mapping[m[2]], m[2]),
        fmt,
    )

@eli-schwartz
Copy link
Contributor

Tone down your way of writing. This is a civilized place. I did not, I could have.

Can you clarify what you feel is uncivilized enough about that comment to be worth reprimanding someone for inappropriate behavior? Maybe I'm missing something.

@eli-schwartz
Copy link
Contributor

eli-schwartz commented Jul 25, 2024

Notably, this change broke the logic in Django that assumed that if %Y returns a zero-padded year number, no padding needs to be done for all formats:

/~https://github.com/django/django/blob/0e94f292cda632153f2b3d9a9037eb0141ae9c2e/django/utils/formats.py#L266-L273

I will report a bug there as well, but I think it would be beneficial to preserve consistency here.

It looks like the change was also backported to 3.12 which means it will be breaking Django in a point release of an existing stable python version as well? This seems a lot more worrisome than a change in behavior "late in a beta cycle".

@martinvuyk
Copy link

Can you clarify what you feel is uncivilized enough about that comment to be worth reprimanding someone for inappropriate behavior? Maybe I'm missing something.

This is a bold claim that you're making. "No bug reported" is hardly any evidence.

Easy to say and then provide no contrary evidence for saying something that is not part of the Python documentation is often used. Just throw an argumentative fallacy out there.

Did you actually, say, grep Python projects from PyPI?

Unnecessary sarcastic remark , say, for something that he also clearly didn't do. Otherwise he would've used the results as evidence.

Or maybe I'm reading too much into it ?

It looks like the change was also backported to 3.12 which means it will be breaking Django in a point release of an existing stable python version as well? This seems a lot more worrisome than a change in behavior "late in a beta cycle".

I don't know what Python releases entail, and I don't know the extent and how you cover platform specific compatibility for issues like this, as I clearly stated that it's my opinion and I'm not privy to the Python community's ways. But yeah it is important to revert this change if this breaks things in production for 3.12

@eli-schwartz
Copy link
Contributor

Or maybe I'm reading too much into it ?

Yes, you're reading too much into it. He's using a single result that he uncovered as evidence that it happened at least once, and you responded to an existing report of people doing it, by saying people don't realistically do it.

There is an existing report of breakage. It's not an "argumentative fallacy" to claim that the burden of proof is on people who want to make a change (in this case: who want to modify the padding behavior as present in a decade of released python versions, to behave differently in 3.13) to prove that the change is good -- the status quo never has the burden of proof.

But that's really not my point. Calling someone uncivilized is something you should only do when you are quite certain the accusation is correct. Calling someone uncivilized and then being wrong with your accusation, is, well, "uncivilized behavior" and "false accusations". I would feel more comfortable if people don't accuse others of being uncivilized if they are concerned they may be reading too much into a phrase.

I don't think "made use of a logical fallacy", on its own, constitutes sufficient evidence (whether it is or isn't a fallacy), and I don't think reading into a synonym for the phrase "for example" constitutes sufficient evidence either.

@eli-schwartz
Copy link
Contributor

eli-schwartz commented Jul 25, 2024

But yeah it is important to revert this change if this breaks things in production for 3.12

It doesn't do so yet, because it was backported almost a month after the release of 3.12.4, so we have until 3.12.5 (roughly two weeks) before this hits django users in the wild (and also: distro packaging unittests for django).

@martinvuyk
Copy link

Yes, you're reading too much into it.

Very well put together, thank you. I shouldn't have used that word or assumed so much, sorry.

we have until 3.12.5 (roughly two weeks) before this hits django users in the wild

I see PR #121145 touches quite a few files and I've never contributed to this repo and don't have the time to get deep into it this week or weekend (no idea if you have a revert mechanism either). Do you know how to @eli-schwartz of do we need to wait for @blhsing ?

@serhiy-storchaka
Copy link
Member

Thank you for your report, @mgorny. This is definitely omission.

%F should be equivalent to %Y-%m-%d by definition. %C in documented in the Linux manpage as "The century number (year/100) as a 2-digit integer." and in the Posix documentation as "Replaced by the year divided by 100 and truncated to an integer, as a decimal number [00,99]." So both should be zero-padded. The date utility also outputs both with leading zeroes:

$ date -d 1-2-3 +%F
0001-02-03
$ date -d 1-2-3 +%C
00

What should we do practically?

  • For 3.12, it is safer to revert the change. It no longer looks so simple and harmless. This is an old behavior, so these that care already have workarounds. Later we should apply documentation changes, add notes that the behavior is platform depended.
  • For main we should complete the change. In long term it is better if the behavior is consistent between platforms and users do not need to create workarounds. This change should be considered as a new feature and advertised correspondingly: versionchanged directives, What's New entry.
  • As for 3.13, I am not sure. This change can still be considered a 3.13 feature or be postponed to 3.14.

cc @pablogsal, @Yhg1s

@serhiy-storchaka serhiy-storchaka added release-blocker 3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Jul 25, 2024
@serhiy-storchaka serhiy-storchaka self-assigned this Jul 25, 2024
@eli-schwartz
Copy link
Contributor

@serhiy-storchaka thanks for the excellent summary of practical things to do. I have a small inclination to prefer postponing it to 3.14 since we are already past the very last beta ever -- the next cpython release will be release candidate 1, and is supposed to ideally be fully representative of the exact bytes installed by the final release. I think it's a bad idea to make further changes that need to be tested, but can only be tested in the release candidate, since at that point we kinda lost the chance to make any further changes without messing people up.

If this hadn't been the very last beta then maybe it would be worth trying one last time to fix it for 3.13. But I would say that in between the final beta and the first release candidate, is when anything that's still an issue should be reverted for the sake of having the best release candidate we can.

@blhsing
Copy link
Contributor

blhsing commented Jul 26, 2024

Many thanks to @mgorny for the report and to @serhiy-storchaka for the excellent summary and for taking on the task. Sorry for replying late, but I'll gladly fix it too if you haven't already started working on it.

@encukou
Copy link
Member

encukou commented Jul 29, 2024

@serhiy-storchaka, will you be able to fix this by tomorrow's 3.13.0rc1?

@serhiy-storchaka
Copy link
Member

Sure.

@serhiy-storchaka
Copy link
Member

Sorry for replying late, but I'll gladly fix it too if you haven't already started working on it.

Go ahead! After reverting the changes in 3.12 and 3.13 we will not have a time pressure.

@blhsing
Copy link
Contributor

blhsing commented Jul 30, 2024

Go ahead! After reverting the changes in 3.12 and 3.13 we will not have a time pressure.

Sure. I'll proceed with a fix then.

@blhsing
Copy link
Contributor

blhsing commented Jul 30, 2024

@serhiy-storchaka Can you help review the PR? Thanks!

Note that I don't think we should explicitly document the support for %C and %F unless we plan on supporting all C99 specifiers. In this PR I've made datetime.strftime handle %C and %F only when the C library supports them.

@encukou
Copy link
Member

encukou commented Jul 30, 2024

With the 3.13 & 3.12 reverts, this is not a release blocker any more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes type-bug An unexpected behavior, bug, or error
Projects
Archived in project
Development

No branches or pull requests

7 participants