fix: to_datetime in Pandas 2 #24952
Codecov Report
@@              Coverage Diff              @@
##           master   #24952       +/-   ##
===========================================
- Coverage   69.03%   58.51%   -10.53%
===========================================
  Files        1905     1905
  Lines       74136    74136
  Branches     8212     8212
===========================================
- Hits        51181    43380     -7801
- Misses      20832    28633     +7801
  Partials     2123     2123
Flags with carried forward coverage won't be shown.
... and 293 files with indirect coverage changes
2.0.3 and the behavior of ``pd.to_datetime`` changed.
"""
df = pd.DataFrame({"__time": ["2017-07-01T00:00:00.000Z"]})
assert (
@betodealmeida why is this check needed? As expected there's nothing up Pandas's sleeve when you create the DataFrame.
It's not needed; it's just there to show what's in the DataFrame to anyone reading this test. I'm happy to remove it.
Thanks @betodealmeida for the fix. Apart from the possibly unnecessary assert, this LGTM.
"possibly"? You mean "totally". 😆
@john-bodley I removed the 3.0 label because the Pandas upgrade is not in 3.0.
(cherry picked from commit 41ca4a0)
🏷️ preset:2023.31
SUMMARY
The recent upgrade of Pandas from 1.5.3 to 2.0.3 broke some features. In this PR we fix a regression where a column with a full ISO timestamp (`2017-07-01T00:00:00.000Z`) can't be configured with the Python date format `"%Y-%m-%d"`, because in Pandas 2.0 the format must match the whole string by default. I changed the relevant calls to `to_datetime` to pass `exact=False`, preserving the behavior from Pandas 1.5.3.
One additional problem is that we had `errors="coerce"` in the call, which returns `NaT` for any value that fails to parse. I changed it to `"raise"`, since I think it's a better experience: otherwise the user has no idea why their data is missing.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Added a test covering the regression.
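For reference, here is a minimal sketch of what regression checks for both `to_datetime` changes could look like, assuming pandas 2.x. It is not the test added in this PR: `parse_time_column` and `raises_value_error` are hypothetical names, and plain asserts are used so the sketch runs with or without pytest.

```python
from typing import Callable

import pandas as pd


def parse_time_column(series: pd.Series, python_date_format: str) -> pd.Series:
    # Hypothetical helper (not Superset's actual code) mirroring the call
    # described in the summary: exact=False lets the format match part of
    # the string, and errors="raise" reports bad values instead of
    # silently turning them into NaT.
    return pd.to_datetime(
        series, format=python_date_format, exact=False, errors="raise"
    )


def raises_value_error(func: Callable[[], object]) -> bool:
    # Plain-Python stand-in for pytest.raises(ValueError).
    try:
        func()
    except ValueError:
        return True
    return False


def test_iso_timestamp_with_date_only_format() -> None:
    series = pd.Series(["2017-07-01T00:00:00.000Z"])

    # On pandas 2.x the default exact=True requires "%Y-%m-%d" to consume
    # the whole string, so the trailing "T00:00:00.000Z" makes this fail.
    assert raises_value_error(lambda: pd.to_datetime(series, format="%Y-%m-%d"))

    # With exact=False the date part is matched and the row parses.
    parsed = parse_time_column(series, "%Y-%m-%d")
    assert parsed.dt.year.tolist() == [2017]
    assert parsed.dt.month.tolist() == [7]
    assert parsed.dt.day.tolist() == [1]


def test_unparseable_value_is_reported() -> None:
    series = pd.Series(["not a date"])

    # errors="coerce" hides the problem: the value silently becomes NaT.
    coerced = pd.to_datetime(
        series, format="%Y-%m-%d", exact=False, errors="coerce"
    )
    assert coerced.isna().all()

    # errors="raise" surfaces it as an exception instead.
    assert raises_value_error(lambda: parse_time_column(series, "%Y-%m-%d"))
```

Checking year, month, and day rather than comparing against a full `Timestamp` keeps the first check independent of whether the parsed value carries the `Z` offset.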
ADDITIONAL INFORMATION