-
-
Notifications
You must be signed in to change notification settings - Fork 31.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlite3: optionally autoconvert table_info's DATETIME fields #79326
Comments
Currently, fields are converted to datetime as described in https://docs.python.org/3/library/sqlite3.html#sqlite3.PARSE_COLNAMES : 'select x as "x [datetime]" from table' In my use case I don't know the names and locations of the datetime fields in advance. So I would need to do pandas.read_sql_query('select * from table', con) and get correct datetime columns. The type info has to be available in the sqlite database, because I see that SQLiteStudio correctly detects the datetime columns. |
(In fact, I am currently taking the first non-missing entry of each text column and trying to dateutil.parser.parse it. If that works, I use pandas.to_datetime on the column.) |
According to the sqlite documentation, there's no fundamental datetime type in sqlite: https://www.sqlite.org/datatype3.html#date_and_time_datatype
If you have an arbitrary database whose schema you don't know, I'm not sure it would be possible to automatically determine that it's a datetime, though it appears that Python already provides this functionality by exposing the converters "date" and "timestamp" ( https://docs.python.org/3/library/sqlite3.html#default-adapters-and-converters ) If you don't know the schema you can't be 100% accurate on which columns are datetime, but apparently datetime types that are text will be of the format "YYYY-MM-DD HH:MM:SS.SSS", which is a variant of iso8601, REAL columns will be Julian day numbers and integers will be epoch time. If you assume that all your datetime columns will be TEXT and that any TEXT column that happens to be a valid date of is a datetime column, then you can either use: datetime.strftime(text_column, "%Y-%m-%d %H:%M:%S.%f") Or if you want to be faster and less strict (this will allow several other variations on ISO 8601): datetime.fromisoformat(text_column) I would not recommend using |
Thank you, Paul, for your hints on sniffing. I have now asked at SqliteStudio for confirmation that they also do content sniffing to detect datetime text fields: pawelsalawa/sqlitestudio#3449 . So in this issue I am suggesting to embed this datetime text sniffing into sqlite3, for optional activation. Perhaps a new flag value of detect_types=DETECT_DATETIME to the sqlite3.connect() function would be suitable? |
For my part, we could close this issue just because I am sure they sniff the format of the string. You can read this doc about the read_sql https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html Now, sqlite3 does not support the datetime type and I think it is not the job of Python to do this operation but the caller (your code) but you can see the example in Doc/includes/sqlite3/pysqlite_datetime.py import sqlite3
import datetime
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = con.cursor()
cur.execute("create table test(d date, ts timestamp)")
today = datetime.date.today()
now = datetime.datetime.now()
cur.execute("insert into test(d, ts) values (?, ?)", (today, now))
cur.execute("select d, ts from test")
row = cur.fetchone()
print(today, "=>", row[0], type(row[0]))
print(now, "=>", row[1], type(row[1]))
cur.execute('select current_date as "d [date]", current_timestamp as "ts [timestamp]"')
row = cur.fetchone()
print("current_date", row[0], type(row[0]))
print("current_timestamp", row[1], type(row[1])) 2018-11-04 => 2018-11-04 <class 'datetime.date'>
2018-11-04 12:58:01.399669 => 2018-11-04 12:58:01.399669 <class 'datetime.datetime'>
current_date 2018-11-04 <class 'datetime.date'>
current_timestamp 2018-11-04 11:58:01 <class 'datetime.datetime'> |
Stéphane, your example does not match my use case with unknown table structure. Does everyone agree that such a sniffing is out of scope for sqlite3? |
ok in this case, I re-open the issue. |
I have created the attached sqlite database file (with SQLiteStudio) for testing. I contains a table For some reason SQLiteStudio manages to know these types (although the columns contain the same value, and even after renaming the file and reconnecting to it), so sqlite3 should also be able to detect them. |
With regards to automatically deducing column types, I am -1 on that. It has way too many dangerous edge cases, and I know of countless bugs in software that are the direct result of stringly-typed data being coerced into a specific data type based on its form. For example, if you have an excel column of alphanumeric strings and happen to get one that looks like "13943E1", that will be coerced (lossily) into the number 139430.0, because it happens to take the form of exponential notation.
It sounds to me like SQLiteStudio is doing the same thing that Python is doing, by extending the "types" available to include a DATETIME. Presumably they do it the same way, with "datetime" in the column name. If that's the case then it's just a mismatch between what they call their datetime adapter and what python does, and you just need to rename the relevant columns to say "timestamp" instead of "datetime". As an aside, this is what we get from people not offering real datetime types in their standards. JSON has the exact same problem - people need to transmit datetimes but there's no way to do it in the standard, so everyone extends the standard in a sightly different way and you end up with a bunch of incompatible ways of storing datetimes as strings. Bah humbug. |
Paul, the sniffing would be only active for people who explicitly add a connect() argument like detect_types=DETECT_DATETIME, which can have the appropriate warning in the docs. You can also extend the sniffing to all values, not just the first non-missing one. (I'd gladly pay the computational price.) How many cases are there where a column is full of (I'm of course not suggesting to try detecting REAL as Julian day, or INTEGER as Unix Time.) Forget about my test file, by the way:
For my real use case however, those two statements yield empty results :-( |
Here is now a reduced version of my real use case: 2017v2-reduced.db. The file was originally created with "Oracle DBConvert". This file contains a table without rows.
This also works with the original file from "Oracle DBConvert". Wouldn't it make sense for sqlite3 to use this information, e.g. when connect() is called with something like If yes, then I suggest changing this issue's title to '"SELECT *" should optionally autoconvert DATETIME fields if found in DDL'. The question is of course how much sqlite generating software creates this metadata. But Oracle is certainly an important actor. And sqlite3 itself could also save this DDL instead of extending the field names (which seems more hacky). |
So my workaround is of course
I would be interested in creating a pull request for the |
Hi @jondo, Of course you can submit a PR and we can help you with a review if we have time because we are volunteers and it's in function of our free time. Have a nice day, |
Sorry for not providing a pull request yet. By the way, the "DB Browser for SQLite" (https://sqlitebrowser.org/) also interprets DATETIME fields correctly. (Could someone please change the "Stage" of this issue to "needs patch"?) |
As per discussion on Discourse, we will deprecate the built-in, default adapters and converters. Thus, this issue is now superseded by gh-90016. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: