sqlite3: optionally autoconvert table_info's DATETIME fields #79326

jondo · 2018-11-02T12:16:24Z

BPO	35145
Nosy	@berkerpeksag, @matrixise, @pganssle, @jondo
Files	SQLiteStudio-3449.sqlite 2017v2-reduced.db

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2018-11-02.12:16:24.101>
labels = ['3.8', 'type-bug', 'library']
title = "sqlite3: optionally autoconvert table_info's DATETIME fields"
updated_at = <Date 2019-07-25.11:30:28.046>
user = '/~https://github.com/jondo'

bugs.python.org fields:

activity = <Date 2019-07-25.11:30:28.046>
actor = 'jondo'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2018-11-02.12:16:24.101>
creator = 'jondo'
dependencies = []
files = ['47905', '47911']
hgrepos = []
issue_num = 35145
keywords = []
message_count = 14.0
messages = ['329128', '329129', '329157', '329214', '329225', '329226', '329227', '329291', '329309', '329319', '329350', '330393', '335544', '348435']
nosy_count = 5.0
nosy_names = ['ghaering', 'berker.peksag', 'matrixise', 'p-ganssle', 'jondo']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'resolved'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue35145'
versions = ['Python 3.8']

jondo · 2018-11-02T12:16:24Z

Currently, fields are converted to datetime as described in https://docs.python.org/3/library/sqlite3.html#sqlite3.PARSE_COLNAMES :

'select x as "x [datetime]" from table'

In my use case I don't know the names and locations of the datetime fields in advance. So I would need to do pandas.read_sql_query('select * from table', con) and get correct datetime columns.
(My current workaround is try calling pandas.to_datetime on each text column afterwards.)

The type info has to be available in the sqlite database, because I see that SQLiteStudio correctly detects the datetime columns.

jondo · 2018-11-02T12:25:15Z

(In fact, I am currently taking the first non-missing entry of each text column and trying to dateutil.parser.parse it. If that works, I use pandas.to_datetime on the column.)

pganssle · 2018-11-02T20:00:12Z

According to the sqlite documentation, there's no fundamental datetime type in sqlite: https://www.sqlite.org/datatype3.html#date_and_time_datatype

SQLite does not have a storage class set aside for storing dates
and/or times. Instead, the built-in Date And Time Functions of SQLite
are capable of storing dates and times as TEXT, REAL, or INTEGER values

If you have an arbitrary database whose schema you don't know, I'm not sure it would be possible to automatically determine that it's a datetime, though it appears that Python already provides this functionality by exposing the converters "date" and "timestamp" ( https://docs.python.org/3/library/sqlite3.html#default-adapters-and-converters )

If you don't know the schema you can't be 100% accurate on which columns are datetime, but apparently datetime types that are text will be of the format "YYYY-MM-DD HH:MM:SS.SSS", which is a variant of iso8601, REAL columns will be Julian day numbers and integers will be epoch time.

If you assume that all your datetime columns will be TEXT and that any TEXT column that happens to be a valid date of is a datetime column, then you can either use:

    datetime.strftime(text_column, "%Y-%m-%d %H:%M:%S.%f")

Or if you want to be faster and less strict (this will allow several other variations on ISO 8601):

    datetime.fromisoformat(text_column)

I would not recommend using dateutil.parser.parse, as the dateutil parser is intended for taking something you know to be a string representing a datetime and getting you a datetime object from it. It is not designed to tell you whether something is or is not a datetime.

jondo · 2018-11-04T00:13:41Z

Thank you, Paul, for your hints on sniffing. I have now asked at SqliteStudio for confirmation that they also do content sniffing to detect datetime text fields: pawelsalawa/sqlitestudio#3449 .

So in this issue I am suggesting to embed this datetime text sniffing into sqlite3, for optional activation. Perhaps a new flag value of detect_types=DETECT_DATETIME to the sqlite3.connect() function would be suitable?

matrixise · 2018-11-04T12:00:04Z

For my part, we could close this issue just because I am sure they sniff the format of the string.
If you use sqliteman you get a TEXT and not "datetime"

You can read this doc about the read_sql https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html

Now, sqlite3 does not support the datetime type and I think it is not the job of Python to do this operation but the caller (your code) but you can see the example in Doc/includes/sqlite3/pysqlite_datetime.py

import sqlite3
import datetime

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = con.cursor()
cur.execute("create table test(d date, ts timestamp)")

today = datetime.date.today()
now = datetime.datetime.now()

cur.execute("insert into test(d, ts) values (?, ?)", (today, now))
cur.execute("select d, ts from test")
row = cur.fetchone()
print(today, "=>", row[0], type(row[0]))
print(now, "=>", row[1], type(row[1]))

cur.execute('select current_date as "d [date]", current_timestamp as "ts [timestamp]"')
row = cur.fetchone()
print("current_date", row[0], type(row[0]))
print("current_timestamp", row[1], type(row[1]))

2018-11-04 => 2018-11-04 <class 'datetime.date'>
2018-11-04 12:58:01.399669 => 2018-11-04 12:58:01.399669 <class 'datetime.datetime'>
current_date 2018-11-04 <class 'datetime.date'>
current_timestamp 2018-11-04 11:58:01 <class 'datetime.datetime'>

jondo · 2018-11-04T13:25:07Z

Stéphane, your example does not match my use case with unknown table structure.

Does everyone agree that such a sniffing is out of scope for sqlite3?
That's not what I'd call "batteries included" :-(

matrixise · 2018-11-04T13:49:35Z

ok in this case, I re-open the issue.

jondo · 2018-11-05T11:49:15Z

I have created the attached sqlite database file (with SQLiteStudio) for testing. I contains a table t with a TEXT column x and a DATETIME (according to SQLiteStudio) column y. The table contains a single row with the value 2018-11-05 12:20:30 for both columns.

For some reason SQLiteStudio manages to know these types (although the columns contain the same value, and even after renaming the file and reconnecting to it), so sqlite3 should also be able to detect them.

pganssle · 2018-11-05T16:56:49Z

With regards to automatically deducing column types, I am -1 on that. It has way too many dangerous edge cases, and I know of countless bugs in software that are the direct result of stringly-typed data being coerced into a specific data type based on its form. For example, if you have an excel column of alphanumeric strings and happen to get one that looks like "13943E1", that will be coerced (lossily) into the number 139430.0, because it happens to take the form of exponential notation.

I contains a table t with a TEXT column x and a DATETIME (according to SQLiteStudio) column y.

It sounds to me like SQLiteStudio is doing the same thing that Python is doing, by extending the "types" available to include a DATETIME. Presumably they do it the same way, with "datetime" in the column name. If that's the case then it's just a mismatch between what they call their datetime adapter and what python does, and you just need to rename the relevant columns to say "timestamp" instead of "datetime".

As an aside, this is what we get from people not offering real datetime types in their standards. JSON has the exact same problem - people need to transmit datetimes but there's no way to do it in the standard, so everyone extends the standard in a sightly different way and you end up with a bunch of incompatible ways of storing datetimes as strings. Bah humbug.

jondo · 2018-11-05T21:47:22Z

Paul, the sniffing would be only active for people who explicitly add a connect() argument like detect_types=DETECT_DATETIME, which can have the appropriate warning in the docs. You can also extend the sniffing to all values, not just the first non-missing one. (I'd gladly pay the computational price.) How many cases are there where a column is full of YYYY-MM-DD HH:MM:SS.SSS data, DETECT_DATETIME is switched on, and the user doesn't want this column to be interpreted as datetime?

(I'm of course not suggesting to try detecting REAL as Julian day, or INTEGER as Unix Time.)

Forget about my test file, by the way:
I have now found out that I can extract the DATETIME type information from my test file SQLiteStudio-3449.sqlite in the following two ways:

In [3]: pd.read_sql_query('pragma table_info("t")', con)
Out[3]: 
   cid name      type  notnull dflt_value  pk
0    0    x      TEXT        0       None   0
1    1    y  DATETIME        0       None   0

In [4]: pd.read_sql_query('SELECT SQL FROM sqlite_master WHERE name = "t"', con)
Out[4]: 
                                   sql
0  CREATE TABLE t (x TEXT, y DATETIME)

For my real use case however, those two statements yield empty results :-(

jondo · 2018-11-06T09:43:06Z

Here is now a reduced version of my real use case: 2017v2-reduced.db. The file was originally created with "Oracle DBConvert". This file contains a table without rows.
I was wrong in my last comment: There's also no sniffing needed here, see either of the following useful outputs:

In [5]: pd.read_sql_query('PRAGMA table_info("t2")', con)['type']
Out[5]: 
0       DATETIME
1    VARCHAR (3)
Name: type, dtype: object

In [6]: pd.read_sql_query('SELECT SQL FROM sqlite_master WHERE name = "t2"', con).iloc[0,0]
Out[6]: 'CREATE TABLE t2 (localtime DATETIME DEFAULT NULL, freq VARCHAR (3))'

This also works with the original file from "Oracle DBConvert".

Wouldn't it make sense for sqlite3 to use this information, e.g. when connect() is called with something like detect_types=CONVERT_DATETIME? One could even call the option CONVERT_ORACLE_DATETIME to describe that one cannot expect it to work with files from other sources.

If yes, then I suggest changing this issue's title to '"SELECT *" should optionally autoconvert DATETIME fields if found in DDL'.

The question is of course how much sqlite generating software creates this metadata. But Oracle is certainly an important actor. And sqlite3 itself could also save this DDL instead of extending the field names (which seems more hacky).

jondo · 2018-11-25T19:47:29Z

So my workaround is of course

table = pd.read_sql_query('select * from table', con)
column_type = pd.read_sql_query('PRAGMA table_info("table")', con)['type']
datetimes = table.columns[column_type == 'DATETIME']
table.loc[:, datetimes] = table.loc[:, datetimes].apply(pd.to_datetime)

I would be interested in creating a pull request for the detect_types=CONVERT_DATETIME argument that I described. Is there any chance to get that merged? I currently see 541 open cpython pull requests labeled "awaiting review" :-/

matrixise · 2019-02-14T16:48:24Z

Hi @jondo,

Of course you can submit a PR and we can help you with a review if we have time because we are volunteers and it's in function of our free time.

Have a nice day,

jondo · 2019-07-25T11:30:28Z

Sorry for not providing a pull request yet.

By the way, the "DB Browser for SQLite" (https://sqlitebrowser.org/) also interprets DATETIME fields correctly.

(Could someone please change the "Stage" of this issue to "needs patch"?)

erlend-aasland · 2022-05-25T18:39:26Z

As per discussion on Discourse, we will deprecate the built-in, default adapters and converters. Thus, this issue is now superseded by gh-90016.

jondo mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Nov 2, 2018

jondo mannequin changed the title ~~sqlite3: "select *" should autoconvert datetime fields~~ sqlite3: "select *" should optionally autoconvert TEXT datetime fields Nov 4, 2018

matrixise closed this as completed Nov 4, 2018

jondo mannequin changed the title ~~sqlite3: "select *" should optionally autoconvert TEXT datetime fields~~ sqlite3: "select *" should optionally sniff and autoconvert TEXT datetime fields Nov 4, 2018

matrixise reopened this Nov 4, 2018

jondo mannequin changed the title ~~sqlite3: "select *" should optionally sniff and autoconvert TEXT datetime fields~~ sqlite3: optionally autoconvert table_info's DATETIME fields Nov 18, 2018

jondo mannequin added 3.8 (EOL) end of life and removed 3.7 (EOL) end of life labels Jul 25, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

erlend-aasland added topic-sqlite3 type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels May 16, 2022

erlend-aasland added this to sqlite3 issues May 21, 2022

erlend-aasland moved this to Decision needed in sqlite3 issues May 21, 2022

erlend-aasland removed the status in sqlite3 issues May 21, 2022

erlend-aasland closed this as not planned Won't fix, can't repro, duplicate, stale May 25, 2022

erlend-aasland moved this to Done in sqlite3 issues May 25, 2022

erlend-aasland moved this from Done to Discarded in sqlite3 issues May 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sqlite3: optionally autoconvert table_info's DATETIME fields #79326

sqlite3: optionally autoconvert table_info's DATETIME fields #79326

jondo mannequin commented Nov 2, 2018

jondo mannequin commented Nov 2, 2018

jondo mannequin commented Nov 2, 2018

pganssle commented Nov 2, 2018

jondo mannequin commented Nov 4, 2018

matrixise commented Nov 4, 2018

jondo mannequin commented Nov 4, 2018

matrixise commented Nov 4, 2018

jondo mannequin commented Nov 5, 2018

pganssle commented Nov 5, 2018

jondo mannequin commented Nov 5, 2018

jondo mannequin commented Nov 6, 2018

jondo mannequin commented Nov 25, 2018

matrixise commented Feb 14, 2019

jondo mannequin commented Jul 25, 2019

erlend-aasland commented May 25, 2022

sqlite3: optionally autoconvert table_info's DATETIME fields #79326

sqlite3: optionally autoconvert table_info's DATETIME fields #79326

Comments

jondo mannequin commented Nov 2, 2018

jondo mannequin commented Nov 2, 2018

jondo mannequin commented Nov 2, 2018

pganssle commented Nov 2, 2018

jondo mannequin commented Nov 4, 2018

matrixise commented Nov 4, 2018

jondo mannequin commented Nov 4, 2018

matrixise commented Nov 4, 2018

jondo mannequin commented Nov 5, 2018

pganssle commented Nov 5, 2018

jondo mannequin commented Nov 5, 2018

jondo mannequin commented Nov 6, 2018

jondo mannequin commented Nov 25, 2018

matrixise commented Feb 14, 2019

jondo mannequin commented Jul 25, 2019

erlend-aasland commented May 25, 2022