Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite3: optionally autoconvert table_info's DATETIME fields #79326

Closed
jondo mannequin opened this issue Nov 2, 2018 · 15 comments
Closed

sqlite3: optionally autoconvert table_info's DATETIME fields #79326

jondo mannequin opened this issue Nov 2, 2018 · 15 comments
Labels
3.8 (EOL) end of life stdlib Python modules in the Lib dir topic-sqlite3 type-feature A feature request or enhancement

Comments

@jondo
Copy link
Mannequin

jondo mannequin commented Nov 2, 2018

BPO 35145
Nosy @berkerpeksag, @matrixise, @pganssle, @jondo
Files
  • SQLiteStudio-3449.sqlite
  • 2017v2-reduced.db
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-11-02.12:16:24.101>
    labels = ['3.8', 'type-bug', 'library']
    title = "sqlite3: optionally autoconvert table_info's DATETIME fields"
    updated_at = <Date 2019-07-25.11:30:28.046>
    user = '/~https://github.com/jondo'

    bugs.python.org fields:

    activity = <Date 2019-07-25.11:30:28.046>
    actor = 'jondo'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2018-11-02.12:16:24.101>
    creator = 'jondo'
    dependencies = []
    files = ['47905', '47911']
    hgrepos = []
    issue_num = 35145
    keywords = []
    message_count = 14.0
    messages = ['329128', '329129', '329157', '329214', '329225', '329226', '329227', '329291', '329309', '329319', '329350', '330393', '335544', '348435']
    nosy_count = 5.0
    nosy_names = ['ghaering', 'berker.peksag', 'matrixise', 'p-ganssle', 'jondo']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'resolved'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue35145'
    versions = ['Python 3.8']

    @jondo
    Copy link
    Mannequin Author

    jondo mannequin commented Nov 2, 2018

    Currently, fields are converted to datetime as described in https://docs.python.org/3/library/sqlite3.html#sqlite3.PARSE_COLNAMES :

    'select x as "x [datetime]" from table'

    In my use case I don't know the names and locations of the datetime fields in advance. So I would need to do pandas.read_sql_query('select * from table', con) and get correct datetime columns.
    (My current workaround is try calling pandas.to_datetime on each text column afterwards.)

    The type info has to be available in the sqlite database, because I see that SQLiteStudio correctly detects the datetime columns.

    @jondo jondo mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Nov 2, 2018
    @jondo
    Copy link
    Mannequin Author

    jondo mannequin commented Nov 2, 2018

    (In fact, I am currently taking the first non-missing entry of each text column and trying to dateutil.parser.parse it. If that works, I use pandas.to_datetime on the column.)

    @pganssle
    Copy link
    Member

    pganssle commented Nov 2, 2018

    According to the sqlite documentation, there's no fundamental datetime type in sqlite: https://www.sqlite.org/datatype3.html#date_and_time_datatype

    SQLite does not have a storage class set aside for storing dates
    and/or times. Instead, the built-in Date And Time Functions of SQLite
    are capable of storing dates and times as TEXT, REAL, or INTEGER values
    

    If you have an arbitrary database whose schema you don't know, I'm not sure it would be possible to automatically determine that it's a datetime, though it appears that Python already provides this functionality by exposing the converters "date" and "timestamp" ( https://docs.python.org/3/library/sqlite3.html#default-adapters-and-converters )

    If you don't know the schema you can't be 100% accurate on which columns are datetime, but apparently datetime types that are text will be of the format "YYYY-MM-DD HH:MM:SS.SSS", which is a variant of iso8601, REAL columns will be Julian day numbers and integers will be epoch time.

    If you assume that all your datetime columns will be TEXT and that any TEXT column that happens to be a valid date of is a datetime column, then you can either use:

        datetime.strftime(text_column, "%Y-%m-%d %H:%M:%S.%f")

    Or if you want to be faster and less strict (this will allow several other variations on ISO 8601):

        datetime.fromisoformat(text_column)

    I would not recommend using dateutil.parser.parse, as the dateutil parser is intended for taking something you know to be a string representing a datetime and getting you a datetime object from it. It is not designed to tell you whether something is or is not a datetime.

    @jondo
    Copy link
    Mannequin Author

    jondo mannequin commented Nov 4, 2018

    Thank you, Paul, for your hints on sniffing. I have now asked at SqliteStudio for confirmation that they also do content sniffing to detect datetime text fields: pawelsalawa/sqlitestudio#3449 .

    So in this issue I am suggesting to embed this datetime text sniffing into sqlite3, for optional activation. Perhaps a new flag value of detect_types=DETECT_DATETIME to the sqlite3.connect() function would be suitable?

    @jondo jondo mannequin changed the title sqlite3: "select *" should autoconvert datetime fields sqlite3: "select *" should optionally autoconvert TEXT datetime fields Nov 4, 2018
    @matrixise
    Copy link
    Member

    For my part, we could close this issue just because I am sure they sniff the format of the string.
    If you use sqliteman you get a TEXT and not "datetime"

    You can read this doc about the read_sql https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html

    Now, sqlite3 does not support the datetime type and I think it is not the job of Python to do this operation but the caller (your code) but you can see the example in Doc/includes/sqlite3/pysqlite_datetime.py

    import sqlite3
    import datetime
    
    con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
    cur = con.cursor()
    cur.execute("create table test(d date, ts timestamp)")
    
    today = datetime.date.today()
    now = datetime.datetime.now()
    
    cur.execute("insert into test(d, ts) values (?, ?)", (today, now))
    cur.execute("select d, ts from test")
    row = cur.fetchone()
    print(today, "=>", row[0], type(row[0]))
    print(now, "=>", row[1], type(row[1]))
    
    cur.execute('select current_date as "d [date]", current_timestamp as "ts [timestamp]"')
    row = cur.fetchone()
    print("current_date", row[0], type(row[0]))
    print("current_timestamp", row[1], type(row[1]))
    2018-11-04 => 2018-11-04 <class 'datetime.date'>
    2018-11-04 12:58:01.399669 => 2018-11-04 12:58:01.399669 <class 'datetime.datetime'>
    current_date 2018-11-04 <class 'datetime.date'>
    current_timestamp 2018-11-04 11:58:01 <class 'datetime.datetime'>

    @jondo
    Copy link
    Mannequin Author

    jondo mannequin commented Nov 4, 2018

    Stéphane, your example does not match my use case with unknown table structure.

    Does everyone agree that such a sniffing is out of scope for sqlite3?
    That's not what I'd call "batteries included" :-(

    @jondo jondo mannequin changed the title sqlite3: "select *" should optionally autoconvert TEXT datetime fields sqlite3: "select *" should optionally sniff and autoconvert TEXT datetime fields Nov 4, 2018
    @matrixise
    Copy link
    Member

    ok in this case, I re-open the issue.

    @matrixise matrixise reopened this Nov 4, 2018
    @jondo
    Copy link
    Mannequin Author

    jondo mannequin commented Nov 5, 2018

    I have created the attached sqlite database file (with SQLiteStudio) for testing. I contains a table t with a TEXT column x and a DATETIME (according to SQLiteStudio) column y. The table contains a single row with the value 2018-11-05 12:20:30 for both columns.

    For some reason SQLiteStudio manages to know these types (although the columns contain the same value, and even after renaming the file and reconnecting to it), so sqlite3 should also be able to detect them.

    @pganssle
    Copy link
    Member

    pganssle commented Nov 5, 2018

    With regards to automatically deducing column types, I am -1 on that. It has way too many dangerous edge cases, and I know of countless bugs in software that are the direct result of stringly-typed data being coerced into a specific data type based on its form. For example, if you have an excel column of alphanumeric strings and happen to get one that looks like "13943E1", that will be coerced (lossily) into the number 139430.0, because it happens to take the form of exponential notation.

    I contains a table t with a TEXT column x and a DATETIME (according to SQLiteStudio) column y.

    It sounds to me like SQLiteStudio is doing the same thing that Python is doing, by extending the "types" available to include a DATETIME. Presumably they do it the same way, with "datetime" in the column name. If that's the case then it's just a mismatch between what they call their datetime adapter and what python does, and you just need to rename the relevant columns to say "timestamp" instead of "datetime".

    As an aside, this is what we get from people not offering real datetime types in their standards. JSON has the exact same problem - people need to transmit datetimes but there's no way to do it in the standard, so everyone extends the standard in a sightly different way and you end up with a bunch of incompatible ways of storing datetimes as strings. Bah humbug.

    @jondo
    Copy link
    Mannequin Author

    jondo mannequin commented Nov 5, 2018

    Paul, the sniffing would be only active for people who explicitly add a connect() argument like detect_types=DETECT_DATETIME, which can have the appropriate warning in the docs. You can also extend the sniffing to all values, not just the first non-missing one. (I'd gladly pay the computational price.) How many cases are there where a column is full of YYYY-MM-DD HH:MM:SS.SSS data, DETECT_DATETIME is switched on, and the user doesn't want this column to be interpreted as datetime?

    (I'm of course not suggesting to try detecting REAL as Julian day, or INTEGER as Unix Time.)

    Forget about my test file, by the way:
    I have now found out that I can extract the DATETIME type information from my test file SQLiteStudio-3449.sqlite in the following two ways:

    In [3]: pd.read_sql_query('pragma table_info("t")', con)
    Out[3]: 
       cid name      type  notnull dflt_value  pk
    0    0    x      TEXT        0       None   0
    1    1    y  DATETIME        0       None   0
    
    In [4]: pd.read_sql_query('SELECT SQL FROM sqlite_master WHERE name = "t"', con)
    Out[4]: 
                                       sql
    0  CREATE TABLE t (x TEXT, y DATETIME)
    

    For my real use case however, those two statements yield empty results :-(

    @jondo
    Copy link
    Mannequin Author

    jondo mannequin commented Nov 6, 2018

    Here is now a reduced version of my real use case: 2017v2-reduced.db. The file was originally created with "Oracle DBConvert". This file contains a table without rows.
    I was wrong in my last comment: There's also no sniffing needed here, see either of the following useful outputs:

    In [5]: pd.read_sql_query('PRAGMA table_info("t2")', con)['type']
    Out[5]: 
    0       DATETIME
    1    VARCHAR (3)
    Name: type, dtype: object
    
    In [6]: pd.read_sql_query('SELECT SQL FROM sqlite_master WHERE name = "t2"', con).iloc[0,0]
    Out[6]: 'CREATE TABLE t2 (localtime DATETIME DEFAULT NULL, freq VARCHAR (3))'
    

    This also works with the original file from "Oracle DBConvert".

    Wouldn't it make sense for sqlite3 to use this information, e.g. when connect() is called with something like detect_types=CONVERT_DATETIME? One could even call the option CONVERT_ORACLE_DATETIME to describe that one cannot expect it to work with files from other sources.

    If yes, then I suggest changing this issue's title to '"SELECT *" should optionally autoconvert DATETIME fields if found in DDL'.

    The question is of course how much sqlite generating software creates this metadata. But Oracle is certainly an important actor. And sqlite3 itself could also save this DDL instead of extending the field names (which seems more hacky).

    @jondo jondo mannequin changed the title sqlite3: "select *" should optionally sniff and autoconvert TEXT datetime fields sqlite3: optionally autoconvert table_info's DATETIME fields Nov 18, 2018
    @jondo
    Copy link
    Mannequin Author

    jondo mannequin commented Nov 25, 2018

    So my workaround is of course

    table = pd.read_sql_query('select * from table', con)
    column_type = pd.read_sql_query('PRAGMA table_info("table")', con)['type']
    datetimes = table.columns[column_type == 'DATETIME']
    table.loc[:, datetimes] = table.loc[:, datetimes].apply(pd.to_datetime)
    

    I would be interested in creating a pull request for the detect_types=CONVERT_DATETIME argument that I described. Is there any chance to get that merged? I currently see 541 open cpython pull requests labeled "awaiting review" :-/

    @matrixise
    Copy link
    Member

    Hi @jondo,

    Of course you can submit a PR and we can help you with a review if we have time because we are volunteers and it's in function of our free time.

    Have a nice day,

    @jondo
    Copy link
    Mannequin Author

    jondo mannequin commented Jul 25, 2019

    Sorry for not providing a pull request yet.

    By the way, the "DB Browser for SQLite" (https://sqlitebrowser.org/) also interprets DATETIME fields correctly.

    (Could someone please change the "Stage" of this issue to "needs patch"?)

    @jondo jondo mannequin added 3.8 (EOL) end of life and removed 3.7 (EOL) end of life labels Jul 25, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @erlend-aasland erlend-aasland added topic-sqlite3 type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels May 16, 2022
    @erlend-aasland erlend-aasland moved this to Decision needed in sqlite3 issues May 21, 2022
    @erlend-aasland
    Copy link
    Contributor

    As per discussion on Discourse, we will deprecate the built-in, default adapters and converters. Thus, this issue is now superseded by gh-90016.

    @erlend-aasland erlend-aasland closed this as not planned Won't fix, can't repro, duplicate, stale May 25, 2022
    @erlend-aasland erlend-aasland moved this from Done to Discarded in sqlite3 issues May 25, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life stdlib Python modules in the Lib dir topic-sqlite3 type-feature A feature request or enhancement
    Projects
    Status: Discarded
    Development

    No branches or pull requests

    3 participants