Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External database access unification (jdbc/odbc bridges) #3210

Merged
merged 11 commits into from
Oct 9, 2018
Merged

External database access unification (jdbc/odbc bridges) #3210

merged 11 commits into from
Oct 9, 2018

Conversation

alex-krash
Copy link
Contributor

@alex-krash alex-krash commented Sep 25, 2018

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

The purpose of this PR is to implement connect to external databases via odbc/jdbc bridges, spawned as separate daemons.
I am working in parallel on JDBC-bridge here - /~https://github.com/alex-krash/clickhouse-jdbc-bridge

The final result of this PR would be like:

  1. Table functions "odbc" and "jdbc" exists
  2. Engines "ODBC" and "JDBC" exists
  3. (optional) support for JDBC dictionaries implemented

While working on this PR, I've figured out following problems, which I would like to solve as well:

  1. Existing ODBC bridge forces using of double-quotes as escape sequence for identifiers. It breaks compatibility with MySQL, particularly. I propose to add API method to odbc/jdbc bridge, that will return value of escape character (or add it to existsing /column_info handler, and rename it)
  2. Nullability support abscence. For JDBC bridge I've implemented it (see below), maybe we should implement it for ODBC as well?

Example of jdbc interaction with MySQL:

mysql> CREATE TABLE `test`.`test` (
    ->   `int_id` INT NOT NULL AUTO_INCREMENT,
    ->   `int_nullable` INT NULL DEFAULT NULL,
    ->   `float` FLOAT NOT NULL,
    ->   `float_nullable` FLOAT NULL DEFAULT NULL,
    ->   `double` DOUBLE NOT NULL,
    ->   `double_nullable` DOUBLE NULL DEFAULT NULL,
    ->   `timestamp` TIMESTAMP NOT NULL,
    ->   `timestamp_nullable` TIMESTAMP NULL DEFAULT NULL,
    ->   `date` DATE NOT NULL,
    ->   `date_nullable` DATE NULL DEFAULT NULL,
    ->   PRIMARY KEY (`int_id`));
Query OK, 0 rows affected (0,09 sec)

mysql> insert into test (`int_id`, `float`, `double`, `timestamp`, `date`) VALUES (1,2,3, '2018-01-02 03:04:05', '2019-05-06');Query OK, 1 row affected (0,00 sec)

mysql> select * from test;
+--------+--------------+-------+----------------+--------+-----------------+---------------------+--------------------+------------+---------------+
| int_id | int_nullable | float | float_nullable | double | double_nullable | timestamp           | timestamp_nullable | date       | date_nullable |
+--------+--------------+-------+----------------+--------+-----------------+---------------------+--------------------+------------+---------------+
|      1 |         NULL |     2 |           NULL |      3 |            NULL | 2018-01-02 03:04:05 | NULL               | 2019-05-06 | NULL          |
+--------+--------------+-------+----------------+--------+-----------------+---------------------+--------------------+------------+---------------+
1 row in set (0,00 sec)

From clickhouse-client:

DESCRIBE TABLE jdbc('jdbc:mysql://localhost:3306/?user=root&password=root', 'test', 'test')

┌─name───────────────┬─type───────────────┬─default_type─┬─default_expression─┐
│ int_id             │ Int32              │              │                    │
│ int_nullable       │ Nullable(Int32)    │              │                    │
│ float              │ Float32            │              │                    │
│ float_nullable     │ Nullable(Float32)  │              │                    │
│ double             │ Float64            │              │                    │
│ double_nullable    │ Nullable(Float64)  │              │                    │
│ timestamp          │ DateTime           │              │                    │
│ timestamp_nullable │ Nullable(DateTime) │              │                    │
│ date               │ Date               │              │                    │
│ date_nullable      │ Nullable(Date)     │              │                    │
└────────────────────┴────────────────────┴──────────────┴────────────────────┘

10 rows in set. Elapsed: 0.031 sec.

SELECT *
FROM jdbc('jdbc:mysql://localhost:3306/?user=root&password=root', 'test', 'test') 

┌─int_id─┬─int_nullable─┬─float─┬─float_nullable─┬─double─┬─double_nullable─┬───────────timestamp─┬─timestamp_nullable─┬───────date─┬─date_nullable─┐
│      1 │         ᴺᵁᴸᴸ │     2 │           ᴺᵁᴸᴸ │      3 │            ᴺᵁᴸᴸ │ 2018-01-02 03:04:05 │               ᴺᵁᴸᴸ │ 2019-05-06 │          ᴺᵁᴸᴸ │
└────────┴──────────────┴───────┴────────────────┴────────┴─────────────────┴─────────────────────┴────────────────────┴────────────┴───────────────┘

1 rows in set. Elapsed: 0.055 sec.

Alexander Krasheninnikov and others added 2 commits September 25, 2018 03:21
@alex-krash
Copy link
Contributor Author

Currently, new implementation of ODBC, via unified bridge interface is registered via "idbc" name, in order to being able to perform comparison between them.

@alexey-milovidov
Copy link
Member

I propose to add API method to odbc/jdbc bridge, that will return value of escape character

Ok.

maybe we should implement it for ODBC as well?

Yes. There is no reason, why it is not implemented.

Currently, new implementation of ODBC, via unified bridge interface is registered via "idbc" name, in order to being able to perform comparison between them.

You can replace old ODBC to your implementation as soon as integration test passes.

@alex-krash alex-krash changed the title [WIP] External database access unification (jdbc/odbc bridges) External database access unification (jdbc/odbc bridges) Oct 1, 2018
@alex-krash
Copy link
Contributor Author

Looks like I've finished with this PR.

  1. A new API endpoint /identifier_quote has been added to odbc bridge. Now queries are escaped with appropriate character, and should work for MySQL ODBC (when segfault there disappears :) )
  2. A support for Nullable was not implemented due to conflict with ODBCBlockInputStream and ExternalResultDescription (looks like nullability support there is something out of current PR, and can have significant side-effects):
DESCRIBE TABLE odbc('DSN=test', '', 'test')

┌─name─┬─type────────────┬─default_type─┬─default_expression─┐
│ id   │ Int32           │              │                    │
│ two  │ Nullable(Int32) │              │                    │
└──────┴─────────────────┴──────────────┴────────────────────┘

2 rows in set. Elapsed: 0.076 sec. 

krash :) select * from odbc('DSN=test', '', 'test')

SELECT *
FROM odbc('DSN=test', '', 'test') 

Received exception from server (version 18.14.0):
Code: 86. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Received error from remote server /?connection_string=DSN%3Dtest&columns=columns%20format%20version%3A%201%0A2%20columns%3A%0A%60id%60%20Int32%0A%60two%60%20Nullable%28Int32%29%0A&max_block_size=65536. HTTP status code: 500 Internal Server Error, body: ��Code: 50, e.displayText() = DB::Exception: Unsupported type Nullable(Int32), e.what() = DB::Exception, 
  1. Unfortunately, I've failed to run integration tests (did not start, python/Docket/whatever problems)
  2. I propose to review the code, and merge it, turning off jdbc() function and JDBC engine, till clickhouse-jdbc-bridge is ready for production.

dbms/src/Common/XDBCBridgeHelper.h Outdated Show resolved Hide resolved
docs/ru/query_language/table_functions/jdbc.md Outdated Show resolved Hide resolved
dbms/src/TableFunctions/ITableFunctionXDBC.h Outdated Show resolved Hide resolved
std::string identifier = "";

SQLSMALLINT t;
SQLRETURN r = POCO_SQL_ODBC_CLASS::SQLGetInfo(hdbc, SQL_IDENTIFIER_QUOTE_CHAR, NULL, 0, &t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we know that the size of SQL_IDENTIFIER_QUOTE_CHAR is one or zero bytes. So maybe here we can allocate buffer (identifier) and request ODBC only once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to keep it consistent with bloody ODBC-interface, and not make too tricky and fragile.

dbms/src/Common/XDBCBridgeHelper.h Outdated Show resolved Hide resolved
dbms/src/TableFunctions/ITableFunctionXDBC.cpp Outdated Show resolved Hide resolved
dbms/src/Storages/StorageXDBC.cpp Outdated Show resolved Hide resolved
dbms/src/Common/XDBCBridgeHelper.h Outdated Show resolved Hide resolved
dbms/programs/odbc-bridge/ColumnInfoHandler.cpp Outdated Show resolved Hide resolved
dbms/programs/odbc-bridge/ColumnInfoHandler.cpp Outdated Show resolved Hide resolved
@alesapin
Copy link
Member

alesapin commented Oct 3, 2018

Style Erros:

dbms/src/TableFunctions/ITableFunctionXDBC.h:38: const std::string & connection_string_) const override { 
dbms/src/TableFunctions/ITableFunctionXDBC.h:55: const std::string & connection_string_) const override { 
dbms/src/Common/XDBCBridgeHelper.h:238:struct ODBCBridgeMixin {

Build logs:
https://pastebin.com/2iGXGYpJ

@alesapin
Copy link
Member

alesapin commented Oct 4, 2018

Build still fails with errors: https://pastebin.com/EeVHeViY

@alex-krash
Copy link
Contributor Author

@alesapin , sorry, my bad.
I don't have reproduce for this error while build.

@alesapin
Copy link
Member

alesapin commented Oct 4, 2018

Ok, thank you. Now travis build fails (you can see error in "Details" link) and two ODBC integration tests fail.
https://pastebin.com/FuMwJTy4

@alex-krash
Copy link
Contributor Author

Got it, thanks! Will spend more time on trying to launch integrations.

@alesapin
Copy link
Member

alesapin commented Oct 4, 2018

Maybe it's possible to recognize error from stack trace:

E           QueryRuntimeException: Client failed! Return code: 232, stderr: Received exception from server (version 18.14.2):
E           Code: 1000. DB::Exception: Received from 172.18.0.3:9000. DB::Exception: Not found: dictionary.source.connection_string. Stack trace:
E          
E           0. /usr/bin/clickhouse(StackTrace::StackTrace()+0x13) [0x6873223]
E           1. /usr/bin/clickhouse(DB::TCPHandler::runImpl()+0xfbe) [0x2d6192e]
E           2. /usr/bin/clickhouse(DB::TCPHandler::run()+0x1a) [0x2d68cfa]
E           3. /usr/bin/clickhouse(Poco::Net::TCPServerConnection::start()+0xc) [0x6976acc]
E           4. /usr/bin/clickhouse(Poco::Net::TCPServerDispatcher::run()+0x118) [0x6976f98]
E           5. /usr/bin/clickhouse(Poco::PooledThread::run()+0xad) [0x6a3dfad]
E           6. /usr/bin/clickhouse(Poco::ThreadImpl::runnableEntry(void*)+0x3a) [0x6a3b72a]
E           7. /usr/bin/clickhouse() [0x705cdcf]
E           8. /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f8568228184]
E           9. /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8567a4703d]

Intergration tests instruction /~https://github.com/yandex/ClickHouse/blob/master/dbms/tests/integration/README.md

@alexey-milovidov alexey-milovidov merged commit 1472e3a into ClickHouse:master Oct 9, 2018
@alexey-milovidov
Copy link
Member

alexey-milovidov commented Oct 9, 2018

Nullability support abscence. For JDBC bridge I've implemented it (see below), maybe we should implement it for ODBC as well?

We want to add the support for Nullable types in ODBC and MySQL storages.

The logic will be, for MySQL and ODBC table engine:

  • convert NULL to default value, if it is not Nullable in ClickHouse table;
  • read NULL value if it is Nullable in ClickHouse table.

For mysql and odbc table functions:

  • introduce a setting to use Nullable types (by default) or to convert everything to non-Nullable (current behaviour).

@alexey-milovidov
Copy link
Member

As I understand, it is disabled right now.
Nevertheless, I've merged this PR to make further development easier.

Things to do:

  • enable this feature;
  • prepare a manual on how to start using it after installing ClickHouse:
    -- how to install JDK;
    -- how to install JDBC bridge;
    -- how to install example database with JDBC driver (maybe it will be ClickHouse itself);
  • add integration test.

@alexey-milovidov
Copy link
Member

We should think how to make this feature available for users with absolute minimum installation steps.

  • create a jar with all dependencies inside?
  • package it in selfcontained application (is it possible)?
  • add it to the clickhouse-common-static package?

@alex-krash
Copy link
Contributor Author

alex-krash commented Oct 10, 2018

@alexey-milovidov , I've wrapped JDBC bridge into .deb package.
/~https://github.com/alex-krash/clickhouse-jdbc-bridge#installation
Maybe we should move this repo to Yandex one, and distribute as clickhouse-jdbc-bridge ?

Application consists of single jar with all the dependencies, and requires JDK to start this jar.

@alex-krash alex-krash deleted the jdbc-bridge branch October 10, 2018 14:55
@alexey-milovidov
Copy link
Member

We want to add the support for Nullable types in ODBC and MySQL storages.
The logic will be, for MySQL and ODBC table engine:
convert NULL to default value, if it is not Nullable in ClickHouse table;
read NULL value if it is Nullable in ClickHouse table.

This is implemented in #3362

For mysql and odbc table functions:
introduce a setting to use Nullable types (by default) or to convert everything to non-Nullable (current behaviour).

This is to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants