-
Notifications
You must be signed in to change notification settings - Fork 556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HELP] 0.3.2 pre-release for public testing #768
Comments
Hello, @zhicwu
|
Thank you for testing the new driver and reporting the issue here. I'll try to get it fixed in next build. Actually I like the way DBeaver handle Update: |
This is one of the crucial parts of the usage:
|
Looks like a combination of |
It seems that the Java client is currently relying on thread pools underneath to implement its async API, I mean, instead of being fully async at its core. Is it tentative or intended? Also if I make a request to a node and try to make another request w/ the same client to a different node, then depending on the timing, the former call may fail since the second request may close the underlying connection too early. In that regard, I'm not sure what's the intended usage pattern of the Java client API. For instance, if I want to make a few requests to ClickHouse in the context of an incoming HTTP call, then what am I supposed to reuse across the requests to a ClickHouse node? ClickHouseClient? ClickHouseRequest? UPDATE: I had a look at |
Thanks @dynaxis, these are all good points.
Unfortunately this is intended, because JDBC driver is built on top of the client, meaning we prefer least dependency and we still need to support JDK 8. I hope we can find somewhere in the middle - a compact lib to serve very basic functions for both JDBC and R2DBC drivers.
Yes, you can reuse ClickHouseRequest. Each time you call its execute()/send() method, it will create a sealed copy for the execution, which is similar as copy-on-write data structure for thread safety. On the other, ClickHouseClient is responsible for handling protocol-specific details like how to execute a request and get response. Taking http as an example, depending on whether the concrete http connection(e.g. HttpURLConnection) is reusable, it may suggest to create new connection for each request or simply reuse the same one. |
|
Did you mean the version on |
public static void main(String[] args) throws Exception {
String url = "jdbc:clickhouse://localhost:8123/default";
ClickHouseDataSource dataSource = new ClickHouseDataSource(url);
try (ClickHouseConnection conn = dataSource.getConnection()) {
DatabaseMetaData meta = conn.getMetaData();
ResultSet rs = meta.getTables("foo", null, "%", new String[] {"TABLE", "VIEW", "FOREIGN TABLE", "MATERIALZED VIEW"});
ResultSetMetaData rsMeta = rs.getMetaData();
while (rs.next()) {
for (int i = 0; i < rsMeta.getColumnCount(); i++) {
System.out.println(rsMeta.getColumnName(i+1) + ": " + rs.getString(i+1));
}
}
}
} |
I'm not sure this is the right place to comment on the current pre-release. Anyway, how about making |
Yes, I added some missing parts and made changes as well to make the driver more JDBC-compliant. I'll try to emphasize the change in release notes - all in all, 0.3.2 is not a drop-in replacement of previous version, despite its version(just trying to stick with the roadmap). Apart from that, as you may notice, more table types were added. I was kind of hoping DBeaver can support that like in SQuirrel SQL(see dbeaver/dbeaver#14773 ). |
Yes it's the right place and thanks for the feedback.
I thought about this a few times as well. In the very beginning, there's a What's your case of making the method public? If it's about issuing more queries, you can try something like |
@zhicwu I hope to create and close a client around every call to a method where actual queries are performed. That is, the reference is required to close the client as a clean up. My intention is to pick a new ClickHouse node for every incoming HTTP request for the purpose of load balancing, where possibly multiple queries are performed. For that, I need to create a new client for each incoming HTTP request. If you are uncomfortable w/ making getClient public, it's enough to have a public method for closing the associated client from a ClickHouseRequest. I guess you may consider a request's holding a client is an implementation detail, but at least to me, it seems a detail that is unlikely to change. |
Building, TestingThanks @zhicwu for helping me to get the compile / tests running again. For reference:
<toolchains>
<toolchain>
<type>jdk</type>
<provides>
<version>11</version>
</provides>
<configuration>
<jdkHome>/usr/lib/jvm/java-11-openjdk</jdkHome>
</configuration>
</toolchain>
<toolchain>
<type>jdk</type>
<provides>
<version>8</version>
</provides>
<configuration>
<jdkHome>/usr/lib/jvm/java-8-jdk</jdkHome>
</configuration>
</toolchain>
<toolchain>
<type>jdk</type>
<provides>
<version>17</version>
</provides>
<configuration>
<jdkHome>/usr/lib/jvm/java-17-openjdk</jdkHome>
</configuration>
</toolchain>
</toolchains>
DatabaseMetaDataI created issue #778 to review the use. I think that there aren't many clients relying on this information, but we should not make stuff up. |
Thanks for the explanation. Are those multiple queries issued in same thread or separated ones? This is an example for the former case. The latter one can be tackled with chained Futures in a similar way. IMO, it's safer and easier to close the client once from outside, instead of passing the responsibility to request(s). It sounds like soon we'll need to maintain a reference counter to ensure the client can be only closed after all relevant requests completed. Maybe it's the similar reason of why in JDBC 3 more items I can think of for the discussion:
|
@zhicwu I got your point on not wanting to expose the client reference held in a request. Actually the method performing multiple queries is like a Spring MVC handler (I'm currently using Micronaut), and I wanted to inject So I'm turning to another implementation strategy where the I have a few points to make on the items you listed above:
|
Sorry for the inconvinence at your end. I hope we can refine the API in 0.3.3 and mitigate situation like this.
In 0.3.2 I removed public modifier of the class as it's buggy and incomplete. I'll rewrite it in 0.3.3 and need to test against multiple nodes(same cluster or not).
Very true. To be honest, I'm still struggling with whether a client should be able to connect to different nodes. In a cluster environment, each time you call
The simpler the better. Let me see what I can do in 0.3.3 - maybe support both scenarios? |
Background
v0.3.2 was a minor release scheduled to be released months ago, but now it's a complete rewrite mainly for two reasons:
decoupling(see Proposal of restructuring code (RFC) #570 for details)
switching data format to RowBinary to fix issues and improve performance
Benchmark results...
0.3.2-test1...
clickhouse-grpc-jdbc
andclickhouse-http-jdbc
are new JDBC driver(0.3.2) usingRowBinary
data formatclickhouse-jdbc
is the old JDBC driver(0.3.1-patch) based onTabSeparated
clickhouse-native-jdbc
is ClickHouse-Native-JDBC 2.6.0Benchmark settings: thread=1, sampleSize=100000, fetchSize=10000, mode=throughput(ops/s).
0.3.2-test3...
Unlike previous round of testing, ClickHouse container is re-created a few minutes before benchmarking each driver.
Note: HttpClient is async(uses more than one thread in runtime); gRPC uses gzip(why?) which is slower than lz4.
Note: on client side, the new driver consumes less memory and CPU than others, BUT higher CPU on server side(due to overhead of http protocol?).
Comparison
VM utilization
0.3.2...
Query performance is similar as shown in 0.3.2-test3 so this time we only focus on insertion.
Note: gRPC does not support LZ4 compression so we use GZIP in the test.
0.3.2-test1, 0.3.2-test2, and 0.3.2-test3 are pre-release for public testing.
Downloads
Maven dependency:
To download JDBC drivers:
Note: the first two are recommended. grpc is experimental so you'd better use http.
Known Issues
java.io.IOException: HTTP/1.1 header parser received no bytes
when using JDK 11+ andhttp_connection_provider
is set toHTTP_CLIENT
RESOURCE_EXHAUSTED: Compressed gRPC message exceeds maximum size
- increasemax_inbound_message_size
to resolveselect 1 format JSON
works in http but not grpc, because grpc client is not aware of response formatinsert into table values(?, ?)
is slow in batch mode - tryinsert into table select c2,c3 from input('c1 String, c2 UInt8, c3 Nullable(UInt32)')
insteaduse_time_zone
anduse_server_time_zone_for_dates
properties do not workjdbc(*)
databaseKey Changes
jdbc:ch://localhost
is same asjdbc:clickhouse:http://localhost:8123
jdbc:ch:grpc://localhost/db
is same asjdbc:clickhouse:grpc://localhost:9100/db
com.clickhouse.jdbc.ClickHouseDriver
(will removeru.yandex.clickhouse.ClickHouseDriver
starting from 0.4.0)custom_http_headers
andcustom_http_params
for customization - won't work for grpc clientjdbcCompliant
(defaults to true) to support fake transaction and standard synchronous UPDATE and DELETE statementstypeMappings
to customize type mapping(e.g.DateTime=java.lang.String,DateTime32=java.lang.String
)Some more details can be found at #736, #747, #769, and #777.
The text was updated successfully, but these errors were encountered: