RoaringBitmap support #596

Merged
merged 13 commits into from
Mar 23, 2021

Conversation

zhicwu
Contributor

@zhicwu zhicwu commented Mar 21, 2021

Object                  Read                                                      Write
RoaringBitmap                                                                     Y
ImmutableRoaringBitmap  as RoaringBitmap                                          Y
MutableRoaringBitmap    as RoaringBitmap                                          Y
Roaring64Bitmap         as Roaring64NavigableMap and only when cardinality <= 32  as Roaring64NavigableMap
Roaring64NavigableMap   only when cardinality <= 32                               Y

Usage

// use JDBC interface - NOT recommended before 0.3.1
try (PreparedStatement statement = connection.prepareStatement("insert into my_bitmap_table values(..., ?, ...)")) {
    ...
    // RoaringBitmap bitmap = RoaringBitmap.bitmapOf(1,2,3,...);
    statement.setObject(index++, ClickHouseBitmap.wrap(bitmap, ClickHouseDataType.UInt32));
    ...
    // the actual SQL in 0.3.0 will be something like below, which is also why batch insertion does not work...
    // insert into my_bitmap_table values(..., bitmapBuild([toUInt32(1),toUInt32(2),toUInt32(3),...]) ...)
    statement.execute();
}
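
To make the size concern concrete, here is a minimal sketch of how the SQL text grows with bitmap cardinality when values are inlined as in the comment above. `BitmapSqlSketch` and `toBitmapBuildExpr` are hypothetical names used only for illustration; this is not the driver's actual code, just a rendering of the `bitmapBuild([toUInt32(...),...])` shape quoted above.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BitmapSqlSketch {
    // Hypothetical helper (not part of clickhouse-jdbc): renders values in the
    // bitmapBuild([toUInt32(v1),toUInt32(v2),...]) shape described for 0.3.0.
    public static String toBitmapBuildExpr(int[] values) {
        return IntStream.of(values)
                // UInt32 values are printed as unsigned decimals
                .mapToObj(v -> "toUInt32(" + Integer.toUnsignedString(v) + ")")
                .collect(Collectors.joining(",", "bitmapBuild([", "])"));
    }

    public static void main(String[] args) {
        // Small bitmap: the expression stays short.
        System.out.println(toBitmapBuildExpr(new int[] {1, 2, 3}));

        // One million values: the expression alone is well over ten million characters.
        int[] big = IntStream.range(0, 1_000_000).toArray();
        System.out.println(toBitmapBuildExpr(big).length() + " characters");
    }
}
```

Every value costs at least eleven characters of SQL, so the statement size scales linearly with cardinality; the extended API below avoids this by streaming the bitmap in binary form.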

// use extended API - recommended in 0.3.0
try (ClickHouseStatement statement = connection.createStatement()) {
    statement.sendRowBinaryStream("insert into my_bitmap_table", new ClickHouseStreamCallback() {
        public void writeTo(ClickHouseRowBinaryStream stream) throws IOException {
            ...
            // RoaringBitmap bitmap = RoaringBitmap.bitmapOf(1,2,3,...);
            // In addition to RoaringBitmap, you can pass:
            // ImmutableRoaringBitmap, MutableRoaringBitmap and even Roaring64NavigableMap
            stream.writeBitmap(ClickHouseBitmap.wrap(bitmap, ClickHouseDataType.UInt32));
            ...
        }
    });
}

@github-actions

Benchmark                           (client)  (statement)   Mode  Cnt       Score       Error  Units
Basic.insertOneRandomNumber  clickhouse-jdbc       normal  thrpt   20  533147.555 ± 33574.299  ops/s
Basic.insertOneRandomNumber  clickhouse-jdbc     prepared  thrpt   20     256.789 ±    26.478  ops/s
Basic.selectOneRandomNumber  clickhouse-jdbc       normal  thrpt   20    1142.817 ±   149.810  ops/s
Basic.selectOneRandomNumber  clickhouse-jdbc     prepared  thrpt   20    1179.416 ±   158.935  ops/s

@lzzuo2

lzzuo2 commented Mar 22, 2021

Hello, which approach is best for reading a RoaringBitmap from ClickHouse? Reading it directly seems to cause an out-of-memory error.

@github-actions

Benchmark                           (client)  (statement)   Mode  Cnt       Score      Error  Units
Basic.insertOneRandomNumber  clickhouse-jdbc       normal  thrpt   20  560076.510 ± 8225.068  ops/s
Basic.insertOneRandomNumber  clickhouse-jdbc     prepared  thrpt   20     238.036 ±   31.159  ops/s
Basic.selectOneRandomNumber  clickhouse-jdbc       normal  thrpt   20    1455.879 ±  224.413  ops/s
Basic.selectOneRandomNumber  clickhouse-jdbc     prepared  thrpt   20    1330.789 ±  179.090  ops/s

@zhicwu
Contributor Author

zhicwu commented Mar 22, 2021

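Hello, which approach is best for reading a RoaringBitmap from ClickHouse? Reading it directly seems to cause an out-of-memory error.

Which approach are you currently using that runs out of memory? For the current version I still recommend the extended API, for example: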

try (ClickHouseRowBinaryInputStream in = statement.executeQueryClickhouseRowBinaryStream("SELECT * FROM my_bitmap_table")) {
    ...
    ClickHouseBitmap bitmap = in.readBitmap(ClickHouseDataType.UInt32);
    RoaringBitmap rb = (RoaringBitmap)bitmap.unwrap();
    ...
}

@github-actions

Benchmark                           (client)  (statement)   Mode  Cnt       Score       Error  Units
Basic.insertOneRandomNumber  clickhouse-jdbc       normal  thrpt   20  526151.979 ± 10659.838  ops/s
Basic.insertOneRandomNumber  clickhouse-jdbc     prepared  thrpt   20     251.023 ±    27.312  ops/s
Basic.selectOneRandomNumber  clickhouse-jdbc       normal  thrpt   20    1181.098 ±   193.725  ops/s
Basic.selectOneRandomNumber  clickhouse-jdbc     prepared  thrpt   20    1165.106 ±   148.765  ops/s

@github-actions

Benchmark                           (client)  (statement)   Mode  Cnt       Score       Error  Units
Basic.insertOneRandomNumber  clickhouse-jdbc       normal  thrpt   20  521913.493 ± 13423.092  ops/s
Basic.insertOneRandomNumber  clickhouse-jdbc     prepared  thrpt   20     245.400 ±    25.875  ops/s
Basic.selectOneRandomNumber  clickhouse-jdbc       normal  thrpt   20    1324.137 ±   174.162  ops/s
Basic.selectOneRandomNumber  clickhouse-jdbc     prepared  thrpt   20    1186.195 ±   221.061  ops/s

@zhicwu zhicwu merged commit 9267440 into ClickHouse:develop Mar 23, 2021
@zhicwu zhicwu deleted the roaringbitmap branch March 23, 2021 06:04
@LeeMouRen

It's so convenient

@wntp

wntp commented May 21, 2021

Submitting a single row with the method above works fine, but a batch submission fails with: void DB::ParallelParsingBlockInputStream::onBackgroundException(): Code 33, e.displayText() = DB::Exception Cannot read all data. Bytes expected: 105. Stack trace....
I followed the approach in https://github.com/ClickHouse/clickhouse-jdbc/issues/532 exactly; the only change is that execute was replaced with a batch execute.

@sloan-zhang
Copy link

sloan-zhang commented Dec 6, 2021

This method generates a very large SQL script. Is the bitmap not inserted directly as a bitmap object?
If so, how should I write a bitmap with a large cardinality?

@zhicwu
Contributor Author

zhicwu commented Dec 6, 2021

This method generates a very large SQL script. Is the bitmap not inserted directly as a bitmap object? If so, how should I write a bitmap with a large cardinality?

I guess you're using the JDBC API, which will generate a large SQL statement as you said. You should use the extended API instead. Behind the scenes, in most cases, it comes down to how you use the input function and/or an external/temporary table.

Starting with 0.3.2, the situation has changed: 1) the extended API is replaced by the new Java client; and 2) you can load a bitmap or use it as a query parameter through the standard JDBC API. Please take a look at the examples here.

Development

Successfully merging this pull request may close these issues.

Write Roaring Bitmap data from Spark(ck-jdbc) to CH
Support bitmap in jdbc
Support bitmap in jdbc
5 participants