Get garbled codes in the snapshot phase with MySql CDC #1166
Comments
Please use English in the open source community.
To reproduce the problem, please add the SQL to customer.sql.
You can add the test in MySqlConnectorITCase.
For a quick fix, you can make the same modification as we do in the picture.
The bug occurs because the JDBC connection uses the UTF-8 encoding format, so the results returned by the JDBC connection have already been encoded in the UTF-8 charset. But when converting the object array to the
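This change only handles ucs2; the same problem will also exist for other engines. Debezium parses the charset of every table column and encodes and decodes accordingly, whereas the snapshot read in Flink appears to always use the default UTF-8. Converting every type here does not feel like the best approach.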
Please use English...
It's just a quick fix. You should refer to the doc [1], which shows the mapping relationship between the MySQL charsets and the Java charsets. I think we should implement all the mappings in the doc. Maybe we should also test how Debezium works in the snapshot phase. [1] https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-charsets.html
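A minimal sketch of what such a mapping could look like, for illustration only. The class name `MysqlCharsetMapping` and the method `toJavaCharset` are hypothetical and are not part of Flink CDC or Debezium. The table covers only a small subset of charsets, and treating `ucs2` as big-endian UTF-16 is an assumption made here; the Connector/J doc [1] remains the authoritative list.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a MySQL column charset name to a Java Charset.
public class MysqlCharsetMapping {
    private static final Map<String, Charset> MAPPING = new HashMap<>();

    static {
        // Illustrative subset only; see the Connector/J charset doc for the full table.
        MAPPING.put("utf8", StandardCharsets.UTF_8);
        MAPPING.put("utf8mb4", StandardCharsets.UTF_8);
        MAPPING.put("ascii", StandardCharsets.US_ASCII);
        MAPPING.put("utf16", StandardCharsets.UTF_16);
        MAPPING.put("utf16le", StandardCharsets.UTF_16LE);
        // Assumption: ucs2 data is treated as big-endian UTF-16.
        MAPPING.put("ucs2", StandardCharsets.UTF_16BE);
    }

    // Returns the Java charset for a MySQL charset name (case-insensitive).
    static Charset toJavaCharset(String mysqlCharset) {
        Charset charset = MAPPING.get(mysqlCharset.toLowerCase(Locale.ROOT));
        if (charset == null) {
            // Fail loudly instead of silently falling back to UTF-8.
            throw new IllegalArgumentException("No mapping for MySQL charset: " + mysqlCharset);
        }
        return charset;
    }

    public static void main(String[] args) {
        System.out.println(toJavaCharset("ucs2"));
        System.out.println(toJavaCharset("utf8mb4"));
    }
}
```

Throwing on an unknown charset, rather than defaulting to UTF-8, is a design choice in this sketch: a silent default is exactly what would hide the kind of mismatch reported in this issue.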
I am interested in this ticket. I could help to fix it.
Thanks for your help. Please ping me when you finish your PR.
Describe the bug (Please use English)
A clear and concise description of what the bug is.
Environment:
To Reproduce
Steps to reproduce the behavior:
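The MySQL database charset is utf-8, and column aa1 of table a1 in database A uses the ucs2 charset. When Flink CDC reads the data, the existing (snapshot) data comes out garbled;
newly added data read from the binlog is not garbled.
I then checked: when reading the binlog, column aa1 is encoded as utf-16 and also decoded as utf-16.
When reading the existing snapshot, the ucs2 column is encoded as utf-8 (possibly the default) but decoded as utf-16, so the result is garbled.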
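The mismatch described in these steps can be reproduced in plain Java, without MySQL or Flink CDC. This is only a sketch of the effect: the class name `SnapshotCharsetMismatch` and its methods are hypothetical, and the sample string is an arbitrary choice, not data from the reporter's table.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of "encode with one charset, decode with another".
public class SnapshotCharsetMismatch {

    // Snapshot-like path from the report: bytes are UTF-8 but get decoded as UTF-16.
    static String encodeUtf8DecodeUtf16(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_16);
    }

    // Binlog-like path from the report: UTF-16 is used on both sides.
    static String encodeUtf16DecodeUtf16(String original) {
        byte[] utf16Bytes = original.getBytes(StandardCharsets.UTF_16);
        return new String(utf16Bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two CJK characters, written as escapes
        String garbled = encodeUtf8DecodeUtf16(original);
        String intact = encodeUtf16DecodeUtf16(original);
        System.out.println("mismatched charsets keep the text: " + original.equals(garbled));
        System.out.println("matching charsets keep the text:   " + original.equals(intact));
    }
}
```

The text survives only when the same charset is used for encoding and decoding, which is consistent with the binlog data being readable while the snapshot data is not.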
Additional Description