
Upi logs parser #13

Merged
merged 8 commits into caraml-dev:main on Jan 18, 2023

Conversation

@leonlnj (Contributor) commented Jan 13, 2023

What this PR does / why we need it:
As part of the closed-loop ML work, this is part of the log writer that will move UPI logs from Kafka to BQ.

Kafka (Prediction/Router logs) -> Fluentd (Kafka source -> UPI logs parser -> BQ sink) -> BQ
This PR focuses on the UPI logs parser specifically. Future PRs will add the full fluentd setup with BQ capabilities.

Installation steps are in logwriter/fluent-plugin-upi-logs/README.md

Changes
CI
.github/workflows/publish-upi-logs-parser.yaml - publishes the plugin to RubyGems. (I looked at the kafka, bq, and protoparser fluentd repos and a few Google protobuf repo CIs; none of them use the CI run or a version passed in from a git tag as the release version, which I had thought would be more mainstream. So I followed the same approach, which means the version is declared manually in the gemspec; a sketch follows this list. I don't foresee much issue, as I don't expect this library to be updated often, mostly only when the log definitions change.)
.github/workflows/timber.yml - build and test
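
For reference, a minimal sketch of how a manually declared version can look in a gemspec; the field values here are assumptions, not the actual file:

# fluent-plugin-upi-logs.gemspec (illustrative sketch only)
Gem::Specification.new do |spec|
  spec.name    = "fluent-plugin-upi-logs"
  spec.version = "0.0.0.dev.2"                          # bumped by hand for each release
  spec.summary = "Fluentd parser plugin for UPI logs"   # assumed summary
  spec.authors = ["caraml-dev"]                          # assumed author
  spec.files   = Dir["lib/**/*.rb"]
  spec.add_runtime_dependency "fluentd", ">= 1.0"        # assumed dependency
end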

Parser
logwriter/fluent-plugin-upi-logs/lib/fluent/plugin/parser_upi_logs.rb - Main parser implementation (a short sketch follows below)
logwriter/fluent-plugin-upi-logs/test/plugin/test_parser_upi_logs.rb - Unit test
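
For context, a minimal sketch of what a fluentd parser plugin of this shape typically looks like; the class name and the proto message (PredictionLog) are assumptions, not the actual implementation:

require "fluent/plugin/parser"
require "json"

module Fluent
  module Plugin
    class UpiLogsParser < Parser
      # Makes the parser available as "@type upi_logs" inside a <parse> section.
      Fluent::Plugin.register_parser("upi_logs", self)

      def parse(text)
        # text is the raw protobuf payload delivered by the Kafka source;
        # PredictionLog stands in for the actual caraml-upi-protos message class.
        log = PredictionLog.decode(text)
        # Downstream outputs expect a Hash record, so round-trip through JSON.
        yield Fluent::EventTime.now, JSON.parse(log.to_json)
      end
    end
  end
end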

Testing
fluent-plugin-upi-logs-0.0.0.dev.2 was built locally for testing. Using a local fluentd deployment (not part of this PR), the parser is imported in fluentd.conf with a Kafka source and stdout + BQ as sinks.

[Screenshot: local fluentd run, 2023-01-16 11:40 AM]

BQ Schema
[screenshot]

Stdout
[screenshot]

BQ
[screenshot]

sample-outout.txt (attached sample output)

@leonlnj leonlnj marked this pull request as ready for review January 13, 2023 03:08
@leonlnj leonlnj marked this pull request as draft January 13, 2023 09:18
@leonlnj (Contributor, Author) commented Jan 13, 2023

Marking this PR as a draft. While working with the BQ sink, converting to JSON doesn't seem to work and throws an error: record must be a Hash: String. Setting the record to a Hash works, but the JSON data looks odd and needs further investigation.
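
A short sketch of the difference, assuming a parse method like the one sketched above (PredictionLog is still an illustrative name):

def parse(text)
  log = PredictionLog.decode(text)
  time = Fluent::EventTime.now

  # Yielding the JSON text itself makes the record a String and triggers
  # "record must be a Hash: String" in the sink:
  # yield time, log.to_json

  # Yielding a parsed Hash is what the BQ output expects:
  yield time, JSON.parse(log.to_json)
end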

@leonlnj leonlnj self-assigned this Jan 16, 2023
@leonlnj leonlnj marked this pull request as ready for review January 16, 2023 04:17
@leonlnj leonlnj requested a review from pradithya January 16, 2023 04:17
@pradithya (Member) commented:
@leonlnj
I think input / ModelInput and output / ModelOutput should be treated as RECORD column types instead of JSON.

Those columns would then have features_table, entities_table, and raw_features as JSON-typed fields.

// Model input stores information of all input for prediction process.
// The information in model input are extracted from the prediction request received by model.
message ModelInput {
    // JSON-representation of features table. "table_schema_version" field describe the encoding of this field.
    google.protobuf.Struct features_table = 1;
    // JSON-representation of entities table. "table_schema_version" field describe the encoding of this field.
    google.protobuf.Struct entities_table = 2;
    // JSON-representation of raw_features table. "table_schema_version" field describe the encoding of this field.
    google.protobuf.Struct raw_features = 3;
    // Context of the prediction request.
    repeated Variable prediction_context = 4;   
    // map containing request headers/metadata
    map<string,string> headers = 10;
}
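
Roughly, that would make the parsed record keep input as a nested object mapping to a BQ RECORD column, with the table fields inside it carried as JSON values. A hypothetical record shape as a Ruby Hash, with made-up values, just to show the nesting:

# Illustrative only; field values are invented and not real parser output.
record = {
  "input" => {                                      # RECORD column
    "features_table"     => { "ctx_1" => [1.1] },   # JSON column (shape is illustrative)
    "entities_table"     => {},                     # JSON column
    "raw_features"       => {},                     # JSON column
    "prediction_context" => [
      { "name" => "ctx_1", "type" => "TYPE_DOUBLE", "double_value" => 1.1 }
    ],
    "headers"            => { "country" => "ID" }   # illustrative header
  }
}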

@leonlnj leonlnj marked this pull request as draft January 16, 2023 09:21
@leonlnj (Contributor, Author) commented Jan 16, 2023

Getting
message="Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details." reason="invalid"

message="Error while reading data, error message: JSON parsing error in row starting at position 0: No such field: input.{\"name\"=>\"ctx_1\", \"type\"=>\"TYPE_DOUBLE\", \"double_value\"=>1.1}." reason="invalid"

after changing the BQ schema to the one below. Would need more investigation.

[screenshot: updated BQ schema]

@pradithya (Member) commented:
> Getting
> message="Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details." reason="invalid"
>
> message="Error while reading data, error message: JSON parsing error in row starting at position 0: No such field: input.{\"name\"=>\"ctx_1\", \"type\"=>\"TYPE_DOUBLE\", \"double_value\"=>1.1}." reason="invalid"
>
> after changing the BQ schema to the below. Would need more investigation.
>
> [screenshot: BQ schema]

Got it! We probably need to update how we parse it in the parser plugin:

yield time, JSON.parse(record.to_json({ preserve_proto_fieldnames: true }))
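
In context, that change would sit in the parse method roughly like this (PredictionLog remains an illustrative name). preserve_proto_fieldnames keeps the original snake_case proto field names (features_table rather than featuresTable) so the record keys line up with the BQ column names:

def parse(text)
  log = PredictionLog.decode(text)
  # Keep snake_case field names in the emitted JSON so the record's keys
  # match the BQ schema columns.
  yield Fluent::EventTime.now, JSON.parse(log.to_json({ preserve_proto_fieldnames: true }))
end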

@leonlnj (Contributor, Author) commented Jan 17, 2023

Updated the description with the new BQ schema. Using the new caraml-upi-protos proto '0.0.1', with the header map changed to a struct, no further changes are required on the parser/fluentd end.

@leonlnj leonlnj marked this pull request as ready for review January 17, 2023 04:50
@pradithya (Member) left a review comment

Thanks @leonlnj ! LGTM

@leonlnj leonlnj merged commit bf37c05 into caraml-dev:main Jan 18, 2023