
Upi logs parser #13

Merged
merged 8 commits into caraml-dev:main on Jan 18, 2023

Conversation

@leonlnj (Contributor) commented Jan 13, 2023

What this PR does / why we need it:
As part of the closed-loop ML work, this is part of the log writer that will move UPI logs from Kafka to BQ.

Kafka (Prediction/Router logs) -> Fluentd (Kafka source -> UPI logs parser -> BQ sink) -> BQ
This PR focuses on the UPI logs parser specifically. Future PRs will add the full fluentd setup with BQ capabilities.

Installation steps are in logwriter/fluent-plugin-upi-logs/README.md

Changes
CI
.github/workflows/publish-upi-logs-parser.yaml - publishes the plugin to RubyGems. (I looked at the kafka, bq, and protoparser fluentd repos and a few Google protobuf repo CIs; none of them use the CI run or a version passed in from a git tag as the release version, which I had thought would be more mainstream. So I followed the same approach, which means the version is declared manually in the gemspec; a sketch follows this list. I don't foresee much issue, as I don't expect this library to be updated often, mostly only when the log definitions change.)
.github/workflows/timber.yml - build and test
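
For reference, a minimal sketch of how a manually declared version can look in a gemspec; the field values here are assumptions, not the actual file:

# fluent-plugin-upi-logs.gemspec (illustrative sketch only)
Gem::Specification.new do |spec|
  spec.name    = "fluent-plugin-upi-logs"
  spec.version = "0.0.0.dev.2"                          # bumped by hand for each release
  spec.summary = "Fluentd parser plugin for UPI logs"   # assumed summary
  spec.authors = ["caraml-dev"]                          # assumed author
  spec.files   = Dir["lib/**/*.rb"]
  spec.add_runtime_dependency "fluentd", ">= 1.0"        # assumed dependency
end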

Parser
logwriter/fluent-plugin-upi-logs/lib/fluent/plugin/parser_upi_logs.rb - Main parser implementation (a short sketch follows below)
logwriter/fluent-plugin-upi-logs/test/plugin/test_parser_upi_logs.rb - Unit test
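
For context, a minimal sketch of what a fluentd parser plugin of this shape typically looks like; the class name and the proto message (PredictionLog) are assumptions, not the actual implementation:

require "fluent/plugin/parser"
require "json"

module Fluent
  module Plugin
    class UpiLogsParser < Parser
      # Makes the parser available as "@type upi_logs" inside a <parse> section.
      Fluent::Plugin.register_parser("upi_logs", self)

      def parse(text)
        # text is the raw protobuf payload delivered by the Kafka source;
        # PredictionLog stands in for the actual caraml-upi-protos message class.
        log = PredictionLog.decode(text)
        # Downstream outputs expect a Hash record, so round-trip through JSON.
        yield Fluent::EventTime.now, JSON.parse(log.to_json)
      end
    end
  end
end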

Testing
fluent-plugin-upi-logs-0.0.0.dev.2 was built locally for testing. Using a local fluentd deployment (not part of this PR), the parser is imported in fluentd.conf with a Kafka source and stdout + BQ as sinks.

[Screenshot: local fluentd run, 2023-01-16 11:40 AM]

BQ Schema
[screenshot]

Stdout
[screenshot]

BQ
[screenshot]

sample-outout.txt (attached sample output)

@leonlnj leonlnj marked this pull request as ready for review January 13, 2023 03:08
@leonlnj leonlnj marked this pull request as draft January 13, 2023 09:18
@leonlnj (Contributor, Author) commented Jan 13, 2023

Marking this PR as a draft. While working with the BQ sink, converting to JSON doesn't seem to work and throws an error: record must be a Hash: String. Setting the record to a Hash works, but the JSON data looks odd and needs further investigation.
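
A short sketch of the difference, assuming a parse method like the one sketched above (PredictionLog is still an illustrative name):

def parse(text)
  log = PredictionLog.decode(text)
  time = Fluent::EventTime.now

  # Yielding the JSON text itself makes the record a String and triggers
  # "record must be a Hash: String" in the sink:
  # yield time, log.to_json

  # Yielding a parsed Hash is what the BQ output expects:
  yield time, JSON.parse(log.to_json)
end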

@leonlnj leonlnj self-assigned this Jan 16, 2023
@leonlnj leonlnj marked this pull request as ready for review January 16, 2023 04:17
@leonlnj leonlnj requested a review from pradithya January 16, 2023 04:17
@pradithya (Member) commented:
@leonlnj
I think input / ModelInput and output / ModelOutput should be treated as RECORD column types instead of JSON.

Those columns would then have features_table, entities_table, and raw_features as JSON-typed fields.

// Model input stores information of all input for prediction process.
// The information in model input are extracted from the prediction request received by model.
message ModelInput {
    // JSON-representation of features table. "table_schema_version" field describe the encoding of this field.
    google.protobuf.Struct features_table = 1;
    // JSON-representation of entities table. "table_schema_version" field describe the encoding of this field.
    google.protobuf.Struct entities_table = 2;
    // JSON-representation of raw_features table. "table_schema_version" field describe the encoding of this field.
    google.protobuf.Struct raw_features = 3;
    // Context of the prediction request.
    repeated Variable prediction_context = 4;   
    // map containing request headers/metadata
    map<string,string> headers = 10;
}
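
Roughly, that would make the parsed record keep input as a nested object mapping to a BQ RECORD column, with the table fields inside it carried as JSON values. A hypothetical record shape as a Ruby Hash, with made-up values, just to show the nesting:

# Illustrative only; field values are invented and not real parser output.
record = {
  "input" => {                                      # RECORD column
    "features_table"     => { "ctx_1" => [1.1] },   # JSON column (shape is illustrative)
    "entities_table"     => {},                     # JSON column
    "raw_features"       => {},                     # JSON column
    "prediction_context" => [
      { "name" => "ctx_1", "type" => "TYPE_DOUBLE", "double_value" => 1.1 }
    ],
    "headers"            => { "country" => "ID" }   # illustrative header
  }
}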

@leonlnj leonlnj marked this pull request as draft January 16, 2023 09:21
@leonlnj (Contributor, Author) commented Jan 16, 2023

Getting
message="Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details." reason="invalid"

message="Error while reading data, error message: JSON parsing error in row starting at position 0: No such field: input.{\"name\"=>\"ctx_1\", \"type\"=>\"TYPE_DOUBLE\", \"double_value\"=>1.1}." reason="invalid"

after changing the BQ schema to the one below. Would need more investigation.

[screenshot: updated BQ schema]

@pradithya (Member) commented:
> Getting
> message="Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details." reason="invalid"
>
> message="Error while reading data, error message: JSON parsing error in row starting at position 0: No such field: input.{\"name\"=>\"ctx_1\", \"type\"=>\"TYPE_DOUBLE\", \"double_value\"=>1.1}." reason="invalid"
>
> after changing the BQ schema to the below. Would need more investigation.
>
> [screenshot: BQ schema]

Got it! We probably need to update how we parse it in the parser plugin:

yield time, JSON.parse(record.to_json({ preserve_proto_fieldnames: true }))
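
In context, that change would sit in the parse method roughly like this (PredictionLog remains an illustrative name). preserve_proto_fieldnames keeps the original snake_case proto field names (features_table rather than featuresTable) so the record keys line up with the BQ column names:

def parse(text)
  log = PredictionLog.decode(text)
  # Keep snake_case field names in the emitted JSON so the record's keys
  # match the BQ schema columns.
  yield Fluent::EventTime.now, JSON.parse(log.to_json({ preserve_proto_fieldnames: true }))
end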

@leonlnj (Contributor, Author) commented Jan 17, 2023

Updated the description with the new BQ schema. Using the new caraml-upi-protos proto '0.0.1', with the header map changed to a struct, no further changes are required on the parser/fluentd end.

@leonlnj leonlnj marked this pull request as ready for review January 17, 2023 04:50
@pradithya (Member) left a review comment

Thanks @leonlnj ! LGTM

@leonlnj leonlnj merged commit bf37c05 into caraml-dev:main Jan 18, 2023