-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Adding support for native Python transformations on a single dictionary #4724
Conversation
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
…..not an ideal solution Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
a593231
to
71927f5
Compare
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
FYI @robhowley |
is there no way to do any kind of type inference in the codepath to determine it's a singleton or leave it up to the person implementing the transformation to know what data they're expecting and raise errors accordingly? just kinda lean more into duck typing. strikes me as a little unusual to have to pass a |
That's the way I originally started do it, the problem is that requires explicit type annotations in the ODFV definition and I think that's too strict of a requirement. |
I definitely agree this Boolean is not the preferred option but it was the what made the most sense to me given that other sorts of type inference start to become brittle once you allow for different input types. |
i was thinking more like relax the type annotation be dict[str, Any] instead of dict[str, List[Any]]. it's the transformation functions problem to handle it. |
Yeah that could be done today because type annotations can still be ignored by users but I don't think that's a good enough experience for Data Scientists. I think this makes the transformation really easy and it has rational defaults. |
and defining it as a Union of the two types wasn't sufficient? |
output_dict: dict[str, Any] = self.feature_transformation.transform( | ||
feature_dict | ||
) | ||
if self.singleton and self.mode == "python": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robhowley see here
inferred_features = self.feature_transformation.infer_features( | ||
self._construct_random_input() | ||
) | ||
random_input = self._construct_random_input(singleton=self.singleton) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robhowley and here
@@ -663,20 +681,25 @@ def _construct_random_input(self) -> dict[str, list[Any]]: | |||
ValueType.BOOL_LIST: [[True]], | |||
ValueType.UNIX_TIMESTAMP_LIST: [[_utc_now()]], | |||
} | |||
if singleton: | |||
rand_dict_value = {k: rand_dict_value[k][0] for k in rand_dict_value} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robhowley and here
@@ -570,6 +570,8 @@ def _augment_response_with_on_demand_transforms( | |||
proto_values.append( | |||
python_values_to_proto_values( | |||
feature_vector | |||
if isinstance(feature_vector, list) | |||
else [feature_vector] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robhowley additional special logic
@robhowley I tagged you in some places that outline some of why I ended up doing it this way. |
I'm also not following why it can't be just a type check. Adding this singleton feels complex the logic. |
Depends on the type check. You may have input features that are mixed that you want to create a feature out of. For example, if I did a type check to ensure all inputs were not lists, they will not accommodate the case I just mentioned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't block it to go alive, but I think it would be great if there is some other way for this instead of this "singleton" parameter :)
# This flattens the list of elements to extract the first one | ||
# in the case of a singleton element, it takes the value directly | ||
# in the case of a list of lists, it takes the first list | ||
input_dict = {k: v[0] for k, v in input_dict.items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v[0] might cause error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't but I'll add a test for it
Do you have suggestion? I had dawned discussion with @robhowley indicating that this was not my preferred solution but simply the one I ended up having to settle on due to the tradeoffs. |
I don't have any direct suggestion on this yet. But I believe after refactoring rebuild the Transformation entirely it should work in some structured way :) |
…ctionary (feast-dev#4724) * feat: Adding support for native Python transformations on a dictionary Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> * Updated type checking and added exception handling to try basic dict...not an ideal solution Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> * updated tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> * adding protos Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> * fixed unit test Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# [0.42.0](v0.41.0...v0.42.0) (2024-12-05) ### Bug Fixes * Add adapters for sqlite datetime conversion ([#4797](#4797)) ([e198b17](e198b17)) * Added grpcio extras to default feature-server image ([#4737](#4737)) ([e9cd373](e9cd373)) * Changing node version in release ([7089918](7089918)) * Feast create empty online table when FeatureView attribute online=False ([#4666](#4666)) ([237c453](237c453)) * Fix db store types in Operator CRD ([#4798](#4798)) ([f09339e](f09339e)) * Fix the config issue for postgres ([#4776](#4776)) ([a36f7e5](a36f7e5)) * Fixed example materialize-incremental and improved explanation ([#4734](#4734)) ([ca8a7ab](ca8a7ab)) * Fixed SparkSource docstrings so it wouldn't used inhereted class docstrings ([#4722](#4722)) ([32e6aa1](32e6aa1)) * Fixing PGVector integration tests ([#4778](#4778)) ([88a0320](88a0320)) * Incorrect type passed to assert_permissions in materialize endpoints ([#4727](#4727)) ([b72c2da](b72c2da)) * Issue of DataSource subclasses using parent abstract class docstrings ([#4730](#4730)) ([b24acd5](b24acd5)) * Operator envVar positioning & tls.SecretRef.Name ([#4806](#4806)) ([1115d96](1115d96)) * Populates project created_time correctly according to created ti… ([#4686](#4686)) ([a61b93c](a61b93c)) * Reduce feast-server container image size & fix dev image build ([#4781](#4781)) ([ccc9aea](ccc9aea)) * Removed version func from feature_store.py ([#4748](#4748)) ([f902bb9](f902bb9)) * Support registry instantiation for read-only users ([#4719](#4719)) ([ca3d3c8](ca3d3c8)) * Syntax Error in BigQuery While Retrieving Columns that Start wit… ([#4713](#4713)) ([60fbc62](60fbc62)) * Update release version in a pertinent Operator file ([#4708](#4708)) ([764a8a6](764a8a6)) ### Features * Add api contract to fastapi docs ([#4721](#4721)) ([1a165c7](1a165c7)) * Add Couchbase as an online store ([#4637](#4637)) ([824859b](824859b)) * Add Operator support for spec.feastProject & status.applied fields ([#4656](#4656)) ([430ac53](430ac53)) * Add services functionality to Operator ([#4723](#4723)) ([d1d80c0](d1d80c0)) * Add TLS support to the Operator ([#4796](#4796)) ([a617a6c](a617a6c)) * Added feast Go operator db stores support ([#4771](#4771)) ([3302363](3302363)) * Added support for setting env vars in feast services in feast controller ([#4739](#4739)) ([84b24b5](84b24b5)) * Adding docs outlining native Python transformations on singletons ([#4741](#4741)) ([0150278](0150278)) * Adding first feast operator e2e test. ([#4791](#4791)) ([8339f8d](8339f8d)) * Adding github action to run the operator end-to-end tests. ([#4762](#4762)) ([d8ccb00](d8ccb00)) * Adding ssl support for registry server. ([#4718](#4718)) ([ccf7a55](ccf7a55)) * Adding SSL support for the React UI server and feast UI command. ([#4736](#4736)) ([4a89252](4a89252)) * Adding support for native Python transformations on a single dictionary ([#4724](#4724)) ([9bbc1c6](9bbc1c6)) * Adding TLS support for offline server. ([#4744](#4744)) ([5d8d03f](5d8d03f)) * Building the feast image ([#4775](#4775)) ([6635dde](6635dde)) * File persistence definition and implementation ([#4742](#4742)) ([3bad4a1](3bad4a1)) * Object store persistence in operator ([#4758](#4758)) ([0ae86da](0ae86da)) * OIDC authorization in Feast Operator ([#4801](#4801)) ([eb111d6](eb111d6)) * Operator will create k8s serviceaccount for each feast service ([#4767](#4767)) ([cde5760](cde5760)) * Printing more verbose logs when we start the offline server ([#4660](#4660)) ([9d8d3d8](9d8d3d8)) * PVC configuration and impl ([#4750](#4750)) ([785a190](785a190)) * Qdrant vectorstore support ([#4689](#4689)) ([86573d2](86573d2)) * RBAC Authorization in Feast Operator ([#4786](#4786)) ([0ef5acc](0ef5acc)) * Support for nested timestamp fields in Spark Offline store ([#4740](#4740)) ([d4d94f8](d4d94f8)) * Update the go feature server from Expedia code repo. ([#4665](#4665)) ([6406625](6406625)) * Updated feast Go operator db stores ([#4809](#4809)) ([2c5a6b5](2c5a6b5)) * Updated sample secret following review ([#4811](#4811)) ([dc9f825](dc9f825))
* Add adapters for sqlite datetime conversion ([feast-dev#4797](feast-dev#4797)) ([e198b17](feast-dev@e198b17)) * Added grpcio extras to default feature-server image ([feast-dev#4737](feast-dev#4737)) ([e9cd373](feast-dev@e9cd373)) * Changing node version in release ([7089918](feast-dev@7089918)) * Feast create empty online table when FeatureView attribute online=False ([feast-dev#4666](feast-dev#4666)) ([237c453](feast-dev@237c453)) * Fix db store types in Operator CRD ([feast-dev#4798](feast-dev#4798)) ([f09339e](feast-dev@f09339e)) * Fix the config issue for postgres ([feast-dev#4776](feast-dev#4776)) ([a36f7e5](feast-dev@a36f7e5)) * Fixed example materialize-incremental and improved explanation ([feast-dev#4734](feast-dev#4734)) ([ca8a7ab](feast-dev@ca8a7ab)) * Fixed SparkSource docstrings so it wouldn't used inhereted class docstrings ([feast-dev#4722](feast-dev#4722)) ([32e6aa1](feast-dev@32e6aa1)) * Fixing PGVector integration tests ([feast-dev#4778](feast-dev#4778)) ([88a0320](feast-dev@88a0320)) * Incorrect type passed to assert_permissions in materialize endpoints ([feast-dev#4727](feast-dev#4727)) ([b72c2da](feast-dev@b72c2da)) * Issue of DataSource subclasses using parent abstract class docstrings ([feast-dev#4730](feast-dev#4730)) ([b24acd5](feast-dev@b24acd5)) * Operator envVar positioning & tls.SecretRef.Name ([feast-dev#4806](feast-dev#4806)) ([1115d96](feast-dev@1115d96)) * Populates project created_time correctly according to created ti… ([feast-dev#4686](feast-dev#4686)) ([a61b93c](feast-dev@a61b93c)) * Reduce feast-server container image size & fix dev image build ([feast-dev#4781](feast-dev#4781)) ([ccc9aea](feast-dev@ccc9aea)) * Removed version func from feature_store.py ([feast-dev#4748](feast-dev#4748)) ([f902bb9](feast-dev@f902bb9)) * Support registry instantiation for read-only users ([feast-dev#4719](feast-dev#4719)) ([ca3d3c8](feast-dev@ca3d3c8)) * Syntax Error in BigQuery While Retrieving Columns that Start wit… ([feast-dev#4713](feast-dev#4713)) ([60fbc62](feast-dev@60fbc62)) * Update release version in a pertinent Operator file ([feast-dev#4708](feast-dev#4708)) ([764a8a6](feast-dev@764a8a6)) * Add api contract to fastapi docs ([feast-dev#4721](feast-dev#4721)) ([1a165c7](feast-dev@1a165c7)) * Add Couchbase as an online store ([feast-dev#4637](feast-dev#4637)) ([824859b](feast-dev@824859b)) * Add Operator support for spec.feastProject & status.applied fields ([feast-dev#4656](feast-dev#4656)) ([430ac53](feast-dev@430ac53)) * Add services functionality to Operator ([feast-dev#4723](feast-dev#4723)) ([d1d80c0](feast-dev@d1d80c0)) * Add TLS support to the Operator ([feast-dev#4796](feast-dev#4796)) ([a617a6c](feast-dev@a617a6c)) * Added feast Go operator db stores support ([feast-dev#4771](feast-dev#4771)) ([3302363](feast-dev@3302363)) * Added support for setting env vars in feast services in feast controller ([feast-dev#4739](feast-dev#4739)) ([84b24b5](feast-dev@84b24b5)) * Adding docs outlining native Python transformations on singletons ([feast-dev#4741](feast-dev#4741)) ([0150278](feast-dev@0150278)) * Adding first feast operator e2e test. ([feast-dev#4791](feast-dev#4791)) ([8339f8d](feast-dev@8339f8d)) * Adding github action to run the operator end-to-end tests. ([feast-dev#4762](feast-dev#4762)) ([d8ccb00](feast-dev@d8ccb00)) * Adding ssl support for registry server. ([feast-dev#4718](feast-dev#4718)) ([ccf7a55](feast-dev@ccf7a55)) * Adding SSL support for the React UI server and feast UI command. ([feast-dev#4736](feast-dev#4736)) ([4a89252](feast-dev@4a89252)) * Adding support for native Python transformations on a single dictionary ([feast-dev#4724](feast-dev#4724)) ([9bbc1c6](feast-dev@9bbc1c6)) * Adding TLS support for offline server. ([feast-dev#4744](feast-dev#4744)) ([5d8d03f](feast-dev@5d8d03f)) * Building the feast image ([feast-dev#4775](feast-dev#4775)) ([6635dde](feast-dev@6635dde)) * File persistence definition and implementation ([feast-dev#4742](feast-dev#4742)) ([3bad4a1](feast-dev@3bad4a1)) * Object store persistence in operator ([feast-dev#4758](feast-dev#4758)) ([0ae86da](feast-dev@0ae86da)) * OIDC authorization in Feast Operator ([feast-dev#4801](feast-dev#4801)) ([eb111d6](feast-dev@eb111d6)) * Operator will create k8s serviceaccount for each feast service ([feast-dev#4767](feast-dev#4767)) ([cde5760](feast-dev@cde5760)) * Printing more verbose logs when we start the offline server ([feast-dev#4660](feast-dev#4660)) ([9d8d3d8](feast-dev@9d8d3d8)) * PVC configuration and impl ([feast-dev#4750](feast-dev#4750)) ([785a190](feast-dev@785a190)) * Qdrant vectorstore support ([feast-dev#4689](feast-dev#4689)) ([86573d2](feast-dev@86573d2)) * RBAC Authorization in Feast Operator ([feast-dev#4786](feast-dev#4786)) ([0ef5acc](feast-dev@0ef5acc)) * Support for nested timestamp fields in Spark Offline store ([feast-dev#4740](feast-dev#4740)) ([d4d94f8](feast-dev@d4d94f8)) * Update the go feature server from Expedia code repo. ([feast-dev#4665](feast-dev#4665)) ([6406625](feast-dev@6406625)) * Updated feast Go operator db stores ([feast-dev#4809](feast-dev#4809)) ([2c5a6b5](feast-dev@2c5a6b5)) * Updated sample secret following review ([feast-dev#4811](feast-dev#4811)) ([dc9f825](feast-dev@dc9f825))
What this PR does / why we need it:
This PR adds support for native Python transformations on a dictionary of a singleton (i.e., what most Data Scientists would call a single row).
As an example, Data Scientists can now transform features using an On Demand Transform like so:
Compare this to the previous logic to do the transformation which looks like:
Which issue(s) this PR fixes:
#4075
Misc
Added documentation in this PR: #4741