Still not able to use non-sklearn estimators without wrapping them in a pipeline #734

hp2500 · 2019-07-11T18:11:12Z

Hi there, I raised this in issue #724. I have been trying to run experiments with a fairly new sklearn-extra classifier (https://github.com/Alex7Li/scikit-learn-extra/tree/master/sklearn_extra). The classifier runs fine on a local dataset. However, when I try to run it on an OpenML task, I get an error.

Here is a minimal example:

import openml
from sklearn_extra.fast_kernel import FKC_EigenPro

# define classifier
clf = FKC_EigenPro()
# get task
task = openml.tasks.get_task(3)
# run model on task
run = openml.runs.run_model_on_task(clf, task)
# publish run on openml
run.publish()

AttributeError Traceback (most recent call last)
in
4 task = openml.tasks.get_task(3)
5 # run model on task
----> 6 run = openml.runs.run_model_on_task(clf, task)
7 # publish run on openml
8 run.publish()

/miniconda3/lib/python3.7/site-packages/openml/runs/functions.py in run_model_on_task(model, task, avoid_duplicate_runs, flow_tags, seed, add_local_measures, upload_flow, return_flow)
104 seed=seed,
105 add_local_measures=add_local_measures,
--> 106 upload_flow=upload_flow,
107 )
108 if return_flow:

/miniconda3/lib/python3.7/site-packages/openml/runs/functions.py in run_flow_on_task(flow, task, avoid_duplicate_runs, flow_tags, seed, add_local_measures, upload_flow)
172 task, flow = flow, task
173
--> 174 flow.model = flow.extension.seed_model(flow.model, seed=seed)
175
176 # We only need to sync with the server right now if we want to upload the flow,

AttributeError: 'NoneType' object has no attribute 'seed_model'

You mentioned that this should be fixed via #722, but I am still encountering the same error.

@amueller

amueller · 2019-07-22T18:51:38Z

Easier way to reproduce:

import openml
from sklearn.linear_model import LogisticRegression

# there needs to be a version specified but this works lol.
__version__ = 0.1

class MyLR(LogisticRegression):
    pass

clf = MyLR()
# get task
task = openml.tasks.get_task(3)
# run model on task
run = openml.runs.run_model_on_task(clf, task)
# publish run on openml
run.publish()

RuntimeError: No extension could be found for flow None: __main__.MyLR

So get_extension_by_model returns sklearn because isinstance(MyLR(), BaseEstimator) is true - which is also not the correct test btw but whatever.

The problem is that the flow created from that model is not recognized as an sklearn extension flow. That lookup is done by get_extension_by_flow, and the sklearn.extension module doesn't include sklearn in its external_version; it only includes the sklearn version in the tags:

openml-python/openml/extensions/sklearn/extension.py

Line 420 in 56fcc00

flow = OpenMLFlow(name=name,

There are two obvious fixes:
a) When creating the flow, allow setting the extension directly, because we know what the extension is supposed to be.

b) include the sklearn version in the external version

I feel we should be doing both possibly?
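
To make the mismatch concrete, here is a minimal, self-contained sketch. It is a toy model, not openml-python's actual code: `ToyFlow`, `can_handle_model`, `can_handle_flow`, `model_to_flow`, the package name `mypackage`, and all version strings are hypothetical stand-ins. It only illustrates how a model-side `isinstance` check and a flow-side `external_version` check can disagree, and how fix (b) would make them agree.

```python
from dataclasses import dataclass, field
from typing import List


class BaseEstimator:
    """Stand-in for sklearn.base.BaseEstimator (hypothetical, stdlib only)."""


class MyLR(BaseEstimator):
    """A third-party estimator defined outside the sklearn package."""


@dataclass
class ToyFlow:
    """Toy flow holding only the fields relevant to this discussion."""
    name: str
    external_version: str
    tags: List[str] = field(default_factory=list)


def can_handle_model(model: object) -> bool:
    # Model-side lookup: any BaseEstimator subclass is claimed.
    return isinstance(model, BaseEstimator)


def can_handle_flow(flow: ToyFlow) -> bool:
    # Flow-side lookup: only external_version is inspected, tags are ignored.
    return "sklearn" in flow.external_version


def model_to_flow(model: object, include_sklearn_version: bool) -> ToyFlow:
    # The third-party package version is always recorded (made-up value).
    versions = ["mypackage==0.1"]
    if include_sklearn_version:
        # Fix (b): also record the sklearn version in external_version.
        versions.append("sklearn==0.21.2")
    return ToyFlow(
        name=f"mypackage.{type(model).__name__}",
        external_version=",".join(sorted(versions)),
        tags=["sklearn_0.21.2"],  # made-up tag format
    )


model = MyLR()
old_flow = model_to_flow(model, include_sklearn_version=False)
new_flow = model_to_flow(model, include_sklearn_version=True)

print(can_handle_model(model))    # the model is claimed by the extension
print(can_handle_flow(old_flow))  # but the flow built from it is not
print(can_handle_flow(new_flow))  # with fix (b) both lookups agree
```

In this toy setup, fix (a) would correspond to attaching the extension to the flow at creation time so that the flow-side lookup is never needed for freshly created flows.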

mfeurer · 2019-07-23T08:40:54Z

> which is also not the correct test btw but whatever.

What would you test for? The interface?

> include the sklearn version in the external version

That would definitely be helpful.

> When creating the flow, allow setting the extension directly, because we know what the extension is supposed to be.

I hope that this won't be necessary, but we should keep it in mind in case this problem persists.

amueller · 2019-10-14T16:48:57Z

This issue still persists with older flows:

import openml
openml.flows.get_flow(7660, reinstantiate=True)

I would really like to run flow 7777 because it's used in the definition of CC-18, but I can't because it contains the ConditionalImputer, which can't be reinstantiated with current openml (I tried to use older openml and failed as well).

mfeurer · 2019-10-14T18:48:35Z

Two questions:

1. Do you have the ConditionalImputer installed? If yes, could you please paste the error?
2. What's the behavior that you expect? That the pipeline is partially instantiated, except for the ConditionalImputer?

amueller · 2019-10-15T07:25:44Z

1. Yes. The error is that we can only instantiate sklearn flows. This was fixed for new flows in #742 ("add sklearn version to external version in sklearn flows"), but this is an old flow and doesn't have sklearn in the external version, so the sklearn extension can't detect that it handles it.
2. That the pipeline is instantiated. The rest can be instantiated.

Either we decide to abandon all third-party flows that were created before #742, or we need to change how an extension detects whether it can handle a flow. Hacky solution: add study14 to the list of modules the extension checks for when deciding whether it can handle a flow.
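
One way to change the detection without special-casing study14 is to fall back to a second field when external_version does not mention sklearn. The sketch below is only an illustration under assumptions: `mentions_sklearn` and `can_handle_flow` are hypothetical helpers, flows are modeled as plain dicts, and the version and dependency strings are invented examples rather than the contents of any real flow.

```python
from typing import Optional


def mentions_sklearn(text: Optional[str]) -> bool:
    # A missing field never matches.
    return text is not None and "sklearn" in text


def can_handle_flow(external_version: Optional[str],
                    dependencies: Optional[str]) -> bool:
    # Check external_version first, then fall back to the dependency string,
    # so flows created before the external_version change still match.
    return mentions_sklearn(external_version) or mentions_sklearn(dependencies)


# Invented example of an old third-party flow: sklearn appears only in the
# dependencies, not in external_version.
old_flow = {
    "external_version": "openmlstudy14==0.0.1",
    "dependencies": "sklearn==0.18.1\nnumpy>=1.6.1",
}

# Invented example of a flow that has nothing to do with sklearn.
unrelated_flow = {
    "external_version": "otherlib==1.0",
    "dependencies": "numpy>=1.6.1",
}

print(can_handle_flow(old_flow["external_version"], old_flow["dependencies"]))
print(can_handle_flow(unrelated_flow["external_version"],
                      unrelated_flow["dependencies"]))
```

The trade-off of a substring check like this is that it can also match unrelated packages whose names happen to contain "sklearn", so a stricter parse of the dependency list may be preferable.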

amueller · 2019-10-15T07:25:52Z

cc @janvanrijn

amueller · 2019-10-16T12:44:46Z

fixed in #830

mfeurer assigned Neeratyoy Jul 15, 2019

amueller mentioned this issue Jul 22, 2019

AttributeError: 'NoneType' object has no attribute 'seed_model' #724

Closed

amueller mentioned this issue Jul 22, 2019

add sklearn version to external version in sklearn flows, #742

Merged

mfeurer assigned amueller and unassigned Neeratyoy Jul 23, 2019

mfeurer closed this as completed in #742 Jul 26, 2019

amueller reopened this Oct 14, 2019

amueller mentioned this issue Oct 15, 2019

also check dependencies for sklearn string #830

Merged

amueller closed this as completed Oct 16, 2019
