AttributeError: 'NoneType' object has no attribute 'seed_model' #724

Closed
hp2500 opened this issue Jul 2, 2019 · 9 comments

hp2500 commented Jul 2, 2019

Hi there,

I have been trying to run experiments with a fairly new sklearn-extra classifier (https://github.com/Alex7Li/scikit-learn-extra/tree/master/sklearn_extra). The classifier runs fine on a local dataset. However, when I try to run it on an OpenML task, I get an error.

Here is a minimal example:

import openml
# define classifier
from sklearn_extra.fast_kernel import FKC_EigenPro
clf = FKC_EigenPro()
# get task
task = openml.tasks.get_task(3)
# run model on task
run = openml.runs.run_model_on_task(clf, task)
# publish run on openml
run.publish()

AttributeError Traceback (most recent call last)
in
4 task = openml.tasks.get_task(3)
5 # run model on task
----> 6 run = openml.runs.run_model_on_task(clf, task)
7 # publish run on openml
8 run.publish()

/miniconda3/lib/python3.7/site-packages/openml/runs/functions.py in run_model_on_task(model, task, avoid_duplicate_runs, flow_tags, seed, add_local_measures, upload_flow, return_flow)
104 seed=seed,
105 add_local_measures=add_local_measures,
--> 106 upload_flow=upload_flow,
107 )
108 if return_flow:

/miniconda3/lib/python3.7/site-packages/openml/runs/functions.py in run_flow_on_task(flow, task, avoid_duplicate_runs, flow_tags, seed, add_local_measures, upload_flow)
172 task, flow = flow, task
173
--> 174 flow.model = flow.extension.seed_model(flow.model, seed=seed)
175
176 # We only need to sync with the server right now if we want to upload the flow,

AttributeError: 'NoneType' object has no attribute 'seed_model'

Is there anything I can do to prevent this from happening?
@amueller

Contributor

amueller commented Jul 2, 2019

I haven't looked at this since the extensions got refactored.

It looks like you need to tell openml that the sklearn extension can be used to handle this estimator.
I'm a bit surprised that that's not the fallback / default.

Author

hp2500 commented Jul 5, 2019

I think I found a workaround. Using the estimator in a pipeline seems to solve the issue.
@amueller

Collaborator

mfeurer commented Jul 8, 2019

This appears to have the same root cause as #718 and #720 and will be fixed via #722. Please reopen if this is not the case.

Author

hp2500 commented Jul 11, 2019

Hi there, unfortunately the issue still persists.

nok commented Jul 11, 2019

Hello @hp2500, why do you swap the objects in line 172?

task, flow = flow, task

To be clear, I only read the source code you provided, and it confused me. Could you check the data type with type(flow) after line 172, please?

Contributor

amueller commented Jul 22, 2019

An easier way to reproduce is this:

# define classifier
import openml
from sklearn.linear_model import LogisticRegression

# there needs to be a version specified but this works lol.
__version__ = 0.1

class MyLR(LogisticRegression):
    pass

clf = MyLR()
# get task
task = openml.tasks.get_task(3)
# run model on task
run = openml.runs.run_model_on_task(clf, task)
# publish run on openml
run.publish()

RuntimeError: No extension could be found for flow None: __main__.MyLR

Contributor

amueller commented Jul 22, 2019

It's pretty obvious that this was broken in #647.
The sklearn extension should handle this, but it can only handle flows for which _is_sklearn_flow returns True, and that check is ',sklearn==' in flow.external_version.
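
To make that check concrete, here is a minimal sketch in plain Python. It does not import openml: `is_sklearn_flow` is a hypothetical stand-in that only reproduces the substring test quoted above, and both version strings are made-up examples of what an `external_version` might look like, not values taken from a real run.

```python
def is_sklearn_flow(external_version: str) -> bool:
    # Hypothetical stand-in: reproduces only the substring test quoted above.
    return ",sklearn==" in external_version


# Made-up external_version strings, for illustration only.
subclass_version = "__main__==0.1,openml==0.9.0"    # no sklearn entry
pipeline_version = "openml==0.9.0,sklearn==0.21.2"  # sklearn entry present

subclass_is_sklearn = is_sklearn_flow(subclass_version)
pipeline_is_sklearn = is_sklearn_flow(pipeline_version)

print(subclass_is_sklearn, pipeline_is_sklearn)
```

Under these assumptions, a flow whose version string has no sklearn entry is rejected, so no extension is found for it. That would also be consistent with the pipeline workaround reported earlier in the thread.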

Contributor

Ok, so get_extension_by_model returns sklearn because isinstance(MyLR(), BaseEstimator) is True (which is also not the correct test, btw, but whatever).

The problem is that the flow created from that model is not recognized as an sklearn extension flow. That lookup goes through get_extension_by_flow, and the sklearn.extension module doesn't include sklearn in the flow's external_version; it only includes the sklearn version in the tags:

flow = OpenMLFlow(name=name,

There are two obvious fixes:

a) When creating the flow, allow setting the extension directly, because we know what the extension is supposed to be.

b) Include the sklearn version in the external version.

I feel we should possibly be doing both?
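
A toy model of the mismatch and of both fixes, in plain Python with no openml or sklearn imports. Everything here is hypothetical: `ToySklearnExtension`, `ToyFlow`, `pinned_extension`, and the local `BaseEstimator` are illustration-only names, and the version strings are made up. It is a sketch of the idea, not how openml implements or should implement it.

```python
from dataclasses import dataclass
from typing import Optional


class BaseEstimator:
    """Local stand-in for sklearn.base.BaseEstimator (assumption)."""


class MyLR(BaseEstimator):
    """User-defined subclass that lives outside the sklearn package."""


class ToySklearnExtension:
    """Hypothetical extension; not openml's real class."""

    @staticmethod
    def can_handle_model(model: object) -> bool:
        # Model-side lookup: an isinstance test.
        return isinstance(model, BaseEstimator)

    @staticmethod
    def can_handle_flow(flow: "ToyFlow") -> bool:
        # Flow-side lookup: the substring test on external_version.
        return ",sklearn==" in flow.external_version


@dataclass
class ToyFlow:
    external_version: str
    # Fix (a): let whoever creates the flow pin the extension explicitly.
    pinned_extension: Optional[type] = None

    @property
    def extension(self) -> Optional[type]:
        if self.pinned_extension is not None:
            return self.pinned_extension
        if ToySklearnExtension.can_handle_flow(self):
            return ToySklearnExtension
        return None


# The bug: the model is accepted, but the flow built from it is not.
model_ok = ToySklearnExtension.can_handle_model(MyLR())
broken = ToyFlow(external_version="__main__==0.1,openml==0.9.0")

# Fix (a): pin the extension when the flow is created.
fixed_a = ToyFlow(
    external_version="__main__==0.1,openml==0.9.0",
    pinned_extension=ToySklearnExtension,
)

# Fix (b): include the sklearn version in external_version.
fixed_b = ToyFlow(external_version="__main__==0.1,openml==0.9.0,sklearn==0.21.2")

print(model_ok, broken.extension, fixed_a.extension, fixed_b.extension)
```

In this sketch `broken.extension` is `None`, which is the state that leads to the `'NoneType' object has no attribute 'seed_model'` error from the original report. Either fix alone makes the lookup succeed; doing both keeps the flow recognizable even when it is later reloaded without a pinned extension.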

amueller reopened this Jul 22, 2019

Contributor

damn meant to comment on #734
