Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs-alspredictstreamop.md示例代码待更新/错误 #104

Closed
Sunny-Island opened this issue Jun 29, 2020 · 0 comments
Closed

docs-alspredictstreamop.md示例代码待更新/错误 #104

Sunny-Island opened this issue Jun 29, 2020 · 0 comments
Labels
bug Something isn't working

Comments

@Sunny-Island
Copy link

尝试使用 alspredictstreamop.md 中的示例代码使用Alink中的ALS训练训练和预测功能,示例代码如下:

data = np.array([
    [1, 1, 0.6],
    [2, 2, 0.8],
    [2, 3, 0.6],
    [4, 1, 0.6],
    [4, 2, 0.3],
    [4, 3, 0.4],
])

df_data = pd.DataFrame({
    "user": data[:, 0],
    "item": data[:, 1],
    "rating": data[:, 2],
})


df_data["user"] = df_data["user"].astype('int')
df_data["item"] = df_data["item"].astype('int')

data = dataframeToOperator(df_data, schemaStr='user bigint, item bigint, rating double', op_type='stream')

als = AlsTrainBatchOp().setUserCol("user").setItemCol("item").setRateCol("rating") \
    .setNumIter(10).setRank(10).setLambda(0.01)
predictor = AlsPredictStreamOp()\
    .setUserCol("user").setItemCol("item").setPredictionCol("predicted_rating")

model = als.linkFrom(data)
predictor.linkFrom(model, data).print()

其中predictor定义时调用的AlsTrainBatchOp()初始化明确表明了需要传入一个类型为BatchOperator的model。代码见pyalink-alink-stream-base.py:

class BaseModelStreamOp(StreamOperator):
    def __init__(self, *args, **kwargs):
        if "model" in kwargs:
            self.model = kwargs.pop("model")
        elif len(args) >= 1:
            self.model = args[0]
            args = args[1:]
        else:
            raise Exception("A BatchOperator representing model must be provided.")
        super(BaseModelStreamOp, self).__init__(j_op=None, model=self.model, *args, **kwargs, )

    def linkFrom(self, *args):
        super(BaseModelStreamOp, self).linkFrom(*args)
        self.inputs = [self.model] + [x for x in args]
        return self

但是示例代码中没有传入。经过调整我把示例代码修改如如下:

data = np.array([
    [1, 1, 0.6],
    [2, 2, 0.8],
    [2, 3, 0.6],
    [4, 1, 0.6],
    [4, 2, 0.3],
    [4, 3, 0.4],
])

df_data = pd.DataFrame({
    "user": data[:, 0],
    "item": data[:, 1],
    "rating": data[:, 2],
})
df_data["user"] = df_data["user"].astype('int')
df_data["item"] = df_data["item"].astype('int')

data_stream = dataframeToOperator(df_data, schemaStr='user bigint, item bigint, rating double', op_type='stream')
data_batch = dataframeToOperator(df_data, schemaStr='user bigint, item bigint, rating double', op_type='batch')
als = AlsTrainBatchOp().setUserCol("user").setItemCol("item").setRateCol("rating") \
    .setNumIter(10).setRank(10).setLambda(0.01)
model = als.linkFrom(data_batch)
predictor = AlsPredictStreamOp(als)\
    .setUserCol("user").setItemCol("item").setPredictionCol("predicted_rating")


predictor.linkFrom(data_stream).print()
StreamOperator.execute()

可以得到与示例相同的输出。猜测是因为版本更迭没有更新docs?

@shaomengwang shaomengwang added the bug Something isn't working label Jun 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants