concatenate_datasets removes ClassLabel typing. #3111

Closed
Dref360 opened this issue Oct 19, 2021 · 1 comment · Fixed by #3120
Labels
bug Something isn't working

Comments

@Dref360
Contributor

Dref360 commented Oct 19, 2021

Describe the bug

When concatenating two datasets, we lose typing of ClassLabel columns.

I can work on this if it is a legitimate bug.

Steps to reproduce the bug

import datasets
from datasets import Dataset, ClassLabel, Value, concatenate_datasets

DS_LEN = 100
my_dataset = Dataset.from_dict(
    {
        "sentence": [f"{chr(i % 10)}" for i in range(DS_LEN)],
        "label": [i % 2 for i in range(DS_LEN)]
    }
)
my_predictions = Dataset.from_dict(
    {
        "pred": [(i + 1) % 2 for i in range(DS_LEN)]
    }
)

my_dataset = my_dataset.cast(datasets.Features({"sentence": Value("string"), "label": ClassLabel(2, names=["POS", "NEG"])}))
print("Original")
print(my_dataset)
print(my_dataset.features)


concat_ds = concatenate_datasets([my_dataset, my_predictions], axis=1)
print("Concatenated")
print(concat_ds)
print(concat_ds.features)

Expected results

The features of concat_ds should contain ClassLabel.

Actual results

On master, I get:

{'sentence': Value(dtype='string', id=None), 'label': Value(dtype='int64', id=None), 'pred': Value(dtype='int64', id=None)}

Environment info

  • datasets version: 1.14.1.dev0
  • Platform: macOS-10.15.7-x86_64-i386-64bit
  • Python version: 3.8.11
  • PyArrow version: 4.0.1
@Dref360 Dref360 added the bug Something isn't working label Oct 19, 2021
Dref360 pushed a commit to Dref360/datasets that referenced this issue Oct 19, 2021
@Dref360
Contributor Author

Dref360 commented Oct 19, 2021
