Describe the bug
When concatenating two datasets (here along axis=1), ClassLabel columns lose their typing and come back as plain int64 Value columns.
I can work on this if this is a legitimate bug.
Steps to reproduce the bug
```python
import datasets
from datasets import Dataset, ClassLabel, Value, concatenate_datasets

DS_LEN = 100

# Two datasets of equal length: one with text and labels, one with predictions.
my_dataset = Dataset.from_dict(
    {
        "sentence": [f"{chr(i % 10)}" for i in range(DS_LEN)],
        "label": [i % 2 for i in range(DS_LEN)]
    }
)
my_predictions = Dataset.from_dict(
    {
        "pred": [(i + 1) % 2 for i in range(DS_LEN)]
    }
)

# Cast the integer "label" column to a ClassLabel with two named classes.
my_dataset = my_dataset.cast(datasets.Features({"sentence": Value("string"), "label": ClassLabel(2, names=["POS", "NEG"])}))
print("Original")
print(my_dataset)
print(my_dataset.features)

# Column-wise concatenation; the ClassLabel typing is lost in the result.
concat_ds = concatenate_datasets([my_dataset, my_predictions], axis=1)
print("Concatenated")
print(concat_ds)
print(concat_ds.features)
```
Expected results
The features of concat_ds should contain ClassLabel.
Actual results
On master, I get:
```
{'sentence': Value(dtype='string', id=None), 'label': Value(dtype='int64', id=None), 'pred': Value(dtype='int64', id=None)}
```
Commits referencing this issue: cce0ae8 and 835b1ea (huggingface#3111 Set features correctly when concatenating.)
Something like this would fix it, I think: https://github.com/huggingface/datasets/compare/master...Dref360:HF-3111/concatenate_types?expand=1
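For illustration only, here is a minimal sketch of the general idea of carrying the declared features through a column-wise concatenation instead of re-inferring them from the data. It is not taken from the linked branch: `merge_column_features` is a hypothetical helper, not part of the datasets library, and plain strings stand in for feature objects so the sketch runs on its own.

```python
from typing import Any, Dict, Mapping


def merge_column_features(*feature_maps: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge column -> feature mappings from datasets concatenated along axis=1.

    Hypothetical helper for illustration; not part of the datasets library.
    """
    merged: Dict[str, Any] = {}
    for feature_map in feature_maps:
        for column, feature in feature_map.items():
            if column in merged:
                # A repeated name would make the merged schema ambiguous, so fail loudly.
                raise ValueError(f"Duplicate column name: {column!r}")
            merged[column] = feature
    return merged


# Stand-in feature descriptions (assumption: real code would pass ds.features mappings).
left = {"sentence": "Value(string)", "label": "ClassLabel(POS, NEG)"}
right = {"pred": "Value(int64)"}

merged = merge_column_features(left, right)
print(merged)
```

With the reproduction script, the same helper could serve as a workaround: `concat_ds.cast(datasets.Features(merge_column_features(my_dataset.features, my_predictions.features)))`. This relies on `Dataset.cast` converting the int64 label column back to ClassLabel, which is the same conversion the script already performs on `my_dataset`.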
mariosasko
Environment info
datasets version: 1.14.1.dev0