Fix args of feature docstrings #7103

Merged 7 commits on Aug 15, 2024
12 changes: 12 additions & 0 deletions docs/source/package_reference/main_classes.mdx
@@ -211,18 +211,26 @@ Dictionary with split names as keys ('train', 'test' for example), and `Iterable

 [[autodoc]] datasets.Features

+### Scalar

 [[autodoc]] datasets.Value

 [[autodoc]] datasets.ClassLabel

+### Composite

 [[autodoc]] datasets.LargeList

 [[autodoc]] datasets.Sequence

+### Translation

 [[autodoc]] datasets.Translation

 [[autodoc]] datasets.TranslationVariableLanguages

+### Arrays

 [[autodoc]] datasets.Array2D

 [[autodoc]] datasets.Array3D
@@ -231,8 +239,12 @@ Dictionary with split names as keys ('train', 'test' for example), and `Iterable

 [[autodoc]] datasets.Array5D

+### Audio

 [[autodoc]] datasets.Audio

+### Image

 [[autodoc]] datasets.Image

 ## Filesystems
59 changes: 32 additions & 27 deletions src/datasets/features/features.py
@@ -460,8 +460,9 @@ def cast_to_python_objects(obj: Any, only_1d_for_numpy=False, optimize_list_cast
 @dataclass
 class Value:
 """
-The `Value` dtypes are as follows:
+Scalar feature value of a particular data type.
+The possible dtypes of `Value` are as follows:
 - `null`
 - `bool`
 - `int8`
@@ -489,6 +490,10 @@ class Value:
 - `string`
 - `large_string`
+Args:
+dtype (`str`):
+Name of the data type.
 Example:
```py
@@ -546,9 +551,9 @@ class Array2D(_ArrayXD):
 Args:
 shape (`tuple`):
-The size of each dimension.
+Size of each dimension.
 dtype (`str`):
-The value of the data type.
+Name of the data type.
 Example:
@@ -571,9 +576,9 @@ class Array3D(_ArrayXD):
 Args:
 shape (`tuple`):
-The size of each dimension.
+Size of each dimension.
 dtype (`str`):
-The value of the data type.
+Name of the data type.
 Example:
@@ -596,9 +601,9 @@ class Array4D(_ArrayXD):
 Args:
 shape (`tuple`):
-The size of each dimension.
+Size of each dimension.
 dtype (`str`):
-The value of the data type.
+Name of the data type.
 Example:
@@ -621,9 +626,9 @@ class Array5D(_ArrayXD):
 Args:
 shape (`tuple`):
-The size of each dimension.
+Size of each dimension.
 dtype (`str`):
-The value of the data type.
+Name of the data type.
 Example:
@@ -1139,7 +1144,7 @@ class Sequence:
 Mostly here for compatiblity with tfds.
 Args:
-feature:
+feature ([`FeatureType`]):
 A list of features of a single type or a dictionary of types.
 length (`int`):
 Length of the sequence.
@@ -1170,7 +1175,7 @@ class LargeList:
 It is backed by `pyarrow.LargeListType`, which is like `pyarrow.ListType` but with 64-bit rather than 32-bit offsets.
 Args:
-dtype:
+dtype ([`FeatureType`]):
 Child feature data type of each item within the large list.
 """
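As a rough illustration of why the 64-bit offsets mentioned in the `LargeList` docstring matter, the sketch below checks whether the cumulative child-element count of a list column still fits in a signed 32-bit or 64-bit offset. It is plain Python arithmetic and does not call `pyarrow`. The helper name `fits_in_list_offsets` is hypothetical and not part of `datasets`.

```python
# Illustrative only: models Arrow-style list offsets with plain integers.
INT32_MAX = 2**31 - 1  # largest offset a 32-bit list layout can store
INT64_MAX = 2**63 - 1  # largest offset a 64-bit (large list) layout can store


def fits_in_list_offsets(lengths, large=False):
    """Return True if the running total of item lengths never exceeds the offset limit.

    `lengths` holds the number of child elements in each list of the column.
    Hypothetical helper, written only to show the arithmetic.
    """
    limit = INT64_MAX if large else INT32_MAX
    total = 0
    for n in lengths:
        total += n  # the next offset is the cumulative child count so far
        if total > limit:
            return False
    return True


# Two lists of 2**30 children each: the final offset is 2**31, one past INT32_MAX.
lengths = [2**30, 2**30]
print(fits_in_list_offsets(lengths))              # 32-bit offsets
print(fits_in_list_offsets(lengths, large=True))  # 64-bit offsets
```

Under these assumptions, a column whose lists together hold more than `2**31 - 1` child elements is the case where a large list type would be needed.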

@@ -1695,30 +1700,30 @@ class Features(dict):
 and values are the type of that column.
 `FieldType` can be one of the following:
-- a [`~datasets.Value`] feature specifies a single typed value, e.g. `int64` or `string`.
-- a [`~datasets.ClassLabel`] feature specifies a field with a predefined set of classes which can have labels
-associated to them and will be stored as integers in the dataset.
-- a python `dict` which specifies that the field is a nested field containing a mapping of sub-fields to sub-fields
-features. It's possible to have nested fields of nested fields in an arbitrary manner.
-- a python `list` or a [`~datasets.Sequence`] specifies that the field contains a list of objects. The python
-`list` or [`~datasets.Sequence`] should be provided with a single sub-feature as an example of the feature
-type hosted in this list.
+- [`Value`] feature specifies a single data type value, e.g. `int64` or `string`.
+- [`ClassLabel`] feature specifies a predefined set of classes which can have labels associated to them and
+will be stored as integers in the dataset.
+- Python `dict` specifies a composite feature containing a mapping of sub-fields to sub-features.
+It's possible to have nested fields of nested fields in an arbitrary manner.
+- Python `list`, [`LargeList`] or [`Sequence`] specifies a composite feature containing a sequence of
+sub-features, all of the same feature type.
 <Tip>
-A [`~datasets.Sequence`] with a internal dictionary feature will be automatically converted into a dictionary of
+A [`Sequence`] with an internal dictionary feature will be automatically converted into a dictionary of
 lists. This behavior is implemented to have a compatibility layer with the TensorFlow Datasets library but may be
-un-wanted in some cases. If you don't want this behavior, you can use a python `list` instead of the
-[`~datasets.Sequence`].
+un-wanted in some cases. If you don't want this behavior, you can use a Python `list` or a [`LargeList`]
+instead of the [`Sequence`].
 </Tip>
-- a [`Array2D`], [`Array3D`], [`Array4D`] or [`Array5D`] feature for multidimensional arrays.
-- an [`Audio`] feature to store the absolute path to an audio file or a dictionary with the relative path
+- [`Array2D`], [`Array3D`], [`Array4D`] or [`Array5D`] feature for multidimensional arrays.
+- [`Audio`] feature to store the absolute path to an audio file or a dictionary with the relative path
 to an audio file ("path" key) and its bytes content ("bytes" key). This feature extracts the audio data.
-- an [`Image`] feature to store the absolute path to an image file, an `np.ndarray` object, a `PIL.Image.Image` object
-or a dictionary with the relative path to an image file ("path" key) and its bytes content ("bytes" key). This feature extracts the image data.
-- [`~datasets.Translation`] and [`~datasets.TranslationVariableLanguages`], the two features specific to Machine Translation.
+- [`Image`] feature to store the absolute path to an image file, an `np.ndarray` object, a `PIL.Image.Image` object
+or a dictionary with the relative path to an image file ("path" key) and its bytes content ("bytes" key).
+This feature extracts the image data.
+- [`Translation`] or [`TranslationVariableLanguages`] feature specific to Machine Translation.
 """

 def __init__(*args, **kwargs):
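The Tip in the `Features` docstring says that a `Sequence` with an internal dictionary feature is converted into a dictionary of lists. The sketch below shows only what that layout change looks like, using plain Python containers. It is not the library's implementation, the function name `list_of_dicts_to_dict_of_lists` is hypothetical, and it assumes every item has the same keys.

```python
def list_of_dicts_to_dict_of_lists(items):
    """Turn [{"a": 1, "b": 2}, {"a": 3, "b": 4}] into {"a": [1, 3], "b": [2, 4]}.

    Hypothetical helper for illustration; assumes all items share the same keys.
    """
    columns = {}
    for item in items:
        for key, value in item.items():
            # Each sub-field collects its values in the original item order.
            columns.setdefault(key, []).append(value)
    return columns


# One example whose field is a sequence of {"text", "start"} dictionaries.
answers = [
    {"text": "Paris", "start": 10},
    {"text": "France", "start": 42},
]
print(list_of_dicts_to_dict_of_lists(answers))
# {'text': ['Paris', 'France'], 'start': [10, 42]}
```

With a Python `list` or a `LargeList` instead, the docstring indicates the items would stay as a list of dictionaries, i.e. the `answers` layout above.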
4 changes: 2 additions & 2 deletions src/datasets/features/translation.py
@@ -10,7 +10,7 @@

 @dataclass
 class Translation:
-"""`FeatureConnector` for translations with fixed languages per example.
+"""`Feature` for translations with fixed languages per example.
 Here for compatiblity with tfds.
 Args:
@@ -50,7 +50,7 @@ def flatten(self) -> Union["FeatureType", Dict[str, "FeatureType"]]:

 @dataclass
 class TranslationVariableLanguages:
-"""`FeatureConnector` for translations with variable languages per example.
+"""`Feature` for translations with variable languages per example.
 Here for compatiblity with tfds.
 Args: