Add SVHN dataset #3535

Merged
merged 4 commits into from
Jan 12, 2022
207 changes: 207 additions & 0 deletions datasets/svhn/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,207 @@
---
annotations_creators:
- machine-generated
- expert-generated
language_creators:
- machine-generated
languages:
- en
licenses:
- other
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- other
task_ids:
- other-other-object-detection
- other-other-image-classification
paperswithcode_id: svhn
pretty_name: Street View House Numbers
---

# Dataset Card for Street View House Numbers

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** http://ufldl.stanford.edu/housenumbers
- **Repository:**
- **Paper:** [Reading Digits in Natural Images with Unsupervised Feature Learning](http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf)
- **Leaderboard:** https://paperswithcode.com/sota/image-classification-on-svhn
- **Point of Contact:** streetviewhousenumbers@gmail.com

### Dataset Summary

SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST (e.g., the images are of small cropped digits), but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images. The dataset comes in two formats:
1. Original images with character-level bounding boxes.
2. MNIST-like 32-by-32 images centered around a single character (many of the images contain some distractors at the sides).

### Supported Tasks and Leaderboards

- `object-detection`: The dataset can be used to train a model for digit detection.
- `image-classification`: The dataset can be used to train a model for image classification, where the task is to predict the digit shown in the image. The leaderboard for this task is available at https://paperswithcode.com/sota/image-classification-on-svhn

### Languages

English

## Dataset Structure

### Data Instances

#### full_numbers

The original, variable-resolution, color house-number images with character-level bounding boxes.

```
{
  'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=98x48 at 0x259E3F01780>,
  'digits': {
    'bbox': [
      [36, 7, 13, 32],
      [50, 7, 12, 32]
    ],
    'label': [6, 9]
  }
}
```

#### cropped_digits

Character-level ground truth in an MNIST-like format. All digits have been resized to a fixed resolution of 32-by-32 pixels. The original character bounding boxes are extended in the appropriate dimension to become square windows, so that resizing them to 32-by-32 pixels does not introduce aspect ratio distortions. Nevertheless, this preprocessing introduces some distracting digits to the sides of the digit of interest.

```
{
  'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32 at 0x25A89494780>,
  'label': 1
}
```
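
The square-window extension described above can be sketched as follows. This is only an illustration: the function name `square_window` is hypothetical, and the sketch assumes the shorter side is extended symmetrically around the box center with no clamping at the image border, which the dataset description does not specify.

```python
def square_window(bbox):
    """Extend an [x_min, y_min, width, height] box into a square window.

    Assumption: the shorter side grows symmetrically around the box center.
    The actual SVHN preprocessing may differ in details such as rounding
    or handling of image borders.
    """
    x_min, y_min, width, height = bbox
    side = max(width, height)
    # Shift the origin by half of the added margin in each dimension.
    new_x = x_min - (side - width) / 2
    new_y = y_min - (side - height) / 2
    return (new_x, new_y, side, side)


# Boxes taken from the full_numbers instance shown earlier.
first_digit = square_window([36, 7, 13, 32])
second_digit = square_window([50, 7, 12, 32])
print(first_digit)
print(second_digit)
```

Under these assumptions the first digit's window spans x = 26.5 to 58.5, which overlaps the second digit's original box starting at x = 50. This is one way distracting digits end up at the sides of a crop.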

### Data Fields

#### full_numbers

- `image`: A `PIL.Image.Image` object containing the image. Note that when accessing the image column (`dataset[0]["image"]`), the image file is automatically decoded. Decoding a large number of image files can take a significant amount of time, so it is important to query the sample index before the `"image"` column, *i.e.* `dataset[0]["image"]` should **always** be preferred over `dataset["image"][0]`.
- `digits`: a dictionary containing the digits' bounding boxes and labels
  - `bbox`: a list of bounding boxes (in the [COCO](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/#coco) format) corresponding to the digits present in the image
  - `label`: a list of integers between 0 and 9 representing the digits
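
As a small illustration of the COCO box layout, `[x_min, y_min, width, height]`, the sketch below converts the boxes from the example instance into corner coordinates. The helper name `coco_to_corners` is hypothetical and not part of the dataset or of any library.

```python
def coco_to_corners(bbox):
    """Convert a COCO-style [x_min, y_min, width, height] box
    into (x_min, y_min, x_max, y_max) corner coordinates."""
    x_min, y_min, width, height = bbox
    return (x_min, y_min, x_min + width, y_min + height)


# The `digits` field from the full_numbers example instance.
digits = {"bbox": [[36, 7, 13, 32], [50, 7, 12, 32]], "label": [6, 9]}

corners = [coco_to_corners(box) for box in digits["bbox"]]
for label, box in zip(digits["label"], corners):
    print(f"digit {label}: corners {box}")
```

Both boxes fit inside the 98x48 example image, and the labels read left to right as the house number 69.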

#### cropped_digits

- `image`: A `PIL.Image.Image` object containing the image. Note that when accessing the image column (`dataset[0]["image"]`), the image file is automatically decoded. Decoding a large number of image files can take a significant amount of time, so it is important to query the sample index before the `"image"` column, *i.e.* `dataset[0]["image"]` should **always** be preferred over `dataset["image"][0]`.
- `label`: an integer between 0 and 9 representing the digit.
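
The toy model below illustrates why the access order matters. It is not the `datasets` library's implementation: `ToyImageDataset` is a hypothetical stand-in that only counts how many decodes each access pattern triggers, under the assumption that column access decodes every row.

```python
class ToyImageDataset:
    """Hypothetical stand-in for a dataset with a lazily decoded image column."""

    def __init__(self, encoded_images):
        self._encoded = list(encoded_images)
        self.decode_calls = 0

    def _decode(self, raw):
        # Placeholder for real (expensive) image decoding.
        self.decode_calls += 1
        return raw.upper()

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row access: only the requested row is decoded.
            return {"image": self._decode(self._encoded[key])}
        if key == "image":
            # Column access: every row is decoded before indexing.
            return [self._decode(raw) for raw in self._encoded]
        raise KeyError(key)


row_first = ToyImageDataset(["a", "b", "c", "d"])
row_first[0]["image"]

column_first = ToyImageDataset(["a", "b", "c", "d"])
column_first["image"][0]

print(row_first.decode_calls, column_first.decode_calls)
```

In this toy model the row-first access decodes one image while the column-first access decodes all four. On a split with hundreds of thousands of images the same pattern becomes costly.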

### Data Splits

#### full_numbers

The data is split into training, test, and extra sets. The training set contains 33,402 images, the test set 13,068, and the extra set 202,353.

#### cropped_digits

The data is split into training, test, and extra sets. The training set contains 73,257 images, the test set 26,032, and the extra set 531,131.

The extra set can be used as additional training data. It was obtained in a similar manner to the training and test sets, but with an increased detection threshold in order to generate this large amount of labeled data. The extra subset is therefore somewhat biased toward less difficult detections and is easier than the train and test subsets.
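
As a quick arithmetic check, the split sizes listed above can be summed per configuration. The dictionary layout is only for illustration.

```python
# Split sizes as stated in this section.
splits = {
    "full_numbers": {"train": 33402, "test": 13068, "extra": 202353},
    "cropped_digits": {"train": 73257, "test": 26032, "extra": 531131},
}

totals = {config: sum(counts.values()) for config, counts in splits.items()}
for config, total in totals.items():
    print(f"{config}: {total} examples")
```

The `cropped_digits` total matches the "over 600,000 digit images" figure in the dataset summary.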

## Dataset Creation

### Curation Rationale

From the paper:
> As mentioned above, the venerable MNIST dataset has been a valuable goal post for researchers seeking to build better learning systems whose benchmark performance could be expected to translate into improved performance on realistic applications. However, computers have now reached essentially human levels of performance on this problem—a testament to progress in machine learning and computer vision. The Street View House Numbers (SVHN) digit database that we provide can be seen as similar in flavor to MNIST (e.g., the images are of small cropped characters), but the SVHN dataset incorporates an order of magnitude more labeled data and comes from a significantly harder, unsolved, real world problem. Here the gap between human performance and state of the art feature representations is significant. Going forward, we expect that this dataset may fulfill a similar role for modern feature learning algorithms: it provides a new and difficult benchmark where increased performance can be expected to translate into tangible gains on a realistic application.

### Source Data

#### Initial Data Collection and Normalization

From the paper:
> The SVHN dataset was obtained from a large number of Street View images using a combination of automated algorithms and the Amazon Mechanical Turk (AMT) framework, which was used to localize and transcribe the single digits. We downloaded a very large set of images from urban areas in various countries.

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

From the paper:
> From these randomly selected images, the house-number patches were extracted using a dedicated sliding window house-numbers detector using a low threshold on the detector’s confidence score in order to get a varied, unbiased dataset of house-number signs. These low precision detections were screened and transcribed by AMT workers.

#### Who are the annotators?

Amazon Mechanical Turk (AMT) workers.

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu and Andrew Y. Ng

### Licensing Information

Non-commercial use only.

### Citation Information

```
@article{netzer2011reading,
title={Reading digits in natural images with unsupervised feature learning},
author={Netzer, Yuval and Wang, Tao and Coates, Adam and Bissacco, Alessandro and Wu, Bo and Ng, Andrew Y},
year={2011}
}
```

### Contributions

Thanks to [@mariosasko](https://github.com/mariosasko) for adding this dataset.
1 change: 1 addition & 0 deletions datasets/svhn/dataset_infos.json
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"full_numbers": {"description": "SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting.\nIt can be seen as similar in flavor to MNIST (e.g., the images are of small cropped digits), but incorporates an order of magnitude more labeled data (over 600,000 digit images)\nand comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.\n", "citation": "@article{netzer2011reading,\n title={Reading digits in natural images with unsupervised feature learning},\n author={Netzer, Yuval and Wang, Tao and Coates, Adam and Bissacco, Alessandro and Wu, Bo and Ng, Andrew Y},\n year={2011}\n}\n", "homepage": "http://ufldl.stanford.edu/housenumbers/", "license": "Custom (non-commercial)", "features": {"image": {"id": null, "_type": "Image"}, "digits": {"feature": {"bbox": {"feature": {"dtype": "int32", "id": null, "_type": "Value"}, "length": 4, "id": null, "_type": "Sequence"}, "label": {"num_classes": 10, "names": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"], "names_file": null, "id": null, "_type": "ClassLabel"}}, "length": -1, "id": null, "_type": "Sequence"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "svhn", "config_name": "full_numbers", "version": {"version_str": "1.0.0", "description": null, "major": 1, "minor": 0, "patch": 0}, "splits": {"train": {"name": "train", "num_bytes": 390404309, "num_examples": 33402, "dataset_name": "svhn"}, "test": {"name": "test", "num_bytes": 271503052, "num_examples": 13068, "dataset_name": "svhn"}, "extra": {"name": "extra", "num_bytes": 1868720340, "num_examples": 202353, "dataset_name": "svhn"}}, "download_checksums": {"http://ufldl.stanford.edu/housenumbers/train.tar.gz": {"num_bytes": 404141560, "checksum": "4b17bb33b6cd8f963493168f80143da956f28ec406cc12f8e5745a9f91a51898"}, "http://ufldl.stanford.edu/housenumbers/test.tar.gz": {"num_bytes": 276555967, "checksum": "57ac9ceb530e4aa85b55d991be8fc49c695b3d71c6f6a88afea86549efde7fb5"}, "http://ufldl.stanford.edu/housenumbers/extra.tar.gz": {"num_bytes": 1955489752, "checksum": "e857e27d1e65bd1e7d3959b094061777f6506bbc39889a0df3bba6a729d60f9c"}}, "download_size": 2636187279, "post_processing_size": null, "dataset_size": 2530627701, "size_in_bytes": 5166814980}, "cropped_digits": {"description": "SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting.\nIt can be seen as similar in flavor to MNIST (e.g., the images are of small cropped digits), but incorporates an order of magnitude more labeled data (over 600,000 digit images)\nand comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.\n", "citation": "@article{netzer2011reading,\n title={Reading digits in natural images with unsupervised feature learning},\n author={Netzer, Yuval and Wang, Tao and Coates, Adam and Bissacco, Alessandro and Wu, Bo and Ng, Andrew Y},\n year={2011}\n}\n", "homepage": "http://ufldl.stanford.edu/housenumbers/", "license": "Custom (non-commercial)", "features": {"image": {"id": null, "_type": "Image"}, "label": {"num_classes": 10, "names": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"], "names_file": null, "id": null, "_type": "ClassLabel"}}, "post_processed": null, "supervised_keys": null, "task_templates": [{"task": "image-classification", "image_column": "image", "label_column": "label", "labels": null}], "builder_name": "svhn", "config_name": "cropped_digits", "version": {"version_str": "1.0.0", "description": null, "major": 1, "minor": 0, "patch": 0}, "splits": {"train": {"name": "train", "num_bytes": 128364360, "num_examples": 73257, "dataset_name": "svhn"}, "test": {"name": "test", "num_bytes": 44464040, "num_examples": 26032, "dataset_name": "svhn"}, "extra": {"name": "extra", "num_bytes": 967853504, "num_examples": 531131, "dataset_name": "svhn"}}, "download_checksums": {"http://ufldl.stanford.edu/housenumbers/train_32x32.mat": {"num_bytes": 182040794, "checksum": "435e94d69a87fde4fd4d7f3dd208dfc32cb6ae8af2240d066de1df7508d083b8"}, "http://ufldl.stanford.edu/housenumbers/test_32x32.mat": {"num_bytes": 64275384, "checksum": "cdce80dfb2a2c4c6160906d0bd7c68ec5a99d7ca4831afa54f09182025b6a75b"}, "http://ufldl.stanford.edu/housenumbers/extra_32x32.mat": {"num_bytes": 1329278602, "checksum": "a133a4beb38a00fcdda90c9489e0c04f900b660ce8a316a5e854838379a71eb3"}}, "download_size": 1575594780, "post_processing_size": null, "dataset_size": 1140681904, "size_in_bytes": 2716276684}}
Binary file not shown.
Binary file not shown.