PKG-INFO

Metadata-Version: 2.3
Name: IndianPines
Version: 0.1.22
Summary: Indian Pines datasets for scikit-learn
Author-email: Kotaro SONODA <kotaro1976@gmail.com>
License: MIT
Requires-Python: >=3.10
Requires-Dist: colormap>=1.2.0
Requires-Dist: importlib>=1.0.4
Requires-Dist: numpy>=2.1.3
Requires-Dist: pandas>=2.2.3
Requires-Dist: requests>=2.32.3
Requires-Dist: scikit-learn>=1.5.2
Requires-Dist: scipy>=1.14.1
Requires-Dist: tifffile>=2024.9.20
Requires-Dist: tqdm>=4.67.0
Description-Content-Type: text/markdown

# IndianPines Dataset

The Indian Pines Site 3 [dataset], captured by AVIRIS (an airborne imaging spectrometer), can be loaded as a scikit-learn-compatible dataset.

[dataset]:https://purr.purdue.edu/publications/1947/1

## module install

```{shell}
pip install git+https://github.com/helmenov/IndianPines
```

## provides

- *`Readme.md`* : This document
- *`setup.py`* : Script required to install the package with pip
- *`__init__.py`* : Package initializer for the provided modules
- *`dataset.py`* : Main module; provides two functions, `load` and `make_dataset`
- *`example.py`*: Sample code
- *`resource/`*
  - *`IndianPines.rst`*: Description for dataset
  - *`LabelsFromTIF.rst`*: Description for Original Ground Truth
  - *`LabelsFromMAT.rst`*: Description for GIC Ground Truth
  - *`ReduceWaterAbsorption.rst`*: Description for Reducing Water Absorption Channels
  - *`recategorize_rules/`*
    - *`recategorize17to10.csv`*: Example recategorization rule that maps the original 17 categories to 10 categories.

## usage

```
import IndianPines as pines
```

See [example](example.ipynb).

### IndianPines.load

```
load(
  pca = 0,
  include_background = True,
  recategorize_rule = None,
  exclude_WaterAbsorptionChannels = True,
)
```

#### Inputs

- `pca` : Parameter for PCA (default: `0`)
  - `pca` == 0: PCA is not applied
  - 0 < `pca` < 1 (float): contribution ratio to retain
  - `pca` >= 1 (int): number of dimensions after compression
  - `pca` < 0 : no compression, but the features are transformed into principal components
  - This affects the Bunch shape (if > 0, ($145 * 145$, $220$) -> ($145 * 145$, #_of_features))

- `include_background` : Whether the background (unlabeled) samples are included in the Bunch. (default: `True`)
  - This affects the Bunch shape (if `False`, ($145 * 145$, $220$) -> (#_of_samples, $220$))

- `recategorize_rule` : (default: None)
  - The original data has 16 categories plus background, but some categories contain very few samples. You can therefore prepare a transform table in the following format and pass it with this option.

  For example, to recategorize into the following new categories

  |BackGround|Alfalfa|Corn|Grass|Hay-windrowed|Soybeans|Wheat|Woods|Bldg-Grass-Tree-Drives|Stone-steel towers|
  |---|---|---|---|---|---|---|---|---|---|
  |#ffffff|#ff0088|#0000ff|#007f7f|#a2b5cd|#da70d6|#ffa500|#884428|#00ff00|#ffff00|

  with the following transform rule

  |original|0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|
  |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
  |recategorized|0|1|2|2|2|3|3|0|4|0|5|5|5|6|7|8|9|

  write the following CSV file.

  ```
  BackGround,Alfalfa,Corn,Grass,Hay-windrowed,Soybeans,Wheat,Woods,Bldg-Grass-Tree-Drives,Stone-steel towers
  #ffffff,#ff0088,#0000ff,#007f7f,#a2b5cd,#da70d6,#ffa500,#884428,#00ff00,#ffff00
  0,1,2,2,2,3,3,0,4,0,5,5,5,6,7,8,9
  ```

- `exclude_WaterAbsorptionChannels`: (default: True)
  - Some of the original channels are affected by water absorption. Gualtieri and Cromp [SC99] suggested removing channels 104-108, 150-163, and 220 (20 channels in total) before analysis.
  - [SC99] J.A. Gualtieri, R.F. Cromp, "Support vector machines for hyperspectral remote sensing classification," 27th AIPR Workshop: Advances in Computer-Assisted Recognition, vol. 3584, pp. 221-232. SPIE, 1999.

- `gt_gic`: (default: True)
  - if True, the ground truth labels are read from GIC's MAT-file
  - if False, they are read from the original bundle's TIF-file
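
As a rough illustration of the `recategorize_rule` CSV format described above, the sketch below parses the three-line rule file from the example and applies it to a few original labels. The helper names `parse_rule` and `recategorize` are hypothetical and are not part of this package; how `load` reads the file internally is not shown here.

```python
import csv
import io

# The three-line rule file from the example above: new category names,
# their hex colors, and one new index per original category (0..16).
RULE_CSV = """\
BackGround,Alfalfa,Corn,Grass,Hay-windrowed,Soybeans,Wheat,Woods,Bldg-Grass-Tree-Drives,Stone-steel towers
#ffffff,#ff0088,#0000ff,#007f7f,#a2b5cd,#da70d6,#ffa500,#884428,#00ff00,#ffff00
0,1,2,2,2,3,3,0,4,0,5,5,5,6,7,8,9
"""


def parse_rule(text):
    """Hypothetical helper: split a rule file into names, colors and mapping."""
    names, colors, mapping = list(csv.reader(io.StringIO(text)))
    mapping = [int(value) for value in mapping]
    if len(names) != len(colors):
        raise ValueError("each new category needs exactly one color")
    if max(mapping) >= len(names):
        raise ValueError("mapping refers to a category that is not named")
    return names, colors, mapping


def recategorize(labels, mapping):
    """Hypothetical helper: replace each original label by its new index."""
    return [mapping[label] for label in labels]


names, colors, mapping = parse_rule(RULE_CSV)
original = [0, 2, 3, 4, 7, 10, 16]
new = recategorize(original, mapping)
print(new)  # [0, 2, 2, 2, 0, 5, 9]
```

The third line must contain one entry per original category (17 here), while the first two lines have one entry per new category (10 here).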


#### Outputs
  - Bunch (compatible with scikit-learn)
    - **.features** : ($145 * 145$, #-features) NumPy array of raw float values
    - **.feature_names** : column name of each feature
    - **.target** : ($145 * 145$, ) NumPy array of integers (indices into `target_names`)
    - **.target_names** : (#-categories, ) NumPy array of strings. Category name for each index
    - **.coordinates** : ($145 * 145$, 2) NumPy array of integers. x-y coordinates (column number and row number) of each sample
    - **.coordinate_names** : (2,) NumPy array of strings. Coordinate names: ['#columns', '#rows']
    - **.hex_names** : (#-categories,) NumPy array of strings. Hex color (e.g. '#000000' for black) of each category, indexed as in `target_names`
    - **.DESCR** : description of the dataset
    - **.filename** : name of the CSV file in which the features and targets are saved

  `features`, `target` and `coordinates` are ordered as `for row in range(image height) for column in range(image width)`.
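
The ordering above can be sketched in plain Python. This is only an illustration: it assumes zero-based row and column numbers (as `range` suggests) and the 145 x 145 image size mentioned in this document, and `flat_index` is a hypothetical helper, not a function of this package.

```python
HEIGHT, WIDTH = 145, 145  # image size stated in this document

# Enumerate pixels in the documented order: rows outermost, columns innermost.
# Each entry is (column, row), matching coordinate_names ['#columns', '#rows'].
coordinates = [(column, row) for row in range(HEIGHT) for column in range(WIDTH)]


def flat_index(column, row, width=WIDTH):
    """Hypothetical helper: position of pixel (column, row) in the flat arrays."""
    return row * width + column


print(len(coordinates))   # 21025
print(coordinates[146])   # (1, 1)
print(flat_index(1, 1))   # 146
```

Under these assumptions, sample `i` of `features` and `target` belongs to the pixel at column `i % 145` and row `i // 145`.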

### IndianPines.make_dataset

This function takes no arguments and returns nothing.

It is called from `IndianPines.load` if you have not yet downloaded the original bundles from Purdue University (to avoid redundant downloads).

It also creates the CSV-format dataset.
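
The "download only once" behaviour can be illustrated with a minimal sketch. It is not the package's actual implementation: `ensure_downloaded` and `fake_fetch` are hypothetical names, and a placeholder file stands in for the real bundle.

```python
import tempfile
from pathlib import Path


def ensure_downloaded(path, fetch):
    """Call fetch(path) only when path is missing; report whether it ran."""
    path = Path(path)
    if path.exists():
        return False  # already cached, skip the download
    path.parent.mkdir(parents=True, exist_ok=True)
    fetch(path)
    return True


calls = []


def fake_fetch(path):
    # Stand-in for the real download; it just writes a placeholder file.
    calls.append(path)
    path.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp) / "cache" / "bundle.bin"
    first = ensure_downloaded(bundle, fake_fetch)
    second = ensure_downloaded(bundle, fake_fetch)

print(first, second, len(calls))  # True False 1
```

The second call returns without fetching because the file already exists, which is the guard against redundant downloads described above.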