This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Commit

Updated README, version and changelog
jorgecarleitao committed Aug 11, 2021
1 parent 79ce377 commit 4c18c0a
Showing 4 changed files with 75 additions and 11 deletions.
4 changes: 3 additions & 1 deletion .github_changelog_generator
@@ -1,5 +1,7 @@
-since-tag=v0.1.0
+since-tag=v0.2.0
future-release=v0.3.0
pr-wo-labels=false
add-sections={"features":{"prefix":"**Enhancements:**","labels":["enhancement"]}, "documentation":{"prefix":"**Documentation updates:**","labels":["documentation"]}}
enhancement-label=**New features:**
enhancement-labels=feature
base=CHANGELOG.md
49 changes: 49 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,54 @@
# Changelog

## [v0.3.0](/~https://github.com/jorgecarleitao/arrow2/tree/v0.3.0) (2021-08-11)

[Full Changelog](/~https://github.com/jorgecarleitao/arrow2/compare/v0.2.0...v0.3.0)

**Breaking changes:**

- Renamed `sum` to `sum_primitive` [\#273](/~https://github.com/jorgecarleitao/arrow2/issues/273)
- Moved trait `Index` from `array::Index` to `types::Index` [\#272](/~https://github.com/jorgecarleitao/arrow2/issues/272)
- Added optional `projection` to IPC FileReader [\#271](/~https://github.com/jorgecarleitao/arrow2/issues/271)
- Added optional `page_filter` to parquet's `RecordReader` and `get_page_iterator` [\#270](/~https://github.com/jorgecarleitao/arrow2/issues/270)
- Renamed parquets' `CompressionCodec` to `Compression` [\#269](/~https://github.com/jorgecarleitao/arrow2/issues/269)
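
The optional `projection` and `page_filter` arguments are breaking because existing call sites must now pass an extra argument. As a rough illustration of what an optional projection means (this is a hypothetical helper written for this note, not arrow2's reader API), `None` keeps every column and `Some(indices)` keeps only the listed ones, in the listed order:

```rust
/// Hypothetical helper: returns the columns selected by `projection`,
/// or all columns when `projection` is `None`.
/// Panics if an index is out of bounds.
fn project<T: Clone>(columns: &[T], projection: Option<&[usize]>) -> Vec<T> {
    match projection {
        // Keep only the requested columns, in the requested order.
        Some(indices) => indices.iter().map(|&i| columns[i].clone()).collect(),
        // No projection: keep everything.
        None => columns.to_vec(),
    }
}
```

A real reader with projection pushdown would go further and skip reading the unselected columns from the file; this sketch only shows the selection semantics.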

**New features:**

- Added support for FFI of dictionary-encoded arrays [\#267](/~https://github.com/jorgecarleitao/arrow2/pull/267) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
- Added support for projection pushdown on IPC files [\#264](/~https://github.com/jorgecarleitao/arrow2/pull/264) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
- Added support to read parquet asynchronously [\#260](/~https://github.com/jorgecarleitao/arrow2/pull/260) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
- Added support to filter parquet pages. [\#256](/~https://github.com/jorgecarleitao/arrow2/pull/256) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
- Added wrapping\_cast to cast kernels [\#254](/~https://github.com/jorgecarleitao/arrow2/pull/254) ([sundy-li](/~https://github.com/sundy-li))
- Added support to parquet IO on wasm32 [\#239](/~https://github.com/jorgecarleitao/arrow2/pull/239) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
- Added support to round-trip dictionary arrays on parquet [\#232](/~https://github.com/jorgecarleitao/arrow2/pull/232) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
- Added Scalar API [\#56](/~https://github.com/jorgecarleitao/arrow2/pull/56) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
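
To make the `wrapping_cast` entry concrete, here is a minimal sketch in plain Rust of the difference between a checked and a wrapping integer cast. It assumes that the checked variant yields a missing value when the input does not fit the target type; the function names below are illustrative and are not arrow2's kernels.

```rust
use std::convert::TryFrom;

/// Checked cast (illustrative): values that do not fit in `i8` become `None`.
fn checked_cast(values: &[i32]) -> Vec<Option<i8>> {
    values.iter().map(|&v| i8::try_from(v).ok()).collect()
}

/// Wrapping cast (illustrative): values that do not fit are truncated to
/// their low 8 bits, which is what Rust's `as` does for integers.
fn wrapping_cast(values: &[i32]) -> Vec<i8> {
    values.iter().map(|&v| v as i8).collect()
}
```

For the input `[1, 127, 128, -129]`, the checked version yields `[Some(1), Some(127), None, None]`, while the wrapping version yields `[1, 127, -128, 127]`.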

**Fixed bugs:**

- Fixed error in computing remainder of chunk iterator [\#262](/~https://github.com/jorgecarleitao/arrow2/pull/262) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
- Fixed error in slicing bitmap. [\#250](/~https://github.com/jorgecarleitao/arrow2/pull/250) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
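
The changelog does not say what the bitmap slicing bug was. As background on why this area is error-prone, the sketch below shows a zero-copy view over a bit-packed bitmap in the least-significant-bit-first order that the Arrow format uses. A slice may start in the middle of a byte, and slicing an existing slice must add the new offset to the old one. `BitmapSlice` is a hypothetical type written for this note, not arrow2's `Bitmap`.

```rust
/// Reads bit `i` of an LSB-first packed bitmap.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    (bytes[i / 8] >> (i % 8)) & 1 == 1
}

/// Hypothetical zero-copy view over `length` bits starting at bit `offset`.
struct BitmapSlice<'a> {
    bytes: &'a [u8],
    offset: usize,
    length: usize,
}

impl<'a> BitmapSlice<'a> {
    fn new(bytes: &'a [u8], offset: usize, length: usize) -> Self {
        assert!(offset + length <= bytes.len() * 8);
        Self { bytes, offset, length }
    }

    /// Slices relative to this view: offsets accumulate, bytes are shared.
    fn slice(&self, offset: usize, length: usize) -> Self {
        assert!(offset + length <= self.length);
        Self { bytes: self.bytes, offset: self.offset + offset, length }
    }

    /// Reads bit `i` of the view, i.e. bit `offset + i` of the buffer.
    fn get(&self, i: usize) -> bool {
        assert!(i < self.length);
        get_bit(self.bytes, self.offset + i)
    }

    /// Unpacks the view into one `bool` per bit.
    fn to_vec(&self) -> Vec<bool> {
        (0..self.length).map(|i| self.get(i)).collect()
    }
}
```

With `bytes = [0b0101_0110, 0b0000_0011]`, `BitmapSlice::new(&bytes, 3, 8).slice(2, 4)` covers bits 5 to 8 of the buffer and crosses the byte boundary.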

**Enhancements:**

- Improve the performance in cast kernel using AsPrimitive trait in generic dispatch [\#252](/~https://github.com/jorgecarleitao/arrow2/issues/252)
- Poor performance in `sort::sort_to_indices` with limit option in arrow2 [\#245](/~https://github.com/jorgecarleitao/arrow2/issues/245)
- Support loading Feather v2 \(IPC\) files with more than 1 million tables [\#231](/~https://github.com/jorgecarleitao/arrow2/issues/231)
- Migrated to parquet2 v0.3 [\#265](/~https://github.com/jorgecarleitao/arrow2/pull/265) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
- Added more tests to cast and min/max [\#253](/~https://github.com/jorgecarleitao/arrow2/pull/253) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
- Prettytable is unmaintained. Change to comfy-table [\#251](/~https://github.com/jorgecarleitao/arrow2/pull/251) ([PsiACE](/~https://github.com/PsiACE))
- Added IndexRange to remove checks in hot loops [\#247](/~https://github.com/jorgecarleitao/arrow2/pull/247) ([jorgecarleitao](/~https://github.com/jorgecarleitao))
- Make merge\_sort\_slices MergeSortSlices public [\#243](/~https://github.com/jorgecarleitao/arrow2/pull/243) ([sundy-li](/~https://github.com/sundy-li))

**Documentation updates:**

- Added example and guide section on compute [\#242](/~https://github.com/jorgecarleitao/arrow2/pull/242) ([jorgecarleitao](/~https://github.com/jorgecarleitao))

**Closed issues:**

- Allow projection pushdown to IPC files [\#261](/~https://github.com/jorgecarleitao/arrow2/issues/261)
- Add support to write dictionary-encoded pages [\#211](/~https://github.com/jorgecarleitao/arrow2/issues/211)
- Make IpcWriteOptions easier to find. [\#120](/~https://github.com/jorgecarleitao/arrow2/issues/120)

## [v0.2.0](/~https://github.com/jorgecarleitao/arrow2/tree/v0.2.0) (2021-07-30)

[Full Changelog](/~https://github.com/jorgecarleitao/arrow2/compare/v0.1.0...v0.2.0)
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "arrow2"
-version = "0.2.0"
+version = "0.3.0"
license = "Apache-2.0"
description = "Unofficial implementation of Apache Arrow spec in safe Rust"
homepage = "/~https://github.com/jorgecarleitao/arrow2"
31 changes: 22 additions & 9 deletions README.md
@@ -52,17 +52,31 @@ venv/bin/python parquet_integration/write_parquet.py

## Features in this crate and not in the official

### Safety and Security

* safe by design (i.e. no transmutes, runtime type checking nor pointer casts)
* Uses Rust's compiler whenever possible to prove that memory reads are sound
* All non-IO components pass MIRI checks (MIRI and file systems are a bit funny atm)

### Arrow Format

* IPC supports big endian
* `MutableArray` API to work in-memory in-place.
* faster IPC reader (different design that avoids an extra copy of all data)
* IPC supports 2.0 (compression)
* FFI support for dictionary-encoded arrays

### Parquet

* Reading parquet is 10-20x faster (single core) and deserialization is parallelizable
* Writing parquet is 3-10x faster (single core) and serialization is parallelizable
* MIRI checks on non-IO components (MIRI and file systems are a bit funny atm)
* parquet IO has no `unsafe`
* IPC supports big endian
* parquet IO supports `async` read

### Others

* More predictable JSON reader
* `MutableArray` API to work with arrays in-place.
* Generalized parsing of CSV based on logical data types
* faster IPC reader (different design that avoids an extra copy of all data)
* IPC supports 2.0 (compression)

## Features in the original not available in this crate

@@ -72,12 +86,11 @@ venv/bin/python parquet_integration/write_parquet.py
## Features in this crate not in pyarrow

* Read and write of delta-encoded utf8 to and from parquet
-* parquet roundtrip of all arrow types.
+* parquet roundtrip of all supported arrow types.

-## Roadmap
+## Features in pyarrow not in this crate

-1. parquet read of nested types.
-2. bring documentation up to speed
+Too many to enumerate; e.g. nested dictionary arrays, union, map, nested parquet.

## How to develop
