Arkimet

Introduction

Arkimet is a set of tools to archive, dispatch and distribute data files.

Features

General

All arkimet functionality besides metadata extraction and dataset recovery is file format agnostic
Support for new metadata types, match functions and file formats can be added with plugins
Data is treated like an opaque, read only binary string, never modified to guarantee integrity
Archive data files are only accessed using append operations, to avoid the risk of accidentally corrupting existing data
Metadata is archived in a compact, binary representation to save space, but it can be handled in easy to read, easy to parse, standard formats.

Metadata

The extraction of metadata is flexible and customisable, and for generic formats like GRIB or HDF5 it can be scripted using Python
Metadata is stored in a compact binary representation to save space, but can be converted to standard formats when required
Metadata can have timestamped annotations to track its workflow
Metadata can be summarised to allow to know what data can be found in a big dataset without scanning its contents.

Remote access

Remote data access is provided through arki-server, an HTTP server application
arki-server can serve data from local datasets, as well as from remote datasets served by other arki-server instances (this allows, for example, to provide a single arki-server external front-end to various internal arki-servers in an organisation)
arki-server can be run behind apache mod-proxy to provide encrypted (SSL) or authenticated access
Client data access is done using libCURL, and can access the server over SSL or HTTP proxies
Summary of query results can be obtained quickly before transfering the result data]]
Postprocessing chains can be provided by the server to transfer only the postprocessed data (e.g. an average value instead of a big grid)

User interfaces

A powerful and flexible suite of commandline tools allows to easily integrate arkimet into automated data processing chains.
arki-server not only allows remote access to the datasets, but it also provides a low-level, web-based query interface
ArkiMEOW is a web-based front-end to arkimet that provides simple and powerful browsing and data retrieval for end users.

Factsheet

The metadata currently supported are:

Reference time
Origin
Product
Level
Time Range
Area
Product definition information

The formats used to handle metadata are:

Arkimet-specific compact binary representation
Human-readable, easy to parse YAML representation http://www.yaml.org/
XML/RDF can be implemented as soon as a suitable standard is defined

The file formats that can be scanned are:

WMO GRIB (edition 1 and edition 2), any template can be supported via eccodes and Python scripts
WMO BUFR (edition 3 and edition 4), via DB-All.e
More can easily be added via plugins

The match expressions currently available are:

Exact or partial match on any types of metadata

Advanced matching of reference times:

 # Open or close intervals
 reftime:>=2005
 reftime:>=2005,<2007
 # Exact years, month, days, hours, etc.
 reftime:=2007
 reftime:=2007-05
 # Time extremes can vary in precision
 reftime:>=2005-06-01 12:00
 # Also interval of time of day can be specified
 reftime:>=12:00,<18:00
 # And repetitions during the day
 reftime:>=2007-06%6h
 reftime:=2007,>=12:00/30m,<18:00

Configurable aliases of subexpressions for easy recall via mnemonic names

Licensing and acquisition

arkimet is Free Software, licensed under the GNU General Public License version 2 or later.
the source is available at http://www.arpa.emr.it/sim/?arkimet

Software dependencies

arkimet is written in C++, and it regularly compiles and passes the test suite using GCC version 3.3.3, 4.1.2 and 4.2.3.
It should run on any reasonable Unix system, and has been tested on Debian, RedHat and Suse Linux systems.
Python version 3.6 or later is required for the customisable logic and GRIB import
ECMWF grib_api is required for GRIB (edition 1 and 2) support http://www.ecmwf.int/products/data/software/grib_api.html
SQLite is optionally required for indexed datasets http://www.sqlite.org/
CURL is optionally required to access remote datasets http://curl.haxx.se/
python and python-cherrypy are optionally required to serve datasets over HTTP http://www.python.org/ http://www.cherrypy.org/
DB-All.e version 4.0 or later is optionally required to enable BUFR support http://www.arpa.emr.it/dettaglio_documento.asp?id=514&idlivello=64

Examples

To extract metadata:

$ arki-scan --dump --annotate file.grib

To create a summary of the content of many GRIB files:

$ arki-scan --summary *.grib > summary

To view the contents of the generated summary file:

$ arki-dump --yaml summary

To select GRIB messages from a file:

$ arki-scan file.grib | arki-grep --data 'origin:GRIB1,98;reftime:>=2008' > file1.grib

To dispatch grib messages into datasets:

$ arki-mergeconf datasets/* > allowed-datasets
$ arki-scan --dispatch=allowed-datasets file.grib > dispatched.md

To query:

$ arki-query 'origin:GRIB1,98;reftime:>=2008' datasets/* --data

To get a preview of the results without performing the query:

$ arki-query 'origin:GRIB1,98;reftime:>=2008' datasets/* --summary --yaml --annotate

To export some datasets for remote access:

$ arki-mergeconf datasets/* > allowed-datasets
$ arki-server --port=8080 allowed-datasets

To query a remote dataset:

$ arki-mergeconf http://host.name > allowed-datasets
$ arki-query 'origin:GRIB1,98;reftime:>=2008' allowed-datasets --data

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

INTRO.mdwn

INTRO.mdwn

Arkimet

Introduction

Features

General

Metadata

Remote access

Archive

User interfaces

Factsheet

Examples

Files

INTRO.mdwn

Latest commit

History

INTRO.mdwn

File metadata and controls

Arkimet

Introduction

Features

General

Metadata

Remote access

Archive

User interfaces

Factsheet

Examples