This library provides integrated MIMIC-III with discharge summary provenance of data annotations and Pythonic classes. This is a package meant for other researchers based on the paper Hospital Discharge Summarization Data Provenance.
See the full documentation. The API reference is also available.
Both the package and database must be installed.
The library can be installed with pip from the pypi repository:
pip3 install --use-deprecated=legacy-resolver zensols.dsprov
The --use-deprecated=legacy-resolver
is needed for the spaCy 3.2 dependency.
Python version 3.10 must also be used since it depends on PyTorch 1.13 (at
least for the binary packages).
The MIMIC-III database must be installed and configured as documented in the
mimic package's configuration, which is necessary to render the EHR data used
by the annotations. However, instead of using ~/.mimicrc
, the file must be
called ~/.dsprovrc
if not explicitly set with the --config
option.. The
mimic package's configuration section also provides instructions on how to
install a MIMIC-III database via PostgreSQL, SQLite or in a Docker container.
For example, to store cached files in ~/.dsprov/cache
using a the SQLite
MIMIC-III database file ~/.dsprov/cache/mimic3.sqlite3
, your ~/.dsprovrc
would be:
[default]
# the directory where cached data is stored
data_dir = ~/.dsprov/cache
[mimic_sqlite_conn_manager]
# location of the MIMIC-III SQLite database file
db_file = path: ~/.dsprov/cache/mimic3.sqlite3
The package includes a command line interface, which is probably most useful by dumping selected admission annotations.
# help
$ dsprov --help
# get two admission IDs (hadm_id)
$ dsprov ids -l 2
# print out two admissions
$ dsprov show -l 2
# print out admissions 139676
$ dsprov show -d 139676
# output the JSON of two admissions with indent 4
$ dsprov show -i 4 -f json -d $(dsprov ids -l 2 | awk '{print $1}' | paste -s -d, -)
The package can be used directly in your research to provide Python object oriented access to the annotations:
>>> from zensols.nlp import FeatureDocument
>>> from zensols.dsprov import ApplicationFactory, AdmissionMatch
>>> stash = ApplicationFactory.get_stash()
>>> am: AdmissionMatch = next(iter(stash.values()))
>>> doc: FeatureDocument = am.note_matches[0].discharge_summary.note.doc
>>> print(f'hadm: {am.hadm_id}')
>>> print(f'sentences: {len(doc.sents)}')
>>> print(f'tokens: {doc.token_len}')
>>> print(f'entities: {doc.entities}')
hadm: 120334
sentences: 46
tokens: 1039
entities: (<Admission>, <Date>, <Discharge>, <Date>, <Date of Birth>, <Sex>, ...)
A docker image is available as well.
To use the docker image, do the following:
- Create (or obtain) the Postgres docker image
- Clone this repository
git clone --recurse-submodules /~https://github.com/plandes/dsprov
- Set the working directory to the repo:
cd dsprov
- Copy the configuration from the installed mimicdb image
configuration:
make -C docker/mimicdb SRC_DIR=<cloned mimicdb directory> cpconfig
- Start the container:
make -C docker/app up
- Test sectioning a document:
make -C docker/app testdumpsec
- Log in to the container:
make -C docker/app devlogin
If you use this project in your research please use the following BibTeX entry:
@inproceedings{landesHospitalDischargeSummarization2023,
title = {Hospital {{Discharge Summarization Data Provenance}}},
booktitle = {The 22nd {{Workshop}} on {{Biomedical Natural Language Processing}} and {{BioNLP Shared Tasks}}},
author = {Landes, Paul and Chaise, Aaron and Patel, Kunal and Huang, Sean and Di Eugenio, Barbara},
date = {2023-07},
pages = {439--448},
publisher = {{Association for Computational Linguistics}},
location = {{Toronto, Canada}},
url = {https://aclanthology.org/2023.bionlp-1.41},
urldate = {2023-07-10},
eventtitle = {{{BioNLP}} 2023}
}
Also please cite the Zensols Framework:
@article{Landes_DiEugenio_Caragea_2021,
title={DeepZensols: Deep Natural Language Processing Framework},
url={http://arxiv.org/abs/2109.03383},
note={arXiv: 2109.03383},
journal={arXiv:2109.03383 [cs]},
author={Landes, Paul and Di Eugenio, Barbara and Caragea, Cornelia},
year={2021},
month={Sep}
}
An extensive changelog is available here.
Copyright (c) 2023 - 2024 Paul Landes