multicsv is a Python library for handling multi-CSV format
files. It provides an interface for reading, writing, and manipulating
sections of a CSV file as individual text file objects.
- Efficient Section Management: Read and write multiple independent sections within a single CSV file.
- TextIO Interface: Sections are treated as TextIO objects, enabling familiar file operations.
- Flexible Operations: Supports reading, writing, iterating, and deleting sections.
- Context Management: Ensures resource safety by supporting the with statement.
- Integrated Testing: Includes comprehensive unit tests, covering 100% of the functionality.
The multi-CSV format is an extension of the traditional CSV
(Comma-Separated Values) format that supports dividing a single file
into multiple independent sections. Each section is demarcated by a
header enclosed in square brackets, e.g., [section_name].
This format is best known from Illumina MiSeq sample sheet files.
Conceptually, the format makes it possible to store a whole SQL database, one table per section, in a single human-readable file.
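To make the database analogy concrete, here is a minimal sketch that serializes every table of an in-memory SQLite database as one bracketed section. It uses only the standard library and does not touch multicsv; the function name dump_database and the samples and runs tables are hypothetical, chosen purely for illustration.

```python
import csv
import io
import sqlite3


def dump_database(conn: sqlite3.Connection) -> str:
    """Serialize every table as one [section] of multi-CSV text (illustrative only)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Section header: the table name in square brackets.
        out.write(f"[{table}]\n")
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        # First row of the section: the column names.
        writer.writerow([column[0] for column in cursor.description])
        # Remaining rows: the table data.
        writer.writerows(cursor)
    return out.getvalue()


# Hypothetical two-table database used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE runs (id INTEGER, sample_id INTEGER)")
conn.execute("INSERT INTO samples VALUES (1, 'alpha')")
conn.execute("INSERT INTO runs VALUES (10, 1)")

multi_csv_text = dump_database(conn)
print(multi_csv_text)
```

Each table becomes a section whose first row holds the column names. Column types and constraints are not preserved, so the analogy covers the data, not the schema.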
Here's a simplified example of a multi-CSV file:
[section1]
header1,header2,header3
value1,value2,value3
[section2]
headerA,headerB,headerC
valueA,valueB,valueC
In the example above, the file contains two sections: section1 and section2. Each section has its own headers and rows of data.
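To illustrate the structure of the format, the following sketch splits multi-CSV text into sections using only the standard library. It is not how multicsv is implemented; the helper split_sections is hypothetical, and it assumes each header sits alone on its own line with nothing after the closing bracket.

```python
import csv
import re

# Assumption: a header line is exactly "[name]" with optional trailing whitespace.
SECTION_HEADER = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$")


def split_sections(text: str) -> dict[str, list[list[str]]]:
    """Return a dict mapping each section name to its parsed CSV rows."""
    raw: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines(keepends=True):
        match = SECTION_HEADER.match(line)
        if match:
            # A new header starts a new, initially empty section.
            current = match.group("name")
            raw[current] = []
        elif current is not None:
            # Any other line belongs to the most recent section.
            raw[current].append(line)
    # Parse the lines of each section independently with the csv module.
    return {name: list(csv.reader(lines)) for name, lines in raw.items()}


sample = (
    "[section1]\n"
    "header1,header2,header3\n"
    "value1,value2,value3\n"
    "[section2]\n"
    "headerA,headerB,headerC\n"
    "valueA,valueB,valueC\n"
)
parsed = split_sections(sample)
print(parsed["section2"])
```

Lines that appear before the first header are ignored in this sketch. Real-world files may need more lenient header matching.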
Here's a quick example of how to use the multicsv library:
import csv
import multicsv
with multicsv.open('example.csv', mode='w+') as csv_file:
    # Write the CSV content to the file
    csv_file.section('section1').write("header1,header2,header3\nvalue1,value2,value3\n")
    csv_file.section('section2').write("header4,header5,header6\nvalue4,value5,value6\n")
    # Read a section using the csv module
    csv_reader = csv.reader(csv_file['section1'])
    assert list(csv_reader) == [['header1', 'header2', 'header3'],
                                ['value1', 'value2', 'value3']]
multicsv exports only two functions: open and wrap.
This is how the latter is meant to be used:
import io
import multicsv
# Initialize the MultiCSVFile with a base CSV string
csv_content = io.StringIO("""\
[section1]
a,b,c
1,2,3
[section2]
d,e,f
4,5,6
""")
csv_file = multicsv.wrap(csv_content)
# Accessing a section
section1 = csv_file["section1"]
print(section1.read()) # Outputs: "a,b,c\n1,2,3\n"
# Adding a new section
new_section = io.StringIO("g,h,i\n7,8,9\n")
csv_file["section3"] = new_section
csv_file.flush()
# Verify the new section is added
csv_content.seek(0)
print(csv_content.read())
# Outputs:
# [section1]
# a,b,c
# 1,2,3
# [section2]
# d,e,f
# 4,5,6
# [section3]
# g,h,i
# 7,8,9
Both exported functions return a MultiCSVFile object.
Objects of that class are MutableMappings from section names (str) to section contents (TextIO).
For instance, because iterating over a mapping yields its keys, this is how to print the name of every section in a multi-CSV file:
import multicsv
for section in multicsv.open("example.csv"):
    print(section)
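To show what a MutableMapping from section names to TextIO objects offers in practice, here is a toy stand-in built on collections.abc.MutableMapping. The class InMemorySections is hypothetical and is not the library's MultiCSVFile; it only demonstrates the operations that the abstract base class guarantees, which a MultiCSVFile should therefore also support.

```python
import io
from collections.abc import Iterator, MutableMapping
from typing import TextIO


class InMemorySections(MutableMapping):
    """Toy stand-in: maps section names (str) to section contents (TextIO)."""

    def __init__(self) -> None:
        self._sections: dict[str, TextIO] = {}

    def __getitem__(self, name: str) -> TextIO:
        return self._sections[name]

    def __setitem__(self, name: str, content: TextIO) -> None:
        self._sections[name] = content

    def __delitem__(self, name: str) -> None:
        del self._sections[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._sections)

    def __len__(self) -> int:
        return len(self._sections)


sections = InMemorySections()
sections["section1"] = io.StringIO("a,b,c\n1,2,3\n")
sections["section2"] = io.StringIO("d,e,f\n4,5,6\n")

names = list(sections)                  # iteration yields section names
has_section2 = "section2" in sections   # membership test comes from the ABC
contents = {name: stream.read() for name, stream in sections.items()}
del sections["section1"]                # deleting a section
remaining = list(sections)
print(names, remaining)
```

Only the five special methods are written by hand; keys(), items(), the in operator, pop() and update() are supplied by MutableMapping itself.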
Install the library using pip:
pip install multicsv
Set up your environment for development as follows:
- Clone the repository:
  git clone https://github.com/cfe-lab/multicsv.git
- Navigate to the project directory:
  cd multicsv
- Create and activate a virtual environment:
  python3 -m venv venv
  source venv/bin/activate
- Install dependencies:
  pip install -e .[dev,test]
Run the test suite to ensure everything is functioning correctly:
pytest
Contributions are welcome! Please follow these steps for contributions:
- Fork the repository.
- Create a new branch with a descriptive name.
- Make your changes and ensure the test suite passes.
- Open a pull request with a clear description of what you've done.
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.