Bioinformatics is an interdisciplinary field that uses computer science to analyze biological data. BioSySNet is a new package I have created for bioinformatics and computational analysis, built in C#. It can analyze biological data such as DNA, RNA, and protein sequences, and perform tasks such as biological database scraping. Additionally, 'GEO Analysis' offers a variety of analyses for RNA-seq preprocessing, and the package parses various biological file formats (FASTA, PDB, SOFT, FASTQ, etc.). BioSySNet also uses the Python interpreter to run certain plotting functions (it executes Python functions through IronPython), so first make sure that Python is installed on your machine.
- Scrape useful information from various databases such as DrugBank, UniProt, and PDB.
- Download and parse some of the important biological file formats such as .pdb, .fasta, .fastq, .soft, and CSV.
- Built-in statistical functions for differential expression analysis and pre-processing.
- Built-in CSV reader and manipulation.
- Graphs for FASTQ quality analysis, GC plots, and enrichment analysis using Python.
- Array computation routines and matrix calculations.
- Built-in data structures for data science and statistics.
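To give a feel for the kind of computation behind the FASTA parsing and GC plots mentioned in the list above, here is a minimal Python sketch. It is not BioSySNet's API: the function names `parse_fasta` and `gc_content` and the example records are hypothetical, and the package's own C# classes may work differently.

```python
def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A '>' line starts a new record; the rest of the line is its header.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are collected until the next header.
            records[header].append(line.upper())
    return {name: "".join(chunks) for name, chunks in records.items()}


def gc_content(sequence):
    """Fraction of G and C among A/C/G/T/U bases; 0.0 for an empty sequence."""
    counted = [base for base in sequence.upper() if base in "ACGTU"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


# Hypothetical example data, made up for illustration only.
EXAMPLE = """>seq1 hypothetical example
ATGCGC
GGAT
>seq2
AAAA
"""

for name, seq in parse_fasta(EXAMPLE).items():
    print(name, len(seq), round(gc_content(seq), 2))
```

Ambiguous bases such as N are left out of the denominator here; that is one possible convention, chosen only to keep the sketch short.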
BioSySNet requires Python on your machine for plotting and for some of its other functionality. First, download and install Python 3.x. Then download a supported version of BioSySNet and configure it to use your Python interpreter. The following packages are required:
- HtmlAgilityPack >= 1.11.x
- IronPython >= 3.4.1
- MathNet.Numerics >= 4.9.0
- Matplotlib >= 3.6.0
- beautifulsoup4 == 4.11.0
- seaborn >= 0.12.1
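The last three entries in the list above are Python packages. Before configuring the interpreter, it can help to confirm that they are installed in the expected versions. The following is a small sketch using only the Python standard library; the helper names (`parse_version`, `installed_version`, `check_requirements`) are hypothetical and are not part of BioSySNet, and the version parsing is deliberately simplistic (numeric parts only).

```python
from importlib import metadata

# Python-side requirements as listed in this README.
REQUIREMENTS = {
    "matplotlib": (">=", "3.6.0"),
    "beautifulsoup4": ("==", "4.11.0"),
    "seaborn": (">=", "0.12.1"),
}


def parse_version(text):
    """Turn '3.6.0' into (3, 6, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '3.6' and '3.6.0' compare as equal.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_version(name):
    """Return the installed version string of a package, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(requirements, lookup=installed_version):
    """Return {package: problem description} for unmet requirements."""
    problems = {}
    for name, (op, wanted) in requirements.items():
        found = lookup(name)
        if found is None:
            problems[name] = "not installed"
        elif op == ">=" and parse_version(found) < parse_version(wanted):
            problems[name] = f"found {found}, need >= {wanted}"
        elif op == "==" and parse_version(found) != parse_version(wanted):
            problems[name] = f"found {found}, need == {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in check_requirements(REQUIREMENTS).items():
        print(f"{name}: {problem}")
```

Running the script with the interpreter you plan to use prints one line per unmet requirement and nothing when all three packages satisfy the listed versions.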
You can install the package with the NuGet Package Manager:
PM> NuGet\Install-Package BioSySNet -Version x.y.z
Alternatively, it can be installed from the command line using the dotnet CLI:
> dotnet add package BioSySNet --version x.y.z
Here, x.y.z is the desired version, e.g. -Version 0.0.4 or --version 0.0.4 (the latest).
Changes in this version:
- The class csvReader has been renamed to CSVArray.
- The readCSV function of the CSVArray class (formerly known as csvReader) has been renamed to readNumericCSV in this version. It reads the CSV file as numeric values only.
- A new class called ImportData has been added, which allows importing or reading datasets from CSV, tab-delimited, or other delimited files. Additionally, a class called DataFrame has been introduced for dataset handling and manipulation.
- The TrainTest_Split function now takes an array of double type as arguments and returns a floating point value.
- ReadSoftFile has been renamed to ReadMetaDataBySOFT in the BioFormats class; it is responsible for reading metadata in the SOFT file format.
- Lastly, the ArrayExpression function in the BioFormats class enables access to microarray data from the Gene Expression Omnibus database.
Go through the Medium Link:
In the next part, we will discuss the classes for statistical analysis of biological data (the GEOAnalysis, GEOCSVTable, Statistics, and descriptiveAnalysis classes of BioSySNet) and for scraping information from DrugBank (the drugScraping class of BioSySNet).
Note: the features that rely on Python functions can sometimes be faulty and cause errors!