GCModeller: genomics CAD (Computer-Aided Design) modeller system in the .NET language
- HOME: http://gcmodeller.org
- Github: https://github.com/SMRUCC/GCModeller
- BioTools: https://bio.tools/gcmodeller
Supported platform: Microsoft Windows, GNU Linux, MAC, Microsoft Azure Cloud
Development: Microsoft VisualStudio 2019 | VisualBasic.NET
Runtime environment: sciBASIC# v2.1.5 beta & .NET Framework 4.7 (or mono 6.4)
Installation: VS2019 is required to compile this project. After the source code has been cloned with git, open the solution file /src/GCModeller.sln. Once the nuget packages have been restored, you are ready to compile the GCModeller project.
NOTE: This project uses git submodules to manage some runtime components, so please do not download the project source code from github with the Download Zip button. The built-in github client in VisualStudio is recommended for downloading the project source code.
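If you prefer the command line to the VisualStudio client, a clone that also fetches the submodules might look like the sketch below. The repository URL is the Github address listed at the top of this README. The function name clone_gcmodeller and the default target directory are hypothetical and chosen only for illustration. A POSIX shell (for example Git Bash on Windows) is assumed.

```shell
#!/bin/sh
# Sketch: fetch GCModeller together with its git submodules.
# REPO_URL is the repository named in this README; the function name
# and the default target directory are illustrative assumptions.
REPO_URL="https://github.com/SMRUCC/GCModeller.git"

clone_gcmodeller() {
    target_dir="${1:-GCModeller}"
    # --recursive also initialises and checks out every submodule
    git clone --recursive "$REPO_URL" "$target_dir" || return 1
    # safety net: a no-op when all submodules are already checked out
    git -C "$target_dir" submodule update --init --recursive
}
```

Calling clone_gcmodeller with no argument clones into ./GCModeller, after which the solution file is at GCModeller/src/GCModeller.sln.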
Dependency: Some GCModeller functions need to run Linux tools through the Darwinism Docker environment for VB.NET (if you are running GCModeller on the Windows platform). This toolkit requires the following environment to be installed on your Windows server:
- Microsoft PowerShell SDK 3.0
- Latest version of Docker for X64
- Then pull the environment container image via: docker pull xieguigang/gcmodeller-env
The docker container image contains these utilities that are required by GCModeller:
- MEME suite for motif analysis
- Mothur for constructing OTUs
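A small guard around the pull command avoids downloading the image again when it is already present locally. This is only a sketch: the function name ensure_gcmodeller_env is hypothetical, while docker image inspect and docker pull are standard Docker CLI commands and the image name is the one given above. A POSIX shell is assumed.

```shell
#!/bin/sh
# Sketch: pull the GCModeller environment image only when it is missing.
# IMAGE is the container image named in this README; the function name
# is an illustrative assumption.
IMAGE="xieguigang/gcmodeller-env"

ensure_gcmodeller_env() {
    # "docker image inspect" exits non-zero when the image is not local
    if docker image inspect "$IMAGE" >/dev/null 2>&1; then
        echo "$IMAGE is already available locally"
    else
        docker pull "$IMAGE"
    fi
}
```

Run ensure_gcmodeller_env once after installing Docker, and again whenever the local image has been removed.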
Install Database: Some features in GCModeller require the fasta sequence database to be installed at a specific location on your server's filesystem. Please follow this instruction to install the database for GCModeller.
GCModeller is an open source cloud computing platform for geneticists and systems biology. You can easily build a local computing server cluster for GCModeller to analyse large amounts of biological data.
The GCModeller platform was originally written in the VisualBasic.NET language. It includes a bioinformatics analysis environment for hybrid programming between .NET languages and the R language, whose SDK is available at the repository: https://github.com/SMRUCC/R.Bioinformatics
Currently the R language hybrid programming environment only provides some bioconductor APIs for the analysis in GCModeller.
GCModeller is a set of utility tools for annotating the whole cell system, including whole genome regulation annotation, transcriptome analysis toolkits, and metabolism pathway analysis toolkits. Utility tools for common bioinformatics problems and I/O tools for common biological databases are also available in GCModeller for .NET language programming.
- /GCModeller : The location of the GCModeller compile output; all project outputs are configured to the path ./GCModeller/bin/
- /src
- /src/GCModeller : GCModeller basic library and analysis protocols
- /src/interops : GCModeller tools that dependent on the external programs
- /src/R.Bioconductor : R language hybrid programming environment
- /src/R-sharp : The GCModeller R# language scripting engine
- /src/repository : GCModeller data repository system
- /src/runtime : Third-party libraries and VisualBasic runtime source code
- /tools
- GCModeller supports the SBML and BIOM data standards for exchanging analysis and model data with other bioinformatics software.
- Supports PSI data for biological interaction network models
- Supports OBO data for ontology databases like GO.
GCModeller provides a set of .NET libraries and CLI tools for processing biological analysis data. Currently GCModeller provides these production-ready libraries:
- NCBI data analysis toolkit: Genbank/Taxonomy/nt/nr database
- Common Data: FastA database, FastQ, SAM data file I/O class
- Biological Data Standard Supports: SBML(level 3), BIOM(level1), PSI, OBO
- Biological Pathway Database: MetaCyc, Reactome, KEGG data tools for .NET language
- Circos API(genomic visualizing)
- Cytoscape DataModel API(Biological network visualizing)
- SequenceLogo(Molecular motif site visualize)
- KEGG pathway map visualizer
- A complete NCBI localblast toolkit for proteins and nucleotide sequence analysis, includes parallel task library for Win/Linux Server and data analysis protocol.
- SNP toolkit
- Nucleotide sequence topology feature site analysis toolkit.
- RegPrecise database tool and MEME software toolkit for the annotation of bacterial genomics regulation network.
- GO (Gene Ontology) annotation tools
- KEGG/GO GSEA functional enrichment tools and reference genome background model creator based on UniProt database.
- Includes a basic R language API wrapper for VisualBasic, such as the APIs in the base, utils, and stats namespaces from R base.
- Some R package wrapper APIs from CRAN and Bioconductor are also included.
- GCModeller R# language scripting
- Cellular module simulator, and virtual cell model generator protocol.
- Proteomics data analysis toolkit
- Single-cell data analysis toolkit
Here is a code snapshot of R# scripting for drawing a sequence logo. The input data is accepted from the commandline input:
# Demo script for create sequence logo based on the MSA alignment analysis
# nt base frequency is created based on the MSA alignment operation.
imports "bioseq.sequenceLogo" from "seqtoolkit.dll";
imports "bioseq.fasta" from "seqtoolkit.dll";
# script cli usage
#
# R# sequenceLogo.R --seq input.fasta [--title <logo.title> --save output.png]
#
# get input data from commandline arguments and
# fix for the optional arguments default value
# by apply or default syntax for non-logical values
let seq.fasta as string = ?"--seq" || stop("No sequence input data for draw sequence logo!");
let logo.png as string = ?"--save" || `${seq.fasta}.logo.png`;
let title as string = ?"--title" || basename(seq.fasta);
# read sequence and then do MSA alignment
# finally count the nucleotide base frequency
# and then draw the sequence logo
# by invoke sequence logo drawer api
seq.fasta
:> read.fasta
:> MSA.of
:> plot.seqLogo(title)
:> save.graphics( file = logo.png );
Run the R# script from commandline:
@echo off
R# ./sequenceLogo.R --seq LexA.fasta --save LexA.png --title "LexA"
Listed here are the scientific papers that are based on the analysis services of GCModeller:
- Niu, X.-N., et al. (2015). "Complete sequence and detailed analysis of the first indigenous plasmid from Xanthomonas oryzae pv. oryzicola." BMC Microbiol 15(1): 1-15.
- DOI: 10.1186/s12866-015-0562-x
Bacterial plasmids have a major impact on metabolic function and adaptation of their hosts. An indigenous plasmid was identified in a Chinese isolate (GX01) of the invasive phytopathogen Xanthomonas oryzae pv. oryzicola (Xoc), the causal agent of rice bacterial leaf streak (BLS). To elucidate the biological functions of the plasmid, we have sequenced and comprehensively annotated the plasmid.
Single-cell data toolkit included in GCModeller:
Visit our project home: http://gcmodeller.org
Some released libraries of GCModeller are published on nuget, so you can install these libraries in VisualStudio from the Package Manager Console:
# Install Microsoft VisualBasic sciBASIC# runtime via nuget:
# https://github.com/xieguigang/sciBASIC/
PM> Install-Package sciBASIC -Pre
# The GCModeller core base library was released:
# https://github.com/SMRUCC/GCModeller.Core
PM> Install-Package GCModeller.Core
# The NCBI localblast analysis toolkit:
# https://github.com/SMRUCC/ncbi-localblast
PM> Install-Package NCBI_localblast
The GCModeller demo scripts and data for user tutorials can be downloaded from these public data repositories:
Copyleft © SMRUCC genomics 2016. All rights reversed.