Commit

update
nghiavtr committed Nov 16, 2018
1 parent d2e9b22 commit 575f353
Showing 2 changed files with 14 additions and 6 deletions.
createAnno/ex_createAnnoRData_Homo_sapiens.GRCh38.94.R: 7 changes (5 additions, 2 deletions)
@@ -8,7 +8,8 @@
##### Some remarks in Step 2 before use:
# 1) check to use the correct dataset and version from ensembl database
# 2) check to use the correct attribute containing gene paralog when using biomaRt
-# 3) Step 2.2 can be IGNORED for some species that do not contain information of hgncName, ribSubunitDb, mitoTransDb, ribonuproDb
+# 3) step 2.2 can be IGNORED for some species that do not contain information of hgncName, ribSubunitDb, mitoTransDb, ribonuproDb
+# 4) check the concordance between transcript/gene names in cdna fasta file and gtf file, see line 67 of "remove version of gene/transcript"

args = commandArgs(trailingOnly=TRUE)
cdnaFn=args[1]
@@ -63,7 +64,8 @@ geneName=unlist(lapply(txInfo, function(x) {
return(z[length(z)])
}))

-#remove version of gene/transcript
+##remove version of gene/transcript
+## This step need to be done if there are a inconsistency between transcript/gene names in cdna fasta file and gtf file. Homo_sapiens.GRCh38.94 is a typical example.
txName=unlist(lapply(txName, function(x) unlist(strsplit(x,"\\."))[1]))
geneName=unlist(lapply(geneName, function(x) unlist(strsplit(x,"\\."))[1]))
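
The two `strsplit` lines above keep only the text before the first `.` in each ID, which removes the version suffix carried by the GRCh38.94 cDNA FASTA names. A minimal vectorised sketch of the same step, assuming Ensembl-style IDs whose only dot separates a numeric version; `stripVersion` is a hypothetical helper and the IDs are illustrative:

```r
# Hypothetical helper: remove a trailing ".<digits>" version suffix,
# e.g. "ENST00000456328.2" -> "ENST00000456328"; IDs without one are unchanged.
stripVersion <- function(ids) sub("\\.[0-9]+$", "", ids)

# Illustrative IDs (two versioned transcripts, one unversioned gene)
ids <- c("ENST00000456328.2", "ENST00000450305.2", "ENSG00000223972")

# The scripts' approach: keep the text before the first "."
viaSplit <- unlist(lapply(ids, function(x) unlist(strsplit(x, "\\."))[1]))
viaSub <- stripVersion(ids)

print(viaSub)
print(identical(viaSplit, viaSub))  # TRUE
```

Unlike the `strsplit` form, `sub()` only drops a trailing numeric suffix, so an ID containing other dots would keep everything up to that suffix; for IDs shaped like the ones above, the two give the same result.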

@@ -173,3 +175,4 @@ length(geeqMap)
##### Step 4: save to file
save(geeq,eqgeMap,geeqMap,txAnno,geneAnno,geneParalog,hgncName,ribSubunitDb,mitoTransDb,ribonuproDb,file=txAnnoFn)


createAnno/ex_createAnno_TAIR10.41.R: 13 changes (9 additions, 4 deletions)
@@ -8,7 +8,10 @@
##### Some remarks in Step 2 before use:
# 1) check to use the correct dataset and version from ensembl database
# 2) check to use the correct attribute containing gene paralog when using biomaRt
-# 3) Step 2.2 can be IGNORED for some species that do not contain information of hgncName, ribSubunitDb, mitoTransDb, ribonuproDb
+# 3) step 2.2 can be IGNORED for some species that do not contain information of hgncName, ribSubunitDb, mitoTransDb, ribonuproDb
+# 4) check the concordance between transcript/gene names in cdna fasta file and gtf file, see line 67 of "remove version of gene/transcript"



args = commandArgs(trailingOnly=TRUE)
cdnaFn=args[1]
@@ -61,9 +64,10 @@ geneName=unlist(lapply(txInfo, function(x) {
}))


-#remove version of gene/transcript
-txName=unlist(lapply(txName, function(x) unlist(strsplit(x,"\\."))[1]))
-geneName=unlist(lapply(geneName, function(x) unlist(strsplit(x,"\\."))[1]))
+##remove version of gene/transcript
+## This step need to be done if there are a inconsistency between transcript/gene names in cdna fasta file and gtf file. Homo_sapiens.GRCh38.94 is a typical example.
+#txName=unlist(lapply(txName, function(x) unlist(strsplit(x,"\\."))[1]))
+#geneName=unlist(lapply(geneName, function(x) unlist(strsplit(x,"\\."))[1]))

#put them together
txAnno=cbind(txName,cdnaType,chrInfo,geneType,txType,geneName)
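
In the TAIR10.41 script the version-removal step is commented out. One plausible reading, not stated in the commit, is that TAIR-style transcript IDs such as `AT1G01020.1` use the dot suffix as an isoform number and already match between the cDNA FASTA and the GTF, so stripping it would merge isoforms. Remark 4 (checking concordance before deciding) can be sketched as below; `concordance` and `stripVersion` are hypothetical helpers and all IDs are illustrative:

```r
# Hypothetical helpers for remark 4 (concordance of FASTA vs GTF names)
stripVersion <- function(ids) sub("\\.[0-9]+$", "", ids)
# Fraction of FASTA transcript IDs that also occur among the GTF IDs
concordance <- function(fastaIds, gtfIds) mean(fastaIds %in% gtfIds)

# Ensembl-style case: versioned FASTA IDs, unversioned GTF IDs (illustrative)
humanFasta <- c("ENST00000456328.2", "ENST00000450305.2")
humanGtf <- c("ENST00000456328", "ENST00000450305")
print(concordance(humanFasta, humanGtf))                # 0
print(concordance(stripVersion(humanFasta), humanGtf))  # 1

# TAIR-style case: the suffix is an isoform number and both files agree
tairFasta <- c("AT1G01020.1", "AT1G01020.2")
tairGtf <- c("AT1G01020.1", "AT1G01020.2")
print(concordance(tairFasta, tairGtf))                  # 1
# Stripping would collapse the two isoforms into a single ID
print(length(unique(stripVersion(tairFasta))))          # 1
```

Under these assumptions, the step is worth running only when stripping raises the match rate, as in the first case.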
@@ -145,3 +149,4 @@ length(geeqMap)
save(geeq,eqgeMap,geeqMap,txAnno,geneAnno,geneParalog,file=txAnnoFn)

rm(list=ls())
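
The TAIR10.41 script saves six objects, while the GRCh38.94 script also saves the Step 2.2 objects (`hgncName`, `ribSubunitDb`, `mitoTransDb`, `ribonuproDb`), in line with remark 3. A minimal sketch of a single save step that includes the optional objects only when they exist; `saveAnno` is a hypothetical wrapper, not part of these scripts, and the demo uses empty placeholder objects:

```r
# Hypothetical wrapper (not in the original scripts): save the core annotation
# objects plus whichever optional Step 2.2 objects exist in `env`.
saveAnno <- function(file, env = parent.frame()) {
  core <- c("geeq", "eqgeMap", "geeqMap", "txAnno", "geneAnno", "geneParalog")
  optional <- c("hgncName", "ribSubunitDb", "mitoTransDb", "ribonuproDb")
  # Keep only the optional objects defined directly in `env`
  present <- optional[vapply(optional, exists, logical(1),
                             envir = env, inherits = FALSE)]
  objs <- c(core, present)
  save(list = objs, file = file, envir = env)
  invisible(objs)
}

# Usage with placeholder objects for a species without Step 2.2 data
demo <- new.env()
for (nm in c("geeq", "eqgeMap", "geeqMap", "txAnno", "geneAnno", "geneParalog")) {
  assign(nm, list(), envir = demo)
}
fn <- tempfile(fileext = ".RData")
saved <- saveAnno(fn, env = demo)
loaded <- load(fn, envir = new.env())
print(setequal(saved, loaded))  # TRUE
```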
