name is not defined in paftools.js gff2bed #422

Open
johnomics opened this issue Jun 11, 2019 · 6 comments

Comments

@johnomics

johnomics commented Jun 11, 2019

Thank you for all your excellent work on minimap2, we use it every day.

I'm trying to convert the NCBI GRCh38 RefSeq annotation to BED format with paftools.js gff2bed, for use when aligning with minimap2. As per your advice, I'm using the GRCh38 no_alt_analysis_set, and have downloaded the full_analysis_set GFF and GTF from the same folder:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
ftp://ftp.ncbi.nlm.nih.gov//genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gff.gz
ftp://ftp.ncbi.nlm.nih.gov//genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf.gz

I get the following error when running gff2bed on either the GTF or the GFF (minimap2 v2.17 release):

$ paftools.js gff2bed -j GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf
/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:1593: ReferenceError: name is not defined
			exons.push([t[0], t[3], t[4], t[6], id, type, name, tname]);
                                                 ^
ReferenceError: name is not defined
    at paf_gff2bed (/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:1593:50)
    at main (/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:2517:29)
    at /mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:2534:1

The name variable used on line 1593 is assigned in the if statements on lines 1567 and 1574, but it is never declared or initialised; instead, a gname variable is initialised on line 1562 but does not appear to be used.
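A minimal sketch of the kind of fix this points to, written as a hypothetical helper: exonRecord is an illustrative name, not a paftools.js function, and the two regexes are assumptions standing in for the branches on lines 1567 and 1574. Assuming paftools.js runs in non-strict mode, assigning to an undeclared name silently creates a global, but reading one that was never assigned throws exactly this ReferenceError, so the crash appears as soon as a record matches neither branch. Declaring one variable before the branches removes that failure mode.

```javascript
// Hypothetical helper, not the paftools.js source. t holds the nine
// tab-separated GTF/GFF3 columns; id, type and tname come from the caller.
function exonRecord(t, id, type, tname) {
	var gname = null; // declared and initialised before any branch, so it is always readable
	var m;
	if ((m = /\bgene_name "([^"]+)"/.exec(t[8])) != null) gname = m[1]; // GTF: gene_name "X";
	else if ((m = /\bgene_name=([^;]+)/.exec(t[8])) != null) gname = m[1]; // GFF3: gene_name=X
	// Assumed fallback: an unset gene name becomes "N/A" instead of throwing.
	return [t[0], t[3], t[4], t[6], id, type, gname != null ? gname : "N/A", tname];
}
```

With this shape, a record without gene_name yields "N/A" in the name slot rather than aborting the whole conversion.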

If I rename name to gname, the command runs, but every gene name comes out as N/A: the NCBI annotations have gene_id and gene attributes, but no gene_name. However, changing gene_name to gene_id or gene, or adding extra else-if branches that check for gene_id or gene, doesn't work either.
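One way to get NCBI gene names out of a GTF is to parse the whole attribute column into a map and then try several keys in order. The sketch below is an illustration under that assumption, not how paftools.js parses attributes; parseGtfAttrs and geneLabel are hypothetical names, and the fallback order gene_name, then gene, then gene_id is a design choice, not something the source specifies.

```javascript
// Hypothetical sketch: parse a GTF attribute column of the form
// `key "value"; key "value";` into an object without a prototype.
function parseGtfAttrs(col) {
	const attrs = Object.create(null);
	const re = /(\S+)\s+"([^"]*)"\s*;?/g;
	let m;
	while ((m = re.exec(col)) !== null)
		if (!(m[1] in attrs)) attrs[m[1]] = m[2]; // keep the first value of a repeated key
	return attrs;
}

// Assumed fallback order: gene_name (GENCODE/Ensembl), gene (NCBI), gene_id, then "N/A".
function geneLabel(attrs) {
	for (const key of ["gene_name", "gene", "gene_id"])
		if (attrs[key] != null && attrs[key] !== "") return attrs[key];
	return "N/A";
}
```

Parsing all keys once, rather than adding one regex per attribute name, keeps the fallback list in a single place that is easy to extend.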

Please could you look into this? Should I be using a different annotation? Or is there a fix that will include the NCBI gene names? Many thanks.

@lh3
Owner

lh3 commented Jun 11, 2019

Please try the latest paftools. It should have resolved the issue.

@lh3 lh3 closed this as completed Jun 11, 2019
@johnomics
Author

Thanks for the quick response. This works for the GTF, so I can continue with that, but just to let you know, it doesn't work with the GFF (maybe a separate issue?):

$ paftools.js gff2bed -j GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gff
chr1	12227	12612	NR_046018.2|misc_RNA|N/A	1000	+
chr1	12721	13220	NR_046018.2|misc_RNA|N/A	1000	+
chr1	14829	14969	NR_024540.1|misc_RNA|N/A	1000	-
chr1	15038	15795	NR_024540.1|misc_RNA|N/A	1000	-
chr1	15947	16606	NR_024540.1|misc_RNA|N/A	1000	-
chr1	16765	16857	NR_024540.1|misc_RNA|N/A	1000	-
chr1	17055	17232	NR_024540.1|misc_RNA|N/A	1000	-
chr1	17368	17605	NR_024540.1|misc_RNA|N/A	1000	-
chr1	17742	17914	NR_024540.1|misc_RNA|N/A	1000	-
chr1	18061	18267	NR_024540.1|misc_RNA|N/A	1000	-
chr1	18366	24737	NR_024540.1|misc_RNA|N/A	1000	-
chr1	24891	29320	NR_024540.1|misc_RNA|N/A	1000	-
/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:1578: Error: No transcript_id
		if (id == null) throw Error("No transcript_id");
                        ^
Error: No transcript_id
    at Error (<anonymous>)
    at paf_gff2bed (/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:1578:25)
    at main (/mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:2518:29)
    at /mnt/lustre/groups/biol-tf-2018/software/miniconda3/bin/paftools.js:2535:1
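The rows printed before the error appear to be intron (splice-junction) intervals in BED coordinates, one row per intron, named transcript|type|gene name. As a sketch of how such rows relate to exon records, assuming exons come as 1-based inclusive [start, end] pairs from GTF/GFF columns 4 and 5, all belonging to one transcript (intronsToBed is a hypothetical name):

```javascript
// Hypothetical sketch: exons are 1-based, inclusive [start, end] pairs of one transcript.
// Returns introns as 0-based, half-open BED [start, end) intervals.
function intronsToBed(exons) {
	const sorted = exons.slice().sort((a, b) => a[0] - b[0]); // do not mutate the input
	const introns = [];
	for (let i = 1; i < sorted.length; ++i) {
		// A 1-based inclusive exon end equals the intron's 0-based start;
		// the next exon's 1-based start minus 1 is the intron's 0-based exclusive end.
		introns.push([sorted[i - 1][1], sorted[i][0] - 1]);
	}
	return introns;
}
```

For example, exons [100, 200] and [301, 400] give the single BED intron [200, 300].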

@lh3
Owner

lh3 commented Jun 11, 2019

Then use the GTF. I think the NCBI GFF3 is more or less problematic, and is inconsistent with the corresponding GTF. The GENCODE/Ensembl GTF and GFF3 carry pretty much the same information.
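Part of the mismatch is syntactic: GTF attributes are written as key "value"; pairs, while GFF3 uses key=value pairs and links an exon to its transcript through the Parent attribute. The "No transcript_id" error above comes from a GFF3 record with no transcript_id attribute. The sketch below reads GFF3 attributes and, as a hypothetical fallback that paftools.js does not implement, uses the first Parent ID when transcript_id is absent. Note that Parent holds the parent feature's ID, which in NCBI files need not equal the RefSeq accession; parseGff3Attrs and transcriptIdOf are illustrative names.

```javascript
// Hypothetical sketch: parse a GFF3 attribute column (`key=value;key=value`).
// Values are left percent-encoded exactly as they appear in the file.
function parseGff3Attrs(col) {
	const attrs = Object.create(null);
	for (const field of col.split(";")) {
		const eq = field.indexOf("=");
		if (eq <= 0) continue; // skip empty or malformed fields
		attrs[field.slice(0, eq).trim()] = field.slice(eq + 1);
	}
	return attrs;
}

// Prefer transcript_id; otherwise fall back to the first comma-separated Parent ID.
function transcriptIdOf(attrs) {
	if (attrs.transcript_id != null) return attrs.transcript_id;
	if (attrs.Parent != null) return attrs.Parent.split(",")[0];
	return null; // caller decides whether to skip the record or report an error
}
```

Returning null instead of throwing lets a converter skip or log the odd record rather than stopping partway through the file, as happened above.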

@lh3
Owner

lh3 commented Jun 11, 2019

I am reopening this issue in case I come back to it and make further improvements for NCBI GFF3.

@lh3 lh3 reopened this Jun 11, 2019
@niehu2018

> Please try the latest paftools. It should have resolved the issue.

I found that the Ensembl GTFs for human and mouse all have both gene_id and gene_name, but for other species (GFF from Ensembl) some genes have a gene_id attribute but no gene_name attribute.
How did you fix this problem? Do you just ignore genes that have a gene_id attribute but no gene_name attribute in the BAM file, or do you use gene_id (or something else) instead of gene_name?

@akshayMpatel

I am still getting the original "...ReferenceError: name is not defined..." error shown above with minimap2 2.17-r941 (which I assume includes the latest paftools.js). I'm trying to use the --junc-bed option and only have the GTF.


4 participants