Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ref calls not printed with --print_ref_calls when --vcf_fn=FILE is specified #261

Closed
gtollefson opened this issue Jan 11, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@gtollefson
Copy link

gtollefson commented Jan 11, 2024

Hi,

When I run the following clair3 command with --vcf_fn and --print_ref_calls provided, reference records report the genotype as ./. and don't report the other format fields (DP, etc.)

singularity exec \
			-B {params.INPUT_DIR} \
		  	-B {params.OUTPUT_DIR} \
		  	-B {params.REF_DIR} \
			-B {params.RESOURCE_DIR} \
		  	{input.sif_path} \
		  	/opt/bin/run_clair3.sh \
		  	--bam_fn={input.samples_to_run} \
		  	--ref_fn={input.REF} \
		  	--model_path=/opt/models/{params.MODEL_NAME} \
		  	--output={params.OUTPUT_DIR} \
		  	--threads={params.THREADS} \
		  	--platform=ont \
		  	--include_all_ctgs \
		  	--no_phasing_for_fa \
		  	--sample_name={wildcards.samples} \
		  	--gvcf \
		  	--include_all_ctgs \
			--threads=1 \
			--vcf_fn={params.targets_vcf} \
			--print_ref_calls

VCF output example showing missing ref fields:
Screenshot 2024-01-10 at 11 48 37 PM

When I run a the same clair3 command but without --vcf_fn, the reference genotypes are reported as 0/0 and other format fields are reported.

singularity exec \
			-B {params.INPUT_DIR} \
		  	-B {params.OUTPUT_DIR} \
		  	-B {params.REF_DIR} \
			-B {params.RESOURCE_DIR} \
		  	{input.sif_path} \
		  	/opt/bin/run_clair3.sh \
		  	--bam_fn={input.samples_to_run} \
		  	--ref_fn={input.REF} \
		  	--model_path=/opt/models/{params.MODEL_NAME} \
		  	--output={params.OUTPUT_DIR} \
		  	--threads={params.THREADS} \
		  	--platform=ont \
		  	--include_all_ctgs \
		  	--no_phasing_for_fa \
		  	--sample_name={wildcards.samples} \
		  	--gvcf \
		  	--include_all_ctgs \
			--threads=1 \
			--print_ref_calls

VCF output example showing present ref fields:

Screenshot 2024-01-10 at 11 49 51 PM

Is this expected behavior? I'd like to be able to run clair3 to report full records for reference calls with GT, GQ, and DP values when --vcf_fn is specified.

@aquaskyline
Copy link
Member

Yes it is expected. In the second example, you enabled --gvcf, which incurs more calculations at the reference calls, thus giving you more details.

@gtollefson
Copy link
Author

@aquaskyline Thanks for your response, but I don't think that answers my question since I enabled --gvcf for both examples - not just the second example. --vcf_fn is the only difference between the two runs.

Also the output examples I pasted above are both VCF file output (not gvcf files) from runs with --gvcf enabled which are automatically written along with the gvcf files. The gvcf output from the same run with --gvcf and --vcf_fn enabled does show 0/0 calls. I've pasted a screenshot from this run showing that the gvcf output shows 0/0 calls on the same sites that are reported as ./. in the vcf output.

Ref genotype missing at: chr4 748134 in VCF
Screenshot 2024-01-16 at 12 00 39 PM

Ref genotype present at: chr4 748134 in GVCF from same run:
Screenshot 2024-01-16 at 12 00 49 PM

This suggests that 0/0 genotypes and associated DP, GQ, AD values aren't reported in the VCF output (but are reported in the gvcf output) when --vcf_fn is specified.

We want to use the VCF output in our pipelines, but also need to run clair3 with '--vcf_fn' while outputting 0/0 reference genotypes and associated fields (DP, GQ, AD etc). Is this possible?

@gtollefson
Copy link
Author

@aquaskyline Just following up here. I'm hoping to see if Clair3 can support our use case before we move on to trying out other tools - Clair3 our favorite choice currently but we need to be able to calculate and report DP, AD, etc. for reference calls in non-gvcf output when --vcf_fn is specified.

@zhengzhenxian
Copy link
Collaborator

zhengzhenxian commented Jan 25, 2024

Hi, @gtollefson,

The './.' variants indicate low-AF sites that were not processed by Clair3 and thus not subsequently included in the output. To output these variants as RefCall, you can try setting --snp_min_af=0.0 and --indel_min_af=0.0 in addition to --vcf_fn.

In the next version, we will set --snp_min_af=0.0 and --indel_min_af=0.0 as the default behavior when '--vcf_fn' is applied.

@aquaskyline
Copy link
Member

@gtollefson may we collect your feedback on whether the two settings --snp_min_af=0.0 and --indel_min_af=0.0 solve your problem?

@gtollefson
Copy link
Author

gtollefson commented Jan 30, 2024

Hi @aquaskyline @zhengzhenxian,

Thank you very much for your help. Setting --snp_min_af=0.0 and --indel_min_af=0.0 along with --vcf_fn solved my problem. I am able to generate vcfs which report DP, AD, etc. for reference calls in non-gvcf output when --vcf_fn is specified.

However I just want to point out something interesting. The example site that was missing DP, AD, etc before shows an AF of 0.9094.

The './.' variants indicate low-AF sites that were not processed by Clair3 and thus not subsequently included in the output.

This site had high AF in the AF field. Instead could the reason they were excluded be that the sites had no alternate count in the AD field since they are reference calls?

Screenshot of new vcf output showing this site has high AF but no alternate count in AD field (as is expected for a reference call):
Screenshot 2024-01-30 at 1 08 49 PM

@aquaskyline
Copy link
Member

aquaskyline commented Jan 31, 2024

--snp_min_af and --indel_min_af actually refer to variant AF, a site with reference AF 0.9094 is with variant AF <0.1

@aquaskyline aquaskyline added the enhancement New feature or request label Jan 31, 2024
@gtollefson
Copy link
Author

Ok I see! Thank you. We appreciate your fast response and clean solution.
Best,
George

@aquaskyline
Copy link
Member

Fixed in v1.0.6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants