f_orig_query sums to greater than 1.0

Hi, I've run `sourmash gather` on approx 1,600 different metagenomes with the following:
`parallel -j 64 sourmash gather -k 21 {} -o {/.}.gt ~/sourmash_dbs/nih_smgc_k21.sbt.zip ~/sourmash_dbs/gtdb-rs207.genomic-reps.dna.k21.zip ~/sourmash_dbs/genbank-2022.03-fungi-k21.zip ::: ../*.sig`

**My immediate goal is to know the total proportion of each metagenome that is contained in my databases.** When I sum the `f_orig_query` column in each of the 1,000+ resulting csv files (`for X in *.csv; do echo -ne $X"\t"; awk -F "," 'NR>1{sum=sum+$2} END{print(sum)}' $X; done | sort -nk 2,2`), I get a wide distribution of proportions. More than 99% of the sums fall between 0 and 1.0, which is what I'd expect. However, in 7 of the csv files the `f_orig_query` column sums to more than 1.0, which I would not expect. **Do I misunderstand the meaning of the `f_orig_query` column, or is there something else that causes a sum greater than 1.0?**
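For anyone reproducing the check, here is a slightly more defensive sketch of the same summation. It looks the column up by its header name instead of assuming it is field 2, and flags files whose sum exceeds 1.0. The function names `sum_gather_column` and `report_f_orig_query` are hypothetical helpers made up for this example. The sketch also assumes that no quoted, comma-containing field comes before the column being summed, which holds when `f_orig_query` is the second column as in my files.

```shell
# Sum one named column of a gather CSV.
# Usage: sum_gather_column FILE COLUMN_NAME
# Assumption: no quoted field containing commas precedes the target column.
sum_gather_column() {
    awk -F ',' -v col="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
        c       { sum += $c }
        END     {
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            printf "%.4f\n", sum
        }
    ' "$1"
}

# Print "file<TAB>sum<TAB>flag" for each CSV given, sorted by sum.
# The flag is OVER when the f_orig_query sum is greater than 1.0.
report_f_orig_query() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        s=$(sum_gather_column "$f" f_orig_query) || continue
        flag=$(awk -v s="$s" 'BEGIN { print (s > 1.0) ? "OVER" : "ok" }')
        printf '%s\t%s\t%s\n' "$f" "$s" "$flag"
    done | sort -t "$(printf '\t')" -n -k2,2
}
```

Calling `report_f_orig_query *.csv` gives the same per-file table as my one-liner, with the files above 1.0 marked at the bottom. `sum_gather_column` can be pointed at any other column header in the csv.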

I've searched the issues, read [this lovely guide](https://sourmash.readthedocs.io/en/latest/classifying-signatures.html) to interpreting gather csvs, and haven't found an answer to my question, but I'm skeptical that it hasn't been asked already, so I apologize for any redundancy.

Here is an example gather csv from one of the metagenomes with a `f_orig_query` column sum greater than 1.0:
[ju473iap108192021_S185_adj.csv](https://github.com/sourmash-bio/sourmash/files/9648310/ju473iap108192021_S185_adj.csv)

