Too many unclassified reads?

Hi, everybody!

I’m using MetaPhlAn [version 4.0.3 (24 Oct 2022)] for taxonomic profiling, and the result files show a high percentage of unclassified reads (~82%). This means that only ~18% of the reads are assigned to taxa, right? Isn’t that too low?

I removed adapters, overrepresented sequences, and host sequences with KneadData, and checked quality with FastQC / MultiQC. It all seemed OK.

Does anybody know what could be the issue here?

Here is the command I used:

metaphlan "${SEQS}/${name}.fastq" \
    --input_type fastq \
    --bowtie2db "${bowtie2db}" \
    --sample_id "${name}" \
    --nproc "${threads}" \
    --bowtie2out "${result_dir}/${name}.bowtie2.bz2" \
    --unclassified_estimation \
    --subsampling 14800000 \
    --subsampling_seed 0 \
    --output_file "${result_dir}/${name}_profile.txt"

I really appreciate any help you can provide.

Hi @vrrodovalho
That fraction really depends on the data you are analyzing. In the human gut it is usually around 25%, but in other, less characterized environments it can be quite high.

Hi @aitor.blancomiguez, thank you for your answer. I’m working with samples from a rodent, the golden hamster. So I think the expected rate of unclassified reads should be higher than 25%?

Hi @vrrodovalho
I would say a higher unclassified fraction is expected in such an environment. We do not have many rodent MAGs in the database.
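
For anyone who wants to compare the unclassified fraction across several samples, here is a minimal sketch that pulls the value out of each profile. It assumes the profiles were produced with --unclassified_estimation and that the unclassified estimate appears as a tab-separated row whose first column is UNCLASSIFIED, with the relative abundance in the third column. The helper name unclassified_pct is made up for illustration, so check the layout against your own output files first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the estimated unclassified percentage
# from one MetaPhlAn profile.
# Assumption: the profile has a tab-separated row with "UNCLASSIFIED"
# in column 1 and the relative abundance (in percent) in column 3.
unclassified_pct() {
    awk -F'\t' '
        $1 == "UNCLASSIFIED" { print $3; found = 1 }
        END { if (!found) exit 1 }   # non-zero status if the row is missing
    ' "$1"
}

# Example: one line per sample, using the paths from the command above.
# for f in "${result_dir}"/*_profile.txt; do
#     printf '%s\t%s\n' "$(basename "$f" _profile.txt)" "$(unclassified_pct "$f")"
# done
```

If the function returns a non-zero status, the profile has no UNCLASSIFIED row, which usually means it was generated without --unclassified_estimation.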