Hello! I ran MetaPhlAn 4 on several hundred metagenomic samples. Most of them came out with 100% unclassified reads, and a few had reads assigned to only a single class within Eukaryota. I even lowered --stat_q to 0.01 (the default is 0.2), but it didn't help.
In contrast, MetaPhlAn 3 classified the same samples into over 100 bacterial species, with large estimated read counts (over 10k) for the classified bacteria. The bowtie2 output files from MetaPhlAn 4 are also larger than those from MetaPhlAn 3. Why are these reads not classified the way they were in MetaPhlAn 3, even with such a small --stat_q? I would really appreciate it if you could help me take a look.
The commands I used with MetaPhlAn 4 for one sample were:
python read_fastx.py -l 70 ${FQ1},${FQ2} > ${NAME2}_test.fastq 2>${NAME2}_test.param.txt
bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S ${NAME2}_test.sam -x ${mp4_db} -U ${NAME2}_test.fastq -p 4
metaphlan ${NAME2}_test.sam --input_type sam -o ${NAME2}_0.2_metagenome_profile_test.txt \
--unclassified_estimation -t rel_ab_w_read_stats --stat_q 0.01 --nreads 38643240
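The --nreads value above is hard-coded. A minimal sketch for deriving it from the FASTQ that read_fastx.py writes, assuming an uncompressed file in which every record spans exactly four lines; count_fastq_reads is a hypothetical helper name, not part of MetaPhlAn. It counts the reads in the filtered FASTQ that is passed to bowtie2.

```shell
#!/bin/sh
# Hypothetical helper (not part of MetaPhlAn): count reads in an
# uncompressed FASTQ file, assuming every record spans exactly four lines
# (header, sequence, "+", quality).
count_fastq_reads() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 1
  fi
  # wc -l pads its output with spaces on some systems; strip them.
  lines=$(wc -l < "$1" | tr -d '[:space:]')
  if [ $(( lines % 4 )) -ne 0 ]; then
    echo "warning: $1 has $lines lines, which is not a multiple of 4" >&2
  fi
  echo $(( lines / 4 ))
}
```

For example, NREADS=$(count_fastq_reads "${NAME2}_test.fastq") and then pass --nreads "$NREADS" to the metaphlan step. The result should agree with the total that bowtie2 reports for the same file.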
I ran the steps separately because the one-step command fails due to memory issues.
The bowtie2 alignment summary (standard error) was:
38643240 reads; of these:
38643240 (100.00%) were unpaired; of these:
37224909 (96.33%) aligned 0 times
1343206 (3.48%) aligned exactly 1 time
75125 (0.19%) aligned >1 times
3.67% overall alignment rate
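As a sanity check on this summary, the overall alignment rate is the reads aligned exactly once plus those aligned more than once, divided by the total: (1343206 + 75125) / 38643240 ≈ 3.67%. A minimal awk-based sketch that recomputes it from an unpaired bowtie2 summary read on stdin; alignment_rate is a hypothetical name, and the line patterns are assumed from the log above (paired-end summaries use different wording and are not handled).

```shell
#!/bin/sh
# Hypothetical helper: recompute the overall alignment rate from an
# unpaired bowtie2 summary read on stdin, matching the wording of the
# unpaired log format shown above.
alignment_rate() {
  awk '
    $2 == "reads;"           { total = $1 }   # "N reads; of these:"
    /aligned exactly 1 time/ { once  = $1 }   # uniquely aligned reads
    /aligned >1 times/       { multi = $1 }   # multi-mapped reads
    END {
      if (total > 0) printf "%.2f%%\n", 100 * (once + multi) / total
    }
  '
}
```

For example, add 2> "${NAME2}_bowtie2.log" to the bowtie2 command and then run alignment_rate < "${NAME2}_bowtie2.log" for each sample.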
The MetaPhlAn 4 output reported 1001411 reads assigned to Eukaryota and only 92 reads assigned to Bacteria, whereas MetaPhlAn 3 assigned the majority of reads to bacteria. The same pattern occurs in all 500+ samples.
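To pull the kingdom-level numbers out of each profile when comparing many samples, a minimal sketch, assuming the rel_ab_w_read_stats output is tab-separated, that kingdom rows are named like k__Bacteria, and that the estimated read count is the last column; kingdom_reads is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the clade name and the last column (assumed
# to be the estimated read count) for kingdom-level rows and the
# UNCLASSIFIED row of a tab-separated MetaPhlAn profile.
kingdom_reads() {
  awk -F '\t' '
    /^#/ { next }                                  # skip header and comment lines
    $1 == "UNCLASSIFIED" || $1 ~ /^k__[^|]*$/ {    # no "|" means kingdom level
      print $1 "\t" $NF
    }
  ' "$1"
}
```

For example: for f in *_metagenome_profile_test.txt; do echo "== $f"; kingdom_reads "$f"; done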
I really appreciate your kind help!