When I run MetaPhlAn 3, only Archaea (and only the class Thermoplasmata) are listed in the output file. I know that Bacteria and Eukarya are present, along with other Archaea; another version of MetaPhlAn (the first one, I think) confirms it. When I use the --unknown_estimation parameter, it says 99.5… What mistake could I be making?
The command I’m using is:
metaphlan metagenoma.fasta --input_type fasta --bowtie2db …/1_database_metaphlan/ --bowtie2out metagenome.bowtie2.bz2 --output_file metaphlan3.txt --unknown_estimation --nproc 40
Thanks in advance!
Which kind of metagenome are you analyzing?
MetaPhlAn or MetaPhlAn2? If they are present in low abundance, I’d try to run the analysis with
It’s a shotgun metagenome. As input, I used the merged forward and reverse sequences. I also tried with the fastq files, but the result was the same.
MetaPhlAn (I’m pretty sure… it was a long time ago). I used this metagenome to compare results. I checked it with --stat_q 0.1 and… nothing, same result. Anyway, bacterial sequences make up as much as 50% (according to the RefSeq database), and I was able to obtain bacterial MAGs of “common” microorganisms from this metagenome. They are there… somewhere.
Thanks for the quick reply!
I mean, which environment does it come from?
It is possible that these species are not seen because of the quality control introduced on read mapping quality and the filtering of “bad” taxonomically assigned species. Also, the original MetaPhlAn database is 8 years old and contains only ~1000 species. Could you post here the list of species you were able to identify?
It comes from an acidic environment (acid mine drainage).
About the MetaPhlAn version I used… now I’m not sure (sorry, I no longer have the software on my computer). Anyway, I’m attaching the output file we obtained, in case you want to check it.
metRT.txt (131 KB)
I see. Most of the species that were detected are still present in the database, but everything is reported at <1% relative abundance. One hypothesis is that the MAPQ of the alignments against these markers is quite low. You could try to see if anything changes by using --min_mapq_val -1 in order to use all the alignments generated.
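One way to check the low-MAPQ hypothesis is to look at the SAM file directly. The following is a minimal Python sketch, not part of MetaPhlAn: it counts aligned reads per reference sequence (marker) at or above a MAPQ cutoff, using only the standard SAM columns (FLAG, RNAME, MAPQ). The function name, the marker names and the cutoff of 5 in the demo are hypothetical, chosen for illustration only.

```python
from collections import Counter


def mapq_per_marker(sam_lines, min_mapq=0):
    """Count aligned reads per reference (marker) with MAPQ >= min_mapq."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:  # not a valid alignment record
            continue
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4 or rname == "*":  # read is unmapped
            continue
        if mapq >= min_mapq:
            counts[rname] += 1
    return counts


# Tiny in-memory example with made-up reads and marker names.
demo = [
    "@HD\tVN:1.0",
    "r1\t0\tmarkerA\t10\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tmarkerA\t20\t3\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t16\tmarkerB\t5\t1\t4M\t*\t0\t0\tACGT\tIIII",
]
all_hits = mapq_per_marker(demo, min_mapq=0)  # every aligned read
strict_hits = mapq_per_marker(demo, min_mapq=5)  # illustrative cutoff
print(dict(all_hits), dict(strict_hits))
```

If many markers appear in the unfiltered count but disappear once a cutoff is applied, that would support the low-MAPQ explanation. For a real file, an open file handle can be passed in place of the demo list.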
I tried with:
metaphlan metagenomart.fasta --input_type fasta --bowtie2db …/1_database_metaphlan/ --bowtie2out metagenome.bowtie2.bz2 --output_file metaphlan3.txt --unknown_estimation --nproc 40 --min_mapq_val -1
And the result is the same.
metaphlan3.txt (2.21 KB)
Can you also upload the SAM and bowtie2out files?
I can’t upload the files from here. Maybe you can download them from this link:
From the SAM file it seems that very few markers are identified, hence the few species identified. If you allow more markers to be used in the calculation of the robust average by decreasing --stat_q, and ignore the mapping quality with --min_mapq_val, you can get most of the species identified with MetaPhlAn. Still, I would not trust their presence based on a single identified marker.
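To make the effect of the quantile on a robust average concrete, here is a simplified Python sketch of a truncated mean. It is an illustration under stated assumptions, not MetaPhlAn's actual code: the function name, the per-marker coverage values and the two quantile values are all hypothetical.

```python
def robust_average(marker_coverages, stat_q):
    """Truncated mean: drop the lowest and highest stat_q fraction of markers."""
    values = sorted(marker_coverages)
    n = len(values)
    k = int(stat_q * n)  # number of markers dropped from each tail
    kept = values[k:n - k] if n - 2 * k > 0 else []
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical clade with 10 markers, only one of which received reads.
coverages = [0.0] * 9 + [4.0]

high_q = robust_average(coverages, 0.2)   # drops 2 markers per tail
low_q = robust_average(coverages, 0.05)   # drops none, all 10 markers count
print(high_q, low_q)
```

With the larger quantile the single covered marker falls in the discarded upper tail, so the clade averages to zero. With the smaller quantile that marker is kept and the clade gets a non-zero value, which is also why a result resting on one marker deserves little trust.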
Excuse my ignorance, but knowing that we could construct some MAGs from this metagenome (at least 10 different MAGs, 80% completeness), is it normal not to find more markers to at least identify the species to which those MAGs belong?
And does this mean that the result we got from the other version of MetaPhlAn is not reliable?
Thank you very much for your help and your patience!
If the MAG species is not included in the MetaPhlAn database, yes. Also, if a substantial number of markers are not identified, the species cannot be identified, since at least 1/3 of its total number of markers must be present.
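The 1/3 requirement can be pictured with a short Python sketch. This is only an illustration of the rule as stated above, not MetaPhlAn's implementation: the function name, the dictionary layout and the marker counts are hypothetical.

```python
def enough_markers(hits_per_marker, total_markers, min_fraction=1 / 3):
    """Return True if the share of markers with at least one hit reaches min_fraction."""
    if total_markers <= 0:
        raise ValueError("total_markers must be positive")
    nonzero = sum(1 for count in hits_per_marker.values() if count > 0)
    return nonzero / total_markers >= min_fraction


# Hypothetical species with 30 markers in the database.
one_marker = enough_markers({"m1": 3}, total_markers=30)
twelve_markers = enough_markers({f"m{i}": 1 for i in range(12)}, total_markers=30)
print(one_marker, twelve_markers)
```

In this toy case a species with reads on a single marker out of 30 is rejected, while one with reads on 12 of 30 markers passes.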