Request for guidance on tracing sequences analyzed by MetaPhlAn 2.0

Hello Sir and Ma’am, I hope this email finds you well. I am currently processing metagenomic samples sequenced on an Illumina platform. The BIOM file that MetaPhlAn 2.0 generated from these sequences lists many microorganisms, including Influenza A virus, one of the main targets of our research, as shown below:

{"id": "8797", "metadata": {"taxonomy": ["k__Viruses", "p__Viruses_noname", "c__Viruses_noname", "o__Viruses_noname", "f__Orthomyxoviridae", "g__Influenzavirus_A", "s__Influenza_A_virus", "t__PRJNA15622"]}}

{"id": "18976", "metadata": {"taxonomy": ["k__Viruses", "p__Viruses_noname", "c__Viruses_noname", "o__Viruses_noname", "f__Orthomyxoviridae", "g__Influenzavirus_A", "s__Influenza_A_virus", "t__PRJNA14892"]}}
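Entries like the two above can also be pulled out of the BIOM file programmatically. Below is a minimal Python sketch, assuming the MetaPhlAn output is the JSON-based BIOM 1.0 format (as the snippets suggest). In that format each item in `rows` describes one observation (here, a taxon), `columns` lists the samples, and `data` holds the values either as sparse `[row, column, value]` triplets or as a dense matrix. The function names `find_clade_rows` and `row_abundances` are hypothetical helpers, not part of MetaPhlAn or the `biom` package.

```python
import json
import sys
from typing import Any


def find_clade_rows(biom: dict[str, Any], clade: str) -> list[tuple[int, str, list[str]]]:
    """Return (row index, row id, taxonomy) for every row whose taxonomy contains `clade`."""
    hits = []
    for index, row in enumerate(biom["rows"]):
        # Row metadata may be null in BIOM files, so fall back to an empty taxonomy.
        taxonomy = (row.get("metadata") or {}).get("taxonomy") or []
        if clade in taxonomy:
            hits.append((index, row["id"], taxonomy))
    return hits


def row_abundances(biom: dict[str, Any], row_index: int) -> dict[str, float]:
    """Map each sample (column) id to the value stored for one row."""
    sample_ids = [column["id"] for column in biom["columns"]]
    if biom.get("matrix_type") == "sparse":
        # Sparse BIOM 1.0: `data` is a list of [row, column, value] triplets; absent cells are 0.
        values = {sample_id: 0.0 for sample_id in sample_ids}
        for r, c, value in biom["data"]:
            if r == row_index:
                values[sample_ids[c]] = value
        return values
    # Dense BIOM 1.0: data[row] holds one value per column.
    return dict(zip(sample_ids, biom["data"][row_index]))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:  # path to the MetaPhlAn BIOM (JSON) output
        table = json.load(handle)
    for index, row_id, taxonomy in find_clade_rows(table, "s__Influenza_A_virus"):
        print(row_id, taxonomy[-1], row_abundances(table, index))
```

Run as, for example, `python find_taxa.py profile.biom` (both names hypothetical); it prints each matching row id, its strain-level label, and the per-sample values.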

As this report will matter greatly to those concerned, how can I extract from our metagenomic sequencing data the nucleotide sequences (reads) that MetaPhlAn 2.0 assigned to Influenza A virus? Do the IDs 8797 and 18976 correspond to the 8797th and 18976th sequences in our metagenomic data? Lastly, is it possible to download the reference marker sequences that MetaPhlAn used to classify our samples?

Thank you very much!

Best regards,

Crist John M. Pastor

Hi @Crist_John_Pastor
To extract the reads that map to your target species:
1. Run MetaPhlAn with the --samout option so that it also saves the read alignments as a SAM file.
2. Filter the SAM file, keeping only the alignments of reads that map to the markers of your species. You can look up the marker IDs of your species here: http://cmprod1.cibio.unitn.it/biobakery3/metaphlan_databases/mpa_v20_m200_marker_info.txt.bz2
3. Extract the reads from the filtered SAM file using samtools.

For the marker sequences, you can download the FASTA file from here: http://cmprod1.cibio.unitn.it/biobakery3/metaphlan_databases/mpa_v20_m200.tar
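The filtering and extraction steps can be sketched in Python. This is a minimal, hypothetical sketch, not a MetaPhlAn utility. It assumes each line of the marker info file is a marker ID followed by a tab and that marker's metadata, and that the SAM reference name (column 3) equals the marker ID. It selects markers by plain substring match on the clade name (for example `s__Influenza_A_virus`), which is a simplification. Instead of the samtools step, it writes FASTA directly. It skips unmapped, secondary and supplementary records and reverse-complements reverse-strand hits (flag 0x10) back to the read's original orientation, which is the same adjustment `samtools fasta` applies. The names `open_text`, `load_marker_ids` and `reads_hitting_markers` are illustrative only.

```python
import bz2
import sys
from typing import Iterable, Iterator, TextIO


def open_text(path: str) -> TextIO:
    """Open a plain or .bz2-compressed text file for reading."""
    return bz2.open(path, "rt") if path.endswith(".bz2") else open(path)


def load_marker_ids(marker_info_lines: Iterable[str], clade: str) -> set[str]:
    """Collect IDs of markers whose metadata mentions `clade`.

    Assumes each line is a marker ID, a tab, then metadata; matching a substring
    of the metadata is a simplification, not MetaPhlAn's own logic.
    """
    ids = set()
    for line in marker_info_lines:
        marker_id, sep, metadata = line.rstrip("\n").partition("\t")
        if sep and clade in metadata:
            ids.add(marker_id)
    return ids


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reads_hitting_markers(sam_lines: Iterable[str], marker_ids: set[str]) -> Iterator[tuple[str, str]]:
    """Yield (read name, read sequence) for primary alignments to any of `marker_ids`."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, rname, seq = fields[0], int(fields[1]), fields[2], fields[9]
        if flag & (0x4 | 0x100 | 0x800):
            continue  # unmapped, secondary or supplementary record
        if rname not in marker_ids or seq == "*":
            continue
        if flag & 0x10:
            # Reverse-strand hits are stored reverse-complemented; restore the read's orientation.
            seq = seq.translate(_COMPLEMENT)[::-1]
        yield qname, seq


if __name__ == "__main__" and len(sys.argv) == 3:
    # argv[1]: marker info file (e.g. mpa_v20_m200_marker_info.txt.bz2); argv[2]: SAM from --samout
    with open_text(sys.argv[1]) as handle:
        markers = load_marker_ids(handle, "s__Influenza_A_virus")
    with open_text(sys.argv[2]) as handle:
        for name, sequence in reads_hitting_markers(handle, markers):
            print(f">{name}\n{sequence}")
```

A possible invocation is `python extract_reads.py mpa_v20_m200_marker_info.txt.bz2 sample.sam.bz2 > influenza_reads.fasta` (script and SAM file names are hypothetical). If you prefer the samtools route, write out the header lines plus the kept alignment records instead of FASTA, then run `samtools fasta` on that filtered file.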