question about unclassified

Dear metaphlan team,

In my profile result (see below the last line) I got

t__Enterobacter_cloacae_unclassified 93.68831

My first question is: what does ‘t’ mean?

My second question is: what does ‘unclassified’ mean?

Does it mean it is a new subspecies or a new strain?

#SampleID Metaphlan2_Analysis
k__Bacteria 100.0
k__Bacteria|p__Proteobacteria 100.0
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria 100.0
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales 100.0
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae 100.0
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Enterobacter 93.68831
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Escherichia 6.31169
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Enterobacter|s__Enterobacter_cloacae 93.68831
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_unclassified 6.31169
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Enterobacter|s__Enterobacter_cloacae|t__Enterobacter_cloacae_unclassified 93.68831

Best regards,

By the way this group is really helpful!

Marilyne

Hi Marilyne,

t__ is the taxonomic rank that identifies the strain. In this case, unclassified means that MetaPhlAn was not able to identify the exact strain present in the community; otherwise, you would get something like t__GCA_XXXXXX.
It’s fine to filter out all the t__ entries and focus only on the species-level assignments.
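
For anyone who wants to drop the strain-level rows in practice, here is a minimal shell sketch. It assumes the profile is a plain text file whose first whitespace-separated column is the clade name, as in the output above. The function name drop_strain_rows and the file names are made up for illustration and are not part of MetaPhlAn.

```shell
# Hypothetical helper: print a MetaPhlAn profile without strain-level (t__) rows.
# Assumes the clade name is the first whitespace-separated column.
drop_strain_rows() {
    # Header lines and all rows without a "|t__" level are passed through.
    awk '$1 !~ /[|]t__/' "$1"
}

# Example usage (file names are placeholders):
# drop_strain_rows sample_profile.txt > sample_profile.no_strains.txt
```

The species-level row (s__Enterobacter_cloacae) is kept, since only clade names containing a t__ level are removed.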

Best,
Francesco

Hi Francesco,

Thank you

I would like to know whether MetaPhlAn also includes fungi, or only bacteria?

Yes, MetaPhlAn2 includes markers for some fungi. You can check which species can be profiled by looking into the marker information file here.

You can also have a look at MetaPhlAn 3.0, which includes more fungal markers.

Hi Francesco Great thanks!

Hi Francesco,

I am having difficulty installing metaphlan3. Would you mind providing the Anaconda command lines?

I tried conda install metaphlan=3.0=pyh5ca1d4c_2 --no-channel-priority
but got:
Fetching package metadata …An unexpected error has occurred.

Finally it worked, but I got:

Disk quota exceeded

How can I change the installation directory of envs/ and pkgs/?

There’s a newer build, pyh5ca1d4c_4.

You should move the whole Anaconda installation to another location; see this for more information.
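
As a possible alternative to moving the whole installation, conda can be pointed at different envs/ and pkgs/ directories through the CONDA_ENVS_PATH and CONDA_PKGS_DIRS environment variables. The sketch below is only an illustration, not the approach recommended above. The function name use_conda_dirs and the base path are hypothetical.

```shell
# Hypothetical helper: point conda at envs/ and pkgs/ directories under a
# base directory with more quota. The base path is supplied by the caller.
use_conda_dirs() {
    base="$1"
    mkdir -p "$base/envs" "$base/pkgs" || return 1
    # These variables override the default locations for the current
    # shell session only; they do not move existing environments.
    export CONDA_ENVS_PATH="$base/envs"
    export CONDA_PKGS_DIRS="$base/pkgs"
}

# Example usage (the path is a placeholder):
# use_conda_dirs /path/with/more/quota
# then rerun the conda install command in the same shell
```

The setting lasts only for the current shell, so it has to be repeated (or added to a shell startup file) before activating the environment later.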

Hi Francesco,

I was able to change the installation directory and install metaphlan3, but when I run it I get

FileNotFoundError: [Errno 2] No such file or directory: ‘/dev/fd/63’

Do you know how to solve this problem?

It would be great if you could help me.

regards

Marilyne

metaphlan --input_type fastq <(zcat …/…/batch9_data/$1_R1_001.cleaned.fastq.gz …/…/batch9_data/$1_R2_001.cleaned.fastq.gz) --bowtie2db …/…/database_metaphlan/ --nproc 12 --bowtie2out $1.bowtie2out.txt -o $1_profile.txt

this is the command I use

I get:

slurmstepd: error: task/cgroup: unable to add task[pid=26585] to memory cg ‘(null)’
Use of uninitialized value $bt2_args[2] in join or string at /home/mdebieu/.conda/envs/metaphlan_version3b/bin/bowtie2 line 423.
Use of uninitialized value $bt2_args[3] in join or string at /home/mdebieu/.conda/envs/metaphlan_version3b/bin/bowtie2 line 423.
Use of uninitialized value $_[2] in string eq at /home/mdebieu/.conda/envs/metaphlan_version3b/bin/bowtie2 line 360.
Use of uninitialized value $_[3] in string eq at /home/mdebieu/.conda/envs/metaphlan_version3b/bin/bowtie2 line 360.
Use of uninitialized value in exists at /home/mdebieu/.conda/envs/metaphlan_version3b/bin/bowtie2 line 81.
Use of uninitialized value in exists at /home/mdebieu/.conda/envs/metaphlan_version3b/bin/bowtie2 line 81.
Use of uninitialized value $bt2_args[2] in join or string at /home/mdebieu/.conda/envs/metaphlan_version3b/bin/bowtie2 line 459.
Use of uninitialized value $bt2_args[3] in join or string at /home/mdebieu/.conda/envs/metaphlan_version3b/bin/bowtie2 line 459.
Traceback (most recent call last):
File “/home/mdebieu/.conda/envs/metaphlan_version3b/bin/read_fastx.py”, line 10, in <module>
sys.exit(main())
File “/home/mdebieu/.conda/envs/metaphlan_version3b/lib/python3.7/site-packages/metaphlan/utils/read_fastx.py”, line 155, in main
nreads += read_and_write_raw(f, opened=False, min_len=min_len)
File “/home/mdebieu/.conda/envs/metaphlan_version3b/lib/python3.7/site-packages/metaphlan/utils/read_fastx.py”, line 118, in read_and_write_raw
with fopen(fd) as inf:
File “/home/mdebieu/.conda/envs/metaphlan_version3b/lib/python3.7/site-packages/metaphlan/utils/read_fastx.py”, line 53, in fopen
return open(fn)
FileNotFoundError: [Errno 2] No such file or directory: ‘/dev/fd/63’

Finally it worked with

/metaphlan …/…/batch9_data/JC-0355_S647_L002_R1_001.cleaned.fastq.gz,…/…/batch9_data/JC-0355_S647_L002_R2_001.cleaned.fastq.gz --bowtie2out JC-0355_S647_L002.bowtie2.bz2 --bowtie2db …/…/database_metaphlan/ --nproc 12 --input_type fastq -o JC-0355_S647_L002_profile.txt

What is the latest version of metaphlan3 and of the databases?

How much memory should I request?

What does ‘additional_species’ mean? Is it all possible subspecies?

The latest version available is 3.0.1; you can check for new releases on the Bioconda documentation page (Package Recipe 'metaphlan'). The latest database is v30.

To answer this, you should manually check how much memory is used to profile one of your metagenomes; it should not require more than about 6 GB.

For ‘additional_species’, see this thread: Unexpected output (format) - #2 by fbeghini.

Hi Francesco, Great! Thank you!

Hi Francesco,

What is the difference between metaphlan2 and metaphlan3 ?

Is the method still the same ?

How many markers are used per species? What is the average length of a marker?

Best regards,

Marilyne

Yes, the method is practically identical. We introduced a couple of QC checks on the alignment quality and a new expanded database comprising 12k species (see https://forum.biobakery.org/t/can-you-tell-us-about-the-db-updates/310). We try to use a maximum of 150 markers per species; for complete marker statistics you can check the marker info file here: https://www.dropbox.com/sh/7qze7m7g9fe2xjg/AAAlyQITZuUCtBUJxpxhIroIa/mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2?dl=1

Hi Francesco,

I am planning to use phyloseq to analyse alpha and beta diversity with my metaphlan2 outputs.

So, I need three inputs: a table, a tree and a fasta file.

I know how to get the table: I can use the script “merge_metaphlan_tables.py”.

But how can I generate the tree and the fasta file? Is a script available to extract the fasta sequences? I suppose I could then generate a tree from those sequences. Which R package would you recommend for creating the tree?

Best regards,

Marilyne

Hi Marilyne,
You can find the Newick tree built with all the genomes included in MetaPhlAn in the GitHub repository (https://github.com/biobakery/MetaPhlAn/blob/master/metaphlan/utils/mpa_v30_CHOCOPhlAn_201901_species_tree.nwk). I don’t recall what the fasta file is needed for in phyloseq, but if you need to compute alpha and beta diversity measures, the tree should be enough.

Hi Francesco,

Thank you!

I was able to create a phyloseq object.

I now have a question about the OTU table, which I got from the script “merge_metaphlan_tables.py”.
When I run analyses with phyloseq, I get this error: “function accepts only integers (counts)”.
I assume I need the number of reads in my otu_table, so I was thinking of multiplying the relative abundance by the total number of reads, but it seems that I do not have relative abundances in the OTU table. When I calculate the sum for each sample, I get a number between 100 and 800. What does this mean? Is it correlated with the number of reads? Or do I need another script to convert it into read counts?

Best regards,

Marilyne

To import MetaPhlAn profiles into phyloseq, you can have a look at this R function: Import a table of MetaPhlAn taxonomic abundances into phyloseq (github.com).

You can also get estimated read counts in the output profile by running MetaPhlAn with the option -t rel_ab_w_read_stats.
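
On the column sums between 100 and 800: the most likely explanation is that the merged table has one row per clade at every taxonomic level (k__ through t__), and the relative abundances at each level add up to roughly 100. Summing a whole column therefore counts the same community up to eight times. The shell sketch below sums the abundances separately for each rank so this can be checked. It assumes column 1 is the clade name and column 2 the relative abundance, as in the single-sample profile earlier in this thread; the function name sum_per_rank is made up for illustration.

```shell
# Hypothetical helper: sum the abundance column of a profile for each rank.
# Assumes column 1 is the clade name and column 2 the relative abundance.
sum_per_rank() {
    awk '
        /^#/ { next }                        # skip header lines
        {
            n = split($1, levels, "[|]")     # deepest level is the last element
            rank = substr(levels[n], 1, 1)   # k, p, c, o, f, g, s or t
            total[rank] += $2
        }
        END { for (r in total) printf "%s\t%.5f\n", r, total[r] }
    ' "$1"
}

# Example usage (file name is a placeholder):
# sum_per_rank sample_profile.txt
```

If each rank sums to about 100, keeping only one rank (for example the species-level rows) before importing into phyloseq gives a table of relative abundances. These are still percentages, not integer read counts.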