Metaphlan3 keeps giving an UNKNOWN 100 output

reymonera · June 10, 2022, 5:19pm

Hi,

I’m currently trying to perform a community analysis for milk samples. I’ve received the data and performed a trimming using trimmomatic and then proceeded to use metaphlan. However, all the samples have returned an UNKNOWN 100 output.
I used the following command:

#!/bin/bash

module load miniconda
conda activate metaphlan

METAPHLAN_DB=meta_database/

for f in *.gz
do name=$(basename $f .fastq.gz)
   metaphlan --nproc $SLURM_CPUS_PER_TASK --bowtie2out ${name}.bt2.bz2 --input_type fastq --bowtie2db $METAPHLAN_DB $f > ${name}_profile.txt
done

And I received this output:

#mpa_v30_CHOCOPhlAn_201901
#/home/ccastillo/.conda/envs/metaphlan/bin/metaphlan L006_S3_L001_R1_001.fastq.gz --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --bowtie2db meta_db --input_type fastq --unknown_estimation -o profiled_me$
#SampleID       Metaphlan_Analysis
#clade_name     NCBI_tax_id     relative_abundance      additional_species
UNKNOWN -1	100.0

So, I decided to perform a metaphlan run with just one sample. I used the following command:

metaphlan L006_S3_L001_R1_trimmed.fastq.gz --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --bowtie2db meta_db --input_type fastq --unknown_estimation -o profiled_metagenome.txt

And I still recieved the same output. I tried the raw samples (before the trimming) using the following command:

metaphlan L006_S3_L001_R1_001.fastq.gz --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --bowtie2db meta_db --input_type fastq --unknown_estimation -o profiled_metagenome.txt

And the same problem persisted.

Reading that the read length might be an issue, I performed the following command:

gunzip -c L006_S3_L001_R1_trimmed.fastq.gz | awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}' > count.txt

And after performing head and tail in the output file, I noticed that most reads are in the 250 nt length. Which apparently isn’t a problem:

So, right now, I’m out of ideas. I was wondering if maybe it has something to do with the database, and maybe I’m specifying it wrongly.

This is the version I’m currently using:

MetaPhlAn version 3.0.14 (19 Jan 2022)

Thanks for any help you could give me!

leonard.dubois · June 13, 2022, 8:50am

Hi,

indeed it looks like the database is not specified correctly. In your meta_database/ folder you should have a set of bowtie indexes files (.bt2 files). Try specifying the path including the basename of these files. Usually it looks like -x meta_database/mpa_v30_CHOCOPhlAn_201901

Hope this is clear and will solve your issue. Feel free to ask if not
Have a nice day!

Léonard

reymonera · June 13, 2022, 7:10pm

Hi Léonard,

I applied the change you suggested. However, I still get an UNKNOWN 100. Here’s how the new script looked like:

#!/bin/bash

module load miniconda
conda activate metaphlan

METAPHLAN_DB=meta_database/mpa_v30_CHOCOPhlAn_201901

for f in *.gz
do name=$(basename $f .fastq.gz)
   metaphlan --nproc $SLURM_CPUS_PER_TASK --bowtie2out ${name}.bt2.bz2 --input_type fastq --bowtie2db $METAPHLAN_DB $f > ${name}_profile.txt
done

In which meta_database/mpa_v30_CHOCOPhlAn_201901 was the new path, but it only created a new directory that downloaded the database again.

Regards,
Camila C.

aitor.blancomiguez · June 15, 2022, 12:04pm

Hi @reymonera
As metaphlan is producing the output (even if it is UNKNOWN 100%) it means that the execution was performed correctly. This is not really an error but a report saying that after mapping all your reads, MetaPhlAn is not able to detect any microbial species in your sample. This does not mean that your sample does not contain any microorganism, but that with the current parameters, MetaPhlAn is not able to detect them. By default, MetaPhlAn needs to find 20% of the markers of a species to report the species is present in the sample. This is controlled by the stat_q parameter (Check this post Fungal signatures - #2 by aitor.blancomiguez). In samples with low microbial biomass, 20% can be too strict (I am not an expert on human milk microbiome, but I think this might be your case). Reducing the --stat_q parameter (maybe to 10%) might help.

reymonera · June 15, 2022, 7:52pm

Hi!

Thanks for your input!

I changed the stat_q flag to 0.1 and 0.05 and I’m still getting that error. Here’s my attempt:

metaphlan --nproc 6 --bowtie2out L006_S3_L001_R1_trimmed.bt2.bz2 --input_type fastq --stat_q 0.05 --bowtie2db meta_db L006_S3_L001_R1_trimmed.fastq.gz

And here’s the output:

#mpa_v30_CHOCOPhlAn_201901
#/home/ccastillo/.conda/envs/metaphlan/bin/metaphlan --nproc 6 --bowtie2out L006_S3_L001_R1_trimmed.bt2.bz2 --input_type fastq --stat_q 0.05 --bowtie2db meta_db L006_S3_L001_R1_trimmed.fastq.gz
#SampleID	Metaphlan_Analysis
#clade_name	NCBI_tax_id	relative_abundance	additional_species
UNKNOWN	-1	100.0

Thanks for any input you could give me.

Regards,
Camila C.

aitor.blancomiguez · June 17, 2022, 7:31am

Hi @reymonera
Can you check the content of the L006_S3_L001_R1_trimmed.bt2.bz2 file? this file contains a simplified version of the mapping results and it will be really useful to see how many reads are mapping.

reymonera · June 17, 2022, 4:52pm

Hi!

Yes, this is what appears inside the .bt2 file:

M04874:89:000000000-CTKPJ:1:1102:2622:14981__1.8674     610243__A0A060QJF7__ASAP_2763
M04874:89:000000000-CTKPJ:1:1102:23769:26112__1.11366   610243__A0A060QJF7__ASAP_2763
M04874:89:000000000-CTKPJ:1:1114:19261:2019__1.79569    610243__A0A060QJF7__ASAP_2763
M04874:89:000000000-CTKPJ:1:2101:3489:20055__1.89927    610243__A0A060QJF7__ASAP_2763
M04874:89:000000000-CTKPJ:1:2106:16354:27005__1.124143  610243__A0A060QJF7__ASAP_2763
M04874:89:000000000-CTKPJ:1:2113:7893:24764__1.167862   610243__A0A060QJF7__ASAP_2763
#nreads 173884
#avg_read_length        234.74154608819674

aitor.blancomiguez · June 20, 2022, 2:04pm

Hi @reymonera
So I see the problem. From your 173,884 reads, only 6 are mapping against MetaPhlAn markers (in this case just to one marker) so it is not possible for MetaPhlAn to accurately detect any species on it.

reymonera · June 20, 2022, 4:51pm

Hi,

I see. Thanks for the input. However, what should be the course of action here? Is there something wrong with the sequences?

Also, just as a question: are .bt2 files supposed to be compressed inside the database directory? Because any time I tried to perform the head command on these files in the database directory, they’re unreadable.

fbeghini · June 20, 2022, 5:49pm

Hi @reymonera,

Are you running metaphlan on a 16S rRNA dataset? From the total number of reads and the average length it seems so.

reymonera · June 24, 2022, 7:15pm

Hi,

After your question, I asked again if these were indeed shotgun reads as I’ve been told and they confirmed this again. I’ll try to check again but what number of reads and average length should I expect from a shotgun run?

Thanks!

aitor.blancomiguez · June 28, 2022, 10:15am

Hi @reymonera
If you are sure your sequences are from shotgun sequencing and not 16S, there is not too much to do. I might be that there is nothing wrong in your sample. Shotgun sequencing of milk samples for microbial DNA is not an easy task. As an example, in breast milk samples, most of the reads will come from human DNA and the microbial DNA is so low that it is posible to get no reads at all mapping against microbial species (https://journals.asm.org/doi/10.1128/mSystems.00164-16). I am sorry I cannot give you more help than this

reymonera · July 7, 2022, 1:03pm

Hi! Just to report on this: Apparently there was a confusion labelling the shotgun samples and they got mixed with the 16S samples. I’ve ran the shotgun reads with MetaPhlan and it all went well after putting in some sensible parameters. Human DNA was very mixed with the microbial DNA.

Thanks for the help and input!

CMagnoBR · November 21, 2022, 3:09pm

Hi! I have the same issue here. Can you share the parameters applied? Thanks!

Topic		Replies	Views
100 Unknown raw reads MetaPhlAn	5	1090	March 4, 2021
MetaPhlan3 --unknown_estimation MetaPhlAn	28	3644	December 6, 2021
Analysis says 100% unknown? MetaPhlAn	1	431	June 7, 2022
Metaphlan analysis show 100 % unknown MetaPhlAn	1	322	December 19, 2022
Annotation of supplied contig file MetaPhlAn	1	310	May 18, 2022

Metaphlan3 keeps giving an UNKNOWN 100 output

Related topics