Dear MetaPhlAn 3 developers,
I am very excited about this new release with so many more references.
I did a test run like this:
for i in *.fastq; do metaphlan "$i" --input_type fastq --nproc 20 --unknown_estimation --index latest --add_viruses > "../metaphlan3/${i%.fastq}_profile.txt"; done
I am a bit confused about the output. Please check the attached file. Why are there two rows named ‘UNKNOWN’? One has only ‘0’, but the first has very high values (80-90+ %), which seems like a lot given that these are human fecal samples.
When I sum up all relative abundances I end up with 150-180%, which is strange too.
Please help me interpret my results!
Thank you!
Stef
Hi Stef,
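For fecal samples, that is quite a high value. Is it possible that the samples contain contaminants such as human sequences?
About the >100% sum: the profile lists every clade at each taxonomic rank (kingdom, phylum, and so on down to species), so summing all rows counts the same abundance several times. The UNKNOWN value is relative to the sum of the relative abundances at a single clade level, so if you sum all the species’ relative abundances and add the UNKNOWN value, you’ll get 100%.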
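To check this on a profile, the species rows plus the UNKNOWN row can be summed with a few lines of shell. This is only a minimal sketch: the function name sum_rank_with_unknown is made up for illustration, and it assumes a tab-separated profile with the clade name in column 1 and the relative abundance in column 3 (pass a different column number as the second argument for a merged table).

```shell
# Sum the relative abundances of all species rows plus the UNKNOWN row.
# Assumptions (not from the thread): hypothetical function name, clade name
# in column 1, relative abundance in column 3 unless another column is given.
sum_rank_with_unknown() {
    profile=$1
    col=${2:-3}
    awk -F '\t' -v col="$col" '
        /^#/ { next }                                  # skip header lines
        $1 == "UNKNOWN" { total += $col; next }        # unclassified fraction
        $1 ~ /s__/ && $1 !~ /t__/ { total += $col }    # species rows only
        END { printf "%.2f\n", total }
    ' "$profile"
}
```

Calling sum_rank_with_unknown on one of the *_profile.txt files should print a value close to 100. Summing every row instead counts each species again at the genus, family, and higher ranks, which is how totals well above 100% arise.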
I removed human reads before running MetaPhlAn 3 (they made up at most 4% of the reads in any one sample), so that should not be it. And why are there two rows named ‘UNKNOWN’, one being zero and the other above 80%? Could you please take a look at the output I posted?
I very much appreciate your help interpreting the results!
/Stef
I identified the issue! The second row of UNKNOWN came from the negative control. It was 100 there and 0 for all samples. When merging all samples without the NC it looks fine.
But I am still VERY worried about the UNKNOWN in my samples being above 80%!
I’ll resolve this issue. It seems that the string printed when no output is available and the one used for the unknown estimation are slightly different.
About increasing the mappability: the metagenome size seems below average. Are these MiSeq reads?
Given the longer read length, I’d try running MetaPhlAn with local alignment. You can do this with the --bt2_ps sensitive-local or --bt2_ps very-sensitive-local parameter.
Thanks! I will!
Could you please tell me how exactly sensitive and very sensitive differ? I cannot find that information in the tutorial. And which min_alignment_len do you recommend?
/Stef
For the parameter definitions, I’ll point you to the Bowtie2 manual, since these are Bowtie2 presets. I would not decrease min_alignment_len below 100: you should not have markers shorter than that, and it should still let you find enough hits.
Hi Francesco,
Using local alignment I could decrease the UNKNOWN by around half. This is much better, but about 40% is still left as unknown. Do you have any further suggestions on how to optimise the parameters for longer MiSeq reads and shallow datasets?
Thank you!
Stef
I’m glad it worked out. 40% is a reasonable number for UNKNOWN.
For longer reads, the tunable parameters are the two you used before (min_alignment_len and --bt2_ps), and they are the ones with the greatest impact on mappability.
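For reference, a run that sets both parameters could look roughly like the sketch below. The wrapper name profile_local, the default output directory, and the default preset are only illustrative assumptions. The other flags are the ones from the original test run, and 100 is the lower bound suggested above, not a tuned value.

```shell
# Profile one FASTQ file with a local-alignment Bowtie2 preset and a
# minimum alignment length.
# Assumptions: profile_local, the default output directory and the default
# preset are hypothetical choices for illustration only.
profile_local() {
    fastq=$1
    outdir=${2:-../metaphlan3}
    preset=${3:-very-sensitive-local}      # or sensitive-local
    sample=$(basename "$fastq" .fastq)
    metaphlan "$fastq" \
        --input_type fastq \
        --nproc 20 \
        --unknown_estimation \
        --index latest \
        --add_viruses \
        --bt2_ps "$preset" \
        --min_alignment_len 100 \
        > "$outdir/${sample}_profile.txt"
}
```

It can be used in the same loop as before: for i in *.fastq; do profile_local "$i"; done. If an earlier run left a .bowtie2out.txt file next to the input, MetaPhlAn may refuse to repeat the mapping until that file is removed or another location is given with --bowtie2out.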