Metaphlan & Strainphlan Output Expectation

Jeffrey_Chiu · May 28, 2021, 6:25pm

Hi Biobakery team,

The species most commonly identified by MetaPhlAn in my dataset is overwhelmingly Pseudomonas yamanorum. So I found it quite odd that I wasn’t able to produce a StrainPhlAn RAxML tree for Pseudomonas yamanorum because “too many markers were discarded”. This is the error when I try to visualize the strain relationship of P. yamanorum:

Could anyone explain why this would occur? Quite confused. Thank you!

aitor.blancomiguez · May 28, 2021, 7:44pm

Hi @Jeffrey_Chiu
Thanks for getting in touch. Given the information provided, it is a bit difficult to assess the problem. Could you provide some additional information:

The average number of reads per sample and the relative abundance of P. yamanorum that MetaPhlAn reported. While MetaPhlAn profiling is very sensitive even for low-abundance species, StrainPhlAn can struggle to reconstruct enough marker genes for low-abundance taxa or shallowly sequenced samples. In some cases, adjusting specific parameters (such as --sample_with_n_markers) could help.
The number of samples you are using. The more samples you include in your analysis, the more noise can be introduced and the harder it becomes to find a good number of shared marker sequences. For this reason, adjusting some parameters (such as --marker_in_n_samples) could help.
The current set of parameters you used. This is related to the previous two points. By default, StrainPhlAn will only use samples with > 20 markers and markers present in more than 80% of the samples.
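
To make the two default thresholds concrete, here is a toy Python sketch of that kind of filtering. It is only an illustration under assumptions, not StrainPhlAn's actual code: the function name `filter_presence`, the order of the two steps, and the strict `>` comparisons are all my own choices, and the sample and marker names are made up.

```python
def filter_presence(presence, min_markers=20, min_sample_pct=80.0):
    """Toy filter. `presence` maps sample name -> set of reconstructed marker names.

    Assumption: samples are filtered first, then markers are checked
    against the remaining samples. The real tool may differ.
    """
    # Step 1: keep only samples with more than `min_markers` markers.
    kept = {s: m for s, m in presence.items() if len(m) > min_markers}
    if not kept:
        return [], []

    # Step 2: keep markers present in more than `min_sample_pct` percent
    # of the samples that survived step 1.
    all_markers = set().union(*kept.values())
    shared = []
    for marker in sorted(all_markers):
        n_with_marker = sum(marker in m for m in kept.values())
        if 100.0 * n_with_marker / len(kept) > min_sample_pct:
            shared.append(marker)
    return sorted(kept), shared


# Hypothetical data: 30 markers, four samples with different coverage.
markers = [f"m{i}" for i in range(30)]
presence = {
    "A": set(markers),       # 30 markers
    "B": set(markers[:25]),  # 25 markers
    "C": set(markers[5:]),   # 25 markers
    "D": set(markers[:10]),  # 10 markers, dropped in step 1
}
samples, shared = filter_presence(presence)
print(len(samples), "samples kept;", len(shared), "markers kept")
```

In this toy case every kept sample has at least 25 markers, yet only 20 markers pass the second step, because the missing markers differ from sample to sample. With many samples, this is how the set of shared markers can shrink quickly.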

Best,
Aitor

Jeffrey_Chiu · May 29, 2021, 1:15am

Hi @aitor.blancomiguez ,

Thank you so much for getting back to me.

Overall, my dataset has an average of 23772927 reads per sample. The relative abundance of P. yamanorum differs from sample to sample, but with the merged_abundance_table_species_only file (the one that can be used to generate the heatmap) I can calculate the average relative abundance across my samples as well as how many samples it was detected in:
19.45383313% abundance per sample, detected in 94 out of 150 samples.
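
For reference, this is roughly how such numbers can be computed. It is a minimal sketch with assumptions: I assume the table is tab-separated with one row per species and one column per sample, the function name `summarize_species` is hypothetical, and the toy values below are made up rather than taken from my data. It reports the mean both over all samples and over only the samples where the species was detected, since the two can differ a lot.

```python
import csv
import io


def summarize_species(tsv_text, species):
    """Summarize one species row of a species-by-sample abundance table."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    n_samples = len(header) - 1  # first column holds the species name
    for row in reader:
        if row[0] == species:
            values = [float(v) for v in row[1:]]
            detected = [v for v in values if v > 0]
            return {
                "n_samples": n_samples,
                "n_detected": len(detected),
                "prevalence_pct": 100.0 * len(detected) / n_samples,
                "mean_all": sum(values) / n_samples,
                "mean_detected": sum(detected) / len(detected) if detected else 0.0,
            }
    raise KeyError(species)


# Made-up table with four samples (assumed layout, not real data).
toy = (
    "clade_name\tS1\tS2\tS3\tS4\n"
    "Pseudomonas_yamanorum\t40.0\t0.0\t20.0\t0.0\n"
    "Other_species\t60.0\t100.0\t80.0\t100.0\n"
)
stats = summarize_species(toy, "Pseudomonas_yamanorum")
print(stats)
```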

I will give this a try and get back to you. Thank you.

This is the command I use:
strainphlan -s consensus_markers/*.pkl -m db_markers/s__Pseudomonas_yamanorum.fna -r reference_genomes/pseudomonas_yamanorum.fasta -o output_p_yamanorum/ -n 8 -c s__pseudomonas_yamanorum --mutation_rates
