MetaPhlAn & StrainPhlAn Output Expectation

Hi Biobakery team,

The bacterial species that MetaPhlAn identifies most often in my dataset is overwhelmingly Pseudomonas yamanorum. So I found it quite odd that I wasn’t able to produce a StrainPhlAn RAxML tree for Pseudomonas yamanorum because “too many markers were discarded”. This is the error when I try to visualize the strain relationship of P. yamanorum:

Could anyone explain why this would occur? Quite confused. Thank you!

Hi @Jeffrey_Chiu
Thanks for getting in touch. Given the information provided, it is a bit difficult to assess the problem. Could you provide some additional information:

  • The average number of reads per sample and the relative abundance of P. yamanorum that MetaPhlAn reported. While MetaPhlAn profiling is very sensitive, even for low-abundance species, StrainPhlAn can struggle to reconstruct enough marker genes for low-abundance taxa or shallowly sequenced samples. In some cases, specific parameters (such as --sample_with_n_markers) could help.
  • The number of samples you are using. The more samples you include in the analysis, the more noise can be introduced and the harder it becomes to find enough shared marker sequences. For this reason, parameters such as --marker_in_n_samples could help.
  • The current set of parameters you used. This is related to the previous two points. By default, StrainPhlAn only uses samples with > 20 markers and markers present in more than 80% of the samples.
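
As a rough illustration of the two parameters above, here is a minimal Python sketch that only assembles a StrainPhlAn command line with relaxed thresholds. It does not run anything. The helper name build_strainphlan_cmd, the sample file names, and the threshold values 10 and 50 are assumptions chosen for illustration, not recommended settings. Defaults and units can differ between versions, so check strainphlan --help for your installation.

```python
import shlex


def build_strainphlan_cmd(samples, markers, reference, out_dir, clade,
                          sample_with_n_markers=10, marker_in_n_samples=50,
                          threads=8):
    """Assemble a StrainPhlAn argument list (hypothetical helper; nothing is executed)."""
    return [
        "strainphlan",
        "-s", *samples,              # consensus marker files, one per sample
        "-m", markers,               # clade marker FASTA
        "-r", reference,             # reference genome
        "-o", out_dir,
        "-n", str(threads),
        "-c", clade,
        "--mutation_rates",
        # Relaxed thresholds: illustrative values only, tune for your data.
        "--sample_with_n_markers", str(sample_with_n_markers),
        "--marker_in_n_samples", str(marker_in_n_samples),
    ]


cmd = build_strainphlan_cmd(
    # Hypothetical sample files; in practice, pass every .pkl in consensus_markers/.
    samples=["consensus_markers/sampleA.pkl", "consensus_markers/sampleB.pkl"],
    markers="db_markers/s__Pseudomonas_yamanorum.fna",
    reference="reference_genomes/pseudomonas_yamanorum.fasta",
    out_dir="output_p_yamanorum/",
    clade="s__pseudomonas_yamanorum",  # copied as written in this thread
)

# Print a shell-quoted command line that can be reviewed before running it.
print(shlex.join(cmd))
```

Lowering either threshold keeps more samples or markers in the alignment at the cost of a noisier tree, so it is worth changing one value at a time and comparing the results.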

Best,
Aitor

Hi @aitor.blancomiguez,

Thank you so much for getting back to me.

Overall, my dataset has an average of 23,772,927 reads per sample. The relative abundance of P. yamanorum differs from sample to sample, but from the merged_abundance_table_species_only file (the one that can be used to generate the heatmap) I can calculate its average relative abundance across my samples, as well as how many samples it was detected in:
19.45383313% average relative abundance per sample, detected in 94 out of 150 samples.
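
For reference, a summary like this can be computed along the following lines. This is a minimal sketch, assuming the merged species table is tab-separated, has one header row after any '#' comment lines, and lists the taxon name in the first column. The function name summarize_species and the toy table are made up for illustration, and the mean here is taken over all samples, including those where the species was not detected.

```python
import csv


def summarize_species(table_text, species):
    """Return (mean_abundance, n_detected, n_samples) for one species row."""
    # Drop empty lines and '#' comment lines (e.g. a database-version line).
    lines = [ln for ln in table_text.splitlines()
             if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)            # first column is the taxon name
    n_samples = len(header) - 1
    for row in reader:
        # endswith() matches both bare names and full lineage strings.
        if row[0].endswith(species):
            values = [float(v) for v in row[1:]]
            n_detected = sum(v > 0 for v in values)
            return sum(values) / n_samples, n_detected, n_samples
    raise KeyError(species)


# Toy table with made-up values, only to show the expected layout.
TOY_TABLE = (
    "#toy_database_version\n"
    "clade_name\tS1\tS2\tS3\n"
    "Pseudomonas_yamanorum\t10.0\t0.0\t20.0\n"
    "Escherichia_coli\t1.0\t2.0\t3.0\n"
)

mean_abundance, n_detected, n_samples = summarize_species(
    TOY_TABLE, "Pseudomonas_yamanorum")
print(mean_abundance, n_detected, n_samples)
```

Looking at the per-sample values rather than only the mean is useful here, because StrainPhlAn works per sample and a high average can hide many samples where the species is rare or absent.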

I will give this a try and get back to you. Thank you.

This is the command I use:
strainphlan -s consensus_markers/*.pkl -m db_markers/s__Pseudomonas_yamanorum.fna -r reference_genomes/pseudomonas_yamanorum.fasta -o output_p_yamanorum/ -n 8 -c s__pseudomonas_yamanorum --mutation_rates