I am using StrainPhlAn in MetaPhlAn 4.0.6. I have been trying to generate strain phylogenies for taxa that appear in >50% of my samples according to my MetaPhlAn outputs. Even with this prevalence threshold, many of my StrainPhlAn runs fail with a “too many samples discarded” message.
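For context, the >50% prevalence filter amounts to something like the sketch below. It is only an illustration, not the exact script used: it assumes a merged, tab-separated MetaPhlAn abundance table with a `clade_name` header row and one column per sample, and the function name `prevalent_taxa`, the sample names, and the `t__SGB_*` labels are all hypothetical toy values.

```python
import csv
import io


def prevalent_taxa(table_text, min_fraction=0.5):
    """Return clades with non-zero abundance in more than min_fraction of samples.

    Assumes a tab-separated table: optional '#' comment lines, then a header
    row (clade name + one column per sample), then one row per clade.
    """
    rows = [
        row
        for row in csv.reader(io.StringIO(table_text), delimiter="\t")
        if row and not row[0].startswith("#")
    ]
    header, data = rows[0], rows[1:]
    n_samples = len(header) - 1  # first column holds the clade name

    keep = []
    for row in data:
        # Count samples in which this clade was detected at all.
        n_present = sum(1 for value in row[1:] if float(value) > 0)
        if n_present / n_samples > min_fraction:
            keep.append(row[0])
    return keep


# Toy table (hypothetical values): 3 clades across 4 samples.
TOY = (
    "clade_name\tS1\tS2\tS3\tS4\n"
    "t__SGB_A\t1.2\t0\t3.4\t0.5\n"
    "t__SGB_B\t0\t0\t0.1\t0\n"
    "t__SGB_C\t2.0\t1.0\t0\t0\n"
)

print(prevalent_taxa(TOY))
```

Note that this filter only checks whether a clade is detected in the profile; it says nothing about how many of that clade's markers are covered in each sample, which is what the StrainPhlAn filters act on.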
I noticed that there are also the following options:
[--marker_in_n_samples MARKER_IN_N_SAMPLES] [--sample_with_n_markers SAMPLE_WITH_N_MARKERS]
Both of these thresholds appear to default to 80%. Are there recommended minimum values, or would setting both to 1% be appropriate? Alternatively, should I set --marker_in_n_samples to 1% and --sample_with_n_markers to 50% to limit low-quality (uninformative) alignments?