StrainPhlAn - Secondary references and samples

I’m familiarizing myself with StrainPhlAn. I have run all the previous steps described in the documentation tutorial, and so far so good. I’m now going through the different parameters that StrainPhlAn provides. Among them are those that refer to secondary features (--secondary_samples, --secondary_references, etc.). I have tried to find information on what these parameters are for, but beyond the --help text the tool displays I haven’t managed to understand them. Could you point me to where I can find more information on this, or explain the purpose of these parameters?

Hi @david-castillo
While the primary samples / references are used to filter out markers that are not present in enough samples (controlled by --marker_in_n_samples), the secondary samples / references are only included if they have enough of the markers that pass this filter (--secondary_sample_with_n_markers). This is particularly useful when you are profiling thousands of samples and want, for example, to define a robust set of markers for the phylogeny using as primary samples those with a minimum depth of coverage for your species of interest, and then add as secondary samples those with lower coverage that can still fit that robust set of markers.
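
To make the two-stage logic concrete, here is a toy Python sketch. It is not StrainPhlAn's actual code: the function names, the sample and marker names, and the use of plain counts as thresholds are all assumptions made for illustration, so check --help for how your version interprets each threshold.

```python
# Toy illustration only; names and data are hypothetical, not StrainPhlAn internals.

def select_markers(primary, min_samples):
    """Keep markers found in at least `min_samples` primary samples
    (the role played by --marker_in_n_samples, assumed here to be a count)."""
    counts = {}
    for markers in primary.values():
        for marker in markers:
            counts[marker] = counts.get(marker, 0) + 1
    return {marker for marker, n in counts.items() if n >= min_samples}


def select_secondary(secondary, kept_markers, min_markers):
    """Keep secondary samples that carry at least `min_markers` of the
    already-filtered markers (the role of --secondary_sample_with_n_markers)."""
    return [
        name
        for name, markers in secondary.items()
        if len(markers & kept_markers) >= min_markers
    ]


# Primary samples: well-covered samples that define the marker set.
primary = {
    "P1": {"m1", "m2", "m3", "m5"},
    "P2": {"m1", "m2", "m4"},
    "P3": {"m1", "m3", "m4"},
}
# Secondary samples: lower coverage, never used to choose markers.
secondary = {
    "S1": {"m1", "m2"},
    "S2": {"m1", "m5"},
}

kept = select_markers(primary, min_samples=2)
included = select_secondary(secondary, kept, min_markers=2)

print(sorted(kept))  # markers that survive the primary filter
print(included)      # secondary samples added afterwards
```

In this toy data, m5 appears in only one primary sample, so it is dropped. S2 then shares just one surviving marker and is left out, while S1 is added. The point is that secondary samples never influence which markers are chosen.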


Hello @aitor.blancomiguez,
Thank you for the clarification, it makes total sense.
Best regards