Understanding strain_transmission.py output and "list index out of range" error

Thank you for making MetaPhlAn 4.1.1.

I have an FMT dataset with 79 longitudinal samples and 3 donors. When I run strain_transmission.py I encounter several issues:

  1. I found a bug where some of the output files were being written in the input folder rather than the specified output path. I was wondering if you found the same problem.

  2. For some of the SGB the “transmission_events.info” file was not computed and I got the following error:

Traceback (most recent call last):
  File "/shared/team/conda/mpa_411/bin/strain_transmission.py", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/shared/team/conda/mpa_411/lib/python3.12/site-packages/metaphlan/utils/strain_transmission.py", line 322, in main
    strain_transmission(args.tree, args.metadata, args.threshold, args.sgb_id,
  File "/shared/team/conda/mpa_411/lib/python3.12/site-packages/metaphlan/utils/strain_transmission.py", line 307, in strain_transmission
    threshold = get_threshold(training_distances, distr_threshold)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/shared/team/conda/mpa_411/lib/python3.12/site-packages/metaphlan/utils/strain_transmission.py", line 214, in get_threshold
    return distances[int(len(distances)*distr_threshold)]
           ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: list index out of range

Do you know what could be the issue? The trees have some extra samples (e.g. other donors that were not used in the end) so they are not included in the metadata file. Could this be a possible reason for this error? These samples would not have any common “relation_id” with the others, but they would still appear as transmission events for some SGB anyway, even if not specified in the metadata.

  1. For some of the SGB where transmission events were detected, it seems it is not reporting all the events below the threshold. For example, for this SGB14993:

Selected strain-transmission threshold: 0.059109782214254836
Number of transmission events: 12

47542_1_51 ↔ 47542_1_72
47542_1_72 ↔ 47542_1_81
47542_1_118 ↔ 47542_1_81
47542_1_118 ↔ 47542_1_51
47542_1_118 ↔ 47542_1_72
47542_1_118 ↔ 47542_1_68
47542_1_118 ↔ 47542_1_173
47542_1_68 ↔ 47542_1_72
47542_1_173 ↔ 47542_1_81
47542_1_173 ↔ 47542_1_51
47542_1_173 ↔ 47542_1_72
47542_1_173 ↔ 47542_1_68

However, if I look at the pairwise distance file tre.dist I see there are other pairs with a distance smaller than the threshold value 0.059109782214254836:

|47542_1_103|47542_1_38|0.08866441350825625|
|47542_1_103|47542_1_65|0.10706048407387056|
|47542_1_103|47542_1_93|0.10171325409904351|
|47542_1_103|47542_1_53|0.1321989072639618|
|47542_1_103|47542_1_61|0.13002588180336852|
|47542_1_103|47542_1_165|0.11233039054652781|
|47542_1_103|47542_1_131|0.12634654620236924|
|47542_1_103|47542_1_152|0.11775628763430487|
|47542_1_103|47542_1_120|0.10478067496486596|
|47542_1_103|47542_1_89|0.10967240354212773|
|47542_1_103|47542_1_147|0.15089484811037565|
|47542_1_103|47542_1_123|0.11065364475243172|
|47542_1_103|47542_1_160|0.11715830416646209|
|47542_1_103|47542_1_121|0.12767666237362849|
|47542_1_103|47542_1_135|0.1317097231054832|
|47542_1_103|47542_1_150|0.13152440943640867|
|47542_1_103|47542_1_81|0.1144435576139486|
|47542_1_103|47542_1_51|0.12167751703889663|
|47542_1_103|47542_1_72|0.14207036951688737|
|47542_1_103|47542_1_118|0.14668958850580296|
|47542_1_103|47542_1_68|0.14270003771905493|
|47542_1_103|47542_1_173|0.142753601292209|
|47542_1_103|47542_1_162|0.1093315315278234|
|47542_1_103|47542_1_30|0.18922066065884366|
|47542_1_103|47542_1_97|0.17334218666152834|
|47542_1_103|47542_1_49|0.17383563204435526|
|47542_1_103|47542_1_17|0.17390900671794543|
|47542_1_103|47542_1_34|0.1541674613380055|
|47542_1_103|47542_1_36|0.10922587879675272|
|47542_1_103|47542_1_19|0.1090532023026968|
> |47542_1_103|47542_1_13|0.0131048763147948|
|47542_1_38|47542_1_65|0.09434180764879481|
|47542_1_38|47542_1_93|0.08899457767396775|
|47542_1_38|47542_1_53|0.11948023083888602|
|47542_1_38|47542_1_61|0.11730720537829276|
|47542_1_38|47542_1_89|0.09695372711705195|
|47542_1_38|47542_1_81|0.10172488118887284|
|47542_1_38|47542_1_51|0.10895884061382087|
|47542_1_38|47542_1_72|0.1293516930918116|
|47542_1_38|47542_1_68|0.12998136129397916|
|47542_1_38|47542_1_97|0.16062351023645258|
|47542_1_38|47542_1_49|0.1611169556192795|
> |47542_1_65|47542_1_93|0.025995105542266476|
|47542_1_65|47542_1_89|0.09700482164019344|
|47542_1_65|47542_1_81|0.10177597571201431|
|47542_1_65|47542_1_72|0.12940278761495308|
|47542_1_65|47542_1_68|0.13003245581712067|
|47542_1_65|47542_1_97|0.16602110978937887|
|47542_1_93|47542_1_97|0.1606738798145518|
|47542_1_53|47542_1_65|0.11953132536202749|

Why are these pairs below the threshold not reported in the “transmission_events.info” output?

Any help with this would be really appreciated.