Strainphlan Error: No matching key/data pair found

Hi Everybody,

I hope you can help me. I am pretty new to metagenomic analysis and am currently trying to understand the bioBakery tools and, especially, how to apply them.

I am currently running into an error with StrainPhlAn that I cannot solve.

For a small set of 4 samples, I generated .pkl files using the following command:

sample2markers.py -i sams/*.sam.bz2 -o consensus_markers -n 8

I then tried to use strainphlan to generate a tree file and an alignment:

strainphlan -s consensus_markers/*.pkl -m db_markers/s__Faecalibacterium_prausnitzii.fna -r eference_genomes/G000273725.fna -o output -n 8 -c s__Faecalibacterium_prausnitzii --mutation_rates

I am then running into an error:
Mon Apr 19 07:10:52 2021: Start StrainPhlAn 3.0 execution
Mon Apr 19 07:10:52 2021: Creating temporary directory…
Mon Apr 19 07:10:53 2021: Done.
Mon Apr 19 07:10:53 2021: Getting markers from main sample files…
Mon Apr 19 07:10:59 2021: Done.
Mon Apr 19 07:10:59 2021: Getting markers from main reference files…Warning: [blastn] Examining 5 or more matches is recommended
Error: mdb_dbi_open: MDB_NOTFOUND: No matching key/data pair found

Mon Apr 19 07:11:50 2021: Done.
Mon Apr 19 07:11:50 2021: Removing bad markers / samples…
[e] Phylogeny can not be inferred. Too many samples were discarded
Mon Apr 19 07:11:50 2021: Stop StrainPhlAn 3.0 execution.

I hope you can help me with this issue.

Thank you very much for your help!!

Best
Sandra

Hi @sarei
Thanks for getting in touch.
From your output, it looks like StrainPhlAn is not able to reconstruct enough F. prausnitzii markers in your samples to infer the phylogeny. If StrainPhlAn cannot reconstruct enough clade-specific markers from a sample (20 markers by default), that sample is discarded (this behaviour can be changed with the --sample_with_n_markers parameter). Moreover, the minimum number of filtered samples needed to run the phylogeny is 4, so if even one of your 4 samples was discarded you will not be able to build the tree. If you use the option --print_clades_only, StrainPhlAn will tell you the clades it is able to reconstruct with the specified parameters, e.g.:
strainphlan -s consensus_markers/*.pkl -m db_markers/s__Faecalibacterium_prausnitzii.fna -r eference_genomes/G000273725.fna -o output -n 8 -c s__Faecalibacterium_prausnitzii --print_clades_only
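To make the two thresholds above more concrete, here is a small Python sketch of the filtering rule as I described it. It is only an illustration, not StrainPhlAn's actual code: the function names, the sample names and the marker counts are all made up, and the only values taken from this thread are the default of 20 markers per sample and the minimum of 4 samples.

```python
# Sketch of the sample-filtering rule described in this reply.
# NOT StrainPhlAn's real implementation; all names and counts are hypothetical.

MIN_MARKERS_PER_SAMPLE = 20  # default of --sample_with_n_markers, as stated above
MIN_SAMPLES_FOR_TREE = 4     # minimum number of retained samples, as stated above


def filter_samples(marker_counts, min_markers=MIN_MARKERS_PER_SAMPLE):
    """Return the sorted names of samples with at least `min_markers` markers."""
    return sorted(name for name, n in marker_counts.items() if n >= min_markers)


def can_build_tree(marker_counts, min_markers=MIN_MARKERS_PER_SAMPLE,
                   min_samples=MIN_SAMPLES_FOR_TREE):
    """True if enough samples survive the marker filter to infer a phylogeny."""
    return len(filter_samples(marker_counts, min_markers)) >= min_samples


# Made-up number of reconstructed clade markers for four samples.
example_counts = {"sample_A": 35, "sample_B": 28, "sample_C": 12, "sample_D": 41}

kept = filter_samples(example_counts)
print("kept samples:", kept)
print("tree possible with default threshold:", can_build_tree(example_counts))
print("tree possible with threshold 10:", can_build_tree(example_counts, min_markers=10))
```

In this made-up case a single sample below the default threshold is enough to drop the set under 4 samples, which is the situation the "Too many samples were discarded" message points to. Lowering the per-sample threshold (the equivalent of passing a smaller value to --sample_with_n_markers) keeps all four.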
I hope this helps