StrainPhlan4: No markers were found for the clade

JJGiacomini · November 5, 2022, 1:10am

I am running the following code in order to extract species specific markers for Strainphlan:

extract_markers.py -c s__Haemophilus_parainfluenzae -o $METAPHLAN_4_DIR/db_markers

Here is the output:

Fri Nov  4 18:28:13 2022: Start extract markers execution
Fri Nov  4 18:28:17 2022:       Extracting markers from the Bowtie2 database...
Fri Nov  4 18:47:08 2022:       Done.
Fri Nov  4 18:47:08 2022:       Loading MetaPhlAn mpa_vJan21_CHOCOPhlAnSGB_202103 database...
Fri Nov  4 18:48:00 2022:       Done.
[e] No markers were found for the clade "s__Haemophilus_parainfluenzae".
Traceback (most recent call last):
  File "/bioware/metaphlan-4.0.2/bin/extract_markers.py", line 33, in <module>
    sys.exit(load_entry_point('MetaPhlAn==4.0.2', 'console_scripts', 'extract_markers.py')())
  File "/bioware/metaphlan-4.0.2/lib/python3.10/site-packages/metaphlan/utils/extract_markers.py", line 138, in main
    extract_markers(args.database, args.clades, args.output_dir)
  File "/bioware/metaphlan-4.0.2/lib/python3.10/site-packages/metaphlan/utils/extract_markers.py", line 122, in extract_markers
    return output_file
UnboundLocalError: local variable 'output_file' referenced before assignment

As you can see, I am using the mpa_vJan21_CHOCOPhlAnSGB_202103 database. I checked the marker_info text file and there are indeed 200 marker genes for Haemophilus parainfluenzae.

cat mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt | grep "s__Haemophilus_parainfluenzae" | wc -l # 200

Any have any suggestions? I don’t know what the is meant by “UnboundLocalError: local variable ‘output_file’ referenced before assignment” in the traceback or why no markers were found for the clade “s__Haemophilus_parainfluenzae”.

aitor.blancomiguez · November 7, 2022, 12:57pm

Hi @JJGiacomini
With the adoption of the SGB system in the MetaPhlAn / StrainPhlAn version 4, now the clade selection should be done at the SGB level (t__SGB…). This is due mainly to the reason that some species (e.g. F. praus or P. copri) span multiple SGBs.

JJGiacomini · November 7, 2022, 4:41pm

Thanks @aitor.blancomiguez

Please note that specifying ‘t__SGB9712’ did not work, where as ‘t__SGB9712_group’ did work.

If we look for that clade in the marker info text file, we see that the clade is specified as ‘t__SGB9712_group’

cat mpa_vJan21_CHOCOPhlAnSGB_202103_marker_info.txt | grep "t__SGB9712" | awk '{print $1, $3}'

And if we look for that clade in the list of detected clades in the metagenomes found by running strainphlan with the --print_clades_only flag, we see that again the clade is specified as ‘t__SGB9712_group’.

cat $METAPHLAN_4_DIR/db_markers/print_clades_only.tsv | grep "t__SGB9712"

However, if we look for that clade in Table S1 from the MetaPhlAn 4 paper, it is only ever specified as ‘t__SGB9712’ and does not have the ‘__group’ suffix.

Is the ‘__group’ suffix always required, or is that only for certain clades?

Thank you!

aitor.blancomiguez · November 11, 2022, 8:58am

Hi @JJGiacomini
In table S1 of the preprint, we report the initial SGB clustering of all the ~700k M-HQ non-redundant genomes in the initial genomic database. However, for the MetaPhlAn profiling, we perform a further quality control based on the number of genomes assigned to uSGBs as well as the grouping of some of them due to their closeness and impossibility to find markers for them separately. For those latest cases, the _group specification is necessary, but not for all the SGBs. When using the print_clades_only option you can check which of the SGBs need the _group specification

Topic		Replies	Views
Problem extracting the species marker genes from metaphlan4 database StrainPhlAn	1	400	May 9, 2023
Having issues with Strainphlan StrainPhlAn	15	2818	November 27, 2020
Strainphlan Error: No matching key/data pair found StrainPhlAn	1	654	April 19, 2021
Error when running extract_markers.py StrainPhlAn	22	3716	April 22, 2021
Strainphlan question on get the strain info StrainPhlAn	3	1384	March 8, 2021

StrainPhlan4: No markers were found for the clade

Related topics