Extract_markers.py: input database path format

nick-youngblut · January 7, 2023, 5:44pm

From the extract_markers.py script docs, it is not clear how the --database path should be formatted. I originally tried using 4.0.3, which is the directory name holding all of the bowtie2 db files, but I received the error: Could not locate a Bowtie index corresponding to basename "4.0".

After looking at the generate_markers_fasta function, I see that one must supply the bowtie2 database basename + a file extension (e.g., mpa_vJan21_CHOCOPhlAnSGB_202103.md5). It would be helpful if the extract_markers.py script docs included that info.

Notably, for metaphlan, I could just use the directory containing the bowtie2 database (e.g., 4.0.3), so the UI seems to differ between metaphlan and extract_markers.py.

nick-youngblut · January 7, 2023, 6:12pm

Also, it appears that extract_markers.py assumes that the *.pkl database file is bzip2-compressed:

    def load_database(self, verbose=True):
        """Loads the MetaPhlAn PKL database"""
        if self.database_pkl is None:
            if verbose:
                info('Loading MetaPhlAn {} database...'.format(self.get_database_name()))
            self.database_pkl = pickle.load(bz2.BZ2File(self.database))
            if verbose:
                info('Done.')

Maybe a try/except would be helpful here, to allow an uncompressed pickle file as input? Does bzip2 compression really reduce the size of the pickle file?
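As a rough sketch of what I mean (not the actual MetaPhlAn code): instead of a try/except, the loader could peek at the first bytes of the file, since every bzip2 stream starts with the magic bytes BZh. The function name load_pickle_maybe_bz2 and the BZ2_MAGIC constant are hypothetical names made up for this example.

```python
import bz2
import pickle

# Every bzip2 stream begins with these three bytes.
BZ2_MAGIC = b"BZh"


def load_pickle_maybe_bz2(path):
    """Load a pickle file that may or may not be bzip2-compressed.

    Hypothetical helper for illustration; it is not part of MetaPhlAn.
    """
    # Peek at the header to decide how to open the file.
    with open(path, "rb") as handle:
        is_bz2 = handle.read(len(BZ2_MAGIC)) == BZ2_MAGIC

    # bz2.open and the builtin open both accept (path, "rb") here.
    opener = bz2.open if is_bz2 else open
    with opener(path, "rb") as handle:
        return pickle.load(handle)
```

Checking the header rather than the file extension would also cover the case where the file is compressed but not named accordingly.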

aitor.blancomiguez · January 17, 2023, 4:57pm

Hi @nick-youngblut
For all StrainPhlAn-related scripts, --database should point to the MetaPhlAn PKL database, which is always bz2-compressed when exported by us, even if the file extension does not show it. I will update the docs to make this clearer.

nick-youngblut · January 18, 2023, 3:46pm

Thank you for updating the docs. I’m surprised that bzip2 compression helps with pickled files, since they are already binary. Regardless, thank you for clarifying!
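For anyone curious, a quick way to check this on toy data: a pickle is binary but not compressed, and strings are stored verbatim, so repetitive content can still shrink. The sketch below uses a made-up dictionary (toy_db, with invented key and clade names) as a stand-in; it says nothing about the ratio for the real MetaPhlAn database.

```python
import bz2
import pickle

# Hypothetical stand-in for a marker database: many similar keys and values.
toy_db = {
    "marker_{:06d}".format(i): {
        "clade": "t__SGB{}".format(i % 50),
        "len": 1000 + i % 300,
    }
    for i in range(20000)
}

# Serialize without compression, then compress the resulting bytes.
raw = pickle.dumps(toy_db, protocol=pickle.HIGHEST_PROTOCOL)
packed = bz2.compress(raw)

print("raw pickle: {} bytes, bz2: {} bytes".format(len(raw), len(packed)))
```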
