Given the large amount of memory required to run metaphlan4 (from the docs: “minimum of 15GB or memory is needed”), it would be helpful to have a smaller database for testing purposes. This could also be useful for metaphlan4 CI testing, which seems to only comprise help-doc testing (MetaPhlAn/pythonpackage.yml at master · biobakery/MetaPhlAn · GitHub) – possibly due to the high memory that would be required for CI tests using the full metaphlan4 database.
We are currently working on building a toy database for our tutorials. I will ping you when it is available.
You can find the toy database here: Index of /biobakery4/metaphlan_databases/toy_databases
It contains only the SGBs that can be profiled in the tutorials available on GitHub:
metaphlan4 · biobakery/biobakery Wiki · GitHub
strainphlan4 · biobakery/biobakery Wiki · GitHub
StrainPhlAn 4 · biobakery/MetaPhlAn Wiki · GitHub
That would be great! Once the toy database is created, I could help update the GitHub action for CI testing, if you’d like the contribution.
The toy database will really help with cloud-based development (e.g., GitHub Codespaces) where the memory and disk space are rather limited.
Thanks a lot! I would also like to test the toy database. How do I tell metaphlan4 to use the toy database instead of the full one? Which arguments should I use? I have already downloaded the md5 and tar files from Index of /biobakery4/metaphlan_databases/toy_databases.
You should then uncompress the tar file and, when calling metaphlan, specify the db path and index name with the parameters --bowtie2db and --index, respectively. E.g.:
metaphlan … --bowtie2db /path/where/db/is/uncompressed --index mpa_vJan21_TOY_CHOCOPhlAnSGB_202103 …
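For anyone scripting this (e.g. in CI or a Codespace), here is a minimal sketch of the "check the md5, then uncompress" step. It is not from the MetaPhlAn docs: `unpack_toy_db` is a hypothetical helper name, the file names `toy_db.tar` and `toy_db.md5` in the comments are placeholders for whatever you downloaded, and it assumes that the .md5 file starts with the hex digest and that GNU `md5sum` is available (on macOS you would need to swap in `md5 -r`).

```shell
# Hypothetical helper: verify a downloaded database tarball against its
# md5 file, then unpack it into the directory later passed to --bowtie2db.
# Usage: unpack_toy_db <tar_file> <md5_file> <db_dir>
unpack_toy_db() {
    local tar_file="$1" md5_file="$2" db_dir="$3"
    local expected actual

    # Assumption: the first whitespace-separated field of the md5 file
    # is the expected hex digest.
    expected="$(awk '{print $1; exit}' "$md5_file")"
    actual="$(md5sum "$tar_file" | awk '{print $1}')"

    if [ "$expected" != "$actual" ]; then
        echo "checksum mismatch for $tar_file" >&2
        return 1
    fi

    # Only unpack once the checksum has matched.
    mkdir -p "$db_dir" || return 1
    tar -xf "$tar_file" -C "$db_dir"
}

# Example (placeholder file names, not the real ones on the server):
#   unpack_toy_db toy_db.tar toy_db.md5 /path/where/db/is/uncompressed
#   metaphlan … --bowtie2db /path/where/db/is/uncompressed --index mpa_vJan21_TOY_CHOCOPhlAnSGB_202103 …
```

The function stops before unpacking if the checksum does not match, so a truncated download should not leave a half-populated database directory behind.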