Having issues with Strainphlan

mradz19 · February 19, 2020, 12:18am

I downloaded metaphlan2 using conda and I am attempting to use strainphlan to profile my samples at the strain level.

I have been following the instructions on this page:
https://bitbucket.org/biobakery/biobakery/wiki/strainphlan

I successfully completed the sample2markers.py step, but I cannot run step 3: Identify clades detected in the samples and build reference databases. The command I am using is:

strainphlan.py --ifn_samples p100_bowtie2_aligned.markers --output_dir markers/ --print_clades_only > clades.txt

Which results in the following error:

Traceback (most recent call last):
  File "/mnt/nfs/home/30041036/.conda/envs/metaphlan2/bin/strainphlan.py", line 1585, in <module>
    strainphlan()
  File "/mnt/nfs/home/30041036/.conda/envs/metaphlan2/bin/strainphlan.py", line 1581, in strainphlan
    strainer(args)
  File "/mnt/nfs/home/30041036/.conda/envs/metaphlan2/bin/strainphlan.py", line 1365, in strainer
    db = pickle.load(bz2.BZ2File(args['mpa_pkl']))
  File "/mnt/nfs/home/30041036/.conda/envs/metaphlan2/lib/python3.7/bz2.py", line 92, in __init__
    self._fp = _builtin_open(filename, mode)
IsADirectoryError: [Errno 21] Is a directory: '/mnt/nfs/home/30041036/.conda/envs/metaphlan2/bin/metaphlan_databases'

How can I fix this? I can’t find anyone else reporting a similar issue. Also, step 4 uses --ifn_markers s__Eubacterium_siraeum.markers.fasta in the command. How do I generate this FASTA file for the species I am interested in (e.g. Staphylococcus aureus)?

aitor.blancomiguez · February 19, 2020, 8:48am

Hi mradz19,
Could you tell me the version of the MetaPhlAn2 database you used for the profiling?

For the question about step 4, you should use the extract_markers.py script.
You can take a look at this example (step 4): https://bitbucket.org/biobakery/metaphlan2/src/default/README.md#markdown-header-usage
For this script, remember to specify the correct MetaPhlAn2 database version.

Best,
Aitor

mradz19 · February 19, 2020, 10:01pm

Hi Aitor,

This is the version:

MetaPhlAn version 2.96.1 (02 Feb 2020)

aitor.blancomiguez · February 20, 2020, 9:58am

Hi mradz19,
Try adding the parameter --index v296_CHOCOPhlAn_201901 to the strainphlan execution, like this:
strainphlan.py --ifn_samples p100_bowtie2_aligned.markers --output_dir markers/ --print_clades_only --index v296_CHOCOPhlAn_201901 > clades.txt

Best,
Aitor

mradz19 · February 21, 2020, 2:35am

Hi Aitor,

I tried adding that parameter and got the same error message.

aitor.blancomiguez · February 21, 2020, 10:18am

Hi Michael,
Could you then check the contents of this folder: /mnt/nfs/home/30041036/.conda/envs/metaphlan2/bin/metaphlan_databases

mradz19 · February 23, 2020, 11:16pm

Hi Aitor,

These are the contents of that folder:

mpa_latest
mpa_v295_CHOCOPhlAn_201901.1.bt2
mpa_v295_CHOCOPhlAn_201901.2.bt2
mpa_v295_CHOCOPhlAn_201901.3.bt2
mpa_v295_CHOCOPhlAn_201901.4.bt2
mpa_v295_CHOCOPhlAn_201901.fna.bz2
mpa_v295_CHOCOPhlAn_201901.md
mpa_v295_CHOCOPhlAn_201901.pkl
mpa_v295_CHOCOPhlAn_201901.rev.1.bt2
mpa_v295_CHOCOPhlAn_201901.rev.2.bt2
mpa_v295_CHOCOPhlAn_201901.tar

I changed the --index option to v295_CHOCOPhlAn_201901 and the command ran; however, the output .txt file is empty.

This is the log of the run:

strainphlan.py --ifn_samples p100_bowtie2_aligned.markers --output_dir markers/ --print_clades_only --index v295_CHOCOPhlAn_201901 > clades2.txt
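For anyone hitting the same IsADirectoryError: in this thread the fix was to pass an --index value that matches the files actually present in the metaphlan_databases folder. A minimal sketch for listing the candidate values, assuming (from the listing above) that each database ships a file named mpa_<index>.pkl and that strainphlan.py takes the index without the mpa_ prefix. The function name available_indexes is hypothetical and not part of MetaPhlAn.

```python
import tempfile
from pathlib import Path


def available_indexes(db_dir):
    """Return the index names implied by the mpa_*.pkl files in db_dir.

    Assumption from this thread: mpa_v295_CHOCOPhlAn_201901.pkl corresponds
    to `--index v295_CHOCOPhlAn_201901` for strainphlan.py.
    """
    names = []
    for pkl in sorted(Path(db_dir).glob("mpa_*.pkl")):
        # Path.stem drops the ".pkl" suffix; then remove the "mpa_" prefix.
        names.append(pkl.stem[len("mpa_"):])
    return names


if __name__ == "__main__":
    # Demo on a throwaway folder that mimics part of the listing above.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("mpa_latest",
                     "mpa_v295_CHOCOPhlAn_201901.pkl",
                     "mpa_v295_CHOCOPhlAn_201901.1.bt2"):
            (Path(tmp) / name).touch()
        print(available_indexes(tmp))
```

On a real installation you would pass the metaphlan_databases path from the traceback instead of the temporary folder.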

CK_zhu · March 14, 2020, 4:00am

Hello
I have the same problem: there is no output when I run the last step, strainphlan.py.

#MetaPhlAn version 2.96.1 (02 Feb 2020)

# run metaphlan2 on the demo sample input files
>metaphlan2.py $INPUT_FOLDER/13530241_SF05.fasta.gz $OUTPUT_FOLDER/13530241_SF05_profile.txt --bowtie2out $OUTPUT_FOLDER/13530241_SF05_bowtie2.txt --samout $OUTPUT_FOLDER/13530241_SF05.sam.bz2 --input_type multifasta --index mpa_v296_CHOCOPhlAn_201901 --nproc $THREADS

Elapsed time to run MetaPhlAn2: 61.940003871917725 s


# run sample to markers on all of the samples
>sample2markers.py --ifn_samples $OUTPUT_FOLDER/13530241_SF05.sam.bz2 --input_type sam --output_dir $OUTPUT_FOLDER --nprocs $THREADS

/software/StrainPhlAn2/biobakery-biobakery-414eab928577/demos/biobakery_demos/data/strainphlan/output/13530241_SF05.sam.bz2 | samtools view -bS - | samtools sort - -o /software/StrainPhlAn2/biobakery-biobakery-414eab928577/demos/biobakery_demos/data/strainphlan/output/13530241_SF05.sam.bz2.bam.sorted | samtools mpileup -u - | bcftools view -c -g -p 1.1 - | fix_AF1.py --input_file - | vcfutils.pl vcf2fq



# run metaphlan2 strainer on all samples (add the flag to reduce the default as these are subsampled)
strainphlan.py --index v296_CHOCOPhlAn_201901 --ifn_samples $OUTPUT_FOLDER/*.markers --ifn_markers $INPUT_FOLDER/s__Eubacterium_siraeum.markers.fasta --ifn_ref_genomes $INPUT_FOLDER/GCF_000154325.fna.bz2 --output_dir $OUTPUT_FOLDER --nprocs_main $THREADS --clades s__Eubacterium_siraeum --marker_in_clade 0.2 --keep_alignment_files


2020-03-14 11:40:15,504 | INFO | __main__ | strainer | 1364 | Load mpa_pkl
2020-03-14 11:40:25,664 | INFO | __main__ | strainer | 1380 | Get clades from db
2020-03-14 11:40:28,017 | INFO | __main__ | strainer | 1444 | Add reference genomes
2020-03-14 11:40:28,032 | DEBUG | __main__ | add_ref_genomes | 617 | add 1 reference genomes
...
2020-03-14 11:40:28,495 | DEBUG | __main__ | filter_sequence | 475 | sample GCF_000154325, number of markers after N_in_marker: 150
sample GCF_000154325, number of markers after marker_strip_length: 150
2020-03-14 11:40:28,495 | DEBUG | __main__ | strainer | 1503 | remove samples with percentage of markers less than marker_in_clade
2020-03-14 11:40:28,495 | DEBUG | __main__ | build_tree | 850 | skip clade s__Eubacterium_siraeum because number of present samples is 1
2020-03-14 11:40:28,495 | INFO | __main__ | strainer | 1550 | Finished!

aitor.blancomiguez · March 16, 2020, 8:42am

Hi Michael,
When StrainPhlAn is not able to return any clade, it is usually due to one of two main reasons:

1. The database version you used to create the SAM file is different from the version you used to run StrainPhlAn. You can check this by looking at the first line of the abundance report file generated together with the SAM file.
2. The sample2markers script was not able to reconstruct enough markers for your sample.
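The database version check can be scripted. This is only a sketch: it assumes the first line of the profile looks like #mpa_v296_CHOCOPhlAn_201901 (the exact header format is an assumption, it is not shown in this thread), and the helper names are hypothetical.

```python
import os
import tempfile


def strip_mpa_prefix(name):
    """Drop a leading 'mpa_' so both index spellings compare equal."""
    return name[len("mpa_"):] if name.startswith("mpa_") else name


def profile_db_version(profile_path):
    """Return the first line of the profile without leading '#' characters."""
    with open(profile_path) as handle:
        return handle.readline().strip().lstrip("#")


def same_database(profile_path, index):
    """True if the profile header and the --index value name the same database."""
    header = strip_mpa_prefix(profile_db_version(profile_path))
    return header == strip_mpa_prefix(index)


if __name__ == "__main__":
    # Hypothetical profile; the header line is assumed, not taken from the thread.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample_profile.txt")
        with open(path, "w") as handle:
            handle.write("#mpa_v296_CHOCOPhlAn_201901\n")
        print(same_database(path, "v295_CHOCOPhlAn_201901"))
```

If same_database returns False for the index you pass to strainphlan.py, the first reason above is the likely culprit.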

If you could share your markers file, I could take a deeper look at the problem.

Best,
Aitor

aitor.blancomiguez · March 16, 2020, 8:45am

Hi CK_zhu,
As you can see in the lines returned by StrainPhlAn:

2020-03-14 11:40:28,495 | DEBUG | __main__ | build_tree | 850 | skip clade s__Eubacterium_siraeum because number of present samples is 1

StrainPhlAn only detected the clade in one of your files, so the strain-level analysis is impossible to run.
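If the log is long, the skipped clades can be pulled out with a small script. A rough sketch, assuming the message wording is exactly as in the log line quoted above; skipped_clades is a hypothetical helper, not part of StrainPhlAn.

```python
import re

# Pattern copied from the wording of the log line quoted in this thread.
SKIP_RE = re.compile(
    r"skip clade (\S+) because number of present samples is (\d+)"
)


def skipped_clades(log_text):
    """Map each skipped clade to the number of samples it was found in."""
    return {m.group(1): int(m.group(2)) for m in SKIP_RE.finditer(log_text)}


if __name__ == "__main__":
    line = ("2020-03-14 11:40:28,495 | DEBUG | __main__ | build_tree | 850 | "
            "skip clade s__Eubacterium_siraeum because number of present "
            "samples is 1")
    print(skipped_clades(line))
```

A count of 1 here usually means only the reference genome or a single sample carried enough markers for that clade.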

Best,
Aitor

lzh1982 · November 20, 2020, 2:18am

Hi Aitor,
Where can I download the STRAINPHLAN_DB_REFERENCE and STRAINPHLAN_DB_MARKERS databases directly? Thank you very much!

Best regards

Li Zhihua

aitor.blancomiguez · November 20, 2020, 10:31am

Hi @lzh1982
The StrainPhlAn marker database is the same as the MetaPhlAn marker database.
If you installed MetaPhlAn 3 via conda, StrainPhlAn and the marker database are also downloaded and installed; please check the tutorial for more info: https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-3.0#installation
If you have any issue with the conda installation, you can also download the database from the following links:

Best,
Aitor

lzh1982 · November 21, 2020, 3:13am

Hi Aitor,
Thank you very much for your explanation! I have installed biobakery_workflow through Docker. I need to know how to download the reference database, not only the marker database, and how to install STRAINPHLAN_DB_REFERENCE manually. Could you help me?

Best regards

Li Zhihua

fbeghini · November 23, 2020, 2:51pm

Hi,
You should set the environment variables STRAINPHLAN_DB_REFERENCE and STRAINPHLAN_DB_MARKERS inside your Docker instance using export.

STRAINPHLAN_DB_REFERENCE should point to the folder containing the reference genomes used when running StrainPhlAn, and STRAINPHLAN_DB_MARKERS should point to the folder containing the StrainPhlAn marker files.
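A quick sanity check for the two variables, as a sketch only. It uses the STRAINPHLAN_DB_MARKERS spelling from the sentence above and simply verifies that each variable is set and points to an existing folder; check_strainphlan_env is a hypothetical helper, not part of biobakery_workflows.

```python
import os

# Variable names as spelled in this post; adjust if your setup differs.
REQUIRED = ("STRAINPHLAN_DB_REFERENCE", "STRAINPHLAN_DB_MARKERS")


def check_strainphlan_env(env=None):
    """Return a list of problems found; an empty list means both look fine."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} does not point to a directory: {value}")
    return problems


if __name__ == "__main__":
    for problem in check_strainphlan_env():
        print(problem)
```

Run it inside the Docker instance after the export commands; no output means both folders were found.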

lzh1982 · November 24, 2020, 11:53pm

Dear Dr. Francesco Beghini,
Thank you very much for your explanation! I do not know where I can download the STRAINPHLAN_DB_REFERENCE database. Because I cannot install it directly using the command biobakery_workflows --install wmgx, I want to download the corresponding database and install it manually. Many thanks!

Best regards

Li Zhihua

fbeghini · November 27, 2020, 10:26am

You should retrieve the genomes of the species of interest from any genomic repository (e.g. RefSeq).
