Scripting with strainphlan.py

blair.merrick · August 28, 2023, 2:01pm

Hi,

I am trying to run a script with a loop to using a number of extracted markers (located in “db_markers”) over my sample set (.pkl files) in “consensus_markers” and output results into “output”.

For a particular SGB e.g. 9228, I am able to run the following with no issue:

strainphlan -s consensus_markers/*.pkl
-m db_markers/t__SGB9228.fna
-o output/SGB9228
-n 8
-c t__SGB9228
–mutation_rates

When I then try to run this in a script:

#!/bin/bash

Define paths

CONS_MARKERS_DIR=“consensus_markers”
DB_MARKERS_DIR=“db_markers”
OUTPUT_BASE=“output”

Loop through each .fna file in the db_markers folder

for REFERENCE_FILE in ${DB_MARKERS_DIR}/t__*.fna; do
# Extract the reference group name from the filename
REFERENCE_GROUP=“$(basename “${REFERENCE_FILE}” .fna)”

# Construct the output directory path
OUTPUT_DIR="${OUTPUT_BASE}/${REFERENCE_GROUP}" 

# Run the strainphlan command
strainphlan -s ${CONS_MARKERS_DIR}/*.pkl \
             -m "$REFERENCE_FILE" \
             -o "$OUTPUT_DIR" \
             -n 8 \
             -c "$REFERENCE_GROUP" \
             --mutation_rates

done

I get the following error…

Processing file: db_markers/t__SGB9228.fna
Reference group: db_markers/t__SGB9228
Output directory: output/db_markers/t__SGB9228
Running strainphlan command…
Mon Aug 28 14:56:53 2023: Start StrainPhlAn 4.0.6 execution
Mon Aug 28 14:56:53 2023: Creating temporary directory…
Mon Aug 28 14:56:53 2023: Done.
Mon Aug 28 14:56:53 2023: Filtering markers and samples…
Mon Aug 28 14:56:53 2023: Getting markers from main samples…
Mon Aug 28 14:56:55 2023: Done.
Mon Aug 28 14:56:55 2023: Getting markers from main references…
Mon Aug 28 14:56:55 2023: Done.
Mon Aug 28 14:56:55 2023: Removing bad markers / samples…
Mon Aug 28 14:56:55 2023: Done.
Mon Aug 28 14:56:55 2023: Getting markers from secondary samples and references…
Mon Aug 28 14:56:55 2023: Done.
Mon Aug 28 14:56:55 2023: Done.
Mon Aug 28 14:56:55 2023: Writing samples as markers’ FASTA files…
Mon Aug 28 14:56:55 2023: [Error] An error ocurred when creating the “[Errno 2] No such file or directory: ‘output/db_markers/t__SGB9228/tmpehrlwlp_/db_markers/t__SGB9228.StrainPhlAn4’” folder
Mon Aug 28 14:56:55 2023: Stop StrainPhlAn execution.
Done processing file: db_markers/t__SGB9228.fna

It looks like it is running into an issue with where the temporary files are being created, but despite (many) modifications to the script (all driven by AI) I can’t work out how to get around this, and think it may require a modification of the strainphlan.py script?

Already way beyond my level of expertise, and so would be grateful for any advice as to how to overcome this issue.

Thanks in advance.

blair.merrick · August 29, 2023, 2:01pm

for f in db_markers/t__*.fna; do
echo “Running StrainPhlAn on ${f}”
bn=$(basename ${f})
id=${bn%.fna}
mkdir -p “output/${id}”
strainphlan -s consensus_markers/**.pkl
-m “$f”
-o “output/${id}”
-n 8
-c “$id”
–mutation_rates
done

aitor.blancomiguez · September 1, 2023, 12:34pm

Hi @blair.merrick , you can try to run strainphlan without the -m parameter. If don’t specify, StrainPhlAn will automatically run extract_markers.py to get the clade markers for the specified -c parameter

blair.merrick · September 1, 2023, 2:15pm

Thanks Aitor.
Sorry only half of my follow-up message seems to have posted.
Managed to get it work, so all good.
Best wishes,
Blair

Topic		Replies	Views
Strainphlan error StrainPhlAn	4	299	October 6, 2023
Strainphlan: [e] Parallel execution fails: Ran out of input StrainPhlAn	1	115	March 14, 2024
Strainphlan execution failed due to too many discarded samples StrainPhlAn	8	757	July 5, 2022
Strainphlan_config & Not found output/mutation_rates.tsv StrainPhlAn	1	474	May 3, 2021
Processing samples... error during StrainPhlAn 4 run StrainPhlAn	1	516	April 20, 2023

Scripting with strainphlan.py

Define paths

Loop through each .fna file in the db_markers folder

Related topics