Hi,
I am trying to run a script with a loop to using a number of extracted markers (located in “db_markers”) over my sample set (.pkl files) in “consensus_markers” and output results into “output”.
For a particular SGB e.g. 9228, I am able to run the following with no issue:
strainphlan -s consensus_markers/*.pkl
-m db_markers/t__SGB9228.fna
-o output/SGB9228
-n 8
-c t__SGB9228
–mutation_rates
When I then try to run this in a script:
#!/bin/bash
Define paths
CONS_MARKERS_DIR=“consensus_markers”
DB_MARKERS_DIR=“db_markers”
OUTPUT_BASE=“output”
Loop through each .fna file in the db_markers folder
for REFERENCE_FILE in ${DB_MARKERS_DIR}/t__*.fna; do
# Extract the reference group name from the filename
REFERENCE_GROUP=“$(basename “${REFERENCE_FILE}” .fna)”
# Construct the output directory path
OUTPUT_DIR="${OUTPUT_BASE}/${REFERENCE_GROUP}"
# Run the strainphlan command
strainphlan -s ${CONS_MARKERS_DIR}/*.pkl \
-m "$REFERENCE_FILE" \
-o "$OUTPUT_DIR" \
-n 8 \
-c "$REFERENCE_GROUP" \
--mutation_rates
done
I get the following error…
Processing file: db_markers/t__SGB9228.fna
Reference group: db_markers/t__SGB9228
Output directory: output/db_markers/t__SGB9228
Running strainphlan command…
Mon Aug 28 14:56:53 2023: Start StrainPhlAn 4.0.6 execution
Mon Aug 28 14:56:53 2023: Creating temporary directory…
Mon Aug 28 14:56:53 2023: Done.
Mon Aug 28 14:56:53 2023: Filtering markers and samples…
Mon Aug 28 14:56:53 2023: Getting markers from main samples…
Mon Aug 28 14:56:55 2023: Done.
Mon Aug 28 14:56:55 2023: Getting markers from main references…
Mon Aug 28 14:56:55 2023: Done.
Mon Aug 28 14:56:55 2023: Removing bad markers / samples…
Mon Aug 28 14:56:55 2023: Done.
Mon Aug 28 14:56:55 2023: Getting markers from secondary samples and references…
Mon Aug 28 14:56:55 2023: Done.
Mon Aug 28 14:56:55 2023: Done.
Mon Aug 28 14:56:55 2023: Writing samples as markers’ FASTA files…
Mon Aug 28 14:56:55 2023: [Error] An error ocurred when creating the “[Errno 2] No such file or directory: ‘output/db_markers/t__SGB9228/tmpehrlwlp_/db_markers/t__SGB9228.StrainPhlAn4’” folder
Mon Aug 28 14:56:55 2023: Stop StrainPhlAn execution.
Done processing file: db_markers/t__SGB9228.fna
It looks like it is running into an issue with where the temporary files are being created, but despite (many) modifications to the script (all driven by AI) I can’t work out how to get around this, and think it may require a modification of the strainphlan.py script?
Already way beyond my level of expertise, and so would be grateful for any advice as to how to overcome this issue.
Thanks in advance.