HUMAnN3 analysis not proceeding with RNAseq data

Hello! HUMAnN3 has helped me a lot. Thank you.
My problem is with functional profiling of RNA-seq data. All of the data were sequenced on the same platforms (Illumina / BGI), yet for some samples the HUMAnN3 pipeline finishes quickly (10 threads, about 12 hours), while for others it produces no output even after several days.
The code I’m using is as follows.

adapter trimming, filtering

mkdir -p 01_trimmed
for r1 in *.R1.fq.gz
do
    f=${r1%.R1.fq.gz}    # sample prefix, e.g. "S1" from "S1.R1.fq.gz"
    fastp -i "${f}.R1.fq.gz" -I "${f}.R2.fq.gz" -o "01_trimmed/${f}.R1.fq.gz" -O "01_trimmed/${f}.R2.fq.gz" -w 10
done

merge

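mkdir -p 00_merged
for r1 in *.R1.fq.gz
do
    f=${r1%.R1.fq.gz}
    # HUMAnN does not use pairing information, so the trimmed mates are simply concatenated
    cat "01_trimmed/${f}.R1.fq.gz" "01_trimmed/${f}.R2.fq.gz" > "00_merged/${f}.fq.gz"
done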
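Concatenating two .fq.gz files with cat produces a multi-member gzip file, which gzip decompresses as one continuous stream, so the merge step above is valid as written. A minimal sanity check, assuming standard 4-line FASTQ records and the directory layout used above; the function name check_merge is hypothetical and not part of HUMAnN or fastp:

```shell
#!/usr/bin/env bash
# check_merge (hypothetical helper): confirm that a merged FASTQ contains
# exactly as many lines as its two trimmed mates combined.
check_merge() {
    local r1="$1" r2="$2" merged="$3"
    local n1 n2 nm
    # gzip -t checks the integrity of every gzip member in the merged file
    gzip -t "$merged" || return 1
    n1=$(gzip -dc "$r1" | wc -l)
    n2=$(gzip -dc "$r2" | wc -l)
    nm=$(gzip -dc "$merged" | wc -l)
    if (( n1 + n2 == nm )); then
        # 4 lines per FASTQ record
        echo "OK $merged: $(( nm / 4 )) reads"
    else
        echo "MISMATCH $merged: expected $(( n1 + n2 )) lines, found $(( nm ))" >&2
        return 1
    fi
}

# Example, inside the merge loop:
# check_merge 01_trimmed/${f}.R1.fq.gz 01_trimmed/${f}.R2.fq.gz 00_merged/${f}.fq.gz
```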

humann

for r1 in *.R1.fq.gz
do
    f=${r1%.R1.fq.gz}
    humann --input "00_merged/${f}.fq.gz" \
        --output 07_humann \
        --protein-database /HDD1/humann3/uniref90 \
        --threads 10 \
        --bypass-nucleotide-search
done
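To compare how long each sample takes (for example, 12 hours versus several days), one option is to wrap each HUMAnN call in a small timer that appends the wall-clock time to a log. This is a sketch only; run_timed and the log name humann_runtimes.tsv are hypothetical and not part of HUMAnN:

```shell
#!/usr/bin/env bash
# run_timed (hypothetical helper): run a command, then append
# "<label><TAB><elapsed seconds><TAB><exit status>" to a TSV log.
run_timed() {
    local label="$1" log="$2"
    shift 2
    local start end status
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    printf '%s\t%s\t%s\n' "$label" "$(( end - start ))" "$status" >> "$log"
    return "$status"
}

# Example, replacing the humann call inside the loop:
# run_timed "$f" humann_runtimes.tsv humann --input "00_merged/${f}.fq.gz" \
#     --output 07_humann --protein-database /HDD1/humann3/uniref90 \
#     --threads 10 --bypass-nucleotide-search
```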

[configuration settings]
humann_config --print
HUMAnN Configuration ( Section : Name = Value )
database_folders : nucleotide = /HDD3/SJ_Shotgun/Mac_tools/DB/chocophlan
database_folders : protein = /HDD1/humann3/uniref90
database_folders : utility_mapping = /home/hspark/anaconda3/lib/python3.9/site-packages/humann/data/misc
run_modes : resume = False
run_modes : verbose = False
run_modes : bypass_prescreen = False
run_modes : bypass_nucleotide_index = False
run_modes : bypass_nucleotide_search = False
run_modes : bypass_translated_search = False
run_modes : threads = 1
alignment_settings : evalue_threshold = 1.0
alignment_settings : prescreen_threshold = 0.01
alignment_settings : translated_subject_coverage_threshold = 50.0
alignment_settings : translated_query_coverage_threshold = 90.0
alignment_settings : nucleotide_subject_coverage_threshold = 50.0
alignment_settings : nucleotide_query_coverage_threshold = 90.0
output_format : output_max_decimals = 10
output_format : remove_stratified_output = False
output_format : remove_column_description_output = False

The same settings, system, and code work well for shotgun metagenomics data and for some of the RNA-seq data.
Should I just wait, or is something wrong? I’m in desperate need of help.

What sort of RNA-seq data are you analyzing? Is it microbial?

In general, the factor that determines how quickly HUMAnN runs is the fraction of your sequencing reads that could be quickly assigned to known species, as the fallback step (translated search) is much, much slower. Hence, when you have 10 samples of the same depth and one is taking longer to profile, it’s usually because it contains a higher proportion of unclassified DNA.
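The comparison above assumes samples of similar depth. A quick way to check that premise is to count reads per merged file, assuming standard 4-line FASTQ records; count_reads is a hypothetical helper. Note that similar depth says nothing about the microbial fraction itself, which is what drives the slow translated search:

```shell
#!/usr/bin/env bash
# count_reads (hypothetical helper): print "<file><TAB><read count>"
# for each gzipped FASTQ given as an argument (4 lines per record).
count_reads() {
    local fq
    for fq in "$@"; do
        printf '%s\t%d\n' "$fq" "$(( $(gzip -dc "$fq" | wc -l) / 4 ))"
    done
}

# Example: count_reads 00_merged/*.fq.gz
```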

@franzosa

Thank you for the reply.

The data originated from human tissue.

I think you are right: this data contains very little microbial DNA. When I let the process run, it moved on to the next file after about 3 days, roughly 6-fold longer than the HUMAnN runs on samples with more microbial DNA.

Thank you!