HUMAnN3 analysis not proceeding with RNAseq data

minchance · March 12, 2024, 7:40am

Hello, I’ve been getting a lot of help using HUMAnN3. Thank you.
The problem I’m facing is with functional profiling of RNAseq data. Even though all of the data were sequenced on the same platforms (Illumina / BGI), the humann3 pipeline finishes quickly for some samples (10 threads, 12 hours), but for others it produces no output even after several days.
The code I’m using is as follows.

adapter trimming, filtering

for f in $(ls -1 *.R1.fq.gz | sed 's/\.R1\.fq\.gz$//')
do
fastp -i ${f}.R1.fq.gz -I ${f}.R2.fq.gz -o 01_trimmed/${f}.R1.fq.gz -O 01_trimmed/${f}.R2.fq.gz -w 10
done

merge

for f in $(ls -1 *.R1.fq.gz | sed 's/\.R1\.fq\.gz$//')
do
cat 01_trimmed/${f}.R1.fq.gz 01_trimmed/${f}.R2.fq.gz > 00_merged/${f}.fq.gz
done

humann

for f in $(ls -1 *.R1.fq.gz | sed 's/\.R1\.fq\.gz$//')
do
humann --input 00_merged/${f}.fq.gz \
    --output 07_humann \
    --protein-database /HDD1/humann3/uniref90 \
    --threads 10 \
    --bypass-nucleotide-search
done

[configuration setting]
humann_config --print
HUMAnN Configuration ( Section : Name = Value )
database_folders : nucleotide = /HDD3/SJ_Shotgun/Mac_tools/DB/chocophlan
database_folders : protein = /HDD1/humann3/uniref90
database_folders : utility_mapping = /home/hspark/anaconda3/lib/python3.9/site-packages/humann/data/misc
run_modes : resume = False
run_modes : verbose = False
run_modes : bypass_prescreen = False
run_modes : bypass_nucleotide_index = False
run_modes : bypass_nucleotide_search = False
run_modes : bypass_translated_search = False
run_modes : threads = 1
alignment_settings : evalue_threshold = 1.0
alignment_settings : prescreen_threshold = 0.01
alignment_settings : translated_subject_coverage_threshold = 50.0
alignment_settings : translated_query_coverage_threshold = 90.0
alignment_settings : nucleotide_subject_coverage_threshold = 50.0
alignment_settings : nucleotide_query_coverage_threshold = 90.0
output_format : output_max_decimals = 10
output_format : remove_stratified_output = False
output_format : remove_column_description_output = False

The same settings, system, and code work well for shotgun metagenomics data and for some of the RNAseq data.
Should I keep waiting, or is something wrong? I’m in desperate need of help.

franzosa · April 11, 2024, 6:24pm

What sort of RNA-seq data are you analyzing? Is it microbial?

In general, the factor that determines how quickly HUMAnN runs is the fraction of your sequencing reads that could be quickly assigned to known species, as the fallback step (translated search) is much, much slower. Hence, when you have 10 samples of the same depth and one is taking longer to profile, it’s usually because it contains a higher proportion of unclassified DNA.
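
One way to see how much of each sample is falling through to the slow path is to pull the unaligned-read percentages out of the HUMAnN logs. The sketch below is only an illustration: the helper name `report_unaligned` is made up for this example, and it assumes that each log sits at `<output>/<sample>_humann_temp/<sample>.log` and contains lines of the form `Unaligned reads after translated alignment: 97.5 %`. Adjust the pattern if your logs look different.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the "Unaligned reads after ... alignment" lines
# from every HUMAnN log found under an output folder.
# Assumed layout: <outdir>/<sample>_humann_temp/<sample>.log
report_unaligned() {
    local outdir="$1" log sample
    for log in "$outdir"/*_humann_temp/*.log; do
        [ -e "$log" ] || continue            # no logs yet: skip the unexpanded glob
        sample=$(basename "$log" .log)
        # Log lines may start with a timestamp, so keep only the matched part
        # and prefix it with the sample name (tab-separated).
        grep -o 'Unaligned reads after [a-z]* alignment: [0-9.]* %' "$log" |
            awk -v s="$sample" '{ print s "\t" $0 }'
    done
}
```

Calling `report_unaligned 07_humann` after (or during) a run would print one line per sample and alignment stage. Samples with a very high unaligned percentage are the ones most likely to spend a long time in translated search.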

minchance · April 15, 2024, 3:35am

@franzosa

Thank you for the reply.

The data originated from human tissue.

I think you are right. This data has very little microbial DNA. When I left the process running, it moved on to the next file after about 3 days, which is about 6-fold longer than other humann runs on data with more microbial DNA.

Thank you!
