Humann3 does not recognize "relative abundance" in metaphlan profile file

Hi,

Thank you for developing these helpful tools! I have recently been working with humann3. I realised that the current version of humann3 does not correctly read the relative abundance results from a metaphlan3 output profile file when “estimated_number_of_reads_from_the_clade” is included in the profile. Instead, it takes the “coverage” column as “relative abundance” and uses it for the downstream analysis, so many species are ignored. I fixed this by reformatting the profile file to get rid of the “estimated_number_of_reads_from_the_clade” column. I thought it would be helpful to post this issue here. I hope this can be fixed in a future version.

Best wishes,
AuAs

Hello - Thank you for the detailed post. Would you (if you have not already) double check that you have the most recent version of HUMAnN v3 installed? If you do have the most recent version and you still see the error, would you provide a small example input file that replicates the error that you are seeing?

Thank you,
Lauren

Dear Lauren,

Sorry for the late response. I checked, and the version is humann v3.0.0.alpha.4.

I solved the problem by removing the “estimated_number_of_reads_from_the_clade” column from the metaphlan profile files. It seems that the software cannot recognise the “relative abundance” column when the “estimated_number_of_reads_from_the_clade” column is present.
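For anyone who wants to apply the same workaround, here is a minimal Python sketch of the column removal described above. It is not part of HUMAnN or MetaPhlAn. It assumes the profile is tab-separated and that the column names are listed on a comment line starting with "#clade_name"; the function name `drop_profile_column` is made up for illustration.

```python
from typing import Iterable, List, Optional

# Column name taken from the post above; everything else is illustrative.
EXTRA_COLUMN = "estimated_number_of_reads_from_the_clade"


def drop_profile_column(lines: Iterable[str], column: str = EXTRA_COLUMN) -> List[str]:
    """Return the profile lines with one named column removed.

    Assumption: column names sit on a tab-separated comment line that
    starts with "#clade_name". Other "#" lines are passed through as-is.
    """
    out: List[str] = []
    drop_index: Optional[int] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#clade_name"):
            names = line.split("\t")
            if column in names:
                # Remember the position so data rows can be trimmed too.
                drop_index = names.index(column)
                del names[drop_index]
            out.append("\t".join(names))
        elif line.startswith("#") or not line or drop_index is None:
            # Comment lines, blank lines, or nothing to remove: keep unchanged.
            out.append(line)
        else:
            fields = line.split("\t")
            if drop_index < len(fields):
                del fields[drop_index]
            out.append("\t".join(fields))
    return out
```

The returned lines can be joined with "\n" and written to a new file, which is then used as the profile input instead of the original. If the header line is not found, the input is returned unchanged, so it is worth checking the result before relying on it.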

Sorry, but I do not have a demo output at the moment. I think this problem can be reproduced by using a metaphlan profile file that includes “estimated_number_of_reads_from_the_clade”.

Best wishes,
Shen

Hi Shen, Thank you for the follow up! Can you let me know the command and demo input file you used to generate the metaphlan profile that you ran with? If so then I can recreate the demo file you have and try to reproduce the problem on our end.

Thank you!
Lauren

Hi Lauren,

It’s great to hear from you. I am running metaphlan with the command:

cat {0} {1} | metaphlan --unknown_estimation --nproc 8 --input_type fastq -t rel_ab_w_read_stats -s {2}/{3}.sam --bowtie2out {4}/{3}.bowtie2.out > {5}/{3}.profile.txt

It seems that the “rel_ab_w_read_stats” flag adds an additional column, which causes a problem in Humann when the profile file is used as input.
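To check whether an existing profile was produced in this mode before handing it to Humann, something like the following Python sketch could be used. It only inspects the header and is not an official tool. It assumes the column names are on a tab-separated line starting with "#clade_name"; the helper names `profile_columns` and `has_read_stats` are hypothetical.

```python
from typing import Iterable, List, Optional


def profile_columns(lines: Iterable[str]) -> Optional[List[str]]:
    """Return the column names from the "#clade_name" header line, if any.

    Assumption: the header is a tab-separated comment line that appears
    before the first data row.
    """
    for line in lines:
        if line.startswith("#clade_name"):
            return line.rstrip("\n").lstrip("#").split("\t")
        if not line.startswith("#"):
            # Reached the data rows without seeing a header line.
            break
    return None


def has_read_stats(lines: Iterable[str]) -> bool:
    """True if the header lists the estimated-reads column."""
    columns = profile_columns(lines)
    return columns is not None and "estimated_number_of_reads_from_the_clade" in columns
```

A profile for which `has_read_stats` returns True is one that would hit the problem described in this thread.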

Best wishes,
Shen

Hello,

I noticed the same problem with the current version of Humann (version 3.6).

I tried to run the demo example with the full uniref50 database, both with the default metaphlan option --metaphlan-options "-t rel_ab" and with --metaphlan-options "-t rel_ab_w_read_stats".
When running with the latter option, humann does not detect any species in the Metaphlan prescreen (No species were selected from the prescreen.).
See the logs below for a comparison.

Run with rel_ab_w_read_stats Metaphlan option:

(humann3.6_metaphlan4_py3.9) bernhard@macbook humann_tutorial % humann --input demo.fastq.gz --output demo_fastq_full_3 --search-mode uniref50 --metaphlan-options "-t rel_ab_w_read_stats --read_min_len 60 --stat_q 0.1 --bt2_ps very-sensitive-local" --threads 10 --verbose 
Creating output directory: /Users/bernhard/Documents/demo_fastq_full_3
Output files will be written to: /Users/bernhard/Documents/demo_fastq_full_3

Writing temp files to directory: /Users/bernhard/Documents/demo_fastq_full_3/demo_humann_temp

File ( /Users/bernhard/Documents/demo.fastq.gz ) is of format:  fastq.gz

Decompressing gzipped file ...


Running metaphlan ........


/Users/bernhard/miniconda3/envs/humann3.6_metaphlan4_py3.9/bin/metaphlan /Users/bernhard/Documents/demo_fastq_full_3/demo_humann_temp/tmplt6jx1j5/tmptcdubvvs -t rel_ab_w_read_stats --read_min_len 60 --stat_q 0.1 --bt2_ps very-sensitive-local -o /Users/bernhard/Documents/demo_fastq_full_3/demo_humann_temp/demo_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /Users/bernhard/Documents/demo_fastq_full_3/demo_humann_temp/demo_metaphlan_bowtie2.txt --nproc 10


TIMESTAMP: Completed prescreen : 257 seconds


Total species selected from prescreen: 0

Selected species explain 0.00% of predicted community composition



No species were selected from the prescreen.
Because of this the custom ChocoPhlAn database is empty.
This will result in zero species-specific gene families and pathways.



TIMESTAMP: Completed custom database creation : 0 seconds


Running diamond ........

Run with rel_ab Metaphlan option:

(humann3.6_metaphlan4_py3.9) bernhard@macbook humann_tutorial % humann --input demo.fastq.gz --output demo_fastq_full_4 --search-mode uniref50 --metaphlan-options "-t rel_ab --read_min_len 60 --stat_q 0.1 --bt2_ps very-sensitive-local" --threads 10 --verbose        
Creating output directory: /Users/bernhard/Documents/demo_fastq_full_4
Output files will be written to: /Users/bernhard/Documents/demo_fastq_full_4

Writing temp files to directory: /Users/bernhard/Documents/demo_fastq_full_4/demo_humann_temp

File ( /Users/bernhard/Documents/demo.fastq.gz ) is of format:  fastq.gz

Decompressing gzipped file ...


Running metaphlan ........


/Users/bernhard/miniconda3/envs/humann3.6_metaphlan4_py3.9/bin/metaphlan /Users/bernhard/Documents/demo_fastq_full_4/demo_humann_temp/tmp6_w81a1s/tmpt8ao3gn8 -t rel_ab --read_min_len 60 --stat_q 0.1 --bt2_ps very-sensitive-local -o /Users/bernhard/Documents/demo_fastq_full_4/demo_humann_temp/demo_metaphlan_bugs_list.tsv --input_type fastq --bowtie2out /Users/bernhard/Documents/demo_fastq_full_4/demo_humann_temp/demo_metaphlan_bowtie2.txt --nproc 10


TIMESTAMP: Completed prescreen : 262 seconds

Found t__SGB1815 : 50.66% of mapped reads ( s__Bacteroides_dorei,s__Phocaeicola_vulgatus,s__Bacteroides_vulgatus,s__Bacteroides_sp_9_1_42FAA,s__Bacteroides_sp_3_1_33FAA,s__Bacteroides_sp_NMBE5,s__Phocaeicola_dorei,s__Bacteroidaceae_bacterium,g__Phocaeicola.s__Phocaeicola_vulgatus,g__Bacteroides.s__Bacteroides_sp_9_1_42FAA,g__Bacteroides.s__Bacteroides_sp_3_1_33FAA,g__Bacteroides.s__Bacteroides_sp_NMBE5,g__Bacteroidaceae_unclassified.s__Bacteroidaceae_bacterium )
Found t__SGB1814 : 49.34% of mapped reads ( s__Bacteroides_vulgatus,s__Bacteroides_dorei,s__Phocaeicola_dorei,s__Bacteroides_sp_3_1_33FAA,s__Bacteroides_sp_4_3_47FAA,s__Bacteroides_sp_3_1_40A,s__Bacteroides_sp_AM18_9,s__Bacteroides_sp_AM23_18,s__Bacteroides_sp_AF39_10AT,s__Bacteroides_sp_AF32_15BH,s__Bacteroides_sp_AF25_18,s__Bacteroides_sp_AF16_29,s__Bacteroides_sp_AM28_6,s__Bacteroides_sp_AM27_13,s__Bacteroides_sp_AM26_11,s__Bacteroides_sp_AF15_23LB,s__Bacteroides_sp_AF17_1,g__Phocaeicola.s__Phocaeicola_dorei,g__Bacteroides.s__Bacteroides_sp_3_1_33FAA,g__Bacteroides.s__Bacteroides_sp_4_3_47FAA,g__Bacteroides.s__Bacteroides_sp_3_1_40A,g__Bacteroides.s__Bacteroides_sp_AM18_9,g__Bacteroides.s__Bacteroides_sp_AM23_18,g__Bacteroides.s__Bacteroides_sp_AF39_10AT,g__Bacteroides.s__Bacteroides_sp_AF32_15BH,g__Bacteroides.s__Bacteroides_sp_AF25_18,g__Bacteroides.s__Bacteroides_sp_AF16_29,g__Bacteroides.s__Bacteroides_sp_AM28_6,g__Bacteroides.s__Bacteroides_sp_AM27_13,g__Bacteroides.s__Bacteroides_sp_AM26_11,g__Bacteroides.s__Bacteroides_sp_AF15_23LB,g__Bacteroides.s__Bacteroides_sp_AF17_1 )

Total species selected from prescreen: 47

Selected species explain 100.00% of predicted community composition


Creating custom ChocoPhlAn database ........


/usr/bin/gunzip -c /Users/bernhard/Documents/humann3.6_dbs/chocophlan/g__Bacteroides.s__Bacteroides_dorei.centroids.v201901_v31.ffn.gz /Users/bernhard/Documents/humann3.6_dbs/chocophlan/g__Bacteroides.s__Bacteroides_vulgatus.centroids.v201901_v31.ffn.gz


TIMESTAMP: Completed custom database creation : 1 seconds


Running bowtie2-build ........


/Users/bernhard/miniconda3/envs/humann3.6_metaphlan4_py3.9/bin/bowtie2-build -f /Users/bernhard/Documents/demo_fastq_full_4/demo_humann_temp/demo_custom_chocophlan_database.ffn /Users/bernhard/Documents/demo_fastq_full_4/demo_humann_temp/demo_bowtie2_index


TIMESTAMP: Completed database index : 24 seconds


Running bowtie2 ........


/Users/bernhard/miniconda3/envs/humann3.6_metaphlan4_py3.9/bin/bowtie2 -q -x /Users/bernhard/Documents/demo_fastq_full_4/demo_humann_temp/demo_bowtie2_index -U /Users/bernhard/Documents/demo_fastq_full_4/demo_humann_temp/tmp6_w81a1s/tmpt8ao3gn8 -S /Users/bernhard/Documents/demo_fastq_full_4/demo_humann_temp/demo_bowtie2_aligned.sam -p 10 --very-sensitive


TIMESTAMP: Completed nucleotide alignment : 1 seconds


TIMESTAMP: Completed nucleotide alignment post-processing : 1 seconds

Total bugs from nucleotide alignment: 2
g__Bacteroides.s__Bacteroides_dorei: 1609 hits
g__Bacteroides.s__Bacteroides_vulgatus: 1619 hits

Total gene families from nucleotide alignment: 628

Unaligned reads after nucleotide alignment: 84.6285714286 %


Running diamond ........

Versions used:

(humann3.6_metaphlan4_py3.9) bernhard@MBPvonBnhard115 humann_tutorial % humann --version
humann v3.6
(humann3.6_metaphlan4_py3.9) bernhard@MBPvonBnhard115 humann_tutorial % metaphlan --version
MetaPhlAn version 4.0.3 (24 Oct 2022)

Yes, there are a few threads about this now. Since it seems like users are starting to use this mode more and more, we will add robustness to it in a future version of HUMAnN.

I’ll also note that, while we added flags to pass additional parameters to MetaPhlAn, Bowtie 2, and DIAMOND in HUMAnN 3 (both as an aid to development and because it was a frequently requested feature), I consider these to be “expert options”: HUMAnN is not guaranteed to work if you change our default settings for these programs.