Where can I find the taxonomy profile .tsv file in the humann3 outputs?

I recently got through the humann3 pipeline. I am doing a metatranscriptome analysis with paired metagenome data, so I’d like to use the taxonomy profile from the metagenome data. However, I am confused about the taxonomy profile in the humann3 outputs. According to the manual:
Basic usage

$ humann --input $SAMPLE --output $OUTPUT_DIR

$SAMPLE = a single file that is one of the following types:

  1. filtered shotgun sequencing metagenome file (fastq, fastq.gz, fasta, or fasta.gz format)
  2. alignment file (sam, bam or blastm8 format)
  3. gene table file (tsv or biom format)

$OUTPUT_DIR = the output directory

Five main output files will be created:

  1. $OUTPUT_DIR/$SAMPLENAME_0.log
  2. $OUTPUT_DIR/$SAMPLENAME_1_metaphlan_profile.tsv
  3. $OUTPUT_DIR/$SAMPLENAME_2_genefamilies.tsv
  4. $OUTPUT_DIR/$SAMPLENAME_3_reactions.tsv
  5. $OUTPUT_DIR/$SAMPLENAME_4_pathabundance.tsv

This indicates that humann3 should produce five output files, including metaphlan_profile.tsv, which I understand to be the taxonomy profile file. But I actually got only three output files, the same as the tutorial shows. I looked into the intermediate temp files and found only a file named bug_lists.tsv, but I am not sure whether it is the taxonomy profile file.

I deleted the intermediate temp files when I ran humann for the metagenome analyses because of limited disk space. In that case, should I rerun MetaPhlAn to get the taxonomy profile (if bug_list.tsv is indeed that file)? Is there a way to make humann write the taxonomy profile outside the intermediate temp files folder?

–Chao

The 5 output files you note reflect a reorganization coming in HUMAnN 4. In HUMAnN 3.X the taxonomic profile is found under the sample’s temp folder. Can you clarify which version of HUMAnN you’re running and which docs you’re referring to?

Hi @ray, thanks for the detailed post. Sorry for the confusion about the file naming conventions in the user manual (readme.md in the GitHub repository). We are preparing for HUMAnN v4, in which the file naming conventions and the locations of some files will change. I have modified the readme so it is clearer which file names correspond to which versions of HUMAnN. Please post here if there is still any confusion.

Thanks!
Lauren

Hi Lauren,

Thanks so much for updating the readme file. Can’t wait to see HUMAnN 4. I am currently using HUMAnN v3.8, and I can only find a file named $SampleName_metaphlan_bugs_list.tsv in the temp folder. Is this the taxonomy profile file?

-Chao

Hi Eric,

Thanks for your reply. I am currently using HUMAnN v3.8. The file I mentioned is $SampleName_metaphlan_bugs_list.tsv, under the temp folder.

-Chao

Yes, you are correct. Sorry again for any confusion!
Lauren
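
Since the thread confirms that in HUMAnN 3.x the taxonomic profile is the $SampleName_metaphlan_bugs_list.tsv file inside the temp folder, one way to keep it while still freeing disk space is to copy it elsewhere before deleting the temp files. Below is a minimal Python sketch of that idea. The function name and the directory names are hypothetical, and the only thing assumed about the HUMAnN output is the file-name suffix mentioned above.

```python
import shutil
from pathlib import Path


def collect_bugs_lists(output_dir, dest_dir):
    """Copy every *_metaphlan_bugs_list.tsv found under output_dir into dest_dir.

    Hypothetical helper: it searches recursively, so it does not depend on
    the exact name of the per-sample temp folder.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materializes the matches before any file is copied.
    for src in sorted(Path(output_dir).rglob("*_metaphlan_bugs_list.tsv")):
        target = dest / src.name
        if src.resolve() == target.resolve():
            continue  # already in the destination folder
        shutil.copy2(src, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Both directory names are placeholders for your own paths.
    for path in collect_bugs_lists("humann_output", "taxonomic_profiles"):
        print(path)
```

As far as I know, a profile saved this way can then be given to humann for the paired metatranscriptome sample through its --taxonomic-profile option, but please check the manual for your installed version.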

Hi @lauren.j.mciver
I tried to merge the $SampleName_metaphlan_bugs_list.tsv files using the following command:
humann_join_tables -i output/metaphlan -o metaphlan-merged.tsv --file_name metaphlan
It resulted in this error:
Traceback (most recent call last):
  File "/home/gssidhu/anaconda3/envs/biobakery3/bin/humann_join_tables", line 10, in <module>
    sys.exit(main())
  File "/home/gssidhu/anaconda3/envs/biobakery3/lib/python3.7/site-packages/humann/tools/join_tables.py", line 238, in main
    join_gene_tables(gene_tables,args.output,verbose=args.verbose)
  File "/home/gssidhu/anaconda3/envs/biobakery3/lib/python3.7/site-packages/humann/tools/join_tables.py", line 98, in join_gene_tables
    sorted_gene_list=util.fsort(list(gene_table_data))
  File "/home/gssidhu/anaconda3/envs/biobakery3/lib/python3.7/site-packages/humann/tools/util.py", line 403, in fsort
    features = sorted( features, key=lambda f: c_topsort.get( fsplit( f )[0], default ) )
  File "/home/gssidhu/anaconda3/envs/biobakery3/lib/python3.7/site-packages/humann/tools/util.py", line 403, in <lambda>
    features = sorted( features, key=lambda f: c_topsort.get( fsplit( f )[0], default ) )
  File "/home/gssidhu/anaconda3/envs/biobakery3/lib/python3.7/site-packages/humann/tools/util.py", line 377, in fsplit
    sys.exit( "LETHAL ERROR: bad feature name: {}".format( f ) )
NameError: name 'f' is not defined

I would appreciate it if you could let me know how to resolve this issue.
Thanks
Gurjit

Hi @gsidhu, sorry for any confusion. The MetaPhlAn tables are in a different format from the one the HUMAnN join script expects. You can use the biobakery_workflows utility script “join_taxonomic_profiles.py” or the MetaPhlAn merge utility script to merge the files.

Thanks,
Lauren
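
To illustrate what merging taxonomic profiles involves, and why it differs from joining gene tables, here is a rough Python sketch. It is not the code of either utility script named above, and those scripts remain the recommended route. The function names are hypothetical, and the sketch assumes each profile is a tab-separated file whose header lines start with "#", with one of those lines naming a relative_abundance column; if no such line is found it falls back to the second column.

```python
from pathlib import Path


def read_profile(path):
    """Return {clade_name: relative_abundance} for one MetaPhlAn-style profile."""
    abundances = {}
    abund_col = 1  # assumed fallback when no column-name line is present
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if not fields[0]:
                continue  # skip blank lines
            if fields[0].startswith("#"):
                # A header line; use it if it names the abundance column.
                names = [name.lstrip("#").strip() for name in fields]
                if "relative_abundance" in names:
                    abund_col = names.index("relative_abundance")
                continue
            abundances[fields[0]] = float(fields[abund_col])
    return abundances


def merge_profiles(paths, suffix="_metaphlan_bugs_list.tsv"):
    """Merge profiles into one table: one row per clade, one column per sample."""
    samples = {}
    for path in paths:
        name = Path(path).name
        sample = name[: -len(suffix)] if name.endswith(suffix) else Path(path).stem
        samples[sample] = read_profile(path)
    clades = sorted(set().union(*samples.values()))
    header = ["clade_name"] + list(samples)
    # Clades missing from a sample are reported with abundance 0.0.
    rows = [[clade] + [samples[s].get(clade, 0.0) for s in samples] for clade in clades]
    return header, rows


def write_table(header, rows, out_path):
    """Write the merged table as tab-separated text."""
    with open(out_path, "w") as out:
        out.write("\t".join(header) + "\n")
        for row in rows:
            out.write("\t".join(str(value) for value in row) + "\n")
```

For example, `header, rows = merge_profiles(sorted(Path("taxonomic_profiles").glob("*_metaphlan_bugs_list.tsv")))` followed by `write_table(header, rows, "metaphlan-merged.tsv")` would produce a single table, where "taxonomic_profiles" is a placeholder for wherever the profiles are stored.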
