Query regarding HUMAnN2

  • I installed HUMAnN2 from Bioconda and downloaded the corresponding MetaPhlAn2 database (mpa_v20_m200), the ChocoPhlAn database (full_chocophlan_plus_viral.v0.1.1.tar.gz), and the UniRef90 database (uniref90_annotated.1.1.dmnd).
  • I merged the paired-end reads with FLASH.
  • I ran the command: humann2 --input merged.fastq --output humann2_out
  • Although my sample is rice paddy soil (highly diverse), MetaPhlAn detected only 10 species after completing successfully. Why?
  • Of these, only two were selected for functional annotation. Why?
  • The command has been running for the last 24 hours. How long does the analysis usually take to complete?
  • Am I doing something wrong, or is something missing?

Please advise: what should I do?

I think all of these issues stem from a lack of known species in your community relative to the tools’ databases. Only a subset of the detected species is used for functional profiling because some detected “species” are of the form s__Genus_unclassified, for which we do not have a species pangenome to map against. As a result, most of your reads will be forwarded to translated search, which is considerably slower than nucleotide-level mapping (roughly 2 hours per 10M reads using 8 threads). For these reasons, I’d recommend running with UniRef50 rather than UniRef90, so that more reads map to homologs of known proteins.
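A small Python sketch can make the reasoning above concrete. It has two parts. The first splits species rows from a MetaPhlAn2-style profile into named species and s__<Genus>_unclassified entries, assuming tab-separated clade/abundance lines with ranks joined by "|". The second scales the "about 2 hours per 10M reads at 8 threads" figure linearly with read count and inversely with thread count. That scaling is a simplifying assumption. The example clades are placeholders. The helpers split_species and estimate_translated_hours are hypothetical illustrations, not part of HUMAnN2.

```python
# Sketch only: illustrates the reasoning in the reply; not HUMAnN2 code.
# Assumptions (hypothetical, not taken from HUMAnN2's source):
#   * the profile is tab-separated "clade<TAB>abundance" lines, ranks joined by "|";
#   * "s__<Genus>_unclassified" rows have no species pangenome to map against;
#   * translated-search time scales linearly with reads and inversely with threads.

# Hypothetical profile excerpt; clade names are placeholders, not real results.
EXAMPLE_PROFILE = [
    "#SampleID\tMetaphlan2_Analysis",
    "k__Bacteria\t100.0",
    "k__Bacteria|p__PhylumA|c__ClassA|o__OrderA|f__FamilyA|g__GenusA\t60.0",
    "k__Bacteria|p__PhylumA|c__ClassA|o__OrderA|f__FamilyA|g__GenusA|s__GenusA_unclassified\t60.0",
    "k__Bacteria|p__PhylumB|c__ClassB|o__OrderB|f__FamilyB|g__GenusB|s__GenusB_speciesX\t40.0",
    "k__Bacteria|p__PhylumB|c__ClassB|o__OrderB|f__FamilyB|g__GenusB|s__GenusB_speciesX|t__GCF_000000000\t40.0",
]


def split_species(profile_lines):
    """Split species-level rows into (named, unclassified) species names."""
    named, unclassified = [], []
    for line in profile_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip header/comment and blank lines
        clade = line.split("\t")[0]
        rank = clade.split("|")[-1]  # most specific rank in this row
        if not rank.startswith("s__"):
            continue  # ignore higher ranks and strain (t__) rows
        name = rank[len("s__"):]
        if name.endswith("_unclassified"):
            unclassified.append(name)  # no species pangenome to map against
        else:
            named.append(name)  # only a candidate: it must also exist in ChocoPhlAn
    return named, unclassified


def estimate_translated_hours(n_reads, threads=1, hours_per_10m_reads_at_8_threads=2.0):
    """Rough runtime for reads sent to translated search (rule of thumb only)."""
    if n_reads < 0 or threads < 1:
        raise ValueError("n_reads must be >= 0 and threads >= 1")
    return hours_per_10m_reads_at_8_threads * (n_reads / 10_000_000) * (8 / threads)


if __name__ == "__main__":
    named, unclassified = split_species(EXAMPLE_PROFILE)
    print("candidates for pangenome mapping:", named)
    print("no pangenome (unclassified):", unclassified)
    # Hypothetical 30M reads routed to translated search
    print(estimate_translated_hours(30_000_000, threads=1))  # -> 48.0
    print(estimate_translated_hours(30_000_000, threads=8))  # -> 6.0
```

The default of one thread in the sketch reflects HUMAnN2's behaviour when --threads is not given, which was the case for the command above. Under this rule of thumb, a read set that mostly falls through to translated search could easily run for more than 24 hours on a single thread. The UniRef50 suggestion needs a different database. To my knowledge, the UniRef50 DIAMOND database can be fetched with humann2_databases --download uniref uniref50_diamond <install_dir> and then selected with humann2's --protein-database <install_dir> option. Confirm both against humann2 --help for your installed version.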

Thanks @franzosa for your suggestions.