The bioBakery help forum

Eukaryotic Uniref90 Gene Families in Gene Family TSV Files

Are these UniRef90s assigned to the unclassified stratification, meaning that their abundance was identified from translated search? If so, it is possible that they represent residual host contamination. Conversely, UniRef90 abundance assigned to specific species is much less likely to be host-derived.
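To check this, you could pull out the UniRef90s whose stratified abundance comes only from the unclassified stratum. The sketch below assumes HUMAnN-style gene family rows: an unstratified total (`UniRef90_X`), species strata (`UniRef90_X|g__Genus.s__species`), and an `UniRef90_X|unclassified` stratum. The function name is hypothetical and not part of HUMAnN.

```python
import csv
import io


def unclassified_only(tsv_text):
    """Return UniRef90 IDs whose nonzero stratified abundance is entirely 'unclassified'.

    Assumes HUMAnN-style rows: 'UniRef90_X' (community total),
    'UniRef90_X|g__Genus.s__species' (species stratum) and 'UniRef90_X|unclassified'.
    """
    strata = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and the '# Gene Family' header
        if not row or row[0].startswith("#"):
            continue
        feature = row[0]
        # Skip unstratified totals (and rows such as UNMAPPED)
        if "|" not in feature:
            continue
        gene, stratum = feature.split("|", 1)
        if float(row[1]) > 0:
            strata.setdefault(gene, set()).add(stratum)
    return sorted(gene for gene, seen in strata.items() if seen == {"unclassified"})
```

In practice you would read the text from your gene families file. The resulting IDs are the ones worth screening for host origin.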

Since you’re dealing with RNA, you’ll want to host-deplete against a human transcript database in addition to the human genome. It’s possible that you have host reads in your sample that cover fused exons, in which case the read might not map to the genome (where the exons are not adjacent).
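A toy illustration of the fused-exon point, using made-up sequences and exact substring matching as a stand-in for real alignment (a deliberate simplification). A read that spans an exon junction matches the spliced transcript but not the genome, where an intron separates the two exons.

```python
# Toy sequences (hypothetical, not real human data)
exon1 = "ATGGCCATTG"
intron = "GTAAGTTTTTTTTTTTTTCAG"
exon2 = "TACCGGATCC"

genome = exon1 + intron + exon2   # exons separated by an intron
transcript = exon1 + exon2        # spliced mRNA: exons fused

# A read spanning the exon1/exon2 junction
read = transcript[5:15]

print(read)                # CATTGTACCG
print(read in transcript)  # True
print(read in genome)      # False
```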

The UniRef90s are assigned to the unclassified stratum. Do you have a suggestion for which transcript database would be best to use?

Is there a way to make the UniRef90 database prokaryote-only?

We use a human EST database from NCBI to remove host-derived RNA when performing quality control on metatranscriptomic sequences:

https://bitbucket.org/biobakery/kneaddata/wiki/Home

You can use the infer_taxonomy script to attach the original sources of UniRef90s to their identifiers. This might help you to weed out UniRef90s in the unclassified stratum that were due to host contamination.

https://bitbucket.org/biobakery/humann2/wiki/Home#markdown-header-humann2_infer_taxonomy
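Once each UniRef90 has a source attached, weeding out likely host hits can be as simple as partitioning the unclassified IDs. This is a minimal sketch: `uniref_source` is a hypothetical {UniRef90 ID: source organism} mapping standing in for whatever annotation you extract, and the exact host label is an assumption.

```python
def split_by_source(unclassified_ids, uniref_source, host="Homo sapiens"):
    """Partition unclassified UniRef90 IDs by whether their annotated source is the host.

    uniref_source is a hypothetical {UniRef90 ID: source organism} mapping;
    IDs missing from the mapping are kept in the non-host list.
    """
    host_like, other = [], []
    for uid in unclassified_ids:
        if uniref_source.get(uid) == host:
            host_like.append(uid)
        else:
            other.append(uid)
    return host_like, other
```

Keeping unannotated IDs in the non-host list is a conservative choice: only clusters positively attributed to the host are flagged.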

I tried to follow the link you posted for infer_taxonomy, but it leads to a page that says the link has no power. Has the page moved elsewhere on Bitbucket?

Hi Kying,
Yes, HUMAnN2 was migrated to GitHub and is now referred to as HUMAnN 2.0. You can find the same information at the link below.

Thanks,
Sagun

Thanks, Sagun. I was able to use infer_taxonomy, and the GitHub page states:

“The modified gene families output files can then be reprocessed through HUMAnN 2.0 to compute pathway abundance/coverage using the inferred taxonomic stratifications.”

What function would I be using?

You can provide the resulting gene families file (or any gene families file) to HUMAnN as an --input. It will know that it is starting from genes rather than raw sequencing reads (the typical input) based on the file formatting.

I passed it to HUMAnN2 with --input, but when I specify an --input-format, TSV is not an accepted option.

If you are using --input-format you would specify “genetable”, but the format should be automatically detected from --input with the TSV extension.
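As a rough illustration of extension-based detection, here is a sketch that maps a file extension to a format name. The extension table is hypothetical and written for this example; HUMAnN's actual detection logic may differ (for instance, it may also inspect file contents).

```python
from pathlib import Path

# Hypothetical extension table for illustration only
EXTENSION_FORMATS = {
    ".fastq": "fastq",
    ".fq": "fastq",
    ".fasta": "fasta",
    ".fa": "fasta",
    ".sam": "sam",
    ".bam": "bam",
    ".tsv": "genetable",
}


def guess_input_format(path):
    """Guess a base input format from the file extension, ignoring a trailing .gz."""
    suffixes = Path(path).suffixes
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    if not suffixes:
        return None
    return EXTENSION_FORMATS.get(suffixes[-1].lower())
```

For example, `guess_input_format("file.tsv")` gives `"genetable"`, which is the value you would otherwise pass explicitly with --input-format.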

When I run the command humann2 --input file.tsv --input-format "genetable" --output output_folder, it runs through the HUMAnN2 workflow, but I get the following error message:

CalledProcessError: Command '['python', '$/humann2/quantify/MinPath12hmp.py', '-any', '$output', '-map', '$output', '-report', '$output', '-details', '$output', '-mps', '$output']' returned non-zero exit status 1

It seems that glpsol, which MinPath calls, fails when MinPath tries to compute pathway abundance and coverage from the new infer_taxonomy gene families file.

I’m not sure where the issue lies, so I don't know how to fix the error.
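A CalledProcessError on its own only reports the exit status. Re-running the failing command with its stderr captured usually shows the underlying complaint. This sketch uses Python's standard `subprocess` module; the one-liner below is a stand-in for the real MinPath command, which you would substitute (with your actual paths) yourself.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd; on failure return (exit code, captured stderr) instead of raising."""
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
        return 0, ""
    except subprocess.CalledProcessError as err:
        return err.returncode, err.stderr


# Stand-in for the failing MinPath call: writes to stderr and exits with status 1
code, stderr = run_and_report(
    [sys.executable, "-c", "import sys; sys.stderr.write('glpsol failed\\n'); sys.exit(1)"]
)
print(code)            # 1
print(stderr.strip())  # glpsol failed
```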

Have you inspected the inferred gene families file to make sure it looks OK? If you’re able to share that file it might help us to diagnose this error.

I have looked, and the inferred gene families look fine. I would be happy to share the file; what would be the best way to send it to you?

If you’re not able or don't want to attach it here, you can email it to me at franzosa@hsph.harvard.edu. If it’s too large to email, you can send just the first ~1000 lines or so.
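If the file is too large, a sketch like this copies just the first ~1000 lines (header included) into a smaller file to send. The function name and file names are hypothetical.

```python
from itertools import islice


def copy_head(src_path, dst_path, n_lines=1000):
    """Copy the first n_lines lines of src_path (header included) into dst_path."""
    with open(src_path) as fin, open(dst_path, "w") as fout:
        fout.writelines(islice(fin, n_lines))
```

For example, `copy_head("genefamilies_inferred.tsv", "genefamilies_head.tsv")`. Reading lazily with `islice` avoids loading the whole file into memory.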

I have just sent it to your email. Thanks!

Sorry for the long delay! Your genes file looked fine to me. I renamed it to test.tsv and ran it through HUMAnN with the following command:

humann2 --input test.tsv --output . --input-format genetable

And it produced pathway-level output files successfully. It seems like there might be something wrong with your installation?
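To compare your run against this one, you could check whether both pathway-level files were written. This is a sketch that assumes HUMAnN 2.0's usual naming, `<input basename>_pathabundance.tsv` and `<input basename>_pathcoverage.tsv`; adjust the names if your version differs.

```python
from pathlib import Path


def missing_pathway_outputs(output_dir, input_name):
    """Return the expected pathway output files that are absent from output_dir.

    Assumes outputs named <input basename>_pathabundance.tsv and
    <input basename>_pathcoverage.tsv.
    """
    base = Path(input_name).stem
    expected = [f"{base}_pathabundance.tsv", f"{base}_pathcoverage.tsv"]
    return [name for name in expected if not (Path(output_dir) / name).exists()]
```

For Eric's command above, `missing_pathway_outputs(".", "test.tsv")` should return an empty list after a successful run.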

I think there might be; the error seems to come from glpsol when MinPath runs it.

Is there a way to update MinPath and all its prerequisites, or to update all the packages in HUMAnN2?

You could have a look at this thread from the archived forum, which dealt with upgrading GLPK to fix MinPath problems:

https://groups.google.com/d/topic/humann-users/MSVmXC7DSW0/discussion
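Before and after upgrading GLPK, it can help to confirm which glpsol is actually on your PATH and what version it reports. A minimal sketch using only the standard library; it relies on `glpsol --version` printing a version banner, and returns None if glpsol is not found.

```python
import shutil
import subprocess


def glpsol_version():
    """Return the first line of `glpsol --version`, or None if glpsol is not on PATH."""
    exe = shutil.which("glpsol")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


print(shutil.which("glpsol"), glpsol_version())
```

Printing the resolved path as well as the version is useful in conda setups, where an older glpsol elsewhere on PATH can shadow the one you just installed.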

Thank you, Eric. I will report back when I find the answer.

I was able to fix the issue by reinstalling HUMAnN2 with: conda install -c biobakery humann2

I am now able to get the same two files that you produced.

Great! It looks like this might be an issue with the HUMAnN 2.0 recipe on bioconda, which we do not maintain. If you can use the biobakery channel you’ll at least get something we made and tested ourselves.