We are running MetaWIBELE using the Docker image and the same versions of the licensed software that are listed as “tested versions” in the GitHub README. The only exception is InterProScan, which is version 5.56-89.0.
Unfortunately, the characterization output on the demo data does not match what is shown in the README. In particular, we get very nonspecific UniRef90 annotations (example screenshot below). One thing to note is that we are bypassing PSORTb, although we don’t believe this is the source of the difference.
Could you please help? Thank you so much.
A protein family is assigned UniRef90 annotations only when it is identified as having “strong homology” to a UniRef90 cluster. In your case, it looks like Cluster_1 was identified as “worse homology”, meaning it lacks significant UniRef90 homologs, so no UniRef90 annotations were assigned to it. Please make sure the whole run finished successfully without any errors. If some intermediate steps failed, that could also lead to incomplete results.
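To see how widespread “worse homology” is across a whole run, rather than just for Cluster_1, a quick tally of the homology labels can help. This is a minimal sketch under stated assumptions: it assumes a tab-separated characterization table with a header row and one column holding the homology label. The column name `homology_category` is hypothetical; substitute whatever header your MetaWIBELE output actually uses.

```python
import csv
from collections import Counter


def count_homology_labels(path, column="homology_category"):
    """Count homology labels in a tab-separated file with a header row.

    `column` is a HYPOTHETICAL column name; change it to match the
    real header of your MetaWIBELE characterization output.
    """
    counts = Counter()
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {path}")
        for row in reader:
            # Short rows yield None for missing fields; treat them as empty.
            counts[(row.get(column) or "").strip()] += 1
    return counts


def worse_fraction(counts, label="worse homology"):
    """Fraction of rows carrying `label` (0.0 for an empty table)."""
    total = sum(counts.values())
    return counts[label] / total if total else 0.0
```

For example, `worse_fraction(count_homology_labels("path/to/table.tsv"))` (path hypothetical). A run where nearly every family is labeled “worse homology” points more toward a database or configuration problem than toward one unusual cluster.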
In addition, we updated MetaWIBELE to make it more compatible with different versions of tools (e.g. it now works with MSPminer >v1.0.0, tested with v1.1.2) and released it as v0.4.6 on GitHub (biobakery/metawibele). It will also be available via pip/conda/Docker soon. Feel free to try this latest version.
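Before re-running, it may be worth confirming which MetaWIBELE version is actually installed in the environment (or container) you run from. The sketch below uses only the standard library and assumes the Python distribution is named `metawibele`; the simple numeric comparison is an illustrative stand-in for a full version parser and ignores pre-release tags.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Convert '0.4.6' to (0, 4, 6), keeping only leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def installed_at_least(package, minimum):
    """True if `package` is installed at a version >= `minimum`, else False."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)
```

For example, `installed_at_least("metawibele", "0.4.6")`, assuming that distribution name. Inside Docker, run the check within the container, since the host environment may differ.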
Thank you for this information. We are running MetaWIBELE on the demo data provided in the README, and I just checked that the run reported finishing successfully. The README shows strong homology for Cluster_1, but in our output it is worse homology. How can we troubleshoot this discrepancy and make sure MetaWIBELE is working correctly before running it on our real data?
Did you install the dependent databases (e.g. the UniProt/UniRef databases, as described in the biobakery/metawibele README on GitHub) and update the global MetaWIBELE configuration file (i.e. metawibele.cfg) to point to them? If you didn’t customize the dependent databases, MetaWIBELE falls back to a tiny toy database bundled with the tool, which results in many “worse homology” assignments. A warning message about this is also printed in that case.
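One way to double-check the configuration is to read metawibele.cfg programmatically and confirm each database entry points to an existing, non-empty directory. This is a sketch under assumptions: it treats the file as INI-style and assumes a `[database]` section with `uniref_db` and `domain_db` keys whose unset value is `none`. Verify those names against your own copy of metawibele.cfg before relying on the result.

```python
import configparser
from pathlib import Path


def check_database_paths(cfg_path, section="database", keys=("uniref_db", "domain_db")):
    """Return {key: status} for database paths in an INI-style config.

    The section and key names are ASSUMPTIONS about metawibele.cfg.
    """
    # Interpolation is disabled so '%' in paths is not misread;
    # '#' after whitespace is treated as an inline comment.
    parser = configparser.ConfigParser(interpolation=None, inline_comment_prefixes=("#",))
    with open(cfg_path) as handle:
        parser.read_file(handle)
    if not parser.has_section(section):
        return {key: "section missing" for key in keys}
    results = {}
    for key in keys:
        value = parser.get(section, key, fallback="").strip()
        if not value or value.lower() == "none":
            results[key] = "not set (toy database likely used)"
        elif not Path(value).is_dir():
            results[key] = f"path does not exist: {value}"
        elif not any(Path(value).iterdir()):
            results[key] = f"directory is empty: {value}"
        else:
            results[key] = "ok"
    return results
```

An “ok” status only shows that the directory exists and is non-empty. It does not prove the downloaded files are complete or uncorrupted.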
Yes, we downloaded them as described in the MetaWIBELE GitHub README and then put the path in the metawibele.cfg file. The files in the database path are shown in the attached screenshot. Does this look correct? I also double-checked the full terminal output and do not see any warnings about using the toy database.
What else can I check? Thank you!
Did you check whether the protein families built from the demo inputs are the same as, or very close to, the example output in the MetaWIBELE package (metawibele/demo_proteinfamilies.clstr on the master branch of biobakery/metawibele on GitHub)? Make sure the “Cluster_1” you checked is exactly the same protein cluster as the one in the example.
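Because cluster numbering can differ between runs, matching clusters by their members is more reliable than matching by name. The sketch below assumes a CD-HIT-style .clstr layout: header lines start with “>” (anything after a “;” is ignored), and each following non-empty line is one member whose ID sits between “>” and “...”, or is otherwise the first token on the line. That layout is an assumption; check it against demo_proteinfamilies.clstr.

```python
import re
from pathlib import Path

# Assumed CD-HIT-style member line, e.g. "0\t250aa, >seq_id... *".
_MEMBER_ID = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(path):
    """Parse a .clstr file into {cluster_name: set_of_member_ids}."""
    clusters = {}
    current = None
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header such as ">Cluster_1;size=3" -> "Cluster_1".
            current = line[1:].split(";")[0].strip()
            clusters[current] = set()
        elif current is not None:
            match = _MEMBER_ID.search(line)
            clusters[current].add(match.group(1) if match else line.split()[0])
    return clusters


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match(name, ours, reference):
    """Return (score, reference_name) for the best member overlap with ours[name]."""
    members = ours[name]
    scored = ((jaccard(members, ref), ref_name) for ref_name, ref in reference.items())
    return max(scored, default=(0.0, None))
```

A score near 1.0 against the demo’s Cluster_1 suggests the two runs built the same family. A low best score suggests the clustering itself differs.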
Then, if the dependent databases and protein families all look in order, you may need to dig further. MetaWIBELE not only prints a brief log that monitors the overall run, but also writes detailed logs for each step to files usually named $STEPNAME.log, so you can investigate any step by checking its corresponding log file. In your case, your UniRef90-based characterization results differ somewhat from a typical demo run. To debug this, go to the folder that holds the UniRef90 step logs and check the details in those log files (e.g. uniref90.mapping.log, uniref90_annotation.log, uniref90_protein.log, uniref90_proteinfamilies.log).
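Scanning many step logs by eye is tedious, so a small filter that surfaces suspicious lines can speed this up. This is a generic sketch, not a MetaWIBELE feature. It recursively searches a folder for `*.log` files and reports lines that match a case-insensitive keyword list. The keyword list is an assumption and will produce some false positives (e.g. “0 errors”).

```python
import re
from pathlib import Path

# Keywords worth a closer look; extend or narrow as needed.
_SUSPICIOUS = re.compile(
    r"\b(?:errors?|exceptions?|traceback|fail(?:s|ed|ure|ures)?|warnings?)\b",
    re.IGNORECASE,
)


def scan_logs(folder, pattern="*.log"):
    """Return {log_path: [(line_no, line), ...]} for lines matching _SUSPICIOUS."""
    hits = {}
    for log in sorted(Path(folder).rglob(pattern)):
        found = []
        # errors="replace" keeps going if a log contains undecodable bytes.
        with log.open(errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if _SUSPICIOUS.search(line):
                    found.append((line_no, line.rstrip("\n")))
        if found:
            hits[str(log)] = found
    return hits
```

Pointing it at the run’s output folder and printing each hit with its file and line number gives a short list of places to inspect first.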
Meanwhile, would you mind also sharing the whole global log (both STDOUT and STDERR output) with me? I can debug it on my end if needed.
Thank you so much for your detailed responses! We think we have figured out the issue (re-downloading the databases seemed to help), but we will let you know as soon as possible if that turns out not to be the case. Thanks again!
Meena and Boryana