PanPhlAn_pangenome_exporter uniref uniref_annotator diamond issue

drelo · January 21, 2021, 1:06pm

Dear bioBakery, I have an issue trying to use the Pangenome exporter.
I installed all the required packages with conda. I had problems with prokka (there is a current issue with the blastp version; I removed the one I had and reinstalled blast, prokka and hmmer), but I solved them. Now I have diamond v0.9.14.115 and blast 2.9.0.

I got this error close to the end of the run:

Executing uniref_annotator...Error: Database was built with a different version of diamond as is incompatible.

I uploaded both the list of packages within conda environment and the output of this demo run with 2 samples. Thanks for the help.

output.txt (137.5 KB) listPackages.txt (13.6 KB)

leonard.dubois · January 21, 2021, 2:20pm

Hello,

When we designed the PanPhlAn genome exporter, we were using diamond version 0.9.24.
Maybe the program crashes because your version is older…

Hope this solves your problem easily.

drelo · January 21, 2021, 3:13pm

I installed that version and it is working flawlessly at the moment, thanks for your help.

wget https://github.com/bbuchfink/diamond/releases/download/v0.9.24/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz
and [in my case] replacing the anaconda link:
cp diamond ~/anaconda3/envs/panphlan/bin/diamond
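For anyone who wants to confirm which diamond the environment actually picks up before swapping binaries, here is a minimal sketch. It assumes that `diamond version` prints a banner containing a dotted version number (for example `diamond version 0.9.24`); the function names and the `EXPECTED` constant are illustrative and not part of PanPhlAn or diamond.

```python
import re
import subprocess

# Version the exporter was designed with, according to the reply above.
EXPECTED = (0, 9, 24)


def parse_diamond_version(text):
    """Extract the first major.minor.patch number found in a version banner."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_diamond_version(executable="diamond"):
    """Ask the diamond binary on PATH for its version.

    Assumption: `diamond version` prints a banner with the version number.
    """
    result = subprocess.run(
        [executable, "version"], capture_output=True, text=True, check=True
    )
    return parse_diamond_version(result.stdout + result.stderr)


if __name__ == "__main__":
    try:
        found = installed_diamond_version()
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        print(f"could not determine diamond version: {exc}")
    else:
        status = "matches" if found == EXPECTED else "differs from"
        print(f"diamond {'.'.join(map(str, found))} {status} 0.9.24")
```

Run it inside the activated conda environment so that the same `diamond` binary the exporter would call is the one being checked.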

drelo · January 21, 2021, 4:14pm

ETA: I posted this as a new topic now

I got an error message at the end… [let me know if I should post a separate topic for this]
This is the last part of the output:

Closing the input file...  [7e-06s]
Closing the output file...  [2.2e-05s]
Closing the database file...  [3e-06s]
Deallocating taxonomy...  [1e-06s]
Total time = 335.776s
Reported 36612 pairwise alignments, 36662 HSPs.
3578 queries aligned.
Parsing results file:
  trash1/tmp/uniref/RT078_CDM120/tmp/RT078_CDM120.faa.uniref50.hits
Writing new output file:
  trash1/tmp/uniref/RT078_CDM120/RT078_CDM120.faa
Summary of annotations:
  Genes in input FASTA: 3,594
  UniRef90 codes assigned: 3,427 (95.4%)
  UniRef50 codes assigned: 3,465 (96.4%)
  UniRef50 codes inferred from UniRef90 codes: 0 (0.0%)
Finished successfully.

Thu Jan 21 12:53:43 2021 Done.
Thu Jan 21 12:53:43 2021 Clustering unnanotated proteins at UniRef90 level...['mmseqs', 'createdb', 'trash1/tmp/unannotated/unannotated_90.faa', 'trash1/tmp/mmseq/db/unannotated_90']
['mmseqs', 'cluster', 'trash1/tmp/mmseq/db/unannotated_90', 'trash1/tmp/mmseq/db_clustered/unannotated_90', 'trash1/tmp/mmseq/tmp', '-c', '0.8', '--min-seq-id', '0.9', '--threads', '6']
['mmseqs', 'createtsv', 'trash1/tmp/mmseq/db/unannotated_90', 'trash1/tmp/mmseq/db/unannotated_90', 'trash1/tmp/mmseq/db_clustered/unannotated_90', 'trash1/tmp/unannotated_90.clustered.tsv', '--threads', '6']

Thu Jan 21 12:53:44 2021 Done.
Thu Jan 21 12:53:44 2021 Clustering unnanotated proteins at UniRef50 level...['mmseqs', 'createdb', 'trash1/tmp/unannotated/unannotated_50.faa', 'trash1/tmp/mmseq/db/unannotated_50']
['mmseqs', 'cluster', 'trash1/tmp/mmseq/db/unannotated_50', 'trash1/tmp/mmseq/db_clustered/unannotated_50', 'trash1/tmp/mmseq/tmp', '-c', '0.8', '--min-seq-id', '0.5', '--threads', '6']
['mmseqs', 'createtsv', 'trash1/tmp/mmseq/db/unannotated_50', 'trash1/tmp/mmseq/db/unannotated_50', 'trash1/tmp/mmseq/db_clustered/unannotated_50', 'trash1/tmp/unannotated_50.clustered.tsv', '--threads', '6']

Thu Jan 21 12:53:47 2021 Done.
Thu Jan 21 12:53:47 2021 Reannotating genomes...
Thu Jan 21 12:56:00 2021 Done.
Thu Jan 21 12:56:00 2021 Writing PanPhlAn tsv...Traceback (most recent call last):
  File "/home/andrespara/anaconda3/envs/panphlan/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 47, in __init__
    self.stream = open(source, "r" + mode)
TypeError: expected str, bytes or os.PathLike object, not FakeHandle

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./panphlan_exporter.py", line 520, in <module>
    panphlan_exporter(args.input, args.tmp, args.output, args.clade_name, args.nprocs, args.db_path)
  File "./panphlan_exporter.py", line 501, in panphlan_exporter
    write_panphlan_tsv(inputdir, tmp_dir, ppa_outdir, clade_name, contigs_names_dict, contigs_names_dict_prokka, extend_pangenome)
  File "./panphlan_exporter.py", line 425, in write_panphlan_tsv
    for rec in GFF.parse(gff_file, limit_info=dict(gff_type = ['CDS'])):
  File "/home/andrespara/anaconda3/envs/panphlan/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 745, in parse
    for rec in parser.parse_in_parts(gff_files, base_dict, limit_info,
  File "/home/andrespara/anaconda3/envs/panphlan/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 322, in parse_in_parts
    for results in self.parse_simple(gff_files, limit_info, target_lines):
  File "/home/andrespara/anaconda3/envs/panphlan/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 343, in parse_simple
    for results in self._gff_process(gff_files, limit_info, target_lines):
  File "/home/andrespara/anaconda3/envs/panphlan/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 637, in _gff_process
    for out in self._lines_to_out_info(line_gen, limit_info, target_lines):
  File "/home/andrespara/anaconda3/envs/panphlan/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 699, in _lines_to_out_info
    fasta_recs = self._parse_fasta(FakeHandle(line_iter))
  File "/home/andrespara/anaconda3/envs/panphlan/lib/python3.8/site-packages/BCBio/GFF/GFFParser.py", line 560, in _parse_fasta
    return list(SeqIO.parse(in_handle, "fasta"))
  File "/home/andrespara/anaconda3/envs/panphlan/lib/python3.8/site-packages/Bio/SeqIO/__init__.py", line 607, in parse
    return iterator_generator(handle)
  File "/home/andrespara/anaconda3/envs/panphlan/lib/python3.8/site-packages/Bio/SeqIO/FastaIO.py", line 183, in __init__
    super().__init__(source, mode="t", fmt="Fasta")
  File "/home/andrespara/anaconda3/envs/panphlan/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py", line 51, in __init__
    if source.read(0) != "":
TypeError: read() takes 1 positional argument but 2 were given
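The two chained exceptions in the traceback can be reproduced in isolation. The sketch below is only an illustration of the pattern the traceback shows (an `open()` call on a non-path object, followed by a `read(0)` probe on a handle whose `read` accepts no size argument). The `FakeHandle` class here is a hypothetical stand-in written for this example, not the actual code from bcbio-gff or Biopython.

```python
class FakeHandle:
    """Hypothetical stand-in: a file-like wrapper whose read() takes no size."""

    def __init__(self, lines):
        self._lines = iter(lines)

    def read(self):
        return "".join(self._lines)


def probe_like_traceback(source):
    """Mimic the two steps visible in the traceback: open(), then read(0)."""
    try:
        return open(source, "r")
    except TypeError:
        # Not a path, so treat it as an open handle and probe it with read(0).
        return source.read(0)


message = None
try:
    probe_like_traceback(FakeHandle([">seq1\n", "ACGT\n"]))
except TypeError as exc:
    message = str(exc)
    print(message)
```

The first `TypeError` comes from passing a non-path object to `open()`, and the second from calling `read(0)` on a method defined without a size parameter, which matches the order of the errors in the log.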

YangJing-BIG · April 14, 2021, 4:05am

Hello drelo. Did you solve the problem yet? I ran into the same one…

leonard.dubois · April 14, 2021, 6:59am

Hello,

Sorry, I forgot to answer here. This is actually a well-known issue caused by the BioPython version used. It should not be newer than 1.76.

Check this issue for more details: "polymut.py error reading gff file ?" (Issue #4, SegataLab/cmseq on GitHub)
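
A quick way to check whether an environment is affected is to compare the installed BioPython version against 1.76. This is a minimal sketch: the helper names are made up for illustration, and it assumes the package is registered under the distribution name `biopython`.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a string like '1.76' into (1, 76), ignoring any trailing suffix."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:2])


def biopython_ok(version, newest_allowed=(1, 76)):
    """Return True if the version is not newer than the allowed one."""
    return version_tuple(version) <= newest_allowed


if __name__ == "__main__":
    try:
        installed = metadata.version("biopython")
    except metadata.PackageNotFoundError:
        print("biopython is not installed in this environment")
    else:
        verdict = "ok" if biopython_ok(installed) else "newer than 1.76"
        print(f"biopython {installed}: {verdict}")
```

If the check reports a newer version, pinning the package in the environment (for example `biopython=1.76` with conda) is one way to get back to a version at or below the limit.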

Best regards
Léonard

drelo · April 14, 2021, 10:47am

Yeah, I don’t remember how I solved it, but I managed to build a pangenome with this tool. I needed PanPhlAn installed on a cluster to use this custom pangenome. It was only installed last week, so I will check if everything works fine. Thanks for the link, I will check it!
