Help with PanPhlAn tutorial

sapphi · April 27, 2020, 7:34pm

Hi,

I am following the PanPhlAn tutorial online and ran into an issue I can’t solve (https://github.com/SegataLab/panphlan/wiki/Tutorial#4-build-bowtie2-indexes). I am at step 4 (Build Bowtie2 indexes) but getting an error.

Error: Please provide input files, either (--fna & --ffn), (--gff & --fna), or --gff alone.

I am confused because the tutorial doesn’t mention --gff files or --ffn files.

Thanks

leonard.dubois · April 28, 2020, 2:37pm

Hello,

There are currently two scripts with similar names: panphlan_pangenome_generation.py and panphlan_new_pangenome_generation.py.

The first one comes from the previous version, where users provide the fasta and annotation files and generate the pangenome themselves (using the USEARCH software).
The second script assumes that you already have the pangenome and only need the Bowtie2 indexes: for the next step (mapping with panphlan_map.py), one only needs to generate the Bowtie2 indexes from the fna files.

I confess this isn’t the best option anyway. We are currently working on a full pangenome database, providing fasta, annotation and Bowtie2 indexes, that should be available soon.
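
The index-building step mentioned above can be sketched as a small Python wrapper around `bowtie2-build`. This is only an illustration, not the code PanPhlAn itself runs: the function name `build_bowtie2_index`, the `dry_run` switch and the index base name are assumptions made for the example. It relies only on the documented `bowtie2-build` behaviour of taking a comma-separated list of FASTA files followed by an index base name.

```python
import subprocess
from pathlib import Path


def build_bowtie2_index(fna_dir, index_base, dry_run=True):
    """Build (or just return) the bowtie2-build command for all .fna files in fna_dir."""
    fna_files = sorted(str(p) for p in Path(fna_dir).glob("*.fna"))
    if not fna_files:
        raise FileNotFoundError(f"no .fna files found in {fna_dir}")
    # bowtie2-build expects a comma-separated list of FASTA files, then the index base name
    cmd = ["bowtie2-build", ",".join(fna_files), str(index_base)]
    if not dry_run:
        subprocess.run(cmd, check=True)  # requires bowtie2-build on the PATH
    return cmd
```

Calling `build_bowtie2_index("reference_genomes", "panphlan_39491")` only returns the command so it can be inspected; passing `dry_run=False` runs it. The base name `panphlan_39491` is a guess at what panphlan_map.py expects, so it should be checked against the tutorial.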

sapphi · April 29, 2020, 9:35am

Thank you. I am quite new to bioinformatics so was struggling a bit. I have the new version of panphlan so can I create my own pangenome with fasta files and annotation files?

leonard.dubois · April 30, 2020, 9:45am

Hello,

Creating your own pangenome with fasta and annotation files is actually part of the “old code” (panphlan_pangenome_generation.py) from PanPhlAn 1.2. For that you’ll need the USEARCH software for clustering (https://www.drive5.com/usearch/).

sapphi · May 1, 2020, 9:46am

Hi,

Thank you. I have just realized that I was using the 1.2 version so I should be able to make my own pangenome. Also, the panphlan_erectale16_pangenome.csv export from ChocoPhlan looks different to the panphlan_39491_pangenome.csv in the tutorial. Is there a reason one shows the UniProtID and the other the locus tags instead?

CK_zhu · May 2, 2020, 12:20pm

Hi, I have been working through the PanPhlAn example.

I downloaded 14 fna files and put them into reference_genomes/:

GCA_000209935.fna
GCA_000209955.fna
GCA_001404855.fna
GCA_001405295.fna
GCA_001406375.fna
GCA_001406835.fna
GCA_003122495.fna
GCA_003436035.fna
GCA_003436785.fna
GCA_003438175.fna
GCA_003438365.fna
GCA_003438715.fna
GCA_003438925.fna
GCA_003438965.fna

When I run the following command I get an error. Could you please provide the panphlan_39491_pangenome.csv file?

./panphlan/panphlan_new_pangenome_generation.py \
	-c 39491 --i_fna reference_genomes/ -o . --verbose

(panphlan) [ckzhu@vm-login02 panphlan]$ ./panphlan/panphlan_new_pangenome_generation.py \
> -c 39491 --i_fna reference_genomes/ -o . --verbose

 Error: Pangenome file (reference_genomes/panphlan_39491_pangenome.csv) not found

I also tried the old code:

(panphlan) [ckzhu@vm-login02 panphlan]$ ./panphlan/panphlan_pangenome_generation.py -c 39491 --i_fna reference_genomes/ -o . --verbose

PanPhlAn pangenome generation version 1.2.3.7
Python version: 3.7.3
System: linux
./panphlan/panphlan_pangenome_generation.py -c 39491 --i_fna reference_genomes/ -o . --verbose

[I] Input genome FNA folder: /public/home/sample_lib/ckzhu/software/panphlan/reference_genomes/
[I] Species database name: 39491
[I] Identity threshold percentage: 95.0 %.
[I] Output folder: ./
[I] Temporary folder: TMP_panphlan_db/

STEP 1. Checking required software installations ...
[I] Bowtie2 is installed, version: 2.4.1, path: /public/home/sample_lib/ckzhu/miniconda3/envs/panphlan/bin/bowtie2
[I] Usearch v.7 is installed, version: v7.0.1090_i86linux32, path: /public/home/sample_lib/ckzhu/software/panphlan/usearch7

STEP 2. Prepare input gene and genome files ...
Traceback (most recent call last):
  File "./panphlan/panphlan_pangenome_generation.py", line 1341, in <module>
    main()
  File "./panphlan/panphlan_pangenome_generation.py", line 1294, in main
    path_genome_fna_files, path_gene_ffn_files = check_genomes(args['i_ffn'], args['i_fna'], VERBOSE)
  File "./panphlan/panphlan_pangenome_generation.py", line 1089, in check_genomes
    genefiles   = [f for f in os.listdir(ffn_folder) if fnmatch(f,'*.'+FFN)]
NotADirectoryError: [Errno 20] Not a directory: False

Is there something else I need to download?
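
A note on the traceback above: `os.listdir` was handed the value `False` from `args['i_ffn']`, which suggests that no folder of gene (.ffn) files was passed to the old script. A pre-flight check along the following lines can catch that before the script is run. It is a hypothetical helper, not part of PanPhlAn, and it assumes that each genome's .fna and .ffn files share the same base name.

```python
import os
from fnmatch import fnmatch


def check_genome_folders(fna_folder, ffn_folder):
    """Return genome names that have a .fna file but no matching .ffn file."""
    for label, folder in (("fna", fna_folder), ("ffn", ffn_folder)):
        # An omitted option arrives as False/None, which cannot be used as a folder path
        if not isinstance(folder, (str, os.PathLike)) or not os.path.isdir(folder):
            raise ValueError(f"{label} folder is missing or not a directory: {folder!r}")
    # Assumption: files are paired by base name, e.g. GCA_000209935.fna / GCA_000209935.ffn
    genomes = {os.path.splitext(f)[0] for f in os.listdir(fna_folder) if fnmatch(f, "*.fna")}
    genes = {os.path.splitext(f)[0] for f in os.listdir(ffn_folder) if fnmatch(f, "*.ffn")}
    return sorted(genomes - genes)
```

An empty list means every genome has a gene file; a `ValueError` means one of the two folders was not given at all, which appears to be the situation in the traceback.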

leonard.dubois · May 11, 2020, 9:50am

Hello,

I answered your question here: https://github.com/SegataLab/panphlan/issues/3
