Questions about the command in Step 1 of Example 02

Hi! I’m using PhyloPhlAn3 to construct a phylogenetic tree from my genomes. I followed the instructions for Example 02 on GitHub, but ran into some problems.

After I ran the Step 1 command, it did not seem to stop. It looks like all the files were downloaded, yet the command kept running. Am I using the command incorrectly, or should I stop it with Ctrl+C? Any suggestions would be appreciated.

By the way, I ran the command twice, and the logs are as follows:

phylophlan_get_reference.py version 3.0.18 (27 November 2020)

Command line: /home/boot/anaconda3/envs/phylophlan/bin/phylophlan_get_reference -g all -o input_genomes/ -n 1 --verbose

Arguments: {'get': 'all', 'list_clades': False, 'database_update': False, 'output_file_extension': '.fna.gz', 'output': 'input_genomes/', 'how_many': 1, 'genbank_mapping': 'assembly_summary_genbank.txt', 'verbose': True}
Downloading "http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/taxa2genomes.txt" to "taxa2genomes.txt"
Downloading file of size: 0.00 MB
0.01 MB 1500.37 %  64.89 MB/sec  0 min -0 sec
Downloading "http://cmprod1.cibio.unitn.it/databases/PhyloPhlAn/taxa2genomes_cpa201901_up201901.txt.bz2" to "taxa2genomes_cpa201901_up201901.txt.bz2"
Downloading file of size: 0.50 MB
0.50 MB 100.54 %   0.59 MB/sec  0 min -0 sec
Creating output folder "input_genomes/"
Downloading "https://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_genbank.txt" to "assembly_summary_genbank.txt"
Downloading file of size: 1108.72 MB
1108.72 MB 100.00 %  11.52 MB/sec  0 min -0 sec



phylophlan_get_reference.py version 3.0.18 (27 November 2020)

Command line: /home/boot/anaconda3/envs/phylophlan/bin/phylophlan_get_reference -g all -o input_genomes/ -n 1 --verbose

Arguments: {'get': 'all', 'list_clades': False, 'database_update': False, 'output_file_extension': '.fna.gz', 'output': 'input_genomes/', 'how_many': 1, 'genbank_mapping': 'assembly_summary_genbank.txt', 'verbose': True}
File "taxa2genomes.txt" present
File "taxa2genomes_cpa201901_up201901.txt.bz2" present
Output folder "input_genomes/" present
File "assembly_summary_genbank.txt" present

Hi there, the portion of the log you provided only shows the download of the reference files needed to retrieve the isolate genomes.
From the command line, you are trying to download 1 genome for every species, so that is going to take quite a bit of time. I believe the command will print the genome currently being downloaded after the lines you provided; is that the case?
If so, I’m afraid you will have to wait for the download to finish if you need one representative genome for every taxonomic species, since there are more than 12k of them.
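
One way to check that the download is still making progress is to count the genome files that have arrived so far. The snippet below is only a rough sketch: `count_downloaded` is a hypothetical helper (not part of PhyloPhlAn), and it assumes the genomes are written somewhere under `input_genomes/` with the `.fna.gz` extension shown in the `output_file_extension` argument of the log.

```shell
#!/usr/bin/env bash
# count_downloaded is a hypothetical helper, not part of PhyloPhlAn.
# It counts files ending in .fna.gz under the given folder, searching
# recursively because the exact folder layout is an assumption here.
count_downloaded() {
    local dir="${1:-input_genomes/}"
    # A missing folder is reported as 0 files rather than as an error.
    find "$dir" -type f -name '*.fna.gz' 2>/dev/null | wc -l | tr -d ' '
}

# Example: print the count once per minute; stop the loop with Ctrl+C.
# while true; do count_downloaded input_genomes/; sleep 60; done
```

If the number keeps growing between checks, the command is most likely still downloading and can be left running.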

I hope this helps,
Francesco