NCBI .fasta files are nucleotide sequence uninterrupted, but demos are broken up every 100

jfoldi · January 22, 2020, 1:56am

Long-time reader, first-time poster. I’m trying to run HUMAnN2 on a .fasta file from NCBI, but their files all have the nucleotides in one long uninterrupted sequence, while the provided demos have theirs broken up into lines of 100 nucleotides. I was able to run HUMAnN2 successfully on demo.fasta, but when I tried to run it on one of these genomes I got a critical error about Bowtie2 and memory allocation. I think this may be because the .fasta file was not broken up into chunks like the demos. Is that my issue, in which case I just need to write a script to break the sequence into chunks? Or should I be able to run HUMAnN2 on this type of .fasta file, and I’m getting the error for some other reason? Thanks!

franzosa · January 22, 2020, 10:24pm

The files you downloaded from NCBI sound like microbial GENOMES, whereas HUMAnN2 is designed for working on metagenomes (i.e. collections of many short DNA sequences sampled from a variety of microbes, similar to the format of the demo file). If you’re interested in functionally annotating a new genome, you might want to look at something like Prokka instead.
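
For a quick check of which case a given file falls into, here is a minimal Python sketch. It is not part of HUMAnN2, and the function names `fasta_lengths` and `summarize` are made up for illustration. It counts records and sequence lengths while ignoring line wrapping, on the assumption that what matters is how many sequences the file holds and how long they are, not where the line breaks fall. A genome download would typically show a handful of very long records, while a metagenome would show many short ones.

```python
from io import StringIO


def fasta_lengths(handle):
    """Yield (header, sequence_length) per record, ignoring line wrapping."""
    header, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, length  # emit the record that just ended
            header, length = line[1:], 0
        elif header is not None:
            length += len(line)  # wrapped lines add up to one sequence
    if header is not None:
        yield header, length


def summarize(handle):
    """Return record count, longest record, and total bases for a FASTA handle."""
    lengths = [n for _, n in fasta_lengths(handle)]
    return {
        "records": len(lengths),
        "longest": max(lengths, default=0),
        "total": sum(lengths),
    }


# Toy data (made up): the same sequence, wrapped and unwrapped.
wrapped = ">seq1\nACGT\nACGT\n"
unwrapped = ">seq1\nACGTACGT\n"

print(summarize(StringIO(wrapped)))
print(summarize(StringIO(unwrapped)))
```

To run it on a real file, pass an open file handle instead of the `StringIO` objects, e.g. `with open("my_file.fasta") as fh: print(summarize(fh))` (the filename here is a placeholder).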
