NCBI .fasta files have one uninterrupted nucleotide sequence, but the demos are broken up every 100 nucleotides

Long-time reader, first-time poster. I'm trying to run HUMAnN2 on a .fasta file from NCBI, but their files all have the nucleotides in one long, uninterrupted sequence, while the demos provided have theirs broken up in increments of 100 nucleotides. I was able to run HUMAnN2 successfully on demo.fasta, but when I tried to run it on one of these genomes I got a critical error about Bowtie2 and memory allocation. I think this may be because the .fasta file was not broken up into chunks like the demos. Is that my issue, in which case I just need to write a script to split it into these chunks? Or should HUMAnN2 be able to handle this type of .fasta file, meaning I'm getting the error for some other reason? Thanks!

The files you downloaded from NCBI sound like microbial GENOMES, whereas HUMAnN2 is designed for working on metagenomes (i.e. collections of many short DNA sequences sampled from a variety of microbes, similar to the format of the demo file). If you're interested in functionally annotating a new genome, you might want to look at something like Prokka instead.
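If you're not sure which kind of file you have, a quick check is to count the records (lines starting with `>`) and look at their lengths. In FASTA, line breaks inside a record are just formatting, so the important difference is whether there is one very long sequence (a genome) or many short ones (reads, like the demo). Below is a minimal Python sketch, assuming plain uncompressed FASTA. The helper names `read_fasta` and `summarize` and the toy sequences are hypothetical, made up for illustration. This is not part of HUMAnN2.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle: TextIO) -> Tuple[int, int, int]:
    """Return (number of records, shortest length, longest length)."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    if not lengths:
        return 0, 0, 0
    return len(lengths), min(lengths), max(lengths)


if __name__ == "__main__":
    # Hypothetical toy data: one long record vs. several short records.
    genome_like = io.StringIO(">chromosome\n" + "ACGT" * 50 + "\n")
    reads_like = io.StringIO("".join(f">read{i}\nACGTACGTAC\n" for i in range(3)))
    print("genome-like:", summarize(genome_like))  # (1, 200, 200)
    print("reads-like:", summarize(reads_like))    # (3, 10, 10)
```

For a real file you would pass `open("your_file.fasta")` instead of the `StringIO` objects. A single record that is hundreds of thousands of nucleotides long or more points to a genome, which is a job for a genome annotator rather than HUMAnN2; chopping it into 100-nucleotide lines would not change that.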
