I am trying to figure out if I can get with humann3 descent info out from shallow shotgun metagenomics sequencing data of human stool samples, Novaseq paired-end 250 bp reads . For now it looks like I have high rates of unmapped reads.
I’m using humann v3.6, with MetaPhlAn version 4.0.3.
Command on the server to run concatenated kneaddata cleaned reads:
sbatch --ntasks=1 --cpus-per-task=5 -p shared-cpu -t 02:15:00 --mem=30000
humann --threads 5 --input cat_reads/MCS.fastq --output humann3_out_MCS/
The sequencing depth ranges between 43000 reads (neg controls) to 5M reads per pair of reads, so in the end up to 10M reads per sample. This is a pilot of 40 samples, where we aimed to have ~2M reads per sample. Median per sample in total is 4327477 reads currently. After removing human reads etc with kneaddata median per sample 3834510.
After running humann, taking into consideration only samples that had in total (both read pairs) at least 2M reads, I have an median of 63.2% unmapped in the gene families file, ranging from 36 to 94 %.
I was thinking that the high percentage of unmapped could be linked to low read numbers, but there seems to be no clear correlation:
When I blasted (blastx) some of the diamond unaligned reads, I had various results, ranging from some reads not mapping anywhere, but many mapping just partially - as my reads are rather long 250bp :
I was wondering if I might have too long reads?? Or is it still potentially a problem of not having enough depth (though I saw somewhere mentioned that even with 1M you could get some results…) ?
Any advice on how to increase the mapping rate? Any advice on analyzing shallow shotgun data with humann3 - maybe changing some parameters would make sense?
Thank you very much in advance!!
And have lovely holidays