Hi all,
I’m trying to run HUMAnN 3.0 without any translated protein search.
Here are the commands I used:
$ cat Sample1_R1.trimmed.fastq.gz Sample1_R2.trimmed.fastq.gz > Sample1_trimmed.fastq.gz
$ humann --input Sample1_trimmed.fastq.gz --output /shotgun_seq/humann3 --threads 24 --bypass-translated-search
Then I retrieved the nucleotide sequences from the ChocoPhlAn database using the UniRef90 IDs listed in genefamilies.tsv.
However, some distinct UniRef90 IDs appear to have identical nucleotide sequences in the ChocoPhlAn database, for example UniRef90_A0A1A9P878 and UniRef90_A0A379QCR7 (details below).
Since the two sequences are identical, could you please tell me why these two gene families have different RPKs? Or did I do something wrong?
- genefamilies.tsv
UniRef90_A0A1A9P878|g__Klebsiella.s__Klebsiella_pneumoniae 34.5649582837
UniRef90_A0A379QCR7|g__Klebsiella.s__Klebsiella_pneumoniae 40.5098326496
- Sample1_trimmed_bowtie2_aligned.txt
$ grep 'UniRef90_A0A1A9P878\|UniRef90_A0A379QCR7' Sample1_trimmed_bowtie2_aligned.txt | grep 'g__Klebsiella.s__Klebsiella_pneumoniae' | cut -f 2 | sort | uniq -c
58 573__A0A1A9P878__recQ_2|k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Enterobacterales.f__Enterobacteriaceae.g__Klebsiella.s__Klebsiella_pneumoniae|UniRef90_A0A1A9P878|UniRef50_A0A1A9P878|1827
68 573__A0A379QCR7__recQ_2|k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Enterobacterales.f__Enterobacteriaceae.g__Klebsiella.s__Klebsiella_pneumoniae|UniRef90_A0A379QCR7|UniRef50_A0A2T1LC21|1827
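As a rough cross-check of the counts above, here is a small sketch that computes a naive RPK as reads divided by gene length in kilobases. It assumes the trailing 1827 in the headers is the gene length in nucleotides and that no alignment weighting or length adjustment is applied, which may not match what HUMAnN does internally. The function name naive_rpk is my own and is not part of HUMAnN.

```shell
# Naive RPK = aligned reads / (gene length in kb).
# Assumption: plain read counts and the full gene length, with no
# alignment weighting or length adjustment.
naive_rpk() {
  # $1 = number of aligned reads, $2 = gene length in nucleotides
  awk -v reads="$1" -v len="$2" 'BEGIN { printf "%.2f\n", reads / (len / 1000) }'
}

naive_rpk 58 1827   # UniRef90_A0A1A9P878 -> 31.75
naive_rpk 68 1827   # UniRef90_A0A379QCR7 -> 37.22
```

These naive values do not reproduce the reported 34.56 and 40.51 exactly, but they differ in the same direction as the read counts (58 versus 68).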
- g__Klebsiella.s__Klebsiella_pneumoniae.centroids.v296_201901.ffn.gz
>573__A0A1A9P878__recQ_2|k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Enterobacterales.f__Enterobacteriaceae.g__Klebsiella.s__Klebsiella_pneumoniae|UniRef90_A0A1A9P878|UniRef50_A0A1A9P878|1827
GTGGCACAGGCGGAAGTATTAAATCAGGAATCGCTGGCTAAGCAGGTTTTACAAGAGACC
TTCGGCTACCAGCAGTTCCGTCCTGGCCAGGAAACGATTATCGAGACGGCGCTCGAAGGC
CGGGACTGCCTGGTGGTCATGCCGACCGGTGGCGGCAAGTCGCTGTGCTATCAGGTGCCG
GCGCTGGTCATGGGCGGTCTGACGGTCGTGGTCTCACCGCTGATCTCGCTGATGAAGGAC
CAGGTCGATCAGCTGCTGGCCAACGGCGTGGCGGCGGCTTGTCTGAACTCGACGCAAAGC
CGCGAGCAGCAGCAGGAGGTGATGGCCGGCTGCCGCAGCGGGCAGGTTCGTCTGCTGTAT
ATCGCGCCGGAACGGCTGATGCTGGATAACTTTCTTGAGCATCTGGCGAACTGGAACCTG
GCGATGCTGGCGGTAGACGAGGCGCACTGTATCTCGCAGTGGGGCCATGACTTCCGTCCG
GAATATGCCGCGCTGGGCCAGCTGCGTCAGCGGATGCCGCAGATCCCGTTTATGGCGTTG
ACCGCCACCGCCGATGATACCACCCGCCGCGATATCGTCCGCCTGCTGGGGCTTAACGAT
CCGCTGATTCAGGTCAGCAGCTTCGACCGGCCAAACATCCGCTATATGCTGATGGAGAAA
TTCAAGCCGCTCGATCAGCTGATGCGCTACGTTCAGGATCAGCGCGGCAAATCGGGCATT
ATCTACTGCAACAGCCGTTCGAAAGTGGAAGACACCGCCGCCAGGCTGCAAAGCCGCGGT
ATTAGCGCGGCGGCTTACCATGCCGGTCTGGAAAACGACGTGCGCGCCGAGGTGCAGGAG
AAATTCCAGCGCGACGATCTGCAGATCGTGGTGGCGACGGTGGCCTTCGGGATGGGCATT
AACAAGCCGAACGTCCGCTTTGTGGTGCATTTTGATATTCCGCGCAATATAGAATCCTAC
TATCAGGAGACCGGCCGCGCCGGGCGTGATGGTCTGCCGGCGGAAGCGATGCTGTTTTAC
GATCCGGCGGATATGGCGTGGCTGCGCCGCTGTCTGGAAGAAAAACCCGCCGGGCCGCTA
CAGGATATCGAACGGCATAAGCTGAATGCGATGGGGGCGTTTGCCGAAGCGCAGACCTGT
CGCCGTCTGGTGCTGCTGAACTATTTTGGCGAAGGGCGTCAGGAGCCGTGCGGCAACTGC
GATATCTGTCTTGACCCGCCAAAGCAGTACGATGGCTTAATGGACGCCCGCAAGGCGCTT
TCAACGATTTACCGGGTCAATCAACGCTTCGGAATGGGTTACGTGGTGGAGGTCCTGCGC
GGGGCCAACAACCAGCGCATCCGAGAGATGGGCCACGATAAGCTGCCGGTTTACGGTATC
GGCCGGGAGCAAAGTCACGAGCACTGGGTGAGCGTGATCCGCCAGCTGATCCACCTTGGG
CTGGTGACGCAGAATATCGCCCAGCACTCCGCGCTGCAGCTGACCGAAGCCGCGCGACCG
GTGCTGCGTGGCGAAGTGCCGCTGCAGCTCGCCGTGCCGCGTATCGTGGCGCTGAAGCCA
AAGGCGATGCAGAAATCCTTTGGCGGCAATTACGACCGTAAACTGTTCGCCAAGCTGCGC
AAATTACGTAAAGCGATCGCCGACGAAGAGAACATCCCGCCATATGTGGTCTTCAACGAC
GCGACGCTTATCGAGATGGCCGAACAATCGCCGCTGACCGCCGGCGAAATGCTCAGCGTC
AACGGCGTGGGGACACGCAAGCTCGAGCGTTTCGGGAAGCCGTTTATGGCGCTGATCCGG
GCGCATGTTGATGGCGACGATGAGTAG
>573__A0A379QCR7__recQ_2|k__Bacteria.p__Proteobacteria.c__Gammaproteobacteria.o__Enterobacterales.f__Enterobacteriaceae.g__Klebsiella.s__Klebsiella_pneumoniae|UniRef90_A0A379QCR7|UniRef50_A0A2T1LC21|1827
GTGGCACAGGCGGAAGTATTAAATCAGGAATCGCTGGCTAAGCAGGTTTTACAAGAGACC
TTCGGCTACCAGCAGTTCCGTCCTGGCCAGGAAACGATTATCGAGACGGCGCTCGAAGGC
CGGGACTGCCTGGTGGTCATGCCGACCGGTGGCGGCAAGTCGCTGTGCTATCAGGTGCCG
GCGCTGGTCATGGGCGGTCTGACGGTCGTGGTCTCACCGCTGATCTCGCTGATGAAGGAC
CAGGTCGATCAGCTGCTGGCCAACGGCGTGGCGGCGGCTTGTCTGAACTCGACGCAAAGC
CGCGAGCAGCAGCAGGAGGTGATGGCCGGCTGCCGCAGCGGGCAGGTTCGTCTGCTGTAT
ATCGCGCCGGAACGGCTGATGCTGGATAACTTTCTTGAGCATCTGGCGAACTGGAACCTG
GCGATGCTGGCGGTAGACGAGGCGCACTGTATCTCGCAGTGGGGCCATGACTTCCGTCCG
GAATATGCCGCGCTGGGCCAGCTGCGTCAGCGGATGCCGCAGATCCCGTTTATGGCGTTG
ACCGCCACCGCCGATGATACCACCCGCCGCGATATCGTCCGCCTGCTGGGGCTTAACGAT
CCGCTGATTCAGGTCAGCAGCTTCGACCGGCCAAACATCCGCTATATGCTGATGGAGAAA
TTCAAGCCGCTCGATCAGCTGATGCGCTACGTTCAGGATCAGCGCGGCAAATCGGGCATT
ATCTACTGCAACAGCCGTTCGAAAGTGGAAGACACCGCCGCCAGGCTGCAAAGCCGCGGT
ATTAGCGCGGCGGCTTACCATGCCGGTCTGGAAAACGACGTGCGCGCCGAGGTGCAGGAG
AAATTCCAGCGCGACGATCTGCAGATCGTGGTGGCGACGGTGGCCTTCGGGATGGGCATT
AACAAGCCGAACGTCCGCTTTGTGGTGCATTTTGATATTCCGCGCAATATAGAATCCTAC
TATCAGGAGACCGGCCGCGCCGGGCGTGATGGTCTGCCGGCGGAAGCGATGCTGTTTTAC
GATCCGGCGGATATGGCGTGGCTGCGCCGCTGTCTGGAAGAAAAACCCGCCGGGCCGCTA
CAGGATATCGAACGGCATAAGCTGAATGCGATGGGGGCGTTTGCCGAAGCGCAGACCTGT
CGCCGTCTGGTGCTGCTGAACTATTTTGGCGAAGGGCGTCAGGAGCCGTGCGGCAACTGC
GATATCTGTCTTGACCCGCCAAAGCAGTACGATGGCTTAATGGACGCCCGCAAGGCGCTT
TCAACGATTTACCGGGTCAATCAACGCTTCGGAATGGGTTACGTGGTGGAGGTCCTGCGC
GGGGCCAACAACCAGCGCATCCGAGAGATGGGCCACGATAAGCTGCCGGTTTACGGTATC
GGCCGGGAGCAAAGTCACGAGCACTGGGTGAGCGTGATCCGCCAGCTGATCCACCTTGGG
CTGGTGACGCAGAATATCGCCCAGCACTCCGCGCTGCAGCTGACCGAAGCCGCGCGACCG
GTGCTGCGTGGCGAAGTGCCGCTGCAGCTCGCCGTGCCGCGTATCGTGGCGCTGAAGCCA
AAGGCGATGCAGAAATCCTTTGGCGGCAATTACGACCGTAAACTGTTCGCCAAGCTGCGC
AAATTACGTAAAGCGATCGCCGACGAAGAGAACATCCCGCCATATGTGGTCTTCAACGAC
GCGACGCTTATCGAGATGGCCGAACAATCGCCGCTGACCGCCGGCGAAATGCTCAGCGTC
AACGGCGTGGGGACACGCAAGCTCGAGCGTTTCGGGAAGCCGTTTATGGCGCTGATCCGG
GCGCATGTTGATGGCGACGATGAGTAG
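For anyone who wants to confirm that the two records above are identical without comparing them by eye, here is a minimal sketch. It assumes the .ffn.gz file has been decompressed to a plain FASTA file; the function name fasta_seq and the file name kp.ffn are my own placeholders, not anything provided by HUMAnN or ChocoPhlAn.

```shell
# Print the sequence of every record whose header contains the given text,
# joined into a single upper-case line. If several headers match, their
# sequences are concatenated, so use a specific pattern.
fasta_seq() {
  # $1 = text to look for in the FASTA header (e.g. a UniRef90 ID)
  # $2 = uncompressed FASTA file
  awk -v id="$1" '
    /^>/ { keep = (index($0, id) > 0); next }   # header line: keep this record?
    keep { printf "%s", toupper($0) }            # sequence line: append, no newline
    END  { print "" }
  ' "$2"
}

# Hypothetical usage (kp.ffn is a placeholder name):
# gzip -dc g__Klebsiella.s__Klebsiella_pneumoniae.centroids.v296_201901.ffn.gz > kp.ffn
# [ "$(fasta_seq 'UniRef90_A0A1A9P878|' kp.ffn)" = "$(fasta_seq 'UniRef90_A0A379QCR7|' kp.ffn)" ] && echo identical
```

The trailing `|` in the pattern keeps the match on the UniRef90 field of the header, so that the UniRef50 field of the first record (UniRef50_A0A1A9P878) is not what triggers the match.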