Hi,
I am trying to run HUMAnN v3.7 locally, using .sam files as the input.
I have ended up with .sam files because they were easier to transfer from my collaborator than the raw read files.
An example of the .sam content that I am using as the input is below:
@HD VN:1.5 SO:unsorted GO:query
@SQ SN:UniRef90_A0A0P9AHF6|1__10|SGB12765 LN:1500
@SQ SN:UniRef90_A0A1I5H7C1|1__7|SGB12765 LN:1050
@SQ SN:UniRef90_A0A0P9A6P7|1__6|SGB12765 LN:850
@SQ SN:UniRef90_A0A0P8ZFJ4|1__6|SGB12765 LN:900
@SQ SN:UniRef90_A0A0P8Z1L0|1__4|SGB12765 LN:550
@SQ SN:UniRef90_A0A1I5H9G0|1__4|SGB12765 LN:550
@SQ SN:UniRef90_A0A0P9B7F0|1__4|SGB12765 LN:550
@SQ SN:UniRef90_A0A0P8ZDG2|1__4|SGB12765 LN:600
@SQ SN:UniRef90_A0A0P9B864|1__3|SGB12765 LN:450
@SQ SN:UniRef90_A0A1I5EST1|1__23|SGB12765 LN:3450
@SQ SN:UniRef90_A0A0P9AVK5|1__18|SGB12765 LN:2650
@SQ SN:UniRef90_A0A0P9C085|1__18|SGB12765 LN:2700
@SQ SN:UniRef90_A0A1I5SP85|2__16|SGB12765 LN:2150
@SQ SN:UniRef90_A0A1I5HWZ6|1__15|SGB12765 LN:2250
@SQ SN:UniRef90_A0A1I5GIL6|1__15|SGB12765 LN:2250
@SQ SN:UniRef90_UNK12765-BNMMOHAH_04145|4__17|SGB12765 LN:2050
@SQ SN:UniRef90_A0A0P9DM60|1__14|SGB12765 LN:2100
@SQ SN:UniRef90_A0A1I5JFC0|1__13|SGB12765 LN:1950
@SQ SN:UniRef90_A0A0N8PD01|1__13|SGB12765 LN:1950
@SQ SN:UniRef90_A0A0P9CJ31|1__13|SGB12765 LN:1950
@SQ SN:UniRef90_A0A0P9BH31|1__13|SGB12765 LN:1950
@SQ SN:UniRef90_A0A0P9EXR7|1__12|SGB12765 LN:1800
@SQ SN:UniRef90_A0A0P9DRJ5|1__12|SGB12765 LN:1800
@SQ SN:UniRef90_A0A0P9AQ77|1__12|SGB12765 LN:1800
…
@SQ SN:VDB|0046-0165-0-0003|M801-c99-c0-c189 LN:10458
@SQ SN:VDB|0047-002F-0-0006|M801-c99-c0-c190 LN:10269
@SQ SN:VDB|001D-011E-0-0007|M801-c99-c0-c191 LN:8842
@SQ SN:VDB|001D-00C7-0-0008|M801-c99-c0-c192 LN:7159
@SQ SN:VDB|003B-0000-0-021D|M489-c9-c0-c0 LN:171541
@PG ID:bowtie2 PN:bowtie2 VN:2.5.1 CL:"/home/vinoy_ramachandran_nibsc_org/anaconda3/envs/meta/bin/bowtie2-align-l --wrapper basic-0 --seed 1992 --quiet --very-sensitive -x /home/vinoy_ramachandran_nibsc_org/anaconda3/envs/meta/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212 -p 56 --passthrough -U -"
VH00941:3:AAAWGTYHV:1:1101:32092:7967__1.163 16 UniRef90_A0A173XGA0|1__6|SGB102086 235 42 136M * 0 0 GTGGTCCCCAACCTCGACGCCCTGATCGACGCGAAGGACCTTGTCGGATTCGCACGCGTCGGTGTGAACGTCGAATTCGACGCATACGTGGCGCAAGGAGCCGATCCAGCCGAAGCGCGCATCGCCACCGCCCATC CCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCC;;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;;C;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:136 YT:Z:UU
VH00941:3:AAAWGTYHV:1:1101:20598:6282__1.221 16 VDB|000E-000C-0-0002|M892-c3062-c0-c0 16330 1 128M * 0 0 TGTCAATAAGCCTTTCAGCCCATCCAAAATGACACGAGCTGAGGCGAGGGAGGCATATCCGGAGTGGTATGAGCGTGTTGTTGTGAGAGGCGAGAAAGGACGCAAGAAGTGGGATATTGCCGGAAAGG CCCCCCCCC;CCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCC;CCCCCCCCCC;CCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCC AS:i:0 XS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:128 YT:Z:UU
VH00941:3:AAAWGTYHV:1:1101:38682:7664__1.6 0 UniRef90_A0A1C6J9P1|1__5|SGB5090 544 42 124M * 0 0 AATGCCAGGGTAATTAAAGAGGTGGCATCAGAAATTCCGCTTTCTTCGATTGTGCTTGAGACAGATAGTCCATATCTGGCACCTGTGCCATATCGCGGAAAACGTAATAATTCAATGTATTTAA CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;C;CCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCC;-CCCCCCCCCCCCCCCCCCCCCCCC AS:i:-10 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:65C23G34 YT:Z:UU
VH00941:3:AAAWGTYHV:1:1101:24177:6377__1.269 16 UniRef90_A0A291TDJ7|1__6|SGB15318 46 42 124M * 0 0 GCTCTGCGGCGGCAGACCGCCCGCGCCAATGCCGAGGATGCCCGGCTGGAGGCGGAGGCCGCCATCCCGGCCCTGCGCCACGCTGAGGACGAGGTGCGGGTGCGGGGCATTCGCTGCGCGCTGG CCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCC;;CC-CCCCCCCCCCCCCCCCCCCCCC;CCC;CCCCC AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:110C13 YT:Z:UU
VH00941:3:AAAWGTYHV:1:1101:17133:7967__1.159 0 UniRef90_UNK4798-ONLMMPNP_01778|2__7|SGB4798 574 42 131M * 0 0 TGCCTGTGAGGGTGTGGCAAAACCCCTGAAGGAAGAACTGGAAGACTTTGAGATGTATCGGCGCTATATGTATGAACTCTGTGACATGGGTTACTGGTGCATACTGGAAAAGGCTTCCGGGGAAATTATAG CCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCC;CCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:131 YT:Z:UU
…
VH00941:5:AAAY3FVHV:2:2614:26033:49275__1.65851449 16 UniRef90_A0A1C5R1I3|1__5|SGB4828 525 0 136M * 0 0 CGTAACAACACTGATCGGTGTATTTATCATGATGCTTTCCATCAACGTGTGGATGACACTGGCAGCAGTGCTGATCCTGCCGGTTTCCATGCTTATCATTAATAAAGTAATGAAACACTCCCAGAAATATTTCCAG CCCCCCCCCCCCC;CCCCC;CC;CCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCC AS:i:-80 XN:i:0 XM:i:16 XO:i:0 XG:i:0 NM:i:16 MD:Z:0T2G17C14G2T5T2A18C1T0T28C2C2G8G2T11C6 YT:Z:UU
VH00941:5:AAAY3FVHV:2:2614:53963:50505__1.65850929 16 UniRef90_R6A1I9|1__6|SGB9262 514 42 135M * 0 0 ATCGCCAATGCTTCTCCGTTTGAAGAAGGAAAGGAAGAACAGCGGTTAAATATGGTGCGCCGCCATGTGAACGCCATCGGAATGGACGCCTGCTACGTCAATATGTGCGGAGCGCAGGATGAAATCGTTTTTGAC CCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCC-CCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:135 YT:Z:UU
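In case it helps with diagnosis, below is a minimal Python sketch of how the alignment records in a file like this could be tallied: how many reads carry the "unmapped" flag, and which kinds of reference names (for example `UniRef90_...` or `VDB|...`) the mapped reads hit. The function name `summarise_sam` is my own and not part of HUMAnN, and the sketch assumes the file on disk is tab-delimited as the SAM format requires (the tabs appear as spaces in the excerpt above).

```python
import re
from collections import Counter


def summarise_sam(lines):
    """Return (mapped, unmapped, prefix_counts) for an iterable of SAM lines.

    summarise_sam is a hypothetical helper, not a HUMAnN function.
    """
    mapped = 0
    unmapped = 0
    prefixes = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Header lines start with "@"; read names cannot start with it.
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname = fields[2]
        # Bit 0x4 of FLAG marks an unmapped read; RNAME is then "*".
        if flag & 0x4 or rname == "*":
            unmapped += 1
            continue
        mapped += 1
        # Keep only the text before the first "_" or "|" of the reference name.
        prefixes[re.split(r"[_|]", rname, maxsplit=1)[0]] += 1
    return mapped, unmapped, prefixes


# Example use on a file (the path is a placeholder):
# with open("my_sample.sam") as handle:
#     print(summarise_sam(handle))
```

This only describes what is in the .sam file itself; it says nothing about how HUMAnN interprets the reference names.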
When I run this .sam file through HUMAnN, I get the three output files (as well as various temporary files):
605_P001-baseline_S1_clean_combined.fastq_genefamilies.tsv
605_P001-baseline_S1_clean_combined.fastq_bowtie2_aligned.tsv
605_P001-baseline_S1_clean_combined.fastq_pathabundance.tsv
605_P001-baseline_S1_clean_combined.fastq_pathcoverage.tsv (108 Bytes)
My assumption is that, because everything is coming out as unmapped, HUMAnN is unable to recognise the .sam format I am using as the input. Does that mean I need to run it from the raw reads instead?
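One thing that can be checked directly from the file is which Bowtie2 index the alignments were made against, because the `@PG` header line records the command that was run. The sketch below is a minimal, hypothetical helper (the name `bowtie2_index_from_header` is mine) that pulls out the value passed to `-x`. It assumes a tab-delimited header and an index path without spaces. For my file, the `@PG` line shown above points at `mpa_vOct22_CHOCOPhlAnSGB_202212` inside the MetaPhlAn database folder.

```python
def bowtie2_index_from_header(lines):
    """Return the path given to bowtie2's -x option in the @PG header, or None.

    bowtie2_index_from_header is a hypothetical helper, not a HUMAnN function.
    """
    for line in lines:
        if not line.startswith("@"):
            break  # the header ends at the first non-header line
        if not line.startswith("@PG"):
            continue
        # Header fields look like TAG:VALUE and are separated by tabs.
        fields = line.rstrip("\n").split("\t")[1:]
        tags = dict(f.split(":", 1) for f in fields if ":" in f)
        if "CL" not in tags:
            continue
        # The command line is usually wrapped in double quotes; drop them
        # and split on whitespace (assumes no spaces inside the paths).
        args = tags["CL"].strip('"').split()
        if "-x" in args:
            position = args.index("-x")
            if position + 1 < len(args):
                return args[position + 1]
    return None


# Example use on a file (the path is a placeholder):
# with open("my_sample.sam") as handle:
#     print(bowtie2_index_from_header(handle))
```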
I have run humann_test, which completes without errors, and I have also run the demo.sam, with the following output:
demo_genefamilies.tsv
demo_bowtie2_aligned.tsv
demo_pathabundance.tsv
demo_pathcoverage.tsv (71 Bytes)
So I am hoping I have installed HUMAnN correctly.
Apologies, I am probably just being dense here, but I am a complete novice at all of this and, having spent quite a while trying to find a solution myself, I thought it best to seek expert advice and guidance.
Many thanks in advance,
Blair