Issues using .sam input files

Hi,

I am trying to run humann v3.7 locally using .sam files as the input.
I’ve ended up with the .sam output because these files were easier to transfer from my collaborator than the raw read files.

Example of the .sam output (that I am using as the input) below:
@HD VN:1.5 SO:unsorted GO:query
@SQ SN:UniRef90_A0A0P9AHF6|1__10|SGB12765 LN:1500
@SQ SN:UniRef90_A0A1I5H7C1|1__7|SGB12765 LN:1050
@SQ SN:UniRef90_A0A0P9A6P7|1__6|SGB12765 LN:850
@SQ SN:UniRef90_A0A0P8ZFJ4|1__6|SGB12765 LN:900
@SQ SN:UniRef90_A0A0P8Z1L0|1__4|SGB12765 LN:550
@SQ SN:UniRef90_A0A1I5H9G0|1__4|SGB12765 LN:550
@SQ SN:UniRef90_A0A0P9B7F0|1__4|SGB12765 LN:550
@SQ SN:UniRef90_A0A0P8ZDG2|1__4|SGB12765 LN:600
@SQ SN:UniRef90_A0A0P9B864|1__3|SGB12765 LN:450
@SQ SN:UniRef90_A0A1I5EST1|1__23|SGB12765 LN:3450
@SQ SN:UniRef90_A0A0P9AVK5|1__18|SGB12765 LN:2650
@SQ SN:UniRef90_A0A0P9C085|1__18|SGB12765 LN:2700
@SQ SN:UniRef90_A0A1I5SP85|2__16|SGB12765 LN:2150
@SQ SN:UniRef90_A0A1I5HWZ6|1__15|SGB12765 LN:2250
@SQ SN:UniRef90_A0A1I5GIL6|1__15|SGB12765 LN:2250
@SQ SN:UniRef90_UNK12765-BNMMOHAH_04145|4__17|SGB12765 LN:2050
@SQ SN:UniRef90_A0A0P9DM60|1__14|SGB12765 LN:2100
@SQ SN:UniRef90_A0A1I5JFC0|1__13|SGB12765 LN:1950
@SQ SN:UniRef90_A0A0N8PD01|1__13|SGB12765 LN:1950
@SQ SN:UniRef90_A0A0P9CJ31|1__13|SGB12765 LN:1950
@SQ SN:UniRef90_A0A0P9BH31|1__13|SGB12765 LN:1950
@SQ SN:UniRef90_A0A0P9EXR7|1__12|SGB12765 LN:1800
@SQ SN:UniRef90_A0A0P9DRJ5|1__12|SGB12765 LN:1800
@SQ SN:UniRef90_A0A0P9AQ77|1__12|SGB12765 LN:1800

@SQ SN:VDB|0046-0165-0-0003|M801-c99-c0-c189 LN:10458
@SQ SN:VDB|0047-002F-0-0006|M801-c99-c0-c190 LN:10269
@SQ SN:VDB|001D-011E-0-0007|M801-c99-c0-c191 LN:8842
@SQ SN:VDB|001D-00C7-0-0008|M801-c99-c0-c192 LN:7159
@SQ SN:VDB|003B-0000-0-021D|M489-c9-c0-c0 LN:171541
@PG ID:bowtie2 PN:bowtie2 VN:2.5.1 CL:"/home/vinoy_ramachandran_nibsc_org/anaconda3/envs/meta/bin/bowtie2-align-l --wrapper basic-0 --seed 1992 --quiet --very-sensitive -x /home/vinoy_ramachandran_nibsc_org/anaconda3/envs/meta/lib/python3.7/site-packages/metaphlan/metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212 -p 56 --passthrough -U -"
VH00941:3:AAAWGTYHV:1:1101:32092:7967__1.163 16 UniRef90_A0A173XGA0|1__6|SGB102086 235 42 136M * 0 0 GTGGTCCCCAACCTCGACGCCCTGATCGACGCGAAGGACCTTGTCGGATTCGCACGCGTCGGTGTGAACGTCGAATTCGACGCATACGTGGCGCAAGGAGCCGATCCAGCCGAAGCGCGCATCGCCACCGCCCATC CCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCC;;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;;C;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:136 YT:Z:UU
VH00941:3:AAAWGTYHV:1:1101:20598:6282__1.221 16 VDB|000E-000C-0-0002|M892-c3062-c0-c0 16330 1 128M * 0 0 TGTCAATAAGCCTTTCAGCCCATCCAAAATGACACGAGCTGAGGCGAGGGAGGCATATCCGGAGTGGTATGAGCGTGTTGTTGTGAGAGGCGAGAAAGGACGCAAGAAGTGGGATATTGCCGGAAAGG CCCCCCCCC;CCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCC;CCCCCCCCCC;CCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCC AS:i:0 XS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:128 YT:Z:UU
VH00941:3:AAAWGTYHV:1:1101:38682:7664__1.6 0 UniRef90_A0A1C6J9P1|1__5|SGB5090 544 42 124M * 0 0 AATGCCAGGGTAATTAAAGAGGTGGCATCAGAAATTCCGCTTTCTTCGATTGTGCTTGAGACAGATAGTCCATATCTGGCACCTGTGCCATATCGCGGAAAACGTAATAATTCAATGTATTTAA CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;C;CCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCC;-CCCCCCCCCCCCCCCCCCCCCCCC AS:i:-10 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:65C23G34 YT:Z:UU
VH00941:3:AAAWGTYHV:1:1101:24177:6377__1.269 16 UniRef90_A0A291TDJ7|1__6|SGB15318 46 42 124M * 0 0 GCTCTGCGGCGGCAGACCGCCCGCGCCAATGCCGAGGATGCCCGGCTGGAGGCGGAGGCCGCCATCCCGGCCCTGCGCCACGCTGAGGACGAGGTGCGGGTGCGGGGCATTCGCTGCGCGCTGG CCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCC;;CC-CCCCCCCCCCCCCCCCCCCCCC;CCC;CCCCC AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:110C13 YT:Z:UU
VH00941:3:AAAWGTYHV:1:1101:17133:7967__1.159 0 UniRef90_UNK4798-ONLMMPNP_01778|2__7|SGB4798 574 42 131M * 0 0 TGCCTGTGAGGGTGTGGCAAAACCCCTGAAGGAAGAACTGGAAGACTTTGAGATGTATCGGCGCTATATGTATGAACTCTGTGACATGGGTTACTGGTGCATACTGGAAAAGGCTTCCGGGGAAATTATAG CCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCC;CCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:131 YT:Z:UU

VH00941:5:AAAY3FVHV:2:2614:26033:49275__1.65851449 16 UniRef90_A0A1C5R1I3|1__5|SGB4828 525 0 136M * 0 0 CGTAACAACACTGATCGGTGTATTTATCATGATGCTTTCCATCAACGTGTGGATGACACTGGCAGCAGTGCTGATCCTGCCGGTTTCCATGCTTATCATTAATAAAGTAATGAAACACTCCCAGAAATATTTCCAG CCCCCCCCCCCCC;CCCCC;CC;CCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCC AS:i:-80 XN:i:0 XM:i:16 XO:i:0 XG:i:0 NM:i:16 MD:Z:0T2G17C14G2T5T2A18C1T0T28C2C2G8G2T11C6 YT:Z:UU
VH00941:5:AAAY3FVHV:2:2614:53963:50505__1.65850929 16 UniRef90_R6A1I9|1__6|SGB9262 514 42 135M * 0 0 ATCGCCAATGCTTCTCCGTTTGAAGAAGGAAAGGAAGAACAGCGGTTAAATATGGTGCGCCGCCATGTGAACGCCATCGGAATGGACGCCTGCTACGTCAATATGTGCGGAGCGCAGGATGAAATCGTTTTTGAC CCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCC-CCC AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:135 YT:Z:UU
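A quick way to check which database a SAM file was aligned against is to read the bowtie2 command stored in its @PG header line. The sketch below uses two hypothetical helpers (not part of HUMAnN or MetaPhlAn) to pull out the argument of bowtie2's `-x` option; for the header above it should point at `.../metaphlan_databases/mpa_vOct22_CHOCOPhlAnSGB_202212`, i.e. a MetaPhlAn marker database rather than a HUMAnN one.

```python
def bowtie2_index_from_pg(header_lines):
    """Return the argument given to bowtie2's -x option in an @PG line, or None."""
    for line in header_lines:
        if not line.startswith("@PG"):
            continue
        pos = line.find("CL:")
        if pos == -1:
            continue
        # CL: runs to the next tab (or end of line); strip straight or typographic quotes.
        command = line[pos + 3:].split("\t")[0].strip().strip('"\u201c\u201d')
        argv = command.split()
        if "-x" in argv:
            i = argv.index("-x")
            if i + 1 < len(argv):
                return argv[i + 1]
    return None


def read_sam_header(path):
    """Collect the '@' header lines at the top of a SAM file."""
    header = []
    with open(path) as handle:
        for line in handle:
            if not line.startswith("@"):
                break  # header lines always come before alignment records
            header.append(line)
    return header
```

Usage, with a placeholder file name: `os.path.basename(bowtie2_index_from_pg(read_sam_header("sample.sam")))`.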

When I run this sam file through humann, I get 3 output files (as well as various temporary files).
605_P001-baseline_S1_clean_combined.fastq_genefamilies.tsv
605_P001-baseline_S1_clean_combined.fastq_bowtie2_aligned.tsv
605_P001-baseline_S1_clean_combined.fastq_pathabundance.tsv
605_P001-baseline_S1_clean_combined.fastq_pathcoverage.tsv (108 Bytes)

My assumption is that as everything is coming out as unmapped, humann is unable to recognise the sam file format I am using for the input. Does that therefore mean I need to run from raw reads?
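To confirm that nothing mapped, one can check whether the gene-families table contains any rows besides `UNMAPPED`. A minimal sketch (`mapped_rows` is a hypothetical helper), assuming the usual HUMAnN layout of a `#` header line followed by tab-separated feature/value rows:

```python
import csv


def mapped_rows(genefamilies_lines):
    """Count data rows in a HUMAnN gene-families table other than UNMAPPED.

    Assumes a '#' header line followed by tab-separated feature<TAB>value rows;
    stratified rows (feature|taxon) are counted too.
    """
    count = 0
    for row in csv.reader(genefamilies_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header
        if row[0].split("|")[0] != "UNMAPPED":
            count += 1
    return count
```

Usage: `with open("605_P001-baseline_S1_clean_combined.fastq_genefamilies.tsv", newline="") as fh: print(mapped_rows(fh))`. A result of 0 means every read ended up in `UNMAPPED`.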

I have run humann_test and everything comes out OK, and I ran demo.sam with the following output:
demo_genefamilies.tsv
demo_bowtie2_aligned.tsv
demo_pathabundance.tsv
demo_pathcoverage.tsv (71 Bytes)
So I am hoping I have installed humann correctly.

Apologies, I am probably just being dense here, but I’m a complete novice to all of this and, having spent quite a while trying to find a solution myself, I thought it best to seek expert advice and guidance.

Many thanks in advance,
Blair

You can definitely run HUMAnN starting from a SAM file if the SAM file records mappings of reads to a HUMAnN-formatted database. That doesn’t appear to be the case for these files.
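One way to see what a SAM file's reads actually hit is to tally the reference names (RNAME, column 3) of its alignment records. The sketch below is a heuristic based only on the names visible in the excerpt above: references ending in `|SGB<number>` or starting with `VDB|` are assumed to be MetaPhlAn markers, and everything else is counted as "other". These patterns are assumptions, not a documented naming specification.

```python
import re
from collections import Counter

# Heuristic pattern based only on the reference names in the excerpt above.
SGB_SUFFIX = re.compile(r"\|SGB\d+$")


def reference_kinds(sam_lines):
    """Tally alignment records by the kind of reference (RNAME) they hit."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a SAM alignment record
        flag, rname = int(fields[1]), fields[2]
        if flag & 4:  # FLAG bit 0x4: segment unmapped
            counts["unmapped"] += 1
        elif rname.startswith("VDB|"):
            counts["VDB| reference"] += 1
        elif SGB_SUFFIX.search(rname):
            counts["|SGB marker"] += 1
        else:
            counts["other"] += 1
    return counts
```

If nearly all records fall into the two marker buckets (e.g. `with open("sample.sam") as fh: print(reference_kinds(fh))`, placeholder name), the file holds MetaPhlAn marker mappings rather than mappings to a HUMAnN database.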

Are these SAM files the output from MetaPhlAn (containing mappings to MetaPhlAn marker genes)? If so, there isn’t enough signal there to do functional profiling, but you can provide a SAM input to MetaPhlAn to generate a taxonomic profile and then provide that profile to HUMAnN via the --taxonomic-profile flag to perform a traditional HUMAnN analysis (indexing and mapping against detected pangenomes).
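The two-step route described above might look like this when scripted from Python. File names are hypothetical placeholders, and the flags (`--input_type sam`, `-o`, `--input`, `--output`, `--taxonomic-profile`) should be checked against `metaphlan --help` and `humann --help` for the installed versions. Note that HUMAnN's `--input` is still the raw reads: the MetaPhlAn SAM is only used to build the profile.

```python
import shlex


def profile_then_humann(metaphlan_sam, reads, profile, outdir):
    """Return the MetaPhlAn and HUMAnN command lines for the two-step route."""
    # Step 1: build a taxonomic profile from MetaPhlAn's own SAM output.
    metaphlan_cmd = ["metaphlan", metaphlan_sam, "--input_type", "sam", "-o", profile]
    # Step 2: HUMAnN still needs the reads; the profile only replaces its MetaPhlAn step.
    humann_cmd = ["humann", "--input", reads, "--output", outdir,
                  "--taxonomic-profile", profile]
    return metaphlan_cmd, humann_cmd


for cmd in profile_then_humann("sample.sam", "sample.fastq", "sample_profile.txt", "humann_out"):
    print(shlex.join(cmd))
```

Each list can also be passed to `subprocess.run(cmd, check=True)` in turn instead of being printed and run by hand.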

If these SAM files were created against a custom database then I’d need to know more details about what you’re attempting to do in order to troubleshoot further.

Thanks, Franzosa, for such a rapid response.
Sorry, I should have said: these sam files are the output from metaphlan. And as I feared, they can’t be used as input to humann.
But from what you are saying, I can run these sam files back through metaphlan and use the taxonomic profile output as input to humann, so I’ll give that a go.
Many thanks for your help

Sorry to trouble again.
I’ve generated a taxonomic profile and provided that to humann via the --taxonomic-profile flag, but still ending up with the same result (all unmapped).
Any further suggestions?
Many thanks in advance

Your previous message had the right interpretation (i.e. the SAMs are MetaPhlAn temp files and can be used for taxonomic profile generation but aren’t appropriate for input to HUMAnN).

Can you tell me a bit more about what you did after generating the taxonomic profile? What did you provide as input to HUMAnN (alongside the profile)? How did the profiles look - did they include known species? And are you sure you have a compatible version of MetaPhlAn and HUMAnN? For example you need to be using HUMAnN 3.6+ to use a MetaPhlAn 4 taxonomic profile as input (and so forth).
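The version rule mentioned above (a MetaPhlAn 4 profile needs HUMAnN 3.6 or later), plus a quick listing of the species a profile contains, can be sketched as follows. It assumes the profile's first header line names the database (e.g. `#mpa_vOct22_CHOCOPhlAnSGB_202212`) and treats `CHOCOPhlAnSGB` in that name as the sign of a MetaPhlAn 4 database. Both are assumptions based on the database name in the SAM header above, so other version combinations are deliberately left unchecked.

```python
def parse_version(text):
    """Turn '3.7' or 'v3.6.1' into a comparable tuple of ints (numeric parts only)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def profile_database(profile_lines):
    """Return the database name from the first '#mpa_' header line, or None."""
    for line in profile_lines:
        if line.startswith("#mpa_"):
            return line.strip().lstrip("#")
    return None


def humann_accepts_profile(humann_version, profile_lines):
    """True/False for the 'MetaPhlAn 4 profile needs HUMAnN >= 3.6' rule; None if not applicable."""
    database = profile_database(profile_lines)
    # Assumption: MetaPhlAn 4 database names contain 'CHOCOPhlAnSGB'.
    if database is None or "CHOCOPhlAnSGB" not in database:
        return None  # other version combinations are not checked here
    return parse_version(humann_version) >= (3, 6)


def species(profile_lines):
    """List species-level clades (last rank 's__') in a MetaPhlAn profile."""
    found = []
    for line in profile_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        clade = line.split("\t")[0]
        last_rank = clade.split("|")[-1]
        if last_rank.startswith("s__"):
            found.append(last_rank)
    return found
```

Strain-level rows (ending in `t__...`) are skipped by `species`, so each species appears once.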

Using HUMAnN 3.7 and MetaPhlAn 4.
Apologies, I misunderstood: I thought that creating the taxonomic profile and passing it via that flag would let me use the original SAMs as input. The taxonomic profiles looked OK and included known species.
I’ll need to get the raw data, or get my collaborator to re-run the files and send over SAMs that are compatible with HUMAnN input.
Thanks again