Hi,
I have run MetaPhlAn 3 with -t marker_counts as I wanted to output read counts for each marker to do further statistical analysis such as phyloseq and deseq2.
This has given me an output of what I assume is each marker and the read count. Is there anyway to get the taxonomy of each marker?
I am also getting an error when trying to merge the tables together. Does merge_metaphlan_tables.py only work with the rel_ab output?
Traceback (most recent call last):
File "/pub38/cchong/miniconda2/envs/metaphlan3/bin/merge_metaphlan_tables.py", line 10, in <module>
sys.exit(main())
File "/pub38/cchong/miniconda2/envs/metaphlan3/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 78, in main
merge(args.aistms, sys.stdout)
File "/pub38/cchong/miniconda2/envs/metaphlan3/lib/python3.7/site-packages/metaphlan/utils/merge_metaphlan_tables.py", line 46, in merge
names = names,
UnboundLocalError: local variable 'names' referenced before assignment
Thanks in advance!
Directly no, but you can use the mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2
file in order to get the 'taxonomy'
field associated to each marker.
Yes, only with rel_ab
and rel_ab_w_read_stats
Hello Charlotte and Francesco,
I came across a similar problem: trying to translate markers when running metaphlan with the -t marker_counts
option.
As suggested by Francesco I used the mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2
for this. I created a small script and managed to obtain the format I wanted, these are the first lines (the second field is the read count (adding all the read count markers that contributed to a bug)):
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Lachnospiraceae_unclassified|s__Eubacterium_rectale 58749
k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkholderiales|f__Burkholderiaceae|g__Cupriavidus|s__Cupriavidus_metallidurans 1
k__Bacteria|p__Synergistetes|c__Synergistia|o__Synergistales|f__Synergistaceae|g__Pyramidobacter|s__Pyramidobacter_piscolens 78
k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|o__Verrucomicrobiales|f__Akkermansiaceae|g__Akkermansia|s__Akkermansia_muciniphila 2712
k__Bacteria|p__Fusobacteria|c__Fusobacteriia|o__Fusobacteriales|f__Fusobacteriaceae|g__Fusobacterium|s__Fusobacterium_equinum 4
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Xanthomonadales|f__Xanthomonadaceae|g__Lysobacter|s__Lysobacter_enzymogenes 1
k__Bacteria|p__Fusobacteria|c__Fusobacteriia|o__Fusobacteriales|f__Leptotrichiaceae|g__Leptotrichia|s__Leptotrichia_sp_oral_taxon_225 1
k__Bacteria|p__Proteobacteria|c__Deltaproteobacteria|o__Desulfovibrionales|f__Desulfovibrionaceae|g__Bilophila|s__Bilophila_wadsworthia 1111
k__Bacteria|p__Proteobacteria|c__Deltaproteobacteria|o__Desulfovibrionales|f__Desulfovibrionaceae|g__Desulfovibrionaceae_unclassified|s__Desulfovibrionaceae_bacterium 3
Same thing as what Charlotte mentions, I want to use this table for further analysis, I’m thinking DESeq2 to perform a differential abundance analysis. However, I’m concerned I’m not taking into consideration thresholds such as percentage of markers with zero counts to discard an specie, among others.
Therefore, my question is: Is it valid to use the read counts table with the taxonomy info for differential abundance with DESeq2? or I’m beeing too naive?
Regards,
David
Yes! After transformation you should consider the different read depth by adding an offset term in your model.
There’s also an example of the count conversion in the curatedMetagenomicData vignette here
Hello David,
Can you share your script?
Emiliana
But this is assuming that the read depth per sample is known. In many cases, this is not the case. There are several threads where people are asking about how to output counts rather than the normalised relative abundances, but none provide a clear response. Relative abundances are suboptimal as information is lost in the normalisation, and it is also not suitable for common analyses such as Bray-Curtis.