Dear sir,
I read your article “Profiling novel lateral gene transfer events in the human microbiome”. WAAFLE is a useful tool for studying lateral gene transfer (LGT) events in the human microbiome. I have also used WAAFLE recently, but I ran into some problems. First, I used the tool to detect LGT events in soil metagenome data. Is that appropriate? Second, I found many LGT events, but I am not sure how to normalize them, because I do not understand what "total assembly size in 1000s of genes" means. I thought the total assembly size meant the number of assembled contigs in a sample. Third, if clade A and clade B share one LGT event, how can I get the A_GENE and B_GENE counts, like those in Table S2 of your article? I hope you can help with these problems. Thank you very much.
Sorry for the slow reply. In answer to your questions:
- I'm not sure how well the default database will reflect soil metagenomes, since it is focused on genomes that were downloaded from NCBI during the creation of HUMAnN2. You could check the taxonomy file to see whether the taxa you would expect in soil are (well) represented. I imagine that soil databases are much better today with the expansion of soil MAGs (metagenome-assembled genomes).
- Our approach was to normalize events (i.e., contigs believed to capture an LGT event) by the total assembly size measured in genes, that is, the number of genes on the sample's assembled contigs rather than the number of contigs, expressed in thousands so that the result is events per 1,000 genes. Any reasonable, similar approach would be fine; this one just happened to be convenient for us. The key idea is that you have to adjust for the fact that the more sequence you assemble from a metagenome, the more LGT you will find (just as you find more of everything as you assemble more).
- Most of those counts are derived from the gene-order patterns in the output files. For example, a contig with the pattern
  AAABB
  adds 3 A genes and 2 B genes.
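The gene counting and normalization described in the last two answers can be sketched in Python as follows. This is a minimal illustration, not WAAFLE's own code: the function names are hypothetical, and it assumes a gene-order pattern is a string with one 'A' or 'B' character per gene (as in the AAABB example), with any other character ignored. The pattern strings and totals in the usage lines are made-up values.

```python
from collections import Counter


def clade_gene_counts(pattern: str) -> dict[str, int]:
    """Count clade A and clade B genes in one gene-order pattern.

    Hypothetical helper: assumes one 'A' or 'B' character per gene,
    e.g. "AAABB"; any other character is ignored.
    """
    counts = Counter(pattern)
    return {"A_GENE": counts["A"], "B_GENE": counts["B"]}


def lgt_per_1000_genes(n_lgt_contigs: int, total_genes: int) -> float:
    """Normalize an LGT event (contig) count by assembly size in 1,000s of genes."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return n_lgt_contigs / (total_genes / 1000)


# Made-up patterns for two LGT contigs between the same pair of clades A and B.
patterns = ["AAABB", "ABB"]
totals = Counter()
for p in patterns:
    totals.update(clade_gene_counts(p))  # adds per-contig counts into the running totals

print(clade_gene_counts("AAABB"))       # {'A_GENE': 3, 'B_GENE': 2}
print(dict(totals))                     # {'A_GENE': 4, 'B_GENE': 4}
print(lgt_per_1000_genes(12, 48000))    # 0.25
```

Summing the per-contig counts over every LGT contig for a given clade pair gives that pair's A_GENE and B_GENE totals; dividing by thousands of genes rather than by contigs follows the normalization described above, though any comparable size measure would work.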
Hope this helps!
Got it. Thank you very much.