Dataset S3 CRISPR

sagunmaharjann · May 4, 2021, 3:34pm

My lab thoroughly enjoyed reading your recent Cell Host & Microbe paper
analyzing CRISPR immune systems in the HMP data. We are hoping to
leverage your taxonomic mapping for a project of our own, but my student
Wei and I can only find a portion of the data in the supplemental data
files posted on your website.

The paper describes 1,630,590 spacer sequences taxonomically identified
by mapping to assemblies with MetaPhlAn2 or UniRef90 annotations and an
additional 768,068 spacer sequences taxonomically identified with
DIAMOND blastx directly against UniRef90. The latter set is clearly
identified in your “hmp1-II-crispr-spacers-annotation.tar.gz” datafile.
However, we have been unable to find the former (larger) set.

Can you give any guidance? The paper references “Dataset S3,” but it is
not clear which file corresponds to this dataset.

pmuench · May 4, 2021, 3:39pm

Thanks for pointing this out! We have uploaded the dataset to the crispr2020 page of the Huttenhower Lab website as mapping_assembly.tar.gz. It contains the taxonomic annotation of the 1,630,590 spacers.
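
In case it helps, here is a minimal sketch for checking what the archive contains once it is downloaded. It assumes the file is saved locally as mapping_assembly.tar.gz; the helper name list_archive_members is only illustrative, and since the internal layout of the archive is not described in this thread, the code just lists the members without assuming any file names or formats.

```python
import tarfile


def list_archive_members(path):
    """Return (name, size_in_bytes) for every regular file in a tar archive.

    "r:*" lets tarfile detect the compression (gzip here) automatically.
    """
    with tarfile.open(path, "r:*") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


# Example call, assuming the download sits in the working directory:
# for name, size in list_archive_members("mapping_assembly.tar.gz"):
#     print(f"{size:>12}  {name}")
```

Listing the members first is a cheap way to confirm the download is intact and to see which files hold the annotations before extracting anything.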

Best,
Philipp
