About the ARepA category

sagunmaharjann · November 12, 2019, 5:12pm

Description: ARepA (Automated Repository Acquisition) is a command-line tool that fetches 'omics data from multiple heterogeneous repositories and processes them in a standardized way.
Its main features include, but are not limited to:

Gene ID standardization: all output gene identifiers can be translated to a single convention. Supported identifiers include Gene Symbols, UniProt, UniRef, Entrez Gene, KEGG Orthologs, and more.
File updates on an as-needed basis: ARepA only reruns the processes needed to build a file. It keeps track of where you left off, so you don’t have to waste valuable computational resources!
File standardization: data is saved in a tab-delimited text format, and metadata is saved as a Python pickle object. For some modules, automatically generated R packages are also produced as a final output.
Modular design: other repositories can be built on top of the existing architecture. ARepA can be used as an all-purpose data mining tool!
Currently, ARepA fetches data from seven repositories: Bacteriome, RegulonDB, STRING, BioGRID, MPIDB, GEO, and IntAct.
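
As a rough illustration of the file standardization described above, here is a minimal Python sketch of reading a tab-delimited data file together with a pickled metadata object. The file names, the table layout, and the metadata keys are all assumptions made up for this example, not ARepA's actual output layout, so the sketch writes its own small stand-in files first.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def load_dataset(data_path, metadata_path):
    """Load a tab-delimited data table and its pickled metadata."""
    with open(data_path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    # Only unpickle files you produced yourself or otherwise trust.
    with open(metadata_path, "rb") as handle:
        metadata = pickle.load(handle)
    return rows, metadata


def write_demo_files(directory):
    """Create tiny stand-in files so the sketch runs without ARepA."""
    # Both file names below are hypothetical.
    data_path = Path(directory) / "example_data.txt"
    metadata_path = Path(directory) / "example_metadata.pkl"
    data_path.write_text("gene\tsample1\tsample2\ngeneA\t1.5\t2.0\n")
    with open(metadata_path, "wb") as handle:
        pickle.dump({"title": "demo dataset"}, handle)
    return data_path, metadata_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_path, metadata_path = write_demo_files(tmp)
        rows, metadata = load_dataset(data_path, metadata_path)
        print(rows[0])            # header row of the table
        print(metadata["title"])  # one field of the metadata object
```

To try this on real output, point `load_dataset` at the data and metadata files ARepA produced for a dataset and inspect the returned objects to see which fields are present.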

Citation

Daniela Börnigen, Yo Sup Moon*, Gholamali Rahnavard, Levi Waldron, Lauren McIver, Afrah Shafquat, Eric Franzosa, Larissa Miropolsky, Christopher Sweeney, Xochitl Morgan, Wendy S. Garrett, and Curtis Huttenhower*. "A reproducible approach to high-throughput biological data acquisition and integration." PeerJ. 2015; 3: e791. (* contributed equally)

