Name: ARepA
User manual || Tutorial
Description: ARepA (Automated Repository Acquisition) is a command-line tool that fetches 'omics data from multiple heterogeneous repositories and processes them in a standardized way.
Its main features include:
- Gene ID standardization: all output gene identifiers can be translated to a single convention. Supported identifiers include Gene Symbols, UniProt, UniRef, Entrez Gene, KEGG Orthologs, and more.
- File updates on an as-needed basis: ARepA only reruns the processes needed to build a file. It keeps track of where you left off, so you don’t have to waste valuable computational resources!
- File standardization: data are saved as tab-delimited text, and metadata are saved as a Python pickle object. For some modules, R packages are automatically generated as a final output.
- Modular design: other repositories can be built on top of the existing architecture. ARepA can be used as an all-purpose data mining tool!
- Currently, ARepA fetches data from seven repositories: Bacteriome, RegulonDB, STRING, BioGRID, MPIDB, GEO, and IntAct.
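
To make the output conventions above concrete, here is a minimal Python sketch of how a downstream script might read a tab-delimited data file and a pickled metadata object, then translate identifiers to a single convention. It is not part of ARepA: the file names (`example.dat`, `example.pkl`), the metadata keys, the column layout, and the helper functions are all hypothetical, and the sample rows are made up for illustration.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def load_data_table(path):
    """Read a tab-delimited text file into a list of rows (lists of strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


def load_metadata(path):
    """Unpickle a metadata object. Only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def translate_ids(rows, mapping, id_columns=(0, 1)):
    """Map identifiers in the given columns; unknown IDs are left unchanged."""
    translated = []
    for row in rows:
        new_row = list(row)
        for col in id_columns:
            new_row[col] = mapping.get(new_row[col], new_row[col])
        translated.append(new_row)
    return translated


def demo():
    """Write tiny stand-in files, then read them back the way a user might."""
    with tempfile.TemporaryDirectory() as tmp:
        data_path = Path(tmp) / "example.dat"   # hypothetical file name
        meta_path = Path(tmp) / "example.pkl"   # hypothetical file name
        # Assumed layout: two identifier columns followed by a score.
        data_path.write_text("b0001\tb0002\t0.9\nb0002\tb0003\t0.4\n")
        with open(meta_path, "wb") as handle:
            # Hypothetical metadata keys, chosen only for this sketch.
            pickle.dump({"taxid": "83333", "type": "interaction"}, handle)
        rows = load_data_table(data_path)
        metadata = load_metadata(meta_path)
    # Illustrative locus-tag-to-symbol mapping; b0003 is deliberately absent.
    symbols = {"b0001": "thrL", "b0002": "thrA"}
    return translate_ids(rows, symbols), metadata


if __name__ == "__main__":
    table, metadata = demo()
    print(table)
    print(metadata)
```

If a metadata pickle was written by Python 2 and is read under Python 3, passing `encoding="latin1"` to `pickle.load` may be necessary; whether that applies to a given ARepA output is an assumption to check against the user manual.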
Citation
Daniela Börnigen, Yo Sup Moon*, Gholamali Rahnavard, Levi Waldron, Lauren McIver, Afrah Shafquat, Eric Franzosa, Larissa Miropolsky, Christopher Sweeney, Xochitl Morgan, Wendy S. Garrett, and Curtis Huttenhower*. "A reproducible approach to high-throughput biological data acquisition and integration." PeerJ. 2015; 3: e791. (* contributed equally)