Data from: SARS-CoV-2 Infections - Gene Expression Omnibus (GEO) Data Mining, Pathway Enrichment Analysis, and Prediction of Potentially Repurposable Drugs
Author(s): Chaparala, Srilakshmi+; Iwema, Carrie L.+; Chattopadhyay, Ansuman*+
* Corresponding Author; + University of Pittsburgh Author

This project describes a protocol for analyzing publicly available transcriptomic data from the NCBI Gene Expression Omnibus (GEO) using an in silico approach. The analysis identified SARS-CoV-2 infection-induced differentially expressed (DE) genes from human gene mapping and viral mapping, statistically enriched pathway associations, and potentially repurposable drugs for COVID-19 treatment. Further in vitro and in vivo studies are necessary to verify these results before clinical application.

The collection contains 17 datasets:

  • Pathway enrichment analysis results identified using Ingenuity Pathway Analysis (files SD1, SD3) and BaseSpace Correlation Engine (file SD2)
  • Predicted repurposable drug/compound list identified using BaseSpace Correlation Engine (file SD4)
  • DE gene list for SARS-CoV-2-infected vs. mock control nasopharyngeal cells (using Partek Flow; GSE152075 data)
  • 6 DE gene lists for SARS-CoV-2-infected vs. mock controls in different cell lines (A549, A549+ACE2, Calu3, NHBE) and/or under different viral load conditions (MOI 0.2 or 2.0) (using CLC Genomics Workbench; GSE147507 data). File names are in this format: series#_GEO identifier: Differential gene expression Cell Line _Virus name (multiplicity of infection amount) infected vs. Mock control
  • 6 corresponding gene lists from viral mapping (using CLC Genomics Workbench; GSE147507 data). File names are in this format: series#_GEO identifier: Virus name (strain) viral gene expression from infected (multiplicity of infection amount) Cell Line
Access via figshare

gene / drug / pathway lists
Accession #: 10.6084/m9.figshare.c.5007983

Access Restrictions
Free to all
Access Instructions
Download individual files via figshare
Software Used
BaseSpace Correlation Engine
CLC Genomics Workbench
Ingenuity Pathway Analysis
Partek Flow
Dataset Format(s)
CSV (.csv)
