In the months since SARS-CoV-2 emerged and began spreading, researchers have been attempting to collect new data to understand the virus and the disease it causes.
But they have also been digging into sequence and gene expression data that was originally generated for other purposes to go beyond what is known about the virus and try to fill in some of the many remaining unknowns.
In particular, several teams started trying to untangle SARS-CoV-2 infection susceptibility and symptoms by studying ACE2, a gene that codes for the angiotensin-converting enzyme 2. That cell surface protein, which has been studied in the context of blood pressure regulation, heart disease, and other conditions, is well known for acting as a receptor for the original SARS coronavirus (known as SARS-CoV or SARS-CoV-1).
In 2003, for example, researchers at the Brigham and Women's Hospital and other centers in Massachusetts demonstrated that the "S1" domain of the SARS-CoV spike protein binds to ACE2, with the virus using the enzyme as a receptor as it works its way into human cells.
In a Nature paper published at the time, the researchers noted that "a soluble form of ACE2, but not the related enzyme ACE1, blocked association of the S1 domain with Vero E6 cells," while antibodies targeting ACE2 interfered with SARS-CoV replication in the Vero E6 African green monkey kidney cell line.
From these and other experiments in human cell lines, the authors concluded that the ACE2 enzyme "is a functional receptor for SARS-CoV."
In contrast, subsequent studies on the coronavirus behind Middle East respiratory syndrome, or MERS, suggest that it enters human cells through dipeptidyl peptidase 4, an enzyme encoded by the DPP4 gene, rather than through ACE2.
Interest in ACE2 was renewed earlier this year, when independent teams led by investigators at the Chinese Center for Disease Control and Prevention, the University of Minnesota, the National Institute of Allergy and Infectious Diseases, the University of Göttingen, and other centers reported that the protein appears to play a similar receptor role for SARS-CoV-2, the COVID-19-causing coronavirus that emerged late last year and has since spread around the world.
Now, though, the research community is armed with a large and growing collection of genomic data from healthy control individuals and population study participants. As a result, several teams have been busy interrogating ACE2 variation, its potential impact on SARS-CoV-2 interactions, and the receptor's tissue-specific expression in data from thousands or hundreds of thousands of individuals, even as clinical samples from COVID-19 patients are still being collected.
Much of the research done so far has been shared in preprint form, meaning the specific findings of these studies may change somewhat as the papers go through peer review. Even so, the strategies used offer a glimpse at the kinds of information that can be gleaned from available ACE2 sequence and gene expression data as investigators prepare to analyze samples obtained during the ongoing COVID-19 pandemic.