The COVID-19 pandemic has presented confounding contradictions: While the disease is deadliest in elderly individuals and those with certain preexisting health conditions, it has also caused severe symptoms or death in some younger adults.
And as more people test positive, more people with mild or atypical symptoms, or no symptoms at all, are turning up: seemingly healthy people who might inadvertently spread and amplify the disease.
Though environmental or lifestyle factors may contribute to these dramatically different outcomes, it seems increasingly likely that researchers and clinicians will need to dig into the core of our own biology to fully explain this variability.
"Given that there is such remarkable variability amongst people, there really is the possibility that there will be strong genetic effects in the [human] genome, and we need to determine that absolutely as rapidly as we can," noted David Goldstein, director of Columbia University's Institute for Genomic Medicine. "We, unfortunately, have plenty of samples to work with to build up a large genetic study."
That realization has members of the genomics community gearing up to do hypothesis-free and hypothesis-driven research, both locally and within collaborations springing up virtually across borders, to search for potential host contributors to COVID-19 susceptibility. The hope is to find genetic markers of disease progression and biological insights that might point to potential drug targets.
For one of these efforts, known as the COVID-19 Host Genetics Initiative, investigators from more than two dozen existing studies, research centers, or biobanks in Europe, the U.K., Asia, and North America are teaming up to share data and analyses done on COVID-19 patients in a "bottom-up" manner.
In practice, that means investigators at participating sites are tapping into their own expertise and resources to assess host genetic contributors to infection—from analyzing genetic features in biobank participants who happen to become infected with COVID-19 to prospectively collecting samples from infected patients for genotyping, genome sequencing, RNA sequencing, immune assays, or other approaches.
"There are some people who will need to start a new collection [for the COVID-19 Host Genetics Initiative]," explained Andrea Ganna, a European Molecular Biology Laboratory group leader in human genomics with the University of Helsinki's Institute for Molecular Medicine Finland. "For example, in Italy there are no big biobanks."
"Some other people, through the global biobank network that already exists, will use these networks to identify cases in their biobank and then send in the data," he added.
Ganna, who comes from a region in Italy that has been hard-hit by the COVID-19 pandemic, is one of the investigators behind the burgeoning host genetics collaboration—an international group that has come together in a matter of weeks and continues to grow.
"It was three days from the idea to launching the website," Ganna said in a call last Wednesday, "and we've already got a lot of people interested—it was very fast."
At his own center in Finland, investigators plan to genotype germline samples from individuals infected with COVID-19, including patients treated at several hospitals in Italy.
Collaborators at the University of Siena have already secured ethics approval to begin obtaining consent and collecting samples from COVID-19 patients there, Ganna explained, and still other hospitals in northern Italy are seeking similar approval. Once the initiative is fully underway, a center in Siena will take in patient samples from 11 participating hospitals and extract the host DNA before shipping it to the Institute for Molecular Medicine Finland (FIMM) for genotyping.
Ganna is hoping that a "quick and dirty" genotyping analysis could lead to patient management, diagnostic, or drug repurposing clues, though researchers at FIMM have not ruled out the possibility of doing more in-depth genome sequencing, exome sequencing, or immunological assay analyses on the samples from Italy in the future.
In the meantime, the investigators are reaching out to teams elsewhere to find strategies for harmonizing and analyzing data as it is generated on patients in other parts of the world. They are also drawing from their experience so far to help others come up with consistent informed consent, IRB, and sample collection protocols.
"The idea is that if we bring people together … and people can brainstorm, then we can harmonize across different collections," Ganna said. "That's really the key, harmonization."
Researchers at Stanford University were among the first to sign on to the COVID-19 Host Genetics Initiative effort.
Though the details are still being hammered out, that team expects to tap into a clinical genome sequencing service that is about to go live at Stanford to sequence samples from COVID-19 patients treated there, explained Carlos Bustamante, a population genetics, biomedical data science, and genomics researcher at Stanford.
Researchers there are particularly interested in understanding if, and how, SARS-CoV-2 viral load might impact patient outcomes, he noted, since preliminary work out of China suggests that the sickest COVID-19 patients may have particularly high levels of the virus.
The group is also building on the extensive experience gained in prior population and disease genomic research to search for informative common polymorphisms, protective loss-of-function mutations, and other genetic factors that may coincide with individuals' propensity for SARS-CoV-2 infection or progression.
"One of the things that we've learned from human genetics is that there are extremes at the human phenotype distribution, and pathogen susceptibility is no different," Bustamante said. "There are going to be people who are particularly susceptible and there are going to be those who are particularly resistant [to SARS-CoV-2]."
"For the common polymorphisms — which is what I think we would initially look for — the low-pass sequencing would work great as well as genotyping directly on arrays," he said. "And then we can always throttle up the coverage to get at rarer and rarer variants."
The Stanford team will likely use additional tools such as RNA sequencing to dig into host immune responses and viral sequence details, and Bustamante noted that assays to test infection features such as viral load could evolve relatively quickly out of growing genomic datasets.
"One thing we learned from 1000 Genomes, GTEx, and other [large genomic] projects is that putting the data in the hands of analysts is the fastest way to turn things around," he added.
Biomedical data science researcher Manuel Rivas, who is leading the Stanford arm of the COVID-19 Host Genetics Initiative, noted that analytical resources that have already been developed for other research may prove useful for the coming COVID-19 patient analyses.
His own lab has already contributed tools for analyzing host genetics, and members of the International Common Disease Alliance have established a communications channel for investigators working on plans for analyzing patient data.
"Obviously the most challenging part right now is how do you start integrating that data in the setting where it is an emergency situation," Rivas said, noting that "the analysis plan is starting to be worked out."
Both Rivas and Bustamante credit Stanford collaborators such as Euan Ashley and Benjamin Pinsky with helping to move the work along.
"When the COVID emergency hit, we reached out to them and said, 'There's this real grassroot effort on an international level to aggregate data, and as the data science folks, we'd love to tap into our clinical enterprise,'" Bustamante said. "It's just been an incredible response from them. Even in the midst of everything that's going on we've been able to get an IRB in the works and get our clinical collaborators on board."
He suggested that still other centers in northern California may join forces on the host genomics work, though the idea of establishing a regional sequencing group with pooled resources is still in its early stages.
At McGill University in Quebec, Canada, meanwhile, Brent Richards, a clinician investigator in human genetics, epidemiology, and biostatistics, is part of a team contributing to the COVID-19 Host Genetics Initiative. That team is putting together a province-wide COVID-19 patient biobank for genomic profiling with funding from Quebec's provincial government.
"We will collect DNA, RNA and other biological specimens on COVID-19 positive patients and those who have been tested, but are negative," Richards explained in an email. "We anticipate this biobank will contain samples and data from thousands of participants, which will provide evidence to improve clinical care during this pandemic."
Not everyone is looking quite so broadly across the host genome. Other teams have already demonstrated the potential for hypothesis-driven susceptibility studies.
Indeed, Columbia's Goldstein (who is not currently part of the COVID-19 Host Genetics Initiative but has a long history of researching host genetic contributions to infections with viruses such as HIV, herpes simplex virus 2, or hepatitis C virus) believes targeted and large-scale genomic studies will provide complementary insights into gene, pathway, and expression shifts that may render some individuals more or less prone to infection or severe illness from SARS-CoV-2.
He and his coauthors from Columbia recently posted a preprint that drew on published host gene expression datasets, drug response studies, and other sources to search for candidate compounds that appear to drive down host expression of TMPRSS2, a gene encoding a human protease that has been shown to be used by both SARS-CoV-2 and SARS-CoV as they enter human cells.
"We had set up a paradigm to try to find transcriptional regulators of important disease genes," Goldstein explained, noting that they were able to quickly adapt this framework to search for regulators of the TMPRSS2 protein used for cell entry.
"The results were pretty striking, with really consistent effects for estrogen- and androgen-related agents on the expression of this protease that is required for viral entry," he added, noting that those results will need to be explored further, but provide hypotheses to test in the future.
Among other things, that team is interested in using gene expression data from patient diagnostic samples to see if the regulatory variants influencing TMPRSS2 levels may coincide with infection or disease severity.
"We definitely want to see genome-wide, unbiased exploration of the genome, too, because there might well be things there that we don't know about," Goldstein said, noting that Columbia is in the process of organizing a biobank effort to collect samples from patients treated for COVID-19 at that center.
Still others are looking at the potential influence of the human leukocyte antigen (HLA) region or other sources of immune variability from one individual to the next. As GenomeWeb reported today, for example, a Rockefeller University-led team intends to tap into data on individuals with inborn errors of immunity to try to understand the dangerous immune complications that occur in some young adults infected with SARS-CoV-2.
Finally, as ever more information comes out for resolved COVID-19 cases, other less obvious sources of infection variability are being examined and proposed. In a preprint study posted to medRxiv in mid-March, a team from the Southern University of Science and Technology and other centers in China brought together ABO blood group data for nearly 2,200 individuals treated for COVID-19 at three hospitals in Wuhan and Shenzhen, including 206 fatal cases in Wuhan.
Together with blood group data for tens of thousands of unaffected individuals from the broader populations in Wuhan and Shenzhen, the patient profiles hinted at a potential protective effect for the O blood group. In contrast, SARS-CoV-2 infections appeared to be significantly more common in individuals with blood group A, the team reported.
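For readers curious how a case-versus-population comparison of this kind is typically quantified, the following is a minimal sketch of an odds ratio with an approximate 95 percent confidence interval (Woolf's log method). It is not the preprint authors' analysis. The function name and the counts in the usage example are hypothetical placeholders chosen only for illustration and do not come from the study.

```python
import math


def odds_ratio_with_ci(case_exposed, case_unexposed,
                       control_exposed, control_unexposed, z=1.96):
    """Return (odds_ratio, ci_lower, ci_upper) for a 2x2 table.

    "Exposed" here would mean carrying a given blood group (for example, A);
    "unexposed" means any other blood group. The interval uses Woolf's
    log-based approximation, which requires all four counts to be positive.
    """
    counts = (case_exposed, case_unexposed, control_exposed, control_unexposed)
    if any(c <= 0 for c in counts):
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (case_exposed * control_unexposed) / (case_unexposed * control_exposed)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


if __name__ == "__main__":
    # Hypothetical placeholder counts, NOT taken from the preprint.
    or_value, lower, upper = odds_ratio_with_ci(30, 70, 20, 80)
    print(f"OR = {or_value:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An odds ratio above 1 with an interval that excludes 1 would be consistent with a group being over-represented among cases, while a value below 1 would be consistent with a protective association. Neither establishes causation, and confounders such as age or ancestry would still need to be addressed.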
Though there is still a great deal to be learned on the host side of the all-too-common SARS-CoV-2 infection, the speed with which genetics researchers have responded to the COVID-19 pandemic, the tools already available from past genomics efforts, and the data sharing underway are encouraging, according to experts in the field.
"You have a whole generation of analysts and scientists who are really prepared to jump into this," Bustamante said. "The one experience that we have from the human genomics world is the importance of reproducibility and the importance of doing things to scale and the importance of data sharing."
Because data are being generated quickly in an emergency setting and many of the studies are not yet peer-reviewed, he added, "the best rule here is cross-validate, cross-validate, cross-validate."
This story first appeared in our sister publication, GenomeWeb.