A study of the SARS-CoV-2 genome has found that the virus can be classified into six major types, which are characterized by 14 signature single nucleotide variations, and that one type in particular has evolved into the dominant, disease-causing strain.
In a study published on Thursday in the Proceedings of the National Academy of Sciences, researchers at the Academia Sinica in Taiwan said they used the complete sequences of 1,932 SARS-CoV-2 genomes to perform various clustering analyses, which consistently identified six types of the virus. They then identified 14 signature single nucleotide variations (SNVs), 13 in protein-coding regions and one in the 5' untranslated region (UTR), and validated the six types and their underlying signatures in two subsequent analyses of 6,228 and 38,248 SARS-CoV-2 genomes.
To date, type VI has become the dominant viral strain and is characterized by four signature SNVs, the researchers said. The increasing frequency of the type VI haplotype in the majority of the submitted samples from various countries suggested a possible fitness gain conferred by the strain's SNVs. Further, they added, the fact that strains missing one or two of these signature SNVs failed to persist implied possible interactions among these SNVs, suggesting that they may become an important consideration in SARS-CoV-2 classification and surveillance.
In one analysis, the researchers found that the proportion of the SARS-CoV-2 strains from the six types was dynamic and changed with time and geographic regions. For example, types I and II emerged around December 2019 — they were first observed in China on Dec. 26 and Dec. 30, 2019, respectively. These two types were the dominant groups before mid-February 2020 but became the minority groups after March 2020, when type VI took over. Further, the first two strains that were observed outside of China — in Australia on Jan. 3 and Thailand on Jan. 5 — belong to type I, illustrating that the international transmission of COVID-19 can be traced back as early as Jan. 3, the researchers said.
Types III and IV were the only two types that were first observed outside of China. Type III was first seen in the UK in February, and type IV was first observed in the US in February. Type V was first detected in China in January and represents a minor population. Type VI was first observed in China on Jan. 24, was then transmitted to other continents, and increased in frequency after Feb. 20.
Type VI has four signature SNVs — C3037T, C14408T, A23403G, and C241T in the 5' UTR. Since the role of this last SNV is still unclear, the researchers focused their attention on the other three SNVs.
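SNV labels such as C3037T follow the usual reference-base, position, variant-base notation: the reference genome has a C at position 3037, and the variant carries a T there. As a rough illustration only, and not the researchers' own pipeline, the Python sketch below checks which of the four type VI signature SNVs a genome carries. It assumes the genome string has already been aligned to the reference coordinates with no insertions or deletions and that positions are 1-based. The names `parse_snv`, `carried_snvs`, and `has_full_signature` are hypothetical.

```python
import re
from typing import List, NamedTuple, Sequence


class SNV(NamedTuple):
    ref: str  # base in the reference genome
    pos: int  # 1-based position in reference coordinates
    alt: str  # base in the variant


def parse_snv(label: str) -> SNV:
    """Parse a label such as 'C3037T' into (ref, pos, alt)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"not an SNV label: {label!r}")
    return SNV(match.group(1), int(match.group(2)), match.group(3))


# The four type VI signature SNVs named in the article.
TYPE_VI_SIGNATURE = [
    parse_snv(s) for s in ("C241T", "C3037T", "C14408T", "A23403G")
]


def carried_snvs(aligned_genome: str, snvs: Sequence[SNV]) -> List[SNV]:
    """Return the SNVs whose variant base appears in the genome.

    Assumption: `aligned_genome` uses reference coordinates, so
    index pos - 1 corresponds to reference position pos.
    """
    return [
        s for s in snvs
        if s.pos <= len(aligned_genome)
        and aligned_genome[s.pos - 1].upper() == s.alt
    ]


def has_full_signature(
    aligned_genome: str, snvs: Sequence[SNV] = TYPE_VI_SIGNATURE
) -> bool:
    """True only if every signature SNV is present."""
    return len(carried_snvs(aligned_genome, snvs)) == len(snvs)
```

A genome for which `carried_snvs` returns only two or three of the four entries would correspond to the partial-signature strains the researchers described as failing to persist.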
Their analyses showed that these SNVs, when carried simultaneously, conferred a strong fitness gain on the viral type. Further, the initial type VI strain carried certain non-signature SNVs in each country that were lost rapidly. For example, the initial type VI virus in the US had three SNVs in addition to the signature SNVs, but these were quickly lost. Up to 52 additional SNVs occurred in type VI virus in the US but most of them disappeared. The investigators observed similar trends in other countries as well.
Cumulatively, the signature type VI SNVs had a haplotype frequency of nearly 60% among all of the reported genomes in the dataset of 6,228 samples. This frequency greatly exceeded the frequency of 9.23% in strains without any of the 13 signature SNVs in protein-coding regions, the researchers said.
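A haplotype frequency of this kind is a simple proportion: the number of genomes carrying every SNV in a set, divided by the total number of genomes. The Python sketch below shows that arithmetic under the assumption that each genome has already been reduced to a set of SNV labels. The function names are hypothetical, and the sets of SNVs are passed in as parameters because the article does not list all 13 protein-coding signature SNVs.

```python
from typing import Iterable, List, Set


def haplotype_frequency(
    samples: Iterable[Set[str]], signature: Iterable[str]
) -> float:
    """Fraction of samples that carry every SNV in `signature`."""
    sample_list: List[Set[str]] = [set(s) for s in samples]
    if not sample_list:
        raise ValueError("no samples given")
    sig = set(signature)
    carriers = sum(1 for s in sample_list if sig <= s)
    return carriers / len(sample_list)


def frequency_without_any(
    samples: Iterable[Set[str]], snvs: Iterable[str]
) -> float:
    """Fraction of samples that carry none of the SNVs in `snvs`."""
    sample_list: List[Set[str]] = [set(s) for s in samples]
    if not sample_list:
        raise ValueError("no samples given")
    excluded = set(snvs)
    clean = sum(1 for s in sample_list if not (s & excluded))
    return clean / len(sample_list)
```

Applied to the 6,228-genome dataset, the first function with the type VI signature would correspond to the figure of nearly 60%, and the second with the 13 protein-coding signature SNVs would correspond to the 9.23% figure.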
"The persistence of the signature SNVs may imply a fitness gain or simply a founder effect," the authors wrote. "However, the multiple lines of evidence presented here favor a positive selection. Nevertheless, the biological implication of each variation and their interactions remain an interesting topic to be explored."
This story first appeared in our sister publication, Genomeweb.