What we know about Coronavirus

By Susan Hu

Insights from Dr. Ching-Yung Lin, Founder and CEO of Graphen, leading AI researcher on Coronavirus genomic evolution

As of March 30, 2020, based on the 2,724 reported genome sequences of the COVID-19 virus (SARS-CoV-2) from worldwide labs, Graphen Inc., in conjunction with Columbia University, aligns the genome of viruses, looks for the canonical form of each gene location, and identifies the exact variant(s) of a virus. 1,388 different strains have been found from these 2,724 viruses.
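The workflow described above (align the genomes, find the canonical base at each position, then identify each virus by its variants) can be illustrated with a minimal sketch. This is not Graphen's pipeline. It assumes the sequences are already aligned to equal length, takes the most common base at each position as the canonical form, and uses made-up toy sequences. The function names `canonical_sequence`, `variants`, and `group_strains` are hypothetical.

```python
from collections import Counter

# Assumption: "N" (unknown base) and "-" (alignment gap) carry no information.
AMBIGUOUS = {"N", "-"}

def canonical_sequence(aligned):
    """Return the most common unambiguous base at each aligned position."""
    canonical = []
    for column in zip(*aligned):
        counts = Counter(base for base in column if base not in AMBIGUOUS)
        canonical.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(canonical)

def variants(sequence, canonical):
    """Return the set of (1-based position, base) differences from the canonical form."""
    return frozenset(
        (pos, base)
        for pos, (base, ref) in enumerate(zip(sequence, canonical), start=1)
        if base not in AMBIGUOUS and base != ref
    )

def group_strains(aligned):
    """Group genomes that share exactly the same variant set into one strain."""
    canonical = canonical_sequence(aligned)
    strains = Counter(variants(seq, canonical) for seq in aligned)
    return canonical, strains

# Toy, made-up aligned sequences (real genomes have about 30,000 positions).
toy_genomes = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAACTT",
    "ACGTNCGT",
]

canonical, strains = group_strains(toy_genomes)
print(canonical)
print(len(strains), "distinct strains among", len(toy_genomes), "genomes")
```

In this sketch, two genomes count as the same strain only if their variant sets match exactly, which is how 2,724 sequenced viruses can collapse into a smaller number of distinct strains.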

We asked Dr. Ching-Yung Lin, Founder and CEO of Graphen, to share insights on what we know about the Coronavirus to date.

Q: Can we trace the earliest US Coronavirus back to the original virus in Wuhan?

A: Currently, over 1,000 sequenced virus strains worldwide can be traced back to a prototype virus. This prototype virus is the same as the virus found in the earlier cases in Wuhan. But this does not mean that the virus originated in Wuhan. It is also possible that the virus originated much earlier and that only the first big outbreak started in Wuhan.

Only two or three variants of the virus have been seen elsewhere in China. A larger number of mutations usually means a longer transmission time. However, because so little data has been collected, this small number may not be accurate, so we cannot assume anything.

Q: If the Coronavirus did not originate in Wuhan, why didn't the virus outbreak happen in its place of origin? Has it gotten worse in Wuhan?

A: There are two possibilities.

The first possibility is that the virus was able to infect humans from the beginning, spread immediately, and took hold in Wuhan, which became the first epidemic center. In this case, if there was an earlier outbreak elsewhere, it simply was not detected.

Because the nearly 300 virus strains sequenced in the U.S. so far do not include all of the mutations, we can rule out that the earlier flu in the U.S. was the Coronavirus. Yesterday I heard some people say that the Coronavirus originated in Italy. Because of the lack of data from Italy, I cannot comment on this.

The second possibility is that the Coronavirus became more dangerous after arriving in Wuhan. A newly released study offers a reason why this virus is so powerful: it traced a small segment of the viral genome that seems to make the virus resemble the body's own cells when it attaches. As a result, human immune cells do not attack the virus at first. Given the chance to grow in the body, the virus then waits for the opportunity to spread. I am now studying this small segment and how it has mutated in the past. In fact, the Coronavirus genome has only about 30,000 positions. Each time the virus replicates in a host, any of these positions may mutate randomly. Sometimes a single point mutation may cause the strain's mechanism to catch cells from six to seven places, or from fewer than five places.

Q: What about Europe?

A: When we first started the analysis, there wasn't much data from Europe. We are now getting a lot more: about half of the virus strains, five or six hundred, come from Europe. The Coronavirus is spreading very fast in the Netherlands, the U.K., Germany, Italy, and other European countries, as well as in Iran. The virus in Europe then spread to South America.

Q: Some Italian media reported a few days ago that "Patient Zero" was found in Italy. What's your opinion?

A: There has been controversy over the Italian "Patient Zero". Of the 1,793 virus strains I have seen, 15 are from Italy. It is important to note that Patient Zero in a country is often irrelevant to the infection status of that country. Many cases in a country later show up through other channels, so it doesn't make sense to track down one patient. In the case of Italy, there were three strains of the virus in Rome, one of which was too short to analyze. One strain detected on January 29 (a 57-year-old male) and one detected on January 31 (a 66-year-old female) have the same genome.

The parent virus of this strain appeared in Australia and Hong Kong, and its ancestor was the same as the virus from Wuhan. Its offspring first appeared in Hangzhou on January 25, and other offspring appeared all over the world, including Georgia, Washington, New York, France, and the U.K. So far, however, offspring of this "Patient Zero" strain have not been seen in Italy. The other Italian virus strain from January (the case of a 66-year-old female) has no mutation. A virus circulating in northern Italy may have 39 viruses of the same genome, which have appeared in Italy, the Czech Republic, the U.K., Brazil, Denmark, Georgia, Ireland, the Netherlands and Utah since February 2. The parents of this strain include two strains, one tested in Shanghai on January 28 and one in Munich on the same day (and appeared in Finland on March 1). Its ancestor was the prototype virus in Wuhan.
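The parent, ancestor, and offspring relationships mentioned here can be illustrated with a small sketch. It rests on a simplifying assumption that is not stated in the interview: a strain is treated as an ancestor of another when its set of mutations is a strict subset of the other's, and the prototype virus is the strain with no mutations. The positions and bases below are invented for illustration, and `ancestors` and `parents` are hypothetical helper names.

```python
def ancestors(strain, all_strains):
    """Strains whose mutation set is a strict subset of `strain`'s mutation set."""
    return [s for s in all_strains if s < strain]

def parents(strain, all_strains):
    """Closest ancestors: ancestors that are not a subset of another ancestor."""
    older = ancestors(strain, all_strains)
    return [a for a in older if not any(a < b for b in older)]

# Made-up example: each strain is a frozenset of (position, base) mutations.
prototype = frozenset()                      # no mutations
strain_a = frozenset({(100, "T")})           # one mutation
strain_b = frozenset({(100, "T"), (250, "A")})  # inherits strain_a's mutation
strain_c = frozenset({(900, "G")})           # separate branch
all_strains = [prototype, strain_a, strain_b, strain_c]

print(parents(strain_b, all_strains))    # closest ancestor of strain_b
print(ancestors(strain_b, all_strains))  # every ancestor back to the prototype
```

Real lineage tracing also has to handle sequencing gaps, repeated mutations, and sampling dates, which this sketch ignores.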

Q: Some have said that the Coronavirus will become less dangerous after several generations of mutations. But you've mentioned that mutations can make the virus more aggressive. Could you please explain?

A: The genetic variation of the virus is completely random. A mutation could make the virus stronger or weaker. If left unchecked, it could develop into a super virus, just as the original virus may have happened to acquire an important mutation and later become more aggressive.

Q: Why is Coronavirus so contagious? Can you find the cause from genetic research?

A: I have read some earlier research papers on SARS. In the non-structural part, in the first half of the Coronavirus genome, the functions of some segments have been identified. For example, the 16th segment, nsp16, is related to the reproduction speed of the virus. A mutation in this segment reduces the reproduction rate of SARS to 10% of the original rate. Conversely, could a mutation here be what caused COVID-19 to spread faster? In addition, government departments mentioned recently that the virus is unusual: some patients have antibodies, but the virus is still in the respiratory tract. Perhaps, as I have observed in the past few days, this segment seems to let the virus camouflage its RNA so that it resembles that of human cells, and our immune system does not attack it. The virus stays alive in the respiratory tract and continues to infect others.

Q: Is it possible that AI can help us establish an early warning system that promptly issues warnings when recombination occurs, and use 3D simulation programs to predict possible protein changes?

A: Currently, most mutations are concentrated in the 1a region. At the moment, there is little impact on the S, M, and E regions. In the future, rapid analysis of the data through AI will certainly make it possible to track the mutation status and predict protein changes.

Q: Graphen has access to a lot of data. Is it publicly available? Is it from all over the world?

A: The original data was uploaded and shared by research institutions around the world. Any research institution can apply for an account to download it. However, once the data is downloaded, complex analysis is required to obtain results similar to ours. That's why there are not yet many findings worldwide.

I think all countries capable of doing such analysis have contributed to the data collection. What we are seeing is that scientists from all over the world are working hard to be open and contribute information. China, the U.S., the U.K., the Netherlands and other countries have contributed more than one hundred datasets. Iran started sharing information about two weeks ago. One country with a widespread outbreak from which we don't have much data is Italy. They shared only a few strains at the beginning and then stopped. I don't know whether the situation there is so critical that there is no time for genetic sequencing, or whether there are other considerations.

Q: Is there any other related research on the genes and proteins of the Coronavirus?

A: A few days ago I saw news reports from Northwestern University and UC Riverside mentioning new potential targets for the COVID-19 virus. They mentioned in the report that two ORF1a proteins in the first part of the genome are the key to why the virus is so contagious.

"These proteins modify the genetic material of the virus to make it look more like the host (human) cell RNA. This allows the virus to hide from the cells, giving it time to multiply. If a drug can be developed to inhibit nsp10 / nsp16, the immune system should be able to detect the virus and eradicate it faster."

According to an earlier article from 2009, many SARS drug targets lie in this section, the replicase or non-structural polyprotein ORF1.

"Rapid upsurge and proliferation of SARS-CoV-2 raised questions about how this virus could became so much more transmissible as compared to the SARS and MERS coronaviruses. The proteins of the coronavirus are mutating, although not very fast, but these changes may contribute to the virus virulence."

For nsp15, the similarity between SARS-CoV-2 and SARS-CoV is 88%, and the similarity with MERS is 51%.
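Similarity figures like these are typically computed as the share of matching positions between two aligned sequences. The sketch below shows one common definition, percent identity. It assumes the two sequences are already aligned to equal length and skips any column containing a gap; other definitions treat gaps differently. The function name `percent_identity` and the sequences are made up for illustration and are not the real nsp15 data.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned, gap-free columns in which both sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap (an assumption).
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

# Toy aligned sequences: 7 of 8 positions match.
print(percent_identity("ACGTACGT", "ACGTACGA"))
```

The same function works on aligned protein sequences, since it only compares characters position by position.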

Their papers on nsp10 & nsp16 have not been published. However, it seems that this large segment should also be monitored. We will analyze this large section of data next.

Q: Is the Coronavirus affected by climate, for example the temperature or humidity of a region?

A: Over the last 10 years since SARS, some research papers have pointed out that several segments in the SARS genome are prone to temperature-sensitive mutations. These mutations may be the reason why the SARS virus stopped after summer. The data we are currently seeing does show more cases in Europe and the northern part of the U.S. There seem to be fewer cases in warmer regions of the world, but this is not enough to predict whether the function of this Coronavirus will also be significantly affected as the temperature rises. The effect of humidity has not yet been observed.

Q: Is it possible to predict the effects of genetic mutations by combining patient data and viral genetic data?

A: The information we have at present shows only the patient's age, gender and testing location. Some patients are also marked as hospitalized, recovered or deceased, but such information is too limited to draw any correlation or conclusion yet. More data will be very helpful for the research.