COVID-19: How Understanding Data Helps Us All

COVID-19 has had a profound impact on all aspects of global society since its first appearance in Wuhan, China, on November 17th, 2019. Governments, public and private organizations, and individuals across the globe have all felt the impact of this virus. Information about COVID-19 has evolved rapidly, and new findings about the virus emerge constantly. We here at Rosoka Software understand how difficult these times have become and are working hard to provide the most accurate and in-depth analysis of COVID-19 data possible.

COVID-19 Within The Data

As COVID-19 reaches pandemic levels globally, many countries and governments are working diligently to handle this crisis. While this work unfolds, data related to the coronavirus has become abundant, and collecting and understanding all of these data sources has become a necessity. Rosoka's universal NLP engine, which processes data equally in over 200 languages, provides a solution to this problem. Upon deployment, Rosoka's NLP engine detects the pertinent information in a data set using a combination of advanced algorithms, machine learning, rule-based entity extraction designed around grammar and pragmatics, and dynamically constructed lexicons. Users can leverage the engine's metadata output for deeper insight into a designated target, as seen below.


Within the above extraction we see the target, COVID-19, and its possible spread to parts of Africa, as well as specific locations of possible infections, the date of the virus's arrival, its history of spread, the parties involved, and lockdown protocols. All of this extracted information gives the user additional data points and insight to act against COVID-19. The extraction also highlights Rosoka's anaphoric entity resolution across the various forms used to refer to COVID-19 (e.g., Coronavirus, COVID-19, Novel Coronavirus, 2019-NCOV, SARS COV-2). Rosoka's engine learns the various forms of a single entity and normalizes them to a single canonical entity. This ensures that all relevant data is connected, giving the user a more robust view of the target not only at the document level but across the entire data corpus.
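To make the idea of normalizing many surface forms to one entity concrete, here is a minimal Python sketch. It is not Rosoka's method: Rosoka's engine learns variant forms, whereas this sketch assumes a fixed, hand-written alias table (`ALIASES`) and a simple normalization key that ignores case, spaces, and punctuation. The names `norm_key`, `resolve`, and `group_mentions` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, hand-written alias table: canonical entity -> known surface forms.
# Grouping "Coronavirus" under COVID-19 follows the article; in general usage the
# word names a broader family of viruses.
ALIASES = {
    "COVID-19": [
        "Coronavirus", "COVID-19", "Novel Coronavirus", "2019-nCoV", "SARS-CoV-2",
    ],
}


def norm_key(text: str) -> str:
    """Case-fold and drop everything except letters and digits,
    so 'SARS COV-2' and 'SARS-CoV-2' produce the same key."""
    return re.sub(r"[^0-9a-z]", "", text.casefold())


# Lookup from normalized key -> canonical entity name.
LOOKUP = {norm_key(form): canon for canon, forms in ALIASES.items() for form in forms}


def resolve(mention: str) -> str:
    """Return the canonical entity for a mention, or the mention itself if unknown."""
    return LOOKUP.get(norm_key(mention), mention)


def group_mentions(docs: dict) -> dict:
    """Map each canonical entity to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, mentions in docs.items():
        for mention in mentions:
            index[resolve(mention)].add(doc_id)
    return dict(index)
```

For example, `group_mentions({"d1": ["Coronavirus"], "d2": ["SARS COV-2", "Wuhan"]})` files both documents under `COVID-19`, which is what lets mentions be connected across a corpus rather than only within one document. A fixed table only covers forms someone has already listed, which is why a learned approach matters in practice.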

Examining How We Talk About COVID-19

As Rosoka examines the discourse surrounding COVID-19, we, like many others, understand how difficult it is to find reliable information about the virus. Handling large volumes of data presents its own inherent challenges, and when those challenges are compounded by propaganda, writer bias, and deceptive news, the task becomes daunting. Rosoka, as a robust text analytics platform, uses sentiment analysis to address this issue. Rosoka gives meaning to data using psycholinguistic approaches, providing insight into the data's hidden aspects. Rosoka measures sentiment with four metrics: mood, polarity, aspect, and intensity. Together with a salience score that identifies the key items within a text, these metrics capture the author's state of mind and emotional control, the positive or negative sentiment of the text, and how motivated the audience is to react.

Below is a news article discussing COVID-19 and Irish Minister for Health Simon Harris. Rosoka assigns individual sentiment scores to all extracted information in this document; COVID-19's scores are highlighted. As expected, COVID-19 has strongly negative polarity, aspect, and mood scores, which properly convey the danger of the virus. Combined with a maximum intensity score, these scores make clear to the reader that the writer is discussing a dangerous entity.


Rosoka also scores the sentiment of the entire document; below are the document-level scores for the same Simon Harris article. The document depicts a moderately negative topic in a calm tone that should leave the reader with a slightly positive outlook and low activation. Coupled with the sentiment scores of individual elements, this allows users to better understand the key entities of any document as well as the underlying intent of the data source. Understanding the material at this level, together with their own subject matter knowledge, helps users judge how reliable a data source is.



COVID-19 Connections To Others

Understanding the spread of COVID-19 has become the top priority of governments and citizens alike across the globe. Connecting people, places, and transmission, both current and future, has become key to fighting this outbreak. Rosoka makes these connections across all supported languages by extracting PSO relationship triples. This output can be leveraged in a number of ways, allowing the user to examine connections not only at the document level but across an entire corpus.

Below is a visualization based on salient items and their connections to one another. This simple visualization gives the user insight into how a single highlighted entity connects to other highly salient entities within a document or across an entire corpus. Users can also access all of the extracted relationships within a corpus, along with attached and connected metadata, either inside this tool or by exporting them to another. Each PSO relationship carries a predicate tag that identifies the type of relationship established, giving the user specific knowledge of how the target entity relates to other entities. Below we see that COVID-19 is connected to multiple PLACE, PERSON, and FACILITY entities. The visualization also shows how these entities branch out and connect to other highly salient entities. By examining this relationship chart alongside the text in the first screenshot above, we see COVID-19's potential spread from China to South Africa, with a screening checkpoint set up at OR Tambo Airport starting on Wednesday, March 23, 2020.
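The following minimal Python sketch illustrates how relationship triples can be turned into a graph and queried for an entity's connections. The `Triple` fields, the predicate tags (`SPREAD_FROM`, `SPREAD_TO`, `LOCATED_IN`), and the helper names are hypothetical stand-ins loosely based on the example above; they are not Rosoka's output format.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Triple(NamedTuple):
    predicate: str  # hypothetical predicate tag naming the relationship type
    subject: str
    obj: str


def build_graph(triples):
    """Adjacency list storing each edge in both directions,
    each edge labeled with its predicate tag."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
        graph[t.obj].append((t.predicate, t.subject))
    return graph


def relations(graph, entity):
    """All (predicate, neighbor) pairs attached to one entity, sorted."""
    return sorted(graph.get(entity, []))


def connections(graph, start, max_hops=2):
    """Breadth-first search: {entity: hop distance} for every entity
    reachable from start within max_hops edges (start itself excluded)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _predicate, neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[start]
    return dist


if __name__ == "__main__":
    triples = [
        Triple("SPREAD_FROM", "COVID-19", "China"),
        Triple("SPREAD_TO", "COVID-19", "South Africa"),
        Triple("LOCATED_IN", "OR Tambo Airport", "South Africa"),
    ]
    graph = build_graph(triples)
    print(connections(graph, "COVID-19"))
    # {'China': 1, 'South Africa': 1, 'OR Tambo Airport': 2}
    print(relations(graph, "South Africa"))
    # [('LOCATED_IN', 'OR Tambo Airport'), ('SPREAD_TO', 'COVID-19')]
```

Breadth-first search reports each entity with its shortest hop distance from the target, so the airport appears as a second-degree connection of COVID-19 through South Africa. Storing edges in both directions keeps the traversal simple; a real system would likely keep edge direction, since a relationship such as "spread to" is not symmetric.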

Understanding that South Africa is a possible spread site for COVID-19 allows governments and other organizations to plan and send resources to combat the spread of the virus. This information, coupled with the other aspects discussed above, gives users a cutting-edge tool for taking decisive action. We here at Rosoka hope everyone stays safe during this difficult time. Our team will continue working hard to provide the most advanced text analytics tool on the market to solve this problem as well as many others.
