
International Journal of Scientific & Engineering Research, Volume 11, Issue 7, July 2020, ISSN 2229-5518

DATA ANALYTICS FOR COVID-19 PANDEMIC USING MACHINE LEARNING

Anjum Sheikh, Dr. Sunil Kumar, Dr. Asha Ambhaikar

Abstract: The whole world is fighting the coronavirus pandemic. The rising number of cases challenges governments to take the steps needed to limit the effect of the coronavirus on people. Data analytics can help anticipate the requirements that will arise in the future and identify the measures needed to deal with the situation. Techniques such as machine learning speed up the analytics process and can therefore accelerate planning for the pandemic.

Index Terms: Covid-19, predictive analytics, prescriptive analytics, diagnostic analytics, descriptive analytics, machine learning, big data

1. INTRODUCTION

A pandemic is an outbreak of a contagious disease that spreads from infected to healthy people in many countries around the world at the same time. At present the whole world is fighting the Coronavirus pandemic, which has affected more than 212 countries and territories. The virus is carried by the droplets that an infected person generates when sneezing or coughing. Anyone who comes in contact with an infected person can contract the disease by accidentally inhaling these droplets. The number of infected persons is increasing rapidly, and thousands of deaths are being reported every day at the global level. With no medicine available and vaccine development still in the experimental phase, dealing with this pandemic has become a worldwide challenge. One of the options available to us is to follow the guidelines issued by governments, such as washing hands, social distancing and wearing masks.
The severity of the pandemic has forced researchers to explore ways of avoiding the spread of infection and to predict the medical facilities that will be required for treatment. Big data analytics, along with tools like Machine Learning (ML), can help meet the current challenge by analyzing the COVID-19 dataset to predict the spread of the disease and estimate the growth in the number of patients. This analysis will make it easier to arrange health care facilities according to future demand. Machine learning techniques like Support Vector Machines (SVM), feature extraction and linear regression can play a critical role in mitigating the impact of Coronavirus.

Author notes: Anjum Sheikh is currently pursuing a PhD in electronics & communication at Kalinga University, India, anjnaznus@gmail.com. Dr. Sunil Kumar is head of electrical and electronics engineering at Kalinga University, Raipur. Dr. Asha Ambhaikar is Professor and Dean of students' welfare at Kalinga University, Raipur.

IJSER 2020, http://www.ijser.org

2. IMPORTANCE OF BIG DATA ANALYTICS FOR COVID-19

Big data is a term for large amounts of structured and unstructured data that can be mined for meaningful insights. It is data that exceeds the processing capacity of a single machine. Ongoing digital transformation and newly developed algorithms have played a significant role in handling the storage and computing requirements of such voluminous data. According to [2], big data has three attributes, volume, velocity and variety, generally referred to as the 3Vs. The concept of the 3Vs has been extended to 9Vs, or 3² Vs, by adding further attributes depending on the application or purpose of the analytics. The 3Vs of big data relate to COVID-19 as follows:

Volume: As the disease has infected millions of people all over the world, with numbers increasing every day, the dataset of the Covid-19 pandemic is huge. The data at the global level, and for some large countries, is too large to be handled by traditional databases, and its volume grows daily with the number of affected people.

Variety: Big data can handle a variety of data, both structured and unstructured. Structured data in the form of text and values is available in the COVID-19 dataset, which records the number of infected, active, recovered and deceased patients along with their country or region and the date of observation. COVID-19 data also includes unstructured data in the form of X-ray and CT images, which medical practitioners use to study how the patterns in these images differ from those of healthy people or of people suffering from diseases like flu and pneumonia. Another source of data is social media platforms, which capture people's responses during pandemics.

Velocity: Velocity measures the speed of data generation. Because the virus is spreading very fast, real-time data on the pandemic arrives continuously. The wide use of smartphones, laptops, tablets and other digital devices has enabled a fast inflow of data.

Fig. 1. 3Vs of Big Data (Volume, Variety, Velocity)

Researchers all over the world have adopted different big data analytics techniques to identify the important measures for facing this challenging situation. The four types of big data analytical techniques are descriptive, diagnostic, predictive and prescriptive. Descriptive analytics applies statistical analysis to a given database to summarize it and reveal possible opportunities, while diagnostic analytics uses past data to find the reasons for certain events. Predictive analytics forecasts events so that plans can be made to achieve targets. Prescriptive analytics uses decision science to select the best among all the available models to maximize the chances of success.

The color-coded world and country maps that display the aggregate number of patients in categories like active, recovered or deceased are an example of descriptive analytics. It uses statistical charts to display changes in the number of patients in each category, plotted from data obtained daily, weekly or monthly. Similarly, the spread of the virus, its peak periods and its impact can be shown at three levels: regional, country and continent. These visualizations only reflect changes in the current trend of infection, which analysts can then use to perform diagnostic or predictive analysis.

Diagnostic analytics has been used effectively by some countries, such as Taiwan, to control the spread of the pandemic. A team of Taiwanese officials carried out a detailed mapping of infected persons to determine their source of infection. This database was integrated with immigration and customs information to determine their travel history for the preceding two weeks. The health information collected from all international travelers helped identify their contacts after travel and any symptoms of infection they had observed. Data processing and analytics provided better information and enabled faster and more accurate decisions to combat the spread of the virus in Taiwan.

Predictive analysis can play a significant role in estimating the demand for medical facilities by studying the rate of growth of infection in a particular area. It can be used to develop models for different scenarios, such as best case and worst case, that can be adjusted according to the real-time situation or changes in the data.
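As a rough illustration of scenario-based forecasting, the sketch below projects case counts under a best-case and a worst-case daily growth rate. It assumes constant compound growth, which is a simplification and not a method taken from the paper; the function names, the growth rates and the starting count are all hypothetical.

```python
from typing import Dict, List

# Hypothetical daily growth rates for each scenario (assumptions, not measured values).
SCENARIO_GROWTH_RATES = {"best_case": 0.02, "worst_case": 0.10}


def project_cases(initial_cases: float, daily_growth_rate: float, days: int) -> List[float]:
    """Project case counts for `days` days, assuming constant compound daily growth."""
    cases = [float(initial_cases)]
    for _ in range(days):
        cases.append(cases[-1] * (1.0 + daily_growth_rate))
    return cases


def project_scenarios(initial_cases: float, days: int) -> Dict[str, List[float]]:
    """Run the projection once per scenario so the outcomes can be compared."""
    return {
        name: project_cases(initial_cases, rate, days)
        for name, rate in SCENARIO_GROWTH_RATES.items()
    }
```

When new data arrives, the growth rates can be re-estimated and the projection rerun, which is the kind of adjustment to real-time data described above.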
This information can be used to anticipate the demand for health care facilities such as personal protective equipment (PPE) kits, ICU beds, ventilators, medicines, quarantine facilities, ambulance services, testing labs, and medical as well as paramedical staff. Predictive analytics plays a crucial role in identifying Coronavirus patterns by examining patients' health records and determining hotspot areas. The outcomes of the analytics can be used to suggest a plan of action for quarantine and preventive measures. This type of analytics can be powerful in handling a pandemic because it identifies the factors responsible for the quick spread of infection. It can thus play a great role in recognizing treatment patterns and therapies and in the development of vaccines.

Prescriptive analysis, which helps in selecting the best possible solutions, can be helpful in deciding policies for social distancing, quarantine and lockdown. During the incubation period of the virus, 5 to 14 days, an infected person may not show any symptoms and leads a normal life. Unaware of being a potential carrier of the disease, he or she spreads the infection to many people. After analyzing the rate of virus spread in severely affected countries, researchers and data scientists found that if people stay at home they do not come in contact with others, which helps in combating the infection. Suspected cases and close contacts were therefore sent to quarantine, away from social life, reducing the chances of spreading infection to others. National lockdown was another preventive step enforced by many countries to handle the situation. Lockdown restricted the movement of people outside their houses, allowing access only to essential commodities. Though the lockdown phase hit the economic development of the countries, it slowed the rate of virus spread, which has saved many lives.

3. ROLE OF MACHINE LEARNING IN DATA ANALYSIS OF CORONAVIRUS PANDEMIC

Machine learning algorithms allow a machine to acquire knowledge from the data it receives and to analyze that data to derive solutions for given problems. Pandemics have always been a serious threat to the world. The world has witnessed a number of pandemics, and the current Covid-19 is not the last one. Data scientists are using machine learning approaches for big data analytics to fight the pandemic. This section discusses some of the solutions that machine learning provides for facing the pandemic scenario.

The authors of [7] note that the large volume of data obtained from patients can be analyzed using machine learning. As shown in Fig. 2, the data to be collected can be of various types. The dataset will include geographic information such as country, state, province or city. Another important set of information is the kind of symptoms seen in patients and their severity, according to age and gender. In addition, the dates of onset of disease, testing, domestic or international travel and hospitalization can play a key role in the analytics process. The advantages of machine learning are its high speed, accuracy and simple operation. Data scientists working with ML models feed data to the machine, and the machine derives an appropriate model from it. The accuracy and speed of ML models help in finding preventive measures, accelerating research into drugs to cure the disease, and arranging medical facilities according to the outcomes of the analytics.
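The kinds of data listed above can be pictured as one record per patient. The following is a minimal sketch of such a record, assuming a flat structure; the class name, field names and severity labels are hypothetical and are not taken from [7] or from any published COVID-19 dataset.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PatientRecord:
    """Hypothetical per-patient record covering the data types described in the text."""
    country: str
    state_or_province: str
    city: str
    age: int
    gender: str
    symptoms: List[str] = field(default_factory=list)
    severity: str = "mild"  # assumed labels, e.g. "mild", "moderate", "severe"
    onset_date: Optional[date] = None
    test_date: Optional[date] = None
    travel_date: Optional[date] = None
    hospitalization_date: Optional[date] = None


def days_from_onset_to_hospitalization(record: PatientRecord) -> Optional[int]:
    """Return the delay in days between symptom onset and hospitalization, if both are known."""
    if record.onset_date is None or record.hospitalization_date is None:
        return None
    return (record.hospitalization_date - record.onset_date).days
```

Derived quantities such as the onset-to-hospitalization delay are the sort of feature an ML model could be trained on once records like these are collected.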

Fig. 2. Application of Machine Learning for Fighting Coronavirus Pandemic

ML approaches provide valuable support in identifying the risks of infection and the ineffectiveness of treatment methods. The available dataset can be used to predict causes of infection such as unhygienic practices, social interaction and climatic changes. Similarly, classification by age or health condition makes it possible to identify elderly people or people with ailments such as diabetes and hypertension, so that special methods can be adopted to protect them from infection. ML has been used earlier to predict the outcomes of treatment for diseases like epilepsy and cancer. As no medicines are available for the treatment of Covid-19, doctors are trying different types of medicines to treat the disease. Data on the various treatment methods can be analyzed using machine learning to learn the effect of the medicines, any side effects, and the response of patients according to their age and health condition.

The novel coronavirus outbreak has left the whole world astonished at its intensity of transmission. Thousands of deaths have been reported all over the world. Diagnosis of the disease is essential to provide medical assistance to infected persons and to isolate them to avoid further transmission. Collecting medical samples is difficult in countries with large populations. To avoid long waits for medical reports caused by the increased burden on testing labs, machine learning can be used to develop faster and cheaper methods of diagnosing the disease by analyzing data on the symptoms of coronavirus infection. For example, a face scanning machine can be used to identify symptoms like fever. A similar system launched at a hospital in Florida uses a thermal face-scanning camera to detect fever and sweating. Such cameras are well suited to grocery stores, hospitals and similar places visited by large numbers of people. Machine learning based smart watches and wearables are used for monitoring health parameters like body temperature, heart rate and breathing rate. With some improvements, these smart watches could be used to detect the onset of infection and track the recovery process. Both examples are still under research and no conclusive results are available, but they could help in handling the increasing number of patients.

The medical treatments provided to affected patients are proceeding on a trial and error basis everywhere. A lot of research is going on to develop a vaccine, but most health organizations have predicted that its arrival will take a long time. Machine learning was used in the past to handle the outbreak of Ebola virus. Similar experiments can help identify drugs and develop a vaccine for the current pandemic by analyzing reviews of drugs.

Machine learning models can be used to evaluate the possibility of virus contamination by interpreting people's interactions through social media and other communication platforms. These models cannot help in diagnosis, but people can use them to locate a nearby hospital. They can also be used to estimate the spread of infection and predict the total number of patients in the upcoming weeks. They can track a person's travel history for the last few days, to gather information on the places an infected person visited and to identify the people who came in contact with him or her. These models can be delivered as a mobile app capable of sending alerts about contaminated zones nearby. Some governments have been using such apps to collect user data such as name, gender, age, travel history and any symptoms of Covid-19 observed by the user. This facilitates the tracing of suspected cases so that they come forward for testing. The user of the app receives a notification if a Coronavirus-positive patient is nearby, so that the user can either take precautionary measures or leave the place immediately.

Research on the Covid-19 pandemic is going through a developmental phase. Though machine learning is playing an important role in containing the pandemic, it has some limitations. Machine learning requires large amounts of good quality data to produce the desired results. Unavailability of data has become one of the causes of the slow development of medical research. As most hospitals and health organizations hesitate to share patient data because of security or privacy concerns, a large amount of data is being underutilized. Machine learning does not work well with few data points, and the decisions or predictions given by the system tend to be biased in such cases. Moreover, incomplete data and errors produce unreliable results. Researchers may then take wrong decisions based on those results, which can prove fatal for people. To enable data sharing, the organizations concerned can remove patients' personal information, while research groups should clearly specify their research objectives and give an assurance that the data will not be misused. Another risk of using a machine learning program is that immediate detection of errors may not always be possible. Once a problem is identified, it takes a lot of time and effort to find the root cause and correct it. The number of countries and people affected by the Covid-19 pandemic is increasing exponentially. Machine learning can be applied to data from different geographic locations, but standardizing the collected data at the global level is very difficult.

4. MACHINE LEARNING MODELS FOR COVID-19

A research effort at the MIT Sloan School of Management used ML and data analytics to work out solutions for the Covid-19 pandemic. The research focused on primary areas such as prediction of mortality rate, infection spread, ventilator requirements and testing facilities. The team developed a mortality risk calculator using data published by countries such as Italy, the United States of America and China. It used factors like age, gender, blood pressure and body temperature to analyze data on infection, death rate, effects of isolation and the health conditions of patients in ICU. Another model, called DELPHI, was developed by the team for analyzing the spread of infection. DELPHI is based on the standard SEIR model, which classifies the population into four categories: Susceptible, Exposed, Infected and Recovered [10]. This model was helpful in estimating virus infections, changes in the requirements for medical facilities, and deaths.

An important concern during the pandemic was the availability of ventilators for critical patients. Patients with severe effects of infection need ventilators to raise their oxygen levels. In countries where the number of infected patients was increasing rapidly, it was difficult to meet the fluctuating demand for ventilators.
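To make the four SEIR compartments mentioned above concrete, the sketch below steps a textbook SEIR model forward in time with simple Euler updates. It is not the DELPHI model, whose details are not given here; the function name and the parameters `beta` (transmission rate), `sigma` (rate of leaving the exposed state) and `gamma` (recovery rate) are illustrative assumptions.

```python
from typing import List, Tuple


def simulate_seir(
    population: float,
    initial_exposed: float,
    initial_infected: float,
    beta: float,
    sigma: float,
    gamma: float,
    days: int,
    steps_per_day: int = 10,
) -> List[Tuple[float, float, float, float]]:
    """Return one (S, E, I, R) tuple per day, starting with day 0."""
    s = population - initial_exposed - initial_infected
    e, i, r = float(initial_exposed), float(initial_infected), 0.0
    dt = 1.0 / steps_per_day
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            newly_exposed = beta * s * i / population * dt  # S -> E
            newly_infected = sigma * e * dt                 # E -> I
            newly_recovered = gamma * i * dt                # I -> R
            s -= newly_exposed
            e += newly_exposed - newly_infected
            i += newly_infected - newly_recovered
            r += newly_recovered
        history.append((s, e, i, r))
    return history
```

Each flow is subtracted from one compartment and added to the next, so the total population stays constant. The peak of the infected compartment gives a rough indication of when demand for medical facilities would be highest under the assumed parameters.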
A model developed by this research team optimized the allocation of ventilators, meeting shortages by allowing resources to be shared among states. The team also collected a set of samples and fed it to an ML algorithm to improve the accuracy of Covid-19 testing procedures.

Coronavirus affects the lungs, causing respiratory problems that may lead to death for some critical patients. Radiologists can detect the signs of Covid-19 using computed tomography and X-ray images of the chest. Most emergency clinics have X-ray imaging machines, so using X-ray images to detect Covid-19 can reduce the burden on testing labs. Some researchers have applied a supervised machine learning algorithm called the Support Vector Machine (SVM) to X-ray images of lungs to differentiate between healthy persons, pneumonia patients and Covid-19 patients [8]. SVM is generally used for classification problems. It uses a hyperplane to separate the data points into two classes. Automated techniques like SVM can provide better accuracy than traditional image classification methods. SVM can be used on large datasets, and its speed saves medical professionals' time and delivers screening results sooner. Radiologists can therefore use SVM on images to classify lungs infected with Covid-19, so that signs of lung damage are detected early and the delay in providing medical attention is reduced. The technique will also be useful in observing changes during the recovery process [9].

Regression analysis using machine learning is a prominent method for predicting the dangerous circumstances that may arise in the future due to the pandemic. Logistic and linear regression techniques can play a crucial role in forecasting the effects of the Covid-19 pandemic. The linear regression model describes the relation between a dependent variable and one or more independent variables using a regression line.
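The regression line for a single independent variable can be written as y = a + b x, where b is the slope and a the intercept. The sketch below fits such a line by ordinary least squares and uses it to extrapolate. The function names and the sample numbers are hypothetical, and real case counts often grow nonlinearly, so a straight-line fit is at best a short-term approximation.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares; return (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Evaluate the fitted regression line at x."""
    return intercept + slope * x


# Made-up example: day index versus cumulative confirmed cases.
days = [0, 1, 2, 3]
confirmed = [1, 3, 5, 7]
slope, intercept = fit_line(days, confirmed)
forecast_day_10 = predict(slope, intercept, 10)
```

Here the independent variable is the day index and the dependent variable is the case count; further independent variables would require multiple regression rather than this single-variable form.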
Logistic regression, on the other hand, is used to determine the probabilities of the possible outcomes of an event when the dependent variable is binary in nature, like 0/1 or yes/no. Linear regression can be used to predict quantitative parameters, while the logistic method can be used for qualitative (categorical) parameters. Linear regression can be used to predict the number of confirmed, recovered and deceased cases after a given interval of time. The accuracy of the outcomes of this model can be improved by making the dataset more informative through the inclusion of parameters like date, gender, immune system of patients and preventive measures. The predictions obtained by linear regression helped in handling the risky situation through the formulation of policies to protect the community from the deadly virus. Predictions made for short-term intervals can be adapted for long-term predictions. The health ministry and health organizations of a country can use these predictions to plan essential services. Some of the steps taken by countries all over the world to prevent transmission are social distancing, quarantine and national lockdown.

Logistic regression can be used for sentiment analysis using data from social media platforms. Most people use social media to express their fear of the fast spread of coronavirus infection. Some talk about the impact of the pandemic on the country's economy and how it would affect people's everyday lives. The sentiments expressed vary with time and place, with both positive and negative emotions being shared through social media like Facebook and Twitter. Sentiment analysis using logistic regression can be used to predict the impact on the hospitality and tourism industry of changes in people's spending on travel and entertainment. For some people this change stems from fear of infection. Another reason is financial instability due to the increase in unemployment and business losses during the Coronavirus pandemic. Sentiment analysis can be applied to many other kinds of predictions by creating a subset of the data. The gathered data can be classified into positive and negative emotions using a logistic regression model.

5. CONCLUSION

Big data analytics and machine learning can be very effective in finding solutions for the Covid-19 pandemic. The results of analytics help predict the growth in the number of affected patients, which can in turn be used to forecast health care demands. An advantage of using machine learning for analytics is its high speed and accuracy. This article has described the role of machine learning and some of the models that can be helpful in facing this challenging situation. As the pandemic continues, new challenges will keep arising, but technological advancements will continue to seek a way through this devastating phase and help the world mitigate the impact of the virus.

REFERENCES

[1] Douglas Laney, 3D Data Management: Controlling Data Volume, Velocity and Variety, Application Delivery Strategies, Meta Group, 6 Feb 2001, pp [...]
[2] [...]s-fight-coronavirus/
[3] Ifeyinwa Angela Ajah, Henry Friday Nweke, Big Data and Business Analytics: Trends, Platforms, Success Factors and Applications, Big Data and Cognitive Computing, MDPI, 2019
[4] Anis Koubaa, Understanding the COVID19 Outbreak: A Comparative Data Analytics and Study, arXiv:2003.14150v1, March [...]
[5] [...]learning-covid-19
[6] [...]/
[7] Ahmad Alimadadi, Sachin Aryal, Ishan Manandhar, Patricia B. Munroe, Bina Joe, Xi Cheng, Artificial Intelligence and Machine Learning to Fight COVID-19, Physiol Genomics 52: 200-202, 2020
[8] Prabira Sethy, Santi Kumari Behera, Pradyumna Kumar, Preesat Biswas, Detection of Coronavirus Disease (COVID-19) Based on Deep Features and Support Vector Machine, 2020, 643-651, 10.33889/IJMEMS.2020.5.4.052
[9] Lamia Nabil Mahdy, Kadry Ali Ezzat, Haytham H. Elmousalami, Hassan Aboul Ella, Aboul Ella Hassanien, Automatic X-ray COVID-19 Lung Image Classification System based on Multi-Level Thresholding and Support Vector Machine, 2020, medRxiv preprint
[10] Gaurav Pandey, Poonam Chaudhary, Rajan Gupta, Saibal Pal, SEIR and Regression Model [...]
