Syed Mehtab Alam, Decoupes Rémy, Arsevska Elena, Roche Mathieu, Teisseire Maguelonne. 2022. Spatial opinion mining from COVID-19 twitter data. International Journal of Infectious Diseases, 116, suppl.: 527. International Meeting on Emerging Diseases and Surveillance (IMED 2021), s.l., 4-6 November 2021.
Published version
- English
Under licence. Full text: Syed_Mehtab_Alam_IJID.pdf (87 kB)
HCERES journal list (humanities and social sciences): yes
HCERES journal theme(s) (humanities and social sciences): Psychology-ethology-ergonomics
Abstract: Purpose: In the first quarter of 2020, the World Health Organization (WHO) declared COVID-19 a global public health emergency. Users all over the world shared their thoughts about COVID-19 on social media platforms such as Twitter and Facebook, so it is important to analyze public opinion about COVID-19 across different regions and time periods. To support this spatial analysis, previous work proposed H-TF-IDF (Hierarchy-based measure for tweet analysis), a method for extracting terms from tweet data. In this work, we focus on sentiment analysis of the terms selected by H-TF-IDF for spatial groups of tweets, in order to understand local situations during the COVID-19 epidemic over different time frames.
Methods & Materials: The first step is to extract terms from tweets using the H-TF-IDF approach. These terms are then used in two ways: 1) to select the tweets that contain them, and 2) as features for sentiment analysis. Next, data preprocessing is performed to clean the text. Vectorization models, namely bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF), are then applied with n-grams to extract features, which are used to train the prediction models for sentiment analysis. Finally, statistical and machine learning models, such as logistic regression and support vector machines (SVM), are applied to classify the spatial tweet groups. For preliminary results, experiments are conducted on the H-TF-IDF tweet corpus with geocoded spatial information for January 2020. These tweets are drawn from the dataset collected by E. Chen (https://github.com/echen102/COVID-19-TweetIDs), which focuses on the early stage of the outbreak. The same 80%/20% train-test split is used for every prediction model.
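The Methods & Materials pipeline (term-based tweet selection, text cleaning, n-gram BOW and TF-IDF features, 80%/20% split) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the authors' code: H-TF-IDF's hierarchical scoring is not described in the abstract, so the term list below is a hypothetical stand-in for its output, and the cleaning regexes, the unigram+bigram range, the smoothed idf formula and all helper names are assumptions. The resulting vectors would then be passed to a classifier such as logistic regression or an SVM (for example via scikit-learn), which is omitted here.

```python
import math
import random
import re
from collections import Counter


def preprocess(text):
    """Lower-case, strip URLs and @mentions, keep alphanumeric tokens (assumed cleaning rules)."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z0-9]+", text)


def ngrams(tokens, n_max=2):
    """All contiguous n-grams for n = 1..n_max, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def select_tweets(tweets, terms, n_max=2):
    """Use 1 of the terms: keep tweets containing at least one extracted term."""
    wanted = {" ".join(preprocess(t)) for t in terms}
    return [tw for tw in tweets if wanted & set(ngrams(preprocess(tw), n_max))]


def bag_of_words(docs, n_max=2):
    """BOW: raw n-gram counts per document."""
    return [Counter(ngrams(preprocess(d), n_max)) for d in docs]


def tf_idf(docs, n_max=2):
    """TF-IDF with one common smoothed idf, ln((1 + N) / (1 + df)) + 1, no normalisation."""
    counts = bag_of_words(docs, n_max)
    df = Counter(g for c in counts for g in c)  # number of documents containing each n-gram
    n_docs = len(counts)
    return [{g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1) for g, tf in c.items()}
            for c in counts]


def term_features(docs, terms, n_max=2):
    """Use 2 of the terms: one count column per extracted term."""
    vocab = [" ".join(preprocess(t)) for t in terms]
    return [[c[t] for t in vocab] for c in bag_of_words(docs, n_max)]


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle with a fixed seed, then cut into 80% train / 20% test by default."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical example data; the term list stands in for H-TF-IDF output.
tweets = [
    "Lockdown in Wuhan, stay safe everyone https://t.co/x",
    "New coronavirus cases reported in Wuhan",
    "Great weather today",
]
terms = ["Wuhan", "coronavirus"]
selected = select_tweets(tweets, terms)
print(len(selected))                     # → 2
print(term_features(selected, terms))    # → [[1, 0], [1, 1]]
```

In this sketch, `term_features` mirrors the second use of the terms (terms as the only feature columns), while `tf_idf` and `bag_of_words` produce the general n-gram vectors; a term shared by every selected tweet gets an idf of exactly 1, so rarer n-grams weigh more.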
Results: The results show that specific terms highlighted by H-TF-IDF provide useful information that would not have been identified without this spatial analysis. The classification assigns tweet groups, organized by spatial location, to positive, negative and neutral classes using subjectivity and polarity measures.
Conclusion: The current work is applied to English-language Twitter data. Future work will incorporate other languages into the sentiment analysis. Furthermore, BERT will be used to extend these features.
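The Results describe grouping tweets by location and labelling them positive, negative or neutral from polarity and subjectivity scores. The abstract does not give the scoring rules, so the sketch below is only an illustration of that aggregation: the toy lexicon, the `min_subjectivity` cut-off, the labelling rule and the example locations are all hypothetical. In practice, polarity (in [-1, 1]) and subjectivity (in [0, 1]) scores could come from a library such as TextBlob rather than this lexicon.

```python
import re
from collections import Counter, defaultdict

# Toy polarity lexicon (hypothetical); a real pipeline might take polarity and
# subjectivity scores from a sentiment library instead.
LEXICON = {"safe": 0.5, "great": 0.8, "good": 0.7,
           "bad": -0.7, "scared": -0.6, "death": -0.8}


def polarity_subjectivity(text):
    """Polarity = mean lexicon score; subjectivity = share of tokens found in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    if not scores:
        return 0.0, 0.0
    return sum(scores) / len(scores), len(scores) / len(tokens)


def label(polarity, subjectivity, min_subjectivity=0.1):
    """Assumed rule: objective or zero-polarity text is neutral, otherwise the sign decides."""
    if subjectivity < min_subjectivity or polarity == 0:
        return "neutral"
    return "positive" if polarity > 0 else "negative"


def label_by_location(geo_tweets):
    """geo_tweets: iterable of (location, text) pairs -> {location: Counter of labels}."""
    groups = defaultdict(Counter)
    for location, text in geo_tweets:
        groups[location][label(*polarity_subjectivity(text))] += 1
    return dict(groups)


# Hypothetical geocoded tweets.
geo_tweets = [
    ("Paris", "Stay safe, great work"),
    ("Paris", "Cases reported today"),
    ("London", "Scared of death rates"),
]
print(label_by_location(geo_tweets))
```

Counting labels per location gives each spatial tweet group a sentiment profile that can be compared across regions or time frames.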
Agrovoc keywords: COVID-19, public opinion, social networks, network analysis, spatial analysis, text mining, data mining, epidemiology, pandemic
Free keywords: Event-based surveillance, Epidemic intelligence, Text Mining, Spatial analysis, COVID-19, Twitter
Cirad strategic field: CTS 7 (2019-) - Outside strategic fields
European funding agencies: European Commission
European funding programme: H2020
Funded projects: (EU) MOnitoring Outbreak events for Disease surveillance in a data science context
Authors and affiliations
- Syed Mehtab Alam, CIRAD-ES-UMR TETIS (FRA) - corresponding author
- Decoupes Rémy, INRAE (FRA)
- Arsevska Elena, CIRAD-BIOS-UMR ASTRE (FRA) ORCID: 0000-0002-6693-2316
- Roche Mathieu, CIRAD-ES-UMR TETIS (FRA) ORCID: 0000-0003-3272-8568
- Teisseire Maguelonne, INRAE (FRA)
Source: Cirad-Agritrop (https://agritrop.cirad.fr/600946/)