[Air-L] COVID-19 Twitter dataset (>524 million) for research

Muhammad Imran mimran15 at gmail.com
Fri Jun 5 05:06:16 PDT 2020


[Apologies for cross-posting]

Dear Colleagues, 

This COVID-19 Twitter dataset might be of interest to you. 

We've released "GeoCoV19", a large-scale and multilingual dataset of 524 million tweets about the ongoing COVID-19 pandemic. The dataset is annotated with location information, including country, state, county, and city. The following are some important stats about the dataset:

- The dataset was collected continuously over a period of 90 days, from Feb 1st to May 1st, 2020, without any gaps. Collection is ongoing, and the new data will be released as well.
- The 524 million tweets in the dataset are from over 43 million users.
- The geographic coverage of the dataset spans 218 countries and more than 47,000 cities around the world.
- The dataset covers 62 languages. The top five languages are: English, Spanish, French, Portuguese, and Italian.

Paper link: https://dl.acm.org/doi/abs/10.1145/3404111.3404114

Dataset link: https://crisisnlp.qcri.org/covid19

Best,
-----
Dr. Muhammad Imran
Scientist
Qatar Computing Research Institute
Hamad Bin Khalifa University
Doha, Qatar.
Tel:  +974 4454 1521
https://mimran.me/


