[Air-L] COVID-19 Twitter dataset (>524 million) for research
Muhammad Imran
mimran15 at gmail.com
Fri Jun 5 05:06:16 PDT 2020
[Apologies for cross-posting]
Dear Colleagues,
This COVID-19 Twitter dataset might be of interest to you.
We've released "GeoCoV19", a large-scale, multilingual dataset of 524 million tweets about the ongoing COVID-19 pandemic. The dataset is annotated with location information, including country, state, county, and city. Some key statistics about the dataset:
- The dataset was collected over 90 days, from Feb 1st to May 1st, 2020, without any gaps. We are still collecting new data, which will also be released.
- The 524 million tweets in the dataset are from over 43 million users.
- The dataset's geographic coverage spans more than 218 countries and 47,000 cities around the world.
- The dataset covers 62 languages. The top five languages are: English, Spanish, French, Portuguese, and Italian.
Paper link: https://dl.acm.org/doi/abs/10.1145/3404111.3404114
Dataset link: https://crisisnlp.qcri.org/covid19
Best,
-----
Dr. Muhammad Imran
Scientist
Qatar Computing Research Institute
Hamad Bin Khalifa University
Doha, Qatar.
Tel: +974 4454 1521
https://mimran.me/