[Air-L] ethical considerations for a publicly-shared Twitter dataset?

Jodi Schneider jschneider at pobox.com
Thu Jul 13 11:35:23 PDT 2023

I am writing to the list for ethics guidance on reusing a publicly-shared
Twitter dataset I found <https://github.com/DrMassie/Covid_tweets/> for
academic research. The repository states that "four variables are in each dataset:
created_at (i.e., the date of the tweet), username (i.e., the handle of the
user), text (i.e,. *the tweet itself*), and location (i.e., information on
user profile that the user put as his/her location)."

In particular, has the data been shared properly? As far as I can tell,
"properly" would mean "in accordance with the Twitter ToS". After reading
the current Twitter ToS <https://twitter.com/en/tos> and developer
agreements <https://developer.twitter.com/en/developer-terms/agreement> I
am still unsure. (I would ask our copyright librarian - but she is on
sabbatical until January!)

The associated paper - which is quite interesting - is:
Fogarty, B., Massie, K., and Svistova, J. (2024). Unmasking Twitter
discourse: An infodemiology study of Covid-19 mitigation practices. *The
Atlantic Journal of Communication*.

They say "The data and files that we have generated are freely available
for public and academic use as long as our original work is sited (sic) as
the source." The dataset is described as "Keyword "coronavirus" tweets
from March to May 2020 before covid or covid19 became popular".


