[Air-L] [External] Re: Buying tweets ?

Sarah Ann Oates soates at umd.edu
Thu Sep 10 07:12:59 PDT 2020


It would be great if we didn't have to pay for social media data (or
research in general). But one way or another you have to pay, either by
learning how to collect the data yourself or by paying someone else to do
it for you -- and while you can scrape some material, ethical and
practical considerations make this problematic.

I do take the point that there is a wealth of content out there to study
for free -- but I don't think this is true of a significant study of social
media content on important issues. It would be great if social media data were
like other content -- imagine a situation in which universities subscribed
to social media feeds the way that many subscribe to Nexis Uni for
traditional media content.

We actually had a roundtable about researchers acquiring data etc. via
social media analytics companies at the American Political Science
Association Political Communication Pre-Conference this week. You can see
our pre-record here: https://youtu.be/jErwj9ZAXa4

Sarah Oates

Professor and Senior Scholar
Philip Merrill College of Journalism
University of Maryland
College Park, MD 20457
Email: soates at umd.edu
Phone: 301 455 2332
www.media-politics.com
Twitter: @media_politics

*Support the UMD Student Crisis Fund
<https://giving.umd.edu/giving/showPage.php?name=crisis-funding> today. *



On Thu, Sep 10, 2020 at 9:55 AM Deen Freelon <dfreelon at gmail.com> wrote:

> The streaming API is great--if 1) you're certain of all your search
> criteria in advance, and 2) you have a well-calibrated hardware/software
> setup dedicated to real-time data collection. In practice, researchers
> often need to add keywords and other criteria after the fact, or the
> significance of certain events/individuals only becomes clear later, or
> the limitations of their data collection setup do not become apparent
> until it fails, etc. The obvious advantage of streaming is that it's
> free, but it is not viable for a substantial subset of use cases.
> Sometimes the only way to obtain high-quality historical Twitter data is
> to purchase it.
>
> The ethical issues Stu mentions are as yet unresolved. Theoretically,
> users should be able to demand removal of their data from academic
> databases at any time, but this is a practical impossibility. Most
> Twitter-based research would be impossible without storage of the full
> text and metadata, and there are no widely-used guidelines about how
> that data should be managed or sunset aside from Twitter's prohibition
> on the sharing of complete datasets. Certainly something we should
> continue to think about how best to address... /DEEN
>
> On 9/10/2020 9:39 AM, Stuart Shulman wrote:
> > There is some scholarship on the various options which have continued
> > to evolve over time with respect to Twitter.
> > https://www.mdpi.com/1660-4601/17/3/864
> >
> > I agree that there is usually sufficient free, real time data to
> > gather from Twitter to reach saturation on most real time current
> > research questions.
> >
> > My personal experience with the cost and regulation of historical
> > Twitter access for academia is a cautionary tale. Most of what
> > academics want to study, and often the way they do it, violates the
> > clear language of the Twitter Terms of Service and also the
> > increasingly widespread right to be forgotten. If you are storing
> > spreadsheets of Twitter data that includes over time more and more
> > material from deleted accounts or deleted Tweets, this is problematic
> > from a legal perspective and raises ethical review questions
> > that should not be glossed over in any wikileaks fashion by journal
> > editors or university ethics officers.
> >
> > ~Stu
> > Dr. Stu Shulman U.S. Soccer Federation C-Licensed Coach Valeo FC &
> > Capacidad <http://capacidadprograms.org/?page_id=13> Volunteer Coach
> > Is your player ready to give back to the game? Contact Coach Stu
> > about winter & spring 2020 volunteer efforts.
> >
> >
> >
> >
> >
> > On Thu, Sep 10, 2020 at 9:24 AM Peter Joseph Gloviczki PhD
> > <pgloviczki at coker.edu> wrote:
> >
> >     Deen makes a great point about sponsorship. I was referring to
> >     having to use personal funds.
> >
> >     Fondly, Peter
> >
> >     Peter Joseph Gloviczki, PhD    he/him/his
> >     Associate Professor of Communication
> >     Coker University
> >     300 East College Avenue
> >     Hartsville, South Carolina 29550
> >     843.383.8379
> >     pgloviczki at coker.edu
> >
> >     Assistant Editor, Journal of Loss and Trauma (Taylor & Francis)
> >     Immediate Past Head, Cultural and Critical Studies Division, AEJMC
> >     1st Vice President, Carolinas Communication Association
> >
> >
> >     On Thu, Sep 10, 2020 at 9:20 AM Deen Freelon <dfreelon at gmail.com> wrote:
> >
> >     > I have purchased tweets directly from Twitter on multiple occasions. I
> >     > disagree with Dr. Gloviczki about paying research costs--some of the
> >     > most rigorous research is sponsored. I wouldn't spend my own personal
> >     > money on such costs, but if you've got funding, by all means use it.
> >     >
> >     > Twitter allows university-affiliated users to buy data in a few ways.
> >     > I've primarily used their a la carte service (that's just what I call
> >     > it), where you give them a set of search criteria (e.g. a keyword[s] and
> >     > a time period) and they give you a quote. Pricing is based on the number
> >     > of days covered and the total volume of tweets. Their minimum price is a
> >     > little over $1k US and costs can quickly run into the five-figure range,
> >     > especially if you want tweets over a lengthy period of time. Also, they
> >     > have been known to refuse certain data requests, especially those
> >     > related to international conflict. The criteria for "acceptable" data
> >     > requests are not public--I've asked.
> >     >
> >     > Twitter does not advertise this service but it does exist. Fill out this
> >     > form and ask about it:
> >     >
> >     > https://developer.twitter.com/en/products/twitter-api/enterprise/application
> >     >
> >     > The associated metadata are the same as provided through the standard
> >     > APIs. These can be found here:
> >     >
> >     > https://developer.twitter.com/en/docs/twitter-api/v1/tweets/post-and-engage/api-reference/get-statuses-lookup
> >     >
> >     > Language tags are included, and geographic info is present only when
> >     > users opt in, which is rare (typically 3-5% of tweets). I will also say
> >     > that obtaining the data once purchased is not easy--they come as GZIPped
> >     > JSON files packaged in 10-minute increments. So a year of data is far
> >     > too much to download manually--you'd need to automate your download
> >     > pipeline. I've written code to do this, so anyone who manages to
> >     > successfully buy Twitter data may feel free to contact me to access my
> >     > scripts.
> >     >
> >     > Twitter also offers a couple other data purchase options, including its
> >     > Premium API
> >     > (https://developer.twitter.com/en/products/twitter-api/premium-apis) and
> >     > its Enterprise API
> >     > (https://developer.twitter.com/en/products/twitter-api/enterprise#/).
> >     > These charge pretty steep monthly fees and are oriented more toward
> >     > corporate and other well-funded clients.
> >     >
> >     > Finally, here's their portal for academic researchers, which may have
> >     > some relevant info:
> >     >
> >     > https://developer.twitter.com/en/solutions/academic-research/products-for-researchers
> >     >
> >     > Best, /DEEN
> >     >
> >     > On 9/10/2020 8:39 AM, Sandrine Roginsky wrote:
> >     > > Hello everybody,
> >     > >
> >     > > Help needed. Does anyone have experience with buying tweets from
> >     > > Twitter for research? We have a fairly specific query and would like
> >     > > to know which information is given about the tweets harvested through
> >     > > the query (e.g. is language or geographic information given for
> >     > > tweets, even if it isn't part of the query - so not a selection
> >     > > criterion)?
> >     > >
> >     > > Many thanks.
> >     > >
> >     > > Best wishes,
> >     > > Sandrine
> >     > >
> >     > >
> >     > >
> >     > > Sandrine Roginsky
> >     > > Associate Professor
> >     > >
> >     > > Faculty of Economic, Social and Political Sciences, and Communication
> >     > > Institute Language & Communication, PCOM / LASCO
> >     > >
> >     > >
> >     > >
> >     > >
> >     > > _______________________________________________
> >     > > The Air-L at listserv.aoir.org mailing list
> >     > > is provided by the Association of Internet Researchers http://aoir.org
> >     > > Subscribe, change options or unsubscribe at:
> >     > > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> >     > >
> >     > > Join the Association of Internet Researchers:
> >     > > http://www.aoir.org/
> >     >
> >     > --
> >     > Deen Freelon, Ph.D.
> >     > Associate Professor -> Hussman School of Journalism and Media
> >     > Principal Researcher -> Center for Information, Technology, and Public Life
> >     > University of North Carolina at Chapel Hill
> >     > http://dfreelon.org | @dfreelon <https://twitter.com/dfreelon> |
> >     > https://github.com/dfreelon | https://citap.unc.edu/
> >     > Schedule an appointment with me
> >     > <https://doodle.com/mm/deenfreelon/book-a-time>
> >     >
> >
>
> --
> Deen Freelon, Ph.D.
> Associate Professor -> Hussman School of Journalism and Media
> Principal Researcher -> Center for Information, Technology, and Public Life
> University of North Carolina at Chapel Hill
> http://dfreelon.org | @dfreelon <https://twitter.com/dfreelon> |
> https://github.com/dfreelon | https://citap.unc.edu/
> Schedule an appointment with me
> <https://doodle.com/mm/deenfreelon/book-a-time>
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/
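
Deen's message quoted above says purchased data arrive as GZIPped JSON
files in 10-minute increments, with language tags in the metadata. The
Python sketch below is one possible way to work with such files once they
are on disk. It is not Deen's code and not a Twitter tool. It assumes,
without confirmation from the thread, that each file holds one JSON object
per line, that tweet records carry an "id" key, and that the language tag
is stored under "lang". The function names are hypothetical.

```python
import gzip
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator


def files_per_period(days: int, minutes_per_file: int = 10) -> int:
    """Upper bound on file count if every increment yields one file."""
    return days * 24 * 60 // minutes_per_file


def iter_tweets(paths: Iterable[Path]) -> Iterator[dict]:
    """Yield one dict per tweet from gzipped, line-delimited JSON files."""
    for path in paths:
        with gzip.open(path, mode="rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines between records
                record = json.loads(line)
                # Assumption: tweet records have an "id" key; anything
                # else (e.g. a summary line) is skipped.
                if isinstance(record, dict) and "id" in record:
                    yield record


def count_languages(paths: Iterable[Path]) -> Counter:
    """Tally tweets by language tag; "und" marks a missing tag here."""
    return Counter(tweet.get("lang", "und") for tweet in iter_tweets(paths))
```

For scale, files_per_period(365) gives 52,560 ten-minute files for a full
year, which is why downloading by hand is impractical. A call such as
count_languages(sorted(Path("data").glob("*.json.gz"))) would tally
languages across a folder, where "data" is a placeholder directory name.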


