[Air-L] [External] Re: Buying tweets ?

Matteo Magnani mat.magnani at gmail.com
Tue Sep 15 00:43:19 PDT 2020


At my lab we ran some experiments to automate or semi-automate user
notification and tweet removal from our archives, while studying how the
GDPR influences social network analysis research (paper here:
https://dl.acm.org/doi/fullHtml/10.1145/3365524). Long story short: we
encountered several difficulties, from Twitter blocking our notifications
(because we qualified as evil bots :) ) to the potential burden on
individual researchers, who would have to take additional actions to
increase data protection. But some (maybe sub-optimal) solutions are
definitely feasible. For example, we implemented an extension of an
existing Twitter data collection tool that can be set up to regularly
tweet some text containing the keyword being monitored, informing users
that we are monitoring it, with a link to a page that gives the required
information about the research and where users can log in with their
Twitter account to request their tweets collected as part of the process.
This was just a proof of concept, not engineered to scale to large data
collections (in particular, pseudonymization was not optimized and thus
slow), but it showed that this can be done "for real" with some limited
resources, and that it cannot really be done by individual researchers
for their own projects in general. In my opinion, incorporating such
capabilities into "mainstream" Twitter data collection tools would greatly
facilitate the adoption of best practices, which would then be as easy as
clicking a button while setting up a data collection process. The same is
probably true for other platforms, although we have only tried this on
Twitter.
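
To make the idea a bit more concrete, here is a minimal sketch in Python.
It is not the code of the proof of concept: the record layout ("tweet_id",
"user_id", "text"), the secret key and all function names are assumptions
made up for this example. It shows keyed-hash pseudonymization of user ids
and the removal of one user's tweets after a request.

```python
import hashlib
import hmac

# Hypothetical secret, stored separately from the archive. Only someone
# holding it can recompute the pseudonym that belongs to a user id.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (HMAC-SHA256) standing in for user_id."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_archive(tweets, key=SECRET_KEY):
    """Copy each record, replacing the raw user id with its pseudonym."""
    return [{**t, "user_id": pseudonymize(t["user_id"], key)} for t in tweets]


def remove_user(archive, user_id, key=SECRET_KEY):
    """Drop every record of a user who logged in and asked for removal."""
    pseudonym = pseudonymize(user_id, key)
    return [t for t in archive if t["user_id"] != pseudonym]


# Toy data, invented for the example.
raw = [
    {"tweet_id": "1", "user_id": "alice", "text": "hello"},
    {"tweet_id": "2", "user_id": "bob", "text": "hi"},
    {"tweet_id": "3", "user_id": "alice", "text": "bye"},
]
archive = pseudonymize_archive(raw)
archive = remove_user(archive, "alice")
print([t["tweet_id"] for t in archive])  # only bob's tweet is left
```

Note that this only replaces the account id. The tweet text can still
contain names or mentions, so a real tool would need to handle that too.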

Matteo

--
Matteo Magnani
Docent (Associate Professor) in Computing Science and Distinguished
university teacher
Coordinator, Master’s programme in Data Science
Director, Uppsala University InfoLab (http://infolab.it.uu.se)
Nordic network on online disinformation (https://nordis.research.it.uu.se)
Department of Information Technology, Uppsala University
TW: @matmagnani @uuinfolab FB: @uuinfolab

On Thu, Sep 10, 2020 at 7:50 PM Stuart Shulman <stuart.shulman at gmail.com>
wrote:

> There is a difference between not being able or willing and it not
> being possible.
>
> https://developer.twitter.com/en/docs/twitter-api/v1/tweets/compliance/overview
>
> It is definitely possible and also required.
> Many research labs in academia find ways to comply with the
> documentation above.
> More, I suspect either know and ignore it, or simply do not know what
> is expected.
> It is similar to the need to comply with the office of sponsored
> research protocols for human subjects.
> You have to minimize the risk of harm.
> No compromises for researcher convenience or productivity demands.
>
> ~SWS
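
One way to picture such a deletion check, as a rough sketch and not
something prescribed by the documentation linked above: periodically ask
the API which archived tweet IDs are still available, then drop the rest
from the local archive. The API call itself is left out. In the Python
below, `returned` stands for the IDs a lookup gave back, the batch size of
100 per request and the record layout are assumptions, and the helper
names are hypothetical.

```python
def batches(ids, size=100):
    """Split a list of tweet IDs into chunks (100 per request is assumed)."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def prune_deleted(archive, still_available):
    """Keep only archived tweets whose ID the lookup still returned."""
    available = set(still_available)
    kept = [t for t in archive if t["tweet_id"] in available]
    removed = [t["tweet_id"] for t in archive if t["tweet_id"] not in available]
    return kept, removed


# Toy archive of 250 tweets, invented for the example.
archive = [{"tweet_id": str(n), "text": f"tweet {n}"} for n in range(1, 251)]
id_batches = batches([t["tweet_id"] for t in archive])

# Pretend the lookup returned everything except tweets 7 and 200.
returned = [i for b in id_batches for i in b if i not in {"7", "200"}]

kept, removed = prune_deleted(archive, returned)
print(len(id_batches), len(kept), removed)
```

A real job would also have to tell a failed or rate-limited request apart
from a genuine deletion before removing anything.
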
>
> On Thu, Sep 10, 2020 at 1:33 PM Deen Freelon <dfreelon at gmail.com> wrote:
>
> > Sure, many countries have a right to be forgotten. The US doesn't, and
> > AFAIK there's little clear case law that applies to individuals'
> > presence in research datasets. If someone asks me to remove their data
> > from my datasets, I'm happy to do so, but I'm not willing to
> > prospectively monitor Twitter's platform for deletions so that my
> > datasets always match what is currently available on Twitter. That is
> > technically infeasible for me, and I suspect for many others as well.
> >
> > The practicality aspect I mentioned applies also to users. You can ask
> > AIR-L members to remove your data, but what assurances do you have that
> > they've done so? It's impossible even to check that they've actually
> > read your message. Now consider all the other researchers' datasets of
> > which your data may be a part--there's no way to even know who to ask.
> > And all of this to prevent your data from being one point among
> > millions, with no exposure of any identifying information? It's little
> > wonder yours is the first data removal request I've ever received, but
> > as I said, I'll honor it. /DEEN
> >
> > On 9/10/2020 10:49 AM, Stuart Shulman wrote:
> > > There is nothing theoretical about checking in real time for
> > > deletions. When you study a Tweet's content in the Twitter display, if
> > > a Tweet is deleted or an account suspended or deleted, the Tweet will
> > > not display. That is real time compliance. We have done it for many
> > > years now, all the while advising students and faculty on the
> > > ethical importance of this point.
> > >
> > > The "right to be forgotten" is law in many countries, so I am unsure
> > > how that is unresolved. Something is either legal or it is not. If
> > > anyone reading this has any of my deleted Tweets from my deleted
> > > account, the Canadian part of me requests you immediately delete them.
> > > If you lack the ability to check for compliance in real time, should
> > > you be handling my data and violating my right to be forgotten under
> > > the broad banner of research? I have tweeted extensively about acts by
> > > a hostile foreign power to game the imminent election. I have recently
> > > deleted personal Facebook, YouTube and Twitter accounts. Nobody has
> > > any business holding that data. It is unethical.
> > >
> > > There are various guidelines about legally sharing lists of Tweet IDs
> > > for rehydration and replication (something almost never done) versus
> > > sharing spreadsheets of complete data extracts or the raw JSON, which
> > > is done all the time in defiance of the Twitter ToS.
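
As an illustration of the ID-only sharing mentioned above, a short Python
sketch that reduces full tweet records to a list of tweet IDs that others
could rehydrate. It assumes one JSON object per line with an "id_str"
field, as in the standard v1.1 tweet object; the function name and the
sample records are made up.

```python
import json


def shareable_ids(json_lines):
    """Return de-duplicated tweet IDs, in first-seen order, from JSON lines."""
    ids = []
    for line in json_lines:
        if line.strip():  # skip empty lines
            ids.append(json.loads(line)["id_str"])
    return list(dict.fromkeys(ids))


# Invented sample records; text and user data never reach the output.
full_records = [
    '{"id_str": "101", "text": "first", "user": {"screen_name": "a"}}',
    '{"id_str": "102", "text": "second", "user": {"screen_name": "b"}}',
    '',
    '{"id_str": "101", "text": "first", "user": {"screen_name": "a"}}',
]
id_list = "\n".join(shareable_ids(full_records))
print(id_list)
```
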
> > >
> > > > On Thu, Sep 10, 2020 at 9:55 AM Deen Freelon <dfreelon at gmail.com>
> > > > wrote:
> > >
> > >     The streaming API is great--if 1) you're certain of all your search
> > >     criteria in advance, and 2) you have a well-calibrated
> > >     hardware/software setup dedicated to real-time data collection. In
> > >     practice, researchers often need to add keywords and other criteria
> > >     after the fact, or the significance of certain events/individuals
> > >     only becomes clear later, or the limitations of their data
> > >     collection setup do not become apparent until it fails, etc. The
> > >     obvious advantage of streaming is that it's free, but it is not
> > >     viable for a substantial subset of use cases. Sometimes the only way
> > >     to obtain high-quality historical Twitter data is to purchase it.
> > >
> > >     The ethical issues Stu mentions are as yet unresolved.
> > >     Theoretically, users should be able to demand removal of their data
> > >     from academic databases at any time, but this is a practical
> > >     impossibility. Most Twitter-based research would be impossible
> > >     without storage of the full text and metadata, and there are no
> > >     widely-used guidelines about how that data should be managed or
> > >     sunset aside from Twitter's prohibition on the sharing of complete
> > >     datasets. Certainly something we should continue to think about how
> > >     best to address... /DEEN
> > >
> > >     On 9/10/2020 9:39 AM, Stuart Shulman wrote:
> > >     > There is some scholarship on the various options which have
> > >     > continued to evolve over time with respect to Twitter.
> > >     > https://www.mdpi.com/1660-4601/17/3/864
> > >     >
> > >     > I agree that there is usually sufficient free, real time data to
> > >     > gather from Twitter to reach saturation on most real time current
> > >     > research questions.
> > >     >
> > >     > My personal experience with the cost and regulation of historical
> > >     > Twitter access for academia is a cautionary tale. Most of what
> > >     > academics want to study, and often the way they do it, violates the
> > >     > clear language of the Twitter Terms of Service and also the
> > >     > increasingly widespread right to be forgotten. If you are storing
> > >     > spreadsheets of Twitter data that includes over time more and more
> > >     > material from deleted accounts or deleted Tweets, this is
> > >     > problematic from a legal perspective and raises ethical review
> > >     > questions that should not be glossed over in any WikiLeaks fashion
> > >     > by journal editors or university ethics officers.
> > >     >
> > >     > ~Stu
> > >     > Dr. Stu Shulman
> > >     > U.S. Soccer Federation C-Licensed Coach
> > >     > Valeo FC & Capacidad <http://capacidadprograms.org/?page_id=13>
> > >     > Volunteer Coach
> > >     > *Is your player ready to give back to the game?* Contact Coach Stu
> > >     > about winter & spring 2020 volunteer efforts.
> > >     > Capacidad <http://capacidadprograms.org/?page_id=13>
> > >     >
> > >     > On Thu, Sep 10, 2020 at 9:24 AM Peter Joseph Gloviczki PhD
> > >     > <pgloviczki at coker.edu> wrote:
> > >     >
> > >     >     Deen makes a great point about sponsorship. I was referring to
> > >     >     having to use personal funds.
> > >     >
> > >     >     Fondly, Peter
> > >     >
> > >     >     Peter Joseph Gloviczki, PhD    he/him/his
> > >     >     Associate Professor of Communication
> > >     >     Coker University
> > >     >     300 East College Avenue
> > >     >     Hartsville, South Carolina 29550
> > >     >     843.383.8379
> > >     >     pgloviczki at coker.edu
> > >     >
> > >     >     Assistant Editor, Journal of Loss and Trauma (Taylor & Francis)
> > >     >     Immediate Past Head, Cultural and Critical Studies Division, AEJMC
> > >     >     1st Vice President, Carolinas Communication Association
> > >     >
> > >     >
> > >     >     On Thu, Sep 10, 2020 at 9:20 AM Deen Freelon
> > >     >     <dfreelon at gmail.com> wrote:
> > >     >
> > >     >     > I have purchased tweets directly from Twitter on multiple
> > >     >     > occasions. I disagree with Dr. Gloviczki about paying research
> > >     >     > costs--some of the most rigorous research is sponsored. I
> > >     >     > wouldn't spend my own personal money on such costs, but if
> > >     >     > you've got funding, by all means use it.
> > >     >     >
> > >     >     > Twitter allows university-affiliated users to buy data in a few
> > >     >     > ways. I've primarily used their a la carte service (that's just
> > >     >     > what I call it), where you give them a set of search criteria
> > >     >     > (e.g. a keyword[s] and a time period) and they give you a quote.
> > >     >     > Pricing is based on the number of days covered and the total
> > >     >     > volume of tweets. Their minimum price is a little over $1k US
> > >     >     > and costs can quickly run into the five-figure range,
> > >     >     > especially if you want tweets over a lengthy period of time.
> > >     >     > Also, they have been known to refuse certain data requests,
> > >     >     > especially those related to international conflict. The
> > >     >     > criteria for "acceptable" data requests are not public--I've
> > >     >     > asked.
> > >     >     >
> > >     >     > Twitter does not advertise this service but it does exist. Fill
> > >     >     > out this form and ask about it:
> > >     >     >
> > >     >     > https://developer.twitter.com/en/products/twitter-api/enterprise/application
> > >     >     >
> > >     >     > The associated metadata are the same as provided through the
> > >     >     > standard APIs. These can be found here:
> > >     >     >
> > >     >     > https://developer.twitter.com/en/docs/twitter-api/v1/tweets/post-and-engage/api-reference/get-statuses-lookup
> > >     >     >
> > >     >     > Language tags are included, and geographic info is present only
> > >     >     > when users opt in, which is rare (typically 3-5% of tweets). I
> > >     >     > will also say that obtaining the data once purchased is not
> > >     >     > easy--they come as GZIPped JSON files packaged in 10-minute
> > >     >     > increments. So a year of data is far too much to download
> > >     >     > manually--you'd need to automate your download pipeline. I've
> > >     >     > written code to do this, so anyone who manages to successfully
> > >     >     > buy Twitter data may feel free to contact me to access my
> > >     >     > scripts.
> > >     >     >
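
For readers wondering what handling such a delivery might involve, here
is a rough Python sketch of the reading step only. These are not Deen's
scripts: the file names, the one-JSON-object-per-line layout and the
function name are assumptions, and the demo files are generated locally
so the example runs on its own. The "id_str" and "lang" fields are those
of the standard v1.1 tweet object.

```python
import gzip
import json
import tempfile
from collections import Counter
from pathlib import Path


def read_gzipped_tweets(directory):
    """Yield tweets from every *.json.gz file in a directory, in name order."""
    for path in sorted(Path(directory).glob("*.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip empty lines
                    yield json.loads(line)


# Demo: write two small stand-in files, then read them back as one stream.
with tempfile.TemporaryDirectory() as tmp:
    demo = [
        ("0000.json.gz", [("1", "en"), ("2", "fr")]),
        ("0010.json.gz", [("3", "en")]),
    ]
    for name, rows in demo:
        with gzip.open(Path(tmp) / name, "wt", encoding="utf-8") as out:
            for tweet_id, lang in rows:
                out.write(json.dumps({"id_str": tweet_id, "lang": lang}) + "\n")
    tweets = list(read_gzipped_tweets(tmp))

lang_counts = Counter(t["lang"] for t in tweets)
print([t["id_str"] for t in tweets], dict(lang_counts))
```

Streaming the files one at a time, as the generator does, avoids holding
a whole year of data in memory at once.
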
> > >     >     > Twitter also offers a couple other data purchase options,
> > >     >     > including its Premium API
> > >     >     > (https://developer.twitter.com/en/products/twitter-api/premium-apis)
> > >     >     > and its Enterprise API
> > >     >     > (https://developer.twitter.com/en/products/twitter-api/enterprise#/).
> > >     >     > These charge pretty steep monthly fees and are oriented more
> > >     >     > toward corporate and other well-funded clients.
> > >     >     >
> > >     >     > Finally, here's their portal for academic researchers, which may
> > >     >     > have some relevant info:
> > >     >     >
> > >     >     > https://developer.twitter.com/en/solutions/academic-research/products-for-researchers
> > >     >     >
> > >     >     > Best, /DEEN
> > >     >     >
> > >     >     > On 9/10/2020 8:39 AM, Sandrine Roginsky wrote:
> > >     >     > > Hello everybody,
> > >     >     > >
> > >     >     > > Help needed. Does anyone have experience with buying tweets
> > >     >     > > from Twitter for research? We have a fairly specific query and
> > >     >     > > would like to know which information is given about the tweets
> > >     >     > > harvested through the query (e.g. is language or geographic
> > >     >     > > information given for tweets, even if it isn't part of the
> > >     >     > > query - so not a selection criterion)?
> > >     >     > >
> > >     >     > > Many thanks.
> > >     >     > >
> > >     >     > > Best wishes,
> > >     >     > > Sandrine
> > >     >     > >
> > >     >     > > Sandrine Roginsky
> > >     >     > > Associate Professor
> > >     >     > >
> > >     >     > > Faculty of Economic, Social and Political Sciences, and
> > >     >     > > Communication
> > >     >     > > Institute Language & Communication, PCOM / LASCO
> > >     >     > >
> > >     >     > > _______________________________________________
> > >     >     > > The Air-L at listserv.aoir.org mailing list
> > >     >     > > is provided by the Association of Internet Researchers
> > >     >     > > http://aoir.org
> > >     >     > > Subscribe, change options or unsubscribe at:
> > >     >     > > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> > >     >     > >
> > >     >     > > Join the Association of Internet Researchers:
> > >     >     > > http://www.aoir.org/
> > >     >     >
> > >     >     > --
> > >     >     > Deen Freelon, Ph.D.
> > >     >     > Associate Professor -> Hussman School of Journalism and Media
> > >     >     > Principal Researcher -> Center for Information, Technology, and
> > >     >     > Public Life
> > >     >     > University of North Carolina at Chapel Hill
> > >     >     > http://dfreelon.org | @dfreelon <https://twitter.com/dfreelon> |
> > >     >     > https://github.com/dfreelon | https://citap.unc.edu/
> > >     >     > Schedule an appointment with me
> > >     >     > <https://doodle.com/mm/deenfreelon/book-a-time>


