[Air-L] [External] Re: Buying tweets ?

Stuart Shulman stuart.shulman at gmail.com
Tue Sep 15 04:53:44 PDT 2020


Matteo,

Definitely worth the effort! You may be hitting rate limits trying to check
batches, which would not technically make you a bot, but it would put you
in the penalty box with Twitter for asking too much of the API all at
once, even in the name of compliance. The manageable method is to review
individual Tweets using the Twitter display. In doing so, when a researcher
loads a Tweet stored in the database, the researcher pulls the content in
real time, producing many advantages, not just compliance. For compliance,
if the account is deleted or suspended, the content will not display and a
message tells the researcher what the cause is. This method avoids any rate
limits associated with large batch compliance checking by just going one at
a time. You also avoid the pitfall of deletions and suspensions that happen
moments, minutes, days, or weeks after your most recent batch check. Even
if you master the method of bulk compliance checking, the results do not
age well.
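
For anyone who still wants to try the batch route, here is a minimal sketch of the bookkeeping involved. It is not our tooling and not Twitter's API. The `lookup` callable is a hypothetical stand-in for whatever request you make to ask which IDs are still available, and the default batch size of 100 is an assumption chosen to mirror the per-request cap on the statuses/lookup endpoint. The sketch only shows how to chunk stored IDs and collect the ones that no longer come back.

```python
from typing import Callable, Iterable, Iterator, List, Set


def batches(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def find_unavailable(
    stored_ids: Iterable[str],
    lookup: Callable[[List[str]], Set[str]],
    batch_size: int = 100,
) -> Set[str]:
    """Return the stored IDs that `lookup` no longer reports as available.

    `lookup` is a hypothetical, caller-supplied function: it receives one
    batch of Tweet IDs and returns the subset that is still live.
    """
    ids = list(dict.fromkeys(stored_ids))  # de-duplicate, keep first-seen order
    missing: Set[str] = set()
    for batch in batches(ids, batch_size):
        still_live = lookup(batch)
        missing.update(i for i in batch if i not in still_live)
    return missing
```

Whatever comes back is only a snapshot of the moment the check ran. Rate limiting, retries, and scheduling are all left to the caller, which is where most of the difficulty sits.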

I do like the ethical principle of getting folks to opt in. However, most
probably won't, so that takes largely unrepresentative Twitter data and
further skews it toward just those users willing to opt in, a very select
group.

Spreadsheets of Tweets are derivative, not authentic, Twitter data. Tweets
are much more than flat text and metadata. If the Tweet is still live on
Twitter, the many advantages of using the Twitter display beyond compliance
checking include:

- Media previews are displayed,
- Images are displayed,
- User images are displayed,
- Emoji are always displayed correctly in full color,
- Replies are seen in context attached to the original Tweet,
- Retweet and like counts show real-time values.

There are other advantages. If you have never looked at Twitter datasets
for research using the Twitter display, arguably you have never looked at
actual Twitter data for research. Instead, you have been studying a
degraded convenience sample that is a by-product. For many reasons and a
variety of methodologies, it is important to preserve the authentic digital
artifact including the display that ensures both contextual validity for
substantive inferences and ethical standards for the handling of data
that no researcher can own or legally store forever. While there are many
uses for extracted features of Twitter data (e.g., GraphML files for network
graphs), the practice of storing Tweets and their metadata in spreadsheets
forever remains problematic.
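
One way to keep less on disk, in the spirit of the Tweet ID lists for rehydration mentioned earlier in this thread, is to reduce a spreadsheet export to its IDs and load the content live when it is needed. What follows is a minimal sketch under stated assumptions: the function name `dehydrate` and the column name `id` are hypothetical, and your export may label the column differently.

```python
import csv
import io
from typing import List


def dehydrate(csv_text: str, id_column: str = "id") -> List[str]:
    """Return the Tweet IDs from a CSV export, dropping text and metadata.

    IDs are kept as strings because Tweet IDs are 64-bit values that some
    tools round when they are parsed as floating-point numbers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    ids: List[str] = []
    for row in reader:
        tweet_id = (row.get(id_column) or "").strip()
        if tweet_id and tweet_id not in seen:  # skip blanks and duplicates
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids
```

The output is a plain list of IDs in first-seen order. It carries no Tweet text, so there is nothing in it for a later deletion to invalidate.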

Perhaps the most problematic idea in this thread is the notion that all
users of Twitter are responsible for notifying all researchers when they
delete a Tweet or an account. I cannot conceive how that might work. How
would I know which researchers hold data that I created and have the legal
right to delete? The obligation in fact falls on the ethical researcher to
put compliance methods in place, not the individual Twitter user creating
and deleting content, in many cases on a daily basis.

~Stu

Dr. Stuart Shulman
U.S. Soccer Federation C-Licensed Coach
Northampton High School Boys Varsity Coach



On Tue, Sep 15, 2020 at 3:43 AM Matteo Magnani <mat.magnani at gmail.com>
wrote:

> At my lab we made some experiments to automate or semi-automate user
> notification and tweet removal from our archives, while studying how the
> GDPR influences social network analysis research (paper here:
> https://dl.acm.org/doi/fullHtml/10.1145/3365524). Long story short: we
> encountered several difficulties, from Twitter blocking our notifications
> (because we qualified as evil bots :) ) to the potential burden on
> individual researchers having to take additional actions to increase data
> protection. But some (maybe sub-optimal) solutions are definitely feasible.
> For example we implemented an extension of an existing Twitter data
> collection tool that can be set up to regularly tweet some text with the
> same keyword that is being monitored informing users that we are monitoring
> it, with a link to a page with the required information about the research
> and where users can login with their Twitter account to request their
> tweets collected as part of the process. This was just a proof-of-concept,
> not engineered to scale to large data collections (in particular,
> pseudonymization was not optimized and thus slow), but it showed that this
> can be done "for real" with some limited resources, and that it cannot
> really be done by individual researchers for their own projects in general.
> In my opinion, incorporating such capabilities in "mainstream" Twitter data
> collection tools would greatly facilitate the adoption of best
> practices, that would then be as easy as clicking on a button while setting
> up data collection processes. The same is probably true for other
> platforms, although we have only tried this on Twitter.
>
> Matteo
>
> --
> Matteo Magnani
> Docent (Associate Professor) in Computing Science and Distinguished
> university teacher
> Coordinator, Master’s programme in Data Science
> Director, Uppsala University InfoLab (http://infolab.it.uu.se)
> Nordic network on online disinformation (https://nordis.research.it.uu.se)
> Department of Information Technology, Uppsala University
> TW: @matmagnani @uuinfolab FB: @uuinfolab
>
> On Thu, Sep 10, 2020 at 7:50 PM Stuart Shulman <stuart.shulman at gmail.com>
> wrote:
>
>> There is a difference between not being able or willing and it not
>> being possible.
>>
>> https://developer.twitter.com/en/docs/twitter-api/v1/tweets/compliance/overview
>>
>> It is definitely possible and also required.
>> Many research labs in academia find ways to comply with the
>> documentation above.
>> More, I suspect either know and ignore it, or simply do not know what
>> is expected.
>> It is similar to the need to comply with the office of sponsored
>> research protocols for human subjects.
>> You have to minimize the risk of harm.
>> No compromises for researcher convenience or productivity demands.
>>
>> ~SWS
>>
>> On Thu, Sep 10, 2020 at 1:33 PM Deen Freelon <dfreelon at gmail.com> wrote:
>>
>> > Sure, many countries have a right to be forgotten. The US doesn't, and
>> > AFAIK there's little clear case law that applies to individuals'
>> > presence in research datasets. If someone asks me to remove their data
>> > from my datasets, I'm happy to do so, but I'm not willing to
>> > prospectively monitor Twitter's platform for deletions so that my
>> > datasets always match what is currently available on Twitter. That is
>> > technically infeasible for me, and I suspect for many others as well.
>> >
>> > The practicality aspect I mentioned applies also to users. You can ask
>> > AIR-L members to remove your data, but what assurances do you have that
>> > they've done so? It's impossible even to check that they've actually
>> > read your message. Now consider all the other researchers' datasets of
>> > which your data may be a part--there's no way to even know who to ask.
>> > And all of this to prevent your data from being one point among
>> > millions, with no exposure of any identifying information? It's little
>> > wonder yours is the first data removal request I've ever received, but
>> > as I said, I'll honor it. /DEEN
>> >
>> > On 9/10/2020 10:49 AM, Stuart Shulman wrote:
>> > > There is nothing theoretical about checking in real time for
>> > > deletions. When you study a Tweet's content in the Twitter display, if
>> > > a Tweet is deleted or an account suspended or deleted, the Tweet will
>> > > not display. That is real time compliance. We have done it for many
>> > > years now, all the while advising students and faculty on the
>> > > ethical importance of this point.
>> > >
>> > > The "right to be forgotten" is law in many countries, so I am unsure
>> > > how that is unresolved. Something is either legal or it is not. If
>> > > anyone reading this has any of my deleted Tweets from my deleted
>> > > account, the Canadian part of me requests you immediately delete them.
>> > > If you lack the ability to check for compliance in real time, should
>> > > you be handling my data and violating my right to be forgotten under
>> > > the broad banner of research? I have tweeted extensively about acts by
>> > > a hostile foreign power to game the imminent election. I have recently
>> > > deleted personal Facebook, YouTube and Twitter accounts. Nobody has
>> > > any business holding that data. It is unethical.
>> > >
>> > > There are various guidelines about legally sharing lists of Tweet IDs
>> > > for rehydration and replication (something almost never done) versus
>> > > sharing spreadsheets of complete data extracts or the raw JSON, which
>> > > is done all the time in defiance of the Twitter ToS.
>> > >
>> > > On Thu, Sep 10, 2020 at 9:55 AM Deen Freelon <dfreelon at gmail.com> wrote:
>> > >
>> > >     The streaming API is great--if 1) you're certain of all your search
>> > >     criteria in advance, and 2) you have a well-calibrated
>> > >     hardware/software setup dedicated to real-time data collection. In
>> > >     practice, researchers often need to add keywords and other criteria
>> > >     after the fact, or the significance of certain events/individuals
>> > >     only becomes clear later, or the limitations of their data
>> > >     collection setup do not become apparent until it fails, etc. The
>> > >     obvious advantage of streaming is that it's free, but it is not
>> > >     viable for a substantial subset of use cases. Sometimes the only way
>> > >     to obtain high-quality historical Twitter data is to purchase it.
>> > >
>> > >     The ethical issues Stu mentions are as yet unresolved. Theoretically,
>> > >     users should be able to demand removal of their data from academic
>> > >     databases at any time, but this is a practical impossibility. Most
>> > >     Twitter-based research would be impossible without storage of the
>> > >     full text and metadata, and there are no widely-used guidelines about
>> > >     how that data should be managed or sunset aside from Twitter's
>> > >     prohibition on the sharing of complete datasets. Certainly something
>> > >     we should continue to think about how best to address... /DEEN
>> > >
>> > >     On 9/10/2020 9:39 AM, Stuart Shulman wrote:
>> > >     > There is some scholarship on the various options which have
>> > >     > continued to evolve over time with respect to Twitter.
>> > >     > https://www.mdpi.com/1660-4601/17/3/864
>> > >     >
>> > >     > I agree that there is usually sufficient free, real time data to
>> > >     > gather from Twitter to reach saturation on most real time current
>> > >     > research questions.
>> > >     >
>> > >     > My personal experience with the cost and regulation of historical
>> > >     > Twitter access for academia is a cautionary tale. Most of what
>> > >     > academics want to study, and often the way they do it, violates the
>> > >     > clear language of the Twitter Terms of Service and also the
>> > >     > increasingly widespread right to be forgotten. If you are storing
>> > >     > spreadsheets of Twitter data that includes over time more and more
>> > >     > material from deleted accounts or deleted Tweets, this is
>> > >     > problematic from a legal perspective and raises ethical review
>> > >     > questions that should not be glossed over in any wikileaks fashion
>> > >     > by journal editors or university ethics officers.
>> > >     >
>> > >     > ~Stu
>> > >     > Dr. Stu Shulman
>> > >     > U.S. Soccer Federation C-Licensed Coach
>> > >     > Valeo FC & Capacidad <http://capacidadprograms.org/?page_id=13>
>> > >     > Volunteer Coach
>> > >     > Is your player ready to give back to the game? Contact Coach Stu
>> > >     > about winter & spring 2020 volunteer efforts.
>> > >     > Capacidad <http://capacidadprograms.org/?page_id=13>
>> > >     >
>> > >     >
>> > >     >
>> > >     >
>> > >     >
>> > >     > On Thu, Sep 10, 2020 at 9:24 AM Peter Joseph Gloviczki PhD
>> > >     > <pgloviczki at coker.edu> wrote:
>> > >     >
>> > >     >     Deen makes a great point about sponsorship. I was referring to
>> > >     >     having to use personal funds.
>> > >     >
>> > >     >     Fondly, Peter
>> > >     >
>> > >     >     Peter Joseph Gloviczki, PhD    he/him/his
>> > >     >     Associate Professor of Communication
>> > >     >     Coker University
>> > >     >     300 East College Avenue
>> > >     >     Hartsville, South Carolina 29550
>> > >     >     843.383.8379
>> > >     >     pgloviczki at coker.edu
>> > >     >
>> > >     >     Assistant Editor, Journal of Loss and Trauma (Taylor & Francis)
>> > >     >     Immediate Past Head, Cultural and Critical Studies Division, AEJMC
>> > >     >     1st Vice President, Carolinas Communication Association
>> > >     >
>> > >     >
>> > >     >     On Thu, Sep 10, 2020 at 9:20 AM Deen Freelon
>> > >     >     <dfreelon at gmail.com> wrote:
>> > >     >
>> > >     >     > I have purchased tweets directly from Twitter on multiple
>> > >     >     > occasions. I disagree with Dr. Gloviczki about paying research
>> > >     >     > costs--some of the most rigorous research is sponsored. I
>> > >     >     > wouldn't spend my own personal money on such costs, but if
>> > >     >     > you've got funding, by all means use it.
>> > >     >     >
>> > >     >     > Twitter allows university-affiliated users to buy data in a few
>> > >     >     > ways. I've primarily used their a la carte service (that's just
>> > >     >     > what I call it), where you give them a set of search criteria
>> > >     >     > (e.g. a keyword[s] and a time period) and they give you a quote.
>> > >     >     > Pricing is based on the number of days covered and the total
>> > >     >     > volume of tweets. Their minimum price is a little over $1k US
>> > >     >     > and costs can quickly run into the five-figure range,
>> > >     >     > especially if you want tweets over a lengthy period of time.
>> > >     >     > Also, they have been known to refuse certain data requests,
>> > >     >     > especially those related to international conflict. The
>> > >     >     > criteria for "acceptable" data requests are not public--I've
>> > >     >     > asked.
>> > >     >     >
>> > >     >     > Twitter does not advertise this service but it does exist. Fill
>> > >     >     > out this form and ask about it:
>> > >     >     >
>> > >     >     > https://developer.twitter.com/en/products/twitter-api/enterprise/application
>> > >     >     >
>> > >     >     > The associated metadata are the same as provided through the
>> > >     >     > standard APIs. These can be found here:
>> > >     >     >
>> > >     >     > https://developer.twitter.com/en/docs/twitter-api/v1/tweets/post-and-engage/api-reference/get-statuses-lookup
>> > >     >     >
>> > >     >     > Language tags are included, and geographic info is present only
>> > >     >     > when users opt in, which is rare (typically 3-5% of tweets). I
>> > >     >     > will also say that obtaining the data once purchased is not
>> > >     >     > easy--they come as GZIPped JSON files packaged in 10-minute
>> > >     >     > increments. So a year of data is far too much to download
>> > >     >     > manually--you'd need to automate your download pipeline. I've
>> > >     >     > written code to do this, so anyone who manages to successfully
>> > >     >     > buy Twitter data may feel free to contact me to access my
>> > >     >     > scripts.
>> > >     >     >
>> > >     >     > Twitter also offers a couple other data purchase options,
>> > >     >     > including its Premium API
>> > >     >     > (https://developer.twitter.com/en/products/twitter-api/premium-apis)
>> > >     >     > and its Enterprise API
>> > >     >     > (https://developer.twitter.com/en/products/twitter-api/enterprise#/).
>> > >     >     > These charge pretty steep monthly fees and are oriented more
>> > >     >     > toward corporate and other well-funded clients.
>> > >     >     >
>> > >     >     > Finally, here's their portal for academic researchers, which
>> > >     >     > may have some relevant info:
>> > >     >     >
>> > >     >     > https://developer.twitter.com/en/solutions/academic-research/products-for-researchers
>> > >     >     >
>> > >     >     > Best, /DEEN
>> > >     >     >
>> > >     >     > On 9/10/2020 8:39 AM, Sandrine Roginsky wrote:
>> > >     >     > > Hello everybody,
>> > >     >     > >
>> > >     >     > > Help needed. Does anyone have experience with buying tweets
>> > >     >     > > from Twitter for research? We have a fairly specific query and
>> > >     >     > > would like to know which information is given about the tweets
>> > >     >     > > harvested through the query (e.g. is language or geographic
>> > >     >     > > information given for tweets, even if it isn't part of the
>> > >     >     > > query - so not a selection criterion)?
>> > >     >     > >
>> > >     >     > > Many thanks.
>> > >     >     > >
>> > >     >     > > Best wishes,
>> > >     >     > > Sandrine
>> > >     >     > >
>> > >     >     > >
>> > >     >     > >
>> > >     >     > > Sandrine Roginsky
>> > >     >     > > Associate Professor
>> > >     >     > >
>> > >     >     > > Faculty of Economic, Social and Political Sciences, and Communication
>> > >     >     > > Institute Language & Communication, PCOM / LASCO
>> > >     >     > >
>> > >     >     > >
>> > >     >     > >
>> > >     >     > >
>> > >     >     > > _______________________________________________
>> > >     >     > > The Air-L at listserv.aoir.org mailing list
>> > >     >     > > is provided by the Association of Internet Researchers http://aoir.org
>> > >     >     > > Subscribe, change options or unsubscribe at:
>> > >     >     > > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>> > >     >     > >
>> > >     >     > > Join the Association of Internet Researchers:
>> > >     >     > > http://www.aoir.org/
>> > >     >     >
>> > >     >     > --
>> > >     >     > Deen Freelon, Ph.D.
>> > >     >     > Associate Professor -> Hussman School of Journalism and Media
>> > >     >     > Principal Researcher -> Center for Information, Technology, and
>> > >     >     > Public Life
>> > >     >     > University of North Carolina at Chapel Hill
>> > >     >     > http://dfreelon.org | @dfreelon <https://twitter.com/dfreelon> |
>> > >     >     > https://github.com/dfreelon | https://citap.unc.edu/
>> > >     >     > Schedule an appointment with me
>> > >     >     > <https://doodle.com/mm/deenfreelon/book-a-time>
>> > >     >     >
>> > >     >
>> > >
>> > >     --
>> > >     Deen Freelon, Ph.D.
>> > >     Associate Professor -> Hussman School of Journalism and Media
>> > >     Principal Researcher -> Center for Information, Technology, and
>> > >     Public Life
>> > >     University of North Carolina at Chapel Hill
>> > >     http://dfreelon.org | @dfreelon <https://twitter.com/dfreelon> |
>> > >     https://github.com/dfreelon | https://citap.unc.edu/
>> > >     Schedule an appointment with me
>> > >     <https://doodle.com/mm/deenfreelon/book-a-time>
>> > >
>> >
>> > --
>> > Deen Freelon, Ph.D.
>> > Associate Professor -> Hussman School of Journalism and Media
>> > Principal Researcher -> Center for Information, Technology, and Public
>> > Life
>> > University of North Carolina at Chapel Hill
>> > http://dfreelon.org | @dfreelon <https://twitter.com/dfreelon> |
>> > https://github.com/dfreelon | https://citap.unc.edu/
>> > Schedule an appointment with me
>> > <https://doodle.com/mm/deenfreelon/book-a-time>
>>
>


