[Air-L] TayTweets Dataset?

Heinz, Lisa ls144009 at ohio.edu
Sat Apr 16 17:49:09 PDT 2016


@TayandYou. Watch the spelling; there is at least one fake account already...

~~~~~~~~~~~~~~
Lisa M. Heinz
PhD Student
E.W. Scripps School of Journalism
Ohio University, Scripps College of Communication

@LivingRural on Twitter

Sent from my iPhone

On Apr 16, 2016, at 8:42 PM, Shulman, Stu <stu at texifter.com> wrote:

Unless they were all deleted, you can access them using:

http://sifter.texifter.com

What was the @ handle of Tay? I can share the estimate with the list, perhaps more...

On Sat, Apr 16, 2016 at 8:39 PM, Heinz, Lisa <ls144009 at ohio.edu> wrote:
Cory, you bring up some very good points I also thought about as I've searched for the dataset.


Screenshots (and news articles) are one source, but they would contain the most inflammatory of the tweets and still represent only a fraction of a fraction of the entire dataset of more than 90,000 tweets from Tay itself. Also, this figure does not include the mentions that are critical to an analysis of the devolution of this bot, which may at least double that number, considering that many (all?) of Tay's tweets were responses to questions and comments.


Last summer, Twitter finished archiving tweets all the way back to its launch, and it began making them accessible to researchers and enterprise customers last fall (for a short list of providers, see http://www.pcmag.com/article2/0,2817,2489442,00.asp). We don't have access at our SMART lab yet, but if the tweets exist, they are hiding behind a now-private account, so the usual scrapers won't find them. Unless someone out there has a few tricks up their sleeve...




~~Lisa


~~~~~~~~~~~~~~~~~~~
Lisa Heinz
PhD Student in Mass Communication
E.W. Scripps School of Journalism
Ohio University, Scripps College of Communication
Twitter: http://twitter.com/livingrural
LinkedIn: http://linkedin.com/in/lisaheinz

________________________________
From: Cory Salveson <corysalveson at gmail.com>
Sent: Saturday, April 16, 2016 4:50:50 PM
To: Kishonna Gray
Cc: Heinz, Lisa; air-l at listserv.aoir.org
Subject: Re: [Air-L] TayTweets Dataset?

It's not much, but Archive.org has multiple snapshots of tweets, starting March 24, here: https://web.archive.org/web/20160324022856*/https://mobile.twitter.com/TayandYou. It lacks the extended metadata available via the API, but it's better than nothing.
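
For anyone who wants to enumerate those snapshots rather than click through the calendar view, here is a minimal sketch using the Wayback Machine's CDX query endpoint. The function names and the default date range are assumptions made for illustration, not anything from this thread. The code only builds the query URL and parses a JSON response you have already fetched; it makes no network calls itself, and it recovers only archived page URLs, not API-style tweet metadata.

```python
import json
from urllib.parse import urlencode

# Wayback Machine CDX query endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url, start="20160323", end="20160331"):
    """Build a CDX query URL listing captures of page_url.

    The default date range is an assumption chosen to bracket the
    period Tay was live; adjust it as needed.
    """
    params = {
        "url": page_url,
        "output": "json",
        "from": start,
        "to": end,
        "filter": "statuscode:200",  # keep only successful captures
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(cdx_json_text):
    """Turn a CDX JSON response into a list of snapshot URLs.

    With output=json, the first row is a header naming the columns
    and each following row is one capture.
    """
    rows = json.loads(cdx_json_text)
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    ts = header.index("timestamp")
    orig = header.index("original")
    return [
        "https://web.archive.org/web/{}/{}".format(r[ts], r[orig])
        for r in records
    ]
```

Fetching the URL returned by build_cdx_query("mobile.twitter.com/TayandYou") with any HTTP client and passing the response body to snapshot_urls should give one link per archived capture, which can then be downloaded and parsed separately.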

In case someone out there did do a more complete capture, I put out a request on the "datasets" subreddit (https://www.reddit.com/r/datasets/comments/4f3hmx/request_taytweets_tayandyou_archive/), and there's also a Quora question from somebody else (https://www.quora.com/Have-TayTweetss-tweets-been-archived). If anything surfaces, maybe one or both of these spots will get updated.

If nothing else, the unavailability of these tweets is itself interesting. Others can advance a more nuanced analysis than I can here, but, for example, I wonder: on what basis are we as researchers "blocked" from accessing the tweets? I assume Twitter still technically has them in its databases somewhere, so isn't this essentially Twitter respecting Microsoft's -- maybe even Tay's -- right to privacy under the Twitter TOS, like any other user's? If so, then are all Twitter users really created equal with respect to privacy, or should some actors, as is the case for public figures generally, enjoy less? In other words, should the public be allowed to request these tweets directly from Twitter? Etc.

Cory Salveson
http://corysalveson.com

On Fri, Apr 15, 2016 at 3:47 PM, Kishonna Gray <kishonnagray at gmail.com> wrote:
It would have been great to capture them. I have archived some from the
associated hashtags, so if anyone did capture them all, that would be
amazing. "Meltdown" is an understatement.

On Fri, Apr 15, 2016 at 2:34 PM, Heinz, Lisa <ls144009 at ohio.edu> wrote:

> Hello... I wondered if someone on this list was able to capture all of
> Microsoft's Tay bot's tweets and mentions from the time it went live until
> it was shut down, March 23-24? I did not get my collector set up in time, so
> I am looking for someone who collected Tay's original tweets and mentions,
> and who would be willing to share them.  I am primarily interested in the
> avatar's meltdown as an historical marker in the adoption of this
> technology.
>
>
> ~~Lisa
>
>
> ~~~~~~~~~~~~~~~~~~~
> Lisa Heinz
> PhD Student in Mass Communication
> E.W. Scripps School of Journalism
> Ohio University, Scripps College of Communication
> Twitter: http://twitter.com/livingrural
> LinkedIn: http://linkedin.com/in/lisaheinz
>
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/


--
Dr. Stuart W. Shulman
Founder and CEO, Texifter
LinkedIn: http://www.linkedin.com/in/stuartwshulman


