[Air-L] TayTweets Dataset?

Heinz, Lisa ls144009 at ohio.edu
Sun Apr 17 04:30:59 PDT 2016


I have an update! Tully Hansen, a writer and "bot overlord" from Australia, sent me the following link to screen dumps of the devolution of Tay. It is probably the most comprehensive collection yet.

I'll keep looking and post what I find. The dataset through Sifter would cost about $100 and might be worth it, though without Tay's own tweets it isn't as valuable to me. I'll check with our new system once it's up and running over the summer. Thank you, everyone, for your help!

http://m.imgur.com/a/Zfwsz

~~Lisa

~~~~~~~~~~~~~~
Lisa M. Heinz
PhD Student
E.W. Scripps School of Journalism
Ohio University, Scripps College of Communication

@LivingRural on Twitter

Sent from my iPhone

On Apr 16, 2016, at 10:07 PM, Shulman, Stu <stu at texifter.com> wrote:

Absolutely not. Deleted and private Tweets are not accessible via Sifter.

Sifter is a Twitter-compliant tool and not strictly speaking a scraper.

It is a Twitter-approved app for generating free estimates. It is used primarily by academics who need 1,000-1,000,000 Tweets on a tight budget.

~Stu

On Saturday, April 16, 2016, Heinz, Lisa <ls144009 at ohio.edu> wrote:
Thanks for trying, Stu! I wonder, though, does the scraper you used have the ability to access data behind a privacy lock?


On Apr 16, 2016, at 9:32 PM, Shulman, Stu <stu at texifter.com> wrote:

Update

from:TayandYou March 23-24:  0 Tweets available

TayandYou March 23-24 (not from @TayandYou, but which mention TayandYou): ~122,000 Tweets.

I suppose the folks at Microsoft decided to delete everything on that somewhat misguided experiment...

On Sat, Apr 16, 2016 at 8:39 PM, Heinz, Lisa <ls144009 at ohio.edu> wrote:
Cory, you bring up some very good points I also thought about as I've searched for the dataset.


Screenshots (and news articles) are one source, but they would contain the most inflammatory of the tweets and still represent only a fraction of a fraction of the entire dataset of more than 90,000 tweets from Tay itself. Also, this figure does not include the mentions that are critical to an analysis of the devolution of this bot, which may at least double that number, considering that many (all?) of Tay's tweets were responses to questions and comments.


Last summer, Twitter finished archiving tweets all the way back to its launch, making them accessible to researchers and enterprise customers beginning last fall (go here for a short list of providers: http://www.pcmag.com/article2/0,2817,2489442,00.asp). We don't have access at our SMART lab yet, but if the tweets exist, they are hidden behind a now-private account, so the usual scrapers won't find them. Unless someone out there has a few tricks up their sleeve...




~~Lisa


~~~~~~~~~~~~~~~~~~~
Lisa Heinz
PhD Student in Mass Communication
E.W. Scripps School of Journalism
Ohio University, Scripps College of Communication
Twitter<http://twitter.com/livingrural>
LinkedIn<http://linkedin.com/in/lisaheinz>

________________________________
From: Cory Salveson <corysalveson at gmail.com>
Sent: Saturday, April 16, 2016 4:50:50 PM
To: Kishonna Gray
Cc: Heinz, Lisa; air-l at listserv.aoir.org
Subject: Re: [Air-L] TayTweets Dataset?

It's not much, but Archive.org<http://archive.org> has multiple snapshots of tweets here<https://web.archive.org/web/20160324022856*/https://mobile.twitter.com/TayandYou> starting March 24. It lacks the extended metadata available via the API, but it's better than nothing.

In case someone out there did do a more complete capture, I put out a request on the "datasets" subreddit here<https://www.reddit.com/r/datasets/comments/4f3hmx/request_taytweets_tayandyou_archive/>, and there's also a Quora question (from somebody else) here<https://www.quora.com/Have-TayTweetss-tweets-been-archived>. If anything surfaces, maybe one or both of these spots will get updated.

If nothing else, the unavailability of these tweets is itself interesting. Others can advance a more nuanced analysis than I can here, but I wonder, for example: on what basis are we as researchers "blocked" from accessing the tweets? I assume Twitter still technically has them in its databases somewhere, so isn't this essentially Twitter respecting Microsoft's -- maybe even Tay's -- right to privacy under the Twitter TOS, like any other user's? If so, then are all Twitter users really created equal with respect to privacy, or should some actors, as is the case for public figures generally, enjoy less? In other words, should the public be allowed to request these tweets directly from Twitter? Etc.

Cory Salveson
http://corysalveson.com

On Fri, Apr 15, 2016 at 3:47 PM, Kishonna Gray <kishonnagray at gmail.com> wrote:
it would have been great to capture them. i have archived some from the
associated hashtags. so if anyone did, that would be amazing. "meltdown" is
an understatement.

On Fri, Apr 15, 2016 at 2:34 PM, Heinz, Lisa <ls144009 at ohio.edu> wrote:

> Hello... I wondered if someone on this list was able to capture all of
> Microsoft's Tay bot's tweets and mentions from the time it went live until
> it was shut down, March 23-24? I did not get my collector set up in time, so
> I am looking for someone who collected Tay's original tweets and mentions,
> and who would be willing to share them.  I am primarily interested in the
> avatar's meltdown as an historical marker in the adoption of this
> technology.
>
>
> ~~Lisa
>
>
> ~~~~~~~~~~~~~~~~~~~
> Lisa Heinz
> PhD Student in Mass Communication
> E.W. Scripps School of Journalism
> Ohio University, Scripps College of Communication
> Twitter<http://twitter.com/livingrural>
> LinkedIn<http://linkedin.com/in/lisaheinz>
>
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>
> Join the Association of Internet Researchers:
> http://www.aoir.org/

--
Dr. Stuart W. Shulman
Founder and CEO, Texifter
LinkedIn: http://www.linkedin.com/in/stuartwshulman





