[Air-L] Text/Data Mining Software Suggestions: for YouTube, Facebook & Instagram?

Tue Nov 10 04:43:45 PST 2020

I feel that "if the social relevance of the topic warrants it" is not an
argument university human subjects review panels can use to defend
techniques that break laws and violate privacy. Although I try my best to
support students and faculty doing socially relevant studies, I draw a red
line when it comes to knowingly violating platform rules. I have watched
this debate evolve for a decade and have sought carefully to engineer
compliance into our tools. Meanwhile, concerned academics are writing papers
about the API lockdowns as if their own disregard for laws and norms (e.g.,
the right to be forgotten) were not one of the root causes of the lockdowns.
It is not the sole cause; strategic financial drivers and various legal
landmines also impel the lockdowns.

However, if you are one of the many thousands of academics sitting on
spreadsheets of social data that you never checked for deletions before
doing analysis, and you have no plans to delete all of that data at some
point, I would suggest you are failing a basic test for ethical and legal
research practices. While some well-funded research groups are working
hard, for example, to use the compliance stream with stored Twitter data,
the vast majority appear to treat stored social data as if it were exactly
the same as a stamp collection or a scrapbook of newspaper clippings you
might store forever and claim complete ownership over. It is not. When
research is pursued without the same level of ethical and legal compliance
routinely required for interviews and focus groups (e.g., we de-identify
data, destroy the recordings, and delete the transcripts after the research
is complete), the anarchic world of scraping, storing, and mining the
personal data of millions of people mimics the very things we like least
about the platforms. Whilst we may deem our own research to be warranted
irrespective of any or all laws and norms, anyone from any perspective could
use that argument to study anything using any method, no matter how
invasive, insensitive, or harmful to the research subjects.

After a decade on Twitter as @stuartwshulman, I recently deleted my
account. Many of you reading this post may have stored some Tweets I wrote
about politics, sports, and growing garlic, as well as about my family,
dogs, and close friends. Please be advised that I request you delete 100% of
my Tweets from your databases. If you can comply and actually do so, good
work, as you are well ahead of the curve. If you cannot or will not, you
should report to the research compliance office at your university
(especially in Europe) and explain why you cannot find and delete that data
for me and for every other former Twitter user like me who may not have
thought of posting to a listserv to flag their own right to be forgotten.

Dr. Stuart Shulman
U.S. Soccer Federation C-Licensed Coach
Northampton High School Boys Varsity Coach

On Tue, Nov 10, 2020 at 6:44 AM Bernhard Rieder <berno.rieder at gmail.com>
wrote:

> Dear colleagues,
>
> I would like to disagree with Brooke here. Facebook data can still be
> collected through API access rather than scraping, most importantly with
> the awesome Facepager.
>
> For Instagram, scraping is indeed the go-to technique (instaloader works
> very well) and I would like to defend the idea that ToS should not hinder
> researchers if the social relevance of the topic warrants it. Adhering to
> corporate policy is not the gold standard for what independent research
> should strive for, in my view. Proposing topics to people at Facebook may
> be a strategy for certain topics, but for anything that does not fit within
> the narrow interests of the platform, this will most likely go nowhere.
>
> For YouTube, you can also check out the YouTube Data Tools that I have
> been maintaining here: https://tools.digitalmethods.net/netvizz/youtube/
>
> All the best,
> Bernhard
>
>
> > On 10 Nov 2020, at 05:22, Brooke Criswell via Air-L <
> > air-l at listserv.aoir.org> wrote:
> >
> > Facebook and Instagram are strict: according to their terms and
> > conditions, they don't allow any data scraping.
> >
> > Your best bet is to propose your study to a researcher at Facebook.
> >
> > On Mon, Nov 9, 2020, 2:21 AM Alexandre Leroux <alleroux at ulb.ac.be>
> > wrote:
> >
> >> Facepager works for FB and YT; it has a user interface and decent
> >> documentation.
> >>
> >> There are scrapers for Instagram, but those don't comply with the
> >> platform's terms of use and, afaik, are command-line only.
> >>
> >>
> >> On 6/11/20 14:59, Cristina Migliaccio wrote:
> >>> Dear Colleagues,
> >>>
> >>> Advance apologies if this question has been addressed (as I am certain
> >>> it has been) in some previous forum or email. Does an easy-to-use
> >>> text/data mining software or platform exist that works across these
> >>> three social media platforms: YouTube, Facebook, and Instagram?
> >>>
> >>> I would like to collect data on alphabetic features but also
> >>> paralinguistic features such as likes, shares, etc.
> >>>
> >>> Any suggestions whatsoever for a text/data mining beginner would be
> >>> greatly appreciated (videos and lectures to this end are also
> >>> appreciated!)
> >>>
> >>> Warm thanks-
> >>> Cristina Migliaccio
> >>> _______________________________________________
> >>> The Air-L at listserv.aoir.org mailing list
> >>> is provided by the Association of Internet Researchers http://aoir.org
> >>> Subscribe, change options or unsubscribe at:
> >>> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> >>>
> >>> Join the Association of Internet Researchers:
> >>> http://www.aoir.org/
> >>>
> >>
> >> --
> >> Alexandre Leroux
> >> Ph.D. candidate
> >> Group for research on Ethnic Relations, Migrations and Equality (GERME)
> >> Université Libre de Bruxelles (ULB)
> >> alleroux at ulb.ac.be
>