[Air-L] First simple request: directory of twitter accounts for organizations?

Peter Timusk peterotimusk at gmail.com
Sat Sep 5 22:07:14 PDT 2020


We have this matching problem at my job where I get a list of business
names and street addresses or websites and have to match them to businesses
on our national business survey frame. This links to their tax data.

I know that is not Internet related, but I believe the matching problem is a
common enough problem in computer science.
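Name matching of this kind is usually called record linkage. Below is a minimal sketch of one simple approach, using Python's standard-library `difflib` as the similarity measure. It normalizes both names (lower-case, drop punctuation and legal-form suffixes) and keeps the closest frame entry above a cutoff. The suffix list, the 0.8 cutoff and the sample frame are illustrative assumptions. A real linkage would also compare addresses and block candidates (for example by postal code) instead of scanning the whole frame.

```python
import difflib
import re

# Legal-form suffixes ignored when comparing names. This short list is an
# illustrative assumption and would need tuning for a real business register.
SUFFIXES = {"inc", "ltd", "llc", "corp", "corporation", "co", "limited"}


def normalize(name):
    """Lower-case, keep alphanumeric tokens, and drop legal-form suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def best_match(query, frame, cutoff=0.8):
    """Return (frame_name, score) for the most similar frame entry, or None
    if nothing reaches the (assumed) cutoff."""
    q = normalize(query)
    best = None
    for candidate in frame:
        score = difflib.SequenceMatcher(None, q, normalize(candidate)).ratio()
        if score >= cutoff and (best is None or score > best[1]):
            best = (candidate, score)
    return best


frame = ["Media 3D Inc.", "Media Tree Ltd.", "Cats Pictures Co."]
print(best_match("media 3d", frame))      # ('Media 3D Inc.', 1.0)
print(best_match("Acme Widgets", frame))  # None
```

Scores that land just above the cutoff are the ones worth reviewing by hand.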

I also sometimes need to search a website like www.catspics.com to find
the real company name (e.g. 'media 3d') on its 'terms of use' or
'privacy policy' page. I have not been able to automate this yet. I have
tried with R and the rvest package, but the websites tend to be
heterogeneous, canned CMS sites that expose very little CSS or HTML5
structure to scrape.
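One heuristic that can cope with heterogeneous CMS templates ignores the page structure entirely. It flattens the page to text and looks for a copyright notice, which on many sites names the legal owner. The sketch below uses only the Python standard library. The regular expression is an assumption about typical footer wording ("© 2020 Media 3D Inc. All rights reserved.") and will miss or misread other phrasings. Treat its output as a list of candidates to check, not an answer. Fetching the 'terms of use' page itself is left out.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True (default) turns &copy; into ©
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


# Heuristic (an assumption, not a standard): one or more copyright markers, an
# optional year or year range, then the holder name up to a period, "|" or
# the end of the text node.
COPYRIGHT = re.compile(
    r"(?:(?:\u00a9|\(c\)|copyright)\s*)+"
    r"(?:\d{4}(?:\s*-\s*\d{4})?\s*)?"
    r"([^\n|]+?)"
    r"\s*(?:all rights reserved|\.|\||$)",
    re.IGNORECASE | re.MULTILINE,
)


def copyright_holders(html):
    """Return candidate company names found in copyright notices."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.parts)  # one text node per line
    return [m.group(1).strip(" ,") for m in COPYRIGHT.finditer(text)]
```

For a footer reading "Copyright &copy; 2015-2020 Media 3D Inc. All rights reserved." this returns `['Media 3D Inc']`.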

On Sat., Sep. 5, 2020, 9:33 p.m. Muira McCammon, <muira.n.mccammon at gmail.com>
wrote:

> Hi, Ronald et al.,
>
> This is an interesting thread, and Ed, thanks for sharing that post with us
> all.
>
> I'm speaking as someone who did *not* initially automate this process of
> searching for orgs' social media accounts, but one thing I wanted to add is
> that if you do want to get technical and/or exhaustive about this, it may
> be worth using the Wayback Machine to double-check that the social media
> accounts currently associated with an org weren't preceded by others. These
> days, there are a lot of org accounts disappearing overnight with very
> little heads-up.
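Below is a minimal sketch of how that check might be scripted against the Internet Archive's CDX search endpoint. It lists archived captures of an organization's homepage for a range of years, then turns them into replay URLs whose pages can be searched for older social media links. The `url`, `output`, `from`, `to` and `collapse` parameters are those of the public Wayback CDX server. The function names are hypothetical, and the network request itself (e.g. with `urllib.request.urlopen`) is left out.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target, start_year, end_year):
    """Build a CDX query listing archived captures of `target`
    (e.g. an organization's homepage) between two years."""
    params = {
        "url": target,
        "output": "json",
        "from": str(start_year),
        "to": str(end_year),
        "collapse": "digest",  # skip adjacent captures with identical content
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(cdx_json):
    """Turn a JSON CDX response into replay URLs; row 0 holds the field names."""
    rows = json.loads(cdx_json) if cdx_json.strip() else []
    if not rows:
        return []
    header, *records = rows
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{r[ts]}/{r[orig]}" for r in records]
```

The parser looks up the `timestamp` and `original` columns by name from the header row rather than assuming fixed positions.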
>
> In my own case, I have spent much of the past five years searching for,
> cataloguing, and tracking approximately 2,927 Twitter accounts associated
> with the U.S. federal government. I built on that project this summer by
> documenting all the social media accounts run by US state-level public
> health departments. I don't necessarily recommend you go the manual road on
> this, but I wanted to share a few observations from the process.
>
> I began this journey years ago by going individually to each US fed
> agency's homepage and seeing which official social media accounts were
> listed. I then cross-checked this process by searching for each agency's
> name on Twitter. Another level of checking entailed seeing which accounts
> the govt agencies themselves were following. Some initiatives have popped
> up over the years to try to keep track of govt Twitter accounts (Politwoops
> and Voxgov and even Digital.gov), but they are far from exhaustive. I
> mention this because it's good to remember that many orgs these days
> will have one primary Twitter account but then will launch smaller accounts
> related to specific initiatives/campaigns/etc. Often, it's really hard to
> find these unless you dig into who specific accounts are following.
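Digging through who an account follows amounts to a breadth-first crawl of the follow graph. The minimal sketch below assumes the follow lists have already been fetched into a dictionary. In R that data could come from rtweet's `get_friends()`, but the structure used here is hypothetical. It also assumes an account "belongs" to the organization if its display name contains a keyword such as the agency's name.

```python
from collections import deque


def snowball_accounts(seeds, following, keywords, max_depth=2):
    """Breadth-first walk over 'who follows whom', keeping accounts whose
    display name contains one of the keywords.

    following: dict mapping a handle to the (handle, display_name) pairs it
    follows; in practice this would come from one API call per account.
    """
    keywords = [k.lower() for k in keywords]
    found = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        handle, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for friend, name in following.get(handle, []):
            if friend not in found and any(k in name.lower() for k in keywords):
                found.add(friend)
                queue.append((friend, depth + 1))
    return found
```

The depth limit keeps the number of API calls bounded. Keyword matching on display names will miss accounts named after a campaign rather than the agency, so the seeds and keywords still need human judgment.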
>
> Anecdotally, I also wanted to add that a lot of orgs these days aren't
> updating their homepages to reflect the full extent of their
> social media presence, in part because many are continuing to experiment
> with the platforms that work best for their mission.
>
> Muira
>
> On Sat, Sep 5, 2020 at 7:42 AM Shulman, Stu <stu at texifter.com> wrote:
>
> > I would add that Maurice points to the non-trivial task of disambiguation
> > when an organization name overlaps terms in common usage. For example,
> > United Airlines is an organization, but it is most commonly referred to as
> > United. Manchester United is a very popular football organization, most
> > often referred to as United. The list of other widespread uses of this
> > common organization name sums up the disambiguation problem. It can be done
> > with training and machine-learning, but not for 2000 terms unless you have
> > an army of workers and lots of money. That suggests a second point,
> > essentially that the practical steps required to gather data for 2000
> > organizations over time and remain compliant with rate and query limits
> > would be daunting. You might consider trying the task with 5
> > organizations to assess the challenge of performing the task at scale.
> > Finally, from the view of qualitative research, depending on your end
> > goals, you may not need such a huge number of organizations to reach
> > saturation during analysis. That is, say you looked at 50 organizations and
> > then noticed on 51-60 that you were not learning much you had not already
> > learned. That is saturation.
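A cheap first pass at the disambiguation Stu describes, well short of trained machine learning, scores each candidate account by how many hand-picked context words appear in its profile description. Below is a minimal sketch. The context terms, handles and bios are made up for illustration.

```python
import re


def tokens(text):
    """Lower-case word tokens of a profile description."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_candidates(context_terms, candidates):
    """Score each (handle, bio) pair by how many context terms its bio contains,
    best first; ties are broken alphabetically by handle."""
    context = {t.lower() for t in context_terms}
    scored = [(len(context & tokens(bio)), handle) for handle, bio in candidates]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical accounts returned by a search for "United".
candidates = [
    ("handle_b", "Official account of a Manchester football club."),
    ("handle_a", "Official account of an airline. Flights, travel and support."),
]
print(rank_candidates(["airline", "airlines", "flights", "travel"], candidates))
# [(3, 'handle_a'), (0, 'handle_b')]
```

Candidates that score zero, or that tie, are the ones that still need a human reader.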
> >
> > On Sat, Sep 5, 2020 at 6:08 AM Vergeer, M.R.M. (Maurice) <
> > m.vergeer at maw.ru.nl> wrote:
> >
> > > Hi Ronald,
> > >
> > > yes, it can be done using R and the package rtweet. As for the YouTube
> > > question in the other message, a similar approach could be used with R and
> > > the package tuber. It probably needs a "for" loop; I'm not sure rtweet
> > > (beware, technical lingo ahead) is vectorized for this problem.
> > > A loop will take some time, though, given the large number of
> > > organizations. Furthermore, because one query will return multiple
> > > results, some semi-manual evaluation needs to take place to assess which
> > > account is the actual account.
> > > But anyone with some experience with R could do it.
> > > Hope that helps.
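The loop Maurice describes could look roughly like this in Python, although the thread's own examples use R. `search_fn` is a hypothetical placeholder for a real user-search call; in R, rtweet's `search_users()` would be the analogue. The loop ranks each result by name similarity and pauses between queries to stay under rate limits. It flags near-ties for the semi-manual evaluation step. The pause length and the 0.1 margin are assumptions to be tuned against the actual API limits.

```python
import difflib
import time


def find_accounts(org_names, search_fn, pause_seconds=1.0, margin=0.1):
    """Look up each organization in turn and keep the best-matching handle.

    search_fn(name) must return a list of (handle, display_name) pairs; it is a
    placeholder for a real user-search call. Returns {name: (handle, review)},
    where review=True means a person should check the result by hand.
    """
    results = {}
    for i, name in enumerate(org_names):
        if i:
            time.sleep(pause_seconds)  # crude pacing to respect rate limits
        scored = sorted(
            (
                (difflib.SequenceMatcher(None, name.lower(), shown.lower()).ratio(), handle)
                for handle, shown in search_fn(name)
            ),
            reverse=True,
        )
        if not scored:
            results[name] = (None, True)  # nothing found: check manually
            continue
        # Flag near-ties between the two best candidates for manual review.
        review = len(scored) > 1 and scored[0][0] - scored[1][0] < margin
        results[name] = (scored[0][1], review)
    return results
```

Following Stu's suggestion, running this on five organizations first gives a feel for how many results end up flagged.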
> > >
> > > best regards
> > > Maurice
> > >
> > > ________________________________________________
> > > Maurice Vergeer
> > > www.mauricevergeer.nl
> > >
> > > ________________________________________________
> > >
> > > ________________________________________
> > > From: Air-L <air-l-bounces at listserv.aoir.org> on behalf of Ronald Rice <
> > > rrice at comm.ucsb.edu>
> > > Sent: Saturday, September 5, 2020 01:46
> > > To: AoIR-L
> > > Subject: [Air-L] First simple request: directory of twitter accounts for
> > > organizations?
> > >
> > > Hi folks. This is an incredibly simple question, and I told my colleagues
> > > that I was sure someone (probably many) on AoIR knows the answer to this.
> > > I have a study with 2000 organizations (and their official names) and wish
> > > to find out their main Twitter account. Twitter has a public directory,
> > > but it's browse-only. I'm sure a quick script could take the table of org
> > > names, apply it to some aspect of a Twitter API or Twitter database and
> > > return a list. But I'm not trained in that really cool and powerful set of
> > > approaches. However, I'm also sure there already exists a Twitter
> > > directory where you could enter the organization name and get the
> > > account. The paleolithic approach is to search each of the 2000 websites
> > > (which we have) to see if there's a Twitter account posted; or worse, type
> > > the org name and "twitter" in Google search. Anyone have a suggestion?
> > > Thanks so much in advance.
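The "paleolithic" approach of checking the 2000 websites can itself be scripted: fetch each homepage and pull out any links to twitter.com. Below is a minimal standard-library sketch of the extraction step, which assumes the HTML has already been downloaded. The list of non-account paths (`share`, `intent`, and so on) is partial and assumed. Pages that insert their social links with JavaScript will not be covered.

```python
import re
from html.parser import HTMLParser

# twitter.com path segments that are not account names (partial list, assumed).
NON_ACCOUNTS = {"share", "intent", "home", "search", "hashtag", "i"}

# Matches links such as https://twitter.com/ExampleOrg or twitter.com/#!/ExampleOrg;
# handles are at most 15 letters, digits or underscores.
HANDLE = re.compile(
    r"^https?://(?:www\.|mobile\.)?twitter\.com/(?:#!/)?@?([A-Za-z0-9_]{1,15})(?:[/?#]|$)",
    re.IGNORECASE,
)


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href.strip())


def twitter_handles(html):
    """Return distinct Twitter handles linked from a page, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    handles, seen = [], set()
    for href in collector.hrefs:
        m = HANDLE.match(href)
        if not m:
            continue
        handle = m.group(1)
        if handle.lower() in NON_ACCOUNTS or handle.lower() in seen:
            continue
        seen.add(handle.lower())
        handles.append(handle)
    return handles
```

A page linking to several handles (a main account plus campaign accounts) returns all of them, so the "main" account still has to be picked.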
> > > --
> > > Ronald E. Rice
> > > Arthur N. Rupe Professor in the Social Effects of Mass Communication
> > > Department of Communication
> > > 4127 SS&MS Bldg
> > > Santa Barbara, CA 93106-4020
> > > 805-893-8696; rrice at comm.ucsb.edu
> > > https://www.comm.ucsb.edu/people/ronald-e-rice
> > > _______________________________________________
> > > The Air-L at listserv.aoir.org mailing list
> > > is provided by the Association of Internet Researchers http://aoir.org
> > > Subscribe, change options or unsubscribe at:
> > > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> > >
> > > Join the Association of Internet Researchers:
> > > http://www.aoir.org/
> > >
> >
> >
> > --
> > Dr. Stuart W. Shulman
> > Founder and CEO, Texifter
> > Editor Emeritus, *Journal of Information Technology & Politics*
> >
>
>
> --
> Muira McCammon
>
> Ph.D. candidate, Annenberg School for Communication, University of Pennsylvania
> M.L., University of Pennsylvania Law School (2020)
> M.A. in Translation Studies, University of Massachusetts, Amherst (2016)
> A bit about my research here:
> https://penntoday.upenn.edu/news/Penn-grad-student-studies-information-flow-Guantanamo-Bay-Gitmo-detention-center
> Twitter: @muira_mccammon
> Please note that I am working more flexibly and I may send and respond to
> emails out of hours - there is no expectation or desire that you do the same.
>


