[Air-L] Sampling facebook pages

Sun Sep 14 05:42:36 PDT 2014

The random sampling tool in DiscoverText is used for all sorts of
exploratory coding forays. Combined with a filter that excludes items
already coded, it simplifies the steps toward saturation that Annette
usefully describes.

For some social data, particularly Twitter and other re-post enabling
publishers (e.g., Tumblr), you may want to use other methods to explore
diversity in the dataset. Finding all the duplicates and setting them aside
(except for a sample item) before you take a random sample greatly
increases the diversity of the sample. Finding all the near-duplicates and
taking one item per cluster has the same effect.
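
As a rough illustration of the two approaches above, here is a minimal
Python sketch. It is not how DiscoverText detects duplicates or
near-duplicates; the function names, the text normalization, and the 0.7
word-overlap (Jaccard) threshold are all assumptions made for the example.

```python
import random
import re


def normalize(text):
    """Lowercase, drop URLs and punctuation, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())


def one_per_duplicate_group(items):
    """Keep the first item seen for each normalized text (exact duplicates)."""
    seen = {}
    for item in items:
        seen.setdefault(normalize(item), item)
    return list(seen.values())


def jaccard(a, b):
    """Word-set overlap between 0 and 1; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def one_per_near_duplicate_cluster(items, threshold=0.7):
    """Greedy clustering (an assumed, simple method): an item joins the first
    cluster whose representative it overlaps with at or above the threshold;
    otherwise it starts a new cluster. Returns one representative per cluster."""
    reps = []  # list of (original item, word set)
    for item in items:
        words = set(normalize(item).split())
        if not any(jaccard(words, w) >= threshold for _, w in reps):
            reps.append((item, words))
    return [item for item, _ in reps]


def diverse_random_sample(items, k, seed=None, near=False, threshold=0.7):
    """Set duplicates (or near-duplicates) aside, then sample at random."""
    if near:
        pool = one_per_near_duplicate_cluster(items, threshold)
    else:
        pool = one_per_duplicate_group(items)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))
```

Calling diverse_random_sample(posts, 200, seed=1) would draw up to 200
items after exact duplicates are collapsed; passing near=True collapses
near-duplicate clusters instead. The greedy pass compares each item to
every cluster representative, so it is only practical for modest
collections.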

Random sampling is a great enabler of social data research using much
larger collections. The question of how and why to pre-process the data
for diversity and/or topical relevance before drawing a random
sample is an open one worthy of wider discussion.
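
For anyone who wants to script the every-Nth-post passes from Annette's
quoted message below, here is a small Python sketch. The function name is
hypothetical, and counting posts from 1 in page order is an assumption.
Each pass skips posts already coded in an earlier pass.

```python
def systematic_passes(n_posts, intervals):
    """Yield (interval, new_positions) for each pass.

    Positions are 1-based. A post picked in an earlier pass is not
    picked again, so each pass lists only the additional posts to code.
    """
    coded = set()
    for step in intervals:
        new = [i for i in range(step, n_posts + 1, step) if i not in coded]
        coded.update(new)
        yield step, new


for step, new in systematic_passes(1000, [100, 50, 25]):
    print(f"every {step}th post: {len(new)} new")
```

For 1000 posts this reports 10, 10, and 20 new posts for the three
passes, matching the counts in the quoted example. You would stop adding
passes once coding reaches saturation.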

On Sun, Sep 14, 2014 at 7:41 AM, Annette Markham <amarkham at gmail.com> wrote:

> Hmm…I’m not sure yet, so let’s get more information about what you’re
> doing.  If you've already developed specific codes or themes you’re looking
> for in the discussions, you don’t necessarily need to sample, since you
> could search and find within the entire population of content.  In this
> case, you are presumably looking for instances of ‘tolerance,’ at different
> levels, I suppose, which you would have pre-operationalized. This all
> depends on how specifically “political tolerance” can be articulated and
> thus identified by computer-aided coding, of course.
>
> But you mention qualitative analysis, which leads me to suspect that you
> are at an earlier stage of exploration. Would I be on the right track to
> assume you plan to do something like a more qualitative exploration of
> posts/comments, where you conduct open coding, and through this coding
> process, develop a list of relevant / salient categories of “tolerance”?
> If this is the case, I would recommend a two step sampling approach that
> involves more stages of analysis.  If you need to generate relevant
> categories that indicate different levels or qualities of ‘political
> tolerance,’ you might first use a sampling approach inspired by grounded
> theory, where you do open coding until you reach saturation. You might have
> a list of themes or codes that could be then transformed to categories that
> you’ll look for more deliberately in your second sampling/analysis.  At
> that point, you could return to the original sample and do another round of
> coding where you’re  more deliberately seeking certain instances.   If this
> second scenario sounds more like what you’re doing, there are many ways to
> sample, but I would start with this sort of sampling scheme:
>
> sample 1: a systematic sample within the four pages that seeks to cover as
> many different types, instances, and levels of political tolerance as
> possible. To cover the entire population in a systematic but thorough way
> without losing your mind, you could code every xth post/comment,
> gradually getting more and more covered.
> e.g., for 1000 total posts in one FB page:
> 1st pass: code every 100th post (10 total)
> 2nd pass: code every 50th post (10 more)
> 3rd pass: code every 25th post (20 more)
> may need more or less, depending on point of saturation or emergence of
> themes. This is an exploratory, inductive process.
>
> (this is systematic, non random, seeking variation)
>
> Then if you are really determined to do a quantitative approach, you could
> develop a coding scheme that could be applied/sought in a more
> number-generating way by different analysts, who have been trained to find
> intercoder reliability, if that’s what you’re seeking.
>
> sample 2 would enact a more deductive approach, designed for a less
> open-ended analysis. This sample could be taken in a number of different
> ways... but I’m not the one to provide a very sophisticated set of
> techniques, since my strength is in the inductive/qualitative arena.
>
> does that help?
>
>
> On 14 Sep 2014, at 12:17, Noha Nagi <noha.a.nagi at gmail.com> wrote:
>
> > Hello Annette,
> >
> > Thanks for your recommendations.
> >
> > To make it more clear.... I selected purposefully four facebook pages,
> but I don't want to analyze all their content (posts and comments). I am
> using a quantitative method mainly.
> > I will analyze content qualitatively & quantitatively and then measure
> the level of political toleration in discussions within the pages as an
> estimate for the political toleration within a society.
> >
> > Will this change your reply?
> >
> > On Sun, Sep 14, 2014 at 12:55 PM, Annette Markham <amarkham at gmail.com>
> wrote:
> > Hi Noha,
> >
> > Not sure what you’ve already done to establish the most appropriate
> sampling plan, but off the top of my head systematic is not part of random
> sampling.  You’re probably doing a purposeful sampling. Are you taking a
> qualitative or quantitative orientation to the analysis? That makes a
> difference in how you’ll describe (and conduct) the sample.
> >
> > As for further reading:  Because these two books are sitting open on my
> desk, I can recommend these basic introductions to sampling concepts and
> terms:
> >
> > Sarah Tracy’s textbook on qualitative research methods:  covers
> different qualitative sampling strategies (in the chapter on interviewing).
> http://eu.wiley.com/WileyCDA/WileyTitle/productCd-EHEP002631.html
> > I’ve shared a screenshot of her summary of types of sampling here, but
> the larger section is much more detailed:
> https://www.dropbox.com/s/6jmm7th5mbl0jfz/tracysamplingchart.png?dl=0
> >
> > Donald Treadwell’s textbook on introducing Communication Research:
> presents positivist and interpretivist notions of sampling.
> http://www.sagepub.com/textbooks/Book237564
> > You can see a copy of Treadwell's chapter on sampling here:
> https://www.dropbox.com/s/0320vm66m0gec2w/Treadwellch8Sampling.pdf?dl=0
> >
> > and here’s a nice piece that cuts deeper into the ideas and complexity
> of qualitative sampling:
> >
> http://corcom300-s12-lay.wikispaces.umb.edu/file/view/ARTICLE_Sampling_Qualitative.pdf
> >
> >  Best,
> >
> > annette
> >
> > On 14 Sep 2014, at 11:22, Noha Nagi <noha.a.nagi at gmail.com> wrote:
> >
> >> Dear Professors and colleagues,
> >>
> >> I was wondering if there are different sampling methods for internet
> data
> >> than the already known sampling methods.
> >>
> >> For my research, I was thinking of taking a *systematic random sample*
> from
> >> facebook posts on each of four facebook pages. I have no information
> >> about the heterogeneity between the different pages according to any
> >> variable (gender, political affiliation...) so I thought it is not
> >> stratified nor cluster, and it will be more likely a systematic sample.
> >>
> >>
> >>    Is systematic sampling *logically right*?
> >>
> >>    Did anyone come across *other sampling techniques for facebook
> pages*?
> >> or internet data in general?
> >>
> >>    Can anyone suggest *a book to read about this*?
> >>
> >>
> >> I would love to hear any advice from you.
> >>
> >>
> >> Yours,
> >> *Noha A.Nagi*
> >> _______________________________________________
> >> The Air-L at listserv.aoir.org mailing list
> >> is provided by the Association of Internet Researchers http://aoir.org
> >> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> >>
> >> Join the Association of Internet Researchers:
> >> http://www.aoir.org/
> >
> >
> >
> >
> > --
> > Noha A.Nagi
>

-- 
Dr. Stuart W. Shulman
http://people.umass.edu/stu

Founder and CEO, Texifter
http://texifter.com

LinkedIn
http://www.linkedin.com/in/stuartwshulman

Twitter
https://twitter.com/StuartWShulman