[Air-L] Text Sample Size?

Alex Halavais alex at halavais.net
Mon Aug 17 19:46:46 PDT 2009

Karyn & Peter,

I'm hoping someone out there will correct me. I think you are looking
for something like a rule of thumb, and I suspect that doesn't exist.

There are two questions. The first is how many blogs/bloggers you need
to sample in order to generalize to all bloggers. I'm guessing that's
not your question. (Although given the issues of arriving at a
representative sample, it is not a trivial one.)

I think the question you are asking is (a) how many different bloggers
you will need to sample in order to have the power necessary to
demonstrate a significant difference between groups, and (b) how much
text from each of these bloggers you will need.

Of course, that question hinges in part on the distribution of
differences within your groups. That, in turn, is dependent on
precisely how you are measuring such differences. (And we'll leave
aside, for the moment, the question of whether those differences make
a difference--i.e., the validity of whatever measure you choose to use.)

If you are using a metric that has been used in the past to show
gender differences, you may be able to use whatever differences they
found--within groups and between them--to estimate your own sample
needs. In practice, though, if that literature exists, you would
probably just use the same sample size.
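
As a rough sketch of that kind of estimate: if a prior study reports
the difference between group means and the spread within groups for
some metric, a standard normal-approximation formula gives the number
of bloggers needed per group. The function name, the default alpha of
0.05, the default power of 0.80, and the example numbers are all
assumptions for illustration, not values from any particular study,
and the formula assumes roughly normal scores and equal group sizes.

```python
import math
from statistics import NormalDist


def bloggers_per_group(mean_diff, within_sd, alpha=0.05, power=0.80):
    """Approximate bloggers needed in each group for a two-sided,
    two-sample comparison of means (normal approximation).

    mean_diff and within_sd are hypothetical inputs: the between-group
    difference and within-group standard deviation from earlier work.
    """
    if mean_diff == 0 or within_sd <= 0:
        raise ValueError("need a nonzero difference and a positive SD")
    effect_size = abs(mean_diff) / within_sd          # Cohen's d
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Illustrative only: a difference of half a standard deviation.
print(bloggers_per_group(mean_diff=0.5, within_sd=1.0))
```

Note that this counts bloggers, not words: it says nothing about how
much text per blogger is needed for the metric itself to be stable,
and smaller expected differences push the required number up quickly.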

So, that is my non-answer.

- Alex

// This email is
// [x] assumed public and may be blogged / forwarded.
// [ ] assumed to be private, please ask before redistributing.
// Alexander C. Halavais, ciberflâneur
// http://alex.halavais.net

On Mon, Aug 17, 2009 at 10:16 PM, Peter Timusk <ptimusk at sympatico.ca> wrote:
> I have no idea of samples of words. I do know samples of persons.
> A sample of persons below say 300 is suspect especially if not random. I am
> reading a few books in Internet studies that argue against previous studies
> by claiming the sample is too small and not random.
> You can claim some things with samples as small as 12 but the more items you
> want to measure the larger your sample should be IMHO.
> A sample is best random. Some would argue a sample is only a sample if
> random. You can sample randomly and still choose roughly equal men and
> women.
> Can you randomize your samples in some ways?
> The Canadian Internet Use Survey has had a sample of more than 20,000
> persons.
> All that I am saying is probably to be found in most undergraduate
> statistics books.
> You would need to ask text analysts about how to sample texts.
> I follow gender and computers so would be interested in your results or what
> you are looking for.
> Peter
> On 17-Aug-09, at 8:29 PM, Karyn Hollis wrote:
>> Hi All--
>>  This is a newbie question.  I am planning to do a quantitative data
>>  analysis to study blogs for gender differences in CMC.  Are there any
>>  rules for the size of samples?  Would comparing male to female blog
>>  texts of a total of 50,000 words each be enough to claim statistical
>>  significance for any differences I find?
>>  Thanks for any advice,
>>  Karyn Hollis
>>  Villanova University
> Peter Timusk statistical computer programmer
> ptimusk at sympatico.ca
> address 701-151 Parkdale Avenue
> Ottawa, Ontario Canada K1Y 4V8
> Phone 613-729-8328
> May all your numbers be quality numbers... even if they are only average
> numbers.
> _______________________________________________
> The Air-L at listserv.aoir.org mailing list
> is provided by the Association of Internet Researchers http://aoir.org
> Subscribe, change options or unsubscribe at:
> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> Join the Association of Internet Researchers:
> http://www.aoir.org/
