[Air-L] Text Sample Size?
tronica at gmail.com
Tue Aug 18 05:30:38 PDT 2009
Alex, I believe you are right. There is no rule-of-thumb answer to the
question 'how many observations do I need to achieve statistical
significance'. But if you know a bit more about your planned analyses in
advance, you may be able to estimate sample size using power tables.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences
(2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
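As a rough illustration of what such a power calculation looks like, here is a minimal sketch for comparing the means of two groups. It assumes a two-sided test, independent observations, and the normal approximation n = 2 * ((z_(1-alpha/2) + z_power) / d)^2 per group, where d is the expected standardized difference (Cohen's d). The function name n_per_group is made up for this example, and tables based on the t distribution can give slightly larger values.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided comparison of two means.

    Normal approximation only (an assumption of this sketch):
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    d is the expected standardized mean difference (Cohen's d).
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    # Cohen's conventional small, medium and large effect sizes for d.
    for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
        print(label, d, n_per_group(d))
```

The point of the sketch is that the required n depends heavily on the effect size you expect: halving d roughly quadruples the number of units you need in each group.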
Power calculations are useful only if your data also meet other criteria
though, which need to be considered before you should be applying
One of the problems with web-based data is the relative ease of collecting
'large numbers' of responses, words, observations, etc. People often think
that large numbers mean they can 'find statistical significance'. What
matters is the way you are sampling those units and how you have defined the
larger population about which you hope to make inferences, along with other
elements such as the expected differences between groups and effect sizes, as
Alex and Peter have already mentioned.
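To see why large numbers alone say little, here is a small sketch of how the p-value of a two-group comparison behaves as n grows while the effect stays tiny. It assumes a z-test on a standardized difference d with equal group sizes and, crucially, independent observations, which is exactly the assumption that is doubtful when the 'units' are words from the same blogger. The function name two_sided_p is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(d, n_per_group):
    """Two-sided p-value of a z-test for a standardized mean difference d.

    Assumes two independent groups of equal size with unit variance,
    so the test statistic is z = d * sqrt(n_per_group / 2).
    """
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    tiny_effect = 0.02  # a difference of 2% of a standard deviation
    for n in (100, 1_000, 50_000):
        print(n, round(two_sided_p(tiny_effect, n), 4))
```

With 100 units per group the tiny difference is nowhere near significant, while with 50,000 per group it falls below the usual 0.05 threshold, even though the difference itself has not become any more meaningful.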
Although a lot of this is standard methods textbook content, it's surprising
how many published articles use statistical inference in situations where
assumptions for it aren't met. Indeed, I'm still trying to get my head
around it. Colleagues of mine have said things like 'it's not a random
sample and I don't want to generalise my results to a larger population as I
know I cannot, but I can still use statistical tests to test variables
within my data, right?' Given these things get published, I'm confused
myself. Then again, what is theoretically correct and what gets published
aren't necessarily the same thing...
Some answers and more questions for you!
2009/8/18 Alex Halavais <alex at halavais.net>
> Karyn & Peter,
> I'm hoping someone out there will correct me. I think you are looking
> for something like a rule of thumb, and I suspect that doesn't exist.
> There are two questions. The first is how many blogs/bloggers you need
> to sample in order to generalize to all bloggers. I'm guessing that's
> not your question. (Although given the issues of arriving at a
> representative sample, it is not a trivial one.)
> I think the question you are asking is (a) how many different bloggers
> you will need to sample in order to have the power necessary to
> demonstrate a significant difference between groups, and (b) how much
> text from each of these bloggers you will need.
> Of course, that question hinges in part on the distribution of
> differences within your groups. That, in turn, is dependent on
> precisely how you are measuring such differences. (And we'll leave
> aside, for the moment, the question of whether those differences make
> a difference--i.e., the validity of whatever measure you choose to
> If you are using a metric that has been used in the past to show
> gender differences, you may be able to use whatever differences they
> found--in group and between--to estimate your own sample needs. In
> practice, though, if that literature exists--you probably just use the
> same sample size.
> So, that is my non-answer.
> - Alex
> // This email is
> // [x] assumed public and may be blogged / forwarded.
> // [ ] assumed to be private, please ask before redistributing.
> // Alexander C. Halavais, ciberflâneur
> // http://alex.halavais.net
> On Mon, Aug 17, 2009 at 10:16 PM, Peter Timusk<ptimusk at sympatico.ca>
> > I have no idea of samples of words. I do know samples of persons.
> > A sample of persons below say 300 is suspect especially if not random. I
> > reading a few books in Internet studies that argue against previous
> > by claiming the sample is too small and not random.
> > You can claim some things with samples as small as 12 but the more items you
> > want to measure the larger your sample should be IMHO.
> > A sample is best random. Some would argue a sample is only a sample if
> > random. You can sample randomly and still choose roughly equal men and
> > women.
> > Can you randomize your samples in some ways?
> > The Canadian Internet Use Survey has had a sample of more than 20,000
> > persons.
> > All that I am saying is probably to be found in most undergraduate
> > statistics books.
> > You would need to ask text analysts about how to sample texts.
> > I follow gender and computers so would be interested in your results or
> > you are looking for.
> > Peter
> > On 17-Aug-09, at 8:29 PM, Karyn Hollis wrote:
> >> Hi All--
> >> This is a newbie question. I am planning to do a quantitative data
> >> analysis to study blogs for gender differences in CMC. Are there any
> >> rules for the size of samples? Would comparing male to female blog
> >> texts of a total of 50,000 words each be enough to claim statistical
> >> significance for any differences I find?
> >> Thanks for any advice,
> >> Karyn Hollis
> >> Villanova University
> > Peter Timusk statistical computer programmer
> > ptimusk at sympatico.ca
> > address 701-151 Parkdale Avenue
> > Ottawa, Ontario Canada K1Y 4V8
> > Phone 613-729-8328
> > May all your numbers be quality numbers... even if they are only average
> > numbers.
> > _______________________________________________
> > The Air-L at listserv.aoir.org mailing list
> > is provided by the Association of Internet Researchers http://aoir.org
> > Subscribe, change options or unsubscribe at:
> > http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
> > Join the Association of Internet Researchers:
> > http://www.aoir.org/