[Air-L] Text Sample Size?

Tue Aug 18 06:55:23 PDT 2009

First of all, I'm assuming you want to apply some sort of inferential
statistics to these data; if not, this may not apply. If so, the main problem
with a small sample is the loss of power: if there is an effect in the
population, you are less likely to find it, so mainly you're just
handicapping yourself. In fact, if you do find something with low power,
it is quite possibly a very large effect.

It's sort of like trying to see through a hazy pair of glasses. If there's
something there, for you to see it, it needs to be big and obvious.
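
To put rough numbers on the point about power, here is a minimal sketch in
Python. It is an illustration under stated assumptions, not a recipe for this
particular study: it assumes a two-sided comparison of two group means, a
standardized effect size (Cohen's d), equal group sizes, and a normal
approximation in place of the exact t distribution. The function names
approx_power and n_per_group_for_power are made up for this example, and n
counts independent units in each group (for instance bloggers), not words.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Standard normal distribution, used for the normal approximation.
_STD_NORMAL = NormalDist()


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    effect_size is a standardized mean difference (Cohen's d).
    This uses a normal approximation, so it is slightly optimistic
    for very small groups.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_power(effect_size, power=0.8, alpha=0.05):
    """Approximate units needed per group to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # How power changes with group size for a medium effect (d = 0.5).
    for n in (12, 30, 100, 300):
        print(n, round(approx_power(0.5, n), 2))
    # Units per group needed for 80% power at small, medium, large effects.
    for d in (0.2, 0.5, 0.8):
        print(d, n_per_group_for_power(d))
```

With small groups, only a large effect has a reasonable chance of being
detected, which is the hazy-glasses problem in numeric form.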

...peace...richard

On 8/17/09 9:46 PM, "Alex Halavais" <alex at halavais.net> wrote:

> Karyn & Peter,
> 
> I'm hoping someone out there will correct me. I think you are looking
> for something like a rule of thumb, and I suspect that doesn't exist.
> 
> There are two questions. The first is how many blogs/bloggers you need
> to sample in order to generalize to all bloggers. I'm guessing that's
> not your question. (Although given the issues of arriving at a
> representative sample, it is not a trivial one.)
> 
> I think the question you are asking is (a) how many different bloggers
> you will need to sample in order to have the power necessary to
> demonstrate a significant difference between groups, and (b) how much
> text from each of these bloggers you will need.
> 
> Of course, that question hinges in part on the distribution of
> differences within your groups. That, in turn, is dependent on
> precisely how you are measuring such differences. (And we'll leave
> aside, for the moment, the question of whether those differences make
> a difference--i.e., the validity of whatever measure you choose to
> use.)
> 
> If you are using a metric that has been used in the past to show
> gender differences, you may be able to use whatever differences they
> found--in group and between--to estimate your own sample needs. In
> practice, though, if that literature exists, you probably just use the
> same sample size.
> 
> So, that is my non-answer.
> 
> - Alex
> 
> 
> --
> //
> // This email is
> // [x] assumed public and may be blogged / forwarded.
> // [ ] assumed to be private, please ask before redistributing.
> //
> // Alexander C. Halavais, ciberflâneur
> // http://alex.halavais.net
> //
> 
> 
> 
> On Mon, Aug 17, 2009 at 10:16 PM, Peter Timusk<ptimusk at sympatico.ca> wrote:
>> I have no idea of samples of words. I do know samples of persons.
>> 
>> A sample of persons below say 300 is suspect especially if not random. I am
>> reading a few books in Internet studies that argue against previous studies
>> by claiming the sample is too small and not random.
>> 
>> You can claim some things with samples as small as 12, but the more items
>> you want to measure, the larger your sample should be IMHO.
>> 
>> A sample is best random. Some would argue a sample is only a sample if
>> random. You can sample randomly and still choose roughly equal men and
>> women.
>> 
>> Can you randomize your samples in some ways?
>> 
>> The Canadian Internet Use Survey has had a sample of more than 20,000
>> persons.
>> 
>> All that I am saying is probably to be found in most undergraduate
>> statistics books.
>> 
>> You would need to ask text analysts about how to sample texts.
>> 
>> I follow gender and computers so would be interested in your results or what
>> you are looking for.
>> 
>> Peter
>> 
>> 
>> On 17-Aug-09, at 8:29 PM, Karyn Hollis wrote:
>> 
>>> Hi All--
>>>  This is a newbie question.  I am planning to do a quantitative data
>>>  analysis to study blogs for gender differences in CMC.  Are there any
>>>  rules for the size of samples?  Would comparing male to female blog
>>>  texts of a total of 50,000 words each be enough to claim statistical
>>>  significance for any differences I find?
>>>  Thanks for any advice,
>>>  Karyn Hollis
>>>  Villanova University
>> 
>> Peter Timusk statistical computer programmer
>> ptimusk at sympatico.ca
>> address 701-151 Parkdale Avenue
>> Ottawa, Ontario Canada K1Y 4V8
>> Phone 613-729-8328
>> 
>> May all your numbers be quality numbers... even if they are only average
>> numbers.
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> The Air-L at listserv.aoir.org mailing list
>> is provided by the Association of Internet Researchers http://aoir.org
>> Subscribe, change options or unsubscribe at:
>> http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>> 
>> Join the Association of Internet Researchers:
>> http://www.aoir.org/
>> 

-- 
Richard H. Hall, PhD
Professor, Information Science and Technology
Missouri University of Science and Technology
http://mst.edu/~rhall