[Air-l] Re: Air-l digest, Vol 1 #217 - 19 msgs

Lee Giles giles at ist.psu.edu
Mon Nov 19 13:44:33 PST 2001


A note about searching. Data does not need metadata to be
searchable, though metadata certainly helps. The web consists of
data with little useful metadata, yet it is very searchable, as
the search engines have shown. There are also automatic methods for
generating metadata, which can be tailored to a particular domain or
to a particular view of the data. See, for example, researchindex.com,
where the author, title, and citation metadata are generated
automatically.
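
To illustrate the first point, here is a minimal sketch of keyword
search over raw text with no metadata at all, using a small inverted
index. It is only an illustration, not how any real search engine or
researchindex.com works; the sample study descriptions and the names
build_index and search are invented for the example.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical study descriptions; no author, title, or subject fields.
docs = {
    "study1": "Survey of online community participation among university students",
    "study2": "Interview transcripts on email use in the workplace",
}
index = build_index(docs)
```

With this index, search(index, "online community") finds only study1,
even though no one catalogued either study. Metadata would still help,
for instance to restrict a query to a given author or year.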

Lee Giles


air-l-request at aoir.org wrote:

>
>
> Today's Topics:
>
>    1. Re: AoIR communal data-database (Charlie Hendricksen)
>    2. Re: AoIR communal data-database (jeremy hunsinger)
>    3. Re: AoIR communal data-database (Charlie Hendricksen)
>    4. Re: AoIR communal data-database (Charlie Hendricksen)
>
> Message: 1
> Date: Sun, 18 Nov 2001 13:05:23 -0700
> From: Charlie Hendricksen <veritas at u.washington.edu>
> Organization: Department of Geography, University of Washington
> To: air-l at aoir.org
> Subject: Re: [Air-l] AoIR communal data-database
> Reply-To: air-l at aoir.org
>
> Yes, the "codebook" for the study should have all the metadata
> necessary.  But are the codebooks searchable?  If the repository is of
> any size at all, then it needs to be searchable.  Would you like to
> read all the codebooks in order to see if there was any data you could
> use?  If the codebooks are disassembled and placed in a database that
> allows searching then the repository is very useful.  My guess is that
> codebooks are idiosyncratic and of wildly varying quality.  This means
> that the metadata would be incomplete in many cases.
>
> This raises the issue of what the metadata should include.
>
> jeremy hunsinger wrote:
> >
> > I'm not sure what level of metadata you are talking about here...
> > collecting a description of the study, the authors, the coding, etc.
> > should all be public in the codebook for the study... if it is not then
> > the study probably wouldn't be useful to others in any case.  perhaps I
> > am on the wrong track here?
> >
> > >
> > >     The question of metadata raises a difficult barrier to building
> > > the proposed repository.  Data is pretty much useless without
> > > metadata.  The amount of work required to obtain useful metadata is
> > > likely to exceed what a volunteer effort can support.
> >
> > > jeremy hunsinger
> > on the ibook
> > www.cddc.vt.edu
> > www.cddc.vt.edu/jeremy
> >
> > _______________________________________________
> > Air-l mailing list
> > Air-l at aoir.org
> > http://www.aoir.org/mailman/listinfo/air-l
>
> --
>             Charlie Hendricksen   veritas at u.washington.edu
>
>             "Information technology structures human relationships."
>                             "Models relate concepts."
>
> --__--__--
>
> Message: 2
> Date: Sun, 18 Nov 2001 15:18:41 -0500
> Subject: Re: [Air-l] AoIR communal data-database
> From: jeremy hunsinger <jhuns at vt.edu>
> To: air-l at aoir.org
> Reply-To: air-l at aoir.org
>
> yes, but I don't think one needs to have metadata at the level of
> variables.  People who want that data should see whether the study
> might have that information, download the study, and look for it
> themselves.  I think one needs to have it at the level of the study.
> I'm assuming that this will all be in a database eventually, so
> categories such as the openarchives.org metadata set would be best: it
> is a standard, it describes unique objects like a study, etc.
>
> The lack of exact and complete metadata has not hindered the development
> of such projects in the past.  I guess in the end it is always a balance
> between the practical and the ideal situations.
>
> On Sunday, November 18, 2001, at 03:05 PM, Charlie Hendricksen wrote:
>
> > Yes, the "codebook" for the study should have all the metadata
> > necessary.  But are the codebooks searchable?  If the repository is of
> > any size at all, then it needs to be searchable.  Would you like to
> > read all the codebooks in order to see if there was any data you could
> > use?  If the codebooks are disassembled and placed in a database that
> > allows searching then the repository is very useful.  My guess is that
> > codebooks are idiosyncratic and of wildly varying quality.  This means
> > that the metadata would be incomplete in many cases.
> >
> > This raises the issue of what the metadata should include.
> jeremy hunsinger
> on the ibook
> www.cddc.vt.edu
> www.cddc.vt.edu/jeremy
>
> --__--__--
>
> Message: 3
> Date: Sun, 18 Nov 2001 13:24:14 -0700
> From: Charlie Hendricksen <veritas at u.washington.edu>
> Organization: Department of Geography, University of Washington
> To: air-l at aoir.org
> Subject: Re: [Air-l] AoIR communal data-database
> Reply-To: air-l at aoir.org
>
> I have replied to Nancy's response below inline.
>
> Nancy Baym wrote:
> >
> > >Friends,
> > >
> > >     In point 4 of Nancy's proposal for a data repository there is this
> > >statement: "Our intention is that access to such private resources
> > >contributed by aoir members
> > >would be limited to aoir members."  I see no reasonable justification
> > >for restricting access and would not participate in the venture if
> > >such restrictions are adopted.
> >
> > My assumption was that people would prefer to limit the access to
> > their data, otherwise it would fall under that first category of data
> > already available on the web. Personally, if I were going to make
> > data I'd collected available, I'd like to know that there was a
> > limited set of people who would have access to that, and that I could
> > get that list on the member website. However, the level of access is
> > certainly open for discussion and I'd be inclined to defer to the
> > will of the people who were willing to share their data through a
> > resource like this. If they want it available to all, then that's
> > fine.
> >
>
> Yes! The reuse of data for other purposes, such as meta-analysis, should
> usually be contingent on the agreement of the original author.  I
> think that the data repository would be most useful as a catalog of
> available datasets.  If the metadata pointed to a dataset that might
> be of use, then the new user could contact the holder of the data and
> arrange for use.  It might be useful (but perhaps embarrassing) to
> allow annotation of the metadata.
>
> > The issue of how much of what aoir does under its auspices should be
> > available to all and how much should be available only to members is
> > a tricky one and there are arguments on both sides. It's a matter of
> > ongoing discussion with every idea we come up with. Speaking only for
> > myself, my train of logic goes like this --> do we distinguish
> > between members and nonmembers? if we don't what does membership
> > mean? if membership doesn't mean anything then why join? if no one
> > joins there's no budget, eventually no conferences, eventually no
> > association. While I believe that aoir should not be an exclusive
> > little clique, I do think it's important to provide benefits for
> > members that are better than the benefits of not being a member. It's
> > not like membership is hard to come by.
> >
>
> If the repository is principally a repository of metadata for
> available data, I see no reason why AoIR would lose anything by making
> it public.  You might even attract new members.
>
> > Regarding metadata, I concur with Jeremy. If we're talking about data
> > that are incomprehensible without being in on the research program or
> > that needs a lot of sophisticated metastuff that's more than a
> > codebook and explanation could provide, then it's probably not
> > appropriate for this. On the other hand, there is a lot of data
> > available already on the web that's being used just like this (e.g.
> > Pew's data).
> >
>
> A metadata repository need only be as sophisticated as is needed to
> eliminate hopefully optimistic requests, and to attract requests for
> data that is likely to be useful.  In other words it needs to
> eliminate obviously false positives and obviously false negatives.
> Having been involved in a metadata cataloging process, I appreciate
> the existence of overly sophisticated "metastuff" (beautiful term!).
>
> > Regarding whether this is too big to be sustained by volunteers,
> > maybe a volunteer effort can't sustain this. If this is not something
> > people would find adequately valuable to participate in, then it
> > won't work. On the other hand, all of AoIR thus far would seem to be
> > a lot more than a volunteer effort could sustain, and it seems to be
> > working pretty well because people have cared enough to volunteer
> > their energies.
>
> Well, I hope that the authors of data can find the time to submit
> metadata.  The existence of a well designed metadata core, along with
> tools to submit that metadata, and review it before publication, is
> important.  Similarly, tools that allow exploration of the metadata
> database are essential to the dissemination of that product.  I have
> some tools and an unpublished paper that may be useful.
>
> >
> > Nancy
> >
> > _________________________________________________________
> > Nancy Baym
> > nbaym at ku.edu
> > http://www.ku.edu/home/nbaym
> > Communication Studies, University of Kansas
> > 102 Bailey, 1440 Jayhawk Blvd., Lawrence, KS 66045, USA
> > VP, Association of Internet Researchers: http://aoir.org
> >
>
> --
>             Charlie Hendricksen   veritas at u.washington.edu
>
>             "Information technology structures human relationships."
>                             "Models relate concepts."
>
> --__--__--
>
> Message: 4
> Date: Sun, 18 Nov 2001 13:35:03 -0700
> From: Charlie Hendricksen <veritas at u.washington.edu>
> Organization: Department of Geography, University of Washington
> To: air-l at aoir.org
> Subject: Re: [Air-l] AoIR communal data-database
> Reply-To: air-l at aoir.org
>
> OK, there is a proposal: the metadata should be based on the
> openarchives.org database.  Now, in total ignorance of that metadata
> set, let me say that that metadata set might be so extensive that the
> data providers would be discouraged from submitting their metadata.
> There is an exquisite balance between the work required to input
> metadata and the rewards for participating in such a project.  In my
> experience that balance goes to simplicity of an order that makes the
> metadata marginally useful.  I would argue for a custom metadata set
> -- that argument based on experience with the hopelessly complex
> metadata standards of the Federal Geographic Data Committee.
>
> jeremy hunsinger wrote:
> >
> > yes, but I don't think one needs to have metadata at the level of
> > variables.  People who want that data should see whether the study
> > might have that information, download the study, and look for it
> > themselves.  I think one needs to have it at the level of the study.
> > I'm assuming that this will all be in a database eventually, so
> > categories such as the openarchives.org metadata set would be best: it
> > is a standard, it describes unique objects like a study, etc.
> >
> > The lack of exact and complete metadata has not hindered the development
> > of such projects in the past.  I guess in the end it is always a balance
> > between the practical and the ideal situations.
> >
> > On Sunday, November 18, 2001, at 03:05 PM, Charlie Hendricksen wrote:
> >
> > > Yes, the "codebook" for the study should have all the metadata
> > > necessary.  But are the codebooks searchable?  If the repository is of
> > > any size at all, then it needs to be searchable.  Would you like to
> > > read all the codebooks in order to see if there was any data you could
> > > use?  If the codebooks are disassembled and placed in a database that
> > > allows searching then the repository is very useful.  My guess is that
> > > codebooks are idiosyncratic and of wildly varying quality.  This means
> > > that the metadata would be incomplete in many cases.
> > >
> > > This raises the issue of what the metadata should include.
> > jeremy hunsinger
> > on the ibook
> > www.cddc.vt.edu
> > www.cddc.vt.edu/jeremy
> >
>
> --
>             Charlie Hendricksen   veritas at u.washington.edu
>
>             "Information technology structures human relationships."
>                             "Models relate concepts."
>
> -

--
C. Lee Giles, David Reese Professor
School of Information Sciences and Technology
and Computer Science and Engineering
The Pennsylvania State University
001 Thomas Bldg,
University Park, PA, 16801, USA
814 865 7884; FAX: 6426
http://ist.psu.edu/giles





