[Air-L] do we need an aoir data archive

Robert Ackland robert.ackland at anu.edu.au
Sat May 7 22:12:36 PDT 2011


Hi Eric,

Thank you to both you and Ralph Schroeder for this interesting report.
My view on this (which I've expressed to both you and Ralph
previously) is that the major obstacle to research using web archive
data is the lack of APIs.

The live web is more actively researched than the archived web
because tool developers can access live web data either by crawling
websites directly or through APIs into Google/Yahoo, Twitter,
Facebook etc. (although these APIs are not always what we would hope
for; see the recent discussion on this list regarding the Twitter
API).  I've been wanting to connect the VOSON software to sources of
historical hyperlink data (e.g. Internet Archive) since around 2005
but, as far as I know, there is no publicly available API yet.

Through your report I've learned about the EU's Longitudinal Analysis 
of Web Archive Data project (http://www.lawa-project.eu) and so I'm 
hoping they may be developing APIs into historical web collections 
that other tool developers can use for the construction of 
longitudinal hyperlink networks (and indeed, even hyperlink event 
stream datasets).

Archives (of all types) are under pressure to add value to their data
and there seems to be a perception that this is best done by in-house
development of tools that sit on top of collections, with preferential
access to the data.  However, I feel that in many situations, APIs
that can be used by third-party developers are a much better way of
stimulating innovative research.

Rob

-------------------------------------
Dr Robert Ackland
Fellow and Masters Coordinator, Australian Demographic and Social 
Research Institute, The Australian National University

homepage: http://adsri.anu.edu.au/people/robert.php
project:  http://voson.anu.edu.au

Information about the Master of Social Research
(Social Science of the Internet specialisation):
http://adsri.anu.edu.au/study/msr.php
-------------------------------------


Eric Meyer wrote:
> While Jeremy is bringing this issue up, let me get in a bid to generate some discussion on the topic of building archives of web content.  These may be slightly different than the structured data archives Jeremy is referring to, but potentially offer future researchers access to important content for Internet research.
> 
> At OII, we have written a draft report that we will be delivering at the IIPC conference next week, and we are hoping that communities such as AoIR will scan the report and contribute your thoughts. This report has been commissioned by the IIPC, so your ideas have a pretty good chance of finding the right audience.  Please let us know if there is data you think that archives should be collecting, storing, and making available for research, and research questions they should be considering as they build archives, or partner with organizations like AoIR to make archives.
> 
> The draft report is available at: http://ssrn.com/abstract=1830025
> 
> Comments prior to 04 June 2011 will be taken into account when we write the final report.  You can respond directly to me (so as not to bog down this interesting discussion), and we will post the final report back to the list at the end of June.
> 
> Thanks,
> Eric
> 
> Eric T. Meyer
> Research Fellow, Oxford Internet Institute
> University of Oxford
> eric.meyer at oii.ox.ac.uk
> http://people.oii.ox.ac.uk/meyer
> 
> 
> -----Original Message-----
> From: air-l-bounces at listserv.aoir.org [mailto:air-l-bounces at listserv.aoir.org] On Behalf Of jeremy hunsinger
> Sent: 06 May 2011 04:53
> To: jeremy hunsinger
> Cc: aoir list
> Subject: Re: [Air-L] do we need an aoir data archive
> 
> I mean to say... 'i do not think you can peer review data based research without access to the data'  not  'As I've said elsewhere, i don't actually think you can do peer reviewable research without providing access to the data that generated that research'  bleh, i really should stop multitasking when writing email.  
> 
>> I brought this up on twitter yesterday.   Perhaps it is time for AoIR to start an archive of data for current and future use.  My thought is that right now many researchers have access and rights to share significant bits of data, and many do not have access.  As I've said elsewhere, i don't actually think you can do peer reviewable research without providing access to the data that generated that research, and given that I have reviewed several papers based on proprietary information they could not release..., I'm thinking that we need to find ways of getting our data out there in the spirit of community and to promote scholarly quality.     I also think having a neutral nonprofit holding the information would make it far easier for people to get information shared from corporations.  
>>
>> However, given that I see the need and others have agreed here and there, I think it is time to have a discussion.  Personally, I know that I could set the infrastructure up without issue.   The website in theory has close to unlimited storage space, though downloads would need to be limited as we have limited processor speed.   There are also legal issues, copyright issues, and codebook issues that would need to be sorted out.  
>>
>> so do we need something like this?  and if so or if not, why or why not?  
>>
>>
>> Jeremy Hunsinger
>> Center for Digital Discourse and Culture
>> Virginia Tech
>>
>>
>> Words are things; and a small drop of ink, falling like dew upon a thought, produces that which makes thousands, perhaps millions, think. --Byron
>>
>>
>>
>>
>> _______________________________________________
>> The Air-L at listserv.aoir.org mailing list
>> is provided by the Association of Internet Researchers http://aoir.org
>> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>
>> Join the Association of Internet Researchers:
>> http://www.aoir.org/
> 
> Jeremy Hunsinger
> Center for Digital Discourse and Culture
> Virginia Tech
> 
> 
> 
> Imagination is the one weapon in the war against reality.
> -Jules de Gaultier
> 
> () ascii ribbon campaign - against html mail
> /\ - against microsoft attachments
> 
> 
> 
> 
> 


