[Air-L] Using the Archive.org for data capture?

Heinz, Lisa ls144009 at ohio.edu
Tue Apr 21 03:24:13 PDT 2015


Hello!
I can't help but sprinkle a little nostalgia over this topic. Forgive me for the length of this post. I have developed and designed websites since 1995, and just in the last year have moved into studying the Internet, with a particular interest in pre-Web communities and the tech that drove them compared to today's Web. I joined this list last fall on the suggestion of one of my professors. It has been a wonderful source of information and frequently provides me with a walk down memory lane. Thank you.

One of the drawbacks of the IA/WM is that it is not complete by any means, especially if you seek a historical record of the earliest websites. My very first website, launched in the summer of 1995, does not exist anywhere on the web, and my copies of it were lost with the long-gone storage medium of Zip drives. The IA/WM did not start collecting sites until later that year.

My second website launched in the spring of 1996, and that one is on the WM (starlightbridals.com, if you want to look). (Please ignore anything after 2001, when I closed my business and dropped my domain. I spent 10 years working to get it back while it was used as a porn hub.) Of course, not all pages were archived during the life of the site, but that's OK; my home page is there. I've set up dozens of websites since then, and most of them are archived. A few never made it because they were sites for short-lived community organizations or events, and the WM wasn't collecting sites with the speed it does now.

What the IA/WM provides me, personally, is a validation of my contribution to the development of the Web. I am very happy to see folks studying this historical record, and I've started to collect a reading list based on your recommendations. Thank you, again! I look forward to participating in this group in the years to come.

~~Lisa

~~~~~~~~~~~~~~~~~~~
Lisa M. Heinz
Masters Student, Media Arts & Studies
Ohio University
http://Twitter.com/livingrural
http://LinkedIn.com/in/lisaheinz


________________________________________
From: Air-L <air-l-bounces at listserv.aoir.org> on behalf of Jefferson Bailey <jefferson at archive.org>
Sent: Monday, April 20, 2015 7:29 PM
To: matthew.weber at rutgers.edu
Cc: air-l at listserv.aoir.org
Subject: Re: [Air-L] Using the Archive.org for data capture?

Hi all,

I am also happy to discuss more off list. We have worked with Matt and many others on providing access to web archive data for researchers interested in studying/mining the historical web. There are a number of initiatives within Internet Archive to augment research services and access models and I'll document them on this list as they ramp up. We're definitely excited to see more researchers interested in web archives.

There are some technical and methodological differences for web archives as far as issues related to provenance, validity, format, completeness, and so on, though many of these concerns are no different than those encountered using traditional (analog) archival materials; they often just *feel* more immediate or loaded given our unique relationship with the web and its affordances and contingencies as a quote-unquote documentary record.

Cheers,
Jefferson

Jefferson Bailey
Program Manager & Interim Co-Director
Web Archiving Services & Programs
Internet Archive
jefferson at archive.org


On Apr 20, 2015, at 13:54, Matthew Weber <matthew.weber at rutgers.edu> wrote:

> Dan,
>
> Rogers’ digital methods work is a broad starting point, although I’m not sure that he’s specifically addressed issues with the Internet Archive.
>
> I’ve been working on research derived from the Internet Archive for almost a decade now, mostly at a large scale, although some projects are smaller in nature. One starting point might be this paper, http://dl.acm.org/citation.cfm?id=2579213, and I have some other published work using derived datasets.
>
> With regard to your question about validity, it depends in part on what you’re looking to explore. If you’re using smaller datasets, validity won’t be too much of an issue, but once you scale beyond a few dozen domains (and again, depending on your analysis and RQs) there are validity issues that must be addressed. We’ve started to outline these in a few related papers that are under review, but mostly they pertain to sampling error and data completeness.
>
> Feel free to ping me offline - I can point you to GitHub code and other work, depending on your goals - and definitely check out the work of others. Kalev Leetaru is active on here and works in this space, as does Niels Brügger at Aarhus. There is a growing community of researchers doing Internet Archive-related research.
>
> Regards,
> Matt
>
>
>
>
>
>> On Apr 20, 2015, at 4:46 PM, Matthew T Mccarthy <mccart74 at uwm.edu> wrote:
>>
>> Apologies for the curt message. I hit send before finishing.
>> In addition to the citation for his book, here is a link to the Digital Methods Initiative wiki page:
>>
>> https://wiki.digitalmethods.net
>>
>> Best,
>> Matt
>>
>>
>> Matthew T. McCarthy
>> Ph.D. Student/Graduate Instructor
>> Department of Sociology
>> University of Wisconsin-Milwaukee
>> P.O. Box 413
>> Milwaukee, WI   53201
>>
>> ________________________________________
>> From: Air-L <air-l-bounces at listserv.aoir.org> on behalf of Matthew T Mccarthy <mccart74 at uwm.edu>
>> Sent: Monday, April 20, 2015 3:43 PM
>> To: Dan Fielding; Air-L at listserv.aoir.org
>> Subject: Re: [Air-L] Using the Archive.org for data capture?
>>
>> Dan,
>>
>> Richard Rogers of the Digital Methods Initiative has dealt with this.
>>
>>
>> Rogers, R. (2013). Digital Methods. MIT Press.
>>
>>
>> Matthew T. McCarthy
>> Ph.D. Student/Graduate Instructor
>> Department of Sociology
>> University of Wisconsin-Milwaukee
>> P.O. Box 413
>> Milwaukee, WI   53201
>>
>> ________________________________________
>> From: Air-L <air-l-bounces at listserv.aoir.org> on behalf of Dan Fielding <sociologyfornerds at gmail.com>
>> Sent: Monday, April 20, 2015 3:28 PM
>> To: Air-L at listserv.aoir.org
>> Subject: [Air-L] Using the Archive.org for data capture?
>>
>> Hello wonderful list,
>>
>> I am currently establishing a research protocol that will rely on the
>> Wayback Machine (archive.org) to gather caches of pages from 1-2 years ago.
>> Is there research on the Wayback Machine as an effective mode of data
>> capture? Are there any questions about its validity? Have you read
>> published work using the Wayback Machine? What concerns have other scholars
>> raised about using it?
>>
>> Thanks for your time! Have a great day,
>>
>> Dan Fielding
>> _______________________________________________
>> The Air-L at listserv.aoir.org mailing list
>> is provided by the Association of Internet Researchers http://aoir.org
>> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>
>> Join the Association of Internet Researchers:
>> http://www.aoir.org/
>


