[Air-L] archiving web pages?

Raphael Velt psxrv at nottingham.ac.uk
Wed Oct 15 03:37:34 PDT 2014


Hello,

If you're not afraid of code, I'd recommend a solution based on PhantomJS. It is basically a headless (UI-less) WebKit browser: it loads and renders web pages as a regular browser would, and you can then automate tasks or take screenshots of the page. According to StackOverflow questions, you can definitely script a scroll on the page. The project's website offers an example of turning multiple pages into PNG images.
http://phantomjs.org/
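
As a rough illustration of the idea, a PhantomJS script along the following lines could scroll down a page in steps and then save it as a PNG. This is only a sketch, not one of the project's own examples: the file name capture.js, the helper scrollSteps, the viewport size, and the half-second delay are all assumptions, and whether scrolling actually triggers a given site's lazy loading depends on the site.

```javascript
// Sketch of a PhantomJS script: open a page, scroll down in steps so that
// lazily loaded content has a chance to appear, then save a PNG.
// Hypothetical usage: phantomjs capture.js <url> <output.png>

// Helper (illustrative name): vertical offsets to scroll through,
// ending exactly at the bottom of the page.
function scrollSteps(totalHeight, stepSize) {
  var steps = [];
  for (var y = 0; y < totalHeight; y += stepSize) {
    steps.push(y);
  }
  steps.push(totalHeight);
  return steps;
}

// The browser part only runs inside PhantomJS, where `phantom` is defined.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var url = system.args[1];
  var output = system.args[2] || 'page.png';

  // Assumed viewport; adjust to taste.
  page.viewportSize = { width: 1280, height: 800 };

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Could not load ' + url);
      phantom.exit(1);
      return;
    }

    // Page height as measured right after loading.
    var height = page.evaluate(function () {
      return document.body.scrollHeight;
    });
    var steps = scrollSteps(height, 800);
    var i = 0;

    function next() {
      if (i < steps.length) {
        // Scroll inside the page, then wait before the next step.
        page.evaluate(function (y) {
          window.scrollTo(0, y);
        }, steps[i]);
        i += 1;
        window.setTimeout(next, 500); // assumed delay for scripts to react
      } else {
        page.render(output); // write the screenshot
        phantom.exit();
      }
    }
    next();
  });
}
```

The height is measured only once here, so a page that keeps growing as you scroll would need the measurement repeated inside the loop.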

If you need something simpler, there is a project built on it specifically for scraping pages, called pjscrape, but I haven't tried it.
http://nrabinowitz.github.io/pjscrape/

Regards,

Raphaël Velt

PhD Student - Industrial Case "Understanding Media Trajectories"
Mixed Reality Lab - University of Nottingham
In partnership with BBC Research & Development, MediaCityUK

-----Original Message-----
From: Air-L [mailto:air-l-bounces at listserv.aoir.org] On Behalf Of Bryan-Mitchell Young
Sent: 14 October 2014 17:56
To: air-l at listserv.aoir.org
Subject: [Air-L] archiving web pages?

I've been using a combination of HTTrack, the ScrapBook extension for Firefox, and print-to-PDF to archive web sites, but some of the more complex sites are difficult to save with these. Does anyone know of other, more powerful ways to archive modern web sites that use a lot of JavaScript or don't display the full page until you scroll to the bottom?
Thanks
Bryan-Mitchell Young