[Air-L] Tool for collecting Instagram images/websites/data?

Craig Hamilton Craig.Hamilton at bcu.ac.uk
Tue Sep 20 07:58:44 PDT 2016


Hi Todd, Rainer and all

Here’s a blog post on the Harkive site that (I hope) explains the process I described, whereby data from different social media sources can be automatically written to a single database. The post includes a short video walkthrough of the process.

http://harkive.org/datcolzap/
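Here is a minimal Python sketch of the normalisation step described further down the thread: mapping Twitter's text and Tumblr's body onto one column, unifying timestamp formats, and labelling each row with its source. It is not the Zapier workflow itself. The per-service field names other than text and body, the timestamp formats, and the sample records are all assumptions made for illustration. A CSV string stands in for the Google Docs spreadsheet.

```python
import csv
import io
from datetime import datetime, timezone

# Common column -> field name in each service's record.
# Only "text" (Twitter) and "body" (Tumblr) come from the thread; the other
# field names are ASSUMPTIONS for illustration, not verified API docs.
FIELD_MAP = {
    "twitter": {"text": "text", "username": "screen_name", "timestamp": "created_at"},
    "tumblr": {"text": "body", "username": "blog_name", "timestamp": "date"},
    "instagram": {"text": "caption", "username": "username", "timestamp": "created_time"},
}

COLUMNS = ["timestamp", "username", "text", "source"]


def parse_timestamp(source: str, raw: str) -> datetime:
    """Parse each service's (assumed) timestamp format into an aware datetime."""
    if source == "twitter":
        # Assumed format, e.g. "Mon Sep 19 09:10:00 +0000 2016"
        return datetime.strptime(raw, "%a %b %d %H:%M:%S %z %Y")
    if source == "tumblr":
        # Assumed format, e.g. "2016-09-19 09:10:00 GMT"
        naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S GMT")
        return naive.replace(tzinfo=timezone.utc)
    if source == "instagram":
        # Assumed format: Unix epoch seconds as a string, e.g. "1474276200"
        return datetime.fromtimestamp(int(raw), tz=timezone.utc)
    raise ValueError(f"unknown source: {source}")


def normalise(source: str, record: dict) -> dict:
    """Map one service-specific record onto the common columns."""
    fields = FIELD_MAP[source]
    ts = parse_timestamp(source, record[fields["timestamp"]])
    return {
        # One ISO 8601 UTC format for every service.
        "timestamp": ts.astimezone(timezone.utc).isoformat(),
        "username": record[fields["username"]],
        "text": record[fields["text"]],
        # Label the row with the service it came from.
        "source": source,
    }


def to_csv(rows) -> str:
    """Write the normalised rows as CSV (stand-in for the shared spreadsheet)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample records, one per service, all posted at the same moment.
SAMPLE = [
    ("twitter", {"text": "Morning radio #harkive", "screen_name": "user_a",
                 "created_at": "Mon Sep 19 09:10:00 +0000 2016"}),
    ("tumblr", {"body": "Vinyl day #harkive", "blog_name": "user_b",
                "date": "2016-09-19 09:10:00 GMT"}),
    ("instagram", {"caption": "Gig photo #harkive", "username": "user_c",
                   "created_time": "1474276200"}),
]

if __name__ == "__main__":
    print(to_csv([normalise(source, record) for source, record in SAMPLE]))
```

Running it prints a header row followed by three rows whose timestamp column reads 2016-09-19T09:10:00+00:00 for every service. Because the mapping lives in a dictionary, adding another service only requires one new entry and one timestamp parser.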

Hope you find this useful
Kind regards
Craig
> On 19 Sep 2016, at 13:02, Todd O'Neill <Todd.O'Neill at mtsu.edu> wrote:
> 
> Craig
> 
> Would you mind sharing your Zapier workflow with us?
> 
> Cheers!
> 
> Todd O'Neill
> Assistant Professor, New Media
> Electronic Media Communication Department
> College of Media and Entertainment
> Middle Tennessee State University
> todd.oneill at mtsu.edu
> LinkedIn: toddoneill  |  FaceBook: OneillTodd  |  Twitter: mtsunewmedia
> 
>> On Sep 19, 2016, at 2:10 AM, Craig Hamilton <Craig.Hamilton at bcu.ac.uk> wrote:
>> 
>> Hi Rainer,
>> 
>> When collecting data for my Harkive project (which collects on the #harkive hashtag across various platforms, including Instagram, Twitter, etc.), I recently started using Zapier, which I found solved a lot of time-consuming issues.
>> 
>> The main advantage for my purposes was that it significantly reduced the data cleaning and organising needed after collection. As I’m sure you are aware, each service's API uses slightly different terms for common elements of the data available. E.g., on Twitter the text content of a Tweet is contained within the <text> element, whereas on Tumblr the text written by a user is in the <body> element. Not only that, but date/time stamps often appear in different formats. What Zapier allowed me to do was collect from the various APIs, reformat to common data formats, and then write all data to specific columns in a central Google Docs spreadsheet, so that all commonly formatted date/time stamps appeared in a single column, all usernames in another, and so on, regardless of the service they originated from. I was also able to add a field to each row labelling the entry as originating from Twitter, Tumblr, Instagram, etc. The aim was to have my data automatically rendered as ‘tidy’, in the Hadley Wickham sense.
>> 
>> The disadvantage of Zapier is that there is a charge. I was collecting over a short period of time so I was able to keep this cost quite low. If you are collecting a lot of data over a sustained period of time, you may find it prohibitively expensive. 
>> 
>> Let me know if you would like to know more. I am happy to share my rough workflow notes with you, or can show you in Berlin during the conference.
>> 
>> Kind regards
>> Craig
>> 
>>> On 18 Sep 2016, at 10:24, Maurice Vergeer <m.vergeer at maw.ru.nl> wrote:
>>> 
>>> Maybe the R package instaR:
>>> https://github.com/pablobarbera/instaR/blob/master/examples.R
>>> For long-term applications you could learn web scraping with the R package
>>> rvest (there are more options for web scraping).
>>> The quick-and-dirty method is to search for a tag on Instagram in a browser,
>>> scroll down multiple times to load as many pictures as you need, then save
>>> the resulting Instagram page in a new, dedicated folder, which will then
>>> contain all pictures with that specific tag.
>>> 
>>> hope that helps
>>> 
>>> 
>>> 
>>> On Sat, Sep 17, 2016 at 5:27 PM, Rainer Hillrichs <hillrichs at uni-mannheim.de> wrote:
>>> 
>>>> Dear all,
>>>> 
>>>> I searched on the list and on the web but couldn't find anything: I'm
>>>> looking for a tool that collects Instagram images, websites, and data
>>>> associated with a specific tag. Basically, I want to type in a tag and end
>>>> up with a folder full of images, websites, and a table with data (e.g. user
>>>> name, date posted, URL, other tags). I already suspect that is a lot to ask
>>>> for ;-) Even a simpler tool would be a good start, as long as I don't end
>>>> up having to save individual images and websites and type/copy stuff
>>>> into a table.
>>>> 
>>>> Suggestions very much appreciated!
>>>> Rainer
>>>> 
>>>> 
>>>> --
>>>> Dr. Rainer Hillrichs
>>>> Universität Mannheim
>>>> https://uni-mannheim.academia.edu/RainerHillrichs
>>>> _______________________________________________
>>>> The Air-L at listserv.aoir.org mailing list
>>>> is provided by the Association of Internet Researchers http://aoir.org
>>>> Subscribe, change options or unsubscribe at: http://listserv.aoir.org/listinfo.cgi/air-l-aoir.org
>>>> 
>>>> Join the Association of Internet Researchers:
>>>> http://www.aoir.org/
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> ________________________________________________
>>> Maurice Vergeer
>>> To contact me, see http://mauricevergeer.nl/node/5
>>> To see my publications, see http://mauricevergeer.nl/node/1
>>> ________________________________________________
>> 
> 