[Air-L] Nvivo for facebook pages
berno.rieder at gmail.com
Wed Apr 19 10:21:30 PDT 2017
> b) we have noticed differences in the amount of information gathered by
> NCapture in comparison with Netvizz. There are chunks of periods in which
> posts have not been collected by NCapture (and we confirmed that the data
> is there), which has surprised us, because we had not previously had this
I fear that this may have less to do with NCapture or Netvizz per se than with the way APIs have come to function nowadays. Platforms like Facebook are under pressure to provide service as fast as possible with as much uptime as possible. Data completeness is simply not a concern. I imagine that Facebook has a tiered storage architecture that keeps some data readily available, while other elements are stored further down the hierarchy. Which elements are currently held in fast storage depends on the shard (storage unit) users are currently connecting to. Shards are possibly assigned on the basis of IP, app identifier (e.g. NCapture's app token), user identifier, main network affiliation, etc. When data is not in the fast storage you’re connected to, it may be omitted.
This is all just speculation, but as the developer of Netvizz, I have observed that sometimes, particularly at peak hours, posts can be missing for one user, while another user can get everything without any problems. Liking a page seems to have some effect on this. Netvizz tries to alleviate some of these problems through caching, but given the size of Facebook compared to our meager resources, this is a losing battle.
My recommendation would be to compare retrieved data with the actual pages at least in a cursory fashion and, if possible, to check by downloading data from two different user accounts.
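A minimal sketch of that two-account cross-check, assuming each download has been saved as a CSV with one row per post. The column name `post_id`, the helper names, and the tiny sample exports are hypothetical placeholders for illustration, not the actual NCapture or Netvizz output format, so they would need adapting to the real files.

```python
import csv
import io


def load_post_ids(csv_text, id_column="post_id"):
    """Return the set of post IDs found in one exported CSV.

    `id_column` is an assumed column name; adjust it to the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column] for row in reader if row.get(id_column)}


def compare_exports(ids_a, ids_b):
    """Report which posts each export lacks relative to the other."""
    return {
        "only_in_a": sorted(ids_a - ids_b),
        "only_in_b": sorted(ids_b - ids_a),
        "in_both": len(ids_a & ids_b),
    }


# Made-up sample data standing in for downloads from two user accounts.
export_a = "post_id,created_time\n1,2017-04-01\n2,2017-04-02\n4,2017-04-04\n"
export_b = "post_id,created_time\n1,2017-04-01\n3,2017-04-03\n4,2017-04-04\n"

report = compare_exports(load_post_ids(export_a), load_post_ids(export_b))
print(report)
```

Posts that appear in only one export are candidates for a manual look at the page itself. A post missing from both downloads will not show up here, which is why the cursory comparison against the actual page still matters.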
Ultimately, any tool that uses the public API is just a dumb data exporter that sits on Facebook's vast and weird data infrastructure - which we know precious little about.
Bernhard Rieder | Associate Professor | New Media and Digital Culture
University of Amsterdam | Turfdraagsterpad 9 | 1012 XT Amsterdam | The Netherlands
http://thepoliticsofsystems.net | http://rieder.polsys.net | https://www.digitalmethods.net | @RiederB