Bernhard Rieder berno.rieder at gmail.com
Wed Apr 19 10:21:30 PDT 2017

I fear that this may have little to do with NCapture or Netvizz per se, and more to do with the way APIs have come to function nowadays. Platforms like Facebook are under pressure to provide service as fast as possible with as much uptime as possible. Data completeness is simply not a concern. I imagine that Facebook has a tiered storage architecture that keeps some data readily available, while other elements are stored further down the hierarchy. Which elements are currently held in fast storage depends on the shard (storage unit) a user is currently connected to. Shards are possibly assigned on the basis of IP, app identifier (e.g. NCapture's app token), user identifier, main network affiliation, etc. When data is not in the fast storage you’re connected to, it may be omitted.

This is all just speculation, but as the developer of Netvizz, I have observed that sometimes, particularly at peak hours, posts can be missing for one user, while another user can get everything without any problems. Liking a page seems to have some effect on this. Netvizz tries to alleviate some of these problems through caching, but given the size of Facebook compared to our meager resources, this is a losing battle.

My recommendation would be to compare retrieved data with the actual pages at least in a cursory fashion and, if possible, to check by downloading data from two different user accounts.
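
As a rough illustration of the two-account check, here is a minimal Python sketch. It assumes you have already pulled the post identifiers out of each export into a plain list; the function name and the sample IDs are made up for the example and do not correspond to anything in Netvizz, NCapture, or the Facebook API.

```python
def compare_post_ids(ids_a, ids_b):
    """Compare the post IDs found in two exports of the same page.

    ids_a, ids_b: iterables of post ID strings, one per export.
    Returns a dict listing IDs seen only in one export and IDs seen in both.
    """
    set_a, set_b = set(ids_a), set(ids_b)
    return {
        "only_in_a": sorted(set_a - set_b),
        "only_in_b": sorted(set_b - set_a),
        "in_both": sorted(set_a & set_b),
    }


# Hypothetical post IDs, invented purely for illustration.
account_a = ["100_1", "100_2", "100_3", "100_5"]
account_b = ["100_1", "100_2", "100_4", "100_5"]

diff = compare_post_ids(account_a, account_b)
print("Missing from account B's export:", diff["only_in_a"])
print("Missing from account A's export:", diff["only_in_b"])
print("Present in both exports:", len(diff["in_both"]))
```

If either "only in" list is non-empty, the two accounts were served different data. Note that an empty difference only shows the two exports agree with each other, not that either is complete, so a cursory look at the actual page is still worthwhile.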

Ultimately, any tool that uses the public API is just a dumb data exporter that sits on Facebook's vast and weird data infrastructure - which we know precious little about.


