[Air-l] Re: Data
Bram Dov Abramson
bda at bazu.org
Thu May 20 15:51:40 PDT 2004
Justin,
It's quite a problem. GISes, and tools like UCINET or even
plain old spreadsheets, are great for manipulating the data --
once you have it. Some thoughts which may be of help:
> In my ideal world, I would be
> able to build a relational database of data traffic between
> the largest cities worldwide. The data I have found shows
> gross data traffic between nodes, which includes traffic
> originated in third-party cities and destined for fourth-party
> cities, for example, and which does not provide an estimate
> of the traffic originated in 3 and destined for 4. This means
> that the data doesn't relate every node in the city system
> to every other in terms of network traffic inbound and
> outbound.
The TeleGeography research group (now part of PriMetrica, Inc.)
has this kind of data for major cities. The non-provider-
specific datasets they create on city-to-city Internet
*bandwidth* display the characteristics you mention. But their
city-to-city Internet *traffic* data are end-to-end.
Elijah is right to point out that collating, tabulating, and
verifying this data is very difficult. That's probably why the
dataset exists only for major cities, and why TeleGeography
began the traffic work only after a few years on the Internet
bandwidth side.
> you have any thoughts on how currently available data can be
> patched for network analysis, or how such a relational
> database could be built in the future?
If you go to TeleGeography, you'll probably want to ask for
older data -- the most recent stuff is quite expensive.
If you prefer to construct your own, you'll probably want to do
something attainable. An interesting approach might be to
combine DNS lookups with Web link analysis. In other words,
something like:
1) come up with a list of cities you want to test for;
2) find some set of Web sites for each of those cities, for
example by looking up the registered addresses for the "top x"
Web sites according to Nielsen//NetRatings or ComScore or Alexa
or whoever else you judge least bad;
3) tabulate city-to-city links between these Web sites.
Based on the argument that these Web sites probably represent
some very high percentage of all Web usage (the ratings service
you went with would make some claim here), it seems to me you'd
have something usable.
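To make step 3 concrete, here is a minimal sketch in Python of
the tabulation, assuming steps 1 and 2 have already produced a
site-to-city mapping and a list of hyperlinks between sites. The
names site_city, links and city_link_matrix, and all of the
sample data, are made up for illustration; they do not come from
any ratings service or crawler.

```python
from collections import Counter

# Hypothetical output of step 2: Web site -> city of registration.
site_city = {
    "a.example": "London",
    "b.example": "New York",
    "c.example": "Tokyo",
    "d.example": "London",
}

# Hypothetical crawl result: (linking site, linked-to site) pairs.
links = [
    ("a.example", "b.example"),
    ("d.example", "b.example"),
    ("b.example", "c.example"),
    ("a.example", "d.example"),   # both in London
    ("c.example", "x.example"),   # x.example has no known city
]


def city_link_matrix(site_city, links):
    """Count directed links between cities.

    Links touching a site with no known city are skipped, as are
    links between two sites in the same city.
    """
    counts = Counter()
    for src, dst in links:
        src_city = site_city.get(src)
        dst_city = site_city.get(dst)
        if src_city is None or dst_city is None:
            continue
        if src_city == dst_city:
            continue
        counts[(src_city, dst_city)] += 1
    return counts


if __name__ == "__main__":
    for (src_city, dst_city), n in sorted(city_link_matrix(site_city, links).items()):
        print(f"{src_city} -> {dst_city}: {n}")
```

The result is a directed edge list with weights, which is the
shape UCINET or a spreadsheet would want. Whether to drop
same-city links, as this sketch does, is a judgment call.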
Now, that would obviously give you links between cities to which
Web sites are registered -- not between the cities in which Web
sites are hosted. If you thought the latter was more relevant,
you'd probably want to make step 2 a bit fancier, involving DNS
to IP to geolocation using one of many techniques.
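As a rough illustration of that fancier step 2, the sketch below
resolves a hostname to an IPv4 address with the standard library
and then looks the address up in a prefix-to-city table. The
table is entirely made up (it uses address ranges reserved for
documentation); a real run would need actual geolocation data
from whichever source you trust, and the function names are
hypothetical.

```python
import ipaddress
import socket

# Made-up prefix table. These ranges are reserved for documentation
# and stand in for real geolocation data, which is an assumption here.
PREFIX_CITY = [
    (ipaddress.ip_network("192.0.2.0/24"), "London"),
    (ipaddress.ip_network("198.51.100.0/24"), "New York"),
    (ipaddress.ip_network("203.0.113.0/24"), "Tokyo"),
]


def city_for_ip(ip_text, table=PREFIX_CITY):
    """Return the city whose prefix contains ip_text, or None."""
    ip = ipaddress.ip_address(ip_text)
    for network, city in table:
        if ip in network:
            return city
    return None


def hosting_city(hostname, resolve=socket.gethostbyname, table=PREFIX_CITY):
    """DNS -> IP -> city. Returns None if either lookup fails.

    resolve is injectable so the function can be exercised without
    touching the network.
    """
    try:
        ip_text = resolve(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None
    return city_for_ip(ip_text, table)
```

A single DNS lookup from one vantage point will miss sites served
from several locations, which is part of the tweaking mentioned
below.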
As to whether or not doing all this is less costly time-wise
than paying for what someone else has used their time to do ...
that's another story. It's a not-insignificant but interesting
programming challenge, anyway; the devil would be in the
tweaking.
A final thought: some (particularly George Barnett's group at
SUNY Buffalo) have done a fair bit of work with country-to-
country telephone traffic in this vein. Because international
PSTN traffic has been collected for a much longer time, that
kind of data (again, ITU or TeleGeography) is much more -- and
much more cheaply -- obtainable.
cheers
Bram