[Air-l] Re: Data

Bram Dov Abramson bda at bazu.org
Thu May 20 15:51:40 PDT 2004


Justin,

It's quite a problem.  GISes, and tools like UCINET or even 
plain old spreadsheets, are great for manipulating the data -- 
once you have it.  Some thoughts which may be of help:

> In my ideal world, I would be 
> able to build a relational database of data traffic between
> the largest cities worldwide.  The data I have found shows
> gross data traffic between nodes, which includes traffic
> originated in third-party cities and destined for fourth-party
> cities, for example, and which does not provide an estimate 
> of the traffic originated in 3 and destined for 4. This means
> that the data doesn't relate every node in the city system 
> to every other in terms of network traffic inbound and
> outbound.

The TeleGeography research group (now part of PriMetrica, Inc.) 
has this kind of data for major cities.  The non-provider-specific 
datasets they create on city-to-city Internet *bandwidth* display 
the characteristics you mention.  But their city-to-city Internet 
*traffic* data are end-to-end.
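
To make the gross vs. end-to-end distinction concrete, here is a toy 
sketch.  The cities, volumes and routes are all made up for 
illustration; nothing here comes from TeleGeography's data.  It just 
shows how an end-to-end flow that transits two other cities inflates 
the gross figure on the link between them.

```python
from collections import defaultdict

# Hypothetical end-to-end (origin, destination) traffic, arbitrary units.
od_traffic = {
    ("Lagos", "Tokyo"): 5,
    ("London", "New York"): 10,
}

# Hypothetical routes: Lagos-Tokyo traffic transits London and New York.
routes = {
    ("Lagos", "Tokyo"): ["Lagos", "London", "New York", "Tokyo"],
    ("London", "New York"): ["London", "New York"],
}

def link_loads(od, routes):
    """Add each end-to-end flow onto every hop of its route."""
    loads = defaultdict(int)
    for pair, volume in od.items():
        path = routes[pair]
        for a, b in zip(path, path[1:]):
            loads[(a, b)] += volume
    return dict(loads)

loads = link_loads(od_traffic, routes)
print(loads[("London", "New York")])  # 15: 10 end-to-end plus 5 in transit
```

Going the other way, from the 15 back to the 10 and the 5, is not 
possible without knowing the routes, which is the problem you 
describe.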

Elijah is right to point out that collating, tabulating, and 
verifying this data is very difficult.  That's probably why the 
dataset exists only for major cities, and why TeleGeography 
began the traffic work only after a few years on the Internet 
bandwidth side.

> you have any thoughts on how currently available data can be
> patched for network analysis, or how such a relational
> database could be built in the future?

If you go to TeleGeography, you'll probably want to ask for 
older data -- the most recent stuff is quite expensive.

If you prefer to construct your own, you'll probably want to do 
something attainable.  An interesting approach might be to 
combine DNS lookups with Web link analysis.  In other words, 
something like:

1) come up with a list of cities you want to test for;

2) find some set of Web sites for each of those cities, for 
example by looking up the registered addresses for the "top x" 
Web sites according to Nielsen//NetRatings or ComScore or Alexa 
or whoever else you judge least bad;

3) tabulate city-to-city links between these Web sites.

Based on the argument that these Web sites probably represent 
some very high percentage of all Web usage (the ratings service 
you went with would make some claim here), it seems to me you'd 
have something usable.
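
The three steps above can be sketched roughly as follows.  This is 
only an illustration under stated assumptions: the site-to-city 
mapping and the list of links are hypothetical stand-ins for what a 
ratings list, a registration lookup and a crawler would give you, 
and the function names are my own.

```python
from collections import Counter
from urllib.parse import urlparse

# Steps 1 and 2: hypothetical mapping from site hostname to the city
# where the site is registered.
site_city = {
    "news.example.com": "New York",
    "shop.example.org": "London",
    "portal.example.net": "Tokyo",
}

# Hypothetical crawl output: (page the link is on, URL it points to).
links = [
    ("http://news.example.com/a", "http://shop.example.org/x"),
    ("http://news.example.com/b", "http://portal.example.net/"),
    ("http://shop.example.org/y", "http://news.example.com/"),
    ("http://news.example.com/c", "http://shop.example.org/z"),
    ("http://news.example.com/d", "http://unlisted.example/"),
]

def city_of(url, site_city):
    """Return the city for a URL's hostname, or None if not listed."""
    return site_city.get(urlparse(url).hostname)

def tabulate(links, site_city):
    """Step 3: count directed links between distinct known cities."""
    counts = Counter()
    for src, dst in links:
        a, b = city_of(src, site_city), city_of(dst, site_city)
        if a and b and a != b:
            counts[(a, b)] += 1
    return counts

counts = tabulate(links, site_city)
print(counts[("New York", "London")])  # 2
```

The resulting table of (from-city, to-city) counts is the kind of 
thing UCINET or a spreadsheet can take from there.  Links to sites 
outside the list are simply dropped here, which is one of the 
choices you'd want to revisit.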

Now, that would obviously give you links between cities to which 
Web sites are registered -- not between the cities in which Web 
sites are hosted.  If you think the latter is more relevant, 
you'd probably want to make step 2 a bit fancier, involving DNS 
to IP to geolocation using one of many techniques.
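
A minimal sketch of the DNS to IP to geolocation chain might look 
like this.  The DNS step uses Python's standard socket.gethostbyname; 
the geolocation step is an assumption on my part, represented by a 
plain dictionary (ip_city) standing in for whichever geolocation 
database or technique you pick.  The example swaps in a fake 
resolver and made-up addresses so that it runs without a network.

```python
import socket
from urllib.parse import urlparse

def host_ip(url, resolve=socket.gethostbyname):
    """Return the IPv4 address for a URL's hostname, or None on failure."""
    host = urlparse(url).hostname
    if host is None:
        return None
    try:
        return resolve(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None

def hosting_city(url, ip_city, resolve=socket.gethostbyname):
    """Map a URL to a hosting city via DNS and an IP-to-city table."""
    return ip_city.get(host_ip(url, resolve))

# Hypothetical data: documentation-range addresses and made-up cities.
fake_dns = {"news.example.com": "192.0.2.1"}
ip_city = {"192.0.2.1": "Amsterdam"}

def fake_resolve(host):
    """Stand-in resolver so the example does not touch the network."""
    if host not in fake_dns:
        raise OSError("unknown host")
    return fake_dns[host]

print(hosting_city("http://news.example.com/", ip_city, fake_resolve))
# Amsterdam
```

Bear in mind that one hostname can resolve to several addresses in 
several places, so a single lookup like this is only a first 
approximation.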

As to whether or not doing all this is less costly time-wise 
than paying for what someone else has used their time to do ... 
that's another story.  It's a not-insignificant but interesting 
programming challenge, anyway; the devil would be in the 
tweaking.

A final thought: some (particularly George Barnett's group at 
SUNY Buffalo) have done a fair bit of work with country-to-
country telephone traffic in this vein.  Because international 
PSTN traffic has been collected for a much longer time, that 
kind of data (again, ITU or TeleGeography) is much more -- and 
much more cheaply -- obtainable.

cheers
Bram



