[Air-l] Re: Data

Justin Rosenthal jrr at uchicago.edu
Thu May 20 16:17:41 PDT 2004


Elijah,

As I get deeper into this problem, I am slowly coming to terms with the fact 
that the data I am after could only be generated at great expense.  
Primetrica comes the closest, but as I say, their datasets don't provide all 
the data points necessary for network analysis, i.e. a square matrix of the form

  ABCD
A -yyy
B y-yy
C yy-y
D yyy-

where y is a measure of data flow from the city in row i to the city in column j.
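
To make the gap concrete, here is a minimal sketch in Python of that matrix and of the cells that go unfilled. The city labels and flow values are made-up placeholders, not real measurements, and the helper names (to_square_matrix, missing_pairs) are hypothetical.

```python
# Hypothetical city labels and flow values, for illustration only.
cities = ["A", "B", "C", "D"]
flows = {("A", "B"): 5.0, ("B", "A"): 3.0, ("C", "D"): 2.5}


def to_square_matrix(cities, flows):
    """Return rows where cell [i][j] is the flow from cities[i] to cities[j].

    Diagonal cells and unobserved pairs are both stored as None.
    """
    matrix = []
    for src in cities:
        row = []
        for dst in cities:
            if src == dst:
                row.append(None)  # the '-' on the diagonal
            else:
                row.append(flows.get((src, dst)))  # None if no data for the pair
        matrix.append(row)
    return matrix


def missing_pairs(cities, matrix):
    """List the off-diagonal (origin, destination) pairs with no measurement."""
    return [
        (src, dst)
        for i, src in enumerate(cities)
        for j, dst in enumerate(cities)
        if i != j and matrix[i][j] is None
    ]


matrix = to_square_matrix(cities, flows)
gaps = missing_pairs(cities, matrix)
```

With n cities there are n * (n - 1) off-diagonal cells to fill, and every pair left in gaps is an origin-destination flow that the available data does not report.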

Too bad, because data like this could really open up some interesting problems.

Thanks for your observations,

Justin

Message: 15 
Date: Thu, 20 May 2004 10:29:28 -0500 (CDT) 
From: elijah wright <elw at stderr.org> 
To: air-l at aoir.org 
Subject: Re: [Air-l] Data 
Reply-To: air-l at aoir.org 


> I am interested in hearing any thoughts you have on a data problem that 
> I have, that I am sure many of you have approached, and which is, of 
> course, a result of the structure of the Internet itself.  In my ideal 
> world, I would be able to build a relational database of data traffic 
> between the largest cities worldwide. 

Social problem the first - the information you'd most like to have is 
closely guarded by the involved companies.  They keep it secret so that 
other companies can't deduce all of their peering agreements and thereby 
figure out how best to 'take advantage' of network position for profit. 

This is a pretty common problem for a decentralized network, in my 
experience. 

> The data I have found shows gross data traffic between nodes, which 
> includes traffic originated in third-party cities and destined for 
> fourth-party cities, for example, and which does not provide an estimate 
> of the traffic originated in 3 and destined for 4.  This means that the 
> data doesn't relate every node in the city system to every other in 
> terms of network traffic inbound and outbound. 

right - the nodes which are most easily measured/evaluated (the network 
hubs) don't actually act as termination points for a whole lot of traffic. 
they're just points in the system as a whole, with peers that serve 
endpoints but are not backbone nodes themselves. 

> Have you approached this problem?  Do you have any thoughts on how 
> currently available data can be patched for network analysis, or how 
> such a relational database could be built in the future? 

a graph-like structure is good for this, IMHO.  something like this: 

sourcenode        destnode        measurement        eval.date 
sourcenode        destnode        measurement        eval.date 
sourcenode        destnode        measurement        eval.date 

ad nauseam.  you may need some more values, depending on what it is that 
you're wanting to do.  but that general form (spreadsheet-like) is one of 
the simpler structures to store in a database, and reformatting those 
tables into something that tools like UCINet or Pajek can display is not 
such a terrible task. 
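
a rough sketch of what that could look like, in Python with the standard sqlite3 module.  the table name (flows), the column name evaldate, the helper names, and the sample rows are all assumptions made up for illustration.  the output follows the basic Pajek .net layout (a *Vertices section, then an *Arcs section with one weighted directed tie per line).

```python
import sqlite3


def build_db(rows):
    """Load (sourcenode, destnode, measurement, evaldate) tuples into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flows ("
        "sourcenode TEXT, destnode TEXT, measurement REAL, evaldate TEXT)"
    )
    conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?)", rows)
    return conn


def to_pajek(conn, evaldate):
    """Return the flows measured on one date as Pajek .net text."""
    rows = conn.execute(
        "SELECT sourcenode, destnode, measurement FROM flows "
        "WHERE evaldate = ? ORDER BY sourcenode, destnode",
        (evaldate,),
    ).fetchall()
    # Pajek refers to vertices by 1-based integer ids.
    nodes = sorted({name for src, dst, _ in rows for name in (src, dst)})
    index = {name: i for i, name in enumerate(nodes, start=1)}
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{i} "{name}"' for name, i in index.items()]
    lines.append("*Arcs")  # directed, weighted ties
    lines += [f"{index[src]} {index[dst]} {value}" for src, dst, value in rows]
    return "\n".join(lines) + "\n"


# Placeholder rows, not real traffic data.
sample = [
    ("A", "B", 5.0, "2004-05-20"),
    ("B", "A", 3.0, "2004-05-20"),
    ("A", "C", 1.5, "2004-05-19"),
]
net_text = to_pajek(build_db(sample), "2004-05-20")
```

keeping the date as its own column means each snapshot can be exported separately, which is handy if you want to compare the network over time.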

elijah 


_____________________________________
Justin Rosenthal
MA Candidate - Social Science
University of Chicago
jrr at uchicago.edu



