[Air-l] Re: Data
Justin Rosenthal
jrr at uchicago.edu
Thu May 20 16:17:41 PDT 2004
Elijah,
As I get deeper into this problem, I am slowly coming to terms with the fact
that the data I am after could only be generated at great expense.
Primetrica comes the closest, but as I say, their datasets don't provide all
the data points necessary for network analysis, i.e. a square matrix of the form
  A B C D
A - y y y
B y - y y
C y y - y
D y y y -
where each y is a measure of data flow from the city in row i to the city in column j.
Too bad, because data like this could really open up some interesting problems.
Thanks for your observations,
Justin
Message: 15
Date: Thu, 20 May 2004 10:29:28 -0500 (CDT)
From: elijah wright <elw at stderr.org>
To: air-l at aoir.org
Subject: Re: [Air-l] Data
Reply-To: air-l at aoir.org
> I am interested in hearing any thoughts you have on a data problem that
> I have, that I am sure many of you have approached, and which is, of
> course, a result of the structure of the Internet itself. In my ideal
> world, I would be able to build a relational database of data traffic
> between the largest cities worldwide.
Social problem the first - the information you'd most like to have is
closely guarded by the involved companies. They keep it secret so that
other companies can't deduce all of their peering agreements and thereby
figure out how best to 'take advantage' of network position for profit.
This is a pretty common problem for a decentralized network, in my
experience.
> The data I have found shows gross data traffic between nodes, which
> includes traffic originated in third-party cities and destined for
> fourth-party cities, for example, and which does not provide an estimate
> of the traffic originated in 3 and destined for 4. This means that the
> data doesn't relate every node in the city system to every other in
> terms of network traffic inbound and outbound.
right - the nodes which are most easily measured/evaluated (the network
hubs) don't actually act as termination points for a whole lot of traffic.
they're just points in the system as a whole, with peers that serve
endpoints but are not backbone nodes themselves.
> Have you approached this problem? Do you have any thoughts on how
> currently available data can be patched for network analysis, or how
> such a relational database could be built in the future?
a graph-like structure is good for this, IMHO. something like this:
sourcenode destnode measurement eval.date
sourcenode destnode measurement eval.date
sourcenode destnode measurement eval.date
ad nauseam. you may need some more values, depending on what it is that
you're wanting to do. but that general form (spreadsheet-like) is one of
the simpler structures to store in a database, and reformatting those
tables into something that tools like UCINet or Pajek can display is not
such a terrible task.
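
As a rough illustration of the reformatting step described above, here is a minimal Python sketch. It assumes rows shaped like the (sourcenode, destnode, measurement, eval.date) layout; the sample values, the function names edge_list_to_matrix and to_pajek, and the choice to sum repeated measurements are all hypothetical. Unmeasured pairs are left as None rather than 0 so that "no data" is not confused with "no traffic". The Pajek output follows the basic *Vertices / *Arcs layout of a .net file and may need adjusting for a particular tool version.

```python
# Hypothetical sample rows: (sourcenode, destnode, measurement, eval.date).
# The numbers are invented purely for illustration.
rows = [
    ("A", "B", 10.0, "2004-05-01"),
    ("B", "A", 7.5, "2004-05-01"),
    ("A", "C", 3.0, "2004-05-01"),
    ("C", "D", 1.25, "2004-05-01"),
]


def edge_list_to_matrix(rows):
    """Return (labels, matrix); matrix[i][j] is the flow from labels[i] to labels[j].

    Cells with no measurement stay None. The diagonal is never filled.
    Repeated (source, dest) rows are summed, which is an assumption.
    """
    labels = sorted({r[0] for r in rows} | {r[1] for r in rows})
    index = {name: i for i, name in enumerate(labels)}
    n = len(labels)
    matrix = [[None] * n for _ in range(n)]
    for source, dest, measurement, _date in rows:
        if source == dest:
            continue  # skip self-loops so the diagonal stays empty
        i, j = index[source], index[dest]
        matrix[i][j] = (matrix[i][j] or 0.0) + measurement
    return labels, matrix


def to_pajek(labels, matrix):
    """Render the measured cells as directed arcs in basic Pajek .net text."""
    lines = [f"*Vertices {len(labels)}"]
    for i, name in enumerate(labels, start=1):
        lines.append(f'{i} "{name}"')
    lines.append("*Arcs")
    for i, row in enumerate(matrix, start=1):
        for j, weight in enumerate(row, start=1):
            if weight is not None:
                lines.append(f"{i} {j} {weight}")
    return "\n".join(lines)


labels, matrix = edge_list_to_matrix(rows)
pajek_text = to_pajek(labels, matrix)
print(pajek_text)
```

Keeping the long (one row per measurement) form in the database and deriving the square matrix only at export time means new measurements or dates can be added without changing the table layout.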
elijah
_____________________________________
Justin Rosenthal
MA Candidate - Social Science
University of Chicago
jrr at uchicago.edu