There are quite a few tools an experimenter can use to investigate the behaviour of the IP network from an end user's point of view. The proper choice of tool can reveal the delays IP packets experience travelling across the network, characterize how capacity conditions change over time, or expose the topological structure of the network.
Monroe nodes are connected to the network via various access technologies and network providers (we call them operators). To take advantage of this, we designed experiments whose goal was to investigate whether the choice of operator leads to observable differences.
In the layer 3 measurement campaigns the traceroute application was used.
The traceroute program injects a sequence of IP packets with different TTL values into the network so as to capture hop-by-hop IP connectivity.
Besides capturing the addresses of the hops that drop a packet because its time to live has expired, the program also measures the round trip delay for each depth the probe sequence dictates.
Note, however, that the round trip delay is a composition of several factors, including the queuing delay, the propagation delay and the processing delay.
Since the goal is to see whether operators can be told apart, the edge between the operator's network and the Internet needs to be identified. Furthermore, a boundary can also be defined within the operator network, which we believe is tightly coupled with the structure of the access configuration. We call the former the gateway and the latter the NAT gateway.
To be able to make statistically significant statements, our goal was to collect a large sample. We achieved this by covering as many Monroe source nodes as were available at measurement time and by targeting a larger set of nodes.
According to the target nodes two measurement scenarios were defined:
PlanetLab is a network of computational resources shared across the Internet, operated mostly by universities and serving an academic user base.
In the popular sites case the top 1000 most frequently visited domains are used as targets. CITATION IS MISSING
In this study we collect our findings for the popular sites case.
Based on the IP address sequence that traceroute collects towards a given target, the NAT gateway is defined as the first IP address in the public domain that fulfills all of the following requirements:
The IP address of the preceding hop has to be a private address.
Measurements where the preceding hop does not disclose its IP address are dropped, because we cannot be sure whether that hop is still in the access LAN or is the NAT gateway itself.
Offline, after the measurement logs are collected, we resolve the autonomous system (AS) each IP address belongs to and keep only those measurements where the resolved AS is in line with the operator of choice.
After careful filtering of data we look at
The timestamp encoded in the filenames follows a specific pattern. Note: my laptop's clock is offset by an hour compared to the timestamps produced by Monroe nodes.
An IPv4 address is most often written in dotted decimal format; for data manipulation, however, we prefer to use 4-byte integers.
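The conversion between the two representations can be sketched with the standard library as follows. The helper names `ip_to_int` and `int_to_ip` are illustrative and not necessarily the ones used in this notebook.

```python
import ipaddress


def ip_to_int(dotted: str) -> int:
    """Convert a dotted decimal IPv4 address to its 4-byte integer value."""
    return int(ipaddress.IPv4Address(dotted))


def int_to_ip(value: int) -> str:
    """Convert a 4-byte integer back to dotted decimal notation."""
    return str(ipaddress.IPv4Address(value))
```

Integers make range checks and sorting cheap, for example `ip_to_int("10.0.0.1") < ip_to_int("10.0.1.0")`.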
We use a public API to resolve the location of public IP addresses. For most addresses AS information is present as well.
We use another API to retrieve AS information about a given IP address. The response often contains a whole IP range, or even several ranges, allocated to a given AS. To minimize the number of remote procedure calls, we extract this information too.
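A minimal sketch of how already known ranges could answer later queries without a remote call. It assumes the ranges arrive in CIDR notation, which may not match the actual API response; `cidr_to_bounds`, `find_as` and the AS label are hypothetical.

```python
import ipaddress


def cidr_to_bounds(cidr: str):
    """Return the first and last address of a CIDR block as integers."""
    net = ipaddress.ip_network(cidr, strict=False)
    return int(net.network_address), int(net.broadcast_address)


def find_as(ip: str, ranges):
    """Look up `ip` in locally stored (first, last, as_label) tuples.

    Returns the AS label of the first matching range, or None when the
    address is not covered and a remote lookup is still needed.
    """
    value = int(ipaddress.IPv4Address(ip))
    for first, last, label in ranges:
        if first <= value <= last:
            return label
    return None


# Illustrative data only: a documentation prefix and a private-use AS number.
known = [(*cidr_to_bounds("203.0.113.0/24"), "AS64500 (example)")]
```

With this table, `find_as("203.0.113.7", known)` is answered locally, while an address outside the stored ranges yields `None`.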
Calling the remote AS resolution and location resolution functions is time consuming, and the services also impose quotas on request frequency, so we avoid duplicate lookups by saving the responses in a local lightweight SQLite database.
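The caching idea can be sketched as below. The table layout, the function names and the `fetch` callback (standing in for the real remote call) are assumptions made for illustration, not the notebook's actual schema.

```python
import json
import sqlite3


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local cache; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lookup ("
        "ip INTEGER PRIMARY KEY, response TEXT NOT NULL)"
    )
    return conn


def cached_lookup(conn, ip_int: int, fetch):
    """Return the stored response for `ip_int`, calling `fetch` only on a miss."""
    row = conn.execute(
        "SELECT response FROM lookup WHERE ip = ?", (ip_int,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    response = fetch(ip_int)  # the expensive, quota-limited remote call
    conn.execute(
        "INSERT INTO lookup (ip, response) VALUES (?, ?)",
        (ip_int, json.dumps(response)),
    )
    conn.commit()
    return response
```

Passing a file path instead of `":memory:"` keeps the cache across notebook restarts, which matters when a run has to be interrupted because of the request quota.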
traceroute log filenames encode the following information:
We implement our own simple representation of the IP address sequence each measurement yields.
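One possible shape for such a representation is sketched here. The class and field names are hypothetical and only show the kind of information a traceroute instance carries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hop:
    ttl: int                        # probe depth at which this hop answered
    address: Optional[str]          # None when the hop did not disclose its address
    rtt_ms: Optional[float] = None  # measured round trip delay, if any


@dataclass
class Traceroute:
    source: str                     # identifier of the measuring node
    target: str                     # destination host name or address
    hops: List[Hop] = field(default_factory=list)

    def addresses(self) -> List[Optional[str]]:
        """The IP address sequence in TTL order, with None for silent hops."""
        return [hop.address for hop in self.hops]

    def reached(self, target_ip: str) -> bool:
        """True when the last recorded hop is the target itself."""
        return bool(self.hops) and self.hops[-1].address == target_ip
```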
This function picks the NAT gateway from a traceroute instance according to the definition.
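A minimal sketch of the selection rule, operating on a plain list of addresses in TTL order. It assumes that "private" means the RFC 1918 ranges; if the operator uses the carrier-grade NAT range 100.64.0.0/10, that network would have to be added to the list. The AS check against the operator is a separate offline step and is not covered here. The function names are illustrative.

```python
import ipaddress

# Assumption: only RFC 1918 space counts as private.
PRIVATE_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETS)


def pick_nat_gateway(hops):
    """Return the NAT gateway candidate, or None if the trace must be dropped.

    `hops` lists addresses in TTL order, with None for hops that did not
    disclose their address.
    """
    for i, current in enumerate(hops):
        if current is None or is_private(current):
            continue
        # `current` is the first public address on the path.
        if i == 0 or hops[i - 1] is None:
            return None  # no usable preceding hop: ambiguous, drop
        return current   # every earlier answering hop was private
    return None          # the path never left private address space
```

For example, `pick_nat_gateway(["192.168.1.1", None, "8.8.8.8"])` yields `None`, because the silent hop could itself be the NAT gateway.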
We are still dealing with only a few nodes, so using a computationally heavy distance formula is not a problem. The Haversine formula gives the distance between two points on the surface of a sphere along a great circle.
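For reference, the Haversine distance can be sketched as follows. The mean Earth radius of 6371 km is an assumed constant, and the function name is illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp against rounding errors for nearly antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Two points on the equator that are 90 degrees of longitude apart are a quarter of a great circle away from each other, which makes a convenient sanity check.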
We access the raw data on our remote server through an SSH filesystem mount.
Loading and parsing all the files and looking up the AS information of the NAT gateway candidates is a time-consuming step. When we hit the API request quota, we may stop this loop and restart it after a while.
In a filtered table we keep only those traceroute instances that satisfy the following:
Because parsing the raw files is very expensive in terms of time, we save the extracted information right away.
CHECKPOINT: Here is how to read back the extracted information.
The number of raw traceroute log files that contain any information and those kept.
The number of source nodes, destinations and operators involved in the measurements.
The number of complete traceroutes:
By definition the NAT gateway must belong to an AS that relates to the given operator. In a few cases the AS resolved for a NAT gateway candidate does not meet this expectation. We need to filter these cases out, and we do it manually, as there are no strict rules for the string representation of the various ASes.
We analyse the diameter of the access network, i.e. the hop count between the Monroe node and the NAT gateway.
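A sketch of how the hop count distribution per operator could be tallied. It assumes the first probe has TTL 1, so the hop count equals the 1-based position of the NAT gateway in the address sequence; the record layout and the operator names are made up for illustration.

```python
from collections import Counter, defaultdict


def hop_count_histogram(records):
    """Count NAT gateway hop distances per operator.

    `records` yields (operator, hops, nat_gateway) tuples, where `hops`
    is the address sequence in TTL order and `nat_gateway` is one of its
    entries.
    """
    histogram = defaultdict(Counter)
    for operator, hops, gateway in records:
        distance = hops.index(gateway) + 1  # TTL of the gateway hop
        histogram[operator][distance] += 1
    return histogram


# Illustrative input only.
sample = [
    ("op-A", ["192.168.1.1", "10.0.0.1", "8.8.8.8"], "8.8.8.8"),
    ("op-A", ["192.168.1.1", "10.0.0.1", "8.8.4.4"], "8.8.4.4"),
    ("op-B", ["192.168.1.1", "1.1.1.1"], "1.1.1.1"),
]
```

An operator whose counter holds a single key corresponds to the single radius case, while two dominant keys point to two distinct radii.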
Depending on the operator, two patterns can be spotted: either a single radius describes the access network diameter, or two different radii do. Interestingly, the two Telia operators share the same radii, which may be a footprint of their following the same technological and configuration scheme.
Investigate geographical distances related to the NAT gateways.
Load the node positions retrieved with the help of the cassandra API and double check that all nodes are covered.
We need to filter our dataset further to keep only those logs that reached the target. See the earlier table for how many records are kept compared to the total.
Look up the coordinates of the NAT gateways kept.
Calculate the distance between the target and the NAT gateway
Join tables so Monroe node coordinates are present in the filtered dataset.
FIXME: we should use
We need to parse once again those traceroutes that passed the filter and are complete.
Look up the ASes in the rest of the traceroute.
Note: filtering is not automated at this point; we peek into the data for each operator one by one and build the mapping criteria manually.