Analysis: Not very decentralised at all.
Delusions of anonymity have clouded the issue of Bittorrent's reliance on relatively few servers to distribute files on the network.
A supposedly worrying report detailed how researchers in France had managed to track users participating in Bittorrent swarms, and had somehow revealed never-before-seen data. The truth is that researchers have been "sampling" real data from P2P networks for years, and it's not even very hard to do.
P2P networks are in many ways the same as the client-server networks of the last century. The difference with a P2P network is the ability for any node to be a server, and in the case of Bittorrent, where files are split into "chunks", a server does not require the whole file to participate. Much like one would expect a traditional server to keep track of which node is accessing it, P2P nodes do the same.
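To make the point concrete, here is a minimal Python sketch of a node that holds only some chunks of a file yet can still serve them, while noting who asked. The `Node` class, the `split_into_chunks` helper, the tiny chunk size and the addresses (taken from a reserved documentation range) are all illustrative assumptions and not part of the Bittorrent protocol itself.

```python
def split_into_chunks(data, chunk_size):
    """Cut a byte string into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class Node:
    """Hypothetical peer that serves whichever chunks it happens to hold."""

    def __init__(self, address):
        self.address = address
        self.chunks = {}         # chunk index -> chunk bytes
        self.seen_peers = set()  # addresses of nodes that have contacted us

    def serve(self, index, requester_address):
        # Like a traditional server, the node records who is accessing it.
        self.seen_peers.add(requester_address)
        # Returns None when this node does not hold the requested chunk.
        return self.chunks.get(index)


chunks = split_into_chunks(b"abcdefghij", 4)

# This node holds only the second chunk, not the whole file.
partial = Node("192.0.2.10")
partial.chunks[1] = chunks[1]

have = partial.serve(1, "192.0.2.20")
missing = partial.serve(0, "192.0.2.20")
print(have, missing, partial.seen_peers)
```

The node acts as a server for the one chunk it has, and the requester's address ends up in its records whether or not the request succeeds.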
For years researchers have tried to model the behaviour of P2P networks in order to gain an understanding of peer distributions, uptimes and how the network copes with the transient nature of its membership, known as churn. The way researchers have gone about this task is to create crawlers to walk the network, identifying nodes by their IP addresses. Depending on the area of research, the lifetime of a node and what files it was offering to other nodes were also recorded.
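The crawling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network is a made-up in-memory table rather than a live P2P overlay, `ask_for_neighbours` is a hypothetical stand-in for a real protocol query, and the addresses come from a reserved documentation range.

```python
from collections import deque

# Hypothetical toy overlay: each address maps to the neighbours it reports.
TOY_NETWORK = {
    "192.0.2.1": ["192.0.2.2", "192.0.2.3"],
    "192.0.2.2": ["192.0.2.1", "192.0.2.4"],
    "192.0.2.3": ["192.0.2.1"],
    "192.0.2.4": ["192.0.2.2", "192.0.2.5"],
    "192.0.2.5": [],
}


def ask_for_neighbours(address, network):
    """Stand-in for a real protocol query; unknown nodes report nothing."""
    return network.get(address, [])


def crawl(seed, network):
    """Walk the network breadth-first from a seed, returning every address seen."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for neighbour in ask_for_neighbours(current, network):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


discovered = crawl("192.0.2.1", TOY_NETWORK)
print(sorted(discovered))
```

Running the same walk repeatedly and comparing the sets of addresses found each time is one simple way to estimate node lifetimes and churn.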
Prior to Bittorrent, research had been carried out on earlier networks that were home to file sharing. Saroiu and colleagues, back in 2002, managed to gather not only the IP addresses of Napster peers but intimate knowledge of what files each peer was sharing. The authors claimed they managed to grab somewhere in the region of 40 to 60 per cent of all the peers on a particular Napster server.
Though Napster wasn't a P2P network, the same kind of data was harvested through the Gnutella network, a bona fide "unstructured" P2P network. Here the authors claimed to gather between 8,000 and 10,000 peers in just two minutes. This, they claimed, represented anything between 25 and 50 per cent of the nodes on the network at that time.
Another study, carried out in 2003, focused on the Overnet network, better known to most through its sister network, Edonkey. Bhagwan and colleagues ran a crawler to harvest information and a "prober" to check whether nodes were still active on the P2P network. The research managed to capture 84,000 host IDs in a 24-hour period.
While those networks have seen their popularity recede or disappear completely, research has focused on Bittorrent in the past five years. The difference with Bittorrent is that one doesn't require a bespoke crawler to see other peers peddling chunks in the swarm. Bittorrent clients such as µTorrent will automatically display information such as throughput, IP addresses, client version and even IP geo-location.
Bittorrent is widely used as a low-cost distribution method for legitimate software such as Linux distributions. Using the protocol to acquire the latest Ubuntu ISO, it wasn't long before we were able to see several hundred other nodes taking part in the swarm, along with their IP addresses. Using built-in IP geo-location, it was easy to find out, roughly, where other nodes were located.
Of course, none of this should be particularly surprising or shocking to anyone who has used the Internet for any length of time. Clients such as µTorrent make it relatively simple to view this information, but programs such as Netstat or Wireshark can provide lower-level data regardless of protocol.
It's a shame, then, that the real conclusion from Le Blond was misreported. The finding of the research was not the ability to monitor real-time node participation in P2P networks, something that was demonstrated a decade previously, but rather the reliance of a P2P protocol on relatively few nodes to inject data into the system.
theinquirer.net (c) 2010 Incisive Media
Issue: 137 | June, 2012