Part 4: File Sharing

File sharing is the best-known application of P2P. As the name implies, the idea is that a user shares files with fellow peers in the network. Each user makes a portion of his or her files available to other users in the network. This way users can freely share and receive files of interest. The first file-sharing application was Napster, which allowed swapping of MP3 music files. Today file-sharing applications support swapping files of all types. Some of the popular file-sharing applications are Gnutella, KaZaA, iMesh, Music Nation (Morpheus), AudioGalaxy, eDonkey2000, Scour Exchange and numerous others.

The file-sharing world has many interesting aspects in addition to the technological one. One is the resistance to new information distribution techniques from old content distributors such as the established film and music industry. Unfortunately, P2P file-sharing systems have been labeled as a technology for illegal distribution of copyrighted material. It is worth noting that the technology has uses besides piracy. One example is as a tool for in-house project share spaces. Several of the technologies also use a decentralized operation model with no single point of failure, which makes them suitable as a tool for combining numerous heterogeneous information systems. An example of this application already exists in the "Care Data Exchange" system from CareScience Inc., which applies the techniques of P2P file sharing to establish the glue for searching and exchanging medical journals between self-contained systems.

The world of P2P file-sharing technologies is a constantly changing field of new ideas and systems. For example, file-sharing applications now allow chat between users and sophisticated fuzzy searches on file metadata.
In this section, we give an introduction to the technical aspects of P2P file-sharing technologies by describing the Napster technology and the P2P architectures that replaced it. First, one must understand that the communication channels in P2P networks are application-level logical channels, independent of the physical or network level. As a result, your closest peer may be physically located on the other side of the world, while the computer in the office next door may be too distant to be within your reach. We should mention that research is being done on P2P systems that try to align the logical infrastructure with the physical one for performance purposes.
Napster was undoubtedly the first killer application of P2P technology. As the user sees it, Napster is a music exchange community. More precisely, it is a commercial, centrally coordinated P2P system for exchanging MP3 files, with chat support.
Peer operation

Downloading is carried out directly between peers without the involvement of the central Napster service. A peer wishing to download a file addresses the specific data unit on a serving peer by the IP address of that peer and the filename of the MP3 file. When a peer's connection terminates, the central Napster server removes the MP3 file entries in its database that relate to this specific peer.
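The division of labour described above can be illustrated with a small sketch. It is not Napster's actual protocol: the class `CentralIndex`, its method names and the example addresses are all hypothetical, and the registration and search steps are assumptions about how a central database of "MP3 file entries" per peer might be kept. The sketch only shows the idea that the server stores (IP address, filename) entries and forgets them when a peer disconnects, while the transfer itself would happen between peers.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central index server (hypothetical, not the real Napster protocol)."""

    def __init__(self):
        # Filenames grouped by the IP address of the peer that serves them.
        self._files_by_peer = defaultdict(set)

    def register(self, peer_ip, filenames):
        """A connected peer announces the files it is willing to serve."""
        self._files_by_peer[peer_ip].update(filenames)

    def search(self, term):
        """Return sorted (peer_ip, filename) pairs whose filename contains term."""
        term = term.lower()
        return sorted(
            (ip, name)
            for ip, names in self._files_by_peer.items()
            for name in names
            if term in name.lower()
        )

    def disconnect(self, peer_ip):
        """Remove every entry that relates to a peer whose connection ended."""
        self._files_by_peer.pop(peer_ip, None)


index = CentralIndex()
index.register("10.0.0.1", ["blue_song.mp3", "red_song.mp3"])
index.register("10.0.0.2", ["blue_song.mp3"])

hits_before = index.search("blue")   # both peers serve a matching file
index.disconnect("10.0.0.1")         # first peer leaves the network
hits_after = index.search("blue")    # only the second peer remains
```

A downloading peer would take one `(peer_ip, filename)` pair from the search result and contact that peer directly; the index server never sees the file data.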
Network

The most significant advantages and disadvantages of the Napster technology are two sides of the same property. The centralized service makes Napster as scalable as any other traditional Internet service. Scalability is simply a matter of more bandwidth to a backbone system with enough CPU power. The downside is the vulnerability of a centralized system and the dependence on a closed technology under the control of a commercial actor. This is what allowed the RIAA to shut down Napster: once the main server was closed, Napster was virtually eliminated in a second.

Several other file-sharing technologies are based on the same design as Napster. Some clones specialize in the exchange of other or more generic file types (Scour Exchange, Audio Galaxy, File Rouge), "improvement" of the basic functionality (eDonkey2000, iMesh, etc.) or in community/content.

From a technological point of view, all we can say about Napster is that it was the inspiration for other people to create better file-sharing architectures. Napster employed a semi-P2P infrastructure, simple dynamic addressing and a rather obvious peer discovery mechanism.
Gnutella is a fully decentralized P2P file-sharing system. Another, more interesting way of describing Gnutella is as a network of simple stand-alone web servers, interconnected and searchable through the Gnutella network. Gnutella lived its first year in the shadow of its famous P2P cousin Napster. The source was never released, and the original program binary was online for only a few hours before someone in the Nullsoft/AOL/Warner Corporation silently removed it. As another example of the viral characteristics of the Internet and of P2P technologies in general, information once released is almost impossible to retract: the original Gnutella client spread widely, mostly thanks to the high impact of the Slashdot forum. Soon after the initial release, the Gnutella protocol was reverse-engineered, which resulted in alternative implementations and initiated further development of the technology.

Gnutella is under constant development to remove bottlenecks and to expand the user functionality. The following explanation is mostly based on its original design, which supported searching and downloading of files in a fully decentralized network of small stand-alone web servers.

The term Gnutella is used to describe:
1. The original client implementation from Nullsoft (or alternative implementations).
2. The specification of the protocol that makes interoperation possible between different implementations.
3. The network of interconnected Gnutella clients, often denoted as the GnutellaNet.

Gnutella introduced the term Servent for a peer participating in a P2P network. The word is a blend of the words "server" and "client" and indicates the dual nature of a peer in a network where the client/server separation has disappeared and the peer operates as both a server and a client.
The Gnutella Rave
The term Rave also hints at the constant reformation of the network as servents enter and leave the mesh or as network links drop due to communication problems. Each servent tries to maintain a number of connections, and the "repair" process results in a dynamically changing "neighborhood". In practice, the design makes only a subset of the whole network reachable to a single servent. Since a servent also has its private view of the neighborhood, each servent has an individual view of what the network looks like. The reachable region for a peer is commonly denoted as its horizon.

The horizon of a Gnutella servent makes all data beyond this region both invisible and inaccessible. At first sight it may seem quite a limitation for a participant to reach only a part of the available information. However, due to the super-distribution property of a P2P system, information spreads around and will in many cases be available within the horizon of a specific peer.

The dynamic nature of the Gnutella network, together with the network horizon of servents, makes it almost impossible to get an exact view of the whole network. It is not strictly correct to talk about "the GnutellaNet", because several separate networks exist with no interconnection to "the Gnutella network". Since the early days of Gnutella, implementations have supported the formation of separate communities by giving the user the ability to change the protocol name used in the connection request string. The Gnutella implementations simply describe this as the 8-character name of the network. Some client implementations support password-protected private networks, but access to a private network typically requires only knowledge of the clear-text network ID and the location of an active servent. Future Gnutella implementations can be expected to have improved support for private locked networks.
The rules of the game

A running servent maintains a dynamic local cache, or list, of known active servents. It picks candidates for its neighbours from this list. The cache is built with the help of the "Ping" and "Pong" protocol messages. Both during the bootstrap and throughout its lifetime, a servent generates "Ping" messages to inform others of its existence. This message is broadcast to the peers within its network horizon. The region is specified with a TTL (time-to-live) parameter telling how deep into the mesh the message should be distributed. The TTL is just a counter that a servent decreases when it forwards a message. The message is not broadcast any further once the counter reaches zero.

A servent receiving a "Ping" message acknowledges it by returning a "Pong" protocol message, which is routed back along the same path as the "Ping" message. A servent can thus do active discovery by sending a "Ping" message and waiting for the acknowledging "Pong" messages, or just stay passive and listen for "Ping" messages (which it should of course acknowledge with a "Pong").

Not all messages in the Gnutella network are broadcast. Some messages ("Pong", "QueryHit" and "Push") are routed back to a single destination along the path taken by the message they answer. Routing of these messages through the network is based on a mechanism where a servent temporarily stores the "semi-unique" message identifier ("Descriptor ID") together with the network link on which the message was received. When a message with a corresponding ID is received, the servent looks in this cache to locate the outgoing link to which the message should be pushed.
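The broadcast and routing rules above can be made concrete with a small in-memory simulation. It is only a sketch under several assumptions: the `Servent` class and its method names are hypothetical, messages are plain method calls instead of network packets, a random hex string stands in for the Descriptor ID, and dropping a message whose ID has already been seen is an added assumption to keep loops in the mesh from flooding forever.

```python
import uuid


class Servent:
    """Hypothetical in-memory servent; method calls stand in for network links."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []
        self.route_back = {}  # Descriptor ID -> link the message arrived on
        self.pongs = []       # names of servents that answered our own pings

    def connect(self, other):
        self.neighbours.append(other)
        other.neighbours.append(self)

    def ping(self, ttl):
        """Broadcast a Ping that travels at most ttl links into the mesh."""
        descriptor_id = uuid.uuid4().hex
        self.route_back[descriptor_id] = None  # None marks us as originator
        for neighbour in self.neighbours:
            neighbour.on_ping(descriptor_id, ttl, sender=self)
        return descriptor_id

    def on_ping(self, descriptor_id, ttl, sender):
        if descriptor_id in self.route_back:
            return  # already seen: drop the duplicate (assumption, see lead-in)
        self.route_back[descriptor_id] = sender   # remember the incoming link
        sender.on_pong(descriptor_id, self.name)  # acknowledge with a Pong
        ttl -= 1                                  # decrease TTL before forwarding
        if ttl > 0:
            for neighbour in self.neighbours:
                if neighbour is not sender:
                    neighbour.on_ping(descriptor_id, ttl, sender=self)

    def on_pong(self, descriptor_id, responder):
        if descriptor_id not in self.route_back:
            return  # no routing entry for this ID: nothing to do
        previous = self.route_back[descriptor_id]
        if previous is None:
            self.pongs.append(responder)               # we sent the Ping
        else:
            previous.on_pong(descriptor_id, responder)  # push one link back


# A chain a - b - c - d: with TTL 2 the Ping reaches b and c but not d.
a, b, c, d = (Servent(n) for n in "abcd")
a.connect(b)
b.connect(c)
c.connect(d)
a.ping(ttl=2)
reached = a.pongs
```

In this run `d` lies outside the horizon of `a`, which is the effect the TTL is meant to have. Note also that `c` never learns that `a` sent the Ping; it only knows that the message arrived on the link from `b`.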
Searching

As stated above, the QueryHit message is routed back to the originator of the search by matching its descriptor ID against the identical ID of the Query. Existing Gnutella implementations interpret the search criteria of a Query as a search string to match against the names of local files made available to the GnutellaNet. The matching algorithm varies amongst implementations, but the search string is commonly treated as a substring of a potential file name.

The interpretation of a Gnutella query as a file search is also supported by the specified format of the result set in a QueryHit message. Although this format restricts the use of Gnutella to file searching, further references to a match are made using an ID that uniquely identifies the resource at a specific servent. It is worth noting that this opens up the Gnutella technology to any type of distributed search. One example is a distributed patient information retrieval system that uses Gnutella as the glue to connect all the non-cooperating information systems typically found in, for example, larger hospitals.
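As a minimal sketch of the substring interpretation described above, the function below matches a search string against a servent's list of shared file names. The function name `match_query`, the case-insensitive comparison and the use of the list position as the per-servent resource ID are all assumptions made for illustration; real implementations differ in their matching rules.

```python
def match_query(search_string, shared_files):
    """Return (resource_id, file_name) for each shared file whose name
    contains the search string, ignoring case (an assumption)."""
    needle = search_string.lower()
    return [
        (resource_id, name)
        for resource_id, name in enumerate(shared_files)
        if needle in name.lower()
    ]


shared = ["Report_2000.txt", "holiday.jpg", "annual report.doc"]
matches = match_query("report", shared)
no_matches = match_query("mp3", shared)
```

Each returned pair carries the ID that a later request would use to refer to the resource at this servent, so the requester does not have to repeat the search string.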
Downloading
Anonymity

Publishing: Publishing is a "passive action" in Gnutella. A file is "published to the Gnutella network" by making it available to the local servent. The presence of the document is not reported anywhere, and other servents must actively detect it by issuing a query. This means that file browsing, as in the "Direct Connect" client, is not supported. Publishing is not anonymous, since any search may disclose the location of the document.

Searching: The originator of a query is hidden from a servent receiving the message, and searching the Gnutella network is consequently anonymous. However, the immediate neighbours know the source, since it is indicated by a "Hops" field of 0 (the number of times the message has been forwarded) and they know the link address of the initiator. As long as servents deeper in the mesh do not cooperate to reveal the message distribution chain, searching should be classified as anonymous. Since searches are visible to each servent that forwards them, a servent may implement filtering rules that decide which searches to broadcast further. Although a central censoring mechanism is impossible to implement, Gnutella is a rare example of a self-governed community where the individual members actually have direct influence over how others use the system.

Downloading: Downloading is not anonymous, and the requestor is visible to the serving peer. This was exploited by an initiative to stop the distribution of illegal material on the Gnutella network. The "Gnutella Wall of Shame" revealed the identity of requestors trying to download harmless material that had been published as clearly illegal material. Guerrilla Network Trading is another decentralized file-sharing technology; it uses a modification of the Gnutella design that supports anonymous downloads. The transfers are encrypted and also routed through the network. Downloading through the network increases the load, but this would probably not be a problem in practice. This is due to the nature of the information expected to flow through a network designed to "make a system for spreading political propaganda in countries without the freedom of speech".
Scalability
1st Generation - From Nullsoft to the August 2000 breakdown: In the period from March to August 2000, a Gnutella servent typically reported between 1000 and 4000 available servents, and periodically up to 8000. Then, suddenly, users started to experience abnormalities. The number of visible peers dropped dramatically, although the peer brokers did not report any reduction in the number of servents connecting to the network. This incident is known as the August breakdown of Gnutella. After the breakdown, the network lived on in a semi-collapsed state, fragmented into numerous disconnected segments. It did not collapse entirely but existed in an intermediate position between scaling and collapse. Gnutella collapsed because of its popularity. Analyses of the August incident showed that the initial Gnutella design had reached its maximum size and that further expansion was impossible, since peers connected through low-bandwidth connections could not keep up with the traffic load. Gnutella had reached its "modem bandwidth barrier".
2nd Generation - Connection logic:
3rd Generation - Dynamic or static hierarchies:
The reflectors require installation and configuration of specialized software. A reflector's availability must also be announced by out-of-band means, so that servents requiring this functionality can be manually configured to use it as a GnutellaNet access point. The reflector is a method for manually building a static 2-level network hierarchy in the Gnutella network.

What GnutellaNet really needs is a fully automatic and transparent solution. By this we mean that Gnutella must become "self-aware". Some mechanism must automatically elect servents to become reflectors by some form of sensitivity. It is also very likely that caching will boost performance, because traces have shown that only a small portion of the possible searches (compared with the available selection) is actually performed. We believe multi-level reflectors (adding more than two levels should be a trivial generalization) and smart caching are the key to Gnutella's growth in the future. New, more advanced P2P networks like FastTrack have those features and present strong competition to the various Gnutella clients.
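The caching idea mentioned above can be sketched as a small least-recently-used cache of query results, of the kind a reflector might keep. This is purely illustrative and not taken from any Gnutella implementation: the class `QueryCache`, the `resolve` callback that stands in for an expensive network search, and the choice of an LRU eviction policy are all assumptions.

```python
from collections import OrderedDict


class QueryCache:
    """Hypothetical LRU cache mapping a search string to its result list."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, query, resolve):
        """Return cached results, or call resolve(query) and remember them."""
        key = query.lower()
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = resolve(query)             # stands in for a network search
        self._entries[key] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return result


network_searches = []


def search_network(query):
    """Stand-in for broadcasting a Query; records that it was called."""
    network_searches.append(query)
    return [query + ".mp3"]


cache = QueryCache(capacity=2)
for q in ["a", "a", "b", "c", "a"]:
    cache.lookup(q, search_network)
```

If only a few distinct searches account for most of the traffic, most lookups are answered from the cache and never reach the network. A real cache would also need to expire entries, since servents and their files come and go; that is left out here.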