DTR

Name: DTR - Distributed information indexing and retrieving (December, 2013)
Source Code: github.com/cadmuxe/DTR

DTR is a distributed system that indexes and retrieves documents by spreading computation and storage across multiple machines. Computation nodes can join and leave the system at any time.

The system consists of one MainServer and at least one ComputationNode, which the MainServer starts automatically. ComputationNodes running on other computers can also join the system. The index data is distributed across all ComputationNodes, and every piece of data is stored in several duplicate copies (3 in the implementation).
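As an illustration only, here is one way to decide which ComputationNodes hold the 3 copies of a piece of index data. This sketch uses rendezvous (highest-random-weight) hashing. That is an assumption, not necessarily how DTR places data, and the node names, the key format, and the `pick_replicas` function are hypothetical.

```python
import hashlib

REPLICATION_FACTOR = 3  # the number of copies stated above


def _score(node_id: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node_id}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_replicas(key: str, nodes: list[str],
                  copies: int = REPLICATION_FACTOR) -> list[str]:
    """Return the nodes that should hold `key`, highest score first.

    If fewer than `copies` nodes are available, all of them are returned.
    """
    ranked = sorted(nodes, key=lambda n: _score(n, key), reverse=True)
    return ranked[:copies]


if __name__ == "__main__":
    # Hypothetical node names and index key, for demonstration only.
    nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    owners = pick_replicas("term:distributed", nodes)
    print("owners:", owners)

    # Simulate the first owner leaving: the two surviving owners keep
    # their copies, and only one new node has to receive the data.
    remaining = [n for n in nodes if n != owners[0]]
    print("after leave:", pick_replicas("term:distributed", remaining))
```

With this scheme a node joining or leaving only changes the placement of the keys it ranks highly for, so most data stays where it is.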

To communicate efficiently, I wrote a set of data serialization functions by hand. (Writing them manually is not a good idea; I only found Thrift and Protocol Buffers afterwards.)
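To give a sense of what hand-written serialization involves, here is a minimal sketch of a binary encoder and decoder built on Python's standard `struct` module. The wire format, the message contents (a document id plus a list of terms), and the function names are all hypothetical. They are not taken from the DTR source.

```python
import struct

# Hypothetical wire format (an assumption, not DTR's actual format):
#   4-byte big-endian document id, 4-byte big-endian term count,
#   then for each term: 2-byte big-endian length + UTF-8 bytes.
_HEADER = struct.Struct("!II")
_LEN = struct.Struct("!H")


def encode_message(doc_id: int, terms: list[str]) -> bytes:
    """Pack a document id and its terms into a compact byte string."""
    parts = [_HEADER.pack(doc_id, len(terms))]
    for term in terms:
        raw = term.encode("utf-8")
        parts.append(_LEN.pack(len(raw)))  # raises struct.error if > 65535 bytes
        parts.append(raw)
    return b"".join(parts)


def decode_message(data: bytes) -> tuple[int, list[str]]:
    """Inverse of encode_message."""
    doc_id, count = _HEADER.unpack_from(data, 0)
    offset = _HEADER.size
    terms = []
    for _ in range(count):
        (length,) = _LEN.unpack_from(data, offset)
        offset += _LEN.size
        terms.append(data[offset:offset + length].decode("utf-8"))
        offset += length
    return doc_id, terms


if __name__ == "__main__":
    payload = encode_message(42, ["distributed", "index"])
    print(len(payload), decode_message(payload))
```

Even this small format needs explicit decisions about byte order, length limits, and versioning, which is the bookkeeping that Thrift and Protocol Buffers generate for you.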