Red Hat Developer Program
August 16, 2013

Efficient top-k query processing on distributed column-family databases, by Rui Vieira

Ranking (top-k) queries are one of the central topics in Information Retrieval, with considerable applications in fields such as analytics. One challenge is to provide solutions that can be adapted to distributed data sources, specifically distributed column-oriented NoSQL databases, and that comply with users' real-time constraints, especially when dealing with massive amounts of data. In this talk, we discuss the implementation of some of the most promising algorithms for addressing these challenges, along with the difficulties each one raises. We also analyse their scalability and the substantial gains in bandwidth and execution time indicated by the experimental results.