Benchmarking Top-K Keyword and Top-K Document Processing with T${}^2$K${}^2$ and T${}^2$K${}^2$D${}^2$

Ciprian‐Octavian Truică; Jérôme Darmont; Alexandru Boicea; Florin Rădulescu

doi:10.48550/arxiv.1804.07525

dc.contributor.author	Ciprian‐Octavian Truică
dc.contributor.author	Jérôme Darmont
dc.contributor.author	Alexandru Boicea
dc.contributor.author	Florin Rădulescu
dc.coverage.spatial	Bolivia
dc.date.accessioned	2026-03-22T20:47:48Z
dc.date.available	2026-03-22T20:47:48Z
dc.date.issued	2018
dc.description.abstract	Top-k keyword and top-k document extraction are very popular text analysis techniques. Top-k keywords and documents are often computed on-the-fly, but they exploit weighted vocabularies that are costly to build. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present T${}^2$K${}^2$, a top-k keywords and documents benchmark, and its decision support-oriented evolution T${}^2$K${}^2$D${}^2$. Both benchmarks feature a real tweet dataset and queries with various complexities and selectivities. They help evaluate weighting schemes and database implementations in terms of computing performance. To illustrate our benchmarks' relevance and genericity, we successfully ran performance tests on the TF-IDF and Okapi BM25 weighting schemes, on the one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
dc.identifier.doi	10.48550/arxiv.1804.07525
dc.identifier.uri	https://doi.org/10.48550/arxiv.1804.07525
dc.identifier.uri	https://andeanlibrary.org/handle/123456789/84119
dc.language.iso	en
dc.publisher	Cornell University
dc.relation.ispartof	arXiv (Cornell University)
dc.source	Universidad Privada Boliviana
dc.subject	Benchmarking
dc.subject	Computer science
dc.subject	Information retrieval
dc.title	Benchmarking Top-K Keyword and Top-K Document Processing with T${}^2$K${}^2$ and T${}^2$K${}^2$D${}^2$
dc.type	preprint

Collections

Scientific Article (Preprint)
