Benchmarking Top-K Keyword and Top-K Document Processing with T${}^2$K${}^2$ and T${}^2$K${}^2$D${}^2$

dc.contributor.author: Ciprian‐Octavian Truică
dc.contributor.author: Jérôme Darmont
dc.contributor.author: Alexandru Boicea
dc.contributor.author: Florin Rădulescu
dc.coverage.spatial: Bolivia
dc.date.accessioned: 2026-03-22T20:47:48Z
dc.date.available: 2026-03-22T20:47:48Z
dc.date.issued: 2018
dc.description.abstract: Top-k keyword and top-k document extraction are very popular text analysis techniques. Top-k keywords and documents are often computed on-the-fly, but they exploit weighted vocabularies that are costly to build. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present T${}^2$K${}^2$, a top-k keywords and documents benchmark, and its decision support-oriented evolution T${}^2$K${}^2$D${}^2$. Both benchmarks feature a real tweet dataset and queries with various complexities and selectivities. They help evaluate weighting schemes and database implementations in terms of computing performance. To illustrate our benchmarks' relevance and genericity, we successfully ran performance tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
dc.identifier.doi: 10.48550/arxiv.1804.07525
dc.identifier.uri: https://doi.org/10.48550/arxiv.1804.07525
dc.identifier.uri: https://andeanlibrary.org/handle/123456789/84119
dc.language.iso: en
dc.publisher: Cornell University
dc.relation.ispartof: arXiv (Cornell University)
dc.source: Universidad Privada Boliviana
dc.subject: Benchmarking
dc.subject: Computer science
dc.subject: Information retrieval
dc.title: Benchmarking Top-K Keyword and Top-K Document Processing with T${}^2$K${}^2$ and T${}^2$K${}^2$D${}^2$
dc.type: preprint
