Browsing by Author "Di Nuovo, A"

    Publication (Embargo)
    Document Clustering with Evolved Single Word Search Queries
    (IEEE, 2021-06-28) Hirsch, L; Haddela, P. S; Di Nuovo, A
    We present a novel hybrid approach for clustering text databases. A genetic algorithm generates and evolves a set of single-word search queries in Apache Lucene format, and each cluster is formed as the set of documents matching one query. The queries are optimized to maximize the number of documents returned and to minimize the overlap between clusters (documents returned by more than one query in a set). Optionally, the number of clusters can be specified in advance, which normally improves performance. Some documents in a collection are not returned by any search query in a set, so once query evolution is complete, a second stage applies a KNN algorithm to assign each unassigned document to its nearest cluster. We describe the method and compare its effectiveness with other well-known systems on 8 different text datasets. We note that the search query format has the qualitative benefits of being interpretable and of providing an explanation of cluster construction.
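
    The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python sets instead of Apache Lucene, omits the genetic algorithm, and every name in it (clusters_for, fitness, assign_unmatched, jaccard) is hypothetical. The scoring formula (documents matched by exactly one query, minus documents matched by several) is an assumption; the abstract states only the two objectives. Jaccard word overlap stands in for whatever distance the KNN stage actually uses.

```python
from collections import Counter

def clusters_for(query_words, docs):
    """One cluster per single-word query: the indices of documents containing that word."""
    return [{i for i, d in enumerate(docs) if w in d.lower().split()}
            for w in query_words]

def fitness(query_words, docs):
    """Assumed score: reward documents hit by exactly one query, penalise overlap."""
    hits = Counter(i for c in clusters_for(query_words, docs) for i in c)
    unique = sum(1 for n in hits.values() if n == 1)
    overlap = sum(1 for n in hits.values() if n > 1)
    return unique - overlap

def jaccard(a, b):
    """Word-set similarity between two documents (stand-in distance measure)."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def assign_unmatched(query_words, docs, k=1):
    """Second stage: label documents matched by no query via their k nearest labelled neighbours."""
    label = {}
    for c_idx, members in enumerate(clusters_for(query_words, docs)):
        for i in members:
            label.setdefault(i, c_idx)  # simplification: overlapping docs keep their first cluster
    labelled = list(label)  # snapshot, so newly assigned docs do not vote
    if not labelled:
        return label
    for i in range(len(docs)):
        if i in label:
            continue
        nearest = sorted(labelled, key=lambda j: jaccard(docs[i], docs[j]),
                         reverse=True)[:k]
        label[i] = Counter(label[j] for j in nearest).most_common(1)[0][0]
    return label

# Toy collection, invented for illustration only.
docs = ["cat sat mat", "cat purrs", "dog barks loud", "dog runs", "purrs softly"]
print(fitness(["cat", "dog"], docs))
print(fitness(["cat", "purrs"], docs))
print(assign_unmatched(["cat", "dog"], docs))
```

    In this toy collection the query set ["cat", "dog"] scores higher than ["cat", "purrs"] because its two clusters do not overlap, and the document matched by neither query is then attached to the cluster of its most similar labelled neighbour. A genetic algorithm would search over candidate query sets using a score of this kind.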

Copyright 2025 © SLIIT. All Rights Reserved.
