Publication: Document Clustering with Evolved Single Word Search Queries

dc.contributor.author: Hirsch, L
dc.contributor.author: Haddela, P. S
dc.contributor.author: Di Nuovo, A
dc.date.accessioned: 2022-04-22T07:07:01Z
dc.date.available: 2022-04-22T07:07:01Z
dc.date.issued: 2021-06-28
dc.description.abstract: We present a novel, hybrid approach for clustering text databases. We use a genetic algorithm to generate and evolve a set of single-word search queries in Apache Lucene format. Each cluster is formed as the set of documents matching one search query. The queries are optimized to maximize the number of documents returned and to minimize the overlap between clusters (documents returned by more than one query in a set). Optionally, the number of clusters can be specified in advance, which normally improves performance. Some documents in a collection are not returned by any of the search queries in a set, so once the search query evolution is complete, a second stage applies a k-nearest neighbors (KNN) algorithm to assign each unassigned document to its nearest cluster. We describe the method and compare its effectiveness with other well-known existing systems on 8 different text datasets. We note that the search query format has the qualitative benefits of being interpretable and of explaining how the clusters are constructed. [en_US]
dc.identifier.citation: L. Hirsch, A. Di Nuovo and P. Haddela, "Document Clustering with Evolved Single Word Search Queries," 2021 IEEE Congress on Evolutionary Computation (CEC), 2021, pp. 280-287, doi: 10.1109/CEC45853.2021.9504770. [en_US]
dc.identifier.doi: 10.1109/CEC45853.2021.9504770 [en_US]
dc.identifier.isbn: 978-1-7281-8393-0
dc.identifier.uri: https://rda.sliit.lk/handle/123456789/2015
dc.language.iso: en [en_US]
dc.publisher: IEEE [en_US]
dc.relation.ispartofseries: 2021 IEEE Congress on Evolutionary Computation (CEC); Pages 280-287
dc.subject: Document Clustering [en_US]
dc.subject: Evolved [en_US]
dc.subject: Single Word [en_US]
dc.subject: Search Queries [en_US]
dc.title: Document Clustering with Evolved Single Word Search Queries [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication
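The abstract describes a two-stage pipeline: evolve a set of single-word queries so that their result sets cover many documents with little overlap, then assign the documents no query returned by KNN. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. Documents are reduced to word sets, a set-membership test stands in for a Lucene query, the number of clusters k is fixed in advance, and the fitness function (documents returned by exactly one query minus documents returned by more than one), the GA operators, the Jaccard similarity used by KNN, and the names TOY_DOCS, fitness, evolve, assign_unassigned and cluster are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

# Hypothetical toy corpus: each document is reduced to its set of lowercase words.
TOY_DOCS = [frozenset(text.split()) for text in [
    "goal match striker", "goal keeper match", "election vote party",
    "party vote poll", "striker transfer", "poll result",
]]


def clusters_for(queries, docs):
    # One cluster per single-word query: indices of the documents containing the
    # word. A set-membership test stands in for running the query in Lucene.
    return [{i for i, d in enumerate(docs) if q in d} for q in queries]


def fitness(queries, docs):
    # Assumed fitness: +1 per document returned by exactly one query,
    # -1 per document returned by more than one query (cluster overlap).
    hits = Counter(i for members in clusters_for(queries, docs) for i in members)
    unique = sum(1 for n in hits.values() if n == 1)
    overlap = sum(1 for n in hits.values() if n > 1)
    return unique - overlap


def evolve(docs, k, generations=50, pop_size=30, seed=0):
    # Minimal generational GA over lists of k distinct query words.
    rng = random.Random(seed)
    vocab = sorted(set().union(*docs))

    def mutate(ind):
        # Replace one word with a word not already in the individual.
        child = list(ind)
        spare = [w for w in vocab if w not in child]
        if spare:
            child[rng.randrange(k)] = rng.choice(spare)
        return child

    def crossover(a, b):
        # Draw k distinct words from the union of both parents.
        return rng.sample(list(dict.fromkeys(a + b)), k)

    population = [rng.sample(vocab, k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda q: fitness(q, docs), reverse=True)
        parents = population[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = parents + children
    return max(population, key=lambda q: fitness(q, docs))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_unassigned(docs, labels, k=3):
    # labels[i] is a cluster index, or None if no query returned document i.
    # Each unassigned document takes the majority label among its k most
    # similar assigned documents (Jaccard similarity is an assumed choice).
    assigned = [i for i, lab in enumerate(labels) if lab is not None]
    result = list(labels)
    for i, lab in enumerate(labels):
        if lab is not None or not assigned:
            continue
        nearest = sorted(assigned, key=lambda j: jaccard(docs[i], docs[j]), reverse=True)[:k]
        result[i] = Counter(labels[j] for j in nearest).most_common(1)[0][0]
    return result


def cluster(docs, k):
    # Stage 1: evolve the queries; stage 2: KNN for documents no query returned.
    queries = evolve(docs, k)
    labels = [None] * len(docs)
    for c, members in enumerate(clusters_for(queries, docs)):
        for i in members:
            if labels[i] is None:  # overlapping docs keep the first matching query (assumption)
                labels[i] = c
    return queries, assign_unassigned(docs, labels)
```

For example, `cluster(TOY_DOCS, 2)` returns the evolved query words together with a cluster label for every document. The words themselves act as a readable description of each cluster, which is the interpretability benefit the abstract points to; a real system would run the queries against a Lucene index rather than test set membership.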

Files

Original bundle

Name: Document_Clustering_with_Evolved_Single_Word_Search_Queries.pdf
Size: 2.17 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission