Publication: Document Clustering with Evolved Single Word Search Queries
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Hirsch, L | |
| dc.contributor.author | Haddela, P. S | |
| dc.contributor.author | Di Nuovo, A | |
| dc.date.accessioned | 2022-04-22T07:07:01Z | |
| dc.date.available | 2022-04-22T07:07:01Z | |
| dc.date.issued | 2021-06-28 | |
| dc.description.abstract | We present a novel hybrid approach for clustering text databases. We use a genetic algorithm to generate and evolve a set of single-word search queries in Apache Lucene format. Each cluster is formed as the set of documents matching one search query. The queries are optimized to maximize the number of documents returned and to minimize the overlap between clusters (documents returned by more than one query in a set). Optionally, the number of clusters can be specified in advance, which normally improves performance. Some documents in a collection are not returned by any of the search queries in a set, so once the search query evolution is complete, a second stage applies a k-nearest neighbours (KNN) algorithm to assign every unassigned document to its nearest cluster. We describe the method and compare its effectiveness with that of other well-known systems on 8 different text datasets. We note that the search query format has the qualitative benefits of being interpretable and of explaining how each cluster was constructed. | en_US |
| dc.identifier.citation | L. Hirsch, A. Di Nuovo and P. Haddela, "Document Clustering with Evolved Single Word Search Queries," 2021 IEEE Congress on Evolutionary Computation (CEC), 2021, pp. 280-287, doi: 10.1109/CEC45853.2021.9504770. | en_US |
| dc.identifier.doi | 10.1109/CEC45853.2021.9504770 | en_US |
| dc.identifier.isbn | 978-1-7281-8393-0 | |
| dc.identifier.uri | https://rda.sliit.lk/handle/123456789/2015 | |
| dc.language.iso | en | en_US |
| dc.publisher | IEEE | en_US |
| dc.relation.ispartofseries | 2021 IEEE Congress on Evolutionary Computation (CEC);Pages 280-287 | |
| dc.subject | Document Clustering | en_US |
| dc.subject | Evolved | en_US |
| dc.subject | Single Word | en_US |
| dc.subject | Search Queries | en_US |
| dc.title | Document Clustering with Evolved Single Word Search Queries | en_US |
| dc.type | Article | en_US |
| dspace.entity.type | Publication | |
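
The two stages described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: documents are plain Python sets of words rather than a Lucene index, the scoring rule in `fitness` (documents matched by exactly one query minus documents matched by several) is only one plausible reading of "maximize documents returned, minimize overlap", and the second stage is reduced to a 1-nearest-neighbour lookup on Jaccard similarity. The genetic algorithm itself is omitted; the sketch only shows how a candidate query set might be scored.

```python
from collections import Counter

# Toy corpus (hypothetical): each document is a set of words.
DOCS = [
    {"goal", "match", "team"},
    {"team", "league", "goal"},
    {"stock", "market", "shares"},
    {"market", "bank", "rates"},
    {"league", "match", "referee"},
]

def clusters_for(queries, docs):
    """Map each single-word query to the ids of documents containing that word."""
    return {q: {i for i, words in enumerate(docs) if q in words} for q in queries}

def fitness(queries, docs):
    """Assumed score: documents hit by exactly one query, minus documents hit by several."""
    hits = Counter(i for ids in clusters_for(queries, docs).values() for i in ids)
    unique = sum(1 for n in hits.values() if n == 1)
    overlap = sum(1 for n in hits.values() if n > 1)
    return unique - overlap

def jaccard(a, b):
    """Jaccard similarity between two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign_unmatched(queries, docs):
    """Simplified second stage: give each unmatched document the label of its
    most similar uniquely matched document (1-NN on Jaccard similarity)."""
    clusters = clusters_for(queries, docs)
    hits = Counter(i for ids in clusters.values() for i in ids)
    labelled = {i: q for q, ids in clusters.items() for i in ids if hits[i] == 1}
    if not labelled:
        return {}
    assigned = {}
    for i, words in enumerate(docs):
        if hits[i] == 0:
            nearest = max(labelled, key=lambda j: jaccard(words, docs[j]))
            assigned[i] = labelled[nearest]
    return assigned

print(fitness(["goal", "market"], DOCS))
print(fitness(["goal", "team"], DOCS))
print(assign_unmatched(["goal", "market"], DOCS))
```

In this toy corpus the query set `["goal", "market"]` scores higher than `["goal", "team"]`, because the latter returns the same two documents twice. The last document matches neither query and is attached to whichever cluster holds its most similar labelled neighbour.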
Files
Original bundle
- Name: Document_Clustering_with_Evolved_Single_Word_Search_Queries.pdf
- Size: 2.17 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission
