
FlexiClustering

Unai Lopez edited this page Jul 5, 2018 · 1 revision

FlexiClustering takes as input the list of multi-word terms recognised by FlexiTerm and performs hierarchical clustering on them.

To run FlexiClustering after you have run FlexiTerm:

  1. Run FlexiClustering.bat from the command line.
  2. Check the results by uploading out/output_dendrogram.txt to evolgenius (an online tree viewer).
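
Before uploading, it can be useful to confirm that the dendrogram file is at least structurally plausible. The following is a minimal sketch, not part of FlexiClustering: the class name NewickCheck and its method are hypothetical, and it assumes unquoted labels (no commas or parentheses inside term names). It checks that the string ends with a semicolon, that the parentheses balance, and reports the number of leaves, which in a Newick tree equals the number of commas plus one.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not shipped with FlexiClustering.
final class NewickCheck {

    /** Returns the number of leaves, or throws if the string is not plausibly Newick. */
    static int countLeaves(String newick) {
        String s = newick.trim();
        if (!s.endsWith(";")) {
            throw new IllegalArgumentException("Newick string must end with ';'");
        }
        int depth = 0;   // current parenthesis nesting level
        int commas = 0;  // each comma separates two sibling subtrees
        for (char c : s.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',') {
                commas++;
            }
            if (depth < 0) {
                throw new IllegalArgumentException("Unmatched ')'");
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unmatched '('");
        }
        // Assumes unquoted labels, so every comma is a sibling separator.
        return commas + 1;
    }

    public static void main(String[] args) throws Exception {
        // Pass the path to the dendrogram file as the first argument;
        // without an argument a small made-up tree is checked instead.
        String newick = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "((term_a,term_b),term_c);";
        System.out.println("Leaves: " + countLeaves(newick));
    }
}
```

If the reported leaf count differs from the number of terms you expected to be clustered, the file is worth inspecting before it goes into the tree viewer.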

Files in the folder structure related to FlexiClustering:

  • script/FlexiClustering.bat: Batch file that runs FlexiClustering.java.
  • out/output_agglomeration.csv: Cluster agglomeration schedule, one merge per row, with the fields cluster1, cluster2, cluster1+2 and distance.
  • out/output_dendrogram.txt: Main clustering results: a dendrogram in the Newick format (https://en.wikipedia.org/wiki/Newick_format).
  • out/output_distance.csv: Term distance matrix, which you may use to perform clustering with a tool of your own choice, e.g. SPSS.
  • out/FlexiClustering.log: Listing output used for debugging.
  • src/FlexiClustering.java: Main Java class.
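
To illustrate how the distance matrix, the agglomeration schedule and the Newick dendrogram relate to one another, here is a minimal sketch of agglomerative clustering in Java. It is not the code in src/FlexiClustering.java: the class name LinkageSketch, the three example terms and their distances are made up, and single linkage is used only because it is the simplest rule to show (the linkage method FlexiClustering actually applies is not stated on this page). Spaces in terms are replaced with underscores, since unquoted Newick labels cannot contain blanks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the FlexiClustering implementation.
final class LinkageSketch {

    /** Single-linkage distance: smallest term-to-term distance between two clusters. */
    static double linkage(List<Integer> a, List<Integer> b, double[][] dist) {
        double min = Double.MAX_VALUE;
        for (int i : a) {
            for (int j : b) {
                min = Math.min(min, dist[i][j]);
            }
        }
        return min;
    }

    /**
     * Merges the two closest clusters until one remains.
     * Appends one "cluster1 cluster2 cluster1+2 distance" line per merge to schedule
     * and returns the resulting tree as a Newick string.
     */
    static String cluster(String[] terms, double[][] dist, List<String> schedule) {
        List<String> labels = new ArrayList<>();         // Newick subtree of each active cluster
        List<List<Integer>> members = new ArrayList<>(); // term indices in each active cluster
        for (int i = 0; i < terms.length; i++) {
            labels.add(terms[i].replace(' ', '_'));
            List<Integer> single = new ArrayList<>();
            single.add(i);
            members.add(single);
        }
        while (labels.size() > 1) {
            int bestA = 0;
            int bestB = 1;
            double best = Double.MAX_VALUE;
            for (int a = 0; a < labels.size(); a++) {
                for (int b = a + 1; b < labels.size(); b++) {
                    double d = linkage(members.get(a), members.get(b), dist);
                    if (d < best) {
                        best = d;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            String merged = "(" + labels.get(bestA) + "," + labels.get(bestB) + ")";
            schedule.add(labels.get(bestA) + " " + labels.get(bestB) + " " + merged + " " + best);
            // bestB > bestA, so removing index bestB leaves index bestA valid.
            members.get(bestA).addAll(members.get(bestB));
            labels.set(bestA, merged);
            labels.remove(bestB);
            members.remove(bestB);
        }
        return labels.get(0) + ";";
    }

    public static void main(String[] args) {
        // Made-up terms and distances, for illustration only.
        String[] terms = {"heart failure", "cardiac failure", "knee pain"};
        double[][] dist = {
            {0.0, 0.2, 0.9},
            {0.2, 0.0, 0.8},
            {0.9, 0.8, 0.0}
        };
        List<String> schedule = new ArrayList<>();
        String newick = cluster(terms, dist, schedule);
        schedule.forEach(System.out::println);
        System.out.println(newick);
    }
}
```

For the toy data above, the two "failure" terms merge first at distance 0.2, and the final tree is ((heart_failure,cardiac_failure),knee_pain);. The same approach could be applied to the matrix in out/output_distance.csv once it has been loaded into an array, although the exact layout of that file is not described here and would need to be checked first.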
