Jaccardetäisyys
Jaccardetäisyys, known in English as the Jaccard distance or Jaccard dissimilarity, is a statistic for gauging how dissimilar two sample sets are. It is derived from the Jaccard index, which measures similarity and is defined as the size of the intersection divided by the size of the union of the sample sets. Mathematically, for two sets A and B, the Jaccard index is calculated as |A ∩ B| / |A ∪ B|. The Jaccard distance is then defined as 1 minus the Jaccard index, or 1 - (|A ∩ B| / |A ∪ B|). Equivalently, the distance is the size of the symmetric difference of the sets (the elements that belong to exactly one of them) divided by the size of the union: (|A ∪ B| - |A ∩ B|) / |A ∪ B|.
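The definition can be illustrated with a minimal Python sketch. The function name jaccard_distance is hypothetical, and returning 0 for two empty sets is an assumed convention, since the formula is undefined when the union is empty.

```python
def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two iterables treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


A = {1, 2, 3}
B = {2, 3, 4}

# Intersection {2, 3} has size 2 and union {1, 2, 3, 4} has size 4,
# so the index is 2/4 and the distance is 1 - 0.5.
print(jaccard_distance(A, B))  # 0.5

# The symmetric-difference form gives the same value: |{1, 4}| / 4.
print(len(A ^ B) / len(A | B))  # 0.5
```

For A = {1, 2, 3} and B = {2, 3, 4}, both forms of the formula give 0.5, because two of the four distinct elements are shared.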
The Jaccard distance ranges from 0 to 1. A Jaccard distance of 0 indicates that the two