Systemic Risk in AI Development
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
This appears to be a machine learning paper on cluster validity indexes rather than an AI safety resource. The listed title does not match the content, which suggests a metadata error or an incorrect classification.
Paper Details
Metadata
Abstract
The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has been overlooked by most of the existing works in this area. In this study, we introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. We evaluate and compare the performance of our index with several existing indexes, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2. We conduct this evaluation on four types of datasets: artificial datasets, real-world datasets, simulated datasets with ranks, and image datasets, using the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not all, of these indexes in terms of accurately detecting the optimal number of clusters and providing accurate secondary options. Moreover, our index remains effective even when the fuzziness parameter $m$ is set to a large value. Our R package called UniversalCVI used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Risk Interaction Network Model | Analysis | 64.0 |
Cached Content Preview
AMS 2010 subject classifications: Primary 62H30; Secondary 68T10.
# A correlation-based fuzzy cluster validity index with secondary options detector
Nathakhun Wiroonsri and Onthada Preedasawakul
Mathematics and Statistics with Applications Research Group (MaSA)
Department of Mathematics, King Mongkut’s University of Technology Thonburi
This author is financially supported by National Research Council of Thailand (NRCT), Grant number: N42A660991 (2023). Email: nathakhun.wir@kmutt.ac.th; Email: o.preedasawakul@gmail.com
###### Abstract
The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has been overlooked by most of the existing works in this area. In this study, we introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri–Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. We evaluate and compare the performance of our index with several existing indexes, including Xie–Beni, Pakhira–Bandyopadhyay–Maulik, Tang, Wu–Li, generalized C, and Kwon2. We conduct this evaluation on four types of datasets: artificial datasets, real-world datasets, simulated datasets with ranks, and image datasets, using the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not all, of these indexes in terms of accurately detecting the optimal number of clusters and providing accurate secondary options. Moreover, our index remains effective even when the fuzziness parameter $m$ is set to a large value. Our R package called UniversalCVI used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.
Keywords: Cluster analysis, CRAN, fuzzy c-means (FCM), image processing, ranking, R package, sub-optimal.
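The abstract describes the index only in words: a correlation between pairwise distances of data points and distances between their adjusted centroids, computed on fuzzy c-means results. The Python sketch below illustrates that general idea and is not the WP index itself or the UniversalCVI API. Two things are assumptions made purely for illustration: the function names `fuzzy_c_means` and `correlation_score`, and the stand-in for an "adjusted centroid", taken here to be each point's membership-weighted average of the cluster centroids. The paper's actual definition is not reproduced in this preview and may differ.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means. Returns centroids V (c, d) and memberships U (n, c)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, each row normalized to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centroids are membership-weighted means of the data.
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distances from every point to every centroid (floored to avoid 0).
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)
        # Standard FCM membership update with fuzziness parameter m > 1.
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


def correlation_score(X, V, U):
    """Pearson correlation between point-to-point distances and the distances
    between the points' membership-weighted centroids.

    ASSUMPTION: U @ V is an illustrative stand-in for the paper's
    "adjusted centroids", not the definition used by the WP index.
    """
    A = U @ V  # one stand-in "adjusted centroid" per data point
    iu = np.triu_indices(X.shape[0], k=1)  # each unordered pair once
    d_points = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)[iu]
    d_cents = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2)[iu]
    return float(np.corrcoef(d_points, d_cents)[0, 1])


if __name__ == "__main__":
    # Synthetic data: three Gaussian blobs (illustrative only).
    rng = np.random.default_rng(42)
    centers = np.array([[0.0, 0.0], [8.0, 0.0], [4.0, 7.0]])
    X = np.vstack([rng.normal(ctr, 0.6, size=(60, 2)) for ctr in centers])
    for c in range(2, 7):
        V, U = fuzzy_c_means(X, c, m=2.0)
        print(c, round(correlation_score(X, V, U), 4))
```

A raw correlation like this need not peak at the true number of clusters, so the scores printed for different values of `c` should be read as a demonstration of the ingredients rather than as a working validity index.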
## 1 Introduction
Cluster analysis is an unsupervised learning tool in machine learning that is widely used in various areas, including business, pattern recognition, data mining, medical diagnosis, and image processing, among others. It relies on the inherent properties, patterns, or similarities of objects to reveal meaningful information. The aim is to identify natural groupings within a dataset that are not initially apparent and without prior knowledge of the groups. There are several clustering algorithms, mainly categorized as centroid-based clustering (such as K-means, K-medoids, K-medians, and fuzzy c-means \[FCM\]), hierarchical clustering (including single linkage, complete linkage, group average agglomerative, and Ward’s criterion), density-based clustering (such as DBSCAN, DENCLUE, and OPTICS), probabilistic clustering (EM), grid-based clustering (including CLIQUE, MAFIA, ENCLUS, and OptiGrid), and
... (truncated, 67 KB total)
5ea1633005740b6f | Stable ID: Nzc4M2FlYj