Longterm Wiki

Systemic Risk in AI Development

paper

Authors

Nathakhun Wiroonsri·Onthada Preedasawakul

Credibility Rating

3/5
Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This appears to be a machine learning paper on cluster validity indexes rather than an AI safety resource; the title and content are misaligned, suggesting potential metadata error or incorrect classification.

Paper Details

Citations: 0 (0 influential)
Year: 2026
Methodology: book-chapter
Categories: AI and ML Governance in Financial Services

Metadata

arxiv preprint · reference

Abstract

The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has been overlooked by most of the existing works in this area. In this study, we introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. We evaluate and compare the performance of our index with several existing indexes, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2. We conduct this evaluation on four types of datasets: artificial datasets, real-world datasets, simulated datasets with ranks, and image datasets, using the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not all, of these indexes in terms of accurately detecting the optimal number of clusters and providing accurate secondary options. Moreover, our index remains effective even when the fuzziness parameter $m$ is set to a large value. Our R package called UniversalCVI used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| AI Risk Interaction Network Model | Analysis | 64.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 67 KB
AMS 2010 subject classifications: Primary 62H30; Secondary 68T10.

# A correlation-based fuzzy cluster validity index with secondary options detector

Nathakhun Wiroonsri and Onthada Preedasawakul

Mathematics and Statistics with Applications Research Group (MaSA)

Department of Mathematics, King Mongkut’s University of Technology Thonburi
This author is financially supported by the National Research Council of Thailand (NRCT), Grant number: N42A660991 (2023).

Email: nathakhun.wir@kmutt.ac.th

Email: o.preedasawakul@gmail.com

###### Abstract

The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has been overlooked by most of the existing works in this area. In this study, we introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri–Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. We evaluate and compare the performance of our index with several existing indexes, including Xie–Beni, Pakhira–Bandyopadhyay–Maulik, Tang, Wu–Li, generalized C, and Kwon2. We conduct this evaluation on four types of datasets: artificial datasets, real-world datasets, simulated datasets with ranks, and image datasets, using the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not all, of these indexes in terms of accurately detecting the optimal number of clusters and providing accurate secondary options. Moreover, our index remains effective even when the fuzziness parameter $m$ is set to a large value. Our R package called UniversalCVI used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.

Keywords: Cluster analysis, CRAN, fuzzy c-means (FCM), image processing, ranking, R package, sub-optimal.
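
The abstract describes the index only in words: a correlation between the actual distance of each pair of points and the distance between "adjusted centroids" for that pair. The following is a minimal Python sketch of that general idea, not the paper's WP formula and not the UniversalCVI API. The adjustment rule (a membership-weighted average of the centroids), the exponent `gamma`, and all function names are assumptions made for illustration; only the fuzzy c-means membership update is the standard one.

```python
import math
from itertools import combinations


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def fcm_memberships(points, centroids, m=2.0):
    """Standard fuzzy c-means membership update for fixed centroids (m > 1)."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [euclid(x, v) for v in centroids]
        if any(d == 0.0 for d in dists):
            # A point lying exactly on a centroid belongs fully to it.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
        )
    return memberships


def adjusted_centroids(memberships, centroids, gamma=2.0):
    """ASSUMPTION: one 'adjusted centroid' per point, taken here as the
    average of the centroids weighted by membership ** gamma."""
    dim = len(centroids[0])
    adjusted = []
    for row in memberships:
        weights = [u ** gamma for u in row]
        total = sum(weights)
        adjusted.append(tuple(
            sum(w * v[k] for w, v in zip(weights, centroids)) / total
            for k in range(dim)
        ))
    return adjusted


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlation_score(points, centroids, m=2.0, gamma=2.0):
    """Correlate pairwise point distances with adjusted-centroid distances."""
    u = fcm_memberships(points, centroids, m)
    adj = adjusted_centroids(u, centroids, gamma)
    pairs = list(combinations(range(len(points)), 2))
    actual = [euclid(points[i], points[j]) for i, j in pairs]
    implied = [euclid(adj[i], adj[j]) for i, j in pairs]
    return pearson(actual, implied)


# Toy data: three well-separated blobs of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]
three = [(0.5, 0.5), (10.5, 0.5), (5.5, 9.5)]  # one centroid per blob
two = [(5.5, 0.5), (5.5, 9.5)]                 # merges the two lower blobs
for name, cents in (("c=2", two), ("c=3", three)):
    print(name, round(correlation_score(points, cents), 3))
```

In this toy setup the centroids are supplied by hand rather than fitted, so the comparison only illustrates how a correlation-style score can rank candidate numbers of clusters. Ranking all candidates by such a score, rather than reporting only the maximum, is one way to read the "secondary options" idea mentioned in the abstract.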

## 1 Introduction

Cluster analysis is an unsupervised learning tool in machine learning that is widely used in various areas, including business, pattern recognition, data mining, medical diagnosis, and image processing, among others. It relies on the inherent properties, patterns, or similarities of objects to reveal meaningful information. The aim is to identify natural groupings within a dataset that are not initially apparent and without prior knowledge of the groups. There are several clustering algorithms, mainly categorized as centroid-based clustering (such as K-means, K-medoids, K-medians, and fuzzy c-means \[FCM\]), hierarchical clustering (including single linkage, complete linkage, group average agglomerative, and Ward’s criterion), density-based clustering (such as DBSCAN, DENCLUE, and OPTICS), probabilistic clustering (EM), grid-based clustering (including CLIQUE, MAFIA, ENCLUS, and OptiGrid), and 

... (truncated, 67 KB total)
Resource ID: 5ea1633005740b6f | Stable ID: Nzc4M2FlYj