Systemic Risk in AI Development

paper

2023·arXiv·arxiv.org/abs/2308.14785

Authors

Nathakhun Wiroonsri·Onthada Preedasawakul

Credibility Rating

3/5

Good(3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

This appears to be a machine learning paper on cluster validity indexes rather than an AI safety resource; the title and content are misaligned, suggesting potential metadata error or incorrect classification.

Paper Details

Citations

0 influential

Year

2026

Methodology

book-chapter

Metadata

arxiv preprint · reference

Abstract

The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has been overlooked by most of the existing works in this area. In this study, we introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri-Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. We evaluate and compare the performance of our index with several existing indexes, including Xie-Beni, Pakhira-Bandyopadhyay-Maulik, Tang, Wu-Li, generalized C, and Kwon2. We conduct this evaluation on four types of datasets: artificial datasets, real-world datasets, simulated datasets with ranks, and image datasets, using the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not all, of these indexes in terms of accurately detecting the optimal number of clusters and providing accurate secondary options. Moreover, our index remains effective even when the fuzziness parameter $m$ is set to a large value. Our R package called UniversalCVI used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.

Cited by 1 page

Page	Type	Quality
AI Risk Interaction Network Model	Analysis	64.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 60 KB

[2308.14785] A correlation-based fuzzy cluster validity index with secondary options detector

 AMS 2010 subject classifications: Primary 62H30; Secondary 68T10.

 
 
 Nathakhun Wiroonsri and Onthada Preedasawakul 
 Mathematics and Statistics with Applications Research Group (MaSA) 
 Department of Mathematics, King Mongkut’s University of Technology Thonburi
 This author is financially supported by National Research Council of Thailand (NRCT), Grant number: N42A660991 (2023).
 Email: nathakhun.wir@kmutt.ac.th
 Email: o.preedasawakul@gmail.com
 

 
 Abstract

 The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be chosen as the final number of clusters. This aspect has been overlooked by most of the existing works in this area. In this study, we introduce a correlation-based fuzzy cluster validity index known as the Wiroonsri–Preedasawakul (WP) index. This index is defined based on the correlation between the actual distance between a pair of data points and the distance between adjusted centroids with respect to that pair. We evaluate and compare the performance of our index with several existing indexes, including Xie–Beni, Pakhira–Bandyopadhyay–Maulik, Tang, Wu–Li, generalized C, and Kwon2. We conduct this evaluation on four types of datasets: artificial datasets, real-world datasets, simulated datasets with ranks, and image datasets, using the fuzzy c-means algorithm. Overall, the WP index outperforms most, if not all, of these indexes in terms of accurately detecting the optimal number of clusters and providing accurate secondary options. Moreover, our index remains effective even when the fuzziness parameter $m$ is set to a large value. Our R package called UniversalCVI used in this work is available at https://CRAN.R-project.org/package=UniversalCVI.
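 The correlation idea in the abstract can be illustrated with a minimal sketch. This is not the WP index itself: the abstract does not define the "adjusted centroids" or how fuzzy memberships enter, so the code below assumes a hard partition and plain cluster means instead. All function names (`centroids_from_labels`, `pearson`, `correlation_score`) and the toy data are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def centroids_from_labels(points, labels):
    """Return {label: mean point} for a hard partition (assumption: no fuzzy memberships)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {
        lab: tuple(sum(coord) / len(members) for coord in zip(*members))
        for lab, members in groups.items()
    }


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_score(points, labels):
    """Correlate each pair's data distance with the distance between
    the centroids of the clusters the two points are assigned to."""
    cents = centroids_from_labels(points, labels)
    data_d, cent_d = [], []
    for i, j in combinations(range(len(points)), 2):
        data_d.append(math.dist(points[i], points[j]))
        cent_d.append(math.dist(cents[labels[i]], cents[labels[j]]))
    return pearson(data_d, cent_d)


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_score = correlation_score(points, good_labels)
bad_score = correlation_score(points, bad_labels)
```

 Under these assumptions, a partition that respects the data's geometry should give a higher correlation than one that mixes the groups, which is the intuition behind using such a score to compare candidate numbers of clusters.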

 
 
 Keywords: Cluster analysis, CRAN, fuzzy c-means (FCM), image processing, ranking, R package, sub-optimal.

 
 
 
 1 Introduction

 
 Cluster analysis is an unsupervised learning tool in machine learning that is widely used in various areas, including business, pattern recognition, data mining, medical diagnosis, and image processing, among others. It relies on the inherent properties, patterns, or similarities of objects to reveal meaningful information. The aim is to identify natural groupings within a dataset that are not initially apparent and without prior knowledge of the groups. There are several clustering algorithms, mainly categorized as centroid-based clustering (such as K-means, K-medoids, K-medians, and fuzzy c-means [FCM]), hierarchical clustering (including single linkage, complete linkage, group average agglomerative, and Ward’s criterion), density-base

... (truncated, 60 KB total)

Resource ID: 5ea1633005740b6f | Stable ID: sid_XT1wfYHgUQ