An Intelligent System for Detecting Duplicates and Anomalies in Small Business Databases
Keywords:
Duplicate detection, anomaly detection, data cleaning, small business data quality, AI system
Abstract
This paper introduces an AI-based system designed to improve the quality of small business databases by automatically detecting and correcting duplicate and anomalous records. The system combines fuzzy string matching with machine learning algorithms (such as Isolation Forest and DBSCAN) to identify inconsistencies accurately and efficiently. Applied to a real-world dataset of over 50,000 entries, the system achieved 92% precision in duplicate detection and isolated over 500 anomalous transactions. These results demonstrate the system’s practical value in enhancing decision-making and operational reliability for small enterprises.
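To make the fuzzy string matching step concrete, the following is a minimal sketch of pairwise duplicate detection. It is an illustration under stated assumptions, not the system's actual implementation: it uses the `SequenceMatcher` ratio from Python's standard library as a stand-in for whichever similarity measure the system uses, and the helper names (`normalize`, `similarity`, `find_duplicate_pairs`), the 0.85 threshold, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not affect the score."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicate_pairs(records, threshold=0.85):
    """Return (i, j, score) for every pair of records scoring at or above threshold.

    The threshold is an assumed value chosen for illustration.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical customer names, invented for illustration only.
customers = [
    "Acme Hardware Ltd",
    "ACME  Hardware Ltd.",
    "Blue Door Bakery",
    "Riverside Auto Repair",
]

duplicates = find_duplicate_pairs(customers)
for i, j, score in duplicates:
    print(f"possible duplicate: {customers[i]!r} ~ {customers[j]!r} (score {score})")
```

Comparing every pair of records grows quadratically with the number of records, so a dataset of tens of thousands of entries would likely need a preliminary grouping step (for example, comparing only records that share a postcode or a name prefix) before applying a comparison like this one.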
