Application of data mining techniques for avoiding underestimation of an event

  • Avijit Kumar Chaudhuri
  • Dr. Anirban Das
  • Dr. Deepankar Sinha
  • Dr. Dilip K. Banerjee


Medical records comprise varied data types, and artificial intelligence and data mining techniques (DMTs) are useful for drawing insights and patterns from them. Several scholars claim that there is no universal way of addressing diagnosis issues and that a mixed model is desirable to resolve these concerns. In this paper, the authors compare the proven approaches and propose a framework that integrates the findings from various techniques to avoid Type 1 and Type 2 errors. The dataset chosen for this purpose includes medical data on HPV disease. Two variants of the dataset were used to predict the disease: the disease and treatment dataset, and the features found significant by an ensemble method, the random forest. The results show that traditional methods such as Logistic Regression (LR) performed better with the features found significant using Random Forest (RF). However, this approach fails when the dichotomy of the data (i.e., disease or no disease) is not distinct. Decision Tree (DT) analysis shows consistent performance across all variants of the dataset chosen in this paper. The paper suggests that an amalgamation of association rules and a prediction approach (with or without integration) provides higher accuracy.

Keywords: Data mining techniques; cervical cancer; Type 1 and Type 2 errors; integrated approach; underestimation


