Relative analysis of Ontology based mining and mining using side-information

  • Atiya Kazi University of Pune
  • Priyanka Bandagale
Keywords: clustering, ontology, side-information

Abstract

Text mining applications generally disregard the side-information contained within a text document, even though it can enhance the overall clustering process. To overcome this deficiency, the proposed algorithm works in two phases. In the first phase, it clusters the data together with the side-information by combining classical partitioning algorithms with probabilistic models, which is expected to improve the effectiveness of clustering. The clusters thus generated can also be used as a training model to help solve the classification problem. In the second phase, a similarity-based distance calculation algorithm, which makes use of two shared word spaces from the DISCO ontology, is employed to improve the clustering approach. This pre-clustering technique calculates the similarity between terms using the cosine distance method and generates clusters based on a threshold. Including ontology in the pre-clustering phase, alongside the side-information, is intended to produce more coherent clusters.
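The threshold-based pre-clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the grouping rule, so a greedy single-pass assignment is assumed here, and the toy co-occurrence vectors, the names `cosine_similarity`, `precluster`, and `TOY_VECTORS`, and the threshold value are all hypothetical. In the described approach the term vectors would come from a DISCO word space, which is not queried in this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def precluster(term_vectors, threshold):
    """Greedy single-pass grouping (an assumed rule, not from the paper).

    A term joins the first cluster whose seed term has cosine similarity
    >= threshold with it; otherwise the term seeds a new cluster.
    """
    clusters = []  # list of (seed_term, member_terms)
    for term, vec in term_vectors.items():
        for seed, members in clusters:
            if cosine_similarity(vec, term_vectors[seed]) >= threshold:
                members.append(term)
                break
        else:
            # No existing seed was similar enough: start a new cluster.
            clusters.append((term, [term]))
    return [members for _, members in clusters]

# Hypothetical co-occurrence vectors standing in for a DISCO word space.
TOY_VECTORS = {
    "car":   {"drive": 3.0, "road": 2.0},
    "truck": {"drive": 2.0, "road": 3.0},
    "apple": {"fruit": 4.0, "eat": 1.0},
    "pear":  {"fruit": 3.0, "eat": 2.0},
}

if __name__ == "__main__":
    for cluster in precluster(TOY_VECTORS, threshold=0.8):
        print(cluster)
```

Raising the threshold yields more, smaller clusters, and lowering it merges loosely related terms, so the value would need tuning against the actual word space.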

Published
2018-03-21
How to Cite
Kazi, A., & Bandagale, P. (2018). Relative analysis of Ontology based mining and mining using side-information. Asian Journal For Convergence In Technology (AJCT) ISSN -2350-1146, 2(2). Retrieved from http://www.asianssr.org/index.php/ajct/article/view/167
Section
Article
