Friday, March 29, 2019

Privacy-handling Techniques and Algorithms for Data Mining

VIVEK UNIYAL

Data mining can extract previously unknown patterns from vast collections of data. Networking, hardware and software technologies are growing rapidly, and this is driving rapid growth in the amount of data being collected. Organizations hold huge amounts of data from many heterogeneous data sources, and that data includes private and sensitive information about individuals. Data mining can extract novel patterns from such data, and those patterns can be used for decision making in various domains. However, data mining output may also reveal sensitive, private or personal information about a particular person. Such information can be misused and can harm the data owner. In distributed environments, privacy is therefore becoming an important issue in many data mining applications. Privacy-preserving data mining (PPDM) techniques offer a new way to address these issues. With PPDM, valid data mining results can be obtained without learning the underlying data values.

In this dissertation we introduce two algorithms for privacy handling. The first is k-anonymization. Under k-anonymization, the information about any individual in a released dataset cannot be distinguished from the information about at least k-1 other individuals who also appear in the release. To achieve k-anonymization, some values in the database must be suppressed or generalized. The second is l-diversity. K-anonymity addresses the record-linkage attack model, while l-diversity addresses the attribute-linkage attack model.

KEYWORDS: Data Mining, Advantages and Disadvantages of Data Mining, Privacy Handling, K-anonymization Algorithm, L-diversity.

ACKNOWLEDGEMENTS
I wish to take this opportunity to express my deep gratitude to all the people who have extended their cooperation in various ways during my dissertation.
It is my pleasure to acknowledge the help of all those individuals. First of all, I would like to express my deepest gratitude to my dissertation supervisor, Mr. Govind Kamboj, without whom none of this would have been possible. He constantly provided essential direction and advice during the work, and I am grateful to him for shaping this dissertation and guiding it to completion. Without his supervision and support, this work would not have been completed successfully on time.

I am grateful to the President, Vice President, Chancellor, Vice Chancellor and Head of the Department of Graphic Era University for providing an excellent working environment with ample facilities and academic freedom. I would also like to thank the teaching and non-teaching staff for their valuable support during my M.Tech.

Last but not least, I am grateful to all my teachers and friends for their cooperation and encouragement throughout this task.

(Vivek Uniyal)
M.Tech (Computer Science Engineering)

TABLE OF CONTENTS
CANDIDATE'S DECLARATION
ABSTRACT
ACKNOWLEDGEMENT
LIST OF ABBREVIATIONS
LIST OF FIGURES
1. INTRODUCTION
1.1 Problem Statement
1.2 Overview
1.3 Advantages of data mining
1.4 Disadvantages of data mining
1.5 Why privacy handling is necessary in data mining
1.6 Motivation
1.7 Organization
2. BACKGROUND AND LITERATURE SURVEY
3. METHODS AND METHODOLOGIES
3.1 Randomization method
3.2 Group-based anonymization methods
3.2.1 K-anonymity framework
3.2.2 Personalized privacy preservation
3.2.3 Utility-based privacy preservation
3.2.4 Sequential releases
3.2.5 The l-diversity method
3.3 Distributed privacy-preserving data mining
3.4 Detailed description of k-anonymity and l-diversity
3.4.1 Data collection and data publishing
3.4.2 Privacy data publishing
3.4.3 Algorithm of k-anonymity
3.4.4 l-diversity
3.4.1.1 Lack of diversity
3.4.1.2 Strong background knowledge
4. EXPERIMENTAL RESULTS
4.1 Introduction
4.2 Experimental result
4.2.1 Effect of proposed k-anonymity and l-diversity
5. CONCLUSION AND SCOPE FOR FUTURE WORK
5.1 Conclusion
5.2 Scope for Future Work
PUBLICATION OUT OF THIS WORK
REFERENCES

LIST OF ABBREVIATIONS
PPDP Privacy-preserving data publishing
PPDM Privacy-preserving data mining
QID Quasi-identifier

LIST OF FIGURES
Figure 1.1 Data mining as a step in the process of knowledge discovery
Figure 1.2 Typical data mining system architecture
Figure 1.3 Record owner, data collection and data publishing
Figure 1.4 Hospital database
Figure 1.5 Taxonomy tree for JOB, SEX, AGE (QID attributes)
Figure 1.6 Hospital table: original records in the database
Figure 1.7 Table of sensitive records (publishing data)
Figure 1.8 Table of external data (ppt table)
Figure 1.9 Resulting data after linking the sensitive and ppl tables
Figure 1.10 Research table (generalized, k-anonymous published data)
Figure 1.11 Extended table (for linking, like a generalized voter list)
Figure 1.12 Checking k-anonymity
Figure 1.13 Result of linking the research table to the extended table
Figure 1.14 Hospital original data records (project)
Figure 1.15 Comparing the un-generalized published and extended data tables
Figure 1.16 Comparing generalized extended and sensitive table records
Figure 1.17 Table for k-anonymity and l-diversity
Figure 1.18 Plotting exact l-value and distinct l-diversity value in Weka
Figure 1.19 Plotting exact l-value and entropy l-diversity value in Weka
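The abstract defines k-anonymity in terms of groups of at least k indistinguishable records, and the list of figures refers to distinct and entropy l-diversity values. Below is a minimal Python sketch of how these measures can be computed for a small table. It is not the dissertation's implementation and does not reproduce its Weka experiments. The records, the one-level JOB taxonomy (`JOB_PARENT`) and the 10-year age intervals are invented for illustration, and the function names are hypothetical. Only generalization is shown; suppression is omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records: (job, sex, age, disease). JOB, SEX and AGE act as
# quasi-identifiers (QID); disease is the sensitive attribute. Invented data.
RECORDS = [
    ("Engineer", "M", 35, "Hepatitis"),
    ("Lawyer",   "M", 38, "HIV"),
    ("Engineer", "M", 33, "HIV"),
    ("Writer",   "F", 24, "Flu"),
    ("Dancer",   "F", 27, "HIV"),
    ("Writer",   "F", 22, "Hepatitis"),
]

# Hypothetical one-level taxonomy for JOB (an assumption, not the dissertation's tree).
JOB_PARENT = {"Engineer": "Professional", "Lawyer": "Professional",
              "Writer": "Artist", "Dancer": "Artist"}

def generalize(record):
    """Generalize the QIDs: job -> parent category, age -> 10-year interval."""
    job, sex, age, disease = record
    low = (age // 10) * 10
    return (JOB_PARENT[job], sex, f"[{low}-{low + 9}]"), disease

def equivalence_classes(records):
    """Group sensitive values by their generalized QID combination."""
    groups = defaultdict(list)
    for rec in records:
        qid, sensitive = generalize(rec)
        groups[qid].append(sensitive)
    return groups

def k_value(groups):
    """Largest k for which the table is k-anonymous: the smallest class size."""
    return min(len(values) for values in groups.values())

def distinct_l(groups):
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    return min(len(set(values)) for values in groups.values())

def entropy_l(groups):
    """Entropy l-diversity requires entropy >= log(l) in every class, so the
    largest such l is exp(minimum class entropy)."""
    def entropy(values):
        n = len(values)
        return -sum((c / n) * math.log(c / n) for c in Counter(values).values())
    return math.exp(min(entropy(values) for values in groups.values()))

if __name__ == "__main__":
    groups = equivalence_classes(RECORDS)
    for qid, values in groups.items():
        print(qid, values)
    print("k =", k_value(groups))
    print("distinct l =", distinct_l(groups))
    print(f"entropy l = {entropy_l(groups):.2f}")
```

In this toy table, both equivalence classes contain three records, so k = 3. The first class holds only two distinct diseases, and HIV appears twice, so distinct l = 2 and entropy l is about 1.89. The table therefore resists record linkage for k = 3. However, an attacker who links someone to the first class still learns that HIV is likely, and that attribute-linkage risk is what l-diversity measures.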
