Growing network complexity, wider Internet access, increased information
sharing, and the Internet's expanding influence have made security and
privacy a major concern for research. “Data mining is a technique for extracting knowledge
automatically and intelligently from huge amount of data. Individual Sensitive
information compromising the individual’s right to privacy is also disclosed
during the process.” Privacy preserving data mining (PPDM) refers to protecting
the privacy of personal data or sensitive information without sacrificing the
usefulness of the data.

Privacy preserving data mining has drawn growing
attention in recent years with the rapid development of
Internet, data processing and data storage technologies. An individual's
privacy is not considered violated unless that person feels his or her private
information is being used unfavorably. However, once personal information is
disclosed, no one can prevent it from being misused. Several
methods have been put forward to address privacy concerns, but this branch of
research is still in its infancy.

“A bunch of techniques and methods have been
developed for privacy preserving data mining that allows one to extract
required relevant knowledge from huge amount of data, and hiding sensitive data
from disclosure or inference at the same time. The ultimate goal of PPDM is to
develop an efficient algorithm that meets the following requirements.”


PPDM research follows three main approaches:


Data Hiding: Sensitive data such as name, address, and contact number
are replaced, blocked, or trimmed from the database. This prevents users of
the data from compromising other individuals' personal information.

Rule Hiding: Sensitive information or rules extracted by the data mining
process are blocked from use, so the private information uncovered by mining
cannot be exploited.

Secure Multiparty Computation (SMC): The data is encrypted before being
shared for computation, so that it is not leaked.
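
Of the three approaches above, data hiding is the simplest to illustrate. The sketch below is only a minimal example under stated assumptions: the field names, the "REDACTED" placeholder and the rule of keeping the last two digits are hypothetical choices made for illustration, not part of any particular PPDM algorithm.

```python
def hide_sensitive_fields(record):
    """Return a copy of `record` with identifying fields replaced, blocked or trimmed.

    Field names ("name", "address", "contact") are assumptions for this sketch.
    """
    hidden = dict(record)  # work on a copy so the original record is untouched

    # Replace: swap the real name for a neutral placeholder.
    hidden["name"] = "REDACTED"

    # Block: substitute an unknown marker for the address.
    hidden["address"] = "?"

    # Trim: keep only the last two digits of the contact number.
    digits = str(record["contact"])
    hidden["contact"] = "*" * (len(digits) - 2) + digits[-2:]

    return hidden


record = {
    "name": "Alice Rao",
    "address": "12 Hill Road",
    "contact": "5550123",
    "diagnosis": "flu",
}
released = hide_sensitive_fields(record)
print(released)
```

The non-identifying attribute (here "diagnosis") is left intact so that the released record is still useful for mining.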


Privacy preserving data mining techniques are classified
along the following dimensions:

Data distribution

Data or rule hiding

Data modification

Data mining algorithm

Privacy preservation


Data distribution: On the basis of data
distribution, PPDM algorithms are categorized as centralized or
distributed. In a centralized database system, all the data is stored in a
single database, while in a distributed database the data may reside in
different databases at different locations. Distributed data is further
classified as horizontally or vertically distributed. In the
horizontal approach, different records (rows) reside at different
locations, while in the vertical approach, the values of different attributes
(columns) for the same records reside at different locations.
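
The difference between horizontal and vertical distribution can be sketched with a small in-memory table. This is an illustrative example only: the table contents, the "id" join key and the two helper functions are assumptions introduced here.

```python
table = [
    {"id": 1, "age": 34, "city": "Pune", "income": 52000},
    {"id": 2, "age": 45, "city": "Delhi", "income": 61000},
    {"id": 3, "age": 29, "city": "Mumbai", "income": 48000},
    {"id": 4, "age": 51, "city": "Chennai", "income": 75000},
]


def horizontal_split(rows, n_sites):
    """Each site receives a disjoint subset of the records, with all attributes."""
    return [rows[i::n_sites] for i in range(n_sites)]


def vertical_split(rows, key, attribute_groups):
    """Each site receives every record, but only its own attributes plus the join key."""
    return [
        [{key: r[key], **{a: r[a] for a in group}} for r in rows]
        for group in attribute_groups
    ]


h_sites = horizontal_split(table, 2)
v_sites = vertical_split(table, "id", [["age", "city"], ["income"]])

print([r["id"] for r in h_sites[0]])  # record ids held by the first horizontal site
print(v_sites[1][0])                  # first record as seen by the second vertical site
```

In the horizontal case every site sees complete records for some individuals. In the vertical case every site sees all individuals but only part of each record.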

Data or rule hiding: On the basis of the purpose of
hiding, PPDM algorithms are classified as data hiding or rule hiding. In the
data hiding approach, sensitive data such as name, address, and contact number
are replaced, blocked, or trimmed from the database. This prevents
users of the data from compromising other individuals' personal information.
Most procedures rely on data hiding, modifying the data so that specific
sensitive patterns are not revealed.

Data modification: The data is modified in order to attain a high level
of privacy. It can be modified by
perturbation, blocking, merging, aggregation, sampling, swapping, or a
combination of any of these techniques.

Perturbation: Replacing the original value with a new value, for example
replacing 1 with 0 or 0 with 1, i.e. adding some noise.

Blocking: Preventing data from being disclosed by substituting
‘?’ for the current attribute value.

Aggregation or merging: Combining several detailed values into a
coarser category.

Swapping: Interchanging the values of an attribute between
individual records.

Sampling: Releasing data for only a sample of the
population.
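
The five modification techniques above can each be sketched in a few lines. The function names, the ten-year age bands and the fixed random seed are assumptions made for this illustration. Real perturbation schemes usually add calibrated random noise rather than flipping a single bit.

```python
import random


def perturb_bit(value):
    """Perturbation: replace a binary value with its opposite (1 -> 0, 0 -> 1)."""
    return 1 - value


def block(record, attribute):
    """Blocking: substitute '?' for the current attribute value."""
    out = dict(record)
    out[attribute] = "?"
    return out


def aggregate_age(age, width=10):
    """Aggregation: map an exact age to a coarser band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def swap_values(records, attribute, i, j):
    """Swapping: interchange one attribute's values between records i and j."""
    out = [dict(r) for r in records]
    out[i][attribute], out[j][attribute] = out[j][attribute], out[i][attribute]
    return out


def sample_records(records, k, seed=0):
    """Sampling: release only k randomly chosen records."""
    return random.Random(seed).sample(records, k)


people = [
    {"id": 1, "age": 34, "zip": "A"},
    {"id": 2, "age": 45, "zip": "B"},
    {"id": 3, "age": 29, "zip": "C"},
]
print(perturb_bit(1))
print(block(people[0], "zip"))
print(aggregate_age(people[0]["age"]))
print(swap_values(people, "zip", 0, 2))
print(sample_records(people, 2))
```

Each helper returns new objects instead of changing its input, so the techniques can be combined in any order on the same source data.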


Data mining algorithm: The data mining algorithms for which the data is
modified provide the basis for analyzing and designing data hiding
algorithms. Some of the important algorithms are:


Decision tree inducers

Association rule mining

Clustering algorithms

Rough sets

Bayesian networks

At present, PPDM techniques mainly use
classification, association rule mining and clustering. Association rule mining
refers to detecting rules that relate items which frequently occur together.
Clustering analysis is the task of dividing a data set into different groups.
Classification refers to finding a set of models that predict the class of an
item on the basis of the input provided.
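
As a small illustration of association rule mining, the sketch below computes the support and confidence of a single rule. These two measures are standard in the association rule literature, but they are not named in the text above, so their use here is an assumption, as are the toy transactions.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

A rule hiding method would typically modify the transactions until the support or confidence of a sensitive rule falls below the mining threshold, so that the rule is no longer reported.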


Privacy preservation: “The selective modification of
data is done using PPDM technique and is required to achieve higher
utility for the modified data given that the privacy is not lost.” The techniques
used with centralized data distributions include sanitization, blocking,
distortion and generalization. Secure multiparty computation deals with
computing a function over any set of inputs, provided that each participant
holds one input and no private information is disclosed to any other
participant during the computation. For data hiding, data distortion is used
most often, followed by data sanitization and then generalization.
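
The idea behind secure multiparty computation can be sketched with a secure sum, in which each participant holds one private input and only the total is revealed. Note that this sketch uses additive secret sharing rather than encryption, and it simulates all parties inside one process. The modulus, the function names and the assumption that every party follows the protocol honestly are choices made for this illustration only.

```python
import secrets

MODULUS = 2**61 - 1  # assumed to be larger than any sum of inputs


def make_shares(value, n_parties, modulus=MODULUS):
    """Split `value` into n random-looking shares that add up to it modulo `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_inputs, modulus=MODULUS):
    """Compute the sum of non-negative inputs without exposing any single input."""
    n = len(private_inputs)

    # share_matrix[i][j] is the share that party i sends to party j.
    share_matrix = [make_shares(v, n, modulus) for v in private_inputs]

    # Each party j adds up the shares it received and publishes only that subtotal.
    subtotals = [
        sum(share_matrix[i][j] for i in range(n)) % modulus for j in range(n)
    ]

    # The published subtotals combine into the true total.
    return sum(subtotals) % modulus


total = secure_sum([10, 20, 30])
print(total)
```

Any single share or subtotal on its own is a uniformly random number, so no party learns another party's input from what it receives.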

