Skip to main content

Epistemic Privacy

Intelligent Information Systems project : Epistemic Privacy

Today, privacy protection has become a popular and even fashionable area of database research. However, the current state of scientific knowledge still does not allow the implementation of a comprehensive privacy solution that guarantees provable protection. In fact, the notion of privacy itself has many definitions and interpretations, some focused on theoretical soundness, others on practical usefulness. This work attempts to reduce the gap between these two aspects by exploring more flexible yet sound privacy definitions.

Our novel definition of privacy is given in the framework of offline (retroactive) database query auditing. Given information about the database, a description of sensitive data, and assumptions about users’ prior knowledge, our goal is to determine if answering a past user’s query could have led to a privacy breach. According to our definition, an audited fact A is private, given the disclosure of fact B, if no user can gain confidence in A by learning B, subject to prior knowledge constraints. Privacy is not violated if the disclosure of B causes a loss of confidence in A. The new notion of privacy is formalized using the well-known semantics for reasoning about knowledge, where logical facts correspond to sets of possible worlds (databases) that satisfy these facts. Database users are modelled as either possibilistic agents whose knowledge is a set of possible worlds, or as probabilistic agents whose knowledge is a probability distribution on possible worlds.

We study the properties of the new privacy notion, show its relationship with the conventional approach, and derive criteria that allow the auditor to test privacy efficiently in some important cases. In particular, we prove characterization theorems for the possibilistic case, and study in depth the probabilistic case under the assumption that all database records are considered a-priori independent by the user, as well as under more relaxed (or absent) prior-knowledge assumptions, e.g. log-supermodularity and log-submodularity. In the probabilistic case we show that for certain families of distributions there is no efficient algorithm to test whether a fact A is private given the disclosure of a fact B, assuming P ≠ NP. Nevertheless, for many interesting families, such as the family of product distributions, we obtain algorithms that are efficient both in theory and in practice.

It turns out that taking advantage of the confidence gain vs. confidence loss distinction (confidence in A while learning B) yields a remarkable increase in the flexibility of query auditing. In the probabilistic case, this corresponds to enforcing the “semiperfect secrecy” inequality Pr[A | B] ≤ Pr[A]. Let us illustrate its flexibility with a simple example of Alice (a user) and Bob (a patient). The hospital’s database has two records: X = “Bob is HIV-positive” and Y = “Bob had blood transfusions.” The sensitive property A is the presence of X, i.e. that Bob is HIV-positive. The property B that Alice queries and learns is “X implies Y”, i.e. that “if Bob is HIV-positive, then he had blood transfusions.” We make no constraints on Alice’s prior knowledge distribution, other than a nonzero probability of the actual database. Could the disclosure of B violate the privacy of A? Look at the following possible-worlds table:

 Y  is present  Y  is not present
 X  is present A  is true A  is true
 X  is not present A  is false A  is false

For Alice, learning B has the effect of ruling out the black cell, while leaving the white cells untouched. Whatever the cells’ prior probabilities are, the odds of A can only go down: Pr[A | B] ≤ Pr[A]. Thus, A is private with respect to B, even though A and B share a critical record X in the sense of Miklau and Suciu [1], and regardless of any possible dependence between the records. (A record R is critical for a query Q iff there is a database D such that Q(D − {R}) ≠ Q(D).) Note that if Bob proactively tells Alice “If I am HIV-positive, then I had blood transfusions,” a privacy breach of A may occur, because Alice learns more than just B.

References

  1. Gerome Miklau and Dan Suciu. A Formal Analysis of Information Disclosure in Data Exchange. Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, June 13-18, 2004, pp. 575-586.

Team: Alexandre Evfimievski, Ronald Fagin, David Woodruff

Papers

  1. Alexandre Evfimievski, Ronald Fagin, David Woodruff.  download content in pdf format "Epistemic Privacy In Proceedings of the 27th ACM Symposium on Principles of Database Systems (PODS 2008), Vancouver, BC, Canada, June 9-11, 2008, pages 171-180. [Slides]

[ Intelligent Information Systems Home Page | Healthcare Informatics | CS at ARC ]