[Fundamentals] Differential Privacy: Basic Background Knowledge


onion routing, online tracking, privacy policies, genetic privacy, social networks

Overview of traditional privacy protection methods

k-anonymity
k-map
l-diversity
δ-presence

Why differential privacy is awesome

We no longer need attack modeling.
$\quad$ We protect any kind of information about an individual. It doesn't matter what the attacker wants to do.
$\quad$ It works no matter what the attacker knows about our data.
We can quantify the privacy loss.
$\quad$ When we use differential privacy, we can quantify the greatest possible information gain by the attacker.
We can compose multiple mechanisms.
$\quad$ Composition is a way to stay in control of the level of risk as new use cases appear and processes evolve.

Quantify the attacker's knowledge (or, what it means to quantify and bound information gain)

In the attacker's model of the world, the actual database $\mathcal D$ can be either $D$ or $D'$.
$\mathbb P[\mathcal D=D]$ is the attacker's initial suspicion that the database is $D$. Similarly, their initial suspicion that it is $D'$ is $\mathbb P[\mathcal D=D'] = 1-\mathbb P[\mathcal D=D]$.
The updated suspicion $\mathbb P[\mathcal D=D \mid A(\mathcal D)=\mathcal O]$ is the attacker's belief after seeing the mechanism $A$ return output $\mathcal O$.
[Figure: bounds on the updated suspicion as a function of the initial suspicion]
With differential privacy, the updated suspicion is never too far from the initial suspicion. The black line is what happens if the attacker's suspicion is not updated at all. The blue lines are the lower and upper bounds on the updated suspicion: it can be anywhere between the two.
$\,$
Proof :
$\,$
Bayes' rule: $\mathbb P[\mathcal D=D \mid A(\mathcal D)=\mathcal O] = \frac{ \mathbb P[\mathcal D=D] \cdot \mathbb P[A(\mathcal D)=\mathcal O \mid \mathcal D=D] }{ \mathbb P[A(\mathcal D)=\mathcal O] }$
$\,$
Dividing this by the same expression for $D'$ cancels the common denominator: $\frac{ \mathbb P[\mathcal D=D \mid A(\mathcal D)=\mathcal O] }{ \mathbb P[\mathcal D=D' \mid A(\mathcal D)=\mathcal O] } = \frac{ \mathbb P[\mathcal D=D] \cdot \mathbb P[A(\mathcal D)=\mathcal O \mid \mathcal D=D] }{ \mathbb P[\mathcal D=D'] \cdot \mathbb P[A(\mathcal D)=\mathcal O \mid \mathcal D=D'] } = \frac{ \mathbb P[\mathcal D=D] \cdot \mathbb P[A(D)=\mathcal O] }{ \mathbb P[\mathcal D=D'] \cdot \mathbb P[A(D')=\mathcal O] }$
$\,$
Differential privacy : $e^{-\varepsilon} \le \frac{ \mathbb P[A(D)=\mathcal O] }{ \mathbb P[A(D')=\mathcal O] } \le e^\varepsilon$
$\,$
$e^{-\varepsilon} · \frac{\mathbb{P}\left[\mathcal D=D\right]}{\mathbb{P}\left[\mathcal D=D'\right]} \le \frac{\mathbb{P}\left[\mathcal D=D\mid A(\mathcal D)=\mathcal O\right]}{\mathbb{P}\left[\mathcal D=D'\mid A(\mathcal D)=\mathcal O\right]} \le e^\varepsilon · \frac{\mathbb{P}\left[\mathcal D=D\right]}{\mathbb{P}\left[\mathcal D=D'\right]}$
$\,$
Replace $\mathbb P[\mathcal D=D']$ with $1-\mathbb P[\mathcal D=D]$ , do the same for $\mathbb{P}\left[\mathcal D=D' | A(\mathcal D)=\mathcal O\right]$ , and solve for $\mathbb{P}\left[\mathcal D=D | A(\mathcal D)=\mathcal O\right]$
$\,$
$\frac{\mathbb{P}\left[\mathcal D=D\right]}{e^{\varepsilon}+\left(1-e^{\varepsilon}\right) · \mathbb{P}\left[\mathcal D=D\right]} \leq \mathbb{P}\left[\mathcal D=D\mid A\left(\mathcal D\right)=\mathcal O\right] \leq \frac{e^{\varepsilon} · \mathbb{P}\left[\mathcal D=D\right]}{1+\left(e^{\varepsilon}-1\right)\cdot\mathbb{P}\left[\mathcal D=D\right]}$
$\,$
We can draw a generalization of this graph for various values of $\varepsilon$ :
[Figure: lower and upper bounds on the updated suspicion for various values of $\varepsilon$]
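The two bounds derived above are easy to evaluate numerically. The following is a minimal sketch; the function name `suspicion_bounds` and the chosen values of $\varepsilon$ are illustrative assumptions, not part of the original post.

```python
import math


def suspicion_bounds(prior, epsilon):
    """Return (lower, upper) bounds on the attacker's updated suspicion.

    prior   -- initial suspicion P[D = D], a number strictly between 0 and 1
    epsilon -- privacy parameter of the mechanism (epsilon >= 0)
    """
    e = math.exp(epsilon)
    lower = prior / (e + (1 - e) * prior)
    upper = e * prior / (1 + (e - 1) * prior)
    return lower, upper


if __name__ == "__main__":
    # Illustrative values only: a 50/50 prior and three choices of epsilon.
    for eps in (0.1, math.log(3), 5.0):
        lo, hi = suspicion_bounds(0.5, eps)
        print(f"epsilon={eps:.2f}: {lo:.3f} <= updated suspicion <= {hi:.3f}")
```

With a prior of 0.5 and $\varepsilon = \ln 3$, the bounds are 0.25 and 0.75; with $\varepsilon = 0$ both bounds collapse to the prior, which corresponds to the black line.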

The privacy loss random variable

Recall the proof above. We saw that $\varepsilon$ bounds the evolution of betting odds : $\frac{\mathbb{P}\left[D=D_1\mid A(D)=O\right]}{\mathbb{P}\left[D=D_2\mid A(D)=O\right]} \le e^\varepsilon · \frac{\mathbb{P}\left[D=D_1\right]}{\mathbb{P}\left[D=D_2\right]}$
Let us define:
$\mathcal{L}_{D_1,D_2}(O) = \ln\frac{ \, \frac{ \mathbb{P}\left[D=D_1\mid A(D)=O\right]}{\mathbb{P}\left[D=D_2\mid A(D)=O\right]} \,}{ \frac{\mathbb{P}\left[D=D_1\right]}{\mathbb{P}\left[D=D_2\right]} } = \ln\left(\frac{\mathbb{P}\left[A(D_1)=O\right]}{\mathbb{P}\left[A(D_2)=O\right]}\right)$

Intuitively, the privacy loss random variable is the actual $\varepsilon$ value for a specific output $O$ .
This is where the $\le$ in the definition $\mathbb{P}[A(D_1)=O] \le e^\varepsilon \cdot \mathbb{P}[A(D_2)=O]$ shows up: the inequality is strict ($<$) only when the output $O$ lies between $O=A(D_1)$ and $O=A(D_2)$, in which case $\mathcal{L}_{D_1,D_2}(O) \le \varepsilon$ also holds strictly ($<$).
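To make the privacy loss concrete, here is a small sketch that evaluates $\mathcal{L}_{D_1,D_2}(O)$ for several outputs. It assumes the Laplace mechanism with scale equal to sensitivity divided by $\varepsilon$, which the text above does not name explicitly; the true answers `mu1` and `mu2` are hypothetical values.

```python
import math


def laplace_privacy_loss(o, mu1, mu2, scale):
    """ln(p1(o) / p2(o)) for Laplace densities centred at mu1 and mu2.

    The 1 / (2 * scale) normalisers cancel, so only the exponents remain.
    """
    return (abs(o - mu2) - abs(o - mu1)) / scale


epsilon = math.log(3)
mu1, mu2 = 1000.0, 1001.0         # hypothetical true answers on D1 and D2
scale = abs(mu1 - mu2) / epsilon  # Laplace scale = sensitivity / epsilon

if __name__ == "__main__":
    for o in (998.0, 1000.0, 1000.25, 1000.5, 1000.75, 1001.0, 1003.0):
        loss = laplace_privacy_loss(o, mu1, mu2, scale)
        print(f"O={o}: privacy loss = {loss:+.4f} (epsilon = {epsilon:.4f})")
```

Under these assumptions the loss sits at $+\varepsilon$ or $-\varepsilon$ for outputs outside the interval between the two true answers, and lies strictly between those values inside it.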

$\delta$ in $(\varepsilon,\delta)$-DP

$\delta \,\,\,=\,\,\, \sum_{O} \mathbb{P}[A(D_1)=O] \, · \max\left(0, 1 -e^{\varepsilon-{\mathcal{L}_{D_1,D_2}(O)}}\right) \,\,\,=\,\,\, \mathbb{E}_{O\sim A(D_1)} \left[ \max\left(0, 1 - e^{\varepsilon-{\mathcal{L}_{D_1,D_2}(O)}}\right)\right]$

Intuitively, in $(\varepsilon,\delta)$ -DP, the $\delta$ is the area highlighted below.
[Figure: the highlighted area corresponding to $\delta$]
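Note: with Gaussian noise, if $\delta$ is understood only intuitively, as the probability that the definition of $(\varepsilon,0)$-DP (that is, traditional $\varepsilon$-DP) is violated, this amounts to choosing $\delta=\delta_1$.
In fact, $\delta_2$ is tighter. The figure above shows that $\delta_2<\delta_1$: in that figure, $\delta_1$ can also be seen as the area of a rectangle of length $\delta_1$ and width 1, which is clearly larger than the shaded area $\delta_2$.
The intuitive meaning of $\delta_1$ (the probability that the definition of traditional $\varepsilon$-DP is violated) is illustrated in the figure below:
[Figure: intuitive meaning of $\delta_1$]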
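The gap between the two readings of $\delta$ can be checked numerically for Gaussian noise. The sketch below rests on two assumptions: $\delta_1$ is taken to be $\mathbb{P}_{O\sim A(D_1)}[\mathcal{L}_{D_1,D_2}(O) > \varepsilon]$, and $\delta_2$ is taken to be the expectation in the formula above. The closed forms follow from placing the two Gaussians at $0$ and at the sensitivity; the function names are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_deltas(epsilon, sigma, sensitivity=1.0):
    """Return (delta_1, delta_2) for Gaussian noise of standard deviation sigma.

    delta_1 -- probability that the privacy loss exceeds epsilon
    delta_2 -- P[A(D1) in S_max] - e^epsilon * P[A(D2) in S_max]
    """
    a = sensitivity / (2.0 * sigma)
    b = epsilon * sigma / sensitivity
    delta_1 = normal_cdf(a - b)
    delta_2 = delta_1 - math.exp(epsilon) * normal_cdf(-a - b)
    return delta_1, delta_2


if __name__ == "__main__":
    d1, d2 = gaussian_deltas(epsilon=1.0, sigma=1.0)
    print(f"delta_1 = {d1:.4f}, delta_2 = {d2:.4f}")
```

For every $\varepsilon \ge 0$ the second value is the smaller one, which agrees with the statement that $\delta_2$ is tighter.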
Proof :
$\,$
The definition of $(\varepsilon,\delta)$-DP: $\mathbb{P}[A(D_1)\in S] \le e^\varepsilon\cdot\mathbb{P}[A(D_2)\in S]+\delta.$
Fix a mechanism $A$ and an $\varepsilon \ge 0$. The smallest $\delta$ that satisfies the definition is the largest gap over all possible sets of outputs $S$:
$\,$
$\delta = \max_{S} \left(\mathbb{P}[A(D_1)\in S] - e^\varepsilon\cdot\mathbb{P}[A(D_2)\in S]\right)$
$\,$
The set $S$ that maximizes the quantity above is:
$\,$
$S_{max} = \left\{O \mid \mathbb{P}[A(D_1)=O] > e^\varepsilon\cdot\mathbb{P}[A(D_2)=O]\right\} =\{ O \mid \, \mathcal{L}_{D_1,D_2}(O)>\varepsilon \,\, \}$
$\,$
So we have:
$\,$
$\begin{aligned} \delta &= \mathbb{P}[A(D_1)\in S_{max}] - e^\varepsilon\cdot\mathbb{P}[A(D_2)\in S_{max}] \\ &= \sum_{O\in S_{max}} \left(\mathbb{P}[A(D_1)=O] - e^\varepsilon\cdot\mathbb{P}[A(D_2)=O]\right) \\ &= \sum_{O\in S_{max}} \mathbb{P}[A(D_1)=O] \left(1 - \frac{e^\varepsilon}{e^{\mathcal{L}_{D_1,D_2}(O)}}\right) \end{aligned}$
$\,$
$\delta = \sum_{O} \mathbb{P}[A(D_1)=O] \, · \max\left(0, 1 -e^{\varepsilon-{\mathcal{L}_{D_1,D_2}(O)}}\right) \,\,\,=\,\,\, \mathbb{E}_{O\sim A(D_1)} \left[ \max\left(0, 1 - e^{\varepsilon-{\mathcal{L}_{D_1,D_2}(O)}}\right)\right]$
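The proof can be sanity-checked on a small discrete mechanism by comparing the privacy-loss formula with a brute-force maximum over every set $S$. This is a sketch under stated assumptions: the two output distributions are made-up numbers, every probability is assumed strictly positive, and the function names are illustrative.

```python
import math
from itertools import combinations


def delta_from_privacy_loss(p1, p2, epsilon):
    """Sum over outputs of P[A(D1)=O] * max(0, 1 - exp(epsilon - L(O)))."""
    total = 0.0
    for o, p in p1.items():
        loss = math.log(p / p2[o])  # privacy loss L_{D1,D2}(O)
        total += p * max(0.0, 1.0 - math.exp(epsilon - loss))
    return total


def delta_by_brute_force(p1, p2, epsilon):
    """Maximum of P[A(D1) in S] - e^epsilon * P[A(D2) in S] over all sets S."""
    outputs = list(p1)
    best = 0.0  # the empty set always gives a gap of 0
    for size in range(1, len(outputs) + 1):
        for subset in combinations(outputs, size):
            gap = (sum(p1[o] for o in subset)
                   - math.exp(epsilon) * sum(p2[o] for o in subset))
            best = max(best, gap)
    return best


# Hypothetical output distributions of A(D1) and A(D2) over three outputs.
P1 = {"a": 0.7, "b": 0.2, "c": 0.1}
P2 = {"a": 0.2, "b": 0.3, "c": 0.5}

if __name__ == "__main__":
    eps = math.log(2)
    print(delta_from_privacy_loss(P1, P2, eps))
    print(delta_by_brute_force(P1, P2, eps))
```

In this example only output `"a"` has a privacy loss above $\varepsilon = \ln 2$, so $S_{max} = \{a\}$ and both functions give $0.7 - 2 \cdot 0.2 = 0.3$ up to floating-point rounding.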

Composition

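$C$ is the algorithm that combines $A$ and $B$: $C(\mathcal D)=(A(\mathcal D), B(\mathcal D))$.
The output of this algorithm is a pair of outputs: $\mathcal O=(\mathcal O_A, \mathcal O_B)$.
If $A$ is $\varepsilon_A$-differentially private and $B$ is $\varepsilon_B$-differentially private, then $C$ is $(\varepsilon_A+\varepsilon_B)$-differentially private. More generally, if $A$ is $(\varepsilon_A,\delta_A)$-differentially private and $B$ is $(\varepsilon_B,\delta_B)$-differentially private, then $C$ is $(\varepsilon_A+\varepsilon_B, \delta_A+\delta_B)$-differentially private.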
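A quick way to see the pure $\varepsilon$ case is to compose two mechanisms whose output distributions can be written down exactly. The sketch below uses binary randomized response as a stand-in for $A$ and $B$, which is an assumption for illustration (the text does not name a specific mechanism), and it assumes the two mechanisms use independent randomness.

```python
import math
from itertools import product


def randomized_response(epsilon):
    """P[output | true bit] for binary randomized response with parameter epsilon."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {0: {0: keep, 1: 1.0 - keep}, 1: {0: 1.0 - keep, 1: keep}}


def compose(dist_a, dist_b):
    """Joint distribution of (O_A, O_B) when A and B use independent randomness."""
    return {
        bit: {
            (oa, ob): dist_a[bit][oa] * dist_b[bit][ob]
            for oa, ob in product(dist_a[bit], dist_b[bit])
        }
        for bit in (0, 1)
    }


def worst_case_loss(dist):
    """Largest |ln(P[o | bit=0] / P[o | bit=1])| over all outputs o."""
    return max(abs(math.log(dist[0][o] / dist[1][o])) for o in dist[0])


if __name__ == "__main__":
    eps_a, eps_b = 0.5, 1.2  # illustrative privacy parameters
    c = compose(randomized_response(eps_a), randomized_response(eps_b))
    print(worst_case_loss(c), eps_a + eps_b)
```

The worst-case privacy loss of the pair matches $\varepsilon_A+\varepsilon_B$ here because the two ratios multiply; in general the sum is an upper bound.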

ESA (Encode, Shuffle, Analyze): the best of both worlds (local vs. global differential privacy)

Until very recently, there was no middle ground between the two options. The choice was binary: either accept a much larger level of noise (Local differential privacy), or collect raw data (Global differential privacy). This is starting to change, thanks to recent work on a novel type of architecture called ESA.

The encoder is a fancy name for the "user". It collects the data, encrypts it twice in two layers, and passes it to the shuffler.
The shuffler can only decrypt the first layer, which contains the user ID and something called a "group ID". The group ID describes what kind of data this is, but not its actual value. The shuffler first removes the identifiers, then groups records by group ID and counts how many users are in each group. It passes a group to the analyzer only if it contains enough users.
The analyzer can then decrypt the second layer of the data and compute the output.
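The three roles can be sketched as a toy data flow. This is only an illustration of who sees what: the two encryption layers are modelled as nested dictionaries with no actual cryptography, and the names `encode`, `shuffle_reports`, `analyze`, and the `min_group_size` threshold are hypothetical.

```python
import random
from collections import defaultdict


def encode(user_id, group_id, value):
    """Encoder (user side): wrap the value in two layers.

    The outer layer stands in for what the shuffler can decrypt; the
    "inner" entry stands in for what only the analyzer can decrypt.
    """
    return {"user_id": user_id, "group_id": group_id, "inner": {"value": value}}


def shuffle_reports(reports, min_group_size, rng=random):
    """Shuffler: drop user IDs, group by group ID, release only large groups."""
    groups = defaultdict(list)
    for report in reports:
        groups[report["group_id"]].append(report["inner"])  # user_id is dropped
    released = [
        (group_id, inner)
        for group_id, inners in groups.items()
        if len(inners) >= min_group_size
        for inner in inners
    ]
    rng.shuffle(released)  # remove any ordering that could link back to users
    return released


def analyze(released):
    """Analyzer: open the inner layer and compute a per-group sum."""
    totals = defaultdict(int)
    for group_id, inner in released:
        totals[group_id] += inner["value"]
    return dict(totals)


if __name__ == "__main__":
    reports = [
        encode("u1", "clicks", 3),
        encode("u2", "clicks", 5),
        encode("u3", "clicks", 1),
        encode("u4", "rare_event", 7),
    ]
    print(analyze(shuffle_reports(reports, min_group_size=2)))
```

In this toy run the `rare_event` group has a single user, so the shuffler withholds it and the analyzer only ever sees the `clicks` group.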


