On Sampling Edges Almost Uniformly

发表于 2024-08-09 分类于论文阅读

[ER17] On Sampling Edges Almost Uniformly, reading note; also the note of the talk given by Professor Talya Eden in workshop WoLA.

Introduction

In fact, the original topic of Professor Eden was “counting subgraphs”. However, Eden let us choose to talk about “counting subgraphs” or “sampline edges”, and finally more people chose the latter.

A nice talk!

Preliminaries

The algorithms we consider access an undirected graph $G$ via queries. Our algorithm uses the following queries:

Vertex query: returns a uniformly random vertex $v ∈ V$ .
Degree query: given a vertex $v ∈ V$ , returns $d(v)$ .
Neighbor query: given $v ∈ V$ and $i ∈ \mathbb{N}$ , return $v$ ’s $i$ -th neighbor; if $i > d(v)$ , this operation returns fail.

We want to design an algorithm which has low query complexity and can return an edge with probability at least $2 / 3$ . Moreover, each returned edge is sampled according to a distribution that is pointwise $\varepsilon$ -close to uniform.

To design the algorithm, we need to know the approximate number of edges $\overline{m}$ . And there is an algorithm:

Thm 2.1. ([GR08]) Let $G = (V,E)$ be a graph with $n$ vertices and $m$ edges. There exists an algorithm which uses ˜ $\widetilde{O}(n / \sqrt{m})$ query complexity in expectation and outputs an estimate $\overline{m}$ of $m$ that with probability at least $2/3$ satisfies $m ≤ \overline{m} ≤ 2m$ .

And another lemma about statistics distance:

Lemma 2.2. Let $P$ be a probability distribution over a finite set $Ω$ which satisfies:

1 - \varepsilon \le \dfrac{P(x)}{P(y)} \le 1 + \varepsilon,\ \forall x, y \in \Omega

Then $P$ is pointwise $ε$ -close to uniform

Warm-up —— In Regular Graph or Bounded Degree Graph

For regular graph, sampling edges is very easy. We just sample a vertex $v$ and then sample a neighbor edge $e$ of $v$ .

For bounded degree graph, let the maximum degree $\Delta$ be a constant. We can just make a small change: when we get the edge $e$ by sampling the neighbor of $v$ , we accept it with probability $d(v) / \Delta$ .

Then the probability we get the specific edge $e = (u, v)$ is:

\Pr[e] = \dfrac{1}{n} \cdot \left( \dfrac{1}{d(v)} \cdot \dfrac{d(v)}{\Delta} + \dfrac{1}{d(u)} \cdot \dfrac{d(u)}{\Delta} \right) = \dfrac{2}{n \Delta}

We see that in the case algorithm succeds, the edge returned is uniformly distributed. Then we repeat this algorithm $O\left( \dfrac{n \Delta}{m} \right)$ times to get $\ge 2 / 3$ successful probability.

When the graph is a star, then $\Delta = m = n - 1$ , and the query complexity bacome very large (not sublinear).

General Graph

Defn 4.1. For a given threshold $\theta$ , we say that a vertex $u$ is a light vertex if $d(u) ≤ θ$ and
otherwise we say that it is a heavy vertex. We say that an edge is a light edge if it originates from a light vertex. A heavy edge is defined analogously. Finally, we denote the sets of light and heavy vertices by $L$ and $H$ respectively, and the sets of light and heavy edges by $E_L$ and $E_H$ respectively.

The algorithm is very simple:

sample-light-edge
sample-heavy-edge
sample-edge-almost-uniformly

Lemma 4.2. The procedure $\text{Sample-light-edge}$ performs a constant number of queries and succeeds with probability $|E_L| / (n θ)$ . In the case where $\text{Sample-light-edge}$ succeeds, the edge returned is uniformly distributed in $E_L$ .

Proof. Almost the same as the bounded degree case. $\Box$

Lemma 4.3. The procedure $\text{Sample-heavy-edge}$ performs a constant number of queries and succeeds with probability in $\left[ \dfrac{|E_H|}{n \theta} \left( 1 - \dfrac{m}{\theta^2} \right), \dfrac{|E_H|}{n \theta} \right]$ . In the case where $\text{Sample-heavy-edge}$ succeeds, the edge returned is distributed according to a distribution that is pointwise $(m / θ^2)$ -close to uniform.

Proof. For a vertex $v$ , we denote $|\Gamma(v) \cap H|$ by $d_H(v)$ . $d_L(v)$ is defined similarly. Then we have:

d_H(v) \le |H| < \dfrac{m}{\theta} < \dfrac{m}{\theta} \cdot \dfrac{d(v)}{\theta}

Thus:

d_L(v) = d(v) - d_H(v) > \left( 1 - \dfrac{m}{\theta^2} \right) d(v)

And for the edge $e = (v, w)$ in line $6$ , we have:

\Pr[e] = \dfrac{d_L(v)}{n \theta} \cdot \dfrac{1}{d(v)} > \dfrac{1}{n \theta} \left( 1 - \dfrac{m}{\theta^2} \right)

On the other hand, $d_L(v) ≤ d(v)$ implies that:

\Pr[e] \le \dfrac{1}{n \theta}

Thus we have:

\Pr[\text{Success}] = \sum_{e \in E_H} \Pr[e] = \dfrac{|E_H|}{n \theta} \left( 1 - \dfrac{M}{\theta^2} \right)

$\Box$

Lemma 3.4. If $m ≤ \overline{m} ≤ 2m$ , then Algorithm $\text{Sample-edge-almost-uniformly}$ returns an edge $(u,v) ∈ E$ with probability at least t $2/3$ . The distribution induced by the algorithm is pointwise $ε$ -close to uniform over $E$ .

Proof. For a specific loop $i$ in line $2$ of $\text{Sample-edges-almost-uniformly}$ , we have:

\Pr[\text{Success in loop }i] = \dfrac{1}{2} \Pr[\text{L success}] + \dfrac{1}{2} \Pr[\text{H success}] \\ \ge \dfrac{|E_L|}{2 n \theta} + \dfrac{|E_H|}{2 n \theta} \left( 1 - \dfrac{m}{\theta^2} \right) \ge (1 - \varepsilon) \dfrac{m}{2 n \theta} \ge (1 - \varepsilon) \dfrac{\sqrt{m \varepsilon}}{4 n}

Thus by repeating $q$ times, successful probability is at most:

\left( 1 - \dfrac{(1 - \varepsilon) \sqrt{m \varepsilon}}{4 n} \right)^{10 n / ((1 - \varepsilon) \sqrt{m \varepsilon})} < 1 / 3

Moreover, by Lemmas 3.1 and 3.2, we have:

\dfrac{1 - \varepsilon / 2}{2 n \theta} \le \Pr[e] \le \dfrac{1}{2 n \theta}

Thus, we have:

1 - \varepsilon / 2 \le \dfrac{\Pr[e]}{\Pr[e']} \le \dfrac{1}{1 - \varepsilon / 2} \le 1 + \varepsilon,\ \forall e, e' \in E

By Lemma 2.1, we have done. $\Box$

Finally, if $m$ is not known, then we repeat [GR08] algorithm $O(\log (n / \varepsilon))$ times, and find the median of these results. Chernoff bounds guarantees that $\overline{m}$ is good with probability at least $1−ε/(2n^2)$ .

Lower Bound

[ER17] also gives a lower bound of the quert complexity that sampling edge almost uniformly, while Professor Eden had no time to discuss in her talk.

TODO

Discussion

The technique we use:

Rejection Sampling;
Making things uniform;
Heavy / light.

Reference

[ER17] Talya Eden and Will Rosenbaum. On Sampling Edges Almost Uniformly. In 1st Symposium on Simplicity in Algorithms (SOSA 2018). Open Access Series in Informatics (OASIcs), Volume 61, pp. 7:1-7:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

[GR08] Oded Goldreich and Dana Ron. Approximating average parameters of graphs. Random Struct. Algorithms, 32(4):473–493, 2008.