On Sampling Edges Almost Uniformly

[ER17] On Sampling Edges Almost Uniformly, reading note; also the note of the talk given by Professor Talya Eden in workshop WoLA.

Introduction

In fact, the original topic of Professor Eden was “counting subgraphs”. However, Eden let us choose to talk about “counting subgraphs” or “sampline edges”, and finally more people chose the latter.

A nice talk!

Preliminaries

The algorithms we consider access an undirected graph GG via queries. Our algorithm uses the following queries:

  1. Vertex query: returns a uniformly random vertex vVv ∈ V.
  2. Degree query: given a vertex vVv ∈ V, returns d(v)d(v).
  3. Neighbor query: given vVv ∈ V and iNi ∈ \mathbb{N}, return vv’s ii-th neighbor; if i>d(v)i > d(v), this operation returns fail.

We want to design an algorithm which has low query complexity and can return an edge with probability at least 2/32 / 3. Moreover, each returned edge is sampled according to a distribution that is pointwise ε\varepsilon-close to uniform.

To design the algorithm, we need to know the approximate number of edges m\overline{m}. And there is an algorithm:

Thm 2.1. ([GR08]) Let G=(V,E)G = (V,E) be a graph with nn vertices and mm edges. There exists an algorithm which uses ˜O~(n/m)\widetilde{O}(n / \sqrt{m}) query complexity in expectation and outputs an estimate m\overline{m} of mm that with probability at least 2/32/3 satisfies mm2mm ≤ \overline{m} ≤ 2m.

And another lemma about statistics distance:

Lemma 2.2. Let PP be a probability distribution over a finite set which satisfies:

1εP(x)P(y)1+ε, x,yΩ1 - \varepsilon \le \dfrac{P(x)}{P(y)} \le 1 + \varepsilon,\ \forall x, y \in \Omega

Then PP is pointwise εε-close to uniform

Warm-up —— In Regular Graph or Bounded Degree Graph

For regular graph, sampling edges is very easy. We just sample a vertex vv and then sample a neighbor edge ee of vv.

For bounded degree graph, let the maximum degree Δ\Delta be a constant. We can just make a small change: when we get the edge ee by sampling the neighbor of vv, we accept it with probability d(v)/Δd(v) / \Delta.

Then the probability we get the specific edge e=(u,v)e = (u, v) is:

Pr[e]=1n(1d(v)d(v)Δ+1d(u)d(u)Δ)=2nΔ\Pr[e] = \dfrac{1}{n} \cdot \left( \dfrac{1}{d(v)} \cdot \dfrac{d(v)}{\Delta} + \dfrac{1}{d(u)} \cdot \dfrac{d(u)}{\Delta} \right) = \dfrac{2}{n \Delta}

We see that in the case algorithm succeds, the edge returned is uniformly distributed. Then we repeat this algorithm O(nΔm)O\left( \dfrac{n \Delta}{m} \right) times to get 2/3\ge 2 / 3 successful probability.

When the graph is a star, then Δ=m=n1\Delta = m = n - 1, and the query complexity bacome very large (not sublinear).

General Graph

Defn 4.1. For a given threshold θ\theta, we say that a vertex uu is a light vertex if d(u)θd(u) ≤ θ and
otherwise we say that it is a heavy vertex. We say that an edge is a light edge if it originates from a light vertex. A heavy edge is defined analogously. Finally, we denote the sets of light and heavy vertices by LL and HH respectively, and the sets of light and heavy edges by ELE_L and EHE_H respectively.

The algorithm is very simple:

sample-light-edge
sample-heavy-edge
sample-edge-almost-uniformly

Lemma 4.2. The procedure Sample-light-edge\text{Sample-light-edge} performs a constant number of queries and succeeds with probability EL/(nθ)|E_L| / (n θ). In the case where Sample-light-edge\text{Sample-light-edge} succeeds, the edge returned is uniformly distributed in ELE_L.

Proof. Almost the same as the bounded degree case. \Box

Lemma 4.3. The procedure Sample-heavy-edge\text{Sample-heavy-edge} performs a constant number of queries and succeeds with probability in [EHnθ(1mθ2),EHnθ]\left[ \dfrac{|E_H|}{n \theta} \left( 1 - \dfrac{m}{\theta^2} \right), \dfrac{|E_H|}{n \theta} \right]. In the case where Sample-heavy-edge\text{Sample-heavy-edge} succeeds, the edge returned is distributed according to a distribution that is pointwise (m/θ2)(m / θ^2)-close to uniform.

Proof. For a vertex vv, we denote Γ(v)H|\Gamma(v) \cap H| by dH(v)d_H(v). dL(v)d_L(v) is defined similarly. Then we have:

dH(v)H<mθ<mθd(v)θd_H(v) \le |H| < \dfrac{m}{\theta} < \dfrac{m}{\theta} \cdot \dfrac{d(v)}{\theta}

Thus:

dL(v)=d(v)dH(v)>(1mθ2)d(v)d_L(v) = d(v) - d_H(v) > \left( 1 - \dfrac{m}{\theta^2} \right) d(v)

And for the edge e=(v,w)e = (v, w) in line 66, we have:

Pr[e]=dL(v)nθ1d(v)>1nθ(1mθ2)\Pr[e] = \dfrac{d_L(v)}{n \theta} \cdot \dfrac{1}{d(v)} > \dfrac{1}{n \theta} \left( 1 - \dfrac{m}{\theta^2} \right)

On the other hand, dL(v)d(v)d_L(v) ≤ d(v) implies that:

Pr[e]1nθ\Pr[e] \le \dfrac{1}{n \theta}

Thus we have:

Pr[Success]=eEHPr[e]=EHnθ(1Mθ2)\Pr[\text{Success}] = \sum_{e \in E_H} \Pr[e] = \dfrac{|E_H|}{n \theta} \left( 1 - \dfrac{M}{\theta^2} \right)

\Box

Lemma 3.4. If mm2mm ≤ \overline{m} ≤ 2m, then Algorithm Sample-edge-almost-uniformly\text{Sample-edge-almost-uniformly} returns an edge (u,v)E(u,v) ∈ E with probability at least t2/32/3. The distribution induced by the algorithm is pointwise εε-close to uniform over EE.

Proof. For a specific loop ii in line 22 of Sample-edges-almost-uniformly\text{Sample-edges-almost-uniformly}, we have:

Pr[Success in loop i]=12Pr[L success]+12Pr[H success]EL2nθ+EH2nθ(1mθ2)(1ε)m2nθ(1ε)mε4n\Pr[\text{Success in loop }i] = \dfrac{1}{2} \Pr[\text{L success}] + \dfrac{1}{2} \Pr[\text{H success}] \\ \ge \dfrac{|E_L|}{2 n \theta} + \dfrac{|E_H|}{2 n \theta} \left( 1 - \dfrac{m}{\theta^2} \right) \ge (1 - \varepsilon) \dfrac{m}{2 n \theta} \ge (1 - \varepsilon) \dfrac{\sqrt{m \varepsilon}}{4 n}

Thus by repeating qq times, successful probability is at most:

(1(1ε)mε4n)10n/((1ε)mε)<1/3\left( 1 - \dfrac{(1 - \varepsilon) \sqrt{m \varepsilon}}{4 n} \right)^{10 n / ((1 - \varepsilon) \sqrt{m \varepsilon})} < 1 / 3

Moreover, by Lemmas 3.1 and 3.2, we have:

1ε/22nθPr[e]12nθ\dfrac{1 - \varepsilon / 2}{2 n \theta} \le \Pr[e] \le \dfrac{1}{2 n \theta}

Thus, we have:

1ε/2Pr[e]Pr[e]11ε/21+ε, e,eE1 - \varepsilon / 2 \le \dfrac{\Pr[e]}{\Pr[e']} \le \dfrac{1}{1 - \varepsilon / 2} \le 1 + \varepsilon,\ \forall e, e' \in E

By Lemma 2.1, we have done. \Box

Finally, if mm is not known, then we repeat [GR08] algorithm O(log(n/ε))O(\log (n / \varepsilon)) times, and find the median of these results. Chernoff bounds guarantees that m\overline{m} is good with probability at least 1ε/(2n2)1−ε/(2n^2).

Lower Bound

[ER17] also gives a lower bound of the quert complexity that sampling edge almost uniformly, while Professor Eden had no time to discuss in her talk.

TODO

Discussion

The technique we use:

  1. Rejection Sampling;
  2. Making things uniform;
  3. Heavy / light.

Reference

[ER17] Talya Eden and Will Rosenbaum. On Sampling Edges Almost Uniformly. In 1st Symposium on Simplicity in Algorithms (SOSA 2018). Open Access Series in Informatics (OASIcs), Volume 61, pp. 7:1-7:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

[GR08] Oded Goldreich and Dana Ron. Approximating average parameters of graphs. Random Struct. Algorithms, 32(4):473–493, 2008.