VC (Vapnik-Chervonenkis) Dimension

The hypothesis sets typically used in machine learning are infinite. But the sample complexity bounds of the previous discussions are uninformative when dealing with infinite hypothesis sets. One could ask whether efficient learning from a finite sample is even possible when the hypothesis set $H$ is infinite. Our analysis of the family of axis-aligned rectangles indicates that this is indeed possible, at least in some cases, since we proved that that infinite concept class was PAC-learnable.

Our goal here will be to generalize that result and derive general learning guarantees for infinite hypothesis sets. A general idea for doing so consists of reducing the infinite case to the analysis of finite sets of hypotheses and then proceeding as in the finite case. There are different techniques for that reduction, each relying on a different notion of complexity for the family of hypotheses.

The first complexity notion we will use is that of Rademacher complexity. This will help us derive learning guarantees using relatively simple proofs based on McDiarmid's inequality, while obtaining high-quality bounds, including data-dependent ones, which we will frequently make use of later. However, the computation of the empirical Rademacher complexity is NP-hard for some hypothesis sets. Thus, we subsequently introduce two other purely combinatorial notions: the growth function and the VC-dimension.

VC-dimension
Here, we introduce the notion of VC-dimension (Vapnik-Chervonenkis dimension). The VC-dimension is also a purely combinatorial notion, but it is often easier to compute than the growth function (or the Rademacher complexity). As we shall see, the VC-dimension is a key quantity in learning and is directly related to the growth function.

Is there any means to determine whether a concept class is PAC-learnable and to derive the right sample complexity bound? The answer is yes, and it is based on the Vapnik-Chervonenkis dimension.

To define the VC-dimension of a hypothesis set $H$, we first introduce the concept of shattering. Recall from the previous section that, given a hypothesis set $H$, a dichotomy of a set $S$ is one of the possible ways of labeling the points of $S$ using a hypothesis in $H$. A set $S$ of $m \ge 1$ points is said to be shattered by a hypothesis set $H$ when $H$ realizes all possible dichotomies of $S$, that is, when $\Pi_H(m) = 2^m$, where $\Pi_H$ denotes the growth function. The VC-dimension of $H$, denoted $VCdim(H)$, is the size of the largest set that can be fully shattered by $H$.

To compute the VC-dimension we will typically show a lower bound for its value and then a matching upper bound. To give a lower bound $d$ for $VCdim(H)$, it suffices to show that some set $S$ of cardinality $d$ can be shattered by $H$. To give a matching upper bound, we need to prove that no set $S$ of cardinality $d + 1$ can be shattered by $H$, which is typically more difficult.

Example (Intervals on the real line): Our first example involves the hypothesis class of closed intervals on the real line. It is clear that the VC-dimension is at least two, since all four dichotomies $(+,+), (-,-), (+,-), (-,+)$ can be realized, as illustrated in the figure below. In contrast, by the definition of intervals, no set of three points can be shattered, since the $(+,-,+)$ labeling cannot be realized. Hence, $VCdim(\text{intervals in } R) = 2$.
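The interval example can be verified by brute force. The sketch below (the helper name `interval_dichotomies` and the choice of candidate endpoints are ours, for illustration only) enumerates every labeling realizable by an interval classifier over a finite set of candidate endpoints, which is enough here because only the position of the endpoints relative to the sample points matters:

```python
import itertools

def interval_dichotomies(points, candidates):
    """Enumerate all labelings of `points` realizable by intervals [a, b]
    (label +1 inside the interval, -1 outside), with endpoints a, b drawn
    from `candidates`."""
    dichos = set()
    for a, b in itertools.product(candidates, repeat=2):
        if a <= b:
            dichos.add(tuple(1 if a <= x <= b else -1 for x in points))
    return dichos

# Candidate endpoints chosen to fall between and around the sample points.
cands = [0.5, 1.5, 2.5, 3.5]

print(len(interval_dichotomies([1, 2], cands)))  # 4: two points are shattered
d3 = interval_dichotomies([1, 2, 3], cands)
print(len(d3))                                   # 7 of the 8 labelings
print((1, -1, 1) in d3)                          # False: (+,-,+) is missing
```

Only the contiguous labelings (plus the all-negative one) appear for three points, exactly as the argument above predicts.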
Example (Hyperplanes) Consider the set of hyperplanes in $R^2$. We first observe that any three non-collinear points in $R^2$ can be shattered. To obtain the first three dichotomies, we choose a hyperplane that has two points on one side and the third point on the opposite side. To obtain the fourth dichotomy we have all three points on the same side of the hyperplane. The remaining four dichotomies are realized by simply switching signs. 
Next, we show that no set of four points can be shattered, by considering two cases: (i) all four points lie on the boundary of their convex hull, and (ii) three of the four points lie on the convex hull and the remaining point is interior. In the first case, a positive labeling for one diagonal pair and a negative labeling for the other diagonal pair cannot be realized, as illustrated in figure (a). In the second case, a labeling which is positive for the points on the convex hull and negative for the interior point cannot be realized, as illustrated in figure (b). Hence, $VCdim(\text{hyperplanes in } R^2) = 3$.
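A randomized sketch can illustrate both halves of this argument (the helper `line_dichotomies` and the sampling ranges are our own choices; random sampling can only discover dichotomies, so the counts below are lower bounds that happen to be tight for these point sets):

```python
import random

def line_dichotomies(points, trials=20000, seed=0):
    """Collect labelings of 2-D `points` realized by randomly sampled
    lines sign(w . x + b). A randomized sketch, not an exact search."""
    rng = random.Random(seed)
    dichos = set()
    for _ in range(trials):
        w1, w2 = rng.gauss(0, 1), rng.gauss(0, 1)
        b = rng.uniform(-2, 2)
        dichos.add(tuple(1 if w1 * x + w2 * y + b > 0 else -1
                         for (x, y) in points))
    return dichos

three = [(0, 0), (1, 0), (0, 1)]          # three non-collinear points
four = [(0, 0), (1, 0), (0, 1), (1, 1)]   # corners of a square

print(len(line_dichotomies(three)))       # 8: all dichotomies realized
d4 = line_dichotomies(four)
# The "diagonal" (XOR-like) labeling is provably never realized:
print((1, -1, -1, 1) in d4)               # False
print(len(d4) < 16)                       # True: the square is not shattered
```

The missing diagonal labeling is exactly case (i) above: summing the two inequalities for each diagonal pair yields a contradiction, so no line can realize it.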



Example (Axis-aligned rectangles): We first show that the VC-dimension is at least four, by considering four points in a diamond pattern. Then, it is clear that all 16 dichotomies can be realized, some of which are illustrated in figure (a). In contrast, for any set of five distinct points, if we construct the minimal axis-aligned rectangle containing these points, one of the five points is in the interior of this rectangle. Imagine that we assign a negative label to this interior point and a positive label to each of the remaining four points, as illustrated in figure (b) below. There is no axis-aligned rectangle that can realize this labeling. Hence, no set of five distinct points can be shattered, and $VCdim(\text{axis-aligned rectangles}) = 4$.

Corollary: The VC dimension of the set of oriented hyperplanes in $R^n$ is $n+1$. 
Proof (sketch): We can always choose $n + 1$ points, and then choose one of the points as the origin, such that the position vectors of the remaining $n$ points are linearly independent; such a set can be shattered. But we can never choose $n + 2$ such points, since no $n + 1$ vectors in $R^n$ can be linearly independent.

Prove that $VC(H) \le \log_2 |H|$, where $H$ is a finite hypothesis space ($|H|$ denotes the cardinality of the hypothesis space).

Assume $VC(H) = d$. Then there is a set of $d$ instances on which all $2^d$ possible labelings are realized, and each of these labelings requires a distinct hypothesis. Therefore $H$ must contain at least $2^d$ hypotheses, i.e., $2^d \le |H|$. Thus, $d = VC(H) \le \log_2 |H|$.
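The bound is easy to sanity-check numerically for a finite hypothesis class on a finite domain. The sketch below is ours: `vc_dim` computes the VC-dimension by brute force, and the toy class of threshold functions $h_t(x) = +1 \iff x \ge t$ is a hypothetical example, not one from the text:

```python
import itertools
from math import log2

def vc_dim(domain, hypotheses):
    """Brute-force VC-dimension of a finite hypothesis set, where each
    hypothesis is a function from `domain` to {-1, +1}."""
    best = 0
    for k in range(1, len(domain) + 1):
        for S in itertools.combinations(domain, k):
            labelings = {tuple(h(x) for x in S) for h in hypotheses}
            if len(labelings) == 2 ** k:   # S is shattered by the class
                best = k
    return best

# Toy class: 5 distinct threshold functions h_t(x) = +1 iff x >= t.
domain = [1, 2, 3, 4]
H = [(lambda x, t=t: 1 if x >= t else -1) for t in range(1, 6)]

d = vc_dim(domain, H)
print(d)                   # 1: thresholds shatter one point, never two
print(d <= log2(len(H)))   # True, consistent with VC(H) <= log2 |H|
```

Thresholds are monotone, so the labeling (smaller point $+$, larger point $-$) is never realized; hence the VC-dimension is 1, comfortably below $\log_2 5 \approx 2.32$.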

Origin-centered circles and spheres
What is the VC dimension of an origin-centered circle (2D), with hypotheses of the form
$f(x) = \text{sign}(a\,x^\top x + b)$?

The VC dimension is 2. Note that $a\,x^\top x + b = a\|x\|^2 + b$ depends only on the distance of $x$ from the origin and is an affine (hence monotone) function of $\|x\|^2$. Two points at distinct radii can clearly be shattered, by choosing the sign of $a$ and the threshold $b$ appropriately. But given any set of three points, they will lie at some radii $r_1 \le r_2 \le r_3$ from the origin, and no function $f$ will be able to label the points at $r_1$ and $r_3$ with $+1$ while labeling the point at $r_2$ with $-1$. (The opposite labeling also cannot be achieved.) Thus, no set of three points is shatterable by origin-centered circles, making the VC dimension just 2.
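Since the classifier depends on a point only through its radius, the check reduces to one dimension. In this sketch (the helper `circle_dichotomies` and the parameter grid are ours; a coarse grid suffices because only the sign pattern of $a r^2 + b$ at the sample radii matters):

```python
def circle_dichotomies(radii, params):
    """Labelings of points at the given radii realized by
    f(x) = sign(a * r^2 + b), for (a, b) pairs in `params`."""
    dichos = set()
    for a, b in params:
        dichos.add(tuple(1 if a * r * r + b > 0 else -1 for r in radii))
    return dichos

# Coarse grid over (a, b) in [-3, 3] x [-3, 3], step 0.5.
grid = [(a / 2, b / 2) for a in range(-6, 7) for b in range(-6, 7)]

print(len(circle_dichotomies([1, 2], grid)))  # 4: two radii are shattered
d3 = circle_dichotomies([1, 2, 3], grid)
print((1, -1, 1) in d3)                       # False: (+,-,+) unrealizable
print((-1, 1, -1) in d3)                      # False: (-,+,-) unrealizable
```

Both missing labelings would require $a r^2 + b$ to change sign twice along increasing $r$, which an affine function of $r^2$ cannot do.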

What about an origin-centered sphere (3D)?
The VC dimension in this case is also 2. The same reasoning as in the 2D case applies, since $a\,x^\top x + b$ again depends only on the distance of the point from the origin.
