Shattering

The concept of shattering of a set of points plays an important role in Vapnik-Chervonenkis theory, also known as VC-theory. Shattering and VC-theory are used in the study of empirical processes as well as in statistical computational learning theory.

Definition
Suppose we have a class C of sets and a given set A. C is said to shatter  A if, for each subset T of A, there is some element U of C such that


 * $$ U \cap A = T.\,$$

Equivalently, C shatters A when the power set P(A) is the set { U &cap; A | U &isin; C }.

For example, the class C  of all discs in the plane (two-dimensional space) cannot shatter every set A of four points, yet the class of all convex sets in the plane shatters every finite set on the (unit) circle. (For the collection of all convex sets, connect the dots!)

We employ the letter C  to refer to a "class" or "collection" of sets, as in a Vapnik-Chervonenkis class (VC-class). The set A is often assumed to be finite because, in empirical processes, we are interested in the shattering of finite sets of data points.

Shatter coefficient
To quantify the richness of a collection $$ C $$ of sets, we use the concept of shattering coefficients (also known as shatter coefficients or the growth function). For a collection $$ C $$ of sets $$ s \subset $$ Ω,  Ω being any space,  often a probability space, we define the nth shattering coefficient of $$ C $$ as


 * $$ S _C(n) = \max_{x_1,x_2,\dots,x_n \in \Omega } \operatorname{card} \{\,\{\,x_1,x_2,\dots,x_n\}\cap s, s\in C \}$$

where "card" denotes the cardinality, that is the number of elements of a set.

$$ S _C(n) $$ is equal to the largest number of subsets of any set $$ A $$  of n points that can be formed by intersecting  $$ A $$ with the sets in collection $$ C $$.

It is obvious that
 * 1.$$S_C(n)\leq 2^n$$ for all n.
 * 2. If $$S_C(n)=2^n$$, that means there is a set of cardinality n, which can be shattered by C.
 * 3. If $$S_C(N)<2^N$$ for some $$N>1$$ then $$S_C(n)<2^n$$ for all $$n\geq N$$

The third property means that if C cannot shatter any set of cardinality N then it cannot shatter sets of larger cardinalities.

Vapnik-Chervonenkis class
The VC dimension of a class C is defined as
 * $$VC(C)=\underset{n}{\min}\{n:S_C(n)<2^n\}\,$$

or, alternatively, as
 * $$VC_0(C)=\underset{n}{\max}\{n:S_C(n)=2^n\}.\,$$

Note that $$VC(C)=VC_0(C)+1$$.

If for any n there is a set of cardinality n which can be shattered by C, then $$S_C(n)=2^n$$ for all n and the VC dimension of this class C is infinite.

A class with finite VC dimension is called a Vapnik-Chervonenkis class or VC class. A class C is uniformly Glivenko-Cantelli if and only if it is a VC class.