Kernels MIT 15.097 Course Notes
Cynthia Rudin
Credits: Bartlett, Schölkopf and Smola, Cristianini and Shawe-Taylor
The kernel trick that I'm going to show you applies much more broadly than SVMs, but we'll use it for SVMs.
Warning: This lecture is technical. Try to get the basic idea even if you don't catch all the details.
Basic idea: If you can't separate positives from negatives in a low-dimensional space using a hyperplane, then map everything to a higher dimensional space where you can separate them.
The picture looks like this but it's going to be a little confusing at this point:
[Figure: points x and o scattered in the input space X, not separable by a hyperplane. Image by MIT OpenCourseWare.]

or it might look like this:

[Figure: the mapped points φ(x) and φ(o) in the feature space F, now separable by a hyperplane; panels labeled "Input Space" and "Feature Space". Image by MIT OpenCourseWare.]
Say I want to predict whether a house on the real-estate market will sell today or not:

$$x = \left[ x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}, \ldots \right],$$

where $x^{(1)}$ is the house's list price, $x^{(2)}$ its estimated worth, $x^{(3)}$ the length of time on the market, $x^{(4)}$ whether it's in a good area, and so on.
We might want to consider something more complicated than a linear model:
Example 1: $\phi: [x^{(1)}, x^{(2)}] \mapsto [x^{(1)2}, x^{(2)2}, x^{(1)}x^{(2)}]$.

The 2d space gets mapped to a 3d space. We could take the inner product in the 3d space:

$$\phi(x)^T \phi(z) = x^{(1)2} z^{(1)2} + x^{(2)2} z^{(2)2} + x^{(1)} x^{(2)} z^{(1)} z^{(2)}.$$

Example 2: $\phi: [x^{(1)}, x^{(2)}, x^{(3)}] \mapsto [x^{(1)2}, x^{(1)}x^{(2)}, x^{(1)}x^{(3)}, x^{(2)}x^{(1)}, x^{(2)2}, x^{(2)}x^{(3)}, x^{(3)}x^{(1)}, x^{(3)}x^{(2)}, x^{(3)2}]$,

and we can take inner products in the 9d space, similarly to the last example.
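As a quick sanity check, here is the 3d map from Example 1 computed directly (a minimal sketch; the helper names `phi` and `dot` are mine):

```python
def phi(x):
    # Example 1 map: [x1, x2] -> [x1^2, x2^2, x1*x2]
    return [x[0] ** 2, x[1] ** 2, x[0] * x[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, 4.0)
lhs = dot(phi(x), phi(z))
rhs = x[0]**2 * z[0]**2 + x[1]**2 * z[1]**2 + x[0]*x[1] * z[0]*z[1]
assert lhs == rhs  # matches the formula for phi(x)^T phi(z)
# Note: this is NOT equal to (x^T z)^2, because the cross term here
# appears once rather than twice; a 4d map shown later fixes that.
```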
Rather than apply SVM to the original x's, apply it to the φ(x)'s instead.
The kernel trick is that if you have an algorithm (like SVM) where the examples appear only in inner products, you can freely replace the inner product with a different one. (And it works just as well as if you had designed the SVM with some map φ(x) in mind in the first place, and taken the inner product in φ's space instead.)
Remember the optimization problem for SVM?

$$\max_{\alpha} \; \sum_{i=1}^m \alpha_i \;-\; \frac{1}{2} \sum_{i,k=1}^m \alpha_i \alpha_k y_i y_k \underbrace{x_i^T x_k}_{\text{inner product}}$$

$$\text{s.t. } 0 \le \alpha_i \le C, \; i = 1, \ldots, m, \quad \text{and} \quad \sum_{i=1}^m \alpha_i y_i = 0.$$
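To see concretely that the examples enter only through inner products, here is a toy solver for this dual by projected gradient ascent (a simplified sketch of my own: the bias term and its equality constraint are dropped for simplicity, and all function names are mine):

```python
def solve_dual(xs, ys, k, C=1.0, lr=0.05, steps=2000):
    """Maximize sum(a_i) - 0.5 * sum a_i a_j y_i y_j k(x_i, x_j)
    over the box 0 <= a_i <= C (bias/equality constraint omitted)."""
    m = len(xs)
    # The data appear ONLY through the Gram matrix of inner products.
    K = [[k(xs[i], xs[j]) for j in range(m)] for i in range(m)]
    a = [0.0] * m
    for _ in range(steps):
        for i in range(m):
            grad = 1.0 - ys[i] * sum(a[j] * ys[j] * K[i][j] for j in range(m))
            a[i] = min(C, max(0.0, a[i] + lr * grad))  # ascend, then clip
    return a, K

def decision(x, xs, ys, a, k):
    # The solution is a linear combination of kernel evaluations
    # at the training points.
    return sum(a[i] * ys[i] * k(xs[i], x) for i in range(len(xs)))

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]
linear = lambda u, v: u * v  # ordinary inner product in 1d
a, K = solve_dual(xs, ys, linear)
preds = [1 if decision(x, xs, ys, a, linear) > 0 else -1 for x in xs]
assert preds == ys
```

Because the solver touches the data only through `K`, swapping `linear` for any other kernel changes nothing else in the code.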
You can replace this inner product with another one, without even knowing φ. In fact there can be many different feature maps that correspond to the same inner product.
In other words, we'll replace $x_i^T x_k$ (i.e., $\langle x, z \rangle_{\mathbb{R}^n}$) with $k(x_i, x_k)$, where k happens to be an inner product in some feature space: $k(x, z) = \langle \phi(x), \phi(z) \rangle_{\mathcal{H}_k}$. Note that k is also called the kernel (you'll see why later).
Example 3: We could make k(x, z) the square of the usual inner product:

$$k(x, z) = \langle x, z \rangle_{\mathbb{R}^n}^2 = \left( \sum_{j=1}^n x^{(j)} z^{(j)} \right)^2 = \sum_{j=1}^n \sum_{k=1}^n x^{(j)} x^{(k)} z^{(j)} z^{(k)}.$$
But how do I know that the square of the inner product is itself an inner product in some feature space? We'll show a general way to check this later, but for now, let's see if we can find such a feature space.
Well, for n = 2, if we used $\phi([x^{(1)}, x^{(2)}]) = [x^{(1)2}, x^{(2)2}, x^{(1)}x^{(2)}, x^{(2)}x^{(1)}]$ to map into a 4d feature space, then the inner product would be:

$$\phi(x)^T \phi(z) = x^{(1)2} z^{(1)2} + x^{(2)2} z^{(2)2} + 2 x^{(1)} x^{(2)} z^{(1)} z^{(2)} = \langle x, z \rangle_{\mathbb{R}^2}^2.$$
So we showed that k is an inner product for n = 2 because we found a feature space corresponding to it.
For n = 3 we can also find a feature space, namely the 9d feature space from Example 2 would give us the inner product k. That is, with

$$\phi(x) = (x^{(1)2}, x^{(1)}x^{(2)}, \ldots, x^{(3)2}) \quad \text{and} \quad \phi(z) = (z^{(1)2}, z^{(1)}z^{(2)}, \ldots, z^{(3)2}),$$

we get

$$\langle \phi(x), \phi(z) \rangle_{\mathbb{R}^9} = \langle x, z \rangle_{\mathbb{R}^3}^2.$$

That's nice.
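The n = 3 identity is easy to verify numerically (a quick sketch; the helper names are mine):

```python
from itertools import product

def phi(x):
    # all nine ordered products x_j * x_k, as in Example 2
    return [x[j] * x[k] for j, k in product(range(3), repeat=2)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
# <phi(x), phi(z)>_R9 equals <x, z>^2_R3
assert dot(phi(x), phi(z)) == dot(x, z) ** 2
```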
We can even add a constant, so that k is the inner product plus a constant squared.
Example 4:

$$k(x, z) = (x^T z + c)^2 = \left( \sum_{j=1}^n x^{(j)} z^{(j)} + c \right) \left( \sum_{\ell=1}^n x^{(\ell)} z^{(\ell)} + c \right)$$

$$= \sum_{j,\ell=1}^n x^{(j)} x^{(\ell)} z^{(j)} z^{(\ell)} + 2c \sum_{j=1}^n x^{(j)} z^{(j)} + c^2$$

$$= \sum_{j,\ell=1}^n (x^{(j)} x^{(\ell)})(z^{(j)} z^{(\ell)}) + \sum_{j=1}^n (\sqrt{2c}\, x^{(j)})(\sqrt{2c}\, z^{(j)}) + c^2,$$

and in n = 3 dimensions, one possible feature map is:

$$\phi(x) = [x^{(1)2}, x^{(1)}x^{(2)}, \ldots, x^{(3)2}, \sqrt{2c}\, x^{(1)}, \sqrt{2c}\, x^{(2)}, \sqrt{2c}\, x^{(3)}, c]$$
and c controls the relative weight of the linear and quadratic terms in the inner product.
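This expansion can also be checked numerically (a sketch; the helper names and the particular numbers are mine):

```python
from math import sqrt
from itertools import product

def phi(x, c):
    # quadratic monomials, then sqrt(2c)-scaled linear terms, then c
    n = len(x)
    quad = [x[j] * x[k] for j, k in product(range(n), repeat=2)]
    lin = [sqrt(2 * c) * xj for xj in x]
    return quad + lin + [c]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z, c = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0), 2.0
lhs = dot(phi(x, c), phi(z, c))          # inner product in the feature space
rhs = (dot(x, z) + c) ** 2               # kernel evaluated directly
assert abs(lhs - rhs) < 1e-9
```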
Even more generally, if you wanted to, you could choose the kernel to be any higher power of the regular inner product.
Example 5: For any integer $d \ge 2$,

$$k(x, z) = (x^T z + c)^d,$$

where the feature space $\phi(x)$ will have dimension $\binom{n+d}{d}$, with terms for all monomials up to and including degree d. The decision boundary in the feature space (of course) is a hyperplane, whereas in the input space it's a polynomial of degree d. Now do you understand that figure at the beginning of the lecture?
Because these kernels give rise to polynomial decision boundaries, they are called polynomial kernels. They are very popular.
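One way to appreciate the trick: the degree-d feature space has $\binom{n+d}{d}$ coordinates, but evaluating the kernel itself costs only O(n). A quick sketch (the numbers and the function name are my own illustration):

```python
from math import comb

n, d = 100, 5
dim = comb(n + d, d)  # number of monomials of degree <= d in n variables:
                      # 96,560,646 coordinates we never have to write down

def poly_kernel(x, z, c=1.0, d=5):
    # O(n) work per evaluation, independent of the feature-space dimension
    return (sum(a * b for a, b in zip(x, z)) + c) ** d
```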
[Figure. Courtesy of Dr. Hagen Knaf. Used with permission.]
Beyond these examples, it is possible to construct very complicated kernels, even ones that have infinite dimensional feature spaces, that provide a lot of modeling power:
[Figure. © mlpy Developers. All rights reserved. This content is excluded from our Creative Commons license.]
SVMs with these kinds of fancy kernels are among the most powerful ML algorithms currently (a lot of people say the most powerful). But the solution to the optimization problem is still a simple linear combination, even if the feature space is very high dimensional.
How do I evaluate f(x) for a test example x then? Recall that the SVM solution has the form $f(x) = \sum_i \alpha_i y_i x_i^T x + b$, so we just replace each inner product there with the kernel: $f(x) = \sum_i \alpha_i y_i k(x_i, x) + b$. If we're going to replace $x_i^T x_k$ everywhere with some function of $x_i$ and $x_k$ that is hopefully an inner product from some other space (a kernel), we need to ensure that it really is an inner product. More generally, we'd like to know how to construct functions that are guaranteed to be inner products in some space. We need to know some functional analysis to do that.
Roadmap

1. make some definitions (inner product, Hilbert space, kernel)
2. give some intuition by doing a calculation in a space with a finite number of states
3. design a general Hilbert space whose inner product is the kernel
4. show it has a reproducing property - now it's a Reproducing Kernel Hilbert space
5. create a totally different representation of the space, which is a more intuitive way to express the kernel (similar to the finite-state one)
6. prove a cool representer theorem for SVM-like algorithms
7. show you some nice properties of kernels, and how you might construct them
Definitions

An inner product takes two elements of a vector space X and outputs a number. An inner product could be the usual dot product, $\langle u, v \rangle = u^T v = \sum_i u^{(i)} v^{(i)}$, or it could be something fancier. An inner product $\langle \cdot, \cdot \rangle$ must satisfy the following conditions:

1. Symmetry: $\langle u, v \rangle = \langle v, u \rangle \quad \forall u, v \in \mathcal{X}$
2. Bilinearity: $\langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle \quad \forall u, v, w \in \mathcal{X}, \; \forall \alpha, \beta \in \mathbb{R}$
3. Strict positive definiteness: $\langle u, u \rangle \ge 0 \quad \forall u \in \mathcal{X}$, with $\langle u, u \rangle = 0 \iff u = 0$.
An inner product space (or pre-Hilbert space) is a vector space together with an inner product.
A Hilbert space is a complete inner product space. (`Complete' means Cauchy sequences converge to elements of the space - there aren't any "holes" in the space.)
Examples of Hilbert spaces include:
- The vector space $\mathbb{R}^n$ with $\langle u, v \rangle_{\mathbb{R}^n} = u^T v$, the vector dot product of u and v.
- The space $\ell_2$ of square-summable sequences, with inner product $\langle u, v \rangle_{\ell_2} = \sum_{i=1}^\infty u_i v_i$.
- The space $L_2(\mathcal{X}, \mu)$ of square-integrable functions, that is, functions f such that $\int f(x)^2 \, d\mu(x) < \infty$, with inner product $\langle f, g \rangle_{L_2(\mathcal{X}, \mu)} = \int f(x) g(x) \, d\mu(x)$.
Finite States
Say we have a finite input space $\{x_1, \ldots, x_m\}$. So there are only m possible states for the $x_i$'s. (Think of the bag-of-words example, where there are $2^{\#\text{words}}$ possible states.) I want to be able to take inner products between any two of them using my function k as the inner product. Inner products by definition are symmetric, so $k(x_i, x_j) = k(x_j, x_i)$. In other words, in order for us to even consider k as a valid kernel function, the matrix
$$K = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_m) \\ \vdots & & \vdots \\ k(x_m, x_1) & \cdots & k(x_m, x_m) \end{bmatrix}$$

needs to be symmetric, and this means we can diagonalize it, and the eigendecomposition takes this form:

$$K = V \Lambda V^T,$$

where V is an orthogonal matrix whose columns are the eigenvectors $v_t$, and $\Lambda$ is a diagonal matrix with the eigenvalues $\lambda_t$ on the diagonal. This fact (that
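This symmetry and eigendecomposition can be illustrated with a tiny Gram matrix (a toy example of my own, with two states and the squared inner product from Example 3):

```python
from math import sqrt

states = [1.0, 2.0]
k = lambda u, v: (u * v) ** 2  # squared inner product, 1d inputs
K = [[k(u, v) for v in states] for u in states]  # Gram matrix [[1,4],[4,16]]

assert K[0][1] == K[1][0]  # symmetric, since k is symmetric

# eigenvalues of a symmetric 2x2 matrix [[a,b],[b,c]]
a, b, c = K[0][0], K[0][1], K[1][1]
mid, rad = (a + c) / 2, sqrt(((a - c) / 2) ** 2 + b ** 2)
eigs = (mid + rad, mid - rad)
assert min(eigs) >= 0  # both eigenvalues nonnegative: K is positive semidefinite
```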