Data Mining Lecture: Support Vector Machines

Trịnh Tấn Đạt
Faculty of Information Technology (Khoa CNTT), Saigon University
Email: trinhtandat@sgu.edu.vn
Website: https://sites.google.com/site/ttdat88/

Contents
- Introduction
- Review of Linear Algebra
- Classifiers & Classifier Margin
- Linear SVMs: Optimization Problem
- Hard vs. Soft Margin Classification
- Non-linear SVMs

Introduction
- Competitive with other classification methods
- Relatively easy to learn
- Kernel methods give an opportunity to extend the idea to regression, density estimation, kernel PCA, etc.

Advantages of SVMs
- A principled approach to classification, regression and novelty detection
- Good generalization capabilities
- The hypothesis has an explicit dependence on the data, via the support vectors, so the model can be readily interpreted
- Learning involves optimization of a convex function (no local minima, unlike neural nets)
- Only a few parameters are required to tune the learning machine (unlike the many weights, learning rates, hidden layers, hidden units, etc. of neural nets)

Prerequisites
- Vectors, matrices, dot products
- Equation of a straight line in vector notation
- Familiarity with the perceptron is useful; mathematical programming and vector spaces are an added benefit
- The more comfortable you are with linear algebra, the easier this material will be

What is a Vector?
- Think of a vector as a directed line segment in N dimensions: it has a "length" and a "direction"
- Basic idea: convert geometry in higher dimensions into algebra
- Once you define a "nice" basis along each dimension (x-, y-, z-axis, ...), a vector becomes an N x 1 matrix, e.g. v = [a b c]^T
- Geometry starts to become linear algebra on vectors like v

Vector Addition
- v + w = (x1, x2) + (y1, y2) = (x1 + y1, x2 + y2)
- Geometrically, A + B = C: use the head-to-tail method to combine the two vectors

Scalar Product
- a v = a (x1, x2) = (a x1, a x2)
- Changes only the length ("scaling") but keeps the direction fixed
- Sneak peek: a matrix operation (Av) can change length, direction, and also dimensionality
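The vector operations above map directly onto a few lines of NumPy. The snippet below is a small illustrative sketch added here (not part of the original slides); the vectors v, w and the matrix A are arbitrary example values.

import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, -1.0])

print(v + w)        # vector addition: [4. 1.]
print(2.5 * v)      # scalar product: the length scales, the direction is unchanged

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])   # a 3x2 matrix maps 2-D vectors to 3-D vectors
print(A @ v)        # the matrix operation Av changes length, direction and dimensionality: [1. 4. 3.]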
Vectors: Magnitude (Length) and Phase (Direction)
- v = (x1, x2, ..., xn)^T
- ||v|| = sqrt(x1^2 + x2^2 + ... + xn^2) is the magnitude, or "2-norm"
- If ||v|| = 1, v is a unit vector
- Alternate representations: polar coordinates (||v||, θ) or the complex number ||v|| e^{jθ}; a unit vector is pure direction ("phase")

Feature Map for the Polynomial Kernel
Consider the feature map Φ applied to two m-dimensional vectors a and b:

Φ(a) = [1, √2 a1, ..., √2 am, a1^2, ..., am^2, √2 a1 a2, √2 a1 a3, ..., √2 a1 am, √2 a2 a3, ..., √2 a2 am, ..., √2 a(m-1) am]^T
Φ(b) = [1, √2 b1, ..., √2 bm, b1^2, ..., bm^2, √2 b1 b2, √2 b1 b3, ..., √2 b1 bm, √2 b2 b3, ..., √2 b2 bm, ..., √2 b(m-1) bm]^T

Collecting terms in the dot product Φ(a) · Φ(b)
- First term = 1
- Next m terms = Σ_{i=1..m} 2 a_i b_i
- Next m terms = Σ_{i=1..m} a_i^2 b_i^2
- Rest = Σ_{i=1..m} Σ_{j=i+1..m} 2 a_i a_j b_i b_j
- Therefore Φ(a) · Φ(b) = 1 + 2 Σ_i a_i b_i + Σ_i a_i^2 b_i^2 + Σ_i Σ_{j>i} 2 a_i a_j b_i b_j

Out of Curiosity
(1 + a·b)^2 = (a·b)^2 + 2 (a·b) + 1
            = (Σ_i a_i b_i)^2 + 2 Σ_i a_i b_i + 1
            = Σ_i Σ_j a_i b_i a_j b_j + 2 Σ_i a_i b_i + 1
            = Σ_i (a_i b_i)^2 + 2 Σ_i Σ_{j>i} a_i b_i a_j b_j + 2 Σ_i a_i b_i + 1

Both are the Same
- Comparing term by term, Φ(a) · Φ(b) = (1 + a·b)^2
- But computing the right-hand side is a lot more efficient: O(m), i.e. m additions and multiplications
- Let us call (1 + a·b)^2 = K(a, b) the kernel

Φ in the "Kernel Trick": Example
For 2-dimensional vectors x = [x1 x2], let K(xi, xj) = (1 + xi^T xj)^2. We need to show that K(xi, xj) = φ(xi)^T φ(xj):
K(xi, xj) = (1 + xi^T xj)^2
          = 1 + xi1^2 xj1^2 + 2 xi1 xj1 xi2 xj2 + xi2^2 xj2^2 + 2 xi1 xj1 + 2 xi2 xj2
          = [1, xi1^2, √2 xi1 xi2, xi2^2, √2 xi1, √2 xi2]^T [1, xj1^2, √2 xj1 xj2, xj2^2, √2 xj1, √2 xj2]
          = φ(xi)^T φ(xj), where φ(x) = [1, x1^2, √2 x1 x2, x2^2, √2 x1, √2 x2]

Other Kernels
- Beyond polynomials, there are other high-dimensional basis functions that can be made practical by finding the right kernel function

Examples of Kernel Functions
- Linear: K(xi, xj) = xi^T xj
- Polynomial of power p: K(xi, xj) = (1 + xi^T xj)^p
- Gaussian (radial-basis function network): K(xi, xj) = exp(-||xi - xj||^2 / (2σ^2))
- Sigmoid: K(xi, xj) = tanh(β0 xi^T xj + β1)

The Dual Optimization Problem
The function we end up optimizing is

maximize over α:  Σ_{k=1..R} α_k - (1/2) Σ_{k=1..R} Σ_{l=1..R} α_k α_l Q_kl,  where Q_kl = y_k y_l K(x_k, x_l)
subject to:  0 ≤ α_k ≤ C for all k,  and  Σ_{k=1..R} α_k y_k = 0

Multi-class Classification
- One-versus-all classification: train one binary SVM per class
- Multi-class SVM

SVM Software
- Python: scikit-learn module
- LibSVM (C++)
- SVMLight (C)
- Torch (C++)
- Weka (Java)
- ...

Research
- One-class SVM (unsupervised learning): outlier detection
- Weibull-calibrated SVM (W-SVM) / PI-SVM: open set recognition

Homework
- CIFAR-10 image recognition using SVM
- The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class; there are 50000 training images and 10000 test images
- The classes in the dataset are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
- Hints: https://github.com/wikiabhi/Cifar-10 and https://github.com/mok232/CIFAR-10-Image-Classification (a scikit-learn starting sketch follows this list)
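As a possible starting point for the homework, here is a brief scikit-learn sketch (an illustrative addition, not the lecturer's reference solution). It assumes TensorFlow is installed only for its bundled CIFAR-10 loader; any other way of obtaining the same arrays works just as well, and the subsample size and SVM hyperparameters are arbitrary choices to be tuned.

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from tensorflow.keras.datasets import cifar10  # used only to download/load CIFAR-10

# CIFAR-10: 50000 training and 10000 test images of shape 32x32x3.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Flatten each image into a 3072-dimensional vector and scale pixels to [0, 1].
x_train = x_train.reshape(len(x_train), -1).astype(np.float32) / 255.0
x_test = x_test.reshape(len(x_test), -1).astype(np.float32) / 255.0
y_train, y_test = y_train.ravel(), y_test.ravel()

# Kernel SVMs scale poorly with the number of samples, so start with a subsample.
rng = np.random.default_rng(0)
idx = rng.choice(len(x_train), size=5000, replace=False)

# Gaussian (RBF) kernel from the list of example kernels above; C and gamma are
# rough guesses to be tuned, e.g. by cross-validation.
clf = SVC(kernel="rbf", C=10, gamma="scale")
clf.fit(x_train[idx], y_train[idx])

print("test accuracy:", accuracy_score(y_test, clf.predict(x_test)))

Note that SVC handles the 10 classes automatically via one-versus-one voting, while LinearSVC trains one-versus-rest classifiers and is a faster baseline on the full training set. Accuracy on raw pixels is modest with either, so feature extraction (e.g. HOG or PCA) is where most of the improvement in this homework comes from.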
