---+Learning Kernels Quickstart

Here we give some examples showing how to automatically create custom kernels from data. The kernels are generally created from a combination of base kernels, which can be specified by the user. The examples here use the [[http://cs.nyu.edu/~rostami/data/electronics.tar.gz][electronics]] category from the sentiment analysis dataset of [[http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html][Blitzer et al.]], which we include with precomputed word-level ngram features and both binary and regression labels (1-5 stars). The data is arranged in LIBSVM format, which is compatible with all the learning algorithms used in these examples. The results shown here should be easily reproducible and serve as a good first exercise.

Note that these exercises have been constructed to illustrate the mechanics of using the automatic kernel selection tools. The kernels and parameters used here are NOT necessarily the best for this dataset.

---++Feature Weighted Kernels

The examples below consider the case where each base kernel corresponds to a single feature. Such a set of base kernels occurs naturally when, for example, learning rational kernels as explained in [[http://www.cs.nyu.edu/~rostami/papers/lsk.pdf][Cortes et al. (MLSP 2008)]].

*Correlation Kernel Example:* The correlation-based kernel weights each input feature by a quantity proportional to its correlation with the training labels. The following command generates the weighted features:

<verbatim>
$ klweightfeatures --lk_alg=corr --features --sparse \
    --num_train=0:1000 --ker_reg=1 elec.2-gram elec.2-gram.corr
INFO: Loaded 2000 datapoints.
INFO: Selecting 1000 training examples...
INFO: Using 57466 features.
</verbatim>

The =--features= flag forces the output of explicit feature vectors rather than the kernel matrix, and the =--sparse= flag forces the use of sparse data structures; both are desirable in this case since the ngram features are sparse.
The =--num_train= flag indicates that the kernel selection algorithm should use only the first 1000 data points for training, which leaves the remaining points as a holdout set for evaluating performance. Regularization on the kernel is imposed via the =--ker_reg= flag, which in the case of correlation kernels limits the kernel trace. Finally, the =--lk_alg= flag selects which type of kernel selection algorithm is used. The first positional argument is the input dataset and the second is the output file.

The weighted features can then be used to train and test an SVM model via libsvm or liblinear.

Separate train and test:
<verbatim>
$ head -n1000 elec.2-gram.corr > elec.2-gram.corr.train
$ tail -n+1001 elec.2-gram.corr > elec.2-gram.corr.test
</verbatim>

Train:
<verbatim>
$ svm-train -s 0 -t 0 -c 4096 elec.2-gram.corr.train model
</verbatim>

Test:
<verbatim>
$ svm-predict elec.2-gram.corr.test model pred
Accuracy = 80% (800/1000) (classification)
</verbatim>

*L2 Regularized Linear Combination:* Here we optimally weight the input features in order to maximize the kernel ridge regression (KRR) objective, subject to the L2 regularization constraint =||mu - mu0|| < ker_reg=, where =mu= is the vector of squared weights and =mu0= and =ker_reg= are user-specified arguments.

<verbatim>
$ klweightfeatures --lk_alg=lin2 --features --sparse --num_train=0:1000 \
    --alg_reg=4 --ker_reg=1 --offset=1 --tol=1e-4 elec.1-gram.reg elec.1-gram.lin2
INFO: Loaded 2000 datapoints.
INFO: Selecting 1000 training examples...
...
INFO: iter: 18 obj: 334.069 gap: 0.000123848
INFO: iter: 19 obj: 334.071 gap: 6.18826e-05
INFO: Using 12876 features.
</verbatim>

The algorithm iterates until the tolerance, set by the =--tol= flag, or the maximum number of iterations is reached. In this case =mu0= is equal to zero (the default) and =ker_reg= is set by the =--ker_reg= flag.
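
For reference, the constraint above can be written out as follows. This is only a sketch in notation introduced here rather than taken from the tools: =p= is the number of input features, =x_k= is the k-th feature of =x=, and =Lambda= stands for the value of =ker_reg=. The non-negativity of =mu= is assumed because its entries are squared weights.

```latex
K_{\mu}(x, x') = \sum_{k=1}^{p} \mu_k \, x_k \, x'_k ,
\qquad
\mu \in \mathcal{M} = \bigl\{ \mu \in \mathbb{R}^{p} : \mu \ge 0,\ \lVert \mu - \mu_0 \rVert_2 < \Lambda \bigr\}
```
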
Since this selection is algorithm specific, we should also specify the regularization parameter that will be used in the second step via the =--alg_reg= flag. The =--offset= flag adds the indicated constant offset to the input dataset if one is not already included. Finally, the =--tol= flag indicates the precision at which the iterative method should stop.

We then train and test using kernel ridge regression (KRR), with input and output arguments made to closely resemble those of libsvm. One main difference is that the user must explicitly request sparse data structures with the =--sparse= flag. If the data is dense, it is better to omit =--sparse= and use the highly efficient dense BLAS routines instead. To see a full list of command line arguments, run =krr-train= without any parameters.

Separate train and test:
<verbatim>
$ head -n1000 elec.1-gram.lin2 > elec.1-gram.lin2.train
$ tail -n+1001 elec.1-gram.lin2 > elec.1-gram.lin2.test
</verbatim>

Train:
<verbatim>
$ krr-train --sparse elec.1-gram.lin2.train 4 model
</verbatim>

Test:
<verbatim>
$ krr-predict --sparse elec.1-gram.lin2.test model pred
INFO: Using primal solution to make predictions...
INFO: RMSE: 1.34909
</verbatim>

---++Kernel Combinations with Explicit Features

Here we consider the case of combining several general base kernels that admit explicit feature mappings. When these features are sparse, for example, we are able to compute combinations very efficiently in high-dimensional feature spaces. In this example we find the best linear combination of 5 character-level ngram kernels (1-gram, 2-gram, ..., 5-gram) with respect to the SVM objective.

<verbatim>
$ klcombinefeatures --lk_alg=lin1 --sparse --num_train=0:1000 \
    --alg_reg=0.1 elec.list elec.comb
...
INFO: iter: 10 constraint: -5.65162 theta: -5.65151 gap: 1.92756e-05
     20: objval = -5.651513964e+00 infeas = 1.000000000e+00 (0)
     21: objval = -5.651616432e+00 infeas = 0.000000000e+00 (0)
OPTIMAL SOLUTION FOUND
.*
optimization finished, #iter = 12
Objective value = -5.651646
nSV = 790
INFO: iter: 11 constraint: -5.65165 theta: -5.65162 gap: 5.23689e-06
</verbatim>

Here the argument =elec.list= is a file with the path to each base kernel written on a separate line, and the combined kernel is written to the file =elec.comb=. The =--alg_reg= flag indicates the regularization parameter that will be used with the SVM. This produces a kernel with many features that are nonetheless sparse, so liblinear is a good choice for training a model.

Separate train and test:
<verbatim>
$ head -n1000 elec.comb > elec.comb.train
$ tail -n+1001 elec.comb > elec.comb.test
</verbatim>

Train:
<verbatim>
$ train -s 3 -c 0.1 -B -1 elec.comb.train model
</verbatim>

Test:
<verbatim>
$ predict elec.comb.test model pred
Accuracy = 83.6% (836/1000)
</verbatim>

---++General Kernel Combinations

The final example concerns general combinations of kernels, where we combine the kernel matrices of the 5 ngram kernels. In practice, this general kernel combination should be used when explicit feature mappings that are easy to represent are not available, such as with Gaussian kernels.

<verbatim>
$ klcombinekernels --lk_alg=lin2 --num_train=0:1000 \
    --alg_reg=0.0001 --tol=1e-3 elec.kernel.list elec.kernel.comb
...
INFO: iter: 13 obj: 2981.09 gap: 0.00217109
INFO: iter: 14 obj: 2980.99 gap: 0.00108555
INFO: iter: 15 obj: 2980.94 gap: 0.000542772
</verbatim>

Here =elec.kernel.list= is a file with the path to each base kernel written on a separate line.
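
As a minimal sketch, a list file of this kind can be written with =printf=, one path per line. The base kernel file names used below (=elec.1-gram.kernel= through =elec.5-gram.kernel=) are hypothetical placeholders and are not files shipped with the dataset; substitute the paths of your own kernel matrices.

```shell
# Hypothetical base kernel file names; replace with the real paths.
for n in 1 2 3 4 5; do
  printf '%s\n' "elec.${n}-gram.kernel"
done > elec.kernel.list

# Sanity check: the list should contain one path per base kernel.
wc -l < elec.kernel.list
```

Generating the file in a loop keeps the order of the base kernels explicit, which may help when matching them against any per-kernel weights the tool reports.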

Separate train and test:
<verbatim>
$ head -n1000 elec.kernel.comb > elec.kernel.comb.train
$ tail -n+1001 elec.kernel.comb > elec.kernel.comb.test
</verbatim>

Train:
<verbatim>
$ krr-train --kernel elec.kernel.comb.train 0.0001 model
</verbatim>

Test:
<verbatim>
$ krr-predict --kernel elec.kernel.comb.test model pred
INFO: Using dual solution to make predictions...
INFO: Making predicitons...
INFO: RMSE: 1.36997
</verbatim>

-- Main.AfshinRostamizadeh - 10 Sep 2009
Topic revision: r5 - 2009-09-24 - AfshinRostamizadeh
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.