ࡱ>  " v$bjbjVV ,<<%3KKK___8$_N"!HNJNJNJNJNJNJNPYSjJN-Kl@JN!KwNR(K!HNHNGl)"L!PPf_@I*4NN0NjIS4STLSKLJNJNNS : Dr. Eick COSC 6342Machine Learning Homework2 Spring 2011 First Draft Due: all graded problems (10, 11, 13, and 14) are due Mo. March 7, 11a; problem 10b can be submitted after Spring Break, but try your best to submit on Monday! Last updated: March 3, 10a 10) PRINCIPAL COMPONENT ANALYSIS i) What is the main difference between principal component analysis and subset selection? ii) Compute the first principal component for the covariance matrix of Problem 9 (use any procedure you like). What percentage of the variance does the first principal component capture? Interpret your result! 11) K-Means and EM a) What is the meaning of b_it and h_it in the minimization process of K-means/EMs objective function? What constraint holds with respect to h_it and b_it? Remark: b_it means h_it means b) Why do K-means and EM employ an iterative optimization procedure rather than differentiating the objective function and setting its gradient to 0 to derive the optimal clustering? What is the disadvantage of using an iterative optimization procedure? c) EM is called a soft clustering algorithmwhat does this mean? d) What are advantages you see in using EM over K-means? e) Assume you cluster a 2-dimensional dataset with K-means with k=2 using Manhattan distance as your distance function and the initial clusters are: Cluster1: {(8,8), (8,7), (1,1)} Cluster2:= {(5,4), (2,3), (2,2)} What clusters will K-means compute in his next iteration! Give all steps of the computation! 12) kNN [8] (not graded) a) Selecting the proper value for the number of neighbors k is a critical problem when using kNN to solve real-world problems. What approach could be used to select k? [2] b) Compare kNN with parametric classification approaches. What are the main differences between the two approaches? c) kNN is a lazy classifier; what does this mean? What are the advantages/disadvantages of using lazy classifiers? 13) Non-Parametric Density Estimation Assume we have a one dimensional dataset containing values {2, 3, 7, 8, 9, 12} Assume h=2 for all questions (formula 8.2); compute p(x) using equation 8.2 for x=8 and x=12 Now compute the same densities using Silvermans nave estimator (formula 8.4)! Now assume we use a Gaussian Kernel Estimator (equation 8.7); give a verbal description and a formula how this estimator measures the density for x=11 Compare the 3 density estimation approaches; what are the main differences and advantages for each approach? 14) Decision Trees a) We would like to predict the gender of a person based on two binary attributes: leg-cover (pants or skirts) and beard (beard or bare-faced). We assume we have a data set of 20000 individuals, 16000 of which are male and 4000 of which are female. 80% of the 16000 males are barefaced. Skirts are present on 50% of the females. All females are bare-faced and no male wears a skirt. Compute the information gain of using the attribute leg-cover for predicting gender! Just giving the formula that computes the information gain is fine; you do not need to compute the exact value of the formula! Use H as the entropy function in your formula (e.g. H(1/3,2/3) is the entropy that 1/3 of the examples belong to class1 and 2/3 of the examples belong to class 2). Computer the information gain of using the attribute beard to predict gender! Based on your computations which of the two tests would you use as the root test of a decision tree which predicts gender. b) Why do MANY decision tree learning algorithms grow the entire tree and then apply pruning techniques to trim the tree to obtain a tree of smaller size? [3] c) The decision tree learning algorithm is a greedy algorithm. What does this mean? What is the reason that most decision tree learning algorithms are greedy algorithms? Gain(Skirt={yes,no})=H(4/5,1/5)-(1/10*H( #$%./8:;FGLM_t}ֿ~skc[cScSKSKSh2B*php0hzvB*php0h;B*php0hF$rB*php0hDSB*php0hChB*php0hk~0h;CJ"aJ"h;CJ"aJ"h@KCJ"aJ"hCJ"aJ"hk~0hk~0CJ"aJ"hk~0hCJ"aJ"hDSCJ"aJ"hk~0he,TCJ"aJ"hk~0hk~06CJ"aJ"hk~0h'76CJ"aJ"hk~0h/*CJ"aJ"hk~0h'7CJ"aJ"h(CJaJ ;G  & S g  / 0 / t B gdF$r P gd $ P a$gde,T$a$gde,T$a$gde,T     % & .  ; R S U e f g i   + . / 0 t A B p s w z ·³tplllh;h1tN jhF$rh 8UmHnHuh;CJaJhF$rhF$rCJaJhF$rCJaJ hF$rhF$rhz8hF$rh 8hWh;h1tNCJaJh;hWCJaJh;hF$rCJaJh hbMhbMB* phhReB* phhB* phh~B*php0* (5#89:;Oo1tyHJYlj$$ʻʻʳUhzhzOJQJhz h;PJh;h;CJaJhDSh;hWh 8h 8CJaJh1tNh"^h!L h2hF$rhF$rCJaJhF$rCJaJ hF$rhF$rh 8hF$rhz89dKKZ$$$$$$$$ P gd $ P a$gdU|[ & F 88^8gd;gd; & FgdWgdF$r0/1)+ 9/10*H(8/9,1/9)     PAGE  PAGE 2  EMBED Equation.3   EMBED Equation.3  $$$$$$ $!$#$$$&$'$)$*$0$1$2$4$5$;$<$=$>$?$A$B$C$V$W$X$Y$[$\$o$p$q$r$t$u$v$Կ޷hjh hF$rEHUjYM hF$rUVjh h 8EHUj1YM hF$rUVhF$rjhF$rUh> 0JmHnHuhy : hy :0Jjhy :0JUhX~*jhX~*U h$hh# hF$rh;h;'$$ $"$#$%$&$($)$2$3$4$?$@$A$B$Z$[$s$t$u$v$ P gdgdF$rh]hgd'7 &`#$gdB+,1h/ =!"#$% `!9`! {P 2 @XJ xcdd`` $X bbd12,(ㆫar`V`3H1g`YРW@P5< %! `3),L ! ~ Ay ?'i,@պ@9cXgI%Ϭ7S_ &%P$UrI>1@"#RpeqIj. @ ] @u; b#3X?M@`!9`Xp53Zg @XJ xcdd`` $X bbd12,(ㆫar`V`3H1g`YРW@P5< %! `3),L ! ~ Ay ?' fiYu&si#.Y`%D0?L j>#Ȟp{@Q$A>$;FLLJ% AfPdh,] `MODd T0  # A2e`! {P 2AD`!9`! {P 2 @XJ xcdd`` $X bbd12,(ㆫar`V`3H1g`YРW@P5< %! `3),L ! ~ Ay ?'i,@պ@9cXgI%Ϭ7S_ &%P$UrI>1@"#RpeqIj. @ ] @u; b#3X?M@Dd Tb  c $A? ?3"`?2ewjJ-;}|4A-`!9wjJ-;}|4 @XJ xcdd`` $X bbd12,(ㆫar`V`3H1g`YРW@P5< %! `3),L ! ~ Ay ?'r,@պ@9,ΒJS"Y o@ H5d=Lf pɨ s} cE#F&&\ w3u(2tA4| .Ff~M !$%&')(*+,X./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWRoot Entry F@ f# Data WordDocument,ObjectPool`f@ f_1297678897 F`f`fOle CompObjfObjInfo  "#$%'()+,-./1 FMicrosoft Equation 3.0 DS Equation Equation.39qW  .  @ & & MathType Times New RomanXwaw @w5f-2 tyOlePres000 Equation Native  ;_1297678992 F`f`fOle  2 GiyTimes New RomanXwaw @w5f-2 @"by & "System5f !-NANIqx2 b it FMicrosoft Equation 3.0 DS Equation Equation.39qCompObj fObjInfoOlePres000 Equation Native ;W  .  @ & & MathType Times New RomanXwaw @w6f-2 ty2 aiyTimes New RomanXwaw @w6f-2 @6hy & "Systemǐ6f !\-NANIq. h itOh+'0|   , 8 D P\dlt1)Computer ScienceNormalChristoph Eick9Microsoft Office Word@~mg@G@1Table-TSummaryInformation(DocumentSummaryInformation8!0MsoDataStorePߔfPPf@OA ՜.+,0 hp  University of Houston 1) Title ^ 2 0@P`p2( 0@P`p 0@P`p 0@P`p 0@P`p 0@P`p 0@P`p8XV~_HmH nH sH tH @`@ NormalCJ_HaJmH sH tH ::  Heading 1$@&5\DA`D Default Paragraph FontViV  Table Normal :V 44 la (k (No List 4 @4 '7Footer  !.)@. '7 Page Number6B6 [/ Body TextPJaJB^"B [/ Normal (Web)dd[$\$6U16  Hyperlink >*B*ph>B>  Footnote TextCJaJ<Q< Footnote Text Char@&a@ Footnote ReferenceH*PK![Content_Types].xmlj0Eжr(΢Iw},-j4 wP-t#bΙ{UTU^hd}㨫)*1P' ^W0)T9<l#$yi};~@(Hu* Dנz/0ǰ $ X3aZ,D0j~3߶b~i>3\`?/[G\!-Rk.sԻ..a濭?PK!֧6 _rels/.relsj0 }Q%v/C/}(h"O = C?hv=Ʌ%[xp{۵_Pѣ<1H0ORBdJE4b$q_6LR7`0̞O,En7Lib/SeеPK!kytheme/theme/themeManager.xml M @}w7c(EbˮCAǠҟ7՛K Y, e.|,H,lxɴIsQ}#Ր ֵ+!,^$j=GW)E+& 8PK!Ptheme/theme/theme1.xmlYOo6w toc'vuر-MniP@I}úama[إ4:lЯGRX^6؊>$ !)O^rC$y@/yH*񄴽)޵߻UDb`}"qۋJחX^)I`nEp)liV[]1M<OP6r=zgbIguSebORD۫qu gZo~ٺlAplxpT0+[}`jzAV2Fi@qv֬5\|ʜ̭NleXdsjcs7f W+Ն7`g ȘJj|h(KD- dXiJ؇(x$( :;˹! I_TS 1?E??ZBΪmU/?~xY'y5g&΋/ɋ>GMGeD3Vq%'#q$8K)fw9:ĵ x}rxwr:\TZaG*y8IjbRc|XŻǿI u3KGnD1NIBs RuK>V.EL+M2#'fi ~V vl{u8zH *:(W☕ ~JTe\O*tHGHY}KNP*ݾ˦TѼ9/#A7qZ$*c?qUnwN%Oi4 =3ڗP 1Pm \\9Mؓ2aD];Yt\[x]}Wr|]g- eW )6-rCSj id DЇAΜIqbJ#x꺃 6k#ASh&ʌt(Q%p%m&]caSl=X\P1Mh9MVdDAaVB[݈fJíP|8 քAV^f Hn- "d>znNJ ة>b&2vKyϼD:,AGm\nziÙ.uχYC6OMf3or$5NHT[XF64T,ќM0E)`#5XY`פ;%1U٥m;R>QD DcpU'&LE/pm%]8firS4d 7y\`JnίI R3U~7+׸#m qBiDi*L69mY&iHE=(K&N!V.KeLDĕ{D vEꦚdeNƟe(MN9ߜR6&3(a/DUz<{ˊYȳV)9Z[4^n5!J?Q3eBoCM m<.vpIYfZY_p[=al-Y}Nc͙ŋ4vfavl'SA8|*u{-ߟ0%M07%<ҍPK! ѐ'theme/theme/_rels/themeManager.xml.relsM 0wooӺ&݈Э5 6?$Q ,.aic21h:qm@RN;d`o7gK(M&$R(.1r'JЊT8V"AȻHu}|$b{P8g/]QAsم(#L[PK-![Content_Types].xmlPK-!֧6 +_rels/.relsPK-!kytheme/theme/themeManager.xmlPK-!Ptheme/theme/theme1.xmlPK-! ѐ' theme/theme/_rels/themeManager.xml.relsPK] 2v25v* $$$' $v$ $v$   '!!-/5::/X2$`! {P 2A.*2$`Xp53ZgAo+@@(    BA ? ?Object 4"?  BA ? ?Object 4"?B S  ?ghv; twt&'$(X[ # "#%&()ABwOQ+4 #S U  "#%&()ABw3333333};R45  "#%&()ABZ[stw};R45  "#%&()ABZ[stwN%pڋxCvwu"tvap(:<)z.Ά ~,Y,,6w("j/Rbf |S><dK<dwu" ~,x"j/lB                                   NTVޑ oBw_h,"2H*@u                  " ЩL1                         4VN                 D3>Aڂךt.}̡F6                           d                 bf9~; + > d$rn14Q!L +"v%F(/**X~*B+$W-.V;/ry/k~0j3'78 8z8y :< j=@@O@>IdLJjJ@KrK-MbMN1tN7 Od&OgQe,T_XU|[Hf^eRe=m~oSeqF$rzv0W^&<DS5bC4Cy8#Y8\C0Ve 2]+]7ExPOb%@:"^rtS[ 7kq <}{Bws*'Xf@CVz[/_9k;VM(CU @(v@H@UnknownG*Ax Times New Roman5Symbol3. *Cx ArialG=  jMS Mincho-3 fg=. Cordia New?= *Cx Courier New;WingdingsA BCambria Math"qhFBFR 5A A !24d 3QHX?b2!xx1)Computer ScienceChristoph EickT               F'Microsoft Office Word 97-2003 Document MSWordDocWord.Document.89q