ࡱ> mol y'bjbjWW 4D55yv v < <Z\rrrr;;;fZhZhZhZhZhZhZ$]y`fZ;;;;;ZrrZyyy;FrrfZy;fZyyxRWrvBjpTfRZZ0ZT`$`pWW`Wy;;;ZZj;;;Z;;;;`;;;;;;;;;v : CSE 231 Spring 2014 Computer Project #8 Assignment Overview This assignment focuses on the design, implementation and testing of a Python program to process collections of data using data structures, as described below. It is worth 60 points (6% of course grade) and must be completed no later than 11:59 PM on Monday, March 31. Assignment Deliverable The deliverable for this assignment is the following file: proj08.py your source code program Be sure to use the specified file name and to submit it for grading via the handin system before the project deadline. Assignment Background Whenever a Wikipedia article is edited, information about the revision is logged by the system. The SNAP project at Stanford University has collected some of those revisions and published them in an accessible format. The SNAP data file contains one log entry for each revision, where each log entry contains 13 lines. A sample log entry is shown below: REVISION 4781981 72390319 Steven_Strogatz 2006-08-28T14:11:16Z SmackBot 433328 CATEGORY American_mathematicians IMAGE MAIN Boston_University MIT Harvard_University Cornell_University TALK USER USER_TALK OTHER De:Steven_Strogatz Es:Steven_Strogatz EXTERNAL http://www.edge.org/3rd_culture/bios/strogatz.html TEMPLATE Cite_book Cite_book Cite_journal COMMENT ISBN formatting &/or general fixes using [[WP:AWB|AWB]] MINOR 1 TEXTDATA 229 Each line of a log entry begins with a label, such as REVISION or CATEGORY. There is a blank line between log entries. The REVISION line of a log entry contains the following information: Article identification number Revision identification number Article title Revision time stamp User name of editor User identification number of editor The last two fields contain an Internet Protocol address (IP address) if the editor is not a registered Wikipedia user. Assignment Specifications 1. You will develop a Python program to manage information about the revision history for Wikipedia articles. The program will input a collection of log entries from a file, and then allow the user to display statistics about the revisions. 2. The program will recognize the following commands: QUIT HELP INPUT filename TOP n EDITORS TOP n EDITS TOP n ARTICLES The program will be operated interactively: it will prompt the user and accept commands from the keyboard. The program will recognize commands entered with any mix of upper and lower case letters. If the user enters an invalid command, the program will display an appropriate message and prompt the user to enter another command. 3. The QUIT command will halt execution. 4. The "HELP" command will display information to the user about the commands recognized by the program. 5. The "INPUT" command will be followed by a string representing the name of an input file. The program will discard the current data set stored in memory, and then process the input file as the source for a new data set. If the user enters an invalid file name, the program will display an appropriate message and prompt the user to enter another command. The program will ignore all lines except properly formatted REVISION lines. The program will use the data set to construct two dictionaries, as described below. a) Each entry in the first dictionary has a key which is an editor and a value which is a set (or list) of articles revised by that editor: {editor: {set of articles revised by editor}} The rest of your program will be easier if that set (or list) contains no duplicates. b) Each entry in the second dictionary has a key which is an article and a value which is a tuple (or list) that contains the number of edits and a set (or list) of editors who revised the article: {article: (count of edits, {set of editors who revised the article})} The rest of your program will be easier if that set (or list) of editors contains no duplicates. 6. The "TOP" command will be followed by an integer number and a string (one of "EDITORS", "EDITS" or "ARTICLES"). If the user enters an invalid command, the program will display an appropriate message and prompt the user to enter another command; the program will not display an empty report. For each report, the program will display the relevant information about the top "n" items in a given category: EDITORS the editors who have revised the most articles (display the user name of the editor and the number of articles revised by that editor). EDITS the articles which have been revised the most often (display the title of the article and the number of times that article was revised). ARTICLES the articles which have been revised by the most editors (display the title of the article and the number of editors who have revised that article). The information will be displayed in tabular form. The fields will be identified using column headers, and the fields will be aligned beneath the headers. The reports will be sorted from highest to lowest value in the category. Entries with the same sort value will be displayed in alphabetical order. 7. The program will display appropriate messages to inform the user about any unusual circumstances. 8. The program will consist of at least four meaningful functions: A function which processes INPUT filename commands (reads the contents of the file and builds the two dictionaries). A function which processes TOP n EDITORS commands (displays the information). A function which processes TOP n EDITS commands (displays the information). A function which processes TOP n ARTICLES commands (displays the information). Communication between functions will occur via parameters and return values; you may not use any global variables. Assignment Notes 1. The project directory contains two sample data files (sample1.txt and sample2.txt) which may be used as you develop your solution. However, your program is expected to execute correctly for any properly formatted data file. 2. Additional information about the SNAP project can be found on the web:  HYPERLINK "http://snap.stanford.edu/" http://snap.stanford.edu/ 3. You may assume that the Article identification number and the Article title both uniquely identify an article. Similarly, you may assume that the User name and User identification number uniquely identify an editor. 4. As noted above, some REVISION lines contain IP addresses, rather than User names and User identification numbers (for example, ip:24.188.31.147). Your program will accept them as user names since they are reasonable aliases for actual people (that is not precise, but sufficient for our purposes). 5. The only time program execution will halt is when the user enters the QUIT command. 6. As noted above, your program must consist of at least four meaningful functions; you certainly may develop more than four functions. You will have to decide how to decompose the overall program into smaller tasks. Note that it would be natural to implement some (or even all) of the subtasks as functions. Each function must be declared at the global level. Each function should have a coherent purpose which can be expressed succinctly in a line or two. You may not use any global variables in your program. All communication between the functions which constitute your program will be done using parameters and return values. Reminders Solve the problem using pencil and paper first. Never show your program source code to another student or discuss specifics of how you implemented parts of your solution at the level of Python source code. Develop a simple version of the program, then add functionality incrementally, testing your program thoroughly after each addition. Use the handin system to turn in each version of your program, including the final version. You would be wise to back up your file on your H: drive, in addition to submitting it via the handin system. Be sure to log out when you leave the room, if youre working in a public lab. *+>?@AV    D Z b e { | Ȭȧ{whcf h4uhgOJQJ^JhzhOJQJ^Jh<h5 hM5h\ h[5h[hghMhT7h&hhk h=h[h~h25CJaJh\5CJaJhg5CJaJ h\5 hm5h~h\5.+?@AUVd e | } W X n o J K $ h^hgdFWgd07gdgd[$a$$a$gdc   , 2 3 9 X o O T f  $   LM`z׾ӂӂӂӂӂӂ~y h\5h:hpu(hFWhFWCJOJQJ^JaJmHsH hFWhFWCJOJQJ^JaJ"hMCCJOJQJ^JaJmHsH(hFWhFWCJOJQJ^JaJmHsHh9BhFWh&0h07 h075hcf h5h<h5hMh.$ E K  2 r z MNl_`zh^hgd&0h^hgdFWz{no78de89gdP6gd4gdh^hgdvcgdvcgd){z{CaoU78fi[lmnv689Jbuv~¾ʾҺβhpuhaEh%=h h 0hIh Ah3h4hh\]hvfhvchkhCh07hGhphn,"h){ h/75Aklf$%h^hgdP6 ^`gdR3 h*$^hgd0@*$gd%=gdP6 !&79<ADSilms.1EJPVZ_37õõçݵõõõççh%=h%=\nHtHh%=CJOJQJ^JaJh0@CJOJQJ^JaJ h8'h%=CJOJQJ^JaJhpu\nHtHh0@\nHtHhI\nHtHh%=\nHtHh%=hgs{=7;?DKegjorJK%-c`abcfi ̼ظԴаԬh 0h]+hvfh!L!M!v!!!!!!"""#"$"㸱㭩zrzjh:nHtHhpunHtHh7InHtH"h7ICJOJQJ^JaJnHtH(hYh7ICJOJQJ^JaJnHtHh7IhMC h:hMChMCnHtHhSTh:0J"jh:U h:h:jh:Uh:hz hz5hvfh]hpuh 0)  ""#"}"~"##M$N$$$% %9%:%%%]&^&&&h^hgdvf & Fgdzgd VgdzgdMCgd:`gd:$"~""""""""#=#D#R#m######L$$$$$% %9%:%R%Y%q%%%%&\&f&l&m&s&y&&&&'&'('x'y'žhl5hvfhvf5hrhw|hz5hbhz5h:hzhHjhvf hvfhz hz5 hHj5 hpu5 h+5 h+hMhMhfh Ah^^!h_th+hMChpu0&(')'x'y'gdl5 & Fgd){gdz & Fgdz21h:pc/ =!"#$% DyK http://snap.stanford.edu/yK Lhttp://snap.stanford.edu/yX;H,]ą'c^. 666666666vvvvvvvvv66666686666666666666666666666666666666666666666666666666hH6666666666666666666666666666666666666666666666666666666666666666662 0@P`p2( 0@P`p 0@P`p 0@P`p 0@P`p 0@P`p 0@P`p8XV~_HmH nH sH tH H`H Normal*$CJPJ_HaJmH sH tHDA D Default Paragraph FontViV 0 Table Normal :V 44 la (k ( 0No List 2o2 WW8Num3z0OJQJ2o2 WW8Num3z1OJQJ2o2 WW8Num3z2OJQJ2o!2 WW8Num4z0OJQJ2o12 WW8Num4z1OJQJ2oA2 WW8Num4z2OJQJ4oQ4 WW8Num11z0OJQJ4oa4 WW8Num11z1OJQJ4oq4 WW8Num11z2OJQJ4o4 WW8Num14z0OJQJ8o8 WW8Num14z1 OJQJ^J4o4 WW8Num14z2OJQJ4o4 WW8Num19z0OJQJ4o4 WW8Num19z2OJQJ8o8 WW8Num19z4 OJQJ^J4o4 WW8Num21z0OJQJ8o8 WW8Num21z1 OJQJ^J4o4 WW8Num21z2OJQJDA D Default Paragraph Font6U`!6 Hyperlink >*B*ph>'`1> Comment ReferenceCJNORN Heading $x$OJQJCJPJ^JaJ6BR6 Body Text %x$/Qb$ List&D"@rD Caption 'xx $CJ6aJ]** Index( $44 Comment Text):j: Comment Subject*D@D Balloon Text+OJ QJ CJaJB/B Hfapple-converted-spaceFV`F 0n0FollowedHyperlink >*B*phPK![Content_Types].xmlN0EH-J@%ǎǢ|ș$زULTB l,3;rØJB+$G]7O٭V$ !)O^rC$y@/yH*񄴽)޵߻UDb`}"qۋJחX^)I`nEp)liV[]1M<OP6r=zgbIguSebORD۫qu gZo~ٺlAplxpT0+[}`jzAV2Fi@qv֬5\|ʜ̭NleXdsjcs7f W+Ն7`g ȘJj|h(KD- dXiJ؇(x$( :;˹! I_TS 1?E??ZBΪmU/?~xY'y5g&΋/ɋ>GMGeD3Vq%'#q$8K)fw9:ĵ x}rxwr:\TZaG*y8IjbRc|XŻǿI u3KGnD1NIBs RuK>V.EL+M2#'fi ~V vl{u8zH *:(W☕ ~JTe\O*tHGHY}KNP*ݾ˦TѼ9/#A7qZ$*c?qUnwN%Oi4 =3N)cbJ uV4(Tn 7_?m-ٛ{UBwznʜ"Z xJZp; {/<P;,)''KQk5qpN8KGbe Sd̛\17 pa>SR! 3K4'+rzQ TTIIvt]Kc⫲K#v5+|D~O@%\w_nN[L9KqgVhn R!y+Un;*&/HrT >>\ t=.Tġ S; Z~!P9giCڧ!# B,;X=ۻ,I2UWV9$lk=Aj;{AP79|s*Y;̠[MCۿhf]o{oY=1kyVV5E8Vk+֜\80X4D)!!?*|fv u"xA@T_q64)kڬuV7 t '%;i9s9x,ڎ-45xd8?ǘd/Y|t &LILJ`& -Gt/PK! ѐ'theme/theme/_rels/themeManager.xml.relsM 0wooӺ&݈Э5 6?$Q ,.aic21h:qm@RN;d`o7gK(M&$R(.1r'JЊT8V"AȻHu}|$b{P8g/]QAsم(#L[PK-![Content_Types].xmlPK-!֧6 0_rels/.relsPK-!kytheme/theme/themeManager.xmlPK-!0C)theme/theme/theme1.xmlPK-! ѐ' theme/theme/_rels/themeManager.xml.relsPK] yD z7 $"y' $ z&y'!yX8@0(  0 # $LP{%#gkm s {33333+AXn$Er`{ose i fjbf=>LM!$~ { =>LM!$~{w d"֯TG.,J f/{D{! f/@;9$;3AmtB^.z:5CBDz6 oD{k2HnP^ qRڷHXp-Td=T8yqk4-@kތz ^`OJQJo( 8^8`OJQJo( ^`OJQJo(o  p^ `OJQJo(  @ ^ `OJQJo( x^x`OJQJo( H^H`OJQJo(o ^`OJQJo( ^`OJQJo(^`OJQJ^`OJQJop^p`OJQJ@ ^@ `OJQJ^`OJQJo^`OJQJ^`OJQJ^`OJQJoP^P`OJQJhh^h`.^`.^`.P^`P@@^@`0^`0``^``^`^`^``^``00^0`h^`OJQJo(hHh^`OJQJo(hHohp^p`OJQJo(hHh@ ^@ `OJQJo(hHh^`OJQJo(hHoh^`OJQJo(hHh^`OJQJo(hHh^`OJQJo(hHohP^P`OJQJo(hHh ^`hH.h ^`hH.h pL^p`LhH.h @ ^@ `hH.h ^`hH.h L^`LhH.h ^`hH.h ^`hH.h PL^P`LhH.h ^`hH.h ^`hH.h pL^p`LhH.h @ ^@ `hH.h ^`hH.h L^`LhH.h ^`hH.h ^`hH.h PL^P`LhH.h^`OJQJo(hHh^`OJQJo(hHohp^p`OJQJo(hHh@ ^@ `OJQJo(hHh^`OJQJo(hHoh^`OJQJo(hHh^`OJQJo(hHh^`OJQJo(hHohP^P`OJQJo(hHh ^`hH.h ^`hH.h pL^p`LhH.h @ ^@ `hH.h ^`hH.h L^`LhH.h ^`hH.h ^`hH.h PL^P`LhH.^`o(. ^`hH. pL^p`LhH. @ ^@ `hH. ^`hH. L^`LhH. ^`hH. ^`hH. PL^P`LhH.^`o(. ^`hH. pLp^p`LhH. @ @ ^@ `hH. ^`hH. L^`LhH. ^`hH. ^`hH. PLP^P`LhH.88^8`o(. ^`hH.  L ^ `LhH.   ^ `hH. xx^x`hH. HLH^H`LhH. ^`hH. ^`hH. L^`LhH.h^`OJQJo(hHh^`OJQJ^Jo(hHohpp^p`OJQJo(hHh@ @ ^@ `OJQJo(hHh^`OJQJ^Jo(hHoh^`OJQJo(hHh^`OJQJo(hHh^`OJQJ^Jo(hHohPP^P`OJQJo(hHh^`OJQJo(hHh^`OJQJo(hHohp^p`OJQJo(hHh@ ^@ `OJQJo(hHh^`OJQJo(hHoh^`OJQJo(hHh^`OJQJo(hHh^`OJQJo(hHohP^P`OJQJo(hHh ^`hH.h ^`hH.h pL^p`LhH.h @ ^@ `hH.h ^`hH.h L^`LhH.h ^`hH.h ^`hH.h PL^P`LhH.^`^Jo(()^`^J.pL^p`L^J.@ ^@ `^J.^`^J.L^`L^J.^`^J.^`^J.PL^P`L^J.8^8`o(. ^`hH.  L^ `LhH.  ^ `hH. x^x`hH. HL^H`LhH. ^`hH. ^`hH. L^`LhH.808^8`0o(. ^`hH. pLp^p`LhH. @ @ ^@ `hH. ^`hH. L^`LhH. ^`hH. ^`hH. PLP^P`LhH.^`o(. ^`hH. pL^p`LhH. @ ^@ `hH. ^`hH. L^`LhH. ^`hH. ^`hH. PL^P`LhH.hh^h`OJQJo(hHh8^8`OJQJo(hHoh^`OJQJo(hHh ^ `OJQJo(hHh ^ `OJQJo(hHohx^x`OJQJo(hHhH^H`OJQJo(hHh^`OJQJo(hHoh^`OJQJo(hH88^8`o(. ^`hH.  L ^ `LhH.   ^ `hH. xx^x`hH. HLH^H`LhH. ^`hH. ^`hH. L^`LhH.hh^h`OJQJo(hHh8^8`OJQJo(hHoh^`OJQJo(hHh ^ `OJQJo(hHh ^ `OJQJo(hHohx^x`OJQJo(hHhH^H`OJQJo(hHh^`OJQJo(hHoh^`OJQJo(hHh ^`hH.h ^`hH.h pL^p`LhH.h @ ^@ `hH.h ^`hH.h L^`LhH.h ^`hH.h ^`hH.h PL^P`LhH.{D:5Cw yqkHX qRk2H@;tBnPTGDJ!-@kTd3A$; oDWW8Num3WW8Num18WW8Num26                                                               6                                   s2x        bb{        n]g                          :Ò                          VL@ `!jM \ 078<c'_I z4 cf !^^!n," #]+M/ 0 03R3l56P6[790@ AIXAMCF!G7IQkxR<)S^*S,ZS Y*jYZ[`vcvfSh]k9mN[o_tzt4u-waw){gs{zkF\]IFW =7}Z|tT75[9Brp-9Y.2.:dp+aE/7agfhpuw|JvgKo~rm{M##&K*V V)V@}ksBX"W&0Rf?C:5sGXioxq\z> %=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcefghijknRoot Entry F03BpData #1Table+OaWordDocument4DSummaryInformation(\DocumentSummaryInformation8dCompObjr  F Microsoft Word 97-2003 Document MSWordDocWord.Document.89q