ࡱ> OQNTY >bjbjWW ==l9]ZZZZ8V$z ,bbff|ZZZT V V V V V V $ z Z48"ZZz jZZf|bjjjZZ8f8fT ZZZZZT jjT T f\PcT NORTHERN CALIFORNIA ORACLE USERS GROUP November 19, 1998 Chris Lawson chris_lawson @ yahoo.com FOCUS OF PRESENTATION: Explore some "strange" rdbms problems that have baffled some DBAs. Most of the mysteries occurred on critical production systems, although some were on development systems. Some of these problems lasted for MONTHS, until someone took the time to investigate; others were solved in a few hours. ALL the mysteries were eventually explained. Depending on your personal experience, some of these "mysteries" will seem trivial or commonplace; others will indeed seem mysterious. Most mysteries have a simple explanation. Most mysteries have a simple fix. WHY SPEND TIME ON THESE RDBMS MYSTERIES? Each DBA has a unique set of experiences and biases. What one DBA thinks is obvious, another will not. An oracle "detective" is part scientist/part artist. Many solutions require CREATIVITY, not just logic. A superior DBA will look for ways to "stretch" and learn ways to handle difficult problems. Without working difficult problems, you will not advance as a DBA. You will be the "hero" if you encounter a mystery and solve it; but remember the solution, as you may see it again! NOTE: If you have a better solution to any of these mysteries, please share it! MYSTERY #1 "THE CASE OF THE BERSERK APPLICATION" Clue # 1: Big Phone Company 1997 Using HPUX, rdbms 7.3.2.3. Help-desk application (Vantive) that connects to oracle rdbms suddenly goes berserk, creating thousands of connections. Program had worked normally for many months. DBAs watch helplessly as CPU load driven from 1 to 50. As DBAs kill extra processes, more take their place. Alert log and recent rdbms trace files show nothing unusual. DBAs are united in accusing the application as the culprit. MYSTERY #1 (contd) Clue # 2: Big Publisher Ltd. 1998 Running SUN Solaris, rdbms 7.3.2.3 Users complain that performance has degraded in recent months. Manager states that something must be wrong with the network. Application is CORIS, a document management/printing app. DBA investigates. Discovers that time to connect in sql*plus is about 30-45 seconds, even though server load is very low. Connect time is terrible whether remote (PC) or directly on server. Server load (file I/O and CPU) is generally low. MYSTERY # 2 "THE CASE OF THE RELUCTANT PATCH" BACKGROUND: To correct several bugs, decision is made to upgrade from 7.3.2.2 to 7.3.2.3 (HPUX) Patch is obtained from Oracle, and applied to test server. DBA notes that patch ran very quickly, and runs again "just to be sure" Bug is now gone on Test server. PROBLEM: Patch is similarly applied to production server--same OS & server. Production application is tested, but bug is still there! Another DBA reviews patch file, location, etc. All seem correct. MYSTERY # 3 THE CASE OF THE SLEAZY SQL Big Publisher Ltd runs an MRP system called "AVALON," similar to Oracle Manufacturing. Database stores inventory, part information, vendors, etc. Server is ATT3555, running NCR UNIX. RDBMS version is 7.1.6 Issue: Users report that certain common operations are very slow. DBA investigates, and queries v$sqlarea using: Select sql_text from v$sqlarea where Disk_reads/executions > 1000; Query yields troublesome sql statement, with these stats: DISK_READS PER EXECUTION = 5,000 BUFFER_GETS PER EXECUTION = 5,100 MYSTERY # 3 contd Statement has been accidentally designed to ensure worst possible performance by making index usage totally impossible: SELECT * FROM ABC WHERE NVL (COL_W) = NVL (:1) AND NVL (COL_X) = NVL (:2) AND NVL (COL_Y) = NVL (:3) AND NVL (COL_Z) = NVL (:4); TABLE ABC SIZE = 3 mb, about 25,000 rows DBA requests developers to alter statement to eliminate NVL (COL_N) functions. DBA advised that no resources available to make change. Problem: If code cant be changed, what can be done to improve performance? MYSTERY # 4 THE CASE OF THE NON-OPTIMAL OPTIMIZER A large software company based in Cedar Shores has designed a large financials application. Program has been tuned for Rule Based Optimizer. The application runs very well, and is a mature product. Product is used in thousands in companies around the world. Some users clamor for new features--more horns and whistles. The new development team, afraid to become obsolete, wants to convert to Cost-Based Optimizer (CBO). They also wisely consider that Oracle recommends using CBO on new projects. The older developers, now nearing peaceful retirement, predict disaster if the database is switched to CBO, because the execution plans will change. Issue: How can Optimizer be selectively switched to CBO without changing the code? MYSTERY # 5 THE CASE OF THE FORGETFUL MEMORY A new internet-transaction application, ECXpert, and its database have been installed on a Sun ULTRA Enterprise Server. Sun Solaris 2.5.1; RDBMS version 7.2.3. Application appears to run smoothly for several months, although it occasionally creates large dump files. Trace files appear occasionally with ORA-4030 Out of Process Memory and recommends increase process memory quota. New feature: Server seems to hang occasionally. Server reboot fixes. Sysadmin checks kernel parameters related to memory. All correct, and match other servers. Not using any large stored procedures. Problem: What is causing memory/hang problems? MYSTERY # 6 THE RELUCTANT INDEX AFFAIR BACKGROUND: DBA asked to analyze and tune Australia manufacturing database. RDBMS is running CBO. One particularly bothersome sql statement is identified. The "where" condition is perfect for a new index, because of excellent selectivity. Index is quickly created. Table is also analyzed. PROBLEM: Even though index is a "perfect" solution to the query, a full table scan is used instead. MYSTERY # 7 "MYSTERY OF THE HANGING DATABASE" At random intervals, a 7.3.2.3 database "hangs." No trace files, and nothing unusual in the alert log. When problem occurs, no response to new connections requests; over 1200 existing connections "hang." At this point, the DBAs get to know the IS VP on a first name basis. Oracle Support is alerted to priority 1 problem; experts across the world investigate for weeks. Early one morning, DBA happens to be using OEM Lock Mgr tool and notices user who is blocking about 25 other users. The hang occurs soon after. Oracle Australia recommends checking indexes. This suggestion led to the solution. PROBLEM: How did index problems hang database? MYSTERY # 8 "THE CASE OF THE MYSTERIOUS PACKAGE" A new manufacturing application was installed on a SUN ULTRA 3000 server. A small database was created for testing purposes. RDBMS version 7.2.3. Shared Pool size about 60 mb. At first, all went well. Then, seemingly randomly, when the users began to try new features, they would receive a "funny" error message, and the application failed. A trace file recommended increasing shared pool. PROBLEM: How can application fail with such a sizable shared pool? Aside from massive increase in shared pool, what can be done? MYSTERY # 9 THE CASE OF THE UNCOOPERATIVE ROLLBACK In mid-afternoon, DBA (running "OEM Top Sessions") notices many users "ACTIVE" but showing 0 file i/o. Lock manager reveals one user performing big update blocking all. Culprit tracked down--agrees to be terminated. DBA disconnects session. Locks are not released, but user is "marked for kill." Very little file i/o activity. Alert log shows very slow switching of redo logs. DBA performs shutdown abort then startup. Database starts up after 2 minutes. All is well. PROBLEM? Why did user not rollback and release locks? MYSTERY # 10 THE SINGULAR CASE OF THE PHANTOM USERS A manufacturing database in Sydney, Australia, needed performance tuning. Sql tuning on US databases had yielded good results. The table v$sqlarea was queried to find resource-intensive sql statements. Several commonly run statements were isolated. Performance was improved through index additions. Statistics were re-examined over the next 4 hours, in order to confirm improvements. However, repeated look at execution statistics showed no change. DBA puzzles over enigma for several hours, then realizes that NOTHING is WRONG! What did he finally realize? MYSTERY # 11 THE CASE OF THE SLOW PHYSICIAN Health application is experiencing slow run times. Analysis shows following sql statement causing 3000 disk reads select * from view COSIGN_VIEW where doctor_id = DR. MCKENZIE COSIGN_VIEW is a join of 2 tables (DOCS + COSIGN), joined on patient_id (indexed) Search criteria DR.MCKENZIE is very selective; thus, nested loop IS expected choice for optimizer, with DOCS as Driving table. Even with index on DOCS(doctor_id), optimizer (CBO) insists on using hash-join, and refuses to ever use index on doctor_id ! Repeated analyze table commands do not correct. Substituting query not using a view yields expected NL result. Why does using the view cause optimizer to make the wrong choice? SOLUTION TO MYSTERY # 1 OTRACE is the culprit. It is active by default on many 7.3 rdbms versions. Excerpt from Oracle Corp Alert: Problems described here can occur when Oracle Trace is not configured and is widely enabled. TO DETECT Check directory ORACLE_HOME/rdbms/otrace: As size of files process.dat and regid.dat approach 10mb, problems arise. -rw-r--r-- 1 oracle dba 3161424 Jun 05 09:43 process.dat -rw-r--r-- 1 oracle dba 263808 Jun 05 09:43 regid.dat To correct simply remove these two files, then issue command otrccref SOLUTION TO MYSTERY # 1 (contd) TO PREVENT 1) Add line to listener.ora for each database (after ORACLE_HOME): (ENVS=EPC_DISABLED=TRUE) 2) Set and export environment variable EPC_DISABLED=TRUE for ALL USERS. Put standard profile in /etc directory. 3) Restart all databases and restart listener. SOLUTION TO MYSTERY #2 DBA happens to notice that upon sqlplus startup, rdbms is 7.3.2.2 ! The patch was really only applied on the SECOND run. This is apparently a quirk in the patch readme file. The command what oracle (then grep for patch#) can be used to determine which patches are applied. SOLUTION TO MYSTERY #3 CACHE the table! For example, alter table xyz cache; Normally, blocks from full-table scans are designated for rapid age-out; otherwise, they would wipe-out the db cache. Cache of table causes blocks to be treated normally. Caching table disables rapid age-out of this table Logical reads will not be reduced, but disk reads approach zero! Note: DB_BLOCK_BUFFERS was slightly increased to compensate for the cached table that now consumes a few megabytes of database cache. SOLUTION TO MYSTERY #4 Simply substitute a view having a hint for the table needing CBO. For example: rename DEPT to DEPT_ORIG create view DEPT as select /*+ ALL_ROWS */ * from DEPT_ORIG; Now, application will use the VIEW when it looks for DEPT. All queries using DEPT will use CBO. Note: Upon renaming a table, the indexes and constraints will move with the table; however, synonyms and grants may need to be reset. SOLUTION TO MYSTERY #5 DBA checks /tmp (swap area on server) and notes 99% consumed. Investigation reveals that application occasionally goes berserk and consumes ENTIRE SWAP area with log files. Deletion of log files does not return disk space, as application is still holding the files. Reboot of server cleaned up /tmp area, thereby correcting problem. SUGGESTION: If memory-related errors messages exist, check swap area first. SOLUTION TO MYSTERY #6 The values in the table are very "lopsided." Optimizer, however, will assume uniform distribution, which is incorrect in many cases. Re-analyze and specify histogram: ANALYZE TABLE XYZ FOR ALL INDEXED COLUMNS SIZE 75 This creates histogram of 75 buckets for each indexed column. With these statistics, optimizer will know how values are distributed, and will more often make right decision to use an index or not. SOLUTION TO MYSTERY #7 Application design flaw. There are hundreds of foreign keys in the database; 99% had indexes; A few did NOT, violating good design practice. When batch program began updates, locking increased rapidly. Without FK index, updates on parent table completely block updates on child (vice versa for 7.1.6) Ref: Server Application Developers Guide. Although not admitted as RDBMS "bug," RDBMS was completely overwhelmed by the locks. Once indexes on all FKs created, problem disappeared. SOLUTION TO MYSTERY #8 The application used about 20 massive PL/SQL packages. Some are 5x the "standard" package. When a package load is attempted, it will not fit in the shared pool. Memory-intensive packages should be "pinned" or kept in shared pool after rdbms startup. execute sys.dbms_shared_pool.keep (OBJECT_NAME); But first, must find the big packages (will also list Standard): select owner, name, sharable_mem from v$db_object_cache where sharable_mem > 100000; Example script to find "big" packages and generate sql script to pin them in memory: select execute sys.dbms_shared_pool.keep(||||owner||.||name|| ||); from v$db_object_cache where sharable_mem > 100000 and type not in (VIEW, SYNONYM, TABLE) and name not like %SHARED_POOL%; SOLUTION TO MYSTERY #9 If session is terminated, speed of rollback is proportional to init.ora parameter CLEANUP_ROLLBACK_ENTRIES If default value (20) is used, rollback of killed session can take 50x time of original update. Alternatively, shutdown abort/startup cleans up database much faster. Rationale: Parameter prevents rollback of one user from hogging all the resources on a busy system. Solution: Increase parameter to reduce rollback time (since shutdown abort is usually not an option) SOLUTION TO MYSTERY #10 Nothing is wrong, because the users were still asleep, as it was still 5 AM in Sydney. SOLUTION TO MYSTERY #11 Everything seemed to point to a problem with the view, because all worked normally as long as the view was excluded. Finally, DBA compared view definition (in OEM Schema Mgr) to definition seen using describe table syntax. The columns did NOT match! Examining the object-create script revealed that the view switched column names, so that column DOC_ID in the view did NOT match DOC_ID in the table! Once the correct column was indexed, a Nested-Loop Join was selected by the optimizer.  TEN DATABASE MYSTERIES  EMBED PBrush  *=>hkl  M O k s t   ( < > Z b c   = > m n z vw~CFp789QVuz˻˳˻˻˳⫻˻˳6>*CJ OJQJ>*CJ$OJQJ6CJ OJQJ>*CJ OJQJ5CJ OJQJ CJ OJQJ5CJ(OJQJ5CJ$OJQJ CJOJQJ CJ0OJQJ CJ,OJQJ5CJ,OJQJ?*+=>?@ANOhikl45d@&$*+=>?@ANOhikl45deGHRS   M þȴGHmnKOP2deGHRS   M N O s t @&M N O s t 8 9 s t $ % ' ( < = > b c E F   8 = > m n z { TUը󏊅{Jj0 8 9 s t $ % ' ( < = > b c E F  @&  8 = > m n z { TUuvwBCEFpq@&UuvwBCEFpqEF89\/Onzup5Urs[]^?,EF89\/On@@&@&]d[t!(/P^SZ@Cp"7I M """Z#c#$$%4%5%%%%a&o&w&&&&&&'<'K'X'''' CJ(OJQJ5CJ$OJQJ5CJ(OJQJ CJOJQJ>*CJ OJQJ5CJ OJQJ CJ OJQJ6CJ OJQJL#$\]tu)*rsst  RSzupkf1jkV$&yz78y{(#$\]tu)*rsst@&  RSOP@BCqr@&OP@BCqr~89Ŀ~~yto~jtpXYfg _`)r~89 I J L M @& I J L M .!/!x!y!!!""_"`"i"j""""""O#P####T$U$$$%%4%5%%%%%?&@&zupEFJK-.`,M .!/!x!y!!!""_"`"i"j""""""O#P####T$U$$$%@&%%4%5%%%%%?&@&&&A'B'r's''''''((_(`(((((@&@&&&A'B'r's''''''((_(`(((((((c)d))))'*(*)***+*L*M*X*Y*****+-+.+]+œzup K@yz+'''''(((;(B((())))*&*(***C*L*X*****^+u+v++++, ,4,?,,,,,,-- .......E0J0`0b0m0q000112222 2n22>*CJ OJQJOJQJ6CJOJQJ5CJOJQJ CJ(OJQJ6CJ OJQJ56CJ OJQJ CJOJQJ5CJ(OJQJ CJ OJQJ5CJ OJQJC(((c)d))))'*(*)***+*L*M*X*Y*****+-+.+]+^+u+v++ @&]+^+u+v+++',(,,,,,,,,----............//B/W/X/////E0F0G0I0J0a0b000鲭{MPQd578NO0++',(,,,,,,,,----............//@&/B/W/X/////E0F0G0I0J0a0b00011p1q1111112222@&011p1q111111222222 2222222>3?3333333334455B5C55555魨{vqIJ23tvw>`aQR*222 2222222>3?3333333334455B5C55555@&2222233344445#5$5@5556%7j7788 99`9y9":+:::::;\;];b;{;<<<< ==i=l=m======= > >>>>¾jq8 UV jUCJ, 56CJ,5CJ,5CJ jUmHOJQJ>*CJ OJQJ CJOJQJ5CJ(OJQJ5CJ OJQJ6CJ OJQJ CJ OJQJ:55556666$7%7j7k77788o8888888 99y9z9!:"::::::;;^;`;󎉄zupkfa  o|} 8Tk o pGH%55556666$7%7j7k77788o8888888 99y9z9!:":@&"::::::;;^;`;a;b;z;{;;;y<z<==j=k=l=======@&`;a;b;z;{;;;y<z<==j=k=l=======>>>>OPqr=>>>>#&P0= /!"#$%n47KL' `W1`PNG  IHDR[jgAMAPLTE)ƽl, pHYsg >HIDATx햫 P γUI0 cf(}E;$p5`2+%?'lqhPC;Y7 AJ ꅦӍ`G= 9@S CIm&#OI0]p`jsS/W '0Jكifٔ05t0*5D`.Var`W$L@/Um~2CVB^ixa S0 %z8Vf0rQ+ТM%0J0x0 ǀ/{5l;Ô6Uj<{D,@.Dbp2$EYc bFAg+,`Xg*-!@/夒S3 Hv0*|TMSkd㫒vD|<5(vd~0̨; $`+t 07.1ZN MSW"vŝf8bR|#QLsQ| sykFI( %$`Q0JFI( %0ٷz$`tN(Ì,Z+h9Yɻ/Car;ryZ1( #VGN0v=#w؇!atW$f Ce"iPPpYZ#T.F)6=&)?``8)11F0@] s 0kc1:n3Sfz*`Me{ lySvaf@]MD4Zw#2v]I[̵X\i(~&r<d;~ F\ٚ09Z_cAIENDB`sDd[0  # Ab47KL' `W1`Dn47KL' `W1`PNG  IHDR[jgAMAPLTE)ƽl, pHYsg >HIDATx햫 P γUI0 cf(}E;$p5`2+%?'lqhPC;Y7 AJ ꅦӍ`G= 9@S CIm&#OI0]p`jsS/W '0Jكifٔ05t0*5D`.Var`W$L@/Um~2CVB^ixa S0 %z8Vf0rQ+ТM%0J0x0 ǀ/{5l;Ô6Uj<{D,@.Dbp2$EYc bFAg+,`Xg*-!@/夒S3 Hv0*|TMSkd㫒vD|<5(vd~0̨; $`+t 07.1ZN MSW"vŝf8bR|#QLsQ| sykFI( %$`Q0JFI( %0ٷz$`tN(Ì,Z+h9Yɻ/Car;ryZ1( #VGN0v=#w؇!atW$f Ce"iPPpYZ#T.F)6=&)?``8)11F0@] s 0kc1:n3Sfz*`Me{ lySvaf@]MD4Zw#2v]I[̵X\i(~&r<d;~ F\ٚ09Z_cAIENDB`  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEGHIJKLMPSVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~Root Entry\ }E1 wson$  FMyeRData FWordDocument ObjectPool p)`ye_955182961 Fp)`@aOle CompObjMObjInfo  FPBrushPBrushPBrush9qOh+'0 $ @ L X dpxOle10Native UgOle10ItemName1Table;SummaryInformation( gBMg6([c)!TEN MYSTERIOUS DATABASE PROBLEMSrosEN chrisSThri Normal.dotO Chris LawsonS D42iMicrosoft Word 8.0B@r@ I@NW@QDf,՜.+,D՜.+,L hp|   Y^6j DocumentSummaryInformation8 CompObjj!TEN MYSTERIOUS DATABASE PROBLEMS Title 6> _PID_GUIDAN{DD616B23-DDC5-11D1-B5BE-00609735B535}  FMicrosoft Word Document MSWordDocWord.Document.89q [$@$NormalmH <A@<Default Paragraph Font,@,Header  !, ,Footer  !0Z@0 Plain TextOJQJ8Y"8 Document Map-D OJQJ:::!!!!!!!!! ! ! ! !!!!!!!!!!!!!j&< D AK!#)&(*H,./14_7:     '2>"+5<d  rM %(+/25":=>#%'(*-.023689;>?AM U@&]+05`;>$&),/147:=@:t  ,b$47KL' `W1`%@ 0(  B S  ?(  H  C A ?j:N42333333333.4G4H4a4t4444l999:22o3u3334$4]4a4o4s4444444l999:KPNCR6C:\WINNT\Profiles\etopudo\Desktop\dba-wave\mystery.doc Chris LawsonA:\mystery.doc Chris Lawson(C:\TEMP\AutoRecovery save of mystery.asd Chris Lawson(C:\TEMP\AutoRecovery save of mystery.asd AdministratorA:\mystery.doc Administrator(C:\TEMP\AutoRecovery save of mystery.asd Administrator(C:\TEMP\AutoRecovery save of mystery.asd Chris Lawson(C:\TEMP\AutoRecovery save of mystery.asd Chris Lawson(C:\TEMP\AutoRecovery save of mystery.asd Chris Lawson(C:\TEMP\AutoRecovery save of mystery.asd@}}dLL}}:@@GTimes New Roman5Symbol3& Arial?5 Courier New5& Tahoma"qhc&**f*sf,^\20d6G TEN MYSTERIOUS DATABASE PROBLEMSchris Chris Lawson