The Case of the Unexplained is an ongoing series of webcasts by Mark Russinovich, based on real troubleshooting scenarios in Windows. Mark Russinovich is a Technical Fellow at Microsoft, which is the highest technical position at Microsoft. A Senior Architect in the Windows division, Mark was the co-founder and chief software architect of Winternals Software, co-author with David Solomon of Windows Internals (4th and 5th editions) and Inside Windows 2000 (3rd edition), author of the SysInternals tools on TechNet, and a contributing editor of TechNet Magazine and Windows IT Pro magazine. Mark has a Ph.D. in Computer Engineering.
Mark takes you through the troubleshooting process step by step and adds a dose of humor along the way. He explains the process simply, so you won't be overwhelmed by the complexity of operating system internals. I had already watched two other webcasts by Mark, Windows Hang and Crash Dump Analysis and Advanced Malware Cleaning. I just finished watching The Case of the Unexplained, 2010, to better understand how to use the SysInternals tools. In this webcast, in addition to Process Monitor, Mark explains how he used Process Explorer and AutoRuns to solve the cases. He also demonstrates ZoomIt, which will be the next SysInternals tool I try.
In case you haven't noticed, I am a huge fan of SysInternals. You can read my notes on the other webcasts I viewed at windows-hang-and-crash-dump-analysis and advanced-malware-cleaning. The following is my summary of, and notes from, the SysInternals webcast The Case of the Unexplained, 2010, by Mark Russinovich:
When software is written and tested, the programmer takes for granted that a particular field will be valid, that a file will be present, and that the code will follow certain paths. Because humans make mistakes, the programmer might not have accounted for every possible situation, and an unexpected path through the code can cause a variety of problems, including crashes, hangs, and misleading error messages.
Mark says it is worth spending even a few minutes trying to figure out what happened, because that often leads to a solution: there is a new version of a driver on the vendor's website, or there is an application hotfix. In some cases the answer is just a workaround, such as uninstalling something until a fix is available or flipping a registry key to disable some functionality. Troubleshooting is as much an art as it is a science. The more you do it, the better you get and the more intuition you build up about how to solve problems.
According to Mark, there are some fundamental techniques you need to master. You should understand how file system access works, how to interpret registry accesses and their results, how to interpret network activity, and how to read process trees to understand the relationships between processes. You also need to be able to dig into a process to find out what is going on inside it. Mark considers the number one troubleshooting skill to be "interpreting a call stack".
Sluggish Performance using Process Explorer. In the first example, Mark received a case where the computer was exhibiting sluggish performance. The tool used to resolve the sluggish performance was Process Explorer.
Process Explorer is a Task Manager replacement, Mark explains. The first obvious difference between Task Manager and Process Explorer is the process tree, which shows parent/child relationships. Every process has a parent. The System Idle Process is the root process, and the tree descends from it.
Processes that are left-justified have no parent shown in the tree. Mark says these processes are called orphans because their parent process has exited. Services.exe is the Service Control Manager (SCM), and every direct descendant of services.exe is a Windows service. Process Explorer highlights these in pink.
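To make the orphan rule concrete, here is a minimal sketch of how a process tree can be built from a snapshot of (PID, parent PID, name) records. It is my own illustration, not Process Explorer's code: the snapshot is hypothetical, and it ignores PID reuse, which a real tool has to account for (for example, by checking that a parent was created before its child).

```python
from dataclasses import dataclass

@dataclass
class Proc:
    pid: int
    ppid: int   # parent process ID recorded when the process was created
    name: str

def build_tree(procs):
    """Split a process snapshot into roots and a PID -> children map.

    A process becomes a root (drawn left-justified) when its parent PID is
    not in the snapshot, i.e. the parent has exited. The System Idle
    Process, whose parent PID is its own PID (0), is also a root.
    """
    by_pid = {p.pid: p for p in procs}
    children = {p.pid: [] for p in procs}
    roots = []
    for p in procs:
        if p.ppid != p.pid and p.ppid in by_pid:
            children[p.ppid].append(p)
        else:
            roots.append(p)
    return roots, children

def render(procs):
    """Return the tree as indented lines, one process per line."""
    roots, children = build_tree(procs)
    lines, seen = [], set()

    def walk(p, depth):
        if p.pid in seen:  # guard against cycles caused by PID reuse
            return
        seen.add(p.pid)
        lines.append("  " * depth + p.name)
        for child in children[p.pid]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines

# Hypothetical snapshot: the parents of wininit.exe and explorer.exe have exited.
SAMPLE = [
    Proc(0, 0, "System Idle Process"),
    Proc(4, 0, "System"),
    Proc(300, 4, "smss.exe"),
    Proc(500, 412, "wininit.exe"),
    Proc(600, 500, "services.exe"),
    Proc(700, 600, "svchost.exe"),
    Proc(2000, 1900, "explorer.exe"),
]

if __name__ == "__main__":
    print("\n".join(render(SAMPLE)))
```

In the output, wininit.exe and explorer.exe start at the left margin just like the System Idle Process, because the processes that started them are no longer running.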
Processes highlighted in blue are running in the same account as Process Explorer. You can get more detail about a process by opening its properties. If Verify Image Signatures is turned on, Process Explorer checks whether the process image carries a valid digital signature and shows the verified publisher, such as Microsoft.
You can add a comment to a process: right-click the process, choose Properties, and fill in the Comment field. This is especially useful on mission-critical servers, because you can add a comment to every single process identifying what it is. Later, if a process doesn't have a comment, you know you haven't looked at it yet and need to determine what it is doing.
The Performance tab in the process properties shows real-time performance for that process. Right-click the process and choose Properties to see it. The Performance Graph tab is like Task Manager's system-wide performance graph, except that it focuses on this one process. There is also a TCP/IP tab, a Security tab that shows security information for the process's user account, including the SID, and an Environment tab that lists the process's environment variables. The Strings tab is useful for hunting malware. Malware is often packed (compressed or encrypted), which hides its strings in the file on disk. The Strings tab can show the strings in memory as well as in the image, and those strings can give you clues about whether the process is malicious.
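The idea behind a strings view can be sketched in a few lines: scan raw bytes for runs of printable characters, both as plain ASCII and as UTF-16LE ("wide") text, which Windows uses heavily. This is a simplified illustration rather than the actual SysInternals implementation. The minimum length of 3 is an assumption you can tune, and the sample bytes are made up.

```python
import re

def ascii_strings(data, min_len=3):
    """Runs of at least min_len printable ASCII bytes (0x20-0x7E)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

def utf16_strings(data, min_len=3):
    """Runs of at least min_len printable characters stored as UTF-16LE,
    the 'wide' string format used throughout Windows."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]

if __name__ == "__main__":
    # Made-up bytes standing in for a chunk of an image or of process memory.
    blob = b"\x00\x01MZ\x90\x00This program\x00\xffhttp://example.invalid/x\x00ab\x00"
    print(ascii_strings(blob))
    print(utf16_strings(b"\xff" + "cmd.exe".encode("utf-16-le") + b"\xff\xff"))
```

Running both scans against a packed file on disk and then against the same process's memory is what makes the in-memory view useful: strings that were compressed or encrypted in the file often appear only after the code has unpacked itself.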
Across the top of Process Explorer is the Options menu. Mark says the options he always enables on his system are Hide When Minimized, Allow Only One Instance, Confirm Kill, and Verify Image Signatures. You can also configure column sets and add many additional columns from the View menu.
Mark says he normally runs Process Explorer in the system tray so that he can see CPU usage at a glance. If he sees solid green for an extended period, he knows something is burning his CPU. Even if it is not causing sluggish performance, on a laptop it is burning his battery.
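"Solid green for an extended period" is essentially a rule about consecutive CPU samples. Here is a minimal sketch of that rule, assuming you already have CPU-usage percentages sampled at a fixed interval. The threshold of 90% and the run length of 5 samples are arbitrary, hypothetical defaults.

```python
def busy_runs(samples, threshold=90.0, min_run=5):
    """Find stretches where CPU usage stayed at or above threshold (percent)
    for at least min_run consecutive samples.

    Returns a list of (start_index, length) tuples.
    """
    runs, start = [], None
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = i          # a busy stretch begins
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None           # the stretch (if any) has ended
    # A stretch that lasts until the final sample is still open here.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs

if __name__ == "__main__":
    samples = [10, 95, 96, 20, 99, 98, 97, 99, 100, 15, 92, 93, 94, 95, 96, 97]
    print(busy_runs(samples))
```

The short spike at the start is ignored, while the two sustained stretches are reported. On a multi-core machine, a single runaway thread that saturates one core shows up as only about 100/cores percent of total CPU (25% on four cores), so a fixed 90% threshold would miss it. A threshold scaled to one core's share is one way to handle that.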
In a multi-core environment, it's much harder to tell when something is burning the CPU, because you don't necessarily feel it.
Back to the case Mark is working on. In this situation, the user has a single CPU, so it's easy to tell something is wrong. He noticed sluggish behavior, opened Process Explorer, and saw what looked like the problem: Wmiprvse.exe, the WMI Provider Host process. A hosting process is one that hosts other components.
Mark says when you see a host process and it looks like the problem, it typically is not the problem. It is something the host process has loaded because the host process is just a container for the most part. To determine what is really causing the problem, you need to turn to threads.
A process is a container. It has an address space associated with it, a security context, a handle table associated with it, which the operating system uses to keep track of open operating system resources. It's got a number of performance characteristics or attributes that you can look at in Perfmon or Process Explorer. Processes don't actually run. It's the thread in the process that runs.
Each process has at least one thread, and typically more than one. The threads are what receive CPU time and execute the code in the process. One bad thread can affect all the other threads in its process, but the process is a boundary: one bad process does not normally cause problems for another process.
You look inside a process by looking at its threads. If you hover your cursor over an svchost process, the tooltip shows the services running inside that svchost. If you hover over the Internet Explorer process, it shows the tabs open inside Internet Explorer. Before looking at the threads running inside a process, configure symbols so that Process Explorer can show function names instead of raw addresses: go to Options, Configure Symbols, and point it at the public symbol server. This is documented in the Debugging Tools for Windows help. (See my Windows Hang and Crash Dump Analysis review for how I configured symbols to follow along with Mark's WinDbg demos.)
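The symbol path that goes into the Configure Symbols dialog (and into WinDbg) uses the form srv*local cache*symbol server. The helper below only builds that string. The local cache folder C:\symbols is an assumption, so use any writable folder. Note that Process Explorer typically also needs the dbghelp.dll path in the same dialog to point at the copy installed with Debugging Tools for Windows before it can download symbols.

```python
def symbol_path(cache_dir=r"C:\symbols",
                server="https://msdl.microsoft.com/download/symbols"):
    """Build a symbol path in the srv*<local cache>*<symbol server> form
    understood by dbghelp-based tools such as WinDbg and Process Explorer."""
    return "srv*{}*{}".format(cache_dir, server)

if __name__ == "__main__":
    # Prints: srv*C:\symbols*https://msdl.microsoft.com/download/symbols
    print(symbol_path())
```

Symbols downloaded once are kept in the local cache, so later lookups do not have to hit the server again.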
Now, look at which thread is consuming the most CPU. In Mark's example, that thread's start address is kernel32.dll!CreateThread+0x27. This is a generic start address that doesn't identify the component that created the thread, so he needs to dig even deeper into this process.
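The two steps just described are (1) picking the hottest thread and (2) recognizing that its start address is only a generic thread-creation wrapper. A minimal sketch of both follows. The thread records are hypothetical, and GENERIC_STARTS is an illustrative, not exhaustive, list that I chose. The parser assumes the module!symbol+0xoffset format shown above, with the !symbol part missing when no symbol resolves.

```python
import re

# module!symbol+0xoffset, where "!symbol" is missing when no symbol resolves
START_ADDRESS = re.compile(
    r"^(?P<module>[^!+]+)(?:!(?P<symbol>[^+]+))?(?:\+0x(?P<offset>[0-9a-fA-F]+))?$")

# Illustrative (not exhaustive) thread-creation wrappers whose start address
# says nothing about which component the thread actually belongs to.
GENERIC_STARTS = {
    ("kernel32.dll", "CreateThread"),
    ("ntdll.dll", "RtlUserThreadStart"),
}

def parse_start(address):
    """Split 'kernel32.dll!CreateThread+0x27' into (module, symbol, offset)."""
    m = START_ADDRESS.match(address)
    if m is None:
        raise ValueError("unrecognized start address: " + address)
    offset = int(m.group("offset"), 16) if m.group("offset") else 0
    return m.group("module"), m.group("symbol"), offset

def hottest_thread(threads):
    """Return the thread record with the highest CPU percentage."""
    return max(threads, key=lambda t: t["cpu"])

def is_generic(address):
    """True if the start address is a known generic wrapper."""
    module, symbol, _ = parse_start(address)
    return (module.lower(), symbol) in GENERIC_STARTS

if __name__ == "__main__":
    threads = [  # hypothetical thread list for wmiprvse.exe
        {"tid": 1204, "start": "wmiprvse.exe+0x1a2b", "cpu": 0.5},
        {"tid": 1388, "start": "kernel32.dll!CreateThread+0x27", "cpu": 97.0},
    ]
    hot = hottest_thread(threads)
    print(hot["tid"], parse_start(hot["start"]), is_generic(hot["start"]))
```

When is_generic returns True, the start address is a dead end, and the thread's stack is the next place to look.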
The way to dig deeper is to look at the thread's stack. Programmers divide their code into functions, and those functions are implemented in different DLLs. When one function calls another, it normally passes the called function some parameters. When the called function finishes its processing, it needs to get back to where it was called from, so the return address is stored on the thread's stack. When the called function is ready to return, it uses that return address to continue execution in the caller. Because every call leaves a return address behind, you can look at a stack and see the history of how functions called each other. Select the thread and click Stack to see the thread's stack, then look for a suspicious DLL. Normally a third-party DLL will be the problem. In this case, the DLL is AssetAdvisor. Googling AssetAdvisor.DLL turns up a hotfix for it. Problem solved!
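Mark's rule of thumb, "normally a third-party DLL will be the problem," can be sketched as a scan of the stack frames for modules that are not on a list of known system modules. This is a heuristic illustration only. KNOWN_SYSTEM_MODULES is a hypothetical allow-list, whereas a real check would verify each module's digital signature. The example stack is made up to resemble the AssetAdvisor case.

```python
# Hypothetical allow-list; a real check would verify each module's signature.
KNOWN_SYSTEM_MODULES = {
    "ntoskrnl.exe", "ntdll.dll", "kernel32.dll", "kernelbase.dll",
    "user32.dll", "advapi32.dll", "ole32.dll", "rpcrt4.dll", "wmiprvse.exe",
}

def module_name(frame):
    """'AssetAdvisor.dll+0x4f10' or 'ntdll.dll!Func+0x12' -> module part."""
    return frame.split("!", 1)[0].split("+", 1)[0]

def suspect_modules(frames, known=KNOWN_SYSTEM_MODULES):
    """Modules on the stack that are not on the known list, in the order they
    first appear (frames are listed most recent call first)."""
    suspects, seen = [], set()
    for frame in frames:
        name = module_name(frame)
        key = name.lower()
        if key in known or key in seen:
            continue
        seen.add(key)
        suspects.append(name)
    return suspects

if __name__ == "__main__":
    stack = [  # made-up stack, most recent call first
        "ntdll.dll!RtlEnterCriticalSection+0x1d",
        "AssetAdvisor.dll+0x4f10",
        "AssetAdvisor.dll+0x3c02",
        "kernel32.dll!CreateThread+0x27",
    ]
    print(suspect_modules(stack))
```

The first non-system module near the top of the stack is usually the best candidate to search the web for.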
Next case. The user notices the system is sluggish, runs Process Explorer, and sees the System process consuming CPU. He looks at the System process's threads and sees that ACLKWDM.SYS is responsible. He double-clicks the thread and sees that it is a Realtek AC'97 Audio driver. He goes to Realtek's website and downloads a new version of the driver. Problem solved!
Application Hangs and Process Monitor. Process Monitor is the tool you use for application hangs. Mark says Process Monitor is much more powerful than Filemon and Regmon, the tools it replaced. It adds more advanced filtering, operation call stacks, boot-time logging, data mining views, and a process tree that shows even short-lived processes. Each event shows the time of day, the process name (with a tooltip showing the full path), the process ID, the operation, the path the operation targeted, the result of the operation, and details of the operation. You can double-click an event to see all of its information, and you can add more columns. You can filter the display: right-click an event for a quick filter, or build complex filters. Each filter rule has a check box, so you can disable a rule and re-enable it later instead of deleting it and having to add it back. Mark says the most powerful Process Monitor feature he uses is the process tree. It's just like Process Explorer's process tree. If you are interested in one process, select it and click Include Process, and Process Monitor sets a filter for it.
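Here is a simplified model of how include/exclude rules with enable check boxes can decide whether an event is displayed. The combining logic is an assumption based on my understanding of the SysInternals documentation: an exclude match hides the event, include rules on the same column are OR'ed, and rules on different columns are AND'ed. Case-insensitive matching is also an assumption. This sketch does not reproduce Process Monitor's code, and the events and paths are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    column: str            # e.g. "Process Name", "Operation", "Path"
    relation: str          # "is", "contains" or "begins with"
    value: str
    action: str            # "include" or "exclude"
    enabled: bool = True   # unchecked rules stay in the list but are ignored

def matches(rule, event):
    """Case-insensitive comparison of one event column against a rule."""
    field = str(event.get(rule.column, "")).lower()
    value = rule.value.lower()
    if rule.relation == "is":
        return field == value
    if rule.relation == "contains":
        return value in field
    if rule.relation == "begins with":
        return field.startswith(value)
    raise ValueError("unsupported relation: " + rule.relation)

def visible(event, rules):
    """Any matching exclude hides the event; include rules on the same column
    are OR'ed, and include rules on different columns are AND'ed."""
    active = [r for r in rules if r.enabled]
    if any(r.action == "exclude" and matches(r, event) for r in active):
        return False
    includes = {}
    for r in active:
        if r.action == "include":
            includes.setdefault(r.column, []).append(r)
    return all(any(matches(r, event) for r in group)
               for group in includes.values())

if __name__ == "__main__":
    rules = [
        Rule("Process Name", "is", "app.exe", "include"),
        Rule("Path", "contains", "\\Prefetch\\", "exclude"),
        Rule("Operation", "is", "RegQueryValue", "exclude", enabled=False),
    ]
    event = {"Process Name": "App.exe", "Operation": "RegQueryValue",
             "Path": "HKLM\\Software\\App"}
    print(visible(event, rules))  # the disabled exclude rule has no effect
```

Keeping a disabled rule in the list, rather than deleting it, is what lets you switch it back on later without retyping it.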
Process Monitor and The Case of the Slow Signed Application Start. A user has an application that starts quickly until he digitally signs it. Mark asked the user to capture a Process Monitor trace. He set a filter for the application and found a pattern of gaps in the timestamps. Mark then cleared the filter to look at what else was going on around those gaps. It turned out the delay was caused by a Certificate Revocation List (CRL) check. There was a hotfix for it. Problem solved!
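Spotting "a pattern of gaps in the time stamps" amounts to comparing each event's timestamp with the previous one. The sketch below reports every gap at or above a threshold. It assumes the trace has already been exported into (timestamp, description) pairs. The 1-second threshold and the sample events are hypothetical and do not come from Mark's trace.

```python
from datetime import datetime, timedelta

def find_gaps(events, min_gap=timedelta(seconds=1)):
    """events: (timestamp, description) pairs. Returns (before, after, gap)
    for every pair of consecutive events separated by at least min_gap."""
    ordered = sorted(events, key=lambda e: e[0])
    gaps = []
    for (t0, before), (t1, after) in zip(ordered, ordered[1:]):
        if t1 - t0 >= min_gap:
            gaps.append((before, after, t1 - t0))
    return gaps

# Hypothetical events from a trace filtered to one application.
BASE = datetime(2010, 1, 1, 15, 0, 0)
TRACE = [
    (BASE, "CreateFile app.exe"),
    (BASE + timedelta(milliseconds=20), "ReadFile app.exe"),
    (BASE + timedelta(seconds=15, milliseconds=20), "RegOpenKey"),
    (BASE + timedelta(seconds=15, milliseconds=40), "CloseFile"),
]

if __name__ == "__main__":
    for before, after, gap in find_gaps(TRACE):
        print(f"{gap.total_seconds():.3f}s between {before!r} and {after!r}")
```

Each reported gap gives you a time window. Clearing the filter and looking at what other processes did inside that window is the step that pointed Mark at the CRL check.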
Process Monitor and The Case of the SQL Failed Reporting Services Attachment. A customer is unable to attach an image to an email he is sending from SQL Server Reporting Services. He ends up resolving the problem by comparing Process Monitor logs from a working system with logs from a failing system.
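A working-versus-failing comparison can be partly automated once both traces are exported. The sketch below flags operations on the same path whose Result in the failing trace never occurs in the working trace. The article does not say what difference the customer actually found, so the sample events are purely hypothetical. Real traces often need extra path normalization (for example, different user profile folders) before they line up.

```python
def _key(event):
    # Paths are compared case-insensitively, as Windows paths are.
    return (event["Operation"], event["Path"].lower())

def result_differences(good, bad):
    """Operations on the same path whose Result in the failing trace never
    appears for that operation and path in the working trace."""
    good_results = {}
    for e in good:
        good_results.setdefault(_key(e), set()).add(e["Result"])
    diffs, reported = [], set()
    for e in bad:
        k = _key(e)
        if k in good_results and e["Result"] not in good_results[k]:
            if (k, e["Result"]) not in reported:   # report each difference once
                reported.add((k, e["Result"]))
                diffs.append((e["Operation"], e["Path"],
                              sorted(good_results[k]), e["Result"]))
    return diffs

if __name__ == "__main__":
    good = [  # hypothetical events from the working system
        {"Operation": "CreateFile", "Path": r"C:\Temp\img.png", "Result": "SUCCESS"},
        {"Operation": "RegOpenKey", "Path": r"HKLM\Software\X", "Result": "SUCCESS"},
    ]
    bad = [  # hypothetical events from the failing system
        {"Operation": "CreateFile", "Path": r"C:\Temp\img.png", "Result": "ACCESS DENIED"},
        {"Operation": "RegOpenKey", "Path": r"HKLM\Software\X", "Result": "SUCCESS"},
    ]
    print(result_differences(good, bad))
```

Events whose result differs between the two systems are usually the first place to look.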
Process Monitor, AutoRuns, and The Case of the Blocked HTTP Port. A user complains he is unable to browse the web: Internet Explorer gives him a connection error. He had just migrated machines between domains. He starts troubleshooting. After deleting the IE cache, he checks the DNS, gateway, and IP settings and tests other outbound ports. The problem appears to affect only HTTP. He captures a trace of the system with Process Monitor and does not see any unusual third-party activity, so he starts looking at thread stacks, which you do by double-clicking an event and opening its Stack tab. He sees a third-party driver in the stack. It turns out he had previously uninstalled something that left this driver behind. He turns to Autoruns from Sysinternals, turns on the filter that hides Microsoft-signed entries, unchecks the driver so that it will not load, and restarts the system. Mark stresses to always do no harm: it's better to uncheck an entry than to delete it. See my review of Autoruns, autoruns-another-awesome-freee-utility, from SysInternals.
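The two Autoruns habits in this case, hiding entries with a verified Microsoft signature and disabling rather than deleting, can be modeled in a few lines. This is a toy model of the workflow, not Autoruns' code or data format. The entry names, including oldfilter.sys, are hypothetical. Matching only verified signatures is the important detail, because an entry that merely claims to be from Microsoft stays visible.

```python
from dataclasses import dataclass

@dataclass
class AutorunEntry:
    name: str
    publisher: str
    verified: bool       # signature checked and valid
    enabled: bool = True

def hide_microsoft(entries):
    """Keep everything except entries with a verified Microsoft signature.
    An unverified entry that merely claims to be from Microsoft stays visible."""
    return [e for e in entries
            if not (e.verified and e.publisher.startswith("Microsoft"))]

def disable(entry):
    """'Do no harm': turn the entry off instead of deleting it, so the
    change can be undone by re-enabling it."""
    entry.enabled = False
    return entry

if __name__ == "__main__":
    entries = [  # hypothetical driver entries
        AutorunEntry("tcpip.sys", "Microsoft Corporation", True),
        AutorunEntry("oldfilter.sys", "Example Vendor", False),
        AutorunEntry("fake.sys", "Microsoft Corporation", False),
    ]
    shown = hide_microsoft(entries)
    print([e.name for e in shown])
    disable(shown[0])
    print(len(entries), shown[0].enabled)
```

After the call to disable, the entry is still in the list and can be switched back on if disabling it turns out to be the wrong call.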
Process Monitor and the Application Crash. Mark says that in most cases there is nothing you can do about an application crash: it is buggy software that just crashes. Still, take a look at the process with Process Monitor, because a third-party plug-in might be causing the application to crash. You can also do a crash dump analysis: find the crash dump file and use WinDbg to analyze it. See my review of Mark's Crash Dump Analysis, windows-hang-and-crash-dump-analysis, where I create and test a crash dump using WinDbg, or check out Mark's webcast Windows Hang and Crash Dump Analysis, where he describes how to analyze crash dumps and blue screens.