You are probably all familiar with the debugger DataTips that let you see the value of a variable when you hover over it in the editor. This post introduces the new PerfTips feature in Visual Studio for C#/VB, C++, and JavaScript developers on all Windows platforms. We have just released an early preview of this new feature that you can try out today by downloading the latest CTP of Visual Studio “14”. If you do that, be sure to give us your feedback by signing up to receive a short PerfTips survey.
Why PerfTips?
We have been releasing new and improved performance tools in the Performance and Diagnostics hub in Visual Studio 2013 that help you diagnose performance issues in your app. However, for many of you, the majority of your time is dedicated to getting the app to function correctly (as opposed to focusing on the performance of your app). You would like to understand the performance of the code you write, but you might not have the time to learn and run the performance tools until performance becomes a big problem (which is sometimes too late). Furthermore, when you do want to investigate performance problems, you often just need a simple stopwatch measurement of how long it takes to execute isolated sections of code rather than the detailed top-down analysis provided by the hub tools.
We know that instead of running profiling tools, many developers do one of the following things:
- Insert code into their app (such as System.Diagnostics.Stopwatch) to measure how long it takes to run between various points in code, iteratively adding more stopwatches as needed to narrow down the hot path
- Step through code to see if any particular step “feels slow”
- Hit the Break All (“pause”) button at random points to get a feel of how far execution has progressed through their code, e.g. a loop. In some circles this is referred to as “poor man’s sampling”
- Over-optimize code without measuring performance at all, sometimes by applying a set of performance best practices (e.g. don’t use LINQ) across an entire code base
These practices are typically not very accurate, not a good use of time, or both. That’s why we introduced PerfTips: to help you to understand the performance of your app during normal debugging. Anyone that uses the debugger can use PerfTips to do a quick measurement on any section of code and find unexpected performance issues simply by setting breakpoints and/or by stepping through code with the debugger. With PerfTips you don’t need to run a separate set of tools or insert stopwatches into your code to take a simple performance measurement.
What are PerfTips?
As you step over lines of code or run from breakpoint to breakpoint, the debugger will display a PerfTip (tooltips with performance information) in the editor at the end of the code line indicating how long the program was running during the previous step or since the last breakpoint. All you need to do is look at it!
More specifically, any time that you are stopped on a line of code in the debugger, the debugger will show a PerfTip on that line of code if there are any functions on the call stack that were also on the call stack when the program was last started or resumed. For example, if you are stopped in Button1_Click(), press F5 and hit a breakpoint in a child function called by Button1_Click(), you will see a PerfTip. However if you are stopped in Button1_Click(), press F5, then click Button2 and hit a breakpoint in Button2_Click() (which is not on the same call stack as Button1_Click) you will not see a PerfTip.
You can even hover your mouse over the PerfTip to see the CPU time of the app while the program was running. Notice how there are two numbers in the tooltip, the elapsed time that you see on the PerfTip itself, plus the CPU time.
In most cases the CPU time will be less than the elapsed time because the execution measured will include non-CPU intensive activity such as I/O. The CPU time can also be greater than the elapsed time if multiple threads and cores are being used concurrently by your app. This is because CPU time is counted as the total amount of time each individual core on the CPU was executing code.
Notice in the screenshots above, that the PerfTip is shown after the line of code that will be executed next. We decided that, while potentially confusing at first, this was the best place for PerfTips so that the location would be the same for both the single line and the multiple line cases (as shown in the next section).
You can customize PerfTips by going to Tools -> Options -> Debugging -> Performance Tools, or you can go directly to that options page by clicking on the PerfTip and selecting “Performance Tips Options…”.
From here you can set options so that PerfTips only show up when the values exceed a certain threshold, and you can also enable the CPU time PerfTip to see the CPU time at-a-glance without having to hover on the PerfTip – see the following screenshot of what that looks like.
Keep on reading to see how to use PerfTips in action!
PerfTips in Action
To show you an example of how it works, we will take a look at improving the startup performance of a C#/XAML Windows Store app. This app loads and displays pictures from the local pictures library and from the cloud, and we want to see how much time it takes to load images while the app is starting up.
Instead of inserting stopwatch code into our app, we can measure how long loading images takes by setting breakpoints where we would have started and stopped the stopwatch. In this case, we have set breakpoints at the start and the end of the LoadImages method.
To take our first measurement with PerfTips we start debugging and wait for the first breakpoint to be hit (on line 67). Then we press F5 or click Continue to run to the second breakpoint and see the PerfTips for this section of code:
Here we are at the second breakpoint (on line 80), and the PerfTip is displayed. In this case we have configured PerfTips so that both elapsed time and CPU time are shown:
From the information presented, we see that the app was running for 2,780ms between these two breakpoints and during that time 985ms of CPU was used on all CPU cores. It makes sense that the CPU time is much less than the elapsed time because the app is not going to use much CPU while it is waiting on network or disk I/O.
Side TIP: Instead of using breakpoints, we could have also taken a similar measurement by using Run To Cursor (Ctrl+F10, or right-click -> Run To Cursor) first with the cursor on the first line of the function and then again with the cursor on the last line of the function.
Now that we know how long this particular code path takes to run, we can run the same scenario again and after the first breakpoint is hit step over each line to see which lines are taking the most time. For those who are stepping over code to measure performance, you can now do the same thing and see more accurate measurements for each step. As we step over the first few lines we see that they take less than 20ms each, but when we step over GetImagesFromCloud (line 72) it takes 1,391ms to run and uses 266ms of CPU time. Here we are before stepping:
… and then after stepping over GetImagesFromCloud (line 72):
Then we step over LoadImagesFromDisk (line 73) and see that it takes 1,361ms to run and uses 563ms of CPU time:
Finally we step over the LINQ query (lines 75 and 76) and see that it takes 259ms and uses 156ms of CPU:
As we step over the remaining lines they each take less than 10ms to complete.
By stepping through the code we were able to determine that getting images from the cloud and loading images from the disk (statements on lines 72 and 73) both take about the same time and use relatively little CPU. We can likely improve performance by doing both concurrently, especially since these two operations will not be competing for CPU resources. Instead of waiting on both sequentially, we change the code to kick off both tasks at once and then await on both to complete. Running between the same two breakpoints as before, we measure how long it takes to load images with this change.
After the change this section of code completes in 2,079ms which is about 700ms faster than the 2,780ms it took initially, a 25% improvement! This is an example of an optimization you could easily make while you are first writing and debugging this area of code.
If you are optimizing code without measuring at all, you can now take these kinds of quick measurements with PerfTips to make informed choices about where you spend your time. For example, before you may have considered improving the performance of the LINQ query by generating the collection of images manually, but now we know from the PerfTips that this line only takes around 250ms and we can expect to make image loading at most 10% faster by applying such an optimization.
Now that we’ve given you a tour of PerfTips, we’ll go over some best practices that you should keep in mind when using this feature.
Best Practice: Take Multiple Measurements
Performance naturally varies from run-to-run, and code paths typically execute slower the first time they are executed in a given run of the app due to one-time initialization work such as loading .dlls, JIT compiling methods, and initializing caches. Taking multiple measurements will give you an idea of the range and median of the metric being shown and will allow you to compare the first time vs. steady state performance of an area of code.
Best Practice: Verify in Release
If you want to make improvements on pieces of code that run for less than 50 milliseconds, or you are about to invest a lot of time in rewriting code to make it run faster, we recommend that you switch to the Release build of your app and use the tools that are available in the Performance and Diagnostics hub to measure the performance of your app without the debugger. Using the hub tools on a Release build without the debugger will more accurately measure the performance that the users of your app see, and provides a much more detailed breakdown of where time is being spent. If you are not familiar with the hub, you can check back here (or on the on the Diagnostics blog feed) in a few weeks for a follow-up blog post which will show you how to use the hub tools to measure the exact same data that you see in our earlier PerfTips example.
You can also change how you use the debugger in order to make PerfTips more accurate. When you are running with the debugger, there is some overhead included in the PerfTips when the debugger pauses and resumes the target process. While this overhead is normally small (a few milliseconds), the debugger may sometimes have to pause and resume the target process several times during a single step, making the overhead much larger. These pauses can also change the performance characteristics of code when it is executing. For example, cache performance on the CPU will be better when code is executing in a tight loop without breaking. To mitigate these effects you can measure larger sections of code by running between breakpoints as shown in the first example in this post.
In sections of code that are highly sensitive to compiler optimizations, the CPU time shown in debugging or when the debugger is attached can be significantly higher than what it is in release mode. This is because code optimizations such as inlining are not applied and extra validation such as array bounds checking are added in Debug builds. This allows you to have a better debugging experience, but with extra run-time overhead. You can mitigate this by switching to a Release build and debugging the optimized version of the code. If you are debugging managed code you will also need to uncheck “Suppress JIT optimization on module load” and “Enable Just My Code” in Tools -> Options -> Debugging. However, be warned that you will likely encounter breakpoints that fail to set and the yellow arrow moving unexpectedly when stepping. So, if you would like to verify the release performance of your app, as stated earlier, using the tools in the Performance and Diagnostics hub is the recommended approach.
Known Issues in Current VS “14” CTP Releases
We wanted to make this feature available for you to try as soon as possible. In order to do that we have released PerfTips in the early preview builds of Visual Studio “14” without waiting to fix all of the known (and unknown) issues. The following are the issues we know about:
- First-chance exceptions can add 10s-100s of milliseconds to PerfTips per exception. If you suspect this is the case, glance at the Output window to see if a first-chance exception occurred.
- Occasionally a PerfTip will not be shown if a large number of modules were loaded since the app was last resumed. Again, the Output window indicates when modules are loaded.
- When using the PerfTips on the Phone or on VMs, simple lines of code that should complete in well under 1ms show as taking 10s of milliseconds.
If you find any others, please take a moment of your time and let us know so we can fix them before the final RTM release! Read onto the next section to find out how to share your feedback with us.
We Want Your Feedback!
This is a new way of approaching performance analysis that is easy to use and is much closer to how you understand your code when you write it and debug it. We think you’re going to like PerfTips but we would like to make it as useful as we can to you. If you have feedback for us, you can immediately send us general comments in Send-a-smile, feature requests on UserVoice, and submit issues you run into on the Forums.
In addition to that, you can sign up now to receive a PerfTips survey and send us your in-depth evaluation of the feature.