On a recent client project I’ve been trying to use Windows’ built in performance monitoring tools to monitor machine health. You can programmatically (or manually using the system provided Performance Monitor tool) set up a data collector that will sample aspects of system performance (such as CPU idle time, disk throughput and space, and a whole lot more) at a specified interval, which you can then have logged to a file in a selection of formats. Whilst on one hand, it’s a pretty nice feature, I’ve spotted some oddities around how it handles timestamps in those files, and I thought I’d write those up here as I failed to find anywhere else warning of these issues. The following is by large taken from a stack overflow post I made hoping someone might correct me; if you know better please do let me know.
If we look at some sample data from a performance monitor file, it might look like this:
"(PDH-CSV 4.0) (GMT Standard Time)(-60)","\\MACHINE-NAME\% C2 Time" "10/29/2017 01:59:44.562","88.359340674585567" "10/29/2017 01:59:59.562","93.754278507632279"
Here’ I’m jut monitoring one metric, and the system inserts a timestamp next to it, and the header contains information about the TZ of the machine and the offset from UTC in minutes. However, I noticed whilst reviewing some data captured over the last few months that when the UK moved from GMT to BST that there was just a gap in the data for an hour, and upon some fiddling with my machine’s clock I managed to show that going the other way there was duplicate timestamps for an hour. This is not good: if we look at the longer version of the above data:
"(PDH-CSV 4.0) (GMT Standard Time)(-60)","\\MACHINE-NAME\% C2 Time" "10/29/2017 01:59:44.562","88.359340674585567" "10/29/2017 01:59:59.562","93.754278507632279" "10/29/2017 01:00:14.562","89.834673388599242" "10/29/2017 01:00:29.563","94.014449309223536"
Because the TZ offset is only stored in the column header, it has no way to indicate the fact that the local timezone offset changed during the recording of this file, we just see a second set of data for that hour. This makes trying to reason about this data during a daylight savings change very hard.
I thought perhaps this might be a limitation of the CSV data format, so I tried the binary format. The binary output of Performance Monitor is not documented, but there are PowerShell bindings to let you query the data. So I had a look at the same data in binary format:
# $counters = Import-Counter -Path mysamples.blg # $counters.Timestamp 29 October 2017 01:59:59 # $counters.Timestamp 29 October 2017 01:00:14 # $counters.Timestamp.IsDaylightSavingTime() False # $counters.Timestamp.IsDaylightSavingTime() False # $counters.Timestamp.ToFileTimeUtc() 131537159995620000 # $counters.Timestamp.ToFileTimeUtc() 131537124145620000
Again, not only are the timestamps not timezone aware, it’ll happily tell you that time goes backwards at one point. I’ve had a look through the API documentation for setting these up programatically, and had a play with the UI and I can’t see a way to correct this.
I had a look with how the Windows tools cope with graphing data around these incidents, and the answer is they don’t. The either show a missing hour in the graph, or they’ll squash the “duplicate” hour into a single sample that is averaged.
Timezone information is hard, I’m struggling to see why Microsoft didn’t store the timezone in UTC per sample and let the reading program deal with any view related TZ offsets. As it is this makes it quite hard for tools that average data to reliably work over the transitions, particularly if you have to account for things like the machine being intermittently up. You can try and work out the local daylight saving time for the given timezone, but that’s helpfully written down in human readable form that you’ll have to translate yourself.
If I’m missing a trick here that makes this all go away, let me know, either by contacting me directly or better yet on the stack overflow question I posted. But given Microsoft’s own tools for processing the data don’t deal with this scenario I’m not that hopeful. Otherwise hopefully this post will at least save others the head scratching I did trying to work out what was going on in my data.