I explained in a previous post how you can use the Microsoft Log Parser to dice up Autonomy IDOL search logs. If you’ve exhausted the typical checks for performance problems in your IDOL installation, it might help to narrow down when the problems occur, and look for cluster periods of slow performance. That’s a great opportunity to use the Log Parser again.
The first step is to level the playing field on timing information; the content GRL and the DAH GRL show duration information in milliseconds mixed with seconds. I’m sure there is a clever way to correct that in-stream, but I took the brute-force approach: create separate files from first the rows with seconds, then those with milliseconds, and finally produce a single file from the results.
logparser -i:xml -o:csv “select *, mul(to_real(extract_prefix([autn:duration], 0, ‘ s’)), 1000) as milliseconds into over_1_second.csv from http://server:port/?action=grl&format=xml&tail=10000 where [autn:duration] like ‘% s'”
Then the rows under 1 second:
logparser -i:xml -o:csv “select *, to_real(extract_prefix([autn:duration], 0, ‘ ms’)) as milliseconds into under_1_second.csv from http://server:port/?action=grl&format=xml&tail=10000 where [autn:duration] like ‘% ms'”
Then merge the two files into a single file:
logparser -i:csv -o:csv “select milliseconds, [autn:time], [autn:thread], [autn:status], [autn:action], [autn:request], [autn:client] into merged.csv from *.csv”
These three basic steps serve as the basis for most log analysis I do, so I’ve added them into a script. The result, merge.csv is a flattened file that contains the data we need. If you are going to script this, don’t forget to escape the percents, i.e. like ‘%% s’.
Next, we run a quant operation on the logs. I’ve found that a half-hour period makes for a good range to view the average query performance:
logparser -i:csv -o:csv “select quantize(to_timestamp([autn:time], ‘dd MMM yy hh:mm:ss’), 1800) as period, avg(milliseconds) from merged.csv group by period order by period”
The 1800 constant there is seconds, i.e. half an hour. The result is a list, here’s a short snippet:
This report shows clearly that around 11pm, we see a sharp decline in performance. You might also want to add a count(*) clause to the query to highlight the system activity.