Much has been written about the free Microsoft Log Parser, a simple command-line tool that can access and parse log files from a number of sources, execute SQL-like queries against that data, and present results. Did I mention it’s free?
Autonomy services will drop log files everywhere (literally all over the place), in different formats, and the challenge is to merge all that log data into a single store in order to get a handle on the big picture. For instance, a single query against the IDOL server shows up in several logs:
- the GetRequestLog
- content_index.log
- possibly the OGS query log (if you have securityinfo)
- possibly a DAH log (if you’re distributing/mirroring)
How can you aggregate all that information to get a single picture for performance analysis and forensics? And what about folding in other trace information, like application trace logs and IIS logs? Use the Log Parser. The first example I’ll give here is a simple query against the GetRequestLog (GRL), which should provide a view of the current queries that the IDOL is servicing. I am using the latest version of the Log Parser (2.2, from Jan 2005). Download and install it, then either copy the executable to your System32 directory or simply add the install directory to your path, and run the following from a command-line:
That opens a pretty little window for you to scroll through. You can modify the URL with “&tail=[somenumber]” to return a different count of rows (the default is 100). There are a couple of parameters for the output type (DATAGRID). One is autoScroll, which is on by default and scrolls the window whenever new data shows up. It does not work with URLs, though, so you will have to re-run the command-line to get an update.
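If you end up re-running the query with different depths, it can help to build the URL in one place. Here is a minimal Python sketch of that idea. The helper name `grl_url` and the port 9000 in the usage note are my own placeholders, not anything Autonomy or Log Parser provides; only the `action`, `format`, and `tail` parameters come from the URLs shown in this post.

```python
from urllib.parse import urlencode


def grl_url(host, port, tail=None, fmt="xml"):
    """Build a GetRequestLog URL of the form used in this post.

    `tail` is optional; when omitted, the server default (100 rows) applies.
    """
    params = {"action": "grl", "format": fmt}
    if tail is not None:
        params["tail"] = int(tail)
    # urlencode keeps the insertion order of the dict (Python 3.7+).
    return "http://{}:{}/?{}".format(host, port, urlencode(params))
```

For example, `grl_url("server", 9000, tail=500)` produces a URL ending in `action=grl&format=xml&tail=500`, which you can paste into the from clause of the query.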
Let’s look at a slightly more complicated query. I’m working with a client on query performance, and we’re studying why certain queries take longer than others. Most queries take under a second, but every once in a while, they take longer. With a simple query, we can look at exactly the information we need:
select
mul(to_real(extract_prefix([autn:duration], 0, ' s')), 1000),
[autn:request]
from
http://server:port/?action=grl&format=xml
where
[autn:duration] like '% s'
We limit this to rows with a duration in the format ‘1.62 s’, then turn the value into milliseconds. Removing the [autn:request] column from the select and wrapping the mul() operation in an AVG() gives you a handy number: the average time of the queries that took over a second. Make sure to sample a more meaningful depth by adding something like &tail=10000 to your URL.
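To make the arithmetic concrete, here is a rough Python sketch of what the query does to each duration value. It is only an illustration of the logic, not a replacement for the Log Parser query. The function names are mine, and the ‘350 ms’ value in the usage note is an assumed example of a duration that does not match the ‘% s’ filter.

```python
def duration_ms(text):
    """Convert a duration such as '1.62 s' to milliseconds.

    Mirrors the query: keep only values that end in ' s' (the LIKE '% s'
    filter), take the text before the first ' s' (extract_prefix with
    index 0), parse it as a real number and multiply by 1000.
    Returns None for values in any other format.
    """
    if not text.endswith(" s"):
        return None
    prefix = text.split(" s", 1)[0]
    return float(prefix) * 1000.0


def average_ms(durations):
    """Average the matching durations in milliseconds, like AVG(mul(...))."""
    values = [ms for ms in map(duration_ms, durations) if ms is not None]
    return sum(values) / len(values) if values else None
```

Calling `average_ms(["1.5 s", "2 s", "350 ms"])` skips the last value and averages 1500 and 2000 milliseconds, which is the same filtering and conversion the where clause and mul() perform.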
I’ll look at more complicated queries next.
Thanks so much for the information. I am starting to work on the same type of thing for our implementation. The one area I was wondering if you could elaborate on is using the where clause with dates. I am running into a 5000-item limit when I try to do a larger query. That being said, I need to set up a script that will run every couple of hours and then, at the end of the day, combine those files into one larger one for the day. I’m having an issue filtering on the autn:time tag.
Any additional information you can provide will be appreciated.
Thanks
Marc Weigum
EMC