Opinions sought on information presentation

Thu Apr 21 15:57:50 PDT 2011

On Wed, Apr 20, 2011 at 01:43:26PM -0700, Ramin K wrote:
> ...
> 	Have you looked at Collectd? It does quite a bit out of the box and 
> 	is pulling in the same stats you are though I do like the looks of your 
> graphs. What I enjoy about Collectd is that stats collection happens 
> locally and is periodically pushed upstream where it is eventually 
> flushed out to rrd files. Makes it easy to automate installation of 
> graphing in most environments. http://collectd.org/

No; one of my requirements was to be able to do this on an unmodified
FreeBSD system, and collectd is not part of the FreeBSD base system.

Further, while I have used RRDtool with earlier variations on this
theme, my (admittedly naive) approach of dumping all of the captured
data for a given machine into a single RRD proved singularly awkward.
RRD data source names have some peculiar restrictions ("A ds-name must
be 1 to 19 characters long in the characters [a-zA-Z0-9_]."), so I had
to map the "real" names (the names of the sysctl OIDs) to names that RRD
would accept, and that mapping adds another layer of obfuscation.
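For illustration only, here is a minimal sketch of such a mapping in
Python. It assumes nothing beyond the ds-name rule quoted above; the
function names and the truncate-then-number collision scheme are
hypothetical, not the approach actually used. Keeping the returned
table is what lets you undo the obfuscation when labelling graphs.

```python
from __future__ import annotations

import re

# RRDtool's documented rule: 1 to 19 characters from [a-zA-Z0-9_].
DS_NAME_MAX = 19
_INVALID = re.compile(r"[^a-zA-Z0-9_]")


def to_ds_name(oid: str, used: set[str]) -> str:
    """Map a sysctl OID name to a legal ds-name not already in `used`.

    Hypothetical scheme: replace illegal characters with "_", truncate
    to 19 characters, and append a counter if truncation collides.
    """
    base = _INVALID.sub("_", oid)[:DS_NAME_MAX] or "ds"
    name = base
    suffix = 1
    while name in used:
        tag = str(suffix)
        name = base[: DS_NAME_MAX - len(tag)] + tag
        suffix += 1
    used.add(name)
    return name


def build_mapping(oids: list[str]) -> dict[str, str]:
    """Return {oid: ds_name}; keep it to translate names back later."""
    used: set[str] = set()
    return {oid: to_ds_name(oid, used) for oid in oids}


if __name__ == "__main__":
    oids = [
        "vm.stats.vm.v_free_count",
        "vm.stats.vm.v_free_min",  # truncates to the same 19 characters
        "kern.cp_time",
    ]
    for oid, ds in build_mapping(oids).items():
        print(f"{oid:30} -> {ds}")
```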

In addition, one of the more important things I was doing with some of
the gathered data was a small amount of statistical analysis on the
metrics that matter most; in the cases in point, the elapsed time for
the workload to complete successfully. (Comparing the elapsed times of
incomplete or otherwise unsuccessful runs is worse than useless.)
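To make the "successful runs only" rule concrete, a minimal Python
sketch that discards failed runs before grouping elapsed times by
configuration and summarizing them. The `Run` record and its field
names are hypothetical; how success is actually determined depends on
the workload.

```python
from __future__ import annotations

import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    config: str       # hypothetical label for the configuration tested
    elapsed_s: float  # wall-clock time for the workload, in seconds
    succeeded: bool   # did the workload complete successfully?


def elapsed_by_config(runs: list[Run]) -> dict[str, list[float]]:
    """Group elapsed times by configuration, keeping successful runs only."""
    groups: dict[str, list[float]] = defaultdict(list)
    for run in runs:
        if run.succeeded:  # incomplete runs would distort the comparison
            groups[run.config].append(run.elapsed_s)
    return dict(groups)


def summarize(samples: list[float]) -> tuple[int, float, float]:
    """Return (n, mean, sample standard deviation) for one configuration."""
    if len(samples) < 2:
        raise ValueError("need at least two successful runs")
    return len(samples), statistics.mean(samples), statistics.stdev(samples)
```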

Using R to produce the graphs makes it very easy to generate (for
example) a set of boxplots showing the relationships between
configuration and performance (as measured by elapsed time).  Of course,
one needs to run the same workload in the same configuration enough
times to have any basis for a statistical treatment: an isolated
measurement may be useful as a reality check, but by itself it provides
no basis for judging statistical significance.
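What a boxplot draws for each configuration can be sketched in Python
as follows. `fivenum` mirrors the index arithmetic of R's fivenum()
(Tukey's hinges), and `box_stats` approximates what R's boxplot.stats()
reports with its default coef = 1.5; both are re-implementations for
illustration, not R's code. Applied per configuration, e.g.
`{cfg: box_stats(times) for cfg, times in groups.items()}`, the "n"
field also makes it obvious when a box rests on too few runs.

```python
from __future__ import annotations

import math


def fivenum(xs: list[float]) -> tuple[float, ...]:
    """Tukey's five-number summary, following R's fivenum():
    minimum, lower hinge, median, upper hinge, maximum."""
    x = sorted(xs)
    n = len(x)
    if n == 0:
        raise ValueError("no observations")
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based positions as in R; a fractional position averages neighbours.
    d = (1, n4, (n + 1) / 2, n + 1 - n4, n)
    return tuple(0.5 * (x[math.floor(p) - 1] + x[math.ceil(p) - 1]) for p in d)


def box_stats(xs: list[float], coef: float = 1.5) -> dict:
    """Whiskers reach the most extreme points within coef * (hinge spread)
    of the box; points beyond that are reported as outliers."""
    _, lower, median, upper, _ = fivenum(xs)
    reach = coef * (upper - lower)
    lo_fence, hi_fence = lower - reach, upper + reach
    inside = [v for v in xs if lo_fence <= v <= hi_fence]
    return {
        "n": len(xs),
        "whisker_low": min(inside),
        "lower_hinge": lower,
        "median": median,
        "upper_hinge": upper,
        "whisker_high": max(inside),
        "outliers": sorted(v for v in xs if v < lo_fence or v > hi_fence),
    }
```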

(I was also using phk's ministat(1) to advantage, but it is intended for
a relatively small number of distinct configurations being measured, and
really isn't designed to scale beyond that.)

> 	Here are a few graphs from one of my systems generated with the 
> included cgi script. I believe you can make nicer graphs from the source 
> rrd files, but I haven't had time to look at that yet.
> 
> http://badapple.net/images/cpu0.png
> http://badapple.net/images/cpu1.png
> http://badapple.net/images/load.png

Interesting.  One difference in intent between those and what I
provided earlier is that mine are intended to show (selected) resource
usage of the system only during the time the workload under test was
running.  While this tends to be less informative if the workload under
test is competing with significant other workloads, that was not the
situation in the tests that I was running.
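Restricting graphs to the run itself amounts to a time-window filter.
A minimal Python sketch, assuming each sample carries an epoch
timestamp and that the workload's start and end times were recorded;
the `Sample` record is hypothetical. Re-basing time to the start of the
run is a design choice that lets several runs be overlaid on one axis.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    t: float      # seconds since the epoch when the sample was taken
    metric: str   # hypothetical naming, e.g. a sysctl OID name
    value: float


def within_run(samples: list[Sample], start: float, end: float) -> list[Sample]:
    """Keep only samples taken while the workload under test was running,
    re-basing time so that t == 0 is the start of the run."""
    if end < start:
        raise ValueError("end precedes start")
    return [
        Sample(s.t - start, s.metric, s.value)
        for s in samples
        if start <= s.t <= end
    ]
```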

Also, I find it generally (though not always) more useful to "scale" the
load averages by dividing them by the number of CPUs the scheduler sees.
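The scaling itself is a single division per average. A minimal Python
sketch; it uses os.cpu_count() as a stand-in for "the number of CPUs
the scheduler sees" (on FreeBSD one might read the hw.ncpu sysctl
instead), which is an assumption rather than a statement of how the
original data were collected. Loosely, a scaled value near 1.0 suggests
about as much runnable work as there were CPUs.

```python
from __future__ import annotations

import os


def scale_loadavg(loads: tuple[float, ...], ncpu: int) -> tuple[float, ...]:
    """Divide each load average by the number of CPUs."""
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    return tuple(load / ncpu for load in loads)


def current_scaled_loadavg() -> tuple[float, ...]:
    """Per-CPU 1-, 5- and 15-minute load averages for this host (Unix only)."""
    ncpu = os.cpu_count() or 1  # cpu_count() may return None
    return scale_loadavg(os.getloadavg(), ncpu)


if __name__ == "__main__":
    print(current_scaled_loadavg())
```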

I note, too, that I'm actually capturing more than 120 other metrics
(in addition to the CPU utilization and load averages).  I haven't yet
figured out how best to depict them, but I figured that if I captured
the information, I'd at least be able to depict it at some later point,
once I figured out what to do with it.  In theory, at least, I should be
able to generate graphs showing (e.g.) how memory usage (as reported by
top(1)) changes during the course of the workload.
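One way to "capture now, decide later" is to store every observation
in long form (timestamp, metric, value), so adding another metric
needs no schema change, and then pull out a single series when you
finally know what to plot. A minimal Python sketch using CSV; the file
layout, function names, and sysctl names in the demo are illustrative
assumptions, and the demo values are made up.

```python
from __future__ import annotations

import csv
import io
from typing import IO, Iterable


def write_long(rows: Iterable[tuple[float, str, float]], fh: IO[str]) -> None:
    """Write one CSV row per observation: t, metric, value."""
    writer = csv.writer(fh)
    writer.writerow(["t", "metric", "value"])
    for t, metric, value in rows:
        writer.writerow([t, metric, value])


def series(fh: IO[str], metric: str) -> list[tuple[float, float]]:
    """Read back one metric as a time-ordered list of (t, value) pairs."""
    points = [
        (float(rec["t"]), float(rec["value"]))
        for rec in csv.DictReader(fh)
        if rec["metric"] == metric
    ]
    points.sort()
    return points


if __name__ == "__main__":
    buf = io.StringIO()
    write_long(
        [
            (0.0, "vm.stats.vm.v_active_count", 100),
            (0.0, "vm.stats.vm.v_free_count", 500),
            (10.0, "vm.stats.vm.v_active_count", 120),
        ],
        buf,
    )
    buf.seek(0)
    print(series(buf, "vm.stats.vm.v_active_count"))
```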

Peace,
david
-- 
David H. Wolfskill				david at catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.