[tortech] Network / Service Monitoring

erol

2005-09-29 15:29:35 UTC

What are people using for monitoring servers, switches, services, etc.?
I'm trying to put together a monitoring system at work, but I am not
finding anything that suits my needs without having to spend
beaucoup dollars.

My main requirements are:

i. Must do more than a simple connectivity check. For example, I should
be able to configure it to do a GET on port 80 and have it evaluate
the response time and status code.
ii. I'd love it to be distributed. Ideally I'd spread the monitoring
across a couple of our data centres in Toronto and Montreal. The idea
behind it being distributed is that it is also electoral, so if 3 of 4
servers say a machine is dead, then send me a page at 2am.
iii. This is an absolute must: I have to be able to use SNMP to
monitor some things (e.g. disk space).
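
To make requirement (i) concrete, here is a minimal sketch of a check that does a GET and evaluates both the status code and the response time. It is not taken from any product named in this thread; the function names, the 1 s / 5 s thresholds, and the choice to treat any 4xx/5xx answer as critical are all assumptions for illustration.

```python
import time
import urllib.error
import urllib.request

# Check states, numbered the way Nagios plugins number them.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(status_code, elapsed_s, warn_s=1.0, crit_s=5.0):
    """Map an HTTP status code and response time to a check state.

    The thresholds are assumed defaults, not values from the thread.
    """
    if status_code is None or status_code >= 400:
        return CRITICAL
    if elapsed_s >= crit_s:
        return CRITICAL
    if elapsed_s >= warn_s:
        return WARNING
    return OK


def check_http(url, timeout_s=10.0):
    """GET url and return (state, status_code, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
            code = resp.status
    except urllib.error.HTTPError as err:
        code = err.code  # the server answered, but with 4xx/5xx
    except (urllib.error.URLError, OSError):
        code = None  # no usable response (refused, DNS failure, timeout)
    elapsed = time.monotonic() - start
    return classify(code, elapsed), code, elapsed
```

Keeping `classify` separate from the network call means the thresholds can be tuned and tested without touching a live server.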

I've read through various products - Nagios, Argus and a few others -
and they all seem to have the features but not one of them is
close enough to perfection to make me happy.

What are others using? Any recommendations?

--
erol
***@samurai.com
"You can pretend to be serious; you can't pretend to be witty." - Sacha Guitry

Bryan Fullerton

2005-09-29 15:46:00 UTC

On 29-Sep-05, at 11:29 AM, erol wrote:

| I've read through various products - Nagios, Argus and a few others -
| and they all seem to have the features but not one of them is
| close enough to perfection to make me happy.
|
| What are others using? Any recommendations?
|

I use Nagios. I'm quite happy with it, but I don't have the
distributed requirement.

Bryan

Blake Crosby

2005-09-29 16:04:34 UTC

Erol,

At CBC here, we use Nagios, although not in a distributed manner (which
is possible).

We are monitoring over 150 hosts, and over 900 services. Some of the
more interesting checks we do are:

- Make sure that a specific web application returns a known value when
doing an HTTP GET.
- Count the number of messages per minute passing through a mail server,
and notify accordingly.
- Check to see if a live stream is up and running (by doing an RTSP DESCRIBE).
- Keep track of the time it takes for an email to be delivered from one
mail host to another, and warn accordingly.

- We are also able to monitor hosts not on the same network as the
monitoring box using NRPE (which is like a Nagios proxy), which runs on a
box that is on that network.
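
As an illustration of the first check in the list above, here is a minimal sketch of the result logic for a Nagios-style plugin. Nagios reads a plugin's exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and its first line of output; everything else here, including the function name and the message wording, is hypothetical and not CBC's actual code.

```python
# Nagios plugin convention: the exit status carries the result.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_known_value(body, expected):
    """Return (exit_code, status_line) for an already-fetched page body.

    body is None when the page could not be fetched at all.
    """
    if body is None:
        return UNKNOWN, "UNKNOWN - no response body to inspect"
    if expected in body:
        return OK, "OK - found expected value"
    return CRITICAL, "CRITICAL - expected value missing"
```

A real plugin would fetch the page, call this function, print the status line, and pass the code to `sys.exit()`.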

Most of these checks were custom written by us. In some cases (like the
first example) Nagios is set to run another script that will restart the
web application automatically if the status ever becomes "critical".
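
The restart-on-critical behaviour described above is what Nagios calls an event handler. The sketch below shows only the decision step, assuming the command definition passes the `$SERVICESTATE$`, `$SERVICESTATETYPE$` and `$SERVICEATTEMPT$` macros as arguments. The function name and the "third soft attempt" policy are assumptions, not a description of the script Blake's team wrote.

```python
def should_restart(state, state_type, attempt, soft_attempt=3):
    """Decide whether an event handler should restart the application.

    state:       value of $SERVICESTATE$ (e.g. "OK", "CRITICAL")
    state_type:  value of $SERVICESTATETYPE$ ("SOFT" or "HARD")
    attempt:     value of $SERVICEATTEMPT$ (arrives as a string)
    """
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    # Assumed policy: act early, on the Nth soft failure, so the restart
    # can happen before anyone is notified.
    return state_type == "SOFT" and int(attempt) >= soft_attempt
```

Restarting only after repeated failures avoids bouncing the application on a single slow response.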

Blake

Stephen van Egmond

2005-09-29 16:15:53 UTC

erol (***@samurai.com) wrote:
| What are people using for monitoring servers, switches, services, etc.?
| I'm trying to put together a monitoring system at work, but I am not
| finding anything that suits my needs without having to spend
| beaucoup dollars.

I use http://www.alertra.com/ . It watches pages on my client's systems,
e.g. http://tinyplanet.ca/uptime.html .

Those are actually PHP pages that sample a
variety of systems for signs of good functioning, and either (a) print
the error message, which Alertra logs, or (b) print the "all's clear"
string.

n.b. don't use "success" as your string; an "unsuccessful"
error message contains it, so it won't trigger pages.
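
The pitfall is plain substring matching, and a short sketch shows it. The sentinel value and the sample error text below are made up for illustration; the point is only that the "all's clear" string should be something that cannot appear inside an error message.

```python
# Hypothetical sentinel: unlikely to occur inside any error message.
ALL_CLEAR = "ALL-SYSTEMS-NOMINAL-7f3a"


def naive_ok(page):
    """Treat the page as healthy if it contains the word 'success'."""
    return "success" in page


def sentinel_ok(page):
    """Treat the page as healthy only if the unique sentinel is present."""
    return ALL_CLEAR in page


# Made-up error output from a status page.
error_page = "backup unsuccessful: disk full"

print(naive_ok(error_page))     # the error page is wrongly seen as healthy
print(sentinel_ok(error_page))  # the error page is correctly seen as failing
```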

If that string isn't there, ICQs, SMSs, and emails go out.
I think ICQ just recently broke for them -- the network owner cut them off.
It's 2.95 per month per target for monitoring every 30 minutes from
three different continents.

| ii. I'd love it to be distributed. Ideally I'd spread the monitoring
| across a couple of our data centres in Toronto and Montreal. The idea
| behind it being distributed is that it is also electoral, so if 3 of 4
| servers say a machine is dead then send me a page at 2am.

(I think) my system waits for a confirmed "down" from 3 locations before
sending pages.
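
The "3 of 4" voting idea can be sketched in a few lines. This is only an illustration of the rule as erol stated it, not how Alertra or any other product implements it; the monitor names are hypothetical.

```python
def should_page(reports, quorum=3):
    """Return True when at least `quorum` monitors report the target down.

    reports maps a monitor's name to True if that monitor sees the
    target as down, False otherwise.
    """
    down_votes = sum(1 for is_down in reports.values() if is_down)
    return down_votes >= quorum


# Hypothetical monitors in two data centres.
reports = {"tor-1": True, "tor-2": True, "mtl-1": True, "mtl-2": False}
print(should_page(reports))  # three of four agree, so this pages
```

Requiring agreement from several locations filters out failures that are really a network problem near one monitor.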

| iii. This is an absolute must, I have to be able to use SNMP to
| monitor some things (ie: disk space).

In parallel with Alertra, I use Debian's snmp daemon with mrtg to cough
up graphs. http://harlan.tinyplanet.ca/graphs/

| I've read through various products - Nagios, Argus and a few others -
| and they all seem to have the features but not one of them is
| close enough to perfection to make me happy.
|
| What are others using? Any recommendations?

I place a huge personal value on simplicity, and I've found the above
setup to work well by that metric. The suckiest part was using (and
parsing the output of) snmpwalk to figure out which OIDs meant what so
that MRTG could chart them.
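
For the snmpwalk-parsing chore, here is a minimal sketch that assumes the usual net-snmp output shape of `OID = TYPE: value`, one entry per line. The sample lines and their numbers are invented for illustration, and lines that do not fit that shape are skipped rather than interpreted.

```python
import re

# One snmpwalk line, e.g. "HOST-RESOURCES-MIB::hrStorageUsed.1 = INTEGER: 250000"
LINE_RE = re.compile(r"^(?P<oid>\S+) = (?P<type>[\w-]+): (?P<value>.*)$")


def parse_snmpwalk(text):
    """Return {oid: (type, value)} for every line in the expected shape."""
    rows = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows[match.group("oid")] = (match.group("type"), match.group("value"))
    return rows


def storage_percent_used(rows, index):
    """Percent used for one hrStorage entry (used and size share a unit)."""
    prefix = "HOST-RESOURCES-MIB::"
    size = int(rows[f"{prefix}hrStorageSize.{index}"][1])
    used = int(rows[f"{prefix}hrStorageUsed.{index}"][1])
    return 100.0 * used / size


# Invented sample output for a single storage entry.
SAMPLE = """\
HOST-RESOURCES-MIB::hrStorageDescr.1 = STRING: /
HOST-RESOURCES-MIB::hrStorageSize.1 = INTEGER: 1000000
HOST-RESOURCES-MIB::hrStorageUsed.1 = INTEGER: 250000
"""

rows = parse_snmpwalk(SAMPLE)
print(storage_percent_used(rows, 1))
```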

Bryan Fullerton

2005-09-29 16:57:29 UTC

On 29-Sep-05, at 12:15 PM, Stephen van Egmond wrote:

| In parallel with Alertra, I use Debian's snmp daemon with mrtg to cough
| up graphs. http://harlan.tinyplanet.ca/graphs/

I use Cacti to do ongoing capacity monitoring.

Debian has their own SNMP daemon? ;)

Bryan

Julian C. Dunn

2005-09-29 16:51:32 UTC

On Thu, 29 Sep 2005, erol wrote:

| What are people using for monitoring servers, switches, services, etc.?
| I'm trying to put together a monitoring system at work, but I am not
| finding anything that suits my needs without having to spend
| beaucoup dollars.

Someone mentioned this recently at work:

http://www.itgroundwork.com/products/gw-monitor.html

It seems to be a pretty good idea -- tying in elements like Nagios with
other monitoring tools and systems. I'm not sure whether it is explicitly
distributed or whether that would come out as a result of an
implementation.

ITgroundwork also has a nicer front-end to Nagios that is, I believe,
freely available.

- Julian

[ Julian C. Dunn <***@aquezada.com> * "You can throw confetti, ]
[ WWW: www.aquezada.com/staff/julian * but you're still going ]
[ PGP: 91B3 7A9D 683C 7C16 715F * through the motions, baby" ]
[ 442C 6065 D533 FDC2 05B9 * - Aimee Mann ]