Tips, Tricks and Some Missing Documentation for SolarWinds NCM (v6)

Introduction

(This is one of those that I probably can't give you the code for. Instead, I'll tell you how to achieve it yourself.)

SolarWinds make some truly excellent software. It's not necessarily the functionality they include that makes the difference; it's the simplicity, and the separation of functionality into independent tools which together form a pretty sophisticated package. As the title indicates, there's some functionality that doesn't seem to get properly documented, and that doesn't sit well with me. Here, then, are a few nuggets which will hopefully be of use to someone else. Where appropriate and legal, I have supplied information to the SolarWinds user community at thwack.com. (Some of these bits are now technically the intellectual property of my employer. While this is something they would never choose to make money from in any way, shape or form, the fact remains that ownership is not mine in all cases.)

FTP support, of sorts

NCM supports almost every common mechanism for retrieving configs from devices. It doesn't, however, support vanilla FTP (SFTP is supported). I happen to have a fairly significant inventory of APC appliances, and the first device I tried to manage with NCM was the AP7721. It has a built-in management module which gives CLI and SNMP access, an optional FTP server and a web interface. The only way to get configs off these devices is to connect to the built-in FTP server (if it's enabled, which it is by default) and retrieve config.ini. As I said, this isn't possible with NCM as it stands. You could execute an external program in a scheduled task to do the FTP downloads, but this is only half the battle: NCM puts its downloaded configs into the database and doesn't appear to offer any way to scoop up configs that were downloaded externally.

The database is very straightforward (I'm assuming you know this runs on a SQL Server back-end, either an included SQL Server Express install or your own SQL Server). Getting meaningful information out of some areas requires a couple of confusing JOINs across the many tables, but the ConfigArchive table is simple: a unique identifier (ConfigID), an unenforced foreign key on the Nodes table (NodeID), a few datetimes (created, modified etc.), a ConfigType field (Running vs. Startup), Comments, and then the actual Config.

All you need to know is the NodeID for the device you want to associate the config with. The ConfigID needs to be unique, but in practice it looks fairly trivial to avoid collisions with NCM's own insertions: just take a look at how long that string is, and it's in hex. I've picked a 'base' string that leaves me 12 hex digits to play with.

My first attempt used time() in perl to give me seconds since epoch, converted it to hex and left-padded it with 0s to fill those last 12 digits. The obvious problem is that if more than one instance of my script runs at exactly the same time, they generate identical ConfigIDs and the later INSERT is rejected to avoid violating the primary key constraint. If recreating this for yourself, you should probably take that into consideration. For me, it sufficed to declare it a known feature and avoid it happening!

As it turns out, I needed to restructure my code and ended up re-implementing this. I now use time() in perl to give me seconds since epoch, convert it to hex and then append the current index of the Node in the result set, left-padded with zeros to fill the 12 digits. This allows a considerable number of devices to be processed every second without collision.

My SQL takes two queries: one to pull out the appropriate devices (SELECT NodeID,SysName[,...] FROM Nodes WHERE DeviceType LIKE 'APC%';) and one to INSERT the Config into ConfigArchive.
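
To make the revised ConfigID scheme concrete, here is a minimal sketch in Python rather than the perl I actually used. The exact split of the 12 digits is an assumption made for illustration (8 hex digits for the epoch seconds, 4 for the node index), as are the function names, the lowercase hex and the 'base' argument, which stands in for whatever prefix you choose for your own installation.

```python
import time


def make_config_id_suffix(epoch_seconds: int, node_index: int) -> str:
    """Build the 12 hex digits that follow the chosen 'base' string.

    Assumed layout (not confirmed by NCM documentation):
    8 hex digits of epoch seconds followed by 4 hex digits of node index.
    """
    if not 0 <= epoch_seconds <= 0xFFFFFFFF:
        raise ValueError("epoch seconds do not fit in 8 hex digits")
    if not 0 <= node_index <= 0xFFFF:
        raise ValueError("node index does not fit in 4 hex digits")
    return f"{epoch_seconds:08x}{node_index:04x}"


def make_config_id(base: str, node_index: int) -> str:
    """Append the suffix to a caller-supplied base (hypothetical prefix)."""
    return base + make_config_id_suffix(int(time.time()), node_index)
```

Because the node index differs for every row of the result set, IDs generated within the same second by one run of the script stay distinct. Two separate runs started in the same second would still collide.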

The rest is pretty trivial. A scheduled task executes a perl script that connects to the NCM database and grabs all of the appropriate device names and NodeIDs from the inventory for a given device type. It then downloads the config of each device via FTP (I used wget) and stuffs the whole lot into the ConfigArchive table, carefully emulating NCM's style and conventions for consistency. Unfortunately you need static credentials, or some alternative way of passing authentication, since the credentials in the database are encrypted in a proprietary way. Job done. Hit 'Refresh' in NCM's Node List, and the config should be picked up and added to the node properties. You can now have Policy Reports run against it and get the benefits of the Config Comparison feature as well.
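
The download step of that workflow might look something like the sketch below. It uses Python's standard ftplib in place of the wget call from my script, and the function name, the timeout and the ftp_factory parameter (there only so the function can be exercised without a real device) are all hypothetical. The database side is left out on purpose: it needs a SQL Server driver, and the INSERT must match the columns in your own ConfigArchive table.

```python
import ftplib
import io


def fetch_config(host, user, password, filename="config.ini",
                 ftp_factory=ftplib.FTP, timeout=30):
    """Download one file from a device's FTP server and return it as text.

    Static credentials are passed in by the caller, since the ones stored
    in the NCM database cannot be decrypted by an external script.
    """
    buffer = io.BytesIO()
    ftp = ftp_factory(host, timeout=timeout)
    try:
        ftp.login(user, password)
        # retrbinary hands each received chunk to the callback.
        ftp.retrbinary(f"RETR {filename}", buffer.write)
    finally:
        ftp.close()
    # Assumption: the config is plain text; undecodable bytes are replaced.
    return buffer.getvalue().decode("ascii", errors="replace")
```

The returned text is what would go into the Config column, alongside the NodeID taken from the first query.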

Terminal Server Support? Beware

In the Node Properties (for an existing node, or indeed on the 'Add Node' form) there is an option to allow Terminal Server Support. This doesn't seem to get any explanation in the documentation. There's some info on adding devices which are behind terminal servers (though it's a little on the light side), but nothing that I've seen explains what switching this option from 'No' to 'Yes' does. A SolarWinds support engineer mentioned it before I saw the change in behaviour myself, but as it turns out it doesn't play nicely with Raritan Dominion SX Secure Consoles which have Security Banners set.

I thought maybe this option would lengthen the connection timeout, or maybe watch for a second round of authentication (one for the terminal server, one for the device console). What it actually does is send a solitary CRLF to the remote system immediately after passing the password (so that would be password, enter, enter), looking for 'TACACS/RADIUS validation' (taken from the session trace/debug of a login session with this option set). For my use-case, this breaks acceptance of the Security Banner: the banner's default response is 'no', so the CRLF sent to check for TACACS/RADIUS ends up as an empty response to the banner, the default is taken, and the session is terminated for authentication failure. So I can't enable it. I've no idea if it makes any other subtle differences; so far SolarWinds haven't given me a detailed response!

For my needs, I've modified a copy of the Cisco IOS device template (look around in your NCM directory, have a browse) to support a few extra PreCommands, and that is all I have found to be required. PreCommands accept two extra arguments: 'RegExp=exp', where exp is a regular expression to match in the CLI output, and 'Delay=x', where x is the number of seconds to wait after seeing the regexp match before sending the 'PreCommand' text. This is documented in a few places, but not much of the documentation is complete.

Device Templates: Watch out for custom 'More'

Device Templates are a little lightly documented for my liking. Some of the documentation is pretty good, but it's inconsistent and not all where I'd expect to find it. One piece of information that the documentation utterly overlooks is the way the parser deals with custom 'More' prompts ('Press SPACE for More---' type prompts in long output listings). One of the examples confusingly refers to a custom 'More' prompt that indicates 'Enter' is the key it's looking for.

Again with the Raritan Dominion SX, I was hacking a Device Template, this time to pull the 'config' of the Secure Console itself. (Note, though, that the 'config' is unusable as a backup or command-script: it is simply, and annoyingly, a textual representation of the current system configuration!) I found myself with config downloads seemingly timing out and retrying until they hit the max retry limit. Confusingly, the traces showed that the config had started to be displayed, but NCM didn't seem to be requesting more pages of output. It turns out that the 'More' parser only knows to respond with a 'space', and the Dominion SX's 'More' prompt only responds to the 'enter' key.

The hack is to set up multiple PreCommands which send strings of carriage returns. In practice you probably only need one or two in each: they'll be sent too quickly for the processor in the Dominion SX to act on them meaningfully, since it takes so long to produce the next page of output. Every time the CLI sits waiting for input and reaches its current timeout (I believe the timeout grows over time, like a back-off) it will execute the next PreCommand. You need enough of these to successfully 'More' through the whole config output.

Beware of only testing on devices with small configs! I hadn't realised quite how 'chatty' the section of the config that dealt with the serial lines was compared to the other sections, and I had hacked the Device Template on a relatively unimportant 8 port unit. When I extended testing to live devices, I struck the problem again and realised that I had to nearly double the number of PreCommands (not to mention now having enough time to make a coffee whilst waiting for the config download)!
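
A rough way to size the number of PreCommands before testing on a large unit is sketched below. It rests on assumptions that I have not confirmed against NCM or the Dominion SX: the device pauses with a 'More' prompt after a fixed number of lines, there is no prompt after the final page, and each PreCommand advances the output by a known number of pages. The function and parameter names are made up for illustration.

```python
import math


def precommands_needed(total_lines, lines_per_page, pages_per_precommand=1):
    """Estimate how many PreCommands are needed to page through an output.

    Assumes one 'More' prompt between consecutive pages and none after
    the last page.
    """
    if total_lines < 0 or lines_per_page <= 0 or pages_per_precommand <= 0:
        raise ValueError("line and page counts must be positive")
    pages = math.ceil(total_lines / lines_per_page)
    prompts = max(pages - 1, 0)
    return math.ceil(prompts / pages_per_precommand)
```

Feed it the line count of the largest config you expect to collect, not the small test unit's, and add a little headroom.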