Thursday, October 24, 2024

Integrating Zabbix with BigPanda

BigPanda has made its way into the organization. I wasn't sure at first why, given that there's no shortage of Network Monitoring OSS / EMS systems in play. 

Many vendors use their own EMS. VMware, for example, uses VROPS (vRealize Operations - now known as Aria Operations). So there is, and has been, a use case for consolidating information from these disparate monitoring systems into a "Northbound" system. 

So that's what BigPanda is, I guess. It was pitched as a Northbound system. It does not seem to be very mature, and it is simpler to use than most of them (based on limited inspection and reading). But the business case pitch is that it has an Artificial Intelligence rules engine that provides superior correlation, and if that is true, it would certainly be a Northbound system worthy of consideration.

So - that is why we stepped in to integrate Zabbix with BigPanda. We already have VROPS as our "authoritative" monitoring system for all things VMware. Our team uses VROPS, but does not own or manage that platform (another group does). I believe they use it to monitor the vCenters, the hypervisors, and the datastores. I don't think they're using it to monitor tenant workloads (virtual machines running on the hypervisors).

Our Zabbix platform, which we manage ourselves, is a "second layer of monitoring" behind VROPS. It monitors only the VMware hypervisors, along with some specific virtual machines we run (load balancers, cloud management platform VMs, et al).

The BigPanda team wanted to showcase the ability to correlate information from Zabbix and VROPS, so we volunteered to integrate the two systems. 

Integrating the two systems was as simple as adding a Media Type in Zabbix, with a trigger alongside it. The Media Type is of type Webhook, and when you import it using the BigPanda template it comes with a number of macro variables. Four of these need to be filled in to test the integration: the BigPanda API key, the token, the BigPanda URL, and the Zabbix URL.
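
If you want to sanity-check the BigPanda credentials before wiring up the webhook, you can post a test alert directly with curl. This is just a sketch - the endpoint and field names below are how I understand BigPanda's Alerts v2 API, and the token and app key values are placeholders, so verify against the BigPanda documentation before relying on it:

# Send a test alert straight to BigPanda (placeholders: <API_TOKEN>, <APP_KEY>)
curl -X POST "https://api.bigpanda.io/data/v2/alerts" \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
        "app_key": "<APP_KEY>",
        "status": "critical",
        "host": "zabbix-test-host",
        "check": "integration smoke test",
        "description": "Test alert sent from the Zabbix server"
      }'

If that alert shows up in the BigPanda console, the keys are good and any remaining problems are on the Zabbix side (macros, permissions, or the trigger action).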

In addition to the Media Type and trigger, you also need to add a user and user group for the integration. I don't recommend making either the user or the user group an Administrator. What you can do instead is give them No Frontend Access (no GUI access), because it is, after all, an integration. Then give the user group read-only access to the host group. In our case, this was the VMware/Hypervisors host group that lights up and becomes available when you add a vCenter host to Zabbix and select the VMware template for that vCenter (we only have the hypervisor sub-template available so that we don't get inundated with VM monitoring data, which would require us to resize the system for that volume of information).

In this way, I would say the integration is quite similar to the Zabbix/Slack integration.

Wednesday, September 18, 2024

Fixing Clustering and Disk Issues on an N+1 Morpheus CMP Cluster

I had performed an upgrade on Morpheus which I thought was fairly successful. I had some issues doing this upgrade on CentOS 7 because it was designated EOL and the repositories were archived, but I worked through that and it seemed everyone was using the system just fine.

Today, however, I had someone contact me to tell me that they provisioned a virtual machine, but it was stuck in an incomplete "Provisioning" state (a state that has a blue icon with a rocketship in it). The VM was provisioned on vCenter and working, but the state in Morpheus never changed to "Finalized".

I couldn't figure this out, so I went to the Morpheus help site and discovered that I myself had logged a ticket on this issue quite a while back. It turned out that the reason the state never flipped in that case was that the clustering wasn't working properly.

So I checked RabbitMQ. It looked fine.
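
For reference, the kind of check I mean looks like the following (a sketch; run it on one of the cluster nodes, and note that on an appliance-style install the rabbitmqctl binary may live under the product's embedded path rather than on the default PATH):

# Verify the cluster sees all nodes and reports no partitions
rabbitmqctl cluster_status

# Look for queues piling up messages with no consumers (a symptom of stuck workers)
rabbitmqctl list_queues name messages consumers | sort -k2 -n | tail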

I checked MySQL and Percona, and I suspected that perhaps the clustering wasn't working properly. In the process of restarting the VMs, one of the virtual machines wouldn't start. I had to do a bunch of Percona advanced troubleshooting to figure out that I needed to do a wsrep recover commit before I could start the system and have it properly join the cluster. 

The NEXT problem was that Zabbix was screeching about these Morpheus VMs using too much disk space. It turned out that the /var file system was 100% full - because of Elasticsearch. Fortunately, I had an oversized /home directory, and was able to rsync the elasticsearch directory over to /home and re-link it.
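
For anyone attempting the same relocation, the procedure was roughly the following. This is a sketch from memory - the /var/lib/elasticsearch path and the service name are assumptions (on a Morpheus appliance the service and data directory may live elsewhere and be managed by morpheus-ctl), so adjust to your own layout:

# Stop whatever owns the data first (service name assumed)
systemctl stop elasticsearch

# Copy the data to the roomy filesystem, preserving ownership and permissions
rsync -a /var/lib/elasticsearch/ /home/elasticsearch/

# Move the original out of the way and leave a symlink in its place
mv /var/lib/elasticsearch /var/lib/elasticsearch.old
ln -s /home/elasticsearch /var/lib/elasticsearch

systemctl start elasticsearch

# Only after Elasticsearch is confirmed healthy, reclaim the space:
# rm -rf /var/lib/elasticsearch.old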

But this gets to the topic of system administration with respect to disks.

First let's start with some KEY commands you MUST know:

>df -Th 

This command (disk free = df) shows how much space is used in human-readable format, along with the mountpoint and file system type. It tells you NOTHING about the physical disks, though!

>lsblk -f

This command (list block devices) gives you the physical disks, the mountpoints, the UUIDs, and any labels. It is a device-oriented command and doesn't show you space consumption.

>fdisk -l

I don't really like this command that much because of the output formatting. But it does list disk partitions and related statistics.

Some other commands you can use are:

>sudo file -sL /dev/sda3

The -s flag enables reading of block or character special files, and -L enables following of symlinks.

>blkid /dev/sda3

Similar command to lsblk -f above.
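
Putting those together, the workflow I use when a filesystem fills up is roughly: confirm which mountpoint is full and what backs it, then walk down it to find the culprit directory. A quick sketch:

# Which filesystem is full, and what device/type backs it?
df -Th /var
lsblk -f

# Walk /var one level at a time to find the space hog
# (-x keeps du from crossing into other mounted filesystems)
du -xh --max-depth=1 /var | sort -h | tail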

When a Percona Cluster Node Stops Working

Had a horrible problem where a Percona node (2 of 3) went down and wouldn't start.

I finally ran a command: 

> mysqld_safe --wsrep-recover --tc-heuristic-recover=ROLLBACK

This didn't work, so I had to run journalctl -xe to find out that the recovery output for Percona is actually written to a temporary file: /var/lib/mysql/wsrep_recovery.xxxxx

From this, I could see pending transactions. Well, transactions either need to be committed or rolled back.

The rollback didn't work, so I tried the commit, which DID work.

> mysqld_safe --wsrep-recover --tc-heuristic-recover=COMMIT

Now, you can also edit your /etc/my.cnf file and put this option in that file in this format:

[mysqld]
tc-heuristic-recover = COMMIT

So after running the commit, which seemed to run fine, I went ahead and attempted to start the mysql service again: 

> systemctl start mysql

Fortunately, it came up!

Now - a quick way to check and make sure your Percona node is working properly is to log into mysql and run the following query:

mysql> show status like 'wsrep%';

Below are the variables that I tend to look for:
| wsrep_cluster_conf_id    | 56                                   |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | f523290f-9336-11eb-be5b-d6f9514c9c3c |
| wsrep_cluster_status     | Primary                              |
| wsrep_connected          | ON                                   |
| wsrep_local_bf_aborts    | 0                                    |
| wsrep_local_index        | 2                                    |
| wsrep_ready              | ON                                   |

The cluster conf id should be the same on all of your cluster nodes!
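
If you want to compare those values across nodes without logging into each one interactively, something like this works (a sketch - the hostnames and credentials are placeholders):

# Compare key wsrep values on every node (placeholders: node1..node3, <PASSWORD>)
for h in node1 node2 node3; do
  echo "== $h =="
  mysql -h "$h" -u root -p'<PASSWORD>' -e \
    "SHOW STATUS WHERE Variable_name IN
     ('wsrep_cluster_conf_id','wsrep_cluster_size','wsrep_cluster_status','wsrep_ready');"
done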

Monday, September 16, 2024

DB_RUNRECOVERY: Fatal error, run database recovery

I got this scary error when trying to run an upgrade on a cloud management system.

Here is what caused it:

1. The OS was CentOS 7.

2. The repositories for CentOS 7 were removed because CentOS 7 was End of Life (EOL).

The repos were moved to an archive, and I described how to update a CentOS 7 OS using the archived repos in a previous post.

3. The upgrade was running Chef scripts that in turn were making yum update calls.


What effectively happened was that the RPM database was getting corrupted. Which sounds frightening. But as it turns out, a post I found saved the day. Apparently rebuilding the RPM database is simple.

From this link, to give credit where credit is due: rebuilding the rpm database

$ mv /var/lib/rpm/__db* /tmp/
$ rpm --rebuilddb
$ yum clean all
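
To convince yourself the rebuilt database is healthy before re-running the upgrade, a couple of quick sanity checks (nothing fancy, just the standard tools):

# The package count should look sane (hundreds of packages, not zero)
rpm -qa | wc -l

# This should now complete without the DB_RUNRECOVERY error
yum check-update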

Tuesday, September 10, 2024

Updating CentOS 7 After EOL

I found a site that showed how you could update CentOS 7 after Red Hat shut down all of the repositories for it when it was classified End of Life.

I thought I would post on how to do this, in case I can't locate that link later or it gets taken down.

The link is at https://gcore.de/en/help/linux/centos7-new-repo-url-after-eol.php

Basically the process is as follows:

1. Back up the CentOS-* repositories (a consolidated command sketch for the backup and refresh steps appears at the end of this procedure).

2. Back up the existing epel.repo.

3. Make a new CentOS.repo repository file, with the following:

[base]
name=CentOS-7.9.2009 - Base
baseurl=https://vault.centos.org/7.9.2009/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
metadata_expire=never

#released updates
[updates]
name=CentOS-7.9.2009 - Updates
baseurl=https://vault.centos.org/7.9.2009/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
metadata_expire=never

# additional packages that may be useful
[extras]
name=CentOS-7.9.2009 - Extras
baseurl=https://vault.centos.org/7.9.2009/extras/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
metadata_expire=never

# additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-7.9.2009 - CentOSPlus
baseurl=https://vault.centos.org/7.9.2009/centosplus/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=0
metadata_expire=never

#fasttrack - packages by Centos Users
[fasttrack]
name=CentOS-7.9.2009 - Contrib
baseurl=https://vault.centos.org/7.9.2009/fasttrack/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=0
metadata_expire=never

NOTE: I had to change the repo URLs from http to https.

4. Make a new epel.repo repository file with the following:

[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=https://archives.fedoraproject.org/pub/archive/epel/7/$basearch
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
metadata_expire=never

[epel-debuginfo]
name=Extra Packages for Enterprise Linux 7 - $basearch - Debug
baseurl=https://archives.fedoraproject.org/pub/archive/epel/7/$basearch/debug
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1
metadata_expire=never

[epel-source]
name=Extra Packages for Enterprise Linux 7 - $basearch - Source
baseurl=https://archives.fedoraproject.org/pub/archive/epel/7/SRPMS
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1
metadata_expire=never

NOTE: These base URLs are already https in the original post, so no changes were needed here.
 

Next, remove all currently cached metadata: yum clean all

Now enter yum check-update to load a new list of all available packages and to check if your local installation has all available updates. 

Afterwards you can install packages as usual using yum install.

NOTE: I just did a yum update instead of a yum install. Hope that was correct. It seemed to work fine.
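
For completeness, here is roughly what the non-editing steps look like as commands. The backup directory name is just my own choice; the repo file contents are the ones shown above:

# Steps 1 and 2: back up the existing repo files before replacing them
mkdir -p /etc/yum.repos.d/backup
mv /etc/yum.repos.d/CentOS-*.repo /etc/yum.repos.d/backup/
mv /etc/yum.repos.d/epel.repo /etc/yum.repos.d/backup/

# After writing the new CentOS.repo and epel.repo shown above:
yum clean all
yum check-update

# Then either install individual packages, or take everything as I did:
yum update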

 

Tuesday, August 27, 2024

Programming a Saab

I use the term "Programming" loosely here because I am not talking about Programming in the true sense of the word (writing code that is compiled and run on a chipset).

I am really referring to the use of software so that you can tune and make settings adjustments to the car's software components. 

The Saab has several control units, such as the Engine Control Unit (ECU) - sometimes also referred to as an Engine Control Module (ECM). General Motors, which made the Saab 9-3 as a joint venture after taking over the auto division of Saab, uses a device called a Tech II to pull codes, run diagnostics, and adjust settings on the cars. These Tech IIs are handheld devices that interface with the OBD connector (which is under the dashboard in most car models). 

OBD connectors are fairly standard, which allows you to drive the car into just about any auto parts store (Advance Auto, O'Reilly, AutoZone, et al) and they can plug an OBD reader in, get the codes, look them up, and make recommendations (and/or sell parts, which is why they do this as a courtesy).

Since they don't make Saabs anymore, there is no US-based network of dealerships, and mechanics are disappearing fast - only a handful of Saab shops are left operating, and some of them are simply individuals who work on Saabs for various reasons (restoring them, extra cash, etc). So having an OBD reader is certainly helpful if you buy or own a Saab, because you will DEFINITELY need to learn to do some things on your own (most garages won't even let a Saab enter their engine bays). 

Buying a Tech II device, which has the Saab software module (PCMCIA card), is almost necessary if you're hardcore into your Saab. But they're expensive. And hard to find, actually. When they pop up on places like eBay, they get snatched up pretty quick by enthusiasts, restorers, mechanics etc. Also, the Tech II devices interface with laptop software, and there are two kinds: TIS2000, and a newer version called TISWeb. This link discusses these laptop software packages:

https://www.uksaabs.co.uk/UKS/viewtopic.php?t=123074

But ... if you cannot get a Tech II device, there is another way to skin the cat!

You see, software is software. And you don't "need" a handheld device as a host for the software. Any laptop will do, if you have the software! Fortunately, someone (Saab?) released the software publicly. You can download and run it. Not the source code, I don't think, but the compiled x86 program that will run on a Windows laptop, with an installer that sets it up. But - how do you interface it with the car? There is a cable you can buy, called the OBDLink SX. One side is OBD, and the other side is USB and plugs into the laptop (more on this later).

Now - all this said - you DO need to know what you're doing with this software. Or you can brick the car! But if you learn how to use it, you can reset faults, run diagnostics, and even swap car components and re-flash them (e.g. the ECU). Many Saab parts, believe it or not, are tied to the VIN, and you cannot just pull them off of one Saab and stick them on another without running this kind of software.

Lastly, the software. If you don't have a Tech II or can't afford one or can't find one, there is some software called the Trionic Can Flasher (trioniccanflasher). With this, you can flash a new ECU if the one on your Saab went bad - provided you can follow steps.

For example, the steps for cloning a Trionic 8 ecu are as follows:

1: start trioniccanflasher, select T8 and your interface (which corresponds to the serial port on the laptop)

2: read ecu content from the original ecu

3: select t8 mcp and read ecu again

4: switch to the new ecu

5: make sure legion bootloader and unlock sys partitions are checked

6: select t8 mcp and flash that

7: select t8 and flash that

Now - what if you are on a workbench, say at a Saab garage with ten cars that need ECUs, and you don't want to deal with the laptop and getting in and out of the car(s)? There is a different interface you can use, where one connector plugs into the ECUs and the other end plugs into the laptop (AEZ Flasher 2?). Honestly, I am not savvy about this yet and don't even know what interface this is (but I will update this post once I do).

NOTE: GM makes software called Tech2Win. I hear that this software does not work with the OBDLink SX cable - but I cannot verify this at the time of writing. UPDATE: Indeed it did not work, but someone somehow went in and patched the software, and apparently now it DOES work - but only with the MDI 1 (not MDI 2) clone cable adaptor.

https://www.saabcentral.com/threads/tech2win-for-saab-fixes-i-bus-missing-on-2003-9-3.731283/

Friday, August 16, 2024

Pinephone Pro - Unboxing and Use Part II

I picked up the Pinephone Pro, which I had attached to a standard USB-C charger. It indeed was sitting at 100%. So it looks like the charging works okay.

The OS asked me for a pin code to unlock the screen. Yikes. I wasn't prompted to set up a pin code! 

I rebooted the phone to see if I could figure out what OS was on it from the boot messages. I figured out that the phone was running the Pinephone Manjaro OS. 

https://github.com/manjaro-pinephone/phosh/releases

Since the Manjaro OS has a default pin code, I tried that pin code and got lucky - it hadn't been changed, and it worked. I (re)connected to WiFi, and noticed that the OS prompts for my WiFi password every single time and doesn't seem to remember it from before. Secure? Yes. Annoying? Yes.

The form factor issue I ran into using the Firefox browser seemed to be more related to Firefox than the OS. The issue with Firefox is that the browser is sized past the phone form factor, and you need to scroll left and right which is a major hassle. The browser doesn't auto-size itself for the screen dimensions.

I played with the Terminal app, and noticed that the logged-in user when I launched it was pico-xxxx (I don't remember what the suffix is). I tried to sudo to root, but didn't know what the password was for this user.

Lastly, I played a video from YouTube, and the sound was very tinny. So the speaker on this phone is not high-end. I have not attempted to use headphones on this device yet.

Since the Linux-Mobile apps are so limited, many apps you typically run from a dedicated icon app/client on a mobile phone will need to be run from a browser.

I am not sure Manjaro is the "right" OS to use on this phone, or if the version of the OS running is current or stale. I ordered the Docking Hub and a Micro SD Card and when those arrive, maybe I will try flashing a new/different OS on this phone.
