Wednesday, September 18, 2024

Fixing Clustering and Disk Issues on an N+1 Morpheus CMP Cluster

I had performed an upgrade on Morpheus which I thought was fairly successful. I had some issues doing this upgrade on CentOS 7 because it was designated EOL and the repositories were archived, but I worked through that and it seemed everyone was using the system just fine.

Today, however, I had someone contact me to tell me that they provisioned a virtual machine, but it was stuck in an incomplete "Provisioning" state (a state that has a blue icon with a rocketship in it). The VM was provisioned on vCenter and working, but the state in Morpheus never set to "Finalized".

I couldn't figure this out, so I went to the Morpheus help site and discovered that I myself had logged a ticket on this issue quite a while back. It turned out that the state never flipped in that case because the clustering wasn't working properly.

So I checked RabbitMQ. It looked fine.

I checked MySQL and Percona, and I suspected that perhaps the clustering wasn't working properly. In the process of restarting the VMs, one of the nodes wouldn't start. I had to do a bunch of advanced Percona troubleshooting to figure out that I needed to run a wsrep recovery with a heuristic COMMIT before I could start the system and have it properly rejoin the cluster.

The NEXT problem was that Zabbix was screeching about these Morpheus VMs using too much disk space. It turned out that the /var file system was 100% full because of ElasticSearch. Fortunately I had an oversized /home file system, and was able to rsync the elasticsearch directory over to /home and symlink it back to its original location.
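The relocate-and-relink fix can be sketched as a small shell function. This is a minimal sketch, not the exact commands I ran: the function name relocate_dir and the paths in the usage comment are hypothetical, and it uses cp -a where I used rsync (rsync -a would do the same job). Stop the service that owns the directory before moving it.

```shell
# Sketch: copy a directory to a roomier file system and leave a symlink behind.
# relocate_dir is a hypothetical helper name; the paths below are examples only.
relocate_dir() {
    local src="$1" dest_parent="$2"
    local name dest
    name="$(basename "$src")"
    dest="$dest_parent/$name"

    # Refuse to run on something that is missing or already a symlink
    [ -d "$src" ] && [ ! -L "$src" ] || { echo "not a real directory: $src" >&2; return 1; }
    [ ! -e "$dest" ] || { echo "destination already exists: $dest" >&2; return 1; }

    mkdir -p "$dest_parent" || return 1
    # cp -a preserves ownership, permissions and timestamps
    cp -a "$src" "$dest" || return 1
    # Keep the original under a new name until the copy has been verified
    mv "$src" "$src.old" || return 1
    ln -s "$dest" "$src" || return 1
    echo "moved $src -> $dest (original kept as $src.old)"
}

# Example (assumed paths, run as root with the service stopped):
#   relocate_dir /var/lib/elasticsearch /home
```

Once the service is back up and healthy, the .old copy can be deleted to actually free the space on /var.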

But this gets to the topic of system administration with respect to disks.

First let's start with some KEY commands you MUST know:

>df -Th 

This command (df = disk free) shows how much space is used on each mounted file system in human-readable format (-h), along with the file system type (-T) and the mountpoint. It tells you NOTHING about the physical disks though!

>lsblk -f

This command (list block devices) shows each disk and partition along with its file system type, label, UUID and mountpoint. It is a device-oriented command, and on CentOS 7 it doesn't show space consumption (newer util-linux releases add FSAVAIL and FSUSE% columns).

>fdisk -l

I don't really like this command that much because of the output formatting. But it does list disk partitions and related statistics.

Some other commands you can use are:

>sudo file -sL /dev/sda3

The -s flag makes file read block or character special files, and -L makes it follow symlinks.

>blkid /dev/sda3

This gives similar information to lsblk -f above, but for a single device.
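To catch a full file system before Zabbix starts screeching, the df output can be filtered for anything over a threshold. This is a minimal sketch: over_threshold is a hypothetical helper name, and it assumes df -P (POSIX format) output and mountpoints without spaces.

```shell
# Sketch: print mountpoints whose usage is at or above a percentage threshold.
# Reads `df -P` style output on stdin; over_threshold is a hypothetical name.
over_threshold() {
    local limit="${1:-90}"
    # Column 5 is the capacity (e.g. 100%), column 6 is the mountpoint
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6, $5
    }'
}

# Example:
#   df -P | over_threshold 90
```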

When a Percona Cluster Node Stops Working

I had a horrible problem where a Percona node (node 2 of 3) went down and wouldn't start.

I finally ran a command: 

> mysqld_safe --wsrep-recover --tc-heuristic-recover=ROLLBACK

This didn't work, so I had to run journalctl -xe to find out that the Percona recovery startup actually logs to a temporary file: /var/lib/mysql/wsrep_recovery.xxxxx

From this, I could see pending transactions. Well, transactions either need to be committed or rolled back.

The rollback didn't work, so I tried the commit, which DID work.

> mysqld_safe --wsrep-recover --tc-heuristic-recover=COMMIT

Now, you can also edit your /etc/my.cnf file and put this option in that file in this format:

[mysqld]

tc-heuristic-recover = COMMIT

So after running the commit, which seemed to run fine, I went ahead and attempted to start the mysql service again: 

> systemctl start mysql

Fortunately, it came up!

Now - a quick way to check that your Percona node is working properly is to log into mysql and run the following query:

mysql> show status like 'wsrep%';

Below are the variables that I tend to look for:
| wsrep_cluster_conf_id    | 56                                   |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | f523290f-9336-11eb-be5b-d6f9514c9c3c |
| wsrep_cluster_status     | Primary                              |
| wsrep_connected          | ON                                   |
| wsrep_local_bf_aborts    | 0                                    |
| wsrep_local_index        | 2                                    |
| wsrep_ready              | ON                                   |

The cluster conf id should be the same on all of your cluster nodes!
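The same check can be scripted so it is easy to repeat on every node. This is a minimal sketch: wsrep_value and node_healthy are hypothetical helper names, and it assumes the status is captured with mysql -N -B (no column names, tab-separated) so each line is "name value".

```shell
# Sketch: pull wsrep variables out of `show status like 'wsrep%'` output.
# wsrep_value and node_healthy are hypothetical helper names.

# Print the value of one variable; expects "name<TAB>value" lines on stdin
wsrep_value() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Succeed only if the node is in the Primary component, connected and ready
node_healthy() {
    local status="$1"
    [ "$(printf '%s\n' "$status" | wsrep_value wsrep_cluster_status)" = "Primary" ] &&
    [ "$(printf '%s\n' "$status" | wsrep_value wsrep_connected)" = "ON" ] &&
    [ "$(printf '%s\n' "$status" | wsrep_value wsrep_ready)" = "ON" ]
}

# Example (credentials and host options omitted):
#   status="$(mysql -N -B -e "show status like 'wsrep%'")"
#   node_healthy "$status" && echo "node OK"
#   printf '%s\n' "$status" | wsrep_value wsrep_cluster_conf_id
```

Running the last line on each node and comparing the numbers is a quick way to confirm the conf id matches everywhere.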

Monday, September 16, 2024

DB_RUNRECOVERY: Fatal error, run database recovery

I got this scary error when trying to run an upgrade on a cloud management system.

Here is what caused it:

1. The OS was CentOS 7.

2. The repositories for CentOS 7 were removed because CentOS 7 was End of Life (EOL).

The repos were moved to an archive, and I have a previous post about how to update a CentOS 7 system using the archived repos.

3. The upgrade was running Chef scripts that in turn were making yum update calls.


What effectively happened was that the rpm database got corrupted, which sounds frightening. But as it turns out, a post I found saved the day. Apparently rebuilding the rpm database is simple.

From this link, to give credit where credit is due: rebuilding the rpm database

$ mv /var/lib/rpm/__db* /tmp/
$ rpm --rebuilddb
$ yum clean all

Tuesday, September 10, 2024

Updating CentOS 7 After EOL

I found a site that showed how you could update CentOS 7 after Red Hat shut down all of the repositories for it when it was classified End of Life.

I thought I would post how to do this, in case I can't find that link again or it gets taken down.

The link is at https://gcore.de/en/help/linux/centos7-new-repo-url-after-eol.php

Basically the process is as follows:

1. Back up the CentOS-* repository files and the existing epel.repo.

2. Make a new CentOS.repo repository file, with the following:

[base]
name=CentOS-7.9.2009 - Base
baseurl=https://vault.centos.org/7.9.2009/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
metadata_expire=never

#released updates
[updates]
name=CentOS-7.9.2009 - Updates
baseurl=https://vault.centos.org/7.9.2009/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
metadata_expire=never

# additional packages that may be useful
[extras]
name=CentOS-7.9.2009 - Extras
baseurl=https://vault.centos.org/7.9.2009/extras/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
metadata_expire=never

# additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-7.9.2009 - CentOSPlus
baseurl=https://vault.centos.org/7.9.2009/centosplus/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=0
metadata_expire=never

#fasttrack - packages by Centos Users
[fasttrack]
name=CentOS-7.9.2009 - Contrib
baseurl=https://vault.centos.org/7.9.2009/fasttrack/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=0
metadata_expire=never
NOTE: I had to change the repos from http to https. 

3. Make a new epel.repo repository file with the following:

[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=https://archives.fedoraproject.org/pub/archive/epel/7/$basearch
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
metadata_expire=never

[epel-debuginfo]
name=Extra Packages for Enterprise Linux 7 - $basearch - Debug
baseurl=https://archives.fedoraproject.org/pub/archive/epel/7/$basearch/debug
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1
metadata_expire=never

[epel-source]
name=Extra Packages for Enterprise Linux 7 - $basearch - Source
baseurl=https://archives.fedoraproject.org/pub/archive/epel/7/SRPMS
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=1
metadata_expire=never
NOTE: These base urls are already https in his post, so no changes needed here.
 

Next, remove all currently available metadata: yum clean all

Now enter yum check-update to load a new list of all available packages and to check if your local installation has all available updates. 

Afterwards you can install packages as usual using yum install.

NOTE: I just did a yum update instead of a yum install. Hope that was correct. It seemed to work fine.
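The backup step at the start of this procedure can be sketched as a shell function. This is a minimal sketch: backup_repos is a hypothetical helper name and the dated backup directory is my own assumption. It moves the files rather than copying them, because yum reads every *.repo file in /etc/yum.repos.d and the old definitions would otherwise still be active.

```shell
# Sketch: move the old CentOS-*.repo files and epel.repo into a backup directory.
# backup_repos is a hypothetical name; the default backup location is an assumption.
backup_repos() {
    local repo_dir="${1:-/etc/yum.repos.d}"
    local backup_dir="${2:-$repo_dir/backup-$(date +%Y%m%d)}"
    local f

    mkdir -p "$backup_dir" || return 1
    for f in "$repo_dir"/CentOS-*.repo "$repo_dir"/epel.repo; do
        [ -e "$f" ] || continue   # pattern matched nothing, or file is absent
        mv "$f" "$backup_dir"/ || return 1
    done
}

# Example sequence (as root):
#   backup_repos
#   ... create the new CentOS.repo and epel.repo files shown above ...
#   yum clean all
#   yum check-update
#   yum update
```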

 

Tuesday, August 27, 2024

Programming a Saab

I use the term "Programming" loosely here because I am not talking about Programming in the true sense of the word (writing code that is compiled and run on a chipset).

I am really referring to the use of software so that you can tune and make settings adjustments to the car's software components. 

The Saab has several control units, such as the Engine Control Unit (ECU) - sometimes also referred to as an Engine Control Module (ECM).  General Motors, who made the Saab 9-3 as a joint venture after taking over the auto division of Saab, uses a device called a Tech II  to pull codes, run diagnostics and adjust settings on the cars. These Tech IIs are handheld devices that interface with the OBD connector (which is under the dashboard in most car models). 

OBD connectors are fairly standard, which allows you to drive the car into just about any auto parts store (Advance Auto, O'Reilly, AutoZone, et al.) and they can plug an OBD reader in, get the codes, look them up and make recommendations (and/or sell parts, which is why they do this as a courtesy).

Since they don't make Saabs anymore, there is no US-based network of dealerships, and mechanics are disappearing fast - only a handful of Saab shops are left operating, and some of them are simply individuals who work on Saabs for various reasons (restoring them, extra cash, etc). So having an OBD reader is certainly helpful if you buy or own a Saab, because you will DEFINITELY need to learn to do some things on your own (most garages won't even let a Saab enter their bays).

Buying a Tech II device, which has the Saab software module (PCMCIA card), is almost necessary if you're hardcore into your Saab. But they're expensive. And hard to find, actually. When they pop up on places like eBay, they get snatched up pretty quick by enthusiasts, restorers, mechanics etc. Also, the Tech II devices interface with laptop software, and there are two kinds: TIS2000, and a newer version called TISWeb. This link discusses these laptop software packages:

https://www.uksaabs.co.uk/UKS/viewtopic.php?t=123074

But ... if you cannot get a Tech II device, there is another way to skin the cat!

You see, software is software. And you don't "need" a handheld device as a host for the software. Any laptop will do, if you have the software! Fortunately, someone (Saab?) made the software freely available. You can download and run it. Not the source code, I don't think, but the compiled x86 program that runs on a Windows laptop, with an installer that sets it up. But - how do you interface it with the car? There is a cable you can buy, called OBDLink SX. One side is OBD, the other side is USB and plugs into the laptop (more on this later).

Now - all this said - you DO need to know what you're doing with this software. Or you can brick the car! But if you learn how to use this software, you can reset faults, run diagnostics, and you can even swap car components and re-flash them (i.e. the ECU). Many Saab parts, believe it or not, are tied to the VIN and you cannot just pull them off of one Saab and stick them on another without running this kind of software.

Lastly, the software. If you don't have a Tech II or can't afford one or can't find one, there is some software called the Trionic Can Flasher (trioniccanflasher). With this, you can flash a new ECU if the one on your Saab went bad - provided you can follow steps.

For example, the steps for cloning a Trionic 8 ecu are as follows:

1: start trioniccanflasher, select T8 and your interface (which corresponds to the serial port on the laptop)

2: read ecu content from the original ecu

3: select t8 mcp and read ecu again

4: switch to the new ecu

5: make sure legion bootloader and unlock sys partitions are checked

6: select t8 mcp and flash that

7: select t8 and flash that

Now - what if you are on a workbench, say at a Saab garage with ten cars that need ECUs, and you don't want to deal with the laptop and getting in and out of the car(s)? There is a different interface you can use where one connector plugs into the ECUs and the other end on the laptop (AEZ Flasher 2?). Honestly, I am not savvy about this yet and don't even know what interface this is (but will update this post once I do).

NOTE: GM makes software called Tech2Win. I hear that this software does not work with the OBDLink SX cable - but I cannot verify this at the time of writing. UPDATE: Indeed it did not work, but someone somehow went in and patched the software and apparently now it DOES work - but only with the MDI 1 (not MDI 2) clone cable adaptor.

https://www.saabcentral.com/threads/tech2win-for-saab-fixes-i-bus-missing-on-2003-9-3.731283/

Friday, August 16, 2024

Pinephone Pro - Unboxing and Use Part II

I picked up the Pinephone Pro, which I had attached to a standard USB-C charger. It indeed was sitting at 100%. So it looks like the charging works okay.

The OS asked me for a pin code to unlock the screen. Yikes. I wasn't prompted to set up a pin code! 

I rebooted the phone to see if I could figure out what OS was on it from the boot messages. I figured out that the phone was running the Pinephone Manjaro OS. 

https://github.com/manjaro-pinephone/phosh/releases

Since the Manjaro OS has a default pin code, I attempted that pin code and got lucky - it hadn't been changed, and it worked. I (re)connected to WiFi, and noticed that the OS prompts for my WiFi password every single time and doesn't seem to remember it from before. Secure? Yes. Annoying? Yes.

The form factor issue I ran into using the Firefox browser seemed to be more related to Firefox than the OS. The issue with Firefox is that the browser is sized past the phone form factor, and you need to scroll left and right which is a major hassle. The browser doesn't auto-size itself for the screen dimensions.

I played with the Terminal app, and noticed that the user when I launched the Terminal app was pico-xxxx (I don't remember what the suffix is). I tried to sudo to root, but didn't know what the password was for this user. 

Lastly, I played a video from YouTube, and the sound was very tinny. So the speaker on this phone is not high-end. I have not yet tried headphones on this device.

Since the Linux-Mobile apps are so limited, many apps you typically run from a dedicated icon app/client on a mobile phone will need to be run from a browser.

I am not sure Manjaro is the "right" OS to use on this phone, or if the version of the OS running is current or stale. I ordered the Docking Hub and a Micro SD Card and when those arrive, maybe I will try flashing a new/different OS on this phone.

Friday, August 9, 2024

Pinephone Pro - Unboxing and First Use

I ordered a Linux Pinephone that just arrived.

In the United States, trying to get off of Google, Apple, and even Samsung is nigh onto impossible. Carriers make a ton of money off of selling and promoting phones, and have locked Linux phones out of their stores and off of their networks because they can't all collude and make money, either by selling the devices (carriers) or siphoning your data on their operating systems or defaulting the browser, etc.

There are probably numerous videos that show the unboxing of a Pinephone, so I will skip that and just make some general comments on my first experience.

When I unboxed the phone, there was no charger included. I bought this phone used on eBay, and while it came in the box, I wasn't sure if they come standard with a charger or not. The phone charges over USB-C, though, and I had plenty of those chargers. The phone had some weight to it. The screen seemed quality, but the back cover looked like a cheap piece of plastic and I could feel something pushing against the back cover (battery? dip or kill switches?). As I don't yet have a SIM for it, I have not yet opened the back.

The phone did not boot up at first. I wasn't sure of the button sequences, so I downloaded the Pinephone User Guide to get going. I decided that the phone probably needed to be charged, and plugged it into my USB-C charger, and immediately I got a Linux boot sequence on the screen. Linux boot sequences are intimidating to just about anyone, and most certainly to a user who is not Linux-savvy.

When the boot sequence finished, the phone shut itself down again - presumably because it didn't have enough juice to boot and stay running. I left the phone on the charger, and returned to it 3-4 hours later.

When I came in and picked the phone up and powered it on, I got the boot sequence again and it booted up to the operating system. The OS was reasonably intuitive. I don't have a SIM in the phone yet, so I configured it for WiFi as a first step. Then I tried to set the clock, and I added my city but it is using UTC as the default. Next I went looking to see what apps were installed. It took me a few minutes to realize that the "Discover" app is the app for finding, updating and installing applications.  The first time I tried to run Discover, it crashed. When I re-launched it, it showed me some apps and I tried to update a couple of them, and got a repository error. I finally was able to update Firefox, though. Then I launched Firefox. 

Right away with Firefox, I had issues with screen real-estate and positioning. The browser didn't fit on the screen, and I didn't see a way to shrink it down to fit the screen properly. After closing the 2nd tab I had opened, I was able to use my finger to "grab" the browser, and pull it around, but clearly the browser window fit and lack of a gyroscope to re-orient the browser when the phone is turned sideways are going to make this browser a bit of a hassle - unless I can solve this.

I want to test out the sound quality. That's next.

