Wednesday, September 18, 2024

When a Percona Cluster Node Stops Working

Had a horrible problem where a Percona node (2 of 3) went down and wouldn't start.

I finally ran a command: 

> mysqld_safe --wsrep-recover --tc-heuristic-recover=ROLLBACK

This didn't work, so I had to run a journalctl -xe command to find out that the startup for Percona is actually in a temporary startup file: /var/lib/mysql/wsrep_recovery.xxxxx

From this, I could see pending transactions. Well, transactions either need to be committed, or rolled back.

The rollback didn't work, so, I tried the commit, which DID work.

> mysqld_safe --wsrep-recover --tc-heuristic-recover=COMMIT

Now, you can also edit your /etc/my.cnf file and put this option in that file in this format:

[mysqld]

tc-heuristic-recover = COMMIT

So after running the commit, which seemed to run fine, I went ahead and attempted to start the mysql service again: 

> systemctl start mysql

Fortunately, it came up!

Now - a quick way to check and make sure your percona node is working properly, is to log into mysql, and run the following query:

mysql> show status like 'wsrep%';

Below are the following variables that I tend to look for:
| wsrep_cluster_conf_id            | 56                                                   
| wsrep_cluster_size                  | 3                                                    
| wsrep_cluster_state_uuid        | f523290f-9336-11eb-be5b-d6f9514c9c3c                 
| wsrep_cluster_status               | Primary                                              
| wsrep_connected                     | ON                                                   
| wsrep_local_bf_aborts            | 0                                                    
| wsrep_local_index                  | 2                                                    
| wsrep_ready                            | ON                                                   

The cluster conf id should be the same on all of your cluster nodes!

No comments:

A Long Day with Morpheus CMP

I had performed an upgrade on Morpheus which I thought was fairly successful. I had some issues doing this upgrade on CentOS 7 because it wa...