Become a MySQL DBA blog series - Troubleshooting Galera cluster issues - part 1

In this blog post, we are going to show you some examples of things that can go wrong in Galera - inexplicable node crashes, network splits, clusters that won’t restart and inconsistent data. We’ll take a look at the data stored in log files to diagnose the problems, and discuss how we can deal with these.

This builds upon the previous post, where we looked into the log files produced by Galera (the error log and the innobackup.* logs). We discussed what regular, “normal” activity looks like - from the initialization of Galera replication to the Incremental State Transfer (IST) and State Snapshot Transfer (SST) processes, and the respective log entries.

This is the seventeenth installment in the ‘Become a MySQL DBA’ blog series. Our previous posts in the DBA series include:

 

GEN_CLUST_INDEX issue

This is an example of an issue related to the lack of a Primary Key (PK) in a table. Theoretically speaking, Galera supports InnoDB tables without a PK. The lack of a PK induces some limitations - for example, DELETEs are not supported on such tables, and row order can differ between nodes. We’ve found that the lack of a PK can also be the culprit behind serious crashes in Galera. It shouldn’t happen, but that’s what we’ve seen in practice. Below is a sample of an error log covering such a crash.

2015-04-29 11:49:38 49655 [Note] WSREP: Synchronized with group, ready for connections
2015-04-29 11:49:38 49655 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-04-29 11:49:39 7ff1285b5700 InnoDB: Error: page 409 log sequence number 207988009678
InnoDB: is in the future! Current system log sequence number 207818416901.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.
BF-BF X lock conflict
RECORD LOCKS space id 1288 page no 33 n bits 104 index `GEN_CLUST_INDEX` of table `db`.`tab` trx id 4670189 lock_mode X locks rec but not gap

Here’s the hint we are looking for: ‘GEN_CLUST_INDEX’. InnoDB requires a PK to be available - it uses the PK as the clustered index, which is the main component of a table’s structure. No PK means no way to organize data within a table. If there’s no explicit PK defined on a table, InnoDB looks for a NOT NULL UNIQUE key to use as the clustered index. If neither a PK nor a suitable UNIQUE key is available, an implicit PK is created - it’s referred to by InnoDB as ‘GEN_CLUST_INDEX’. If you see an entry like the one above, you can be sure that this particular table (db.tab) doesn’t have a PK defined.
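
If you want to check up front which tables on a node lack an explicit PK, the query below is a reasonable starting point. This is a minimal sketch run through the mysql client - connection options are placeholders, and note that a table returned here may still have a NOT NULL UNIQUE key that InnoDB can use instead of creating GEN_CLUST_INDEX.

# List base InnoDB tables that have no PRIMARY KEY index defined.
mysql -u root -p -e "
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.statistics s
       ON s.table_schema = t.table_schema
      AND s.table_name   = t.table_name
      AND s.index_name   = 'PRIMARY'
WHERE t.table_type = 'BASE TABLE'
  AND t.engine = 'InnoDB'
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
  AND s.index_name IS NULL;"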

In this particular case, we can see that some kind of lock conflict happened.

Record lock, heap no 2 PHYSICAL RECORD: n_fields 18; compact format; info bits 0
 0: len 6; hex 00000c8517c5; asc       ;;
 1: len 6; hex 0000004742ec; asc    GB ;;
 2: len 7; hex 5f0000026d10a6; asc _   m  ;;
 3: len 4; hex 32323237; asc 2227;;

...

 15: len 30; hex 504c4154454c4554204147475245474154494f4e20494e48494249544f52; asc xxx    ; (total 50 bytes);
 16: len 30; hex 414e5449504c4154454c4554204452554753202020202020202020202020; asc xxx    ; (total 100 bytes);
 17: len 1; hex 32; asc 2;;

In the section above you can find information about the record involved in the locking conflict. We obfuscated the data, but normally this info may point you to a particular row in the table.

15:49:39 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=25165824
read_buffer_size=131072
max_used_connections=13
max_threads=200
thread_count=12
connection_count=12
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 104208 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fed4c000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff10037bd98 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8ea2e5]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x6788f4]
/lib64/libpthread.so.0[0x391380f710]
/lib64/libc.so.6(gsignal+0x35)[0x3913432625]
/lib64/libc.so.6(abort+0x175)[0x3913433e05]
/usr/sbin/mysqld[0x9c262e]
/usr/sbin/mysqld[0x9c691b]
/usr/sbin/mysqld[0x9c6c44]
/usr/sbin/mysqld[0xa30ddf]
/usr/sbin/mysqld[0xa36637]
/usr/sbin/mysqld[0x997267]
/usr/sbin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x9c)[0x5ba3bc]
/usr/sbin/mysqld(_ZN14Rows_log_event24do_table_scan_and_updateEPK14Relay_log_info+0x188)[0x883378]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0xd77)[0x88e127]
/usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x68)[0x88f458]
/usr/sbin/mysqld(_Z14wsrep_apply_cbPvPKvmjPK14wsrep_trx_meta+0x58e)[0x5b6fee]
/usr/lib64/galera/libgalera_smm.so(_ZNK6galera9TrxHandle5applyEPvPF15wsrep_cb_statusS1_PKvmjPK14wsrep_trx_metaERS6_+0xb1)[0x7ff1288ad2c1]
/usr/lib64/galera/libgalera_smm.so(+0x1aaf95)[0x7ff1288e4f95]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x283)[0x7ff1288e5e03]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM11process_trxEPvPNS_9TrxHandleE+0x45)[0x7ff1288e66f5]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x2c9)[0x7ff1288c3349]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x63)[0x7ff1288c3823]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x93)[0x7ff1288e23f3]
/usr/lib64/galera/libgalera_smm.so(galera_recv+0x23)[0x7ff1288f7743]
/usr/sbin/mysqld[0x5b7caf]
/usr/sbin/mysqld(start_wsrep_THD+0x3e6)[0x5a8d96]
/lib64/libpthread.so.0[0x39138079d1]
/lib64/libc.so.6(clone+0x6d)[0x39134e88fd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 3
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
150429 11:49:39 mysqld_safe Number of processes running now: 0
150429 11:49:39 mysqld_safe WSREP: not restarting wsrep node automatically
150429 11:49:39 mysqld_safe mysqld from pid file /var/lib/mysql/hostname.pid ended

Finally, we have a pretty standard MySQL crash report along with stack trace. If you look closely, you’ll see a pretty important bit of information - the stack trace contains entries about ‘libgalera_smm.so’. This points us to the Galera library being involved in this crash. As you can see, we are talking here about a node which is applying an event executed first on some other member of the cluster (_ZN9Log_event11apply_eventEP14Relay_log_info). This event involves SQL performing a table scan: 

_ZN14Rows_log_event24do_table_scan_and_updateEPK14Relay_log_info
_ZN7handler11ha_rnd_nextEPh

Apparently, while scanning our ‘db.tab’ table, something went wrong and MySQL crashed. From our experience, in such cases it’s enough to create a PK on the table - we don’t recall issues persisting after that schema change. In general, if you don’t have a good candidate for a PK on the table, the best way to solve the problem is to add a new auto-incremented, unsigned, NOT NULL integer column (small or big, depending on the maximum expected table size). Such a column works well as a PK and should prevent this type of error.
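
A minimal sketch of that fix for the db.tab table from the log above - the column name is our own choice, and on a large table this ALTER is an expensive, cluster-wide operation, so schedule it accordingly. Also see the caveat about row ordering in the next paragraph before running it.

mysql -u root -p -e "
ALTER TABLE db.tab
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;"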

Please keep in mind that for tables without a PK, if you execute simultaneous writes on multiple Galera nodes, rows may end up ordered differently - the implicit PK is based on the internal row ID, which increases monotonically as new rows are inserted. So everything depends on the write concurrency - the higher it is, the more likely rows were inserted in a different order on different nodes. If that happened, adding a Primary Key may leave you with a table that is not consistent across the cluster. There’s no easy workaround here. The easiest (but also most intrusive) way is to pick one node in the cluster as the source of truth and force SST on the remaining nodes - this rebuilds the other nodes from that single source. Another option is to use pt-table-sync, a script that is part of the Percona Toolkit; it allows you to sync a chosen table across the cluster. Of course, before you perform any kind of action it’s always good to use another Percona Toolkit script, pt-table-checksum, to check whether the given table is in sync across the cluster or not. If it is in sync (because you write only to a single node, or write concurrency is low and the issue hasn’t been triggered), there’s no need to do anything.
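
A minimal sketch of that check-then-sync flow, with placeholder hostnames and credentials; both tools have Galera-specific limitations documented in the Percona Toolkit manual, so review those before running this against a production cluster.

# Verify whether db.tab is identical across the cluster:
pt-table-checksum h=192.168.1.101,u=checksum_user,p=checksum_pass \
    --databases db --tables tab
# If differences are reported, print (and review) the statements that would
# bring the second node back in line with the first:
pt-table-sync --print \
    h=192.168.1.101,u=checksum_user,p=checksum_pass \
    h=192.168.1.102,u=checksum_user,p=checksum_pass \
    --databases db --tables tab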

Network split

Galera Cluster requires quorum to operate - 50% + 1 of the nodes have to be up in order to form a ‘Primary Component’. This mechanism is designed to ensure that a split brain won’t happen and that your application does not end up talking to two separate, disconnected parts of the same cluster. Obviously, it’d be a very bad situation to be in - therefore we are happy with the protection Galera gives us. We are going to cover a real-world example of a network split and why we have to be cautious when dealing with such a scenario.

150529  9:30:05 [Note] WSREP: evs::proto(a1024858, OPERATIONAL, view_id(REG,428eb82c,1111)) suspecting node: 428eb82c
150529  9:30:05 [Note] WSREP: evs::proto(a1024858, OPERATIONAL, view_id(REG,428eb82c,1111)) suspected node without join message, declaring inactive
150529  9:30:05 [Note] WSREP: evs::proto(a1024858, OPERATIONAL, view_id(REG,428eb82c,1111)) suspecting node: d272b968
150529  9:30:05 [Note] WSREP: evs::proto(a1024858, OPERATIONAL, view_id(REG,428eb82c,1111)) suspected node without join message, declaring inactive

Two nodes (IDs 428eb82c and d272b968) were declared inactive and left the cluster. Those nodes were in fact still running - the problem was related only to network connectivity.

150529  9:30:05 [Note] WSREP: view(view_id(NON_PRIM,428eb82c,1111) memb {
        60bbf616,1
        a1024858,1
} joined {
} left {
} partitioned {
        428eb82c,0
        d272b968,0
})

This cluster consists of four nodes. Therefore, as expected, it switched to the non-Primary state, as fewer than 50% + 1 of the nodes are available.

150529  9:30:05 [Note] WSREP: declaring 60bbf616 at tcp://10.87.84.101:4567 stable
150529  9:30:05 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
150529  9:30:05 [Note] WSREP: Flow-control interval: [23, 23]
150529  9:30:05 [Note] WSREP: Received NON-PRIMARY.
150529  9:30:05 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 5351819)
150529  9:30:05 [Note] WSREP: New cluster view: global state: 8f630ade-4366-11e4-94c6-d6eb7adc0ddd:5351819, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version 3
150529  9:30:05 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
150529  9:30:05 [Note] WSREP: view(view_id(NON_PRIM,60bbf616,1112) memb {
        60bbf616,1
        a1024858,1
} joined {
} left {
} partitioned {
        428eb82c,0
        d272b968,0
})
150529  9:30:05 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
150529  9:30:05 [Note] WSREP: Flow-control interval: [23, 23]
150529  9:30:05 [Note] WSREP: Received NON-PRIMARY.
150529  9:30:05 [Note] WSREP: New cluster view: global state: 8f630ade-4366-11e4-94c6-d6eb7adc0ddd:5351819, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version 3

...

150529  9:30:20 [Note] WSREP: (a1024858, 'tcp://0.0.0.0:4567') reconnecting to 428eb82c (tcp://172.16.14.227:4567), attempt 30
150529  9:30:20 [Note] WSREP: (a1024858, 'tcp://0.0.0.0:4567') address 'tcp://10.87.84.102:4567' pointing to uuid a1024858 is blacklisted, skipping
150529  9:30:21 [Note] WSREP: (a1024858, 'tcp://0.0.0.0:4567') reconnecting to d272b968 (tcp://172.16.14.226:4567), attempt 30

Above you can see a couple of failed attempts to bring the cluster back in sync. When a cluster is in the ‘non-Primary’ state, it won’t rejoin unless all of the nodes become available again or one of the nodes is bootstrapped.
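
If the partitioned nodes cannot come back and you are certain that the surviving group holds the most recent data, you can promote it to a Primary Component from a running node. Below is a minimal sketch - be careful to do this on one side of the split only, otherwise you will end up with exactly the split brain the quorum mechanism is there to prevent.

# Check the component status and size on a surviving node:
mysql -u root -p -e "SHOW STATUS WHERE Variable_name IN
    ('wsrep_cluster_status', 'wsrep_cluster_size', 'wsrep_last_committed');"
# If it reports non-Primary and the missing nodes will not return on their own,
# force this group to become the new Primary Component:
mysql -u root -p -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=true';"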

Bootstrapping from incorrect node

Sometimes bootstrapping the cluster using one of the nodes is the only quick way to bring the cluster up - usually you don’t have time to wait until all nodes become available again. Actually, you could wait forever if one of the nodes is down due to hardware failure or data corruption - in such cases bootstrapping the cluster is the only way to bring it back. What’s very important to keep in mind is that a cluster has to be bootstrapped from the most advanced node. Sometimes one or more of the nodes may be behind in applying writesets. In such cases you need to confirm which node is the most advanced one by running, for example, mysqld_safe --wsrep_recover on the stopped nodes (we show a sketch of this check at the end of this section). Failing to do so may result in trouble like that shown in the error log below:

151104 16:52:19 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
151104 16:52:19 mysqld_safe Skipping wsrep-recover for 98ed75de-7c05-11e5-9743-de4abc22bd11:235368 pair
151104 16:52:19 mysqld_safe Assigning 98ed75de-7c05-11e5-9743-de4abc22bd11:235368 to wsrep_start_position

...

2015-11-04 16:52:19 18033 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '172.30.4.156:4567,172.30.4.191:4567,172.30.4.220:4567'

What we have here is a standard startup process for Galera. Please note that the cluster consists of three nodes.

2015-11-04 16:52:19 18033 [Warning] WSREP: (6964fefd, 'tcp://0.0.0.0:4567') address 'tcp://172.30.4.220:4567' points to own listening address, blacklisting
2015-11-04 16:52:19 18033 [Note] WSREP: (6964fefd, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2015-11-04 16:52:20 18033 [Note] WSREP: declaring 5342d84a at tcp://172.30.4.191:4567 stable
2015-11-04 16:52:20 18033 [Note] WSREP: Node 5342d84a state prim
2015-11-04 16:52:20 18033 [Note] WSREP: view(view_id(PRIM,5342d84a,2) memb {
    5342d84a,0
    6964fefd,0
} joined {
} left {
} partitioned {
})
2015-11-04 16:52:20 18033 [Note] WSREP: save pc into disk
2015-11-04 16:52:20 18033 [Note] WSREP: discarding pending addr without UUID: tcp://172.30.4.156:4567
2015-11-04 16:52:20 18033 [Note] WSREP: gcomm: connected
2015-11-04 16:52:20 18033 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2015-11-04 16:52:20 18033 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2015-11-04 16:52:20 18033 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2015-11-04 16:52:20 18033 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2015-11-04 16:52:20 18033 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2015-11-04 16:52:20 18033 [Note] WSREP: Waiting for SST to complete.
2015-11-04 16:52:20 18033 [Note] WSREP: STATE EXCHANGE: sent state msg: 69b62fe8-8314-11e5-a6d4-fae0f5e6c643
2015-11-04 16:52:20 18033 [Note] WSREP: STATE EXCHANGE: got state msg: 69b62fe8-8314-11e5-a6d4-fae0f5e6c643 from 0 (172.30.4.191)
2015-11-04 16:52:20 18033 [Note] WSREP: STATE EXCHANGE: got state msg: 69b62fe8-8314-11e5-a6d4-fae0f5e6c643 from 1 (172.30.4.220)

So far so good, nodes are communicating with each other.

2015-11-04 16:52:20 18033 [ERROR] WSREP: gcs/src/gcs_group.cpp:group_post_state_exchange():319: Reversing history: 0 -> 0, this member has applied 140700504171408 more events than the primary component.Data loss is possible. Aborting.
2015-11-04 16:52:20 18033 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted (core dumped)
151104 16:52:20 mysqld_safe mysqld from pid file /var/lib/mysql/mysql.pid ended

Unfortunately, one of the nodes was more advanced than the node we bootstrapped the cluster from. The absurdly large number of events reported is obviously not valid, but the point stands: this node cannot join the cluster. The only way to get it back in is by running SST, but in that case you are going to lose the data which wasn’t applied on the node you bootstrapped the cluster from. The best practice is to try and recover this data from the ‘more advanced’ node - hopefully there are binlogs available. If not, the process is still possible, but comparing the data on a live system is not an easy task.
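
As promised, here is a sketch of how to check which node is the most advanced before bootstrapping. With mysqld stopped on every node, run the recovery and compare the recovered seqno values - the log file path is a placeholder and depends on your configuration.

# Recover the last committed position from InnoDB (mysqld must be stopped):
mysqld_safe --wsrep_recover
grep 'WSREP: Recovered position' /var/log/mysql/error.log | tail -1
# Example of the line you are looking for:
#   WSREP: Recovered position: 98ed75de-7c05-11e5-9743-de4abc22bd11:235368
# grastate.dat also stores uuid:seqno, but the seqno is reliable only after a clean shutdown:
cat /var/lib/mysql/grastate.dat
# Bootstrap the cluster from the node with the highest seqno.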

Consistency issues

Within a cluster, all Galera nodes should be consistent - no matter where you write, the change should be replicated to all nodes. As its base method of moving data around, Galera uses row based replication - this is the best way of replicating data while maintaining consistency of the data set. All nodes are consistent - this is true most of the time. Sometimes, though, things do not work as expected. Even with Galera, nodes may get out of sync. There are numerous reasons for that - from node crashes which result in data loss to human errors while tinkering with replication settings or the ‘wsrep_on’ variable. What is important is that Galera has an internal mechanism for maintaining consistency: if it detects an error while applying a writeset, the node will shut down to ensure that the inconsistency won’t propagate, for example through SST.
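
A hypothetical sketch of how such an inconsistency can be introduced by hand - the table and row below are made up for illustration. With wsrep_on disabled for the session, the write happens only on the local node and never reaches the rest of the cluster.

mysql -u root -p -e "
SET SESSION wsrep_on = OFF;
DELETE FROM t1.tab WHERE id = 100;   -- executed locally, not replicated
SET SESSION wsrep_on = ON;"
# When another node later replicates an UPDATE touching that row, the applier on this
# node cannot find it and fails with HA_ERR_KEY_NOT_FOUND, as shown in the log below.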

Looking at the error log, this is how it will look.

2015-11-12 15:35:59 5071 [ERROR] Slave SQL: Could not execute Update_rows event on table t1.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 162, Error_code: 1032

If you are familiar with row based replication, you may have seen errors similar to the one above. This is information from MySQL that one of the events cannot be executed. In this particular case, some update failed to apply.

2015-11-12 15:35:59 5071 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 563464
2015-11-12 15:35:59 5071 [Warning] WSREP: Failed to apply app buffer: seqno: 563464, status: 1
     at galera/src/trx_handle.cpp:apply():351

Here we can see the Galera part of the information - writeset with sequence number of 563464 couldn’t be applied.

Retrying 2th time
2015-11-12 15:35:59 5071 [ERROR] Slave SQL: Could not execute Update_rows event on table t1.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 162, Error_code: 1032
2015-11-12 15:35:59 5071 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 563464
2015-11-12 15:35:59 5071 [Warning] WSREP: Failed to apply app buffer: seqno: 563464, status: 1
     at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
2015-11-12 15:35:59 5071 [ERROR] Slave SQL: Could not execute Update_rows event on table t1.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 162, Error_code: 1032
2015-11-12 15:35:59 5071 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 563464
2015-11-12 15:35:59 5071 [Warning] WSREP: Failed to apply app buffer: seqno: 563464, status: 1
     at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
2015-11-12 15:35:59 5071 [ERROR] Slave SQL: Could not execute Update_rows event on table t1.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 162, Error_code: 1032
2015-11-12 15:35:59 5071 [Warning] WSREP: RBR event 3 Update_rows apply warning: 120, 563464

To rule out transient errors (deadlocks, for example), the writeset was re-executed four times. This time it was a permanent error, so all attempts failed.

2015-11-12 15:35:59 5071 [ERROR] WSREP: Failed to apply trx: source: 7c775f39-893d-11e5-93b5-0b0d185bde79 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 5 trx_id: 2779411 seqnos (l: 12, g: 563464, s: 563463, d: 563463, ts: 11103845835944)
2015-11-12 15:35:59 5071 [ERROR] WSREP: Failed to apply trx 563464 4 times
2015-11-12 15:35:59 5071 [ERROR] WSREP: Node consistency compromized, aborting…

After the writeset failed to apply, the node was deemed inconsistent with the rest of the cluster and Galera took the action intended to minimize the impact of this inconsistency - the node shuts down, and it won’t be able to rejoin the cluster as long as this writeset remains unapplied.

2015-11-12 15:35:59 5071 [Note] WSREP: Closing send monitor...
2015-11-12 15:35:59 5071 [Note] WSREP: Closed send monitor.
2015-11-12 15:35:59 5071 [Note] WSREP: gcomm: terminating thread
2015-11-12 15:35:59 5071 [Note] WSREP: gcomm: joining thread
2015-11-12 15:35:59 5071 [Note] WSREP: gcomm: closing backend
2015-11-12 15:35:59 5071 [Note] WSREP: view(view_id(NON_PRIM,7c775f39,3) memb {
    8165505b,0
} joined {
} left {
} partitioned {
    7c775f39,0
    821adb36,0
})
2015-11-12 15:35:59 5071 [Note] WSREP: view((empty))
2015-11-12 15:35:59 5071 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2015-11-12 15:35:59 5071 [Note] WSREP: gcomm: closed
2015-11-12 15:35:59 5071 [Note] WSREP: Flow-control interval: [16, 16]
2015-11-12 15:35:59 5071 [Note] WSREP: Received NON-PRIMARY.
2015-11-12 15:35:59 5071 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 563464)
2015-11-12 15:35:59 5071 [Note] WSREP: Received self-leave message.
2015-11-12 15:35:59 5071 [Note] WSREP: Flow-control interval: [0, 0]
2015-11-12 15:35:59 5071 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2015-11-12 15:35:59 5071 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 563464)
2015-11-12 15:35:59 5071 [Note] WSREP: RECV thread exiting 0: Success
2015-11-12 15:35:59 5071 [Note] WSREP: recv_thread() joined.
2015-11-12 15:35:59 5071 [Note] WSREP: Closing replication queue.
2015-11-12 15:35:59 5071 [Note] WSREP: Closing slave action queue.
2015-11-12 15:35:59 5071 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted (core dumped)
151112 15:35:59 mysqld_safe Number of processes running now: 0
151112 15:35:59 mysqld_safe WSREP: not restarting wsrep node automatically
151112 15:35:59 mysqld_safe mysqld from pid file /var/lib/mysql/mysql.pid ended

What follows is the regular MySQL shutdown process.
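
To bring such a node back, the usual approach is to rebuild it from a healthy donor via a full SST. A minimal sketch is below - on some versions the start scripts may still recover a position from InnoDB, in which case wiping the whole data directory is the brute-force alternative. Either way, the local data on this node will be rebuilt from the donor.

systemctl stop mysql            # or 'service mysql stop', depending on the distribution
rm /var/lib/mysql/grastate.dat  # forget the local state so the node requests a state transfer
systemctl start mysql           # the node rejoins and asks the donor for SST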

This concludes the first part of our troubleshooting blog. In the second part of the blog, we will cover different types of SST issues, as well as problems with network streaming.

Become a ClusterControl DBA: Making your DB components HA via Load Balancers

Choosing your HA topology

There are various ways to retain high availability with databases. You can use Virtual IPs (VRRP) to manage host availability, you can use resource managers like Zookeeper and Etcd to (re)configure your applications or use load balancers/proxies to distribute the workload over all available hosts.

Virtual IPs need either an application to manage them (MHA, Orchestrator), some scripting (Keepalived, Pacemaker/Corosync) or an engineer to fail over manually, and the decision making in the process can become complex. The failover itself is straightforward: remove the IP address from one host, assign it to another and use arping to send a gratuitous ARP reply. In theory a Virtual IP can be moved in a second, but it will take a few seconds before the failover management application is sure the host has failed and acts accordingly. In reality this is somewhere between 10 and 30 seconds. Another limitation of Virtual IPs is that some cloud providers do not allow you to manage your own Virtual IPs or assign them at all; Google, for example, does not allow this on their compute nodes.
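
For reference, here is a minimal sketch of what such a manual failover looks like under the hood - the address, netmask and interface are placeholders:

# On the old (failed) host, if it is still reachable, release the VIP:
ip addr del 10.0.0.100/24 dev eth0
# On the new host, claim the VIP:
ip addr add 10.0.0.100/24 dev eth0
# Send gratuitous ARP so switches and neighbours update their ARP caches:
arping -U -I eth0 -c 3 10.0.0.100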

Resource managers like Zookeeper and Etcd can monitor your databases and (re)configure your applications once a host fails or a slave gets promoted to master. In general this is a good idea but implementing your checks with Zookeeper and Etcd is a complex task.

A load balancer or proxy will sit in between the application and the database host and work transparently, as if the client were connecting to the database host directly. Just like the Virtual IP and resource managers, the load balancers and proxies also need to monitor the hosts and redirect the traffic if one host is down. ClusterControl supports two proxies: HAProxy and MaxScale, and both are supported for MySQL master-slave replication and Galera cluster. HAProxy and MaxScale both have their own use cases; we will describe them in this post as well.

Why do you need a load balancer?

In theory you don’t need a load balancer but in practice you will prefer one. We’ll explain why. 

If you have virtual IPs set up, all you have to do is point your application to the correct (virtual) IP address and everything should be fine connection-wise. But suppose you have scaled out the number of read replicas; you might want to provide virtual IPs for each of those read replicas as well, for maintenance or availability reasons. This can become a very large pool of virtual IPs to manage. If one of those read replicas fails, you need to re-assign the virtual IP to another host, or else your application will connect to either a host that is down or, in the worst case, a lagging server with stale data. Keeping the replication state in the application managing the virtual IPs is therefore necessary.

For Galera there is a similar challenge: you can in theory add as many hosts as you’d like to your application config and pick one at random. The same problem arises when this host is down: you might end up connecting to an unavailable host. Using all hosts for both reads and writes might also cause rollbacks due to the optimistic locking in Galera: if two connections try to write to the same row at the same time, one of them will receive a rollback. If your workload has such concurrent updates, it is advised to write to only one Galera node. Therefore you want a manager that keeps track of the internal state of your database cluster.

Both HAProxy and MaxScale will offer you the functionality to monitor the database hosts and keep state of your cluster and its topology. For replication setups, in case a slave replica is down, both HAProxy and MaxScale can redistribute the connections to another host. But if a replication master is down, HAProxy will deny the connection and MaxScale will give back a proper error to the client. For Galera setups, both load balancers can elect a master node from the Galera cluster and only send the write operations to that specific node.

On the surface HAProxy and MaxScale may seem to be similar solutions, but they differ a lot in features and in the way they distribute connections and queries. Both HAProxy and MaxScale can distribute connections using round-robin. You can also utilize round-robin to split reads from writes by designating one port for sending reads to the slaves and another port for sending writes to the master; your application then has to decide whether to use the read or the write port. Since MaxScale is an intelligent proxy, it is database aware and is also able to analyze your queries. MaxScale can do read/write splitting on a single port by detecting whether you are performing a read or a write operation and connecting to the designated slaves or master in your cluster. MaxScale includes additional functionality like binlog routing, audit logging and query rewriting, but we will have to cover these in a separate article.
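
To make the port-based split more concrete, here is a hand-written HAProxy sketch of the idea - it is not the configuration ClusterControl generates, the addresses and ports are placeholders, and a real setup would use a proper MySQL health check rather than a plain TCP check:

cat >> /etc/haproxy/haproxy.cfg <<'EOF'
listen mysql_writes
    bind *:3307
    mode tcp
    server db1 10.0.0.1:3306 check
    server db2 10.0.0.2:3306 check backup
    server db3 10.0.0.3:3306 check backup

listen mysql_reads
    bind *:3308
    mode tcp
    balance roundrobin
    server db1 10.0.0.1:3306 check
    server db2 10.0.0.2:3306 check
    server db3 10.0.0.3:3306 check
EOF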

That should be enough background information on this topic, so let’s see how you can deploy both load balancers for MySQL replication and Galera topologies.

Deploying HAProxy

Using ClusterControl to deploy HAProxy on a Galera cluster is easy: go to the relevant cluster and select “Add Load Balancer”:

severalnines-blogpost-add-galera-haproxy.png

And you will be able to deploy an HAProxy instance by adding the host address and selecting the server instances you wish to include in the configuration:

severalnines-blogpost-add-galera-haproxy-2.png

By default the HAProxy instance will be configured to send connections to the server instances receiving the least number of connections, but you can change that policy to either round robin or source. 

Under advanced settings you can set timeouts, the maximum number of connections, and even secure the proxy by whitelisting an IP range for the connections.

Under the nodes tab of that cluster, the HAProxy node will appear:

severalnines-blogpost-add-galera-haproxy-3.png

Now your Galera cluster is also available via the newly deployed HAProxy node on port 3307. Don’t forget to GRANT your application access from the HAProxy IP, as the traffic will now be coming from the proxy instead of the application hosts. Also, remember to point your application connections to the HAProxy node.
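
A minimal sketch of such a grant, with placeholder user, schema, password and HAProxy address - repeat it for every HAProxy host you add later:

mysql -u root -p -e "
GRANT ALL PRIVILEGES ON myapp.* TO 'appuser'@'10.0.0.10' IDENTIFIED BY 'app_password';
FLUSH PRIVILEGES;"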

Now suppose one server instance goes down: HAProxy will notice this within a few seconds and stop sending traffic to that instance:

severalnines-blogpost-add-galera-haproxy-node-down.png

The two other nodes are still fine and will keep receiving traffic. This keeps the cluster highly available without the client even noticing the difference.

Deploying a secondary HAProxy node

Now that we have moved the responsibility of retaining high availability over the database connections from the client to HAProxy, what if the proxy node dies? The answer is to create another HAProxy instance and use a virtual IP controlled by Keepalived as shown in this diagram:

The benefit compared to using virtual IPs on the database nodes is that the logic for MySQL is at the proxy level and the failover for the proxies is simple.

So let’s deploy a secondary HAProxy node:

severalnines-blogpost-add-galera-second-haproxy-1.png

After we have deployed a secondary HAProxy node, we need to add Keepalived:

severalnines-blogpost-add-keepalived.png

And after Keepalived has been added, your nodes overview will look like this:

severalnines-blogpost-keepalived.png

So now, instead of pointing your application connections to the HAProxy node directly, you have to point them to the virtual IP.

In the example here, we used separate hosts to run HAProxy on, but you could easily add them to existing server instances as well. HAProxy does not bring much overhead, although you should keep in mind that in case of a server failure, you will lose both the database node and the proxy.

Deploying MaxScale

Deploying MaxScale to your cluster is done in a similar way to HAProxy: ‘Add Load Balancer’ in the cluster list.

severalnines-blogpost-add-maxscale.png

ClusterControl will deploy MaxScale with both the round-robin router and the read/write splitter. The CLI port will be used to enable you to administrate MaxScale from ClusterControl.

After MaxScale has been deployed, it will be available under the Nodes tab:

severalnines-blogpost-maxscale-admin2.png

Opening the MaxScale node overview will present you with the interface that grants you access to the CLI, so there is no reason to log into MaxScale on the node anymore.

For MaxScale, the grants are slightly different: as you are proxying, you need to allow connections from the proxy - just like with HAProxy. But since MaxScale is also performing local authentication and authorization, you need to grant access to your application hosts as well.

Deploying Garbd

Galera implements a quorum-based algorithm to select a primary component through which it enforces consistency. The primary component needs to have a majority of votes (50% + 1 node), so in a two-node system there would be no majority if one node fails or the nodes lose contact with each other, which leaves the cluster unable to operate safely and opens the door to split brain if you force either half to continue. Fortunately, it is possible to add garbd (Galera Arbitrator Daemon), a lightweight stateless daemon that can act as the odd node. The added benefit of the Galera Arbitrator is that you can now get by with only two data nodes in your cluster.

If ClusterControl detects that your Galera cluster consists of an even number of nodes, you will be given the warning/advice by ClusterControl to extend the cluster to an odd number of nodes:

severalnines-blogpost-even-nodes-galera.png

Choose the host to deploy garbd on wisely, as it will receive all replicated data. Make sure the network can handle the traffic and is secure enough. You could choose one of the HAProxy or MaxScale hosts to deploy garbd on, like in the example below:

severalnines-blogpost-add-garbd.png

Alternatively you could install garbd on the ClusterControl host.
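
If you ever need to start garbd by hand, outside of ClusterControl, a minimal sketch looks like this - the cluster name must match your wsrep_cluster_name, and the addresses (placeholders here) point to the Galera ports of the two data nodes:

garbd --group my_wsrep_cluster \
      --address "gcomm://10.0.0.1:4567,10.0.0.2:4567" \
      --daemon --log /var/log/garbd.log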

After installing garbd, you will see it appear next to your two Galera nodes:

severalnines-blogpost-garbd-cluster-list.png

Final thoughts

We showed you how to make your MySQL master-slave and Galera cluster setups more robust and retain high availability using HAProxy and MaxScale. Garbd is also a handy daemon that can save you from running an extra, third data node in your Galera cluster.

This finalizes the deployment side of ClusterControl. In our next blog, we will show you how to integrate ClusterControl within your organization by using groups and assigning certain roles to users.

Become a MySQL DBA blog series - Troubleshooting Galera cluster issues - part 2

This is part 2 of our blog on how to troubleshoot a Galera cluster - SST errors and problems with network streaming. In part 1 of the blog, we covered issues ranging from node crashes to clusters that won’t restart, network splits and inconsistent data. Note that the issues described are all examples inspired by real-life incidents in production environments.

This is the eighteenth installment in the ‘Become a MySQL DBA’ blog series. Our previous posts in the DBA series include:

 

SST errors

State Snapshot Transfer (SST) is a mechanism intended to provision or rebuild Galera nodes. There are a couple of ways you can execute SST, including mysqldump and rsync. The most popular one, though, is to perform SST using xtrabackup - it allows the donor to stay online and makes the SST process less intrusive. The xtrabackup SST process involves streaming data over the network from the donor to the joining node. Additionally, xtrabackup needs access to MySQL to grab the InnoDB log data - it checks the ‘wsrep_sst_auth’ variable for access credentials. In this chapter we are going to cover examples of problems related to SST.

Incorrect password for SST

Sometimes it may happen that the credentials for MySQL access, set in wsrep_sst_auth, are not correct. In such a case SST won’t complete properly because xtrabackup, on the donor, will not be able to connect to the MySQL database to perform a backup. You may then see the following symptoms in the error log.

Let’s start with a joining node.

2015-11-13 10:40:57 30484 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 16,
        members    = 2/3 (joined/total),
        act_id     = 563464,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 98ed75de-7c05-11e5-9743-de4abc22bd11
2015-11-13 10:40:57 30484 [Note] WSREP: Flow-control interval: [28, 28]
2015-11-13 10:40:57 30484 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 563464)
2015-11-13 10:40:57 30484 [Note] WSREP: State transfer required:
        Group state: 98ed75de-7c05-11e5-9743-de4abc22bd11:563464
        Local state: 00000000-0000-0000-0000-000000000000:-1

So far we have the standard SST process - a node joined the cluster and state transfer was deemed necessary. 

...

2015-11-13 10:40:57 30484 [Warning] WSREP: Gap in state sequence. Need state transfer.
2015-11-13 10:40:57 30484 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.220' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --
defaults-group-suffix '' --parent '30484''''
WSREP_SST: [INFO] Streaming with xbstream (20151113 10:40:57.975)
WSREP_SST: [INFO] Using socat as streamer (20151113 10:40:57.977)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20151113 10:40:57.980)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 10:40:58.005)
2015-11-13 10:40:58 30484 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.30.4.220:4444/xtrabackup_sst//1

Xtrabackup started.

2015-11-13 10:40:58 30484 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-11-13 10:40:58 30484 [Note] WSREP: REPL Protocols: 7 (3, 2)
2015-11-13 10:40:58 30484 [Note] WSREP: Service thread queue flushed.
2015-11-13 10:40:58 30484 [Note] WSREP: Assign initial position for certification: 563464, protocol version: 3
2015-11-13 10:40:58 30484 [Note] WSREP: Service thread queue flushed.
2015-11-13 10:40:58 30484 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (98ed75de-7c05-11e5-9743-de4abc22bd11): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2015-11-13 10:40:58 30484 [Note] WSREP: Member 0.0 (172.30.4.220) requested state transfer from '*any*'. Selected 1.0 (172.30.4.156)(SYNCED) as donor.
2015-11-13 10:40:58 30484 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)

IST was not available due to the state of the node (in this particular case, the MySQL data directory was not available).

2015-11-13 10:40:58 30484 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing (20151113 10:40:58.430)
WSREP_SST: [INFO] Proceeding with SST (20151113 10:40:58.434)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 10:40:58.435)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 10:40:58.436)
removed ‘/var/lib/mysql/ibdata1’
removed ‘/var/lib/mysql/ib_logfile1’
removed ‘/var/lib/mysql/ib_logfile0’
removed ‘/var/lib/mysql/auto.cnf’
removed ‘/var/lib/mysql/mysql.sock’
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 10:40:58.568)

The remaining files from the data directory were removed, and the node started waiting for the SST process to complete.

2015-11-13 10:41:00 30484 [Note] WSREP: (05e14925, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [ERROR] xtrabackup_checkpoints missing, failed innobackupex/SST on donor (20151113 10:41:08.407)
WSREP_SST: [ERROR] Cleanup after exit with status:2 (20151113 10:41:08.409)
2015-11-13 10:41:08 30484 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.220' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '30484''' : 2 (No such file or directory)
2015-11-13 10:41:08 30484 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-11-13 10:41:08 30484 [ERROR] WSREP: SST script aborted with error 2 (No such file or directory)
2015-11-13 10:41:08 30484 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2015-11-13 10:41:08 30484 [ERROR] Aborting

2015-11-13 10:41:08 30484 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 10:41:08 30484 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.

Unfortunately, SST failed on the donor node and, as a result, this Galera node aborted and the mysqld process stopped. There’s not much information here about the exact cause of the problem, but we’ve been pointed to the donor. Let’s take a look at its log.

2015-11-13 10:44:33 30400 [Note] WSREP: Member 0.0 (172.30.4.220) requested state transfer from '*any*'. Selected 1.0 (172.30.4.156)(SYNCED) as donor.
2015-11-13 10:44:33 30400 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 563464)
2015-11-13 10:44:33 30400 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2015-11-13 10:44:33 30400 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464''
2015-11-13 10:44:33 30400 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20151113 10:44:33.598)
WSREP_SST: [INFO] Using socat as streamer (20151113 10:44:33.600)
WSREP_SST: [INFO] Using /tmp/tmp.jZEi7YBrNl as xtrabackup temporary directory (20151113 10:44:33.613)
WSREP_SST: [INFO] Using /tmp/tmp.wz0XmveABt as innobackupex temporary directory (20151113 10:44:33.615)
WSREP_SST: [INFO] Streaming GTID file before SST (20151113 10:44:33.619)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 10:44:33.621)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20151113 10:44:33.626)
2015-11-13 10:44:35 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Streaming the backup to joiner at 172.30.4.220 4444 (20151113 10:44:43.628)

In this part of the log, you can see some lines similar to those in the joiner’s log - the node at 172.30.4.220 (the joiner) requested state transfer from 172.30.4.156 (our donor node). The donor switched to the Donor/Desynced state and xtrabackup was triggered.

WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf  --defaults-group=mysqld --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>
${DATA}/innobackup.backup.log | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 10:44:43.631)
2015-11-13 10:44:43 30400 [Warning] Access denied for user 'root'@'localhost' (using password: YES)
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20151113 10:44:43.638)

At some point SST failed - we can see a hint about what may be the culprit of the problem. There’s also information about where we can look for further details - the innobackup.backup.log file in the MySQL data directory.

WSREP_SST: [ERROR] Cleanup after exit with status:22 (20151113 10:44:43.640)
WSREP_SST: [INFO] Cleaning up temporary directories (20151113 10:44:43.642)
2015-11-13 10:44:43 30400 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.s
ock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464'
2015-11-13 10:44:43 30400 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464': 22 (Invalid argument)
2015-11-13 10:44:43 30400 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.30.4.220:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '''' --gtid '98ed75de-7c05-11e5-9743-de4abc22bd11:563464'
2015-11-13 10:44:43 30400 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 10:44:43 30400 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 563464)
2015-11-13 10:44:43 30400 [Note] WSREP: Member 1.0 (172.30.4.156) synced with group.
2015-11-13 10:44:43 30400 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 563464)

All SST-related processes terminated and the donor got back in sync with the rest of the cluster, switching to the ‘Synced’ state.

Finally, let’s take a look at the innobackup.backup.log:

151113 14:00:57 innobackupex: Starting the backup operation

IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".

151113 14:00:57 Connecting to MySQL server host: localhost, user: root, password: not set, port: 3306, socket: /var/lib/mysql/mysql.sock
Failed to connect to MySQL server: Access denied for user 'root'@'localhost' (using password: YES).

This time there’s nothing new - the problem is related to an access issue to the MySQL database. Such an issue should immediately point you to my.cnf and the ‘wsrep_sst_auth’ variable.
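
A minimal sketch of the fix, with placeholder credentials - the privileges listed are the ones typically required for xtrabackup-based SST, so double-check them against the documentation of the exact versions you run:

# Create the SST user (the statement is replicated across the cluster):
mysql -u root -p -e "
CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'sst_password';
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
FLUSH PRIVILEGES;"
# And make sure my.cnf on every node points at the same credentials, under [mysqld]:
#   wsrep_sst_auth=sstuser:sst_password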

Problems with network streaming

Sometimes it may happen that streaming doesn’t work correctly. It can happen because there’s a network error or maybe some of the processes involved died. Let’s check how a network error impacts the SST process.

The first error log snippet comes from the joiner node.

2015-11-13 14:30:46 24948 [Note] WSREP: Member 0.0 (172.30.4.220) requested state transfer from '*any*'. Selected 1.0 (172.30.4.156)(SYNCED) as donor.
2015-11-13 14:30:46 24948 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)
2015-11-13 14:30:46 24948 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] Proceeding with SST (20151113 14:30:46.352)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 14:30:46.353)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 14:30:46.356)
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 14:30:46.363)
2015-11-13 14:30:48 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015-11-13 14:31:01 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.30.4.156:4567
2015-11-13 14:31:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 0
2015-11-13 14:33:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 30
2015-11-13 14:35:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 60
^@2015-11-13 14:37:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 90
^@2015-11-13 14:39:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 120
^@^@2015-11-13 14:41:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 150
^@^@2015-11-13 14:43:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 180
^@^@2015-11-13 14:45:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 210
^@^@2015-11-13 14:46:30 24948 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 14:46:30 24948 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: terminating thread
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: joining thread
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: closing backend
2015-11-13 14:46:30 24948 [Note] WSREP: view(view_id(NON_PRIM,201db672,149) memb {
    201db672,0
} joined {
} left {
} partitioned {
    b7a335ea,0
    fff6c307,0
})
2015-11-13 14:46:30 24948 [Note] WSREP: view((empty))
2015-11-13 14:46:30 24948 [Note] WSREP: gcomm: closed
2015-11-13 14:46:30 24948 [Note] WSREP: /usr/sbin/mysqld: Terminated.

As you can see, it’s obvious that SST failed:

2015-11-13 14:46:30 24948 [Warning] WSREP: 1.0 (172.30.4.156): State transfer to 0.0 (172.30.4.220) failed: -22 (Invalid argument)
2015-11-13 14:46:30 24948 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.

A few lines before that, though: 

2015-11-13 14:31:01 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.30.4.156:4567
...
2015-11-13 14:45:02 24948 [Note] WSREP: (201db672, 'tcp://0.0.0.0:4567') reconnecting to b7a335ea (tcp://172.30.4.156:4567), attempt 210

The donor node was declared inactive - this is a very important indication of network problems. Given that Galera tried 210 times to communicate with the other node, it’s pretty clear this is not a transient error but something more serious. On the donor node, the error log contains the following entries that also make it clear something is not right with the network:

WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf  --defaults-group=mysqld --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 14:30:56.713)
2015-11-13 14:31:02 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.30.4.220:4567
2015-11-13 14:31:03 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') reconnecting to 201db672 (tcp://172.30.4.220:4567), attempt 0

...

2015-11-13 14:45:03 30400 [Note] WSREP: (b7a335ea, 'tcp://0.0.0.0:4567') reconnecting to 201db672 (tcp://172.30.4.220:4567), attempt 210
2015/11/13 14:46:30 socat[8243] E write(3, 0xc1a1f0, 8192): Connection timed out
2015-11-13 14:46:30 30400 [Warning] Aborted connection 76 to db: 'unconnected' user: 'root' host: 'localhost' (Got an error reading communication packets)
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20151113 14:46:30.767)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20151113 14:46:30.769)
WSREP_SST: [INFO] Cleaning up temporary directories (20151113 14:46:30.771)

This time the disappearing node is 172.30.4.220 - the node which is joining the cluster. Finally, innobackup.backup.log looks as follows:

Warning: Using unique option prefix open_files instead of open_files_limit is deprecated and will be removed in a future release. Please use the full name instead.
151113 14:30:56 innobackupex: Starting the backup operation

IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".

151113 14:30:56 Connecting to MySQL server host: localhost, user: root, password: not set, port: 3306, socket: /var/lib/mysql/mysql.sock
Using server version 5.6.26-74.0-56
innobackupex version 2.3.2 based on MySQL server 5.6.24 Linux (x86_64) (revision id: 306a2e0)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /var/lib/mysql
xtrabackup: open files limit requested 4000000, set to 1000000
xtrabackup: using the following InnoDB configuration:
xtrabackup:   innodb_data_home_dir = ./
xtrabackup:   innodb_data_file_path = ibdata1:100M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 2
xtrabackup:   innodb_log_file_size = 536870912
xtrabackup: using O_DIRECT
151113 14:30:56 >> log scanned up to (8828922053)
xtrabackup: Generating a list of tablespaces
151113 14:30:57 [01] Streaming ./ibdata1
151113 14:30:57 >> log scanned up to (8828922053)
151113 14:30:58 >> log scanned up to (8828922053)
151113 14:30:59 [01]        ...done
151113 14:30:59 [01] Streaming ./mysql/innodb_table_stats.ibd
151113 14:30:59 >> log scanned up to (8828922053)
151113 14:31:00 >> log scanned up to (8828922053)
151113 14:31:01 >> log scanned up to (8828922053)
151113 14:31:02 >> log scanned up to (8828922053)

...

151113 14:46:27 >> log scanned up to (8828990096)
151113 14:46:28 >> log scanned up to (8828990096)
151113 14:46:29 >> log scanned up to (8828990096)
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xb_stream_write_data() failed.
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.

The pipe was broken because the network connection timed out.
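
When you suspect this kind of problem, it is worth verifying that the donor can actually reach the joiner on the ports Galera uses (4567 for group communication, 4568 for IST, 4444 for xtrabackup SST). A minimal sketch, run from the donor against the joiner from the logs above:

for port in 4567 4568 4444; do
    nc -zv -w 5 172.30.4.220 "$port"
done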

One interesting question is how we can track errors when one of the components required to perform SST dies. It can be anything from the xtrabackup process to socat, which is needed to stream the data. Let’s go over some examples:

2015-11-13 15:40:02 29258 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)
2015-11-13 15:40:02 29258 [Note] WSREP: Requesting state transfer: success, donor: 0
WSREP_SST: [INFO] Proceeding with SST (20151113 15:40:02.497)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 15:40:02.499)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 15:40:02.499)
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 15:40:02.511)
2015-11-13 15:40:04 29258 [Note] WSREP: (cd0c84cd, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015/11/13 15:40:13 socat[29525] E write(1, 0xf2d420, 8192): Broken pipe
/usr//bin/wsrep_sst_xtrabackup-v2: line 112: 29525 Exit 1                  socat -u TCP-LISTEN:4444,reuseaddr stdio
     29526 Killed                  | xbstream -x
WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 1 137 (20151113 15:40:13.113)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20151113 15:40:13.116)

In the case described above, as can be clearly seen, xbstream was killed.

When the socat process gets killed, the joiner’s logs do not offer much data:

2015-11-13 15:43:05 9717 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 563464)
2015-11-13 15:43:05 9717 [Note] WSREP: Requesting state transfer: success, donor: 1
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151113 15:43:05.438)
WSREP_SST: [INFO] Proceeding with SST (20151113 15:43:05.438)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20151113 15:43:05.440)
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20151113 15:43:05.448)
2015-11-13 15:43:07 9717 [Note] WSREP: (3a69eae9, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 137 0 (20151113 15:43:11.442)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20151113 15:43:11.444)
2015-11-13 15:43:11 9717 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.4.220' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '9717''' : 32 (Broken pipe)
2015-11-13 15:43:11 9717 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-11-13 15:43:11 9717 [ERROR] WSREP: SST script aborted with error 32 (Broken pipe)
2015-11-13 15:43:11 9717 [ERROR] WSREP: SST failed: 32 (Broken pipe)
2015-11-13 15:43:11 9717 [ERROR] Aborting

We have just an indication that something went really wrong and that SST failed. On the donor node, things are a bit different:

WSREP_SST: [INFO] Streaming the backup to joiner at 172.30.4.220 4444 (20151113 15:43:15.875)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf  --defaults-group=mysqld --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.30.4.220:4444; RC=( ${PIPESTATUS[@]} ) (20151113 15:43:15.878)
2015/11/13 15:43:15 socat[16495] E connect(3, AF=2 172.30.4.220:4444, 16): Connection refused
2015-11-13 15:43:16 30400 [Warning] Aborted connection 81 to db: 'unconnected' user: 'root' host: 'localhost' (Got an error reading communication packets)
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20151113 15:43:16.305)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20151113 15:43:16.306)
WSREP_SST: [INFO] Cleaning up temporary directories (20151113 15:43:16.309)

We can find an ‘Aborted connection’ warning and information that innobackupex (the wrapper used to execute xtrabackup) finished with an error. Still not that much hard data. Luckily, we also have innobackup.backup.log to check:

151113 15:46:59 >> log scanned up to (8829243081)
xtrabackup: Generating a list of tablespaces
151113 15:46:59 [01] Streaming ./ibdata1
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xb_stream_write_data() failed.
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.

This time it’s much clearer - we have an error in the xtrabackup_copy_datafile() function which, as you may expect, is responsible for copying files. This leads us to the conclusion that some kind of network connection issue happened between donor and joiner. It may not be a detailed explanation, but the conclusion is still valid - killing the process on the other end of the network connection closes that connection abruptly, which is indeed a network issue of sorts.
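
If you suspect this kind of SST breakage, it is worth verifying that the donor can actually reach the joiner on the SST port before restarting the transfer. Below is a minimal sketch of such a check, assuming the default xtrabackup-v2 setup on port 4444 and the joiner address from the logs above - adjust both to your environment.

# On the joiner (with MySQL stopped, so the port is free) - start a throwaway listener:
socat -u TCP-LISTEN:4444,reuseaddr STDOUT > /dev/null &

# On the donor - check that the port is reachable and that data can be pushed through it:
nc -zv 172.30.4.220 4444
echo "connectivity test" | socat -u STDIO TCP:172.30.4.220:4444

# Remember to kill the test listener on the joiner afterwards.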

We are approaching the end of this post - hopefully you enjoyed our small trip through the Galera logs. As you have probably figured out by now, it is important to check an issue from every possible angle when dealing with Galera errors. Galera nodes work together to form a cluster but they are still separate hosts connected via a network. Sometimes looking at a single node’s logs is not enough to understand how the issue unfolded - you need to check every node’s point of view. This is also true for SST issues - sometimes it’s not enough to check the logs on the joiner node. You also have to look into the donor’s logs - both the error log and innobackup.backup.log. 

The examples of issues described in this post (and the one before) only cover a small part of all possible problems, but hopefully they’ll help you understand the troubleshooting process.

In the next post, we are going to take a closer look at pt-stalk - a tool which may help you understand what is going on with MySQL when the standard approach fails to catch the problem.

Become a ClusterControl DBA: Safeguarding your Data

In the past four posts of the blog series, we covered deployment of clustering/replication (MySQL/Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health and in the last post, how to make your setup highly available through HAProxy and MaxScale.

So now that you have your databases up and running and highly available, how do you ensure that you have backups of your data?

You can use backups for multiple things: disaster recovery, providing production data to test against in development, or even provisioning a slave node. This last case is already covered by ClusterControl. When you add a new (replica) node to your replication setup, ClusterControl will make a backup/snapshot of the master node and use it to build the replica. After the backup has been extracted, prepared and the database is up and running, ClusterControl will automatically set up replication.

Creating an instant backup

In essence creating a backup is the same for Galera, MySQL replication, Postgres and MongoDB. You can find the backup section under ClusterControl > Backup and by default it should open the scheduling overview. From here you can also press the “Backup” button to make an instant backup.

severalnines-blogpost-schedule-backup.png

As all these various databases have different backup tools, there is obviously some difference in the options you can choose from. For instance, with MySQL you get to choose between mysqldump and xtrabackup. If in doubt which one to choose (for MySQL), check out this blog about the differences and use cases for mysqldump and xtrabackup.

On this very same screen, you can also create a backup schedule that allows you to run the backup at a set interval, for instance, during off-peak hours.

severalnines-blogpost-current-schedule.png

Backing up MySQL and Galera

As mentioned in the previous paragraph, you can make MySQL backups using either mysqldump or xtrabackup. Using mysqldump you can make backups of individual schemas or a selected set of schemas while xtrabackup will always make a full backup of your database.
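
To illustrate the difference outside of ClusterControl, here is a hedged sketch of what the two approaches boil down to on the command line - the schema names and target paths are just examples:

# mysqldump - logical backup of selected schemas only
mysqldump --user=root --password --single-transaction \
  --databases shop reporting > /backups/selected_schemas.sql

# xtrabackup (via the innobackupex wrapper) - always a full physical backup of the data directory
innobackupex --user=root --password=yourpass /backups/full/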

In the Backup Wizard, you can choose which host you want to run the backup on, where you want to store the backup files (including the directory), and which specific schemas to include.

severalnines-blogpost-instant-backup.png

If the node you are backing up is receiving (production) traffic, and you are afraid the extra disk writes will become intrusive, it is advised to send the backups to the ClusterControl host. The backup files will then be streamed over the network to the ClusterControl host, so make sure there is enough space available on that node.

If you choose xtrabackup as the backup method, extra options open up: desync, compression and xtrabackup parallel threads/gzip. The desync option only applies to Galera clusters, where it temporarily desyncs the node taking the backup from the cluster. 

severalnines-blogpost-backup-xtrabackup.png

After scheduling an instant backup you can keep track of the progress of the backup job in the Settings > Cluster Jobs. After it has finished, you should be able to see the backup file in the configured location.

severalnines-blogpost-cluster-jobs-backup.png

Backing up PostgreSQL

Similar to the instant backups of MySQL, you can run a backup on your Postgres database. With Postgres backups there are fewer options to fill in, as there is only one backup method: pg_dump.

severalnines-blogpost-backup-postgresql.png

Backing up MongoDB

Similar to PostgreSQL, there is only one backup method: mongodump. In contrast to PostgreSQL, the node we take the backup from can be desynced in the case of MongoDB.

severalnines-blogpost-mongodb-backup.png

Scheduling backups

Now that we have played around with creating instant backups, we can extend that by scheduling them.
Scheduling is very easy to do: you select on which days the backup has to be made and at what time it needs to run.

For xtrabackup there is an additional feature: incremental backups. An incremental backup will only back up the data that changed since the last backup. Of course, incremental backups are useless without a full backup as a starting point. Between two full backups, you can have as many incremental backups as you like, but the more of them there are, the longer a restore will take. 
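
Under the hood, this maps to xtrabackup’s incremental mode. A simplified sketch of what a full backup followed by two incrementals looks like on the command line (directory names are illustrative; innobackupex creates a timestamped subdirectory for each backup):

# Full backup - the starting point of the incremental chain
innobackupex /backups/full

# First incremental - only the pages changed since the full backup
innobackupex --incremental /backups/inc1 --incremental-basedir=/backups/full/TIMESTAMP

# Second incremental - only the pages changed since the first incremental
innobackupex --incremental /backups/inc2 --incremental-basedir=/backups/inc1/TIMESTAMP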

Once scheduled, the job(s) should become visible under the “Current Backup Schedule” and you can edit them by double-clicking on them. As with instant backups, these jobs will schedule the creation of a backup, and you can keep track of the progress via the Cluster Jobs overview if necessary.

Backup reports

You can find the Backup Reports under ClusterControl > Backup and this will give you a cluster level overview of all backups made. Also from this interface you can directly restore a backup to a host in the master-slave setup or an entire Galera cluster. 

severalnines-blogpost-backup-reports.png

A nice feature of ClusterControl is that it can restore a node/cluster using full+incremental backups: it keeps track of the last full backup made and starts the incremental backups from there. It then groups a full backup together with all incremental backups up to the next full backup. This allows you to restore starting from the full backup and then apply the incremental backups on top of it.
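
For reference, restoring such a chain manually follows the usual xtrabackup prepare sequence - roughly the sketch below (ClusterControl automates all of this; directory names are again just examples):

# Prepare the full backup, but keep it able to accept incrementals
innobackupex --apply-log --redo-only /backups/full/TIMESTAMP

# Apply the incrementals in order; only the last one is applied without --redo-only
innobackupex --apply-log --redo-only /backups/full/TIMESTAMP --incremental-dir=/backups/inc1/TIMESTAMP
innobackupex --apply-log /backups/full/TIMESTAMP --incremental-dir=/backups/inc2/TIMESTAMP

# Final prepare, then copy back into an empty datadir with MySQL stopped
innobackupex --apply-log /backups/full/TIMESTAMP
innobackupex --copy-back /backups/full/TIMESTAMP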

Offsite backup in Amazon S3 or Glacier

Since we now have a lot of backups stored on either the database hosts or the ClusterControl host, we also want to ensure they don’t get lost in case we face a total infrastructure outage (e.g., a data center fire or flood). Therefore ClusterControl allows you to copy your backups offsite to Amazon S3 or Glacier. 

To enable offsite backups with Amazon, you need to add your AWS credentials and keypair in the Service Providers dialogue (Settings > Service Providers).

several-nines-blogpost-aws-credentials.png

Once set up, you are able to copy your backups offsite:

severalnines-blogpost-upload-backups-aws-s3-glacier.png

This process will take some time, as the backup will be sent encrypted and the Glacier service, in contrast to S3, is not a fast storage solution.

After copying your backups to Amazon S3 or Glacier, you can get them back easily by selecting the backup in the S3/Glacier tab and clicking on retrieve. You can also remove existing backups from Amazon S3 and Glacier here.

An alternative to Amazon S3 or Glacier would be to send your backups to another data center (if available). You can do this with a sync tool like BitTorrent Sync. We wrote a blog article on how to set up BitTorrent Sync for backups within ClusterControl.

Final thoughts

We showed you how to get your data backed up and how to store it safely offsite. Recovery is a different matter. ClusterControl can automatically recover your databases from backups made in the past that are stored on premises or copied back from S3 or Glacier. Recovering from backups that have been moved to any other offsite storage will involve manual intervention, though.
 
Obviously there is more to securing your data, especially on the side of securing your connections. We will cover this in the next blog post!

Webinar replay & slides for MySQL DBAs: performing live database upgrades in replication & Galera setups

Thanks to everyone who joined us for our recent live webinar on performing live database upgrades for MySQL Replication & Galera, led by Krzysztof Książek. The replay and slides to the webinar are now available to watch and read online via the links below.

During this live webinar, Krzysztof covered one of the most basic, but essential tasks of the DBA: minor and major database upgrades in production environments.

Watch the replay

 

Read the slides

 

Topics

  • What types of upgrades are there?
  • How do I best prepare for the upgrades?
  • Best practices for:
    • Minor version upgrades - MySQL Replication & Galera
    • Major version upgrades - MySQL Replication & Galera

up56_logo.png

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. 

This webinar builds upon recent blog posts and related webinar series by Krzysztof on how to become a MySQL DBA.

Severalnines breaks records on MySQL, PostgreSQL and MongoDB

Sweden’s self-funded database provider powers past 2014 revenues and grows team

Stockholm, Sweden and anywhere else in the world - 01 DECEMBER 2015 - Severalnines, a database automation pioneer which helps businesses deploy and manage open source databases in any environment, closes 2015 on a high with new customer wins and new hires.
 
Momentum highlights:

  • Over 100% sales growth achieved early in first half of 2015
  • 150+ enterprise customers, 8,000+ community users
  • New enterprise account wins such as the European Broadcast Union, European Gravitational Observatory, BT Expedite and French national scientific research centre, CNRS
  • Hired Gerry Treacy, former MongoDB executive, as Vice President of Sales
  • Added support for PostgreSQL to ClusterControl alongside MySQL and MongoDB

Severalnines’ flagship ClusterControl platform helps deploy, monitor, manage and scale SQL and NoSQL open source databases. Automation and control of database infrastructure across mixed environments, usually the case in large enterprises, makes Severalnines the ideal polyglot persistence solution to support modern business mixed IT environments. The reason for ClusterControl’s popularity is the way it provides full operational visibility and control for open source databases.

Severalnines is entirely self-funded, having taken no external capital, allowing its product team to focus solely on solving pressing customer and community user needs. 

To further its expansion in 2016 and beyond, Severalnines hired Gerry Treacy as Vice President of Sales to drive revenues worldwide. Treacy brings veteran sales expertise on SQL and NoSQL technologies from his time at MySQL, Oracle and MongoDB, where he held senior international management roles in Corporate Sales.

 

Commenting on the ongoing growth of the company, Severalnines CEO, Vinay Joosery, said: “It is a promising sign Severalnines had a strong 2015 when the global technology sector is going through a phase of readjustment. Enterprises are increasingly choosing to operate mixed environments and we backed a hunch that the market was ready for our polyglot persistence technology. As a result, we’ve been laser-focused on building the best operational tools to run high availability open source databases. Users are responding positively to that.”

 
Explaining the company’s commercial success, he noted: “In the current age of virtualised environments across public/private clouds, we are reminded time and time again that servers are not immune to failures. But building resilient infrastructure is no small feat - from design, configuration and management through to data integrity and security - there is plenty to do. With ClusterControl, we’ve effectively lowered the barrier to entry for sysadmins and devops engineers to build and run highly available database infrastructures on top of open source. Distributed database systems are now a reality for all enterprises, not just for web giants.”
 
To join Severalnines’ growing customer base please click here.
 
 
About Severalnines

Severalnines provides automation and management software for database clusters. We help companies deploy their databases in any environment, and manage all operational aspects to achieve high-scale availability.

Severalnines' products are used by developers and administrators of all skill levels to provide the full 'deploy, manage, monitor, scale' database cycle, thus freeing them from the complexity and learning curves that are typically associated with highly available database clusters. The company has enabled over 8,000 deployments to date via its popular online database configurator. Currently counting BT, Orange, Cisco, CNRS, Technicolour, AVG, Ping Identity and Paytrail as customers.  Severalnines is a private company headquartered in Stockholm, Sweden with offices in Singapore and Tokyo, Japan. To see who is using Severalnines today visit, http://www.severalnines.com/company

Press contact

Severalnines
Jean-Jérôme Schmidt
jj@severalnines.com

Become a ClusterControl DBA: Managing your Database Configurations

In the past five posts of the blog series, we covered deployment of clustering/replication (MySQL / Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, how to make your setup highly available through HAProxy and MaxScale and in the last post, how to prepare yourself for disasters by scheduling backups.

With ClusterControl 1.2.11, we made major enhancements to the database configuration manager. The new version allows changing of parameters on multiple database hosts at the same time and, if possible, changing their values at runtime.

We featured the new MySQL Configuration Management in a Tips & Tricks blog post, but this blog post will go more in depth and cover Configuration Management within ClusterControl for MySQL, PostgreSQL and MongoDB.

Cluster Control Configuration management

The configuration management interface can be found under Manage > Configurations. From here, you can view or change the configurations of your database nodes and other tools that ClusterControl manages. ClusterControl will import the latest configuration from all nodes and overwrite previous copies made. Currently there is no historical data kept.

If you’d rather edit the config files manually, directly on the nodes, you can re-import the altered configuration by pressing the Import button.

And last but not least: you can create or edit configuration templates. These templates are used whenever you deploy new nodes in your cluster. Of course, any changes made to the templates will not be retroactively applied to the already deployed nodes that were created using them.

MySQL Configuration Management

As previously mentioned, the MySQL configuration management got a complete overhaul in ClusterControl 1.2.11. The interface is now more intuitive. When changing parameters, ClusterControl checks whether the parameter actually exists. This ensures your configuration will not prevent MySQL from starting due to parameters that don’t exist.

From Manage -> Configurations, you will find an overview of all config files used within the selected cluster, including MaxScale nodes.

We use a tree structure to easily view hosts and their respective configuration files. At the bottom of the tree, you will find the configuration templates available for this cluster.

Changing parameters

Suppose we need to change a simple parameter like the maximum number of allowed connections (max_connections); we can simply change this parameter at runtime.

First select the hosts to apply this change to.

Then select the section you want to change. In most cases, you will want to change the MYSQLD section. If you would like to change the default character set for MySQL, you will have to change that in both MYSQLD and client sections.

If necessary you can also create a new section by simply typing the new section name. This will create a new section in the my.cnf.

Once we change a parameter and set its new value by pressing “proceed”, ClusterControl will check if the parameter exists for this version of MySQL. This is to prevent any non-existent parameters from blocking the initialization of MySQL on the next restart.

When we press “proceed” for the max_connections change, we will receive a confirmation that it has been applied to the configuration and set at runtime using SET GLOBAL. A restart is not required as max_connections is a parameter we can change at runtime.
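
What happens behind the scenes is equivalent to something like the following (a sketch assuming shell access to the node; the value 512 is just an example):

# Apply the new value at runtime - no restart needed for a dynamic variable
mysql -uroot -p -e "SET GLOBAL max_connections = 512;"

# Verify the running value
mysql -uroot -p -e "SHOW GLOBAL VARIABLES LIKE 'max_connections';"

# ClusterControl also writes the new value to my.cnf, so it survives the next restart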

Now suppose we want to change the bufferpool size; this would require a restart of MySQL before it takes effect:

And as expected the value has been changed in the configuration file, but a restart is required. You can do this by logging into the host manually and restarting the MySQL process. Another way to do this from ClusterControl is by using the Nodes dashboard.
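
For a non-dynamic variable like innodb_buffer_pool_size (it only became resizable at runtime in MySQL 5.7), the change has to land in the configuration file and MySQL has to be restarted. A minimal sketch, assuming a Debian-style layout with the configuration in /etc/mysql/my.cnf and an 8G value chosen purely as an example:

# Update the value under the [mysqld] section
sed -i 's/^innodb_buffer_pool_size.*/innodb_buffer_pool_size = 8G/' /etc/mysql/my.cnf

# Restart MySQL for the change to take effect - on a Galera node, prefer the
# Nodes dashboard flow described below so the node rejoins the cluster cleanly
service mysql restart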

Restarting nodes in a Galera cluster

You can perform a restart per node by selecting “Shutdown Node” and pressing the “Execute” button.

This will stop MySQL on the host but depending on your workload and bufferpool size this could take a while as MySQL will start flushing the dirty pages from the InnoDB bufferpool to disk. These are the pages that have been modified in memory but not on disk.

Once the host has stopped MySQL the “Start Node” button should become available:

Make sure you leave the “initial” checkbox unchecked in the confirmation:

When you select “initial start” on a Galera node, ClusterControl will empty the MySQL data directory and force a full copy this way. This is, obviously, unnecessary for a configuration change.

Restarting nodes in MySQL master-slave topologies

For MySQL master-slave topologies you can’t just restart node by node. Unless downtime of the master is acceptable, you will have to apply the configuration changes to the slaves first and then promote a slave to become the new master.

You can go through the slaves one by one, execute a “Shutdown Node” on each and, once MySQL has stopped, execute “Start Node” again. Again, make sure you leave the “initial” checkbox unchecked in the confirmation:

Just like the “Start Node” with Galera clusters, “initial start” will delete the MySQL data directory and copy the data from the master.

After applying the changes to all slaves, promote a slave to become the new master:

After the slave has become the new master, you can shutdown and start the old master node to apply the change.

Importing configurations

Now that we have applied the change directly on the database, as well as the configuration file, it will take until the next configuration import to see the change reflected in the configuration stored in ClusterControl. If you are less patient, you can schedule an immediate configuration import by pressing the “Import” button.

PostgreSQL Configuration Management

For PostgreSQL, the Configuration Management works a bit differently from the MySQL Configuration Management. In general, you have the same functionality here: change the configuration, import configurations for all nodes and define/alter templates.

The difference here is that you can immediately change the whole configuration file and write this configuration back to the database node.

If the changes made require a restart, a “Restart” button will appear that allows you to restart the node to apply them.

MongoDB Configuration Management

The MongoDB Configuration Management works similar to the PostgreSQL Configuration Management: you can change the configuration, import configurations for all nodes and alter templates.

Changing the configuration is, just like PostgreSQL, altering the whole configuration:

The biggest difference for MongoDB is that there are four configuration templates predefined:

The reason for this is that we support different types of MongoDB clusters, and this gets reflected in the cluster configurations.

Final thoughts

In this blog post, we learned how to manage, alter and template your configurations in ClusterControl. Changing the templates can save you a lot of time when you have deployed only one node in your topology so far: as the template will be used for new nodes, this saves you from altering all configurations afterwards. However, for MySQL-based nodes, changing the configuration on all nodes has become trivial thanks to the new Configuration Management interface.

As a reminder, we recently covered in the same series deployment of clustering/replication (MySQL / Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, how to make your setup highly available through HAProxy and MaxScale and in the last post, how to prepare yourself for disasters by scheduling backups.

Webinar: Polyglot Persistence for the MongoDB, PostgreSQL & MySQL DBA - Tuesday December 22nd

Join us for our new webinar on Tuesday, December 22nd, which is also our last webinar in 2015!

Polyglot Persistence for the MongoDB, PostgreSQL & MySQL DBA

The introduction of DevOps in organisations has changed the development process, and perhaps introduced some challenges. Developers, in addition to their own preferred programming languages, also have their own preference for backend storage. The former is often referred to as polyglot languages and the latter as polyglot persistence.

Having multiple storage backends means your organization will become more agile on the development side and gives developers a choice, but it also demands additional knowledge on the operations side. Extending your infrastructure from only MySQL to other storage backends like MongoDB and PostgreSQL implies you also have to monitor, manage and scale them. As every storage backend excels at different use cases, this also means you have to reinvent the wheel for every one of them.

DATE, TIME & REGISTRATION

Europe/MEA/APAC
Tuesday, December 22nd at 09:00 GMT / 10:00
CET (Germany, France, Sweden)
Register Now

North America/LatAm
Tuesday, December 22nd at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

AGENDA

This webinar will cover the four major operational challenges for MySQL, MongoDB & PostgreSQL:

  • Deployment
  • Management
  • Monitoring
  • Scaling
  • And how to deal with them

SPEAKER

Art van Scheppingen is a Senior Support Engineer at Severalnines. He’s a pragmatic MySQL and Database expert with over 15 years experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he kept a broad vision upon the whole database environment: from MySQL to Couchbase, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, FOSDEM) and related meetups.

This webinar is based upon the experience Art had while writing our How to become a ClusterControl DBA blog series and implementing multiple storage backends to ClusterControl. To view all the blogs of the ‘Become a ClusterControl DBA’ series visit: http://severalnines.com/blog-categories/clustercontrol

s9s Tools and Resources: Polyglot Persistence webinar for MySQL, MongoDB & PostgreSQL - and more!

We have created new resources and tools for you and this is a summary of what we’ve recently published. Please do check it out and let us know if you have any comments or feedback.

New Technical Webinar

Polyglot Persistence for the MongoDB, PostgreSQL & MySQL DBA
Tuesday, December 22nd

Join us for our new webinar on Tuesday, December 22nd, which is also our last webinar in 2015! Art van Scheppingen, Senior Support Engineer at Severalnines, will discuss all the main aspects of managing a polyglot open source database environment as well as conduct a live demo to walk you through some examples.

To register and for the full agenda, click here.

Customer Case Studies

From small businesses to Fortune 500 companies, customers have chosen Severalnines to deploy and manage MySQL, MongoDB and PostgreSQL.  

View our Customer page to discover companies like yours who have found success with ClusterControl.

Technical Webinar - Replays

As you know, we run a monthly technical webinar cycle; this is the latest replay, which you can watch at your own leisure, all part of our ‘Become a MySQL DBA’ series:

During this live webinar, Krzysztof Książek, Senior Support Engineer at Severalnines, covered one of the most basic, but essential tasks of the DBA: minor and major database upgrades in production environments.

View all our replays here

ClusterControl Blogs

Our series of blogs focussing on how to use ClusterControl continues. Do check them out!

View all ClusterControl blogs here

The MySQL DBA Blog Series

We’re on the 18th installment of our popular ‘Become a MySQL DBA’ series and you can view all of these blogs here. Here are the latest ones in the series:

View all the ‘Become a MySQL DBA’ blogs here

Additional Technical Blogs

We trust these resources are useful. If you have any questions on them or on related topics, please do contact us!

ClusterControl Tips & Tricks: Monitoring multiple MySQL instances on one machine

Requires ClusterControl 1.2.11 or later. Applies to MySQL based instances/clusters.

On some occasions, you might want to run multiple instances of MySQL on a single machine. You might want to give different users access to their own mysqld servers that they manage themselves, or you might want to test a new MySQL release while keeping an existing production setup undisturbed.

It is possible to use a different MySQL server binary per instance, or use the same binary for multiple instances (or a combination of the two approaches). For example, you might run a server from MySQL 5.1 and one from MySQL 5.5, to see how the different versions handle a certain workload. Or you might run multiple instances of the latest MySQL version, each managing a different set of databases.

Whether or not you use distinct server binaries, each instance that you run must be configured with unique values for several operating parameters. This eliminates the potential for conflict between instances. You can use MySQL Sandbox to create multiple MySQL instances, or you can use mysqld_multi, available in MySQL, to start or stop any number of separate mysqld processes running on different TCP/IP ports and UNIX sockets.
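
For reference, a minimal mysqld_multi setup looks roughly like the sketch below - two [mysqldN] groups in my.cnf with unique ports, sockets and data directories (all paths and port numbers are just examples):

# /etc/my.cnf (fragment)
[mysqld_multi]
mysqld     = /usr/bin/mysqld_safe
mysqladmin = /usr/bin/mysqladmin

[mysqld1]
port    = 3306
socket  = /var/lib/mysql1/mysql.sock
datadir = /var/lib/mysql1

[mysqld2]
port    = 3307
socket  = /var/lib/mysql2/mysql.sock
datadir = /var/lib/mysql2

# Start, report on and stop the instances
mysqld_multi start 1,2
mysqld_multi report
mysqld_multi stop 2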

In this blog post, we’ll show you how to monitor multiple MySQL instances on one host using ClusterControl.

ClusterControl Limitation

At the time of writing, ClusterControl does not support monitoring of multiple instances on one host per cluster/server group. It assumes the following best practices:

  • Only one MySQL instance per host (physical server or virtual machine).
  • MySQL data redundancy should be configured on N+1 servers.
  • All MySQL instances are running with uniform configuration across cluster/server group, e.g., listening port, error log, datadir, basedir, socket are identical.

With regards to the points mentioned above, ClusterControl assumes that in a cluster/server group:

  • MySQL instances are configured uniformly across a cluster; same port, same location of logs, base/data directory and other critical configurations.
  • It monitors, manages and deploys only one MySQL instance per host.
  • MySQL client must be installed on the host and available on the executable path for the corresponding OS user.
  • MySQL is bound to an IP address reachable by the ClusterControl node.
  • It keeps monitoring the host statistics (e.g., CPU/RAM/disk/network) for each MySQL instance individually. In an environment with multiple instances per host, you should expect redundant host statistics since it monitors the same host multiple times.

With the above assumptions, the following ClusterControl features do not work for a host with multiple instances:

  • Backup - Percona Xtrabackup does not support multiple instances per host and mysqldump executed by ClusterControl only connects to the default socket.
  • Process management - ClusterControl uses the standard ‘pgrep -f mysqld_safe’ to check if MySQL is running on that host. With multiple MySQL instances, this is a false positive approach. As such, automatic recovery for node/cluster won’t work.
  • Configuration management - ClusterControl provisions the standard MySQL configuration directory. It usually resides under /etc/ and /etc/mysql.

Workaround

Monitoring multiple MySQL instances on a machine is still possible with ClusterControl with a simple workaround. Each MySQL instance must be treated as a single entity per server group.

In this example, we have 3 MySQL instances on a single host created with MySQL Sandbox:

We created our MySQL instances using the following commands:

$ su - sandbox
$ make_multiple_sandbox mysql-5.6.26-linux-glibc2.5-x86_64.tar.gz

By default, MySQL Sandbox creates MySQL instances that listen on 127.0.0.1. It is necessary to configure each node appropriately to make them listen on all available IP addresses (see the sketch after the listing below). Here is the summary of our MySQL instances on the host:

[sandbox@test multi_msb_mysql-5_6_26]$ cat default_connection.json
{
"node1":
    {
        "host":     "127.0.0.1",
        "port":     "15227",
        "socket":   "/tmp/mysql_sandbox15227.sock",
        "username": "msandbox@127.%",
        "password": "msandbox"
    }
,
"node2":
    {
        "host":     "127.0.0.1",
        "port":     "15228",
        "socket":   "/tmp/mysql_sandbox15228.sock",
        "username": "msandbox@127.%",
        "password": "msandbox"
    }
,
"node3":
    {
        "host":     "127.0.0.1",
        "port":     "15229",
        "socket":   "/tmp/mysql_sandbox15229.sock",
        "username": "msandbox@127.%",
        "password": "msandbox"
    }
}
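
As mentioned above, each sandbox instance has to be reconfigured to listen on all interfaces before ClusterControl can reach it. A hedged sketch for node1 (paths follow the MySQL Sandbox defaults used in this setup; the grant is an example matching the msandbox credentials shown above):

# Bind the instance to all interfaces instead of loopback (add the line under [mysqld] if it is missing)
sed -i 's/^bind.address.*/bind-address = 0.0.0.0/' ~/sandboxes/multi_msb_mysql-5_6_26/node1/my.sandbox.cnf

# Restart the instance and allow the monitoring user to connect from other hosts
~/sandboxes/multi_msb_mysql-5_6_26/node1/restart
~/sandboxes/multi_msb_mysql-5_6_26/node1/use -e "GRANT ALL PRIVILEGES ON *.* TO 'msandbox'@'%' IDENTIFIED BY 'msandbox';"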

From ClusterControl, we need to perform ‘Add Existing Server/Cluster’ for each instance as we need to isolate them in a different group to make it work. For node1, enter the following information in ClusterControl> Add Existing Server/Cluster:

You can monitor the progress by clicking on the spinning arrow icon in the top menu. You will see node1 in the UI once ClusterControl finishes the job:

Repeat the same steps to add another two nodes with port 15228 and 15229. You should see something like the below once they are added:

There you go. We just added our existing MySQL instances into ClusterControl for monitoring. Happy monitoring!

PS.: To get started with ClusterControl, click here!

Press Release: Severalnines adds silver lining to database management for CloudStats.me

UK server monitoring platform provider moves from MySQL to MariaDB with ClusterControl

Stockholm, Sweden and anywhere else in the world - 08 December 2015 – Severalnines, the provider of open source database management tools, today announced its latest customer, CloudStats.me. CloudStats.me is a cloud-based server monitoring platform capable of monitoring Linux, Windows, OS X servers and personal computers, web sites and IP addresses. Users of CloudStats.me range from system administrators to website owners and web hosting providers, such as WooServers, HudsonValleyHost and other SMEs.

CloudStats.me is a UK company which currently monitors more than 1300 servers and 1000 websites and its server monitoring platform is growing rapidly at 500-1000 new users per week, adding approximately 1.5 servers to each new user account.

The rapid growth of the CloudStats user base and the number of services being monitored created a very high load on its database. This high load left the MySQL-based CloudStats database unable to handle the large amounts of incoming data collected by the CloudStats system, causing false alerts and further inconveniences to some of its users. CloudStats needed a new database distribution mechanism that would replace the current MySQL database configuration to avoid further customer churn.

The MariaDB MySQL Galera Cluster, which allows users to create a MySQL cluster with a “Master-Master” replication, was chosen as the replacement. The database is automatically distributed and replicated across multiple servers, and additional servers can be added to scale workloads, which makes it extremely easy to scale a database cluster and add additional resources to it if necessary.

For the CloudStats.me team it was apparent, however, that the database cluster management required a number of specialist technical skills. This could create a burden on the development team as they would need to devote extra time to setup, configure and manage clusters. Instead of hiring additional database system administrators, it was much more efficient to use one platform that could quickly install, configure and manage a MariaDB Galera Cluster. Furthermore, such a platform would need to support easy cluster expansion, database availability, scalability and security.

After some online investigation, the CloudStats.me team found Severalnines’ ClusterControl. The initial setup of the ClusterControl platform only took a couple of hours and went live straight out of the box. The whole installation process was fully automated, so the team could have clusters up and running in very little time. Severalnines also offered valuable advice on design of the new database architecture.

Today, CloudStats.me sees the following benefits from using Severalnines ClusterControl:

  • Complete control of its MariaDB cluster with an ability to scale at any time.
  • Significant cost savings were achieved as ClusterControl helped reduce database administration costs by 90%.
  • There is additional bandwidth to save costs on technical issues with support included as part of the ClusterControl subscription.

Vinay Joosery, Severalnines Founder and CEO said: “As we have seen recently, 2015 is truly the year of the cloud as the leading vendors are beating their expected sales targets. There is an appetite to ensure servers can be rapidly deployed and optimised for performance. It is therefore no surprise to see the success of companies like CloudStats.me. We are delighted to provide CloudStats.me with a scalable infrastructure and the ability to increase or decrease workload based on its customers’ needs.”

Alex Krasnov, CEO CloudStats said: “Our cloud and server performance management business is witnessing a ‘hockey stick’ growth trajectory. This means it was particularly important to plan our infrastructure in such a way that it would support further growth and allow us to add more features into the CloudStats control panel. The support we have had from Severalnines has made our lives much easier because it provided the database expertise, which we needed to ensure our system has a maximum uptime. We plan to deploy more clusters and continue to integrate ClusterControl with our server monitoring platform.”

For more information on how CloudStats uses ClusterControl panel based on WooServers MariaDB cluster-optimized dedicated servers, click here.

About Severalnines

Severalnines provides automation and management software for database clusters. We help companies deploy their databases in any environment, and manage all operational aspects to achieve high-scale availability.

Severalnines' products are used by developers and administrators of all skill levels to provide the full 'deploy, manage, monitor, scale' database cycle, thus freeing them from the complexity and learning curves that are typically associated with highly available database clusters. The company has enabled over 8,000 deployments to date via its popular online database configurator. Currently counting BT, Orange, Cisco, CNRS, Technicolour, AVG, Ping Identity and Paytrail as customers. Severalnines is a private company headquartered in Stockholm, Sweden with offices in Singapore and Tokyo, Japan. To see who is using Severalnines today visit, http://www.severalnines.com/company

About CloudStats.me

CloudStats.me is a server and website monitoring and backup solution that works from the cloud. With CloudStats it is easy to monitor and backup any type of Server, Droplet or Instance working on Linux or Windows operating system. Being platform agnostic CloudStats.me will allow you to monitor and backup your whole IT infrastructure from one single place. For instance, if you have several droplets on Digital Ocean, a few instances on Amazon EC2, one dedicated server and one virtual server at a 3rd party company - CloudStats will gather statistics from all of them and will notify you of any problems with your services.

CloudStats.me is a Microsoft partner and has a full support of Microsoft Azure, including monitoring and backups of any services to Azure cloud storage.

Among CloudStats clients are both private system administrators as well as hosting providers, such as WooServers.com, HudsonValleyHost, VirtusHost, Host4Geeks, etc.

Press contact

Severalnines
Jean-Jérôme Schmidt
jj@severalnines.com

Become a MySQL DBA blog series - Troubleshooting with pt-stalk

In our previous posts, we covered different log files and how to use them to troubleshoot MySQL, but that is not all MySQL has to offer. What do you do when the standard approach fails to pinpoint the problem? In this post, we will take a closer look at pt-stalk - a tool which may help you understand what is going on with MySQL in exactly those cases.

This is the nineteenth installment in the ‘Become a MySQL DBA’ blog series. Our previous posts in the DBA series include:

Logs are not enough for troubleshooting

In the recent posts of our ‘Become a MySQL DBA’ series, we’ve discussed what kind of useful data the MySQL and Galera logs can bring us. This data is great for debugging potential issues, but it’s not everything that MySQL has to offer. There are multiple ways in which MySQL presents its internal state - we are talking here about data presented in the output of ‘SHOW GLOBAL STATUS’ or ‘SHOW ENGINE INNODB STATUS’. When it comes to dealing with row locking contention, the following queries can give you a good understanding of what is going on:

SELECT * FROM information_schema.INNODB_TRX\G
SELECT * FROM information_schema.INNODB_LOCK_WAITS\G

They present you with information about open transactions and transactions in LOCK_WAIT state. Also, we should not forget that MySQL doesn’t run in a vacuum - operating system state and its settings can impact how MySQL performs. Hardware has its own limits too - memory utilization, CPU utilization, disk load. Many times, issues you’ll face will not be that complex. But you never know when knowledge about how the CPU is handling interrupts, or how memory is allocated by MySQL, may help you in dealing with seemingly weird issues.
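
On the operating system side, a handful of standard tools cover most of what is worth capturing while a problem is happening (as we will see, pt-stalk gathers similar data for you):

# CPU, run queue and swapping, sampled once per second
vmstat 1 5

# Per-device disk utilization and latency (requires the sysstat package)
iostat -xm 1 5

# Memory breakdown and MySQL's share of it
free -m
ps -C mysqld -o pid,rss,vsz,cmd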

The main problem with this data is that it is available and meaningful only while the issue is unfolding - you may run queries to collect info on row locking and you can run external tools to grab data on CPU utilization, but if you do it after the incident, this data most likely won’t help much in solving the problem.

Obviously, it’s not always possible to have someone online whenever something is happening to the database. This is especially true for those transient, hardly reproducible crashes or slowdowns. Of course, you can (and many do) build your own set of scripts to monitor MySQL status and collect the data. Luckily, that’s not really needed - there’s a great tool designed to do just that: pt-stalk from Percona Toolkit.

How to use pt-stalk?

Before we cover what type of data you can get out of pt-stalk, we need to discuss how to use it. There are a couple of ways you can trigger pt-stalk to execute. By default it will execute and begin to watch for a predefined condition.

Let’s take a look at an example of pt-stalk invocation:

root@ip-172-30-4-156:~# pt-stalk --password=pass --function=status --variable=Threads_running --threshold=25 --cycles=5 --interval=1 --iterations=2

Most of those settings are default ones (only iterations has to be passed explicitly). What does it mean?

The value passed to ‘--function’ defines what type of data pt-stalk should monitor. The default setting, ‘status’, is basically the output of ‘SHOW GLOBAL STATUS’. In this case, the value passed to ‘--variable’ defines which counter from the status output should be monitored - by default it’s Threads_running.

Another option you can pass to ‘--function’ is ‘processlist’. This will make pt-stalk focus on the contents of ‘SHOW PROCESSLIST’. In that case, ‘--variable’ defines which column of the output is monitored, together with the argument passed to ‘--match’ - the latter defines a pattern to match against the chosen column.
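
For example, here is a sketch of a processlist-based trigger which collects data when more than 20 queries are stuck in the ‘statistics’ state (the threshold and the state are just examples):

pt-stalk --password=pass --function=processlist --variable=State --match=statistics --threshold=20 --cycles=5 --iterations=1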

The last option you can pass to ‘--function’ is a file. This allows you to build much more complex methods of triggering pt-stalk by executing a user-defined script or function. We are going to cover this, along with an example of how to use it, later in this post.

‘--threshold’ defines when pt-stalk triggers data collection. In general, the idea is that each of the methods we just discussed ends up returning an integer value. This can be a counter from SHOW GLOBAL STATUS, for example the number of currently running threads. When watching the processlist output, it can be, for example, the number of queries in a given query state (e.g. ‘statistics’ or ‘query end’) or the number of queries for a given MySQL user. If you define a function via an external file, it also has to return some kind of integer to make it work.

In our example, the threshold is set to 25 so if more than 25 threads are running at the same time, the threshold will be crossed.

Another option we’ve passed to pt-stalk is ‘--cycles’, with a default setting of 5. This number defines how many times the threshold has to be crossed before the actual data collection is triggered. Sometimes it’s not easy to differentiate between the normal workload and a condition related to an issue just by looking at the threshold value. Let’s say that in our case the workload is expected to spike above 25 threads running simultaneously from time to time, but when the issue happens, it stays high for a much longer time. That’s why we set ‘--cycles’ to 5 - a single spike won’t trigger data collection, but if the threshold is crossed for a longer period of time, pt-stalk will start doing its job. The check happens every ‘--interval’ seconds; by default it’s set to 1 second.

The last option we used is ‘--iterations’. It stops pt-stalk after it has collected data that many times. In our case, data collection was triggered twice and then pt-stalk terminated.

Example output from pt-stalk may look like:

2015_12_03_12_41_51 Starting /usr/bin/pt-stalk --function=status --variable=Threads_running --threshold=25 --match= --cycles=5 --interval=1 --iterations=2 --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
2015_12_03_12_41_54 Check results: status(Threads_running)=43, matched=yes, cycles_true=1
2015_12_03_12_41_55 Check results: status(Threads_running)=33, matched=yes, cycles_true=2
2015_12_03_12_41_56 Check results: status(Threads_running)=51, matched=yes, cycles_true=3
2015_12_03_12_41_57 Check results: status(Threads_running)=31, matched=yes, cycles_true=4
2015_12_03_12_42_01 Check results: status(Threads_running)=44, matched=yes, cycles_true=1
2015_12_03_12_42_02 Check results: status(Threads_running)=43, matched=yes, cycles_true=2
2015_12_03_12_42_05 Check results: status(Threads_running)=65, matched=yes, cycles_true=1
2015_12_03_12_42_06 Check results: status(Threads_running)=47, matched=yes, cycles_true=2
2015_12_03_12_42_07 Check results: status(Threads_running)=65, matched=yes, cycles_true=3
2015_12_03_12_42_08 Check results: status(Threads_running)=65, matched=yes, cycles_true=4
2015_12_03_12_42_09 Check results: status(Threads_running)=65, matched=yes, cycles_true=5
2015_12_03_12_42_09 Collect 1 triggered
2015_12_03_12_42_09 Collect 1 PID 18267
2015_12_03_12_42_09 Collect 1 done
2015_12_03_12_42_09 Sleeping 300 seconds after collect
2015_12_03_12_47_09 Check results: status(Threads_running)=65, matched=yes, cycles_true=1
2015_12_03_12_47_11 Check results: status(Threads_running)=52, matched=yes, cycles_true=1
2015_12_03_12_47_13 Check results: status(Threads_running)=35, matched=yes, cycles_true=1
2015_12_03_12_47_14 Check results: status(Threads_running)=65, matched=yes, cycles_true=2
2015_12_03_12_47_15 Check results: status(Threads_running)=66, matched=yes, cycles_true=3
2015_12_03_12_47_16 Check results: status(Threads_running)=32, matched=yes, cycles_true=4
2015_12_03_12_47_17 Check results: status(Threads_running)=65, matched=yes, cycles_true=5
2015_12_03_12_47_18 Collect 2 triggered
2015_12_03_12_47_18 Collect 2 PID 3466
2015_12_03_12_47_18 Collect 2 done
2015_12_03_12_47_18 Waiting up to 90 seconds for subprocesses to finish...
2015_12_03_12_47_49 Exiting because no more iterations
2015_12_03_12_47_49 /usr/bin/pt-stalk exit status 0

As you can see, the threshold was crossed a number of times without reaching five consecutive checks; data collection was only triggered once the condition held for five checks in a row.

Of course, this is not all of the variables you can pass to pt-stalk. Some more of them can be found at the beginning of the pt-stalk output. There are also others, not mentioned there. We’ll now cover a couple of useful settings.

‘--disk-bytes-free’ (default of 100M) and ‘--disk-pct-free’ (default of 5). Pt-stalk collects lots of data and collection may happen pretty often (by default, at most every 5 minutes - there’s a ‘--sleep’ variable with a default of 300 seconds which defines how long pt-stalk should wait after a data collection before the next check is executed), so disk utilization can become an issue. These variables allow you to set minimum thresholds for both free disk space and the percentage of free disk space which pt-stalk keeps on the system. ‘--disk-bytes-free’ accepts suffixes like k, M, G or T. Once you set those variables, pt-stalk will remove old data as needed, in order to meet the condition.

‘--retention-time’, by default set to 30. This defines how many days pt-stalk will keep the data before it is discarded. This is another means of keeping enough disk free.

We’d suggest tuning those disk-related variables according to your monitoring settings - you don’t want to be woken up at 3am just because pt-stalk generated too many files on disk and triggered a disk space alert.

‘--collect-gdb’, ‘--collect-oprofile’, ‘--collect-strace’ and ‘--collect-tcpdump’ - these options enable additional ways of collecting data. Some of them are more intrusive and may have a bigger impact on the system (like gdb and strace) - in general, unless you are positive you know what you are doing, you probably don’t want to enable any of them. Some additional software may need to be installed in order to make these options work.

‘--collect’ - by default pt-stalk collects the data when the condition is triggered. It may be that you want it to just monitor the system. In that case, you can negate this option and start pt-stalk with ‘--no-collect’ - it won’t collect any data, it will just print the checks when they cross the defined threshold.

Of course, pt-stalk can also be daemonized - this is the second way you can run it. For that, you can use the ‘--daemonize’ option to fork it into the background. In this mode it will write to the log file defined by the ‘--log’ option; by default it’s /var/log/pt-stalk.log.
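
Putting the housekeeping and daemon options together, a typical long-running setup might look like the invocation below - the values are examples and should be tuned to your disk space and alerting thresholds:

pt-stalk --daemonize --password=pass \
  --function=status --variable=Threads_running --threshold=25 --cycles=5 \
  --disk-bytes-free=500M --disk-pct-free=10 --retention-time=7 \
  --dest=/var/lib/pt-stalk --log=/var/log/pt-stalk.log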

Sometimes we want to collect the data right now, no matter what the conditions are. This is the third way we can execute pt-stalk. By default it runs with ‘--stalk’ enabled; this option is negatable and when we use ‘--no-stalk’, pt-stalk won’t check the state of the system at all. Instead, it will proceed straight to collecting the data. It’s very useful to set ‘--iterations’ to some value here, otherwise pt-stalk will run indefinitely, triggering a data collection every ‘--sleep’ seconds.
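
So, to grab a single ad-hoc snapshot of diagnostic data right away, something like this should do - one iteration, no stalking, with the output landing in the default /var/lib/pt-stalk directory:

pt-stalk --no-stalk --iterations=1 --password=pass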

Extending pt-stalk

As we mentioned earlier in this post, pt-stalk can be extended to some extent. You can do this in two ways.

First, there’s an option to execute something in addition to pt-stalk. We can use ‘--plugin’ for that. The process is fairly simple - you pass a file name to this option. The file should be a bash script - it doesn’t have to be executable. It should define at least one of the following bash functions:

  • before_stalk - triggered before stalking
  • before_collect - triggered before collecting data
  • after_collect - triggered after data collection but before the ‘sleep’ phase
  • after_collect_sleep - triggered after data collection and after the ‘sleep’ phase
  • after_interval_sleep - triggered after sleeping for defined ‘--interval’, between checks
  • after_stalk - triggered after stalking (when ‘--iterations’ are defined)

Such a function could look like this, for example:

before_collect() {
   # run something here, an external script maybe?
}

As you can see, you can execute pretty much anything. It can be a binary, a perl/python/go script implementing some additional logic, or just bash if that’s what you prefer - anything which allows you to do what you want.

For example, such scripts can implement additional ways of collecting data (using the before_collect function). They can run sanity checks on access rights to the pt-stalk log and to the destination directory where collected data ends up (using before_stalk). They can parse the output and generate a report out of it (using after_collect). It’s entirely up to you how you use them.

Pt-stalk can send an email notification for every collection using ‘--notify-by-email’, but with a plugin you can go further and, for example, ship the whole set of collected data once the collection finishes.
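
A minimal sketch of such a plugin is shown below - it assumes the default --dest of /var/lib/pt-stalk and a working local ‘mail’ command, and the recipient address is obviously just a placeholder. Save it to a file and pass that file to ‘--plugin’.

after_collect() {
   # Archive the most recent collection (files share a timestamp prefix in the dest directory)
   local dest=/var/lib/pt-stalk
   local prefix=$(ls -1t ${dest} | head -1 | cut -d- -f1)
   tar -czf /tmp/pt-stalk-${prefix}.tar.gz ${dest}/${prefix}-*
   # Send a short notification pointing at the archive - adjust the address and transport to your setup
   echo "pt-stalk collected data on $(hostname), archive: /tmp/pt-stalk-${prefix}.tar.gz" \
       | mail -s "pt-stalk collection ${prefix} on $(hostname)" dba@example.com
}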

If you are not satisfied with the conditions pt-stalk allows you to use for checks, it’s very easy to implement your own. As we mentioned earlier, ‘--function’ accepts a file name. It works in a very similar way to ‘--plugin’ - it has to be a bash script that defines a trg_plugin function. Again, it doesn’t have to be an executable file, just a simple plain text script like:

trg_plugin() {
        # Execute something here, it has to return integer
        # you can leverage $EXT_ARGV which passes access details:

        mysql $EXT_ARGV -e "SELECT trx_id FROM INFORMATION_SCHEMA.INNODB_TRX WHERE trx_started < DATE_ADD(NOW(), INTERVAL -1 MINUTE);" | grep -v trx_id | wc -l
}

This particular function will calculate how many transactions are running for more than one minute. We can use it with pt-stalk to, for example, run a data collection when there’s at least one such long running transaction:

root@ip-172-30-4-156:~# pt-stalk --password=pass --function=test.sh --threshold=0 --cycles=1 --iterations=1
2015_12_03_15_45_20 Starting /usr/bin/pt-stalk --function=test.sh --variable=Threads_running --threshold=0 --match= --cycles=1 --interval=1 --iterations=1 --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
2015_12_03_15_45_20 Check results: test.sh(Threads_running)=1, matched=yes, cycles_true=1
2015_12_03_15_45_20 Collect 1 triggered
2015_12_03_15_45_20 Collect 1 PID 5967
2015_12_03_15_45_20 Collect 1 done
2015_12_03_15_45_20 Waiting up to 90 seconds for subprocesses to finish...
2015_12_03_15_45_53 Exiting because no more iterations
2015_12_03_15_45_53 /usr/bin/pt-stalk exit status 0

We had one such transaction (which was more than the threshold of 0), so data was collected.

As mentioned, you can extend it by calling scripts. Let’s take a look at the following example. We’d like to trigger pt-stalk when the number of queries executed per second reaches some level. In the output of “SHOW GLOBAL STATUS” you will find the ‘Queries’ counter, but it’s a counter, not a gauge - it only increments over time. We need to grab two of its values and subtract the first from the second; we can use the mysqladmin tool for that. To accomplish this, we’ve prepared a simple setup. First, we have our ‘test.sh’ file with the following contents:

trg_plugin() {
    # Execute something here, it has to return integer
    # you can leverage $EXT_ARGV which passes access details:

    ./qps_trigger.pl $EXT_ARGV
}

What it does is fairly simple - it calls a perl script and passes $EXT_ARGV to it for the MySQL access credentials.

The Perl script looks as below:

#!/usr/bin/perl

# Rebuild the MySQL connection options passed in via $EXT_ARGV
my $conn_str = '';

foreach my $arg (@ARGV) {
    $conn_str = $conn_str . $arg . ' ';
}

# Run mysqladmin twice with relative output and keep only the 'Queries' lines
open(OUT, "mysqladmin " . $conn_str . "-c 2 -ri 1 ext | grep Queries |") || die "Failed: $!\n";
@out_arr = ();
while ( <OUT> )
{
   push(@out_arr, $_);
}

# The second line contains the per-second delta; the counter value is the 4th field
my @cols = split(" ", $out_arr[1]);
print $cols[3];

We first build a connection string out of the arguments passed to the perl script, and then we use mysqladmin to execute ‘SHOW GLOBAL STATUS’ twice, using the -r (relative) option so that the second sample shows the difference from the first:

root@ip-172-30-4-156:~# mysqladmin --user=root --password=pass -c 2 -ri 1 ext | grep Queries
| Queries                                       | 61002165                                              |
| Queries                                       | 1628                                                  |

The second line of the output contains the value we want to check. The script grabs it and prints it. Finally, we run our pt-stalk with ‘--function=test.sh’ and we want it to trigger when MySQL does more than 2000 queries per second.

root@ip-172-30-4-156:~# pt-stalk --password=pass --user=root --function=test.sh --threshold=2000 --cycles=3 --iterations=1
2015_12_03_16_25_40 Starting /usr/bin/pt-stalk --function=test.sh --variable=Threads_running --threshold=2000 --match= --cycles=3 --interval=1 --iterations=1 --run-time=30 --sleep=300 --dest=/var/lib/pt-stalk --prefix= --notify-by-email= --log=/var/log/pt-stalk.log --pid=/var/run/pt-stalk.pid --plugin=
2015_12_03_16_25_41 Check results: test.sh(Threads_running)=2482, matched=yes, cycles_true=1
2015_12_03_16_25_55 Check results: test.sh(Threads_running)=2936, matched=yes, cycles_true=1
2015_12_03_16_25_58 Check results: test.sh(Threads_running)=2068, matched=yes, cycles_true=2
2015_12_03_16_26_04 Check results: test.sh(Threads_running)=2235, matched=yes, cycles_true=1
2015_12_03_16_26_12 Check results: test.sh(Threads_running)=2009, matched=yes, cycles_true=1
2015_12_03_16_26_28 Check results: test.sh(Threads_running)=2382, matched=yes, cycles_true=1
2015_12_03_16_26_30 Check results: test.sh(Threads_running)=2080, matched=yes, cycles_true=2
2015_12_03_16_26_36 Check results: test.sh(Threads_running)=2163, matched=yes, cycles_true=1
2015_12_03_16_26_39 Check results: test.sh(Threads_running)=2083, matched=yes, cycles_true=2
2015_12_03_16_26_41 Check results: test.sh(Threads_running)=2346, matched=yes, cycles_true=3
2015_12_03_16_26_41 Collect 1 triggered
2015_12_03_16_26_41 Collect 1 PID 28229
2015_12_03_16_26_41 Collect 1 done
2015_12_03_16_26_41 Waiting up to 90 seconds for subprocesses to finish...
2015_12_03_16_27_14 Exiting because no more iterations
2015_12_03_16_27_14 /usr/bin/pt-stalk exit status 0

As you can see, pt-stalk is very flexible in terms of what can be done with it (and under what conditions). In the next post in the series, we are going to talk about what data pt-stalk can collect for you and what kind of information you can derive from this data.

Time to vote! Severalnines talks and tutorials for Percona Live 2016!

The Percona Live Data Performance Conference (for MySQL and MongoDB users and more) is coming up in just a few months and talk submissions have been going strong judging by the social media activity.

As you might have seen from various communications, Percona are asking participants to vote upfront for the tutorials and talks that have been submitted for consideration in the conference programme.

We’ve been busy ourselves with submissions and we’d like to ask you to have a look at the content we submitted. If you like what you see and would like to find out more in Santa Clara, then please vote for your preferred Severalnines talks and/or tutorials below!

Thank you and we look forward to seeing you in April!

Tutorials

Become a MySQL DBA

This hands-on tutorial is intended to help you navigate your way through the steps that lead to becoming a MySQL DBA. We are going to talk about the most important aspects of managing MySQL infrastructure and we will be sharing best practices and tips on how to perform the most common activities.
Vote for this tutorial!

Become a Polyglot Persistence DBA with ClusterControl

This tutorial will cover the four major operational challenges when extending your infrastructure from only MySQL to deploying other storage backends: deployment, management, monitoring and scaling … and how to deal with them using Severalnines’ ClusterControl software. We will cover MySQL, PostgreSQL and MongoDB storage backends and provide a setup using virtual machines which you can freely test on.
Vote for this tutorial!

Talks

Docker and Galera: Stateful in a stateless World!

Docker is becoming more mainstream and adopted by users as a method to package and deploy self-sufficient applications in primarily stateless Linux containers. It's a great toolset on top of OS-level virtualization (LXC, a.k.a. containers) and plays well in the world of microservices. There are a number of ways to provide persistent storage in Docker containers, and in this presentation we will talk about how to set up a persistent data service with Docker that can be torn down and brought up across hosts and containers.
Vote for this talk!

Load Balancers for MySQL: an overview and comparison of options

This session aims to give a solid grounding in load balancer technologies for MySQL and MariaDB. We will review the wide variety of open-source options available - from application connectors (php-mysqlnd, jdbc), through TCP reverse proxies (HAProxy, Keepalived, Nginx), to SQL-aware load balancers (MaxScale, ProxySQL, MySQL Router) - and look at what considerations you should make when assessing their suitability for your environment.
Vote for this talk!

MySQL (NDB) Cluster - Best Practices

In this session we will talk about the core architecture and design principles of NDB Cluster, the APIs for data access (SQL and NoSQL interfaces), important configuration parameters, and best practices for indexing and schema design. We will also compare performance between MySQL Cluster 7.4 and Galera (MySQL 5.6), and discuss how to best make use of the feature set of MySQL Cluster 7.4.
Vote for this talk!

Performance Analysis and Auto-tuning of MySQL using DBA-Minions

In this session, we will introduce you to a new tool that can be used to develop DBA-Minions. The tool is available freely to the wider MySQL community, and a number of DBA-Minions for analysing database performance are already available to download from GitHub. At the end of the session, attendees will learn how to use these DBA-Minions, modify them, or create new ones using an integrated IDE.
Vote for this talk!

How to automate, monitor and manage your MongoDB servers

The business model of the company behind MongoDB is to sell premium support and administration tools to maintain and monitor MongoDB. As an alternative there are (open source) solutions that could make your life as a DBA easier. In this session, we will go beyond the deployment phase and show you how you can automate tasks, how to monitor a cluster and how to manage MongoDB overall.
Vote for this talk!

Polyglot Persistence for the MySQL & MongoDB DBA

The introduction of DevOps in organisations has changed the development process, and perhaps introduced some challenges. Developers, in addition to their own preferred programming languages, also have their own preference for backend storage. Extending your infrastructure from only MySQL, to deploying other storage backends like MongoDB and PostgreSQL, implies you have to also monitor, manage and scale them. In this session, we will show you how!
Vote for this talk!

About the speakers

Alex Yu

Alex Yu, VP Products, Severalnines

Alex is the VP of Products at Severalnines, responsible for all product related strategy and operations. Prior to Severalnines, Alex was Master Principal Sales Consultant at MySQL/Sun Microsystems/Oracle in the APAC region, where he worked with some of the region's largest telecoms service providers and network equipment manufacturers to build massively scalable database infrastructures. He previously held key development roles in various startups, and was part of the original MySQL Cluster development team at Ericsson Alzato, which MySQL acquired in 2003.

Johan Andersson

Johan Andersson, CTO, Severalnines

Johan is CTO at Severalnines, a company that enables developers to easily deploy, manage, monitor and scale highly-available MySQL clusters in the data center, in hosted environments and on the cloud. Prior to Severalnines, Johan worked at MySQL/Sun/Oracle and was the Principal Consultant and lead of the MySQL Clustering and High Availability consulting group, where he designed and implemented large-scale MySQL systems at key customers.

Art van Scheppingen

Art van Scheppingen, Senior Support Engineer, Severalnines

Art is a pragmatic MySQL and database expert with over 15 years of experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he kept a broad view of the whole database environment: from MySQL to Couchbase, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, FOSDEM) and related meetups.

Ashraf Sharif

Ashraf Sharif, System Support Engineer, Severalnines

Ashraf is a System Support Engineer at Severalnines. He was previously involved in the hosting world and the LAMP stack, where he worked as a principal consultant and head of a support team, delivering clustering solutions for large websites in the South East Asia region. His professional interests are system scalability and high availability.

See you all in Santa Clara!

Become a ClusterControl DBA: Managing your logfiles

Earlier in the blog series, we touched upon the deployment of clustering/replication (MySQL / Galera, MySQL Replication, MongoDB & PostgreSQL), management & monitoring of your existing databases and clusters, performance monitoring and health, how to make your setup highly available through HAProxy and MaxScale, and how to prepare yourself for disasters by scheduling backups. In the last post, we covered how to manage your database configuration files and described the new configuration management interface introduced in ClusterControl 1.2.11.

Another enhancement in ClusterControl 1.2.11 is the addition of system log files. Instead of having to log into each and every node in a cluster, you can now conveniently browse and read the mysqld and mongod log files of every node from within ClusterControl.

Today's blog post will cover the ClusterControl log section, with all the tools available in ClusterControl and how to use them to your benefit. We will also cover how to grab all the necessary log files when troubleshooting issues together with the Severalnines support team.

Cluster Jobs

The Cluster Jobs section contains the output of the various jobs that are run on a cluster. You can find the cluster specific jobs under Cluster > Logs > Cluster Jobs. The output of a job is, in a certain sense, just a log file detailing the steps executed in that job. Normally you would have no need to look at the output of these jobs. But should a certain job not succeed, then this is the first place to look for clues.

In this overview you can immediately see all jobs and their status. For instance here you can see that a backup is currently running on 10.10.11.11.

We can also spot a failed job. If we want to know why it failed, we can click on the entry and get the job output in the view below.

In the job details, we can look at the exit code of each step to trace back to the beginning of the problem. In this case, the first entry with an exit code of 1 is the ssh command to the new host. Apparently the CMON controller is unable to establish an ssh session to the new host and this is something we can resolve.

CMON Log files

The next place to look is the CMON log files. You can find them under Cluster > Logs > CMON Logs. Here you will find the log entries of all scheduled jobs CMON is running, like crons and reports. Any failure of nodes or cluster degradation can also be found here. So, for instance, if a node in your cluster is down, this is the place to look for hints.

The example above shows error entries indicating that one node in the cluster cannot be reached, along with informative lines telling you that the cluster has 1 dead node and 2 nodes that are alive.

You can sort and filter the log entries as well.

MySQL log files

As mentioned earlier, we have added the collection of the MySQL log files in ClusterControl 1.2.11. The files included are the MySQL error log and the innobackup backup and restore log files. You can find them under Cluster > Logs > System Logs.

All log files are being collected by ClusterControl every 30 minutes and you can check the “Last Updated” time at the bottom of the overview. If you are in immediate need of the log files you can push the “Refresh Logs” button to trigger a job in ClusterControl to collect the latest lines from the log files.

Also if you wish to have the log files collected more (or less) often, you can change this in Cluster > Settings > General Settings or change this in the cluster configuration file directly and reload the CMON service.

The MySQL error log can be very helpful to find and resolve issues within your cluster. We published a blog post about the ins and outs of the MySQL error log a few weeks ago.

Next to the MySQL error log, we also provide the innobackup backup and restore logs. These log files are created by the process that provides a node with the data from its master (or SST from another node in Galera’s case). If anything goes wrong during loading the data, these log files will give you a good clue about what went wrong.

To give an example, suppose we are forcing an SST in Galera and this fails. Firstly we can find the failed SST error in the MySQL error log:

As you can see, first 10.10.11.12 gets selected as the donor, the MySQL data directory gets emptied and then the data is transferred. So the next step would be to check the innobackup backup log on the donor:

We can see that innobackupex made an attempt to make a backup but failed to connect to MySQL. It used the root account and password in this case, so this indicates the stored credentials for the SST (wsrep_sst_auth) are invalid. In this case, it is quite obvious why it failed. But in less obvious cases, these log files are a great help in resolving an issue.
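
If invalid SST credentials are indeed the culprit, the fix is to make sure wsrep_sst_auth points to a MySQL account that actually exists on the donor and has the privileges required by the backup tool. A minimal my.cnf sketch, with placeholder credentials and assuming an xtrabackup-based SST method:

[mysqld]
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth   = sst_user:sst_password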

Mongodb log files

Just as described above, the MongoDB log files are collected by ClusterControl. You can find them under Cluster > Logs > System Logs.

All log files are being collected by ClusterControl every 30 minutes and you can check the “Last Updated” time at the bottom of the overview. If you are in immediate need of the log files you can push the “Refresh Logs” button to trigger a job in ClusterControl to collect the latest lines from the log files.

Error reports

Whenever you are not able to resolve your issues using the log files as described above and would like us to have a look, it is always handy to include an error report. You can find this under Cluster > Logs > Error reports. The error report is basically a tarball that contains a collection of log files, job lists and job details from the cluster.

You can create a job that will generate an error report by clicking on the “Create Error Report” button in the interface. This will give you a dialogue that asks whether you want to store the report on the web server or not. If you store the reports on the web server, you can download the report once the job has succeeded. Otherwise you can specify the location on the ClusterControl node where you want the report to be stored.

You can attach this report to the support ticket you are creating, so we have all the information at hand.

Final thoughts

With the combined insights you can retrieve from the cluster jobs, CMON logs and system log files, you should be able to narrow down issues more easily. Combine that insight with the knowledge from our blog post on the MySQL error log, and this should help you not only identify the issue but also resolve it yourself.

Become a MySQL DBA blog series - Troubleshooting with pt-stalk - part 2

In our last post, we showed you how to use pt-stalk to help gather a bunch of data that can be used for troubleshooting. But what type of data does it collect? How do we parse the files to make sense of all that data?

This is the twentieth installment in the ‘Become a MySQL DBA’ blog series. Our previous posts in the DBA series include:

After reading the previous post, you should know how to use pt-stalk to watch MySQL metrics and trigger data collection when a predefined situation happens. You should also know about a couple of different ways in which you can collect the data, and how you can extend pt-stalk to cover some more complex cases.

Today, we'll talk about the data you collected - what kind of data does pt-stalk provide you with, and how can you utilize it? Let's start by answering the question: where and how does pt-stalk store its data?

By default, it uses the /var/lib/pt-stalk directory, but you can change it using the --dest option when invoking pt-stalk. If you take a look into the directory, you'll see multiple files following the pattern of:

‘2015_12_03_15_45_20-*’

Basically, every file from a single data set uses the same timestamp, which you'll find very useful when searching through different incidents. Each of the files contains a specific type of data collected by a particular process executed by pt-stalk. Some of the files are created every time, others only when a particular condition is matched. An example of such a file is '*-lock-waits' - a file created only when at least one transaction was in the 'LOCK_WAIT' state. Otherwise you won't find it in the output.
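
For example, assuming the default --dest location, you can list everything belonging to a single incident by globbing on its timestamp prefix, or list the distinct incidents themselves (the timestamp below is just an example):

# all files from one incident (data set)
ls /var/lib/pt-stalk/2015_12_03_15_45_20-*

# distinct incident timestamps present in the directory
ls /var/lib/pt-stalk/ | cut -d- -f1 | sort -u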

We will not cover all of the files created by pt-stalk in detail - there are too many of them and some are useful only in edge cases. We want to focus on those which are used most often.

Disk status

pt-stalk collects some data about the state of the storage layer. The most important aspects are stored in the following files.

*-df

A sample of data from this file may look like this:

TS 1449157522.002282558 2015-12-03 15:45:22
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/xvda1       8115168  2286260   5393632  30% /
none                   4        0         4   0% /sys/fs/cgroup
udev             2018468       12   2018456   1% /dev
tmpfs             404688      412    404276   1% /run
none                5120        0      5120   0% /run/lock
none             2023428        0   2023428   0% /run/shm
none              102400        0    102400   0% /run/user
/dev/xvdf      104806400 11880864  92925536  12% /var/lib/mysql
TS 1449157523.008747734 2015-12-03 15:45:23
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/xvda1       8115168  2286264   5393628  30% /
none                   4        0         4   0% /sys/fs/cgroup
udev             2018468       12   2018456   1% /dev
tmpfs             404688      412    404276   1% /run
none                5120        0      5120   0% /run/lock
none             2023428        0   2023428   0% /run/shm
none              102400        0    102400   0% /run/user
/dev/xvdf      104806400 11879236  92927164  12% /var/lib/mysql

This is simply the output of the df command, executed every second for a duration of 30 seconds. This is a pretty common pattern in pt-stalk - it collects data samples over time to make it easier to compare changes. In this particular case, we are presented with information about disk space in the system - this may come in handy when, for example, binary logs filled the disk, triggering some weird behavior.

*-diskstats

This file contains samples from /proc/diskstats collected over 30 seconds. The raw output is not very useful, but we can use it together with pt-diskstats, another tool from Percona Toolkit, to present some nice info about I/O utilization. Here's a sample of this output.

root@ip-172-30-4-156:/var/lib/pt-stalk# pt-diskstats --columns-regex='_s|time|cnc|prg' 2015_12_03_15_45_20-diskstats

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  1.0 xvda       0.0     0.0    0.0    37.8     0.6    2.3      0    37.8   31.2   2.0
  1.0 xvda1      0.0     0.0    0.0    37.8     0.6    2.3      0    37.8   31.2   2.0
  1.0 xvdf      48.7     0.7    3.8   217.6    15.3   37.8      0   266.3  127.3   2.4

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  2.0 xvda       0.0     0.0    0.0     0.0     0.0    0.0      0     0.0    0.0   0.0
  2.0 xvda1      0.0     0.0    0.0     0.0     0.0    0.0      0     0.0    0.0   0.0
  2.0 xvdf      73.5     1.0    0.6   136.3     5.4    2.0      0   209.8    7.7   1.9

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  3.0 xvda       0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  3.0 xvda1      0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  3.0 xvdf      73.8     0.9    0.5   141.5     5.8    1.8      0   215.3    6.3   2.1

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  4.0 xvda       0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  4.0 xvda1      0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  4.0 xvdf      57.1     0.8    1.0   192.3     8.9    3.7      0   249.4   12.1   1.7

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  5.0 xvda       0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  5.0 xvda1      0.0     0.0    0.0     1.0     0.0    0.0      0     1.0    0.0   0.0
  5.0 xvdf      79.0     1.0    0.5   180.0    10.9    4.4    110   259.0   37.5   1.8

  #ts device    rd_s rd_mb_s rd_cnc    wr_s wr_mb_s wr_cnc in_prg    io_s  qtime stime
  6.0 xvda       0.0     0.0    0.0     5.9     0.1    0.1      0     5.9    0.5   2.1
  6.0 xvda1      0.0     0.0    0.0     5.9     0.1    0.1      0     5.9    0.5   2.1
  6.0 xvdf      71.4     1.0    3.6   234.0    19.0   40.3      0   305.4  106.2   2.4

pt-diskstats is a subject for another blog post. We'll just mention that this tool gives you the ability to aggregate and filter the most important I/O data from your system. Looking at our case here, it's pretty clear we are suffering from some kind of I/O contention - 'qtime', a column which tells us how long an I/O call waits in the scheduler's queue, spikes to more than 100ms. This is high even for a spindle. This kind of data, combined with knowledge of the limits of your hardware, gives you better insight into whether I/O is the culprit or just one of the symptoms of a problem.

Network data

*-netstat, *-netstat_s

Those files contain info about currently open connections (*-netstat) and a summary of network-related statistics (*-netstat_s). This kind of data can be very useful when dealing with network issues.

Memory data

*-meminfo

This file contains samples extracted from /proc/meminfo. Samples are taken every second, collecting information about memory utilization on the system.

TS 1449157522.002282558 2015-12-03 15:45:22
MemTotal:        4046860 kB
MemFree:          121688 kB
Buffers:            6768 kB
Cached:          2353040 kB
SwapCached:            0 kB
Active:          2793796 kB
Inactive:         986648 kB
Active(anon):    1420712 kB
Inactive(anon):      396 kB
Active(file):    1373084 kB
Inactive(file):   986252 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:             14476 kB
Writeback:         15772 kB
AnonPages:       1420692 kB
Mapped:           145200 kB
Shmem:               472 kB
Slab:              96028 kB
SReclaimable:      80948 kB
SUnreclaim:        15080 kB
KernelStack:        2160 kB
PageTables:         9996 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2023428 kB
Committed_AS:    1715484 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       14164 kB
VmallocChunk:   34359717888 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1335296 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       26624 kB
DirectMap2M:     4298752 kB

*-pmap

More detailed information about the memory allocation of the mysqld process, collected using pmap, can be found in this file.

*-slabinfo

Here you'll see information collected from /proc/slabinfo. Again, samples are collected every second, showing how slab cache utilization changed over time.

*-procvmstat

This file contains samples of data from /proc/vmstat, collected every second.

CPU and kernel activity

*-procstat

Information about CPU activity as shown in /proc/stat.

*-interrupts

In this file, you can find information about the different kinds of interrupts, their counts and how they are split between CPU cores. It can be pretty useful when trying to identify weird cases of high system load or, sometimes, hardware issues.

Miscellaneous system data

*-lsof

This particular file contains the output of "lsof -p $(pidof mysqld)". We can go through the list of files opened by mysqld; this also includes any TCP connections which may be open at that time. Data like this may be interesting when, let's say, you want to confirm which exact MySQL version you are running (you'll see which libraries were opened by the mysqld process).
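
For example, to check which binary and shared libraries the server had open at the time (one quick, if rough, way to confirm the exact build that was running), you can simply grep the collected file - the timestamp below is just an example:

grep -E 'mysqld|\.so' /var/lib/pt-stalk/2015_12_03_15_45_20-lsof | less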

*-ps

This file contains the output of the ps command, listing processes running at the time of the incident. It can be very useful for catching different types of jobs executed directly on the MySQL host, and it gives a nice insight into what is running on the system.

*-top

This file stores collected output of the ‘top’ command - data very similar to what you can find in the ‘*-ps‘ file. An important addition is a snapshot of the CPU utilization per process.

*-vmstat and *-vmstat-overall

These files store data collected using the vmstat command. A sample may look like this:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
71  1      0 122156   6768 2352744    0    0   166  1267   83  200 20 18 47  4 11
 1 11      0 124996   6768 2353260    0    0   684 19456 7023 6694 37 34  0  8 21
56  1      0 125676   6780 2352220    0    0   636  6259 7830 6733 37 36  0  4 23
 0  1      0 125304   6780 2353312    0    0   920  7118 6233 10782 36 30 12  5 17
 0  3      0 122824   6780 2355568    0    0   988  5474 6733 10108 31 27 13 12 16
39  0      0 122396   6780 2356252    0    0   844  8618 8251 10843 30 24 25  9 13
 0  4      0 123236   6780 2355080    0    0   896 22482 11639 9266 30 29  9 18 14
58  0      0 122072   6788 2356484    0    0  1036  6446 8043 11345 33 25  8 20 15
11  2      0 124680   4460 2355928    0    0  1096  9380 8784 11449 36 33  3  8 19
22  0      0 124172   4460 2356952    0    0  1068  5922 8303 12137 36 32  3 11 18
15  0      0 124008   4460 2355200    0    0   796  5764 8256 10389 36 36  3  3 22
54  0      0 125920   4460 2356124    0    0  1012  8041 8449 12298 38 31  2 11 18
61  0      0 122172   4468 2358724    0    0   920  5811 7817 11650 37 33  2  8 20
51  0      0 124420   4248 2357304    0    0   920 10804 6565 10155 32 25  8 21 14
10  0      0 123900   4248 2358144    0    0  1252  5554 8922 12652 38 30  5 10 18
30  0      0 125356   4248 2356552    0    0   760  4208 8296 9757 32 26 22  5 15
27  2      0 123956   4248 2357688    0    0  1420  7164 6986 12300 41 32  4  4 20
 3  1      0 121624   4256 2360016    0    0   912  6373 7653 10600 36 34  6  4 20
 0  0      0 123724   4248 2358048    0    0   748 10550 6878 10302 34 30  7 12 17
52  1      0 124720   4248 2356756    0    0  1068  3228 6470 10636 33 29 17  5 17
23  0      0 123984   4248 2357672    0    0  1072  5455 5990 11145 36 32  7  7 17

It’s a little bit of everything - running processes, memory and swap utilization, I/O and swap disk traffic, number of interrupts and context switches per second, CPU utilization. Simple data, but it can give you a good idea of how the system performed at the time of the incident.

Data related to MySQL

*-processlist

This is self-explanatory - this file contains the output of SHOW PROCESSLIST\G, one of the most important means of understanding the state of MySQL.
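
pt-sift (covered later in this post) summarizes this file for you, but a quick manual way to aggregate thread states is to count them straight from the \G-formatted output - a small sketch, using the example timestamp from this post and counting across all samples stored in the file:

grep -E '^ *State:' /var/lib/pt-stalk/2015_12_03_15_45_20-processlist | sort | uniq -c | sort -rn | head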

*-innodbstatus1 and *-innodbstatus2

Those files contain the output of SHOW ENGINE INNODB STATUS taken at the start and at the end of the data collection. SHOW ENGINE INNODB STATUS is a crucial tool for understanding the state of MySQL and of InnoDB as an engine. Having two files allows us to compare how the state changed. It would not be possible to explain, in a couple of sentences, all the ways this data can be useful to a DBA. In short, you'll find here information about running transactions and their state, info about the most recent deadlock, and locking contention on both the row level and the internal level (mutexes/latches). You can also use this data to get an insight into I/O activity for InnoDB. We covered some of these points in two past blog posts:

Monitoring - Understanding and Optimizing CPU-related InnoDB metrics

Monitoring - Understanding and Optimizing IO-related InnoDB metrics

*-lock-waits

Another important piece of data - this file contains information about transactions which were stuck in the ‘LOCK_WAIT’ state when data was collected. Sample output may look like this:

TS 1449157537.003406070 2015-12-03 15:45:37
*************************** 1. row ***************************
   who_blocks: thread 28242 from 172.30.4.23:14643
  idle_in_trx: 0
max_wait_time: 0
  num_waiters: 1
*************************** 1. row ***************************
    waiting_trx_id: 16480064
    waiting_thread: 28196
         wait_time: 0
     waiting_query: UPDATE sbtest1 SET k=k+1 WHERE id=500167
waiting_table_lock: `sbtest`.`sbtest1`
   blocking_trx_id: 16479993
   blocking_thread: 28242
     blocking_host: 172.30.4.23
     blocking_port: 14643
       idle_in_trx: 0
    blocking_query: COMMIT

As you can see, we have information about the transaction which was waiting - transaction ID and thread number, the query waiting for execution, and the table involved. We can also see who's locking - transaction ID, thread number, host, port and the blocking query. At the beginning you can also see a nice summary of who's blocking and how many other transactions are waiting (num_waiters) - pretty useful for identifying the most offending blockers.

The data collected here comes from the information_schema.innodb_lock_waits table.

*-transactions

This file contains a list of all transactions running at the time of the data collection. Sample output may look like this:

*************************** 43. row ***************************
                    trx_id: 16479993
                 trx_state: RUNNING
               trx_started: 2015-12-03 15:45:36
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 8
       trx_mysql_thread_id: 28242
                 trx_query: COMMIT
       trx_operation_state: NULL
         trx_tables_in_use: 0
         trx_tables_locked: 0
          trx_lock_structs: 4
     trx_lock_memory_bytes: 1184
           trx_rows_locked: 3
         trx_rows_modified: 4
   trx_concurrency_tickets: 0
       trx_isolation_level: REPEATABLE READ
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 10000
          trx_is_read_only: 0
trx_autocommit_non_locking: 0

As you may have noticed, this particular example contains the info about the transaction which was the blocker - we identified it in the *-lock-waits file. Knowing the thread or transaction ID, it's very easy to grep through the lock-waits data, transaction data, processlist and SHOW ENGINE INNODB STATUS output and connect all the pieces together, building a broad view of how the lock contention started and what might have been the culprit. This particular data comes from the information_schema.innodb_trx table and contains lots of info about each of the running transactions. You can find here information on currently running queries, the number of rows locked, when the transaction started, the transaction isolation level and many others.
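
For example, knowing the blocking thread from the *-lock-waits sample above (28242), you can quickly pull together everything pt-stalk captured about it - a sketch assuming the default --dest directory:

# which collected files mention the blocking thread at all
grep -l 28242 /var/lib/pt-stalk/2015_12_03_15_45_20-*

# its record in the transactions dump, with some context around the match
grep -B 7 -A 18 'trx_mysql_thread_id: 28242' /var/lib/pt-stalk/2015_12_03_15_45_20-transactions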

*-mutex-status1 and *-mutex-status2

These two files contain information collected using ‘SHOW ENGINE INNODB MUTEX’ at the beginning and at the end of the data collection process. Sample output may look like:

InnoDB  &space->latch   os_waits=1
InnoDB  &purge_sys->latch       os_waits=11760
InnoDB  &new_index->lock        os_waits=20451
InnoDB  &dict_operation_lock    os_waits=229
InnoDB  &space->latch   os_waits=9778
InnoDB  &log_sys->checkpoint_lock       os_waits=925
InnoDB  &btr_search_latch_arr[i]        os_waits=23502
InnoDB  table->sync_obj.rw_locks + i    os_waits=2

What you see here is a list of different mutexes and, for each of them, details on how many os_waits happened. If you do not have detailed tracking of waits enabled in the Performance Schema, this is the best way to understand which mutex causes the biggest contention for your workload.
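
Since the two files are snapshots taken at the start and at the end of the collection run, a simple diff (sketched below with the example timestamp) shows which os_waits counters actually grew during the incident:

diff /var/lib/pt-stalk/2015_12_03_15_45_20-mutex-status1 \
     /var/lib/pt-stalk/2015_12_03_15_45_20-mutex-status2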

*-mysqladmin

This file contains samples of the output of 'SHOW GLOBAL STATUS', executed using the mysqladmin tool. What we are looking at here is everything MySQL presents using this command - handlers, InnoDB stats, binlog stats, Galera stats (if you use Galera Cluster). Such data can be useful for investigating different types of problems. In fact, this is exactly the type of data DBAs would like to have at hand and review when needed. The main issue with this output is that MySQL presents some of the data as monotonically increasing counters, while we'd be more interested in how they change over time. It's not rocket science to build a script which performs such a conversion, but there's also no need to reinvent the wheel, as Percona Toolkit contains pt-mext, a tool which does exactly that.

You can use it in the following way to get nice, browsable, columnar output.

pt-mext -r -- cat /var/lib/pt-stalk/2015_12_03_15_45_20-mysqladmin | less -S

*-opentables1 and *-opentables2

Those files contain a list of opened tables in MySQL, at the beginning and at the end of the data collection. Please note that the current pt-stalk version at the time of writing (version 2.2.16) has a bug where it doesn’t print this data if MySQL contains more than 1000 tables.

*-variables

This file contains the output of 'SHOW GLOBAL VARIABLES' - a list of all runtime variables in MySQL at the time of data collection. There's not much to add here - such data is invaluable because, as you probably know, seeing a configuration file is not enough to tell what configuration MySQL is actually using. Many MySQL variables are dynamic and can be changed at any time, without making a change in my.cnf.
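
One practical trick, for example, is to diff the variables captured for two different incidents (or for two different nodes) to spot settings that were changed at runtime - a sketch using two of the example timestamps from this post:

diff /var/lib/pt-stalk/2015_12_03_15_45_20-variables \
     /var/lib/pt-stalk/2015_12_03_16_26_41-variables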

Tools which help to parse pt-stalk data

As you can see, pt-stalk brings you a considerable amount of data to work with. It may look like it's hard to go through this data, but it gets easier with experience. Additionally, the more familiar you are with your MySQL installation, the faster you'll be able to pinpoint possible issues. Still, it's a process of looking into tens of files to find the important data - a process which can be very time consuming. Luckily, there are tools that can help in this regard. We'll talk about two of them.

pt-sift

This tool is part of the Percona Toolkit, and it’s intended to give you an overview of the data collected by pt-stalk. Invocation is simple:

root@ip-172-30-4-156:~# pt-sift /var/lib/pt-stalk/

You are presented with a list of data sets stored in the directory you’ve chosen. You can use the latest one (default) or pass another one you want to review.

  2015_11_20_10_15_49  2015_11_20_10_34_20  2015_11_20_10_40_15
  2015_11_24_19_47_52  2015_11_24_19_48_03  2015_11_24_19_48_46
  2015_12_03_12_38_18  2015_12_03_12_42_09  2015_12_03_12_47_18
  2015_12_03_15_43_41  2015_12_03_15_45_20  2015_12_03_16_10_23
  2015_12_03_16_26_41

Select a timestamp from the list [2015_12_03_16_26_41] 2015_12_03_15_45_20

After you pick the sample, you’ll see a summary page:

======== ip-172-30-4-156 at 2015_12_03_15_45_20 DEFAULT (11 of 13) ========
--diskstats--
  #ts device    rd_s rd_avkb rd_mb_s rd_mrg rd_cnc   rd_rt    wr_s wr_avkb wr_mb_s wr_mrg wr_cnc   wr_rt busy in_prg    io_s  qtime stime
 {29} xvdf      75.9    13.1     1.0     0%    1.2    15.5   151.1    47.2     7.0    29%    5.3    24.6  63%      0   227.0   18.4   2.2
 xvdf  0% 30% 15% 20% 15% 20% 30% 25% 20% 25% . . 30% 25% 15% 25% 20% 15% . 20% 15% 20% 15% 25% 15% . . 25% 10% 15%
--vmstat--
 r b swpd   free buff   cache si so  bi   bo   in    cs us sy id wa st
72 1    0 122064 6768 2352744  0  0 166 1267   83   200 20 18 47  4 11
 0 0    0 126908 4248 2356128  0  0 979 7353 7933 10729 35 31  8  8 18
wa 0% 5% 0% 5% 10% 5% 15% 20% 5% 10% 0% 10% 5% 20% 10% 5% 0% . 10% 5% . 0% 5% . 10% 0% 5% 0% 5% 0%
--innodb--
    txns: 65xACTIVE (180s) 3xnot (0s)
    3 queries inside InnoDB, 0 queries in queue
    Main thread: sleeping, pending reads 1, writes 1, flush 0
    Log: lsn = 28182468703, chkp = 27881812915, chkp age = 300655788
    Threads are waiting at:
     50 log0log.ic line 358
      1 btr0cur.cc line 3892
    Threads are waiting on:
      1 S-lock on RW-latch at 0x7f1bf2f53740 '&block->lock'
--processlist--
    State
     40  updating
     14  wsrep in pre-commit stage
      6  statistics
      5
      3  Sending data
    Command
     65  Query
      7  Sleep
      1  Connect
--stack traces--
    No stack trace file exists
--oprofile--
    No opreport file exists

As you can see, pt-sift presents a summary of the diskstats and vmstat data. It also presents information about InnoDB internals - how many transactions there are, and whether there are any waits. Some data on InnoDB checkpointing is also presented, and we see a summary of the processlist output. By typing '?', you can see the internal 'help' for pt-sift. There are a couple of other options you can pick from - view the diskstats output, check the SHOW ENGINE INNODB STATUS output, summarize the netstat data and check the data from SHOW STATUS.

Rain Gauge

pt-sift is a CLI tool, but if you prefer a UI, you can use Rain Gauge - a tool which presents pt-stalk data in a web browser.

After setup, you'll be presented with a screen summarizing all the source hosts which pushed data to Rain Gauge, and the number of samples collected for each of them. You can pick a host either from the drop-down menu or by clicking on the list. You'll see a screen similar to this one:

Next step is to pick a sample you find interesting. You’ll end up in a summary page.

At the top, there are links to different sections of collected data. You can click those links to check the information particularly interesting to you. At the bottom, there’s a summary generated using pt-sift.

Most of the 'sections' just print the file contents, split by samples, which is not much different from using the CLI. One of them, though, gives you some extra info. When you check the 'mysqladmin' section, you will see a screen like the one below:

It is possible to graph any combination of the counters by picking them from the list on the left. You can also use one of the presets by clicking the buttons below the graph. Finally, at the bottom, you will see the samples stored in multiple columns. This way of presenting the data is really useful when you want to compare the state of different counters, and having the ability to graph the data is also helpful - it's definitely easier to see trends on a graph than by looking at raw numbers.

With this post, we conclude our coverage of pt-stalk. Hopefully you'll find this useful if you ever need to use pt-stalk to pinpoint issues in your database environment. In our next post, we will talk about database recovery in MySQL Replication and Galera Cluster setups.


Picture credit: Ken Fager


ClusterControl Developer Studio: write your first database advisor

Did you ever wonder what triggers the advice in ClusterControl that your disk is filling up? Or the advice to create primary keys on InnoDB tables if they don't exist? These advisors are mini scripts written in the ClusterControl Domain Specific Language (DSL), which is a JavaScript-like language. These scripts can be written, compiled, saved, executed and scheduled in ClusterControl. That is what the ClusterControl Developer Studio blog series will be about.

Today we will cover the Developer Studio basics and show you how to create your very first advisor where we will pick two status variables and give advice about their outcome.

The advisors

Advisors are mini scripts that are executed by ClusterControl, either on-demand or on a schedule. They can be anything from simple configuration advice and warnings on thresholds, to more complex rules for predictions or cluster-wide automation tasks based on the state of your servers or databases. In general, advisors perform more detailed analysis, and produce more comprehensive recommendations, than alerts.

The advisors are stored inside the ClusterControl database and you can add new or alter/modify existing advisors. We also have an advisor Github repository where you can share your advisors with us and other ClusterControl users.

The language used for the advisors is the so-called ClusterControl DSL, an easy-to-comprehend language. Its semantics can best be compared to JavaScript, with a couple of differences; the most important ones are:

  • Semicolons are mandatory
  • Various numeric data types like integers and unsigned long long integers.
  • Arrays are two dimensional and single dimensional arrays are lists.

You can find the full list of differences in the ClusterControl DSL reference.

The Developer Studio interface

The Developer Studio interface can be found under Cluster > Manage > Developer Studio. This will open an interface like this:

Advisors

The advisors button will generate an overview of all advisors with their output since the last time they ran:

You can also see the schedule of the advisor in crontab format and the date/time of the last update. Some advisors are scheduled to run only once a day, so their advice may no longer reflect reality, for instance if you already resolved the issue you were warned about. You can manually re-run an advisor by selecting it and running it. Go to the “compile and run” section to read how to do this.

Importing advisors

The Import button will allow you to import a tarball with new advisors in it. The tarball has to be created relative to the main path of the advisors, so if you wish to upload a new version of the MySQL query cache size script (s9s/mysql/query_cache/qc_size.js), you will have to make the tarball starting from the s9s directory, as in the example below.
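
For example, assuming your local copy of the advisors lives in /home/myuser (as in the export example further down), packaging that single script could be done like this:

cd /home/myuser
tar -czf qc_size.tar.gz s9s/mysql/query_cache/qc_size.js

The resulting archive keeps the s9s/... prefix inside it, which is exactly what the import expects.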

By default, the import will create all the (sub)folders contained in the tarball, but it will not overwrite any of the existing advisors. If you wish to overwrite them, you have to select the “Overwrite existing files” checkbox.

Exporting advisors

You can export the advisors, or a subset of them, by selecting a node in the tree and pressing the Export button. This will create a tarball with the files in the full path of the structure presented. Suppose we wish to make a backup of the s9s/mysql advisors prior to making a change; we simply select the s9s/mysql node in the tree and press Export:

Note: make sure the s9s directory is present in /home/myuser/.

This will create a tarball called /home/myuser/s9s/mysql.tar.gz with an internal directory structure s9s/mysql/*

Creating a new advisor

Since we have covered exports and imports, we can now start experimenting. So let’s create a new advisor! Click on the New button to get the following dialogue:

In this dialogue, you can create your new advisor with either an empty file or pre-fill it with the Galera or MySQL specific template. Both templates will add the necessary includes (common/mysql_helper.js) and the basics to retrieve the Galera or MySQL nodes and loop over them.

Creating a new advisor with the Galera template looks like this:

#include "common/mysql_helper.js"

Here you can see that the mysql_helper.js gets included to provide the basis for connecting and querying MySQL nodes.

var WARNING_THRESHOLD=0;
…
if(threshold > WARNING_THRESHOLD)

The warning threshold is currently set to 0, meaning that if the measured value is greater than the warning threshold, the advisor should warn the user. Note that the variable threshold is not set/used in the template yet, as it is a kickstart for your own advisor.

var hosts     = cluster::Hosts();
var hosts     = cluster::mySqlNodes();
var hosts     = cluster::galeraNodes();

The statements above will fetch the hosts in the cluster, and you can use them to loop over the hosts. The difference between them is that the first statement includes all hosts, even the non-MySQL ones (such as the CMON host), the second includes all MySQL hosts, and the last one only the Galera hosts. So if your Galera cluster has MySQL asynchronous read slaves attached, those hosts will not be included in the last case.

Other than that, these objects all behave the same and give you the ability to read their variables and status, and to run queries against them.

Advisor buttons

Now that we have created a new advisor, there are six new buttons available for it:

Save will save your latest modifications to the advisor (stored in the CMON database), Move will move the advisor to a new path and Remove will obviously remove the advisor.

More interesting is the second row of buttons. Compiling the advisor will compile the code of the advisor. If the code compiles fine, you will see this message in the Messages dialogue below the code of the advisor:

While if the compilation failed, the compiler will give you a hint where it failed:

In this case the compiler indicates a syntax error was found on line 24.

The compile and run button will not only compile the script but also execute it and its output will be shown in the Messages, Graph or Raw dialogue. If we compile and run the table cache script from the auto_tuners, we will get output similar to this:

The last button is the Schedule button. This allows you to schedule (or unschedule) your advisors and add tags to them. We will cover this at the end of this post, once we have created our very own advisor and want to schedule it.

My first advisor

Now that we have covered the basics of the ClusterControl Developer Studio, we can finally start to create a new advisor. As an example, we will create an advisor that looks at the temporary table ratio. Create a new advisor as follows:

The theory behind the advisor we are going to create is simple: we will compare the number of temporary tables created on disk against the total number of temporary tables created:

tmp_disk_table_ratio = Created_tmp_disk_tables / (Created_tmp_tables + Created_tmp_disk_tables) * 100;

First we need to set some basics in the head of the script, like the thresholds and the warning and ok messages. All changes and additions have been marked in bold:

var WARNING_THRESHOLD=20;
var TITLE="Temporary tables on disk ratio";
var ADVICE_WARNING="More than 20% of temporary tables are written to disk. It is advised to review your queries, for example, via the Query Monitor.";
var ADVICE_OK="Temporary tables on disk are not excessive." ;

We set the threshold here to 20 percent which is considered to be pretty bad already. But more on that topic once we have finalised our advisor.

Next we need to get these status variables from MySQL. Before we jump to conclusions and execute some “SHOW GLOBAL STATUS LIKE ‘Created_tmp_%’” query, there is already a function available to retrieve a status variable from a MySQL instance:

statusVar = readStatusVariable(host, <statusvariablename>);

We can use this function in our advisor to fetch the Created_tmp_disk_tables and Created_tmp_tables.

for (idx = 0; idx < hosts.size(); ++idx)
{
   host        = hosts[idx];
   map         = host.toMap();
   connected     = map["connected"];
   var advice = new CmonAdvice();
   var tmp_tables = readStatusVariable(host, 'Created_tmp_tables');
   var tmp_disk_tables = readStatusVariable(host, 'Created_tmp_disk_tables');

And now we can calculate the temporary disk tables ratio:

var tmp_disk_table_ratio = tmp_disk_tables / (tmp_tables + tmp_disk_tables) * 100;

And alert if this ratio is greater than the threshold we set in the beginning:

if(checkPrecond(host))
{
   if(tmp_disk_table_ratio > WARNING_THRESHOLD) {
      advice.setJustification("Temporary tables written to disk is excessive");
      msg = ADVICE_WARNING;
   }
   else {
      advice.setJustification("Temporary tables written to disk not excessive");
      msg = ADVICE_OK;
   }
}

It is important to assign the advice message to the msg variable here, as it will be added to the advice object later on with the setAdvice function. The full script, for completeness:

#include "common/mysql_helper.js"

/**
* Checks the percentage of max ever used connections
*
*/
var WARNING_THRESHOLD=20;
var TITLE="Temporary tables on disk ratio";
var ADVICE_WARNING="More than 20% of temporary tables are written to disk. It is advised to review your queries, for example, via the Query Monitor.";
var ADVICE_OK="Temporary tables on disk are not excessive.";

function main()
{
   var hosts     = cluster::mySqlNodes();
   var advisorMap = {};

   for (idx = 0; idx < hosts.size(); ++idx)
   {
       host        = hosts[idx];
       map         = host.toMap();
       connected     = map["connected"];
       var advice = new CmonAdvice();
       var tmp_tables = readStatusVariable(host, 'Created_tmp_tables');
       var tmp_disk_tables = readStatusVariable(host, 'Created_tmp_disk_tables');
       var tmp_disk_table_ratio = tmp_disk_tables / (tmp_tables + tmp_disk_tables) * 100;
       
       if(!connected)
           continue;

       if(checkPrecond(host))
       {
          if(tmp_disk_table_ratio > WARNING_THRESHOLD) {
              advice.setJustification("Temporary tables written to disk is excessive");
              msg = ADVICE_WARNING;
              advice.setSeverity(0);
          }
          else {
              advice.setJustification("Temporary tables written to disk not excessive");
              msg = ADVICE_OK;
          }
       }
       else
       {
           msg = "Not enough data to calculate";
           advice.setJustification("there is not enough load on the server or the uptime is too little.");
           advice.setSeverity(0);
       }

       advice.setHost(host);
       advice.setTitle(TITLE);
       advice.setAdvice(msg);
       advisorMap[idx]= advice;
   }

   return advisorMap;
}

Now you can play around with the threshold of 20 - try lowering it to 1 or 2, for instance, and then you will probably see how this advisor actually gives you advice on the matter.

As you can see, with a simple script you can check two variables against each other and report/advise based upon their outcome. But is that all? There are still a couple of things we can improve!

Improvements on my first advisor

The first thing we can improve is that this advisor, as it stands, doesn't make a lot of sense. What the metric actually reflects is the total number of temporary tables created on disk since the last FLUSH STATUS or startup of MySQL. What it doesn't say is the rate at which temporary tables are being created on disk. So we can convert Created_tmp_disk_tables to a rate, using the uptime of the host:

var tmp_disk_table_rate = tmp_disk_tables / uptime;

This should give us the number of temporary disk tables created per second, and combined with the tmp_disk_table_ratio, this will give us a more accurate view of things. Again, even once we cross the threshold of two temporary tables per second, we don't want to send out an alert/advice immediately - we only warn when the ratio threshold is exceeded as well.

Another thing we can improve is to not use the readStatusVariable function from the mysql_helper.js library here. This function executes a query against the MySQL host every time we read a status variable, while CMON already retrieves most of them every second and we don't need real-time status anyway. It's not like two or three queries will kill the hosts in the cluster, but if many of these advisors are run in a similar fashion, this could create heaps of extra queries.

In this case, we can optimize by retrieving the status variables all at once as a map, using the host.sqlInfo() function. This function contains the most important information about the host, but it does not contain everything. For instance, the uptime variable that we need for the rate is not available in the host.sqlInfo() map and has to be retrieved with the readStatusVariable function.

This is what our advisor will look like now, with the changes/additions marked in bold:

#include "common/mysql_helper.js"

/**
* Checks the percentage of max ever used connections
*
*/
var RATIO_WARNING_THRESHOLD=20;
var RATE_WARNING_THRESHOLD=2;
var TITLE="Temporary tables on disk ratio";
var ADVICE_WARNING="More than 20% of temporary tables are written to disk and current rate is more than 2 temporary tables per second. It is advised to review your queries, for example, via the Query Monitor.";
var ADVICE_OK="Temporary tables on disk are not excessive.";

function main()
{
   var hosts     = cluster::mySqlNodes();
   var advisorMap = {};

   for (idx = 0; idx < hosts.size(); ++idx)
   {
       host        = hosts[idx];
       map         = host.toMap();
       connected     = map["connected"];
       var advice = new CmonAdvice();
       var hostStatus = host.sqlInfo();
       var tmp_tables = hostStatus['CREATED_TMP_TABLES'];
       var tmp_disk_tables = hostStatus['CREATED_TMP_DISK_TABLES'];
       var uptime = readStatusVariable(host, 'uptime');
       var tmp_disk_table_ratio = tmp_disk_tables / (tmp_tables + tmp_disk_tables) * 100;
       var tmp_disk_table_rate = tmp_disk_tables / uptime;

       if(!connected)
           continue;

       if(checkPrecond(host))
       {
          if(tmp_disk_table_rate > RATE_WARNING_THRESHOLD && tmp_disk_table_ratio > RATIO_WARNING_THRESHOLD) {
              advice.setJustification("Temporary tables written to disk is excessive: " + tmp_disk_table_rate + " tables per second and overall ratio of " + tmp_disk_table_ratio);
              msg = ADVICE_WARNING;
              advice.setSeverity(0);
          }
          else {
              advice.setJustification("Temporary tables written to disk not excessive");
              msg = ADVICE_OK;
          }
       }
       else
       {
           msg = "Not enough data to calculate";
           advice.setJustification("there is not enough load on the server or the uptime is too little.");
           advice.setSeverity(0);
       }

       advice.setHost(host);
       advice.setTitle(TITLE);
       advice.setAdvice(msg);
       advisorMap[idx]= advice;
   }

   return advisorMap;
}

Scheduling my first advisor

After we have saved this new advisor, compiled it and run it, we can now schedule it. Since we don't have an excessive workload, we will probably run this advisor once per day.

The base scheduling mode has presets for every minute, 5 minutes, hour, day and month, and this is exactly what we need. Changing this to advanced will unlock the other greyed-out input fields. These input fields work exactly the same as a crontab, so you can even schedule for a particular day, day of the month, or limit it to weekdays, as in the example below.
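
For instance, in crontab notation (the same five fields the advanced mode exposes: minute, hour, day of month, month, day of week), a once-per-day run at 03:00 would look like this - purely an illustration, pick whatever time suits your environment:

# minute hour day-of-month month day-of-week
0 3 * * *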

Managing MySQL Replication for High Availability

Join us on February 2nd for this new webinar on Managing MySQL Replication for High Availability led by Krzysztof Książek, Senior Support Engineer at Severalnines. This is part of our ongoing ‘Become a MySQL DBA’ series.

Deploying a MySQL Replication topology is only the beginning of your journey. Maintaining it also involves topology changes, managing slave lag, promoting slaves, repairing replication issues, fixing broken nodes, managing schema changes and scheduling backups. Multi-datacenter replication also adds another dimension of complexity. It is always good to be prepared up front and know how to deal with these cases.

In this webinar, we will cover the deployment and management of MySQL replication topologies using ClusterControl, show how to schedule backups and promote slaves, and discuss the most important metrics to keep a close eye on. We will also cover how you can deal with schema and topology changes as well as some of the most common replication issues.

Date & time

Europe/MEA/APAC

Tuesday, February 2nd at 09:00 GMT / 10:00 CET (Germany, France, Sweden)
Register Now

North America/LatAm

Tuesday, February 2nd at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

Agenda

  • Deployment of MySQL replication topologies using ClusterControl
  • Schedule backups
  • Promote slaves
  • Important metrics to keep an eye on
  • Schema changes
  • Topology changes
  • Common replication issues

Speaker

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard. This webinar builds upon recent blog posts and related webinar series by Krzysztof on how to become a MySQL DBA.

We look forward to “seeing” you there and to some good discussions!

To read our new MySQL Replication online tutorial, please visit:
http://severalnines.com/tutorials/mysql-replication-high-availability-tutorial

To view all the blogs of the ‘Become a MySQL DBA’ series visit:
http://www.severalnines.com/blog-categories/db-ops

To view all our webinar replays, please visit:
http://severalnines.com/webinars-replay

Get all the insight on open source database management and infrastructure operations with Severalnines whitepapers

Whether you’re looking into ways to automate various aspects of administering your open source databases or to take better control of your data, we have the relevant whitepaper that will help you in your quest and hopefully provide you with good food for thought on how to achieve your database management objectives.

Management and Automation of Open Source Databases

As the adoption of open source databases, such as MySQL / MariaDB, PostgreSQL or MongoDB, increases in the enterprise, especially for mission-critical applications, so does the need for robust and integrated tools. Operational staff need to be able to manage everything from provisioning, capacity, performance and availability of the database environment. This is needed to minimize the risk of service outages or poor application performance.

This whitepaper discusses the database infrastructure lifecycle, what tools to build (or buy) for effective management, database deployment options beyond Chef or Puppet, important aspects of monitoring and managing open source database infrastructures and how ClusterControl enables a systematic approach to open source database operations.

You may also be interested in our related blog series on:

All of our white papers can be downloaded here: http://severalnines.com/whitepapers

Happy clustering!

February 23rd: how CloudStats.me moved from MySQL to clustered MariaDB for high availability

On Tuesday, February 23, please join us and the WooServers team for a webinar on the scalable, open source database infrastructure behind CloudStats.me.

CloudStats.me is a fast growing cloud-based server and website monitoring service. The rapid growth of the CloudStats user base and the number of services being monitored created a significant load on its MySQL infrastructure. The system ingests large amounts of incoming metrics/event data collected by thousands of agents. The backend systems also perform analytics on large portions of that data, and alerts are triggered as soon as certain conditions are met.

Andrey Vasilyev, CTO of Aqua Networks Limited - a London-based company which owns brands, such as WooServers.com, CloudStats.me and CloudLayar.com, and Art van Scheppingen, Senior Support Engineer at Severalnines, will discuss the challenges encountered by CloudStats.me in achieving database high availability and performance, as well as the solutions that were implemented to overcome these challenges.

Registration, Date & Time

Europe/MEA/APAC

Tuesday, February 23rd at 09:00 GMT / 10:00 CET (Germany, France, Sweden)
Register Now

North America/LatAm

Tuesday, February 23rd at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

Agenda

  • CloudStats.me infrastructure overview
  • Database challenges
  • Limitations in cloud-based infrastructure
  • Scaling MySQL - many options
    • MySQL Cluster, Master-Slave Replication, Sharding, ...
  • Availability and failover
  • Application sharding vs auto-sharding
  • Migration to MariaDB / Galera Cluster with ClusterControl & NoSQL
  • Load Balancing with HAProxy & MaxScale
  • Infrastructure set up provided to CloudStats.me
    • Private Network, Cluster Nodes, H/W SSD Raid + BBU
  • What we learnt - “Know your data!”

Speakers

Andrey Vasilyev is the CTO of Aqua Networks Limited - a London-based company which owns brands, such as WooServers.com, CloudStats.me and CloudLayar.com. Andrey has been leading the company’s new product development initiatives for 5 years and worked closely with the development and sales teams helping turn customer feedback into mass-market products. Having previously worked at Bloomberg L.P. and UniCredit Bank, Andrey’s main focus has always been on building stable and reliable platforms capable of serving hundreds of thousands of users.

Art van Scheppingen is a Senior Support Engineer at Severalnines. He's a pragmatic MySQL and database expert with over 15 years of experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he kept a broad view of the whole database environment: from MySQL to Couchbase, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, FOSDEM) and related meetups.

We look forward to “seeing” you there and to some good discussions!

For more discussions on database clustering and high availability strategies, do visit our Webinars Replay page.

We need your feedback: please participate in our open source database management survey

As members of the wider open source database users community, we’d like you to participate in our open source database deployment and management survey.

Your input will help us make our resources and tools for deploying, monitoring, managing and scaling databases of even more use to the community. It will give us valuable insight into the challenges you face when operating databases.

Please take the survey today by providing your input below; this will take approx. 5 minutes of your time.

We’ll share the results of the survey once we have compiled your responses.

Thank you!
