摘要:通过python+pt-kill实现对批量mysql多实例的过载保护
什么是pt-kill
pt-kill是percona-toolkit中的一个工具,用来杀掉指定匹配条件的慢查询会话以保证mysql实例的可用性。
安装方法详见官网:https://www.percona.com/doc/percona-toolkit/LATEST/installation.html
安装好后就可以在/usr/local/bin/目录下面看到一组pt-开头的可执行文件:
[root@mysql80 bin]# ll /usr/local/bin/pt-* -r-xr-xr-x 1 root root 40852 Nov 29 2016 /usr/local/bin/pt-align -r-xr-xr-x 1 root root 261876 Nov 29 2016 /usr/local/bin/pt-archiver -r-xr-xr-x 1 root root 168211 Nov 29 2016 /usr/local/bin/pt-config-diff -r-xr-xr-x 1 root root 165435 Nov 29 2016 /usr/local/bin/pt-deadlock-logger -r-xr-xr-x 1 root root 164145 Nov 29 2016 /usr/local/bin/pt-diskstats -r-xr-xr-x 1 root root 168157 Nov 29 2016 /usr/local/bin/pt-duplicate-key-checker -r-xr-xr-x 1 root root 49262 Nov 29 2016 /usr/local/bin/pt-fifo-split -r-xr-xr-x 1 root root 148953 Nov 29 2016 /usr/local/bin/pt-find -r-xr-xr-x 1 root root 66409 Nov 29 2016 /usr/local/bin/pt-fingerprint -r-xr-xr-x 1 root root 132405 Nov 29 2016 /usr/local/bin/pt-fk-error-logger -r-xr-xr-x 1 root root 191457 Nov 29 2016 /usr/local/bin/pt-heartbeat -r-xr-xr-x 1 root root 225247 Nov 29 2016 /usr/local/bin/pt-index-usage -r-xr-xr-x 1 root root 32345 Nov 29 2016 /usr/local/bin/pt-ioprofile -r-xr-xr-x 1 root root 247973 May 31 18:08 /usr/local/bin/pt-kill -r-xr-xr-x 1 root root 21753 Nov 29 2016 /usr/local/bin/pt-mext -r-xr-xr-x 1 root root 100986 Nov 29 2016 /usr/local/bin/pt-mysql-summary -r-xr-xr-x 1 root root 383964 Nov 29 2016 /usr/local/bin/pt-online-schema-change -r-xr-xr-x 1 root root 24594 Nov 29 2016 /usr/local/bin/pt-pmp -r-xr-xr-x 1 root root 517540 Nov 29 2016 /usr/local/bin/pt-query-digest -r-xr-xr-x 1 root root 72542 Nov 29 2016 /usr/local/bin/pt-show-grants -r-xr-xr-x 1 root root 37724 Nov 29 2016 /usr/local/bin/pt-sift -r-xr-xr-x 1 root root 144771 Nov 29 2016 /usr/local/bin/pt-slave-delay -r-xr-xr-x 1 root root 126975 Nov 29 2016 /usr/local/bin/pt-slave-find -r-xr-xr-x 1 root root 178774 Nov 29 2016 /usr/local/bin/pt-slave-restart -r-xr-xr-x 1 root root 69559 Nov 29 2016 /usr/local/bin/pt-stalk -r-xr-xr-x 1 root root 89868 Nov 29 2016 /usr/local/bin/pt-summary -r-xr-xr-x 1 root root 418326 Nov 29 2016 /usr/local/bin/pt-table-checksum -r-xr-xr-x 1 root root 395628 Nov 29 2016 /usr/local/bin/pt-table-sync -r-xr-xr-x 1 root root 222599 Nov 29 2016 /usr/local/bin/pt-table-usage -r-xr-xr-x 1 root root 329999 Nov 29 2016 /usr/local/bin/pt-upgrade -r-xr-xr-x 1 root root 176246 Nov 29 2016 /usr/local/bin/pt-variable-advisor -r-xr-xr-x 1 root root 101650 Nov 29 2016 /usr/local/bin/pt-visual-explain
|
如何使用pt-kill
可以通过pt-kill –help查看帮助。假如希望每10秒监控一次来自xldev用户的查询语句,如果有超过10秒的SQL就杀掉:
[root@mysql80 ~]# pt-kill \ --no-version-check \ --host 192.168.1.80 --port 3306 --user 'root' --password 'xueledba' \ --charset utf8 \ --match-command Query \ --match-user xldev \ --busy-time 10 \ --kill \ --victims all \ --interval 10 \ --print
|
在客户端通过xldev用户执行一个12秒的查询语句看看能否被杀掉:
[root@mysql1 ~]# mysql -uxldev -pxueledev -h192.168.1.80 -P3306 -e "select sleep(12);" Warning: Using a password on the command line interface can be insecure. ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
|
可以看到pt-kill输出日志显示该语句运行到第11秒的时候将它kill掉了:
# 2017-04-06T01:25:41 KILL 2921294 (Query 11 sec) select sleep(12)
|
由于这个是针对特定用户xldev的,因此在客户端用root执行就不会被杀:
[root@mysql1 ~]# mysql -uroot -pxueledba -h192.168.1.80 -P3306 -e "select sleep(12);" Warning: Using a password on the command line interface can be insecure. +-----------+ | sleep(12) | +-----------+ | 0 | +-----------+
|
多实例集中监控的实现
由于一个pt-kill进程只能监控一个实例,如果是多实例部署的话就要起多个进程来监控。因此通过python的多进程可以方便的实现,python脚本内容如下:
[root@mysql80 bin]# cat pt-kill.py #!/bin/env python # -*- coding: utf-8 -*- # by yangchangjiang@xueleyun.com import string import commands import datetime import multiprocessing import sys, os, time result = [] #传入要被拦截的SQL来源IP,多个IP中间要用|分割 refuse_ip = sys.argv[1] def runCmd(ip,port,user,pwd,refuse_ip): cmd = "/usr/local/bin/pt-kill --no-version-check --host %s --port %s --user %s --password %s --charset utf8 --match-command Query \ --match-host %s --busy-time 10 --kill --victims all --interval 10 --print" % (ip,port,user,pwd,refuse_ip) print cmd os.system(cmd) if __name__ == "__main__": global ip,port,user,pwd p = multiprocessing.Pool(processes=2) data = {'192.168.1.80':['3306','root','xueledba'], '192.168.1.91':['3306','root','xueledba'] } for d in data: ip = d port = data[d][0] user = data[d][1] pwd = data[d][2] #print ip,port,user,pwd result.append(p.apply_async(runCmd,(ip,port,user,pwd,refuse_ip))) p.close() for res in result: res.get(timeout=864000)
|
假如要针对来自192.168.1.3的SQL进行监控查杀,可以启动python脚本:
[root@mysql80 bin]# python pt-kill.py 192.168.1.3 /usr/local/bin/pt-kill --no-version-check --host 192.168.1.80 --port 3306 --user root --password xueledba --charset utf8 --match-command Query --match-host 192.168.1.3 --busy-time 10 --kill --victims all --interval 10 --print /usr/local/bin/pt-kill --no-version-check --host 192.168.1.91 --port 3306 --user root --password xueledba --charset utf8 --match-command Query --match-host 192.168.1.3 --busy-time 10 --kill --victims all --interval 10 --print #然后到192.168.1.3执行select sleep(30),可以看到会话被杀的日志输出: # 2017-05-26T17:01:02 KILL 1100285 (Query 15 sec) select sleep(30) # 2017-05-26T17:02:12 KILL 1146147 (Query 13 sec) select sleep(30)
|
采用这种方式同时监控多实例的话,由于日志是集中打印的,就不知道到底杀掉的是哪台上面的sql。因此可以修改pt-kill的源代码,添加输出主机IP信息:
#进入源码,搜索关键字MATCHING_QUERY或直接定位到第7144行: MATCHING_QUERY: foreach my $query ( @queries ) { if ( $o->get('print') ) { printf "# %s %s %d (%s %d sec) %s\n", ts(time), $o->get('kill-query') ? 'KILL QUERY' : 'KILL', $query->{Id}, ($query->{Command} || 'NULL'), $query->{Time}, ($query->{Info} || 'NULL'); } #将这段代码改为: MATCHING_QUERY: foreach my $query ( @queries ) { if ( $o->get('print') ) { printf "# %s %s %d (%s %d sec) %s %s\n", ts(time),$o->get('kill-query') ? 'KILL QUERY' : 'KILL', $query->{Id}, ($query->{Command} || 'NULL'), $query->{Time},($dbh->{mysql_hostinfo} || ''), ($query->{Info} || 'NULL'); } #然后保存退出,启动pt-kill: [root@mysql80 bin]# python pt-kill.py 192.168.1.3 /usr/local/bin/pt-kill --no-version-check --host 192.168.1.80 --port 3306 --user root --password xueledba --charset utf8 --match-command Query --match-host 192.168.1.3 --busy-time 10 --kill --victims all --interval 10 --print /usr/local/bin/pt-kill --no-version-check --host 192.168.1.91 --port 3306 --user root --password xueledba --charset utf8 --match-command Query --match-host 192.168.1.3 --busy-time 10 --kill --victims all --interval 10 --print # 2017-05-26T18:03:23 KILL 1150267 (Query 16 sec) 192.168.1.80 via TCP/IP select sleep(36) # 2017-05-26T18:04:03 KILL 1104444 (Query 13 sec) 192.168.1.91 via TCP/IP select sleep(35)
|
这样就可以看到是在80上的还是91上的被杀了。pt-kill的结果可以输出到日志也可以保存到数据库,具体可参考官方文档。