使用pt-kill实现MySQL的过载保护


摘要:通过python+pt-kill实现对批量mysql多实例的过载保护

什么是pt-kill

pt-kill是percona-toolkit中的一个工具,用来杀掉指定匹配条件的慢查询会话以保证mysql实例的可用性。
安装方法详见官网:https://www.percona.com/doc/percona-toolkit/LATEST/installation.html
安装好后就可以在/usr/local/bin/目录下面看到一组pt-开头的可执行文件:

[root@mysql80 bin]# ll /usr/local/bin/pt-*
-r-xr-xr-x 1 root root 40852 Nov 29 2016 /usr/local/bin/pt-align
-r-xr-xr-x 1 root root 261876 Nov 29 2016 /usr/local/bin/pt-archiver
-r-xr-xr-x 1 root root 168211 Nov 29 2016 /usr/local/bin/pt-config-diff
-r-xr-xr-x 1 root root 165435 Nov 29 2016 /usr/local/bin/pt-deadlock-logger
-r-xr-xr-x 1 root root 164145 Nov 29 2016 /usr/local/bin/pt-diskstats
-r-xr-xr-x 1 root root 168157 Nov 29 2016 /usr/local/bin/pt-duplicate-key-checker
-r-xr-xr-x 1 root root 49262 Nov 29 2016 /usr/local/bin/pt-fifo-split
-r-xr-xr-x 1 root root 148953 Nov 29 2016 /usr/local/bin/pt-find
-r-xr-xr-x 1 root root 66409 Nov 29 2016 /usr/local/bin/pt-fingerprint
-r-xr-xr-x 1 root root 132405 Nov 29 2016 /usr/local/bin/pt-fk-error-logger
-r-xr-xr-x 1 root root 191457 Nov 29 2016 /usr/local/bin/pt-heartbeat
-r-xr-xr-x 1 root root 225247 Nov 29 2016 /usr/local/bin/pt-index-usage
-r-xr-xr-x 1 root root 32345 Nov 29 2016 /usr/local/bin/pt-ioprofile
-r-xr-xr-x 1 root root 247973 May 31 18:08 /usr/local/bin/pt-kill
-r-xr-xr-x 1 root root 21753 Nov 29 2016 /usr/local/bin/pt-mext
-r-xr-xr-x 1 root root 100986 Nov 29 2016 /usr/local/bin/pt-mysql-summary
-r-xr-xr-x 1 root root 383964 Nov 29 2016 /usr/local/bin/pt-online-schema-change
-r-xr-xr-x 1 root root 24594 Nov 29 2016 /usr/local/bin/pt-pmp
-r-xr-xr-x 1 root root 517540 Nov 29 2016 /usr/local/bin/pt-query-digest
-r-xr-xr-x 1 root root 72542 Nov 29 2016 /usr/local/bin/pt-show-grants
-r-xr-xr-x 1 root root 37724 Nov 29 2016 /usr/local/bin/pt-sift
-r-xr-xr-x 1 root root 144771 Nov 29 2016 /usr/local/bin/pt-slave-delay
-r-xr-xr-x 1 root root 126975 Nov 29 2016 /usr/local/bin/pt-slave-find
-r-xr-xr-x 1 root root 178774 Nov 29 2016 /usr/local/bin/pt-slave-restart
-r-xr-xr-x 1 root root 69559 Nov 29 2016 /usr/local/bin/pt-stalk
-r-xr-xr-x 1 root root 89868 Nov 29 2016 /usr/local/bin/pt-summary
-r-xr-xr-x 1 root root 418326 Nov 29 2016 /usr/local/bin/pt-table-checksum
-r-xr-xr-x 1 root root 395628 Nov 29 2016 /usr/local/bin/pt-table-sync
-r-xr-xr-x 1 root root 222599 Nov 29 2016 /usr/local/bin/pt-table-usage
-r-xr-xr-x 1 root root 329999 Nov 29 2016 /usr/local/bin/pt-upgrade
-r-xr-xr-x 1 root root 176246 Nov 29 2016 /usr/local/bin/pt-variable-advisor
-r-xr-xr-x 1 root root 101650 Nov 29 2016 /usr/local/bin/pt-visual-explain

如何使用pt-kill

可以通过pt-kill –help查看帮助。假如希望每10秒监控一次来自xldev用户的查询语句,如果有超过10秒的SQL就杀掉:

[root@mysql80 ~]# pt-kill \
--no-version-check \
--host 192.168.1.80 --port 3306 --user 'root' --password 'xueledba' \
--charset utf8 \
--match-command Query \
--match-user xldev \
--busy-time 10 \
--kill \
--victims all \
--interval 10 \
--print

在客户端通过xldev用户执行一个12秒的查询语句看看能否被杀掉:

[root@mysql1 ~]# mysql -uxldev -pxueledev -h192.168.1.80 -P3306 -e "select sleep(12);"
Warning: Using a password on the command line interface can be insecure.
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query

可以看到pt-kill输出日志显示该语句运行到第11秒的时候将它kill掉了:

# 2017-04-06T01:25:41 KILL 2921294 (Query 11 sec) select sleep(12)

由于这个是针对特定用户xldev的,因此在客户端用root执行就不会被杀:

[root@mysql1 ~]# mysql -uroot -pxueledba -h192.168.1.80 -P3306 -e "select sleep(12);"
Warning: Using a password on the command line interface can be insecure.
+-----------+
| sleep(12) |
+-----------+
| 0 |
+-----------+

多实例集中监控的实现

由于一个pt-kill进程只能监控一个实例,如果是多实例部署的话就要起多个进程来监控。因此通过python的多进程可以方便的实现,python脚本内容如下:

[root@mysql80 bin]# cat pt-kill.py
#!/bin/env python
# -*- coding: utf-8 -*-
# by yangchangjiang@xueleyun.com
import string
import commands
import datetime
import multiprocessing
import sys, os, time
result = []
#传入要被拦截的SQL来源IP,多个IP中间要用|分割
refuse_ip = sys.argv[1]
def runCmd(ip,port,user,pwd,refuse_ip):
cmd = "/usr/local/bin/pt-kill --no-version-check --host %s --port %s --user %s --password %s --charset utf8 --match-command Query \
--match-host %s --busy-time 10 --kill --victims all --interval 10 --print" % (ip,port,user,pwd,refuse_ip)
print cmd
os.system(cmd)
if __name__ == "__main__":
global ip,port,user,pwd
p = multiprocessing.Pool(processes=2)
data = {'192.168.1.80':['3306','root','xueledba'],
'192.168.1.91':['3306','root','xueledba']
}
for d in data:
ip = d
port = data[d][0]
user = data[d][1]
pwd = data[d][2]
#print ip,port,user,pwd
result.append(p.apply_async(runCmd,(ip,port,user,pwd,refuse_ip)))
p.close()
for res in result:
res.get(timeout=864000)

假如要针对来自192.168.1.3的SQL进行监控查杀,可以启动python脚本:

[root@mysql80 bin]# python pt-kill.py 192.168.1.3
/usr/local/bin/pt-kill --no-version-check --host 192.168.1.80 --port 3306 --user root --password xueledba --charset utf8 --match-command Query --match-host 192.168.1.3 --busy-time 10 --kill --victims all --interval 10 --print
/usr/local/bin/pt-kill --no-version-check --host 192.168.1.91 --port 3306 --user root --password xueledba --charset utf8 --match-command Query --match-host 192.168.1.3 --busy-time 10 --kill --victims all --interval 10 --print
#然后到192.168.1.3执行select sleep(30),可以看到会话被杀的日志输出:
# 2017-05-26T17:01:02 KILL 1100285 (Query 15 sec) select sleep(30)
# 2017-05-26T17:02:12 KILL 1146147 (Query 13 sec) select sleep(30)

采用这种方式同时监控多实例的话,由于日志是集中打印的,就不知道到底杀掉的是哪台上面的sql。因此可以修改pt-kill的源代码,添加输出主机IP信息:

#进入源码,搜索关键字MATCHING_QUERY或直接定位到第7144行:
MATCHING_QUERY:
foreach my $query ( @queries ) {
if ( $o->get('print') ) {
printf "# %s %s %d (%s %d sec) %s\n",
ts(time), $o->get('kill-query') ? 'KILL QUERY' : 'KILL',
$query->{Id}, ($query->{Command} || 'NULL'), $query->{Time},
($query->{Info} || 'NULL');
}
#将这段代码改为:
MATCHING_QUERY:
foreach my $query ( @queries ) {
if ( $o->get('print') ) {
printf "# %s %s %d (%s %d sec) %s %s\n",
ts(time),$o->get('kill-query') ? 'KILL QUERY' : 'KILL',
$query->{Id}, ($query->{Command} || 'NULL'), $query->{Time},($dbh->{mysql_hostinfo} || ''),
($query->{Info} || 'NULL');
}
#然后保存退出,启动pt-kill:
[root@mysql80 bin]# python pt-kill.py 192.168.1.3
/usr/local/bin/pt-kill --no-version-check --host 192.168.1.80 --port 3306 --user root --password xueledba --charset utf8 --match-command Query --match-host 192.168.1.3 --busy-time 10 --kill --victims all --interval 10 --print
/usr/local/bin/pt-kill --no-version-check --host 192.168.1.91 --port 3306 --user root --password xueledba --charset utf8 --match-command Query --match-host 192.168.1.3 --busy-time 10 --kill --victims all --interval 10 --print
# 2017-05-26T18:03:23 KILL 1150267 (Query 16 sec) 192.168.1.80 via TCP/IP select sleep(36)
# 2017-05-26T18:04:03 KILL 1104444 (Query 13 sec) 192.168.1.91 via TCP/IP select sleep(35)

这样就可以看到是在80上的还是91上的被杀了。pt-kill的结果可以输出到日志也可以保存到数据库,具体可参考官方文档。

文章目录
  1. 1. 什么是pt-kill
  2. 2. 如何使用pt-kill
  3. 3. 多实例集中监控的实现
|