查看: 2202|回复: 6

大型论坛 + MySQL 5.1 表分区

2354 主题	170 好友	3万积分

站长

Rank: 120

发消息

电梯直达

1^#

发表于 2008-12-28 11:32 AM |只看该作者 |倒序浏览

Mysql从5.1开始支持表的分区，今天下载了一个mysql-5.1.22的源码包安装后进行大数据表的分区测试。
mysql> select version();
+-----------------+
| version()       |
+-----------------+
| 5.1.22-rc-log  |
+-----------------+
1 row in set (0.00 sec)
用来测试的数据为discuz论坛的数据库，表为cdb_posts表，数据量为1500多万条
mysql> select count(*) from cdb_posts;
+-------------+
| count(*) |
+-------------+
| 15276429 |
+-------------+
1 row in set (0.04 sec)
为了增强表的扩展性，将cdb_posts表分为10个分区，新建一个表cdb_posts1,建表语句为

CREATE TABLE `cdb_posts1` (
  `pid` int(10) unsigned NOT NULL auto_increment,
  `fid` smallint(6) unsigned NOT NULL default '0',
  `tid` mediumint(8) unsigned NOT NULL default '0',
  `first` tinyint(1) NOT NULL default '0',
  `author` varchar(15) NOT NULL default '',
  `authorid` mediumint(8) unsigned NOT NULL default '0',
  `subject` varchar(80) NOT NULL default '',
  `dateline` int(10) unsigned NOT NULL default '0',
  `message` mediumtext NOT NULL,
  `useip` varchar(15) NOT NULL default '',
  `invisible` tinyint(1) NOT NULL default '0',
  `anonymous` tinyint(1) NOT NULL default '0',
  `usesig` tinyint(1) NOT NULL default '0',
  `htmlon` tinyint(1) NOT NULL default '0',
  `bbcodeoff` tinyint(1) NOT NULL default '0',
  `smileyoff` tinyint(1) NOT NULL default '0',
  `parseurloff` tinyint(1) NOT NULL default '0',
  `attachment` tinyint(1) NOT NULL default '0',
  `rate` smallint(6) NOT NULL default '0',
  `ratetimes` tinyint(3) unsigned NOT NULL default '0',
  `status` tinyint(1) NOT NULL default '0',
  PRIMARY KEY  (`pid`),
  KEY `fid` (`fid`),
  KEY `dateline` (`dateline`),
  KEY `authorid` (`authorid`),
  KEY `invisible` (`invisible`),
  KEY `displayorder` (`tid`,`invisible`,`dateline`),
  KEY `first` (`tid`,`first`)
) ENGINE=MyISAM AUTO_INCREMENT=17715224 DEFAULT CHARSET=gbk PARTITION BY RANGE (pid) (
PARTITION p0 VALUES LESS THAN (3000000),
PARTITION p1 VALUES LESS THAN (6000000),
PARTITION p2 VALUES LESS THAN (9000000),
PARTITION p3 VALUES LESS THAN (12000000),
PARTITION p4 VALUES LESS THAN (15000000),
PARTITION p5 VALUES LESS THAN (18000000),
PARTITION p6 VALUES LESS THAN (21000000),
PARTITION p7 VALUES LESS THAN (24000000),
PARTITION p8 VALUES LESS THAN (27000000),
PARTITION p9 VALUES LESS THAN (30000000),
PARTITION p10 VALUES LESS THAN MAXVALUE
);

接下来将cdb_post表的数据导入到cdb_posts1

mysql>insert into cdb_posts1 select * from cdb_posts;

由于数据量非常大，所用的时间当然也就很长，在双路双核1.6G 4M L2cache CPU，2G内存的机器上这个过程花费了1小时3秒。

接下来测试分区后的表和未分区的表的查询速度比较。

测试程序如下：

<?php
mysql_connect("localhost","root","");
mysql_select_db("discuz"); )
//cdb_posts
$start = microtime_float(); //开始时间

for($i=0;$i<100;$i++) { //查询100次

$rand = rand(100,15000000);
$sql = "select * from cdb_posts where pid= ".$rand;
mysql_query($sql);
$i++;
}
$end = microtime_float();    //结束时间

//cdb_posts1
$posts =  "select cdb_posts has total spend ".($end-$start)." seconds\n";

$start = microtime_float(); //开始时间

for($i=0;$i<100;$i++) {    //查询100次
$rand = rand(100,15000000);
$sql = "select * from cdb_posts1 where pid = ".$rand;
mysql_query($sql);
$i++;
}
$end = microtime_float();    //结束时间
$posts1 =  "select cdb_posts1 has total spend ".($end-$start)." seconds\n";
echo "\n\n\n";
echo $posts."\n";
echo $posts1."\n";
mysql_query("flush table cdb_posts"); //清除表缓存
mysql_query("flush table cdb_posts1");  //清除表缓存
function microtime_float()
{
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}
?>
为了尽可能的避免Mysql查询使用缓存，将pid采用rand函数取随机数。
执行这个php
select cdb_posts has total spend 0.48239207267761 seconds
select cdb_posts1 has total spend 1.4392290115356 seconds
接下来将查询次数改到1000
select cdb_posts has total spend 6.6411547660828 seconds
select cdb_posts1 has total spend 13.684276103973 seconds
接下来我把查询次数增加到10000，执行结果为

select cdb_posts has total spend 42.948129892349 seconds
select cdb_posts1 has total spend 68.492646932602 seconds

从以上结果可以看出，大表进行分区后，查询效率有所降低，但是随着查询次数的增多，所用时间的差距不断减小。

本文转自http://blog.kankantu.com/article.php?type=blog&itemid=14