我们经常会碰到需要删除数据表中重复记录,下面我来总结了几种能删除重复记录并助相对来讲效率是非常不错的,有需要了解的朋友可进入参考.
创建一个表用来存放,要删除的记录的id信息,代码如下:
- CREATE TABLE `tmp_ids` (
- `id` int(11),
- `name` char(20)
- ) ENGINE=MyISAM;
如果要删除的记录不多的话,可以把这个表创建成内存表形式,代码如下:
- CREATE TABLE `tmp_ids` (
- `id` int(11),
- `name` char(20)
- ) ENGINE=HEAP;
然后在test表中删除重复记录,代码如下:
- insert into tmp_ids select min(id),name from test group by name having count(*)>1 order by null;
- delete a.* from test a,tmp_ids b where b.name=a.name and a.id>b.id;
- truncate table tmp_ids;
方法二,复制无重复记录到新表格,删除旧表格,然后重命名新表格为旧表名称,代码如下 :
- mysql> select * from duplicate where id in(select min(id) from duplicate group by name);
- +
- | id | name |
- +
- | 1 | wang |
- | 3 | wdang |
- | 5 | wdand |
- | 6 | wddda |
- +
- 4 rows in set (0.01 sec)
- mysql> create table duplica select * from duplicate where id in(select min(id) from duplicate group by name);
- Query OK, 4 rows affected (0.02 sec)
- Records: 4 Duplicates: 0 Warnings: 0
- mysql> drop table duplicate;
- Query OK, 0 rows affected (0.01 sec)
- mysql> alter table duplica rename to duplicate;
- Query OK, 0 rows affected (0.00 sec)
- mysql> select * from duplicate;
- +
- | id | name |
- +
- | 1 | wang |
- | 3 | wdang |
- | 5 | wdand |
- | 6 | wddda |
- +
- 4 rows in set (0.00 sec)
-
- mysql> alter table duplicate modify id int(2) not null primary key auto_increment;
- Query OK, 4 rows affected (0.00 sec)
- Records: 4 Duplicates: 0 Warnings: 0
后来想了一个语句搞定了,代码如下:
- mysql> use mysql
- Database changed
- mysql> select * from duplicate;
- +
- | id | name |
- +
- | 1 | wang |
- | 3 | wdang |
- | 5 | wdand |
- | 6 | wddda |
- | 2 | wang |
- | 4 | wdang |
- +
- 6 rows in set (0.00 sec)
- mysql> delete duplicate as a from duplicate as a,
- -> (
- -> select * from duplicate group by name having count(1)>1) as b
- -> where a.name=b.name and a.id > b.id;
- Query OK, 2 rows affected (0.00 sec)
- mysql> select * from duplicate;
- +
- | id | name |
- +
- | 1 | wang |
- | 3 | wdang |
- | 5 | wdand |
- | 6 | wddda |
- +
- 4 rows in set (0.00 sec)
保留ID最小的记录.
第1种,代码如下:
- delete from %s where goodsurl in (select goodsrul as gurl1 from %s
- #group by grul1 having count(gurl1)>1)rs1 and id not in (select min(id)as id2 from %s
- #group by goodsurl having count(goodsurl)>1)rs2"%(a,a,a)
第2种,这种方法不使用子集,但是我不知道怎么把rs1,rs2里面的元组嵌入到SQL语句中,代码如下:
- exeSql = "select min(id) from %s group by goodsurl havingcount(goodsurl)>1)"%(a,)
- cur.execute(exeSql)
- rs1 = cur.fetchall()
- exeSql = "select goodsurl from %s group by goodsurl havingcount(goodsurl)>1"%(a,)
- cur.execute(exeSql)
- rs2 = cur.fetchall()
- exeSql = "delete from %s where goodsurl in %s and id not in %"%(a,rs2,rs1)
- cur.execute(exeSql)
|