196. Delete Duplicate Emails

Source: Internet | Published: 晕血学生物 知乎 | Editor: 程序博客网 | Time: 2024/05/22 03:35

Write a SQL query to delete all duplicate email entries in a table named Person, keeping only unique emails based on its smallest Id.

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
Id is the primary key column for this table.

For example, after running your query, the above Person table should have the following rows:

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+


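Summary: delete the duplicate emails. (Note: you must actually delete rows from the original table; a SELECT query alone produces no result.)
My brain short-circuited today and I simply could not come up with an answer to this easy problem. While searching for solutions, though, I came across a few I did not understand at first, so I will walk through them here to try to make sense of them: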

First, the one that is easy to understand:
DELETE FROM Person WHERE Id NOT IN(SELECT Id FROM (SELECT MIN(Id) Id FROM Person GROUP BY Email) p);
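This groups the table by Email, finds the smallest Id in each group, and then deletes every row whose Id is not among those minimums (the complement). The second SELECT Id FROM looks redundant, but MySQL requires it: MySQL does not let a statement modify a table while a subquery in the same statement selects from that table, and trying to do so fails with
You can't specify target table 'Person' for update in FROM clause. Wrapping the subquery in the intermediate (derived) table p avoids this error.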
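To check the logic locally, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The helper names `make_person_table` and `dedupe_not_in` are hypothetical. One assumption to keep in mind: SQLite does not have MySQL's "can't specify target table" restriction, so in SQLite the derived table p is merely harmless, not required.

```python
import sqlite3

# Example rows from the problem statement.
ROWS = [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")]


def make_person_table(rows=ROWS) -> sqlite3.Connection:
    """Create an in-memory Person table holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
    conn.executemany("INSERT INTO Person (Id, Email) VALUES (?, ?)", rows)
    return conn


def dedupe_not_in(conn: sqlite3.Connection) -> list:
    """Keep MIN(Id) per Email and delete every other row; return what is left."""
    # The derived table p is what MySQL needs to avoid its error 1093;
    # SQLite would also accept the query without it.
    conn.execute(
        """
        DELETE FROM Person
        WHERE Id NOT IN (
            SELECT Id FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email) p
        )
        """
    )
    return conn.execute("SELECT Id, Email FROM Person ORDER BY Id").fetchall()


if __name__ == "__main__":
    print(dedupe_not_in(make_person_table()))
    # prints [(1, 'john@example.com'), (2, 'bob@example.com')]
```

Because Id is the primary key, MIN(Id) is never NULL, so the usual NOT IN pitfall with NULLs in the subquery does not arise here.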
The following query also works:
DELETE FROM Person WHERE Id IN (SELECT Id FROM (SELECT p1.Id FROM Person p1, Person p2 WHERE p1.Id > p2.Id AND p1.Email = p2.Email) p);
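The self-join pairs each row p1 with every row p2 that has the same Email and a smaller Id, so the selected p1.Id values are exactly the duplicates to remove. The sketch below again uses Python's sqlite3 as a stand-in for MySQL, with hypothetical helper names; it lists the flagged Ids and then runs the DELETE. The only change from the query above is the explicit alias `p1.Id AS Id`, added so the derived column name does not depend on the engine.

```python
import sqlite3

# Example rows from the problem statement.
ROWS = [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")]


def make_person_table(rows=ROWS) -> sqlite3.Connection:
    """Create an in-memory Person table holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
    conn.executemany("INSERT INTO Person (Id, Email) VALUES (?, ?)", rows)
    return conn


def duplicate_ids(conn: sqlite3.Connection) -> list:
    """Ids of rows that have a same-Email partner with a smaller Id."""
    # A row with k smaller duplicates matches k times in the join, hence DISTINCT;
    # the IN (...) test in the DELETE below does not care about repeats.
    cur = conn.execute(
        "SELECT DISTINCT p1.Id FROM Person p1, Person p2 "
        "WHERE p1.Id > p2.Id AND p1.Email = p2.Email ORDER BY p1.Id"
    )
    return [row[0] for row in cur]


def dedupe_self_join(conn: sqlite3.Connection) -> list:
    """Delete the rows flagged by the self-join; return what is left."""
    conn.execute(
        """
        DELETE FROM Person
        WHERE Id IN (
            SELECT Id FROM (
                SELECT p1.Id AS Id FROM Person p1, Person p2
                WHERE p1.Id > p2.Id AND p1.Email = p2.Email
            ) p
        )
        """
    )
    return conn.execute("SELECT Id, Email FROM Person ORDER BY Id").fetchall()
```

With a hypothetical extra row (4, "john@example.com"), the join produces the pairs (3, 1), (4, 1) and (4, 3), so `duplicate_ids` reports [3, 4]; Id 4 appearing twice in the join makes no difference to the DELETE.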

But since we are already joining the table to itself, there is a way to avoid the subquery altogether:

DELETE p2 FROM Person p1, Person p2 WHERE p1.Email = p2.Email AND p2.Id > p1.Id;

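This one puzzled me a bit, because I had never seen DELETE followed by a table alias. After thinking it over for a while, the explanation that holds together is: treat p2 as the original table, and delete its rows whose Id is greater than the Id of another row with the same email. (Perhaps this could serve as a general tool for de-duplicating a column?)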
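For context, MySQL's multiple-table DELETE syntax removes matching rows only from the tables named before FROM, and an aliased table is referred to by its alias there, so `DELETE p2 FROM Person p1, Person p2 ...` deletes the p2 side of each matching pair, which agrees with the reading above. SQLite has no multiple-table DELETE, so the sketch below (again Python's sqlite3 as a stand-in, with hypothetical helper names) checks the same rule through a correlated EXISTS subquery. This is a portable reformulation, not MySQL's own syntax.

```python
import sqlite3

# Example rows from the problem statement.
ROWS = [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")]


def make_person_table(rows=ROWS) -> sqlite3.Connection:
    """Create an in-memory Person table holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
    conn.executemany("INSERT INTO Person (Id, Email) VALUES (?, ?)", rows)
    return conn


def dedupe_exists(conn: sqlite3.Connection) -> list:
    """Delete a row when another row with the same Email has a smaller Id."""
    # The outer Person row plays the role of p2; the inner alias p1 is the
    # partner with a smaller Id. The smallest Id per Email never has such a
    # partner, so exactly one row per Email survives.
    conn.execute(
        """
        DELETE FROM Person
        WHERE EXISTS (
            SELECT 1 FROM Person p1
            WHERE p1.Email = Person.Email AND p1.Id < Person.Id
        )
        """
    )
    return conn.execute("SELECT Id, Email FROM Person ORDER BY Id").fetchall()
```

The rule is the same one the join-based DELETE expresses, just phrased per row instead of per pair.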

Runtimes, numbering the three queries in the order presented: 1 < 2 < 3.

