Postgresql生成大量测试数据

来源:互联网 发布:怎么买备案的域名 编辑:程序博客网 时间:2024/05/21 17:19
在PostgreSQL中如何用简单的几条SQL语句生成大量的测试数据呢?

此处,我简单的写一个例子,供参考(在Postgresql9.1下面做的):

(1)准备知识

针对 Postgres生成数据方法 

生成序列

SELECT * FROM generate_series(1,5);

生成date

SELECT date(generate_series(now(), now() + '1 week', '1 day'));

生成integer 随机数

SELECT (random()*(2*10^9))::integer;

生成numeric 随机数

random()*100.)::numeric(4,2)

生成字符串

substr('abcdefghijklmnopqrstuvwxyz',1,(random()*26)::integer)

生成重复串

repeat('1',(random()*40)::integer)

举例:

SELECT generate_series(1,10) as key,(random()*100.)::numeric(4,2),repeat('1',(random()*25)::integer) ORDER BY random();

结果:

(2)使用存储过程生成

看完上面的数据 我们有必要写一个Fucntion去生成数据

CREATE OR REPLACE FUNCTION generate_test_data(IN num integer, IN table_name text) RETURNS void AS$BODY$DECLAREquery_sql text :='';insert_sql text :='';insert_sql_1 text :='';insert_sql_2 text :='';column_info record;type_name name :='';column_name name :=''; BEGIN-- 根据表名查出各列的类型和列名query_sql := 'select t.typname as type, a.attname as name from pg_class c, pg_attribute a, pg_type t where c.relname = '''|| table_name ||''' and a.attnum > 0 and a.attrelid = c.oid and a.atttypid = t.oid';       --组成插入语句        insert_sql_1 := 'insert into '||table_name||'(';insert_sql_2 := 'values(';                for column_info in execute(query_sql) loop                                type_name := column_info.type;                column_name := column_info.name;                --根据类型写表达式拼insert语句  if substring(type_name from 1 for 3) = 'int' theninsert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || '(random()*(2*10^9))::integer' ||',';elsif type_name = 'text' theninsert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'substr(''abcdefghijklmnopqrstuvwxyz'',1, (random()*26)::integer)'||',';elsif type_name = 'char' theninsert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'substr(''abcdefghijklmnopqrstuvwxyz'',1, (random()*26)::integer)::char'||',';elsif substring(type_name from 1 for 5) = 'float' theninsert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || '(random()*(2*10^9))::float' ||',';elsif type_name = 'varchar' theninsert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'substr(''abcdefghijklmnopqrstuvwxyz'',1, (random()*26)::integer)'||',';elsif type_name = 'date' theninsert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'date(generate_series(now(), now() + ''1 week'', ''1 day''))' ||',';elsif type_name = 'time' theninsert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'generate_series(now(), now() + ''1 week'', ''1 day'')::time' ||',';elsif type_name = 'timestamp' theninsert_sql_1 := insert_sql_1 || column_name ||',';insert_sql_2 := insert_sql_2 || 'generate_series(now(), now() + ''1 week'', ''1 day'')::timestamp' ||',';                end if;end loop;--insert语句生成OKinsert_sql_1 := substring(insert_sql_1 from 1 for char_length(insert_sql_1)-1)||')';        insert_sql_2 := substring(insert_sql_2 from 1 for char_length(insert_sql_2)-1)||')';        insert_sql := insert_sql_1 ||' '||insert_sql_2;        --raise notice '%',insert_sql;for i in 1..num loop--执行insert语句--raise notice '%',i;execute(insert_sql);end loop;END;$BODY$LANGUAGE 'plpgsql';

之后执行

select generate_test_data(1000000,'test');

不过这样Insert语句会很慢,之后放弃,采用insert into select写法

(3)采用Insert into select生成一千万条测试数据


如果您想知道执行该sql的时间,请在执行上述命令前设置:
postgres=# \timing on
Timing is on.

最后说明一点的是,这是在我的笔记本上测试(内存8G,SSD 硬盘),插入一千万条记录要25秒 ,faint:

主要参考:
[1]PostgreSQL 之 生成测试数据,http://hi.baidu.com/hjzheng/blog/item/c78ad09320eb5297a877a4f6.html,
[2]http://www.depesz.com/2010/02/02/waiting-for-9-0-table-and-index-sizes/

(4)查看表test占用的存储空间

若查看其中的index的空间或整个relation的空间,请参考:

http://www.postgresql.org/docs/9.1/static/functions-admin.html ,

或:http://www.postgresql.org/docs/9.1/static/functions-admin.html。


(5)查看整个数据库占用的硬盘空间:
[postgres@localhost ~]$ cd /home/postgres/db/master/pgsql/data/base/16384   注释:这里是pgsql的data目录
[postgres@localhost 16384]$ du -sh
6.3M    
(6)如果想用delete清空该表,然后真正清空硬盘空间
postgres=# delete from foo;
DELETE 10000
然后在命令行:
[postgres@localhost ~]$ /home/postgres/db/master/pgsql/bin/vacuumdb mydb
然后再用第3步中的看看,是不是又减少到最初的几兆(我的大约是6M)空间了。

(7)通过Shell脚本执行测试数据生成

result=`psql -U postgres -h localhost -d postgres << EOF\timing on    insert into test(id, name) select i, 'text:' || i from generate_series(1,1) i;EOF`echo ${result}

参考:
[1] http://www.postgresql.org/docs/9.1/static/functions-sequence.html
[2] http://www.postgresql.org/docs/9.1/static/datatype-numeric.html#DATATYPE-SERIAL
[3] 在postgresql中查看数据库/表占用的物理存储空间大小,http://blog.xupeng.me/2008/02/13/physical-storage-for-dbs-tables-in-postgre/
[4] Waiting for 9.0 – table and index sizes,http://www.postgresql.org/docs/9.1/static/functions-admin.html 
0 0