dsdgen+dsqgen安装
来源:互联网 发布:云端软件平台下载 编辑:程序博客网 时间:2024/06/05 12:11
1安装vmware9虚拟机+centos6.4 i386,virualbox需要给大些内存和显存否则不出图形界面,virtualbox安完reboot会崩溃。
2安装gcc yum -y install gcc
gcc yum -y install gcc/c++
3下载dsdgen 在官网需要它给邮箱发 ,搜索tpc-ds,然后unzip压缩包
4在tools文件夹 make命令
5命令生成数据
dbgen2–scale 100 –dir /tmp
其他见下面引用内部文档:
TPC-DS “how to” Guide
Introduction
The purpose of this guide is to describethe most common tasks necessary to implement a TPC-DS benchmark. The target audience is individuals who wantto install, populate, run and analyze the database, queries and data maintenanceworkloads for TPC-DS.
This guide does not discuss anythingrelated to actually publishing TPC-DS results (e.g.: tuning, data layout,pricing, auxiliary data structures, auditing, etc.).
For more details, see the official TPC-DSspecification.
Change Log
Date
Version
Comments
Feb. 8, 2007
1.0
Initial draft for kit 1.1.52 (Doug Johnson/Netezza)
Feb. 22, 2007
1.1
Added comments from Meikel Poess/Oracle and Shirley Wang/MS
How to download the latest kit
The kit is downloadable from www.tpc.org. Use the link to the TPC-DSbenchmark.
What’s in the kit
There are hundreds of files in the kit butonly a few that you have to read or modify. Here’s the list of files that you shouldread:
- Specification.doc: latest specification document
Here’s the list of files that you willprobably have to modify:
- tpcds.sql: file with SQL to create the data warehouse tables. You will probably need to add allocation or distribution information.
- tpcds_source.sql: file with SQL to create the source/staging tables. You will probably need to add allocation or distribution information.
- Makefile.suite: file with Unix-style make commands to build the dbgen2 and qgen2 binaries for a target platform. You will need to specify the execution environment (see make/build section below).
- vcproj files: files used to build the tools for Windows Visual C++.
- query_templates: folder with query template files (.tpl). You don’t manually change these files but the qgen2 program transforms them into executable SQL.
- TBD: folder with sample ANSI SQL “create view” statements used to refresh the data warehouse during the data maintenance phase.
How to make/build the binaries
For AIX, LINUX, HPUX, NCR and Solaris
- Copy Makefile.suite to Makefile
- Edit Makefile and find the line containing “OS = “
- Read the comments and append your target OS. For example: “OS = LINUX”
- Run “make”
For Windows
- Install Microsoft Visual Studio 2005
- Open the entire solution by double-clicking on dbgen2.sln (you might see an error saying “project file grammar.vcproj” has failed to load; you can safely ignore this error)
- From list of projects, right click on dbgen2 and select “build” (or from top menu, Build -> Build Solution). This will build mkheader and distcomp before building dbgen2.
- Repeat step 3 for building qgen2
- To cross compile for X64 and IA64 on X86 platforms, install Microsoft Visual Studio 2005 “Team Suite” SKU, modify the target platform from Build -> Configuration Manager, and repeat steps 3 & 4.
How to generate the load data
The dbgen2 utility generates input data to(a) load the initial data warehouse and (b) “refresh” the data warehouse for thedata maintenance workload. This section describes how to generate the loaddata.
Run “dbgen2 –h” for the help info. Notethat many of the options are “advanced” and usually not needed.
Example to generate the load data files fora 100GB in the /tmp directory:
dbgen2 –scale 100 –dir /tmp
The official scale factors are 100GB,300GB, 1TB, 3TB, 10TB, 30TB and 100TB.
The output files will be of the form“<table>.csv”. Even though file suffix is “.csv”, the default fielddelimiter is ‘|’. Use the “-delimiter ‘<c>’” option to change delimiters.
Since dbgen2 only generates 200-300GB/hour(on a 2-3GHz x86 processor), it is useful to run multiple parallel streams whengenerating large amounts of data. Here’s an example for generating 100 GB with 4parallel streams simultaneously on Linux/Unix :
dbgen2–scale 100 –dir /tmp –parallel 4 –child 1 &
dbgen2–scale 100 –dir /tmp –parallel 4 –child 2 &
dbgen2–scale 100 –dir /tmp –parallel 4 –child 3 &
dbgen2–scale 100 –dir /tmp –parallel 4 –child 4 &
Note that dbgen2 always reads the“tpcds.idx” file so if you run it from somewhere other than the “kit” directory,then you need to copy tpcds.idx to the current directory.
How to load the data
Run the loader provided with your DBMS toload the dbgen2 generated data files into the data warehouse tables.
Note that the default delimiter is ‘|’ soyou may need to specify a different delimiter with dbgen2 or the loader if thedefaults don’t match.
Also, the default “null” value is “||” soif your loader expects (for example), “|NULL|”, then you will need to overridethe loader’s value for nulls.
Unlike TPC-H, the load time is a componentof the QphDS@SF performance metric.
How to generate query SQL from templates
The “qgen2” utility is used to transformthe query templates (see query_templates/*.tpl) into executable SQL for yourtarget DBMS. The unmodified templates are not executable.
Run “qgen2 –h” for the help info. Note thatmany of the options are “advanced” and usually not needed.
Since some common SQL features do not haveANSI standard forms (e.g. “LIMIT” and “BEGIN/COMMIT”), the qgen2 utility mustbe told which “dialect” to use. The following “dialect templates” aresupported: db2.tpl, netezza.tpl, oracle.tpl, sqlserver.tpl. The followingexample generates a SQL file (named query_0.sql) from the query99 template for a100GB database using Oracle syntax.
qgen2 –query99.tpl –directory query_templates
–dialect oracle –scale 100
Note that qgen2 also reads the “tpcds.idx” file(in the “kit” directory) and the “ansi.tpl” file (in the “kit/query_templates”)so you’ll need to copy something somewhere.
How to run the queries
You can, of course, run the queries any wayyou want but the “official” method is to run N concurrent query streams where Nis a function of the database size. Unlike TPC-H, the TPC-DS query workloaddoes not have a single-stream query component. The relationship betweendatabase size (SF) and query streams (N) is:
SF
N
100
7
300
9
1,000
11
3,000
13
10,000
15
30,000
17
100,000
19
How to generate the refresh data
Run “dbgen2 –h” for the help info. Notethat many of the options are “advanced” and usually not needed.
Example to generate the refresh data filesin /tmp for the 3rd “update” stream:
dbgen2 –scale 100 –dir /tmp –update 3
The output files will be of the form“s_<table>_<stream>.csv”. The default field delimiter is ‘|’. Usethe “-delimiter ‘<c>’” option to change delimiters.
Since dbgen2 only generates 200-300GB/hour(on a 2-3GHz x86 processor), it is useful to run multiple parallel streams whengenerating large amounts of data. Here’s an example for the 3rdstream/child of 10 parallel streams:
dbgen2 –scale 100 –dir /tmp –update 3 –parallel10
–child 3
Note that dbgen2 always reads the“tpcds.idx” file so if you run it from somewhere other than the “kit”directory, then you need to copy tpcds.idx to the current directory.
How to run the data maintenance workload
The rules for the data maintenance phase (a.k.a.“refresh”) are intentionally “open-ended” because the natural methods forimplementation are DBMS-specific. The specification describes the methods ingeneral and the kit contains some sample SQL and but be warned thatimplementing the refresh workload on your system will require some time (thinkweeks, not days) from someone that has read the specification and knows whatthey’re doing.
As with the query streams, you can run therefresh streams any way you want. Officially, the specification defines thesame number of streams for both query and refresh operations. Unlike the querystreams, the refresh streams are run serially. Within a single stream, you canrun the individual DM function serially or in parallel.
In general, the process is:
- Create the staging tables (also optional): Example: “isql –db tpcds100 –f tpcds_source.sql”
- Load the staging tables using the DBMS-supplied loader.
- Run whatever else your database requires (e.g. grant permissions, generate statistics, etc.) to prepare the tables.
- Create sequences needed for the “history keeping” tables: call_center, item, store, web_page and web_site. See the specification for definition of “history keeping”.
- Create the refresh views. The kit contains sample ANSI SQL that will almost certainly require some changes.
- Run the refresh functions. There are five types of refresh functions – corresponding to the type of table and type of data maintenance operations. See the specification for details about the functions. In general, the methods involve running UPDATE, INSERT and DELETE commands using the views created above to refresh the data warehouse tables.
Here’s an example using generic shellscript and SQL.
MAX=`nzsqltpcds1 -At -c "(SELECT max(cc_call_center_sk)+1
FROM call_center)"`
isql -db$DB -c "CREATE SEQUENCE cc_seq START WITH $MAX"
- dsdgen+dsqgen安装
- tpc-ds 1.3.1的dsqgen生成查询sql用法
- 安装
- 安装
- 安装
- 安装
- 安装
- 安装
- 安装
- 安装
- 安装
- 安装
- 安装
- 安装
- 安装
- 安装
- 安装
- 安装
- 读代码大全
- emulator: ERROR: unknown virtual device name: 'my_avd'解决方法
- 思维导图
- Android源代码编译原理与前期准备
- 【自然框架】之通用权限:有ER图和表关系图 (收藏)
- dsdgen+dsqgen安装
- 黑马程序员_加速Java应用开发速度3:单元/集成测试+持续集成
- Vim简明教程【CoolShell】
- 判断两个浮点数是否相等
- Android NDK 概述,翻译OVERVIEW.html整体了解NDK
- 在序列中找到符合某个条件的子序列
- uva-11234 Evaluating Simple C Expressions
- 黑马程序员_加速Java应用开发速度4:使用模板技术加速项目开发速度
- win7 64位+Oracle 11g 64位下使用PL/SQL Developer 的解决办法