Datameer Usage Notes

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 09:40

Datameer Q&A

What do Read, Write, and Execute Permissions mean in Datameer?

Datameer permissions are modeled on Linux or Mac (Unix-style) permissions. This means you can grant different permissions on artifacts to different groups and different users within Datameer.

READ - Allows other users to view your artifact.

WRITE - Allows other users to edit your artifact. This requires read permission as well.

EXECUTE - Allows other users to run your artifact (e.g. workbooks, import jobs). This requires read permission as well.

For example, say you create a workbook. If you grant other users only READ access, they can see your analysis but cannot make changes. If you grant them WRITE permission, they can go in and make changes to the workbook. If you grant them EXECUTE access, they can run the workbook against the entire data set.
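
The permission rules above can be sketched in a few lines of Python. This is only an illustrative model, not Datameer's implementation: the `Permission` class and the `validate` and `can` helpers are hypothetical names introduced here. It shows the one constraint the text states, namely that WRITE and EXECUTE each require READ.

```python
from enum import Flag, auto


class Permission(Flag):
    """Hypothetical model of the three permission types described above."""
    NONE = 0
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


def validate(granted: Permission) -> Permission:
    """Reject a grant that includes WRITE or EXECUTE without READ."""
    needs_read = Permission.WRITE | Permission.EXECUTE
    if (granted & needs_read) and not (granted & Permission.READ):
        raise ValueError("WRITE and EXECUTE require READ")
    return granted


def can(granted: Permission, action: Permission) -> bool:
    """Return True if every bit of `action` is present in `granted`."""
    return (granted & action) == action


# A viewer may only look; an editor may look and change; a runner may look and run.
viewer = validate(Permission.READ)
editor = validate(Permission.READ | Permission.WRITE)
runner = validate(Permission.READ | Permission.EXECUTE)

print(can(viewer, Permission.WRITE))    # False
print(can(editor, Permission.WRITE))    # True
print(can(runner, Permission.EXECUTE))  # True
```

Modeling the permissions as bit flags mirrors the Unix-style scheme the text refers to, where each permission can be combined independently with the others.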

Term - Definition

Job - The complete set of data including the connections, associated analytics (workbooks), and visualization tools (infographics). The job also includes the schedule of when the data gets updated, and whether this happens automatically or manually.

Connections - The repository of structured, semi-structured, and unstructured data from one or more sources used to create analytics.

Workbook - Where you view a sample of your data and create analytics, using the built-in functions, sorting, filtering, and other tools to discover relationships in your data set.

Widgets - The reporting tools you use to easily create tables, charts, graphs, and other visual ways of looking at your data. Widgets let you quickly and visually manipulate your data.

Infographics - Where you can see at a glance the tables, charts, and graphs you create for visualizing your data.

You can use Datameer with any type of data, such as log files, call detail records, sales or transactional data, clickstream data, website metrics, social networking data, and more. You can combine multiple datasources and data types to collect the raw data you need for analysis. You can import data yourself or use the data imported by a system analyst.

Data formats supported include:

  1. Flat files such as Excel spreadsheets, comma-delimited text files (.CSV), FDFS (File Descriptor File System), Apache log files, and S3 (Amazon Simple Storage Service), and unstructured data such as Twitter data.
  2. Relational databases such as Oracle (10g), HSQL-DB, DB2, or MySQL (5.1)
  3. Other types such as Hive (a data warehouse infrastructure built on Hadoop)

Raw data is stored and processed using Hadoop, which manages and distributes both the data and the computational load over multiple computers networked together. The Datameer tools allow you to easily analyze and visualize relationships in the data.


Datameer includes datasource integration, storage, an analytics engine, and visualization tools.

Datameer is based on Hadoop, which allows it to scale to accommodate and manipulate large volumes of data. It supports integrating data from many of the commonly used databases, including Oracle, DB2, and MySQL, and from files such as log files, Twitter data feeds, CSV files, Excel files, or text files.

Use Datameer to analyze customer relationship management content, web logs, customer data, sales data, social media content, and even data from Excel files. You can store that data on your own servers or use a service available in the cloud such as Amazon Web Services.

Datameer provides a familiar, interactive spreadsheet-based interface that is easy to use but also powerful, so that you don't need to turn to developers for analytics. The spreadsheet is specifically designed for visualization of big data and includes more than 200 built-in functions for exploring and discovering complex relationships. In addition, because Datameer is extensible, you can use functions from third-party tools or write your own commands.

Datameer's Business Infographics tools include charts, graphs, and maps, and allow you to incorporate your own visual elements to produce stunning, print-quality data visualizations.

The Datameer tools allow you to easily Extract, Transform, and Load (ETL) data from multiple sources, including your current transactional database systems, regardless of source or format. Then you can analyze relationships in the data using an interactive spreadsheet interface and visualize the results of that analysis using the built-in infographic widgets.

Datameer is specifically designed to solve the challenges of accessing, analyzing, and using massive amounts of data, leveraging Apache Hadoop open source technology. Datameer enables enterprises to gain insights from all available data sources, regardless of size, in a cost-effective manner.

Massively parallel processing architecture facilitates ultra-fast performance of complex analytics. Hadoop scales to 4000 servers and petabytes of data, and the application processes are fully parallelized inside Hadoop clusters. This dynamic workload optimization utilizes hardware more efficiently.

Datameer includes built-in fault resilience for high application availability, and elastic expansion to dynamically expand storage capacity without system downtime. The advanced data compression increases performance and decreases storage requirements.

What is a job?

An import job sets up the connection to a datasource to import information into Datameer for processing. It can then run at the intervals you specify: for example, when manually triggered, when data changes, or on a time schedule you set up. That way, you control how current the data is and how frequently it gets updated.

An analytics job runs the calculations and logic you set up in the workbook and displays the results in the infographic widgets so you can easily view or share your results.

What are Connections?

Each type of data is set up as a connection so it can be used by Datameer. For example, you can have sales data from an Oracle or MySQL server, other content from a CSV file exported from Excel, Twitter feeds about your company and products, and customer call logging data from yet another source. You can easily pull all that information into Datameer.

How do I connect to various kinds of data?

You create a workbook in Datameer that connects to one or more of these sources of data, which you can then use to do analysis. For example, you could use sales data from your corporate database, Twitter feeds, and customer call logging data, all from different sources, as the basis for your analysis.

Key concepts for Analysts

Before you can use Datameer to do analysis, you or a systems administrator need to set up a connection and import data into Datameer. Once that is done, you can choose a job to start analyzing.

When you use a workbook:

·        You are setting up an analysis while viewing a subset of the entire dataset. That analysis will then run on the full data

·        Each filter you apply to the data creates a new tab (sheet) in the workbook, and is one step of the analysis

·        You can click each tab of the workbook to see the results of that step of analysis

·        Some pages are read-only and others are editable (you can easily tell by viewing the tabs)

·        Calculations apply to columns, not to a range of cells in a column

·        There are more than 175 built-in formulas

·        You can add custom formulas using the Application Programming Interface (API)

·        You can choose which sheets you want to save along the way

·        You can change workbook settings at any time

·        You can import a worksheet into another workbook
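
The first and fifth points above (design on a sample, run on the full data; formulas apply to whole columns) can be illustrated with a small Python sketch. Nothing here is Datameer code: the column names `price`, `quantity`, and `total`, the data values, and the `add_total_column` function are all made up for illustration.

```python
def add_total_column(rows):
    """Column-level formula: total = price * quantity, applied to every row."""
    return [{**row, "total": row["price"] * row["quantity"]} for row in rows]


# Hypothetical full data set; in practice this would be far larger.
full_data = [
    {"price": 2.0, "quantity": 3},
    {"price": 5.0, "quantity": 1},
    {"price": 1.5, "quantity": 4},
    {"price": 10.0, "quantity": 2},
]

# While designing the analysis you only look at a subset of the rows.
sample = full_data[:2]
preview = add_total_column(sample)

# The same column formula is later run against the entire data set.
result = add_total_column(full_data)

print(preview)
print(len(result))
```

Because the formula is defined per column rather than per cell range, the definition checked against the sample carries over unchanged to the full run.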

When you use infographics:

·        You are selecting and customizing a way of looking at the analysis you did in the workbook

·        You can set up multiple widgets to look at your data in different ways

·        The data fields you can show in the widgets come from the sheets that are saved in the workbook

·        You can share your infographics with other users by sending a link to them

 

Key concepts for System Administrators

·        You set up connections, which are collections of data that can be structured, semi-structured, unstructured, or a mix of types

·        Jobs are created using connections, and include any associated workbooks and infographics created by analysts

·        Either you or the analyst can specify when jobs will run

·        You can optimize for speed by saving only the sheets you need in workbooks

·        Datameer provides role-based security features. You can set up group permissions and assign users to groups.

 

 

Step 1
As your first step, open the first tutorial folder, "Tutorial Hello World", in the "Start Here" folder in the Browser tab, or download the app from the Analytics App Market.

Step 2
Double-click the "My Upload" file and choose the "Edit" button.

Step 3
Here you would choose the file you wish to upload by clicking "Choose File" and pick its format from the drop-down menu. For this example, simply select "Next" to continue.

Step 4
In "Data Details", leave the box "Column names are contained in the first row" checked. Also notice you can choose your delimiter and advanced options (e.g. quote character). Click "Next" to continue.

Step 5

You'll now see a preview of the data where you can make any changes as needed to the file type, column name, or columns to include.  Click "Next" twice to continue to the "Save" Page.

Step 6

Save the File Upload with the same name by clicking "Save" or create a copy by clicking "Save Copy As..."

Step 7

Back in the Browser tab, click the + button and select "Analytics" and then "Workbook." From the Link Data browser box, select "MyUpload" from the /_Start Here_/_1_Tutorial Hello World folder and click the blue Link Data button at the bottom of the box.


Step 8
Once you have opened the data in your workbook, you can analyze it. Click the Apply Filter button on the toolbar, choose "City" as the filter column and "Equals" as the expression, and type "Chicago" as your value. Then click Create Filtered Sheet at the bottom of the box.

Step 9
To save the workbook, click the "Save Workbook" button in the toolbar, choose a name for the file such as "MyWorkbook", and click Save. You will have options on how and when to run the workbook. Check the box "Start calculation process immediately after save," then click Next and then Save to finish and start the calculation.

Step 10
After the workbook has been calculated, you can visualize the analysis of your data through Datameer's Business Infographics™. Click the + in the Browser again and choose Infographic to create your first data visualization.

Step 11
From the "Add Widget" Inspector on the left of your screen, simply drag and drop the BARCHART onto the canvas. Click "Data" on the widget to add your data from your workbook. In the data browser on the right side of your screen, navigate to 'MyWorkbook', then to the latest results, then to Sheet1, and drag column 'Name' to the Label field and column 'Age' to the data field. The widget will update automatically. That's it! If you want to save this infographic, simply click the Save icon, and you're all set. Have fun exploring!
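
For readers who think in code, the tutorial's data path (a delimited file with a header row, a filter on "City" equals "Chicago", then "Name" as labels and "Age" as values) can be sketched in plain Python. This is not what Datameer runs internally. The sample rows are invented, since the tutorial file's contents are not shown; only the column names City, Name, and Age come from the steps above.

```python
import csv
import io

# Hypothetical sample data standing in for the uploaded file.
raw = io.StringIO(
    'Name,Age,City\n'
    '"Smith, Ann",34,Chicago\n'
    'Bob,28,Boston\n'
    'Carla,41,Chicago\n'
)

# "Column names are contained in the first row": DictReader takes the header
# from the first line. The delimiter and quote character are set explicitly,
# matching the options mentioned in the Data Details step.
reader = csv.DictReader(raw, delimiter=",", quotechar='"')
rows = list(reader)

# Equivalent of the filter: column "City", expression "Equals", value "Chicago".
chicago = [row for row in rows if row["City"] == "Chicago"]

# Equivalent of dragging 'Name' to the Label field and 'Age' to the data field.
labels = [row["Name"] for row in chicago]
values = [int(row["Age"]) for row in chicago]

for label, value in zip(labels, values):
    print(f"{label}: {value}")
```

The quoted first record shows why the quote character matters: without it, the comma inside "Smith, Ann" would be read as a column separator.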

To create a connection for a database type that you do not see listed in the file type list, you first need to add the appropriate database drivers to your Datameer installation.

How to Install Database Drivers

·        Go to the driver download site: http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html

·        Select the following driver:

ojdbc6.jar (2,111,220 bytes) - Classes for use with JDK 1.6. It contains the JDBC driver classes except classes for NLS support in Oracle Object and Collection types.

(or select a driver appropriate for your database.)

·        In DAS, go to the Administration tab and then to the Database Drivers tab. Click the New button to add a new database driver.

·        Enter the following information:

·        Name: Oracle

·        Jar File: ojdbc6.jar (or other file you downloaded)

·        Dialect: Oracle-Dialect

·        Driver Class: oracle.jdbc.driver.OracleDriver

·        Connection Pattern: jdbc:oracle:thin:@%host%:%port%:%sidName%

·        Oracle should now appear as an available database driver.
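
To make the Connection Pattern concrete, here is a minimal Python sketch of how the `%host%`, `%port%`, and `%sidName%` placeholders might be filled in to produce a JDBC URL. How Datameer performs this substitution internally is not described in the text, so the `fill_pattern` helper is hypothetical, and the host name and SID are example values. Port 1521 is the usual default for an Oracle listener.

```python
def fill_pattern(pattern: str, **values: str) -> str:
    """Replace each %name% placeholder in the pattern with its value."""
    for key, value in values.items():
        pattern = pattern.replace(f"%{key}%", value)
    return pattern


# Connection pattern exactly as entered in the driver form above.
PATTERN = "jdbc:oracle:thin:@%host%:%port%:%sidName%"

# Example values only; substitute your own server details.
url = fill_pattern(PATTERN, host="db.example.com", port="1521", sidName="ORCL")
print(url)  # jdbc:oracle:thin:@db.example.com:1521:ORCL
```

The resulting string has the `host:port:SID` form used by the Oracle thin driver, which is why the pattern needs all three values.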


 
