olap4j:在线分析处理[from ddd]

来源:互联网 发布:淘宝怎么报名淘抢购 编辑:程序博客网 时间:2024/05/01 07:50

    The advent of multicore and many-core processors on expensive desktops and servers has ushered in an era in which small companies can fairly easily perform data mining and analytical processing of large databases as part of an effort to optimize business performance.


    Online analytical processing (OLAP) is capable of leveraging multiple processors, but it has long been hampered by the lack of standardized APIs, especially for Java. Not that nobody was working on them...


    In fact, over the past two decades, many Java Community Process initiatives have tried to establish successful APIs, but all efforts have failed; in many cases because they relied on proprietary extensions. This article discusses a recently released API, supported by multiple vendors in the business intelligence (BI) sector, which makes it easy to leverage today’s processors to glean insights into business data.


    The API, known as olap4j (http://www.olap4j.org/), is the OLAP equivalent of Java database connectivity (JDBC) for relational data. Specifically, olap4j extends core classes of JDBC specifications 3 and 4 in order to bring OLAP data sources to the Java platform. Connections can be obtained by JDBC connection management facilities.


    The API uses statements, which are provided over connections. The queries, formulated using the MDX language, can be sent textually to the connection. The olap4j API also includes a type system capable of representing any server-specific MDX grammar as a business object model.


    The API makes heavy use of metadata, as most OLAP-related operations involve a user exploring the database and building queries interactively. Results of multi-dimensional queries are represented by a CellSet object. CellSet is to multi-dimensional data what ResultSet is to tabular data.

  • Getting Started
    The olap4j API is distributed as four modules, which are shown in Table 1. For the purpose of this article, I will use both the core module and the XML/A driver. You can find those libraries at the olap4j website(http://www.olap4j.org/) and in the Resources section at the end of this article.

  • Connecting to an OLAP Server
    Connecting to an OLAP server is done in the same way you connect to a relational database. As I mentioned earlier, olap4j leverages JDBC driver management and connection facilities. A sample is shown in Listing One.
  • Listing One: Connecting to a remote OLAP server.

          Class.forName("org.olap4j.driver.xmla.XmlaOlap4jDriver");
          final Connection conn = DriverManager.getConnection ("jdbc:xmla:Server=http://example.com/xmla/msmdpump.dll;"
+ "Cache=org.olap4j.driver.xmla.cache.XmlaOlap4jNamedMemoryCache;"
+ "Cache.Mode=LFU;Cache.Timeout=600;Cache.Size=100");


          final OlapConnection oConn =conn.unwrap(OlapConnection.class);

 

 

      The code in Listing One will give you a connection object to a remote OLAP server using the XML/A driver. We use JDBC 4’s connection unwrapping to obtain access to the underlying OLAP connection. Applications developed for Java 5 can cast the connection object directly. Take note that the code in Listing One uses the XML/A driver’s caching SPI. A few basic implementations of the cache are provided with olap4j, but integrators are encouraged to develop their own. In this case, the SOAP requests will be cached in memory and use a Least-Frequently-Used eviction policy, enforced for a maximum of 100 elements over 10 minutes.


    More configuration options exist for both the XML/A driver or the cache SPI within the API documentation.

 

  • Exploring an OLAP Server
    The metadata hierarchy of an OLAP server is different from what you might be accustomed to with relational databases. The information is organized according to the hierarchy illustrated in Figure 1.
    The olap4j API exposes the metadata in two ways. It can be explored either through DatabaseMetaData, in the form of tabular data, or by exploring a hierarchy of business objects. Listing Two shows how this
    is done programmatically.

          Listing Two: Exploring olap4j metadata.
          final OlapDatabaseMetaData meta =
          oConn.getMetaData();
          // We can explore the meta data in a tabular form
          final ResultSet catalog = meta.getCatalogs();
          // We can also explore using the object model
          final Schema schema = oConn.getOlapSchema();
          for (Cube cube : schema.getCubes()) {
                  System.out.println(cube.getName());
          }

 

Figure 1: Layout of the olap4j metadata hierarchy.

 

 

    By convention, OLAP connections are bound to a database, a catalog, and a schema. Although a user can modify the binding easily by simply invoking setters or passing JDBC arguments in the connection’s URL, connection implementations are expected to bind automatically to the first elements of the hierarchy encountered, should none be specified.

    Users of olap4j should keep this in mind when using an OLAP connection’s sugar accessors, such as getOlapCatalog(). Retrieving a list of Catalogs implies a prior binding to a Database. If you have not set a specific database to use, the connection will bind itself to the first one it encounters, then return all catalogs within this database.

 

  • Parsing and Validating Queries
    We are now ready to query the server for data. Let’s write a query and parse it (Listing Three). The parser will check it for correct grammar.

          Listing Three: Querying the server.
          final String myQuery ="SELECT " +"{[Drink].[Beverages].Children} " + "ON COLUMNS " +"FROM [Sales] " + "WHERE ([Time].[Cake Time!])";
          final MdxParser parser = oConn.getParserFactory().createMdxParser(oConn);
          final SelectNode parsedObject =parser.parseSelect(myQuery);
          System.out.print(parsedObject.toString());


    A SelectNode object is a representation of your query, as interpreted by your connection’s specific grammar. The API does not endorse any singular MDX grammar (so as to keep the project independent of OLAP vendors). Instead, the connection itself has to expose a parser that is capable of bridging the grammar with olap4j’s type system.


    Once the textual query is parsed, you are left with an object representation of it, but there is no telling if this query, aside from being grammatically correct, is valid. To determine if the query is executable and will return data, the query validator (Listing Four) is used.

 

          Listing Four: The results of a query validation.
          oConn.getParserFactory().createMdxValidator(oConn).validateSelect(parsedObject);
          ->: Mondrian Error:MDX object '[Time].[Cake time!]' not found in cube 'Sales'


    The validator is a helper object used to validate queries further. It looks up all members and hierarchy elements referenced in your query and makes sure that they exist on the target server. In our example, indeed, there is no such thing as cake time [alas! —Ed.].


    Validating the query might seem like overhead, as executing the query straight away will indeed tell you whether the query is valid or not. But when a query is validated, only the metadata is evaluated against the server, and no data is computed or returned. This means that the server won’t have to compute all the tuples of members and
associated data. Validating a query is a very fast operation compared with executing it. If you are developing a GUI where a user sits in front of the screen until an answer is given, you should keep that in mind.

 

  • Executing Queries
    Executing a query against an OLAP server is similar to what you would do for a relational database. Queries can consist of a query parsed into a SelectNode object, as in Listing Five(a) or executed using a String of
    MDX, as in Listing Five(b).
    Listing Five(a): Executing a query with SelectNode.
    // We can execute the parsed SelectNode itself...
    CellSet data = oConn.createStatement().executeOlapQuery(parsedObject);

          Listing Five(b): Executing a query with MDX.
          // ...or use the MDX query directly.
          data = oConn.createStatement().executeOlapQuery(myQuery);
          Reading the Results of an OLAP Query Once the query is executed, you can explore the returned data. A
CellSet object is different than a ResultSet. You must think of CellSets as data organized on any number of axes. On each of those axes, there are a series of positions. Each position describes a set of tuples, formed by cross-joining all the members of the current axis.


          Listing Six: Iterating over the results of an olap4j query.
          // Iteration over a two-axis query
          for (Position axis_0: data.getAxes().get( Axis.ROWS.axisOrdinal() ).getPositions()){
              for (Position axis_1: data.getAxes().get(Axis.COLUMNS.axisOrdinal()).getPositions()){
                  Object value =data.getCell(axis_0, axis_1).getValue();
              }
          }
    The code in Listing Six demonstrates how to iterate over the resultsof a query that included only two axes; columns and rows. If your query uses more than two axes, you will need to iterate over each of them. In our simple two-dimensional use case, olap4j includes a very handy helper package to help display the results ofthe query:

 

    // We use the utility formatter.
    RectangularCellSetFormatter formatter = new RectangularCellSetFormatter(false);
    // Print out.
    PrintWriter writer = new PrintWriter(System.out);
    formatter.format(cellSet, writer);
    writer.flush();


    Going Further With the information presented so far, you should now be able to connect, execute queries against an OLAP server, and obtain meaningful results. The olap4j project can get you far beyond this point. The Resources
section contains links to materials that showcase the most advanced features, including how to build a user interface to explore OLAP data sources using a programmatic query model, statistical simulations, and real-time updates.

  • Resources
    olap4j.org: Home of the olap4j project: http://olap4j.org/
    olap4j.org/api: The latest API reference: http://olap4j.org/api
    olap4j download resources:. http://is.gd/3Z7kyf
原创粉丝点击