Developer's Guide to Parsing and Generating Flat Files with Flatworm [Repost]

Source: Internet  Editor: 程序博客网  Time: 2024/06/14 16:28

For Version 2.0

Last Revised December, 2009

 

Flat files. Much as we live in an XML/SOAP/Web Services world, there is still a ton of data moving between proprietary and legacy applications in the form of fixed-length fields in records delimited by EOLs. Around the time I wrote my 20th Java class whose only purpose in life was to suck up a flat file, use String.substring to break it into pieces, and then populate a bean with it, I decided there had to be a better way. This package is the fruit of that frustration.

What is Flatworm?

Flatworm is a free, open-source Java library that lets a developer describe the format of a flat file in an XML definition file, then automatically read lines from that file and have one or more beans instantiated for each logical record.

 

Flatworm has a few powerful features worth mentioning. A record may consist of one or more physical lines in the file. A record may yield more than one bean once decoded. A flat file may contain more than one type of record, and Flatworm can use line length and substring matching to determine which type of record a line begins.

 

Besides flat files with fixed-length fields, Flatworm also supports text files in which the fields are separated by a delimiter character, e.g. CSV (comma-separated values) files.

As of version 2.0, Flatworm also supports delimited files that contain repeating segments. These differ from standard flat files, which have a well-defined number of fields in each record. With repeating segments, the number of segments can vary from record to record, so different records in the same file can have different numbers of fields. Repeating segments are supported only when reading delimited files.

 

Last but not least, Flatworm can also produce flat files from beans, using the same definition file.

 

Requirements

In addition to the Flatworm jar file, you will also need the following jars on your classpath for Flatworm to work:

  • commons-beanutils (from Apache Commons)
  • commons-collections (from Apache Commons)
  • commons-logging (from Apache Commons)
  • commons-lang (from Apache Commons)
  • log4j (www.log4j.org) (optional)

Recent versions of all of these packages are available in the source jar file.

 

Downloading

The latest version of Flatworm is Release 2.0. You can download it from SourceForge.

 

A Simple Example

Before diving into the complexities of Flatworm, let's look at a simple example that illustrates the basic operation. Imagine the following input file, which contains new-hire data for a company:

NHJAMES          TURNER         M123-45-67890004224345

NHJOHN           JONES          M987-65-43210104356745

The layout of the file is as follows:

 

FIELD NAME    TYPE      LENGTH
recordtype    char      2
firstname     char      15
lastname      char      15
gender        char      1
ssn           char      11
salary        double    10 (2 decimal places)

 

We want to read this file into a Java bean called Employee that has the properties firstName, lastName, ssn, gender and salary, all accessible through the standard JavaBean getter and setter conventions.
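The guide does not show the Employee class itself, so here is a minimal sketch of what such a bean might look like. The field types are assumptions based on the layout table (String for the char fields, Double for the salary), and the toString format is only a guess at something that prints all properties on one line.

```java
// Hypothetical Employee bean; the guide does not show the real class.
// The property types and the toString layout are assumptions.
public class Employee {
    private String firstName;
    private String lastName;
    private String ssn;
    private String gender;
    private Double salary;

    // A public zero-argument constructor lets a library create the bean reflectively.
    public Employee() {
    }

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }

    public String getSsn() { return ssn; }
    public void setSsn(String ssn) { this.ssn = ssn; }

    public String getGender() { return gender; }
    public void setGender(String gender) { this.gender = gender; }

    public Double getSalary() { return salary; }
    public void setSalary(Double salary) { this.salary = salary; }

    @Override
    public String toString() {
        // Object.toString() yields "ClassName@hash"; the properties follow in brackets.
        return super.toString() + "[" + lastName + ", " + firstName + ", "
                + ssn + ", " + gender + ", " + salary + "]";
    }
}
```

The class is declared public because bean-introspection libraries generally only call accessors on public classes.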

 

To do this, we start by writing the Flatworm XML descriptor for the file:

 

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "http://www.blackbear.com/dtds/flatworm-data-description_1_0.dtd">
<file-format>
    <converter name="char" class="com.blackbear.flatworm.converters.CoreConverters" method="convertChar" return-type="java.lang.String"/>
    <converter name="decimal" class="com.blackbear.flatworm.converters.CoreConverters" method="convertDecimal" return-type="java.lang.Double"/>
    <record name="newhire">
        <record-ident>
            <field-ident field-start="0" field-length="2">
                <match-string>NH</match-string>
            </field-ident>
        </record-ident>
        <record-definition>
            <bean name="employee" class="Employee"/>
            <line>
                <record-element length="2"/>
                <record-element length="15" beanref="employee.firstName" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="15" beanref="employee.lastName" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="1" beanref="employee.gender" type="char"/>
                <record-element length="11" beanref="employee.ssn" type="char">
                    <conversion-option name="strip-chars" value="non-numeric"/>
                </record-element>
                <record-element length="10" beanref="employee.salary" type="decimal">
                    <conversion-option name="decimal-places" value="2"/>
                    <conversion-option name="decimal-implied" value="true"/>
                    <conversion-option name="pad-character" value="0"/>
                    <conversion-option name="justify" value="right"/>
                </record-element>
            </line>
        </record-definition>
    </record>
</file-format>

The file-format tag is required and marks the beginning of the actual description. The first thing we must do is register converters for the data types used in the file. The provided class com.blackbear.flatworm.converters.CoreConverters contains a number of predefined converter methods:

  • convertChar - Simply returns the field specified, optionally stripping leading or trailing (or both) padding characters and removing unwanted characters.
  • convertDecimal - As above, but converts the value to a Double. The decimal point may be implied by position or explicit.
  • convertDate - Parses the date using the default format (MM-dd-yyyy) or a user-defined format.
  • convertInteger - Parses to an Integer.
  • convertLong - Parses to a Long.
  • convertBigDecimal - Parses to a BigDecimal.

A converter must always be registered before it can be used in record definitions. Next in the file, a record is defined. A file may contain several different types of records; the record-ident tag specifies which record definition is appropriate for a given line. There are two ways to identify a record: by a substring match on a specific section of the line, or by the overall length of the line. Later, you will see how multiple record types can be read from the same file; for the moment only one is defined, which matches on the characters NH (new hire) in the first two characters of the line (field-start 0, field-length 2). If no record-ident is defined, all records will match.
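To make the two identification strategies concrete, here is a small sketch of the checks they amount to. It is an illustration only, not Flatworm's implementation: the class RecordIdentSketch, its method names, and the min/max bounds used for the length check are all assumptions made for this example.

```java
// Illustrative only: mimics the two record-identification strategies described above.
// RecordIdentSketch and its methods are hypothetical, not part of Flatworm.
final class RecordIdentSketch {

    private RecordIdentSketch() {
    }

    /** Substring match: true if the line holds `match` at [start, start + length). */
    static boolean matchesField(String line, int start, int length, String match) {
        if (line.length() < start + length) {
            return false; // line too short to contain the identifying field
        }
        return line.substring(start, start + length).equals(match);
    }

    /** Length match: true if the line length lies within [min, max], both inclusive. */
    static boolean matchesLength(String line, int min, int max) {
        return line.length() >= min && line.length() <= max;
    }
}
```

With the sample data, matchesField(line, 0, 2, "NH") is the check that would select the newhire record definition for both input lines.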

 

Once we are sure that we are dealing with the correct record type, we can define the record. We start by defining the beans that will be returned. Each bean has a name, which is used to reference it inside the definition, and a fully qualified class from which objects are created. The class must have a valid zero-argument constructor.
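One way to see why the zero-argument constructor matters: a library that only knows a class name usually has to create the object reflectively. The sketch below shows that general technique using standard Java reflection; it is an assumption about the approach, not a description of Flatworm's internals, and BeanFactorySketch and newBean are hypothetical names.

```java
// Illustrative only: creating a bean from its class name with standard reflection.
// BeanFactorySketch is hypothetical and not part of Flatworm.
final class BeanFactorySketch {

    private BeanFactorySketch() {
    }

    /**
     * Instantiates the named class through its zero-argument constructor.
     * Throws NoSuchMethodException (a ReflectiveOperationException) if the
     * class does not declare such a constructor.
     */
    static Object newBean(String className) throws ReflectiveOperationException {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }
}
```

A class that only offers constructors with arguments cannot be created this way, which is why the bean class in the descriptor needs one without arguments.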

 

Finally, the record is broken down line by line (since a record is allowed to span multiple lines).

Record-elements (fields) may be defined in terms of:

  • a length alone, in which case the field spans from the end of the previous field to that position plus the specified length
  • a start position and a length, in which case the field spans from the start position to that position plus the length
  • a start and end position, in which case the field spans from the start to the end position (not inclusive of the end)
  • an end position alone, in which case the field spans from the previous end position to the specified end position (not inclusive of the end)
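The four forms above can be sketched as one small resolver that turns whichever attributes are present into a half-open span [start, end). This is an illustration of the rules as described, not Flatworm's code; FieldSpanSketch and resolve are hypothetical names, and passing null for an absent attribute is a convention chosen for this example.

```java
// Illustrative only: resolves a field definition to a {start, end} pair, end exclusive.
// FieldSpanSketch is hypothetical and not part of Flatworm.
final class FieldSpanSketch {

    private FieldSpanSketch() {
    }

    /**
     * @param previousEnd end position of the previous field (0 for the first field)
     * @param start       explicit start position, or null if not given
     * @param end         explicit end position (exclusive), or null if not given
     * @param length      explicit length, or null if not given
     */
    static int[] resolve(int previousEnd, Integer start, Integer end, Integer length) {
        // Without an explicit start, the field begins where the previous one ended.
        int from = (start != null) ? start : previousEnd;
        int to;
        if (end != null) {
            to = end;            // "start and end" or "end alone"
        } else if (length != null) {
            to = from + length;  // "length alone" or "start and length"
        } else {
            throw new IllegalArgumentException("a field needs a length or an end position");
        }
        return new int[] { from, to };
    }
}
```

Applied to the new-hire layout with length-only fields of 2, 15, 15, 1, 11 and 10 characters, the spans come out as [0, 2), [2, 17), [17, 32), [32, 33), [33, 44) and [44, 54).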

Each record element also defines the beanref (following the property-naming convention of the Apache Commons BeanUtils package) and the type (which should match one of the converter names registered at the top of the file). Record elements may also have conversion-options, which are specific to the converter used. In the example above, the lastName field has any trailing spaces removed, the social security number is stripped of all non-numeric characters, and the salary has two implied decimal places and may be left-padded with zeros, which are removed.
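As a worked example of what these options mean for the sample data, the sketch below applies the three transformations by hand. It is plain Java written for illustration, not the CoreConverters code; ConversionSketch and its method names are hypothetical.

```java
import java.math.BigDecimal;

// Illustrative only: hand-written equivalents of the conversion options described above.
// ConversionSketch is hypothetical and not part of Flatworm.
final class ConversionSketch {

    private ConversionSketch() {
    }

    /** Left-justified field: drop trailing padding characters. */
    static String stripTrailing(String field, char pad) {
        int end = field.length();
        while (end > 0 && field.charAt(end - 1) == pad) {
            end--;
        }
        return field.substring(0, end);
    }

    /** Keep only the digits 0-9, e.g. to drop the dashes from an SSN. */
    static String stripNonNumeric(String field) {
        return field.replaceAll("[^0-9]", "");
    }

    /** Implied decimal: shift the decimal point `places` digits to the left. */
    static double impliedDecimal(String digits, int places) {
        // BigDecimal accepts leading zeros, so zero padding needs no extra handling.
        return new BigDecimal(digits).movePointLeft(places).doubleValue();
    }
}
```

For the first input line, the lastName field "TURNER" followed by nine spaces becomes "TURNER", the ssn "123-45-6789" becomes "123456789", and the salary field "0004224345" with two implied decimal places becomes 42243.45.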

 

Now we're ready to fire it all up. Here's a simple Java class that parses the input file and prints out the beans produced:

import java.io.*;

import com.blackbear.flatworm.ConfigurationReader;
import com.blackbear.flatworm.FileFormat;
import com.blackbear.flatworm.MatchedRecord;
import com.blackbear.flatworm.errors.*;

public class SimpleFlatwormExample {

    public static void main(String[] args) {
        ConfigurationReader parser = new ConfigurationReader();
        try {
            // args[0]: path to the Flatworm XML descriptor
            FileFormat ff = parser.loadConfigurationFile(args[0]);
            // args[1]: path to the flat file to parse
            InputStream in = new FileInputStream(args[1]);
            BufferedReader bufIn = new BufferedReader(new InputStreamReader(in));
            MatchedRecord results;
            // getNextRecord returns null once the input is exhausted
            while ((results = ff.getNextRecord(bufIn)) != null) {
                // Only one record type is defined here, but a file may hold several
                if (results.getRecordName().equals("newhire")) {
                    System.out.println(results.getBean("employee"));
                }
            }
        } catch (FlatwormUnsetFieldValueException e) {
            e.printStackTrace();
        } catch (FlatwormConfigurationValueException e) {
            e.printStackTrace();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (FlatwormInvalidRecordException e) {
            e.printStackTrace();
        } catch (FlatwormInputLineLengthException e) {
            e.printStackTrace();
        } catch (FlatwormConversionException e) {
            e.printStackTrace();
        }
    }
}

The location of the configuration file is passed in as the first argument to the program, and the file to be parsed as the second.

A ConfigurationReader object is created, and its loadConfigurationFile method is called with the path to the configuration file as the argument; it returns a FileFormat. After the input file is opened and wrapped in a BufferedReader, the BufferedReader is passed to the getNextRecord method of the FileFormat. getNextRecord returns either null, if the input file has been exhausted, or a MatchedRecord object. The getRecordName method tells us which type of record was returned (remember that a file can have several types of records), and the getBean method gives access to specific beans.

 

When we run this test program, the results are as expected:

C:/j2sdk1.4.2_04\bin\java SimpleFlatwormExample simple-example.xml import1.txt

Employee@120a47e[TURNER, JAMES, 123456789, M, 42243.45]

Employee@f73c1[JONES, JOHN, 987654321, M, 1043567.45]

Process terminated with exit code 0
