Validation with Java and XML Schema


Part 1
Learn the value of data validation and why pure Java isn't the complete solution for handling it

By Brett McLaughlin, JavaWorld.com, 09/08/00
As technologies have matured and APIs for Java and other languages have taken more of the burden of low-level coding off your hands (JMS, EJB, and XML are just a few recent examples), business logic has become more important to application coding. With this increase in business logic comes an increase in the specification of data allowed.


Read the whole "Validation with Java and XML Schema" series:
Part 1. Learn the value of data validation and why pure Java isn't the complete solution for handling it
Part 2. Use XML Schema for constraining Java data
Part 3. Parsing XML Schema to validate data
Part 4. Build Java representations of schema constraints and apply them to Java data

 

For example, applications no longer just accept orders for shoes; they ensure that the shoe is of a valid size, in stock, and accurately priced. The business rules that must be applied even for a simple shoe store are extremely complex. The user input and the input combination must be validated; those data often result in computed data, which may have to be validated before it is passed on to another application component. With that added complexity, you spend more time writing validation methods. You ensure that a value is a number, a decimal, a dollar amount, that it's not negative, and on, and on, and on.

With servlets and JSP pages sending all submitted parameters as textual values (an array of Java Strings, to be exact), your application must convert to a different data type at every step of user input. That converted data is most likely passed to session beans. The beans can ensure type safety (requiring an int, for example), but not the value range. So validation must occur again. Finally, business logic may need to be applied. (Does Doc Marten make this boot in a size 10?) Only then can computation safely be performed, and results supplied to the user. If you're starting to feel overwhelmed, good! You are starting to see the importance of validation, and why this series might be right for you.
Coarse-grained vs. fine-grained validation

The first step in making your way through the "validation maze" is breaking the validation process into two distinct parts: coarse-grained validation and fine-grained validation. I'll look at both.

Coarse-grained validation is the process of ensuring that data meet the typing criteria for further action. Here, "typing criteria" means basic data constraints such as data type, range, and allowed values. These constraints are independent of other data, and do not require access to business logic. An example of coarse-grained validation is making sure that shoe sizes are positive numbers, smaller than 20, and either whole numbers or half sizes.
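That shoe-size rule can be sketched as a small, self-contained check. This is an illustrative helper, not code from the series; the class and method names are hypothetical:

```java
// Coarse-grained validation of a shoe size: must parse as a number, fall
// between 0 (exclusive) and 20 (inclusive), and be a whole or half size.
public class ShoeSizeCheck {
    public static boolean isValidShoeSize(String raw) {
        double size;
        try {
            size = Double.parseDouble(raw);
        } catch (NumberFormatException e) {
            return false;                       // not a number at all
        }
        if (size <= 0 || size > 20) {
            return false;                       // out of range
        }
        // Whole or half sizes only: doubling the size must give a whole number
        return (size * 2) == Math.floor(size * 2);
    }
}
```

Note that no business data is consulted; everything the check needs is in the rule itself, which is what makes it coarse-grained.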

Fine-grained validation is the process of applying business logic to values. It typically occurs after coarse-grained validation, and is the final step of preparation, before one either returns results to the user or passes derived values to other application components. An example of fine-grained validation is ensuring that the requested size (already in the correct format because of coarse-grained validation) is valid for the requested brand. V-Form inline skates are only available in whole sizes, so a request for a size 10 1/2 should cause an error. Because that requires interaction with some form of data store and business logic, it is fine-grained validation. 


The fine-grained validation process is always application-specific and is not a reusable component, so it is beyond the scope of this series. However, coarse-grained validation can be utilized in all applications, and involves applying simple rules (data typing, range checking, and so on) to values. In this series, I will examine coarse-grained validation and supply a Java/XML-based solution for handling it.
Data: Ever present, ever problematic

If you're still not convinced of the need for this sort of utility, consider the fact that data has become the commodity in today's global marketplace. It is not applications, not technology, not even people that drive business -- it is raw data. The tasks of selecting a programming language, picking an application server, and building an application are all byproducts of the need to support data. Thus, those decisions may all later be revisited and changed. (Ever had to migrate from SAP or dBase to Oracle? Ever switched from NetDynamics to Lutris Enhydra?)

However, the fundamental commodity, data, never changes. Platforms change, software changes, but you never hear anyone say, "Well, let's just trash all that old customer data and start fresh." So the problem of constraining data is a fundamental one. It will always be part of any application, in any language. And data is always problematic because of problematic users. People type too fast, type too slow, make a silly mistake, or spill coffee on their keyboards -- the bottom line is that validation is essential to preserving accurate data, and therefore is essential to a good application. With that in mind, I'll show you how people are solving that common problem today.
Current solutions (and problems)

Since data validation is so important, you'd probably expect there to be plenty of solutions for the problem. In reality, most solutions for handling validation are clumsy and not at all reusable, and result in a lot of code applicable only in specific situations. Additionally, that code often gets intertwined with business logic and presentation logic, causing trouble with debugging and troubleshooting. Of course, the most common solution for data validation is to ignore it, which simply surfaces problems to the user as raw exceptions. Obviously, none of those are good solutions, but understanding the problems they don't solve can help establish requirements for the solution built here.
A big hammer

The most common way to handle data validation (besides ignoring it) is also the most heavy-handed. It involves simply coding the validation directly into the servlet, class, or EJB that deals with the data. In this example, validation is performed as soon as a parameter is obtained from a servlet:

 


Inline validation in a servlet


                       
import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
public class ShoeServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse res)
        throws ServletException, IOException {
        // Get the shoe size
        int shoeSize;
        try {
            shoeSize = Integer.parseInt(req.getParameter("shoeSize"));
        } catch (NumberFormatException e) {
            throw new IOException("Shoe size must be a number.");
        }
        // Ensure viable shoe size
        if ((shoeSize <= 0) || (shoeSize > 20)) {
            throw new IOException("Invalid shoe size.");
        }
        // Get the brand
        String brand = req.getParameter("brand");
        // Ensure correct brand
        if (!validBrand(brand)) {
            throw new IOException("Invalid shoe brand.");
        }
        // Ensure correct size and brand
        if (!validSizeForBrand(shoeSize, brand)) {
            throw new IOException("Size not available in this brand.");
        }       
        // Perform further processing
    }
}

                    

 

This code is neither cleanly separated nor reusable. The specific parameter, shoeSize, was presumably obtained from a submitted HTML form. The parameter is converted to a numeric value (hopefully!), then compared to the maximum and minimum acceptable values. This example doesn't even check for half sizes. In an average case where four or more parameters are received, the servlet's validation portion alone could result in more than 100 lines of code. Now imagine increasing that to 10 or 15 servlets. This approach results in a massive amount of code, often difficult to understand and poorly documented.

In addition to the code's lack of clarity, the business logic often mixes with the validation, making code modularization very difficult. In the following example, a session bean must not only perform its business task, but also ensure that the data are correctly formatted:

Inline validation in a session bean


                       
import java.rmi.RemoteException;
public class ShoeBean implements javax.ejb.SessionBean {
    public Shoe getShoe(int shoeSize, String brand) throws RemoteException {
        // Ensure viable shoe size
        if ((shoeSize <= 0) || (shoeSize > 20)) {
            throw new RemoteException("Invalid shoe size.");
        }
        // Ensure correct brand
        if (!validBrand(brand)) {
            throw new RemoteException("Invalid shoe brand.");
        }
        // Ensure correct size and brand
        if (!validSizeForBrand(shoeSize, brand)) {
            throw new RemoteException("Size not available in this brand.");
        }
        // Perform business logic and return the resulting Shoe
        return null;  // placeholder
    }
}

                    

 

An obvious problem here is that the only way to inform the calling component of a problem is by throwing an Exception, usually a java.rmi.RemoteException in EJBs. That makes fielding the exception and responding to the user difficult, at best. Of course, each business component that uses the shoeSize variable must perform the same validation, which could be wedged between different blocks of business logic.

This sort of "big hammer" solution doesn't help you in reusability, code clarity, or even reporting problems to the user. This solution, the most common method for handling data validation issues, should be used only as an example of what not to do in your next project.
A smaller hammer

Over time, some developers have seen the "big hammer" approach's problems. As servlets' popularity has increased, handling textual parameters has been recognized as a problem worth solving. As a result, utility classes that parse parameters and convert them to a specific data type have been developed. The most popular solution is Jason Hunter's com.oreilly.servlet.ParameterParser class, introduced in his O'Reilly book, Java Servlet Programming. (See Resources.) Hunter's class allows a textual value to be supplied, formatted into a specific data type, and returned. A portion of that class is shown here:

The com.oreilly.servlet.ParameterParser class


                       
package com.oreilly.servlet;
import java.io.*;
import javax.servlet.*;
public class ParameterParser {
    private ServletRequest req;
    public ParameterParser(ServletRequest req) {
        this.req = req;
    }
    public String getStringParameter(String name)
        throws ParameterNotFoundException {
        // Use getParameterValues() to avoid the once-deprecated getParameter()
        String[] values = req.getParameterValues(name);
        if (values == null)
            throw new ParameterNotFoundException(name + " not found");
        else if (values[0].length() == 0)
            throw new ParameterNotFoundException(name + " was empty");
        else
            return values[0];  // ignore multiple field values
    }
    public String getStringParameter(String name, String def) {
        try { return getStringParameter(name); }
        catch (Exception e) { return def; }
    }
    public int getIntParameter(String name)
        throws ParameterNotFoundException, NumberFormatException {
        return Integer.parseInt(getStringParameter(name));
    }
    public int getIntParameter(String name, int def) {
        try { return getIntParameter(name); }
        catch (Exception e) { return def; }
    }
    // Methods for other Java primitives
}

                    

 

Two versions of the utility method are provided for each Java primitive data type. One returns the converted value or throws an exception if conversion fails, and another returns the converted value or returns a default if no conversion can occur. Using the ParameterParser class in a servlet significantly reduces the problems described above:

Using the com.oreilly.servlet.ParameterParser class in a servlet


                       
import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import com.oreilly.servlet.ParameterParser;
public class ShoeServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse res)
        throws ServletException, IOException {
        ParameterParser parser = new ParameterParser(req);
        // Get the shoe size
        int shoeSize = parser.getIntParameter("shoeSize", 0);
        // Ensure viable shoe size
        if ((shoeSize <= 0) || (shoeSize > 20)) {
            throw new IOException("Invalid shoe size.");
        }
        // Get the brand
        String brand = parser.getStringParameter("brand");
        // Ensure correct brand
        if (!validBrand(brand)) {
            throw new IOException("Invalid shoe brand.");
        }
        // Ensure correct size and brand
        if (!validSizeForBrand(shoeSize, brand)) {
            throw new IOException("Size not available in this brand.");
        }       
        // Perform further processing
    }
}

                    

 

This is a better solution, but still clumsy; you can obtain the appropriate data type, but range checking is still a manual process. It also provides no way to restrict input to a fixed set of permitted values (such as allowing only "true" or "false" rather than any textual value). Trying to implement that sort of logic in the ParameterParser class results in a clumsy API, with at least four different variations for each data type.

This approach also requires the acceptable values to be hard-coded into the servlet or Java class. A maximum shoe size of 20 is in the compiled code, rather than an easily changed flat file (such as a properties file or XML document). A change to that value should be trivial, but requires a code change and subsequent recompilation. This approach is a step in the right direction (kudos to Hunter for providing the utility class), but not an answer for data validation.
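As a sketch of that point, the range could be read from a properties file at startup, turning a change in the maximum size into a text-file edit rather than a recompile. The key names and helper class here are illustrative, not from any library:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Loads the shoe-size range from properties text instead of compiled code.
public class RangeConfig {
    public static int[] loadShoeSizeRange(String props) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(props));   // in practice, a FileInputStream
        } catch (IOException e) {
            throw new IllegalArgumentException("Unreadable properties", e);
        }
        int min = Integer.parseInt(p.getProperty("shoeSizeMin", "0"));
        int max = Integer.parseInt(p.getProperty("shoeSizeMax", "20"));
        return new int[] { min, max };
    }
}
```

Raising the maximum size now means editing one line of the file; no code changes, no recompilation.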
Where's the toolbox?

The common problem with validation is that, in its current form, it is not reusable or compartmentalized. The ParameterParser class is reusable, but still requires hard-coded values and range checking. A solution that allows session beans to simply perform business logic, assuming appropriate values are supplied, does not exist. Also, there is no easy way to add functionality to the shown solutions without affecting the code -- not only the utility class itself, but the calling code too.

Additionally, these solutions are incompatible with other applications and languages. Data that do not come in a specific format (in the examples, Java Strings) cannot be plugged into the validation code. In other words, these solutions simply don't cut it for today's applications' more complex needs.
Pure Java: Not cutting it

Trying to create a solution with pure Java is a big part of the problem. Without using some sort of noncompiled format for ranges, data types, and allowed values, changes to validation rules will always result in recompilation. There are better ways to store this information; as I mentioned earlier, Java property files and XML are two formats that might help create a solution.
Property files

Java property files have been used in attempts to solve the validation problem. However, that methodology has significant flaws. First, while a key in a standard Java property file may contain periods (key1.key2.key3 = value), the java.util.Properties API treats the whole string as a single flat key. That level of nesting, while handy, is impossible without writing custom property file handling code. So a simple properties file that should look like this:

Non-standard properties file


                       
field.shoeSize.minSize = 0
field.shoeSize.maxSize = 20
field.brand.allowedValue = Nike
field.brand.allowedValue = Adidas
field.brand.allowedValue = Dr. Marten
field.brand.allowedValue = V-Form
field.brand.allowedValue = Mission

                    

 

ends up looking more like this code with a pure Java solution:

Java property files specifying validation constraints


                       
shoeSizeMin = 0
shoeSizeMax = 20

                    

 

While the key for shoe-size range becomes less clear, there is simply no way to represent the allowed values for a brand -- Java property files cannot have the same key multiple times with different values.
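A quick demonstration of that limitation: when java.util.Properties loads text containing a repeated key, each later value silently overwrites the earlier one, so only one brand survives. The helper class here is purely illustrative:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Shows that a repeated key in a properties file keeps only its last value.
public class DuplicateKeyDemo {
    public static String lastBrand() {
        String text =
            "field.brand.allowedValue = Nike\n" +
            "field.brand.allowedValue = Adidas\n" +
            "field.brand.allowedValue = Mission\n";
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        // Nike and Adidas have been overwritten; only Mission remains
        return p.getProperty("field.brand.allowedValue");
    }
}
```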

 


Some utility packages allow more advanced property file reading. (See the Java Apache Project for an example.) However, using property files for these constraints poses a more fundamental problem: mixing basic functionality. Property files are generally used for startup parameters, configuration information, and binding names to a JNDI namespace. Mixing validation logic with those other data causes confusion, both for users and for programmers who maintain the code.

Imagine looking for the minimum shoe size allowed among properties detailing what port a Web service should start on, the recommended size of the Java heap, and on what hostname the LDAP directory server can be found. An isolated component should be used instead, just for handling validation information.
XML to the rescue?

I have examined several possible solutions for handling validation, none of which seem perfect. I propose a different approach that uses XML (and XML Schema) in concert with Java. My solution will be detailed fully in the next two articles of the series, but I'll introduce it now.

First, you can use an XML document to represent the constraints on your data. This will allow these constraints to be changed without code recompilation, simply by changing the values in the XML document. Separating constraints from other application data will also be possible. Finally, using XML and XML Schema will allow you to use a simple parser and API (which is itself a standard) to manipulate the data. No proprietary extensions or APIs are needed to handle the XML data, so the resulting code will be portable.

Here is an XML Schema that captures the constraints discussed above:

Validation constraints using XML Schema


                       
<?xml version="1.0"?>
<schema targetNamespace="http://www.buyShoes.com"
        xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:buyShoes="http://www.buyShoes.com"
>
  <attribute name="shoeSize">
    <simpleType baseType="integer">
      <minExclusive value="0" />
      <maxInclusive value="20" />
    </simpleType>
  </attribute>
  <attribute name="brand">
    <simpleType baseType="string">
      <enumeration value="Nike" />
      <enumeration value="Adidas" />
      <enumeration value="Dr. Marten" />
      <enumeration value="V-Form" />
      <enumeration value="Mission" />
    </simpleType>
  </attribute>
</schema>

                    

 

All of those constraints are defined as attributes in the XML Schema, and you can express them fully and simply. If that XML Schema could perform coarse-grained validation, applications could discard the validation code described in this article and focus on business logic. I will examine that solution later in this series.
Summary

Now that I've detailed the various problems presented by pure Java solutions for validation, you might be feeling a bit down on the language. Have no fear, though. In upcoming articles, I'll let Java help solve those problems. First, though, I'll examine XML Schema more closely and look at the richer set of constraints it allows you to set on data. In fact, XML Schema will be to data what Java interfaces are to code: it can provide a data interface for user input.

For my next article, I will prepare some XML documents that allow handling of user input. I'll then use JDOM, a Java API for manipulating XML (and XML Schema), to code utility classes around the constraints mentioned here. As a result, you'll have a good start on your reusable components for validation, using Java and XML together. In the meantime, I hope you'll think about the problems I've identified, find an application on which you can try out next month's code, and introduce yourself to JDOM (see Resources), as I'll use it heavily. See you next month!
Author Bio
Brett McLaughlin is an Enhydra strategist at Lutris Technologies and specializes in distributed systems architecture. He is the author of Java and XML, and is involved in technologies such as Java servlets, Enterprise JavaBeans technology, XML, and business-to-business applications. With Jason Hunter, he recently founded the JDOM project, which provides a simple API for manipulating XML from Java applications. McLaughlin is also an active developer on the Apache Cocoon project and the EJBoss EJB server, and a cofounder of the Apache Turbine project.

Part 2
Using XML Schema for constraining Java data

By Brett McLaughlin, JavaWorld.com, 10/13/00
With the wealth of application development in Java today, there seems to be an API for almost everything: remote method invocation (RMI), reusable business components (EJB), manipulating XML (SAX, DOM, JDOM, JAXP), and user interfaces (Swing) as well as writing a help system (JavaHelp). Yet programmers still spend hours and even days on each project, working out validation routines. Mind you, those aren't complex business formulas but ensuring that a value is of the correct data type when submitted via an HTML form or checking the range of a shoe size. Somehow, with all the recent focus on enterprise applications, some of a programmer's core tasks have been overlooked.


Read the whole "Validation with Java and XML Schema" series:
Part 1. Learn the value of data validation and why pure Java isn't the complete solution for handling it
Part 2. Use XML Schema for constraining Java data
Part 3. Parsing XML Schema to validate data
Part 4. Build Java representations of schema constraints and apply them to Java data

 

In an effort to resolve that problem, at least until the powers that be come up with a robust API for validation, this series takes a detailed look at validation in Java. It isn't about using JavaScript in your HTML or buying expensive third-party libraries, but about creating a simple validation framework based on existing standards. The focus is on ease of use and a simple means of adding new validation rules to the data constraints without cluttering business and presentation logic with validation details.
The story so far

To get started, you should take the time to read Part 1 in the series. In that article, I looked at several existing options for validation, particularly pure Java options. Both inline validation (such as directly in a servlet or Enterprise JavaBean) as well as helper classes (such as Jason Hunter's ParameterParser class) often still resulted in code that was cluttered and that mixed validation with business and application logic. Additionally, you were left to deal with numerous try/catch blocks and throwing exceptions. It also left the unwanted problem of having to constantly recompile, even for the most minor changes in data constraints (such as changing an allowed range from between 0 and 20 to between 1 and 20).

I also discussed Java property files as a way to handle that problem. First, a small clarification: while Java does allow property files to have multiple period-separated keys (key1.key2.key3 = value), it does not allow their use in any meaningful way. For example:


                       
ldap.hostname = galadriel.middleearth.com
ldap.port = 389
ldap.userDN = cn=Directory Manager
ldap.password = foobar

                    

 

It would seem that the sample entries in that Java properties file represent a logical grouping; all the entries start with the ldap key. However, that is not the case with the standard Java APIs. That entry set is functionally equivalent to:


                       
hostname = galadriel.middleearth.com
port = 389
userDN = cn=Directory Manager
password = foobar

                    

 

In other words, there is no means to get, for example, all the keys with an ldap root. That makes multiple-key values useful for human readability only and essentially a waste of time in actual application programming without custom or third-party libraries. So Java property files, too, are not suitable for large validation rules.
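To make the "human readability only" point concrete, here is a sketch of the manual work required to group keys by prefix; nothing in the standard Properties API does this for you, and the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Properties;

// Collects every property key that starts with a given "root" segment.
// This is exactly the custom code Properties forces you to write yourself.
public class PrefixFilter {
    public static List<String> keysWithPrefix(Properties p, String prefix) {
        List<String> matches = new ArrayList<String>();
        Enumeration<?> names = p.propertyNames();
        while (names.hasMoreElements()) {
            String name = (String) names.nextElement();
            if (name.startsWith(prefix + ".")) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```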



Finally, I briefly explained using XML to store validation constraints. XML Schema was addressed specifically, as it already has a mechanism to constrain an XML document that is type-safe and verbose. It allows range setting, specification of an enumeration of acceptable values, and a simple syntax. In this article, I'll delve deeper into using XML for constraining data in your Java applications, and you'll begin to write some code to put XML to work. First, I'll address your options within the XML realm.
Perusing the options

So now you know you want to use XML Schema for data constraints; let's start talking specifics. The biggest issue you need to address is that your constraints are in one language, XML, and your data is in another, Java. Some sort of conversion must take place. Should you convert all of your data to XML, and then validate that XML against your schema? Should your XML Schema somehow be converted to Java, and those objects used to validate your data? What's the right thing to do here? Is it some mixture of the two?
Java to XML

The first option, converting Java data to XML, is actually fairly simple. With the rise in popularity of XML Data Binding, there are several frameworks available that convert, or marshal, a Java object to an XML document. One of those, which I wrote (yes, it's a shameless plug!), is discussed in detail in "Objects, Objects Everywhere" (see Resources). A complete working package is provided for converting between Java and XML. The API is simple, lightweight, and intuitive -- all desirable qualities for your validation solution.

However, data binding is not as perfect as it may seem when you look a little deeper. First, your Java data may often come in four, five, or even more different pieces. Imagine a form that receives 15 input fields, all as separate Java objects (Strings, in this case). Each would have to be assembled into a single object, marshaled to XML, and then validated against the XML Schema. Your code, then, has to include logic for converting multiple objects into one object suitable for conversion to XML. So your once-simple solution is already getting convoluted.

Further, that option does not allow for any optimization. You can't store the XML Schema in memory in your JVM, and the only real advantage you might introduce is caching the actual XML Schema document (as a DOM or JDOM Document object, perhaps). In other words, there is no performance gain over multiple validation calls. While that might seem like icing on the cake, consider that validation, especially of form data, happens hundreds and even thousands of times per page, per day (or hour, or minute!). Caching, or some sort of performance gain, should really be expected over multiple invocations of the validation. Additionally, parsing XML is a costly operation, and even if the XML Schema document is cached, the marshalled Java object, resulting in an XML document, must be parsed at each validation call. Thus, the conversion from Java to XML doesn't seem to be such a good idea.
XML to Java

Since conversion from Java to XML doesn't seem to be a good idea, let's take a look at the flipside: converting XML to Java. In that case, your Java data would stay as is and would not need to be marshaled into XML. You would instead need to convert your XML Schema constraints into Java objects. Those objects can then take in data and return a result, indicating if the data was valid for the constraints that the object and the underlying XML Schema represented. That is a much more natural case for Java developers as well, as it allows them to stay in a Java environment.

Another advantage to that technique is that it effectively isolates XML Schema from the equation. Using schemas, then, becomes a decision tied only to the conversion from XML to Java, and not the use of the resultant Java objects at all. In other words, if the implementation of those validation classes was changed to convert an XML document (not a schema) to a validation object, the developer would still see the same interface for validation; no application code would need to change. Why is that a big deal? Well, there are two reasons. First, XML Schema is still being finalized, and minor changes may occur. Using that design ensures that you can code to the validation classes covered in that series and, even if XML Schema's specification changes and the implementation of the classes changes, your application code stays the same. Second, there is still some widespread concern over the acceptance of XML Schemas. If they did not satisfy your needs, or if they perhaps were overcomplicated for your application, you could switch to a simpler mechanism (such as simple XML documents or Relax) and still have the same code routines work.

Going back to some original concerns, that also means that your XML Schema document only has to be parsed a single time. The schema is converted to Java objects, representing constraints, and then stored in memory. Data can be validated against the objects over and over without any additional parsing ever occurring. That addresses some of the performance issues I discussed in the section on converting from Java to XML. That is even more critical when the XML Schema might be located across a network, requiring network transfer time for each parsing.
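The parse-once design described above might be sketched like this. SchemaValidator and Constraint are hypothetical names standing in for the classes built later in the series, and the "parsing" here is a stub that merely counts how often it runs:

```java
import java.util.HashMap;
import java.util.Map;

// Parses each constraint once and caches the result; repeated validation
// calls reuse the in-memory Constraint object instead of re-reading the schema.
public class SchemaValidator {
    private final Map<String, Constraint> constraints = new HashMap<String, Constraint>();
    private int parseCount = 0;

    public Constraint constraintFor(String name) {
        Constraint c = constraints.get(name);
        if (c == null) {
            c = parseConstraint(name);     // the expensive step: schema parsing
            constraints.put(name, c);
        }
        return c;
    }

    private Constraint parseConstraint(String name) {
        parseCount++;                      // stand-in for real XML Schema parsing
        return new Constraint(0, 20);      // stubbed shoe-size range
    }

    public int timesParsed() { return parseCount; }

    public static class Constraint {
        private final int min, max;
        Constraint(int min, int max) { this.min = min; this.max = max; }
        public boolean accepts(int value) { return value > min && value <= max; }
    }
}
```

However many times a form is submitted, the schema-parsing cost is paid only on the first lookup of each identifier.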

So it seems clear that conversion from XML to Java is the right way to go. Additionally, you want more than just a simple object in Java (such as one that might be produced by unmarshaling an XML Schema document to Java, as in data binding); it should take in a piece of data, and then return whether the data is valid for the constraints supplied in the XML Schema.
The game plan

With those basic design decisions in place, it's time to start outlining your classes and deciding what they will look like.

Elements and attributes

The first order of business is to decide what sort of XML Schema constructs you need to support. While it might seem logical to try to support everything in the XML Schema specification, that is both an enormous task (certainly more articles than this series can bear!) as well as counterproductive. For example, the minOccurs and maxOccurs attributes, a core part of XML Schema, have no meaning in the context of validation; a value either exists or does not, and cannot appear repeatedly in that context. So already you can see that some schema constructs are not needed for your validation code.

In fact, looking back at Part 1, XML Schema elements are completely unnecessary. Remember that elements in XML are usually meant to represent repeatable, complex data structures. Most often, they are mapped to Java objects (nonprimitive ones, mostly) rather than single data values. For example, examine this XML Schema:


                       
<?xml version="1.0"?>
<schema targetNamespace="http://www.enhydra.org"
        xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:enhydra="http://www.enhydra.org"
>
  <complexType name="ServiceConfiguration">
    <attribute name="name" type="string" />
    <attribute name="version" type="float" />
  </complexType>
  <element name="serviceConfiguration" type="ServiceConfiguration" />
</schema>

                    

 

That schema could easily be mapped to the Java object seen here:


                       
public class ServiceConfiguration {
    private String name;
    private float version;
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public float getVersion() {
        return version;
    }
    public void setVersion(float version) {
        this.version = version;
    }
}


Here, the attributes in the schema map to Java primitives (the types of data you will be validating), while the elements map to complex Java objects. What does that mean to you? Actually, quite a bit. You can essentially dispense with the support for elements in an XML Schema. Instead, focusing on the attribute type can make your job both simpler and more manageable.

In your schemas, then, you need to supply a name for the data for each data member you want to validate. That name acts as more of a data identifier than an instance variable name. That means that passing your validation framework a piece of data with the name supplied will result in the data being validated against that identifier's constraints. You can then specify the constraints on the type, such as the data type (int, String, and so forth) and allowed values.
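Once the framework exists, validating by identifier might look like this minimal, self-contained sketch. The class name is illustrative (it is not the framework's final API), and the rule wired to "shoeSize" is hard-coded here purely to show the lookup-by-name idea; the real framework will build such rules from the XML Schema:

```java
// Hypothetical sketch: data is validated by passing its identifier and its
// String value. The "shoeSize" rule mirrors the Part 1 constraint (an
// integer greater than 0 and at most 20). Names here are illustrative.
public class IdentifierSketch {

    // Look up the constraints tied to the identifier and apply them
    public static boolean isValid(String identifier, String data) {
        if ("shoeSize".equals(identifier)) {
            try {
                int size = Integer.parseInt(data);
                return (size > 0) && (size <= 20);
            } catch (NumberFormatException e) {
                return false;   // not even an integer
            }
        }
        return false;   // unknown identifier: no constraints registered
    }

    public static void main(String[] args) {
        System.out.println(isValid("shoeSize", "10"));   // valid size
        System.out.println(isValid("shoeSize", "42"));   // out of range
        System.out.println(isValid("shoeSize", "ten"));  // not an int
    }
}
```

Notice that the caller never converts the data itself; it hands over the raw String and a name, and the framework does the rest.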

With those decisions starting to firm up your game plan, let's look back at the XML Schema discussed in Part 1.

<?xml version="1.0"?>
<schema targetNamespace="http://www.buyShoes.com"
        xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:buyShoes="http://www.buyShoes.com"
>
  <attribute name="shoeSize">
    <simpleType baseType="integer">
      <minExclusive value="0" />
      <maxInclusive value="20" />
    </simpleType>
  </attribute>
  <attribute name="brand">
    <simpleType baseType="string">
      <enumeration value="Nike" />
      <enumeration value="Adidas" />
      <enumeration value="Dr. Marten" />
      <enumeration value="V-Form" />
      <enumeration value="Mission" />
    </simpleType>
  </attribute>
</schema>


Here, several things are happening. Two data identifiers are set up: "shoeSize" and "brand." The shoe size must be a Java int, greater than 0 and less than or equal to 20. The brand is a Java String, and several allowed values are specified. I'll discuss each of those constraints in a little more detail now.
Data types

Supporting data type validation is fairly easy. You'll start the framework with the basic Java primitives; in that respect, your validation code is similar to the ParameterParser class I covered in Part 1. A Java String is supplied to each method you will code, which accounts for the fact that almost all input, especially from an HTML form, arrives in that format. If that is new ground for you, ask your more experienced colleagues; data almost always comes in as a simple Java String. The result of the conversion should, of course, be an object of the correct data type:

    public int getIntParameter(String value)
        throws NumberFormatException {
        return Integer.parseInt(value);
    }


That is just a general code fragment; in the next section, you'll start to code actual validation classes. However, you should see that the data type conversion is pretty easy to ensure.
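The same pattern extends to the other primitives the DataConverter helper will eventually cover. This sketch is a guess at what those conversions look like; the method and class names are assumptions, not the final DataConverter API:

```java
// Sketch of primitive conversions of the kind the DataConverter helper
// will perform; each takes the raw String input and returns a typed value.
// Method names here are assumptions, not the final DataConverter API.
public class ConversionSketch {

    public static int getIntParameter(String value)
        throws NumberFormatException {
        return Integer.parseInt(value);
    }

    public static double getDoubleParameter(String value)
        throws NumberFormatException {
        return Double.parseDouble(value);
    }

    public static boolean getBooleanParameter(String value) {
        // "true" (in any case) becomes true; any other value becomes false
        return Boolean.valueOf(value).booleanValue();
    }
}
```

In each case, a failed conversion surfaces as an exception (or a default value), which is exactly the signal the validation code needs.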
Value constraints

The other major facet you want to support is value constraints. Value constraints are items that restrict the allowed values once data type has been established, such as the shoe size restriction that ensures it is between 1 and 20 (inclusive). Those constraint types are generally detailed by two XML Schema constructs: the enumeration, which specifies allowed character values, and the minXXX and maxXXX keywords, which specify minimum and maximum allowed values (both inclusively and exclusively).

By supporting those two keyword sets, both nested within the simpleType schema feature, you can allow representation of almost all basic data constraints. You should also notice that you've narrowed down the keywords you have to support even further. In a later article, I'll touch on the pattern keyword, allowing pattern matching (like regular expressions) within character values. That will add yet another constraint tool to your growing arsenal.
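As a quick illustration of how those two constraint families behave at validation time, consider this self-contained sketch; the class and method names are illustrative, not part of the framework:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the two value-constraint checks: a range test built from the
// minExclusive/maxInclusive pair used for shoeSize, and an enumeration
// test like the one used for brand. Names here are illustrative.
public class ValueConstraintSketch {

    // minExclusive/maxInclusive: value must be > min and <= max
    public static boolean inRange(int value, int minExclusive, int maxInclusive) {
        return (value > minExclusive) && (value <= maxInclusive);
    }

    // enumeration: value must match one of the allowed character values
    public static boolean inEnumeration(String value, List allowed) {
        return allowed.contains(value);
    }

    public static void main(String[] args) {
        List brands = Arrays.asList(new String[] {"Nike", "Adidas", "Dr. Marten"});
        System.out.println(inRange(10, 0, 20));            // a legal shoe size
        System.out.println(inRange(0, 0, 20));             // 0 is excluded
        System.out.println(inEnumeration("Nike", brands)); // an allowed brand
    }
}
```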

So now that you've made it through the design phase (all the talk!), you can get down to the code. Let's look at starting work on a validation framework.
Getting down to it

I've written about enough -- let's get down to the code. You basically have four classes that you need to code:
The Constraint class, which represents constraints for a single type, like the shoeSize type.
The Validator class, which provides an interface for allowing developers to pass in data, and find out if the data is valid.
The SchemaParser class, which parses an XML Schema and creates the Constraint objects for use by the Validator class.
The DataConverter helper class, which will convert from XML Schema data types to Java data types and perform other data type conversions for you.


As that code is being developed for this article and the Enhydra Application Server (which you can check out in Resources), all code is in the org.enhydra.validation package. In addition, the code in this series is open source, meaning you can change whatever you like (such as adding features and functionality). You are also welcome to email those changes to the Enhydra mailing list or to me, and they will be added to the main code base. So with all the details set, let's get into classes.
The Constraint class

The first class is perhaps the simplest to turn from design into compilable code. The Constraint class represents the various constraints for a specific data type and, not surprisingly, is identified by an identifier. Thus, you can easily code the constructor to accept that identifier as a required parameter. Then you need methods for the various types of constraints, such as the minimum inclusive value. An accessor (a "get" method), a mutator (a "set" method), and an interrogator (a "has" method) are needed for each type of constraint. So, for the minInclusive constraint, you would have the following three methods:

    /**
     * <p>
     *  This will set the minimum allowed value for this data type (inclusive).
     * </p>
     *
     * @param minInclusive minimum allowed value (inclusive)
     */
    public void setMinInclusive(double minInclusive) {
        this.minInclusive = minInclusive;
    }
    /**
     * <p>
     *  This will return the minimum allowed value for this data type (inclusive).
     * </p>
     *
     * @return <code>double</code> - minimum value allowed (inclusive)
     */
    public double getMinInclusive() {
        return minInclusive;
    }
    /**
     * <p>
     *  This will return <code>true</code> if a minimum value (inclusive) constraint
     *    exists.
     * </p>
     *
     * @return <code>boolean</code> - whether there is a constraint for the
     *         minimum value (inclusive)
     */
    public boolean hasMinInclusive() {
        // Double.NaN never compares equal to anything (even itself), so
        // Double.isNaN() must be used to test for the "not set" marker
        return !Double.isNaN(minInclusive);
    }


Similar methods are provided for the minExclusive, maxInclusive, and maxExclusive constraints. Additionally, you need a means of adding allowed values, used when enumerations are supplied in the XML Schema. You also need similar methods for returning the allowed values and seeing if any allowed values exist.

    /**
     * <p>
     *  This will add another value to the list of allowed values for this data type.
     * </p>
     *
     * @param value <code>String</code> value to add.
     */
    public void addAllowedValue(String value) {
        allowedValues.add(value);
    }
    /**
     * <p>
     *  This will return the list of allowed values for this data type.
     * </p>
     *
     * @return <code>List</code> - allowed values for this <code>Constraint</code>.
     */
    public List getAllowedValues() {
        return allowedValues;
    }
    /**
     * <p>
     *  This checks to see if there are only a certain set of allowed values.
     * </p>
     *
     * @return <code>boolean</code> - whether there are allowed values for this type.
     */
    public boolean hasAllowedValues() {
        return (allowedValues.size() > 0);
    }


And, of course, there must be a means to set the data type and retrieve it, according to the schema. That is the Java equivalent of the type specified by the schema type or baseType attribute. Similar methods are provided for that functionality:

    /**
     * <p>
     *  This will allow the data type for the constraint to be set. The type is specified
     *    as a Java <code>String</code>.
     * </p>
     *
     * @param dataType <code>String</code> this is the Java data type for this constraint.
     */
    public void setDataType(String dataType) {
        this.dataType = dataType;
    }
    /**
     * <p>
     *  This will return the <code>String</code> version of the Java data type for this
     *    constraint.
     * </p>
     *
     * @return <code>String</code> - the data type for this constraint.
     */
    public String getDataType() {
        return dataType;
    }


And as simply as that, you are finished with the Constraint class. You can view the complete class in Resources. If it seems as if that class is just a bunch of get and set methods, you are exactly right! That is just a Java representation of a set of data constraints. The next class you'll code though, the Validator class, will use it heavily.
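To see the class in action before moving on, here is a condensed, runnable sketch of Constraint assembled from the methods above. The constructor signature and the use of Double.NaN as the "not set" default are assumptions about the full class in Resources:

```java
import java.util.ArrayList;
import java.util.List;

// Condensed sketch of the Constraint class, assembled from the methods
// shown above. The constructor and the NaN "not set" default are
// assumptions; the complete class is available in Resources.
public class ConstraintSketch {

    private String identifier;
    private String dataType;
    private double minInclusive = Double.NaN;   // NaN marks "no constraint"
    private List allowedValues = new ArrayList();

    public ConstraintSketch(String identifier) {
        this.identifier = identifier;
    }

    public String getIdentifier() { return identifier; }

    public void setDataType(String dataType) { this.dataType = dataType; }
    public String getDataType() { return dataType; }

    public void setMinInclusive(double minInclusive) { this.minInclusive = minInclusive; }
    public double getMinInclusive() { return minInclusive; }
    public boolean hasMinInclusive() { return !Double.isNaN(minInclusive); }

    public void addAllowedValue(String value) { allowedValues.add(value); }
    public List getAllowedValues() { return allowedValues; }
    public boolean hasAllowedValues() { return allowedValues.size() > 0; }

    public static void main(String[] args) {
        ConstraintSketch brand = new ConstraintSketch("brand");
        brand.setDataType("java.lang.String");
        brand.addAllowedValue("Nike");
        System.out.println(brand.hasAllowedValues());   // an enumeration exists
        System.out.println(brand.hasMinInclusive());    // no range was set
    }
}
```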
The Validator class

The Validator class plays the most visible role in your validation framework -- it is the means for developers to interact with validation. Developers get an instance of the class and pass in data to be validated, getting a simple boolean result. They can then refuse the data, throw errors, or take other courses of action.

First, though, a word about how the class is set up. Unlike most classes, the Validator class is not best constructed directly (using the new keyword). The same servlet running in multiple threads, multiple servlets in multiple threads, and multiple classes in multiple threads may all use the validation code. If each object instantiated a new Validator instance, the same schema would often end up being parsed many times over. Instead, you want parsing of an XML Schema to take place only once. For that reason, you employ the Singleton design pattern, which ensures that only one instance of a given class is made available to all threads in the JVM. Here, though, you make a slight modification: because you need not just one instance but one instance per XML Schema, you will actually keep a number of instances, each tied to a specific schema. A request for a schema whose instance already exists returns that existing instance, and no reparsing occurs. With that settled, you can now create the core of the Validator class.

package org.enhydra.validation;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;
/**
 * <p>
 *  The <code>Validator</code> class allows an application component or client to
 *    provide data, and determine if the data is valid for the requested type.
 * </p>
 */
public class Validator {
    /** The instances of this class for use (singleton design pattern) */
    private static Map instances = null;
    /** The URL of the XML Schema for this <code>Validator</code> */
    private URL schemaURL;
    /** The constraints for this XML Schema */
    private Map constraints;
    /**
     * <p>
     *  This constructor is private so that the class cannot be instantiated
     *    directly, but instead only through <code>{@link #getInstance(URL)}</code>.
     * </p>
     */
    private Validator(URL schemaURL) {
        this.schemaURL = schemaURL;
        constraints = new HashMap();
        // parse the XML Schema and create the constraints
    }
    /**
     * <p>
     *  This will return the instance for the specific XML Schema URL. If a schema
     *    exists, it is returned (as parsing will already be done); otherwise,
     *    a new instance is created, and then returned.
     * </p>
     *
     * @param schemaURL <code>URL</code> of schema to validate against.
     * @return <code>Validator</code> - the instance, ready to use.
     */
    public static synchronized Validator getInstance(URL schemaURL) {
        // synchronized so that concurrent threads cannot create duplicate
        // instances or corrupt the instance map
        if (instances == null) {
            instances = new HashMap();
        }
        String key = schemaURL.toString();
        if (instances.containsKey(key)) {
            return (Validator)instances.get(key);
        }
        Validator validator = new Validator(schemaURL);
        instances.put(key, validator);
        return validator;
    }
}


As you can see, the constructor is made private. Application coders will instead call the static getInstance() method and supply the schema to use for constraints. If an instance tied to that schema exists, it is returned, and no instantiation occurs. If, however, no instances exist, or no instances exist for the supplied schema, a new instance is created. You can see that a comment is a placeholder for a method, in the constructor, that will cause parsing of the schema to occur and a list of constraints to be built up. That prepares the instance for use. I'll look at the actual parsing, which the SchemaParser class will perform, a little later.

Finally, you must provide a method that allows developers to validate their data (remember, that was the point of this whole exercise!). That is also simple, and you'll sketch out the method here. In the next article, you'll fill in the logic:

    /**
     * <p>
     *  This will validate a data value (in <code>String</code> format) against a
     *    specific constraint, and return <code>true</code> if this value is valid
     *    for the constraint.
     * </p>
     *
     * @param constraintName the identifier in the constraints to validate this data against.
     * @param data <code>String</code> data to validate.
     * @return <code>boolean</code> - whether the data is valid or not.
     */
    public boolean isValid(String constraintName, String data) {
        // Validate against the correct constraint
        // This will be coded in Article 2
        // For now, return true
        return true;
    }


With that, I will stop here. Now I'll take a look at where you are, and where you need to go next.
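Before moving to the summary, here is a self-contained sketch of how application code will eventually drive the framework. The class below is a condensed stand-in for the Validator above (its isValid() is still the always-true stub), and the schema URL and identifier names are purely illustrative:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Condensed stand-in for the Validator above, showing the intended usage:
// one cached instance per schema URL, then isValid() calls per data value.
// The URL and identifier names are illustrative; isValid() is still a stub.
public class ValidatorSketch {

    private static Map instances = new HashMap();
    private URL schemaURL;

    private ValidatorSketch(URL schemaURL) {
        this.schemaURL = schemaURL;
        // parsing of the schema would happen here
    }

    public static synchronized ValidatorSketch getInstance(URL schemaURL) {
        String key = schemaURL.toString();
        ValidatorSketch validator = (ValidatorSketch)instances.get(key);
        if (validator == null) {
            validator = new ValidatorSketch(schemaURL);
            instances.put(key, validator);
        }
        return validator;
    }

    public boolean isValid(String constraintName, String data) {
        return true;   // stub; real constraint checks arrive in the next article
    }

    // Helper demonstrating the one-instance-per-schema behavior
    public static boolean sameInstance(String urlString) {
        try {
            URL url = new URL(urlString);
            return getInstance(url) == getInstance(url);
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        URL schema = new URL("http://www.buyShoes.com/buyShoes.xsd");
        ValidatorSketch validator = ValidatorSketch.getInstance(schema);
        System.out.println(validator.isValid("shoeSize", "10"));
        System.out.println(sameInstance("http://www.buyShoes.com/buyShoes.xsd"));
    }
}
```

The key point is that the second getInstance() call for the same schema returns the cached instance, so parsing cost is paid only once no matter how many servlets or threads ask for it.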
Summary

Well, you've come quite a ways since our conceptual beginnings. Details about how you are going to build the framework have been solidified, and you know the classes you will need to code. One of those, the Constraint class, is complete, and the second, the Validator class, has a working skeleton and a lot of code filled in. I hate to make you wait in the middle of all that code, but that's about all the time and space I have.

In the next article, I'll look at the important task of actually parsing the XML Schema and building up a list of Constraint objects for use in the Validator. I'll also finish up the utility class, DataConverter, and complete the Validator class that was started here. Then you're all done! I'll introduce some examples so you can look at the code in action as well as discuss some advanced topics such as pattern matching and error reporting. Until then, have fun with the code, and see you online!
Author Bio
Brett McLaughlin is an Enhydra strategist at Lutris Technologies who specializes in distributed systems architecture. He is the author of Java and XML and is involved in technologies such as Java servlets, Enterprise JavaBeans technology, XML, and business-to-business applications. With Jason Hunter, he recently founded the JDOM project, which provides a simple API for manipulating XML from Java applications. McLaughlin is also an active developer on the Apache Cocoon project and the EJBoss EJB server, and a cofounder of the Apache Turbine project. 
