<html><head><title>
blah
<!--
/*
 * Copyright (C) 1999-2001 The Free Software Foundation, Inc.
 */
-->
</title></head><body>

<p>This package provides XML processing pipelines, based on sending
SAX events, which can be used as components of application architectures.
Pipelines are used to convey streams of processing events from a producer
to one or more consumers, and to let each consumer control the data seen by
later consumers.

<p> There is a <a href="PipelineFactory.html">PipelineFactory</a> class which
accepts a syntax describing how to construct some simple pipelines.  Strings
describing such pipelines can be used in command line tools (see the
<a href="../util/DoParse.html">DoParse</a> class)
and in other places where it is
useful to let processing be easily reconfigured.  Pipelines can of course
be constructed programmatically, providing access to options that the
factory does not expose.

<p> Web applications are supported by making it easy for servlets (or
non-Java web application components) to be part of a pipeline.  They can
originate XML (or XHTML) data through an <em>InputSource</em> or in
response to XML messages sent from clients using <em>CallFilter</em>
pipeline stages.  Such facilities are available using the simple syntax
for pipeline construction.


<h2> Programming Models </h2>

<p> Pipelines should be simple to understand.

<ul>
    <li> XML content, typically entire documents,
    is pushed through consumers by producers.

    <li> Pipelines are basically about consuming SAX2 callback events,
    where the events encapsulate XML infoset-level data.<ul>

	<li> Pipelines are constructed by taking one or more consumer
	stages and combining them to produce a composite consumer.

	<li> A pipeline is presumed to have pending tasks and state from
	the beginning of its ContentHandler.startDocument() callback until
	it has returned from its ContentHandler.endDocument() callback.

	<li> Pipelines may have multiple output stages ("fan-out")
	or multiple input stages ("fan-in") when appropriate.

	<li> Pipelines may be long-lived, but need not be.

	</ul>

    <li> There is flexibility about event production. <ul>

	<li> SAX2 XMLReader objects are producers, which
	provide a high level "pull" model: documents (text or DOM) are parsed,
	and the parser pushes individual events through the pipeline.

	<li> Events can be pushed directly to event consumer components
	by application modules, if they invoke SAX2 callbacks directly.
	That is, application modules use the XML Infoset as exposed
	through SAX2 event callbacks.

	</ul>

    <li> Multiple producer threads may concurrently access a pipeline,
    if they coordinate appropriately.

    <li> Pipeline processing is not the only framework applications
    will use.

    </ul>


<h3> Producers: XMLReader or Custom </h3>

<p> Many producers will be SAX2 XMLReader objects, and
will read (pull) data which is then written (pushed) as events.
Typically these will parse XML text (using a reader acquired from
<code>org.xml.sax.helpers.XMLReaderFactory</code>) or a DOM tree
(using a <code><a href="../util/DomParser.html">DomParser</a></code>).
These may be bound to an event consumer using a convenience routine,
<em><a href="EventFilter.html">EventFilter</a>.bind()</em>.
Once bound, these producers may be given additional documents to
send through their pipelines.

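<p> As a rough illustration of the pull-then-push model, the sketch below
binds one XMLReader to one handler using plain SAX2 calls and then feeds it
two documents in turn.  It deliberately avoids this package's classes so that
it runs on a bare JDK: <code>ElementLog</code> and
<code>ReaderProducerDemo</code> are hypothetical names, the reader is obtained
from <code>SAXParserFactory</code> rather than <code>XMLReaderFactory</code>,
and <code>setContentHandler()</code> stands in for
<em>EventFilter.bind()</em>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical consumer: records the name of each element it is sent.
class ElementLog extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        names.add(qName);
    }
}

class ReaderProducerDemo {
    // Parses every document with a single reader bound to a single handler.
    static List<String> run(String... documents) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader producer = factory.newSAXParser().getXMLReader();

        ElementLog consumer = new ElementLog();
        producer.setContentHandler(consumer);   // plain SAX2 binding

        for (String doc : documents) {
            // The reader pulls text and pushes events to the bound handler.
            producer.parse(new InputSource(new StringReader(doc)));
        }
        return consumer.names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<a><b/></a>", "<c/>"));
    }
}
```

<p> The handler stays bound between calls to <code>parse()</code>, so the
second document flows through the same consumer as the first.
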
<p> In other cases, you will write producers yourself.  For example, some
data structures might know how to write themselves out using one or
more XML models, expressed as sequences of SAX2 event callbacks.
An application module might
itself be a producer, issuing startDocument and endDocument events
and then asking those data structures to write themselves out to a
given EventConsumer, or walking data structures (such as JDBC query
results) and applying its own conversion rules.  WAP format XML
(WBXML) can be directly converted to producer output.

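<p> A custom producer of this kind might look like the following sketch,
in which a data structure (here just a <code>Map</code>) is written out as a
sequence of SAX2 callbacks.  The class names, the element names
<code>settings</code> and <code>entry</code>, and the <code>key</code>
attribute are all invented for illustration, and the events go to a bare
<code>ContentHandler</code> rather than to an EventConsumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical producer: walks a map and pushes SAX2 events for it.
class MapProducer {
    // Emits <settings><entry key="...">value</entry>...</settings>.
    static void write(Map<String, String> data, ContentHandler out)
            throws SAXException {
        out.startDocument();
        out.startElement("", "settings", "settings", new AttributesImpl());
        for (Map.Entry<String, String> e : data.entrySet()) {
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "key", "key", "CDATA", e.getKey());
            out.startElement("", "entry", "entry", atts);
            char[] text = e.getValue().toCharArray();
            out.characters(text, 0, text.length);
            out.endElement("", "entry", "entry");
        }
        out.endElement("", "settings", "settings");
        out.endDocument();
    }
}

// Hypothetical consumer used to observe the event sequence.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        events.add("end:" + qName);
    }
}
```

<p> Calling <code>MapProducer.write(map, new EventTrace())</code> brackets the
element events with exactly one startDocument and one endDocument call, which
is what downstream stages expect.
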
<p> SAX2 introduced an "XMLFilter" interface, which is a kind of XMLReader.
It is most useful in conjunction with its XMLFilterImpl helper class;
see the <em><a href="EventFilter.html">EventFilter</a></em> javadoc
for information contrasting that XMLFilterImpl approach with the
relevant parts of this pipeline framework.  Briefly, such XMLFilterImpl
subclasses can be either producers or consumers, and are more limited in
configuration flexibility.  In this framework, the focus of filters is
on the EventConsumer side; see the section on
<a href="#fitting">pipe fitting</a> below.


<h3> Consume to Standard or Custom Data Representations </h3>

<p> Many consumers will be used to create standard representations of XML
data.  The <a href="TextConsumer.html">TextConsumer</a> takes its events
and writes them as text for a single XML document,
using an internal <a href="../util/XMLWriter.html">XMLWriter</a>.
The <a href="DomConsumer.html">DomConsumer</a> takes its events and uses
them to create and populate a DOM Document.

<p> In other cases, you will write consumers yourself.  For example,
you might use a particular unmarshaling filter to produce objects
that fit your application's requirements, instead of using DOM.
Such consumers work at the level of XML data models, rather than with
specific representations such as XML text or a DOM tree.  You could
convert your output directly to WAP format data (WBXML).


<h3><a name="fitting">Pipe Fitting</a></h3>

<p> Pipelines are composite event consumers, with each stage having
the opportunity to transform the data before delivering it to any
subsequent stages.

<p> The <a href="PipelineFactory.html">PipelineFactory</a> class
provides access to much of this functionality through a simple syntax.
See the table in that class's javadoc describing a number of standard
components.  Direct API calls are still needed for many of the most
interesting pipeline configurations, including ones leveraging actual
or logical concurrency.

<p> Four basic types of pipe fitting are directly supported.  These may
be used to construct complex pipeline networks.  <ul>

    <li> <a href="TeeConsumer.html">TeeConsumer</a> objects split event
    flow so it goes to two different consumers, one before the other.
    This is a basic form of event fan-out; you can use this class to
    copy events to any number of output pipelines.

    <li> Clients can call remote components through HTTP or HTTPS using
    the <a href="CallFilter.html">CallFilter</a> component, and Servlets
    can implement such components by extending the
    <a href="XmlServlet.html">XmlServlet</a> component.  Java is not
    required on either end, and transport protocols other than HTTP may
    also be used.

    <li> <a href="EventFilter.html">EventFilter</a> objects selectively
    provide handling for callbacks, and can pass unhandled ones to a
    subsequent stage.  They are often subclassed, since much of the
    basic filtering machinery is already in place in the base class.

    <li> Applications can merge two event flows by just using the same
    consumer in each one.  If multiple threads are in use, synchronization
    needs to be addressed by the appropriate application level policy.

    </ul>

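<p> The filtering idea in the list above (handle some callbacks, pass the
rest to the next stage) can be sketched with JDK classes alone.  This is not
the EventFilter class itself: <code>WhitespaceDropper</code>,
<code>Recorder</code> and <code>FilterDemo</code> are hypothetical, and the
forwarding machinery is borrowed from
<code>org.xml.sax.helpers.XMLFilterImpl</code>, used here purely on the
consumer side.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter stage: handles one callback, forwards all others.
class WhitespaceDropper extends XMLFilterImpl {
    WhitespaceDropper(ContentHandler next) {
        setContentHandler(next);   // unhandled callbacks are forwarded here
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // Swallow the event, so later stages never see it.
    }
}

// Hypothetical terminus: records what reaches the end of the pipeline.
class Recorder extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        seen.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        seen.add("end:" + qName);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        seen.add("whitespace");
    }
}

class FilterDemo {
    // Pushes a tiny document through filter -> terminus.
    static List<String> run() throws SAXException {
        Recorder terminus = new Recorder();
        ContentHandler pipeline = new WhitespaceDropper(terminus);

        pipeline.startDocument();
        pipeline.startElement("", "doc", "doc", new AttributesImpl());
        pipeline.ignorableWhitespace(new char[] {' ', '\n'}, 0, 2);
        pipeline.endElement("", "doc", "doc");
        pipeline.endDocument();
        return terminus.seen;
    }
}
```

<p> Because each stage only holds a reference to the next handler, stages
compose by nesting constructors, and sharing one terminus between two such
chains gives the fan-in arrangement mentioned in the last list item.
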
<p> Note that filters can be as complex as
<a href="XsltFilter.html">XSLT transforms</a>
(where available) on input data, or as simple as removing simple syntax data
such as ignorable whitespace, comments, and CDATA delimiters.
Some simple "built-in" filters are part of this package.


<h3> Coding Conventions:  Filter and Terminus Stages</h3>

<p> If you follow these coding conventions, your classes may be used
directly (by giving the full class name) in pipeline descriptions as understood
by the PipelineFactory.  There are four constructors the factory may
try to use; in order of decreasing numbers of parameters, these are: <ul>

    <li> Filters that need a single String setup parameter should have
    a public constructor with two parameters:  that string, then the
    EventConsumer holding the "next" consumer to get events.

    <li> Filters that don't need setup parameters should have a public
    constructor that accepts a single EventConsumer holding the "next"
    consumer to get events when they are done.

    <li> Terminus stages may have a public constructor taking a single
    parameter:  the string value of that parameter.

    <li> Terminus stages may have a public no-parameters constructor.

    </ul>

<p> Of course, classes may support more than one such usage convention;
if they do, they can automatically be used in multiple modes.  If you
try to use a terminus class as a filter, and that terminus has a constructor
with the appropriate number of arguments, it is automatically wrapped in
a "tee" filter.


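<p> The four constructor shapes can be made concrete with a small sketch.
Everything here is an assumption made for illustration: <code>Stage</code>
stands in for this package's EventConsumer interface so that the example
compiles on its own, the stage classes are invented, and
<code>Conventions.supported()</code> only reports which shapes a class offers;
it is not how the PipelineFactory is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for this package's EventConsumer, so the sketch compiles alone.
interface Stage { }

// Hypothetical filter offering both filter conventions.
class RenameFilter implements Stage {
    final String prefix;
    final Stage next;

    public RenameFilter(String prefix, Stage next) {   // setup string, then "next"
        this.prefix = prefix;
        this.next = next;
    }

    public RenameFilter(Stage next) {                  // "next" only
        this("", next);
    }
}

// Hypothetical terminus offering both terminus conventions.
class FileSink implements Stage {
    final String target;

    public FileSink(String target) {                   // one string parameter
        this.target = target;
    }

    public FileSink() {                                // no parameters
        this("out.xml");
    }
}

class Conventions {
    // Lists the conventional public constructors a class has,
    // checked in order of decreasing parameter count.
    static List<String> supported(Class<?> type) {
        List<String> found = new ArrayList<>();
        if (has(type, String.class, Stage.class)) found.add("filter(String, next)");
        if (has(type, Stage.class)) found.add("filter(next)");
        if (has(type, String.class)) found.add("terminus(String)");
        if (has(type)) found.add("terminus()");
        return found;
    }

    private static boolean has(Class<?> type, Class<?>... params) {
        try {
            type.getConstructor(params);   // public constructors only
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

<p> A class that offers several of these constructors shows up under several
entries, which mirrors the point that one class can be used in multiple modes.
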
<h2> Debugging Tip: "Tee" Joints can Snapshot Data</h2>

<p> It can sometimes be hard to see what's happening, when something
goes wrong.  Easily fixed:  just snapshot the data.  Then you can find
out where things start to go wrong.

<p> If you're using pipeline descriptors so that they're easily
administered, just stick a <em>write ( filename )</em>
filter into the pipeline at an appropriate point.

<p> Inside your programs, you can do the same thing directly: perhaps
by saving a Writer (perhaps a StringWriter) in a variable, using that
to create a TextConsumer, and making that the first part of a tee --
splicing that into your pipeline at a convenient location.

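<p> The same snapshot technique can be sketched with JDK classes only.
In this sketch the hypothetical <code>SnapshotTee</code> plays the part of the
tee, and an identity <code>TransformerHandler</code> writing to a
<code>StringWriter</code> stands in for the TextConsumer; neither is the class
this package actually provides.

```java
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical tee: each event goes to the snapshot first, then onward.
class SnapshotTee extends DefaultHandler {
    private final ContentHandler snapshot;
    private final ContentHandler rest;

    SnapshotTee(ContentHandler snapshot, ContentHandler rest) {
        this.snapshot = snapshot;
        this.rest = rest;
    }

    @Override public void startDocument() throws SAXException {
        snapshot.startDocument();
        rest.startDocument();
    }

    @Override public void endDocument() throws SAXException {
        snapshot.endDocument();
        rest.endDocument();
    }

    @Override public void startElement(String uri, String local, String qName,
                                       Attributes atts) throws SAXException {
        snapshot.startElement(uri, local, qName, atts);
        rest.startElement(uri, local, qName, atts);
    }

    @Override public void endElement(String uri, String local, String qName)
            throws SAXException {
        snapshot.endElement(uri, local, qName);
        rest.endElement(uri, local, qName);
    }

    @Override public void characters(char[] ch, int start, int length)
            throws SAXException {
        snapshot.characters(ch, start, length);
        rest.characters(ch, start, length);
    }
}

class SnapshotDemo {
    // Returns the text captured at the tee while events flow onward.
    static String run() throws Exception {
        StringWriter saved = new StringWriter();
        SAXTransformerFactory tf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler asText = tf.newTransformerHandler();
        asText.setResult(new StreamResult(saved));

        // DefaultHandler stands in for the rest of the real pipeline.
        ContentHandler pipeline = new SnapshotTee(asText, new DefaultHandler());
        pipeline.startDocument();
        pipeline.startElement("", "doc", "doc", new AttributesImpl());
        pipeline.characters("hi".toCharArray(), 0, 2);
        pipeline.endElement("", "doc", "doc");
        pipeline.endDocument();
        return saved.toString();
    }
}
```

<p> Moving the tee earlier or later in the chain shows the data as it stood
at that point, which helps narrow down the stage where things go wrong.
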
<p> You can also use a DomConsumer to buffer the data, but remember
that DOM doesn't save all the information that XML provides, so that DOM
snapshots are relatively low fidelity.  They also are substantially more
expensive in terms of memory than a StringWriter holding similar data.

<h2> Debugging Tip: Non-XML Producers</h2>

<p> Producers in pipelines don't need to start from XML
data structures, such as text in XML syntax (likely coming
from some <em>XMLReader</em> that parses XML) or a
DOM representation (perhaps with a
<a href="../util/DomParser.html">DomParser</a>).

<p> One common type of event producer will instead make
direct calls to SAX event handlers returned from an
<a href="EventConsumer.html">EventConsumer</a>.
For example, making <em>ContentHandler.startElement</em>
calls and matching <em>ContentHandler.endElement</em> calls.

<p> Applications making such calls can catch certain
common "syntax errors" by using a
<a href="WellFormednessFilter.html">WellFormednessFilter</a>.
That filter will detect (and report) erroneous input data
such as mismatched document, element, or CDATA start/end calls.
Use such a filter near the head of the pipeline that your
producer feeds, at least while debugging, to help ensure that
you're providing legal XML Infoset data.

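<p> To give a feel for what such a check involves, here is a much simplified
sketch that only tracks document and element nesting.  It is not the
WellFormednessFilter: <code>NestingChecker</code> and
<code>CheckerDemo</code> are hypothetical names, CDATA boundaries are not
checked, and errors are reported by throwing a <code>SAXException</code>,
which is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical checker: verifies start/end pairing, then forwards events.
class NestingChecker extends XMLFilterImpl {
    private final Deque<String> open = new ArrayDeque<>();
    private boolean inDocument;

    NestingChecker(ContentHandler next) {
        setContentHandler(next);
    }

    @Override public void startDocument() throws SAXException {
        if (inDocument) throw new SAXException("startDocument called twice");
        inDocument = true;
        super.startDocument();
    }

    @Override public void endDocument() throws SAXException {
        if (!inDocument) throw new SAXException("endDocument without startDocument");
        if (!open.isEmpty()) throw new SAXException("unclosed element: " + open.peek());
        inDocument = false;
        super.endDocument();
    }

    @Override public void startElement(String uri, String local, String qName,
                                       Attributes atts) throws SAXException {
        if (!inDocument) throw new SAXException("element outside document: " + qName);
        open.push(qName);
        super.startElement(uri, local, qName, atts);
    }

    @Override public void endElement(String uri, String local, String qName)
            throws SAXException {
        if (open.isEmpty() || !open.peek().equals(qName)) {
            throw new SAXException("mismatched end tag: " + qName);
        }
        open.pop();
        super.endElement(uri, local, qName);
    }
}

class CheckerDemo {
    // Feeds a deliberately broken event sequence; returns the error message,
    // or null if no error was detected.
    static String firstError() {
        ContentHandler head = new NestingChecker(new DefaultHandler());
        try {
            head.startDocument();
            head.startElement("", "a", "a", new AttributesImpl());
            head.endElement("", "b", "b");   // does not match <a>
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }
}
```

<p> Placing the checker at the head of the pipeline means a faulty producer
is caught before any later stage acts on the bad events.
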
<p> You can also arrange to validate data on the fly.
For DTD validation, you can configure a
<a href="ValidationConsumer.html">ValidationConsumer</a>
to work as a filter, using any DTD you choose.
Other validation schemes can be handled with other
validation filters.

</body></html>