Friday, July 29, 2005

Java AWK - parsing csv-files in Java can be fun

Ever parsed text files by writing loops and calling indexOf / substring just to get some boring job done? I confess, I did. Today, instead, I sat back and browsed the JDK JavaDocs, thinking about what it would take to bring some of the simplicity of AWK to Java. It's really simple: just a few lines of code and you can write things like this:

import java.io.FileInputStream;
import java.io.IOException;

public class JawkSample extends JawkProgram {

    protected void registerRules() {
        addRules( new Rule[] {

            // lines starting with "Q" and a digit: print field 3 (zero-based)
            new Rule( "^Q\\d",
                new Command(){ protected void execute() {
                    print( "\nName: " + f(3));
                }} ),

            // lines starting with "P": report them and move on
            new Rule( "^P",
                new Command(){ protected void execute() {
                    print("\n ignoring line "+ line() );
                }} ),
        } );
    }

    public static void main(String[] args) throws IOException {
        FileInputStream in = new FileInputStream("test1.txt");
        JawkSample prog = new JawkSample();

        prog.process( in, System.out );
        in.close();
    }
}
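
To make the behaviour of the two rules concrete, here is a rough hand-written equivalent run against made-up input. The class name JawkSampleByHand and the sample lines are assumptions for illustration only; the contents of test1.txt are not shown in this post. It is exactly the kind of loop the rules are meant to replace.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone illustration; the input lines are invented.
class JawkSampleByHand {

    // Applies both rules of the sample to every line and collects the output.
    static String run(String[] lines) {
        Pattern question = Pattern.compile("^Q\\d");
        Pattern skipped = Pattern.compile("^P");
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (question.matcher(line).find()) {
                // f(3): fields are zero-based, so this is the fourth field
                out.append("\nName: " + line.split(";")[3]);
            }
            if (skipped.matcher(line).find()) {
                out.append("\n ignoring line " + line);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] lines = {
            "Q1;2005;open;Miller",
            "P;some;other;record",
            "X;not;matched;at all"
        };
        System.out.print(run(lines));
    }
}
```

For this input the first line yields "Name: Miller", the second is reported as ignored, and the third matches no rule and produces nothing. Note that, unlike AWK's $1, the field index here starts at 0.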


Ok, anonymous classes are not nearly as nice and pretty as closures a la Ruby and Python. But they do the job, cleanly and nicely. Here is the base class; it needs nothing beyond rt.jar. I had never noticed how much regexp support improved from 1.3 to 1.4, which added java.util.regex. So it is possible to create small domain-specific languages in Java quite easily.


import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Instances of class JawkProgram are "J-AWK programs": AWK-like
 * programs for processing text streams in Java.
 * @author schultma
 */
public abstract class JawkProgram {
    protected String fieldSeparator = ";";
    protected Pattern splitPattern;

    private List /*of Rule*/ rules = new ArrayList();

    private BufferedReader input;
    private BufferedWriter output;

    /** One input line; its fields are split lazily on first access. */
    private class Line {
        private String content;
        private String[] fields;

        public Line ( String content ) {
            this.content = content;
        }

        public String getContent(){
            return content;
        }

        /** Returns field i (zero-based). */
        public String field( int i ) {
            if (fields==null)
                fields = splitPattern.split( content );
            return fields[i];
        }

        /** Offers this line to every registered rule, in order. */
        public void process() {
            for ( Iterator ruleIterator = rules.iterator(); ruleIterator.hasNext(); ) {
                Rule curRule = (Rule) ruleIterator.next();
                curRule.applyTo( this );
            }
        }
    }

    /** The action part of a rule; subclasses implement execute(). */
    protected abstract class Command {
        private Line line;

        protected void print( String s ) {
            try {
                output.write( s );
            } catch ( IOException e ) {
                throw new RuntimeException(e);
            }
        }

        protected String f(int i) {
            return line.field(i);
        }

        protected String line() {
            return line.getContent();
        }

        public final void executeOn( Line l ) {
            line = l;
            execute();
        }

        abstract protected void execute();
    }

    /** A regular expression plus the command to run on matching lines. */
    protected class Rule {
        private Matcher matchPattern;
        private Command command;

        public Rule( String pattern, Command command ) {
            // one Matcher per rule, reset for every line
            this.matchPattern = Pattern.compile( pattern ).matcher("");
            this.command = command;
        }

        public void applyTo( Line line ) {
            matchPattern.reset( line.getContent() );
            if ( matchPattern.find() ) {
                if ( command==null ) {
                    // no command given: print the matching line itself
                    try {
                        output.write( line.getContent() );
                    } catch ( IOException e ) {
                        throw new RuntimeException(e);
                    }
                } else {
                    command.executeOn( line );
                }
            }
        }
    }

    protected void addRule( Rule rule ) {
        rules.add( rule );
    }

    protected void addRules( Rule[] r ) {
        rules.addAll( Arrays.asList(r) );
    }

    protected abstract void registerRules();

    public JawkProgram () {
        // assumed: the rules are registered once, at construction time
        registerRules();
    }

    public void process( InputStream in, OutputStream out ) {
        input = new BufferedReader( new InputStreamReader(in) );
        output = new BufferedWriter( new OutputStreamWriter(out) );

        splitPattern = Pattern.compile( fieldSeparator );

        try {
            String currentLine;
            while ( (currentLine = input.readLine()) != null ) {
                Line l = new Line(currentLine);
                l.process();
            }
            // push buffered output through to the underlying stream
            output.flush();
        } catch ( IOException e ) {
            throw new RuntimeException(e);
        }
    }
}
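
The base class leans on just two java.util.regex features: Pattern.split for the fields and a single Matcher per rule that is reset for each line. The following is a minimal sketch of both in isolation; the class name RegexPrimitives and the sample strings are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of JawkProgram.
class RegexPrimitives {

    // Splits a line into fields the same way Line.field() does.
    static String[] splitFields(String line, String separator) {
        return Pattern.compile(separator).split(line);
    }

    // Counts the lines a rule pattern matches, reusing one Matcher
    // through reset() as Rule.applyTo() does.
    static int countMatches(String pattern, String[] lines) {
        Matcher m = Pattern.compile(pattern).matcher("");
        int count = 0;
        for (int i = 0; i < lines.length; i++) {
            m.reset(lines[i]);
            if (m.find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] fields = splitFields("Q1;2005;x;Smith", ";");
        System.out.println(fields[3]);

        // split() drops trailing empty fields, so this line has only two
        System.out.println(splitFields("a;b;;", ";").length);

        String[] lines = { "Q1;a", "P;b", "Q2;c", "XQ3;d" };
        System.out.println(countMatches("^Q\\d", lines));
    }
}
```

One thing to watch: because Pattern.split discards trailing empty strings, a call like f(3) on a line whose last fields are empty would throw an ArrayIndexOutOfBoundsException in the base class as written.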


Thursday, July 28, 2005

Unified service lifecycle-interface for different service-models

HiveMind sports a bunch of service-models (singleton, pooled, threaded). After adding "client-session" for my request-spanning Hibernate sessions, I realised that it's rather unfortunate that each of the service models comes with its own lifecycle-interface (Discardable, PoolManageable,...). So if you want to be able to switch service-models via configuration in one place, your service has to implement all of these interfaces - and you might not be sure what "all" is, thanks to HiveMind's ease of extensibility.
How about a unified lifecycle-interface for all service models? It should reflect the structure of possible client-service-interactions rather than the implementation of a service-model. I would call it, maybe, "Conversation" with the following methods:
  • begin() (a new client starts to talk to the service)
  • end() (current client is finished talking to the service, throw away client state)
  • pause() (client stops talking to the service for now, but will resume later, so don't throw away client state, but free resources like db-connections)
  • resume() (previous client continues to talk to the service)
So when configured as a "pooled" service, just begin() and end() would be called; "threaded" would probably just use end(); "client-session" would employ all of them.
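
As a rough sketch, the proposed interface and the calls each service-model might make could look like this. Everything here is hypothetical: Conversation is only the name suggested above, RecordingService and ConversationDemo are invented for the illustration, and none of it is an existing HiveMind API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface following the proposal; not part of HiveMind.
interface Conversation {
    void begin();   // a new client starts to talk to the service
    void end();     // client is finished; throw away client state
    void pause();   // client will come back; keep state, free resources
    void resume();  // the previous client continues
}

// Hypothetical service that only records which callbacks it received.
class RecordingService implements Conversation {
    final List<String> calls = new ArrayList<String>();

    public void begin()  { calls.add("begin"); }
    public void end()    { calls.add("end"); }
    public void pause()  { calls.add("pause"); }
    public void resume() { calls.add("resume"); }
}

class ConversationDemo {

    // "pooled": one begin()/end() pair per checkout from the pool.
    static void pooled(Conversation c) {
        c.begin();
        c.end();
    }

    // "threaded": probably only end(), when the thread is cleaned up.
    static void threaded(Conversation c) {
        c.end();
    }

    // "client-session": a conversation spanning two requests.
    static void clientSession(Conversation c) {
        c.begin();
        c.pause();   // first request done, client will return
        c.resume();  // second request arrives
        c.end();
    }

    public static void main(String[] args) {
        RecordingService service = new RecordingService();
        clientSession(service);
        System.out.println(service.calls);
    }
}
```

A service written against this single interface would not need to know which model it is configured with; it would simply see fewer of the four callbacks under the simpler models.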

nifty toy = Tomcat + MX4J + + mc4j

Just brought up a managed version of Tomcat 5.5 on J2SE 1.4 using MX4J and the module . There are just a few gotchas:
  • put mx4j.jar and mx4j-remote.jar on tomcat's system classpath, along with bootstrap.jar
  • don't contribute services with performance interceptor to
Everything else works cleanly out of the box. It's fun to watch your services drawing nice performance-graphs - just too cool.