Introduction

DataForge is a scalable engine for querying a heterogeneous set of data sources, both relational (SQL) and non-relational (anything!). Any data source can be queried, provided it can be expressed as a tabular dataset and a transformer can be written for it.

Through its streaming pipeline, DataForge provides a simple, efficient, yet powerful way to integrate Java code and data sources.
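
To make the idea of a transformer over a tabular dataset more concrete, here is a minimal sketch in modern Java. It is not DataForge's actual API: the names `RowTransformer`, `CsvLineTransformer`, and `stream` are hypothetical and chosen only for illustration. It assumes a non-relational source (plain comma-separated text lines) and shows how such a source could be exposed as rows that are produced one at a time rather than loaded all at once.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only. Every name here is hypothetical and
// is NOT taken from DataForge's real API.
public class TabularSketch {

    /** Turns one raw record from a source into a row of column values. */
    interface RowTransformer<S> {
        List<String> columns();
        List<String> toRow(S raw);
    }

    /** Example transformer: treats comma-separated text lines as rows. */
    static class CsvLineTransformer implements RowTransformer<String> {
        private final List<String> columns;

        CsvLineTransformer(String... columns) {
            this.columns = Arrays.asList(columns);
        }

        public List<String> columns() {
            return columns;
        }

        public List<String> toRow(String raw) {
            return Arrays.asList(raw.split(","));
        }
    }

    /** Streams rows lazily: each row is built only when the consumer asks for it. */
    static <S> Iterator<List<String>> stream(Iterator<S> source, RowTransformer<S> t) {
        return new Iterator<List<String>>() {
            public boolean hasNext() {
                return source.hasNext();
            }

            public List<String> next() {
                return t.toRow(source.next());
            }
        };
    }

    public static void main(String[] args) {
        CsvLineTransformer t = new CsvLineTransformer("id", "name");
        Iterator<List<String>> rows =
                stream(Arrays.asList("1,Ada", "2,Linus").iterator(), t);

        System.out.println(t.columns());
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
    }
}
```

The point of the sketch is the shape of the contract: once a source can name its columns and turn each raw record into a row, anything downstream can treat it like a table, whatever the underlying storage is.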

New reference manual - PDF reference or HTML reference

News

07-Mar-2004

DataForge now uses contexts exclusively, rather than property sets, to pass configuration between stages of transformers. This allows a single DataForge instance to be used by multiple threads.
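
The following sketch illustrates why per-call contexts help with multithreading. It is written in modern Java under stated assumptions and does not reproduce DataForge's real classes: `Context`, `Engine`, and `run` are hypothetical names. The assumption is that the engine keeps no configuration of its own, so each thread hands in a private context and there is no shared mutable state to conflict over.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only. Context, Engine and run are hypothetical
// names and are NOT DataForge's real API.
public class ContextSketch {

    /** Per-invocation configuration, created by the caller and never shared. */
    static final class Context {
        private final Map<String, String> values = new HashMap<>();

        Context set(String key, String value) {
            values.put(key, value);
            return this;
        }

        String get(String key) {
            return values.get(key);
        }
    }

    /** Stateless engine: everything it needs arrives in the context it is handed. */
    static final class Engine {
        String run(Context ctx) {
            return ctx.get("prefix") + ":" + ctx.get("query");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Engine shared = new Engine();          // one instance, used by both threads
        String[] results = new String[2];

        Thread a = new Thread(() -> results[0] =
                shared.run(new Context().set("prefix", "A").set("query", "q1")));
        Thread b = new Thread(() -> results[1] =
                shared.run(new Context().set("prefix", "B").set("query", "q2")));

        a.start();
        b.start();
        a.join();                              // join makes each result visible here
        b.join();

        System.out.println(results[0] + " " + results[1]);
    }
}
```

Had the configuration lived in a property set owned by the engine, the two threads would have overwritten each other's settings. Passing it as an argument keeps each query's configuration local to the call.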