Data

The Data Team develops and maintains a web application and a related database for textual variants of the Greek text of 2 Samuel.

Our immediate goals are to develop a system for linking indirect witnesses (e.g., quotations by early patristic writers) to our main text and to automate the generation of the text-critical apparatus for the upcoming critical edition.

For the research community, the usefulness and value of data depend heavily on how well organized and accessible it is. This calls for a strong emphasis on dependable storage methods and, to some degree, a careful choice of storage format. In the Data Team we are working to standardize our data management, storage, and distribution processes. Our workhorse is industry-standard XML, a widely adopted encoding format for publishing and distributing data over digital networks. For data distribution we have created an XML schema based on the Text Encoding Initiative (TEI) guidelines, which are designed primarily for use in the social sciences, digital humanities, and linguistics. We have further extended these rules to form the basis for a complete, versatile, and future-proof data interchange format.
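As a rough illustration of what a TEI-style apparatus entry can look like and how it can be read programmatically, the sketch below parses a single variation unit with Python's standard library. The element names app, lem, and rdg and the wit attribute come from the TEI critical apparatus module. The sample words, the witness sigla (W1, W2, W3), and the function name read_variants are invented for this example and do not reflect our actual schema or data.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample: one variation unit with a lemma and one variant reading.
# The words and witness sigla are placeholders, not real collation data.
SAMPLE = """
<ab xmlns="http://www.tei-c.org/ns/1.0">
  <app>
    <lem wit="#W1 #W2">λόγος</lem>
    <rdg wit="#W3">λόγον</rdg>
  </app>
</ab>
"""


def read_variants(xml_text):
    """Return (kind, reading, witnesses) tuples for every <app> element.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    variants = []
    for app in root.iterfind(".//tei:app", TEI_NS):
        for child in app:
            # Tags look like "{namespace}lem"; keep only the local name.
            kind = child.tag.split("}")[-1]
            if kind not in ("lem", "rdg"):
                continue
            # The wit attribute holds space-separated pointers such as "#W1".
            witnesses = [w.lstrip("#") for w in child.get("wit", "").split()]
            variants.append((kind, (child.text or "").strip(), witnesses))
    return variants


if __name__ == "__main__":
    for kind, reading, witnesses in read_variants(SAMPLE):
        print(kind, reading, ", ".join(witnesses))
```

A flat list of readings with their witnesses is one possible starting point for generating a printed apparatus automatically. A real schema would also need to handle nested markup, lacunae, and corrections.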

Even at a very basic level, an electronic edition that combines the biblical text with morphological analysis yields obvious benefits, such as searches based on word combinations and grammatical structures. Moreover, standard algorithms allow for approximate text matching and classification, which can be combined with advanced data processing technologies such as machine learning. We are actively seeking to implement new features in our software tools at all levels.
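Approximate text matching of this kind can be sketched with Python's standard difflib module, for example to suggest which passage a loosely worded quotation most resembles. This is only an illustration under stated assumptions: the functions similarity and best_match, the threshold of 0.6, the passage labels, and the English placeholder strings are all invented for the example and are not part of our tools. Real Greek text would probably also need normalization of accents and spelling before comparison.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(quotation, passages, threshold=0.6):
    """Return (label, score) for the most similar passage.

    Returns None when no passage reaches the threshold.
    Hypothetical helper; the threshold is an arbitrary assumption.
    """
    best_label, best_score = None, 0.0
    for label, text in passages.items():
        score = similarity(quotation, text)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None
    return best_label, best_score


# Placeholder passages, not actual biblical text.
PASSAGES = {
    "v1": "the king went up to the city",
    "v2": "and the people returned to their tents",
}

if __name__ == "__main__":
    # A quotation that drops one word still matches the first passage.
    print(best_match("the king went to the city", PASSAGES))
    # An unrelated string falls below the threshold and returns None.
    print(best_match("xyz qqq", PASSAGES))
```

A character-based ratio is a simple baseline. Linking patristic quotations in practice would likely call for token-level or lemma-level comparison, which is where the morphological analysis becomes useful.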

One important goal of the Data Team is to make the data and software tools accessible. This benefits not only the research community but also IT specialists. The source code of our software will be released on GitHub under the GNU General Public License. In time, we aim to make the research data openly available to the general public in an interesting and engaging format.