Data Integrator
tbl2tbl - Text file to table converter.

Parse any tabular text file (csv, tab delimited, excel exported text files) and output the result back as a simple non-quoted, non-escaped, tab-separated and cleaned file that can be easily parsed by most unix text tools, especially by our own set of tools which are based on tab-separated files.


With this tool a table-like text file can be transformed into a table with well defined and easier parsable rows, ie. a clean version of the table. It aims to read heavily Excel-accentuated files, which inlucde a plethora of special cases that feature entries hampering automated processing, for example new lines in cells. The tool is able to read a text file with any separation character, and each cell can be delimited by a pre-defined string character, such that everything between the start and the end of this character will be treated as the cell's content, including a set of white-space characters, that optionally can be removed from a cell (eg, tab and newline characters). It is also possible to trim cell content from whitespace characters and to remove empty lines (and even empty columns) before the line is printed to the output, please refer to the examples below.

If needed, in a second step cells with multiple entries can be split into rows by using tblsubsplit - Split cells by a non-column separator character.

Options applicable to more than a single tool are summarized in common command line options.


The main objective of this tool is to generate tab-separated files without special characters and escape characters. Nevertheless, column seperation and cell delimiter characters can be specified for the output file.

An Excel CSV file is very likely to look like this:

transformant id,construct id,cg number,chromosome nr,status,fb numbers,synonyms
cornichon related"
Early gene at 23"

This however does not look like a table, at least not viewed as above. But if we show the white space characters and still indicate line breaks by separate lines we are able to grasp the table-like structure of the file:

transformant id,construct id,cg number,chromosome nr,status,fb numbers,synonyms
2615,856,CG17262,2,available,"FBgn0064188,\nFBgn0031499,\nFBgn0243513","Cnir\nCni-related,\ncnir,\ncornichon related"
2620,866,CG3327,2,available,"FBgn0020445,\nFBgn0031524,\nFBgn0031525","CG15410,\nE23,\nEarly gene at 23"

HINT: In order to avoid dealing with whitespace characters in the text file, remove them already in your spreadsheet software (Excel or OpenOffice, for example) with a regular expression search and replace.


Example: Dealing with Excel-like CSV files

Let's assume this is stored in file excel-hell.csv, then the command

chris-cmd$ ./ -d , -q \" -c '\n' -w /tmp/excel-hell.csv

will generate a much more behaved table. Parameters were chosen such that

  • the column input separator is a command -d ,
  • the cell quoting character is a quote -q \"
  • newlines should be removed -c '\n' (this is the Unix version, it may be that on Windows another character must be entered here.)
  • and whitespace within the cell should be removed.
transformant id construct id cg number chromosome nr status fb numbers


2615 856 CG17262 2 available FBgn0064188,FBgn0031499,FBgn0243513

Cnir,Cni-related,cnir,cornichon related

2620 866 CG3327 2 available FBgn0020445,FBgn0031524,FBgn0031525 CG15410,E23,Early gene at 23

Example: Removing empty lines and columns

This tool can also be used to get rid of empty lines, it even is able to detect empty columns and remove them upon request. If for example file /tmp/empty-lines.tsv is given by

hdr1    hdr2    hdr3    hdr4    hdr5
1       2       3               5
a       b       c               d

A       B       C               E
?       :       ;               -

this    is      the             end

the command

chris-cmd$ ./ -H -r /tmp/empty-lines.tsv

will remove lines without any information, basically empty lines, which frequently appear at the end of a file. The output looks as shown below.

hdr1    hdr2    hdr3    hdr4    hdr5
1       2       3               5
a       b       c               d
A       B       C               E
?       :       ;               -
this    is      the             end

Now, we also want to get rid of column hdr4, which does not contain any valuable information. This is done by issuing the command

chris-cmd$ ./ -H -r -R /tmp/empty-lines.tsv

and presents us with this output:

hdr1    hdr2    hdr3    hdr5
1       2       3       5
a       b       c       d
A       B       C       E
?       :       ;       -
this    is      the     end

Please note that even though there was a header entry (hdr4), the column was deleted since the data cells were all empty.