Friday, April 20, 2012

Creating text tables with Text::ASCIITable

I work a lot with delimited flat files, and a good way to display them in plain text is to create an ASCII table.  The attached script, tab2tbl.pl, uses the Text::ASCIITable module to convert an incoming tab-delimited file into a nice-looking table.
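As a rough idea of what such a conversion involves, here is a minimal sketch. It is not the code from tab2tbl.pl: the tab_to_table() helper and the sample data are made up for illustration, and it assumes the first line of the input holds the column headers.

```perl
use strict;
use warnings;
use Text::ASCIITable;

# Hypothetical helper: build a table from tab-delimited text.
# Assumption: the first non-empty line is the header row.
sub tab_to_table {
    my ($text) = @_;
    my @lines  = grep { length } split /\r?\n/, $text;
    my @header = split /\t/, shift(@lines), -1;

    my $tbl = Text::ASCIITable->new();
    $tbl->setCols(@header);
    for my $line (@lines) {
        # The -1 limit keeps trailing empty fields instead of dropping them.
        $tbl->addRow(split /\t/, $line, -1);
    }
    return $tbl->draw();
}

print tab_to_table("name\tqty\napple\t3\npear\t12\n");
```

The sketch does no error handling; in particular, a row with more fields than the header would need checking before it is added.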

Here's an example of a simple input file:


Below is what it looks like after formatting is applied:


Adjusting for proper width when formatting UTF-8 files

If we are formatting UTF-8 data, some characters are not only encoded in more than one byte, they can also occupy either one or two columns on the screen.  The latter are referred to as double-width characters, and many CJKV glyphs fall into this category.  See this proposal for a more detailed description.
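To make the first half of that concrete, the sketch below compares character count and UTF-8 byte count using only core modules. The char_and_byte_counts() helper is a name invented for this example, and the second sample string is the two characters 中文 written as escapes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return (character count, UTF-8 byte count)
# for an already-decoded string.
sub char_and_byte_counts {
    my ($text) = @_;
    my $n_chars = length $text;                   # code points
    my $n_bytes = length encode('UTF-8', $text);  # encoded octets
    return ($n_chars, $n_bytes);
}

for my $sample ('abc', "\x{4E2D}\x{6587}") {
    my ($n_chars, $n_bytes) = char_and_byte_counts($sample);
    printf "%d chars, %d bytes\n", $n_chars, $n_bytes;
}
# prints:
# 3 chars, 3 bytes
# 2 chars, 6 bytes
```

Neither number is the on-screen width: those two CJK characters take four columns in a typical terminal, which is why a separate width function is needed.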

If we use Text::ASCIITable without modification, it cannot distinguish between single- and double-width characters, so the output doesn't always line up.  Here's a sample:


However, the module lets us define our own count callback function, so we can correct for this by using the mbswidth() function from the Text::CharWidth module:

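use Text::CharWidth;

# Custom width callback: returns the number of screen columns $input occupies.
sub count_utf8_cb {
    my $input = shift;
    # Strip ANSI escape sequences and character-set switches
    # so they do not count toward the width.
    $input =~ s/\33\[(\d+(;\d+)?)?[musfwhojBCDHRJK]//g;
    $input =~ s/\33\([0B]//g;
    return Text::CharWidth::mbswidth($input);
}
$tbl->setOptions('cb_count', \&count_utf8_cb);

Once this is in place, we get the following: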



ANSI character output

With the -a option, the script can also produce output using ANSI color codes (if your terminal supports them):

Download Links

You can download the files mentioned above here: tab2tbl.pl, chinese_test_file_1.txt.