Perl UTF-8 and Regular Expressions

Today I wanted to explore Perl’s Unicode and regular expression capabilities, so I wrote this simple script. It is remarkable how easily Perl handles strings and regular expressions inside a single program. Doing the same in the shell usually takes several sed or egrep commands chained together in a pipeline.

#!/usr/bin/perl

use strict;
use warnings;
use Encode;
use utf8;    # this source file contains UTF-8 string literals

# "Έλεος" means "mercy" in Greek
my $bob = "<b>Έλεος</b>";

# take the first command-line argument (raw bytes) and decode it
# from UTF-8 into a Perl character string; default to '' if missing
my $telis = decode('UTF-8', $ARGV[0] // '');

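# code point of the Greek letter beta 'β' (U+03B2), moved forward by 4
my $ter = ord('β');
$ter += 4;

# left shift by 3 bits: 2 << 3 is 16
my $arithmouba = 2;
$arithmouba <<= 3;

# convert the code point back to a character: U+03B6 is 'ζ'
$ter = chr($ter);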

# regular expression substitution: insert a newline after every <b> tag
$bob =~ s/<b>/<b>\n/g;

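# encode the character strings to UTF-8 bytes before printing
# ($telis is already a character string, so it needs encoding too)
$bob   = encode('UTF-8', $bob);
$telis = encode('UTF-8', $telis);
$ter   = encode('UTF-8', $ter);
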
print "$bob\n$telis\n$ter\n$arithmouba\n";
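
The manual encode() calls in the script can also be handed over to an output layer. What follows is a minimal sketch of that variant, not the original script: the helper names break_after_bold and shift_char are hypothetical, and it assumes Perl 5.10 or later for the // operator and a terminal that displays UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                       # string literals in this file are UTF-8
use Encode qw(decode);

# Let the output layer encode everything printed to STDOUT as UTF-8,
# so no per-variable encode() call is needed.
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: insert a newline after every opening <b> tag.
sub break_after_bold {
    my ($html) = @_;
    $html =~ s/<b>/<b>\n/g;
    return $html;
}

# Hypothetical helper: move a character forward by $offset code points.
sub shift_char {
    my ($char, $offset) = @_;
    return chr(ord($char) + $offset);
}

# Command-line arguments arrive as bytes; decode them, defaulting to ''.
my $arg = decode('UTF-8', $ARGV[0] // '');

my $bold    = break_after_bold("<b>Έλεος</b>");
my $letter  = shift_char('β', 4);   # U+03B2 + 4 = U+03B6
my $shifted = 2 << 3;               # 2 * 2**3 = 16

print "$bold\n$arg\n$letter\n$shifted\n";
```

Run with a Greek word as the first argument, this should print the tag and the word on separate lines, then the argument, then ζ, then 16. The design choice is to decode once at the input boundary and encode once at the output boundary, so everything in between works on characters rather than bytes.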
