Perl UTF-8 and Regular Expressions

Today I wanted to explore Perl’s Unicode and regular expression capabilities, so I wrote this simple script. It is quite amazing how easily Perl handles strings and regular expressions! Without it, you would have to chain multiple sed or egrep commands in a pipeline.


use strict;
use warnings;
use utf8;                      # this source file contains UTF-8 literals
use Encode qw(decode encode);

# mercy in Greek
my $bob  = "<b>Έλεος</b>";

# take the script's first argument and decode its UTF-8 bytes into a
# Perl character string (falls back to an empty string if none is given)
my $telis = decode('UTF-8', $ARGV[0] // '');

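# code point (a number) of the Greek letter beta 'β'
my $ter = ord('β');

# shifting left by 3 bits multiplies by 8, so 2 becomes 16
my $arithmouba = 2;
$arithmouba <<= 3;

# convert the code point back to the character
$ter = chr($ter);

# regular expression substitution: insert a newline after every <b> tag
$bob =~ s/<b>/<b>\n/g;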

# encode the character strings back to UTF-8 bytes for output
$bob = encode('UTF-8', $bob);
$ter = encode('UTF-8', $ter);

print "$bob\n$telis\n$ter\n$arithmouba\n";
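
To see why the script decodes its input and encodes its output, it may help to compare a byte string with a character string. The following is a minimal sketch, not part of the original script: the variable names ($bytes, $chars, $byte_len, $char_len, $greek_count) are made up for illustration, and it assumes the file is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                      # string literals in this file are UTF-8
use Encode qw(decode encode);

# Illustrative input: the raw UTF-8 bytes a shell would pass for "Έλεος"
my $bytes = encode('UTF-8', 'Έλεος');

# Decoding turns those bytes into a Perl character string
my $chars = decode('UTF-8', $bytes);

my $byte_len = length $bytes;   # counts bytes
my $char_len = length $chars;   # counts characters

# \p{Greek} matches characters of the Greek script; it is only
# meaningful on decoded (character) strings
my $greek_count = () = $chars =~ /\p{Greek}/g;

print "bytes: $byte_len, characters: $char_len, Greek letters: $greek_count\n";
```

Each of the five Greek letters takes two bytes in UTF-8, so the byte string is twice as long as the character string. Regular expressions and functions such as length, ord, and chr work on characters only after decoding.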