Perl UTF-8 and Regular Expressions
Today I wanted to explore Perl’s Unicode and regular expression capabilities, so I wrote this simple script. It is quite amazing how simply Perl handles strings and regular expressions! Without it, you would have to chain several sed or egrep commands in a pipeline to do the same thing.
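#!/usr/bin/perl
use strict;
use warnings;
# Encode provides decode()/encode() for converting between bytes and characters
use Encode;
# tell Perl that this source file itself is written in UTF-8
use utf8;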
# mercy in Greek
my $bob = "<b>Έλεος</b>";
# decode the first command-line argument from UTF-8 bytes into a character string
# (falls back to an empty string when no argument is given)
my $telis = decode('UTF-8', $ARGV[0] // '');
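# code point of the Greek letter beta 'β' (U+03B2)
my $ter = ord('β');
# move four code points forward, to zeta 'ζ' (U+03B6)
$ter+=4;
# bitwise left shift by 3: 2 << 3 is 2 * 2**3 = 16
my $arithmouba = 2;
$arithmouba = $arithmouba << 3;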
# convert number back to letter
$ter = chr($ter);
# regular expression substitution: insert a newline after every <b> tag
$bob =~ s/<b>/<b>\n/g;
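# encode every character string back to UTF-8 bytes before printing;
# $telis was decoded above, so it must be encoded as well
$bob = encode('UTF-8', $bob);
$telis = encode('UTF-8', $telis);
$ter = encode('UTF-8', $ter);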
print "$bob\n$telis\n$ter\n$arithmouba\n";
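As a variation on the script above, here is a minimal sketch that lets Perl do the output encoding itself through an I/O layer on STDOUT, and that uses a Unicode property class in the regular expression. The helper name bold_greek and the sample sentence are made up for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                 # string literals in this file are UTF-8
use Encode qw(decode);

# Encode everything printed to STDOUT as UTF-8 automatically,
# so no explicit encode() call is needed before print.
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: wrap each run of Greek letters in <b>...</b>.
sub bold_greek {
    my ($text) = @_;
    # \p{Greek} matches any character in the Greek script
    $text =~ s/(\p{Greek}+)/<b>$1<\/b>/g;
    return $text;
}

# Use the first argument if one is given, otherwise a sample string (assumption).
my $input = @ARGV ? decode('UTF-8', $ARGV[0]) : 'mercy is Έλεος in Greek';

print bold_greek($input), "\n";
# the letter four code points after beta, as in the script above
print chr(ord('β') + 4), "\n";
```

With the output layer in place, all strings stay as decoded characters inside the program and are turned into bytes only at the moment they are printed, which avoids mixing encoded and decoded strings in one print statement.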