[olug] Regex question
Adam Haeder
adamh at aiminstitute.org
Mon Oct 9 19:16:18 UTC 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I'm working on a thorny regex issue. I have some text files in which
some lines contain extended ASCII characters. I would like to replace
each of those characters with the plain ASCII character that is the
closest logical replacement I can come up with.
You can see an example of the lines at this link:
http://www.adamhaeder.com/regex_more.jpg
The image shows what the lines look like when I view the text file with
'more'. I wrote (ok, ok, found online somewhere) a Perl script to
tell me exactly what this character is. Here's the script:
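#!/usr/bin/perl
# Print every character of the given file next to its numeric code.
use strict;
use warnings;

my $file = $ARGV[0];
open(my $fh, '<', $file) || die "Can't open $file\n";
while (my $line = <$fh>)
{
    # With no I/O layer set, each "character" here is a single byte.
    foreach my $ch (split(//, $line))
    {
        my $code = ord($ch);
        print "$ch -> $code\n";
    }
}
close $fh;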
Here's the output relevant to the text in the image:
-> 10
- -> 226
0 -> 48
2 -> 50
2 -> 50
-> 9
S -> 83
o -> 111
u -> 117
g -> 103
h -> 104
t -> 116
-> 32
a -> 97
p -> 112
p -> 112
l -> 108
i -> 105
c -> 99
a -> 97
n -> 110
t -> 116
s -> 115
-> 32
f -> 102
o -> 111
r -> 114
-> 32
m -> 109
o -> 111
r -> 114
t -> 116
g -> 103
a -> 97
g -> 103
e -> 101
-> 10
- -> 226
0 -> 48
2 -> 50
2 -> 50
-> 9
F -> 70
i -> 105
l -> 108
l -> 108
e -> 101
d -> 100
-> 32
o -> 111
u -> 117
t -> 116
-> 32
m -> 109
o -> 111
r -> 114
t -> 116
g -> 103
a -> 97
g -> 103
e -> 101
-> 32
a -> 97
p -> 112
p -> 112
l -> 108
i -> 105
c -> 99
a -> 97
t -> 116
i -> 105
o -> 111
n -> 110
s -> 115
So this tells me my extended ASCII character is #226, which according to
http://www.lookuptables.com/ is a weird upside-down and backwards capital
L (that's what it looks like to me anyway).
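The byte values can also be cross-checked without Perl. The following is
only a minimal sketch: the function name byte_values is made up for
illustration, and it assumes a POSIX od, tr and sed are available. It
prints one decimal byte value per line, so a 226 in its output
corresponds to the character in question.

```shell
#!/bin/sh
# byte_values (hypothetical helper): print the decimal value of every
# byte in the named files, or in standard input, one value per line.
byte_values() {
    # od -A n : no offset column
    # od -t u1: one unsigned decimal number per byte
    # od -v   : do not collapse repeated lines into "*"
    # tr/sed  : put each number on its own line and drop empty lines
    LC_ALL=C od -A n -t u1 -v "$@" | tr -s ' ' '\n' | sed '/^$/d'
}
```

It could be run as: byte_values "$filename"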
So I'm trying to come up with a sed command to replace this with
something else, and I can't seem to get sed to match it.
I want sed to replace ASCII 226 followed by two digits with a dash.
This sed line replaces everything _but_ our extended ASCII char:
sed -r -e "s/[[:print:][:space:]]/-/g" $filename
But the inverse doesn't work:
sed -r -e "s/[^[:print:][:space:]]/-/g" $filename
This regex works when passed to grep:
grep -e "[^[:print:][:graph:]][0-9]{2}" $filename
But the same regex _does not_ work when passed to sed.
What am I doing wrong?
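One possible explanation, stated here as an assumption rather than a
confirmed diagnosis: in a UTF-8 locale a lone byte 226 is not a valid
character, and sed may refuse to match invalid sequences with bracket
expressions. The sketch below sidesteps that by running sed in the C
locale, where every byte is a character, and by building the byte with
printf (226 decimal is 342 octal). The function name replace_marker is
hypothetical.

```shell
#!/bin/sh
# replace_marker (hypothetical helper): replace byte 226 followed by
# two digits with a dash, in the named files or in standard input.
replace_marker() {
    # Build the raw byte; \342 is the octal form of decimal 226.
    byte=$(printf '\342')
    # LC_ALL=C makes sed treat the input as single bytes, so the byte
    # can be matched literally regardless of the user's locale.
    LC_ALL=C sed -e "s/${byte}[0-9][0-9]/-/g" "$@"
}
```

It could be run as: replace_marker "$filename"
Note that the dump above shows three digits (0, 2, 2) after the 226, so
the number of [0-9] items may need adjusting to match the real data.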
- --
Adam Haeder
Vice President of Information Technology
AIM Institute
adamh at aiminstitute.org
(402) 345-5025 x115
PGP Public key: http://www.haederfamily.org/pgp.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
iD8DBQFFKqACbHC3IXlHqBQRAgPLAJ9R/vltSDck3rv008j/mgS0Bh3QDwCdHyDf
+alQVcIfrImKTmEaMWJ9dBw=
=X/Al
-----END PGP SIGNATURE-----