
unicode utf8 (page 2)



11 entries, 2 pages
Guest, 2008-11-19 14:46, #116398
perlunicode (Security Implications of Unicode):


Regular expressions behave slightly differently between byte data and character (Unicode) data. For example, the "word character" character class \w will work differently depending on if data is eight-bit bytes or Unicode.

In the first case, the set of \w characters is either small--the default set of alphabetic characters, digits, and the "_"--or, if you are using a locale (see perllocale), the \w might contain a few more letters according to your language and country.

In the second case, the \w set of characters is much, much larger. Most importantly, even in the set of the first 256 characters, it will probably match different characters: unlike most locales, which are specific to a language and country pair, Unicode classifies all the characters that are letters somewhere as \w . For example, your locale might not think that LATIN SMALL LETTER ETH is a letter (unless you happen to speak Icelandic), but Unicode does.
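The difference the quoted passage describes can be reproduced with LATIN SMALL LETTER ETH (U+00F0) itself. The following is only a minimal sketch, assuming a Perl without `use locale`, without the `unicode_strings` feature, and without a `/u` or `/a` modifier on the pattern; the variable names are made up for illustration. It stores the same character once as a plain byte string and once as a character string (via `utf8::upgrade`) and checks both against \w.

```perl
use strict;
use warnings;

# U+00F0 LATIN SMALL LETTER ETH as a plain eight-bit byte string.
my $bytes = "\xF0";

# The same character, but upgraded to Perl's internal character
# representation, so the regex engine applies Unicode rules to it.
my $chars = "\xF0";
utf8::upgrade($chars);

# Match both strings against the same pattern.
my $byte_match = $bytes =~ /\w/ ? 1 : 0;
my $char_match = $chars =~ /\w/ ? 1 : 0;

print "byte string matches \\w:      $byte_match\n";
print "character string matches \\w: $char_match\n";
```

Under these assumptions the byte string should not match and the character string should, although both compare as equal with `eq`.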



With LC_ALL = "de_DE.utf8" the script behaves as in the "first case", and with LC_ALL = "de_DE" as in the "second case".
How do I get \w to contain "a few more letters according to your language and country"?
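One possible way to get umlauts recognised by \w, not taken from this thread and not locale-based, is to decode the UTF-8 input into a character string first, so that the Unicode rules quoted above apply regardless of LC_ALL. The sketch below assumes the input arrives as raw UTF-8 bytes; the sample word and variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the word "Bär" as raw UTF-8 bytes,
# as it might arrive from a file or from STDIN.
my $raw = "B\xC3\xA4r";

# Decode the bytes into a Perl character string (three characters).
my $text = decode('UTF-8', $raw);

# The raw bytes are judged byte by byte; the decoded string
# is judged character by character under Unicode rules.
my $raw_is_word  = $raw  =~ /^\w+$/ ? 1 : 0;
my $text_is_word = $text =~ /^\w+$/ ? 1 : 0;

print "raw bytes are all \\w:      $raw_is_word\n";
print "decoded string is all \\w:  $text_is_word\n";
```

This sidesteps the locale question rather than answering it directly: \w then matches every Unicode letter, not only the letters of the de_DE locale.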



View all threads created 2008-11-10 11:40.