Thread: Regex does not match all emoji (24 answers)
Opened by GwenDragon at 2025-12-10 16:06

GwenDragon
 2025-12-13 17:11
#197379
And here is one more, courtesy of AI. Put together rather weirdly.
Since when do character classes have to be separated with |!?

Quote
Me:
Extend Perl regex to filter flags symbols and Emojis from HTML source.
Current code is
$string =~ s|([\p{Block=Miscellaneous_Symbols}\p{Emoji_Presentation}\p{Block=Dingbats}])||gs;

AI:
To extend your Perl regex for filtering flags, symbols, and emojis from HTML source, you can adjust it to cover flags and a broader range of emojis. Here’s how you can modify your regex:

$string =~ s|([\p{Block=Miscellaneous_Symbols}|
\p{Emoji_Presentation}|
\p{Block=Dingbats}|
\p{Block=Enclosed_Alphanumerics}|
\p{Block=Geometric_Shapes}|
\p{Block=Transportation_Symbols}|
\p{Block=Supplemental_Symbols_and_Pictographs}|
\p{Block=Flags}])||g;

Explanation of the Code:

1. Block Definitions:
- Miscellaneous_Symbols: Captures various symbols.
- Emoji_Presentation: Matches emojis that are represented in emoji format.
- Dingbats: Includes dingbat symbols.
- Enclosed_Alphanumerics: Covers alphanumeric symbols within a box.
- Geometric_Shapes: Matches geometric symbols, which can also include emoji-like shapes.
- Transportation_Symbols: Includes symbols related to transportation.
- Supplemental_Symbols_and_Pictographs: More emojis and pictorial symbols, including additional icons.
- Flags: Captures flag emojis.

2. Global Modifier g: Ensures all matches in the string will be replaced, not just the first.

Usage:

This improved regex pattern can effectively remove both flags and a broader range of symbols and emojis from your HTML source. Just apply this modified regex in the same way as your original code.

Feel free to adjust or extend the regex further depending on any specific symbols or emoji blocks you may need!

That is pretty heavily hallucinated.

This is what Perl 5.42 has to say about it:
Perl 5.42 X:\TEST>perl -cw a.pl
Can't find Unicode property definition "Block=Transportation_Symbols" in regex; marked by <-- HERE in m/([\p{Block=Miscellaneous_Symbols}|
\p{Emoji_Presentation}|
\p{Block=Dingbats}|
\p{Block=Enclosed_Alphanumerics}|
\p{Block=Geometric_Shapes}|
\p{Block=Transportation_Symbols} <-- HERE |
\p{Block=Supplemental_Symbols_and_Pictographs}|
\p{Block=Flags}])/ at a.pl line 18.
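
For comparison, a minimal sketch of what a working version might look like. It is not the code from the thread, and it rests on a few assumptions: the block the AI called "Transportation_Symbols" is taken to be Transport_And_Map_Symbols, and since Unicode has no block named "Flags", flag emoji are covered through the regional indicator range U+1F1E6 to U+1F1FF (country flags are pairs of those). The helper name strip_symbols is made up for illustration. Inside [...] the properties are simply listed one after another; a | there is a literal pipe, so the AI's pattern would also have deleted pipes and newlines from the HTML had it compiled.

```perl
use strict;
use warnings;

# Properties are listed side by side inside [...]; no "|" is needed,
# and a "|" in there would be a literal pipe character, not an "or".
# Note: \p{Emoji_Presentation} needs a reasonably recent Perl
# (the thread uses 5.42).
my @classes = (
    '\p{Block=Miscellaneous_Symbols}',
    '\p{Emoji_Presentation}',
    '\p{Block=Dingbats}',
    '\p{Block=Enclosed_Alphanumerics}',
    '\p{Block=Geometric_Shapes}',
    '\p{Block=Transport_And_Map_Symbols}',             # assumed to be the block the AI meant
    '\p{Block=Supplemental_Symbols_And_Pictographs}',
    '\x{1F1E6}-\x{1F1FF}',                             # regional indicators; flags are pairs of these
);
my $symbols = do { my $c = join '', @classes; qr/[$c]/ };

# strip_symbols is a hypothetical helper name, not from the thread.
sub strip_symbols {
    my ($string) = @_;
    $string =~ s/$symbols//g;
    return $string;
}

# Airplane (Dingbats), the two regional indicators for "DE", a rocket.
my $in = "A\x{2708}B\x{1F1E9}\x{1F1EA}C|D\x{1F680}";
print strip_symbols($in), "\n";    # prints "ABC|D"; the literal pipe survives
```

This still does not catch everything: tag-sequence flags such as the ones for England or Scotland, variation selectors and zero-width joiners would need extra entries in the list.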

Last edited: 2025-12-13 17:37:38 +0100 (CET)
