Removing complete HTML entities with a regular expression

GG网络技术分享 2025-03-18 16:15 3


Question:

We have a requirement to remove special characters from text strings. For example, we may get a string that looks like this, where &#174; is the entity for the registered trademark symbol:

PEPSI&#174; Bottle 20 oz<br><br>

I'm not great with regex and can't figure out how to edit our existing code to handle this.

Here's what we currently have:

$ui = "PEPSI&#174; Bottle 20 oz<br><br>";

$ui = preg_replace('/[^A-Za-z0-9\.\' -]/', '', $ui);

This results in PEPSI174 Bottle 20 ozbrbr.

Our desired result is PEPSI Bottle 20 oz<br><br>.

How can I edit the regex to make sure that

  1. It doesn't remove valid HTML tags like <br>, and
  2. If it does find a special character entity, it removes not only the special characters (the & and #), but also the numbers and the semicolon?

We don't want it to remove all numbers, since the string can obviously contain numbers of its own; only the numbers that are part of the entity code need to go.

Answers:

You could use this, though I can't guarantee it covers every possible HTML entity:

$res = preg_replace('/&[A-Za-z0-9#]+;/', '', $ui);

That says: replace any substring that

- starts with &

- is followed by one or more alphanumeric characters or #, in any order

- and ends with ;.
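Because that character class allows # anywhere and in any quantity, it would also match things like &#; or &a#b; that are not entities. A slightly stricter variant is sketched below. It is only one possible tightening, not an exhaustive entity matcher, and the helper name strip_html_entities is made up for this example. It accepts decimal references (&#174;), hexadecimal references (&#xAE;), and named references (&reg;), and it leaves tags and ordinary digits alone.

```php
<?php
// Hypothetical helper: removes anything shaped like an HTML entity reference.
// It does not check named entities against the official list, so a string
// such as "&foo;" is removed even though it is not a real entity.
function strip_html_entities(string $text): string
{
    $pattern = '/&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);/';

    // preg_replace returns null on a regex engine error; fall back to the input.
    return preg_replace($pattern, '', $text) ?? $text;
}

$ui = "PEPSI&#174; Bottle 20 oz<br><br>";
echo strip_html_entities($ui), PHP_EOL; // PEPSI Bottle 20 oz<br><br>
```

Note that this replaces the original character-class filter instead of extending it: the old pattern strips < and > too, which is why the <br> tags were lost.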

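How can I use a regular expression to filter out every script in an HTML document?

In theory, a regular expression can't do this.

Parse the HTML and remove the script elements from the DOM instead.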
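As a rough illustration of the DOM approach, the sketch below uses PHP's DOMDocument class, which assumes the dom extension is enabled. The function name remove_script_elements and the sample markup are made up for the example. It wraps the fragment in html and body tags so the parser has a full document to work with, and it assumes the input is plain ASCII; other encodings need extra handling that is not shown here.

```php
<?php
// Hypothetical helper: parses an HTML fragment and drops every <script> element.
function remove_script_elements(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is often invalid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The node list is live, so copy it to an array before removing nodes.
    $scripts = iterator_to_array($doc->getElementsByTagName('script'));
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }

    // Serialize only the children of <body>, not the wrapper we added.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo remove_script_elements('<p>Hello</p><script>alert(1)</script><p>World</p>'), PHP_EOL;
```

This only removes script elements. Inline event handlers such as onclick attributes and javascript: URLs are untouched, so it is not a complete sanitizer.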
