桐木舟学英语人工智能

PHP programming: splitting English text into sentences / paragraphs in PHP

Posted by the administrator on 2023-4-9 03:30:02

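Reportedly, the code below splits English text into separate lines at sentence-ending punctuation, in other words into sentences. I don't know whether anyone has actually tested it.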

<?php

/*TWWY'S ART*/

function break_passage($text){          // Split the text into paragraphs
    return preg_split("/(\\r|\\n|\\r\\n)/", $text, -1, PREG_SPLIT_NO_EMPTY);
}

function break_sentence($text){     // Split into sentences; sentence-ending punctuation must be followed by whitespace
    $re = '/# Split sentences on whitespace between them.
    (?<=                # Begin positive lookbehind.
      [.!?]             # Either an end of sentence punct,
    | [.!?][\'"]        # or end of sentence punct and quote.
    )                   # End positive lookbehind.
    (?<!                # Begin negative lookbehind.
      Mr\\.              # Skip either "Mr."
    | Mrs\\.             # or "Mrs.",
    | Ms\\.              # or "Ms.",
    | Jr\\.              # or "Jr.",
    | Dr\\.              # or "Dr.",
    | Prof\\.            # or "Prof.",
    | Sr\\.              # or "Sr.",
                        # or... (you get the idea).
    )                   # End negative lookbehind.
    \\s+                 # Split on whitespace between sentences.
    /ix';
    $sentences = preg_split($re, $text, -1, PREG_SPLIT_NO_EMPTY);
    return $sentences;
}

function get_sentence($text){       // Split into paragraphs first, then into sentences [recommended]
    $passage = break_passage($text);
    $return = array();
    foreach ($passage as $value) {
        $return = array_merge($return, break_sentence($value));
    }
    return $return;
}
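
To see in isolation what the negative lookbehind in `break_sentence` contributes, here is a minimal, self-contained sketch. The function name `split_skip_titles` and the shortened title list are illustrative assumptions, not part of the posted code.

```php
<?php
// Hypothetical helper for illustration only: split on whitespace that follows
// sentence-ending punctuation, unless that punctuation belongs to a listed title.
function split_skip_titles(string $text): array
{
    $re = '/(?<=[.!?])            # preceded by end of sentence punctuation
            (?<!Mr\.|Dr\.|Mrs\.)  # but not by one of these titles
            \s+                   # split on the whitespace itself
           /x';
    return preg_split($re, $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(split_skip_titles('Dr. Smith arrived. He sat down.'));
// Gives two elements: "Dr. Smith arrived." and "He sat down."
```

Without the negative lookbehind, the same input would also be split after "Dr.", which is the case the title list in the posted regex is meant to prevent.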
Original poster, posted on 2023-4-9 03:30:29
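PHP: trying to split a paragraph into sentences while keeping the punctuation

Basically, I'm taking paragraphs filled with all kinds of punctuation, such as ! ? . ; ", and breaking them into sentences. The problem I'm facing is finding a way to split them into sentences with the punctuation intact, while also accounting for quotations in dialogue. For example, this paragraph: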

    One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. "What has happened!?" he asked himself. "I... don't know." said Samsa, "Maybe this is a bad dream." He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections.

needs to be split like this:

[0] One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin.
[1] "What has happened!?" he asked himself.
[2] "I... don't know." said Samsa, "Maybe this is a bad dream."

and so on. At the moment I'm just using explode:

$sentences = explode(".", $sourceWork);

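This only splits at periods, and I append a period to the end of each piece. I know that is far from what I want, but I'm not sure where to start. If anyone could at least point me in the right direction for ideas, that would be amazing. Thanks in advance!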

3 replies

User 1:

preg_split('/[.?!]/',$sourceWork);

This is a very simple regular expression, but I think your task is impossible.
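
Note that splitting on `/[.?!]/` consumes the punctuation itself. One possible variation, sketched here as an assumption rather than as something the replier proposed, is to split on the whitespace that follows the punctuation by using a lookbehind, so that each sentence keeps its final mark. The sample text is made up for the demonstration.

```php
<?php
$text = 'It works. Does it? Yes!';

// Splitting on the punctuation characters drops them from the result.
$plain = preg_split('/[.?!]/', $text, -1, PREG_SPLIT_NO_EMPTY);

// Splitting on whitespace that is preceded by punctuation keeps them.
$kept = preg_split('/(?<=[.?!])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($plain); // 'It works', ' Does it', ' Yes'
print_r($kept);  // 'It works.', 'Does it?', 'Yes!'
```

This still ignores quotation marks and abbreviations, so it does not solve the dialogue case in the question.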

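User 2:
You need to walk through your string manually and do the splitting yourself. Keep track of the quotation-mark count, and do not break while it is odd. Here is a simple idea: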

<?php
//$str = 'AAA. BBB. "CCC." DDD. EEE. "FFF. GGG. HHH".';
$str = 'One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. "What has happened!?" he asked himself. "I... don\'t know." said Samsa, "Maybe this is a bad dream." He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections.';
$last_dot=0;
$quotation=0;
$explode_list = Array();
for ($i = 0; $i < strlen($str); $i++)
{
    $char = substr($str, $i, 1); // get the current character
    if ($char == '"') $quotation++; // track quotation marks
    if ($quotation % 2 == 1) continue; // inside a quotation: do not split
    if ($char == '.')
    {
        echo "char is $char $last_dot<br/>";
        $explode_list[] = substr($str, $last_dot, $i + 1 - $last_dot);
        $last_dot = $i + 1;
    }
}
// Keep any text left over after the last split point.
if ($last_dot < strlen($str)) $explode_list[] = substr($str, $last_dot);
echo "testing:<pre>";
print_r($explode_list);

User 3:
This is what I have:

<?php
/**
* @param string $str                          String to split
* @param string $end_of_sentence_characters   Characters which represent the end of the sentence. Should be a string with no spaces (".,!?")
*
* @return array
*/
function split_sentences($str, $end_of_sentence_characters) {
    $inside_quotes = false;
    $buffer = "";
    $result = array();
    for ($i = 0; $i < strlen($str); $i++) {
        $buffer .= $str[$i];
        if ($str[$i] === '"') {
            $inside_quotes = !$inside_quotes;
        }
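        if (!$inside_quotes) {
            // strpos avoids building a regex character class from unescaped input
            if (strpos($end_of_sentence_characters, $str[$i]) !== false) {
                $result[] = $buffer;
                $buffer = "";
            }
        }
    }
    // Keep any trailing text that did not end with a terminator.
    if ($buffer !== "") {
        $result[] = $buffer;
    }
    return $result;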
}
$str = <<<STR
One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin. "What has happened!?" he asked himself. "I... don't know." said Samsa, "Maybe this is a bad dream." He lay on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections.
STR;
var_dump(split_sentences($str, "."));
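
For comparison, the split requested in the question can also be approximated with a single regex by adding a lookahead: split only where the whitespace is followed by an opening quote or an uppercase letter. This is a heuristic sketch, not one of the replies; the function name `split_dialogue_sentences` is hypothetical, and the sample is a shortened excerpt of the paragraph above.

```php
<?php
// Hypothetical helper: a heuristic sketch, not a general sentence tokenizer.
function split_dialogue_sentences(string $text): array
{
    // Split on whitespace that
    //   - follows . ! or ?, optionally closed by a quote mark, and
    //   - precedes an opening quote or an uppercase ASCII letter.
    $re = '/(?<=[.!?]|[.!?]["\'])\s+(?=["\'A-Z])/';
    return preg_split($re, $text, -1, PREG_SPLIT_NO_EMPTY);
}

$sample = '"What has happened!?" he asked himself. '
        . '"I... don\'t know." said Samsa, "Maybe this is a bad dream." '
        . 'He lay on his back.';

print_r(split_dialogue_sentences($sample));
// Gives three elements:
//   "What has happened!?" he asked himself.
//   "I... don't know." said Samsa, "Maybe this is a bad dream."
//   He lay on his back.
```

The heuristic has known gaps: it would still split after a title such as "Mr." when a capitalized name follows, and it will not split before a sentence that starts with a lowercase letter or a digit.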
