PHP · April 26, 2011

A simple little tool for parsing non-standard XML

It is reasonably fast and easy to use; it handles text files of around 10 MB without trouble. Adapt it to your own needs.

<?php
class XmlPhrase {
    private $startTag    = '<Row>';
    private $endTag        = '</Row>';
    public function  __construct() {
        set_time_limit(0);
        ini_set ( 'memory_limit', '8000M' );
    }

    public function run() {
        $xmlstr = '';
        $fp     = fopen('1.xml','r');
        echo "begin\n";
        if($fp) {
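            // Read line by line so the whole file never has to sit in memory.
            while(($buffer = fgets($fp, 4096)) !== false) {
                if(strpos($buffer, $this->startTag)!==false) {
                    // Opening tag: start collecting a new row.
                    $xmlstr = '';
                } elseif(strpos($buffer, $this->endTag)!==false) {
                    // Closing tag: the row is complete, so convert it.
                    $this->phraseXmlStr($xmlstr);
                } else {
                    // Any other line is appended to the current row buffer.
                    $xmlstr .= $buffer;
                }
            }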
            fclose($fp);
        }
        echo "finish\n";
    }

    private function phraseXmlStr($xmlstr) {
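        $xml = $out = '';
        $tmp = array();    // must be an array: $tmp[] on a string is a fatal error since PHP 7.1
        // Re-wrap the collected lines in the row tags that run() stripped.
        $xmlstr    = $this->startTag.$xmlstr.$this->endTag;
        // The ss: prefix is never declared, so replace the <Data> wrapper with CDATA.
        $xmlstr    = str_replace('<Data ss:Type="String">', '<![CDATA[', $xmlstr);
        $xmlstr    = str_replace('</Data>', ']]>', $xmlstr);
        $xml    = new SimpleXMLElement($xmlstr);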
        foreach($xml->Cell as $v) {
            $tmp[]    = trim((string) $v);
        }
        $out    = '"'.join('","', $tmp).'"'."\n";
        file_put_contents('result.csv', $out, FILE_APPEND);
    }
}

$app = new XmlPhrase();
$app->run();
?>
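
The listing builds each CSV line by joining the cells with `","`, which does not escape a double quote that appears inside a cell. As a minimal sketch of one alternative, assuming such quotes can occur in your data, PHP's built-in `fputcsv` can do the quoting. The helper name `appendCsvRow` and the temporary file are hypothetical and not part of the original script. Note that `fputcsv` only quotes fields that need it, so the output is not byte-for-byte identical to the original format.

```php
<?php
// Hypothetical helper (not in the original script): append one row to a
// CSV file and let fputcsv() take care of quoting and escaping.
function appendCsvRow(string $path, array $cells): void
{
    $fh = fopen($path, 'a');
    if ($fh === false) {
        throw new RuntimeException("cannot open $path");
    }
    // Separator, enclosure and escape character are passed explicitly.
    fputcsv($fh, $cells, ',', '"', '\\');
    fclose($fh);
}

// Usage: write one row containing a quote and a comma to a temporary file.
$csvPath = tempnam(sys_get_temp_dir(), 'csv');
appendCsvRow($csvPath, array('a', 'say "hi"', 'x,y'));
echo file_get_contents($csvPath);
```

Opening and closing the file for every row mirrors the `file_put_contents(..., FILE_APPEND)` call in the listing; for large inputs, keeping one handle open for the whole run would likely be cheaper.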

The raw data looks something like this:

<Row>
<Cell><Data ss:Type="String">时间 </Data></Cell>
<Cell><Data ss:Type="String">IP 地址 </Data></Cell>
<Cell><Data ss:Type="String">您的个人邮箱: </Data></Cell>
<Cell><Data ss:Type="String">姓名: </Data></Cell>
<Cell><Data ss:Type="String">年龄: </Data></Cell>
<Cell><Data ss:Type="String">性别: </Data></Cell>
<Cell><Data ss:Type="String">月薪: </Data></Cell>
<Cell><Data ss:Type="String">通讯地址: </Data></Cell>
<Cell><Data ss:Type="String">申请款式: </Data></Cell>
<Cell><Data ss:Type="String">申请理由: </Data></Cell>
<Cell><Data ss:Type="String">联系电话: </Data></Cell>
<Cell><Data ss:Type="String">您的职业: </Data></Cell>
</Row>
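
To make the conversion concrete, here is a small sketch of what happens to a single row like the one above. The function name `rowToCells` is hypothetical; it only repeats the string replacement and `SimpleXMLElement` steps from the class in isolation. It assumes every cell uses exactly the `<Data ss:Type="String">` wrapper and that no cell text contains `]]>`.

```php
<?php
// Hypothetical helper (not in the original script): turn one <Row> block
// into an array of trimmed cell strings.
function rowToCells(string $row): array
{
    // The ss: prefix is undeclared in the fragment, which a strict XML
    // parser rejects, so the <Data> wrapper is swapped for a CDATA section.
    $row = str_replace('<Data ss:Type="String">', '<![CDATA[', $row);
    $row = str_replace('</Data>', ']]>', $row);

    $xml   = new SimpleXMLElement($row);
    $cells = array();
    foreach ($xml->Cell as $cell) {
        // Casting the element to string yields its text, CDATA included.
        $cells[] = trim((string) $cell);
    }
    return $cells;
}

// Usage with two cells taken from the sample data.
$row = "<Row>\n"
     . "<Cell><Data ss:Type=\"String\">时间 </Data></Cell>\n"
     . "<Cell><Data ss:Type=\"String\">IP 地址 </Data></Cell>\n"
     . "</Row>";
print_r(rowToCells($row));
```

A side effect of the CDATA wrapper is that characters such as `&` or `<` inside a cell are treated as plain text rather than markup.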