sep 22 2019, 5:04am feb 8, 2022
The DOMDocument class is fine for reading small XML files, but it loads the entire document into memory, so with large or huge XML files the code may stall without reporting any error at all. For large XML, you should use XMLReader instead to keep your server's memory usage low.
A large sample XML file was downloaded from this page of the Karaoke Version affiliation program; it generally has the following structure:
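The difference between the two classes can be sketched on a tiny in-memory document. This is only an illustration under stated assumptions: the sample string is made-up stand-in data, not the real catalog, and it is far too small to show any memory effect. It only shows that DOMDocument builds the whole tree before anything can be queried, while XMLReader visits one node at a time.

```php
<?php
// Made-up stand-in data (assumption); the real catalog is a large file on disk.
$xml = '<artists>'
     . '<artist id="1"><name>A</name></artist>'
     . '<artist id="2"><name>B</name></artist>'
     . '</artists>';

// DOMDocument: the entire tree is parsed into memory before we can count.
$dom = new DOMDocument();
$dom->loadXML($xml);
$domCount = $dom->getElementsByTagName('artist')->length;

// XMLReader: a forward-only cursor that advances one node per read() call.
$reader = new XMLReader();
$reader->XML($xml);
$readerCount = 0;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'artist') {
        $readerCount++;
    }
}
$reader->close();

echo "DOMDocument: $domCount, XMLReader: $readerCount\n";
```

Both counters end up with the same value; what differs is how much of the document has to be held in memory to get there.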
<artists>
  <artist id="2000">
    <name>The Solids</name>
    <name_sorted>Solids, The</name_sorted>
    <url>http://www.karaoke-version.com/mp3-backingtrack/the-solids/</url>
    <rank>5866</rank>
    <songs>
      <song id="5022">
        <name>Hey Beautiful</name>
        <url>http://www.karaoke-version.com/mp3-backingtrack/the-solids/hey-beautiful.html</url>
        <rank>24467</rank>
        <preview>http://www.karaoke-version.com/preview/57278/</preview>
        ...
If we need to save that data into our database, we can format each data row as one line containing the artist's name, the song's name and the link for previewing the audio. Example:
The Solids, Hey Beautiful, http://www.karaoke-version.com/preview/57278/ ...
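One thing to watch with such a row: some values in the catalog contain a comma themselves (for example `Solids, The`), so plain comma-joined lines can become ambiguous. A minimal sketch, assuming the rows are meant to be read back as CSV; `formatRow` is a hypothetical helper and not part of the original script.

```php
<?php
// Hypothetical helper (assumption): turn one (artist, song, preview) triple
// into a single CSV line, letting fputcsv handle the quoting.
function formatRow(string $artist, string $song, string $preview): string
{
    $buffer = fopen('php://memory', 'w+');
    // Separator, enclosure and escape are passed explicitly; fputcsv encloses
    // any field that contains the separator.
    fputcsv($buffer, [$artist, $song, $preview], ',', '"', '\\');
    rewind($buffer);
    $line = stream_get_contents($buffer);
    fclose($buffer);
    return rtrim($line, "\n");
}

$line = formatRow('Solids, The', 'Hey Beautiful', 'http://www.karaoke-version.com/preview/57278/');
echo $line, "\n";

// Reading the line back yields the original three fields, comma included.
$fields = str_getcsv($line, ',', '"', '\\');
```

When writing to a real file, the same `fputcsv` call can be pointed directly at the file handle instead of a memory buffer.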
<?php
$t = time();
$m = memory_get_usage();
const XML_FILENAME = 'karaokeversion_catalog_en_GBP.xml';

$liner = new XMLReader();
if($liner->open(XML_FILENAME)){
    $artistCount = 0; //number of artists
    $songCount = 0;   //number of songs of all artists
    while($liner->read()){
        if($liner->nodeType === XMLReader::ELEMENT && $liner->name === 'artist'){
            //expand the current artist element (and its subtree) into a DOM node
            $node = $liner->expand();
            //for each artist node found, assume unknown artist name, initialize it
            $artistName = '';
            //walk through this artist node's child nodes to find artist name and songs
            for($j = 0; $j < $node->childNodes->length; $j++){
                $nodeChild = $node->childNodes->item($j);
                if($nodeChild->nodeType === XML_ELEMENT_NODE && $nodeChild->nodeName === 'name')
                    $artistName = $nodeChild->nodeValue;
                elseif($nodeChild->nodeName === 'songs'){
                    //walk through this songs node's child nodes
                    for($k = 0; $k < $nodeChild->childNodes->length; $k++){
                        $nodeGrandChild = $nodeChild->childNodes->item($k);
                        if($nodeGrandChild->nodeType === XML_ELEMENT_NODE && $nodeGrandChild->nodeName === 'song'){
                            //for each song node found, assume unknown song details, initialize them
                            $songName = '';
                            $songPreview = '';
                            //walk through this song node's child nodes
                            for($l = 0; $l < $nodeGrandChild->childNodes->length; $l++){
                                $nodeGrandGrandChild = $nodeGrandChild->childNodes->item($l);
                                if($nodeGrandGrandChild->nodeType === XML_ELEMENT_NODE){
                                    if($nodeGrandGrandChild->nodeName === 'name')
                                        $songName = $nodeGrandGrandChild->nodeValue;
                                    elseif($nodeGrandGrandChild->nodeName === 'preview')
                                        $songPreview = $nodeGrandGrandChild->nodeValue;
                                }
                            }
                            //validate first, then format a new entry line to be stored somewhere
                            if(!empty($artistName) && !empty($songName) && filter_var($songPreview, FILTER_VALIDATE_URL)){
                                echo "$artistName, $songName, $songPreview\n";
                                $songCount++;
                            }
                        }
                    }
                }
            }
            $artistCount++;
        }
    }//end while
    $liner->close();

    //report
    $t = time() - $t;
    $m = memory_get_usage() - $m;
    echo "\ntime spent: $t seconds.",
         "\nmemory usage: $m bytes.",
         "\nartist count: $artistCount.",
         "\nsong count: $songCount.";
}
else{ //if $liner->open fails
    echo 'error: can not open xml file.';
}
?>
Click on the following link to test: huge-xml-read (on a sister site).
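The nested loops in the script can also be written more compactly by handing each expanded artist node to SimpleXML. The following is a sketch of that alternative, not the method used above. It runs on a small inline sample (stand-in data that mirrors the structure shown earlier) and collects rows in a `$rows` array purely for demonstration; with the real catalog, each row should be written out immediately rather than accumulated.

```php
<?php
// Inline stand-in sample (assumption); the real script would open the catalog file.
$xml = <<<XML
<artists>
  <artist id="2000">
    <name>The Solids</name>
    <songs>
      <song id="5022">
        <name>Hey Beautiful</name>
        <preview>http://www.karaoke-version.com/preview/57278/</preview>
      </song>
    </songs>
  </artist>
</artists>
XML;

$reader = new XMLReader();
$reader->XML($xml);          // for a file: $reader->open($filename)
$doc  = new DOMDocument();   // owner document needed to import expanded nodes
$rows = [];                  // demo only; do not accumulate rows for a huge file

// Advance the cursor to the first <artist> element.
while ($reader->read() && $reader->name !== 'artist') {
    // keep reading
}

// Handle one <artist> subtree at a time.
while ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'artist') {
    $artist = simplexml_import_dom($doc->importNode($reader->expand(), true));
    foreach ($artist->songs->song as $song) {
        $rows[] = [(string) $artist->name, (string) $song->name, (string) $song->preview];
    }
    // Jump to the next <artist>, skipping the subtree that was just processed.
    $reader->next('artist');
}
$reader->close();

foreach ($rows as $row) {
    echo implode(', ', $row), "\n";
}
```

Because `next('artist')` skips the subtree that was just expanded, the reader does not walk through every song node a second time, which the `read()`-only loop does.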