Reading huge XML with XMLReader

sep 22 2019, 5:04am

The DOMDocument class works well for small XML files, but it loads the whole document into memory, so with large or huge XML files the script may stall and give you no error at all. For large XML, use XMLReader instead: it reads the file node by node and keeps your server's memory usage low.

XML source

A huge sample XML file was downloaded from this page of the Karaoke Version affiliation program. It generally has the following structure:

	<artist id="2000">
		<name>The Solids</name>
		<name_sorted>Solids, The</name_sorted>
		<songs>
			<song id="5022">
				<name>Hey Beautiful</name>
				<preview>...</preview>
			</song>
		</songs>
	</artist>

Suppose we need to save this data into our database. We can then format one data row (line) per song, made of the artist's name, the song's name and the link for previewing the audio. Example:

The Solids, Hey Beautiful,
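
Names in this catalog can contain commas themselves (the sample above has "Solids, The" as the sorted name), so joining the fields with a plain comma may give ambiguous lines. One possible way around this, shown only as a sketch, is to let PHP's fputcsv() do the quoting. The helper name formatRow() and the example.com URL are made up for illustration; they are not part of the catalog.

```php
<?php
//hypothetical helper: turn one (artist, song, preview) triple into a CSV line
function formatRow(string $artistName, string $songName, string $songPreview): string {
	$buffer = fopen('php://memory', 'w+');
	//delimiter, enclosure and escape are passed explicitly instead of relying on defaults
	fputcsv($buffer, [$artistName, $songName, $songPreview], ',', '"', '\\');
	rewind($buffer);
	$line = stream_get_contents($buffer);
	fclose($buffer);
	return $line;
}

//the URL below is a placeholder, not a real preview link
$line = formatRow('Solids, The', 'Hey Beautiful', 'http://example.com/preview.mp3');
echo $line;
```

A line written this way can be split back into its three fields with str_getcsv(), even when a name contains a comma.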

PHP script

	$t = time();
	$m = memory_get_usage();

	const XML_FILENAME = 'karaokeversion_catalog_en_GBP.xml';

	$liner = new XMLReader();
	if(!$liner->open(XML_FILENAME))
		exit('cannot open ' . XML_FILENAME . "\n");

	$artistCount = 0;	//number of artists
	$songCount = 0;		//number of songs (all artists)

	//read the file node by node, stopping at each artist element
	while($liner->read()){
		if($liner->nodeType === XMLReader::ELEMENT && $liner->name === 'artist'){
			$artistCount++;

			//expand the current element (with its subtree) into a DOM node
			$node = $liner->expand();

			//for each artist node found, assume unknown artist name, initialize it
			$artistName = '';

			//walk through this artist node's child nodes to find artist name and songs
			for($j = 0; $j < $node->childNodes->length; $j++){
				$nodeChild = $node->childNodes->item($j);
				if($nodeChild->nodeType === XML_ELEMENT_NODE && $nodeChild->nodeName === 'name')
					$artistName = $nodeChild->nodeValue;
				elseif($nodeChild->nodeName === 'songs'){
					//walk through this songs node's child nodes
					for($k = 0; $k < $nodeChild->childNodes->length; $k++){
						$nodeGrandChild = $nodeChild->childNodes->item($k);
						if($nodeGrandChild->nodeType === XML_ELEMENT_NODE && $nodeGrandChild->nodeName === 'song'){
							$songCount++;

							//for each song node found, assume unknown song details, initialize them
							$songName = '';
							$songPreview = '';
							//walk through this song node's child nodes
							for($l = 0; $l < $nodeGrandChild->childNodes->length; $l++){
								$nodeGrandGrandChild = $nodeGrandChild->childNodes->item($l);
								if($nodeGrandGrandChild->nodeType === XML_ELEMENT_NODE){
									if($nodeGrandGrandChild->nodeName === 'name')
										$songName = $nodeGrandGrandChild->nodeValue;
									elseif($nodeGrandGrandChild->nodeName === 'preview')
										$songPreview = $nodeGrandGrandChild->nodeValue;
								}
							}
							//validate first, then format a new entry line to be stored somewhere
							if(!empty($artistName) && !empty($songName) && filter_var($songPreview, FILTER_VALIDATE_URL)){
								echo "$artistName, $songName, $songPreview\n";
							}
						}
					}
				}
			}
		}
	}//end while

	$liner->close();


	$t = time() - $t;
	$m = memory_get_usage() - $m;

	echo "time spent: $t seconds.\n",
	"memory usage: $m bytes.\n",
	"artist count: $artistCount.\n",
	"song count: $songCount.\n";

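The nested loops in the script can be shortened by handing each expanded artist node to SimpleXML. What follows is a minimal sketch of that alternative, not a replacement for the script above. It assumes the same element names the script looks for (artist, name, songs, song, preview). The function name extractRows(), the inline sample and the example.com URL are hypothetical.

```php
<?php
//hypothetical helper: stream artists with XMLReader, read each one with SimpleXML
function extractRows(XMLReader $liner): array {
	$rows = [];
	$doc = new DOMDocument();	//owner document used to import the expanded nodes
	while($liner->read()){
		if($liner->nodeType !== XMLReader::ELEMENT || $liner->name !== 'artist')
			continue;
		//copy the current artist subtree into $doc, then wrap it in SimpleXML
		$artist = simplexml_import_dom($doc->importNode($liner->expand(), true));
		if(!isset($artist->songs))
			continue;	//artist without a songs element
		$artistName = (string)$artist->name;
		foreach($artist->songs->song as $song){
			$songName = (string)$song->name;
			$songPreview = (string)$song->preview;
			//same validation as in the script: all three fields must be usable
			if($artistName !== '' && $songName !== '' && filter_var($songPreview, FILTER_VALIDATE_URL))
				$rows[] = [$artistName, $songName, $songPreview];
		}
	}
	return $rows;
}

//made-up sample in the assumed catalog layout; the URL is a placeholder
$xml = '<catalog><artist id="2000"><name>The Solids</name><songs>'
	. '<song id="5022"><name>Hey Beautiful</name><preview>http://example.com/p.mp3</preview></song>'
	. '<song id="5023"><name>No Preview</name></song>'
	. '</songs></artist></catalog>';

$liner = new XMLReader();
$liner->XML($xml);	//for the real catalog, $liner->open(XML_FILENAME) would be used instead
$rows = extractRows($liner);
$liner->close();

foreach($rows as $row)
	echo implode(', ', $row), "\n";
```

Only one artist subtree is copied into the DOM at a time, so memory use should stay close to that of the loop-based version, though this has not been measured here.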

Click the following link to test: xmlreader (on a sister site).