Reading huge XML with XMLReader

sep 22 2019, 5:04am feb 8, 2022

The DOMDocument class is good for reading small XML files, but it loads the whole document into memory, so with a large or huge XML file the script may stall and give you no error at all. For large XML, you should use XMLReader instead: it reads the file node by node and keeps your server's memory usage low.
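
To make the streaming idea concrete, here is a minimal sketch that counts elements in a tiny inline document. The helper name countElements and the sample XML are made up for illustration, and XMLReader::XML() is used only so the example needs no file on disk; a real catalog would go through open() instead.

```php
<?php
// Hypothetical helper: count elements with a given name by streaming the XML.
function countElements(string $xml, string $name): int
{
    $reader = new XMLReader();
    // XML() reads from a string; open() would be used for a file on disk.
    if (!$reader->XML($xml)) {
        return 0;
    }
    $count = 0;
    // read() advances one node at a time instead of building the whole tree.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $name) {
            $count++;
        }
    }
    $reader->close();
    return $count;
}

// Made-up sample data, not taken from the real catalog.
$sample = '<artists><artist id="1"><name>A</name></artist>'
    . '<artist id="2"><name>B</name></artist></artists>';
echo countElements($sample, 'artist'), "\n"; // prints 2
```

The loop never holds more than the current node, which is why memory usage stays roughly flat no matter how large the file is.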

XML source

A huge example XML file was downloaded from this page of the Karaoke Version affiliate program, and it generally has the following structure:

	<artist id="2000">
		<name>The Solids</name>
		<name_sorted>Solids, The</name_sorted>
		<songs>
			<song id="5022">
				<name>Hey Beautiful</name>
				<preview>...</preview>
			</song>
		</songs>
	</artist>

If we need to save this into a database, we can format each song as one data row (line): the artist's name, the song's name and the link for previewing the audio. Example:

The Solids, Hey Beautiful,
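
One thing to watch when formatting such a row: some values in the catalog contain commas themselves (for example the sorted name "Solids, The"), so joining fields with a plain comma can become ambiguous if the line is parsed again later. A possible way around this, sketched below, is to let fputcsv() do the quoting. The helper name formatRow is hypothetical and the URL is only a placeholder, not a real preview link.

```php
<?php
// Hypothetical helper: build one CSV line from the three fields.
function formatRow(string $artist, string $song, string $preview): string
{
    $buffer = fopen('php://memory', 'w+');
    // fputcsv() encloses fields that contain the delimiter (and some other
    // special characters) in quotes, so they survive a later parse.
    fputcsv($buffer, [$artist, $song, $preview], ',', '"', '\\');
    rewind($buffer);
    $line = stream_get_contents($buffer);
    fclose($buffer);
    return rtrim($line, "\n");
}

// Placeholder URL, used for illustration only.
$line = formatRow('Solids, The', 'Hey Beautiful', 'https://example.com/preview.mp3');
echo $line, "\n";

// Reading the line back yields the original three fields.
print_r(str_getcsv($line, ',', '"', '\\'));
```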

PHP script

	$t = time();
	$m = memory_get_usage();

	const XML_FILENAME = 'karaokeversion_catalog_en_GBP.xml';
	$liner = new XMLReader();
	if($liner->open(XML_FILENAME)){
		$artistCount = 0;	//number of artists
		$songCount = 0;		//number of songs of all artists

		//read the file node by node instead of loading it all at once
		while($liner->read()){
			if($liner->nodeType === XMLReader::ELEMENT && $liner->name === 'artist'){
				$artistCount++;
				//convert the current artist element (with its subtree) into a DOM node
				$node = $liner->expand();
				if($node === false)
					continue;	//skip an artist that can not be expanded
				//for each artist node found, assume unknown artist name, initialize it
				$artistName = '';
				//walk through this artist node's child nodes to find artist name and songs
				for($j = 0; $j < $node->childNodes->length; $j++){
					$nodeChild = $node->childNodes->item($j);
					if($nodeChild->nodeType === XML_ELEMENT_NODE && $nodeChild->nodeName === 'name'){
						$artistName = $nodeChild->nodeValue;
					}
					elseif($nodeChild->nodeName === 'songs'){
						//walk through this songs node's child nodes
						for($k = 0; $k < $nodeChild->childNodes->length; $k++){
							$nodeGrandChild = $nodeChild->childNodes->item($k);
							if($nodeGrandChild->nodeType === XML_ELEMENT_NODE && $nodeGrandChild->nodeName === 'song'){
								$songCount++;	//count every song node, valid or not
								//for each song node found, assume unknown song details, initialize them
								$songName = '';
								$songPreview = '';
								//walk through this song node's child nodes
								for($l = 0; $l < $nodeGrandChild->childNodes->length; $l++){
									$nodeGrandGrandChild = $nodeGrandChild->childNodes->item($l);
									if($nodeGrandGrandChild->nodeType === XML_ELEMENT_NODE){
										if($nodeGrandGrandChild->nodeName === 'name')
											$songName = $nodeGrandGrandChild->nodeValue;
										elseif($nodeGrandGrandChild->nodeName === 'preview')
											$songPreview = $nodeGrandGrandChild->nodeValue;
									}
								}//end for song children
								//add validation first here then format a new entry line to be stored somewhere
								if(!empty($artistName) && !empty($songName) && filter_var($songPreview, FILTER_VALIDATE_URL)){
									echo "$artistName, $songName, $songPreview\n";
								}
							}//end if song
						}//end for songs children
					}//end elseif songs
				}//end for artist children
			}//end if artist
		}//end while
		$liner->close();

		$t = time() - $t;
		$m = memory_get_usage() - $m;
		echo "\ntime spent: $t seconds.",
			"\nmemory usage: $m bytes.",
			"\nartist count: $artistCount.",
			"\nsong count: $songCount.";
	}
	else{ //if $liner->open fails
		echo 'error: can not open xml file.';
	}
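
The nested loops can be flattened somewhat. The following is an alternative sketch, not the script used for the test page: it uses next('artist') to jump from one artist to its next sibling and getElementsByTagName() to find the songs. The helper names childText and extractRows, the artists root element and the preview URL are assumptions made for this example.

```php
<?php
// Hypothetical helper: text of the first direct child element called $name.
function childText(DOMElement $parent, string $name): string
{
    foreach ($parent->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === $name) {
            return trim($child->nodeValue);
        }
    }
    return '';
}

// Hypothetical helper: collect [artist, song, preview] rows from an XML string.
function extractRows(string $xml): array
{
    $rows = [];
    $reader = new XMLReader();
    if (!$reader->XML($xml)) {
        return $rows;
    }
    $doc = new DOMDocument();
    // Move forward to the first <artist> element, if there is one.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'artist') {
            $found = true;
            break;
        }
    }
    while ($found) {
        $node = $reader->expand();
        if ($node === false) {
            break;
        }
        // Copy the current artist subtree into a small standalone DOM node.
        $artist = $doc->importNode($node, true);
        $artistName = childText($artist, 'name');
        foreach ($artist->getElementsByTagName('song') as $song) {
            $songName = childText($song, 'name');
            $preview = childText($song, 'preview');
            if ($artistName !== '' && $songName !== '' && filter_var($preview, FILTER_VALIDATE_URL)) {
                $rows[] = [$artistName, $songName, $preview];
            }
        }
        // Skip the rest of this subtree and look for the next <artist>.
        $found = $reader->next('artist');
    }
    $reader->close();
    return $rows;
}

// Made-up sample with a placeholder preview URL.
$sample = <<<XML
<artists>
  <artist id="2000">
    <name>The Solids</name>
    <songs>
      <song id="5022">
        <name>Hey Beautiful</name>
        <preview>https://example.com/preview.mp3</preview>
      </song>
    </songs>
  </artist>
</artists>
XML;

foreach (extractRows($sample) as $row) {
    echo implode(', ', $row), "\n";
}
```

Only one artist subtree is expanded at a time, so memory usage should still be bounded by the largest artist entry rather than by the size of the file.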


Click on the following link to test: huge-xml-read (on a sister site).