We recently started working with a client who uses MRI’s Market Connect (powered by VaultWare) to manage the real-time pricing and availability feeds for their websites. It gives them an easy, widely accessible repository of their property information that can be used anywhere on the web. The only small, teensy-weensy caveat is that the system returns this information to you as XML.
If you’re a newer developer, chances are you haven’t worked much with XML, and to be honest, it can be quite tricky to parse and manipulate. Luckily, today I’m going to show you how to pull in your VaultWare XML file and parse its contents with SimpleXML (PHP’s libxml-based XML extension), all in the beautiful language of PHP.
In summary, these are the steps that we will take:
– Open the file (via a file path or an FTP URL)
– Fetch the entire contents of the XML file
– Parse the XML into an easy-to-use, array-like structure
Where to start?
Well, first things first, we have to get your XML file somewhere that’s easily accessible: either on a remote server reachable over FTP, or locally on your own server.
I personally recommend setting up a small remote server for Market Connect/VaultWare to drop the file into. You don’t need any fancy server or firewall configuration, or even a lot of space. The only thing you need to do is set up permissions for two users: one for Market Connect/VaultWare and one for you. (You’ll want to restrict the VaultWare user to its own directory and nowhere else.)
Since this is my personal recommendation, the rest of this post assumes that you have set that up.
We’re going to be a little more elegant with our solution and create a dedicated class that accepts the file path or FTP URL as a string, then returns our parsed properties. We’ll call it VaultWareImporter:
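Here’s a minimal sketch of how such a class could start out. Only the constructor is shown, and the $filePath property name is my own assumption for illustration:

```php
<?php

/**
 * Imports a VaultWare XML feed from a local path or an ftp:// URL.
 * Sketch only: the property name is an assumption.
 */
class VaultWareImporter
{
    /** @var string Local file path or ftp:// URL of the XML feed. */
    private $filePath;

    /**
     * The "string" scalar type declaration requires PHP 7.0 or later;
     * remove it on older versions.
     */
    public function __construct(string $filePath)
    {
        $this->filePath = $filePath;
    }
}
```

Passing something that can’t be treated as a string, such as an array, makes PHP 7 throw a TypeError before the constructor body runs.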
There we go! Now we’re on our way to having our importer, and we’re sticking with best practices. One thing to note here is that I’m “type hinting” the $filePath parameter as a string. Scalar type declarations like this are new in PHP 7, so if you’re on an earlier version, just omit it. It’s the only PHP 7 feature I’ll use in this example, so don’t worry about anything else going forward.
The $filePath value will be a string containing either:
– The path of the XML file on your own server (/home/user/<file name>.xml)
– The FTP URL of the XML file on your remote server (ftp://<server user>:<server password>@<server IP>/path/to/xml/file.xml)
For sensitive information, such as the FTP server username and password, you should consider putting these values into environment variables. I won’t go into detail on how to do that in this post, but if you are using Apache and want to set them in your vhost file, use the SetEnv directive, like so:
SetEnv <variable name> "<variable value>"
It’s important to note that doing it this way will make the variable appear in $_SERVER as opposed to $_ENV.
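As a minimal sketch, here’s how you might assemble the FTP URL from those values in PHP. The VW_FTP_* variable names are hypothetical, so use whatever names you defined with SetEnv. The credentials go through rawurlencode() so that characters like @ or : in a password don’t break the URL:

```php
<?php

/**
 * Builds the ftp:// URL for the feed from values set with Apache's SetEnv,
 * which show up in $_SERVER. The VW_FTP_* names are hypothetical.
 *
 * @param array $server Usually $_SERVER.
 * @return string
 */
function buildFtpUrl(array $server)
{
    foreach (['VW_FTP_USER', 'VW_FTP_PASS', 'VW_FTP_HOST', 'VW_FTP_PATH'] as $key) {
        if (!isset($server[$key]) || $server[$key] === '') {
            throw new RuntimeException("Missing server variable: {$key}");
        }
    }

    // Percent-encode the credentials so characters such as '@' or ':'
    // in a password don't break the URL structure.
    return sprintf(
        'ftp://%s:%s@%s/%s',
        rawurlencode($server['VW_FTP_USER']),
        rawurlencode($server['VW_FTP_PASS']),
        $server['VW_FTP_HOST'],
        ltrim($server['VW_FTP_PATH'], '/')
    );
}
```

You could then pass buildFtpUrl($_SERVER) straight into the importer’s constructor, keeping the credentials out of your code entirely.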
Now that we have our class built, we can move on to opening our file for retrieval and manipulation. Before we get into the meat of the code, let’s write a handle() method that performs all the steps for us. In doing so, we’ll create a comprehensible set of methods that read like documentation:
Ok, so, pretty easy to read, right? We can see that we’re going to open the file, get the contents of said file, and then convert the results into a format we can work with! Chainable methods like this can drastically improve readability, although some people argue that they cause confusion if they aren’t named well. I agree with that, but when it’s only a handful of steps, I don’t see why we can’t make things a bit more pleasant for the next dev.
The first step we’re going to work with here is…
Opening the file
We have to open the file at our provided path before we can work with it, so we’ll use an age-old PHP function called fopen():
So, by definition, fopen() binds a named resource, specified by filename, to a stream. Our second parameter, ‘rb’, essentially means “open the file read-only and set the pointer to the beginning (r), but force binary mode and don’t translate the data (b)”. I put this into a try/catch so that we’re notified of any errors opening the stream. Be aware, though, that fopen() doesn’t throw an exception on its own: when it fails, it emits a warning and returns false, so the return value has to be checked and an exception thrown manually for the catch block to ever run. In reality you might return a response or throw some other exception when an error is caught, but in this example we will simply kill the application and display the error message.
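Here’s a standalone sketch of this step written as a plain function, showing the return-value check that makes the try/catch meaningful. The openStream() name is hypothetical:

```php
<?php

/**
 * Opens $path read-only in binary mode ('rb') and returns the stream.
 * Hypothetical helper showing the error handling for this step.
 *
 * @param string $path Local path or ftp:// URL.
 * @return resource
 */
function openStream(string $path)
{
    // fopen() emits an E_WARNING and returns false on failure; '@' silences
    // the warning so the exception below is the only error report.
    $handle = @fopen($path, 'rb');

    if ($handle === false) {
        $error = error_get_last();
        $reason = isset($error['message']) ? $error['message'] : 'unknown error';
        throw new RuntimeException("Could not open {$path}: {$reason}");
    }

    return $handle;
}
```

A caller can then wrap it as try { $handle = openStream($filePath); } catch (RuntimeException $e) { die($e->getMessage()); }, which gives the kill-and-display behavior described above.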
It’s important to note that we return $this (the current object) rather than a value; this is what allows our methods to be chained.
Once we have our file bound to a stream, we need to…
Get the contents of the file stream
Now that our file stream is open, we can grab the contents of the entire file using stream_get_contents():
Here we can see how chaining can be useful. To get the contents of the stream, we have to pass in the file handle that we opened in the previous method. We can access it easily because it was stored as a class property. Properties like this are normally accompanied by the appropriate getter and setter methods, but for this example we won’t worry about those.
At this point, PHP reads the stream from its current position (the beginning, since we just opened it) to the end and returns the contents as a string. After that, we close the stream with fclose() because we aren’t working with it anymore.
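As a standalone sketch of this step (readAndClose() is a hypothetical helper name), the finally block makes sure the stream gets closed even if reading fails, and the explicit false check is needed because stream_get_contents() reports failure through its return value rather than by throwing:

```php
<?php

/**
 * Reads a stream from its current position to EOF, then closes it.
 * Hypothetical helper mirroring the "get contents" step.
 *
 * @param resource $handle Stream returned by fopen().
 * @return string
 */
function readAndClose($handle)
{
    try {
        // Returns the remaining data as a string, or false on failure.
        $contents = stream_get_contents($handle);
    } finally {
        // Close the stream whether or not the read succeeded.
        fclose($handle);
    }

    if ($contents === false) {
        throw new RuntimeException('Could not read the file stream.');
    }

    return $contents;
}
```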
Again, we make sure we do some error handling (stream_get_contents() returns false on failure) and return the class instance.
Next on our list is the fun part…
Converting the stream content into an array
If we’ve successfully retrieved the stream contents, then we can use the simplexml_load_string() function to parse them into a SimpleXMLElement object, which we can traverse much like an array!
We can see that simplexml_load_string() accepts a few parameters that might not make sense right away. Let me explain:
The first parameter is simply the data you want to convert; in other words, a well-formed XML string.
The second parameter (which is optional) may look arbitrary at first, but it’s actually the name of the class that simplexml_load_string() should return. SimpleXMLElement is the PHP class that represents the parsed XML, and it’s the default. I won’t go over it here, but you can extend this class and pass the name of your subclass to customize the returned object. For now, we’re going to use the default.
The third parameter, options, has been available since PHP 5.1.0 and Libxml 2.6.0, and it accepts a combination of Libxml constants that alter how the document is parsed and what the returned object contains. I use LIBXML_NOCDATA to merge CDATA sections as text nodes, and LIBXML_NOBLANKS to remove blank nodes, meaning whitespace-only text nodes such as the indentation between elements (empty elements themselves are still kept).
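Here’s a sketch of this step as a standalone function, assuming you’d rather surface parse errors as exceptions. The parseXml() name is hypothetical; libxml_use_internal_errors() collects libxml’s errors so the first one can go into the exception message instead of being printed as PHP warnings:

```php
<?php

/**
 * Parses an XML string into a SimpleXMLElement, throwing on malformed XML.
 * Hypothetical helper; the options mirror the ones discussed above.
 *
 * @return SimpleXMLElement
 */
function parseXml(string $xml)
{
    // Collect libxml errors instead of letting them surface as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $result = simplexml_load_string(
        $xml,
        'SimpleXMLElement',               // class of the returned object (the default)
        LIBXML_NOCDATA | LIBXML_NOBLANKS  // merge CDATA into text, drop blank nodes
    );

    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($result === false) {
        $reason = $errors ? trim($errors[0]->message) : 'unknown error';
        throw new RuntimeException("Invalid XML: {$reason}");
    }

    return $result;
}
```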
After checking that parsing succeeded (simplexml_load_string() returns false for malformed XML instead of throwing an exception), we return the results!
Here is a small snippet showing how this class may be used:
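Here’s a minimal, self-contained sketch that assembles the steps into one class and runs it against a small made-up feed. The method names (openFile(), getContents(), convertToObject()) and the <Properties>/<Property>/<Name> structure are assumptions for illustration; a real VaultWare export will have its own element names:

```php
<?php

/**
 * Minimal end-to-end sketch of the importer. Method names are assumptions.
 */
class VaultWareImporter
{
    /** @var string Local file path or ftp:// URL. */
    private $filePath;

    /** @var resource|null Stream opened by openFile(). */
    private $fileHandle;

    /** @var string Raw XML read by getContents(). */
    private $contents = '';

    public function __construct(string $filePath)
    {
        $this->filePath = $filePath;
    }

    /**
     * Runs every step in order.
     *
     * @return SimpleXMLElement
     */
    public function handle()
    {
        return $this->openFile()->getContents()->convertToObject();
    }

    private function openFile()
    {
        $handle = @fopen($this->filePath, 'rb');
        if ($handle === false) {
            throw new RuntimeException("Could not open {$this->filePath}");
        }
        $this->fileHandle = $handle;

        return $this; // returning $this is what lets handle() chain the calls
    }

    private function getContents()
    {
        $contents = stream_get_contents($this->fileHandle);
        fclose($this->fileHandle);
        if ($contents === false) {
            throw new RuntimeException('Could not read the file stream.');
        }
        $this->contents = $contents;

        return $this;
    }

    private function convertToObject()
    {
        $previous = libxml_use_internal_errors(true);
        $xml = simplexml_load_string($this->contents, 'SimpleXMLElement', LIBXML_NOCDATA | LIBXML_NOBLANKS);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        if ($xml === false) {
            throw new RuntimeException('The file does not contain well-formed XML.');
        }

        return $xml;
    }
}

// Usage: write a made-up feed to a temporary file, then import it.
$path = tempnam(sys_get_temp_dir(), 'vw');
file_put_contents($path, '<Properties><Property><Name>Oak Court</Name></Property></Properties>');

try {
    $properties = (new VaultWareImporter($path))->handle();
} catch (RuntimeException $e) {
    die($e->getMessage());
}
unlink($path);

echo $properties->Property->Name, PHP_EOL; // prints: Oak Court
```

In practice you’d pass your local path or ftp:// URL instead of the temporary file.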
As we can see, we now have an easy-to-read, easy-to-use class that helps us parse our VaultWare XML files!
What we get back in the end is a SimpleXMLElement object containing all the information about our properties, plus any meta information that came along with it.
If you take a look at the last line in the screenshot above, I use a method available on SimpleXMLElement called xpath(). This method searches the XML object for nodes matching an XPath expression and returns them as an array of SimpleXMLElement objects. You can read more about how that works in the PHP docs (http://php.net/manual/en/simplexmlelement.xpath.php).
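To give a feel for xpath(), here’s a small sketch against a made-up feed. The element names are hypothetical, not VaultWare’s actual schema:

```php
<?php

// A made-up feed; VaultWare's real element names will differ.
$xml = simplexml_load_string(
    '<Properties>'
    . '<Property><Name>Oak Court</Name><City>Austin</City></Property>'
    . '<Property><Name>Pine Villas</Name><City>Dallas</City></Property>'
    . '</Properties>'
);

// xpath() returns an array of SimpleXMLElement objects, one per matching node.
$names = $xml->xpath('/Properties/Property/Name');
foreach ($names as $name) {
    echo (string) $name, PHP_EOL; // prints "Oak Court", then "Pine Villas"
}

// A predicate filters on a child's value: the names of properties in Dallas.
$dallasNames = $xml->xpath("//Property[City='Dallas']/Name");
```

Predicates like [City='Dallas'] are where xpath() really pays off, since they replace a manual loop-and-compare over every property.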
Congrats! We’re finished
So, to recap what we’ve learned above: we open the file with fopen(), get the contents of the file stream using stream_get_contents(), then convert the stream content into a SimpleXMLElement object and return the results.
We also learned that we can access the data by using a neat method called xpath(). In my opinion, using this makes data retrieval and parsing on the object a heck of a lot easier and, in some cases, cleaner.
If you have any more questions about how this works, or have any ideas on what else could be added to this helper class, feel free to start the conversation in the comments below!
Cheers.