Parsing XML with VMware Orchestrator

For my first blog post I thought I would start with something easy: parsing XML using VMware Orchestrator (a.k.a. vCO)! I started playing with vCO in September 2012 for a "Cloud" project, so I still consider myself a newbie. If you happen across this post and find something incorrect, or something that could be done better, please don't hesitate to speak up.

Since I can't post our actual XML, I'll be using the following XML which will give the gist of how to parse for elements & attributes.

<?xml version="1.0" encoding="UTF-8" ?>
<people>
  <person firstname="Jack" lastname="Smith" age="40">
    <phone type="home" number="1234567890" />
    <phone type="cell" number="1234567891" />
    <sport name="basketball" position="shooting guard" />
  </person>
  <person firstname="Jill" lastname="Smith" age="39">
    <phone type="home" number="1234567890" />
    <phone type="cell" number="1234567892" />
  </person>
</people>

My initial attempt used the XMLManager API class that is part of vCO. This parser was pretty simple but lacking in error handling, as you can see:

var errorCode = "success";
var document = XMLManager.fromString(XMLString);
if (!document) {
  errorCode = "Invalid XML Document";
  throw "Invalid XML document";
}

// make sure we have at least one <person> element
// (search for "person", not the root "people", so the loop below visits each person)
var peopleElementList = document.getElementsByTagName("person");
var numOfPeople = peopleElementList.length;
System.log("numOfPeople : "+ numOfPeople);
if (numOfPeople == 0) {
  errorCode = "Invalid XML Document - person element missing";
  throw "Invalid XML document";
}

// loop through the people
for (var i = 0; i < numOfPeople; i++) {

  // get the person
  var person = peopleElementList.item(i);

  //get the attributes of the person element
  var personAttributes = person.attributes;

  // get each of the attributes on the element
  var firstname = person.getAttribute("firstname");
  var lastname = person.getAttribute("lastname");
  var age = person.getAttribute("age");

  // get the <phone> element and attributes
  var phoneElementList = person.getElementsByTagName("phone");
  for (var j = 0; j < phoneElementList.length; j++) {
    var phone = phoneElementList.item(j);
    var phoneType = phone.getAttribute("type");
    var phoneNumber = phone.getAttribute("number");
    System.log("phone type : "+ phoneType);
    System.log("phone number : "+ phoneNumber);
  }

  // get the <sport> element and attributes
  var sportElementList = person.getElementsByTagName("sport");
  for (var j = 0; j < sportElementList.length; j++) {
    var sport = sportElementList.item(j);
    var sportName = sport.getAttribute("name");
    var sportPosition = sport.getAttribute("position");
    System.log("sport name : "+ sportName);
    System.log("sport position : "+ sportPosition);
  }


} //end the loop
System.log("XML Parsing Completed");

This worked fine until we tied into a 3rd-party application that generates namespaced XML (http://www.w3schools.com/xml/xml_namespaces.asp), which turned the XML into something like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:people majorVersion="1" minorVersion="0" xmlns:ns2="http://xmlns.local.com/shared/people" xmlns:ns3="http://xmlns.local.com/common/address">
  <ns2:person firstname="Jack" lastname="Smith" age="40">

    <ns2:phone type="home" number="1234567890" />
    <ns2:phone type="cell" number="1234567891" />
    <ns2:sport name="basketball" position="shooting guard" />
  </ns2:person>
  <ns2:person firstname="Jill" lastname="Smith" age="39">
    <ns2:phone type="home" number="1234567890" />
    <ns2:phone type="cell" number="1234567892" />
  </ns2:person>
</ns2:people>

My first thought was that the namespace prefix is ns2:, so I could just prepend that to each element name I'm searching for. While that would work for this particular case, we are dealing with multiple namespaces and there's no guarantee that the prefixes will always be the same - remember, these messages are generated by a 3rd-party app that we have little control over. Back to square one.

I remembered reading about JavaScript's E4X (http://wso2.org/project/mashup/0.2/docs/e4xquickstart.html) when I was first looking at parsing XML, so it was time to see what it is all about. The big question - does vCO's JavaScript engine support E4X? The answer is YES!

The key here is the .*:: as part of the inline query. This will get the element regardless of the namespace prefix. For our case the element names are unique even without the namespaces, so I was safe to go with .*::. If this isn't the case for you, then check out http://communities.vmware.com/thread/391844 on the VMware community site for some additional information on dealing with namespaces. Now for the updated code:

// set the errorCode which is used if we throw an exception
var errorCode = "Invalid XML submitted";

// ass-u-me the XML is correct
var isValidXML = true;

var document = new XML(XMLString);
if (!document) {
  errorCode = "Invalid XML Document";
  throw "Invalid XML document";
}

// make sure we have at least one <person> element
var numOfPeople = document.*::person.length();
System.log("numOfPeople : "+ numOfPeople);
if (numOfPeople == 0) {
  System.error("Invalid XML - no people provided");
  errorCode = "Invalid XML - no people provided";
  throw "Invalid XML";
}

// parse the <person> element
for (var i=0; i<numOfPeople ; i++) {
  var person = document.*::person[i];

  // populate the local variables for the attributes
  // these are required so make sure they exist
  // if they don't, we log what's missing

  // firstname attribute
  if (person.hasOwnProperty('@firstname')) {
    var firstname= person.@firstname;
    System.log("firstname : "+ firstname);
  } else {
    System.error("no firstname attribute found");
    errorCode += "\n firstname attribute not found on <person> element number "+ (i+1);
    isValidXML = false;
  }

  // lastname attribute
  if (person.hasOwnProperty('@lastname')) {
    var lastname = person.@lastname;
    System.log("lastname : "+ lastname);
  } else {
    System.error("no lastname attribute found");
    errorCode += "\n lastname attribute not found on <person> element number "+ (i+1);
    isValidXML = false;
  }

  // age attribute
  if (person.hasOwnProperty('@age')) {
    var age = person.@age;
    System.log("age : "+ age);
  } else {
    System.error("no age attribute found");
    errorCode += "\n age attribute not found on <person> element number "+ (i+1);
    isValidXML = false;
  }

  // if anything is invalid then throw the exception
  if (isValidXML === false) {
    System.error(errorCode);
    throw "Invalid XML submitted";
  }

  // get child elements of this particular element
  // these are all optional so we don't throw an exception if they are missing
  var numChildren = person.*.length();
  System.log("numChildren: "+ numChildren);
  for (var j=0 ; j<numChildren ; j++) {
    var tag = person.*[j];
    var tagName = tag.localName();    // localName() gets element name without the namespace
    System.log("found "+ tagName);
    
    // check for phone element
    if (tagName == "phone") {
      var phoneType = tag.@type;
      var phoneNumber = tag.@number;
      System.log("phone type : "+ phoneType);
      System.log("phone number : "+ phoneNumber);

    // check for sport element
    } else if (tagName == "sport") {
      System.log("found a sport element");
      var sportName = tag.@name;
      var sportPosition = tag.@position;
      System.log("sport name : "+ sportName);
      System.log("sport position : "+ sportPosition);

    // throw away anything else
    } else {
      System.log("cannot handle "+ tagName +" : skipping");
    }
  }
}  //end the loop
System.log("XML Parsing Completed");

This new approach will now work regardless of the namespace so we are back in business - plus, I was able to add some much needed XML validation and error logging.  Happy Parsing!
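The three required-attribute checks above all follow the same pattern, so they could be folded into one helper. Here is a minimal sketch of that idea. It assumes the attribute values have already been copied into a plain object so that it also runs outside vCO; the helper name findMissingAttributes and the sample objects are my own and are not part of vCO or E4X.

```javascript
// Hypothetical helper: return the names in `required` that are absent from `attributes`.
function findMissingAttributes(attributes, required) {
  var missing = [];
  for (var i = 0; i < required.length; i++) {
    if (!Object.prototype.hasOwnProperty.call(attributes, required[i])) {
      missing.push(required[i]);
    }
  }
  return missing;
}

// Plain objects stand in for the attributes of two <person> elements.
var complete = { firstname: "Jack", lastname: "Smith", age: "40" };
var incomplete = { firstname: "Jill" };
var required = ["firstname", "lastname", "age"];

var missingFromComplete = findMissingAttributes(complete, required);
var missingFromIncomplete = findMissingAttributes(incomplete, required);
```

With an E4X element, the same loop would presumably test person.hasOwnProperty('@' + name) instead, as the code above does for each attribute individually. Collecting the missing names first means all of them can be logged in one message before throwing.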

UPDATE:
After posting I received a "tweet" from @vCOTeam with the following three lines of code:


var document = new XML(XMLString);
var ns = new Namespace("ns", document.namespace());
default xml namespace = ns;

This sets the default namespace so we no longer need the .*::. The following lines replace the two lines above that use document.*::person:

var numOfPeople = document.person.length();
var person = document.person[i];

Thanks to the vCOTeam for improving the code. One thing I forgot to include: the values returned by the E4X parsing methods are of type XMLList. You'll need to call toString() on them to cast them to String.
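A small sketch of that cast, written so that it also runs outside vCO. The helper calls toString() on every value of a record and returns plain strings. The names toPlainStrings and fakeXmlValue are hypothetical; the stand-in objects only mimic E4X results by exposing toString(), and in vCO you would pass in the values returned by person.@firstname and friends.

```javascript
// Hypothetical helper: cast every value in a record to a primitive string.
function toPlainStrings(record) {
  var result = {};
  for (var key in record) {
    if (Object.prototype.hasOwnProperty.call(record, key)) {
      result[key] = record[key].toString();
    }
  }
  return result;
}

// Stand-in for an E4X result: an object whose toString() yields the text value.
function fakeXmlValue(text) {
  return { toString: function () { return text; } };
}

var plainPerson = toPlainStrings({
  firstname: fakeXmlValue("Jack"),
  age: fakeXmlValue("40")
});
```

After the cast, the values are ordinary strings, so they can be handed to anything that expects a String.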
