Wednesday, June 8, 2011

Using Salesforce.com's REST API from Perl

Introduction

Here's a simple way to access Salesforce data even from behind a firewall, using Perl and the REST API.

If you're not familiar with the details of the Salesforce REST API, have a look at the developer docs.

I'm using Perl 5.12.1 with modules WWW::Mechanize, and WWW::Salesforce.

The firewall settings (optional)

Since I'm behind a firewall, I have to define the necessary variables so the various Perl packages can route the requests properly.  Here's what's needed:
 #!/usr/bin/perl
use warnings;
use strict;
use WWW::Mechanize;
use WWW::Salesforce;

# The firewall proxy variables:
$ENV{HTTPS_PROXY} = 'http://proxy.org.com:80';
$ENV{HTTPS_PROXY_USERNAME} = "proxy_user";
$ENV{HTTPS_PROXY_PASSWORD} = "proxy_pass";
my $mech = WWW::Mechanize->new();
$mech->agent('Mozilla/5.0');
$mech->proxy('http', 'http://proxy_user:proxy_pass@proxy.org.com:80');
$mech->proxy('https', undef);

If you don't need to go through a firewall, you can skip all of these proxy settings, but you'll still need the $mech definition, as it will be used to make the REST requests later.

Authorization

The most difficult part (and it wasn't difficult) was setting up the authorization.  Salesforce offers two options:
  1. OAuth 2.0
  2. Session ID
The Sesion ID is far easier than the first option, so I'm using it here.

The first step is to log-in via the SOAP API (since that is a way to get a session ID):

# Authenticate first via SOAP interface to get a session ID:
my $sforce = eval { WWW::Salesforce->login(
                    username => 'sfdc_user@org.com',
                    password => 'sfdc_pass' ); };
die "Could not login to SFDC: $@" if $@;

Session ID

Now that we've logged-in via the SOAP API, we have a session ID as part of the $sforce variable, which we can get as follows:
 # Get the session ID:
my $hdr = $sforce->get_session_header();
my $sid = ${$hdr->{_value}->[0]}->{_value}->[0];

This SID will need to be sent as part of our request header when using the REST API.

A REST Request

Here's a simple request for all available SObjects:
# Now get some data (using the session ID from above):
$mech->add_header( "Authorization" => "OAuth $sid" );
$mech->add_header( "X-PrettyPrint" => '1' );
$mech->get("https://na1.salesforce.com/services/data/v20.0/sobjects/
");
# ...and write it out to the screen:
print $mech->content;

Going further

 The developer doc mentioned above lists lots of examples for requests that can be made this way, including SOQL queries.

Friday, February 18, 2011

Parsing XML from SharePoint lists using Perl

Introduction

I was tasked with generating flat-files from Microsoft SharePoint
lists recently, for loading into a database, and wanted to share one way of
doing this, which involves downloading the XML with a tool like wget or curl,
and then parsing it using a Perl script.
Unfortunately, I did not have access to the back-end, or I might have looked
for a way of running a query on the SQL Server database behind the SharePoint
application.
There are also other ways of doing this that involve the use of SOAP::Lite.
Have look, for example, at this great post.
For this particular exercise I had access to a SharePoint 2007 site. Your
mileage may vary with a more recent version.
 

The SharePoint list URL

Before we can pull the XML from SharePoint, we’ll need the URL for the XML
version of the list we’ll be extracting.
From the SharePoint site for the list, export the list as an Excel
Spreadsheet. This will cause the site to push an file called owssvr.iqy as
a download to your browser. Save this file and open it with a text editor.
  

The owssvr.iqy file

Here’s what the downloaded web query file (owssvr.iqy) might look like:
WEB
1
https://sharepoint.domain.org/path/to/list/_vti_bin/owssvr.dll?XMLDATA=1&List={921ACBDB-5060-4B35-A126-19E9A060513D}&View={F1275A60-A531-4F88-882E-E096721E53D0}

Selection={921ACBDB-5060-4B35-A126-19E9A060513D}-{F1275A60-A531-4F88-882E-E096721E53D0}
EditWebPage=
Formatting=None
PreFormattedTextToColumns=True
ConsecutiveDelimitersAsOne=True
SingleBlockTextImport=False
DisableDateRecognition=False
DisableRedirections=False
SharePointApplication=https://sharepoint.domain.org/path/to/list/_vti_bin
SharePointListView={F1275A60-A531-4F88-882E-E096721E53D0}
SharePointListName={921ACBDB-5060-4B35-A126-19E9A060513D}
RootFolder=/path/to/list/My List 
The URL listed on the third line of the file will point directly to the XML for the list. The list ID itself is shown as the attribute SharePointListName, and is also part of the URL.

Downloading the XML

With wget

For the SharePoint site I’m querying I have to use HTTPS, and it requires
authentication. Using the link obtained above, I can download the XML for the
list as follows (type the whole command into a single line):
wget --no-check-certificate --user=username --password=password -O test_list.xml
'https://sharepoint.domain.org/path/to/list/_vti_bin/owssvr.dll?XMLDATA=1&List={921ACBDB-5060-4B35-A126-19E9A060513D}&View={F1275A60-A531-4F88-882E-E096721E53D0}'
Note: Even though the curly braces ({, and }) in the URL are not valid
characters for a link, wget is able to use them, as it is encoding them behind
the scenes into %7B and %7D respectively. BTW if you need to encode
strings into valid URL format, have a look at URI::Escape.
 

With curl

Curl requires more strict URL encoding, but downloads the XML just as
well with the below command (again, put it into a single line).
curl --insecure --ntlm --user username:password -o test_list.xml
'https://sharepoint.domain.org/path/to/list/_vti_bin/owssvr.dll?XMLDATA=1&List=%7B921ACBDB-5060-4B35-A126-19E9A060513D%7D&View=%7BF1275A60-A531-4F88-882E-E096721E53D0%7D'

Structure of the XML

Here’s a sample of the XML obtained with either of the commands above. In this
example I’m querying a list of tasks.
<xml xmlns:s='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882'
     xmlns:dt='uuid:C2F41010-65B3-11d1-A29F-00AA00C14882'
     xmlns:rs='urn:schemas-microsoft-com:rowset'
     xmlns:z='#RowsetSchema'>
<s:Schema id='RowsetSchema'>
   <s:ElementType name='row' content='eltOnly' rs:CommandTimeout='30'>
      <s:AttributeType name='ows_Attachments' rs:name='Attachments' rs:number='1'>
         <s:datatype dt:type='boolean' dt:maxLength='1' />
      </s:AttributeType>
      <s:AttributeType name='ows_LinkTitle' rs:name='Title' rs:number='2'>
         <s:datatype dt:type='string' dt:maxLength='512' />
      </s:AttributeType>
      <s:AttributeType name='ows_AssignedTo' rs:name='Assigned To' rs:number='3'>
         <s:datatype dt:type='string' dt:lookup='true' dt:maxLength='512' />
      </s:AttributeType>
      <s:AttributeType name='ows_Status' rs:name='Status' rs:number='4'>
         <s:datatype dt:type='string' dt:maxLength='512' />
      </s:AttributeType>
      <s:AttributeType name='ows_Priority' rs:name='Priority' rs:number='5'>
         <s:datatype dt:type='string' dt:maxLength='512' />
      </s:AttributeType>
      <s:AttributeType name='ows_DueDate' rs:name='Due Date' rs:number='6'>
         <s:datatype dt:type='datetime' dt:maxLength='8' />
      </s:AttributeType>
      <s:AttributeType name='ows_PercentComplete' rs:name='% Complete' rs:number='7'>
         <s:datatype dt:type='float' dt:maxLength='8' />
      </s:AttributeType>
   </s:ElementType>
</s:Schema>
<rs:data>
   <z:row ows_Attachments='0' ows_LinkTitle='Task #1' ows_AssignedTo='1;#Smith, John' ows_Status='Not Started' ows_Priority='(2) Normal' ows_DueDate='2011-02-28 00:00:00' ows_PercentComplete='0' />
   <z:row ows_Attachments='0' ows_LinkTitle='Task #2' ows_AssignedTo='1;#Smith, John' ows_Status='Not Started' ows_Priority='(2) Normal' ows_DueDate='2011-02-28 00:00:00' ows_PercentComplete='0' />
   <z:row ows_Attachments='0' ows_LinkTitle='Task #3' ows_AssignedTo='1;#Smith, John' ows_Status='Not Started' ows_Priority='(2) Normal' ows_DueDate='2011-02-28 00:00:00' ows_PercentComplete='0' />
</rs:data>
</xml> 
There are three main sections
(1) The XML header, enclosed in the topmost <xml> tag.
(2) The schema or data description. Surrounded by the <s:Schema> tags.
(3) The data. Enclosed by the <rs:data> tags.
The strategy then, is to parse the schema first, and after having all
available attributes, we can then parse the data section to output
delimited columns.

The Perl code

I’m using Perl 5.12.1 with the below modules:
XML::Simple
XML::LibXML
XML::Simple allows you to choose a preferred parser, and I chose
XML::LibXML::SAX::Parser, since it supports name spaces, which are used in
the XML.
#!/usr/bin/perl
# --------------------------------------------------------------
my $VERSION = '$Id$';
# --------------------------------------------------------------
# Script Name:  xmlparse.pl
# Purpose:      To parse an XML file from a SharePoint list.
# --------------------------------------------------------------
use warnings; use strict;
use Getopt::Std; $Getopt::Std::STANDARD_HELP_VERSION = 1;
use XML::Simple;
$XML::Simple::PREFERRED_PARSER = "XML::LibXML::SAX::Parser";
# --------------------------------------------------------------


# --------------------------------------------------------------
# Print usage:
# --------------------------------------------------------------
sub print_usage {
    (my $usage = <<END_OF_HELP) =~ s/^ {4}//gm;

    Parse an XML file from a Sharepoint list and output a
    delimted version of the data.

    USAGE:
    $0 [-d DELIMITER] file1 [file2 file3 file4 ... filen]
    $0 -h prints this help message.

    Available options:

        -o output file (optional -- STDOUT is the default).
        -k the output delimiter (TAB is the default)
        -H print a header with the output.
        -M print meta-data (schema) only.
        -R remove extraneous IDs or data-types pre-pended to
           data elements

    EXAMPLE:
    $0 -o my_output.txt my_input.xml
    $0 my_input.xml >my_output.txt

END_OF_HELP

    print $usage;
}
sub HELP_MESSAGE { print_usage; }
sub VERSION_MESSAGE { print $VERSION; }
# --------------------------------------------------------------


# --------------------------------------------------------------
# Parse the command-line options:
# --------------------------------------------------------------
my %option; getopts("k:o:hHMR",\%option);
if ($option{h}) { print_usage; exit(0); }
my $odelm = "\t"; if ($option{k}) { $odelm = $option{k}; }
my $ofile = "-"; if ($option{o}) { $ofile = $option{o}; }
my $header  = 0; if ($option{H}) { $header  = 1; }
my $meta = 0; if ($option{M}) { $meta = 1; }
my $rcln = 0; if ($option{R}) { $rcln  = 1; }
if ($#ARGV < 0) { $ARGV[0] = "-"; }
# --------------------------------------------------------------


# --------------------------------------------------------------
# The main part of the script:
# --------------------------------------------------------------
my $xml = new XML::Simple;
open my $OFILE, "> $ofile" or die "Couldn't open file $ofile: $!";
for my $ifile (@ARGV) {

    my $sp_list = $xml->XMLin("$ifile");
    my @attr;

    #
    # First, parse the Schema.
    # We are setting up a description hash (%desc) for every column in the
    # schema and putting all descriptions in an attributes array (@attr),
    # which is indexed by the "rs:number" attribute.
    for my $col (keys %{$sp_list->{"s:Schema"}->{"s:ElementType"}->{"s:AttributeType"}}) {
        my %desc;
        $desc{"name"} = $col;
        $desc{"rs_name"} = $sp_list->{"s:Schema"}->{"s:ElementType"}->{"s:AttributeType"}->{$col}->{"rs:name"};
        $desc{"type"} = $sp_list->{"s:Schema"}->{"s:ElementType"}->{"s:AttributeType"}->{$col}->{"s:datatype"}->{"dt:type"};
        $desc{"maxl"} = $sp_list->{"s:Schema"}->{"s:ElementType"}->{"s:AttributeType"}->{$col}->{"s:datatype"}->{"dt:maxLength"};
        $attr[$sp_list->{"s:Schema"}->{"s:ElementType"}->{"s:AttributeType"}->{$col}->{"rs:number"} - 1] = \%desc;
    }

    #
    # Print the meta-data if needed:
    # This includes every attribute for every column.
    # (Note that if we output the schema, we don't go into the data)
    if ($meta) {
        my @header = ("COL","FIELD NAME","VISIBLE NAME","DATA TYPE","LEN");
        print join("$odelm",@header),"\n";
        for my $i (0..$#attr) {
            my @record;
            push @record, $i+1;
            push @record, $attr[$i]->{"name"};
            push @record, $attr[$i]->{"rs_name"};
            push @record, $attr[$i]->{"type"};
            push @record, $attr[$i]->{"maxl"};
            print join("$odelm",@record),"\n";
        }
        next;
    }

    #
    # Print the header if needed.
    # (This is the header for the SharePoint list.)
    if ($header and not $meta) {
        my @header;
        for my $i (0..$#attr) {
            $header[$i] = $attr[$i]->{"rs_name"};
        }
        print join("$odelm",@header),"\n";
    }
    #
    # Get the SharePoint list data and print it.
    unless($meta) {
        for my $row (@{$sp_list->{"rs:data"}->{"z:row"}}) {
            my @record;
            for my $i (0..$#attr) {
                if (exists($row->{$attr[$i]->{"name"}})) {
                    my $token = $row->{$attr[$i]->{"name"}};
                    if ($rcln) {
                        $token =~ s/^([0-9]+|float|string)\;\#//go;
                    }
                    push @record, $token;
                } else {
                    push @record, qq{};
                }
            }
            print {$OFILE} join("$odelm",@record),"\n";
        }
    }
}
close($OFILE);
# --------------------------------------------------------------

Running the script

If we’re just interested in the data, we can call the script as follows:
xmlparse.pl -RH test_list.xml > test_list.txt
And it will give a tab-delimited file.

You can also output the schema of the list with the following command:
xmlparse.pl -M test_list.xml
I’ve reformatted it into an ASCII table below:
.------------------------------------------------------------.
| COL | FIELD NAME          | VISIBLE NAME | DATA TYPE | LEN |
+-----+---------------------+--------------+-----------+-----+
|   1 | ows_Attachments     | Attachments  | boolean   |   1 |
|   2 | ows_LinkTitle       | Title        | string    | 512 |
|   3 | ows_AssignedTo      | Assigned To  | string    | 512 |
|   4 | ows_Status          | Status       | string    | 512 |
|   5 | ows_Priority        | Priority     | string    | 512 |
|   6 | ows_DueDate         | Due Date     | datetime  |   8 |
|   7 | ows_PercentComplete | % Complete   | float     |   8 |
'-----+---------------------+--------------+-----------+-----'

Additional comments

Overall this code is fairly reusable as the structure of the XML is consistent
across various SharePoint lists. And since it is scriptable, it is easy
automate the update of a database table from a SharePoint list.

You can download the files mentioned above, here: xmlparse.pl,
test_list.xml.
It is possible to do the download step in Perl as well by using LWP::UserAgent,
but this is left as an exercise.