Archive for the ‘ETL’ Category

Extracting Lotus Notes Data and Importing to SharePoint List.

Friday, March 26th, 2010

Step 1: export Lotus Notes database to XML. You can either write a Notes agent or write a java program that does it. I did the latter. The export extracts all fields including rich text, attachment, doc link, embedded image, etc. The attachments and embedded images are encoded in base64.

My first attempt at this was to write an extract in Java, but as I learned later it is much easier handling XML. Plus I had written several parsers before. Keep in mind that everything is extracted, so before extracting a database w/ many document w/ many LARGE attachments you may want to allocated additional time for the effort.

Step 2: Import to SharePoint List: Here is my technique. write java program that parses the extracted XML documents and build javascript/html file that calls SharePoint web services to insert items into the List. Then I import the html file to a document library. The java program builds a button that when clicked runs the javascript that insert items into the list. All the variables such as database name, server name, list name are read from a properties file stored locally.

The program also extracts all Notes attachments. Attachments are actually in the XML file and stored in base64 format. The program builds javascript/ajax code that calls add attachment web service for a given item.

This has been a big hit because the java programs runs outside Notes and extract user defined fields and runtime variables are driven by the property file.

This technique does not use .NET at all. Good thing because I do not know much about it. It is very simple and works for every database, both local and server based.

Extract filename from string

Thursday, February 5th, 2009

The following Perl program extracts a filename from a given string. It takes one string parameter.


Syntax:
Perl FilenameExtract.pl Paramerter1

Example:
Perl FilenameExtract.pl http://www.csatechconsulting.com/images/WashingtonFlags.JPG

Returns: WashingtonFlags.JPG

FilenameExtract.pl:

if (@ARGV ==0) { print "You must include one parametern"; exit(); } $_ = $ARGV[0]; $wholePath = ~m{([^/]*)$}; $fileName = $1; print "$1n";

RSS Transformer

Wednesday, January 28th, 2009

This program takes a RSS feed either from a file or a URL, applies a stylesheet (either file or URL) to it and produces html for viewing with a browser. I compiled and ran an example on my mac - see the html attachment below.

There are three parameters:

  1. xml - may be file or url
  2. xsl - may be file or url
  3. html - file that the transformer puts results

There are hundreds of RSS publishers. From CNN to Reuters to NBC to NPR and everything in between. Google ‘RSS News’ and you will see what I mean.

I apologize for the plain presentation. When time permits, I will add styles (css) to the xsl to shine it up for acceptable presentation.

As a quick note, I use this a lot to test xsl before I move them to my J2EE container. Moving them to the container may take up to 15 minutes at my site. It may not sound much, but in general you may have to make several rounds of changes (i.e. debug) before the xsl is correct. So this is a huge savings as the utility runs in less than three seconds. Why not let the browser transform the xsl? You’d be surprised how different the browser transformers are compared to the java one. Just as a side note, I used this utility today several times as I was testing a new xsl. This is a great utility!

Examples:

NPR Topics: News
java Rss http://www.npr.org/rss/rss.php?id=1001 http://csatechconsulting.com/attachments/055_Rss.xsl Rss.html
Reuters: Top News
jara Rss http://feeds.reuters.com/reuters/topNews http://csatechconsulting.com/attachments/055_Rss.xsl Rss.html

BBC Front Page:
java Rss http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml http://csatechconsulting.com/attachments/055_Rss.xsl Rss.html

RSS Tracking FedEx Package:
java Rss http://www.simpletracking.com/RSSTracking.aspx?TrackingNumber=979887377090 http://csatechconsulting.com/attachments/055_Rss.xsl Rss.html

Rss.java

import java.io.*; import javax.xml.transform.Source; import javax.xml.transform.Transformer; import javax.xml.transform.TransformerFactory; import javax.xml.transform.stream.StreamResult; import javax.xml.transform.stream.StreamSource; import javax.xml.transform.TransformerException; public class Rss { public static void Rss (String xml, String xsl, String out){ try{ Source xmlSource = new StreamSource(xml); Source xslSource = new StreamSource(xsl); TransformerFactory tf = TransformerFactory.newInstance(); Transformer transformer = tf.newTransformer(xslSource); FileOutputStream file = new FileOutputStream(out); transformer.transform(xmlSource, new StreamResult(file)); } catch (TransformerException e){ System.out.println("Transformer error:" + e.getCause()); } catch (Exception e){ System.out.println("Error:" + e.getCause()); } } public static void main(String[] args){ String xml = args[0]; // xml file or url String xsl = args[1]; // xsl file or url String output = args[2]; // html output file Rss(xml, xsl, output); } }

XSL:
http://csatechconsulting.com/attachments/055_Rss.xsl

Output (from Rss.java):
http://csatechconsulting.com/attachments/055_Rss.html

Download Oracle Blob to File

Friday, January 23rd, 2009

This example reads a binary field (BLOB) from oracle and extracts it to a file. It uses a properties file to obtain the Oracle userid, password and environment. The program requires two parameters: 1) file name of the attachment that is downloaded, 2) document id of the existing row.

Properties file:
userid=myuserid
pass=mypassword
env=myenvironment

How to run:
java AttachmentDownload parameter1 parameter 2
where parameter1 = filename of the downloaded attachment and parameter2 = document id of the row that contains the attachment

Example:
java AttachmentDownload atestdocument.doc 121

Oracle:
Table Name: MYTABLE
Column Name: DOC_ID type number, ATTACHMENT_BLOB type blob

Please note: In order to run this successful, you will need 9.2.07 or higher Oracle jdbc driver, otherwise you will get runtime errors.


AttachmentDownload.java

import java.io.*; import java.sql.*; import java.util.*; import java.text.*; import oracle.sql.BLOB; import oracle.jdbc.*; import oracle.jdbc.oci8.OCIDBAccess; public class AttachmentDownload { public static void AttachmentDownload (String fName, String docId){ String PROPERTYFILE = "c://temp//my.properties"; // location of properties file StringBuffer queryString = new StringBuffer(); // used to build sql String userid = null; String pass = null; String env = null; InputStream is = null; BLOB blob = null; try{ // read properties file to get userid, password and environment BufferedReader in = new BufferedReader(new FileReader(PROPERTYFILE)); String line; // go through each line and get tag/value pair // tag and value are separated by equals ‘=’ while((line = in.readLine()) != null){ StringTokenizer st = new StringTokenizer(line,"="); while (st.hasMoreTokens()){ String token = st.nextToken(); if(token.equals("userid")){ userid = st.nextToken(); } if(token.equals("pass")){ pass = st.nextToken(); } if(token.equals("env")){ env = st.nextToken(); } } } in.close(); // close properties file Class.forName("oracle.jdbc.driver.OracleDriver"); // get a connection Connection con = DriverManager.getConnection("jdbc:oracle:oci:@ " + env, userid, pass); // build SQL queryString.append("SELECT ATTACHMENT_BLOB FROM MYTABLE "); queryString.append("WHERE DOC_ID = " + docId); // run query ResultSet rs = stmt.executeQuery(queryString.toString()); // get results - will be only one as the docId are unique while (rs.next()) { // get blob blob = ((OracleResultSet)rs).getBLOB(1); // get length long blob_length = blob.length(); // stream to input stream is = blob.getBinaryStream(); // create attachment output file FileOutputStream file = new FileOutputStream(fName); // ready bytes and write to output file byte[] b = new byte[(int)blob_length]; int pos = 0; int length = 0; while((length = is.read(b)) != -1){ pos += length; file.write(b); // write chunk out to output file } file.flush(); file.close(); // close output file } rs.close(); // close result set stmt.close(); // close statement con.close(); // close connection } catch(SQLExecption ex){ ex.printStackTrace(); } catch(Exception ex){ ex.printStackTrace(); } } public static void main(String[] args){ String fName = args[0]; // attachment file name String docId = args[1]; // document id AttachmentDownload(fName, docId); } }

Upload File to Oracle Blob

Tuesday, January 20th, 2009

This example reads a file from storage and uploads the content to an Oracle blob. It uses a properties file to obtain the Oracle userid, password and environment. The program requires two parameters: 1) file to be uploaded, 2) document id of the existing row.

Properties file:
userid=myuserid
pass=mypassword
env=myenvironment

How to run:
java AttachmentUpload parameter1 parameter 2
where parameter1 = file to be uploaded and parameter2 = document id of the row to be updated

Example:
java AttachmentUpload c:\temp\atestdocument.doc 121

Oracle:
Table Name: MYTABLE
Column Name: DOC_ID type number, ATTACHMENT_BLOB type blob


AttachmentUpload.java

import java.io.*; import java.io.ByteArrayOutputStream; import java.lang.*; import java.util.StringTokenizer; import java.io.InputStream; import java.sql.*; import java.sql.DriverManager; import java.sql.Connection; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.SQLException; public class AttachmentUpload { public static void AttachmentUpload(String fileDir, String docId) { try{ //get a file object File f = new File(fileDir); // it must be a file, otherwise fall through if (f.isFile()){ UpdateItem(docId, f); // update attachment with file } } catch (Exception e) { e.printStackTrace(); } } public static void UpdateItem(String docId, File f){ String PROPERTYFILE = "c://temp//my.properties"; // location of properties file StringBuffer queryString = new StringBuffer(); // used to build sql String userid = null; String pass = null; String env = null; try{ // read properties file to get userid, password and environment BufferedReader in = new BufferedReader(new FileReader(PROPERTYFILE)); String line; // go through each line and get tag/value pair // tag and value are separated by equals "=" while((line = in.readLine()) != null){ StringTokenizer st = new StringTokenizer(line,"="); while (st.hasMoreTokens()){ String token = st.nextToken(); if(token.equals("userid")){ userid = st.nextToken(); } if(token.equals("pass")){ pass = st.nextToken(); } if(token.equals("env")){ env = st.nextToken(); } } } in.close(); // close properties file Class.forName("oracle.jdbc.driver.OracleDriver"); // get a connection Connection con = DriverManager.getConnection("jdbc:oracle:oci:@ " + env, userid, pass); // get an input stream for the attachment file java.io.InputStream fin = new java.io.FileInputStream(f.getAbsoluteFile()); long sizeInBytes = f.length(); // get size of attachment file // build SQL queryString.append("UPDATE MYTABLE SET ATTACHMENT_BLOB = ? "); queryString.append("WHERE DOC_ID = " + docId); PreparedStatement pstmt = con.prepareStatement(queryString.toString()); pstmt.setBinaryStream(1,fin,(int)sizeInBytes); pstmt.execute(); // update blob pstmt.close(); // close statement fin.close(); // close attachment file con.close(); // close connection } catch(SQLException ex){ ex.printStackTrace(); } catch(Exception ex){ ex.printStackTrace(); } } public static void main(String[] args){ String fileDir = args[0]; // attachment file String docId = args[1]; // document id AttachmentUpload(fileDir, docId); } }

Extract/Transform Lotus Notes Database(s) to XML

Sunday, January 18th, 2009

This program extracts data from Lotus Notes Databases from a specified location. The location is derived from the extract.properties file. The file contains tag/value pairs that identify starting and output locations. For example, if the starting location is the c: drive, the program will extract all Lotus Notes database from the c: drive, including all sub-directories. In short, it will extract all databases from the drive. The intent was to segment extracts by providing capability to exclude unnecessary access points. It must be noted that data sets  may be huge and caution is advised before running the extract. This program wraps data with xml tags and uses Notes field names as element names. Rich text data is converted to string format and stored accordingly. Attachments are extracted in native form.

This program also extract “system” data such as session and database data. This provides useful data such as users list, group names, log lists, access control, server name, etc. See extract for all system elements.  Also extracted are view and form names. It must be noted that only view and form names are extracted as it was determined that the content itself was of limited value. File names are generated by using the following method: Lotus Notes Universal ID: this is a 32-character combination of hex digits that uniquely identifies Lotus Notes documents across all replicas of a database. Since it is possible to have databases from many environments - depending on the collection - it was not clear if the Universal ID was indeed unique in this instance, so a random number using the timestamp in milliseconds as a seed value was appended to the Universal ID.

The output format looks like: for data elements:

for xml file: Output Folder + “/” + Universal ID + “_” + random number + “_” + “.xml”

for attachments: Output Folder  + “/” + Universal ID + “_” + random_number + “_” + Attachment Name

With this method, it is possible (but unlikely) that file names may be duplicate, especially if the extract is run on multiple computers at the same time. This is reasonable with large collections of data. Although it is not shown here, appending the computer name to the Universal ID and random number would lessen the likelihood of duplicate file names. What we do not want to do is write over (destroy) extracted data as the program was designed to be used in various situations and does not provide any warning of duplication. It was decided not to add warning capability because the intent was to run the extract with little to no intervention as it was unclear the technical level of the operator running the extract.

This program only requires the Lotus Notes API (notes.jar) and JVM. This version of the extract run on Microsoft Windows, but can be easily enhanced to run in other environments such as Unix/Linux.  This program was developing using version 7 of the API, but is generic enough to work with any version of the API. This program does not need the Notes client to work. This program uses generic java classes and methods and does not require any specific jar files other than the API.

The properties file has two element pairs; 1) inFolder. The extract uses this value to start looking for databases, 2) outFolder. The extract uses this value to store extracted data.

Example of properties file:

extract.properties:

  • inFolder=c:
  • outFolder=c:\extract

The path to the properties file is “hard” coded. The program checks to see if the outFolder exists. If it does not, the program creates it. The program uses the DOS ‘dir’ command to get a list of all the Lotus Notes databases for a given directory.

It uses three parameters:

  • ‘/b’ no heading information or summary
  • ‘/n’ long list format
  • ‘/s’ shows file in specified directory and all subdirectories under that

An example of one result set is:  C:\ADIRECTORY\TEST.NSF

For more information, open a windows command window and type ‘HELP DIR’.

——————————————

Note: Because of logistics problems, I had scan the extract program, and unfortunately I discovered that I need a new scanner. The errors seems to be case in nature and ‘1′ and ‘l’, and ‘i’ and ‘l’ are also in error.  This program has been tested in an operational environment and works fine, but scanner errors as mentioned above may cause compile errors, assuming you try to use this program.  I did my best to correct, but I could have missed one or two.

NotesExtractor.java