This program extracts data from Lotus Notes databases in a specified location. The location is read from the extract.properties file, which contains tag/value pairs that identify the starting and output locations. For example, if the starting location is the c: drive, the program will extract every Lotus Notes database on the c: drive, including all subdirectories. The intent was to allow extracts to be segmented by excluding unnecessary access points. Note that data sets may be huge, so caution is advised before running the extract. The program wraps data in XML tags and uses Notes field names as element names. Rich text data is converted to string format and stored accordingly. Attachments are extracted in their native form.
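The XML wrapping described above can be sketched as follows. This is not the program's own code: it assumes each field value has already been read from the Notes document as a plain string (rich text flattened), and the class and method names are hypothetical. One practical wrinkle is that some Notes field names, such as $FILE, are not legal XML element names, so the sketch replaces illegal characters with ‘_’; that substitution is an assumption, not something the original describes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: wrap one document's fields in XML, using each Notes
// field name as the element name. Values are assumed to be plain strings
// already read from the document (rich text flattened to text).
public class XmlWrapSketch {

    // Escape the characters that are special in XML character data.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumption: characters that are not legal in an XML name (such as the
    // '$' in "$FILE") become '_', and a leading '_' is added when the name is
    // empty or would otherwise start with a digit, '.', or '-'.
    static String elementName(String fieldName) {
        String name = fieldName.replaceAll("[^A-Za-z0-9_.-]", "_");
        if (name.isEmpty() || !(Character.isLetter(name.charAt(0)) || name.charAt(0) == '_')) {
            name = "_" + name;
        }
        return name;
    }

    // Builds one XML document; a LinkedHashMap keeps fields in insertion order.
    static String toXml(String rootName, Map<String, String> fields) {
        StringBuilder xml = new StringBuilder();
        xml.append('<').append(rootName).append(">\n");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String tag = elementName(field.getKey());
            xml.append("  <").append(tag).append('>')
               .append(escape(field.getValue()))
               .append("</").append(tag).append(">\n");
        }
        xml.append("</").append(rootName).append(">\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Subject", "Budget <draft> & notes");
        fields.put("$FILE", "report.doc");
        System.out.print(toXml("document", fields));
    }
}
```

Running main prints the Subject field with its special characters escaped and the $FILE field under the element name _FILE.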
This program also extracts “system” data, such as session and database data. This provides useful information such as the user list, group names, log lists, access control, and server name; see the extract code for the full list of system elements. View and form names are also extracted. Only the names are extracted, because the view and form content itself was judged to be of limited value.
File names are generated from the Lotus Notes Universal ID, a 32-character string of hex digits that uniquely identifies a Lotus Notes document across all replicas of a database. Because databases may come from many environments, depending on the collection, it was not clear whether the Universal ID would be unique in this case, so a random number, generated with the current timestamp in milliseconds as the seed, is appended to the Universal ID.
The output file names take the following form:
- XML files: Output Folder + “/” + Universal ID + “_” + random number + “_” + “.xml”
- Attachments: Output Folder + “/” + Universal ID + “_” + random number + “_” + Attachment Name
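A minimal sketch of this naming scheme follows. It assumes one random number is drawn per document and shared by the XML file and that document's attachments; the text does not say whether they share it. The class and method names are hypothetical, and the Universal ID in main is made up.

```java
import java.util.Random;

// Sketch of the naming scheme above. Class and method names are hypothetical.
public class OutputNames {

    // Seeded with the current time in milliseconds, as described in the text.
    private final Random random = new Random(System.currentTimeMillis());

    // Output Folder + "/" + Universal ID + "_" + random number + "_"
    // Assumption: one random number per document, shared by the XML file and
    // its attachments.
    String prefix(String outFolder, String universalId) {
        int randomNumber = random.nextInt(Integer.MAX_VALUE); // 0 .. 2^31 - 2
        return outFolder + "/" + universalId + "_" + randomNumber + "_";
    }

    static String xmlFile(String prefix) {
        return prefix + ".xml";
    }

    static String attachmentFile(String prefix, String attachmentName) {
        return prefix + attachmentName;
    }

    public static void main(String[] args) {
        OutputNames names = new OutputNames();
        // Made-up 32-hex-digit Universal ID, for illustration only.
        String p = names.prefix("c:/extract", "0123456789ABCDEF0123456789ABCDEF");
        System.out.println(xmlFile(p));
        System.out.println(attachmentFile(p, "report.doc"));
    }
}
```

Note that java.util.Random produces the same sequence for the same seed, so two runs started in the same millisecond would draw identical random numbers; this is one way duplicate names could arise.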
With this method it is possible, though unlikely, for file names to be duplicated, especially if the extract is run on multiple computers at the same time. This is a reasonable concern with large collections of data. Although it is not shown here, appending the computer name to the Universal ID and random number would further reduce the likelihood of duplicate file names. What must be avoided is overwriting (destroying) extracted data: the program was designed to be used in various situations and gives no warning of duplication. Warning capability was deliberately left out because the extract was meant to run with little to no intervention, and the technical level of the operator running it was unknown.
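Appending the computer name, as suggested above, could look like the following sketch. The field order (Universal ID, random number, computer name) follows the wording above; the fallback to the COMPUTERNAME environment variable and the ‘unknown-host’ placeholder are assumptions, and the class name is hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Random;

// Hypothetical variant of the naming scheme with the computer name appended:
// Output Folder + "/" + Universal ID + "_" + random number + "_" + computer name + "_"
public class HostQualifiedNames {

    // Local host name; falls back to the Windows COMPUTERNAME variable and
    // then to a fixed placeholder if the name cannot be resolved.
    static String computerName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            String env = System.getenv("COMPUTERNAME");
            return env != null ? env : "unknown-host";
        }
    }

    static String prefix(String outFolder, String universalId, int randomNumber, String computer) {
        return outFolder + "/" + universalId + "_" + randomNumber + "_" + computer + "_";
    }

    public static void main(String[] args) {
        Random random = new Random(System.currentTimeMillis());
        String p = prefix("c:/extract", "0123456789ABCDEF0123456789ABCDEF",
                random.nextInt(Integer.MAX_VALUE), computerName());
        System.out.println(p + ".xml");
    }
}
```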
This program requires only the Lotus Notes API (notes.jar) and a JVM. This version of the extract runs on Microsoft Windows, but can easily be enhanced to run in other environments such as Unix/Linux. It was developed using version 7 of the API, but is generic enough to work with any version of the API. It does not need the Notes client to work. Apart from the API, it uses only standard Java classes and methods and requires no other jar files.
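The Windows-specific part of the extract is its use of the DOS ‘dir’ command to find databases. One way to make that step portable, sketched here as an assumption rather than as the program's method, is to replace it with a pure-Java directory walk using java.nio.file.Files.walkFileTree. Unreadable folders are skipped instead of aborting the walk, which matters when the starting location is an entire drive. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Platform-independent alternative to "dir /b /s": walk inFolder and all of
// its subdirectories and collect every file whose name ends in .nsf.
public class NsfFinder {

    static List<Path> findDatabases(Path inFolder) throws IOException {
        List<Path> found = new ArrayList<>();
        Files.walkFileTree(inFolder, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Case-insensitive match, so TEST.NSF and test.nsf are both found.
                String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
                if (attrs.isRegularFile() && name.endsWith(".nsf")) {
                    found.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip entries that cannot be read (e.g. protected system folders).
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
                // Keep going even if a directory listing was cut short by an error.
                return FileVisitResult.CONTINUE;
            }
        });
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path db : findDatabases(start)) {
            System.out.println(db);
        }
    }
}
```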
The properties file has two tag/value pairs: 1) inFolder, the location where the extract starts looking for databases, and 2) outFolder, the location where the extract stores extracted data.
Example of a properties file:
extract.properties:
- inFolder=c:
- outFolder=c:\extract
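Reading these two values, and creating outFolder when it does not exist, can be sketched with java.util.Properties. The properties file path and the class name are hypothetical; the program's actual hard-coded path is not given here.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Minimal sketch: read inFolder/outFolder from extract.properties and make
// sure the output folder exists.
public class ExtractConfig {

    static final String PROPERTIES_PATH = "extract.properties"; // hypothetical path

    final String inFolder;
    final String outFolder;

    ExtractConfig(Properties props) {
        inFolder = require(props, "inFolder");
        outFolder = require(props, "outFolder");
    }

    // Fails fast if a tag is missing or empty.
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }

    static ExtractConfig load(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return new ExtractConfig(props);
    }

    // Creates outFolder, including any missing parent folders, if it does not exist.
    static File ensureOutFolder(String outFolder) throws IOException {
        File dir = new File(outFolder);
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Could not create output folder: " + outFolder);
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        ExtractConfig config = load(PROPERTIES_PATH);
        ensureOutFolder(config.outFolder);
        System.out.println("Searching " + config.inFolder + ", writing to " + config.outFolder);
    }
}
```

If the file is read this way, keep in mind that java.util.Properties treats a backslash as an escape character and silently drops it before an ordinary letter, so outFolder=c:\extract would load as c:extract. Writing the value as c:\\extract or c:/extract avoids this.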
The path to the properties file is hard-coded. The program checks whether outFolder exists and, if it does not, creates it. The program uses the DOS ‘dir’ command to get a list of all the Lotus Notes databases under a given directory.
The command uses three switches:
- ‘/b’ no heading information or summary
- ‘/n’ long list format
- ‘/s’ lists files in the specified directory and all of its subdirectories
An example of one line of output is: C:\ADIRECTORY\TEST.NSF
For more information, open a Windows command window and type ‘HELP DIR’.
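The listing step might look like the following sketch. It assumes the three switches are combined with an inFolder\*.nsf search pattern, which the text does not spell out. Because ‘dir’ is built into cmd.exe, it is run through ‘cmd /c’. The parsing is kept in a separate method so it can be checked without Windows; the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch of listing databases with the DOS dir command (Windows only).
public class DirListing {

    // dir is built into cmd.exe, so it has to be run through "cmd /c".
    // Assumption: the search pattern is inFolder followed by \*.nsf.
    static List<String> command(String inFolder) {
        return Arrays.asList("cmd", "/c", "dir", "/b", "/n", "/s", inFolder + "\\*.nsf");
    }

    // Keeps only the lines that name a Notes database, e.g. C:\ADIRECTORY\TEST.NSF.
    static List<String> parse(BufferedReader output) throws IOException {
        List<String> databases = new ArrayList<>();
        String line;
        while ((line = output.readLine()) != null) {
            line = line.trim();
            if (line.toLowerCase(Locale.ROOT).endsWith(".nsf")) {
                databases.add(line);
            }
        }
        return databases;
    }

    static List<String> listDatabases(String inFolder) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command(inFolder));
        // Merge stderr (e.g. "File Not Found") into stdout; parse() filters it out.
        builder.redirectErrorStream(true);
        Process process = builder.start();
        List<String> databases;
        // The platform default charset may not match the console code page,
        // so non-ASCII path names could be decoded incorrectly.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            databases = parse(reader);
        }
        process.waitFor();
        return databases;
    }

    public static void main(String[] args) throws Exception {
        for (String db : listDatabases(args.length > 0 ? args[0] : "c:")) {
            System.out.println(db);
        }
    }
}
```

Because the switches include /b and /s, each matching line is already a full path, so no further path handling is needed before opening the database.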
——————————————
Note: Because of logistics problems, I had to scan the extract program, and unfortunately I discovered that I need a new scanner. The errors seem to be mostly case-related, and ‘1’ and ‘l’, and ‘i’ and ‘l’ are also confused. This program has been tested in an operational environment and works fine, but the scanner errors mentioned above may cause compile errors if you try to use this program. I did my best to correct them, but I could have missed one or two.
NotesExtractor.java