Tuesday, July 20, 2010

Writing large excel files (Excel 2007)

I recently was working on an assignment of exporting data from database to excel-2007. This made me hunt for an API that would take care of low-level excel format handling. There are probably many Java API's available but I am aware of JExcel and Apache POI. I had chosen JExcel earlier to generate and download ".xls" documents on a portal. There was no particular reason I chose JExcel over Apache POI, but I was curious about Apache POI and this was my chance.

Apache POI works very well and has a huge array of functionalities to write sophisticated xlsx documents. Apache POI permits writing xls documents as well using the HSSF Suite of packages. There is a prominent difference in the internal structure of xls(Excel-97 - 2003) documents. To demonstrate this: Create two documents and save one with ".xls" extension and the other with ".xlsx" extension. Some pre-requisite to doing this is: you must have Office 2007 to create xlsx documents and you must have some Zip/UnZip software like WinRar, Zip etc installed and working. Right Click each of the documents and try to open with Zip software you have. ".xlsx" opens up and shows directory structure of what looks like, whereas the ".xls" document refuses to open.:

Structure

/
- ..
- _rels
- .rels
- docProps
- app.xml
- core.xml
- xl
- _rels
- worksheets
- sheet1.xml
- sheet2.xml
- sharedStrings.xml
- styles.xml
- workbook.xml
- [Content_Types].xml

So much for the introduction to .xlsx document. Now getting back to the writing some code to generate excel'07 docs. Check out this link for getting started quickly: http://poi.apache.org/spreadsheet/quick-guide.html.

Within some time I was able to generate excel reports by spitting the data fetched from database. However, for large reports having several thousands of rows and tens of columns I started facing fatal "OutOfMemory Error". Our server hosts 8 GB RAM and I was testing this on my 2 GB laptop. Probably if I'd run the same on server it would not have thrown up. But I did not want to take chances with multiple users trying to generate reports simultaneously which could exhaust our server

I see Apache POI has API that is capable of opening streams from xlsx documents. So I tried this approach of writing a few rows...closing the stream and then reopening it and start re-writing it. I was hoping that all data would be flushed from physical memory....but it did not go as expected. Here is the output of my code:


Please wait while data is being written...
TOTAL COLS:13
Rows updated = 2000
WorkSheet name REPORT3_VIEW
Rows updated = 4000
Connection OFF
Connection Count: 0
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.xmlbeans.impl.store.CharUtil.allocate(CharUtil.java:397)
at org.apache.xmlbeans.impl.store.CharUtil.saveChars(CharUtil.java:441)...


I would not suggest this approach for two reasons:
* it is slow
* it still causes java to retain document in the memory causing heap space to exhaust. I tried with smaller value for MAX_ROWS_UPDATED_AT_ONCE and it perhaps met the same fate...see below:



Headers written..
Please wait while data is being written...
TOTAL COLS:13
Rows updated = 200
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 400
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 600
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 800
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 1000
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 1200
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 1400
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 1600
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 1800
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 2000
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 2200
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 2400
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 2600
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 2800
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 3000
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 3200
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 3400
WorkSheet nameAFP_REPORT3_VIEW
Rows updated = 3600
Connection OFF
Connection Count: 0
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space



Later I came across a solution from Yegor Koslov which mitigates the error above. In brief, his solution works like this: (1) A template xlsx doc is created which indicates the number of sheets to be written and the sheet names etc (2) All the data is dumped into an xml file (3) Build the zip file with the structured described above. This zip file is nothing but .xlsx doc. Details can be found here:http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/usermodel/examples/BigGridDemo.java

Extending his logic to add more sheets in the xlsx document:

Disclaimer: Please use it at your own risk.


Yegor's solution worked very well for me and I chose to customize it a little bit. Any feedback would be helpful.

package ....common.util.excel;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Writer;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

import ....util.excel.BigGridDemo.SpreadsheetWriter;
import ....util.excel.SheetNotFoundException;
import org.apache.poi.ss.usermodel.BorderStyle;
import org.apache.poi.ss.usermodel.IndexedColors;
import org.apache.poi.xssf.usermodel.XSSFCellStyle;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import ...common.util.db.ExportDBView;
import ...logging.Logger;



public class Excel07WorkbookWriter {
private XSSFWorkbook xwb;
private String xlsxFileName;
private Map styles;
private LinkedHashMap sheets;
private LinkedHashMap sheetData ;
private HashMap sheetSheetRefRelation;
private static Logger logg = Logger.getInstance(Excel07WorkbookWriter.class);

/*
* @ Border selection will be same across all 4 cell borders
* @ alignment
*/
private XSSFCellStyle createDataCellStyle(XSSFCellStyle style, BorderStyle borderStyle, short horAlign, short verAlign,
boolean wrapText, boolean even){
style.setBorderBottom(borderStyle);
style.setBorderLeft(borderStyle);
style.setBorderRight(borderStyle);
style.setBorderTop(borderStyle);
style.setAlignment(horAlign);
style.setVerticalAlignment(verAlign);
style.setWrapText(wrapText);
if (even){
//style.setFillForegroundColor(this.DATA_FOREGROUND_COLOR_EVEN);
style.setFillForegroundColor(IndexedColors.LIGHT_CORNFLOWER_BLUE.getIndex());
}else{
style.setFillForegroundColor(IndexedColors.LIGHT_YELLOW.getIndex());
}
style.setFillPattern(XSSFCellStyle.SOLID_FOREGROUND);
return style;
}

/*
* @ Border selection will be same across all 4 cell borders
* @ alignment
*/
private XSSFCellStyle createHeadingCellStyle(XSSFCellStyle style, BorderStyle borderStyle, short horAlign, short verAlign,
boolean wrapText){
style.setBorderBottom(borderStyle);
style.setBorderLeft(borderStyle);
style.setBorderRight(borderStyle);
style.setBorderTop(borderStyle);
style.setAlignment(horAlign);
style.setVerticalAlignment(verAlign);
style.setWrapText(wrapText);
style.setFillForegroundColor(IndexedColors.GREY_25_PERCENT.getIndex());
style.setFillPattern(XSSFCellStyle.SOLID_FOREGROUND);
return style;
}

/*
* Create styles
*
*/
private Map getDefaultStyles(XSSFWorkbook wb){
Map styles = new HashMap();
//XSSFDataFormat fmt = wb.createDataFormat();
XSSFCellStyle headerCellStyle = wb.createCellStyle();
headerCellStyle = createHeadingCellStyle(headerCellStyle,BorderStyle.THIN,
XSSFCellStyle.ALIGN_GENERAL, XSSFCellStyle.VERTICAL_TOP, false);
styles.put("headerRowStyle", headerCellStyle);

XSSFCellStyle dataCellEvenRowStyle = wb.createCellStyle();
dataCellEvenRowStyle = createDataCellStyle(dataCellEvenRowStyle,BorderStyle.THIN,
XSSFCellStyle.ALIGN_GENERAL, XSSFCellStyle.VERTICAL_TOP, true,true);
styles.put("dataRowEvenStyle", dataCellEvenRowStyle);

XSSFCellStyle dataCellOddRowStyle = wb.createCellStyle();
dataCellOddRowStyle = createDataCellStyle(dataCellOddRowStyle,BorderStyle.THIN,
XSSFCellStyle.ALIGN_GENERAL, XSSFCellStyle.VERTICAL_TOP, true,false);
styles.put("dataRowOddStyle", dataCellOddRowStyle);

return styles;
}

/*
*
* @xlsxfilename with or without ".xlsx"
* @LinkedHashMap stores sheetnamea and sheet Obj
* @Map Map containing styles. Can be left NULL
*/

public void init(String xlsxfilename, LinkedHashMap sheets, Map styles) throws NullPointerException, IOException{
logg.info("Start init() for "+xlsxfilename);
this.xlsxFileName = xlsxfilename;
xwb = new XSSFWorkbook();
if ((null == sheets) ||(sheets.isEmpty())) throw new NullPointerException("LinkedHashMap sheets NULL!");


Iterator sheetIter = sheets.keySet().iterator();
while(sheetIter.hasNext()){
String sheetName = (String)sheetIter.next();
XSSFSheet asheet = xwb.createSheet(sheetName);
sheets.put(sheetName, asheet);
logg.info("Adding sheet:"+sheetName);
}
// create template xlsx
FileOutputStream os = new FileOutputStream("template.xlsx");
xwb.write(os);
os.close();

if (null == styles) styles = getDefaultStyles(xwb);

// holds mapping between sheetName and sheetref
sheetSheetRefRelation = new HashMap();

// populate Sheet Data with sheet reference
LinkedHashMap sheetData = new LinkedHashMap();
sheetIter = sheets.keySet().iterator();
while(sheetIter.hasNext()){
String _sheetName = sheetIter.next();
String _sheetNameRef = sheets.get(_sheetName).getPackagePart().getPartName().getName();
// substring to remove leading "/" from /xl/worksheets/sheet1.xml
sheetData.put(_sheetNameRef.substring(1), null);
//substring to remove leading "/" from /xl/worksheets/sheet1.xml
sheetSheetRefRelation.put(_sheetName, _sheetNameRef.substring(1));
//logg.info("sheetSheetRefRelation:"+_sheetName +":"+_sheetNameRef.substring(1));
}

this.sheets = new LinkedHashMap();
this.sheets = sheets;
this.sheetData = new LinkedHashMap();
this.sheetData = sheetData;
logg.info("End init(): sheetSheetRefRelation{}:"+sheetSheetRefRelation);

}

private String checkExceptions(String inpStr){
// if null then return ""
if (null == inpStr) return "";

return inpStr.replace("&", "&");
}

/*
This method does the actual hardwork of writing data as xml.
I have used my styles (color and format).you can ignore this or replace what suits you.
I have added a checkExceptions method which takes care of wrapping xml illegal characters that are not illegal in db.
For instance: if you have "&" in your data and you remove checkExceptions() below...the xlsx document will be generate
....however...the sheet containing this "&" will not be shown up even if the data was present."&" is not acceptable in xml
and since we are dealing with xmls here (zipped into xlsx) we need to take care of wrapping it up...I may not have covered all
illegal patterns...but that can be added into checkExceptions() as required.

*/

private File writeSheet(short headerStartRow, short numOfColumns, LinkedHashMap columnNames, ResultSet data) throws IOException, SQLException{
File tmp = File.createTempFile("sheet", ".xml");
Writer fw = new FileWriter(tmp);
SpreadsheetWriter sw = new SpreadsheetWriter(fw);
sw.beginSheet();

// insert header row
sw.insertRow(headerStartRow);
int styleIndex = -1;
if (null != styles) styleIndex = styles.get("headerRowStyle").getIndex();

Iterator lhm_iter = columnNames.keySet().iterator();
for(int columnIndex = 0; columnIndex <>
String nxtval = (String)lhm_iter.next();
System.out.println(""+columnIndex+":"+ nxtval);
sw.createCell(columnIndex,nxtval, styleIndex);
}
sw.endRow();
// insert header row

// insert data rows
while(data.next()){
sw.insertRow(++headerStartRow);
for(int columnIndex = 0; columnIndex <>
//String nwStr = (String)data.getString(columnIndex+1);
//if (null == nwStr) nwStr = "";
String nwStr = checkExceptions((String)data.getString(columnIndex+1));
//System.out.println("nwStr: "+nwStr);
if ((headerStartRow %2) == 0){// even # row
if (null != styles) styleIndex = styles.get("dataRowEvenStyle").getIndex();
sw.createCell(columnIndex, nwStr, styleIndex);
}else{ // odd # row
if (null != styles) styleIndex = styles.get("dataRowOddStyle").getIndex();
sw.createCell(columnIndex, nwStr,styleIndex);
}
}
sw.endRow();
}
// end insert data rows

sw.endSheet();

fw.close();
return tmp;
}

/*
Fetches

*/
public void importViewData(String viewName, short startSheetRowNumber) throws NullPointerException, SQLException, SheetNotFoundException, IOException{
logg.info("Start:importViewData(): "+viewName);
/// This is my standalone class that returns resultset and does the connection maintenance work.
ExportDBView eview = new ExportDBView(viewName);

//@throws NullPointer and SQLException
ResultSet rs = eview.initExport();
LinkedHashMap columnNames = eview.getResultSetMetaData();
short col_sz = (short)columnNames.size();
logg.info(viewName+": columnsize: "+col_sz);


//@throws IOException
File tmp = writeSheet((short)startSheetRowNumber, col_sz, columnNames, rs);

// fetch sheetref using viewName
String _sheetref = sheetSheetRefRelation.get(viewName);
if (null == _sheetref) {
eview.cleanup();// cleanup anyway
throw new SheetNotFoundException("viewName does not exist as a Sheet in this workbook");
}
// put the sheet data alongside _sheetref
logg.info("Adding sheetdata for sheetref:"+_sheetref);
this.sheetData.put(_sheetref, tmp);
eview.cleanup();
logg.info("End:importViewData(): "+viewName);
}

private static void copyStream(InputStream in, OutputStream out) throws IOException {
byte[] chunk = new byte[1024];
int count;
while ((count = in.read(chunk)) >=0 ) {
out.write(chunk,0,count);
}
}


/*
Adapted Yegor's substitute() method to add more than 1 sheets created dynamically into workbook
*/
public void flushDataIntoWorkbook() throws ZipException, IOException{
logg.info("Start:flushDataIntoWorkbook(): started writing into workbook...");
OutputStream out = new FileOutputStream(this.xlsxFileName);
ZipFile zip = new ZipFile(new File("template.xlsx"));

ZipOutputStream zos = new ZipOutputStream(out);

@SuppressWarnings("unchecked")
Enumeration en = (Enumeration) zip.entries();
while (en.hasMoreElements()) {
ZipEntry ze = en.nextElement();
Iterator sheetIter = this.sheetData.keySet().iterator();
boolean entryMatches = false;
while(sheetIter.hasNext()){
String entry = sheetIter.next();
entryMatches = ze.getName().equals(entry);
if (entryMatches == true) break;
}
if(!entryMatches){
zos.putNextEntry(new ZipEntry(ze.getName()));
InputStream is = zip.getInputStream(ze);
copyStream(is, zos);
is.close();
}
}

// add all sheets
Iterator sheetIter = this.sheetData.keySet().iterator();
//InputStream is = null;
while(sheetIter.hasNext()){
String key = (String)sheetIter.next();
logg.info("Writing sheet: "+key);
zos.putNextEntry(new ZipEntry(key));
InputStream is = new FileInputStream(this.sheetData.get(key));
copyStream(is, zos);
is.close();
zos.closeEntry();
}
zos.close();
logg.info("End:flushDataIntoWorkbook(): Finished writing...cleaning up resources");
}


/*
* Sample test:
* In my set up I have Oracle 9i setup and 3 views as mentioned below. I extract data from these 3 views and
* and write it to excel sheet. Here is a simple stored procedure I use to export any view given the name:
* ///////////////////////ORACLE STORED PROCEDURE/////////////////////////////////////
* CREATE OR REPLACE PROCEDURE "MYSCHEMANAME"."EXPORT_VIEW"(in_table IN VARCHAR2,
p_cursor OUT SYS_REFCURSOR)
is
begin
open p_cursor FOR
'select * from ' || in_table;
end EXPORT_VIEW;

*
* ///////////////////////////////////////////////////////////////////////////////////
* Note !!: MYSCHEMANAME is the default schema when you log in. EXPORT_VIEW is my procedure name.
* ExportView (see imports above) is a standalone class I use to fetch data as resultset. I used ojdbc14.jar
* for JDBC support.
*
*/
public static void main(String []args){
Excel07WorkbookWriter eww = new Excel07WorkbookWriter();
LinkedHashMap sheets = new LinkedHashMap();
sheets.put("REPORT1_VIEW", null);
sheets.put("REPORT2_VIEW", null);
sheets.put("REPORT3_VIEW", null);
try {
eww.init("myxls.xlsx", sheets, null);
} catch (NullPointerException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

try {
eww.importViewData("REPORT1_VIEW",(short)0);// the view names should match the view names in sheet.put(...) above
eww.importViewData("REPORT2_VIEW",(short)0);
eww.importViewData("REPORT3_VIEW",(short)0);
} catch (NullPointerException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SheetNotFoundException e) { // Custom Exception class written. It extends Exception
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

try {
eww.flushDataIntoWorkbook();
} catch (ZipException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}

}
}






Wednesday, November 19, 2008

Tex.LaTeX..TeX..LaTeX...TeX...LaTeX....TeX....LaTeX.....TeX.....LaTeX

Hi ,
I just finished writing my Masters Thesis report...great relief...some free time to watch the season change...Fall, to be specific. I have a lot of experience writing lifeless documents, but believe me, the last one I did was real piece of satisfaction, no grappling with any formatting mismatches. How did this happen??....OK this is the reason why I am putting together my lazy butt to write this Blog. I was not a great open-source enthusiast or anything...yeah but I love the concept of 'free software'...(of course, if It was not written by me ;o)) ....jokes apart..cheers to the community!!.
I have an experience to share...which many of the document gurus would know since birth...but still just wanted to gather minimum details in simple words for someone impatient like me, unaware of LaTeX.... I am talking about LaTeX....as in Documentation using LaTeX. In this blog I will show you how to setup LaTeX, without spending a cent(पैसा). Without wasting any more time lets get started.

Sitting on a MS Windows machine? does not matter...
Installing LaTeX:
1। First you need to install Miktex on your machine. Dont ask me why!
http://miktex.org/
  • Tip: it takes some time to download....
2. Install a text-editor to edit your documents . I used Crimson Editor which supports formatting for LaTeX documents. Crimson Editor is a quality text editor, in my opinion, and is free!!!
  • Available at: http://www.crimsoneditor.com/
3. Customizing Crimson editor: Crimson provides customizable functions which can be modified as required....simply said, SHORTCUTS
Go To Tools> Preferences>Tools>User Tools
Enter Menu Text, Command, Argument, Initial Directory etc.

LaTeX documents need to be compiled! yeah u heard it right but this little effort is justified....why? you will realize soon!

For example to compile my LaTeX doc and see the resulting PDF I did the following:
  1. Compile TeX file: Produces DVI (device independent output) file. --> abc.dvi
  • Menu text: Compile TeX
  • Command: Path\to\Miktex\bin\directory\latex.exe
  • Argument:$(FileName)
  • Initial dir: $(FileDir)
2. Configure Yap( Yet another Previewer) to preview how ur document looks...

  • Menu text: Open File using Yap
  • Command: Path\to\Miktex\bin\directory\yap.exe
  • Argument:$(FileTitle).dvi
  • Initial dir: $(FileDir)
3. Create quality pdf documents:
  • Menu text: Create PDF
  • Command: Path\to\Miktex\bin\directory\dvipdfm.exe
  • Argument:$(FileTitle).dvi
  • Initial dir: $(FileDir)

Apply > OK

4. Thats it....look up a sample latex doc in google is all u need to produce flawless documentation. By flawless I mean...auto-equation numbering, Table of COntents, Sections, Appendix andBibliography taken care of in a fairly intuitive, organized and reusable way.

5. Oh!! what about diagrams, graphs, charts?? Yes your LaTeX document requires all the images, diagrams charts be available in encapsulated post script (.eps) format....well...help is available on the next line:

Check out ImageMagick: http://www.imagemagick.org/script/binary-releases.php


For example: on a Windows machine
Save Excel charts as .bmp or jpg images using MSPaint. I assume you have downloaded and installed ImageMagick by now,
just open a command prompt and type:
convert abc.jpg -compress none eps2:abc.eps

Now include this figure in your document using this simple macro:
\begin{figure}[htbp]
\centering
\includegraphics[width=12cm, height=9cm]{abc.eps}
\caption{Figure Caption goes here}
\label{fig:reference}
\end{figure}

you are smart enough to customize dimensions, caption and label for figure

Voila! you have ur eps figure ready to be embedded in your document!!



if you chose to write ur thesis using LaTeX, your school will probably have a template, besides a lot of useful online material available..just like this one[;-)]..

So, if you have a comment or two ..would like to hear from you!!

and as all LaTeX guides end with, "Happy LaTeXing!"