Useful Information For Developers: How to extract pages from a PDF document

One of the great things about being a programmer is that when you need software that you don’t have, you can usually write a small utility application to do what you need without having to purchase the software.

If you own Adobe Acrobat ($299 USD) or Foxit Editor ($99 USD), you can simply right-click to extract pages from an existing PDF into a new PDF document. If you don’t want to spend the money, though, you can always write your own code to perform the same task.

I had used iTextSharp in a previous application to export documents to PDF, so I was already familiar with the iTextSharp project. In that application, I created PDF documents in paper sizes ranging from letter (8 1/2 x 11) and legal up to large-format plotter sizes such as D-size and E-size pages. The library worked great and produced near-perfect PDF documents of our displays.

Today, I needed to extract a couple of pages from a PDF file to create a new PDF document, and I realized that I didn’t have a PDF editor. So I downloaded the iTextSharp library and wrote a small program to do the work for me.

I’ll only briefly go over the basic program setup:

- I used a C# console application as my starting point.
- There are four command-line arguments: input file, output file, starting page, and ending page.
- Validate that the input file exists.
- Validate that the input file is a PDF document.
- Validate that the starting and ending page numbers are valid.
- Add two using directives for iTextSharp.text and iTextSharp.text.pdf.
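Here is a minimal sketch of the validation steps listed above. The class and method names (ExtractArgs, Validate, LooksLikePdf) are hypothetical, and checking for the "%PDF-" header at the start of the file is an assumed way of deciding whether the input is a PDF document; checking the .pdf extension would be another option. It only needs System and System.IO, so it compiles without iTextSharp.

```csharp
using System;
using System.IO;

// Hypothetical helper for the four command-line arguments:
// input file, output file, starting page and ending page.
public static class ExtractArgs
{
    // Returns null when the arguments are usable, otherwise a message for the user.
    public static string Validate(string[] args, out int start, out int end)
    {
        start = 0;
        end = 0;
        if (args == null || args.Length != 4)
            return "Usage: ExtractPages <input.pdf> <output.pdf> <startPage> <endPage>";

        string inputFile = args[0];
        if (!File.Exists(inputFile))
            return "Input file not found: " + inputFile;

        if (!LooksLikePdf(inputFile))
            return "Input file is not a PDF document: " + inputFile;

        if (!int.TryParse(args[2], out start) || !int.TryParse(args[3], out end))
            return "Starting and ending pages must be whole numbers.";

        // The upper bound is not checked here because the page count
        // is only known once the PDF has been opened.
        if (start < 1 || end < start)
            return "Page numbers must satisfy 1 <= start <= end.";

        return null;
    }

    // Assumption: a PDF is recognized by the "%PDF-" bytes at the start of the file.
    public static bool LooksLikePdf(string path)
    {
        byte[] expected = { (byte)'%', (byte)'P', (byte)'D', (byte)'F', (byte)'-' };
        byte[] header = new byte[expected.Length];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = 0;
            while (read < header.Length)
            {
                int n = fs.Read(header, read, header.Length - read);
                if (n == 0) return false; // file shorter than the header
                read += n;
            }
        }
        for (int i = 0; i < expected.Length; i++)
        {
            if (header[i] != expected[i]) return false;
        }
        return true;
    }
}
```

In Main you would call Validate(args, out start, out end), print the message and exit if it is not null, and otherwise pass args[0], args[1], start and end on to ExtractPages. The ending page is compared against the document’s NumberOfPages inside ExtractPages.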

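Okay, here’s the primary method of the application. You can see the four input parameters, and the code comments should provide enough information to walk you through the steps.

```csharp
// requires these using directives at the top of the file
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

private static void ExtractPages(string inputFile, string outputFile,
    int start, int end)
{
    // get input document
    PdfReader inputPdf = new PdfReader(inputFile);

    // retrieve the total number of pages
    int pageCount = inputPdf.NumberOfPages;

    // don't read past the last page of the input document
    if (end > pageCount)
    {
        end = pageCount;
    }

    // create the output document, initially sized like the first input page
    Document inputDoc =
        new Document(inputPdf.GetPageSizeWithRotation(1));

    // create the filestream
    using (FileStream fs = new FileStream(outputFile, FileMode.Create))
    {
        // create the output writer
        PdfWriter outputWriter = PdfWriter.GetInstance(inputDoc, fs);
        inputDoc.Open();
        PdfContentByte cb1 = outputWriter.DirectContent;

        // copy pages from input to output document
        for (int i = start; i <= end; i++)
        {
            // match the output page size (including rotation) to the input page
            inputDoc.SetPageSize(inputPdf.GetPageSizeWithRotation(i));
            inputDoc.NewPage();

            PdfImportedPage page =
                outputWriter.GetImportedPage(inputPdf, i);
            int rotation = inputPdf.GetPageRotation(i);

            if (rotation == 90)
            {
                // quarter turn clockwise, shifted up by the rotated page height
                cb1.AddTemplate(page, 0, -1f, 1f, 0, 0,
                    inputPdf.GetPageSizeWithRotation(i).Height);
            }
            else if (rotation == 270)
            {
                // quarter turn counterclockwise, shifted right by the rotated page width
                cb1.AddTemplate(page, 0, 1f, -1f, 0,
                    inputPdf.GetPageSizeWithRotation(i).Width, 0);
            }
            else
            {
                // unrotated page: copy it with the identity matrix
                cb1.AddTemplate(page, 1f, 0, 0, 1f, 0, 0);
            }
        }

        // close the output document once, after all pages have been copied
        inputDoc.Close();
    }

    // release the input file
    inputPdf.Close();
}
```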
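The six numbers passed to AddTemplate after the page are the entries a, b, c, d, e, f of a PDF transformation matrix, which maps a point (x, y) of the imported page to (x', y') on the output page. Working through the 90° case shows why the extra offset is needed. The sketch assumes the input page’s media box starts at (0, 0), and W and H are just labels for the unrotated page’s width and height, so GetPageSizeWithRotation(i) is H wide and W tall.

```latex
\begin{aligned}
x' &= a\,x + c\,y + e, \qquad y' = b\,x + d\,y + f \\[4pt]
(a,b,c,d,e,f) &= (0,\,-1,\,1,\,0,\,0,\,W) \;\Rightarrow\; x' = y, \quad y' = W - x \\[4pt]
(0,0) &\mapsto (0,W), \quad (W,0) \mapsto (0,0), \quad (0,H) \mapsto (H,W), \quad (W,H) \mapsto (H,0)
\end{aligned}
```

The four corners land exactly on the H by W output page, turned a quarter turn clockwise, which matches how a page with a rotation of 90° is displayed. For unrotated pages, (1, 0, 0, 1, 0, 0) is the identity, x' = x and y' = y. A page with a rotation of 270° is displayed turned the other way, so it needs the opposite quarter turn, (0, 1, -1, 0, H, 0), which gives x' = H - y and y' = x.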

Wednesday, June 8, 2011
