public class PDFReader
extends java.lang.Object
PDFReader reader = new PDFReader(new File("my.pdf"));
reader.open(); // open the file.
int pages = reader.getNumberOfPages();
for (int i = 0; i < pages; i++) {
    String text = reader.extractTextFromPage(i);
    System.out.println("Page " + i + ": " + text);
}
... // perform other operations on pages.
reader.close(); // finally, close the file.
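If extraction throws, the snippet above never reaches close(). One way to guard against that is a try/finally block, sketched below. So that it compiles without the library on the classpath, the sketch is written against a hypothetical PageTextSource interface that only mirrors the method signatures documented on this page; PageTextSource, FakeSource, and readAllPages are illustrative names and not part of the PDFReader API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch: read the text of every page and always release the reader. */
final class ExtractAllText {

    /**
     * Hypothetical stand-in that mirrors the documented PDFReader signatures.
     * It is an assumption made for this sketch, not a type from the library.
     */
    interface PageTextSource {
        void open() throws IOException;
        int getNumberOfPages();
        String extractTextFromPage(int pageIndex) throws IOException;
        void close() throws IOException;
    }

    /** Opens the source, collects the text of each page, and closes it even on failure. */
    static List<String> readAllPages(PageTextSource reader) throws IOException {
        List<String> pages = new ArrayList<>();
        reader.open();
        try {
            // Page indexes are zero based: the first page is page 0.
            for (int i = 0; i < reader.getNumberOfPages(); i++) {
                pages.add(reader.extractTextFromPage(i));
            }
        } finally {
            reader.close(); // runs whether or not extraction threw
        }
        return pages;
    }

    /** In-memory fake used only to exercise the helper. */
    static final class FakeSource implements PageTextSource {
        private final String[] texts;
        boolean closed;

        FakeSource(String... texts) { this.texts = texts; }

        public void open() { closed = false; }
        public int getNumberOfPages() { return texts.length; }
        public String extractTextFromPage(int pageIndex) { return texts[pageIndex]; }
        public void close() { closed = true; }
    }

    public static void main(String[] args) throws IOException {
        FakeSource source = new FakeSource("first page", "second page");
        List<String> pages = readAllPages(source);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("Page " + i + ": " + pages.get(i));
        }
    }
}
```

With the real class, the same open / try / finally / close shape would apply directly to a PDFReader instance.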
Main features:
Constructor and Description |
---|
PDFReader(java.io.File pdfFile): Creates a new PDF reader for the given PDF file. |
PDFReader(java.io.InputStream pdfStream): Creates a new PDF reader with the specified stream as the input. |
Modifier and Type | Method and Description |
---|---|
void | close(): Closes the PDF and releases the resources used. |
java.lang.String | extractTextFromPage(int pageIndex): Extracts text from the specified page. |
int | getNumberOfPages(): Returns the total number of pages in the PDF. |
java.awt.image.BufferedImage | getPageAsImage(int pageIndex): Renders the specified page at 200 DPI as a buffered image. |
java.awt.image.BufferedImage | getPageAsImage(int pageIndex, int dpi): Renders the specified page at the given DPI as a buffered image. |
java.awt.Rectangle | getPageSize(int pageIndex): Returns the page size of the specified page. |
PDFSecurityObject | getSecurityObject(): Returns the security object. |
static void | main(java.lang.String[] args): A utility that extracts text from a PDF file. |
void | open(): Opens and parses the PDF content. |
void | open(java.lang.String password): Opens and parses the PDF content, using the given password. |
void | savePageAsImageFile(int pageIndex, java.lang.String formatName, java.io.File output): Saves the specified page as an image file with the given format. |
void | setSecurityObject(PDFSecurityObject securityObject): If the PDF is encrypted, supply a security object to 'unlock' the PDF before calling open(). |
public PDFReader(java.io.InputStream pdfStream)
Parameters:
pdfStream
public PDFReader(java.io.File pdfFile) throws java.io.FileNotFoundException
Parameters:
pdfFile
Throws:
java.io.FileNotFoundException
public void open() throws java.io.IOException
This method may throw an exception when a PDF page is too complex to rasterize (for example, a page that uses a Type 0 font). In that case, you can use this free utility.
Throws:
java.io.IOException
public void open(java.lang.String password) throws java.io.IOException
This method may throw an exception when a PDF page is too complex to rasterize (for example, a page that uses a Type 0 font). In that case, you can use this free utility.
Parameters:
password - optional password to open the PDF
Throws:
java.io.IOException
public void close() throws java.io.IOException
Throws:
java.io.IOException
public int getNumberOfPages()
public java.awt.image.BufferedImage getPageAsImage(int pageIndex) throws java.io.IOException
Parameters:
pageIndex - zero-based page index, i.e., the first page is page 0.
Throws:
java.io.IOException
public java.awt.image.BufferedImage getPageAsImage(int pageIndex, int dpi) throws java.io.IOException
Parameters:
pageIndex - zero-based page index, i.e., the first page is page 0.
dpi - the DPI, e.g., 72, 100, 200, or 300.
Throws:
java.io.IOException
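To get a feel for how the dpi argument affects the size of the rendered image, the sketch below converts a page dimension in points to pixels. It assumes the page is measured in PDF default user space units (72 points per inch) and that the renderer scales linearly with no extra margins; this page does not state how PDFReader computes the image size, so treat the numbers as an estimate. RenderSize and toPixels are illustrative names.

```java
/** Sketch: estimate the pixel size of a rendered page from its size in points. */
final class RenderSize {

    /** PDF default user space uses 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in points to pixels at the given DPI, rounded to the nearest pixel. */
    static int toPixels(double points, int dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive: " + dpi);
        }
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 inches).
        for (int dpi : new int[] {72, 100, 200, 300}) {
            System.out.println(dpi + " DPI: "
                    + toPixels(612, dpi) + " x " + toPixels(792, dpi) + " px");
        }
    }
}
```

Under these assumptions, memory use grows with the square of the DPI, so doubling the DPI roughly quadruples the number of pixels in the returned BufferedImage.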
public void savePageAsImageFile(int pageIndex, java.lang.String formatName, java.io.File output) throws java.io.IOException
This method may throw an exception when the PDF page is too complex to rasterize (for example, a page that uses a Type 0 font). In that case, you can use this free utility.
Parameters:
pageIndex - zero-based page index, i.e., the first page is page 0.
formatName - valid values are "gif", "jpeg", and "png".
output
Throws:
java.io.IOException
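When more control over the output is needed, an image such as the one returned by getPageAsImage can also be written with the standard javax.imageio API. The sketch below is one way to do that. It uses a blank BufferedImage as a stand-in for a rendered page so that it runs without the library, and it makes no claim about how savePageAsImageFile is implemented internally. PageImageWriter, isWritable, and write are illustrative names.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Sketch: write a rendered page image to disk with the standard ImageIO API. */
final class PageImageWriter {

    /** Returns true if this JVM has an ImageIO writer registered for the format name. */
    static boolean isWritable(String formatName) {
        return ImageIO.getImageWritersByFormatName(formatName).hasNext();
    }

    /** Writes the image in the given format and fails loudly if no writer accepts it. */
    static void write(BufferedImage image, String formatName, File output) throws IOException {
        // ImageIO.write returns false when no appropriate writer is found.
        if (!ImageIO.write(image, formatName, output)) {
            throw new IOException("No ImageIO writer for format: " + formatName);
        }
    }

    public static void main(String[] args) throws IOException {
        // Blank stand-in for an image such as the one returned by getPageAsImage(int).
        BufferedImage page = new BufferedImage(170, 220, BufferedImage.TYPE_INT_RGB);
        for (String format : new String[] {"gif", "jpeg", "png"}) {
            File out = File.createTempFile("page-0-", "." + format);
            out.deleteOnExit();
            write(page, format, out);
            System.out.println(format + ": " + out.length() + " bytes");
        }
    }
}
```

Checking isWritable first gives a clearer error than a silent false return from ImageIO.write when a format name is misspelled.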
public java.lang.String extractTextFromPage(int pageIndex) throws java.io.IOException
Parameters:
pageIndex - zero-based page index, i.e., the first page is page 0.
Throws:
java.io.IOException
public java.awt.Rectangle getPageSize(int pageIndex)
Parameters:
pageIndex - zero-based page index, i.e., the first page is page 0.
public static void main(java.lang.String[] args) throws java.lang.Exception
Parameters:
args
Throws:
java.lang.Exception
public PDFSecurityObject getSecurityObject()
public void setSecurityObject(PDFSecurityObject securityObject)
Parameters:
securityObject