Receipt Optical Character Recognition (OCR)¶
Asprise Receipt OCR detects and extracts receipt information from images.
To run receipt OCR, send a POST request containing the image file and any optional setting parameters to our API endpoint. The API returns the results as JSON within seconds.
Receipt OCR endpoints
http://ocr.asprise.com/api/v1/receipt (HTTP)
https://ocr.asprise.com/api/v1/receipt (HTTPS)
http://ocr2.asprise.com/api/v1/receipt (HTTP - backup)
https://ocr2.asprise.com/api/v1/receipt (HTTPS - backup)
You may perform receipt OCR from Windows, macOS and Linux command consoles or from any of your favorite programming languages.
The examples below show how to OCR a receipt from the command line or in C#/VB.NET, Java, JavaScript/Node.js, PHP or Python.
curl -X POST -F "api_key=TEST" -F "recognizer=auto" -F "ref_no=my_ref_123" -F "file=@receipt.jpg" https://ocr.asprise.com/api/v1/receipt
// View complete code at: https://github.com/Asprise/receipt-ocr/tree/main/csharp-vb-net-receipt-ocr
string response = httpPost("https://ocr.asprise.com/api/v1/receipt", // Receipt OCR API endpoint
new NameValueCollection()
{
{"api_key", "TEST"}, // Use 'TEST' for testing purpose
{"recognizer", "auto"}, // can be 'US', 'CA', 'JP', 'SG' or 'auto'
{"ref_no", "ocr_dot_net_123"} // optional caller provided ref code
},
new NameValueCollection() {{"file", "../../receipt.jpg"}} // Modify this to use your own file if necessary
);
Console.WriteLine(response); // Result in JSON
// View complete code at: https://github.com/Asprise/receipt-ocr/tree/main/java-receipt-ocr
/**
* Uploads an image for receipt OCR and gets the result in JSON.
* Required dependencies: org.apache.httpcomponents:httpclient:4.5.13 and org.apache.httpcomponents:httpmime:4.5.13
*/
public class JavaReceiptOcr {
public static void main(String[] args) throws Exception {
String receiptOcrEndpoint = "https://ocr.asprise.com/api/v1/receipt"; // Receipt OCR API endpoint
File imageFile = new File("receipt.jpg");
System.out.println("=== Java Receipt OCR Demo - Need help? Email support@asprise.com ===");
try (CloseableHttpClient client = HttpClients.createDefault()) {
HttpPost post = new HttpPost(receiptOcrEndpoint);
post.setEntity(MultipartEntityBuilder.create()
.addTextBody("api_key", "TEST") // Use 'TEST' for testing purpose
.addTextBody("recognizer", "auto") // can be 'US', 'CA', 'JP', 'SG' or 'auto'
.addTextBody("ref_no", "ocr_java_123") // optional caller provided ref code
.addPart("file", new FileBody(imageFile)) // the image file
.build());
try (CloseableHttpResponse response = client.execute(post)) {
System.out.println(EntityUtils.toString(response.getEntity())); // Receipt OCR result in JSON
}
}
}
}
// View complete code at: https://github.com/Asprise/receipt-ocr/tree/main/javascript-nodejs-receipt-ocr
console.log("=== JavaScript/Node.js Receipt OCR Demo - Need help? Email support@asprise.com ===");
var receiptOcrEndpoint = 'https://ocr.asprise.com/api/v1/receipt';
var imageFile = 'receipt.jpg'; // Modify this to use your own file if necessary
var fs = require('fs');
var request = require('request');
request.post({
url: receiptOcrEndpoint,
formData: {
api_key: 'TEST', // Use 'TEST' for testing purpose
recognizer: 'auto', // can be 'US', 'CA', 'JP', 'SG' or 'auto'
ref_no: 'ocr_nodejs_123', // optional caller provided ref code
file: fs.createReadStream(imageFile) // the image file
},
}, function(error, response, body) {
if(error) {
console.error(error);
}
console.log(body); // Receipt OCR result in JSON
});
<?php // View complete code at: https://github.com/Asprise/receipt-ocr/tree/main/php-receipt-ocr
function receiptOcr($imageFile) {
$receiptOcrEndpoint = 'https://ocr.asprise.com/api/v1/receipt'; // Receipt OCR API endpoint
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $receiptOcrEndpoint);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array(
'api_key' => 'TEST', // Use 'TEST' for testing purpose
'recognizer' => 'auto', // can be 'US', 'CA', 'JP', 'SG' or 'auto'
'ref_no' => 'ocr_php_123', // optional caller provided ref code
'file' => curl_file_create($imageFile) // the image file
));
$result = curl_exec($ch);
if(curl_errno($ch)){
throw new Exception(curl_error($ch));
}
echo $result; // result in JSON
}
print("=== PHP Receipt OCR Demo - Need help? Email support@asprise.com ===\n");
receiptOcr('receipt.jpg'); // Modify this to use your own file if necessary
# View complete code at: https://github.com/Asprise/receipt-ocr/tree/main/python-receipt-ocr
import requests
print("=== Python Receipt OCR Demo - Need help? Email support@asprise.com ===")
receiptOcrEndpoint = 'https://ocr.asprise.com/api/v1/receipt' # Receipt OCR API endpoint
imageFile = "receipt.jpg" # Modify this to use your own file if necessary
with open(imageFile, "rb") as f: # the file is closed once the upload completes
    r = requests.post(receiptOcrEndpoint, data = {
        'api_key': 'TEST', # Use 'TEST' for testing purpose
        'recognizer': 'auto', # can be 'US', 'CA', 'JP', 'SG' or 'auto'
        'ref_no': 'ocr_python_123', # optional caller provided ref code
    }, files = {"file": f})
print(r.text) # result in JSON
The complete source code of the receipt OCR sample programs in C#, Java, JavaScript, PHP and Python can be found at github.com/Asprise/receipt-ocr
Request Parameters¶
When sending a receipt OCR request, you may pass along the following parameters:
api_key (string, required)
recognizer (string, required)
file (file, required)
ref_no (string, optional)
mapping_rule_set (string, optional)
api_key¶
api_key
(string, required) is used to identify the client who makes the OCR request. If you don’t have one, you may simply set it to TEST.
recognizer¶
A recognizer is implemented as a set of machine learning algorithms that optimizes the receipt recognition for a particular country or a specific scenario.
You use recognizer
(string, required) to select the recognizer to be used for the given receipt.
Country-specific recognizers offered by the OCR API:
AU
for recognizing receipts from Australia
DE
for recognizing receipts from Germany
GB
for recognizing receipts from the United Kingdom
JP
for recognizing receipts from Japan
MY
for recognizing receipts from Malaysia
SG
for recognizing receipts from Singapore
US
for recognizing receipts from the United States
If all the receipts you need to recognize are from a single country, you may simply set recognizer
to one of the values above.
If you need to recognize receipts from any country not in the above list, please contact us so that we can add it for you.
Multiple countries¶
If the receipts are from two or more countries, you can set recognizer
to a comma-separated list of country codes.
For example, if the receipts are from either Germany or the UK, recognizer
should be set to DE,GB.
When a receipt is detected, the OCR API will first select the top match from the list of the recognizers. The selected recognizer is then used to recognize the receipt.
“auto”¶
When recognizer
is set to auto, the OCR API will try to find a top match from all of the available recognizers.
This is a convenient value if you aren’t sure where a receipt is from. However, it comes at a cost - it is usually slower as the OCR API needs to find a match among all the recognizers. Always specify a recognizer or a list of recognizers if you can.
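The rules above for choosing a recognizer value can be sketched in Python as follows. This is only an illustration: `build_recognizer` and `SUPPORTED_RECOGNIZERS` are hypothetical names that are not part of the OCR API, and the set of codes simply mirrors the list in this section, so it may be incomplete.

```python
# Recognizer codes listed in this section (assumption: the list may be incomplete).
SUPPORTED_RECOGNIZERS = {"AU", "DE", "GB", "JP", "MY", "SG", "US"}


def build_recognizer(country_codes):
    """Return a value for the `recognizer` request parameter.

    Hypothetical helper for illustration; not part of the OCR API.
    """
    # Normalize codes, drop blanks and duplicates, keep the caller's order.
    codes = []
    for code in country_codes:
        code = code.strip().upper()
        if code and code not in codes:
            codes.append(code)
    if not codes:
        # Origin unknown: let the API search all available recognizers.
        return "auto"
    unknown = [c for c in codes if c not in SUPPORTED_RECOGNIZERS]
    if unknown:
        raise ValueError("unsupported recognizer code(s): " + ", ".join(unknown))
    return ",".join(codes)


print(build_recognizer(["de", "GB"]))  # DE,GB
print(build_recognizer([]))            # auto
```

Falling back to auto only when no country is known follows the advice above to specify a recognizer or a list of recognizers whenever possible.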
file¶
This is the image file that contains one or more receipts. Supported file formats:
JPEG
PNG
TIFF
ref_no¶
You use ref_no
(string, optional) to identify an OCR request for your own reference if necessary. ref_no
from the request will be copied to the response.
It doesn’t affect the OCR process in any way.
mapping_rule_set¶
Mapping rule sets can be used to post-process receipts after they have been recognized. For example, a mall operator may use a mapping rule set to identify each store accurately by matching the merchant address (unit number) or phone number, and set a custom merchant id property accordingly.
A mapping rule set defines a set of rules. Each rule defines matching criteria and properties to be set if a receipt is matched.
You use mapping_rule_set
(string, optional) to specify the id of the receipt mapping rule set that should be applied to a receipt
after it has been recognized.
Before you can use mapping_rule_set
to specify a rule set, you must first define it and
then submit it to the OCR API.
Define mapping rule sets¶
A mapping rule set is represented in JSON. Below is a sample:
{
"mapping_rule_set_id": "MY_MALL",
"rules": [
{
"matching" : {
"merchant_name": "MCD",
"merchant_tax_reg_no" : "TAX1234"
},
"set_props": {
"merchant_name": "McDonald's Restaurant",
"my_custom_store_id": "mcdonald_123",
"my_custom_prop": "US Food"
}
},
{
"matching" : {
"merchant_phone": "6362"
},
"set_props": {
"merchant_name": "Another Great Store"
}
}
]
}
mapping_rule_set_id
(required, string, min 2 characters) - the id of the rule set; rules
- list of rules as an array.
Each rule object contains two main parts:
matching
(the matching criteria) contains pairs of receipt property name and keyword. At runtime, the OCR API tests each pair to see
whether the value of the corresponding property of a receipt contains the specified keyword (case-insensitive).
The minimum keyword length is 2 characters.
By default, a receipt matches a rule if at least one such pair matches.
For the list of supported property names, please refer to Receipt Object.
set_props
contains properties to be set only if a receipt is matched. The properties can be either the standard
receipt properties as defined in Receipt Object or your own custom properties.
By default, the OCR API will stop further matching once a rule has been matched for a receipt. You may use stop_if_matched
to change this behavior. The default value of stop_if_matched
for a rule is true
. Setting it to false
allows the OCR API to continue matching even if the current rule is matched.
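To make the matching behavior described above concrete, here is a minimal local simulation in Python. It is a sketch of the documented semantics only, not the OCR API's implementation: `rule_matches` and `apply_rule_set` are hypothetical names, and it assumes that rules are tested in array order against the receipt as originally recognized.

```python
def rule_matches(rule, receipt):
    """True if at least one (property, keyword) pair matches the receipt."""
    for prop, keyword in rule["matching"].items():
        value = receipt.get(prop)
        # Case-insensitive "contains" test, as described above.
        if value is not None and keyword.lower() in str(value).lower():
            return True
    return False


def apply_rule_set(rule_set, receipt):
    """Return a copy of `receipt` with set_props of the matched rules applied."""
    result = dict(receipt)
    for rule in rule_set["rules"]:
        # Assumption: rules are tested against the receipt as recognized.
        if rule_matches(rule, receipt):
            result.update(rule["set_props"])
            if rule.get("stop_if_matched", True):  # documented default: true
                break
    return result


rule_set = {
    "mapping_rule_set_id": "MY_MALL",
    "rules": [
        {"matching": {"merchant_name": "MCD", "merchant_tax_reg_no": "TAX1234"},
         "set_props": {"merchant_name": "McDonald's Restaurant",
                       "my_custom_store_id": "mcdonald_123"}},
        {"matching": {"merchant_phone": "6362"},
         "set_props": {"merchant_name": "Another Great Store"}},
    ],
}

receipt = {"merchant_name": "McD Toa Payoh", "merchant_phone": "62596362"}
mapped = apply_rule_set(rule_set, receipt)
print(mapped["merchant_name"])       # McDonald's Restaurant
print(mapped["my_custom_store_id"])  # mcdonald_123
```

In this example the second rule would also match on the phone number, but it is never tested because the first rule matched and stop_if_matched defaults to true.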
You draft a receipt mapping rule set in a JSON file. Keeping rule set JSON files in a version control system such as Git will help you track all the changes.
To get started, you may refer to the following samples: Sample 1 | Sample 2
When writing rules, we recommend using an editor that provides code assist and validation against the JSON schema: Receipt Mapping Rule Set JSON Schema. Attributes that are not defined in the schema will be ignored, and no error will be thrown.
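If no schema-aware editor is at hand, a few of the constraints stated in this section can be checked with a short script before submitting. The sketch below is not a substitute for the JSON schema: `validate_rule_set` is a hypothetical helper that only checks the minimum lengths of mapping_rule_set_id and of keywords, and that rules is an array.

```python
import json


def validate_rule_set(rule_set):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    rule_set_id = rule_set.get("mapping_rule_set_id")
    if not isinstance(rule_set_id, str) or len(rule_set_id) < 2:
        problems.append("mapping_rule_set_id must be a string of at least 2 characters")
    rules = rule_set.get("rules")
    if not isinstance(rules, list):
        problems.append("rules must be an array")
        return problems
    for index, rule in enumerate(rules):
        for prop, keyword in (rule.get("matching") or {}).items():
            if not isinstance(keyword, str) or len(keyword) < 2:
                problems.append(
                    f"rules[{index}].matching.{prop}: keyword must have at least 2 characters"
                )
    return problems


# json.loads also rejects malformed JSON, such as trailing commas.
rule_set = json.loads("""
{
  "mapping_rule_set_id": "MY_MALL",
  "rules": [
    {
      "matching": {"merchant_phone": "6362"},
      "set_props": {"merchant_name": "Another Great Store"}
    }
  ]
}
""")
print(validate_rule_set(rule_set))  # []
```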
Submit mapping rule sets¶
After defining a rule set, you need to submit it to the OCR API to take effect.
Using Web GUI¶
Visit the web GUI URL we provided to you:
Enter your API key, keep the endpoint URL, select mapping_rule_set_update as the action, and paste your entire rule set into the content box (alternatively, drag and drop your JSON file onto the web page to set the content). Then click ‘Execute Action’.
Once a rule set has been submitted, it takes effect immediately. To delete it, set the content to a rule set with no rules defined (mapping_rule_set_id must still be present).
Using the REST API¶
If you need to frequently update rule sets or you want to automate, you may use the REST API to do so.
To create or update a receipt mapping rule set, please make a POST request to https://ocr.asprise.com/api/v1/receipt with the following parameters:
api_key
your API key
action
must be set to mapping_rule_set_update
content
the entire content of the rule set
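The request described above might be sent from Python roughly as follows, using only the standard library. This is a sketch under assumptions: `build_update_request` and `submit_rule_set` are hypothetical helper names, and sending the three parameters as a form-encoded body is an assumption, since the expected encoding is not stated here.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://ocr.asprise.com/api/v1/receipt"


def build_update_request(api_key, rule_set, endpoint=ENDPOINT):
    """Build the POST request that creates or updates a mapping rule set."""
    fields = {
        "api_key": api_key,
        "action": "mapping_rule_set_update",
        "content": json.dumps(rule_set),  # the entire content of the rule set
    }
    # Assumption: the parameters are accepted as a form-encoded body.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(endpoint, data=body, method="POST")


def submit_rule_set(api_key, rule_set):
    """Send the request and return (HTTP status, response text)."""
    request = build_update_request(api_key, rule_set)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status, response.read().decode("utf-8")
```

A call would then look like `status, text = submit_rule_set("YOUR_API_KEY", rule_set)`, where `rule_set` is the parsed content of your rule set JSON file.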
Receipt OCR Results¶
When the OCR API receives a receipt OCR request, it processes the request. On failure (e.g., missing required request parameters), the OCR API responds with an error message and HTTP status code 400 (Bad Request). On success, it returns the result as JSON with HTTP status code 200.
Below is a sample JSON result:
{
"request_id" : "...",
"ref_no" : "123",
"file_name" : "receipt.jpg",
"request_received_on" : 1610077103664,
"success" : true,
"recognition_completed_on" : 1610077104172,
"receipts" : [ {
"merchant_name" : "Merchant A", // receipt object #1
...
}, {
"merchant_name" : "Merchant B", // receipt object #2
...
} ]
}
Top level result properties include:
request_id System generated ID
ref_no the reference number passed in the request by the client
file_name Name of the uploaded image file
request_received_on Epoch time in milliseconds when the request is received
success Whether the OCR is performed successfully
recognition_completed_on Epoch time in milliseconds when the OCR is complete
receipts an array of receipt objects; each receipt is represented by a receipt object
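As an illustration of how a client might consume these top-level properties, the Python sketch below checks the HTTP status code, parses the JSON body, and derives the processing time from the two timestamps. `summarize_result` is a hypothetical helper, and the sample body is a trimmed version of the sample result above.

```python
import json


def summarize_result(status_code, body):
    """Summarize an OCR response; raise if the request was rejected."""
    if status_code != 200:
        raise RuntimeError(f"OCR request failed (HTTP {status_code}): {body}")
    result = json.loads(body)
    # Both timestamps are epoch times in milliseconds.
    elapsed_ms = result["recognition_completed_on"] - result["request_received_on"]
    return {
        "ref_no": result.get("ref_no"),
        "success": result.get("success", False),
        "elapsed_ms": elapsed_ms,
        "receipt_count": len(result.get("receipts") or []),
    }


sample_body = """
{
  "request_id": "...",
  "ref_no": "123",
  "file_name": "receipt.jpg",
  "request_received_on": 1610077103664,
  "success": true,
  "recognition_completed_on": 1610077104172,
  "receipts": [{"merchant_name": "Merchant A"}, {"merchant_name": "Merchant B"}]
}
"""
print(summarize_result(200, sample_body))
# {'ref_no': '123', 'success': True, 'elapsed_ms': 508, 'receipt_count': 2}
```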
Receipt Object¶
A receipt object has many properties and owns items. Sample receipt object:
{
"merchant_name" : "McDonald's",
"merchant_address" : "600 @ Toa Payoh #01-02, Singapore 319515",
"merchant_phone" : "62596362",
"merchant_website" : null,
"merchant_tax_reg_no" : "M2-0023981-4",
"merchant_company_reg_no" : null,
"merchant_logo" : null,
"region" : null,
"mall" : "600 @ Toa Payoh",
"country" : "SG",
"receipt_no" : "002201330026",
"date" : "2016-01-13",
"time" : "15:49",
"items" : [ {
"amount" : 2.95,
"description" : "Med Ice Lemon Tea",
"flags" : "",
"qty" : 1,
"remarks" : null,
"unitPrice" : null
}, {
"amount" : 2.40,
"description" : "Coffee with Milk",
"flags" : "",
"qty" : 1,
"remarks" : null,
"unitPrice" : null
} ],
"currency" : "SGD",
"total" : 5.35,
"subtotal" : null,
"tax" : 0.35,
"service_charge" : null,
"tip" : null,
"payment_method" : "cash",
"payment_details" : null,
"credit_card_type" : null,
"credit_card_number" : null,
"ocr_text" : "...",
"ocr_confidence" : 96.82,
"width" : 1940,
"height" : 2395,
"avg_char_width" : null,
"avg_line_height" : null,
"source_locations" : {
"date" : [ [
{ "x" : 1024, "y" : 1396 },
{ "x" : 1971, "y" : 1390 },
{ "x" : 1972, "y" : 1522 },
{ "x" : 1024, "y" : 1528 }
] ],
"total" : [ [
{ "x" : 1909, "y" : 1958 },
{ "x" : 2123, "y" : 1955 },
{ "x" : 2124, "y" : 2057 },
{ "x" : 1910, "y" : 2060 }
] ]
}
}
Receipt object properties include:
Name | Description |
---|---|
merchant_name | Name of the merchant |
merchant_address | Address of the merchant |
merchant_phone | Phone number |
merchant_website | Website if any |
merchant_tax_reg_no | Tax registration number |
merchant_company_reg_no | Company registration number |
merchant_logo | URL of the merchant logo image |
region | Region or area |
mall | Mall |
country | Two-letter country code |
receipt_no | Receipt number (can be used for duplicate detection) |
date | Date of the receipt |
time | Time of the receipt if available |
items | An array of line item objects (see below for more details) |
currency | Currency used |
total | Total amount |
subtotal | Subtotal |
tax | Tax |
service_charge | Service charge amount |
tip | Tip amount |
payment_method | Payment method: cash, credit card, etc. |
payment_details | Payment details |
credit_card_type | Credit card type: amex, master, visa |
credit_card_number | Usually the last 4 digits of the credit card number |
ocr_text | The complete text with layout retention |
ocr_confidence | A number less than 100; the higher the better |
width | Width of the input image in pixels |
height | Height of the input image in pixels |
source_locations | Map of key fields to polygon locations where the values are retrieved from |
Note that not all properties are present on all receipts.
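Because properties may be absent or null, client code should read them defensively. The Python sketch below shows one way to turn the polygons in source_locations into axis-aligned bounding boxes, for example to highlight a field on the image. `bounding_box` and `field_boxes` are hypothetical helpers, and the data is taken from the sample receipt object above.

```python
def bounding_box(polygon):
    """Return (left, top, right, bottom) for a list of {"x", "y"} points."""
    xs = [point["x"] for point in polygon]
    ys = [point["y"] for point in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def field_boxes(receipt, field):
    """Bounding boxes for a field, or an empty list if it has no location."""
    locations = receipt.get("source_locations") or {}
    return [bounding_box(polygon) for polygon in locations.get(field, [])]


receipt = {
    "date": "2016-01-13",
    "source_locations": {
        "date": [[
            {"x": 1024, "y": 1396},
            {"x": 1971, "y": 1390},
            {"x": 1972, "y": 1522},
            {"x": 1024, "y": 1528},
        ]],
    },
}
print(field_boxes(receipt, "date"))  # [(1024, 1390, 1972, 1528)]
print(field_boxes(receipt, "tax"))   # []
```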
Line Item Object¶
Properties of a line item object:
Name | Description |
---|---|
amount | Amount of the line item |
description | Description |
flags | Text after amount indicating tax status |
qty | Quantity |
remarks | Remarks |
unitPrice | Unit price |
If you need to recognize other properties, please get in touch with us.
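As a small usage example for line item objects, the Python sketch below adds up the amounts of a receipt's items while skipping missing values. `items_total` is a hypothetical helper; it uses Decimal to avoid binary floating-point rounding when summing currency amounts. In the sample receipt object above the line amounts add up to total, but whether that holds in general (for instance when tax, service charge, or tip are listed separately) is not stated in this document.

```python
from decimal import Decimal


def items_total(receipt):
    """Sum the amounts of all line items, ignoring missing amounts."""
    total = Decimal("0")
    for item in receipt.get("items") or []:
        amount = item.get("amount")
        if amount is not None:
            # Convert via str to keep the decimal digits as shown in the JSON.
            total += Decimal(str(amount))
    return total


receipt = {
    "items": [
        {"amount": 2.95, "description": "Med Ice Lemon Tea", "qty": 1, "unitPrice": None},
        {"amount": 2.40, "description": "Coffee with Milk", "qty": 1, "unitPrice": None},
    ],
    "total": 5.35,
}
print(items_total(receipt))                                    # 5.35
print(items_total(receipt) == Decimal(str(receipt["total"])))  # True
```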