Freelance

Data Extraction From PDF Invoice


Introduction

This project covers tabular data extraction from PDF invoices. It came from an Australian client who runs a store and buys his stock in bulk from warehouses, which send their invoices to him by email. He wanted an automated process that extracts the data from the invoices arriving in his inbox, puts it into tabular form, and writes it to CSV files. Those CSV files are then used for data visualization, so that he gets a clear picture of his purchasing.


The invoices came in several different layouts, as you can see at the link below, so I wrote a separate extraction process for each invoice type. I tested each process on multiple copies, including PDFs that contain several tables, in which case the process extracts each required table.
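As a rough illustration of picking out only the required tables from a PDF that contains several, here is a minimal sketch. It assumes the tables have already been extracted into lists of rows by some PDF table library, and that a required table can be recognised by the column names in its first row. The function name select_required_tables and the column names are hypothetical and are not taken from the actual project code.

```python
from typing import Iterable, List

Table = List[List[str]]


def normalize(cell) -> str:
    """Lower-case and trim a cell so header comparison ignores case and spacing."""
    return (cell or "").strip().lower()


def select_required_tables(tables: Iterable[Table], required_headers: Iterable[str]) -> List[Table]:
    """Keep only the tables whose first row contains every required column name.

    Assumption: the first row of each extracted table is its header row.
    """
    wanted = {normalize(h) for h in required_headers}
    selected = []
    for table in tables:
        if not table:
            continue  # skip empty tables produced by the extractor
        header = {normalize(c) for c in table[0]}
        if wanted <= header:
            selected.append(table)
    return selected
```

For example, calling select_required_tables(tables, ["item", "qty"]) would keep a line-item table and drop an unrelated table such as bank details, as long as the headers match the assumed names.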


I have uploaded small pieces of the complete software to show how it works. The full software monitors the client's email account; when a required email is received, it downloads the PDF attachment. It has a separate function for each invoice type: it detects the type of the invoice, runs the matching function, and writes the CSV to a defined directory.
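The flow described above (pull the PDF out of an email, detect the invoice type, run the matching function, write a CSV) could be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions, not the project's actual code: the marker strings, the supplier names, the delimiters, and the functions detect_invoice_type, parse_delimited and process_invoice are all hypothetical, and the invoice text is assumed to have been extracted from the PDF already by a separate step.

```python
import csv
import email
from pathlib import Path

# Hypothetical marker text per invoice type; real invoices need their own markers.
INVOICE_MARKERS = {
    "supplier_a": "SUPPLIER A PTY LTD",
    "supplier_b": "SUPPLIER B WHOLESALE",
}


def extract_pdf_attachments(raw_message: bytes):
    """Return (filename, bytes) for every PDF attachment in a raw email message."""
    message = email.message_from_bytes(raw_message)
    attachments = []
    for part in message.walk():
        if part.get_content_type() == "application/pdf":
            name = part.get_filename() or "invoice.pdf"
            attachments.append((name, part.get_payload(decode=True)))
    return attachments


def detect_invoice_type(text: str):
    """Return the first invoice type whose marker appears in the text, else None."""
    upper = text.upper()
    for invoice_type, marker in INVOICE_MARKERS.items():
        if marker in upper:
            return invoice_type
    return None


def parse_delimited(text: str, delimiter: str):
    """Hypothetical parser: keep lines that split into exactly three cells."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.split(delimiter)]
        if len(cells) == 3:
            rows.append(cells)
    return rows


# One parsing function per invoice type (assumed layouts, for illustration only).
PARSERS = {
    "supplier_a": lambda text: parse_delimited(text, "|"),
    "supplier_b": lambda text: parse_delimited(text, ";"),
}


def process_invoice(text: str, stem: str, out_dir) -> Path:
    """Detect the invoice type, run its parser, and write the rows to a CSV file."""
    invoice_type = detect_invoice_type(text)
    if invoice_type is None:
        raise ValueError("unrecognised invoice type")
    rows = PARSERS[invoice_type](text)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{stem}_{invoice_type}.csv"
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return out_path
```

Keeping the parsers in a dictionary keyed by invoice type means a new invoice layout only needs a new marker and a new parsing function, without changes to the rest of the flow.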


Click here to see the code on GitHub