How to turn Thunk into an OCR server

I am working on a project to digitize customs documents, and I can say I was inspired by Thunk. It is almost done. I am writing it in Python.

Step 1: Function1 downloads email attachments from Microsoft Outlook to disk. It automatically creates folders following the pattern ‘\Address\Domain\Sender’sMailAddress’ and saves the attachments in them.
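Here is a minimal sketch of the Step 1 folder layout. It assumes ‘\Address\Domain\Sender’sMailAddress’ means a base folder, then the sender's domain, then the full sender address. The function names `attachment_folder` and `prepare_folder` are hypothetical. The Outlook side (reading mail items through win32com) is left out so the sketch runs anywhere.

```python
from pathlib import Path


def attachment_folder(base_dir, sender_address):
    """Return base_dir/<domain>/<sender address> for one sender.

    Assumption: the address is lowercased so one sender always maps to one folder.
    """
    address = sender_address.strip().lower()
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {sender_address!r}")
    return Path(base_dir) / domain / address


def prepare_folder(base_dir, sender_address):
    """Create the sender's folder (and any missing parents) and return it."""
    folder = attachment_folder(base_dir, sender_address)
    folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return folder
```

With win32com, the folder returned here could be joined with an attachment's `FileName` and passed to the Outlook `Attachment.SaveAsFile` method.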

Step 2: Function2 scans these folders (or subfolders of the declared folders) and uses Tesseract-OCR to determine whether each PDF document is a customs document. The customs documents it finds are collected in another folder.
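The Step 2 scan could look roughly like this. This is a sketch under a few assumptions. The keyword list is hypothetical, because the post does not say how a customs document is recognized. Files are copied rather than moved. The OCR call is passed in as `extract_text`, so the folder-walking logic does not depend on a particular Tesseract wrapper.

```python
import shutil
from pathlib import Path

# Hypothetical keywords; the real rules for recognizing a customs document are not given.
CUSTOMS_KEYWORDS = ("customs declaration", "declaration id", "declarant")


def is_customs_document(text, keywords=CUSTOMS_KEYWORDS):
    """Return True if the OCR text contains any keyword (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def collect_customs_pdfs(root, target, extract_text):
    """Walk root and all subfolders; copy PDFs classified as customs documents into target.

    extract_text(path) -> str is supplied by the caller (e.g. a Tesseract-OCR wrapper).
    """
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    resolved_target = target.resolve()
    found = []
    # sorted() builds the full list first, so files copied during the loop are not revisited.
    for pdf in sorted(Path(root).rglob("*")):
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        if resolved_target in pdf.resolve().parents:
            continue  # skip files that are already in the target folder
        if is_customs_document(extract_text(pdf)):
            found.append(Path(shutil.copy2(pdf, target / pdf.name)))
    return found
```

Two PDFs with the same name from different senders would overwrite each other in the target folder here, which is one reason a renaming step like Step 3 is useful.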

Step 3: Function3 names each file using information from inside the customs document, in the form ‘date_exported-companyName_DeclarationID.PDF’.
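Here is a hedged sketch of the Step 3 naming rule. It reads ‘date_exported-companyName_DeclarationID.PDF’ as three placeholders joined by `-` and `_`. Stripping the characters that Windows forbids in file names is an added assumption, and `customs_filename` is a hypothetical name.

```python
import re

# Characters that Windows does not allow in file names.
_INVALID = re.compile(r'[\\/:*?"<>|]+')


def customs_filename(date_exported, company_name, declaration_id):
    """Build '<date_exported>-<companyName>_<DeclarationID>.PDF' from extracted values."""
    def clean(value):
        return _INVALID.sub("", str(value)).strip()

    parts = [clean(p) for p in (date_exported, company_name, declaration_id)]
    if not all(parts):
        raise ValueError("date, company name and declaration ID are all required")
    date_part, company_part, id_part = parts
    return f"{date_part}-{company_part}_{id_part}.PDF"
```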

Step 4: Function4 reads a manually edited CSV file that records which field is stored at which coordinates, and extracts those fields with PDFMiner (a Python library). (If the PDF contains an image rather than text, another library is used.)

The CSV structure is: field name, then the coordinates x0, y0, x1, y1 of its box:
StunName, x0, y0, x1, y1
Declaration type, 344, 728, 358, 741
Declaration ID, 420, 681, 513, 692
Declaration DATE, 330, 150, 390, 165
Currency of the declaration, 418, 726, 485, 740

Function4 then loops over the CSV rows, writes the field names to an Excel file, and writes the text found at each field's coordinates into the matching column. Another function sends the data on to an ERP program using win32com.
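The write-out at the end of Step 4 has a simple shape: one header row of field names, then one row per document. This sketch uses Python's `csv` module as a stand-in for the Excel file, since real .xlsx output needs a library such as openpyxl. `write_records` is a hypothetical name, and the win32com hand-off to the ERP program is not shown.

```python
import csv


def write_records(stream, field_names, records):
    """Write a header of field names, then one row per record; missing fields become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=field_names)
    writer.writeheader()
    for record in records:
        writer.writerow({name: record.get(name, "") for name in field_names})
```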

Now to the main point. From what I have read on Yahoo and various Medium blogs, the OCR market is expected to reach 40 billion dollars by 2030.

Here is a link: Economic scale of OCR by 2030 (source: "Smart OCR – Advancing the Use of Artificial Intelligence with Open Data", New Jersey State Policy Lab, which cites an estimate by Straits Research).

So the pie is big; you decide whether it is worth the effort.

But what I want to ask is this:

Add a new feature to Thunk (I don’t think the AI can do this by itself):

a- Let’s upload the PDF file to Thunk for training
b- Let’s choose which information is stored at which coordinates (with the mouse, or some other way).
c- Let’s map the coordinates to our own database table.
d- Then, let’s send the document to Thunk via API or by e-mail.
e- Thunk sends the PDF file, together with the fields and coordinates received via API or e-mail, to the AI.
f- Let the AI send the data to our application through our application’s API.
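Steps c and f of the proposal amount to a rename table and a payload. Here is a minimal sketch that assumes a hypothetical `FIELD_TO_COLUMN` mapping and a made-up JSON shape (`documentType`, `fields`). Neither is a Thunk API. A real application's API would define its own body.

```python
import json

# Hypothetical mapping from template field names (from the CSV) to our own database columns.
FIELD_TO_COLUMN = {
    "Declaration ID": "decl_id",
    "Declaration DATE": "decl_date",
    "Currency of the declaration": "currency",
}


def to_db_payload(extracted, mapping=FIELD_TO_COLUMN):
    """Rename extracted fields to database columns; also return field names with no mapping."""
    row = {mapping[k]: v for k, v in extracted.items() if k in mapping}
    unmapped = sorted(k for k in extracted if k not in mapping)
    return row, unmapped


def build_api_body(document_type, row):
    """JSON body that our application's API might receive (hypothetical shape)."""
    return json.dumps({"documentType": document_type, "fields": row}, sort_keys=True)
```

Returning the unmapped field names instead of silently dropping them makes it easy to notice when a template has drifted out of sync with the database table.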

This is one model; in practice there are n document types, n databases, and n customers. Using such a model, can we turn Thunk into an OCR server?

Hi @bayramtag

Thanks for sharing your thoughts. Thunk might come close to what you want. As a start, you might check out this video, which shows a related example (it uses AppSheet, but the same idea applies to any other system that can send and receive messages): Incoming / Outgoing set ups in details - #15 by tony

One thing that’s missing from Thunk right now is handling email attachments. You’d have to include your data in the email body. This is on our roadmap, though, so stay tuned.
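Until attachments are supported, one workaround is to put the extracted data in the email body. This sketch uses Python's standard `email` package. The addresses are placeholders, and the `key="value"` body format is an assumption, not a format Thunk requires.

```python
from email.message import EmailMessage


def fields_email(fields, sender, recipient, subject="Customs document"):
    """Build a plain-text email whose body lists the fields, one key="value" per line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("\n".join(f'{key}="{value}"' for key, value in fields.items()))
    return msg
```

The message could then be sent with `smtplib.SMTP(...).send_message(msg)`.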


Hello @tony,

Thanks for your answer.

1- I had already watched the video; I follow your videos regularly.
2- Thunk receives and processes email. OK.
3- We send a message to Thunk via the API. Thunk processes this too. OK.
4- Thunk sends a message to another application via the API. That’s OK too.
5- Thunk will somehow be able to receive and process incoming e-mail attachments. I don’t think this will be a problem either.
6- However, deciding which API to use for which document type seems harder.

Document and API types

a-) One document type => sent to the database via one API (for example, CVs)
b-) Multiple document types (e.g. contracts in different formats) => sent to the database via one API
c-) One document => many APIs (for example, customs documents)
d-) Documents containing the same information => many APIs (for example, invoices)
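The four cases above can be expressed as one routing table from document type to target APIs. This is a sketch under assumptions: the document-type keys and API names are placeholders, and each target receives its own copy of the fields.

```python
# Hypothetical routing table; document types and API names are placeholders.
ROUTES = {
    "cv": ["hr_api"],                                            # a) one type -> one API
    "contract_format_a": ["contracts_api"],                      # b) several types -> one API
    "contract_format_b": ["contracts_api"],
    "customs": ["erp_api", "accounting_api", "logistics_api"],   # c) one document -> many APIs
    "invoice": ["erp_api", "accounting_api"],                    # d) same information -> many APIs
}


def route(document_type, fields, routes=ROUTES):
    """Return (api_name, fields) pairs to send; raise if the document type has no route."""
    try:
        targets = routes[document_type]
    except KeyError:
        raise ValueError(f"no API configured for document type {document_type!r}") from None
    return [(api, dict(fields)) for api in targets]
```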

This is the real work to worry about: capturing the information in the document with regular expressions and making sure each field matches the right API field.
Either the user does this matching, or it takes a genius idea.
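The two jobs named here are pulling values out with regular expressions and renaming them for each API. Both can be sketched as two small tables. The patterns, field names, and API names below are hypothetical examples for a receipt-like document, not a general solution. Each document type would need its own pair of tables, and that is exactly the user-maintained matching described above.

```python
import re

# Hypothetical patterns for one document layout.
PATTERNS = {
    "receipt_number": re.compile(r"Receipt\s*No[.:]?\s*([A-Z]\d+)", re.IGNORECASE),
    "receipt_date": re.compile(r"Date[.:]?\s*(\d{2}\.\d{2}\.\d{4})", re.IGNORECASE),
}

# Hypothetical per-API field names: which extracted field matches which API field.
API_FIELDS = {
    "erp_api": {"receipt_number": "rcptNumber", "receipt_date": "rcptDate"},
    "accounting_api": {"receipt_number": "docNo"},
}


def extract(text, patterns=PATTERNS):
    """Apply each pattern to the text; fields whose pattern does not match are left out."""
    found = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found


def for_api(api, values, api_fields=API_FIELDS):
    """Rename extracted fields to the names one API expects."""
    return {api_name: values[field] for field, api_name in api_fields[api].items() if field in values}
```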

I don’t know whether this is on your roadmap, but if it is, I’m very curious how you will solve it.

Now imagine this application running: a document is e-mailed to a customer representative, and Thunk tracks it, records it instantly, and adds it to the database.

Think of how much time and staff effort companies would save.

We will wait and see; it is a fascinating evolution.


Hi @bayramtag, I’m not entirely following, so perhaps you could clarify.

As I understand it, you want to do three things:
a) classify documents into different categories (customs documents, etc.)
b) for each kind of document, use AI to extract specific fields. You have information about the bounding regions of those fields, which you want to give the AI model to use.
c) then send the extracted information back to an API

In general, you should be able to do all of this directly with a thunk, without needing any custom OCR model or custom code.
The underlying OpenAI model should know how to do “semantic OCR” even without the bounding boxes. I’m not 100% sure whether we have enabled PDF reading yet; if not, it is a smallish change, because the OpenAI GPT model does support it.

Thanks for your answer, dear @praveen-Thunk.AI

I thought about it a bit and tried Gemini and ChatGPT. I asked them to map the text in an image to the field names in my table.

For example, the image contains a receipt number whose value is A123456.
The corresponding field name in the table is rcptNumber. I asked the AI to write the information from the image like this: rcptNumber=“A123456”. Every AI did what I asked.

I then added more field names and asked for a result like this:

rcptNumber=“A123456”
rcptDate=“01.06.2024”
firmName=“abc”

The AIs did this too.
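Because the AI returns one `name="value"` line per field, its answer is easy to turn back into structured data before sending it on. Here is a minimal sketch. It assumes the output uses that line format (with straight or curly quotes) and that any other lines, such as a chatty introduction, can be ignored. `parse_ai_output` and `missing_fields` are hypothetical names.

```python
import re

# One field per line: name="value", accepting straight or curly double quotes.
_LINE = re.compile(r'^\s*(\w+)\s*=\s*["\u201c](.*?)["\u201d]\s*$')


def parse_ai_output(text):
    """Turn lines like rcptNumber="A123456" into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def missing_fields(fields, expected):
    """Names from expected that the AI answer did not include."""
    return sorted(set(expected) - set(fields))
```

Calling `missing_fields(fields, ["rcptNumber", "rcptDate", "firmName"])` before calling the receiving API gives a simple guard against an answer that skipped a field.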

From this, I concluded that in Thunk we can extract data from an image using natural language and send it via API to any program that accepts the message we want.

Things only get difficult when you approach this with classic code-writing logic.

Thunk.AI can already do this job; that is, it can act as an OCR server. You just need to write the key sentences as carefully as if you were writing code.

Of course, neither artificial intelligence nor Thunk is a wizard. As always, you need to think deeply and put in the work ⚒️.

Now the balance is changing. A smart person can become a smart programmer. Real citizen programming starts now.
