AIaaS – Getting started

The most complex part of every project is defining its scope and targets. Once the scope and objectives are clearly defined, reverse engineering can be applied to create an action plan. For more depth on the matter, we recommend Cameron Herold’s book ‘Double Double’.

In the particular case of AIaaS, we quickly understood the intrinsic complexities involved. We were therefore determined and committed to succeeding whenever difficulties arose, such as technical complications, the infrastructure needed to create a self-sustaining product, and GDPR, among many others. Our main goal was to achieve a high rate of accuracy when recognising and extracting text from images.

During our research we found several ways to achieve our target. We were also already familiar with the technology and technical terms (OCR, Linear Regression, Support Vector Machine ‘SVM’, K-means clustering, Random Forest, ...). Many of these techniques are currently used to detect and extract text from images, even though their foundations are far from new. For instance, Stochastic Gradient Descent (SGD) and Backpropagation are key components of most Neural Networks (NN) developed nowadays, yet gradient descent dates back to the 19th century (Cauchy, 1847) and backpropagation was popularised in the 1980s. This was not surprising to us, as we already knew first-hand that the technology behind the RPA ‘boom’ was not a total novelty. Some of our colleagues at MCCM had formerly worked on the development of an RPA tool in .NET and VB6, and on some of the fundamentals of the artificial intelligence field.

After numerous days of investigation, we got in touch with specialists in the subject. Among them was our beloved Alvaro Arcos, a PhD and former colleague at the university, who has spent the last few years researching and developing traffic sign recognition systems that give cars driving autonomy through Machine Learning and Deep Learning. You can check one of his papers here. Alvaro works with Convolutional Neural Networks (CNN), Spatial Transformer Networks (STN), and Recurrent Neural Networks (RNN), using Deep Learning libraries like Keras (https://keras.io/), Tensorflow (https://www.tensorflow.org/) and Torch (http://torch.ch/). Thanks to Alvaro’s expertise, we were pointed to tools that could potentially solve our problems: Tesseract OCR Engine (https://github.com/tesseract-ocr/tesseract) and Google Cloud Vision API (https://cloud.google.com/vision/) were the most relevant ones we tested. Below, we describe our experience using them.

Tesseract OCR Engine 4.0

Tesseract OCR Engine was initially developed by Hewlett-Packard between 1985 and 1994. In 2005 it was open sourced, and since then Google has been its main developer.

At the time of writing, the most recent version is 4.0. This version adds a new neural network system based on Long Short-Term Memory (LSTM) units, which offers high precision. Among its main features is language recognition, covering up to a hundred different languages. It also allows different neural network architectures to be trained, described with a specification language known as Variable-size Graph Specification Language (VGSL). In addition, it is possible to train models from scratch or to fine-tune previously trained models. The version used for the following trials was Tesseract 4.0.0-beta.1, compiled on an Ubuntu 16.04 server.
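As a minimal sketch of how the LSTM engine can be selected when calling Tesseract 4.0 from the command line, the Python snippet below builds such an invocation. The helper names build_tesseract_command and run_ocr, the file name id_card.png, and the chosen language and page segmentation mode are assumptions for illustration, not the exact setup of our trials.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng", oem=1, psm=3):
    """Build the argument list for the tesseract CLI (hypothetical helper).

    oem=1 selects the LSTM neural network engine and psm=3 is fully
    automatic page segmentation. Using "stdout" as the output base makes
    tesseract print the text instead of writing a file.
    """
    return [
        "tesseract", str(image_path), "stdout",
        "-l", lang,
        "--oem", str(oem),
        "--psm", str(psm),
    ]


def run_ocr(image_path, **options):
    """Run tesseract and return the recognised text (needs tesseract installed)."""
    command = build_tesseract_command(image_path, **options)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # "id_card.png" is a placeholder file name; "spa" assumes Spanish data is installed.
    print(" ".join(build_tesseract_command("id_card.png", lang="spa")))
```

Keeping the command construction separate from its execution makes it easy to inspect the flags without having Tesseract installed.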

Running Tesseract on a fake ID card image (left), we obtained the following results (right):

Our experience with Tesseract was not as positive as we had hoped when applying OCR to ID cards and passports with low contrast between the characters and the background; however, fantastic results were obtained when the contrast was high.
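A common remedy for low-contrast input is to stretch the contrast and binarise the image before handing it to the OCR engine. The following is an illustrative sketch only, not the processing used in the tests above. It assumes the image is already available as rows of 8-bit grayscale values, and the function names are our own for this example. It applies a linear contrast stretch followed by Otsu thresholding.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so they span the full 0-255 range."""
    flat = [value for row in pixels for value in row]
    low, high = min(flat), max(flat)
    if high == low:
        return [row[:] for row in pixels]  # flat image: nothing to stretch
    scale = 255 / (high - low)
    return [[round((value - low) * scale) for value in row] for row in pixels]


def otsu_threshold(pixels):
    """Return the grey level that maximises between-class variance (Otsu)."""
    histogram = [0] * 256
    for row in pixels:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    background_count, background_sum = 0, 0
    for level in range(256):
        background_count += histogram[level]
        if background_count == 0:
            continue
        foreground_count = total - background_count
        if foreground_count == 0:
            break
        background_sum += level * histogram[level]
        mean_bg = background_sum / background_count
        mean_fg = (total_sum - background_sum) / foreground_count
        variance = background_count * foreground_count * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(pixels):
    """Stretch contrast, then map each pixel to 0 (dark) or 255 (light)."""
    stretched = stretch_contrast(pixels)
    threshold = otsu_threshold(stretched)
    return [[0 if value <= threshold else 255 for value in row] for row in stretched]


if __name__ == "__main__":
    # Toy low-contrast patch: "characters" at 100, background at 120.
    patch = [[120, 100, 120], [120, 100, 120], [120, 120, 120]]
    for row in binarize(patch):
        print(row)
```

In the toy patch the characters and the background differ by only 20 grey levels; after stretching and thresholding they are pushed to pure black and pure white.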

Google Cloud Vision API

This API performs Optical Character Recognition (OCR). It detects and extracts text within an image and supports a broad range of languages. Moreover, it features automatic language identification.

For this specific case, the tests were really simple. We uploaded the ID card through a web interface, which returned a JSON document with the extracted content. The results were really accurate; nevertheless, the service did not offer the level of reliability we were looking for.
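Although we used the web interface, the same text detection is exposed through the REST method images:annotate. The sketch below shows, under stated assumptions, how the JSON request body can be built and how the text can be read from the JSON that comes back. The function names and the sample response are hypothetical and hand-written for illustration; they are not output from our tests, and no network call is made.

```python
import base64
import json


def build_text_detection_request(image_bytes):
    """Build the JSON body for an images:annotate TEXT_DETECTION request."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response_json):
    """Return the full detected text from a Vision API response, or ""."""
    response = json.loads(response_json)["responses"][0]
    annotations = response.get("textAnnotations", [])
    # The first annotation holds the whole text; the rest are single words.
    return annotations[0]["description"] if annotations else ""


if __name__ == "__main__":
    # Hypothetical, hand-written response used only to exercise the parser.
    sample = json.dumps({
        "responses": [{
            "textAnnotations": [
                {"locale": "es", "description": "NOMBRE\nJUAN"},
                {"description": "NOMBRE"},
                {"description": "JUAN"},
            ]
        }]
    })
    print(extract_text(sample))
```

Reading only the first annotation keeps the example short; the remaining entries carry per-word text and bounding boxes, which would matter when locating specific fields on an ID card.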

To conclude, the fact that Google offers a service that solves a real problem for many of our clients did not concern us. On the contrary, those results gave us even more motivation to resolve, once and for all, this issue that our clients are experiencing.

We had no other option than to invite Alvaro to join our challenge and vision; he gladly agreed to join MCCM and lead the project.

AIaaS will come out in a few weeks. We hope that it will provide all that you may need and that it will help you as much as it has helped us.

What did we do next? We will let you know in the next post. Subscribe and you will receive it in your inbox!
