AIaaS – How does it work?

As AIaaS was being developed, it became clear that it would exceed its main target, which is none other than helping thousands of colleagues automate processes they would otherwise have been unable to, due to their systems' inability to work with unstructured data.

It would not have made sense to offer an innovative and robust product that solves this common issue only to burden users with a complex implementation. At MCCM, we have worked with providers that were held back by the infrastructure or tedious configuration their own products required; therefore, we wanted to be certain that AIaaS would live up to its promise.

Many of you may already know that the algorithms used in artificial intelligence require high processing capacity. Hence, one question to answer was: how do we offer an innovative, personalised, self-scaling product without requiring clients to deploy complex infrastructure in order to run AIaaS? The answer to this question was: 'by hosting it in the cloud'.

Offering the service through the cloud as a REST API allows it to be called from any RPA software (UiPath, Blue Prism, Automation Anywhere, etc.) or, failing that, from any programming language. The Machine Learning algorithms sometimes require high processing power, so instead of having to set up a machine on each client's site, a self-scaling infrastructure was created, providing simultaneous service to all clients without any of them needing a dedicated, always-on machine.
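Since the call is an ordinary HTTP POST, any language's standard library suffices. As a minimal sketch – the endpoint URL, authentication scheme, and payload format below are placeholders, not AIaaS's real interface – a Python client could look like this:

```python
import base64
import json
import urllib.request

# Hypothetical endpoint and key: the real AIaaS URL and auth scheme are not
# shown in this post, so these names are placeholders.
AIAAS_URL = "https://api.example.com/aiaas/v1/ocr"
API_KEY = "your-api-key"

def build_payload(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in a JSON-serialisable request body."""
    return {"image": base64.b64encode(image_bytes).decode("ascii")}

def extract_text(image_bytes: bytes) -> dict:
    """POST the image to the cognitive API and return the parsed JSON reply."""
    req = urllib.request.Request(
        AIAAS_URL,
        data=json.dumps(build_payload(image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (requires a real endpoint):
#   with open("passport.png", "rb") as f:
#       print(extract_text(f.read()))
```

The same three steps – encode the image, POST it, parse the JSON – are what an RPA tool's HTTP activity performs under the hood.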

After analysing several options, Amazon SageMaker was chosen to host AIaaS, ahead of Google ML. Thanks to Amazon SageMaker, the issue of installing a machine on the client's site was solved. At MCCM we are faithful supporters of automation; thus, it would not be right for us to offer anything less than a fully automated product.
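For completeness, calling a model hosted on SageMaker typically goes through the `sagemaker-runtime` client's `invoke_endpoint` call. A sketch, assuming a hypothetical endpoint name and that the model replies with JSON:

```python
import json

def parse_prediction(body: bytes) -> dict:
    """Decode the JSON document a model endpoint streams back."""
    return json.loads(body)

def invoke_sagemaker(endpoint_name: str, image_bytes: bytes) -> dict:
    """Send raw image bytes to a deployed SageMaker endpoint.
    boto3 and AWS credentials are required, so the import is deferred."""
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",
        Body=image_bytes,
    )
    return parse_prediction(response["Body"].read())

# Usage (requires AWS credentials and a live endpoint; the name is invented):
#   invoke_sagemaker("aiaas-ocr-endpoint", open("id_card.png", "rb").read())
```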

Once AIaaS's models were hosted in Amazon SageMaker, using them was fairly easy. However, information specific to individual users was not recorded, only the overall data, which confirmed our concerns. A system to monitor and administer each user's log was therefore created: Kong, an open-source API gateway and microservice management layer, did the trick. Users' calls were routed through Kong, and the MVP of the final product began to take shape. Lastly, access to the logs was provided to users via a web application.
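For readers curious how a gateway like Kong sits in front of the models: exposing an upstream service with a public route and key-based authentication is done through Kong's Admin API, and requiring an API key is what makes each call attributable to a consumer (and therefore loggable). A sketch, assuming Kong's Admin API on its default local port; the service name and URLs are invented:

```python
import json
import urllib.request

# Assumed default local address of the Kong Admin API.
KONG_ADMIN = "http://localhost:8001"

def register_service(name: str, upstream_url: str, route_path: str) -> list:
    """The three Admin API calls that expose an upstream behind Kong:
    create the service, attach a public route, and require an API key."""
    return [
        ("/services", {"name": name, "url": upstream_url}),
        (f"/services/{name}/routes", {"paths": [route_path]}),
        (f"/services/{name}/plugins", {"name": "key-auth"}),
    ]

def admin_post(path: str, payload: dict) -> dict:
    """POST a JSON payload to the Kong Admin API and return its reply."""
    req = urllib.request.Request(
        f"{KONG_ADMIN}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (requires a running Kong instance):
#   for path, payload in register_service(
#           "aiaas-ocr", "https://models.example.com/ocr", "/ocr"):
#       admin_post(path, payload)
```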

The web application was the least innovative part, yet it was not exempt from difficulties. MCCM takes automation as its own religion, so Kong was integrated into the MCCM website using different microservices, together with Stripe, so that payments and invoices could be processed and made accessible to users.

The whole infrastructure is built as microservices running in Docker containers. Packaging each dependency into an isolated container makes the services portable to any environment, for example from a developer's computer to the cloud; that characteristic allows continuous integration and deployment of the self-scaling solution.
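To illustrate the pattern, a microservice in this style can be as small as a single file exposing an HTTP endpoint; the Dockerfile shown in the comment is a generic example, not our actual build. A sketch using only the Python standard library:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def build_status(service: str) -> dict:
    """The JSON document our hypothetical health endpoint returns."""
    return {"service": service, "status": "ok"}

class HealthHandler(BaseHTTPRequestHandler):
    """Tiny microservice exposing a health check over HTTP."""

    def do_GET(self):
        body = json.dumps(build_status("ocr-service")).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To run the service:
#   HTTPServer(("", 8080), HealthHandler).serve_forever()
# A matching Dockerfile (generic example) could be as small as:
#   FROM python:3.11-slim
#   COPY service.py /app/service.py
#   CMD ["python", "/app/service.py"]
```

Because everything the service needs lives inside its image, the same container runs identically on a laptop and in the cloud, which is what makes the continuous-deployment pipeline possible.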

What do you think about it? Does it look complex? Docker, microservices, API management, Stripe… Do not worry: at the end of the day it is as easy as logging into a website. AIaaS is nothing more than a cognitive API, a very cognitive one! 🙂

AIaaS – Getting started

The most complex part of every project is defining its scope and targets. Once the scope and objectives are clearly defined, reverse engineering can be applied to create an action plan; we recommend Cameron Herold's book 'Double Double' for a deeper look at this approach.

In the particular case of AIaaS, we quickly grasped the intrinsic complexity it carried; therefore, we remained determined and committed to succeeding when difficulties arose, such as technical complications, the infrastructure needed to create a self-scaling product, GDPR, and many others. Our main goal was to achieve a high accuracy rate when recognising and extracting text from images.

During our research, we found several ways to achieve our target; in addition, we were already familiar with the technology and its technical terms (OCR, Linear Regression, Support Vector Machines (SVM), K-means clustering, Random Forest, …). Most of these algorithms are currently being used to detect and extract text from images, despite having much older roots; for instance, Stochastic Gradient Descent (SGD) and backpropagation are key components of most Neural Networks (NN) developed nowadays, even though gradient descent itself dates back to Cauchy in 1847 and backpropagation was popularised in the 1980s. This fact did not surprise us, as we already knew first-hand that the technology behind the RPA 'boom' was not a total novelty. Some of our colleagues at MCCM had formerly worked on the development of an RPA tool in .NET and VB6, and on some of the fundamentals of the artificial intelligence field.

After numerous days of investigation, we got in touch with specialists in this subject, among them our beloved Alvaro Arcos, a PhD and former colleague from university; he has spent the last few years researching and developing traffic-sign recognition systems that give cars driving autonomy through the application of Machine Learning and Deep Learning. You can check one of his papers here. Alvaro is well acquainted with Convolutional Neural Networks (CNN), Spatial Transformer Networks (STN), and Recurrent Neural Networks (RNN), working with Deep Learning libraries such as Keras, TensorFlow, and Torch. Thanks to Alvaro's expertise we were pointed to libraries that could potentially solve our problem, the Tesseract OCR Engine and the Google Cloud Vision API being the most relevant ones we tested. Below, we describe our experience with them.

Tesseract OCR Engine 4.0

Tesseract OCR Engine was initially developed by Hewlett-Packard between 1985 and 1995. In 2005 it was open-sourced, and since then Google has been its main developer.

Nowadays, the most recent version is 4.0. This version adds a new neural network system based on Long Short-Term Memory (LSTM) units, which offers high precision. Among its main characteristics is language recognition for up to a hundred different languages; it also allows different neural network architectures to be trained using a specification language known as Variable-size Graph Specification Language (VGSL). In addition, it is possible to train models from scratch or to fine-tune previously trained models. Specifically, the version used for the following trials was Tesseract 4.0.0-beta.1, compiled on an Ubuntu 16.04 server.

Using a fake ID card image (left) with Tesseract, we obtained the following results (right):

Our experience with Tesseract was not as positive as we had hoped when applying the OCR to ID cards and passports with low contrast between the characters and the background; however, fantastic results were obtained when the contrast was high.
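For readers who want to reproduce this kind of trial, Tesseract 4 can be driven from Python through the pytesseract wrapper; `--oem 1` selects the new LSTM engine. A sketch, assuming the tesseract binary, Pillow, and pytesseract are installed (their imports are deferred so the snippet loads without them):

```python
def tesseract_config(oem: int = 1, psm: int = 3) -> str:
    """Build the engine flags: --oem 1 selects the LSTM engine,
    --psm 3 is fully automatic page segmentation."""
    return f"--oem {oem} --psm {psm}"

def ocr_image(path: str, lang: str = "eng") -> str:
    """Run Tesseract on an image file and return the recognised text.
    Imports are deferred so the sketch loads without the OCR stack."""
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path),
                                       lang=lang,
                                       config=tesseract_config())

# Usage (requires the tesseract binary, Pillow, and pytesseract):
#   print(ocr_image("fake_id_card.png", lang="spa"))
```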

Google Cloud Vision API

This API performs Optical Character Recognition (OCR). It detects and extracts text within an image, supports a broad range of languages, and features automatic language identification.

For this specific case, the tests were really simple. We uploaded the ID card through a web interface, which returned a JSON document with the extracted content. As we could appreciate, the results were very accurate; nevertheless, the service did not offer the reliability level we were looking for.
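The same trial can be reproduced programmatically against the Vision REST endpoint (`images:annotate`) with a `TEXT_DETECTION` feature request; the API-key handling below is simplified for the sketch:

```python
import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"

def vision_request(image_bytes: bytes) -> dict:
    """Build the JSON body for a TEXT_DETECTION annotate call."""
    return {"requests": [{
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "TEXT_DETECTION"}],
    }]}

def first_text(response: dict) -> str:
    """Pull the full detected text from the first annotation, if any."""
    annotations = response.get("responses", [{}])[0].get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""

def detect_text(image_bytes: bytes, api_key: str) -> str:
    """POST the image to the Vision API and return the recognised text."""
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=json.dumps(vision_request(image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return first_text(json.loads(resp.read()))

# Usage (requires a valid Google Cloud API key):
#   with open("id_card.png", "rb") as f:
#       print(detect_text(f.read(), "YOUR_API_KEY"))
```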

To conclude, the fact that Google offers a service addressing a real problem for many of our clients did not worry us. On the contrary, those results gave us even more motivation to resolve, once and for all, this issue that our clients are experiencing.

We had no option but to invite Alvaro to join our challenge and our vision; he gladly accepted, joining MCCM to lead the project.

AIaaS will come out in a few weeks; we hope it will provide everything you may need and help you as much as it has helped us.

What did we do next? We will let you know in the next post. Subscribe and you will receive it in your inbox!

AIaaS – Hello world!

It was in 2011 when we first got acquainted with process automation. At that time, the acronym currently known as RPA (Robotic Process Automation) did not exist yet.

What did we do then? According to many, it was something innovative; in the words of others, we simply did macros. The technology was not mature enough for widespread use, so some praised us while others dismissed us as all smoke and mirrors.

In retrospect, the truth was that the technology lacked development; consequently, we were confronted with scepticism due to the profound technical knowledge needed to create a so-called 'robot'. As faithful followers of Peter Diamandis, we could say we were immersed in the second of the six 'Ds' of exponential growth, as he describes in his magnificent book Bold. For those not familiar with Peter's ideas, he holds that exponential growth has six stages: Digitized, Deceptive, Disruptive, Demonetized, Dematerialized, Democratized. It is undeniable that from the moment a technology is digitized until it becomes democratized, its development is exponential – those reading this post from a digital device surely agree! 🙂

2011 was not exempt from challenges. Occasionally, features were simply not feasible to develop for technical (or even bureaucratic) reasons; process re-engineering used to 'solve' those limitations by including human-intervention checkpoints where required. For example, an employee's common task could be to read an image displayed by the robot and fill out a simple form with the requested information.

Using the methodology described above and the technological improvements that arrived over time, the first 'options' based on OCR began to appear in 2015 – nonetheless, this technology rarely thrived. It was in 2016 that we felt a change: the term RPA appeared, naming something we had been doing since 2011. We cannot claim to be the pioneers, as Blue Prism, NICE, Automation Anywhere, and UiPath started the journey before us.

However, we are still tackling some of the limitations we faced seven years ago. For instance, even the latest version of the Tesseract OCR Engine (currently version 4), whose extraction models are based on a Recurrent Neural Network (RNN) known as LSTM (Long Short-Term Memory), cannot be fully trusted to extract the right data from an image. LSTM networks are frequently used in many applications: predicting the evolution of time series, recognising human speech, processing character sequences, and much more.

Therefore, bearing all the previously outlined facts in mind, we have developed AIaaS (Artificial Intelligence as a Service), a cognitive library that aims to extract unstructured data from images. This technology can be used anywhere through an HTTP request to a REST API; in that way, the technical complexity it usually conveys disappears.

AIaaS offers every user a combination of state-of-the-art models built with the latest Machine Learning and Deep Learning techniques, providing a solution to common issues such as: detecting and extracting text inside images – like reading a passport or a driving licence – automatic language identification, classification and detection of objects within images, sentiment analysis, document classification, recognition of explicit content (violence, pornography), etc.

We are pleased to provide you with more details, though it will certainly take more than a single post. Are you interested? Please subscribe and you will receive the posts covering the whole development process and the challenges we have encountered.