
From text to understanding with Artificial Intelligence

by | Feb 22, 2021 | Blog

Communication has always been the basis of human relationships, so much so that we have invented many creative ways of communicating. Every day, huge amounts of data are produced in the form of messages, comments, posts, and e-mails, all written in natural language.

To analyze and understand these texts in all their complexity, many techniques have been developed within the field of Natural Language Processing (NLP). These techniques can either follow a set of prescriptive, hand-written rules (the rule-based approach) or use Machine Learning, which automatically derives the rules from properly annotated data.
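To make the difference between the two approaches concrete, here is a toy Python sketch for sentiment scoring. The hand-written lexicon stands for the rule-based approach, while learn_lexicon derives word polarities from a handful of labelled sentences, a deliberately naive stand-in for Machine Learning. All names, words, and example sentences are hypothetical and chosen only for illustration; this is not how the cloud services discussed below work internally.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and strip surrounding punctuation (real tokenizers are far more careful)."""
    return [word.strip(".,;:!?") for word in text.lower().split()]


def score(text: str, lexicon: dict[str, int]) -> int:
    """Sum the polarity of known words: > 0 means positive, < 0 negative."""
    return sum(lexicon.get(word, 0) for word in tokenize(text))


# Rule-based approach: polarity rules written by hand (hypothetical lexicon).
HAND_WRITTEN = {"great": 1, "good": 1, "fast": 1, "bad": -1, "slow": -1, "awful": -1}


def learn_lexicon(examples: list[tuple[str, int]]) -> dict[str, int]:
    """Data-driven approach: derive word polarities from texts labelled +1 or -1."""
    counts: Counter = Counter()
    for text, label in examples:
        for word in set(tokenize(text)):
            counts[word] += label
    # Keep only words that lean one way, reduced to +1 / -1 like the hand-written rules.
    return {word: (1 if total > 0 else -1) for word, total in counts.items() if total != 0}


# Hypothetical annotated data: each sentence carries a human-assigned label.
ANNOTATED = [
    ("The service is great and fast", 1),
    ("Support was awful and slow", -1),
    ("Great documentation", 1),
    ("Slow responses, bad experience", -1),
]
LEARNED = learn_lexicon(ANNOTATED)

print(score("Terrible experience, slow support", HAND_WRITTEN))  # -1
print(score("Terrible experience, slow support", LEARNED))       # -3
```

Note that the learned lexicon also treats neutral words such as "the" or "support" as carrying polarity, simply because they happened to appear in one labelled sentence: with so little data, learned rules are noisy, which is why real models are trained on much larger annotated corpora.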

There are numerous text analysis techniques that help us extract information in a structured format so that it can be processed by computers. Some of them, like Named Entity Recognition (NER), Syntax Analysis, and Part-of-Speech tagging, work at the surface level of language, while others, such as Sentiment Analysis or Entity Linking, infer higher-level concepts.

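Named Entity Recognition, for instance, lets us analyze a sentence by distinguishing between words whose meaning varies with the context: in “Amazon delivered my order”, Amazon is an organization, while in “the Amazon flows through Brazil” it is a location.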

Sentiment Analysis, in turn, lets us recognize the main aspects of a text and the opinion expressed about each of them (opinion mining), in addition to the overall sentiment of the text.
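The idea behind opinion mining can be sketched in a few lines of Python: split the text into rough clauses, attach each clause's polarity to the aspects it mentions, then aggregate per aspect and overall. This is a toy illustration under strong assumptions: the aspect list and polarity lexicon are hypothetical, and the cloud services discussed below use trained models rather than word lists.

```python
import re

# Hypothetical lexicons; real opinion-mining services use trained models instead.
POLARITY = {"delicious": 1, "friendly": 1, "cheap": 1, "cold": -1, "rude": -1, "expensive": -1}
ASPECTS = {"pizza", "food", "staff", "waiter", "price"}


def label(value: int) -> str:
    return "positive" if value > 0 else "negative" if value < 0 else "neutral"


def mine_opinions(text: str) -> dict:
    """Assign each clause's polarity to the aspects it mentions, then aggregate."""
    scores: dict[str, int] = {}
    # Rough clause split on punctuation and "but", so each opinion stays near its aspect.
    for clause in re.split(r"[.,;!?]|\bbut\b", text.lower()):
        words = clause.split()
        polarity = sum(POLARITY.get(word, 0) for word in words)
        for word in words:
            if word in ASPECTS:
                scores[word] = scores.get(word, 0) + polarity
    aspects = {aspect: label(value) for aspect, value in scores.items()}
    # The overall result is "mixed" when aspects disagree, as in many sentiment APIs.
    opinions = set(aspects.values()) - {"neutral"}
    if opinions == {"positive", "negative"}:
        overall = "mixed"
    else:
        overall = opinions.pop() if opinions else "neutral"
    return {"aspects": aspects, "overall": overall}


print(mine_opinions("The pizza was delicious, but the waiter was rude."))
# {'aspects': {'pizza': 'positive', 'waiter': 'negative'}, 'overall': 'mixed'}
```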

As we saw in the previous article, if you lack artificial intelligence know-how or don’t want to develop these tools from scratch, you can use cloud-based services. We will compare those offered by Microsoft Azure, Amazon AWS, and Google Cloud, looking at both their common and their exclusive features.

Azure Cognitive Services

Microsoft’s service offers some interesting features, such as NER and Entity Linking. The former recognizes generic entities such as people, locations, events, and more. The latter can disambiguate potentially ambiguous terms and returns a link to the corresponding Wikipedia page for each recognized entity. Opinion Mining is available starting from version 3.1, which is currently in preview and supports only English texts.

Development libraries are available for many programming languages; alternatively, you can query the REST endpoint directly. As with the speech services, these features can also run on-premises in a Docker container, which can only be activated by filling out a form and submitting the request to Microsoft. At the moment, however, neither NER nor Entity Linking is available in the container, so for those you’ll have to rely on the cloud service.
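As an illustration of the REST option, here is a minimal Python sketch, using only the standard library, that builds requests for the Text Analytics NER and Entity Linking operations and extracts the Wikipedia links from a linking response. The endpoint and key are placeholders for your own resource, and the API version is an assumption (v3.0 was generally available at the time of writing, while Opinion Mining requires v3.1). The helper names are hypothetical; check the official REST reference before relying on the exact paths and fields.

```python
import json
import urllib.request

# Assumption: Text Analytics REST API v3.0; Opinion Mining requires v3.1 (preview).
API_VERSION = "v3.0"
NER = "entities/recognition/general"
ENTITY_LINKING = "entities/linking"


def build_request(endpoint: str, key: str, operation: str,
                  texts: list[str], language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a POST request for one Text Analytics operation."""
    body = {
        "documents": [
            {"id": str(i), "language": language, "text": text}
            for i, text in enumerate(texts, start=1)
        ]
    }
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/text/analytics/{API_VERSION}/{operation}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a real endpoint and key)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def linked_entities(response: dict) -> list[tuple[str, str]]:
    """Return (entity name, Wikipedia URL) pairs from an Entity Linking response."""
    return [
        (entity["name"], entity["url"])
        for document in response["documents"]
        for entity in document["entities"]
    ]
```

With a real resource, something like linked_entities(send(build_request(endpoint, key, ENTITY_LINKING, ["I visited Rome."]))) would return one (name, URL) pair per recognized entity.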

For the list of Cognitive Services functions, you can see the official documentation here: https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/

Amazon Comprehend

Comprehend is an NLP service that provides features such as keyphrase extraction, sentiment analysis, and entity recognition. Moreover, Comprehend lets you create custom models tailored to your application domain, both for entity recognition (for example, specific codes or acronyms) and for text classification. In both cases, you train the system by feeding it texts that have already been categorized, and the machine learning algorithms do the rest. Comprehend can only be used through the SDK, which is available for Java, Python, PHP, JavaScript, Ruby, C#, and Go; a REST API is currently not available.
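Since Comprehend is used through the SDK, a minimal Python sketch with boto3 might look like the following: analyze calls three pre-trained operations on a text, and summarize reduces their responses to the essentials. It assumes boto3 is installed and AWS credentials are configured; the default region and the 0.8 confidence threshold are arbitrary choices for illustration, and the function names are hypothetical. boto3 is imported inside analyze so that summarize can be tried on a saved response without the SDK.

```python
def analyze(text: str, language_code: str = "en", region: str = "us-east-1") -> dict:
    """Call three Comprehend operations on one text (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so summarize() can be used without the SDK installed

    comprehend = boto3.client("comprehend", region_name=region)
    return {
        "sentiment": comprehend.detect_sentiment(Text=text, LanguageCode=language_code),
        "entities": comprehend.detect_entities(Text=text, LanguageCode=language_code),
        "key_phrases": comprehend.detect_key_phrases(Text=text, LanguageCode=language_code),
    }


def summarize(result: dict, min_score: float = 0.8) -> dict:
    """Keep the overall sentiment, the confident entities and the key phrases."""
    return {
        "sentiment": result["sentiment"]["Sentiment"],
        "entities": [
            (entity["Text"], entity["Type"])
            for entity in result["entities"]["Entities"]
            if entity["Score"] >= min_score  # hypothetical threshold to drop weak matches
        ],
        "key_phrases": [phrase["Text"] for phrase in result["key_phrases"]["KeyPhrases"]],
    }
```

A call such as summarize(analyze("The hotel in Rome was lovely.")) would return the overall sentiment label (for example POSITIVE), the entities with their types (for example LOCATION), and the extracted key phrases.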

You can find the documentation at the following link:
https://docs.aws.amazon.com/comprehend/

Google Cloud Natural Language API

The Natural Language API exposes some of the features we’ve already seen through generic pre-trained models, the same ones used by Google Assistant. For more specific requirements, AutoML lets you create custom models for your application domain. The service also offers a complete Syntax Analysis feature, which returns detailed information about each analyzed text. While the other features help us understand the meaning of a text, syntax analysis examines it in depth and reveals its structure: the text is split into tokens, each token is classified by part of speech (adjective, verb, noun, etc.), and the grammatical relationships between tokens, such as which word is the subject of a verb, are identified. Entity sentiment analysis is not available in Italian.

Google provides development libraries and documentation for both REST and gRPC APIs.
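To show what Syntax Analysis returns, here is a minimal Python sketch based on the v1 REST interface: build_syntax_request prepares an analyzeSyntax call, and describe_tokens turns the response into (token, part of speech, head token, dependency label) tuples. It assumes the documents:analyzeSyntax REST endpoint called with an API key (OAuth credentials are an alternative); the helper names are hypothetical, and the official client libraries or gRPC API can be used instead.

```python
import json
import urllib.parse
import urllib.request

# Assumption: v1 REST endpoint called with an API key; OAuth credentials also work.
SYNTAX_URL = "https://language.googleapis.com/v1/documents:analyzeSyntax"


def build_syntax_request(text: str, api_key: str, language: str = "en") -> urllib.request.Request:
    """Build (but do not send) an analyzeSyntax request for a plain-text document."""
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text, "language": language},
        "encodingType": "UTF8",
    }
    return urllib.request.Request(
        url=f"{SYNTAX_URL}?{urllib.parse.urlencode({'key': api_key})}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_tokens(response: dict) -> list[tuple[str, str, str, str]]:
    """List (token, part of speech, head token, dependency label) for every token."""
    tokens = response["tokens"]
    return [
        (
            token["text"]["content"],
            token["partOfSpeech"]["tag"],
            # The root token's head index points to the token itself.
            tokens[token["dependencyEdge"]["headTokenIndex"]]["text"]["content"],
            token["dependencyEdge"]["label"],
        )
        for token in tokens
    ]
```

For a sentence like “Marco reads books”, the tuples would show that “Marco” is a noun acting as the subject (NSUBJ) of “reads”, which is the root of the sentence, while “books” is its direct object (DOBJ).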

Here you can find more information on supported functionalities:
https://cloud.google.com/natural-language/docs

Azure, AWS and Google Cloud in comparison

Let’s sum up the features offered by the three major cloud providers in a table:

| Feature | Azure Cognitive Services | Amazon Comprehend | Google Cloud Natural Language API |
|---|---|---|---|
| Sentiment analysis | ✔️ | ✔️ | ✔️ |
| Opinion mining | ✔️ (preview) | | ✔️ |
| Language recognition | ✔️ | ✔️ | |
| Named entity recognition | ✔️ | ✔️ | ✔️ |
| Syntax analysis | | ✔️ | ✔️ |
| Keyphrase extraction | ✔️ | ✔️ | |
| Custom models | | ✔️ | ✔️ |
| Supported languages (SDK) | C#, Python, JavaScript, Go, Ruby | Java, Python, PHP, JavaScript, Ruby, C#, Go | C#, Go, Java, JavaScript, PHP, Python, Ruby |
| REST | ✔️ | | ✔️ |
| gRPC | | | ✔️ |
| On-prem | ✔️ (Docker) | | |

All three services follow a similar billing model based on the number of analyzed “units”. Microsoft and Google define a unit as up to 1,000 characters, while Amazon defines it as up to 100 characters. For Google and Amazon, the unit price also varies with the features used and with the total number of units analyzed: the more units you analyze each month, the lower the unit price.
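The difference in unit size matters when estimating costs. The following Python sketch counts billable units for a batch of texts, assuming each text is billed separately, that every started block of characters counts as a full unit, and that even a very short text costs one unit. These rounding rules are assumptions for illustration, and len() counts Unicode code points, which may not match exactly how each provider counts characters; always check the current pricing pages.

```python
import math


def billing_units(num_chars: int, unit_size: int) -> int:
    """Units billed for one text: each started block of unit_size characters is one unit."""
    # Assumption: every submitted text, however short, is billed at least one unit.
    return max(1, math.ceil(num_chars / unit_size))


def total_units(texts: list[str], unit_size: int) -> int:
    """Sum the units over a batch of texts, counting each text separately."""
    return sum(billing_units(len(text), unit_size) for text in texts)


batch = ["a" * 250, "b" * 1200, "c" * 80]
print(total_units(batch, unit_size=1000))  # Azure / Google: 1 + 2 + 1 = 4
print(total_units(batch, unit_size=100))   # Amazon: 3 + 12 + 1 = 16
```

The same three texts count as 4 units with 1,000-character units but 16 units with 100-character units, so unit prices cannot be compared directly across providers without converting them first.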

Final thoughts

Given the infinite variety of meanings and interpretations, understanding the user’s real intent is the true challenge when building a virtual assistant, regardless of how users interact with it.

The services we’ve seen so far, when combined properly, can be a good starting point in this fascinating and complex field. But they are not the only ones provided by the Cloud.

If you want to know more, keep following us!

Written by

Salvatore Merone