Communication has always been the basis of human relationships, so much so that we have invented many creative ways of communicating. Every day, huge amounts of data are produced in the form of messages, comments, posts, and e-mails, all written in natural language.
To analyze and understand these texts in all their complexity, many techniques have been developed in the field of Natural Language Processing (NLP). These techniques can either follow a set of prescriptive rules (the rule-based approach) or rely on Machine Learning, which automatically derives the rules from properly annotated data.
There are numerous text analysis techniques that help us extract information in a structured format so that it can be processed by computers. Some of them, like Named Entity Recognition (NER), Syntax Analysis, and Part-of-Speech tagging, operate close to the surface of the language, while others, such as Sentiment Analysis or Entity Linking, infer concepts at a higher level of understanding.
Let’s look at an example of how NER analyzes a sentence, distinguishing between words whose meaning depends on the context:
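To make the idea concrete without calling any service, here is a deliberately tiny, rule-based sketch of context-dependent entity typing: the same word, "Apple", gets a different label depending on the words around it. The cue lists and the labels are hypothetical and chosen only for illustration; real NER services learn these contextual signals from annotated data rather than from hand-written lists.

```python
import re

# Hypothetical cue words for one ambiguous term. A real NER model learns
# such contextual signals from annotated data instead of fixed lists.
CONTEXT_CUES = {
    "apple": {
        "ORGANIZATION": {"iphone", "shares", "ceo", "announced", "company"},
        "FRUIT": {"ate", "pie", "juice", "tree", "ripe"},
    },
}


def classify(term: str, sentence: str) -> str:
    """Label `term` by counting which set of cue words appears in `sentence`."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    cues = CONTEXT_CUES.get(term.lower(), {})
    scores = {label: len(words & vocab) for label, vocab in cues.items()}
    if not scores or max(scores.values()) == 0:
        return "UNKNOWN"  # no contextual evidence either way
    return max(scores, key=scores.get)


print(classify("Apple", "Apple announced a new iPhone"))  # ORGANIZATION
print(classify("Apple", "I ate an apple pie"))            # FRUIT
```

Counting cue words is the crudest possible way to use context; it only shows why the surrounding words, not the term itself, decide the label.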
Here is another example, this time showing how Sentiment Analysis identifies the main aspects of a text and the opinion expressed about each of them (opinion mining), in addition to returning the overall sentiment:
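As a rough, offline illustration of what opinion mining produces (per-aspect opinions plus an overall result), here is a toy lexicon-based sketch. The aspect and opinion word lists, the naive clause splitting, and the four output labels are all assumptions made for this example; they say nothing about how the cloud services work internally.

```python
import re

# Hypothetical mini-lexicons; real services learn aspects and opinion words from data.
ASPECTS = {"food", "service", "price"}
POSITIVE = {"great", "delicious", "fast", "cheap"}
NEGATIVE = {"slow", "rude", "expensive", "bland"}


def opinion_mining(text: str):
    """Return a per-aspect sentiment dict plus an overall label."""
    aspects = {}
    # Naive clause split on punctuation and on the word "but".
    for clause in re.split(r"[,.;]|\bbut\b", text.lower()):
        words = re.findall(r"[a-z]+", clause)
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for w in words:
            if w in ASPECTS:
                aspects[w] = label  # opinion of the clause the aspect appears in
    found = set(aspects.values())
    if {"positive", "negative"} <= found:
        overall = "mixed"
    elif "positive" in found:
        overall = "positive"
    elif "negative" in found:
        overall = "negative"
    else:
        overall = "neutral"
    return aspects, overall


print(opinion_mining("The food was great, but the service was slow."))
# ({'food': 'positive', 'service': 'negative'}, 'mixed')
```

The example shows why a single overall score can hide useful detail: the sentence is "mixed" overall, yet it clearly praises one aspect and criticizes another.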
As we saw in the previous article, if you lack artificial intelligence know-how or don’t want to develop these tools from scratch, you can use cloud-based services. We will compare the ones offered by Microsoft Azure, Amazon AWS, and Google Cloud, looking at both the features they share and those exclusive to each.
Azure Cognitive Services
The service developed by Microsoft offers some interesting features, such as NER and Entity Linking. The former recognizes generic entities such as names, locations, events, and more. The latter disambiguates potentially ambiguous terms and returns a link to the corresponding Wikipedia page for each recognized entity. Opinion Mining is available starting from API version 3.1, currently in preview, and only for English texts.
Development libraries are available for many programming languages; alternatively, you can query the REST endpoint directly. As with the speech services, these features can also be run on-premises by installing a Docker container, which can be activated only by filling out a form and submitting the request to Microsoft. At the moment, however, neither NER nor Entity Linking is available in the container, so for those you’ll have to make do with the cloud service.
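For those who prefer the REST endpoint to the client libraries, here is a minimal sketch that only builds (without sending) the POST requests for NER, Entity Linking, and Sentiment Analysis with Opinion Mining, using just the Python standard library. The `v3.1` paths, the `opinionMining` query parameter, and the `Ocp-Apim-Subscription-Key` header reflect our understanding of the Text Analytics v3.1 REST API; preview releases may use a different version segment, and the endpoint and key are placeholders, so verify everything against the official documentation.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: Text Analytics REST API v3.1; verify paths against the docs.
API_VERSION = "v3.1"
OPERATIONS = {
    "ner": "entities/recognition/general",
    "linking": "entities/linking",
    "sentiment": "sentiment",
}


def build_request(endpoint, key, operation, texts, language="en", params=None):
    """Build (but do not send) a Text Analytics POST request for a batch of texts."""
    url = f"{endpoint.rstrip('/')}/text/analytics/{API_VERSION}/{OPERATIONS[operation]}"
    if params:
        url += "?" + urlencode(params)
    body = {
        "documents": [
            {"id": str(i), "language": language, "text": text}
            for i, text in enumerate(texts, start=1)
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,  # placeholder: your resource key
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder endpoint and key; sending would be urllib.request.urlopen(req).
req = build_request(
    "https://<your-resource>.cognitiveservices.azure.com",
    "<your-key>",
    "sentiment",
    ["The food was great, but the service was slow."],
    params={"opinionMining": "true"},
)
print(req.full_url)
```

Each document in the batch carries its own `id`, so the results can be matched back to the input texts.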
For the list of Cognitive Services functions, you can see the official documentation here: https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/
Google Cloud Natural Language API
The Natural Language API offers some of the features we’ve already seen through generic pre-trained models, the same ones used by Google Assistant. If you have specific requirements, AutoML lets you create custom models for your application domain. The service also provides a complete Syntax Analysis feature, which returns detailed information for each analyzed text: while the other methods help us understand what a text means, syntax analysis examines it in depth and reveals its structure. The text is split into tokens, each token is classified by part of speech (adjective, verb, etc.) and grammatical role (subject, object, etc.), and the relationships between tokens are identified. Entity sentiment analysis is not available in Italian.
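To picture the kind of structure syntax analysis returns, here is a minimal data structure for a hand-annotated sentence: each token carries a part-of-speech tag, the index of the token it depends on, and the name of that dependency. The annotation and the Universal Dependencies-style labels are ours, for illustration only; this is not output from Google’s API, which uses its own tag and label sets.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str    # part-of-speech tag
    head: int   # index of the token this one depends on (the root points to itself)
    label: str  # grammatical relation to the head


def relations(tokens):
    """Return (dependent, relation, head) triples for every non-root token."""
    return [
        (tok.text, tok.label, tokens[tok.head].text)
        for i, tok in enumerate(tokens)
        if tok.head != i
    ]


# Hand-annotated example, not real API output.
sentence = [
    Token("Google", "PROPN", 1, "nsubj"),
    Token("provides", "VERB", 1, "root"),
    Token("development", "NOUN", 3, "compound"),
    Token("libraries", "NOUN", 1, "obj"),
    Token(".", "PUNCT", 1, "punct"),
]

for dependent, relation, head in relations(sentence):
    print(f"{dependent} --{relation}--> {head}")
```

Storing only a head index per token is enough to represent the whole dependency tree, because every token except the root has exactly one head.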
Google provides development libraries and documentation for both REST and gRPC APIs.
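As a sketch of the REST route, the function below builds (without sending) a request to the `v1` `documents:analyzeSyntax` method; the same body shape should also work for methods such as `analyzeEntities` and `analyzeSentiment`. It authenticates with an API key passed as the `key` query parameter, which is only one of the supported options (OAuth credentials are another), and the key shown is a placeholder. Check the field names against the official reference before relying on them.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://language.googleapis.com/v1/documents"


def build_google_request(method, text, api_key, language=None):
    """Build (but do not send) a Natural Language API POST request."""
    document = {"type": "PLAIN_TEXT", "content": text}
    if language:
        document["language"] = language  # e.g. "it"; omit to let the API detect it
    body = {"document": document, "encodingType": "UTF8"}
    url = f"{BASE_URL}:{method}?{urlencode({'key': api_key})}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder key; sending would be urllib.request.urlopen(req).
req = build_google_request(
    "analyzeSyntax", "Google provides development libraries.", "<your-api-key>", language="en"
)
print(req.full_url)
```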
Here you can find more information on supported functionalities:
Azure, AWS and Google Cloud in comparison
Let’s sum up the features offered by the three major cloud providers in a table:
| Feature | Azure | AWS | Google Cloud |
|---|---|---|---|
| Opinion mining | ✔️ | | |
| Named entity recognition | ✔️ | ✔️ | ✔️ |
All three follow a similar billing model based on the number of analyzed “units”. Microsoft and Google define a unit as a block of up to 1,000 characters, while Amazon uses blocks of up to 100 characters. For Google and Amazon, the unit price also depends on which features are used and on the total number of units analyzed: the more units you analyze each month, the lower the price per unit.
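Because the unit sizes differ by a factor of ten, the same text can translate into very different unit counts. The sketch below assumes that every started block is billed as a full unit, counts characters as Python code points, and ignores per-request minimums, tiered prices, and each provider’s exact character-counting rules; the provider keys are our own labels. Under those assumptions, a 2,500-character text counts as 3 units for Microsoft and Google but 25 for Amazon.

```python
import math

# Characters per billable unit, as described above; keys are our own labels.
UNIT_SIZE = {"azure": 1000, "google": 1000, "aws": 100}


def billable_units(text: str, provider: str) -> int:
    """Units charged for one non-empty text: every started block counts as a full unit.

    Assumption: ignores per-request minimums, tiered prices, and
    provider-specific character-counting rules (here: Python code points).
    """
    return max(1, math.ceil(len(text) / UNIT_SIZE[provider]))


text = "x" * 2500
for provider in UNIT_SIZE:
    print(provider, billable_units(text, provider))
```

Unit counts alone don’t determine the cost, since the price per unit differs by provider and feature, so the published price lists still need to be compared.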
Given the endless variety of meanings and interpretations, understanding the user’s real intent is the true challenge when building a virtual assistant, regardless of how users interact with it.
The services we’ve seen so far, when combined properly, can be a good starting point in this fascinating and complex field. But they are not the only ones provided by the Cloud.
If you want to know more, keep following us!