Artificial intelligence (AI) encompasses various technologies such as machine learning, natural language processing, deep learning, cognition, and machine reasoning. AI is usually defined as a non-biological system designed to give computers the human-like abilities of hearing, seeing, thinking, and reasoning. Computer Vision, one of the newest technology applications in business, is the AI field that deals with how computers can gain a high-level understanding of images and videos. Its sub-domains include video tracking, object recognition, learning, motion estimation, and image restoration.
According to a survey conducted by Narrative Science, 38% of businesses already use AI in some form or another, a figure set to increase to over 60% by the end of 2018.
Let’s look at a typical use case we are working on right now, after which we will compare the two exciting entrants into the area of Computer Vision.
Use case:
Marketing activities are centred around smart advertising on online platforms. The business wants to tailor its advertising to a person’s demographics, such as race, gender, and age, which can increase the return for the company placing the advertisement.
Two platforms compared:
The two major emerging services set to disrupt the Computer Vision market are Microsoft Cognitive Services and Amazon (AWS) Rekognition. These services aim to place AI capabilities such as Computer Vision in the hands of developers and analysts by providing APIs and SDKs that can be integrated into applications with a few lines of code. An added benefit is the integration with each vendor’s larger cloud offering, which gives businesses a quicker ROI, higher reliability, and lower cost.
Let’s have a look at Microsoft Cognitive Services and Amazon Rekognition based on Object Identification, Text Recognition, Face Detection, Emotion (in depth) and Price.
Object Identification:
Amazon and Microsoft both provide APIs and SDKs to read, analyze and label objects in images. In our test, both services identified and labelled the objects in the uploaded image, each with a calculated level of confidence as shown. However, Microsoft can also analyze videos in real time in addition to images. Figures 1 and 2 show the results from Microsoft and Amazon respectively.
Figure 1: Microsoft object identification results
Figure 2: Amazon object identification results
If you need to process videos, then Microsoft Cognitive Services provides the superior service. It can also detect adult content and classify the category of an image or video. However, if you are using images only, both products perform very well.
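To make the "label plus confidence" output concrete, here is a minimal sketch of how such a result could be filtered in Python. It assumes a response shaped like Amazon Rekognition's DetectLabels output (a `Labels` list whose items carry `Name` and `Confidence`); the sample data is hand-written for illustration and is not output from our test, and the helper names `confident_labels` and `detect_labels` are our own.

```python
from typing import Any


def confident_labels(response: dict[str, Any], min_confidence: float = 80.0) -> list[tuple[str, float]]:
    """Return (name, confidence) pairs at or above the threshold, highest first."""
    labels = [
        (label["Name"], float(label["Confidence"]))
        for label in response.get("Labels", [])
        if float(label["Confidence"]) >= min_confidence
    ]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


def detect_labels(image_bytes: bytes, max_labels: int = 10) -> dict[str, Any]:
    """Call Amazon Rekognition; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the parsing helper above works without it

    client = boto3.client("rekognition")
    return client.detect_labels(Image={"Bytes": image_bytes}, MaxLabels=max_labels)


# Hand-written sample in the assumed response shape (not real service output).
sample_response = {
    "Labels": [
        {"Name": "Dog", "Confidence": 97.2},
        {"Name": "Grass", "Confidence": 64.5},
        {"Name": "Pet", "Confidence": 91.0},
    ]
}

print(confident_labels(sample_response))
```

Keeping the parsing separate from the network call means the threshold logic can be tried out on saved responses without an AWS account.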
Text Recognition:
As with Object Identification, we conducted a test to analyze images that include text. Unfortunately, Amazon doesn’t yet provide a full text recognition service. The Microsoft offering can find, analyze and write back text in different languages. Figures 3 and 4 present the results from Microsoft and Amazon after analyzing the text included in the uploaded images.
Figure 3: Microsoft Text Recognition Result
Figure 4: Amazon Text Recognition Results
If you need to analyse text within images, the Microsoft service is at present the only option. Amazon only shows that the uploaded image has text whereas Microsoft shows the actual text (even from multiple languages).
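As a rough illustration of what "writing back the text" involves, the sketch below flattens an OCR result into plain lines. It assumes the result is nested as regions, then lines, then words, each word carrying a `text` field, which is how we understand Microsoft's OCR response to be structured; the sample is hand-written and `extract_lines` is a hypothetical helper name.

```python
def extract_lines(ocr_result: dict) -> list[str]:
    """Join the words of each detected line into one string, in reading order."""
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            if words:  # skip lines where no words were recognised
                lines.append(" ".join(words))
    return lines


# Hand-written sample in the assumed regions -> lines -> words shape.
sample_ocr = {
    "language": "en",
    "regions": [
        {
            "lines": [
                {"words": [{"text": "SPECIAL"}, {"text": "OFFER"}]},
                {"words": [{"text": "50%"}, {"text": "off"}]},
            ]
        }
    ],
}

print(extract_lines(sample_ocr))
```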
Face Detection:
One of the main applications of Computer Vision in AI is face detection. This can be extended to finding human attributes such as gender, age, emotion, glasses, facial hair, ethnicity, etc. Figures 5 and 6 show our results.
Figure 5: Microsoft Face Detection
Figure 6: Amazon Face Detection
Both Microsoft and Amazon can find demographic information such as gender, age, whether the person is wearing glasses, whether they have a beard, etc. Microsoft goes one step further: faces can be grouped by visual similarity, and two given faces can be verified as belonging to the same person. In addition, Microsoft can process real-time videos of people.
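For our advertising use case, the demographic attributes are what matter. The sketch below assumes a response shaped like Amazon Rekognition's DetectFaces output with all attributes requested (`FaceDetails` items carrying `AgeRange`, `Gender`, `Eyeglasses` and `Beard`). The sample data is hand-written, and the age bands in `pick_ad_segment` are purely hypothetical rules invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Demographics:
    age_low: int
    age_high: int
    gender: str
    eyeglasses: bool
    beard: bool


def summarize_faces(response: dict) -> list[Demographics]:
    """Pull the demographic fields out of each detected face."""
    return [
        Demographics(
            age_low=face["AgeRange"]["Low"],
            age_high=face["AgeRange"]["High"],
            gender=face["Gender"]["Value"],
            eyeglasses=face["Eyeglasses"]["Value"],
            beard=face["Beard"]["Value"],
        )
        for face in response.get("FaceDetails", [])
    ]


def pick_ad_segment(person: Demographics) -> str:
    """Hypothetical rule: choose an ad segment from the middle of the age range."""
    midpoint = (person.age_low + person.age_high) / 2
    if midpoint < 25:
        return "young-adult"
    if midpoint < 50:
        return "mid-career"
    return "senior"


# Hand-written sample in the assumed response shape (not real service output).
sample_faces = {
    "FaceDetails": [
        {
            "AgeRange": {"Low": 26, "High": 38},
            "Gender": {"Value": "Female", "Confidence": 99.9},
            "Eyeglasses": {"Value": True, "Confidence": 98.1},
            "Beard": {"Value": False, "Confidence": 99.5},
        }
    ]
}

for person in summarize_faces(sample_faces):
    print(person, pick_ad_segment(person))
```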
Emotion in Depth:
Computer Vision analyses a person’s emotion by studying his/her face. It returns anger, sadness, contempt, disgust, fear, happiness, neutral and surprise percentages.
Figure 7: Microsoft Emotion in Depth
If a business requires the analysis of someone’s emotion then Microsoft can analyze and measure each of the emotions listed above based on faces. Amazon only returns the percentage of detected smiles. Also, Microsoft can process both images and real-time videos.
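A small sketch of how the per-emotion results could be turned into percentages and a single dominant emotion follows. It assumes the service returns, for each face, a score between 0 and 1 for each of the eight emotions listed above; the sample scores are made up for illustration, and both function names are our own.

```python
EMOTIONS = (
    "anger", "contempt", "disgust", "fear",
    "happiness", "neutral", "sadness", "surprise",
)


def emotion_percentages(scores: dict[str, float]) -> dict[str, float]:
    """Convert 0..1 scores into percentages, treating missing emotions as 0."""
    return {name: round(100 * scores.get(name, 0.0), 1) for name in EMOTIONS}


def dominant_emotion(scores: dict[str, float]) -> str:
    """Return the emotion with the highest score."""
    return max(EMOTIONS, key=lambda name: scores.get(name, 0.0))


# Made-up scores for one face (assumed to be fractions that sum to about 1).
sample_scores = {
    "anger": 0.01, "contempt": 0.02, "disgust": 0.0, "fear": 0.0,
    "happiness": 0.85, "neutral": 0.10, "sadness": 0.01, "surprise": 0.01,
}

print(dominant_emotion(sample_scores))
print(emotion_percentages(sample_scores))
```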
Service Price:
This is not a quote, but a simple cost comparison based on the respective Microsoft and Amazon pricing websites:
For Object Identification and Text Recognition, Amazon is priced at $1.00 per 1000 images, compared to Microsoft’s $1.50 per 1000 images.
For Demographic Recognition (e.g. gender, age, wearing glasses, etc.), Amazon is priced at $1.00 per 1000 images. Microsoft has a free plan if the number of calls is less than 30,000 per month; above that, prices range from $1.50 down to $0.65 per 1000 calls depending on volume. In addition, Emotion “in depth” is priced separately at $0.10 per 1000 calls.
Amazon Rekognition (all services): https://aws.amazon.com/rekognition/pricing/
Microsoft object and text identification: https://www.microsoft.com/cognitive-services/en-us/computer-vision-api
Microsoft face detection: https://www.microsoft.com/cognitive-services/en-us/face-api
Microsoft emotion in depth: https://www.microsoft.com/cognitive-services/en-us/emotion-api
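To see what the Object Identification and Text Recognition prices mean at a given volume, here is a simple calculation in Python. It uses only the per-1000-image prices quoted above and assumes straight linear pricing, ignoring any free tiers or volume discounts; the volumes are arbitrary examples.

```python
from decimal import Decimal

# Per-1000-image prices quoted in this article (USD).
AMAZON_PER_1000 = Decimal("1.00")
MICROSOFT_PER_1000 = Decimal("1.50")


def monthly_cost(images: int, price_per_1000: Decimal) -> Decimal:
    """Linear cost in USD, ignoring free tiers and volume discounts."""
    return (Decimal(images) / 1000 * price_per_1000).quantize(Decimal("0.01"))


def compare(images: int) -> dict[str, Decimal]:
    """Cost on each platform and the difference (Microsoft minus Amazon)."""
    amazon = monthly_cost(images, AMAZON_PER_1000)
    microsoft = monthly_cost(images, MICROSOFT_PER_1000)
    return {"amazon": amazon, "microsoft": microsoft, "difference": microsoft - amazon}


for volume in (10_000, 250_000, 1_000_000):
    print(volume, compare(volume))
```

At these list prices the gap grows by $0.50 for every 1000 images, so the difference only becomes material at high volumes.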
Summary of services:
The following table provides a summary of Computer Vision services between Microsoft and Amazon (at the time of authoring of this article).
Conclusion:
Although Microsoft’s Computer Vision is in some areas more mature compared to the Amazon equivalent, it must be noted that Amazon’s Computer Vision services are much newer compared to Microsoft’s equivalent. We have seen a lot of investment by both vendors in this area, so expect Amazon to close the gaps in due course. However, at the time of writing this, Microsoft is certainly leading the pack in Computer Vision. But watch this space.