Author: Etienne Oosthuysen
We increasingly hear statements like “machines are smarter than us” and “they will take over our jobs”. The fact of the matter is that computers can simply compute faster and more accurately than humans can. So, in the short video below, we focus instead on how machines can help us do our jobs better, rather than viewing AI as an imminent threat. The video shows how AI can support better occupational health and safety in the hospitality industry. The same approach applies to many use cases across many industries, and positions AI as an enabler. Also see the extended description of the solution after the video demo.
Image and video recognition – a new dimension of data analytics
With the introduction of video, image and video streaming analytics, the realm of advanced data analytics and artificial intelligence just stepped up a notch.
All the big players are currently competing to provide the best and most powerful versions: Microsoft with Azure Cognitive Services APIs, Amazon with Rekognition, Google with Cloud Video Intelligence, and IBM with Intelligent Video Analytics.
Not only can we analyse textual or numerical data historically or in real time, we can now extend this to videos and images. Currently, there are APIs available to carry out these conceptual tasks:
- Face Detection
o Identify a person from a repository / collection of faces
o Celebrity recognition
- Facial Analysis
o Identify emotion, age, and other demographics within individual faces
- Object, Scene and Activity Detection
o Return objects the algorithm has identified within specific frames, e.g. cars, hats, animals
o Return location settings, e.g. kitchen, beach, mountain
o Return activities from video frames, e.g. riding, cycling, swimming
- Tracking
o Track movement/path of people within a video
- Unsafe Content Detection
o Auto-moderate inappropriate content, e.g. adult-only content
- Text Detection
o Recognise text from images
The business benefits
Thanks to cloud computing, this complex and resource-demanding functionality can be used with relative ease by businesses. Instead of having to develop complex systems and processes to accomplish such tasks, a business can now leverage the intelligence and immense processing power of cloud products, freeing it up to focus on how best to apply the output.
In a nutshell, vendors offering video and image services provide users with APIs that interact with the cloud hosts the vendors maintain around the globe. All the user needs to do, therefore, is provide the input and manage the responses returned by the calls made through those APIs. The exposé team has the skills and capability to ‘plug and play’ with these APIs, with many use cases already outlined.
Potential use cases
As capable as these functions already are, improvements are happening all the time. While the potential scope is staggering, the following cases are based on currently available functionality. There are potentially many, many more; the sky really is the limit.
Cardless, pinless entry using facial recognition only
A camera at the entry captures a person’s face and passes the image to the facial recognition APIs. The response is then used to either open the entry or leave it shut. Not only does this improve security by preventing the use of someone else’s card or PIN, but if someone were to follow another person through the entry, security can be alerted immediately. Additional cameras can be placed throughout the secure location to ensure that only authorised people are within the specified area.
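The decision behind such an entry can be sketched in a few lines. This is a minimal sketch only, assuming the facial recognition API has already returned, for each face seen at the door, an optional matched person ID and a similarity score between 0 and 100. The names `FaceMatch` and `decide_entry` and the threshold of 90 are hypothetical and chosen for illustration; they are not part of any vendor’s API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FaceMatch:
    """One face seen by the entry camera (hypothetical structure)."""
    person_id: Optional[str]  # None when no face in the collection matched
    similarity: float         # assumed 0-100 score reported by the vendor API


def decide_entry(faces: List[FaceMatch], authorised: set, threshold: float = 90.0) -> str:
    """Return 'open', 'deny' or 'alert' for the faces seen at the door."""
    if not faces:
        return "deny"  # nobody in view, keep the door shut
    recognised = [
        f for f in faces
        if f.person_id in authorised and f.similarity >= threshold
    ]
    if len(recognised) == len(faces):
        return "open"   # everyone in view is authorised
    if recognised:
        return "alert"  # an unrecognised person is following an authorised one
    return "deny"
```

Treating a mixed group as an alert rather than a simple denial is one way to cover the scenario of someone following an authorised person through the entry.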
Our own test drive use case
As an extension of the cardless, pinless entry use case above, additional APIs can be used not only to determine whether a person is authorised to enter a secure area, but also to check whether they are wearing the correct safety equipment. The value this brings to various occupational health and safety functions is evident.
We have performed the following scenario ourselves, using a selection of APIs to provide the alert. The video above shows a chef whom the API recognises using face detection. Another API is then used to determine whether he is wearing the required headwear (a chef’s hat). As soon as the chef is seen in the kitchen without the appropriate attire, an alert is sent to his manager to report the incident.
Technical jargon
To give some understanding of how this scenario plays out architecturally, here is the conceptual architecture used in the solution showcased in the video.
Architecture Pre-requisite:
· Face Repository / Collection
Images of the faces of people in the organisation. The vendor’s solution maps facial features, e.g. the distance between the eyes, and stores this information against a specific face. This is required by the subsequent video analytics, which needs to recognise a face from various angles, distances and scenes. Associated with each face is other metadata such as name, date range for permission to be on site, and even extra information such as work hours.
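The metadata held against each face can be pictured as a simple record. The following is a minimal sketch, assuming the metadata is kept in our own store and keyed by the face identifier the vendor returns; `PersonRecord`, its field names and `is_permitted_on_site` are hypothetical and only illustrate the kinds of fields mentioned above.

```python
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass
class PersonRecord:
    """Metadata stored against a face in the collection (hypothetical fields)."""
    face_id: str          # identifier the vendor returns for the indexed face
    name: str
    permitted_from: date  # start of the permission to be on site
    permitted_to: date    # end of the permission to be on site
    shift_start: time
    shift_end: time


def is_permitted_on_site(person: PersonRecord, seen_at: datetime) -> bool:
    """True when the sighting falls inside the permitted dates and work hours."""
    in_date_range = person.permitted_from <= seen_at.date() <= person.permitted_to
    in_work_hours = person.shift_start <= seen_at.time() <= person.shift_end
    return in_date_range and in_work_hours
```

This simple comparison assumes a shift that starts and ends on the same day; overnight shifts would need extra handling.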
Architecture of the AI Process:
· Video or Images storage
Store the video to be processed in the vendor’s cloud storage, so it is accessible to the APIs that will subsequently be used to analyse the video/image.
· Face Detection and Recognition APIs
Run the video/images through the Face Detection and Recognition API to determine where a face is detected and whether a particular face is matched from the Face Repository / Collection. This returns the timestamp and bounding box of each identified face as output.
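To make the shape of this output concrete, here is a minimal sketch, assuming each detection has been reduced to a timestamp, a bounding box and an optional matched person. `BoundingBox`, `FaceDetection` and `group_by_frame` are hypothetical names, and expressing the box as fractions of the frame size is an assumed convention; real vendor responses differ in naming and nesting.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class BoundingBox(NamedTuple):
    """Box as fractions of the frame size (an assumed convention)."""
    left: float
    top: float
    width: float
    height: float


class FaceDetection(NamedTuple):
    timestamp_ms: int         # position of the frame within the video
    box: BoundingBox
    person_id: Optional[str]  # None when the face matched nobody in the collection


def group_by_frame(detections: List[FaceDetection]) -> Dict[int, List[FaceDetection]]:
    """Collect all faces found at the same timestamp into one list per frame."""
    frames: Dict[int, List[FaceDetection]] = defaultdict(list)
    for detection in detections:
        frames[detection.timestamp_ms].append(detection)
    return dict(frames)
```

Grouping by timestamp makes it straightforward to handle frames that contain more than one face.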
· Frame splitting
Use the face detection output and a third-party video library to extract the relevant frames from the video, to be sent to additional APIs for further analysis. For each frame’s timestamp, create a subset of images from the bounding boxes of the detected faces; there could be one or more faces detected in a frame. Each bounding box extract is expanded to encompass the face and the area above the head, ready for the next step.
· Object Detection APIs
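The bounding box expansion is simple arithmetic. Below is a minimal sketch, assuming the face box is given in pixels as (left, top, width, height) and the frame size as (width, height). The function name `expand_box_above_head` and the `headroom` and `side_margin` defaults are hypothetical, illustrative values rather than the ones used in our solution.

```python
from typing import Tuple


def expand_box_above_head(
    box: Tuple[int, int, int, int],
    frame_size: Tuple[int, int],
    headroom: float = 1.0,
    side_margin: float = 0.25,
) -> Tuple[int, int, int, int]:
    """Grow a face box so that it also covers the area above the head.

    headroom and side_margin are fractions of the face height and width.
    The result is clamped so it never extends beyond the frame.
    """
    left, top, width, height = box
    frame_width, frame_height = frame_size
    new_left = max(0, left - int(width * side_margin))
    new_top = max(0, top - int(height * headroom))
    new_right = min(frame_width, left + width + int(width * side_margin))
    new_bottom = min(frame_height, top + height)
    return new_left, new_top, new_right - new_left, new_bottom - new_top
```

The returned box can then be used to crop the frame with whichever image library is in use before the crop is sent on for object detection.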
Run object detection over the extracted subset of images from the frame. In our scenario we’re looking to detect if the person is wearing their required kitchen attire (Chef hat) or not. We can use this output in combination with the person detected to send an appropriate alert.
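Combining the two outputs might look like the following. This is a minimal sketch, assuming the object detection API returns a list of label names with confidence scores between 0 and 100 for each cropped image. `DetectedLabel`, `is_wearing`, `check_attire`, the label text "Chef Hat" and the threshold of 80 are all hypothetical; the labels a real service returns for a chef’s hat may well differ.

```python
from typing import List, NamedTuple, Optional


class DetectedLabel(NamedTuple):
    name: str          # label text returned by the object detection API
    confidence: float  # assumed 0-100 score


def is_wearing(labels: List[DetectedLabel], required: str, min_confidence: float = 80.0) -> bool:
    """True when the required item appears among the labels with enough confidence."""
    return any(
        label.name.lower() == required.lower() and label.confidence >= min_confidence
        for label in labels
    )


def check_attire(person_id: Optional[str], labels: List[DetectedLabel], required: str = "Chef Hat") -> Optional[str]:
    """Return an incident description, or None when the attire is in order."""
    if is_wearing(labels, required):
        return None
    who = person_id or "Unidentified person"
    return f"{who} seen without required attire: {required}"
```

A low-confidence label is treated the same as a missing one here, which errs on the side of raising an alert.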
· Messaging Service
Once it has been detected that a person is not wearing the appropriate attire within the kitchen, an alert mechanism can be triggered to notify management or other persons via e-mail, SMS or other mediums. In our video, the alert was received via SMS on the manager’s phone.
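The alert step can be kept independent of any particular messaging service. The following is a minimal sketch, assuming the actual delivery (SMS, e-mail or otherwise) is wrapped in a `send(recipient, message)` callable supplied by the caller. `send_attire_alert` and the wording of the message are hypothetical and do not reproduce the alert shown in the video.

```python
from datetime import datetime
from typing import Callable, List


def send_attire_alert(
    person_name: str,
    location: str,
    seen_at: datetime,
    recipients: List[str],
    send: Callable[[str, str], None],
) -> str:
    """Build the alert text and hand it to send(recipient, message) for each recipient."""
    message = (
        f"OHS alert: {person_name} was seen in the {location} "
        f"without the required attire at {seen_at:%H:%M on %Y-%m-%d}."
    )
    for recipient in recipients:
        send(recipient, message)  # delivery is delegated to the supplied callable
    return message
```

Passing the sender in as a parameter means the same function can be exercised with a simple stub before it is wired to a real messaging service.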
Below we have highlighted the components of the Architecture in a diagram:
Conclusion
These are just a couple of examples of how we can interact with such powerful functionality, all available in the cloud. It really does open the door to a plethora of different ways we can interact with videos and images and automate responses. Moreover, it illustrates how we can analyse what is occurring in our data when that data is extracted from a new medium, which adds an exciting new dynamic!
Video and image analytics opens up immense possibilities, not only to analyse further but also to automate tasks within your organisation. Leveraging this capability, the exposé team can apply our experience to your organisation, enabling you to harness some of the most advanced cloud services being produced by the big vendors. As we mentioned earlier, this is a space that will continue to evolve and improve, with more possibilities in the near future.
Do not hesitate to call us to see how we may be able to help.
Contributors to this solution and blog entry:
Jake Deed – https://www.linkedin.com/in/jakedeed/
Cameron Wells – https://www.linkedin.com/in/camerongwells/
Etienne Oosthuysen – https://www.linkedin.com/in/etienneo/
Chris Antonello – https://www.linkedin.com/in/christopher-antonello-51a0b592/