EPE Partners is aware that advances in Artificial Intelligence (AI) technology have enabled engineers to build software that can recognize and describe the content of photos and videos. Previously, image recognition, a core task of computer vision, was limited to recognizing discrete objects in an image. However, researchers at Stanford University and at Google have developed new software that identifies and describes the entire scene in a picture. The software can also write highly accurate captions in English describing the picture. Today, AI software that mimics human observation and understanding, and that can recognize and describe the content of videos and photographs with great accuracy, is also available.
At Facebook's annual developers' conference in April 2017, Mark Zuckerberg outlined the social network's AI plans to create systems that are better than humans at perception. He then demonstrated an impressive new image-recognition technology designed for the blind, which identifies what is going on in an image and explains it aloud. This points to the multitude of beneficial applications that businesses worldwide can harness by using artificial intelligence programs and the latest trends in image recognition.
From safety features in cars that detect large objects to programs that assist the visually impaired, the benefits of image recognition are making new waves. Although these benefits are only beginning to reach new industry sectors, they are spreading quickly and deeply. For instance, at the LDV Vision Summit, Evan Nisselson of LDV Capital stated that, "Currently, the advances in computer vision are providing tremendous, new opportunities to analyze images that exponentially impact various business verticals, from advertising to automotive". With Artificial Intelligence being applied across numerous industry sectors, such as gaming, natural language processing, and bioinformatics, image recognition is also being taken to an all-new level by AI.
Today, computer vision has greatly benefited from deep-learning technology, superior programming tools, exhaustive open-source databases, and quick and affordable computing. Although headlines refer to Artificial Intelligence as the next big thing, how exactly these technologies work, and how businesses can use them to provide better image technology to the world, still needs to be addressed. Are Facebook's DeepFace and Microsoft's Project Oxford the same as Google's TensorFlow? Well, not exactly. However, we can gain clearer insight with a quick breakdown of the latest image recognition technologies and the ways in which businesses are making use of them.
Massive amounts of data are required to prepare computers to quickly and accurately identify what is present in a picture. Some of the massive databases that anyone can use include Pascal VOC and ImageNet. They contain keyword-tagged images describing the objects present in the pictures, covering everything from sports and pizzas to mountains and cats. Such massive, open datasets are the basis of system training. For example, computers quickly identify "horses" in photos because they have learned what "horses" look like by analyzing many images tagged with the word "horse".
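The idea of learning a tag such as "horse" from labeled examples can be illustrated with a deliberately tiny sketch. This is not how production systems work: real image recognition trains deep neural networks on pixel data, whereas the example below assumes each image has already been reduced to two made-up numbers and uses a simple nearest-centroid rule. The feature values and the names `TAGGED_EXAMPLES`, `train`, and `predict` are hypothetical.

```python
from collections import defaultdict
from math import dist

# Hypothetical training set: (feature vector, tag). Real datasets hold pixels,
# not two hand-picked numbers; these values are invented for illustration.
TAGGED_EXAMPLES = [
    ((0.9, 0.8), "horse"),
    ((0.8, 0.9), "horse"),
    ((0.2, 0.1), "cat"),
    ((0.1, 0.3), "cat"),
]


def train(examples):
    """Average the feature vectors seen for each tag (one centroid per tag)."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), tag in examples:
        sums[tag][0] += x
        sums[tag][1] += y
        counts[tag] += 1
    return {tag: (s[0] / counts[tag], s[1] / counts[tag]) for tag, s in sums.items()}


def predict(centroids, features):
    """Return the tag whose centroid lies closest to the new feature vector."""
    return min(centroids, key=lambda tag: dist(centroids[tag], features))


centroids = train(TAGGED_EXAMPLES)
print(predict(centroids, (0.85, 0.75)))  # closest to the "horse" examples
```

The more tagged examples each category has, the more representative its centroid becomes, which is a small-scale analogue of why large tagged datasets matter for training.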
ImageNet was launched by scientists from Princeton and Stanford in 2009 with close to 80,000 keyword-tagged images, and it has since grown to over 14 million tagged images. All of these images are readily accessible for machine training. Pascal VOC, on the other hand, is powered by numerous universities in the UK and offers fewer images, but each of them comes with richer annotation. This rich annotation not only improves the accuracy of machine training but also speeds up the overall process for some applications by omitting a few of the cumbersome computer subtasks.
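To make "richer annotation" concrete: Pascal VOC annotations are distributed as XML files that record, for each object, a class name and a bounding box. The sketch below reads a made-up annotation of that general shape using only the Python standard library. The sample values and the helper name `read_objects` are hypothetical, and real annotation files carry additional fields that are omitted here.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the general shape of a Pascal VOC XML file.
SAMPLE_ANNOTATION = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>horse</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>320</xmax><ymax>350</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>150</xmin><ymin>20</ymin><xmax>260</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>
"""


def read_objects(xml_text):
    """Return a list of (class name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text.strip())
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = tuple(int(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.findtext("name"), coords))
    return objects


for name, box in read_objects(SAMPLE_ANNOTATION):
    print(name, box)
```

Because the location of each object is already given, a training pipeline does not have to work out where the horse is before learning what a horse looks like, which is one plausible example of a subtask that richer annotation removes.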
Social networking giants like Facebook and Google, however, do not depend on open datasets alone. These companies have the advantage of direct access to large numbers of user-labeled images from Facebook and Google Photos, which they use to train their deep-learning networks to become highly accurate.
Once image datasets are available, the next step is to prepare machines to learn from these images. Freely available frameworks, such as open-source software libraries, serve as the starting point for machine training. They support different types of computer-vision functions, such as emotion and facial recognition, large obstacle detection in vehicles, and medical screening. Some of the popular libraries are Torch and Google TensorFlow.
Created in 2002, Torch is used by Facebook AI Research (FAIR), which open-sourced a few of its modules in early 2015. Google TensorFlow is also a well-known library, with selected parts open-sourced in late 2015. Another popular open-source framework is UC Berkeley's Caffe, which has been in use since 2009 and is known for its huge community of innovators and the ease of customization it offers. Although these tools are robust and flexible, they require quality hardware and skilled computer vision engineers to make machine training efficient. Therefore, they are a good choice only for companies that consider computer vision an important part of their product strategy.
Not many companies have skilled image recognition experts or would want to invest in an in-house computer vision engineering team. Moreover, the task does not end with finding the right team, because getting things done correctly can still involve a lot of work. This is exactly where hosted API services come in. Being cloud-based, they provide customized, out-of-the-box image-recognition services that can be used to build a feature or an entire business, or can be easily integrated with existing apps.
For instance, a travel channel might require "landmark detection" to showcase relevant pictures on the landing page for a landmark, while a dating site would want to carefully filter out all the "unsafe" profile pictures uploaded by its users. Neither of them needs to invest in deep-learning processes or hire an engineering team of its own, yet both can certainly benefit from these techniques.
For example, Google Cloud Vision offers a variety of image detection services, including optical character recognition, facial recognition, and explicit content detection, and charges per photo. Microsoft Cognitive Services offers visual image recognition APIs, including face, celebrity, and emotion detection, and charges a specific amount for every 1,000 transactions. Start-ups such as Clarifai also provide numerous computer vision APIs, including ones for organizing content, filtering out unsafe user-generated videos and images, and making purchasing recommendations.
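As a rough illustration of what using a hosted service involves, the sketch below builds the JSON body for a single image in the shape accepted by the Google Cloud Vision REST endpoint `images:annotate`, asking for landmark detection and explicit content (safe search) detection. It only constructs and prints the request and sends nothing over the network. The helper name `build_annotate_request` and the placeholder image bytes are hypothetical, authentication is omitted entirely, and the current API documentation should be checked before relying on any field shown here.

```python
import base64
import json

# Feature type names as used by the Cloud Vision REST API.
LANDMARK = "LANDMARK_DETECTION"
SAFE_SEARCH = "SAFE_SEARCH_DETECTION"


def build_annotate_request(image_bytes, feature_types, max_results=5):
    """Build a request body for one image and a list of detection features."""
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": t, "maxResults": max_results} for t in feature_types
                ],
            }
        ]
    }


# Placeholder bytes stand in for a real photo read from disk.
fake_image = b"not a real image"
body = build_annotate_request(fake_image, [LANDMARK, SAFE_SEARCH])
print(json.dumps(body, indent=2))
```

A travel site would mainly read the landmark results from the response, while a dating site would look at the safe search ratings. In both cases the deep-learning work stays on the provider's side.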
With Artificial Intelligence in image recognition, computer vision has become a technique that rarely exists in isolation. It gets stronger by accessing more and more images, real-time big data, and other unique applications. While companies that have a team of computer vision engineers can use a combination of open-source frameworks and open data, others can simply use hosted APIs if their business does not depend on computer vision. Therefore, businesses that wisely harness these services are the ones poised for success.