Practical insights to succeed in your deep learning vision projects. Part 1: Fail fast and iterate.

April 05, 2022
In this 3-part series we will be discussing practical insights that lead to success in computer vision projects. We’ll start with a quick introduction to deep learning, how to predict complexity in DL vision projects, and techniques to manage that complexity.
How does deep learning work?
Deep learning (DL) is a pattern-matching technology capable of automatically assigning a label or category to a piece of unstructured data, like an image, video clip, sound bite, or piece of text. It is based on the concept of a neural network, a statistical model capable of modifying its own parameters given a set of inputs and outputs.

To develop a working AI model you need a dataset where the input data and output label are both known and the relationship between them is clear. The AI will automatically extract patterns that are common across the inputs, known as features, and assign them a value in determining the right output or category. For example, if an AI was trained on pictures of animals, it could learn that the number of legs is a key feature for differentiating between species of animals.
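To make the idea concrete, here is a minimal sketch of that core loop: a model adjusting its own parameters from labeled (input, output) pairs. It uses a single-feature logistic model rather than a deep network, and the toy "number of legs" data is invented for illustration, not a real dataset.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy labeled dataset: (number_of_legs, label), 0 = bird, 1 = quadruped
data = [(2, 0), (2, 0), (4, 1), (4, 1), (2, 0), (4, 1)]

w, b = 0.0, 0.0   # model parameters, updated automatically from the data
lr = 0.5          # learning rate: how big each parameter update is

for epoch in range(200):
    for x, y in data:
        pred = sigmoid(w * x + b)
        err = pred - y        # gradient of the log-loss w.r.t. the logit
        w -= lr * err * x     # this update step is the "learning"
        b -= lr * err

# After training, 4 legs should score near 1 (quadruped), 2 legs near 0 (bird):
print(round(sigmoid(w * 4 + b)), round(sigmoid(w * 2 + b)))
```

A deep network does the same thing at scale: many parameters instead of two, and features (like "number of legs") discovered automatically rather than hand-picked.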
If you wanted to train an AI to detect the type of boat in an image for example, you would provide it with a training set of images of boats with associated labels such as “fishing boat”. You could then show the AI a new picture of a boat and if the AI had seen similar boats before it would be able to tell what type of boat it is. One of the patterns that the AI may have learned in its training could be that if a boat is very long it’s likely to be an oil tanker or a cargo ship.
Today we will use the example of an autonomous boat that relies on deep learning to detect cargo ships in its path, in order to avoid getting too close to them. This deep learning task is known as object detection and is one of the applications of DL in the vision domain. We want the neural network to automatically detect when a cargo ship appears on our sensors. Other examples of DL tasks include:
- Voice recognition, used by audio note-taking apps to differentiate between speakers.
- Speech to text, used by call centers and AI assistants to automatically transform soundwaves into text.
- Character recognition, which gives applications like translators the ability to extract text from images.
The common theme is that DL can extract insights from unstructured data automatically.
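It helps to know what an object detector actually outputs. A sketch of the typical result shape, and how the boat's avoidance logic might consume it, is below; the detections are made-up values standing in for what a trained network would produce, and the threshold is an assumed parameter.

```python
# Each detection: a bounding box (x1, y1, x2, y2) in pixels, a class label,
# and a confidence score between 0 and 1. These values are illustrative.
detections = [
    {"box": (120, 40, 480, 210), "label": "cargo ship", "score": 0.91},
    {"box": (500, 60, 540, 90),  "label": "buoy",       "score": 0.55},
    {"box": (10, 100, 60, 140),  "label": "cargo ship", "score": 0.32},
]

def cargo_ships_in_path(detections, min_score=0.5):
    """Keep only confident cargo-ship detections worth steering around."""
    return [d for d in detections
            if d["label"] == "cargo ship" and d["score"] >= min_score]

hits = cargo_ships_in_path(detections)
print(len(hits))  # → 1: only the high-confidence cargo ship triggers avoidance
```

The downstream logic (course correction, here) only sees this filtered list, which is why detection quality on the categories you care about matters more than raw model sophistication.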
Battle-tested methodology: fail fast and iterate.
Our team at Lodestar has delivered many deep learning projects, and the single most valuable insight we have derived is this: adopt an iterative, milestone-oriented mindset.
Develop testable hypotheses, let the data answer your questions, fail fast, and iterate. This is why we highly recommend starting with a narrow, focused problem scope that you solve end to end before expanding the problem. If you tackle too many things at once, each step will explode in complexity and measuring progress will become very difficult. We believe this is one of the key reasons why, according to Gartner, 85% of AI projects fail before they even make it to production.
Today we are going to give you the tools to run successful deep learning projects. Let's start by answering two key questions you need to consider when starting a deep learning object detection project:
- What makes an object detection project complex?
- What can you do to make that complexity manageable?
What makes an object detection project complex?
Before attempting to solve an object detection problem using deep learning it’s critical to understand what makes a problem more or less difficult to solve using DL. Framing the problem correctly and understanding where challenges lie can multiply your chances of success.
Number of categories
The first parameter to take into account when assessing the complexity of an object detection problem is the number of categories you want to automatically recognize. In order to successfully detect a category of objects you need to collect and label examples of that category as well as examples outside of it. Some categories will be much easier to collect data for than others. For example, suppose you decided to start your autonomous boat project with ten categories, including aircraft carriers. You might complete the data collection and labeling for the first nine categories before you manage to collect a single example of an aircraft carrier, simply because they are much less frequent and require a much longer data collection effort. For that reason we recommend starting with 1-3 plentiful categories and expanding the scope of the project once you've completed the full cycle at least once.
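A quick check worth running early is counting labeled examples per category, so rare classes surface before they stall the project. The counts and the 500-example target below are hypothetical, chosen only to illustrate the pattern.

```python
from collections import Counter

# Hypothetical label inventory for the autonomous boat project
labels = (["fishing boat"] * 800 + ["sailboat"] * 650
          + ["cargo ship"] * 540 + ["aircraft carrier"] * 3)
counts = Counter(labels)

TARGET = 500  # assumed minimum examples per category for a first iteration
rare = [cls for cls, n in counts.items() if n < TARGET]
print(rare)   # categories to defer until a later iteration
```

Here the aircraft-carrier class would be deferred, letting the first end-to-end cycle proceed on the three plentiful categories.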
Variability of the object
Does the object we want to detect have a fixed shape? A fixed color or texture? The more variability in the object itself, the more complex the task. For example, chairs have a much more defined shape than t-shirts: the color and texture of a chair can vary, but its rigid structure gives it more visual consistency. Creating an AI capable of recognizing products in a furniture store presents different challenges than doing it in a clothing store, where the products are deformable. Detecting only ripe red tomatoes is easier than detecting both ripe and unripe tomatoes, as the latter introduces more variability in size and color.
Variability of lighting conditions
Lighting affects the visual characteristics of objects in ways that might not seem very important to us but can have a large impact on AI systems. Is this a lab setting where the lighting is precisely controlled, or an environment with very little control over lighting, like an outdoor application subject to the weather? If you want an AI to recognize objects in all types of lighting and weather conditions with very high accuracy, you will need training data that includes all of those lighting and weather conditions.
Proximity of other classes
Are you trying to differentiate between close categories? For example, identifying one type of plant in a jungle full of other plants? You need the AI to capture much subtler features, which could require a lot more training data. In these cases it can be useful to identify the closest classes and train the AI to recognize those as well. This way you can measure your progress with more insight into the examples you know will be tricky.
Variability of the background
In object detection, the variety of backgrounds your objects appear against can have a significant impact on the difficulty of the problem. For example, when detecting defects in manufacturing, you can imagine a controlled manufacturing environment with consistent backgrounds and good lighting. This can create better-defined edges and simplify the neural network's task. However, in a consumer application that translates text found in images, the variety of backgrounds and materials is far greater. Self-driving, for example, presents a very large challenge when we consider all the potential backgrounds involved in detecting every object relevant to driving in any location.
What can you do to make that complexity manageable?
In order to make our need for data more manageable, increase our velocity, and enable us to measure progress, the problem we are trying to solve needs to be tightly defined. That way we can start testing our assumptions and iterate quickly. Once we’ve successfully proven DL can be used in this context we can easily expand the scope of the project. Let’s start by asking a few questions about the first problem we want to solve and develop a Minimum Viable Product (MVP) for our solution, using cargo ship detection as an example.
- What does the AI model need to do: Detect cargo ships.
- When will the AI predictions be used: In a range of 100-300 meters on the open sea, in all weather conditions, by day or night.
- Why does it matter: If our autonomous boat encounters a cargo ship on the open sea we want it to course correct in time and avoid waves and collisions.
This tight definition can now be used to inform the data collection process and allow us to focus on the most relevant data that will help us achieve our goals. In this case we should focus on images of cargo ships on the open sea between 100 and 300 meters from the camera and get every weather type as well as both night and day represented in the data.
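One way to make that definition actionable is to encode the scope as data and check every collected sample against it. The field names and values below are illustrative, mirroring the cargo-ship MVP defined above.

```python
from dataclasses import dataclass

@dataclass
class ProjectScope:
    categories: tuple    # what the model must detect
    min_range_m: float   # where predictions will be used
    max_range_m: float
    conditions: tuple    # lighting/weather the data must cover

SCOPE = ProjectScope(
    categories=("cargo ship",),
    min_range_m=100, max_range_m=300,
    conditions=("day", "night", "rain", "fog", "clear"),
)

def in_scope(label, range_m, condition):
    """Should this sample go into the training set for the MVP?"""
    return (label in SCOPE.categories
            and SCOPE.min_range_m <= range_m <= SCOPE.max_range_m
            and condition in SCOPE.conditions)

print(in_scope("cargo ship", 250, "fog"))    # True: matches the MVP scope
print(in_scope("cargo ship", 900, "clear"))  # False: too far away to matter yet
```

Keeping the scope in one explicit structure also makes later expansion deliberate: adding a category means editing the scope, not quietly letting out-of-scope data creep into the training set.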
In Part 1 we explored the factors that can increase the complexity of solving an object detection project using deep learning. We identified the areas that can introduce variability and therefore increase the need for data and labeling. We then presented a simple framework for reducing variability and keeping the scope manageable. The objective is to accelerate your rate of progress and enable you to succeed quickly, or fail fast and iterate.
In the next part of this series we will be sharing practical insights for data collection, curation and labeling.