What does the model detect?
This model detects text in street view images and videos.
What is the use of this model?
The model is the foundation for OCR technology: it detects text in an image so that OCR can then be run on the detected regions. It has uses in the logistics sector, for example number plate recognition. The model can also be embedded in assistive devices for the blind, combined with OCR and text-to-speech, to read signage out loud. Outdoor street-level imagery has two notable characteristics: (1) image text often comes from business signage, and (2) business names are available through geographic business searches. These factors make the Street View Text dataset uniquely suited for word spotting in the wild: given a street view image, the goal is to identify words from nearby businesses. In computer vision, converting the text present in images or scanned documents into a machine-readable format that can later be edited, searched, and processed further is known as Optical Character Recognition (OCR).
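The two-stage pipeline described above (detect text regions first, recognize characters second) can be illustrated with a small sketch. The box format `(x1, y1, x2, y2)` is an assumption for illustration; the sketch only prepares detector output for a downstream OCR engine by clipping boxes to the image bounds:

```python
def crop_regions(image_size, boxes):
    """Clip detected text boxes to the image bounds and drop empty ones.

    image_size: (width, height); boxes: list of (x1, y1, x2, y2) pixel
    coordinates from the text detector. The returned rectangles would be
    cropped out of the image and passed to an OCR engine for recognition.
    """
    w, h = image_size
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(0, x1), max(0, y1)   # clamp to top-left corner
        x2, y2 = min(w, x2), min(h, y2)   # clamp to bottom-right corner
        if x2 > x1 and y2 > y1:           # keep only non-empty regions
            crops.append((x1, y1, x2, y2))
    return crops
```

In a full pipeline, each returned rectangle would be cropped and fed to a recognizer; degenerate or out-of-frame boxes are discarded up front.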
Approach to creating a model in Vredefort
Step 1 – Dataset Collection
We collected a dataset from Kaggle. It contains 300 images for training and 50 images for testing. There is only one class – Text.
The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this data exhibits high variability and often has low resolution.
Step 2 – Data Cleaning
After collecting the dataset, we uploaded it to Vredefort. Vredefort automatically cleans the data by removing corrupt images and resizing the rest to a suitable resolution.
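Vredefort performs this cleaning automatically, but as a minimal illustration of what "removing corrupt images" can mean, a first pass often checks file signatures; anything truncated or not actually an image fails immediately. This is a sketch, not Vredefort's actual implementation:

```python
# Magic-byte signatures for the two most common image formats.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def looks_like_image(data: bytes) -> bool:
    """Return True if the byte stream starts with a JPEG or PNG signature.

    A cheap first-pass corruption check. A real cleaner would also try
    to fully decode the file (e.g. with Pillow) before keeping it.
    """
    return data.startswith(JPEG_MAGIC) or data.startswith(PNG_MAGIC)
```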
Step 3 – Data Annotation
The computer learns to detect objects in images through a process of labeling. Thus, we drew bounding boxes around the objects of interest and labeled them as text (the only class to detect).
We annotated 300 images using the inbuilt Vredefort tool.
Annotation Rules – (Keep them in mind for better detection)
⦁ Skip the object if it is in motion or blurred.
⦁ Precisely draw the bounding box on the object.
⦁ Bounding boxes should not be too large.
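Vredefort's internal annotation format is not documented here, but many detection pipelines store each box in YOLO style: class index plus normalized center coordinates, width, and height. A hypothetical converter from pixel boxes to that format:

```python
def to_yolo(box, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) to YOLO's normalized
    (x_center, y_center, width, height), each in [0, 1]."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2 / img_w,   # center x, normalized by image width
            (y1 + y2) / 2 / img_h,   # center y, normalized by image height
            (x2 - x1) / img_w,       # box width, normalized
            (y2 - y1) / img_h)       # box height, normalized
```

Normalized coordinates make the labels resolution-independent, which matters when the platform resizes images during cleaning.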
[Optional] Step 4 – Tuning Parameters
If you register as a developer and developer mode is on, you can modify the number of epochs, the batch size per GPU, the neural network model, and so on. If no values are provided, the default settings are used.
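The fall-back-to-defaults behavior can be sketched as a simple config merge. The parameter names and default values below are assumptions for illustration, not Vredefort's actual settings:

```python
# Hypothetical defaults used when developer mode is off or a field is blank.
DEFAULTS = {"epochs": 100, "batch_size_per_gpu": 8, "model": "detectnet_v2"}

def resolve_config(user_inputs: dict) -> dict:
    """Merge user overrides onto the defaults; unset keys fall back."""
    config = dict(DEFAULTS)
    config.update({k: v for k, v in user_inputs.items() if v is not None})
    return config
```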
Step 5 – Training
The training process takes place automatically with a single click.
Evaluation of the model
After training, we can evaluate the model.
Evaluation has two parts: checking accuracy and playing inference videos. Vredefort reports both total model accuracy and class-wise accuracy; in this case, only one class is present. We achieved 14% model accuracy.
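Detection accuracy is generally scored by overlap between predicted and ground-truth boxes, measured as intersection-over-union (IoU); a prediction typically counts as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5. The 14% figure above comes from Vredefort; the underlying metric looks like this:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])  # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```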
A new video for inference
We made a video from the test dataset images and used it for inference. If developer mode is on, Vredefort asks you to set a confidence threshold; you can set it as needed. Here we set the confidence to 0.1 (10%).
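The confidence threshold simply discards low-scoring detections before they are drawn on the video. A minimal sketch, assuming detections arrive as (box, score) pairs:

```python
def filter_detections(detections, confidence=0.1):
    """Keep only detections at or above the confidence threshold.

    detections: list of (box, score) pairs. A low threshold such as the
    0.1 used here keeps more (but noisier) text boxes in the output.
    """
    return [(box, score) for box, score in detections if score >= confidence]
```

Raising the threshold trades recall for precision: fewer false boxes, but faint or distant signage text may be dropped.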
Model download and transfer learning from unpruned model
Vredefort also lets you download the trained model and dataset for further applications (such as adding your own logic on top of the model). Once you have downloaded the model files, you can use the unpruned model (click here to learn more about the unpruned model) as a starting point for transfer learning on different datasets, saving training time. You can then generate alerts and build use cases with that model.
Any challenges faced
⦁ The model is trained on street view images, so it will work best on similar images or video feeds.
⦁ It will struggle to detect text with unusual shapes and sizes.
To improve model accuracy, collect a dataset containing text of different colors, sizes, and shapes.
Model Name – Street Text Detection
Dataset Images – 300
Number of Labels – 1
Label name and count – text (1012)
Accuracy – 14%
Dataset Download – Download here
Model Download Link – Download here
Inference Video Link – Download here