We are building a database of mobile terminal characteristics for a large mobile operator. Every day our staff go through a large number of online reviews of mobile devices: phones, smartphones, tablets, routers, modems and so on. The information comes in various forms, including videos, articles, reviews and tech specs, and in several languages. We even process Chinese sources with the help of online translation services. We look for the terminal’s serial number (IMEI) to identify the model and then fill in about 70 technical characteristics. Interestingly, smartphone unboxing videos on YouTube turn out to be a great source of information.
First, we need to find the place in a video or an article where the IMEI is displayed. Usually it is a sticker showing the IMEI, the model and the manufacturer; one example is shown in figure 1. Once this image is found, the operator saves it as a screenshot. Watching one video and taking a screenshot takes 4 minutes on average. We set out to automate the process.
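For context on why the IMEI identifies the model: a 15-digit IMEI consists of an 8-digit Type Allocation Code (TAC), which is assigned per device model, a 6-digit serial number and one check digit. The sketch below splits an IMEI into those parts. The names `ImeiParts` and `split_imei` are illustrative assumptions, not code from our app, and the sample number is a commonly used example IMEI rather than a real device.

```python
from typing import NamedTuple


class ImeiParts(NamedTuple):
    """The three fields of a 15-digit IMEI (hypothetical helper type)."""
    tac: str          # Type Allocation Code: identifies the device model
    serial: str       # serial number of the individual device
    check_digit: str  # Luhn check digit


def split_imei(imei: str) -> ImeiParts:
    """Split an IMEI, optionally written with spaces or hyphens, into its parts."""
    cleaned = imei.replace(" ", "").replace("-", "")
    if len(cleaned) != 15 or not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"expected 15 digits, got {imei!r}")
    return ImeiParts(cleaned[:8], cleaned[8:14], cleaned[14])


parts = split_imei("49-015420-323751-8")
print(parts.tac)  # 49015420
```

The TAC is the part that would be looked up to find the manufacturer and model; the lookup table itself is outside the scope of this sketch.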
Figure 1 – Example of IMEI
We’ve built an application that detects the IMEI using computer vision techniques. Inspired by this example, we implemented our business logic in Python 3.7 using the OpenCV and pytesseract libraries. We made a simple GUI (see fig. 2) in a Jupyter notebook to make the app easy to operate. We packed it into a Docker container and made it available locally to our staff.
Figure 2 – GUI
While the interface is not exactly pretty, you might be interested in the internals. Let’s move on to the workflow. We upload a list of URLs into the app and start parsing in the background. If the URL is a YouTube video, the app extracts several frames from the video fragment that is expected to contain the IMEI. It then analyses every 4th frame and tries to detect the IMEI. If it is a Web page, the app scrapes all the images from it and analyses them all. For frames, the app applies some image transformations to find the region most likely to contain the IMEI. Then we use some heuristics to check whether it is really an IMEI. Images from Web pages are checked with the pytesseract API. Finally, we get a set of screenshots with the IMEI, ready to be processed by an operator.
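One heuristic of this kind can be sketched as follows. The last digit of a 15-digit IMEI is a check digit computed with the Luhn algorithm, so digit runs found in recognised text can be filtered by that checksum. This is a minimal sketch under assumptions, not the exact check used in the app: the names `luhn_valid` and `find_imeis` are hypothetical, and the input is assumed to be plain text that OCR has already produced.

```python
import re
from typing import List

# Exactly 15 consecutive digits that are not part of a longer digit run.
_CANDIDATE = re.compile(r"(?<!\d)\d{15}(?!\d)")


def luhn_valid(number: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


def find_imeis(ocr_text: str) -> List[str]:
    """Return the 15-digit runs in the text that pass the Luhn check."""
    return [m for m in _CANDIDATE.findall(ocr_text) if luhn_valid(m)]


text = "IMEI: 490154203237518  S/N: 123456789012345"
print(find_imeis(text))  # ['490154203237518']
```

The checksum only rules out digit runs that cannot be an IMEI; roughly one in ten random 15-digit numbers still passes, so a check like this works best alongside other cues, such as the word "IMEI" appearing nearby. The sketch also assumes the digits are recognised as one contiguous run, which real OCR output does not guarantee.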
Figure 3 – The workflow
Our app works accurately: for about 85% of URLs it returns at least one image with an IMEI. It takes about 4-5 seconds to parse a video and 15-20 seconds to parse a Web page. That is much quicker than doing it by eye! Moreover, the app has freed our staff from watching these reviews and has made it possible to split the work between operators: one can prepare a list of resources to analyse and another can check the results.
Next, we plan to collect a dataset of IMEI images large enough to train an effective neural network for IMEI detection.