The Luxms team took part in the “Data Warehouse Modernization” track at Greenplum Summit on 26 August 2020. Serg Shestakov, CEO of Luxms Group, and Dmitry Dorofeev, chief architect, presented a breakthrough case study in which heavy analytical ETL queries ran 30 times faster while Greenplum resources were protected. The result was achieved with Dremio, a robust SQL engine: BI queries were run against a specially prepared SQL cache (reflections). We presented a target architecture for fast operational BI on MPP data, with Dremio as the ETL booster, Greenplum as big data storage and Luxms MPP BI for data-centric visualization and analytics.
This year's Greenplum Summit is a series of online talks on best practices and use cases from Greenplum ecosystem partners, customers and the VMware engineering team.
We are building a database of mobile terminal characteristics for a large mobile operator. Every day our staff watch lots of online reviews of mobile devices, and there are endless phones, smartphones, tablets, routers, modems, etc. The information comes in various forms, including videos, articles, reviews and tech specs, and in several languages; we even process Chinese sources with the help of online translation services. We look for the terminal's unique identifier (IMEI) to identify the model and then fill in about 70 technical characteristics. Interestingly, smartphone unboxing videos on YouTube are a great source of information.
First, we need to find the place in a video or an article where the IMEI is displayed. Usually it is a sticker showing the IMEI, the model and the manufacturer; an example is shown in figure 1. Once this image is found, the operator saves it as a screenshot. Watching one video and taking a screenshot takes 4 minutes on average. We set out to automate the process.
We built an application that detects IMEIs using computer vision techniques. Inspired by this example, we implemented our business logic in Python 3.7 with the OpenCV and pytesseract libraries. We made a simple GUI (see fig. 2) in a Jupyter notebook to make the app easy to operate, packed it into a Docker container and made it available locally to our staff.
While the interface is not exactly sexy, you might be interested in the internals, so let's move on to the workflow. We upload a list of URLs into the app and start parsing in the background. For a YouTube video, the app extracts frames from the fragment that is expected to contain the IMEI, then analyses every 4th frame and tries to detect the IMEI. For a Web page, the app scrapes all of its images and analyses each of them. For video frames, the app applies a series of image transformations to find the region most likely to contain the IMEI, and then uses heuristics to check whether it really is an IMEI. Images from Web pages are checked with the pytesseract API. Finally, we get a set of screenshots with an IMEI, ready to be processed by an operator.
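The post does not say which heuristics the app uses to confirm an IMEI, so the following is only a minimal sketch of one plausible check, not the app's actual code. It relies on two facts about the format: an IMEI has 15 digits, and its last digit is a Luhn check digit. Text recognised by OCR (for example the string returned by pytesseract's `image_to_string`) can therefore be scanned for 15-digit runs and filtered by checksum. The names `IMEI_PATTERN`, `luhn_is_valid` and `find_imeis` are hypothetical, and the assumption that IMEIs are printed with at most single spaces or hyphens between digit groups is ours.

```python
import re
from typing import List

# Hypothetical heuristic (our assumption): an IMEI is printed as 15 digits,
# optionally with single spaces or hyphens between groups, e.g. "49-015420-323751-8".
# The lookarounds reject 15-digit slices of longer digit runs.
IMEI_PATTERN = re.compile(r"(?<!\d)\d(?:[ -]?\d){14}(?!\d)")


def luhn_is_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost (check) digit; double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_imeis(ocr_text: str) -> List[str]:
    """Extract unique, Luhn-valid 15-digit IMEI candidates from OCR output."""
    found = []  # type: List[str]
    for match in IMEI_PATTERN.finditer(ocr_text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_is_valid(digits) and digits not in found:
            found.append(digits)
    return found


if __name__ == "__main__":
    sample = "IMEI1: 490154203237518\nIMEI2: 35-693803-564380-9"
    print(find_imeis(sample))
```

A checksum is a cheap filter for OCR noise: the Luhn check catches every single-digit misread and most swaps of adjacent digits, so a candidate that passes is far more likely to be a real IMEI than an arbitrary 15-digit string.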
Our app works accurately: for about 85% of URLs it returns at least one image with an IMEI. It takes about 4-5 seconds to parse a video and 15-20 seconds to parse a Web page, which is much quicker than doing it by eye! Moreover, the app has freed our staff from watching these reviews and made it possible to split the work between operators: one can prepare a list of resources to analyse while another checks the results.
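As a rough back-of-the-envelope comparison, assuming the 4-minute manual figure and the 4-5 second parse time both describe the same unit of work (one video), the per-video speed-up is roughly:

$$
\frac{4 \times 60\ \text{s}}{5\ \text{s}} = 48
\qquad\text{to}\qquad
\frac{4 \times 60\ \text{s}}{4\ \text{s}} = 60
$$

This estimate leaves out the time an operator still spends reviewing the returned screenshots, as well as the roughly 15% of URLs for which no IMEI image is found.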
Next, we plan to collect a dataset of IMEI images large enough to train an effective neural network for IMEI detection.
Rostelecom – Streamlining executive-level financial reports with big data analytics
Rostelecom is the largest digital services provider in Russia, with 42.3 million subscribers and a backbone digital network with a total length of 500,000 km. The telecom giant uses MPP BI and Greenplum DB to drive the performance of hundreds of its regional offices across the country with fast, interactive financial dashboards.
Regional offices are responsible for network operation and development. On October 1, 2017, Rostelecom launched a project nicknamed “The owner of the territory”. The idea was that the head of a regional office is not only an engineer but also a business owner, who manages the territory and should be involved in growing its profitability.
To provide the heads with relevant financial data, Rostelecom prepared a large volume of monthly analytics on profit from operations, profit from rent, losses, client base, subscriptions and churn. Initially, the reports were built manually in Excel. Each time, it took an analyst about 4 days to edit megabytes of data from various sources, calculate analytical parameters and set up the appropriate charts. The process often introduced errors and inconsistencies into the data, which led to repeated verification and rework. On top of that, drilling down was very hard in Excel. Before long, the business heads simply stopped using these limited Excel reports, so Rostelecom turned to BI.
By combining MPP BI and Greenplum DB, Rostelecom was able to build a consistent analytical solution to track its financial metrics, and it enjoys rapid response times to its data queries even as data volumes grow. Using Greenplum DB, Rostelecom has built an integrated storage view of business data from over 130 regional offices and more than 17 corporate data sources. MPP BI keeps data-centric logic very close to the data: it pushes complex queries down to Greenplum for processing and visualizes the results for business users. MPP BI stores only metadata locally, while all analytical data remain in Greenplum DB. As a result, MPP BI reports always present accurate business information to help drive profitability.
MPP BI delivers simple, clear analytics not only to analysts in the finance division but also to on-site decision makers. Rostelecom has designed custom visual dashboards that help heads see how their scores compare with those of other offices and identify their areas for growth: the dashboards lead a head from finding the position of their regional office in the corporate rating to analyzing where it can grow. This brings complete transparency to revenue, costs and the profitability of each office relative to the others. With a vast array of information at their fingertips, a head can drill down into any number by selecting a location, a contract or any other analytical parameter, going deeper into the report to get exactly the information they need.
Rostelecom highlights the transformative impact of MPP BI on its whole business process. With company-wide ratings, heads of regional offices now have an easy way to drive profitability, as nobody wants to be last. They also have quick access to robust analytics combining high-level information, calculated analytical parameters and transactional data, which simply could not be supported in Excel. Moreover, by automating report generation, Rostelecom is freeing up analysts' time so they can add more business value.