How AI is helping Pocket FM channel its 10x user growth

Having started from a fairly small base and now grown to over 15 million Monthly Active Listeners (MALs), while processing approximately 4 petabytes of data each month, Pocket FM, a leader among OTT audio platforms, has found it essential to have the right automation in place.

“We use AI 70% of the time for automation, like automatic moderation of processes, quality checks, etc. Next is the use of AI for the listener experience. We always try to ensure our listeners get the best listening experience with the most relevant content, a quality audio standard and uninterrupted streaming. So for us, when we talk about AI, we normally use it for automating things and for creating better listening experiences going forward,” said Prateek Dixit, CTO and co-founder of Pocket FM.

With the increase in the number of listeners, Pocket FM has found it essential to develop deep-learning models that build profiles of individual listeners based on their affinity with the content; these profiles serve as the backbone of a personalized content recommendation engine. However, because it is a community-driven platform, analyzing these audio files and extracting the data needed for machine learning, and thereby developing that deep-learning architecture further, is difficult.
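For illustration, an affinity-based listener profile can be sketched as a recency-weighted average over content embeddings, with new recommendations ranked by similarity to that profile. The sketch below is a minimal, assumed version of that idea; the embeddings, function names and weighting scheme are illustrative, not Pocket FM’s production deep-learning models.

```python
import numpy as np

# Minimal sketch: build a listener profile from content the listener has already played,
# then rank the rest of the catalogue by affinity. Embeddings are assumed to be pre-computed.

def listener_profile(listen_history, content_embeddings, decay=0.9):
    """Recency-weighted average of the embeddings of listened shows."""
    dim = len(next(iter(content_embeddings.values())))
    profile, weight, total = np.zeros(dim), 1.0, 0.0
    for show_id in reversed(listen_history):          # most recent listen first
        profile += weight * np.asarray(content_embeddings[show_id])
        total += weight
        weight *= decay                               # older listens count for less
    return profile / max(total, 1e-9)

def recommend(profile, content_embeddings, already_heard, k=5):
    """Rank unheard shows by cosine similarity to the listener profile."""
    def cosine(a, b):
        a, b = np.asarray(a), np.asarray(b)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    scored = [(sid, cosine(profile, emb))
              for sid, emb in content_embeddings.items() if sid not in already_heard]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:k]
```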

“We work with our community of creators, including great writers, and the scripts are in multiple languages, which makes processing difficult. Our deep learning infrastructure takes those audio files and converts 25-30% of the content to text through our intelligent speech-to-text conversion. We then tag these audio files with various metadata, including categories, genres, author, creator, etc. This data is used by the data science and content pipelines to power the recommendation engine, and recommendations are served in real time against listener personas built through our data segmentation. Let’s say you haven’t been listening much over the last day or two; how can we motivate you today? We have a system that lets us act on features tied to that drop in engagement, which increases our retention,” he explained.
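A highly simplified version of that speech-to-text and tagging step might look like the following, with an off-the-shelf ASR model (OpenAI’s Whisper) standing in for Pocket FM’s in-house conversion; the metadata fields simply mirror the ones Dixit mentions.

```python
import whisper  # openai-whisper, used here as a stand-in for the in-house speech-to-text system

asr = whisper.load_model("base")

def tag_audio_file(path, category, genre, author, creator):
    """Transcribe an episode and attach catalogue metadata for downstream pipelines."""
    transcript = asr.transcribe(path)["text"]
    return {
        "file": path,
        "category": category,
        "genre": genre,
        "author": author,
        "creator": creator,
        # Per Dixit, only around 25-30% of the content is converted to text in production.
        "transcript": transcript,
    }
```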

But any machine learning pipeline involves multiple steps, because the algorithm is unique to each product.

“Obviously we can’t inherit someone else’s algorithm. Because we have particularly long fictional content, we face a different kind of challenge in terms of recommendations. So the code, which is around 70% of the effort, is built completely in-house, while we take help from Amazon for the infrastructure, relying on tools such as SageMaker, EMR, etc. This ensures that we do not spend too much bandwidth on infrastructure and capacity planning. So the algorithms are built entirely in-house and we’ve partnered with AWS for the infrastructure. This allows us to focus on the crucial part, which is the algorithm, and make sure we don’t burn too much resource bandwidth building infrastructure,” Dixit added.
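For context, handing the infrastructure side to a managed service typically looks something like the snippet below, where an in-house training image is launched through SageMaker; the image URI, IAM role and S3 paths are placeholders, and Pocket FM’s exact setup is not public.

```python
import sagemaker
from sagemaker.estimator import Estimator

# Illustrative only: a custom, in-house training image run on SageMaker so that
# instance provisioning and capacity planning stay with the managed service.
session = sagemaker.Session()

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.ap-south-1.amazonaws.com/reco-trainer:latest",  # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",                   # placeholder
    instance_count=2,
    instance_type="ml.m5.2xlarge",
    output_path="s3://example-bucket/reco-model-artifacts/",
    sagemaker_session=session,
)

# Kick off training on pre-processed listener/content features stored in S3.
estimator.fit({"train": "s3://example-bucket/features/train/"})
```

The point of this split is that provisioning, scaling and teardown are handled by the service, so engineering effort stays on the algorithm itself.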

Challenges on the way

As a community-driven platform for long-form audio entertainment, Pocket FM faces challenges in ensuring content quality and maintaining a consistent audio standard.

He explains: “There aren’t many specific solutions in the audio market, and we didn’t have off-the-shelf options for ad hoc audio processing and moderation. We build these solutions in-house using AI/ML to automate large-scale audio file processing. We are working on building our internal audio processing systems using these, which should help us automate this pipeline of quality checks. At the scale at which we operate, we get millions of audio files, and it becomes difficult to check them manually. So we don’t rely on manual processes and interventions.”
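As a rough illustration of what an automated audio quality check can cover, the sketch below flags clipping and excessive silence using the open-source librosa library; Pocket FM’s actual in-house QC models are not public, and the thresholds here are arbitrary.

```python
import numpy as np
import librosa

# Minimal sketch of automated audio quality checks (clipping, silence, loudness),
# not Pocket FM's in-house QC pipeline. Thresholds are illustrative.

def quality_check(path, clip_threshold=0.999, silence_db=-40.0, max_silence_ratio=0.2):
    y, sr = librosa.load(path, sr=None, mono=True)

    clipped_ratio = float(np.mean(np.abs(y) >= clip_threshold))           # hard-clipped samples
    rms_db = librosa.amplitude_to_db(librosa.feature.rms(y=y), ref=1.0)
    silence_ratio = float(np.mean(rms_db < silence_db))                   # near-silent frames

    return {
        "clipping_ok": clipped_ratio < 0.001,
        "silence_ok": silence_ratio < max_silence_ratio,
        "mean_level_db": float(rms_db.mean()),
    }
```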

Therefore, the company develops deep learning solutions and automates its audio QC pipelines. It has structured the pipeline to handle many variables and data points from listeners. “We keep improving the system with data points from our listeners, such as network carriers, internet speed and historical consumption patterns. These variables help build listener profiles and let us take a more personalized approach that intelligently optimizes the experience,” added Dixit.
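One concrete way such listener-side signals can feed back into the experience is adaptive bitrate selection, sketched below under an assumed bitrate ladder and a simple bandwidth-smoothing rule; it illustrates the idea rather than Pocket FM’s actual optimizer.

```python
# Illustrative adaptive-bitrate pick based on measured and historical network speed.
BITRATES_KBPS = [32, 64, 128, 192]   # assumed encoding ladder

def pick_bitrate(measured_kbps, historical_kbps, headroom=0.8):
    """Choose the highest bitrate that fits comfortably within observed bandwidth."""
    recent = list(historical_kbps)[-20:] + [measured_kbps]
    usable = headroom * (sum(recent) / len(recent))    # smooth over recent sessions
    candidates = [b for b in BITRATES_KBPS if b <= usable]
    return max(candidates) if candidates else BITRATES_KBPS[0]
```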

Reducing infrastructure costs by 25%

As the company expands, its main objective is to integrate technology with cost reduction in mind. Until recently, for example, its content filtering was done manually, which cost a lot of time and money. Automation has helped Pocket FM reduce the cost of content moderation.

“We thought, why shouldn’t we automate the whole pipeline? It is not a perfect system, but even if we miss a few things it would at least save us that amount of money, and the team can focus on something more relevant instead of moderating content by hand. The moderation system is one of the things dedicated to supporting our revenue by saving costs.”
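In practice, automated moderation of this kind often starts from the transcripts produced earlier in the pipeline. The sketch below runs a transcript through a text classifier via the Hugging Face transformers pipeline; the model name is a hypothetical placeholder, since Pocket FM’s moderation models are built in-house and not public.

```python
from transformers import pipeline

# Hypothetical moderation model name; substitute a real policy-violation classifier.
moderator = pipeline("text-classification", model="my-org/transcript-moderation")

def passes_moderation(transcript, threshold=0.8):
    """Return True when the automated check does not flag the transcript."""
    result = moderator(transcript[:512])[0]          # truncate to the model's context window
    flagged = result["label"] == "VIOLATION" and result["score"] >= threshold
    return not flagged
```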

Citing an example, Dixit says that by implementing a micropayment model, Pocket FM achieved a 350% increase in revenue in just one quarter. The company built the entire model around an in-app virtual currency, routed through its partners’ payment gateways, to make the subscription model appealing to listeners. This allows its community of listeners to consume a limited number of free episodes daily, or to unlock one or more episodes through the in-app micropayment mechanism.
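The unlock mechanics described here can be modelled very simply: a daily free-episode quota plus a coin balance funded through the payment gateway. The snippet below is a toy version with illustrative numbers, not Pocket FM’s billing logic.

```python
from dataclasses import dataclass, field

# Toy model of the unlock flow: a daily free-episode quota plus an in-app coin balance.
FREE_EPISODES_PER_DAY = 3      # illustrative quota
COINS_PER_EPISODE = 10         # illustrative price

@dataclass
class Wallet:
    coins: int = 0
    free_used_today: int = 0
    unlocked: set = field(default_factory=set)

def unlock_episode(wallet, episode_id):
    if episode_id in wallet.unlocked:
        return True
    if wallet.free_used_today < FREE_EPISODES_PER_DAY:
        wallet.free_used_today += 1                   # consume a free daily episode
    elif wallet.coins >= COINS_PER_EPISODE:
        wallet.coins -= COINS_PER_EPISODE             # micropayment via in-app currency
    else:
        return False                                  # prompt a coin top-up instead
    wallet.unlocked.add(episode_id)
    return True
```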

As it grows exponentially, the next plan is to save on computing power. “It’s all about compute capacity and computing power. The more compute hours, the higher the infrastructure costs. We plan to run on about three-quarters of what we have now by moving to a fully containerized operations platform like Kubernetes; we found we could cut that overall cost by a quarter. So we’re working to reduce infrastructure costs by 25% by implementing a multi-cloud strategy and virtualization,” he said.
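Containerizing workloads makes it possible to scale capacity with demand instead of provisioning for peak, which is where such compute savings typically come from. As a sketch, the snippet below creates a Horizontal Pod Autoscaler for an assumed recommendation-service Deployment using the official Kubernetes Python client; the names and limits are placeholders.

```python
from kubernetes import client, config

# Assumes a Deployment named "recommendation-api" already exists in the cluster.
config.load_kube_config()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="recommendation-api"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="recommendation-api"),
        min_replicas=2,                        # keep a small baseline off-peak
        max_replicas=20,                       # scale out only when traffic demands it
        target_cpu_utilization_percentage=70,  # add pods once CPU crosses 70%
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa)
```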

NLP to the point

In addition, the company’s next high-priority work will be around NLP and its text-to-speech engine. For the translation process, it plans to strengthen the language-processing side of its stack.

“On the text-to-speech engine, we are working on recording voice artists, and with a few partners who help us optimize the quality of the text-to-speech and the upscaling process. These are the priority areas for the next few months and will help us improve the content experience for our listeners,” he said.
