Analyzing app reviews using LLMs for competitive intelligence

Overview

The mobile games I've built as part of the Area730 game studio have received hundreds of thousands of reviews in total, and our competitors' apps have received far more.

When building those games, we used to spend hours manually going through thousands of reviews at a time, noting what players liked, disliked, and wanted, and watching for new competitors we hadn't heard about. We also wanted to understand our audience and build something resembling a customer profile of who actually played our games.

OpenAI's flagship model GPT-4o was released less than a week ago, and I wanted to give it a spin. The idea was to take those tedious tasks that used to eat hours and compress them into a few minutes with the help of OpenAI's SOTA model.

Getting the reviews

We will use the Google Play Store (Android) for this example.

To pull reviews, we will use the google-play-scraper library for Python.
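
The library is available on PyPI:

pip install google-play-scraper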

The library returns plain dictionaries rather than typed objects, so we will add a Pydantic model for the review type based on the responses Google Play returns.


from datetime import datetime
from uuid import UUID
from pydantic import BaseModel, ConfigDict, Field, HttpUrl

class AppReview(BaseModel):
    review_id: UUID = Field(alias="reviewId")
    user_name: str = Field(alias="userName")
    user_image: HttpUrl = Field(alias="userImage")
    content: str
    score: int = Field(ge=1, le=5)
    thumbs_up_count: int = Field(alias="thumbsUpCount", ge=0)
    review_created_version: str | None = Field(
        alias="reviewCreatedVersion", default=None
    )
    at: datetime
    reply_content: str | None = Field(alias="replyContent", default=None)
    replied_at: datetime | None = Field(alias="repliedAt", default=None)
    app_version: str | None = Field(alias="appVersion", default=None)

    # Pydantic v2 config: allow populating fields by their Python names as well as by aliases.
    # (v2 already serializes datetime and UUID to ISO/string form in JSON mode, so no
    # custom json_encoders are needed.)
    model_config = ConfigDict(populate_by_name=True)

    @property
    def has_reply(self) -> bool:
        return self.reply_content is not None

    @property
    def review_age_days(self) -> int:
        return (datetime.now() - self.at).days
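
To sanity-check the model, we can validate a raw dictionary shaped like the scraper's output (the values below are made up purely for illustration):

from uuid import uuid4

# Hypothetical raw review in the shape google-play-scraper returns.
raw = {
    "reviewId": str(uuid4()),
    "userName": "Jane",
    "userImage": "https://example.com/avatar.png",
    "content": "Great game, but too many ads.",
    "score": 3,
    "thumbsUpCount": 12,
    "reviewCreatedVersion": "2.4.1",
    "at": datetime(2024, 5, 20, 10, 30),
    "appVersion": "2.4.1",
}

review = AppReview.model_validate(raw)
print(review.score, review.has_reply, review.review_age_days)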

Next, we will write a function to fetch a specified number of reviews and convert them to typed objects. The API is paginated, so we have to factor that in.

from google_play_scraper import Sort, reviews


def get_app_reviews(
    package: str, limit: int, filter_rating: int | None = None
) -> list[AppReview]:
    results = []
    continuation_token = None
    count = min(limit, 4000)

    while len(results) < limit:
        res, continuation_token = reviews(
            package,
            lang="en",
            country="us",
            sort=Sort.NEWEST,
            count=count,
            filter_score_with=filter_rating,
            continuation_token=continuation_token,
        )

        results.extend([AppReview.model_validate(r) for r in res])

        if continuation_token.token is None:
            break

    return results[:limit]

Nice, now we have a list of reviews for any Android app.
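
For example (the package name here is just an illustration; any Play Store package id works, and filter_rating optionally restricts results to a single star rating):

latest = get_app_reviews("com.netflix.mediaclient", limit=200)
one_star = get_app_reviews("com.netflix.mediaclient", limit=200, filter_rating=1)

print(len(latest), latest[0].content[:80])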

We will add a small object to store our results so they can be serialized to a file in case we want to run some analytics on them later.

class AppReviewResult(BaseModel):
    app_id: str  # the Android package name
    reviews: list[AppReview]
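
Pydantic v2 makes persisting a run a one-liner, so we can save the reviews we pulled above to disk and reload them later (the file name is arbitrary):

from pathlib import Path

result = AppReviewResult(app_id="com.netflix.mediaclient", reviews=latest)

# Save the raw reviews for later analysis ...
Path("netflix_reviews.json").write_text(result.model_dump_json(indent=2))

# ... and load them back when needed.
restored = AppReviewResult.model_validate_json(Path("netflix_reviews.json").read_text())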

An LLM's performance degrades as the context grows, and even if it didn't, we want to analyze tens of thousands of reviews that may not fit into the context window at all.

So we will need to process the data in chunks of a defined size. The exact chunk size depends on the model you use; there are plenty of benchmarks that measure at what context length a given LLM's performance starts to degrade.

For this example, we will use a context size of 120,000 characters, which should roughly translate to 30,000 tokens.
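
As a standalone sketch of this batching step (the same logic is inlined in the final pipeline below):

MAX_CHARS_PER_BATCH = 120_000  # ~30,000 tokens at roughly 4 characters per token


def batch_texts(texts: list[str], max_chars: int = MAX_CHARS_PER_BATCH) -> list[str]:
    batches: list[str] = []
    current = ""

    for text in texts:
        current += text + "\n"
        if len(current) > max_chars:
            batches.append(current)
            current = ""

    if current:
        batches.append(current)

    return batches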

To structure our context, we will use custom XML tags. According to Anthropic's prompting guidance and multiple studies, they are one of the best ways to split a prompt into clearly delimited sections.

<app_reviews>
  <review>...</review>
  <review>...</review>
</app_reviews>

The next step is to come up with a prompt that will guide the LLM to better accomplish its task. This is a simple version that works quite well, but depending on your case, you may want to extend it with custom instructions.

You are given a list of reviews from a mobile app.
    Analyze them and return lists of:
    - most common complaints
    - most loved features
    - requested features
    - users' frustrations
    - alternatives to this app users may have mentioned
    - possible customer personas you can derive from the given reviews


<app_reviews>
  <review>...</review>
  <review>...</review>
</app_reviews>

Next, we need to define the structure of the output we want from the LLM. This model will be translated to a JSON schema when we call the LLM API. We can use field descriptions to add more context and help the LLM return better results.


class ReviewAnalyzeResponse(BaseModel):
    complaints: list[str] = Field(
        description="List of most frequent complaints from users"
    )
    requested_features: list[str] = Field(
        description="List of most requested features from users"
    )
    features_users_loved: list[str] = Field(
        description="List of most features users loved"
    )
    user_frustrations: list[str] = Field(
        description="List of things user is most frustrated about"
    )
    alternatives: list[str] = Field(
        description="List of alternatives to this app mentioned by users"
    )
    customer_personas: list[str] = Field(
        description="List of customer personas based on the reviews they left, no more that 5"
    )
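
If you are curious what the model actually sees, Pydantic can print the JSON schema this class translates into (roughly what the structured-output request is built from):

import json

print(json.dumps(ReviewAnalyzeResponse.model_json_schema(), indent=2))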

For each batch of reviews, we will make a call to the LLM and get back the structured output defined above.

Different batches will have some unique and some repeating items, so we need a way to merge and de-duplicate them. We will use another LLM call to achieve this.

import json


def normalize_results(results: list[ReviewAnalyzeResponse]) -> ReviewAnalyzeResponse:
    # Wrap each partial result in <result> tags so the model can tell them apart.
    data = "".join(
        f"<result>\n{json.dumps(r.model_dump(), indent=2)}\n</result>\n" for r in results
    )
    prompt = f"""You are given a list of objects that may contain overlapping items in each property.
    Normalize and merge them into one object.

    <result_objects>
    {data}
    </result_objects>"""

    agent = create_agent(ReviewAnalyzeResponse)
    result = agent.run_sync(prompt)

    return result.output

Now we just have to pull all this together into one function.

from pydantic_ai import Agent

def create_agent(output_type):
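    # pydantic-ai model string in "provider:model-name" form; any supported model can be swapped in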
    return Agent("openai:gpt-4.1-2025-04-14", output_type=output_type)

def analyze_app(package: str, review_count: int) -> ReviewAnalyzeResponse:
    print(f"Getting [{review_count}] latest reviews ...")

    reviews = get_app_reviews(package, review_count)
    res = AppReviewResult(
        app_id=package,
        reviews=reviews,
    )
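    # res can be persisted here (e.g. res.model_dump_json()) if we want to keep the raw reviews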

    print("\t > Analyzing ...")

    max_chars_per_batch = 120_000
    review_prompt_batches = []
    prompt_cache = ""

    for review in reviews:
        prompt_cache += f"<review>{review.content}</review>\n"

        if len(prompt_cache) > max_chars_per_batch:
            review_prompt_batches.append(prompt_cache)
            prompt_cache = ""

    if prompt_cache:
        review_prompt_batches.append(prompt_cache)

    print(f"\t > Batches: {len(review_prompt_batches)}")

    base_prompt = """You are a given a list of reviews from the mobile app.
    Analyze them and return a lists of:
    - most common complains
    - most loved features
    - requested features
    - user's frustrations
    - alternatives to this app users may have mentioned
    - list of possible customer personas you can derive from given reviews

    <app_reviews>
    {reviews}
    </app_reviews>"""

    results: list[ReviewAnalyzeResponse] = []

    agent = create_agent(ReviewAnalyzeResponse)

    for i, batch in enumerate(review_prompt_batches):
        print(f"\t > Analyzing batch {(i + 1)}/{len(review_prompt_batches)} ...")
        result = agent.run_sync(base_prompt.format(reviews=batch))
        results.append(result.output)

    if len(results) > 1:
        print("\t > Normalizing results...")
        final_result = normalize_results(results)
    else:
        final_result = results[0]

    print("Done")

    return final_result

Testing

Let's test our solution on the Netflix app. From its Play Store URL, we can get the package name: com.netflix.mediaclient.
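
Running the whole pipeline is then a single call. The review count below is an arbitrary choice for this test, and pydantic-ai's OpenAI provider reads the standard OPENAI_API_KEY environment variable:

if __name__ == "__main__":
    # Pull the latest reviews and analyze them batch by batch.
    result = analyze_app("com.netflix.mediaclient", review_count=50_000)
    print(result.model_dump_json(indent=2))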

With our current configuration, we get around 36 batches. After a few minutes, here is our result:

{
  "complaints": [
    "Frequent app crashes, freezing, not opening—especially after updates (mobile, smart TVs, multiple devices)",
    "Household/account sharing restrictions and verification (limits for travelers, students, families, multi-device users)",
    "Too many ads/ads on paid subscriptions, intrusive/unwanted features (games, pop-ups)",
    "Constant price increases, high subscription cost, perceived lack of value",
    "Login/account access/sign-in problems, being repeatedly logged out or unable to sign up",
    "Problems with payment processing, auto-renewal, refunds, double charges, limited payment options (no mobile money, Play Store, etc.)",
    "Content removal/cancellation of favorites, missing/incomplete/new content, region-locked or lacking language dubs/subtitles",
    "Poor customer support, unresolved or slow support, difficulty unsubscribing/canceling",
    "App brightness/auto-brightness/HDR/volume bugs overriding device settings",
    "Download/offline viewing bugs (downloads expiring, not working, device/download limits)",
    "Streaming quality/technical issues (buffering, black/green screens, bad video/audio, not matching plan/price)",
    "Constant UI/layout changes, hard to navigate, missing features/categories, forced feature updates",
    "Problems with casting/Chromecast, especially on ad-supported plans"
  ],
  "requested_features": [
    "More flexible account sharing (multi-household, family plans, travel, student/expat/remote use)",
    "Manual video quality/streaming resolution selection",
    "Option to disable/remove games, unwanted genres, and in-app promotions/features",
    "Better/more regional language dubs and subtitles (Hindi, Kannada, Tamil, Tagalog, Telugu, Arabic, etc.)",
    "Better/more local/regional payment options (mobile money, wallets, Play Store, annual plans, student discounts, crypto)",
    "Bring back/restore popular shows, full anime series, and frequently update/add new/high-demand content",
    "Cheaper/more affordable, yearly, or trial plans and family/student/bundle pricing",
    "Ability to filter/sort/search by language, genre, country, release date, rating, or actor",
    "Improved parental/kid controls, more/locked profiles, content rating filters",
    "Comment/review sections for titles, better personalized recommendations, watched/continue lists at top, improved custom lists/user folders",
    "Ability to customize/override app brightness, HDR, or use device's native settings",
    "Offline downloads on all devices, higher download limits, batch management, download after subscription ends",
    "Improved casting/Chromecast/PiP/portrait mode support, tech enhancements (fast forward, skip intro, etc.)",
    "Live events, sports, and pay-per-title options"
  ],
  "features_users_loved": [
    "Large and diverse content library (movies, anime, originals, K-dramas, regional/international content)",
    "High streaming and offline download quality (HD/4K, Dolby, stable connections)",
    "Personalized recommendations, easy navigation, user-friendly interface (prior to latest updates)",
    "No/limited ads on some plans, ability to watch anywhere (multi-device/TV/phone/tablet)",
    "Family-friendly features, kids/parental controls, multiple user profiles",
    "Content available in multiple languages/audio/subtitles",
    "Exclusive/original content (e.g., Squid Game, WWE, K-dramas, anime)",
    "Regular new releases for binge-watching, trending, and classic titles"
  ],
  "user_frustrations": [
    "Locked out/forced verification/account denied due to 'household' restrictions as a legitimate user (travelers, students, families)",
    "Frequent crashes, broken updates, black/green/blank screens, and persistent technical bugs (across devices, after app updates)",
    "Constant price hikes without content/value improvements, fees charged after cancelation, feeling 'scammed'",
    "Ads/feature pushes on paid plans, forced UI/layout changes, unwanted mobile games cluttering interface",
    "Removal/loss of favorite or incomplete content, lack of dubs, missing regional titles after subscribing",
    "Poor or unresponsive customer support, unresolved payment, download, or account issues",
    "Download/offline mode fails to work as expected, device/download limits, expiring downloads",
    "Brightness/HDR control overriding device settings causing eye strain/bad viewing experience",
    "Trouble with payment options/auto-renewal, refunds not issued, unwanted subscriptions charged",
    "Regional content restrictions, geo-blocks, or missing desired language dubs/subs"
  ],
  "alternatives": [
    "Amazon Prime Video/Prime",
    "Disney+ / Disney+ Hotstar",
    "Hulu",
    "HBO Max/Max",
    "YouTube / YouTube Premium",
    "Tubi",
    "Paramount+",
    "Crunchyroll",
    "Showmax",
    "Peacock",
    "JioCinema/Hotstar/Voot/MX Player (India)",
    "Apple TV",
    "Pluto TV",
    "Viki",
    "Movie Box/Movie Boss/Moviebox",
    "Loklok",
    "Telegram (informal downloads/piracy)",
    "Public libraries/streaming/Pluto TV/Cable",
    "Piracy/torrent sites",
    "Bilibili"
  ],
  "customer_personas": [
    "Budget-conscious and value seekers (concerned about pricing, ads, deals, consider alternatives quickly)",
    "Multigenerational families/shared accounts (desire flexible access, multiple profiles, kids control, frustrated by household/device limits)",
    "Tech-savvy and power users (demand high streaming quality, app reliability, advanced customization and preferred controls, dislike bugs/UI clutter)",
    "International/regional/language-focused viewers (require local dubs/subtitles, desire wide/balanced content, compare with other global services)",
    "Mobile-first and on-the-go travelers/expats (need offline downloads, flexible device/location access, frequent login and tech issues)"
  ]
}

Conclusion

This relatively simple tool unlocks insights that previously took many hours of manual review reading to find.

It opens the door to useful automations, such as running the analysis weekly to stay on top of the product and to catch regressions or other issues that users report in reviews instead of writing to support (which is usually the case in B2C apps like Netflix).

And we can use it to monitor competitors or do product research (e.g., pick an app you want to compete with, analyze its reviews, and build a product that solves those users' pain points).