Gaana Mangwao

How can AI make Urdu music more accessible?

🏆  Top UX Thesis Award - Purdue University

Roles

UX research, design, end-to-end development & usability testing

Timeline

Sep 2024-Current

Tools

Adobe Illustrator, CSS, Figma, Gemini, Google Analytics, HTML, JavaScript, Lottie, Miro, Qualtrics

Team

Solo

TLDR

Goal

My analysis of successful cross-cultural musical exports found that subtitling plays a key role in their reach. The primary goal of this project is to make Pakistani music more accessible globally by making previously inaccessible linguistic content searchable and editable online.

Problem

A key issue is that Pakistan's national language, Urdu, has poor software support. This often leads to the use of the Latin script (Roman Urdu) instead of the traditional Arabic one.

A strategic benefit of using the Latin script is that it also makes Pakistani songs accessible to audiences who understand Hindi but can't read the Arabic script, reflecting the region's shared history.

However, Roman Urdu lacks accuracy, and the actual Urdu script is being erased.

Methodology



Solution

  1. A responsive LLM-powered website where users upload music files for rapid, highly accurate transcription of songs into Roman Urdu and English.

  2. Conversion of Roman Urdu transcripts into Urdu script, and back into Roman Urdu as needed.

  3. Comprehensive editing tools, including guidance on Urdu and transliteration keyboards, that let users ensure linguistic precision.

  4. Guidelines for displaying multiscript, multilingual songs.

Final Technology Stack

Website Demo

Impact

"In olden times, this would have been called magic" -M1

Musicians have reported a 95-98% gain in efficiency when transcribing and distributing lyrics. Additionally, Gaana Mangwao has streamlined other workflows: producers use it for vocal backing, and songwriters rely on it for rapid song iterations. The platform has high user satisfaction and is experiencing strong organic growth.

For more details about the study, you can continue scrolling.

Secondary Research


Key Findings

FINDING 1
Typing Urdu is difficult

The Nastaliq script is itself hard to support digitally because it is written right-to-left and has context-sensitive letter shapes.

FINDING 2
Urdu transcription usually fails

Transcription tools often misidentify Urdu as Hindi or do not recognise it as a language at all. Multilingual audio (Urdu mixed with English) is even harder to transcribe.

FINDING 3
LLMs showed promise

Some LLMs are promising for automatic transcription because of the extensive data behind them, but they carry language biases.
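To make the note above about right-to-left direction and context-sensitive letter shapes concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not code from the Gaana Mangwao site: the helper name `base_direction` and the `BEH_FORMS` table are assumptions introduced here. The first-strong-character heuristic is similar in spirit to how browsers resolve text direction automatically, and the table shows that a single letter (beh, U+0628) has four positional shapes that all collapse to the same base letter under NFKC normalization.

```python
import unicodedata

def base_direction(text):
    """Guess 'rtl' or 'ltr' from the first strongly directional character.

    Hypothetical helper for illustration; digits, spaces and punctuation
    are skipped because they are not strongly directional.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi == "L":           # Latin and other left-to-right letters
            return "ltr"
    return "ltr"                  # fallback when no strong character exists

# The four positional shapes of the letter beh (U+0628), taken from the
# Arabic Presentation Forms-B block. Fonts normally pick the shape
# automatically; these code points exist for compatibility only.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

if __name__ == "__main__":
    print(base_direction("گانا"))           # Urdu script
    print(base_direction("Gaana Mangwao"))  # Roman Urdu
    for position, form in BEH_FORMS.items():
        base = unicodedata.normalize("NFKC", form)
        print(position, unicodedata.name(form), "->", unicodedata.name(base))
```

A text editor for lyrics has to handle both problems at once: choosing the right direction for each line, and trusting the font's shaping engine to pick the correct form of every letter as the user types.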

Primary Research


Process mapping with singer and rapper songwriters


Key Findings

"Posting lyrics is akin to treating your songs as poetry" -P1

FINDING 1
Musicians often want to post final lyrics but don't

Musicians recognize the value of posting lyrics, seeing it as crucial for audience engagement and song memorability. However, they often lack the time to do so because infrastructural support in Pakistan is limited.

FINDING 2
They usually write lyrics on their phones

They primarily use their phones for songwriting, relying on the Notes app for its accessibility and mobility to quickly capture ideas.

FINDING 3
They are extremely concerned about Urdu accuracy

Musicians fear grammatical and spelling errors in Urdu more than in English. They don't believe AI will be completely accurate because technology has historically supported Urdu poorly.

Urdu Technologists


  1. Zeerak shared insights and guidance based on his research on technology adoption and pain points when typing in Urdu.

  2. Usama demonstrated that, with strict prompt engineering, Gemini can produce near-accurate transcriptions of Urdu songs in both Roman Urdu and Urdu script.

  3. Sheikh Ahmed showed the importance of a Roman Urdu transliteration keyboard for transcribing bilingual songs and the value of ChatGPT for rapid transliteration, i.e., converting Roman Urdu to Urdu script and vice versa.

Designing a Multilingual Tool

Ideation



Mid-fi Design and Development

Working MVP Web Prototype 


An MVP was developed to test Gemini’s real-time Roman Urdu song transcription capabilities and observe how users made corrections. Participants were also provided with a transliteration keyboard, which none of them had used before. The songs were transcribed in Roman Urdu by default because that matches users' mental model of Urdu song lyrics online.

Figma Mobile Prototypes

Figma prototypes were created to test the necessity of all ideated features and to provide users with mobile-first layouts, aligning with their habit of songwriting on phones. To evaluate different display options for Roman Urdu and Urdu lyrics, participants were given two distinct prototypes: one where both lyric types coexisted on the same page, and another where users could toggle between the two.

Testing Transcription Prototypes


Usability testing was conducted with both musicians and Urdu music enthusiasts, as they are the users most likely to transcribe song lyrics. Testing was remote for participants located in Pakistan because the researcher was based in the US. Both the working prototype and the Figma designs were updated iteratively.

Key Observations & User Feedback

“99% of the work has been done.” -M2

  • Participants were pleasantly surprised by the accuracy of the transcriptions. All of them urged that the tool be launched as soon as possible.

  • The desktop view was preferred by participants, primarily because it offered a side-by-side comparison of different lyric types and more screen real estate for text editing, which felt cramped on mobile.

  • Some participants preferred to make edits to the Urdu script lyrics and then convert them into Roman Urdu.

  • Musicians specifically requested additional accent marks for English words converted into Urdu for legibility.

  • Implementing clear loading and error states is crucial to provide feedback and prevent user uncertainty.

  • The ability to use an Urdu keyboard was identified as essential for participants to easily make minor edits and add accent marks.


Hi-Fi Development


A final version of the website was created after incorporating user feedback. The Home page now also provides clear guidance for installing Urdu and transliteration keyboards, tailored to the user’s specific platform.


Key Feature Updates

UPDATE 1
Desktop-first Responsive Design

A mobile version was also created to account for the lack of laptop access, especially when touring.

UPDATE 2
Bidirectional Lyric Conversion

Users can now seamlessly convert Roman Urdu lyrics to Urdu and vice versa.

UPDATE 3
Enhanced LLM Prompting

Transcription and transliteration functions were updated to be more precise.

UPDATE 4
YouTube & Timestamps

Music lovers had asked for the ability to upload songs using YouTube URLs, and musicians appreciated the idea of timestamps for syncing songs.

UPDATE 5
Branding

Consistent branding was incorporated to enhance the website's aesthetic appeal and foster user trust through a positive halo effect.

UPDATE 6
Increased Readability

A color contrast checker was used to ensure accessibility. Urdu script size was increased for readability.
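For readers curious what a color contrast checker computes, the sketch below implements the contrast-ratio formula from the WCAG 2.1 definition in plain Python. It is not the tool used in this project, and the function names and sample colors are assumptions chosen for illustration, not the site's actual palette. WCAG level AA asks for a ratio of at least 4.5:1 for normal-size text and 3:1 for large text.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel_8bit / 255
    # 0.03928 is the threshold given in the WCAG 2.1 definition.
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(hex_color):
    """Relative luminance of a '#rrggbb' color (0.0 = black, 1.0 = white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)

def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

def passes_aa(foreground, background, large_text=False):
    """True if the pair meets the WCAG AA minimum for the given text size."""
    minimum = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= minimum

if __name__ == "__main__":
    # Sample colors only; not taken from the Gaana Mangwao design.
    for fg in ("#000000", "#767676", "#777777"):
        ratio = contrast_ratio(fg, "#ffffff")
        print(fg, round(ratio, 2), passes_aa(fg, "#ffffff"))
```

Because large text has a lower AA minimum, increasing the Urdu script size helps readability in two ways: the letterforms are easier to distinguish, and the contrast requirement for that text is less strict.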

Visual Captioning Design

Effective song distribution now relies heavily on short-form social media content. This study helped establish captioning guidelines by testing multilingual and multi-script captions with Pakistani and Indian audiences. Indian listeners were included because they make up a large part of the Pakistani music fanbase. Urdu and Hindi are phonetically similar, but their scripts are different, so Roman Urdu/Hindi serves as a common script.

It was crucial to design for this experience and test musicians' assumptions, especially considering the limited digital space on mobile screens.



Multilingual multiscript short-form example



Outcome

A section of the Gaana Mangwao website presents the captioning guidelines in a digestible format and addresses the assumptions musicians voiced during the primary research phase.


Limitations & Future Work

The study faced challenges in recruiting Urdu-medium artists and audience members, primarily because the researcher was located in the US during the study period. As a result, the findings are biased towards English-medium participants and may not fully represent the general population.

Behavioral analytics tools such as heatmaps and session replays are being used to guide design improvements. Future development includes extending the transcription tool to support other regional languages, such as Punjabi, Sindhi, and Pashto.

Finally, the multi-script captioning guidelines require more comprehensive development before they are disseminated. The goal is to share them through the Pakistani music magazine Hamnawa, which boasts a substantial readership of musicians.



Reflection

Gaana Mangwao was possible because of the foundational work done by others before me and all of my prior experiences. It may seem obvious, but this project would have turned out very differently without the supervision of Dr. Rua Williams at Purdue; the advice of Zeerak Ahmed, founder of Matnsaz and Hamnawa (a Pakistani pop magazine); Sheikh Ahmed's work at Musixmatch; Usama Bin Shafqat's experiments with LLMs; and all the musicians who were willing to give me their time despite a 10-hour time difference. As it turns out, AI can be used in a meaningful way. However, my biggest takeaway from this project is the importance of building a community.
