I completed a three-week design sprint focused on analyzing the Sound ID feature within the Merlin Bird ID app and recommending design improvements. Merlin Bird ID is a mobile app designed by The Cornell Lab of Ornithology, a member-supported unit of Cornell University in Ithaca, New York.
Merlin Bird ID helps users identify birds using three discrete methods:
Bird ID Wizard - 3 questions help narrow down possible matches.
Photo ID - Identification by photo.
Sound ID - Identification by sound.
Merlin users are encouraged to save and share their findings with the Cornell Lab and the greater scientific research community. This data is essential to scientists tracking bird populations and is used for environmental monitoring projects and conservation efforts. Additionally, users can share their identifications with other amateur birders, alerting them to interesting finds in their area.
"Our mission is to interpret and conserve the earth’s biological diversity through research, education, and citizen science focused on birds."
birds.cornell.edu
For this design sprint, I focused on Merlin's Sound ID feature. Sound ID works by translating live or prerecorded audio into a visualization called a spectrogram, then analyzing that visualization to identify any possible bird calls. After the user stops recording, they can check whether the identified calls match verified recordings of the birds suggested. Once the user is confident in an identification, they can save it to a personal list and share the recording, along with its geolocation, with the Cornell Lab and other users.
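To make the spectrogram step concrete, here is a minimal sketch of how audio becomes a time-frequency image using a short-time Fourier transform. This is an illustration only; Merlin's actual analysis pipeline is not public, and the `spectrogram` function, frame size, and hop length below are all assumptions.

```python
import numpy as np

def spectrogram(audio, frame_size=1024, hop=256):
    """Toy short-time Fourier transform: slice the audio into
    overlapping frames, window each one, and take the magnitude
    of its FFT. Rows = time steps, columns = frequency bins."""
    window = np.hanning(frame_size)
    frames = [audio[i:i + frame_size] * window
              for i in range(0, len(audio) - frame_size, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

# A one-second 440 Hz tone sampled at 22,050 Hz produces a bright
# horizontal band near the frequency bin closest to 440 Hz.
sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
peak_bin = spec.mean(axis=0).argmax()
print(peak_bin * sr / 1024)  # within one bin of 440 Hz
```

A bird call shows up in such an image as a characteristic shape in time and frequency, which is what makes visual analysis of a recording possible.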
"Sound ID listens to the birds around you and shows real-time suggestions for who’s singing. Compare your recording to the songs and calls in Merlin to confirm what you heard."
merlin.allaboutbirds.org/sound-id/
To learn more about birding and how Merlin is used in the field, I interviewed a subject matter expert. Gabe, a birder of many years, has birded with and without assistive apps and has used Merlin Bird ID extensively in the field. The interview revolved around birders' goals and needs in the field, how birders interact with the larger birding community, and how Merlin and other bird identification apps fit into this context.
-Avid birder with over 15 years of experience
-Uses Merlin Bird ID in the field
-Goes birding about twice a week
Getting a positive ID on a hoagie
Birders in the field are collecting information, taking pictures, recording audio, and taking notes. They can and will identify a bird on the spot, but often review their materials later back at home to check and verify an identification before sharing their finding.
- Deliberate, thoughtful review after a session is a major component of birding.
It is common practice for birders to identify a bird by audio alone. Birders will add a bird to their list on the strength of a confident audio identification and report it to the broader birding community.
-There is no requirement or expectation of visual verification to report a bird to others
While Merlin Sound ID is relatively accurate at identifying a clear bird call, a clean recording can be difficult to capture in the field, and the software can be incorrect in its identifications. Users treat the ID as a jumping-off point but don't fully trust it to make a positive ID on its own; they always check the identification given.
-AI makes false identifications
-Acts as a starting point for birders
Most birders or people that are going out to see birds are not strictly trying to see a particular bird or identify a certain number of species. They are out to connect with nature and have a nice experience observing wildlife. Learning and knowing what kinds of birds one is looking at is a means to that end.
-Users' overarching goal in birding is to connect with nature and have a nice time
As part of the SME interview I asked Gabe to describe how he uses Merlin Bird ID out in the field. He described two main modes of use.
In the first scenario, the user wants to get a general idea of what birds are in the area. The user sets the app to record and puts their device down or in a pocket to continue moving. After a time, they open their device back up to see what bird calls Merlin has identified. They can then stop the recording to review the identifications.
In the second scenario, the user hears a bird and recognizes the call but isn't quite positive they know what species it is. They open up the app and record the call to see if the AI confirms their hunch. The user can listen to existing examples of the call in question to either confirm or dispute the identification.
Of the insights that came from the SME interview, I kept coming back to the idea that birders are fundamentally trying to experience wildlife and connect with nature. Bolstering a digital experience felt counterintuitive to serving that desire. As I performed my own testing of Merlin in a nature preserve, my use of the app started to feel like a chore: I was significantly more engaged with the sights and sounds around me than with my phone. This feeling echoed something Gabe said to me, that users don't actually like using Merlin. They like what it does, but the experience of using the app in the field is unpleasant and distracts from their primary focus.
Analyzing user reviews, performing my own field testing, and reviewing notes from the SME interview, I uncovered some common pain points. In response, I developed a prototype and tested it with the SME and two additional test users.
The first field use scenario Gabe described was of scanning an area. In the scenario, a user starts a recording then puts the device in their pocket. After a time, they then open Merlin and review what birds were identified. This requires the user to either constantly look at their device to see what is being picked up or to intuit when enough data has been collected to review the findings. Both of these cases are incompatible with the user's primary goal of connecting with nature. To better serve birders using Merlin in the field, the user interaction should be more passive.
Reevaluating this interaction pattern revealed an opportunity to better serve users' primary focus. Notifying the user of bird identifications from the lock screen lets them review a scan without having to open their device, saving them time and easing the desired information into their experience.
My assumption was that glancing at a lock screen is a less disruptive interaction than fully opening a device, which carries a burden apart from Merlin. Users can see how many new birds have been identified as well as which ones. Customizing lock screen notifications would allow users to define how often Merlin pushes notifications and whether they want an audio ping, a vibration, or no additional alert when a notification pops up.
Testing revealed users were very enthusiastic about lock screen notifications. Test users described the feature as intuitive and much needed. Gabe said "This is how the app should work". One test user expressed concern that the app would send too many notifications but with customization, users would be able to control the amount and frequency of notifications.
After a user has made an audio recording or uploaded an existing recording to Merlin, they are brought to a screen that lets them review the analyzed audio clip. Users can navigate the audio file in three ways: using the play/pause button, tapping one of the birds identified, or manually dragging along the audio file.
The time code lets users know where they are in the file. Users can also see the visualization advance along the red playhead line, indicating what part of the file is currently being played. The visualization affords the user a way to scan the file visually, seeing where there are any sounds the software might have missed for identification and to know where they are in the file.
When users are dragging the spectrogram to navigate forwards or backwards, there is no audio being played. This is a problem because the timecode alone is not sufficient information to understand what parts of the clip have bird calls present. Nor does the spectrogram provide enough parsable information to tell users where to pay attention.
Scrubbing is an interaction where a user drags a playhead across an audio file to hear it. Scrubbing is so common in digital audio interfaces that you may have experienced it without ever noticing. That Merlin does not support scrubbing is a major oversight.
Implementing scrubbing would not change the interface visually at all; users would still navigate the recording by tapping and dragging. However, while dragging they would now hear the audio as the clip advances and reverses along the playhead. This immediate audio feedback tells users what audio is present where in the clip, helping them navigate this complex interaction.
By implementing scrubbing, users can hear the audio as they pass by a visual indicator of a bird call learning what patterns to look for.
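As a sketch of the mechanic (not Merlin's code), scrubbing boils down to mapping the drag position to a short window of samples to play back. The function name, window length, and sample rate below are hypothetical:

```python
def scrub_window(drag_x, view_width, clip_samples, window_ms=60, sr=44100):
    """Return the (start, end) sample indices to play as feedback
    while the playhead sits at pixel `drag_x` of a `view_width`-pixel
    spectrogram view. Hypothetical helper, not Merlin's API."""
    frac = min(max(drag_x / view_width, 0.0), 1.0)  # clamp to the clip
    center = int(frac * clip_samples)
    half = int(sr * window_ms / 2000)  # half the window, in samples
    return max(0, center - half), min(clip_samples, center + half)

# Dragging to the midpoint of a 10-second, 44.1 kHz clip:
print(scrub_window(500, 1000, 44100 * 10))  # → (219177, 221823)
```

Each time the drag gesture moves, the audio engine would play the small window returned here, producing the forward/reverse playback feel described above.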
Test users were able to find multiple bird calls in the provided recording within 15 seconds, compared to having to wait for the recording to play out in real time. Users reported that scrubbing was not only a faster way to navigate the recording but also a more intuitive interaction.
One user expressed that they found the audio while scrubbing harsh and noisy. In a future iteration I would test users' desire to toggle scrubbing on or off.
Gabe mentioned that one major interaction birders have with Merlin is reviewing their findings after a birding session. Once back home, users review the audio recordings they took in the field. To do this, they drag the audio file back and forth, comparing it against existing bird call recordings to determine the accuracy of an identification. This complex operation currently lacks much-needed information tagging.
Users tap on the image or name of an identified bird to be taken to the section in the file where that bird was identified. Nowhere on the interface does that timecode show up until the user taps the button. Furthermore, there is no indication of when the identified bird call ends. These recordings contain a lot of overlap of birdcalls and other noises. It can be difficult for users to determine what part of the audio Merlin has identified as belonging to a particular bird.
This is a problem because users need to be confident in their identifications in order to save and report them to the Cornell Ornithology Lab.
To more effectively analyze their recordings, users need to understand where Merlin has identified a particular bird. To illustrate this information, I developed a mid-fidelity prototype that tags their audio files, calling out to the user where an identified bird is present. This graphic overlay shows where Merlin has identified a bird call and where that call ends. Icons pointing left and right above the spectrogram show where other bird calls are identified further off screen; tapping those icons brings the user to the highlighted audio. Simple silhouette icons indicate different birds, and information on an identified bird is brought up when the user is in proximity to its identified area.
Silhouette icons appear again below the audio player in a list of the birds identified in the file. Each identified species shows timecodes where its call is present, indicating to the user how many times the bird was identified and where. Merlin stores this information currently but hides it from the user in service of a streamlined design. My hypothesis is that when users are analyzing their audio, an information-focused interface will provide the control they need to more confidently verify their findings.
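The tagging overlay described above could be backed by a simple data model. This is a hypothetical sketch; `CallTag`, `tags_near`, and the one-second proximity margin are my assumptions, not Merlin's implementation:

```python
from dataclasses import dataclass

@dataclass
class CallTag:
    """One identified call: which species, and where its
    identified span begins and ends in the recording."""
    species: str
    start_s: float
    end_s: float

def tags_near(tags, playhead_s, margin_s=1.0):
    """Tags whose identified span is within `margin_s` seconds of the
    playhead — used to decide which species info to surface on screen."""
    return [t for t in tags
            if t.start_s - margin_s <= playhead_s <= t.end_s + margin_s]

tags = [CallTag("Northern Cardinal", 2.0, 4.5),
        CallTag("Song Sparrow", 4.0, 6.0),
        CallTag("Northern Cardinal", 9.0, 10.5)]
print([t.species for t in tags_near(tags, 4.2)])
# → ['Northern Cardinal', 'Song Sparrow']
```

The per-species timecode list below the player falls out of the same data: grouping the tags by `species` yields each bird's count and locations.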
I performed a usability test by having users identify the possible interactions on screen and tell me what would happen if they were to interact with them. Users were able to identify the function of 90% of the interactions preemptively in my prototype; on the current audio review screen, they were only able to identify 70%. Test users felt the audio tags were very helpful in communicating where bird calls were identified, and that this information would improve users' capacity to analyze their audio.
An identified issue in the current interface is that Merlin will sometimes identify multiple bird calls overlapping each other. Additionally, there can be dozens of instances of an individual bird in a file. These extra data points could busy the interface, making it difficult to navigate. A possible solution could be to limit the total number of displayed sound IDs for an individual species and to introduce overlaid identifying graphics. A future iteration of this project would need to develop and test this solution.
Prototype screen annotations: Identified bird link · Play verified clip · Clip control · More Recordings · Cardinal ID section · Sparrow ID section · Warbler ID section · User Recording control · ID’d species list links
To start a sound recording, users navigate to Merlin's main menu, then to the Sound ID screen, and press the microphone icon. This navigation is unnecessarily long and cumbersome when we consider its use in context. In the field, users are dealing with adverse conditions like screen glare and divided attention while trying to capture a bird's call before it flies away. Users need streamlined access to the record button to use Sound ID more efficiently in the field.
To serve this need, I included a sticky record button in the top navigation of my prototype. This reduces the number of taps required to start a recording to just one. The record button itself is a common microphone icon transformed into a toggle button, which acts as an on/off switch while also indicating recording status to users. This means users can start and stop a recording while looking at another section of the app.
Usability testing revealed that users were not able to predict that the record button would start recording when tapped; they assumed it would require another action to confirm. However, after a single interaction with the button, users learned its function and were comfortable with its behavior.
Further testing with Gabe and review against field scenarios showed that users' need for immediacy in recording was not as pressing as initially assumed. In the field, birds make a lot of noise; it is reasonable to assume that if you hear a bird call once, you will hear it again. Capturing a recording of a bird call is a matter of patience rather than speed.
"Did not expect it to just start recording, but I'm not mad"
"That's cute"
"Does not seem entirely necessary, I'm not in a rush"
My research revealed that users want as passive a digital experience as possible while out in the field, while also desiring more robust tools for reviewing their recordings afterward. Shifting the interface to better match these needs will greatly enhance the user experience. Audio scrubbing and audio tagging will enable users to more easily navigate their audio analyses. Lock screen notifications will ideally allow users to forget that they are using an app in the field, reminded of its presence only when it benefits them. A sticky record button might benefit an edge-case user who needs to record something fast, but it is lower on the priority list than the other designs offered. Working with an SME revealed insights that would not have come about by working only with casual users.
I learned a lot during this project about use in context. Balancing a user's different needs across different contexts within the same app made for challenging and fruitful ideation.