MIT Media Lab

Words that Describe Timbre

A Study of Auditory Perception Through Language

Mihir Sarkar

mihir@media.mit.edu

Barry Vercoe

bv@media.mit.edu

Yang Yang

y_yang@mit.edu

Background

Musicians and music lovers often use non-technical words to describe the quality of the sounds they create or hear. In particular, terminology that is sensory in nature (e.g. bright, warm, smooth) is commonly employed. Our project investigates the relationship between auditory perception and language in this context.

Research Questions

We are interested in finding out whether people use a common vocabulary to describe timbre, or whether their choice of words is related to their musical or cultural background. We are also examining correlations between words and timbre by identifying common audio features in sound files that are described (in a statistical sense) by the same words.

Preliminary Study

We deployed a computer-based survey in which we asked people to assign words to sounds that were played to them. In this preliminary survey, 40 sounds collected from the Freesound online database were presented in random order to 7 users. After listening to each sound, users were asked to rate a list of words on a scale ranging from "not applicable" to "highly applicable". The 40 candidate words were selected from previous studies and from the labels assigned by contributors to the Freesound database; 12 of them were presented at random in each survey session. The chosen words included material properties (e.g. wood, strings), adjectives from sensory modalities other than hearing (e.g. sharp, sweet), and subjective impressions (e.g. pleasant, noisy). Users were also invited to enter their own words.

Our preliminary study indicates that people tend to assign similar words to certain types of sounds, which suggests a fairly universal mapping between words and timbre. Moreover, these words correlate strongly with audio features in both the frequency domain (i.e. spectral content) and the time domain (i.e. amplitude envelope shape). The results of this study will be used to develop a sound synthesis engine and an audio post-processing unit that can alter the timbral quality of an audio input based on a verbal description of the user's intuitive expectation rather than on technical parameters.
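To make the two feature families concrete, here is a minimal Python sketch of one frequency-domain feature (spectral centroid) and one time-domain feature (attack time of the amplitude envelope). The poster does not say which features were actually extracted, so both functions, their names, and the 0.9 threshold are illustrative assumptions, not the authors' method.

```python
import cmath
import math


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency in Hz, using a naive DFT.

    Assumption: the centroid stands in for "spectral content";
    the study's real feature set is not given in the source.
    """
    n = len(samples)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


def attack_time(samples, sample_rate, threshold=0.9):
    """Seconds until |amplitude| first reaches `threshold` times its peak.

    Assumption: a crude proxy for "amplitude envelope shape".
    """
    peak = max(abs(x) for x in samples)
    if peak == 0:
        return 0.0
    for i, x in enumerate(samples):
        if abs(x) >= threshold * peak:
            return i / sample_rate
    return 0.0
```

The naive DFT is quadratic in the number of samples, which is tolerable only for very short clips; a real analysis would use an FFT library.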

Methodology

Following our preliminary study, we deployed a full-fledged survey in which people were asked to assign words to sounds that were played to them, and to compare sounds with respect to certain words. We established the following guidelines for our survey:

- Sounds: acoustic and synthetic sources, Western and non-Western instruments, "sense of pitch" (including bells, for instance), no sound effects, single notes, no melody or harmony.
- Words: no subjective words (e.g. annoying, pleasant).
- Questions: interactive format, attractive interface, varied questions, "game-like" to increase user participation.
- Format: Internet-accessible to reach a large and diverse population. Disley's work suggests that online surveys do not adversely affect results in audio classification tasks.

We selected 192 sounds from The Freesound Project, a database of Creative Commons-licensed sound files contributed by an open community of users. A Python application automatically downloaded random sound files shorter than 2 seconds from the Freesound database. We then used our own rating application to select sounds appropriate for our study, based on the criteria listed above. Each sound accepted by one rater was confirmed by a second rater. In this way we retained 192 sounds out of more than 1000 downloaded files.

We chose 60 words from the literature and from the tags associated with sound files in the Freesound database. We manually filtered the tags that occurred more than twice, rejecting words that described recording circumstances or techniques, that were subjective in nature, or that obviously could not be used to describe timbre. We also decided to update our list dynamically by manually selecting additional words from among those proposed by at least two users.

We defined 3 types of questions. The first, a multiple-choice (check box) question, plays one sound and displays a list of 7 words to describe it. Our preliminary survey showed that a scaled rating did not provide additional information because the scale was too subjective, so we adopted a binary scale ("applicable"/"not applicable"). The first 2 questions in the survey always follow this format, the simplest of the three, to ease the user's learning curve. The second question type is free text entry. We separated it from the multiple-choice question (thereby making free text entry non-optional) to encourage people to think of new words to describe timbre. The third question type is a comparison question (as presented below).
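The two counting thresholds used to build the vocabulary (tags occurring more than twice, and new words proposed by at least two users) can be sketched as follows. This only covers the automatic pre-filter; the manual rejection and selection steps described above still apply afterwards. The function names, the data shapes, and the choice to count each tag once per sound file are assumptions made for illustration.

```python
from collections import Counter


def frequent_tags(tag_lists, min_count=3):
    """Return tags that occur more than twice across sound files.

    tag_lists: one list of tags per sound file (hypothetical shape).
    Assumption: a tag counts once per file, even if repeated in it.
    """
    counts = Counter(tag for tags in tag_lists for tag in set(tags))
    return sorted(tag for tag, n in counts.items() if n >= min_count)


def candidate_words(proposals, vocabulary, min_users=2):
    """Return new words proposed by at least `min_users` distinct users.

    proposals: dict mapping a user id to the words that user typed.
    The result is a shortlist for manual review, not a final list.
    """
    counts = Counter()
    for words in proposals.values():
        # Normalize case and whitespace; count each word once per user.
        counts.update({w.strip().lower() for w in words})
    return sorted(w for w, n in counts.items()
                  if n >= min_users and w not in vocabulary)
```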

Survey Interface

[Figure: screenshot of the survey interface]

Survey Format

To gather enough data to match words with sounds in a statistically significant manner, and to distribute user responses evenly among all auditory stimuli, we randomly divided the 192 sounds into 4 banks of 48 sounds. We further divided each bank into 4 sets of 12 sounds. Within each set, 4 sounds are used for comparison questions (2 questions with 2 sounds each), 4 sounds are used for free text questions, and the remaining 4 are used for check box questions, for a total of 10 questions per set. At the beginning of each survey session, a set is assigned randomly to the user.

Similarly, we randomly divided the initial 60 words into 4 groups of 15 words, and assigned each group to a particular sound bank. Words within a group are used randomly (for either check box or comparison questions) with the corresponding sound bank. After the first two questions (both check box questions), the other questions appear in random order within each set of 10 questions, in the proportions mentioned above.

The user can end a survey session after the first 10 questions (sessions with fewer than 10 questions answered are discarded). If a user decides to continue, a new set of sounds is assigned randomly to the session. At the beginning of the survey, the user has the option to enter his or her e-mail address, which allows an interrupted session to be resumed. After the front page, a demographics form is displayed. We ask the user for the following information: year of birth, native language, native country, country of current residence, favorite musical styles, and musical background.

We invited both native and non-native English speakers to participate in our survey. Earlier studies suggest that speakers of British English and American English tend to assign different meanings to words when talking about sound.

The survey was advertised on the MIT Media Lab and Freesound websites, as well as on various music-related mailing lists around the world. We received more than 200 responses within one week.
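The partitioning described in this section (192 sounds into 4 banks of 4 sets of 12, 60 words into 4 groups of 15, and 10 questions per set with two check box questions first) can be sketched in Python as below. The function names, data structures, and seeding are hypothetical; the survey's actual implementation is not described in the source.

```python
import random


def split_evenly(items, parts):
    """Split a list into `parts` consecutive chunks of equal size."""
    size, remainder = divmod(len(items), parts)
    if remainder:
        raise ValueError("items cannot be divided evenly")
    return [items[i * size:(i + 1) * size] for i in range(parts)]


def build_survey(sounds, words, seed=None):
    """Randomly form 4 banks, each with 4 sound sets and 1 word group."""
    rng = random.Random(seed)
    sounds, words = list(sounds), list(words)
    rng.shuffle(sounds)
    rng.shuffle(words)
    banks = split_evenly(sounds, 4)    # 4 banks of 48 sounds
    groups = split_evenly(words, 4)    # 4 groups of 15 words
    return [{"sets": split_evenly(bank, 4), "words": group}
            for bank, group in zip(banks, groups)]


def plan_questions(sound_set, rng):
    """Turn one 12-sound set into an ordered list of 10 questions."""
    s = list(sound_set)
    rng.shuffle(s)
    checkbox = [("checkbox", (x,)) for x in s[:4]]
    free_text = [("free_text", (x,)) for x in s[4:8]]
    comparison = [("comparison", (s[8], s[9])),
                  ("comparison", (s[10], s[11]))]
    # The first two questions are always check box questions;
    # the remaining eight appear in random order.
    rest = checkbox[2:] + free_text + comparison
    rng.shuffle(rest)
    return checkbox[:2] + rest
```

A session would then pick one set at random, for example `plan_questions(rng.choice(bank["sets"]), rng)`, and draw the words for its check box and comparison questions from `bank["words"]`.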

Preliminary Results

Based on the demographic information collected from the participants, our surveyed population so far is 25% female and 75% male. Our initial analysis shows that people tend to agree on the usage of the following words when describing certain sounds:

- bright, resonant, full, clean, dense, percussive, (non) harsh, (non) airy, (non) shrill, (non) acoustic, (non) pure, (not) close.

It is interesting to note that there seems to be more agreement on the exclusion of certain words. Our results also show that people tend to disagree about the following words:

- open, hard, thin, light, heavy.

Further analysis will be performed on sub-populations (based on the demographic information) to discern agreement and disagreement patterns.
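One simple way to quantify agreement on a word from binary check box answers is the share of respondents in the majority for each (sound, word) pair, averaged over sounds. The poster does not state which agreement measure was used, so the sketch below is an assumed stand-in, and the response tuple format is hypothetical.

```python
from collections import defaultdict


def word_agreement(responses):
    """Mean majority share per word, between 0.5 and 1.0.

    responses: iterable of (sound_id, word, applicable) tuples, where
    `applicable` is True for "applicable" and False otherwise.
    A score of 1.0 means every user gave the same answer for each
    sound; 0.5 means users were split evenly. Agreement that a word
    does NOT apply counts as much as agreement that it does.
    """
    votes = defaultdict(list)
    for sound_id, word, applicable in responses:
        votes[(sound_id, word)].append(bool(applicable))

    shares = defaultdict(list)
    for (_, word), answers in votes.items():
        yes_share = sum(answers) / len(answers)
        shares[word].append(max(yes_share, 1 - yes_share))

    return {word: sum(s) / len(s) for word, s in shares.items()}
```

Because the majority share treats "not applicable" symmetrically, it can capture agreement on the exclusion of a word as well as on its use.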

Applications

Based on this study, we are developing a sound description system that can automatically tag sounds in a database for retrieval purposes based on their audio features. We are also designing an audio processing engine that can modify sounds by using a verbal description of the user's intuitive expectation (such as "a sharper sound") instead of technical parameters.

Contributions

Beyond the very positive feedback we received on our survey framework, the work presented here furthers our understanding of human perception of timbre and its correlation with semantic descriptors through the following contributions:

- We developed an online survey with non-traditional sounds contributed by Internet users;
- We designed an online method of data collection for auditory studies based on a novel interface, and we used the Internet to spread the word about our survey to increase user participation;
- We defined a comprehensive method to add words to the vocabulary used to describe timbre;
- We are in the process of collecting a relatively large dataset of words and corresponding sounds, which will be available to researchers interested in this field;
- We demonstrated, through our preliminary results, that there is a strong correlation between certain words and timbre, and that some of these correlations depend on a person's musical and cultural background. This indicates that an automatic tagging, retrieval, or audio processing mechanism cannot be completely universal and has to address these concerns to be effective.

References

Darke, G. (2005). Assessment of Timbre Using Verbal Attributes. Proceedings of the Conference on Interdisciplinary Musicology (CIM), Montréal (Québec), Canada, March 2005.

Disley, A. C., Howard, D. M., and Hunt, A. D. (2006). Remote psychoacoustic testing using the Internet. The Journal of the Acoustical Society of America (JASA), November 2006.

Disley, A. C., Howard, D. M., and Hunt, A. D. (2006). Timbral description of musical instruments. Proceedings of the 9th International Conference on Music Perception and Cognition (ICMPC), Bologna, Italy, August 2006.

Johnson, C. G. and Gounaropoulos, A. (2006). Timbre interfaces using adjectives and adverbs. Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME), Paris, France, June 2006.

Nicol, C. A. (2005). Development and Exploration of a Timbre Space Representation of Audio. PhD Thesis, University of Glasgow, 2005.

Turnbull, D., Barrington, L., and Lanckriet, G. (2006). Modeling music and words using a multiclass naive Bayes approach.

Ueda, K. (1996, October). A hierarchical structure for adjectives describing timbre. Journal of the Acoustical Society of America (JASA), 100(4), 275.

The Freesound Project, Music Technology Group, Institut Universitari de l'Audiovisual, Universitat Pompeu Fabra, Barcelona, Spain.

