Sound Indexing: Segmentation and indexing of sounds

This document is a PhD thesis in signal processing written by a French student. The thesis is in French, but an English abstract is available.

Bibliographic reference:

ROSSIGNOL, Stéphane. Segmentation et indexation des signaux sonores musicaux [Segmentation and indexing of musical sound signals]. Thèse de doctorat (PhD thesis), Université Paris VI, July 2000.
URL: http://stephanerossignol.ifrance.com/

Abstract:

"I defended my PhD thesis in Signal Processing in July 2000 at the University of Jussieu -- Paris VI, Paris; IRCAM -- Centre Georges Pompidou, Paris; and Supélec (engineering school), Metz (1996-2000). This work was supported by France Télécom Rennes. It deals with the segmentation and indexing of acoustic musical signals. Below is a summary of my PhD thesis.

This work deals with temporal segmentation and indexation of musical signals. Three interdependent schemes of segmentation are defined, which correspond to different levels of signal attributes.

1) The first scheme, named "source" scheme, concerns mainly the distinction between speech and music on movie sound tracks and on radio broadcasts.

Several features have been examined, each intended to measure a property that distinguishes speech from music. They are combined within several multidimensional classification frameworks. The performance of the system is discussed.

2) The second scheme, named "feature" scheme, refers to labels such as: silence/sound, voiced/unvoiced, harmonic/inharmonic, monophonic/polyphonic, with vibrato/without vibrato. Most of these characteristics are features used by the third scheme.

Vibrato detection, estimation of the vibrato parameters (its frequency and its magnitude), and extraction of the vibrato from the fundamental frequency trajectory have been studied in particular. Several techniques are described. The performance of the system is discussed.

The vibrato is extracted from the fundamental frequency trajectory to obtain a vibrato-free melodic contour. This "flat" fundamental frequency is useful for segmenting musical excerpts into notes (third scheme), and can also be used for sound modification or processing.

Vibrato detection is performed only when the first scheme has identified music.

3) The third scheme leads to a segmentation into "notes, phones or, more generally, stable sounds", depending on the nature of the sound: instrumental part, singing voice excerpt, speech, percussive part, etc.

The analysis is composed of four steps. First, a large set of features is extracted. A feature is all the more appropriate when its time evolution shows strong, short peaks at transitions while its variance and mean remain very low over steady-state parts. Three kinds of transitions exist: fundamental frequency transients, energy transients and frequency content transients. Second, each of these features is automatically thresholded. Third, a final decision function built on the set of thresholded features provides the segmentation marks. Finally, for monophonic and harmonic sounds, automatic transcription is performed. The performance of the system is discussed.

The data obtained in a given scheme are propagated from lower-numbered to higher-numbered schemes in order to improve their performance."
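The abstract does not name the features used by the "source" scheme. As an illustration only, the following Python sketch computes two features often used for speech/music discrimination, the variance of the frame-wise zero-crossing rate and the proportion of low-energy frames, and combines them with a simple nearest-centroid classifier. The feature choice, the 256-sample frames, the 50% low-energy threshold, the toy signals and all function names are assumptions, not the thesis's method.

```python
import math
import random
import statistics

# Hypothetical sketch: the features, the 50% energy threshold and every name
# below are illustrative assumptions, not the method used in the thesis.


def frames(signal, size=256):
    """Split a signal into non-overlapping frames (an incomplete tail is dropped)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (zero counts as positive)."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def segment_features(signal, size=256):
    """Return (ZCR variance, ratio of low-energy frames) for one analysis segment.

    Speech alternates voiced, unvoiced and silent frames, which raises both values;
    a sustained musical note keeps both close to zero.
    """
    fr = frames(signal, size)
    zcrs = [zero_crossing_rate(f) for f in fr]
    energies = [rms(f) for f in fr]
    threshold = 0.5 * statistics.fmean(energies)  # assumption: 50% of mean RMS
    low_ratio = sum(e < threshold for e in energies) / len(energies)
    return (statistics.pvariance(zcrs), low_ratio)


class NearestCentroid:
    """Minimal multidimensional classifier: predict the class with the closest mean."""

    def fit(self, vectors, labels):
        self.centroids = {}
        for label in set(labels):
            members = [v for v, lab in zip(vectors, labels) if lab == label]
            self.centroids[label] = tuple(statistics.fmean(col) for col in zip(*members))
        return self

    def predict(self, vector):
        point = tuple(vector)
        return min(self.centroids, key=lambda lab: math.dist(point, self.centroids[lab]))


def tone(freq, n, rate=8000):
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


def speech_like(seed, n_blocks=6, size=256):
    """Toy stand-in for speech: tone, noise and silence blocks in turn."""
    rng = random.Random(seed)
    out = []
    for b in range(n_blocks):
        if b % 3 == 0:
            out += tone(440, size)                            # "voiced"
        elif b % 3 == 1:
            out += [rng.uniform(-1, 1) for _ in range(size)]  # "unvoiced"
        else:
            out += [0.0] * size                               # pause
    return out


if __name__ == "__main__":
    train = [segment_features(tone(440, 1536)), segment_features(speech_like(0))]
    clf = NearestCentroid().fit(train, ["music", "speech"])
    print(clf.predict(segment_features(tone(300, 1536))))  # music
    print(clf.predict(segment_features(speech_like(7))))   # speech
```

A nearest-centroid rule is used only because it is the shortest multidimensional classifier to show; any classifier over the same feature vectors would fit the description in the abstract.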
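Of the "feature" scheme labels, silence/sound is the simplest to illustrate. The sketch below assumes 256-sample frames, a hypothetical threshold of -40 dB relative to the loudest frame and an absolute floor of -90 dB (relative to a full-scale amplitude of 1.0); the abstract does not specify how the thesis takes this decision.

```python
import math

# Hypothetical sketch: frame size and both thresholds are assumptions; the
# abstract does not say how the thesis decides silence versus sound.


def frame_levels_db(signal, size=256):
    """RMS level of each non-overlapping frame in dB (floored to avoid log10(0))."""
    levels = []
    for start in range(0, len(signal) - size + 1, size):
        frame = signal[start:start + size]
        rms = math.sqrt(sum(x * x for x in frame) / size)
        levels.append(20 * math.log10(max(rms, 1e-12)))
    return levels


def label_silence(signal, size=256, relative_db=-40.0, absolute_db=-90.0):
    """Label a frame 'sound' if it lies within relative_db of the loudest frame and
    above an absolute floor (so an all-silent input is not labelled 'sound')."""
    levels = frame_levels_db(signal, size)
    if not levels:
        return []
    loudest = max(levels)
    return [
        "sound" if lvl > absolute_db and lvl - loudest > relative_db else "silence"
        for lvl in levels
    ]


if __name__ == "__main__":
    rate = 8000
    pause = [0.0] * 1024
    note = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(1024)]
    print(label_silence(pause + note))
    # -> ['silence', 'silence', 'silence', 'silence', 'sound', 'sound', 'sound', 'sound']
```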
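The abstract does not detail the vibrato techniques either. The following sketch shows one plausible approach, which may differ from the thesis: the F0 trajectory is converted to cents, its normalised autocorrelation is searched for a peak in an assumed 4-8 Hz vibrato band (detection and rate), the extent is taken as half the peak-to-peak deviation, and the vibrato is removed with a moving average spanning about one vibrato period, which yields a "flat" F0. The 100 frames/s F0 rate, the 0.5 detection threshold and all names are assumptions.

```python
import math

# Hypothetical sketch of one common approach (autocorrelation of the pitch
# curve), not necessarily the technique used in the thesis. Frame rate, vibrato
# band, detection threshold and names are assumptions.


def to_cents(f0, ref=440.0):
    return [1200 * math.log2(f / ref) for f in f0]


def from_cents(cents, ref=440.0):
    return [ref * 2 ** (c / 1200) for c in cents]


def analyse_vibrato(f0, frame_rate=100.0, band=(4.0, 8.0), min_corr=0.5):
    """Return (detected, rate_hz, extent_cents) for a voiced F0 trajectory in Hz."""
    cents = to_cents(f0)
    mean = sum(cents) / len(cents)
    c = [x - mean for x in cents]
    energy = sum(x * x for x in c)
    if energy == 0.0:
        return False, 0.0, 0.0  # perfectly steady pitch: nothing to detect
    # Search only lags whose period falls inside the assumed vibrato band.
    lo = max(1, int(frame_rate / band[1]))
    hi = min(len(c) - 1, math.ceil(frame_rate / band[0]))
    best_lag, best_corr = lo, -math.inf
    for lag in range(lo, hi + 1):
        corr = sum(a * b for a, b in zip(c, c[lag:])) / energy  # normalised autocorrelation
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    extent = (max(c) - min(c)) / 2  # half the peak-to-peak deviation, in cents
    return best_corr >= min_corr, frame_rate / best_lag, extent


def remove_vibrato(f0, period_frames):
    """'Flat' F0: centred moving average, in cents, over about one vibrato period."""
    cents = to_cents(f0)
    half = period_frames // 2
    flat = []
    for i in range(len(cents)):
        window = cents[max(0, i - half): i + half + 1]
        flat.append(sum(window) / len(window))
    return from_cents(flat)


if __name__ == "__main__":
    rate = 100.0  # assumed F0 frame rate (frames per second)
    # Synthetic 2 s note: 440 Hz with a 6 Hz, +/-50 cent vibrato.
    f0 = [440.0 * 2 ** (50 * math.sin(2 * math.pi * 6.0 * i / rate) / 1200) for i in range(200)]
    detected, vib_rate, extent = analyse_vibrato(f0, rate)
    flat = remove_vibrato(f0, round(rate / vib_rate))
    print(detected, round(vib_rate, 2), round(extent, 1))
```

Averaging over one full period sums a sinusoidal oscillation to nearly zero while keeping slower melodic movement. It also blurs genuine note changes, so in practice the average should not run across note boundaries.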
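Steps two and three of the third scheme (automatic thresholding and the decision function) can be sketched as follows. The thresholding rule (mean plus k standard deviations of each feature curve), the "any feature fires" decision with a minimum gap between marks, and all names are assumptions for illustration. The two feature curves follow two of the transition types listed in the abstract: energy transients and fundamental frequency transients.

```python
import math
import statistics

# Hypothetical sketch: threshold rule, decision rule and names are assumptions,
# not the thesis's actual decision function.


def energy_transient(energies):
    """|log E[n] - log E[n-1]| per frame: peaks at energy transitions."""
    return [0.0] + [abs(math.log(b) - math.log(a)) for a, b in zip(energies, energies[1:])]


def f0_transient(f0):
    """Absolute F0 step in cents per frame: peaks at note changes."""
    return [0.0] + [abs(1200 * math.log2(b / a)) for a, b in zip(f0, f0[1:])]


def auto_threshold(curve, k=2.0):
    """Assumed automatic threshold: mean + k * standard deviation of the curve."""
    return statistics.fmean(curve) + k * statistics.pstdev(curve)


def segmentation_marks(curves, min_gap=5, k=2.0):
    """A frame is a candidate if any feature exceeds its own threshold; candidates
    fewer than min_gap frames after the last kept mark are dropped."""
    thresholds = [auto_threshold(c, k) for c in curves]
    candidates = [
        n for n in range(len(curves[0]))
        if any(c[n] > t for c, t in zip(curves, thresholds))
    ]
    marks = []
    for n in candidates:
        if not marks or n - marks[-1] >= min_gap:
            marks.append(n)
    return marks


if __name__ == "__main__":
    energies = [1.0] * 50 + [10.0] * 50 + [2.0] * 50  # two loudness changes
    f0 = [440.0] * 75 + [494.0] * 75                  # one note change
    print(segmentation_marks([energy_transient(energies), f0_transient(f0)]))
    # -> [50, 75, 100]
```

Each feature is thresholded against its own statistics, so features with very different units (log-energy ratios, cents) can be combined without manual scaling.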

Dublin Core Metadata:

Title: Segmentation and indexing of sounds
Creator: Stéphane ROSSIGNOL
Subject: Signal processing, segmentation, indexing, sounds, musical signals.
Description: "This work deals with temporal segmentation and indexation of musical signals. Three interdependent schemes of segmentation are defined, which correspond to different levels of signal attributes."
Contributor: -
Date: July 2000
Type: thesis (summary in English)
Identifier: http://stephanerossignol.ifrance.com/
Source: Web site of Stéphane Rossignol
Language: fr
Coverage: -
Rights: -
