Semantic Hifi

Bibliographic reference:

IRCAM. Semantic Hifi. paper. IRCAM: Information society technologies.
URL: http://shf.ircam.fr/?L=1

Text:

Objectives

In the context of large-scale digital music distribution, the goal of the project is to develop a new generation of HIFI systems, offering new functionality for browsing, interacting, rendering, personalizing and editing musical material.

This next generation of hard-disk based HIFI systems will drastically change the home users’ relationship to music and multimedia content. They will be able to interact with music, blurring the traditional limits between playing, performing and remixing. These HIFI systems will be as much open instruments as listening stations.

Main functions

  • Personalized classification and content-based management of music pieces; query by humming, automated playlist generation specified by global and content-based criteria, automatic production of musical summaries;
  • Browsing within musical pieces through the analysis of their content: temporal maps, browsing by lyrics, advanced variable speed playback, navigation within the orchestral polyphony with spatial audio rendering;
  • Personalized editing and composition tools, DJ application;
  • Instrumental and vocal tools and automatic accompaniment;
  • Sharing of the indexing, composition and performance work through P2P networks.

Dublin Core Metadata:

Title: Semantic Hifi
Creator: IRCAM
Subject: Hifi system / classification / indexing / music / summary.
Description: In the context of large-scale digital music distribution, the goal of the project is to develop a new generation of HIFI systems, offering new functionality for browsing, interacting, rendering, personalizing and editing musical material.
Contributor: -
Date: -
Type: Paper
Identifier: http://shf.ircam.fr/?L=1

Source: IRCAM Information society technologies
Language: en
Coverage: World
Rights: -

Segmentation and indexing of sounds

This document is a doctoral thesis on signal processing written by a French student. The thesis is in French, but an English abstract is available.

Bibliographic reference:


ROSSIGNOL, Stéphane. Segmentation et indexation des signaux sonores musicaux. July 2000, thesis, Thèse de doctorat, Université Paris VI.
URL: http://stephanerossignol.ifrance.com/

Abstract:

"I defended my PhD thesis in Signal Processing in July 2000 at the University of Jussieu -- Paris VI, Paris; IRCAM -- Centre Georges Pompidou, Paris; and Supélec (engineer school), Metz (1996-2000). This work was supported by France Télécom Rennes. It deals with the Segmentation and the Indexing of Acoustic Musical Signals. Below is a summary of my PhD thesis.

This work deals with temporal segmentation and indexation of musical signals. Three interdependent schemes of segmentation are defined, which correspond to different levels of signal attributes.

1) The first scheme, named "source" scheme, concerns mainly the distinction between speech and music on movie sound tracks and on radio broadcasts.

Features have been examined: they intend to measure distinct properties of speech and music. They are combined into several multidimensional classification frameworks. The performance of the system is discussed.

2) The second scheme, named "feature" scheme, refers to labels such as: silence/sound, voiced/unvoiced, harmonic/inharmonic, monophonic/polyphonic, with vibrato/without vibrato. Most of these characteristics are features used by the third scheme.

Vibrato detection, vibrato parameter (its frequency and its magnitude) estimation, and vibrato extraction from the fundamental frequency trajectory has been particularly studied. Several techniques are described. The performance of the system is discussed.

The vibrato is extracted from the fundamental frequency trajectory to obtain a no-vibrato melodic evolution. This "flat" fundamental frequency is useful for segmentation of musical excerpts into notes (third scheme), and can also be used for sound modification or processing.

The vibrato detection is operated only when music is identified on the first scheme.

3) The third scheme leads to segmentation into "notes or into phones or more generally into stable sounds", according to the nature of the sound: instrumental part, singing voice excerpt, speech, percussive part...

The analysis is composed of four steps. The first step is to extract a large set of features. A feature will be all the more appropriate as its time evolution presents strong and short peaks when transitions occur, and as its variance and its mean remain at very low levels when describing a steady state part. Three kinds of transitions exist: fundamental frequency transients, energy transients and frequency content transients. Secondly, each of these features is automatically thresholded. Thirdly, a final decision function based on the set of the thresholded features has been built and provides the segmentation marks. Lastly, for monophonic and harmonic sounds, the automatic transcription is done. The performance of the system is discussed.

The data obtained in a given scheme are propagated from lower numbered to higher numbered schemes in order to improve their performance."
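The third scheme described above (thresholding each feature, then combining the thresholded features in a decision function) can be illustrated with a minimal sketch. The abstract does not give the actual thresholding rule or decision function, so the ones used here are assumptions: a threshold of mean plus k standard deviations, and a simple vote across features. The function names and the toy feature values are hypothetical.

```python
from statistics import mean, pstdev


def threshold_feature(values, k=1.0):
    """Flag frames where a feature exceeds mean + k * standard deviation.

    Assumption: the thesis only says features are "automatically
    thresholded"; this particular rule is illustrative.
    """
    limit = mean(values) + k * pstdev(values)
    return [v > limit for v in values]


def segmentation_marks(features, min_votes=2):
    """Return frame indices where at least min_votes features fire.

    Assumption: a simple vote stands in for the thesis's
    "final decision function".
    """
    flags = [threshold_feature(f) for f in features]
    n_frames = len(features[0])
    marks = []
    for i in range(n_frames):
        votes = sum(1 for flag in flags if flag[i])
        if votes >= min_votes:
            marks.append(i)
    return marks


# Toy per-frame features, one per kind of transition named in the abstract:
# fundamental frequency, energy and frequency content. Each stays near zero
# in steady-state frames and peaks briefly at a transition.
f0_change = [0.0, 0.1, 0.0, 5.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0]
energy_change = [0.1, 0.0, 0.1, 4.0, 0.0, 0.1, 0.0, 0.0, 3.5, 0.1]
spectral_change = [0.0, 0.0, 0.1, 6.0, 0.1, 0.0, 0.1, 0.0, 4.0, 0.0]

marks = segmentation_marks([f0_change, energy_change, spectral_change])
print(marks)
```

In this toy data, frame 3 is a transition in all three features and frame 8 only in energy and frequency content, which is the kind of case a combined decision is meant to handle.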


Dublin Core Metadata:

Title: Segmentation and indexing of sounds
Creator: Stéphane ROSSIGNOL
Subject: Signal processing, segmentation, indexing, sounds, musical signals.
Description: "This work deals with temporal segmentation and indexation of musical signals. Three interdependent schemes of segmentation are defined, which correspond to different levels of signal attributes."
Contributor: -
Date: July 2000
Type: Thesis (summary in English)
Identifier: http://stephanerossignol.ifrance.com/
Source: Web site of Stéphane Rossignol
Language: Fr
Coverage: -
Rights: -

Indexing Sound Files on Search Engines that Can’t Hear Them

Bibliographic reference:

SLAWSKI, Bill. Indexing Sound Files on Search Engines that Can’t Hear Them. Creative Flow, May 27th, 2004. URL: http://blog.cre8asite.net/archives/125

Text:

"What do you do if most of the content your company creates is in audio or video format, and you want to include it on the web, and make it so that people can find it? And the content is news, which relies upon timely delivery?

In the case of National Public Radio (NPR), the answer was to put the audio files online, and also offer transcriptions of them. Since NPR started doing that a few weeks ago, they’ve noticed a substantial increase in traffic to their site for topical subjects from the search engines.

One of the things I like about this practice is that people with hearing disabilities are now able to access stories that they couldn’t hear on the radio. What a great result.

The transcription is presently done by speech recognition technology to get stories online very quickly after they are broadcast on the radio. It’s likely that humans will take over from the software presently used, which sometimes garbles results.

If at some point, the search engines become capable of indexing audio, I hope that sites providing transcripts continue to do so. It’s great to see such a great improvement in accessibility, even if it is done inadvertently."

Dublin Core Metadata:

Title: Indexing Sound Files on Search Engines that Can’t Hear Them
Creator: Bill SLAWSKI
Subject: Audio file / audio format / video format / indexing audio / National public radio / recognition / speech recognition.
Description: "What do you do if most of the content your company creates is in audio or video format, and you want to include it on the web, and make it so that people can find it? And the content is news, which relies upon timely delivery?"
Contributor: -
Date: 2004/05/27
Type: Article
Identifier: http://blog.cre8asite.net/archives/125
Source: Creative Flow
Language: En
Coverage: World
Rights: -

Audio and Sound Information

This site was created in 2006 by a former student of GIDO (Gestion de l'Information et du Document dans les Organisations) at the IUT Michel de Montaigne (Bordeaux, France).
It provides information on the processing of audio documents through links to websites. It is composed of two parts: the first on sound information and the second on the processing of audio documents.
It contains nine links. For example, you can visit the Association for Recorded Sound Collections (ARSC), the Association des détenteurs de documents audiovisuels et sonores (AFAS), or the International Association of Sound and Audiovisual Archives (IASA).

Click on the link to visit the site Audio and Sound Information: http://www.iut.u-bordeaux3.fr/doc/sitos2006/Info%20son/Index.htm

Speech and language technologies for audio indexing and retrieval

Bibliographic reference:

MAKHOUL, John et al. Speech and language technologies for audio indexing and retrieval. Proceedings of the IEEE, Vol. 88, No. 8, August 2000. URL: http://www.bbn.com/docs/whitepapers/Audio-Indexing-Retrieval.pdf

Text:

URL: http://www.bbn.com/docs/whitepapers/Audio-Indexing-Retrieval.pdf

Dublin Core Metadata:

Title: Speech and language technologies for audio indexing and retrieval
Creator: John MAKHOUL
Subject: Audio indexing, information extraction, information retrieval, speech recognition, segmentation, classification.
Description: This paper explains how to extract information from audio. It presents figures of the Rough'n'Ready system, with which it is possible to index and retrieve sounds. It also discusses speech recognition and segmentation.
Contributor: -
Date: August 2000
Type: Pdf
Identifier: http://www.bbn.com/docs/whitepapers/Audio-Indexing-Retrieval.pdf
Source: Proceedings of the IEEE
Language: En
Coverage: World
Rights: IEEE

An Online Audio Indexing System

Bibliographic reference:

AJMERA, Jitendra. McCOWAN, Iain. BOURLARD, Herve / An Online Audio Indexing System. IDIAP, Switzerland. 2002. URL: ftp://ftp.idiap.ch/pub/reports/2003/rr03-39b.pdf

Text:

Summary: "This paper presents overview of an online audio indexing system which creates a searchable index of speech content embedded in digitized audio files. This system is based on our recently proposed offline audio segmentation techniques. As the data arrives continuously, the system first finds boundaries of the acoustically homogenous segments. Next, each of these segments is classified as speech, music or mixture classes, where mixtures are defined as regions where speech and other non-speech sounds are present simultaneously and noticeably. The speech segments are then clustered together to provide consistent speaker labels. The speech and mixture segments are converted to text via an ASR system. The resulting words are time-stamped together with other metadata information (speaker identity, speech confidence score) in an XML file to rapidly identify and access target segments. In this paper, we analyze the performance at each stage of this audio indexing system and also compare it with the performance of the corresponding offline modules."
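The last stage described in this summary, storing time-stamped words with speaker identity and confidence score in an XML file so that target segments can be found quickly, can be sketched as follows. The paper's actual XML schema is not given here, so the element and attribute names (audio_index, segment, word, start, confidence), the helper functions and the sample data are all hypothetical. Only Python's standard xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET


def build_index(segments):
    """Build an XML tree of segments and their time-stamped words.

    The schema is an illustrative assumption, not the one used by the paper.
    """
    root = ET.Element("audio_index")
    for seg in segments:
        seg_el = ET.SubElement(root, "segment", {
            "type": seg["type"],          # "speech", "music" or "mixture"
            "speaker": seg["speaker"],    # label from speaker clustering
            "start": f'{seg["start"]:.2f}',
            "end": f'{seg["end"]:.2f}',
        })
        for text, start, confidence in seg["words"]:
            word_el = ET.SubElement(seg_el, "word", {
                "start": f"{start:.2f}",
                "confidence": f"{confidence:.2f}",
            })
            word_el.text = text
    return root


def find_word(root, query):
    """Return (speaker, start time) for each occurrence of a word."""
    hits = []
    for seg_el in root.iter("segment"):
        for word_el in seg_el.iter("word"):
            if word_el.text.lower() == query.lower():
                hits.append((seg_el.get("speaker"), float(word_el.get("start"))))
    return hits


# Hypothetical output of the segmentation, clustering and ASR stages.
segments = [
    {"type": "speech", "speaker": "spk1", "start": 0.0, "end": 2.5,
     "words": [("audio", 0.4, 0.91), ("indexing", 1.2, 0.87)]},
    {"type": "music", "speaker": "", "start": 2.5, "end": 6.0, "words": []},
    {"type": "mixture", "speaker": "spk2", "start": 6.0, "end": 8.0,
     "words": [("retrieval", 6.3, 0.55)]},
]

index = build_index(segments)
print(ET.tostring(index, encoding="unicode"))
print(find_word(index, "indexing"))
```

Music segments carry no words in this sketch, which mirrors the summary's statement that only speech and mixture segments are sent to the ASR system.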

URL: ftp://ftp.idiap.ch/pub/reports/2003/rr03-39b.pdf

Dublin Core Metadata:

Title: An Online Audio Indexing System
Creator: AJMERA Jitendra, McCOWAN Iain, BOURLARD Herve
Subject: audio files, audio indexing, automatic speech recognition, speaker clustering.
Description: "This paper presents an overview of an online audio indexing system which creates a searchable index of speech content embedded in digitized audio files."
Contributor: -
Date: 2004
Type: Paper
Format: Pdf
Identifier: ftp://ftp.idiap.ch/pub/reports/2003/rr03-39b.pdf
Source: IDIAP Research Institute
Language: En
Coverage: World
Rights: IDIAP

Glossary English / French

Automatic sound indexing
"These methods allow the system to automatically organize new sounds introduced by the user, by analyzing their content in relation to predefined categories."
Name: IRCAM Centre Pompidou.
URL: http://www.ircam.fr/307.html?&L=1&tx_ircam_pi4[showUid]=15&cHash=72de9812b5

Indexation automatique des sons
"Méthode permettant au système de ranger automatiquement les nouveaux sons introduits par l’utilisateur à partir d’une analyse de leur contenu selon des catégories qu’il aura prédéfinies."
Nom : IRCAM Centre Pompidou.
URL : http://www.ircam.fr/307.html?&L=0&tx_ircam_pi4[showUid]=15&cHash=72de9812b5


Cataloging
"The compilation and maintenance of primary information by systematically describing objects in the collection, and the arranging of this information into an object catalog record."
Name: International Guidelines for Museum Object Information: The CIDOC Information Categories.
URL: http://www.willpowerinfo.myby.co.uk/cidoc/guide/guideglo.htm

Catalogage
"Consiste à analyser le document en tant que support."
Nom : Methodoc.
URL : http://www.scd.univ-lille3.fr/methodoc/cours/typedocument/catalogage.htm


Indexing
"The process of converting a collection of data into a database suitable for easy search and retrieval."
Name: Virtech E-solution-SEO: E-Solutions and Web Development for Today's Internet
URL: http://www.virtechseo.com/seoglossary.htm

Indexation
"Consiste à analyser le document pour les informations qu'il contient."
Nom : Methodoc.
URL : http://www.scd.univ-lille3.fr/methodoc/cours/typedocument/indexation.htm


Record
"A group of fields relating to a particular object or transaction."
Name: International Guidelines for Museum Object Information: The CIDOC Information Categories.
URL: http://www.willpowerinfo.myby.co.uk/cidoc/guide/guideglo.htm

Enregistrement
"Fait de recueillir et de conserver (une donnée) au moyen d'appareils appropriés."
Nom : TLFI : Trésor de la langue française informatisé.
URL: http://atilf.atilf.fr/dendien/scripts/tlfiv5/saveregass.exe?43;s=2841038385;r=2;;


Segmentation
"The process by which speech signals are divided into phonemes, syllables or words."
Name: Keith Yates Design Group.
URL: http://www.keithyates.com/index.html

Segmentation
"Consiste à détecter les variations brusques du signal, à détecter les transitions entre deux zones stables successives."
Nom : Thèse : Segmentation et indexation des sons.
URL: http://stephanerossignol.ifrance.com/


Sound recording
"The fixation of a series of musical, spoken or other sounds."
Name: The sound of American music company.
URL: http://www.americanmusicco.com/license/helpfulHints.asp

Enregistrement sonore
"Opération qui consiste à garder la trace d'un son de façon durable sur un support analogique comme la bande magnétique ou le disque vinyle, ou sur un support numérique comme le disque compact, en vue de pouvoir le diffuser au plus proche de l'identique et éventuellement le modifier (le traiter)."
Nom : Techno-science.net
URL: http://www.techno-science.net/?onglet=glossaire&definition=1256