
Right alignment is used by default for character-based alignment because numbering systems are almost all left-to-right even in right-to-left writing systems, and the primary use case of character-based alignment is for numerical alignment.

8.2. Last Line Alignment: the 'text-align-last' property

Name: text-align-last
Value: auto | start | end | left | right | center | justify
Initial: auto
Applies to: block containers
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value
This property describes how the last line of a block or a line right before a forced line break is aligned. If a line is also the first line of the block or the first line after a forced line break, then 'text-align-last' takes precedence over 'text-align' except when two keywords are given for 'text-align' (i.e. when 'text-align' assigns an explicit first line alignment).
'auto' is equivalent to the value of the 'text-align' property except when 'text-align' is set to 'justify', in which case it is 'justify' when 'text-justify' is 'distribute' and 'start' otherwise. All other values have the same meanings as in 'text-align'.
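For instance (an illustrative sketch; 'text-align-last' is a CSS3 Text draft property and support may vary), a justified paragraph whose last line is centered:

```css
p {
  text-align: justify;
  text-align-last: center; /* overrides the default 'start' for the last line */
}
```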

8.3. Justification Method: the 'text-justify' property

Name: text-justify
Value: auto | none | inter-word | inter-ideograph | inter-cluster | distribute | kashida
Initial: auto
Applies to: block containers and, optionally, inline elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value
This property selects the justification method used when a line's alignment is set to 'justify'. (See 'text-align'.) The property applies to block containers, but the UA may (but is not required to) also support it on inline elements. It takes the following values:
'auto': The UA determines the justification algorithm to follow, based on a balance between performance and adequate presentation quality.
One possible algorithm is to determine the behavior based on the language of the paragraph: the UA can then choose an appropriate value for the language, like 'inter-ideograph' for CJK or 'inter-word' for English. Another possibility is to use a justification method that is a universal compromise for all scripts, e.g. the 'inter-cluster' method with block scripts raised to first priority.
'none': Justification is disabled. This value is intended for use in user stylesheets to improve readability or for accessibility purposes.
'inter-word': Justification primarily changes spacing at word separators. This value is typically used for languages that separate words using spaces, like English or (sometimes) Korean.
'inter-ideograph': Justification primarily changes spacing at word separators and at inter-graphemic boundaries in scripts that use no word spaces. This value is typically used for CJK languages.
'inter-cluster': Justification primarily changes spacing at word separators and at grapheme cluster boundaries in clustered scripts. This value is typically used for Southeast Asian scripts such as Thai.
'distribute': Justification primarily changes spacing both at word separators and at grapheme cluster boundaries in all scripts except those in the connected and cursive categories. This value is sometimes used e.g. in Japanese.
'kashida': Justification primarily stretches cursive scripts through the use of kashida or other calligraphic elongation. This value is optional for conformance to CSS3 Text. (UAs that do not support cursive elongation must treat the value as invalid.)
When justifying text, the user agent takes the remaining space between the ends of a line's contents and the edges of its line box, and distributes that space throughout its contents so that the contents exactly fill the line box. If the 'letter-spacing' and 'word-spacing' property values allow it, the user agent may also distribute negative space, putting more content on the line than would otherwise fit under normal spacing conditions. The exact justification algorithm is UA-dependent; however, CSS provides some general guidelines which should be followed when any justification method other than 'auto' is specified.
CSS defines expansion opportunities as points where the justification algorithm may alter spacing within the text. These expansion opportunities fall into priority levels as defined by the justification method. Within a line, higher priority expansion opportunities should be expanded or compressed to their limits before lower priority expansion opportunities are adjusted. (Expansion and compression limits are given by the letter-spacing and word-spacing properties.)
How any remaining space is distributed once all expansion opportunities reach their limits is up to the UA. If the inline contents of a line cannot be stretched to the full width of the line box, then they must be aligned as specified by the 'text-align-last' property. (If 'text-align-last' is 'justify', then they must be aligned as for 'center' if 'text-justify' is 'distribute' and as 'start' otherwise.)
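The fallback described above can be seen in a rule set such as the following (illustrative; the non-'auto' 'text-justify' values are draft features with limited support):

```css
p {
  text-align: justify;
  text-justify: inter-word;  /* expand only at word separators */
  text-align-last: start;    /* used when a line cannot be stretched */
}
```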
The expansion opportunity priorities for values of 'text-justify' are given in the table below. Space must be distributed evenly among all types of expansion opportunities in a given prioritization group, but may vary within a line due to changes in the font or letter-spacing and word-spacing values. Since justification behavior varies by writing system, expansion opportunities are organized by script categories. The different types of expansion opportunities are defined as follows:
word separators: An expansion opportunity exists at spaces and other word separators. Expand as for 'word-spacing'.
script letters (block, clustered, and discrete scripts): An expansion opportunity exists between two grapheme clusters when at least one of them belongs to the affected script category and the spacing at that point has not already been altered at a higher priority.
cursive scripts: Words may be expanded through kashida elongation or other cursive expansion processes. Kashida may be applied in discrete units or continuously, and the prioritization of kashida points is UA-dependent: for example, the UA may apply more at the end of the line. The UA should not apply kashida to fonts for which it is inappropriate. It may instead rely on other justification methods that lengthen or shorten Arabic segments (e.g. by substituting in swash forms or optional ligatures). Because elongation rules depend on the typeface style, the UA should rely on the font whenever possible rather than inserting kashida based on a font-independent ruleset. The UA should limit elongation so that, e.g., in multi-script lines a short stretch of Arabic will not be forced to soak up too much of the extra space by itself. If the UA does not support cursive elongation, then no expansion points exist between grapheme clusters of these scripts.
symbols and punctuation: An expansion opportunity exists between a pair of characters from the Unicode symbols (S*) and punctuation (P*) classes and at enabled autospace points. The default justification priority of these points depends on the justification method as defined below; however, there may be additional rules controlling their justification behavior due to typographic tradition. For example, there are traditionally no expansion opportunities between consecutive EM DASH (U+2014), HORIZONTAL BAR (U+2015), HORIZONTAL ELLIPSIS (U+2026), or TWO DOT LEADER (U+2025) characters [JLREQ]. The UA may introduce additional levels of priority to handle expansion opportunities involving punctuation.
connected scripts: No expansion opportunities occur between pairs of connected-script grapheme clusters. (Issue: Is this correct?)
Prioritization of Expansion Points
method:   inter-word | inter-ideograph | distribute | inter-cluster | kashida     | auto
priority: 1st 2nd    | 1st 2nd         | 1st 2nd    | 1st 2nd       | 1st 2nd 3rd | 1st 2nd
rows: spaces, block, clustered, cursive, discrete, punctuation
[The per-cell priority markings of this table did not survive extraction.]
The 'auto' column defined above is informative.
The UA may enable or break optional ligatures or use other font features such as alternate glyphs to help justify the text under any method. This behavior is not defined by CSS.
3.8 Line Adjustment in [JLREQ] gives an example of a set of rules for how a text formatter can justify Japanese text. It describes rules for cases where the 'text-justify' property is 'inter-ideograph' and the 'text-trim' property does not specify 'no-justify'.
It produces an effect similar to cases where the computed value of the 'text-trim' property does not specify 'force-end' or 'keep-end'. If the UA wants to prohibit this behavior, rule . of 3.8.3 should be omitted.
Note that the rules described in the document specifically target Japanese. Therefore they may produce non-optimal results when used to justify other languages such as English. To make the rules more applicable to other scripts, the UA could, for instance, omit the rule to compress half-width spaces (rule a. of 3.8.3).

9. Spacing

The next two properties refer to the <spacing-limit> value type, which is defined as follows:
[ normal | <length> | <percentage> ]
normal
Specifies the normal optimum/minimum/maximum spacing, as defined by the current font and/or the user agent. Normal spacing should be percentage-based. Normal minimum and maximum spacing must be based on the optimum spacing so that the minimum and maximum limits increase and decrease with changes to the optimum spacing. Normal minimum and maximum spacing may also vary according to some measure of the amount of text on a line (e.g. block width divided by font size): larger measures can accommodate tighter spacing constraints. Normal optimum/minimum/maximum spacing may also vary based on the value of the 'text-justify' property, the element's language, and other factors. Normal minimum, maximum, and optimal spacing between a pair of connected or cursive grapheme clusters is always zero.
<length> or <percentage>
Specifies extra spacing in addition to the normal spacing. Percentages are with respect to the width of the affected character. Values may be negative, but there may be implementation-dependent limits.

9.1. Word Spacing: the 'word-spacing' property

Name: word-spacing
Value: <spacing-limit> {1,3}
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: refers to width of space (U+0020) glyph
Media: visual
Computed value: the keyword 'normal', an absolute length, or a percentage
This property specifies the minimum, maximum, and optimal spacing between words. If only one value is specified, then it represents the optimal spacing and the minimum and maximum are both 'normal'. If two values are specified, then the first represents both the optimal spacing and the minimum spacing, and the second represents the maximum spacing. If three values are specified, they represent the optimum, minimum, and maximum respectively.
If the value of the optimum or maximum spacing is less than the value of the minimum spacing, then its used value is the minimum spacing. If the optimum spacing is greater than the maximum spacing then its used value is the maximum spacing. (This substitution occurs after inheritance.)
In the absence of justification the optimal spacing must be used. The text justification process may alter the spacing from its optimum (see the 'text-justify' property, above) but must not violate the minimum spacing limit and should also avoid exceeding the maximum.
Spacing is applied to each word-separator character left in the text after the white space processing rules have been applied, and should be applied half on each side of the character. (This is correct for Ethiopic and doesn't matter for invisible spaces, but is it correct for Tibetan? Most publications seem to add space after the tsek mark during justification.) Word-separator characters include the space (U+0020), the no-break space (U+00A0), the Ethiopic word space (U+1361), the Aegean word separators (U+10100, U+10101), the Ugaritic word divider (U+1039F), and the Tibetan tsek (U+0F0B, U+0F0C). Is this list correct? If there are no word-separator characters, or if a word-separator character has a zero advance width (such as the zero width space U+200B), then the user agent must not create additional spacing between words. General punctuation and fixed-width spaces (such as U+3000 and U+2000 through U+200A) are not considered word separators.
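The one-, two-, and three-value forms described above can be sketched as follows (illustrative; the multi-value <spacing-limit> syntax is a draft feature, and the class names are hypothetical):

```css
/* One value: optimum 0.5em; minimum and maximum remain 'normal'. */
p.loose { word-spacing: 0.5em; }

/* Two values: optimum and minimum 0.1em; maximum 0.5em. */
p.bounded { word-spacing: 0.1em 0.5em; }

/* Three values: optimum 0.25em, minimum 0, maximum 1em; justification
   may then vary word spacing only within [0, 1em]. */
p.justified {
  text-align: justify;
  word-spacing: 0.25em 0 1em;
}
```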

9.2. Tracking: the 'letter-spacing' property

Name: letter-spacing
Value: <spacing-limit>{1,3}
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: refers to width of space (U+0020) glyph
Media: visual
Computed value: the keyword 'normal', an absolute length, or a percentage
This property specifies the minimum, maximum, and optimal spacing between grapheme clusters. If only one value is specified, then it represents all three values. If two values are specified, then the first represents both the optimal spacing and the minimum spacing, and the second represents the maximum spacing. If three values are specified, they represent the optimum, minimum, and maximum respectively.
If the value of the optimum or maximum spacing is less than the value of the minimum spacing, then its used value is the minimum spacing. If the optimum spacing is greater than the maximum spacing then its used value is the maximum spacing. (This substitution occurs after inheritance.)
In the absence of justification the optimal spacing must be used. The text justification process may alter the spacing from its optimum (see the 'text-justify' property, above) but must not violate the minimum spacing limit and should also avoid exceeding the maximum. Letter-spacing is applied in addition to any word-spacing. 'normal' optimum letter-spacing is typically zero.
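As with 'word-spacing', the limits constrain justification (illustrative; multi-value 'letter-spacing' is a draft feature, and the class name is hypothetical):

```css
/* Fixed tracking for headings; normal optimum letter-spacing is zero. */
h1 { letter-spacing: 0.1em; }

/* Justified text: optimum and minimum 0, allowing up to 0.2em of
   inter-cluster expansion during justification. */
p.justified {
  text-align: justify;
  letter-spacing: 0 0 0.2em;
}
```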
Letter-spacing must not be applied at the beginning or at the end of a line. At element boundaries, the letter spacing is given by and rendered within the innermost element that contains the boundary.
For example, given markup in which elements Z and Y are nested inside an element LS (the sample markup and resulting rendering did not survive extraction) and the style sheet
LS { letter-spacing: 1em; }
Z { letter-spacing: 0.3em; }
Y { letter-spacing: 0.4em; }
the spacing at each element boundary would be that given by the innermost element containing the boundary.
UAs may apply letter-spacing to cursive scripts. In this case, UAs should extend the space between disjoint graphemes as specified above and extend the visible connection between cursively connected graphemes by the same amount (rather than leaving a gap). The UA may use glyph substitution or other font capabilities to spread out the letters. If the UA cannot expand a cursive script without breaking the cursive connections, it should not apply letter-spacing between grapheme clusters of that script at all.
When the resulting space between two characters is not the same as the default space, user agents should not use optional ligatures.

9.3. Fullwidth Kerning: the 'text-trim' property

Syntax is derived as a shorthand of the following theoretical properties (defaults given first):
text-trim-start: keep-start | force-start
text-trim-end: allow-end | force-end | keep-end | keep-fullstop
text-trim-adjacent: force-adjacent | keep-adjacent
text-trim-kana: ...
text-trim-justify: justify | no-justify
Filtering out the default values (and ignoring trim-kana), we get the following shorthand:
text-trim: normal | force-start || [ force-end | keep-end | keep-fullstop ] || keep-adjacent || no-justify
Name: text-trim
Value: normal | force-start || [ keep-end | force-end ] || no-justify
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value
This property determines whether or not a fullwidth punctuation character should be trimmed (kerned). Values are defined as follows:
'normal': Trim (kern) the blank half of fullwidth closing punctuation at the end of each line if it does not otherwise fit prior to justification.
'force-start': Trim (kern) the blank half of fullwidth opening punctuation at the beginning of each line.
'keep-end': Do not trim the blank half of fullwidth closing punctuation at the end of each line.
'force-end': Trim (kern) the blank half of fullwidth closing punctuation at the end of each line.
'no-justify': The blank portions of fullwidth punctuation must not be trimmed during the justification process. (If this value is not specified, the justification process may trim the blank portions of fullwidth punctuation glyphs.) An example of compression rules is given for Japanese in 3.8 Line Adjustment in [JLREQ].
In all cases, the UA must trim (kern) the blank half of fullwidth opening punctuation if its previous adjacent character is a fullwidth opening punctuation, fullwidth middle dot punctuation, fullwidth closing punctuation, or ideographic space (U+3000). It must also trim (kern) the blank half of fullwidth closing punctuation if its next adjacent character is a fullwidth closing punctuation, fullwidth middle dot punctuation, or ideographic space (U+3000).
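A minimal usage sketch ('text-trim' is a draft property without broad implementation):

```css
/* Japanese body text: kern fullwidth opening brackets at line start,
   keep fullwidth closing punctuation at line end, and forbid trimming
   during justification. */
p:lang(ja) {
  text-trim: force-start keep-end no-justify;
}
```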
The following example table lists the punctuation pairs affected by adjacent-pairs trimming. It uses halfwidth equivalents to approximate the trimming effect.
Demonstration of adjacent-pairs punctuation trimming
Combination                Sample Pair   Looks Like
Opening-Opening            （（          ((
Middle Dot-Opening         ・（          ・(
Closing-Opening            ）（          )(
Ideographic Space-Opening  　（           (
Closing-Closing            ））          ))
Closing-Middle Dot         ）・          )・
Closing-Ideographic Space  ）　          )
In the context of this property the following definitions apply:
Issue: The classes and Unicode code points below should be reviewed.
fullwidth opening punctuation
Includes any opening punctuation character (Unicode category Ps) that belongs to the CJK Symbols and Punctuation block (U+3000-U+303F) or is categorized as East Asian Fullwidth (F) by [UAX11]. Also includes LEFT SINGLE QUOTATION MARK (U+2018) and LEFT DOUBLE QUOTATION MARK (U+201C). When trimmed, the left (for horizontal text) or top (for vertical text) half is kerned.
fullwidth closing punctuation
Includes any closing punctuation character (Unicode category Pe) that belongs to the CJK Symbols and Punctuation block (U+3000-U+303F) or is categorized as East Asian Fullwidth (F) by [UAX11]. Also includes RIGHT SINGLE QUOTATION MARK (U+2019) and RIGHT DOUBLE QUOTATION MARK (U+201D). May also include fullwidth colon punctuation and/or fullwidth dot punctuation (see below). When trimmed, the right (for horizontal text) or bottom (for vertical text) half is kerned.
fullwidth middle dot punctuation
Includes MIDDLE DOT (U+00B7), HYPHENATION POINT (U+2027), and KATAKANA MIDDLE DOT (U+30FB). May also include fullwidth colon punctuation and/or fullwidth dot punctuation (see below).
fullwidth colon punctuation
Includes FULLWIDTH COLON (U+FF1A) and FULLWIDTH SEMICOLON (U+FF1B).
fullwidth dot punctuation
Includes IDEOGRAPHIC COMMA (U+3001), IDEOGRAPHIC FULL STOP (U+3002), FULLWIDTH COMMA (U+FF0C), and FULLWIDTH FULL STOP (U+FF0E).
Fullwidth opening and closing punctuation must not be trimmed if the glyph is not actually fullwidth. This includes the case where the glyph is not fullwidth as a result of the kerning in the font. A fullwidth glyph is one that has the same advance width as a typical Han character in the same font.
Whether fullwidth colon punctuation and fullwidth dot punctuation should be considered fullwidth closing punctuation or fullwidth middle dot punctuation depends on where in the glyph's box the punctuation is drawn. If the punctuation is centered, then it should be considered middle dot punctuation. If the punctuation is drawn to one side (left in horizontal text, top in vertical text) and the other half is therefore blank then the punctuation should be considered closing punctuation and trimmed accordingly.
The UA must classify fullwidth colon punctuation and fullwidth dot punctuation under either the fullwidth closing punctuation category or the fullwidth middle dot punctuation category as appropriate. The UA may rely on language conventions and the layout orientation (horizontal vs. vertical), and/or font information to determine this categorization. The UA may also add additional characters to any category as appropriate.
The following informative table summarizes language conventions for classifying fullwidth colon and dot punctuation:
colon punctuation dot punctuation
Simplified Chinese (horizontal) closing closing
Simplified Chinese (vertical) closing closing
Traditional Chinese middle dot middle dot
Korean middle dot closing
Japanese middle dot closing
Note that for Chinese fonts at least, the author observes that the standard convention is often not followed.
Support for this property is optional. It is strongly recommended for UAs that wish to support CJK typography.

9.4. Adding space: the 'text-autospace' property

Name: text-autospace
Value: none | [ ideograph-numeric || ideograph-alpha || punctuation ]
Initial: none
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value
When a run of non-ideographic or numeric characters appears inside of ideographic text, a certain amount of space is often preferred on both sides of the non-ideographic text to separate it from the surrounding ideographic glyphs. This property controls the creation of that space when rendering the text. That added width does not correspond to the insertion of additional space characters, but instead to the width increment of existing glyphs.
(A commonly used algorithm for determining this behavior is specified in [JLREQ].)
This property is additive with the 'word-spacing' and 'letter-spacing' [CSS21] properties. That is, the amount of spacing contributed by the 'letter-spacing' setting (if any) is added to the spacing created by 'text-autospace'. The same applies to 'word-spacing'.
The space added can be compressed or expanded during the justification process as specified in the 'text-justify' property.
This property applies between adjacent characters; the extra spacing can be created across element boundaries as long as both characters are in the same inline element context.
Values have the following meanings:
'none': No extra space is created.
'ideograph-numeric': Creates 1/4em extra spacing between runs of ideographic letters and non-ideographic numeric glyphs.
'ideograph-alpha': Creates 1/4em extra spacing between runs of ideographic letters and non-ideographic letters, such as Latin-based, Cyrillic, Greek, Arabic, or Hebrew.
'punctuation': Creates extra non-breaking spacing around punctuation as required by language-specific typographic conventions. For example, if the element's content language is French, narrow no-break space (U+202F) and no-break space (U+00A0) should be inserted where required by French typographic guidelines.
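For example ('text-autospace' is a draft property; support may vary):

```css
/* Japanese text mixing Latin words and numerals: create 1/4em gaps
   between ideographic runs and embedded alphabetic/numeric runs. */
article:lang(ja) {
  text-autospace: ideograph-alpha ideograph-numeric;
}
```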
It was requested to add a value for doubling the space after periods.
At element boundaries, the font size of the extra spacing is given by and rendered within the innermost element that contains the boundary.
In the context of this property the following definitions apply:
Review and feedback on the following classes and code points are appreciated.
Ideographic letters
  • All characters in the range U+3041 to U+30FF, except those that belong to the Unicode Punctuation [P*] category.
  • CJK Strokes (U+31C0 to U+31EF).
  • Katakana Phonetic Extensions (U+31F0 to U+31FF).
  • All characters that have the Han Unicode Script property [UAX24].
Non-ideographic numerals
  • All characters that belong to the Unicode Number, Decimal Digit [Nd] category.
  • Except when any of the following conditions is met:
    • It is categorized as East Asian Fullwidth (F) by [UAX11].
    • It is upright in vertical text flow using the 'text-orientation' property or the 'text-combine' property.
Non-ideographic letters
  • All characters that belong to the Unicode Letter [L*] and Mark [M*] categories.
  • Except when any of the following conditions is met:
    • It is defined as an ideographic letter above.
    • It is categorized as East Asian Fullwidth (F) by [UAX11].
    • It is upright in vertical text flow using the 'text-orientation' property or the 'text-combine' property.
Support for this property is optional. It is recommended for UAs that wish to support CJK typography.

10. Edge Effects

Edge effects control the indentation of lines with respect to other lines in the block ('text-indent') and how content is aligned to the start and end edges of a line ('hanging-punctuation', see also 'text-trim').
Japanese has three common start-edge typesetting schemes, which are distinguished by their handling of opening brackets.
The first scheme aligns opening brackets flush with the indent edge on the first line and with the start edge of other lines. The second scheme gives the opening bracket its full width, so that it is effectively indented half an em from the indent edge and from the start edge of other lines. The third scheme aligns the opening brackets flush with the start edge of lines, but hangs them inside the indent on the first line (resulting in an effective half-em indent instead of the full em for paragraphs that begin with an opening bracket).
Positioning of opening brackets at line head [JLREQ]
Assuming a UA style sheet of p { margin: 1em 0; }, CSS can achieve the Japanese typesetting styles with the following rules:
  • Brackets flush with indent, flush with other lines (first scheme):
    { /* Flush alignment */
      margin: 0;
      text-indent: 1em;
      text-trim: force-start;
    }
  • Brackets preserve fullwidth spacing on all lines (second scheme):
    { /* Fullwidth alignment */
      margin: 0;
      text-indent: 1em;
      text-trim: normal;
    }
  • Brackets hang in indent, flush with other lines (third scheme):
    { /* Hanging alignment */
      margin: 0;
      text-indent: 1em;
      text-trim: force-start;
      hanging-punctuation: first;
    }

10.1. First Line Indentation: the 'text-indent' property

Name: text-indent
Value: [ <length> | <percentage> ] && [ hanging || each-line ]?
Initial: 0
Applies to: block containers
Inherited: yes
Percentages: refers to width of containing block
Media: visual
Computed value: the percentage as specified or the absolute length
This property specifies the indentation applied to lines of inline content in a block. The indent is treated as a margin applied to the start edge of the line box. Unless otherwise specified via the 'each-line' and/or 'hanging' keywords, only lines that are the first formatted line of an element are affected. For example, the first line of an anonymous block box is only affected if it is the first child of its parent element.
Values have the following meanings:
<length>: Gives the amount of the indent as an absolute length.
<percentage>: Gives the amount of the indent as a percentage of the containing block's logical width.
'each-line': Indentation affects the first line of the block container as well as each line after a forced line break, but does not affect lines after a text wrap break.
'hanging': Inverts which lines are affected.
If 'text-align' is 'start' and 'text-indent' is '5em' in left-to-right text with no floats present, then first line of text will start 5em into the block:
Since CSS1 it has been possible to indent the first line of a block element using the 'text-indent' property. 
Note that since the 'text-indent' property inherits, when specified on a block element, it will affect descendant inline-block elements. For this reason, it is often wise to specify 'text-indent: 0' on elements that are specified 'display: inline-block'.
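For example (illustrative; the class names are hypothetical and the 'hanging' keyword is a draft feature):

```css
/* Conventional first-line indent for paragraphs... */
p { text-indent: 2em; }

/* ...suppressed in inline-block descendants, which would otherwise
   inherit it. */
p .callout {
  display: inline-block;
  text-indent: 0;
}

/* Hanging indent: all lines except the first are indented. */
.bib-entry { text-indent: 2em hanging; }
```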

10.2. Hanging Punctuation: the 'hanging-punctuation' property

Name: hanging-punctuation
Value: none | [ first || [ force-end | allow-end ] ]
Initial: none
Applies to: inline elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: as specified
This property determines whether a punctuation mark, if one is present, may be placed outside the line box (or in the indent) at the start or at the end of a line of text.
Note that if there is not sufficient padding on the block container, hanging punctuation can trigger overflow.
Values have the following meanings:
'first': An opening bracket or quote at the start of the first formatted line of an element hangs. This applies to all characters in the Unicode categories Ps, Pf, Pi.
'force-end': A stop or comma at the end of a line hangs.
'allow-end': A stop or comma at the end of a line hangs if it does not otherwise fit prior to justification.
When a punctuation mark hangs, it is not considered when measuring the line's contents for fit, alignment, or justification. Depending on the line's alignment, this can push the mark outside the line box. At most one punctuation character may hang outside each edge of the line.
A hanging punctuation mark is still enclosed inside its inline box and participates in text justification: its character advance width is just not measured when determining how much content fits on the line, how much the line's contents need to be expanded or compressed for justification, or how to position the content within the line box for text alignment.
Non-zero start and end borders or padding on an inline box are not allowed to hang. This can sometimes prevent a hangable punctuation mark from hanging: for example, a period at the end of an inline box with end padding cannot hang outside the end edge of a line.
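For example ('hanging-punctuation' is optional to support; the rule is illustrative):

```css
/* Japanese paragraphs: hang opening brackets on the first formatted
   line, and let a stop or comma hang at line end when it does not fit. */
p:lang(ja) {
  hanging-punctuation: first allow-end;
}
```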
Issue: Should other characters be included? The CSS Working Group would appreciate it if UAs that allow additional characters to hang would inform the working group of such additions.
Support for this property is optional. It is recommended for UAs that wish to support CJK typography, particularly those in the Japanese market.

11. Text Decoration

11.1. Line Decoration: Underline, Overline, and Strike-Through

The following properties describe line decorations that are added to the content of an element. When specified on or propagated to an inline box, such decoration affects all the boxes generated by that element, and is further propagated to any in-flow block-level boxes that split the inline (see [CSS21]). When specified on or propagated to a block container that establishes an inline formatting context, the decorations are propagated to an anonymous inline element that wraps all the in-flow inline-level children of the block container. For all other elements, the decorations are propagated to any in-flow children.
Note that text decorations are not propagated to any out-of-flow descendants, nor to the contents of atomic inline-level descendants such as inline blocks and inline tables.
By default underlines, overlines, and line-throughs are applied only to text (including white space, letter spacing, and word spacing): margins, borders, and padding are skipped. Elements containing no text, such as images, are likewise not decorated. The text-decoration-skip property can be used to modify this behavior, for example allowing inline replaced elements to be underlined or requiring that white space be skipped.
The UA should place the start and end of the line inwards from the content edge of the decorating element so that, e.g., two underlined elements side-by-side do not appear to have a single underline. (This is important in Chinese, where underlining is a form of punctuation.)
In determining the position and thickness of text decoration lines, user agents may consider the font sizes and dominant baselines of descendants, but for a given element's decoration must use the same baseline and thickness throughout each line. The color and line style of decorations must remain the same on all decorations applied by a given element, even if descendant elements have different color or line style values.
The following figure shows the averaging for underline:
In the first rendering of the underlined text '1st a' with 'st' as a superscript, both the '1st' and the 'a' are rendered in a small font. In the second rendering, the 'a' is rendered in a larger font. In the third, both '1st' and 'a' are large.
In the three fragments of underlined text, the underline is drawn consecutively lower and thicker as the ratio of large text to small text increases.
Relatively positioning a descendant moves all text decorations affecting it along with the descendant's text; it does not affect calculation of the decoration's initial position on that line. The 'visibility' property, filters, and other graphical transformations likewise affect text decorations as part of the text they're drawn on, even if the decorations were specified on an ancestor element.
In the following style sheet and document fragment:
  blockquote { text-decoration: underline; color: blue; }
  em { display: block; }
  cite { color: fuchsia; }

  <blockquote>
    <p>
      <span>
        Help, help!
        <em> I am under a hat! </em>
        <cite> -GwieF </cite>
      </span>
    </p>
  </blockquote>
...the underlining for the blockquote element is propagated to an anonymous inline element that surrounds the span element, causing the text "Help, help!" to be blue, with the blue underlining from the anonymous inline underneath it, the color being taken from the blockquote element. The text in the em block is also underlined, as it is in an in-flow block to which the underline is propagated. The final line of text is fuchsia, but the underline underneath it is still the blue underline from the anonymous inline element.
Sample rendering of the above underline example
This diagram shows the boxes involved in the example above. The rounded aqua line represents the anonymous inline element wrapping the inline contents of the paragraph element, the rounded blue line represents the span element, and the orange lines represent the blocks.

11.1.1. Text Decoration Lines: the 'text-decoration-line' property

Name: text-decoration-line
Value: none | underline || overline || line-through || [ cancel-underline || cancel-overline || cancel-line-through | cancel-all ]
Initial: none
Applies to: all elements
Inherited: no (but see prose)
Percentages: N/A
Media: visual
Computed value: as specified
Specifies what line decorations, if any, are added to the element. Values have the following meanings:
'none'
Neither produces nor inhibits text decoration.
'underline'
Each line of text is underlined.
'cancel-underline'
Inhibits propagated underlines.
'overline'
Each line of text has a line above it (i.e. on the opposite side from an underline).
'cancel-overline'
Inhibits propagated overlines.
'line-through'
Each line of text has a line through the middle.
'cancel-line-through'
Inhibits propagated line-throughs.
'cancel-all'
Inhibits all propagated text decorations.
The use of "cancel-" instead of "no-" is incompatible with XSL. It was motivated by the desire to make text-decoration: underline no-underline less nonsensical.
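The following illustrative rules (the selectors are hypothetical, not part of this specification) show the cancel values suppressing a decoration that would otherwise propagate from an ancestor:

```css
/* Underline the quotation, but let inline code inside it opt out
   of the propagated underline via 'cancel-underline'. */
blockquote { text-decoration-line: underline; }
blockquote code { text-decoration-line: cancel-underline; }
```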

11.1.2. Text Decoration Color: the 'text-decoration-color' property

Name: text-decoration-color
Value: <color>
Initial: currentColor
Applies to: all elements
Inherited: no
Percentages: N/A
Media: visual
Computed value: the computed color
This property specifies the color of text decorations (underlines, overlines, and line-throughs) set on the element with 'text-decoration-line'.

11.1.3. Text Decoration Style: the 'text-decoration-style' property

Name: text-decoration-style
Value: solid | double | dotted | dashed | wavy
Initial: solid
Applies to: all elements
Inherited: no
Percentages: N/A
Media: visual
Computed value: as specified
This property specifies the style of the line(s) drawn for text decoration specified on the element. Values have the same meaning as for the border-style properties [CSS3BG]. 'wavy' indicates a wavy line.
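For example, an author stylesheet might mark misspelled words with a red wavy underline, as many text editors do (the class name here is illustrative):

```css
/* A wavy red underline; style and color are set on their own
   longhand properties. */
.misspelled {
  text-decoration-line: underline;
  text-decoration-style: wavy;
  text-decoration-color: red;
}
```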

11.1.4. Text Decoration Shorthand: the 'text-decoration' property

Name: text-decoration
Value: <text-decoration-line> || <text-decoration-color> || <text-decoration-style> || blink
Initial: none
Applies to: all elements
Inherited: no
Percentages: N/A
Media: visual
Computed value: as specified
This property is a shorthand for setting text-decoration-line, text-decoration-color, and text-decoration-style in one declaration. Omitted values are set to their initial values. A text-decoration declaration that omits both the text-decoration-color and text-decoration-style values is backwards-compatible with CSS Levels 1 and 2.
If the blink keyword is specified, the text blinks (alternates between visible and invisible). Conforming user agents may simply not blink the text. Note that not blinking the text is one technique to satisfy checkpoint 3.3 of WAI-UAAG.
The following example underlines unvisited links with a solid blue underline in CSS1 and CSS2 UAs and a navy dotted underline in CSS3 UAs.
:link {
  color: blue;
  text-decoration: underline;
  text-decoration: navy dotted underline; /* Ignored in CSS1/CSS2 UAs */
}

11.1.5. Text Decoration Line Continuity: the 'text-decoration-skip' property

Name: text-decoration-skip
Value: none | [ objects || spaces || ink ]
Initial: objects
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: as specified
This property specifies what parts of the element's content any text decoration affecting the element must skip over. It controls all text decoration lines drawn by the element and also any text decoration lines drawn by its ancestors. Values have the following meanings:
'none'
Skip nothing: text decoration is drawn for all text content and for inline replaced elements.
'objects'
Skip this element if it is an atomic inline (such as an image or inline-block).
'spaces'
Skip white space: this includes regular spaces (U+0020) and tabs (U+0009), as well as the no-break space (U+00A0), ideographic space (U+3000), all fixed-width spaces (such as U+2000–U+200A, U+202F and U+205F), and any adjacent letter-spacing or word-spacing.
'ink'
Skip over where glyphs are drawn: interrupt the decoration line to let text show through where the text decoration would otherwise cross over a glyph. The UA may also skip a small distance to either side of the glyph outline.
Do we need a value that doesn't skip margins and padding?
Note that this property inherits and that descendant elements can have a different setting.
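A minimal illustrative sketch combining the values above:

```css
/* Underline paragraphs, but do not draw the line through inline
   images ('objects') and interrupt it where it would cross glyph
   ink, e.g. under descenders ('ink'). */
p {
  text-decoration: underline;
  text-decoration-skip: objects ink;
}
```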

11.1.6. Text Underline Position: the 'text-underline-position' property

Name: text-underline-position
Value: auto | alphabetic | below left | below right
Initial: auto
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: as specified
This property sets the position of an underline specified on the same element: it does not affect underlines specified by ancestor elements. Values have the following meanings:
'auto'
The user agent may use any algorithm to determine the underline's position; however, in vertical typographic mode, if the language of the element is set to Japanese or Korean (matches :lang(ja) or :lang(ko)) the underline must be drawn on the right side of the text as described for 'below right'.
It is suggested that the underline position be 'alphabetic' unless it crosses either subscripted (or otherwise lowered) text, or it affects characters from Asian scripts such as Han or Tibetan, for which an alphabetic underline is too high: in such cases, aligning to the em box edge as described for 'below left' is more appropriate.
'alphabetic'
The underline is aligned with the alphabetic baseline. In this case the underline is likely to cross some descenders.
'below left'
The underline is aligned with the under edge of the element's content box. In this case the underline usually does not cross the descenders. (This is sometimes called "accounting" underline.) If the underline affects descendants with a lower content edge, the user agent should shift the underline down further to the lowest underlined content box edge. The user agent may ignore elements with 'vertical-align' values given as lengths, percentages, 'top', or 'bottom' when making this adjustment. (Note that images that are not affected by the underline per 'text-decoration-skip' will not affect the position of the underline.)
'below right'
In horizontal typographic mode, this value is equivalent to 'below left'. In vertical typographic mode, the underline is aligned as for 'below left', except it is aligned to the right edge of the text. If this causes the underline to be drawn on the "over" side of the text, then an overline also switches sides and is drawn on the "under" side.
In some cases (such as in OpenType) the font format can offer information about the appropriate position of an underline. Typically this information gives the position of an 'alphabetic' underline; in some cases (especially in CJK fonts), it gives the position of a 'below left' underline. (In this case, the font's underline metrics typically touch the bottom edge of the em box). The UA is encouraged to use information (such as the underline thickness, or appropriate alphabetic alignment) from the font wherever appropriate.
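For example, a stylesheet might request "accounting" underlines for East Asian content (the language selectors are illustrative):

```css
/* Align underlines to the under edge of the content box, which
   suits Han glyphs better than the alphabetic baseline; 'below
   right' additionally keeps the underline on the right side of
   the text in vertical typographic mode. */
:lang(ja), :lang(ko) { text-underline-position: below right; }
```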

11.2. Emphasis Marks

East Asian documents traditionally use small symbols next to each glyph to emphasize a run of text. For example:
Example of emphasis in Japanese appearing above the text
Accent emphasis (shown in blue for clarity) applied to Japanese text

11.2.1. Emphasis Mark Style: the 'text-emphasis-style' property

Name: text-emphasis-style
Value: none | [ [ filled | open ] || [ dot | circle | double-circle | triangle | sesame ] ] | <string>
Initial: none
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: 'none', a pair of keywords representing the shape and fill, or a string
This property applies emphasis marks to the element's text. Values have the following meanings:
'none'
No emphasis marks.
'filled'
The shape is filled with solid color.
'open'
The shape is hollow.
'dot'
Display small circles as marks. The filled dot is U+2022 '•', and the open dot is U+25E6 '◦'.
'circle'
Display large circles as marks. The filled circle is U+25CF '●', and the open circle is U+25CB '○'.
'double-circle'
Display double circles as marks. The filled double-circle is U+25C9 '◉', and the open double-circle is U+25CE '◎'.
'triangle'
Display triangles as marks. The filled triangle is U+25B2 '▲', and the open triangle is U+25B3 '△'.
'sesame'
Display sesames as marks. The filled sesame is U+FE45 '﹅', and the open sesame is U+FE46 '﹆'.
<string>
Display the given string as marks. Authors should not specify more than one grapheme cluster in <string>. The UA may truncate or ignore strings consisting of more than one grapheme cluster.
If a shape keyword is specified but neither 'filled' nor 'open' is specified, 'filled' is assumed. If only 'filled' or 'open' is specified, the shape keyword computes to 'circle' in horizontal writing mode and 'sesame' in vertical writing mode.
The marks should be drawn using the element's font settings with its size scaled down to 50%. However, not all fonts have all these glyphs, and some fonts use inappropriate sizes for emphasis marks in these code points. The UA may opt to use a font known to be good for emphasis marks, or the marks may instead be synthesized by the UA. Marks must remain upright in vertical typographic modes: like CJK characters, they do not rotate to match the writing mode.
One example of a good font for emphasis marks is Adobe's open-source Kenten Generic OpenType Font, which is specially designed for emphasis marks.
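The following illustrative rules (selectors hypothetical) show both a keyword shape and a string mark:

```css
/* Filled sesame marks, a conventional choice for vertical
   Japanese emphasis. */
em:lang(ja) { text-emphasis-style: filled sesame; }

/* A single-grapheme string may also be used as the mark. */
strong { text-emphasis-style: "*"; }
```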
The marks are drawn once for each grapheme cluster. However, emphasis marks are not drawn for a grapheme cluster consisting of:
  • Word separators or characters belonging to the Unicode separator classes (Z*). (But note that emphasis marks are drawn for a space that combines with any combining characters.)
  • Characters belonging to the Unicode classes for control codes and unassigned characters (Cc, Cf, Cn).
  • Anything else? Probably need to exclude punctuation, but which punctuation exactly? Can we determine that Unicode-wide?
If emphasis marks are drawn for grapheme clusters for which ruby is drawn in the same position as the emphasis mark, the ruby should be stacked between the emphasis marks and the base text. In this case, the position of the emphasis marks for a given element should be determined as if all grapheme clusters have ruby boxes of the same height as the highest ruby box in the element. If the UA is not capable of drawing ruby and emphasis marks on the same side, the UA may hide ruby and draw only emphasis marks.
Example of emphasis marks applied to 4 characters, and ruby to 2 of them
Emphasis marks applied to 4 characters, and ruby to 2 of them
A future level of CSS may define controls to specify what to do when emphasis marks and ruby text coincide.

11.2.2. Emphasis Mark Color: the 'text-emphasis-color' property

Name: text-emphasis-color
Value: <color>
Initial: currentcolor
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: as specified
This property describes the foreground color of the emphasis marks.

11.2.3. Emphasis Mark Shorthand: the 'text-emphasis' property

Name: text-emphasis
Value: '<text-emphasis-style>' || '<text-emphasis-color>'
Initial: see individual properties
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: see individual properties
This property is a shorthand for setting text-emphasis-style and text-emphasis-color in one declaration. Omitted values are set to their initial values.
Note that 'text-emphasis-position' is not reset in this shorthand. This is because typically the shape and color vary, but the position is consistent for a particular language throughout the document. Therefore the position should inherit independently.
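An illustrative use of the shorthand (the selector is hypothetical):

```css
/* Sets text-emphasis-style and text-emphasis-color together;
   text-emphasis-position is not reset and is left to inherit. */
em.accent { text-emphasis: filled dot blue; }
```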

11.2.4. Emphasis Mark Position: the 'text-emphasis-position' property

Name: text-emphasis-position
Value: [ above | below ] && [ right | left ]
Initial: above right
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: as specified
This property describes where emphasis marks are drawn. The values have the following meanings:
'above'
Draw marks over the text in horizontal typographic mode.
'below'
Draw marks under the text in horizontal typographic mode.
'right'
Draw marks to the right of the text in vertical typographic mode.
'left'
Draw marks to the left of the text in vertical typographic mode.
Emphasis marks are drawn exactly as if each grapheme cluster were assigned the mark as its ruby annotation text, with the ruby position given by 'text-emphasis-position' and the ruby alignment centered.
The effect of emphasis marks on the line height is the same as for ruby text.
Note, the preferred position of emphasis marks depends on the language. In Japanese for example, the preferred position is 'above right'. In Chinese, on the other hand, the preferred position is 'below right'. The informative table below summarizes the preferred emphasis mark positions for Chinese and Japanese:
Preferred emphasis mark and ruby position
Language Preferred mark position Illustration
Horizontal Vertical
Japanese above right Emphasis marks appear above each emphasized character in horizontal Japanese text. Emphasis marks appear on the right of each emphasized character in vertical Japanese text.
Chinese below right Emphasis marks appear below each emphasized character in horizontal Simplified Chinese text.
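The preferences in the table above could be expressed in a stylesheet as follows (an illustrative sketch, not a normative default):

```css
/* Japanese prefers marks above the text (right side in vertical
   typographic mode); Chinese prefers them below. */
:lang(ja) { text-emphasis-position: above right; }
:lang(zh) { text-emphasis-position: below right; }
```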

11.3. Text Shadows: the 'text-shadow' property

Name: text-shadow
Value: none | [ <shadow>, ]* <shadow>
Initial: none
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: a color plus three absolute <length>s
This property accepts a comma-separated list of shadow effects to be applied to the text of the element. <shadow> is the same as defined for the 'box-shadow' property except that the 'inset' keyword is not allowed. The shadow is applied to all of the element's text as well as any text decorations it specifies. When a text outline is specified, the shadow shadows the outlined shape rather than the glyph shape.
The shadow effects are applied front-to-back: the first shadow is on top. The shadows may thus overlay each other, but they never overlay the text itself. The shadow must be painted at a stack level between the element's border and/or background, if present, and the element's text and text decoration. UAs should avoid painting text shadows over text in adjacent elements belonging to the same stack level and stacking context. (This may mean that the exact stack level of the shadows depends on whether the element has a border or background: the exact stacking behavior of text shadows is thus UA-defined.)
Unlike 'box-shadow', text shadows are not clipped to the shadowed shape and may show through if the text is partially-transparent. Like 'box-shadow', text shadows do not influence layout, and do not trigger scrolling or increase the size of the scrollable area.
The painting order of shadows defined here is the opposite of that defined in the 1998 CSS2 Recommendation.
The text-shadow property applies to both the ::first-line and ::first-letter pseudo-elements.
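The following example (the selector is illustrative) layers two shadows; per the painting order above, the first shadow in the list paints on top:

```css
/* A tight light halo painted over a softer, offset dark shadow.
   Neither shadow affects layout or scrollable area. */
h1 {
  text-shadow: 0 0 2px white,
               2px 2px 4px rgba(0, 0, 0, 0.5);
}
```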

11.4. Text Outlines: the 'text-outline' property

Name: text-outline
Value: none | [ <color> <length> <length>? | <length> <length>? <color> ]
Initial: none
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: a color plus two absolute <length>s
This property specifies a text outline, where the first length represents the outline's thickness and the second represents an optional blur radius. The outline never overlays the text itself. Its shape is the same as that obtained by applying text shadows in every radial direction, i.e. all text shadows whose offsets (x, y) satisfy the equation x² + y² = thickness². The blur radius is treated the same as for 'text-shadow'.
The Timed-Text WG had suggestions for some keywords (text-outline: normal|heavy|light;) as well as a <length> thickness. Should these be added? How would they be defined? (Maybe use (thin|medium|thick) as in border-width?)
A color value must be specified before or after the length values of the outline effect; it gives the color of the outline.
Implementations may ignore the blur radius when text outline is combined with a text shadow.
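A minimal illustrative example (the selector is hypothetical):

```css
/* A 1px navy outline with a 0.5px blur radius; the color may
   appear before or after the lengths. */
h2 { text-outline: navy 1px 0.5px; }
```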
The spread radius argument added to the 'text-shadow' property makes this property somewhat redundant. Should it be kept?


Changes from the February 2011 CSS3 Text WD

Major changes include:
  • Renamed 'large-kana' to 'fullsize-kana'.
  • Merged 'hyphenate-limit-before' and 'hyphenate-limit-after' into 'hyphenate-limit-chars' and added a word-length limit as well.
  • Renamed 'white-space-collapsing' to 'bikeshedding'.
  • Added 'tab-size' property.
  • Removed 'keep-words' value of 'word-break'.
  • Removed optional second hyphenation character.
  • Added 'hyphenate-zone' property.
  • Removed 'unrestricted' value of 'text-wrap'.
  • Added 'hyphenate' value to 'word-wrap'.
  • Merged 'text-align-first' into 'text-align'.
  • Removed 'last' value of 'hanging-punctuation'.
  • Added cancel values to 'text-decoration-line'/'text-decoration'.
  • Removed 'all' value of 'text-decoration-skip'.
  • Changed 'wave' value of 'text-decoration-style' to 'wavy'.
  • Changed syntax of 'text-emphasis-position' to allow more combinations of position values.
  • Added an informative appendix on cultural correlations of various CSS3 Text features.
  • Added an appendix to categorize scripts according to the categories used to define, e.g., 'text-justify' behavior.
The following properties have been marked optional for conformance to this module: 'hyphenate-limit-zone', 'hyphenate-limit-chars', 'hyphenate-limit-lines', 'hyphenate-resources'/'@hyphenate-resource', 'line-break', 'text-autospace', 'text-trim', 'hanging-punctuation'
Significant details updated:
  • More details for 'text-transform: capitalize'.
  • Linked to mapping rules for 'text-transform: fullwidth', defined interaction with space character.
  • Updated definition of control character handling.
  • Fixed errors in informative summary of white space collapsing.
  • Hard hyphens do not disable hyphenation in a word.
  • Added example of Japanese indentation schemes.
  • Major clarifications to 'hanging-punctuation'.
  • Sync'ed text decoration definition with CSS2.1 and defined some previously-undefined cases.
  • Computed value of 'text-decoration-color' is the computed (not specified) color.

    Appendix A: Acknowledgements

    This specification would not have been possible without the help from: Ayman Aldahleh, Bert Bos, Tantek Çelik, Stephen Deach, Martin Dürst, Laurie Anna Edlund, Ben Errez, Yaniv Feinberg, Arye Gittelman, Ian Hickson, Martin Heijdra, Richard Ishida, Koji Ishii, Masayasu Ishikawa, Michael Jochimsen, Eric LeVine, Ambrose Li, Håkon Wium Lie, Chris Lilley, Ken Lunde, Shinyu Murakami, Paul Nelson, Chris Pratley, Marcin Sawicki, Arnold Schrijver, Rahul Sonnad, Michel Suignard, Takao Suzuki, Frank Tang, Chris Thrasher, Etan Wexler, Chris Wilson, Masafumi Yabe and Steve Zilles.

    Appendix : References

    Normative references

    Bert Bos; et al. Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification. 7 December 2010. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2010/WD-CSS2-20101207
    Bert Bos; Elika J. Etemad; Brad Kemper. CSS Backgrounds and Borders Module Level 3. 15 February 2011. W3C Candidate Recommendation. (Work in progress.) URL: http://www.w3.org/TR/2011/CR-css3-background-20110215
    S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt
    A. Phillips; M. Davis. Matching of Language Tags. September 2006. Internet RFC 4647. URL: http://www.ietf.org/rfc/rfc4647.txt
    Asmus Freytag. East Asian Width. 23 March 2001. Unicode Standard Annex #11. URL: http://www.unicode.org/unicode/reports/tr11/tr11-8.html
    Asmus Freytag. Line Breaking Properties. 29 March 2005. Unicode Standard Annex #14. URL: http://www.unicode.org/unicode/reports/tr14/tr14-17.html
    Mark Davis. Script Names. 28 March 2005. Unicode Standard Annex #24. URL: http://www.unicode.org/unicode/reports/tr24/tr24-7.html
    Mark Davis. Unicode Text Segmentation. 8 October 2010. Unicode Standard Annex #29. URL: http://www.unicode.org/reports/tr29/tr29-17.html
    Mark Davis; Ken Whistler. Unicode Character Database. 8 October 2010. Unicode Standard Annex #44. URL: http://www.unicode.org/reports/tr44/tr44-6.html
    The Unicode Consortium. The Unicode Standard. 2003. Defined by: The Unicode Standard, Version 4.0 (Boston, MA, Addison-Wesley, ISBN 0-321-18578-1), as updated from time to time by the publication of new versions URL: http://www.unicode.org/unicode/standard/versions/enumeratedversions.html

    Informative references

    John Daggett. CSS Fonts Module Level 3. 24 March 2011. W3C Working Draft. (Work in progress.) URL: http://www.w3.org/TR/2011/WD-css3-fonts-20110324
    Formatting rules for Japanese documents (『日本語文書の組版方法』). Japanese Standards Association. 2004. JIS X 4051:2004. In Japanese.
    Kenzou Onozawa; et al. Requirements for Japanese Text Layout. 4 June 2009. W3C Note. URL: http://www.w3.org/TR/2009/NOTE-jlreq-20090604
    标点符号用法 (Punctuation Mark Usage). 1995. National standard of the People's Republic of China (中华人民共和国国家标准).

    Appendix : Cultural Correlations

    This section is non-normative.
    Some features are more important in some typographic traditions than in others. This table summarizes which typographic traditions these features were designed for. Parentheses indicates traditions for which the feature is particularly significant. This table should be considered incomplete and non-exhaustive.
    Feature Typographic Tradition
    'hanging-punctuation' CJK
    'hyphens' Western, South Asian
    'hyphenate-character' Western, South Asian
    'hyphenate-limit-chars' Western, South Asian
    'hyphenate-limit-last' Western, South Asian
    'hyphenate-limit-lines' Western, South Asian
    'hyphenate-limit-zone' Western, South Asian
    'hyphenate-resource'/'@hyphenate-resource' Western, South Asian
    'line-break' CJK (Japanese)
    'text-autospace' CJK
    'text-autospace': 'punctuation' European (French)
    'text-justify': 'inter-cluster' Southeast Asian
    'text-justify': 'inter-ideograph', 'distribute' CJK
    'text-justify': 'inter-word' Western, African, Middle-Eastern, Indic
    'text-justify': 'kashida' Arabic
    'text-emphasis-style' CJK
    'text-emphasis-color' CJK
    'text-emphasis-position' CJK
    'text-transform': 'capitalize', 'uppercase', 'lowercase' Western
    'text-transform': 'fullwidth', 'fullsize-kana' CJK
    'text-trim' CJK
    'text-underline-position': 'auto' CJK
    'word-break' CJK (Korean)
    'word-spacing' Western, African, Middle-Eastern, Indic
    Additional information / corrections on this table are welcome.

    Appendix : Default UA Stylesheet

    This appendix is informative. It is intended to help UA developers implement a default stylesheet, but UA developers are free to ignore or modify it.
    /* make list items align together */
    li { text-align: match-parent; }

    /* disable inheritance of text-emphasis marks to ruby text:
       emphasis marks should only apply to base text */
    rt { text-emphasis: none; }

    /* preferred emphasis mark position for Chinese is below the text */
    :root:lang(zh), [lang|=zh] {
      text-emphasis-position: below right;
    }
    If you find any issues, recommendations to add, or corrections, please send the information to www-style@w3.org with [css3-text] in the subject line.

    Appendix : Categorization of Scripts

    This appendix is informative (non-normative).
    This appendix categorizes some common scripts in Unicode 6.0 according to the categorization given above.
    block scripts
    Bopomofo, Han, Hangul, Hiragana, Katakana, Yi
    clustered scripts
    Khmer, Lao, Myanmar, Thai
    This list is likely incomplete. What else fits here?
    connected scripts
    Bengali, Brahmi, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya?, Ogham, Tamil, Telugu
    cursive scripts
    Arabic, Mongolian, N'Ko?, Phags Pa?, Syriac
    discrete scripts
    Armenian, Bamum?, Braille, Canadian Aboriginal, Cherokee, Coptic, Cyrillic, Deseret, Ethiopic, Greek, Hebrew, Kharoshthi, Latin, Lisu, Osmanya, Shavian, Tifinagh, Vai?
    UAs should treat unrecognized scripts as discrete.
    This listing should ideally be exhaustive with respect to Unicode. Please send suggestions and corrections to the CSS Working Group. (As described above, the grouping is primarily based on justification behavior.)

    Appendix : Full Property Index

    Property Values Initial Applies to Inh. Percentages Media
    bikeshedding collapse | discard | [ [ preserve | preserve-breaks ] && [ trim-inner || consume-before || consume-after ] ] collapse all elements yes N/A visual
    hanging-punctuation none | [ first || [ force-end | allow-end ] ] none inline elements yes N/A visual
    hyphenate-character auto | <string> auto all elements yes N/A visual
    hyphenate-limit-chars auto | <integer>{1,3} auto all elements yes N/A visual
    hyphenate-limit-last none | always | column | page | spread none block containers yes N/A visual
    hyphenate-limit-lines no-limit | <integer> no-limit block containers yes N/A visual
    hyphenate-limit-zone <percentage> | <length> 0 block containers yes refers to width of the line box visual
    hyphenate-resource none | <uri> [, <uri> ]* none all elements yes N/A visual
    hyphens none | manual | auto | all manual all elements yes N/A visual
    letter-spacing <spacing-limit>{1,3} normal all elements yes refers to width of space (U+0020) glyph visual
    line-break auto | loose | normal | strict auto all elements yes N/A visual
    tab-size <integer> 8 block containers yes N/A visual
    text-align [ start | <string> ]? [ start | end | left | right | center | justify | match-parent ] start block containers yes N/A visual
    text-align-last auto | start | end | left | right | center | justify auto block containers yes N/A visual
    text-autospace none | [ ideograph-numeric || ideograph-alpha || punctuation ] none all elements yes N/A visual
    text-decoration <text-decoration-line> || <text-decoration-color> || <text-decoration-style> || blink none all elements no N/A visual
    text-decoration-color <color> currentColor all elements no N/A visual
    text-decoration-line none | underline || overline || line-through || [ cancel-underline || cancel-overline || cancel-line-through | cancel-all ] none all elements no (but see prose) N/A visual
    text-decoration-skip none | [ objects || spaces || ink ] objects all elements yes N/A visual
    text-decoration-style solid | double | dotted | dashed | wavy solid all elements no N/A visual
    text-emphasis '<text-emphasis-style>' || '<text-emphasis-color>' see individual properties all elements yes N/A visual
    text-emphasis-color <color> currentcolor all elements yes N/A visual
    text-emphasis-position [ above | below ] && [ right | left ] above right all elements yes N/A visual
    text-emphasis-style none | [ [ filled | open ] || [ dot | circle | double-circle | triangle | sesame ] ] | <string> none all elements yes N/A visual
    text-indent [ <length> | <percentage> ] && [ hanging || each-line ]? 0 block containers yes refers to width of containing block visual
    text-justify auto | none | inter-word | inter-ideograph | inter-cluster | distribute | kashida auto block containers and, optionally, inline elements yes N/A visual
    text-outline none | [ <color> <length> <length>? | <length> <length>? <color> ] none all elements yes N/A visual
    text-shadow none | [ <shadow>, ]* <shadow> none all elements yes N/A visual
    text-transform none | [ [ capitalize | uppercase | lowercase ] || fullwidth || fullsize-kana ] none all elements yes N/A visual
    text-trim normal | force-start || [ keep-end | force-end ] || no-justify normal all elements yes N/A visual
    text-underline-position auto | alphabetic | below left | below right auto all elements yes N/A visual
    text-wrap normal | none | avoid normal all elements yes N/A visual
    white-space normal | pre | nowrap | pre-wrap | pre-line not defined for shorthand properties all elements yes N/A visual
    word-break normal | keep-all | break-all normal all elements yes N/A visual
    word-spacing <spacing-limit> {1,3} normal all elements yes refers to width of space (U+0020) glyph visual
    word-wrap normal | [ break-word || hyphenate ] normal all elements yes N/A visual

    Appendix I: Index

    Example of Hard Coded Hyphenation Using Soft Hyphens. Resize window to see text reflow with hyphenation preserved. Zooming larger or smaller will also reflow the text.
    Excerpt is from Revising Prose, by Richard Lanham.


    With the elec­tron­ic word, au­thor­ity dif­fuses it­self between writer and read­er. Al­though we sel­dom think of it in this way, the print me­di­um is fun­da­ment­ally au­thor­it­ari­an. "In print" means un­change­able. You can't quar­rel with it. This pen­um­bra of au­thor­ity goes back a long way. The Renais­sance hu­man­ists re­sur­rec­ted the au­thor­ity of clas­sic­al Greek and Lat­in cul­ture by edit­ing that cul­ture's doc­u­ments in­to fixed prin­ted texts. The au­thor­it­at­ive edi­tion means the un­chan­ging edi­tion, text fixed forever, a lode­stone of cul­tur­al au­thor­ity. We still feel that way about print. It fixes things.
    Elec­tron­ic text un­fixes them. It is by nature change­able, anti-au­thor­it­ari­an. If we don't like what it says, we can change it, or­na­ment it, re­vise it, erase it, mock it in let­ters in­dis­tin­guish­able from the ori­gin­al ones. Pat­terns of au­thor­ity have shif­ted, be­come demo­crat­ized. This demo­crat­iz­a­tion means that the elec­tron­ic word will mean something quite dif­fer­ent from the prin­ted one. Any­one in­ter­ested in writ­ing of any sort must un­der­stand this change.
    It op­er­ates, for a start, upon the role the writer ad­opts as a writer. When we write we in­ev­it­ably ad­opt a so­cial role of some sort. A present­a­tion of self. Surely all of us have no­ticed that the self we ad­opt in com­puter com­mu­nic­a­tion, es­pe­cially on­line and, to a less­er de­gree in e-mail, dif­fers from our "print" self. For reas­ons I leave to the psy­cho­lo­gists, com­puters have from their be­gin­nings evoked the game and play ranges of hu­man mo­tiv­a­tion far more strongly than print has. "Hack­er", be­fore it be­came a syn­onym for com­puter van­dal, used to be an in­no­cent word, used to de­scribe kids who liked to play around with com­puters just for the fun of it, to do something just to see if it could be done. This ori­gin­al hack­er men­tal­ity in­ev­it­ably creeps in whenev­er we put our fin­gers on the com­puter's home row: We hold lan­guage more lightly in our hands; our sense of hu­mor stands closer; we can't take things, or ourselves, so ser­i­ously.
    A good pre­dis­pos­i­tion this turns out to be. The "dig­nity of print" has a lot to an­swer for. Let's hope that the elec­tron­ic word pre­serves the muse of com­edy that has hovered around its cre­ation. Blog prose seems to sug­gest that it has. At all events, it is something to be alert to if you are writ­ing and re­vis­ing prose in an elec­tron­ic world. It has cre­ated a new com­mu­nic­a­tions de­cor­um.

    W3C Emotion Incubator Group: Internet Emotion Incubator or How W3C Almost Went Totally Fucking W3Queer in 2007

    W3C Emotion Incubator Group

    W3C Incubator Group Report 10 July 2007

    This version:
    W3C Incubator Group Report


    This is the report of the W3C Emotion Incubator Group (EmoXG) as specified in the Deliverables section of its charter.
    In this report we present requirements for information that needs to be represented in a general-purpose Emotion Markup Language in order to be usable in a wide range of use cases.
    Specifically the report:
    • describes the range of use cases in which an emotion markup language would be needed,
    • presents a rich structured collection of requirements arising from the use cases,
    • describes how these requirements coincide with scientific models of emotion,
    • begins to assess syntactic issues in view of a future specification, by evaluating existing markup languages in the light of the collected requirements.
    The report identifies various areas which require further investigation and debate. The intention is that it forms a major input into a new Incubator Group which would develop a draft specification as a proposal towards a future activity in the W3C Recommendation Track.

    Status of this document

    This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of Final Incubator Group Reports is available. See also the W3C technical reports index at http://www.w3.org/TR/.

    This document was developed by the W3C Emotion Incubator Group. It represents the consensus view of the group, in particular those listed in the acknowledgements, on requirements for a generally usable emotion markup language. The document has two main purposes:
    1. elicit discussion with other groups, notably the MMI and VoiceBrowser groups at W3C, in view of a possible collaboration towards future standards;
    2. serve as the basis for a draft specification document which should be the output of a successor Incubator Group.
    Publication of this document by W3C as part of the W3C Incubator Activity indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. Participation in Incubator Groups and publication of Incubator Group Reports at the W3C site are benefits of W3C Membership.
    Incubator Groups have as a goal to produce work that can be implemented on a Royalty Free basis, as defined in the W3C Patent Policy. Participants in this Incubator Group have made no statements about whether they will offer licenses according to the licensing requirements of the W3C Patent Policy for portions of this Incubator Group Report that are subsequently incorporated in a W3C Recommendation.

    Table of Contents

    Foreword: A Word of Caution
    1. Introduction
    2. Scientific Descriptions of Emotion
    3. Use Cases
    4. Requirements
    5. Assessment of Existing Markup Languages
    6. Summary and Outlook
    7. References
    8. Acknowledgements
    Appendix 1: Use Cases
    Appendix 2: Detailed Assessment of Existing Markup Languages

    Foreword: A Word of Caution

    This document is a report of the W3C Emotion Incubator group, investigating the feasibility of working towards a standard representation of emotions and related states in technological contexts.
    This document is not an attempt to "standardise emotions", nor is it an attempt to unify emotion theories into one common representation. The aim is not to understand the "true nature" of emotions, but to attempt a transfer - making available descriptions of emotion-related states in application-oriented technological contexts, inspired by scientific proposals, but not slavishly following them.
    At this early stage, the results presented in this document are preliminary; the authors do not claim any fitness of the proposed model for any particular application purpose.
    In particular, we expressly recommend prospective users of this technology to check for any (implicit or explicit) biases, misrepresentations or omissions of important aspects of their specific application domain. If you have such observations, please let us know -- your feedback helps us create a specification that is as generally usable as possible!

    1. Introduction

    The W3C Emotion Incubator group was chartered "to investigate the prospects of defining a general-purpose Emotion annotation and representation language, which should be usable in a large variety of technological contexts where emotions need to be represented".
    What could be the use of such a language?
    From a practical point of view, the modeling of emotion-related states in technical systems can be important for two reasons.
    1. To enhance computer-mediated or human-machine communication. Emotions are a basic part of human communication and should therefore be taken into account, e.g. in emotional chat systems or empathic voice boxes. This involves specification, analysis and display of emotion-related states.
    2. To enhance systems' processing efficiency. Emotion and intelligence are strongly interconnected. The modeling of human emotions in computer processing can help to build more efficient systems, e.g. using emotional models for time-critical decision enforcement.
    A standardised way to mark up the data needed by such "emotion-oriented systems" has the potential to boost development primarily because
    a) data that was annotated in a standardised way can be interchanged between systems more easily, thereby simplifying a market for emotional databases.
    b) the standard can be used to ease a market of providers for sub-modules of emotion processing systems, e.g. a web service for the recognition of emotion from text, speech or multi-modal input.
    The work of the present, initial Emotion Incubator group consisted of two main steps: firstly, to revisit carefully the question of where such a language would be used (Use cases), and secondly, to describe what those use case scenarios require from a language (Requirements). These requirements are compared to the models proposed by current scientific theory of emotions (Scientific descriptions). In addition, existing markup languages are discussed with respect to the requirements (Existing languages).
    The specification of an actual emotion markup language has not yet been started, but is planned as future work (Summary and Outlook). This deviation from the original plan was the result of a deliberate choice made by the group - given the strong commitment by many of the group's members to continue work after the first year, precedence was given to the careful execution of the first steps, so as to form a solid basis for the more "applicable" steps that are the logical continuation of the group's work.
    Throughout the Incubator Activity, decisions have been taken by consensus during monthly telephone conferences and two face to face meetings.
    The following report provides a detailed description of the work carried out and the results achieved so far. It also identifies open issues that will need to be followed up in future work.
    The Incubator Group is now seeking to re-charter as an Incubator group for a second and final year. During that time, the requirements presented here will be prioritised; a draft specification will be formulated; and possible uses of that specification in combination with other markup languages will be outlined. Crucially, that new Incubator group will seek comment from the W3C MMI and VoiceBrowser working groups. These comments will be decisive for the decision whether to move into the Recommendation Track.

    1.1 Participants

    The group consisted of representatives of 16 institutions from 11 countries in Europe, Asia, and the US:
    • German Research Center for Artificial Intelligence (DFKI) GmbH, Germany *
    • Deutsche Telekom AG, T-Com, Germany *
    • University of Edinburgh, UK *
    • Chinese Academy of Sciences, China *
    • Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland *
    • University of Southern California Information Sciences Institute (USC / ISI), USA *
    • Universidad Politecnica de Madrid, Spain *
    • Fraunhofer Gesellschaft, Germany *
    • Loquendo, Italy
    • Image, Video and Multimedia Systems Lab (IVML-NTUA), Greece
    • Citigroup, USA
    • Limsi, CNRS, France +
    • University of Paris VIII, France +
    • Austrian Institute for Artificial Intelligence (OFAI), Austria +
    • Technical University Munich, Germany +
    • Emotion.AI, Japan +
    * Original sponsor organisation
    + Invited expert
    It can be seen from this list that the interest has been broad and international, but somewhat tilted towards the academic world. It will be one important aim of a follow-up activity to produce sufficiently concrete output to get more industrial groups actively interested.

    2. Scientific Descriptions of Emotion

    2.1 Defining the scope: emotions and emotion-related states

    One central terminological issue to be cleared first is the semantics of the term emotion, which has been used in a broad and a narrow sense.
    In its narrow sense, as it is e.g. used by Scherer (2000), the term refers to what is also called a prototypical emotional episode (Russell & Feldman Barrett 1999), full-blown emotion, or emergent emotion (Douglas-Cowie et al. 2006): a short, intense, clearly event-triggered emotional burst. A favourite example would be "fear" when encountering a bear in the woods and fleeing in terror.
    Especially in technological contexts there is a tendency to use the term emotion(al) in a broad sense, sometimes for almost everything that cannot be captured as a purely cognitive aspect of human behaviour. More useful established terms -- though still not concisely defined -- for the whole range of phenomena that make up the elements of emotional life are "emotion-related states" and "affective states".
    A number of taxonomies for these affective states have been proposed. Scherer (2000), e.g., distinguishes:
    • Emotions
    • Moods
    • Interpersonal stances
    • Preferences/Attitudes
    • Affect dispositions
    This list was extended / modified by the HUMAINE group working on databases: in Douglas-Cowie et al. (2006) the following list is proposed (and defined):
    • Attitudes
    • Established emotion
    • Emergent emotion (full-blown)
    • Emergent emotion (suppressed)
    • Moods
    • Partial emotion (topic shifting)
    • Partial emotion (simmering)
    • Stance towards person
    • Stance towards object/situation
    • Interpersonal bonds
    • Altered state of arousal
    • Altered state of control
    • Altered state of seriousness
    • Emotionless
    Emergent emotions -- not without reason also termed prototypical emotional episodes -- can be viewed as the archetypical affective states and many emotional theories focus on them. Empirical studies (Wilhelm, Schoebi & Perrez 2004) on the other hand show that while there are almost no instances where people report their state as completely unemotional, examples of full-blown emergent emotions are really quite rare. As the ever present emotional life consists of moods, stances towards objects and persons, and altered states of arousal, these indeed should play a prominent role in emotion-related computational applications. The envisaged scope of an emotion representation language clearly comprises emotions in the broad sense, i.e. should be able to deal with different emotion-related states.

    2.2 Different components of emotions

    There is an old Indian tale called "The Blind Men and the Elephant" that enjoys some popularity in the psychological literature as an allegory for the conceptual difficulty of coming up with unified and uncontroversial descriptions of complex phenomena. In this tale, several blind men who have never encountered an elephant before try to come up with an understanding of the nature of this unknown object. Depending on the body part each of them touches, they provide strongly diverging descriptions: an elephant seems to be best described as a rope if you grasp only its tail, as a tree if you touch only its legs, as a spear if you encounter a tusk, and so on.
    This metaphor fits nicely with the multitude of definitions and models currently available in the scientific literature on emotions, which come with a fair amount of terminological confusion added on top. There are no commonly accepted answers to the questions of how to model the underlying mechanisms that cause emotions, how to classify them, whether to use categorical or dimensional descriptions, etc. But leaving these questions aside, there is a core set of components that are quite readily accepted to be essential components of emergent emotions.
    Subjective component: Feelings.
    Feelings are probably what is most strongly associated with the term emotion in folk psychology and they have been claimed to make up an important part of the overall complex phenomenon of emotion.
    Cognitive component: Appraisals
    The most prominently investigated aspect of this component is the role of -- not necessarily conscious -- cognitive processes that are concerned with the evaluation of situations and events in the context of appraisal models (e.g. Arnold 1960, Lazarus 1966), i.e. the role and nature of cognitive processes in the genesis of emotions. Another aspect are modulating effects of emotions on cognitive processes, such as influences on memory and perception (e.g. narrowing of the visual field in fear).
    Physiological component:
    Physiological changes both in the peripheral system (e.g., heart rate, skin conductivity) and the central system (e.g. neural activity) are obviously one important component of emergent emotions. This component is also strongly interconnected with the other components in this list: e.g., changes in muscular tone also account for the modulation of some expressive features in speech (prosody, articulatory precision) or in appearance (posture, skin color).

    Behavioral component:
    Action tendencies
    Emotions have a strong influence on the motivational state of a subject. Frijda (1986), for example, associated emotions with a small set of action tendencies, e.g. avoidance (relating to fear), rejection (relating to disgust), etc. Action tendencies can be viewed as a link between the outcome of an appraisal process and actual actions.

    Expressive component:

    The expressive component comprises facial expressions but also body posture and gesture and vocal cues (prosody, voice quality, affective bursts).
    Different theories may still strongly disagree on the relative importance of these components and on interactions and cause-and-effect relations between them. However, the fact that these components are relevant to some extent seems relatively uncontroversial.

    3. Use cases

    Taking a software engineering approach to the question of how to represent emotion in a markup language, the first necessary step for the Emotion Incubator group was to gather a set of use cases for the language.
    At this stage, we had two primary goals in mind: to gain an understanding of the many possible ways in which this language could be used, including the practical needs which have to be served; and to determine the scope of the language by defining which of the use cases would be suitable for such a language and which would not. The resulting set of final use cases would then be used as the basis for the next stage of the design process, the definition of the requirements of the language.
    The Emotion Incubator group comprises people with wide-ranging interests and expertise in the application of emotion in technology and research. Using this as a strength, we asked each member to propose one or more use case scenarios that would represent the work they, themselves, were doing. This allowed the group members to create very specific use cases based on their own domain knowledge. Three broad categories were defined for these use cases:
    • Data Annotation
    • Emotion Recognition
    • Emotion Generation
    Where possible we attempted to keep use cases within these categories; naturally, however, some crossed the boundaries between categories.
    A wiki was created to facilitate easy collaboration and integration of each member's use cases. In this document, subheadings for the three broad categories were provided, along with a sample initial use case that served as a template which the other members followed, in terms of content and layout, when entering their own use cases. In total, 39 use cases were entered by the various working group members: 13 for Data Annotation, 11 for Emotion Recognition and 15 for Emotion Generation.
    Possibly the key phase of gathering use cases was the optimisation of the wiki document. Here, the members of the group worked collaboratively within each broad category to find any redundancies (replicated or very similar content), to ensure that each use case followed the template and provided the necessary level of information, to disambiguate any ambiguous wording (including a glossary of terms for the project), to agree on a suitable category for use cases that might fit into two or more, and to order the use cases in the wiki so that they formed a coherent document.
    In the following, we detail each broad use case category, outlining the range of use cases in each, and pointing out some of their particular intricacies. Detailed descriptions of all use cases can be found in Appendix 1.

    3.1. Data annotation

    The Data Annotation use case groups together a broad range of scenarios involving human annotation of the emotion contained in some material. These scenarios vary widely with respect to the material being annotated, the way this material is collected, the way the emotion itself is represented, and, notably, which kinds of additional information about the emotion are being annotated.
    One simple case is the annotation of plain text with emotion dimensions or categories and corresponding intensities. Similarly, simple emotional labels can be associated to nodes in an XML tree, representing e.g. dialogue acts, or to static pictures showing faces, or to speech recordings in their entirety. While the applications and their constraints are very different between these simple cases, the core task of emotion annotation is relatively straightforward: it consists of a way to define the scope of an emotion annotation and a description of the emotional state itself. Reasons for collecting data of this kind include the creation of training data for emotion recognition, as well as scientific research.
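    Although no syntax had been defined at the time of this report, the core task described above can be illustrated with a purely hypothetical sketch; every element and attribute name below is invented for illustration only:

```xml
<!-- Hypothetical sketch only: no syntax has been specified by the group.
     A scope (here, a sentence of text) is paired with an emotion description. -->
<sentence id="s1">I can't believe we actually won!</sentence>
<emotion scope="s1" category="joy" intensity="0.8"/>
```

    The same pattern would apply to other kinds of scope, e.g. a node in an XML tree, a static picture, or a complete speech recording.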
    Recent work on naturalistic multimodal emotional recordings has compiled a much richer set of annotation elements (Douglas-Cowie et al., 2006), and has argued that a proper representation of these aspects is required for an adequate description of the inherent complexity in naturally occurring emotional behaviour. Examples of such additional annotations are multiple emotions that co-occur in various ways (e.g., as blended emotions, as a quick sequence, as one emotion masking another one), regulation effects such as simulation or attenuation, confidence of annotation accuracy, or the description of the annotation of one individual versus a collective annotation. In addition to annotations that represent fixed values for a certain time span, various aspects can also be represented as continuous "traces" -- curves representing the evolution of, e.g., emotional intensity over time.
    Data is often recorded by actors rather than observed in naturalistic settings. Here, it may be desirable to represent the quality of the acting, in addition to the intended and possibly the perceived emotion.
    With respect to requirements, it has become clear that Data Annotation poses the most complex kinds of requirements for an emotion markup language, because many of the subtleties humans can perceive are far beyond the capabilities of today's technology. We have nevertheless attempted to encompass as many of the requirements arising from Data Annotation as possible, not least in order to raise the technological community's awareness of the wealth of potentially relevant aspects in emotion annotation.

    3.2 Emotion recognition

    As a general rule, the context of the Emotion Recognition use case has to do with low- and mid-level features which can be automatically detected, either offline or online, from human-human and human-machine interaction. In the case of low-level features, these can be facial features, such as Action Units (AUs) (Ekman and Friesen 1978) or MPEG 4 facial action parameters (FAPs) (Tekalp and Ostermann, 2000), speech features related to prosody (Devillers, Vidrascu and Lamel 2005) or language, or other, less frequently investigated modalities, such as bio signals (e.g. heart rate or skin conductivity). All of the above can be used in the context of emotion recognition to provide emotion labels or extract emotion-related cues, such as smiling, shrugging or nodding, eye gaze and head pose, etc. These features can then be stored for further processing or reused to synthesise expressivity on an embodied conversational agent (ECA) (Bevacqua et al., 2006).
    In the case of unimodal recognition, the most prominent examples are speech and facial expressivity analysis. Regarding speech prosody and language, the CEICES data collection and processing initiative (Batliner et al. 2006) as well as exploratory extensions to automated call centres (Burkhardt et al., 2005) are the main factors that defined the essential features and functionality of this use case. With respect to visual analysis, there are two cases: in the best case scenario, detailed facial features (eyes, eyebrows, mouth, etc.) information can be extracted and tracked in a video sequence, catering for high-level emotional assessment (e.g. emotion words). However, when analysing natural, unconstrained interaction, this is hardly ever the case since colour information may be hampered and head pose is usually not directed to the camera; in this framework, skin areas belonging to the head of the subject or the hands, if visible, are detected and tracked, providing general expressivity features, such as speed and power of movement (Bevacqua et al., 2006).
    For physiological data, despite a long research tradition especially among psychologists, no systematic approach to storing or annotating them is in place. However, there are first attempts to include them in databases (Blech et al., 2005), and suggestions on how they could be represented in digital systems have been made (Peter and Herbon, 2006). A main difficulty with physiological measurements is the variety of possibilities for obtaining the data and of the consequent data enhancement steps. Since these factors can directly affect the result of the emotion interpretation, a generic emotion markup language needs to be able to deal with such low-level issues. The same applies to the technical parameters of other modalities, such as the resolution and frame rate of cameras, the dynamic range or the type of sound field of the chosen microphone, and the algorithms used to enhance the data.
    Finally, individual modalities can be merged, either at feature- or decision-level, to provide multimodal recognition. In this case, features and timing information (duration, peak, slope, etc.) from individual modalities are still present, but an integrated emotion label is also assigned to the multimedia file or stream in question. In addition to this, a confidence measure for each feature and decision assists in providing flexibility and robustness in automatic or user-assisted methods.
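    As a hypothetical illustration (the element and attribute names are invented, not part of any agreed syntax), a fused multimodal label with per-modality confidences might look like this:

```xml
<!-- Hypothetical sketch: an integrated label for a media segment,
     with timing and a confidence value per modality and for the fused decision. -->
<emotion category="anger" start="12.3s" end="15.1s" confidence="0.7">
  <modality type="voice" confidence="0.8"/>
  <modality type="face" confidence="0.5"/>
</emotion>
```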

    3.3 Generation

    We divided the 15 use cases in the generation category into a number of further sub-categories, dealing essentially with simulating modelled emotional processes, generating face and body gestures, and generating emotional speech.
    The use cases in this category had a number of common elements that represented triggering the generation of an emotional behaviour according to a specified model or mapping. In general, emotion eliciting events are passed to an emotion generation system that maps the event to an emotion state which could then be realised as a physical representation, e.g. as gestures, speech or behavioural actions.
    The generation use cases presented a number of interesting issues that focused the team on the scope of the work being undertaken. In particular, they showed how varied the information being passed to and information being received from an emotion processing system can be. This would necessitate either a very flexible method of receiving and sending data or to restrict the scope of the work in respect to what types of information can be handled.
    The first subset of generation use cases was termed 'Affective Reasoner', denoting emotion modelling and simulation. Three quite different systems were outlined in this sub-category: one modelling cognitive emotional processes, one modelling the emotional effects of real-time events such as stock price movements on a system with a defined personality, and a large ECA system that made heavy use of XML to pass data between its various processes.
    The next subset dealt with the generation of automatic facial and body gestures for characters. With these use cases, the issue of the range of possible outputs from emotion generation systems became apparent. While all focused on generating human facial and body gestures, the range of systems they may connect to is large, meaning that the set of possible mappings or output schemas would be large as well. Both software and robotic systems were represented, and as such the generated gesture information could be sent to both software- and hardware-based systems on any number of platforms. While a number of animation standards are used extensively within academia (e.g., MPEG-4 (Tekalp and Ostermann, 2000), BML (Kopp et al., 2006)), they are by no means common in industry.
    The final subset was primarily focused on issues surrounding emotional speech synthesis, dialogue events and paralinguistic events. Similar to the issues above, the generation of speech synthesis, dialogue events, paralinguistic events etc. is complicated by the wide range of possible systems to which the generating system will pass its information. There does not seem to be a widely used common standard, even though the range is not quite as diverse as with facial and body gestures. Some of these systems made use of databases of emotional responses and as such might use an emotion language as a method of storing and retrieving this information.

    4. Requirements


    The following represents a collection of requirements for an Emotion Markup Language ("EmotionML") as they arise from the use cases specified above. Each scenario described through the use cases has implicit requirements which need to be made explicit to allow for their representation through a language. The challenge with the 39 use case scenarios collected in the Emotion Incubator group was to structure the extracted requirements in a way that reduces complexity, and to agree on what should be included in the language itself and what should be described through other, linked representations.
    Work proceeded in a bottom-up, iterative way. From relatively unstructured lists of requirements for the individual use case scenarios, a requirements document was compiled within each of the three use case categories (Data Annotation, Emotion Recognition and Emotion Generation). These three documents differed in structure and in the vocabulary used, and emphasised different aspects. For example, while the Data Annotation use case emphasised the need for a rich set of metadata descriptors, the Emotion Recognition use case pointed out the need to refer to sensor data and environmental variables, and the use case on Emotion Generation requested a representation for the 'reward' vs. 'penalty' value of things. The situation was complicated further by the use of system-centric concepts such as 'input' and 'output', which for Emotion Recognition have fundamentally different meanings than for Emotion Generation. For consolidating the requirements documents, two basic principles were agreed on:
    1. The emotion language should not try to represent sensor data, facial expressions, environmental data etc., but define a way of interfacing with external representations of such data.
    2. The use of system-centric vocabulary such as 'input' and 'output' should be avoided. Instead, concept names should be chosen by following the phenomena observed, such as 'experiencer', 'trigger', or 'observable behaviour'.
    Based on these principles and a large number of smaller clarifications, the three use case specific requirements documents were merged into an integrated wiki document. After several iterations of restructuring and refinement, a consolidated structure materialised for that document. Its elements are grouped into sections according to the type of information they represent: (1) information about the emotion properties, (2) meta-information about the individual emotion annotations, (3) links to the rest of the world, (4) global metadata, and (5) ontologies.

    4.1. Information about the emotion properties (Emotion 'Core')

    4.1.1. Type of emotion-related phenomenon

    The language should not only annotate emergent emotions, i.e. emotions in the strong sense (such as anger, joy, sadness, fear, etc.), but also different types of emotion-related states.
    The emotion markup should provide a way of indicating which of these (or similar) types of emotion-related/affective phenomena is being annotated.
    The following use cases require annotation of emotion categories and dimensions:

    4.1.2. Emotion categories

    The emotion markup should provide a generic mechanism to represent both large and small sets of possible emotion-related states. It should be possible to choose a set of emotion categories (a label set), because different applications need different sets of emotion labels. A flexible mechanism is needed to link to such sets. A standard emotion markup language should propose one or several "default" set(s) of emotion categories, but leave the option to a user to specify an application-specific set instead. Douglas-Cowie et al. (2006) propose a list of 48 emotion categories that could be used as the "default" set.
    The following use cases demonstrate the use of emotion categories:
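    By way of a hedged sketch (names invented; the URI is a placeholder, not a real resource), the plug-in mechanism could let an annotation name the category set its label is drawn from:

```xml
<!-- Hypothetical sketch: the category-set attribute points to an
     application-specific label set; "amusement" must be defined in that set. -->
<emotion category-set="http://example.org/my-category-set"
         category="amusement"/>
```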

    4.1.3. Emotion dimensions

    The emotion markup should provide a generic format for describing emotions in terms of emotion dimensions. As for emotion categories, it is not possible to predefine a normative set of dimensions. Instead, the language should provide a "default" set of dimensions that can be used if there are no specific application constraints, but allow the user to "plug in" a custom set of dimensions if needed. Typical sets of emotion dimensions include "arousal, valence and dominance" (known in the literature under different names, including "evaluation, activation and power"; "pleasure, arousal, dominance"; etc.). Recent evidence suggests there should be a fourth dimension: Roesch et al. (2006) report consistent results from various cultures where a set of four dimensions is found in user studies: "valence, potency, arousal, and unpredictability".
    The following use cases demonstrate use of dimensions for representing emotional states:
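    A purely illustrative sketch of a dimensional description (element and attribute names invented, no agreed syntax exists) might be:

```xml
<!-- Hypothetical sketch: a state described by dimension values instead of
     (or in addition to) a category; the dimension set is itself pluggable. -->
<emotion dimension-set="arousal-valence-dominance">
  <arousal value="0.8"/>
  <valence value="-0.4"/>
  <dominance value="0.6"/>
</emotion>
```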

    4.1.4. Description of appraisals of the emotion or of events related to the emotion

    Description of appraisal can be attached to the emotion itself or to an event related to the emotion. Three groups of emotional events are defined in the OCC model (Ortony, Clore, & Collins, 1988): the consequences of events for oneself or for others, the actions of others and the perception of objects.
    The language will not cover other aspects of the description of events. Instead, there will be a possibility to attach an external link to the detailed description of this event according to an external representation language. The emotion language could integrate description of events (OCC events, verbal description) and time of event (past, present, future).
    Appraisals can be described with a common set of intermediate terms between stimulus and response, between organism and environment. The appraisal variables are linked to different cognitive process levels in the model of Leventhal and Scherer (1987). The following set of labels (Scherer et al., 2004) can be used to describe the protagonist's appraisal of the event or events at the focus of his/her emotional state: relevance, implications, the agent responsible, coping potential, and the compatibility of the situation with standards.
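    An appraisal-based annotation might be sketched as follows (all names and values are hypothetical, loosely inspired by the Scherer-style labels listed above):

```xml
<!-- Hypothetical markup sketch: appraisal of an eliciting event -->
<emotion category="anger">
  <appraisals relevance="0.9"
              coping-potential="0.6"
              compatibility-with-standards="-0.8"/>
</emotion>
```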
    Use cases:

    4.1.5 Action tendencies

    It should be possible to characterise emotions in terms of the action tendencies linked to them (Frijda, 1986). For example, anger is linked to a tendency to attack, fear is linked to a tendency to flee or freeze, etc. This requirement is not linked to any of the currently envisaged use cases, but has been added in order to cover the theoretically relevant components of emotions better. Action tendencies are potentially very relevant for use cases where emotions play a role in driving behaviour, e.g. in the behaviour planning component of non-player characters in games.

    4.1.6. Multiple and/or complex emotions

    The emotion markup should provide a mechanism to represent mixed emotions.
    The following use cases demonstrate use of multiple and/or complex emotions:

    4.1.7. Emotion intensity

    Intensity can itself be regarded as a dimension. The emotion markup should provide an emotion attribute to represent intensity, with values in [0;1].
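    This could be sketched as follows (attribute name hypothetical; the value range [0;1] is the one stated above):

```xml
<!-- Hypothetical markup sketch: intensity as an attribute in [0;1] -->
<emotion category="joy" intensity="0.3"/>
```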
    The following use cases are examples for use of intensity information on emotions:

    4.1.8. Emotion regulation

    According to the process model of emotion regulation described by Gross (2001), emotion may be regulated at five points in the emotion generation process: selection of the situation, modification of the situation, deployment of attention, change of cognition, and modulation of experiential, behavioral or physiological responses. The most basic distinction underlying the concept of regulation of emotion-related behaviour is the distinction of internal vs. external state. The description of the external state is out of scope of the language - it can be covered by referring to other languages such as Facial Action Coding System (Ekman et al. 2002), Behavior Mark-up Language (Vilhjalmsson et al. 2007).
    Other types of regulation-related information can represent whether an emotion is genuinely expressed, felt (inferred), masked (and how well), or simulated, as well as the inhibition/masking or excitation/boosting of emotions or of their expression.
    The emotion markup should provide emotion attributes to represent the various kinds of regulation. The value of these attributes should be in [0;1].
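    A regulation annotation might be sketched as follows (all attribute names are hypothetical; only the [0;1] value range is taken from the requirement above):

```xml
<!-- Hypothetical markup sketch: a felt sadness that is partly
     masked by a simulated expression of joy -->
<emotion category="sadness" intensity="0.6"
         masking="0.8" simulated-category="joy"/>
```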
    The following use cases are examples for regulation being of interest:

    4.1.9. Temporal aspects

    This section covers information regarding the timing of the emotion itself. The timing of any associated behaviour, triggers etc. is covered in section 4.3 "Links to the rest of the world".
    The emotion markup should provide a generic and optional mechanism for temporal scope. This mechanism should allow different ways to specify temporal aspects, such as i) start time + end time, ii) start time + duration, iii) a link to another entity (e.g. starting 2 seconds before the utterance starts and ending with the second noun phrase...), or iv) a sampling mechanism providing values for variables at evenly spaced time intervals.
    The following use cases require the annotation of temporal dynamics of emotion:
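    Two of the envisaged temporal scopes might be sketched as follows (attribute names and time units are hypothetical):

```xml
<!-- Hypothetical markup sketch: start+end vs. start+duration,
     times in seconds -->
<emotion category="surprise" start="2.5" end="4.0"/>
<emotion category="relief" start="4.0" duration="1.5"/>
```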

    4.2. Meta-information about individual emotion annotation

    4.2.1. Acting

    The emotion markup should provide a mechanism to add special attributes for acted emotions such as perceived naturalness, authenticity, quality, and so on.
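    Such acting-related attributes might be sketched as follows (names and values hypothetical):

```xml
<!-- Hypothetical markup sketch: meta-attributes for an acted emotion -->
<emotion category="fear" acted="true"
         perceived-naturalness="0.4"/>
```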
    Use cases:

    4.2.2. Confidence / probability

    The emotion markup should provide a generic attribute for representing the confidence (or, conversely, the uncertainty) of a detection or annotation, or more generally the probability assigned to a given representation of emotion, at each level of representation (category, dimensions, degree of acting, ...). This attribute may reflect the confidence of the annotator that the particular value is as stated (e.g. that the user in question is expressing happiness with confidence 0.8), which is important especially for masked expressivity, or the confidence of an automated recognition system with respect to the samples used for training. If this attribute is supplied per modality, it can be exploited in recognition use cases to pinpoint the dominant or most robust of the available modalities.
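    Per-level confidence might be sketched as follows (names hypothetical; the 0.8 example value is the one from the paragraph above):

```xml
<!-- Hypothetical markup sketch: confidence attached to the category
     and, separately, to the dimensional description -->
<emotion category="happiness" confidence="0.8">
  <dimensions arousal="0.7" confidence="0.4"/>
</emotion>
```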
    The following use cases require the annotation of confidence:

    4.2.3. Modality

    This requirement concerns the modalities in which the emotion is reflected, e.g. face, voice, body posture or hand gestures, but also lighting, font shape, etc.
    The emotion markup should provide a mechanism to represent an open set of values.
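    An open-valued modality attribute might be sketched as follows (names hypothetical):

```xml
<!-- Hypothetical markup sketch: the same emotion observed in
     two modalities from an open set of values -->
<emotion category="anger" modality="voice"/>
<emotion category="anger" modality="face"/>
```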
    The following use cases require the annotation of modality:

    4.3. Links to the "rest of the world"

    4.3.1. Links to media

    Most use cases rely on some media representation. This could be video files of users' faces whose emotions are assessed, screen captures of evaluated user interfaces, audio files of interviews, but also other media relevant in the respective context, like pictures or documents.
    Linking to them could be accomplished e.g. by a URL in an XML node.
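    Such a media link might be sketched as follows (element and attribute names, as well as the URL, are purely illustrative):

```xml
<!-- Hypothetical markup sketch: linking an emotion annotation
     to a media file via URL -->
<emotion category="fear">
  <link uri="http://example.com/videos/subject42.avi"/>
</emotion>
```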
    The following use cases require links to the "rest of the world":

    4.3.2. Position on a time line in externally linked objects

    The emotion markup should provide a link to a time line. Temporal links may be absolute (start and end times) or relative, and refer to external sources (cf. 4.3.1) such as snippets (points in time) of the media files causing the emotion.
    Start and end times are important to mark the onset and offset of an emotional episode.
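    A position on an external time line might be sketched as follows (names, URL and time units hypothetical):

```xml
<!-- Hypothetical markup sketch: absolute onset/offset (in seconds)
     within a linked media file -->
<emotion category="fear">
  <link uri="http://example.com/videos/subject42.avi"
        start="12.3" end="15.0"/>
</emotion>
```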
    The following use cases require annotation on specific positions on a time line:

    4.3.3. The semantics of links to the "rest of the world"

    The emotion markup should provide a mechanism for flexibly assigning meaning to those links.
    The following initial types of meaning are envisaged:
    • The experiencer (who "has" the emotion);
    • The observable behaviour "expressing" the emotion;
    • The trigger/cause/emotion-eliciting event of the emotion;
    • The object/target of the emotion (the thing that the emotion is about).
    We currently envisage that the links to media as defined in section 4.3.1 are relevant for all of the above. For some of them, timing information is also relevant:
    • observable behaviour
    • trigger
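    The link semantics envisaged above might be sketched as follows (all element, attribute and role names are hypothetical):

```xml
<!-- Hypothetical markup sketch: role attributes assigning meaning
     to links to the "rest of the world" -->
<emotion category="anger">
  <link role="experiencer" uri="#user1"/>
  <link role="trigger" uri="#system-error" start="12.3"/>
  <link role="object" uri="#dialog-box"/>
</emotion>
```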
    The following use cases require annotation on semantics of the links to the "rest of the world":

    4.4. Global metadata

    Representing emotion, be it for annotation, detection or generation, requires not only the description of context directly related to the description of the emotion per se (e.g. the emotion-eliciting event), but also the description of a more global context, which is required for properly exploiting the representation of the emotion in a given application. Specifications of metadata for multimodal corpora have already been proposed in the ISLE Metadata Initiative [IMDI], but they did not target emotional data and were focused on an annotation scenario.
    The joint specification of our three use cases led to the identification of four groups of global metadata: information on the persons involved, the purpose of classification (i.e. the intended or used application), information on the technical environment, and information on the social and communicative environment. These are described in the following sections.
    The following use cases require annotation of global metadata:

    4.4.1. Info on Person(s)

    Information is needed on the humans involved. Depending on the use case, this would be the labeler(s) (Data Annotation), persons observed (Data Annotation, Emotion Recognition), persons interacted with, or even computer-driven agents such as ECAs (Emotion Generation). While it would be desirable to have common profile entries throughout all use cases, we found that information on the persons involved is very use-case specific. While all entries could be provided and possibly used in most use cases, they are of different importance to each.
    Examples are:
    • For Data Annotation: gender, age, language, culture, personality traits, experience as labeler, labeler ID (all mandatory)
    • For Emotion Recognition: gender, age, culture, personality traits, experience with the subject, e.g. web experience for usability studies (depending on the use case all or some mandatory).
    • For Emotion Generation: gender, age, language, culture, education, personality traits (again, use case dependent)
    The following use cases need information on the person(s) involved:

    4.4.2. Purpose of classification

    The result of emotion classification is influenced by its purpose. For example, a corpus of speech data for training an ECA might be labelled differently from the same data used as a corpus for training an automatic dialogue system for phone banking applications; or the face data of a computer user might be labeled differently for the purpose of usability evaluation than for guiding a user assistance program. These differences are application or at least genre specific. They are also independent of the underlying emotion model.
    The following use cases need information on the purpose of the classification:

    4.4.3. Technical environment

    The quality of emotion classification and interpretation, by either humans or machines, depends on the quality and technical parameters of the sensors and media used.
    Examples are:
    • Frame rate, resolution, colour characteristics of video sources;
    • Dynamic range, type of sound field of microphones;
    • Type of sensing devices for physiology, movement, or pressure measurements;
    • Data enhancement algorithms applied by either device or pre-processing steps.
    The emotion markup should also be able to hold information on the way in which an emotion classification has been obtained, e.g. by a human observer monitoring a subject directly, via a live stream from a camera, or from a recording; or by a machine, and if so, utilising which algorithms.
    The following use cases need information on the technical environment:

    4.4.4. Social and communicative environment

    The emotion markup should provide global information specifying the genre of the observed social and communicative environment and, more generally, of the situation in which an emotion is considered to happen (e.g. fiction (movies, theater), in-lab recording, induction, human-human or human-computer interaction (real or simulated)), as well as the interactional situation (number of people, their relations, links to participants).
    The following use cases require annotation of the social and communicative environment:

    4.5. Ontologies of emotion descriptions

    Descriptions of emotions and of emotion-related states are heterogeneous, and are likely to remain so for a long time. Therefore, complex systems such as many foreseeable real-world applications will require some information about (1) the relationships between the concepts used in one description and about (2) the relationships between different descriptions.

    4.5.1. Relationships between concepts in an emotion description

    The concepts in an emotion description are usually not independent, but are related to one another. For example, emotion words may form a hierarchy, as suggested e.g. by prototype theories of emotions. Shaver et al. (1987), for instance, classified cheerfulness, zest, contentment, pride, optimism, enthrallment and relief as different kinds of joy; irritation, exasperation, rage, disgust, envy and torment as different kinds of anger; etc.
    Such structures, be they motivated by emotion theory or by application-specific requirements, may be an important complement to the representations in an Emotion Markup Language. In particular, they would allow for a mapping from a larger set of categories to a smaller set of higher-level categories.
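    Such a hierarchy might be sketched as follows, using the Shaver et al. example above (element and attribute names hypothetical):

```xml
<!-- Hypothetical markup sketch: fine-grained categories grouped
     under a higher-level category, enabling lossy mapping upward -->
<category name="joy">
  <category name="contentment"/>
  <category name="pride"/>
  <category name="relief"/>
</category>
```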
    The following use case demonstrates possible use of hierarchies of emotions:

    4.5.2. Mappings between different emotion representations

    Different emotion representations (e.g., categories, dimensions, and appraisals) are not independent; rather, they describe different parts of the "elephant", i.e. of the phenomenon of emotion. Hence it is conceptually possible to map from one representation to another in some cases; in other cases, mappings are not fully possible.
    Some use cases require mapping between different emotion representations: e.g., from categories to dimensions, from dimensions to coarse categories (a lossy mapping), from appraisals onto dimensions, from categories to appraisals, etc.
    Such mappings may either be based on findings from emotion theory or they can be defined in an application-specific way.
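    An application-specific mapping of this kind might be sketched as follows (all names and values hypothetical):

```xml
<!-- Hypothetical markup sketch: mapping a category onto a point
     in dimension space -->
<mapping>
  <from category="anger"/>
  <to arousal="0.8" valence="-0.6" dominance="0.7"/>
</mapping>
```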
    The following use cases require mappings between different emotion representations:
    • 2b "Multimodal emotion recognition" - different modalities might deliver their emotion result in different representations
    • 3b "the ECA system" is based on a smaller set of emotion categories than the list of categories generated in use case 3a, so a mapping mechanism is needed to convert the larger category set to the smaller set
    • 3b speech synthesis with emotion dimensions would require a mapping from appraisals or emotion categories if used in combination with an affective reasoner (use case 3a).

    4.6. Assessment in the light of emotion theory

    The collection of use cases and the subsequent definition of requirements presented so far were performed in a predominantly bottom-up fashion, and thus capture a strongly application-centered, engineering-driven view. The purpose of this section is to compare the result with a theory-centered perspective. A representation language should be as theory-independent as possible, but by no means ignorant of psychological theories. Therefore, we crosscheck to what extent components of existing psychological models of emotion are mirrored in the currently collected requirements.
    In Section 2, a list of prominent concepts that have been used by psychologists in their quest for describing emotions has been presented. In this section it is briefly discussed whether and how these concepts are mirrored in the current list of requirements.
    Subjective component: Feelings.
    Feelings have not been mentioned in the requirements at all.
    They are not to be explicitly included in the representation for the time being, as they are defined as internal states of the subject and are thus not accessible to observation. Applications can be envisaged where feelings might become relevant in the future, though, e.g. if self-reports are to be encoded. Whether to allow for an explicit representation of feelings as a separate component should thus be kept as an open issue.
    Cognitive component: Appraisals
    As a reference to appraisal-related theories, the OCC model (Ortony et al., 1988), which is especially popular in the computational domain, has been brought up in the use cases, but no choice of the exact set of appraisal conditions is to be made here. An open issue is whether models that make explicit predictions on the temporal ordering of appraisal checks (Sander et al., 2005) should be encodable at that level of detail. In general, appraisals are to be encoded in the representation language via attributed links to trigger objects.
    The encoding of other cognitive aspects, i.e. effects of emotions on the cognitive system (memory, perception, etc.) is to be kept an open issue.
    Physiological component:
    Physiological measures have been mentioned in the context of emotion recognition. They are to be integrated in the representation via links to externally encoded measures conceptualised as "observable behaviour".

    Behavioral component:
    Action tendencies
    It remains an issue of theoretical debate whether action tendencies, in contrast to actions, are among the set of actually observable concepts. Nevertheless, they should be integrated in the representation language. This can once again be achieved via the link mechanism: this time, an attributed link can specify an action tendency together with its object or target.

    Expressive component:
    Expressions are frequently referred to in the requirements. There is agreement to not encode them directly but again to make use of the linking mechanisms to observable behaviours.
    Figure 1. Overview of how components of emotions are to be linked to external representations.

    Emergent emotions vs. other emotion-related states

    It was mentioned before that the representation language should definitely not be restricted to emergent emotions, which have received most attention so far. Though emergent emotions make up only a very small part of the emotion-related states, they nevertheless serve as a sort of archetype. Representations developed for emergent emotions should thus be usable as a basis for encoding other important emotion-related states such as moods and attitudes.
    Scherer (2000) systematically defines the relationship between emergent emotions and other emotion-related states by proposing a small set of so-called design features. Emergent emotions are defined as having a strong direct impact on behaviour, high intensity, being rapidly changing and short-lived, focusing on a triggering event, and involving strong appraisal elicitation. Moods, in contrast, are described using the same set of categories: they are characterised as not having a direct impact on behaviour, being less intense, changing less quickly and lasting longer, and not being directly tied to an eliciting event. In this framework, the different types of emotion-related states thus simply arise from differences in the design features.
    It is an open issue whether to integrate means similar to Scherer's design features into the language. Since probably not many applications will be able to make use of this level of detail, simple means for explicitly defining the type of an emotion-related state should in any case be made available in the representation language.

    5. Assessment of Existing Markup Languages

    Part of the activity of the group was dedicated to the assessment of existing markup languages, in order to investigate whether some of their elements or even concepts could fulfill the Emotion language requirements as described in section 4. For an effective Emotion Markup design it will in fact be important to re-use concepts and elements that other languages define thoroughly. Another interesting aspect of this activity has been the possibility to hypothesize about the interaction of the emotion markup language with other existing languages, particularly those concerning multimodal applications.
    Seven markup languages have been assessed. Five of them are the result of W3C initiatives that led to recommendation or draft documents, while the remaining two are the result of other initiatives, namely the HUMAINE and INTERFACE projects.

    5.1 Assessment methodology

    The assessments were undertaken when the requirements of the emotion language were almost consolidated. The members of the group responsible for this activity adopted a common methodology, which basically consisted in identifying, among the markup specifications, those elements that could be consistent with the emotion language constraints. In some cases links to the established Emotion Requirements were possible, the selected elements fully matching their features; in other cases this was not possible, even if the idea behind a particular tag could nevertheless be considered useful. Sometimes, to clarify the concepts, examples and citations from the original documents were included.
    These analyses, reported in Appendix 2, were initially published on the Wiki page, available for comments and editing to all the members of the incubator group. The structure of these documents consists of an introduction containing references to the analyzed language and a brief description of its uses. The following part reports a description of the selected elements that were judged as fulfilling the emotion language requirements.
    The five W3C markup languages considered in this analysis are mainly designed for multimedia applications. They deal with speech recognition and synthesis, ink and gesture recognition, semantic interpretation, and the authoring of interactive multimedia presentations. Of the two remaining markup languages, EARL (Schröder et al., 2006), whose aim is the annotation and representation of emotions, is an original proposal from the HUMAINE consortium. The second, VHML, is a language based on XML sub-languages such as DMML (Dialogue Manager Markup Language), FAML (Facial Animation Markup Language) and BAML (Body Animation Markup Language).
    In detail, the existing markup languages that have been assessed are:
    • SMIL Synchronized Multimedia Integration Language (Version 2.1) W3C Recommendation 13 December 2005 [SMIL ]
    • SSML Speech Synthesis Markup Language (Version 1.0) W3C Recommendation 7 September 2004 [SSML ]
    • EMMA Extensible MultiModal Annotation markup language W3C Working Draft 9 April 2007 [EMMA ]
    • PLS Pronunciation Lexicon Specification (Version 1.0) W3C Working Draft 26 October 2006 [PLS ]
    • InkML Ink Markup Language (InkML) W3C Working Draft 23 October 2006 [InkML ]
    • EARL Emotion Annotation and Representation Language (version 0.4.0) Working draft 30 June 2006 [EARL ]
    • VHML Virtual Human Markup Language (Version 0.3) Working draft 21 October 2001 [VHML ]

    5.2 Results

    Many of the requirements of the emotion markup language cannot be found in any of the considered W3C markup languages. This is particularly true for the emotion-specific elements, i.e. those features that can be considered the core part of the emotional markup language. On the other hand, we could find descriptions related to emotions in EARL and, to some extent, in VHML. The former in particular provides mechanisms to describe, through basic tags, most of the required elements. It is in fact possible to specify emotion categories, dimensions, intensity and even appraisals, selecting the most appropriate case from a pre-defined list. Moreover, EARL includes elements to describe mixed emotions as well as regulation mechanisms, for example the degree of simulation or suppression. In comparison, VHML, which is actually oriented towards the behavior generation use case, provides very few emotion-related features. It is only possible to use emotion categories (a set of nine is defined) and to indicate the intensity. Beyond these features there is also the emphasis tag, which is actually derived from the GML (Gesture Markup Language) module.
    Beyond the categorical and dimensional description of the emotion itself, neither EARL nor VHML provides any way to deal with emotion-related phenomena such as attitudes, moods or affect dispositions.
    The analyzed languages, W3C initiatives or not, offer nevertheless interesting approaches for the definition of elements that are not strictly related to the description of emotions, but are important structural elements in any markup language. In this sense, interesting solutions to manage timing issues, to annotate modality and to include metadata information were found.
    Timing, as shown in the requirements section, is an important aspect of the emotion markup language. Time references are necessary for synchronization with external objects and for representing the temporal evolution of the emotional event (whether recognized, generated or annotated). W3C SMIL and EMMA both provide solutions to indicate absolute timing as well as instants relative to a reference point, which can be explicitly indicated as in EMMA or can also be an event as in the SMIL standard. SMIL also has interesting features to manage the synchronization of parallel events.
    Metadata is another important element included in the emotion markup. The W3C languages provide very flexible mechanisms that could allow the insertion into this container of any kind of information, for example related to the subject of the emotion, the trigger event, and the object. Metadata annotation is available in the SMIL, SSML, EMMA and VHML languages through different strategies, from simple tags like the info element proposed by EMMA (a list of unconstrained attribute-value pairs) to more complex solutions as in SMIL and SSML, where RDF features are exploited.
    With regard to modality, too, the considered languages provide different solutions, from simple to articulated ones. Modality is present in SMIL, EMMA, EARL and VHML (by means of other sub-languages). These are generally mechanisms that describe the mode in which emotion is expressed (face, body, speech, etc.). Some languages go into deeper annotation by considering the medium or channel and the function. In this respect, EMMA is an example of an exhaustive way of representing modalities in the recognition use case. These features could effectively be extended to the other use cases, i.e. annotation and generation.
    As for other interesting ideas, some languages provide mechanisms that are useful for managing dynamic lists of elements. An example of this can be found in the W3C PLS language, where namespaces are exploited to manage multiple sets of features.

    6. Summary and Outlook

    This first year as a W3C Incubator group was a worthwhile endeavour. A group of people with diverse backgrounds collaborated in a very constructive way on a topic which for a considerable time appeared to be a fuzzy area.
    During the year, however, the concepts became clearer; the group came to an agreement regarding the delimitation of the emotion markup language to related content (such as the representation of emotion-related expressive behaviour). Initially, very diverse ideas and vocabulary arose in a bottom-up fashion from use cases; the integration of requirements into a consistent document consumed a major part of the time.
    The conceptual challenges encountered during the creation of the Requirements document were to be expected, given the interdisciplinary nature of the topic area and the lack of consistent guidelines from emotion theory. The group made important progress, and has produced a structured set of requirements for an emotion markup language which, even though it was driven by use cases, can be considered reasonable from a scientific point of view.
    A first step has been carried out towards the specification of a markup language fulfilling the requirements: a broad range of existing markup languages from W3C and outside of W3C were investigated and discussed in view of their relevance to the EmotionML requirements. This survey provides a starting point for creating a well-informed specification draft in the future.
    There is a strong consensus in the group that continuing the work is worthwhile. The unanimous preference is to run for a second year as an Incubator group, whose central aim is to convert the conceptual work done so far into concrete suggestions and requests for comments from existing W3C groups: the MMI and VoiceBrowser groups. The current plan is to provide three documents for discussion during the second year as Incubator:
    • a simplified Requirements document with priorities (in time for face-to-face discussions at the Cambridge meeting in November);
    • an "early Incubator draft" version of an EmotionML specification, after 6 months;
    • a "final Incubator draft" version of an EmotionML specification, after 12 months.
    If enough interest from the W3C constituency is raised during this second year, a continuation of the work on the Recommendation Track is envisaged.

    7. References

    7.1 Scientific references

    Arnold, M., (1960). Emotion and Personality, Columbia University Press, New York.
    Batliner, A., et al. (2006). Combining efforts for improving automatic classification of emotional user states. In: Proceedings IS-LTC 2006.
    Bevacqua, E., Raouzaiou, A., Peters, C., Caridakis, G., Karpouzis, K., Pelachaud, C., Mancini, M. (2006). Multimodal sensing, interpretation and copying of movements by a virtual agent. In: Proceedings of Perception and Interactive Technologies (PIT'06).
    Blech, M., Peter, C., Stahl, R., Voskamp, J., Urban, B.(2005). Setting up a multimodal database for multi-study emotion research in HCI. In: Proceedings of the 2005 HCI International Conference, Las Vegas
    Burkhardt, F., van Ballegooy, M., Englert, R., & Huber, R. (2005). An emotion-aware voice portal. Proc. Electronic Speech Signal Processing ESSP.
    Devillers, L., Vidrascu, L., Lamel, L. (2005). Challenges in real-life emotion annotation and machine learning based detection. Neural Networks 18, 407-422
    Douglas-Cowie, E., et al. (2006). HUMAINE deliverable D5g: Mid Term Report on Database Exemplar Progress. http://emotion-research.net/deliverables/D5g%20final.pdf
    Ekman, P., Friesen, W. (1978). The Facial Action Coding System. Consulting Psychologists Press, San Francisco
    Ekman, P., Friesen, W. C. and Hager, J. C. (2002). Facial Action Coding System. The Manual on CD ROM. Research Nexus division of Network Information Research Corporation.
    Frijda, N. (1986). The Emotions. Cambridge: Cambridge University Press.
    Gross, J. J. (2001). "Emotion regulation in adulthood: timing is everything." Current Directions in Psychological Science 10(6). http://www-psych.stanford.edu/~psyphy/Pdfs/2001%20Current%20Directions%20in%20Psychological%20Science%20-%20Emo.%20Reg.%20in%20Adulthood%20Timing%20.pdf
    Kopp, S., Krenn, B., Marsella, S., Marshall, A., Pelachaud, C., Pirker, H., Thórisson, K., & Vilhjalmsson, H. (2006). Towards a common framework for multimodal generation in ECAs: the Behavior Markup Language. In Proceedings of the 6th International Conference on Intelligent Virtual Agents (IVA'06).
    Lazarus, R.S. (1966). Psychological stress and the coping process. McGraw-Hill. New York.
    Leventhal, H., and Scherer, K. (1987). The Relationship of Emotion to Cognition: A Functional Approach to a Semantic Controversy. Cognition and Emotion 1(1):3-28.
    Ortony, A., Clore, G. L., & Collins, A. (1988). The cognitive structure of emotions. New York: Cambridge University Press.
    Peter, C., Herbon, A. (2006). Emotion representation and physiology assignments in digital systems. Interacting With Computers 18, 139-170.
    Roesch, E.B., Fontaine J.B. & Scherer, K.R. (2006). The world of emotion is two-dimensional - or is it? Paper presented to the HUMAINE Summer School 2006, Genoa.
    Russell, J. A., & Feldman Barrett, L. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76, 805-819.
    Sander, D., Grandjean, D., & Scherer, K. (2005). A systems approach to appraisal mechanisms in emotion. Neural Networks: 18, 317-352.
    Scherer, K.R. (2000). Psychological models of emotion. In Joan C. Borod (Ed.), The Neuropsychology of Emotion (pp. 137-162). New York: Oxford University Press.
    Scherer, K. R. et al. (2004). Preliminary plans for exemplars: Theory. HUMAINE deliverable D3c. http://emotion-research.net/deliverables/D3c.pdf
    Schröder, M., Pirker, H., Lamolle, M. (2006). First suggestions for an emotion annotation and representation language. In: Proceedings of LREC'06 Workshop on Corpora for Research on Emotion and Affect, Genoa, Italy, pp. 88-92
    Shaver, P., Schwartz, J., Kirson, D., and O'Connor, C. (1987). Emotion knowledge: Further exploration of a prototype approach. Journal of Personality and Social Psychology, 52:1061-1086.
    Tekalp, M., Ostermann, J. (2000): Face and 2-d mesh animation in MPEG-4. Image Communication Journal 15, 387-421
    Vilhjalmsson, H., Cantelmo, N., Cassell, J., Chafai, N. E., Kipp, M., Kopp, S., Mancini, M., Marsella, S., Marshall, A. N., Pelachaud, C., Ruttkay, Z., Thórisson, K. R., van Welbergen, H. and van der Werf, R. J. (2007). The Behavior Markup Language: Recent Developments and Challenges. 7th International Conference on Intelligent Virtual Agents (IVA'07), Paris, France.
    Wilhelm, P., Schoebi, D. & Perrez, M. (2004). Frequency estimates of emotions in everyday life from a diary method's perspective: a comment on Scherer et al.'s survey-study "Emotions in everyday life". Social Science Information, 43(4), 647-665.

    7.2 References to Markup Specifications

    Synchronized Multimedia Integration Language (Version 2.1), W3C Recommendation 13 December 2005 http://www.w3.org/TR/SMIL/
    Speech Synthesis Markup Language (Version 1.0) W3C Recommendation 7 September 2004 http://www.w3.org/TR/speech-synthesis
    Extensible MultiModal Annotation markup language W3C Working Draft 9 April 2007 http://www.w3.org/TR/emma/
    Pronunciation Lexicon Specification (Version 1.0) W3C Working Draft 26 October 2006 http://www.w3.org/TR/pronunciation-lexicon/
    Ink Markup Language (InkML) W3C Working Draft 23 October 2006 http://www.w3.org/TR/InkML
    Emotion Annotation and Representation Language (version 0.4.0) Working draft 30 June 2006 http://emotion-research.net/earl
    Virtual Human Markup Language (Version 0.3) Working draft 21 October 2001 http://www.vhml.org/
    ISLE Metadata Initiative (IMDI). http://www.mpi.nl/IMDI/

    8. Acknowledgements

    The editors acknowledge significant contributions from the following persons (in alphabetical order):
    • Paolo Baggia, Loquendo
    • Laurence Devillers, Limsi
    • Alejandra Garcia-Rojas, Ecole Polytechnique Federale de Lausanne
    • Kostas Karpouzis, Image, Video and Multimedia Systems Lab (IVML-NTUA)
    • Myriam Lamolle, Université Paris VIII
    • Jean-Claude Martin, Limsi
    • Catherine Pelachaud, Université Paris VIII
    • Björn Schuller, Technical University Munich
    • Jianhua Tao, Chinese Academy of Sciences
    • Ian Wilson, Emotion.AI

    Appendix 1: Use Cases

    Use case 1: Annotation of emotional data

    Use case 1a: Annotation of plain text

    Alexander is compiling a list of emotion words and wants to annotate, for each word or multi-word expression, the emotional connotation assigned to it. With a view to automatic emotion classification of texts, he is primarily interested in annotating the valence of the emotion (positive vs. negative), but needs a 'degree' value associated with the valence. In the future, he is hoping to use a more sophisticated model, so already now, in addition to valence, he wants to annotate emotion categories (joy, sadness, surprise, ...), along with their intensities. However, since he is not a trained psychologist, he is uncertain which set of emotion categories to use.
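    Alexander's requirements could be sketched in markup along the following lines. This is purely illustrative: the element and attribute names (entry, emotion, valence, category, intensity) are assumptions loosely modeled on the EARL 0.4.0 draft, not part of any agreed specification.

    ```xml
    <!-- hypothetical annotation of one lexical entry -->
    <entry word="delighted">
      <!-- coarse annotation: valence with a degree value -->
      <emotion valence="0.8"/>
      <!-- finer annotation: category plus intensity -->
      <emotion category="joy" intensity="0.7"/>
    </entry>
    ```

    Since Alexander is unsure which category set to use, such a language would also need a way to declare which emotion vocabulary the category values are drawn from.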

    Use case 1b: Annotation of XML structures and files

    (i) Stephanie is using a multi-layer annotation scheme for corpora of dialog speech, using a stand-off annotation format. One XML document represents the chain of words as individual XML nodes; another groups them into sentences; a third document describes the syntactic structure; a fourth document groups sentences into dialog utterances; etc. Now she wants to add descriptions of the 'emotions' that occur in the dialog utterances (although she is not certain that 'emotion' is exactly the right word to describe what she thinks is happening in the dialogs): agreement, joint laughter, surprise, hesitations or indications of social power. These are emotion-related effects, but not emotions in the sense found in textbooks.
    (ii) Paul has a collection of pictures showing faces with different expressions. These pictures were created by asking people to contract specific muscles. Now, rating tests are being carried out in which subjects are asked to indicate the emotion expressed in each face. Subjects can choose from a set of six emotion terms. For each subject, the emotion chosen for the corresponding image file must be saved into an annotation file for statistical analysis.
    (iii) Felix has a set of voice portal recordings and wants to use them to train a statistical classifier for vocal anger detection. They must be emotion-annotated by a group of human labelers. The classifier needs each recording labeled with the degree of anger-related states chosen from a bag of words.
    Besides this, some additional data must be annotated:
    • for the dialog design, besides knowing IF a user is angry, it is even more important to know WHY the user is angry: is the user displeased with the dialog itself, e.g. too many misrecognitions? Does he hate talking to machines as a rule? Is he dissatisfied with the company's service? Is he simply of an aggressive character?
    • often voice portal recordings are not human speech at all, but DTMF tones or background noises (e.g. a lorry driving by), or are not directed at the dialog but at another person standing next to the user
    • the classifier might use human-annotated features for training, e.g. a transcript of the words, the task in the application, the function in the dialog, etc.; these should be annotated as well
    (iv) Jianhua allows listeners to label the speech with multiple emotions to form the emotion vector.
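    Felix's requirements (1b iii) suggest a stand-off annotation that records not only the degree of an anger-related state but also its suspected cause and additional training features. A hypothetical sketch, in which all element and attribute names are illustrative assumptions rather than agreed syntax:

    ```xml
    <!-- hypothetical stand-off annotation of one voice portal recording -->
    <annotation media="recording-0815.wav" labeler="labeler-03">
      <emotion category="irritation" intensity="0.6"/>
      <!-- WHY the user is angry -->
      <cause type="misrecognition"/>
      <!-- additional human-annotated features for classifier training -->
      <transcript>I said six, not seven!</transcript>
      <dialog-function>correction</dialog-function>
    </annotation>
    ```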

    Use case 1c: Chart annotation of time-varying signals (e.g., multi-modal data)

    (i) Jean-Claude and Laurence want to annotate audio-visual recordings of authentic emotional recordings. Looking at such data, they and their colleagues have come up with a proposal of what should be annotated in order to properly describe the complexity of emotionally expressive behaviour as observed in these clips. They are using a video annotation tool that allows them to annotate a clip using a 'chart', in which annotations can be made on a number of layers. Each annotation has a start and an end time.
    The types of emotional properties that they want to annotate are many. They want to use emotion labels, but sometimes more than one emotion label seems appropriate -- for example, when a sad event comes and goes within a joyful episode, or when someone is talking about a memory which makes them at the same time angry and desperate. Depending on the emotions involved, this co-occurrence of emotions may be interpretable as a 'blend' of 'similar' emotions, or as a 'conflict' of 'contradictory' emotions. The two emotions that are present may have different intensities, so that one of them can be identified as the major emotion and the other one as the minor emotion. Emotions may be communicated differently through different modalities, e.g. speech or facial expression; it may be necessary to annotate these separately. Attempts to 'regulate' the emotion and/or the emotional expression can occur: holding back tears, hiding anger, simulating joy instead. The extent to which such regulation is present may vary. In all these annotations, a given annotator may be confident to various degrees.
    In addition to the description of emotion itself, Jean-Claude and Laurence need to annotate various other things: the object or cause of the emotion; the expressive behaviour which accompanies the emotion, and which may be the basis for the emotion annotation (smiling, high pitch, etc.); the social and situational context in which the emotion occurs, including the overall communicative goal of the person described; various properties of the person, such as gender, age, or personality; various properties of the annotator, such as name, gender, and level of expertise; and information about the technical settings, such as recording conditions or video quality. Even if most of these should probably not be part of an emotion annotation language, it may be desirable to propose a principled method for linking to such information.
    (ii) Stacy annotates videos of human behavior both in terms of observed behaviors and inferred emotions. This data collection effort informs and validates the design of her emotion model. In addition, the annotated video data contributes to her function and behavior mapping processes.
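    The requirements of 1c (i) could be illustrated as follows. This is a hypothetical sketch only (names such as role, modality and regulation are assumptions, not agreed syntax) of how two co-occurring emotions, regulation and annotator confidence might be represented:

    ```xml
    <!-- hypothetical chart annotation: co-occurring emotions on one clip -->
    <annotation start="12.3" end="15.8" annotator="A2" confidence="0.7">
      <emotion category="joy" intensity="0.8" role="major" modality="face"/>
      <emotion category="sadness" intensity="0.3" role="minor" modality="voice"/>
      <!-- the person is partly holding back tears -->
      <regulation type="suppress" extent="0.5"/>
    </annotation>
    ```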

    Use case 1d: Trace annotation of time-varying signals (e.g., multi-modal data)

    Cate wants to annotate the same clips as Jean-Claude (1c i), but using a different approach. Rather than building complex charts with start and end times, she is using a tool that traces some property scales continuously over time. Examples of such properties are: the emotion dimensions arousal, valence or power; the overall intensity of (any) emotion, i.e. the presence or absence of emotionality; the degree of presence of certain appraisals such as intrinsic pleasantness, goal conduciveness or sense of control over the situation; the degree to which an emotion episode seems to be acted or genuine. The time curve of such annotations should be preserved.
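    Cate's continuous traces might be serialized along these lines; the sketch is illustrative only (the trace element and its attributes are assumptions):

    ```xml
    <!-- hypothetical trace: arousal sampled 10 times per second -->
    <trace dimension="arousal" start="0.0" sample-rate="10">
      0.12 0.15 0.21 0.35 0.62 0.71 0.69 0.55
    </trace>
    ```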

    Use case 1e: Multiparty interaction

    Dirk studies the ways in which persons in a multi-party discussion express their views, opinions and attitudes. He is particularly interested in how conversational moves contribute to the discussion, the way an argument is settled, and how a person is persuaded, both with reason and rhetoric. He collects corpora of multi-party discussions and annotates them on all kinds of dimensions, one of them being a 'mental state' layer in which he tries to describe the attitudes that participants express with respect to what is being said and their emotional reactions to it. This layer includes elements such as: surprise, scepticism, anger, amusement, enthusiasm. He studies how these mental states are expressed and the functions of these expressions within the conversation.

    Use case 1f: annotation of emotional speech

    Enrico wants to annotate a speech database containing emotional phrases. This material is used to extract prosodic models that will be used to appropriately select acoustic units in a corpus-based speech synthesis system. The database consists of short sentences recorded from many speakers who read the scripts simulating certain emotional styles. Each sentence is read in different emotional styles (e.g. sad, happy, angry, etc.), and a neutral style is also recorded as the baseline. Enrico wants to study the acoustic correlates of the considered emotional styles in order to extract simple rules that account for the variation of some acoustic parameters. To achieve this, he needs to annotate the speech data, taking into account the intensity of the relevant emotion and the level of valence.

    Use case 1g: annotation of speech acts

    In another case, Enrico wants to annotate pre-recorded illocutionary acts. Most of these prompts are frequently used expressions that have a pragmatic function such as greetings, thanks, regrets, disapprovals, apologies, compliments, etc. Given their intrinsic nature, these sentences are read in an expressive way. Enrico has to group these expressions into linguistic categories and describe them in terms of emotional intensity. For instance 'Good morning!' could be read in different ways: it could be happy, excited, or even sad. Moreover, given the emotional style, there could be different levels of intensity that could be described quantitatively using a range of values between 0 and 1.

    Use case 1h: annotation of paralinguistic events

    Enrico wants to annotate paralinguistic events such as laughs, sighs, expressions of pain, and similar phenomena. These elements have to be described in terms of the event category and the emotion to which they refer. It could also be useful to describe the effort of these events quantitatively (for instance, there could be 'weak' laughs or 'exaggerated' laughs).

    Use case 1i: Annotation of video clips of acted emotions

    Tanja recorded a video corpus in which actors, under the supervision of a director, were instructed to produce isolated sentences with 10 different (categorically defined) emotions. In addition, some of these emotions had to be produced i) with increased intensity, ii) with decreased intensity, and iii) in a manner as if the person were trying to (unsuccessfully) hide or suppress her emotion.
    This way, for each sentence, its intended emotion, intensity and possible regulation attempts are known in advance and can be directly encoded. In a next step, ratings by human annotators are added, who rate the quality of the actors' performance: i) how well the intended emotional content can actually be perceived (i.e. a skewed variant of 'annotator confidence'), and ii) how believable and natural the performance is.
    In the future, extracts of the corpus will be used in classical rating experiments. These experiments may be performed on different combinations of modalities (i.e. full-body video, facial video, each with and without speech).

    Use case 2: Automatic recognition / classification of emotions

    Use case 2a: Recognition from speech

    (i) (Speech emotion classifier): Anton has built an emotion classifier from speech data which had been annotated in a way similar to use case 1b: emotion labels were assigned on a per-word basis, and the classifier was trained with the acoustical data corresponding to the respective word. Ten labels had been used by the annotators, but some of them occurred only very rarely. Based on a similarity metric, Anton merged his labels into a smaller number of classes. In one version, the classifier distinguishes four classes; in another version, only two classes are used. The classifier internally associates various probabilities to class membership. The classifier can either output only the one emotion that received the highest probability, or all emotions with their respective probabilities. Classifier results apply in the first step to a single word; in a second step, the results for a sentence can be computed by averaging over the words in the sentence.
    (ii) Felix has a set of voice portal recordings, a statistical classifier, a group of human labelers and a dialog designer. The aim is for the classifier to give the dialog designer a detector of a negative user state in several stages, so that he/she can implement dialog strategies to deal with the user's aggression. The training data should be annotated as in use case 1b (iii), and it should be possible to use it for several dialog applications (i.e. classifiers), so there must be mechanisms to map several emotion categories and stages onto each other.
    (iii) Jianhua allows listeners to label the speech with multiple emotions to form the emotion vector, and then trains a classification tree model to predict emotion vectors from acoustic features. The final emotion recognition results are used in the dialogue system online. The dialogue system uses the results to determine the priority level of the customer's task: negative emotions will result in quicker service.
    (iv) Juan is working on robots. The Automatic Speech Recognition module of his robot should be able to identify the emotional state of the speaker, not only to transcribe the uttered sentences. This emotional identification data could be used by the kernel to adapt the behavior of the robot to the new situation (for example, the identification of traces of happiness in the voice of a visitor could make the kernel change the dialogue in order to provide more information about the last items that could have been the cause of that happy state). The data to transfer should be the detected emotions (or ), the intensity levels, the confidence values associated with each detected emotion, and the time interval.
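    Several of these recognition scenarios need the same kind of output structure: a set of candidate emotions with probabilities or confidences, tied to a span of the input. A hypothetical sketch for Anton's per-word classifier output (all element and attribute names are illustrative assumptions):

    ```xml
    <!-- hypothetical classifier output for a single word -->
    <classification unit="word" start="17" end="17">
      <emotion category="anger" probability="0.61"/>
      <emotion category="irritation" probability="0.27"/>
      <emotion category="neutral" probability="0.12"/>
    </classification>
    ```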

    Use case 2b: Multimodal recognition

    (i) (Multimodal emotion classifier): George has built a multimodal emotion classifier that operates on facial, gesture and speech features. His main issue is that facial features and gesture expressivity are usually annotated on a frame level, gestures are described with timestamps in terms of phases, and speech features may be annotated in terms of words, tunes or arbitrary time windows. He would like to have an indication for each feature as to whether it can be broken down into smaller chunks and still have the same value or, inversely, be integrated across a wider window.
    (ii) Christian is interested in software ergonomics and has built a system that tracks users' behaviour while operating software or using web pages. The system also collects emotion information on the user by use of several sensing technologies. The system is equipped with various sensors for both behaviour tracking and emotion detection, such as the following:
    • sensors for emotion and behaviour tracking
    • emotion mouse and keyboard with integrated pressure sensors
    • eye/gaze tracker
    • camera for facial feature tracking
    • microphone for speech analysis
    • chair with pressure, tilt, and inclination sensors
    • sensors for physiological parameters (skin resistance, skin temperature, pulse, respiration)
    During a test, a user sits in the chair in front of the monitor and performs a task. Unobtrusive sensors monitor her behaviour: the mouse movements, mouse clicks, focal points, keyboard inputs; her posture and movements in the chair, facial feature changes, and utterances; and ongoing changes in her physiology. Robin also observes the user using the output of the face camera, the microphone, and a screen copy of the user's monitor. He enters event markers into the system and adds comments on the user's performance, environmental events like distracting sounds, spontaneous assessments of the user's current emotions, or other observations he makes. After the test, Robin also talks with her about her experiences, her likes and dislikes regarding the software, and how she felt in particular situations, using the playback feature of his analysing tool. All the information collected is of high value for Robin, who looks at the individual values of each modality and input device, as well as the interrelations between them, their temporal order and their changes over time. Robin also includes remarks on the user's performance during the task and the results of the questionnaire, and puts them in temporal relation with the sensor data. Other information on the setting, the software tested, and environmental conditions like air temperature, humidity, or air pressure is available as metadata on the test as well. Information on the subject, such as gender, age, or computer experience, is also stored.
    (iii) Jianhua builds an audio-visual system. In traditional human-computer interaction, the lack of a coordination mechanism for parameters under multi-modal conditions severely limits emotion recognition. Fusing different channels is not just a matter of combining them, but of finding the mutual relations among them. Jianhua builds an emotion recognition system based on audio-visual information. Both facial and audio data were recorded, and detailed features such as facial expression parameters, voice quality parameters, prosody parameters, etc. were extracted. The mutual relations between audio and visual information were also analyzed. With all of the above, the multimodal parameters were integrated into a recognition model.
    (iv) Stacy works with ECAs. For the ECA's perception of other agents or humans, there is a roughly inverse mapping process (inverse compared to affective reasoning as in Use case 3a). That is, there are recognition processes that map from the surface behavior of others to the behavioral markup and then map the behavioral markup to a functional markup.

    Use Case 2c: Digital Radio Presenter

    Robert envisages building a "Digital Radio Presenter application", using natural language and dialogue generation technology. The system would present radio shows which would include introducing music, interviewing guests and interacting with listeners calling in to the show.
    • A speech recognition component would need to pass information concerning the emotional state of interviewees or callers to the dialogue manager.
    • Both quantitative and qualitative information and timing information (or some other means of reference) would be needed to align the emotional characteristics to orthographic or semantic information.

    Use case 2d: Induction of emotional behavior using games

    Lori wants to train an audiovisual emotion classifier and needs to record data. She would like to associate user reactions with specific events happening to the user; so, she builds a simple computer game (e.g. a left-to-right space shooter) where the enemies can be controlled by the person responsible for the recordings. In this framework, sudden incidents occur (e.g. enemies appearing out of nowhere), inducing positive or negative reactions from the user.

    Use case 2e: Automatic emotion identification from plain text

    Juan works on an automatic person-machine interactive system (such as a robot) that could include a Natural Language module to identify the emotional state or attitude of the user by analyzing the sequence of words recognized by the ASR (Automatic Speech Recognition) module or typed by the user in the computer interface.
    As a result of this detection, if the automatic system has been insulted (one or more times), it should get progressively angrier; conversely, when praised, the self-esteem of the robot should rise higher and higher. If the machine is really emotional, the interpretation of the emotional content can be influenced by the emotional state of the machine (when angry, the system is more likely to detect negative words in the text).

    Use case 3: Generation of emotional system behavior

    Use case 3a: Affective reasoner

    (i) Ruth is using an affective reasoning engine in an interactive virtual simulation for children. Taking into account the current knowledge of the virtual situation, the affective reasoner deduces the appropriate emotional response. To do that, the situation is first analysed in terms of a set of abstractions from the concrete situation, capturing the emotional significance of the situation for the agent. These abstractions are called 'emotion-eliciting conditions' or 'appraisals' depending on the model used. These 'appraisals' can then be interpreted in terms of emotions, e.g. emotion categories.
    (ii) Ian has developed an engine that uses a core functional property of emotional behavior, namely to prioritize and pay attention to important real-time events within a stream of complex events, and wishes to apply this system to the task of prioritizing real-time stock quotes and alerting users to data that they, personally, would find important, surprising and interesting. A user would personalize the system to match their own personality (or a different one, should they so wish), so that the system's behavior would roughly match the user's own were they physically monitoring the real-time stream of stock data. The system would present the user with only that information it determined to be interesting at any point in time. The presentation of data could range from a simple text alert to a more complex visual representation. A central server could receive the stream of real-time events, assign values to each, and then send those packaged events to each user, where their own, personally configured system would determine the importance of that particular event to that particular user.
    (iii) The cognitive-emotional state of ECAs (cf. UC 1c) informs their behavior in a multi-step process. First, the communicative intent and cognitive-emotional state of the agent is conveyed via an XML functional markup to a behavior generation process. That process in turn specifies a behavioral plan (surface text, gestures, etc.) using an XML-based behavioral markup.
    • We have separate functional and behavioral markups (we are following the SAIBA framework of Kopp et al. 2006) for the multi-step process mentioned above.
    • To maximally leverage the work across all the above uses, we want to use the same markup scheme across the uses.
    • The emotional state information in the functional markup follows the emotion theory we are modeling. The focus of our work is appraisal theory, so our markup allows not only emotional category tags (e.g., fear) and intensity, but also the specification of the causal antecedents of emotion: appraisal variables such as the desirability of the event, who is blameworthy for the event, etc. These appraisals all have an intensity.
    • There is also more persistent affective information that can be conveyed from our model, such as trait affect and mood.
    • To do the behavioral markup, we need additional information on the appraisals, such as who did the blameworthy act. Obviously, it has different behavioral consequences whether the cause of the emotion is the addressee of an utterance or, for example, a third party who is not present.
    • The behavioral manifestation of felt emotions is distinct from intentionally communicated emotions that are not felt. In addition, there are distinctions based on whether the emotion expression is being suppressed or masked. So we need our markups to support these distinctions.
    • Different constituents of an utterance will have different emotions associated with them, so markups have to annotate those parts separately.
    • In our model, there may be multiple emotions in play at a time. Behaviorally, that suggests emotion blends.
    • Both emotions and the physical expressions of emotions have durations and dynamics. There is a question of the extent to which the dynamics should be encoded in the annotations as opposed to the emotion model that drives the annotations or the animation controllers that model the physical motion of the expression.
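    The functional markup requirements listed above might be sketched as follows; the element and attribute names are illustrative assumptions, not SAIBA, BML or EARL syntax:

    ```xml
    <!-- hypothetical functional markup with appraisal antecedents -->
    <emotion category="anger" intensity="0.7">
      <appraisal name="desirability" value="-0.8" intensity="0.9"/>
      <appraisal name="blameworthiness" value="0.9" agent="addressee"/>
      <!-- felt anger is masked: the agent displays polite neutrality -->
      <regulation type="mask" displayed-category="neutral"/>
    </emotion>
    ```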

    Use case 3b: Drive speech synthesis, facial expression and/or gestural behavior

    (i) Marc has written a speech synthesis system that takes a set of coordinates on the emotion dimensions arousal, valence and power and converts them into a set of acoustic changes in the synthesized speech, realized using diphone synthesis. If the speech synthesizer is part of a complex generation system where an emotion is created by an affective reasoner as in use case 3a, emotions must be mapped from a representation in terms of appraisals or categories onto a dimensional representation before they are handed to the speech synthesizer.
    (ii) Catherine has built an ECA system that can realize emotions in terms of facial expressions and gestural behavior. It is based on emotion categories, but the set of categories for which facial expression definitions exist is smaller than the list of categories generated in use case 3a. A mapping mechanism is needed to convert the larger category set to a smaller set of approximately adequate facial expressions. Catherine drives the ECA with XML tags that specify the communicative functions attached to a given discourse of the agent. Her behavior engine instantiates the communicative functions into behaviors and computes the animation of the agent. The begin and end tags of each function mark the scope of the function; communicative functions and speech are synchronized in this way.
    Given tags describing emotions, Catherine's difficulty is to translate them into animation commands. She is looking for a specification that would help this process. For the moment she is using a categorical representation of emotions.
    (iii) Alejandra wants to build an ontology-driven architecture that allows virtual humans (VHs) to be animated according to a prior definition of their individuality. This individuality is composed of morphological descriptors, personality and emotional state. She wants to have a module that conceptualizes the emotion of a VH. This module will serve as input to behavioral controllers that will produce animations and will update the aforementioned emotion module. The main property this definition of emotion must have is to allow behavior algorithms to be plugged in, so as to allow the reuse of animations and comparisons between different models of behavior or animation synthesis.
    (iv) Ian has developed an engine that generates facial gestures, body gestures and actions that are consistent with a given character's age, gender and personality. In the application of a web-based visual representation of a real person, Ian would like to allow users to add visual representations of their friends to their blog or web site, for example. In order for each character to represent its own user, it needs to update the visual representation; this can be achieved based on 'event' data received from the user. Using this data, a locally installed emotion engine can drive a 3D character, for example, to represent the emotional state of a friend. Events would be generated remotely, for example by actions taken by the friend being represented; these events would be sent to the user's local emotion engine, which would process the events, update the model of the friend's emotional state (emotion dimensions), and then map those dimensional values to facial gesture and body gesture parameters and actions.
    (v) Christine built a system that implements Scherer's theory to animate an agent: starting from a set of appraisal dimensions, the system generates the corresponding facial expressions with their specific timing. In contrast to a categorical representation, the facial expression of the emotion does not appear instantaneously on the face but facial region by facial region, depending on the appraisal dimensions that have been activated. She raised a number of quite interesting issues that are not specified in Scherer's theory (for example, how long the expression of a given appraisal dimension should last).
    (vi) Jianhua built an emotional speech system with both a voice/prosody conversion method (from neutral speech to emotional speech) and emotion markup language tags. The system is integrated into his TTS system and used for dialogue speech generation in a conversational system.
    (vii) Jianhua also works on expressive facial animation. He is developing a new coding method which gives more detailed control of facial animation with synchronized voice. The coding is finally transformed into FAPs as defined in MPEG-4. The coding method allows the user to configure and build systems for many applications by allowing flexibility in the system configuration and by providing various levels of interactivity with audio-visual content.
    (viii) The face, arms and general movement of Juan's robot could be affected by the emotional state of the robot (it can go from one point to another in a way that depends on the emotional state: faster, slower, with more strength, etc.). The input would be the emotional state, the item (face, arm, ...), and the interval (either a time interval, e.g. to be happy from now until then, or a space interval, e.g. to be happy while moving from this point to that point, or while moving this arm, etc.).
    (ix) The Text-To-Speech module of Juan's robotic guide in a museum should accept input text with emotional marks (sent by the kernel or dialogue manager to the speech synthesiser): the intended emotions (or emotion representation values), the first and the last word for each emotion, and the degree of intensity of the intended emotional expression. The TTS module could also ask the NL module to mark up the text with emotional marks (if no emotional mark is present and the fully automatic mode is active).
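    Juan's TTS requirement (ix) resembles the way SSML scopes prosodic properties over spans of text. A hypothetical sketch of emotionally marked-up synthesis input (the emotion element is an assumption; the speak wrapper is borrowed from SSML for illustration only):

    ```xml
    <speak>
      Welcome to the museum.
      <!-- emotion scoped over a span of words, with an intensity -->
      <emotion category="excitement" intensity="0.8">
        This room contains our most famous painting!
      </emotion>
    </speak>
    ```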

    Use case 3c: generation of speech acts

    In this example, Enrico wants to insert pre-recorded illocutionary acts into a corpus-based speech synthesis system. If appropriately used by the unit selection mechanism, these prompts could convey an emotional intention in the generated speech. The input text (or part of it) of the synthesis system should be annotated specifying the emotional style as well as the level of activation. The system will look for the pre-recorded expression in the speech database that best fits the annotated text.

    Use case 3d: generation of paralinguistic events

    Enrico wants to generate synthetic speech containing paralinguistic events such as laughs, sighs, expressions of pain and similar phenomena, in order to strengthen the expressive effect of the generated speech. These events are pre-recorded and stored in the TTS speech database. The speech synthesis engine should appropriately select the best speech event from the database, given a suitable annotation for it in the text to be synthesized. These events could be inserted at a particular point in the sentence or generated following certain criteria.

    Use case 3e: Digital Radio Presenter

    Robert envisages building a "Digital Radio Presenter application", using natural language and dialogue generation technology. The system would present radio shows which would include introducing music, interviewing guests and interacting with listeners calling in to the show.
    • The language generation component would need to pass information regarding emotion to a speech synthesis component.
    • The digital presenter would use emotion (possibly exaggerated) to empathise with a caller.

    Appendix 2: Detailed assessment of existing markup languages

    A2.1 W3C SMIL2.0 and Emotion Markup Language

    According to http://www.w3.org/TR/SMIL/ , SMIL has the following design goals:
    • Define an XML-based language that allows authors to write interactive multimedia presentations. Using SMIL, an author can describe the temporal behavior of a multimedia presentation, associate hyper-links with media objects and describe the layout of the presentation on a screen.
    • Allow reusing of SMIL syntax and semantics in other XML-based languages, in particular those who need to represent timing and synchronization. ...
    Though SMIL is clearly designed for the purpose of encoding output-specifications, it nevertheless offers some interesting general purpose concepts.

    A2.1.1 Overall Layout: Modularization and Profiling

    • Modularization is an approach in which markup functionality is specified as a set of modules that contain semantically-related XML elements, attributes, and attribute values. Profiling is the creation of an XML-based language through combining these modules, in order to provide the functionality required by a particular application.
    • Profiling introduces the ability to tailor an XML-based language to specific needs, e.g. to optimize presentation and interaction for the client's capabilities. Profiling also adds the ability to integrate functionality from other markup languages, releasing the language designer from specifying that functionality.
    In the overall design of SMIL, much emphasis is put on defining it in terms of sub-modules that can be individually selected and combined, either for direct use or for embedding into other XML languages.
    This ability to be integrated in parts or as a whole into other XML-languages is a very desirable feature.
    Though the degree of sophistication in SMIL is probably not necessary for our purpose (SMIL is split into more than 30 modules!), the design of SMIL should nevertheless be inspected in order to see how its modularity is achieved in technical terms (i.e. namespaces etc.).

    A2.1.2 SMIL Metadata Module

    Cf. http://www.w3.org/TR/SMIL2/metadata.html
    Metadata in SMIL refers to properties of a document (e.g., author/creator, expiration date, a list of key words, etc.), i.e. it holds information related to the creation process of the document.
    In the Emotional Language Requirements, metadata covers a more extended range of information types. Nevertheless, it is worthwhile to consider the SMIL metadata as well, both in terms of XML syntax and in terms of content.
    SMIL provides two elements for specifying meta-data.
    1) <meta>
    This is an empty element with two attributes: name and content.
    <smil:meta name="Title" content="The Wonderful EmoDataBase"/>
    <smil:meta name="Rights" content="Copyright by Mr. Hide"/>
    ...
    The choice of values for the attribute 'name' is unrestricted, i.e. any metadata can be encoded, but users are encouraged not to invent their own names and instead to use the set of names from the Dublin Core initiative.
    "Dublin Core Metadata Initiative", a Simple Content Description Model for Electronic Resources, Available at http://dublincore.org/
    2) <metadata>
    This is new since SMIL 2.0 and allows for the specification of metadata in RDF syntax. Its only sub-element is <rdf:RDF>, i.e. an element that holds RDF specifications. It is claimed that "RDF is the appropriate language for metadata". RDF specifications can be freely chosen, but again the usage of the (RDF version of the) Dublin Core metadata specification is encouraged.
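    A minimal sketch of such a <metadata> block (not taken from the SMIL specification; it reuses the invented title and author from the <meta> example above, expressed as Dublin Core properties in RDF):

    <smil:metadata>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
        <rdf:Description dc:title="The Wonderful EmoDataBase"
                         dc:creator="Mr. Hide"/>
      </rdf:RDF>
    </smil:metadata>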

    A2.1.3 SMIL Timing Module

    This module deals with the specification of the synchronization of different media objects and thus provides one of the core functionalities of SMIL. In SMIL, the synchronization of objects is specified via (possibly nested) <seq> and <par> tags, enclosing media objects that are to be presented in sequential and parallel order respectively. In addition to this sequential/parallel layout, start and end times can be specified for each media object, either in terms of absolute values (e.g. begin="2.5s") or in terms of events (e.g. begin="movieXY.end+3.5s").
    This mechanism for temporal layout is very attractive for all sorts of systems where multiple streams need to be synchronized. Most specifically it has inspired the implementation of timing modules in a number of representation languages for Embodied Conversational Agents (ECA).
    This specification is definitely very handy for the purpose of specifying timing in generation systems. It is very likely able to fulfill the requirement regarding the position on a time line of externally linked objects (section 4.3.2). Nevertheless, it still needs to be evaluated whether this specification, which is clearly biased towards generation, should be part of the Emotion Markup Language.
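    For illustration, a minimal sketch of such a temporal layout (file names and ids are invented; begin is the SMIL 2.0 attribute for start times):

    <par>
      <audio src="intro.wav" begin="2.5s"/>
      <video id="movieXY" src="movieXY.mpg"/>
      <audio src="comment.wav" begin="movieXY.end+3.5s"/>
    </par>

    The two media objects inside <par> are presented in parallel; the last audio clip starts 3.5 seconds after the video element it names has ended.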
    A much more modest but still attractive candidate for re-using encodings from SMIL is the syntax for 'Clock Values', i.e. for time-values:
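    Examples of the three forms of clock values defined by SMIL:

    02:30:03    (full clock value: 2 hours, 30 minutes and 3 seconds)
    02:33       (partial clock value: 2 minutes and 33 seconds)
    3.2h        (timecount value: 3.2 hours)
    45min  30s  5ms  12.467   (further timecount values; a bare number counts as seconds)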

    A2.2 W3C SSML 1.0 and Emotion Markup Language

    According to W3C SSML Recommendation 7 September 2004 (http://www.w3.org/TR/speech-synthesis ) the goal of this markup language is to provide a standard way to control different aspects in the generation of synthetic speech.
    • The Voice Browser Working Group has sought to develop standards to enable access to the Web using spoken interaction. The Speech Synthesis Markup Language Specification is one of these standards and is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. The essential role of the markup language is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms.
    Current work on SSML is defining a version 1.1 which will better address internationalization issues. An SSML 1.1 first working draft was released on 10 January 2007 (http://www.w3.org/TR/speech-synthesis11 ). The publication of a second working draft is imminent.
    SSML is oriented to a specific application, namely speech synthesis, i.e. the conversion of any kind of text into speech. Consequently, the elements and attributes of this markup language are specific to this particular domain. Only the meta, metadata and perhaps desc elements could be considered as fulfilling the requirements of the Emotional Markup Language, while all the other elements refer to something outside of the emotion topic. On the other hand, SSML should interact with "Emotion ML", speech being one of the available modalities in the generation of emotional behavior. By means of specific processing, the Emotional Markup annotation should be converted into an SSML document containing the constraints regarding, for example, the prosody of the speech to be synthesized.

    A2.2.1 SSML meta and metadata elements

    The meta and metadata elements are used as containers for any information related to the document. The metadata tag allows the use of a metadata scheme and thus provides a more general and powerful mechanism to handle this type of data. The meta element requires one of the two attributes "name" (to declare a meta property) or "http-equiv". A content attribute is always required. The only predefined property name is seeAlso; it can be used to specify a resource containing additional information about the content of the document. This property is modelled on the seeAlso property in Section 5.4.1 of the Resource Description Framework (RDF) Schema Specification 1.0 RDF-SCHEMA.
    <speak version="1.0" ... xml:lang="en-US">
      <meta name="seeAlso" content="http://example.com/my-ssml-metadata.xml"/>
      <meta http-equiv="Cache-Control" content="no-cache"/>
    </speak>
    The metadata element exploits a metadata schema to add information about the document. Any metadata schema is allowed but it is recommended to use the XML syntax of the Resource Description Framework (RDF) RDF-XMLSYNTAX in conjunction with the general metadata properties defined in the Dublin Core Metadata Initiative DC .
    • The Resource Description Format [RDF] is a declarative language and provides a standard way for using XML to represent metadata in the form of statements about properties and relationships of items on the Web. Content creators should refer to W3C metadata Recommendations RDF-XMLSYNTAX and RDF-SCHEMA when deciding which metadata RDF schema to use in their documents. Content creators should also refer to the Dublin Core Metadata Initiative DC , which is a set of generally applicable core metadata properties (e.g., Title, Creator, Subject, Description, Rights, etc.).
    <speak version="1.0" ... xml:lang="en-US">
      <metadata>
        <rdf:RDF
            xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:rdfs = "http://www.w3.org/2000/01/rdf-schema#"
            xmlns:dc = "http://purl.org/dc/elements/1.1/">
          <!-- Metadata about the synthesis document -->
          <rdf:Description rdf:about="http://www.example.com/meta.ssml"
              dc:Title="Hamlet-like Soliloquy"
              dc:Description="Aldine's Soliloquy in the style of Hamlet"
              dc:Publisher="W3C"
              dc:Language="en-US"
              dc:Date="2002-11-29"
              dc:Rights="Copyright 2002 Aldine Turnbet"
              dc:Format="application/ssml+xml">
            <dc:Creator>
              <rdf:Seq ID="CreatorsAlphabeticalBySurname">
                <rdf:li>William Shakespeare</rdf:li>
                <rdf:li>Aldine Turnbet</rdf:li>
              </rdf:Seq>
            </dc:Creator>
          </rdf:Description>
        </rdf:RDF>
      </metadata>
    </speak>
    Emotion ML might use a similar mechanism to address the metadata information related to the emotions.

    A2.2.2 SSML desc element

    The SSML desc element is used in conjunction with the audio element to add a description of the audio content. It is particularly useful when there is a need to textually explain paralinguistic information related to the audio. A mechanism like this could be generalized and also used in the emotion markup language to add descriptions to any generated event.
    <speak version="1.0" ... xml:lang="en-US">
      <voice xml:lang="de-DE">
        <audio src="ichbineinberliner.wav">Ich bin ein Berliner.
          <desc xml:lang="en-US">Kennedy's famous German language gaffe</desc>
        </audio>
      </voice>
    </speak>

    A2.3 W3C EMMA and Emotion Markup Language

    According to the W3C EMMA working draft of 9 April 2007 (http://www.w3.org/TR/emma/ ), this markup language is oriented to the interpretation of user input in a multimodal system.
    • The W3C Multimodal Interaction working group aims to develop specifications to enable access to the Web using multimodal interaction. This document [...] provides details of an XML markup language for containing and annotating the interpretation of user input. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from speech, pen or keystroke input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers.
    As EMMA is an annotation scheme oriented to recognition applications, some of its elements and concepts could fulfill in particular the Use case 2 requirements of the emotion markup language. In the following paragraphs, only those EMMA-specific elements that could be extended to the emotion markup are considered.
    The main EMMA element is <emma:interpretation>. It comprises different attributes and values and holds a single interpretation represented in application-specific markup. Each interpretation element is uniquely identified by means of the "id" attribute (of type xsd:ID).
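    A minimal sketch (the child elements origin and destination are application-specific markup, following the travel examples used throughout the EMMA draft):

    <emma:interpretation id="int1">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>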

    A2.3.1 EMMA Container elements

    Cf. http://www.w3.org/TR/emma/#s3.3 These elements are used to manage the interpretations and to group them according to different criteria. EMMA considers three types of container elements:
    • emma:one-of
    • emma:group
    • emma:sequence
    The first is used to indicate a set of mutually exclusive interpretations of the input, and it could perhaps be used in the emotion markup for Use case 2. The second container element is intended for multiple interpretations provided by distinct inputs (speech, gesture, etc.) that are used for a common task. The last element is used for interpretations that are sequential in time. In the emotion markup these containers could also be used to manage interpretations. The one-of mechanism is useful when multiple results are available and a choice among them has to be made. The group concept could be generalized and used, for example, to treat multiple or complex emotions. The last container is also useful to describe the evolution of an emotional phenomenon.
    Beyond these elements, EMMA also provides the <emma:lattice> container, which is tightly linked to speech recognition applications. More interesting is the <emma:literal> element, a child element of the interpretation, which is used when the semantic results of the EMMA component are string literals without any surrounding application namespace markup. It could also be useful in the emotion markup to describe something not included in the application namespace.
    <emma:interpretation>
      <emma:literal>boston</emma:literal>
    </emma:interpretation>

    A2.3.2 EMMA Annotation elements

    EMMA model element

    The <emma:model> is an annotation element used to express constraints on the structure and content of instance data and is specified as one of the annotations of the instance. It is identified by means of an "id" attribute, while a "ref" attribute is used to reference the data model. Within a single EMMA document, it is possible to refer to multiple data models. Since the emotion markup will consider different and also alternative representations for describing emotions, an element like "model" could be used to manage different constraints for representing data. Models could also be used to manage domain-specific sets of emotion categories or types.
    <emma:model id="model1" ref="http://myserver/models/city.xml"/>
    <emma:interpretation id="int1" emma:model-ref="model1">
      <city> London </city>
      <country> UK </country>
    </emma:interpretation>

    EMMA info element

    The <emma:info> element acts as a container for vendor- and/or application-specific metadata regarding a user's input. In the emotion markup a tag like this could be a location for metadata. It could be used to add information about the subject and the object of the analyzed phenomenon/event. Moreover, this tag can include markup that is not specific to EMMA, i.e. something extensible and adaptable to specific requirements.
    <emma:info>
      <caller_id>
        <phone_number>2121234567</phone_number>
        <state>NY</state>
      </caller_id>
      <customer_type>residential</customer_type>
      <service_name>acme_travel_service</service_name>
    </emma:info>

    A2.3.3 EMMA Annotation attributes

    EMMA process attribute

    The emma:process attribute refers to the process that generated the interpretation. This annotation may include information on the process itself, such as the grammar, the type of parser, etc. There is no normative specification regarding the description of the process. This is linked to the "rest of the world" in the emotion requirements and could be useful to indicate which process produced the result to be interpreted, or, extending this concept to use case 3, which process has to be used to generate the output.
    <emma:interpretation id="better" emma:process="http://example.com/mysemproc1.xml">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>

    EMMA signal and media-type attributes

    The emma:signal attribute is a URI reference to the signal that originated the input recognition process, while the emma:media-type attribute specifies the data format of that signal. These attributes, too, are links to the "rest of the world" and could be used to annotate, for example, audio and/or video sources.
    <emma:interpretation id="intp1" emma:signal="http://example.com/signals/sg23.bin">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03152003</date>
    </emma:interpretation>

    <emma:interpretation id="intp1" emma:media-type="audio/dsr-202212; rate:8000; maxptime:40">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03152003</date>
    </emma:interpretation>

    EMMA confidence attribute

    The emma:confidence attribute is a score in the range from 0.0 (minimum) to 1.0 (maximum) inclusive, indicating the quality of the input. It may represent the confidence of whatever processor was responsible for the creation of the EMMA result, and it can also be used to assign confidences to elements in instance data in the application namespace. In the emotion language requirements this attribute is considered with the same meaning as in EMMA; it could be used at different levels of representation and therefore applied to different elements.
    <emma:interpretation id="meaning1" emma:confidence="0.6">
      <destination emma:confidence="0.8">Boston</destination>
      <origin emma:confidence="0.6">Austin</origin>
    </emma:interpretation>

    EMMA cost attribute

    The emma:cost attribute is used to indicate the weight or cost associated with a user's input or part of it. It is conceptually related to the output of a recognition process when there are multiple interpretations. Values range from 0.0 to 10000000.
    <emma:interpretation id="meaning1" emma:cost="1600">
      <location>Boston</location>
    </emma:interpretation>

    <emma:interpretation id="meaning2" emma:cost="400">
      <location>Austin</location>
    </emma:interpretation>

    A2.3.4 Timestamps in EMMA

    In EMMA, time references are indicated using either relative or absolute timestamps. The time unit is the millisecond, and absolute timestamps give the time in milliseconds since 1 January 1970 00:00:00 GMT. Absolute timestamps are indicated using the emma:start and emma:end attributes. For relative timestamps, EMMA defines the emma:time-ref-uri attribute, a URI used to anchor the relative time, which can also reference an interval. The emma:offset-to-start attribute specifies the offset in milliseconds of the start of input from the anchor point. It is also possible to indicate timestamps relative to the end of the reference interval by setting the value "end" in the emma:time-ref-anchor-point attribute. Finally, the emma:duration attribute can be used to annotate the duration of the input, independently of absolute or relative timestamps. In EMMA it is possible to have both absolute and relative timestamps in the same container.
    <emma:interpretation id="int2"
        emma:time-ref-uri="#int1"
        emma:time-ref-anchor-point="start"
        emma:offset-to-start="5000">
      <destination>Chicago</destination>
    </emma:interpretation>
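    For comparison, a sketch of an absolute timestamp annotation (the millisecond values here are invented for illustration):

    <emma:interpretation id="int1"
        emma:start="1087995961542"
        emma:end="1087995963542">
      <destination>Chicago</destination>
    </emma:interpretation>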

    A2.3.5 Modality in EMMA

    Modality is a key concept in the emotion language. For annotating the input modality, EMMA provides two attributes, emma:medium and emma:mode. The first is a broad classification; its values are acoustic, tactile and visual. The second specifies the mode of communication through the channel (values: speech, dtmf_keypad, ink, video, photograph, ...). It is also possible to classify inputs with respect to their communicative function using the emma:function attribute, whose values include recording, transcription, dialog, verification, ...
    <emma:one-of id="nbest1">
      <emma:interpretation id="interp1"
          emma:confidence="0.6"
          emma:medium="tactile"
          emma:mode="ink"
          emma:function="dialog"
          emma:verbal="true">
        <location>Boston</location>
      </emma:interpretation>
      <emma:interpretation id="interp2"
          emma:confidence="0.4"
          emma:medium="tactile"
          emma:mode="ink"
          emma:function="dialog"
          emma:verbal="false">
        <direction>45</direction>
      </emma:interpretation>
    </emma:one-of>

    A2.4 W3C PLS and Emotion Markup Language

    According to W3C PLS (Pronunciation Lexicon Specification) second last call working draft 26 October 2006 (http://www.w3.org/TR/pronunciation-lexicon/ ), PLS is designed to enable interoperable specification of pronunciation information for both ASR and TTS engines within voice browsing applications.
    The "role" attribute of the lexeme element (see Section 4.4) is the only reviewed aspect of the PLS language.

    A2.4.1 The role attribute

    The values of the role attribute are based on QNames as defined in XML Schema Part 2: Datatypes Second Edition XML-SCHEMA. A QName or "qualified name" is composed of two parts separated by a colon, where the first part is the qualifier (a namespace prefix) and the second is a value defined in that namespace, e.g. "claws:VVI" for the value "VVI" in the namespace associated with the prefix "claws". The namespace guarantees that the values are unique and extensible: if the namespace is changed, a different set of values is possible.
    QNames might be used to represent categorizations that cannot easily be defined. In PLS the example was Part-Of-Speech (POS) tags, which are used in different ways in the NL and ASR communities.
    This is an example of the use of the role attribute in PLS:
    <?xml version="1.0" encoding="UTF-8"?>
    <lexicon version="1.0"
        xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
        xmlns:claws="http://www.example.com/claws7tags"
        alphabet="ipa"
        xml:lang="en">
      <lexeme role="claws:VVI claws:VV0 claws:NN1">
        <!-- verb infinitive, verb present tense, singular noun -->
        <grapheme>read</grapheme>
        <phoneme>ri:d</phoneme>
        <!-- IPA string is: "ri&#x2D0;d" -->
      </lexeme>
      <lexeme role="claws:VVN claws:VVD">
        <!-- verb past participle, verb past tense -->
        <grapheme>read</grapheme>
        <phoneme>red</phoneme>
      </lexeme>
    </lexicon>

    A2.5 W3C InkML and Emotion Markup Language

    This markup language (http://www.w3.org/TR/2006/WD-InkML-20061023/ ) is quite far removed from the work of this group, being a specification for passing around information captured from pen-like input devices.
    • This document describes the syntax and semantics for the Ink Markup Language for use in the W3C Multimodal Interaction Framework as proposed by the W3C Multimodal Interaction Activity. The Ink Markup Language serves as the data format for representing ink entered with an electronic pen or stylus. The markup allows for the input and processing of handwriting, gestures, sketches, music and other notational languages in applications. It provides a common format for the exchange of ink data between components such as handwriting and gesture recognizers, signature verifiers, and other ink-aware modules.
    It does share some of the high level concepts that we would like to have within the EmoXG group specification, namely:
    1. Capture (or input) of data
    2. Events and streams of real time events
    3. Output processing
    It also has an emphasis on interoperability with other XML specifications, for example SMIL to allow for multi modal exchange of data.
    The specifics of the markup language are bound to pen devices, which is not directly relevant for the Emotion markup language. Perhaps of interest is the way in which this is an example of a multi modal specification (http://www.w3.org/TR/mmi-reqs/ ).
    Of further interest is how their specification is put together; it seems similar in size and scope to what we would want to achieve and could be an interesting template. Their requirements document could also be a useful template (http://www.w3.org/TR/inkreqs/ ).

    A2.5.1 Multi Modal Interaction

    Of more interest are the Multi Modal Interaction guidelines (http://www.w3.org/TR/mmi-reqs/ ), which it seems we would be wise to follow if possible. An excerpt from the requirements document is relevant:
    "We are interested in defining the requirements for the design of multi modal systems -- systems that support a user communicating with an application by using different modalities such as voice (in a human language), gesture, handwriting, typing, audio-visual speech, etc. The user may be considered to be operating in a delivery context: a term used to specify the set of attributes that characterizes the capabilities of the access mechanism in terms of device profile, user profile (e.g. identify, preferences and usage patterns) and situation. The user interacts with the application in the context of a session, using one or more modalities (which may be realized through one or more devices). Within a session, the user may suspend and resume interaction with the application within the same modality or switch modalities. A session is associated with a context, which records the interactions with the user."
    Some of the key components of this specification are:
    1. Input (modality, processing system)
    2. Events (handlers, sources, time stamps)
    3. Output (modality, processing systems)
    4. User profiles (identity, preferences and usage patterns)
    5. Sessions (suspend, resume, context)
    6. Situation (interaction history)
    7. Interaction (management, synchronization)

    A2.6 HUMAINE EARL and Emotion Markup Language

    According to the HUMAINE EARL language (Emotion Annotation and Representation Language) version 0.4.0, 30 June 2006 (http://emotion-research.net/earl ), this markup language is oriented to the representation and annotation of emotion, primarily for the use cases of corpus annotation and the recognition and generation of emotions.
    • This report proposes a syntax for an XML-based language for representing and annotating emotions in technological contexts. In contrast to existing markup languages, where emotion is often represented in an ad-hoc way as part of a specific language, we propose a language aiming to be usable in a wide range of use cases, including corpus annotation as well as systems capable of recognising or generating emotions. We describe the scientific basis of our choice of emotion representations and the use case analysis through which we have determined the required expressive power of the language. We illustrate core properties of the proposed language using examples from various use case scenarios.
    This said, EARL is by definition highly related to the envisaged use cases and specification and provides many solutions to the named requirements. As a general evaluation, EARL provides several highly valuable mechanisms and sets of items for the given requirements. The proposed "plug-in" ability seems a must as well. The main drawback of EARL to be overcome is its lack of mechanisms for the description of Global Metadata and Classification Schemes for Emotions / Ontologies, as named in the EmoXG requirements. Some minor gaps are: no provision for the type of emotion-related phenomenon, no real acting reference, and sparse or missing support for position on a time line and for semantic links to the "rest of the world".
    The next sections report a detailed evaluation, requirement by requirement, with examples.

    A2.6.1 Emotion Core

    A2.6.1.1 Type of emotion-related phenomenon

    EARL does not allow for a specification of the type of emotion-related phenomenon, such as emotions, moods, interpersonal stances, etc.

    A2.6.1.2 Emotion categories

    EARL allows for "plug-ins" or dialects and provides presets for emotion categories that are valuable for re-consideration.
    <emotion category="pleasure">Hello!</emotion>
    A set of 48 default categories is provided following Cowie et al.

    A2.6.1.3 Emotion dimensions

    These are provided within EARL. Suggested dimensions are arousal, power, valence.
    <emotion xlink:href="face12.jpg" arousal="-0.2" valence="0.5" power="0.2"/>

    A2.6.1.4 Description of appraisals of the emotion or of events related to the emotion

    These are also provided within EARL. 19 appraisals are suggested following Scherer's work.
    <emotion xlink:href="face12.jpg" suddenness="-0.8" intrinsic_pleasantness="0.7" goal_conduciveness="0.3" relevance_self_concerns="0.7"/>

    A2.6.1.5 Action tendencies

    This is not covered by the EARL draft specification.

    A2.6.1.6 Multiple and/or complex emotions

    It is possible to attach several tags to one event.
    <complex-emotion xlink:href="face12.jpg">
      <emotion category="pleasure" probability="0.5"/>
      <emotion category="friendliness" probability="0.5"/>
    </complex-emotion>

    A2.6.1.7 Emotion intensity

    It is possible to associate intensities for emotions.
    <complex-emotion xlink:href="face12.jpg">
      <emotion category="pleasure" intensity="0.7"/>
      <emotion category="worry" intensity="0.5"/>
    </complex-emotion>

    A2.6.1.8 Regulation

    Descriptors for regulation are also found in EARL.
    <complex-emotion xlink:href="face12.jpg">
      <emotion category="pleasure" simulate="0.8"/>
      <emotion category="annoyance" suppress="0.5"/>
    </complex-emotion>

    A2.6.1.9 Temporal aspects

    Start/end time labels for emotions are included, as well as a mechanism for continuous description of emotion changes in the FEELTRACE manner.
    <emotion start="2" end="2.7">
      <samples value="arousal" rate="10">
        0 .1 .25 .4 .55 .6 .65 .66
      </samples>
      <samples value="valence" rate="10">
        0 -.1 -.2 -.25 -.3 -.4 -.4 -.45
      </samples>
    </emotion>

    A2.6.2 Meta-information about individual emotion annotations

    A2.6.2.1 Acting

    No general mechanism exists for acting, apart from the regulation descriptors.

    A2.6.2.2 Confidence / probability

    A probability tag is foreseen in EARL. In general, it is also possible to assign this probability at any level of representation.
    <emotion xlink:href="face12.jpg" category="pleasure" modality="face" probability="0.5"/>

    A2.6.2.3 Modality

    A modality tag exists in EARL and allows for assignment of emotion labels per modality.
    <complex-emotion xlink:href="clip23.avi">
      <emotion category="pleasure" modality="face"/>
      <emotion category="worry" modality="voice"/>
    </complex-emotion>

    A2.6.3 Links to the "rest of the world"

    A2.6.3.1 Links to media

    A general hyperlink mechanism (xlink:href) allows for links to media. However, it is not primarily intended to connect further media with objects.
    <complex-emotion xlink:href="face12.jpg">
      ...
    </complex-emotion>

    A2.6.3.2 Position on a time line

    Apart from the ability to assign emotion labels with start/end times or in a continuous manner, no links to a time line, e.g. for recognition results in absolute or relative time, are provided.

    A2.6.3.3 The semantics of links to the "rest of the world"

    Links to, for example, the experiencer, trigger or target of the emotion are not included in EARL.

    A2.6.4 Global Metadata

    EARL provides no mechanisms for any of the following:
    • Info on Person(s)
    • Purpose of classification
    • Technical Environment
    • Social & Communicative Environment

    A2.6.5 Classification Schemes for Emotions / Ontologies

    As with global metadata description, EARL lacks the ability to construct a hierarchy of emotion words. Mapping mechanisms are also not provided.

    A2.7 VHML and Emotion Markup Language

    The Virtual Human Markup Language (VHML) was created within the European Union 5th Framework Research and Technology Project InterFace. It is described at http://www.vhml.org/. VHML is a markup language intended for controlling Virtual Humans with respect to speech, facial animation, facial gestures and body animation. It is important to note that although VHML has only a simple representation of emotion, it can serve as an example of the requirements formulated in Use case 3.
    • The language (VHML) is designed to accommodate the various aspects of human-computer interaction with regard to facial animation, text-to-speech production, body animation, dialogue manager interaction and emotional representation, plus hyper- and multimedia information. It uses existing standards and defines new languages to accommodate functionality that is not otherwise catered for. The language is XML/XSL based and consists of the following sub-languages:
      • EML Emotion Markup Language
      • GML Gesture Markup Language
      • SML Speech Markup Language (based on SSML)
      • FAML Facial Animation Markup Language
      • BAML Body Animation Markup Language
      • XHTML eXtensible HyperText Markup Language
      • DMML Dialogue Manager Markup Language (based on W3C Dialogue Manager or AIML)
    The next sections report a detailed evaluation against the requirements, with examples.

    A2.7.1 Emotion Core

    A2.7.1.1 Type of emotion-related phenomenon

    As VHML is designed for human-computer interaction using Virtual Humans, its representations can be considered affect dispositions.

    A2.7.1.2 Emotion categories

    Within EML, as used by VHML, the emotion categories are: afraid, angry, confused, dazed, disgusted, happy, neutral, sad, surprised and default-emotion.
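    In EML, each category is used directly as an element name wrapping the text to be rendered with that emotion (compare the intensity and temporal examples below). The following sketch is hypothetical rather than taken from the VHML specification, combining two of the listed categories:
    <happy> What a nice surprise! </happy>
    <sad> I had hoped to see you sooner. </sad>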

    A2.7.1.3 Emotion dimensions

    This aspect is not specified by VHML.

    A2.7.1.4 Description of appraisals of the emotion or of events related to the emotion

    This aspect is not specified by VHML.

    A2.7.1.5 Action tendencies

    This aspect is not specified by VHML.

    A2.7.1.6 Multiple and/or complex emotions

    This aspect is not specified by VHML.

    A2.7.1.7 Emotion intensity

    Intensity can be expressed as a numeric value (0-100) or as low/medium/high categories.
    <afraid intensity="50"> Do I have to go to the dentist? </afraid>
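    For the category-based variant, the same attribute presumably takes a keyword instead of a number; this sketch is an assumption based on the description above, not an example taken from the specification:
    <afraid intensity="high"> Do I have to go to the dentist? </afraid>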

    A2.7.1.8 Regulation

    Within the Gesture Markup Language (GML) of VHML, an emphasis element is defined. Depending on the modality (speech or face), the element is synthesized accordingly.
    <emphasis level="strong"> will not </emphasis> buy this record, it is scratched.

    A2.7.1.9 Temporal aspects

    VHML specifies two temporal attributes for an emotion:
    1. duration: the time, in seconds or milliseconds, that the emotion will persist in the Virtual Human.
    2. wait: a pause, in seconds or milliseconds, before continuing with other elements or plain text in the rest of the document.
    <happy duration="7s" wait="2000ms"/> It's my birthday today.

    A2.7.2 Meta-information about individual emotion annotations

    A2.7.2.1 Acting

    This aspect is not specified by VHML.

    A2.7.2.2 Confidence / probability

    This aspect is not specified by VHML.

    A2.7.2.3 Modality

    Modalities can be established by referring to the other markup languages: GML for a gesture, FAML for a facial animation, SML for speech, and BAML for a body animation.
    <happy>
      I think that this is a great day.
      <smile duration="2s" wait="1s"/>
      <look-up>
        Look at the sky. There is <emphasis level="strong">not a single </emphasis> cloud.
      </look-up>
      <agree duration="3500ms" repeat="4"/>
      The weather is perfect for a day at the beach.
    </happy>

    A2.7.3 Links to the "rest of the world"

    A2.7.3.1 Links to media

    EML allows elements of the other markup languages to be embedded in order to specify the modality.

    A2.7.3.2 Position on a time line

    This aspect is not specified by VHML.

    A2.7.3.3 The semantics of links to the "rest of the world"

    This aspect is not specified by VHML.

    A2.7.4 Global Metadata

    A2.7.4.1 Info on Person(s)

    VHML specifies the speaker of the text with regard to gender, age and category, as well as the default emotion with which the speaker is supposed to speak and act.
    The person element contains the following attributes:
    • age category (child, teenager, adult, elder)
    • gender
    • name: specifies a platform-specific voice name to speak the contained text
    • variant: specifies a preferred variant of another person to speak the contained text
    • disposition: specifies the emotion to be used as the default emotion for the contained text (the name of any of the EML elements)
    <person age="12" gender="male" disposition="sad" variant="fred:1">
      ...
    </person>
    <person variant="fred:2">
      ...
    </person>
    None of the following information can be explicitly indicated in VHML:
    • Purpose of classification
    • Technical Environment
    • Social & Communicative Environment

    Marc Schröder, DFKI
    Enrico Zovato, Loquendo
    Hannes Pirker, OFAI
    Christian Peter, Fraunhofer
    Felix Burkhardt, Deutsche Telekom

    See Acknowledgements.

A2.7.5 Classification Schemes for Emotions / Ontologies

VHML lacks the ability to construct a hierarchy of emotion words or to provide mapping mechanisms.

W3C Emotion Incubator Group, W3C Incubator Group Report, 10 July 2007. This version: http://www.w3.org/2005/Incubator/emotion/XGR-emotion-20070710
