1
Data Categories for Terminology Management
      • Sue Ellen Wright
      • Kent State University
      • Institute of Applied Linguistics
      • ©Sue Ellen Wright 2006
2
Packaging Data
  • Model for visualizing information
    • An amorphous flow of undifferentiated “stuff”
    • An aggregate of individual elements (little packages, components) that can be identified, delimited, organized (modeled), stored, retrieved, manipulated, and reused
    • Stuff people can figure out if they think about it
    • Stuff a computer program can automatically recognize and process
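To make the contrast between the two models concrete, here is a minimal Python sketch using made-up data about the German term "Drucker". The same information appears once as an undifferentiated string and once as separately labeled data elements; only the second form lets a program pick out a specific element. The field names (`partOfSpeech`, `gender`, `definition`) and the helper `terms_with` are hypothetical.

```python
# Hypothetical example data about the German term "Drucker".

# Unpackaged: one undifferentiated string. A person can work out which part is
# the term, which is grammar, and which is the definition; a program cannot
# do so reliably.
bulk = "Drucker noun masculine device that prints text or images on paper"

# Packaged: each piece of information sits in its own labeled data element.
packaged = [
    {"term": "Drucker", "partOfSpeech": "noun", "gender": "masculine",
     "definition": "device that prints text or images on paper"},
    {"term": "Maus", "partOfSpeech": "noun", "gender": "feminine",
     "definition": "hand-held pointing device"},
]

def terms_with(records, category, value):
    """Return the terms whose data element `category` equals `value`."""
    return [r["term"] for r in records if r.get(category) == value]

print(terms_with(packaged, "gender", "feminine"))  # ['Maus']
```

The bulk string is still readable by a person, but a program has no reliable way to tell where the term ends and the definition begins.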
3
The Shopping Basket Model
  • A shopping basket is a container to put stuff in.
  • The stuff in the basket can be tossed in loose without packaging.
  • Stuff gets lost.
  • Stuff gets mixed up.
4
Bulk Product Loading
5
Packaging Stuff
  • Keeps products clean and uncontaminated
  • Makes them easy to identify
  • Makes them easy to store
  • Makes them easy to reuse
6
Unpackaged Data
7
Organized Data
8
Nested Structures
  • Termbases form nested structures
  • Large structures
    • Smaller structures
      • Smaller structures
        • Very small structures
  • Analogy:
    • Boxes inside boxes
    • Matryoshka dolls
9
Hierarchical Model of the Term Entry
10
Termbase Structures: Level 1
  • Top level: termbase = virtual files
  • Actual file components:
    • Master file
    • History file
    • Index
    • Lok file
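The slide lists an index alongside the master file. As a rough illustration only (this is not how MultiTerm or any other tool actually stores its files), an index can be thought of as a lookup table from each term to the entries in the master file that contain it, so a term can be found without reading every entry. `master_file` and `build_index` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical master file: entry ID -> all terms in that entry (all languages).
master_file = {
    1: ["printer", "Drucker", "imprimante"],
    2: ["laser printer", "Laserdrucker", "imprimante laser"],
}

def build_index(master):
    """Map each term (lowercased) to the set of entry IDs that contain it."""
    index = defaultdict(set)
    for entry_id, terms in master.items():
        for term in terms:
            index[term.lower()].add(entry_id)
    return index

index = build_index(master_file)
print(sorted(index.get("drucker", set())))  # [1]
```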

11
Termbase Structures: Level 2
  • Termbase files contain Terminological Entries (Term Entries)
    • One concept
    • All languages
    • All terms
    • All descriptive info
    • All administrative info
12
Termbase Structures: Level 2
  • Other entry types
    • Bibliographical entries
    • Responsibility entries
    • Other shared resources
      • Thesaurus information
      • Classification systems
      • Info on external resources


13
Termbase Structures: Level 3
  • Term entries are hierarchical
  • Term Entry
    • Language group
      • All the terms associated with a language
    • Term info group (tig)
      • All the info associated with a given term
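The nesting of term entry, language group, and term information group can be sketched as three Python dataclasses, each holding a list of the next smaller structure, like boxes inside boxes. This is an illustrative model under assumed names (`TermEntry`, `LanguageSet`, `TermInfoGroup`), not the layout of any particular termbase file format.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class TermInfoGroup:
    """tig: a single term plus all the information about that term."""
    term: str
    info: dict[str, str] = field(default_factory=dict)

@dataclass
class LanguageSet:
    """Language group: all term info groups for one language."""
    lang: str
    tigs: list[TermInfoGroup] = field(default_factory=list)

@dataclass
class TermEntry:
    """Term entry: one concept, with a language set for each language."""
    entry_id: str
    lang_sets: list[LanguageSet] = field(default_factory=list)

    def terms(self, lang: str) -> list[str]:
        """Collect every term recorded for `lang` in this entry."""
        return [tig.term
                for ls in self.lang_sets if ls.lang == lang
                for tig in ls.tigs]

entry = TermEntry("c1", [
    LanguageSet("en", [TermInfoGroup("printer", {"partOfSpeech": "noun"})]),
    LanguageSet("de", [TermInfoGroup("Drucker", {"gender": "masculine"})]),
])
print(entry.terms("de"))  # ['Drucker']
```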
14
Termbase Structures: Data Categories
  • Big blue box: the termbase master file
    • Holds everything else
  • Smaller aqua box: the term entry
    • Holds data fields
      • Data categories
      • Data elements


15
Termbase Structures: Data Categories
  • Smaller mauve box: the language set
    • All term info groups for a given language
    • Any info that pertains just to a given language
  • Small green box: the term information group
    • A single term
    • All related information

16
Data-Element Related Features
  • Data Modeling Variance
    • Granularity
    • Choice of level for a data element concept (field name/attribute)
  • Data Element Autonomy
    • Combinability and repeatability
    • Elemental nature of data elements
  • Shared Resources
17
Granularity
  • The degree of detail that can be achieved by using the available data fields (data categories) to document terminological information
  • Example: a single "Grammar" field (low granularity) vs. separate fields (high granularity) for:
      • Part of speech
      • Gender
      • Number
18
Granularity
  • Low level of granularity:
    • Grammar: noun, masculine, singular
  • High level of granularity:
    • Part of speech: noun
    • Gender: masculine
    • Number: singular
  • Advantage of granularity: retrievability
  • Disadvantage: more work
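The retrievability advantage can be illustrated with a small Python sketch on hypothetical records: finding all masculine terms is a direct field comparison at high granularity, but at low granularity it requires parsing free text and only works if every entry follows the same writing convention (here, comma-separated values, which is an assumption made for this example).

```python
# Hypothetical records. Low granularity: one free-text "grammar" field.
low = [
    {"term": "Drucker", "grammar": "noun, masculine, singular"},
    {"term": "Maus", "grammar": "noun, feminine, singular"},
]

# High granularity: one data category per grammatical feature.
high = [
    {"term": "Drucker", "partOfSpeech": "noun", "gender": "masculine", "number": "singular"},
    {"term": "Maus", "partOfSpeech": "noun", "gender": "feminine", "number": "singular"},
]

def masculine_high(records):
    # Direct comparison on a dedicated field.
    return [r["term"] for r in records if r.get("gender") == "masculine"]

def masculine_low(records):
    # Must split free text; breaks as soon as someone writes "masc." or
    # uses a different separator.
    return [r["term"] for r in records
            if "masculine" in (part.strip() for part in r["grammar"].split(","))]

print(masculine_high(high), masculine_low(low))  # ['Drucker'] ['Drucker']
```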


19
Data Modeling Variance: Minimum Granularity
20
Data Modeling Variance: Increased Granularity
21
Data Modeling Variance: Elemental Nature of Data Elements (Violation)
22
Data Element Autonomy
  • Term autonomy: each term has its own field
  • Each term field can be combined with a full set of descriptive data categories
  • These data categories can in turn be repeated within the term information group
23
Elemental Nature of Data Elements
  • Only one kind of thing can occupy a data element
    • e.g., no terms or synonyms listed as such in definition fields
  • Only one of a thing can occupy a data element
    • e.g., only one term in a term field
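A termbase tool could flag likely violations of these two rules automatically. The Python sketch below uses simple regular-expression heuristics, which are assumptions of this example rather than an established rule set: separators such as ";", "," or "/" in a term field, and labels such as "syn." in a definition field. Because legitimate terms can contain commas or slashes, anything it flags should go to a person for review rather than be rejected outright. The function names are hypothetical.

```python
import re

# Heuristic only: separators that often signal several terms packed into one field.
TERM_SEPARATORS = re.compile(r"[;,/]")
# Heuristic only: labels that often signal a synonym written into a definition.
SYNONYM_LABEL = re.compile(r"\b(syn\.|synonym)", re.IGNORECASE)

def check_term_field(value: str) -> list:
    """Return possible problems with one term field (empty list = looks fine)."""
    problems = []
    if not value.strip():
        problems.append("empty term field")
    elif TERM_SEPARATORS.search(value):
        problems.append("possibly more than one term in a term field")
    return problems

def check_definition_field(value: str) -> list:
    """Flag definitions that appear to list synonyms."""
    if SYNONYM_LABEL.search(value):
        return ["synonym listed in a definition field"]
    return []

print(check_term_field("printer; printing device"))
```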
24
Data Modeling Variance: Data Model I
25
Data Modeling Variance: Term Autonomy
26
Data Modeling Variance II: Term Autonomy
27
Data Model II: Repeatability and Combinability
28
Data Model II: Repeatability and Combinability
29
Shared Resources
30
Shared Resources
  • Graphics
  • Charts
  • Audio
  • Video
  • Drawings
  • Disk Archives
  • Responsibility Records
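The point of shared resources is that each item is stored once and referenced from many entries. Here is a minimal Python sketch, with hypothetical IDs, file names, and a hypothetical convention that reference fields end in "Ref":

```python
# Hypothetical shared resources, each stored once and identified by an ID.
shared_resources = {
    "fig-017": {"type": "graphic", "file": "printer_diagram.png"},
    "resp-03": {"type": "responsibility record", "name": "Terminologist A"},
}

# Term entries refer to shared resources by ID instead of copying them.
entries = [
    {"id": "c1", "term": "printer", "graphicRef": "fig-017", "responsibilityRef": "resp-03"},
    {"id": "c2", "term": "laser printer", "graphicRef": "fig-017", "responsibilityRef": "resp-03"},
]

def resolve(entry: dict, ref_key: str) -> dict:
    """Follow an entry's reference to the shared resource it points to."""
    return shared_resources[entry[ref_key]]

def entries_using(resource_id: str) -> list:
    """List the IDs of entries that point to a given shared resource."""
    return [e["id"] for e in entries
            if any(k.endswith("Ref") and v == resource_id for k, v in e.items())]

print(entries_using("fig-017"))  # ['c1', 'c2']
```

Because both entries point to "fig-017" rather than holding their own copy, replacing the drawing in `shared_resources` updates it for every entry that uses it.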
31
Bibliographical Entries
32
Graphics File
33
Shared Graphics File
34
Shared Graphics File
35
Termbase Structures: Inside the Term Entry
  • Terms
    • Term autonomy
    • All terms are created equal
    • One term per term element
    • Complete documentation of each term possible
36
Termbase Structures: Term Type
  • Main entry term
  • Synonym
  • Abbreviation
  • Full form
  • Variant
  • Phrase
  • Collocation
  • Boilerplate
37
Termbase Structures: Term-Related Info
  • Term
    • Part of speech
    • Grammatical gender
    • Grammatical number (use when necessary)
      • Plural form
    • Term type (Type) (see next slide)
    • Status
    • Regional label (not in our model)
    • Pronunciation (not in our model)
    • Register (usage register)
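The term-related data categories lend themselves to controlled values. The Python sketch below takes the term types from the earlier term-type list and adds picklists for part of speech, gender, and number; those three value sets, the `PICKLISTS` name, and `validate_term_info` are hypothetical, chosen only for illustration.

```python
from enum import Enum

class TermType(Enum):
    """Term types as listed on the term-type slide."""
    MAIN_ENTRY_TERM = "main entry term"
    SYNONYM = "synonym"
    ABBREVIATION = "abbreviation"
    FULL_FORM = "full form"
    VARIANT = "variant"
    PHRASE = "phrase"
    COLLOCATION = "collocation"
    BOILERPLATE = "boilerplate"

# Hypothetical picklists; a real termbase would define its own value sets.
PICKLISTS = {
    "partOfSpeech": {"noun", "verb", "adjective", "adverb"},
    "gender": {"masculine", "feminine", "neuter"},
    "number": {"singular", "plural"},
    "termType": {t.value for t in TermType},
}

def validate_term_info(info: dict) -> list:
    """Report values that are not on the picklist for their data category."""
    errors = []
    for category, value in info.items():
        allowed = PICKLISTS.get(category)
        if allowed is not None and value not in allowed:
            errors.append(f"{category}: {value!r} is not an allowed value")
    return errors

print(validate_term_info({"gender": "masc."}))
```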



38
 
39
 
40
 
41
Termbase Structures: Definition-Related Info
  • Definition
    • Source ID
      • (Points to shared resources)
    • Definition type
      • Translation?
    • Administrative info
      • Responsibility
      • Date
42
Termbase Structures: Context-Related Info
  • Context
    • Source ID
      • (Points to shared resources)
    • Context type
      • Translation?
    • Administrative info
      • Responsibility
      • Date
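Definitions and contexts carry the same kind of documentation: a source ID pointing to shared resources, a type, and administrative information. A single Python dataclass can therefore model both. In this sketch the bibliography contents, the class name `DocumentedText`, and the choice to model the type as a simple translation flag are all assumptions.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical shared bibliographical entries, stored once and keyed by source ID.
BIBLIOGRAPHY = {
    "src-12": "Hypothetical Glossary of Office Equipment, 2005",
}

@dataclass
class DocumentedText:
    """Structure shared by definitions and contexts."""
    text: str
    source_id: str        # points to a shared bibliographical entry
    is_translation: bool  # definition/context type (assumed to be a translation flag)
    responsibility: str   # administrative info: who entered it
    date_entered: date    # administrative info: when

    def source(self) -> str:
        """Look up the full bibliographical entry for this text."""
        return BIBLIOGRAPHY[self.source_id]

definition = DocumentedText("device that prints text or images on paper",
                            "src-12", False, "Terminologist A", date(2006, 3, 1))
context = DocumentedText("Der Drucker druckt keine Farben mehr.",
                         "src-12", False, "Terminologist A", date(2006, 3, 1))
print(definition.source())  # Hypothetical Glossary of Office Equipment, 2005
```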
43
Termbase Structures: Other Data Categories
  • Concept relations
  • Notes
  • Other administrative information
  • Bibliographical information
  • Special categories, e.g.:
    • For standardization
    • For inventory control
44
MultiTerm Levels
  • Index
    • Terms
    • Term-like elements
      • (collocations, boilerplate)
  • Text
    • Free-form data entry
  • Attributes
    • Controlled picklists
      • Variant text
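The slide distinguishes three levels of MultiTerm fields. The Python sketch below is a generic model of that distinction, not MultiTerm's actual data format or API: index fields hold terms and are searchable, text fields accept any free-form entry, and attribute fields only accept values from a controlled picklist. All class and field names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FieldDef:
    level: str                              # "index", "text", or "attribute"
    picklist: frozenset[str] = frozenset()  # used only by attribute fields

@dataclass
class Termbase:
    fields: dict[str, FieldDef]
    records: list[dict[str, str]] = field(default_factory=list)

    def add(self, record: dict[str, str]) -> None:
        """Store a record, rejecting attribute values not on the picklist."""
        for name, value in record.items():
            fd = self.fields[name]  # KeyError if the field is not defined
            if fd.level == "attribute" and value not in fd.picklist:
                raise ValueError(f"{name}: {value!r} is not in the picklist")
        self.records.append(record)

    def lookup(self, term: str) -> list[dict[str, str]]:
        """Search only index-level fields, as a term lookup would."""
        index_fields = [n for n, fd in self.fields.items() if fd.level == "index"]
        return [r for r in self.records
                if any(r.get(n) == term for n in index_fields)]

tb = Termbase({
    "English": FieldDef("index"),
    "German": FieldDef("index"),
    "Note": FieldDef("text"),
    "Gender": FieldDef("attribute", frozenset({"masculine", "feminine", "neuter"})),
})
tb.add({"English": "printer", "German": "Drucker", "Gender": "masculine",
        "Note": "Any free-form remark can go here."})
print(len(tb.lookup("Drucker")))  # 1
```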
45
Matryoshka Dolls