1
Computer Applications for Translators: Basic CAT Tool Components
      • Sue Ellen Wright
      • Kent State University
      • Institute of Applied Linguistics
      • © Sue Ellen Wright  &  Alan Melby 2005
2
Computer Applications for Translators
  • Basic text production tools
  • Controlled authoring tools
  • Corpus management tools
  • Terminology management tools
  • Translation “workbench” type tools
    • Translation memory
    • Tagged text protection
  • Machine translation systems


3
Basic text production tools
  • Word-processing software
    • Word
  • HTML-editors
    • Dreamweaver, HomeSite
  • Text editors
    • Notepad, WordPad, UltraEdit
  • Desktop publishing systems
    • QuarkXPress, FrameMaker, PageMaker


4
Processing Order
  • Text/content production (customer side)
  • Corpus compilation
  • Text alignment of legacy files
  • Term extraction
  • Terminology management
  • Translation using integrated translation tools
    • TM + WP + TermBase
  • Project & workflow management (ongoing)



5
Organization of the Eight Tool Functions
6
Corpus Management Tools
  • Concordancers
  • Statistical analyses
  • Global search capabilities
  • Frequency reports
  • XML-based intelligent searches
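
To make the concordancer and frequency-report functions concrete, here is a minimal Python sketch. It is an illustration only, not the behavior of any particular corpus tool; the function names (`tokenize`, `kwic`, `frequency_report`) and the simple word-based tokenization are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def kwic(text, keyword, width=3):
    """Return keyword-in-context lines as (left context, keyword, right context)."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def frequency_report(text, top=5):
    """Return the `top` most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(top)
```

Calling `kwic(corpus_text, "memory")` lists every occurrence of the word with a few words of context on each side, which is the core of what a concordancer displays.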
7
Sources of Corpus Components
  • Enterprise information resources
  • Existing parallel texts
  • Texts mined from the Web
  • Why collect texts?
    • Leveraging existing translation solutions
    • Creating a knowledge base, in some cases an enterprise-wide one
    • Leveraging existing terminology solutions
8
Terminology Extraction
  • When is it most useful?
    • If you have a good corpus available, particularly if you have parallel or aligned texts
    • If your source text is in machine-readable form
  • When wouldn’t I use it?
    • When the source text is not in electronic form
    • When resources are scattered through many very small texts
9
Term Extraction
  • What are the advantages of using Term Extract or other extraction tools?
    • Easier working interface
    • Easy generation of starter term entries
    • Easy access to concordances & selection of context references
    • Easy export (when it works!)
10
Terminology Management
  • Why store terminological information?
    • Repeatability and reproducibility
    • Reuse of previously researched equivalents
    • Harmonization of usage across a translation team
    • Application in the internationalization phase: reuse by technical writers within enterprises
    • Interaction in translation memory environments


11
Term Level before Translation
  • Term candidate extraction, either “manual” or automatic
  • Terminology research using text corpora
    • Web-based screen scraping
    • Concordancing, etc.
  • Listing single-word (simple) terms
  • Listing multiword terms
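
As a rough illustration of automatic term candidate extraction, the following Python sketch counts single-word and multiword sequences and keeps those that recur. It is a frequency-only heuristic under stated assumptions: the stopword list, the `term_candidates` name, and the thresholds are invented for this example, and real extraction tools typically add linguistic filtering.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real tools use much larger lists).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "for", "on", "with"}


def term_candidates(text, max_len=3, min_freq=2):
    """Return recurring word sequences of up to `max_len` words with their counts."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip sequences that start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    # Note: this naive version ignores sentence boundaries.
    return {term: c for term, c in counts.items() if c >= min_freq}
```

The output is only a list of candidates; a terminologist still has to decide which of them are genuine terms.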
12
Term Level during Translation
  • Manual term lookup in interactive word-processing/term database environment
  • Automatic term lookup using translation memory and other CAT tools
  • Concordancing for comparison of fuzzy matches
  • Assured consistency, accuracy, & spelling
13
Term Level after Translation
  • Terminology consistency check
  • Non-allowed terminology check
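
Both post-translation checks can be sketched in a few lines of Python. This is a simplified illustration, assuming a termbase held as a plain dictionary; the function names and data layout are hypothetical, and plain string matching will miss inflected forms that a real checker would need to handle.

```python
import re


def find_forbidden_terms(target_text, forbidden):
    """Find non-allowed terms in the target text.

    `forbidden` maps each non-allowed term to its approved replacement.
    Returns (position, matched text, approved replacement) tuples.
    """
    hits = []
    for bad, good in forbidden.items():
        pattern = r"\b" + re.escape(bad) + r"\b"
        for m in re.finditer(pattern, target_text, flags=re.IGNORECASE):
            hits.append((m.start(), m.group(0), good))
    return sorted(hits)


def check_consistency(segment_pairs, termbase):
    """Flag segments whose source contains a termbase term
    but whose target lacks the approved equivalent.

    `termbase` maps source terms to approved target terms.
    """
    problems = []
    for idx, (src, tgt) in enumerate(segment_pairs):
        for src_term, tgt_term in termbase.items():
            if src_term.lower() in src.lower() and tgt_term.lower() not in tgt.lower():
                problems.append((idx, src_term, tgt_term))
    return problems
```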


14
Sample Bitext (Aligned Segments)
15
Difference between Segmentation & Bitext
  • TM: Segments are stored separately.
  • The original source or target text can be reassembled only if one of these texts is fed into the TM.
  • Bitext: Segments are identified in situ, and the whole text is retained in memory.
  • Individual segments are identified and presented dynamically as needed (Transcorpora™).
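
The contrast between the two storage models can be sketched as data structures. This is a conceptual illustration only and does not reflect the internal format of any product; the `Bitext` class and `build_bitext` helper are hypothetical names. A TM is modeled here as a plain dictionary of isolated segment pairs, while the bitext keeps both full texts and records only the offsets of each aligned segment.

```python
from dataclasses import dataclass


@dataclass
class Bitext:
    """Whole source and target texts plus the offsets of aligned segments."""
    source: str
    target: str
    spans: list  # [((src_start, src_end), (tgt_start, tgt_end)), ...]

    def segment(self, i):
        """Cut aligned segment `i` out of the retained texts on demand."""
        (ss, se), (ts, te) = self.spans[i]
        return self.source[ss:se], self.target[ts:te]


def build_bitext(source, target, pairs):
    """Locate each aligned pair inside the full texts and record its offsets."""
    spans = []
    s_pos = t_pos = 0
    for src_seg, tgt_seg in pairs:
        s_start = source.index(src_seg, s_pos)
        t_start = target.index(tgt_seg, t_pos)
        s_pos = s_start + len(src_seg)
        t_pos = t_start + len(tgt_seg)
        spans.append(((s_start, s_pos), (t_start, t_pos)))
    return Bitext(source, target, spans)


def build_tm(pairs):
    """TM model: segment pairs stored separately; document order and context are not kept."""
    return dict(pairs)
```

With the bitext model the surrounding context of any segment stays available, because the full texts are never discarded.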
16
Text Alignment
  • Why align existing translations?
    • Convert legacy translations for use in creating new translations
    • Create structured resources for term extraction
    • Share existing translations with translation teams in an interactive way
17
Segment Level before Translation
  • New text segmentation
  • Previous source-target text alignment
  • Indexing
  • “Clean-up” phase
    • TM prepared for leveraging – reuse of existing translation segments
    • Potential resource for terminology & text-based translation research
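
Segmentation of new text can be approximated with a simple rule-based splitter. The sketch below is a minimal example, assuming sentence-level segments and a tiny hypothetical abbreviation list; production tools use configurable segmentation rules that are far more thorough.

```python
import re

# Abbreviations that should not end a segment (illustrative, not exhaustive).
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "esp."}


def segment(text):
    """Split text into sentence-like segments at ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    segments = []
    for part in parts:
        if not part:
            continue
        # Re-join a piece that was split off right after a known abbreviation.
        if segments and segments[-1].split()[-1].lower() in ABBREVIATIONS:
            segments[-1] += " " + part
        else:
            segments.append(part)
    return segments
```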
18
Segment Level during Translation
  • So-called “pre-translation” phase
  • Translation memory lookup
    • Fuzzy matching
    • Concordance information
  • Machine translation (esp. example-based)
    • Valuable for controlled language texts
    • Valuable for “gisting”
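
Translation memory lookup with fuzzy matching can be sketched with Python's standard `difflib` module. This is a stand-in for illustration: commercial tools use their own similarity measures, and the `fuzzy_lookup` name, the dictionary-based TM, and the 0.75 threshold are assumptions for this example.

```python
from difflib import SequenceMatcher


def fuzzy_lookup(segment, tm, threshold=0.75):
    """Return TM entries similar to `segment`, best match first.

    `tm` maps stored source segments to their translations.
    Each result is (similarity score, stored source, stored target).
    """
    matches = []
    for src, tgt in tm.items():
        score = SequenceMatcher(None, segment.lower(), src.lower()).ratio()
        if score >= threshold:
            matches.append((round(score, 2), src, tgt))
    return sorted(matches, reverse=True)
```

A score of 1.0 corresponds to an exact match; anything between the threshold and 1.0 is a fuzzy match that the translator must review and adapt.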
19
Segment Level after Translation
  • Missing segment detection
  • Grammar check
  • Format retention or regeneration
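
Missing segment detection is straightforward once source and target are available as parallel segment lists. The following is a minimal sketch under that assumption; the `find_missing_segments` name and the two issue labels are hypothetical.

```python
def find_missing_segments(source_segments, target_segments):
    """Flag source segments with no translation or an unchanged copy as the target.

    Returns (segment index, issue) tuples, where issue is
    "missing" or "untranslated".
    """
    issues = []
    for i, src in enumerate(source_segments):
        tgt = target_segments[i] if i < len(target_segments) else ""
        if not tgt.strip():
            issues.append((i, "missing"))
        elif tgt.strip() == src.strip():
            issues.append((i, "untranslated"))
    return issues
```

Segments flagged as "untranslated" may be legitimate (for example, product names or numbers), so the result is a review list rather than a list of confirmed errors.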
20
Translation Workbench Tools
  • Trados/SDL
    • http://www.trados.com/
    • http://www.translationzone.com/
    • http://www.sdl.com/
  • STAR (Transit)
    • http://www.star-group.net/eng/software/sprachtech/transit.html
  • Déjà Vu
    • http://www.atril.com/


21
Translation Workbench Tools
  • TermSeek
    • http://www.termseekinc.com/
  • MultiCorpora/MultiTrans
    • http://www.multicorpora.ca/index_e.html
  • I-Term
    • http://www.danterm.dk/iterm.htm


22
Tool Interaction
  • Combined tool components:
    • Word-processing (e.g., Word) or other text processing (TagEditor)
    • Automatic term lookup (MultiTerm)
    • Term in context (Concordancer)
    • Translation memory
  • Why do it?
    • Increased efficiency, increased accuracy
23
Project Environments
  • Coordinating project stages in team environments
  • Storing and processing data
    • The master program(s) need to be able to find all the files in order to interact.
    • MultiTerm: work off your own directory on the LAN.
    • Read-write problems arise if you constantly swap a floppy or Zip disk.
    • Save data to your LAN folder.
24
Infrastructure
  • Client-service provider environment
  • Translation/localization (T9n/L10n) team environment
  • Document creation & management system
  • Terminology database
  • Translation memory database
  • Networking environment
25
Translation Workflow & Billing
  • Workflow in medium to large translation and localization projects
    • Project management software
  • Workflow in content management, globalization management systems
26
Tools