Machine Translation (MT) is a research area of high theoretical
and practical value. Research in machine translation draws on several disciplines,
including linguistics, mathematics, artificial intelligence, and computer science.
This thesis describes the design and implementation of THCEMT, a Chinese-English machine
translation system based on stratified analysis of syntax and semantics.
We first discuss the characteristics that distinguish Chinese from
other natural languages. Our system uses a hybrid grammar that incorporates
Context-Free Grammar, Attribute-Constraint Grammar, and Case Grammar. The system is
rule-based and combines top-down and bottom-up analysis methods. Particular emphasis is
placed on semantic analysis and on the treatment of some special Chinese words.
At the lexical analysis level, we propose an efficient automatic Chinese word
segmentation method that combines the Adjacent Matching method with post-segmentation
correction based on syntactic and semantic constraints. This method resolves the common
segmentation errors caused by reiterative locution, cross-link ambiguity, and
polysemantic ambiguity. Within the coverage of our segmentation rules, the accuracy of
Chinese word segmentation is approximately 100%. Various methods are used to
disambiguate words at both the syntactic and semantic levels.
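
The two-stage idea (a dictionary-matching pass followed by a correction pass) can be illustrated with a minimal sketch. It rests on several assumptions: forward maximum matching stands in for the Adjacent Matching method, whose details are not given here; the correction step is a purely lexical stand-in for the syntactic and semantic constraints; and the lexicon and the function names `forward_max_match` and `repair_crossing` are hypothetical.

```python
# Toy lexicon (hypothetical); a real system would load a full dictionary.
LEXICON = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right pass: take the longest lexicon word at each position.

    A single character is always accepted as a fallback, so the loop advances.
    """
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def repair_crossing(words, lexicon):
    """Correction pass for crossing ambiguity (lexical stand-in only).

    If a multi-character word is followed by a stray single character that is
    not a lexicon word, try moving the boundary one character to the left and
    keep the new split when both halves are lexicon words.
    """
    out = list(words)
    for i in range(len(out) - 1):
        left, right = out[i], out[i + 1]
        if len(left) > 1 and len(right) == 1 and right not in lexicon:
            new_left, new_right = left[:-1], left[-1] + right
            if new_left in lexicon and new_right in lexicon:
                out[i], out[i + 1] = new_left, new_right
    return out


first_pass = forward_max_match("研究生命起源", LEXICON)
corrected = repair_crossing(first_pass, LEXICON)
print(first_pass)   # ['研究生', '命', '起源']
print(corrected)    # ['研究', '生命', '起源']
```

In this toy case the greedy pass wrongly prefers the longer word 研究生 and strands 命; the correction pass shifts the boundary to recover 研究 / 生命 / 起源. A constraint-based correction would decide such cases from the syntactic and semantic context rather than from lexicon membership alone.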
At the syntactic and semantic analysis level, we propose a Chinese
sentence analysis method that combines the top-down segmentation of word groups with
bottom-up unification. This method makes efficient use of the syntactic and semantic
constraints in Chinese sentences and emphasizes semantic analysis. It
is also well stratified and easy to implement.
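
As a rough illustration of how unification can enforce syntactic and semantic constraints while word groups are combined bottom-up, the following sketch unifies two feature structures represented as nested dictionaries. The representation, the feature names (`cat`, `sem`), and the function `unify` are assumptions made for this example and are not taken from the system itself.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None when two atomic values clash.
    Neither input is modified.
    """
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            merged = unify(result[key], value)
            if merged is None:
                return None
            result[key] = merged
        elif result[key] != value:
            return None
    return result


# Hypothetical constraint: the verb 吃 (eat) requires a food-denoting object.
object_constraint = {"sem": "food"}
apple = {"cat": "NP", "sem": "food"}      # 苹果
stone = {"cat": "NP", "sem": "mineral"}   # 石头

print(unify(object_constraint, apple))    # {'sem': 'food', 'cat': 'NP'}
print(unify(object_constraint, stone))    # None
```

A failed unification rejects a candidate analysis, which is one way a semantic constraint can prune syntactically possible but implausible combinations.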
At the rule-processing level, we have implemented an
extensible system in which the rule bases are separated from the main program, and set up
a natural, easy-to-interpret knowledge representation system. In this way, changes to the
rules do not require frequent changes to the program code. The interpretation
of rules is also divided into several levels, which partially solves the rule-conflict
problem typical of rule-based systems.
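
The separation of rules from program code, and the use of levels to settle conflicts, can be sketched as follows. The JSON rule format, the tag names (`NUM`, `CL`, `N`), the `level` field, and the functions `load_rules` and `first_match` are all hypothetical; the sketch only assumes that lower-numbered levels are interpreted first.

```python
import json

# Rules live in data, not in code (hypothetical format); in practice this
# text would be read from a separate rule-base file.
RULE_TEXT = """
[
  {"level": 2, "match": ["NUM", "CL"],      "result": "QP"},
  {"level": 1, "match": ["NUM", "CL", "N"], "result": "NP-quantified"}
]
"""


def load_rules(text):
    """Parse the rule base and order it so lower levels are tried first."""
    return sorted(json.loads(text), key=lambda rule: rule["level"])


def first_match(tags, rules):
    """Return the result of the first rule whose pattern is a prefix of tags."""
    for rule in rules:
        pattern = rule["match"]
        if tags[:len(pattern)] == pattern:
            return rule["result"]
    return None


rules = load_rules(RULE_TEXT)
print(first_match(["NUM", "CL", "N"], rules))  # NP-quantified
print(first_match(["NUM", "CL"], rules))       # QP
```

Both rules match the input `NUM CL N`, but the level ordering lets the more specific rule win without any change to the interpreter; adding or reordering rules only touches the rule text.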
We have built an automatic Chinese-English Machine
Translation system that runs on PCs, with rule bases separated from the translation program.
The whole process of translation, from input of source texts to output of target texts, is
fully automatic. Tests on example Chinese texts show that the system can produce
accurate and intelligible English texts within the limits of the existing rule bases.