@inproceedings{484652,
  author    = {Sojka, Petr and Antoš, David},
  title     = {Context Sensitive Pattern Based Segmentation: A Thai Challenge},
  booktitle = {Proceedings of EACL 2003 workshop Computational Linguistics for South Asian Languages -- Expanding Synergies with Europe},
  publisher = {Association for Computational Linguistics},
  address   = {Budapest},
  location  = {Budapest},
  year      = {2003},
  pages     = {65--72},
  isbn      = {1-932432-02-7},
  keywords  = {segmentation Thai competing patterns},
  language  = {eng},
  url       = {http://computing.open.ac.uk/Sites/EACLSouthAsia/papers.htm}
}
TY  - CONF
ID  - 484652
AU  - Sojka, Petr
AU  - Antoš, David
PY  - 2003
TI  - Context Sensitive Pattern Based Segmentation: A Thai Challenge
T2  - Proceedings of EACL 2003 workshop Computational Linguistics for South Asian Languages -- Expanding Synergies with Europe
PB  - Association for Computational Linguistics
CY  - Budapest
SP  - 65
EP  - 72
SN  - 1932432027
KW  - segmentation Thai competing patterns
UR  - http://computing.open.ac.uk/Sites/EACLSouthAsia/papers.htm
N2  - A Thai written text is a string of symbols without explicit word boundaries. A method for a development of a segmentation tool from a corpus of already segmented text is described. The methodology is based on the technology of competing patterns. A new UNICODE pattern generation program, OPATGEN, is used for the learning phase. We have shown feasibility of our methodology by generating patterns for Thai segmentation from already segmented text of the Thai corpus ORCHID: the segmentation algorithm quickly reaches F-score of 93 %. Finally, we enumerate possible new applications based on the pattern technique, and conclude with the suggestion of a general Pattern Translation Process. The technology is general and can be used for any other segmentation tasks as phonetic, morphologic segmentation, word hyphenation, sentence segmentation and text topic segmentation for any language.
ER  - 
SOJKA, Petr and David ANTOŠ. Context Sensitive Pattern Based Segmentation: A Thai Challenge. In \textit{Proceedings of EACL 2003 workshop Computational Linguistics for South Asian Languages -- Expanding Synergies with Europe}. Budapest: Association for Computational Linguistics, 2003, pp.~65--72. ISBN~1-932432-02-7.