RYCHLÝ, Pavel and Samuel ŠPALEK. Utok: The Fast Rule-based Tokenizer. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Proceedings of the Sixteenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2022. Brno: Tribun EU, 2022, pp. 149-154. ISBN 978-80-263-1752-4.
Other formats:
BibTeX
LaTeX
RIS
@inproceedings{2240159,
  author = {Rychlý, Pavel and Špalek, Samuel},
  address = {Brno},
  booktitle = {Proceedings of the Sixteenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2022},
  editor = {Aleš Horák and Pavel Rychlý and Adam Rambousek},
  keywords = {tokenizer; tokenization; text processing},
  howpublished = {printed version "print"},
  language = {eng},
  location = {Brno},
  isbn = {978-80-263-1752-4},
  pages = {149--154},
  publisher = {Tribun EU},
  title = {Utok: The Fast Rule-based Tokenizer},
  url = {https://nlp.fi.muni.cz/raslan/2022/paper24.pdf},
  year = {2022}
}
TY  - CONF
ID  - 2240159
AU  - Rychlý, Pavel
AU  - Špalek, Samuel
PY  - 2022
TI  - Utok: The Fast Rule-based Tokenizer
PB  - Tribun EU
CY  - Brno
SN  - 9788026317524
KW  - tokenizer
KW  - tokenization
KW  - text processing
UR  - https://nlp.fi.muni.cz/raslan/2022/paper24.pdf
N2  - Tokenization is one of the first processing steps in most natural language processing applications. The paper introduces a new tokenizer, Utok, which follows the Unitok tokenizer in its simplicity of configuration for different languages and is much faster in processing speed.
ER  -
RYCHLÝ, Pavel and Samuel ŠPALEK. Utok: The Fast Rule-based Tokenizer. In Aleš Horák, Pavel Rychlý, Adam Rambousek. \textit{Proceedings of the Sixteenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2022}. Brno: Tribun EU, 2022, pp.~149--154. ISBN~978-80-263-1752-4.