Machine learning and natural language processing
doc. RNDr. Lubomír Popelínský, Ph.D.

QUESTIONS AND TASKS:

  • Natural language (pre)processing techniques and their relevance for building machine learning models applicable to text

  • Bag of words representation of text - pros and cons

  • Text representations. When the ordering of words matters and when it does not. Web mining

  • Main text mining tasks

  • ML for disambiguation

  • CNN for NLP

  • Distributional hypothesis - historical context, linguistic motivations and practical implementations

  • Distributional vs. formal semantics

  • Latent semantic analysis - basic principles, pros and cons

  • Word embeddings - comparison of selected popular approaches

  • Basic principles of language models and their training process

  • Techniques for data augmentation

  • NN for machine translation

  • Text clustering

  • Recurrent NN for NLP. Describe one task.

  • Outliers in text data.

  • Methods for outlier detection in text

  • Why we need LSTM. Describe a task where LSTMs are useful

  • ILP. Why we need it. Describe a task where ILP is useful

  • Key words, keyness. How to compute.

  • Relational learning (ILP) for key words (key phrases) detection

  • Text summarization. Two main approaches.

  • Extractive summarization. ROUGE-n.

  • Sentiment analysis - the definition of the field, justification of its practical relevance and main challenges

  • Detailed overview of a selected lexicon-based approach to sentiment analysis

  • Detailed overview of a selected classical machine learning approach to sentiment analysis

  • Detailed overview of a selected deep learning approach to sentiment analysis

  • Comparison of lexicon-based, classical machine learning and deep learning approaches to sentiment analysis

  • Basic principles of knowledge representation

  • Ontologies vs. knowledge graphs - pros and cons of each approach to knowledge representation

  • The stack of typical tasks in ontology learning

  • Main challenges and open problems of ontology learning

  • Techniques used for term extraction, synonym discovery and concept formation

  • Techniques used for taxonomy extraction

  • Techniques used for relation, rule and axiom extraction

  • Overview of a selected deep learning approach to knowledge extraction

Teacher recommends to study from 15/9/2021 to 21/9/2021.
Teacher recommends to study from 22/9/2021 to 28/9/2021.
Teacher recommends to study from 29/9/2021 to 5/10/2021.
Teacher recommends to study from 6/10/2021 to 12/10/2021.
Teacher recommends to study from 13/10/2021 to 19/10/2021.
Teacher recommends to study from 20/10/2021 to 26/10/2021.
Teacher recommends to study from 27/10/2021 to 2/11/2021.
Teacher recommends to study from 3/11/2021 to 9/11/2021.
Teacher recommends to study from 10/11/2021 to 16/11/2021.
Teacher recommends to study from 17/11/2021 to 23/11/2021.
Teacher recommends to study from 1/12/2021 to 8/12/2021.
Teacher recommends to study from 8/12/2021 to 14/12/2021.

1 Course overview, project assignment overview. Overview of text pre-processing

Evaluation

  • 20 p. poster

  • 50 p. project

  • 30 p. final exam (obligatory, must obtain at least 15 p.)

<50 F, <60 E, <70 D, <80 C, <90 B, >=90 A; course credit (zápočet): >= 45 p.
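
The grading scale above can be read as a simple threshold lookup. The following is a minimal sketch only: the function name `final_grade` is hypothetical, and it assumes that missing the 15-point exam minimum results in an F regardless of the total, which the syllabus does not state explicitly.

```python
def final_grade(poster: float, project: float, exam: float) -> str:
    """Map points to a letter grade using the thresholds listed in the syllabus."""
    if not (0 <= poster <= 20 and 0 <= project <= 50 and 0 <= exam <= 30):
        raise ValueError("points out of range")
    # Assumption (not stated in the syllabus): failing the exam minimum means F.
    if exam < 15:
        return "F"
    total = poster + project + exam
    # Thresholds are exclusive upper bounds: <50 F, <60 E, <70 D, <80 C, <90 B.
    for upper, letter in [(50, "F"), (60, "E"), (70, "D"), (80, "C"), (90, "B")]:
        if total < upper:
            return letter
    return "A"
```

For example, 15 + 40 + 20 = 75 points falls below the 80-point bound and therefore maps to C.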


A Python notebook supporting the lecture part on typical NLP pipelines:

Additional readings:



QUESTIONS AND TASKS:

  • Natural language (pre)processing techniques and their relevance for building machine learning models applicable to text

  • Bag of words representation of text - pros and cons
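
As a starting point for the bag-of-words question, here is a minimal sketch using only the Python standard library. The tokenizer (lowercasing plus a simple regular expression) and the function name `bag_of_words` are illustrative assumptions, not the pipeline used in the course notebook.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count word occurrences, ignoring case, punctuation and word order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


a = bag_of_words("The dog bit the man.")
b = bag_of_words("The man bit the dog.")

# Both sentences contain the same words with the same counts,
# so their bag-of-words representations are identical.
same = (a == b)
print(same)  # True
```

The example shows both sides of the trade-off: the representation is trivial to compute, but two sentences with opposite meanings become indistinguishable because word order is discarded.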

2 ML techniques for NLP 1


QUESTIONS AND TASKS:

  • Text representations. When the ordering of words matters and when it does not. Web mining

  • Main text mining tasks

  • ML for disambiguation

  • CNN for NLP


MATERIALS FOR THE LABS IN WEEK 02 (all necessary details in the notebook):

3 Distributional semantics, LSA, word embeddings

Lecture slides (all references etc. are inline):

READINGS


QUESTIONS AND TASKS:

  • Distributional hypothesis - historical context, linguistic motivations and practical implementations

  • Distributional vs. formal semantics

  • Latent semantic analysis - basic principles, pros and cons

  • Word embeddings - comparison of selected popular approaches

  • Basic principles of language models and their training process

4 Illness

https://is.muni.cz/el/fi/podzim2020/PA164/um/65469229/Mining_text_data_Contents_and_Clustering_Aggarwal.pdf



MATERIALS FOR THE LABS IN WEEK 04 (all necessary details in the notebook):

5 ML techniques for NLP 2. Text clustering.

6 ML for NLP III: Recurrent NN. Outliers in text I.

Recurrent NN

Outlier detection in text I


QUESTIONS AND TASKS:

  • Recurrent NN for NLP. Describe one task.

  • Outliers in text data.

  • Methods for outlier detection in text



MATERIALS FOR THE LABS IN WEEK 06 (all necessary details in the notebook):

Independent Czechoslovak State Day

8 LSTM. RNN/LSTM case studies. ILP

LSTM

RNN/LSTM case studies


Michal Hala

Tomáš Houfek

Andrej Betík


Dominik Tuchyňa


Radoslav Sabol


ILP

Poster session. Preliminary program

Neural Code Search: How Facebook Uses Neural Networks to Help Developers Search for Code Snippets

Andrej Betík

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Michal Hala

Hierarchical Attention Network

Tomáš Houfek

Deep learning for question answering in Czech

Radoslav Sabol


Transformers

Dominik Tuchyňa


QUESTIONS AND TASKS:

  • Why we need LSTM. Describe a task where LSTMs are useful

  • ILP. Why we need it. Describe a task where ILP is useful


Materials for the labs in week 08:


9 Poster session

Preliminary program


Neural Code Search: How Facebook Uses Neural Networks to Help Developers Search for Code Snippets

Andrej Betík


XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Michal Hala


Hierarchical Attention Network

Tomáš Houfek


Deep learning for question answering in Czech

Radoslav Sabol


Transformers

Dominik Tuchyňa





10 Learning language in logic. Keyness. Summarization

Learning language in logic

Keyness

https://is.muni.cz/el/fi/podzim2021/PA164/um/l8_keyness_and_keywords/Key_Words_and_Key_Sections_--_TALC_Paris.pdf

https://is.muni.cz/el/fi/podzim2021/PA164/um/l8_keyness_and_keywords/keyness-porto.pdf

Text summarization

https://is.muni.cz/el/fi/podzim2021/PA164/um/l7_text_summ/Text_Summarization.pdf

https://is.muni.cz/el/fi/podzim2021/PA164/um/l7_text_summ/ITAT2018-presentation.pdf

QUESTIONS AND TASKS:

  • Key words, keyness. How to compute.

  • Relational learning (ILP) for key words (key phrases) detection

  • Text summarization. Two main approaches.

  • Extractive summarization. ROUGE-n.
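
For the ROUGE-n question, the following is a minimal sketch of the recall-oriented n-gram overlap between a candidate summary and a single reference summary. Whitespace tokenization, the single-reference setting and the function names `ngrams` and `rouge_n` are simplifying assumptions made for illustration; this is not the implementation from the lecture materials.

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    """Return a multiset of all n-grams in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-n recall: clipped n-gram matches divided by reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


r1 = rouge_n("the cat sat on the mat", "the cat is on the mat", n=1)
r2 = rouge_n("the cat sat on the mat", "the cat is on the mat", n=2)
```

In this example five of the six reference unigrams are matched (ROUGE-1 = 5/6), while only three of the five reference bigrams are matched (ROUGE-2 = 0.6), which illustrates how higher n rewards preserved word order.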


Materials for the labs in week 08:

11 Sentiment analysis (including lexicons of “sentiment words”)


Lecture materials:

Labs (06) materials:

QUESTIONS AND TASKS:

  • Sentiment analysis - the definition of the field, justification of its practical relevance and main challenges

  • Detailed overview of a selected lexicon-based approach to sentiment analysis

  • Detailed overview of a selected classical machine learning approach to sentiment analysis

  • Detailed overview of a selected deep learning approach to sentiment analysis

  • Comparison of lexicon-based, classical machine learning and deep learning approaches to sentiment analysis

12 Machine learning for knowledge extraction from text


Lecture materials:

QUESTIONS AND TASKS:

  • Basic principles of knowledge representation

  • Ontologies vs. knowledge graphs - pros and cons of each approach to knowledge representation

  • The stack of typical tasks in ontology learning

  • Main challenges and open problems of ontology learning

  • Techniques used for term extraction, synonym discovery and concept formation

  • Techniques used for taxonomy extraction

  • Techniques used for relation, rule and axiom extraction

  • Overview of a selected deep learning approach to knowledge extraction