Seq2Seq & Neural Machine Translation

Published at

2026/02/23

Last edited time

2026/02/24 03:01

Created

2026/02/23 22:58

Section

NLP & Prompt Enginnering

Status

In progress

Series

From the Bottom

Seq2Seq

•

Difference from Sequence Labeling

The length of Y can be different from the length of X

The space of Y is often much larger

Statistic Machine Translation (SMT)

IBM Translation Models

•

Word-Alignment Model

◦

Key Idea: two words are more likely to be aligned when they occur more frequently in translation pairs

Neural Machine Translation

•

Autoregressive Seq2seq Generation

◦

Autoregressive Factorization does not assume any independence

▪

cf) Sequence labeling assume independent

•

Seq2Seq Training

◦

Maximum Likelihood Estimation

▪

Backpropagate gradients through both decoder & encoder

•

Seq2seq Decoding

◦

Exhaustive Search: Requires computing all possible sequences

▪

Greedy Search Decoding

◦

Beam Search Decoding

▪

Key Idea: at every step, keep track of the k most probable partial tramslations

▪

Score of each hypothesis = log probability of sequence so far

▪

Not guaranteed to be optimal

▪

More efficient than exhaustive search

•

Heading3

NLP - Lecture Summary

Wiki

Name

AI summary

Created

Keywords

Language

Last edited time

Published at

Section

Series

Status

Tags

Week

NLP Pipelines & Text Classification Methods

Open

NLP systems integrate feature extraction and machine learning, with deep learning replacing manual feature engineering. Text classification can be approached through generative methods like Naive Bayes or discriminative methods such as Logistic Regression and SVM. Naive Bayes assumes conditional independence of words given labels, while Logistic Regression optimizes parameters to maximize log-likelihood. Effective pre-processing, including tokenization and standardization, is crucial for model performance. Challenges like zero probabilities and numerical underflow in Naive Bayes can be addressed with Laplace smoothing and log space. Model evaluation is essential to ensure generalization to unseen data.

2026/02/20 21:51

NLP-260220-1351-LE-NLP

NLP

ENG

2026/02/21 01:39

2026/02/20

NLP & Prompt Enginnering

From the Bottom

Done

Lecture Summary

26-1

Word Vector

Open

Word vectors represent words as numerical vectors capturing semantic meaning, with similar words having similar vectors. Word2Vec learns these vectors by predicting co-occurrence patterns, optimizing both word and context vectors through gradient descent. Key applications include solving analogies, measuring similarity via cosine similarity, and converting variable-length documents into fixed-size vectors using mean pooling. Limitations include challenges with polysemy and biases in training data, which can reflect historical correlations rather than true semantic relationships.

2026/02/21 00:25

NLP-260220-1625-LE-WORDV

NLP

ENG

2026/02/23 07:38

2026/02/20

NLP & Prompt Enginnering

From the Bottom

Done

Lecture Summary

26-2-1

Neural Nets

Open

A practical overview of neural networks covers MLPs, activation functions, and CNNs. Key points include the importance of non-linear activation functions for learning complex patterns, the effectiveness of deeper networks over wider ones, and the role of SGD in optimization. Proper weight initialization and learning rate management are critical for training success, while CNNs leverage word embeddings for better generalization in feature learning.

2026/02/23 04:49

NLP-260222-2049-LE-NLP

NLP

ENG

2026/02/23 07:38

2026/02/22

NLP & Prompt Enginnering

From the Bottom

Done

Lecture Summary

26-2-2

Recurrent Nerual Networks for Sequence Labeling

Open

Recurrent Neural Networks (RNNs) are essential for sequence labeling tasks, addressing issues like context dependency and gradient problems through advanced architectures like LSTMs and Bidirectional RNNs. LSTMs improve long-term memory retention and mitigate the vanishing gradient issue, while combining LSTMs with CNNs and CRFs enhances performance in structured prediction tasks. Key techniques include POS tagging, the use of structured models to capture dependencies, and the application of Maximum Entropy Markov Models and Conditional Random Fields for effective labeling.

2026/02/23 07:41

NLP-260222-2341-LE-NLP

NLP

ENG

2026/02/23 22:36

2026/02/22

NLP & Prompt Enginnering

From the Bottom

Done

Lecture Summary

26-3-1

Seq2Seq & Neural Machine Translation

Open

2026/02/23 22:58

NLP-260223-1458-LE-NLP

NLP

ENG

2026/02/24 03:01

2026/02/23

NLP & Prompt Enginnering

From the Bottom

In progress

Lecture Summary

26-3-2

Pytorch Basics

Open

2026/02/09 18:52

NLP-260209-1052-LE-NLP

NLP

ENG

2026/02/20 22:03

NLP & Prompt Enginnering

From the Bottom

In progress

Lecture Summary

26-4

Transformer

Open

2026/02/20 22:03

NLP-260220-1403-LE-PYTOR

NLP

ENG

2026/02/20 22:03

NLP & Prompt Enginnering

From the Bottom

In progress

Lecture Summary

26-4