Chapter 19 Advanced: 55 Questions

Practice Questions — Natural Language Processing (NLP) Fundamentals

10 Easy · 12 Medium · 8 Hard

Topic-Specific Questions

Question 1
Easy
What is the output?
text = "Hello World! NLP is GREAT."
print(text.lower())
lower() converts all characters to lowercase.
hello world! nlp is great.
Question 2
Easy
What does this produce?
import re
text = "Call me at 9876543210 or email abc@xyz.com!"
cleaned = re.sub(r'[^a-zA-Z\s]', '', text)
print(cleaned)
The regex keeps only letters and whitespace.
Call me at  or email abcxyzcom (note the double space where the digits were removed -- the regex deletes characters but does not collapse whitespace)
Question 3
Easy
What is tokenization in NLP?
It breaks text into individual units.
Tokenization is the process of splitting text into individual tokens, usually words or subwords. For the sentence 'Natural language processing is fascinating', tokenization produces ['Natural', 'language', 'processing', 'is', 'fascinating']. It is a fundamental first step in NLP because models need to process individual units, not continuous strings of text.
Question 4
Easy
What are stop words and why do we remove them?
They are common words that appear in almost every sentence.
Stop words are extremely common words like 'the', 'is', 'a', 'an', 'in', 'at', 'of' that appear in nearly every sentence but carry little meaning for most NLP tasks. We remove them to reduce noise and dimensionality. After removing stop words from 'machine learning is a fascinating field', we get 'machine learning fascinating field' -- the core meaning is preserved with fewer tokens.
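The removal described above can be sketched in a few lines of plain Python (the stop word set here is a small illustrative subset, not a full list):

```python
# Minimal stop word removal: keep only tokens not in the stop list.
stop_words = {"the", "is", "a", "an", "in", "at", "of"}
sentence = "machine learning is a fascinating field"
kept = [w for w in sentence.split() if w not in stop_words]
print(" ".join(kept))  # machine learning fascinating field
```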
Question 5
Easy
What is the difference?
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
words = ['running', 'studies', 'better']
for w in words:
    print(f"{w} -> {stemmer.stem(w)}")
Stemming chops off suffixes using rules.
running -> run
studies -> studi
better -> better
Question 6
Easy
What shape does CountVectorizer produce?
from sklearn.feature_extraction.text import CountVectorizer
corpus = ["I love coding", "coding is fun", "I love fun"]
vec = CountVectorizer()
X = vec.fit_transform(corpus)
print(f"Shape: {X.shape}")
print(f"Vocabulary: {vec.get_feature_names_out()}")
Rows = documents, columns = unique words.
Shape: (3, 4)
Vocabulary: ['coding' 'fun' 'is' 'love']
Question 7
Medium
Explain the difference between TF and IDF. Why is TF-IDF better than raw word counts?
TF measures local importance, IDF measures global distinctiveness.
TF (Term Frequency) measures how often a word appears in a specific document. A word mentioned 5 times is probably important to that document. IDF (Inverse Document Frequency) measures how distinctive a word is across the corpus: IDF = log(N/df) where N is total documents and df is documents containing the word. A word in all documents (like 'the') gets low IDF. A word in only 1 document gets high IDF. TF-IDF = TF x IDF. It is better than raw counts because it down-weights common words (high TF but low IDF) and up-weights distinctive words (moderate TF but high IDF).
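The TF and IDF definitions above can be computed by hand on a toy corpus (the three documents are made up for illustration; this uses the plain log(N/df) form, without the smoothing scikit-learn applies):

```python
import math

# Toy corpus: 3 tokenized documents.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]
N = len(docs)

def tf(term, doc):
    # Term frequency: occurrences of term relative to document length.
    return doc.count(term) / len(doc)

def idf(term):
    # Inverse document frequency: log(N / number of docs containing term).
    df = sum(1 for d in docs if term in d)
    return math.log(N / df)

print(round(idf("the"), 3))                   # 0.0   -- appears in all 3 docs
print(round(idf("dog"), 3))                   # 1.099 -- appears in only 1 doc
print(round(tf("cat", docs[0]) * idf("cat"), 3))  # 0.135 -- TF-IDF of 'cat' in doc 0
```

A word present in every document gets IDF = log(1) = 0, so its TF-IDF vanishes no matter how often it occurs.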
Question 8
Medium
What is the difference between CBOW and Skip-gram in Word2Vec?
One predicts the center word from context; the other predicts context from the center word.
CBOW (Continuous Bag of Words) takes surrounding context words as input and predicts the center word. For 'the cat [?] on the mat', it uses ['the', 'cat', 'on', 'the'] to predict 'sat'. Skip-gram does the reverse: it takes the center word and predicts the surrounding context words. Given 'sat', it predicts ['the', 'cat', 'on', 'the']. Skip-gram works better for rare words and larger datasets because each word generates multiple training examples. CBOW is faster to train and works better for frequent words.
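The difference in training objectives can be made concrete by generating the (input, target) pairs each architecture trains on. This is a sketch of pair construction only (window size 1, toy sentence), not a full Word2Vec implementation:

```python
# Build training pairs for a context window of 1 over a toy sentence.
sentence = ["the", "cat", "sat", "on", "mat"]
window = 1

skipgram_pairs = []  # (input = center word, target = one context word)
cbow_pairs = []      # (input = context words, target = center word)
for i, center in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    for c in context:
        skipgram_pairs.append((center, c))   # one example per context word
    cbow_pairs.append((context, center))     # one example per center word

print(skipgram_pairs[:2])  # [('the', 'cat'), ('cat', 'the')]
print(cbow_pairs[1])       # (['the', 'sat'], 'cat')
```

Note that Skip-gram emits one example per (center, context) pair while CBOW emits one per position, which is why Skip-gram gets more training signal for rare words.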
Question 9
Medium
What is the key difference in these TF-IDF scores?
from sklearn.feature_extraction.text import TfidfVectorizer
docs = ["machine learning", "deep learning", "machine deep"]
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
vocab = tfidf.get_feature_names_out()
for i, word in enumerate(vocab):
    scores = [f"{X[j, i]:.3f}" for j in range(3)]
    print(f"{word}: {scores}")
Words appearing in fewer documents get higher IDF.
'machine', 'learning', and 'deep' each appear in exactly 2 of the 3 documents, so all three share the same IDF. Each document also contains exactly two terms, each occurring once, so the TF values are equal as well. After the default L2 normalization, every nonzero score is therefore identical (1/sqrt(2), about 0.707): this corpus produces no score differences at all, which is exactly what the printed output shows.
Question 10
Medium
Why are dense word vectors (embeddings) better than sparse vectors (one-hot or BoW) for deep learning?
Think about dimensionality, semantic similarity, and how neural networks process data.
Dense vectors are better because: (1) Dimensionality -- dense vectors are 100-300 dimensions vs 50,000+ for sparse. Neural networks train faster with fewer dimensions. (2) Semantic similarity -- similar words have similar vectors (cosine similarity). Sparse vectors have zero similarity for all word pairs. (3) Generalization -- if the model learns 'excellent' is positive, it automatically knows 'great' is positive too (similar vectors). With sparse vectors, each word must be learned independently. (4) Compositionality -- dense vectors support arithmetic: king - man + woman = queen. (5) Pre-training -- embeddings trained on billions of words capture rich linguistic knowledge that transfers to downstream tasks.
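Point (2) can be demonstrated directly with cosine similarity. The dense vectors below are made-up 3-dimensional toys standing in for real 100-300-dimensional embeddings:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# One-hot vectors: every pair of distinct words is orthogonal.
great_onehot = np.array([1, 0, 0])
excellent_onehot = np.array([0, 1, 0])
print(cosine(great_onehot, excellent_onehot))  # 0.0

# Toy dense embeddings: semantically similar words get similar vectors.
great_dense = np.array([0.9, 0.8, 0.1])
excellent_dense = np.array([0.85, 0.75, 0.15])
print(round(cosine(great_dense, excellent_dense), 3))  # close to 1.0
```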
Question 11
Medium
Why does this code produce a data leakage error?
tfidf = TfidfVectorizer()
X_train = tfidf.fit_transform(train_texts)
X_test = tfidf.fit_transform(test_texts)  # Bug here
fit_transform() re-learns the vocabulary.
Calling fit_transform() on test data re-fits the vectorizer, creating a new vocabulary and new IDF values from test data. This is data leakage -- the model benefits from information it should not have. It also creates a different feature space, so the features do not align between train and test. The fix is to use tfidf.transform(test_texts) which applies the vocabulary and IDF learned from training data.
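The corrected pattern looks like this (the review texts are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = ["great movie", "terrible plot", "great acting"]
test_texts = ["great plot twist"]

tfidf = TfidfVectorizer()
X_train = tfidf.fit_transform(train_texts)  # learn vocabulary + IDF from train only
X_test = tfidf.transform(test_texts)        # reuse them on test -- no re-fitting

# Feature spaces now align: same number of columns in both matrices.
print(X_train.shape[1] == X_test.shape[1])  # True
```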
Question 12
Hard
What is the role of ngram_range=(1, 2) in TfidfVectorizer? When is it important?
It includes both single words and word pairs as features.
ngram_range=(1, 2) creates features from both unigrams (single words) and bigrams (consecutive word pairs). For 'not good movie', the features include: unigrams ['not', 'good', 'movie'] and bigrams ['not good', 'good movie']. This is important for capturing negation and multi-word expressions. Without bigrams, 'not good' becomes 'not' + 'good', and the model might see 'good' as positive. With bigrams, 'not good' is a single feature that the model can learn is negative. Bigrams are especially important for sentiment analysis.
Question 13
Hard
Meera has a dataset of 500 customer reviews and needs to build a text classifier. Should she use TF-IDF + SVM or Embeddings + LSTM? Why?
Consider dataset size and model complexity.
Meera should use TF-IDF + SVM. With only 500 reviews: (1) An LSTM with Embedding layer has too many parameters for so little data and will overfit. The Embedding layer alone for a 10,000-word vocabulary with 64 dimensions has 640,000 parameters. (2) TF-IDF + SVM works well with small datasets because SVM is designed for high-dimensional, sparse data and generalizes well with limited examples. (3) TF-IDF does not need to learn word representations -- it uses statistical word weighting that works immediately. (4) Training is seconds (SVM) vs minutes (LSTM). If she needs LSTM-level performance, she should use pre-trained embeddings (GloVe, Word2Vec) with the Embedding layer frozen.
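The recommended TF-IDF + SVM approach fits in a few lines with a scikit-learn Pipeline (the four reviews here are toy stand-ins for Meera's dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the 500 labeled reviews.
texts = ["great product", "awful service", "loved it", "terrible quality"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigrams + bigrams
    ("svm", LinearSVC()),
])
clf.fit(texts, labels)
print(clf.predict(["great service"]))
```

The Pipeline also guarantees the fit-on-train / transform-on-test discipline automatically, since predict() only ever calls transform() on new text.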
Question 14
Hard
What does the vector arithmetic 'king - man + woman = queen' actually mean in embedding space?
Think about what direction each subtraction and addition encodes.
In the embedding space, the vector difference (king - man) captures the concept of 'royalty' independent of gender. Adding 'woman' to this 'royalty' direction yields a point closest to 'queen'. More precisely: the vector from 'man' to 'king' represents the gender-to-royalty transformation. Applying this same transformation to 'woman' (by adding the same difference vector) lands near 'queen'. This works because Word2Vec embeddings organize so that consistent relationships between word pairs correspond to consistent directions in vector space. It is not exact (the nearest word to king-man+woman might not always be queen), but it demonstrates that embeddings capture semantic relationships as geometric directions.
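The geometry can be sketched with hand-built 2-dimensional toy vectors, where dimension 0 loosely encodes 'royalty' and dimension 1 'gender' (real embeddings have hundreds of dimensions and are learned, not designed):

```python
import numpy as np

# Purely illustrative toy embeddings: dim 0 ~ royalty, dim 1 ~ gender.
emb = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
    "apple": np.array([0.0, 0.5]),
}

target = emb["king"] - emb["man"] + emb["woman"]  # -> [0.9, 0.1]

def nearest(vec, exclude):
    # Closest vocabulary word by Euclidean distance, skipping the query words.
    return min((w for w in emb if w not in exclude),
               key=lambda w: np.linalg.norm(emb[w] - vec))

print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```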

Mixed & Application Questions

Question 1
Easy
What does this produce?
text = "I love NLP and AI"
tokens = text.lower().split()
print(tokens)
lower() then split() on whitespace.
['i', 'love', 'nlp', 'and', 'ai']
Question 2
Easy
What is the difference between stemming and lemmatization?
One uses rules, the other uses a dictionary.
Stemming uses rule-based suffix stripping to reduce words to a root form. It is fast but can produce non-words ('studies' -> 'studi'). Lemmatization uses vocabulary lookup and morphological analysis to reduce words to their dictionary base form (lemma). It always produces real words ('studies' -> 'study', 'better' -> 'good') but is slower. Choose stemming for speed when exact words do not matter; choose lemmatization when you need valid dictionary words.
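The contrast can be illustrated without NLTK: a crude rule-based suffix stripper versus a dictionary lookup (both the suffix rules and the lemma dictionary below are toy simplifications of what PorterStemmer and WordNetLemmatizer actually do):

```python
# Rule-based stemming: strip the first matching suffix, no dictionary check.
def crude_stem(word):
    for suffix in ("ies", "ing", "s"):
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word

# Dictionary-based lemmatization: look up the valid base form.
lemma_dict = {"running": "run", "studies": "study", "better": "good"}

for w in ["running", "studies", "better"]:
    print(w, "->", crude_stem(w), "|", lemma_dict.get(w, w))
# running -> runn | run      (stem is not a real word)
# studies -> stud | study    (stem is not a real word)
# better  -> better | good   (rules miss irregular forms; the dictionary catches them)
```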
Question 3
Easy
How many features does TfidfVectorizer with max_features=100 produce?
tfidf = TfidfVectorizer(max_features=100)
X = tfidf.fit_transform(corpus_of_1000_documents)
print(X.shape)
max_features limits the vocabulary to the top N words by term frequency.
(1000, 100)
Question 4
Medium
What is the NLP pipeline? List the steps in order.
From raw text to prediction.
The NLP pipeline: (1) Raw text input. (2) Text preprocessing: lowercasing, removing punctuation/URLs/HTML, tokenization. (3) Text cleaning: stop word removal, stemming/lemmatization. (4) Feature extraction: convert text to numbers (BoW, TF-IDF, or word embeddings). (5) Model training: feed numerical features to a classifier (Naive Bayes, SVM, LSTM, Transformer). (6) Prediction: the model outputs a label (sentiment, spam/ham, topic) or other structured output. Each step must be consistent between training and inference.
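Steps 2-3 of the pipeline can be sketched as a single preprocessing function (the stop word set is a small illustrative subset):

```python
import re

def preprocess(text, stop_words=frozenset({"is", "a", "the"})):
    # Step 2: lowercase, strip punctuation/digits, tokenize on whitespace.
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)
    # Step 3: drop stop words.
    return [t for t in text.split() if t not in stop_words]

print(preprocess("Machine Learning is a FASCINATING field!"))
# ['machine', 'learning', 'fascinating', 'field']
```

Whatever function is used, the same one must be applied at training time and at inference time, or the model sees inconsistent features.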
Question 5
Medium
What happens when you process a new word that was not in the training data?
tokenizer = Tokenizer(num_words=100, oov_token='<OOV>')
tokenizer.fit_on_texts(['I love machine learning'])
new_seq = tokenizer.texts_to_sequences(['I love quantum computing'])
print(new_seq)
Unknown words become the OOV token.
[[2, 3, 1, 1]] where 1 is the OOV index.
Question 6
Medium
What is the vocabulary of this vectorizer?
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(ngram_range=(1, 2), max_features=8)
tfidf.fit(["I love deep learning", "deep learning is great"])
print(tfidf.get_feature_names_out())
ngram_range=(1,2) includes unigrams and bigrams. max_features keeps top 8.
The candidate vocabulary has 9 n-grams (the default tokenizer drops the single-character 'i'): unigrams ['deep', 'great', 'is', 'learning', 'love'] and bigrams ['deep learning', 'is great', 'learning is', 'love deep']. max_features=8 keeps the 8 with the highest corpus frequency, e.g. ['deep', 'deep learning', 'great', 'is', 'is great', 'learning', 'learning is', 'love'] -- which of the singleton features is dropped depends on tie-breaking.
Question 7
Medium
Why should you use transform() instead of fit_transform() on test data when using TfidfVectorizer?
fit_transform() learns new statistics from the data it processes.
fit_transform() on test data re-learns the vocabulary and IDF values from test data, causing two problems: (1) Data leakage -- the model benefits from test data statistics it should not know. (2) Feature misalignment -- the test vocabulary may differ from the training vocabulary, so features do not correspond to the same words. transform() applies the vocabulary and IDF values learned from training data to new text, ensuring consistent feature representation. This rule applies to all scikit-learn transformers.
Question 8
Medium
What is the difference between these two approaches?
# Approach 1: Learned embeddings
Embedding(10000, 64, trainable=True)

# Approach 2: Pre-trained embeddings (frozen)
Embedding(10000, 64, weights=[glove_matrix], trainable=False)
One learns from scratch, the other uses pre-trained knowledge.
Approach 1 initializes embeddings randomly and learns them from your training data. Good when you have lots of data (100K+ samples). Approach 2 loads pre-trained GloVe/Word2Vec embeddings and freezes them. Good when you have limited data -- the embeddings already capture word relationships from billions of tokens. You can also use trainable=True with pre-trained weights to fine-tune them for your specific task.
Question 9
Hard
Explain why TF-IDF + SVM often performs as well as deep learning for text classification on small to medium datasets.
Consider the properties of TF-IDF features and SVM's capabilities.
Several reasons: (1) TF-IDF captures word importance effectively through statistical weighting, which is often sufficient for classification. (2) SVM is specifically designed for high-dimensional, sparse data (which is exactly what TF-IDF produces). (3) SVM finds the maximum-margin hyperplane, which generalizes well even with limited data. (4) Bigrams in TF-IDF (ngram_range=(1,2)) capture some word order information. (5) Deep learning models (LSTM, Transformer) need large datasets to learn good representations from scratch. With limited data, the Embedding layer alone has more parameters than the entire TF-IDF+SVM pipeline. (6) Deep learning's advantage (learning complex patterns) only materializes with sufficient data.
Question 10
Hard
What are the limitations of Bag of Words and TF-IDF? What kinds of text understanding do they fail at?
Think about word order, context, and polysemy.
Limitations: (1) No word order -- 'dog bites man' and 'man bites dog' produce identical vectors. (2) No context -- 'bank' (financial) and 'bank' (river) get the same representation. (3) No compositionality -- cannot understand phrases ('not good' is just 'not' + 'good'). (4) Sparse, high-dimensional -- vocabulary of 50K words creates 50K-dimensional sparse vectors. (5) No generalization across synonyms -- 'excellent' and 'great' are completely different features with zero similarity. (6) Cannot handle out-of-vocabulary words. These limitations are why word embeddings and neural models were developed.
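Limitation (1) is easy to verify: the two sentences produce byte-for-byte identical count vectors:

```python
from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer()
X = vec.fit_transform(["dog bites man", "man bites dog"])
# Word order is discarded: both sentences map to the same count vector.
print((X[0] != X[1]).nnz == 0)  # True -> no differing entries
```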
Question 11
Hard
What would happen if you remove stop words for this NER task?
# Text: "The Bank of India announced new policies"
# After stop word removal: "Bank India announced new policies"
# NER on original: "Bank of India" -> ORG
# NER on cleaned: "Bank" -> ?, "India" -> GPE
Stop word removal can break multi-word entities.
Removing stop words destroys the multi-word entity 'Bank of India'. The NER model can no longer recognize it as an organization name because 'of' (a stop word) is removed. 'Bank' alone might be classified as a common noun, and 'India' separately as a location. The correct entity (a specific bank named 'Bank of India') is lost. This is why stop word removal should NOT be applied before NER or any task where entity boundaries and multi-word expressions matter.
Question 12
Hard
Rajesh wants to build a text classifier for legal documents in Hindi. What challenges will he face compared to English NLP?
Consider tokenization, pre-trained resources, and morphological complexity.
Challenges for Hindi NLP: (1) Tokenization is harder -- Hindi uses Devanagari script, words may have different spellings, and compound words are common. (2) Limited pre-trained resources -- far fewer pre-trained embeddings and language models compared to English. GloVe and Word2Vec trained on Hindi corpora are smaller. (3) Morphological richness -- Hindi has complex inflections (gender, number, case), requiring more sophisticated stemming/lemmatization. (4) Smaller labeled datasets -- legal documents in Hindi are rare in public datasets. (5) Code-mixing -- many Hindi texts mix Hindi and English words (Hinglish), requiring special handling. (6) Standard NLP tools (NLTK, spaCy) have limited Hindi support compared to English. Solutions: use multilingual models (mBERT, XLM-RoBERTa), Hindi-specific tokenizers (iNLTK, Stanza), and consider few-shot learning approaches.
Question 13
Easy
What does this code do?
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english')
X = tfidf.fit_transform(['I love machine learning'])
print(tfidf.get_feature_names_out())
stop_words='english' removes common English words.
['learning' 'love' 'machine'] -- 'I' is dropped both because it is on the English stop word list and because the default tokenizer ignores single-character tokens.
Question 14
Medium
What is the advantage of using pre-trained word embeddings (GloVe, Word2Vec) over training embeddings from scratch?
Think about the amount of data needed to learn good word representations.
Pre-trained embeddings are trained on billions of words (Google News for Word2Vec, Wikipedia + Common Crawl for GloVe) and capture rich semantic relationships that your small dataset cannot learn. Training embeddings from scratch on, say, 10,000 reviews would produce poor representations because there is not enough context to learn meaningful word relationships. Pre-trained embeddings provide a massive head start -- 'king' and 'queen' already have similar vectors. You can use them as fixed features (trainable=False) or fine-tune them (trainable=True) for your specific task.
Question 15
Medium
What is the output shape of TfidfVectorizer on 100 documents?
tfidf = TfidfVectorizer(max_features=5000)
X = tfidf.fit_transform(corpus_of_100_documents)
print(X.shape)
Rows = documents, columns = vocabulary features.
(100, 5000) or fewer columns if the corpus has fewer than 5000 unique words.
Question 16
Hard
How would you handle a dataset where some reviews are in English and some are in Hindi for sentiment analysis?
Think about multilingual models and consistent preprocessing.
Options: (1) Use a multilingual pre-trained model like mBERT (multilingual BERT) or XLM-RoBERTa, which understand 100+ languages including both English and Hindi. These models produce language-agnostic representations where sentiment patterns transfer across languages. (2) Translate all texts to one language before processing, though this introduces translation errors. (3) Train separate models for each language. (4) Use multilingual TF-IDF with appropriate tokenization for each language. The best approach is (1) -- multilingual Transformers handle code-mixed text naturally and have been shown to transfer sentiment understanding across languages.

Multiple Choice Questions

MCQ 1
What is the first step in most NLP pipelines?
  • A. Training the model
  • B. Text preprocessing
  • C. Making predictions
  • D. Choosing an optimizer
Answer: B
B is correct. Text preprocessing (lowercasing, removing punctuation, tokenization) is always the first step. Raw text must be cleaned and standardized before any model can process it.
MCQ 2
What does tokenization do?
  • A. Encrypts the text for security
  • B. Splits text into individual tokens (words or subwords)
  • C. Converts text to lowercase
  • D. Removes numbers from text
Answer: B
B is correct. Tokenization breaks text into individual units (tokens), usually words. 'I love NLP' becomes ['I', 'love', 'NLP']. It is a fundamental preprocessing step.
MCQ 3
What does TF-IDF stand for?
  • A. Total Feature - Information Data Format
  • B. Term Frequency - Inverse Document Frequency
  • C. Text Filter - Intelligent Data Feature
  • D. Token Frequency - Index Document Feature
Answer: B
B is correct. TF-IDF = Term Frequency (how often a word appears in a document) x Inverse Document Frequency (how distinctive the word is across all documents).
MCQ 4
Which of these words would typically be considered a stop word in English?
  • A. algorithm
  • B. the
  • C. neural
  • D. python
Answer: B
B is correct. 'the' is one of the most common English stop words. Stop words are extremely frequent words that carry little meaning (the, is, a, an, in, at). Content words like 'algorithm', 'neural', 'python' are not stop words.
MCQ 5
What is Bag of Words?
  • A. A neural network architecture for text
  • B. A word embedding technique
  • C. A text representation that counts word occurrences, ignoring order
  • D. A tokenization method
Answer: C
C is correct. Bag of Words creates a vector by counting how many times each word appears in a document. It completely ignores word order (like throwing words into a bag).
MCQ 6
Why is TF-IDF generally better than raw Bag of Words?
  • A. TF-IDF preserves word order
  • B. TF-IDF uses neural networks internally
  • C. TF-IDF down-weights common words and up-weights distinctive words
  • D. TF-IDF produces smaller vectors
Answer: C
C is correct. TF-IDF weighs words by their importance. Common words across all documents get low scores (low IDF). Words distinctive to specific documents get high scores (high IDF). BoW treats all words equally based on count alone.
MCQ 7
What is the key advantage of word embeddings over one-hot encoding?
  • A. Embeddings are always more accurate
  • B. Embeddings capture semantic similarity between words in dense vectors
  • C. Embeddings are faster to compute
  • D. Embeddings do not require training data
Answer: B
B is correct. Word embeddings map words to dense vectors where semantically similar words (king/queen, cat/kitten) have similar vectors. One-hot encoding produces sparse, orthogonal vectors where every word pair has zero similarity.
MCQ 8
In Word2Vec Skip-gram, what is the model trained to do?
  • A. Predict the center word from surrounding context words
  • B. Predict surrounding context words from the center word
  • C. Classify sentences into categories
  • D. Translate words between languages
Answer: B
B is correct. Skip-gram predicts context words from the center word. Given 'sat', it predicts surrounding words like 'cat', 'on', 'mat'. CBOW does the reverse: predicts the center word from context.
MCQ 9
Which sklearn method should be used on test data: fit_transform() or transform()?
  • A. fit_transform() on both train and test
  • B. transform() on both train and test
  • C. fit_transform() on train, transform() on test
  • D. transform() on train, fit_transform() on test
Answer: C
C is correct. fit_transform() on training data learns the vocabulary/statistics AND transforms. transform() on test data uses the SAME learned vocabulary/statistics to transform new data. This prevents data leakage.
MCQ 10
What does ngram_range=(1, 2) do in TfidfVectorizer?
  • A. Limits vocabulary to words with 1-2 characters
  • B. Creates features from both single words and consecutive word pairs
  • C. Uses only 2-word features
  • D. Limits each document to 1-2 features
Answer: B
B is correct. ngram_range=(1, 2) includes unigrams (single words like 'good') and bigrams (word pairs like 'not good') as features. This helps capture phrases and negations.
MCQ 11
What does IDF equal for a word that appears in every document of a corpus?
  • A. 1
  • B. 0
  • C. Infinity
  • D. The number of documents
Answer: B
B is correct. IDF = log(N / df). If a word appears in all N documents: IDF = log(N/N) = log(1) = 0. Words appearing everywhere contribute zero information for distinguishing documents, so their TF-IDF score is zero.
MCQ 12
Which NLP task identifies names of people, organizations, and locations in text?
  • A. Sentiment Analysis
  • B. Named Entity Recognition (NER)
  • C. Topic Modeling
  • D. Text Summarization
Answer: B
B is correct. NER (Named Entity Recognition) identifies and classifies entities: PERSON (Sundar Pichai), ORG (Google), GPE (Bangalore), DATE (March 15). It is a key information extraction technique.
MCQ 13
Which statement about GloVe is correct?
  • A. GloVe predicts context words like Word2Vec Skip-gram
  • B. GloVe factorizes the global word co-occurrence matrix
  • C. GloVe only works with English text
  • D. GloVe produces sparse vectors
Answer: B
B is correct. GloVe (Global Vectors) learns embeddings by factorizing the global word-word co-occurrence matrix. Unlike Word2Vec which uses a sliding window, GloVe uses global statistics of the entire corpus. The result is dense vectors (not sparse) that work for multiple languages.
MCQ 14
For a small dataset of 500 text samples, which approach is most appropriate?
  • A. Train a Transformer from scratch
  • B. TF-IDF + SVM or pre-trained embeddings with a simple classifier
  • C. Use only Bag of Words with no classifier
  • D. Train Word2Vec on the 500 samples
Answer: B
B is correct. With 500 samples, TF-IDF + SVM is the best simple approach (SVM handles high dimensions well with limited data). Alternatively, use pre-trained embeddings (GloVe) to leverage knowledge from billions of tokens. Training a Transformer or Word2Vec from scratch requires vastly more data.
MCQ 15
What is the famous Word2Vec analogy equation?
  • A. king + man - woman = queen
  • B. king - man + woman = queen
  • C. king * man / woman = queen
  • D. king + queen = man + woman
Answer: B
B is correct. king - man + woman = queen. The vector difference (king - man) captures the royalty concept independent of gender. Adding woman to this royalty direction yields queen. This demonstrates that embeddings capture semantic relationships as geometric directions.
MCQ 16
What does lowercasing do in text preprocessing?
  • A. Removes all lowercase letters
  • B. Converts all text to lowercase so 'The' and 'the' are treated as the same word
  • C. Removes capital letters from the text
  • D. Translates text to a simpler language
Answer: B
B is correct. Lowercasing normalizes text so that 'The', 'THE', and 'the' are all treated as the same token. This reduces vocabulary size and improves model generalization.
MCQ 17
What is the output of CountVectorizer?
  • A. Word embeddings
  • B. A matrix of word counts per document
  • C. A list of sentences
  • D. A neural network
Answer: B
B is correct. CountVectorizer creates a document-term matrix where each row is a document, each column is a word from the vocabulary, and values are word counts. This is the Bag of Words representation.
MCQ 18
What is the difference between stemming and lemmatization?
  • A. Stemming is slower but more accurate; lemmatization is faster but less accurate
  • B. Stemming uses rules to chop suffixes (may produce non-words); lemmatization uses vocabulary to produce valid base forms
  • C. They are identical in function
  • D. Stemming works only on verbs; lemmatization works on all parts of speech
Answer: B
B is correct. Stemming applies rule-based suffix removal: 'studies' -> 'studi' (not a word). Lemmatization uses morphological analysis: 'studies' -> 'study' (valid word). Stemming is faster; lemmatization is more accurate.
MCQ 19
What does a high IDF value for a word indicate?
  • A. The word appears in every document
  • B. The word appears in very few documents, making it more distinctive
  • C. The word is a stop word
  • D. The word has many characters
Answer: B
B is correct. IDF = log(N/df). A word appearing in few documents has a small df (document frequency), giving a high IDF. This word is distinctive and informative for distinguishing documents. A word in all documents has IDF = 0.
MCQ 20
Why do word embeddings support vector arithmetic like king - man + woman = queen?
  • A. The vectors are programmed with relationship rules
  • B. Consistent semantic relationships are captured as consistent directions in the vector space
  • C. The arithmetic is a coincidence that only works for a few examples
  • D. The embeddings are manually designed to support this
Answer: B
B is correct. Word2Vec and GloVe learn that words appearing in similar contexts get similar vectors. The king-queen and man-woman relationships involve the same gender dimension shift. This emergent property arises from training on billions of words, not from explicit programming.
MCQ 21
What does NER stand for in NLP?
  • A. Neural Entity Recombination
  • B. Named Entity Recognition
  • C. Natural Entity Retrieval
  • D. Numeric Expression Resolver
Answer: B
B is correct. NER (Named Entity Recognition) identifies and classifies named entities in text into categories like PERSON, ORGANIZATION, LOCATION, and DATE.
MCQ 22
What does max_features=10000 do in TfidfVectorizer?
  • A. Limits each document to 10000 words
  • B. Keeps only the top 10000 most frequent words in the vocabulary
  • C. Limits the number of documents to 10000
  • D. Sets the maximum TF-IDF score to 10000
Answer: B
B is correct. max_features limits the vocabulary size to the top N most frequent terms across the corpus. This reduces dimensionality and focuses on the most common and potentially useful words.
MCQ 23
When should you NOT remove stop words in NLP?
  • A. During topic modeling
  • B. During document similarity computation
  • C. During sentiment analysis where negation words like 'not' are critical
  • D. During keyword extraction
Answer: C
C is correct. In sentiment analysis, removing 'not' from 'not good' leaves only 'good', reversing the intended meaning. Negation words (not, no, never) are critical for sentiment analysis and should be preserved.
MCQ 24
What is the key difference between CBOW and Skip-gram in Word2Vec?
  • A. CBOW predicts the center word from context; Skip-gram predicts context words from the center word
  • B. CBOW is for English only; Skip-gram works for any language
  • C. CBOW produces sparse vectors; Skip-gram produces dense vectors
  • D. CBOW is unsupervised; Skip-gram is supervised
Answer: A
A is correct. CBOW (Continuous Bag of Words) predicts the center word from surrounding context. Skip-gram predicts surrounding context from the center word. Both are unsupervised and produce dense vectors. Skip-gram works better for rare words and larger datasets.
MCQ 25
What is a Pipeline in scikit-learn?
  • A. A type of neural network
  • B. A chain of preprocessing steps and a model into a single object
  • C. A data loading tool
  • D. A visualization library
Answer: B
B is correct. A Pipeline chains multiple steps (e.g., TfidfVectorizer + LinearSVC) into one object. Calling fit() trains all steps, and predict() applies all steps. This ensures consistent preprocessing and prevents data leakage.

Coding Challenges

Coding challenges coming soon.

