Practice Questions — Natural Language Processing (NLP) Fundamentals
Topic-Specific Questions
Question 1
Easy
What is the output?
text = "Hello World! NLP is GREAT."
print(text.lower())
lower() converts all characters to lowercase.
hello world! nlp is great.
Question 2
Easy
What does this produce?
import re
text = "Call me at 9876543210 or email abc@xyz.com!"
cleaned = re.sub(r'[^a-zA-Z\s]', '', text)
print(cleaned)
The regex keeps only letters and whitespace.
Call me at  or email abcxyzcom
Question 3
Easy
What is tokenization in NLP?
It breaks text into individual units.
Tokenization is the process of splitting text into individual tokens, usually words or subwords. For the sentence 'Natural language processing is fascinating', tokenization produces ['Natural', 'language', 'processing', 'is', 'fascinating']. It is a fundamental first step in NLP because models need to process individual units, not continuous strings of text.
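A minimal sketch of the idea using plain whitespace splitting (real tokenizers such as NLTK's word_tokenize also split off punctuation):

```python
# Simplest possible tokenization: split on whitespace.
# Real tokenizers (NLTK, spaCy) also handle punctuation and contractions.
sentence = "Natural language processing is fascinating"
tokens = sentence.split()
print(tokens)
# → ['Natural', 'language', 'processing', 'is', 'fascinating']
```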
Question 4
Easy
What are stop words and why do we remove them?
They are common words that appear in almost every sentence.
Stop words are extremely common words like 'the', 'is', 'a', 'an', 'in', 'at', 'of' that appear in nearly every sentence but carry little meaning for most NLP tasks. We remove them to reduce noise and dimensionality. After removing stop words from 'machine learning is a fascinating field', we get 'machine learning fascinating field' -- the core meaning is preserved with fewer tokens.
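The removal step can be sketched with a hand-picked stop list (NLTK ships a much fuller English list):

```python
# Toy stop word removal with a small hand-picked list.
stop_words = {'the', 'is', 'a', 'an', 'in', 'at', 'of'}
text = "machine learning is a fascinating field"
filtered = [w for w in text.split() if w not in stop_words]
print(' '.join(filtered))
# → machine learning fascinating field
```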
Question 5
Easy
What is the difference?
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
words = ['running', 'studies', 'better']
for w in words:
    print(f"{w} -> {stemmer.stem(w)}")
Stemming chops off suffixes using rules.
running -> run
studies -> studi
better -> better
Question 6
Easy
What shape does CountVectorizer produce?
from sklearn.feature_extraction.text import CountVectorizer
corpus = ["I love coding", "coding is fun", "I love fun"]
vec = CountVectorizer()
X = vec.fit_transform(corpus)
print(f"Shape: {X.shape}")
print(f"Vocabulary: {vec.get_feature_names_out()}")
Rows = documents, columns = unique words.
Shape: (3, 4)
Vocabulary: ['coding' 'fun' 'is' 'love']
('I' is missing because CountVectorizer's default token pattern ignores single-character tokens.)
Question 7
Medium
Explain the difference between TF and IDF. Why is TF-IDF better than raw word counts?
TF measures local importance, IDF measures global distinctiveness.
TF (Term Frequency) measures how often a word appears in a specific document. A word mentioned 5 times is probably important to that document. IDF (Inverse Document Frequency) measures how distinctive a word is across the corpus: IDF = log(N/df) where N is total documents and df is documents containing the word. A word in all documents (like 'the') gets low IDF. A word in only 1 document gets high IDF. TF-IDF = TF x IDF. It is better than raw counts because it down-weights common words (high TF but low IDF) and up-weights distinctive words (moderate TF but high IDF).
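The IDF formula from the answer can be checked by hand on a toy corpus (note: this is the plain log(N/df) formula; scikit-learn's TfidfVectorizer uses a smoothed variant, so its numbers differ slightly):

```python
import math

# Hand-computed IDF for a toy 3-document corpus: idf = log(N / df).
docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
N = len(docs)

def idf(word):
    df = sum(1 for d in docs if word in d)  # documents containing the word
    return math.log(N / df)

print(idf("the"))  # in all 3 docs → log(3/3) = 0.0
print(idf("dog"))  # in 1 doc → log(3/1) ≈ 1.099
```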
Question 8
Medium
What is the difference between CBOW and Skip-gram in Word2Vec?
One predicts the center word from context; the other predicts context from the center word.
CBOW (Continuous Bag of Words) takes surrounding context words as input and predicts the center word. For 'the cat [?] on the mat', it uses ['the', 'cat', 'on', 'the'] to predict 'sat'. Skip-gram does the reverse: it takes the center word and predicts the surrounding context words. Given 'sat', it predicts ['the', 'cat', 'on', 'the']. Skip-gram works better for rare words and larger datasets because each word generates multiple training examples. CBOW is faster to train and works better for frequent words.
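How Skip-gram generates its training data can be sketched in a few lines (a toy pair generator, not the Word2Vec training itself):

```python
# Sketch: Skip-gram turns each sentence into (center, context) pairs.
# Every word within `window` positions of the center becomes a target.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
# → [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

Note how 'cat' yields two pairs: this is why Skip-gram produces more training examples per word than CBOW, which helps with rare words.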
Question 9
Medium
What is the key difference in these TF-IDF scores?
from sklearn.feature_extraction.text import TfidfVectorizer
docs = ["machine learning", "deep learning", "machine deep"]
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
vocab = tfidf.get_feature_names_out()
for i, word in enumerate(vocab):
    scores = [f"{X[j, i]:.3f}" for j in range(3)]
    print(f"{word}: {scores}")
Words appearing in fewer documents get higher IDF.
Each of 'learning', 'machine', and 'deep' appears in exactly 2 of the 3 documents, so all three share the same IDF. Any differences in the printed scores come from TF (term frequency within each document) and per-document normalization.
Question 10
Medium
Why are dense word vectors (embeddings) better than sparse vectors (one-hot or BoW) for deep learning?
Think about dimensionality, semantic similarity, and how neural networks process data.
Dense vectors are better because: (1) Dimensionality -- dense vectors are 100-300 dimensions vs 50,000+ for sparse. Neural networks train faster with fewer dimensions. (2) Semantic similarity -- similar words have similar vectors (cosine similarity). Sparse vectors have zero similarity for all word pairs. (3) Generalization -- if the model learns 'excellent' is positive, it automatically knows 'great' is positive too (similar vectors). With sparse vectors, each word must be learned independently. (4) Compositionality -- dense vectors support arithmetic: king - man + woman = queen. (5) Pre-training -- embeddings trained on billions of words capture rich linguistic knowledge that transfers to downstream tasks.
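Point (2) can be illustrated with cosine similarity on toy vectors (the 2-D "embeddings" here are made up purely for illustration):

```python
import math

# Cosine similarity: similar dense vectors score near 1;
# one-hot vectors of different words always score exactly 0.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

great = [0.8, 0.6]        # toy embeddings, made up for illustration
excellent = [0.7, 0.7]
onehot_a, onehot_b = [1, 0], [0, 1]

print(round(cosine(great, excellent), 3))  # high: similar meanings
print(cosine(onehot_a, onehot_b))          # → 0.0 for any two one-hots
```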
Question 11
Medium
Why does this code produce a data leakage error?
tfidf = TfidfVectorizer()
X_train = tfidf.fit_transform(train_texts)
X_test = tfidf.fit_transform(test_texts)  # Bug here
fit_transform() re-learns the vocabulary.
Calling fit_transform() on test data re-fits the vectorizer, creating a new vocabulary and new IDF values from the test data. This is data leakage -- the model benefits from information it should not have. It also creates a different feature space, so the features do not align between train and test. The fix is to use tfidf.transform(test_texts), which applies the vocabulary and IDF learned from the training data.
Question 12
Hard
What is the role of ngram_range=(1, 2) in TfidfVectorizer? When is it important?
It includes both single words and word pairs as features.
ngram_range=(1, 2) creates features from both unigrams (single words) and bigrams (consecutive word pairs). For 'not good movie', the features include: unigrams ['not', 'good', 'movie'] and bigrams ['not good', 'good movie']. This is important for capturing negation and multi-word expressions. Without bigrams, 'not good' becomes 'not' + 'good', and the model might see 'good' as positive. With bigrams, 'not good' is a single feature that the model can learn is negative. Bigrams are especially important for sentiment analysis.
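The feature extraction itself is simple to sketch by hand, mirroring what ngram_range=(1, 2) does inside the vectorizer:

```python
# Generate unigram + bigram features by hand for a token list,
# mirroring ngram_range=(1, 2) in TfidfVectorizer.
def ngrams(tokens, n_range=(1, 2)):
    feats = []
    for n in range(n_range[0], n_range[1] + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(' '.join(tokens[i:i + n]))
    return feats

print(ngrams("not good movie".split()))
# → ['not', 'good', 'movie', 'not good', 'good movie']
```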
Question 13
Hard
Meera has a dataset of 500 customer reviews and needs to build a text classifier. Should she use TF-IDF + SVM or Embeddings + LSTM? Why?
Consider dataset size and model complexity.
Meera should use TF-IDF + SVM. With only 500 reviews: (1) An LSTM with Embedding layer has too many parameters for so little data and will overfit. The Embedding layer alone for a 10,000-word vocabulary with 64 dimensions has 640,000 parameters. (2) TF-IDF + SVM works well with small datasets because SVM is designed for high-dimensional, sparse data and generalizes well with limited examples. (3) TF-IDF does not need to learn word representations -- it uses statistical word weighting that works immediately. (4) Training is seconds (SVM) vs minutes (LSTM). If she needs LSTM-level performance, she should use pre-trained embeddings (GloVe, Word2Vec) with the Embedding layer frozen.
Question 14
Hard
What does the vector arithmetic 'king - man + woman = queen' actually mean in embedding space?
Think about what direction each subtraction and addition encodes.
In the embedding space, the vector difference (king - man) captures the concept of 'royalty' independent of gender. Adding 'woman' to this 'royalty' direction yields a point closest to 'queen'. More precisely: the vector from 'man' to 'king' represents the gender-to-royalty transformation. Applying this same transformation to 'woman' (by adding the same difference vector) lands near 'queen'. This works because Word2Vec embeddings organize so that consistent relationships between word pairs correspond to consistent directions in vector space. It is not exact (the nearest word to king-man+woman might not always be queen), but it demonstrates that embeddings capture semantic relationships as geometric directions.
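A toy 2-D version makes the geometry concrete. These vectors are hand-crafted so the analogy works exactly (first dimension = gender, second = royalty); real embeddings are 100-300 dimensional and only approximate this:

```python
# Hand-crafted 2-D "embeddings": dim 0 encodes gender, dim 1 royalty.
vectors = {
    'man':   (1.0, 0.0),
    'woman': (-1.0, 0.0),
    'king':  (1.0, 1.0),
    'queen': (-1.0, 1.0),
}

# king - man + woman, computed component-wise
target = tuple(k - m + w for k, m, w in
               zip(vectors['king'], vectors['man'], vectors['woman']))

def nearest(point, skip):
    # Find the closest stored word (excluding the query words).
    dist = lambda v: sum((a - b) ** 2 for a, b in zip(point, v))
    return min((w for w in vectors if w not in skip), key=lambda w: dist(vectors[w]))

print(nearest(target, skip={'king', 'man', 'woman'}))  # → queen
```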
Mixed & Application Questions
Question 1
Easy
What does this produce?
text = "I love NLP and AI"
tokens = text.lower().split()
print(tokens)
lower() then split() on whitespace.
['i', 'love', 'nlp', 'and', 'ai']
Question 2
Easy
What is the difference between stemming and lemmatization?
One uses rules, the other uses a dictionary.
Stemming uses rule-based suffix stripping to reduce words to a root form. It is fast but can produce non-words ('studies' -> 'studi'). Lemmatization uses vocabulary lookup and morphological analysis to reduce words to their dictionary base form (lemma). It always produces real words ('studies' -> 'study', 'better' -> 'good') but is slower. Choose stemming for speed when exact words do not matter; choose lemmatization when you need valid dictionary words.
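A toy rule-based stemmer shows why stemming can produce non-words (real stemmers like Porter or Snowball use many more rules, and lemmatizers use a dictionary instead):

```python
# Crude illustrative stemmer: strip one common suffix if the
# remaining stem is long enough. Not a real algorithm.
def crude_stem(word):
    for suffix in ('ies', 'ing', 'ed', 's'):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

print(crude_stem('studies'))  # → stud (not a dictionary word)
print(crude_stem('running'))  # → runn (Porter would give 'run')
print(crude_stem('better'))   # → better (no rule matches)
```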
Question 3
Easy
How many features does TfidfVectorizer with max_features=100 produce?
tfidf = TfidfVectorizer(max_features=100)
X = tfidf.fit_transform(corpus_of_1000_documents)
print(X.shape)
max_features limits the vocabulary to the top N words by term frequency.
(1000, 100), assuming the corpus has at least 100 unique words.
Question 4
Medium
What is the NLP pipeline? List the steps in order.
From raw text to prediction.
The NLP pipeline: (1) Raw text input. (2) Text preprocessing: lowercasing, removing punctuation/URLs/HTML, tokenization. (3) Text cleaning: stop word removal, stemming/lemmatization. (4) Feature extraction: convert text to numbers (BoW, TF-IDF, or word embeddings). (5) Model training: feed numerical features to a classifier (Naive Bayes, SVM, LSTM, Transformer). (6) Prediction: the model outputs a label (sentiment, spam/ham, topic) or other structured output. Each step must be consistent between training and inference.
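The steps above can be sketched end to end with a scikit-learn Pipeline on a tiny made-up corpus (illustrative only; real tasks need far more data):

```python
# Minimal end-to-end pipeline: featurize with TF-IDF, classify with
# Naive Bayes. The training corpus here is invented for illustration.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["win free money now", "free prize claim now",
               "meeting at noon today", "project update attached"]
train_labels = ["spam", "spam", "ham", "ham"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),  # steps 2-4: tokenize + featurize
    ("model", MultinomialNB()),    # step 5: train a classifier
])
clf.fit(train_texts, train_labels)
print(clf.predict(["claim your free money"]))  # step 6: predict
```

Keeping both steps in one Pipeline object guarantees the same preprocessing at training and inference time.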
Question 5
Medium
What happens when you process a new word that was not in the training data?
from tensorflow.keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer(num_words=100, oov_token='<OOV>')
tokenizer.fit_on_texts(['I love machine learning'])
new_seq = tokenizer.texts_to_sequences(['I love quantum computing'])
print(new_seq)
Unknown words become the OOV token.
[[2, 3, 1, 1]] where 1 is the OOV index.
Question 6
Medium
What is the vocabulary of this vectorizer?
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(ngram_range=(1, 2), max_features=8)
tfidf.fit(["I love deep learning", "deep learning is great"])
print(tfidf.get_feature_names_out())
ngram_range=(1,2) includes unigrams and bigrams. max_features keeps the top 8.
The vocabulary is the 8 most frequent features drawn from both unigrams and bigrams:
['deep' 'deep learning' 'great' 'is' 'is great' 'learning' 'learning is' 'love']
(get_feature_names_out() lists features alphabetically; which features survive the max_features cut depends on their corpus frequency.)
Medium
Why should you use transform() instead of fit_transform() on test data when using TfidfVectorizer?
fit_transform() learns new statistics from the data it processes.
fit_transform() on test data re-learns the vocabulary and IDF values from test data, causing two problems: (1) Data leakage -- the model benefits from test data statistics it should not know. (2) Feature misalignment -- the test vocabulary may differ from the training vocabulary, so features do not correspond to the same words. transform() applies the vocabulary and IDF values learned from training data to new text, ensuring consistent feature representation. This rule applies to all scikit-learn transformers.
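The correct pattern looks like this (toy texts for illustration):

```python
# Correct train/test handling: fit on training data only, then reuse
# the learned vocabulary and IDF values on the test data.
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = ["good movie", "bad movie", "good plot"]
test_texts = ["bad plot"]

tfidf = TfidfVectorizer()
X_train = tfidf.fit_transform(train_texts)  # learns vocabulary + IDF
X_test = tfidf.transform(test_texts)        # reuses them: no leakage

print(X_train.shape[1] == X_test.shape[1])  # → True: same feature space
```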
Question 8
Medium
What is the difference between these two approaches?
# Approach 1: Learned embeddings
Embedding(10000, 64, trainable=True)
# Approach 2: Pre-trained embeddings (frozen)
Embedding(10000, 64, weights=[glove_matrix], trainable=False)
One learns from scratch, the other uses pre-trained knowledge.
Approach 1 initializes embeddings randomly and learns them from your training data. Good when you have lots of data (100K+ samples). Approach 2 loads pre-trained GloVe/Word2Vec embeddings and freezes them. Good when you have limited data -- the embeddings already capture word relationships from billions of tokens. You can also use trainable=True with pre-trained weights to fine-tune them for your specific task.
Question 9
Hard
Explain why TF-IDF + SVM often performs as well as deep learning for text classification on small to medium datasets.
Consider the properties of TF-IDF features and SVM's capabilities.
Several reasons: (1) TF-IDF captures word importance effectively through statistical weighting, which is often sufficient for classification. (2) SVM is specifically designed for high-dimensional, sparse data (which is exactly what TF-IDF produces). (3) SVM finds the maximum-margin hyperplane, which generalizes well even with limited data. (4) Bigrams in TF-IDF (ngram_range=(1,2)) capture some word order information. (5) Deep learning models (LSTM, Transformer) need large datasets to learn good representations from scratch. With limited data, the Embedding layer alone has more parameters than the entire TF-IDF+SVM pipeline. (6) Deep learning's advantage (learning complex patterns) only materializes with sufficient data.
Question 10
Hard
What are the limitations of Bag of Words and TF-IDF? What kinds of text understanding do they fail at?
Think about word order, context, and polysemy.
Limitations: (1) No word order -- 'dog bites man' and 'man bites dog' produce identical vectors. (2) No context -- 'bank' (financial) and 'bank' (river) get the same representation. (3) No compositionality -- cannot understand phrases ('not good' is just 'not' + 'good'). (4) Sparse, high-dimensional -- vocabulary of 50K words creates 50K-dimensional sparse vectors. (5) No generalization across synonyms -- 'excellent' and 'great' are completely different features with zero similarity. (6) Cannot handle out-of-vocabulary words. These limitations are why word embeddings and neural models were developed.
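Limitation (1) is easy to demonstrate: two sentences with opposite meanings produce identical bag-of-words counts.

```python
from collections import Counter

# Bag of Words ignores order: opposite meanings, identical vectors.
a = Counter("dog bites man".split())
b = Counter("man bites dog".split())
print(a == b)  # → True
```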
Question 11
Hard
What would happen if you remove stop words for this NER task?
# Text: "The Bank of India announced new policies"
# After stop word removal: "Bank India announced new policies"
# NER on original: "Bank of India" -> ORG
# NER on cleaned: "Bank" -> ?, "India" -> GPE
Stop word removal can break multi-word entities.
Removing stop words destroys the multi-word entity 'Bank of India'. The NER model can no longer recognize it as an organization name because 'of' (a stop word) is removed. 'Bank' alone might be classified as a common noun, and 'India' separately as a location. The correct entity (a specific bank named 'Bank of India') is lost. This is why stop word removal should NOT be applied before NER or any task where entity boundaries and multi-word expressions matter.
Question 12
Hard
Rajesh wants to build a text classifier for legal documents in Hindi. What challenges will he face compared to English NLP?
Consider tokenization, pre-trained resources, and morphological complexity.
Challenges for Hindi NLP: (1) Tokenization is harder -- Hindi uses Devanagari script, words may have different spellings, and compound words are common. (2) Limited pre-trained resources -- far fewer pre-trained embeddings and language models compared to English. GloVe and Word2Vec trained on Hindi corpora are smaller. (3) Morphological richness -- Hindi has complex inflections (gender, number, case), requiring more sophisticated stemming/lemmatization. (4) Smaller labeled datasets -- legal documents in Hindi are rare in public datasets. (5) Code-mixing -- many Hindi texts mix Hindi and English words (Hinglish), requiring special handling. (6) Standard NLP tools (NLTK, spaCy) have limited Hindi support compared to English. Solutions: use multilingual models (mBERT, XLM-RoBERTa), Hindi-specific tokenizers (iNLTK, Stanza), and consider few-shot learning approaches.
Question 13
Easy
What does this code do?
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english')
X = tfidf.fit_transform(['I love machine learning'])
print(tfidf.get_feature_names_out())
stop_words='english' removes common English words.
['learning' 'love' 'machine'] -- 'I' is removed as a stop word (and would be dropped anyway, since the default tokenizer ignores single-character tokens).
Question 14
Medium
What is the advantage of using pre-trained word embeddings (GloVe, Word2Vec) over training embeddings from scratch?
Think about the amount of data needed to learn good word representations.
Pre-trained embeddings are trained on billions of words (Google News for Word2Vec, Wikipedia + Common Crawl for GloVe) and capture rich semantic relationships that your small dataset cannot learn. Training embeddings from scratch on, say, 10,000 reviews would produce poor representations because there is not enough context to learn meaningful word relationships. Pre-trained embeddings provide a massive head start -- 'king' and 'queen' already have similar vectors. You can use them as fixed features (trainable=False) or fine-tune them (trainable=True) for your specific task.
Question 15
Medium
What is the output shape of TfidfVectorizer on 100 documents?
tfidf = TfidfVectorizer(max_features=5000)
X = tfidf.fit_transform(corpus_of_100_documents)
print(X.shape)
Rows = documents, columns = vocabulary features.
(100, 5000) or fewer columns if the corpus has fewer than 5000 unique words.Question 16
Hard
How would you handle a dataset where some reviews are in English and some are in Hindi for sentiment analysis?
Think about multilingual models and consistent preprocessing.
Options: (1) Use a multilingual pre-trained model like mBERT (multilingual BERT) or XLM-RoBERTa, which understand 100+ languages including both English and Hindi. These models produce language-agnostic representations where sentiment patterns transfer across languages. (2) Translate all texts to one language before processing, though this introduces translation errors. (3) Train separate models for each language. (4) Use multilingual TF-IDF with appropriate tokenization for each language. The best approach is (1) -- multilingual Transformers handle code-mixed text naturally and have been shown to transfer sentiment understanding across languages.
Multiple Choice Questions
MCQ 1
What is the first step in most NLP pipelines?
Answer: B
B is correct. Text preprocessing (lowercasing, removing punctuation, tokenization) is always the first step. Raw text must be cleaned and standardized before any model can process it.
MCQ 2
What does tokenization do?
Answer: B
B is correct. Tokenization breaks text into individual units (tokens), usually words. 'I love NLP' becomes ['I', 'love', 'NLP']. It is a fundamental preprocessing step.
MCQ 3
What does TF-IDF stand for?
Answer: B
B is correct. TF-IDF = Term Frequency (how often a word appears in a document) x Inverse Document Frequency (how distinctive the word is across all documents).
MCQ 4
Which of these words would typically be considered a stop word in English?
Answer: B
B is correct. 'the' is one of the most common English stop words. Stop words are extremely frequent words that carry little meaning (the, is, a, an, in, at). Content words like 'algorithm', 'neural', 'python' are not stop words.
MCQ 5
What is Bag of Words?
Answer: C
C is correct. Bag of Words creates a vector by counting how many times each word appears in a document. It completely ignores word order (like throwing words into a bag).
MCQ 6
Why is TF-IDF generally better than raw Bag of Words?
Answer: C
C is correct. TF-IDF weighs words by their importance. Common words across all documents get low scores (low IDF). Words distinctive to specific documents get high scores (high IDF). BoW treats all words equally based on count alone.
MCQ 7
What is the key advantage of word embeddings over one-hot encoding?
Answer: B
B is correct. Word embeddings map words to dense vectors where semantically similar words (king/queen, cat/kitten) have similar vectors. One-hot encoding produces sparse, orthogonal vectors where every word pair has zero similarity.
MCQ 8
In Word2Vec Skip-gram, what is the model trained to do?
Answer: B
B is correct. Skip-gram predicts context words from the center word. Given 'sat', it predicts surrounding words like 'cat', 'on', 'mat'. CBOW does the reverse: predicts the center word from context.
MCQ 9
Which sklearn method should be used on test data: fit_transform() or transform()?
Answer: C
C is correct. fit_transform() on training data learns the vocabulary/statistics AND transforms. transform() on test data uses the SAME learned vocabulary/statistics to transform new data. This prevents data leakage.
MCQ 10
What does ngram_range=(1, 2) do in TfidfVectorizer?
Answer: B
B is correct. ngram_range=(1, 2) includes unigrams (single words like 'good') and bigrams (word pairs like 'not good') as features. This helps capture phrases and negations.
MCQ 11
What does IDF equal for a word that appears in every document of a corpus?
Answer: B
B is correct. IDF = log(N / df). If a word appears in all N documents: IDF = log(N/N) = log(1) = 0. Words appearing everywhere contribute zero information for distinguishing documents, so their TF-IDF score is zero.
MCQ 12
Which NLP task identifies names of people, organizations, and locations in text?
Answer: B
B is correct. NER (Named Entity Recognition) identifies and classifies entities: PERSON (Sundar Pichai), ORG (Google), GPE (Bangalore), DATE (March 15). It is a key information extraction technique.
MCQ 13
Which statement about GloVe is correct?
Answer: B
B is correct. GloVe (Global Vectors) learns embeddings by factorizing the global word-word co-occurrence matrix. Unlike Word2Vec which uses a sliding window, GloVe uses global statistics of the entire corpus. The result is dense vectors (not sparse) that work for multiple languages.
MCQ 14
For a small dataset of 500 text samples, which approach is most appropriate?
Answer: B
B is correct. With 500 samples, TF-IDF + SVM is the best simple approach (SVM handles high dimensions well with limited data). Alternatively, use pre-trained embeddings (GloVe) to leverage knowledge from billions of tokens. Training a Transformer or Word2Vec from scratch requires vastly more data.
MCQ 15
What is the famous Word2Vec analogy equation?
Answer: B
B is correct. king - man + woman = queen. The vector difference (king - man) captures the royalty concept independent of gender. Adding woman to this royalty direction yields queen. This demonstrates that embeddings capture semantic relationships as geometric directions.
MCQ 16
What does lowercasing do in text preprocessing?
Answer: B
B is correct. Lowercasing normalizes text so that 'The', 'THE', and 'the' are all treated as the same token. This reduces vocabulary size and improves model generalization.
MCQ 17
What is the output of CountVectorizer?
Answer: B
B is correct. CountVectorizer creates a document-term matrix where each row is a document, each column is a word from the vocabulary, and values are word counts. This is the Bag of Words representation.
MCQ 18
What is the difference between stemming and lemmatization?
Answer: B
B is correct. Stemming applies rule-based suffix removal: 'studies' -> 'studi' (not a word). Lemmatization uses morphological analysis: 'studies' -> 'study' (valid word). Stemming is faster; lemmatization is more accurate.
MCQ 19
What does a high IDF value for a word indicate?
Answer: B
B is correct. IDF = log(N/df). A word appearing in few documents has a small df (document frequency), giving a high IDF. This word is distinctive and informative for distinguishing documents. A word in all documents has IDF = 0.
MCQ 20
Why do word embeddings support vector arithmetic like king - man + woman = queen?
Answer: B
B is correct. Word2Vec and GloVe learn that words appearing in similar contexts get similar vectors. The king-queen and man-woman relationships involve the same gender dimension shift. This emergent property arises from training on billions of words, not from explicit programming.
MCQ 21
What does NER stand for in NLP?
Answer: B
B is correct. NER (Named Entity Recognition) identifies and classifies named entities in text into categories like PERSON, ORGANIZATION, LOCATION, and DATE.
MCQ 22
What does max_features=10000 do in TfidfVectorizer?
Answer: B
B is correct. max_features limits the vocabulary size to the top N most frequent terms across the corpus. This reduces dimensionality and focuses on the most common and potentially useful words.
MCQ 23
When should you NOT remove stop words in NLP?
Answer: C
C is correct. In sentiment analysis, removing 'not' from 'not good' leaves only 'good', reversing the intended meaning. Negation words (not, no, never) are critical for sentiment analysis and should be preserved.
MCQ 24
What is the key difference between CBOW and Skip-gram in Word2Vec?
Answer: A
A is correct. CBOW (Continuous Bag of Words) predicts the center word from surrounding context. Skip-gram predicts surrounding context from the center word. Both are unsupervised and produce dense vectors. Skip-gram works better for rare words and larger datasets.
MCQ 25
What is a Pipeline in scikit-learn?
Answer: B
B is correct. A Pipeline chains multiple steps (e.g., TfidfVectorizer + LinearSVC) into one object. Calling fit() trains all steps, and predict() applies all steps. This ensures consistent preprocessing and prevents data leakage.
Coding Challenges
Coding challenges coming soon.