Machine Learning MCQs
This section focuses on "Machine Learning" in Data Science. These Machine Learning Multiple Choice Questions (MCQ) should be practiced to improve the Data Science skills required for various interviews (campus interview, walk-in interview, company interview), placements, entrance exams and other competitive examinations.
1. What is true about Machine Learning?
A. Machine Learning (ML) is a field of computer science.
B. ML is a type of artificial intelligence that extracts patterns out of raw data by using an algorithm or method.
C. The main focus of ML is to allow computer systems to learn from experience without being explicitly programmed or requiring human intervention.
D. All of the above
Ans : D
Explanation: All of the statements are true about Machine Learning.
2. ML is a field of AI consisting of learning algorithms that?
A. Improve their performance
B. At executing some task
C. Over time with experience
D. All of the above
Ans : D
Explanation: ML is a field of AI consisting of learning algorithms that improve their performance (P) at executing some task (T) over time with experience (E).
3. p → 0q is not a?
A. hack clause
B. horn clause
C. structural clause
D. system clause
Ans : B
Explanation: p → 0q is not a horn clause.
4. The action _______ of a robot arm specifies placing block A on block B.
A. STACK(A,B)
B. LIST(A,B)
C. QUEUE(A,B)
D. ARRAY(A,B)
Ans : A
Explanation: The action 'STACK(A,B)' of a robot arm specifies placing block A on block B.
5. A __________ begins by hypothesizing a sentence (the symbol S) and successively predicting lower level constituents until individual preterminal symbols are written.
A. bottom-up parser
B. top parser
C. top-down parser
D. bottom parser
Ans : C
Explanation: A top-down parser begins by hypothesizing a sentence (the symbol S) and successively predicting lower level constituents until individual preterminal symbols are written.
6. A model of language consists of categories that do not include ________.
A. system units
B. structural units
C. data units
D. empirical units
Ans : B
Explanation: The categories in a model of language do not include structural units.
7. Which of the following is not a learning method?
A. Introduction
B. Analogy
C. Deduction
D. Memorization
Ans : A
Explanation: Introduction is not a learning method, whereas analogy, deduction, and memorization are.
8. Training a model on all the available data in one single batch is known as?
A. Batch learning
B. Offline learning
C. Both A and B
D. None of the above
Ans : C
Explanation: Some end-to-end Machine Learning systems train the model in one go using all of the available training data. This kind of learning is called batch or offline learning.
9. Which of the following are ML methods?
A. Based on human supervision
B. Supervised learning
C. Semi-reinforcement learning
D. All of the above
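Ans : B
Explanation: Supervised learning is an ML method. Grouped by the degree of human supervision, ML methods are supervised, unsupervised, semi-supervised, and reinforcement learning. "Based on human supervision" names the grouping criterion rather than a method, and "semi-reinforcement learning" is not one of these categories.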
10. In model-based learning methods, an iterative process takes place on ML models that are built from various model parameters, called?
A. mini-batches
B. optimized parameters
C. hyperparameters
D. superparameters
Ans : C
Explanation: In model-based learning methods, an iterative process takes place on ML models that are built from various model parameters, called hyperparameters.
11. Which of the following is a widely used and effective machine learning algorithm based on the idea of bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Ans : D
Explanation: The Random Forest algorithm builds an ensemble of Decision Trees, mostly trained with the bagging method.
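The bagging idea can be sketched in a few lines. This is only an illustration under stated assumptions, not a Random Forest: the base learners are one-threshold "stumps" rather than full decision trees, there is no random feature selection, and the names `fit_stump`, `bagging_fit`, `bagging_predict`, and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(sample):
    # Pick the threshold that misclassifies the fewest points,
    # predicting class 1 when x > threshold.
    best_threshold, best_errors = None, len(sample) + 1
    for threshold, _ in sample:
        errors = sum((x > threshold) != bool(label) for x, label in sample)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(data, n_models=25, seed=0):
    rng = random.Random(seed)
    # Each stump is trained on a bootstrap sample (drawn with replacement).
    return [fit_stump(rng.choices(data, k=len(data))) for _ in range(n_models)]


def bagging_predict(thresholds, x):
    # Aggregate the individual predictions by majority vote.
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical 1-D data: label 1 for the larger values.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
model = bagging_fit(data)
low_prediction = bagging_predict(model, 0)
high_prediction = bagging_predict(model, 10)
```

Each stump sees a slightly different resampled dataset, and the vote averages out their individual quirks, which is the effect bagging relies on.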
12. To find the minimum or the maximum of a function, we set the gradient to zero because:
A. The value of the gradient at extrema of a function is always zero
B. Depends on the type of problem
C. Both A and B
D. None of the above
Ans : A
Explanation: At a maximum or minimum in the interior of the domain of a differentiable function, the gradient is the zero vector. Setting the gradient to zero and solving therefore yields the candidate points for extrema.
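As a quick numerical check of this idea, the sketch below approximates the gradient of a simple bowl-shaped function with central differences. The function `f`, its minimum at (1, -2), and the helper `numerical_gradient` are illustrative assumptions, not part of the question.

```python
def f(x, y):
    # Illustrative convex function with its minimum at (1, -2).
    return (x - 1) ** 2 + (y + 2) ** 2


def numerical_gradient(func, x, y, h=1e-6):
    # Central-difference approximation of the two partial derivatives.
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy


grad_at_min = numerical_gradient(f, 1.0, -2.0)     # approximately (0, 0)
grad_elsewhere = numerical_gradient(f, 3.0, 0.0)   # approximately (4, 4)
```

The converse does not hold in general: a zero gradient can also occur at a saddle point, so the points found this way are candidates that still need to be checked.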
13. Which of the following is a disadvantage of decision trees?
A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to overfitting
D. None of the above
Ans : C
Explanation: Allowing a decision tree to keep splitting to a very fine level lets it fit every training point, up to perfect classification of the training set, which is overfitting.
14. How do you handle missing or corrupted data in a dataset?
A. Drop missing rows or columns
B. Replace missing values with mean/median/mode
C. Assign a unique category to missing values
D. All of the above
Ans : D
Explanation: All of the above are common ways of handling missing values: dropping the affected rows or columns, imputing a mean, median, or mode, or treating "missing" as its own category.
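The three options can be sketched with the standard library alone. The column names and values below are made up for illustration, and `None` is assumed to mark a missing entry.

```python
from statistics import mean, median, mode

ages = [25, None, 31, 40, None, 31]        # numeric column with gaps
cities = ["Pune", None, "Delhi", "Pune"]   # categorical column with a gap

observed = [a for a in ages if a is not None]

# Option A: drop the entries that are missing.
dropped = observed

# Option B: replace missing values with the mean, median, or mode.
mean_filled = [a if a is not None else mean(observed) for a in ages]
median_filled = [a if a is not None else median(observed) for a in ages]
mode_filled = [a if a is not None else mode(observed) for a in ages]

# Option C: assign a dedicated category to missing values.
flagged = [c if c is not None else "Missing" for c in cities]
```

Which option is appropriate depends on how much data is missing and why; the sketch only shows the mechanics.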
15. When performing regression or classification, which of the following is the correct way to preprocess the data?
A. Normalize the data -> PCA -> training
B. PCA -> normalize PCA output -> training
C. Normalize the data -> PCA -> normalize PCA output -> training
D. None of the above
Ans : A
Explanation: Always normalize the data first. PCA finds the directions of greatest variance, so without normalization the features with the largest scales dominate the components and the results change with the units used.
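The sketch below does not run PCA itself. It only shows why scale matters to a variance-based method, by comparing how much of the total variance each feature contributes before and after standardization. The two features, their values, and the helper names are hypothetical.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical data: height in metres and income in dollars.
height = [1.60, 1.70, 1.80, 1.75, 1.65]
income = [30000, 52000, 41000, 75000, 38000]


def standardize(values):
    # Rescale to zero mean and unit variance (z-scores).
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def variance_share(*features):
    # Fraction of the total variance contributed by each feature.
    variances = [pvariance(f) for f in features]
    total = sum(variances)
    return [v / total for v in variances]


raw_share = variance_share(height, income)
scaled_share = variance_share(standardize(height), standardize(income))
```

On the raw data almost all of the variance comes from `income` simply because of its units, so a variance-maximizing projection would follow it almost exclusively. After standardization both features contribute equally.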
16. Which of the following statements about regularization is correct?
A. Using too large a value of lambda can cause your hypothesis to underfit the data.
B. Using too large a value of lambda can cause your hypothesis to overfit the data.
C. Using a very large value of lambda cannot hurt the performance of your hypothesis.
D. None of the above
Ans : A
Explanation: A large value of lambda results in a large regularization penalty and therefore a strong preference for simpler models, which can underfit the data.
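The effect of a large lambda can be seen in a minimal sketch, assuming one-dimensional ridge regression without an intercept, where the weight minimizing sum((y - w*x)^2) + lambda * w^2 has the closed form w = sum(x*y) / (sum(x^2) + lambda). The data and function names are illustrative.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by y = 2x


def ridge_weight(xs, ys, lam):
    # Closed-form minimizer of sum((y - w*x)^2) + lam * w^2.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w):
    # Mean squared error of the fitted line y = w*x on the training data.
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


w_small = ridge_weight(xs, ys, lam=0.0)   # recovers the true slope
w_large = ridge_weight(xs, ys, lam=1e6)   # shrunk almost to zero

error_small = mse(w_small)
error_large = mse(w_large)
```

With the very large lambda the weight is pushed toward zero and the model can no longer follow even this perfectly linear data, which is underfitting.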
17. Which of the following techniques cannot be used for normalization in text mining?
A. Stemming
B. Lemmatization
C. Stop Word Removal
D. None of the above
Ans : C
Explanation: Lemmatization and stemming are keyword normalization techniques. Stop word removal filters words out rather than normalizing them.
18. In which of the following cases will K-means clustering fail to give good results?
1) Data points with outliers
2) Data points with different densities
3) Data points with nonconvex shapes
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. All of the above
Ans : D
Explanation: K-means clustering algorithm fails to give good results when the data contains outliers, the density spread of data points across the data space is different, and the data points follow nonconvex shapes.
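The outlier case can be illustrated with a minimal one-dimensional version of the standard K-means iteration (Lloyd's algorithm). The data, the starting centroids, and the function name `kmeans_1d` are assumptions chosen for the example.

```python
def kmeans_1d(points, centroids, iterations=10):
    # Plain Lloyd's algorithm in one dimension.
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


clean = [1, 2, 3, 8, 9, 10]
with_outlier = clean + [100]

clean_centroids = kmeans_1d(clean, centroids=[1, 10])
outlier_centroids = kmeans_1d(with_outlier, centroids=[1, 100])
```

On the clean data the two natural groups are recovered, with centroids at 2 and 9. Once the single outlier is added, it claims a cluster of its own and the two real groups are merged around 5.5.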
19. Which of the following is a reasonable way to select the number of principal components "k"?
A. Choose k to be the smallest value so that at least 99% of the variance is retained.
B. Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
C. Choose k to be the largest value so that 99% of the variance is retained.
D. Use the elbow method.
Ans : A
Explanation: Choosing the smallest such k gives the largest reduction in dimension while still retaining at least 99% of the variance in the data.
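The selection rule can be sketched as follows, assuming the variance of each principal component (the eigenvalues of the covariance matrix) is already known. The function name `smallest_k` and the eigenvalues are hypothetical.

```python
from itertools import accumulate


def smallest_k(eigenvalues, threshold=0.99):
    # Sort component variances from largest to smallest.
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    # Return the first k whose cumulative variance ratio reaches the threshold.
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues of a covariance matrix.
eigenvalues = [6.0, 3.0, 0.95, 0.04, 0.01]
k = smallest_k(eigenvalues)
```

Here the first two components retain 90% of the variance and the first three retain 99.5%, so the rule selects k = 3.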
20. What is a sentence parser typically used for?
A. It is used to parse sentences to check if they are utf-8 compliant.
B. It is used to parse sentences to derive their most likely syntax tree structures.
C. It is used to parse sentences to assign POS tags to all tokens.
D. It is used to check if sentences can be parsed into meaningful tokens.
Ans : B
Explanation: Sentence parsers analyze a sentence and automatically build a syntax tree.