Big Data Analytics MCQ

Question 1 : What do you mean by sampling of stream data?

Sampling reduces the amount of data fed to a subsequent data mining algorithm.
Sampling reduces the diversity of the data stream.
Sampling aims to keep statistical properties of the data intact.
Sampling algorithms often don't need multiple passes over the data.

Question 2 : If a distance measure satisfies d(x, y) = d(y, x), then it is called

Symmetric
Identical
Positiveness
Triangle inequality

Question 3 : NoSQL stands for

Not only SQL
Not SQL
Not Over SQL
No SQL

Question 4 : Find the L1 and L2 distances between the points (5, 6, 7) and (8, 2, 4).

L1 = 10, L2 = 5.83
L1 = 10, L2 = 5
L1 = 11, L2 = 4.9
L1 = 9, L2 = 5.83
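
The arithmetic behind this question can be checked with a minimal sketch. The helper name `lp_distance` is an assumption chosen for illustration; it sums the p-th powers of the coordinate differences and takes the p-th root.

```python
def lp_distance(x, y, p):
    """Return the L_p distance between two points of equal dimension."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    # Sum |x_i - y_i|^p over all coordinates, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


x = (5, 6, 7)
y = (8, 2, 4)
l1 = lp_distance(x, y, 1)  # |3| + |4| + |3|
l2 = lp_distance(x, y, 2)  # sqrt(9 + 16 + 9)
print(l1, round(l2, 2))
```

The coordinate differences are 3, 4, and 3, so L1 is their sum and L2 is the square root of 34.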

Question 5 : The time between elements of one stream

need not be uniform
need to be uniform
must be 1 ms
must be 1 ns

Question 6 : A Reduce task receives

one or more keys and their associated value list
key value pair
list of keys and their associated values
list of key value pairs
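
To make the input of a Reduce task concrete, here is a minimal single-process sketch of word count. It is not Hadoop code: the function names `map_phase`, `shuffle`, and `reduce_task` are assumptions for illustration, and the shuffle step simply groups values by key in memory.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (word, 1) pair per word, as a Map task would."""
    for line in records:
        for word in line.split():
            yield word, 1


def shuffle(pairs):
    """Group the emitted values by key: key -> list of values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(grouped):
    """Process each key together with its associated value list."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_task(shuffle(map_phase(["a b a", "b c"])))
print(counts)
```

In this sketch a single reduce call handles several keys, each arriving with the full list of values emitted for it.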

Question 7 : Which of the following statements about data streaming is true?

Stream data is always unstructured data.
Stream data often has a high velocity.
Stream elements cannot be stored on disk.
Stream data is always structured data.

Question 8 : Hadoop is the solution for:

Database software
Big Data Software
Data Mining software
Distribution software

Question 9 : ETL stands for ________________

Extraction, transformation, and loading
Extract Taken Lend
Enterprise Transfer Load
Entertainment Transference Load

Question 10 : “Sharding” a database across many server instances can be achieved with _______________

Question 11 : Neo4j is an example of which of the following NoSQL architectural patterns?

Key-value store
Graph Store
Document Store
Column-based Store

Question 12 : CSV and JSON can be described as

Structured data
Unstructured data
Semi-structured data
Multi-structured data

Question 13 : The hardware term used to describe Hadoop hardware requirements is

Commodity firmware
Commodity software
Commodity hardware
Cluster hardware

Question 14 : Which of the following is not a Hadoop distribution?

MapR
Cloudera
Hortonworks
RMAP

Question 15 : Which of the following operations can be implemented with combiners?

Selection
Projection
Natural Join
Union

Question 16 : ________ stores are used to store information about networks, such as social connections.

Key-value
Wide-column
Document
Graph

Question 17 : The DGIM algorithm was developed to estimate the count of 1's occurring within the last k bits of a stream window of size N. Which of the following statements is true about the estimate of the number of 0's based on DGIM?

The number of 0's cannot be estimated at all.
The number of 0's can be estimated with a maximum guaranteed error
To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's.
Determine whether an element has already occurred in previous stream data.

Question 18 : If the size of a file is 4 GB and the block size is 64 MB, then the number of mappers required for the MapReduce task is
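
The calculation can be sketched as follows, under three labeled assumptions: one mapper is launched per input split, the split size equals the block size, and sizes use binary units (1 GB = 1024 MB). The helper name `num_input_splits` is hypothetical.

```python
def num_input_splits(file_size_bytes, block_size_bytes):
    """Number of block-sized splits needed to cover the file (ceiling division)."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # Integer ceiling division: a partially filled last block still needs a mapper.
    return -(-file_size_bytes // block_size_bytes)


MB = 1024 ** 2  # assumption: binary units
GB = 1024 ** 3

mappers = num_input_splits(4 * GB, 64 * MB)
print(mappers)  # 4096 MB / 64 MB = 64
```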

Question 19 : Which of the following is not a default daemon of Hadoop?

Namenode
Datanode
Job Tracker
Job history server

Question 20 : In a Bloom filter, an array of n bits is initialized with

all 0s
all 1s
half 0s and half 1s
all -1
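
A minimal Bloom filter sketch shows where the initial bit array comes in. The class name, the method names, and the use of salted SHA-256 digests as the hash functions are all assumptions made for illustration, not a reference implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = [0] * n_bits  # the array starts with every bit cleared

    def _positions(self, item):
        # Assumption: derive the hash functions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(n_bits=64, n_hashes=3)
bf.add("hadoop")
print(bf.might_contain("hadoop"))
```

Inserting an element only ever sets bits, so a lookup that finds any cleared bit proves the element was never added.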

Question 21 : _____________is a batch-based, distributed computing framework modeled after Google’s paper.

MapCompute
MapReuse
MapCluster
MapReduce

Question 22 : What is the edit distance between A = father and B = feather?
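
Edit distance can be computed with a standard dynamic-programming sketch. Definitions differ between courses: some allow only insertions and deletions, while Levenshtein distance also allows substitutions. The `allow_substitution` flag below is an assumption added to cover both; the function name is illustrative.

```python
def edit_distance(a, b, allow_substitution=True):
    """Minimum number of single-character edits that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                best = prev[j - 1]  # characters match, no edit needed
            else:
                best = min(prev[j], curr[j - 1]) + 1  # delete or insert
                if allow_substitution:
                    best = min(best, prev[j - 1] + 1)  # substitute
            curr.append(best)
        prev = curr
    return prev[-1]


print(edit_distance("father", "feather"))
print(edit_distance("father", "feather", allow_substitution=False))
```

For this pair the two definitions agree, because a single insertion of "e" turns "father" into "feather".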

Question 23 : Sliding window operations typically fall in the category

OLTP Transactions
Big Data Batch Processing
Big Data Real Time Processing
Small Batch Processing

Question 24 : _________ systems focus on the relationship between users and items for recommendation.

DGIM
Collaborative-Filtering
Content Based and Collaborative Filtering
Content Based

Question 25 : Find the Hamming distance between the vectors A = 100101011 and B = 100010010.
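
The Hamming distance counts the positions at which two equal-length vectors differ. A minimal sketch, with the vectors written as bit strings and an illustrative function name:

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(ca != cb for ca, cb in zip(a, b))


print(hamming_distance("100101011", "100010010"))  # differs at positions 4, 5, 6, 9
```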

Question 26 : During start up, the ___________ loads the file system state from the fsimage and the edits log file.

Datanode
Namenode
Secondary Namenode
Rack awareness policy

Question 27 : What is finally produced by hierarchical agglomerative clustering?

Final estimate of cluster centroids
Assignment of each point to clusters
Tree showing how close things are to each other
Group of clusters

Question 28 : The Jaccard similarity of two non-binary sets A and B is defined by __________

Jaccard Index
Primary Index
Secondary Index
Clustered Index
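
For reference, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows; the function name and the convention of returning 1.0 for two empty sets are assumptions.

```python
def jaccard_similarity(a, b):
    """Return |A intersect B| / |A union B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / len(union)


print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))  # 2 shared / 4 total = 0.5
```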

Question 29 : Which of the following is based on the grid-like street geography of New York?

Manhattan Distance
Edit Distance
Hamming distance
Lp distance

Question 30 : The FM-sketch algorithm can be used to:

Estimate the number of distinct elements.
Sample data with a time-sensitive window.
Estimate the frequent elements.
Determine whether an element has already occurred in previous stream data.
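
The idea behind the FM (Flajolet-Martin) sketch can be illustrated with a single-hash version: hash every element, track the longest run of trailing zero bits seen, and use 2 raised to that length as a rough estimate. Using the first 32 bits of SHA-256 as the hash function is an assumption, and practical versions combine many hash functions to reduce variance.

```python
import hashlib


def trailing_zeros(n, width=32):
    """Number of trailing zero bits in n (width if n is 0)."""
    if n == 0:
        return width
    return (n & -n).bit_length() - 1


def fm_estimate(stream):
    """Rough single-hash estimate of the number of distinct elements."""
    max_tail = -1  # -1 marks an empty stream
    for item in stream:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:4], "big")  # assumption: 32-bit hash value
        max_tail = max(max_tail, trailing_zeros(h))
    return 0 if max_tail < 0 else 2 ** max_tail


print(fm_estimate(["a", "b", "a", "c", "b", "a"]))
```

Repeated elements hash to the same value, so duplicates never change the estimate; only the sketch state, a single integer here, has to be kept in memory.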