Big Data Analytics MCQ



Question 1 : What do you mean by sampling of stream data?

  1. Sampling reduces the amount of data fed to a subsequent data mining algorithm.
  2. Sampling reduces the diversity of the data stream.
  3. Sampling aims to keep statistical properties of the data intact.
  4. Sampling algorithms often do not need multiple passes over the data.
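
Several options in Question 1 describe properties of stream sampling. As one concrete illustration (a technique chosen here, not one named in the question), reservoir sampling keeps a uniform random sample of fixed size k from a stream of unknown length in a single pass. The function name and the seed parameter are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of k items from a stream in one pass."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(10_000), k=5, seed=42)
print(sample)
```

Only k items are held in memory at any time, and each stream element is examined exactly once.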
    

Question 2 : If a distance measure satisfies d(x, y) = d(y, x), the property is called

  1. Symmetric
  2. identical
  3. positiveness
  4. triangle inequality
    

Question 3 : NoSQL stands for

  1. Not only SQL
  2. Not SQL
  3. Not Over SQL
  4. No SQL
    

Question 4 : Find the L1 and L2 distances between the points (5, 6, 7) and (8, 2, 4).

  1. L1 =10 , L2 = 5.83
  2. L1 =10 , L2 = 5
  3. L1 =11 , L2 = 4.9
  4. L1 =9 , L2 = 5.83
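
The L1 (Manhattan) and L2 (Euclidean) distances are both cases of the Lp distance: the sum over i of |x_i - y_i|^p, raised to the power 1/p. A short sketch that checks the arithmetic for Question 4 (the function name is hypothetical):

```python
from typing import Sequence


def lp_distance(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Lp (Minkowski) distance: (sum |x_i - y_i|^p)^(1/p)."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a, b = (5, 6, 7), (8, 2, 4)
l1 = lp_distance(a, b, 1)  # |5-8| + |6-2| + |7-4| = 3 + 4 + 3 = 10
l2 = lp_distance(a, b, 2)  # sqrt(9 + 16 + 9) = sqrt(34)
print(f"L1 = {l1:g}, L2 = {l2:.2f}")  # L1 = 10, L2 = 5.83
```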
    

Question 5 : The time between successive elements of a single stream

  1. need not be uniform
  2. need to be uniform
  3. must be 1 ms
  4. must be 1 ns
    

Question 6 : A Reduce task receives

  1. one or more keys and their associated value lists
  2. key value pair
  3. list of keys and their associated values
  4. list of key value pairs
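
To make the Reduce-side input concrete, the following single-process simulation runs a word count through map, shuffle (group by key) and reduce phases. It is a sketch, not Hadoop's Java API, and the function names are hypothetical. The reduce function is invoked once per key with that key's list of values; in Hadoop a single Reduce task may process many such keys.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in the input line.
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all values that share a key into one value list.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: called once per key with that key's full value list.
    return key, sum(values)


lines = ["big data big ideas", "data streams"]
intermediate = [pair for line in lines for pair in map_words(line)]
grouped = shuffle(intermediate)
print(grouped)  # {'big': [1, 1], 'data': [1, 1], 'ideas': [1], 'streams': [1]}
result = dict(reduce_counts(k, v) for k, v in grouped.items())
print(result)  # {'big': 2, 'data': 2, 'ideas': 1, 'streams': 1}
```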
    

Question 7 : Which of the following statements about data streaming is true?

  1. Stream data is always unstructured data.
  2. Stream data often has a high velocity.
  3. Stream elements cannot be stored on disk.
  4. Stream data is always structured data.
    

Question 8 : Hadoop is the solution for:

  1. Database software
  2. Big Data Software
  3. Data Mining software
  4. Distribution software
    

Question 9 : ETL stands for ________________

  1. Extraction, Transformation and Loading
  2. Extract Taken Lend
  3. Enterprise Transfer Load
  4. Entertainment Transference Load
    

Question 10 : “Sharding” a database across many server instances can be achieved with _______________

  1. MAN
  2. LAN
  3. WAN
  4. SAN
    

Question 11 : Neo4j is an example of which of the following NoSQL architectural patterns?

  1. Key-value store
  2. Graph Store
  3. Document Store
  4. Column-based Store
    

Question 12 : CSV and JSON can be described as

  1. Structured data
  2. Unstructured data
  3. Semi-structured data
  4. Multi-structured data
    

Question 13 : The term used to describe Hadoop’s hardware requirements is

  1. Commodity firmware
  2. Commodity software
  3. Commodity hardware
  4. Cluster hardware
    

Question 14 : Which of the following is not a Hadoop distribution?

  1. MapR
  2. Cloudera
  3. Hortonworks
  4. RMAP
    

Question 15 : Which of the following operations can be implemented with combiners?

  1. Selection
  2. Projection
  3. Natural Join
  4. Union
    

Question 16 : ________ stores are used to store information about networks, such as social connections.

  1. Key-value
  2. Wide-column
  3. Document
  4. Graph
    

Question 17 : The DGIM algorithm was developed to estimate the count of 1's within the last k bits of a stream window of size N. Which of the following statements is true about estimating the number of 0's with DGIM?

  1. The number of 0's cannot be estimated at all.
  2. The number of 0's can be estimated with a maximum guaranteed error
  3. To estimate the number of 0's and 1's with a guaranteed maximum error, DGIM has to be employed twice, once creating buckets based on 1's and once creating buckets based on 0's.
  4. Determine whether an element has already occurred in previous stream data.
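
A minimal DGIM sketch follows, assuming buckets stored as (timestamp of most recent 1, size) pairs kept newest first; the class and method names are hypothetical. Because the last k bits contain exactly k bits in total (once at least k have arrived), the number of 0's can be derived as k minus the estimated number of 1's, so its absolute error is bounded by the same quantity as the error of the 1's estimate.

```python
from typing import List, Optional, Tuple


class DGIM:
    """Minimal DGIM sketch: approximate count of 1s in the last `window` bits.

    Buckets are (timestamp of their most recent 1, size) pairs kept newest
    first; sizes are powers of 2 with at most two buckets of each size.
    """

    def __init__(self, window: int) -> None:
        self.window = window
        self.time = 0
        self.buckets: List[Tuple[int, int]] = []

    def add(self, bit: int) -> None:
        self.time += 1
        # Drop the oldest bucket once its timestamp falls out of the window.
        if self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        # Whenever three buckets share a size, merge the two older ones,
        # keeping the more recent timestamp; this can cascade to larger sizes.
        i = 0
        while (
            i + 2 < len(self.buckets)
            and self.buckets[i][1] == self.buckets[i + 1][1] == self.buckets[i + 2][1]
        ):
            ts, size = self.buckets[i + 1]
            self.buckets[i + 1 : i + 3] = [(ts, 2 * size)]
            i += 1

    def count_ones(self, k: Optional[int] = None) -> int:
        """Estimate the number of 1s in the last k bits (k <= window)."""
        k = self.window if k is None else min(k, self.window)
        total, oldest_size = 0, 0
        for ts, size in self.buckets:  # newest first
            if ts <= self.time - k:
                break
            total += size
            oldest_size = size
        # The oldest counted bucket may straddle the boundary: count half of it.
        return total - oldest_size // 2

    def count_zeros(self, k: Optional[int] = None) -> int:
        """Estimate the number of 0s as (bits seen in the last k) minus 1s."""
        k = self.window if k is None else min(k, self.window)
        return min(k, self.time) - self.count_ones(k)


dgim = DGIM(window=16)
for bit in [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]:
    dgim.add(bit)
print(dgim.count_ones(), dgim.count_zeros())
```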
    

Question 18 : If size of file is 4 GB and block size is 64 MB then number of mappers required for MapReduce task is

  1. 8
  2. 16
  3. 32
  4. 64
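
The mapper count follows from one map task per input split. A minimal sketch, assuming a splittable file, a split size equal to the block size (Hadoop's default behavior, though the split size is configurable), and binary units (1 GB = 1024 MB); the function name is hypothetical:

```python
def num_map_tasks(file_size: int, split_size: int) -> int:
    """One map task per input split; ceiling division covers a partial last split."""
    return -(-file_size // split_size)


MB = 1024 ** 2
GB = 1024 ** 3
print(num_map_tasks(4 * GB, 64 * MB))  # 4096 MB / 64 MB = 64
```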
    

Question 19 : Which of the following is not a default Hadoop daemon?

  1. Namenode
  2. Datanode
  3. Job Tracker
  4. Job history server
    

Question 20 : In a Bloom filter, an array of n bits is initialized with

  1. all 0s
  2. all 1s
  3. half 0s and half 1s
  4. all -1
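
A Bloom filter starts as an array of n bits, all set to 0; inserting an element sets the bits chosen by k hash functions to 1, and a lookup reports "possibly present" only if all of those bits are 1. The sketch below is illustrative: deriving the k hash functions by salting SHA-256 with an index is an assumption made for brevity, and the class name is hypothetical.

```python
import hashlib
from typing import List


class BloomFilter:
    def __init__(self, n_bits: int, n_hashes: int) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits: List[int] = [0] * n_bits  # every bit starts at 0

    def _positions(self, item: str) -> List[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        positions = []
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.n_bits)
        return positions

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(n_bits=1000, n_hashes=3)
bf.add("alice")
print(bf.might_contain("alice"))
```

A Bloom filter never gives false negatives, which is why the bits must start at 0: any bit still 0 proves an element was never inserted.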
    

Question 21 : _____________ is a batch-based, distributed computing framework modeled after Google’s paper.

  1. MapCompute
  2. MapReuse
  3. MapCluster
  4. MapReduce
    

Question 22 : What is the edit distance between A = father and B = feather?

  1. 5
  2. 1
  3. 4
  4. 2
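
The sketch below computes Levenshtein distance (insertions, deletions and substitutions) with the standard dynamic-programming table, keeping only one row at a time. Some texts define edit distance with insertions and deletions only; for father and feather both definitions give 1, since feather is father with an 'e' inserted after the 'f'. The function name is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


print(edit_distance("father", "feather"))  # 1
```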
    

Question 23 : Sliding window operations typically fall into the category of

  1. OLTP Transactions
  2. Big Data Batch Processing
  3. Big Data Real Time Processing
  4. Small Batch Processing
    

Question 24 : _________ systems focus on the relationship between users and items for recommendation.

  1. DGIM
  2. Collaborative-Filtering
  3. Content Based and Collaborative Filtering
  4. Content Based
    

Question 25 : Find the Hamming distance between vectors A = 100101011 and B = 100010010.

  1. 2
  2. 4
  3. 3
  4. 1
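
For bit vectors given as equal-length strings, the Hamming distance is the number of positions at which they differ. A short sketch (the function name is hypothetical):

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions where two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("100101011", "100010010"))  # 4
```

Equivalently, for binary strings one can XOR the integer values and count the 1 bits: `bin(int(a, 2) ^ int(b, 2)).count("1")`.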
    

Question 26 : During startup, the ___________ loads the file system state from the fsimage and the edits log file.

  1. Datanode
  2. Namenode
  3. Secondary Namenode
  4. Rack awareness policy
    

Question 27 : What is finally produced by hierarchical agglomerative clustering?

  1. final estimate of cluster centroids
  2. assignment of each point to clusters
  3. tree showing how close things are to each other
  4. Group of clusters
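
Hierarchical agglomerative clustering starts with every point in its own cluster and repeatedly merges the closest pair; the record of those merges forms a tree (dendrogram). The toy sketch below assumes 1-D points and single linkage (closest-member distance), both choices made for illustration; it is brute force and meant only to show the structure of the output. The names are hypothetical.

```python
from typing import List, Optional, Tuple

Cluster = Tuple[float, ...]


def agglomerative_merges(points: List[float]) -> List[Tuple[Cluster, Cluster, float]]:
    """Single-linkage HAC on 1-D points; returns the merge history (dendrogram)."""
    clusters: List[Cluster] = [(p,) for p in points]
    merges: List[Tuple[Cluster, Cluster, float]] = []
    while len(clusters) > 1:
        best: Optional[Tuple[float, int, int]] = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        assert best is not None  # at least two clusters remain here
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


for left, right, dist in agglomerative_merges([1.0, 2.0, 6.0, 7.0, 15.0]):
    print(f"merge {left} + {right} at distance {dist}")
```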
    

Question 28 : The Jaccard similarity of two non-binary sets A and B is defined by __________

  1. Jaccard Index
  2. Primary Index
  3. Secondary Index
  4. Clustered Index
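
The Jaccard similarity (Jaccard index) of sets A and B is |A ∩ B| / |A ∪ B|. A minimal sketch; returning 1.0 for two empty sets is a convention assumed here, not part of the definition, and the function name is hypothetical.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| (assumed to be 1.0 when both sets are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


print(jaccard_similarity({"a", "b", "c", "d"}, {"c", "d", "e"}))  # 0.4
```

The corresponding Jaccard distance is 1 minus this similarity.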
    

Question 29 : Which of the following is based on the grid-like street geography of New York?

  1. Manhattan Distance
  2. Edit Distance
  3. Hamming distance
  4. Lp distance
    

Question 30 : The FM-sketch algorithm can be used to:

  1. Estimate the number of distinct elements.
  2. Sample data with a time-sensitive window.
  3. Estimate the frequent elements.
  4. Determine whether an element has already occurred in previous stream data.
    