Big Data Questions [PDF]


Question: 1 Which capability does IBM BigInsights add to enrich Hadoop?

A Jaql B Fault tolerance through HDFS replication C Adaptive MapReduce D Parallel computing on commodity servers

Question: 2 What is one of the four characteristics of Big Data?

A value B volume C verifiability D volatility

Question: 3 Which Hadoop-related project provides common utilities and libraries that support other Hadoop sub projects?

A Hadoop Common B Hadoop HBase C MapReduce D BigTable

Question: 4 Which type of Big Data analysis involves the processing of extremely large volumes of constantly moving data that is impractical to store?

A Federated Discovery and Navigation B Text Analysis C Stream Computing D MapReduce

Question: 6 Which primary computing bottleneck of modern computers is addressed by Hadoop?

A 64-bit architecture B disk latency C MIPS D limited disk capacity

Question: 7 Which Big Data function improves the decision-making capabilities of organizations by enabling the organizations to interpret and evaluate structured and unstructured data in search of valuable business information?

A stream computing B data warehousing C analytics D distributed file system

Question: 8 What is one of the two technologies that Hadoop uses as its foundation?

A HBase B Apache C Jaql D MapReduce

Question: 9 What key feature does HDFS 2.0 provide that HDFS does not?

A a high throughput, shared file system B high availability of the NameNode C data access performed by an RDBMS D random access to data in the cluster

Question: 10 What are two of the core operators that can be used in a Jaql query? (Select two.)

A LOAD B JOIN C TOP D SELECT

Question: 11 Which type of language is Pig?

A SQL-like B compiled language C object oriented D data flow

Question: 12 If you need to change the replication factor or increase the default storage block size, which file do you need to modify?

A hdfs.conf B hadoop-configuration.xml C hadoop.conf D hdfs-site.xml

Question: 13 To run a MapReduce job on the BigInsights cluster, which statement about the input file(s) must be true?

A The file(s) must be stored on the local file system where the MapReduce job was developed. B The file(s) must be stored in HDFS or GPFS. C The file(s) must be stored on the JobTracker. D No matter where the input files are before, they will be automatically copied to where the job runs.

Question: 14 What is a characteristic of IBM GPFS that distinguishes it from other distributed file systems?

A operating system independence B POSIX compliance C no single point of failure D blocks that are stored on different nodes

Question: 15 Which statement represents a difference between Pig and Hive?

A Pig is used for creating MapReduce programs. B Pig has a shell interface for executing commands. C Pig is not designed for random reads/writes or low-latency queries. D Pig uses Load, Transform, and Store.

Question: 16 Which command helps you create a directory called mydata on HDFS?

A hdfs -dir mydata B hadoop fs -mkdir mydata C hadoop fs -dir mydata D mkdir mydata

Question: 17 In which step of a MapReduce job is the output stored on the local disk?

A Reduce B Shuffle C Combine D Map

Question: 18 Under the MapReduce programming model, which task is performed by the Reduce step?

A Worker nodes process individual data segments in parallel. B Worker nodes store results in the local file system. C Input data is split into smaller pieces. D Data is aggregated by worker nodes.
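The Reduce-step aggregation asked about here can be sketched in plain Python (an illustrative toy, not the Hadoop API): map emits key/value pairs, the shuffle groups them by key, and reduce aggregates each group.

```python
from collections import defaultdict

# Map: emit (word, 1) pairs for each input record
records = ["big data", "big sql", "data"]
mapped = [(word, 1) for line in records for word in line.split()]

# Shuffle: group intermediate pairs by key
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: aggregate the values for each key (the answer to Question 18)
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'big': 2, 'data': 2, 'sql': 1}
```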

Question: 19 Which element of the MapReduce architecture runs map and reduce jobs?

A Reducer B JobScheduler C TaskTracker D JobTracker

Question: 20 What is one of the two driving principles of MapReduce?

A spread data across a cluster of computers B provide structure to unstructured or semi-structured data C increase storage capacity through advanced compression algorithms D provide a platform for highly efficient transaction processing

Question: 21 When running a MapReduce job from Eclipse, which BigInsights execution models are available? (Select two.)

A Cluster B Distributed C Remote D Debugging E Local

Question: 22 Which statement is true regarding the number of mappers and reducers configured in a cluster?

A The number of reducers is always equal to the number of mappers. B The number of mappers and reducers can be configured by modifying the mapred-site.xml file. C The number of mappers and reducers is decided by the NameNode. D The number of mappers must be equal to the number of nodes in a cluster.

Question: 23 Which command displays the sizes of files and directories contained in the given directory, or the length of a file, in case it is just a file?

A hadoop size B hdfs -du C hdfs fs size D hadoop fs -du

Question: 24 Following the most common HDFS replica placement policy, when the replication factor is three, how many replicas will be located on the local rack?

A three B two C one D none

Question: 25 In the MapReduce processing model, what is the main function performed by the JobTracker?

A copies job resources to the shared file system B coordinates the job execution C executes the map and reduce functions D assigns tasks to each cluster node

Question: 26 How are Pig and Jaql query languages similar?

A Both are data flow languages. B Both require schema. C Both use Jaql query language. D Both are developed primarily by IBM.

Question: 27 Under the HDFS architecture, what is one purpose of the NameNode?

A to manage storage attached to nodes B to coordinate MapReduce jobs C to regulate client access to files D to periodically report status to DataNode

Question: 28 Which command should be used to list the contents of the root directory in HDFS?

A hadoop fs list B hdfs root C hadoop fs -ls / D hdfs list /

Question: 29 What is one function of the JobTracker in MapReduce?

A runs map and reduce tasks B keeps the work physically close to the data C reports status of DataNodes D manages storage

Question: 30 In addition to the high-level language Pig Latin, what is a primary component of the Apache Pig platform?

A built-in UDFs and indexing B platform-specific SQL libraries C an RDBMS such as DB2 or MySQL D runtime environment

Question: 31 Which statement is true about Hadoop Distributed File System (HDFS)?

A Data is accessed through MapReduce. B Data is designed for random access read/write. C Data can be processed over long distances without a decrease in performance. D Data can be created, updated and deleted.

Question: 32 Which is a use-case for Text Analytics?

A managing customer information in a CRM database B sentiment analytics from social media blogs C product cost analysis from accounting systems D health insurance cost/benefit analysis from payroll data

Question: 33 Which tool is used to access BigSheets?

A BigSheets client B Microsoft Excel C Eclipse D Web Browser

Question: 34 Which technology does Big SQL utilize for access to shared catalogs?

A Hive metastore B RDBMS C MapReduce D HCatalog

Question: 35 Which statement will make an AQL view have content displayed?

A display view B return view C output view D export view

Question: 36 You work for a hosting company that has data centers spread across North America. You are trying to resolve a critical performance problem in which a large number of web servers are performing far below expectations. You know that the information written to log files can help determine the cause of the problem, but there is too much data to manage easily. Which type of Big Data analysis is appropriate for this use case?

A Text Analytics B Stream Computing C Data Warehousing D Temporal Analysis

Question: 37 Which utility provides a command-line interface for Hive?

A Thrift client B Hive shell C Hive SQL client D Hive Eclipse plugin

Question: 38 What is an accurate description of HBase?

A It is a data flow language for structured data based on Ansi-SQL. B It is a distributed file system that replicates data across a cluster. C It is an open source implementation of Google's BigTable. D It is a database schema for unstructured Big Data.

Question: 39 Which Hadoop-related technology provides a user-friendly interface, which enables business users to easily analyze Big Data?

A BigSQL B BigSheets C Avro D HBase

Question: 40 What drives the demand for Text Analytics?

A Text Analytics is the most common way to derive value from Big Data. B MapReduce is unable to process unstructured text. C Data warehouses contain potentially valuable information. D Most of the world's data is in unstructured or semi-structured text.

Question: 41 In Hive, what is the difference between an external table and a Hive managed table?

A An external table refers to an existing location outside the warehouse directory. B An external table refers to a table that cannot be dropped. C An external table refers to the data from a remote database. D An external table refers to the data stored on the local file system.

Question: 42 Which statement about NoSQL is true?

A It provides all the capabilities of an RDBMS plus the ability to manage Big Data. B It is a database technology that does not use the traditional relational model. C It is based on the highly scalable Google Compute Engine. D It is an IBM project designed to enable DB2 to manage Big Data.

Question: 43 If you need to JOIN data from two workbooks, which operation should be performed beforehand?

A "Copy" to create a new sheet with the other workbook data in the current workbook B "Group" to bring together the two workbooks C "Load" to create a new sheet with the other workbook data in the current workbook D "Add" to add the other workbook data to the current workbook

Question: 44 What is the "scan" command used for in HBase?

A to get detailed information about the table B to view data in an HBase table C to report any inconsistencies in the database D to list all tables in HBase

Question: 45 Which tool is used for developing a BigInsights Text Analytics extractor?

A Eclipse with BigInsights tools for Eclipse plugin B BigInsights Console with AQL plugin C AQLBuilder D AQL command line

Question: 46 What is the most efficient way to load 700MB of data when you create a new HBase table?

A Pre-create regions by specifying splits in the create table command and use the insert command to load data. B Pre-create regions by specifying splits in the create table command and bulk load the data. C Pre-create the column families when creating the table and use the put command to load the data. D Pre-create the column families when creating the table and bulk load the data.

Question: 47 The following sequence of commands is executed:

create 'table_1','column_family1','column_family2'
put 'table_1','row1','column_family1:c11','r1v11'
put 'table_1','row2','column_family1:c12','r1v12'
put 'table_1','row2','column_family2:c21','r1v21'
put 'table_1','row3','column_family1:d11','r1v11'
put 'table_1','row2','column_family1:d12','r1v12'
put 'table_1','row2','column_family2:d21','r1v21'

In HBase, which value will the "count 'table_1'" command return?

A 4 B 3 C 6 D 2
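To see why this count returns rows rather than cells, the puts can be modeled in plain Python (a dict-based sketch, not the real HBase shell or API): each put upserts one cell keyed by its row, and the six puts above touch only row1, row2, and row3.

```python
# Model an HBase table as {row: {column: value}}; a put upserts one cell.
table = {}

def put(table, row, column, value):
    table.setdefault(row, {})[column] = value

put(table, 'row1', 'column_family1:c11', 'r1v11')
put(table, 'row2', 'column_family1:c12', 'r1v12')
put(table, 'row2', 'column_family2:c21', 'r1v21')
put(table, 'row3', 'column_family1:d11', 'r1v11')
put(table, 'row2', 'column_family1:d12', 'r1v12')
put(table, 'row2', 'column_family2:d21', 'r1v21')

# "count" tallies rows, not cells: three distinct row keys remain.
print(len(table))  # 3
```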

Question: 48 Which Hive command is used to query a table?

A TRANSFORM B SELECT C GET D EXPAND

Question: 49 Why develop SQL-based query languages that can access Hadoop data sets?

A because SQL enhances query performance B because the MapReduce Java API is sometimes difficult to use C because data stored in a Hadoop cluster lends itself to structured SQL queries D because the data stored in Hadoop is always structured

Question: 50 Which key benefit does NoSQL provide?

A It allows Hadoop to apply the schema-on-ingest model to unstructured Big Data. B It allows an RDBMS to maintain referential integrity on a Hadoop data set. C It allows customers to leverage high-end server platforms to manage Big Data. D It can cost-effectively manage data sets too large for traditional RDBMS.

Question: 51 What makes SQL access to Hadoop data difficult?

A Hadoop data is highly structured. B Data is in many formats. C Data is located on a distributed file system. D Hadoop requires pre-defined schema.

Question: 52 Which command can be used in Hive to list the tables available in a database/schema?

A list tables B describe tables C show all D show tables

Question: 53 In HBase, what is the "count" command used for?

A to count the number of columns of a table B to count the number of column families of a table C to count the number of rows in a table D to count the number of regions of a table

Question: 54 Which Hadoop-related technology supports analysis of large datasets stored in HDFS using an SQL-like query language?

A HBase B Pig C Jaql D Hive

Question: 55 How can the applications published to BigInsights Web Console be made available for users to execute?

A They need to be marked as "Shared." B They need to be copied under the user home directory. C They need to be deployed with proper privileges. D They need to be linked with the master application.

Question: 56 Which component of Apache Hadoop is used for scheduling and running workflow jobs?

A Eclipse B Oozie C Jaql D Task Launcher

Question: 57 What is one of the main components of Watson Explorer (InfoSphere Data Explorer)?

A validator B replicator C crawler D compressor

Question: 58 IBM InfoSphere Streams is designed to accomplish which Big Data function?

A analyze and react to data in motion before it is stored B find and analyze historical stream data stored on disk C analyze and summarize product sentiments posted to social media D execute ad-hoc queries against a Hadoop-based data warehouse

Question: 59 Which IBM Big Data solution provides low-latency analytics for processing data-in-motion?

A InfoSphere Information Server B InfoSphere Streams C InfoSphere BigInsights D PureData for Analytics

Question: 60 Which IBM tool enables BigInsights users to develop, test and publish BigInsights applications?

A Avro B HBase C Eclipse D BigInsights Applications Catalog

Question: 5 Which description identifies the real value of Big Data and Analytics?

A enabling customers to efficiently index and access large volumes of data B gaining new insight through the capabilities of the world's interconnected intelligence C providing solutions to help customers manage and grow large database systems D using modern technology to efficiently store the massive amounts of data generated by social networks

Big Data exam

Question: 1 Where must a Spark configuration be set up first?

A Notebook B Db2 Warehouse C IBM Cloud D Watson Data Platform

Question: 2 When sharing a notebook, what will always point to the most recent version of the notebook?

A Watson Studio homepage B The permalink C The Spark service D PixieDust visualization

The Spark configuration must be set up first through IBM Cloud

Notebooks can be shared. The permalink will always point to the most recent version of the notebook.

Question: 3 When creating a Watson Studio project, what do you need to specify?

A Spark service B Data service C Collaborators D Data assets

Question: 4 You can import preinstalled libraries if you are using which languages? (Select two.) (Please select ALL that apply)

A R B Python C Bash D Rexx E Scala

Question: 5 Who can control a Watson Studio project's assets?

A Viewers B Editors C Collaborators D Tenants

Question: 6 Which environmental variable needs to be set to properly start ZooKeeper?

A ZOOKEEPER_APP B ZOOKEEPER_DATA C ZOOKEEPER D ZOOKEEPER_HOME

Question: 7 Which is the primary advantage of using column-based data formats over record-based formats?

A better compression using GZip B supports in-memory processing C facilitates SQL-based queries D faster query execution

Question: 8 What is the primary purpose of Apache NiFi?

A Collect and send data into a stream. B Finding data across the cluster. C Connect remote data sources via WiFi. D Identifying non-compliant data access.

When you create a project, you need to specify a Spark service. You can either create a new service or associate an existing one. You also need to specify an object store, which you can easily set up and associate from your Watson Studio account. You can import preinstalled libraries if you are using Python or R.

Choose the permissions for the collaborator. The Admin can control project assets, collaborators, and settings. The Editor can control project assets. The Viewer can view the project. Collaborators can be removed from a project or have their permissions updated. (not entirely sure)

Column-based storage formats (Parquet, ORC) provide not only faster query execution by minimizing I/O but also excellent compression.

Question: 9 What are three examples of Big Data? (Choose three.) (Please select ALL that apply)

A cash register receipts B web server logs C inventory database records D bank records E photos posted on Instagram F messages tweeted on Twitter

Question: 10 What ZK CLI command is used to list all the ZNodes at the top level of the ZooKeeper hierarchy, in the ZooKeeper command-line interface?

A get / B create / C listquota / D ls /

Question: 11 What is the default data format Sqoop parses to export data to a database?

A JSON B CSV C XML D SQL

Question: 12 Under the MapReduce v1 architecture, which function is performed by the TaskTracker?

A Keeps the tasks physically close to the data. B Pushes map and reduce tasks out to DataNodes. C Manages storage and transmission of intermediate output. D Accepts MapReduce jobs submitted by clients.

Variety: Big data is collected and created in various formats and from various sources. It includes structured data as well as unstructured data like text, multimedia, social media, business reports, etc. Structured data such as bank records, demographic data, inventory databases, business data, and product data feeds have a defined structure and can be stored and analyzed using traditional data management and analysis methods. Unstructured data includes captured content like images, tweets or Facebook status updates, instant messenger conversations, blogs, video uploads, voice recordings, and sensor data. These types of data do not have any defined pattern. Unstructured data is most of the time a reflection of human thoughts, emotions and feelings, which can be difficult to express using exact words.

Type ls / in the ZooKeeper CLI prompt. This tells ZK to list the ZNodes at the top level of the ZooKeeper node hierarchy. For this we need a slash ("/") after the ls command:

[zk: localhost:2181(CONNECTED) 1] ls /
[registry, ambari-metrics-cluster, hiveserver2, zookeeper, hbase-unsecure, rmstore]
[zk: localhost:2181(CONNECTED) 2]

Question: 13 Which statement describes "Big Data" as it is used in the modern business world?

A Indexed databases containing very large volumes of historical data used for compliance reporting purposes. B Non-conventional methods used by businesses and organizations to capture, manage, process, and make sense of a large volume of data. C Structured data stores containing very large data sets such as video and audio streams. D The summarization of large indexed data stores to provide information about potential problems or opportunities.

Question: 14 Under the MapReduce v1 architecture, which function is performed by the JobTracker?

A Runs map and reduce tasks. B Accepts MapReduce jobs submitted by clients. C Manages storage and transmission of intermediate output. D Reports status to MasterNode.

Question: 15 Which statement is true about the Hadoop Distributed File System (HDFS)?

A HDFS is a software framework to support computing on large clusters of computers. B HDFS provides a web-based tool for managing Hadoop clusters. C HDFS links the disks on multiple nodes into one large file system. D HDFS is the framework for job scheduling and cluster resource management.

Question: 16 How does MapReduce use ZooKeeper?

A Coordination between servers. B Aid in the high availability of Resource Manager. C Master server election and discovery. D Server lease management of nodes.

Question: 17 Which two Spark libraries provide a native shell? (Choose two.) (Please select ALL that apply)

A Python B Scala C C# D Java E C++

Question: 18 What is an authentication mechanism in Hortonworks Data Platform?

A IP address B Preshared keys C Kerberos D Hardware token

Question: 19 What is Hortonworks DataPlane Services (DPS) used for?

A Manage, secure, and govern data stored across all storage environments. B Transform data from CSV format into native HDFS data. C Perform backup and recovery of data in the Hadoop ecosystem. D Keep data up to date by periodically refreshing stale data.

Question: 20 What must be done before using Sqoop to import from a relational database?

A Copy any appropriate JDBC driver JAR to $SQOOP_HOME/lib. B Complete the installation of Apache Accumulo. C Create a Java class to support the data import. D Create an empty database for Sqoop to access.

Question: 21 What is the native programming language for Spark?

A Scala B C++ C Java D Python

Question: 22 Which Hortonworks Data Platform (HDP) component provides a common web user interface for applications running on a Hadoop cluster?

A YARN B HDFS C Ambari D MapReduce

Question: 23 Which Spark RDD operation returns values after performing the evaluations?

A Transformations B Actions C Caching D Evaluations

Question: 24 Which two are use cases for deploying ZooKeeper? (Choose two.) (Please select ALL that apply)

A Configuration bootstrapping for new nodes. B Managing the hardware of cluster nodes. C Storing local temporary data files. D Simple data registry between nodes.

Question: 25 In a Hadoop cluster, which two are the result of adding more nodes to the cluster? (Choose two.) (Please select ALL that apply)

A DataNodes increase capacity while NameNodes increase processing power. B It adds capacity to the file system. C Scalability increases by a factor of x^N-1. D Capacity increases while fault tolerance decreases. E It increases available processing power.

Question: 26 Which Spark RDD operation creates a directed acyclic graph through lazy evaluations?

A Distribution B GraphX C Transformations D Actions
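The lazy evaluation behind transformations can be imitated with Python generators (a loose analogy rather than Spark code): chaining generators only records the pending work, and nothing executes until a terminal operation, the analogue of an action, consumes the pipeline.

```python
# "Transformations": build a lazy pipeline; no element is processed yet.
numbers = range(5)
doubled = (x * 2 for x in numbers)              # like rdd.map(...)
evens_over_two = (x for x in doubled if x > 2)  # like rdd.filter(...)

# "Action": consuming the pipeline finally triggers the evaluation.
result = list(evens_over_two)
print(result)  # [4, 6, 8]
```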

Question: 27 Which feature allows application developers to easily use the Ambari interface to integrate Hadoop provisioning, management, and monitoring capabilities into their own applications?

A REST APIs B Postgres RDBMS C Ambari Alert Framework D AMS APIs

Question: 28 What is one disadvantage to using CSV formatted data in a Hadoop data store?

A Columns of data must be separated by a delimiter. B Fields must be positioned at a fixed offset from the beginning of the record. C It is difficult to represent complex data structures such as maps. D Data must be extracted, cleansed, and loaded into the data warehouse.

Question: 29 Which element of Hadoop is responsible for spreading data across the cluster?

A YARN B MapReduce C AMS D HDFS

Question: 30 Which component of the Apache Ambari architecture stores the cluster configurations?

A Authorization Provider B Ambari Metrics System C Postgres RDBMS D Ambari Alert Framework

Question: 31 Which two are examples of personally identifiable information (PII)? (Choose two.) (Please select ALL that apply)

A Time of interaction B Medical record number C Email address D IP address

Question: 32 Under the MapReduce v1 architecture, which element of the system manages the map and reduce functions?

A SlaveNode B JobTracker C MasterNode D StorageNode E TaskTracker

Question: 33 Which component of the HDFS architecture manages storage attached to the nodes?

A NameNode B StorageNode C DataNode D MasterNode

Question: 34 Which of the "Five V's" of Big Data describes the real purpose of deriving business insight from Big Data?

A Volume B Value C Variety D Velocity E Veracity

Question: 35 Which component of the Spark Unified Stack supports learning algorithms such as logistic regression, naive Bayes classification, and SVM?

A Spark Learning B Spork C Spark SQL D MLlib

Question: 36 Which two descriptions are advantages of Hadoop? (Choose two.) (Please select ALL that apply)

A able to use inexpensive commodity hardware B intensive calculations on small amounts of data C processing a large number of small files D processing random access transactions E processing large volumes of data with high throughput

Question: 37 Which two of the following are row-based data encoding formats? (Choose two.) (Please select ALL that apply)

A CSV B Avro C ETL D Parquet E RC and ORC

Question: 38 Which statement describes the action performed by HDFS when data is written to the Hadoop cluster?

A The data is spread out and replicated across the cluster. B The data is replicated to at least 5 different computers. C The MasterNodes write the data to disk. D The FsImage is updated with the new data map.

Question: 39 Under the MapReduce v1 architecture, which element of MapReduce controls job execution on multiple slaves?

A MasterNode B JobTracker C SlaveNode D TaskTracker E StorageNode

Question: 40 Which component of the Spark Unified Stack provides processing of data arriving at the system in real-time?

A MLlib B Spark SQL C Spark Streaming D Spark Live

Question: 41 Which two registries are used for compiler and runtime performance improvements in support of the Big SQL environment? (Choose two) (Please select ALL that apply)

A DB2ATSENABLE B DB2FODC C DB2COMPOPT D DB2RSHTIMEOUT E DB2SORTAFTER_TQ

Question: 42 Which script is used to backup and restore the Big SQL database?

A bigsql_bar.py B db2.sh C bigsql.sh D load.py

Question: 43 You need to create a table that is not managed by the Big SQL database manager. Which keyword would you use to create the table?

A STRING B BOOLEAN C SMALLINT D EXTERNAL

Question: 44 Which two of the following data sources are currently supported by Big SQL? (Choose two) (Please select ALL that apply)

A Oracle B PostgreSQL C Teradata D MySQL E MariaDB

Question: 45 Which port is the default for the Big SQL Scheduler to get administrator commands?

A 7055 B 7054 C 7052 D 7053

Question: 46 Which tool should you use to enable Kerberos security?

A Hortonworks B Ambari C Apache Ranger D Hive

Question: 47 Which two options can be used to start and stop Big SQL? (Choose two) (Please select ALL that apply)

A Scheduler B DSM Console C Command line D Java SQL shell

Question: 48 Which command is used to populate a Big SQL table?

A CREATE B QUERY C SET D LOAD

Question: 49 Which feature allows the bigsql user to securely access data in Hadoop on behalf of another user?

A Impersonation B Privilege C Rights D Schema

Question: 50 Which command would you run to make a remote table accessible using an alias?

A SET AUTHORIZATION B CREATE SERVER C CREATE WRAPPER D CREATE NICKNAME

Question: 51 The Big SQL head node has a set of processes running. What is the name of the service ID running these processes?

A Db2 B hdfs C user1 D bigsql

Question: 52 Which file format contains human-readable data where the column values are separated by a comma?

A Parquet B ORC C Delimited D Sequence

Question: 53 Which Big SQL authentication mode is designed to provide strong authentication for client/server applications by using secret-key cryptography?

A Public key B Flat files C Kerberos D LDAP

Question: 54 Which type of foundation does Big SQL build on?

A Jupyter B Apache HIVE C RStudio D MapReduce

Question: 55 You need to monitor and manage data security across a Hadoop platform. Which tool would you use?

A SSL B HDFS C Hive D Apache Ranger

Question: 56 What can be used to surround a multi-line string in a Python code cell by appearing before and after the multi-line string?

A """ B " C

Question: 57 For what are interactive notebooks used by data scientists?

A Packaging data for public distribution on a website. B Quick data exploration tasks that can be reproduced. C Providing a chain of custody of all data. D Bulk loading data into a database.
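For the multi-line string question above, a quick plain-Python illustration (the same syntax applies in a notebook code cell): triple quotes appear before and after the string, letting it span several lines.

```python
# Triple quotes before and after the text let a string span several lines.
message = """first line
second line"""
print(message)
```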

Question: 58 What Python statement is used to add a library to the current code cell?

A pull B import C load D using
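As the question suggests, import is the Python statement that adds a library to a code cell; for example, with the standard library's math module:

```python
import math  # bring the math library into scope for this cell

print(math.sqrt(16))  # 4.0
```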

Question: 59 What Python package has support for linear algebra, optimization, mathematical integration, and statistics?

A NLTK B Pandas C NumPy D SciPy

Question: 60 Which three main areas make up Data Science according to Drew Conway? (Choose three.) (Please select ALL that apply)

A Traditional research B Machine learning C Substantive expertise D Math and statistics knowledge E Hacking skills

Big Data Engineer v2 IBM Certification 2018

Questions:
1/ What are the 4Vs of Big Data? (Please select the FOUR that apply)
2/ What are the three types of Big Data? (Please select the THREE that apply)
3/ Select all the components of HDP which provide data access capabilities.
4/ Select the components that provide the capability to move data from a relational database into Hadoop.
5/ Managing Hadoop clusters can be accomplished using which component?
6/ True or False: The following components are value-add from IBM: Big Replicate, Big SQL, BigIntegrate, BigQuality, Big Match.
7/ True or False: Data Science capabilities can be achieved using only HDP.
8/ True or False: Ambari is backed by RESTful APIs for developers to easily integrate with their own applications.
9/ Which Hadoop functionalities does Ambari provide?
10/ Which page from the Ambari UI allows you to check the versions of the software installed on your cluster?
11/ True or False: Creating users through the Ambari UI will also create the user on the HDFS.
12/ True or False: You can use CURL commands to issue commands to Ambari.
13/ True or False: Hadoop systems are designed for transaction processing.
14/ What is the default number of replicas in a Hadoop system?
15/ True or False: One of the driving principles of Hadoop is that the data is brought to the program.
16/ True or False: At least 2 NameNodes are required for a standalone Hadoop cluster.
17/ True or False: The phases in an MR job are Map, Shuffle, Reduce and Combiner.
18/ Centralized handling of job control flow is one of the limitations of MRv1.
19/ The JobTracker in MR1 is replaced by which component(s) in YARN?
20/ What are the benefits of using Spark? (Please select the THREE that apply)
21/ What are the languages supported by Spark? (Please select the THREE that apply)
22/ Resilient Distributed Dataset (RDD) is the primary abstraction of Spark.
23/ What would you need to do in a Spark application that you would not need to do in a Spark shell to start using Spark?
24/ True or False: NoSQL database is designed for those that do not want to use SQL.
25/ Which database is a columnar storage database?
26/ Which database provides a SQL for Hadoop interface?

Answers:
1/ •Veracity •Velocity •Variety •Volume
2/ •Semi-structured •Structured •Unstructured
3/ •Pig •MapReduce •Hive
4/ •Sqoop •Kafka •Flume
5/ Ambari
6/ TRUE
7/ FALSE (Big Data Ecosystem UNIT 2)
8/ True
9/ •Manage •Provision •Integrate •Monitor
10/ The Admin > Manage Ambari page
11/ FALSE
12/ TRUE
13/ FALSE
14/ 3
15/ FALSE (Big Data Ecosystem UNIT 4)
16/ FALSE (Big Data Ecosystem UNIT 4)
17/ TRUE
18/ TRUE
19/ •ApplicationMaster •ResourceManager
20/ •Generality •Speed •Ease of use
21/ •Python •Java •Scala
22/ True
23/ Import the necessary libraries to load the SparkContext
24/ FALSE (Big Data Ecosystem UNIT 7)
25/ HBase
26/ Hive

27/ Which Apache project provides coordination of resources?
28/ What is ZooKeeper's role in the Hadoop infrastructure?

29/ True or False: Slider provides an intuitive UI which allows you to dynamically allocate YARN resources.
30/ True or False: Knox can provide all the security you need within your Hadoop infrastructure.
31/ True or False: Sqoop is used to transfer data between Hadoop and relational databases.
32/ True or False: For Sqoop to connect to a relational database, the JDBC JAR files for that database must be located in $SQOOP_HOME/bin.
33/ True or False: Each Flume node receives data as a "source", stores it in a "channel", and sends it via a "sink".
34/ Through what HDP component are Kerberos, Knox, and Ranger managed?
35/ Which security component is used to provide perimeter security?
36/ One of the governance issues that Hortonworks DataPlane Service (DPS) addresses is visibility over all of an organization's data across all of their environments (on-prem, cloud, hybrid) while making it easy to maintain consistent security and governance.
37/ True or False: The typical sources of streaming data are sensors, "data exhaust", and high-rate transaction data.
38/ What are the components of Hortonworks Data Flow (HDF)?
39/ True or False: NiFi is a disk-based, microbatch ETL tool that provides flow management.
40/ True or False: MiNiFi is a complementary data collection tool that feeds collected data to NiFi.
41/ What main features does IBM Streams provide as a Streaming Data Platform? (Please select the THREE that apply)
42/ What are the most important computer languages for Data Analytics? (Please select the THREE that apply)
43/ True or False: GPUs are special-purpose processors that traditionally can be used to power graphical displays, but for Data Analytics lend themselves to faster algorithm execution because of the large number of independent processing cores.
44/ True or False: Jupyter stores its workbooks in files with the .ipynb suffix. These files can not be stored locally or on a hub server.
45/ The $BIGSQL_HOME/bin/bigsql start command is used to start Big SQL from the command line?
46/ What are the two ways you can work with Big SQL? (Please select the TWO that apply)
47/ What is one of the reasons to use Big SQL?
48/ Should you use the default STRING data type?
49/ The BOOLEAN type is defined as SMALLINT SQL type in Big SQL.
50/ Using the LOAD operation is the recommended method for getting data into your Big SQL table for best performance.
51/ Which file storage format has the highest performance?
52/ What are the two ways to classify functions?
53/ True or False: UMASK is used to determine permissions on directories and files.
54/ True or False: You can only Kerberize a Big SQL server before it is installed.

27/–28/ ZooKeeper:
• Manages the coordination between HBase servers
• Hadoop MapReduce uses ZooKeeper to aid in high availability of the ResourceManager
• Flume uses ZooKeeper for configuration purposes in recent releases
29/ FALSE (Big Data Ecosystem UNIT 8)
30/ FALSE (Big Data Ecosystem UNIT 8)
31/ True
32/ FALSE (Big Data Ecosystem UNIT 9)
33/ True
34/ Ambari
35/ Apache Knox
36/ True
37/ True
38/ • Flow management • Stream processing • Enterprise services
39/ True
40/ True
41/ • Analysis and visualization • Rich data connections • Development support
42/ • Python • R • Scala
43/ True
44/ FALSE (Introduction to Data Science UNIT 1)
45/ True
46/ • Jsqsh • Web tooling from DSM
47/ Want to access your Hadoop data without using MapReduce
48/ No (Big SQL UNIT 2)
49/ True
50/ True
51/ Parquet
52/ • Built-in functions • User-defined functions
53/ True
54/ False (Big SQL UNIT 4)

55/ True or False: Authentication with Big SQL only occurs at the Big SQL layer or the client's application layer. False (Big SQL UNIT 4)
56/ True or False: Ranger and impersonation work well together. False (Big SQL UNIT 4)
57/ True or False: RCAC can hide rows and columns. True
58/ True or False: Nicknames can be used for wrappers and servers. False (Big SQL UNIT 5)
59/ True or False: Server objects define the properties and values of the connection. True
60/ True or False: The purpose of a wrapper is to provide a library of routines that doesn't communicate with the data source. False (Big SQL UNIT 5)
61/ True or False: User mappings are used to authenticate to the remote data source. True
62/ True or False: Collaboration with Watson Studio is an optional add-on component that must be purchased. False (Watson Studio UNIT 1)
63/ True or False: Watson Studio is designed only for Data Scientists; other personas would not know how to use it. False (Watson Studio UNIT 1)
64/ True or False: Community provides access to articles, tutorials, and even data sets that you can use. True
65/ True or False: You can import visualization libraries into Watson Studio. True
66/ True or False: Collaborators can be given certain access levels. True
67/ True or False: Watson Studio contains Zeppelin as a notebook interface. False (Watson Studio UNIT 2): Jupyter is the notebook interface.

Big Data QCM

Which component connects sinks and sources in Flume?
A. HDFS
B. ElasticSearch
* C. channels
D. Interceptors

Why does YARN scale better than Hadoop v1 for multiple jobs? (Choose two.) (Please select ALL that apply)
A. There is one Job Tracker per cluster.
* B. Job tracking and resource management are split.
C. Job tracking and resource management are one process.
* D. There is one Application Master per job.

What happens if a task fails during a Hadoop job execution?
A. The job will be restarted with different compute nodes.
B. The entire job will fail.
C. The job will finish with incomplete results.
* D. The task will be restarted on another node.

What command will list files located on the HDFS in R?
A. bigr.dir()
B. ls()
* C. bigr.listfs()
D. list()

Which Big SQL datatype should be avoided because it causes significant performance degradation?
A. CHAR
* B. STRING
C. UNION
D. VARCHAR

You need to create multiple Big SQL tables with columns defined as CHAR. What needs to be set to enable CHAR columns?
* A. SET SYSHADOOP.COMPATIBILITY_MODE=1
B. CREATE TABLE chartab
C. SET HADOOPCOMPATIBLITY_MODE=True
D. ALTER CHAR DATATYPE TO byte

What is the primary core abstraction of Apache Spark?

A. GraphX * B. Resilient Distributed Dataset (RDD) C. Spark Streaming D. Directed Acyclic Graph (DAG)

Which Text Analytics runtime component is used for languages such as Spanish and English by breaking a stream of text into phrases or words?
A. Named entity extractors
B. Other extractors
* C. Standard tokenizer
D. Multilingual tokenizer

Which two commands are used to load data into an existing Big SQL table from HDFS? (Choose two.) (Please select ALL that apply)
* A. Load
B. Table
C. Select
* D. Insert
E. Create

Which command should you use to set the default schema in a Big SQL table and also create the schema if it does not exist?
A. default
B. create
C. format
* D. use

What is missing from the following statement when querying a remote table? CREATE _______ FOR remotetable1 …
A. TABLE
B. VIEW
* C. NICKNAME
D. INDEX

What are two major business advantages of using BigSheets? (Choose two.) (Please select ALL that apply)

* A. built-in data readers for multiple formats * B. spreadsheet-like querying and discovery interface C. command-line-driven data analysis D. feature rich programming environment

Where should you build extractors in the Information Extraction Web Tool?

A. Documents * B. Canvas C. Property pane D. Regular expression

In which text analytics phase are extractors developed and tested?

A. Analysis
* B. Rule Development
C. Production
D. Performance Tuning

Which action is performed during the Reduce step of a MapReduce v1 processing cycle?
* A. Intermediate results are aggregated.
B. The TaskTracker distributes the job to the cluster.
C. The initial problem is broken into pieces.
D. The JobTrackers execute their assigned tasks.

What are two benefits of using the IBM Big SQL processing engine? (Choose two.) (Please select ALL that apply)

A. Core functionality is written in Java for portability. B. The system is built to be started and stopped on demand. * C. Various data storage formats are supported. * D. It provides access to Hadoop data using SQL.

An organization is developing a proof-of-concept for a big data system. Which phase of the big data adoption cycle is the company currently in?
* A. Engage
B. Execute
C. Explore
D. Educate

Which feature in a Big SQL federation is a library to access a particular type of data source?
A. server
B. table
C. view
* D. wrapper

What is a feature of Apache ZooKeeper?
A. generates shell programs for running components of Hadoop
B. monitors log files of cluster members
* C. maintains configuration information for a cluster
D. performance tunes a running cluster

Which open source component is a big data processing framework?

* A. Apache Spark
B. Apache Ambari
C. IBM BigSheets
D. IBM Big SQL

Which command is used to launch an interactive Python shell for Spark?
A. python -spark
* B. pyspark
C. hadoop pyshell
D. spark-shell

What are the two types of Spark operations? (Choose two.) (Please select ALL that apply)
A. Sequences
B. Vectors
C. DataFrames
* D. Transformations
* E. Actions
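The transformations/actions split tested above can be sketched in plain Python. This is an illustrative toy (the class `MiniRDD` and its methods are hypothetical, not the PySpark API): transformations only record work lazily, and nothing executes until an action is called.

```python
# Toy illustration of Spark's two operation types (NOT the real PySpark API):
# transformations are lazy and return a new dataset object; actions are eager
# and trigger evaluation of the recorded pipeline.

class MiniRDD:
    """Hypothetical stand-in for a Spark RDD, for illustration only."""
    def __init__(self, data):
        self._data = data          # source data
        self._ops = []             # deferred transformations

    # --- transformations: lazy, return a new MiniRDD with one more step ---
    def map(self, f):
        new = MiniRDD(self._data)
        new._ops = self._ops + [("map", f)]
        return new

    def filter(self, p):
        new = MiniRDD(self._data)
        new._ops = self._ops + [("filter", p)]
        return new

    # --- actions: eager, run the recorded pipeline and return a value ---
    def collect(self):
        out = self._data
        for kind, f in self._ops:
            if kind == "map":
                out = [f(x) for x in out]
            else:
                out = [x for x in out if f(x)]
        return out

    def count(self):
        return len(self.collect())

rdd = MiniRDD([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
result = rdd.collect()   # nothing executes until this action runs
```

The same shape appears in real Spark: `rdd.map(...)` returns instantly, while `collect()` or `count()` forces the computation.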

Which statement is true regarding Reduce tasks in MapReduce?
* A. They can run on any node.
B. They only run on nodes that didn't generate data during the Map step.
C. They run only on nodes that generated data during the Map step.
D. They only run on one node.

What command will load the BigR package in R?
* A. library(bigr)
B. source("bigr")
C. dir(pattern="bigr")
D. bigr.connect

Which programming language is Apache Spark primarily written in?
* A. Scala
B. Java
C. Python 2
D. C++

Which feature of Text Analytics should you use to process Japanese or Chinese language text?
A. Annotation Query Language (AQL)
B. Standard tokenizer
* C. Multilingual tokenizer
D. Online Analytical Programming (OLAP)

Which kind of HBase row key maps to multiple SQL columns? A. Primary B. Dense C. Unique * D. Composite

What does the HCatalog component of Hive provide?
A. collecting common data transformations into a library
B. maintaining an inventory of cluster nodes
* C. table and storage management layer for Hadoop
D. providing a REST gateway for jobs

Which action is performed prior to the Map step of a MapReduce v1 processing cycle?
A. The job is sent sequentially to all nodes.
B. Output result sets are simplified to a single answer.
C. The data required is moved to the fastest nodes.
* D. The job is broken into individual task pieces and distributed.

Which integration API does Apache Ambari support?
* A. REST
B. RMI
C. SOAP
D. RPC

How does an end-user interact with the IBM BigSheets tool?
A. IBM-built desktop app
B. command line
C. mobile app
* D. web browser

Which software is at the core of the IBM BigInsights platform?
* A. open source components
B. customer developed software
C. proprietary IBM libraries
D. cloud-based web services

What command is used to retrieve multiple rows out of an HBase table?
A. pull
B. select
* C. scan
D. get

Which format is used to export extractor results?
A. TXT
B. RTF
C. JSON
* D. CSV

How does Sqoop decide how to split data across mappers?
* A. examining the primary key
B. moving the data to the closest network node
C. dividing the input bytes by available nodes
D. applying the split size to the data

How should you use the pre-built extractors for a new project?
A. Assign them to a new query.
B. Right click on the extractor and select Edit Output.
* C. Drag and drop them onto canvas.
D. Convert them to AQL Statements.

Why is the SYSPROC.SYSINSTALLOBJECT procedure used with Big SQL?
A. to create a SNAPSHOT column
B. to set the location of the EXPLAIN.DDL
C. to specify the SQL statement to be explained
* D. to create an EXPLAIN table

What does the programmatic implementation of a Map function do?
* A. Reads the data file and performs a transformation.
B. Combines previous results into an aggregate.
C. Locates the data in the DFS.
D. Computes the final result of the entire job.
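The Map and Reduce roles tested above can be illustrated with a toy word count in plain Python. The function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative only, not Hadoop APIs: Map reads input and emits (key, value) pairs, a shuffle groups pairs by key, and Reduce aggregates the intermediate results.

```python
# Toy word count following the MapReduce phases named in the questions above.
from collections import defaultdict

def map_phase(line):
    # Map: read a line of input and transform it into (word, 1) pairs
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group intermediate values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the intermediate results per key
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big sql", "big data"]
intermediate = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(intermediate))
```

In a real cluster the Map calls run in parallel on many nodes and the shuffle moves data across the network, but the data flow is the same.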

Which statement will create a table with parquet files?
* A. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) STORED AS PARQUETFILE;
B. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) STORED AS PARQUET;
C. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) SAVE AS PARQUETFILE;
D. CREATE HADOOP TABLE T ( i int, s VARCHAR(10)) SAVE AS PARQUET;

What is the JSqsh tool used for?
A. web-based SQL editing
B. deploying the SQL JDBC driver
* C. command-line SQL queries
D. installing the IBM Data Server Manager (DSM)

What does the bucketing feature of Hive do?
* A. sub-partitioning/grouping of data by hash within partitions
B. allows data to be stored in arrays
C. splits data into collections based on ranges
D. distributes the data dynamically for faster processing

What advantage does the Text Analytics Web UI give you?
* A. It generates the AQL syntax for you.
B. It allows only single data types.
C. It allows only one type of file extension.
D. It teaches you how to write AQL syntax.

Which AQL candidate rule combines tuples from two views with the same schema?
A. Blocks
B. Select
* C. Union
D. Sequence

Data collected within your organization has a short period of time when it is relevant. Which characteristic of a big data system does this represent?
* A. Velocity
B. Validation
C. Variety
D. Volume

Assuming the same data is stored in multiple data formats, which format will provide faster query execution and require the least amount of IO operations to process?
* A. Parquet
B. XML
C. flat file
D. JSON

Which feature of Text Analytics allows you to rollback your extractors when necessary?
* A. Snapshots
B. Standard tokenizer
C. Scalar functions
D. Multilingual tokenizer

What defines a relation in an AQL extractor?
* A. a view
B. a row
C. a schema
D. a column

Which command must be run after compiling a Java program so it can run on the Hadoop cluster?
* A. jar cf name.jar *.class
B. hadoop classpath
C. jar tf name.jar
D. rm hadoop.class

What type of NoSQL datastore does HBase fall into?
A. document
B. key-value
* C. column
D. graph

Which data inconsistency may appear while using ZooKeeper?
A. excessively stale data views
* B. simultaneously inconsistent cross-client views
C. unreliable client updates across the cluster
D. out-of-order updates across clients

What is required to run an EXPLAIN statement in Big SQL?
A. the explainable-sql-statement clause
B. the SYSPROC.SYSINSTALLOBJECT procedure
* C. proper authorization
D. a rule

Which command must be run first to become the HDFS user?
* A. su - hdfs
B. hadoop fs
C. pwd
D. hdfs

Which Big SQL file format is human readable and supported by most tools, but is the least efficient file format?
* A. Delimited
B. Parquet
C. Sequence
D. Avro

What is the default install location for the IBM Open Data Platform on Linux?
A. /opt/ibm/iop
B. /var/iop
C. /usr/local/iop
* D. /usr/iop

You need to populate a Big SQL table to test an operation. Which INSERT statement is recommended for testing, only because it does not support parallel reads or writes?
A. INSERT INTO ... SELECT FROM ...
* B. INSERT INTO ... VALUES (...)
C. INSERT INTO ... SELECT …
D. INSERT INTO ... SELECT ... WHERE …

Which command is used to launch an interactive Apache Spark shell?
A. scala --spark
B. hadoop spark
C. spark
* D. spark-shell

What are extractors transformed into when they are executed?
A. Candidate generation statements
B. BigSheets function statements
* C. Annotated Query Language (AQL) statements
D. Online Analytical Programming (OLAP) statements

You need to set up the command-line interface JSqsh to connect to a bigsql database. What is the recommended method to set up the connection?
A. Run the $JSQSH_HOME/bin/JSQSH script.
B. Run the JSqsh driver wizard.
C. Modify database parameters in the .jsqsh/connections.xml file.
* D. Run the JSqsh connection wizard.
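The INSERT INTO ... VALUES form recommended above for seeding small test tables can be tried against any SQL engine. In this sketch, Python's built-in sqlite3 stands in for a Big SQL connection (Big SQL's DDL adds HADOOP-specific clauses that sqlite does not accept; only the VALUES idiom is being illustrated).

```python
# Quick illustration of INSERT ... VALUES for hand-made test rows.
# sqlite3 is a stand-in here, not Big SQL; the VALUES form is the same idea.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (i INTEGER, s VARCHAR(10))")

# Fine for a few test rows; it is not a bulk-load path, which is why
# Big SQL recommends it only for testing.
conn.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b')")

rows = conn.execute("SELECT i, s FROM t ORDER BY i").fetchall()
```

For real data volumes the LOAD operation (question 50 above) is the recommended path.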

How will the following column mapping command be encoded? cf_data:full_names mapped by (last_name, First_name) separator ','
A. Hex
B. Character
C. Binary
* D. String

Which underlying data representation and access method does Big SQL use?
* A. Hive
B. TINYINT
C. MAP
D. SMALLINT

What does the MLlib component of Apache Spark support?
* A. scalable machine learning
B. graph computation
C. SQL and HiveQL
D. stream processing

Which type of HBase column is mapped to multiple SQL columns?
A. Composite
B. Exclusive
* C. Dense
D. Double

What is a key factor in determining how to implement file compression with HDFS?
* A. compression algorithm supports splitting
B. the CPU speed of the cluster members (MHz)
C. the speed of network transfers between nodes
D. the amount of storage space needed for all files

What is used in a Big SQL file system to organize tables?
A. JSqsh
B. DSM
* C. schemas
D. partitions

What command is used to start a Flume agent?
A. flume-start
B. flume-src
* C. flume-ng
D. flume-agent

Which component is required for Flume to work?
A. RDBMS
B. Syslog
* C. Data source
D. Interceptor

When creating a new table in Big SQL, what additional keyword is used in the CREATE TABLE statement to create the table in HDFS?

A. dfs * B. hadoop C. replicated D. cloud

What is a feature of an Avro file?

A. directly readable by JavaScript * B. versioning of the data C. columns delimited by commas D. formal schema language

What does the federation feature of Big SQL allow?

A. tuning server hardware performance B. importing data into HDFS C. rewriting statements for better execution performance * D. querying multiple data sources in one statement

A Hadoop file listing is performed and one of the output lines is: -rw-r--r-- 5 biadmin biadmin 871233 2015-09-12 09:33 data.txt What does the 5 in the output represent?

A. permissions * B. replication factor C. login id of the file owner D. data size
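The listing line in the question above can be picked apart field by field. A small sketch (the variable names are illustrative; field positions follow the `hadoop fs -ls` output shown):

```python
# Parsing the sample `hadoop fs -ls` output line from the question above.
# Field 0 is the permission string, field 1 is the replication factor
# (the 5 asked about), then owner, group, size in bytes, date, time, path.
line = "-rw-r--r-- 5 biadmin biadmin 871233 2015-09-12 09:33 data.txt"

fields = line.split()
permissions = fields[0]
replication = int(fields[1])   # number of HDFS block replicas for this file
owner, group = fields[2], fields[3]
size_bytes = int(fields[4])
```

Note the replication factor is per-file metadata in HDFS, which is why it appears in the listing at all; a local `ls -l` would show a link count in that position instead.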

What privilege is required to execute an EXPLAIN statement with INSERT privileges in Big SQL?
A. SYSMON authority
B. SECADM authority
* C. SQLADM authority
D. SYSCTRL authority

What does a computer need to understand unstructured data?

* A. context
B. usage
C. extractors
D. attribute types

What is the ApplicationMaster in YARN responsible for? (Choose two.) (Please select ALL that apply)
* A. monitoring node execution status
* B. obtaining resources for computation
C. taking nodes offline for maintenance
D. allocating resources from all nodes

What is a limitation of Apache Spark?
* A. It does not have universal tools.
B. It does not support streams.
C. It does not run Hadoop.
D. It does not in itself interact with SQL.

How is a sequence created in Canvas?
A. Click on the New Literal button.
* B. Drag and drop one extractor onto another.
C. Right click on the extractor, and select Edit Output.
D. Select multiple extractors on the result pane.

Which type of key does HBase require in each row in an HBase table?
A. Duplicate
B. Foreign
* C. Unique
D. Primary

Which two components make up a Hadoop node? (Choose two.) (Please select ALL that apply)
* A. CPU
B. network
C. memory
* D. disk

Which statement is used to set the correct compatible collation with Big SQL?
* A. CREATE SERVER
B. SEQUENCE
C. PUSHDOWN
D. CREATE WRAPPER

How does Apache Ambari use the Ganglia component?
A. to predict hardware failures
* B. to monitor cluster performance
C. to cluster job scheduling
D. to add new nodes to the cluster

How can you fix duplicate results generated by an extractor from the same text because the text matches more than one dictionary entry?

A. edit output with overlapping matches B. remove union statement * C. remove with a consolidation rule D. edit properties of the sequence

Which statement best describes Spark?
A. An instance of a federated database.
* B. A computing engine for a large-scale data set.
C. A logical view on top of Hadoop data.
D. An open source database query tool.

Which two tasks can an Apache Ambari admin do that a regular Apache Ambari user cannot do? (Choose two.) (Please select ALL that apply)
A. browse job information
B. view service status
* C. modify configurations
* D. run service checks

What is the default replication factor for HDFS on a production cluster?
A. 5
B. 1
C. 10
* D. 3

In the ZooKeeper environment, what does atomicity guarantee?
* A. Updates completely succeed or fail.
B. Updates are applied in the order created.
C. If an update succeeds, then it persists.
D. Every client sees the same view.

Which basic feature rule of AQL helps find an exact match to a single word or phrase?
A. Dictionary
B. Part of Speech
* C. Literals
D. Splits

What is the default data type in Big R?
* A. character
B. integer
C. complex
D. numeric

You have a very large Hadoop file system. You need to work on the data without migrating the data out or changing the data format. Which IBM tool should you use?
A. MapReduce
* B. Big SQL
C. Data Server Manager
D. Pig

Which core component of the Hadoop framework is highly scalable and a common tool?
A. Sqoop
B. Pig
* C. MapReduce
D. Hive

How can you reduce the memory usage of the ANALYZE command in Big SQL?
A. Run everything in one batch.
B. Turn on distribution statistics.
* C. Run the command separately on different batches of columns.
D. Include all the columns in the batch.

What should you do in Text Analytics to fix an extractor that produces unwanted results?
A. Re-create the extractors.
B. Remove results with a consolidation rule.
* C. Create a new filter.
D. Edit the properties of the sequence.

QUESTIONS / ANSWERS / CORRECTION

You need to add a collaborator to your project. What do you need?
A. The list of deployments
B. A list of your saved bookmarks
C. The email address of the collaborator
D. Your project ID

Before you create a Jupyter notebook in Watson Studio, which two items are necessary? (Please select the TWO that apply)
A. URL
B. Scala
C. File
D. Spark Instance
E. Project

Where does the unstructured data of a project reside in Watson Studio?
A. Database
B. Wrapper
C. Object Storage
D. Tables

Which Watson Studio offering used to be available through something known as IBM Bluemix?
A. Watson Studio Desktop
B. Watson Studio Cloud
C. Watson Studio Business
D. Watson Studio Local

What is the architecture of Watson Studio centered on?
A. Data Assets
B. Projects
C. Collaborators
D. Analytic Assets

Which two commands would you use to give or remove certain privileges to/from a user? (Please select the TWO that apply)
A. INSERT
B. GRANT
C. REVOKE
D. LOAD
E. SELECT

How many Big SQL management nodes do you need at minimum?
A. 4
B. 2
C. 1
D. 3

What is the default directory in HDFS where tables are stored?
A. /apps/hive/warehouse/data
B. /apps/hive/warehouse/bigsql
C. /apps/hive/warehouse/
D. /apps/hive/warehouse/schema

Using the Java SQL Shell, which command will connect to a database called mybigdata?
A. ./java mybigdata
B. ./jsqsh mybigdata
C. ./java tables
D. ./jsqsh go mybigdata

Which directory permissions need to be set to allow all users to create their own schema?
A. 777
B. 755
C. 700
D. 666

What are Big SQL database tables organized into?
A. Files
B. Schemas
C. Hives
D. Directories

c

de

c

b

b

bc

c

c

b

a

b

You have a distributed file system (DFS) and need to set permissions on the /hive/warehouse directory to allow access to ONLY the bigsql user. Which command would you run?
A. hdfs dfs -chmod 770 /hive/warehouse
B. hdfs dfs -chmod 755 /hive/warehouse
C. hdfs dfs -chmod 700 /hive/warehouse
D. hdfs dfs -chmod 666 /hive/warehouse

Which definition best describes RCAC?
A. It grants or revokes certain user privileges.
B. It grants or revokes certain directory privileges.
C. It limits the rows or columns returned based on certain criteria.
D. It limits access by using views and stored procedures.

Which statement best describes a Big SQL database table?
A. A data type of a column describing its value.
B. The defined format and rules around a delimited file.
C. A container for any record format.
D. A directory with zero or more data files.

What is an advantage of the ORC file format?
A. Big SQL can exploit advanced features
B. Data interchange outside Hadoop
C. Supported by multiple I/O engines
D. Efficient compression

You need to determine the permission setting for a new schema directory. Which tool would you use?
A. GRANT
B. umask
C. Kerberos
D. HDFS

Which tool would you use to create a connection to your Big SQL database?
A. Scheduler
B. DSM
C. Jupyter
D. Ambari

You need to enable impersonation. Which two properties in the bigsql-conf.xml file need to be marked true? (Please select the TWO that apply)
A. bigsql.alltables.io.doAs
B. DB2COMPOPT
C. DB2_ATS_ENABLE
D. $BIGSQL_HOME/conf
E. bigsql.impersonation.create.table.grant.public

You are creating a new table and need to format it with parquet. Which partial SQL statement would create the table in parquet format?
A. CREATE AS parquet
B. STORED AS parquetfile
C. STORED AS parquet
D. CREATE AS parquetfile

Which command creates a user-defined schema function?
A. TRANSLATE FUNCTION
B. CREATE FUNCTION
C. ALTER MODULE ADD FUNCTION
D. ALTER MODULE PUBLISH FUNCTION

Which Apache Hadoop application provides an SQL-like interface to allow abstraction of data on semi-structured data in a Hadoop datastore?
A. YARN
B. Spark
C. Pig
D. Hive

Which description characterizes a function provided by Apache Ambari?
A. A messaging system for real-time data pipelines.
B. A wizard for installing Hadoop services on host servers.
C. Moves information to/from structured databases.
D. Moves large amounts of streaming event data.

c

c

d

d

b

b

ae

b

b

d

b

Which statement accurately describes how ZooKeeper works?
A. Writes to a leader server will always succeed.
B. All servers keep a copy of the shared data in memory.
C. There can be more than one leader server at a time.
D. Clients connect to multiple servers at the same time.
b

Which NoSQL datastore type began as an implementation of Google's BigTable that can store any type of data and scale to many petabytes?
A. MemcacheD
B. CouchDB
C. Riak
D. HBase
d

Which statement is true about Hortonworks Data Platform (HDP)?
A. It is a powerful platform for managing large volumes of structured data.
B. It is designed specifically for IBM Big Data customers.
C. It is a Hadoop distribution based on a centralized architecture with YARN at its core.
D. It is engineered and developed by IBM's BigInsights team.
c

What is the name of the Hadoop-related Apache project that utilizes an in-memory architecture to run applications faster than MapReduce?
A. Pig
B. Hive
C. Python
D. Spark
d

Which statement about Apache Spark is true?
A. It runs on Hadoop clusters with RAM drives configured on each DataNode.
B. It supports HDFS, MS-SQL, and Oracle.
C. It is much faster than MapReduce for complex applications on disk.
D. It features APIs for C++ and .NET.
c

Which statement is true about the Combiner phase of the MapReduce architecture?
A. It determines the size and distribution of data split in the Map phase.
B. It aggregates all input data before it goes through the Map phase.
C. It reduces the amount of data that is sent to the Reducer task nodes.
D. It is performed after the Reducer phase to produce the final output.
c

Which statement is true about MapReduce v1 APIs?
A. MapReduce v1 APIs cannot be used with YARN.
B. MapReduce v1 APIs provide a flexible execution environment to run MapReduce.
C. MapReduce v1 APIs are implemented by applications which are largely independent of the execution environment.
D. MapReduce v1 APIs define how MapReduce jobs are executed.
c

Hadoop 2 consists of which three open-source sub-projects maintained by the Apache Software Foundation? (Please select the THREE that apply)
A. Hive
B. Cloudbreak
C. Big SQL
D. MapReduce
E. HDFS
F. YARN
def

Which component of the Apache Ambari architecture integrates with an organization's LDAP or Active Directory service?
A. Authorization Provider
B. Ambari Alert Framework
C. Postgres RDBMS
D. REST API
a

What are two services provided by ZooKeeper? (Please select the TWO that apply)
A. Authenticating and auditing user access.
B. Loading bulk data into an Hadoop cluster.
C. Maintaining configuration information.
D. Providing distributed synchronization.
cd

What is an example of a Key-value type of NoSQL datastore?
A. Sesame
B. Neo4j
C. MongoDB
D. REDIS
d

Which Spark Core function provides the main element of Spark API?
A. MLlib
B. RDD
C. Mesos
D. YARN

Which is the java class prefix for the MapReduce v1 APIs?
A. org.apache.mr
B. org.apache.hadoop.mr
C. org.apache.hadoop.mapred
D. org.apache.mapreduce

Which hardware feature on an Hadoop datanode is recommended for cost efficient performance?
A. SSD
B. JBOD
C. RAID
D. LVM

Under the YARN/MRv2 framework, the JobTracker functions are split into which two daemons? (Please select the TWO that apply)
A. ApplicationMaster
B. JobMaster
C. TaskManager
D. ScheduleManager
E. ResourceManager

What does the split-by parameter tell Sqoop?
A. The number of rows to commit per transaction.
B. The number of rows to send to each mapper.
C. The table name to export from the database.
D. The column to use as the primary key.

Hadoop uses which two Google technologies as its foundation? (Please select the TWO that apply)
A. Ambari
B. Google File System
C. HBase
D. YARN
E. MapReduce

Which two are attributes of streaming data? (Please select the TWO that apply)
A. Requires extremely rapid processing.
B. Data is processed in batch.
C. Simple, numeric data.
D. Sent in high volume.

What two security functions does Apache Knox provide? (Please select the TWO that apply)
A. API and perimeter security.
B. Management of Kerberos in the cluster.
C. Proxying services.
D. Database field access auditing.

What is an example of a NoSQL datastore of the "Document Store" type?
A. REDIS
B. HBase
C. Cassandra
D. MongoDB

Which Apache Hadoop application provides a high-level programming language for data transformation on unstructured data?
A. Hive
B. Sqoop
C. Pig
D. Zookeeper

b

c

b

ae

d

be

ad

ac

d

c

What are three IBM value-add components to the Hortonworks Data Platform (HDP)? (Please select the THREE that apply)
A. Big Data
B. Big Match
C. Big Replicate
D. Big SQL
E. Big YARN
F. Big Index

Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2, MySQL, or other databases?
A. Accumulo
B. HBase
C. Oozie
D. Sqoop

Which three are a part of the Five Pillars of Security? (Please select the THREE that apply)
A. Data Protection
B. Speed
C. Resiliency
D. Audit
E. Administration

Which component of the Spark Unified Stack allows developers to intermix structured database queries with Spark's programming language?
A. MLlib
B. Mesos
C. Spark SQL
D. Java

Apache Spark can run on which two of the following cluster managers? (Please select the TWO that apply)
A. Hadoop YARN
B. Apache Mesos
C. Nomad
D. Linux Cluster Manager
E. oneSIS

Which feature makes Apache Spark much easier to use than MapReduce?
A. Suitable for transaction processing.
B. Libraries that support SQL queries.
C. APIs for Scala, Python, C++, and .NET.
D. Applications run in-memory.

What are two ways the command-line parameters for a Sqoop invocation can be simplified? (Please select the TWO that apply)
A. Run Sqoop using the vi editor.
B. Use the --import-command line argument.
C. Include the --options-file command line argument.
D. Place the commands in a file.

Which two are valid watches for ZNodes in ZooKeeper? (Please select the TWO that apply)
A. NodeChildrenChanged
B. NodeDeleted
C. NodeExpired
D. NodeRefreshed

Under the YARN/MRv2 framework, which daemon arbitrates the execution of tasks among all the applications in the system?
A. ResourceManager
B. JobMaster
C. ScheduleManager
D. ApplicationMaster

What is the preferred replacement for Flume?
A. NiFi
B. Hortonworks Data Flow
C. Druid
D. Storm

How can a Sqoop invocation be constrained to only run one mapper?
A. Use the -mapper 1 parameter.
B. Use the --limit mapper=1 parameter.
C. Use the -m 1 parameter.
D. Use the --single parameter.

bcd

d

ade

c

ab

b

cd

ab

a

b

c

Under the MapReduce v1 programming model, which optional phase is executed simultaneously with the Shuffle phase?
A. Reduce
B. Map
C. Combiner
D. Split

What is the first step in a data science pipeline?
A. Acquisition
B. Manipulation
C. Exploration
D. Analytics

What is a markdown cell used for in a data science notebook?
A. Holding the output of a computation.
B. Configuring data connections.
C. Documenting the computational process.
D. Writing code to transform data.

What does the user interface for Jupyter look like to a user?
A. Common desktop app.
B. Database interface.
C. Linux SSH session.
D. App in web browser.

Why might a data scientist need a particular kind of GPU (graphics processing unit)?
A. To display a simple bar chart of data on the screen.
B. To collect video for use in streaming data applications.
C. To perform certain data transformation quickly.
D. To input commands to a data science notebook.

What command is used to list the "magic" commands in Jupyter?
A. %dirmagic
B. %lsmagic
C. %list-magic
D. %list-all-magic

You need to add a collaborator to your project. What do you need?
A. The list of deployments
B. A list of your saved bookmarks
C. The email address of the collaborator
D. Your project ID

Apache Spark provides a single, unifying platform for which three of the following types of operations? (Please select the THREE that apply)
A. record locking
B. batch processing
C. machine learning
D. transaction processing
E. graph operations
F. ACID transactions

Which three programming languages are directly supported by Apache Spark? (Please select the THREE that apply)
A. Scala
B. Python
C. Java
D. .NET
E. C#
F. C++

A. Map -> Split -> Reduce -> Combine

c

a

c

d

c

b

c

bce

a / c/b

B Map -> Combine -> Shuffle -> Reduce C Map -> Combine -> Reduce -> Shuffle Under the MapReduce v1 programming model, which shows the proper order of the full set of MapReduce phases? D Split -> Map -> Combine -> Reduce

b
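The phase-ordering questions above are easier to keep straight with a toy model. The sketch below is plain Python, not Hadoop code (all names are illustrative); it simulates the Split -> Map -> Combine -> Shuffle -> Reduce flow for a word count, with the Combine step doing the local pre-aggregation that, in MRv1, overlaps the Shuffle phase:

```python
from collections import defaultdict

def run_wordcount(text):
    # Split: break the input into independent pieces (one "split" per line)
    splits = text.splitlines()

    # Map: each split emits (word, 1) pairs
    mapped = [[(w, 1) for w in s.split()] for s in splits]

    # Combine (optional; in MRv1 it runs alongside Shuffle): pre-aggregate
    # each mapper's output locally to reduce network traffic
    combined = []
    for pairs in mapped:
        local = defaultdict(int)
        for word, n in pairs:
            local[word] += n
        combined.append(list(local.items()))

    # Shuffle: group values for the same key across all mappers
    shuffled = defaultdict(list)
    for pairs in combined:
        for word, n in pairs:
            shuffled[word].append(n)

    # Reduce: aggregate the grouped values per key
    return {word: sum(ns) for word, ns in shuffled.items()}

print(run_wordcount("big data\nbig hadoop"))  # -> {'big': 2, 'data': 1, 'hadoop': 1}
```

The key exam point the model illustrates: the Combiner is optional and purely a local optimization, while Shuffle and Reduce are where cross-node grouping and aggregation happen.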

Under the YARN/MRv2 framework, which daemon is tasked with negotiating with the NodeManager(s) to execute and monitor tasks?
A ResourceManager
B JobMaster
C ApplicationMaster
D TaskManager
Answer: c

What is the final agent in a Flume chain named?
A Collector
B Source
C Stream
D Agent
Answer: a

What are two security features Apache Ranger provides? (Please select the TWO that apply)
A Availability
B Authorization
C Authentication
D Auditing
Answer: b, d

Under the MapReduce v1 programming model, what happens in a "Reduce" step?
A Worker nodes store results on their own local file systems.
B Data is aggregated by worker nodes.
C Worker nodes process pieces in parallel.
D Input is split into pieces.
Answer: b

Which Apache Hadoop component can potentially replace an RDBMS as a large Hadoop datastore and is particularly good for "sparse data"?
A MapReduce
B Spark
C HBase
D Ambari
Answer: c

Which data encoding format supports exact storage of all data in binary representations such as VARBINARY columns?
A RCFile
B SequenceFiles
C Flat
D Parquet
Answer: b

Which statement describes an example of an application using streaming data?
A One-time export and import of a database.
B An application evaluating sensor data in real-time.
C A system that stores many records in a database.
D A web application that supports 10,000 users.
Answer: b

What are two primary limitations of MapReduce v1? (Please select the TWO that apply)
A Scalability
B Resource utilization
C TaskTrackers can be a bottleneck to MapReduce jobs
D Number of TaskTrackers limited to 1,000
E Workloads limited to MapReduce
Answer: a, b

Which component of a Hadoop system is the primary cause of poor performance?
A RAM
B network
C CPU
D disk latency
Answer: d

What are two common issues in distributed systems? (Please select the TWO that apply)
A Partial failure of the nodes during execution.
B Finding a particular node within the cluster.
C Reduced performance when compared to a single server.
D Distributed systems are harder to scale up.
Answer: a, b

Which component of the Apache Ambari architecture provides statistical data to the dashboard about the performance of a Hadoop cluster?
A Ambari Metrics System
B Ambari Alert Framework
C Ambari Server
D Ambari Wizard
Answer: a
Which Big SQL feature allows users to join a Hadoop data set to data in external databases?
A Impersonation
B Grant/Revoke privileges
C Fluid query
D Integration
Answer: c

When connecting to an external database in a federation, you need to use the correct database driver and protocol. What is this federation component called in Big SQL?
A Data source
B User mapping
C Nickname
D Wrapper
Answer: d

What is a "magic" command used for in Jupyter?
A Parsing and loading data into a notebook.
B Autoconfiguring data connections using a registry.
C Extending the core language with shortcuts.
D Running common statistical analyses.
Answer: c

Which computing technology provides Hadoop's high performance?
A RAID-0
B Online Transactional Processing
C Parallel Processing
D Online Analytical Processing
Answer: c

Under the YARN/MRv2 framework, the Scheduler and ApplicationsManager are components of which daemon?
A ScheduleManager
B ResourceManager
C ApplicationMaster
D TaskManager
Answer: b

Which two factors in a Hadoop cluster increase performance most significantly? (Please select the TWO that apply)
A large number of small data files
B solid state disks
C immediate failover of failed disks
D high-speed networking between nodes
E data redundancy on management nodes
F parallel reading of large data files
Answer: d, f

Which component of the Hortonworks Data Platform (HDP) is the architectural center of Hadoop and provides resource management and a central platform for Hadoop applications?
A MapReduce
B YARN
C HDFS
D HBase
Answer: b

If a Hadoop node goes down, which Ambari component will notify the Administrator?
A Ambari Alert Framework
B Ambari Wizard
C Ambari Metrics System
D REST API
Answer: a

Which type of cell can be used to document and comment on a process in a Jupyter notebook?
A Code
B Kernel
C Output
D Markdown
Answer: d

Which is an advantage that Zeppelin holds over Jupyter?
A Notebooks can be used by multiple people at the same time.
B Users must authenticate before using a notebook.
C Notebooks can be connected to big data engines such as Spark.
D Zeppelin is able to use the R language.
Answer: a

Which capability does IBM BigInsights add to enrich Hadoop?
A Jaql
B Fault tolerance through HDFS replication
C Adaptive MapReduce
D Parallel computing on commodity servers
Answer: c

What is one of the four characteristics of Big Data?
A value
B volume
C verifiability
D volatility
Answer: b

Which Hadoop-related project provides common utilities and libraries that support other Hadoop sub projects?
A Hadoop Common
B Hadoop HBase
C MapReduce
D BigTable
Answer: a

Which type of Big Data analysis involves the processing of extremely large volumes of constantly moving data that is impractical to store?
A Federated Discovery and Navigation
B Text Analysis
C Stream Computing
D MapReduce
Answer: c

Which primary computing bottleneck of modern computers is addressed by Hadoop?
A 64-bit architecture
B disk latency
C MIPS
D limited disk capacity
Answer: b

Which Big Data function improves the decision-making capabilities of organizations by enabling the organizations to interpret and evaluate structured and unstructured data in search of valuable business information?
A stream computing
B data warehousing
C analytics
D distributed file system
Answer: c

What is one of the two technologies that Hadoop uses as its foundation?
A HBase
B Apache
C Jaql
D MapReduce
Answer: d

What key feature does HDFS 2.0 provide that HDFS does not?
A a high-throughput, shared file system
B high availability of the NameNode
C data access performed by an RDBMS
D random access to data in the cluster
Answer: b

What are two of the core operators that can be used in a Jaql query? (Select two.)
A LOAD
B JOIN
C TOP
D SELECT
Answer: b, c

Which type of language is Pig?
A SQL-like
B compiled language
C object oriented
D data flow
Answer: d

If you need to change the replication factor or increase the default storage block size, which file do you need to modify?
A hdfs.conf
B hadoop-configuration.xml
C hadoop.conf
D hdfs-site.xml
Answer: d

To run a MapReduce job on the BigInsights cluster, which statement about the input file(s) must be true?
A The file(s) must be stored on the local file system where the map reduce job was developed.
B The file(s) must be stored in HDFS or GPFS.
C The file(s) must be stored on the JobTracker.
D No matter where the input files are before, they will be automatically copied to where the job runs.
Answer: b

What is a characteristic of IBM GPFS that distinguishes it from other distributed file systems?
A operating system independence
B posix compliance
C no single point of failure
D blocks that are stored on different nodes
Answer: b
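As the replication/block-size question above notes, both settings live in hdfs-site.xml. A minimal fragment for illustration (the values shown are examples, not recommendations; `dfs.blocksize` is the Hadoop 2 property name):

```xml
<configuration>
  <!-- number of copies HDFS keeps of each block (default 3) -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- default block size in bytes (128 MB shown here) -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
```

Changing these properties affects files written after the change; existing files keep the replication factor and block size they were written with unless explicitly altered.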

Which statement represents a difference between Pig and Hive?
A Pig is used for creating MapReduce programs.
B Pig has a shell interface for executing commands.
C Pig is not designed for random reads/writes or low-latency queries.
D Pig uses Load, Transform, and Store.
Answer: d

Which command helps you create a directory called mydata on HDFS?
A hdfs -dir mydata
B hadoop fs -mkdir mydata
C hadoop fs -dir mydata
D mkdir mydata
Answer: b

In which step of a MapReduce job is the output stored on the local disk?
A Reduce
B Shuffle
C Combine
D Map
Answer: d

Under the MapReduce programming model, which task is performed by the Reduce step?
A Worker nodes process individual data segments in parallel.
B Worker nodes store results in the local file system.
C Input data is split into smaller pieces.
D Data is aggregated by worker nodes.
Answer: d

Which element of the MapReduce architecture runs map and reduce jobs?
A Reducer
B JobScheduler
C TaskTracker
D JobTracker
Answer: c

What is one of the two driving principles of MapReduce?
A spread data across a cluster of computers
B provide structure to unstructured or semi-structured data
C increase storage capacity through advanced compression algorithms
D provide a platform for highly efficient transaction processing
Answer: a

When running a MapReduce job from Eclipse, which BigInsights execution modes are available? (Select two.)
A Cluster
B Distributed
C Remote
D Debugging
E Local
Answer: a, e

Which statement is true regarding the number of mappers and reducers configured in a cluster?
A The number of reducers is always equal to the number of mappers.
B The number of mappers and reducers can be configured by modifying the mapred-site.xml file.
C The number of mappers and reducers is decided by the NameNode.
D The number of mappers must be equal to the number of nodes in a cluster.
Answer: b

Which command displays the sizes of files and directories contained in the given directory, or the length of a file, in case it is just a file?
A hadoop size
B hdfs -du
C hdfs fs size
D hadoop fs -du
Answer: d

Following the most common HDFS replica placement policy, when the replication factor is three, how many replicas will be located on the local rack?
A three
B two
C one
D none
Answer: c

In the MapReduce processing model, what is the main function performed by the JobTracker?
A copies Job Resources to the shared file system
B coordinates the job execution
C executes the map and reduce functions
D assigns tasks to each cluster node
Answer: b

How are Pig and Jaql query languages similar?
A Both are data flow languages.
B Both require schema.
C Both use Jaql query language.
D Both are developed primarily by IBM.
Answer: a

Under the HDFS architecture, what is one purpose of the NameNode?
A to manage storage attached to nodes
B to coordinate MapReduce jobs
C to regulate client access to files
D to periodically report status to DataNode
Answer: c

Which command should be used to list the contents of the root directory in HDFS?
A hadoop fs list
B hdfs root
C hadoop fs -ls /
D hdfs list /
Answer: c

What is one function of the JobTracker in MapReduce?
A runs map and reduce tasks
B keeps the work physically close to the data
C reports status of DataNodes
D manages storage
Answer: b

In addition to the high-level language Pig Latin, what is a primary component of the Apache Pig platform?
A built-in UDFs and indexing
B platform-specific SQL libraries
C an RDBMS such as DB2 or MySQL
D runtime environment
Answer: d

Which statement is true about Hadoop Distributed File System (HDFS)?
A Data is accessed through MapReduce.
B Data is designed for random access read/write.
C Data can be processed over long distances without a decrease in performance.
D Data can be created, updated and deleted.
Answer: a

Which is a use-case for Text Analytics?
A managing customer information in a CRM database
B sentiment analytics from social media blogs
C product cost analysis from accounting systems
D health insurance cost/benefit analysis from payroll data
Answer: b

Which tool is used to access BigSheets?
A BigSheets client
B Microsoft Excel
C Eclipse
D Web Browser
Answer: d

Which technology does Big SQL utilize for access to shared catalogs?
A Hive metastore
B RDBMS
C MapReduce
D HCatalog
Answer: a

Which statement will make an AQL view have content displayed?
A display view
B return view
C output view
D export view
Answer: c

You work for a hosting company that has data centers spread across North America. You are trying to resolve a critical performance problem in which a large number of web servers are performing far below expectations. You know that the info…
A Text Analytics
B Stream Computing
C Data Warehousing
D Temporal Analysis
Answer: a

Which utility provides a command-line interface for Hive?
A Thrift client
B Hive shell
C Hive SQL client
D Hive Eclipse plugin
Answer: b

What is an accurate description of HBase?
A It is a data flow language for structured data based on Ansi-SQL.
B It is a distributed file system that replicates data across a cluster.
C It is an open source implementation of Google's BigTable.
D It is a database schema for unstructured Big Data.
Answer: c

Which Hadoop-related technology provides a user-friendly interface, which enables business users to easily analyze Big Data?
A BigSQL
B BigSheets
C Avro
D HBase
Answer: b

What drives the demand for Text Analytics?
A Text Analytics is the most common way to derive value from Big Data.
B MapReduce is unable to process unstructured text.
C Data warehouses contain potentially valuable information.
D Most of the world's data is in unstructured or semi-structured text.
Answer: d

In Hive, what is the difference between an external table and a Hive managed table?
A An external table refers to an existing location outside the warehouse directory.
B An external table refers to a table that cannot be dropped.
C An external table refers to the data from a remote database.
D An external table refers to the data stored on the local file system.
Answer: a

Which statement about NoSQL is true?
A It provides all the capabilities of an RDBMS plus the ability to manage Big Data.
B It is a database technology that does not use the traditional relational model.
C It is based on the highly scalable Google Compute Engine.
D It is an IBM project designed to enable DB2 to manage Big Data.
Answer: b

If you need to JOIN data from two workbooks, which operation should be performed beforehand?
A "Copy" to create a new sheet with the other workbook data in the current workbook
B "Group" to bring together the two workbooks
C "Load" to create a new sheet with the other workbook data in the current workbook
D "Add" to add the other workbook data to the current workbook
Answer: c

What is the "scan" command used for in HBase?
A to get detailed information about the table
B to view data in an Hbase table
C to report any inconsistencies in the database
D to list all tables in Hbase
Answer: c

Which tool is used for developing a BigInsights Text Analytics extractor?
A Eclipse with BigInsights tools for Eclipse plugin
B BigInsights Console with AQL plugin
C AQLBuilder
D AQL command line
Answer: a

What is the most efficient way to load 700MB of data when you create a new HBase table?
A Pre-create regions by specifying splits in create table command and use the insert command to load data.
B Pre-create regions by specifying splits in create table command and bulk loading the data.
C Pre-create the column families when creating the table and use the put command to load the data.
D Pre-create the column families when creating the table and bulk loading the data.
Answer: b

The following sequence of commands is executed:

create 'table_1','column_family1','column_family2'
put 'table_1','row1','column_family1:c11','r1v11'
put 'table_1','row2','column_family1:c12','r1v12'
put 'table_1','row2','column_family2:c21','r1v21'
put 'table_1','row3','column_family1:d11','r1v11'
put 'table_1','row2','column_family1:d12','r1v12'
put 'table_1','row2','column_family2:d21','r1v21'

In HBase, which value will the "count 'table_1'" command return?
A 4
B 3
C 6
D 2
Answer: b

Which Hive command is used to query a table?
A TRANSFORM
B SELECT
C GET
D EXPAND
Answer: b
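The count question above hinges on HBase counting row keys, not cells: a put to an already-existing row just adds another cell to that row, so the six puts touch only three distinct rows. A minimal sketch in plain Python (not the HBase API; the table is modeled as a nested map keyed by row, then column) makes the arithmetic explicit:

```python
from collections import defaultdict

# table[row][column] -> list of values (newest last), loosely mimicking
# how HBase keeps versioned cells under a row key
table = defaultdict(lambda: defaultdict(list))

def put(row, column, value):
    table[row][column].append(value)

# the same put sequence as the quiz question above
put('row1', 'column_family1:c11', 'r1v11')
put('row2', 'column_family1:c12', 'r1v12')
put('row2', 'column_family2:c21', 'r1v21')
put('row3', 'column_family1:d11', 'r1v11')
put('row2', 'column_family1:d12', 'r1v12')
put('row2', 'column_family2:d21', 'r1v21')

# "count 'table_1'" counts distinct row keys, so repeated rows collapse
print(len(table))  # -> 3
```

row2 accumulates four cells, but it is still a single row, which is why the answer is 3 rather than 6.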

Why develop SQL-based query languages that can access Hadoop data sets?
A because SQL enhances query performance
B because the MapReduce Java API is sometimes difficult to use
C because data stored in a Hadoop cluster lends itself to structured SQL queries
D because the data stored in Hadoop is always structured
Answer: b

Which key benefit does NoSQL provide?
A It allows Hadoop to apply the schema-on-ingest model to unstructured Big Data.
B It allows an RDBMS to maintain referential integrity on a Hadoop data set.
C It allows customers to leverage high-end server platforms to manage Big Data.
D It can cost-effectively manage data sets too large for traditional RDBMS.
Answer: d

What makes SQL access to Hadoop data difficult?
A Hadoop data is highly structured.
B Data is in many formats.
C Data is located on a distributed file system.
D Hadoop requires pre-defined schema.
Answer: b

Which command can be used in Hive to list the tables available in a database/schema?
A list tables
B describe tables
C show all
D show tables
Answer: d

In HBase, what is the "count" command used for?
A to count the number of columns of a table
B to count the number of column families of a table
C to count the number of rows in a table
D to count the number of regions of a table
Answer: c

Which Hadoop-related technology supports analysis of large datasets stored in HDFS using an SQL-like query language?
A HBase
B Pig
C Jaql
D Hive
Answer: d

How can the applications published to BigInsights Web Console be made available for users to execute?
A They need to be marked as "Shared."
B They need to be copied under the user home directory.
C They need to be deployed with proper privileges.
D They need to be linked with the master application.
Answer: c

Which component of Apache Hadoop is used for scheduling and running workflow jobs?
A Eclipse
B Oozie
C Jaql
D Task Launcher
Answer: b

What is one of the main components of Watson Explorer (InfoSphere Data Explorer)?
A validater
B replicater
C crawler
D compressor
Answer: c

IBM InfoSphere Streams is designed to accomplish which Big Data function?
A analyze and react to data in motion before it is stored
B find and analyze historical stream data stored on disk
C analyze and summarize product sentiments posted to social media
D execute ad-hoc queries against a Hadoop-based data warehouse
Answer: c

Which IBM Big Data solution provides low-latency analytics for processing data-in-motion?
A InfoSphere Information Server
B InfoSphere Streams
C InfoSphere BigInsights
D PureData for Analytics
Answer: b

Which IBM tool enables BigInsights users to develop, test and publish BigInsights applications?
A Avro
B HBase
C Eclipse
D BigInsights Applications Catalog
Answer: c
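Several Hive questions above (external tables, SHOW TABLES, SELECT) share a single workflow. A hedged HiveQL sketch for illustration only; the table name, columns, and HDFS path are made up:

```sql
-- EXTERNAL: Hive tracks only the metadata; the data stays at the given
-- HDFS location and is NOT deleted when the table is dropped
CREATE EXTERNAL TABLE web_logs (
  ip  STRING,
  ts  STRING,
  url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/raw/web_logs';

SHOW TABLES;              -- list tables in the current database/schema

SELECT url, COUNT(*) AS hits
FROM web_logs
GROUP BY url;             -- standard SQL-like query over HDFS data
```

Dropping a managed (non-external) table, by contrast, removes both the metadata and the underlying warehouse-directory data.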

Which description identifies the real value of Big Data and Analytics?
A enabling customers to efficiently index and access large volumes of data
B gaining new insight through the capabilities of the world's interconnected intelligence
C providing solutions to help customers manage and grow large database systems
D using modern technology to efficiently store the massive amounts of data generated by social networks
Answer: b

Which Hadoop ecosystem tool can import data into a Hadoop cluster from a DB2, MySQL, or other databases?
Answer: Sqoop

Which NoSQL datastore type began as an implementation of Google's BigTable that can store any type of data and scale to many petabytes?
Answer: HBase

Which computing technology provides Hadoop's high performance?
A Parallel Processing
B Online Analytical Processing
C Online Transactional Processing
D RAID-0
Answer: a