Hadoop Interview Questions and Answers for beginners

What do you mean by Hadoop?

Hadoop is an open-source software framework for storing and processing very large data sets in a distributed fashion on large clusters of commodity hardware.

What do you mean by Hadoop framework?

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.

What are the most common input formats defined in Hadoop?

The three most common input formats defined in Hadoop are:

  • KeyValueTextInputFormat
  • TextInputFormat
  • SequenceFileInputFormat
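
To make the difference between the two text-based formats concrete, here is a minimal plain-Java sketch of how each one turns a line of input into a key and a value. It does not use the Hadoop API: `InputFormatSketch` and its methods are hypothetical names chosen for illustration. It assumes the line-oriented behavior of TextInputFormat (key is the byte offset of the line, value is the line) and of the key-value text format, `KeyValueTextInputFormat` in Hadoop (the line is split at the first tab). SequenceFileInputFormat reads a binary file format and is not modeled here.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration; none of these names are Hadoop classes.
class InputFormatSketch {

    // TextInputFormat style: key = byte offset of the line start, value = the line.
    static List<Map.Entry<Long, String>> textRecords(String data) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : data.split("\n")) {
            records.add(new SimpleEntry<>(offset, line));
            // +1 accounts for the newline byte that ended this line.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Key-value text style: split each line at the first tab.
    static List<Map.Entry<String, String>> keyValueRecords(String data) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (String line : data.split("\n")) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                // No separator: the whole line is the key and the value is empty.
                records.add(new SimpleEntry<>(line, ""));
            } else {
                records.add(new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1)));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "alice\t30\nbob\t25";
        System.out.println(textRecords(data));
        System.out.println(keyValueRecords(data));
    }
}
```

For the sample input, the first method yields offsets 0 and 9 as keys, while the second yields "alice" and "bob" as keys with "30" and "25" as values.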

What do you mean by Hadoop MapReduce?

The Hadoop MapReduce framework is used for processing large data sets in parallel across a Hadoop cluster. Data analysis uses a two-step process: a map step followed by a reduce step.
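
The two steps can be illustrated with the classic word-count example. The sketch below runs in a single process and uses only the Java standard library, so it shows the shape of the computation rather than the real Hadoop API. The class `WordCountSketch` and its method names are assumptions made for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process illustration of the map and reduce steps; not the Hadoop API.
class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word.toLowerCase(), 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all counts that were emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: map every line, group values by key (the "shuffle"), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {big=2, cluster=1, data=2, node=1}
        System.out.println(run(Arrays.asList("big data big cluster", "data node")));
    }
}
```

On a real cluster the map calls run in parallel on different nodes and the grouping happens over the network; the sketch keeps everything in memory so the data flow is easy to follow.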

What is the NameNode in Hadoop?

The NameNode is the node where Hadoop stores all the file location information for the Hadoop Distributed File System (HDFS). In other words, the NameNode is the centerpiece of HDFS.
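
As a rough mental model, the file location information can be pictured as two lookup tables: one from a file path to the blocks that make up the file, and one from each block to the nodes that hold a copy of it. The toy class below is a heavily simplified sketch of that idea. `NameNodeSketch`, `addBlock`, and `locate` are hypothetical names, and the real NameNode tracks far more than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of NameNode-style metadata; all names here are hypothetical.
class NameNodeSketch {
    // file path -> ordered block ids that make up the file
    private final Map<String, List<String>> fileToBlocks = new HashMap<>();
    // block id -> data nodes holding a replica of that block
    private final Map<String, List<String>> blockToNodes = new HashMap<>();

    // Record that a block of the given file is stored on the given data nodes.
    void addBlock(String path, String blockId, String... dataNodes) {
        fileToBlocks.computeIfAbsent(path, p -> new ArrayList<>()).add(blockId);
        blockToNodes.put(blockId, Arrays.asList(dataNodes));
    }

    // Answer "where is this file?" from metadata alone, without reading file contents.
    List<List<String>> locate(String path) {
        List<List<String>> locations = new ArrayList<>();
        for (String blockId : fileToBlocks.getOrDefault(path, new ArrayList<>())) {
            locations.add(blockToNodes.get(blockId));
        }
        return locations;
    }

    public static void main(String[] args) {
        NameNodeSketch nn = new NameNodeSketch();
        nn.addBlock("/logs/app.log", "blk_1", "dn1", "dn2");
        nn.addBlock("/logs/app.log", "blk_2", "dn2", "dn3");
        // Prints [[dn1, dn2], [dn2, dn3]]
        System.out.println(nn.locate("/logs/app.log"));
    }
}
```

The point of the sketch is that the NameNode holds only the index; the file data itself lives on the data nodes.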

Can we change a file cached by the Distributed Cache in Hadoop?

No. The DistributedCache tracks cached files by their modification timestamps, so a cached file should not be changed during job execution.
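
The idea of tracking a file by its timestamp can be sketched in a few lines of plain Java. This is not how Hadoop implements it internally; `CacheTimestampSketch` and its methods are hypothetical. The sketch only shows why a file that changes after registration becomes detectable: the timestamp recorded at registration no longer matches the one on disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Illustration only: the class and method names are hypothetical, not Hadoop's.
class CacheTimestampSketch {
    private final Path file;
    private final long recordedMillis; // modification time seen when the file was registered

    CacheTimestampSketch(Path file) {
        this.file = file;
        this.recordedMillis = modifiedMillis(file);
    }

    static long modifiedMillis(Path file) {
        try {
            return Files.getLastModifiedTime(file).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True while the file still carries the timestamp recorded at registration.
    boolean unchanged() {
        return modifiedMillis(file) == recordedMillis;
    }

    // Registers a temp file, then simulates a later edit by moving its
    // timestamp forward. Returns {unchanged before edit, unchanged after edit}.
    static boolean[] demo() {
        try {
            Path tmp = Files.createTempFile("cached", ".txt");
            try {
                CacheTimestampSketch entry = new CacheTimestampSketch(tmp);
                boolean before = entry.unchanged();
                Files.setLastModifiedTime(tmp, FileTime.fromMillis(entry.recordedMillis + 10_000));
                boolean after = entry.unchanged();
                return new boolean[] {before, after};
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("unchanged before edit: " + r[0] + ", after edit: " + r[1]);
    }
}
```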

What do you mean by Distributed Cache in the MapReduce framework?

The distributed cache is a very effective feature provided by the MapReduce framework. It can cache archives, text files, and jars that an application needs, which can improve performance. The application specifies the files to cache through the JobConf object.

What is a task tracker in Hadoop?

A task tracker is a daemon that runs on the data nodes and is responsible for executing the MapReduce tasks assigned to it. It executes these tasks and sends regular progress reports to the job tracker.

What is a job tracker in Hadoop?

The job tracker is a background service for submitting and tracking jobs. It runs on a master node, which in small clusters is often the same machine as the name node. A job in Hadoop refers to a MapReduce job.
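
The division of work between the job tracker and the task trackers can be pictured with a small single-threaded model. Everything in it is an assumption made for illustration: the class names, the round-robin assignment, and the status strings are not taken from Hadoop. It only mirrors the roles described above, where the job tracker accepts a job and tracks it while task trackers execute tasks and report back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy single-threaded model; class and method names are hypothetical, not Hadoop's.
class TrackerSketch {

    static class JobTracker {
        // task id -> last status reported for that task
        final Map<String, String> status = new LinkedHashMap<>();

        // Accept a job made of several tasks and hand them out round-robin.
        void submit(List<String> taskIds, List<TaskTracker> workers) {
            for (int i = 0; i < taskIds.size(); i++) {
                String id = taskIds.get(i);
                status.put(id, "ASSIGNED");
                workers.get(i % workers.size()).assigned.add(id);
            }
        }

        // Called by task trackers to update the status of one task.
        void report(String taskId, String newStatus) {
            status.put(taskId, newStatus);
        }

        // The job is finished once every task has reported DONE.
        boolean jobDone() {
            return status.values().stream().allMatch("DONE"::equals);
        }
    }

    static class TaskTracker {
        final List<String> assigned = new ArrayList<>();

        // Run every assigned task and report each result back to the job tracker.
        void runAll(JobTracker tracker) {
            for (String id : assigned) {
                tracker.report(id, "RUNNING");
                // Real work for the task would happen here.
                tracker.report(id, "DONE");
            }
            assigned.clear();
        }
    }

    public static void main(String[] args) {
        JobTracker jt = new JobTracker();
        List<TaskTracker> workers = Arrays.asList(new TaskTracker(), new TaskTracker());
        jt.submit(Arrays.asList("map-1", "map-2", "reduce-1"), workers);
        for (TaskTracker w : workers) {
            w.runAll(jt);
        }
        System.out.println(jt.status + " done=" + jt.jobDone());
    }
}
```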

How many modes does Hadoop support?

Hadoop can be used in three modes:

  • Fully distributed mode
  • Standalone mode
  • Pseudo-distributed mode

How many daemon processes run on a Hadoop system?

Hadoop comprises five different daemons, each running in its own Java Virtual Machine (JVM). The following three daemons run on master nodes:

  • NameNode
  • Secondary NameNode
  • Job Tracker