Hadoop Admin Course Introduction
Who is a Hadoop Administrator?
As the name suggests, a Hadoop Administrator is someone who administers and manages Hadoop clusters and the other resources in the Hadoop ecosystem. A Hadoop admin’s job is usually not visible to other IT groups or end users. The role is mainly associated with installing and monitoring Hadoop clusters. The job might include some mundane tasks, but each one is important for the efficient and continued operation of Hadoop clusters, for preventing problems, and for improving overall performance. A Hadoop admin is the person responsible for keeping the company’s Hadoop clusters safe and running efficiently.
Hadoop Admin Job Roles and Responsibilities
Managing big data and Hadoop clusters presents challenges that differ from those of running test data through a couple of machines. Organizational deployments of Hadoop often fail when administrators try to replicate processes and procedures tested on one or two machines across more complex Hadoop clusters. Hadoop Admin is itself a title that covers many niches in the big data world: depending on the size of the company, a Hadoop administrator might also perform DBA-like tasks with HBase and Hive databases, security administration, and cluster administration. Instead of trying to pigeonhole the Hadoop admin, it is useful to look at the day-to-day activities a Hadoop admin performs:
- The typical responsibilities of a Hadoop admin include deploying a Hadoop cluster, maintaining a Hadoop cluster, adding and removing nodes using cluster monitoring tools like Ganglia, Nagios or Cloudera Manager, configuring NameNode high availability, and keeping track of all running Hadoop jobs.
- Implementing, managing and administering the overall Hadoop infrastructure.
- Taking care of the day-to-day running of Hadoop clusters.
- A Hadoop administrator has to work closely with the database team, network team, BI team and application teams to make sure that all the big data applications are highly available and performing as expected.
- When working with the open source Apache distribution, Hadoop admins have to set up all the configuration files manually: core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml. However, when working with a popular Hadoop distribution like Hortonworks, Cloudera or MapR, the configuration files are set up on startup and the Hadoop admin need not configure them manually.
- The Hadoop admin is responsible for capacity planning and for estimating the requirements for lowering or increasing the capacity of the Hadoop cluster.
- The Hadoop admin is also responsible for deciding the size of the Hadoop cluster based on the data to be stored in HDFS.
- Ensuring that the Hadoop cluster is up and running all the time.
- Monitoring the cluster connectivity and performance.
- Managing and reviewing Hadoop log files.
- Backup and recovery tasks
- Resource and security management
- Troubleshooting application errors and ensuring that they do not occur again.
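The capacity-planning and cluster-sizing responsibilities listed above can be illustrated with a back-of-the-envelope calculation. The following is a minimal sketch, not a sizing method taken from this course: the function name `estimate_datanodes`, the 25% headroom reserved for temporary and intermediate data, and the 48 TB of disk per node are all illustrative assumptions. Only the replication factor of 3 reflects the HDFS default.

```python
import math


def estimate_datanodes(data_tb, replication=3, headroom=0.25, disk_per_node_tb=48.0):
    """Rough number of DataNodes needed to store data_tb terabytes in HDFS.

    Assumptions (hypothetical, adjust for a real cluster):
      - headroom: fraction of each node's disk kept free for temp/intermediate data
      - disk_per_node_tb: raw disk capacity of one DataNode
    replication=3 matches the HDFS default replication factor.
    """
    if data_tb < 0:
        raise ValueError("data_tb must be non-negative")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")

    # HDFS stores every block `replication` times, so raw storage scales linearly.
    raw_tb = data_tb * replication

    # Only part of each node's disk is available for HDFS blocks.
    usable_per_node_tb = disk_per_node_tb * (1 - headroom)

    return math.ceil(raw_tb / usable_per_node_tb)


if __name__ == "__main__":
    # 200 TB of data, default assumptions
    print(estimate_datanodes(200))
```

Under these assumptions, 200 TB of data needs 600 TB of raw storage, and each node offers 36 TB of usable space, so the sketch reports 17 nodes. A real estimate would also account for data growth, compression, and compute requirements.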
Hadoop Admin Online Training Course Content
- Chapter 1: Getting Started with Apache Hadoop
- Chapter 2: HDFS and MapReduce
- Chapter 3: Cloudera's Distribution Including Apache Hadoop
- Chapter 4: Exploring HDFS Federation and Its High Availability
- Chapter 5: Using Cloudera Manager
- Chapter 6: Implementing Security Using Kerberos
- Chapter 7: Managing an Apache Hadoop Cluster
- Chapter 8: Cluster Monitoring Using Events and Alerts