Apache Spark and Scala Training
With Apache Spark and Scala certification training, you will advance your expertise in the Big Data Hadoop ecosystem.
With this Apache Spark certification you will master essential skills such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and shell scripting with Spark.
And with a real-life industry project coupled with 30 demos, you will be ready to take on a Hadoop developer job requiring Apache Spark expertise.
After completing the Apache Spark and Scala certification training, you will be able to:
Gain a clear understanding of the limitations of MapReduce and the role of Spark in overcoming them
Understand the fundamentals of the Scala programming language and its features
Master the process of installing Spark as a standalone cluster
Use RDDs to create applications in Spark
Master SQL queries using Spark SQL
Gain a thorough understanding of Spark Streaming features
Describe the features of Spark ML programming and GraphX programming
Who should take this course?
Professionals aspiring to a career in real-time Big Data analytics
Analytics professionals
Research professionals
IT developers and testers
Data scientists
BI and reporting professionals
Students who wish to gain a thorough understanding of Apache Spark
What projects are included in this course?
A US-based university has collected datasets representing movie reviews from multiple reviewers as part of a research project. To gain in-depth insights from the collected research data, you will perform a series of tasks in Spark on the provided dataset.
Apache Spark and Scala Certification Course Agenda
Lesson 1: Course Preview
Course overview
Objectives
Lesson 2: Introduction to Spark
Limitations of MapReduce in Hadoop
Batch vs. Real-time analytics
Applications of stream processing
How to install Spark
Spark vs. the Hadoop ecosystem
Lesson 3: Introduction to Programming in Scala
Features of Scala
Basic data types and literals
Operators and methods used in Scala
Core concepts of Scala (see the sketch after this list)
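To make these basics concrete, here is a minimal, self-contained Scala sketch covering the data types, literals, operators, and methods this lesson touches on; all names and values are illustrative, not course material:

```scala
// A minimal sketch of Scala basics: data types, literals, operators, methods.
object ScalaBasics {
  def main(args: Array[String]): Unit = {
    // Basic data types and literals
    val count: Int = 42          // integer literal
    val pi: Double = 3.14159     // floating-point literal
    val name: String = "Spark"   // string literal
    val ready: Boolean = true    // boolean literal

    // Operators are methods in Scala: a + b is sugar for a.+(b)
    val doubled = count * 2
    val greeting = "Hello, " + name

    // Defining and calling a method
    def square(x: Int): Int = x * x

    // Common collection methods (map, filter) on an immutable List
    val bigSquares = List(1, 2, 3, 4).map(square).filter(_ > 4)

    println(s"$greeting! doubled=$doubled, pi=$pi, ready=$ready")
    println(s"squares greater than 4: $bigSquares")
  }
}
```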
Lesson 4: Using RDDs for Creating Applications in Spark
Features of RDDs
How to create RDDs
RDD operations and methods
How to run a Spark project with SBT
Explain RDD functions and describe how to implement them in Scala (see the sketch after this list)
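The sketch below illustrates the RDD workflow this lesson covers: creating an RDD, applying lazy transformations, and triggering an action. It assumes a local Spark installation with spark-core on the classpath; the app name is illustrative.

```scala
// A minimal sketch: creating an RDD and applying transformations and actions.
import org.apache.spark.{SparkConf, SparkContext}

object RddBasics {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RddBasics").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Create an RDD from an in-memory collection
    val numbers = sc.parallelize(1 to 100)

    // Transformations (filter, map) are lazy; the action (reduce) triggers computation
    val evenSquares = numbers.filter(_ % 2 == 0).map(n => n * n)
    println(s"Sum of even squares: ${evenSquares.reduce(_ + _)}")

    sc.stop()
  }
}
```

With the spark-core dependency declared in build.sbt, a project like this can be run locally with `sbt run`, or packaged with `sbt package` and launched on a cluster with `spark-submit`.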
Lesson 5: Running SQL Queries Using Spark SQL
Explain the importance and features of Spark SQL
Describe methods to convert RDDs to DataFrames
Explain the core concepts of Spark SQL
Describe Hive integration (see the sketch after this list)
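A minimal sketch of the Spark SQL workflow described above: converting an RDD of case classes to a DataFrame and querying it with SQL. The Movie case class and sample rows are illustrative and are not the course's movie-review dataset.

```scala
// A minimal sketch: RDD-to-DataFrame conversion and a SQL query.
import org.apache.spark.sql.SparkSession

case class Movie(title: String, rating: Double)

object SparkSqlBasics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSqlBasics")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Convert an RDD of case classes to a DataFrame via toDF()
    val rdd = spark.sparkContext.parallelize(Seq(
      Movie("Alpha", 4.5), Movie("Beta", 3.0), Movie("Gamma", 4.8)))
    val df = rdd.toDF()

    // Register a temporary view and query it with SQL
    df.createOrReplaceTempView("movies")
    spark.sql("SELECT title, rating FROM movies WHERE rating >= 4.0").show()

    spark.stop()
  }
}
```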
Lesson 6: Spark Streaming
Explain the concepts of Spark Streaming
Describe basic and advanced sources
Explain how stateful operations work
Explain window and join operations (see the sketch after this list)
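A minimal sketch of the DStream API: reading from a basic socket source and applying a window operation. The host and port are illustrative; you can feed it text locally with `nc -lk 9999`.

```scala
// A minimal sketch: a socket source with a windowed word count.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingBasics {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the receiver, one for processing
    val conf = new SparkConf().setAppName("StreamingBasics").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Basic source: lines of text arriving on a TCP socket
    val lines = ssc.socketTextStream("localhost", 9999)

    // Windowed word count: 30-second window, sliding every 10 seconds
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```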
Lesson 7: Spark ML Programming
Explain the use cases and techniques of Machine Learning (ML)
Describe the key concepts of Spark ML
Explain the concepts of an ML dataset, ML algorithms, and model selection via cross-validation (see the sketch after this list)
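A minimal sketch of these Spark ML concepts: a DataFrame serves as the ML dataset, logistic regression is the algorithm, and a CrossValidator performs model selection. The tiny inline dataset and parameter grid are illustrative only.

```scala
// A minimal sketch: an ML Pipeline with model selection via cross-validation.
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

object MlBasics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MlBasics").master("local[*]").getOrCreate()
    import spark.implicits._

    // An ML dataset is a DataFrame with feature and label columns (toy data)
    val data = Seq(
      (0.0, 1.0, 0.0), (0.1, 0.9, 0.0), (0.2, 0.8, 0.0), (0.3, 0.7, 0.0),
      (1.0, 0.0, 1.0), (0.9, 0.1, 1.0), (0.8, 0.2, 1.0), (0.7, 0.3, 1.0)
    ).toDF("f1", "f2", "label")

    // Pipeline: assemble raw columns into a feature vector, then classify
    val assembler = new VectorAssembler().setInputCols(Array("f1", "f2")).setOutputCol("features")
    val lr = new LogisticRegression()
    val pipeline = new Pipeline().setStages(Array(assembler, lr))

    // Model selection: try several regularization strengths via cross-validation
    val grid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.01, 0.1)).build()
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(grid)
      .setNumFolds(2) // tiny dataset, so only 2 folds

    val model = cv.fit(data)
    model.transform(data).select("features", "label", "prediction").show()

    spark.stop()
  }
}
```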
Lesson 8: Spark GraphX Programming
Explain the key concepts of Spark GraphX programming
Limitations of graph-parallel systems
Describe the operations with a graph
Graph system optimizations (see the sketch after this list)
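A minimal sketch of GraphX: building a small property graph from vertex and edge RDDs and running basic graph operations (degrees, PageRank). The vertex and edge data are illustrative.

```scala
// A minimal sketch: constructing a property graph and querying it.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object GraphxBasics {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GraphxBasics").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Vertices: (id, name); edges: relationships between vertex ids
    val vertices = sc.parallelize(Seq((1L, "Alice"), (2L, "Bob"), (3L, "Carol")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows"), Edge(3L, 1L, "follows")))
    val graph = Graph(vertices, edges)

    // Graph operations: degree of each vertex, then PageRank
    graph.degrees.collect().foreach { case (id, deg) => println(s"vertex $id has degree $deg") }
    val ranks = graph.pageRank(0.001).vertices
    ranks.join(vertices).collect().foreach { case (_, (rank, name)) => println(f"$name: $rank%.3f") }

    sc.stop()
  }
}
```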
OFFICIAL SPARK SCALA RESOURCES
Programming Guides:
- Quick Start: a quick introduction to the Spark API; start here!
- Spark Programming Guide: detailed overview of Spark in all supported languages (Scala, Java, Python, R)
- Modules built on Spark:
  - Spark Streaming: processing real-time data streams
  - Spark SQL, Datasets, and DataFrames: support for structured data and relational queries
  - MLlib: built-in machine learning library
  - GraphX: Spark’s new API for graph processing
API Docs:
- Spark Scala API (Scaladoc)
- Spark Java API (Javadoc)
- Spark Python API (Sphinx)
- Spark R API (Roxygen2)
Deployment Guides:
- Cluster Overview: overview of concepts and components when running on a cluster
- Submitting Applications: packaging and deploying applications
- Deployment modes:
  - Amazon EC2: scripts that let you launch a cluster on EC2 in about 5 minutes
  - Standalone Deploy Mode: launch a standalone cluster quickly without a third-party cluster manager
  - Mesos: deploy a private cluster using Apache Mesos
  - YARN: deploy Spark on top of Hadoop NextGen (YARN)
  - Kubernetes (experimental): deploy Spark on top of Kubernetes
Other Documents:
- Configuration: customize Spark via its configuration system
- Monitoring: track the behavior of your applications
- Tuning Guide: best practices to optimize performance and memory use
- Job Scheduling: scheduling resources across and within Spark applications
- Security: Spark security support
- Hardware Provisioning: recommendations for cluster hardware
- Integration with other storage systems:
- Building Spark: build Spark using the Maven system
- Contributing to Spark
- Third Party Projects: related third party Spark projects
External Resources:
- Spark Homepage
- Spark Community resources, including local meetups
- StackOverflow tag: apache-spark
- Mailing Lists: ask questions about Spark here
- AMP Camps: a series of training camps at UC Berkeley that featured talks and exercises about Spark, Spark Streaming, Mesos, and more. Videos, slides and exercises are available online for free.
- Code Examples: more are also available in the examples subfolder of Spark (Scala, Java, Python, R)