Data Engineering with AWS

book-image

This is a great book for understanding and implementing the lake house architecture to integrate your Data Lake with your warehouse. It shows you all the steps you need to orchestrate your data pipeline. From architecture, ingestion, and processing to running queries in your data warehouse, I really like the very hands-on approach that shows you how you can immediately implement the topics in your AWS account Andreas Kretz, CEO, Learn Data Engineering

Data Science on AWS

book-image

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.

Serverless Analytics with Amazon Athena

book-image

This book begins with an overview of the serverless analytics experience offered by Athena and teaches you how to build and tune an S3 Data Lake using Athena, including how to structure your tables using open-source file formats like Parquet. You willl learn how to build, secure, and connect to a data lake with Athena and Lake Formation. Next, you will cover key tasks such as ad hoc data analysis, working with ETL pipelines, monitoring and alerting KPI breaches using CloudWatch Metrics, running customizable connectors with AWS Lambda, and more. Moving on, you will work through easy integrations, troubleshooting and tuning common Athena issues, and the most common reasons for query failure.You will also review tips to help diagnose and correct failing queries in your pursuit of operational excellence.Finally, you will explore advanced concepts such as Athena Query Federation and Athena ML to generate powerful insights without needing to touch a single server.

Serverless ETL and Analytics with AWS Glue

book-image

Beginning with AWS Glue basics, this book teaches you how to perform various aspects of data analysis such as ad hoc queries, data visualization, and real time analysis using this service. It also provides a walk-through of CI/CD for AWS Glue and how to shift left on quality using automated regression tests. You will find out how data security aspects such as access control, encryption, auditing, and networking are implemented, as well as getting to grips with useful techniques such as picking the right file format, compression, partitioning, and bucketing.As you advance, you will discover AWS Glue features such as crawlers, Lake Formation, governed tables, lineage, DataBrew, Glue Studio, and custom connectors. The concluding chapters help you to understand various performance tuning, troubleshooting, and monitoring options.

Simplify Big Data Analytics with Amazon EMR

book-image

Amazon EMR, formerly Amazon Elastic MapReduce, provides a managed Hadoop cluster in Amazon Web Services (AWS) that you can use to implement batch or streaming data pipelines. By gaining expertise in Amazon EMR, you can design and implement data analytics pipelines with persistent or transient EMR clusters in AWS.This book is a practical guide to Amazon EMR for building data pipelines. You will start by understanding the Amazon EMR architecture, cluster nodes, features, and deployment options, along with their pricing. Next, the book covers the various big data applications that EMR supports. You will then focus on the advanced configuration of EMR applications, hardware, networking, security, troubleshooting, logging, and the different SDKs and APIs it provides. Later chapters will show you how to implement common Amazon EMR use cases, including batch ETL with Spark, real time streaming with Spark Streaming, and handling UPSERT in S3 Data Lake with Apache Hudi. Finally, you will orchestrate your EMR jobs and strategize on premises Hadoop cluster migration to EMR. In addition to this, you will explore best practices and cost optimization techniques while implementing your data analytics pipeline in EMR

Scalable Data Streaming with Amazon Kinesis

book-image

Amazon Kinesis is a collection of secure, serverless, durable, and highly available purpose built data streaming services. This data streaming service provides APIs and client SDKs that enable you to produce and consume data at scale. Scalable Data Streaming with Amazon Kinesis begins with a quick overview of the core concepts of data streams, along with the essentials of the AWS Kinesis landscape. You will then explore the requirements of the use case shown through the book to help you get started and cover the key pain points encountered in the data stream life cycle. As you advance, you will get to grips with the architectural components of Kinesis, understand how they are configured to build data pipelines, and delve into the applications that connect to them for consumption and processing. You will also build a Kinesis data pipeline from scratch and learn how to implement and apply practical solutions. Moving on, you will learn how to configure Kinesis on a cloud platform. Finally, you will learn how other AWS services can be integrated into Kinesis. These services include Redshift, Dynamo Database, AWS S3, Elastic Search, and third-party applications such as Splunk.

Actionable Insights with Amazon QuickSight

book-image

Amazon Quicksight is an exciting new visualization that rivals PowerBI and Tableau, bringing several exciting features to the table but sadly, there are not many resources out there that can help you learn the ropes. This book seeks to remedy that with the help of an AWS certified expert who will help you leverage its full capabilities. After learning QuickSight is fundamental concepts and how to configure data sources, you will be introduced to the main analysis-building functionality of QuickSight to develop visuals and dashboards, and explore how to develop and share interactive dashboards with parameters and on screen controls. You will dive into advanced filtering options with URL actions before learning how to set up alerts and scheduled reports.

Amazon Redshift Cookbook

book-image

Amazon Redshift is a fully managed, petabyte-scale AWS cloud data warehousing service. It enables you to build new data warehouse workloads on AWS and migrate on-premises traditional data warehousing platforms to Redshift. This book on Amazon Redshift starts by focusing on Redshift architecture, showing you how to perform database administration tasks on Redshift.You will then learn how to optimize your data warehouse to quickly execute complex analytic queries against very large datasets. Because of the massive amount of data involved in data warehousing, designing your database for analytical processing lets you take full advantage of Redshifts columnar architecture and managed services.As you advance, you will discover how to deploy fully automated and highly scalable extract, transform, and load (ETL) processes, which help minimize the operational efforts that you have to invest in managing regular ETL pipelines and ensure the timely and accurate refreshing of your data warehouse. Finally, you will gain a clear understanding of Redshift use cases, data ingestion, data management, security, and scaling so that you can build a scalable data warehouse platform.

Automated Machine Learning on AWS

book-image

Automated Machine Learning on AWS begins with a quick overview of what the machine learning pipeline/process looks like and highlights the typical challenges that you may face when building a pipeline. Throughout the book, you'll become well versed with various AWS solutions such as Amazon SageMaker Autopilot, AutoGluon, and AWS Step Functions to automate an end-to-end ML process with the help of hands-on examples. The book will show you how to build, monitor, and execute a CI/CD pipeline for the ML process and how the various CI/CD services within AWS can be applied to a use case with the Cloud Development Kit (CDK). You'll understand what a data-centric ML process is by working with the Amazon Managed Services for Apache Airflow and then build a managed Airflow environment. You'll also cover the key success criteria for an MLSDLC implementation and the process of creating a self-mutating CI/CD pipeline using AWS CDK from the perspective of the platform engineering team

Kubernetes Up and Running

book-image

In just five years, Kubernetes has radically changed the way developers and ops personnel build, deploy, and maintain applications in the cloud. With this book is updated third edition, you will learn how this popular container orchestrator can help your company achieve new levels of velocity, agility, reliability, and efficiency whether you are new to distributed systems or have been deploying cloud native apps for some time.

Getting Started with Containerization

book-image

Kubernetes is an open source orchestration platform for managing containers in a cluster environment. This Learning Path introduces you to the world of containerization, in addition to providing you with an overview of Docker fundamentals. As you progress, you will be able to understand how Kubernetes works with containers. Starting with creating Kubernetes clusters and running applications with proper authentication and authorization, you will learn how to create high- availability Kubernetes clusters on Amazon Web Services(AWS), and also learn how to use kubeconfig to manage different clusters.Whether it is learning about Docker containers and Docker Compose, or building a continuous delivery pipeline for your application, this Learning Path will equip you with all the right tools and techniques to get started with containerization.

Production Kubernetes

book-image

Kubernetes has become the dominant container orchestrator, but many organizations that have recently adopted this system are still struggling to run actual production workloads. In this practical book, four software engineers from VMware bring their shared experiences running Kubernetes in production and provide insight on key challenges and best practices. The brilliance of Kubernetes is how configurable and extensible the system is, from pluggable runtimes to storage integrations. For platform engineers, software developers, infosec, network engineers, storage engineers, and others, this book examines how the path to success with Kubernetes involves a variety of technology, pattern, and abstraction considerations.

Practical Vim

book-image

Vim is a fast and efficient text editor that will make you a faster and more efficient developer. It's available on almost every OS, and if you master the techniques in this book, you will never need another text editor. In more than 120 Vim tips, you will quickly learn the editor's core functionality and tackle your trickiest editing and writing tasks. This beloved bestseller has been revised and updated to Vim 8 and includes three brand-new tips and five fully revised tips.

CSS In Depth

book-image

CSS in Depth exposes you to a world of CSS techniques that range from clever to mind-blowing. This instantly useful book is packed with creative examples and powerful best practices that will sharpen your technical skills and inspire your sense of design.

Effective Typescript

book-image

TypeScript is a typed superset of JavaScript with the potential to solve many of the headaches for which JavaScript is famous. But TypeScript has a learning curve of its own, and understanding how to use it effectively can take time. This book guides you through 62 specific ways to improve your use of TypeScript

Unix and Linux System Administration Handbook

book-image

UNIX and Linux System Administration Handbook, Fifth Edition, is today definitive guide to installing, configuring, and maintaining any UNIX or Linux system, including systems that supply core Internet and cloud infrastructure. Updated for new distributions and cloud environments, this comprehensive guide covers best practices for every facet of system administration, including storage management, network design and administration, security, web hosting, automation, configuration management, performance analysis, virtualization, DNS, security, and the management of IT service organizations. The authors―world-class, hands-on technologists―offer indispensable new coverage of cloud platforms, the DevOps philosophy, continuous deployment, containerization, monitoring, and many other essential topics.Whatever your role in running systems and networks built on UNIX or Linux, this conversational, well-written ¿guide will improve your efficiency and help solve your knottiest problems.

Computer Organization and Design

book-image

Computer Organization and Design, Fifth Edition, is the latest update to the classic introduction to computer organization. The text now contains new examples and material highlighting the emergence of mobile computing and the cloud. It explores this generational change with updated content featuring tablet computers, cloud infrastructure, and the ARM (mobile computing devices) and x86 (cloud computing) architectures. The book uses a MIPS processor core to present the fundamentals of hardware technologies, assembly language, computer arithmetic, pipelining, memory hierarchies and I/Because an understanding of modern hardware is essential to achieving good performance and energy efficiency, this edition adds a new concrete example, Going Faster, used throughout the text to demonstrate extremely effective optimization techniques. There is also a new discussion of the Eight Great Ideas of computer architecture. Parallelism is examined in depth with examples and content highlighting parallel hardware and software topics. The book features the Intel Core i7, ARM Cortex A8 and NVIDIA Fermi GPU as real world examples, along with a full set of updated and improved exercises.

Database Systems The Complete Book

book-image

Database Systems: The Complete Book is ideal for Database Systems and Database Design and Application courses offered at the junior, senior and graduate levels in Computer Science departments. A basic understanding of algebraic expressions and laws, logic, basic data structure, OOP concepts, and programming environments is implied. Written by well-known computer scientists, this introduction to database systems offers a comprehensive approach, focusing on database design, database use, and implementation of database applications and database management systems.