Over the past several years, organizations have had to move quickly to deploy new data and cloud technologies alongside their legacy infrastructure to drive innovations in IoT (Internet-of-Things), such as real-time telemetry, predictive maintenance, and fully automated controls.
The IoT landscape is evolving rapidly, and both Operations and IT leaders are finding new ways to monetize their IoT data and use it to drive faster, more informed decision making. Amazon Web Services (AWS) continues to expand its lead in IoT, offering an unmatched combination of scalability, end-to-end security that protects data in transit and at rest, affordability, and a continuous innovation cycle, with more than 175 (and counting) services that provide solutions for almost any use case.
For organizations to build a competitive edge with IoT, they will need a new approach to defining, implementing, and integrating their data pipelines. By leveraging AWS and four key IoT concepts, forward-leaning organizations can build and deploy modern IoT data pipelines securely, rapidly, and cost-effectively.
Serverless is the native architecture of the cloud that allows you to build and run your IoT data pipeline without thinking about infrastructure. It eliminates infrastructure management tasks such as server or cluster provisioning, patching, operating system maintenance, and capacity provisioning. Everything required to execute and scale your data pipeline with high availability and security is handled for you.
Serverless solutions like AWS Lambda and Amazon Athena serve as the heart of your IoT data pipeline by instantaneously executing logic on incoming data, such as decryption and normalization, and routing data to the appropriate destination (such as your data warehouse). The key to serverless computing is massively parallel processing power provided by AWS that seamlessly scales up and down to meet demand without ever bogging down and without manual infrastructure management.
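As a concrete illustration, a Lambda function that normalizes incoming device records might look like the sketch below. The record layout, field names, and the Fahrenheit-to-Celsius normalization are invented for the example; real payloads depend on your devices and event source.

```python
import base64
import json

def handler(event, context):
    """Illustrative Lambda handler: decode and normalize a batch of device
    records before routing them onward. Field names are assumptions."""
    normalized = []
    for record in event.get("records", []):
        # Records are assumed to arrive base64-encoded, as with Kinesis sources.
        raw = json.loads(base64.b64decode(record["data"]))
        normalized.append({
            "device_id": raw["deviceId"],
            "ts": raw["timestamp"],
            # Normalize temperature readings to Celsius regardless of source unit.
            "temp_c": (raw["temp"] - 32) * 5 / 9 if raw.get("unit") == "F" else raw["temp"],
        })
    return {"records": normalized}
```

Because the function holds no state and manages no servers, AWS can run as many copies in parallel as the incoming stream requires.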
If Serverless computing is the heart of a modern IoT data pipeline, streaming data is its bloodstream. Streaming data is generated continuously by thousands, or even billions, of connected devices and sent to the cloud in a constant stream of information.
A hallmark of streaming data pipelines is the use of lightweight communication protocols, such as MQTT, that are ideal for situations where bandwidth is limited (such as where IoT devices communicate with the cloud over cellular networks) and processing speed is essential (such as real-time analytics applications where fresh data is crucial).
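To make the "lightweight" point concrete, here is a sketch of how a device might build a compact MQTT telemetry message. The topic convention and the short JSON keys are assumptions for illustration; MQTT itself imposes no payload schema.

```python
import json

def build_message(site: str, device_id: str, ts: int, reading: float):
    """Build a (topic, payload) pair for an MQTT publish.
    Hypothetical topic convention: <site>/sensor/<device-id>/telemetry."""
    topic = f"{site}/sensor/{device_id}/telemetry"
    # Short keys and compact separators shave bytes from every publish,
    # which matters on bandwidth-constrained cellular links.
    payload = json.dumps({"t": ts, "v": reading}, separators=(",", ":")).encode()
    return topic, payload
```

An actual device would hand this pair to an MQTT client library for publishing; the framing here shows only the message construction.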
Streaming data includes a wide variety of data such as log files generated by customers using mobile or web applications, information from social networks, streaming video, and telemetry from connected devices or instrumentation in data centers.
The type of data is unimportant; what matters is that it is produced at high volume and sent to the cloud as soon as the IoT device records it.
Real-time analytics refers to analytics that can be accessed in near real-time, usually defined as within about 1 minute of being generated by an IoT device. Real-time analytics dramatically improves time-to-value for organizations seeking to leverage their data by making insights immediately available to decision-makers and enabling automated logic that can be executed and sent back to devices to change their behavior without human intervention (for example, hundreds of on-ramp metering lights automatically adjusting their timing based on live traffic flow data).
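The metering-light example can be sketched as a small control function that maps live traffic flow to a signal interval. The thresholds and the linear interpolation are invented for illustration; a real deployment would tune these against traffic-engineering models.

```python
def metering_interval(vehicles_per_min: float,
                      min_interval: float = 4.0,
                      max_interval: float = 15.0) -> float:
    """Illustrative only: map a live traffic-flow reading to a ramp-meter
    release interval in seconds. Thresholds are assumptions for the sketch."""
    if vehicles_per_min <= 10:
        return min_interval          # light traffic: release cars quickly
    if vehicles_per_min >= 40:
        return max_interval          # heavy traffic: hold cars longer
    # Linear interpolation between the light- and heavy-traffic thresholds.
    frac = (vehicles_per_min - 10) / 30
    return min_interval + frac * (max_interval - min_interval)
```

In a real-time pipeline, logic like this would run continuously against the incoming stream, with the computed interval sent back to each meter without human intervention.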
The concept of real-time analytics does not replace the standard data-based decision making, based on historical information, that most organizations already employ. Instead, it is about enhancing those decisions by continuously integrating new data into your analytic workflow and having that data ready to drive insights as soon as it arrives.
Data warehouses such as Amazon Redshift enable fast, complex queries across structured historical data. The data structure is defined in advance to optimize for fast queries, with results typically used for reporting and for historical and real-time analysis. Data must be cleaned, enriched, and transformed in an ETL process before reaching the data warehouse, where it acts as a “single source of truth” that users can trust.
Data lakes such as Amazon S3 are different because they store all your data, both relational data from line-of-business applications and non-relational data such as streaming video generated by a connected camera, in a minimally processed format. The structure of the data is not defined when it is captured. This means you can store all of your data without careful design or the need to know in advance what questions you might want answered.
Federated data storage accelerates return on investment by combining a data warehouse and a data lake to federate relational and non-relational data stores into a single, cohesive architecture. This enables new practices that complement the core data warehouse without replacing it, because a data warehouse remains the right platform for the standardized data used for real-time analytics, BI reports, and dashboards. In contrast, the data lake supports newer use cases, such as big-data analytics, full-text search, and machine learning.
Putting it all together: Architecture of a modern IoT data pipeline
At a basic level, IoT data pipelines are composed of three layers: data ingestion, data processing, and data analytics. Data ingestion is where device data is generated and brought into the cloud. The data processing layer is where your IoT data is acted upon to take it from something raw and turn it into something usable. The analytics layer is where data is visualized and made available to decision-makers, and where the real value of an IoT data pipeline is realized.
Data ingestion relies upon the interplay of connected devices and AWS IoT Core. IoT Core allows you to connect and manage billions of IoT devices and is the gateway to the cloud for IoT device data. Amazon provides IoT device SDKs that support the major programming languages. This ensures that whatever legacy technology you use today, and whatever technologies you adopt in the future, your devices will always be able to interact seamlessly with the AWS cloud through IoT Core.
The data processing layer is where raw IoT data is transformed into something that can be used by your analytics, AI, or machine-learning applications, and secured while in motion and at rest. Serverless solutions like AWS Lambda remove the infrastructure management component of data processing. This decreases overall costs and enables developers to focus on creating logic, not managing infrastructure. The processing layer is also responsible for routing data to the right storage location. For data pipelines that take advantage of a federated data storage architecture, structured data is sent to an Amazon Redshift data warehouse, and unstructured data is sent to an Amazon S3 data lake.
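The routing decision described above can be sketched as a simple predicate on each record. The record shape, the schema check, and the destination labels are all assumptions for illustration; in practice the rule would be driven by your warehouse schema.

```python
def route(record: dict) -> str:
    """Sketch of a federated-storage routing rule: records that match the
    warehouse schema go to the data warehouse; everything else (images,
    free-form logs, video fragments) lands in the data lake.
    Field names and destination labels are assumptions."""
    REQUIRED = {"device_id", "ts", "temp_c"}
    if record.get("kind") == "telemetry" and REQUIRED <= record.keys():
        return "redshift-warehouse"
    return "s3-data-lake"
```

Keeping this rule in the serverless processing layer means new record types flow to the lake automatically, without any schema change blocking ingestion.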
The final layer of the data pipeline is the analytics layer, where data is translated into value. AWS services such as Amazon QuickSight and Amazon SageMaker are available as low-cost and quick-to-deploy analytic options, well suited to organizations with a relatively small number of expert users who need to access the same data and visualizations repeatedly. More advanced use cases, such as organization-wide dashboards that need to be accessed by a wide range of users in support of self-service analytics, typically require a standalone analytics application such as the Qlik Data Analytics Platform (or Tableau, Power BI, or open-source Kibana).
Building a game-changing IoT data pipeline in AWS requires a clear strategic vision, a deep understanding of your unique business processes, and far-reaching insight into AWS's ecosystem of services and how they fit together. IPC Global works hand in hand with our clients, combining your organizational knowledge with our AWS subject-matter expertise to architect and deploy data architectures and processes that drive business results.
Those IT and Operations leaders who embrace these four concepts (serverless computing, streaming data, real-time analytics, and federated data storage) will be better positioned to build a competitive edge with IoT, maximize the value of their data, and effectively use data to improve decision-making across their entire organization. To discuss your next game-changing IoT project, get in touch with an IPC Global representative and benefit from our 20 years of experience and hundreds of successful AWS deployments to make your vision a reality.