
Industrial data analytics – Working with Data and Analytics

We have seen the usage of data analytics in the past two sections and how it can be beneficial for our workloads. Now, let’s look at how it can benefit industry cases and how we can accordingly evaluate our deployments based on the best practices that are set out for us.

Evaluating performance

Use services such as Amazon CloudWatch metrics to monitor the performance of the IoT Analytics pipeline, including the number of messages processed, the time it takes to process each message, and the number of errors encountered. This will be critical for further analysis and eventual optimization. The following are factors to consider when evaluating performance (a brief monitoring sketch follows this list):

Analyze your data: We can use IoT Analytics SQL or other data analytics tools to identify any patterns or issues that we may need to address if they affect system performance.

Optimize your pipeline: From the analysis of the data, we can optimize the pipeline by adding data normalization, validation, and modeling to improve the performance of the data analytics workloads.

Use best practices: We need to adhere to best practices for data analysis, which includes techniques such as normalization, data validation, and data modeling. For the scope of this book, we will not be covering this, but we encourage you to look up more of these techniques in the Further reading section and read up on the topics listed there.

Use third-party monitoring tools: We can utilize third-party monitoring tools to collect and analyze performance metrics for our analytics workloads and gain more insight into how our pipeline is performing.

Monitor and track usage of resources: We need to keep an eye on resources such as CPU, memory, and storage that are used by our data analytics workloads, especially if they are consuming more resources than expected. If necessary, we should perform actions such as scaling our workloads up or optimizing the pipelines further.
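As a minimal sketch of the kind of CloudWatch-based monitoring described above, the following Python snippet pulls hourly message counts for an IoT Analytics channel with boto3. The namespace, metric, dimension, and channel names used here are assumptions for illustration; verify the metrics that your own deployment actually publishes in the CloudWatch console before relying on them.

import boto3
from datetime import datetime, timedelta

# Hypothetical example: hourly message counts for an IoT Analytics channel.
# The namespace, metric, and dimension names below are assumptions -- check
# the CloudWatch console for the metrics your deployment really emits.
cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

response = cloudwatch.get_metric_statistics(
    Namespace='AWS/IoTAnalytics',          # assumed namespace
    MetricName='IncomingMessages',         # assumed metric name
    Dimensions=[{'Name': 'ChannelName', 'Value': 'mychannel'}],
    StartTime=datetime.utcnow() - timedelta(hours=24),
    EndTime=datetime.utcnow(),
    Period=3600,                           # one data point per hour
    Statistics=['Sum'],
)

for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Sum'])

The same call pattern works for any other metric you decide to track, such as pipeline activity errors, so it can form the basis of a simple scheduled health check.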

Having understood how to keep track of performance, we can now review some different use cases of data analysis within industry.

Use cases within industry

Industry has many different use cases for performing data analysis on a myriad of data. Here are just a few prominent examples:

Predictive maintenance: Within this use case, IoT devices are used to collect real-time sensor data that is processed and analyzed using AWS IoT Analytics to detect patterns and accordingly predict when maintenance would be required. This will help organizations schedule maintenance at the required times, reducing downtime and improving the efficiency of equipment.

Smart agriculture: IoT sensors can be used to collect data on soil moisture and temperature, which is then analyzed within AWS IoT Analytics to optimize crop yields, reduce consumption of water, and improve overall farm efficiency.

Smart cities: IoT devices can be used to collect data on various aspects of urban infrastructure such as traffic, air quality, and energy usage. The data can then be analyzed through AWS IoT Analytics where it can then be used to improve traffic flow, reduce pollution, and optimize energy usage to ensure that cities become more sustainable and livable for their residents.

With those use cases in mind, we can now take a look at a case study of a data analytics flow used within a production environment in an industrial setting.

Practical – smart home insights with AWS IoT Analytics – Working with Data and Analytics (part 2)

Click on the target S3 bucket on the canvas and select the format as Parquet. Specify the new Amazon S3 bucket you created (for example, s3://your_bucket_name).

Go to the Job details tab and specify the IAM role you have been using so far. Leave everything else as it is. Rename the script filename to anything you want, as long as it ends with .py (for example, test.py):

Figure 10.3 – Configuring job details for the Glue job

Click Save, and afterward, click Run.

With that, we have appropriately transformed the data as needed.
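If you would rather start and monitor the job from code instead of the console, the following is a small sketch using boto3; the job name is a placeholder for whatever you named the visual ETL job above.

import boto3
import time

glue = boto3.client('glue', region_name='us-east-1')

JOB_NAME = 'your-transform-job'   # placeholder: the name you gave the visual ETL job

# Start the job and poll until it reaches a terminal state
run_id = glue.start_job_run(JobName=JOB_NAME)['JobRunId']
while True:
    state = glue.get_job_run(JobName=JOB_NAME, RunId=run_id)['JobRun']['JobRunState']
    print('Job state:', state)
    if state in ('SUCCEEDED', 'FAILED', 'STOPPED', 'TIMEOUT', 'ERROR'):
        break
    time.sleep(30)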

Use Amazon Athena to query the transformed data.

We can now look at leveraging Amazon Athena to query the data that we have transformed:

  1. Navigate to the Amazon Athena service.
  2. On the sidebar, click on Query editor.
  3. There should be a prompt asking you to select an output location for your queries. Specify an S3 bucket or a folder within a bucket to do so.
  4. In the Athena dashboard, select AWSDataCatalog as the data source and SmartHomeData as the database (or your predefined values for them).
  5. Run the following query by clicking Run:


SELECT * FROM mychannelbucket
  6. You should get the full table that you created before. Now, use SQL queries to answer the following questions (an example query sketch follows this list):
     1. What is the average temperature, humidity, and light intensity for each day of the month?
     2. What is the average temperature, humidity, and light intensity for each hour of the day?
     3. What is the average temperature, humidity, and light intensity for each day of the week?
     4. What is the correlation between temperature and humidity and between temperature and light intensity?
  7. View the query results and save them to a new S3 bucket.
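As an illustration of how one of these questions could be answered, here is a hedged sketch that runs an hourly-average query through the Athena API with boto3. The database, table, and column names, the cast on the date column, and the output location are assumptions based on the earlier steps and the occupancy dataset, so adjust them to whatever your crawler actually produced; you can equally paste the SQL string straight into the query editor.

import boto3

# Assumed names -- replace with the database/table your crawler created
DATABASE = 'smarthomedata'
TABLE = 'mychannelbucket'
OUTPUT = 's3://your-query-results-bucket/athena/'

# Average temperature, humidity, and light intensity for each hour of the day.
# The CAST is only needed if the crawler classified the date column as a string.
SQL = f"""
SELECT hour(CAST("date" AS timestamp)) AS hour_of_day,
       avg(temperature) AS avg_temperature,
       avg(humidity)    AS avg_humidity,
       avg(light)       AS avg_light
FROM {TABLE}
GROUP BY 1
ORDER BY 1
"""

athena = boto3.client('athena', region_name='us-east-1')
run = athena.start_query_execution(
    QueryString=SQL,
    QueryExecutionContext={'Database': DATABASE},
    ResultConfiguration={'OutputLocation': OUTPUT},
)
print('Started Athena query:', run['QueryExecutionId'])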
In this practical exercise, we explored IoT data analytics using AWS services such as S3, Glue, and Athena. We loaded a dataset of IoT sensor readings into an S3 bucket, used Glue to transform the data and create a new table with additional columns, used Athena to query the transformed data and generate insights, and used QuickSight to visualize the insights and create a dashboard. Based on the insights generated, we provided recommendations for improving the smart home experience.

We will now move on to industrial data analytics.

Practical – smart home insights with AWS IoT Analytics – Working with Data and Analytics (part 1)

In this practical exercise, we will explore IoT data analytics using AWS. Specifically, we will use AWS services such as S3, Glue, Athena, and QuickSight to analyze a dataset of IoT sensor readings collected from a smart home over a period of 1 month.

You will need the following software components as part of the practical:

An AWS account (you can create one for free if you don’t have one already)

A dataset of IoT sensor readings (you can create a sample dataset or use a publicly available dataset)

Let’s move to the various steps of the practical, as follows:

Download the occupancy detection dataset:

  1. We can obtain a dataset from https://github.com/PacktPublishing/IoT-Made-Easy-for-Beginners/tree/main/Chapter10/analyzing_smart_home_sensor_readings/datatest.csv.
  2. Open the dataset and take note of the fields inside it.

To start off, we will have to load our dataset into an Amazon S3 bucket (a scripted sketch of these steps follows the list):

  1. Sign in to your AWS Management Console.
  2. Navigate to the Amazon S3 service.
  3. Click on the Create bucket button. Name the bucket and choose a region. Click Next.
  4. Keep all the default settings in the Configure options page and click Next.
  5. Ensure public access is blocked for security reasons and click Next.
  6. Review your settings and click Create bucket.
  7. Navigate inside your newly created bucket, click on Upload, and drag and drop (or browse to) your datatest.csv file. Once uploaded, click Next.
  8. Keep the default permissions and click Next.
  9. Review the properties and click Upload.
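If you prefer to script the bucket creation and upload rather than use the console, a minimal boto3 sketch could look like the following; the bucket name and region are placeholders that you would replace with your own values.

import boto3

BUCKET = 'your-smart-home-bucket'   # placeholder: bucket names must be globally unique
REGION = 'us-east-1'                # placeholder region

s3 = boto3.client('s3', region_name=REGION)

# Create the bucket (us-east-1 does not accept a LocationConstraint)
if REGION == 'us-east-1':
    s3.create_bucket(Bucket=BUCKET)
else:
    s3.create_bucket(Bucket=BUCKET,
                     CreateBucketConfiguration={'LocationConstraint': REGION})

# Block all public access, mirroring the console setting used above
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        'BlockPublicAcls': True,
        'IgnorePublicAcls': True,
        'BlockPublicPolicy': True,
        'RestrictPublicBuckets': True,
    },
)

# Upload the dataset file from the current directory
s3.upload_file('datatest.csv', BUCKET, 'datatest.csv')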

We will now create an AWS Glue crawler to traverse our data and create a table in the AWS Glue Data Catalog (an API-based sketch follows these steps):

  1. Navigate to the AWS Glue service.
  2. Click on the Crawlers tab under Data Catalog and then click Create crawler.
  3. Name your crawler and click Next.
  4. Select Not yet for the question Is your data already mapped to Glue tables?
  5. Click on Add a data source and choose S3 as the data source. Click Browse S3 and select the bucket you have just created. Click Next.
  6. Choose or create an Identity and Access Management (IAM) role that gives AWS Glue permissions to access your S3 data. Click Next.
  7. For the frequency, you can choose Run on demand. Click Next.
  8. Choose Add database, then name your database (for example, SmartHomeData). Navigate to your newly created database and click on Add table. Name your table (for example, SensorReadings) and select your database. Leave all other settings as they are. Click Next through the subsequent windows until you reach the final one, where you click Create to create the table.
  9. Review the configuration and click Create crawler.
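For reference, the same crawler can also be created programmatically. The sketch below assumes an existing IAM role for Glue as well as the bucket and database names used above; treat every specific name and ARN as a placeholder for your own values.

import boto3

glue = boto3.client('glue', region_name='us-east-1')

# Create the database the crawler will write tables into (skip if it already exists)
glue.create_database(DatabaseInput={'Name': 'smarthomedata'})

# Create an on-demand crawler pointed at the dataset bucket.
# The role ARN and S3 path are placeholders for your own values.
glue.create_crawler(
    Name='smart-home-crawler',
    Role='arn:aws:iam::123456789012:role/YourGlueServiceRole',
    DatabaseName='smarthomedata',
    Targets={'S3Targets': [{'Path': 's3://your-smart-home-bucket/'}]},
)

# Run it once, on demand
glue.start_crawler(Name='smart-home-crawler')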

With that, we have created an AWS Glue crawler to traverse our data. Now, we can look at transforming our data:

Use AWS Glue to transform the data and create a new table with additional columns:

  1. Navigate to ETL Jobs in the AWS Glue sidebar.
  2. Select Visual with a blank canvas and click on Create.
  3. Name your job on the top left and select or create an IAM role that has the right permissions.
  4. An Add nodes window should pop up. In the Sources tab, click on Amazon S3 to add an Amazon S3 node. Afterward, click on the Transforms tab and click on the Select Fields node. Finally, click on Target and click on Amazon S3.
  5. You should now have three nodes on your canvas. Connect the data source to the Transform – SelectFields node by dragging the black dot at the bottom of the Data source – S3 bucket node to the Select Fields node. Do the same to connect the Select Fields node to the Data target – S3 bucket node:

Figure 10.2 – Visualization of the three nodes on the canvas

Click on the Data Source – S3 bucket node. For the S3 source type, click on the Data Catalog table. Afterward, choose the database that you created. Choose the table that was created.

Afterward, click on Select Fields. Here, choose the Temperature and Humidity fields.

We now need to create another S3 bucket for the output. Create a new S3 bucket with whatever name you want for it.

Prometheus – Working with Data and Analytics

Prometheus is an open source monitoring and alerting system that is designed for data collection and analysis. It is well suited to monitoring and analyzing large fleets of servers and other types of infrastructure. It uses a pull-based model, which means that it periodically scrapes metrics from predefined endpoints. This allows data to be collected accurately and efficiently and allows Prometheus to scale horizontally to accommodate a large number of servers.

Its data model is based on the concept of metrics and labels, allowing for powerful querying and aggregation of data. It also includes a built-in alerting system, allowing alerts to be created based on queries that are made on it. This allows for notifications to be made automatically. It can also be used in conjunction with other data visualization tools such as Grafana to create interactive dashboards that can provide real-time insights into the performance of systems and infrastructure. This is especially useful in the context of IoT deployments, where it is critical to identify and troubleshoot issues.
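To make the pull-based model concrete, here is a minimal sketch using the prometheus_client Python library: it exposes a /metrics endpoint that a Prometheus server could be configured to scrape. The metric name, port, and simulated reading are arbitrary choices for illustration.

from prometheus_client import Gauge, start_http_server
import random
import time

# A gauge holding the latest temperature reading from a (simulated) sensor
TEMPERATURE = Gauge('iot_sensor_temperature_celsius',
                    'Latest temperature reading from the IoT sensor')

if __name__ == '__main__':
    # Expose metrics on http://localhost:8000/metrics for Prometheus to scrape
    start_http_server(8000)
    while True:
        # In a real deployment this would read from the device instead
        TEMPERATURE.set(20 + random.random() * 5)
        time.sleep(15)

A Prometheus server would then be pointed at this endpoint in its scrape configuration, and the resulting series could be graphed in Grafana as described above.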

Designing AWS flow diagrams with data analysis

It is important to understand the steps that have to be taken to design data analysis workloads before going forward with the implementation. The following is a seven-step process for designing these data analysis workloads:

Identify your data sources: You will need to start by identifying the data sources that you will need to collect and analyze. This may include data from your IoT devices, sensor data, log files, and other data sources that may be relevant.

Determine your data storage needs: Decide on what type of data storage you will need to store the data that you have collected from your IoT devices. Services such as S3, DynamoDB, and Kinesis Data Streams can be used for this purpose.

Design a data processing pipeline: Determine how your data will be processed, cleaned, and transformed. You can utilize services such as AWS Data Pipeline or AWS Lambda for this (see the sketch after this list).

Choose the data analysis and visualization tools that you will need: Select appropriate data analysis and visualization tools that best fit your use case. You can use tools such as Amazon QuickSight, AWS IoT Analytics, and Amazon OpenSearch Service (formerly Amazon Elasticsearch Service).

Create a data security and compliance plan: You will then need to design a security and compliance plan to protect your data and ensure that you adhere to relevant regulatory requirements. This may include steps such as data encryption and access controls.

Test and optimize your deployment: You will then need to test the design by running a small pilot and optimizing it accordingly based on the results that you receive. You must then continuously monitor the performance and make any necessary adjustments accordingly.

Deploy and maintain: Finally, you will need to deploy the design within a production environment, ensuring that you continuously monitor and maintain it to prevent any errors and ensure it runs smoothly. This is why monitoring tools such as Amazon CloudWatch are imperative for this use case, as errors in our environment can happen anytime, and we want to be ready to make any adjustments autonomously when possible.
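For the processing step mentioned earlier, an AWS Lambda-based transformation can be as small as a handler that cleans and reshapes each incoming record. The following is a minimal sketch, assuming the event payload is a JSON sensor reading similar to the ones used elsewhere in this chapter; the field names are illustrative.

import json

def lambda_handler(event, context):
    """Normalize a raw sensor reading before it is stored downstream.

    Assumes the event carries fields such as 'temperature' and 'humidity';
    adapt the field names to your own payloads.
    """
    reading = event if isinstance(event, dict) else json.loads(event)

    humidity = float(reading.get('humidity', 0))
    transformed = {
        'temperature_c': round(float(reading.get('temperature', 0)), 2),
        'humidity_pct': round(humidity, 2),
        # Simple validation flag that downstream analytics can filter on
        'valid': 0 <= humidity <= 100,
    }
    return transformed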

It is important to note that this is not an exhaustive list; there can certainly be more steps based on each user’s use case. However, this already encompasses most data workloads and certainly provides a guideline for how you choose to design your own flows moving forward.

Next, we will look at a practical exercise where we will create and design a data pipeline for end-to-end data ingestion and analysis, based on the components of AWS IoT Analytics.

Amazon QuickSight – Working with Data and Analytics

Amazon QuickSight is a business intelligence (BI) tool that allows you to easily create, analyze, and visualize data. You can connect to various data sources such as databases and cloud storage, and accordingly create interactive dashboards and reports to gain insights based on the data. This way, you can quickly identify patterns and trends and understand your data to make data-driven decisions. You can also integrate it with other AWS services such as IoT Analytics for more powerful data analysis.

Amazon S3

Amazon S3 is a cloud storage service that allows you to store and retrieve large amounts of data, including photos, files, videos, and more. You can integrate it with other AWS services to create powerful data management and analytics solutions, and it is affordable while providing scalability as your data storage needs grow.

Amazon CloudWatch

Amazon CloudWatch is a service that allows you to monitor and manage your resources and applications that are based on AWS. You can collect and track metrics, monitor log files, and set alarms that trigger certain actions based on your resources to save you the time of manually doing so. You can also use it to monitor the health of your applications and receive notifications if there are any issues.
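As a sketch of the alarm behavior described above, the following boto3 call creates a hypothetical alarm that fires when an IoT publish metric drops below a threshold and notifies an SNS topic. The metric, dimensions, threshold, and topic ARN are assumptions to adapt to your own environment.

import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Hypothetical alarm: alert if no messages were published in a 5-minute window.
# The metric name, dimension, and SNS topic ARN are placeholders to adapt.
cloudwatch.put_metric_alarm(
    AlarmName='iot-no-messages-published',
    Namespace='AWS/IoT',
    MetricName='PublishIn.Success',
    Dimensions=[{'Name': 'Protocol', 'Value': 'MQTT'}],
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator='LessThanThreshold',
    TreatMissingData='breaching',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:iot-alerts'],
)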

Amazon SNS

Amazon Simple Notification Service (SNS) is a messaging service that allows applications to send messages and notifications to a large number of recipients with just a few clicks. It is widely used for sending notifications, updates, and alerts to users, customers, or other systems. These notifications can be delivered via text, email, or other services that you have on AWS.
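A minimal publish call with boto3 looks like the following sketch; the topic ARN is a placeholder for a topic you would create and subscribe an endpoint (for example, an email address) to first.

import boto3

sns = boto3.client('sns', region_name='us-east-1')

# Placeholder topic ARN -- create the topic and subscribe an endpoint beforehand
sns.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:iot-alerts',
    Subject='IoT pipeline alert',
    Message='The analytics pipeline reported an unusually high error rate.',
)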

Now that we understand the various services that can be used as part of our data analysis workloads, let’s start looking at third-party services and create data workflows within the cloud that utilize the services that we have discussed in this section.

Analysis on the cloud and outside

When working with data, it is often necessary to visualize it to gain insights and make informed decisions. With services such as Amazon QuickSight, you are able to do this and create interactive dashboards and reports. However, some organizations’ requirements may necessitate the use of third-party services alongside AWS’ native tools.

Many third-party services can be used. In this section, we will discuss some of these third-party services, alongside how we can start architecting workloads within the cloud for data workflows and quickly ensure that they adhere to best practices, both when creating and evaluating them.

Third-party data services

In this section, we will talk about two different third-party data analytics and visualization services: Datadog and Prometheus.

Datadog

Datadog is a cloud-based monitoring and analytics platform that provides many features for monitoring and troubleshooting an organization’s IT infrastructure and applications. It allows for real-time data collection and monitoring from various sources, including servers, containers, and cloud services. Key features include cloud infrastructure monitoring, application performance monitoring, log management, trace analysis, and integration with many services such as AWS, Kubernetes, and Jira, allowing users to collect and analyze data in one location.

AWS IoT Analytics – Working with Data and Analytics

AWS IoT Analytics is a service that is used to collect, process, and analyze data that is obtained from IoT devices. You can process and analyze large datasets from IoT devices with the help of IoT Analytics without the need for complex infrastructure or programming. You can apply mathematical and statistical models to your data to make sense of it and make better decisions accordingly. You can also integrate it with many other services from AWS, such as Amazon S3 or Amazon QuickSight, to perform further analytical and visualization workloads.

The following are the components of IoT Analytics that are crucial for you to know and that we will be using as we go through the exercise in the next subsection (a brief API sketch follows these descriptions):

Channel: A channel is used to collect data from a selected Message Queuing Telemetry Transport (MQTT) topic and archive unprocessed messages before the data is published to the pipeline. You can either have the channel consume messages from an MQTT topic or send messages to the channel directly through the BatchPutMessage API. Unprocessed messages are stored within an S3 bucket that is managed either by you or by AWS IoT Analytics.

Pipeline: Pipelines consume messages that come from a channel and allow you to process the messages before then storing them within a data store. The pipeline activities then perform the necessary transformations on the messages that you have, such as renaming, adding message attributes, or filtering messages based on attribute values.

Data store: Pipelines then store the processed messages within a data store, which is a queryable repository of messages. It is important to note that a data store is not a database; it is more like a temporary repository. Multiple data stores can be provisioned for messages that come from different devices or locations, or you can have messages filtered by their attributes, depending on how you configure your pipeline and its requirements. The data store’s processed messages will also be stored within an S3 bucket that can be managed either by you or by AWS IoT Analytics.

Dataset: Data is retrieved from a data store and made into a dataset. IoT Analytics allows you to create a SQL dataset or a container dataset. You can further explore insights in your dataset through integration with Amazon QuickSight or Jupyter Notebook. Jupyter Notebook is an open source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text, and is often used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, and ML. You can also send the contents of a dataset to an S3 bucket, enabling integration with existing data lakes or in-house applications that you may have, in order to perform further analysis and visualization. You can also send the contents to AWS IoT Events to trigger certain actions if there are failures or changes in operation.

SQL dataset: An SQL dataset is analogous to a materialized view in an SQL database. You can create an SQL dataset by applying an SQL action.

Trigger: A trigger is a component you can specify to create a dataset automatically. It can be a time interval or based on when the content of another dataset has been created.
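To tie these components together, here is a hedged boto3 sketch that creates a channel, a data store, a pipeline connecting the two, and a SQL dataset, and then pushes a single test message into the channel. The resource names are arbitrary placeholders, and the pipeline shown is the minimal channel-to-data-store form with no intermediate activities.

import boto3
import json

iota = boto3.client('iotanalytics', region_name='us-east-1')

# Names below are placeholders for illustration
iota.create_channel(channelName='mychannel')
iota.create_datastore(datastoreName='mydatastore')

# A minimal pipeline: read from the channel, write straight to the data store
iota.create_pipeline(
    pipelineName='mypipeline',
    pipelineActivities=[
        {'channel': {'name': 'ReadFromChannel',
                     'channelName': 'mychannel',
                     'next': 'WriteToDatastore'}},
        {'datastore': {'name': 'WriteToDatastore',
                       'datastoreName': 'mydatastore'}},
    ],
)

# A SQL dataset that materializes everything currently in the data store
iota.create_dataset(
    datasetName='mydataset',
    actions=[{'actionName': 'sqlAction',
              'queryAction': {'sqlQuery': 'SELECT * FROM mydatastore'}}],
)

# Send one test message into the channel via the BatchPutMessage API
iota.batch_put_message(
    channelName='mychannel',
    messages=[{'messageId': 'msg-1',
               'payload': json.dumps({'temperature': 22.5, 'humidity': 40}).encode()}],
)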

With an understanding of these components, we can look at other services that we will also come across in our practical exercises.

Introduction to data analysis at scale – Working with Data and Analytics

Data analysis is often done at scale to analyze large sets of data using the capabilities of cloud computing services such as AWS. Designing a workflow for the data analysis to follow is the pivotal starting point for this to be performed. This will follow five main categories: collection, storage, processing, visualization, and data security.

In this section, we will be introducing you to data analysis on AWS, discussing which services we can use as part of AWS to perform the data analytics workloads we need it to, and walking through the best practices that are part of this. We will understand how to design and incorporate workflows into the IoT network that we currently have and work with it to better power our capabilities.

Data analysis on AWS

Data analysis on AWS can be summarized in five main steps. These steps can be seen in the following diagram:

Figure 10.1 – Data analysis workflow on AWS

Let’s look at the steps in more detail:

Collect: In this phase, data is collected from the devices within the environment. Services that are usually in charge of this include AWS IoT Core and AWS IoT Greengrass, which collect the data and ingest it into the cloud.

Process: Data can then be processed according to how the configuration is set up for it. Services such as AWS IoT Analytics are made for this purpose.

Store: Data can then be stored, either temporarily or for long-term storage. This can be done on services such as Amazon Simple Storage Service (S3), Amazon Redshift, and Amazon DocumentDB.

Analyze: Data will then be analyzed. Services such as AWS Glue and Amazon Elastic MapReduce (EMR) can be used for this purpose, while also potentially performing more complex analytics and ML tasks as necessary.

Build: We can then build datasets from this data, identifying patterns in the processed data produced by the workloads that are run.

With that, we have understood the different steps of how a typical data analysis workflow would go at a high level. Now, we can look at the different services in AWS that help facilitate this.

AWS services

Several important services can be used for data processing workloads. These five services are just a few of them, and there are definitely more that can be mentioned and that we encourage you to have a look at. For more information on this, you can refer to the documentation that is linked in the Further reading section at the end of the chapter.

Technical requirements – Working with Data and Analytics

Managing data and performing analytics on it is a crucial aspect of any Internet of Things (IoT) deployment. It allows you to gain insights from the large amounts of data generated by IoT devices and make appropriate, data-driven decisions to improve operations, increase efficiency, and reduce costs. With Amazon Web Services (AWS) and other cloud providers, there are a variety of services that you can use to analyze and visualize the data obtained from your IoT devices, ranging from simple data storage and retrieval options that you can configure without much difficulty to more complex analytics and machine learning (ML) tools that you may have to learn and fine-tune as part of the analysis.

Often, data analytics is the piece of the puzzle that completes the picture we are trying to architect with our IoT networks; even with edge networks, in which we process data on the edge nodes to reduce costs, there is almost always further processing and storage that we want to perform once the data reaches the cloud. We want to do so while still optimizing based on the options that we have within AWS and looking further into how we can adhere to best practices within AWS’ Well-Architected Framework to make the best use of our resources. The link to the framework can be found at the end of the chapter.

In this chapter, we’re going to cover the following main topics:

Introduction to data analysis at scale

Analysis on the cloud and outside

Practical – smart home insights with AWS IoT Analytics

Industrial data analytics

Practical – creating a data pipeline for end-to-end data ingestion and analysis

Technical requirements

This chapter will require you to have the following software installed:

Arduino IDE

AWS account

We will run our programs in Python, and we will also use a bit of Structured Query Language (SQL), a standardized programming language used for managing and manipulating relational databases, to query data in this chapter. Again, don’t worry if you don’t understand some of the code; we will walk you through it and get you to understand how each part of it works in no time.

You can access the GitHub folder for the code that is used in this chapter at https://github.com/PacktPublishing/IoT-Made-Easy-for-Beginners/tree/main/Chapter10.

Monitoring the EC2 Thing when publishing messages – Operating and Monitoring IoT Networks

Now, we can start monitoring how the Thing is doing in publishing messages through Amazon CloudWatch:

Navigate to Services, search for CloudWatch, and click on it.

Click on All Metrics under the Metrics menu in the left pane.

Navigate to IoT -> Protocol Metrics and click on the checkbox for the PublishIn.Success metric. You will see the metrics that have been published successfully reflected on the graph shown on the page.

Hence, you’ve created your first Greengrass solution with monitoring built on top of it!

Creating an AWS IoT Greengrass group for edge computing is a useful exercise to test and validate different edge computing scenarios. By using Greengrass core components such as Lambda functions, connectors, and machine learning models, you can gain practical experience in developing and deploying edge computing solutions that process and analyze IoT data locally, without the need for cloud connectivity. You can also use the AWS IoT Greengrass dashboard to monitor and manage the Greengrass group and its components, set up alerts and notifications, and troubleshoot issues as they arise.

Now, upload the code to GitHub and see whether you can also answer the following questions, based on your hardware and code, for further understanding and practice of the concepts that you have learned through this practical:

Can you also try to connect the data to Prometheus?

Can you recreate a similar setup but with EC2s as the devices?

Important note

When working with different kinds of monitoring tools, concepts will often be similar between one program and the next. This is the reason why we ask you to try out different monitoring software on your own as well. Within industrial cases, you will also find that many types of monitoring tools are used, depending on the preferences of the firm and its use cases.

Summary

In this chapter, we explored the best practices for operating and monitoring IoT networks. We discussed the importance of continuous operation, setting KPIs and metrics for success, and monitoring capabilities both on-premises and in the cloud using AWS IoT services. We also looked at several practical exercises that can be used to gain hands-on experience in operating and monitoring IoT networks. These included simulating IoT networks using virtualization, developing AWS Lambda functions to process and analyze IoT data, creating AWS CloudWatch dashboards for IoT metrics, setting up AWS IoT Greengrass groups for edge computing, and using the AWS IoT simulator to test different operating and monitoring strategies.

By learning and applying these best practices and practical exercises, students can develop the skills and knowledge necessary to design, deploy, and manage robust and reliable IoT networks. They will gain experience in using AWS IoT services and tools to monitor and analyze IoT data, set up alerts and notifications, and troubleshoot issues as they arise. Ultimately, they will be well-equipped to meet the challenges of operating and monitoring IoT networks in a variety of real-world scenarios.

In the next chapter, we will be looking at working with data and analytics within IoT with services on AWS.

Further reading

For more information about what was covered in this chapter, please refer to the following links:

Learn more about data lakes and analytics relating to managing big data on AWS: https://aws.amazon.com/big-data/datalakes-and-analytics/

Understand more on how to use Grafana through its official documentation: https://grafana.com/docs/grafana/latest/

Explore further on AWS IoT Greengrass through its official documentation: https://docs.aws.amazon.com/greengrass/index.html

Learn more about different analytics-based deployments through AWS’ official whitepapers: https://docs.aws.amazon.com/whitepapers/latest/aws-overview/analytics.html

Learn more on different analytics solutions provided by AWS: https://aws.amazon.com/solutions/analytics/

Configure AWS Greengrass on Amazon EC2 – Operating and Monitoring IoT Networks

Now, we can set up AWS Greengrass on our Amazon EC2 to be able to simulate our IoT Thing, which will fetch the ChatGPT API along with its responses accordingly:

Run the following command to update the necessary dependencies:
$ sudo yum update

Run the following commands to install Python and pip, and then install the boto3 and openai Python packages (both are needed by the script we will write later):
$ sudo yum install python3 python3-pip
$ pip3 install boto3 openai

Now, we will install the AWS IoT Greengrass software with automatic provisioning. First, we will need to install the Java runtime as Amazon Corretto 11:
$ sudo dnf install java-11-amazon-corretto -y

Run this command afterward to verify that Java is installed successfully:

$ java -version

Establish the default system user and group that run components on the device. Optionally, you can delegate the task of creating this user and group to the AWS IoT Greengrass Core software installer during the installation process by utilizing the --component-default-user installer parameter. For additional details, refer to the section on installer arguments. The commands you need to run are as follows:
$ sudo useradd --system --create-home ggc_user
$ sudo groupadd --system ggc_group

Ensure that the user executing the AWS IoT Greengrass Core software, usually the root user, has the necessary privileges to execute sudo commands as any user and any group. Use the following command to access the /etc/sudoers file:
$ sudo visudo

Ensure that the user permission looks like the following:

root    ALL=(ALL:ALL) ALL

Now, you will need to provide the access key ID and secret access key for the IAM user in your AWS account to be used from the EC2 environment. Use the following commands to provide these credentials:
$ export AWS_ACCESS_KEY_ID={Insert your Access Key ID here}
$ export AWS_SECRET_ACCESS_KEY={Insert your secret access key here}

On your primary device, retrieve the AWS IoT Greengrass Core software and save it as a file named greengrass-nucleus-latest.zip:
$ curl -s https://d2s8p88vqu9w66.cloudfront.net/releases/greengrass-nucleus-latest.zip > greengrass-nucleus-latest.zip

Decompress the AWS IoT Greengrass Core software into a directory on your device. Substitute GreengrassInstaller with the name of your desired folder:
$ unzip greengrass-nucleus-latest.zip -d GreengrassInstaller && rm greengrass-nucleus-latest.zip

We can now install the AWS IoT Greengrass Core software. Replace the values as follows:

  1. /greengrass/v2 or C:\greengrass\v2: This location specifies where you plan to install the AWS IoT Greengrass Core software on your system, serving as the primary directory for the application.
  2. GreengrassInstaller: This term refers to the directory where you have unpacked the installation files for the AWS IoT Greengrass Core software.
  3. region: This is the specific geographical area within AWS where your resources will be provisioned and managed.
  4. MyGreengrassCore: This label is used to identify your Greengrass core device as a thing within AWS IoT. Should this thing not be present already, the installation process will generate it and retrieve the necessary certificates to establish its identity.
  5. MyGreengrassCoreGroup: This refers to the collective grouping of AWS IoT things that your Greengrass core device is part of. In the absence of this group, the installation process is designed to create it and enroll your thing within it. If the group is pre-existing and actively deploying, the core device will proceed to pull and initiate the deployment’s software.
  6. GreengrassV2IoTThingPolicy: This is the identifier for the AWS IoT policy that facilitates the interaction of Greengrass core devices with AWS IoT services. Lacking this policy, the installation will automatically generate one with comprehensive permissions under this name, which you can later restrict as needed.
  7. GreengrassV2TokenExchangeRole: This is the identifier for the IAM role that allows Greengrass core devices to secure temporary AWS credentials. In the event that this role is not pre-established, the installation will create it and assign the GreengrassV2TokenExchangeRoleAccess policy to it.
  8. GreengrassCoreTokenExchangeRoleAlias: This alias pertains to the IAM role that grants Greengrass core devices the ability to request temporary credentials in the future. Should this alias not be in existence, the installation process will set it up and link it to the IAM role you provide.

The following is the command you will need to run and have the values within replaced:
$ sudo -E java -Droot="/greengrass/v2" -Dlog.store=FILE \
-jar ./GreengrassInstaller/lib/Greengrass.jar \
--aws-region region \
--thing-name MyGreengrassCore \
--thing-group-name MyGreengrassCoreGroup \
--thing-policy-name GreengrassV2IoTThingPolicy \
--tes-role-name GreengrassV2TokenExchangeRole \
--tes-role-alias-name GreengrassCoreTokenExchangeRoleAlias \
--component-default-user ggc_user:ggc_group \
--provision true \
--setup-system-service true

Now, navigate to the root of the EC2 instance and create a file called script.py with the following command:
$ sudo vi script.py

Write the following in the script, replacing the AWS access key, secret access key, and OpenAI API key with your own values:
import json
import openai
import boto3
import time
from datetime import datetime

# Initialize the AWS IoT Data client
def create_aws_iot_client():
    # Replace the placeholders with your AWS region and credentials
    iot_client = boto3.client('iot-data',
                              region_name='{ENTER_YOUR_AWS_REGION_HERE}',
                              aws_access_key_id='{ENTER_YOUR_ACCESS_KEY_HERE}',
                              aws_secret_access_key='{ENTER_YOUR_SECRET_ACCESS_KEY_HERE}')
    return iot_client

# Ask ChatGPT for a completion of the given prompt
def interact_with_chatgpt(prompt):
    openai.api_key = '{ENTER_OPENAI_API_KEY_HERE}'
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=prompt,
        temperature=0.5,
        max_tokens=100)
    return response.choices[0].text.strip()

# Publish a message to the given AWS IoT topic
def publish_to_aws_iot_topic(iot_client, topic, message):
    # Convert the message into a JSON object
    json_message = json.dumps({"message": message})
    return iot_client.publish(
        topic=topic,
        qos=0,
        payload=json_message)

def main():
    prompt = "Tell a joke of the day"
    topic = "sensor/chat1"
    iot_client = create_aws_iot_client()
    while True:
        chatgpt_response = interact_with_chatgpt(prompt)
        publish_to_aws_iot_topic(iot_client, topic, chatgpt_response)
        print(f"{datetime.now()}: Published message to AWS IoT topic: {topic}")
        time.sleep(300)  # pause for 5 minutes

if __name__ == "__main__":
    main()

Save the file and quit the vim editor.

Navigate to the AWS IoT page in the AWS Management Console. Go to MQTT test client.

Click on Subscribe to a Topic and input sensor/chat1 into the topic filter. Click on Subscribe.

If you look in the Subscriptions window at the bottom of the page, you can see the topic open. Now, navigate back to the EC2 window and run the following command:
$ python script.py

You should now see there is a new message under the topic. You will see a joke being written there, and one being generated every five minutes (or any other duration of time, depending on what you specified).

With that, we have configured AWS Greengrass on the EC2. Now, we can look at monitoring the EC2 in terms of how it publishes messages.