Tag Archives: risk

Ensuring data reliability and observability in risk systems

Post Syndicated from Grab Tech original https://engineering.grab.com/data-observability

Grab has an in-house Risk Management platform called GrabDefence which relies on ingesting large amounts of data gathered from upstream services to power our heuristic risk rules and data science models in real time.

Fig 1. GrabDefence aggregates data from different upstream services

As Grab’s business grows, so does the amount of data. It is therefore imperative that the data fuelling our risk systems is of reliable quality, as any discrepancy or missing data could impair our fraud detection and prevention capabilities.

We need to quickly detect any data anomalies, which is where data observability comes in.

Data observability as a solution

Data observability is a type of data operation (DataOps; similar to DevOps) where teams build visibility over the health and quality of their data pipelines. This enables teams to be notified of data quality issues, and allows teams to investigate and resolve these issues faster.

We needed a solution that addresses the following issues:

  1. Alert on any data quality issues as soon as possible – this means the observability tool has to work in real time.
  2. With hundreds of data points to observe, we needed a neat and scalable solution which allows users to quickly pinpoint which data points were having issues.
  3. A consistent way to compare, analyse, and compute data that might have different formats.

Hence, we decided to use Flink to standardise data transformations and to compute and observe data trends in real time and at scale.

Flink SQL is a powerful, flexible tool for performing real-time analytics on streaming data. It allows users to query continuous data streams using standard SQL syntax, enabling complex event processing and data transformation within the Apache Flink ecosystem, which is particularly useful for scenarios requiring low-latency insights and decisions.

In Grab, data comes from multiple sources and while most of the data is in JSON format, the actual JSON structure differs between services. Because of JSON’s nested and dynamic data structure, it is difficult to consistently analyse the data – posing a significant challenge for real-time analysis.

To help address this issue, Apache Flink SQL has the capability to manage such intricacies with ease. It offers specialised functions tailored for parsing and querying JSON data, ensuring efficient processing.

Another standout feature of Flink SQL is the use of custom table functions, such as JSONEXPLOAD, which serves to deconstruct and flatten nested JSON structures into tabular rows. This transformation is crucial as it enables subsequent aggregation operations. By implementing a 5-minute tumbling window, Flink SQL can easily aggregate these now-flattened data streams. This technique is pivotal for monitoring, observing, and analysing data patterns and metrics in near real-time.
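To make the pattern concrete, here is a minimal PyFlink sketch of the same idea: extract a field from each JSON payload and aggregate counts over a 5-minute tumbling window. The table name, field names, and the datagen source are illustrative stand-ins for the real upstream streams, and the standard JSON_VALUE function is used in place of the in-house JSONEXPLOAD table function.

from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Illustrative source: a stream of raw JSON payloads. The datagen
# connector stands in for the real upstream topics.
t_env.execute_sql("""
    CREATE TABLE events (
        payload STRING,
        proc_time AS PROCTIME()
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '10'
    )
""")

# Count events per source over a 5-minute tumbling window.
counts = t_env.sql_query("""
    SELECT
        JSON_VALUE(payload, '$.source') AS source_stream,
        TUMBLE_START(proc_time, INTERVAL '5' MINUTE) AS window_start,
        COUNT(*) AS event_count
    FROM events
    GROUP BY
        JSON_VALUE(payload, '$.source'),
        TUMBLE(proc_time, INTERVAL '5' MINUTE)
""")

counts.execute().print()  # in production this would sink to a metrics pipeline instead

Each emitted row is a per-stream counter for one window, which is the shape of data we then export for monitoring.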

Now that data is aggregated by Flink for easy analysis, we still needed a way to incorporate comprehensive monitoring so that teams could be notified of any data anomalies or discrepancies in real time.

How we interfaced the output with Datadog 

Datadog is the observability tool of choice in Grab, with many teams using Datadog for their service reliability observations and alerts. By aggregating data from Apache Flink and integrating it with Datadog, we can harness the synergy of real-time analytics and comprehensive monitoring. Flink excels in processing and aggregating data streams, which, when pushed to Datadog, can be further analysed and visualised. Datadog also provides seamless integration with collaboration tools like Slack, which enables teams to receive instant notifications and alerts.

With Datadog’s out-of-the-box features such as anomaly detection, teams can identify and be alerted to unusual patterns or outliers in their data streams. Taking a proactive approach to monitoring is crucial in maintaining system health and performance as teams can be alerted, then collaborate quickly to diagnose and address anomalies.

This integrated pipeline—from Flink’s real-time data aggregation to Datadog’s monitoring and Slack’s communication capabilities—creates a robust framework for real-time data operations. It ensures that any potential issues are quickly traced and brought to the team’s attention, facilitating a rapid response. Such an ecosystem empowers organisations to maintain high levels of system reliability and performance, ultimately enhancing the overall user experience.

Organising monitors and alerts using out-of-the-box solutions from Datadog

Once we integrated Flink data into Datadog, we realised that it could become unwieldy to try to identify the data point with issues from hundreds of other counters.

Fig 2. Hundreds of data points on a graph make it hard to decipher which ones have issues

We decided to organise the counters according to the service streams they come from, and to create individual monitors for each service stream. We used Datadog’s Monitor Summary tool to help visualise the total number of service streams we read from and the number of underlying data points within each stream.

Fig 3. Data is grouped according to their source stream

Within each individual stream, we used Datadog’s Anomaly Detection feature to create an alert whenever a data point from the stream exceeds a predefined threshold. This can be configured by the service teams on Datadog.

Fig 4. Datadog’s built-in Anomaly Detection function triggers alerts whenever a data point exceeds a threshold
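As a rough sketch of what such a monitor looks like when created programmatically, the snippet below calls Datadog’s Monitors API with an anomalies() query. The metric name, tag, Slack handle, and thresholds are hypothetical placeholders, not our actual configuration.

import requests

monitor = {
    "type": "query alert",
    "name": "Anomalous event count - example_stream",
    # anomalies() is Datadog's built-in anomaly detection function
    "query": "avg(last_4h):anomalies(avg:flink.stream.event_count{stream:example_stream}, 'basic', 2) >= 1",
    "message": "Event count for example_stream looks anomalous. @slack-data-alerts",
    "options": {
        "thresholds": {"critical": 1.0},
        "threshold_windows": {"trigger_window": "last_15m", "recovery_window": "last_15m"},
    },
}

resp = requests.post(
    "https://api.datadoghq.com/api/v1/monitor",
    headers={"DD-API-KEY": "<api-key>", "DD-APPLICATION-KEY": "<app-key>"},
    json=monitor,
)
resp.raise_for_status()

In practice the same monitor can be created through the Datadog UI; the API form is shown only to make the moving parts explicit.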

These alerts are then sent to a Slack channel where the Data team is informed when a data point of interest starts throwing anomalous values.

Fig 5. Datadog integration with Slack to help alert users

Impact

Since the deployment of this data observability tool, we have seen significant improvement in the detection of anomalous values. If there are any anomalies or issues, we now get alerts within the same day (or hour) instead of days to weeks later.

Organising the alerts according to source streams has also helped simplify the monitoring load, allowing users to quickly narrow down and identify which pipeline has failed.

What’s next?

At the moment, this data observability tool is only implemented on selected checkpoints in GrabDefence. We plan to expand the observability tool’s coverage to include more checkpoints, and continue to refine the workflows to detect and resolve these data issues.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

AWS Customer Compliance Guides now publicly available

Post Syndicated from Kevin Donohue original https://aws.amazon.com/blogs/security/aws-customer-compliance-guides-now-publicly-available/

The AWS Global Security & Compliance Acceleration (GSCA) Program has released AWS Customer Compliance Guides (CCGs) on the AWS Compliance Resources page to help customers, AWS Partners, and assessors quickly understand how industry-leading compliance frameworks map to AWS service documentation and security best practices.

CCGs offer security guidance mapped to 16 different compliance frameworks for more than 130 AWS services and integrations. Customers can select from the frameworks and services available to see how security “in the cloud” applies to AWS services through the lens of compliance.

CCGs focus on security topics and technical controls that relate to AWS service configuration options. The guides don’t cover security topics or controls that are consistent across AWS services or those specific to customer organizations, such as policies or governance. As a result, the guides are shorter and are focused on the unique security and compliance considerations for each AWS service.

We value your feedback on the guides. Take our CCG survey to tell us about your experience, request new services or frameworks, or suggest improvements.

CCGs provide summaries of the user guides for AWS services and map configuration guidance to security control requirements from the following frameworks:

  • National Institute of Standards and Technology (NIST) 800-53
  • NIST Cybersecurity Framework (CSF)
  • NIST 800-171
  • System and Organization Controls (SOC) 2
  • Center for Internet Security (CIS) Critical Controls v8.0
  • ISO 27001
  • NERC Critical Infrastructure Protection (CIP)
  • Payment Card Industry Data Security Standard (PCI-DSS) v4.0
  • Department of Defense Cybersecurity Maturity Model Certification (CMMC)
  • HIPAA
  • Canadian Centre for Cyber Security (CCCS)
  • New York’s Department of Financial Services (NYDFS)
  • Federal Financial Institutions Examination Council (FFIEC)
  • Cloud Controls Matrix (CCM) v4
  • Information Security Manual (ISM-IRAP) (Australia)
  • Information System Security Management and Assessment Program (ISMAP) (Japan)

CCGs can help customers in the following ways:

  • Shorten the process of manually searching the AWS user guides to understand security “in the cloud” details and align configuration guidance to compliance requirements
  • Determine the scope of controls applicable in risk assessments or audits based on which AWS services are running in customer workloads
  • Assist customers who perform due diligence assessments on new AWS services under consideration for use in their organization
  • Provide assessors or risk teams with resources to identify which security areas are handled by AWS services and which are the customer’s responsibility to implement, which might influence the scope of evidence required for assessments or internal security checks
  • Provide a basis for developing security documentation such as control responses or procedures that might be required to meet various compliance documentation requirements or fulfill assessment evidence requests

The AWS Global Security & Compliance Acceleration (GSCA) Program connects customers with AWS partners that can help them navigate, automate, and accelerate building compliant workloads on AWS by helping to reduce time and cost. GSCA supports businesses globally that need to meet security, privacy, and compliance requirements for healthcare, privacy, national security, and financial sectors. To connect with a GSCA compliance specialist, complete the GSCA Program questionnaire.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Kevin Donohue

Kevin is a Senior Security Partner Strategist in AWS World Wide Public Sector, specializing in helping customers meet their compliance goals. Kevin began his tenure with AWS in 2019 supporting U.S. government customers in AWS Security Assurance. He is based in Northern Virginia and enjoys spending time outdoors with his wife and daughter outside of work.

Continuous compliance monitoring using custom audit controls and frameworks with AWS Audit Manager

Post Syndicated from Deenadayaalan Thirugnanasambandam original https://aws.amazon.com/blogs/security/continuous-compliance-monitoring-using-custom-audit-controls-and-frameworks-with-aws-audit-manager/

For most customers today, security compliance auditing can be a cumbersome and costly process. This activity within a security program often depends on third-party audit firms and robust security teams to periodically assess risk and raise compliance gaps against applicable industry requirements. Because of how audits are performed today, many corporate IT environments are left exposed to threats until the next manual audit is scheduled, performed, and the findings report is presented.

AWS Audit Manager can help you continuously audit your AWS usage and simplify how you assess IT risks and compliance gaps aligned with industry regulations and standards. Audit Manager automates evidence collection to reduce the “all hands on deck” manual effort that often happens for audits, while enabling you to scale your audit capability in the cloud as your business grows. Customized control frameworks help customers evaluate IT environments against their own established assessment baseline, enabling them to discern how aligned they are with a set of compliance requirements tailored to their business needs. Custom controls can be defined to collect evidence from specific data sources, helping rate the IT environment against internally defined audit and compliance requirements. Each piece of evidence collected during the compliance assessment becomes a record that can be used to demonstrate compliance with predefined requirements specified by a control.

In this post, you will learn how to leverage AWS Audit Manager to create a tailored audit framework to continuously evaluate your organization’s AWS infrastructure against the industry compliance requirements it needs to adhere to. By implementing this solution, you can simplify and accelerate the detection of security risks present in your AWS environment that are relevant to your organization, while providing your teams with the information needed to remedy reported compliance gaps.

Solution overview

This solution utilizes an event-driven architecture to provide agility while reducing manual administration effort. It uses the following AWS services:

  • AWS Audit Manager – helps you continuously audit your AWS usage to simplify how you assess risk and compliance with regulations and industry standards.
  • AWS Lambda – a serverless compute service that lets you run code without provisioning or managing servers, in response to events such as changes in data, application state, or user actions.
  • Amazon Simple Storage Service (Amazon S3) – object storage built to store and retrieve any amount of data from anywhere, offering industry-leading availability, performance, security, and virtually unlimited scalability at very low cost.
  • AWS Cloud Development Kit (AWS CDK) – a software development framework for provisioning your cloud infrastructure as code through AWS CloudFormation.

Architecture

This solution enables automated controls management using an event-driven architecture with AWS services such as AWS Audit Manager, AWS Lambda, and Amazon S3, integrated with code management services like GitHub and AWS CodeCommit. The controls owner can design, manage, monitor, and roll out custom controls in GitHub with a simple custom controls configuration file, as illustrated in Figure 1. Once the controls configuration file is placed in an Amazon S3 bucket, the upload event triggers a control pipeline that loads the controls into Audit Manager using a Lambda function.

Figure 1: Solution workflow

Solution workflow overview

  1. The control owner loads the controls as code (controls and framework) into an Amazon S3 bucket.
  2. Uploading the controls YAML file into the S3 bucket triggers a Lambda function to process the control file.
  3. The Lambda function processes the controls file and creates a new control (or updates an existing one) in Audit Manager.
  4. Uploading the controls framework YAML file into the S3 bucket triggers a Lambda function to process the controls framework file.
  5. The Lambda function validates the controls framework file and updates the controls framework library in Audit Manager.

This solution can be extended to create custom frameworks based on the controls, and to run an assessment framework against the controls.
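The repository ships its own implementation, but as a simplified sketch of steps 2 and 3 above, a Lambda handler along these lines reads the uploaded YAML file and creates the control through the Audit Manager API (field names follow the sample control file shown later in this post; the update path for existing controls is omitted):

import boto3
import yaml  # PyYAML must be bundled with the Lambda deployment package

s3 = boto3.client("s3")
auditmanager = boto3.client("auditmanager")

def handler(event, context):
    # Read the controls YAML file referenced by the S3 upload event.
    record = event["Records"][0]["s3"]
    obj = s3.get_object(Bucket=record["bucket"]["name"], Key=record["object"]["key"])
    control = yaml.safe_load(obj["Body"].read())
    ds = control["datasources"]

    # Create the custom control in AWS Audit Manager.
    auditmanager.create_control(
        name=control["name"],
        description=control["description"],
        testingInformation=control["testingInformation"],
        actionPlanTitle=control["actionPlanTitle"],
        actionPlanInstructions=control["actionPlanInstructions"],
        controlMappingSources=[{
            "sourceName": ds["sourceName"],
            "sourceDescription": ds["sourceDescription"],
            "sourceSetUpOption": ds["sourceSetUpOption"],
            "sourceType": ds["sourceType"],
            "sourceKeyword": ds["sourceKeyword"],
        }],
    )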

Prerequisite steps

  1. Sign in to your AWS account.
  2. Log in to the AWS console and choose the appropriate AWS Region.
  3. In the Search tab, search for AWS Audit Manager.

    Figure 2. AWS Audit Manager

  4. Choose Set up AWS Audit Manager.

Keep the default configurations from this page, such as Permissions and Data encryption. When done, choose Complete setup.

Before deploying the solution, please ensure that the following software packages and their dependencies are installed on your local machine:

  • Node.js v12 or above: https://nodejs.org/en/
  • AWS CLI version 2: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html
  • AWS CDK: https://docs.aws.amazon.com/cdk/latest/guide/getting_started.html
  • jq: https://stedolan.github.io/jq/
  • git: https://git-scm.com/
  • AWS CLI configuration: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html

Solution details

To provision the enterprise control catalog with AWS Audit Manager, start by cloning the sample code from the aws-samples repository on GitHub, then run the installation script (included in the repository) to deploy the sample controls and framework into your AWS account.

To clone the sample code from the repository

On your development terminal, git clone the source code of this blog post from the AWS public repository:

git clone git@github.com:aws-samples/enterprise-controls-catalog-via-aws-audit-manager.git

To bootstrap CDK and run the deploy script

Running cdk bootstrap creates the CDK Toolkit stack, which manages the resources necessary to deploy cloud applications with the AWS CDK.

cdk bootstrap aws://<AWS Account Number>/<Region> # Bootstrap CDK in the specified account and region

cd audit-manager-blog

./deploy.sh

Workflow

Figure 3 illustrates the overall deployment workflow. The deployment script triggers the NPM package manager and invokes the AWS CDK to create the necessary infrastructure using AWS CloudFormation. The CloudFormation template offers an easy way to provision resources and manage their lifecycles by treating infrastructure as code.

Figure 3: Detailed workflow lifecycle

Once the solution is successfully deployed, you can view two custom controls and one custom framework available in AWS Audit Manager. The custom controls use a combination of manual and automated evidence collection, using compliance checks for resource configurations from AWS Config.

To verify the newly created custom data security controls

  1. In the AWS console, go to AWS Audit Manager and select Control library.
  2. Choose Custom controls to view the controls DataSecurity-DatainTransit and DataSecurity-DataAtRest.
Figure 4. View custom controls

To verify the newly created custom framework

  1. In the AWS console, go to AWS Audit Manager and select Framework library.
  2. Choose Custom frameworks to view the following framework:
Figure 5. Custom frameworks list

You have now successfully created the custom controls and framework using the proposed solution.

Next, you can create your own controls and add to your frameworks using a simple configuration file, and let the implemented solution do the automated provisioning.

To set up error reporting

Before you begin creating your own controls and frameworks, you should complete the error reporting configuration. The solution automatically sets up the error reporting capability using Amazon SNS, a web service that enables sending and receiving notifications from the cloud.

  1. In the AWS Console, go to Amazon SNS > Topics > AuditManagerBlogNotification.
  2. Select Create subscription and choose Email as your preferred endpoint to subscribe.
  3. This triggers an automated confirmation email. Upon confirming the subscription, you will begin receiving any error notifications by email.

To create your own custom control as code

Follow these steps to create your own controls and frameworks:

  1. Create a new control file named example-control.yaml with contents as shown below. This creates a custom control to check whether all public access to data in Amazon S3 is prohibited:

     name: DataSecurity-PublicAccessProhibited

     description: Information and records (data) are managed consistent with the organization’s risk strategy to protect the confidentiality, integrity, and availability of information.

     actionPlanTitle: All public access block settings are enabled at account level

     actionPlanInstructions: Ensure all Amazon S3 resources have public access prohibited

     testingInformation: Test attestations – preventive and detective controls for prohibiting public access

     tags:
       ID: PRDS-3
       Subcategory: Public-Access-Prohibited
       Category: Data Security-PRDS
       CIS: CIS17
       COBIT: COBIT 5 APO07-03
       NIST: NIST SP 800-53 Rev 4

     datasources:
       sourceName: Config attestation
       sourceDescription: Config attestation
       sourceSetUpOption: System_Controls_Mapping
       sourceType: AWS_Config
       sourceKeyword:
         keywordInputType: SELECT_FROM_LIST
         keywordValue: S3_ACCOUNT_LEVEL_PUBLIC_ACCESS_BLOCKS

  2. Go to AWS Console > AWS CloudFormation > Stacks. Select AuditManagerBlogStack and choose Outputs.
  3. Make note of the bucketOutput name that starts with auditmanagerblogstack-.
  4. Upload the example-control.yaml file into the auditmanagerblogstack- bucket noted in step 3, inside the controls folder.
  5. The event-driven architecture is deployed as part of the solution. Uploading the file to the Amazon S3 bucket triggers an automated event that creates the new custom control in AWS Audit Manager.

To validate that your new custom control is automatically provisioned in AWS Audit Manager

  1. In the AWS console, go to AWS Audit Manager and select Control library.
  2. Choose Custom controls to view the following controls:
Figure 6. Audit Manager custom controls are listed as Custom controls

To create your own custom framework as code

  1. Create a new framework file named example-framework.yaml with contents as shown below:

     name: Sample DataSecurity Framework

     description: A sample data security framework to prohibit public access to data

     complianceType: NIST

     controlSets:
       - name: Prohibit public access
         controls:
           - DataSecurity-PublicAccessProhibited

     tags:
       Tag1: DataSecurity
       Tag2: PublicAccessProhibited

  2. Go to AWS Console > AWS CloudFormation > Stacks. Select AuditManagerBlogStack and choose Outputs.
  3. Make note of the bucketOutput name that starts with auditmanagerblogstack-.
  4. Upload the example-framework.yaml file into the bucket noted in step 3, inside the frameworks folder.
  5. The event-driven architecture is deployed as part of the solution. The file upload to Amazon S3 triggers an automated event that creates the new custom framework in AWS Audit Manager.

To validate that your new custom framework is automatically provisioned in AWS Audit Manager

  1. Go to AWS Audit Manager in the AWS console and select Framework library.
  2. Choose Custom frameworks and you should see the newly created framework:
Figure 7. View custom controls created via custom repo

Congratulations, you have successfully created your new custom control and framework using the proposed solution.

Next steps

An Audit Manager assessment is based on a framework, which is a grouping of controls. Using the framework of your choice as a starting point, you can create an assessment that collects evidence for the controls in that framework. In your assessment, you can also define the scope of your audit, including which AWS accounts and services you want to collect evidence for. You can create an assessment from a custom framework you build yourself, using steps from the Audit Manager documentation.
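As a hedged sketch of that step, the snippet below creates an assessment from the custom framework built earlier via the Audit Manager API; the account ID, role ARN, report bucket, and service scope are placeholders you would replace with your own values.

import boto3

auditmanager = boto3.client("auditmanager")

# Look up the custom framework by name to get its ID.
frameworks = auditmanager.list_assessment_frameworks(frameworkType="Custom")["frameworkMetadataList"]
framework_id = next(f["id"] for f in frameworks if f["name"] == "Sample DataSecurity Framework")

auditmanager.create_assessment(
    name="Data security assessment",
    assessmentReportsDestination={
        "destinationType": "S3",
        "destination": "s3://<report-bucket>",  # placeholder
    },
    scope={
        "awsAccounts": [{"id": "<account-id>"}],  # placeholder
        "awsServices": [{"serviceName": "s3"}],
    },
    roles=[{
        "roleType": "PROCESS_OWNER",
        "roleArn": "arn:aws:iam::<account-id>:role/<audit-owner-role>",  # placeholder
    }],
    frameworkId=framework_id,
)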

Conclusion

The solution provides the dynamic ability to design, develop, and monitor capabilities that can be extended into a standardized enterprise IT controls catalogue for your company. With AWS Audit Manager, you can build compliance controls as code, with the capability to audit your environment on a daily, weekly, or monthly basis. You can use this solution to keep your AWS Audit Manager compliance assessments current with reduced manual effort. To learn more about the standard frameworks available to assist you, see Supported frameworks in AWS Audit Manager, which provides prebuilt frameworks based on AWS best practices.

Author

Deenadayaalan Thirugnanasambandam

Deenadayaalan is a Solution Architect at Amazon Web Services. He provides prescriptive architectural guidance and consulting that enable and accelerate customers’ adoption of AWS.

Author

Hu Jin

Hu is a Software Development Engineer at AWS. He helps customers build secure and scalable solutions on AWS Cloud to realise business value faster.

Author

Vinodh Shankar

Vinodh is a Sr. Specialist SA at Amazon Web Services. He helps customers with defining their transformation road map, assessing readiness, creating business cases, and mapping future-state business transformation on the cloud.

Author

Hafiz Saadullah

Hafiz is a Senior Technical Product Manager with AWS focused on AWS Solutions.

How to think about cloud security governance

Post Syndicated from Paul Hawkins original https://aws.amazon.com/blogs/security/how-to-think-about-cloud-security-governance/

When customers first move to the cloud, their instinct might be to build a cloud security governance model based on one or more regulatory frameworks that are relevant to their industry. Although this can be a helpful first step, it’s also critically important that organizations understand what the control objectives for their workloads should be.

In this post, we discuss what you need to do both organizationally and technically with Amazon Web Services (AWS) to build an efficient and effective governance model. People who are taking their first steps in cloud can use this post to guide their thinking. It can also act as useful context for folks who have been running in the cloud for a while to evaluate their current governance approach.

But before you can build that model, it’s important to understand what governance is and to consider why you need it. Governance is how an organization ensures the consistent application of policies across all teams. The best way to implement consistent governance is by codifying as much of the process as possible. Security governance in particular is used to support business objectives by defining policies and controls to manage risk.

Moving to the cloud provides you with an opportunity to deliver features faster, react to the changing world in a more agile way, and return some decision making to the hands of the people closest to the business. In this fast-paced environment, it’s important to have a way to maintain consistency, scalability, and security. This is where a strong governance model helps.

Creating the right governance model for your organization may seem like a complex task, but it doesn’t have to be.

Frameworks

Many customers use a standard framework that’s relevant to their industry to inform their decision-making process. Some frameworks that are commonly used to develop a security governance model include the NIST Cybersecurity Framework (CSF), the Information Security Registered Assessors Program (IRAP), the Payment Card Industry Data Security Standard (PCI DSS), and ISO/IEC 27001:2013.

Some of these standards provide requirements that are specific to a particular regulator or region, while others are more widely applicable; you should choose one that fits the needs of your organization.

While frameworks are useful to set the context for a security program and give guidance on governance models, you shouldn’t build either one only to check boxes on a particular standard. It’s critical that you build for security first and then use the compliance standards as a way to demonstrate that you’re doing the right things.

Control objectives

After you’ve selected a framework to use, the next considerations are controls. A control is a technical- or process-based implementation that’s designed to ensure that the likelihood or consequences of an identified risk are reduced to a level that’s acceptable to the organization’s risk appetite. Examples of controls include firewalls, logging mechanisms, access management tools, and many more.

Controls will evolve over time; sometimes they do so very quickly in the early stages of cloud adoption. During this rapid evolution, it’s easy to focus purely on the implementation of a control rather than the objective of it. However, if you want to build a robust and useful governance model, you must not lose sight of control objectives.

Consider the example of the firewall. When you use a firewall, you implement a control. The objective is to make sure that only traffic that should reach your environment is able to reach it. Although a firewall is one way to meet this objective, you can achieve the same outcome with a layered approach using Amazon Virtual Private Cloud (Amazon VPC) security groups, AWS WAF, and Amazon VPC network access control lists (ACLs). Splitting the control implementation across multiple places can give workload owners greater flexibility in how they configure resources while the baseline posture is delivered automatically.
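As a small illustration of that layered approach, the sketch below uses hypothetical VPC, network ACL, and CIDR values to express the same objective at two layers: a security group admitting only HTTPS from a trusted range, backed by a coarser subnet-level network ACL rule.

import boto3

ec2 = boto3.client("ec2")

# Fine-grained layer: a security group that only admits HTTPS from a
# trusted CIDR range (all values are placeholders).
sg = ec2.create_security_group(
    GroupName="web-ingress",
    Description="Allow HTTPS from trusted range only",
    VpcId="vpc-0123456789abcdef0",
)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "203.0.113.0/24"}],
    }],
)

# Coarser subnet-level layer: a network ACL entry denying inbound
# traffic from a known-bad range before it reaches instances.
ec2.create_network_acl_entry(
    NetworkAclId="acl-0123456789abcdef0",
    RuleNumber=90,
    Protocol="-1",
    RuleAction="deny",
    Egress=False,
    CidrBlock="198.51.100.0/24",
)

Either layer alone can meet the control objective of admitting only intended traffic; together they give workload owners room to adjust one layer without losing the baseline posture.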

Not all areas of a business necessarily have the same cloud maturity level, or use the same methods to deploy or run workloads. As a security architect, your job is to help those different parts of the business deliver outcomes in the way that is appropriate for their maturity or particular workload.

The best way to help drive this goal is for the security part of your organization to clearly communicate the necessary control objectives. As a security architect, it’s easier to have a discussion about the things that need tweaking in an application if the objectives are well communicated. It is much harder if the workload owner doesn’t know they have to meet certain security expectations.

What is the job of security?

At AWS, we talk to customers across a range of industries. One thing that consistently comes up in conversation is how to help customers understand the role of their security team in a distributed cloud-aware environment. The answer is always the same: we as security people are here to help the business deploy and run applications securely. Our job is to guide and educate the rest of the organization on the best way to meet the business objectives while meeting the security, risk, and compliance requirements.

So how do you do this?

Technology and culture are both important to an organization’s security posture, and they enable each other. AWS is a good example of an organization that has a strong culture of security ownership. One thing that all customers can take away from AWS: security is everyone’s job. When you understand that, it becomes easier to build the mechanisms that make the configuration and operation of appropriate security control objectives a reality.

The cloud environment that you build goes a long way to achieving this goal in two key ways. First, it provides guardrails and automated guidance for people building on the platform. Second, it allows solutions to scale.

One of the challenges organizations encounter is that there are more developers than there are security people. The traditional approach of point-in-time risk and control assessments performed by a human looking at an architecture diagram doesn’t scale. You need a way to scale that knowledge and capability without increasing the number of people. The best way to achieve this is to codify as much as possible, early in the build and release process.

One way to do this is to run the AWS platform as a product in its own right. Team members should be able to submit feature requests, and there should be metrics on the features that are enabled through the platform. The more security capability that teams building workloads can inherit from the platform, the less they have to implement at the workload level and the more time they can spend on product features. There will always be some security control objectives that can only be delivered by specific configuration at the workload level; this should build on top of what’s inherited from the cloud platform. Your security team and the other teams need to work together to make sure that the capabilities provided by the cloud platform are available to help people build and release securely.

One part of the governance model that we like to highlight is the concept of platform onboarding. The idea of this part of the governance model is to quickly and consistently get to a baseline set of controls that enable you to use a service safely in a particular environment. A good example here is to give developers access to evaluate a service in an experimentation account. To support this process, you don’t want to spend a long time building controls for every possible outcome. The best approach is to take advantage of the foundational controls that are delivered by the cloud platform as the starting point. Things like federation, logging, and service control policies can be used to provide guard rails that enable you to use services quickly. When the services are being evaluated, your security team can work together with your business to define more specific controls that make sense for the actual use cases.

AWS Well-Architected Framework

The cloud platform you use is the foundation of many of the security controls. These guard rails of federation, logging, service control policies, and automated response apply to workloads of all types. The security pillar in the AWS Well-Architected Framework builds on other risk management and compliance frameworks, provides you with best practices, and helps you to evaluate your architectures. These best practices are a great place to look for what you should do when building in the cloud. The categories—identity and access management, detection, infrastructure protection, data protection, and incident response—align with the most important areas to focus on when you build in AWS.

For example, identity is a foundational control in a cloud environment. One of the AWS Well-Architected security best practices is “Rely on a centralized identity provider.” You can use AWS Single Sign-On (AWS SSO) for this purpose or an equivalent centralized mechanism. If you centralize your identity provider, you can perform identity lifecycle management on users, provide them with access to only the resources that are required, and support users who move between teams. This can apply across the multiple AWS accounts in your AWS environment. AWS Organizations uses service control policies to enable you to use a subset of AWS services in particular environments; this is an identity-centric way of providing guard rails.
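As one hedged example of such an identity-centric guard rail, the sketch below creates and attaches a service control policy through AWS Organizations that limits an experimentation OU to a short allow-list of services; the policy content and OU ID are illustrative, not a recommendation for your environment.

import boto3
import json

orgs = boto3.client("organizations")

# Allow-list style SCP: only the listed services are permitted in the
# experimentation OU; everything else is implicitly denied.
policy_doc = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:*", "lambda:*", "cloudwatch:*", "logs:*"],
        "Resource": "*",
    }],
}

policy = orgs.create_policy(
    Content=json.dumps(policy_doc),
    Description="Experimentation accounts: allow-listed services only",
    Name="experimentation-allow-list",
    Type="SERVICE_CONTROL_POLICY",
)
orgs.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-xxxx-xxxxxxxx",  # placeholder OU ID
)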

In addition to federating users, it’s important to enable logging and monitoring services across your environment. This allows you to generate an event when something unexpected happens, such as a user trying to call AWS Key Management Service (AWS KMS) to decrypt data that they should not have access to. Securely storing logs means that you can perform investigations to determine the causes of any issues you might encounter. AWS customers who use Amazon GuardDuty and AWS CloudTrail, and have a set of AWS Config rules enabled, have access to security monitoring and logging capabilities as they build their applications.
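A minimal sketch of turning on these services in a single account and Region follows; the trail name and bucket are placeholders, and a multi-account environment would typically use organization-wide trails and delegated administration instead.

import boto3

# Enable GuardDuty threat detection in this account and Region.
boto3.client("guardduty").create_detector(Enable=True)

# Record API activity with CloudTrail, delivered to an existing bucket
# whose policy allows CloudTrail to write to it.
cloudtrail = boto3.client("cloudtrail")
cloudtrail.create_trail(
    Name="account-audit-trail",   # placeholder
    S3BucketName="<log-bucket>",  # placeholder
    IsMultiRegionTrail=True,
)
cloudtrail.start_logging(Name="account-audit-trail")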

The layer cake model

When you think about cloud security, we find it useful to use the layer cake as a good mental model. The base of the cake is the understanding of the below-the-line capability that AWS provides. This includes self-serving the compliance documentation from AWS Artifact and understanding the AWS shared responsibility model.

The middle of the cake is the foundational controls, including those described previously in this post. This is the most important layer, because it’s where the most controls are and therefore where the most value is for the security team. You could describe it as the “solve it once, consume it many times” layer.

The top of the cake is the application-specific layer. This layer includes things that are more context dependent, such as the correct control objectives for a certain type of application or data classification. The work in the middle layer helps support this layer, because the middle layer provides the mechanisms that make it easier to automatically deliver the top layer capability.

The middle and top layers are not just technology layers. They also include the people and process parts of the equation. The technology is just there to support the processes.

One thing to be aware of is that you shouldn’t try to define every possible control for a service before you allow your business to use the service. Make use of the various environments in your organization—experimentation, development, testing, and production—to get the services in the hands of developers as quickly as possible with the minimum guardrails to avoid accidental misconfiguration. Then, use the time when the services are being assessed to collaborate with the developers on control implementation. Control implementations can then be rolled into the middle layer of the cake, and the services can be adopted by other parts of the business.

This is also the ideal time to apply practical threat modelling techniques so you can understand what threats and risks you must address. Working with your business to define recommended implementation patterns also helps provide context for how services are typically used. This means you can focus on the controls that are most relevant.

The architecture, platform, or cloud center of excellence (CoE) teams can help at this stage. They can likely make a quick determination of whether an AWS service fits in with your organization’s architectural direction. This quick triage helps the security team focus their efforts on getting services safely into the hands of the business without being seen as blocking adoption. A good mechanism for streamlining the use of new services is to make sure the backlog is well communicated, typically on a platform team wiki. This helps the security and non-security parts of your organization prioritize their time on services that deliver the most business value. A consistent development approach means that the services that are used are probably being used in more places across the organization. This helps your organization get the benefits of scale, as consistent approaches to control implementation are replicated between teams.

Simplicity, metrics, and culture

The world moves fast. You can’t just define a security posture and control objectives, and then walk away. New services are launched that make it easier to do more complex things, business priorities change, and the threat landscape evolves. How do you keep up with all of it?

The answer is a combination of simplicity, metrics, and culture.

Simplicity is hard, but useful. For example, if you have 100 application teams all building in a different way, you have a large number of different configurations that you must ensure are sensibly defined. Ideally, you do this programmatically, which means that the work to define and maintain that set of security controls is significant. If you have 100 application teams using only 10 main patterns, it’s easier to build controls. This has the added benefit of reducing the complexity at the operations end, which applies to both the day-to-day operations and to incident responses. Simplification of your control environment means that your monitoring is less complex, troubleshooting is easier, and people have time to focus on the development of new controls or processes.

Metrics are important because you can make informed decisions based on data. A good example of the usefulness of metrics is patching. Patching is one of the easiest ways to improve your security posture. Having metrics on patch age, presented where this information is most important in your environment, enables you to focus on the most valuable areas. For example, infrastructure on your edge is more important to keep patched than infrastructure that is behind multiple layers of controls. You should patch everything, but you need to make it easy for application teams to do so as part of their build and release cycles. Exposing metrics to teams and leadership helps your organization learn from high performing areas in the business. These could be teams that are regularly meeting the patching expectations or have low instances of needing to remediate penetration testing findings. Metrics and data about your control effectiveness enables you to provide assurance internally and externally that you’re meeting your control objectives.

This brings us to culture. Security as an enabler is something that we think is the most important concept to take away from this post. You must build capabilities that enable people in your organization to have the secure configuration or design choice be the easiest option. This is the role of security. You should also make sure that, when there are problems, your security team works with the business to help everyone learn the cause and improve for next time.

AWS has a culture that uses trouble ticketing for everything. If our employees think they have a security problem, we tell them to open a ticket; if they’re not sure whether they have a security problem, we tell them to open a ticket anyway to get guidance. This kind of culture encourages people to communicate and ask for help, which means we can identify and fix issues early. Issues that aren’t as severe as first thought can be downgraded quickly. This culture of ticketing gives us data to inform what we build, which helps people be more secure. You can get started with a system like this in your own environment, or look to extend the capability if you’ve already started.

Take our recommendation to turn on GuardDuty across all your accounts. We recommend that the resulting high and medium alerts are sent to a ticketing system. Look at how you resolve those issues and use that to prioritize the next two weeks of work. Now you can build automation to fix the issues and, more importantly, build to prevent the issues from happening in the first place. Ask yourself, “What information did I need to diagnose the problem?” Then, build automation to enrich the findings so your tickets have that context. Iterate on the automation to understand the context. For example, you may want to include information to show whether the environment is production or non-production.
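As a sketch of the first step, the snippet below routes GuardDuty findings of medium severity and above (severity 4 or higher) to a hypothetical ticket-creating Lambda function via an EventBridge rule; the rule name and function ARN are placeholders.

import boto3
import json

events = boto3.client("events")

# Match GuardDuty findings with severity >= 4 (medium and high).
events.put_rule(
    Name="guardduty-medium-and-high",
    EventPattern=json.dumps({
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        "detail": {"severity": [{"numeric": [">=", 4]}]},
    }),
)

# Send matching findings to the ticket-creating Lambda (placeholder ARN).
# The function also needs a resource-based permission allowing
# events.amazonaws.com to invoke it.
events.put_targets(
    Rule="guardduty-medium-and-high",
    Targets=[{
        "Id": "ticketing",
        "Arn": "arn:aws:lambda:<region>:<account-id>:function:create-ticket",
    }],
)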

Note that having production-like controls in non-production environments means that you reduce the chance of deployment failures. It also gets teams used to working within the security guardrails. This increases rigor earlier in the process and helps your change management team, too.

Summary

It doesn’t matter what security frameworks or standards you use to inform your business, and you might not even align with a particular industry standard. What does matter is building a governance model that empowers the people in your organization to consistently make good security decisions and provides the capability for your security team to enable this to happen. To get started or continue to evolve your governance model, follow the AWS Well-Architected security best practices. Then, make sure that the platform you implement helps you deliver the foundational security control objectives so that your business can spend more of its time on the business logic and security configuration that is specific to its workloads.

The technology and governance choices you make are the first step in building a positive security culture. Security is everyone’s job, and it’s key to make sure that your platform, automation, and metrics support making that job easy.

The areas of focus we’ve talked about in this post are what allow security to be an enabler for business and to ultimately help you better help your customers and earn their trust with everything you do.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Paul Hawkins

Paul helps customers of all sizes understand how to think about cloud security so they can build the technology and culture where security is a business enabler. He takes an optimistic approach to security and believes that getting the foundations right is the key to improving your security posture.

Author

Maddie Bacon

Maddie (she/her) is a technical writer for AWS Security with a passion for creating meaningful content. She previously worked as a security reporter and editor at TechTarget and has a BA in Mathematics. In her spare time, she enjoys reading, traveling, and all things Harry Potter.