Saturday, April 22, 2023

Threat Modeling

In a highly secure organization, threat modeling must be an integral part of a Secure Release Process (SRP) for cloud services and software. It is a security practice used to evaluate potential threats to a system or application. Organizations can adopt threat modeling to get ahead of vulnerabilities before they can be exploited.

The practice involves 6 steps:

1. Identify assets: What needs to be protected? This may include data, hardware, software, networks, or any other resources that are critical to the organization.

2. Create a data flow diagram: How does data flow through a system or application? What components talk to each other and how? A data flow diagram shows component interactions, ports, and protocols. 

3. Identify potential threats: What threats to the system exist? External threats include hackers and malware, while internal threats may include authorized users and human error. What harm can they cause to assets? 

4. Assess risk: Evaluate the likelihood and impact of each threat. How serious is the threat and how likely is it to occur? What would the impact be to the organization? (A simple risk register sketch follows this list.)

5. Prioritize threats: After a risk evaluation, prioritize threats by severity. This helps organizations to focus attention on addressing the most severe threats first.

6. Mitigate threats: The final step in threat modeling is to develop and implement measures to mitigate identified threats. This could include adding security controls, such as firewalls, intrusion detection/prevention systems, or a SIEM. Employee training on security best practices also goes a long way to mitigate threats. Regular tests and updates to security measures are other ways to mitigate security risks.
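
For steps 4 and 5, one lightweight way to record the assessment is a simple risk register kept next to the threat model. A minimal sketch in YAML, with hypothetical threats and scores on a 1-5 scale:

  # risk-register.yml - hypothetical entries for illustration only
  # severity = likelihood x impact; mitigate in descending order of severity
  threats:
    - name: Phishing of an internal user
      likelihood: 4      # 1 (rare) to 5 (almost certain)
      impact: 3          # 1 (negligible) to 5 (critical)
      severity: 12
    - name: SQL injection against a public API
      likelihood: 2
      impact: 5
      severity: 10
    - name: Lost or stolen laptop
      likelihood: 3
      impact: 2
      severity: 6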

A review of a threat model should happen at least once a year as part of an SRP to catch new threats and to assess architecture changes to the system or application as it evolves. By identifying potential threats proactively, organizations can significantly reduce the risk of a cybersecurity attack.

Sample Threat Model

In this sample threat model, the focus is a Privileged Access Management (PAM) solution deployed deep inside the corporate network in a Blue Zone. To identify the assets needing protection, the diagram expands outward to include all inbound connections into the PAM solution and all outbound connections from it.

(Diagram: Sample Threat Model)

Network Zones

The sample diagram can be broken down by network zone: Blue, Yellow, and Red. 

Blue - Highly restricted. Contains mission-critical data and systems. Applications here can talk to each other, but they shouldn't reach out to any of the other zones; if they do, the traffic should be carefully monitored.

Yellow - Like a DMZ (Demilitarized Zone), this zone hosts a services layer of APIs and user interfaces that are exposed to authenticated/authorized users. It also hosts a SIEM that ingests logs from external sources.

Red - This zone is uncontrolled. It is completely untrusted because of the limited controls that can be put in place there, and as such it is viewed as a major security risk. Sensitive assets inside the organization must be isolated as much as possible from this zone. It could be a customer's network or the big bad internet.

Assets

The A1-A18 labels identify and classify the assets that need to be protected. In this model, the assets to protect include logs, alert data, credentials, backups, device health metrics, key stores, and SIEM data. And since the focus of this threat model is the PAM solution itself, the primary asset is the elevated customer device credentials, A4.

Threats

The red labels T1-T5 represent threats to the applications and data inside each zone. In this model, threats do not include external unauthenticated users, because the system is locked down to prevent that type of access. It does, however, include both internal authenticated and internal unauthenticated users as threats, simply because human error could lead to a security incident. SQL injection is also identified as a threat to the databases.

Controls

To mitigate threats, controls and safeguards are put in place. In this model, a VPN sits in front of the internal network, and VPN access is required to get in. All traffic is routed through an encrypted VPN tunnel; in the diagram, this is the dotted line underneath the Red Zone, labeled C1. Other controls include firewalls that allow traffic only through specific ports and protocols, as well as encryption of data in transit via SSL/TLS.

An intrusion detection and/or intrusion prevention (IDS/IPS) tool is in place, labeled C7, to capture and analyze traffic. Anything suspicious generates an alert for security personnel to act upon. An IDS can be used to detect a wide range of attacks, including network scans, port scans, denial-of-service (DoS) attacks, malware infections, and unauthorized access attempts. Other controls throughout the diagram are placed to protect assets from the identified threats.
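
The zones, assets, threats, and controls from a diagram like this can also be captured in a machine-readable register that travels with the source code and gets reviewed alongside it. A minimal YAML sketch using a few labels from this model (the file name, structure, and any label mappings not stated above are assumptions):

  # threat-model.yml - illustrative register for the sample model
  zones: [blue, yellow, red]
  assets:
    A4:
      name: Elevated customer device credentials   # primary asset of the PAM solution
      zone: blue
      classification: sensitive
  threats:
    sql-injection:              # one of the T1-T5 labels in the diagram
      targets: [databases]
    internal-user-error:        # authenticated or unauthenticated internal users
      targets: [A4]
  controls:
    C1:
      name: VPN with encrypted tunnel at the Red Zone boundary
    C7:
      name: IDS/IPS
      detects: [network-scans, port-scans, dos, malware, unauthorized-access]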

Data Classification

The sample threat model does not have an extensive data classification scheme, as it only identifies sensitive data. But other models could provide a more granular data classification scheme to better explain what kind of data is stored where and how to protect it. 

For example, HIPAA (Health Insurance Portability and Accountability Act) protects the privacy and security of individuals' medical records and protected health information (PHI). The law applies to health care providers, health plans, and health care clearinghouses, as well as their business associates, who handle PHI. HIPAA requires covered entities to implement administrative, physical, and technical safeguards to protect the confidentiality, integrity, and availability of PHI. Non-compliance with HIPAA regulations can result in significant fines and legal penalties.

HIPAA, PCI-DSS, and GDPR mandate that organizations implement security measures to protect sensitive data. Threat modeling as a security practice helps organizations to comply with regulatory requirements. 

Once it's created, a threat model diagram can be reviewed by the organization's security team and kept current with architectural changes as the system evolves over time. The threat model provides a basis for a re-assessment of the threats and controls in place to protect assets. 

By investing in threat modeling, an organization can improve its security posture and reduce the risk of cyber-attacks.

5-Service Cloud Architecture Model

A primary goal of cloud architecture is to provide a cloud computing environment that supports a flexible, scalable, and reliable platform for the delivery of cloud services. 

In terms of layers, cloud architecture may include infrastructure, platform, and software layers, often referred to as IaaS, PaaS, and SaaS. Infrastructure comprises the physical servers, storage devices, and networking required to support cloud services. Platform refers to the software frameworks and tools used to develop and deploy cloud applications. And software covers the applications and services, built on top of the infrastructure and platform layers, that are provided to end-users.

If we really boil it down to its essence, it's possible to define a cloud architecture composed of 5 core elements:

1. Load Balancer
2. Microservices
3. System of Record 
4. System of Engagement
5. Messaging

1. Load Balancer

To distribute incoming network traffic evenly and prevent overloading of any single resource, a Load Balancer provides high availability (HA) through failover and request distribution using an algorithm like round-robin. Rules are added here to send traffic to the nodes of a cluster uniformly, or based on their ability to respond.

2. Microservices

Microservices allow APIs to be deployed fast and often. As a core platform layer, microservices provide a way for APIs to be combined when there's a need to correlate data from different sources. They isolate the lower layers of the platform from end-user applications, support long-term growth in development team size, and allow the number of applications and services to grow independently of one another.


3. System of Record

A System of Record (SOR) is the primary source of data used as the source of truth for a particular business process or application. A database, a file system, or any other software system that stores and manages data can serve as a system of record. It provides a unified view of critical business data that is accurate and up-to-date. A source of truth needs a disaster recovery plan as well as backups.

4. System of Engagement

All user interaction with the applications and services delivered by the platform happens through a System of Engagement (SOE), where users log in, search, and interact with the cloud services provided. To better engage with customers and stakeholders, the SOE provides them with personalized, interactive experiences tailored to their needs and preferences.

5. Messaging

Finally, a messaging system is needed for service-to-service communication and coordination between applications. A messaging system adds the facilities to create, send, receive, and read messages.

These elements could be selected from a cloud catalog, or installed into a cloud provider such as AWS, Azure, or GCP. For example, the following open-source software could be deployed into a cloud platform, as an implementation of the 5-service cloud architecture:

1. HAProxy
2. Kubernetes
3. MySQL
4. Elasticsearch (ELK)
5. Kafka

There's no limit to the number of cloud services that can be designed, implemented, deployed, operationalized, and exposed to users as cloud-native applications, using any 2 or more of these platform services. 
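
As a small, concrete illustration of how one of these services might land on the Kubernetes layer, here is a minimal manifest that runs MySQL as the System of Record (the names, plaintext password, and lack of persistent storage are for illustration only; a real deployment would use a Secret, a volume, and likely an operator or Helm chart):

  # mysql.yml - illustrative only; apply with: kubectl apply -f mysql.yml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: system-of-record
  spec:
    replicas: 1
    selector:
      matchLabels: { app: mysql }
    template:
      metadata:
        labels: { app: mysql }
      spec:
        containers:
          - name: mysql
            image: mysql:8.0
            env:
              - name: MYSQL_ROOT_PASSWORD
                value: example-only        # use a Kubernetes Secret in practice
            ports:
              - containerPort: 3306
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: mysql
  spec:
    selector: { app: mysql }
    ports:
      - port: 3306
        targetPort: 3306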

In defining a cloud architecture model, architects lay a foundation for future product development. Through a 5-service cloud architecture, we set up product teams, engineering teams, and infrastructure teams to design, build, and deliver an unlimited range of applications and services to users and businesses, built on top of a system of sub-systems.

Disaster Recovery Plan

A disaster recovery (DR) plan provides a step-by-step procedure for unplanned incidents such as power outages, natural disasters, cyber attacks, and other disruptive events. The DR plan is intended to minimize the impact of a disaster on a primary data center by defining a way for the system to continue to operate. A plan includes a procedure to quickly return to an operational state in a production environment.

A disruption to the operational state of the system in production can lead to lost revenue, financial penalties, brand damage, and/or dissatisfied customers. If the recovery time is long, then the adverse business impact of a disaster is greater. A good disaster recovery plan is intended to recover rapidly from a disruption, regardless of the cause.

This DR plan defines 4 basic elements:

  1. Response - A step-by-step procedure to perform in the event of a disaster that severely impacts the primary data center hosting the system, in order to fail over to a secondary site.
  2. Secondary Site (Backup) - A secondary, backup instance of the system (DR site) in support of business continuity in the event of a disaster.
  3. Data Replication - The data replication mechanism that keeps a secondary site in sync with a primary.
  4. Recovery - An approach to reconstitute the primary data center hosts after an assessment of the damage.

Disaster Recovery defines two primary objectives, Recovery Point Objective (RPO) and Recovery Time Objective (RTO):


Recovery Point Objective (RPO) - The maximum targeted period of time in which data or transactions might be lost from an IT service due to a major incident. For example, the time period elapsed during a data replication interval. 

Recovery Time Objective (RTO) - The targeted duration of time and a service level within which a business process must be restored after a disaster or disruption in order to avoid unacceptable consequences associated with a break in business continuity. For example, 24 hours to restore 95% of the service. 
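
One way to keep these objectives testable is to record them, along with the four plan elements above, as explicit targets that DR drills are measured against. The values and file paths below are hypothetical:

  # dr-plan.yml - hypothetical targets for illustration
  objectives:
    rpo: 15m                  # maximum data loss, e.g. one replication interval
    rto: 24h                  # restore 95% of the service within 24 hours
  elements:
    response: runbooks/failover-to-secondary.md
    secondary_site: dr-site, read-only until promoted
    data_replication: asynchronous, 15-minute interval
    recovery: runbooks/reconstitute-primary.md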

Recovery

After damage to the primary site is assessed, a procedure to reconstitute the site to an operational state can be followed. The procedure is expected to be completed within 24 hours of a disaster. During this recovery period, the DR site is expected to provide business continuity, in some cases in a read-only mode where user operations can be queued up but not yet committed.

Wednesday, April 19, 2023

GitHub Actions + AWS CodeDeploy

Let's sketch an architecture diagram of a solution, and describe a CI/CD pipeline, including build, test, pre-deployment and post-deployment actions, and tools that could be used to deploy this application to AWS. 

Solution

One approach is to add GitHub Actions to a blog-starter repository that contains the Node.js application source code, to define a CI/CD pipeline. We could re-deploy the blog-starter application onto an AWS EC2 Linux instance when source code changes are pushed to the GitHub repository. In this approach, AWS CodeDeploy services are integrated with GitHub and leveraged for this purpose. 


High Level Flow

  • Developer pushes a commit to a branch in the blog-starter GitHub repo. 
  • The push triggers GitHub Actions that run AWS CodeDeploy. 
  • The AWS CodeDeploy commands deploy the new commit to the EC2 instance that hosts the Node.js app. 
  • Hook scripts are invoked to run pre-installation, post-installation, and application start tasks. 

Architecture Sketch


Pipeline Stages

A. Test

Code quality tests can be implemented as GitHub pre-merge checks that run against the application source code. A GitHub pull request catches when a specific line in a commit causes a check to fail, and displays a failure, warning, or notice next to the relevant code in the Files changed tab of the pull request.

The idea here is to prevent a merge to the master branch until all code quality issues have been resolved. 
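
For example, a minimal pre-merge check workflow might look like the following (the Node version and npm scripts are assumptions about the blog-starter project):

  # .github/workflows/checks.yml
  name: Code quality checks
  on:
    pull_request:
      branches: [ master ]
  jobs:
    test:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v3
        - uses: actions/setup-node@v3
          with:
            node-version: 18
        - run: npm ci
        - run: npm run lint    # assumes a lint script exists in package.json
        - run: npm test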

B. Pre-Deployment

Any dependencies that need to be installed on the Linux EC2 instance can be installed by a hook script defined in the CodeDeploy AppSpec file's hooks section (for example, under the BeforeInstall hook).

The CodeDeploy AppSpec file is placed at the root of the blog-starter repository where the AWS CodeDeploy agent can read it, for example at blog-starter/appspec.yml.

C. Build

The blog-starter Node.js application is built by running npm. This step is accomplished by another hook script, defined in the CodeDeploy AppSpec file's hooks section under ApplicationStart.

D. Post-Deployment

Tasks that run after the application is installed, such as changing permissions on directories or log files, can also be defined in a hook script in the CodeDeploy AppSpec file under AfterInstall. 
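
Putting sections B, C, and D together, a minimal appspec.yml for this setup might look like the following (the destination path and script names are assumptions):

  # blog-starter/appspec.yml
  version: 0.0
  os: linux
  files:
    - source: /
      destination: /home/ec2-user/blog-starter
  hooks:
    BeforeInstall:              # pre-deployment: install dependencies
      - location: scripts/install_dependencies.sh
        timeout: 300
        runas: ec2-user
    AfterInstall:               # post-deployment: permissions on directories/logs
      - location: scripts/after_install.sh
        timeout: 300
        runas: ec2-user
    ApplicationStart:           # build with npm and start the Node.js app
      - location: scripts/start_app.sh
        timeout: 300
        runas: ec2-user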

AWS EC2 Instance

To host the application in AWS, an EC2 Linux instance can be defined and launched. Initial installation of Node.js and npm, as well as the app itself (by cloning the GitHub repository), can be done manually from the EC2 command line to get everything up and running in the cloud.

AWS CodeDeploy Agent

The installer for the CodeDeploy agent can be downloaded onto the EC2 Linux instance and run from the command line, and then the agent can be started as a service.

AWS CodeDeploy

Additional configuration is needed, for example, to create an AWS IAM role and user that are authorized to run deployment commands through the CodeDeploy agent.

GitHub Actions

A deploy.yml file can be added under .github/workflows that defines the CI/CD pipeline steps, or what to do after a push. For example: 1) check out a branch and then 2) run a CodeDeploy deployment command.
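
A minimal deploy.yml along those lines might look like this (the application name, deployment group, region, and secret names are assumptions that must match the CodeDeploy and IAM setup above):

  # .github/workflows/deploy.yml
  name: Deploy to EC2
  on:
    push:
      branches: [ master ]
  jobs:
    deploy:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v3
        - uses: aws-actions/configure-aws-credentials@v2
          with:
            aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
            aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
            aws-region: us-east-1
        - name: Create CodeDeploy deployment
          run: |
            aws deploy create-deployment \
              --application-name blog-starter \
              --deployment-group-name blog-starter-group \
              --github-location repository=${{ github.repository }},commitId=${{ github.sha }}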

Infrastructure as Code (IaC)

IaC or Infrastructure as Code refers to the use of machine-readable definition files to manage and provision IT infrastructure, instead of manual configuration. The infrastructure is treated like software code that can be versioned, tested, and automated. This allows for faster and more reliable deployment of infrastructure and easier management and scaling of resources. The infrastructure is defined in code using programming languages or specialized tools. This code can be executed to create, modify, or delete infrastructure resources in a repeatable and consistent manner, thus reducing the risk of human error and increasing the efficiency of IT operations, resulting in overall system reliability.

There are many IaC frameworks available, including popular ones like Terraform, Ansible, Puppet, Chef, CloudFormation, and SaltStack. The choice of framework depends on factors such as the size and complexity of the infrastructure, the specific needs of the organization, and the skills and expertise of the IT team. For example, Terraform is known for its ability to provision infrastructure across multiple cloud providers and on-premises environments, while Ansible is popular for its simplicity and ease of use. Puppet and Chef focus on configuration management and enforcing consistency across infrastructure resources, while CloudFormation is specific to Amazon Web Services (AWS) environments. SaltStack offers an event-driven automation approach that can help with high-scale and complex infrastructures. Ultimately, the best IaC framework is the one that meets the needs of the organization and aligns with its IT strategy and goals.
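
As a small illustration of the IaC idea, an Ansible playbook describes the desired state of a group of hosts in YAML and can be applied repeatedly with the same result. The host group and package below are hypothetical:

  # webservers.yml - run with: ansible-playbook -i inventory webservers.yml
  - name: Configure web servers
    hosts: webservers
    become: true
    tasks:
      - name: Install nginx
        ansible.builtin.package:
          name: nginx
          state: present
      - name: Ensure nginx is running and enabled at boot
        ansible.builtin.service:
          name: nginx
          state: started
          enabled: true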