One complex setup
Introduction
To inaugurate this blog, in this first post I will describe the complex setup I have chosen to run it. I will discuss the goals I set for myself, the tools I used, and the lessons I learned.
When explaining things, especially solutions, it always makes sense to start with the problem we're trying to solve. So, let's begin:
Problem
The requirements are roughly as follows:
- I wanted to use S3 to host the website.
- Entries are kept on git.
- Any time I publish a change, it should get deployed to the website (CD pipeline).
- I wanted to write entries in markdown.
- I didn't want to use any existing static site generator (henceforth SSG).
Solution
CloudFormation and CDK
While I have been hosting a bunch of sites on a $10 Linode instance for a while, I wanted to host the blog on AWS S3, mostly to learn more about how to use AWS. To achieve this, I had to use Infrastructure as Code (IaC), and the easiest way is the AWS Cloud Development Kit, or CDK.
CloudFormation is AWS' IaC offering, and it works more or less by writing a big YAML file and submitting it to the CloudFormation service. It takes this file as input, understands the dependencies, calculates the drift, and provisions some resources on your behalf.
CDK is much nicer because it allows you to generate this YAML file using code. The community has mostly settled on TypeScript, although more languages are supported (e.g., Python). I'm not super versed in TypeScript, but I'm not writing a complex configuration either, and as long as it is typed, I'm happy.
Getting started with CDK is as straightforward as:
npm install -g aws-cdk
cdk init app --language typescript
# Preview the synthesized CloudFormation template
cdk synth
# Create resources to allow CDK to deploy things
cdk bootstrap
# Deploy resources
cdk deploy
I'm lucky enough to have a subscription to acloudguru.com. They offer an AWS playground, which is an AWS account in which you can play around with the services. After trying my CDK code in this account, I published the final CDK code to my personal AWS account.
Other than reproducible environments, the cool thing about IaC is that it makes it easy for people to share their setups and learn from each other.
S3
I think hosting a static website on S3 is considered the 101 of "start doing something with AWS." So, I was expecting to find a lot of tutorials, but it turns out most of them are outdated: AWS changes over time and websites tend not to keep pace, whereas the documentation is usually the source of truth.
Anyway, to host a website on S3, you have two options:
- Use the REST API endpoint. This is used by smart clients that can talk to S3. Messages are exchanged over HTTP.
- Use the "S3 website hosting" feature. This is nice because it provides you with, for example, the possibility to set an error page.
I started with option 2; creating it with CDK is quite straightforward:
// Create an S3 bucket that will contain the blog website.
const blogBucket = new s3.Bucket(this, "blog.fponzi.me", {
    bucketName: props.bucketName,
    publicReadAccess: true,
    removalPolicy: cdk.RemovalPolicy.DESTROY,
    websiteIndexDocument: "index.html",
    versioned: false,
    blockPublicAccess: new s3.BlockPublicAccess({
        blockPublicAcls: false,
        ignorePublicAcls: false,
        blockPublicPolicy: false,
        restrictPublicBuckets: false,
    }),
});
Recently (around April 2023), S3 changed the default policy for new buckets: public access is now blocked by default, so you need to specify the BlockPublicAccess settings and set everything to false. In recent history, there have been a few instances where misconfigured public S3 buckets caused leaks, and S3 is likely trying to help prevent that.
To add a CNAME and point it to this bucket, the bucket needs to be named the same way as the final domain. So, I needed a bucket called "blog.fponzi.me". I created a simple index.html and error.html and even used CDK to upload them:
const deployment = new s3Deployment.BucketDeployment(this, "deployStaticWebsite", {
    sources: [s3Deployment.Source.asset("../website")],
    destinationBucket: blogBucket,
});
Now everything was fine, and I was almost done! Then, I discovered that:
Amazon S3 website endpoints do not support HTTPS or access points. If you want to use HTTPS, you can use Amazon CloudFront to serve a static website hosted on Amazon S3
CloudFront
At this point, I knew I had to change my plans. I was already planning to use CloudFront, but in this case I was forced to. Adding CloudFront in front of an S3 website is super easy; however, the website endpoint comes with its own downsides.
Anyone can create a CloudFront distribution, put it in front of my S3 bucket, and serve my files from their own domain. One way to keep using the S3 website endpoint while restricting access to my own distribution is the "sentinel" header trick: my distribution adds a secret header, and my S3 bucket denies any request that doesn't carry it.
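As a rough sketch of that trick in CDK (a hypothetical example, not what I ended up deploying: the secret value is a placeholder, and since S3 bucket policies can only match a handful of request headers, the classic variant uses the Referer header as the shared secret):

```typescript
// Hypothetical sketch of the "sentinel" header trick against the S3 website endpoint.
const secret = 'some-long-random-string'; // placeholder secret

// 1. CloudFront adds the secret header when forwarding requests to the website endpoint.
const distribution = new cloudfront.Distribution(this, 'blogDistribution', {
    defaultBehavior: {
        origin: new origins.HttpOrigin(blogBucket.bucketWebsiteDomainName, {
            customHeaders: { Referer: secret },
            // S3 website endpoints only speak HTTP.
            protocolPolicy: cloudfront.OriginProtocolPolicy.HTTP_ONLY,
        }),
    },
});

// 2. The bucket only allows reads that carry the secret.
blogBucket.addToResourcePolicy(new iam.PolicyStatement({
    actions: ['s3:GetObject'],
    resources: [blogBucket.arnForObjects('*')],
    principals: [new iam.AnyPrincipal()],
    conditions: { StringEquals: { 'aws:Referer': secret } },
}));
```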
In my case, I decided to disable the website endpoint, remove public access and restrict access to the bucket only to my CloudFront distribution. There are currently two solutions offered by CloudFront:
- Origin Access Identity (OAI): It is currently considered legacy. It consists of a special user that will request files on behalf of our site's users.
- Origin Access Control (OAC): It works similarly to OAI and supports more options. Reference
It looks like OAC is still not supported in CDK (https://github.com/aws/aws-cdk/issues/21771), so OAI it is. The nice thing about CDK is that I just need to update the code and redeploy to get the needed changes in place.
For this next part, I found this article very useful: https://idanlupinsky.com/blog/static-site-deployment-using-aws-cloudfront-and-the-cdk/
Basically, the idea is to:
- Create the S3 bucket and restrict access.
- Create an OAI and grant read access to the bucket.
- Create a certificate using AWS Certificate Manager (ACM).
- Create a CloudFront distribution in front of the bucket.
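The steps above can be sketched in CDK roughly like this (a sketch under my assumptions: construct ids are illustrative, and I'm using the aws-cloudfront, aws-cloudfront-origins, and aws-certificatemanager modules of aws-cdk-lib):

```typescript
// 1. Bucket with public access blocked (no website endpoint this time).
const bucket = new s3.Bucket(this, 'blogBucket', {
    blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
});

// 2. OAI with read access to the bucket.
const oai = new cloudfront.OriginAccessIdentity(this, 'blogOai');
bucket.grantRead(oai);

// 3. ACM certificate, validated by creating a DNS record for the domain.
const certificate = new acm.Certificate(this, 'blogCertificate', {
    domainName: 'blog.fponzi.me',
    validation: acm.CertificateValidation.fromDns(),
});

// 4. CloudFront distribution in front of the bucket.
const distribution = new cloudfront.Distribution(this, 'blogDistribution', {
    defaultBehavior: {
        origin: new origins.S3Origin(bucket, { originAccessIdentity: oai }),
        viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
    },
    domainNames: ['blog.fponzi.me'],
    certificate,
    defaultRootObject: 'index.html',
});
```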
Some may have noticed that I'm not using AWS Route53 for domain hosting. That's because my website and domain's hosted zone are on Linode. Additionally, Linode offers free DNS services, which is a nice perk.
This setup makes the process of creating an ACM certificate using CloudFormation a bit unusual. During deployment, the stack creation halts midway. To continue, you'll need to access the stack's page on the CloudFormation console and navigate to the events page. There, you'll be instructed to create a CNAME record on your DNS, pointing to a specific ACM domain. This step is necessary to verify domain ownership. Once ACM verifies the presence of the DNS entry, the CloudFormation stack will be able to progress again.
For CI/CD, I'm going to use GitHub Actions. I'm already used to them, and I'm going to host the projects on GitHub, so they're quick to put together. Therefore, I also needed a way to allow GitHub Actions to access my S3 bucket.
There is a nice GitHub action called configure-aws-credentials that also lists all the ways available to configure AWS credentials. Surprisingly, the easiest way is also the safest: creating an OpenIdConnectProvider for GitHub.
The GitHub official documentation has a very easy-to-follow step-by-step guide, which involves clicking around in AWS services.
I found and slightly modified a simple stack to accomplish this using CDK.
import * as cdk from 'aws-cdk-lib';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

/**
 * Used to access AWS resources from GitHub Actions.
 */
export interface GitHubStackProps extends cdk.StackProps {
    /**
     * Name of the deploy role to assume in GitHub Actions.
     *
     * @default - 'exampleGitHubDeployRole'
     */
    readonly deployRole: string;

    /**
     * The sub prefix string from the JWT token used to be validated by AWS. Appended after `repo:${owner}/${repo}:`
     * in an IAM role trust relationship. The default value '*' indicates all branches and all tags from this repo.
     *
     * Example:
     * repo:octo-org/octo-repo:ref:refs/heads/demo-branch - only allowed from `demo-branch`
     * repo:octo-org/octo-repo:ref:refs/tags/demo-tag - only allowed from `demo-tag`.
     * repo:octo-org/octo-repo:pull_request - only allowed from the `pull_request` event.
     * repo:octo-org/octo-repo:environment:Production - only allowed from the `Production` environment name.
     *
     * @default '*'
     * @see https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect#configuring-the-oidc-trust-with-the-cloud
     */
    readonly repositoryConfig: { owner: string; repo: string; filter?: string }[];
}

export class GithubOcidStack extends cdk.Stack {
    constructor(scope: Construct, id: string, props: GitHubStackProps) {
        super(scope, id, props);

        const githubDomain = 'token.actions.githubusercontent.com';
        const ghProvider = new iam.OpenIdConnectProvider(this, 'githubProvider', {
            url: `https://${githubDomain}`,
            clientIds: ['sts.amazonaws.com'],
        });

        const iamRepoDeployAccess = props.repositoryConfig.map(r =>
            `repo:${r.owner}/${r.repo}:${r.filter ?? '*'}`);

        // Grant only requests coming from a specific GitHub repository.
        const conditions: iam.Conditions = {
            StringLike: {
                [`${githubDomain}:sub`]: iamRepoDeployAccess,
            },
            StringEquals: {
                "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
            },
        };

        const githubRole = new iam.Role(this, 'githubDeployRole', {
            assumedBy: new iam.WebIdentityPrincipal(ghProvider.openIdConnectProviderArn, conditions),
            roleName: props.deployRole,
            description: 'This role is used via GitHub Actions to deploy with AWS CDK or Terraform on the target AWS account',
            maxSessionDuration: cdk.Duration.hours(1),
        });
    }
}
Then from cdk.ts:
const githubStack = new GithubOcidStack(app, 'GithubOcidStack', {
    deployRole: DEPLOY_ROLE,
    repositoryConfig: [
        { owner: 'FedericoPonzi', repo: 'blog' },
    ],
});
This will allow the "DEPLOY_ROLE" role to use AWS credentials from FedericoPonzi/blog's GitHub Actions. From GitHub, the action will run aws s3 sync, which works like the rsync program. I checked what permissions it needs to work and assigned them to the GitHub role:
const githubIamRole = iam.Role.fromRoleName(this, "githubRoleId", props?.deployRole);

// Policies required for `aws s3 sync`.
// "*Object*" permissions apply to arnForObjects, whereas "*Bucket*" permissions apply to bucketArn.
const objectsPolicy = new iam.PolicyStatement({
    actions: [
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:PutObject",
    ],
    resources: [blogBucket.arnForObjects('*')],
    principals: [githubIamRole],
    effect: iam.Effect.ALLOW,
});
blogBucket.addToResourcePolicy(objectsPolicy);

const bucketPolicy = new iam.PolicyStatement({
    actions: [
        "s3:GetBucketLocation",
        "s3:ListBucket",
    ],
    resources: [blogBucket.bucketArn],
    principals: [githubIamRole],
    effect: iam.Effect.ALLOW,
});
blogBucket.addToResourcePolicy(bucketPolicy);
...and this is basically it for the infrastructure! The downside of not using the S3 website endpoint is not having a proper error page (I dare you, try opening a random page and see what happens). I might revisit this in the future, probably by using the "sentinel" header trick.
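One way I might fix the error page later is CloudFront's custom error responses. A minimal sketch, under my assumptions: the distribution uses an S3 origin as above, and S3 answers 403 (not 404) for missing keys when public access is blocked:

```typescript
// Hypothetical sketch: serving error.html through CloudFront custom error responses.
const distribution = new cloudfront.Distribution(this, 'blogDistribution', {
    defaultBehavior: { origin: new origins.S3Origin(blogBucket) },
    errorResponses: [{
        httpStatus: 403,           // what S3 returns for missing keys on a locked-down bucket
        responseHttpStatus: 404,   // what the visitor should see
        responsePagePath: '/error.html',
        ttl: cdk.Duration.minutes(5),
    }],
});
```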
Genereto: The Simple Static Site Generator
Genereto is a very simple static site generator I decided to build in Rust, but version 0.0.1 could have been a sed script.
Each entry is divided into two parts:
- A metadata section written in YAML.
- The Markdown content.
The metadata looks like this:
title: 'One complex setup'
keywords: 'CDK, AWS, CloudFront, Rust'
publish_date: '2023-05-19'
read_time_minutes: 10
A Genereto project is also very straightforward. It consists of a content folder, a templates folder, and a generated output folder.
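The layout looks roughly like this (a sketch based on the folder names above; the exact structure is my assumption):

```
genereto-project/
├── content/      # markdown entries, each with its YAML metadata
├── templates/    # HTML templates
└── output/       # generated website, ready to sync to S3
```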
At build time, it iterates through all entries, takes the template as input, and fills it with each entry's content. It also supports some simple variables that get replaced:
$GENERETO["title"] = metadata.title
$GENERETO["publish_date"] = metadata.publish_date
and a few more, but you get the idea.
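As a toy illustration of this replacement step (a hypothetical sketch in TypeScript; Genereto itself is written in Rust, and `renderTemplate` and `Metadata` are names I made up):

```typescript
// Hypothetical sketch of Genereto-style variable replacement (not the actual implementation).
interface Metadata {
    title: string;
    publish_date: string;
}

function renderTemplate(template: string, meta: Metadata): string {
    // A replacer function avoids '$' in the values being treated as a back-reference.
    return template
        .replace(/\$GENERETO\["title"\]/g, () => meta.title)
        .replace(/\$GENERETO\["publish_date"\]/g, () => meta.publish_date);
}

const page = renderTemplate(
    '<h1>$GENERETO["title"]</h1><time>$GENERETO["publish_date"]</time>',
    { title: 'One complex setup', publish_date: '2023-05-19' }
);
// page === '<h1>One complex setup</h1><time>2023-05-19</time>'
```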
In case you're wondering, as of this writing, Genereto is not open source (edit: as of 2023-08-20, Genereto is now open source!). However, I plan to release it on GitHub once I have spent some time refactoring it and making it more user-friendly.
The next piece is the website's template.
🤖 ChatGPT for web design
For the template, I used ChatGPT to quickly bootstrap a simple template for the website. I started with a simple prompt and progressively increased my demands:
1. Write a simple HTML5 page with responsive support.
2. Can you add a basic sticky top navbar with the website title?
3. On the right side of the screen, can you add a link to "index.html" and "blog.html"?
4. Nice! Can you add some clean style to the links and the title?
5. Can you align the links on the left side, near the website title? Also, use "Federico Ponzi's Blog" as the website title.
6. This template is for a blog post. Can you also add the published date and the read time under the post title?
Etc. I used to style pages with Chrome's design inspector, so being able to use ChatGPT to get 90% of the template done has been nice. I know, I know, I didn't put great effort into crafting these prompts, but I really wanted to get it done quickly. What surprised me about ChatGPT are prompts like (3): taken out of context, it could mean almost anything, but given the weight of the previous requests, it understood I was asking about the right side of the navbar.
After putting a super simple template together, I created a simple demo page with all the possible markup elements.
GitHub Actions
The last bit is setting up GitHub Actions. Genereto has a basic Continuous Integration action that builds and runs tests for the project on every commit. The blog is on a different repository altogether and has an action to pull Genereto's repo, compile it and use it to build the HTML pages and push them to S3.
The Continuous Deployment looks like this:
name: Upload Website
on:
  push:
    branches:
      - main
permissions:
  id-token: write # This is required for requesting the JWT
  contents: read  # This is required for actions/checkout
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Checkout Genereto, a super simple static site generator
        uses: actions/checkout@v3
        with:
          repository: FedericoPonzi/Genereto
          token: ${{ secrets.GH_PAT }} # it's in a private repo
          path: genereto
      - name: Install rust toolchain
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
      - name: Run genereto to generate the website
        uses: actions-rs/cargo@v1
        with:
          command: run
          args: --manifest-path genereto/Cargo.toml -- --project-path /home/runner/work/blog/blog/genereto-project
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          # arn:aws:iam::123456789100:role/github-blog-upload-s3-role
          role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
          aws-region: us-east-1
      - name: Deploy static site to S3 bucket
        run: aws s3 sync ./genereto-project/output s3://blog.fponzi.me --delete
Eventually, it would be nice to use a release instead of building Genereto on every commit. Also, I'm not doing any invalidation on CloudFront, so updated files won't be served until the cached copies expire. I'm planning to invalidate manually as needed; the first 1000 invalidation paths per month are free, so I might consider just adding a new step.
How much did it cost to set up this infrastructure?
This setup cost me 0¢. Both AWS and GitHub provide a certain amount of free compute resources every month. I believe their intention is to attract developers with their offerings, and I must say it worked! However, if the demand for my pages increases significantly, I may end up paying for additional traffic. Additionally, frequent changes to CloudFormation stacks may eventually incur some costs.
Regarding GitHub Actions, the most time-consuming part is building and running Genereto. If it becomes too slow, I might consider creating and pulling releases instead.
Next steps
While there are a few things I feel are missing, I plan to add them gradually as I progress. For now, I wanted to have something functional and enhance my knowledge about AWS and CDK, and I believe I have achieved that goal.
Some of the future improvements I have in mind include:
- Enhancing support for handling assets (such as template-based minifying) and components in Genereto.
- Supporting more variables in the templates.
- Implementing error pages, which could potentially be done using CloudFront (or by reconsidering the website endpoint!).
- Setting up a pipeline for deploying CloudFormation changes, although this may not be implemented in the near future.
Conclusions
At this point, I consider the current state of the setup to be satisfactory. As the reader has hopefully discovered, the final setup is not as complex as I initially claimed it to be. However, it is certainly more complex than simply rsyncing an HTML file to a VPS with Apache installed on it.
I plan to write new articles whenever I have ideas or projects I want to share. I'll be sharing these articles on Twitter, so make sure to follow me if you don't want to miss any updates.
I would love to hear your questions, comments, and feedback. Please don't hesitate to reach out to me at me+blog@fponzi.me.
Until the next one!
I would like to express my gratitude to Federico Scozzafava and Marco Biagi for their early review and valuable feedback!