Using a Metadata Proxy to Limit AWS/IAM Access with GitLab CI
The `gitlab-runner` agent is very flexible, with multiple executors to handle most situations. Similarly, AWS IAM allows one to use “instance profiles” with EC2 instances, obviating the need for static, long-lived credentials. In the situation where one is running `gitlab-runner` on an EC2 instance, this presents us with a couple of interesting challenges – and opportunities.
- How does one prevent CI jobs from being able to obtain credentials against the instance’s profile role?
- How does one allow certain CI jobs to assume credentials through the metadata service without allowing all CI jobs to assume those credentials?
Criteria
For our purposes, we want:
- The `gitlab-runner` agent to run on an EC2 instance, with one or more runners configured.[^1]
- All configured runners to use the Docker executor.
- Jobs to run, by default, without access to the EC2 instance’s profile credentials.
- Certain jobs to assume a specific role transparently through the EC2 metadata service by virtue of which runner picks them up.
- Reasonable security:
  - Jobs can’t just specify an arbitrary role to assume
  - No hardcoded, static, or long-lived credentials; only short-term, transient credentials
It’s worth emphasizing this: no hardcoded, static, or long-lived credentials. Sure, it’s easy to generate an IAM user and plunk its keys in (hopefully) protected environment variables, but then you have to worry about key rotation, audits, etc., in a way one doesn’t with transient credentials.
Executor implies methodology
For our purposes, we’re going to solve this using the agent’s `docker` executor. Other executors will call for different solutions (e.g. `kubernetes` has tools like kiam).
However, for fun let’s cheat a bit and do a quick-and-fuzzy run-through of a couple of the other executors.
docker+machine

This is largely like the plain `docker` executor, except that as EC2 instances will be spun up to handle jobs, you can take a detour around anything complex by simply telling the agent to associate specific instance profiles with those new instances, e.g.:
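A sketch of what that might look like in `config.toml`, assuming the `amazonec2` docker-machine driver (the runner name, URL, token, region, and profile name below are placeholders):

```toml
concurrent = 4

[[runners]]
  name = "autoscaling-runner"
  url = "https://gitlab.example.com/"
  token = "RUNNER_TOKEN"
  executor = "docker+machine"
  [runners.docker]
    image = "alpine:latest"
  [runners.machine]
    MachineDriver = "amazonec2"
    MachineName = "ci-%s"
    MachineOptions = [
      "amazonec2-region=us-east-1",
      # Instances spun up to handle jobs get this instance profile attached,
      # so jobs running on them see that profile's role.
      "amazonec2-iam-instance-profile=ci-job-instance-profile",
    ]
```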
The instance running the `gitlab-runner` agent does not need to be associated with the same profile – but the agent does need `ec2:AssociateIamInstanceProfile` and `iam:PassRole` permissions on the relevant resources.
The downside is that you’ll have to have multiple runners configured if you want to be able to allow different jobs to assume different roles.
kubernetes

The `kubernetes` executor is going to be a bit trickier, and, as ever, TMTOWTDI[^tmtowtdi]. Depending on what you’re doing, any of the following might work for you:
- Launch nodes with the different profiles and use constraints to pick and choose which job pods end up running on them (a sketch follows this list).
- Use a solution like kiam.
- …
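For the first option, a minimal sketch of the scheduling constraint in the runner’s `config.toml`, assuming the nodes carry a hypothetical `ci-iam-profile` label:

```toml
[[runners]]
  name = "k8s-deploy-runner"
  url = "https://gitlab.example.com/"
  token = "RUNNER_TOKEN"
  executor = "kubernetes"
  [runners.kubernetes]
    image = "alpine:latest"
    # Only schedule this runner's job pods onto nodes labeled with the
    # profile we want those jobs to inherit.
    [runners.kubernetes.node_selector]
      "ci-iam-profile" = "deploy"
```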
Brute force
Ever a popular option, you can just brute-force block container (job) access to the EC2 metadata service by firewalling it off, e.g.:
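A sketch, assuming default docker bridge networking (the `DOCKER-USER` chain is evaluated for traffic forwarded to and from containers; adjust the interface pattern if your bridges are named differently):

```sh
# Drop forwarded traffic from any docker-managed bridge (docker0, docker1, ...)
# destined for the EC2 instance metadata service.
iptables -I DOCKER-USER -i docker+ -d 169.254.169.254/32 -j DROP
```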
If you just want to block all access from jobs, this is a good way to do it.
This approach is contraindicated if you want to be able to allow some containers to access the metadata service, or to allow them to retrieve credentials of some (semi) arbitrary role.
EC2 metadata proxy
A more flexible solution can be found by using a metadata proxy. This sort of service should be a benevolent man-in-the-middle: able to access the actual EC2 metadata service for its own credentials, able to inspect containers making requests to determine what role (if any) they should be assuming, and able to assume those roles and pass tokens back to jobs without those jobs being any the wiser about it.
For our purposes, we will use `go-metadataproxy`[^2], which will handle:
- EC2 metadata requests made by processes in containers (e.g. CI jobs);
- Sourcing its own credentials from the actual EC2 metadata service;
- Inspecting containers for the IAM role that should be assumed (via the `IAM_ROLE` environment variable);
- Blocking direct access to the EC2 metadata service; and
- Assuming the correct role and providing STS tokens transparently to the contained process.
The authentication flow will look something like this: a process in a job container asks what it believes is the EC2 metadata service for credentials; the proxy intercepts the request, inspects the container to determine what role (if any) it should receive, assumes that role via STS using the instance profile’s credentials, and returns the resulting temporary credentials to the process.
This also means that the instance profile role must be able to assume the individual roles we want to allow jobs to assume, and the trust policy of the individual roles must allow the instance profile role to assume them.
In short:
- The instance profile’s IAM role policy should only permit certain roles to be assumed, either by ARN or some sensible condition (tagged in a certain way, etc).
- Roles in the account, in general, should not blindly trust any principal in the account to assume them.[^3]
Configuring the CI agent correctly
We’re not going to cover it here, but take care when registering the runner. Under this approach, judiciously restricting access to the runner is a critical part of controlling what jobs may run with elevated IAM authority.
Keep a couple things in mind:
- Registering runners is cheap; better to have more runners for more granular security than to let projects / pipelines with no need for access use them.
- Runners can be registered at the project, group, or (unless you’re on gitlab.com) the instance level; register them as precisely as your requirements allow.
- Runner access can be further restricted and combined with project/group access by allowing them to run against protected refs only, and then restricting who can push/merge to protected branches (including protected tags) to trusted individuals.
Anything that allows a pipeline author to control what role the proxy assumes is a security… concern. In this context, `IAM_ROLE` can be set on the container in one of several ways (in order of precedence):
- Through the runner configuration;
- By the pipeline author; or
- By the creator of the image.
Unless you intend to allow the pipeline author to specify the role to assume, it is recommended that `IAM_ROLE` always be set in the runner configuration file, `config.toml`. If you don’t want any role to be assumed, great, set the variable to a blank value.
`go-metadataproxy` discovers the role to assume by interrogating the docker daemon, inspecting the container of the process seeking credentials from the EC2 metadata service. It does this by looking for the value of the `IAM_ROLE` environment variable set on the container.
`IAM_ROLE` must be set on the container itself. While whitelisting the allowed images isn’t a terrible idea, the safest and most reliable way of controlling this as the administrator of the runner is to simply set the environment variable as part of the runner configuration.
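A sketch of a `docker`-executor runner pinned to a single role (the name, URL, token, and role value are placeholders; whether `IAM_ROLE` takes a bare role name or a full ARN depends on how the proxy is configured, so check the go-metadataproxy documentation):

```toml
[[runners]]
  name = "deploy-runner"
  url = "https://gitlab.example.com/"
  token = "RUNNER_TOKEN"
  executor = "docker"
  # Set on every job container this runner starts; the proxy reads this
  # variable off the container to decide which role to assume.
  # To pin a runner to *no* role, set "IAM_ROLE=" (blank) instead.
  environment = ["IAM_ROLE=ci-deploy-role"]
  [runners.docker]
    image = "alpine:latest"
```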
This also means that we’re going to want a runner configuration per IAM role. (Not terribly surprising, I would hope.)
Running the metadata proxy
This is reasonably straightforward, in two parts: running the proxy itself (below), and redirecting job traffic to it (the next section). There are a number of ways to run the proxy, but as we’re doing this in a docker environment anyway, why not let docker handle all the messy bits for us?
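A sketch, assuming the project publishes a container image (the image name and tag are assumptions; check the go-metadataproxy README for how it is actually distributed and configured):

```sh
# Run the proxy as a container, listening on port 8000 and with read-only
# access to the docker socket so it can inspect the containers calling it.
docker run -d \
  --name go-metadataproxy \
  --restart unless-stopped \
  -p 8000:8000 \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  jippi/go-metadataproxy:latest
```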
Using the metadata proxy
To use the proxy, the containers must be able to reach it in the same way they would reach the actual EC2 metadata endpoint. We need to prevent requests to the metadata endpoint from reaching the actual endpoint, and instead have them transparently redirected to the proxy. (That is, we’re going to play Faythe[^4] here.)
To “hijack” container requests to the EC2 metadata service, a little `iptables` magic is in order. This is well described in the project’s README. I’m including it here as well for completeness’ sake, and with one small change: instead of redirecting connections off of `docker0`, we redirect any off of `docker+`. (If you’re using the runner’s network-per-build functionality, you may need to tweak this.)
As we’re exposing the metadataproxy on port 8000, you’ll want to make sure that port is firewalled off from the outside, either via `iptables` or a security group.
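A sketch adapted from the metadataproxy READMEs (verify against the go-metadataproxy documentation; the redirect target assumes the proxy is reachable at the host’s usual `docker0` gateway address, `172.17.0.1`):

```sh
# Redirect container traffic bound for the EC2 metadata service
# (169.254.169.254:80) arriving from any docker bridge (docker+) to the
# proxy listening on port 8000.
iptables -t nat -I PREROUTING \
  -i docker+ \
  -d 169.254.169.254 -p tcp --dport 80 \
  -j DNAT --to-destination 172.17.0.1:8000

# And, per the note above, keep the outside world away from the proxy port
# (eth0 here stands in for the instance's public-facing interface; a
# security group rule works just as well).
iptables -I INPUT -i eth0 -p tcp --dport 8000 -j DROP
```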
IAM role requirements
EC2 Instance Profile
The role belonging to the instance profile associated with the instance our agent lives on should be able to assume the roles we want to allow CI jobs to assume. Specifically, its IAM policy must permit `iam:GetRole` and `sts:AssumeRole` on these roles.
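For example, a sketch of such a policy (the account ID and role names are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAssumingSpecificCiRoles",
      "Effect": "Allow",
      "Action": ["iam:GetRole", "sts:AssumeRole"],
      "Resource": [
        "arn:aws:iam::123456789012:role/ci-deploy-role",
        "arn:aws:iam::123456789012:role/ci-readonly-role"
      ]
    }
  ]
}
```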
If you’re using S3 for shared runner caches, you may wish to permit this access through the instance profile role as well. (Implemented properly, the proxy will not permit CI jobs to use this role directly.)
Container / Job IAM roles for assumption
As before, only containers with `IAM_ROLE` set at the container level will have tokens returned to them by the metadata proxy[^5], and then only if the proxy can successfully assume the role and convince STS to issue tokens for it. For this to happen, the container/job role’s trust policy must allow the role of the instance profile associated with the EC2 instance to assume it; specifically, the trust policy must permit that role to call `sts:AssumeRole`.
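A sketch of such a trust policy (the account ID and the instance profile role’s name are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TrustTheRunnerInstanceProfileRole",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/gitlab-runner-instance-role"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```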
Profit!
Alright! You should now have a good idea as to how to create and run CI jobs that:
- CANNOT request tokens directly from the EC2 metadata service
- CANNOT implicitly assume the EC2 instance profile’s role
- CANNOT leak static or long-lived credentials
- CAN transparently assume certain specific roles
Enjoy :)
[^1]: The nomenclature gets a bit tricky here. `gitlab-runner`: the agent responsible for running one or more runner configurations. A “runner”: a single runner configuration being handled by the `gitlab-runner` agent; also, an entity that can run CI jobs, from the perspective of the CI server (e.g. gitlab.com proper).

[^2]: Lyft also has an excellent tool at https://github.com/lyft/metadataproxy. I’ve used it with success, but `go-metadataproxy` provides at least rudimentary metrics for scraping.

[^3]: Not that anyone would ever create a trust policy like that, or that it would be one of the defaults offered by the AWS web console. Nope. That would never happen.

[^5]: Unless, of course, the metadata proxy is configured with a default role – but we’re not going to do that here.