Alpaca - How Workday Handles SSL/TLS Connections

Jessica Rangel
Senior DevOps Engineer
Automatic Summary

Alpaca: Enhancing Security with SSL and TLS Connections

Hi, I'm Jessica Rangel, a senior DevOps engineer at Workday on the security product teams. Today I'm going to talk about Alpaca and how Workday handles TLS and SSL connections.

Workday is an American company that provides cloud-based systems for finance, HR, and planning, and Gartner recognizes it as a leader in both HR systems and ERP. Security is one of Workday's top priorities: its 90,500 customers, including companies such as Netflix, Google, and Walmart, trust it with their most sensitive finance and HR data.

The Art of Encryption: PKI and CA

Before we dive into Alpaca, let's briefly cover Public Key Infrastructure (PKI) and Certificate Authorities (CAs). PKI is built on public-key cryptography, where each party holds a pair of keys: a public key that can be shared freely and a private key that is kept secret. For example, to send a friend a secure message, you encrypt it with their public key. Only they can decrypt it, because they are the only one holding the corresponding private key. This is the essence of PKI.
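To make the key-pair idea concrete, here is a toy sketch of textbook RSA with deliberately tiny primes. It only shows that data encrypted with a public key can be recovered with the matching private key. It is not how Alpaca or any production system encrypts data: real deployments use 2048-bit or larger keys with padding such as OAEP, through a vetted library. The function names are illustrative.

```python
# Textbook RSA with tiny primes: illustrates public/private keys only.
# NOT secure: no padding and a toy key size.

def make_keypair(p: int, q: int, e: int = 17):
    """Return (public_key, private_key) as (n, exponent) tuples."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: inverse of e modulo phi (Python 3.8+)
    return (n, e), (n, d)

def encrypt(public_key, m: int) -> int:
    n, e = public_key
    return pow(m, e, n)  # anyone can do this with the public key

def decrypt(private_key, c: int) -> int:
    n, d = private_key
    return pow(c, d, n)  # only the private-key holder can undo it

# Your friend generates the key pair and publishes only the public half.
friend_public, friend_private = make_keypair(61, 53)
message = 65  # toy messages are integers smaller than n = 3233
ciphertext = encrypt(friend_public, message)
recovered = decrypt(friend_private, ciphertext)
```

Here `ciphertext` is 2790, and `recovered` is 65 again. Someone who intercepts 2790 but lacks the private exponent cannot easily reverse it at realistic key sizes.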

A CA solves the remaining problem: how do you know a public key really belongs to the party you think it does? The CA vouches for that binding by signing a certificate stating that a given public key belongs to a given identity. For example, on the internet a CA certifies a bank's website. This lets users be confident they are on the legitimate site and not on a fraudulent clone that presents its own key.
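The sketch below shows the CA idea in miniature. A CA signs the binding between a subject name and its public key, and anyone holding the CA's public key can check that binding. It is a hypothetical toy that uses textbook RSA signatures over a SHA-256 digest reduced modulo a small key. The certificate fields and names are invented for illustration; real certificates are X.509 structures signed with padded RSA or ECDSA.

```python
import hashlib

# Toy CA key pair: textbook RSA over two known primes (2^31 - 1 and 2^61 - 1).
# Far too small and unpadded for real use; illustration only.
P, Q, E = 2**31 - 1, 2**61 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # CA private exponent, kept secret by the CA
CA_PUBLIC_KEY = (N, E)             # distributed to everyone who must trust the CA

def _digest(subject: str, public_key: str) -> int:
    """Hash the 'to-be-signed' fields and reduce the result modulo N."""
    tbs = f"{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(tbs).digest(), "big") % N

def ca_issue(subject: str, public_key: str) -> dict:
    """The CA signs the claim 'public_key belongs to subject'."""
    return {"subject": subject, "public_key": public_key,
            "signature": pow(_digest(subject, public_key), D, N)}

def verify(cert: dict, ca_public_key=CA_PUBLIC_KEY) -> bool:
    """A client checks the signature using only the CA's public key."""
    n, e = ca_public_key
    return pow(cert["signature"], e, n) == _digest(cert["subject"], cert["public_key"])

bank_cert = ca_issue("bank.example", "bank-public-key")
# A clone site copies the bank's certificate but swaps in its own key.
fake_cert = dict(bank_cert, public_key="attacker-public-key")
```

`verify(bank_cert)` succeeds. `verify(fake_cert)` fails because the tampered fields no longer match the CA's signature, and the attacker cannot produce a new signature without the CA's private exponent.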

Introducing Alpaca

Now, let's talk about Alpaca, the Asynchronous Low Latent Platform Automation Certificate Authority. Alpaca is Workday's in-house certificate authority and the only solution authorized to issue, manage, validate, and renew digital certificates in Workday's infrastructure. It is built on PKI and follows the best practices described in the Certification Practice Statement (CPS) and the relevant RFCs. So far it has issued more than 30 million internal certificates.

Alpaca Trust Model

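The trust model is where Alpaca differs most from public CAs such as Let's Encrypt. A typical public website uses a chain of three certificates: the web server certificate, an intermediate certificate, and the root certificate. Alpaca adds more layers for security. Under a single root CA sit policy CAs (one per environment type, such as development), then issuing CAs (one per cloud zone), then an Uber actor certificate for each cluster or platform, and finally an actor certificate for each server or pod. Each workload then receives two certificates of its own: an identity certificate and a security endpoint certificate. Revoking a certificate revokes everything beneath it, so Workday can cut off a compromised server, cluster, zone, or entire environment type with a single revocation.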
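One way to see why the extra layers help is to model the hierarchy as a chain. A certificate is then trusted only if its chain reaches the root with no revoked link, which mirrors how X.509 path validation checks every certificate in the path. This is a simplified model, not Alpaca's code. The certificate names are hypothetical, and real revocation is checked through mechanisms such as CRLs or OCSP.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Set

@dataclass(frozen=True)
class Cert:
    name: str                        # hypothetical certificate name
    layer: str                       # e.g. "root", "policy", "issuing", "uber_actor", "actor", "identity"
    issuer: Optional["Cert"] = None  # None only for the self-signed root

def chain(cert: Optional[Cert]) -> Iterator[Cert]:
    """Yield the certificate, then its issuer, and so on up to the root."""
    while cert is not None:
        yield cert
        cert = cert.issuer

def is_trusted(leaf: Cert, trusted_root: Cert, revoked: Set[str]) -> bool:
    """Trusted only if the chain ends at our root and no link in it is revoked."""
    links = list(chain(leaf))
    return links[-1] is trusted_root and all(c.name not in revoked for c in links)

# Root -> policy CA (per environment type) -> issuing CA (per zone)
# -> Uber actor (per cluster) -> actor (per server/pod) -> identity certificate.
root = Cert("root-ca", "root")
policy_dev = Cert("policy-dev", "policy", root)
issuing_zone = Cert("issuing-zone-1", "issuing", policy_dev)
uber_a = Cert("uber-cluster-a", "uber_actor", issuing_zone)
uber_b = Cert("uber-cluster-b", "uber_actor", issuing_zone)
service_a_identity = Cert("service-a-identity", "identity", Cert("actor-a1", "actor", uber_a))
service_b_identity = Cert("service-b-identity", "identity", Cert("actor-b1", "actor", uber_b))

# Cluster A is compromised: one revocation cuts off everything beneath its Uber actor.
revoked = {"uber-cluster-a"}
```

With this `revoked` set, `is_trusted(service_a_identity, root, revoked)` is False, while service B in the other cluster stays trusted. Revoking `"issuing-zone-1"` instead would cut off both clusters in that zone.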

Diving Deep into Alpaca's Key Features

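The authoritative service list and identity certificates are among Alpaca's most important features, and they work together. A platform can only request certificates from Alpaca for services named on the authoritative service list, so a rogue pod started under an unlisted name cannot obtain a certificate. Each identity certificate carries its service's name. Services use it to create JSON Web Tokens (JWTs) that authenticate them to each other, so an endpoint can accept requests from specific services only.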
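Here is a minimal sketch of the authoritative-service-list check. It assumes, as the talk describes, that each platform's Uber actor is associated with the set of services it may request certificates for. The dictionary, exception, and function names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical authoritative service list: the services each platform's
# Uber actor is allowed to request certificates for.
AUTHORITATIVE_SERVICES: Dict[str, Set[str]] = {
    "uber-cluster-a": {"service-a", "service-b"},
}

class NotAuthorized(Exception):
    """Raised when a certificate is requested for an unlisted service."""

def authorize_certificate_request(uber_actor: str, service: str) -> None:
    """Reject any request whose service is not listed for the requesting platform."""
    allowed = AUTHORITATIVE_SERVICES.get(uber_actor, set())
    if service not in allowed:
        raise NotAuthorized(f"{uber_actor!r} may not request certificates for {service!r}")
```

`authorize_certificate_request("uber-cluster-a", "service-a")` returns normally. A pod an attacker bootstrapped under a name like `"ddns"` raises `NotAuthorized`, and so does any request from an unknown Uber actor.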

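The security endpoint certificate is the familiar server certificate. Installed in a web server such as Apache or Tomcat, it exposes your endpoint over HTTPS or another encrypted protocol. There is also the trust store, a package maintained by the Alpaca team. It contains the root CA certificate and is installed on every server and laptop so that they trust Alpaca-issued certificates.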
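On the consuming side, the trust store and the security endpoint certificate map onto standard TLS configuration. The sketch below uses Python's standard-library `ssl` module. Clients add the internal root CA to their verification roots, and servers load their endpoint certificate and key. The file path is a hypothetical placeholder, not Workday's actual layout, and the TLS 1.2 minimum is an assumed policy.

```python
import ssl
from typing import Optional

# Hypothetical location; the real trust-store package layout is not described in the talk.
ROOT_CA_PATH = "/etc/pki/internal/root-ca.pem"

def client_context(root_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify servers against system roots plus the internal root CA."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)  # CERT_REQUIRED + hostname checks
    if root_ca_file is not None:
        ctx.load_verify_locations(cafile=root_ca_file)         # trust the internal root CA
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2               # assumed policy floor
    return ctx

def server_context(cert_file: str, key_file: str) -> ssl.SSLContext:
    """Server side: present the security endpoint certificate (plus chain) and its key."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A context from `client_context(ROOT_CA_PATH)` can be passed to `urllib.request.urlopen(url, context=ctx)`, or used with `ctx.wrap_socket(sock, server_hostname=host)`. Without the internal root CA loaded, verification of Alpaca-issued certificates would fail. That failure is the browser and server behavior the talk describes.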

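Finally, Alpaca provides an API, along with a CLI and client libraries built on it. The API lets other teams automate certificate requests instead of making them manually. For example, the Kubernetes team built an operator that requests certificates from Alpaca and mounts them into pods. This ease of integration is a major reason other teams have adopted Alpaca.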
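Teams automate against Alpaca through its API. The talk does not describe that API, so the sketch below is entirely hypothetical. The base URL, the `/v1/certificates` path, and the JSON fields are invented to show the shape of an automated request, such as one an operator or CLI might make. Only the `urllib` and `json` calls are real standard-library APIs.

```python
import json
import urllib.request

ALPACA_BASE_URL = "https://alpaca.internal.example"  # hypothetical

def build_certificate_request(service: str, csr_pem: str,
                              cert_type: str = "identity") -> urllib.request.Request:
    """Build (but do not send) a request for an identity or security endpoint certificate."""
    if cert_type not in ("identity", "security_endpoint"):
        raise ValueError(f"unknown certificate type: {cert_type}")
    body = json.dumps({"service": service, "type": cert_type, "csr": csr_pem}).encode()
    return urllib.request.Request(
        f"{ALPACA_BASE_URL}/v1/certificates",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request would look like `urllib.request.urlopen(req, context=ctx)`. Here `ctx` is a client `ssl.SSLContext` that trusts the internal root CA. Mirroring the talk's description of servers authenticating to Alpaca with their actor certificate, `ctx` would also present that certificate via `ctx.load_cert_chain(...)`.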

Alpaca's Future Challenges

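The main challenge for an internal certificate authority like Alpaca is handling private keys securely. A CA's private keys are its most valuable asset, and best practice is to keep them offline, which makes automation difficult. Automate too much, and an attacker may be able to bypass or exploit the automation. Balancing automation against security is a constant trade-off. Further challenges include compliance and privacy requirements (such as FedRAMP and GDPR) that can conflict with each other and may increasingly vary by region or segment. Others are adopting new algorithms such as elliptic-curve cryptography, newer technologies such as mutual TLS (mTLS), and evolving market standards.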

Conclusion

Alpaca has been successful so far as an internal certificate authority solution at Workday. It secures our data, enhances our infrastructure, and increases our efficiency. But, like any other technology, Alpaca too needs to adapt and evolve with changing times, laws, regulations, and market demands.

Feel free to reach out if you have any questions about Alpaca, Workday’s handling of SSL and TLS connections, or Workday in general. Thank you for reading!


Video Transcription

...presentation today. I'm very happy that you all chose to come and see my presentation about security. My name is Jessica Rangel, and today I'm going to talk about Alpaca and how Workday handles TLS and SSL connections. On our agenda today, I'm going to say a little about what Workday is and introduce myself. I'm going to cover some key concepts: what PKI and a CA are. I'm also going to explain exactly what Alpaca is, the decision behind whether to build, borrow, or buy this kind of solution, our key features, and some future challenges. Just to introduce myself: my name is Jessica Rangel. I'm originally from Brazil, but I'm based in Dublin, Ireland.

I'm a senior DevOps engineer at Workday, where I'm part of the security product teams. Overall, I have more than 15 years of experience working in IT. So if you have any questions about Workday, or if you want to talk about security, you can connect with me on LinkedIn. I would love to keep in touch with you. Now a little about Workday. Workday is an American company. We build cloud-based systems for finance, HR, and planning. Today we are a Gartner leader for HCM systems, which is basically HR, and we are also a Gartner leader for ERP. We have 90,500 customers, and a few of them are Netflix, Google, and Walmart. These huge companies trust Workday to handle their most precious data, such as finance and HR data, so security is one of Workday's top priorities. Behind the scenes, Workday also has thousands of applications supported by hundreds of teams, including acquired companies.

So if you're interested in the positions we have open (and we have positions for everything related to IT), I put a link in this presentation where you can check them. You can also connect with me on LinkedIn to get more details about how it works.

I mean, what it's like to work at Workday, because I really like working there. Now, moving on to the concepts. Before we start on Alpaca, I would like to give a brief explanation of how public key infrastructure works. PKI is basically a way to encrypt your message if you want to send it securely to a friend or to a recipient. How do you do that? PKI uses two keys: the public key and the private key. Let's say you want to send a message to your friend, and this message should be encrypted so that no one besides your friend has access to it. Using PKI, you can encrypt the message with your friend's public key. Your friend is the only one who has the private key, so they are the only one able to decrypt the message and read the information in it. If someone intercepts the message encrypted with the public key, they won't have access to it, because it's encrypted and only the private key can decrypt it. Now, a little about the certificate authority.

If you think about the PKI I just described, whoever has access to the private key is able to decrypt the message, right? So how can I be sure that only my friend has the private key and didn't share it with someone else? On the internet, it's very hard to make sure that websites are who they say they are. Hackers can mimic bank websites, and you don't know whether the website in front of you is actually your bank or not. Because of that, the internet has a third party, an organization or component called a certificate authority. As you can see in this image, the certificate authority is an organization or company that you can trust. It vouches that Alice, in this image, is actually the owner of the public key associated with her, and that only she has the private key.

On the internet, we have many external certificate authorities. This other graph shows that Comodo, a company that creates certificates and is also a certificate authority, holds about 45% of the SSL certificates on the internet. So Comodo is a third party that vouches that the certificate associated with a bank's website really belongs to the company you believe it belongs to. With that in mind, how do we enable TLS and SSL for thousands of microservices? They are deployed by more than 100 teams on different platforms, such as Kubernetes, OpenStack, and physical servers. Workday had this problem, and we solved it with Alpaca. So let me tell you a little about Alpaca. First of all, it's not the animal! We managed to create an acronym that stands for Asynchronous Low Latent Platform Automation Certificate Authority, also known as Alpaca. Alpaca is the internal certificate authority solution used at Workday. It's the only solution authorized to issue, manage, validate, and renew digital certificates in our infrastructure. But we didn't create Alpaca just from what we had in our heads: we followed the best practices described in the CPS and in the RFC.

All certificate authorities, whether internal or external, have to follow these rules from the RFC, because it describes security measures you have to implement when you run a certificate authority. So far, Alpaca has issued more than 30 million internal certificates for Workday. Now a little about the decision we had to make between building, borrowing, or buying a certificate authority solution. Back in 2014, we started trying to solve the problem of using SSL and TLS internally for communication between our microservices and services. At that time, we saw a very high cost to use a third-party solution: basically $100K per year to create certificates for 20K servers. On top of that, Workday would have had to do some development to make it a technical fit for our architecture. A third-party solution doesn't fit your architecture perfectly out of the box, so you have to build connectors and so on.

So you have to invest development hours on top of the high licensing cost, and today Workday has about 10 times more services than we had in 2014. We also evaluated some open source solutions. The main reason we didn't go with any of them came down to documentation and adoption by other companies, which we checked. Back then, it was one year before Let's Encrypt, and none of the solutions inspired confidence in us. The reasons were the lack of documentation, the language they were written in, or the lack of other companies and people actually using and supporting them.
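The licensing figures quoted in the talk translate into a simple per-server rate. The extrapolation in the second line is only a rough illustration. It assumes licensing scales linearly with footprint and treats the roughly 10x growth in services as 10x growth in servers, neither of which the talk states. It also excludes the development hours mentioned.

```latex
\begin{aligned}
\text{rate} &= \frac{\$100{,}000/\text{year}}{20{,}000\ \text{servers}} = \$5\ \text{per server per year} \\
\text{cost at } 10\times\ \text{footprint} &\approx 10 \times \$100{,}000/\text{year} = \$1{,}000{,}000/\text{year}
\end{aligned}
```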

So in 2015, we decided to build our own certificate authority solution based on PKI. We made sure the solution would fit our technical requirements, meet our performance thresholds, and cover the use cases Workday has in its infrastructure. Here are a few key features of Alpaca. First, our certificate trust model, which I'll explain in more detail on the next slide. Then the authoritative service list, which is basically a list of the services that can use Alpaca; I'll say more about it on the next slides as well. We also have our trust store, the identity and security endpoint certificates, which I'll cover later, and an API for Alpaca. Starting with the trust model: this is what makes Alpaca different from Let's Encrypt and any other external certificate authority on the internet. Usually, if you get a certificate for a public website, you have three certificates. You have the certificate for your web server.

You have the intermediate certificate and the root certificate. But Workday added more layers on top of that, and I'm going to explain why, so it will make sense to you. Starting from the bottom of this image, we have the identity and security endpoint certificates. These two certificates are available inside your server or your pod, and they are what your application uses in its daily work. The identity certificate tells other services that you are who you say you are. The security endpoint certificate is the certificate for HTTPS; I'll give you more details on the next slide. What's important here is that your server also has a certificate called the actor. The actor certificate validates that your server is something we trust inside our infrastructure. Imagine, for example, that your server gets hacked or its keys leak for any reason. If we revoke the actor certificate for your server, then its identity and security endpoint certificates are also revoked.

That's the key benefit of having these layers: if you revoke a certificate, your chain, meaning everything under that certificate, is revoked as well. Above the actor, we have another certificate called the Uber actor. The Uber actor is a certificate installed in your cluster or platform. Using the Uber actor, the platform can send a request to Alpaca, get the actor certificate for the server, and install it for you. Then the server, with its actor certificate, can connect to Alpaca and request the identity and security endpoint certificates for your application. Again, Uber actors are created per cluster, per platform. Imagine that I have three Kubernetes clusters: each of them would have an Uber actor. If one cluster gets compromised, I can just revoke its Uber actor, and all the certificates inside that cluster are revoked. No one else in our infrastructure will trust that service anymore, because we know it was compromised. Above the Uber actor, we have the issuing CA. As I mentioned, Workday is cloud based, so our systems are usually deployed in different zones, in AWS and in Google Cloud. Each zone has a different issuing CA.

And again, imagine that a whole zone was compromised for some reason. I can revoke the issuing CA for that zone, and everything under it is revoked as well. This makes sure that if we have an issue with that environment, we can revoke it, and the rest of the environment won't trust that region anymore. Above the issuing CA, we have the policy CA, which is created per type of environment. Again, imagine that your whole development environment, which runs in different regions, gets compromised: someone got your private key.

If something like that happens, we have the layers, so you can revoke everything underneath. The rest of the components in your infrastructure won't trust that type of environment anymore. You can keep working safely, knowing that no one will receive or accept any request from the compromised environment.

Above the policy CA, we have the root CA. The root CA is what makes trust in any certificate authority work. By default, your browser or your server is not going to trust Alpaca, or any internal certificate, unless you install the root CA. So you have to install it. At Workday, we have only one root certificate, and it is distributed everywhere. So if you open an internal URL in your browser, you get the green padlock. Moving on to the other key features: what are the authoritative service list and the identity certificate? They are related. When I described the Uber actor, I said that the platform can request certificates from Alpaca. But the platform can't request just any certificate: it has to say which service each certificate is requested for. If that service is not registered in Alpaca, it cannot get a certificate. That is a security feature as well. Imagine that a hacker bootstraps a pod inside your cluster without you even knowing it's there, and gives it a name, say DDNS. That name won't be in our service list, because Workday is very creative with service names.

So they are not going to be able to get certificates. The authoritative service list works as another security barrier controlling the creation of certificates in our infrastructure. The service list also works together with identity. At Workday, when two applications want to communicate and exchange information, we use JSON Web Tokens for authentication and authorization. These JSON Web Tokens use the identity certificate, and the identity certificate contains the name of the service. So imagine that you have two services, service A and service B. Service B has an endpoint that is authorized to receive data only from service A. Service A has an identity certificate from Alpaca.

So when service A sends this request, service B accepts it and authorizes it to communicate. That's why we have the identity certificate and the authoritative service list in our infrastructure. We also have the security endpoint certificate.

This one is the certificate most people are familiar with. You install the security endpoint certificate in your Apache or your Tomcat. Then you expose your endpoint over HTTPS, or over any other protocol whose traffic you want to encrypt. I also mentioned the trust store. The trust store is a package that my team controls. It contains the root CA and is installed everywhere, so we are responsible for it too. If we have a new root certificate or need to renew one, whether internal or external, we put it in this package and distribute it everywhere. All servers and all laptops must have this package to be able to communicate over an encrypted protocol. And finally, we also have APIs. The only way we can get service teams, developers, and platforms to work with us is by having an API, because then they can develop solutions or automation on their side.

Alpaca has an API. For example, the team that takes care of the Kubernetes clusters created an operator that requests certificates from Alpaca and mounts them inside the pod. On top of the API, we also created a CLI and libraries. Developers can run the CLI on their laptop and get certificates if they want to test their application, and so on. You can also import the library into your code to talk to our API. The API is actually what made Alpaca so successful. Other teams adopted Alpaca because of it, and they don't need to do things manually, such as requesting certificates. Now, how does this work in practice? This diagram isn't just for educational purposes; it's quite close to what we actually have at Workday. In the orange box, we have Alpaca. In the yellow box, we have a platform, for example Kubernetes. This Kubernetes platform has to have an Uber actor. Imagine that this Uber actor has the authoritative service list for service A and service B.

Using the Uber actor, Kubernetes fetches two actor certificates from Alpaca and mounts them in the pods. Then the pod for service A goes to Alpaca and requests its identity and security endpoint certificates. The same happens with service B, which connects to Alpaca and requests its two certificates. With their security endpoint certificates, both services expose HTTPS. Now imagine that service A wants to send a POST request to an endpoint in service B, and this endpoint accepts messages or data only from service A. When service A sends the request to service B, it creates a JSON Web Token with its identity certificate, proving that Alpaca says it is service A, as it claims. Service B receives this request and can do one of two things. It can accept it as it is, or it can look back to Alpaca and ask whether the certificate has been revoked and whether it should trust this source. If Alpaca says that it issued the certificate and the source can be trusted, service B accepts the request and processes the message sent by service A.
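The service A to service B exchange can be sketched as follows. In the talk, the JWT is tied to service A's Alpaca-issued identity certificate, which implies an asymmetric signature. To keep this example standard-library-only, it swaps in HS256, an HMAC over a shared secret. It therefore demonstrates the token flow and the receiver's checks, not Workday's actual signing scheme. Three details are assumptions: the caller's service name travels in the `sub` claim, the token carries an `exp` expiry check, and the `is_revoked` callback stands in for the lookback to Alpaca.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Callable, Optional, Set

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def sign_token(claims: dict, key: bytes) -> str:
    """Caller side (service A): build a compact HS256 JWT."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signature = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"

def accept_request(token: str, key: bytes, allowed_callers: Set[str],
                   is_revoked: Optional[Callable[[str], bool]] = None) -> bool:
    """Receiver side (service B): verify the token, then apply the endpoint's allow-list."""
    try:
        header, payload, signature = token.split(".")
        provided = _b64url_decode(signature)
        expected = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, provided):
            return False                      # forged or tampered token
        claims = json.loads(_b64url_decode(payload))
    except ValueError:                        # malformed token (segments, base64, or JSON)
        return False
    if not isinstance(claims, dict) or claims.get("exp", 0) < time.time():
        return False                          # unexpected payload or expired token
    caller = claims.get("sub")
    if caller not in allowed_callers:
        return False                          # endpoint accepts only listed services
    if is_revoked is not None and is_revoked(caller):
        return False                          # optional lookback to the CA
    return True
```

A token from `sign_token({"sub": "service-a", "exp": time.time() + 60}, key)` is accepted by an endpoint whose allow-list is `{"service-a"}`. It is rejected when the allow-list names a different service, when the key differs, or when the revocation callback reports the caller as revoked. In production you would use a maintained JWT library and verify against the caller's certificate rather than a shared key.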

So this is how the trust model works in practice and how Alpaca is used internally. Now, towards the end of my presentation: what are the future challenges when you run an internal certificate authority? First of all, handling the private keys. For a certificate authority, the private keys are the most important thing: if you leak them, you're going to have a lot of trouble. If you follow the RFC, these private keys have to be stored offline, on an offline laptop or an offline server. The main question is how to automate the handling of these private keys if they are offline. If you automate too much, someone can bypass or exploit your automation, and you can get hacked. So this is one of the challenges we see on our team: how to automate without becoming less secure. Also, working at Workday, we have to follow a lot of compliance and privacy standards.

That is also a challenge, because compliance standards can sometimes collide with each other. Examples are FedRAMP, which is for the American government, GDPR, and others we have to follow. So how do you make sure you are compliant with all the compliance and privacy requirements? In the future, I can see that we might also have to be compliant per segment and per region. Let's say you have something like FedRAMP for the EU and for other regions such as Asia, or per segment. This is also an issue, because these requirements can collide with each other. Another challenge is adopting new encryption methods. I didn't mention it earlier, but when you create a private and public key pair, you have to choose an algorithm, RSA for example, and now we also have elliptic curve cryptography. So you have to update your certificates to these new encryption methods. We also have new technologies, such as mTLS, that you have to adopt in your applications. And with ChatGPT nowadays, who knows what kinds of new technologies we'll have on the SSL and TLS side, right? As I mentioned, Alpaca has so far been a very successful solution for us.

But if you're interested in what happens when a certificate authority loses control of its private key, and what kind of damage that can do, I put a link in this presentation, which I'll share later. It's a podcast episode about DigiNotar, a certificate authority that was compromised, and how hard it was for the other companies that trusted it. So if you also want to hear about the unhappy path, I invite you to listen to this podcast, because it's a very interesting story. OK, we are almost at 30 minutes. I would like to thank you all so much for being here and listening. If you have any questions about certificate authorities, SSL, TLS, or even Workday, drop your message and I'm happy to answer now. Thank you. OK, I don't see any questions for now, so I think I'm going to close the session.