Remote Code Execution Engine


As of 2023, LeetCode alone has around 1 million users, and Codeforces, a famous competitive programming platform, has around 2 million. These online judges look simple from the user's perspective, but making the experience smooth and free of hurdles is no easy game. Ever wondered how thousands of users can simultaneously run code on these platforms in just a few seconds? This is made possible by an entity called a Remote Code Execution engine (from now on I'll refer to it as RCE). In this blog post, we will see how I designed and developed an RCE as my personal project.

Overview

Before proceeding to the architecture of the RCE, let's understand how exactly an RCE works. An RCE should ideally be designed and structured to cater to the following basic requirements.

  1. It should be able to achieve parallelism (parallel code execution).

  2. It should consume minimal resources while giving maximum throughput.

  3. It should be immune to all sorts of attacks (I specifically mean the ones delivered via submitted code, like infinite loops, bomb-style code, etc.).

  4. It should achieve code isolation for each submitted program.

Now that we know what features should be incorporated, let us dive into the process of designing the entire system from scratch.

Architecture

The main ordeal is to successfully sustain multiple concurrent incoming submission requests. Using a monolithic architecture would mean a waste of resources and computing power, so the way forward is to use microservices connected via messaging queues. The entire entity can be divided into four main components, which we will discuss in detail one by one.

RCE Server

The server is the "gate" or head section of the RCE. It listens to API requests from "component 2" (another microservice of the online judge that handles submissions by users), as shown in the diagram above. The server merely receives the submissions and is only responsible for passing them on to the Exchange, which we will cover later in this post.

The reason behind doing this is pretty straightforward. Imagine multiple requests being spammed to the server by users. If a monolithic architecture were used, the server would have to execute the code as well as listen for submission requests. This would become a big bottleneck for the entire system and saturate the server with pending requests. Instead, it is better to direct code submissions to nodes dedicated to their particular language (a distributed system, with separate execution nodes for C++, Python, Java, etc.). This distributes the requests and optimizes the functioning of the server.
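To make the forwarding role concrete, here is a minimal sketch. The post doesn't name the actual stack, so Flask and pika (a RabbitMQ client) are assumptions, as are the endpoint, exchange name, and payload fields:

```python
# Minimal sketch of the RCE server: receive a submission, hand it to the
# exchange, return immediately. Flask, pika, and all names are assumptions.
import json

import pika
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/submit", methods=["POST"])  # hypothetical endpoint
def submit():
    submission = request.get_json()  # e.g. {"src": ..., "stdin": ..., "lang": ...}

    # The server never executes code itself; it only publishes the submission.
    # (Opening a connection per request is for brevity only.)
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.basic_publish(
        exchange="rce_exchange",         # hypothetical exchange name
        routing_key=submission["lang"],  # route by language (cpp, python, ...)
        body=json.dumps(submission),
    )
    connection.close()
    return jsonify({"status": "queued"}), 202

if __name__ == "__main__":
    app.run()
```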

RCE Message Queue and Exchange

The second crucial part of the RCE is a message broker, or messaging queue. A message queue is an asynchronous service-to-service communication medium used in microservices architectures. Message queues can be used to decouple heavyweight processing, to buffer or batch work, and to smooth out spiky workloads.

Here we use an Exchange, which is connected to the RCE server. It redirects requests to different communication channels established with different worker nodes, each representing a different language. So the Exchange merely acts like a "postal worker", segregating requests into their respective queues (see the sketch after the list below). Using a message broker has its own advantages, such as:

  1. Achieving a high degree of parallelism

  2. Supporting multiple routes

  3. Being built and customized for rapid message transfer
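The "Exchange" terminology suggests a RabbitMQ-style broker, though the post doesn't confirm it. Under that assumption, here is a sketch of wiring a direct exchange to one queue per language; all names are illustrative:

```python
# Sketch: declare a direct exchange and bind one queue per language.
# Assumes RabbitMQ via pika; exchange/queue names are made up.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# One exchange fans submissions out to per-language queues.
channel.exchange_declare(exchange="rce_exchange", exchange_type="direct")

for lang in ("cpp", "python", "java"):
    queue = f"rce_{lang}"
    channel.queue_declare(queue=queue, durable=True)
    # The routing key is the language, so the exchange acts as the
    # "postal worker" dropping each submission into the right queue.
    channel.queue_bind(queue=queue, exchange="rce_exchange", routing_key=lang)

connection.close()
```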

Amazon S3 Bucket

Now imagine there are hundreds of requests being spammed to the RCE server at every moment. Transmitting the entire source code through

(server -> exchange -> queue -> worker)

is inefficient and nonviable. Here comes our third component to the rescue: S3, a data storage platform highly optimized to handle large volumes of data. The source code and input files are uploaded here by "component 2" as soon as the user submits the code. The information passed onward is only the {S3 download link, input file, submission ID}. Then, once the request reaches its designated worker node, the code is downloaded at that node. This way we save ourselves from redundant computation and memory pressure.

(Note: the JSON object of the message sent by the publisher (RCE server) to the consumer (RCE worker) contains:

  1. src : the S3 download link.

  2. stdin : the program input.

  3. lang : the language of the code submitted.

)
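To make the upload-then-link flow concrete, here is a sketch using boto3. The bucket name, object key, and presigned-URL expiry are assumptions; only the {src, stdin, lang} shape comes from the note above:

```python
# Sketch of the S3 handoff: upload the source once, pass only a link onward.
# Bucket, key, and expiry are illustrative.
import boto3

s3 = boto3.client("s3")
bucket, key = "rce-submissions", "submissions/12345/main.py"  # hypothetical

# Component 2 uploads the raw source once...
s3.upload_file("main.py", bucket, key)

# ...and everything downstream carries only a short-lived download link.
src_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": key},
    ExpiresIn=3600,  # link valid for one hour
)

message = {
    "src": src_url,          # S3 download link
    "stdin": "3\n1 2 3\n",   # program input
    "lang": "python",        # routing key for the exchange
}
```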

The reasons for choosing S3 were:

  1. High storage capacity at low cost

  2. High computation power

  3. Flexible storage management and monitoring

  4. Faster download rates, with fewer bottlenecks for large data transfers

RCE Worker

The worker nodes are the actual powerhouse, or the officers, of the RCE. Worker nodes are responsible for executing the received source code. The core concept of the RCE is code virtualization and code isolation, therefore we containerize each worker using Docker.

For those who don't know Docker: it is software that packages applications into standardized units called containers to achieve virtualization (that is, isolated runtime environments). I would suggest reading more about Docker here. It is a prime tool for quickly and easily deploying scalable applications. So we can safely say that each worker is now an independent entity, unaffected by the functioning of any other worker node. This is a game changer: even if one worker goes down, it won't lead to the entire RCE going down.

Finally, the API request has reached its respective worker node, and the code is now downloaded from S3. We need to make sure the workers can execute code in parallel. So every time a worker receives a submission, it spawns a sibling container that executes the received code independently. Sibling containers also help with the security aspect: not only do they give us concurrent execution capability, they ensure that malicious code cannot damage the entire RCE, as it is virtualized in a completely isolated environment inside the sibling.
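Putting the worker together, here is a sketch of the consume, download, spawn loop, assuming pika and the Docker SDK (docker-py). The queue name, image, and resource limits are illustrative, and the worker is assumed to have access to the host's Docker daemon (e.g. a mounted Docker socket) so it can spawn siblings:

```python
# Sketch of a worker for the Python queue: consume a message, fetch the
# source from S3, run it in a throwaway sibling container. All names and
# limits are assumptions.
import json
import urllib.request

import docker
import pika

docker_client = docker.from_env()  # assumes access to the Docker daemon

def handle(ch, method, properties, body):
    msg = json.loads(body)  # {"src": ..., "stdin": ..., "lang": ...}
    code = urllib.request.urlopen(msg["src"]).read().decode()  # fetch from S3

    # Spawn a sibling container to run the code in isolation. How code and
    # output are shared with the sibling is covered in the next section;
    # here we pass the source on the command line for brevity.
    output = docker_client.containers.run(
        "python:3.12-slim",
        ["python", "-c", code],
        network_disabled=True,  # submitted code gets no network access
        mem_limit="256m",
        remove=True,
    )
    print(output.decode())

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(queue="rce_python", on_message_callback=handle, auto_ack=True)
channel.start_consuming()
```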

A quick hack to code this would be to use network API calls between the worker and the sibling, passing the code as a POST request. But we have tried to eliminate API calls throughout the project, so why use them here and sabotage the efficient functioning of the RCE? To solve this we will use a Docker Volume, which is our fourth component.

RCE "Shared" Volume

Docker volumes are the preferred mechanism for persisting data generated by and used by Docker containers. By using the "Shared" volume ("Shared" is the name of the Docker volume we use here), we gain two big advantages in one go:

  1. The submission code can be accessed by the sibling without network calls.

  2. The output generated can be stored back into the volume and accessed by the worker without network calls.

This significantly decreases the overall network saturation of the RCE and improves its runtime performance. Finally, one small measure to take here is to kill the sibling container in case malicious code is encountered, or once the code has executed and the output has been generated.
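Here is a sketch of that kill measure combined with the "Shared" volume handoff, again assuming the Docker SDK; the 5-second wall-clock limit and file names are illustrative choices:

```python
# Sketch: run the sibling against the "Shared" volume, enforce a timeout,
# then kill and remove it. Limits and paths are assumptions.
import docker
import requests

docker_client = docker.from_env()

# The worker has already written main.py and input.txt into the volume.
container = docker_client.containers.run(
    "python:3.12-slim",
    ["sh", "-c", "python /shared/main.py < /shared/input.txt > /shared/out.txt"],
    volumes={"shared": {"bind": "/shared", "mode": "rw"}},  # the "Shared" volume
    network_disabled=True,
    mem_limit="256m",
    detach=True,  # return immediately so we can enforce a timeout
)
try:
    container.wait(timeout=5)  # blocks until exit; raises if still running
except requests.exceptions.RequestException:
    pass  # likely an infinite loop -- fall through and kill it
finally:
    container.remove(force=True)  # kill the sibling and clean it up

# The worker (which mounts the same volume) now reads /shared/out.txt --
# the output comes back with no network calls.
```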

Conclusion

This was pretty much the gist of the entire RCE as a microservice entity. The project is yet to fully mature; a few modifications and feature additions are still pending here and there. I hope this blog was insightful, and that you readers will give a shot at building an even better system than the one described here.

You can pay a visit to this project by clicking here.

In case you want a code-wise explanation of this repository, feel free to reach out to me on LinkedIn or mention it in a comment below.

A huge shout-out to Parth, without whom this project would've still been a mere dream.

Thank You for being a patient reader!