1. What is Docker, and Why is it Used?
Docker is an open-source containerization platform that allows developers to package applications and their dependencies into isolated environments called containers. These containers ensure that applications run consistently across different environments.
Real-Life Example:
Imagine you're developing a MERN stack web app. It works fine on your laptop, but when your teammate runs it, they get "version mismatch" errors.
With Docker, you create a consistent environment across all machines, preventing such issues.
Why Use Docker?
Docker is beneficial when you need:
- Portability – Works on any OS without compatibility issues
- Consistency – Eliminates "It works on my machine" problems
- Lightweight – Uses fewer system resources than virtual machines
- Scalability – Quickly scale applications with minimal overhead
2. Main Components of Docker
1. Docker Daemon (dockerd)
- The background process that manages Docker containers
- Listens for API requests and handles images, networks, and volumes
2. Docker CLI (Command-Line Interface)
- A tool to interact with the Docker Daemon
- Common commands:
docker ps # List running containers
docker run # Start a new container
docker stop # Stop a running container
3. Docker Images
- A read-only template containing the application, libraries, and dependencies
- Immutable – once built, images don't change
- Used to create containers
4. Docker Containers
- A running instance of a Docker image
- Isolated from the host system but can interact if needed (e.g., exposing ports)
5. Docker Hub
- A cloud-based registry where Docker images are stored and shared
6. Docker Volumes
- Used for persistent data storage outside of containers
(Illustration of Docker components omitted.)
3. How is Docker Different from Virtual Machines?
Example:
You're testing a React.js + Express.js app. Instead of running a full Ubuntu VM (which consumes high RAM & CPU), you start a lightweight container in seconds:
docker run -d -p 3000:3000 node:16
Unlike a VM, which takes minutes to boot, a container starts instantly.
Docker vs. Virtual Machines

| Feature | Docker (Containers) | Virtual Machines (VMs) |
|---|---|---|
| Boot Time | Seconds | Minutes |
| Size | MBs | GBs |
| Performance | Near-native speed | Slower due to hypervisor overhead |
| Isolation | Process-level isolation | Full OS-level isolation |
| Resource Efficiency | Shares the host OS kernel; lightweight | Requires a full guest OS; resource-intensive |
docker run vs. docker start vs. docker exec
- docker run: creates and starts a new container from an image
- docker start: restarts an existing, stopped container
- docker exec: runs a command inside a running container
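For instance, using the public nginx image purely as an illustration:

```bash
# Create and start a new container named "web" from the nginx image
docker run -d --name web -p 8080:80 nginx

# Stop the container, then start the same container again
docker stop web
docker start web

# Run a one-off command inside the running container
docker exec -it web ls /usr/share/nginx/html
```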
4. Popular and Useful Docker Commands
Here are some of the most commonly used Docker commands:
Container Management
# List all running containers
docker ps
# List all containers (including stopped ones)
docker ps -a
# Start a stopped container
docker start <container_id>
# Stop a running container
docker stop <container_id>
# Remove a container
docker rm <container_id>
Image Management
# List all available images
docker images
# Pull an image from Docker Hub
docker pull <image_name>
# Remove an image
docker rmi <image_name>
Build and Run Containers
# Build a Docker image from a Dockerfile
docker build -t <image_name> .
# Run a container from an image
docker run -d -p 8080:80 <image_name>
Volume Management
# List all Docker volumes
docker volume ls
# Create a new volume
docker volume create <volume_name>
# Remove a volume
docker volume rm <volume_name>
Docker Compose: docker-compose.yml
What is docker-compose.yml?
The docker-compose.yml file is used to define and run multi-container Docker applications. With Docker Compose, you can manage and orchestrate multiple services, including databases, backend APIs, and front-end applications, all in a single file.
It allows you to define services, networks, and volumes, making it easier to deploy and manage applications that require multiple services working together.
Why is docker-compose.yml Useful?
1. Simplifies Multi-Container Management: instead of managing each container manually, Docker Compose lets you define all services (frontend, backend, database, etc.) in one configuration file and launch them with a single command.
2. Networking and Dependency Management: Docker Compose automatically creates a network for your containers, allowing them to communicate with each other. Services can be referenced by their service name, so the backend can talk to the database without needing an IP address.
3. One Command to Start Everything: instead of running individual containers with complex docker run commands, Docker Compose lets you define the services and their dependencies in a YAML file and run everything with docker-compose up.
4. Simplified Development Environment: developers can easily replicate the production environment locally, using the same configuration for services like databases, backends, and frontends, without manually setting up each service.
5. Environment Variable Management: you can manage environment variables for each service within the docker-compose.yml file, making it easier to configure your application for different environments (development, testing, production).
Example of docker-compose.yml for a Web Application
Let's walk through an example where we have three services:
- Frontend: A React app running on port 3000.
- Backend: A Node.js API running on port 5000.
- Database: A MongoDB instance to store data.
version: '3.8'
services:
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    volumes:
      - ./frontend:/app
    depends_on:
      - backend
  backend:
    build: ./backend
    ports:
      - "5000:5000"
    environment:
      - NODE_ENV=development
    depends_on:
      - database
  database:
    image: mongo
    volumes:
      - mongo-data:/data/db
    ports:
      - "27017:27017"
volumes:
  mongo-data:
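With this file in place, the whole stack can be managed with a couple of commands, for example:

```bash
# Build images and start all three services in the background
docker-compose up -d --build

# Follow the logs of one service
docker-compose logs -f backend

# Stop and remove the containers and the default network
docker-compose down
```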
Database Migrations
- Explain how you would design and manage a database schema using Sequelize, including the process of setting up migrations, handling model relationships, optimizing for performance, and managing database changes in a collaborative team environment.
Database Migration with Sequelize
Purpose
Database migrations allow you to safely update and manage your database schema over time. They help track changes to the schema in a version-controlled manner, making it easy to collaborate in teams.
Setting Up Migrations
- Initialize Sequelize with sequelize-cli to generate migration files.
- Migration files contain two primary methods:
  - up: applies changes (e.g., create tables, add columns).
  - down: rolls back changes (undoes what up applied).
Handling Schema Changes
- Creating Migrations: when you need to add, modify, or delete parts of the database schema (e.g., tables, columns), you create a new migration file, as sketched below.
- Applying Migrations: use npx sequelize-cli db:migrate to apply pending migrations to the database.
- Rolling Back Migrations: use npx sequelize-cli db:migrate:undo to undo the last applied migration.
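A minimal sketch of such a migration file, assuming a hypothetical Users table (file, table, and column names are illustrative):

```javascript
// migrations/20240101000000-create-users.js (illustrative file name)
'use strict';

module.exports = {
  // "up" applies the change: create the Users table
  async up(queryInterface, Sequelize) {
    await queryInterface.createTable('Users', {
      id: { type: Sequelize.INTEGER, autoIncrement: true, primaryKey: true },
      email: { type: Sequelize.STRING, allowNull: false, unique: true },
      createdAt: { type: Sequelize.DATE, allowNull: false },
      updatedAt: { type: Sequelize.DATE, allowNull: false },
    });
  },

  // "down" rolls the change back: drop the table
  async down(queryInterface) {
    await queryInterface.dropTable('Users');
  },
};
```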
Model Relationships
- Define associations (e.g., one-to-one, one-to-many, many-to-many) within your models using Sequelize methods: hasOne, hasMany, belongsTo, and belongsToMany (see the sketch below).
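A minimal sketch of how these associations are declared (model names are illustrative):

```javascript
// One-to-many: a User has many Posts; each Post belongs to a User
User.hasMany(Post, { foreignKey: 'userId' });
Post.belongsTo(User, { foreignKey: 'userId' });

// Many-to-many: Users and Projects linked through a join table
User.belongsToMany(Project, { through: 'UserProjects' });
Project.belongsToMany(User, { through: 'UserProjects' });
```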
Collaborative Workflow
- Migrations should be version-controlled using Git.
- Each team member works with migrations, and when schema changes are required, new migrations are created and applied across all environments (development, staging, production).
GitHub Actions
Steps to Deploy on AWS EC2
1. Launch EC2 Instance
2. Add Secret Variables in GitHub
- Go to GitHub Repo Settings → Secrets and variables → Actions → Add Secret
3. Connect to EC2 Instance
Install Docker on AWS EC2
sudo apt-get update
sudo apt-get install docker.io -y
sudo systemctl start docker
sudo chmod 666 /var/run/docker.sock
sudo systemctl enable docker
docker --version
docker ps
4. Create Two Runners on the Same EC2 Instance
- In the React app repository → Settings → Actions → Runners → New self-hosted runner
- Copy the download commands and run them in the EC2 instance terminal
- Install it as a service to keep it running in the background
sudo ./svc.sh install
sudo ./svc.sh start
- Do the same for the Node.js Runner
5. Create a Dockerfile for Node.js (Backend)
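A minimal Dockerfile sketch for a typical Node.js API (the port and entry file are assumptions – adjust to your project):

```dockerfile
# Small official Node.js base image
FROM node:16-alpine
WORKDIR /app

# Install dependencies first to take advantage of Docker layer caching
COPY package*.json ./
RUN npm ci --only=production

# Copy the source and expose the API port (assumed to be 5000)
COPY . .
EXPOSE 5000
CMD ["node", "server.js"]
```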
6. Create a GitHub Actions Workflow
Create a .github/workflows/cicd.yml file.
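A minimal sketch of such a workflow, assuming DockerHub credentials stored as secrets named DOCKER_USERNAME and DOCKER_PASSWORD and an illustrative image name:

```yaml
name: CICD

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Log in to DockerHub
        run: docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
      - name: Build and push the image
        run: |
          docker build -t ${{ secrets.DOCKER_USERNAME }}/node-backend .
          docker push ${{ secrets.DOCKER_USERNAME }}/node-backend

  deploy:
    needs: build
    runs-on: self-hosted   # the runner installed on the EC2 instance
    steps:
      - name: Pull and restart the container
        run: |
          docker pull ${{ secrets.DOCKER_USERNAME }}/node-backend
          docker rm -f node-backend || true
          docker run -d --name node-backend -p 5000:5000 ${{ secrets.DOCKER_USERNAME }}/node-backend
```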
7. Push Docker Images to DockerHub
8. Add Inbound/Outbound Rules on EC2 Instance
9. Access the Node.js Application
- Use EC2_PUBLIC_IP:PORT to access your application
Deploying React App
- Create a Dockerfile for React
- Follow the same process as above
What is GitHub Actions, and how does it work?
GitHub Actions is a CI/CD automation tool that allows you to define workflows in YAML to build, test, and deploy applications directly from GitHub repositories.
How do you trigger a GitHub Actions workflow?
Workflows can be triggered by events such as push, pull_request, schedule, workflow_dispatch, and repository_dispatch.
What are the key components of a GitHub Actions workflow?
Key components include:
- Workflows (YAML files in .github/workflows/)
- Jobs (independent execution units in a workflow)
- Steps (commands executed in a job)
- Actions (reusable units of functionality)
- Runners (machines that execute jobs)
What is the difference between jobs, steps, and actions?
- Jobs: Run in parallel or sequentially within a workflow.
- Steps: Individual tasks executed within a job.
- Actions: Pre-built reusable components within steps.
How do you use environment variables and secrets in GitHub Actions?
- Define environment variables using env:
env:
  NODE_ENV: production
- Store sensitive values in secrets:
env:
  API_KEY: ${{ secrets.API_KEY }}
What are self-hosted runners, and when should you use them?
Self-hosted runners are custom machines used to execute workflows instead of GitHub's hosted runners. Use them for private repositories, custom hardware, or specific dependencies.
How do you cache dependencies in GitHub Actions?
Use actions/cache@v3 to cache dependencies and speed up builds:
- uses: actions/cache@v3
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: npm-${{ runner.os }}
How do you create a reusable workflow in GitHub Actions?
Define a workflow with on: workflow_call and call it from another workflow:
on: workflow_call
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Reusable workflow"
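The calling workflow then references the reusable one by its path (the file name here is an assumption):

```yaml
jobs:
  call-build:
    uses: ./.github/workflows/reusable.yml
```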
How do you set up a CI/CD pipeline using GitHub Actions?
Define a workflow that includes jobs for building, testing, and deploying:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Building..."
  test:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - run: echo "Testing..."
  deploy:
    runs-on: ubuntu-latest
    needs: test
    steps:
      - run: echo "Deploying..."
What is the difference between workflow_dispatch, workflow_run, and schedule triggers?
- workflow_dispatch: manual trigger via the GitHub UI/API.
- workflow_run: triggered when another workflow finishes.
- schedule: runs workflows at specific times using cron syntax.
How do you debug a failing GitHub Actions workflow?
- Check logs in the GitHub Actions UI.
- Use set -x in bash scripts for verbose output.
- Add continue-on-error: true to isolate failing steps.
How do you run a GitHub Actions workflow locally?
Use act, a tool that simulates GitHub Actions on your local machine:
act
How do you optimize and speed up GitHub Actions workflows?
- Use caching (actions/cache@v3).
- Run jobs in parallel when possible.
- Use matrix builds for different environments.
- Limit workflow execution to necessary branches.
How do you manage permissions and security in GitHub Actions?
- Apply the principle of least privilege to tokens (GITHUB_TOKEN).
- Restrict secrets exposure to trusted workflows.
- Use branch protection rules to limit workflow execution.
Websockets & Multi-backend system
Why Do Backends Need to Talk to Each Other?
In a typical client-server architecture, communication happens between the browser (client) and the backend server. However, as applications grow, keeping everything on a single server exposed to the internet becomes inefficient and unscalable.
When designing a multi-backend system, you need to consider:
- If there are multiple services, how should they communicate when an event occurs?
- Should it be an immediate HTTP call?
- Should the event be sent to a queue?
- Should the services communicate via WebSockets?
- Should you use a Pub-Sub mechanism?
These decisions impact performance, scalability, and reliability.
Example: Payment Processing System
Let's consider a payment application. When a transaction occurs:
- The database update should happen immediately (synchronous).
- The notification (email/SMS) can be pushed to a queue (asynchronous).
Why not handle everything in the primary backend?
- If the email service is down, should the user be forced to wait after completing the transaction? No!
- Instead, we push the notification event to a queue.
- Even if the notification service is down, the queue retains the event and sends notifications once the service is back.
- This is why message queues (e.g., RabbitMQ, Kafka, AWS SQS) are better than HTTP for such tasks.
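As a sketch of the queue-based approach, using RabbitMQ through the amqplib package (the queue name, connection URL, and event shape are assumptions):

```javascript
const amqp = require('amqplib');

// Producer: the payment service publishes a notification event and moves on,
// without waiting for the notification service to be available.
async function publishNotification(event) {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();
  await channel.assertQueue('notifications', { durable: true });
  // persistent: true keeps the message on disk until a consumer handles it
  channel.sendToQueue('notifications', Buffer.from(JSON.stringify(event)), {
    persistent: true,
  });
  await channel.close();
  await conn.close();
}

publishNotification({ type: 'email', to: 'user@example.com', txId: 'tx_123' });
```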
Types of Communication
1. Synchronous Communication
   - The system waits for a response from the other system.
   - Examples: HTTP requests, WebSockets (in some cases).
2. Asynchronous Communication
   - The system does not wait for a response.
   - Examples: message queues, Pub-Sub services.
Why WebSockets?
WebSockets provide persistent, full-duplex communication between client and server over a single TCP connection, established with one handshake.
Limitations of HTTP:
- In HTTP, the server cannot push events to the client on its own.
- The client (browser) can request, and the server can respond, but the server cannot initiate communication with the client.
WebSockets vs. HTTP for Real-Time Applications
Example: Stock Market Trading System
- Stock buying & selling generates millions of requests per second.
- With plain HTTP, each new connection requires a TCP three-way handshake (plus per-request headers), adding latency and overhead.
- With WebSockets, the handshake happens only once, and then the server and client can continuously exchange data.
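A minimal server-push sketch with Node.js and the ws package (the port and message shape are illustrative):

```javascript
const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (socket) => {
  // After the single handshake, the server can push data at any time,
  // without waiting for a client request.
  const timer = setInterval(() => {
    socket.send(JSON.stringify({ symbol: 'ACME', price: 100 + Math.random() }));
  }, 1000);

  socket.on('close', () => clearInterval(timer));
});
```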
Alternative: Polling
If you still want to use HTTP for real-time updates, an alternative approach is polling.
- However, polling creates unnecessary load on the server by making frequent requests.
- WebSockets are a more efficient solution for real-time updates.
Some Basic Questions
Basic
What is Node.js?
Node.js is a runtime environment for executing JavaScript on the server side. It is not a framework or a language. A runtime is responsible for memory management and converting high-level code into machine code.
Examples:
- Java: JVM (Runtime) → Spring (Framework)
- Python: CPython (Runtime) → Django (Framework)
- JavaScript: Node.js (Runtime) → Express.js (Framework)
With Node.js, JavaScript can run outside the browser as well.
Runtime vs Frameworks
- Runtime: Focuses on executing code, handling memory, and managing I/O.
- Framework: Provides structured tools and libraries to simplify development.
What happens when you enter a URL in the browser and hit enter?
DNS Lookup
The browser checks if it already knows the IP address for www.example.com.
If not, it contacts a DNS (Domain Name System) server to get the IP address (e.g., 192.168.1.1).
Establishing Connection
The browser initiates a TCP connection with the web server using a process called the three-way handshake.
If the website uses HTTPS, a TLS handshake happens to encrypt the communication.
Sending HTTP Request
The browser sends an HTTP request to the server:
GET / HTTP/1.1
Host: www.example.com
Server Processing
The web server processes the request and may:
- Fetch data from a database
- Generate a response (HTML, JSON, etc.)
Receiving the Response
The server sends an HTTP response back to the browser:
HTTP/1.1 200 OK
Content-Type: text/html
Rendering the Page
The browser processes the HTML, CSS, and JavaScript and displays the webpage.
Difference Between Monolithic and Microservices Architecture
Monolithic Architecture
- All components (UI, DB, Auth, etc.) are tightly coupled.
- Single application handles everything.
Microservices Architecture
- Divided into small, independent services.
- Each service handles a specific function (Auth, Payments, etc.).
Pros:
- Scalable
- Services can use different tech stacks
Cons:
- More complex to manage
- Requires API communication
HTTP Status Codes
- 200 OK
- 201 Created
- 400 Bad Request
- 401 Unauthorized
- 402 Payment Required
- 404 Not Found
- 405 Method Not Allowed
- 500 Internal Server Error
What is CORS?
CORS stands for Cross-Origin Resource Sharing – a security feature built into browsers.
It blocks requests from one origin (domain, protocol, or port) to another origin unless the server explicitly allows them.
For example: your frontend is hosted at frontend.com and your backend at backend.com.
The browser treats these as different origins and blocks the request unless it is explicitly allowed.
Why does this happen?
CORS errors are triggered by the Same-Origin Policy, which prevents malicious websites from making unauthorized API calls using your credentials.
The browser isn't blocking the request – it's blocking the response, for security reasons.
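On an Express backend, a common way to allow a specific origin is the cors middleware package (the origin shown is an example):

```javascript
const express = require('express');
const cors = require('cors');

const app = express();

// Allow only the known frontend origin; credentials enables cookies
app.use(cors({ origin: 'https://frontend.com', credentials: true }));

app.get('/api/data', (req, res) => res.json({ ok: true }));
app.listen(5000);
```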
REST vs GraphQL
REST API:
"REST (Representational State Transfer) is an architectural style where data is fetched using multiple endpoints, and each request returns a fixed structure of data."
GraphQL:
"GraphQL is a query language for APIs that allows clients to request only the data they need, reducing overfetching and underfetching."
Key Points:
- REST APIs have multiple endpoints (/users, /orders), while GraphQL has a single endpoint (/graphql).
- GraphQL provides more flexibility by allowing clients to request exactly what they need in a single query.
- REST APIs return predefined responses and sometimes require multiple requests.
- If performance and flexibility are key concerns, GraphQL is often the better choice.
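For illustration, where REST might need both GET /users/1 and GET /users/1/orders, a single GraphQL query (field names assumed) fetches exactly the data required:

```graphql
query {
  user(id: "1") {
    name
    orders {
      total
    }
  }
}
```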
How Do You Design an API for a Large-Scale System?
- Use Microservices: Separate services (Auth, Payments, etc.).
- Load Balancers: Distribute traffic efficiently.
- Caching: Use Redis for frequently accessed data.
- Pagination: Send data in chunks.
- Rate Limiting: Prevent API abuse.
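As one concrete piece of such a design, here is a rate-limiting sketch for Express using the express-rate-limit package (the limits are illustrative):

```javascript
const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

// Allow at most 100 requests per IP per 15-minute window
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));

app.get('/api/users', (req, res) => res.json([]));
app.listen(3000);
```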
What is Pagination? How to Implement It?
Pagination breaks large datasets into smaller parts.
Implementation:
- Use limit and offset in database queries. Example:
  SELECT * FROM users LIMIT 10 OFFSET 20;
- Use cursor-based pagination for better performance (see the sketch below).
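Cursor-based pagination keys off the last row the client saw instead of an offset, so the database does not scan and skip earlier rows (column names assumed):

```sql
-- Fetch the next page after the last id the client received (e.g., 20)
SELECT * FROM users
WHERE id > 20
ORDER BY id
LIMIT 10;
```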
How Do You Handle File Uploads?
- Single file upload: use multipart/form-data with Express.js & Multer (see the sketch below).
- Large file handling: use chunked uploads.
- Storage options: store files on AWS S3, Google Cloud Storage, or a database.
- Server-side upload: the file is uploaded to your backend server first, and the server then sends it to S3 or Cloudinary.
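A minimal single-file upload sketch with Express and Multer (the field name and destination folder are assumptions):

```javascript
const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer({ dest: 'uploads/' }); // uploaded files land in ./uploads

// The form field must be named "file" to match upload.single('file')
app.post('/upload', upload.single('file'), (req, res) => {
  res.json({ filename: req.file.originalname, size: req.file.size });
});

app.listen(5000);
```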
Explain the concept of statelessness in HTTP and how it impacts backend services
HTTP is stateless: each request is independent, and the server retains no memory of previous requests. Backend services therefore carry any needed context with every request – cookies, tokens such as JWTs, or session IDs backed by a shared store like Redis. Statelessness makes horizontal scaling simple, since any server can handle any request, but it pushes state management to the client or a shared store.
Intermediate
What is full-text search?
Full-text search indexes the words inside text fields so queries can match documents by keywords and relevance rather than exact string comparison. It is available in databases (e.g., PostgreSQL tsvector, MySQL FULLTEXT indexes, MongoDB text indexes) and in dedicated engines such as Elasticsearch.
What are serverful and serverless backends?
A serverful backend means you manage the entire server, while a serverless backend means you don't have to manage servers – your code runs only when needed on cloud platforms like AWS Lambda.
Example: Imagine you are building a food delivery app like Zomato or Uber Eats.
If you use a serverful backend:
You set up an Express.js server on AWS EC2.
The server is always running, handling all API requests like fetching restaurants, placing orders, and tracking deliveries.
You pay for the server 24/7, even when there are no active users.
If you use a serverless backend:
You use AWS Lambda functions to handle API requests.
When a user places an order, the function runs only for that request and then shuts down.
You only pay for execution time, making it cost-effective.
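For reference, a serverless handler is just an exported function invoked per request – a minimal AWS Lambda sketch in Node.js (the event shape is illustrative):

```javascript
// AWS Lambda invokes this function per request and bills only for execution time
exports.handler = async (event) => {
  const order = JSON.parse(event.body || '{}');
  // ... place the order, e.g., write it to a database ...
  return {
    statusCode: 200,
    body: JSON.stringify({ message: 'Order received', orderId: order.id }),
  };
};
```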
Can you explain single-threaded vs. multi-threaded processing?
Single-threaded programs execute one task at a time, while multi-threaded programs can execute multiple tasks in parallel. However, single-threaded systems can still be asynchronous using event loops, as in Node.js. If I were building a CPU-intensive app like a video editor, I'd go with multi-threading. But for an API server handling multiple users, I'd use a single-threaded, asynchronous model like Node.js to handle requests efficiently (see the sketch below).
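A small sketch of offloading CPU-heavy work with Node's built-in worker_threads module (the computation is illustrative):

```javascript
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // The main thread stays free to serve requests while the worker computes
  const worker = new Worker(__filename);
  worker.on('message', (sum) => console.log('Result from worker:', sum));
} else {
  let sum = 0;
  for (let i = 0; i < 1e9; i++) sum += i; // CPU-intensive loop
  parentPort.postMessage(sum);
}
```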