Google Pub/Sub

Introduction: The Power of Event-Driven Architectures

Have you ever wondered how large-scale systems like Google, Netflix, or Spotify manage to process millions of real-time events, such as sending notifications, updating dashboards, or recommending personalized content, all without missing a beat? The secret lies in event-driven architectures, and at the heart of such systems is a powerful tool: Google Cloud Pub/Sub.

In a world where businesses demand seamless integration and instant processing of data, traditional point-to-point messaging systems often fall short. Enter Google Pub/Sub, a fully managed messaging service designed to decouple applications, handle massive data streams, and enable real-time communication with unparalleled reliability and scalability.

Why Pub/Sub is Essential for Modern Systems
Modern applications require more than just static workflows; they thrive on dynamic, event-driven processes that can adapt and respond to real-time data. Here are some key reasons why Google Pub/Sub has become indispensable:

  • Real-Time Processing: From IoT devices generating millions of sensor readings to financial systems detecting fraudulent transactions, Pub/Sub ensures these events are captured and processed in real time.

  • Scalability: Pub/Sub seamlessly scales to handle millions of messages per second, supporting the ever-growing demands of modern businesses.

  • Decoupling of Services: It simplifies architectures by allowing independent components to communicate asynchronously, improving maintainability and flexibility.

  • Global Reach: Built on Google's global infrastructure, Pub/Sub offers low-latency message delivery across regions.

  • Integration-Friendly: It integrates seamlessly with other Google Cloud services like BigQuery, Dataflow, Cloud Functions, and Cloud Storage, making it a cornerstone for building robust, interconnected systems.

Google Cloud Pub/Sub is a fully managed messaging service designed for real-time communication, enabling independent applications to send and receive messages seamlessly. It's a powerful tool for building scalable and reliable systems.

In this tutorial, we'll delve into the core concepts of Pub/Sub, explore practical examples using the command-line interface (CLI) and Python SDK, and discuss various use cases.

Key Concepts

  • Topic: A named resource to which messages are published.
  • Subscription: A named resource that receives messages from a specific topic.
  • Publisher: An application that sends messages to a topic.
  • Subscriber: An application that receives messages from a subscription.
  • Message: The core unit of communication in Pub/Sub, consisting of data (payload) and attributes (key-value metadata).
  • Acknowledgment (Ack): Subscribers must confirm receipt of a message by acknowledging it. Unacknowledged messages are redelivered to ensure reliable processing.
  • Push vs. Pull Subscriptions
    • Pull: Subscribers explicitly fetch messages.
    • Push: Pub/Sub pushes messages to an HTTP(S) endpoint.
  • Dead Letter Queue (DLQ): A separate topic for routing undeliverable messages after a defined number of delivery attempts. Useful for troubleshooting (see the example after this list).
  • Retention Policy: Controls how long unacknowledged messages are retained (default: 7 days); acknowledged messages can optionally be retained as well.
  • Filters: Enable subscribers to receive only messages that match specific criteria based on attributes.
  • Ordering Keys: Ensures messages with the same ordering key are delivered in the order they were published.
  • Exactly Once Delivery: Guarantees no duplicate messages are delivered when configured correctly.
  • Flow Control: Helps subscribers manage the rate of message delivery to prevent being overwhelmed.
  • IAM Permissions and Roles: Pub/Sub uses Google Cloud IAM to control access. Key roles:
    • Publisher: roles/pubsub.publisher
    • Subscriber: roles/pubsub.subscriber
    • Viewer: roles/pubsub.viewer
  • Message Deduplication: Prevents duplicate messages caused by retries or network issues.
  • Message Expiration (TTL): Automatically drops messages from a subscription after a specified time-to-live, reducing storage costs.
  • Backlog: The collection of unacknowledged messages in a subscription, enabling message replay if required.
  • Schema Registry: Defines structured message formats (e.g., Avro, Protocol Buffers), ensuring consistent data validation.
  • Batching: Groups multiple messages together for efficient publishing and delivery, improving performance.
  • Regional Endpoints: Publish messages to region-specific endpoints for compliance and reduced latency.
  • Cross-Project Topics and Subscriptions: Share Pub/Sub resources across projects for multi-project architectures.
  • Monitoring and Metrics: Integrated with Cloud Monitoring, providing insights like throughput, acknowledgment latency, and backlog size.
  • Snapshot: Captures the state of a subscription at a specific time, enabling message replay from that point.
  • Message Encryption: Messages are encrypted in transit and at rest by default, with the option to use Customer-Managed Encryption Keys (CMEK).
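
Several of these features are configured on the subscription itself. As an illustrative sketch (the resource names and the filter attribute are placeholders; a dead-letter setup additionally needs IAM grants for the Pub/Sub service account), one gcloud command can attach a dead-letter topic, a filter, and message ordering:

```bash
# Assumes my-topic and my-dead-letter-topic already exist
gcloud pubsub subscriptions create my-filtered-sub \
  --topic=my-topic \
  --dead-letter-topic=my-dead-letter-topic \
  --max-delivery-attempts=5 \
  --message-filter='attributes.type = "order"' \
  --enable-message-ordering
```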

Getting Started

Pre-requisites

  • Set Up Your Google Cloud Project:
    • Create a new Google Cloud project or select an existing one.
    • Enable the Pub/Sub API.
  • Install the Google Cloud SDK:
    • Download and install the SDK from the official website.
    • Authenticate your Google Cloud account:

```bash
gcloud auth login
```
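If the Pub/Sub API isn't enabled on your project yet, you can also do that from the CLI:

```bash
gcloud services enable pubsub.googleapis.com
```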

Creating a Topic and Subscription

```bash
# Create a topic
gcloud pubsub topics create my-topic

# Create a subscription
gcloud pubsub subscriptions create my-subscription --topic my-topic
```
```python
# Using the Python SDK
from google.cloud import pubsub_v1

project_id = "your-project-id"

# Create a topic with a Publisher client
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, "my-topic")
publisher.create_topic(request={"name": topic_path})

# Create a subscription with a Subscriber client
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, "my-subscription")
subscriber.create_subscription(request={"name": subscription_path, "topic": topic_path})
```

Publishing Messages

```bash
gcloud pubsub topics publish my-topic --message "Hello, Pub/Sub!"
```
```python
# Publish a message using the Python SDK
message = "Hello, Pub/Sub!"
future = publisher.publish(topic_path, message.encode("utf-8"))
print(future.result())  # prints the server-assigned message ID
```
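Messages can also carry attributes (the metadata that subscription filters match on) and an ordering key. A minimal sketch, assuming ordering is enabled on the subscription; the attribute name, key, and region are illustrative:

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient(
    # Required by the client library before an ordering_key may be used
    publisher_options=pubsub_v1.types.PublisherOptions(enable_message_ordering=True),
    # Ordering guarantees apply within a region, so pin a regional endpoint
    client_options={"api_endpoint": "us-east1-pubsub.googleapis.com:443"},
)
topic_path = publisher.topic_path("your-project-id", "my-topic")

future = publisher.publish(
    topic_path,
    b"Hello, Pub/Sub!",
    event_type="user-signup",    # attribute: arbitrary key-value metadata
    ordering_key="customer-42",  # same key => delivered in publish order
)
print(future.result())
```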

Subscribing to Messages

```bash
gcloud pubsub subscriptions pull my-subscription --auto-ack
```
```python
# Using the Python SDK
import time

def callback(message):
    print(f"Received message: {message.data.decode('utf-8')}")
    message.ack()

subscriber.subscribe(subscription_path, callback=callback)

# Keep the main thread alive; messages arrive on background threads
while True:
    time.sleep(60)
```
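If a subscriber risks being overwhelmed, client-side flow control caps how many messages are outstanding at once. A sketch with illustrative limits:

```python
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("your-project-id", "my-subscription")

# Hold at most 100 messages / 10 MiB in memory before pausing delivery
flow_control = pubsub_v1.types.FlowControl(max_messages=100, max_bytes=10 * 1024 * 1024)

def callback(message):
    print(f"Received: {message.data.decode('utf-8')}")
    message.ack()

streaming_pull_future = subscriber.subscribe(
    subscription_path, callback=callback, flow_control=flow_control
)
try:
    streaming_pull_future.result(timeout=60)  # block; stop after 60s for this demo
except TimeoutError:
    streaming_pull_future.cancel()
```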

Use Cases with Practical Examples

Triggers with Cloud Functions

Use Case: Automatically trigger a process whenever a new message is published to a Pub/Sub topic. For example, sending an email notification when a new user registers.

Solution: Use Cloud Functions to listen to Pub/Sub topics and handle messages.

Steps:
Pre-requisites:
  • Enable the APIs run.googleapis.com, cloudbuild.googleapis.com, and eventarc.googleapis.com (a one-line gcloud command is shown after this block).
  • Grant the role roles/logging.logWriter to the default compute service account:

```bash
gcloud projects add-iam-policy-binding $GCP_PROJECT \
  --member="serviceAccount:<ID>-compute@developer.gserviceaccount.com" \
  --role="roles/logging.logWriter"
```
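To enable the required APIs from the CLI:

```bash
gcloud services enable run.googleapis.com cloudbuild.googleapis.com eventarc.googleapis.com
```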
  1. Create a Pub/Sub topic:

```bash
gcloud pubsub topics create user-registration
```


  2. Write a Cloud Function:

```python
import base64

def send_email_notification(event, context):
    # Pub/Sub delivers the message payload base64-encoded in event['data']
    message = base64.b64decode(event['data']).decode('utf-8')
    print(f"Sending email notification for: {message}")
```

  3. Deploy the Cloud Function:

```bash
gcloud functions deploy sendEmailNotification \
  --runtime python39 \
  --trigger-topic user-registration \
  --entry-point send_email_notification
```

  4. Publish a message to test:

```bash
gcloud pubsub topics publish user-registration --message "New User: John Doe"
```

Result: The Cloud Function logs the email notification process.
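You can verify this from the CLI as well, for example:

```bash
gcloud functions logs read sendEmailNotification --limit 10
```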


Streaming Data to BigQuery

Use Case: Process and analyze large volumes of event data, such as IoT sensor readings or transaction logs, by streaming messages from Pub/Sub to BigQuery.

Solution: Use a pre-built Dataflow template to ingest data into BigQuery.

Steps:

  1. Create a Pub/Sub topic and subscription:

```bash
gcloud pubsub topics create sensor-data
gcloud pubsub subscriptions create sensor-data-sub --topic=sensor-data
```


  2. Enable the BigQuery API, then create a BigQuery dataset and table. Save the schema below as schema.json:

```json
[
  {
    "name": "temperature",
    "type": "FLOAT",
    "mode": "NULLABLE"
  },
  {
    "name": "humidity",
    "type": "FLOAT",
    "mode": "NULLABLE"
  }
]
```

```bash
bq mk sensor_dataset
bq mk --table sensor_dataset.sensor_table ./schema.json
```


  3. Enable the Dataflow API and run a job from the Google-provided Pub/Sub-to-BigQuery template:

```bash
# Enable the Dataflow API (also possible from the Cloud Console)
gcloud services enable dataflow.googleapis.com

# Run the Dataflow job
gcloud dataflow jobs run pubsub-to-bigquery \
  --gcs-location gs://dataflow-templates-us-central1/latest/PubSub_to_BigQuery \
  --region us-central1 \
  --parameters inputTopic=projects/your-project-id/topics/sensor-data,outputTableSpec=your-project-id:sensor_dataset.sensor_table
```


  4. Publish messages to the topic:

```bash
gcloud pubsub topics publish sensor-data --message '{"temperature": 25, "humidity": 60}'
```
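To generate a small stream instead of a single message, here is a quick sketch in Python (the readings are made-up values matching schema.json):

```python
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("your-project-id", "sensor-data")

# Publish a handful of simulated sensor readings
for reading in [{"temperature": 25.0, "humidity": 60.0},
                {"temperature": 26.1, "humidity": 58.5},
                {"temperature": 24.7, "humidity": 61.2}]:
    publisher.publish(topic_path, json.dumps(reading).encode("utf-8"))
```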


Result: The message data should appear in the BigQuery table for further analysis.
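You can confirm from the CLI with a standard SQL query:

```bash
bq query --use_legacy_sql=false \
  'SELECT temperature, humidity FROM `sensor_dataset.sensor_table` LIMIT 10'
```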


Logging Events with Sinks

Use Case: Centralize and process logs by routing Cloud Logging events to a Pub/Sub topic for downstream processing, like alerting or archiving.

Solution: Create a logging sink targeting a Pub/Sub topic.

Steps:

  1. Create a Pub/Sub topic:

```bash
gcloud pubsub topics create log-events
```
  2. Create a logging sink targeting the topic (add --log-filter to scope which log entries are exported):

```bash
gcloud logging sinks create log-sink \
  pubsub.googleapis.com/projects/your-project-id/topics/log-events
```
  3. Grant the sink's writer service account (printed when the sink is created) permission to publish:

```bash
gcloud pubsub topics add-iam-policy-binding log-events \
  --member=serviceAccount:service-account-email \
  --role=roles/pubsub.publisher
```
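  4. Create a subscription on the topic (the subscriber below assumes the name log-events-sub):

```bash
gcloud pubsub subscriptions create log-events-sub --topic=log-events
```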
  5. Create a subscriber to process the logs:

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path('your-project-id', 'log-events-sub')

def callback(message):
    print(f"Received log: {message.data.decode('utf-8')}")
    message.ack()

subscriber.subscribe(subscription_path, callback=callback)
print("Listening for log events...")
# Keep the process alive (e.g., a time.sleep loop) as in the earlier example
```

Result: Log messages flow into Pub/Sub and can be processed by your custom subscriber.

Streaming Messages to Cloud Storage

Use Case: Archive real-time data, such as application usage metrics or user activity logs, to Cloud Storage for long-term storage.

Solution: Use Dataflow to write Pub/Sub messages to a Cloud Storage bucket.

Steps:

  1. Create a Pub/Sub topic and subscription:

```bash
gcloud pubsub topics create app-metrics
gcloud pubsub subscriptions create app-metrics-sub --topic=app-metrics
```
  2. Create a Cloud Storage bucket:

```bash
gsutil mb gs://your-bucket-name
```
  3. Run a Dataflow job from the Google-provided Pub/Sub-to-Cloud-Storage text template:

```bash
gcloud dataflow jobs run pubsub-to-gcs \
  --gcs-location gs://dataflow-templates-us-central1/latest/Cloud_PubSub_to_GCS_Text \
  --region us-central1 \
  --parameters inputTopic=projects/your-project-id/topics/app-metrics,outputDirectory=gs://your-bucket-name/data/,outputFilenamePrefix=metrics,outputFileSuffix=.json
```
  4. Publish messages:

```bash
gcloud pubsub topics publish app-metrics --message '{"event": "page_view", "user": "123"}'
```

Result: Messages are written as JSON files in the specified Cloud Storage bucket.
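The template writes output in windowed batches, so files may take a few minutes to appear. You can list them with:

```bash
gsutil ls gs://your-bucket-name/data/
```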

Conclusion
Google Pub/Sub is a versatile tool for building scalable and reliable applications. By understanding the core concepts and leveraging the CLI and Python SDK, you can effectively utilize Pub/Sub to solve a variety of real-world problems.
