Stream-Framework: Build Scalable News Feeds and Activity Streams in Python

Introduction

Stream-Framework is a Python library for building news feeds, activity streams, and notification systems. It stores feed data in Cassandra or Redis, which lets the same code scale from a small site to a large one. Typical uses include GitHub-style activity streams, Twitter-like newsfeeds, Instagram/Pinterest-style feeds, Facebook-style newsfeeds, and notification systems.

The authors of Stream-Framework also provide a cloud service, Stream, a managed feed service with clients for Node, Ruby, PHP, Python, Go, Scala, and Java, plus a REST API. It is an alternative for teams that do not work exclusively in Python or that want to offload infrastructure management.

Installation

The recommended way to install Stream-Framework is with pip.

To install the base library:

pip install stream-framework

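By default, stream-framework does not install the dependencies required for Redis and Cassandra. Install them as extras, as shown in the commands that follow. In shells that treat square brackets as glob patterns, such as zsh, quote the package spec, for example 'stream-framework[redis]'.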

Install stream-framework with Redis dependencies:

pip install stream-framework[redis]

Install stream-framework with Cassandra dependencies:

pip install stream-framework[cassandra]

Install stream-framework with both Redis and Cassandra dependencies:

pip install stream-framework[redis,cassandra]

Examples

Let's explore a quick example demonstrating how to publish a "Pin" to all followers, similar to a Pinterest-like application.

First, define an activity for the item:

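import pytz
# make_naive is assumed to be Django's helper, since the rest of the example uses Django
from django.utils.timezone import make_naive
from stream_framework.activity import Activity

# PinVerb is the application's own verb class, defined elsewhere

def create_activity(pin):
    activity = Activity(
        pin.user_id,         # actor: the user who pinned
        PinVerb,             # verb: what happened
        pin.id,              # object: the pin itself
        pin.influencer_id,   # target: the user the pin came from
        time=make_naive(pin.created_at, pytz.utc),  # stored as naive UTC
        extra_context=dict(item_id=pin.item_id)
    )
    return activity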

Next, define the feed classes. PinFeed is a general feed type backed by Redis, and UserPinFeed represents a user's personal feed. PinFeed must be defined first because UserPinFeed subclasses it:

from stream_framework.feeds.redis import RedisFeed

class PinFeed(RedisFeed):
    key_format = 'feed:normal:%(user_id)s'

class UserPinFeed(PinFeed):
    key_format = 'feed:user:%(user_id)s'

Writing to a specific user's feed is straightforward:

feed = UserPinFeed(13)
feed.add(activity)
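
The key_format strings in the feed classes are ordinary Python %-style templates. As a rough illustration, assuming the feed fills the template with the user ID it was constructed with, the storage keys for user 13 could be built as follows. The helper build_feed_key is hypothetical and is not part of Stream-Framework:

```python
def build_feed_key(key_format, user_id):
    # Hypothetical helper: fill a %-style template with the user id.
    return key_format % {'user_id': user_id}

# The two templates used by UserPinFeed and PinFeed in this example
user_feed_key = build_feed_key('feed:user:%(user_id)s', 13)
normal_feed_key = build_feed_key('feed:normal:%(user_id)s', 13)

print(user_feed_key)
print(normal_feed_key)
```

Distinct prefixes keep a user's personal feed and the feed of pins from people they follow under separate keys, even though both classes share the same Redis backend.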

To publish an activity to multiple followers, Stream-Framework uses a "fanout" mechanism managed by a Manager class. You need to subclass Manager and define how to retrieve follower IDs:

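from stream_framework.feed_managers.base import Manager, FanoutPriority

class PinManager(Manager):
    # feeds that receive the activity during the fanout
    feed_classes = dict(
        normal=PinFeed,
    )
    # feed that stores the user's own activity
    user_feed_class = UserPinFeed

    def add_pin(self, pin):
        # create_activity is the module-level function defined earlier
        activity = create_activity(pin)
        # add user activity adds it to the user feed, and starts the fanout
        self.add_user_activity(pin.user_id, activity)

    def get_user_follower_ids(self, user_id):
        # Follow is the application's own model recording who follows whom
        ids = Follow.objects.filter(target=user_id).values_list('user_id', flat=True)
        return {FanoutPriority.HIGH: ids}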

manager = PinManager()

Now, broadcasting a pin to all followers becomes a single line of code:

manager.add_pin(pin)

This method inserts the pin into the user's personal feed and initiates a fanout to all followers' feeds using Celery tasks. In a web framework like Django, you can then display the user's feed:

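# django example
from django.contrib.auth.decorators import login_required
from django.shortcuts import render

@login_required
def feed(request):
    '''
    Items pinned by the people you follow
    '''
    # get_feeds returns a dict keyed by the names used in feed_classes
    feed = manager.get_feeds(request.user.id)['normal']
    # read only the first 25 activities from the feed
    activities = list(feed[:25])
    # render replaces render_to_response, which was removed in Django 3.0
    return render(request, 'core/feed.html', {'activities': activities})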
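
The write path described above (personal feed first, then a fanout to followers) can be sketched in plain Python. This is a minimal in-memory illustration of the fanout-on-write idea, not Stream-Framework's implementation: the dictionaries stand in for Redis or Cassandra, fanout_task stands in for a Celery task, and every name here, including the chunk_size parameter, is hypothetical.

```python
from collections import defaultdict

# In-memory stand-ins for feed storage (a real deployment would use Redis or Cassandra)
user_feeds = defaultdict(list)    # activities a user created
normal_feeds = defaultdict(list)  # activities from the people a user follows

def chunks(ids, size):
    # Split follower ids into batches so each task handles a bounded amount of work.
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def fanout_task(follower_ids, activity):
    # Stand-in for a background task: prepend the activity to each follower's feed.
    for follower_id in follower_ids:
        normal_feeds[follower_id].insert(0, activity)

def add_user_activity(user_id, activity, followers_by_priority, chunk_size=2):
    # 1. Write to the author's personal feed.
    user_feeds[user_id].insert(0, activity)
    # 2. Fan out to followers, one batch per task, for each priority group.
    tasks_run = 0
    for priority, follower_ids in followers_by_priority.items():
        for batch in chunks(follower_ids, chunk_size):
            fanout_task(batch, activity)  # a task queue would run this asynchronously
            tasks_run += 1
    return tasks_run

# User 13 pins something; followers 1-3 are high priority, follower 4 is low priority.
tasks_run = add_user_activity(13, 'pin:1', {'HIGH': [1, 2, 3], 'LOW': [4]})

# Reading is just a slice of a list that was already built at write time.
latest_for_follower_3 = normal_feeds[3][:25]
```

The cost lands on the write: one pin produces a write per follower, split into batches. In exchange, reading a follower's feed needs no joins or aggregation at request time.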

Why Use Stream-Framework

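Stream-Framework is built for high-performance, scalable feed systems. It trades heavy writes for extremely light reads: activities are copied into followers' feeds when they are published, so displaying a feed is a simple read. Key features include: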

Asynchronous Tasks: All computationally intensive operations, such as fanouts, occur in the background via Celery, ensuring that user interactions remain fast and responsive.
Reusable Components: The framework provides modular and reusable components, allowing developers to make informed tradeoffs based on specific use cases without imposing rigid structures.
Full Cassandra and Redis Support: It offers robust integration with both Cassandra and Redis, two leading NoSQL databases for high-volume data.
Modern Cassandra Integration: The Cassandra storage backend utilizes CQL3 and the latest Python-Driver packages, providing access to modern Cassandra features.
Optimized for Performance: Built for the highly performant Cassandra 2.1, with compatibility tested for 2.2 and 3.3.
