Stream-Framework: Build Scalable News Feeds and Activity Streams in Python
Summary
Stream-Framework is a Python library for building news feeds, activity streams, and notification systems. It stores feeds in Cassandra and/or Redis for performance and scalability, which suits applications that need real-time feeds. Asynchronous tasks and reusable components simplify the development of complex feed-based systems.
Introduction
Stream-Framework is a Python library for building news feeds, activity streams, and notification systems on top of the NoSQL databases Cassandra and Redis. Typical uses include GitHub-style activity streams, Twitter-style newsfeeds, Instagram- or Pinterest-style feeds, Facebook-style newsfeeds, and notification systems.
The authors of Stream-Framework also provide a cloud service, Stream, a managed feed solution with clients for Node, Ruby, PHP, Python, Go, Scala, and Java, plus a REST API. It is an alternative for teams that do not work exclusively in Python or that want to offload infrastructure management.
Installation
The recommended way to install Stream-Framework is via pip.
To install the base library:
pip install stream-framework
The base install does not pull in the dependencies for Redis or Cassandra. Install them through the matching extras (the quotes keep shells such as zsh from interpreting the square brackets):
Install stream-framework with Redis dependencies:
pip install "stream-framework[redis]"
Install stream-framework with Cassandra dependencies:
pip install "stream-framework[cassandra]"
Install stream-framework with both Redis and Cassandra dependencies:
pip install "stream-framework[redis,cassandra]"
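After installing, it can help to confirm which optional backends are actually importable. The following is a minimal sketch using only the standard library. The helper name `available_backends` is hypothetical, and the module names are assumptions: the `redis` extra is assumed to provide a top-level `redis` module and the `cassandra` extra a top-level `cassandra` module.

```python
import importlib.util

# Assumed top-level module names for each optional backend.
BACKEND_MODULES = {"redis": "redis", "cassandra": "cassandra"}


def available_backends():
    """Map each backend name to True if its client module can be imported."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in BACKEND_MODULES.items()
    }


if __name__ == "__main__":
    for name, installed in available_backends().items():
        print(f"{name}: {'installed' if installed else 'missing'}")
```

This only checks that the modules can be found; it does not verify that a Redis or Cassandra server is reachable.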
Examples
Let's explore a quick example demonstrating how to publish a "Pin" to all followers, similar to a Pinterest-like application.
First, define an activity for the item:
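import pytz
from django.utils.timezone import make_naive
from stream_framework.activity import Activity

# PinVerb is the application's own verb for "pin" actions; it is not defined here.


def create_activity(pin):
    # Positional arguments: actor, verb, object, target
    activity = Activity(
        pin.user_id,
        PinVerb,
        pin.id,
        pin.influencer_id,
        time=make_naive(pin.created_at, pytz.utc),
        extra_context=dict(item_id=pin.item_id)
    )
    return activity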
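To illustrate what the activity carries, here is a small standalone sketch that does not use the library. `SketchActivity` and `to_naive_utc` are hypothetical names invented for this example; the field layout (actor, verb, object, target, time, extra context) mirrors the arguments passed above, and `to_naive_utc` approximates what the `make_naive(..., pytz.utc)` call does to an aware timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


@dataclass
class SketchActivity:
    """Hypothetical stand-in for an activity; not the library's Activity class."""
    actor_id: int
    verb: str
    object_id: int
    target_id: Optional[int] = None
    time: Optional[datetime] = None
    extra_context: Dict[str, Any] = field(default_factory=dict)


def to_naive_utc(value: datetime) -> datetime:
    """Convert an aware datetime to a naive one expressed in UTC."""
    return value.astimezone(timezone.utc).replace(tzinfo=None)


# A pin created at 12:00 in a UTC+2 timezone is stored as 10:00 naive UTC.
created_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
activity = SketchActivity(
    actor_id=13,
    verb="pin",
    object_id=1,
    target_id=7,
    time=to_naive_utc(created_at),
    extra_context={"item_id": 99},
)
print(activity.time)  # 2024-01-01 10:00:00
```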
Next, define the feed classes. PinFeed is the Redis-backed feed of pins from the people a user follows, and UserPinFeed holds a user's own pins. PinFeed must be defined first because UserPinFeed subclasses it:
from stream_framework.feeds.redis import RedisFeed


class PinFeed(RedisFeed):
    key_format = 'feed:normal:%(user_id)s'


class UserPinFeed(PinFeed):
    key_format = 'feed:user:%(user_id)s'
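The `key_format` strings are ordinary Python %-style templates with a named `user_id` placeholder. The sketch below shows how such a template expands into a storage key; `feed_key` is a hypothetical helper written for this illustration, and the assumption is that the library performs an equivalent substitution internally.

```python
def feed_key(key_format, user_id):
    """Expand a %-style key template with the given user id."""
    return key_format % {"user_id": user_id}


print(feed_key("feed:normal:%(user_id)s", 13))  # feed:normal:13
print(feed_key("feed:user:%(user_id)s", 13))    # feed:user:13
```

Distinct prefixes keep a user's personal feed and their aggregated feed from colliding in the same keyspace.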
Writing to a specific user's feed is straightforward:
feed = UserPinFeed(13)
feed.add(activity)
To publish an activity to multiple followers, Stream-Framework uses a "fanout" mechanism managed by a Manager class. You need to subclass Manager and define how to retrieve follower IDs:
from stream_framework.feed_managers.base import Manager, FanoutPriority


class PinManager(Manager):
    feed_classes = dict(
        normal=PinFeed,
    )
    user_feed_class = UserPinFeed

    def add_pin(self, pin):
        activity = create_activity(pin)
        # add_user_activity adds it to the user feed and starts the fanout
        self.add_user_activity(pin.user_id, activity)

    def get_user_follower_ids(self, user_id):
        # Follow is the application's own Django model of follow relationships
        ids = Follow.objects.filter(target=user_id).values_list('user_id', flat=True)
        return {FanoutPriority.HIGH: ids}


manager = PinManager()
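To make the fanout idea concrete, here is a minimal in-memory sketch of fanout-on-write. It is not the library's implementation: `InMemoryFanout` and its attributes are hypothetical, everything runs synchronously in one process, and the priority labels are plain strings standing in for the priorities returned by `get_user_follower_ids`.

```python
from collections import defaultdict

HIGH, LOW = "HIGH", "LOW"


class InMemoryFanout:
    """Hypothetical fanout-on-write: copy each activity into follower feeds."""

    def __init__(self, followers_by_priority):
        # Maps user_id -> {priority: [follower_id, ...]}
        self.followers = followers_by_priority
        self.user_feeds = defaultdict(list)    # each user's own activities
        self.normal_feeds = defaultdict(list)  # activities from followed users

    def add_user_activity(self, user_id, activity):
        # Write to the author's personal feed first, newest at the front.
        self.user_feeds[user_id].insert(0, activity)
        # Then copy to followers, high-priority followers before low-priority.
        for priority in (HIGH, LOW):
            for follower_id in self.followers.get(user_id, {}).get(priority, []):
                self.normal_feeds[follower_id].insert(0, activity)


fanout = InMemoryFanout({13: {HIGH: [1, 2], LOW: [3]}})
fanout.add_user_activity(13, "pin:42")
print(fanout.user_feeds[13])   # ['pin:42']
print(fanout.normal_feeds[2])  # ['pin:42']
```

The write costs one insert per follower, while reading any single feed is a plain lookup. That tradeoff is why the real fanout is worth moving into background tasks.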
Now, broadcasting a pin to all followers becomes a single line of code:
manager.add_pin(pin)
This method inserts the pin into the user's personal feed and initiates a fanout to all followers' feeds using Celery tasks. In a web framework like Django, you can then display the user's feed:
# django example
from django.contrib.auth.decorators import login_required
from django.shortcuts import render


@login_required
def feed(request):
    '''
    Items pinned by the people you follow
    '''
    feed = manager.get_feeds(request.user.id)['normal']
    activities = list(feed[:25])
    # render() replaces render_to_response(), which newer Django versions removed
    return render(request, 'core/feed.html', {'activities': activities})
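The `feed[:25]` slice in the view reads the newest 25 activities. As a rough illustration of how a feed object can support that syntax, the sketch below keeps items newest-first and implements `__getitem__` for slices. `SketchFeed` is a hypothetical class for this example only and says nothing about how the library's feeds are stored.

```python
class SketchFeed:
    """Hypothetical feed that keeps activities newest-first and supports slicing."""

    def __init__(self):
        self._items = []  # list of (timestamp, activity) pairs, newest first

    def add(self, timestamp, activity):
        self._items.append((timestamp, activity))
        self._items.sort(key=lambda pair: pair[0], reverse=True)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [activity for _, activity in self._items[index]]
        return self._items[index][1]


feed = SketchFeed()
for ts in range(1, 31):
    feed.add(ts, f"pin:{ts}")

page = list(feed[:25])
print(len(page), page[0], page[-1])  # 25 pin:30 pin:6
```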
Why Use Stream-Framework
Stream-Framework is built for high-performance, scalable feed systems. Its design favors heavy writes and extremely light reads: the fanout work happens when an activity is written, so reading a feed stays cheap. Key features include:
- Asynchronous Tasks: All computationally intensive operations, such as fanouts, occur in the background via Celery, ensuring that user interactions remain fast and responsive.
- Reusable Components: The framework provides modular and reusable components, allowing developers to make informed tradeoffs based on specific use cases without imposing rigid structures.
- Full Cassandra and Redis Support: It offers robust integration with both Cassandra and Redis, two leading NoSQL databases for high-volume data.
- Modern Cassandra Integration: The Cassandra storage backend utilizes CQL3 and the latest Python-Driver packages, providing access to modern Cassandra features.
- Optimized for Performance: Built for the highly performant Cassandra 2.1, with compatibility tested for 2.2 and 3.3.