
Create a Content Aggregator with Python

The need for content aggregators is pretty clear. The internet is full of endless information, and to stay up to date on the latest news or any other type of content, you might find yourself scrolling through several websites every day.

Content aggregation helps us optimize our content consumption: instead of scrolling through five different websites, we only need one, and instead of endlessly scrolling to filter out the content we care about, we are presented with content related to our topics of interest.

In this article, you will learn how to create your own customized content aggregator in Python from scratch.


Prerequisites

To complete this tutorial, you will need:

  • A local development environment for Python 3.6+.
  • Familiarity with Python.
  • A Reddit account, which you need in order to create API credentials in Step 4.

Step 1 — Installing Dependencies

Create a new file called requirements.txt with the following content:

praw==7.4.0

Run the following command to install all dependencies:

pip install -r requirements.txt

In the next section, we will design our content aggregator so that it is easy to add new sources and topics to follow.

Step 2 — Design

In this article, we are going to build a content aggregator with a single source, Reddit. To make it easy to add new sources later, we need to design the project carefully.

We are going to create an abstract Source class, which will serve as the base class for the different sources we want to include (Reddit, Medium, etc.).
For our aggregator, we will create a RedditSource class that inherits from Source.
Finally, we will create a RedditHotProgramming class, which represents a topic we want to fetch content for from that source.

To follow a second topic from Reddit, you would simply create a new class, for example RedditHotNews.

To fetch posts from a different platform, such as Medium, you would create a MediumSource class along with a class for each topic you want to follow.
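To make the design concrete without calling any web API, here is a minimal sketch of the same hierarchy built on a hypothetical in-memory source. StaticSource, StaticHotTopic and StaticHotRust are illustrative names that are not part of the tutorial; they stand in for a real platform such as Medium and its topic classes.

```python
from abc import ABC, abstractmethod


class Source(ABC):
    """Base class: every source knows how to connect and fetch."""

    @abstractmethod
    def connect(self):
        ...

    @abstractmethod
    def fetch(self):
        ...


class StaticSource(Source):
    """Hypothetical source backed by an in-memory dict instead of a web API."""

    def connect(self):
        # A real source would build an API client here.
        return {
            "python": ["https://example.com/py-1", "https://example.com/py-2"],
            "rust": ["https://example.com/rs-1"],
        }

    def fetch(self):
        # Left to the topic classes, mirroring RedditSource.
        pass


class StaticHotTopic(StaticSource):
    """Hypothetical topic class: one subclass per topic we want to follow."""

    topic = "python"

    def __init__(self):
        self.con = self.connect()
        self.items = []

    def fetch(self, limit):
        self.items = self.con[self.topic][:limit]

    def __repr__(self):
        return "\n".join(self.items)


class StaticHotRust(StaticHotTopic):
    # Following another topic on the same source only needs a new subclass.
    topic = "rust"


results = {}
for topic_cls in (StaticHotTopic, StaticHotRust):
    topic = topic_cls()
    topic.fetch(limit=1)
    results[topic_cls.__name__] = str(topic)
    print(topic)
```

Adding a platform means one new Source subclass; adding a topic means one new topic subclass, and nothing else in the program has to change.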

In the next section, we are going to start coding our content aggregator, starting from the Source abstract class.

Step 3 — Creating the Source Class

Create a new file called content_agg.py and import the following modules:

from abc import ABC, abstractmethod
import praw
import os

Now let’s define the abstract Source class.
It has two abstract methods: one to connect to a source (for example, its API) and one to fetch posts from it.

class Source(ABC):

  @abstractmethod
  def connect(self):
    pass

  @abstractmethod
  def fetch(self):
    pass
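Because connect and fetch are marked with @abstractmethod, Python refuses to instantiate Source itself, or any subclass that forgets to implement one of them. The self-contained check below illustrates this; IncompleteSource, CompleteSource and can_instantiate are hypothetical names used only for the demonstration.

```python
from abc import ABC, abstractmethod


class Source(ABC):
    @abstractmethod
    def connect(self):
        pass

    @abstractmethod
    def fetch(self):
        pass


class IncompleteSource(Source):
    # Implements connect() but not fetch(), so it is still abstract.
    def connect(self):
        return None


class CompleteSource(Source):
    # Implements both abstract methods, so it can be instantiated.
    def connect(self):
        return None

    def fetch(self):
        return []


def can_instantiate(cls):
    """Return True if cls() succeeds, False if the ABC machinery raises TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True


print(can_instantiate(Source))            # False
print(can_instantiate(IncompleteSource))  # False
print(can_instantiate(CompleteSource))    # True
```

This is why RedditSource in the next section must define fetch, even though it only contains pass.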

In the next section, we are going to create the RedditSource class which will implement the abstract methods in the Source class.

Step 4 — Creating the Reddit Source class

In this section, we are going to add the RedditSource class to content_agg.py.

This class implements the connection to the Reddit API.
To access Reddit’s API, you will need to generate API credentials (a client ID and a client secret).
The procedure is short; follow the instructions in the reddit-archive repository on GitHub.

Once you have the credentials, store them in environment variables, or hard-code them as constants (only if you do not plan to share your code).
If you stored them in environment variables as I did, you can read them like this:

CLIENT_ID = os.environ.get('REDDIT_CLIENT_ID')
CLIENT_SECRET = os.environ.get('REDDIT_CLIENT_SECRET')
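os.environ.get returns None when a variable is missing, so a typo in a variable name may only surface later, as an error from PRAW or Reddit far from its cause. A small helper can fail early instead. This is a sketch; require_env is a hypothetical name, not part of the tutorial or of PRAW.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        # Treat unset and empty the same way: both would break authentication.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    try:
        CLIENT_ID = require_env("REDDIT_CLIENT_ID")
        CLIENT_SECRET = require_env("REDDIT_CLIENT_SECRET")
    except RuntimeError as err:
        print(err)
```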

Now let’s create the RedditSource class:

class RedditSource(Source):

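  def connect(self):
    # With only a client ID, client secret and user agent, PRAW runs in
    # read-only (application-only) mode, which is enough to read public posts.
    # Reddit asks for a unique, descriptive user agent; adjust 'script/1.0'.
    self.reddit_con = praw.Reddit(client_id=CLIENT_ID,
                      client_secret=CLIENT_SECRET,
                      user_agent='script/1.0')
    return self.reddit_con

  def fetch(self):
    # Implemented by each topic subclass (see the next section).
    pass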

We now have a configured connection to Reddit’s API.
The fetch method is left unimplemented in this class because we will implement it separately for each topic.

In the next section, we will create the RedditHotProgramming class where we will fetch the hot posts from r/programming.

Step 5 — Create the Reddit Hot Programming class

In this section, we will implement a class that fetches hot posts from r/programming.
It relies on the connection to Reddit’s API provided by its parent class, RedditSource.

class RedditHotProgramming(RedditSource):

  def __init__(self) -> None:
    self.reddit_con = super().connect()
    self.hot_submissions = []

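  def fetch(self, limit: int):
    # hot() returns a lazy listing that can be iterated only once; store the
    # results in a list so they can be read (and printed) more than once.
    self.hot_submissions = list(
        self.reddit_con.subreddit('programming').hot(limit=limit))

  def __repr__(self):
    # One URL per line, taken from each hot submission.
    return '\n'.join(submission.url for submission in self.hot_submissions)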

Besides fetch, we implemented the __repr__ method, so printing a RedditHotProgramming object shows our custom representation: the URLs of the hot posts, one per line. This works because print uses __str__, which falls back to __repr__ when a class does not define __str__.
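PRAW’s hot() returns a lazy listing generator, which behaves like a one-shot iterator: once it has been read, iterating it again yields nothing. The sketch below reproduces that behavior offline with a plain generator to show why the fetched submissions are worth storing in a list. FakeSubmission, fake_hot and Topic are hypothetical stand-ins, not PRAW classes.

```python
from collections import namedtuple

# Hypothetical stand-in for a PRAW Submission: only the url attribute matters here.
FakeSubmission = namedtuple("FakeSubmission", ["url"])


def fake_hot(limit):
    """Hypothetical stand-in for subreddit.hot(): a lazy, one-shot generator."""
    for i in range(limit):
        yield FakeSubmission(url=f"https://example.com/post/{i}")


class Topic:
    def __init__(self, store_as_list):
        self.store_as_list = store_as_list
        self.hot_submissions = []

    def fetch(self, limit):
        submissions = fake_hot(limit)
        # Materializing the generator lets the results be read more than once.
        self.hot_submissions = list(submissions) if self.store_as_list else submissions

    def __repr__(self):
        # str() and print() fall back to this method because __str__ is not defined.
        return "\n".join(s.url for s in self.hot_submissions)


lazy = Topic(store_as_list=False)
lazy.fetch(limit=2)
first, second = str(lazy), str(lazy)
print(len(first.splitlines()), len(second.splitlines()))  # 2 0

stored = Topic(store_as_list=True)
stored.fetch(limit=2)
print(len(str(stored).splitlines()), len(str(stored).splitlines()))  # 2 2
```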

In the next section, we will glue everything together and execute our content aggregator.

Step 6 — Gluing Everything Together

To run everything, we will create a RedditHotProgramming instance and fetch a few posts from it. Then we can print the object to get the URL of each hot post.

if __name__ == '__main__':
  reddit_hot_programming = RedditHotProgramming()
  reddit_hot_programming.fetch(limit=10)
  print(reddit_hot_programming)

To run the aggregator, execute python content_agg.py in your terminal.
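Once more topic classes exist (for example the RedditHotNews class suggested in Step 2, which you would have to write first), the main block could loop over them and merge their results. The sketch below assumes only what the tutorial’s topic classes already provide: a fetch(limit=...) method and a string form with one URL per line. aggregate and StaticTopic are hypothetical names; aggregate also drops URLs that appear in more than one topic.

```python
def aggregate(topics, limit=10):
    """Fetch from every topic and return their URLs, without duplicates.

    Assumes each topic has fetch(limit=...) and that str(topic) yields one
    URL per line, as RedditHotProgramming does via its __repr__.
    """
    seen = set()
    merged = []
    for topic in topics:
        topic.fetch(limit=limit)
        for url in str(topic).splitlines():
            if url not in seen:  # keep the first occurrence, preserve order
                seen.add(url)
                merged.append(url)
    return merged


class StaticTopic:
    """Hypothetical offline topic used only to demonstrate aggregate()."""

    def __init__(self, urls):
        self.all_urls = urls
        self.current = []

    def fetch(self, limit):
        self.current = self.all_urls[:limit]

    def __repr__(self):
        return "\n".join(self.current)


print(aggregate([StaticTopic(["https://a.example", "https://b.example"]),
                 StaticTopic(["https://b.example", "https://c.example"])], limit=2))

# With the tutorial's classes (requires Reddit credentials):
# print('\n'.join(aggregate([RedditHotProgramming(), RedditHotNews()], limit=10)))
```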

Conclusion

In this article, you built a content aggregator from scratch. You can now add any platform or topic you want and expand the project.
