
Object-Oriented Programming (OOP)

What is Object-Oriented Programming?
A style of programming that emphasizes the use of objects to represent and process data in a program.

Basic Concepts

We will use an example to motivate the use of the OOP paradigm.

Example:
Suppose we are required to write a program to simulate the interactions between users on a Social Media platform.

For the basic requirements, we need to be able to:

  1. Represent each user's data (username, joined date, friends, posts)
  2. Add a new friend
  3. Publish a post
  4. Like a post

Basic Solution

A basic solution would be to store each user's data as a dict, and use functions to manipulate the data.

{
    'username': 'john_doe',
    'joined_date': 'YYYY-MM-DD',
    'friends': [...], # list of usernames
    'posts': [
        {
            'title': 'Post 1',
            'text': 'A new post',
            'likes': [...] # list of usernames
        },
        ... # other posts 
    ],
}
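
To make the layout concrete, here is a small sketch that builds one such dict and reads values out of it. The usernames, date, and post content are made-up placeholder values, not data from a real platform.

```python
# Hypothetical sample data following the structure described above.
user = {
    'username': 'john_doe',
    'joined_date': '2018-04-20',
    'friends': ['mike_smith'],
    'posts': [
        {
            'title': 'Post 1',
            'text': 'A new post',
            'likes': ['mike_smith'],
        },
    ],
}

# Top-level values are read by key.
friend_count = len(user['friends'])

# Nested values are reached by chaining keys and list indexes.
first_post = user['posts'][0]
like_count = len(first_post['likes'])

print(friend_count, like_count)  # prints: 1 1
```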

Requirement 1: Represent user data

def create_user(username, joined_date):
    user = {
        'username': username,
        'joined_date': joined_date,
        'friends': [], # no friends for new user
        'posts': [], # no posts for new user
    }

    return user

>>> johndoe = create_user('johndoe', '2018-04-20') # creating a new user dictionary
>>> johndoe
{
    'friends': [],
    'joined_date': '2018-04-20',
    'posts': [],
    'username': 'johndoe'
}
>>> mikesmith = create_user('mikesmith', '2020-10-31') # creating a new user dictionary
>>> mikesmith
{
    'friends': [],
    'joined_date': '2020-10-31',
    'posts': [],
    'username': 'mikesmith'
}

Requirement 2: Add friend

def add_friend(user1, user2):
    username1 = user1['username']
    username2 = user2['username']

    user1['friends'].append(username2)
    user2['friends'].append(username1)

>>> add_friend(johndoe, mikesmith) # adding friends
>>> johndoe['friends']
['mikesmith']
>>> mikesmith['friends']
['johndoe']

Requirement 3: Publish post

def publish_post(user, post):
    user['posts'].append(post)

>>> post = {
...     'title': 'Post 1',
...     'text': 'A new post',
...     'likes': []
... }
>>> publish_post(johndoe, post) # publish a new post
>>> johndoe['posts']
[
    {
        'likes': [],
        'text': 'A new post',
        'title': 'Post 1'
    }
]

Requirement 4: Like post

def like_post(user, post):
    username = user['username']

    post['likes'].append(username)

>>> like_post(mikesmith, post) # liking a post
>>> johndoe['posts']
[
    {
        'likes': ['mikesmith'],
        'text': 'A new post',
        'title': 'Post 1'
    }
]

OOP Solution

The Basic Solution already solves the requirements for the program.

In fact, what we have done is conceptually in line with the OOP paradigm.

Recall:
Object-Oriented Programming is a style of programming that emphasizes the use of objects to represent and process data in a program.

What is an Object?
In the simplest terms, an object is a value of some data type or data structure.

Strings, integers, booleans, and lists are all objects.

Even the dict we've been using to represent the user data is an object.

There are two types of features that make objects powerful: properties (the data an object holds) and methods (the functions that operate on that data).

Conceptually, the key-value pairs in the user dict are like an object's properties.
And the functions we've defined to manipulate the user dict are the methods associated with that specific type of data structure.
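
Python's built-in objects already work this way. The following sketch, which uses only built-in types, shows properties being read and methods being called on ordinary values. The username 'janedoe' is made up for illustration.

```python
# A complex number exposes its parts as properties (no parentheses).
z = complex(3, 4)
real_part = z.real
imag_part = z.imag

# A list exposes behaviour as methods (called with parentheses).
friends = ['mikesmith']
friends.append('janedoe')  # 'janedoe' is a made-up username

# A string method returns a new string rather than changing the original.
shout = 'a new post'.upper()
```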

However, in order to create an object, we need to define its structure. This is accomplished with a class.

What is a Class?
A class is a blueprint of an object's structure. Just as a house requires a blueprint that defines its structure, an object requires a class in order to be constructed.

Specifically, a class defines the Properties and Methods that its objects possess.

To define a class, we use a special keyword called class:

>>> class User:
...     pass

And we create an object (aka instance) of the class by calling it like a function. This is called instantiation.

>>> user = User()
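
As a quick sanity check, the sketch below instantiates the empty class twice and inspects the results with the built-in isinstance function and the is operator. The variable names are chosen only for illustration.

```python
class User:
    pass

first_user = User()
second_user = User()

# Both objects were built from the same blueprint...
both_are_users = isinstance(first_user, User) and isinstance(second_user, User)

# ...but each call to User() produces a separate object.
are_same_object = first_user is second_user
```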

Note:
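pass is a Python statement that does nothing. It acts as a placeholder wherever an indented block is required but there is no code to put in it. Leaving the block empty results in an error; the exact message depends on the Python version, for example: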

SyntaxError: unexpected EOF while parsing

Right now the user object does not have any properties defined.

We can assign and access properties on an object using the <object>.<property> syntax:

>>> user.username = 'johndoe'
>>> user.username
'johndoe'

Note:
Attempting to access a property that does not exist on an object will result in an error.

>>> user.name
AttributeError: 'User' object has no attribute 'name'
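
If it is not known in advance whether a property has been set, the built-in hasattr and getattr functions offer a way to check without triggering the error. This is a minimal sketch; the fallback value 'unknown' is an arbitrary choice.

```python
class User:
    pass

user = User()
user.username = 'johndoe'

has_username = hasattr(user, 'username')  # property was assigned above
has_name = hasattr(user, 'name')          # property was never assigned

# getattr accepts a default that is returned when the property is missing.
name = getattr(user, 'name', 'unknown')

# Accessing the missing property directly raises AttributeError.
try:
    user.name
except AttributeError as error:
    message = str(error)
```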

We can even combine the process of instantiating the object and initializing its properties into a single function create_user_object.

This ensures that every User object we create has the expected properties defined when we use the create_user_object function.

def create_user_object(username, joined_date):
    user = User()

    user.username = username
    user.joined_date = joined_date
    user.friends = [] # no friends for new user
    user.posts = [] # no post for new user

    return user

>>> johndoe = create_user_object('johndoe', '2018-04-20')
>>> johndoe.username, johndoe.joined_date
('johndoe', '2018-04-20')
>>> mikesmith = create_user_object('mikesmith', '2020-10-31')
>>> mikesmith.username, mikesmith.joined_date
('mikesmith', '2020-10-31')
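
One benefit of funnelling construction through a single function is that each object receives its own fresh lists. The sketch below repeats the definitions above so it can run on its own, then checks that changing one user's friends list leaves the other untouched.

```python
class User:
    pass

def create_user_object(username, joined_date):
    user = User()
    user.username = username
    user.joined_date = joined_date
    user.friends = []  # a new list is created on every call
    user.posts = []
    return user

johndoe = create_user_object('johndoe', '2018-04-20')
mikesmith = create_user_object('mikesmith', '2020-10-31')

johndoe.friends.append('mikesmith')

# Only johndoe's list changed; mikesmith still has an empty list.
print(johndoe.friends)    # prints: ['mikesmith']
print(mikesmith.friends)  # prints: []
```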

Now that we've looked at properties, let's move on to methods.

Methods are functions that are bound to a particular class and are used by objects of that class.

We define a method on a class like this:

class MyClass:
    def my_method(self, ...):
        pass

And call it like this:

>>> my_object = MyClass()
>>> my_object.my_method(...)

It is similar to a function definition except for 2 notable differences:

  1. The definition exists within the indentation block of the class definition.

    • This means that most of the code relating to the class is bundled up in the class definition.
    • It's now clear the method is meant only for objects of that class.
  2. The first argument is always the current object calling the method. By convention, that first argument is named self.

    • We don't have to explicitly pass in the object for its methods to gain access to it.

class User:
    # Requirement 2: Add friend
    def add_friend(self, new_friend):
        username = self.username
        friend_username = new_friend.username

        self.friends.append(friend_username)
        new_friend.friends.append(username)

    # Requirement 3: Publish post
    def publish_post(self, post):
        self.posts.append(post)

    # Requirement 4: Like post
    def like_post(self, post):
        username = self.username
        post['likes'].append(username) # append is a method on the `list` object
        
# Requirement 1: Represent user data
def create_user_object(username, joined_date):
    user = User()

    user.username = username
    user.joined_date = joined_date
    user.friends = [] # no friends for new user
    user.posts = [] # no post for new user

    return user

>>> # Requirement 1: Represent user data
>>> johndoe = create_user_object('johndoe', '2018-04-20')
>>> mikesmith = create_user_object('mikesmith', '2020-10-31')
>>>
>>> # Requirement 2: Add friend
>>> johndoe.add_friend(mikesmith) # instead of add_friend(johndoe, mikesmith)
>>> johndoe.friends
['mikesmith']
>>> mikesmith.friends
['johndoe']
>>>
>>> # Requirement 3: Publish post
>>> post = {
...     'title': 'Post 1',
...     'text': 'A new post',
...     'likes': []
... }
>>> johndoe.publish_post(post) # instead of publish_post(johndoe, post)
>>> johndoe.posts
[
    {
        'likes': [],
        'text': 'A new post',
        'title': 'Post 1'
    }
]
>>>
>>> # Requirement 4: Like post
>>> mikesmith.like_post(post) # instead of like_post(mikesmith, post)
>>> post
{
    'likes': ['mikesmith'],
    'text': 'A new post',
    'title': 'Post 1'
}
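
To see what self stands for, it can help to call a method through the class itself and pass the object explicitly. The sketch below uses a trimmed-down User with a single method and sets its posts property by hand; it suggests that object.method(...) is a shorthand for Class.method(object, ...). The second post is a made-up example.

```python
class User:
    def publish_post(self, post):
        self.posts.append(post)

johndoe = User()
johndoe.posts = []

# The usual call: Python passes johndoe as `self` automatically.
johndoe.publish_post({'title': 'Post 1', 'text': 'A new post', 'likes': []})

# The equivalent explicit call: we pass johndoe as `self` ourselves.
User.publish_post(johndoe, {'title': 'Post 2', 'text': 'Another post', 'likes': []})

titles = [post['title'] for post in johndoe.posts]
print(titles)  # prints: ['Post 1', 'Post 2']
```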

Now that we have moved most of the code into the User class, it is easy to see which methods an object can use.
Depending on the text editor or IDE you're using, you could even get auto-complete features.

It would be nice if the process for instantiating and initializing a User was also bundled together in the class definition with the rest of the code.
If only there was a way to automatically initialize the properties of an object at the same time when we instantiate it via MyClass().

Fortunately, Python provides a solution for this.
There is a set of special method definitions that Python watches out for in a class.
If present, these special methods enhance the functionalities of the classes that define them and their objects.

These methods are commonly referred to as dunder (double-underscore) methods, due to their naming convention (def __methodname__(self, ...)).

One of those special methods is the __init__ method.
Whenever we construct an object (i.e., by calling MyClass(...)), Python automatically calls the __init__ method in the background after the object has been instantiated.

>>> # When we do this 👇
>>> my_object = MyClass(...)

>>> # Python does roughly this 👇 for us if __init__ is defined
>>> my_object = MyClass.__new__(MyClass) # create an empty, uninitialized object
>>> my_object.__init__(...)
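
A small experiment can confirm this behaviour. In the sketch below, Greeter is a hypothetical class whose __init__ stores its argument; we never call __init__ ourselves, yet the properties are set by the time the object is returned.

```python
class Greeter:
    def __init__(self, name):
        # Runs automatically each time Greeter(...) is called.
        self.name = name
        self.greeting = 'Hello, ' + name

greeter = Greeter('johndoe')

print(greeter.greeting)  # prints: Hello, johndoe
```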

We can now move the initialization process for User objects from create_user_object to the __init__ method.

class User:
    # Requirement 1: Represent user data
    def __init__(self, username, joined_date):
        # user = User() # we don't need this

        self.username = username
        self.joined_date = joined_date
        self.friends = [] # no friends for new user
        self.posts = [] # no post for new user

        # return user # we don't need this

    # Requirement 2: Add friend
    def add_friend(self, new_friend):
        username = self.username
        friend_username = new_friend.username

        self.friends.append(friend_username)
        new_friend.friends.append(username)

    # Requirement 3: Publish post
    def publish_post(self, post):
        self.posts.append(post)

    # Requirement 4: Like post
    def like_post(self, post):
        username = self.username
        post['likes'].append(username) # Note: .append is a method for `list` objects

Now all the code related to User is defined in the class definition.
Both the properties and methods are visible in the same location.

>>> johndoe = User('johndoe', '2018-04-20')
>>> mikesmith = User('mikesmith', '2020-10-31')
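
Other dunder methods plug into a class the same way __init__ does. As a sketch only, the trimmed-down User below adds a __repr__ method, which Python calls when it needs a text representation of an object. The __repr__ method and its output format are assumptions added for illustration and are not part of the class defined above.

```python
class User:
    def __init__(self, username, joined_date):
        self.username = username
        self.joined_date = joined_date
        self.friends = []
        self.posts = []

    # Hypothetical addition: Python calls __repr__ whenever it needs
    # a text representation of the object, e.g. in the interactive shell.
    def __repr__(self):
        return f"User(username={self.username!r}, joined_date={self.joined_date!r})"

johndoe = User('johndoe', '2018-04-20')
text = repr(johndoe)
print(text)  # prints: User(username='johndoe', joined_date='2018-04-20')
```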

Conclusion

Naming Convention

Python's built-in classes (such as int, str, list, and dict) are typically written in lowercase because they are used frequently and most Python developers already recognize them as classes. When defining custom classes, however, the standard is to use PascalCase so that readers of the code recognize them as classes at a glance.

Documentation

When defining a class, it is advisable to document it with a multi-line string (a docstring) placed directly under the class statement, and to document its methods in the same manner.

>>> class MyClass:
...     """Description and purpose of the class goes here.
...     Also describe the properties that belong to this class.
...     The rest of the class definition goes below
...     """
...     
...     def my_method(self, x, y):
...         """Description about my_method preferably talking about
...         the purpose of the method and what its arguments are for
...         """
...         pass

Quick Tip:
If you call the built-in help function on a class or object, it outputs the documentation for that class.

>>> help(MyClass)
Help on class MyClass in module __main__:

class MyClass(builtins.object)
 |  Description and purpose of the class goes here.
 |  Also describe the properties that belong to this class.
 |  The rest of the class definition goes below
 |  
 |  Methods defined here:
 |  
 |  my_method(self, x, y)
 |      Description about my_method preferably talking about
 |      the purpose of the method and what its arguments are for
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
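
The same text that help displays is also available programmatically. As a sketch, the docstring is stored on the __doc__ attribute of the class and of each method, and the standard library's inspect.getdoc returns it with the indentation cleaned up. The docstrings here are shortened placeholders.

```python
import inspect

class MyClass:
    """Description and purpose of the class goes here."""

    def my_method(self, x, y):
        """Description of my_method and its arguments."""
        pass

# The raw docstrings, exactly as written in the source.
class_doc = MyClass.__doc__
method_doc = MyClass.my_method.__doc__

# inspect.getdoc strips the common leading indentation of multi-line docstrings.
cleaned_doc = inspect.getdoc(MyClass)
```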

Hopefully through these examples (Basic Solution and OOP Solution), you now realize how powerful the Object-Oriented Programming paradigm is.

This style of programming will come up frequently as you advance in your programming journey.

As an exercise, you can try implementing a Post class with what we've learned so far and integrate it with the current code.

