Day 8: Introduction to Sets

I. Introduction

A. Definition of a Set

In programming, a set is a data structure that stores multiple items in a single variable. Set items are unordered, unindexed, and unique (no duplicates are allowed). A set itself is mutable: we can add or remove items. However, a set cannot contain unhashable types, which include mutable built-in containers such as lists, sets, and dictionaries.

B. Importance of Sets in Programming

Sets are incredibly useful in various programming scenarios. Here are a few reasons why:

  1. Uniqueness of Elements: A set automatically removes duplicate items. This can be especially useful when you’re working with a large amount of data and need to ensure that no data points are repeated.
  2. Efficient Operations: Sets are optimized for checking whether a specific element is present. A membership test on a set takes roughly constant time on average, whereas a list or tuple has to be scanned element by element.
  3. Mathematical Operations: Sets can be used to perform mathematical set operations like union, intersection, difference, and symmetric difference.
  4. Data Cleaning: Sets are handy in data analysis and pre-processing, for example to remove duplicate records or to find values that are missing from a reference list.
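
As a small illustration of the first two points, the sketch below removes duplicates from a list and then tests membership. The sample data and variable names are made up for this example.

```python
# Hypothetical sample data: a list of readings with repeats
readings = [3, 7, 3, 9, 7, 7, 1]

# Converting the list to a set drops the duplicates
unique_readings = set(readings)
print(len(readings))         # 7
print(len(unique_readings))  # 4

# Membership tests use the same `in` keyword for lists and sets;
# the set version does not have to scan every element
print(9 in unique_readings)  # True
print(5 in unique_readings)  # False
```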

C. Differences between Sets, Lists, and Tuples

While sets, lists, and tuples are all data structures that can store multiple items, they have some key differences.

  1. Order of Elements: In a list, items have a definite order, which is the sequence in which they were arranged. A tuple also maintains order. In a set, however, items have no defined order.
  2. Indexing: Because lists and tuples maintain order, you can access items by referring to their index number. But since sets are unordered, you cannot refer to items by index number.
  3. Mutability: Lists are mutable, which means you can change their content without changing their identity. You can modify a list item, add new items, and delete or remove items. Tuples are immutable; you cannot change, add, or remove items after the tuple is defined. Sets are also mutable. You can add and remove items from sets.
  4. Uniqueness: In lists and tuples, you can have items that have duplicate values. In sets, all items must be unique.
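
The four differences above can be seen side by side in a short sketch. The variable names are illustrative only, and the exact wording of the error messages depends on the Python version, so it is not shown.

```python
items_list = [3, 1, 3, 2]
items_tuple = (3, 1, 3, 2)
items_set = {3, 1, 3, 2}

# Lists and tuples keep duplicates and support indexing
print(items_list[0])   # 3
print(items_tuple[0])  # 3
print(len(items_list), len(items_tuple), len(items_set))  # 4 4 3

# Lists and sets can be changed in place
items_list.append(4)
items_set.add(4)

# Tuples cannot be changed after creation
try:
    items_tuple[0] = 99
except TypeError as error:
    print("Tuple error:", error)

# Sets do not support indexing
try:
    items_set[0]
except TypeError as error:
    print("Set error:", error)
```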

D. Real-life Examples of Set Usage

  1. Social Networks: Sets can be used to find common friends, suggest friends, find people you may know, etc.
  2. E-commerce: Sets can be used to track the items a user has viewed, to compare product sets, or to make product recommendations.
  3. Data Analysis: Sets are used to find unique values from a list of data or to find the difference and intersection in multiple data lists.
  4. Search Engines: A set-like structure can be used to check whether a webpage has already been indexed.
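
To make the social network case concrete, here is a minimal sketch with made-up friend lists. It uses the & (intersection) and - (difference) operators, which are covered in Section V.

```python
# Hypothetical friend lists for two users
alice_friends = {"bob", "carol", "dave"}
bob_friends = {"alice", "carol", "erin"}

# Friends that Alice and Bob have in common
mutual = alice_friends & bob_friends
print(mutual)  # {'carol'}

# People Bob knows that Alice does not, excluding Alice herself
suggestions = bob_friends - alice_friends - {"alice"}
print(suggestions)  # {'erin'}
```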

In the following sections, we will dive into the mechanics of using sets in Python and explore their properties and the operations that can be performed on them.

II. Basics of Sets in Python

A. Creating a Set

In Python, you can create sets in two ways: with curly braces {} or with the set() function.

1. Empty Set

An empty set is a set that does not contain any elements. It’s useful when you need a set, but you don’t have any elements to put in it yet.

Here is how you can create an empty set in Python:

# Creating an empty set
s = set()

print(s)  # Output: set()

Note: Do not use {} to create an empty set because in Python, {} is used to create an empty dictionary.
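
A quick way to confirm the note above is to check the type of each object. The variable names are chosen only for this illustration.

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>
```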

2. Set with Elements

A set with elements can be created by placing a comma-separated sequence of elements within curly braces {}. Here is an example:

# Creating a set with elements
s = {1, 2, 3, 4, 5}

print(s)  # Output: {1, 2, 3, 4, 5}

You can also use the set() function to create a set from a list or tuple:

# Creating a set from a list
s = set([1, 2, 3, 4, 5])

print(s)  # Output: {1, 2, 3, 4, 5}

Notice that when we print the sets, the order of elements may not be the same as we entered. This is because, in Python, sets are unordered collections of items.

Also, if there are duplicate elements in the sequence when creating the set, they will be removed in the created set because sets don’t allow duplicate elements.

# Creating a set with duplicate elements
s = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5}

print(s)  # Output: {1, 2, 3, 4, 5}

Here, although we have multiple 2s, 3s, 4s, and 5s in the sequence, in the output, each of them appears only once.

B. Data Types in a Set

A set can contain any hashable value. In practice this means Python’s built-in immutable types: integers, floats, strings, frozensets, and tuples whose own elements are all hashable. Mutable built-in types, such as lists, dictionaries, and sets themselves, are not allowed as elements of a set.

1. Immutable Data Types

Immutable data types in Python are those data types for which we cannot change the content once they are created. The most common immutable data types that can be included in a set are:

  • Integer: A whole number with no fractional part.
  • Float: A number having both an integer and fractional part, separated by a point.
  • String: A collection of one or more characters put in single, double, or triple quotes.
  • Tuple: An ordered, immutable collection of Python objects separated by commas. A tuple can be a set element only if everything inside it is hashable too.
  • Frozenset: An immutable, hashable version of the built-in set.

Here is an example:

# Creating a set with different immutable data types
s = {1, 2.2, 'hello', ('a', 'b'), frozenset([1, 2, 3])}

print(s)  # Possible output (order may vary): {1, 2.2, 'hello', ('a', 'b'), frozenset({1, 2, 3})}

2. Mutable Data Types and Why They’re Not Allowed

Mutable data types are those for which the contents can be changed after they are created. The most common mutable data types in Python are lists, dictionaries, and sets.

These data types are not allowed as elements in a set because of how sets are implemented. Sets in Python are implemented using hash tables. An object can only be stored in a hash table (and thus in a set) if its hash value stays constant for its whole lifetime. If a mutable object’s hash were derived from its contents, changing the object would change its hash and the set could no longer find it. Python therefore makes its built-in mutable containers unhashable.

For example, if you try to create a set with a list as an element, Python will raise a TypeError.

# Trying to create a set with a list
s = {[1, 2, 3]}  # Raises TypeError: unhashable type: 'list'

In this case, you will receive a TypeError because lists are mutable, and thus “unhashable”, which means they can’t be added to a set. If you need to store a group of values as a single set element, convert it to an immutable counterpart first: a list becomes a tuple, and a set becomes a frozenset.
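
The sketch below, using made-up data, shows immutable counterparts standing in for a list and a set, and also shows that a tuple is rejected when it contains a list.

```python
point = [1, 2]     # a list: mutable and unhashable
tags = {"a", "b"}  # a set: mutable and unhashable

# Convert both to immutable counterparts before storing them in a set
collection = {tuple(point), frozenset(tags)}

print(tuple(point) in collection)     # True
print(frozenset(tags) in collection)  # True

# A tuple is only hashable if everything inside it is hashable
try:
    {(1, [2, 3])}
except TypeError as error:
    print("Error:", error)  # Error: unhashable type: 'list'
```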

III. Accessing Elements in a Set

A. Iterating Through a Set

Because sets are unordered, you can’t access or manipulate items in a set by referring to an index. But you can loop through the set using a for loop, or ask if a specified value exists in a set by using the in keyword.

Here’s how you can iterate over a set:

# Creating a set
s = {1, 2, 3, 4, 5}

# Iterating over the set
for element in s:
    print(element)

# Output (order may vary):
# 1
# 2
# 3
# 4
# 5

B. Checking if an Item Exists

To check if an item exists in a set, you can use the in keyword:

# Creating a set
s = {1, 2, 3, 4, 5}

# Check if 3 is in the set
print(3 in s)  # Output: True

# Check if 6 is in the set
print(6 in s)  # Output: False

C. Why Indexing and Slicing Are Not Supported

Indexing and slicing are not supported in sets because they are unordered collections of items. Indexes are used in ordered collections to access elements at specific positions. With no concept of order in sets, accessing a particular index or slicing a range of indexes doesn’t make sense and therefore is not permitted.
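
If you do need positional access, one option is to build an ordered sequence from the set first. This is a minimal sketch that assumes the elements can be sorted; the error message text is not shown because it can differ between Python versions.

```python
s = {30, 10, 20}

# Indexing a set directly raises a TypeError
try:
    s[0]
except TypeError as error:
    print("Error:", error)

# Convert to a sorted list when you need positions or slices
ordered = sorted(s)
print(ordered[0])   # 10
print(ordered[:2])  # [10, 20]
```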

D. Caveats and Considerations

  1. Set Mutability: While a set as a whole is mutable (we can add and remove items from it), the elements themselves must be hashable, which for built-in types means immutable.
  2. No Duplicate Elements: A set will automatically remove any duplicate values.
  3. Order of Elements: Sets are unordered, meaning that the order in which elements are added may not be the order in which they are iterated. Also, the ordering of elements could change over time as elements are added and removed.
  4. Accessing Elements: Elements in a set can be accessed using a loop or the in keyword. Direct access via indexing or slicing is not supported due to the unordered nature of sets.
  5. Set Operations: Sets support a range of mathematical and logical operations, such as union, intersection, difference, and symmetric difference, that can be used to manipulate and compare sets. These are discussed in Section V.

IV. Set Operations

A. Adding Elements

Python sets have methods for adding elements: add() and update().

1. add() Method

The add() method adds a single element to a set. If the element is already present in the set, it doesn’t do anything. Here’s an example:

# Creating a set
s = {1, 2, 3}

# Adding an element to the set
s.add(4)

print(s)  # Output: {1, 2, 3, 4}

# Trying to add an existing element to the set
s.add(2)

print(s)  # Output: {1, 2, 3, 4}

In the first add() call, the number 4 was added to the set. In the second add() call, the number 2 was not added because it was already present in the set.

2. update() Method

The update() method can add multiple items to a set. You can pass one or more iterables (lists, tuples, sets, etc.) to the update() method, and it will add all of their elements to the set:

# Creating a set
s = {1, 2, 3}

# Adding multiple elements to the set
s.update([4, 5, 6])

print(s)  # Output: {1, 2, 3, 4, 5, 6}

# Adding elements from multiple iterables
s.update([7, 8], (9, 10), {11, 12})

print(s)  # Output: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}

In the first update() call, the numbers 4, 5, and 6 were added to the set. In the second update() call, the numbers 7 through 12 were added from a list, a tuple, and a set, respectively.

Similar to add(), if any of the elements already exist in the set, update() will not add those.
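
One practical difference between the two methods shows up with strings, since a string is itself an iterable of characters. The following sketch uses arbitrary example values.

```python
# add() treats the string as one element
s = set()
s.add("cat")
print(s)  # {'cat'}

# update() iterates over its argument, so a string is split into characters
letters = set()
letters.update("cat")
print(sorted(letters))  # ['a', 'c', 't']

# Wrap strings in a list to add each one as a whole
words = set()
words.update(["cat", "dog"])
print(sorted(words))  # ['cat', 'dog']
```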

B. Removing Elements

Python sets have methods for removing elements: remove(), discard(), and clear().

1. remove() Method

The remove() method removes a specified element from the set. This method alters the set in-place. If the element is not found, it raises a KeyError.

# Creating a set
s = {1, 2, 3, 4, 5}

# Removing an element from the set
s.remove(3)

print(s)  # Output: {1, 2, 4, 5}

# Trying to remove an element not present in the set
s.remove(6)  # Raises KeyError: 6

2. discard() Method

The discard() method also removes a specified element from the set. However, if the element is not found, it does nothing and does not raise an error.

# Creating a set
s = {1, 2, 3, 4, 5}

# Discarding an element from the set
s.discard(3)

print(s)  # Output: {1, 2, 4, 5}

# Trying to discard an element not present in the set
s.discard(6)

print(s)  # Output: {1, 2, 4, 5}
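
The choice between remove() and discard() usually comes down to whether a missing element should count as an error. A short sketch of both styles, with arbitrary example values:

```python
s = {1, 2, 3}

# remove() signals a missing element with KeyError, which can be caught
try:
    s.remove(10)
except KeyError as error:
    print("Missing element:", error)  # Missing element: 10

# discard() stays silent, so no error handling is needed
s.discard(10)

print(s)  # {1, 2, 3}
```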

3. clear() Method

The clear() method removes all elements in the set, leaving behind an empty set.

# Creating a set
s = {1, 2, 3, 4, 5}

# Clearing the set
s.clear()

print(s)  # Output: set()

C. Copying a Set

1. Shallow Copy: copy() Method

The copy() method returns a new set that is a shallow copy of the original set. This means that the new set will be a new object with the same elements as the original set, but changes to the original set won’t affect the new set and vice versa.

# Creating a set
s = {1, 2, 3, 4, 5}

# Copying the set
t = s.copy()

print(t)  # Output: {1, 2, 3, 4, 5}

# Adding an element to the original set
s.add(6)

print(s)  # Output: {1, 2, 3, 4, 5, 6}
print(t)  # Output: {1, 2, 3, 4, 5}
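
It can help to contrast copy() with plain assignment, which only creates a second name for the same set. The names alias and clone below are chosen just for this illustration.

```python
s = {1, 2, 3}

alias = s         # another name for the same set object
clone = s.copy()  # a separate set with the same elements

s.add(4)

print(alias)       # {1, 2, 3, 4}
print(clone)       # {1, 2, 3}
print(alias is s)  # True
print(clone is s)  # False
```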

2. Deep Copy: Understanding deepcopy() Function

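For sets whose elements are all immutable values (numbers, strings, frozensets, or tuples of such values), a deep copy behaves the same as a shallow copy in practice: the elements cannot be modified, so it does not matter that both sets refer to the same objects. For such sets, copy() is a safe way to duplicate them. The distinction only matters if a set holds hashable objects that can still be modified, such as instances of a user-defined class; in that case copy.deepcopy() copies the elements as well.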

For other mutable data types like lists or dictionaries, which can contain other mutable types, you might need a deep copy, which you can get with the copy.deepcopy() function from the copy module. This is beyond the scope of this discussion about sets but is important to remember when working with mutable data structures.

V. Mathematical Set Operations

Set data types in Python support several operations that correspond to mathematical set operations. This section covers union, intersection, difference, and symmetric difference, followed by subset, superset, and disjointness checks.

A. Union

The union of two sets is a set of all elements from both sets.

1. union() Method

The union() method returns a new set with all items from both sets:

# Creating two sets
s1 = {1, 2, 3}
s2 = {3, 4, 5}

# Union of the sets
s3 = s1.union(s2)

print(s3)  # Output: {1, 2, 3, 4, 5}

Notice that even though the number 3 is present in both sets, it only appears once in the union set because sets don’t allow duplicate elements.

2. Using the ‘|’ Operator

The union of two sets can also be obtained using the ‘|’ operator:

# Creating two sets
s1 = {1, 2, 3}
s2 = {3, 4, 5}

# Union of the sets
s3 = s1 | s2

print(s3)  # Output: {1, 2, 3, 4, 5}
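
The method and the operator are not fully interchangeable: union() accepts any iterable, while | requires both operands to be sets. A brief sketch with arbitrary values (the error message text is omitted because it can vary):

```python
s1 = {1, 2, 3}

# union() accepts any iterable, such as a list
print(s1.union([3, 4]))  # {1, 2, 3, 4}

# The | operator requires both operands to be sets
try:
    s1 | [3, 4]
except TypeError as error:
    print("Error:", error)

# union() can also combine more than two collections at once
print(s1.union({4}, {5}))  # {1, 2, 3, 4, 5}
```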

B. Intersection

The intersection of two sets is a set of elements that exist in both sets.

1. intersection() Method

The intersection() method returns a new set with only those items that are present in both sets:

# Creating two sets
s1 = {1, 2, 3}
s2 = {2, 3, 4}

# Intersection of the sets
s3 = s1.intersection(s2)

print(s3)  # Output: {2, 3}

In this case, the numbers 2 and 3 are present in both sets, so they are included in the intersection set.

2. Using the ‘&’ Operator

The intersection of two sets can also be obtained using the ‘&’ operator:

# Creating two sets
s1 = {1, 2, 3}
s2 = {2, 3, 4}

# Intersection of the sets
s3 = s1 & s2

print(s3)  # Output: {2, 3}

C. Difference

The difference of two sets is a set of elements that exist only in the first set but not in the second set.

1. difference() Method

The difference() method returns a new set with items in the first set that are not in the second set:

# Creating two sets
s1 = {1, 2, 3}
s2 = {2, 3, 4}

# Difference of the sets
s3 = s1.difference(s2)

print(s3)  # Output: {1}

Here, the number 1 is present only in the first set, so it’s included in the difference set.

2. Using the ‘-’ Operator

The difference of two sets can also be obtained using the ‘-’ operator:

# Creating two sets
s1 = {1, 2, 3}
s2 = {2, 3, 4}

# Difference of the sets
s3 = s1 - s2

print(s3)  # Output: {1}
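
Unlike union and intersection, difference depends on the order of the operands. Reusing the same two sets:

```python
s1 = {1, 2, 3}
s2 = {2, 3, 4}

print(s1 - s2)  # {1}
print(s2 - s1)  # {4}
```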

D. Symmetric Difference

The symmetric difference of two sets is a set of elements that exist only in one of the sets, but not in both.

1. symmetric_difference() Method

The symmetric_difference() method returns a new set with items that are in one of the sets, but not in both:

# Creating two sets
s1 = {1, 2, 3}
s2 = {2, 3, 4}

# Symmetric difference of the sets
s3 = s1.symmetric_difference(s2)

print(s3)  # Output: {1, 4}

Here, the numbers 1 and 4 are present only in one of the sets, so they’re included in the symmetric difference set.

2. Using the ‘^’ Operator

The symmetric difference of two sets can also be obtained using the ‘^’ operator:

# Creating two sets
s1 = {1, 2, 3}
s2 = {2, 3, 4}

# Symmetric difference of the sets
s3 = s1 ^ s2

print(s3)  # Output: {1, 4}
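
The symmetric difference can also be expressed through the other operations, which is a useful way to check your understanding. The variable names below are illustrative.

```python
s1 = {1, 2, 3}
s2 = {2, 3, 4}

via_operator = s1 ^ s2
via_union_minus_intersection = (s1 | s2) - (s1 & s2)
via_two_differences = (s1 - s2) | (s2 - s1)

print(via_operator == via_union_minus_intersection == via_two_differences)  # True
```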

E. Subset, Superset, and Disjoint Sets

  1. Subset: A set s1 is a subset of s2 if every element of s1 is also in s2. The issubset() method checks whether every element of one set is present in another.

# Creating two sets
s1 = {1, 2, 3}
s2 = {1, 2, 3, 4, 5}

print(s1.issubset(s2))  # Output: True

  2. Superset: A set s1 is a superset of s2 if s1 includes every element of s2. The issuperset() method checks whether one set contains every element of another.

# Creating two sets
s1 = {1, 2, 3, 4, 5}
s2 = {1, 2, 3}

print(s1.issuperset(s2))  # Output: True

  3. Disjoint Sets: Two sets are disjoint if they have no common elements. The isdisjoint() method checks if two sets are disjoint.

# Creating two sets
s1 = {1, 2, 3}
s2 = {4, 5, 6}

print(s1.isdisjoint(s2))  # Output: True

In this example, since there are no common elements between s1 and s2, the isdisjoint() method returns True.
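
Subset and superset checks also have operator forms, including strict versions that exclude equal sets. A short sketch:

```python
s1 = {1, 2, 3}
s2 = {1, 2, 3, 4, 5}

print(s1 <= s2)  # True: s1 is a subset of s2
print(s2 >= s1)  # True: s2 is a superset of s1
print(s1 < s2)   # True: s1 is a proper subset (a subset and not equal)
print(s1 <= s1)  # True: every set is a subset of itself
print(s1 < s1)   # False: a set is not a proper subset of itself
```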

VI. Set Comprehension

A. Concept and Syntax

Set comprehension is a concise way to create sets in Python. It’s inspired by the mathematical notion of “set builder notation”. Similar to list comprehension, set comprehension provides a shorter syntax when you want to create a new set based on the values of an existing iterable.

The syntax of set comprehension in Python is:

{expression(variable) for variable in input_set [if predicate_expression]}

  1. expression(variable): This part transforms each variable in some way before it is added to the new set.
  2. for variable in input_set: This is the for-loop that iterates over each item of input_set, which can be any iterable and not only a set.
  3. if predicate_expression (optional): This part is a filter that decides whether to include or exclude the variable in the new set. The square brackets in the syntax above only mark it as optional; they are not typed.

B. Examples of Set Comprehensions

Let’s take a look at some examples:

  1. Creating a set of the squares of numbers from 0 to 9:

squares = {x**2 for x in range(10)}
# sorted() is used here because the print order of a set is not guaranteed
print(sorted(squares))  # Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

  2. Creating a set of even numbers from a list of numbers:

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
evens = {n for n in numbers if n % 2 == 0}
print(evens)  # Output: {2, 4, 6, 8}

In this example, the predicate expression if n % 2 == 0 ensures that only even numbers are included in the new set.
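
For comparison, the sketch below writes the same filter as an explicit loop, then shows that duplicates produced by the expression collapse automatically. The word list is made up for this example.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Explicit loop
evens_loop = set()
for n in numbers:
    if n % 2 == 0:
        evens_loop.add(n)

# Equivalent set comprehension
evens_comp = {n for n in numbers if n % 2 == 0}

print(evens_loop == evens_comp)  # True

# Values that become equal after the transformation are stored once
words = ["Apple", "apple", "BANANA", "banana"]
lowered = {w.lower() for w in words}
print(sorted(lowered))  # ['apple', 'banana']
```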

C. Advantages of Using Set Comprehension

  1. Conciseness: Set comprehensions allow you to define and construct sets in a single line of code, which is more succinct than using a for-loop.
  2. Performance: In CPython, a set comprehension is often somewhat faster than an equivalent for-loop that calls add(), because the comprehension avoids a method lookup and call on each iteration. The difference is usually small.
  3. Readability: Once you’re familiar with the syntax, set comprehensions are often easier to read and understand than equivalent for-loops. They provide a clear and concise way to represent the transformation of one set (or any iterable) into another set.