Day 10: Introduction to Advanced Data Structures
Section 1: Recap and Introduction
Recap of Basic Data Structures
Let’s begin with a brief recap of the basic data structures we have already covered:
- Tuple: A tuple is a collection of objects which are ordered and immutable. Tuples are sequences, just like lists, but unlike lists, tuples cannot be changed. They are used when a statement or user-defined function can safely assume that a collection of values will not change. For instance, a tuple can represent a point in a 3D space, defined by three coordinates (x, y, z).
- Set: A set is an unordered collection of unique elements. Sets are mainly used to eliminate repeated numbers in a sequence/list. They are also used to perform some standard set operations like union, intersection, difference, etc.
- Dictionary: A dictionary is a mutable, unordered collection of key-value pairs. It provides a way to map pieces of data to each other, so that one piece of data (the key) can be used to look up another piece of data (the value). For instance, a dictionary can map student names (keys) to their grades (values).
Introduction to Advanced Data Structures
Definition
Advanced data structures are more complex structures that extend beyond the basic data structures, typically providing more efficient solutions to computing problems. These structures include graphs, trees, heaps, hash maps, bloom filters, and many others. They can be nested, hierarchical, mutable or immutable, and sometimes involve complex algorithms for their operation.
Why Advanced Data Structures are Necessary
While basic data structures are sufficient for many tasks, they may not be the most efficient or suitable option when it comes to larger datasets or more complex operations. Advanced data structures often provide optimization in terms of memory usage and time complexity.
For example, you could technically use an array (list) to represent relationships between entities, but it would be far more efficient to use a graph or tree structure. Or, while you could use a list to keep a collection of elements sorted, a heap or a binary search tree would be a more efficient choice for this purpose.
Real-life Examples and Applications
Advanced data structures find usage in various real-world scenarios:
- Graphs are used in social networks. Each person can be a node, and the connection between two people can be an edge. Similarly, the World Wide Web is a huge directed graph with web pages as nodes and hyperlinks as edges.
- Trees, specifically Binary Search Trees (BSTs), are used in databases for indexing purposes which makes searching for information faster. File systems on computers also use tree data structures for organization.
- Heaps are used in many places, most famously in implementing efficient priority queues, which in turn are used in scheduling processes in operating systems, in graph algorithms like Dijkstra’s, in heap sort, etc.
- Hash Maps (also known as Hash Tables) are used in database indexing, caching, object representation in programming languages, etc. They provide a way to maintain collections of key-value pairs and enable a user to access the value by a unique key.
- Bloom Filters are used in database systems for membership checks, web browsers for checking if a URL is malicious, and in the cache layer of a software application for checking if a piece of information is present in the cache.
In the subsequent sections, we will delve into the concepts, implementation, and operations of some of these advanced data structures.
Section 2: Linked Lists
Definition and Explanation of Linked Lists
A Linked List is a linear data structure similar to an array. However, unlike arrays, elements are not stored at contiguous memory locations but are linked using pointers.
Singly Linked Lists
A singly linked list is the simplest type of linked list, consisting of nodes where each node contains a data field and a ‘next’ reference/link to the next node in the sequence. The list is terminated when a node’s ‘next’ link is null. This kind of list allows traversal elements only in one way.
Doubly Linked Lists
A doubly linked list is a type of linked list in which each node contains a reference to the next node as well as the previous node in the sequence. This allows for a more flexible navigation as we can traverse the list in both directions.
Implementing a Linked List in Python
Creating a Node Class
First, we’ll start by creating a Node class. Each node will have a value and a pointer to the next node. In the case of a doubly linked list, each node will also have a pointer to the previous node.
class Node:
def __init__(self, data=None):
self.data = data
self.next = None
Creating a Linked List Class
Next, we’ll create a LinkedList class, which will utilize the Node class to create a chain of nodes.
class LinkedList:
def __init__(self):
self.head = None
This initialization creates an empty list with no nodes, hence self.head is None.
Adding Elements to the Linked List
To add elements to our linked list, we’ll need to create nodes with our data and connect them.
def append(self, data):
if not self.head:
self.head = Node(data)
else:
cur = self.head
while cur.next:
cur = cur.next
cur.next = Node(data)
LinkedList.append = append
In this append method, we first check if the list is empty. If it is, we create a new Node and set it as the head. If not, we traverse the list until we reach the end, then add our new node.
Here, we’ve implemented a simple singly linked list. A doubly linked list would require a bit more complexity, particularly around the maintenance of the ‘previous’ links in each node. You would also need to ensure that these links are correctly managed when adding and removing nodes.
Traversing a Linked List
Traversing a linked list means to move through each element of the list, starting from the first node (head) and ending at the last node (tail).
Iterative Method
Here’s how you might traverse a linked list iteratively in Python:
def print_list(self):
node = self.head
while node is not None:
print(node.data, end=" ")
node = node.next
LinkedList.print_list = print_list
This print_list method starts at the head of the list and prints out the data in each node. It continues this until it has traversed all nodes, i.e., until the ‘next’ attribute is None.
Recursive Method
A recursive method to traverse and print a linked list can be implemented like so:
def print_list_recursive(self, node):
if node is not None:
print(node.data, end=" ")
self.print_list_recursive(node.next)
LinkedList.print_list_recursive = print_list_recursive
In this method, we use recursion to continue printing the data in each node until we have traversed all nodes. The base case for our recursive function is when the node is None.
Linked List Operations
Insertion
Adding a new node in a linked list involves reassigning the ‘next’ pointers of the surrounding nodes. The insertion can occur at the beginning, at the end, or at any position in the list.
def insert(self, data, position):
new_node = Node(data)
if position == 0: # Insert at the start
new_node.next = self.head
self.head = new_node
else:
cur = self.head
for _ in range(position - 1):
if cur is not None:
cur = cur.next
if cur is None:
print("Position out of bounds")
else:
new_node.next = cur.next
cur.next = new_node
LinkedList.insert = insert
Deletion
Deletion in a linked list involves reassigning the ‘next’ pointer of the node preceding the node to be deleted to the node following the node to be deleted. If the head is to be deleted, we move the head pointer to the next node.
def delete(self, key):
cur = self.head
# If head node holds the key to be deleted
if cur and cur.data == key:
self.head = cur.next
cur = None
return
# Search for the key to be deleted
while cur and cur.data != key:
prev = cur
cur = cur.next
# If key was not present in linked list
if cur is None:
return
# Unlink the node from linked list
prev.next = cur.next
cur = None
LinkedList.delete = delete
Search
To search an element in a linked list, we can start from the head of the list and continue till the desired element is found or the list ends.
def search(self, key):
cur = self.head
while cur and cur.data != key:
cur = cur.next
return False if cur is None else True
LinkedList.search = search
Length Calculation
The length of a linked list can be calculated either iteratively or recursively by traversing the entire list and counting the number of nodes.
def length(self):
count = 0
cur = self.head
while cur:
count += 1
cur = cur.next
return count
LinkedList.length = length
Application
of Linked Lists
Linked lists are fundamental data structures and are used in many applications due to their dynamic nature (easy insertions and deletions) and efficient use of memory. Some common applications of linked lists include:
- Implementation of stacks and queues: Linked lists provide a dynamic size, efficient insertion/removal from the end used in stacks and queues.
- Implementation of graphs: Adjacency list representation of graphs uses linked lists.
- Dynamic memory allocation: Linked lists can help in managing available memory, required memory, and deallocation of memory when a program or a function ends.
- Performing arithmetic operations on long integers: Linked lists can be used to store and perform operations on very large numbers after they are broken into smaller digits.
Section 3: Stacks and Queues
Definition and Explanation of Stacks
A Stack is a linear data structure that follows a particular order in which operations are performed. The order can be either LIFO (Last In First Out) or FILO (First In Last Out). A real-world example of a stack is a stack of plates. The last plate you put on the stack is the first one you would take off.
The Principle of LIFO (Last In First Out)
In a LIFO data structure, the last element added to the stack will be the first one to be removed. This is the main principle when it comes to stacks.
Stack Operations
Stacks have several operations:
- Push: Add an element to the top of the stack.
- Pop: Remove an element from the top of the stack.
- Peek/Top: Get the top item of the stack.
- is_empty: Check if the stack is empty.
Implementing a Stack in Python
Here’s a simple implementation of a stack in Python:
class Stack:
def __init__(self):
self.stack = []
def is_empty(self):
return len(self.stack) == 0
def push(self, data):
self.stack.append(data)
def pop(self):
if self.is_empty():
return None
else:
return self.stack.pop()
def peek(self):
if self.is_empty():
return None
else:
return self.stack[-1]
In this implementation, we use a Python list to hold our data. We use the list’s append method to push data onto the stack, and the list’s pop method to remove data. The top of the stack is the last item in the list.
Application of Stacks
Stacks are used in many areas of computing due to their LIFO property. Some of the main uses of stacks are:
- Expression Evaluation & Syntax Parsing: Compilers use a stack for syntax checking of expressions and calculations. For example, checking of correct opening and closing of brackets in an expression.
- Backtracking: This involves finding the solution to a problem step-by-step and then retracting the steps back to find another solution. For instance, in maze and puzzle solving.
- Function call stack: Every time a function declares a new variable, it is “pushed” onto the stack. Whenever a function exits, all of the variables pushed onto the stack by that function, are freed (that is, they are deleted).
- Memory Management: Stack space is used for static memory allocation and compile-time allocated objects.
- Undo Mechanism: The undo operation in text editors, the history feature in web browsers, and other similar software applications are implemented using stacks.
Definition and Explanation of Queues
A Queue is a linear data structure or a collection in which the operations are performed based on FIFO (First In First Out) principle. A good example of a queue is any queue of consumers for a resource where the consumer that came first is served first.
The Principle of FIFO (First In First Out)
In a FIFO data structure, the first element added to the queue will be the first one to be removed. This is the main principle when it comes to queues.
Queue Operations
The main operations of a queue are:
- Enqueue: Add an element to the end of the queue.
- Dequeue: Remove an element from the front of the queue.
- Peek/Front: Get the first item of the queue.
- is_empty: Check if the queue is empty.
Implementing a Queue in Python
Here’s a simple implementation of a queue in Python:
class Queue:
def __init__(self):
self.queue = []
def is_empty(self):
return len(self.queue) == 0
def enqueue(self, data):
self.queue.append(data)
def dequeue(self):
if self.is_empty():
return None
else:
return self.queue.pop(0)
def peek(self):
if self.is_empty():
return None
else:
return self.queue[0]
In this implementation, we use a Python list to hold our data. We use the list’s append method to enqueue data onto the queue, and the list’s pop method to dequeue data. The front of the queue is the first item in the list.
Application of Queues
Queues are used extensively in both system and application software:
- Operating Systems: In OS, queues are used in multitasking, disk scheduling, and IO buffering.
- Networking: In networking, when packets arrive for routing, they are held in a buffer until they can be routed. This buffer works on a queue basis.
- Call Center Phone Systems: Call center phone systems uses Queues to hold people calling them in an order, until a service representative is free.
- Printer Queue: If many print commands are given at once, printer maintains a queue of requests, and process them one by one.
- Process Scheduling: Processes in an operating system are scheduled using algorithms that have a queuing mechanism. For example, when processes wait for a specific resource, they are held in a queue.
Section 4: Trees
Definition and Explanation of Trees
In computer science, a Tree is a widely used abstract data type that simulates a hierarchical tree structure, with a set of linked nodes. It is a non-linear data structure compared to arrays, linked lists, stacks, and queues which are linear data structures.
Terminology
- Root: The top node in a tree.
- Node: Each element in a tree is called a node.
- Edge: The link between any two nodes.
- Parent: Any node except the root node has one edge upward to a node called its parent.
- Child: The node below a given node connected by its edge downward is called its child node.
- Leaf: The node which does not have any child node is called the leaf node.
- Subtree: Subtree represents the descendants of a node.
- Depth: The depth of a node is the number of edges from the root to the node.
- Height: The height of a node is the number of edges from the node to the deepest leaf.
Different Types of Trees
Binary Trees
A Binary Tree is a tree data structure in which each node has at most two children, which are referred to as the left child and the right child. It is a commonly used tree in data structure.
Binary Search Trees
A Binary Search Tree (BST) is a tree in which all the nodes follow the property that they are larger than all the nodes in their left subtree and smaller than all the nodes in their right subtree. It allows fast lookup, addition, and removal of items.
AVL Trees
An AVL Tree (Adelson-Velsky and Landis tree) is a self-balancing binary search tree in which the difference of heights of left and right subtrees cannot be more than one for all nodes. AVL trees are more balanced compared to BSTs, but they require more writes when inserting elements because of the rotation operations to keep them balanced.
Each type of tree has its own unique properties and uses, and the choice of which type to use largely depends on the specific application and its requirements. Understanding these different types of trees is a fundamental part of understanding how data can be effectively structured and accessed in complex programs.
Implementing a Binary Tree in Python
Creating a Node
First, we will create a Node class, which will be the basic building block of our binary tree. Each node will contain some data and pointers to its left and right child nodes.
class Node:
def __init__(self, data):
self.left = None
self.right = None
self.data = data
Inserting Nodes in the Tree
Next, we’ll define a function to insert new nodes into the tree. We’ll do this by creating a new node and then finding the correct location in the tree to put this node.
class Node:
# ...previous code...
def insert(self, data):
if self.data:
if data < self.data:
if self.left is None:
self.left = Node(data)
else:
self.left.insert(data)
elif data > self.data:
if self.right is None:
self.right = Node(data)
else:
self.right.insert(data)
else:
self.data = data
Tree Traversals
Tree traversal is the process of visiting (checking and/or updating) each node in a tree data structure, exactly once.
Pre-order Traversal
In a pre-order traversal, the root is visited first, then the left subtree, and finally the right subtree.
class Node:
# ...previous code...
def preorder_traversal(self):
print(self.data, end=' ')
if self.left:
self.left.preorder_traversal()
if self.right:
self.right.preorder_traversal()
In-order Traversal
In an in-order traversal, the left subtree is visited first, then the root, and finally the right subtree.
class Node:
# ...previous code...
def inorder_traversal(self):
if self.left:
self.left.inorder_traversal()
print(self.data, end=' ')
if self.right:
self.right.inorder_traversal()
Post-order Traversal
In a post-order traversal, the left subtree is visited first, then the right subtree, and finally the root.
class Node:
# ...previous code...
def postorder_traversal(self):
if self.left:
self.left.postorder_traversal()
if self.right:
self.right.postorder_traversal()
print(self.data, end=' ')
Application of Trees
Trees are used in many areas of computer science, including:
- Operating Systems: Trees are used in various algorithms of the operating system. For example, file systems, device drivers.
- Database Systems: Database management systems use B-Trees and T-Trees to store information.
- Graph Algorithms: Trees are used in various graph algorithms like Kruskal, Prim, etc.
- Networking: Trees are used in network routing algorithm.
- Machine Learning: Decision Trees and Random Forest models are based on tree structures.
Trees, specifically Binary Search Trees, are important in software development as well, providing efficient insertion, deletion, and lookup operations, and being used to create and manage sorted lists, double-ended queues, and priority queues.