Day 1: Setting up Python environment (Anaconda, Jupyter Notebook)
I. Introduction to Anaconda
A. Brief Explanation of Anaconda
1. What is Anaconda?
Anaconda is a free and open-source distribution of the Python and R programming languages. It is used for scientific computing and data science-related tasks. Developed by Anaconda, Inc., it aims to simplify package management and deployment, as well as handle the various dependencies that arise with the use of data science tools.
Anaconda consists of more than 1,500 Python/R data science packages, a package and environment manager named ‘conda’, and a host of other utilities. It offers a whole ecosystem for researchers, scientists, and analysts, allowing them to work with the high-level languages and libraries they need, rather than dealing with the nitty-gritty of low-level programming, packaging, and installation challenges.
The distribution comes pre-loaded with many useful and popular data science libraries, such as NumPy, Pandas, and Scikit-learn, making it a one-stop solution for most computational requirements.
2. Why use Anaconda for Python?
Choosing Anaconda for Python brings with it several advantages:
- Simplicity: Anaconda is easy to install, and its use removes the need to manually install each of the many libraries you might need. It offers a simple solution to package management, reducing the hassle of handling multiple libraries.
- Versatility: With the conda package and environment manager, you can manage and isolate projects using different versions of Python and installed packages, which minimizes potential conflicts between projects.
- Comprehensive: Anaconda includes a wide variety of robust, well-tested, and optimized libraries for scientific computing and data analysis, saving the time you would otherwise spend searching for the right tools.
- Community and Support: Anaconda has a large and active community. This means that if you encounter any problems or need help, there’s a good chance you’ll be able to find someone who has faced a similar issue.
In essence, Anaconda offers a complete platform for Python programming, providing the necessary tools to simplify the entire scientific computing workflow. This makes it a preferred choice for anyone working in data science, scientific computing, or related fields.
B. Download and Installation of Anaconda
1. System Requirements for Anaconda
Anaconda is compatible with Windows, macOS, and Linux systems. For a smooth operation of Anaconda, your system should meet the following requirements:
- Operating System: Windows 7 or newer, 64-bit macOS 10.10+, or Linux, including Ubuntu, RedHat, CentOS 6+, and others.
- If your operating system is older, you may need to use an earlier version of Anaconda that supports your system.
- Processor: Intel Core i3 or equivalent
- RAM: Minimum 4GB (8GB recommended for efficient Anaconda usage)
- Disk Space: Minimum 3GB to install Anaconda, and you will need additional space for your projects and data.
2. Finding and Downloading the Appropriate Version
- Visit the Anaconda distribution page at https://www.anaconda.com/distribution/
- Choose the correct download for your operating system (Windows, macOS, or Linux).
- Make sure to download the Python 3.x version, as Python 2 is no longer maintained.
3. Step-by-step Walkthrough of the Installation Process
For Windows Users:
- Run the installer by double-clicking the downloaded file.
- In the setup, select “Just Me”, unless you want to install Anaconda for all users.
- Choose an install location that the installer will write to. The default path is generally fine to use.
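- In the Advanced Installation Options, check “Register Anaconda as my default Python 3.x”. The installer also offers “Add Anaconda to my PATH environment variable”, but marks that option as not recommended because it can interfere with other software; check it only if you need conda available outside the Anaconda Prompt. Then, click Install.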
For macOS Users:
- Open the installer by double-clicking the downloaded file.
- In the welcome screen, click Continue.
- Read the software license agreement and click Continue. Then, agree to the terms.
- Select the destination for the installation. To install for just yourself, select “Install for me only.” To choose a different location, use “Install on a specific disk…” and pick the destination.
- To start the installation, click Install.
For Linux Users:
- Open a terminal.
- Navigate to the directory containing the downloaded file.
- Run the command bash Anaconda3-<version>-Linux-x86_64.sh, where <version> is the version number of the downloaded file.
- Accept the license agreement by typing ‘yes’.
- Accept or change the installation location.
- Allow the installer to prepend the Anaconda3 install location by typing ‘yes’.
4. Verification of Successful Installation
To confirm that the Anaconda distribution was installed correctly, you can:
- For Windows and macOS users, search for and open the “Anaconda Navigator” in your applications.
- For Linux users, in the terminal, type conda list. If the installation was successful, a list of installed packages appears.
In case of any issues, you can consult the Anaconda documentation or the multitude of online communities and forums dedicated to Python and Anaconda.
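The same check can also be made from inside Python. The sketch below is only an illustration and not part of the Anaconda tooling: the helper name check_packages is hypothetical, and the package list is simply the libraries mentioned earlier in this section. It reports which interpreter is running and whether each package can be located, without importing it.

```python
import importlib.util
import sys


def check_packages(names):
    """Return a dict mapping each package name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Which interpreter is running? After a successful install, this path
# usually points inside the Anaconda directory.
print("Python executable:", sys.executable)
print("Python version:", sys.version.split()[0])

# Libraries the distribution is described as shipping with.
# Note that the import name for Scikit-learn is "sklearn".
for name, found in check_packages(["numpy", "pandas", "sklearn"]).items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If a package is reported as missing, the interpreter being run is probably not the one that Anaconda installed.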
II. Getting Started with Anaconda
A. Introduction to Anaconda Navigator
1. Overview of Anaconda Navigator
Anaconda Navigator is a desktop graphical user interface (GUI) included in the Anaconda distribution that allows you to launch applications and manage conda packages, environments, and channels without using command-line commands. It is designed to be user-friendly, making it a great tool for beginners who may be less comfortable with a command-line interface.
Anaconda Navigator can search for packages on Anaconda Cloud or in a local Anaconda Repository, install them in an environment, run them, and update them, all from within the Navigator itself.
2. Launching Anaconda Navigator
To start Anaconda Navigator, follow the steps specific to your operating system:
- Windows: Open the Start menu by clicking the Windows logo in the lower-left corner of the screen, then scroll to find and select Anaconda Navigator from the program list.
- macOS: Open Launchpad, then click the Anaconda-Navigator icon, or you can use Spotlight to search for Anaconda Navigator.
- Linux: Open a terminal window and type anaconda-navigator.
3. Exploring the User Interface
The Anaconda Navigator interface is divided into several areas:
- Home Tab: This is where you can launch applications such as Jupyter Notebook, JupyterLab, Spyder, etc. The list of applications can vary based on the packages installed in the active environment.
- Environments Tab: This tab shows a list of all environments, which are isolated spaces where packages can be installed without interfering with each other. You can create, clone, export, and delete environments from here, and you can also switch between environments.
- Learning Tab: Here you will find links to learning resources, like tutorials and training courses.
- Community Tab: This tab gives you access to the Anaconda community where you can find additional help, join discussions, or discover new projects.
Spend some time exploring these areas to get familiar with the interface and understand how Anaconda Navigator can simplify your work with Python packages and environments.
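When switching between environments, it can help to confirm from Python which one is actually active. The following is a minimal sketch under stated assumptions: the function name describe_environment is made up for illustration, and it relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```python
import os
import sys


def describe_environment(environ=None):
    """Summarize the running interpreter and, if set, the active conda environment."""
    environ = os.environ if environ is None else environ
    return {
        # conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # it is absent when Python was started outside conda.
        "conda_env": environ.get("CONDA_DEFAULT_ENV", "(not set)"),
        # Directory the running interpreter was installed into.
        "prefix": sys.prefix,
        "python": ".".join(str(part) for part in sys.version_info[:3]),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this in two different environments should show a different prefix in each, which is what keeps their packages from interfering with each other.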
III. Introduction to Jupyter Notebook
A. Brief Explanation of Jupyter Notebook
1. What is Jupyter Notebook?
The Jupyter Notebook is an open-source web application that allows the creation and sharing of documents that contain live code, equations, visualizations, and narrative text. It is an interactive computational environment, where you can combine code execution, rich text, mathematics, plots, and rich media.
The name “Jupyter” is a combination of the core languages it was designed for: JUlia, PYThon, and R. Notebook documents are both human-readable documents containing the analysis description and the results (figures, tables, etc.) as well as executable documents which can be run to perform data analysis.
2. Why use Jupyter Notebook for Python?
There are several reasons why Jupyter Notebooks are popular in data analysis and scientific computing:
- Interactive Coding: Jupyter Notebooks allow you to write and run code interactively. You can run cells out of order, which is particularly useful for data exploration where you might need to go back and tweak your analysis.
- Integration of Text and Code: Notebooks allow you to interleave text with code, making it easy to provide explanations and documentation to your code. This is a significant advantage over traditional scripts.
- Visualization Support: Notebooks are excellent for visualizing data. Plots created with Matplotlib or other libraries can be embedded directly in the notebook alongside the code that generated them.
- Sharing and Collaboration: Jupyter Notebooks are easy to share. They can be exported in a variety of formats including HTML, PDF, and Markdown, and can be shared via email, Dropbox, GitHub, and Jupyter Notebook Viewer.
- Wide Usage: Jupyter is widely used in the data science community and has many extensions and plugins available.
These features make Jupyter Notebook an excellent tool for teaching, demonstrative programming, rapid prototyping, and collaborative coding.
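Running cells out of order is convenient, but it also means that variables keep whatever value they had the last time their cell was run. The sketch below imitates this in a single script, with comments standing in for cell boundaries; the variable names and numbers are made up for illustration.

```python
# Cell 1: define some data.
prices = [3.50, 4.25, 5.00]

# Cell 2: compute a result from that data.
total = sum(prices)
print("Total:", total)

# Cell 1 again, edited and re-run on its own: the data changes...
prices = [3.50, 4.25, 5.00, 10.00]

# ...but `total` still holds the old value until Cell 2 is re-run.
print("Stale total:", total)

# Re-running Cell 2 brings the result back in sync.
total = sum(prices)
print("Updated total:", total)
```

When results look inconsistent, re-running the notebook from top to bottom is a simple way to get back to a known state.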
B. Launching Jupyter Notebook
1. Launching Jupyter Notebook via Anaconda Navigator
To start the Jupyter Notebook:
- Open Anaconda Navigator.
- On the home tab, you’ll see several applications you can launch. Click on the “Launch” button under Jupyter Notebook. This will start a Jupyter Notebook server and open your default web browser.
2. Navigating the Jupyter Notebook Dashboard
The Jupyter Notebook dashboard typically opens in a new web browser window. It is a control panel that allows you to manage your Jupyter Notebooks, files, and subdirectories. The dashboard serves as a home page for the notebook, providing an overview of these components:
- Files Tab: Displays the files and folders in the current directory. From here, you can create new notebooks or folders, upload files, or open existing notebooks.
- Running Tab: Shows the currently running notebooks (those with a green icon). You can shut down running notebooks from this tab.
- Clusters Tab: For use with IPython parallel, a Python package for parallel computing.
IV. Creating and Organizing Jupyter Notebooks
A. Creating a New Jupyter Notebook
To create a new Jupyter Notebook:
- Navigate to the dashboard’s Files tab.
- Click on the “New” button in the top right corner.
- From the dropdown menu, select “Python 3” under the “Notebook” section.
This will open a new tab with an empty notebook named “Untitled”. You can rename this notebook by clicking on the “Untitled” name at the top and entering a new name.
B. Organizing Jupyter Notebooks
Managing your notebooks effectively is crucial, especially when working on multiple projects. Here are some tips:
- Create a new directory for each project: Click on the “New” dropdown button and select “Folder”. Open the new folder and rename it with your project’s name. Now, you can create new notebooks within this directory.
- Renaming Notebooks: To rename a notebook, open the notebook and click on the current name at the top of the page. Enter the new name and hit enter.
- Moving Notebooks: If you want to move notebooks from one directory to another, you can do so by checking the box next to the notebook and using the “Move” button at the top of the dashboard. You’ll need to enter the relative path to the new directory.
- Deleting Notebooks: To delete a notebook, check the box next to the notebook on the dashboard and use the “Delete” button.
Remember, keeping your notebooks and projects organized will not only make your work more efficient but also make it easier to find and share your work with others.
A. Creating a New Jupyter Notebook
1. Step-by-Step Guide to Creating a New Jupyter Notebook
Follow these steps to create a new Jupyter Notebook:
- From the Jupyter Notebook dashboard (which can be opened via Anaconda Navigator), navigate to the directory where you want to create your new notebook.
- Click on the “New” button in the top right corner of the dashboard.
- From the dropdown menu, under the “Notebook” section, select the Python version you want to use (typically “Python 3”).
- A new tab will open in your web browser with an empty notebook, named “Untitled” by default. The notebook is saved in the directory you were navigating in the dashboard.
- To rename the notebook, click on “Untitled” at the top of the page. A dialog box will open. Enter your desired name for the notebook and click “Rename”.
2. Overview of the Jupyter Notebook User Interface
Once you’ve created and opened a new notebook, you’ll be presented with the notebook interface. It’s composed of the following main components:
- Notebook Menu: This is located at the top of the page and provides access to various functions such as saving the notebook, exporting the notebook in different formats, editing the notebook, etc.
- Toolbar: Just below the Notebook Menu, this offers shortcuts to commonly used features. It allows you to save your notebook, add a new cell, cut or copy cells, execute code, and much more.
- Cell: The notebook itself consists of cells. A cell is a container for text to be displayed in the notebook or code to be executed by the notebook’s kernel. Cells can be of different types, including “Code”, “Markdown”, “Raw NBConvert” etc.
- Code Cells: These are used to write and edit live code. When a code cell is run, it executes the code in the language of the notebook’s kernel and displays the output below the cell.
- Markdown Cells: These are used to write and edit rich text. You can format the text in a variety of ways and even include equations, hyperlinks, and images.
In the margin of the notebook, you’ll see a line that shows where each cell begins and ends with areas for controls and status indicators, including a blue left margin for command mode and a green one for edit mode.
Remember that Jupyter operates on a modal user interface. The mode (command or edit) changes the available keyboard commands. You can toggle between these modes using Esc and Enter.
B. Organizing Notebooks
1. Saving and Naming Notebooks
The notebook auto-saves your work every few minutes, but it’s a good practice to manually save your work often by clicking on the “Save” icon on the toolbar, or by using the shortcut Ctrl + S.
When you create a new notebook, it’s named “Untitled” by default. To rename it:
- Click on the “Untitled” name at the top of your notebook, and a dialog box will appear.
- Enter the new name for your notebook and click “Rename”.
Remember to give your notebooks descriptive names, particularly if you are working on several different notebooks at the same time. This will make it easier for you to locate your work later.
2. Organizing Notebooks in Folders
If you’re working on several projects or different aspects of a large project, you might want to organize your notebooks in folders. To do this:
- Navigate to the dashboard’s Files tab.
- Click on the “New” dropdown button and select “Folder”. This will create a new folder named “Untitled Folder”.
- To rename the folder, check the box next to it and click the “Rename” button on the toolbar.
- You can now create new notebooks within this directory by entering it (click on the folder’s name) and following the usual steps to create a new notebook.
Organizing your notebooks in folders can help keep your workspace tidy and your projects separate. It can also make it easier to share your work with others, as you can share a whole folder of related notebooks at once.
C. Basic Operations
1. Creating, Deleting, and Rearranging Cells
- Creating Cells: To create a new cell, you can use the “Insert” menu option from the toolbar and choose “Insert Cell Above” or “Insert Cell Below”, depending on your needs. Alternatively, you can use the shortcuts A (for above) or B (for below) while in command mode (press Esc to switch to command mode).
- Deleting Cells: To delete a cell, select the cell and then choose “Delete Cells” from the “Edit” menu or use the shortcut D pressed twice in command mode.
- Rearranging Cells: To move a cell, select it, and then use the “Up” or “Down” options in the “Cell” menu. The keyboard shortcuts for these are Ctrl + Shift + Up/Down.
2. Different Cell Types: Code, Markdown, etc.
Jupyter notebooks are made up of cells, each of which can be one of a few types:
- Code Cells: These cells contain code in the language of the current notebook’s kernel (Python, in our case). The code in these cells is executed when you press Shift + Enter, and the cell’s output is displayed below the cell.
- Markdown Cells: These cells contain text formatted using Markdown, a lightweight markup language. They can include text, headings, lists, tables, links, images, LaTeX equations, and more. These cells are “run” (rendered into formatted text) when you press Shift + Enter.
- Raw NBConvert Cells: These are unformatted cells that are included when you convert your notebook into another format using nbconvert. They’re typically only used for that purpose.
- Heading Cells: In older versions of Jupyter Notebooks, these cells were used for organizing your notebook. In more recent versions, they have been replaced with Markdown cells using the # character for creating headings.
You can switch the type of a cell by using the cell type dropdown menu in the toolbar at the top, or by using the keyboard shortcuts Y (for code), M (for markdown), R (for raw), and 1-6 (for headings of different levels) while in command mode.
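Behind the interface, a notebook file (.ipynb) is a JSON document in which each cell records its type and its source text. The sketch below builds a minimal notebook by hand using only the standard library, assuming the version 4 notebook layout; the cell contents are made up for illustration, and in practice Jupyter writes this file for you.

```python
import json

# A minimal notebook in the version 4 .ipynb layout: one Markdown cell
# and one code cell. The cell contents are invented for illustration.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook\n", "Some *formatted* text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # the cell has not been run yet
            "metadata": {},
            "outputs": [],            # filled in when the cell is executed
            "source": ["print('Hello, World!')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serialize to the text that would be stored on disk, then read it back.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

Changing a cell's type in the toolbar dropdown amounts to changing its cell_type entry in this structure.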
V. Essential Jupyter Notebook Tips and Tricks
A. Executing Cells
1. How to Run a Single Cell
To run a single cell in Jupyter Notebook:
- Click on the cell to select it.
- Press Shift + Enter or Ctrl + Enter to run the cell.
Shift + Enter runs the current cell and selects the next one, while Ctrl + Enter runs the current cell and keeps it selected. If you’re in the middle of editing a cell, both shortcuts will also exit edit mode.
2. How to Run Multiple Cells
Jupyter Notebook provides several ways to run multiple cells:
- Run All: To run all cells in the notebook, you can use the “Cell” -> “Run All” menu option. This will run your cells from top to bottom, including any markdown cells.
- Run Above/Below: If you want to run all cells above or below a certain point, select the cell where you want to start and then use the “Cell” -> “Run All Above” or “Cell” -> “Run All Below” menu option. This is useful when you’ve made a change in one cell and want to see how it affects the cells that depend on it.
- Running a Selected Group of Cells: If you want to run a specific group of cells, you can select multiple cells using Shift + Click and then run them all at once by pressing Shift + Enter.
These features can save you time when you’re working with complex notebooks, allowing you to re-run your code to see the effects of changes you’ve made or to ensure that everything still works the way you expect.
B. Keyboard Shortcuts
Keyboard shortcuts can significantly speed up your work in Jupyter Notebook. Here are some commonly used ones:
- Shift + Enter: Run the current cell and select the next one.
- Ctrl + Enter: Run the current cell and stay in it.
- Alt + Enter: Run the current cell and insert a new one below.
- A / B: Insert a new cell above / below the current cell (in command mode).
- D, D (press the D key twice): Delete the current cell (in command mode).
- Y / M: Change the current cell to code / markdown (in command mode).
- Ctrl + S: Save the notebook.
- H: Show keyboard shortcut help dialog (in command mode).
- Esc / Enter: Switch to command mode / switch to edit mode.
Remember, to use the commands like A, B, Y, M, and H, you need to be in command mode (press Esc to enter command mode).
C. Magic Commands
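Magic commands in Jupyter are special commands, provided by the IPython kernel, that can make your coding process a lot smoother. Line magics are prefixed by a single % character and apply to one line; cell magics are prefixed by %% and apply to the whole cell. Here are some examples: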
- %run: Runs a Python script as a program, with command line arguments passed as arguments in the script.
- %load: Inserts the code from an external script.
- %who: Lists all variables of global scope.
- %reset: Deletes all variables/names defined in the interactive namespace.
- %pwd: Returns the current working directory path.
- %cd: Changes the current working directory.
- %matplotlib inline: Makes matplotlib plots show up inline within the notebook.
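Magic commands only work inside an IPython kernel, so they cannot be used in an ordinary Python script. To make clearer what a few of them do, the sketch below shows rough standard-library stand-ins for %pwd, %cd, %ls, and %who. These are approximations chosen for illustration, not how the magics are implemented.

```python
import os
import tempfile

# %pwd -> the current working directory
start_dir = os.getcwd()
print("Working directory:", start_dir)

# %cd -> change the working directory (a temporary folder is used here
# so the example does not depend on any particular path)
with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)
    changed_dir = os.getcwd()
    # %ls -> list the files in the current directory
    print("Files:", os.listdir("."))
    os.chdir(start_dir)  # go back before the temporary folder is removed

# %who -> names defined at the top level (private names are skipped)
user_names = sorted(name for name in globals() if not name.startswith("_"))
print("Variables:", user_names)
```

Inside a notebook, the magic versions are shorter and also work for tasks such as %run and %matplotlib inline, which have no one-line equivalent.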
D. Exporting and Sharing Notebooks
1. Exporting Jupyter Notebooks
Jupyter notebooks can be exported to a variety of file formats, including HTML, PDF, LaTeX, .py, and more. To export a notebook:
- Click on “File” -> “Download as”.
- Choose the format you wish to export as.
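The same export can be scripted with the jupyter nbconvert command, which is useful when several notebooks need converting. The sketch below is a hedged example: the helper name nbconvert_command and the file name analysis.ipynb are placeholders, and the export is attempted only if the jupyter launcher and the notebook file are both present.

```python
import os
import shutil
import subprocess


def nbconvert_command(notebook_path, to_format="html"):
    """Build the argument list for exporting a notebook with nbconvert."""
    return ["jupyter", "nbconvert", "--to", to_format, notebook_path]


# "analysis.ipynb" is a placeholder file name.
command = nbconvert_command("analysis.ipynb")
print(" ".join(command))

# Only attempt the export if the jupyter launcher is on PATH
# and the notebook file actually exists.
if shutil.which("jupyter") is not None and os.path.exists("analysis.ipynb"):
    result = subprocess.run(command, capture_output=True, text=True)
    print("Exit code:", result.returncode)
```

Passing a different to_format value, such as "pdf", selects another export format; PDF export typically needs extra software installed.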
2. Sharing Jupyter Notebooks
There are several ways to share Jupyter notebooks:
- GitHub: You can upload your notebooks to GitHub, where they are rendered automatically.
- nbviewer: A simple way to share notebooks online is to put them in a public place (like GitHub), then view them using Jupyter’s nbviewer.
- Google Colab: You can upload your notebooks to Google Colab, which also allows others to run and experiment with the code.
- Binder: With Binder, you can create a fully executable environment, making it easy for others to interact with your code.
When sharing your notebooks, make sure to clear any sensitive data, especially when using public sharing platforms.
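Cell outputs are stored inside the notebook file, so printed values can travel with it even after the code has been edited. One way to clear them before sharing is sketched below, assuming the version 4 notebook layout; the function name strip_outputs and the example notebook are made up for illustration.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with all code-cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny invented notebook whose output contains something private.
example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout",
                         "text": ["api_key=SECRET\n"]}],
            "source": ["print(api_key)"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

cleaned = strip_outputs(example)
print(json.dumps(cleaned["cells"][0]["outputs"]))
```

This removes outputs only. Anything sensitive typed directly into a cell's source, such as a password, still has to be removed by hand.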
VI. Practical Exercises
A. Exercise 1: Install Anaconda and Launch Jupyter Notebook
Objective: Familiarize yourself with the Anaconda platform and launch Jupyter Notebook.
Instructions:
- Install Anaconda:
  - Go to the Anaconda website.
  - Download the appropriate version for your system.
  - Follow the prompts on the installer screens. If you are unsure about any setting, accept the defaults. You can change them later.
  - To make the changes take effect, close and then re-open your terminal window.
- Verify your installation:
  - In your terminal window or Anaconda Prompt, run the command conda list. A list of installed packages appears if it has been installed correctly.
- Launch Jupyter Notebook:
  - From the start menu, open the Anaconda Prompt.
  - Type jupyter notebook and press enter. This will launch the Jupyter Notebook in your default web browser.
- Create a new Python notebook:
  - In the Jupyter Notebook dashboard, navigate to the directory where you want to create your new notebook.
  - Click on the “New” button and from the dropdown menu, select “Python 3” (or the version you installed).
  - A new tab will open with an empty notebook.
Deliverable:
- A screenshot showing Jupyter Notebook running on your system, with a new, blank notebook open.
This exercise will ensure you have Anaconda and Jupyter Notebook installed correctly, and are familiar with launching a new Jupyter Notebook.
C. Exercise 3: Explore Magic Commands and Keyboard Shortcuts in Jupyter Notebook
Objective: Gain familiarity with using magic commands and keyboard shortcuts to enhance your efficiency in Jupyter Notebook.
Instructions:
- Explore Keyboard Shortcuts:
  - Open a new or existing Jupyter notebook.
  - Enter command mode by pressing Esc. You will notice that the border of your current cell changes to blue.
  - Try out some of the following keyboard shortcuts while in command mode: A to insert a cell above, B to insert a cell below, D, D to delete a cell, Y to change a cell to code mode, and M to change a cell to markdown mode.
  - Press Enter to switch back to edit mode (cell border will be green) and try the Ctrl + Enter shortcut to run a cell and stay in it.
- Experiment with Magic Commands:
  - In a new cell, try the %pwd magic command and run the cell to display your current working directory.
  - Use the %ls magic command in another cell and run it to list the files in your current directory.
  - Now try the %who magic command in a new cell and run it to display a list of all variables of global scope.
  - Finally, use the %matplotlib inline magic command, then create a plot using matplotlib to see how it makes plots show up inline within the notebook.
Deliverable:
- A Jupyter Notebook with different cells demonstrating the usage of the keyboard shortcuts and magic commands described above.
This exercise will help you become more efficient in your use of Jupyter Notebook by familiarizing you with some of its most powerful and convenient features.
D. Exercise 4: Export a Jupyter Notebook to HTML and Share
Objective: Learn how to export a Jupyter Notebook to HTML format and understand the process of sharing a notebook.
Instructions:
- Create a Simple Jupyter Notebook:
  - Open a new Jupyter Notebook.
  - In the first cell (in markdown mode), write a brief introduction about yourself.
  - In the next cell (in code mode), write a simple Python code snippet (for example, a “Hello, World!” script or a simple calculation).
  - Run all cells to ensure everything is working as expected.
- Export Notebook to HTML:
  - Go to “File” -> “Download as” -> “HTML (.html)”.
  - Save the file in your desired directory.
- Share Your Notebook:
  - Open a GitHub account if you don’t already have one.
  - Create a new repository and upload your notebook (.ipynb) file together with the exported HTML file.
  - Once uploaded, click on the .ipynb file in your repository and then click on the “Raw” button.
  - Copy the URL of the raw file and paste it into nbviewer and hit “Go!”. nbviewer renders notebook files, so use the .ipynb URL rather than the HTML one.
Deliverable:
- A shared link to your rendered Jupyter Notebook via nbviewer.
This exercise provides hands-on experience with exporting a Jupyter Notebook to a different format and sharing it for others to view, which is a common requirement when collaborating on data science projects.