<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Wesley Kambale]]></title><description><![CDATA[I'm a machine learning engineer and Google Developer Expert in AI, adept at crafting production-ready ML systems that provide impactful solutions in the African]]></description><link>https://kambale.dev</link><generator>RSS for Node</generator><lastBuildDate>Sat, 18 Apr 2026 00:16:35 GMT</lastBuildDate><atom:link href="https://kambale.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Custom training loop from scratch in JAX]]></title><description><![CDATA[For the past three weeks, we've been building up to this moment.
Week 1 taught us that JAX is fast. Week 2 showed us how to eliminate loops with vmap and compute gradients with grad. Week 3 gave us Fl]]></description><link>https://kambale.dev/training-loop-in-jax</link><guid isPermaLink="true">https://kambale.dev/training-loop-in-jax</guid><category><![CDATA[jax]]></category><category><![CDATA[training-loop]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[optax]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Wed, 04 Mar 2026 22:25:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/61143e31119030192497a888/46e5dfb7-7634-4ea1-9b04-a5b4909af30b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For the past three weeks, we've been building up to this moment.</p>
<p>Week 1 taught us that JAX is fast. Week 2 showed us how to eliminate loops with <code>vmap</code> and compute gradients with <code>grad</code>. Week 3 gave us Flax NNX—a way to define neural networks that feels like PyTorch but runs like JAX.</p>
<p>But we haven't actually <em>trained</em> anything yet.</p>
<p>Today, that changes. We're going to write a complete training loop from scratch: forward pass, loss calculation, gradient computation, parameter updates, and evaluation. No <code>model.fit()</code>. No magic. Just explicit, controllable, JIT-compiled code that you own completely.</p>
<p>By the end of this article, you'll have trained a CNN on MNIST and understood every single line of code that made it happen.</p>
<h2>Why Write Your Own Training Loop?</h2>
<p>If you've used Keras, you know how easy training can be:</p>
<pre><code class="language-python">model.fit(x_train, y_train, epochs=10)
</code></pre>
<p>One line. Done. So why would anyone write hundreds of lines to do the same thing?</p>
<p>Because <code>model.fit()</code> is a black box. It works until it doesn't. And when it doesn't (your loss explodes, you need gradient accumulation, you're debugging a custom layer, you need to log something specific every 37 steps), you're stuck.</p>
<p>The training loop is where machine learning actually happens. If you don't control it, you don't control your model.</p>
<p>In production systems, you almost always need:</p>
<ul>
<li><p><strong>Custom metrics</strong> that aren't built into the framework</p>
</li>
<li><p><strong>Gradient clipping</strong> to prevent exploding gradients</p>
</li>
<li><p><strong>Learning rate schedules</strong> that change based on validation loss</p>
</li>
<li><p><strong>Mixed precision training</strong> for speed</p>
</li>
<li><p><strong>Gradient accumulation</strong> when batches don't fit in memory</p>
</li>
<li><p><strong>Early stopping</strong> based on custom criteria</p>
</li>
</ul>
<p>Writing your own loop means you can implement any of these with a few lines of code.</p>
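<p>For instance, gradient accumulation doesn't require rewriting the loop at all. Here's a minimal sketch using Optax (introduced properly below); the accumulation interval of 4 is an arbitrary choice:</p>
<pre><code class="language-python">import optax

# Accumulate gradients over 4 steps, then apply a single Adam update,
# simulating a 4x larger batch without the memory cost
tx = optax.MultiSteps(optax.adam(1e-3), every_k_schedule=4)
</code></pre>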
<h2>The Four Beats of Training</h2>
<p>Every training loop, regardless of framework, follows the same rhythm:</p>
<ol>
<li><p><strong>Forward Pass</strong>: Run input through the model to get predictions</p>
</li>
<li><p><strong>Loss Calculation</strong>: Compare predictions to ground truth</p>
</li>
<li><p><strong>Backward Pass</strong>: Compute gradients of the loss with respect to parameters</p>
</li>
<li><p><strong>Parameter Update</strong>: Adjust parameters using the gradients</p>
</li>
</ol>
<p>In PyTorch, this looks like:</p>
<pre><code class="language-python">optimizer.zero_grad()        # Clear old gradients
predictions = model(x)       # Forward pass
loss = loss_fn(predictions, y)  # Loss calculation
loss.backward()              # Backward pass
optimizer.step()             # Parameter update
</code></pre>
<p>PyTorch hides a lot here. <code>loss.backward()</code> traverses a hidden computational graph and secretly updates <code>.grad</code> attributes on every parameter.</p>
<p>JAX makes everything explicit.</p>
<h2>The Optax Library</h2>
<p><strong>Optax</strong> is JAX's optimization library. It handles gradient transformations and parameter updates.</p>
<p>The key insight about Optax is that optimizers are <strong>composable</strong>. Instead of monolithic optimizer classes, Optax gives you building blocks that you chain together:</p>
<pre><code class="language-python">import optax

# A simple Adam optimizer
optimizer = optax.adam(learning_rate=0.001)

# Adam with gradient clipping
optimizer = optax.chain(
    optax.clip_by_global_norm(1.0),  # First, clip gradients
    optax.adam(learning_rate=0.001)   # Then, apply Adam
)

# Adam with weight decay (AdamW)
optimizer = optax.adamw(learning_rate=0.001, weight_decay=0.01)
</code></pre>
<p>This composability is powerful. Need to add gradient clipping to your existing optimizer? Add one line. Need a custom transformation? Write a function that takes gradients and returns modified gradients.</p>
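<p>As a concrete illustration, here's a minimal sketch of a custom transformation (the name <code>scale_by_constant</code> and the factor are made up for this example):</p>
<pre><code class="language-python">import jax
import optax

def scale_by_constant(factor):
    """Multiply every gradient by a constant factor."""
    def init_fn(params):
        return optax.EmptyState()  # this transform carries no state

    def update_fn(updates, state, params=None):
        updates = jax.tree_util.tree_map(lambda g: factor * g, updates)
        return updates, state

    return optax.GradientTransformation(init_fn, update_fn)

# Chain it like any built-in transformation
optimizer = optax.chain(scale_by_constant(0.5), optax.adam(learning_rate=0.001))
</code></pre>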
<h3>Common Optax Optimizers</h3>
<table>
<thead>
<tr>
<th>Optimizer</th>
<th>Usage</th>
</tr>
</thead>
<tbody><tr>
<td><code>optax.sgd(lr)</code></td>
<td>Basic stochastic gradient descent</td>
</tr>
<tr>
<td><code>optax.adam(lr)</code></td>
<td>Adam (most common default)</td>
</tr>
<tr>
<td><code>optax.adamw(lr, weight_decay)</code></td>
<td>Adam with decoupled weight decay</td>
</tr>
<tr>
<td><code>optax.sgd(lr, momentum=0.9)</code></td>
<td>SGD with momentum</td>
</tr>
<tr>
<td><code>optax.rmsprop(lr)</code></td>
<td>RMSprop</td>
</tr>
</tbody></table>
<h3>Common Gradient Transformations</h3>
<table>
<thead>
<tr>
<th>Transformation</th>
<th>Purpose</th>
</tr>
</thead>
<tbody><tr>
<td><code>optax.clip_by_global_norm(max_norm)</code></td>
<td>Prevent exploding gradients</td>
</tr>
<tr>
<td><code>optax.add_decayed_weights(decay)</code></td>
<td>L2 regularization</td>
</tr>
<tr>
<td><code>optax.scale_by_schedule(schedule_fn)</code></td>
<td>Learning rate scheduling</td>
</tr>
</tbody></table>
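<p>Schedules plug in the same way. Most Optax optimizers also accept a schedule function directly as the learning rate; here's a hedged sketch with illustrative numbers:</p>
<pre><code class="language-python">import optax

schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,     # start from zero
    peak_value=1e-3,    # warm up to this learning rate
    warmup_steps=100,
    decay_steps=1_000,  # then decay along a cosine curve
)
tx = optax.adam(learning_rate=schedule)
</code></pre>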
<h2>The nnx.Optimizer Wrapper</h2>
<p>Optax is a functional library: it works with raw JAX arrays and pytrees. But our models are NNX objects with structured parameters. We need a bridge.</p>
<p><code>nnx.Optimizer</code> is that bridge:</p>
<pre><code class="language-python">from flax import nnx
import optax

# Create the Optax optimizer
tx = optax.adam(learning_rate=0.001)

# Wrap it with nnx.Optimizer
optimizer = nnx.Optimizer(model, tx, wrt=nnx.Param)
</code></pre>
<p>The <code>wrt</code> parameter is critical. It stands for "with respect to" and tells the optimizer which variables to update. <code>nnx.Param</code> means "only update trainable parameters", not batch normalization statistics or other state.</p>
<p>Once wrapped, updating parameters is simple:</p>
<pre><code class="language-python">optimizer.update(model, grads)
</code></pre>
<p>This applies the gradients to the model's parameters in place. No need to manually extract parameters, apply updates, and put them back.</p>
<h2>Computing Gradients with nnx.value_and_grad</h2>
<p>In PyTorch, you call <code>loss.backward()</code> to compute gradients. In JAX, we use <code>value_and_grad</code>.</p>
<p>Here's the pattern:</p>
<pre><code class="language-python">def loss_fn(model, x, y):
    logits = model(x)
    loss = compute_loss(logits, y)
    return loss

# Get both the loss value and the gradients
loss, grads = nnx.value_and_grad(loss_fn)(model, x, y)
</code></pre>
<p><strong>Critical rule</strong>: <code>nnx.value_and_grad</code> computes gradients with respect to the <strong>first argument</strong> of the function. That's why <code>model</code> must be the first parameter.</p>
<p>The <code>grads</code> object has the same structure as the model's parameters—it's a pytree where each leaf is the gradient for the corresponding parameter.</p>
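<p>Because <code>grads</code> is an ordinary pytree, you can map over it. A quick sketch for sanity-checking per-parameter gradient magnitudes:</p>
<pre><code class="language-python">import jax
import jax.numpy as jnp

# grads mirrors the model's parameter structure, so tree_map visits every leaf
grad_norms = jax.tree_util.tree_map(lambda g: jnp.linalg.norm(g), grads)
print(grad_norms)
</code></pre>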
<h2>The Complete Training Step</h2>
<p>Let's put it all together. Here's a complete, JIT-compiled training step:</p>
<pre><code class="language-python">@nnx.jit
def train_step(model, optimizer, x, y):
    """Execute one training step."""
    
    def loss_fn(model):
        logits = model(x)
        loss = optax.softmax_cross_entropy_with_integer_labels(
            logits=logits, 
            labels=y
        ).mean()
        return loss, logits
    
    # Compute loss and gradients
    (loss, logits), grads = nnx.value_and_grad(loss_fn, has_aux=True)(model)
    
    # Update parameters
    optimizer.update(model, grads)
    
    # Compute accuracy for logging
    predictions = jnp.argmax(logits, axis=-1)
    accuracy = jnp.mean(predictions == y)
    
    return loss, accuracy
</code></pre>
<p>A few things to note:</p>
<ol>
<li><p><code>@nnx.jit</code>: The entire function is JIT-compiled. The first call traces and compiles; subsequent calls execute the compiled code.</p>
</li>
<li><p><code>has_aux=True</code>: Our loss function returns <code>(loss, logits)</code>. The <code>has_aux=True</code> flag tells <code>value_and_grad</code> that the function returns auxiliary data alongside the loss. Gradients are only computed for the first returned value.</p>
</li>
<li><p><strong>In-place updates</strong>: <code>optimizer.update(model, grads)</code> modifies the model in place. NNX handles the functional-to-stateful conversion internally.</p>
</li>
</ol>
<h2>Training a CNN on MNIST</h2>
<p>Let's train a real model on real data. We'll use the CNN from Week 3 and the MNIST dataset.</p>
<h3>Setup and Data Loading</h3>
<pre><code class="language-python">import jax
import jax.numpy as jnp
from flax import nnx
import optax
import tensorflow_datasets as tfds
import tensorflow as tf

# Disable TensorFlow GPU to avoid conflicts with JAX
tf.config.set_visible_devices([], 'GPU')

# Load MNIST
def load_mnist(batch_size=32):
    """Load and preprocess MNIST dataset."""
    
    def preprocess(sample):
        image = tf.cast(sample['image'], tf.float32) / 255.0
        label = sample['label']
        return {'image': image, 'label': label}
    
    train_ds = tfds.load('mnist', split='train')
    test_ds = tfds.load('mnist', split='test')
    
    train_ds = (train_ds
                .map(preprocess)
                .shuffle(1024)
                .batch(batch_size, drop_remainder=True)
                .prefetch(1))
    
    test_ds = (test_ds
               .map(preprocess)
               .batch(batch_size, drop_remainder=True)
               .prefetch(1))
    
    return train_ds, test_ds

train_ds, test_ds = load_mnist(batch_size=32)
</code></pre>
<h3>Define the Model</h3>
<p>We'll use a slightly simplified version of the CNN from Week 3:</p>
<pre><code class="language-python">from functools import partial

class CNN(nnx.Module):
    """Convolutional Neural Network for MNIST."""
    
    def __init__(self, *, rngs: nnx.Rngs):
        self.conv1 = nnx.Conv(1, 32, kernel_size=(3, 3), rngs=rngs)
        self.conv2 = nnx.Conv(32, 64, kernel_size=(3, 3), rngs=rngs)
        self.linear1 = nnx.Linear(3136, 256, rngs=rngs)
        self.linear2 = nnx.Linear(256, 10, rngs=rngs)
        self.pool = partial(nnx.avg_pool, window_shape=(2, 2), strides=(2, 2))
    
    def __call__(self, x):
        x = nnx.relu(self.conv1(x))
        x = self.pool(x)
        x = nnx.relu(self.conv2(x))
        x = self.pool(x)
        x = x.reshape(x.shape[0], -1)  # Flatten
        x = nnx.relu(self.linear1(x))
        x = self.linear2(x)
        return x
</code></pre>
<h3>Initialize Model and Optimizer</h3>
<pre><code class="language-python"># Initialize model
model = CNN(rngs=nnx.Rngs(0))

# Test forward pass
dummy_input = jnp.ones((1, 28, 28, 1))
dummy_output = model(dummy_input)
print(f"Model output shape: {dummy_output.shape}")  # (1, 10)

# Initialize optimizer
learning_rate = 0.005
momentum = 0.9  # passed to adamw as b1, the first-moment decay rate
tx = optax.adamw(learning_rate, b1=momentum)
optimizer = nnx.Optimizer(model, tx, wrt=nnx.Param)

# Display model structure
nnx.display(model)
</code></pre>
<h3>Define Training and Evaluation Steps</h3>
<pre><code class="language-python">@nnx.jit
def train_step(model, optimizer, batch):
    """Single training step."""
    
    def loss_fn(model):
        logits = model(batch['image'])
        loss = optax.softmax_cross_entropy_with_integer_labels(
            logits=logits,
            labels=batch['label']
        ).mean()
        return loss, logits
    
    (loss, logits), grads = nnx.value_and_grad(loss_fn, has_aux=True)(model)
    optimizer.update(model, grads)
    
    accuracy = jnp.mean(jnp.argmax(logits, axis=-1) == batch['label'])
    return loss, accuracy

@nnx.jit
def eval_step(model, batch):
    """Single evaluation step."""
    logits = model(batch['image'])
    loss = optax.softmax_cross_entropy_with_integer_labels(
        logits=logits,
        labels=batch['label']
    ).mean()
    accuracy = jnp.mean(jnp.argmax(logits, axis=-1) == batch['label'])
    return loss, accuracy
</code></pre>
<h3>The Training Loop</h3>
<pre><code class="language-python"># Training configuration
num_epochs = 5
train_steps_per_epoch = 1000
eval_steps = 200

print("Starting training...")
print("=" * 60)

for epoch in range(num_epochs):
    # Training
    model.train()  # Set model to training mode
    train_loss, train_acc = 0.0, 0.0
    
    for step, batch in enumerate(train_ds.as_numpy_iterator()):
        if step &gt;= train_steps_per_epoch:
            break
        
        # Convert to JAX arrays
        batch = {k: jnp.array(v) for k, v in batch.items()}
        
        loss, acc = train_step(model, optimizer, batch)
        train_loss += loss
        train_acc += acc
    
    train_loss /= train_steps_per_epoch
    train_acc /= train_steps_per_epoch
    
    # Evaluation
    model.eval()  # Set model to evaluation mode
    eval_loss, eval_acc = 0.0, 0.0
    eval_batches = 0
    
    for step, batch in enumerate(test_ds.as_numpy_iterator()):
        if step &gt;= eval_steps:
            break
        
        batch = {k: jnp.array(v) for k, v in batch.items()}
        
        loss, acc = eval_step(model, batch)
        eval_loss += loss
        eval_acc += acc
        eval_batches += 1
    
    eval_loss /= eval_batches
    eval_acc /= eval_batches
    
    print(f"Epoch {epoch + 1}/{num_epochs}")
    print(f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f}")
    print(f"Eval Loss:  {eval_loss:.4f} | Eval Acc:  {eval_acc:.4f}")

print("Training complete!")
</code></pre>
<h2>Using nnx.MultiMetric for Cleaner Tracking</h2>
<p>NNX provides a <code>MultiMetric</code> helper for accumulating metrics across batches:</p>
<pre><code class="language-python"># Define metrics
metrics = nnx.MultiMetric(
    accuracy=nnx.metrics.Accuracy(),
    loss=nnx.metrics.Average('loss'),
)

@nnx.jit
def train_step_with_metrics(model, optimizer, metrics, batch):
    def loss_fn(model):
        logits = model(batch['image'])
        loss = optax.softmax_cross_entropy_with_integer_labels(
            logits=logits, labels=batch['label']
        ).mean()
        return loss, logits
    
    (loss, logits), grads = nnx.value_and_grad(loss_fn, has_aux=True)(model)
    optimizer.update(model, grads)
    
    # Update metrics in place
    metrics.update(loss=loss, logits=logits, labels=batch['label'])

# In the training loop:
for batch in train_ds.as_numpy_iterator():
    batch = {k: jnp.array(v) for k, v in batch.items()}
    train_step_with_metrics(model, optimizer, metrics, batch)

# Get aggregated metrics
results = metrics.compute()
print(f"Loss: {results['loss']:.4f}, Accuracy: {results['accuracy']:.4f}")

# Reset for next epoch
metrics.reset()
</code></pre>
<p>This is cleaner than manual accumulation and handles edge cases (like different batch sizes) correctly.</p>
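<p>The same pattern extends to evaluation. A sketch mirroring <code>eval_step</code> from above:</p>
<pre><code class="language-python">@nnx.jit
def eval_step_with_metrics(model, metrics, batch):
    logits = model(batch['image'])
    loss = optax.softmax_cross_entropy_with_integer_labels(
        logits=logits, labels=batch['label']
    ).mean()
    # No gradients or updates here; just accumulate the metrics
    metrics.update(loss=loss, logits=logits, labels=batch['label'])
</code></pre>
<p>Just remember to call <code>metrics.reset()</code> between the training and evaluation passes so the numbers don't mix.</p>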
<h2>Inference: Using the Trained Model</h2>
<p>After training, making predictions is straightforward:</p>
<pre><code class="language-python">model.eval()  # Ensure evaluation mode (deterministic)

@nnx.jit
def predict(model, images):
    logits = model(images)
    return jnp.argmax(logits, axis=-1)

# Get predictions
test_batch = next(test_ds.as_numpy_iterator())
predictions = predict(model, jnp.array(test_batch['image']))

print(f"Predictions: {predictions[:10]}")
print(f"Actual: {test_batch['label'][:10]}")
</code></pre>
<h2>Exercises</h2>
<ol>
<li><p><strong>Add gradient clipping</strong>: Modify the optimizer to include <code>optax.clip_by_global_norm(1.0)</code>. Train the model and compare the loss curves.</p>
</li>
<li><p><strong>Learning rate schedule</strong>: Implement a learning rate that decays over time using <code>optax.warmup_cosine_decay_schedule</code>. Check the Optax documentation for the parameters.</p>
</li>
<li><p><strong>Track more metrics</strong>: Add precision and recall tracking alongside accuracy. You'll need to compute true positives, false positives, and false negatives.</p>
</li>
<li><p><strong>Early stopping</strong>: Modify the training loop to stop if validation accuracy doesn't improve for 3 consecutive epochs.</p>
</li>
</ol>
<h2>Quick Reference</h2>
<pre><code class="language-python">import optax
from flax import nnx

# Optimizer Setup
tx = optax.adam(learning_rate=0.001)
optimizer = nnx.Optimizer(model, tx, wrt=nnx.Param)

# Training Step Pattern
@nnx.jit
def train_step(model, optimizer, batch):
    def loss_fn(model):
        logits = model(batch['x'])
        loss = compute_loss(logits, batch['y'])
        return loss, logits
    
    (loss, logits), grads = nnx.value_and_grad(loss_fn, has_aux=True)(model)
    optimizer.update(model, grads)
    return loss

# Evaluation Step Pattern
@nnx.jit  
def eval_step(model, batch):
    logits = model(batch['x'])
    loss = compute_loss(logits, batch['y'])
    return loss

# Training/Eval Mode
model.train()  # Enable dropout, stochastic behavior
model.eval()   # Deterministic inference
</code></pre>
<h2>What's Next</h2>
<p>We've trained a model. But what happens when training goes wrong? What if loss becomes <code>NaN</code>? What if shapes don't match? What if the model silently produces garbage?</p>
<p>Next week, we dive into <strong>Optax</strong> in more depth: learning rate schedules, gradient clipping, weight decay, and building custom optimizer stacks. We'll make our training more robust and production-ready.</p>
]]></content:encoded></item><item><title><![CDATA[Building Neural Networks with Flax NNX]]></title><description><![CDATA[Over the past two weeks, we've learned that JAX is fast (jit), that it eliminates loops (vmap), and that it computes gradients automatically (grad). These are powerful primitives.
But if you've been f]]></description><link>https://kambale.dev/flax-nnx</link><guid isPermaLink="true">https://kambale.dev/flax-nnx</guid><category><![CDATA[jax]]></category><category><![CDATA[flax]]></category><category><![CDATA[neural networks]]></category><category><![CDATA[nnx]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Mon, 23 Feb 2026 19:46:48 GMT</pubDate><enclosure url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/61143e31119030192497a888/476b79f9-b003-4338-86d7-9b270d60f550.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the past two weeks, we've learned that JAX is fast (<code>jit</code>), that it eliminates loops (<code>vmap</code>), and that it computes gradients automatically (<code>grad</code>). These are powerful primitives.</p>
<p>But if you've been following along, you might have noticed something uncomfortable: we've been passing arrays around manually, tracking parameters in tuples, and writing functions that return updated state. It works, but it doesn't feel like building neural networks. It feels like accounting.</p>
<p>If you've used PyTorch, you know how natural it is to define a model as a class, call <code>model(x)</code>, and let the framework handle the rest. That ergonomic experience is what made PyTorch the dominant framework for research.</p>
<p>Today, we get that experience in JAX.</p>
<p><strong>Flax NNX</strong> is a neural network library that gives you PyTorch-style classes and methods while compiling down to JAX's XLA backend. You write object-oriented code. JAX runs functional code. NNX bridges the gap.</p>
<p>By the end of this article, we'll have:</p>
<ol>
<li><p>Understood why NNX exists and what problem it solves</p>
</li>
<li><p>Built our first neural network using <code>nnx.Module</code></p>
</li>
<li><p>Learned how NNX handles the critical issue of random number generation</p>
</li>
<li><p>Created a CNN for image classification—the same architecture used in real production systems</p>
</li>
</ol>
<p>Let's build something real.</p>
<h2>The State Problem</h2>
<p>To understand why Flax NNX is necessary, we need to understand the fundamental tension between PyTorch and JAX.</p>
<h3>How PyTorch Handles State</h3>
<p>In PyTorch, a model is an object that contains its own parameters:</p>
<pre><code class="language-python">import torch.nn as nn

class PyTorchMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(784, 128)
        self.linear2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = torch.relu(self.linear1(x))
        return self.linear2(x)

model = PyTorchMLP()
output = model(input_data)  # Parameters are hidden inside
</code></pre>
<p>This is intuitive. The <code>nn.Linear</code> layers create their own weight matrices internally. You don't see them, you don't manage them, they just exist.</p>
<p>But this "hidden state" creates problems for JAX. When you call <code>jax.jit</code> on a function, JAX traces through it and compiles a computational graph. If that function secretly reads or writes to hidden variables, JAX can't see those operations, and it can't optimize them.</p>
<h3>How Pure JAX Handles State</h3>
<p>JAX demands <strong>pure functions</strong>: functions where the output depends only on the inputs, with no side effects. If you want to update parameters, you must pass them in and return them out:</p>
<pre><code class="language-python">def forward(params, x):
    x = jax.nn.relu(x @ params['w1'] + params['b1'])
    return x @ params['w2'] + params['b2']

def train_step(params, x, y):
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    new_params = update_params(params, grads)
    return new_params, loss  # Must return the new state

# You're always passing params around
params, loss = train_step(params, x_batch, y_batch)
params, loss = train_step(params, x_batch, y_batch)
params, loss = train_step(params, x_batch, y_batch)
</code></pre>
<p>This is explicit and JIT-compatible, but it's tedious. For a model with dozens of layers, managing nested dictionaries of parameters becomes error-prone.</p>
<h3>How Flax NNX Solves This</h3>
<p>NNX gives you the PyTorch interface while secretly doing the JAX bookkeeping. You write classes with attributes. NNX intercepts those attributes, extracts the parameters when needed for JIT compilation, and updates them in place when you're done.</p>
<pre><code class="language-python">from flax import nnx

class NNX_MLP(nnx.Module):
    def __init__(self, rngs: nnx.Rngs):
        self.linear1 = nnx.Linear(784, 128, rngs=rngs)
        self.linear2 = nnx.Linear(128, 10, rngs=rngs)
    
    def __call__(self, x):
        x = nnx.relu(self.linear1(x))
        return self.linear2(x)

model = NNX_MLP(rngs=nnx.Rngs(0))
output = model(input_data)  # Looks just like PyTorch
</code></pre>
<p>The syntax is nearly identical to PyTorch. But under the hood, NNX can extract the parameters, pass them through JAX transformations, and put them back—all without you writing a single line of state management code.</p>
<h2>Building Your First NNX Model</h2>
<p>Let's build a simple multi-layer perceptron step by step.</p>
<h3>The Basic Structure</h3>
<p>Every NNX model inherits from <code>nnx.Module</code>:</p>
<pre><code class="language-python">from flax import nnx
import jax.numpy as jnp

class SimpleMLP(nnx.Module):
    def __init__(self, hidden_dim: int, output_dim: int, *, rngs: nnx.Rngs):
        # Define layers as attributes
        self.linear1 = nnx.Linear(784, hidden_dim, rngs=rngs)
        self.linear2 = nnx.Linear(hidden_dim, output_dim, rngs=rngs)
    
    def __call__(self, x):
        # Define forward pass
        x = self.linear1(x)
        x = nnx.relu(x)
        x = self.linear2(x)
        return x
</code></pre>
<p>Key differences from PyTorch:</p>
<ol>
<li><p><strong>No</strong> <code>super().__init__()</code>: NNX uses Python metaclasses, so you don't need to call the parent constructor.</p>
</li>
<li><p><code>__call__</code> <strong>instead of</strong> <code>forward</code>: In Python, <code>__call__</code> makes an object callable. NNX uses this standard convention rather than PyTorch's custom <code>forward</code> method.</p>
</li>
<li><p><strong>The</strong> <code>rngs</code> <strong>parameter</strong>: This is required. We'll explain why in the next section.</p>
</li>
</ol>
<h3>Instantiating the Model</h3>
<pre><code class="language-python"># Create a random number generator
rngs = nnx.Rngs(42)  # Seed for reproducibility

# Create the model
model = SimpleMLP(hidden_dim=128, output_dim=10, rngs=rngs)

# Test with dummy data
x = jnp.ones((32, 784))  # Batch of 32, 784 features each
output = model(x)

print(f"Input shape:  {x.shape}")       # (32, 784)
print(f"Output shape: {output.shape}")  # (32, 10)
</code></pre>
<p>That's it. The model is ready to use.</p>
<h2>The Randomness Requirement</h2>
<p>You might be wondering: why do we need to pass <code>rngs</code> everywhere?</p>
<h3>The Problem with Hidden Randomness</h3>
<p>In NumPy or PyTorch, random number generation uses a global state:</p>
<pre><code class="language-python">import numpy as np
np.random.seed(42)
print(np.random.randn())  # Always the same value
print(np.random.randn())  # Different value (global state advanced)
</code></pre>
<p>This global state is invisible. It changes every time you call a random function. And as we established in Week 1, JAX cannot work with hidden, changing state.</p>
<h3>JAX's Explicit Randomness</h3>
<p>In JAX, randomness is deterministic. You create a "key," and that key always produces the same random numbers:</p>
<pre><code class="language-python">from jax import random

key = random.PRNGKey(42)
print(random.normal(key, shape=(3,)))  # Always the same
print(random.normal(key, shape=(3,)))  # Still the same!
</code></pre>
<p>To get different random numbers, you must <strong>split</strong> the key:</p>
<pre><code class="language-python">key, subkey = random.split(key)
print(random.normal(subkey, shape=(3,)))  # New values
</code></pre>
<h3>How nnx.Rngs Helps</h3>
<p>When you create a neural network, every layer needs random numbers to initialize its weights. If you have 50 layers, that's 50 key splits you'd need to manage manually.</p>
<p><code>nnx.Rngs</code> automates this. It's a key dispenser that splits and distributes keys automatically:</p>
<pre><code class="language-python">rngs = nnx.Rngs(42)

# When you create a layer, it asks rngs for keys internally
linear1 = nnx.Linear(10, 20, rngs=rngs)  # Gets its own keys
linear2 = nnx.Linear(20, 30, rngs=rngs)  # Gets different keys

# Both layers have different, reproducible initializations
</code></pre>
<p>The critical benefit: <strong>reproducibility</strong>. If you and I both run <code>nnx.Rngs(42)</code>, we get identical models. This matters for debugging, for scientific reproducibility, and for distributed training where multiple machines must initialize the same model.</p>
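<p>A tiny sketch of that guarantee (the layer shapes are arbitrary):</p>
<pre><code class="language-python">m1 = nnx.Linear(4, 8, rngs=nnx.Rngs(42))
m2 = nnx.Linear(4, 8, rngs=nnx.Rngs(42))

# Two fresh Rngs(42) dispense identical key streams,
# so both layers start from identical weights
assert jnp.allclose(m1.kernel.value, m2.kernel.value)
assert jnp.allclose(m1.bias.value, m2.bias.value)
</code></pre>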
<h2>Inspecting Your Model</h2>
<p>PyTorch lets you <code>print(model)</code> to see the architecture. NNX has something better: <code>nnx.display()</code>.</p>
<pre><code class="language-python">nnx.display(model)
</code></pre>
<p>This produces a rich, hierarchical view showing:</p>
<ul>
<li><p>Every layer and sublayer</p>
</li>
<li><p>Parameter shapes and dtypes</p>
</li>
<li><p>The total parameter count</p>
</li>
<li><p>The structure of the computational graph</p>
</li>
</ul>
<p>In Jupyter notebooks, this renders as an interactive tree you can expand and collapse. It's invaluable for debugging shape mismatches and verifying your architecture.</p>
<h2>Building a CNN for Image Classification</h2>
<p>Let's build something more substantial: a convolutional neural network for classifying images. This is the same architecture pattern used in production image classifiers.</p>
<h3>The Architecture</h3>
<p>We'll build a classic CNN with:</p>
<ul>
<li><p>Two convolutional blocks (Conv → ReLU → Pool)</p>
</li>
<li><p>A flatten operation</p>
</li>
<li><p>Two dense layers for classification</p>
</li>
</ul>
<pre><code class="language-python">from flax import nnx
import jax.numpy as jnp
from functools import partial

class CNN(nnx.Module):
    """A simple CNN for image classification."""
    
    def __init__(self, num_classes: int, *, rngs: nnx.Rngs):
        # Convolutional layers
        self.conv1 = nnx.Conv(
            in_features=1,      # Input channels (1 for grayscale)
            out_features=32,    # Output channels
            kernel_size=(3, 3), # 3x3 filters
            rngs=rngs
        )
        self.conv2 = nnx.Conv(
            in_features=32,
            out_features=64,
            kernel_size=(3, 3),
            rngs=rngs
        )
        
        # Dense layers
        # After two 2x2 pooling operations on 28x28 input: 28 → 14 → 7
        # So we have 64 channels × 7 × 7 = 3136 features
        self.linear1 = nnx.Linear(3136, 256, rngs=rngs)
        self.linear2 = nnx.Linear(256, num_classes, rngs=rngs)
        
        # Pooling as a reusable operation
        self.pool = partial(nnx.avg_pool, window_shape=(2, 2), strides=(2, 2))
    
    def __call__(self, x):
        # Block 1: Conv → ReLU → Pool
        x = self.conv1(x)
        x = nnx.relu(x)
        x = self.pool(x)
        
        # Block 2: Conv → ReLU → Pool
        x = self.conv2(x)
        x = nnx.relu(x)
        x = self.pool(x)
        
        # Flatten: (batch, height, width, channels) → (batch, features)
        x = x.reshape(x.shape[0], -1)
        
        # Classification head
        x = nnx.relu(self.linear1(x))
        x = self.linear2(x)
        
        return x
</code></pre>
<h3>Testing the Model</h3>
<pre><code class="language-python"># Initialize
model = CNN(num_classes=10, rngs=nnx.Rngs(0))

# Create dummy input: batch of 4 grayscale 28×28 images
# Shape: (batch, height, width, channels)
dummy_input = jnp.ones((4, 28, 28, 1))

# Forward pass
output = model(dummy_input)

print(f"Input shape:  {dummy_input.shape}")  # (4, 28, 28, 1)
print(f"Output shape: {output.shape}")       # (4, 10)

# Inspect the model structure
nnx.display(model)
</code></pre>
<p>The output shape is <code>(4, 10)</code>—four images, ten class scores each. This is exactly what we'd feed into a softmax for classification.</p>
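<p>Turning those scores into probabilities is one call away (a quick sketch):</p>
<pre><code class="language-python">probs = nnx.softmax(output)   # convert logits to class probabilities
print(probs.sum(axis=-1))     # each row sums to 1.0
</code></pre>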
<h2>How NNX Compiles with JAX</h2>
<p>Here's the magic: even though we're writing object-oriented code, we can still use JAX's transformations.</p>
<h3>The @nnx.jit Decorator</h3>
<p>NNX provides its own versions of JAX transforms that understand NNX objects:</p>
<pre><code class="language-python">@nnx.jit
def forward(model, x):
    return model(x)

# This is JIT-compiled, just like @jax.jit
output = forward(model, dummy_input)
</code></pre>
<p>When you call this function, NNX:</p>
<ol>
<li><p><strong>Extracts</strong> the model's parameters into a pure JAX pytree</p>
</li>
<li><p><strong>Traces</strong> the computation with those parameters</p>
</li>
<li><p><strong>Compiles</strong> the trace with XLA</p>
</li>
<li><p><strong>Updates</strong> the model object with any changed state</p>
</li>
</ol>
<p>You write familiar OOP code. JAX gets the pure functions it needs. Everyone wins.</p>
<h3>Preview: The Split/Merge Pattern</h3>
<p>Under the hood, NNX uses two key operations:</p>
<pre><code class="language-python"># Split: separate structure from state
graphdef, state = nnx.split(model)

# graphdef: the "blueprint" of the model (static)
# state: the actual parameter values (dynamic, a pytree)

# Merge: reconstruct the model from structure and state
reconstructed_model = nnx.merge(graphdef, state)
</code></pre>
<p>You rarely need to call these directly—<code>@nnx.jit</code> handles it automatically. But understanding this pattern helps when you need to do advanced things like the following (a short sketch of the manual pattern appears after this list):</p>
<ul>
<li><p>Saving and loading checkpoints (Week 9)</p>
</li>
<li><p>Distributing models across devices (Week 10)</p>
</li>
<li><p>Custom training loops with fine-grained control</p>
</li>
</ul>
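<p>Here's that manual pattern as a minimal sketch, assuming the <code>model</code> and <code>dummy_input</code> from earlier:</p>
<pre><code class="language-python">import jax

# Split once: graphdef is the static structure, state is a pytree of arrays
graphdef, state = nnx.split(model)

@jax.jit
def forward_pure(state, x):
    # Rebuild a model object inside the pure function
    model = nnx.merge(graphdef, state)
    return model(x)

output = forward_pure(state, dummy_input)
print(output.shape)  # (4, 10)
</code></pre>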
<h2>Common Layers Reference</h2>
<p>Here are the NNX equivalents of layers you know from PyTorch:</p>
<table style="min-width:75px"><colgroup><col style="min-width:25px"></col><col style="min-width:25px"></col><col style="min-width:25px"></col></colgroup><tbody><tr><th><p>PyTorch</p></th><th><p>Flax NNX</p></th><th><p>Notes</p></th></tr><tr><td><p><code>nn.Linear</code></p></td><td><p><code>nnx.Linear</code></p></td><td><p>Same signature</p></td></tr><tr><td><p><code>nn.Conv2d</code></p></td><td><p><code>nnx.Conv</code></p></td><td><p>Uses <code>in_features</code>/<code>out_features</code></p></td></tr><tr><td><p><code>nn.BatchNorm2d</code></p></td><td><p><code>nnx.BatchNorm</code></p></td><td><p>Tracks running stats automatically</p></td></tr><tr><td><p><code>nn.LayerNorm</code></p></td><td><p><code>nnx.LayerNorm</code></p></td><td><p>Same behavior</p></td></tr><tr><td><p><code>nn.Dropout</code></p></td><td><p><code>nnx.Dropout</code></p></td><td><p>Requires <code>rngs</code>, respects <code>deterministic</code> flag</p></td></tr><tr><td><p><code>nn.Embedding</code></p></td><td><p><code>nnx.Embed</code></p></td><td><p>For token embeddings</p></td></tr><tr><td><p><code>nn.MultiheadAttention</code></p></td><td><p><code>nnx.MultiHeadAttention</code></p></td><td><p>Transformer attention</p></td></tr></tbody></table>

<p>Activation functions aren't layers in NNX, they're just functions:</p>
<pre><code class="language-python">x = nnx.relu(x)
x = nnx.gelu(x)
x = nnx.softmax(x)
x = nnx.sigmoid(x)
</code></pre>
<h2>Working with Parameters</h2>
<p>Sometimes you need direct access to the parameters—for logging, for custom initialization, or for freezing layers.</p>
<h3>Accessing Parameters</h3>
<pre><code class="language-python"># Access a specific layer's weights
print(model.linear1.kernel.value.shape)  # (3136, 256)
print(model.linear1.bias.value.shape)    # (256,)

# Parameters are nnx.Param objects; .value gets the JAX array
</code></pre>
<h3>Extracting All Parameters</h3>
<pre><code class="language-python"># Get all parameters as a state object
state = nnx.state(model)

# Or specifically just the trainable parameters
params = nnx.state(model, nnx.Param)
</code></pre>
<h3>Updating Parameters</h3>
<pre><code class="language-python"># Update the model with new state
nnx.update(model, new_state)
</code></pre>
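<p>Putting access, transform, and update together, here's a hedged sketch that rescales every trainable parameter (the 0.5 factor is arbitrary):</p>
<pre><code class="language-python">import jax

# Extract trainable parameters, transform them, write them back
params = nnx.state(model, nnx.Param)
halved = jax.tree_util.tree_map(lambda p: p * 0.5, params)
nnx.update(model, halved)
</code></pre>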
<p>This becomes important next week when we build training loops.</p>
<h2>Exercises</h2>
<p>Before moving on, try these:</p>
<ol>
<li><p><strong>Add Dropout</strong>: Modify the CNN to include <code>nnx.Dropout(rate=0.5, rngs=rngs)</code> between the dense layers. Note that dropout needs its own RNG stream for the random mask.</p>
</li>
<li><p><strong>Build a deeper network</strong>: Create a 4-layer MLP with hidden dimensions [512, 256, 128, 64]. Use a loop in <code>__init__</code> to avoid repetition.</p>
</li>
<li><p><strong>Parameter counting</strong>: Write a function that takes an NNX model and returns the total number of trainable parameters. Hint: use <code>nnx.state(model, nnx.Param)</code> and <code>jax.tree_util.tree_map</code>.</p>
</li>
</ol>
<h2>Quick Reference</h2>
<pre><code class="language-python">from flax import nnx
import jax.numpy as jnp

# Define a Model
class MyModel(nnx.Module):
    def __init__(self, *, rngs: nnx.Rngs):
        self.linear = nnx.Linear(10, 5, rngs=rngs)
    
    def __call__(self, x):
        return nnx.relu(self.linear(x))

# Instantiate
model = MyModel(rngs=nnx.Rngs(0))

# Forward Pass
output = model(jnp.ones((32, 10)))

# Inspect
nnx.display(model)

# Access Parameters
weights = model.linear.kernel.value
state = nnx.state(model, nnx.Param)

# JIT Compile
@nnx.jit
def forward(model, x):
    return model(x)

# Common Layers
nnx.Linear(in_features, out_features, rngs=rngs)
nnx.Conv(in_features, out_features, kernel_size, rngs=rngs)
nnx.BatchNorm(num_features, rngs=rngs)
nnx.Dropout(rate, rngs=rngs)
nnx.Embed(num_embeddings, features, rngs=rngs)
</code></pre>
<h2>What's Next</h2>
<p>We have a model. But a model that can't learn is just a random number generator with extra steps.</p>
<p>Next week, we build the <strong>training loop</strong>. We'll use:</p>
<ul>
<li><p><code>nnx.value_and_grad</code> to compute loss and gradients</p>
</li>
<li><p><code>nnx.Optimizer</code> to manage parameter updates</p>
</li>
<li><p><code>optax</code> to define the optimization algorithm</p>
</li>
<li><p>Metrics to track accuracy and loss</p>
</li>
</ul>
<p>We'll train our CNN on real data and watch the loss curve drop. That's when this all becomes real.</p>
]]></content:encoded></item><item><title><![CDATA[Transformations That Change Everything]]></title><description><![CDATA[Last week, we learned that JAX makes code fast through JIT compilation. We took a matrix multiplication from 2 seconds to 0.001 seconds with a single decorator.
But speed isn't JAX's only trick. The real power of JAX lies in its transformations; func...]]></description><link>https://kambale.dev/transformations-that-change-everything</link><guid isPermaLink="true">https://kambale.dev/transformations-that-change-everything</guid><category><![CDATA[jax]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Mon, 09 Feb 2026 17:57:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770659671216/1cee86fb-2dc9-42ed-a66c-5294c8c11bc1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last week, we learned that JAX makes code fast through JIT compilation. We took a matrix multiplication from 2 seconds to 0.001 seconds with a single decorator.</p>
<p>But speed isn't JAX's only trick. The real power of JAX lies in its <strong>transformations</strong>; functions that take functions and return new functions with different behavior.</p>
<p>Today we're covering the two transformations that make JAX indispensable for machine learning:</p>
<ol>
<li><p><code>jax.vmap</code>: Automatic vectorization. Write code for one example, run it on a million.</p>
</li>
<li><p><code>jax.grad</code>: Automatic differentiation. Get gradients of any function, for free.</p>
</li>
</ol>
<p>By the end of this article, you'll understand why JAX developers almost never write for-loops, and you'll have trained multiple machine learning models in parallel without writing any loop at all.</p>
<h2 id="heading-the-problem-with-python-loops">The Problem with Python Loops</h2>
<p>Let's start with why loops are the enemy.</p>
<p>When you write a Python for-loop, the interpreter does a surprising amount of work for each iteration:</p>
<pre><code class="lang-python">results = []
<span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> data:
    <span class="hljs-comment"># For EACH iteration, Python must:</span>
    <span class="hljs-comment"># 1. Fetch x from memory</span>
    <span class="hljs-comment"># 2. Check the type of x</span>
    <span class="hljs-comment"># 3. Look up what "+" means for that type</span>
    <span class="hljs-comment"># 4. Execute the addition</span>
    <span class="hljs-comment"># 5. Append to the list (which may require memory reallocation)</span>
    results.append(x + <span class="hljs-number">1</span>)
</code></pre>
<p>If <code>data</code> has a million elements, Python performs those administrative steps a million times. The actual math (<code>x + 1</code>) takes nanoseconds. The overhead takes microseconds. You're spending 99% of your time on bookkeeping.</p>
<p>In deep learning, we process batches: 64 images, 128 sentences, 256 audio clips at once. If you loop through them in Python, your GPU sits idle while Python shuffles paperwork.</p>
<p>NumPy helps by pushing operations into C:</p>
<pre><code class="lang-python">results = data + <span class="hljs-number">1</span>  <span class="hljs-comment"># Vectorized, fast</span>
</code></pre>
<p>But what if your function is more complex than addition? What if it involves multiple steps, conditionals, or nested operations? You'd have to manually rewrite everything to handle batches, adding batch dimensions everywhere and keeping track of which axis is which.</p>
<p>This is where <code>jax.vmap</code> changes the game.</p>
<h2 id="heading-jaxvmap-automatic-vectorization">jax.vmap: Automatic Vectorization</h2>
<p><code>vmap</code> stands for "vectorizing map." It takes a function written for a single example and transforms it into a function that operates on batches.</p>
<h3 id="heading-the-basic-pattern">The Basic Pattern</h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> jax
<span class="hljs-keyword">import</span> jax.numpy <span class="hljs-keyword">as</span> jnp

<span class="hljs-comment"># A function that works on ONE number</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">square</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-keyword">return</span> x ** <span class="hljs-number">2</span>

<span class="hljs-comment"># Transform it to work on MANY numbers</span>
batched_square = jax.vmap(square)

<span class="hljs-comment"># Now use it</span>
numbers = jnp.array([<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>])
result = batched_square(numbers)
print(result)  <span class="hljs-comment"># [1, 4, 9, 16, 25]</span>
</code></pre>
<p>"But wait," you might say, "I could just write <code>numbers ** 2</code> directly."</p>
<p>True. The power of <code>vmap</code> shows up with complex functions:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">complex_operation</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-string">"""A function with multiple steps."""</span>
    a = jnp.sin(x)
    b = jnp.exp(-x ** <span class="hljs-number">2</span>)
    c = a * b + jnp.log(<span class="hljs-number">1</span> + jnp.abs(x))
    <span class="hljs-keyword">return</span> c

<span class="hljs-comment"># Without vmap, you'd need to think about broadcasting at each step</span>
<span class="hljs-comment"># With vmap, you just wrap it</span>
batched_complex = jax.vmap(complex_operation)

x_batch = jnp.linspace(<span class="hljs-number">-3</span>, <span class="hljs-number">3</span>, <span class="hljs-number">1000</span>)
results = batched_complex(x_batch)
</code></pre>
<p>You write the function thinking about one input. <code>vmap</code> handles the batch dimension for you.</p>
<h3 id="heading-multiple-arguments-the-inaxes-parameter">Multiple Arguments: The <code>in_axes</code> Parameter</h3>
<p>Real functions have multiple arguments. <code>in_axes</code> tells <code>vmap</code> which arguments to map over and which to broadcast.</p>
<p>Consider a dot product:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">dot_product</span>(<span class="hljs-params">weights, features</span>):</span>
    <span class="hljs-keyword">return</span> jnp.dot(weights, features)
</code></pre>
<p>Different scenarios require different <code>in_axes</code>:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Scenario 1: One set of weights, many feature vectors</span>
<span class="hljs-comment"># weights: don't map (None), features: map over axis 0</span>
batch_predict = jax.vmap(dot_product, in_axes=(<span class="hljs-literal">None</span>, <span class="hljs-number">0</span>))

weights = jnp.array([<span class="hljs-number">1.0</span>, <span class="hljs-number">2.0</span>, <span class="hljs-number">3.0</span>])
features = jnp.array([
    [<span class="hljs-number">1.0</span>, <span class="hljs-number">0.0</span>, <span class="hljs-number">0.0</span>],
    [<span class="hljs-number">0.0</span>, <span class="hljs-number">1.0</span>, <span class="hljs-number">0.0</span>],
    [<span class="hljs-number">0.0</span>, <span class="hljs-number">0.0</span>, <span class="hljs-number">1.0</span>],
])

results = batch_predict(weights, features)
print(results)  <span class="hljs-comment"># [1., 2., 3.]</span>
</code></pre>
<pre><code class="lang-python"><span class="hljs-comment"># Scenario 2: Many weight sets, one feature vector (ensemble of models)</span>
<span class="hljs-comment"># weights: map over axis 0, features: don't map (None)</span>
ensemble_predict = jax.vmap(dot_product, in_axes=(<span class="hljs-number">0</span>, <span class="hljs-literal">None</span>))

many_weights = jnp.array([
    [<span class="hljs-number">1.0</span>, <span class="hljs-number">0.0</span>, <span class="hljs-number">0.0</span>],
    [<span class="hljs-number">0.0</span>, <span class="hljs-number">1.0</span>, <span class="hljs-number">0.0</span>],
    [<span class="hljs-number">0.0</span>, <span class="hljs-number">0.0</span>, <span class="hljs-number">1.0</span>],
])
single_features = jnp.array([<span class="hljs-number">1.0</span>, <span class="hljs-number">2.0</span>, <span class="hljs-number">3.0</span>])

results = ensemble_predict(many_weights, single_features)
print(results)  <span class="hljs-comment"># [1., 2., 3.]</span>
</code></pre>
<pre><code class="lang-python"><span class="hljs-comment"># Scenario 3: Many weights, many features (parallel evaluation)</span>
<span class="hljs-comment"># Both: map over axis 0</span>
parallel_predict = jax.vmap(dot_product, in_axes=(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>))

results = parallel_predict(many_weights, features)
print(results)  <span class="hljs-comment"># [1., 1., 1.]</span>
</code></pre>
<p>The rule is simple:</p>
<ul>
<li><p><code>0</code> means "iterate over the first axis of this argument"</p>
</li>
<li><p><code>None</code> means "broadcast this argument to all iterations"</p>
</li>
<li><p>You can use other integers for different axes (see the sketch after this list)</p>
</li>
</ul>
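<p>A small sketch of mapping over a different axis (<code>row_sum</code> is a made-up helper):</p>
<pre><code class="lang-python">def row_sum(v):
    return jnp.sum(v)

m = jnp.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

print(jax.vmap(row_sum, in_axes=0)(m))  # [ 3 12], sums each row
print(jax.vmap(row_sum, in_axes=1)(m))  # [3 5 7], sums each column
</code></pre>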
<h3 id="heading-nested-vmap">Nested vmap</h3>
<p>You can stack <code>vmap</code> calls for multi-dimensional batching:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">single_multiply</span>(<span class="hljs-params">a, b</span>):</span>
    <span class="hljs-keyword">return</span> a * b

<span class="hljs-comment"># Map over rows, then over columns</span>
double_batched = jax.vmap(jax.vmap(single_multiply))

matrix_a = jnp.array([[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>]])
matrix_b = jnp.array([[<span class="hljs-number">10</span>, <span class="hljs-number">20</span>], [<span class="hljs-number">30</span>, <span class="hljs-number">40</span>]])

result = double_batched(matrix_a, matrix_b)
print(result)
<span class="hljs-comment"># [[10, 40],</span>
<span class="hljs-comment">#  [90, 160]]</span>
</code></pre>
<h2 id="heading-jaxgrad-automatic-differentiation">jax.grad: Automatic Differentiation</h2>
<p>The other transformation that makes JAX essential for ML is <code>jax.grad</code>. It computes gradients automatically.</p>
<h3 id="heading-the-basic-pattern-1">The Basic Pattern</h3>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">f</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-keyword">return</span> x ** <span class="hljs-number">2</span>

<span class="hljs-comment"># grad returns a NEW FUNCTION that computes the derivative</span>
df_dx = jax.grad(f)

print(f(<span class="hljs-number">3.0</span>))      <span class="hljs-comment"># 9.0</span>
print(df_dx(<span class="hljs-number">3.0</span>))  <span class="hljs-comment"># 6.0 (derivative of x² is 2x, and 2*3=6)</span>
</code></pre>
<p>This works for any function, no matter how complex:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">messy_function</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-keyword">return</span> jnp.sin(x) * jnp.exp(-x ** <span class="hljs-number">2</span>) + jnp.tanh(x)

gradient_fn = jax.grad(messy_function)

<span class="hljs-comment"># The gradient at x=1.0</span>
print(gradient_fn(<span class="hljs-number">1.0</span>))  <span class="hljs-comment"># ≈ -0.0004 (the terms nearly cancel at x=1)</span>
</code></pre>
<p>You didn't write any derivative rules. JAX traced through your function and computed the gradient automatically using the chain rule.</p>
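<p>And because <code>grad</code> returns an ordinary function, you can differentiate again for higher-order derivatives. Using the <code>f(x) = x ** 2</code> from above:</p>
<pre><code class="lang-python">d2f_dx2 = jax.grad(jax.grad(f))
print(d2f_dx2(3.0))  # 2.0, since the second derivative of x**2 is the constant 2
</code></pre>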
<h3 id="heading-jaxvalueandgrad-get-both-at-once">jax.value_and_grad: Get Both at Once</h3>
<p>In training loops, you need both the loss value (to log it) and the gradients (to update parameters). <code>jax.value_and_grad</code> gives you both in one pass:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">loss_fn</span>(<span class="hljs-params">params, x, y</span>):</span>
    prediction = params * x
    <span class="hljs-keyword">return</span> (prediction - y) ** <span class="hljs-number">2</span>

<span class="hljs-comment"># Returns (loss_value, gradients)</span>
loss_and_grad_fn = jax.value_and_grad(loss_fn)

params = <span class="hljs-number">1.0</span>
x, y = <span class="hljs-number">2.0</span>, <span class="hljs-number">6.0</span>  <span class="hljs-comment"># We want params=3.0 so that 3*2=6</span>

loss, grad = loss_and_grad_fn(params, x, y)
print(<span class="hljs-string">f"Loss: <span class="hljs-subst">{loss}</span>"</span>)  <span class="hljs-comment"># 16.0 (because (1*2 - 6)² = 16)</span>
print(<span class="hljs-string">f"Grad: <span class="hljs-subst">{grad}</span>"</span>)  <span class="hljs-comment"># -16.0</span>
</code></pre>
<h3 id="heading-gradients-with-respect-to-specific-arguments">Gradients with Respect to Specific Arguments</h3>
<p>By default, <code>grad</code> differentiates with respect to the first argument. Use <code>argnums</code> to change this:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">f</span>(<span class="hljs-params">x, y</span>):</span>
    <span class="hljs-keyword">return</span> x ** <span class="hljs-number">2</span> + x * y

<span class="hljs-comment"># Gradient with respect to x (first argument, default)</span>
df_dx = jax.grad(f, argnums=<span class="hljs-number">0</span>)

<span class="hljs-comment"># Gradient with respect to y (second argument)</span>
df_dy = jax.grad(f, argnums=<span class="hljs-number">1</span>)

<span class="hljs-comment"># Gradients with respect to both</span>
df_both = jax.grad(f, argnums=(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>))

x, y = <span class="hljs-number">2.0</span>, <span class="hljs-number">3.0</span>
print(<span class="hljs-string">f"df/dx: <span class="hljs-subst">{df_dx(x, y)}</span>"</span>)  <span class="hljs-comment"># 2*2 + 3 = 7</span>
print(<span class="hljs-string">f"df/dy: <span class="hljs-subst">{df_dy(x, y)}</span>"</span>)  <span class="hljs-comment"># 2</span>
print(<span class="hljs-string">f"Both:  <span class="hljs-subst">{df_both(x, y)}</span>"</span>)  <span class="hljs-comment"># (7.0, 2.0)</span>
</code></pre>
<h2 id="heading-combining-transforms-the-real-power">Combining Transforms: The Real Power</h2>
<p>JAX transforms compose. You can combine <code>jit</code>, <code>vmap</code>, and <code>grad</code> freely:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">loss_single</span>(<span class="hljs-params">params, x, y</span>):</span>
    <span class="hljs-string">"""Loss for a single data point."""</span>
    pred = params[<span class="hljs-number">0</span>] * x + params[<span class="hljs-number">1</span>]  <span class="hljs-comment"># Linear: y = mx + b</span>
    <span class="hljs-keyword">return</span> (pred - y) ** <span class="hljs-number">2</span>

<span class="hljs-comment"># Stack the transforms:</span>
<span class="hljs-comment"># 1. grad: compute gradients with respect to params</span>
<span class="hljs-comment"># 2. vmap: do this for a batch of (x, y) pairs</span>
<span class="hljs-comment"># 3. jit: compile the whole thing</span>

batched_grad_fn = jax.jit(
    jax.vmap(
        jax.grad(loss_single),
        in_axes=(<span class="hljs-literal">None</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>)  <span class="hljs-comment"># Same params, batch of x, batch of y</span>
    )
)

params = jnp.array([<span class="hljs-number">1.0</span>, <span class="hljs-number">0.0</span>])  <span class="hljs-comment"># Initial guess: y = 1*x + 0</span>
x_batch = jnp.array([<span class="hljs-number">1.0</span>, <span class="hljs-number">2.0</span>, <span class="hljs-number">3.0</span>])
y_batch = jnp.array([<span class="hljs-number">2.0</span>, <span class="hljs-number">4.0</span>, <span class="hljs-number">6.0</span>])  <span class="hljs-comment"># True relationship: y = 2x</span>

<span class="hljs-comment"># Get gradients for each example in the batch</span>
grads_per_example = batched_grad_fn(params, x_batch, y_batch)
print(<span class="hljs-string">"Gradients per example:"</span>)
print(grads_per_example)

<span class="hljs-comment"># Average them for a batch gradient</span>
batch_grad = jnp.mean(grads_per_example, axis=<span class="hljs-number">0</span>)
print(<span class="hljs-string">f"Batch gradient: <span class="hljs-subst">{batch_grad}</span>"</span>)
</code></pre>
<p>This pattern, <code>jit(vmap(grad(...)))</code>, is the backbone of efficient training in JAX.</p>
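<p>To close the loop, a single gradient-descent step with that averaged gradient looks like this (the learning rate is an arbitrary choice):</p>
<pre><code class="lang-python">learning_rate = 0.01
params = params - learning_rate * batch_grad  # one step toward y = 2x
print(params)
</code></pre>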
<h2 id="heading-project-parallel-linear-regression">Project: Parallel Linear Regression</h2>
<p>Let's train multiple models simultaneously without a single Python loop during training.</p>
<p><strong>Scenario</strong>: We have housing price data from three different cities. Each city has different price dynamics, so we want to train a separate linear regression model for each.</p>
<h3 id="heading-step-1-generate-synthetic-data">Step 1: Generate Synthetic Data</h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> jax
<span class="hljs-keyword">import</span> jax.numpy <span class="hljs-keyword">as</span> jnp
<span class="hljs-keyword">from</span> jax <span class="hljs-keyword">import</span> random

<span class="hljs-comment"># Seed for reproducibility</span>
key = random.PRNGKey(<span class="hljs-number">42</span>)

<span class="hljs-comment"># True parameters for 3 cities</span>
<span class="hljs-comment"># City 0: price = 50 * size + 100</span>
<span class="hljs-comment"># City 1: price = 30 * size + 200  </span>
<span class="hljs-comment"># City 2: price = 80 * size + 50</span>
true_slopes = jnp.array([<span class="hljs-number">50.0</span>, <span class="hljs-number">30.0</span>, <span class="hljs-number">80.0</span>])
true_intercepts = jnp.array([<span class="hljs-number">100.0</span>, <span class="hljs-number">200.0</span>, <span class="hljs-number">50.0</span>])

n_cities = <span class="hljs-number">3</span>
n_samples = <span class="hljs-number">100</span>

<span class="hljs-comment"># Generate features (house sizes) for each city</span>
key, subkey = random.split(key)
X = random.uniform(subkey, (n_cities, n_samples, <span class="hljs-number">1</span>), minval=<span class="hljs-number">10</span>, maxval=<span class="hljs-number">100</span>)

<span class="hljs-comment"># Generate targets (prices) with some noise</span>
key, subkey = random.split(key)
noise = random.normal(subkey, (n_cities, n_samples)) * <span class="hljs-number">50</span>

<span class="hljs-comment"># Y[i] = true_slopes[i] * X[i] + true_intercepts[i] + noise[i]</span>
Y = (X[:, :, <span class="hljs-number">0</span>] * true_slopes[:, <span class="hljs-literal">None</span>] + 
     true_intercepts[:, <span class="hljs-literal">None</span>] + 
     noise)

print(<span class="hljs-string">f"X shape: <span class="hljs-subst">{X.shape}</span>"</span>)  <span class="hljs-comment"># (3, 100, 1)</span>
print(<span class="hljs-string">f"Y shape: <span class="hljs-subst">{Y.shape}</span>"</span>)  <span class="hljs-comment"># (3, 100)</span>
</code></pre>
<h3 id="heading-step-2-define-the-model-for-one-city">Step 2: Define the Model for ONE City</h3>
<p>We write everything as if we only have one city:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">predict</span>(<span class="hljs-params">params, x</span>):</span>
    <span class="hljs-string">"""Predict price for one city's houses."""</span>
    slope, intercept = params[<span class="hljs-number">0</span>], params[<span class="hljs-number">1</span>]
    <span class="hljs-keyword">return</span> x[:, <span class="hljs-number">0</span>] * slope + intercept

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">loss_fn</span>(<span class="hljs-params">params, x, y</span>):</span>
    <span class="hljs-string">"""MSE loss for one city."""</span>
    predictions = predict(params, x)
    <span class="hljs-keyword">return</span> jnp.mean((predictions - y) ** <span class="hljs-number">2</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train_step</span>(<span class="hljs-params">params, x, y, learning_rate</span>):</span>
    <span class="hljs-string">"""One gradient descent step for one city."""</span>
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    new_params = params - learning_rate * grads
    <span class="hljs-keyword">return</span> new_params, loss
</code></pre>
<h3 id="heading-step-3-vectorize-across-cities">Step 3: Vectorize Across Cities</h3>
<p>Now we use <code>vmap</code> to run training for all three cities in parallel:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Vectorize the training step</span>
<span class="hljs-comment"># params: axis 0 (each city has its own params)</span>
<span class="hljs-comment"># x: axis 0 (each city has its own data)</span>
<span class="hljs-comment"># y: axis 0 (each city has its own targets)</span>
<span class="hljs-comment"># learning_rate: None (same for all)</span>
parallel_train_step = jax.jit(
    jax.vmap(train_step, in_axes=(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-literal">None</span>))
)

<span class="hljs-comment"># Initialize parameters for all 3 cities</span>
<span class="hljs-comment"># Shape: (3, 2) - 3 cities, 2 params each (slope, intercept)</span>
key, subkey = random.split(key)
params = random.normal(subkey, (n_cities, <span class="hljs-number">2</span>))

print(<span class="hljs-string">"Initial parameters:"</span>)
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_cities):
    print(<span class="hljs-string">f"  City <span class="hljs-subst">{i}</span>: slope=<span class="hljs-subst">{params[i, <span class="hljs-number">0</span>]:<span class="hljs-number">.2</span>f}</span>, intercept=<span class="hljs-subst">{params[i, <span class="hljs-number">1</span>]:<span class="hljs-number">.2</span>f}</span>"</span>)
</code></pre>
<h3 id="heading-step-4-train-all-models-in-parallel">Step 4: Train All Models in Parallel</h3>
<pre><code class="lang-python">learning_rate = <span class="hljs-number">0.0001</span>
n_epochs = <span class="hljs-number">2000</span>

print(<span class="hljs-string">"\nTraining..."</span>)
<span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(n_epochs):
    <span class="hljs-comment"># This single line trains ALL THREE models simultaneously</span>
    params, losses = parallel_train_step(params, X, Y, learning_rate)

    <span class="hljs-keyword">if</span> epoch % <span class="hljs-number">500</span> == <span class="hljs-number">0</span>:
        print(<span class="hljs-string">f"Epoch <span class="hljs-subst">{epoch:<span class="hljs-number">4</span>d}</span> | Losses: <span class="hljs-subst">{losses}</span>"</span>)

print(<span class="hljs-string">"\nFinal learned parameters:"</span>)
print(<span class="hljs-string">f"<span class="hljs-subst">{<span class="hljs-string">'City'</span>:&lt;<span class="hljs-number">6</span>}</span> <span class="hljs-subst">{<span class="hljs-string">'True Slope'</span>:&lt;<span class="hljs-number">12</span>}</span> <span class="hljs-subst">{<span class="hljs-string">'Learned Slope'</span>:&lt;<span class="hljs-number">15</span>}</span> <span class="hljs-subst">{<span class="hljs-string">'True Intercept'</span>:&lt;<span class="hljs-number">15</span>}</span> <span class="hljs-subst">{<span class="hljs-string">'Learned Intercept'</span>:&lt;<span class="hljs-number">15</span>}</span>"</span>)
print(<span class="hljs-string">"-"</span> * <span class="hljs-number">70</span>)
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_cities):
    print(<span class="hljs-string">f"<span class="hljs-subst">{i:&lt;<span class="hljs-number">6</span>}</span> <span class="hljs-subst">{true_slopes[i]:&lt;<span class="hljs-number">12.1</span>f}</span> <span class="hljs-subst">{params[i, <span class="hljs-number">0</span>]:&lt;<span class="hljs-number">15.2</span>f}</span> <span class="hljs-subst">{true_intercepts[i]:&lt;<span class="hljs-number">15.1</span>f}</span> <span class="hljs-subst">{params[i, <span class="hljs-number">1</span>]:&lt;<span class="hljs-number">15.2</span>f}</span>"</span>)
</code></pre>
<p>Expected output:</p>
<pre><code class="lang-plaintext">Final learned parameters:
City   True Slope   Learned Slope   True Intercept   Learned Intercept  
----------------------------------------------------------------------
0      50.0         49.87           100.0            102.34         
1      30.0         29.92           200.0            201.15         
2      80.0         79.78           50.0             52.89
</code></pre>
<p>We trained three separate models, and the only Python for-loop left is the one over epochs; there is no loop over models anywhere. The <code>parallel_train_step</code> function processes all cities in a single compiled XLA program.</p>
<h2 id="heading-why-this-matters">Why This Matters</h2>
<p>The pattern we just used scales to serious applications:</p>
<p><strong>Hyperparameter search</strong>: Train 100 models with different learning rates simultaneously. Pick the best one.</p>
<p><strong>Ensemble methods</strong>: Train 10 models with different random seeds. Average their predictions for more robust results.</p>
<p><strong>Per-user personalization</strong>: Train a tiny model for each of your 10,000 users. <code>vmap</code> handles the parallelization.</p>
<p><strong>Bayesian methods</strong>: Sample 1000 parameter configurations from a posterior distribution and evaluate all of them at once.</p>
<p>The key insight is that <code>vmap</code> doesn't just save you from writing loops; it enables computations that would be impractical with sequential processing.</p>
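<p>As a taste of the first use case, here's a minimal sketch of a learning-rate sweep that reuses <code>train_step</code> and the city-0 data from the project above. The rate values are illustrative, and the largest may diverge on this data:</p>
<pre><code class="lang-python"># Sweep four learning rates at once: vmap over params AND learning_rate.
lrs = jnp.array([1e-2, 1e-3, 1e-4, 1e-5])

sweep_step = jax.jit(
    jax.vmap(train_step, in_axes=(0, None, None, 0))  # shared data, per-config lr
)

key, subkey = random.split(key)
sweep_params = random.normal(subkey, (lrs.shape[0], 2))  # one (slope, intercept) per lr

for _ in range(1000):
    sweep_params, sweep_losses = sweep_step(sweep_params, X[0], Y[0], lrs)

print(sweep_losses)  # pick the learning rate with the lowest final loss
</code></pre>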
<h2 id="heading-exercises">Exercises</h2>
<ol>
<li><p><strong>Gradient verification</strong>: Use <code>jax.grad</code> to compute the derivative of <code>f(x) = sin(x)</code>. Plot it alongside <code>cos(x)</code> to verify they match.</p>
</li>
<li><p><strong>The ensemble challenge</strong>: Modify the project to train 10 models on the <em>same</em> city but with different random initializations. Use <code>vmap</code> over the params axis only (<code>in_axes=(0, None, None, None)</code>). Check if they all converge to similar values.</p>
</li>
<li><p><strong>Second derivatives</strong>: <code>jax.grad</code> returns a function, and you can take its gradient too. Compute the second derivative of <code>f(x) = x³</code> and verify it equals <code>6x</code>.</p>
</li>
</ol>
<h2 id="heading-quick-reference">Quick Reference</h2>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> jax
<span class="hljs-keyword">import</span> jax.numpy <span class="hljs-keyword">as</span> jnp

<span class="hljs-comment"># vmap: Automatic Vectorization</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">single_fn</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-keyword">return</span> x ** <span class="hljs-number">2</span>

batched_fn = jax.vmap(single_fn)
results = batched_fn(jnp.array([<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>]))

<span class="hljs-comment"># With multiple arguments</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">dot</span>(<span class="hljs-params">w, x</span>):</span>
    <span class="hljs-keyword">return</span> jnp.dot(w, x)

<span class="hljs-comment"># Shared weights, batched inputs</span>
batch_dot = jax.vmap(dot, in_axes=(<span class="hljs-literal">None</span>, <span class="hljs-number">0</span>))

<span class="hljs-comment"># grad: Automatic Differentiation</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">loss</span>(<span class="hljs-params">params</span>):</span>
    <span class="hljs-keyword">return</span> params ** <span class="hljs-number">2</span>

grad_fn = jax.grad(loss)
gradient = grad_fn(<span class="hljs-number">3.0</span>)  <span class="hljs-comment"># 6.0</span>

<span class="hljs-comment"># Get both value and gradient</span>
loss_val, grad_val = jax.value_and_grad(loss)(<span class="hljs-number">3.0</span>)

<span class="hljs-comment"># Gradient with respect to specific argument</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">f</span>(<span class="hljs-params">x, y</span>):</span>
    <span class="hljs-keyword">return</span> x * y

df_dy = jax.grad(f, argnums=<span class="hljs-number">1</span>)

<span class="hljs-comment"># Combining Transforms</span>
fast_batched_grad = jax.jit(jax.vmap(jax.grad(loss_fn), in_axes=(<span class="hljs-literal">None</span>, <span class="hljs-number">0</span>)))
</code></pre>
<h2 id="heading-whats-next">What's Next</h2>
<p>We've now covered the core JAX transforms: <code>jit</code> for speed, <code>vmap</code> for batching, and <code>grad</code> for gradients. These three tools are enough to train neural networks from scratch.</p>
<p>But writing raw JAX for complex models gets tedious. Next week, we'll introduce <strong>Flax NNX</strong>: a neural network library that gives you PyTorch-style ergonomics while keeping all the power of JAX transformations.</p>
<p>We'll build our first real neural network: a CNN for image classification.</p>
<h2 id="heading-resources">Resources</h2>
<p>Automatic Vectorization</p>
<p><a target="_blank" href="https://jaxstack.ai/">JAX AI Stack Guide</a></p>
]]></content:encoded></item><item><title><![CDATA[Why JAX? The NumPy You Know, But Faster]]></title><description><![CDATA[If you've been doing machine learning in Python for any length of time, you've written code like this:
import numpy as np

x = np.random.randn(1000, 1000)
y = np.random.randn(1000, 1000)
result = np.dot(x, y)

NumPy is comfortable. It's the first thi...]]></description><link>https://kambale.dev/why-jax-the-numpy-you-know-but-faster</link><guid isPermaLink="true">https://kambale.dev/why-jax-the-numpy-you-know-but-faster</guid><category><![CDATA[jax]]></category><category><![CDATA[xla]]></category><category><![CDATA[numpy]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Mon, 02 Feb 2026 14:15:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770040892040/a8adacb0-31ca-4dd9-bc27-d29584712241.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you've been doing machine learning in Python for any length of time, you've written code like this:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

x = np.random.randn(<span class="hljs-number">1000</span>, <span class="hljs-number">1000</span>)
y = np.random.randn(<span class="hljs-number">1000</span>, <span class="hljs-number">1000</span>)
result = np.dot(x, y)
</code></pre>
<p>NumPy is comfortable. It's the first thing we reach for when we need to do math. But here's the uncomfortable truth: for the kind of work we're about to do (training neural networks, processing millions of samples, running on GPUs and TPUs), NumPy is holding us back.</p>
<p>Not because NumPy is bad. It's genuinely excellent at what it was designed for. The problem is that NumPy was designed in 2005, before GPUs became the workhorses of machine learning, before TPUs existed, and before we needed to compute gradients of functions with millions of parameters.</p>
<p>JAX is what NumPy would look like if we designed it today, knowing what we know now.</p>
<p>By the end of this article, we'll have:</p>
<ol>
<li><p>Understood <em>why</em> JAX is faster (not just that it is)</p>
</li>
<li><p>Written our first JAX code and seen the speedup ourselves</p>
</li>
<li><p>Learned the one mental shift that trips up everyone coming from NumPy or PyTorch</p>
</li>
<li><p>Built a working benchmark that proves the difference</p>
</li>
</ol>
<p>Let's get into it.</p>
<h2 id="heading-the-problem-with-normal-python">The Problem with "Normal" Python</h2>
<p>When you run Python code, the interpreter reads your instructions one line at a time, translates each one to machine code, executes it, then moves to the next line. This is called <strong>interpretation</strong>.</p>
<pre><code class="lang-plaintext">Line 1 → Translate → Execute
Line 2 → Translate → Execute
Line 3 → Translate → Execute
...
</code></pre>
<p>For a script that processes a CSV file or serves a web page, this is fine. The overhead of interpretation is negligible compared to the actual work being done.</p>
<p>But matrix multiplication? That's different. When we multiply two 5000×5000 matrices, we're doing 125 billion floating-point operations. The "translate → execute" overhead for each operation adds up fast.</p>
<p>NumPy helps by pushing the heavy lifting into compiled C code. When you call <code>np.dot()</code>, Python hands off the work to optimized BLAS libraries that run at near-hardware speed. That's why NumPy is fast<em>er</em> than pure Python.</p>
<p>But there's still a problem: <strong>Python is still orchestrating the operations</strong>. Every time you chain NumPy calls together (<code>np.dot()</code>, then <code>np.sum()</code>, then <code>np.exp()</code>), Python has to:</p>
<ol>
<li><p>Call into C</p>
</li>
<li><p>Wait for the result</p>
</li>
<li><p>Copy the result back to Python</p>
</li>
<li><p>Call into C again for the next operation</p>
</li>
</ol>
<p>Each of those handoffs has overhead. And when you're doing this millions of times in a training loop, it adds up.</p>
<h2 id="heading-how-jax-fixes-this-xla-compilation">How JAX Fixes This: XLA Compilation</h2>
<p>JAX takes a different approach. Instead of executing operations one at a time, JAX can <strong>compile your entire function</strong> into a single optimized program using XLA (Accelerated Linear Algebra).</p>
<p>Here's what that means in practice:</p>
<pre><code class="lang-plaintext">Read entire function → Analyze → Optimize → Fuse operations → Execute once
</code></pre>
<p>When JAX compiles a function, it:</p>
<ul>
<li><p><strong>Fuses operations</strong>: Instead of computing <code>a + b</code>, storing the result, then computing <code>result * c</code>, XLA fuses these into a single kernel that does <code>(a + b) * c</code> without intermediate storage.</p>
</li>
<li><p><strong>Eliminates dead code</strong>: If you compute something but never use it, XLA removes it entirely.</p>
</li>
<li><p><strong>Optimizes memory access</strong>: XLA reorders operations to minimize cache misses and memory transfers.</p>
</li>
<li><p><strong>Targets your hardware</strong>: The same JAX code compiles to optimized instructions for CPU, GPU, or TPU without you changing anything.</p>
</li>
</ul>
<p>The result? Code that runs 10x, 100x, sometimes 1000x faster than the NumPy equivalent.</p>
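<p>To make fusion concrete, here's the shape of it in code; a minimal sketch using <code>jax.jit</code>, which we'll install and benchmark properly below:</p>
<pre><code class="lang-python">import jax
import jax.numpy as jnp

@jax.jit
def fused(a, b, c):
    # XLA compiles this into a single kernel: the intermediate
    # result of (a + b) is never written out to memory.
    return (a + b) * c

a = jnp.ones((1000, 1000))
b = jnp.ones((1000, 1000))
c = jnp.ones((1000, 1000))
out = fused(a, b, c).block_until_ready()
</code></pre>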
<h2 id="heading-setting-up">Setting Up</h2>
<p>Let's stop talking and start coding. We'll use Google Colab for this, it's free and gives us access to GPUs.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install the JAX AI stack (includes JAX, Flax, Optax, and friends)</span>
!pip install -q jax-ai-stack
</code></pre>
<p>Now let's verify our setup:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> jax
<span class="hljs-keyword">import</span> jax.numpy <span class="hljs-keyword">as</span> jnp
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> time

print(<span class="hljs-string">f"JAX version: <span class="hljs-subst">{jax.__version__}</span>"</span>)
print(<span class="hljs-string">f"Available devices: <span class="hljs-subst">{jax.devices()}</span>"</span>)
</code></pre>
<p>If you're on a GPU runtime, you should see something like:</p>
<pre><code class="lang-plaintext">JAX version: 0.8.0
Available devices: [CudaDevice(id=0)]
</code></pre>
<p>If you see <code>CpuDevice</code>, that's fine too; JAX still provides speedups on CPU through XLA compilation.</p>
<h2 id="heading-the-first-mental-shift-immutability">The First Mental Shift: Immutability</h2>
<p>Before we benchmark anything, we need to talk about the one thing that trips up <em>everyone</em> coming from NumPy or PyTorch.</p>
<p><strong>In JAX, arrays are immutable. You cannot modify them in place.</strong></p>
<p>In NumPy, this is perfectly normal:</p>
<pre><code class="lang-python"><span class="hljs-comment"># NumPy: Mutable arrays</span>
arr = np.zeros(<span class="hljs-number">5</span>)
arr[<span class="hljs-number">0</span>] = <span class="hljs-number">42</span>  <span class="hljs-comment"># Modify in place</span>
print(arr)   <span class="hljs-comment"># [42. 0. 0. 0. 0.]</span>
</code></pre>
<p>In JAX, this will raise an error:</p>
<pre><code class="lang-python"><span class="hljs-comment"># JAX: This will FAIL</span>
arr = jnp.zeros(<span class="hljs-number">5</span>)
arr[<span class="hljs-number">0</span>] = <span class="hljs-number">42</span>  <span class="hljs-comment">#TypeError: JAX arrays are immutable</span>
</code></pre>
<p>Why? Because immutability is what makes JAX's optimizations possible. If arrays can be modified from anywhere in your code, the compiler can't safely reorder operations or run them in parallel. By guaranteeing that arrays never change, JAX can aggressively optimize your code.</p>
<p>So how do we update arrays? We use the <code>.at[].set()</code> syntax, which <strong>returns a new array</strong> with the modification:</p>
<pre><code class="lang-python"><span class="hljs-comment"># JAX: The correct way</span>
arr = jnp.zeros(<span class="hljs-number">5</span>)
new_arr = arr.at[<span class="hljs-number">0</span>].set(<span class="hljs-number">42</span>)

print(arr)      <span class="hljs-comment"># [0. 0. 0. 0. 0.] — Original unchanged</span>
print(new_arr)  <span class="hljs-comment"># [42. 0. 0. 0. 0.] — New array with the update</span>
</code></pre>
<p>This feels wasteful at first: are we really copying the entire array just to change one element? In practice, no. JAX and XLA are smart enough to optimize this. But conceptually, you should think of it as creating a new array.</p>
<p>Here's the full set of <code>.at[]</code> operations:</p>
<pre><code class="lang-python">x = jnp.array([<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>])

<span class="hljs-comment"># Set a value</span>
x.at[<span class="hljs-number">0</span>].set(<span class="hljs-number">10</span>)         <span class="hljs-comment"># [10, 2, 3, 4, 5]</span>

<span class="hljs-comment"># Add to a value</span>
x.at[<span class="hljs-number">0</span>].add(<span class="hljs-number">10</span>)         <span class="hljs-comment"># [11, 2, 3, 4, 5]</span>

<span class="hljs-comment"># Multiply a value</span>
x.at[<span class="hljs-number">0</span>].multiply(<span class="hljs-number">10</span>)    <span class="hljs-comment"># [10, 2, 3, 4, 5]</span>

<span class="hljs-comment"># Works with slices too</span>
x.at[<span class="hljs-number">1</span>:<span class="hljs-number">3</span>].set(<span class="hljs-number">99</span>)       <span class="hljs-comment"># [1, 99, 99, 4, 5]</span>
</code></pre>
<p>Commit this to memory. You'll use it constantly.</p>
<h2 id="heading-the-benchmark-numpy-vs-jax-vs-jaxjit">The Benchmark: NumPy vs JAX vs JAX+JIT</h2>
<p>Now let's prove that JAX is actually faster. We'll multiply two large matrices; this is the core operation in neural networks (every linear layer is a matrix multiplication).</p>
<h3 id="heading-step-1-create-the-data">Step 1: Create the Data</h3>
<pre><code class="lang-python"><span class="hljs-comment"># Matrix size</span>
size = <span class="hljs-number">3000</span>

<span class="hljs-comment"># Create random matrices with NumPy</span>
x_np = np.random.normal(size=(size, size)).astype(np.float32)
y_np = np.random.normal(size=(size, size)).astype(np.float32)

<span class="hljs-comment"># Convert to JAX arrays</span>
x_jax = jnp.array(x_np)
y_jax = jnp.array(y_np)

print(<span class="hljs-string">f"Matrix shape: <span class="hljs-subst">{x_np.shape}</span>"</span>)
print(<span class="hljs-string">f"Total elements per matrix: <span class="hljs-subst">{size * size:,}</span>"</span>)
print(<span class="hljs-string">f"Operations for multiplication: <span class="hljs-subst">{size ** <span class="hljs-number">3</span>:,}</span>"</span>)
</code></pre>
<p>Output:</p>
<pre><code class="lang-plaintext">Matrix shape: (3000, 3000)
Total elements per matrix: 9,000,000
Operations for multiplication: 27,000,000,000
</code></pre>
<p>That's 27 billion operations. Let's see who can do it fastest.</p>
<h3 id="heading-step-2-benchmark-numpy">Step 2: Benchmark NumPy</h3>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">matmul_numpy</span>(<span class="hljs-params">x, y</span>):</span>
    <span class="hljs-keyword">return</span> np.dot(x, y)

<span class="hljs-comment"># Warmup</span>
_ = matmul_numpy(x_np, y_np)

<span class="hljs-comment"># Timed run</span>
start = time.perf_counter()
result_np = matmul_numpy(x_np, y_np)
numpy_time = time.perf_counter() - start

print(<span class="hljs-string">f"NumPy time: <span class="hljs-subst">{numpy_time:<span class="hljs-number">.4</span>f}</span> seconds"</span>)
</code></pre>
<p>Result:</p>
<pre><code class="lang-plaintext">NumPy time: 0.6294 seconds
</code></pre>
<h3 id="heading-step-3-benchmark-jax-without-jit">Step 3: Benchmark JAX (without JIT)</h3>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">matmul_jax</span>(<span class="hljs-params">x, y</span>):</span>
    <span class="hljs-keyword">return</span> jnp.dot(x, y)

<span class="hljs-comment"># Warmup</span>
_ = matmul_jax(x_jax, y_jax).block_until_ready()

<span class="hljs-comment"># Timed run</span>
start = time.perf_counter()
result_jax = matmul_jax(x_jax, y_jax).block_until_ready()
jax_time = time.perf_counter() - start

print(<span class="hljs-string">f"JAX time (no JIT): <span class="hljs-subst">{jax_time:<span class="hljs-number">.4</span>f}</span> seconds"</span>)
</code></pre>
<p><strong>Important</strong>: We call <code>.block_until_ready()</code> because JAX operations are <strong>asynchronous</strong>. When you call <code>jnp.dot()</code>, JAX immediately returns a "future" and continues executing Python code while the GPU works in the background. Without <code>block_until_ready()</code>, we'd be timing how fast JAX can <em>dispatch</em> the operation, not how fast it actually <em>runs</em>.</p>
<p>Result:</p>
<pre><code class="lang-plaintext">JAX time (no JIT): 0.0094 seconds
</code></pre>
<h3 id="heading-step-4-benchmark-jax-with-jit-compilation">Step 4: Benchmark JAX with JIT Compilation</h3>
<p>Here's where JAX shows its true power. We add a single decorator:</p>
<pre><code class="lang-python"><span class="hljs-meta">@jax.jit</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">matmul_jax_jit</span>(<span class="hljs-params">x, y</span>):</span>
    <span class="hljs-keyword">return</span> jnp.dot(x, y)

<span class="hljs-comment"># First call: JAX traces and compiles the function</span>
<span class="hljs-comment"># This takes a moment, so we don't include it in the benchmark</span>
print(<span class="hljs-string">"Compiling..."</span>)
_ = matmul_jax_jit(x_jax, y_jax).block_until_ready()
print(<span class="hljs-string">"Done."</span>)

<span class="hljs-comment"># Timed run (using the compiled version)</span>
start = time.perf_counter()
result_jit = matmul_jax_jit(x_jax, y_jax).block_until_ready()
jit_time = time.perf_counter() - start

print(<span class="hljs-string">f"JAX time (with JIT): <span class="hljs-subst">{jit_time:<span class="hljs-number">.4</span>f}</span> seconds"</span>)
</code></pre>
<p>The first call to a <code>@jax.jit</code> function is slow because JAX is <strong>tracing</strong> your function, figuring out what operations it contains, and then <strong>compiling</strong> it with XLA. Subsequent calls use the compiled version and are extremely fast.</p>
<h3 id="heading-step-5-compare-results">Step 5: Compare Results</h3>
<pre><code class="lang-python">print(<span class="hljs-string">"RESULTS"</span>)
print(<span class="hljs-string">f"NumPy:          <span class="hljs-subst">{numpy_time:<span class="hljs-number">.4</span>f}</span> seconds"</span>)
print(<span class="hljs-string">f"JAX (no JIT):   <span class="hljs-subst">{jax_time:<span class="hljs-number">.4</span>f}</span> seconds"</span>)
print(<span class="hljs-string">f"JAX (with JIT): <span class="hljs-subst">{jit_time:<span class="hljs-number">.4</span>f}</span> seconds"</span>)
print(<span class="hljs-string">f"\nSpeedup (JIT vs NumPy): <span class="hljs-subst">{numpy_time / jit_time:<span class="hljs-number">.1</span>f}</span>x"</span>)
</code></pre>
<p>Typical results on a Colab GPU:</p>
<pre><code class="lang-plaintext">RESULTS
NumPy:          0.6294 seconds
JAX (no JIT):   0.0094 seconds
JAX (with JIT): 0.0198 seconds

Speedup (JAX vs NumPy):     67.2x
Speedup (JIT vs NumPy):     31.8x
Speedup (JIT vs JAX):       0.5x
</code></pre>
<h2 id="heading-what-just-happened">What Just Happened?</h2>
<p>Let's break down these numbers:</p>
<ol>
<li><p><strong>Hardware acceleration</strong>: JAX moved the computation to the GPU, which has thousands of cores optimized for parallel math.</p>
</li>
<li><p><strong>XLA compilation</strong>: For a single matrix multiplication there is little for XLA to fuse, so the JIT version lands in the same ballpark as raw JAX (here it is even slightly slower once dispatch overhead is counted). Fusion pays off when a function chains many operations into one compiled program.</p>
</li>
<li><p><strong>No Python overhead</strong>: Once compiled, the function runs entirely in native code. Python is only involved in dispatching the call.</p>
</li>
</ol>
<p>The key insight is that <code>@jax.jit</code> doesn't just run your code on a GPU; it fundamentally changes <em>how</em> your code runs.</p>
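<p>To see where JIT pulls ahead, give XLA something to fuse. Here's a quick sketch that chains a few elementwise operations on the same <code>x_jax</code> array from the benchmark (exact numbers vary by hardware):</p>
<pre><code class="lang-python">def chain(x):
    return jnp.sum(jnp.exp(jnp.tanh(x) * 0.5))

chain_jit = jax.jit(chain)
_ = chain_jit(x_jax).block_until_ready()  # warmup: trace + compile once

start = time.perf_counter()
_ = chain(x_jax).block_until_ready()      # each op dispatched separately
t_eager = time.perf_counter() - start

start = time.perf_counter()
_ = chain_jit(x_jax).block_until_ready()  # one fused XLA program
t_jit = time.perf_counter() - start

print(f"no JIT: {t_eager:.4f}s | JIT: {t_jit:.4f}s")
</code></pre>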
<h2 id="heading-the-randomness-trap-bonus-lesson">The Randomness Trap (Bonus Lesson)</h2>
<p>There's one more gotcha that catches everyone early on. Try this:</p>
<pre><code class="lang-python"><span class="hljs-meta">@jax.jit</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">broken_random</span>():</span>
    <span class="hljs-keyword">return</span> np.random.randn(<span class="hljs-number">5</span>)  <span class="hljs-comment"># Using NumPy's random</span>

result1 = broken_random()
result2 = broken_random()
print(<span class="hljs-string">f"First call:  <span class="hljs-subst">{result1}</span>"</span>)
print(<span class="hljs-string">f"Second call: <span class="hljs-subst">{result2}</span>"</span>)
</code></pre>
<p>You'll notice that <code>result1</code> and <code>result2</code> are <strong>identical</strong>. The random numbers got "baked in" during compilation.</p>
<p>JAX requires <strong>explicit random state</strong> management. Here's the correct way:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> jax <span class="hljs-keyword">import</span> random

<span class="hljs-meta">@jax.jit</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">correct_random</span>(<span class="hljs-params">key</span>):</span>
    <span class="hljs-keyword">return</span> random.normal(key, shape=(<span class="hljs-number">5</span>,))

<span class="hljs-comment"># Create a PRNG key</span>
key = random.PRNGKey(<span class="hljs-number">42</span>)

<span class="hljs-comment"># Split the key for each use</span>
key, subkey1 = random.split(key)
result1 = correct_random(subkey1)

key, subkey2 = random.split(key)
result2 = correct_random(subkey2)

print(<span class="hljs-string">f"First call:  <span class="hljs-subst">{result1}</span>"</span>)
print(<span class="hljs-string">f"Second call: <span class="hljs-subst">{result2}</span>"</span>)
</code></pre>
<p>Now you get different random numbers each time. We'll cover this pattern in depth when we build neural networks, but for now, just remember: <strong>never use</strong> <code>np.random</code> inside JIT-compiled functions.</p>
<h2 id="heading-exercises">Exercises</h2>
<p>Before moving on, try these:</p>
<ol>
<li><p><strong>Break the rules</strong>: Try to modify a JAX array in place (<code>x[0] = 1</code>). Read the error message carefully: JAX errors are verbose but informative.</p>
</li>
<li><p><strong>Vary the size</strong>: Run the benchmark with different matrix sizes (1000, 2000, 5000). How does the speedup change?</p>
</li>
<li><p><strong>Chain operations</strong>: Write a function that does multiple operations (<code>jnp.dot</code>, then <code>jnp.sum</code>, then <code>jnp.exp</code>). Compare JIT vs non-JIT. The speedup should be even larger because XLA fuses the operations.</p>
</li>
</ol>
<h2 id="heading-whats-next">What's Next</h2>
<p>We've established <em>why</em> JAX is fast and seen the proof. But speed is only half the story. Next week, we'll explore <strong>transformations</strong>; the features that make JAX genuinely different from NumPy, not just faster.</p>
<p>Specifically, we'll cover:</p>
<ul>
<li><p><code>jax.vmap</code>: Automatic vectorization that eliminates for-loops</p>
</li>
<li><p><code>jax.grad</code>: Automatic differentiation that makes backpropagation trivial</p>
</li>
</ul>
<p>These two functions are why JAX has become the framework of choice for machine learning research. Once you understand them, you'll never look at NumPy the same way again.</p>
<h2 id="heading-quick-reference">Quick Reference</h2>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> jax
<span class="hljs-keyword">import</span> jax.numpy <span class="hljs-keyword">as</span> jnp
<span class="hljs-keyword">from</span> jax <span class="hljs-keyword">import</span> random

<span class="hljs-comment"># Basic array operations (same as NumPy)</span>
x = jnp.array([<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, <span class="hljs-number">3</span>])
y = jnp.zeros((<span class="hljs-number">3</span>, <span class="hljs-number">3</span>))
z = jnp.dot(x, y)

<span class="hljs-comment"># Updating arrays (immutable style)</span>
new_x = x.at[<span class="hljs-number">0</span>].set(<span class="hljs-number">99</span>)
new_x = x.at[<span class="hljs-number">1</span>:].add(<span class="hljs-number">10</span>)

<span class="hljs-comment"># JIT compilation</span>
<span class="hljs-meta">@jax.jit</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fast_function</span>(<span class="hljs-params">x</span>):</span>
    <span class="hljs-keyword">return</span> jnp.dot(x, x.T)

<span class="hljs-comment"># Explicit randomness</span>
key = random.PRNGKey(<span class="hljs-number">0</span>)
key, subkey = random.split(key)
samples = random.normal(subkey, shape=(<span class="hljs-number">100</span>,))

<span class="hljs-comment"># Block for accurate timing</span>
result = fast_function(x).block_until_ready()
</code></pre>
<p><strong>Next week</strong>: <em>Transformations That Change Everything—Automatic Vectorization with vmap and Gradients with grad</em></p>
<h3 id="heading-resources"><strong>Resources:</strong></h3>
<p><a target="_blank" href="https://jax.readthedocs.io">JAX Documentation</a></p>
<p><a target="_blank" href="https://jaxstack.ai">JAX AI Stack</a></p>
<p><a target="_blank" href="https://colab.research.google.com/drive/1f-qg4vlfBHSQSdEdhfgDdRA4k1dPtAqW?usp=sharing">Notebook</a></p>
]]></content:encoded></item><item><title><![CDATA[The reverse turing test: We must now prove we are “dumb” to beat AI]]></title><description><![CDATA[In the dystopian logic of the digital age, a new anxiety has gripped the writing world. From university lecture halls in Makerere to newsrooms in Kampala, humans are facing a pressure that would have seemed laughable just two years ago: the pressure ...]]></description><link>https://kambale.dev/the-reverse-turing-test</link><guid isPermaLink="true">https://kambale.dev/the-reverse-turing-test</guid><category><![CDATA[AI Writer]]></category><category><![CDATA[Ai detector]]></category><category><![CDATA[writing]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Wed, 10 Dec 2025 13:11:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765372066941/1a94be35-ebd3-4f18-9138-1e345f1182ed.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the dystopian logic of the digital age, a new anxiety has gripped the writing world. From university lecture halls in Makerere to newsrooms in Kampala, humans are facing a pressure that would have seemed laughable just two years ago: the pressure to appear less polished, less articulate, or even "dumber", simply to avoid being mistaken for a machine.</p>
<p>It sounds absurd, but the "Reverse Turing Test" is here. Writers are deliberately inserting typos, breaking grammatical rules, and shunning perfectly functional words like "delighted," "landscape," or "delve," replacing them with awkward alternatives. Why? Because the black-box algorithms of AI detection tools have decided that high-proficiency English is evidence of a robot.</p>
<p>We have reached a dangerous inflection point where linguistic competence is treated with suspicion, and human error is fetishized as the only remaining proof of authenticity. But should the degradation of language really be the price of being believed?</p>
<h3 id="heading-the-statistical-mirror-why-ai-sounds-like-us">The statistical mirror: Why AI sounds like us</h3>
<p>To understand this crisis, we must first demystify the adversary. Large Language Models (LLMs) like GPT-4 or Gemini are not sentient poets; they are probabilistic engines. They are trained on the internet’s vast corpus of text: trillions of words written by humans over centuries.</p>
<p>When an AI uses words like "pivotal," "crucial," or "emphasize," it is not because it has a personal preference for corporate speak. It is demonstrating <strong>Zipf’s Law</strong>. This linguistic principle states that in any natural language, a small number of words are used with disproportionately high frequency to maximize efficiency. Humans naturally gravitate toward words that reduce cognitive load while maintaining clarity. LLMs simply mirror this statistical reality.</p>
<p>Therefore, the list of "banned" AI-sounding words currently circulating on social media reads like the standard vocabulary of any Ugandan NGO report, government white paper, or academic thesis from the last twenty years. If AI sounds "academic," it is only because it was trained on our academia. To penalize a writer for using structure and precision is to penalize them for being well-read.</p>
<h3 id="heading-the-bias-against-excellence">The bias against excellence</h3>
<p>The reliance on AI detectors is not just unscientific; it is discriminatory. We are witnessing a collision with <strong>Goodhart’s Law</strong>: "When a measure becomes a target, it ceases to be a good measure." By using "perplexity" (a measure of randomness) to judge humanity, detectors punish clear, logical writing.</p>
<p>This has grave implications for Africa. A 2023 study by researchers at Stanford University revealed a stunning bias: AI detectors flagged over <strong>61% of essays written by non-native English speakers</strong> as AI-generated, compared to nearly zero for native US 8th graders.</p>
<p>For the African student who has spent years mastering the "Queen’s English", learning the formal transitions and structured arguments prized by schools, this is a slap in the face. Writing with the clarity and structure taught in our schools now puts you at risk of being labeled a fraud. The message sent to our students and professionals is chilling: <em>Write badly, or be doubted.</em></p>
<h3 id="heading-the-soul-in-the-machine">The soul in the machine</h3>
<p>However, while we should not dumb down our syntax, we must accept that AI forces us to elevate our substance.</p>
<p>AI can mimic the <em>form</em> of human expression, the rhythm of a sonnet or the structure of a press release, but it lacks the <em>referent</em>. It has no connection to the physical world. It processes symbols, not reality.</p>
<p>This is where the true differentiation lies. An AI can generate a paragraph about the concept of "cultural heritage," but it does not know the specific, heavy silence that falls over a clan meeting when <em>obuntu bulamu</em> is violated. It can describe the ingredients of <em>luwombo</em>, but it cannot understand the politics of the banana plantation or why the preparation of food is a language of love in Buganda.</p>
<p>AI operates on prediction; humans operate on intention.</p>
<p><strong>Contextual wisdom:</strong> A model knows that traffic jams are bad. It does not know the specific, communal frustration of a Friday evening gridlock on Jinja road, nor the humor exchanged between strangers in a taxi.</p>
<p><strong>Emotional weight:</strong> AI can string together words about grief, but it cannot choose a metaphor that breaks the heart because it has never had a heart to break. It cannot anchor a sentence in the lived experience of paying school fees in January.</p>
<h3 id="heading-the-path-forward">The path forward</h3>
<p>We must reject the impulse to perform incompetence. Trying to "beat the detector" by inserting errors is a losing battle; the models will eventually learn those tricks too.</p>
<p>Instead, the rise of AI should force a renaissance of <strong>voice</strong>. The era of generic, filler-heavy writing is indeed over, not because it is "AI," but because AI can do it faster. The human writer must now bring something the machine cannot: original insight grounded in lived reality.</p>
<p>We must lean into our idiosyncrasies, our cultural nuances, our irony, and our specific Ugandan perspectives. We must tell stories that rely on the messy, unpredictable texture of real life.</p>
<p>The real test of humanity is not whether we can write "less like a robot." It is whether we can think, feel, and observe the world deeply enough to say something that no statistical model could ever predict. On that front, we still hold the advantage.</p>
]]></content:encoded></item><item><title><![CDATA[Google Colab in VS Code: A Deep Dive into the New Extension]]></title><description><![CDATA[For years, a subtle but significant divide has existed in the workflow of millions of developers, data scientists, and AI researchers. On one side stood Visual Studio Code, the fast, lightweight, and endlessly customizable code editor beloved by the ...]]></description><link>https://kambale.dev/google-colab-in-vs-code-a-deep-dive-into-the-new-extension</link><guid isPermaLink="true">https://kambale.dev/google-colab-in-vs-code-a-deep-dive-into-the-new-extension</guid><category><![CDATA[colab]]></category><category><![CDATA[vscode extensions]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Sun, 16 Nov 2025 20:33:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763323817437/4c5383bd-d4e7-4b33-aac4-cb1937f1d6d6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For years, a subtle but significant divide has existed in the workflow of millions of developers, data scientists, and AI researchers. On one side stood Visual Studio Code, the fast, lightweight, and endlessly customizable code editor beloved by the global developer community. On the other was Google Colab, the go-to platform for seamless access to powerful compute resources like GPUs and TPUs, simplifying the process of writing, executing, and collaborating on code. The workflow often involved a cumbersome dance between a customized local VS Code environment for project development and a separate, web-based Colab interface for training and inference.</p>
<p>Responding to years of passionate community requests manifested in blog posts, forum threads, and creative GitHub workarounds, Google has officially bridged this gap. Today, we are thrilled to explore the new <strong>Google Colab extension for Visual Studio Code</strong>, a tool that promises the best of both worlds. This article provides a detailed, technical tutorial on how to install, configure, and harness the power of this game-changing extension, transforming your local VS Code into a control room for Google’s heavy-lifting cloud infrastructure.</p>
<h4 id="heading-the-best-of-both-worlds-unifying-local-ide-and-cloud-compute">The Best of Both Worlds: Unifying Local IDE and Cloud Compute</h4>
<p>The core value of the Colab extension is its ability to meet developers where they are. It acknowledges that while Colab’s simplicity is a major strength, many users crave the advanced features of a full-fledged IDE for larger projects and complex workflows.</p>
<ul>
<li><p><strong>For VS Code Users:</strong> The primary advantage is the ability to connect local <code>.ipynb</code> notebooks to high-powered Colab runtimes. This means you can continue using your familiar, highly customized editor while seamlessly accessing premium GPUs and TPUs, including those available through Colab Pro subscriptions, without leaving your local environment.</p>
</li>
<li><p><strong>For Colab Users:</strong> This integration supports the common practice of working on notebooks that are part of a larger project or Git repository. It empowers users who need more robust IDE features—such as superior code completion, version control, and advanced debugging—by pairing the simplicity of Colab's provisioned runtimes with the prolific VS Code editor.</p>
</li>
</ul>
<p>Essentially, this move bridges the gap between code productivity and cloud compute scalability, eliminating the need to switch tabs, export notebooks, or manage credentials across different platforms.</p>
<h3 id="heading-getting-started-a-step-by-step-guide">Getting Started: A Step-by-Step Guide</h3>
<p>You can get up and running with the Colab extension in just a few clicks. The setup is designed to be intuitive and fast.</p>
<h4 id="heading-step-1-install-the-colab-extension">Step 1: Install the Colab Extension</h4>
<p>First, you need to add the extension to your VS Code installation.</p>
<ol>
<li><p>Open the <strong>Extensions</strong> view from the Activity Bar on the left side of your VS Code window (or press <code>Ctrl+Shift+X</code>).</p>
</li>
<li><p>In the marketplace search bar, type <code>Google Colab</code>.</p>
</li>
<li><p>Click <strong>Install</strong> on the official extension published by Google.</p>
</li>
<li><p>If you do not already have it, the installer will prompt you to install its required dependency, the official <strong>Jupyter</strong> extension.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763320984138/90506969-e1ac-44ee-bf66-ebdf803e45cc.png" alt class="image--center mx-auto" /></p>
<h4 id="heading-step-2-connect-to-a-colab-runtime">Step 2: Connect to a Colab Runtime</h4>
<p>Once installed, you can connect any local notebook to a Colab runtime.</p>
<ol>
<li><p>Create a new notebook (<code>.ipynb</code> file) or open an existing one in your local VS Code workspace.</p>
</li>
<li><p>To select the execution environment, you can either run a cell, which will prompt you to choose a kernel, or click the <strong>Select Kernel</strong> button in the top-right corner of the notebook interface.</p>
</li>
<li><p>From the dropdown menu, choose <strong>Select Another Kernel...</strong></p>
</li>
<li><p>Click on the <strong>Colab</strong> option. You will be prompted to sign in with your Google account.</p>
</li>
<li><p>After signing in, you can choose to create a <strong>New Colab Server</strong> or connect to an existing one you may have running. For your first time, you will create a new one.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763321266631/3267217f-9ab9-436d-b584-8639a329fb9a.png" alt class="image--center mx-auto" /></p>
<p>Your local notebook is now powered by a Google Colab runtime! You can give your Colab server a name so that it can be easily referenced and reused in the future.</p>
<h4 id="heading-step-3-select-your-compute-resources">Step 3: Select Your Compute Resources</h4>
<p>The true power of this extension lies in accessing specialized hardware. The available accelerator options and memory limits are determined by your Google Colab subscription plan.</p>
<ul>
<li><p><strong>Free Tier:</strong> Users have access to NVIDIA T4 GPUs and TPU v5e accelerators.</p>
</li>
<li><p><strong>Colab Pro Tier:</strong> Subscribers gain access to more powerful hardware, such as premium GPUs like the NVIDIA A100.</p>
</li>
</ul>
<p>After connecting to the Colab kernel, you can select your desired hardware accelerator for the session.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763321472629/c39d6bf8-ba6b-421a-8798-277b0555f5d9.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-practical-examples-beyond-the-basics">Practical Examples — Beyond the Basics</h3>
<p>To truly appreciate this new workflow, let's move beyond simple demonstrations. Here are two original, real-world scenarios that are impractical on a standard local machine but become trivial with the Colab extension.</p>
<h4 id="heading-example-1-gpu-accelerated-big-data-analysis-with-rapids-cudf">Example 1: GPU-Accelerated Big Data Analysis with RAPIDS cuDF</h4>
<p><strong>The Challenge:</strong> You need to analyze a large CSV file (several gigabytes) containing millions of records. Using a standard library like Pandas on a CPU can be painfully slow, with simple grouping and aggregation operations taking minutes to complete.</p>
<p><strong>The Solution:</strong> We'll use <strong>RAPIDS cuDF</strong>, a GPU-accelerated DataFrame library with a Pandas-like API. By running this in VS Code connected to a Colab GPU, we can perform the analysis in seconds. RAPIDS cuDF is now pre-installed in Colab GPU runtimes, making this seamless.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763322318242/2e2070c3-2e92-4d82-ba9c-4971680afac7.png" alt class="image--center mx-auto" /></p>
<p><strong>The Result:</strong></p>
<p>The complex aggregation on <strong>3,066,766 rows</strong> of data completes in just <strong>0.3070 seconds</strong>. This incredible speed, demonstrated in the output below, transforms what would be a coffee-break task on a CPU into an interactive, real-time query. This showcases a real-world data engineering task made efficient and seamless, all within the comfort of VS Code.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763322464430/d77ec1f4-6d34-467d-ba8e-8645530cecb7.png" alt class="image--center mx-auto" /></p>
<p><strong>The Code:</strong></p>
<p><em>You can find a copy of the corresponding Colab workbook for this example</em> <a target="_blank" href="https://github.com/wkambale/vscode-colab-extension-tutorial/blob/main/01-gpu-data-analysis-rapids.ipynb"><em>here</em></a><em>.</em></p>
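<p>For readers skimming past the screenshots, the heart of the example looks roughly like this; a sketch with hypothetical file and column names (the linked notebook has the real data and cells):</p>
<pre><code class="lang-python">import cudf  # pre-installed on Colab GPU runtimes

# Hypothetical file and column names; the linked notebook uses real trip data.
df = cudf.read_csv("trips.csv")
summary = (
    df.groupby("passenger_count")
      .agg({"fare_amount": "mean", "trip_distance": "max"})
)
print(summary)
</code></pre>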
<h4 id="heading-example-2-creative-ai-generating-art-with-stable-diffusion">Example 2: Creative AI — Generating Art with Stable Diffusion</h4>
<p><strong>The Challenge:</strong> Text-to-image models like Stable Diffusion are computationally expensive and require significant GPU VRAM, making them inaccessible to users without high-end local hardware.</p>
<p><strong>The Solution:</strong> We'll use the Hugging Face <code>diffusers</code> library to run a Stable Diffusion pipeline on our Colab GPU kernel. This allows us to generate high-quality images from text prompts directly inside a VS Code notebook.</p>
<p><strong>Install required libraries</strong> We need <code>diffusers</code>, <code>transformers</code>, and <code>accelerate</code> for this task.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763322920056/5e9660b2-f0a5-4f25-9fee-882a81b051d4.png" alt class="image--center mx-auto" /></p>
<p><strong>Set up the Stable Diffusion Pipeline</strong> This code downloads the pre-trained model weights and prepares the pipeline for inference on the GPU.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763323239647/9949e611-24ff-44c3-b07d-9430edcee16d.png" alt class="image--center mx-auto" /></p>
<p><strong>Generate an Image</strong> Define your creative prompt and let the model generate an image.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763323355765/f57b7af2-f8f2-42b1-b201-283c1774ac36.png" alt class="image--center mx-auto" /></p>
<p><strong>The Result:</strong> Within a minute, a high-resolution, AI-generated image appears directly in your VS Code notebook output. This showcases how the extension democratizes access to powerful generative AI models, enabling creative experimentation without the need for a dedicated local GPU.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763323425396/ca4ee33c-ef6f-445b-b091-ace783bd689f.png" alt class="image--center mx-auto" /></p>
<p><strong>The Code:</strong></p>
<p><em>You can find the corresponding Colab notebook for this example</em> <a target="_blank" href="https://github.com/wkambale/vscode-colab-extension-tutorial/blob/main/02-creative-ai-stable-diffusion.ipynb"><em>here</em></a><em>.</em></p>
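<p>In code, the pipeline from the screenshots looks roughly like this; a sketch assuming the <code>runwayml/stable-diffusion-v1-5</code> checkpoint and an example prompt (the linked notebook has the exact cells):</p>
<pre><code class="lang-python">import torch
from diffusers import StableDiffusionPipeline

# Download the pre-trained weights and move the pipeline to the Colab GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # halves VRAM usage, fits comfortably on a T4
).to("cuda")

prompt = "A sunrise over the Rwenzori mountains, oil painting"  # example prompt
image = pipe(prompt).images[0]
image.save("sunrise.png")
</code></pre>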
<h3 id="heading-advanced-tips-and-current-limitations">Advanced Tips and Current Limitations</h3>
<p>To get the most out of the extension, keep these points in mind:</p>
<ul>
<li><p><strong>File Management:</strong> You are working on a remote Colab file system. Use commands like <code>!ls -l</code> or VS Code's built-in file explorer to see files generated in your runtime. To persist your work, consider mounting your Google Drive:</p>
<pre><code class="lang-python">  <span class="hljs-keyword">from</span> google.colab <span class="hljs-keyword">import</span> drive
  drive.mount(<span class="hljs-string">'/content/drive'</span>)
</code></pre>
</li>
<li><p><strong>Secrets Management:</strong> The web UI's native secrets manager is not yet available. For securely handling API keys, use the file-upload workaround: upload a <code>.env</code> file and load it at runtime (see the sketch after this list).</p>
</li>
<li><p><strong>Session Lifetime:</strong> Remember that Colab runtimes are ephemeral. They will disconnect after a period of inactivity (typically 90 minutes for free-tier users) or if you exceed the maximum session duration (12 hours). Save your work frequently.</p>
</li>
</ul>
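<p>Here's a minimal sketch of that workaround, assuming the <code>.env</code> file was uploaded to <code>/content</code> and using a placeholder key name:</p>
<pre><code class="lang-python"># Hypothetical sketch: load secrets from an uploaded .env file.
# !pip install -q python-dotenv   (if not already available in the runtime)
from dotenv import load_dotenv
import os

load_dotenv("/content/.env")            # path assumes you uploaded it here
api_key = os.environ.get("MY_API_KEY")  # MY_API_KEY is a placeholder name
</code></pre>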
<h3 id="heading-the-bigger-picture-whats-next">The Bigger Picture: What's Next?</h3>
<p>As a newly released tool, the Colab extension is still in its early stages, and some limitations exist. As noted, certain web-UI-specific functions like the secrets manager are not yet implemented. However, Google has positioned this release as a "launchpad," signaling a commitment to bringing even more of Colab's functionality to developers everywhere.</p>
<p>This launch also places Google in a fascinating strategic position, turning VS Code into a key battleground for AI developer mindshare. By bringing its powerful code <em>execution</em> engine into the same interface where tools like GitHub Copilot excel at code <em>generation</em>, Google is challenging the AI-assisted developer landscape.</p>
<p>For developers, this rising competition is a net positive. It promises a future where the lines between code generation and execution blur, and where powerful, integrated, and accessible AI tools become a fundamental component of the editor itself.</p>
<h4 id="heading-conclusion">Conclusion</h4>
<p>The new Google Colab extension for VS Code is more than just a convenience; it's a transformative tool that unifies the best of local development with the power of cloud computing. It empowers developers and ML engineers to harness free GPUs and TPUs directly within their preferred editor, streamlining workflows and accelerating innovation. While still in its early stages, the extension represents a significant step forward in making AI and machine learning development more accessible and productive. The future looks bright, and it runs on a seamless connection between your local machine and the cloud.</p>
]]></content:encoded></item><item><title><![CDATA[Build Your First AI Agent with Gemini and LlamaIndex]]></title><description><![CDATA[The world of LLMs is moving beyond simple chatbots. The new frontier is AI agents: systems that can reason, plan, and use external tools to accomplish complex tasks. In tourism, this means our assistant can automatically fetch up-to-date info on dest...]]></description><link>https://kambale.dev/build-your-first-ai-agent-with-gemini-and-llamaindex</link><guid isPermaLink="true">https://kambale.dev/build-your-first-ai-agent-with-gemini-and-llamaindex</guid><category><![CDATA[gemini]]></category><category><![CDATA[ai agents]]></category><category><![CDATA[tourism]]></category><category><![CDATA[LlamaIndex]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Sun, 14 Sep 2025 22:27:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757888693269/ee93cfd2-3554-4dbf-baf2-4d7f2867425f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The world of LLMs is moving beyond simple chatbots. The new frontier is <strong>AI agents</strong>: systems that can reason, plan, and use external tools to accomplish complex tasks. In tourism, this means our assistant can automatically fetch up-to-date info on destinations, guide users to attractions or restaurants, find relevant images, and even draft promotional posts. Under the hood, we use Google’s advanced Gemini model as the “brain” and LlamaIndex to connect Gemini to custom Python tools. The agent follows the <strong>ReAct (Reason+Act)</strong> framework: it thinks about the query, chooses tools, acts (calls them), then reasons over the results. The result is a flexible travel assistant that can handle multi-step tasks in a human-like way.</p>
<p>Our agent will demonstrate how to:</p>
<ul>
<li><p><strong>Combine LLMs and Tools:</strong> Overcome LLM limitations by letting them call Python functions (tools) for fresh data.</p>
</li>
<li><p><strong>Build Custom Tools:</strong> Write scrapers and generators (e.g. for attractions and images) that the agent can invoke.</p>
</li>
<li><p><strong>Use Gemini as the LLM:</strong> Leverage Google’s multimodal Gemini 2.5 Pro model (Google’s most advanced model) for reasoning.</p>
</li>
<li><p><strong>Leverage LlamaIndex:</strong> Use LlamaIndex’s <code>ReActAgent</code> and <code>FunctionTool</code> wrappers to glue tools and the LLM together.</p>
</li>
<li><p><strong>See the ReAct Loop in Action:</strong> Peek at the agent’s chain of thought as it solves a query by sequentially “thinking” and “acting”.</p>
</li>
<li><p><strong>Advanced Prompting:</strong> Go beyond raw data by feeding tool outputs into a carefully crafted prompt for insightful recommendations.</p>
</li>
</ul>
<p>By the end, you’ll have a working Tourism AI Assistant that answers travel questions, scrapes live data, finds images, and even drafts tweets about destinations—all powered by Gemini and LlamaIndex.</p>
<h2 id="heading-our-technology-stack">Our Technology Stack</h2>
<p>Before coding, let’s understand the pieces we’ll use:</p>
<ul>
<li><p><strong>Google Gemini 2.5 Pro:</strong> A state-of-the-art multimodal LLM (text+code+images+video) with strong reasoning capabilities. This is the “brain” of our agent. We use the <code>models/gemini-2.5-pro</code> endpoint via Google’s Generative AI SDK.</p>
</li>
<li><p><strong>LlamaIndex:</strong> An open-source framework for building LLM apps. It provides the <code>ReActAgent</code> class and <code>FunctionTool</code> wrapper to connect LLMs with Python tools. LlamaIndex lets us expose any function to the agent in a structured way.</p>
</li>
<li><p><strong>Web Scraping Libraries:</strong> We’ll use <code>requests</code> and <code>beautifulsoup4</code> to fetch live data from travel sites (for attractions, etc.). (In production, a tourism API is often preferred to avoid brittle scrapers, but for our tutorial we’ll show how to scrape responsibly.)</p>
</li>
<li><p><strong>Image Search:</strong> We’ll implement a simple DuckDuckGo image scraper to let the agent fetch pictures of destinations.</p>
</li>
<li><p><strong>Content Generator:</strong> A function to draft a promotional social-media post about a place or event, illustrating how the agent can take action, not just fetch data.</p>
</li>
</ul>
<h2 id="heading-installing-dependencies">Installing Dependencies</h2>
<p>First, install the required Python packages. In your notebook or script, run:</p>
<pre><code class="lang-bash">!pip install -q llama-index llama-index-llms-gemini google-generativeai python-dotenv beautifulsoup4 requests
</code></pre>
<ul>
<li><p><strong>llama-index-llms-gemini:</strong> Adds Gemini support to LlamaIndex.</p>
</li>
<li><p><strong>google-generativeai:</strong> Google’s SDK to call the Gemini API.</p>
</li>
<li><p><strong>python-dotenv:</strong> For loading API keys from a <code>.env</code> file (or Colab secrets).</p>
</li>
<li><p><strong>beautifulsoup4 &amp; requests:</strong> For our web-scraping tools.</p>
</li>
</ul>
<p>We’ll also enable detailed logging to trace the agent’s reasoning:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os, logging
logging.basicConfig(level=logging.INFO, format=<span class="hljs-string">'%(asctime)s - %(levelname)s - %(message)s'</span>)
logger = logging.getLogger(__name__)
</code></pre>
<h2 id="heading-authentication">Authentication</h2>
<p>To use Gemini, you need a Google AI Studio API key:</p>
<ol>
<li><p>Create an API key in Google AI Studio.</p>
</li>
<li><p>Store it as <code>GOOGLE_API_KEY</code> in your environment (or Colab secrets).</p>
</li>
</ol>
<p>Then configure the Google Generative AI SDK:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> google.generativeai <span class="hljs-keyword">as</span> genai
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

load_dotenv()  <span class="hljs-comment"># if using .env file</span>
GOOGLE_API_KEY = os.getenv(<span class="hljs-string">"GOOGLE_API_KEY"</span>)
<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> GOOGLE_API_KEY:
    <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Please set GOOGLE_API_KEY environment variable."</span>)
genai.configure(api_key=GOOGLE_API_KEY)
logger.info(<span class="hljs-string">"Configured Google API key."</span>)
</code></pre>
<h2 id="heading-configuring-the-llm-and-embedding-model">Configuring the LLM and Embedding Model</h2>
<p>We tell LlamaIndex which LLM (Gemini) and embedding model to use. The embedding model isn’t crucial here (we won’t do retrieval-heavy tasks), but LlamaIndex requires one. We’ll use a small HuggingFace model:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> llama_index.llms.gemini <span class="hljs-keyword">import</span> Gemini
<span class="hljs-keyword">from</span> llama_index.core <span class="hljs-keyword">import</span> Settings
<span class="hljs-keyword">from</span> llama_index.embeddings.huggingface <span class="hljs-keyword">import</span> HuggingFaceEmbedding

MODEL_NAME = <span class="hljs-string">"models/gemini-2.5-pro"</span>  <span class="hljs-comment"># Google’s Gemini 2.5 Pro model</span>

logger.info(<span class="hljs-string">"Configuring LlamaIndex settings..."</span>)
Settings.embed_model = HuggingFaceEmbedding(model_name=<span class="hljs-string">"BAAI/bge-small-en-v1.5"</span>)
Settings.llm = Gemini(model=MODEL_NAME)  <span class="hljs-comment"># uses GOOGLE_API_KEY automatically</span>
logger.info(<span class="hljs-string">f"Using Gemini LLM: <span class="hljs-subst">{MODEL_NAME}</span>"</span>)
</code></pre>
<p>Note: Gemini 2.5 Pro is a powerful model that excels at complex tasks. It supports text, code, and images, and even has built-in “thinking” ability for chain-of-thought reasoning.</p>
<h2 id="heading-crafting-our-tools">Crafting Our Tools</h2>
<p>An agent is only as capable as its tools. A <strong>tool</strong> here is a Python function that performs some action—like fetching data or creating content—that the LLM alone cannot do (e.g. real-time web queries). We’ll create a small toolkit for tourism tasks:</p>
<ul>
<li><p><strong>Tool 1:</strong> <code>get_city_attractions(city)</code> – Scrape top attractions for a city. (As an example, we’ll scrape a known travel guide or Wikipedia for a few highlights.)</p>
</li>
<li><p><strong>Tool 2:</strong> <code>get_city_restaurants(city)</code> – (Optional) Scrape or list popular restaurants. For brevity, this could return a static list or use a simple scrape.</p>
</li>
<li><p><strong>Tool 3:</strong> <code>search_for_destination_images(place)</code> – Search DuckDuckGo for images of a place (like “Eiffel Tower Paris”). Returns a few image URLs.</p>
</li>
<li><p><strong>Tool 4:</strong> <code>generate_tourism_post(place, summary)</code> – Generate a friendly promotional social-media post (tweet) about an attraction or city, given its name and a short summary.</p>
</li>
</ul>
<p>Each tool will have a clear docstring explaining its purpose and inputs. The agent reads these docstrings to know when to use which tool. For example:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> bs4 <span class="hljs-keyword">import</span> BeautifulSoup

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_city_attractions</span>(<span class="hljs-params">city: str</span>) -&gt; str:</span>
    <span class="hljs-string">"""
    Fetches the top tourist attractions for the given city.
    Returns a bullet list of attractions and brief info.
    """</span>
    <span class="hljs-keyword">try</span>:
        <span class="hljs-comment"># Example: scrape PlanetWare or Wikipedia page for top attractions</span>
        url = <span class="hljs-string">f"https://www.planetware.com/<span class="hljs-subst">{city.lower()}</span>/top-rated-tourist-attractions-in-<span class="hljs-subst">{city.lower()}</span>.htm"</span>
        resp = requests.get(url, timeout=10)  # fail fast instead of hanging the agent
        soup = BeautifulSoup(resp.text, <span class="hljs-string">"html.parser"</span>)
        attractions = [h2.get_text(strip=<span class="hljs-literal">True</span>) <span class="hljs-keyword">for</span> h2 <span class="hljs-keyword">in</span> soup.find_all(<span class="hljs-string">'h2'</span>)[:<span class="hljs-number">5</span>]]
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> attractions:
            <span class="hljs-keyword">raise</span> Exception(<span class="hljs-string">"No attractions found"</span>)
        <span class="hljs-keyword">return</span> <span class="hljs-string">"\n"</span>.join(<span class="hljs-string">f"- <span class="hljs-subst">{a}</span>"</span> <span class="hljs-keyword">for</span> a <span class="hljs-keyword">in</span> attractions)
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"Could not retrieve attractions for <span class="hljs-subst">{city}</span> (error: <span class="hljs-subst">{e}</span>)"</span>
</code></pre>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_city_restaurants</span>(<span class="hljs-params">city: str</span>) -&gt; str:</span>
    <span class="hljs-string">"""
    Returns a short list of popular restaurants in the city.
    (For demo purposes, this may be a static list or scraped from a simple source.)
    """</span>
    <span class="hljs-comment"># Placeholder: in a real app, use an API or reliable source</span>
    dummy_data = {
        <span class="hljs-string">"Paris"</span>: [<span class="hljs-string">"Le Jules Verne (Eiffel Tower)"</span>, <span class="hljs-string">"L'Ambroisie"</span>, <span class="hljs-string">"Septime"</span>],
        <span class="hljs-string">"Rome"</span>: [<span class="hljs-string">"Da Enzo al 29"</span>, <span class="hljs-string">"Roscioli"</span>, <span class="hljs-string">"La Pergola"</span>]
    }
    <span class="hljs-keyword">return</span> <span class="hljs-string">"\n"</span>.join(<span class="hljs-string">f"- <span class="hljs-subst">{r}</span>"</span> <span class="hljs-keyword">for</span> r <span class="hljs-keyword">in</span> dummy_data.get(city, [<span class="hljs-string">"No data available"</span>]))
</code></pre>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">search_for_destination_images</span>(<span class="hljs-params">query: str</span>) -&gt; list:</span>
    <span class="hljs-string">"""
    Searches DuckDuckGo Images for the query and returns a list of image URLs.
    """</span>
    <span class="hljs-keyword">try</span>:
        res = requests.get(<span class="hljs-string">f"https://duckduckgo.com/?q=<span class="hljs-subst">{query.replace(<span class="hljs-string">' '</span>, <span class="hljs-string">'+'</span>)}</span>&amp;iar=images&amp;iax=images"</span>)
        soup = BeautifulSoup(res.text, <span class="hljs-string">"html.parser"</span>)
        imgs = soup.select(<span class="hljs-string">"img.tile--img__img"</span>)[:<span class="hljs-number">5</span>]
        <span class="hljs-keyword">return</span> [img.get(<span class="hljs-string">"src"</span>) <span class="hljs-keyword">for</span> img <span class="hljs-keyword">in</span> imgs <span class="hljs-keyword">if</span> img.get(<span class="hljs-string">"src"</span>)]
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        <span class="hljs-keyword">return</span> [<span class="hljs-string">f"Image search failed: <span class="hljs-subst">{e}</span>"</span>]
</code></pre>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_tourism_post</span>(<span class="hljs-params">place: str, summary: str</span>) -&gt; str:</span>
    <span class="hljs-string">"""
    Generates a friendly social media post about a place.
    """</span>
    prompt = f"Write an enthusiastic tweet about visiting {place}. Summary: {summary}"
    # The google-generativeai SDK generates text via
    # GenerativeModel.generate_content() (there is no generate_message()).
    resp = genai.GenerativeModel(MODEL_NAME).generate_content(prompt)
    return resp.text
</code></pre>
<p><em>Note:</em> In production, scraping arbitrary sites (like PlanetWare) can be fragile. Sites may block scrapers, change layout, or have legal terms against scraping. For robust applications, a dedicated travel data API is preferable. Here we use simple scrapers for illustration.</p>
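<p>As a small sketch of that advice, the request logic can be made more defensive with an explicit timeout, a custom User-Agent, and status checking (the header value here is illustrative, not from the original tutorial):</p>
<pre><code class="lang-python">import requests

session = requests.Session()
session.headers.update({"User-Agent": "tourism-agent-demo/0.1"})  # identify the scraper politely

def fetch_page(url: str) -&gt; str:
    """Fetch a page, failing fast instead of hanging the agent on a slow site."""
    resp = session.get(url, timeout=10)
    resp.raise_for_status()  # surface HTTP errors (403, 404, ...) immediately
    return resp.text
</code></pre>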
<h2 id="heading-wrapping-functions-for-llamaindex">Wrapping Functions for LlamaIndex</h2>
<p>LlamaIndex needs tools wrapped in <code>FunctionTool</code> objects. This exposes each function’s signature and docstring to the LLM. Docstrings become part of the agent’s understanding of when to use each tool. For example:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> llama_index.core.tools <span class="hljs-keyword">import</span> FunctionTool

logger.info(<span class="hljs-string">"Wrapping functions into LlamaIndex tools..."</span>)
tools = [
    FunctionTool.from_defaults(fn=get_city_attractions,
                               name=<span class="hljs-string">"get_city_attractions"</span>,
                               description=<span class="hljs-string">"Get the top tourist attractions in a city."</span>),
    FunctionTool.from_defaults(fn=get_city_restaurants,
                               name=<span class="hljs-string">"get_city_restaurants"</span>,
                               description=<span class="hljs-string">"Get popular restaurants in a city."</span>),
    FunctionTool.from_defaults(fn=search_for_destination_images,
                               name=<span class="hljs-string">"search_for_destination_images"</span>,
                               description=<span class="hljs-string">"Search for images of a place."</span>),
    FunctionTool.from_defaults(fn=generate_tourism_post,
                               name=<span class="hljs-string">"generate_tourism_post"</span>,
                               description=<span class="hljs-string">"Create a social media post promoting an attraction or city."</span>)
]
logger.info(<span class="hljs-string">f"Created <span class="hljs-subst">{len(tools)}</span> tools: <span class="hljs-subst">{[t.name <span class="hljs-keyword">for</span> t <span class="hljs-keyword">in</span> tools]}</span>"</span>)
</code></pre>
<p>Each <code>FunctionTool</code> includes the function name, parameters, and a human-readable docstring. The agent will <strong>choose which tool to call</strong> by looking at your query and the tool descriptions, then pass the right arguments.</p>
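<p>Before handing everything to the agent, it’s worth sanity-checking a tool directly. As a quick sketch (assuming the <code>tools</code> list above), <code>FunctionTool</code> exposes a <code>call</code> method that runs the wrapped function and returns a <code>ToolOutput</code>:</p>
<pre><code class="lang-python"># Invoke the first tool (get_city_attractions) outside the agent loop.
result = tools[0].call(city="Paris")
print(result.content)  # the bullet list (or error message) from the scraper
</code></pre>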
<h2 id="heading-initializing-the-react-agent">Initializing the ReAct Agent</h2>
<p>With tools ready, we create the agent. We use LlamaIndex’s <code>ReActAgent</code>, which implements the Reason-and-Act loop. The agent will <strong>think</strong> about the user’s question, decide <strong>which tool</strong> to use and with what inputs, <strong>execute</strong> the tool, then observe the result and repeat as needed. Setting <code>verbose=True</code> lets us see the entire chain of thought:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> llama_index.core.agent <span class="hljs-keyword">import</span> ReActAgent

logger.info(<span class="hljs-string">"Initializing the ReAct agent..."</span>)
agent = ReActAgent.from_tools(tools=tools, llm=Settings.llm, verbose=<span class="hljs-literal">True</span>)
logger.info(<span class="hljs-string">"Tourism Agent is ready to go!"</span>)
</code></pre>
<p>Under the hood, <code>ReActAgent</code> will prompt Gemini with a system message describing the tools and this reasoning framework. The LLM will follow a structure like:</p>
<pre><code class="lang-bash">Thought: I should use [tool] because ...
Action: [tool] with input {...}
Observation: (tool output)
Thought: ... <span class="hljs-built_in">continue</span> or final answer.
</code></pre>
<p>This explicit reasoning flow is characteristic of the ReAct approach.</p>
<h2 id="heading-trying-out-the-tourism-agent">Trying Out the Tourism Agent</h2>
<p>Our agent is now online! Let’s test it with a couple of example queries and inspect its reasoning (verbose output):</p>
<h3 id="heading-scenario-1-simple-question">Scenario 1: Simple Question</h3>
<p><strong>User:</strong> “What are the top attractions in Paris?”</p>
<pre><code class="lang-plaintext">response = agent.chat("What are the top tourist attractions in Paris?")
print(response)
</code></pre>
<p><em>Agent’s (verbose) Thought Process:</em></p>
<ul>
<li><p><strong>Thought:</strong> The user asks for top attractions in Paris. The <code>get_city_attractions</code> tool seems appropriate.</p>
</li>
<li><p><strong>Action:</strong> get_city_attractions</p>
</li>
<li><p><strong>Action Input:</strong> <code>{"city": "Paris"}</code></p>
</li>
<li><p><strong>Observation:</strong></p>
<pre><code class="lang-bash">  - Eiffel Tower
  - Louvre Museum
  - Notre-Dame Cathedral
  - Sacré-Cœur Basilica
  - Musée d<span class="hljs-string">'Orsay</span>
</code></pre>
</li>
<li><p><strong>Thought:</strong> I have the list of attractions. Now I answer the user using this info.</p>
</li>
<li><p><strong>Answer:</strong> <em>“Top attractions in Paris include the Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, Sacré-Cœur, and Musée d’Orsay.”</em></p>
</li>
</ul>
<p><strong>Agent Answer:</strong> <em>Top attractions in Paris include: Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, Sacré-Cœur Basilica, and Musée d’Orsay.</em></p>
<p>This shows the ReAct loop: the agent identified the need for a tool, executed it, and synthesized the output.</p>
<h3 id="heading-scenario-2-multi-step-task">Scenario 2: Multi-Step Task</h3>
<p><strong>User:</strong> “Find art-related attractions in Paris and then write a tweet about them.”</p>
<p>This requires multiple steps: identify art attractions, then generate a tweet.</p>
<pre><code class="lang-python">response = agent.chat(<span class="hljs-string">"Find art-related attractions in Paris and draft a tweet about them."</span>)
print(response)
</code></pre>
<p><em>Agent’s Thought Process:</em></p>
<ul>
<li><p><strong>Thought:</strong> The user needs two things: 1) art-related attractions in Paris, 2) a tweet about them. First, get all attractions.</p>
</li>
<li><p><strong>Action:</strong> get_city_attractions</p>
</li>
<li><p><strong>Input:</strong> <code>{"city": "Paris"}</code></p>
</li>
<li><p><strong>Observation:</strong> (same list as before)</p>
</li>
<li><p><strong>Thought:</strong> The attractions list is obtained. Now pick those related to art (e.g. museums, galleries). These include Louvre Museum and Musée d'Orsay. I will create a summary and use the <code>generate_tourism_post</code> tool.</p>
</li>
<li><p><strong>Action:</strong> generate_tourism_post</p>
</li>
<li><p><strong>Input:</strong> <code>{"place": "Paris Museums", "summary": "Highlights: Louvre Museum and Musée d'Orsay"}</code></p>
</li>
<li><p><strong>Observation:</strong></p>
<pre><code class="lang-bash">  Explore Paris<span class="hljs-string">'s Artistic Treasures!
  Don’t miss the Louvre Museum’s timeless classics and the Musée d’Orsay’s Impressionist masterpieces. From ancient wonders to modern art, Paris has it all. Bon voyage and happy exploring!</span>
</code></pre>
</li>
<li><p><strong>Thought:</strong> I have the tweet text. Now I can share the final answer.</p>
</li>
<li><p><strong>Answer:</strong> <em>Here’s a tweet promoting Paris’s art attractions:… (tweet above)…</em></p>
</li>
</ul>
<p><strong>Agent Answer:</strong> <em>“Here’s a tweet about Paris’s art attractions: Explore Paris’s Artistic Treasures! Don’t miss the Louvre Museum’s timeless classics and the Musée d’Orsay’s Impressionist masterpieces. From ancient wonders to modern art, Paris has it all. Bon voyage and happy exploring!”</em></p>
<p>This demonstrates chaining: the agent used <code>get_city_attractions</code>, filtered the results, then used <code>generate_tourism_post</code> with the summary. The LLM reasoned at each step, as expected in a ReAct agent.</p>
<h2 id="heading-advanced-prompting-personalized-recommendations">Advanced Prompting: Personalized Recommendations</h2>
<p>So far we’ve fetched raw data and posted about it. But LLMs can also provide <strong>insights</strong> on top of data. For example, suppose a user asks:</p>
<p><em>“I’m a history buff visiting Rome. Which attractions should I see?”</em></p>
<p>Simply returning a list of <em>all</em> Rome attractions isn’t ideal. Instead, we want <strong>personalized recommendations</strong>. We can achieve this by combining our tools with a clever prompt:</p>
<ol>
<li><p><strong>Get raw data via tool:</strong> Call <code>get_city_attractions("Rome")</code>.</p>
</li>
<li><p><strong>Engineer a focused prompt:</strong> Tell Gemini to read the list and pick the best 3-5 for a history enthusiast, explaining why.</p>
</li>
<li><p><strong>Generate the answer with Gemini:</strong> The LLM acts as a reasoning layer on the data.</p>
</li>
</ol>
<p>For example:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Step 1: fetch attractions</span>
attractions = get_city_attractions(<span class="hljs-string">"Rome"</span>)
<span class="hljs-comment"># (assume this returns a bullet list of sites like Colosseum, Vatican, Pantheon, etc.)</span>

<span class="hljs-comment"># Step 2: craft a targeted prompt</span>
prompt = <span class="hljs-string">f"""
You are an expert tour guide for Rome. A user is interested in historical sites. 
From the following list of Rome attractions, recommend 3-5 must-see locations for a history buff. Explain why each is a good choice.
Rome Attractions:
<span class="hljs-subst">{attractions}</span>

Focus on historical significance and include brief descriptions.
"""</span>
<span class="hljs-comment"># Step 3: call Gemini directly with this prompt</span>
historical_resp = genai.generate_message(model=MODEL_NAME, messages=[{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: prompt}])
print(historical_resp.last)
</code></pre>
<p><strong>Sample Output:</strong></p>
<blockquote>
<p>Excellent choices for a history enthusiast in Rome include:</p>
<ul>
<li><p><strong>Colosseum</strong> – The iconic ancient amphitheater where gladiators once fought. A symbol of Rome’s imperial past and architecture.</p>
</li>
<li><p><strong>Roman Forum</strong> – The heart of ancient Rome’s political and social life. You can walk among ruins of temples and government buildings.</p>
</li>
<li><p><strong>Pantheon</strong> – A 2,000-year-old temple-turned-church, showcasing Rome’s engineering and dedication to the gods. Its monumental dome is a marvel.</p>
</li>
<li><p><strong>Vatican Museums &amp; St. Peter’s Basilica</strong> – While a church, the Vatican holds vast historical and artistic treasures spanning millennia, including Raphael’s Rooms and Michelangelo’s Pietà.</p>
</li>
<li><p><strong>Catacombs of Callixtus</strong> – Underground burial chambers that reveal early Christian history and traditions.</p>
</li>
</ul>
<p>Each of these sites offers rich historical insights into Rome’s past civilizations and will captivate any history buff!</p>
</blockquote>
<p>This “reasoning layer” approach leverages Gemini’s understanding to <em>analyze and filter</em> the tool output. We transformed a raw list into a personalized recommendation list with explanation. One could even wrap this workflow into a new tool (e.g. <code>recommend_attractions_for_interest</code>) for the agent to use directly.</p>
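<p>Here’s a rough sketch of that idea, reusing the pieces defined above (the tool name and prompt wording are illustrative, not from a library):</p>
<pre><code class="lang-python">def recommend_attractions_for_interest(city: str, interest: str) -&gt; str:
    """
    Recommends 3-5 attractions in a city tailored to a visitor's interest,
    by fetching the attraction list and asking Gemini to filter and explain.
    """
    attractions = get_city_attractions(city)
    prompt = (
        f"You are an expert tour guide for {city}. From the following attractions, "
        f"recommend 3-5 must-see locations for a visitor interested in {interest}, "
        f"explaining why each is a good choice.\n\n{attractions}"
    )
    return genai.GenerativeModel(MODEL_NAME).generate_content(prompt).text

# Expose the workflow to the agent like any other tool.
tools.append(FunctionTool.from_defaults(
    fn=recommend_attractions_for_interest,
    name="recommend_attractions_for_interest",
    description="Recommend attractions in a city tailored to a specific interest."))
</code></pre>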
<h2 id="heading-conclusion">Conclusion</h2>
<p>We’ve built a fully functional <strong>Tourism AI Assistant</strong> that can interpret user requests, choose the right tool, fetch live data, and present it in a helpful way. We saw:</p>
<ul>
<li><p><strong>Setting up the agent:</strong> Configuring Gemini and LlamaIndex.</p>
</li>
<li><p><strong>Writing tools:</strong> Python functions with clear docstrings, wrapped as <code>FunctionTool</code> for the agent.</p>
</li>
<li><p><strong>Using ReAct:</strong> The agent’s chain of thought (“Thought”, “Action”, “Observation”) shows how it plans and executes.</p>
</li>
<li><p><strong>Multimodal capability:</strong> Although we didn’t show it here, Gemini supports images and code, so you could extend this agent to analyze photos of landmarks or compute routes.</p>
</li>
<li><p><strong>Advanced prompting:</strong> We enhanced raw data by feeding it into Gemini with a crafted prompt, yielding richer, customized advice.</p>
</li>
</ul>
<p>This agentic architecture is highly extensible. Next steps might include integrating real travel APIs (for flights or hotels), adding a <strong>memory</strong> so the assistant recalls user preferences, or connecting to mapping services for directions. The key idea is that the LLM does the reasoning, while we supply specialized tools for any real-world data or action.</p>
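<p>As one concrete direction, LlamaIndex ships a chat memory buffer that can be attached when constructing the agent. A minimal sketch (the <code>token_limit</code> value is arbitrary, and this assumes the <code>tools</code> and <code>Settings</code> from earlier):</p>
<pre><code class="lang-python">from llama_index.core.memory import ChatMemoryBuffer

# Keep a rolling window of conversation so the agent can recall
# earlier user preferences within a session.
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
agent = ReActAgent.from_tools(tools=tools, llm=Settings.llm,
                              memory=memory, verbose=True)
</code></pre>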
<p><em>Happy travels and happy building!</em></p>
]]></content:encoded></item><item><title><![CDATA[Who Cares About the Mental Health of IT Professionals?]]></title><description><![CDATA[The flickering neon glow of my monitor is often the only constant at night, a stark contrast to the erratic power supply that often interrupts my work. Deadlines loom, datasets often stubbornly imperfect, reflecting biases I'm struggling to mitigate,...]]></description><link>https://kambale.dev/mental-health-of-it-professionals</link><guid isPermaLink="true">https://kambale.dev/mental-health-of-it-professionals</guid><category><![CDATA[mentalhealth]]></category><category><![CDATA[Mental Health]]></category><category><![CDATA[It Professionals ]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[HR]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Wed, 07 May 2025 14:51:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746628092731/22d9e8d2-8a1b-409f-a939-af858384723a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The flickering neon glow of my monitor is often the only constant at night, a stark contrast to the erratic power supply that often interrupts my work. Deadlines loom, datasets often stubbornly imperfect, reflecting biases I'm struggling to mitigate, and a familiar knot of anxiety tightens in my chest. Is this algorithm truly going to help, or would its flaws inadvertently harm the very communities we aim to serve? This pressure, this ethical weight, combined with the relentless demand for innovation, is a silent burden carried by many of us in Africa's burgeoning tech scene.</p>
<p>So, who cares about the mental health of software engineers and IT professionals, particularly those powering Africa's digital transformation? The answer must be: we all should. Behind the celebratory headlines of Africa's tech unicorns, a human cost is mounting – the often-invisible erosion of mental well-being among the architects of this progress. This is not merely a collection of individual struggles; it is a systemic challenge that threatens the sustainability of innovation and the very fabric of our tech ecosystems.</p>
<p>As we observe Mental Health Awareness Month this May, it is time to move beyond acknowledging this crisis to implementing tangible changes. My experiences, and those of my colleagues across the continent, underscore the urgency of this call. The narrative of "Africa Rising" in technology is inspiring, but it must not overshadow the profound pressures faced by those on the ground. The expectation to contribute to this grand narrative, often with limited resources and against significant infrastructural odds, can itself become a significant stressor.  </p>
<p><strong>Beyond the Global North: The Unique Pressures on Africa's Tech Talent</strong></p>
<p>Globally, the mental toll on tech professionals is well-documented. Demanding work environments, tight deadlines, the constant need for upskilling, and the specter of burnout, anxiety, and imposter syndrome are everyday vocabulary in Silicon Valley and beyond. Studies show that over half of software engineers globally report burnout. However, in Africa, these universal strains are not just replicated; they are amplified and compounded by a unique set of local realities.</p>
<p>In many Ugandan communities, and indeed across much of Africa, mental health remains a taboo subject. Seeking help for anxiety or depression can be misconstrued as personal weakness or even attributed to spiritual failings, rather than being recognized as a legitimate health concern. This cultural barrier is a formidable obstacle; if employees are afraid to speak out, companies may remain unaware of the depth of the problem or feel unmotivated to invest in solutions. This silence perpetuates a cycle where low demand for services (due to stigma) leads to low investment, reinforcing the notion that these are not critical workplace issues.  </p>
<p>Economic and infrastructural realities add further pressure. In a context where a tech job can be a life-changing opportunity for an individual and their family, the stakes for success are incredibly high. The daily grind can be exacerbated by challenges like unreliable power or expensive and erratic internet connectivity – issues that can turn a straightforward coding task into a marathon of frustration and directly impact productivity and stress levels.  </p>
<p>For those of us working in Machine Learning and AI in Africa, there are additional, nuanced burdens. The global discourse on "Ethical AI" and "AI for Good" often places a heavy responsibility on African engineers to solve complex societal problems, sometimes without adequate local data, resources, or established ethical frameworks. We grapple with building AI that is not only innovative but also culturally relevant and free from biases that could harm vulnerable populations, for example, in healthcare or financial services. The challenge of "decolonizing AI," ensuring our creations are truly beneficial and equitable for African contexts, adds a significant mental and ethical load. This isn't just about delivering a product; it's about navigating a complex ethical landscape with potentially far-reaching consequences, a unique form of pressure that can lead to "ethical burnout."  </p>
<p><strong>"System Error": The Current State of Mental Health Support in African Tech</strong></p>
<p>When we look at the mental health support typically available within African tech companies and startups, it often resembles a patchwork of well-intentioned but ultimately inadequate measures. Employee Assistance Programs (EAPs), if they exist, may be generic and not culturally attuned. The approach is frequently reactive, addressing crises rather than fostering a culture of proactive well-being. Many "wellness" initiatives might touch on stress management but fail to address the deep-seated systemic issues or the unique pressures of the tech environment.  </p>
<p>Consider a software developer in Kampala, let's call her Aisha. Aisha is brilliant and dedicated, working for a promising fintech startup. She's battling intense burnout from months of 70-hour weeks leading up to a critical product launch. When she finally musters the courage to hint at her struggles to her manager, she’s met with a well-meaning but ultimately unhelpful suggestion to "take a short break" and a reminder of how crucial her role is. There's no formal HR pathway for mental health support, no access to confidential counselling that understands her context. Her story, and others like it – the junior IT support staff in Nairobi feeling overwhelmed by user demands and job insecurity, or the data scientist in Lagos wrestling with the ethical implications of an algorithm with little institutional guidance – are common, though often whispered in hushed tones. These anonymized experiences highlight a gap: the tech sector, while creating innovative solutions for Africa, sometimes fails to apply that same innovative spirit to supporting its own people.  </p>
<p>There are glimmers of hope. Organizations like Mindverse Uganda are conducting workshops in companies, and innovative approaches like the Tele-Support Psychotherapy (TSP) model show the potential of culturally sensitive, tech-enabled solutions, even if primarily focused outside the corporate sphere for now. Some forward-thinking companies, like Andela or Interswitch, have begun to list mental health support among their employee perks. However, these are often exceptions. The "startup culture" globally tends to prioritize breakneck growth over sustained well-being, a trend likely amplified in Africa's resource-tighter, higher-pressure emerging ecosystems. This can lead to a systemic lack of support, masked by a few positive but isolated examples.  </p>
<p><strong>Acknowledging Realities, Not Excuses</strong></p>
<p>It is important to acknowledge the genuine constraints. Many African tech companies, particularly startups, are indeed navigating challenging economic landscapes where every shilling counts. Implementing comprehensive mental health programs can seem like a daunting expense when survival itself is a daily concern. The argument of "resource constraints," while valid, risks becoming an excuse if mental health is not viewed as a critical investment in human capital—an investment that yields tangible returns in productivity, innovation, and retention.  </p>
<p>Furthermore, mental well-being is a multifaceted societal challenge, influenced by factors far beyond the workplace, including the overall strength of national healthcare systems, which in many African countries are underfunded and under-resourced. The workplace cannot solve everything, but it has a profound impact and a significant role to play.  </p>
<p>Encouragingly, awareness is growing. The rise of African mental health tech startups is a testament to this, even as it underscores a service gap that traditional employers, including tech companies themselves, have yet to fully address. These platforms offer innovative ways to bypass stigma and access barriers, yet their reach can be limited by the same infrastructural challenges—like internet accessibility or data costs—that contribute to workplace stress.  </p>
<p><strong>Debugging Our Approach: A Call for Action in African Tech</strong></p>
<p>As we embrace the 2025 Mental Health Awareness Month theme, the well-being of Africa's tech talent must become a strategic priority. This is not merely an HR concern; it is fundamental to fostering sustainable innovation, driving economic growth, and ensuring our continent's technological advancement is both equitable and human-centered.  </p>
<p>For tech leaders and companies in Uganda and across Africa, the first step is to <strong>cultivate cultures of openness and psychological safety</strong>. This means leadership openly discussing mental health, destigmatizing help-seeking, and establishing safe, confidential channels for employees to voice concerns without fear of reprisal. For those of us in AI/ML, this safety must extend to discussing the ethical quandaries and potential societal impacts of our work without fear of being penalized for raising difficult questions.  </p>
<p>Secondly, <strong>invest in culturally-attuned, accessible mental health resources</strong>. This means moving beyond generic offerings to partner with local mental health professionals and organizations that understand the specific cultural contexts. Companies like Mindverse Uganda already offer workplace workshops. Services leveraging principles from successful local models like Uganda's TSP, which emphasizes cultural sensitivity, should be explored. Therapy benefits, flexible mental health leave policies, and confidential counselling should be the norm, not the exception.  </p>
<p>Thirdly, <strong>design for well-being within work environments and project workflows</strong>. This includes promoting manageable workloads, offering flexible work arrangements where practical, ensuring clear role expectations, and embedding ethical AI development frameworks that explicitly consider developer well-being, especially when projects involve sensitive data or carry high societal impact.  </p>
<p>Finally, <strong>empower managers and team leads with mental health literacy</strong>. Training them to recognize signs of distress, engage in supportive conversations, and guide team members to appropriate resources is crucial for early intervention.  </p>
<p>As tech professionals, we also have a role. We must <strong>foster peer support networks</strong> and advocate responsibly for better conditions within our workplaces. Sharing our experiences, even anonymously, can build collective awareness and drive change. Crucially, we must prioritize <strong>practicing self-care</strong> and utilizing available resources, however limited they may seem. As ML engineers, we are uniquely positioned to not only highlight these issues but also to contribute to designing tech-enabled, data-informed mental health solutions for our workplaces and the wider community.  </p>
<p>For policymakers and ecosystem enablers—governments, investors, and tech hubs—the call is to <strong>integrate mental health into tech development agendas</strong>. This means recognizing that a healthy workforce is a productive and innovative workforce. Investors can play a powerful role by <strong>incentivizing and supporting best practices</strong>, perhaps by including employee well-being metrics in their investment criteria or incubator programs.  </p>
<p>Who cares about the mental health of Africa's tech talent? The answer must be a resounding "We do." The vision is an African tech ecosystem that is not only a global hub of innovation but also a beacon of human-centered progress. An ecosystem where the brilliant minds building the future, like Aisha, feel supported, valued, and mentally sound. Achieving this requires a collective commitment—from the boardroom to the individual developer, from policymakers to investors. Only then can we ensure that Africa's tech revolution truly benefits all.</p>
]]></content:encoded></item><item><title><![CDATA[Luganda Inference on Gemma 3]]></title><description><![CDATA[Introduction
Google has unveiled Gemma 3, the latest iteration of its open AI models, featuring four versions: gemma-3-1b-it, gemma-3-4b-it, gemma-3-12b-it, and gemma-3-27b-it.
The gemma-3-1b-it model is limited to text-only input, supports English e...]]></description><link>https://kambale.dev/luganda-inference-on-gemma-3</link><guid isPermaLink="true">https://kambale.dev/luganda-inference-on-gemma-3</guid><category><![CDATA[luganda]]></category><category><![CDATA[gemma]]></category><category><![CDATA[genai]]></category><category><![CDATA[inference]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Mon, 17 Mar 2025 15:12:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742224299893/52f52d5b-aeff-46f3-a84a-c63c21db3b3c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>Google has unveiled <strong>Gemma 3</strong>, the latest iteration of its open AI models, featuring four versions: <code>gemma-3-1b-it</code>, <code>gemma-3-4b-it</code>, <code>gemma-3-12b-it</code>, and <code>gemma-3-27b-it</code>.</p>
<p>The <code>gemma-3-1b-it</code> model is limited to <strong>text-only input</strong>, supports <strong>English exclusively</strong>, and comes with a <strong>32k context length</strong>. Due to its lack of multilingual capabilities, it is unsuitable for <strong>Luganda inference</strong>.</p>
<p>In contrast, the <code>gemma-3-4b-it</code>, <code>gemma-3-12b-it</code>, and <code>gemma-3-27b-it</code> models support <strong>both text and image input</strong>, recognize <strong>140+ languages</strong>, and offer an extended <strong>128k context length</strong>, making them far better suited for multilingual tasks.</p>
<p>For this specific task, we are using <code>gemma-3-4b-it</code> due to its balance of <strong>performance and efficiency</strong>.</p>
<p><strong>Accessing Gemma 3 models</strong></p>
<p>Before using Gemma 3 for the first time, you must request access to the model through Hugging Face by completing the following steps:</p>
<ol>
<li><p>Log in to <a target="_blank" href="https://www.google.com/url?q=https%3A%2F%2Fhuggingface.co">Hugging Face</a>, or create a new Hugging Face account if you don't already have one.</p>
</li>
<li><p>Go to the <a target="_blank" href="https://www.google.com/url?q=https%3A%2F%2Fhuggingface.co%2Fgoogle%2Fgemma-3-4b-it">Gemma 3 model card</a> to get access to the model.</p>
</li>
<li><p>Complete the consent form and accept the terms and conditions.</p>
</li>
</ol>
<p>To generate a Hugging Face token, open your <a target="_blank" href="https://www.google.com/url?q=https%3A%2F%2Fhuggingface.co%2Fsettings"><strong>Settings</strong> page in Hugging Face</a>, choose the <strong>Access Tokens</strong> option in the left pane, and click <strong>New token</strong>. In the window that appears, give your token a name and choose the <strong>Write</strong> type to grant write access.</p>
<p>Then, in Colab, select <strong>Secrets</strong> (🔑) in the left pane and add your Hugging Face token. Store your Hugging Face token under the name <code>HF_TOKEN</code>.</p>
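<p>Inside the notebook, the stored secret can then be exposed to the Hugging Face libraries; a short sketch using Colab’s <code>userdata</code> API:</p>
<pre><code class="lang-python">import os
from google.colab import userdata  # available in Colab notebooks

# transformers/huggingface_hub read the token from the HF_TOKEN variable.
os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN")
</code></pre>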
<p><strong>Select the runtime</strong></p>
<p>To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to load the Gemma 3 model. In this case, a T4/L4 GPU would be needed to load the model weights.</p>
<ol>
<li><p>In the upper-right of the Colab window, click the dropdown menu.</p>
</li>
<li><p>Select <strong>Change runtime type</strong>.</p>
</li>
<li><p>Under <strong>Hardware accelerator</strong>, select <strong>T4 or L4</strong>.</p>
</li>
</ol>
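<p>Before loading any weights, you can confirm the accelerator is active; a quick check with PyTorch (already a dependency of this tutorial):</p>
<pre><code class="lang-python">import torch

# Should print True and the GPU name (e.g. "Tesla T4") once the runtime is set.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
</code></pre>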
<h3 id="heading-install-transformers">Install Transformers</h3>
<pre><code class="lang-bash">!pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3
</code></pre>
<h3 id="heading-import-libraries-and-dependencies">Import libraries and dependencies</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoProcessor, Gemma3ForConditionalGeneration
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">import</span> cv2
<span class="hljs-keyword">from</span> IPython.display <span class="hljs-keyword">import</span> Markdown, HTML
<span class="hljs-keyword">from</span> base64 <span class="hljs-keyword">import</span> b64encode
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">import</span> torch
</code></pre>
<h3 id="heading-choose-the-gemma-3-model-variant-to-use">Choose the Gemma 3 model variant to use</h3>
<p>Gemma 3 is available in four sizes, each supporting different features:</p>
<ul>
<li><p><code>gemma-3-1b-it</code></p>
<ul>
<li><p>Supports only text input and English language</p>
</li>
<li><p>32k context length</p>
</li>
</ul>
</li>
<li><p><code>gemma-3-4b-it</code>, <code>gemma-3-12b-it</code>, <code>gemma-3-27b-it</code></p>
<ul>
<li><p>Supports both text and image input</p>
</li>
<li><p>Supports 140+ languages</p>
</li>
<li><p>128k context length</p>
</li>
</ul>
</li>
</ul>
<pre><code class="lang-python">model_name = <span class="hljs-string">'gemma-3-4b-it'</span> <span class="hljs-comment">#We are using 4b</span>
model_id = <span class="hljs-string">f"google/<span class="hljs-subst">{model_name}</span>"</span>

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map=<span class="hljs-string">"auto"</span>, torch_dtype=torch.bfloat16,
).eval()

processor = AutoProcessor.from_pretrained(model_id)
</code></pre>
<h3 id="heading-define-helper-functions">Define helper functions</h3>
<ul>
<li><p><code>resize_image</code>: Resizes the input images to <code>n x n</code> pixels, ensuring the aspect ratio is preserved.</p>
</li>
<li><p><code>get_model_response</code>: Send a text prompt and an image to the model, and retrieve the model's response.</p>
</li>
<li><p><code>extract_frames</code>: Extracts a specified number of evenly spaced frames from a video file along with their timestamps.</p>
</li>
<li><p><code>show_video</code>: Embeds and displays a video in an HTML5 player.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">resize_image</span>(<span class="hljs-params">image_path</span>):</span>
    img = Image.open(image_path)

    target_width, target_height = <span class="hljs-number">640</span>, <span class="hljs-number">640</span>
    <span class="hljs-comment"># Calculate the target size (maximum width and height).</span>
    <span class="hljs-keyword">if</span> target_width <span class="hljs-keyword">and</span> target_height:
        max_size = (target_width, target_height)
    <span class="hljs-keyword">elif</span> target_width:
        max_size = (target_width, img.height)
    <span class="hljs-keyword">elif</span> target_height:
        max_size = (img.width, target_height)

    img.thumbnail(max_size)

    <span class="hljs-keyword">return</span> img


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_model_response</span>(<span class="hljs-params">img: Image, prompt: str, model, processor</span>):</span>
    <span class="hljs-comment"># Prepare the messages for the model.</span>
    messages = [
        {
            <span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>,
            <span class="hljs-string">"content"</span>: [{<span class="hljs-string">"type"</span>: <span class="hljs-string">"text"</span>, <span class="hljs-string">"text"</span>: <span class="hljs-string">"You are a helpful assistant. Reply only with the answer to the question asked in Luganda language only, and avoid using additional text in your response like 'here's the answer'."</span>}]
        },
        {
            <span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>,
            <span class="hljs-string">"content"</span>: [
                {<span class="hljs-string">"type"</span>: <span class="hljs-string">"image"</span>, <span class="hljs-string">"image"</span>: img},
                {<span class="hljs-string">"type"</span>: <span class="hljs-string">"text"</span>, <span class="hljs-string">"text"</span>: prompt}
            ]
        }
    ]

    <span class="hljs-comment"># Tokenize inputs and prepare for the model.</span>
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=<span class="hljs-literal">True</span>, tokenize=<span class="hljs-literal">True</span>,
        return_dict=<span class="hljs-literal">True</span>, return_tensors=<span class="hljs-string">"pt"</span>
    ).to(model.device, dtype=torch.bfloat16)

    input_len = inputs[<span class="hljs-string">"input_ids"</span>].shape[<span class="hljs-number">-1</span>]

    <span class="hljs-comment"># Generate response from the model.</span>
    <span class="hljs-keyword">with</span> torch.inference_mode():
        generation = model.generate(**inputs, max_new_tokens=<span class="hljs-number">100</span>, do_sample=<span class="hljs-literal">False</span>)
        generation = generation[<span class="hljs-number">0</span>][input_len:]

    <span class="hljs-comment"># Decode the response.</span>
    response = processor.decode(generation, skip_special_tokens=<span class="hljs-literal">True</span>)
    <span class="hljs-keyword">return</span> response


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">extract_frames</span>(<span class="hljs-params">video_path, num_frames</span>):</span>
    <span class="hljs-string">"""
    The function is adapted from:
    https://github.com/merveenoyan/smol-vision/blob/main/Gemma_3_for_Video_Understanding.ipynb
    """</span>
    cap = cv2.VideoCapture(video_path)

    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> cap.isOpened():
        print(<span class="hljs-string">"Error: Could not open video file."</span>)
        <span class="hljs-keyword">return</span> []

    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)

    <span class="hljs-comment"># Calculate the step size to evenly distribute frames across the video.</span>
    step = total_frames // num_frames
    frames = []

    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(num_frames):
        frame_idx = i * step
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ret, frame = cap.read()
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> ret:
            <span class="hljs-keyword">break</span>
        img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        timestamp = round(frame_idx / fps, <span class="hljs-number">2</span>)
        frames.append((img, timestamp))

    cap.release()
    <span class="hljs-keyword">return</span> frames


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">show_video</span>(<span class="hljs-params">video_path, video_width = <span class="hljs-number">600</span></span>):</span>
  video_file = open(video_path, <span class="hljs-string">"r+b"</span>).read()
  video_url = <span class="hljs-string">f"data:video/mp4;base64,<span class="hljs-subst">{b64encode(video_file).decode()}</span>"</span>
  video_html = <span class="hljs-string">f"""&lt;video width=<span class="hljs-subst">{video_width}</span> controls&gt;&lt;source src="<span class="hljs-subst">{video_url}</span>"&gt;&lt;/video&gt;"""</span>
  <span class="hljs-keyword">return</span> HTML(video_html)
</code></pre>
<h2 id="heading-run-an-inference-on-images">Run an inference on images</h2>
<p>Fetch some sample images for inferencing.</p>
<pre><code class="lang-bash">!wget https://raw.githubusercontent.com/wkambale/Luganda-Inference-on-Gemma-3/main/assets/image_1.jpg -O /content/image_1.jpg
!wget https://raw.githubusercontent.com/wkambale/Luganda-Inference-on-Gemma-3/main/assets/image_2.jpg -O /content/image_2.jpg
!wget https://raw.githubusercontent.com/wkambale/Luganda-Inference-on-Gemma-3/main/assets/image_3.jpg -O /content/image_3.jpg
!wget https://raw.githubusercontent.com/wkambale/Luganda-Inference-on-Gemma-3/main/assets/image_4.png -O /content/image_4.png
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1741973322171/639309f6-cd16-4ac9-8c38-6a71883cafe5.jpeg" alt class="image--center mx-auto" /></p>
<ul>
<li><em>Image 1 Credit: The Pearl</em></li>
</ul>
<p><strong>Task 1: Describe an image</strong></p>
<p>The prompt is in Luganda language which translates to: "Describe the image."</p>
<pre><code class="lang-python">image_file = <span class="hljs-string">'image_1.jpg'</span>
prompt = <span class="hljs-string">"Nnyonnyola emmeere eri mu kifaananyi."</span>


img = resize_image(image_file)
display(img)
response = get_model_response(img, prompt, model, processor)
display(Markdown(response))
</code></pre>
<p>Response:</p>
<pre><code class="lang-bash">Omukoyogo.
</code></pre>
<h4 id="heading-example-2-identify-a-landmark">Example 2: Identify a landmark</h4>
<p>The prompt is in Luganda language which translates to: "Identify the famous landmark and location"</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742221026408/d08c920b-36d2-4ceb-bae4-cd315cffc8c8.jpeg" alt class="image--center mx-auto" /></p>
<ul>
<li><em>Image 2: Ali Zali</em></li>
</ul>
<pre><code class="lang-python">image_file = <span class="hljs-string">'image_2.jpg'</span>
prompt = <span class="hljs-string">"Londoola ekifo kino ekimanyiddwa ennyo nne w'ekisangibwa."</span>

img = resize_image(image_file)
display(img)
response = get_model_response(img, prompt, model, processor)
display(Markdown(response))
</code></pre>
<p>Response:</p>
<pre><code class="lang-bash">Ebibuga.
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742221137566/1d334f7e-2236-4903-9d79-079a022af5b7.jpeg" alt class="image--center mx-auto" /></p>
<ul>
<li><em>Image 3: The Tower Post</em></li>
</ul>
<pre><code class="lang-python">image_file = <span class="hljs-string">'image_3.jpg'</span>
prompt = <span class="hljs-string">"Londoola ekifo kino ekimanyiddwa ennyo nne w'ekisangibwa."</span>

img = resize_image(image_file)
display(img)
response = get_model_response(img, prompt, model, processor)
display(Markdown(response))
</code></pre>
<p>Response:</p>
<pre><code class="lang-bash">Kampala Bbwalo.
</code></pre>
<h4 id="heading-task-3-mathematical-reasoningokulowooza-mu-kubala">Task 3: Mathematical Reasoning/Okulowooza mu Kubala</h4>
<p>The prompt is in Luganda language which translates to: "What is the value of x?"</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742221248993/8ac94b19-069a-4ace-bc5a-338c2cb79125.png" alt class="image--center mx-auto" /></p>
<ul>
<li><em>Image: Nitin</em></li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">from</span> IPython.display <span class="hljs-keyword">import</span> Markdown

image_file = <span class="hljs-string">'image_4.png'</span>
prompt = <span class="hljs-string">"Omuwendo gwa x gwe guliwa?"</span>

img = resize_image(image_file)
display(img)
response = get_model_response(img, prompt, model, processor)
display(Markdown(response))
</code></pre>
<p>Response:</p>
<pre><code class="lang-bash">x = 3
</code></pre>
<h3 id="heading-inference-on-videos"><strong>Inference on videos</strong></h3>
<p>The video is a clip from the "Why Uganda is the Pearl Of Africa!" shoot.</p>
<ul>
<li><p>Credits: Eunice Tess</p>
</li>
<li><p>Source: <a target="_blank" href="https://www.google.com/url?q=https%3A%2F%2Fyoutu.be%2Fu4D20WDrZyY%3Fsi%3DCU03hErvHfHCtxaX">YouTube</a></p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-comment"># Video file.</span>
video_path = <span class="hljs-string">"video.mp4"</span>

<span class="hljs-comment"># No. of frames to be extracted from the video.</span>
num_frames = <span class="hljs-number">10</span>

video_output = show_video(video_path, video_width=<span class="hljs-number">800</span>)
display(video_output)
</code></pre>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/u4D20WDrZyY?si=uigb9f9hjCFjFxPV">https://youtu.be/u4D20WDrZyY?si=uigb9f9hjCFjFxPV</a></div>
<p> </p>
<p>The prompt is in Luganda and translates to: "Please summarize what is happening in this video."</p>
<pre><code class="lang-python">video_frames = extract_frames(video_path, num_frames=num_frames)

messages = [
    {
        <span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>,
        <span class="hljs-string">"content"</span>: [{<span class="hljs-string">"type"</span>: <span class="hljs-string">"text"</span>, <span class="hljs-string">"text"</span>: <span class="hljs-string">"You are a helpful assistant."</span>}]
    },
    {
        <span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>,
        <span class="hljs-string">"content"</span>: [{<span class="hljs-string">"type"</span>: <span class="hljs-string">"text"</span>, <span class="hljs-string">"text"</span>: <span class="hljs-string">"Nsaba mufunze ebigenda mu maaso mu katambi kano"</span>}]
    }
]


<span class="hljs-comment"># Add frames to the messages structure.</span>
<span class="hljs-keyword">for</span> frame_data <span class="hljs-keyword">in</span> video_frames:
    img, timestamp = frame_data
    messages[<span class="hljs-number">1</span>][<span class="hljs-string">"content"</span>].append({<span class="hljs-string">"type"</span>: <span class="hljs-string">"text"</span>, <span class="hljs-string">"text"</span>: <span class="hljs-string">f"Frame at <span class="hljs-subst">{timestamp}</span> seconds:"</span>})
    img.save(<span class="hljs-string">f"/content/frames/frame_<span class="hljs-subst">{timestamp}</span>.png"</span>)
    messages[<span class="hljs-number">1</span>][<span class="hljs-string">"content"</span>].append({<span class="hljs-string">"type"</span>: <span class="hljs-string">"image"</span>, <span class="hljs-string">"url"</span>: <span class="hljs-string">f"/content/frames/frame_<span class="hljs-subst">{timestamp}</span>.png"</span>})


inputs = processor.apply_chat_template(
    messages, add_generation_prompt=<span class="hljs-literal">True</span>, tokenize=<span class="hljs-literal">True</span>,
    return_dict=<span class="hljs-literal">True</span>, return_tensors=<span class="hljs-string">"pt"</span>
).to(model.device)


input_length = inputs[<span class="hljs-string">"input_ids"</span>].shape[<span class="hljs-number">-1</span>]

<span class="hljs-comment"># Generate a response based on the inputs.</span>
output = model.generate(**inputs, max_new_tokens=<span class="hljs-number">500</span>, do_sample=<span class="hljs-literal">False</span>)
output = output[<span class="hljs-number">0</span>][input_length:]
response = processor.decode(output, skip_special_tokens=<span class="hljs-literal">True</span>)

display(Markdown(response))
</code></pre>
<p>Response:</p>
<pre><code class="lang-markdown">Okay, let's look at these images. Here's what I see in Luganda:

Frame 0.0: "Nyo mu maaso, ttiima nnyo. Twee nyo mu nsi y'omutwe, nti nyo mu maaso." (It's beautiful, very impressive. It's in a world of wonder, it's in the water.)
Frame 13.2: "Nyo mu maaso, ttiima nnyo. Twee nyo mu nsi y'omutwe, nti nyo mu maaso." (It's beautiful, very impressive. It's in a world of wonder, it's in the water.)
Frame 26.4: "Nyo mu maaso, ttiima nnyo. Twee nyo mu nsi y'omutwe, nti nyo mu maaso." (It's beautiful, very impressive. It's in a world of wonder, it's in the water.)
Frame 39.6: "Nyo mu maaso, ttiima nnyo. Twee nyo mu nsi y'omutwe, nti nyo mu maaso." (It's beautiful, very impressive. It's in a world of wonder, it's in the water.)
Frame 52.8: "Nyo mu maaso, ttiima nnyo. Twee nyo mu nsi y'omutwe, nti nyo mu maaso." (It's beautiful, very impressive. It's in a world of wonder, it's in the water.)
Frame 66.0: "Nyo mu maaso, ttiima nnyo. Twee nyo mu nsi y'omutwe, nti nyo mu maaso." (It's beautiful, very impressive. It's in a world of wonder, it's in the water.)
Frame 79.2: "Nyo mu maaso, ttiima nnyo. Twee nyo mu nsi y'omutwe, nti nyo mu maaso."
</code></pre>
<h3 id="heading-deductions"><strong>Deductions</strong></h3>
<p>The outputs above reveal a significant limitation: despite <strong>Gemma 3 models</strong> boasting multilingual capabilities across <strong>140+ languages</strong>, they still struggle to handle <strong>vision tasks (images and videos) in Luganda</strong> effectively.</p>
<p>This demonstration underscores the urgent need for:</p>
<ol>
<li><p><strong>More research</strong> into optimizing AI models for low-resource languages like Luganda.</p>
</li>
<li><p><strong>Expanding datasets</strong> with high-quality, Luganda-specific image and video annotations.</p>
</li>
<li><p><strong>Training foundational models</strong> that natively understand Luganda in multimodal contexts.</p>
</li>
</ol>
<p>Without these critical steps, AI-powered vision systems will continue to exclude Luganda and other underrepresented languages from advancements in <strong>multimodal AI</strong>.</p>
<p><strong>Resources</strong></p>
<p>Here is the <a target="_blank" href="https://colab.research.google.com/drive/1xkB4O3yjjDEnd4x2yZCy56MbmK_OuASn?usp=sharing">notebook</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Building Convolutional Neural Networks in JAX]]></title><description><![CDATA[Introduction
Deep learning has revolutionized the field of artificial intelligence, and at the heart of this revolution are Convolutional Neural Networks (CNNs). CNNs have become the go-to architectures for tasks involving images, such as object dete...]]></description><link>https://kambale.dev/build-cnn-in-jax</link><guid isPermaLink="true">https://kambale.dev/build-cnn-in-jax</guid><category><![CDATA[CNN]]></category><category><![CDATA[jax]]></category><category><![CDATA[CNN for begginers]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Wed, 12 Mar 2025 15:05:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1741790696877/2599696c-54c6-4121-9022-c158009ae0bd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>Deep learning has revolutionized the field of artificial intelligence, and at the heart of this revolution are <strong>Convolutional Neural Networks (CNNs)</strong>. CNNs have become the go-to architectures for tasks involving images, such as object detection, facial recognition, medical imaging, and self-driving cars.</p>
<p>Traditionally, frameworks like TensorFlow and PyTorch have dominated the deep learning landscape. However, <strong>JAX</strong> has emerged as a powerful alternative, especially for research and high-performance computing. Developed by Google, JAX provides <strong>automatic differentiation</strong> and <strong>Just-In-Time (JIT) compilation</strong>, making it highly efficient for numerical computing and deep learning applications.</p>
<p><strong>Why Use JAX for CNNs?</strong></p>
<p>JAX stands out because of its ability to seamlessly run code on <strong>CPUs, GPUs, and TPUs</strong> while maintaining a NumPy-like API. This means you can develop models using familiar syntax while benefiting from:</p>
<ol>
<li><p><strong>Automatic Vectorization</strong> – With functions like <code>vmap</code>, JAX makes it easy to apply operations over large batches of data without writing explicit loops.</p>
</li>
<li><p><strong>Efficient Autograd</strong> – JAX provides <strong>automatic differentiation</strong> using <code>grad</code>, which simplifies training deep learning models.</p>
</li>
<li><p><strong>XLA Compilation</strong> – Just-In-Time (JIT) compilation speeds up execution by compiling computation graphs for efficient hardware utilization.</p>
</li>
<li><p><strong>Functional Programming Paradigm</strong> – Unlike traditional deep learning frameworks, JAX encourages <strong>pure functions</strong>, which improves reproducibility and debugging.</p>
</li>
</ol>
<p><strong>Prerequisites</strong></p>
<p>Before proceeding, ensure you are familiar with:</p>
<ul>
<li><p>JAX fundamentals (see the JAX documentation <a target="_blank" href="https://docs.jax.dev/en/latest/index.html">here</a>).</p>
</li>
<li><p>Building CNNs in TensorFlow or PyTorch</p>
</li>
<li><p>JAX optimizers and loss functions</p>
</li>
</ul>
<p><strong>Install Dependencies</strong></p>
<p>Install the required libraries:</p>
<pre><code class="lang-bash">!pip install jax jaxlib flax optax tensorflow tensorflow_datasets dm-pix tqdm matplotlib
</code></pre>
<ul>
<li><p><code>jax</code> and <code>jaxlib</code> – The core JAX library and its hardware acceleration backend.</p>
</li>
<li><p><code>flax</code> – A neural network library for JAX, similar to PyTorch’s <code>torch.nn</code>.</p>
</li>
<li><p><code>optax</code> – A library for optimization algorithms in JAX.</p>
</li>
<li><p><code>dm_pix</code> – A lightweight image processing library for JAX.</p>
</li>
<li><p><code>matplotlib</code> – For visualizing images.</p>
</li>
<li><p><code>tensorflow</code> and <code>tensorflow_datasets</code> – Used here for the input data pipeline and dataset utilities.</p>
</li>
<li><p><code>tqdm</code> – Progress bars for loops.</p>
</li>
</ul>
<p><strong>Import Packages</strong></p>
<p>Load necessary libraries:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">import</span> tensorflow_datasets <span class="hljs-keyword">as</span> tfds
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> jax
<span class="hljs-keyword">import</span> jax.numpy <span class="hljs-keyword">as</span> jnp
<span class="hljs-keyword">import</span> optax
<span class="hljs-keyword">from</span> tqdm.auto <span class="hljs-keyword">import</span> tqdm
<span class="hljs-keyword">from</span> flax <span class="hljs-keyword">import</span> linen <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">from</span> flax.training <span class="hljs-keyword">import</span> train_state
<span class="hljs-keyword">import</span> dm_pix <span class="hljs-keyword">as</span> pix  <span class="hljs-comment"># Image processing in JAX</span>
</code></pre>
<p><strong>Verify GPU Access</strong></p>
<p>JAX runs computations on <strong>CPUs, GPUs, and TPUs</strong> seamlessly. To check if your machine has a GPU or TPU available:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Get available devices</span>
print(<span class="hljs-string">"Available Devices:"</span>, jax.devices())

<span class="hljs-comment"># Check if GPU is available</span>
<span class="hljs-keyword">if</span> jax.default_backend() == <span class="hljs-string">"gpu"</span>:
    print(<span class="hljs-string">"Using GPU:"</span>, jax.devices(<span class="hljs-string">"gpu"</span>))
<span class="hljs-keyword">elif</span> jax.default_backend() == <span class="hljs-string">"tpu"</span>:
    print(<span class="hljs-string">"Using TPU:"</span>, jax.devices(<span class="hljs-string">"tpu"</span>))
<span class="hljs-keyword">else</span>:
    print(<span class="hljs-string">"Using CPU"</span>)
</code></pre>
<h2 id="heading-data-preprocessing"><strong>Data Preprocessing</strong></h2>
<p>Raw image data comes in various sizes, orientations, and quality levels. Preprocessing is crucial for:</p>
<ul>
<li><p>Ensuring uniform input dimensions.</p>
</li>
<li><p>Normalizing pixel values for stable training.</p>
</li>
<li><p>Augmenting data to improve model generalization.</p>
</li>
<li><p>Converting images into JAX-compatible tensors.</p>
</li>
</ul>
<p><strong>Loading the Dataset</strong></p>
<p>We will use the <strong>Cats vs. Dogs dataset</strong> available on Kaggle. Download and unzip the dataset using:</p>
<pre><code class="lang-bash">!kaggle datasets download -d chetankv/dogs-cats-images
!unzip dogs-cats-images.zip
</code></pre>
<p>This dataset gives us:</p>
<ul>
<li><p>A <strong>training dataset</strong> (<code>train_data</code>)</p>
</li>
<li><p>A <strong>test dataset</strong> (<code>test_data</code>)</p>
</li>
</ul>
<p>Define the path to the images and the batch size:</p>
<pre><code class="lang-python">base_dir = <span class="hljs-string">"/content/dog vs cat/dataset/training_set"</span>
batch_size = <span class="hljs-number">64</span>
</code></pre>
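<p>The cells below reference <code>training_set</code>, <code>validation_set</code>, and <code>eval_set</code>, which come from loading the unzipped folders. Here is a minimal loading sketch using <code>tf.keras.utils.image_dataset_from_directory</code>, assuming the Kaggle archive's <code>training_set</code>/<code>test_set</code> directory layout; the split ratio and seed are illustrative:</p>
<pre><code class="lang-python"># A minimal sketch: carve a validation split out of the training folder
# and load the test folder for final evaluation.
training_set = tf.keras.utils.image_dataset_from_directory(
    base_dir,
    validation_split=0.2,
    subset="training",
    seed=42,
    batch_size=batch_size,
)
validation_set = tf.keras.utils.image_dataset_from_directory(
    base_dir,
    validation_split=0.2,
    subset="validation",
    seed=42,
    batch_size=batch_size,
)
eval_set = tf.keras.utils.image_dataset_from_directory(
    "/content/dog vs cat/dataset/test_set",
    batch_size=batch_size,
)
</code></pre>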
<p><strong>Resizing and Normalizing Images</strong></p>
<p>Since images in the dataset have varying sizes, we must <strong>resize</strong> them to a fixed size (e.g., <strong>128×128 pixels</strong>). Additionally, we normalize pixel values from <strong>[0, 255] → [0, 1]</strong> for stable training.</p>
<pre><code class="lang-python">IMG_SIZE = <span class="hljs-number">128</span>

resize_and_rescale = tf.keras.Sequential(
    [
        tf.keras.layers.Resizing(IMG_SIZE, IMG_SIZE),
        tf.keras.layers.Rescaling(<span class="hljs-number">1.0</span> / <span class="hljs-number">255</span>),
    ]
)
</code></pre>
<p><strong>Data Augmentation for Better Generalization</strong></p>
<p>Data augmentation helps improve model generalization by applying transformations like <strong>flipping, rotation, brightness adjustments, and cropping</strong>.</p>
<pre><code class="lang-python">rng = jax.random.PRNGKey(<span class="hljs-number">0</span>)
rng, inp_rng, init_rng = jax.random.split(rng, <span class="hljs-number">3</span>)

delta = <span class="hljs-number">0.42</span>
factor = <span class="hljs-number">0.42</span>

<span class="hljs-meta">@jax.jit</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">data_augmentation</span>(<span class="hljs-params">image</span>):</span>
    new_image = pix.adjust_brightness(image=image, delta=delta)
    new_image = pix.random_brightness(image=new_image, max_delta=delta, key=inp_rng)
    new_image = pix.flip_up_down(image=new_image)  <span class="hljs-comment"># chain from the brightness-adjusted image, not the original</span>
    new_image = pix.flip_left_right(image=new_image)
    new_image = pix.rot90(k=<span class="hljs-number">1</span>, image=new_image) <span class="hljs-comment"># k = number of times the rotation is applied</span>

    <span class="hljs-keyword">return</span> new_image
</code></pre>
<p><strong>Converting Data to JAX-Compatible Tensors</strong></p>
<p>JAX primarily operates on NumPy-like arrays (<code>jnp.array</code>). TensorFlow uses <code>tf.Tensor</code>, so we must convert our dataset into a JAX-friendly format.</p>
<pre><code class="lang-python">AUTOTUNE = tf.data.AUTOTUNE

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">prepare</span>(<span class="hljs-params">ds, shuffle=False</span>):</span>
    <span class="hljs-comment"># Rescale and resize all datasets.</span>
    ds = ds.map(<span class="hljs-keyword">lambda</span> x, y: (resize_and_rescale(x), y), num_parallel_calls=AUTOTUNE)

    <span class="hljs-keyword">if</span> shuffle:
        ds = ds.shuffle(<span class="hljs-number">1000</span>)

    <span class="hljs-comment"># Use buffered prefetching on all datasets.</span>
    <span class="hljs-keyword">return</span> ds.prefetch(buffer_size=AUTOTUNE)

train_ds = prepare(training_set, shuffle=<span class="hljs-literal">True</span>)
val_ds = prepare(validation_set)
eval_ds = prepare(eval_set)
</code></pre>
<p><strong>Visualizing Preprocessed Images</strong></p>
<p>Let’s check if preprocessing is working as expected.</p>
<pre><code class="lang-python">plt.figure(figsize=(<span class="hljs-number">10</span>, <span class="hljs-number">10</span>))

augmented_images = []

<span class="hljs-keyword">for</span> images, _ <span class="hljs-keyword">in</span> training_set.take(<span class="hljs-number">1</span>):
  <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">9</span>):
    augmented_image = data_augmentation(np.array(images[i], dtype=jnp.float32))
    augmented_images.append(augmented_image)
    ax = plt.subplot(<span class="hljs-number">3</span>, <span class="hljs-number">3</span>, i + <span class="hljs-number">1</span>)
    plt.imshow(augmented_images[i].astype(<span class="hljs-string">"uint8"</span>))
    plt.axis(<span class="hljs-string">"off"</span>)
</code></pre>
<h2 id="heading-defining-a-cnn-in-jax"><strong>Defining a CNN in JAX</strong></h2>
<p>In JAX, neural networks are often implemented using <strong>Flax</strong>, a high-level neural network library designed to work seamlessly with JAX’s functional paradigm. Flax provides an intuitive way to define models using <strong>Module</strong> classes.</p>
<p>Below is a simple implementation of a Convolutional Neural Network (CNN) in JAX using Flax:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> jax.numpy <span class="hljs-keyword">as</span> jnp
<span class="hljs-keyword">import</span> flax.linen <span class="hljs-keyword">as</span> nn

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CNN</span>(<span class="hljs-params">nn.Module</span>):</span>
    num_classes: int  <span class="hljs-comment"># Number of output classes</span>

<span class="hljs-meta">    @nn.compact</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__call__</span>(<span class="hljs-params">self, x</span>):</span>
        x = nn.Conv(features=<span class="hljs-number">32</span>, kernel_size=(<span class="hljs-number">3</span>, <span class="hljs-number">3</span>), strides=(<span class="hljs-number">1</span>, <span class="hljs-number">1</span>), padding=<span class="hljs-string">"SAME"</span>)(x)
        x = nn.relu(x)
        x = nn.max_pool(x, window_shape=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>), strides=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>))

        x = nn.Conv(features=<span class="hljs-number">64</span>, kernel_size=(<span class="hljs-number">3</span>, <span class="hljs-number">3</span>), strides=(<span class="hljs-number">1</span>, <span class="hljs-number">1</span>), padding=<span class="hljs-string">"SAME"</span>)(x)
        x = nn.relu(x)
        x = nn.max_pool(x, window_shape=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>), strides=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>))

        x = nn.Conv(features=<span class="hljs-number">128</span>, kernel_size=(<span class="hljs-number">3</span>, <span class="hljs-number">3</span>), strides=(<span class="hljs-number">1</span>, <span class="hljs-number">1</span>), padding=<span class="hljs-string">"SAME"</span>)(x)
        x = nn.relu(x)
        x = nn.max_pool(x, window_shape=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>), strides=(<span class="hljs-number">2</span>, <span class="hljs-number">2</span>))

        x = x.reshape((x.shape[<span class="hljs-number">0</span>], <span class="hljs-number">-1</span>))  <span class="hljs-comment"># Flatten feature maps</span>
        x = nn.Dense(features=<span class="hljs-number">128</span>)(x)
        x = nn.relu(x)

        x = nn.Dense(features=self.num_classes)(x)  <span class="hljs-comment"># Output layer</span>
        <span class="hljs-keyword">return</span> x
</code></pre>
<ol>
<li><p><strong>First Convolutional Layer</strong></p>
<ul>
<li><p>Applies a <strong>32-channel</strong> convolution with a <strong>3×3</strong> kernel.</p>
</li>
<li><p>Uses <strong>ReLU</strong> activation to introduce non-linearity.</p>
</li>
<li><p>Applies <strong>max pooling</strong> with a <strong>2×2</strong> window and a stride of <strong>2</strong>, reducing the spatial dimensions by half.</p>
</li>
</ul>
</li>
<li><p><strong>Second Convolutional Layer</strong></p>
<ul>
<li><p>Uses a <strong>64-channel</strong> convolution with a <strong>3×3</strong> kernel.</p>
</li>
<li><p>Again applies <strong>ReLU</strong> activation.</p>
</li>
<li><p>Another <strong>max pooling</strong> operation further reduces spatial dimensions.</p>
</li>
</ul>
</li>
<li><p><strong>Third Convolutional Layer</strong></p>
<ul>
<li><p>Increases the number of channels to <strong>128</strong> while keeping the <strong>3×3</strong> kernel size.</p>
</li>
<li><p>Applies <strong>ReLU</strong> activation and another <strong>max pooling</strong> step.</p>
</li>
</ul>
</li>
<li><p><strong>Flattening and Fully Connected Layers</strong></p>
<ul>
<li><p>The feature maps from the final convolutional layer are <strong>flattened</strong> into a 1D vector.</p>
</li>
<li><p>A <strong>dense layer with 128 neurons</strong> applies a ReLU activation.</p>
</li>
<li><p>The final <strong>output layer</strong> produces logits corresponding to the number of classes.</p>
</li>
</ul>
</li>
</ol>
<p><strong>Why Use</strong> <code>@nn.compact</code>?</p>
<p>Flax provides two ways to define models:</p>
<ul>
<li><p>Using <code>@nn.compact</code>, which allows direct instantiation of layers within the <code>__call__</code> method.</p>
</li>
<li><p>Using <code>setup()</code>, where layers are explicitly defined as attributes.</p>
</li>
</ul>
<p>The <strong>compact</strong> approach is cleaner and more intuitive for simple models, avoiding the need to define layer attributes separately.</p>
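<p>For comparison, here is a condensed version of the same model written with <code>setup()</code>. This is a sketch of the style, not code from the original notebook:</p>
<pre><code class="lang-python">class CNNWithSetup(nn.Module):
    num_classes: int

    def setup(self):
        # Layers are declared once as attributes...
        self.conv1 = nn.Conv(features=32, kernel_size=(3, 3), padding="SAME")
        self.dense1 = nn.Dense(features=128)
        self.out = nn.Dense(features=self.num_classes)

    def __call__(self, x):
        # ...and referenced by name in the forward pass.
        x = nn.max_pool(nn.relu(self.conv1(x)), window_shape=(2, 2), strides=(2, 2))
        x = x.reshape((x.shape[0], -1))
        x = nn.relu(self.dense1(x))
        return self.out(x)
</code></pre>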
<h2 id="heading-initializing-the-model"><strong>Initializing the Model</strong></h2>
<p>JAX does not keep implicit state, so model parameters must be explicitly initialized. The <code>init</code> function from Flax helps generate the model’s parameters using a random key and an input shape.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> flax.core <span class="hljs-keyword">import</span> freeze, unfreeze

<span class="hljs-comment"># Set up PRNG key</span>
key = jax.random.PRNGKey(<span class="hljs-number">0</span>)

<span class="hljs-comment"># Define input shape (batch_size, height, width, channels)</span>
input_shape = (<span class="hljs-number">1</span>, <span class="hljs-number">128</span>, <span class="hljs-number">128</span>, <span class="hljs-number">3</span>)  <span class="hljs-comment"># Matches the 128×128 RGB images from preprocessing</span>

<span class="hljs-comment"># Initialize model</span>
model = CNN(num_classes=<span class="hljs-number">2</span>)  <span class="hljs-comment"># Cats vs. dogs: two output classes</span>
params = model.init(key, jnp.ones(input_shape))[<span class="hljs-string">"params"</span>]
</code></pre>
<ul>
<li><p>A <strong>random key</strong> is generated using <code>jax.random.PRNGKey(0)</code>. JAX requires explicit control over random number generation for reproducibility.</p>
</li>
<li><p>A <strong>dummy input tensor</strong> of shape <code>(1, 128, 128, 3)</code> is created to initialize the network.</p>
</li>
<li><p>The model is <strong>instantiated</strong> and the <code>init</code> function generates model parameters using the random key.</p>
</li>
<li><p>The <code>"params"</code> field is extracted from the initialization output, as Flax’s <code>init</code> method returns a dictionary containing additional information (e.g., batch statistics if using BatchNorm).</p>
</li>
</ul>
<p><strong>Defining the Training State</strong></p>
<p>Flax provides a <code>train_state</code> abstraction to manage model parameters, optimizer state, and other training-related information. The <code>optax</code> library is used for defining the optimizer.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> optax
<span class="hljs-keyword">from</span> flax.training <span class="hljs-keyword">import</span> train_state

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TrainState</span>(<span class="hljs-params">train_state.TrainState</span>):</span>
    <span class="hljs-keyword">pass</span>  <span class="hljs-comment"># No additional attributes needed for now</span>

<span class="hljs-comment"># Define the optimizer</span>
learning_rate = <span class="hljs-number">0.001</span>
optimizer = optax.adam(learning_rate)

<span class="hljs-comment"># Initialize the training state</span>
state = TrainState.create(apply_fn=model.apply, params=params, tx=optimizer)
</code></pre>
<ul>
<li><p><code>TrainState</code> is a dataclass that stores the model's parameters, optimizer state, and <code>apply_fn</code> (the function used for forward passes).</p>
</li>
<li><p><strong>Optax's Adam optimizer</strong> is set up with a learning rate of <code>0.001</code>.</p>
</li>
<li><p>The <code>TrainState.create()</code> class method initializes the model’s training state with:</p>
<ul>
<li><p><code>apply_fn</code>: The forward pass function from the model.</p>
</li>
<li><p><code>params</code>: The initialized parameters from the previous step.</p>
</li>
<li><p><code>tx</code>: The optimizer (Adam in this case).</p>
</li>
</ul>
</li>
</ul>
<p><strong>Defining Loss and Accuracy Metrics</strong></p>
<p>A loss function is required to guide training, while an accuracy function evaluates model performance.</p>
<p><strong>Loss Function</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> jax.nn <span class="hljs-keyword">as</span> jnn

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">cross_entropy_loss</span>(<span class="hljs-params">params, state, batch</span>):</span>
    logits = state.apply_fn({<span class="hljs-string">'params'</span>: params}, batch[<span class="hljs-string">'images'</span>])
    labels = jnn.one_hot(batch[<span class="hljs-string">'labels'</span>], num_classes=<span class="hljs-number">2</span>)
    <span class="hljs-keyword">return</span> -jnp.sum(labels * jnn.log_softmax(logits)) / batch[<span class="hljs-string">'labels'</span>].shape[<span class="hljs-number">0</span>]
</code></pre>
<ul>
<li><p>The function takes <strong>model parameters</strong>, the <strong>current training state</strong>, and a <strong>batch of input data</strong>.</p>
</li>
<li><p>The <strong>forward pass</strong> is performed using <code>apply_fn</code>, producing logits (raw model predictions).</p>
</li>
<li><p>The <strong>labels are one-hot encoded</strong> to match the logits' shape.</p>
</li>
<li><p>The <strong>cross-entropy loss</strong> is computed using <code>log_softmax(logits)</code>, ensuring numerical stability.</p>
</li>
<li><p>The loss is <strong>averaged over the batch size</strong> for proper optimization.</p>
</li>
</ul>
<p><strong>Accuracy Function</strong></p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">compute_accuracy</span>(<span class="hljs-params">params, state, batch</span>):</span>
    logits = state.apply_fn({<span class="hljs-string">'params'</span>: params}, batch[<span class="hljs-string">'images'</span>])
    predicted_labels = jnp.argmax(logits, axis=<span class="hljs-number">1</span>)
    <span class="hljs-keyword">return</span> jnp.mean(predicted_labels == batch[<span class="hljs-string">'labels'</span>])
</code></pre>
<ul>
<li><p>The function takes <strong>model parameters</strong>, the <strong>training state</strong>, and a <strong>batch of data</strong>.</p>
</li>
<li><p>The <strong>forward pass</strong> is executed to obtain logits.</p>
</li>
<li><p>The <strong>highest-scoring class</strong> is selected using <code>argmax()</code>, determining the model’s predicted label.</p>
</li>
<li><p>Accuracy is computed by <strong>comparing predictions with actual labels</strong> and averaging the correct classifications.</p>
</li>
</ul>
<h2 id="heading-training-and-evaluating-a-cnn-in-jax"><strong>Training and Evaluating a CNN in JAX</strong></h2>
<p>Training in JAX is based on functional transformations, meaning explicit gradient computation and parameter updates are required. The <code>jax.grad</code> function is used to compute gradients efficiently.</p>
<p><strong>Training Step Function</strong></p>
<pre><code class="lang-python"><span class="hljs-meta">@jax.jit</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train_step</span>(<span class="hljs-params">state, batch</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">loss_fn</span>(<span class="hljs-params">params</span>):</span>
        logits = state.apply_fn({<span class="hljs-string">'params'</span>: params}, batch[<span class="hljs-string">'images'</span>])
        labels = jax.nn.one_hot(batch[<span class="hljs-string">'labels'</span>], num_classes=<span class="hljs-number">2</span>)
        loss = -jnp.sum(labels * jax.nn.log_softmax(logits)) / batch[<span class="hljs-string">'labels'</span>].shape[<span class="hljs-number">0</span>]
        <span class="hljs-keyword">return</span> loss

    <span class="hljs-comment"># Compute gradients</span>
    grads = jax.grad(loss_fn)(state.params)

    <span class="hljs-comment"># Update model state</span>
    state = state.apply_gradients(grads=grads)

    <span class="hljs-keyword">return</span> state
</code></pre>
<ul>
<li><p><strong>JIT Compilation (</strong><code>@jax.jit</code>): JAX’s just-in-time compilation speeds up training by optimizing computation.</p>
</li>
<li><p><code>loss_fn</code> Function: Defines the cross-entropy loss to be minimized.</p>
</li>
<li><p><code>jax.grad(loss_fn)</code>: Computes gradients with respect to model parameters.</p>
</li>
<li><p><code>state.apply_gradients(grads=grads)</code>: Updates the training state using computed gradients.</p>
</li>
</ul>
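<p>In practice, you usually want to watch the loss as training progresses. <code>jax.value_and_grad</code> computes the loss and its gradients in a single pass; here is a small variant of <code>train_step</code> sketched along those lines:</p>
<pre><code class="lang-python">@jax.jit
def train_step_with_loss(state, batch):
    def loss_fn(params):
        logits = state.apply_fn({'params': params}, batch['images'])
        labels = jax.nn.one_hot(batch['labels'], num_classes=2)
        return -jnp.sum(labels * jax.nn.log_softmax(logits)) / batch['labels'].shape[0]

    # value_and_grad returns (loss, grads) from one forward/backward pass.
    loss, grads = jax.value_and_grad(loss_fn)(state.params)
    state = state.apply_gradients(grads=grads)
    return state, loss
</code></pre>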
<p><strong>Training Loop</strong></p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train_model</span>(<span class="hljs-params">state, train_loader, num_epochs=<span class="hljs-number">10</span></span>):</span>
    <span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(num_epochs):
        <span class="hljs-keyword">for</span> batch <span class="hljs-keyword">in</span> train_loader:
            state = train_step(state, batch)
        print(<span class="hljs-string">f"Epoch <span class="hljs-subst">{epoch + <span class="hljs-number">1</span>}</span> completed"</span>)
    <span class="hljs-keyword">return</span> state
</code></pre>
<ul>
<li><p><strong>Iterates through multiple epochs</strong>, training the model for <code>num_epochs</code>.</p>
</li>
<li><p><strong>Processes each batch</strong>, updating the model parameters.</p>
</li>
<li><p><strong>Logs progress</strong> at the end of each epoch.</p>
</li>
</ul>
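<p>One practical detail before running this: the <code>tf.data</code> pipeline built earlier yields <code>(images, labels)</code> tuples of TensorFlow tensors, while <code>train_step</code> expects a dict of arrays. A small re-iterable adapter, sketched under that assumption:</p>
<pre><code class="lang-python">class JaxBatches:
    """Wrap a tf.data.Dataset so train_model can loop over it every epoch."""

    def __init__(self, ds):
        self.ds = ds

    def __iter__(self):
        # as_numpy_iterator() yields NumPy arrays, which JAX ingests directly.
        for images, labels in self.ds.as_numpy_iterator():
            yield {'images': images, 'labels': labels}

state = train_model(state, JaxBatches(train_ds), num_epochs=10)
</code></pre>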
<p><strong>Evaluation Step Function</strong></p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">evaluate_model</span>(<span class="hljs-params">state, test_loader</span>):</span>
    accuracies = []

    <span class="hljs-keyword">for</span> batch <span class="hljs-keyword">in</span> test_loader:
        acc = compute_accuracy(state.params, state, batch)
        accuracies.append(acc)

    final_accuracy = jnp.mean(jnp.array(accuracies))
    print(<span class="hljs-string">f"Test Accuracy: <span class="hljs-subst">{final_accuracy * <span class="hljs-number">100</span>:<span class="hljs-number">.2</span>f}</span>%"</span>)
    <span class="hljs-keyword">return</span> final_accuracy
</code></pre>
<ul>
<li><p>Iterates through the <strong>test dataset</strong>, computing accuracy for each batch.</p>
</li>
<li><p><strong>Aggregates accuracy scores</strong> across batches to compute the final accuracy.</p>
</li>
<li><p><strong>Prints the test accuracy</strong>, indicating how well the model generalizes to unseen data.</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>JAX’s functional and hardware-accelerated approach allows for efficient model training, particularly on GPUs and TPUs. The explicit handling of gradients and optimizers ensures flexibility while maintaining high performance.</p>
<p>Future work could explore <strong>advanced techniques</strong> such as learning rate scheduling, model regularization, and hyperparameter tuning to improve performance. Additionally, integrating JAX with frameworks like TensorFlow or PyTorch could provide hybrid workflows for deep learning research and production applications.</p>
]]></content:encoded></item><item><title><![CDATA[Non-technical summits and conferences not impactful in the AI race]]></title><description><![CDATA[For decades, developer communities across the globe have served as vital engines for fostering technological adoption and innovation. These communities are not merely gatherings for networking but essential conduits for pushing developers to embrace ...]]></description><link>https://kambale.dev/non-technical-summits-and-conferences-not-impactful-in-the-ai-race</link><guid isPermaLink="true">https://kambale.dev/non-technical-summits-and-conferences-not-impactful-in-the-ai-race</guid><category><![CDATA[summit]]></category><category><![CDATA[AI]]></category><category><![CDATA[technical]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Tue, 04 Mar 2025 21:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1741950125921/ba637f86-58b3-4046-8fc3-ab17215f1a53.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For decades, developer communities across the globe have served as vital engines for fostering technological adoption and innovation. These communities are not merely gatherings for networking but essential conduits for pushing developers to embrace emerging technologies, providing feedback loops that guide improvements, and ensuring the technology being built is practical and useful. Global tech giants like Google operate the Google Developer Groups (GDGs) worldwide, Facebook previously ran Facebook Developer Circles with a chapter in Kampala, and African companies like Africa’s Talking have also actively supported developer communities.</p>
<p>A key mandate of these communities is to organize regular meetups, workshops, summits, and conferences. These gatherings are typically hands-on, bringing together expert developers to share skills, knowledge, and technological advancements with attendees through interactive workshops, code labs, and hackathons. In Uganda, groups such as GDG Cloud Kampala and GDG Cloud Mbarara have consistently held technical events like DevFests, while the Python community has successfully organized PyCon Uganda, now heading into its third edition.</p>
<p>However, alongside the rise of these developer-led events, corporate entities have increasingly taken an interest in organizing technology-related summits and conferences. This trend, while seemingly positive, presents a growing concern: many of these corporate-backed events are significantly lacking in technical depth, especially in the crucial domain of artificial intelligence (AI).</p>
<p><strong>The Corporate Takeover and the Technical Deficit</strong></p>
<p>These corporate-led summits and conferences often secure substantial funding, sometimes from government entities and corporate sponsors eager to see their logos on billboards and banners. This kind of financial backing is something developer-led community events can only dream of. Having organized conferences like Google I/O Extended and DevFest Mbarara for three years—each attracting close to 200 students and young professionals—I have personally experienced the uphill battle of securing sponsorship. The struggle to find support is a familiar nightmare for many tech community organizers, who must often rely on volunteers and minimal budgets to pull off impactful events.</p>
<p>Yet, despite their financial strength, corporate-led events frequently fall short of delivering meaningful technical content. A crucial weakness is the lack of hands-on sessions where attendees can actively engage in building, deploying, and experimenting with AI models and other emerging technologies. In many cases, not a single AI model is built or deployed throughout these high-profile summits. The reason? The invited speakers are usually corporate executives, heads of IT departments, and business strategists who, while experienced in management, often lack deep familiarity with emerging AI technologies.</p>
<p><strong>The AI Knowledge Gap at Corporate Summits</strong></p>
<p>To be clear, this is not to undermine the experience and expertise of IT heads in various companies and entities. Many of them have led digital transformation projects, implemented enterprise systems, and managed IT infrastructure at scale. However, AI is a different ballgame. The ability to discuss generative AI tools like OpenAI’s ChatGPT, Google’s Gemini, xAI’s Grok, or DeepSeek’s R1 is one thing; understanding how to fine-tune or integrate large language models (LLMs) such as Gemma 2, LLaMA 3, or OpenAI’s GPT series to solve local problems is another.</p>
<p>Few, if any, of these corporate speakers can claim hands-on experience in fine-tuning models, optimizing neural networks, or deploying AI systems into production. This expertise gap means that rather than meaningful, technical AI discussions, these summits often feature surface-level talks filled with buzzwords and high-level strategies that lack practical application. Meanwhile, developers seeking actual skills in AI model development and deployment leave these events empty-handed.</p>
<p>Compounding the problem, many corporate-led AI summits focus solely on panel discussions and keynote addresses. While such formats are useful for high-level industry insights, they do little to equip attendees with the hands-on skills necessary for AI development. Without technical immersion, these summits risk becoming echo chambers where the same concepts are discussed repeatedly without any tangible output.</p>
<p><strong>The Danger of Mimicking Bureaucratic Inefficiency</strong></p>
<p>We frequently criticize governments for their endless boardroom meetings and benchmarking trips that yield little practical implementation. It would be a grave mistake to let this inefficiency seep into the developer community. If AI is the future, then our engagement with it must be hands-on. We cannot afford to merely discuss AI trends at conferences while leaving the actual coding and model development to others. We must actively build and deploy AI models, integrate them into real-world applications, and refine them through continuous experimentation.</p>
<p><strong>Shifting the Focus: How AI Events Can Be More Impactful</strong></p>
<p>To ensure that our AI summits and conferences are truly impactful, organizers should:</p>
<ul>
<li><p><strong>Prioritize Hands-On Workshops</strong> – Every AI conference should include code labs where participants can build and deploy AI models in real time. For example, workshops on fine-tuning LLMs, training custom vision models, or implementing AI in cloud environments should be staple sessions.</p>
</li>
<li><p><strong>Feature Technical Speakers</strong> – Rather than filling panels with corporate executives, AI summits should prioritize experts with hands-on experience in AI research and development. This means bringing in data scientists, machine learning engineers, and AI practitioners who can demonstrate real-world applications.</p>
</li>
<li><p><strong>Incorporate Hackathons and AI Challenges</strong> – Practical AI challenges, such as hackathons, Kaggle-style competitions, or live coding challenges, should be an integral part of any AI summit.</p>
</li>
<li><p><strong>Encourage Open-Source Contributions</strong> – Conferences should foster contributions to AI open-source projects, enabling attendees to leave with more than just theoretical knowledge.</p>
</li>
<li><p><strong>Partner with Developer Communities</strong> – Corporate entities should work with developer communities to deliver the hands-on approach to their summits. These communities have the target audience and the experience in organizing these events.</p>
</li>
<li><p><strong>Develop Local AI Talent</strong> – AI conferences should facilitate mentorship programs that connect experienced AI practitioners with budding developers to ensure continuous learning beyond the event itself.</p>
</li>
</ul>
<p><strong>The Call to Action: Time to Get Technical</strong></p>
<p>If we are serious about competing in the AI race, we must radically rethink the structure of our summits and conferences. The AI field evolves at a blistering pace, and merely talking about it will leave us perpetually playing catch-up. To secure our place in the global AI ecosystem, we must do the actual work: writing the code, training the models, deploying real-world AI solutions, and iterating on them.</p>
<p>Uganda’s developer community has shown immense potential, but this potential must be nurtured with the right kind of engagement. Technical summits, practical workshops, and real-world AI applications should be our focus. Anything less than this risks turning our AI discourse into an empty parade of slogans and missed opportunities.</p>
<p>We must choose: do we want to be passive spectators in the AI revolution, or do we want to be active builders shaping the future? The answer lies in how we structure our conferences today.</p>
<p><em>…</em></p>
<p><em>Originally published in</em> <strong><em>The Independent - Uganda</em></strong> <a target="_blank" href="https://www.independent.co.ug/non-technical-summits-and-conferences-not-impactful-in-the-ai-race/"><em>here</em></a><em>.</em></p>
]]></content:encoded></item><item><title><![CDATA[Scalable Model Serving with TensorFlow Serving]]></title><description><![CDATA[Introduction
In this article, we explore how to deploy machine learning models in a scalable and efficient manner using TensorFlow Serving. TensorFlow Serving is a flexible, high-performance serving system designed for production environments, enabli...]]></description><link>https://kambale.dev/scalable-model-serving-with-tensorflow-serving</link><guid isPermaLink="true">https://kambale.dev/scalable-model-serving-with-tensorflow-serving</guid><category><![CDATA[scalable models]]></category><category><![CDATA[TensorFlow]]></category><category><![CDATA[tensorflow-serving]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Mon, 10 Feb 2025 22:09:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1723164320089/ca0bea24-fd42-464d-9547-4cc9db9fedcd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>In this article, we explore how to deploy machine learning models in a scalable and efficient manner using TensorFlow Serving. TensorFlow Serving is a flexible, high-performance serving system designed for production environments, enabling you to serve your machine learning models to a large number of clients efficiently. We will cover the basics of TensorFlow Serving, how to set it up, how to serve models, and best practices for scaling your deployment.</p>
<p><strong>What is TensorFlow Serving?</strong></p>
<p>TensorFlow Serving is an open-source serving system specifically designed for deploying machine learning models in production environments. It allows you to serve multiple models or multiple versions of the same model simultaneously, and it can be easily integrated with TensorFlow models.</p>
<p><strong>Why Use TensorFlow Serving?</strong></p>
<p><strong>Scalability</strong>: TensorFlow Serving is designed to handle high-throughput predictions, making it suitable for large-scale deployments.</p>
<p><strong>Flexibility</strong>: It supports multiple models and versions, allowing for easy model management.</p>
<p><strong>Efficiency</strong>: TensorFlow Serving is optimized for performance, with low latency and high throughput.</p>
<h2 id="heading-setting-up-tensorflow-serving">Setting Up TensorFlow Serving</h2>
<p><strong>Installation</strong></p>
<p>TensorFlow Serving can be installed on various platforms, including Linux, macOS, and Windows. However, the most common way to get TensorFlow Serving up and running is through Docker, which simplifies the process and ensures a consistent environment.</p>
<p><strong>Installing TensorFlow Serving on Linux</strong></p>
<p>If you prefer to install TensorFlow Serving directly on your system, you can follow these steps:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">echo</span> <span class="hljs-string">"deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal"</span> | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list &gt; /dev/null
curl -fsSL https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
sudo apt-get update &amp;&amp; sudo apt-get install tensorflow-model-server
</code></pre>
<p><strong>Verify Installation</strong></p>
<pre><code class="lang-bash">tensorflow_model_server --version
</code></pre>
<p><strong>Docker Setup</strong></p>
<p>Using Docker is the recommended way to set up TensorFlow Serving, as it provides a consistent environment across different platforms.</p>
<ol>
<li><p><strong>Pull the TensorFlow Serving Docker Image</strong></p>
<pre><code class="lang-bash"> docker pull tensorflow/serving
</code></pre>
</li>
<li><p><strong>Run the Docker Container</strong></p>
<pre><code class="lang-bash"> docker run -p 8501:8501 --name=tf_serving \
 --mount <span class="hljs-built_in">type</span>=<span class="hljs-built_in">bind</span>,<span class="hljs-built_in">source</span>=$(<span class="hljs-built_in">pwd</span>)/model,target=/models/model_name \
 -e MODEL_NAME=model_name -t tensorflow/serving
</code></pre>
</li>
</ol>
<p>This command starts a TensorFlow Serving container, serving the model located in the <code>model</code> directory.</p>
<h2 id="heading-serving-a-tensorflow-model">Serving a TensorFlow Model</h2>
<h3 id="heading-exporting-a-tensorflow-model">Exporting a TensorFlow Model</h3>
<p>Before serving a model, you need to export it in a format that TensorFlow Serving can understand. Typically, TensorFlow models are saved in the SavedModel format.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf

<span class="hljs-comment"># Define a simple model</span>
model = tf.keras.Sequential([
    tf.keras.layers.Dense(<span class="hljs-number">10</span>, activation=<span class="hljs-string">'relu'</span>, input_shape=(<span class="hljs-number">6</span>,)),  <span class="hljs-comment"># six input features, matching the example request below</span>
    tf.keras.layers.Dense(<span class="hljs-number">1</span>)
])

<span class="hljs-comment"># Save the model</span>
model.save(<span class="hljs-string">'/path/to/exported_model'</span>)
</code></pre>
<h3 id="heading-loading-the-model-into-tensorflow-serving">Loading the Model into TensorFlow Serving</h3>
<p>With the model saved in the correct format, you can load it into TensorFlow Serving by pointing the server to the directory containing the exported model.</p>
<pre><code class="lang-bash">docker run -p 8501:8501 --name=tf_serving \
--mount <span class="hljs-built_in">type</span>=<span class="hljs-built_in">bind</span>,<span class="hljs-built_in">source</span>=/path/to/exported_model,target=/models/my_model \
-e MODEL_NAME=my_model -t tensorflow/serving
</code></pre>
<h3 id="heading-making-predictions-via-rest-api">Making Predictions via REST API</h3>
<p>TensorFlow Serving provides a REST API to interact with your models. You can make predictions by sending HTTP POST requests.</p>
<pre><code class="lang-bash">curl -d <span class="hljs-string">'{"instances": [[1.0, 2.0, 5.0, 1.0, 2.0, 5.0]]}'</span> \
  -X POST http://localhost:8501/v1/models/my_model:predict
</code></pre>
<p>This request sends a JSON payload containing the input data and receives the model's predictions as a response.</p>
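<p>The same request from Python, using the <code>requests</code> library (assuming the six-feature model above is served as <code>my_model</code>):</p>
<pre><code class="lang-python">import json
import requests

# Same payload as the curl example.
payload = {"instances": [[1.0, 2.0, 5.0, 1.0, 2.0, 5.0]]}

resp = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    data=json.dumps(payload),
)
resp.raise_for_status()
print(resp.json()["predictions"])
</code></pre>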
<h2 id="heading-scaling-tensorflow-serving">Scaling TensorFlow Serving</h2>
<h3 id="heading-horizontal-and-vertical-scaling">Horizontal and Vertical Scaling</h3>
<ul>
<li><p><strong>Horizontal Scaling</strong>: Involves adding more instances of TensorFlow Serving, distributing the load across multiple servers. This can be achieved using container orchestration platforms like Kubernetes.</p>
</li>
<li><p><strong>Vertical Scaling</strong>: Involves increasing the resources (CPU, memory) of a single TensorFlow Serving instance. This can be done by allocating more resources to the Docker container.</p>
</li>
</ul>
<h3 id="heading-load-balancing">Load Balancing</h3>
<p>Load balancing is crucial for handling large volumes of requests efficiently. You can use a load balancer to distribute incoming requests across multiple TensorFlow Serving instances.</p>
<h3 id="heading-monitoring-and-logging">Monitoring and Logging</h3>
<p>Monitoring and logging are essential for understanding the performance of your TensorFlow Serving deployment. TensorFlow Serving integrates well with monitoring tools like Prometheus and Grafana.</p>
<p>Example Prometheus configuration:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">global:</span>
  <span class="hljs-attr">scrape_interval:</span> <span class="hljs-string">15s</span>

<span class="hljs-attr">scrape_configs:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">job_name:</span> <span class="hljs-string">'tensorflow_serving'</span>
    <span class="hljs-attr">static_configs:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">targets:</span> [<span class="hljs-string">'localhost:8501'</span>]
</code></pre>
<p>This configuration will scrape metrics from your TensorFlow Serving instance every 15 seconds.</p>
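<p>Note that TensorFlow Serving only exposes these metrics when started with a monitoring config file (passed via <code>--monitoring_config_file</code>). A minimal example:</p>
<pre><code>prometheus_config {
  enable: true
  path: "/monitoring/prometheus/metrics"
}
</code></pre>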
<h2 id="heading-advanced-features-of-tensorflow-serving">Advanced Features of TensorFlow Serving</h2>
<h3 id="heading-model-versioning">Model Versioning</h3>
<p>TensorFlow Serving supports serving multiple versions of the same model. Which versions are served is controlled by the <code>model_version_policy</code> field of a model config file, passed to the server with the <code>--model_config_file</code> flag.</p>
<pre><code class="lang-bash">docker run -p 8501:8501 --name=tf_serving \
--mount <span class="hljs-built_in">type</span>=<span class="hljs-built_in">bind</span>,<span class="hljs-built_in">source</span>=/path/to/exported_model,target=/models/my_model \
-e MODEL_NAME=my_model -e MODEL_VERSION_POLICY=<span class="hljs-string">"latest"</span> -t tensorflow/serving
</code></pre>
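<p>A minimal <code>models.config</code> to go with that command; the policy value here is illustrative (<code>all</code> serves every version found under the base path):</p>
<pre><code>model_config_list {
  config {
    name: 'my_model'
    base_path: '/models/my_model'
    model_platform: 'tensorflow'
    model_version_policy { all {} }
  }
}
</code></pre>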
<h3 id="heading-batch-prediction">Batch Prediction</h3>
<p>TensorFlow Serving can be configured to perform batch predictions, which can significantly improve performance for high-throughput scenarios.</p>
<pre><code class="lang-bash">docker run -p 8501:8501 --name=tf_serving \
--mount <span class="hljs-built_in">type</span>=<span class="hljs-built_in">bind</span>,<span class="hljs-built_in">source</span>=/path/to/exported_model,target=/models/my_model \
-e MODEL_NAME=my_model -e TF_SERVING_BATCHING_PARAMETERS_FILE=<span class="hljs-string">"/path/to/batching_parameters"</span> \
-t tensorflow/serving
</code></pre>
<h3 id="heading-customizing-tensorflow-serving">Customizing TensorFlow Serving</h3>
<p>You can customize TensorFlow Serving by adding custom code for preprocessing, postprocessing, or integrating with other systems.</p>
<p>Example of a custom model handler:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow_serving.apis <span class="hljs-keyword">import</span> predict_pb2, prediction_service_pb2_grpc

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CustomModelHandler</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, model_path</span>):</span>
        self.model = tf.keras.models.load_model(model_path)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">predict</span>(<span class="hljs-params">self, request: predict_pb2.PredictRequest</span>) -&gt; predict_pb2.PredictResponse:</span>
        <span class="hljs-comment"># Custom preprocessing</span>
        inputs = request.inputs[<span class="hljs-string">'input_tensor'</span>].numpy()

        <span class="hljs-comment"># Model prediction</span>
        predictions = self.model.predict(inputs)

        <span class="hljs-comment"># Custom postprocessing</span>
        response = predict_pb2.PredictResponse()
        response.outputs[<span class="hljs-string">'output_tensor'</span>].CopyFrom(tf.make_tensor_proto(predictions))
        <span class="hljs-keyword">return</span> response
</code></pre>
<h2 id="heading-best-practices">Best Practices</h2>
<h3 id="heading-security-considerations">Security Considerations</h3>
<ul>
<li><p><strong>Authentication</strong>: Implement authentication mechanisms to ensure that only authorized clients can access your models.</p>
</li>
<li><p><strong>Encryption</strong>: Use TLS to encrypt data in transit between clients and TensorFlow Serving.</p>
</li>
</ul>
<h3 id="heading-resource-management">Resource Management</h3>
<ul>
<li><p><strong>CPU and Memory Limits</strong>: Set appropriate limits on CPU and memory usage to prevent resource exhaustion.</p>
</li>
<li><p><strong>Autoscaling</strong>: Use autoscaling to dynamically adjust the number of TensorFlow Serving instances based on demand.</p>
</li>
</ul>
<h3 id="heading-optimizing-performance">Optimizing Performance</h3>
<ul>
<li><p><strong>Model Optimization</strong>: Optimize your model using techniques like quantization to reduce latency.</p>
</li>
<li><p><strong>Caching</strong>: Implement caching mechanisms to store frequently requested predictions, reducing the load on your TensorFlow Serving instance.</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>TensorFlow Serving provides a powerful and flexible solution for serving machine learning models in production environments. By following the steps outlined in this tutorial, you can set up a scalable TensorFlow Serving deployment that is capable of handling large volumes of requests efficiently. With advanced features like model versioning, batch prediction, and custom handlers, TensorFlow Serving can be tailored to meet the specific needs of your application.</p>
]]></content:encoded></item><item><title><![CDATA[Introduction to TensorFlow Extended (TFX)]]></title><description><![CDATA[TensorFlow Extended (TFX) is an end-to-end platform for deploying production machine learning (ML) pipelines. TFX allows data scientists and ML engineers to build, evaluate, and deploy ML models in a scalable, reliable, and reproducible manner. This ...]]></description><link>https://kambale.dev/tensorflow-extended-tfx</link><guid isPermaLink="true">https://kambale.dev/tensorflow-extended-tfx</guid><category><![CDATA[TensorFlow]]></category><category><![CDATA[TFX]]></category><category><![CDATA[Pipeline]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Tue, 06 Aug 2024 00:11:38 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1722774005457/afdce8da-141e-4909-8eca-3ab92381f512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>TensorFlow Extended (TFX) is an end-to-end platform for deploying production machine learning (ML) pipelines. TFX allows data scientists and ML engineers to build, evaluate, and deploy ML models in a scalable, reliable, and reproducible manner. This article will introduce you to the core components of TFX, provide practical examples using the Iris dataset, and guide you through building a simple TFX pipeline.</p>
<h2 id="heading-what-is-tfx">What is TFX?</h2>
<p>TFX is a production-ready ML platform designed to help you build, deploy, and manage ML models. It consists of a set of libraries and tools that help automate and manage the ML lifecycle. TFX pipelines are portable and can run on various platforms, including Apache Beam, Apache Airflow, and Kubeflow.</p>
<h3 id="heading-key-features-of-tfx">Key Features of TFX</h3>
<ul>
<li><p><strong>Scalability</strong>: TFX can handle large-scale data processing and training.</p>
</li>
<li><p><strong>Portability</strong>: Pipelines can run on different platforms and environments.</p>
</li>
<li><p><strong>Modularity</strong>: TFX components are designed to be modular, allowing you to customize and extend them as needed.</p>
</li>
<li><p><strong>Production-Ready</strong>: TFX is built with production deployment in mind, ensuring reliability and robustness.</p>
</li>
</ul>
<h2 id="heading-tfx-components">TFX Components</h2>
<p>TFX pipelines are composed of several components, each responsible for a specific part of the ML lifecycle. Here are the core components:</p>
<p><strong>ExampleGen</strong></p>
<p>ExampleGen is the first component in a TFX pipeline. It ingests and splits data into training and evaluation datasets. ExampleGen supports various data sources, such as CSV files, TFRecord files, and BigQuery.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> CsvExampleGen

example_gen = CsvExampleGen(input_base=<span class="hljs-string">'/content/data'</span>)  <span class="hljs-comment"># input_base is a directory of CSV files, not a single file</span>
</code></pre>
<p><strong>StatisticsGen</strong></p>
<p>StatisticsGen computes statistics over the data for data visualization and validation. It generates statistics using TensorFlow Data Validation (TFDV).</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> StatisticsGen

statistics_gen = StatisticsGen(examples=example_gen.outputs[<span class="hljs-string">'examples'</span>])
</code></pre>
<p><strong>SchemaGen</strong></p>
<p>SchemaGen generates a schema for the data based on the statistics computed by StatisticsGen. The schema includes information about the data types, domains, and constraints.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> SchemaGen

schema_gen = SchemaGen(statistics=statistics_gen.outputs[<span class="hljs-string">'statistics'</span>])
</code></pre>
<p><strong>ExampleValidator</strong></p>
<p>ExampleValidator detects anomalies in the data by comparing the data against the schema generated by SchemaGen. It helps ensure data quality.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> ExampleValidator

example_validator = ExampleValidator(statistics=statistics_gen.outputs[<span class="hljs-string">'statistics'</span>], schema=schema_gen.outputs[<span class="hljs-string">'schema'</span>])
</code></pre>
<p><strong>Transform</strong></p>
<p>Transform performs feature engineering and data transformation using TensorFlow Transform (TFT). It preprocesses the data for model training and serving.</p>
<pre><code class="lang-python">import tensorflow_transform as tft
from tfx.components import Transform

# preprocessing_fn lives in the module file passed to the component below
def preprocessing_fn(inputs):
    outputs = {
        'sepal_length': tft.scale_to_z_score(inputs['sepal.length']),
        'sepal_width': tft.scale_to_z_score(inputs['sepal.width']),
        'petal_length': tft.scale_to_z_score(inputs['petal.length']),
        'petal_width': tft.scale_to_z_score(inputs['petal.width']),
        'species': inputs['variety']
    }
    return outputs

transform = Transform(
    examples=example_gen.outputs[<span class="hljs-string">'examples'</span>],
    schema=schema_gen.outputs[<span class="hljs-string">'schema'</span>],
    module_file=<span class="hljs-string">'/content/preprocessing.py'</span>
)
</code></pre>
<p><strong>Trainer</strong></p>
<p>Trainer trains an ML model using the preprocessed data. It supports TensorFlow's training APIs, including Keras and Estimator.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> Trainer
<span class="hljs-keyword">from</span> tfx.proto <span class="hljs-keyword">import</span> trainer_pb2

trainer = Trainer(
    module_file=<span class="hljs-string">'/content/trainer_module.py'</span>,
    examples=transform.outputs[<span class="hljs-string">'transformed_examples'</span>],
    schema=schema_gen.outputs[<span class="hljs-string">'schema'</span>],
    transform_graph=transform.outputs[<span class="hljs-string">'transform_graph'</span>],
    train_args=trainer_pb2.TrainArgs(num_steps=<span class="hljs-number">1000</span>),
    eval_args=trainer_pb2.EvalArgs(num_steps=<span class="hljs-number">100</span>)
)
</code></pre>
<p><strong>Evaluator</strong></p>
<p>Evaluator evaluates the trained model using TensorFlow Model Analysis (TFMA). It helps in validating and comparing different models.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow_model_analysis <span class="hljs-keyword">as</span> tfma
<span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> Evaluator

eval_config = tfma.EvalConfig(
    slicing_specs=[tfma.SlicingSpec()],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name=<span class="hljs-string">'SparseCategoricalAccuracy'</span>)
            ]
        )
    ]
)

evaluator = Evaluator(
    examples=example_gen.outputs[<span class="hljs-string">'examples'</span>],
    model=trainer.outputs[<span class="hljs-string">'model'</span>],
    eval_config=eval_config
)
</code></pre>
<p><strong>Pusher</strong></p>
<p>Pusher deploys the trained model to a serving infrastructure. It ensures that the model meets certain criteria before pushing it to production.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> Pusher
<span class="hljs-keyword">from</span> tfx.proto <span class="hljs-keyword">import</span> pusher_pb2

pusher = Pusher(
    model=trainer.outputs[<span class="hljs-string">'model'</span>],
    model_blessing=evaluator.outputs[<span class="hljs-string">'blessing'</span>],
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory=<span class="hljs-string">'/content/model'</span>
        )
    )
)
</code></pre>
<h2 id="heading-setting-up-tfx">Setting Up TFX</h2>
<p>Before building a TFX pipeline, it's essential to set up the environment. This involves installing the necessary packages and configuring the runtime environment.</p>
<p><strong>Installing TFX</strong>: TFX can be installed via pip. The installation includes all the required libraries and dependencies for running TFX components.</p>
<pre><code class="lang-bash">pip install tfx
</code></pre>
<p><strong>Configuring the Environment</strong>: Setting up the environment involves configuring paths for data, pipelines, and model artifacts. This ensures that all components can access the necessary resources and save outputs in the correct locations.</p>
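<p>A minimal sketch of such a setup is shown below; the directory names are illustrative (matching the Colab-style paths used in this article), not prescribed by TFX:</p>
<pre><code class="lang-python">import os

# Hypothetical layout; adjust to your environment (local disk, Colab, GCS, ...)
PIPELINE_NAME = 'iris_pipeline'
DATA_ROOT = '/content/data'                                 # input CSV files
PIPELINE_ROOT = os.path.join('/content', PIPELINE_NAME)     # component outputs
METADATA_PATH = os.path.join(PIPELINE_ROOT, 'metadata.db')  # ML Metadata store
SERVING_MODEL_DIR = '/content/model'                        # Pusher destination
</code></pre>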
<h2 id="heading-building-a-simple-tfx-pipeline">Building a Simple TFX Pipeline</h2>
<p>To illustrate the capabilities of TFX, we will build a pipeline using the Iris dataset, a well-known dataset for classification tasks. The Iris dataset contains 150 samples of iris flowers, each with four features (sepal length, sepal width, petal length, petal width) and a class label (species).</p>
<p><strong>Data Ingestion</strong></p>
<p>The first step in the TFX pipeline is to ingest the Iris dataset using <code>ExampleGen</code>. This component reads the dataset, splits it into training and evaluation sets, and converts it into the TFX internal format.</p>
<p><strong>CSV ExampleGen</strong>: For the Iris dataset, we use the <code>CsvExampleGen</code> component, which ingests data from CSV files. It automatically splits the data into training and evaluation sets based on a specified ratio.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> CsvExampleGen

example_gen = CsvExampleGen(input_base=<span class="hljs-string">'/content/data'</span>)  <span class="hljs-comment"># directory containing iris.csv</span>
</code></pre>
<p><strong>Data Statistics</strong></p>
<p><code>StatisticsGen</code> computes descriptive statistics for the dataset, providing insights into data distributions and detecting anomalies. It uses TensorFlow Data Validation (TFDV) to generate statistics such as mean, median, and standard deviation for each feature.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> StatisticsGen

statistics_gen = StatisticsGen(examples=example_gen.outputs[<span class="hljs-string">'examples'</span>])
</code></pre>
<p><strong>Importance</strong>: Understanding the distribution of data is crucial for identifying potential issues and ensuring that the data is suitable for training. StatisticsGen helps detect anomalies such as outliers and missing values, which can affect model performance.</p>
<p><strong>Schema Generation</strong></p>
<p>Based on the statistics computed by StatisticsGen, SchemaGen generates a schema for the dataset. The schema includes information about feature types, value ranges, and presence constraints, serving as a blueprint for data validation and transformation.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> SchemaGen

schema_gen = SchemaGen(statistics=statistics_gen.outputs[<span class="hljs-string">'statistics'</span>])
</code></pre>
<p><strong>Definition</strong>: The schema defines the expected structure of the data, including feature types (numeric, categorical, etc.), value ranges, and constraints (e.g., required features). This information is critical for ensuring data consistency and preparing it for model training.</p>
<p><strong>Data Validation</strong></p>
<p><code>ExampleValidator</code> validates the dataset against the schema, identifying anomalies and missing values. It ensures that the data adheres to the expected format, which is essential for training reliable models.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> ExampleValidator

example_validator = ExampleValidator(statistics=statistics_gen.outputs[<span class="hljs-string">'statistics'</span>], schema=schema_gen.outputs[<span class="hljs-string">'schema'</span>])
</code></pre>
<p><strong>Anomaly Detection</strong>: ExampleValidator detects anomalies such as outliers, missing values, and unexpected feature values. These issues can affect model performance and lead to unreliable predictions, making data validation a crucial step in the pipeline.</p>
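<p>When working in a notebook, the detected anomalies can be inspected interactively. A brief sketch, assuming TFX's <code>InteractiveContext</code> is used to run the components defined above:</p>
<pre><code class="lang-python">from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

context = InteractiveContext()
context.run(example_gen)
context.run(statistics_gen)
context.run(schema_gen)
context.run(example_validator)

# Renders the anomalies table (or "No anomalies found") in the notebook
context.show(example_validator.outputs['anomalies'])
</code></pre>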
<p><strong>Data Transformation</strong></p>
<p><code>Transform</code> performs feature engineering and data preprocessing using TensorFlow Transform (TFT). It applies transformations such as scaling, normalization, and encoding, preparing the data for model training.</p>
<pre><code class="lang-python">import tensorflow_transform as tft
from tfx.components import Transform

def preprocessing_fn(inputs):
    # Normalize the numeric features to zero mean and unit variance
    outputs = {
        'sepal_length': tft.scale_to_z_score(inputs['sepal.length']),
        'sepal_width': tft.scale_to_z_score(inputs['sepal.width']),
        'petal_length': tft.scale_to_z_score(inputs['petal.length']),
        'petal_width': tft.scale_to_z_score(inputs['petal.width']),
        'species': inputs['variety']
    }
    return outputs

transform = Transform(
    examples=example_gen.outputs[<span class="hljs-string">'examples'</span>],
    schema=schema_gen.outputs[<span class="hljs-string">'schema'</span>],
    module_file=<span class="hljs-string">'/content/preprocessing.py'</span>
)
</code></pre>
<p><strong>Preprocessing Functions</strong>: Transform defines preprocessing functions that apply transformations to the raw data. These functions can include operations such as scaling numerical features, encoding categorical features, and generating new features based on existing ones.</p>
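<p>For example, categorical features can be mapped to integer ids inside <code>preprocessing_fn</code>. A small sketch using TensorFlow Transform's vocabulary utility (here applied to the Iris <code>variety</code> column, which holds the species as a string):</p>
<pre><code class="lang-python">import tensorflow_transform as tft

def preprocessing_fn(inputs):
    return {
        # Learn a vocabulary over all species strings and replace each
        # string with its integer id
        'species': tft.compute_and_apply_vocabulary(inputs['variety'])
    }
</code></pre>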
<p><strong>Model Training</strong></p>
<p>The <code>Trainer</code> component trains the ML model using the transformed data. It leverages TensorFlow's capabilities to define, train, and evaluate models.</p>
<pre><code class="lang-python">import tensorflow as tf
from tfx import v1 as tfx
from tfx_bsl.public import tfxio
from tensorflow_metadata.proto.v0 import schema_pb2

def _input_fn(file_pattern, data_accessor, schema, batch_size=200):
    # The data accessor yields batched (features, label) pairs parsed
    # against the schema; 'species' is the label emitted by the transform
    return data_accessor.tf_dataset_factory(
        file_pattern,
        tfxio.TensorFlowDatasetOptions(batch_size=batch_size, label_key='species'),
        schema)

def _build_keras_model():
    model = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(4,)),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(3, activation='softmax')
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

def run_fn(fn_args):
    # Parse the schema artifact produced by SchemaGen
    schema = tfx.utils.parse_pbtxt_file(fn_args.schema_path, schema_pb2.Schema())
    train_dataset = _input_fn(fn_args.train_files, fn_args.data_accessor, schema, batch_size=200)
    eval_dataset = _input_fn(fn_args.eval_files, fn_args.data_accessor, schema, batch_size=200)

    model = _build_keras_model()
    model.fit(train_dataset, steps_per_epoch=fn_args.train_steps,
              validation_data=eval_dataset, validation_steps=fn_args.eval_steps)
    model.save(fn_args.serving_model_dir, save_format='tf')
</code></pre>
<p><strong>Model Definition</strong>: Trainer uses a module file containing the model definition and training logic. This file defines the architecture of the model, the loss function, and the optimization algorithm. It also includes the training and evaluation steps, specifying the number of epochs, batch size, and evaluation metrics.</p>
<p><strong>Configuring the Trainer</strong></p>
<p>With the trainer module saved to a file, configure the <code>Trainer</code> component to run it on the transformed examples:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> Trainer
<span class="hljs-keyword">from</span> tfx.proto <span class="hljs-keyword">import</span> trainer_pb2

trainer = Trainer(
    module_file=<span class="hljs-string">'/content/trainer_module.py'</span>,
    examples=transform.outputs[<span class="hljs-string">'transformed_examples'</span>],
    schema=schema_gen.outputs[<span class="hljs-string">'schema'</span>],
    transform_graph=transform.outputs[<span class="hljs-string">'transform_graph'</span>],
    train_args=trainer_pb2.TrainArgs(num_steps=<span class="hljs-number">1000</span>),
    eval_args=trainer_pb2.EvalArgs(num_steps=<span class="hljs-number">100</span>)
)
</code></pre>
<p><strong>Model Evaluation</strong></p>
<p>Evaluator evaluates the trained model using TensorFlow Model Analysis (TFMA). It performs a detailed analysis of model performance, identifying potential issues and ensuring that the model meets the desired criteria before deployment.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow_model_analysis <span class="hljs-keyword">as</span> tfma
<span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> Evaluator

eval_config = tfma.EvalConfig(
    slicing_specs=[tfma.SlicingSpec()],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name=<span class="hljs-string">'SparseCategoricalAccuracy'</span>)
            ]
        )
    ]
)

evaluator = Evaluator(
    examples=example_gen.outputs[<span class="hljs-string">'examples'</span>],
    model=trainer.outputs[<span class="hljs-string">'model'</span>],
    eval_config=eval_config
)
</code></pre>
<p><strong>Evaluation Configuration</strong>: Evaluator uses an evaluation configuration to specify the metrics and slicing specifications for model evaluation. Metrics such as accuracy, precision, and recall are used to assess model performance, while slicing specifications allow for analyzing performance across different subsets of the data.</p>
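<p>For instance, adding a feature-keyed slice to the configuration above computes the same metrics per species in addition to the overall values:</p>
<pre><code class="lang-python">eval_config = tfma.EvalConfig(
    slicing_specs=[
        tfma.SlicingSpec(),                           # overall metrics
        tfma.SlicingSpec(feature_keys=['species'])    # metrics per species
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name='SparseCategoricalAccuracy')
            ]
        )
    ]
)
</code></pre>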
<p><strong>Model Deployment</strong></p>
<p>The final step in the TFX pipeline is to deploy the validated model using <code>Pusher</code>. This component ensures that only the best models are deployed to the serving infrastructure.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.components <span class="hljs-keyword">import</span> Pusher
<span class="hljs-keyword">from</span> tfx.proto <span class="hljs-keyword">import</span> pusher_pb2

pusher = Pusher(
    model=trainer.outputs[<span class="hljs-string">'model'</span>],
    model_blessing=evaluator.outputs[<span class="hljs-string">'blessing'</span>],
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory=<span class="hljs-string">'/content/model'</span>
        )
    )
)
</code></pre>
<p><strong>Model Deployment</strong>: Pusher deploys the model to a specified serving infrastructure, such as TensorFlow Serving. It ensures that the model is production-ready and meets the desired performance criteria, facilitating continuous model improvement.</p>
<h2 id="heading-running-the-pipeline">Running the Pipeline</h2>
<p>The pipeline is executed using an orchestrator like the Local DAG Runner, which processes the data, trains the model, evaluates its performance, and deploys it if it meets the required criteria.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tfx.orchestration <span class="hljs-keyword">import</span> pipeline
<span class="hljs-keyword">from</span> tfx.orchestration.local.local_dag_runner <span class="hljs-keyword">import</span> LocalDagRunner

<span class="hljs-comment"># Define the pipeline</span>
tfx_pipeline = pipeline.Pipeline(  <span class="hljs-comment"># avoid shadowing the pipeline module</span>
    pipeline_name=<span class="hljs-string">'iris_pipeline'</span>,
    pipeline_root=<span class="hljs-string">'/content/iris_pipeline'</span>,
    components=[example_gen, statistics_gen, schema_gen, example_validator, transform, trainer, evaluator, pusher],
    enable_cache=<span class="hljs-literal">True</span>,
    metadata_connection_config=<span class="hljs-literal">None</span>
)

<span class="hljs-comment"># Run the pipeline</span>
LocalDagRunner().run(tfx_pipeline)
</code></pre>
<p>After execution, the pipeline produces outputs such as transformed data, trained models, evaluation results, and metadata, all stored in a specified directory for further analysis and deployment.</p>
<p>Monitoring involves checking logs and reviewing evaluation metrics, while debugging includes inspecting artifacts and re-executing specific components to resolve any issues. This process ensures the pipeline runs smoothly, producing reliable and scalable machine learning models ready for deployment.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this article, we introduced TensorFlow Extended (TFX) and its core components. We demonstrated how to set up a TFX environment, build a simple TFX pipeline using the Iris dataset, and run it. TFX provides a powerful and flexible framework for managing the end-to-end ML lifecycle, from data ingestion to model deployment. You should now have a solid foundation for building and deploying your own TFX pipelines.</p>
<p>TFX's modularity and scalability make it suitable for a wide range of ML applications, ensuring that you can build robust and production-ready ML systems. Happy experimenting with TFX!</p>
<h1 id="heading-resources">Resources</h1>
<p>The notebook for this article is available as a GitHub Gist:</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="ce26c4ab2a7f040372007d422ae0fbe2"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/wkambale/ce26c4ab2a7f040372007d422ae0fbe2" class="embed-card">https://gist.github.com/wkambale/ce26c4ab2a7f040372007d422ae0fbe2</a></div>]]></content:encoded></item><item><title><![CDATA[Distributed Model Training with TensorFlow]]></title><description><![CDATA[Training machine learning models on large datasets can be time-consuming and computationally intensive. To address this, TensorFlow provides robust support for distributed training, allowing models to be trained across multiple devices and machines. ...]]></description><link>https://kambale.dev/distributed-model-training</link><guid isPermaLink="true">https://kambale.dev/distributed-model-training</guid><category><![CDATA[Model]]></category><category><![CDATA[TensorFlow]]></category><category><![CDATA[distributed training]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Tue, 30 Jul 2024 21:50:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1722371260527/f4429f7e-1563-4c4e-89d0-0c55ff8709ba.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Training machine learning models on large datasets can be time-consuming and computationally intensive. To address this, TensorFlow provides robust support for distributed training, allowing models to be trained across multiple devices and machines. This article will guide you through the process of setting up and running distributed model training with TensorFlow.</p>
<h2 id="heading-what-is-distributed-training">What is Distributed Training</h2>
<p>Distributed training allows you to leverage multiple GPUs, TPUs, or even multiple machines to accelerate the training process of your machine learning models. TensorFlow's distributed training capabilities are built around the concept of a "distribution strategy," which specifies how computation is distributed across devices.</p>
<h2 id="heading-types-of-distributed-strategies">Types of Distributed Strategies</h2>
<p>TensorFlow provides several strategies for distributed training, each suited for different scenarios and hardware configurations. Let's get into each strategy, including their use cases and advantages to help you get started.</p>
<p><strong>MirroredStrategy</strong></p>
<p><code>tf.distribute.MirroredStrategy</code> is designed for synchronous training on multiple GPUs on a single machine. It replicates all of the model variables across the GPUs and then performs a synchronous update to keep them in sync.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Use Case</strong></td><td><strong>Advantages</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Best suited for training on a single machine with multiple GPUs.</td><td>Easy to set up and use.</td></tr>
<tr>
<td>Ideal for high-performance workstations or cloud instances with multiple GPUs.</td><td>Provides synchronous training, which is generally easier to debug and produces consistent results.</td></tr>
</tbody>
</table>
</div><p><strong>MultiWorkerMirroredStrategy</strong></p>
<p><code>tf.distribute.MultiWorkerMirroredStrategy</code> extends <code>MirroredStrategy</code> to multiple machines. Each worker (machine) runs a replica of the model and synchronizes updates across all workers.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Use Case</strong></td><td><strong>Advantages</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Suitable for large-scale training on multiple machines.</td><td>Scales seamlessly from a few to many workers.</td></tr>
<tr>
<td>Ideal for scenarios where a single machine's resources are insufficient.</td><td>Utilizes the collective communication strategy to aggregate gradients and synchronize updates.</td></tr>
</tbody>
</table>
</div><p><strong>TPUStrategy</strong></p>
<p><code>tf.distribute.TPUStrategy</code> is used to train models on Google's TPUs. It is optimized for high-performance training and requires minimal code changes from GPU training.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Use Case</strong></td><td><strong>Advantages</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Best for large-scale models and datasets that require high computational power.</td><td>TPUs provide significant speedup compared to GPUs for specific workloads.</td></tr>
<tr>
<td>Ideal for cloud environments where TPU resources are available.</td><td>TensorFlow seamlessly integrates with TPUs, making it easier to switch from GPU to TPU.</td></tr>
</tbody>
</table>
</div><p><strong>ParameterServerStrategy</strong></p>
<p><code>tf.distribute.experimental.ParameterServerStrategy</code> is an asynchronous training strategy where the computation is divided between parameter servers and workers. Parameter servers store model parameters, and workers perform the computations.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Use Case</strong></td><td><strong>Advantages</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Suitable for large-scale distributed training where asynchronous updates are acceptable.</td><td>Allows for more flexible and scalable training.</td></tr>
<tr>
<td>Ideal for scenarios with large models and datasets where synchronous updates may cause bottlenecks.</td><td>Reduces synchronization overhead, potentially speeding up training.</td></tr>
</tbody>
</table>
</div><h2 id="heading-preparing-the-data">Preparing the Data</h2>
<p>Data preparation is a critical step in any machine learning workflow. For distributed training, the way you prepare and feed data to your model can significantly impact the training efficiency and performance. TensorFlow's <a target="_blank" href="http://tf.data"><code>tf.data</code></a> API is a powerful tool for building input pipelines that can be easily integrated with distributed training.</p>
<p><strong>Loading and Preprocessing Data</strong></p>
<p>We will use the MNIST dataset, a classic dataset of handwritten digits. The dataset is available directly through TensorFlow, which makes loading and preprocessing straightforward.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

x_train = x_train / <span class="hljs-number">255.0</span>
x_test = x_test / <span class="hljs-number">255.0</span>

x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]
</code></pre>
<p><strong>Creating TensorFlow Datasets</strong></p>
<p>The <a target="_blank" href="http://tf.data"><code>tf.data.Dataset</code></a> API provides a high-level interface for creating and manipulating data pipelines. Using this API, we can build efficient input pipelines that feed data to the model in a scalable manner.</p>
<pre><code class="lang-python">train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))

BATCH_SIZE = <span class="hljs-number">64</span>
train_dataset = train_dataset.shuffle(buffer_size=<span class="hljs-number">10000</span>).batch(BATCH_SIZE)
test_dataset = test_dataset.batch(BATCH_SIZE)
</code></pre>
<p><strong>Optimizing Data Pipelines</strong></p>
<p>For distributed training, it’s important to ensure that the data pipeline does not become a bottleneck. TensorFlow provides several techniques to optimize data pipelines:</p>
<ul>
<li><p><strong>Prefetching</strong>: Overlap the preprocessing and model execution of data.</p>
</li>
<li><p><strong>Caching</strong>: Cache data in memory to avoid redundant computations.</p>
</li>
<li><p><strong>Parallel Interleave</strong>: Read data from multiple files in parallel.</p>
</li>
</ul>
<pre><code class="lang-python">AUTOTUNE = tf.data.experimental.AUTOTUNE

train_dataset = train_dataset.cache()
train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
test_dataset = test_dataset.prefetch(buffer_size=AUTOTUNE)
</code></pre>
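<p>Caching and prefetching are shown above; parallel interleave applies when the training data is sharded across many files. A sketch, assuming a set of TFRecord shards (the file pattern is hypothetical):</p>
<pre><code class="lang-python">files = tf.data.Dataset.list_files('data/train-*.tfrecord')

# Read several shards concurrently and let tf.data tune the parallelism
records = files.interleave(
    tf.data.TFRecordDataset,
    cycle_length=4,
    num_parallel_calls=AUTOTUNE)
</code></pre>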
<h2 id="heading-defining-the-model">Defining the Model</h2>
<p>Defining a model in TensorFlow is typically done using the Keras API, which provides a simple and flexible way to build neural networks. Let's define a convolutional neural network (CNN) for the MNIST dataset.</p>
<p><strong>Creating the Model</strong></p>
<p>A CNN is well-suited for image classification tasks. Here, we'll create a simple CNN with two convolutional layers followed by pooling layers, a flattening layer, and two dense layers.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_model</span>():</span>
    model = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(<span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>)),
        tf.keras.layers.Conv2D(<span class="hljs-number">32</span>, (<span class="hljs-number">3</span>, <span class="hljs-number">3</span>), activation=<span class="hljs-string">'relu'</span>),
        tf.keras.layers.MaxPooling2D((<span class="hljs-number">2</span>, <span class="hljs-number">2</span>)),
        tf.keras.layers.Conv2D(<span class="hljs-number">64</span>, (<span class="hljs-number">3</span>, <span class="hljs-number">3</span>), activation=<span class="hljs-string">'relu'</span>),
        tf.keras.layers.MaxPooling2D((<span class="hljs-number">2</span>, <span class="hljs-number">2</span>)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(<span class="hljs-number">64</span>, activation=<span class="hljs-string">'relu'</span>),
        tf.keras.layers.Dense(<span class="hljs-number">10</span>, activation=<span class="hljs-string">'softmax'</span>)
    ])
    <span class="hljs-keyword">return</span> model
</code></pre>
<p><strong>Compiling the Model</strong></p>
<p>After defining the model, the next step is to compile it. Compilation involves specifying the optimizer, loss function, and metrics that the model should use during training.</p>
<pre><code class="lang-python">model = create_model()
model.compile(optimizer=<span class="hljs-string">'adam'</span>,
              loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>,
              metrics=[<span class="hljs-string">'accuracy'</span>])
</code></pre>
<p><strong>Model Summary</strong></p>
<p>It’s always a good practice to print the model summary to understand the architecture and ensure that the model is correctly defined.</p>
<pre><code class="lang-python">model.summary()
</code></pre>
<h2 id="heading-configuring-the-distributed-strategy">Configuring the Distributed Strategy</h2>
<p>TensorFlow's distribution strategies allow you to run your training on multiple GPUs, TPUs, or even across multiple machines. This section explains how to set up and configure different distributed strategies.</p>
<p><strong>MirroredStrategy</strong></p>
<p><code>tf.distribute.MirroredStrategy</code> is designed for synchronous training on multiple GPUs on a single machine. It replicates all model variables across the GPUs and then performs a synchronous update to keep them in sync.</p>
<pre><code class="lang-python">strategy = tf.distribute.MirroredStrategy()

<span class="hljs-keyword">with</span> strategy.scope():
    model = create_model()
    model.compile(optimizer=<span class="hljs-string">'adam'</span>,
                  loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>,
                  metrics=[<span class="hljs-string">'accuracy'</span>])
</code></pre>
<p><strong>MultiWorkerMirroredStrategy</strong></p>
<p><code>tf.distribute.MultiWorkerMirroredStrategy</code> extends <code>MirroredStrategy</code> to multiple machines. You need to configure the cluster spec and set the environment variables appropriately.</p>
<p><strong>Setting Up Cluster Spec</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> os

cluster_spec = {
    <span class="hljs-string">'worker'</span>: [<span class="hljs-string">'worker1.example.com:2222'</span>, <span class="hljs-string">'worker2.example.com:2222'</span>]
}

os.environ[<span class="hljs-string">'TF_CONFIG'</span>] = json.dumps({
    <span class="hljs-string">'cluster'</span>: cluster_spec,
    <span class="hljs-string">'task'</span>: {<span class="hljs-string">'type'</span>: <span class="hljs-string">'worker'</span>, <span class="hljs-string">'index'</span>: <span class="hljs-number">0</span>}
})

strategy = tf.distribute.MultiWorkerMirroredStrategy()
</code></pre>
<p><strong>Training with MultiWorkerMirroredStrategy</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">with</span> strategy.scope():
    model = create_model()
    model.compile(optimizer=<span class="hljs-string">'adam'</span>,
                  loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>,
                  metrics=[<span class="hljs-string">'accuracy'</span>])

    model.fit(train_dataset, epochs=<span class="hljs-number">5</span>)
    model.evaluate(test_dataset)
</code></pre>
<p><strong>TPUStrategy</strong></p>
<p><code>tf.distribute.TPUStrategy</code> is used to train models on Google's TPUs. It is optimized for high-performance training and requires minimal code changes from GPU training.</p>
<pre><code class="lang-python">resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=<span class="hljs-string">'your-tpu-address'</span>)
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

<span class="hljs-keyword">with</span> strategy.scope():
    model = create_model()
    model.compile(optimizer=<span class="hljs-string">'adam'</span>,
                  loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>,
                  metrics=[<span class="hljs-string">'accuracy'</span>])

    model.fit(train_dataset, epochs=<span class="hljs-number">5</span>)
    model.evaluate(test_dataset)
</code></pre>
<p><strong>ParameterServerStrategy</strong></p>
<p><code>tf.distribute.experimental.ParameterServerStrategy</code> is an asynchronous training strategy where the computation is divided between parameter servers and workers. Parameter servers store model parameters, and workers perform the computations.</p>
<pre><code class="lang-python">import json
import os

cluster_spec = {
    'worker': ['worker1.example.com:2222', 'worker2.example.com:2222'],
    'ps': ['ps0.example.com:2222']
}

os.environ['TF_CONFIG'] = json.dumps({
    'cluster': cluster_spec,
    'task': {'type': 'worker', 'index': 0}
})

# ParameterServerStrategy needs a cluster resolver; TFConfigClusterResolver
# reads the TF_CONFIG environment variable set above
cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
strategy = tf.distribute.experimental.ParameterServerStrategy(cluster_resolver)
</code></pre>
<p><strong>Training with ParameterServerStrategy</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">with</span> strategy.scope():
    model = create_model()
    model.compile(optimizer=<span class="hljs-string">'adam'</span>,
                  loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>,
                  metrics=[<span class="hljs-string">'accuracy'</span>])

    model.fit(train_dataset, epochs=<span class="hljs-number">5</span>)
    model.evaluate(test_dataset)
</code></pre>
<h2 id="heading-monitoring-and-debugging">Monitoring and Debugging</h2>
<p>Monitoring and debugging distributed training can be challenging due to the complexity and scale of operations. TensorFlow provides several tools to help with this process, including TensorBoard, logging, and callbacks.</p>
<p><strong>Using TensorBoard</strong></p>
<p>TensorBoard is a powerful visualization tool that allows you to track and visualize metrics such as loss and accuracy during training. It can also display graphs, histograms, and other metrics to help you understand your model's behavior.</p>
<p>To use TensorBoard, you need to set up a TensorBoard callback during model training. This callback will log the metrics to a specified directory.</p>
<pre><code class="lang-python">log_dir = <span class="hljs-string">"logs/"</span>
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=<span class="hljs-number">1</span>)

<span class="hljs-keyword">with</span> strategy.scope():
    model = create_model()
    model.compile(optimizer=<span class="hljs-string">'adam'</span>,
                  loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>,
                  metrics=[<span class="hljs-string">'accuracy'</span>])

model.fit(train_dataset, epochs=<span class="hljs-number">5</span>, callbacks=[tensorboard_callback])
model.evaluate(test_dataset)
</code></pre>
<p><strong>Launching TensorBoard</strong></p>
<p>To launch TensorBoard, run the following command in your terminal:</p>
<pre><code class="lang-bash">tensorboard --logdir=logs/
</code></pre>
<p>This will start a local server where you can visualize the training metrics. Open your browser and navigate to <a target="_blank" href="http://localhost:6006/"><code>http://localhost:6006/</code></a> to view the TensorBoard dashboard.</p>
<p><strong>Using Logging</strong></p>
<p>Logging is another useful tool for monitoring and debugging your training process. You can use Python’s built-in logging module to log messages and metrics during training.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> logging

logging.basicConfig(level=logging.INFO)
logging.info(<span class="hljs-string">"Starting model training..."</span>)

<span class="hljs-keyword">with</span> strategy.scope():
    model = create_model()
    model.compile(optimizer=<span class="hljs-string">'adam'</span>,
                  loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>,
                  metrics=[<span class="hljs-string">'accuracy'</span>])

model.fit(train_dataset, epochs=<span class="hljs-number">5</span>, callbacks=[tensorboard_callback])
model.evaluate(test_dataset)

logging.info(<span class="hljs-string">"Model training completed."</span>)
</code></pre>
<p><strong>Using Callbacks</strong></p>
<p>Callbacks are powerful tools that allow you to perform actions at various stages of the training process. TensorFlow provides several built-in callbacks, and you can also create custom callbacks to suit your needs.</p>
<p><strong>Built-In Callbacks</strong></p>
<p>TensorFlow includes several built-in callbacks, such as <code>EarlyStopping</code>, <code>ModelCheckpoint</code>, and <code>ReduceLROnPlateau</code>.</p>
<pre><code class="lang-python">early_stopping_callback = tf.keras.callbacks.EarlyStopping(monitor=<span class="hljs-string">'val_loss'</span>, patience=<span class="hljs-number">3</span>)
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=<span class="hljs-string">'model.h5'</span>, save_best_only=<span class="hljs-literal">True</span>)
reduce_lr_callback = tf.keras.callbacks.ReduceLROnPlateau(monitor=<span class="hljs-string">'val_loss'</span>, factor=<span class="hljs-number">0.2</span>, patience=<span class="hljs-number">2</span>)

<span class="hljs-keyword">with</span> strategy.scope():
    model = create_model()
    model.compile(optimizer=<span class="hljs-string">'adam'</span>,
                  loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>,
                  metrics=[<span class="hljs-string">'accuracy'</span>])

    model.fit(train_dataset, epochs=<span class="hljs-number">5</span>, validation_data=test_dataset,
              callbacks=[tensorboard_callback, early_stopping_callback, model_checkpoint_callback, reduce_lr_callback])
</code></pre>
<p><strong>Custom Callbacks</strong></p>
<p>You can also create custom callbacks by subclassing <code>tf.keras.callbacks.Callback</code>.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CustomCallback</span>(<span class="hljs-params">tf.keras.callbacks.Callback</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">on_epoch_end</span>(<span class="hljs-params">self, epoch, logs=None</span>):</span>
        logging.info(<span class="hljs-string">f"Epoch <span class="hljs-subst">{epoch}</span> ended with loss: <span class="hljs-subst">{logs[<span class="hljs-string">'loss'</span>]}</span> and accuracy: <span class="hljs-subst">{logs[<span class="hljs-string">'accuracy'</span>]}</span>"</span>)

<span class="hljs-keyword">with</span> strategy.scope():
    model = create_model()
    model.compile(optimizer=<span class="hljs-string">'adam'</span>,
                  loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>,
                  metrics=[<span class="hljs-string">'accuracy'</span>])

    model.fit(train_dataset, epochs=<span class="hljs-number">5</span>, validation_data=test_dataset,
              callbacks=[tensorboard_callback, CustomCallback()])
</code></pre>
<p><strong>Debugging with tf.debugging</strong></p>
<p>TensorFlow also provides debugging tools in the <code>tf.debugging</code> module to catch and diagnose issues during training. For example, you can use <code>tf.debugging.assert_equal</code> to ensure that tensors have expected values.</p>
<pre><code class="lang-python">a = tf.constant(1)
b = tf.constant(2)

# Raises an InvalidArgumentError with the given message, since a != b
tf.debugging.assert_equal(a, b, message="Tensors are not equal")
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Distributed training with TensorFlow can significantly accelerate the training process of your models by leveraging multiple devices and machines. This article covered the basics of setting up and running distributed training using various distribution strategies provided by TensorFlow. By understanding and utilizing these strategies, you can scale your machine learning workflows to handle larger datasets and more complex models efficiently.</p>
<p>Here is a summary of what we covered:</p>
<ol>
<li><p><strong>Introduction to Distributed Training</strong>: Understanding the need and benefits of distributed training.</p>
</li>
<li><p><strong>Types of Distributed Strategies</strong>: Exploring different strategies like MirroredStrategy, MultiWorkerMirroredStrategy, TPUStrategy, and ParameterServerStrategy.</p>
</li>
<li><p><strong>Preparing the Data</strong>: Loading and preprocessing the dataset.</p>
</li>
<li><p><strong>Defining the Model</strong>: Creating a simple CNN model using TensorFlow's Keras API.</p>
</li>
<li><p><strong>Configuring the Distributed Strategy</strong>: Setting up the appropriate distribution strategy for your training.</p>
</li>
<li><p><strong>Monitoring and Debugging</strong>: Using TensorBoard, logging, and callbacks to monitor and debug the training process.</p>
</li>
</ol>
<p>With this knowledge, you are now equipped to start leveraging the power of distributed training to build and train more efficient and scalable machine learning models. Happy coding!</p>
]]></content:encoded></item><item><title><![CDATA[Implementing Advanced Model Architecture with TensorFlow - Part II]]></title><description><![CDATA[if you are finding this for the first time, it means you've missed Part I, it's recommended that you start from the beginning, okay? Let's do that real quick here.
Done? Okay. Let's go...
Implementing Attention Mechanisms
Attention mechanisms have re...]]></description><link>https://kambale.dev/advanced-model-architecture-part-ii</link><guid isPermaLink="true">https://kambale.dev/advanced-model-architecture-part-ii</guid><category><![CDATA[TensorFlow]]></category><category><![CDATA[models]]></category><category><![CDATA[architecture]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Tue, 30 Jul 2024 00:20:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1722044079231/65d18b11-4320-4d72-9d9d-2eae1d93f3e6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>if you are finding this for the first time, it means you've missed</em> <a target="_blank" href="https://kambale.dev/advanced-model-architecture-part-i"><em>Part I</em></a><em>, it's recommended that you start from the beginning, okay? Let's do that real quick</em> <a target="_blank" href="https://kambale.dev/advanced-model-architecture-part-i"><em>here</em></a><em>.</em></p>
<p><strong>Done? Okay. Let's go...</strong></p>
<h2 id="heading-implementing-attention-mechanisms">Implementing Attention Mechanisms</h2>
<p>Attention mechanisms have revolutionized the field of deep learning, particularly in natural language processing (NLP) and computer vision. They allow models to focus on specific parts of the input sequence or data, effectively improving the model's ability to capture dependencies and relationships.</p>
<h3 id="heading-understanding-attention">Understanding Attention</h3>
<p>Attention mechanisms work by assigning different weights to different parts of the input, allowing the model to focus on the most relevant parts. This is particularly useful in sequence-to-sequence tasks such as machine translation, where certain words in the input sequence may be more important than others for generating the output sequence.</p>
<h4 id="heading-types-of-attention">Types of Attention</h4>
<ol>
<li><p><strong>Self-Attention</strong>: Computes attention weights within the same sequence, allowing each element to focus on other elements in the sequence.</p>
</li>
<li><p><strong>Cross-Attention</strong>: Computes attention weights between two different sequences, such as in encoder-decoder models.</p>
</li>
</ol>
<h3 id="heading-scaled-dot-product-attention">Scaled Dot-Product Attention</h3>
<p>The scaled dot-product attention mechanism is a common type of attention used in many models, including the Transformer. It involves three main components: queries (Q), keys (K), and values (V).</p>
<p>$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$</p><h3 id="heading-multi-head-attention">Multi-Head Attention</h3>
<p>Multi-head attention extends the concept of single attention by applying multiple attention mechanisms in parallel, allowing the model to focus on different parts of the input sequence simultaneously.</p>
<p>$$\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \text{head}_2, ..., \text{head}_h)W^O$$</p><p>where each head is an independent attention mechanism.</p>
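<p>Concretely, each head applies scaled dot-product attention to its own learned projections of the inputs:</p>
<p>$$\text{head}_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)$$</p>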
<h3 id="heading-implementing-attention-in-tensorflow">Implementing Attention in TensorFlow</h3>
<p>Here's an example of implementing scaled dot-product attention and multi-head attention in TensorFlow.</p>
<h4 id="heading-scaled-dot-product-attention-1">Scaled Dot-Product Attention</h4>
<pre><code class="lang-python">import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # Similarity of every query against every key: (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(d_k) to keep the logits in a range where softmax
    # still has useful gradients
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Masked positions get a large negative logit, i.e. near-zero weight
    if mask is not None:
        scaled_attention_logits += (mask * -1e9)

    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)

    return output, attention_weights
</code></pre>
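<p>A quick shape check of the function above, using random tensors (the shapes are illustrative):</p>
<pre><code class="lang-python">q = tf.random.normal((1, 8, 64))  # (batch, seq_len, depth)
k = tf.random.normal((1, 8, 64))
v = tf.random.normal((1, 8, 64))

output, weights = scaled_dot_product_attention(q, k, v)
print(output.shape)   # (1, 8, 64): one weighted value vector per query
print(weights.shape)  # (1, 8, 8): one weight per (query, key) pair
</code></pre>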
<h4 id="heading-multi-head-attention-1">Multi-Head Attention</h4>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MultiHeadAttention</span>(<span class="hljs-params">tf.keras.layers.Layer</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, d_model, num_heads</span>):</span>
        super(MultiHeadAttention, self).__init__()
        self.num_heads = num_heads
        self.d_model = d_model

        <span class="hljs-keyword">assert</span> d_model % self.num_heads == <span class="hljs-number">0</span>

        self.depth = d_model // self.num_heads

        self.wq = tf.keras.layers.Dense(d_model)
        self.wk = tf.keras.layers.Dense(d_model)
        self.wv = tf.keras.layers.Dense(d_model)

        self.dense = tf.keras.layers.Dense(d_model)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">split_heads</span>(<span class="hljs-params">self, x, batch_size</span>):</span>
        x = tf.reshape(x, (batch_size, <span class="hljs-number">-1</span>, self.num_heads, self.depth))
        <span class="hljs-keyword">return</span> tf.transpose(x, perm=[<span class="hljs-number">0</span>, <span class="hljs-number">2</span>, <span class="hljs-number">1</span>, <span class="hljs-number">3</span>])

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">call</span>(<span class="hljs-params">self, v, k, q, mask</span>):</span>
        batch_size = tf.shape(q)[<span class="hljs-number">0</span>]

        q = self.wq(q)
        k = self.wk(k)
        v = self.wv(v)

        q = self.split_heads(q, batch_size)
        k = self.split_heads(k, batch_size)
        v = self.split_heads(v, batch_size)

        scaled_attention, attention_weights = scaled_dot_product_attention(q, k, v, mask)

        scaled_attention = tf.transpose(scaled_attention, perm=[<span class="hljs-number">0</span>, <span class="hljs-number">2</span>, <span class="hljs-number">1</span>, <span class="hljs-number">3</span>])
        concat_attention = tf.reshape(scaled_attention, (batch_size, <span class="hljs-number">-1</span>, self.d_model))

        output = self.dense(concat_attention)

        <span class="hljs-keyword">return</span> output, attention_weights
</code></pre>
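<p>A corresponding smoke test for the layer, with arbitrary <code>d_model</code> and <code>num_heads</code> values:</p>
<pre><code class="lang-python">mha = MultiHeadAttention(d_model=512, num_heads=8)
x = tf.random.normal((2, 10, 512))  # (batch, seq_len, d_model)

# Self-attention: queries, keys, and values all come from the same sequence
out, attn = mha(v=x, k=x, q=x, mask=None)
print(out.shape)   # (2, 10, 512): same shape as the input
print(attn.shape)  # (2, 8, 10, 10): per-head attention weights
</code></pre>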
<h3 id="heading-using-attention-in-a-transformer-model">Using Attention in a Transformer Model</h3>
<p>The Transformer model relies heavily on attention mechanisms. Here's a brief overview of how attention is used in the Transformer architecture.</p>
<h4 id="heading-transformer-encoder">Transformer Encoder</h4>
<p>The encoder consists of multiple layers, each containing a multi-head self-attention mechanism and a feed-forward neural network.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">EncoderLayer</span>(<span class="hljs-params">tf.keras.layers.Layer</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, d_model, num_heads, dff, rate=<span class="hljs-number">0.1</span></span>):</span>
        super(EncoderLayer, self).__init__()
        self.mha = MultiHeadAttention(d_model, num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation=<span class="hljs-string">'relu'</span>),
            tf.keras.layers.Dense(d_model)
        ])

        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=<span class="hljs-number">1e-6</span>)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=<span class="hljs-number">1e-6</span>)

        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.dropout2 = tf.keras.layers.Dropout(rate)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">call</span>(<span class="hljs-params">self, x, training, mask</span>):</span>
        attn_output, _ = self.mha(x, x, x, mask)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(x + attn_output)

        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        out2 = self.layernorm2(out1 + ffn_output)

        <span class="hljs-keyword">return</span> out2
</code></pre>
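<p>A single layer can be exercised the same way; the full encoder stacks several of these (together with embeddings and positional encodings, which are omitted here):</p>
<pre><code class="lang-python">encoder_layer = EncoderLayer(d_model=512, num_heads=8, dff=2048)
x = tf.random.normal((2, 10, 512))

out = encoder_layer(x, training=False, mask=None)
print(out.shape)  # (2, 10, 512): shape is preserved, so layers stack cleanly
</code></pre>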
<h2 id="heading-building-generative-models">Building Generative Models</h2>
<p>Generative models are a class of machine learning models that learn to generate new data samples that resemble the training data. Two popular types of generative models are Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs).</p>
<h3 id="heading-variational-autoencoders-vaes">Variational Autoencoders (VAEs)</h3>
<p>Variational Autoencoders (VAEs) are probabilistic graphical models that aim to learn a latent representation of the data, which can then be used to generate new samples. VAEs consist of two main components: the encoder and the decoder.</p>
<h4 id="heading-key-components-of-vaes">Key Components of VAEs</h4>
<ol>
<li><p><strong>Encoder</strong>: Maps the input data to a latent space, producing a mean and a variance for each dimension of the latent space.</p>
</li>
<li><p><strong>Decoder</strong>: Maps the latent representation back to the data space, generating new samples that resemble the original data.</p>
</li>
</ol>
<h4 id="heading-implementing-a-vae-in-tensorflow">Implementing a VAE in TensorFlow</h4>
<p>Here's an example of implementing a simple VAE for generating images.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow.keras <span class="hljs-keyword">import</span> layers, models

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Sampling</span>(<span class="hljs-params">layers.Layer</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">call</span>(<span class="hljs-params">self, inputs</span>):</span>
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[<span class="hljs-number">0</span>]
        dim = tf.shape(z_mean)[<span class="hljs-number">1</span>]
        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
        <span class="hljs-keyword">return</span> z_mean + tf.exp(<span class="hljs-number">0.5</span> * z_log_var) * epsilon

latent_dim = <span class="hljs-number">2</span>
encoder_inputs = tf.keras.Input(shape=(<span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>))
x = layers.Flatten()(encoder_inputs)
x = layers.Dense(<span class="hljs-number">512</span>, activation=<span class="hljs-string">'relu'</span>)(x)
z_mean = layers.Dense(latent_dim)(x)
z_log_var = layers.Dense(latent_dim)(x)
z = Sampling()([z_mean, z_log_var])
encoder = tf.keras.Model(encoder_inputs, [z_mean, z_log_var, z], name=<span class="hljs-string">"encoder"</span>)

decoder_inputs = tf.keras.Input(shape=(latent_dim,))
x = layers.Dense(<span class="hljs-number">512</span>, activation=<span class="hljs-string">'relu'</span>)(decoder_inputs)
x = layers.Dense(<span class="hljs-number">28</span> * <span class="hljs-number">28</span> * <span class="hljs-number">1</span>, activation=<span class="hljs-string">'sigmoid'</span>)(x)
decoder_outputs = layers.Reshape((<span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>))(x)
decoder = tf.keras.Model(decoder_inputs, decoder_outputs, name=<span class="hljs-string">"decoder"</span>)

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">VAE</span>(<span class="hljs-params">tf.keras.Model</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, encoder, decoder, **kwargs</span>):</span>
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">call</span>(<span class="hljs-params">self, inputs</span>):</span>
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstructed = self.decoder(z)
        kl_loss = <span class="hljs-number">-0.5</span> * tf.reduce_mean(z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + <span class="hljs-number">1</span>)
        self.add_loss(kl_loss)
        <span class="hljs-keyword">return</span> reconstructed

vae = VAE(encoder, decoder)
vae.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=<span class="hljs-string">'binary_crossentropy'</span>)

(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype(<span class="hljs-string">"float32"</span>) / <span class="hljs-number">255.0</span>
x_train = x_train.reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>)
x_test = x_test.astype(<span class="hljs-string">"float32"</span>) / <span class="hljs-number">255.0</span>
x_test = x_test.reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>)

vae.fit(x_train, x_train, epochs=<span class="hljs-number">30</span>, batch_size=<span class="hljs-number">128</span>, validation_data=(x_test, x_test))
</code></pre>
<ul>
<li><p><strong>Encoder</strong>: The encoder consists of a dense layer followed by two output layers: one for the mean and one for the log variance of the latent space.</p>
</li>
<li><p><strong>Decoder</strong>: The decoder maps the latent space back to the original data space.</p>
</li>
<li><p><strong>Sampling Layer</strong>: The sampling layer implements the reparameterization trick, which allows backpropagation through the stochastic latent space.</p>
</li>
<li><p><strong>VAE Model</strong>: The VAE model combines the encoder and decoder, adding the KL divergence loss to encourage the latent space to follow a standard normal distribution.</p>
</li>
</ul>
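<p>Once training finishes, the decoder alone acts as a generator. Below is a minimal sketch (an addition, assuming the trained <code>decoder</code> and <code>latent_dim</code> defined above) that samples latent vectors from the standard normal prior and decodes them into new digit images:</p>
<pre><code class="lang-python">import numpy as np
import matplotlib.pyplot as plt

# Sample latent vectors from the prior N(0, I) and decode them into images
n = 10
z_samples = np.random.normal(size=(n, latent_dim)).astype("float32")
generated = decoder.predict(z_samples)

plt.figure(figsize=(n, 1.5))
for i in range(n):
    plt.subplot(1, n, i + 1)
    plt.imshow(generated[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
</code></pre>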
<h3 id="heading-generative-adversarial-networks-gans">Generative Adversarial Networks (GANs)</h3>
<p>Generative Adversarial Networks (GANs) consist of two neural networks: the generator and the discriminator. The generator learns to produce realistic data samples, while the discriminator learns to distinguish between real and generated samples. The two networks are trained in a competitive process.</p>
<h4 id="heading-key-components-of-gans">Key Components of GANs</h4>
<ol>
<li><p><strong>Generator</strong>: Takes random noise as input and generates data samples.</p>
</li>
<li><p><strong>Discriminator</strong>: Takes data samples as input and classifies them as real or fake.</p>
</li>
</ol>
<h4 id="heading-implementing-a-gan-in-tensorflow">Implementing a GAN in TensorFlow</h4>
<p>Here's an example of implementing a simple GAN for generating images.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow.keras <span class="hljs-keyword">import</span> layers, models

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">build_generator</span>():</span>
    model = models.Sequential()
    model.add(layers.Dense(<span class="hljs-number">256</span>, activation=<span class="hljs-string">'relu'</span>, input_dim=<span class="hljs-number">100</span>))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(<span class="hljs-number">512</span>, activation=<span class="hljs-string">'relu'</span>))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(<span class="hljs-number">1024</span>, activation=<span class="hljs-string">'relu'</span>))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(<span class="hljs-number">28</span> * <span class="hljs-number">28</span> * <span class="hljs-number">1</span>, activation=<span class="hljs-string">'tanh'</span>))
    model.add(layers.Reshape((<span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>)))
    <span class="hljs-keyword">return</span> model

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">build_discriminator</span>():</span>
    model = models.Sequential()
    model.add(layers.Flatten(input_shape=(<span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>)))
    model.add(layers.Dense(<span class="hljs-number">512</span>, activation=<span class="hljs-string">'relu'</span>))
    model.add(layers.Dense(<span class="hljs-number">256</span>, activation=<span class="hljs-string">'relu'</span>))
    model.add(layers.Dense(<span class="hljs-number">1</span>, activation=<span class="hljs-string">'sigmoid'</span>))
    <span class="hljs-keyword">return</span> model

generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=<span class="hljs-string">'binary_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])

gan_input = tf.keras.Input(shape=(<span class="hljs-number">100</span>,))
generated_image = generator(gan_input)
discriminator.trainable = <span class="hljs-literal">False</span>
gan_output = discriminator(generated_image)
gan = tf.keras.Model(gan_input, gan_output)
gan.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=<span class="hljs-string">'binary_crossentropy'</span>)

(x_train, _), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = (x_train.astype("float32") - 127.5) / 127.5  # scale to [-1, 1] to match the generator's tanh output
x_train = x_train.reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">28</span>, <span class="hljs-number">28</span>, <span class="hljs-number">1</span>)

<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

batch_size = 128
steps = 10000  # each iteration trains on a single batch, so these are steps rather than true epochs
half_batch = batch_size // 2

for step in range(steps):
    idx = np.random.randint(<span class="hljs-number">0</span>, x_train.shape[<span class="hljs-number">0</span>], half_batch)
    real_images = x_train[idx]
    noise = np.random.normal(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, (half_batch, <span class="hljs-number">100</span>))
    fake_images = generator.predict(noise, verbose=0)  # verbose=0 silences the per-call progress bar

    d_loss_real = discriminator.train_on_batch(real_images, np.ones((half_batch, <span class="hljs-number">1</span>)))
    d_loss_fake = discriminator.train_on_batch(fake_images, np.zeros((half_batch, <span class="hljs-number">1</span>)))
    d_loss = <span class="hljs-number">0.5</span> * np.add(d_loss_real, d_loss_fake)

    noise = np.random.normal(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, (batch_size, <span class="hljs-number">100</span>))
    valid_y = np.array([<span class="hljs-number">1</span>] * batch_size)
    g_loss = gan.train_on_batch(noise, valid_y)

    <span class="hljs-keyword">if</span> epoch % <span class="hljs-number">100</span> == <span class="hljs-number">0</span>:
        print(<span class="hljs-string">f"<span class="hljs-subst">{epoch}</span> [D loss: <span class="hljs-subst">{d_loss[<span class="hljs-number">0</span>]}</span> | D accuracy: <span class="hljs-subst">{<span class="hljs-number">100</span>*d_loss[<span class="hljs-number">1</span>]}</span>] [G loss: <span class="hljs-subst">{g_loss}</span>]"</span>)
</code></pre>
<ul>
<li><p><strong>Generator</strong>: The generator network consists of dense layers followed by batch normalization and activation functions. It maps random noise to a data sample.</p>
</li>
<li><p><strong>Discriminator</strong>: The discriminator network consists of dense layers and activation functions. It classifies data samples as real or fake.</p>
</li>
<li><p><strong>Training Loop</strong>: The GAN is trained in a loop where the discriminator is trained on real and fake samples, followed by training the generator to produce samples that can fool the discriminator.</p>
</li>
</ul>
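<p>After training, you can sample from the generator directly. A minimal sketch follows, assuming the trained <code>generator</code> above; note that the <code>tanh</code> output lies in [-1, 1] and must be rescaled to [0, 1] before display:</p>
<pre><code class="lang-python">import numpy as np
import matplotlib.pyplot as plt

noise = np.random.normal(0, 1, (16, 100))
images = generator.predict(noise, verbose=0)
images = (images + 1) / 2.0  # map tanh output from [-1, 1] back to [0, 1]

plt.figure(figsize=(4, 4))
for i in range(16):
    plt.subplot(4, 4, i + 1)
    plt.imshow(images[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
</code></pre>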
<h2 id="heading-hyperparameter-tuning-and-model-evaluation">Hyperparameter Tuning and Model Evaluation</h2>
<p>Hyperparameter tuning and model evaluation are crucial steps in the development of machine learning models. Proper tuning ensures optimal performance, while thorough evaluation helps understand the model's strengths and weaknesses.</p>
<h3 id="heading-hyperparameter-tuning">Hyperparameter Tuning</h3>
<p>Hyperparameters are settings that define the model structure and how it is trained, such as learning rate, batch size, number of layers, and units per layer. Unlike parameters learned during training, hyperparameters need to be set before the training process begins.</p>
<h4 id="heading-importance-of-hyperparameter-tuning">Importance of Hyperparameter Tuning</h4>
<p>Effective hyperparameter tuning can significantly improve model performance. Poorly chosen hyperparameters can lead to underfitting or overfitting, resulting in a model that performs poorly on unseen data.</p>
<h4 id="heading-techniques-for-hyperparameter-tuning">Techniques for Hyperparameter Tuning</h4>
<ol>
<li><p><strong>Grid Search</strong>: Exhaustively searches over a specified hyperparameter grid.</p>
</li>
<li><p><strong>Random Search</strong>: Samples hyperparameters randomly from a defined range.</p>
</li>
<li><p><strong>Bayesian Optimization</strong>: Uses probabilistic models to find the optimal hyperparameters.</p>
</li>
<li><p><strong>Hyperband</strong>: Combines random search and early stopping to efficiently find optimal hyperparameters.</p>
</li>
</ol>
<h4 id="heading-grid-search">Grid Search</h4>
<p>Grid search is a brute-force technique that searches over a predefined grid of hyperparameters.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> GridSearchCV
<span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> RandomForestClassifier

param_grid = {
    <span class="hljs-string">'n_estimators'</span>: [<span class="hljs-number">100</span>, <span class="hljs-number">200</span>, <span class="hljs-number">300</span>],
    <span class="hljs-string">'max_depth'</span>: [<span class="hljs-number">10</span>, <span class="hljs-number">20</span>, <span class="hljs-number">30</span>],
    <span class="hljs-string">'min_samples_split'</span>: [<span class="hljs-number">2</span>, <span class="hljs-number">5</span>, <span class="hljs-number">10</span>]
}

grid_search = GridSearchCV(estimator=RandomForestClassifier(), param_grid=param_grid, cv=<span class="hljs-number">3</span>, n_jobs=<span class="hljs-number">-1</span>, verbose=<span class="hljs-number">2</span>)
grid_search.fit(X_train, y_train)

print(<span class="hljs-string">"Best Hyperparameters:"</span>, grid_search.best_params_)
</code></pre>
<h4 id="heading-random-search">Random Search</h4>
<p>Random search samples hyperparameters from a specified distribution, which can be more efficient than grid search.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> RandomizedSearchCV
<span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> RandomForestClassifier
<span class="hljs-keyword">from</span> scipy.stats <span class="hljs-keyword">import</span> randint

param_dist = {
    <span class="hljs-string">'n_estimators'</span>: randint(<span class="hljs-number">100</span>, <span class="hljs-number">500</span>),
    <span class="hljs-string">'max_depth'</span>: randint(<span class="hljs-number">10</span>, <span class="hljs-number">50</span>),
    <span class="hljs-string">'min_samples_split'</span>: randint(<span class="hljs-number">2</span>, <span class="hljs-number">11</span>)
}

random_search = RandomizedSearchCV(estimator=RandomForestClassifier(), param_distributions=param_dist, n_iter=<span class="hljs-number">100</span>, cv=<span class="hljs-number">3</span>, n_jobs=<span class="hljs-number">-1</span>, verbose=<span class="hljs-number">2</span>)
random_search.fit(X_train, y_train)

print(<span class="hljs-string">"Best Hyperparameters:"</span>, random_search.best_params_)
</code></pre>
<h4 id="heading-bayesian-optimization">Bayesian Optimization</h4>
<p>Bayesian optimization uses a surrogate model to estimate the performance of hyperparameters and efficiently searches the space.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> skopt <span class="hljs-keyword">import</span> BayesSearchCV
<span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> RandomForestClassifier

param_space = {
    <span class="hljs-string">'n_estimators'</span>: (<span class="hljs-number">100</span>, <span class="hljs-number">500</span>),
    <span class="hljs-string">'max_depth'</span>: (<span class="hljs-number">10</span>, <span class="hljs-number">50</span>),
    <span class="hljs-string">'min_samples_split'</span>: (<span class="hljs-number">2</span>, <span class="hljs-number">11</span>)
}

bayes_search = BayesSearchCV(estimator=RandomForestClassifier(), search_spaces=param_space, n_iter=<span class="hljs-number">32</span>, cv=<span class="hljs-number">3</span>, n_jobs=<span class="hljs-number">-1</span>, verbose=<span class="hljs-number">2</span>)
bayes_search.fit(X_train, y_train)

print(<span class="hljs-string">"Best Hyperparameters:"</span>, bayes_search.best_params_)
</code></pre>
<h4 id="heading-hyperband">Hyperband</h4>
<p>Hyperband combines random search with early stopping to find the best hyperparameters more efficiently.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> keras_tuner.tuners <span class="hljs-keyword">import</span> Hyperband
<span class="hljs-keyword">from</span> tensorflow.keras.models <span class="hljs-keyword">import</span> Sequential
<span class="hljs-keyword">from</span> tensorflow.keras.layers <span class="hljs-keyword">import</span> Dense

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">build_model</span>(<span class="hljs-params">hp</span>):</span>
    model = Sequential()
    model.add(Dense(units=hp.Int(<span class="hljs-string">'units'</span>, min_value=<span class="hljs-number">32</span>, max_value=<span class="hljs-number">512</span>, step=<span class="hljs-number">32</span>), activation=<span class="hljs-string">'relu'</span>, input_shape=(input_dim,)))
    model.add(Dense(<span class="hljs-number">1</span>, activation=<span class="hljs-string">'sigmoid'</span>))
    model.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=<span class="hljs-string">'binary_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])
    <span class="hljs-keyword">return</span> model

tuner = Hyperband(build_model, objective=<span class="hljs-string">'val_accuracy'</span>, max_epochs=<span class="hljs-number">10</span>, factor=<span class="hljs-number">3</span>, directory=<span class="hljs-string">'my_dir'</span>, project_name=<span class="hljs-string">'helloworld'</span>)
tuner.search(X_train, y_train, epochs=<span class="hljs-number">50</span>, validation_split=<span class="hljs-number">0.2</span>)

print(<span class="hljs-string">"Best Hyperparameters:"</span>, tuner.get_best_hyperparameters()[<span class="hljs-number">0</span>].values)
</code></pre>
<h3 id="heading-model-evaluation">Model Evaluation</h3>
<p>Model evaluation involves assessing the performance of a trained model using various metrics. This helps determine how well the model generalizes to new, unseen data.</p>
<h4 id="heading-evaluation-metrics">Evaluation Metrics</h4>
<ol>
<li><p><strong>Accuracy</strong>: Proportion of correctly predicted instances.</p>
</li>
<li><p><strong>Precision</strong>: Proportion of true positives among the predicted positives.</p>
</li>
<li><p><strong>Recall</strong>: Proportion of true positives among the actual positives.</p>
</li>
<li><p><strong>F1 Score</strong>: Harmonic mean of precision and recall.</p>
</li>
<li><p><strong>ROC-AUC</strong>: Area under the Receiver Operating Characteristic curve, measuring the trade-off between true positive rate and false positive rate.</p>
</li>
<li><p><strong>Mean Squared Error (MSE)</strong>: Average of the squared differences between predicted and actual values (for regression).</p>
</li>
<li><p><strong>Mean Absolute Error (MAE)</strong>: Average of the absolute differences between predicted and actual values (for regression).</p>
</li>
</ol>
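<p>Most of these classification metrics are a single function call away in scikit-learn. A quick sketch, assuming <code>y_test</code> and <code>y_pred</code> from a fitted binary classifier:</p>
<pre><code class="lang-python">from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("F1 Score: ", f1_score(y_test, y_pred))
</code></pre>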
<h4 id="heading-cross-validation">Cross-Validation</h4>
<p>Cross-validation is a technique for assessing model performance by splitting the data into multiple folds and training/testing the model on these folds. Common methods include k-fold cross-validation and stratified k-fold cross-validation.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> cross_val_score
<span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> RandomForestClassifier

model = RandomForestClassifier(n_estimators=<span class="hljs-number">100</span>)
scores = cross_val_score(model, X, y, cv=<span class="hljs-number">5</span>, scoring=<span class="hljs-string">'accuracy'</span>)

print(<span class="hljs-string">"Cross-Validation Scores:"</span>, scores)
print(<span class="hljs-string">"Mean Accuracy:"</span>, scores.mean())
</code></pre>
<h4 id="heading-confusion-matrix">Confusion Matrix</h4>
<p>A confusion matrix provides a detailed breakdown of model predictions, showing the counts of true positives, true negatives, false positives, and false negatives.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> confusion_matrix, ConfusionMatrixDisplay

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()
</code></pre>
<h4 id="heading-roc-curve-and-auc">ROC Curve and AUC</h4>
<p>The ROC curve plots the true positive rate against the false positive rate at various threshold settings. The AUC represents the area under this curve.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> roc_curve, roc_auc_score
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt

y_pred_proba = model.predict_proba(X_test)[:, <span class="hljs-number">1</span>]
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
auc = roc_auc_score(y_test, y_pred_proba)

plt.plot(fpr, tpr, label=<span class="hljs-string">f'ROC Curve (AUC = <span class="hljs-subst">{auc:<span class="hljs-number">.2</span>f}</span>)'</span>)
plt.xlabel(<span class="hljs-string">'False Positive Rate'</span>)
plt.ylabel(<span class="hljs-string">'True Positive Rate'</span>)
plt.title(<span class="hljs-string">'Receiver Operating Characteristic'</span>)
plt.legend(loc=<span class="hljs-string">'lower right'</span>)
plt.show()
</code></pre>
<h2 id="heading-conclusion-part-ii">Conclusion - Part II</h2>
<p>Implementing advanced model architectures with TensorFlow encompasses a broad range of techniques and methodologies, each crucial for developing robust, efficient, and high-performing machine learning models. From setting up the development environment to fine-tuning hyperparameters and evaluating models, every step plays a vital role in the model development lifecycle.</p>
<h3 id="heading-key-takeaways">Key Takeaways</h3>
<ol>
<li><p><strong>Implementing Attention Mechanisms</strong>: Attention mechanisms, especially in the context of the Transformer architecture, have revolutionized the way models handle sequential data. By enabling models to focus on relevant parts of the input, attention mechanisms significantly enhance the capability of models to understand complex dependencies.</p>
</li>
<li><p><strong>Building Generative Models</strong>: Generative models like Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) open up new possibilities in data generation and augmentation. These models are particularly powerful in applications such as image synthesis, data augmentation, and creative AI tasks.</p>
</li>
<li><p><strong>Hyperparameter Tuning and Model Evaluation</strong>: Hyperparameter tuning is a critical step in optimizing model performance. Techniques like grid search, random search, Bayesian optimization, and Hyperband provide systematic approaches to finding the best hyperparameters. Model evaluation metrics and methods ensure that the models are not only accurate but also generalize well to unseen data.</p>
</li>
</ol>
<h3 id="heading-final-thoughts">Final Thoughts</h3>
<p>Building and deploying advanced model architectures with TensorFlow requires a blend of theoretical knowledge and practical skills. By understanding and applying the concepts covered in this tutorial, developers can build sophisticated models capable of solving a wide range of real-world problems. The journey from setting up the environment to fine-tuning hyperparameters and evaluating model performance is iterative and requires continuous learning and experimentation. With TensorFlow’s powerful capabilities and a systematic approach, the possibilities for innovation in machine learning are vast and exciting.</p>
<p>Embarking on this journey will not only enhance your technical skills but also enable you to contribute to the rapidly advancing field of artificial intelligence, pushing the boundaries of what is possible with machine learning.</p>
]]></content:encoded></item><item><title><![CDATA[Implementing Advanced Model Architecture with TensorFlow - Part I]]></title><description><![CDATA[Introduction
Implementing advanced model architecture with TensorFlow is a crucial aspect of building powerful and effective machine learning models. TensorFlow, an open-source machine learning library, provides a versatile framework for designing, t...]]></description><link>https://kambale.dev/advanced-model-architecture-part-i</link><guid isPermaLink="true">https://kambale.dev/advanced-model-architecture-part-i</guid><category><![CDATA[TensorFlow]]></category><category><![CDATA[models]]></category><category><![CDATA[neural networks]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Sun, 28 Jul 2024 08:19:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1719575000522/b0a1dc33-9088-4f4d-995a-23b6bb4f5759.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Implementing advanced model architecture with TensorFlow is a crucial aspect of building powerful and effective machine learning models. TensorFlow, an open-source machine learning library, provides a versatile framework for designing, training, and deploying various neural network architectures.</p>
<p>In the rapidly evolving field of machine learning, staying ahead often requires a deep understanding of advanced model architectures. In this article, we will dive into creating sophisticated models using TensorFlow, exploring both foundational concepts and advanced techniques.</p>
<h3 id="heading-importance-of-advanced-model-architecture">Importance of Advanced Model Architecture</h3>
<p>While basic models are suitable for simple tasks, advanced model architectures are essential for tackling more complex problems and achieving state-of-the-art performance. As machine learning tasks become increasingly sophisticated, so does the need for specialized architectures such as attention mechanisms and generative models, along with careful hyperparameter tuning and model evaluation.</p>
<h3 id="heading-setting-up-your-environment">Setting Up Your Environment</h3>
<p>Before diving into advanced model architectures, we need to set up our environment. We'll use TensorFlow, a powerful open-source library for machine learning.</p>
<p>We'll use Jupyter notebooks for this article. If you don't have Jupyter installed, you can install it using pip:</p>
<pre><code class="lang-bash">pip install jupyter
</code></pre>
<p><strong>Install TensorFlow</strong></p>
<p>TensorFlow can be installed in Python easily, just like any other module, with a terminal command using <code>pip</code>, the package manager for Python. Open a terminal or command prompt and enter the following command:</p>
<pre><code class="lang-bash">pip install tensorflow
</code></pre>
<p><em>Note: This general installation command may not work for all operating systems.</em></p>
<p><strong>macOS</strong></p>
<p>To install a build of TensorFlow optimized for Apple silicon (especially M1 and M2 chips), use the following command instead of the general installation:</p>
<pre><code class="lang-bash">pip install tensorflow-macos
</code></pre>
<p><strong>Windows &amp; Ubuntu without GPU</strong></p>
<p>You can install the CPU version of TensorFlow on Windows &amp; Ubuntu if you do not have an external GPU installed or wish to use the CPU:</p>
<pre><code class="lang-bash">pip install tensorflow-cpu
</code></pre>
<p><em>Note: Training neural networks on a CPU is significantly slower than on a GPU.</em></p>
<p>TensorFlow also provides a high-level API, called <code>Keras</code>, which simplifies the creation and training of machine learning models. Keras ships bundled with TensorFlow as <code>tf.keras</code>, so no separate installation is required; if you want the standalone package as well, install it with:</p>
<pre><code class="lang-bash">pip install keras
</code></pre>
<p>To start a Jupyter notebook, run:</p>
<pre><code class="lang-bash">jupyter notebook
</code></pre>
<p><strong>Importing Necessary Libraries</strong></p>
<p>In your Jupyter notebook, start by importing the necessary libraries:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow <span class="hljs-keyword">import</span> keras
<span class="hljs-keyword">from</span> tensorflow.keras <span class="hljs-keyword">import</span> layers
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
</code></pre>
<h1 id="heading-understanding-custom-layers-and-models">Understanding Custom Layers and Models</h1>
<p>Custom layers and models in TensorFlow give you the flexibility to build complex and tailored machine learning models. This section explores the creation of custom layers, custom models, and demonstrates how they can be integrated into a neural network.</p>
<h3 id="heading-creating-custom-layers">Creating Custom Layers</h3>
<p>Custom layers allow you to encapsulate custom operations in a reusable and modular way. TensorFlow's <code>Layer</code> class is the building block for creating custom layers. Let's break down the process of creating a custom layer.</p>
<h4 id="heading-basic-structure-of-a-custom-layer">Basic Structure of a Custom Layer</h4>
<p>A custom layer typically involves defining the following components:</p>
<ol>
<li><p><strong>Initialization (</strong><code>__init__</code> method): This method is where you define the attributes of the layer.</p>
</li>
<li><p><strong>Build (</strong><code>build</code> method): This method is where you define the weights and other variables that the layer will use.</p>
</li>
<li><p><strong>Call (</strong><code>call</code> method): This method contains the forward pass logic, specifying how the layer should process its inputs to produce outputs.</p>
</li>
</ol>
<h4 id="heading-custom-dense-layer">Custom Dense Layer</h4>
<p>Here's an example of a custom dense (fully connected) layer that applies a learned weight matrix and bias to its inputs, followed by an optional activation.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyCustomLayer</span>(<span class="hljs-params">layers.Layer</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, units=<span class="hljs-number">32</span>, activation=None</span>):</span>
        super(MyCustomLayer, self).__init__()
        self.units = units
        self.activation = keras.activations.get(activation)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">build</span>(<span class="hljs-params">self, input_shape</span>):</span>
        self.w = self.add_weight(shape=(input_shape[<span class="hljs-number">-1</span>], self.units),
                                 initializer=<span class="hljs-string">'random_normal'</span>,
                                 trainable=<span class="hljs-literal">True</span>)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer=<span class="hljs-string">'zeros'</span>,
                                 trainable=<span class="hljs-literal">True</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">call</span>(<span class="hljs-params">self, inputs</span>):</span>
        <span class="hljs-keyword">return</span> self.activation(tf.matmul(inputs, self.w) + self.b)

inputs = keras.Input(shape=(<span class="hljs-number">784</span>,))
x = MyCustomLayer(<span class="hljs-number">64</span>, activation=<span class="hljs-string">'relu'</span>)(inputs)
outputs = MyCustomLayer(<span class="hljs-number">10</span>, activation=<span class="hljs-string">'softmax'</span>)(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])
</code></pre>
<ul>
<li><p><strong>Initialization (</strong><code>__init__</code> method): Initializes the number of units and activation function for the layer.</p>
</li>
<li><p><strong>Build (</strong><code>build</code> method): Defines the weights (<code>self.w</code>) and biases (<code>self.b</code>) for the layer.</p>
</li>
<li><p><strong>Call (</strong><code>call</code> method): Implements the forward pass, applying the weights, biases, and activation function to the inputs.</p>
</li>
</ul>
<h3 id="heading-creating-custom-models">Creating Custom Models</h3>
<p>Custom models allow you to define complex architectures beyond the sequential and functional APIs. TensorFlow's <code>Model</code> class is used to create custom models by subclassing it and defining the forward pass logic in the <code>call</code> method.</p>
<h4 id="heading-custom-model-with-functional-api">Custom Model with Functional API</h4>
<p>Let's create a custom model using the functional API, incorporating our custom layer.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyCustomModel</span>(<span class="hljs-params">keras.Model</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, units=<span class="hljs-number">32</span>, num_classes=<span class="hljs-number">10</span></span>):</span>
        super(MyCustomModel, self).__init__()
        self.dense1 = MyCustomLayer(units, activation=<span class="hljs-string">'relu'</span>)
        self.dense2 = MyCustomLayer(num_classes, activation=<span class="hljs-string">'softmax'</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">call</span>(<span class="hljs-params">self, inputs</span>):</span>
        x = self.dense1(inputs)
        <span class="hljs-keyword">return</span> self.dense2(x)

model = MyCustomModel(units=<span class="hljs-number">64</span>, num_classes=<span class="hljs-number">10</span>)
model.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])
</code></pre>
<ul>
<li><p><strong>Initialization (</strong><code>__init__</code> method): Initializes two instances of the custom layer (<code>dense1</code> and <code>dense2</code>) with specified units and activation functions.</p>
</li>
<li><p><strong>Call (</strong><code>call</code> method): Implements the forward pass, applying the first custom layer followed by the second.</p>
</li>
</ul>
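<p>Because subclassed models build their weights lazily, a quick way to sanity-check the architecture is to run a dummy batch through it. A minimal sketch, assuming the <code>model</code> compiled above:</p>
<pre><code class="lang-python">import numpy as np

# A dummy batch of 8 flattened 28x28 images builds the weights and
# confirms the output shape: (8, 10) class probabilities
dummy = np.random.rand(8, 784).astype("float32")
print(model(dummy).shape)
model.summary()
</code></pre>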
<h3 id="heading-custom-training-loop">Custom Training Loop</h3>
<p>For more control over the training process, you can write a custom training loop. This allows you to customize every aspect of the training process, including the forward pass, backward pass, and optimization.</p>
<h4 id="heading-custom-training-loop-1">Custom Training Loop</h4>
<p>Here's an example of a custom training loop for our custom model.</p>
<pre><code class="lang-python">(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train.astype(<span class="hljs-string">"float32"</span>) / <span class="hljs-number">255.0</span>, x_test.astype(<span class="hljs-string">"float32"</span>) / <span class="hljs-number">255.0</span>
x_train = x_train.reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">784</span>)
x_test = x_test.reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">784</span>)

epochs = <span class="hljs-number">5</span>
batch_size = <span class="hljs-number">64</span>
optimizer = keras.optimizers.Adam()
loss_fn = keras.losses.SparseCategoricalCrossentropy()

train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=<span class="hljs-number">1024</span>).batch(batch_size)

model = MyCustomModel(units=<span class="hljs-number">64</span>, num_classes=<span class="hljs-number">10</span>)

<span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(epochs):
    print(<span class="hljs-string">f"Epoch <span class="hljs-subst">{epoch+<span class="hljs-number">1</span>}</span>/<span class="hljs-subst">{epochs}</span>"</span>)
    <span class="hljs-keyword">for</span> step, (x_batch, y_batch) <span class="hljs-keyword">in</span> enumerate(train_dataset):
        <span class="hljs-keyword">with</span> tf.GradientTape() <span class="hljs-keyword">as</span> tape:
            logits = model(x_batch, training=<span class="hljs-literal">True</span>)
            loss = loss_fn(y_batch, logits)
        gradients = tape.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))

        <span class="hljs-keyword">if</span> step % <span class="hljs-number">100</span> == <span class="hljs-number">0</span>:
            print(<span class="hljs-string">f"Step <span class="hljs-subst">{step}</span>, Loss: <span class="hljs-subst">{loss.numpy():<span class="hljs-number">.4</span>f}</span>"</span>)
</code></pre>
<ul>
<li><p><strong>Data Preparation</strong>: Loads and preprocesses the MNIST dataset.</p>
</li>
<li><p><strong>Training Loop</strong>: Iterates over epochs and batches, computes gradients, and updates weights using the optimizer.</p>
</li>
</ul>
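<p>The loop above only trains; evaluation is just as explicit. A minimal sketch, assuming the <code>model</code>, <code>x_test</code>, and <code>y_test</code> prepared above:</p>
<pre><code class="lang-python"># Run the test set through the model once and accumulate accuracy
metric = keras.metrics.SparseCategoricalAccuracy()
metric.update_state(y_test, model(x_test, training=False))
print(f"Test accuracy: {metric.result().numpy():.4f}")
</code></pre>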
<h1 id="heading-implementing-neural-network-architectures">Implementing Neural Network Architectures</h1>
<p>In this section, we'll dive into neural network architectures that are fundamental in deep learning: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and transfer learning with pre-trained models. Each architecture is tailored for specific types of tasks and data, and we'll explore how to implement them in TensorFlow.</p>
<h3 id="heading-convolutional-neural-networks-cnns">Convolutional Neural Networks (CNNs)</h3>
<p>Convolutional Neural Networks (CNNs) are specialized for processing grid-like data, such as images. They are designed to automatically and adaptively learn spatial hierarchies of features through backpropagation. CNNs are composed of convolutional layers, pooling layers, and fully connected layers.</p>
<h4 id="heading-key-components-of-cnns">Key Components of CNNs</h4>
<ol>
<li><p><strong>Convolutional Layers</strong>: Apply convolutional operations to the input, using filters to extract features like edges, textures, and patterns.</p>
</li>
<li><p><strong>Pooling Layers</strong>: Reduce the spatial dimensions (width and height) of the data, typically using max pooling or average pooling.</p>
</li>
<li><p><strong>Fully Connected Layers</strong>: Connect every neuron in one layer to every neuron in the next layer, used for classification tasks.</p>
</li>
</ol>
<h4 id="heading-implementing-a-simple-cnn-for-image-classification">Implementing a Simple CNN for Image Classification</h4>
<p>Let's implement a simple CNN for classifying images from the CIFAR-10 dataset.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow.keras <span class="hljs-keyword">import</span> layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / <span class="hljs-number">255.0</span>, x_test / <span class="hljs-number">255.0</span>

model = models.Sequential()
model.add(layers.Conv2D(<span class="hljs-number">32</span>, (<span class="hljs-number">3</span>, <span class="hljs-number">3</span>), activation=<span class="hljs-string">'relu'</span>, input_shape=(<span class="hljs-number">32</span>, <span class="hljs-number">32</span>, <span class="hljs-number">3</span>)))
model.add(layers.MaxPooling2D((<span class="hljs-number">2</span>, <span class="hljs-number">2</span>)))
model.add(layers.Conv2D(<span class="hljs-number">64</span>, (<span class="hljs-number">3</span>, <span class="hljs-number">3</span>), activation=<span class="hljs-string">'relu'</span>))
model.add(layers.MaxPooling2D((<span class="hljs-number">2</span>, <span class="hljs-number">2</span>)))
model.add(layers.Conv2D(<span class="hljs-number">64</span>, (<span class="hljs-number">3</span>, <span class="hljs-number">3</span>), activation=<span class="hljs-string">'relu'</span>))

model.add(layers.Flatten())
model.add(layers.Dense(<span class="hljs-number">64</span>, activation=<span class="hljs-string">'relu'</span>))
model.add(layers.Dense(<span class="hljs-number">10</span>, activation=<span class="hljs-string">'softmax'</span>))

model.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])

model.fit(x_train, y_train, epochs=<span class="hljs-number">10</span>, validation_data=(x_test, y_test))
</code></pre>
<ul>
<li><p><strong>Convolutional Layers</strong>: The first layer has 32 filters of size 3x3 and ReLU activation, followed by a max pooling layer. This pattern repeats, with an increasing number of filters.</p>
</li>
<li><p><strong>Fully Connected Layers</strong>: After flattening the output from the convolutional layers, we add a dense layer with 64 units and ReLU activation, followed by a dense layer with 10 units and softmax activation for classification.</p>
</li>
</ul>
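<p>Once training completes, a final evaluation on the held-out test set gives an unbiased estimate of generalization:</p>
<pre><code class="lang-python"># Evaluate the trained CNN on the test set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f"Test accuracy: {test_acc:.4f}")
</code></pre>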
<h3 id="heading-recurrent-neural-networks-rnns">Recurrent Neural Networks (RNNs)</h3>
<p>Recurrent Neural Networks (RNNs) are designed for sequential data, such as time series or text. They maintain a hidden state that captures information about previous inputs, making them suitable for tasks where context or order matters.</p>
<h4 id="heading-key-components-of-rnns">Key Components of RNNs</h4>
<ol>
<li><p><strong>Recurrent Layers</strong>: Process each element of the sequence, maintaining a hidden state that is updated at each step.</p>
</li>
<li><p><strong>LSTM and GRU</strong>: Variants of RNNs that use gating mechanisms to better capture long-range dependencies and mitigate the vanishing gradient problem.</p>
</li>
</ol>
<h4 id="heading-implementing-an-rnn-for-text-classification">Implementing an RNN for Text Classification</h4>
<p>Let's implement an RNN for classifying sentiments in text data using the IMDB dataset.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow.keras <span class="hljs-keyword">import</span> layers, models
<span class="hljs-keyword">from</span> tensorflow.keras.datasets <span class="hljs-keyword">import</span> imdb
<span class="hljs-keyword">from</span> tensorflow.keras.preprocessing <span class="hljs-keyword">import</span> sequence

max_features = <span class="hljs-number">10000</span>
maxlen = <span class="hljs-number">500</span>
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

model = models.Sequential()
model.add(layers.Embedding(max_features, <span class="hljs-number">128</span>))
model.add(layers.SimpleRNN(<span class="hljs-number">128</span>))
model.add(layers.Dense(<span class="hljs-number">1</span>, activation=<span class="hljs-string">'sigmoid'</span>))

model.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=<span class="hljs-string">'binary_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])

model.fit(x_train, y_train, epochs=<span class="hljs-number">10</span>, validation_data=(x_test, y_test))
</code></pre>
<ul>
<li><p><strong>Embedding Layer</strong>: Converts the input sequences into dense vectors of fixed size.</p>
</li>
<li><p><strong>SimpleRNN Layer</strong>: Processes the sequence data, maintaining a hidden state that captures information about the sequence.</p>
</li>
<li><p><strong>Dense Layer</strong>: Outputs a single value with sigmoid activation for binary classification (positive or negative sentiment).</p>
</li>
</ul>
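<p>As noted above, <code>SimpleRNN</code> struggles with long-range dependencies, and the 500-token reviews are exactly that case. Here is the same model with the recurrent layer swapped for an LSTM, a drop-in change shown as a sketch:</p>
<pre><code class="lang-python">model = models.Sequential()
model.add(layers.Embedding(max_features, 128))
model.add(layers.LSTM(128))  # gating mitigates vanishing gradients on long sequences
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
</code></pre>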
<h3 id="heading-transfer-learning-with-pre-trained-models">Transfer Learning with Pre-trained Models</h3>
<p>Transfer learning leverages pre-trained models, usually trained on large datasets, and fine-tunes them for a specific task. This approach is beneficial when you have limited data.</p>
<h4 id="heading-steps-in-transfer-learning">Steps in Transfer Learning</h4>
<ol>
<li><p><strong>Select a Pre-trained Model</strong>: Choose a model pre-trained on a large dataset, such as ImageNet.</p>
</li>
<li><p><strong>Load the Pre-trained Model</strong>: Load the model with pre-trained weights, excluding the top layers.</p>
</li>
<li><p><strong>Add Custom Layers</strong>: Add new layers for your specific task.</p>
</li>
<li><p><strong>Freeze the Base Layers</strong>: Freeze the weights of the pre-trained layers.</p>
</li>
<li><p><strong>Compile and Train the Model</strong>: Compile and train the model on your dataset.</p>
</li>
<li><p><strong>Unfreeze and Fine-tune</strong>: Optionally, unfreeze some layers and fine-tune the entire model.</p>
</li>
</ol>
<h4 id="heading-transfer-learning-with-resnet50">Transfer Learning with ResNet50</h4>
<p>Let's implement transfer learning using the ResNet50 model for image classification on the CIFAR-10 dataset.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> tensorflow.keras.applications <span class="hljs-keyword">import</span> ResNet50
<span class="hljs-keyword">from</span> tensorflow.keras <span class="hljs-keyword">import</span> layers, models
<span class="hljs-keyword">from</span> tensorflow.keras.optimizers <span class="hljs-keyword">import</span> Adam

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / <span class="hljs-number">255.0</span>, x_test / <span class="hljs-number">255.0</span>

base_model = ResNet50(weights=<span class="hljs-string">'imagenet'</span>, include_top=<span class="hljs-literal">False</span>, input_shape=(<span class="hljs-number">32</span>, <span class="hljs-number">32</span>, <span class="hljs-number">3</span>))
x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(<span class="hljs-number">1024</span>, activation=<span class="hljs-string">'relu'</span>)(x)
x = layers.Dense(<span class="hljs-number">10</span>, activation=<span class="hljs-string">'softmax'</span>)(x)

model = models.Model(inputs=base_model.input, outputs=x)

<span class="hljs-keyword">for</span> layer <span class="hljs-keyword">in</span> base_model.layers:
    layer.trainable = <span class="hljs-literal">False</span>

model.compile(optimizer=Adam(), loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])

model.fit(x_train, y_train, epochs=<span class="hljs-number">10</span>, validation_data=(x_test, y_test))

<span class="hljs-keyword">for</span> layer <span class="hljs-keyword">in</span> base_model.layers[<span class="hljs-number">-10</span>:]:
    layer.trainable = <span class="hljs-literal">True</span>

model.compile(optimizer=Adam(learning_rate=<span class="hljs-number">1e-5</span>), loss=<span class="hljs-string">'sparse_categorical_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])
model.fit(x_train, y_train, epochs=<span class="hljs-number">10</span>, validation_data=(x_test, y_test))
</code></pre>
<ul>
<li><p><strong>Load the Pre-trained Model</strong>: The ResNet50 model is loaded with weights pre-trained on ImageNet, excluding the top layers.</p>
</li>
<li><p><strong>Add Custom Layers</strong>: Global average pooling is added to reduce the spatial dimensions, followed by dense layers for classification.</p>
</li>
<li><p><strong>Freeze the Base Layers</strong>: The base layers of ResNet50 are frozen to retain the learned features.</p>
</li>
<li><p><strong>Compile and Train</strong>: The model is compiled and initially trained. Then, some layers are unfrozen for fine-tuning with a lower learning rate.</p>
</li>
</ul>
<h2 id="heading-conclusion-part-i">Conclusion - Part I</h2>
<p>Implementing advanced model architectures with TensorFlow is a multifaceted process that requires a solid understanding of various components and techniques. In this first part, we have laid the groundwork for developing sophisticated machine learning models by covering the following key areas:</p>
<h3 id="heading-key-takeaways">Key Takeaways</h3>
<ol>
<li><p><strong>Setting Up Your Environment</strong>: Establishing a robust and efficient development environment is the first step towards successful model implementation. Essential tools such as TensorFlow, Keras, and Jupyter Notebooks provide a strong foundation for experimentation and development.</p>
</li>
<li><p><strong>Custom Layers and Models</strong>: Creating custom layers and models allows developers to tailor neural network architectures to specific tasks and data types. This customization enhances the flexibility and effectiveness of the models, enabling them to tackle complex challenges more efficiently.</p>
</li>
<li><p><strong>Implementing Neural Network Architectures</strong>: Understanding and implementing various neural network architectures is crucial for addressing different types of data and tasks. Convolutional Neural Networks (CNNs) excel at image processing, while Recurrent Neural Networks (RNNs) are well-suited for sequential data. Each architecture has its strengths and applications, and mastering them is essential for building effective models.</p>
</li>
<li><p><strong>Transfer Learning with Pre-trained Models</strong>: Transfer learning leverages existing, well-trained models to accelerate development and improve performance. By fine-tuning pre-trained models on new datasets, developers can achieve high accuracy and efficiency with less training time and data. This approach is particularly beneficial when dealing with limited data or complex tasks.</p>
</li>
</ol>
<p>Let's wait for <strong><em>Part II</em></strong> soon, shall we?</p>
]]></content:encoded></item><item><title><![CDATA[Fine-tuning BERT for text classification with KerasNLP]]></title><description><![CDATA[Introduction
Text classification is a basic job in natural language processing (NLP) that is used in sentiment analysis, spam detection, and content categorization. Transformer-based models, like BERT (Bi-directional Encoder Representations from Tran...]]></description><link>https://kambale.dev/fine-tuning-bert</link><guid isPermaLink="true">https://kambale.dev/fine-tuning-bert</guid><category><![CDATA[finetuning]]></category><category><![CDATA[BERT]]></category><category><![CDATA[kerasNLP]]></category><category><![CDATA[keras]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Sat, 11 May 2024 19:50:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1710174607858/d6b96b98-a153-4237-9cc7-a20a4c393264.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Text classification is a fundamental task in natural language processing (NLP), used in sentiment analysis, spam detection, and content categorization. Transformer-based models, like BERT (Bidirectional Encoder Representations from Transformers), have become popular because of their outstanding performance across a wide range of NLP tasks.</p>
<p>In this article, we'll explore how to implement text classification using BERT and the KerasNLP library, providing examples and code snippets to guide you through the process.</p>
<h3 id="heading-understanding-bert">Understanding BERT</h3>
<p>BERT, introduced by Google in 2018, is a pre-trained transformer-based model created for understanding natural language. Unlike traditional models that analyze text in one direction, BERT looks at context from both sides, which helps it capture complex relationships within sentences effectively.</p>
<h3 id="heading-bert-architecture">BERT Architecture</h3>
<p>BERT's architecture consists of layers of attention mechanisms and feedforward neural networks. It employs a transformer encoder stack, allowing it to learn contextualized representations of words. The model is pre-trained on large corpora, gaining a deep understanding of language nuances.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1715456920422/68ac176b-cff4-408f-92b4-0776ab115bab.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-tokenization-with-bert"><strong>Tokenization with BERT</strong></h3>
<p>Before delving into text classification, it's crucial to understand tokenization, a process that breaks down text into smaller units, such as words or subwords. BERT utilizes WordPiece tokenization, which divides text into subword tokens, enhancing its ability to handle out-of-vocabulary words.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710174138203/63b1bbb8-0bff-4cda-b9df-653e817d581a.jpeg" alt="BERT-enhanced tokenization. (c) Batuhan Gundogdu" class="image--center mx-auto" /></p>
<h1 id="heading-setting-up-the-environment">Setting Up the Environment</h1>
<p>To get started with BERT-based text classification, you need to set up your Python environment. Ensure you have the required libraries installed:</p>
<pre><code class="lang-bash">pip install tensorflow
pip install keras-nlp
pip install transformers
</code></pre>
<p>These packages include TensorFlow, KerasNLP, and the Hugging Face Transformers library, which provides pre-trained BERT models.</p>
<h3 id="heading-import-the-libraries">Import the libraries</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> keras_nlp <span class="hljs-keyword">import</span> load_bert_model
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split

<span class="hljs-keyword">from</span> keras.layers <span class="hljs-keyword">import</span> Input, Dense
<span class="hljs-keyword">from</span> keras.models <span class="hljs-keyword">import</span> Model
<span class="hljs-keyword">from</span> keras.optimizers <span class="hljs-keyword">import</span> Adam

<span class="hljs-keyword">from</span> keras_nlp <span class="hljs-keyword">import</span> Tokenizer

<span class="hljs-keyword">from</span> keras_nlp <span class="hljs-keyword">import</span> load_bert_finetuned_model
</code></pre>
<h3 id="heading-loading-bert-model-with-kerasnlp">Loading BERT Model with KerasNLP</h3>
<p>KerasNLP simplifies the process of working with BERT models in Keras. Let's load a pre-trained BERT model through the <code>BertClassifier</code> task, which pairs the BERT backbone with a classification head:</p>
<pre><code class="lang-python">model_name = <span class="hljs-string">'bert-base-uncased'</span>
bert_model = load_bert_model(model_name)
</code></pre>
<p><em>Note: You can choose other variants based on your requirements, such as multilingual models or models fine-tuned for specific tasks.</em></p>
<h1 id="heading-text-classification">Text Classification</h1>
<p>Now, let's move on to text classification using BERT. For this example, we'll create a binary sentiment analysis model. Assume you have a dataset with labeled sentiments (positive or negative). First, load and split the data:</p>
<pre><code class="lang-python">data = pd.read_csv(<span class="hljs-string">'sentiment_data.csv'</span>)

train_data, test_data = train_test_split(data, test_size=<span class="hljs-number">0.2</span>, random_state=<span class="hljs-number">42</span>)

# BertClassifier preprocesses raw strings internally, so no manual
# tokenization is needed; pass lists of strings directly
X_train = train_data['text'].tolist()
X_test = test_data['text'].tolist()

y_train = train_data[<span class="hljs-string">'sentiment'</span>].map({<span class="hljs-string">'negative'</span>: <span class="hljs-number">0</span>, <span class="hljs-string">'positive'</span>: <span class="hljs-number">1</span>}).values
y_test = test_data[<span class="hljs-string">'sentiment'</span>].map({<span class="hljs-string">'negative'</span>: <span class="hljs-number">0</span>, <span class="hljs-string">'positive'</span>: <span class="hljs-number">1</span>}).values
</code></pre>
<p>In this example, we assume that your dataset has a 'text' column containing the text data and a 'sentiment' column with labels ('negative' or 'positive'). Adjust the column names based on your dataset structure.</p>
<h2 id="heading-building-the-bert-text-classification-model"><strong>Building the BERT Text Classification Model</strong></h2>
<p>Now, let's configure the BERT-based text classification model for training. <code>BertClassifier</code> already attaches a classification head to the BERT backbone, so all that remains is to compile it:</p>
<pre><code class="lang-python">input_layer = Input(shape=(tokenizer.max_seq_length,), dtype=<span class="hljs-string">'int32'</span>)

bert_output = bert_model(input_layer)

output_layer = Dense(<span class="hljs-number">1</span>, activation=<span class="hljs-string">'sigmoid'</span>)(bert_output[<span class="hljs-string">'pooled_output'</span>])

model = Model(inputs=input_layer, outputs=output_layer)

model.compile(optimizer=Adam(learning_rate=<span class="hljs-number">2e-5</span>), loss=<span class="hljs-string">'binary_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])
</code></pre>
<p>This compiles the classifier with a small learning rate, which is standard practice when fine-tuning large pre-trained models so the pre-trained weights are not destroyed. Adjust <code>num_classes</code> and the loss based on your specific task and requirements.</p>
<h2 id="heading-training-the-bert-text-classification-model"><strong>Training the BERT Text Classification Model</strong></h2>
<p>Now, let's train the BERT text classification model using the prepared data:</p>
<pre><code class="lang-python">model.fit(X_train, y_train, epochs=<span class="hljs-number">3</span>, batch_size=<span class="hljs-number">32</span>, validation_split=<span class="hljs-number">0.1</span>)

loss, accuracy = model.evaluate(X_test, y_test)
print(<span class="hljs-string">f'Test Loss: <span class="hljs-subst">{loss}</span>, Test Accuracy: <span class="hljs-subst">{accuracy}</span>'</span>)
</code></pre>
<p>This code snippet trains the model for three epochs with a batch size of 32 and validates on a 10% subset of the training data. After training, it evaluates the model on the test set, providing insights into its performance.</p>
<h2 id="heading-fine-tuning-bert-for-specific-tasks"><strong>Fine-Tuning BERT for Specific Tasks</strong></h2>
<p>While the above example demonstrates a basic BERT text classification model, fine-tuning allows you to adapt BERT to specific tasks or domains. Fine-tuning involves training the pre-trained BERT model on a task-specific dataset, enabling it to learn task-specific features.</p>
<h3 id="heading-loading-a-fine-tuned-bert-model"><strong>Loading a Fine-Tuned BERT Model</strong></h3>
<p>Assuming you have a fine-tuned BERT model saved, you can load it back with the standard Keras loading API:</p>
<pre><code class="lang-python">fine_tuned_model_path = <span class="hljs-string">'path/to/fine_tuned_model'</span>
fine_tuned_model = load_bert_finetuned_model(fine_tuned_model_path)
</code></pre>
<p><em>Replace 'path/to/fine_tuned_model' with the actual path to your fine-tuned BERT model.</em></p>
<h3 id="heading-fine-tuning-bert-for-text-classification"><strong>Fine-Tuning BERT for Text Classification</strong></h3>
<p>Let's explore how to fine-tune BERT for text classification using KerasNLP. Assume you have a task-specific dataset with text and corresponding labels:</p>
<pre><code class="lang-python">task_data = pd.read_csv(<span class="hljs-string">'task_specific_data.csv'</span>)

X_task = task_data['text'].tolist()  # raw strings; the model handles preprocessing

y_task = task_data[<span class="hljs-string">'label'</span>].values
</code></pre>
<p>Now, fine-tune the BERT model on your task-specific dataset:</p>
<pre><code class="lang-python">fine_tuned_model.fit(X_task, y_task, epochs=<span class="hljs-number">5</span>, batch_size=<span class="hljs-number">16</span>, validation_split=<span class="hljs-number">0.1</span>)

fine_tuned_model.save(<span class="hljs-string">'path/to/save/fine_tuned_model'</span>)
</code></pre>
<p>This code snippet fine-tunes the BERT model on the task-specific dataset for five epochs with a batch size of 16, validating on a 10% subset. After fine-tuning, it saves the model for future use.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>In this article, we explored text classification using BERT and KerasNLP. We explained the fundamentals of BERT, prepared the environment, loaded a pre-trained BERT model, and created a basic text classification model. Furthermore, we talked about fine-tuning BERT for particular tasks, offering code snippets and examples to help you along the way.</p>
<p>Implementing text classification with BERT offers numerous opportunities for NLP applications. Whether you're focusing on sentiment analysis, spam detection, or any classification task, using BERT can greatly improve the accuracy and reliability of your models. As NLP progresses, keeping abreast of the newest developments and integrating them into your projects will keep you ahead in this dynamic and thrilling field.</p>
]]></content:encoded></item><item><title><![CDATA[Building, Compiling, and Fitting Models with TensorFlow]]></title><description><![CDATA[Introduction
TensorFlow is a free and open-source software library that can be used to build machine learning models. It includes the Keras API, which provides a user-friendly interface for building models. Machine learning engineers make decisions a...]]></description><link>https://kambale.dev/build-compile-and-fit-models-with-tensorflow</link><guid isPermaLink="true">https://kambale.dev/build-compile-and-fit-models-with-tensorflow</guid><category><![CDATA[TensorFlow]]></category><category><![CDATA[keras]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Tue, 16 Jan 2024 13:04:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1693603576873/57c366e8-5358-41f0-a53e-bbe6c1824815.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-introduction">Introduction</h3>
<p>TensorFlow is a free and open-source software library that can be used to build machine learning models. It includes the Keras API, which provides a user-friendly interface for building models. Machine learning engineers make decisions about the architecture of a model based on the type of data they are working with, the task they are trying to accomplish, and the resources they have available. The best way to learn how to build models in TensorFlow is to start with a simple task and then gradually work your way up to more complex tasks.</p>
<h3 id="heading-why-and-how">Why and How?</h3>
<p>Machine learning engineers consider the type of problem, the properties of the data, and the intended performance of the model when choosing a model architecture. Here are some tips to help you understand the decisions they make:</p>
<p><strong>Start with a simple architecture</strong>: It is often a good idea to start with a simple architecture when building a new model and add complexity as needed. This allows you to quickly test and improve your ideas.</p>
<p><strong>Experiment with different architectures</strong>: There is no one-size-fits-all answer to model architecture. It is important to try different architectures to see which best solves your problem.</p>
<p><strong>Use prior knowledge</strong>: If you know about the problem you are trying to solve or the data you are using, you can use this information to inform your model architecture decisions.</p>
<p><strong>Stay up-to-date with the field</strong>: Machine learning is a constantly evolving field, with new methods and architectures being developed all the time. Staying up-to-date with the latest research can help you make informed decisions about your model design.</p>
<h3 id="heading-building-the-model"><strong>Building the model</strong></h3>
<p>Let's say you are building a model to classify images of cats and dogs. You could start with a simple architecture, such as a small convolutional neural network (CNN). You could then experiment with variations, such as a deeper CNN or a pre-trained network adapted through transfer learning. You could also use prior knowledge about the problem, such as the fact that cats and dogs have different fur patterns, to inform your architectural decisions. Finally, you could stay up-to-date with the latest research on image classification to find new and improved architectures.</p>
<p><strong>Import libraries</strong></p>
<pre><code class="lang-python"><span class="hljs-comment"># import libraries already installed </span>
<span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow.keras.models <span class="hljs-keyword">import</span> Sequential
<span class="hljs-keyword">from</span> tensorflow.keras.layers <span class="hljs-keyword">import</span> Dense
</code></pre>
<p><strong>Prepare your data</strong></p>
<p>Load your dataset using appropriate methods (e.g., <code>tf.keras.datasets</code>, <code>pandas</code>, etc.).</p>
<p>Preprocess your data if needed (e.g., normalization, scaling, feature engineering).</p>
<p>Split your data into training and testing sets.</p>
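<p>As a concrete example, here is that preparation for the MNIST digits used by the architecture below; a short sketch you can swap out for your own dataset:</p>
<pre><code class="lang-python"># Load MNIST: 28x28 grayscale digits with integer labels 0-9
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0
</code></pre>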
<p><strong>Model architecture</strong></p>
<pre><code class="lang-python"><span class="hljs-comment"># Create a sequential model</span>
model = Sequential()

<span class="hljs-comment"># Add layers to the model</span>
model.add(Flatten(input_shape=(<span class="hljs-number">28</span>, <span class="hljs-number">28</span>)))  
model.add(Dense(<span class="hljs-number">128</span>, activation=<span class="hljs-string">'relu'</span>))                      
model.add(Dense(<span class="hljs-number">10</span>))
</code></pre>
<p>We are creating a basic neural network model using TensorFlow's Keras API. The first layer is a <code>Flatten</code> layer that takes an input image with a shape of (28, 28) and flattens it into a 1D array. The subsequent layer is a <code>Dense</code> layer with 128 neurons, using the <code>ReLU</code> activation function. Finally, we have another Dense layer with 10 neurons and a softmax activation, which turns the raw scores into one probability per class.</p>
<p>Our choice of the number of layers and neurons was based on prior experience and experimentation. For example, 128 neurons in the hidden layer have performed well on similar problems in the past, and the ReLU activation function is a common choice that has proven effective in practice.</p>
<p>After constructing the model, the next step is to compile it. This involves telling TensorFlow how we want the model to learn: which signal to optimize and how to adjust the weights along the way.</p>
<h3 id="heading-compiling-a-model-in-tensorflow">Compiling a Model in TensorFlow</h3>
<p>Building a TensorFlow model is like building a Jenga tower, where the different layers of the model correspond to the different types of blocks in the tower. When building the model, we check that the tower is stable and that all the bricks are neatly stacked. Compiling the model is like adding the finishing touches to the tower.</p>
<p>To compile a machine learning model, we choose components like the loss function (which measures how far the model's predictions are from the targets) and the optimizer (which adjusts the blocks to make the tower more stable). The loss function is the benchmark; the optimizer is the hand that repositions the blocks to improve it. Machine learning engineers select the loss function and optimizer that best fit their problem.</p>
<p>Here’s an example of compiling a model in TensorFlow:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Configure learning process</span>
model.compile(optimizer=<span class="hljs-string">'adam'</span>,
              loss=<span class="hljs-string">'categorical_crossentropy'</span>,
              metrics=[<span class="hljs-string">'accuracy'</span>])
</code></pre>
<p>In TensorFlow, we compile a model to set up the loss function, optimizer, and metrics. This is like ensuring that all the Jenga blocks are properly placed. After creating the model, we can fit it with data to train it.</p>
<p>In this example, we are telling TensorFlow that we want to use <code>sparse_categorical_crossentropy</code> as our loss function (the right match for a softmax output trained on integer class labels) and <code>adam</code> as our optimizer. We are also saying that we want to keep track of how accurate our model is by including accuracy in our list of metrics.</p>
<h3 id="heading-fitting-a-model-in-tensorflow">Fitting a Model in TensorFlow</h3>
<p>When working with TensorFlow, training a model is similar to playing Jenga: each training step is a move, and after every move you check whether the tower still stands. The goal is to keep improving the structure without letting it topple.</p>
<p>In the field of machine learning, fitting a model involves providing it with data to learn from. The model examines the data and tries to make predictions, then measures the accuracy of those predictions. If the results are not up to standard, the model tweaks its settings and tries again, aiming to enhance its accuracy with each attempt.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Train the model on your training data</span>
model.fit(x_train, y_train, epochs=<span class="hljs-number">10</span>, batch_size=<span class="hljs-number">32</span>)
</code></pre>
<p>Training a TensorFlow model involves feeding it data (<code>x_train</code>, <code>y_train</code>) and letting it learn through multiple passes (epochs). Afterwards, we measure its performance on unseen data (<code>x_test</code>, <code>y_test</code>) to gauge how well it generalizes.</p>
<p><strong>Why is fitting important?</strong></p>
<p>Model fitting is the core of machine learning. Just like a poorly stacked Jenga tower, a poorly fitted model won't be reliable for real-world decisions. Fitting finds the best internal settings (the model's parameters, i.e. its weights) for your data, allowing the model to extract key information and make accurate predictions.</p>
<p><strong>Think of it as automated tuning</strong></p>
<p>Fitting automatically adjusts your model's parameters to fit your specific problem, sparing you from setting each weight by hand. (Hyperparameters, such as the learning rate or the number of layers, still have to be chosen separately.)</p>
<h3 id="heading-evaluate-your-model">Evaluate your model</h3>
<pre><code class="lang-python"><span class="hljs-comment"># Evaluate model performance on test data</span>
test_loss, test_acc = model.evaluate(x_test, y_test)
print(<span class="hljs-string">'Test accuracy:'</span>, test_acc)
</code></pre>
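<p>Beyond the aggregate metrics, you will often want per-example predictions. A short sketch using the model trained above:</p>
<pre><code class="lang-python"># Class probabilities for the first five test images
probs = model.predict(x_test[:5])

# The predicted class is the index of the highest probability
print('Predicted:', probs.argmax(axis=1))
print('Actual:   ', y_test[:5])
</code></pre>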
<h3 id="heading-why-choose-tensorflow">Why Choose TensorFlow?</h3>
<p><strong>Flexibility and Versatility</strong></p>
<p>TensorFlow supports various deep learning tasks like image recognition, natural language processing, and time series forecasting. It caters to a wide range of applications, making it a versatile choice for different projects. Its APIs in several languages, including Python, C++, and Java, allow integration with a variety of existing systems and tools, enhancing flexibility.</p>
<p><strong>Scalability and Performance</strong></p>
<p>TensorFlow can handle large datasets and complex models efficiently, thanks to its distributed computing capabilities. This allows you to scale up training for faster model development and deployment. Its support for accelerators such as Google Cloud TPUs and NVIDIA GPUs further boosts performance and scalability.</p>
<p><strong>Eager Execution and Debugging</strong></p>
<p>TensorFlow offers eager execution, enabling line-by-line code evaluation and debugging. This makes it easier to understand and troubleshoot your model's behavior, leading to faster development cycles. Visualization tools like TensorBoard provide insights into your model's training process, allowing you to monitor performance and identify potential issues.</p>
<p><strong>Continuous Development and Innovation</strong></p>
<p>TensorFlow is constantly evolving, with regular updates and new features. This ensures access to cutting-edge advancements in the field of deep learning and machine learning. The active development team and community contribute to ongoing improvements in stability, performance, and usability, making TensorFlow a reliable and future-proof choice.</p>
<h3 id="heading-disadvantages">Disadvantages</h3>
<p><strong>Steep Learning Curve</strong></p>
<p>TensorFlow can have a steeper learning curve compared to some other frameworks, especially for beginners. Its complex API and diverse functionality require dedicated effort to master. While the extensive community and resources can help, initial setup and configuration might require additional time and effort.</p>
<p><strong>Resource Intensity</strong></p>
<p>Training complex models in TensorFlow can be resource-intensive, demanding powerful hardware and computing resources. This can be a constraint for smaller projects or those with limited budgets. Cloud platforms can alleviate this issue, but their costs need to be factored in when making the decision.</p>
<p><strong>Debugging Challenges</strong></p>
<p>Debugging complex models in TensorFlow can be challenging due to its intricate architecture and data flow. While eager execution helps, identifying the root cause of issues might require advanced knowledge and expertise. Investing in proper monitoring and logging practices can help mitigate this challenge.</p>
<p><strong>Potential for Overfitting</strong></p>
<p>TensorFlow's flexibility allows for building powerful models, but it also increases the risk of overfitting. This occurs when the model memorizes the training data instead of learning generalizable patterns. Techniques like regularization and early stopping can help prevent overfitting, but careful tuning might be necessary.</p>
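<p>Both of those safeguards are short additions in Keras. A hedged sketch, standalone rather than tied to the earlier model:</p>
<pre><code class="lang-python">from tensorflow.keras import layers, regularizers

# L2 regularization penalizes large weights; Dropout randomly disables units
regularized = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax'),
])

regularized.compile(optimizer='adam',
                    loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])

# Early stopping halts training once validation loss stops improving
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=3, restore_best_weights=True)

regularized.fit(x_train, y_train, validation_split=0.1,
                epochs=50, callbacks=[early_stop])
</code></pre>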
<h3 id="heading-conclusion">Conclusion</h3>
<p>You've taken a major step into the world of deep learning by understanding the fundamentals of building, compiling, and fitting models with TensorFlow. This journey may have its challenges, but the rewards are significant – the ability to unlock powerful insights from your data and solve complex problems.</p>
<p>Here are some key takeaways to keep in mind as you continue your learning journey:</p>
<p><strong>Start small and iterate:</strong> Begin with simple models and gradually increase complexity as you gain confidence. Experiment with different architectures and hyperparameters to see their impact on performance.</p>
<p><strong>Leverage the community:</strong> Don't hesitate to seek help from the vast TensorFlow community. Utilize online resources, forums, and documentation to troubleshoot problems and learn from others' experiences.</p>
<p><strong>Practice makes perfect:</strong> The more you train models, the better you'll understand their behavior and potential pitfalls. Use diverse datasets and tasks to hone your skills and become a well-rounded machine learning practitioner.</p>
<p><strong>Stay curious and engaged:</strong> The field of deep learning is constantly evolving, with new tools and techniques emerging regularly. Keep up with the latest advancements and be open to exploring new ideas to stay ahead of the curve.</p>
<p>Remember, building and training effective models is not just about writing code. It's about understanding the problem you're trying to solve, choosing the right tools, and iteratively refining your approach. With dedication and a curious mind, you can harness the power of TensorFlow to build impactful solutions and become a valuable asset in the world of AI and machine learning.</p>
]]></content:encoded></item><item><title><![CDATA[Dear 2023, thank you!]]></title><description><![CDATA[Dearest gentle reader,
We won some. We lost some. There's nothing else I can add.
I wish you the very best of the 366 days ahead.
With love,
Wes.]]></description><link>https://kambale.dev/dear-2023-thank-you</link><guid isPermaLink="true">https://kambale.dev/dear-2023-thank-you</guid><category><![CDATA[2023]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Sun, 31 Dec 2023 11:33:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1704011425544/622c2dc4-be64-4548-a4c8-879372038f2d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Dearest gentle reader</em>,</p>
<p>We won some. We lost some. There's nothing else I can add.</p>
<p>I wish you the very best of the 366 days ahead.</p>
<p><em>With love,</em></p>
<p><em>Wes</em>.</p>
]]></content:encoded></item><item><title><![CDATA[Setting Up a TensorBoard in Google Colab]]></title><description><![CDATA[Introduction
Setting up TensorBoard in Google Colab can be incredibly useful for visualizing your machine learning model's training progress and performance. TensorBoard is a powerful tool that helps you monitor various metrics, visualize model archi...]]></description><link>https://kambale.dev/setting-up-a-tensorboard</link><guid isPermaLink="true">https://kambale.dev/setting-up-a-tensorboard</guid><category><![CDATA[Tensorboard]]></category><category><![CDATA[TensorFlow]]></category><category><![CDATA[Google Colab]]></category><dc:creator><![CDATA[Wesley Kambale]]></dc:creator><pubDate>Sun, 03 Sep 2023 21:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1692189864692/167bdf89-45a4-40d6-82de-be582c2b72c7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-introduction">Introduction</h3>
<p>Setting up TensorBoard in Google Colab can be incredibly useful for visualizing your machine learning model's training progress and performance. TensorBoard is a powerful tool that helps you monitor various metrics, visualize model architectures, and gain insights into your model's behavior. Here's a step-by-step tutorial with examples and code snippets to guide you through the process.</p>
<h3 id="heading-import-necessary-libraries">Import Necessary Libraries</h3>
<p>First, you need to import the required libraries. Make sure you have TensorFlow installed in your Colab environment.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorboard <span class="hljs-keyword">import</span> notebook
</code></pre>
<h3 id="heading-load-and-prepare-your-data">Load and Prepare Your Data</h3>
<p>For demonstration purposes, let's use a simple dataset. Replace this with your actual dataset and preprocessing steps.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Load and preprocess your data</span>
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / <span class="hljs-number">255.0</span>, x_test / <span class="hljs-number">255.0</span>  <span class="hljs-comment"># Normalize pixel values</span>
</code></pre>
<h3 id="heading-build-and-compile-your-model">Build and Compile Your Model</h3>
<p>Again, this is just a simple example. Replace it with your actual model architecture and configuration.</p>
<pre><code class="lang-python">model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(<span class="hljs-number">28</span>, <span class="hljs-number">28</span>)),
    tf.keras.layers.Dense(<span class="hljs-number">128</span>, activation=<span class="hljs-string">'relu'</span>),
    tf.keras.layers.Dropout(<span class="hljs-number">0.2</span>),
    tf.keras.layers.Dense(<span class="hljs-number">10</span>)
])

<span class="hljs-comment"># Compile the model</span>
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=<span class="hljs-literal">True</span>)
model.compile(optimizer=<span class="hljs-string">'adam'</span>, loss=loss_fn, metrics=[<span class="hljs-string">'accuracy'</span>])
</code></pre>
<h3 id="heading-set-up-tensorboard-callback">Set Up TensorBoard Callback</h3>
<p>Now, you'll create a TensorBoard callback that will save logs for visualization.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Define the log directory</span>
log_dir = <span class="hljs-string">"/content/logs"</span>  <span class="hljs-comment"># You can modify this path</span>

<span class="hljs-comment"># Create a TensorBoard callback</span>
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=<span class="hljs-number">1</span>)
</code></pre>
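<p>If you plan to train more than once, an optional refinement is to give each run its own timestamped subdirectory so the runs show up separately (and comparably) in the TensorBoard UI. A small sketch using Python's <code>datetime</code> module:</p>
<pre><code class="lang-python">import datetime

# One subdirectory per run, e.g. /content/logs/20230903-210000
run_dir = log_dir + "/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=run_dir, histogram_freq=1)
</code></pre>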
<h3 id="heading-train-your-model">Train Your Model</h3>
<p>Train your model using the <code>fit</code> function and include the TensorBoard callback.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Train the model</span>
model.fit(x_train, y_train, epochs=<span class="hljs-number">5</span>, callbacks=[tensorboard_callback])
</code></pre>
<h3 id="heading-start-tensorboard-in-colab">Start TensorBoard in Colab</h3>
<p>TensorBoard can be started directly within a Colab notebook using the <code>notebook</code> module (this is the same machinery the <code>%tensorboard</code> magic uses under the hood).</p>
<pre><code class="lang-python"><span class="hljs-comment"># Load TensorBoard in Colab</span>
notebook.start(<span class="hljs-string">'--logdir '</span> + log_dir)
</code></pre>
<h3 id="heading-access-and-visualize-tensorboard">Access and Visualize TensorBoard</h3>
<p>After running the previous cell, you'll see a link to access TensorBoard. Click on that link to open TensorBoard within your Colab environment. You can navigate through various tabs to visualize different aspects of your training process.</p>
<h3 id="heading-stop-tensorboard">Stop TensorBoard</h3>
<p>Once you're done with TensorBoard, the simplest option is to let the instance run until your Colab session ends; the <code>notebook</code> module does not provide a stop function. You can list the running instances, along with their ports and process IDs, using:</p>
<pre><code class="lang-python"># Show TensorBoard instances known to this notebook
notebook.list()
</code></pre>
<p>If you need to free the port, you can terminate the listed process from a Colab cell with a shell command such as <code>!kill</code> followed by that process ID.</p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>That's it! You've successfully set up TensorBoard in Google Colab to monitor and visualize your model's training progress.</p>
<p>Remember that this tutorial provided a basic example. Depending on your use case, you might need to adjust the code to suit your specific model architecture, dataset, and training configuration.</p>
]]></content:encoded></item></channel></rss>