# Category: Python

Page 1 of 2

Where do you live? It may seem like a simple question, but when a company asks where someone lives, the answer isn’t always so easy. A person’s physical address and mailing address can be two completely separate things, and when companies want to send direct mail, they need to ensure the message goes to the appropriate one.

Today I am going to show a Python program I wrote that can process millions of addresses and verify whether each one is a valid mailing address that matches the physical address. I will also show how the majority of online mapping services get this wrong.

“I’m not gonna write you a love song” (Sara Bareilles)

## Post 1 of 4

A question I’ve always wondered about is whether there is a magic formula for creating a number one song. Many songwriters are prolific, yet their songs don’t necessarily achieve commercial success. It may be difficult to quantify how the musical composition relates to a song’s success. However, I am going to attempt to evaluate song lyrics to determine whether we can accurately predict if a song will be commercially successful.

To obtain the data, I am going to use Beautiful Soup and a few other packages to scrape the content from the websites.

## Steps for Getting the Data

The first step is getting the list of the songs by the artist. I am using the website http://www.azlyrics.com to obtain the list of songs and the lyrics for the songs.

#### Computing the Minimum Number of Flight Segments

If we want to compute the minimum number of flight segments between a starting city and target city, we can construct an undirected graph.  In the graph the nodes represent cities and the edges represent the flight segments.  We can count the number of segments to determine the shortest distance.

The same approach applies to any shortest-path problem in which every edge counts equally (an unweighted graph).  It is an implementation of the breadth-first search algorithm.

See code below.
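The original listing is not reproduced in this excerpt, so what follows is only a minimal sketch of breadth-first search under some assumptions: the graph is stored as a dictionary of adjacency lists, and the function names (`build_graph`, `min_segments`) and airport codes are illustrative rather than taken from the post.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency list from (city_a, city_b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def min_segments(adj, start, target):
    """Return the fewest flight segments from start to target, or -1 if unreachable."""
    dist = {start: 0}          # segments needed to reach each discovered city
    queue = deque([start])     # cities waiting to be explored, nearest first
    while queue:
        city = queue.popleft()
        if city == target:
            return dist[city]
        for neighbor in adj.get(city, []):
            if neighbor not in dist:   # first visit is always the shortest
                dist[neighbor] = dist[city] + 1
                queue.append(neighbor)
    return -1


# Hypothetical route map used only for illustration.
flights = [("SEA", "DEN"), ("DEN", "ORD"), ("ORD", "JFK"),
           ("SEA", "ORD"), ("MIA", "ATL")]
graph = build_graph(flights)
print(min_segments(graph, "SEA", "JFK"))  # 2 (SEA -> ORD -> JFK)
print(min_segments(graph, "SEA", "MIA"))  # -1 (no connecting flights)
```

Because BFS explores cities in order of increasing distance from the start, the first time it reaches the target it has used the fewest possible segments.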

#### Testing for Unconnected Components in an Undirected Graph

With a graph structure, it is possible that parts of the graph will not be connected to each other.  Social networks are an example: not every user is linked to every other user through a chain of friends.

The code will find the total number of connected components (separate parts) of an undirected graph.

See code below.
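The listing itself is not included in this excerpt. As a hedged sketch of one common way to count components, the snippet below runs a depth-first traversal from every unvisited node; the function name `count_components` and the sample friendship data are assumptions made for illustration.

```python
def count_components(adj):
    """Count connected components in an undirected graph given as an adjacency dict."""
    visited = set()
    components = 0
    for node in adj:
        if node in visited:
            continue
        # An unvisited node starts a new component; explore all of it.
        components += 1
        visited.add(node)
        stack = [node]
        while stack:
            current = stack.pop()
            for neighbor in adj[current]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    stack.append(neighbor)
    return components


# Hypothetical friendship graph: every user appears as a key.
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob"],
    "dan": ["eve"],
    "eve": ["dan"],
    "fay": [],
}
print(count_components(friends))  # 3
```

Each traversal marks every node reachable from its starting point, so the number of traversals started equals the number of components.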

#### Finding an Exit from a Maze Using Undirected Graphs

We can think of a maze as a rectangular grid of cells with paths between adjacent cells. If we want to find out whether there is a path from a given cell to a given exit, where the exit is also represented by a cell, we can represent the maze as an undirected graph.

The nodes of the graph are the cells of the maze, and two nodes are connected by an undirected edge if they are adjacent and there is no wall between them. Therefore, we just need to check whether a path (a series of edges connecting nodes) exists between the two nodes.

See code below.
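The original code is not part of this excerpt. The sketch below shows one possible reachability check, assuming cells are `(row, col)` tuples and the maze is described by a list of open passages between adjacent cells; `build_maze_graph`, `has_path`, and the tiny maze are hypothetical.

```python
def build_maze_graph(passages):
    """Turn a list of open passages (cell_a, cell_b) into an undirected graph."""
    adj = {}
    for a, b in passages:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def has_path(adj, start, exit_cell):
    """Return True if exit_cell can be reached from start."""
    seen = {start}
    stack = [start]
    while stack:
        cell = stack.pop()
        if cell == exit_cell:
            return True
        for nxt in adj.get(cell, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# A 2x3 maze: the right-hand column is walled off from the rest.
passages = [
    ((0, 0), (0, 1)),
    ((0, 1), (1, 1)),
    ((1, 1), (1, 0)),
    ((0, 2), (1, 2)),
]
maze = build_maze_graph(passages)
print(has_path(maze, (0, 0), (1, 0)))  # True
print(has_path(maze, (0, 0), (1, 2)))  # False
```

Only reachability matters here, not path length, so a depth-first traversal is sufficient; breadth-first search would give the same answer.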

#### Spark – Streaming Data, Capturing and Querying

Today we will look at how to capture streaming data and perform some simple queries as the data is streamed. We will use the regular expressions library and the PySpark library.  The streaming data comes from a weather station that transmits different weather measurements at different intervals.  We will need to pick the correct data out of the stream and output the results.

Let’s get started.

#### Spark – SQLContext

Today we will look at the SQLContext object from the PySpark library and how you can use it to connect to a local database.  In the example below we will:

- Connect to a local PostgreSQL database and read the contents into a data frame
- Run some simple SQL queries
- Join two data frames together

Let’s get started.

#### Hadoop Spark – Word Count

One of the first things to do in most programming languages is create a “Hello World!” program.  The equivalent in Spark is to create a program that will read the contents of a file and count the number of occurrences of each word.

Below I will show a basic example, so let’s start counting.
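The Spark listing is not included in this excerpt. To make the logic concrete without a cluster, here is a plain-Python sketch of the same idea (it does not use Spark): split each line into words, then tally each word. In PySpark the splitting step is typically expressed with `flatMap` and the tallying with `reduceByKey`. The function name `count_words`, the tokenizing pattern, and the sample lines are assumptions.

```python
import re
from collections import Counter


def count_words(lines):
    """Count occurrences of each word across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Lowercase so "The" and "the" are counted together; keep letters and apostrophes.
        words = re.findall(r"[a-z']+", line.lower())
        counts.update(words)
    return counts


# Hypothetical input standing in for the contents of a file.
sample = ["the quick brown fox", "jumps over the lazy dog", "The end"]
counts = count_words(sample)
print(counts.most_common(1))  # [('the', 3)]
```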

The great thing about the pandas library for Python is how easily it can manipulate data sources.  We will look at one of the first things you will want to do: read a .csv file.
