{"id":247,"date":"2018-01-12T16:31:32","date_gmt":"2018-01-12T16:31:32","guid":{"rendered":"http:\/\/eipsoftware.com\/musings\/?p=247"},"modified":"2018-02-20T16:31:47","modified_gmt":"2018-02-20T16:31:47","slug":"read-csv-files-using-python-panda-dataframes","status":"publish","type":"post","link":"https:\/\/eipsoftware.com\/musings\/read-csv-files-using-python-panda-dataframes\/","title":{"rendered":"Read CSV files using Python Panda Dataframes"},"content":{"rendered":"<p>The great thing about the pandas library for Python is how easily it can manipulate data from different sources. We will look at one of the first things you will want to do: read a .csv file.<\/p>\n<p><!--more--><\/p>\n<h5>Read a .csv file into a pandas Dataframe<\/h5>\n<pre class=\"lang:python decode:true\">import pandas\r\nbuyclicksDF = pandas.read_csv(\"buy-clicks.csv\")\r\nbuyclicksDF.shape<\/pre>\n<p class=\"\">First we need to import the pandas library.<br \/>\nNext we read the .csv file into a dataframe and store it in a variable called buyclicksDF. Lastly we use the shape attribute to display the number of rows and columns.
The result is 2947 rows and 7 columns.<\/p>\n<pre class=\"lang:python decode:true \">(2947, 7)<\/pre>\n<p>Now let&#8217;s look at the first seven rows to see what the data looks like.<\/p>\n<pre class=\"lang:python decode:true \">buyclicksDF.head(7)<\/pre>\n<p>The output looks like this.<img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-254\" src=\"https:\/\/eipsoftware.com\/musings\/wp-content\/uploads\/2018\/02\/Screenshot-from-2018-02-20-11-19-08.png\" alt=\"\" width=\"445\" height=\"213\" srcset=\"https:\/\/eipsoftware.com\/musings\/wp-content\/uploads\/2018\/02\/Screenshot-from-2018-02-20-11-19-08.png 445w, https:\/\/eipsoftware.com\/musings\/wp-content\/uploads\/2018\/02\/Screenshot-from-2018-02-20-11-19-08-300x144.png 300w\" sizes=\"auto, (max-width: 445px) 100vw, 445px\" \/><\/p>\n<p>Now we will read another file into a dataframe and again look at the first seven rows.<\/p>\n<pre class=\"lang:python decode:true \">adclicksDF = pandas.read_csv(\"ad-clicks.csv\")\r\nadclicksDF.head(7)<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-255\" src=\"https:\/\/eipsoftware.com\/musings\/wp-content\/uploads\/2018\/02\/Screenshot-from-2018-02-20-11-24-10.png\" alt=\"\" width=\"512\" height=\"214\" srcset=\"https:\/\/eipsoftware.com\/musings\/wp-content\/uploads\/2018\/02\/Screenshot-from-2018-02-20-11-24-10.png 512w, https:\/\/eipsoftware.com\/musings\/wp-content\/uploads\/2018\/02\/Screenshot-from-2018-02-20-11-24-10-300x125.png 300w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\" \/><\/p>\n<h5>Joining Dataframes<\/h5>\n<p>Now that both dataframes are loaded, we will join them and then look at the first seven rows of the result.<\/p>\n<pre class=\"lang:python decode:true\">mergeDF = adclicksDF.merge(buyclicksDF, on=\"userId\")\r\nmergeDF.head(7)<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-256\" 
src=\"https:\/\/eipsoftware.com\/musings\/wp-content\/uploads\/2018\/02\/Screenshot-from-2018-02-20-11-27-45.png\" alt=\"\" width=\"923\" height=\"210\" srcset=\"https:\/\/eipsoftware.com\/musings\/wp-content\/uploads\/2018\/02\/Screenshot-from-2018-02-20-11-27-45.png 923w, https:\/\/eipsoftware.com\/musings\/wp-content\/uploads\/2018\/02\/Screenshot-from-2018-02-20-11-27-45-300x68.png 300w, https:\/\/eipsoftware.com\/musings\/wp-content\/uploads\/2018\/02\/Screenshot-from-2018-02-20-11-27-45-768x175.png 768w, https:\/\/eipsoftware.com\/musings\/wp-content\/uploads\/2018\/02\/Screenshot-from-2018-02-20-11-27-45-900x205.png 900w\" sizes=\"auto, (max-width: 923px) 100vw, 923px\" \/><\/p>\n<p>To join the dataframes, call the <strong>.merge<\/strong> method on one dataframe, pass it the other dataframe, and specify which column or columns to join on. In this simple case both dataframes have a column labeled userId, so we join on that column. By default, .merge performs an inner join, so only userId values present in both dataframes appear in the result. Finally, we store the result in a new dataframe named mergeDF.<\/p>\n<p>I hope you found this informative. Let me know in the comments below if you have any questions.<\/p>\n<p><a href=\"mailto:michael.data@eipsoftware.com\">\u2014 michael.data@eipsoftware.com<\/a><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The great thing about the pandas library for Python is how easily it can manipulate data from different sources. We will look at one of the first things you will want to do: read a .csv 
file.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","footnotes":""},"categories":[3,8,4],"tags":[28,30,50],"series":[],"class_list":["post-247","post","type-post","status-publish","format-standard","hentry","category-python","category-pandas","category-code","tag-python","tag-code","tag-pandas"],"_links":{"self":[{"href":"https:\/\/eipsoftware.com\/musings\/wp-json\/wp\/v2\/posts\/247","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/eipsoftware.com\/musings\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/eipsoftware.com\/musings\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/eipsoftware.com\/musings\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/eipsoftware.com\/musings\/wp-json\/wp\/v2\/comments?post=247"}],"version-history":[{"count":4,"href":"https:\/\/eipsoftware.com\/musings\/wp-json\/wp\/v2\/posts\/247\/revisions"}],"predecessor-version":[{"id":257,"href":"https:\/\/eipsoftware.com\/musings\/wp-json\/wp\/v2\/posts\/247\/revisions\/257"}],"wp:attachment":[{"href":"https:\/\/eipsoftware.com\/musings\/wp-json\/wp\/v2\/media?parent=247"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/eipsoftware.com\/musings\/wp-json\/wp\/v2\/categories?post=247"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/eipsoftware.com\/musings\/wp-json\/wp\/v2\/tags?post=247"},{"taxonomy":"series","embeddable":true,"href":"https:\/\/eipsoftware.com\/musings\/wp-json\/wp\/v2\/series?post=247"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}