{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This time we are going to pull data directly from the internet.\n",
"Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.\n",
"\n",
"### Step 1. Import the necessary libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Excercices taken from https://github.com/guipsamora/pandas_exercises"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3. Assign it to a variable called chipo."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'\n",
" \n",
"chipo = pd.read_csv(url, sep = '\\t')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 4. See the first 10 entries"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
\n",
" \n",
" \n",
" | \n",
" order_id | \n",
" quantity | \n",
" item_name | \n",
" choice_description | \n",
" item_price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" Chips and Fresh Tomato Salsa | \n",
" NaN | \n",
" $2.39 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" Izze | \n",
" [Clementine] | \n",
" $3.39 | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" Nantucket Nectar | \n",
" [Apple] | \n",
" $3.39 | \n",
"
\n",
" \n",
" 3 | \n",
" 1 | \n",
" 1 | \n",
" Chips and Tomatillo-Green Chili Salsa | \n",
" NaN | \n",
" $2.39 | \n",
"
\n",
" \n",
" 4 | \n",
" 2 | \n",
" 2 | \n",
" Chicken Bowl | \n",
" [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | \n",
" $16.98 | \n",
"
\n",
" \n",
" 5 | \n",
" 3 | \n",
" 1 | \n",
" Chicken Bowl | \n",
" [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | \n",
" $10.98 | \n",
"
\n",
" \n",
" 6 | \n",
" 3 | \n",
" 1 | \n",
" Side of Chips | \n",
" NaN | \n",
" $1.69 | \n",
"
\n",
" \n",
" 7 | \n",
" 4 | \n",
" 1 | \n",
" Steak Burrito | \n",
" [Tomatillo Red Chili Salsa, [Fajita Vegetables... | \n",
" $11.75 | \n",
"
\n",
" \n",
" 8 | \n",
" 4 | \n",
" 1 | \n",
" Steak Soft Tacos | \n",
" [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | \n",
" $9.25 | \n",
"
\n",
" \n",
" 9 | \n",
" 5 | \n",
" 1 | \n",
" Steak Burrito | \n",
" [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | \n",
" $9.25 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" order_id quantity item_name \\\n",
"0 1 1 Chips and Fresh Tomato Salsa \n",
"1 1 1 Izze \n",
"2 1 1 Nantucket Nectar \n",
"3 1 1 Chips and Tomatillo-Green Chili Salsa \n",
"4 2 2 Chicken Bowl \n",
"5 3 1 Chicken Bowl \n",
"6 3 1 Side of Chips \n",
"7 4 1 Steak Burrito \n",
"8 4 1 Steak Soft Tacos \n",
"9 5 1 Steak Burrito \n",
"\n",
" choice_description item_price \n",
"0 NaN $2.39 \n",
"1 [Clementine] $3.39 \n",
"2 [Apple] $3.39 \n",
"3 NaN $2.39 \n",
"4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 \n",
"5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... $10.98 \n",
"6 NaN $1.69 \n",
"7 [Tomatillo Red Chili Salsa, [Fajita Vegetables... $11.75 \n",
"8 [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... $9.25 \n",
"9 [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... $9.25 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.head(10)\n",
"# chipo['choice_description'][4]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 5. What is the number of observations in the dataset?"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 4622 entries, 0 to 4621\n",
"Data columns (total 5 columns):\n",
"order_id 4622 non-null int64\n",
"quantity 4622 non-null int64\n",
"item_name 4622 non-null object\n",
"choice_description 3376 non-null object\n",
"item_price 4622 non-null object\n",
"dtypes: int64(2), object(3)\n",
"memory usage: 180.6+ KB\n"
]
},
{
"data": {
"text/plain": [
"4622"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.info()#\n",
"\n",
"# OR\n",
"\n",
"chipo.shape[0]\n",
"# 4622 observations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 6. What is the number of columns in the dataset?"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.shape[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 7. Print the name of all the columns."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Index([u'order_id', u'quantity', u'item_name', u'choice_description',\n",
" u'item_price'],\n",
" dtype='object')"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 8. How is the dataset indexed?"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"RangeIndex(start=0, stop=4622, step=1)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 9. Which was the most ordered item? "
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" order_id | \n",
" quantity | \n",
"
\n",
" \n",
" item_name | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" Chicken Bowl | \n",
" 713926 | \n",
" 761 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" order_id quantity\n",
"item_name \n",
"Chicken Bowl 713926 761"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c = chipo.groupby('item_name')\n",
"c = c.sum()\n",
"c = c.sort_values(['quantity'], ascending=False)\n",
"c.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 10. How many items were ordered?"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" order_id | \n",
" quantity | \n",
"
\n",
" \n",
" item_name | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" Chicken Bowl | \n",
" 713926 | \n",
" 761 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" order_id quantity\n",
"item_name \n",
"Chicken Bowl 713926 761"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c = chipo.groupby('item_name')\n",
"c = c.sum()\n",
"c = c.sort_values(['quantity'], ascending=False)\n",
"c.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 11. What was the most ordered item in the choice_description column?"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" order_id | \n",
" quantity | \n",
"
\n",
" \n",
" choice_description | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" [Diet Coke] | \n",
" 123455 | \n",
" 159 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" order_id quantity\n",
"choice_description \n",
"[Diet Coke] 123455 159"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c = chipo.groupby('choice_description').sum()\n",
"c = c.sort_values(['quantity'], ascending=False)\n",
"c.head(1)\n",
"# Diet Coke 159"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 12. How many items were orderd in total?"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"4972"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"total_items_orders = chipo.quantity.sum()\n",
"total_items_orders"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 13. Turn the item price into a float"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"dollarizer = lambda x: float(x[1:-1])\n",
"chipo.item_price = chipo.item_price.apply(dollarizer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 14. How much was the revenue for the period in the dataset?"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Revenue was: $39237.02\n"
]
}
],
"source": [
"revenue = (chipo['quantity']* chipo['item_price']).sum()\n",
"\n",
"print('Revenue was: $' + str(np.round(revenue,2)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 15. How many orders were made in the period?"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1834"
]
},
"execution_count": 130,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.order_id.value_counts().count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 16. What is the average amount per order?"
]
},
{
"cell_type": "code",
"execution_count": 140,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"18.811428571428689"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"order_grouped = chipo.groupby(by=['order_id']).sum()\n",
"order_grouped.mean()['item_price']\n",
"\n",
"# Or \n",
"\n",
"#chipo.groupby(by=['order_id']).sum().mean()['item_price']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 17. How many different items are sold?"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"50"
]
},
"execution_count": 148,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.item_name.value_counts().count()"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}