Doggy boots size recommendation - PBL 01 - Azure Data Scientist Associate
- Tung San
- Jul 26, 2021
- 3 min read
Updated: Aug 7, 2021
Background: A client, a company that sells harnesses and doggy boots for avalanche-rescue dogs, finds that its customers often order the correct harness size for their rescue dogs but not the correct boot size. Customers also usually order harnesses and doggy boots for their rescue dogs at the same time.
Objective: Build a model that recommends a doggy boot size when the client's customers order a harness.
1. Download the package, helper script, and dataset
!pip install wget
import os
import wget
url1 = "https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/graphing.py"
url2 = "https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/Data/doggy-boot-harness.csv"
outdir = "JUPY_download"
# wget.download() treats `out` as a file name unless that directory already exists
os.makedirs(outdir, exist_ok=True)
wget.download(url1, out=outdir)
wget.download(url2, out=outdir)
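If you would rather not depend on the third-party `wget` package, the same step can be sketched with the standard library. This is a minimal sketch, not the original notebook's code: `download_to_folder` is a hypothetical helper, and it assumes the local file name is the last segment of the URL path. The `fetch` parameter defaults to `urllib.request.urlretrieve` and exists only so the function can be tried without network access.

```python
import os
import urllib.parse
import urllib.request


def download_to_folder(url, outdir, fetch=urllib.request.urlretrieve):
    """Download `url` into `outdir` and return the local file path.

    Hypothetical helper: the folder is created first, and the file name is
    taken from the last segment of the URL path.
    """
    # Create the target folder so the file always lands inside it
    os.makedirs(outdir, exist_ok=True)
    filename = os.path.basename(urllib.parse.urlparse(url).path)
    local_path = os.path.join(outdir, filename)
    fetch(url, local_path)
    return local_path
```

For example, `download_to_folder(url2, "JUPY_download")` would save the CSV as `JUPY_download/doggy-boot-harness.csv`.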
2. Prepare data
import pandas
# Load a file containing dog's boot and harness sizes
path1 = "./" + outdir + "/" + "doggy-boot-harness.csv"
dataset = pandas.read_csv(path1)
# Print the first few rows
dataset.head()

3. Visualize data
# Remove the sex and age-in-years columns.
del dataset["sex"]
del dataset["age_years"]
print(dataset.mean())
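To make the column-dropping and averaging step concrete without pandas, here is a stdlib-only sketch on a tiny sample. The four rows are invented for illustration and are not taken from doggy-boot-harness.csv; only the column names match. Like `del dataset["sex"]`, `del dataset["age_years"]` and `dataset.mean()`, it averages each remaining column separately.

```python
import csv
import io
import statistics

# Invented sample rows that reuse the course dataset's column names
SAMPLE_CSV = """boot_size,harness_size,sex,age_years
40,60,male,5.0
38,56,female,3.5
36,50,female,9.0
38,54,male,7.5
"""


def column_means(csv_text, drop=("sex", "age_years")):
    """Return {column: mean} for every column not listed in `drop`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Keep the header order, minus the dropped columns
    kept = [name for name in rows[0] if name not in drop]
    return {name: statistics.mean(float(row[name]) for row in rows) for name in kept}


print(column_means(SAMPLE_CSV))
```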

Small dogs
# This creates a True or False value for each row, where True means the dog's harness size is smaller than the mean (55.64)
is_small_dogs = dataset.harness_size < dataset.harness_size.mean()
# Now apply this 'mask' to our data to keep only the smaller dogs
data_small_dogs = dataset[is_small_dogs]
print(f"The small dogs have IDs {list(data_small_dogs.index)}")

(This assumes that customers usually choose the correct harness size for their dogs, and that harness size is a good proxy for dog size.)
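The 'mask' idea does not depend on pandas: compute one True/False value per row, then keep the rows (here, their positions) where the mask is True. A minimal stdlib sketch on invented harness sizes, where the list position plays the role of the DataFrame index:

```python
import statistics

# Invented harness sizes in cm; list position stands in for the row index
harness_sizes = [58, 52, 61, 47, 55, 66]

mean_size = statistics.mean(harness_sizes)
# One boolean per row: True when that dog is smaller than average
is_small = [size < mean_size for size in harness_sizes]
# Keep the positions where the mask is True, like list(dataset[mask].index)
small_dog_ids = [i for i, flag in enumerate(is_small) if flag]

print(mean_size, small_dog_ids)
```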
Among dogs with small paws
# Dogs with boot size smaller than 40 are said to have small paws
data_small_paws = dataset[dataset.boot_size < 40].copy()
# Load and prepare plotly to create our graphs
import sys
import plotly.express
# graphing.py was downloaded into outdir, so make that folder importable
sys.path.append(outdir)
import graphing  # this is a custom file you can find in our code on GitHub
# Show a graph of harness size by boot size:
plotly.express.scatter(data_small_paws, x="harness_size", y="boot_size")

Overall
plotly.express.scatter(dataset, x="harness_size", y="boot_size", title="All Dogs")

Observe that boot size varies quite a lot within each harness size. The variation in the tails looks smaller, but that is probably because there are only a few dogs at those sizes.
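One way to put numbers on the variation at each harness size is to group boot sizes by harness size and report each group's count and sample standard deviation. Groups with only one dog have no defined spread at all, which is the small-data caveat about the tails. A stdlib sketch on invented (harness_size, boot_size) pairs; `spread_by_level` is a hypothetical helper name:

```python
import statistics
from collections import defaultdict

# Invented (harness_size, boot_size) pairs for illustration only
pairs = [(52, 36), (52, 39), (52, 37), (58, 38), (58, 41), (58, 39), (66, 43)]


def spread_by_level(pairs):
    """Return {harness_size: (count, sample stdev or None)} for each level."""
    groups = defaultdict(list)
    for harness, boot in pairs:
        groups[harness].append(boot)
    result = {}
    for harness, boots in sorted(groups.items()):
        # The sample standard deviation needs at least two observations
        sd = statistics.stdev(boots) if len(boots) >= 2 else None
        result[harness] = (len(boots), sd)
    return result


for level, (n, sd) in spread_by_level(pairs).items():
    print(level, n, sd)
```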
4. Modeling
import pandas
# Load the dataset from the folder it was downloaded to
data = pandas.read_csv(path1)
!pip install statsmodels
import statsmodels.formula.api as smf
# Fit a simple model that finds a linear relationship between boot size and harness size
model = smf.ols(formula="boot_size ~ harness_size", data=data).fit()
import os
import joblib
new_model_filename = './avalanche_dogboot_linearmodel.pkl'
joblib.dump(model, new_model_filename)
if os.path.isfile(new_model_filename):
    print("Model saved!")

model_loaded = joblib.load(new_model_filename)
print("We have loaded a model with the following parameters:")
print(model_loaded.params)
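The formula `boot_size ~ harness_size` asks `smf.ols` for a straight line, boot_size = Intercept + slope * harness_size, fitted by ordinary least squares; `model.params` holds those two numbers. With a single predictor the least-squares estimates have a closed form: slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and intercept = mean(y) - slope * mean(x). The stdlib sketch below computes them directly on invented points; it is a cross-check on the idea, not a replacement for statsmodels, which also reports standard errors and other diagnostics.

```python
import statistics


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    # Sum of cross-deviations divided by the sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Invented points that lie exactly on boot = 10 + 0.5 * harness
harness = [40, 50, 60, 70]
boots = [30, 35, 40, 45]
intercept, slope = fit_simple_ols(harness, boots)
print(intercept, slope)
```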

5. Deploy trained model
1st method: A function that takes a harness size as input and outputs the predicted boot size.
# Let's write a function that loads and uses our model
def load_model_and_predict(harness_size):
    '''
    This function loads a pretrained model. It uses the model
    with the customer's dog's harness size to predict the size of
    boots that will fit that dog.
    harness_size: The dog harness size, in cm
    '''
    # Load the model from file and print basic information about it
    loaded_model = joblib.load(new_model_filename)
    print("We have loaded a model with the following parameters:")
    print(loaded_model.params)
    # Prepare data for the model
    inputs = {"harness_size": [harness_size]}
    # Use the model to make a prediction
    predicted_boot_size = loaded_model.predict(inputs)[0]
    return predicted_boot_size

# Practice using our model
predicted_boot_size = load_model_and_predict(45)
print("Predicted dog boot size:", predicted_boot_size)
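`load_model_and_predict` reads the model file from disk on every call. Behind a check-out page you would probably load it once and reuse it. The sketch below shows that pattern with `functools.lru_cache`. To stay self-contained it pickles a hypothetical plain dict of parameters (invented values) instead of the statsmodels results object saved with `joblib`, and `load_params` and `predict_boot_size` are hypothetical names.

```python
import functools
import os
import pickle
import tempfile

# Hypothetical stand-in for the fitted model: intercept and slope (invented values)
PARAMS = {"Intercept": 6.0, "harness_size": 0.6}
MODEL_PATH = os.path.join(tempfile.gettempdir(), "dogboot_params_demo.pkl")

with open(MODEL_PATH, "wb") as f:
    pickle.dump(PARAMS, f)


@functools.lru_cache(maxsize=None)
def load_params(path):
    """Read the parameters from disk once per path; later calls reuse the result."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_boot_size(harness_size, path=MODEL_PATH):
    """Linear prediction: Intercept + slope * harness_size."""
    params = load_params(path)
    return params["Intercept"] + params["harness_size"] * harness_size


print(predict_boot_size(45))
print(load_params.cache_info())
```

The trade-off is that a replaced model file is not picked up automatically; calling `load_params.cache_clear()` forces the next prediction to reload it.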

2nd method: A function that checks the customer's chosen boot size and gives a recommendation.
def check_size_of_boots(selected_harness_size, selected_boot_size):
    '''
    Calculates whether the customer has chosen a pair of doggy boots that
    are a sensible size. This works by estimating the dog's actual boot
    size from their harness size.
    This returns a message for the customer that should be shown before
    they complete their payment.
    selected_harness_size: The size of the harness the customer wants to buy
    selected_boot_size: The size of the doggy boots the customer wants to buy
    '''
    # Estimate the customer's dog's boot size
    estimated_boot_size = load_model_and_predict(selected_harness_size)
    # Round to the nearest whole number because we don't sell partial sizes
    estimated_boot_size = int(round(estimated_boot_size))
    # Check if the boot size selected is appropriate
    if selected_boot_size == estimated_boot_size:
        # The selected boots are probably OK
        return "Great choice! We think these boots will fit your avalanche dog well."
    if selected_boot_size < estimated_boot_size:
        # Selected boots might be too small
        return "The boots you have selected might be TOO SMALL for a dog as "\
            f"big as yours. We recommend a doggy boots size of {estimated_boot_size}."
    if selected_boot_size > estimated_boot_size:
        # Selected boots might be too big
        return "The boots you have selected might be TOO BIG for a dog as "\
            f"small as yours. We recommend a doggy boots size of {estimated_boot_size}."

# Practice using our new warning system
check_size_of_boots(selected_harness_size=55, selected_boot_size=39)
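Two details of `check_size_of_boots` are worth testing in isolation. First, Python 3's built-in `round` rounds exact halves to the nearest even number, so an estimate of 38.5 becomes 38 while 39.5 becomes 40; if the shop wants halves to always round up, it has to do that explicitly. Second, the comparison logic does not need the model at all. The sketch below separates both. `round_half_up` and `compare_boot_size` are hypothetical helper names, and round-half-up is an alternative policy, not what the original function does.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, sending exact halves upward (38.5 -> 39)."""
    return math.floor(x + 0.5)


def compare_boot_size(selected_boot_size, estimated_boot_size):
    """Return 'ok', 'too small' or 'too big' for an already-rounded estimate."""
    if selected_boot_size == estimated_boot_size:
        return "ok"
    if selected_boot_size < estimated_boot_size:
        return "too small"
    return "too big"


# Compare the built-in (half-to-even) rounding with half-up rounding
for estimate in (38.5, 39.5, 39.2):
    print(estimate, round(estimate), round_half_up(estimate))

print(compare_boot_size(39, round_half_up(39.5)))
```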
