
Doggy boots size recommendation - PBL 01 - Azure Data Scientist Associate

  • Writer: Tung San
  • Jul 26, 2021

Updated: Aug 7, 2021

Background: A client that sells harnesses and doggy boots for avalanche-rescue dogs finds that its customers usually order the correct harness size for their rescue dogs but often order the wrong boot size. Customers also usually order harnesses and doggy boots for their dogs at the same time.



Objective: Build a model that recommends a doggy boot size when a customer orders a harness.



1. Install wget and download the helper script and dataset

!pip install wget

import os
import wget

url1 = "https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/graphing.py"
url2 = "https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/Data/doggy-boot-harness.csv"

# Create the output folder first; if it does not exist, wget treats "JUPY_download" as a file name
outdir = "JUPY_download"
os.makedirs(outdir, exist_ok=True)

wget.download(url1, out=outdir)
wget.download(url2, out=outdir)



2. Prepare data

import pandas

# Load the file containing each dog's boot and harness sizes
path1 = "./" + outdir + "/" + "doggy-boot-harness.csv"
dataset = pandas.read_csv(path1)

# Print the first few rows
dataset.head()

3. Visualize data

# Remove the sex and age-in-years columns so that only the two size columns remain
del dataset["sex"]
del dataset["age_years"]

# Print the mean of each remaining column
print(dataset.mean())

Small dogs

# Create a True/False value for each row, where True means the dog's harness size
# is below the mean harness size (about 55.64 in this dataset)
is_small_dogs = dataset.harness_size < dataset.harness_size.mean()

# Apply this mask to the data to keep only the smaller dogs
data_small_dogs = dataset[is_small_dogs]

print(f"The small dogs have ids {list(data_small_dogs.index)}")

(This assumes that customers usually choose the correct harness size for their dogs, so harness size can serve as a proxy for dog size.)
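The mask works row by row: each dog is flagged True when its harness size is below the average, and indexing with the mask keeps only the flagged rows. The plain-Python sketch below mimics that logic on a few made-up rows (the sizes and ids are hypothetical, not taken from the dataset), so you can see what `dataset[is_small_dogs]` keeps.

```python
from statistics import mean

# Hypothetical sample rows: id -> (harness_size, boot_size); not the real dataset
dogs = {
    0: (58, 39),
    1: (52, 37),
    2: (62, 41),
    3: (50, 36),
}

# Mean harness size across all dogs
mean_harness = mean(h for h, _ in dogs.values())

# Boolean mask: True where the dog's harness size is below the mean
is_small_dog = {dog_id: h < mean_harness for dog_id, (h, _) in dogs.items()}

# Keep only the ids whose mask value is True (like dataset[is_small_dogs])
small_dog_ids = [dog_id for dog_id, small in is_small_dog.items() if small]

print(f"Mean harness size: {mean_harness}")
print(f"The small dogs have ids {small_dog_ids}")
```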


Among dogs with small paws

# Dogs with a boot size smaller than 40 are said to have small paws
data_small_paws = dataset[dataset.boot_size < 40].copy()

# Load plotly to create our graphs
import sys
import plotly.express

# graphing.py is the custom helper downloaded in step 1 (not needed for the plots below);
# add its folder to the import path so that the import succeeds
sys.path.append(outdir)
import graphing

# Show a graph of harness size by boot size
plotly.express.scatter(data_small_paws, x="harness_size", y="boot_size")


Overall

plotly.express.scatter(dataset, x="harness_size", y="boot_size", title="All Dogs")

Observe that boot size varies considerably within each harness size. The spread looks smaller at the extremes, but that is likely because only a few dogs fall there.
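To put a number on how much boot size varies within each harness size, and on how few dogs sit at the extremes, you can count the dogs and compute the standard deviation of boot size per harness size. With pandas this is `dataset.groupby("harness_size")["boot_size"].agg(["count", "std"])`; the stdlib sketch below shows the same idea on hypothetical pairs (not the real data).

```python
from collections import defaultdict
from statistics import stdev

# Hypothetical (harness_size, boot_size) pairs; replace with the real columns
pairs = [(50, 35), (50, 38), (50, 36), (55, 38), (55, 41), (60, 40)]

# Group boot sizes by harness size
boots_by_harness = defaultdict(list)
for harness, boot in pairs:
    boots_by_harness[harness].append(boot)

# For each harness size, record the count and the sample standard deviation
# (stdev needs at least two values, so a group with one dog gets None)
spread = {
    harness: (len(boots), stdev(boots) if len(boots) > 1 else None)
    for harness, boots in sorted(boots_by_harness.items())
}

for harness, (count, sd) in spread.items():
    print(harness, count, sd)
```

A group with a count of one or two cannot show much spread, which is why small counts at the tails make the variation there look artificially low.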



4. Modeling

import pandas

# Reload the full dataset (the same file downloaded in step 1)
data = pandas.read_csv(path1)

!pip install statsmodels

import statsmodels.formula.api as smf

# Fit a simple model that finds a linear relationship between boot size and harness size
model = smf.ols(formula="boot_size ~ harness_size", data=data).fit()
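With a single predictor, the formula `boot_size ~ harness_size` fits boot_size ≈ intercept + slope × harness_size, and ordinary least squares then has a closed-form solution: slope = cov(x, y) / var(x) and intercept = mean(y) - slope × mean(x). The stdlib sketch below computes those two numbers directly, which can be handy for sanity-checking `model.params`; it does not reproduce the extra statistics statsmodels reports, and the sample values are hypothetical.

```python
from statistics import mean

def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical harness/boot sizes lying exactly on boot = 0.5 * harness + 10
harness = [40, 50, 60, 70]
boots = [30, 35, 40, 45]
intercept, slope = fit_simple_ols(harness, boots)
print(intercept, slope)
```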


import os
import joblib

# Save the fitted model to disk
new_model_filename = './avalanche_dogboot_linearmodel.pkl'
joblib.dump(model, new_model_filename)

if os.path.isfile(new_model_filename):
    print("Model saved!")

# Load the model back to check that it was saved correctly
model_loaded = joblib.load(new_model_filename)

print("We have loaded a model with the following parameters:")
print(model_loaded.params)
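If the deployed service only needs the intercept and slope, one lightweight alternative (an assumption about your deployment needs, not what this exercise does) is to store the coefficients as JSON, for example from `model.params.to_dict()`, and compute predictions yourself as intercept + slope × harness_size. The sketch below round-trips a hypothetical coefficient dictionary through a temporary file; the values are illustrative, not the fitted ones.

```python
import json
import os
import tempfile

# Hypothetical coefficients; in practice these could come from model.params.to_dict()
params = {"Intercept": 10.0, "harness_size": 0.5}

# Write the coefficients to a JSON file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "avalanche_dogboot_params.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(params, f)

# Read them back
with open(path, encoding="utf-8") as f:
    loaded_params = json.load(f)

print(loaded_params)
```

Unlike a pickle, a JSON file is human-readable and does not depend on the statsmodels version installed where it is loaded.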



5. Deploy trained model


Method 1: a function that takes a harness size and returns the predicted boot size.

# Let's write a function that loads and uses our model
def load_model_and_predict(harness_size):
    '''
    This function loads a pretrained model. It uses the model
    with the customer's dog's harness size to predict the size of
    boots that will fit that dog.

    harness_size: The dog harness size, in cm
    '''

    # Load the model from file and print basic information about it
    loaded_model = joblib.load(new_model_filename)

    print("We have loaded a model with the following parameters:")
    print(loaded_model.params)

    # Prepare data for the model
    inputs = {"harness_size": [harness_size]}

    # Use the model to make a prediction
    predicted_boot_size = loaded_model.predict(inputs)[0]

    return predicted_boot_size


# Practice using our model
predicted_boot_size = load_model_and_predict(45)

print("Predicted dog boot size:", predicted_boot_size)
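For a straight-line model, prediction boils down to intercept + slope × harness_size. The sketch below makes that explicit and loads the coefficients only once (cached with `functools.lru_cache`) instead of reading the file on every call, as `load_model_and_predict` does. The helper names `get_params` and `predict_boot_size` and the coefficient values are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_params():
    # Hypothetical stand-in for loading the saved model once;
    # in practice this could return joblib.load(...).params.to_dict()
    print("Loading coefficients...")
    return {"Intercept": 10.0, "harness_size": 0.5}

def predict_boot_size(harness_size):
    # The coefficients are cached after the first call
    params = get_params()
    return params["Intercept"] + params["harness_size"] * harness_size

print(predict_boot_size(45))
print(predict_boot_size(60))
```

"Loading coefficients..." is printed only once, however many predictions are made.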


Method 2: a function that checks the customer's chosen boot size against the predicted size and returns a recommendation.

def check_size_of_boots(selected_harness_size, selected_boot_size):
    '''
    Calculates whether the customer has chosen a pair of doggy boots that
    are a sensible size. This works by estimating the dog's actual boot
    size from their harness size.

    This returns a message for the customer that should be shown before
    they complete their payment.

    selected_harness_size: The size of the harness the customer wants to buy
    selected_boot_size: The size of the doggy boots the customer wants to buy
    '''

    # Estimate the customer's dog's boot size
    estimated_boot_size = load_model_and_predict(selected_harness_size)

    # Round to the nearest whole number because we don't sell partial sizes
    estimated_boot_size = int(round(estimated_boot_size))

    # Check if the boot size selected is appropriate
    if selected_boot_size == estimated_boot_size:
        # The selected boots are probably OK
        return "Great choice! We think these boots will fit your avalanche dog well."

    if selected_boot_size < estimated_boot_size:
        # Selected boots might be too small
        return "The boots you have selected might be TOO SMALL for a dog as "\
            f"big as yours. We recommend a doggy boots size of {estimated_boot_size}."

    if selected_boot_size > estimated_boot_size:
        # Selected boots might be too big
        return "The boots you have selected might be TOO BIG for a dog as "\
            f"small as yours. We recommend a doggy boots size of {estimated_boot_size}."


# Practice using our new warning system
check_size_of_boots(selected_harness_size=55, selected_boot_size=39)
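One detail in `int(round(...))` is worth knowing: Python's built-in `round` rounds exact halves to the nearest even integer, so a prediction of 38.5 becomes 38 while 39.5 becomes 40. If you would rather always round halves up (a business choice the original does not specify), `math.floor(x + 0.5)` does that for positive sizes. The sketch below also separates the comparison from the model so it can be tested on its own; `round_half_up` and `classify_boot_choice` are hypothetical helper names.

```python
import math

def round_half_up(x):
    # Round .5 cases upward instead of to the nearest even integer
    # (fine for positive sizes; negative values would behave differently)
    return math.floor(x + 0.5)

def classify_boot_choice(selected_boot_size, estimated_boot_size):
    """Return 'ok', 'too small' or 'too big' for a selected boot size."""
    recommended = round_half_up(estimated_boot_size)
    if selected_boot_size == recommended:
        return "ok"
    if selected_boot_size < recommended:
        return "too small"
    return "too big"

print(round(38.5), round(39.5))                  # built-in: halves go to even
print(round_half_up(38.5), round_half_up(39.5))  # always round halves up
print(classify_boot_choice(39, 38.5))
```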







