Kanav Khurana
The Daily Dispatch #7, Let's clean this up, Part 3
Now that we’ve tested the stock code, let’s move on to a documented scenario which I hope to tweak for my exact use case.
Here’s the scenario verbatim:
“you’re a developer who works for a company that sells outdoor sporting gear. The company has automation that monitors social media channels. When someone posts a photo, the company wants to know whether the photo was taken at the beach or in the mountains. Based on where the photo was taken, the company can make targeted product recommendations to its customers.”
Step 1: Authorisation
Apparently, Einstein Platform Services uses a JWT (JSON Web Token) auth flow.
Hmm. Sorry, what?
To borrow a line from an article on the prolific Apex Hours blog, the JWT auth flow enables “Secure server-to-server integration without real time user involvement”.
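In plain terms, a JWT is three base64url-encoded segments joined by dots: a JSON header, a JSON payload of "claims", and a signature. The minimal Python sketch below only builds and decodes the first two segments so the structure is visible; it does not sign or verify anything. The claim names and values (sub, aud, exp, the email address, the token URL) are illustrative assumptions on my part, and a real assertion has to be signed with the private key from your Einstein account before it can be exchanged for an access token.

```python
import base64
import json
import time


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without the trailing '=' padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Reverse of b64url_encode: restore the padding, then decode."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def unsigned_jwt(header: dict, claims: dict) -> str:
    """Build the 'header.payload' part of a JWT. A real token appends '.signature'."""
    parts = [json.dumps(p, separators=(",", ":")).encode("utf-8") for p in (header, claims)]
    return ".".join(b64url_encode(p) for p in parts)


def decode_segments(token: str) -> tuple:
    """Decode the header and payload of a JWT WITHOUT verifying any signature."""
    header_b64, payload_b64 = token.split(".")[:2]
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


# Hypothetical claims: subject (account email), audience (token endpoint), expiry.
claims = {
    "sub": "you@example.com",
    "aud": "https://api.einstein.ai/v2/oauth2/token",
    "exp": int(time.time()) + 300,  # valid for 5 minutes
}
token = unsigned_jwt({"alg": "RS256", "typ": "JWT"}, claims)
print(token.count("."))  # 1: header.payload, no signature segment yet
print(decode_segments(token)[1]["sub"])
```

Note that anyone can decode these two segments; the signature is what lets the server trust them, which is why the private key never leaves your side.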
Step 2: Create the dataset
The next step is to create the dataset of beach and mountain images, using a synchronous call:
curl -X POST -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" -H "Content-Type: multipart/form-data" -F "type=image" -F "path=https://einstein.ai/images/mountainvsbeach.zip" https://api.einstein.ai/v2/vision/datasets/upload/sync
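For anyone who would rather script this than paste curl commands, here is a rough Python equivalent using only the standard library. curl's -F flag sends a multipart/form-data body, so the sketch builds that body by hand. The helper names (build_multipart, upload_request) are my own, and the request is only constructed, not sent; passing it to urllib.request.urlopen would send it, assuming a valid token.

```python
import urllib.request
import uuid


def build_multipart(fields: dict) -> tuple:
    """Encode simple text fields as multipart/form-data, like curl's -F name=value."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(str(value))
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def upload_request(token: str, zip_url: str) -> urllib.request.Request:
    """Build (but do not send) the synchronous dataset-upload request from Step 2."""
    body, content_type = build_multipart({"type": "image", "path": zip_url})
    return urllib.request.Request(
        "https://api.einstein.ai/v2/vision/datasets/upload/sync",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Cache-Control": "no-cache",
            "Content-Type": content_type,
        },
        method="POST",
    )


req = upload_request("<TOKEN>", "https://einstein.ai/images/mountainvsbeach.zip")
# urllib.request.urlopen(req) would actually send it; deliberately not called here.
print(req.get_method(), req.full_url)
```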
Note: Something I still need to figure out is how to create a token that lasts longer, or at least refreshes itself.
A sneak peek into the sample image zip file shows its structure: the images sit in two folders, ‘Beaches’ and ‘Mountains’.

It seems the labels are assigned based on the folder name.
So every image in the ‘Beaches’ folder is grouped under the Beaches label, and every image in ‘Mountains’ under the Mountains label.
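To check the folder-name-as-label idea locally, Python's zipfile module can list the folders in an archive. This sketch assumes the Beaches and Mountains folders sit at the top level of the zip (if they are nested one level deeper, use the second path segment instead). It builds a tiny in-memory archive with made-up file names so it runs without downloading anything, then counts files per folder.

```python
import io
import zipfile
from collections import Counter


def labels_from_zip(zip_bytes: bytes) -> Counter:
    """Count files per top-level folder; each folder name is treated as a label."""
    counts = Counter()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Skip directory entries and files that are not inside a folder.
            if len(parts) < 2 or not parts[-1]:
                continue
            counts[parts[0]] += 1
    return counts


# Build a small hypothetical archive: one folder per label.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for path in ["Beaches/b1.jpg", "Beaches/b2.jpg", "Mountains/m1.jpg"]:
        archive.writestr(path, b"fake image bytes")

print(labels_from_zip(buffer.getvalue()))
```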
Once the images are uploaded, we can also fetch the datasets to see how many labels Einstein Vision created, with the following curl command:
curl -X GET -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" https://api.einstein.ai/v2/vision/datasets
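The dataset listing comes back as JSON. As far as I can tell, each dataset object carries its id and a labelSummary holding a labels array, but treat those field names as an assumption and check them against your own response. The sample below is hand-written (hypothetical), reusing the dataset ID from Step 3 and the two folder names. The sketch maps each dataset ID to its label names, which is also a handy way to grab the datasetId needed for training.

```python
import json

# Hand-written, hypothetical response; field names are assumed, not copied from a real call.
sample_response = """
{
  "data": [
    {
      "id": 1353920,
      "labelSummary": {
        "labels": [
          {"name": "Beaches"},
          {"name": "Mountains"}
        ]
      }
    }
  ]
}
"""


def labels_per_dataset(raw_json: str) -> dict:
    """Map each dataset ID to the list of label names it contains."""
    payload = json.loads(raw_json)
    result = {}
    for dataset in payload.get("data", []):
        labels = dataset.get("labelSummary", {}).get("labels", [])
        result[dataset["id"]] = [label["name"] for label in labels]
    return result


for dataset_id, labels in labels_per_dataset(sample_response).items():
    print(dataset_id, len(labels), labels)
```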
Step 3: Train the dataset
Once the dataset is uploaded, it is time to train a model on it.
Make a note of the dataset ID returned in Step 2 for the beach and mountain images, then pass it as datasetId in the next call.
curl -X POST -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" -H "Content-Type: multipart/form-data" -F "name=Beach and Mountain Model" -F "datasetId=1353920" https://api.einstein.ai/v2/vision/train
This is an asynchronous call, so the response comes back straight away with a status of ‘QUEUED’ rather than a finished model.
Step 3.1: Check the status of the model
curl -X GET -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" https://api.einstein.ai/v2/vision/train/TLBDYOVUVEX3WXTYCHLO674WTU
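Because training is asynchronous, Step 3.1 is really a loop: call the status endpoint until the model leaves the queue. A minimal sketch, assuming the status response has a status field whose terminal values are SUCCEEDED or FAILED (so far I have only seen QUEUED, so check the exact values in the docs). The fetch function is passed in, so here it runs against a fake, hypothetical sequence of responses instead of the real API.

```python
import time
from typing import Callable


def wait_for_training(fetch_status: Callable[[], dict],
                      interval_seconds: float = 30.0,
                      max_attempts: int = 40) -> dict:
    """Poll until the training status is terminal, or give up after max_attempts."""
    terminal = {"SUCCEEDED", "FAILED"}  # assumed terminal states
    for attempt in range(max_attempts):
        response = fetch_status()
        if response.get("status") in terminal:
            return response
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError("training did not finish in time")


# Fake responses standing in for the curl call above.
fake_responses = iter([{"status": "QUEUED"}, {"status": "RUNNING"}, {"status": "SUCCEEDED"}])
result = wait_for_training(lambda: next(fake_responses), interval_seconds=0)
print(result["status"])
```

Passing the fetch function in keeps the polling logic testable; in practice it would wrap the GET request above.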
Step 3.2: Check metrics
curl -X GET -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" https://api.einstein.ai/v2/vision/models/TLBDYOVUVEX3WXTYCHLO674WTU
{
  "id": "X76USM4Q3QRZRODBDTUGDZEHJU",
  "metricsData": {
    "f1": [ 0.8571428571428571, 0.75 ],
    "labels": [ "Mountains", "Beaches" ],
    "testAccuracy": 0.9333,
    "trainingLoss": 0.0637,
    "confusionMatrix": [ [ 8, 1 ], [ 0, 6 ] ],
    "trainingAccuracy": 0.9814
  },
  "createdAt": "2019-02-21T22:19:25.000+0000",
  "language": "N/A",
  "object": "metrics"
}
What do these metrics mean?
So I looked online and started here.
This is a brilliant article that explains what an F1 score is and introduces Precision and Recall, the two measures the F1 score combines (it is their harmonic mean).
For now, let's just say that these numbers signify how good a model is: F1 and accuracy both range from 0 to 1, and higher is better.
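For reference, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean, 2 × P × R / (P + R), where TP, FP and FN count true positives, false positives and false negatives for one label. A small sketch with made-up counts (not Einstein's numbers):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 correct "Mountains" predictions, 2 false alarms, 2 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(p, r, f1)
```

Because it is a harmonic mean, F1 is pulled down by whichever of precision or recall is worse, so it is a stricter summary than either one alone.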
So, coming back to what Einstein returned:
"metricsData": { "f1": [ 0.8571428571428571, 0.75 ],
"labels": [ "Mountains", "Beaches" ],
It seems that the model has an F1 score of 0.857 (85.7%) for the Mountains label.
For the Beaches label, its F1 score is 0.75 (75%).
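Reading the two arrays side by side assumes that the nth F1 score belongs to the nth label, which is how I am interpreting the response. Zipping them makes that pairing explicit:

```python
import json

metrics_data = json.loads("""
{"f1": [0.8571428571428571, 0.75], "labels": ["Mountains", "Beaches"]}
""")

# Assumes f1[i] is the score for labels[i].
f1_by_label = dict(zip(metrics_data["labels"], metrics_data["f1"]))
for label, score in f1_by_label.items():
    print(f"{label}: {score:.1%}")  # Mountains: 85.7%, then Beaches: 75.0%
```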
I still don’t know if these numbers are high enough.
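One thing I can check myself is how testAccuracy relates to the confusion matrix. Accuracy is the number of correct predictions (the diagonal of the matrix) divided by all predictions: (8 + 6) / 15 = 14/15 ≈ 0.9333, which matches testAccuracy, so the matrix appears to describe the test images. The sketch below does that arithmetic on the values from the metrics response:

```python
import json

metrics = json.loads("""
{"metricsData": {"testAccuracy": 0.9333,
                 "confusionMatrix": [[8, 1], [0, 6]]}}
""")


def accuracy_from_confusion(matrix: list) -> float:
    """Correct predictions (the diagonal) divided by the total number of predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


data = metrics["metricsData"]
acc = accuracy_from_confusion(data["confusionMatrix"])
print(round(acc, 4), data["testAccuracy"])  # 0.9333 0.9333
```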
The F1 article then pointed me to another article here, which explains the difference between training and test data.
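The gist, as I understand it: a model learns from the training set and is then scored on a test set it never saw, which is typically why trainingAccuracy (0.9814) comes out higher than testAccuracy (0.9333). The sketch below is a generic hold-out split to illustrate the idea, not how Einstein splits data internally (I don't know its exact method); the file names and the 0.8 ratio are arbitrary.

```python
import random


def train_test_split(items: list, train_ratio: float = 0.8, seed: int = 42) -> tuple:
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


images = [f"img_{i}.jpg" for i in range(10)]  # hypothetical file names
train, test = train_test_split(images)
print(len(train), len(test))  # 8 2
```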
Step 4: Predicting
In this example, we pass the model a sample image to see how good its prediction is.
curl -X POST -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" -H "Content-Type: multipart/form-data" -F "sampleLocation=http://einstein.ai/images/546212389.jpg" -F "modelId=TLBDYOVUVEX3WXTYCHLO674WTU" https://api.einstein.ai/v2/vision/predict
Response (excerpt):
{"probability":0.99778014,"label":"Beaches"},{"probability":0.0022198444,"label":"Mountains"}
A 99.8% probability that it is a beach.
That's pretty good!
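To act on a prediction (say, to pick which products to recommend), the automation really only needs the label with the highest probability. A minimal sketch, assuming the predictions arrive as a list of label/probability objects like the two in the response above; the 0.9 confidence threshold is my own arbitrary choice.

```python
import json

# The two predictions from the response above, wrapped in a JSON list.
raw = '[{"probability":0.99778014,"label":"Beaches"},{"probability":0.0022198444,"label":"Mountains"}]'


def top_prediction(predictions: list, threshold: float = 0.9):
    """Return (label, probability) for the most likely label, or None if below threshold."""
    best = max(predictions, key=lambda p: p["probability"])
    if best["probability"] < threshold:
        return None  # not confident enough to act on
    return best["label"], best["probability"]


print(top_prediction(json.loads(raw)))
```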