Ayşegül Yönet · 7 min read

Welcome to Week 4, Day 3 of #30DaysOfSWA!!

We continue our Best Practices week by looking at yet another Azure services integration option for your Azure Static Web Apps deployment. Yesterday we talked about adding search to your website using pre-trained Azure AI models. Today we'll discuss how you can use Vision AI Services.

What We'll Cover

  • What are Azure Vision Services
  • How to get started building your AI applications
  • How to use the JavaScript AI SDKs
  • How to analyze a document using Form Recognizer
  • How to deploy your APIs and Static Web App (SWA)
  • Exercise: Build and deploy a SWA to analyse images

Resources

What are Azure Vision Services?

Azure AI Vision is part of Azure Cognitive Services - cloud-based AI services that give you Terminator-like powers.

You can take a picture and analyze it to describe the image, detect objects, landmarks and famous people or your users, read documents and scan information on an ID or a business card. All of these superpowers are available to you - an API call away - using Azure’s Cognitive and Applied AI Services. Let’s dive into what we can build and what we can customize.

Get Started building AI apps

If you are planning to infuse your application with AI, the first place to look is our Applied AI Services. Applied AI Services solve the most common use cases and are built on top of our Cognitive Services. For Vision-related services, we have two Applied AI Services: Form Recognizer and Video Analyzer.

Azure Form Recognizer is built using Optical Character Recognition (OCR), Text Analytics and Custom Text from Azure Cognitive Services, and has custom-trained models for things like vaccination cards, passports and tax documents. If you are wondering why you need Form Recognizer, you can try reverse engineering some of its features by using the Cognitive Services directly.

Building things from scratch instead of using a ready-made solution

I wouldn't wish that on any of you, and that's why your first stop should be the Applied AI Services documentation to see if your problem is solvable by any of these services. If your problem is not solved or you need more flexibility, you can build with the Cognitive Services: Computer Vision, Custom Vision or the Face APIs.

A great way to start playing with these APIs and explore your specific use case is through the Vision Studio preview or the Form Recognizer Studio preview. For example, I was wondering whether I could recognize Mixed Reality devices using Computer Vision. Since these devices are very new and not yet common objects, they were not recognized by our all-purpose object detection model. Before building an app, I was able to see right away that I needed to train a custom model. In the Studio you can check whether your logo or product is easily recognized, or whether you should train a custom model, without writing any code, and you can find code samples right inside the Studio.

If the object you want to detect is not recognized, you can train a custom model through customvision.ai and deploy it. You will get a custom API endpoint to call, and your client implementation won't be any different from using any other service. There are a couple of best practices for training a custom model. Most importantly, you need a variety of images in different contexts. For example, if you want to find Waldo, you can't just train with Waldo's profile picture; you also need pictures of him in a crowd.
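
To make that concrete, here is a minimal sketch of calling a published Custom Vision model from JavaScript. It assumes the @azure/cognitiveservices-customvision-prediction and @azure/ms-rest-js packages; the project ID, published iteration name and environment variable names are placeholders for your own values.

import { PredictionAPIClient } from "@azure/cognitiveservices-customvision-prediction";
import { ApiKeyCredentials } from "@azure/ms-rest-js";

// Placeholder configuration - replace with your own resource values.
const predictionKey = process.env.CUSTOM_VISION_PREDICTION_KEY || "";
const predictionEndpoint = process.env.CUSTOM_VISION_ENDPOINT || "";
const projectId = "<your-project-id>";
const publishedName = "<your-published-iteration-name>";

async function detectObjects(imageUrl) {
  // The prediction key is sent in the "Prediction-key" header.
  const credentials = new ApiKeyCredentials({ inHeader: { "Prediction-key": predictionKey } });
  const predictor = new PredictionAPIClient(credentials, predictionEndpoint);

  // Ask the published model to detect objects in the image at the given URL.
  const results = await predictor.detectImageUrl(projectId, publishedName, { url: imageUrl });

  // Each prediction has a tag name and a probability between 0 and 1.
  for (const prediction of results.predictions) {
    console.log(`${prediction.tagName}: ${(prediction.probability * 100).toFixed(1)}%`);
  }
}

The shape of the call - create a client with your endpoint and key, send an image, read back tags and probabilities - is the same pattern you will see with the other Vision SDKs below.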

You can read here how I trained my custom model and built a no-code prototype using Power Apps AI Builder.

How to use the JavaScript AI SDKs

Using any of these APIs works pretty much the same way when you are analyzing an image rather than a video.

Only the class and method names change: DocumentAnalysisClient for Form Recognizer, PredictionAPIClient for Custom Vision, or FaceClient for facial recognition. If you are training a custom model or using a specific Form Recognizer model, there are a couple of extra steps, but the most important parts are taken care of by the SDK.

How to analyze a document using Form Recognizer

Let’s see the code in action for Form Recognizer using their new JavaScript SDK. You can start with your choice of Static Web Apps templates or add the code to your existing application.

We need to import the SDK objects and read your environment variables:

import {
  AzureKeyCredential,
  DocumentAnalysisClient,
  DocumentField,
  FormRecognizerApiVersion,
  PrebuiltModels,
} from "@azure/ai-form-recognizer";

const key = process.env.FORM_RECOGNIZER_KEY || "";
const endpoint = process.env.FORM_RECOGNIZER_ENDPOINT || "";

In your async callback function that starts the analysis, like a picture upload or click event callback, you will initialize the client object and poll until you have all the results from your analysis.

async function Analyze(formUrl) {
  const client = new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key));

  // Start the analysis with the general-purpose prebuilt document model.
  const poller = await client.beginAnalyzeDocument("prebuilt-document", formUrl);

  // Poll until the service has finished processing the document.
  const { keyValuePairs, entities } = await poller.pollUntilDone();

  // Do amazing things with the data.
}

Do something with the results: either sign in your user with their ID document data or display the key-value pairs to your user.
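
For example, here is a small sketch, assuming the keyValuePairs returned by the Analyze function above, that simply logs each extracted field; in a real app you would render these in your UI or use them to pre-fill a form.

// Each key/value pair carries the recognized text in `content`
// and a confidence score between 0 and 1.
for (const { key, value, confidence } of keyValuePairs) {
  console.log(`${key.content}: ${value ? value.content : "<no value>"} (confidence: ${confidence})`);
}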

Deploy your app to SWA

If you are using one of the SWA templates, all you need to do is push your code to share it with the world.

If you are not using a template, you can use the Azure Static Web Apps VS Code Extension or SWA CLI to deploy your app to an SWA resource.

If you have a lot of images and would like to batch process them, or if you are doing multiple things with your image - like detecting objects, reading the text in the image and translating it - you might want to use an Azure Functions app. Thankfully, creating an API for your Static Web App is one of the features of the VS Code extension. Check out my video for step-by-step instructions to deploy your SWA with Functions.
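
As a rough sketch of what such an API could look like, here is a hypothetical HTTP-triggered function for your SWA's api folder. It assumes the Node.js Azure Functions programming model and the same Form Recognizer environment variables as before; the route, request shape and response shape are illustrative, and the function.json binding configuration is not shown.

const { AzureKeyCredential, DocumentAnalysisClient } = require("@azure/ai-form-recognizer");

// Hypothetical HTTP-triggered function, e.g. exposed as /api/analyze.
// FORM_RECOGNIZER_KEY and FORM_RECOGNIZER_ENDPOINT come from application settings.
module.exports = async function (context, req) {
  const formUrl = req.query.url || (req.body && req.body.url);
  if (!formUrl) {
    context.res = { status: 400, body: "Pass a document URL as ?url=..." };
    return;
  }

  const client = new DocumentAnalysisClient(
    process.env.FORM_RECOGNIZER_ENDPOINT,
    new AzureKeyCredential(process.env.FORM_RECOGNIZER_KEY)
  );

  // Same analysis call as in the client-side example above.
  const poller = await client.beginAnalyzeDocument("prebuilt-document", formUrl);
  const { keyValuePairs } = await poller.pollUntilDone();

  // Return only what the front end needs.
  context.res = {
    body: keyValuePairs.map((pair) => ({
      key: pair.key.content,
      value: pair.value ? pair.value.content : null,
    })),
  };
};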

How to add an API to your Azure Static Web App | Azure Tips and Tricks: Static Web Apps

I hope you are inspired to enhance your applications with Azure Applied AI or Cognitive Services. Check out the Build AI talk and demo to see what else you can do with AI.

MS Build 2022 AI Session Demo

Reach out to me on Twitter if you have questions, want to share what you built, or want to discuss your ideas.

Exercise

Want to explore Azure Cognitive Service integrations with your Azure Static Web App? Try walking through one of these tutorials to get hands-on experience with development:

  • Build and deploy a SWA to analyse images - build a React application that analyzes an image using Azure Cognitive Services (Computer Vision), then deploy the app to Azure Static Web Apps.
  • Machine Learning With Custom Vision - complete this workshop where you learn to build a model to detect dog breeds using the Custom Vision API, then deploy it to Azure Static Web Apps with an Azure Functions backend.