swagger: '2.0'
info:
version: '1.0'
title: Microsoft Azure Computer Vision API
description: >-
The Computer Vision API provides state-of-the-art algorithms to process
images and return information. For example, it can be used to determine if
an image contains mature content, or it can be used to find all the faces in
an image. It also has other features like estimating dominant and accent
colors, categorizing the content of images, and describing an image with
complete English sentences. It can also intelligently generate image
thumbnails for displaying large images effectively.
securityDefinitions:
apim_key:
type: apiKey
name: Ocp-Apim-Subscription-Key
in: header
security:
- apim_key: []
x-ms-parameterized-host:
hostTemplate: '{AzureRegion}.api.cognitive.microsoft.com'
parameters:
- $ref: ../../../Common/ExtendedRegions.json#/parameters/AzureRegion
basePath: /vision/v1.0
schemes:
- https
paths:
/models:
get:
description: >-
This operation returns the list of domain-specific models that are
supported by the Computer Vision API. Currently, the API only supports
one domain-specific model: a celebrity recognizer. A successful response
will be returned in JSON. If the request failed, the response will
contain an error code and a message to help understand what went wrong.
operationId: microsoftAzureListmodels
produces:
- application/json
responses:
'200':
description: List of available domain models.
schema:
$ref: '#/definitions/ListModelsResult'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful List Domains request:
$ref: ./examples/SuccessfulListDomainModels.json
summary: Microsoft Azure Get Models
tags:
- Models
/analyze:
post:
description: >-
This operation extracts a rich set of visual features based on the image
content. Two input methods are supported -- (1) Uploading an image or
(2) specifying an image URL. Within your request, there is an optional
parameter to allow you to choose which features to return. By default,
image categories are returned in the response.
operationId: microsoftAzureAnalyzeimage
consumes:
- application/json
produces:
- application/json
parameters:
- $ref: '#/parameters/VisualFeatures'
- name: details
in: query
description: >-
A string indicating which domain-specific details to return.
Multiple values should be comma-separated. Valid domain-specific
detail types include: Celebrities - identifies celebrities if
detected in the image. Landmarks - identifies landmarks if detected
in the image.
type: array
required: false
collectionFormat: csv
items:
type: string
x-nullable: false
x-ms-enum:
name: Details
modelAsString: false
enum:
- Celebrities
- Landmarks
- $ref: '#/parameters/ServiceLanguage'
- $ref: ../../../Common/Parameters.json#/parameters/ImageUrl
responses:
'200':
description: >-
The response includes the extracted features in JSON format. Here
are the definitions for the enumeration types. ClipartType:
Non-clipart = 0, ambiguous = 1, normal-clipart = 2, good-clipart = 3.
LineDrawingType: Non-LineDrawing = 0, LineDrawing = 1.
schema:
$ref: '#/definitions/ImageAnalysis'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Analyze with Url request:
$ref: ./examples/SuccessfulAnalyzeWithUrl.json
summary: Microsoft Azure Post Analyze
tags:
- Analyze
/generateThumbnail:
post:
description: >-
This operation generates a thumbnail image with the user-specified width
and height. By default, the service analyzes the image, identifies the
region of interest (ROI), and generates smart cropping coordinates based
on the ROI. Smart cropping helps when you specify an aspect ratio that
differs from that of the input image. A successful response contains the
thumbnail image binary. If the request failed, the response contains an
error code and a message to help determine what went wrong.
operationId: microsoftAzureGeneratethumbnail
consumes:
- application/json
produces:
- application/octet-stream
parameters:
- name: width
type: integer
in: query
required: true
minimum: 1
maximum: 1023
description: >-
Width of the thumbnail. It must be between 1 and 1024. Recommended
minimum of 50.
- name: height
type: integer
in: query
required: true
minimum: 1
maximum: 1023
description: >-
Height of the thumbnail. It must be between 1 and 1024. Recommended
minimum of 50.
- $ref: ../../../Common/Parameters.json#/parameters/ImageUrl
- name: smartCropping
type: boolean
in: query
required: false
default: false
description: Boolean flag for enabling smart cropping.
responses:
'200':
description: The generated thumbnail in binary format.
schema:
type: file
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Generate Thumbnail request:
$ref: ./examples/SuccessfulGenerateThumbnailWithUrl.json
summary: Microsoft Azure Post Generatethumbnail
tags:
- generateThumbnail
/ocr:
post:
description: >-
Optical Character Recognition (OCR) detects printed text in an image and
extracts the recognized characters into a machine-usable character
stream. Upon success, the OCR results will be returned. Upon failure,
the error code together with an error message will be returned. The
error code can be one of InvalidImageUrl, InvalidImageFormat,
InvalidImageSize, NotSupportedImage, NotSupportedLanguage, or
InternalServerError.
operationId: microsoftAzureRecognizeprintedtext
consumes:
- application/json
produces:
- application/json
parameters:
- $ref: '#/parameters/DetectOrientation'
- $ref: ../../../Common/Parameters.json#/parameters/ImageUrl
- $ref: '#/parameters/OcrLanguage'
responses:
'200':
description: >-
The OCR results in the hierarchy of region/line/word. The results
include text, bounding box for regions, lines and words. textAngle: The
angle, in degrees, of the detected text with respect to the closest
horizontal or vertical direction. After rotating the input image
clockwise by this angle, the recognized text lines become horizontal
or vertical. In combination with the orientation property it can be
used to overlay recognition results correctly on the original image,
by rotating either the original image or recognition results by a
suitable angle around the center of the original image. If the angle
cannot be confidently detected, this property is not present. If the
image contains text at different angles, only part of the text will
be recognized correctly.
schema:
$ref: '#/definitions/OcrResult'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Ocr request:
$ref: ./examples/SuccessfulOcrWithUrl.json
summary: Microsoft Azure Post Ocr
tags:
- Ocr
/describe:
post:
description: >-
This operation generates a description of an image in human readable
language with complete sentences. The description is based on a
collection of content tags, which are also returned by the operation.
More than one description can be generated for each image. Descriptions
are ordered by their confidence score. All descriptions are in English.
Two input methods are supported -- (1) Uploading an image or (2)
specifying an image URL. A successful response will be returned in
JSON. If the request failed, the response will contain an error code
and a message to help understand what went wrong.
operationId: microsoftAzureDescribeimage
consumes:
- application/json
produces:
- application/json
parameters:
- name: maxCandidates
in: query
description: >-
Maximum number of candidate descriptions to be returned. The
default is 1.
type: string
required: false
default: '1'
- $ref: '#/parameters/ServiceLanguage'
- $ref: ../../../Common/Parameters.json#/parameters/ImageUrl
responses:
'200':
description: Image description object.
schema:
$ref: '#/definitions/ImageDescription'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Describe request:
$ref: ./examples/SuccessfulDescribeWithUrl.json
summary: Microsoft Azure Post Describe
tags:
- Describe
/tag:
post:
description: >-
This operation generates a list of words, or tags, that are relevant to
the content of the supplied image. The Computer Vision API can return
tags based on objects, living beings, scenery or actions found in
images. Unlike categories, tags are not organized according to a
hierarchical classification system, but correspond to image content.
Tags may contain hints to avoid ambiguity or provide context, for
example the tag 'cello' may be accompanied by the hint 'musical
instrument'. All tags are in English.
operationId: microsoftAzureTagimage
consumes:
- application/json
produces:
- application/json
parameters:
- $ref: '#/parameters/ServiceLanguage'
- $ref: ../../../Common/Parameters.json#/parameters/ImageUrl
responses:
'200':
description: Image tags object.
schema:
$ref: '#/definitions/TagResult'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Tag request:
$ref: ./examples/SuccessfulTagWithUrl.json
summary: Microsoft Azure Post Tag
tags:
- Tag
/models/{model}/analyze:
post:
description: >-
This operation recognizes content within an image by applying a
domain-specific model. The list of domain-specific models that are
supported by the Computer Vision API can be retrieved using the /models
GET request. Currently, the API only provides a single domain-specific
model: celebrities. Two input methods are supported -- (1) Uploading an
image or (2) specifying an image URL. A successful response will be
returned in JSON. If the request failed, the response will contain an
error code and a message to help understand what went wrong.
operationId: microsoftAzureAnalyzeimagebydomain
consumes:
- application/json
produces:
- application/json
parameters:
- name: model
in: path
description: The domain-specific content to recognize.
required: true
type: string
- $ref: '#/parameters/ServiceLanguage'
- $ref: ../../../Common/Parameters.json#/parameters/ImageUrl
responses:
'200':
description: Analysis result based on the domain model.
schema:
$ref: '#/definitions/DomainModelResults'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Domain Model analysis request:
$ref: ./examples/SuccessfulDomainModelWithUrl.json
summary: Microsoft Azure Post Models Model Analyze
tags:
- Models
/recognizeText:
post:
description: >-
Recognize Text operation. When you use the Recognize Text interface, the
response contains a field called 'Operation-Location'. The
'Operation-Location' field contains the URL that you must use for your
Get Handwritten Text Operation Result operation.
operationId: microsoftAzureRecognizetext
parameters:
- $ref: ../../../Common/Parameters.json#/parameters/ImageUrl
- $ref: '#/parameters/HandwritingBoolean'
consumes:
- application/json
produces:
- application/json
responses:
'202':
description: >-
The service has accepted the request and will start processing
later. It will return Accepted immediately and include an
Operation-Location header. The client should then query the
operation status using the URL specified in this header. The
operation ID will expire in 48 hours.
headers:
Operation-Location:
description: >-
URL to query for status of the operation. The operation ID will
expire in 48 hours.
type: string
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Recognize Text request:
$ref: ./examples/SuccessfulRecognizeTextWithUrl.json
summary: Microsoft Azure Post Recognizetext
tags:
- recognizeText
/textOperations/{operationId}:
get:
description: >-
This interface is used for getting the result of a text operation. The
URL to this interface should be retrieved from the 'Operation-Location'
field returned by the Recognize Text interface.
operationId: microsoftAzureGettextoperationresult
parameters:
- name: operationId
in: path
description: >-
Id of the text operation returned in the response of the 'Recognize
Text' operation.
required: true
type: string
produces:
- application/json
responses:
'200':
description: Returns the operation status.
schema:
$ref: '#/definitions/TextOperationResult'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Get Text Operation Result request:
$ref: ./examples/SuccessfulGetTextOperationResult.json
summary: Microsoft Azure Get Textoperations Operationid
tags:
- textOperations
x-ms-paths:
/analyze?overload=stream:
post:
description: >-
This operation extracts a rich set of visual features based on the image
content.
operationId: AnalyzeImageInStream
consumes:
- application/octet-stream
- multipart/form-data
produces:
- application/json
parameters:
- $ref: '#/parameters/VisualFeatures'
- name: details
in: query
description: >-
A string indicating which domain-specific details to return.
Multiple values should be comma-separated. Valid domain-specific
detail types include: Celebrities - identifies celebrities if
detected in the image. Landmarks - identifies landmarks if detected
in the image.
type: string
required: false
enum:
- Celebrities
- Landmarks
- $ref: '#/parameters/ServiceLanguage'
- $ref: ../../../Common/Parameters.json#/parameters/ImageStream
responses:
'200':
description: >-
The response includes the extracted features in JSON format. Here
are the definitions for the enumeration types. ClipartType:
Non-clipart = 0, ambiguous = 1, normal-clipart = 2, good-clipart = 3.
LineDrawingType: Non-LineDrawing = 0, LineDrawing = 1.
schema:
$ref: '#/definitions/ImageAnalysis'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Analyze with Stream request:
$ref: ./examples/SuccessfulAnalyzeWithStream.json
/generateThumbnail?overload=stream:
post:
description: >-
This operation generates a thumbnail image with the user-specified width
and height. By default, the service analyzes the image, identifies the
region of interest (ROI), and generates smart cropping coordinates based
on the ROI. Smart cropping helps when you specify an aspect ratio that
differs from that of the input image. A successful response contains the
thumbnail image binary. If the request failed, the response contains an
error code and a message to help determine what went wrong.
operationId: GenerateThumbnailInStream
consumes:
- application/octet-stream
- multipart/form-data
produces:
- application/octet-stream
parameters:
- name: width
type: integer
in: query
required: true
minimum: 1
maximum: 1023
description: >-
Width of the thumbnail. It must be between 1 and 1024. Recommended
minimum of 50.
- name: height
type: integer
in: query
required: true
minimum: 1
maximum: 1023
description: >-
Height of the thumbnail. It must be between 1 and 1024. Recommended
minimum of 50.
- $ref: ../../../Common/Parameters.json#/parameters/ImageStream
- name: smartCropping
type: boolean
in: query
required: false
default: false
description: Boolean flag for enabling smart cropping.
responses:
'200':
description: The generated thumbnail in binary format.
schema:
type: file
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Generate Thumbnail request:
$ref: ./examples/SuccessfulGenerateThumbnailWithStream.json
/ocr?overload=stream:
post:
description: >-
Optical Character Recognition (OCR) detects printed text in an image and
extracts the recognized characters into a machine-usable character
stream. Upon success, the OCR results will be returned. Upon failure,
the error code together with an error message will be returned. The
error code can be one of InvalidImageUrl, InvalidImageFormat,
InvalidImageSize, NotSupportedImage, NotSupportedLanguage, or
InternalServerError.
operationId: RecognizePrintedTextInStream
consumes:
- application/octet-stream
- multipart/form-data
produces:
- application/json
parameters:
- $ref: '#/parameters/OcrLanguage'
- $ref: '#/parameters/DetectOrientation'
- $ref: ../../../Common/Parameters.json#/parameters/ImageStream
responses:
'200':
description: >-
The OCR results in the hierarchy of region/line/word. The results
include text, bounding box for regions, lines and words. textAngle: The angle,
in degrees, of the detected text with respect to the closest
horizontal or vertical direction. After rotating the input image
clockwise by this angle, the recognized text lines become horizontal
or vertical. In combination with the orientation property it can be
used to overlay recognition results correctly on the original image,
by rotating either the original image or recognition results by a
suitable angle around the center of the original image. If the angle
cannot be confidently detected, this property is not present. If the
image contains text at different angles, only part of the text will
be recognized correctly.
schema:
$ref: '#/definitions/OcrResult'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Ocr request:
$ref: ./examples/SuccessfulOcrWithStream.json
/describe?overload=stream:
post:
description: >-
This operation generates a description of an image in human readable
language with complete sentences. The description is based on a
collection of content tags, which are also returned by the operation.
More than one description can be generated for each image. Descriptions
are ordered by their confidence score. All descriptions are in English.
Two input methods are supported -- (1) Uploading an image or (2)
specifying an image URL. A successful response will be returned in
JSON. If the request failed, the response will contain an error code
and a message to help understand what went wrong.
operationId: DescribeImageInStream
consumes:
- application/octet-stream
- multipart/form-data
produces:
- application/json
parameters:
- name: maxCandidates
in: query
description: >-
Maximum number of candidate descriptions to be returned. The
default is 1.
type: string
required: false
default: '1'
- $ref: '#/parameters/ServiceLanguage'
- $ref: ../../../Common/Parameters.json#/parameters/ImageStream
responses:
'200':
description: Image description object.
schema:
$ref: '#/definitions/ImageDescription'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Describe request:
$ref: ./examples/SuccessfulDescribeWithStream.json
/tag?overload=stream:
post:
description: >-
This operation generates a list of words, or tags, that are relevant to
the content of the supplied image. The Computer Vision API can return
tags based on objects, living beings, scenery or actions found in
images. Unlike categories, tags are not organized according to a
hierarchical classification system, but correspond to image content.
Tags may contain hints to avoid ambiguity or provide context, for
example the tag 'cello' may be accompanied by the hint 'musical
instrument'. All tags are in English.
operationId: TagImageInStream
consumes:
- application/octet-stream
- multipart/form-data
produces:
- application/json
parameters:
- $ref: '#/parameters/ServiceLanguage'
- $ref: ../../../Common/Parameters.json#/parameters/ImageStream
responses:
'200':
description: Image tags object.
schema:
$ref: '#/definitions/TagResult'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Tag request:
$ref: ./examples/SuccessfulTagWithStream.json
/models/{model}/analyze?overload=stream:
post:
description: >-
This operation recognizes content within an image by applying a
domain-specific model. The list of domain-specific models that are
supported by the Computer Vision API can be retrieved using the /models
GET request. Currently, the API only provides a single domain-specific
model: celebrities. Two input methods are supported -- (1) Uploading an
image or (2) specifying an image URL. A successful response will be
returned in JSON. If the request failed, the response will contain an
error code and a message to help understand what went wrong.
operationId: AnalyzeImageByDomainInStream
consumes:
- application/octet-stream
- multipart/form-data
produces:
- application/json
parameters:
- name: model
in: path
description: The domain-specific content to recognize.
required: true
type: string
- $ref: '#/parameters/ServiceLanguage'
- $ref: ../../../Common/Parameters.json#/parameters/ImageStream
responses:
'200':
description: Analysis result based on the domain model.
schema:
$ref: '#/definitions/DomainModelResults'
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Domain Model analysis request:
$ref: ./examples/SuccessfulDomainModelWithStream.json
/recognizeText?overload=stream:
post:
description: >-
Recognize Text operation. When you use the Recognize Text interface, the
response contains a field called 'Operation-Location'. The
'Operation-Location' field contains the URL that you must use for your
Get Handwritten Text Operation Result operation.
operationId: RecognizeTextInStream
parameters:
- $ref: '#/parameters/HandwritingBoolean'
- $ref: ../../../Common/Parameters.json#/parameters/ImageStream
consumes:
- application/octet-stream
produces:
- application/json
responses:
'202':
description: >-
The service has accepted the request and will start processing
later.
headers:
Operation-Location:
description: >-
URL to query for status of the operation. The operation ID will
expire in 48 hours.
type: string
default:
description: Error response.
schema:
$ref: '#/definitions/ComputerVisionError'
x-ms-examples:
Successful Recognize Text request:
$ref: ./examples/SuccessfulRecognizeTextWithStream.json
definitions:
TextOperationResult:
type: object
properties:
status:
type: string
description: Status of the text operation.
enum:
- Not Started
- Running
- Failed
- Succeeded
x-ms-enum:
name: TextOperationStatusCodes
modelAsString: false
x-nullable: false
recognitionResult:
$ref: '#/definitions/RecognitionResult'
RecognitionResult:
type: object
properties:
lines:
type: array
items:
$ref: '#/definitions/Line'
Line:
type: object
properties:
boundingBox:
$ref: '#/definitions/BoundingBox'
text:
type: string
words:
type: array
items:
$ref: '#/definitions/Word'
Word:
type: object
properties:
boundingBox:
$ref: '#/definitions/BoundingBox'
text:
type: string
BoundingBox:
type: array
items:
type: integer
x-nullable: false
ImageAnalysis:
type: object
description: Result of AnalyzeImage operation.
properties:
categories:
type: array
description: An array indicating identified categories.
items:
$ref: '#/definitions/Category'
adult:
$ref: '#/definitions/AdultInfo'
color:
$ref: '#/definitions/ColorInfo'
imageType:
$ref: '#/definitions/ImageType'
tags:
type: array
description: A list of tags with confidence level.
items:
$ref: '#/definitions/ImageTag'
description:
$ref: '#/definitions/ImageDescriptionDetails'
faces:
type: array
description: An array of possible faces within the image.
items:
$ref: '#/definitions/FaceDescription'
requestId:
type: string
description: Id of the request for tracking purposes.
metadata:
$ref: '#/definitions/ImageMetadata'
OcrResult:
type: object
properties:
language:
type: string
description: The BCP-47 language code of the text in the image.
textAngle:
type: number
format: double
description: >-
The angle, in degrees, of the detected text with respect to the
closest horizontal or vertical direction. After rotating the input
image clockwise by this angle, the recognized text lines become
horizontal or vertical. In combination with the orientation property
it can be used to overlay recognition results correctly on the
original image, by rotating either the original image or recognition
results by a suitable angle around the center of the original image.
If the angle cannot be confidently detected, this property is not
present. If the image contains text at different angles, only part of
the text will be recognized correctly.
orientation:
type: string
description: >-
Orientation of the text recognized in the image. The value
(up, down, left, or right) refers to the direction that the top of the
recognized text is facing, after the image has been rotated around its
center according to the detected text angle (see textAngle property).
regions:
type: array
description: >-
An array of objects, where each object represents a region of
recognized text.
items:
$ref: '#/definitions/OcrRegion'
OcrRegion:
type: object
description: >-
A region consists of multiple lines (e.g. a column of text in a
multi-column document).
properties:
boundingBox:
type: string
description: >-
Bounding box of a recognized region. The four integers represent the
x-coordinate of the left edge, the y-coordinate of the top edge,
width, and height of the bounding box, in the coordinate system of the
input image, after it has been rotated around its center according to
the detected text angle (see textAngle property), with the origin at
the top-left corner, and the y-axis pointing down.
lines:
type: array
items:
$ref: '#/definitions/OcrLine'
OcrLine:
type: object
description: An object describing a single recognized line of text.
properties:
boundingBox:
type: string
description: >-
Bounding box of a recognized line. The four integers represent the
x-coordinate of the left edge, the y-coordinate of the top edge,
width, and height of the bounding box, in the coordinate system of the
input image, after it has been rotated around its center according to
the detected text angle (see textAngle property), with the origin at
the top-left corner, and the y-axis pointing down.
words:
type: array
description: An array of objects, where each object represents a recognized word.
items:
$ref: '#/definitions/OcrWord'
OcrWord:
type: object
description: Information on a recognized word.
properties:
boundingBox:
type: string
description: >-
Bounding box of a recognized word. The four integers represent the
x-coordinate of the left edge, the y-coordinate of the top edge,
width, and height of the bounding box, in the coordinate system of the
input image, af
# --- truncated at 32 KB (44 KB total) ---
# Full source: https://raw.githubusercontent.com/api-evangelist/microsoft-azure/refs/heads/main/openapi/computer-vision-api-openapi-original.yml