sensible-so
Sensible Extractions API

Extract structured data from documents synchronously or asynchronously. Supports sync `POST /extract/{document_type}`, async `POST /extract_from_url`, async via a Sensible-signed upload URL from `POST /generate_upload_url`, portfolio (multi-document) extractions, CSV and Excel output, `/extractions` listing and `/documents/{id}` retrieval, daily coverage statistics, and review auth-token issuance for human-in-the-loop workflows. All endpoints use bearer authentication, and the asynchronous extraction endpoints accept a webhook.
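As a rough illustration of the flow summarized above, the following Python sketch builds a synchronous `POST /extract/{document_type}` request using either of the two body options the spec describes (raw bytes, or Base64 in a JSON `document` field), and polls `GET /documents/{id}` until the `status` field is `COMPLETE`. The function names, the polling interval, and the attempt cap are assumptions made for this example and aren't part of the API. The sketch also assumes that any status other than `COMPLETE` means the extraction is still pending.

```python
import base64
import json
import time
import urllib.request

BASE_URL = "https://api.sensible.so/v0"  # from the spec's `servers` entry


def build_sync_request(document_type, data, content_type, api_key, as_base64_json=False):
    """Build (but do not send) a POST /extract/{document_type} request."""
    url = f"{BASE_URL}/extract/{document_type}"
    if as_base64_json:
        # Option 2: Base64-encode the bytes into a JSON "document" field.
        encoded = base64.b64encode(data).decode("ascii")
        body = json.dumps({"document": encoded}).encode("utf-8")
        content_type = "application/json"
    else:
        # Option 1: send the non-encoded bytes as the entire request body.
        body = data
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": content_type}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def fetch_extraction(extraction_id, api_key):
    """GET /documents/{id} and decode the JSON response (makes a network call)."""
    req = urllib.request.Request(
        f"{BASE_URL}/documents/{extraction_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_until_complete(extraction_id, fetch, interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch(extraction_id) until the response status is COMPLETE.

    Assumption: any other status means "still pending". The attempt cap is
    this sketch's own safeguard, not something the spec prescribes.
    """
    for attempt in range(max_attempts):
        extraction = fetch(extraction_id)
        if extraction.get("status") == "COMPLETE":
            return extraction
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"extraction {extraction_id} not COMPLETE after {max_attempts} attempts")
```

To poll a real extraction, pass a small wrapper such as `lambda i: fetch_extraction(i, api_key)` as `fetch`. Because the spec supports webhooks on the asynchronous endpoints, polling is only one of the two retrieval options.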
OpenAPI Specification

openapi: 3.0.3
info:
  title: Sensible Extractions API
  version: v0
  description: Extract structured data from documents synchronously and asynchronously. Supports sync extract, async extract
    from your URL, async extract via a Sensible-signed upload URL, portfolio (multi-document) extractions, CSV and Excel output,
    extraction listing and retrieval, coverage statistics, and review auth tokens for human-in-the-loop workflows.
  contact:
    name: Sensible
    url: https://www.sensible.so
    email: [email protected]
  license:
    name: Proprietary
    url: https://www.sensible.so/terms
servers:
- url: https://api.sensible.so/v0
  description: Production server
security:
- bearerAuth: []
tags:
- name: Document
  description: Extract data from documents
- name: Get Excel from documents
  description: Convert extracted document data to spreadsheet
- name: Portfolio
  description: Extract data from multiple documents bundled into single PDF files
- name: Retrieve extractions
  description: Retrieve data extracted asynchronously from documents
paths:
  /extract/{document_type}/{config_name}:
    post:
      operationId: extract-data-from-a-document-with-config
      summary: Extract data from a document using specified config
      description: 'This endpoint''s behavior is identical to the [Extract data from a document](https://docs.sensible.so/reference/extract-data-from-a-document)
        endpoint''s behavior, except that Sensible uses the specified config to extract data from the document instead of
        automatically choosing the best-scoring extraction in the document type.

        '
      parameters:
      - $ref: '#/components/parameters/document_type'
      - $ref: '#/components/parameters/config_name'
      - $ref: '#/components/parameters/environment'
      - $ref: '#/components/parameters/document_name'
      requestBody:
        $ref: '#/components/requestBodies/SupportedFileTypes'
      tags:
      - Document
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ExtractionSyncResponse'
          description: 'The structured data extracted from the document.

            '
        '400':
          $ref: '#/components/responses/400'
        '401':
          $ref: '#/components/responses/401'
        '415':
          $ref: '#/components/responses/415'
        '429':
          $ref: '#/components/responses/429'
        '500':
          $ref: '#/components/responses/500'
  /generate_csv/{ids}:
    get:
      operationId: get-csv-extraction
      summary: Get CSV extraction
      description: 'You can use this endpoint to get CSV files from documents, for example, from PDFs. In more detail, this
        endpoint converts your JSON document extraction to a comma-separated values (CSV) file.

        To compile multiple documents into one CSV file, specify the IDs of their recent extractions in the request separated
        by commas, for example,

        `/generate_csv/867514cc-fce7-40eb-8e9d-e6ec48cdac34,5093c65f-05bd-46a3-8df7-da3ed00f6d35`.

        For the best compiled spreadsheet results, configure your SenseML so that the documents output identically named fields.

        For more information about the conversion process, see [SenseML to spreadsheet reference](https://docs.sensible.so/docs/excel-reference).

        For a list of document file types that Sensible can extract data from, see [Supported file types](https://docs.sensible.so/docs/file-types).

        Call this endpoint after an extraction completes. For more information about checking extraction status,

        see the `GET /documents/{id}` endpoint.

        '
      parameters:
      - $ref: '#/components/parameters/ids'
      tags:
      - Get Excel from documents
      responses:
        '200':
          description: 'Indicates the extraction successfully converted to a CSV file. This response contains the download
            URL for the CSV file. The link

            expires after 15 minutes.

            '
          content:
            application/json:
              schema:
                properties:
                  url:
                    type: string
                    format: url
                    description: The download URL for the CSV file
                    example: https://sensible-so-document-type-bucket-dev-us-west-2.s3.us-west-2.amazonaws.com/sensible/fc3484c5-3f35-4129-bb29-0ad1291ee9f8/EXTRACTION/14d82783-c12b-4e70-b0ae-ca1ce35a9836.csv?REDACTED
        '400':
          $ref: '#/components/responses/400'
        '401':
          $ref: '#/components/responses/401'
        '415':
          $ref: '#/components/responses/415'
        '500':
          $ref: '#/components/responses/500'
  /extract/{document_type}:
    post:
      operationId: extract-data-from-a-document
      summary: Extract data from a document (sync)
      description: "\n**Note:** Use this endpoint for testing. Use the asynchronous extraction endpoints when in production.\n\
        \nExtract data from a local document synchronously.\n\nTo explore this endpoint, use this interactive API reference,\
        \ or use one of the following options:\n\n- For a quick \"hello world\" response to this endpoint, see the [API quickstart](https://docs.sensible.so/docs/quickstart)\n\
        - For a step-by-step tutorial about calling this endpoint, see [Try synchronous extraction](https://docs.sensible.so/docs/api-tutorial-sync).\n\
        - Run this endpoint in the Sensible Postman collection.\n  [![Run in Postman](https://run.pstmn.io/button.svg)](https://god.gw.postman.com/run-collection/16839934-45339059-3fec-4c31-a891-9a12a3e1c22b?action=collection%2Ffork&collection-url=entityId%3D16839934-45339059-3fec-4c31-a891-9a12a3e1c22b%26entityType%3Dcollection%26workspaceId%3Ddbde09dc-b7dd-487d-a68f-20d32b008f90)\n\
        \nThere are two options for posting the document bytes.\n  1. (Often preferred) Specify the non-encoded document bytes\
        \ as the entire request body, and specify the `Content-Type` header, for example, \"application/pdf\" or \"image/jpeg\"\
        .\n     See the following for supported file formats.\n  2. Base64-encode the document bytes, specify them in a body\
        \ \"document\" field, and specify `application/json` for the `Content-Type` header.\n\nFor a list of supported document\
        \ file types, see [Supported file types](https://docs.sensible.so/docs/file-types).\n"
      parameters:
      - $ref: '#/components/parameters/document_type'
      - $ref: '#/components/parameters/environment'
      - $ref: '#/components/parameters/document_name'
      requestBody:
        $ref: '#/components/requestBodies/SupportedFileTypes'
      tags:
      - Document
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ExtractionSyncResponse'
          description: 'The structured data extracted from the document.

            '
        '400':
          $ref: '#/components/responses/400'
        '401':
          $ref: '#/components/responses/401'
        '415':
          $ref: '#/components/responses/415'
        '429':
          $ref: '#/components/responses/429'
        '500':
          $ref: '#/components/responses/500'
  /extract_from_url:
    post:
      operationId: provide-a-download-url-for-a-pdf-portfolio
      summary: Extract portfolio at your URL
      description: '

        Use this endpoint with multiple documents that are packaged into one file (a "portfolio"). For a list of supported
        file types, see [Supported file types](https://docs.sensible.so/docs/file-types).

        Segments a portfolio file at the specified `document_url` into the specified document types (for example, 1099, w2,
        and bank_statement)

        and then runs extractions asynchronously for each document Sensible finds in the portfolio. Take the following steps.

        1. Run this endpoint.

        2. To retrieve the extraction, use a webhook, or use the extraction `id` returned in the response to poll the
        `GET /documents/{id}` endpoint.

        For more about extracting from portfolios, see [Multi-document extractions](https://docs.sensible.so/docs/portfolio).

        '
      parameters:
      - $ref: '#/components/parameters/environment'
      - $ref: '#/components/parameters/document_name'
      requestBody:
        content:
          application/json:
            schema:
              type: object
              x-internal-note: ocr_engine and ocr_every_page are accepted by the backend (src/api/extract-from-url/handler.ts:62-68)
                but deliberately not documented publicly.
              properties:
                document_url:
                  $ref: '#/components/schemas/DocumentUrl'
                types:
                  $ref: '#/components/schemas/DocumentTypeNames'
                segment_documents_with:
                  $ref: '#/components/schemas/SegmentDocumentsWith'
                webhook:
                  $ref: '#/components/schemas/Webhook'
                extra_data:
                  $ref: '#/components/schemas/ExtraDataRecord'
              required:
              - types
              - document_url
      tags:
      - Portfolio
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ExtractFromUrlPortfolioResponse'
          description: Returns the ID to use to retrieve the extraction.
        '400':
          $ref: '#/components/responses/400'
        '401':
          $ref: '#/components/responses/401'
        '415':
          $ref: '#/components/responses/415'
        '429':
          $ref: '#/components/responses/429'
        '500':
          $ref: '#/components/responses/500'
  /extractions/statistics:
    get:
      operationId: statistics
      summary: Get extraction statistics
      tags:
      - Retrieve extractions
      description: Returns daily extraction coverage statistics as a `coverage_histogram` per config. Sensible returns coverage
        for each config that was used for at least one extraction performed in the specified environments in the specified
        time period. For more information about coverage, see [Monitoring extractions](https://docs.sensible.so/docs/metrics).
        For more information about the returned `coverage_histogram`, see the response model.
      parameters:
      - $ref: '#/components/parameters/start_date_config'
      - $ref: '#/components/parameters/end_date_config'
      - $ref: '#/components/parameters/environments_statistics'
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/StatisticsResponse'
          description: Returns daily statistics for configs in the specified time period.
        '400':
          $ref: '#/components/responses/400'
        '401':
          $ref: '#/components/responses/401'
        '415':
          $ref: '#/components/responses/415'
        '500':
          $ref: '#/components/responses/500'
  /generate_upload_url:
    post:
      operationId: generate-an-upload-url-for-a-pdf-portfolio
      summary: Extract portfolio at a Sensible URL
      description: 'Use this endpoint with multiple documents that are packaged into one file (a "portfolio"). For a list
        of supported file types, see [Supported file types](https://docs.sensible.so/docs/file-types).

        Segments a portfolio file into the specified document types (for example, 1099, w2, and bank_statement) and then runs
        extractions

        asynchronously for each document Sensible finds in the portfolio. Take the following steps.

        1. Use this endpoint to generate a Sensible URL.

        2. PUT the document you want to extract data from at the `upload_url` returned in this endpoint''s response.
        For more information about how to PUT the document, see the [generate_upload_url/{document_type}](https://docs.sensible.so/reference/generate-an-upload-url)
        endpoint.

        3. To retrieve the extraction, use a webhook, or use the extraction `id` returned in the response to poll the
        `GET /documents/{id}` endpoint.

        For more about extracting from portfolios, see [Multi-document extractions](https://docs.sensible.so/docs/portfolio).

        '
      parameters:
      - $ref: '#/components/parameters/environment'
      - $ref: '#/components/parameters/document_name'
      requestBody:
        content:
          application/json:
            schema:
              type: object
              x-internal-note: ocr_engine and ocr_every_page are accepted by the backend (src/api/generate-upload-url/handler.ts:67-76)
                but deliberately not documented publicly.
              properties:
                webhook:
                  $ref: '#/components/schemas/Webhook'
                types:
                  $ref: '#/components/schemas/DocumentTypeNames'
                segment_documents_with:
                  $ref: '#/components/schemas/SegmentDocumentsWith'
                extra_data:
                  $ref: '#/components/schemas/ExtraDataRecord'
              required:
              - types
      tags:
      - Portfolio
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UploadPortfolioResponse'
          description: Returns the upload_url at which to PUT the document for extraction
        '400':
          $ref: '#/components/responses/400'
        '401':
          $ref: '#/components/responses/401'
        '415':
          $ref: '#/components/responses/415'
        '429':
          $ref: '#/components/responses/429'
        '500':
          $ref: '#/components/responses/500'
  /documents/{id}:
    get:
      operationId: retrieving-results
      summary: Retrieve extraction by ID
      description: 'Use this endpoint in conjunction with asynchronous extraction requests to retrieve your results.

        You can also use this endpoint to retrieve the results for document extractions from the synchronous `/extract` endpoint.

        To poll extraction status, check the `status` field in this endpoint''s response.

        When the extraction completes, the returned status is `COMPLETE` and the response includes results in the

        `parsed_document` field. For fields in the extraction for which Sensible couldn''t find a value, Sensible returns
        null.

        '
      parameters:
      - $ref: '#/components/parameters/id'
      tags:
      - Retrieve extractions
      responses:
        '200':
          content:
            application/json:
              schema:
                oneOf:
                - $ref: '#/components/schemas/ExtractionSingleRetrievalResponse'
                - $ref: '#/components/schemas/ExtractionPortfolioRetrievalResponse'
          description: Returns the extraction.
        '400':
          $ref: '#/components/responses/400'
        '401':
          $ref: '#/components/responses/401'
        '415':
          $ref: '#/components/responses/415'
        '500':
          $ref: '#/components/responses/500'
  /generate_upload_url/{document_type}/{config_name}:
    post:
      operationId: generate-an-upload-url-with-config
      summary: Extract doc at a Sensible URL using specified config
      description: 'This endpoint''s behavior is identical to the [Extract doc at a Sensible URL](https://docs.sensible.so/reference/generate-an-upload-url)
        endpoint''s behavior, except that Sensible uses the specified config to extract data from the document instead of
        automatically choosing the best-scoring extraction in the document type.

        '
      parameters:
      - $ref: '#/components/parameters/document_type'
      - $ref: '#/components/parameters/environment'
      - $ref: '#/components/parameters/document_name'
      - $ref: '#/components/parameters/config_name'
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GenerateUrlRequest'
      tags:
      - Document
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UploadResponse'
          description: Returns the upload_url at which to PUT the document for extraction
        '400':
          $ref: '#/components/responses/400'
        '401':
          $ref: '#/components/responses/401'
        '415':
          $ref: '#/components/responses/415'
        '429':
          $ref: '#/components/responses/429'
        '500':
          $ref: '#/components/responses/500'
  /extract_from_url/{document_type}/{config_name}:
    post:
      operationId: provide-a-download-url-with-config
      summary: Extract doc at your URL using config
      description: 'This endpoint''s behavior is identical to the [Extract doc at your URL](https://docs.sensible.so/reference/extract-from-url)
        endpoint''s behavior, except that Sensible uses the specified config to extract data from the document instead of
        automatically choosing the best-scoring extraction in the document type.

        '
      parameters:
      - $ref: '#/components/parameters/document_type'
      - $ref: '#/components/parameters/environment'
      - $ref: '#/components/parameters/document_name'
      - $ref: '#/components/parameters/config_name'
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ExtractFromUrlRequest'
      tags:
      - Document
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ExtractFromUrlResponse'
          description: Returns the ID to use to retrieve the extraction
        '400':
          $ref: '#/components/responses/400'
        '401':
          $ref: '#/components/responses/401'
        '415':
          $ref: '#/components/responses/415'
        '429':
          $ref: '#/components/responses/429'
        '500':
          $ref: '#/components/responses/500'
  /extractions:
    get:
      operationId: list-extractions
      summary: List extractions
      tags:
      - Retrieve extractions
      description: "Use this endpoint to get a filtered list of past extractions.\nThis endpoint returns a summary for each\
        \ extraction, listed in reverse chronological order. \nTo get details about an extraction, use the [Retrieve extraction\
        \ by ID](https://docs.sensible.so/reference/retrieving-results) endpoint.\nThis endpoint uses keyset pagination to\
        \ retrieve the next page of results.\nBy default it returns a first page of 20 extractions and an opaque `continuation_token`\
        \ that you can pass in the next request to get the next page of results, until the endpoint returns `continuation_token`\
        \ to indicate the last page. \nUse the `limit` parameter to configure page size. \n"
      parameters:
      - $ref: '#/components/parameters/start_date'
      - $ref: '#/components/parameters/end_date'
      - $ref: '#/components/parameters/page_limit'
      - $ref: '#/components/parameters/continuation_token'
      - $ref: '#/components/parameters/configuration_ids'
      - $ref: '#/components/parameters/document_type_ids'
      - $ref: '#/components/parameters/environments'
      - $ref: '#/components/parameters/statuses'
      - $ref: '#/components/parameters/min_coverage'
      - $ref: '#/components/parameters/max_coverage'
      - $ref: '#/components/parameters/review_statuses'
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ExtractionsResponseFiltered'
          description: Returns list of summarized extractions.
        '400':
          $ref: '#/components/responses/400'
        '401':
          $ref: '#/components/responses/401'
        '415':
          $ref: '#/components/responses/415'
        '500':
          $ref: '#/components/responses/500'
  /generate_excel/{ids}:
    get:
      operationId: get-excel-extraction
      summary: Get Excel extraction
      description: 'You can use this endpoint to get Excel files from documents, for example from PDFs. In more detail, this
        endpoint converts your JSON document extraction to an Excel spreadsheet.

        To compile multiple documents into one Excel file, specify the IDs of their recent extractions in the request separated
        by commas, for example,

        `/generate_excel/867514cc-fce7-40eb-8e9d-e6ec48cdac34,5093c65f-05bd-46a3-8df7-da3ed00f6d35`.

        For the best compiled spreadsheet results, configure your SenseML so that the documents output identically named fields.

        For more information about the conversion process, see [SenseML to spreadsheet reference](https://docs.sensible.so/docs/excel-reference).


        For portfolio extractions, Sensible returns an Excel file containing fields for all the documents it finds in the
        PDF. For more information, see [Multi-document spreadsheet](https://docs.sensible.so/docs/excel-reference#multi-document-spreadsheet).


        For a list of document file types that Sensible can extract data from, see [Supported file types](https://docs.sensible.so/docs/file-types).

        Call this endpoint after an extraction completes. For more information about checking extraction status,

        see the `GET /documents/{id}` endpoint.

        '
      parameters:
      - $ref: '#/components/parameters/ids'
      tags:
      - Get Excel from documents
      responses:
        '200':
          description: 'Indicates the extraction successfully converted to an Excel file. This response contains the download
            URL for the Excel file. The link

            expires after 15 minutes.

            '
          content:
            application/json:
              schema:
                properties:
                  url:
                    type: string
                    format: url
                    description: The download URL for the Excel file
                    example: https://sensible-so-document-type-bucket-dev-us-west-2.s3.us-west-2.amazonaws.com/sensible/fc3484c5-3f35-4129-bb29-0ad1291ee9f8/EXTRACTION/14d82783-c12b-4e70-b0ae-ca1ce35a9836.xlsx?REDACTED
        '400':
          $ref: '#/components/responses/400'
        '401':
          $ref: '#/components/responses/401'
        '415':
          $ref: '#/components/responses/415'
        '500':
          $ref: '#/components/responses/500'
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: Bearer token using a Sensible API key. Create keys at https://app.sensible.so/account/.
  schemas:
    Charged:
      type: integer
      example: 1
      description: The number of extractions charged to your account for this extraction ID.
    PostprocessorOutput:
      type: object
      additionalProperties: true
      description: A custom schema that you define using a [postprocessor](https://docs.sensible.so/docs/postprocessor). For
        example, define this output when your app consumes a pre-existing schema and you don't want to use Sensible's `parsed_document`
        schema.
    ReviewStatus:
      type: string
      enum:
      - NEEDS_REVIEW
      - APPROVED
      - REJECTED
      example: NEEDS_REVIEW
      description: The extraction's review status. For more information, see [Human review](https://docs.sensible.so/docs/human-review).
        Specify a webhook in the extraction request so that you can get a push notification when review status changes to
        `APPROVED` or `REJECTED` for extractions that returned `NEEDS_REVIEW`. Sensible omits this property from the extraction
        response if the extraction doesn't need review.
    ContentTypeResponse:
      type: string
      description: 'The content type of the document.

        '
      example: image/png
    Coverage:
      type: number
      description: The coverage score measures how fully an extraction captured all your target data in the document. It's
        a percentage comparing non-null, [validated](https://docs.sensible.so/docs/validate-extractions) fields to total fields returned
        by a config for a document. For example, a coverage score of 70% for an extraction with no validation errors means
        that 30% of fields were null. For more information about scoring, see [Monitoring extraction metrics](https://docs.sensible.so/docs/metrics).
      example: 0.75
    EnvironmentResponse:
      description: Name of the environment to which the configuration used by this extraction was published.
      example: DEVELOPMENT
      type: string
    ExtractionSyncResponse:
      type: object
      properties:
        id:
          $ref: '#/components/schemas/ExtractionId'
        created:
          $ref: '#/components/schemas/ExtractionCreated'
        type:
          $ref: '#/components/schemas/DocumentTypeName'
        status:
          $ref: '#/components/schemas/ExtractionStatus'
        completed:
          $ref: '#/components/schemas/ExtractionCompleted'
        configuration:
          $ref: '#/components/schemas/ConfigurationName'
        configuration_version:
          $ref: '#/components/schemas/ConfigurationVersion'
        parsed_document:
          $ref: '#/components/schemas/ParsedDocument'
        validations:
          $ref: '#/components/schemas/Validations'
        file_metadata:
          $ref: '#/components/schemas/FileMetadata'
        validation_summary:
          $ref: '#/components/schemas/ValidationsSummary'
        errors:
          $ref: '#/components/schemas/Errors'
        classification_summary:
          $ref: '#/components/schemas/ClassificationSummary'
        page_count:
          type: integer
          example: 100
          description: Total number of pages in the document.
        environment:
          $ref: '#/components/schemas/EnvironmentResponse'
        document_name:
          $ref: '#/components/schemas/DocName'
        content_type:
          $ref: '#/components/schemas/ContentTypeResponse'
        coverage:
          $ref: '#/components/schemas/Coverage'
        reviewStatus:
          $ref: '#/components/schemas/ReviewStatus'
        charged:
          $ref: '#/components/schemas/Charged'
        postprocessorOutput:
          $ref: '#/components/schemas/PostprocessorOutput'
    DocName:
      type: string
      description: If you specify the filename of the document using the `document_name` parameter, then Sensible displays
        the name in extraction history in the Sensible app and returns the name in the extraction response.
      example: example.pdf
    Classification:
      type: object
      properties:
        configuration:
          $ref: '#/components/schemas/ConfigurationName'
        fingerprints_present:
          type: integer
          example: 1
          description: The number of this config's fingerprints that Sensible found in the document.
        fingerprints:
          type: integer
          example: 1
          description: The number of fingerprints defined in this config.
        score:
          $ref: '#/components/schemas/Score'
    ConfigurationName:
      type: string
      description: Name of the "configuration", a collection of SenseML queries for extracting document data.
      example: config_for_x_company
    ConfigurationVersion:
      type: string
      description: Version number for the configuration.
      example: N39i3ZvEbPCkcjOtYIAU1_ADSovnUC5I
    DocumentTypeName:
      description: Unique user-friendly name for a document type
      example: auto_insurance_quotes_all_carriers
      type: string
    ClassificationSummary:
      type: array
      description: Metadata about how Sensible scores configs against the document to extract from. By default, Sensible compares
        all configs in the document type, then chooses the best extraction using fingerprints, scores, or a combination of
        the two. When two extractions tie by score and fingerprints, Sensible chooses the first configuration in alphabetic
        order. For more information, see [fingerprints](https://docs.sensible.so/docs/fingerprint#notes).
      items:
        $ref: '#/components/schemas/Classification'
      example:
      - configuration: config_for_x_company
        fingerprints: 2
        fingerprints_present: 2
        score:
          value: 3
          fields_present: 4
          penalties: 0.5
      - configuration: acme_co
        fingerprints: 2
        fingerprints_present: 2
        score:
          value: 0
          fields_present: 2
          penalties: 1.5
    FileMetadata:
      type: object
      description: Metadata about the PDF file, for example author, authoring tool, and modified date.
      properties:
        metadata:
          type: object
          description: Raw metadata embedded in the PDF. Returned if available, without data normalization.
        error:
          type: string
          description: Errors Sensible encountered when attempting to retrieve metadata
          example: 'Error retrieving PDF metadata: Invalid PDF structure'
        info:
          type: object
          description: Normalized metadata about the PDF, returned if available.
          properties:
            author:
              type: string
              description: The name of the person who created the document.
              example: Jay S. Schiller
            title:
              type: string
              description: Title assigned to the PDF by the PDF producer.
              example: file123
            creator:
              type: string
              description: If the document was converted to PDF from another format, the name of the application that created
                the original document from which it was converted.
              example: macOS Version 11.2 (Build 20D64) Quartz PDFContext
            producer:
              type: string
              description: If the document was converted to PDF from another format, the name of the application that converted
                it to PDF
              example: Preview
            creation_date:
              type: string
              description: File creation date
              example: '2022-08-02T18:09:31.000+00:00'
            modification_date:
              type: string
              description: File modification date
              example: '2022-08-03T15:09:23.000+00:00'
            error:
              type: string
              description: Errors Sensible encountered when attempting to retrieve metadata.
    Score:
      type: object
      description: The score for the extraction, used to help choose the best extraction.
      properties:
        value:
          type: number
          example: 17
          description: The score total is fields_present minus penalty points. In the absence of fingerprints, Sensible returns
            the extraction in the document type with the highest score.
        fields_present:
          type: integer
          example: 17
          description: Number of non-null fields Sensible extracted from the document using this config
        penalties:
          type: number
          example: 1.5
          description: Errors are 1 penalty point and warnings are 0.5 points. See the validation_summary for a breakdown.
    ParsedDocument:
      description: 'Data extracted from the document, structured as an array of fields.

        Configure the verbosity parameter in the SenseML configuration to return extraction metadata, such as:

        - page numbers

        - the bounding polygons that define line coordinates

        - confidence scores for text that Sensible OCR''d

        For more information, see [Verbosity](https://docs.sensible.so/docs/verbosity).

        '
      type: object
      example:
        policy_number:
          type: number
          value: 123456789
          lines:
          - text: '123456789'
            page: 0
            boundingPolygon:
            - x: 6.458
              y: 2.601
            - x: 7.354
              y: 2.601
            - x: 7.354
              y: 2.767
            - x: 6.458
              y: 2.767
        name_insured:
          type: string
          value: Petar Petrov
          lines:
          - text: Petar Petrov
            page: 0
            boundingPolygon:
   

# --- truncated at 32 KB (66 KB total) ---
# Full source: https://raw.githubusercontent.com/api-evangelist/sensible-so/refs/heads/main/openapi/sensible-extractions-api-openapi.yml