Important Notice

The v5 API should now be fully functional, and future changes should maintain
backward compatibility with its current state.  Endpoints for user and account
management are still only available through the v4 API.  Documentation for
those endpoints can be found at: https://analytics.luminoso.com/api/v4/

Converting from v4

If you've been using the old v4 API, you may find the following tips helpful
when adapting your usage to the new API.  (This discussion will center around
using the API itself, and not major functionality changes like the move from
subsets to the more flexible metadata model.)  Some conceptual changes:

* Returns have been generally streamlined.  In v4, every return contained both
  a "result" and an "error", one of which would be null.  In the new API, a
  successful result is returned directly (though some have "result" keys, when
  other supplemental information is also coming back), and errors no longer
  include a null "result".  See "Responses" for more information.

* As much as possible, the API has been designed around consistent objects.
  For example, while there has always been the idea of a "project record",
  the v4 API would return the entire record from the endpoint that retrieved
  records but only a partial record from the endpoints to create or copy a
  project.  Now, anything that returns something like a project record will
  return a complete project record.

* Many endpoints have moved or even been combined.  For instance, retrieving
  documents that match a search is now part of GET /.../docs/, rather than
  being a separate GET /.../docs/search/ endpoint.  Uploading documents to
  a project is now done through POST /projects/<project_id>/upload/, rather
  than something under /.../docs/.

* A few endpoints that we know were rarely used, and that seem either
  superfluous or incompatible with the new system, have been removed.

* Rather than having "terms" and "topics", the former of which was extremely
  overloaded terminology, the v5 API uses unified references to "concepts".
  What were "terms" are now simply "concepts" (or sometimes "top concepts");
  what were topics are now "saved concepts".  The unification reflects the fact
  that anything you can do with one or the other--search, get match counts, and
  so forth--is really something you can do with both.

And some changes to particular endpoints:

* Project endpoints no longer include the account ID, so that a project you
  used to access via "/api/v4/projects/a12b345c/pr123456/" will now be accessed
  via "/api/v5/projects/pr123456/".

* "language", which used to be an optional parameter when building a project
  that defaulted to English, is now a required parameter on project creation.
  This should minimize the likelihood of accidentally building a project in
  the wrong language.  The language can be changed through the "Edit project
  details" endpoint.

* The "ping" endpoint has been removed.  If you want to test your connection to
  the API, the "Get system status" endpoint (GET /api/v5/status/) is similarly
  unauthenticated, with a fairly small return.

* The "jobs" endpoints have also been removed.  If you want to know if there is
  a build currently running, or to see the status of the current build, the
  information is available in the "last_build_info" of the project record.
  (Unlike the previous version, this version of the API cannot return
  information about builds before the most recent.)  The v5 revision of the
  Luminoso API client uses this field in its "wait_for_build" method, which
  replaces the old "wait_for" method that required having the "job number".

* Many auxiliary endpoints that involve user and account information have not
  yet been transferred to the new API.  The most notable exclusion is the
  endpoint for logging in; this, however, is a deliberate decision.  Rather
  than connecting to the API with a username and password and getting a
  short-lived token to use, we now encourage people connecting programmatically
  to acquire a long-lived token through the UI and to authenticate with that.
  (Once again, our Luminoso API client is designed to help handle this.)

Using the API

If you are interested in writing your own code to connect to this API, many of
the following sections will be helpful.  However, the easiest way to connect
is via our API client: https://pypi.org/project/luminoso-api/.  This package
includes clients for both v4 and v5; the documentation for the v5 version can
be found at https://github.com/LuminosoInsight/luminoso-api-client-python/blob/master/V5_README.md

Logging in

While some endpoints can be used without authentication, most endpoints require
authentication with an API token.  Long-lived API tokens can be acquired by
logging into the UI and going to the "API tokens" tab of "User settings".  Our
Python API client connects you via either a token you specify or one you have
previously saved, and handles authentication on subsequent calls; if you are
using your own HTTP client, you'll need to acquire a token and send it with
each authenticated request.

To make an authenticated request, include an "Authorization" HTTP header with
the string "Token " followed by your token.
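
As a minimal sketch using only the Python standard library, an authenticated
request could be prepared as follows.  The API root shown is an assumption
based on the v4 documentation URL, the token is a placeholder, and the helper
name is purely illustrative; the request is built here but not sent.

```python
import urllib.request

# Assumed root URL, inferred from the v4 documentation address.
API_ROOT = "https://analytics.luminoso.com/api/v5"


def authenticated_request(path, token):
    """Build (but do not send) a GET request carrying the token header."""
    request = urllib.request.Request(API_ROOT + path)
    # The header value is the string "Token " followed by the token itself.
    request.add_header("Authorization", "Token " + token)
    return request


req = authenticated_request("/projects/", "YOUR_TOKEN_HERE")
```

Passing the resulting object to urllib.request.urlopen would send it.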

Requests

Many endpoints accept resource IDs as part of the URL path.  This is indicated
by the use of <angle brackets> in the documented endpoint URL.

Endpoints also accept parameters in the query string or the request body.
Again, our Python API client will handle this for you.  If you are using your
own HTTP client, follow the guidelines below.

In query strings, boolean parameters should be passed as "true" or "false",
numeric and string parameters require no unusual encoding, and object or array
parameters should be encoded as JSON.  Of course, the query string must also be
urlencoded; your HTTP client almost certainly does this automatically.

In the request body, only JSON is accepted; it must consist of a single object
whose fields are the intended parameter values (without any further encoding),
and the content type must be set to "application/json".

Please note that, while many implementations of JSON allow "NaN" and "Infinity"
as numbers, our API will reject them as ill-formed either in a query string or
in a JSON body.
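
The query-string rules above can be sketched in Python roughly as follows.
The helper name and the sample parameters are illustrative only; the sketch
relies on the standard library's json and urllib.parse modules and is not
taken from the official client.

```python
import json
from urllib.parse import parse_qs, urlencode


def encode_query(params):
    """Encode a dict of parameters as a query string per the rules above."""
    encoded = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # Booleans are sent as the literal strings "true" / "false".
            encoded[key] = "true" if value else "false"
        elif isinstance(value, (dict, list, float)):
            # Objects and arrays are JSON-encoded.  allow_nan=False makes
            # json.dumps raise ValueError for NaN or Infinity.
            encoded[key] = json.dumps(value, allow_nan=False)
        else:
            # Integers and strings need no special treatment.
            encoded[key] = value
    # urlencode takes care of percent-encoding the values.
    return urlencode(encoded)


query = encode_query({
    "limit": 5,
    "exact_only": True,
    "filter": [{"name": "State", "values": ["MA", "NH"]}],
})
# Decoding again shows what the server would receive for each parameter.
decoded = parse_qs(query)
```

Rejecting NaN and Infinity on the client side means the mistake surfaces
before the request is ever sent.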

Typically the query string is used for GET and DELETE requests and the request
body for PUT and POST requests.  The Daylight API will accept parameters in
either form regardless of the request method, with the request body taking
precedence; however, following conventional practice will minimize your chance
of problems with HTTP client or proxy software.

Both the Daylight API and most HTTP clients limit the total length of a URL,
which restricts the size of query strings.  To work around this limitation, GET
endpoints in the API that may require large queries will generally also respond
to POST requests with identical results, as documented for each endpoint below.

Responses

Successful requests return HTTP status code 200 and either null or a JSON
object (other than /api/v5/, which returns this documentation as HTML).  If the
endpoint has no useful data to return, the response will be null; otherwise,
the response will be a JSON object whose format is described in the endpoint's
documentation.

Example successful return from POST /api/v5/projects/ (see "Projects" for the
full return):

    {"project_id": "pr123456",
     "account_id": "a12b345c",
     "name": "Example Project",
     "description": "An example project",
     ...}

Unsuccessful requests return HTTP status codes of 400 and above.  The response
will be a JSON object having at minimum an "error" field containing a short
string code and a "message" field containing an English description of the
error.  Other fields may give additional details about the nature of the error.

Example unsuccessful return from POST /api/v5/projects/:

    {"error": "INVALID_PARAMS",
     "message": "The listed request parameter(s) must be supplied.",
     "parameters": ["language"]}

The possible error codes and their associated HTTP status codes are:

  400 - INVALID_PARAMS
    An error that indicates that one of several things has gone wrong with the
    parameters you supplied, e.g. one was unrecognized, or a required one was
    missing.  The message will provide more information.

  400 - PROJECT_LOCKED
    Normally this error indicates that while your project was being built you
    attempted to upload new documents or rebuild the project, neither of which
    is allowed.  If your project is not being built and you receive this error,
    the project may have a fault; contact Luminoso for assistance.

  400 - PROJECT_NOT_BUILT
    Some project endpoints do not require you to have built the project yet:
    document uploading, of course, and things like changing the project's name
    or description.  Other endpoints rely on the science that results from the
    project being built; if you call one of those endpoints on a project that
    has not been built, this is the error we return.

  400 - EMAIL_NOT_ENABLED
    This error indicates that the server was asked to send email but is not
    configured to do so.

  401 - NO_TOKEN
    This error indicates that you have not supplied an API token to an endpoint
    that requires authentication.  See "Logging in".

  401 - INVALID_TOKEN
    This error indicates that you are attempting to use an invalid or expired
    API token.  If you are using a short-lived token, it has most likely
    expired; log in again to get a new one.  If you are using a long-lived
    token, use the Daylight user interface to make sure it has not been
    deleted.

  403 - INADEQUATE_PERMISSION
    This error indicates that your request was properly authenticated, but that
    you do not have permission on the requested project, account, or other
    resource.  Note that for security reasons this is also the error returned
    if the requested resource does not exist, so you should check that you have
    specified the correct ID and that the resource has not been deleted.

  404 - NOT_FOUND
  405 - METHOD_NOT_ALLOWED
    These errors indicate that the endpoint you called does not exist or does
    not support the requested method.  Check that you have spelled the endpoint
    correctly and that the base portion of the URL is correct (e.g., includes
    "/api/v5/").

The API may also return ERROR, usually with the status code 500; this indicates
an error on our end, and you should contact Luminoso for assistance.
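
A client might condense an unsuccessful response into a single line along
these lines.  This is only a sketch: the function name is hypothetical, and
it assumes the response body is the JSON error object described above.

```python
import json


def summarize_error(status_code, body_text):
    """Turn an unsuccessful response into a one-line description."""
    body = json.loads(body_text)
    # "error" and "message" are documented as always present on failures.
    summary = f"{status_code} {body['error']}: {body['message']}"
    # Any remaining fields carry extra detail, such as "parameters".
    extras = {k: v for k, v in body.items() if k not in ("error", "message")}
    if extras:
        summary += f" ({json.dumps(extras, sort_keys=True)})"
    return summary


example_body = json.dumps({
    "error": "INVALID_PARAMS",
    "message": "The listed request parameter(s) must be supplied.",
    "parameters": ["language"],
})
line = summarize_error(400, example_body)
```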

Filters

A filter is an array of objects, each giving a constraint on a metadata field.

For a metadata field of type "string" the object should have the form

    {"name": <field name>,
     "values": <array of values>}

expressing the constraint that a document must have any of the specified values
for the field.  (The array may be empty, in which case the filter will match no
documents.)

For a metadata field of type "number", "score", or "date", a filter can be
specified either as an array of values of the field's type (in the same manner
as for metadata fields of type "string"), or as a range. A range filter for a
metadata field of type "number" or "score" should have the form

    {"name": <field name>,
     "minimum": <int or float>,
     "maximum": <int or float>}

expressing the constraint that a document must have a value specified for the
field which falls within the specified range.  (One of "minimum" or "maximum"
may be omitted, but not both.)  Similarly for "date", the object should be

    {"name": <field name>,
     "minimum": <int, float, or string>,
     "maximum": <int, float, or string>}

where the number or string conforms to the general format restrictions on
dates (see "Dates").

Documents matched by the filter must match all of the given constraints; that
is, the filter matches the intersection of the selected subsets.  The filter
may be an empty array, in which case it will match all documents.

For instance, the filter

    [{"name": "State", "values": ["MA", "NH"]},
     {"name": "Rating", "maximum": 3}]

would return all documents whose "State" is either "MA" or "NH", and whose
"Rating" is less than or equal to 3.
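
To make the matching rules concrete, here is a small local model of them in
Python.  It is a sketch of the semantics described above, not the server's
implementation.  It assumes that "minimum" is inclusive in the same way the
example shows "maximum" to be, it treats a field that appears several times
on a document as matching if any one value qualifies, and it handles only
numeric ranges (dates are left out).

```python
def matches_filter(filter_spec, metadata):
    """Return True if a document's metadata satisfies every constraint."""
    for constraint in filter_spec:
        # Collect this document's values for the constrained field.
        values = [m["value"] for m in metadata
                  if m["name"] == constraint["name"]]
        if "values" in constraint:
            # Value-list constraint: any listed value is acceptable.
            ok = any(v in constraint["values"] for v in values)
        else:
            # Range constraint: a missing bound is treated as unbounded.
            low = constraint.get("minimum", float("-inf"))
            high = constraint.get("maximum", float("inf"))
            ok = any(low <= v <= high for v in values)
        if not ok:
            return False
    # An empty filter has no constraints, so it matches every document.
    return True


example_filter = [{"name": "State", "values": ["MA", "NH"]},
                  {"name": "Rating", "maximum": 3}]
doc_metadata = [{"name": "State", "value": "NH", "type": "string"},
                {"name": "Rating", "value": 3, "type": "number"}]
result = matches_filter(example_filter, doc_metadata)
```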

Concept Selectors

Many endpoints accept a "concept selector", which allows the user to specify
which concepts they want results for.  A concept selector is an object with a
"type" field that specifies which type of concept to use.  The type must be one
of the following, with additional fields as follows:

    {"type": "top"} - uses the project's top concepts, by relevance
      - optional field "limit", an integer (default 10, max 50000)

    {"type": "saved"} - uses the project's saved concepts
      - no other fields are allowed

    {"type": "specified"} - uses the specified concepts
      - required field "concepts", an array of concepts (see "Concepts")

    {"type": "related"} - uses concepts related to a specific search concept
      - required field "search_concept", a single concept on which to search
        (see "Concepts")
      - optional fields:
        "limit", an integer (default 10, max 50000)
        "min_doc_count", the minimum number of documents in the entire project 
            (not necessarily in the supplied filter) that the related concepts 
            must appear in to be returned (default 2)

    {"type": "suggested"} - uses our suggested clusters of concepts
      - optional fields:
        "limit", an integer (default 500, max 5000)
        "num_clusters", an integer (default 7)
        "num_cluster_concepts", an integer (default 4)

Some sample concept selectors you might pass, and the result of passing them to
the GET /concepts/ endpoint:

    {"type": "top", "limit": 20}
      - Returns the top 20 concepts in the project

    {"type": "specified", "concepts": [{"texts": ["disappointed"]}]}
      - Returns one concept, based on the text "disappointed"

    {"type": "related", "search_concept": {"texts": ["disappointed"]}}
      - Returns the 10 concepts most related to the concept "disappointed"

    {"type": "suggested"}
      - Returns 28 concepts, categorized into 7 clusters of 4 concepts each

    {"type": "suggested", "num_clusters": 10, "num_cluster_concepts": 3}
      - Returns 30 concepts, categorized into 10 clusters of 3 concepts each
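
The rules listed above can be checked on the client side before a request is
made.  The following is a rough sketch; the function and constant names are
hypothetical, and only the field names and limits stated above are encoded.

```python
ALLOWED_FIELDS = {
    "top": {"limit"},
    "saved": set(),
    "specified": {"concepts"},
    "related": {"search_concept", "limit", "min_doc_count"},
    "suggested": {"limit", "num_clusters", "num_cluster_concepts"},
}
REQUIRED_FIELDS = {"specified": {"concepts"}, "related": {"search_concept"}}
MAX_LIMIT = {"top": 50000, "related": 50000, "suggested": 5000}


def check_selector(selector):
    """Return a list of problems in a concept selector (empty if none)."""
    kind = selector.get("type")
    if kind not in ALLOWED_FIELDS:
        return [f"unknown type: {kind!r}"]
    fields = set(selector) - {"type"}
    problems = [f"unexpected field: {name}"
                for name in sorted(fields - ALLOWED_FIELDS[kind])]
    problems += [f"missing field: {name}"
                 for name in sorted(REQUIRED_FIELDS.get(kind, set()) - fields)]
    limit = selector.get("limit")
    if limit is not None and kind in MAX_LIMIT and limit > MAX_LIMIT[kind]:
        problems.append(f"limit exceeds {MAX_LIMIT[kind]}")
    return problems


problems = check_selector({"type": "related", "limit": 60000})
```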

Dates

Dates sent to the API, either as metadata on a document or a value in a filter,
must be in one of two formats.  The first is the subset of ISO 8601 time
formats recommended by RFC 3339, but only allowing the time zone Z: that is,
strings in the format

    <year>-<month>-<day>T<hours>:<minutes>:<seconds>Z or
    <year>-<month>-<day>T<hours>:<minutes>:<seconds>.<microseconds>Z

For example, August 4, 2017, 8:15 pm, would be written "2017-08-04T20:15:00Z".
(Note that, per RFC 3339, the T and Z may be lowercased; we will standardize
all strings to uppercase.)

The second format we accept is epoch time, i.e. the number of seconds since
midnight UTC on January 1, 1970.  In this format, the above time would be
expressed as 1501877700.

Dates returned by the API are in the string format given above.
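
Both accepted formats can be produced from a Python datetime, as in this
brief sketch (the helper names are illustrative):

```python
from datetime import datetime, timezone


def to_api_string(moment):
    """Format a timezone-aware datetime in the string form shown above."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_epoch(moment):
    """Convert a timezone-aware datetime to whole seconds since the epoch."""
    return int(moment.timestamp())


# August 4, 2017, 8:15 pm UTC, the example used above.
moment = datetime(2017, 8, 4, 20, 15, tzinfo=timezone.utc)
as_string = to_api_string(moment)
as_epoch = to_epoch(moment)
```

Here as_string is "2017-08-04T20:15:00Z" and as_epoch is 1501877700,
matching the two representations given above.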

Term management

A term management object is an object mapping term IDs to changes.  Term IDs
are the language-tagged forms found in the "term_id" field of the "terms" and
"fragments" arrays on documents, as described in "Documents".  "changes" is an
object which may include

  * "action": the name of an action that changes the way a term ID is treated
    when a project builds.  At the moment, the valid actions are:
    * "ignore": term IDs marked with this action will be entirely skipped over
      when the project builds, as if they were function words like "the" or
      "of".
    * "notice": term IDs marked with this action will stop being skipped over
      when the project builds.  This is intended to apply to function words
      that would normally be left out of the analysis, but which represent a
      genuine concept in this project, e.g. the number "529" in a project about
      financial services.
  * "new_term_id": a new term ID that will replace the term ID whenever it
    occurs.

If a term management object is sent in which a term ID's changes is an empty
object, the management information for that term ID will be removed.

An example of a term management object:

  {"enviornment|en": {"new_term_id": "environment|en"},
   "really|en": {"action": "ignore"}}

With this object stored in term management, a project subsequently built would
have "enviornment" and "environment" merged into the same concept, and would
have "really" entirely removed.
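
The effect of "ignore" and "new_term_id" can be pictured with a small local
simulation.  This is only an illustration of the described behavior applied
to a list of term IDs, not the actual build process, and it does not model
the "notice" action.

```python
def apply_management(term_ids, management):
    """Illustrate "ignore" and "new_term_id" on a list of term IDs."""
    result = []
    for term_id in term_ids:
        changes = management.get(term_id, {})
        if changes.get("action") == "ignore":
            # Ignored term IDs are skipped entirely.
            continue
        # Replace the term ID if a new one is given, else keep it.
        result.append(changes.get("new_term_id", term_id))
    return result


management = {"enviornment|en": {"new_term_id": "environment|en"},
              "really|en": {"action": "ignore"}}
managed = apply_management(
    ["enviornment|en", "really|en", "environment|en", "speak|en"],
    management)
```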

Field breakdowns

Some endpoints can break down their results across values of a metadata field,
as might be used to generate a histogram or bar chart.  These endpoints accept
objects specifying which fields should be broken down and how.

A breakdown object for a string field has the form

  {"name": <name of the metadata field>}

for a number or score field

  {"name": <name of the metadata field>, "interval": <positive number>}

and for a date field

  {"name": <name of the metadata field>,
   "interval": "year", "quarter", "month", "week", "day", or "hour"}

For example, to retrieve concept match counts for a project with birthday data
bucketed by year, supply the following:

  {"name": "birthday", "interval": "year"}

Projects

A project record has the form

    {"project_id": <string, an internal identifier>,
     "account_id": <string, the ID of the account that owns the project>,
     "name": <string, the project's name>,
     "description": <string, a description of the project>,
     "language": <string, a two-character language code>,
     "creator": <string, the username of the user that created the project>,
     "creation_date": <float, a timestamp>,
     "document_count": <integer, the total number of documents in the project>,
     "last_update": <float, a timestamp of when the project was last modified>,
     "last_successful_build_time": <float, a timestamp of when the project's
                                    last successful build terminated, or null
                                    if project is unbuilt>,
     "last_build_info": <object, see below>,
     "permissions": <list, the permissions the user has on the project>}

A project record's "last_build_info" will always be an object.  If no build has
ever been started, the object will be empty.  Otherwise, the object will always
have at least the fields

  "number", integer, the number of the last build;
  "start_time", float, a timestamp;
  "stop_time", either a timestamp (float) if the build has finished, or null if
               it is still running.

To check whether a project's build is still running, look at the "stop_time"
field on the record's "last_build_info".

If the build is no longer running, the "last_build_info" will also contain
"success", a boolean.  If "success" is false, it will additionally contain
"reason", a string explaining briefly why the build failed.
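
Reading the build state off a project record could look roughly like this.
The function name and the returned labels are illustrative choices; only the
fields of "last_build_info" described above are used.

```python
def build_state(project_record):
    """Summarize the build state from a project record."""
    info = project_record["last_build_info"]
    if not info:
        # An empty object means no build has ever been started.
        return "never built"
    if info["stop_time"] is None:
        # A null stop time means the build is still running.
        return "running"
    if info["success"]:
        return "succeeded"
    # "reason" is present whenever "success" is false.
    return "failed: " + info["reason"]


state = build_state({"last_build_info": {"number": 3,
                                         "start_time": 1501877700.0,
                                         "stop_time": None}})
```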

Create project

POST /api/v5/projects/

    Required parameters:
      name (string)
          readable name for the project
      language (string)
          the language for the project.  See "Get system status" for more
          information.

    Optional parameters:
      account_id (string, defaults to user's default account ID)
          ID of the account which should own the project
      description (string)
          description for the project

    Result:
      a project record

    Creates a project with a specified name in a specified language.  Project
    names and descriptions are changeable; the ID that comes back in the
    project record is an internal identifier and cannot be changed.

    A note about unique naming: you cannot create a project that has the same
    name as an existing project in your account.  Any time an endpoint attempts
    to create a project with a name that already exists, it will append
    " - <n>" for the smallest <n> not already in use.  For instance, the first
    time you try to create a project named "Test", the project will get the
    name "Test"; the second time, the name will be "Test - 1", the third time,
    "Test - 2", and so forth.
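
    The naming rule can be modeled locally as in the sketch below.  It mirrors
    the description above and is not the server's code; the function name is
    hypothetical.

```python
def unique_name(requested, existing):
    """Return the name a new project would get, given names already in use."""
    if requested not in existing:
        return requested
    # Append " - <n>" using the smallest n not already taken.
    n = 1
    while f"{requested} - {n}" in existing:
        n += 1
    return f"{requested} - {n}"


names = set()
assigned = []
for _ in range(3):
    name = unique_name("Test", names)
    names.add(name)
    assigned.append(name)
```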
    

Add documents

POST /api/v5/projects/<project_id>/upload/

    Required parameter:
      docs (JSON-encoded array of document objects)

    Upload new documents to a project (see "Documents").  Note that documents
    do not become available until the project is built.
    

Copy project

POST /api/v5/projects/<project_id>/copy/

    Optional parameters:
      name (string, defaults to "Copy/Partial copy of [original project name]")
          readable name for the new project
      account (string, defaults to ID of the account that owns the original project)
          ID of the account that will own the new project
      description (string, defaults to [original project description])
          description for the new project
      search (JSON-encoded concept object, defaults to no search)
          search used to limit documents
      filter (JSON-encoded filter, defaults to no filter)
          filter used to limit documents

    Optional parameters (specify at most one; ignored if no search is specified):
      exact_only (boolean)
          if true, only documents with exact matches to the texts in the search
          will be used
      match_type (string)
          if "exact", uses only documents that contain exact matches for the
          texts in the search; if "conceptual", uses only documents containing
          conceptual matches and not containing exact matches
    If neither parameter is specified, uses documents containing any kind of
    match (exact, conceptual, or both).

    Additional optional parameter (ignored if not rebuilding):
      notify (boolean, defaults to false)
          if true, sends an email to the user who initiated the copy

    Result:
      a project record

    Copies a project with the given ID to a new project.  The name that gets
    returned on success is the new project name, either as given in the
    optional "name" parameter, or by default "Copy/Partial copy of X", where X
    is the project name of the source project, and "Partial copy" is used when
    a search or a filter is passed to limit the documents.  See "Create project"
    for more information about project_id as well as what happens when creating
    a project with a name that already exists in the system.

    The new copy will contain a subset of the original documents if at least
    one of "search" or "filter" is passed, in which case the copy will also
    be rebuilt.  Otherwise, the project will be copied exactly.  The
    "exact_only" and "match_type" parameters are only used when specifying a
    search, and "notify" is only used when rebuilding.

    To avoid potential miscopying, copying a project will not work if it is
    currently building.
    

List projects

GET /api/v5/projects/

    Optional parameters:
      fields (JSON-encoded array of strings, defaults to all fields)
          which fields to include on each returned object. Nonexistent fields
          are ignored.
      account_id (string, defaults to all accounts)
          which account to include projects from

    Result:
      an array of project records

    Returns the records for a user's available projects.  See "Projects" above
    for details about project records.
    

Get project info

GET /api/v5/projects/<project_id>/

    Optional parameter:
      fields (JSON-encoded array of strings, defaults to all fields)
          which fields to include on the returned object. Nonexistent fields
          are ignored.

    Result:
      the project record
    

Get project metadata

GET /api/v5/projects/<project_id>/metadata/

    Optional parameter:
      max_values (integer)
          the maximum number of values to send back for a string or number
          field.  If a field has more than this many values, it will be
          returned without "values".

    Result:
      {"result": <an array of objects describing the project's metadata fields,
                  sorted alphabetically by field name>}

    Metadata fields of type "string" are described by objects of the form

      {"name": <field name>,
       "type": "string",
       "values": [{"value": <field value>,
                   "count": <number of docs>},
                   ...]}

    Metadata fields of type "number" or "score" are described by objects of the
    form

      {"name": <field name>,
       "type": "<field type>",
       "minimum": <lowest value in the project>,
       "maximum": <highest value in the project>,
       "values": [{"value": <field value>,
                   "count": <number of docs>},
                  ...]}

    The "values" field for "number" metadata fields is not present if there
    are more than 100 different values in the project.

    Metadata fields of type "date" are described by objects of the form

      {"name": <field name>,
       "type": "date",
       "minimum": <earliest value in the project>,
       "maximum": <latest value in the project>}
    

Build project

POST /api/v5/projects/<project_id>/build/

    Optional parameter:
      notify (boolean, defaults to false)
          if true, sends an email to the user who started the build upon its
          completion

    Initiates a build which processes uploaded documents and stores the
    results.

    This endpoint must be called after a project's documents are uploaded, and
    subsequently if new documents are added, the project's language is changed,
    or term management settings are changed.

    Only one build can run at a time on a given project; if another build is
    running on the specified project, this endpoint will return an error.
    

Edit project details

PUT /api/v5/projects/<project_id>/

    Parameters (at least one is required):
      name (string)
          readable name for the project
      description (string)
          description for the project
      language (string)
          the language for the project.  See "Get system status" for more
          information.

    Sets user-changeable information for the project; omitted fields are not
    changed.
    

Delete project

DELETE /api/v5/projects/<project_id>/

    This will irrevocably delete a project and clear all data from it,
    including any jobs that are currently running.  No second chances, so only
    call this if you really mean it.  Programs that call this API should insert
    their own "Are you sure?" checks.
    

Vectorize text

POST /api/v5/projects/<project_id>/vectorize/

    Required parameter:
      texts (JSON-encoded array of strings)
          an array of texts to vectorize; each text must be less than 500,000
          characters

    Optional parameter:
      legacy_term_format (boolean, default true)
          if true, returns the terms and fragments on the result objects in the
          legacy array format [term_id, part_of_speech, [start, end]]; if
          false, returns the terms and fragments in the same format as the one
          that appears on documents (see "Documents")
    For backward compatibility, this parameter currently defaults to true.  In
    the future, it will change to default to false and be deprecated.  If you
    plan to use this endpoint with new code, please pass "false" for this
    parameter.

    Result:
      an array of objects, with each object containing the original text, a
      vector, terms, and fragments

    This does *not* save the texts, or add any of their information to
    the project!
    

Documents

Documents uploaded to a project are required to be of the form

    {"text": <string, required; maximum length 500,000 characters>,
     "title": <string, default "">,
     "metadata": <array of metadata fields, default []>}

in which the text field is required, and the other fields are optional.  Any
extraneous fields are invalid and will result in an error upon upload.
Metadata fields on documents for upload must be of the form

    {"name": <field name>,
     "value": <field value>,
     "type": <metadata type: one of "string", "number", "date", "score">}

For instance:

    {"text": "Some text",
     "metadata": [{"name": "State", "value": "MA", "type": "string"},
                  {"name": "Rating", "value": 5, "type": "number"}]}
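
Before uploading, documents can be checked against these rules on the client
side.  The sketch below is one possible approach; the function and constant
names are hypothetical.  The request body it builds follows the general rule
that a JSON body is a single object whose fields are the parameters, here
the "docs" parameter of the upload endpoint.

```python
import json

ALLOWED_FIELDS = {"text", "title", "metadata"}
METADATA_KEYS = {"name", "value", "type"}
METADATA_TYPES = {"string", "number", "date", "score"}
MAX_TEXT_LENGTH = 500_000


def check_document(doc):
    """Return a list of problems with a document intended for upload."""
    problems = []
    for name in sorted(set(doc) - ALLOWED_FIELDS):
        problems.append(f"extraneous field: {name}")
    text = doc.get("text")
    if not isinstance(text, str):
        problems.append("text is required and must be a string")
    elif len(text) > MAX_TEXT_LENGTH:
        problems.append("text is longer than 500,000 characters")
    for field in doc.get("metadata", []):
        if not METADATA_KEYS <= set(field):
            problems.append("metadata field is missing name, value, or type")
        elif field["type"] not in METADATA_TYPES:
            problems.append(f"unknown metadata type: {field['type']}")
    return problems


doc = {"text": "Some text",
       "metadata": [{"name": "State", "value": "MA", "type": "string"},
                    {"name": "Rating", "value": 5, "type": "number"}]}
# Request body: one object whose "docs" field is the array of documents.
body = json.dumps({"docs": [doc]})
```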

Documents requested from a built project have the form

    {"doc_id": <string>,
     "text": <string>,
     "title": <string>,
     "metadata": <array of metadata fields>,
     "terms": <array of term objects>,
     "fragments": <array of term objects>,
     "vector": <vector in pack64 format>,
     "match_score": <float measuring similarity to the search if one was
                     performed, else null>}

where a term object in a document's terms and fragments is an object with
the fields "term_id" (a string representation of the term) and "start" and
"end" (integer indexes into the document text).

Get documents

GET /api/v5/projects/<project_id>/docs/

    Optional parameters:
      filter (JSON-encoded filter, defaults to all documents)
          criteria for restricting which documents to match (see "Filters")
      search (JSON-encoded concept object, defaults to no search)
          concept to search for (see "Concepts"); restricts which documents to
          return and determines an ordering/ranking for the results
      limit (integer, default 100, max 25000)
          number of documents to return
      offset (integer, default 0)
          number of documents to skip before beginning

    Optional parameters (specify at most one; ignored if no search is specified):
      exact_only (boolean)
          if true, returns only documents that contain exact matches for the
          texts in the search; if false, also returns documents containing
          conceptual matches but no exact matches
      match_type (string)
          if "exact", returns only documents that contain exact matches for the
          texts in the search; if "conceptual", returns only documents
          containing conceptual matches and not containing exact matches
    If neither parameter is specified, returns documents containing any kind of
    match (exact, conceptual, or both).

    Result:
      {"result": <array of document objects>,
       "total_count": <number of documents in the project>,
       "filter_count": <number of documents matching the filter>,
       "search": <null if no search object is supplied, otherwise
                  {<concept object> with additional fields:
                   "match_count": <number of documents that matched any exact
                                   or related terms>,
                   "exact_match_count": <number of documents that matched any
                                         of the exact terms>}>}

    If a search is supplied, documents are returned ranked by "match_score",
    and only documents that contain an exact or conceptual match will be
    included.
    
    This endpoint also accepts POST requests to accommodate large queries.
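
    Because "limit" caps the number of documents per request, retrieving a
    larger set means stepping "offset" through it.  A minimal sketch of
    generating the parameters for each page follows; the function name is
    illustrative, and the total would come from a count in an earlier
    response, such as "filter_count".

```python
def page_params(total, limit=100):
    """Yield limit/offset parameter dicts that together cover `total` docs."""
    for offset in range(0, total, limit):
        yield {"limit": limit, "offset": offset}


pages = list(page_params(250))
```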

Delete documents

POST /api/v5/projects/<project_id>/docs/delete/

    Required parameter:
      doc_ids (JSON-encoded array of strings)
          an array of document IDs to delete

    Marks documents for removal from the project on the next rebuild.  Any
    strings in the "doc_ids" parameter that do not correspond to documents that
    exist in your project will be silently ignored.

    Note that this endpoint does not automatically trigger a rebuild, and the
    documents will remain in the project until you POST to /build/.
    

Terms

Terms represent the science underlying text, and thus in particular underlying
concepts, which are defined via texts.  Their IDs are the unified
representation of a particular concept, marked with a language tag for clarity.
For instance, the concept in an English project corresponding to the texts
"speak" and "speaks" and "speaking" and "spoke" will be "speak|en"; the concept
in a French project corresponding to "parlez" and "parlons" will be "parler|fr".

The primary use of terms in the API is term management; their IDs appear in a
list on concepts to allow management via term ID.
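
For illustration, a term ID of the form shown above can be split into its term
and language tag.  This sketch assumes the language tag is whatever follows
the last "|" in the ID; the helper name is hypothetical.

```python
def split_term_id(term_id):
    """Split a term ID such as "speak|en" into (term, language tag)."""
    # Assumption: the language tag is whatever follows the last "|".
    term, separator, language = term_id.rpartition("|")
    if not separator:
        raise ValueError(f"no language tag in term ID: {term_id!r}")
    return term, language


examples = [split_term_id("speak|en"), split_term_id("parler|fr")]
```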

Get terms

GET /api/v5/projects/<project_id>/terms/

    Required parameter:
      term_ids (JSON-encoded array of strings, max length 50000)
          which terms to get statistics for

    Result:
      An array of term objects.  Each term object has the following form:

      {"term_id": <term with language tag>,
       "total_doc_count": <number of documents in the filter containing the
                           term, including fragments>,
       "distinct_doc_count": <number of documents in the filter containing the
                              term, not including fragments>,
       "relevance": <term's relevance within the filter>,
       "background_frequency": <float measuring the approximate frequency of
                                the term in the background corpus>,
       "display_text": <most common surface text for this term>,
       "all_texts": {<text>: <frequency> for all texts that correspond to the
                     term},
       "vector": <vector in pack64 format>}
    
    This endpoint also accepts POST requests to accommodate large queries.
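
A minimal sketch of building the GET URL for this endpoint follows.  The host
name and project ID are placeholders, and the sketch reads "max length 50000"
as a limit on the number of array entries, which is an interpretation rather
than something stated here.

```python
import json
from urllib.parse import urlencode

MAX_TERM_IDS = 50000  # documented maximum length of "term_ids"


def terms_url(base_url, project_id, term_ids):
    """Build the GET URL for the terms endpoint with JSON-encoded term_ids."""
    term_ids = list(term_ids)
    if len(term_ids) > MAX_TERM_IDS:
        raise ValueError(f"term_ids may contain at most {MAX_TERM_IDS} entries")
    # The array is JSON-encoded first, then percent-encoded for the query string.
    query = urlencode({"term_ids": json.dumps(term_ids)})
    return f"{base_url}/api/v5/projects/{project_id}/terms/?{query}"


# Placeholder host and project ID, for illustration only.
url = terms_url("https://daylight.example.com", "pr123abc",
                ["speak|en", "parler|fr"])
```

For very long lists of term IDs, the same parameters can be sent in a POST
request instead, as noted above.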

Get term management information

GET /api/v5/projects/<project_id>/terms/manage/

    Result:
      an object mapping term IDs to "changes" objects

    Retrieves complete information about what term management has been added to
    this project.

    If you have added management information but not yet rebuilt your project,
    the changes described by the result will not yet have any effect.  The
    result does not indicate which changes were added since the last build.
    

Update term management information

PUT /api/v5/projects/<project_id>/terms/manage/

    Required parameter:
      term_management (JSON-encoded term management object)
          a term management object (see "Term management")

    Optional parameter:
      overwrite (boolean, default false)
          if true, overwrites all existing term management information with the
          term management object being sent; if false, overwrites information
          for term IDs that are included, but preserves any other existing term
          ID information

    Alters the project's term management information.  (Note that changes will
    not take effect until the project is rebuilt.)

    As an example of the "overwrite" parameter: suppose your existing term
    management information is

      {"enviornment|en": {"action": "ignore"},
       "really|en": {"action": "ignore"}}

    and you call this endpoint with the term management object

      {"enviornment|en": {"new_term_id": "environment|en"},
       "enviornmental|en": {"new_term_id": "environmental|en"}}

    If you set overwrite to true, this object will now become the entirety of
    the project's term management, and the word "really" will go back to not
    being ignored.  If you set overwrite to false, ignoring "really" will be
    kept, but the previous act of ignoring "enviornment" will be overwritten,
    and your new term management will be

      {"enviornment|en": {"new_term_id": "environment|en"},
       "enviornmental|en": {"new_term_id": "environmental|en"},
       "really|en": {"action": "ignore"}}

    To remove term management for a particular term_id, set its changes to an
    empty object.  To remove all term management information from a project,
    use an empty object for the term_management parameter and set "overwrite"
    to true.
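
The merge rules for "overwrite" can be modelled locally as a small function.
This is a client-side illustration of the behavior described above, not the
server's implementation, and the function name is hypothetical.

```python
def apply_term_management(existing, update, overwrite=False):
    """Return the term management that a PUT would leave in place."""
    # With overwrite, the update replaces everything; otherwise start from
    # a copy of the existing information and change only the included IDs.
    merged = {} if overwrite else dict(existing)
    for term_id, changes in update.items():
        if changes:
            merged[term_id] = changes
        else:
            # An empty "changes" object removes management for that term.
            merged.pop(term_id, None)
    return merged


existing = {"enviornment|en": {"action": "ignore"},
            "really|en": {"action": "ignore"}}
update = {"enviornment|en": {"new_term_id": "environment|en"},
          "enviornmental|en": {"new_term_id": "environmental|en"}}

kept = apply_term_management(existing, update, overwrite=False)
replaced = apply_term_management(existing, update, overwrite=True)
cleared = apply_term_management(existing, {}, overwrite=True)
```

Here "kept" still ignores "really", "replaced" contains only the two new
entries, and "cleared" is empty.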
    

Concepts

A concept is an object with keys that vary, both when sent to the API as a
parameter and when retrieved from the API as part of a result.  Individual
endpoints will provide more guidance; this section offers a general overview.

Concepts are fundamentally an array of one or more texts, and therefore all
concepts retrieved through the API will include a "texts" key whose value is an
array.  Concepts sent to the API must be objects containing a single field,
either "texts", an array of texts, or "saved_concept_id", a string, e.g.

    {"texts": ["kermit", "miss piggy", "gonzo"]}
    {"texts": ["rainbow connection"]}
    {"saved_concept_id": "0ea6d448-cff7-4b4d-b519-24b9e021572c"}

This kind of concept can be used as a search, or as part of an array of
specified concepts (see "Concept Selectors").
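
A client might check a concept before sending it.  The sketch below enforces
the single-field rule described above; the function names are hypothetical,
and the server may apply further checks that are not modelled here.

```python
def validate_concept(concept):
    """Check that a concept sent to the API has exactly one permitted field."""
    if not isinstance(concept, dict) or len(concept) != 1:
        raise ValueError("a concept must be an object with a single field")
    (key, value), = concept.items()
    if key == "texts":
        # Concepts are an array of one or more texts.
        if (not isinstance(value, list) or not value
                or not all(isinstance(text, str) for text in value)):
            raise ValueError('"texts" must be a non-empty array of strings')
    elif key == "saved_concept_id":
        if not isinstance(value, str):
            raise ValueError('"saved_concept_id" must be a string')
    else:
        raise ValueError(f"unexpected field: {key!r}")
    return concept


def is_valid_concept(concept):
    """Return True if validate_concept accepts the concept."""
    try:
        validate_concept(concept)
    except ValueError:
        return False
    return True


checks = [
    is_valid_concept({"texts": ["kermit", "miss piggy", "gonzo"]}),
    is_valid_concept({"saved_concept_id": "0ea6d448-cff7-4b4d-b519-24b9e021572c"}),
    is_valid_concept({"texts": ["gonzo"], "saved_concept_id": "abc"}),  # two fields
    is_valid_concept({"texts": []}),  # no texts
]
```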

Concept objects returned by the API will always have the following keys:

  * "texts": an array of texts.  For top concepts, this will be an array with
    one element, which is the display form of the concept.  For other
    concepts, this will be an array of texts that define the concept.
  * "name": the name of the concept.  For saved concepts, this is the name that
    was given to the concept; for other concepts, it is the comma-separated
    list of texts in "texts".
  * "exact_term_ids": the term IDs of terms that match the text exactly.
  * "vector": the vector in pack64 format.

They may also have some of the following keys:

  * "color" and "saved_concept_id": for saved concepts only.
  * "relevance": for top concepts only.  A score for ranking concepts, based on
    their frequency in a project and their background frequency.
  * "match_score": for searches only.  A score for ranking concepts, based on
    how well they match a given search.
  * "match_count", "exact_match_count": for match count endpoints only.
  * "related_term_ids": an array of related terms, returned on the search
    concept in the get docs endpoint only.

A note about ordering: different concept selectors will order their results in
different ways.  Specified concepts are returned in the order provided; saved
concepts are returned in their stored order.  Top concepts, as noted, have a
"relevance" score, and are returned in order based on that score, most
relevant first.  Suggested score drivers (see "Get score drivers") are returned
in order based on their "importance" score, again with the highest scores
first.

Get concepts

GET /api/v5/projects/<project_id>/concepts/

    Optional parameters:
      concept_selector (concept selector object, defaults to top concepts)
          which concepts to get (see "Concept Selectors")
      filter (JSON-encoded filter, defaults to all documents)
          criteria for restricting which documents must contain the concepts
          (see "Filters")

    Result:
      {"result": <an array of concepts>,
       "filter_count": <number of documents matching the filter>,
       "total_count": <number of documents in the project>,
       "search": <null if no search object is supplied, otherwise a concept
                 object>}

    Gives an array of concept objects, restricted to concepts that appear in
    documents that match the supplied filter.

    The returned concept objects will contain fields as described above,
    depending on the concept selector passed.
    
    This endpoint also accepts POST requests to accommodate large queries.

Get concept match counts

GET /api/v5/projects/<project_id>/concepts/match_counts/

    Optional parameters:
      concept_selector (concept selector object, defaults to top concepts)
          which concepts to use (see "Concept Selectors")
      filter (JSON-encoded filter, defaults to all documents)
          criteria for restricting which documents to count (see "Filters")
      breakdowns (JSON-encoded array of field breakdown specifications)
          specifies that the match counts should additionally be broken down
          against the provided metadata fields (see "Field breakdowns")

    Result:
      {"total_count": <number of documents in the project>,
       "filter_count": <number of documents matching the filter>,
       "match_counts": [{<concept object> with additional fields:
                         "match_count": <number of documents with an exact or
                                         related match to the concept within
                                         the filter>,
                         "exact_match_count": <number of documents with an
                                               exact match to the concept
                                               within the filter>},
                        ...],
       "breakdowns": [{"breakdown": <the provided specification>,
                       "buckets": [{"label": <bucket label>,
                                    "total_count": <total number of documents
                                                    in the bucket>,
                                    "filter_count": <number of documents in
                                                     the bucket matching the
                                                     filter>,
                                    "match_counts": <array similar to above>},
                                   ...]},
                      ...]
      }

    Counts the documents matching concepts in the project.  Always provides
    the total count of matching documents for each concept (in the provided
    filter, if present).

    If "breakdowns" is supplied, the match counts are additionally broken down
    against the specified metadata fields.
    
    This endpoint also accepts POST requests to accommodate large queries.
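
As a sketch of working with this result: since "match_count" covers exact or
related matches and "exact_match_count" covers exact matches only, their
difference is the number of documents matched only conceptually.  The sample
response below is made up for illustration, and the helper name is
hypothetical.

```python
def summarize_match_counts(response):
    """Derive conceptual-only counts and filter shares from a result."""
    filter_count = response["filter_count"]
    rows = []
    for concept in response["match_counts"]:
        match_count = concept["match_count"]
        exact_count = concept["exact_match_count"]
        rows.append({
            "name": concept["name"],
            # Documents with a related match but no exact match.
            "conceptual_only": match_count - exact_count,
            # Fraction of filtered documents that match the concept.
            "share_of_filter": (match_count / filter_count
                                if filter_count else 0.0),
        })
    return rows


# Made-up sample data in the documented shape (other concept keys omitted).
sample = {
    "total_count": 200,
    "filter_count": 50,
    "match_counts": [
        {"name": "battery", "match_count": 20, "exact_match_count": 15},
        {"name": "screen", "match_count": 5, "exact_match_count": 5},
    ],
}
summary = summarize_match_counts(sample)
```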

Get concept-concept associations

GET /api/v5/projects/<project_id>/concepts/concept_associations/

    Optional parameter:
      concept_selector (concept selector object, defaults to top concepts)
          which concepts to use (see "Concept Selectors")

    Result:
      [{<concept object> with additional field:
        "associations": [
            {<concept object> with additional field:
             "association_score": <association score>
            },
            ...]
        },
       ...]

    Returns an array of concepts, each one with an "associations" field that
    lists all of the returned concepts, each with an additional
    "association_score" field.  For example, if you specify in the concept
    selector that you want the thirty concepts most related to "purchase", the
    endpoint will return thirty concepts, each one having those thirty
    concepts in its "associations" field, with an "association_score" on each.
    
    This endpoint also accepts POST requests to accommodate large queries.
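
One way to use this result is to reshape it into a lookup table keyed by
concept name.  This is a sketch only: the sample scores are made up, the
helper name is hypothetical, and keying by "name" assumes that names are
unique within the result.

```python
def association_matrix(result):
    """Map each concept name to {other concept name: association score}."""
    return {
        concept["name"]: {
            other["name"]: other["association_score"]
            for other in concept["associations"]
        }
        for concept in result
    }


# Made-up sample data in the documented shape (other concept keys omitted).
sample = [
    {"name": "purchase",
     "associations": [{"name": "purchase", "association_score": 1.0},
                      {"name": "refund", "association_score": 0.4}]},
    {"name": "refund",
     "associations": [{"name": "purchase", "association_score": 0.4},
                      {"name": "refund", "association_score": 1.0}]},
]
matrix = association_matrix(sample)
```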

Get concept-filter associations

GET /api/v5/projects/<project_id>/concepts/filter_associations/

    Optional parameters:
      concept_selector (concept selector object, defaults to top concepts)
          which concepts to use (see "Concept Selectors")
      filters (JSON-encoded array of filters, defaults to all documents)
          an array of filter objects with which to get concept associations.

    Result:
      An array of result objects, one for each filter provided.  Each result
      object has the form:

      {"filter": <filter object>,
       "filter_count": <number of documents in the project matching the
                        filter>,
       "concepts": [{<concept object> with additional field:
                     "association_score": <score>},
                    ...]
      }

    Any filter objects that matched no documents will contain null association
    values.
    
    This endpoint also accepts POST requests to accommodate large queries.

Get score drivers

GET /api/v5/projects/<project_id>/concepts/score_drivers/

    Required parameter:
      score_field (string)
          the name of the score field against which to evaluate concepts

    Optional parameter:
      filter (JSON-encoded filter, defaults to all documents)
          an array of filter objects to restrict documents

    Optional parameters (specify at most one):
      limit (integer, max 2000)
          how many of the most important score drivers in the project to return
      concept_selector (concept selector object)
          which concepts to get (see "Concept Selectors")
    If neither parameter is specified, defaults to the 10 most important score
    drivers in the project.

    Result:
      An array of concept objects of the shape returned by the
      GET /concepts/match_counts/ endpoint, with additional score driver
      fields:

       - "average_score": the average score of documents that match this
           concept
       - "impact": how much higher or lower the scores of documents that
           match this concept are, compared to the overall average
       - "baseline": the average score of all documents matching the filter;
           it does not depend on the concept, so it is the same for every
           concept returned
       - "confidence": a measure of our confidence in this concept's impact as
           a score driver
       - "importance": a combination of impact, confidence, and other factors
           used for ranking the results, only returned on suggested score
           drivers
       - "relevance": the relevance of the concept, only returned on suggested
           score drivers and top concepts

      Suggested score drivers are returned in order based on their importance
      ranking (with the highest scores first).  Results for other concept
      selectors are ordered as described in the "Concept Selectors" section.
    
    This endpoint also accepts POST requests to accommodate large queries.
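
The parameter rules for this endpoint can be checked on the client side
before a request is made.  In this sketch the function name and the "rating"
score field are hypothetical, and JSON-encoding the concept selector for a
query string is an assumption.

```python
import json

MAX_LIMIT = 2000  # documented maximum for "limit"


def score_driver_params(score_field, limit=None, concept_selector=None,
                        doc_filter=None):
    """Assemble query parameters for the score drivers endpoint."""
    if limit is not None and concept_selector is not None:
        raise ValueError('specify at most one of "limit" and "concept_selector"')
    if limit is not None and limit > MAX_LIMIT:
        raise ValueError(f'"limit" may be at most {MAX_LIMIT}')
    params = {"score_field": score_field}
    if limit is not None:
        params["limit"] = limit
    if concept_selector is not None:
        # Assumption: the selector object is JSON-encoded like the filter.
        params["concept_selector"] = json.dumps(concept_selector)
    if doc_filter is not None:
        params["filter"] = json.dumps(doc_filter)
    return params


# "rating" is a hypothetical score field name.
top_params = score_driver_params("rating", limit=25)
saved_params = score_driver_params("rating",
                                   concept_selector={"type": "saved"})
```

Leaving out both "limit" and "concept_selector" produces only "score_field",
which corresponds to the documented default of the 10 most important score
drivers.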

Get concept sentiment

GET /api/v5/projects/<project_id>/concepts/sentiment/

    Optional parameters:
      concept_selector (concept selector object, defaults to sentiment suggestions)
          which concepts to get (see "Concept Selectors")
      filter (JSON-encoded filter, defaults to all documents)
          criteria for restricting which documents to count (see "Filters")

    Result:
      {"total_count": <number of documents in the project>,
       "filter_count": <number of documents matching the filter>,
       "sentiment_share": <sentiment share object (see below) for the whole
                           project, as filtered>,
       "match_counts": [{<concept object> with additional fields:
                         "match_count": <number of documents with an exact or
                                         related match to the concept within
                                         the filter>,
                         "exact_match_count": <number of documents with an
                                               exact match to the concept
                                               within the filter>,
                         "sentiment_share": <sentiment share object (see
                                             below) for the concept>},
                        ...]
      }

    Returns match counts with additional information on sentiment.

    The sentiment share object is a dictionary with the distribution of
    sentiment types (positive, neutral, negative) across documents that match
    a given concept, as follows:

        {
            "negative": <percentage of negative documents>,
            "neutral": <percentage of neutral documents>,
            "positive": <percentage of positive documents>
        }

    Suggested sentiment concepts are returned in order based on their
    importance ranking (with the highest values first).  Results for other
    concept selectors are ordered as described in the "Concept Selectors"
    section.
    
    This endpoint also accepts POST requests to accommodate large queries.
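
As a sketch of reading sentiment share objects, the helper below picks the
sentiment type with the largest share for each concept.  The sample values
are made up, the helper names are hypothetical, and the comparison works the
same way whether the shares are expressed on a 0-1 or a 0-100 scale.

```python
SENTIMENTS = ("positive", "neutral", "negative")


def dominant_sentiment(share):
    """Return the sentiment type with the largest share.

    Ties go to whichever type comes first in SENTIMENTS.
    """
    return max(SENTIMENTS, key=lambda kind: share[kind])


def dominant_by_concept(response):
    """Map each concept name in the result to its dominant sentiment."""
    return {concept["name"]: dominant_sentiment(concept["sentiment_share"])
            for concept in response["match_counts"]}


# Made-up sample data in the documented shape (other concept keys omitted).
sample = {
    "total_count": 100,
    "filter_count": 100,
    "sentiment_share": {"negative": 0.2, "neutral": 0.5, "positive": 0.3},
    "match_counts": [
        {"name": "battery", "match_count": 10, "exact_match_count": 8,
         "sentiment_share": {"negative": 0.6, "neutral": 0.3,
                             "positive": 0.1}},
        {"name": "screen", "match_count": 12, "exact_match_count": 12,
         "sentiment_share": {"negative": 0.1, "neutral": 0.2,
                             "positive": 0.7}},
    ],
}
overall = dominant_sentiment(sample["sentiment_share"])
per_concept = dominant_by_concept(sample)
```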

Saved concepts

The endpoints in this section create, retrieve, update, and delete saved
concepts (and their ordering) without regard to their science.  Their returns
therefore have only the following fields:

      {"name": <saved concept name>,
       "saved_concept_id": <unique saved concept ID>,
       "color": <HTML hex color>,
       "texts": <array of strings that define the concept>}

Because these objects contain no science information, the endpoints in this
section can be used with an unbuilt project; you can work with your saved
concepts even before uploading data.  The GET /concepts/ endpoint, called with
{"type": "saved"}, will retrieve the science (i.e., the exact_term_ids and
vector) for saved concepts.

Get saved concepts

GET /api/v5/projects/<project_id>/concepts/saved/

    Result:
      an array of saved concept objects, in their stored order

    Retrieves the saved concepts for a project.
    

Create saved concepts

POST /api/v5/projects/<project_id>/concepts/saved/

    Required parameter:
      concepts (JSON-encoded array of saved concept objects)
          See below for more information on fields that this object allows or
          requires

    Optional parameter:
      position (integer, default 0)
          position to insert the concepts into (0 = first)

    Result:
      an array of saved concept objects

    Creates new saved concepts.  Saved concepts uploaded to this endpoint must
    include "texts", an array of strings.  They may optionally include "name"
    and "color"; if they are not included, this endpoint will set defaults (the
    comma-separated list of texts in "texts", for the name; #808080, for the
    color).  They cannot include "saved_concept_id", which is set internally.
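
The defaulting behavior described above can be previewed locally.  This
sketch only mirrors the documented rules; the function name is hypothetical,
and joining the texts with a comma and a space is an assumption about the
exact form of the default name.

```python
DEFAULT_COLOR = "#808080"  # documented default color


def with_saved_concept_defaults(concept):
    """Return a copy of a new saved concept with default name and color."""
    if "saved_concept_id" in concept:
        raise ValueError('"saved_concept_id" is set internally')
    texts = concept.get("texts")
    if (not isinstance(texts, list)
            or not all(isinstance(text, str) for text in texts)):
        raise ValueError('"texts" must be an array of strings')
    return {
        "texts": list(texts),
        # Assumption: texts are joined with a comma and a space.
        "name": concept.get("name", ", ".join(texts)),
        "color": concept.get("color", DEFAULT_COLOR),
    }


plain = with_saved_concept_defaults({"texts": ["kermit", "gonzo"]})
named = with_saved_concept_defaults(
    {"texts": ["rainbow connection"], "name": "Song", "color": "#00ff00"})
```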
    

Modify saved concept order

PUT /api/v5/projects/<project_id>/concepts/saved/order/

    Required parameter:
      order (JSON-encoded array of strings)
          the concept IDs, in their desired order

    Sets the order of saved concepts to the order given.  Existing concepts
    not included in the array are moved to the end; non-existent concept IDs
    in the array are ignored.
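
The reordering rules can be modelled as follows.  This is a local sketch, not
the server's implementation: the function name is hypothetical, and it
assumes that concepts left out of "order" keep their relative order at the
end and that a repeated ID counts only once.

```python
def reorder_saved_concepts(existing_ids, order):
    """Return the saved concept IDs in the order the endpoint would store."""
    existing = set(existing_ids)
    placed = set()
    head = []
    for concept_id in order:
        # Non-existent IDs are ignored; repeated IDs are placed only once.
        if concept_id in existing and concept_id not in placed:
            head.append(concept_id)
            placed.add(concept_id)
    # Assumption: unlisted concepts keep their relative order at the end.
    tail = [concept_id for concept_id in existing_ids
            if concept_id not in placed]
    return head + tail


new_order = reorder_saved_concepts(["a", "b", "c", "d"], ["c", "x", "a"])
```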
    

Delete saved concepts

DELETE /api/v5/projects/<project_id>/concepts/saved/

    Required parameter:
      saved_concept_ids (JSON-encoded array of strings)
          an array of saved concept IDs to delete

    Deletes saved concepts from the project, if present.
    

Update saved concepts

PUT /api/v5/projects/<project_id>/concepts/saved/

    Required parameter:
      concepts (JSON-encoded array of saved concept update objects)
          an array of saved concept update objects including at least a
          "saved_concept_id" and any of "texts", "name", or "color"

    Updates the saved concepts with the same saved concept ID as the ones
    provided, using the provided fields.  Ignores non-existent saved concept
    IDs.
    

Other

Get system status

GET /api/v5/status/

    Result:
      {"languages": [{"code": <language code>, "name": <language name>}, ...],
       "version": <version identifier>}

    This endpoint provides system-wide information about this instance of
    Daylight.

    The version identifier string uses the software build date, and thus will
    be consistent across deployments of the same code.

    Language codes are the two-letter ISO 639-1 codes that can be supplied as
    values to endpoints that take a "language" parameter.  For instance, one
    object in the "languages" array will be {"code": "en", "name": "English"}.
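
As a small sketch, the "languages" array can be turned into a lookup from
language code to name, for example to check a code before passing it as a
"language" parameter.  The sample response is illustrative (only the English
entry appears above), and the helper name is hypothetical.

```python
def language_names(status):
    """Map each supported language code to its name."""
    return {entry["code"]: entry["name"] for entry in status["languages"]}


# Illustrative sample in the documented shape; the version value is a
# placeholder, not a real identifier.
sample_status = {
    "languages": [{"code": "en", "name": "English"},
                  {"code": "fr", "name": "French"}],
    "version": "<version identifier>",
}
names = language_names(sample_status)
english_supported = "en" in names
```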
    

Luminoso Daylight™, powered by QuickLearn™ | © 2019 Luminoso Technologies. All rights reserved.