Text search in Adobe Acrobat Sign

Last updated on 2 December 2022


Overview

Acrobat Sign allows for complex searching to find content in the user's agreements. The search bar, found on the Manage page, returns all transactions that match any string provided for the content source selected.

If you are looking at "your agreements", you search against your content. If you are looking at a shared account, you search that shared account's content.

The content of the following fields is indexed when a transaction is created or updated:

Title - Agreement title.
Note - A private agreement note taken by the participant that is not visible to anyone else.
Message - A list of messages visible to this participant (includes public and private messages).
Original File Name - Original name of an uploaded file associated with the agreement.
Email - Recipient (including CCs) or sender’s email address.
Full Name - First and last name of the recipient (including CCs) or sender.
Job Title - Job title of the recipient (including CCs) or sender within their company.
Company Name - Company or organization name of the recipient (including CCs) or sender.
Recipient Group Name - Name of the ad-hoc agreement group that recipients can belong to.
Text Field Content - User-provided text field content in the form.
Sharer's Full Name - Full name of the sharer of the agreement. In a non-sharing case, this is the name of the user.
Sharer's Recipient Group Name - Recipient group name of the sharer of the agreement. In a non-sharing case, this is the recipient group name of the user.
External ID - Sender assigned ID to the agreement that can be of any form, but usually in the form of "<groupID>:<ID>". External ID is passed in the call to the agreement creation API.
External Group ID - Sender assigned group ID to the agreement that can be of any form, usually used as a prefix for the External ID. The External Group ID is passed in the call to the agreement creation API. You are required to set an External ID if you are setting an External Group ID parameter.
Transaction ID - ID assigned to the agreement by Acrobat Sign upon creation.

How text searching works

If you were searching for the string: "A simple fish"

Acrobat Sign "tokenizes" the string, using spaces as delimiters. The above example string is broken into three tokens: A, simple, and fish
- Characters in string queries belong to one of three different types: letter, number, or delimiter.
- Characters treated as delimiters (other than whitespace) are: ~ ` ! @ # $ % ^ & * ( ) - + = { } [ ] | \ . , : ; " ' < > ? /
  - Periods, colons, and apostrophes remain as part of a token if characters before and after that symbol are of the same type.
  - Quotes that surround the whole query string are not delimiters and specify a literal string value (phrase)
  - Quotes inside a query string are delimiters and do not specify a literal string value.
Case distinctions are removed, e.g., the tokens become a, simple, and fish
Search then tries to match the full text of each token to an indexed value
- More complex tokenization occurs for the Agreement Title (see below)
An inclusive search is used, meaning that every agreement that matches at least one token from at least one searchable field is included in the returned dataset
- The returned dataset is sorted by relevance score, with the most relevant search result being at the top.
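
The standard tokenization rules above can be approximated with a short Python sketch. This is only an illustration built from the rules listed in this section, not Acrobat Sign's actual implementation. The function names are hypothetical, any character that is neither a digit nor a listed delimiter is assumed to count as a letter, and quotes surrounding a whole query (phrase queries) are not handled.

```python
# Delimiters listed in this article (whitespace is handled separately).
DELIMITERS = set("~`!@#$%^&*()-+={}[]|\\.,:;\"'<>?/")
# Symbols that stay inside a token when both neighbors have the same type.
KEPT_BETWEEN_SAME_TYPE = set(".:'")


def char_type(ch):
    """Classify a character as 'letter', 'number', or 'delimiter'."""
    if ch.isspace() or ch in DELIMITERS:
        return "delimiter"
    # Assumption: everything else that is not a digit counts as a letter.
    return "number" if ch.isdigit() else "letter"


def standard_tokens(text):
    """Split text into lowercased tokens using the rules described above."""
    tokens, current = [], []
    for i, ch in enumerate(text):
        kind = char_type(ch)
        if ch in KEPT_BETWEEN_SAME_TYPE and 0 < i < len(text) - 1:
            before = char_type(text[i - 1])
            after = char_type(text[i + 1])
            if before == after and before != "delimiter":
                kind = before  # keep the symbol as part of the token
        if kind == "delimiter":
            if current:
                tokens.append("".join(current).lower())
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current).lower())
    return tokens


print(standard_tokens("A simple fish"))
```

Under these assumptions, a string such as My_NDA stays a single token because the underscore is not in the delimiter list.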

THE AGREEMENT TITLE FIELD

As noted above, the Agreement Title field has more sophisticated tokenization due to an additional "customized" tokenizer that primarily tokenizes on context delimiters (vs. explicit characters). This custom tokenizer is different from the standard in that:

Prefix tokens (up to ten characters) are generated - A prefix token is the incrementing string of any standard token. e.g., If the standard token is fish, the incrementing tokens are: f, fi, fis, and fish
- This allows searching for partial strings, given you start with the first character of the token
- Mid-string matches are ignored. e.g., Searching for rent will not match the word apparently
Split tokens at non-alphanumeric characters. e.g., The string: Super_Duper yields the tokens: Super and Duper
- Underscores are not delimiters in the standard tokenizer
Split tokens at letter case transitions. e.g., The string: PowerShot yields the tokens: Power and Shot
Split tokens at letter-number transitions. e.g., The string XL500 yields the tokens: XL and 500
Removes leading or trailing delimiters from each token. e.g., The string: XL---42+'Autocoder' yields the tokens: XL, 42, and Autocoder
Removes the English possessive ('s) from the end of each token. e.g., The string: Dave's yields the token: Dave

Note that the combination of the standard and custom tokenizers allows you to search for the full token string (thanks to the standard tokenizer) and the prefix tokens (thanks to the custom tokenizer). Still, you will not match a prefix token that spans a delimiter.

Example: If you have an agreement named My_NDA

The standard tokenizer would produce a token that looked like my_nda
The custom tokenizer would produce a series of tokens: m, my, n, nd, nda, my_, my_n, my_nd, my_nda

Example 2: If you have an agreement named XL500

The standard tokenizer would produce a token that looked like xl500
The custom tokenizer would produce a series of tokens: xl500, x, xl, 5, 50, 500, xl5, xl50
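
The two examples above can be reproduced with a small Python sketch. It is a rough approximation based only on the rules and examples in this section, not the real tokenizer: the names title_tokens and title_matches are hypothetical, only ASCII letters and digits are handled, and the input is assumed to be a single token already produced by the standard tokenizer.

```python
import re

MAX_PREFIX = 10  # prefix tokens are generated up to ten characters

# Sub-tokens: runs of capitals, an optional capital followed by
# lowercase letters, or runs of digits. Anything else is a split point.
SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def prefixes(token):
    """Return the lowercased prefixes of a token, up to MAX_PREFIX chars."""
    token = token.lower()
    return {token[:n] for n in range(1, min(len(token), MAX_PREFIX) + 1)}


def title_tokens(standard_token):
    """Approximate the tokens indexed for one title token."""
    text = re.sub(r"'s\b", "", standard_token)  # drop English possessive
    result = prefixes(text)  # prefixes of the whole token
    for part in SUBTOKEN.findall(text):
        result |= prefixes(part)  # prefixes of each split token
    return result


def title_matches(query_token, title_token):
    """True if the query token equals one of the indexed tokens."""
    return query_token.lower() in title_tokens(title_token)


print(sorted(title_tokens("XL500")))
print(title_matches("rent", "apparently"))
```

Because only prefixes are indexed, a query such as fi matches fish, while rent does not match apparently.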

Searching with special query syntax

As described in the section above, Agreement Search performs an approximate match among all searchable fields of an agreement. Searchable field content is tokenized, and then those tokens are matched to the query string at query time. Agreement Search also performs a prefix match up to ten characters for those tokens. If at least one token match is found in an agreement, that agreement will appear in the search results. Search results are sorted by relevance score, with the most relevant search result being at the top.

However, the following can be achieved only with a special syntax: matching an entire field value, matching a phrase from a field value, searching for agreements that don't contain a particular token, and searching for agreements that contain several tokens at the same time (not as a phrase). Sign Generic Query Language (SGQL) was developed to address customers' need for these features.

RESERVED CHARACTERS AND WORDS

SGQL has seven reserved characters:

\ " ' : * ( )

These reserved characters are used as operators and define language features in a search query. If a reserved character is used incorrectly, a syntax error is returned.

To use reserved characters as regular characters in a search query, they need to be escaped. For example, the search query

\"bea\:u\*ful\"\\

has all reserved characters escaped and they are treated as regular characters.
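
When a query string is assembled programmatically, escaping can be done with a small helper such as the Python sketch below. The function name escape_sgql is hypothetical, and the reserved set is an assumption: the backslash, double quote, colon, and asterisk escaped in the example above, plus the single quote and the two parentheses.

```python
# Assumed set of the seven SGQL reserved characters.
RESERVED_CHARACTERS = set('\\"\':*()')


def escape_sgql(text):
    """Prefix every reserved character with a backslash."""
    return "".join(
        "\\" + ch if ch in RESERVED_CHARACTERS else ch
        for ch in text
    )


# Reproduces the escaped query shown above from its unescaped form.
print(escape_sgql('"bea:u*ful"\\'))
```

Escaping the whole user-supplied string before adding operators keeps user input from being interpreted as query syntax.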

SGQL also has three reserved words for operators:

AND

OR

NOT

The operators must be capitalized. To use operators as regular tokens in a search query, they need to be double-quoted. For example:

foo "AND" bar

PHRASE MATCH

A regular approximate match query will match if ANY of the tokens appear in a field (but not necessarily all), and the token order is irrelevant since tokens don't need to appear together.

A phrase match query should be used when a case-insensitive exact phrase search among one or more searchable fields is required. It allows us to match where multiple tokens appear in the same field, and those tokens appear one after another in the order specified within the quotes.

Phrase match query syntax format:

"<phrase_match_query>"

'<phrase_match_query>'

If the query syntax doesn't follow the rules for a phrase match query syntax, Search performs a regular approximate match query among all searchable fields.

For example, the query

"Study Group"

will match agreements that contain the phrase "study group" in any of the searchable fields. Note that the phrase "study - group" won't be matched by this query since it's not a case-insensitive exact match for "Study Group".

Phrase match can appear in any place inside of a search query. For example,

title: ( math AND "course materials" AND "study group" )

will match the title "Course materials for MATH 101 Study Group" since this title contains the keyword "MATH" and phrases "Course Materials" and "Study Group".

FIELD NAME PREFIX

A field name prefix query should be used when searching for only one particular field within a user's agreements.

The field name prefix query must contain a field name prefix followed by a search query. Field name prefix query syntax format:

<field_name>:<query>

The field name prefix can appear in front of a token or in front of a parenthesized part of a query. For example, in the search query

title: ( Hello AND "Beautiful World" AND "My World" )

all tokens are matched against the agreement title field.

In the search query below

title: Hello AND "Beautiful World" AND "My World"

only the word 'Hello' has a field name prefix and this word is matched only against the agreement title field. The rest of the words and phrases are matched against all searchable fields.

If <field_name> is not specified, all fields supported for the phrase match are queried. Otherwise, only the <field_name> field is queried.

If the query syntax doesn't follow the rules for the field name prefix query syntax, Agreement Search uses the whole query as a search query and performs a search among all searchable fields.

For example, the field name prefix query:

title: ( Hello World )

performs a search only against the field that contains the title of the agreement.

Below is a list of prefixes supported for the field name prefix query. Field name prefixes are case-insensitive.

FIELD CONTENT	STRING QUERY FIELD NAME PREFIX	FIELD CONTENT DESCRIPTION
Title	title*	Agreement title.
Note	note	A private agreement note taken by the participant that is not visible to anyone else.
Message	message	A list of messages visible to this participant (includes both public and private messages).
Original File Name	originalFileName	The original name of an uploaded file associated with the agreement.
Email	email**	Recipient (including CCs) or sender’s email address.
Full Name	fullName***	First and last name of the recipient (including CCs) or sender.
Job Title	jobTitle	Job title of the recipient (including CCs) or sender within their company.
Company Name	companyName	Company or organization name of the recipient (including CCs) or sender.
Recipient Group Name	recipientGroupName	Name of the ad-hoc agreement group that recipients can belong to.
Text Field Content	textFieldContent	User-provided text field content in the form.
Sharer's Full Name	sharerFullName	Full name of the sharer of the agreement. In a non-sharing case, this is the name of the user.
Sharer's Recipient Group Name	sharerRecipientGroupName	Recipient group name of the sharer of the agreement. In a non-sharing case, this is the recipient group name of the user.
External ID	externalId	Sender assigned ID to the agreement that can be of any form, but usually in the form of "<groupID>:<ID>". External ID is passed in the call to the agreement creation API.
External Group ID	externalGroupId	Sender assigned group ID to the agreement that can be of any form, usually used as a prefix for the External ID. External Group ID is passed in the call to the agreement creation API. You are required to set an External ID if you are setting an External Group ID parameter.
Transaction ID	agreementId	ID assigned to the agreement by Acrobat Sign when the agreement is created.

For backward compatibility, some of the field name prefixes have aliases that are functionally equivalent to the original field name prefixes. Those aliases are deprecated and will eventually be removed:

* Field name prefix 'name' can be used instead of 'title'.

** Field name prefix 'participantEmail' can be used instead of 'email'.

*** Field name prefix 'participantName' can be used instead of 'fullName'.
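
If you build queries in your own tooling, it may help to normalize prefixes before use so that deprecated aliases are not sent. The Python sketch below is only an illustration: the prefix names and aliases come from the table and notes above, while the function name normalize_prefix is hypothetical.

```python
# Field name prefixes from the table above.
CANONICAL_PREFIXES = [
    "title", "note", "message", "originalFileName", "email", "fullName",
    "jobTitle", "companyName", "recipientGroupName", "textFieldContent",
    "sharerFullName", "sharerRecipientGroupName", "externalId",
    "externalGroupId", "agreementId",
]

# Deprecated aliases and the prefixes they stand for.
DEPRECATED_ALIASES = {
    "name": "title",
    "participantEmail": "email",
    "participantName": "fullName",
}


def normalize_prefix(prefix):
    """Map a prefix or deprecated alias to its canonical spelling."""
    lookup = {p.lower(): p for p in CANONICAL_PREFIXES}
    lookup.update({a.lower(): t for a, t in DEPRECATED_ALIASES.items()})
    key = prefix.lower()  # field name prefixes are case-insensitive
    if key not in lookup:
        raise ValueError(f"unknown field name prefix: {prefix}")
    return lookup[key]


print(normalize_prefix("participantEmail"))
```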

WILDCARDS

A wildcard ( asterisk * ) can be used to do a prefix match followed by an unlimited number of characters in a token. Wildcard expansion is an expensive operation that happens at runtime, so the following syntax rules must be followed to be able to use this feature:

Leading wildcards are not allowed.
Wildcards in the middle of a token are not allowed.
A field name prefix is required for a token that has a wildcard expansion operator.
A wildcard operator cannot be used more than once in a query.
Queries that contain a wildcard expansion return a timeout error if the execution time exceeds five seconds.

For example, the query

title:myh*

matches title field tokens

myhost

and

myhost.ny.mydomain.com

Wildcard expansion is an expensive operation. If used unwisely, it consumes a lot of system resources and you might wait a long time for your search results. To avoid these problems, SGQL places restrictions on wildcard usage that eliminate the most expensive and resource-consuming use cases. The more specific your tokens are, the more efficient your search is. Searching for a specific word or phrase is always more efficient than a search that uses a wildcard.
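
The wildcard syntax rules can be checked on the client side before a query is submitted. The Python sketch below is a simplified illustration, not the validation Acrobat Sign performs: the function name wildcard_problems is hypothetical, and it assumes wildcard terms are written as <field_name>:<prefix>* with no spaces inside the term. The five-second timeout is a server-side behavior and is not modeled.

```python
import re


def wildcard_problems(query):
    """Return a list of wildcard rule violations found in the query."""
    # Drop escaped characters (such as \*) so they are not counted.
    unescaped = re.sub(r"\\.", "", query)
    problems = []
    if unescaped.count("*") > 1:
        problems.append("wildcard used more than once")
    for term in unescaped.split():
        if "*" not in term:
            continue
        field, separator, value = term.partition(":")
        if not separator:
            problems.append("wildcard term has no field name prefix")
            value = term
        if value.startswith("*"):
            problems.append("leading wildcard")
        elif not value.endswith("*") or "*" in value[:-1]:
            problems.append("wildcard in the middle of a token")
    return problems


print(wildcard_problems("title:myh*"))
print(wildcard_problems("my*host"))
```

An empty list means the query passes these checks; it does not guarantee that the server accepts the query.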

BOOLEAN EXPRESSIONS

SGQL supports the Boolean operators: AND, OR, and NOT as well as the grouping of those operators using parentheses. The operators must be capitalized.

The OR operator is always implied between tokens. For example,

foo bar

is the same as

foo OR bar

Unless you want to include the OR operator for clarity reasons, you do not need to specify it.

The NOT operator only applies to the token immediately following NOT. To apply the NOT operator to multiple tokens, you must enclose those tokens in parentheses.

The following table describes the order in which boolean expressions are evaluated.

Order	Search command
1	Expressions within parentheses
2	NOT clauses
3	OR clauses
4	AND clauses

In the table below you can find examples of semantically equivalent queries that explain operator precedence in case grouping operators (parentheses) are not provided.

Search query	Equivalent rewritten search query	Comments
foo AND bar baz	foo AND ( bar OR baz )	Operator OR is implicit and shouldn't be used unless you want to add it for clarity reasons.
foo NOT bar baz	foo OR ( NOT bar ) OR baz	Operator NOT is applied to the following token or parenthesized part of a query.
foo NOT bar baz AND xyz	( foo OR ( NOT bar ) OR baz ) AND xyz
title: ( Hello AND "Beautiful World" "My World" )	title:Hello AND ( title:"Beautiful World" OR title:"My World" )	Field name prefix is applied to the following token or parenthesized part of a query.
title: Hello AND note: "Beautiful World" "My World"	title:Hello AND ( note:"Beautiful World" OR "My World" )	The phrase "My World" will be matched against all searchable fields.
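
The evaluation order can be made concrete with a small recursive-descent parser. The Python sketch below is an illustration of the precedence rules described in this section (parentheses, then NOT, then OR, then AND, with OR implied between adjacent tokens), not the SGQL parser itself. The function name parse_query and the tuple output format are hypothetical, and field name prefixes, wildcards, and escaping are left out to keep it short.

```python
import re


def parse_query(query):
    """Parse a query into nested tuples that show operator grouping."""
    # Tokens: double-quoted phrases, parentheses, or bare words.
    tokens = re.findall(r'"[^"]*"|[()]|[^\s()]+', query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        return token

    def parse_and():  # lowest precedence
        parts = [parse_or()]
        while peek() == "AND":
            take()
            parts.append(parse_or())
        return parts[0] if len(parts) == 1 else ("AND", *parts)

    def parse_or():  # OR may be written or implied
        parts = [parse_not()]
        while peek() not in (None, "AND", ")"):
            if peek() == "OR":
                take()
            parts.append(parse_not())
        return parts[0] if len(parts) == 1 else ("OR", *parts)

    def parse_not():  # applies to the next token or group only
        if peek() == "NOT":
            take()
            return ("NOT", parse_not())
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            inner = parse_and()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return inner
        if token in (")", "AND", "OR"):
            raise ValueError(f"unexpected token: {token}")
        return token

    tree = parse_and()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return tree


print(parse_query("foo NOT bar baz AND xyz"))
```

A double-quoted "AND" is read as an ordinary token here because the quotes are part of the token text, which mirrors the quoting rule for reserved words.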
