Python API overview

Clusterpoint Client Python API


You can get the latest Clusterpoint Client Python API from the GitHub repository. Currently the only way to install it is with the Python Distribution Utilities (also known as Distutils), by running 'python setup.py install' in a terminal from the module directory.


All interactions with the Clusterpoint database happen through a Connection class instance. The Connection object manages the connection to the Clusterpoint server and provides methods for Clusterpoint database commands such as insert() and search(). The Connection constructor requires these arguments:

  1. A host address string. Example: 'tcp://'
  2. The name of the Clusterpoint database you want to connect to
  3. A username for authentication (user email for cloud users)
  4. A password for authentication
  5. Account ID (ignored for standalone installations, account identifier for cloud databases)

You can also optionally specify some named arguments, like:

  • document_root_xpath - Document root tag name. This is only required if your document root tag name is not "document".
  • document_id_xpath - Document ID XPath. This is only required if your document ID XPath, as specified in the database policy, is not "/document/id". Specify the path relative to the root, separated by slashes, like "./id" for the default value.

import pycps

# Create a connection to a Clusterpoint database.
con = pycps.Connection('tcp://', 'test_database', '', 'password', '250')
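
The relation between the absolute document ID XPath in the database policy and the relative form expected by document_id_xpath can be sketched with the standard library alone. The helper name relative_id_xpath and the sample document are hypothetical and not part of pycps; the sketch only illustrates the path convention described above, assuming the policy path starts with the root tag.

```python
import xml.etree.ElementTree as ET


def relative_id_xpath(policy_xpath, root_tag="document"):
    """Turn an absolute policy path like '/document/id' into './id'.

    Hypothetical helper for illustration; not part of pycps.
    """
    prefix = "/" + root_tag + "/"
    if not policy_xpath.startswith(prefix):
        raise ValueError(
            "ID XPath %r is not below root tag %r" % (policy_xpath, root_tag)
        )
    return "./" + policy_xpath[len(prefix):]


# The default policy path maps to the default relative value.
default_path = relative_id_xpath("/document/id")

# A made-up policy with a different root tag and a nested ID element.
nested_path = relative_id_xpath("/item/meta/key", root_tag="item")

# The relative path can be resolved against the document root element.
doc = ET.fromstring("<item><meta><key>42</key></meta><text>foobar</text></item>")
found_id = doc.find(nested_path).text
```

For such a database, the values would presumably be passed to the constructor as the named arguments document_root_xpath='item' and document_id_xpath='./meta/key'.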

Request methods

The command methods take various parameters depending on the command and return a Response object that contains all the information received from the Clusterpoint database.

# Insert multiple documents in the database (the keys of the dict will be used as the document ids).
con.insert({5: '<text>foobar</text>',
            6: '<text>baz</text>',
            'id7': {'title': 'Lorem Ipsum', 'text': 'Long, long text'}})
# Delete two of the inserted documents.
con.delete([5, 'id7'])

See the API reference for the full list of available command methods.


For situations where the same or similar requests are sent repeatedly, it can be useful to construct and send the underlying Request object manually instead of using the Connection class's methods. Most command methods have a corresponding Request class; some simple commands use the base Request class instead.

All Request classes take as their first argument the Connection instance they are to use.

# Manually send clear command request.
req = pycps.Request(con, 'clear')
# Make a request object that can be used repeatedly for searching for similar text to the given text.
text = "Long interesting text be here."
req = pycps.SimilarRequest(con, text, 10, 5, mode='text')


Both the command methods and the underlying Request classes return an appropriate Response object. Response objects provide properties for common Clusterpoint database response information, such as the processing time (seconds), the name of the database (storage_name) and the command the response was generated for (command). Every Response class also provides the whole content part of the database response in various formats through the get_content_dict(), get_content_string() and get_content_etree() methods. Command-specific information is also available in the corresponding responses. See the reference for a list of the information available in each Response object.

# Get status
resp = con.status()
# Dump the status dict
print("Processed in {0}; dump:\n{1}".format(resp.seconds, resp.status))
# Get a ListResponse with last added documents.
resp = con.retrieve_last(docs=10, offset=0)
# Use the returned documents.
if resp.found:
    for document in resp.get_documents(doc_format='string'):
        print(document)
# Insert a couple of documents yielding a ModifiedResponse.
resp = con.insert({1: {'text': 'Lorem Ipsum ...', 'title': 'Test'}, 2: '<br/>'})
# Get list of inserted documents.
print("Modified:\n" + '\n'.join(resp.modified_ids()))

Document formats

For modify-type commands, the documents to be inserted in the database can be specified in one of the supported formats:

  • An xml.etree.ElementTree ElementTree or Element that is the root of the document;
  • An XML string;
  • A dict where keys are tag names and the values are tag contents (themselves possibly being nested dicts). If there are multiple identical child tags under a root, the root contains a list of the corresponding dicts.

All these formats are also available from a ListResponse via the get_documents() method, by setting the doc_format named parameter accordingly. The whole XML response content is available in these formats through get_content_etree(), get_content_string() and get_content_dict() respectively.
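
How the dict format lines up with the XML string and Element formats can be sketched with xml.etree.ElementTree. This is a minimal illustration of the mapping described above, not the conversion pycps itself performs; the function name dict_to_element and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def dict_to_element(tag, content):
    """Build an Element named `tag` from a plain value or a nested dict.

    Illustrative only; pycps has its own conversion, which may differ.
    """
    element = ET.Element(tag)
    if isinstance(content, dict):
        for child_tag, child_content in content.items():
            # A list value stands for several identical child tags.
            if isinstance(child_content, list):
                items = child_content
            else:
                items = [child_content]
            for item in items:
                element.append(dict_to_element(child_tag, item))
    else:
        element.text = str(content)
    return element


# Dict form: two identical <author> children are given as a list of dicts.
doc = {"title": "Test", "author": [{"name": "A"}, {"name": "B"}]}

# Element form of the same document.
root = dict_to_element("document", doc)

# XML string form of the same document.
xml_string = ET.tostring(root, encoding="unicode")
```

Here xml_string holds one title tag followed by two author tags under the document root, which is the shape the list-of-dicts rule describes.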


The pycps package raises exceptions if there are problems connecting to the Clusterpoint database, and for some incorrect parameter values or malformed XML structures. Pycps also raises all important Clusterpoint transaction errors received from the Clusterpoint database as exceptions, and all nonfatal errors (not resulting in data loss) as warnings. APIError and APIWarning objects in particular contain useful information about what went wrong.

try:
    con = pycps.Connection("tcp://SERVER_IP:SERVER_PORT", "DATABASE", "USERNAME", "PASSWORD", "ACCOUNT_ID")
    # Try to delete some documents (example ids); ids that do not exist are reported through APIError.
    con.delete(['id7', 'id8'])
except pycps.ConnectionError:
    print("Can't reach the server!")
except pycps.APIError as e:
    print("Could not delete all documents as these documents don't exist: " + ', '.join(e.document_id))
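
Nonfatal errors arrive as warnings rather than exceptions, so they are easy to miss. The sketch below shows one way to collect them, assuming pycps emits them through Python's standard warnings module. DemoAPIWarning and fake_command are stand-ins invented for this example so that it runs without a server; with pycps, the real APIWarning class and a real command call would take their place.

```python
import warnings


class DemoAPIWarning(UserWarning):
    """Stand-in for pycps.APIWarning (hypothetical, for illustration)."""


def fake_command():
    """Stand-in for a command method that succeeds with a nonfatal error."""
    warnings.warn(DemoAPIWarning("document 7 skipped"))
    return "ok"


# Record every warning raised inside the block instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fake_command()

# Keep only the API warnings and extract their messages.
messages = [
    str(w.message) for w in caught if issubclass(w.category, DemoAPIWarning)
]
```

The command still returns its normal result, and messages holds the text of each recorded warning for logging or inspection.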

Note that multi-document modifying requests might return errors for only some of the document ids, while the rest of the modifications take place successfully. To detect these situations, you can check the document_id list of the APIError object.
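
Working out which modifications went through after a partial failure can be sketched as a plain list comparison. The helper split_by_outcome is hypothetical and not part of pycps; it assumes the failed ids are available as a list (such as the document_id list mentioned above) and compares ids as strings, since ids may be given as numbers or strings.

```python
def split_by_outcome(requested_ids, failed_ids):
    """Split requested ids into (succeeded, rejected) lists.

    Hypothetical helper; ids are compared by their string form.
    """
    failed = {str(doc_id) for doc_id in failed_ids}
    succeeded = [i for i in requested_ids if str(i) not in failed]
    rejected = [i for i in requested_ids if str(i) in failed]
    return succeeded, rejected


# Example: three deletions were requested and one id was reported as failed.
succeeded, rejected = split_by_outcome([5, "id7", 6], ["id7"])
```

In an except block for APIError, the second argument would be the document_id list of the caught error.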