Upload Documents from a Remote Location to GroundX

This tutorial shows you how to use GroundX's TypeScript and Python SDK libraries to upload hosted documents to your GroundX buckets.

With a single API request, you can upload content from a URL to GroundX, which then pre-processes it automatically so that it is ready to be searched.

Prerequisites

  • Node.js installed (for JavaScript or TypeScript projects)
  • Python 3.7 or higher installed (for Python projects)

Step 1: Set up your environment

If you haven't already done so, follow the steps below to get your GroundX API key and install the GroundX SDK for your project.

  1. To get your GroundX API key, log in to your GroundX dashboard and go to the API Keys section.
  2. Install the GroundX SDK for either Python or TypeScript with the corresponding command:
pip install groundx-python-sdk
npm install groundx-typescript-sdk --save

Step 2: Import required libraries

In your project, import the GroundX SDK library:

from groundx import Groundx, ApiException
import { Groundx } from "groundx-typescript-sdk";

Step 3: Set up your API key

Store your API key in a variable. You will pass it to the GroundX client when you initialize it in Step 6:

groundxKey = 'YOUR_GROUNDX_KEY'
const groundxKey = "YOUR_GROUNDX_KEY";
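
Hardcoding the key works for a quick test, but a common alternative is to read it from an environment variable so it stays out of source control. The following is a minimal sketch of that approach. The variable name GROUNDX_API_KEY and the helper load_groundx_key are assumptions made for illustration; neither is part of the GroundX SDK.

```python
import os


def load_groundx_key(env_var: str = "GROUNDX_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice, not something the SDK requires.
    """
    key = os.environ.get(env_var, "").strip()
    # Reject both a missing value and the tutorial's placeholder string.
    if not key or key == "YOUR_GROUNDX_KEY":
        raise RuntimeError(f"set the {env_var} environment variable to your GroundX key")
    return key
```

With this helper in place, the assignment above could become groundxKey = load_groundx_key().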

Step 4: Set up content ingestion parameters

Set up the parameters for the content ingestion request. For more information on the parameters for uploading hosted documents to GroundX, go to the reference guide.

  1. Indicate the ID of the bucket you want to ingest the content into by setting the bucketId variable. Leave it at 0 to have Step 7 look up your first bucket.
bucketId = 0
let bucketId = 0;
  2. Set a variable to indicate the type of content you want to ingest. Currently, the supported file types are:
  • txt
  • docx
  • pptx
  • xlsx
  • pdf
  • png
  • jpg

For example:

fileType = '<FILE_TYPE>'
const fileType = "<FILE_TYPE>";
  3. Set a variable to indicate the URL of the content you want to ingest. For example:
ingestHosted = '<URL>'
const ingestHosted = "<URL>";
  4. Optional: Include an object containing metadata for your content. For example:
contentMetadata = {
    "title": "Sample Title",
    "description": "Sample Description",
    "author": "Sample Author",
    "tags": ["Sample Tag 1", "Sample Tag 2"]
}

const contentMetadata = {
  title: "Sample Title",
  description: "Sample Description",
  author: "Sample Author",
  tags: ["Sample Tag 1", "Sample Tag 2"]
};
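
Since the file type usually matches the extension of the hosted file, it can be derived from the URL instead of being set by hand. The following is a rough sketch under that assumption. The helper infer_file_type is hypothetical (not an SDK function), and it only accepts the seven types listed above.

```python
import posixpath
from urllib.parse import urlparse

# File types listed as supported in this step.
SUPPORTED_FILE_TYPES = {"txt", "docx", "pptx", "xlsx", "pdf", "png", "jpg"}


def infer_file_type(url: str) -> str:
    """Guess the file type from the extension in the URL path (hypothetical helper)."""
    # urlparse separates the path from any query string or fragment.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_FILE_TYPES:
        raise ValueError(f"unsupported or missing file extension: {extension!r}")
    return extension
```

For a URL whose path carries no extension, the helper raises ValueError, and the type then has to be set manually as shown above.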

Step 5: Set parameter validation

Optional: Set up parameter validation to check if all the required parameters are set. For example:

# Reject an empty key as well as the unchanged placeholder from Step 3.
if groundxKey in ("", "YOUR_GROUNDX_KEY"):
    raise Exception("set your GroundX key")
if ingestHosted == "":
    raise Exception("set the hosted file URL")
if fileType == "":
    raise Exception("set the file type to a supported enumerated type (e.g. txt, pdf)")

if (groundxKey === "YOUR_GROUNDX_KEY") {
  throw Error("set your GroundX key");
}
if (ingestHosted === "") {
  throw Error("set the hosted file URL");
}
if (fileType === "") {
  throw Error("set the file type to a supported enumerated type (e.g. txt, pdf)");
}
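
The checks above stop at the first problem and only catch empty strings. A slightly stricter variant could collect every problem in one pass, as in the sketch below. The function name find_parameter_problems is hypothetical, and the requirement that the URL use http or https is an assumption, not something this tutorial states.

```python
from typing import List
from urllib.parse import urlparse

# File types listed as supported in Step 4.
SUPPORTED_FILE_TYPES = {"txt", "docx", "pptx", "xlsx", "pdf", "png", "jpg"}


def find_parameter_problems(groundx_key: str, source_url: str, file_type: str) -> List[str]:
    """Return a message for every parameter that looks unset or malformed."""
    problems = []
    if groundx_key in ("", "YOUR_GROUNDX_KEY"):
        problems.append("set your GroundX key")
    parsed = urlparse(source_url)
    # Assumption: hosted files are fetched over http or https.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("set the hosted file URL to a full http(s) address")
    if file_type not in SUPPORTED_FILE_TYPES:
        problems.append("set the file type to a supported enumerated type (e.g. txt, pdf)")
    return problems
```

An empty list means all three values passed the checks; otherwise the messages can be joined and raised as a single exception.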

Step 6: Initialize the GroundX client

Initialize the GroundX client by creating a new GroundX object and passing your API key as a parameter. For example:

groundx = Groundx(
    api_key=groundxKey,
)

const groundx = new Groundx({
  apiKey: groundxKey,
});

Step 7: Get default bucket ID

Before uploading the content, we'll resolve the bucket ID. Since we set the bucket ID to 0 in Step 4.1, we'll now call the bucket list endpoint (groundx.buckets.list) to check that at least one bucket exists and to get the ID of the first bucket in the list. For example:

if bucketId == 0:
    # list buckets request
    try:
        bucket_response = groundx.buckets.list()
        if len(bucket_response.body["buckets"]) < 1:
            print(bucket_response.body["buckets"])
            raise Exception("no results from buckets")
        bucketId = bucket_response.body["buckets"][0]["bucketId"]
    except ApiException as e:
        print("Exception when calling BucketApi.list: %s\n" % e)

// Note: Insert this code within a function.
if (bucketId === 0) {
  // List buckets request
  const bucketResponse = await groundx.buckets.list();
  if (!bucketResponse || !bucketResponse.status || bucketResponse.status != 200 ||
      !bucketResponse.data || !bucketResponse.data.buckets) {
    console.error(bucketResponse);
    throw Error("GroundX bucket request failed");
  }
  if (bucketResponse.data.buckets.length < 1) {
    console.error("no results from buckets");
    console.log(bucketResponse.data.buckets);
    throw Error("no results from GroundX bucket query");
  }
  console.log(bucketResponse.data);
  bucketId = bucketResponse.data.buckets[0].bucketId;
}
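
The bucket selection itself is a small piece of logic that can be separated from the API call and tested with a plain dictionary. The sketch below assumes the response body has the shape used in the code above (a "buckets" list whose entries carry a "bucketId"). The helper name first_bucket_id is hypothetical.

```python
from typing import Any, Dict


def first_bucket_id(body: Dict[str, Any]) -> int:
    """Return the ID of the first bucket in a bucket list response body."""
    # Treat a missing or null "buckets" field the same as an empty list.
    buckets = body.get("buckets") or []
    if not buckets:
        raise LookupError("no results from buckets")
    return buckets[0]["bucketId"]
```

In the Python snippet above, this could be called as first_bucket_id(bucket_response.body).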

Step 8: Upload the content

Upload the content by calling the remote ingest endpoint (ingest_remote in Python, ingestRemote in TypeScript) with the parameters you set in Step 4 as arguments. For example:

# Upload hosted documents to GroundX request
# Note: this try block is completed in Step 9, which adds the polling loop and the except clause.
try:
    ingest = groundx.documents.ingest_remote(
        documents=[
            {
                "bucketId": bucketId,
                "metadata": contentMetadata,
                "sourceUrl": ingestHosted,
                "fileType": fileType,
            }
        ],
    )

// Note: Insert this code within a function.
// Upload hosted documents to GroundX
let ingest = await groundx.documents.ingestRemote({
  documents: [
    {
      bucketId: bucketId,
      type: fileType,
      metadata: contentMetadata,
      sourceUrl: ingestHosted,
    }
  ]
});
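
Because the documents parameter is a list, the entries can be assembled by a small helper before the request is made. The sketch below is one way to do that, using the key names from the Python request above (the TypeScript request uses type where Python uses fileType). The helper build_documents is hypothetical, and whether the endpoint accepts several documents per request, and how many, is an assumption to verify in the reference guide.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def build_documents(
    bucket_id: int,
    sources: Iterable[Tuple[str, str]],
    metadata: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Build one request entry per (source_url, file_type) pair."""
    documents = []
    for source_url, file_type in sources:
        document = {
            "bucketId": bucket_id,
            "sourceUrl": source_url,
            "fileType": file_type,
        }
        # Metadata is optional (Step 4.4), so the key is only added when provided.
        if metadata is not None:
            document["metadata"] = dict(metadata)
        documents.append(document)
    return documents
```

The returned list can then be passed as the documents argument of the ingest call.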

The ingest endpoint returns a response object that reports the status of the ingestion process.

For example:

// Successful request response
{
  "ingest": {
    "processId": "string", // Object ID of the ingest process
    "status": "string" // "queued" | "processing" | "error" | "complete"
  }
}
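
Before polling, it can help to confirm that the response really has this shape. The following sketch extracts the two fields from a response body given as a plain dictionary and rejects anything unexpected. The helper name parse_ingest_response is hypothetical, and the set of statuses is taken from the example above.

```python
from typing import Any, Dict, Tuple

# Status values listed in the example response above.
KNOWN_STATUSES = {"queued", "processing", "error", "complete"}


def parse_ingest_response(body: Dict[str, Any]) -> Tuple[str, str]:
    """Return (process_id, status) from an ingest response body."""
    ingest = body.get("ingest")
    if not isinstance(ingest, dict):
        raise ValueError("response has no 'ingest' object")
    process_id = ingest.get("processId")
    status = ingest.get("status")
    if not process_id:
        raise ValueError("response has no processId")
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected ingest status: {status!r}")
    return process_id, status
```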

Step 9: Get ingest status

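To check the status of the ingestion process, we'll pass the processId from the Step 8 response to the processing status endpoint (get_processing_status_by_id in Python, getProcessingStatusById in TypeScript) and repeat the request until the status is "complete" or "error". For example: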

    # Insert this code inside the try block from Step 8, directly after the ingest_remote call.
    while (
        ingest.body["ingest"]["status"] != "complete"
        and ingest.body["ingest"]["status"] != "error"
    ):
        ingest = groundx.documents.get_processing_status_by_id(
            process_id=ingest.body["ingest"]["processId"]
        )
        # Wait 3 seconds between polls (requires "import time" at the top of your script).
        time.sleep(3)
except ApiException as e:
    print("Exception when calling DocumentApi.ingest_remote: %s\n" % e)

// Note: Insert this code within a function.
if (!ingest || !ingest.status || ingest.status != 200 ||
    !ingest.data || !ingest.data.ingest) {
  console.error(ingest);
  throw Error("GroundX ingest request failed");
}
// poll ingest status
while (ingest.data.ingest.status !== "complete" && ingest.data.ingest.status !== "error") {
  ingest = await groundx.documents.getProcessingStatusById({
    processId: ingest.data.ingest.processId,
  });
  if (!ingest || !ingest.status || ingest.status != 200 ||
      !ingest.data || !ingest.data.ingest) {
    console.error(ingest);
    throw Error("GroundX ingest request failed");
  }
  await new Promise((resolve) => setTimeout(resolve, 3000));
}
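
The loops above run until the process reaches a terminal status, so a stalled ingest would keep them polling indefinitely. One possible refinement is to add an overall timeout, as in the sketch below. The function wait_for_ingest and its default values are assumptions for illustration; it takes any callable that returns the current status string, so it does not depend on the SDK.

```python
import time
from typing import Callable

# Statuses after which polling should stop.
TERMINAL_STATUSES = {"complete", "error"}


def wait_for_ingest(
    fetch_status: Callable[[], str],
    interval_seconds: float = 3.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status until it returns a terminal status or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up if the next poll would start after the deadline.
        if clock() + interval_seconds > deadline:
            raise TimeoutError(f"ingest still {status!r} after {timeout_seconds} seconds")
        sleep(interval_seconds)
```

In the Python flow, fetch_status could be a small function that calls groundx.documents.get_processing_status_by_id with the processId from Step 8 and returns body["ingest"]["status"].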

Step 10: Test your code

  1. After you have adjusted the code for your project, run it to upload the content to GroundX.
  2. Use GroundX's interactive API Reference guide to call the endpoint that lists all the documents in your GroundX buckets.
  3. Check that the content you uploaded is listed in the response.

Congratulations!
You've successfully ingested a hosted document into GroundX, and you can now search it using GroundX's search API.