Docusign Insight - Documents report "Already Processed" despite not being in the system

Issue

Errors during contract ingestion can leave documents only partially ingested. When this happens, the documents cannot be found in the UI, and every further attempt to ingest them fails with the error "Already processed."

The cause is that documents are tracked by their SHA1 hash. The hash was recorded successfully on the first attempt, but a later step at the OCR stage failed, so the contract was never created in the contract table. To fix this, run a script against Postgres that removes every entry in the document table that has no matching entry in the contract table.

Solution

CAUTION: Apply this solution ONLY after performing a hard delete by emptying the trash bin in the UI. Also, ensure that "Use Analytics" was turned ON when the hard delete was performed.

The script below does not remove the document from the blobstore. Use caution when executing any script, and ensure that you have a complete database backup.

Before proceeding, confirm that neither the "contract" nor the "sca_queue" table in the SA DB contains these documents. If any are present, they must be removed before you continue with this procedure.
Back up the database using pg_dump.

You can see what will be removed by running this query:

select * from document d
where d.id not in (select document_id from contract);
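
One caveat with the preview query: in PostgreSQL, a NOT IN test against a subquery matches no rows at all if the subquery returns even one NULL. Whether contract.document_id can be NULL in your schema is not stated here, so the following is only a sketch of a NULL-safe way to count the same rows. It uses just the table and column names from the query above.

```sql
-- Count document rows that have no matching contract row.
-- NOT EXISTS is unaffected by NULL values in contract.document_id,
-- unlike NOT IN, which would silently match nothing in that case.
SELECT count(*) AS orphaned_documents
FROM document d
WHERE NOT EXISTS (
    SELECT 1
    FROM contract c
    WHERE c.document_id = d.id
);
```

If this count differs from the number of rows the preview query returns, check contract.document_id for NULL values before deleting anything.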


To delete those objects:

delete from document d
where d.id not in (select document_id from contract);
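
As an added safeguard, the delete can be run inside a transaction so that the affected rows can be inspected before anything becomes permanent. The following is a minimal sketch, not part of the original procedure. It assumes only the document.id and contract.document_id columns used above, and it swaps NOT IN for NOT EXISTS so that NULL values in contract.document_id cannot change the result.

```sql
BEGIN;

-- Delete document rows with no matching contract row and
-- return their ids so the affected set can be reviewed.
DELETE FROM document d
WHERE NOT EXISTS (
    SELECT 1
    FROM contract c
    WHERE c.document_id = d.id
)
RETURNING d.id;

-- Safe default: undo the delete.
-- Replace ROLLBACK with COMMIT once the returned ids look correct.
ROLLBACK;
```

Ending with ROLLBACK means that running the block as written changes nothing. It becomes a real delete only when the last statement is deliberately changed to COMMIT.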