Which fields are involved in deduplication for standalone documents in eCapture?

  • Which fields are involved in deduplication for standalone documents in eCapture?

    For example, we have an orphan item, A.pdf, in one volume. With that volume in place, a new extract job is created in which A.pdf is a legitimate standalone document. Will A.pdf in the subsequent volume be deduplicated against the first A.pdf, given that the dedupe scope rule is set to "Maintain compound document structure"?
    Please help me understand the concept.

  • #2
    Hello,
    For loose documents, eCapture uses the standard algorithm to generate a SHA-1 hash value for each item. If the same PDF is in two different volumes of data, and neither copy is part of a family, one of them will be removed as a duplicate of the copy in the first volume.
    If one or both PDF files are attached to a family, the family is hashed, and the families would need to match in both data sets for one to be removed. A loose PDF will not be removed if the same PDF exists only as part of a family. Deduplication can be a complicated subject, but hopefully this answers your initial question.
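
    The rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not eCapture's actual implementation: the function names, the way a family hash is derived from its members' hashes, and the "first volume wins" processing order are all assumptions made for the example.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """SHA-1 digest of raw bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


def dedupe_key(item_contents: list[bytes]) -> str:
    """Dedupe key for one unit: a loose document (one item) or a family (several).

    Assumption: a family key is built from the ordered hashes of all its
    members, so a family only matches another family with identical members.
    """
    if len(item_contents) == 1:
        return "loose:" + sha1_hex(item_contents[0])
    member_hashes = [sha1_hex(content) for content in item_contents]
    return "family:" + sha1_hex("|".join(member_hashes).encode("ascii"))


def deduplicate(volumes):
    """Return the unit names kept in each volume.

    `volumes` is a list of volumes; each volume is a list of
    (name, item_contents) tuples. Volumes are processed in order, so the
    first occurrence of a key is kept (assumed ordering).
    """
    seen = set()
    kept = []
    for units in volumes:
        kept_here = []
        for name, contents in units:
            key = dedupe_key(contents)
            if key in seen:
                continue  # same loose item or same whole family seen earlier
            seen.add(key)
            kept_here.append(name)
        kept.append(kept_here)
    return kept


pdf = b"%PDF-1.4 hypothetical bytes of A.pdf"
email = b"hypothetical parent email body"

volumes = [
    [("vol1/A.pdf", [pdf])],               # loose A.pdf
    [("vol2/A.pdf", [pdf])],               # same loose A.pdf, later volume
    [("vol3/mail+A.pdf", [email, pdf])],   # A.pdf attached to a family
]
result = deduplicate(volumes)
print(result)
```

    In this sketch the second loose copy is dropped, while the family that contains the same PDF is kept, because its key is computed over the whole family and not over the attachment alone.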
