  • deduplication

    The ability to de-duplicate edocs against email attachments would be nice. Even the capability built into SE Advanced Search would be nice.

  • #2
    In ADD, deduplication uses eCapture's ability to choose which portions of an email are included when calculating hash values. You can select any of the primary metadata fields and choose whether white space is included or removed. Attachment names and counts can also be used as criteria.
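
To make the idea of a field-based hash concrete, here is a minimal Python sketch. It is not eCapture's implementation: the function name `email_dedup_hash`, the dict layout, and the `strip_whitespace` and `use_attachments` parameters are all hypothetical, chosen only to mirror the options described above (selected metadata fields, white space handling, attachment names and counts).

```python
import hashlib
import re


def email_dedup_hash(email, fields, strip_whitespace=True, use_attachments=True):
    """Build an MD5 dedup key from selected fields of an email dict.

    Hypothetical illustration only; field names and options are assumptions.
    """
    parts = []
    for name in fields:
        value = str(email.get(name, ""))
        if strip_whitespace:
            # Remove all white space so formatting differences do not matter.
            value = re.sub(r"\s+", "", value)
        parts.append(value)
    if use_attachments:
        # Include the attachment count and the sorted attachment names.
        names = sorted(email.get("attachments", []))
        parts.append(str(len(names)))
        parts.extend(names)
    # Join with a separator that is unlikely to occur in the field values.
    joined = "\x1f".join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


a = {"subject": "Q3 report", "from": "a@x.com",
     "body": "See  attached.\n", "attachments": ["q3.docx"]}
b = {"subject": "Q3 report", "from": "a@x.com",
     "body": "See attached.", "attachments": ["q3.docx"]}
fields = ["subject", "from", "body"]

print(email_dedup_hash(a, fields) == email_dedup_hash(b, fields))  # True
print(email_dedup_hash(a, fields, strip_whitespace=False)
      == email_dedup_hash(b, fields, strip_whitespace=False))      # False
```

With white space removed, the two emails hash to the same value and would be treated as duplicates; with white space kept, the differing body text produces different hashes.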



    • #3
      I mean the ability to deduplicate standalone edocs against email attachments.


      • #4
        This option is not currently available in ADD Streaming Discovery but it is available in standard eCapture processing. If you are using standard eCapture Discovery, Data Extract/Processing, then within the Flex Processor Rules > General Criteria, there is an option 'Allow child originals'. If selected, then standalone eDocs will be compared to email attachments and deduplicated.

        Excerpt from eCapture User Guide:
        The Allow Child Originals option is selected by default and controls how child
        documents are compared during de-duplication when the option Maintain Family
        Structure is selected. This allows documents, including loose files, to deduplicate
        against child documents predicated on order they are processed. For
        example, if two Word documents exist with the same MD5Hash value; one as
        child attachment to an Email parent, the other as a loose Parent, the loose
        Parent (Word document) is removed. However, if the loose Parent (Word document)
        is encountered before the Email (parent) and its Word (child attachment)
        the Word (child attachment) is not removed. Deselect this option to
        force duplicate checks at the parent level only.


        • #5
          Will this inadvertently mark an email attachment as a duplicate instead of the edoc?


          • #6
            No, it is dependent on the order of the duplicate loose file and email attachment. If the email attachment comes first, then the duplicate loose file will be removed. If the loose file comes first, then the email attachment is untouched because it is part of a family.
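
The order dependence described above can be sketched in a few lines of Python. This is a simplified model, not eCapture code: the `deduplicate` function, the `(name, md5, is_child)` tuples, and the `allow_child_originals` flag are hypothetical names, and the sketch assumes family structure is maintained, so attachments are never removed.

```python
def deduplicate(items, allow_child_originals=True):
    """Return the names removed as duplicates.

    items: list of (name, md5, is_child) tuples in processing order.
    Hypothetical model; assumes family structure is always maintained.
    """
    seen = set()
    removed = []
    for name, md5, is_child in items:
        if is_child:
            # Attachments stay with their family and are never removed.
            # They only count as originals when the option is enabled.
            if allow_child_originals:
                seen.add(md5)
            continue
        if md5 in seen:
            removed.append(name)  # duplicate of an earlier original
        else:
            seen.add(md5)
    return removed


email_first = [
    ("mail.msg", "aaa", False),
    ("mail.msg/report.docx", "bbb", True),
    ("loose/report.docx", "bbb", False),
]
loose_first = [
    ("loose/report.docx", "bbb", False),
    ("mail.msg", "aaa", False),
    ("mail.msg/report.docx", "bbb", True),
]

print(deduplicate(email_first))   # ['loose/report.docx']
print(deduplicate(loose_first))   # []
print(deduplicate(email_first, allow_child_originals=False))  # []
```

In this model the loose file is removed only when the email attachment was processed first, and disabling the option limits duplicate checks to the parent level.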


            • #7
              Thanks. So it's important to import emails with attachments first in order to get the desired behaviour.


              • #8
                I guess this also means there is only static deduplication for streaming discovery jobs as well as standard eCapture jobs.