How we calculate unique and repeated words
We generate word counts using our custom-built CAT (computer-aided translation) tool, which is essentially a language database. The memory in the tool is built over time as translators add source text and its corresponding human translations to the database.
If there is no stored memory, as with a first-time client, the tool looks only for text repetitions, or 'repeated words', within the source document being translated. For every subsequent order by the same client, a translation memory (TM) is created.
When we translate, we usually translate one sentence before moving to the next. Logically, this helps us understand and retain the context of the content. Since what constitutes a sentence can differ from language to language, we refer to this breakdown of text as 'segments'. Learn more about segments and repeated words.
CAT tools compare segments against one another and identify whether each segment is unique or repeated. If there is even a slight difference between two segments, each of them is considered unique.
For example, here are three segments from a text:
My name is Mark (Unique)
My name is Mark. (Unique)
My name is Mark. (Repeated)
Although the words in all three segments are the same, there is a difference in punctuation. Since there is no full stop after the word 'Mark' in the first segment, it does not match the other two and is considered unique. The second and third segments, however, are identical, so we consider the first occurrence of that segment unique and any subsequent occurrence a repetition.
You may wonder why the second segment is unique even though its words are the same as those in the first. This is because we don't compare individual words but rather the segment as a whole, punctuation included. This helps us maintain the context.
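The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the actual logic of our CAT tool: the function name classify_segments is hypothetical, and the sketch assumes that segments are compared as exact strings, so any difference in punctuation makes two segments distinct.

```python
def classify_segments(segments):
    """Label each segment 'Unique' on first occurrence, 'Repeated' afterwards.

    Assumption: segments are compared as exact strings, punctuation included.
    """
    seen = set()
    labels = []
    for segment in segments:
        if segment in seen:
            labels.append("Repeated")
        else:
            seen.add(segment)
            labels.append("Unique")
    return labels


segments = ["My name is Mark", "My name is Mark.", "My name is Mark."]
for segment, label in zip(segments, classify_segments(segments)):
    print(f"{segment} ({label})")
# My name is Mark (Unique)
# My name is Mark. (Unique)
# My name is Mark. (Repeated)
```

Because whole segments are compared rather than individual words, the missing full stop alone is enough to make the first segment distinct from the other two.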
To calculate the repeated words for a segment, we multiply the number of times the segment is repeated by the number of words in the segment. To get the total number of repeated words, we perform the same calculation for each repeated segment and add the results.
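As a rough sketch of this calculation in Python, under two assumptions that are not spelled out above: words are counted by splitting on whitespace, and "the number of times the segment is repeated" means every occurrence after the first (which counts as unique). The function names count_words and repeated_word_total are hypothetical.

```python
from collections import Counter


def count_words(segment):
    # Assumption: a word is any run of non-whitespace characters.
    return len(segment.split())


def repeated_word_total(segments):
    """Sum, over all segments, (occurrences after the first) * (words in segment)."""
    occurrences = Counter(segments)
    total = 0
    for segment, count in occurrences.items():
        repeats = count - 1  # the first occurrence is treated as unique
        total += repeats * count_words(segment)
    return total


segments = [
    "My name is Mark",
    "My name is Mark.",
    "My name is Mark.",
    "Hello there.",
    "Hello there.",
    "Hello there.",
]
print(repeated_word_total(segments))  # 8
```

Here "My name is Mark." is repeated once (1 x 4 words) and "Hello there." is repeated twice (2 x 2 words), giving 4 + 4 = 8 repeated words.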