About us

The Open System for Antiplagiarism (OSA) was developed as part of a project carried out by the Interuniversity Center of Information Technology in cooperation with the Institute of Computer Science at the Polish Academy of Sciences, in response to the growing problem of plagiarism in Polish diploma theses. Although several computer systems for checking students' writing for plagiarism are already available on the Polish market, the authors of OSA took an entirely different approach.

When comparing texts, the Open System for Antiplagiarism does not use their original form. Instead, it uses so-called maps and term frequency vectors (TFVs), data structures that contain only partial information about the texts. This approach has four major advantages:

  • The original texts cannot be fully reconstructed from the database of their maps.
  • Comparing maps is sufficient to reveal both direct (verbatim) and indirect (semantic) similarities between texts, and it can be done far more efficiently than a one-to-one comparison of the original texts.
  • The system can index very large databases (e.g. the All-Polish Repository of Diploma Writings) using TFVs or other hashes.
  • Tests carried out so far have shown the system to be highly resistant to the techniques known to the authors for disguising unattributed borrowings (word order changes, permutation of phrases and/or paragraphs, synonym substitution, compilation from multiple sources, the use of whitespace characters and/or characters from mixed scripts, ...).
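The exact format of OSA's maps and TFVs is not described here. As a minimal sketch, assuming a TFV is simply a bag-of-words count vector compared by cosine similarity, the Python below illustrates why such a representation discards word order (so the original text cannot be read back from it) while still matching reordered copies. The `normalize` step and its small `HOMOGLYPHS` table are hypothetical illustrations of how whitespace and mixed-script tricks might be neutralized; they are not OSA's actual rules.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical homoglyph table: a few Cyrillic letters that look like Latin ones.
# A real system would need a much larger mapping (and would not apply it to genuine Cyrillic text).
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0445": "x",  # Cyrillic small ha
})


def normalize(text: str) -> str:
    """Fold compatibility forms and case, map homoglyphs, collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = text.translate(HOMOGLYPHS)
    return re.sub(r"\s+", " ", text).strip()


def term_frequency_vector(text: str) -> Counter:
    """Bag-of-words counts: word order is thrown away, so only partial information remains."""
    return Counter(re.findall(r"\w+", normalize(text)))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


original = "Plagiarism is the use of another author's work without attribution."
# Reordered, with a double space and a Cyrillic "a" in "author's".
disguised = "Without  attribution, the use of another \u0430uthor's work is plagiarism."
print(round(cosine_similarity(term_frequency_vector(original),
                              term_frequency_vector(disguised)), 3))  # prints 1.0
```

Because only term counts are kept, the reordered and disguised sentence scores the same as the original. Plain counts alone would not catch synonym substitution, which is one reason a real system may combine TFVs with other structures.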

The database of maps and the other databases the system needs are created and/or updated fully automatically by processing the TXT, PDF, DOC, DOCX, ODT or RTF documents selected by the user. In addition, the system lets the user examine in more detail the original documents whose maps proved to be alarmingly close. The text of any examined work can be compared one-to-one with each element of a set of reference texts that the system selects from local and Web resources on the basis of similarities between maps. Usually this set is empty or contains only a few elements.
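The two-stage process described above (a cheap map comparison to pick a few candidates, then a detailed one-to-one comparison) can be sketched in Python as follows. This is an illustration under stated assumptions, not OSA's implementation: hashed word 3-grams stand in for the maps, Jaccard similarity and the 0.1 threshold are arbitrary choices, and the standard library's `difflib.SequenceMatcher` stands in for the one-to-one comparison. All function names are hypothetical.

```python
import hashlib
import re
from difflib import SequenceMatcher


def shingle_hashes(text: str, n: int = 3) -> set[int]:
    """Hypothetical 'map': hashes of every run of n consecutive words; the text itself is not stored."""
    words = re.findall(r"\w+", text.lower())
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {int.from_bytes(hashlib.blake2b(g.encode(), digest_size=8).digest(), "big")
            for g in grams}


def jaccard(a: set[int], b: set[int]) -> float:
    """Share of hashes common to both sets (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_candidates(query: str, index: dict[str, set[int]], threshold: float = 0.1) -> list[str]:
    """Stage 1: return IDs of reference documents whose maps are close to the query's, best first."""
    q = shingle_hashes(query)
    scored = [(jaccard(q, hashes), doc_id) for doc_id, hashes in index.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score >= threshold]


def compare_one_to_one(query: str, reference: str) -> float:
    """Stage 2: detailed word-level comparison, run only on the few selected candidates."""
    return SequenceMatcher(None, query.split(), reference.split()).ratio()


library = {
    "web-001": "the quick brown fox jumps over the lazy dog",
    "web-002": "an entirely unrelated page about databases and indexing",
}
index = {doc_id: shingle_hashes(text) for doc_id, text in library.items()}
thesis = "as noted, the quick brown fox jumps over the lazy dog every day"
for doc_id in select_candidates(thesis, index):
    print(doc_id, round(compare_one_to_one(thesis, library[doc_id]), 2))  # prints: web-001 0.82
```

Only the hash sets need to be kept for the whole collection; the costlier one-to-one comparison touches just the handful of documents that pass the first stage.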

We stress that, to be effective, any antiplagiarism system must have a large database of reference documents to compare against. In this version of the Open System for Antiplagiarism, texts are compared with more than 600 million Polish Web pages.