Plagiarism Detection Architecture

Publication date: July 26, 2021 (Diagram history)

This architecture helps you create a plagiarism-detection service using AWS Step Functions, AWS Lambda, Amazon SageMaker AI, and Amazon OpenSearch Service.

Plagiarism Detection Architecture Diagram

Copy the document you want to check for plagiarism to Amazon Simple Storage Service (Amazon S3).
An Amazon S3 event triggers the start of an AWS Step Functions workflow.
An AWS Lambda function extracts text from the document using Apache Tika (a content analysis toolkit that detects and extracts metadata and text from over a thousand different file types).
For each paragraph in the document, the text is passed to a pre-trained Bidirectional Encoder Representations from Transformers (BERT)-based model to extract word embedding vectors.
For each word embedding vector, a k-nearest neighbor (k-NN) search is run using cosine similarity.
An Amazon OpenSearch Service (OpenSearch Service) domain stores an index of pre-processed works that have been converted into word embedding vectors.
The OpenSearch Service query result score is compared against the configured similarity threshold. If the score meets the threshold, an Amazon EventBridge event is raised that identifies the source document as possibly plagiarized and references the relevant works.
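
The search-and-threshold steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind this architecture: a small in-memory dictionary stands in for the OpenSearch Service index, a brute-force scan stands in for its k-NN query, and the vectors are hand-written rather than produced by a BERT-based model. The function names, the `plagiarism.detector` source, the `PossiblePlagiarism` detail type, and the sample document key are all hypothetical.

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def knn_search(query, index, k=3):
    """Brute-force k-NN: return the k most similar (work_id, score) pairs, best first."""
    scored = [(work_id, cosine_similarity(query, vec)) for work_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_event(document_key, paragraph_no, matches, threshold):
    """Return an EventBridge-style entry if any match meets the threshold, else None."""
    hits = [{"work_id": w, "score": round(s, 4)} for w, s in matches if s >= threshold]
    if not hits:
        return None
    return {
        "Source": "plagiarism.detector",     # hypothetical event source name
        "DetailType": "PossiblePlagiarism",  # hypothetical detail type
        "Detail": json.dumps(
            {"document": document_key, "paragraph": paragraph_no, "matches": hits}
        ),
    }


# Toy index of pre-processed works (hand-written 3-dimensional vectors).
index = {
    "work-a": [1.0, 0.0, 0.0],
    "work-b": [0.0, 1.0, 0.0],
    "work-c": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]  # embedding of one paragraph from the uploaded document

matches = knn_search(query, index, k=2)
event = build_event("uploads/essay.docx", 0, matches, threshold=0.9)
print(matches)
print(event)
```

In a deployed workflow, the returned entry would be handed to Amazon EventBridge rather than printed. The score that OpenSearch Service reports for a k-NN query may be scaled differently from the raw cosine value computed here, so the threshold should be calibrated against real query results.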

To customize this reference architecture diagram based on your business needs, download the ZIP file, which contains an editable PowerPoint file.

To be notified about updates to this reference architecture diagram, subscribe to the RSS feed.

Change	Description	Date
Initial publication	Reference architecture diagram first published.	July 26, 2021

To subscribe to RSS updates, you must have an RSS plugin enabled for the browser you are using.
