Amazon Comprehend
Developer Guide

Detect PHI

Use the DetectPHI operation to detect Protected Health Information (PHI) in the clinical text being examined. The DetectEntities operation detects entities in all five categories, whereas the DetectPHI operation returns only entities in the PHI category. This supports use cases where only this specific information is required. For information about entities in the non-PHI categories, see Detect Entities.
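As a rough sketch of how the operation might be called from Python, the following assumes the AWS SDK for Python (boto3) is installed and that credentials and a Region are already configured. The helper name detect_phi_entities and its optional client parameter are illustrative choices, not part of the service API; only the "comprehendmedical" client and its detect_phi call come from the SDK.

```python
def detect_phi_entities(text, client=None):
    """Return the list of PHI entities that DetectPHI reports for `text`.

    `client` is a hypothetical injection point: pass any object with a
    detect_phi(Text=...) method (for example, a stub in tests). When it is
    omitted, a real boto3 Comprehend Medical client is created.
    """
    if client is None:
        # Imported lazily so the helper can be exercised without boto3
        # when a stub client is supplied.
        import boto3
        client = boto3.client("comprehendmedical")

    response = client.detect_phi(Text=text)
    # The response carries the detections under the "Entities" key.
    return response["Entities"]
```

Calling detect_phi_entities("Patient is John Smith, a 48 year old teacher and resident of Seattle, Washington.") against the live service would return entities in the PROTECTED_HEALTH_INFORMATION category only.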


When you use Amazon Comprehend Medical to identify protected health information, keep in mind that the service returns a confidence score for each detected entity, indicating how confident it is in the accuracy of that detection. Evaluate these confidence scores and identify the right confidence threshold for your use case. For specific compliance use cases, we recommend that you use additional human review or other methods to confirm the accuracy of detected PHI.

Under the Health Insurance Portability and Accountability Act (HIPAA), PHI that is based on a list of 18 identifiers must be treated with special care. These identifiers consist of data that can be used to identify an individual patient. For more information, see Health Information Privacy on the U.S. Department of Health and Human Services website.

When working with PHI, note that although Amazon Comprehend Medical detects entities associated with these identifiers in clinical text, the entities don't map one-to-one to the list of 18 identifiers specified by the Safe Harbor method. Not all of the identifiers appear in unstructured clinical text, but Amazon Comprehend Medical covers all of the relevant ones.

Each PHI-related entity includes a score (Score in the response) that indicates how confident Amazon Comprehend Medical is in the accuracy of the detection. Identify the right confidence threshold for your use case and filter out entities that fall below it. In certain compliance use cases, where you are identifying occurrences of PHI rather than using the values of the detected entities, a low confidence threshold may work better because it captures more potential occurrences.
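The filtering step described above can be sketched as follows. This is a minimal illustration, not a recommended setting: the function name filter_by_confidence and the thresholds 0.9 and 0.3 are assumptions chosen only to show the trade-off, and the sample entities are trimmed-down copies of entries from this page's sample response.

```python
def filter_by_confidence(entities, threshold):
    """Keep only entities whose Score is at or above `threshold`."""
    return [entity for entity in entities if entity["Score"] >= threshold]


# Trimmed-down entities using scores from this page's sample response.
sample_entities = [
    {"Text": "John Smith", "Type": "NAME", "Score": 0.997368335723877},
    {"Text": "teacher", "Type": "PROFESSION", "Score": 0.8661606311798096},
    {"Text": "Washington", "Type": "ADDRESS", "Score": 0.38217034935951233},
]

# A stricter (illustrative) threshold keeps fewer, higher-confidence detections.
strict = filter_by_confidence(sample_entities, 0.9)
# A lower (illustrative) threshold captures more potential occurrences of PHI.
lenient = filter_by_confidence(sample_entities, 0.3)

print([e["Text"] for e in strict])   # ['John Smith']
print([e["Text"] for e in lenient])  # ['John Smith', 'teacher', 'Washington']
```

With the lower threshold, the low-scoring "Washington" detection is retained, which is the behavior you would want when the goal is to flag every potential occurrence of PHI.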

The DetectPHI and DetectEntities operations can detect the following PHI-related entities:

Detected PHI Entities

| Entity | Description | HIPAA Category |
|---|---|---|
| AGE | All components of age, spans of age, and any age mentioned, whether of the patient, a family member, or others involved in the note. Default is in years unless otherwise noted. | 3. Dates related to an individual |
| NAME | All names mentioned in the clinical note, typically belonging to the patient, family, or provider. | 1. Name |
| PHONE_OR_FAX | Any phone, fax, or pager number; excludes named phone numbers such as 1-800-QUIT-NOW as well as 911. | 4. Phone number; 5. FAX number |
| EMAIL | Any email address. | 6. Email addresses |
| ID | Social security number, medical record number, facility identification number, clinical trial number, certificate or license number, vehicle or device number, or biometric number as it pertains to the patient, place of care, or provider. | 7. Social Security Number; 8. Medical Record number; 9. Health Plan number; 10. Account numbers; 11. Certificate/License numbers; 12. Vehicle identifiers; 13. Device numbers; 16. Biometric information; 18. Any other identifying characteristics |
| URL | Any web URL. | 14. URLs |
| ADDRESS | All geographical subdivisions of an address of any facility, named medical facilities, or wards within a facility. | 2. Geographic location |
| PROFESSION | Any profession or employer mentioned in a note as it pertains to the patient or the patient's family, not the profession of the clinician within the note. | 18. Any other identifying characteristics |
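If you need to report which HIPAA categories a set of detections touches, the table above can be expressed as a lookup. The sketch below is an illustration only: the dictionary HIPAA_CATEGORIES and the function hipaa_categories_for are hypothetical names, the keys are assumed to be the entity type strings returned in the Type field of the response, and the mapping mirrors only the rows listed in the table.

```python
# Entity type -> HIPAA categories, transcribed from the table above.
HIPAA_CATEGORIES = {
    "AGE": ["3. Dates related to an individual"],
    "NAME": ["1. Name"],
    "PHONE_OR_FAX": ["4. Phone number", "5. FAX number"],
    "EMAIL": ["6. Email addresses"],
    "ID": [
        "7. Social Security Number",
        "8. Medical Record number",
        "9. Health Plan number",
        "10. Account numbers",
        "11. Certificate/License numbers",
        "12. Vehicle identifiers",
        "13. Device numbers",
        "16. Biometric information",
        "18. Any other identifying characteristics",
    ],
    "URL": ["14. URLs"],
    "ADDRESS": ["2. Geographic location"],
    "PROFESSION": ["18. Any other identifying characteristics"],
}


def hipaa_categories_for(entities):
    """Return the distinct HIPAA categories for `entities`, ordered by number.

    Entity types that are not in the lookup are ignored.
    """
    found = set()
    for entity in entities:
        found.update(HIPAA_CATEGORIES.get(entity["Type"], []))
    # Sort on the leading category number, e.g. "18. ..." -> 18.
    return sorted(found, key=lambda label: int(label.split(".")[0]))
```

Because a single ID entity can correspond to several identifiers, the lookup returns every candidate category for that type rather than guessing one.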


The text "Patient is John Smith, a 48 year old teacher and resident of Seattle, Washington." returns:

  • "John Smith" as an entity of type NAME in the PROTECTED_HEALTH_INFORMATION category.

  • "48" as an entity of type AGE in the PROTECTED_HEALTH_INFORMATION category.

  • "teacher" as an entity of type PROFESSION (identifying characteristic) in the PROTECTED_HEALTH_INFORMATION category.

  • "Seattle, Washington" as an ADDRESS entity in the PROTECTED_HEALTH_INFORMATION category.

The Amazon Comprehend Medical console displays these detected entities for the same text.

When you call the DetectPHI operation directly, the response looks like the following:

{
    "Entities": [
        {
            "Id": 0,
            "BeginOffset": 11,
            "EndOffset": 21,
            "Score": 0.997368335723877,
            "Text": "John Smith",
            "Category": "PROTECTED_HEALTH_INFORMATION",
            "Type": "NAME",
            "Traits": []
        },
        {
            "Id": 1,
            "BeginOffset": 25,
            "EndOffset": 27,
            "Score": 0.9998362064361572,
            "Text": "48",
            "Category": "PROTECTED_HEALTH_INFORMATION",
            "Type": "AGE",
            "Traits": []
        },
        {
            "Id": 2,
            "BeginOffset": 37,
            "EndOffset": 44,
            "Score": 0.8661606311798096,
            "Text": "teacher",
            "Category": "PROTECTED_HEALTH_INFORMATION",
            "Type": "PROFESSION",
            "Traits": []
        },
        {
            "Id": 3,
            "BeginOffset": 61,
            "EndOffset": 68,
            "Score": 0.9629441499710083,
            "Text": "Seattle",
            "Category": "PROTECTED_HEALTH_INFORMATION",
            "Type": "ADDRESS",
            "Traits": []
        },
        {
            "Id": 4,
            "BeginOffset": 78,
            "EndOffset": 88,
            "Score": 0.38217034935951233,
            "Text": "Washington",
            "Category": "PROTECTED_HEALTH_INFORMATION",
            "Type": "ADDRESS",
            "Traits": []
        }
    ],
    "UnmappedAttributes": []
}
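One way the BeginOffset and EndOffset fields in a response like this might be used is to mask detected PHI in the input text. The following is a minimal sketch under stated assumptions: the function name mask_phi, the "[TYPE]" placeholder format, and the min_score parameter are all illustrative, the offsets are treated as character positions in a Python string, and detected spans are assumed not to overlap. The sample text is a shortened version of the example sentence, with hand-built entities in the same shape as the response.

```python
def mask_phi(text, entities, min_score=0.0):
    """Replace each detected span with a "[TYPE]" placeholder.

    Entities below `min_score` are left untouched. Spans are assumed
    not to overlap.
    """
    selected = [e for e in entities if e["Score"] >= min_score]
    # Work from the end of the text backwards so that earlier offsets
    # remain valid after each replacement.
    for entity in sorted(selected, key=lambda e: e["BeginOffset"], reverse=True):
        placeholder = "[" + entity["Type"] + "]"
        text = text[: entity["BeginOffset"]] + placeholder + text[entity["EndOffset"] :]
    return text


# Shortened version of the example sentence, with matching offsets.
sample_text = "Patient is John Smith, a 48 year old teacher."
sample_entities = [
    {"BeginOffset": 11, "EndOffset": 21, "Score": 0.997368335723877, "Type": "NAME"},
    {"BeginOffset": 25, "EndOffset": 27, "Score": 0.9998362064361572, "Type": "AGE"},
    {"BeginOffset": 37, "EndOffset": 44, "Score": 0.8661606311798096, "Type": "PROFESSION"},
]

print(mask_phi(sample_text, sample_entities))
# Patient is [NAME], a [AGE] year old [PROFESSION].
```

Raising min_score leaves lower-confidence detections unmasked, so for masking purposes a low threshold is usually the more cautious choice.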