The POLAR (POLyp Artificial Recognition) database is a colorectal polyp image database. It consists of a training dataset and a validation dataset, intended for use in the development of computer-assisted colorectal polyp detection, localization and/or classification systems.

Polar Database

Phone: 020-5669111

Description of the POLAR training dataset

The POLAR training dataset consists of prospectively collected endoscopic images of colorectal polyps annotated with corresponding histopathology. Images were collected within eight Dutch hospitals. To homogenize the data collection process, a standardized acquisition and anonymization protocol was used. In short, the endoscopist was requested to take at least two non-magnified Narrow Band Imaging (NBI) images of each detected lesion (using either an Olympus HQ190 or HQ180 colonoscope). These images were subsequently annotated with the corresponding histopathology and anonymized. Apart from very low sharpness and brightness, no image exclusion criteria were applied.

The POLAR training dataset consists of 2,637 annotated non-magnified NBI images, originating from 1,339 unique polyps (73% adenomas, 17% hyperplastic polyps, 10% sessile serrated lesions) detected during 555 different colonoscopies. Within each image in the dataset, the polyp is marked with a bounding box. The bounding boxes were placed by two non-medical experts who consulted a medical expert in difficult cases. Examples of annotated polyps with bounding boxes are shown in Figure 1.

Figure 1: Examples of annotated polyps with bounding box. The information of the bounding box is provided in the attached Excel file [y,x,w,h].
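To make the [y,x,w,h] convention concrete, the following minimal Python sketch converts one bounding box into corner coordinates that can be used to crop an image array. It rests on an assumption that this page does not confirm: that (y, x) is the top-left corner in pixels and (w, h) are the width and height in pixels. The names `Box` and `to_corners` are hypothetical helpers and the numbers are made up, so please check the result visually on a few images before relying on it.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Bounding box in the [y, x, w, h] order used by the Excel file."""
    y: int
    x: int
    w: int
    h: int


def to_corners(box: Box, image_height: int, image_width: int) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right), clipped to the image bounds.

    ASSUMPTION: (y, x) is the top-left corner in pixels and (w, h) are the
    box width and height in pixels. This is not stated in the dataset page.
    """
    top = max(0, box.y)
    left = max(0, box.x)
    bottom = min(image_height, box.y + box.h)
    right = min(image_width, box.x + box.w)
    return top, left, bottom, right


# Hypothetical values for illustration only, not taken from the dataset.
example = Box(y=100, x=200, w=300, h=150)
print(to_corners(example, image_height=1080, image_width=1920))
```

With an image loaded as a NumPy-style array, the crop would then be `image[top:bottom, left:right]`.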

Description of the POLAR validation dataset

The POLAR validation dataset consists of prospectively collected endoscopic images of colorectal polyps annotated with corresponding histopathology. All images were obtained during the clinical validation phase of the POLAR CAD system. Images were collected within seven Dutch hospitals and one Spanish hospital during screening colonoscopies within the Bowel Cancer Screening Program. Images were only collected from consecutive FIT-positive patients and only stored for research purposes if the patient provided informed consent. Exclusion criteria for participation in the clinical validation study included inflammatory bowel disease, Lynch syndrome and polyposis syndromes.

Participating endoscopists were all accredited for performing colonoscopies within a Bowel Cancer Screening Program. Per centre, up to three endoscopists were invited to participate in the study. Endoscopists were required to perform at least 10 study procedures. All procedures had to be performed with EVIS EXERA II or III video processors (Olympus, Tokyo, Japan) and with 190-series colonoscopes containing NBI (Olympus). For each polyp, location, size, morphology (Paris classification) and predicted histology (optical diagnosis) were recorded. Participants were advised to size their polyps next to a reference tool of known diameter (e.g., snare or biopsy forceps). The histology of each detected lesion was predicted (hyperplastic polyp, sessile serrated lesion, adenoma, carcinoma or other), together with a high or low confidence assessment.

During the clinical validation phase, the endoscopists were asked to capture up to three non-magnified NBI images of each detected lesion. For the POLAR validation dataset, we only included the images for which the POLAR CADx system was able to provide a diagnosis. The polyp in each image in the dataset was marked with a bounding box by the POLAR localization model.

The POLAR validation dataset consists of a total of 730 polyps from 251 patients, prospectively collected by 20 endoscopists from 8 hospitals (Table 1). More details regarding the data are summarized in this reference: https://clinicaltrials.gov/ct2/show/NCT03822390. Once the study has been published in a medical journal, please use that publication as the reference instead of the clinicaltrials.gov entry.

Table 1: Characteristics of the colorectal lesions in the clinical validation set, n (%)
Characteristic                  All polyps  Diminutive polyps  Small polyps  Larger polyps
                                            (1-5 mm)           (6-9 mm)      (≥10 mm)
                                (N=730)     (N=481)            (N=112)       (N=135)
Location
* Proximal to rectosigmoid      471 (65%)   342 (71%)          72 (64%)      57 (42%)
* Rectosigmoid                  258 (35%)   138 (29%)          40 (36%)      78 (58%)
* Missing                       1 (0%)      1 (0%)             0 (0%)        0 (0%)
Morphology, by Paris classification
* Polypoid (Ip, Isp, Is)        588 (81%)   384 (80%)          93 (83%)      111 (82%)
* Non-polypoid (IIa, IIb, IIc)  137 (19%)   94 (20%)           19 (17%)      22 (16%)
* Missing                       5 (0%)      3 (0%)             0 (0%)        2 (2%)
Histopathology
* Cancer                        11 (2%)     0 (0%)             0 (0%)        10 (7%)
* Adenoma                       473 (65%)   301 (63%)          84 (75%)      77 (57%)
* Sessile serrated lesion       63 (9%)     39 (9%)            10 (9%)       14 (10%)
* Traditional serrated adenoma  5 (0%)      0 (0%)             0 (0%)        5 (3%)
* Hyperplastic polyp            99 (14%)    81 (17%)           14 (13%)      4 (3%)
* Other                         15 (2%)     11 (2%)            3 (3%)        1 (0%)
* Normal mucosa                 26 (4%)     24 (5%)            0 (0%)        2 (1%)
* Missing                       32 (4%)     19 (4%)            1 (0%)        12 (9%)
Optical diagnosis
* Cancer                        12 (2%)     1 (0%)             0 (0%)        10 (7%)
* Adenoma                       510 (70%)   320 (67%)          85 (75%)      104 (78%)
* Sessile serrated lesion       107 (15%)   69 (14%)           20 (17%)      18 (14%)
* Hyperplastic polyp            91 (12%)    87 (19%)           4 (3%)        0 (0%)
* Other                         7 (1%)      3 (0%)             3 (3%)        1 (0%)
* Missing                       3 (0%)      1 (0%)             0 (0%)        2 (1%)
Confidence level
* High confidence               635 (87%)   409 (85%)          101 (90%)     123 (91%)
* Low confidence                81 (11%)    64 (13%)           8 (7%)        9 (7%)
* Missing                       14 (2%)     8 (2%)             3 (3%)        3 (2%)

Description of the folder structure

  • There are two main folders, the training dataset folder and the validation dataset folder:
    • Inside the training dataset folder there is a subfolder for each patient in the dataset, named with an anonymous patient file name. Each patient subfolder in turn contains a subfolder for each polyp, named with an anonymous polyp file name. Each polyp subfolder contains the original images of that polyp (per protocol, 2 NBI images per polyp).
    • Inside the validation dataset folder there is a subfolder for each patient in the dataset, named with an anonymous patient file name. Each patient subfolder contains the original polyp images of that patient (1-2 NBI images per polyp), named with an anonymous polyp file name.
  • There is one Excel file containing the additional metrics of each polyp:
    • Patient number (anonymized)
    • Polyp number of each patient (anonymized)
    • Polyp size assigned by the endoscopist
    • Polyp morphology by use of the Paris classification assigned by the endoscopist (Ip, Isp, Is, IIa, IIb, IIc, missing)
    • Polyp location assigned by the endoscopist (cecum, ascending colon, hepatic flexure, transverse colon, splenic flexure, descending colon, sigmoid colon, rectosigmoid, rectum, site not specified)
    • Optical diagnosis by the endoscopist (carcinoma, adenoma, sessile serrated lesion, hyperplastic polyp, other, missing)
    • Confidence level of the endoscopist (high, low, missing)
    • Pathological diagnosis (adenocarcinoma, tubular adenoma, tubulovillous adenoma, villous adenoma, traditional serrated adenoma, sessile serrated lesions, hyperplastic, other…, normal mucosa, no polyp received, missing)
    • Bounding box annotation of each polyp (y,x,w,h), performed by two non-medical experts who consulted a medical expert in difficult cases
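The folder layout described above can be walked with a few lines of Python. The sketch below is only an illustration under stated assumptions: the function names, the root folder paths passed in, and the set of image file extensions are all hypothetical, since this page does not specify folder names or the image file format.

```python
from pathlib import Path
from typing import Dict, List

# ASSUMPTION: the image file format is not stated, so common extensions are accepted.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def _images_in(folder: Path) -> List[Path]:
    """Return the image files directly inside a folder, sorted by name."""
    return sorted(
        f for f in folder.iterdir()
        if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
    )


def index_training_images(training_root: Path) -> Dict[str, Dict[str, List[Path]]]:
    """Training layout: patient folder -> polyp folder -> image files."""
    index: Dict[str, Dict[str, List[Path]]] = {}
    for patient_dir in sorted(p for p in training_root.iterdir() if p.is_dir()):
        index[patient_dir.name] = {
            polyp_dir.name: _images_in(polyp_dir)
            for polyp_dir in sorted(p for p in patient_dir.iterdir() if p.is_dir())
        }
    return index


def index_validation_images(validation_root: Path) -> Dict[str, List[Path]]:
    """Validation layout: patient folder -> image files (no polyp subfolders)."""
    return {
        patient_dir.name: _images_in(patient_dir)
        for patient_dir in sorted(p for p in validation_root.iterdir() if p.is_dir())
    }
```

A call such as `index_training_images(Path("path/to/training"))` (with a placeholder path) returns a nested dictionary keyed by the anonymous patient and polyp names, which can then be joined to the matching rows of the Excel file.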

Data usage rules

[1] The data is released under the licence: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), which means that it will be publicly available for non-commercial usage.

[2] The licensing of new creations must use the exact same terms as in the current version of the data set.

[3] No part of this dataset may be used in data science challenges/hackathons without permission from the dataset owners.

[4] Should you wish to use or refer to this data set, you must cite: https://clinicaltrials.gov/ct2/show/NCT03822390. Once the study has been published in a medical journal, please cite that publication instead.

[5] Any error in data/anonymisation/annotation should be reported immediately. It is the responsibility of the user to report such errors back to us; the authors/organisers bear no liability under any such circumstances.

[6] The accuracy, reliability and completeness of the annotations may depend on the subjective judgement of the annotators.

 

Access to the database 

To obtain access to this dataset, potential users should agree to the terms of use and then complete a form with personal information. The study office will assess the request before sending the dataset.

Form to request access to POLAR database