

Founded Year

1949



NAMUR is an international user association for automation technology and digitalization in process industries. It represents its members' interests in automation and digitalization technologies and promotes the use of automation and digitalization technologies that are cost-efficient, sustainable, and safe. It was founded in 1949 and is based in Leverkusen, Germany.

Headquarters Location

Leverkusen, 51368, Germany



Latest NAMUR News

Machine Learning Algorithm to Estimate Distant Breast Cancer Recurrence at the Population Level with Administrative Data

May 5, 2023

Hava Izci (1), Gilles Macq (2), Tim Tambuyzer (2), Harlinde De Schutter (2), Hans Wildiers (1,3), Francois P Duhoux (4), Evandro de Azambuja (5), Donatienne Taylor (6), Gracienne Staelens (7), Guy Orye (8), Zuzana Hlavata (9), Helga Hellemans (10), Carine De Rop (11), Patrick Neven (1,3), Freija Verdoodt (2)

(1) KU Leuven - University of Leuven, Department of Oncology, Leuven, B-3000, Belgium
(2) Belgian Cancer Registry, Research Department, Brussels, Belgium
(3) University Hospitals Leuven, Multidisciplinary Breast Center, Leuven, B-3000, Belgium
(4) Department of Medical Oncology, King Albert II Cancer Institute, Cliniques Universitaires Saint-Luc, Brussels, Belgium
(5) Institut Jules Bordet and l’Université Libre de Bruxelles (U.L.B), Brussels, Belgium
(6) CHU UCL Namur, Site Sainte-Elisabeth, Namur, Belgium
(7) Multidisciplinary Breast Center, General Hospital Groeninge, Kortrijk, Belgium
(8) Department of Obstetrics and Gynecology, Jessa Hospital, Hasselt, Belgium
(9) Department of Medical Oncology, CHR Mons-Hainaut, Mons, Hainaut, Belgium
(10) Department of Obstetrics and Gynaecology, AZ Delta, Roeselare, Belgium
(11) Department of Obstetrics and Gynaecology, Imelda Hospital, Bonheiden, Belgium

Correspondence: Hava Izci, KU Leuven, Department of Oncology, Herestraat 49 Box 7003-06, Leuven, 3000, Belgium, Email [email protected]

Purpose: High-quality population-based cancer recurrence data are scarce, mainly because registration is complex and costly. For the first time in Belgium, we developed a tool to estimate distant recurrence after a breast cancer diagnosis at the population level, based on real-world cancer registration and administrative data.

Methods: Data on distant cancer recurrence (including progression) in patients diagnosed with breast cancer between 2009 and 2014 were collected from medical files at 9 Belgian centers (i.e., the gold standard) to train, test and externally validate an algorithm.
Distant recurrence was defined as the occurrence of distant metastases from 120 days up to 10 years after the primary diagnosis, with follow-up until December 31, 2018. Data from the gold standard were linked to population-based data from the Belgian Cancer Registry (BCR) and to administrative data sources. Candidate features for detecting recurrences in administrative data were defined based on expert opinion from breast oncologists and subsequently selected using bootstrap aggregation. Based on the selected features, classification and regression tree (CART) analysis was performed to construct an algorithm that classifies patients as having a distant recurrence or not.

Results: A total of 2507 patients were included, of whom 216 had a distant recurrence in the clinical data set. In the training set, the algorithm showed a sensitivity of 79.5% (95% CI 68.8–87.8%), a positive predictive value (PPV) of 79.5% (95% CI 68.8–87.8%), and an accuracy of 96.7% (95% CI 95.4–97.7%). External validation yielded a sensitivity of 84.1% (95% CI 74.4–91.3%), a PPV of 84.1% (95% CI 74.4–91.3%), and an accuracy of 96.8% (95% CI 95.4–97.9%).

Conclusion: Our algorithm detected distant breast cancer recurrences with a good overall accuracy of 96.8%, as observed in the first multicentric external validation exercise.

Keywords: machine learning, breast cancer, distant metastases, recurrences, algorithm, administrative data

Introduction

Cancer recurrence is considered an important outcome metric for measuring the burden of the disease and the success of (neo)adjuvant therapies. Despite this, high-quality breast cancer recurrence rates remain unknown in most countries, including Belgium. To date, cancer recurrence is not systematically registered in most population-based cancer registries, because registering follow-up for recurrence is difficult and labor-intensive.
Recurrence definitions used for registration purposes differ among countries, owing to the lack of consensus on a standardized clinical definition. Defining recurrence clinically is challenging, since recurrence after (neo)adjuvant treatment can be detected by various methods, such as physical examination, pathological examination, imaging, or tumor markers. Unlike the clinical trial setting, where guidelines and definitions exist [1,2], no guidelines are in place to register a recurrence correctly and consistently in a patient with stage I–III breast cancer at diagnosis. Real-world recurrence data could provide estimates of the cancer burden and of the efficacy of treatment modalities outside a conventional clinical trial setting, which could eventually lead to improvements in quality of care [3,4].

Administrative data from health insurance companies on medical treatments and procedures (also known as billing claims) and hospital discharge data could be an alternative source for assessing disease evolution after breast cancer treatment. Recently, machine learning algorithms based on classification and regression trees (CART) have been developed to detect cancer recurrence at the population level using claims data [5]. However, research teams in only a limited number of countries have successfully constructed algorithms to detect breast cancer recurrence, and only for a small number of centers (USA [6,7], Canada [8,9], Denmark [10,11], and Sweden [12]). Our aim was to develop, test and validate an algorithm based on administrative data features that would allow breast cancer recurrence rates to be estimated for all Belgian patients with breast cancer.

Methods

Study Population

To construct and validate an algorithm to detect distant recurrences, we included female patients with breast cancer diagnosed between January 1, 2009 and December 31, 2014 from nine centers located in all three Belgian regions.
We did not include patients with stage IV breast cancer at diagnosis, patients with a history of cancer (any second primary cancer, multiple tumors, or contralateral tumors), or patients who could not be linked to administrative data sources. All breast cancers, regardless of molecular subtype, were included. The nine centers were located in the Flemish Region (University Hospitals Leuven, General Hospital Groeninge, Jessa Hospital, Imelda Hospital, and AZ Delta), the Brussels-Capital Region (Cliniques universitaires Saint-Luc and Institut Jules Bordet), and the Walloon Region (CHR Mons-Hainaut and CHU UCL Namur). From each of the nine centers, 300 patients were included by randomly selecting 50 patients from the study population per incidence year. The study population of six centers was randomly divided (60–40% split-sample validation) into a training set to develop the algorithm and an independent test set for internal validation [13]. The algorithm was additionally validated on an external validation set from the three remaining centers, to check its reproducibility in patients from other centers.

Definition of Distant Recurrence: Manual Chart Review

In selecting the nine centers, we aimed for a reasonable variety of center characteristics: teaching vs non-teaching hospital, spread across the three Belgian regions, and center size. For each patient in the study population, recurrence status (yes, no, unknown) and recurrence date (day, month, year) were extracted from electronic medical files and reviewed by trained data managers at each of the nine hospitals. Recurrence was defined as the occurrence of a distant recurrence or metastasis from 120 days after the primary diagnosis up to 10 years after diagnosis or the end of the study (December 31, 2018), whichever came first. Data managers were instructed to count death due to breast cancer as a recurrence.
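The outcome definition above can be written as a small labeling rule. The Python sketch below is a minimal illustration, not the authors' code, and rests on stated assumptions: an event on day 120 itself counts as inside the window, patients with an unknown recurrence status have already been removed, and the study's exclusion of recurrences within 120 days (treated as de novo stage IV) is also applied to breast-cancer deaths. The names `label_patient`, `add_years` and `MIN_DAYS` are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

MIN_DAYS = 120                     # events before this are treated as de novo stage IV
MAX_YEARS = 10                     # maximum follow-up window after diagnosis
END_OF_STUDY = date(2018, 12, 31)  # end of follow-up in the study


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def label_patient(diagnosis: date,
                  distant_recurrence: Optional[date] = None,
                  breast_cancer_death: Optional[date] = None) -> str:
    """Return 'recurrence', 'no recurrence' or 'excluded' for one patient.

    Distant recurrence (or progression) and death due to breast cancer both
    count as an event; the earliest of the two dates is used.
    """
    events = [d for d in (distant_recurrence, breast_cancer_death) if d is not None]
    if not events:
        return "no recurrence"
    first_event = min(events)
    if first_event < diagnosis + timedelta(days=MIN_DAYS):
        return "excluded"  # assumption: same 120-day rule for both event types
    # Follow-up ends 10 years after diagnosis or at the end of the study, whichever is first.
    window_end = min(add_years(diagnosis, MAX_YEARS), END_OF_STUDY)
    return "recurrence" if first_event <= window_end else "no recurrence"
```

An event after December 31, 2018 is labeled "no recurrence" because it falls outside observed follow-up, which is how a censored patient looks to the algorithm.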
Loco-regional recurrence was not considered an outcome in our study. Both patients with progression (without a disease-free interval) and patients with recurrence (with a disease-free interval) were counted as having the outcome. Patients with an unknown recurrence status, for example due to lack of follow-up, were excluded from the analysis. Patients with a recurrence within 120 days were considered to have de novo stage IV disease and were excluded, because overlapping first-line treatment complicates recurrence detection: starting detection at diagnosis might produce more false-positive recurrences, as treatment of the initial breast cancer would overlap with immediate first-line treatment for metastatic disease. The recurrence diagnosis date (day, month, and year) was the date on which recurrence was confirmed by pathological examination or imaging (CT, PET-CT, bone scintigraphy, or MRI), or established by physicians at a multidisciplinary team meeting (MDT).

Administrative Data Sources and Linkage

Through an extensive data-linkage process with pseudonymization of the patient data, the recurrence data from the hospitals (i.e., the gold standard) were linked to several population-based data sources. These included cancer registration data from the Belgian Cancer Registry (BCR) and administrative data sources: claims or reimbursement data (InterMutualistic Agency, IMA) [14], hospital discharge data (Technische Cel, TCT) [15], information on vital status (Crossroads Bank for Social Security, CBSS) [16], and cause of death (“Agentschap Zorg en Gezondheid”; “Observatoire de la Santé et du Social de Bruxelles-Capitale”; and “Agence pour une Vie de Qualité”, AVIQ) [17]. Information on the data sources and data used is presented in Appendix 1.

Pre-Processing and Feature Extraction

To build a robust algorithm for detecting distant recurrences, we pre-processed the data and extracted features.
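Linking the gold standard to the administrative sources amounts to joining records on a shared pseudonym and re-expressing every claim relative to the diagnosis date, which is the form the time-frame features need. The paper does not describe how pseudonyms were generated; this sketch assumes a keyed hash (HMAC-SHA256 from Python's standard library) applied with the same secret to every source. The key, the tuple layout of the claims and the function names are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict
from datetime import date

# Hypothetical secret; in practice it would be held by a trusted third party.
PSEUDONYM_KEY = b"example-key-held-by-trusted-third-party"


def pseudonymize(national_id: str) -> str:
    """Keyed hash, so one person gets the same pseudonym in every data source."""
    return hmac.new(PSEUDONYM_KEY, national_id.encode("utf-8"), hashlib.sha256).hexdigest()


def build_timelines(gold_standard: dict, claims) -> dict:
    """Link hospital records to claims and express each claim as days since diagnosis.

    gold_standard: dict mapping national_id -> diagnosis date (hospital records)
    claims: iterable of (national_id, claim_date, code_group) tuples
    Returns a dict mapping pseudonym -> sorted list of (days_since_diagnosis, code_group).
    """
    diagnosis = {pseudonymize(pid): dx for pid, dx in gold_standard.items()}
    timelines = defaultdict(list)
    for pid, claim_date, code_group in claims:
        key = pseudonymize(pid)
        if key in diagnosis:  # claims of people outside the gold standard are dropped
            timelines[key].append(((claim_date - diagnosis[key]).days, code_group))
    return {key: sorted(events) for key, events in timelines.items()}
```

Patients without any linked claim do not appear in the output, which loosely mirrors the exclusion of patients who could not be linked to administrative data.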
Expert-driven candidate features for detecting recurrences in administrative data were created based on recommendations from breast oncologists (P.N. and H.W.). First, a comprehensive list of reimbursement codes for diagnostic and therapeutic procedures and medications was compiled, and code groups were created according to their relevance for the diagnosis and/or treatment of distant metastasis in breast cancer patients (see Appendix 2). Candidate features were further refined by exploring data from patients with a recurrence, including time-frames starting at 0, 90, 160, 270, and 365 days after diagnosis. We assessed different time-frames to obtain the most accurate feature for detecting recurrences, and because starting at the date of diagnosis might introduce noise from treatment of the initial breast cancer. We also created count-based features from the maximum number of codes per year or per predefined time-frame (starting at 0, 90, 160, 270, or 365 days after diagnosis) (Table 1). For each feature, the best-performing time-frame was selected by maximizing Youden’s J index, J = sensitivity + specificity − 1 [18].

Table 1 List of Potential Markers for Recurrence (Available Within Administrative Data) Based on Recommendations from Breast Oncologists

Feature Selection and Model Development

The resulting feature list was then narrowed down using bootstrap aggregation, an ensemble method [19]. In total, 1000 bootstrap samples of the training set were used to generate 1000 classification and regression trees (CART), and the best-performing features were selected based on how often they occurred in these trees [19,20]. Cost-complexity pruning was applied to the tree from each bootstrap sample to obtain the best-performing model and avoid overfitting [20]. The CART models used entropy as the criterion for selecting nodes and features.
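Two of the steps described here can be sketched in a few lines of Python: choosing a feature's time-frame start by maximizing Youden's J, and counting how often each feature is picked across bootstrap samples when splits are chosen by entropy reduction. This is a simplified illustration, not the SAS procedure the authors ran: each bootstrap "tree" is reduced to a single split (a decision stump), cost-complexity pruning is omitted, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

START_DAYS = (0, 90, 160, 270, 365)  # time-frame start points listed in the text


def youden_j(labels, flags):
    """Youden's J = sensitivity + specificity - 1 for a binary flag."""
    tp = sum(1 for y, f in zip(labels, flags) if y and f)
    fn = sum(1 for y, f in zip(labels, flags) if y and not f)
    tn = sum(1 for y, f in zip(labels, flags) if not y and not f)
    fp = sum(1 for y, f in zip(labels, flags) if not y and f)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity + specificity - 1


def best_start_day(labels, event_days):
    """Pick the start day whose flag 'any code on or after day s' maximizes J.

    event_days holds, per patient, the days after diagnosis on which a code occurred.
    """
    def flags_from(start):
        return [any(day >= start for day in days) for days in event_days]
    return max(START_DAYS, key=lambda start: youden_j(labels, flags_from(start)))


def entropy(labels):
    """Binary Shannon entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(labels, flags):
    """Entropy reduction obtained by splitting the labels on a binary feature."""
    left = [y for y, f in zip(labels, flags) if f]
    right = [y for y, f in zip(labels, flags) if not f]
    children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - children


def bootstrap_feature_counts(rows, labels, n_boot=1000, seed=0):
    """Count how often each feature wins the root split across bootstrap samples.

    Simplification: each bootstrap 'tree' is one split (a decision stump),
    not a full pruned CART.
    """
    rng = random.Random(seed)
    names = list(rows[0])
    n = len(rows)
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        sample_labels = [labels[i] for i in idx]
        best = max(names, key=lambda k: information_gain(sample_labels, [rows[i][k] for i in idx]))
        counts[best] += 1
    return counts
```

In a toy dataset where one binary feature equals the label, that feature yields the largest possible entropy reduction and therefore wins every bootstrap round, which is the kind of frequency signal the authors report (for example, 975 of 1000 models).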
At each node, the split that most reduces entropy (i.e., that has the highest information gain) is chosen; the larger the reduction, the more informative the feature [20]. A 10-fold cross-validation was also performed to check the robustness of the model across different training subsets. Collinearity among the selected features was addressed with the one-standard-error (1-SE) rule, which eliminates redundant features. The 1-SE rule selects the least complex tree whose error is within one standard error of that of the best-performing tree [21]. Using the features selected by bootstrapping, a principal CART model was then built on the complete training set to classify patients as having a recurrence or not. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and classification accuracy were calculated to evaluate and compare the performance of the principal CART model. All models were created and trained in SAS 9.4 (SAS Institute, Cary, NC, USA) using SAS Enterprise Guide (version 7.15 for Windows).

Results

Data for a total of 2507 patients were retrieved from the nine Belgian centers and included in the final dataset to train, test and externally validate the algorithm (Figure 1 and Table 2). The mean follow-up period was 7.4 years. For split-sample validation, patients from six centers were divided into the training set (N = 975, including 78 distant recurrences, 8.0%) and the internal validation set (N = 713, including 56 distant recurrences, 7.9%). The external validation set consisted of 819 patients from three independent centers, of whom 82 had a distant recurrence (10.0%). The training, internal validation, and external validation sets did not differ in the distribution of baseline tumor and patient characteristics (Table 2).

Table 2 Baseline Patient and Tumor Characteristics

Figure 1 Patient inclusion flow diagram.
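The sampling and split design behind these set sizes (50 randomly selected patients per incidence year and center, a random 60/40 split of the six development centers, three held-out centers, and 10 folds for cross-validation) can be sketched as follows. This is a minimal illustration under assumptions: the split is taken to be at patient level, the center and patient identifiers are hypothetical, and the toy set sizes will not match the reported 975 and 713 patients.

```python
import random


def sample_center(patients_by_year, per_year=50, seed=0):
    """Randomly draw up to `per_year` patients per incidence year from one center."""
    rng = random.Random(seed)
    chosen = []
    for year in sorted(patients_by_year):
        pool = patients_by_year[year]
        chosen.extend(rng.sample(pool, min(per_year, len(pool))))
    return chosen


def split_sample(patients, train_fraction=0.6, seed=0):
    """Randomly split patients into a training set and an internal test set."""
    rng = random.Random(seed)
    shuffled = list(patients)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def k_folds(items, k=10, seed=0):
    """Partition items into k roughly equal, disjoint folds for cross-validation."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]
```

With six incidence years (2009 to 2014) and 50 patients per year, each center contributes 300 patients, so six development centers give 1800 patients before any exclusions; the external centers are never shuffled into the training data.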
Bootstrap aggregation produced 1000 CART models, from which the following features were selected: (1) “Presence of a follow-up MDT meeting, starting from 270 days after diagnosis” (present in 975 of the 1000 CART models), (2) “Maximum number of CT codes present (with a moving average over time) of 5 or more times a year” (851 CART models), and (3) “Death due to breast cancer” (412 CART models) (see Supplementary Figure 1). The final CART model was then constructed with these three features using all data of the training set (Figure 2).

Figure 2 Final CART model to detect recurrences based on the three features selected after bootstrapping. Nodes represent the features selected by the algorithm to classify patients.
Abbreviations: MDT, multidisciplinary team meeting; CT, computed tomography scan.

In the training set, the principal CART model detected recurrences with a sensitivity of 79.5% (95% confidence interval [CI] 68.8–87.8%), a specificity of 98.2% (95% CI 97.1–99.0%), an overall accuracy of 96.7% (95% CI 95.4–97.7%) (Table 3), and an area under the curve (AUC) of 94.2%. With 10-fold cross-validation within the training set, we found a sensitivity of 71.8% (95% CI 66.4–86.7%), a specificity of 98.2% (95% CI 96.3–98.5%), and an overall accuracy of 96.1% (95% CI 94.7–97.2%). Internal validation on the test set yielded a sensitivity of 83.9% (95% CI 71.7–92.4%), a specificity of 96.7% (95% CI 95.0–98.9%), and an accuracy of 95.7% (95% CI 93.9–97.0%). External validation on the three additional centers yielded a sensitivity of 84.1% (95% CI 74.4–91.3%), a specificity of 98.2% (95% CI 97.0–99.1%), and an accuracy of 96.8% (95% CI 95.4–97.9%).
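These percentages follow from a 2×2 confusion matrix. The counts below are back-calculated from the reported training-set figures (78 recurrences and 897 non-recurrences among 975 patients, sensitivity 79.5%, specificity 98.2%); they are an assumption, not numbers published in the paper. The paper also does not state how its 95% CIs were computed, so the exact Clopper-Pearson interval used in this sketch is an assumption, and its bounds need not match the reported ones.

```python
import math


def performance(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, NPV and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_cdf(x, n, p):
    """Binomial cumulative probability P(X <= x)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


def root_of_increasing(g, iterations=60):
    """Bisection root of a function that increases on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for x successes out of n."""
    # Lower bound solves P(X >= x | p) = alpha/2; this tail grows with p.
    lower = 0.0 if x == 0 else root_of_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2; this tail shrinks as p grows.
    upper = 1.0 if x == n else root_of_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


# Counts back-calculated from the reported training-set percentages (assumption).
TRAINING = {"tp": 62, "fn": 16, "fp": 16, "tn": 881}

if __name__ == "__main__":
    for name, value in performance(**TRAINING).items():
        print(f"{name}: {value:.1%}")
    print("sensitivity 95% CI:", clopper_pearson(62, 62 + 16))
```

With these counts, sensitivity, specificity, PPV and accuracy round to 79.5%, 98.2%, 79.5% and 96.7%, matching the training-set results. Because sensitivity is TP/(TP+FN) and PPV is TP/(TP+FP), the two coincide whenever FN equals FP, which explains the identical values in the abstract.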
Table 3 Performance of Training Set, Cross Validation, Internal Validation Set and External Validation Set

Discussion

Main Findings

In this study, we developed a machine learning algorithm to detect distant recurrence in patients with breast cancer, which achieved an accuracy of 96.8% on external validation in multiple centers across Belgium. The features in the final model were the presence of a follow-up MDT meeting, CT scans (5 or more per year), and death due to breast cancer.

Recurrence data are lacking in many population-based cancer registries because registration is costly and labor-intensive [3]. The true incidence of cancer recurrence across age groups and regions in Belgium should be known, to measure the burden of illness and ultimately improve quality of care. Current recurrence numbers are often extrapolated from clinical trials, which typically exclude older and frail patients. Older patients are more likely to be undertreated and to have recurrences [22,23], so recurrence numbers could be underestimated. The administrative data sources used in our algorithm cover virtually all residents of Belgium [14], which made it possible to obtain population-based recurrence data. The study was also multicentric: both the training model and the external validation were based on data from multiple centers. A relatively large population and a reliable gold standard are essential for developing and training a machine learning model in such studies; otherwise, conflicting recurrence and treatment data can prolong and complicate the feature selection process.

In the medical files, distant recurrence was defined as the occurrence of a distant recurrence or metastases after a period of 120 days. This time-frame before recurrence detection varied among previous studies [24–27]; the most common exclusion windows were 120 days (Chubak et al 2012) and 180 days after diagnosis (A’mar et al 2020). Disease progression can be difficult to measure accurately and can be overestimated because therapeutic procedures may be delayed. A limitation of our study was that we could not distinguish disease progression from disease recurrence. Defining recurrence in the clinic is challenging, which makes it even harder to define a proxy for recurrence based on administrative data [28]. A clear definition of the treatment window and of the time-frame for recurrence detection is therefore important for future studies. We restricted our definition to distant recurrences to keep feature selection straightforward. We included death due to breast cancer as an outcome in our definition of recurrence. Cause-specific death data from an accurate source are of utmost importance when studying recurrences, since recurrence and death are closely related [29].

The machine learning algorithm used in this study was a decision tree, i.e., the Classification and Regression Tree (CART), combined with an ensemble method. Ensemble learning combines multiple decision trees either sequentially (boosting) or in parallel (bootstrap aggregation). The key advantages of bootstrap aggregation over a single decision tree are better predictive accuracy and, above all, lower variance. Recent studies likewise increasingly use ensemble methods [7,9,12]. Among the recurrence detection features selected by bootstrapping in the cohort of six Belgian centers, no treatment features were selected, which could indicate that diagnostic regimens are more similar across centers than treatment regimens. During feature pre-processing, we performed additional checks to improve the accuracy of the model.
For instance, we generated a treatment feature that included only metastasis-specific chemotherapy agent codes; however, this feature was not retained in the final model. We also tried a model without diagnostic features, but this did not improve accuracy. Previous studies mostly used metastatic diagnosis codes (secondary malignant neoplasm, SMN, codes from ICD-9 or ICD-10) in their algorithms, which is useful only if these codes are highly reliable. We also examined subgroups, testing different models for patients younger or older than 70 years and for different incidence years, and applied the algorithm to age and incidence-year subgroups to check whether its accuracy was better in specific subgroups. As expected, performance was higher in younger patients (Supplementary Table 1). The performance of our algorithm was comparable to that of previous studies using decision trees [9,12,24,30–32], and its accuracy was higher than the pooled accuracy of previous algorithms [5]. Although earlier studies often sought the algorithm with the highest overall accuracy, some also provided multiple algorithms to choose from depending on the user's preference, e.g., high-sensitivity or high-specificity algorithms [6,10,24,26,30]. Finally, we investigated the false-negative cases from University Hospitals Leuven to explain why they were misclassified. Most false-negative patients were missed because no claims were attested for their procedures or management. These were most likely patients for whom treatment was withheld because of comorbid disease, older age, or the prognosis of the recurrence, or whose treatment was reimbursed by the sponsor of a clinical trial.

Algorithms based on administrative claims data to detect breast cancer recurrence at the population level have been established previously [5,7–10,12]. For example, research groups from the USA, Canada, and Sweden have built algorithms to detect recurrences in a delimited region within a population. Recent results from these groups have shown that machine learning algorithms based on administrative data can detect recurrences in the absence of systematic registration. These studies, however, encompassed only a few centers and were thus not validated in a larger, population-level cohort. Moreover, most of these algorithms included complete metastasis-specific International Classification of Diseases (ICD) codes to detect recurrences. Because metastasis-specific codes are incomplete in our database, we could not use them in our algorithm. In particular, the Danish registry has actively collected recurrence information in the Danish Breast Cancer Group (DBCG) clinical database, which was used to construct and validate population-based recurrence algorithms to complete the recurrence database [10,11]. They were additionally able to study long-term recurrences beyond 10 years after the incidence date [4,33].

The objective of this study was to develop an algorithm that could be used nationwide to estimate distant recurrences across the population. Compared with other studies, we used a large sample size and reported both internal and external validation, which earlier studies rarely reported [5]. Another strength of our study is that, unlike many US studies using Medicare claims [34–38], we included all eligible patients with a breast cancer diagnosis, not just patients older than 65 years. Although we used several diagnosis and treatment code sources, treatment regimens change over time, so the features will need to be adapted for later use.
Adapting the algorithm to changes in diagnostic or treatment regimens might be necessary to obtain accurate recurrence rates for future incidence years. Ideally, long-term follow-up and claims data would also be available to detect long-term recurrences. However, because of regulations and the large volume of data generated, longer follow-up of the codes was not possible in the current study. Longer follow-up of recurrences and administrative data would likely improve accuracy and lead to a more robust algorithm.

In conclusion, our machine learning algorithm to detect metastatic breast cancer recurrences performed with high accuracy after external validation. In Belgium, claims data on medical procedures and medications, hospital discharge data, and vital status and cause-of-death data are available for the whole population, which allows such models to be developed. This substantiates the feasibility of developing and validating recurrence algorithms at the population level and might encourage other population-based registries to develop recurrence models or actively register recurrences, which are becoming increasingly important. The resulting rates offer insight into recurrences outside the clinical trial setting and may highlight the importance of actively registering recurrences.
Abbreviations

AUC, Area under the curve; ATC, Anatomical Therapeutic Chemical classification; AVIQ, “Agence pour une Vie de Qualité”; BCR, Belgian Cancer Registry; CA15-3, Cancer antigen 15-3; CART, Classification and regression tree; CBSS, Crossroads Bank for Social Security; CT, Computed tomography; FN, False negatives; FP, False positives; ICD, International Classification of Diseases and Related Health Problems; IMA, InterMutualistic Agency; MDT, Multidisciplinary team meeting; MRI, Magnetic Resonance Imaging; MZG, “Minimale Ziekenhuis Gegevens”; NPV, Negative predictive value; PPV, Positive predictive value; PET-CT, Positron emission tomography – computed tomography; SE, Standard error; SMN, Secondary malignant neoplasm; TN, True negatives; TP, True positives.

Data Sharing Statement

The data that support the findings of this study are available upon reasonable request. The data can be made available within the secure environment of the Belgian Cancer Registry, in accordance with its regulations, and only upon approval by the Information Security Committee.

Ethics Approval and Consent to Participate

This retrospective chart review study involving human participants was conducted in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This study was approved by the Ethics Committee of University Hospitals Leuven (S60928). Informed consent for the use of data was obtained from all participants.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

NAMUR Frequently Asked Questions (FAQ)

  • When was NAMUR founded?

    NAMUR was founded in 1949.

  • Where is NAMUR's headquarters?

    NAMUR's headquarters is located in Leverkusen, Germany.

  • Who are NAMUR's competitors?

    Competitors of NAMUR include All for One Group and 4 more.

Compare NAMUR to Competitors

OneVision

OneVision is an international manufacturing software solutions provider for the printing, publishing, and media industries. It provides services such as helpdesk, consulting, software demos, training, and more. It was founded in 1993 and is based in Regensburg, Germany.


SchuttersMGZ is a printing and communication solutions company that specializes in content management, digital printing, offset printing, and printing services. It was founded in 1952 and is based in Hasselt, Belgium.


TUV SUD provides testing and certification services. The company offers services such as auditing and system certification, global market access, risk management, technical advisory, and more. It was founded in 1866 and is based in Munich, Germany.

BSI

BSI (British Standards Institution) provides multinational business services. It helps clients drive performance, manage risk, and grow sustainably through the adoption of international management systems standards. BSI's influence spans multiple sectors with a particular focus on aerospace, automotive, built environment, food, healthcare, and information technology (IT). It was founded in 1901 and is based in London, United Kingdom.

Life Science Nord

Life Science Nord is the regional industry network for medical technology, biotechnology, and pharma. It offers communicating knowledge, productive knowledge, consulting, exploiting synergies, marketing knowledge, communication knowledge, and more. It was founded in 2004 and is based in Hamburg, Germany.


Crossbase offers a software solution for cross-media communication from a single database. The applications range from product data maintenance with connection to the ERP system, media asset management and translation to the automation of print publications and the output of websites, online catalogs with shop functions, tablet applications, and electronic catalogs for e-business. The company's solutions are used in mechanical and electrical engineering; building elements and materials; medical technology; consumer goods; and eCommerce. It was founded in 2001 and is based in Boblingen, Germany.
