Leading applicant:
 
					Fondazione IRCCS "Istituto Nazionale Tumori", Analytic Epidemiology and Health Impact Unit (INT)
					Contacts:
 
					Roberto Lillini (roberto.lillini@istitutotumori.mi.it)
					Synthesis
					The protocol was created for cancer registry (CR) data collection and covers the following items:
					
						- File with breast cancer cases and geographic data: primary invasive female breast cancer cases (ICD9 174*, ICD10 C50*), selected from cancer registry data over a specific ten-year period (e.g. 2001 to 2010), are included in the project. Data collection is mandatory for cases with age at diagnosis below 50 years and optional for all other ages. Synchronous and metachronous breast cancer cases must be counted once. Cancer registration criteria must follow the European Network of Cancer Registries (ENCR) rules. Residence addresses at diagnosis, retrieved from the National or local Security system or from the personal data reference of each registry, will be collected. Data can be collected in two different modalities
- Population files: for every CR, WASABY needs the reference population at the same geographic level at which that CR intends to study the incident cases. More specifically, the population files must contain the female population data by 5-year age group, calendar year within the study period, and SU (sub-areas, i.e. the smallest geographical areas for which the required data are available; these may differ across countries)
- Socioeconomic status (SES) and other confounders (Deprivation index): since this study includes different European countries, the measurement of socioeconomic deprivation must be comparable, or at least transferable, between countries despite their socio-cultural differences. The European Deprivation Index (EDI) is designed to measure the social environment in a comparable manner across countries, despite differences in the census variables available, and to incorporate the social and cultural specificities of each country concerned. The ecological deprivation indices are built according to shared methodological principles, by selecting fundamental needs associated with both objective and subjective poverty, and they rest on the same theoretical concept of relative deprivation, using a European survey dedicated to relative deprivation (EU-SILC) that is regularly conducted on national samples from all European countries. The method for constructing national versions of the EDI is described in several papers [Pornet C, JECH 2012; Guillaume E, JECH, 2016], and national versions of the EDI are already available for 5 European countries (Italy, Portugal, Spain, England and France). This index is based on two elements:
						
							- The European survey on deprivation: the European Union Statistics on Income and Living Conditions (EU-SILC) is a cross-sectional and longitudinal sample survey providing data on income, poverty, social exclusion and living conditions in the European Union. From these data, the statistical office of the European Union (Eurostat, http://ec.europa.eu/eurostat/web/main) produces a European standardized questionnaire that is specifically designed to study deprivation. It consists of nine questions, common to European Union members, evaluating needs that directly or indirectly induce financial inability. For each European Union member, the sum of weights for the sample design and the response rate to a national questionnaire were tailored on the basis of the national population size. All analyses were weighted for non-response and adjusted for sample design, to ensure the representativeness of the results for each member
- The ecological data of the national population censuses. Ecological data came from the last exhaustive national population censuses, which were conducted in 2001 for Italy (Italian National Institute of Statistics: ISTAT), Portugal (National Institute of Statistics: INE), Spain (National Institute of Statistics: INE) and England (Office for National Statistics: ONS), and, in 1999, for France (National Institute for Statistics and Economic Studies: INSEE). To minimize the unavoidable ecological bias as much as possible, the smallest area for which census data were available was identified
 Also in this case, the emphasis is on the number of participating countries rather than on the number of participating CRs. This methodology was initially available for 5 countries; its replication in other participating countries is being considered, whereas the collection of national deprivation indexes is envisaged for areas where the EDI cannot be computed.
- Other confounders: individual factors, e.g. ethnicity, family history, age, reproductive factors, alcohol intake, weight, physical activity, hormone therapy and oral contraceptives, have been found to influence the risk of breast cancer. Adherence to organized screening programmes in areas covered by cancer registries leads to an increase in incidence in those areas [Pacelli, Eur J Public Health, 2014]; such information, however, is not available at the individual level. Where possible, information on adherence to organized cancer screening is to be collected at SU level. If data are collected only for patients under 50, screening adherence is not required
- Methods
							
								- Identification of risk areas across European countries: for spatial representation we use QGIS, an open source GIS software that allows maps with many layers to be created using different map projections, and ArcGIS Desktop 10.0 to improve geographical and spatial analysis where needed
- Spatial analysis: when large spatial units are used, the heterogeneity of exposure and differences in population characteristics can be missed. On the other hand, in small spatial units the number of cancer cases is usually low, and analysing the observed spatial pattern proves inefficient because the population base from which these cases arise is often very small too. This can lead to unstable and misleading estimates of the true value. Modern approaches to relative risk estimation often rely on smoothing methods. The basic idea of mapping smoothed estimates is to borrow information from neighbouring regions to produce a more stable and less noisy estimate for each geographical area, and thus to separate the spatial pattern from the noise [Waller LA, Applied Spatial Statistics for Public Health Data, Wiley, NJ, 2004]. Taking these considerations into account, we apply different statistical methods according to the available data, as follows:
								
									- Estimation of census-block-level breast cancer incidence risk using Generalised Additive Models (GAMs). This is a form of non-parametric or semi-parametric regression that makes it possible to analyse contextual data while adjusting for covariates and taking spatial autocorrelation into account [Wood SN, Chapman and Hall, USA 2006]. The model handles both the spatial dependence of the data and the variability in incidence rates caused by the small number of events per geographic unit, by using a locally weighted regression smoother that treats geographic location as a possible predictor of the incidence rate [Webster T, Env Health Persp, 2008]. With this model it is possible to estimate the relative risk while adjusting for covariates
- Estimation of census-block relative breast cancer risk using the Besag, York and Mollié (BYM) model [Besag J, Ann Inst Stat Math, 1991]. This method assumes the existence of two sources of extra variation, one spatial and one non-spatial. The BYM model can be specified as a generalised linear mixed model (GLMM) with a Poisson response variable, with the expected cases entering as an offset. The non-spatial random effect, also called heterogeneity, is usually assumed to be normally distributed with zero mean and constant variance. For the random effect that captures spatial variability, a conditional autoregressive (CAR) model [Clayton DG, Int J Epidem, 1993] is used. The BYM model yields smoothed estimates in each sub-area and also allows the effects of possible explanatory variables, such as the deprivation index, to be estimated
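
As an aid to reading the BYM description above, the model is commonly written as follows. The notation is an assumption introduced here for illustration and is not taken from the protocol: $O_i$ and $E_i$ are the observed and expected cases in sub-area $i$, $\theta_i$ is the relative risk, $x_i$ is an area-level covariate such as the deprivation index, $v_i$ is the non-spatial (heterogeneity) effect, $u_i$ is the spatial effect with an intrinsic CAR prior, $\partial i$ is the set of neighbours of area $i$ and $n_i$ is their number.

```latex
\begin{align*}
O_i \mid \theta_i &\sim \mathrm{Poisson}(E_i\,\theta_i), \\
\log \theta_i &= \alpha + \beta x_i + u_i + v_i, \\
v_i &\sim \mathcal{N}(0,\ \sigma_v^2), \\
u_i \mid u_{j \neq i} &\sim \mathcal{N}\!\left(\frac{1}{n_i}\sum_{j \in \partial i} u_j,\ \frac{\sigma_u^2}{n_i}\right).
\end{align*}
```

On the log scale the Poisson mean is $\log E_i + \log \theta_i$, which is why $\log E_i$ acts as the offset. Prior choices for $\alpha$, $\beta$, $\sigma_u^2$ and $\sigma_v^2$ are left open here, since the protocol does not state them.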
 
 Open source software, such as R and WinBUGS, is used for data manipulation and statistical analyses.
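
The expected cases used as an offset can be derived from the case and population files by indirect standardization. The following is a minimal sketch in Python, not the project's own code (the protocol names R and WinBUGS). It assumes that the reference age-specific rates are pooled over all sub-areas in the data. The function names, the `(SU, age group)` key layout and the toy figures are hypothetical.

```python
from collections import defaultdict


def reference_rates(population, cases):
    """Pooled age-specific rates: total cases / total population per age group."""
    pop_by_age = defaultdict(float)
    cases_by_age = defaultdict(float)
    for (_su, age), n in population.items():
        pop_by_age[age] += n
    for (_su, age), c in cases.items():
        cases_by_age[age] += c
    return {age: cases_by_age[age] / n for age, n in pop_by_age.items() if n > 0}


def expected_cases(population, rates):
    """Expected cases per SU: sum over age groups of population * reference rate."""
    expected = defaultdict(float)
    for (su, age), n in population.items():
        expected[su] += n * rates.get(age, 0.0)
    return dict(expected)


def observed_cases(cases):
    """Observed cases per SU, summed over age groups."""
    observed = defaultdict(int)
    for (su, _age), c in cases.items():
        observed[su] += c
    return dict(observed)


def standardized_incidence_ratio(observed, expected):
    """Crude (unsmoothed) SIR = observed / expected for each SU."""
    return {su: observed.get(su, 0) / e for su, e in expected.items() if e > 0}


# Hypothetical toy data keyed by (SU, 5-year age group); not real registry figures.
population = {
    ("SU1", "40-44"): 1000, ("SU1", "45-49"): 1000,
    ("SU2", "40-44"): 3000, ("SU2", "45-49"): 1000,
}
cases = {
    ("SU1", "40-44"): 2, ("SU1", "45-49"): 3,
    ("SU2", "40-44"): 2, ("SU2", "45-49"): 1,
}

rates = reference_rates(population, cases)
expected = expected_cases(population, rates)
observed = observed_cases(cases)
ratios = standardized_incidence_ratio(observed, expected)

for su in sorted(ratios):
    print(su, observed.get(su, 0), round(expected[su], 2), round(ratios[su], 2))
```

With internal reference rates the expected cases sum to the observed total. The crude ratios printed here are the unstable small-area estimates that the smoothing models are meant to stabilise.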