This can be used to query the Administrative Regions dataset; municipality.name: name of the municipality. However, this information is summarized in the Customer site dataset, where for each grid square the number of customer sites is recorded along with information about the power line they are connected to. This dataset provides information about the telecommunication activity over the city of Milano. During the same connection, a CDR is generated if the connection lasts for more than 15 min or the user transferred more than 5 MB. ISSN 2052-4463 (online). As you can see, the data was supplied in batch mode, using downloadable compressed files, or through an API where this kind of access is meaningful. API data access allows a specific audience to use data more quickly, easily and efficiently when they are looking to do something specific with the information. Each sensor has a unique ID, a type and a location. The dataset supplies information regarding the current flowing through the distribution lines and details about how the distribution lines are spread over the Trentino territory. In the energy layer the red color represents the sum of consumed electricity. Similarly to the physical network where people and goods move, the virtual network determines how information and knowledge move. DOI: https://doi.org/10.1038/sdata.2015.55. The city has a population of about 1.3 million. Use of any data must be accompanied by a hyperlink reading "from BigDataChallenge contest" and linking to either the ODI node Trento section homepage or the page referring to the information in question.
This dataset provides information regarding the level of interaction between the Province of Trento and the Italian provinces. The level of interaction between an area A of the Province of Trento and a province B is given as a pair of decimal numbers. Each news item is linked to the geographical location where the event happened. From the RBS it is possible to obtain an indication of the user's geographical location, thanks to the coverage map Cmap, which associates each RBS with the portion of territory it serves (its coverage area; see the figure). The goal of this challenge was to come up with technological ideas related to big data. CDRs log the user activity for billing purposes and network management. The spatial aggregation values are provided for the squares of the Trentino GRID. The temporal values are aggregated in timeslots of ten minutes. Different sensors can share the same location. Journal of The Royal Society Interface 11, 20130789 (2014). SET Distribuzione SPA manages almost all the electrical network over the Trentino territory. Different types of software and tools were used in the dataset generation process, and it would have been too complicated to share and explain all the source code used. The data provides information about Telecom Italia's customers interacting with the network and about other people using it while roaming. The Census dataset represents an interesting source of information that can be linked to the data described in this paper to, for example, understand and predict the socio-economic well-being of a given territorial area. The released activity value at time t follows a rule in which k is a constant defined by Telecom Italia, which hides the true number of calls, SMS and connections. & Ratti, C.
Towards a comparative science of cities: using mobile traffic records in New York, London, and Hong Kong. It is a value between 0 and 3; Coverage: percentage value of the quadrant covered by the precipitation; Type: type of the precipitation. For this reason, we shared a simpler version of the code, to help readers understand part of the process explained in the Methods section. Defined as type 0; Slight: precipitation quantity in [0,2] mm/h. The possible values are: 90: address (e.g., Via del Brennero, 52). Gianni Barlacchi and Marco De Nadai: These authors contributed equally to this work. The software is written in Python 2.7 and can be found at [Data citation 1]. Sensor ID: identification string of the sensor; Sensor street name: the street name where the sensor identified by the Sensor ID is located; Sensor lat: the geographical latitude specifying the position of the sensor identified by the Sensor ID; Sensor long: the geographical longitude specifying the position of the sensor identified by the Sensor ID; Sensor type: the type of the sensor identified by the Sensor ID; UOM: the unit of measurement of the value recorded by the sensor identified by the Sensor ID. date: publication date, formatted according to ISO 8601; timestamp: Unix timestamp generated from the publication date; municipality.acheneID: Dandelion achene for the municipality. Estimating poverty maps using aggregated mobile communication networks. Line id: identification string of the distribution power line; Timestamp: timestamp of the instant when the current passing through the power line was measured.
This dataset contains data derived from an analysis of geolocalized tweets originating from Milan during the months of November and December. Each row corresponds to a tweet. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. The shared datasets were created by combining all this anonymous information, with a temporal aggregation in time slots of ten minutes. This dataset provides information about the current administrative regions in Europe. This dataset contains measurements of temperature, precipitation and wind speed/direction taken at 36 weather stations. Square id: identification string of a given square of the Trentino GRID; Line id: identification string of the distribution power line, which is grouped with the Trentino GRID square; Number of customer sites: number of customer sites present in a given square of the Trentino GRID, connected to the grid power line (Line id). Defined as type 2; Heavy: precipitation quantity in [10,100] mm/h. Defined as type 3. The precipitation type is characterized as Absent (type: 0), Rain (type: 1) and Snow (type: 2). This dataset contains all the articles published on the website trentotoday.it from 01/11/2013 to 31/12/2013. The values are not spatially aggregated. The temporal aggregation values are discrete. Barlacchi, G., De Nadai, M., Larcher, R. et al.
This dataset is temporally aggregated every 10 min and spatially aggregated in four quadrants of equal size of 11.75 × 11.75 km, corresponding to 50 squares of the grid used for the aggregation. Wesolowski, A. et al. This dataset provides information about precipitation intensity and type over the city of Milan. Similarly, this happened for New Year's Eve in all areas of Milan and Trentino. This is important as it allows us (and in turn the community) to benefit from reuse of our work, and so allows us to continue to provide this service. For privacy reasons this information is hidden, meaning that in the dataset the energy flowing is uniformly distributed among the various types of customers. Cartography and Geographic Information Science 41, 260–271 (2014). Telecom Italia, a telecom company in Italy, organized a Big Data Challenge back in 2014. Strength: value representing the directional interaction strength between Square id1 and Square id2. For example, Louail et al. (ref. 10) designed various indexes to quantitatively define the typology of cities and their spatial structure, such as: the number of hotspots, which scales with the population following a power law; and the hotspots' relative importance, which evolves during the day. Timestamp: timestamp value with the following format: YYYYMMDDHHmm; Square id: id of a given square of Milan/Trentino GRID; Intensity: intensity value of the precipitation.
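The Timestamp, Square id and Intensity fields listed above can be decoded with a few lines of Python. The following is only a minimal sketch: the tab separator, the column order and the `PrecipitationRecord` / `parse_record` names are assumptions made for illustration, not part of the published dataset description. The 0 to 3 range check follows the intensity range stated in the field description.

```python
from datetime import datetime
from typing import NamedTuple


class PrecipitationRecord(NamedTuple):
    timestamp: datetime
    square_id: str
    intensity: int


def parse_record(line: str, sep: str = "\t") -> PrecipitationRecord:
    """Parse one row; the separator and column order are assumptions."""
    raw_time, square_id, raw_intensity = line.rstrip("\n").split(sep)[:3]
    # The Timestamp field uses the YYYYMMDDHHmm layout.
    timestamp = datetime.strptime(raw_time, "%Y%m%d%H%M")
    intensity = int(raw_intensity)
    # Intensity is described as a value between 0 and 3.
    if not 0 <= intensity <= 3:
        raise ValueError(f"intensity out of range 0-3: {intensity}")
    return PrecipitationRecord(timestamp, square_id, intensity)


if __name__ == "__main__":
    record = parse_record("201311011230\t3\t2")
    print(record.timestamp.isoformat(), record.square_id, record.intensity)
```

Keeping the square id as a string avoids assuming anything about how the grid identifiers are encoded.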
Its main role was to provide an affordable way to access all the data related to the challenge, and Dandelion is the original platform where all of this data was published. It's not the first time that large datasets have been made available to the public through controlled access: the public datasets published on Amazon S3 are one example. But it's the first time that there has been an official Open Data release starting from Big Data sets, which we know is a hot topic. Using your account on dandelion.eu to access the data lets us collect useful insights on the real demand side of the Open Data value chain. We'll publish these usage statistics as Open Data, to make the whole community more aware of the data value chain. It's also useful for giving a realistic view of the Smart Cities and Smart Communities visions. The dataset describes the type and intensity of various meteorological phenomena in the city of Milan using sensors located within the city limits. The dataset describes precipitation intensity and type over the city of Milan. The dataset describes the pollution type and intensity of Milan city using various types of sensors located within the city limits. The contest made available to developers, designers and scientists a large dataset of 30+ kinds of data (mobile, weather, energy, etc.). For this reason we provide some useful examples in [Data citation 1] which display this information. The technical quality validation of the datasets is limited due to the absence of similar datasets to compare our results with. In this paper we described the richest open multi-source dataset ever released on two geographical areas. There is also code to generate the box-plots in this paper. Box-plots showing the calls, SMS, and Internet CDRs distributions per weekday and per cell in Milan. Noulas, A., Mascolo, C. & Enrique, F.
Exploiting Foursquare and cellular data to infer user activity in urban environments. In the 2014 edition they provided data on two Italian areas: the city of Milan and the Province of Trentino. Cross-checking different sources of mobility information. Consequently, researchers can study cities through the lens of hotspots' stability; the spatial structure of hotspots and their aforementioned categories can be studied to determine the typology of a city (e.g., mono-centric cities). Data for development: the D4D challenge on mobile phone data. This dataset provides information about the telecommunication activity over the city of Milano. The dataset is the result of a computation over the Call Detail Records (CDRs) generated by the Telecom Italia cellular network over the city of Milano. The last set contains all the information about civic numbers and maps used in the census of 2011. For this reason, it is possible to more restrictively define hotspots using the Loubar threshold introduced in ref. designed the dataset and wrote the paper. Square id1: identification string of the square of Milan/Trentino GRID that represents the origin of the interaction; Square id2: identification string of the square of Milan or Trentino GRID that represents the destination of the interaction; Directional Inter. As depicted in the mobile phone usage plot (see Fig. From mobile phone data to the spatial structure of cities. Time Interval: Start interval time expressed in milliseconds. Proceedings of ICMI, 427–434 (2014).
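Since the Time Interval field holds the start of each interval in milliseconds, a common first step is to convert it to a date and, where the ten-minute slots are too fine, to sum activity per hour. The sketch below assumes the value is a Unix epoch timestamp interpreted in UTC and that each row is a `(square_id, time_interval_ms, activity)` tuple; both are assumptions for illustration, as are the function names.

```python
from collections import defaultdict
from datetime import datetime, timezone


def interval_start(ms: int) -> datetime:
    """Convert a millisecond epoch value (assumed UTC) to a datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def hourly_totals(rows):
    """Sum activity per (square_id, hour).

    rows: iterable of (square_id, time_interval_ms, activity) tuples;
    this row layout is hypothetical.
    """
    totals = defaultdict(float)
    for square_id, ms, activity in rows:
        # Truncate the interval start to the beginning of its hour.
        hour = interval_start(ms).replace(minute=0, second=0, microsecond=0)
        totals[(square_id, hour)] += activity
    return dict(totals)


if __name__ == "__main__":
    sample = [
        ("1", 1383264000000, 0.5),   # 2013-11-01 00:00 UTC
        ("1", 1383264600000, 0.25),  # 2013-11-01 00:10 UTC
        ("1", 1383267600000, 1.0),   # 2013-11-01 01:00 UTC
    ]
    for (square_id, hour), total in sorted(hourly_totals(sample).items()):
        print(square_id, hour.isoformat(), total)
```

If the timestamps are meant to be read in local Italian time instead, the hour boundaries shift by the UTC offset, so the time zone is worth checking against known events before aggregating.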
Two types of CDR datasets were also produced to measure the interaction intensity between different locations: one from a particular area (Trentino/Milan) to any of the Italian provinces and one quantifying the interactions within the city/province (e.g., Milan to Milan). Tizzoni, M. et al. For instance, given the article http://www.milanotoday.it/eventi/concerti/eventi-capodanno-2014-milano.html, text: Tutti invitati al gran concerto di Capodanno in piazza [], title: Concerto Capodanno in piazza Duomo:, url: http://www.milanotoday.it/eventi/concerti/eventi-capodanno-2014-milano.html. CDRs log the user activity for billing purposes and network management. The spatial aggregation values are provided for the squares of the Milano GRID. The temporal values are aggregated in timeslots of ten minutes. This dataset contains data derived from an analysis of geolocalized tweets originating from the Province of Trento during the months of November and December. Each row corresponds to a tweet. This dataset provides information regarding the directional interaction strength between different areas of the city of Milan, based on the calls exchanged between Telecom Italia Mobile users. It covers an area of more than 6,000 km², with a total population of about 0.5 million. Weekly spatial behaviour of the six selected areas in Milan and Trentino. The almost universal adoption of mobile phones and the exponential increase in the use of Internet services are generating an enormous amount of data that can be used to provide new fundamental and quantitative insights on socio-technical systems. Scientific Reports 4 (2014). The lack of open datasets limits the number of potential studies and creates issues in the process of validation and reproducibility needed by the scientific community. Bajardi, P., Delfino, M., Panisson, A., Petri, G.
& Tizzoni, M. Unveiling patterns of international communities in a global city using mobile phone data. PLoS ONE 9, 6 (2014). publicly available is the dataset published by Telecom Italia in 2014 as "the Big Data Challenge" [5]. The calls are received in the nation identified by the Country code; Internet traffic activity: number of CDRs generated inside a given Square id during a given Time interval. Schläpfer, M. et al. Bruno Lepri. The Milano Grid is provided in GeoJSON format. name: the name of the administrative region; parentAchenes: a composite object storing the achene IDs of all the administrative regions in which the current entity is placed; localCode: official government code, based on the country the administrative region belongs to (for Italy: ISTAT); cadastralCode: official cadastral code, where available; postCodes: list of post codes in the area; population: data about the population of the administrative region; isProvinceCheflieu: (only for level=50) whether the province is a cheflieu or not; isMountainMunicipality: (only for level=60) whether the administrative region is mountainous or not. Since Telecom Italia only possesses the data of its own customers, the computed interactions are only between them. Each record (or feature) describes a square, providing the following information. This dataset provides information about the telecommunication activity over the Province of Trento. The dataset is the result of a computation over the Call Detail Records (CDRs) generated by the Telecom Italia cellular network over the Province of Trento. Others are repeated on a weekly basis (e.g., watching the favourite football team at the stadium). In the first layer we have the exact position of each customer site (e.g., some of them are industries, others are small houses) and the precise geometry of each line.
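Because the grid is distributed as GeoJSON, with one feature per square, it can be read with the standard `json` module. The sketch below computes the centre of each square from its polygon corners. The `cellId` property name and the sample coordinates are assumptions chosen for illustration; the actual property names should be checked in the downloaded file.

```python
import json

# A tiny hand-written FeatureCollection standing in for the real grid file.
SAMPLE_GRID = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "properties": {"cellId": 1},
   "geometry": {"type": "Polygon", "coordinates": [[
     [9.0, 45.0], [9.5, 45.0], [9.5, 45.5], [9.0, 45.5], [9.0, 45.0]
   ]]}}
]}
"""


def square_centres(geojson_text: str, id_key: str = "cellId") -> dict:
    """Map each square id to the (lon, lat) mean of its polygon corners.

    id_key is a hypothetical property name, not confirmed by the text.
    """
    collection = json.loads(geojson_text)
    centres = {}
    for feature in collection["features"]:
        ring = feature["geometry"]["coordinates"][0]  # exterior ring
        corners = ring[:-1]  # the closing vertex repeats the first one
        lon = sum(point[0] for point in corners) / len(corners)
        lat = sum(point[1] for point in corners) / len(corners)
        centres[feature["properties"][id_key]] = (lon, lat)
    return centres


if __name__ == "__main__":
    print(square_centres(SAMPLE_GRID))
```

Averaging the corners is adequate for small, nearly rectangular squares; a geometry library would be a better fit for irregular polygons.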
Because the 10 min interval dataset was quite sparse, it was not conducive to extracting spatiotemporal characteristics. Telecom Italia's board of directors has agreed to the spin-off of its 23 data centers into a separate business. The Telecommunications and Social pulse data make it possible to identify the hotspots of the city, defined as areas with high activity density with respect to the rest of the city. In each traffic component, the spatial-temporal attention module is designed to capture the dynamic spatial-temporal correlation of cellular traffic; the spatial-temporal convolution module. Telecom Italia: released as part of the "Big Data Challenge", it consists of data about telecommunication activity in the city of Milan and in the Province of Trentino. converter.py: converts the raw CDRs to the grid overlay as explained previously. The data are released on 7 Italian cities: Bari, Milan, Naples, Rome, Turin, Venice and Palermo. & Capra, L. Poverty on the cheap. A multi-source dataset of urban life in the city of Milan and the Province of Trentino. Sci Data 2, 150055 (2015). wrote the paper. There is no spatial aggregation and the data is aggregated in 60 min time slots. Analogously, Telecom Italia in association with EIT ICT Labs, SpazioDati, MIT Media Lab, Northeastern University, Polytechnic University of Milan, Fondazione Bruno Kessler, University of Trento and Trento RISE recently organized the Telecom Italia Big Data Challenge (http://www.telecomitalia.com/tit/en/bigdatachallenge/contest.html), providing various geo-referenced and anonymized datasets. Journal of Machine Learning Research 12, 2825–2830 (2011). Scaiella, U. et al. Csáji, B. et al. The data of Milan and Trentino are collected by ARPA (http://www.arpa.piemonte.it/rischinaturali) and by Meteotrentino (http://www.meteotrentino.it) respectively. Many of them are repeated on a daily basis (e.g., eating at noon, jogging in the evening etc.).
PLoS ONE 9, e105184 (2014). Smith-Clarke, C., Mashhadi, A. The results of the proposed networks are then validated using the Telecom Italia Dataset. This dataset provides information about the current flowing through the electrical grid of the Trentino province. The census data have been released for 1991, 2001 and 2011. Data 2:150055 doi: 10.1038/sdata.2015.55 (2015). Then, a new CDR is created recording the time of the interaction and the RBS which handled it. On the use of human mobility proxies for modeling epidemics. Song, C., Qu, Z., Blumm, N. & Barabási, A. FBK is the scientific partner on big data and open data policy. Moreover, there is also a weekly seasonality due to the work cycles of people (e.g., working days versus weekends). This dataset contains all the articles published on the website milanotoday.it from 01/11/2013 to 31/12/2013. The values are not spatially aggregated. The temporal aggregation values are discrete. The Telecommunication datasets provide data about the telecommunication activity in the city of Milan and in the Province of Trentino. The Call Detail Records (CDRs) are provided by the Semantics and Knowledge Innovation Lab (SKIL) (http://jol.telecomitalia.com/jolskil/) of Telecom Italia. Weekly Z-scaled behavior of SMS, calls, Tweets and Internet CDRs in Milan. It is expressed as a GeoJSON point and projected in WGS84 (EPSG:4326). Nature comm.
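The Z-scaled weekly curves mentioned above put SMS, calls, tweets and Internet CDRs on a common scale so that their shapes can be compared. A minimal sketch of that scaling follows, assuming the standard definition (subtract the mean, divide by the population standard deviation); whether the original figures used the population or the sample standard deviation is not stated, and the `z_scale` name is hypothetical.

```python
from statistics import fmean, pstdev


def z_scale(values):
    """Return (v - mean) / std for each value, using the population std.

    A constant series has zero spread, so it is mapped to all zeros
    instead of dividing by zero.
    """
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    # Made-up activity counts for illustration only.
    calls = [120, 80, 100, 140, 60]
    print([round(z, 3) for z in z_scale(calls)])
```

After scaling, each series has mean 0 and unit spread, so series with very different raw volumes can be overlaid on one plot.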
Moreover, the emergence of new geo-located Information and Communications Technology (ICT) services like Twitter and Foursquare introduces further opportunities for researchers to inspect quantitatively different aspects of human behaviour such as the social well-being of individuals and communities (ref. 19), the socio-economic status of geographical regions (ref. 20), and people's mobility (ref. 21). For example, Italy is divided into macro-regions, which are divided into regions, which are divided into provinces, which are divided into municipalities and so on. The Italian Administrative Regions come from ISTAT, and are updated to 2011. Time instant: the time instant of the measurement expressed as a date/time with the following format YYYY/MM/DD HH24:MI; Measurement: the value of meteorological phenomena intensity measured at the Time instant by the Sensor ID. Comparing and modeling land use organization in cities. designed the dataset, processed the data and wrote the paper. EPJ Data Science 4, 10 (2015). plot_maps.py: shows the thematic maps of Fig. This work is licensed under a Creative Commons Attribution 4.0 International License. Identifying important places in people's lives from cellular network data. In Milan, the type and the intensity of the phenomena are continuously measured by different sensors located within the city limits. In addition, the data pertaining to the challenge have been released to the research teams under the Open Database License (ODbL), thus triggering a long tail of follow-on research work based on these data (refs 26–30). Hawelka, B. et al. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0. Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released under the CC0 waiver to maximize reuse.