Big Data Management Systems
The use of data in making accurate, reliable, and timely decisions has become a sine qua non of success for most modern businesses and organizations. At the same time, new technologies and applications in recent years, such as the spread of social networks, the extensive use of smartphones, and the installation of sensors, have dramatically changed the volume and format of data: we now deal with petabytes and exabytes of data in text, audio, video, and image formats. The need to manage and exploit this data has led to the development of a new generation of systems, models, and programming tools that are still in an embryonic stage, such as MapReduce, Hadoop and its ecosystem, and NoSQL. These technologies enable large-scale parallel data processing in a fault-tolerant way. The purpose of this course is to present the basic principles of these systems and how they work.
The course contents include:
- Basic knowledge: query processing, distributed and parallel query processing, distributed systems
- Programming language: Python
- MapReduce, Hadoop and ecosystem
- NoSQL, Key-Value Systems, Learning Redis
- NoSQL, Document-Store Systems, Learning MongoDB
- Data Flow Management and Applications
- Interconnectivity in Big Data Management Systems