The Graduate Data Science Initiative (GDSI) aims to create an environment where students, both undergraduate and graduate, can learn about data science at an introductory, intermediate, or advanced level. Its main objective is to equip students with the right skills to enter the growing data science market.
As a member of the GDSI, you will learn about the different tools, methods, and technologies used within the data science community, and you will be exposed to real-world use cases of data science in industry and academia. You will learn how machine learning is used to predict stock market trends; how natural language processing is used to determine sentiment on a particular topic; how programming languages like R, Clojure, and Python are leading the way in data analysis; how NoSQL databases provide a new paradigm for storing unstructured data retrieved from social networks, sensors, log files, and other sources; how technologies like Hadoop are revolutionising data processing and parallel computing; how recommendation systems work inside Amazon, Netflix, and Spotify; how machine learning algorithms facilitate DNA sequencing; how data mining helps us better understand human-environment interactions and socioeconomic dynamics; how predictive analytics helps today’s business leaders make key decisions and transform their businesses into success stories; and much more.
They say “Data is the new oil”. The amount of data in the world is increasing at an incredible rate. Over 90% of the world’s data was generated in the last two years alone. This explosion of data is expected to continue over the next five to ten years as more data sources become available and the digital world becomes more deeply embedded in our daily lives. As data grows, so does the difficulty of managing huge datasets, which are usually unstructured in nature. These include data from social networks, log files, sensors, and similar sources. Unstructured data poses a new storage challenge for current database technologies that adhere to the relational model. There are also new challenges in processing huge amounts of data in a timely fashion. A single computer can only process a limited amount of data at any one time, so computing power must be distributed across a number of computing devices to make data processing more efficient. Finally, once data has been stored and processed, it should yield meaningful, actionable insights that benefit the end user.
The whole idea of Data Science is ultimately to “turn data into insights into action”. The term Data Science is often used interchangeably with the term Big Data. The full spectrum of Data Science processes can be highly complicated to implement and put together, but the benefits far outweigh the cost and time spent. As a result, companies from all sectors and industries have gone on a hiring spree to recruit the best in the field to provide Data Science solutions to their clients. The most interesting aspect of Data Science is that it is applicable to almost every domain or industry where data is a factor. Financial services use Data Science to detect and prevent fraud, saving tens of millions of dollars. The games industry uses Data Science to estimate the value of customers coming through different marketing channels, improve game levels by analysing gamers’ behaviour, and encourage users to upgrade to paid versions. Online movie distribution networks such as Netflix gather data about their viewers, such as movies watched, likes, and user preferences, to generate meaningful recommendations. Cyber security teams use Data Science to analyse network logs in order to detect and predict network intrusions. Data Science has also helped hospitals offer better treatments to their patients. The list goes on.
Companies across a range of industries are investing heavily in Data Science, and there is an increasing demand for data scientists. The following graph illustrates this demand over a period of five years, with notable growth between 2011 and 2013. Demand for data scientists increased by 15,000% between the summers of 2011 and 2012 alone.
Fig 1. (source: indeed.com)
However, despite this demand, there is a severe skills shortage in the field. It is estimated that the United States alone could face a shortage of 140,000 to 190,000 people by 2018. Part of the reason is the lack of university courses that equip students with the right skills to enter this market. The good news is that you don’t necessarily need a university degree to become a data scientist, nor do you necessarily need any programming experience. In fact, data scientists come from a number of different disciplines, such as biostatistics, econometrics, engineering, computer science, physics, applied mathematics, statistics, and other related fields. Here is a rough illustrative guide on how to become a data scientist:
Fig 2. (source: Swami Chandrasekaran)
As you can see from the illustrative guide above, Data Science spans a spectrum of areas, from data warehousing and data integration to statistics, machine learning, data mining, visualization, and more. One great way to kick-start your education and career in Data Science is to join the Graduate Data Science Initiative (GDSI). The GDSI’s main aim is to encourage and help students learn data science from experts in the field who are already involved in Big Data projects, some of whom even lead Data Science companies. You will have the opportunity to learn by attending technical presentations and industry use case talks, and by participating in workshops, hackathons, and many other activities. Best of all, you’ll be doing all of this in a social environment where you’ll have the chance to follow up with questions and have fun along the way.
We at GDSI are really passionate about Data Science. We hope you are too. We can’t wait for you to join us and be part of our community.
Join us at the following link: http://www.meetup.com/Graduate-Data-Science-Initiative/