How to Become a Data Scientist

IMDiversity, 7 years ago

By Ed Tittel, Contributing Writer

With proper education, certification, planning and experience, working as a data scientist, or in some other Big Data role, is an achievable goal.

It will take at least three to five years for entry-level IT professionals to work their way into such a position (less for those with more experience or an advanced degree in the field), but it’s a job that offers high pay and one that is expected to stay in high demand for the foreseeable future.


Whatever path you take into the field, the destination is the same: a job assembling, analyzing and interpreting large data sets to look for information of interest or value.

Data science encompasses “Big Data,” data analytics, business intelligence and more. Data science is becoming a vital discipline in IT because it enables businesses to extract value from the many kinds and large amounts of data they collect in the course of doing business. For those who do business with customers, it lets them learn more about those customers.

For those who maintain a supply chain, it helps them to understand more and better ways to request, acquire and manage supply components. For those who follow (or try to anticipate) markets – such as financials, commodities, employment and so forth – it helps them construct more accurate and insightful models for such things. The applications for data science are limited only by our ability to conceive of uses to which data may be put – limitless, in other words.

In fact, no matter where you look for data, if large amounts of information are routinely collected and stored, data science can play a role. It can probably find something useful or interesting to say about such collections, if those who examine them can frame and process the right kinds of queries against that data. That’s what explains the increasing and ongoing value of data science for most companies and organizations, since all of them routinely collect and maintain various kinds of data nowadays.

The basic foundation for a long-lived career in IT, for anybody getting started, is a bachelor’s degree in something computing-related. This usually means a degree in computer science, management information systems (MIS), computer engineering, informatics or something similar. Plenty of people transition in from other fields, to be sure, but the more math and science under one’s belt when making that transition, the easier that adjustment will be.

Given projected shortages of IT workers, especially in high-demand subject areas – which include not only data science, but also networking, security, software development, IT architecture and its various specialty areas, virtualization, and more – it’s hard to go wrong with this kind of career start.

For data scientists, a strong mathematics background, particularly in statistics and analysis, is strongly recommended, if not outright required. This goes along naturally with an equally strong academic foundation in computing. Those willing to slog through to a master’s or Ph.D. before entering the workforce may find data science a particularly appealing and remunerative field of study when that slog comes to its end. If so, they can also jump directly into mid- or expert/senior level career steps, respectively.

If data science is a long-term goal, the more experience one has in working with data, the better. Traditional paths into data science may start directly in that field, though many IT professionals also cross over from programming, analyst or database positions.

Much of the focus in data science comes from working with so-called “unstructured data,” a term used to describe collections of information usually stored outside a database, such as large agglomerations of event or security logs, e-mail messages, customer feedback responses, other text repositories and so forth. Because working with unstructured data is an increasingly large part of what data scientists do, many IT pros find it useful to dig into technologies such as NoSQL and data platforms such as Hadoop, Cloudera and MongoDB. Early-stage career IT pros will usually wind up focusing on programming for big data environments, or working under the direction of more senior staff to groom and prepare big data sets for further interrogation and analysis.
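To make “grooming” a data set a little more concrete, here is a minimal Python sketch that normalizes and de-duplicates free-text customer feedback before analysis. The sample responses and the `normalize` and `groom` helpers are invented for illustration; production work on a big data platform would likely look quite different.

```python
import re

# Hypothetical raw responses, as they might arrive from a feedback form.
RAW_FEEDBACK = [
    "  Great service!!  ",
    "great   service!!",
    "",
    "Shipping was SLOW.",
    "shipping was slow.",
    "Would buy again",
]

def normalize(text):
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())

def groom(responses):
    """Drop empty responses and duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for response in responses:
        norm = normalize(response)
        if norm and norm not in seen:
            seen.add(norm)
            cleaned.append(norm)
    return cleaned

cleaned_feedback = groom(RAW_FEEDBACK)
print(cleaned_feedback)
```

Six raw responses collapse to three distinct ones here, which is the kind of preparatory step that usually precedes any counting or modeling.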

At this early stage of one’s career, exposure to text-oriented programming and basic pattern-matching or query formulation is a must, along with a strong and expanding base of coding, testing and code maintenance experience. Development of basic soft skills in oral and written communications is a good idea, as is some exposure to basic business intelligence and analysis principles and practices. This leads directly into the early-career certifications mentioned in the next section.
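As a rough illustration of the basic pattern-matching skills described above, the following Python sketch pulls structured fields out of free-form log lines with a regular expression and then counts entries by severity. The log format, the sample lines, and the `parse_line` and `count_levels` helpers are assumptions made for this example, not the format of any particular product.

```python
import re
from collections import Counter

# Assumed log format: "<date> <time> <LEVEL> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)

# Hypothetical sample data, including one line that does not fit the format.
SAMPLE_LOG = [
    "2017-01-15 08:00:01 INFO user=alice action=login",
    "2017-01-15 08:00:07 ERROR user=bob action=login reason=bad_password",
    "garbled line that does not match",
    "2017-01-15 08:01:12 ERROR user=bob action=login reason=bad_password",
]

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def count_levels(lines):
    """Count how many parsable lines appear at each log level."""
    parsed = (parse_line(line) for line in lines)
    return Counter(rec["level"] for rec in parsed if rec is not None)

level_counts = count_levels(SAMPLE_LOG)
print(level_counts)
```

Lines that do not match the pattern are skipped rather than raising an error, a common choice when working with messy text where some records are expected to be malformed.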

Basic data science training is now readily available online in the form of massive open online courses, or MOOCs. To sort through the many offerings currently available, see the January 2017 Quora article “What is the best MOOC to get started in Data Science?”, which offers a variety of answers and lists courses from sources such as Duke (Coursera); MIT, Caltech and the Indian Institute of Management and Business (edX); Stanford; and more. Microsoft has since instituted a Professional Program in Data Science that includes nine courses on a variety of related topics and a capstone project, presenting a reasonably complete introductory curriculum on the subject. (Courses aren’t free, but at $99 each, they are fairly inexpensive.)

Data science is a big subject area, so by the time you’ve spent three to five years in the workforce and have started to zero in on a career path, you’ll also start narrowing in on one or more data science specialties and platforms. These include areas such as big data programming, analysis, business intelligence and more. Any or all of them can put you into a front-line data science job of some kind, even as you narrow your focus on the job.

This is the career stage at which you’ll develop increasing technical skills and knowledge, as you also start to gain more seniority and responsibility among your peers. Soft skills become more important mid-career as well, because you’ll have to start drawing on your abilities to communicate with and lead or guide others (primarily on technical subjects related to data science and its outputs or results) during this career phase.

This is a time for professional growth and specialization. That’s why there is a much broader array of topics and areas to consider as one digs deeper into data science to develop more focused and intense technical skills and knowledge. Data science-related certifications can really help with this but will require some careful research and consideration. For example, one might pursue a vendor-neutral credential such as the Certified Analytics Professional, or certifications tied to a particular big data platform or toolset from vendors such as MongoDB, Dell/EMC, Microsoft, Oracle or SAS.

This is a point at which one might choose to specialize more in big data programming for Hadoop, Cloudera or MongoDB on the one hand, or in running analyses and interpreting results from specific big data sets on the other. Cloudera covers most of these bases all by itself, which makes its offerings worth checking out: among many other certifications, they have Data Scientist, Data Engineer, Spark and Hadoop Developer and Administrator for Apache Hadoop credentials. There are dozens of Big Data certifications available today, with more coming online all the time, so you’ll have to follow your technical interests and proclivities to learn more about which ones are right for you.

After 10 or more years in the workforce, it’s time to get serious about data science/Big Data. This is the point at which most IT professionals start reaching for higher rungs on the job role and responsibilities ladder.

Jobs with such titles as senior data analyst, senior business intelligence analyst, senior data scientist, big data platform specialist (where you can plug in the name of your chosen platform in searching for opportunities), senior big data developer, and so forth, represent the kinds of positions that data science pros are likely to occupy at this point on the career ladder. Expert or senior level IT pros will often be spearheading project teams of varying sizes by this point as well, even if their jobs don’t carry a specific management title or overt management responsibilities. This means that soft skills are even more important, with an increasing emphasis on leadership and vision, along with skills in people and project management, plus oral and written communications.

This is the career step at which one typically climbs near or to the top of most technical certification ladders. Many of these credentials – such as the SAS “Advanced Analytics” credentials (four at present) – actually include the term “advanced” or “expert” in their certification monikers.

The SAS Institute and Dell/EMC, in particular, have rich and deep certification programs, with various opportunities for interested data scientists or Big Data folks to specialize and develop their skills and knowledge. Database platform vendors, such as Oracle, IBM and Microsoft, are also starting to recognize the potential and importance of Big Data and are adding related elements to their certification programs all the time. Because this field is still relatively young and new cert programs are still coming online, the shape of the high end of the cert landscape for Big Data is very much a work in progress.

Whatever Big Data platform or specialty you choose to pursue, this is the career stage where deep understanding of the principles and practices in the field and an understanding of their business impact and value must begin to combine. It is also where people must focus on their soft skills at the highest level, because senior data scientists or Big Data experts must be able to lead teams of high-level individuals in the organizations they serve, including top executives, high-level managers, and other technical experts and consultants. As you might expect, this kind of work is as much about soft skills in communication and leadership as it is about in-depth technical knowledge and ability.

Depending on where you are in terms of work experience, family situation and finances, it may be worth considering a master’s degree with a focus on data science or some other aspect of Big Data as a significant step in your career development. For most working adults, this will mean getting into a part-time or online advanced degree program.

Many such programs are available, but you’ll want to consider the name recognition value and the cost of those offerings when choosing a degree plan to pursue. If pursued later in life (after one’s 20s), a Ph.D. is probably only attainable for someone with strong interests in research or teaching. That means a Ph.D. is not an option for most readers unless they plan and budget for a lengthy interruption in their working lives (most doctorate programs require full-time attendance on campus, and take from three to six years to complete).

In short, a data science or other Big Data role is a realistic target for IT professionals who combine education, certification, planning and experience. Because the amount of data stored in the world is only increasing year over year, this appears to be a specialty area in IT that’s long on opportunity and growth potential.


Ed Tittel is a 30-year-plus veteran of the computing industry, who has worked as a programmer, a technical manager, a classroom instructor, a network consultant and a technical evangelist for companies that include Burroughs, Schlumberger, Novell, IBM/Tivoli and NetQoS. He has written for numerous publications, including Tom’s IT Pro, and is the author of more than 140 computing books on information security, web markup languages and development tools, and Windows operating systems.