Do you want to move into a Big Data profession? Today, nearly 7 years later, the value of Big Data is well established. It is now widely accepted that Big Data creates millions of jobs around the world.
After analyzing companies’ calls for tenders from 2016 to 2022, studying the allocation of their IT budgets and expenses over the period, reviewing advertisements in various IT magazines, and carrying out a technology watch, we have identified the 8 Big Data jobs most in demand by companies in 2022 and the trends for 2023.
1. Data Engineer
You choose this job if you want to help companies with the operational aspects of managing their data. This profession specializes in large-scale data management. A Data Engineer uses massively parallel computing frameworks such as Hadoop or Spark to manage large volumes of data. Big Data engineering requires mastery of both Big Data technologies (mainly Hadoop, Spark, SQL, Hive, Oozie, ElasticSearch, Nifi, HBase, Spark Streaming, Apache Kafka, HDFS, Shell) and data management techniques (data formats, distributed architectures, streaming data management, real-time processing, APIs, web services, the impact of technologies on application performance) in order to meet business needs for reporting, indicator calculation, and the use of data for analytical purposes.
Demand for this profession has been rising steadily since 2016, driven by the accelerating transition of companies from traditional BI systems to Big Data systems. According to our analyses, this profession is, along with DevOps/Cloud, currently one of the two most profitable niches in Big Data. This trend will no doubt continue in 2022.
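As a rough illustration of the massively parallel model behind frameworks such as Hadoop and Spark, the sketch below computes a simple indicator (revenue per region) in two steps: each partition of the data is summarized independently, then the partial results are merged. It is plain Python running on one machine, not Hadoop or Spark code, and the records, region names and function names are all invented for the example.

```python
from collections import Counter
from functools import reduce

# Hypothetical sales records (region, amount). On a real cluster these would be
# blocks of a file stored on a distributed file system, not an in-memory list.
RECORDS = [
    ("north", 120.0), ("south", 80.0), ("north", 30.0),
    ("east", 55.5), ("south", 20.0), ("east", 44.5),
]


def split_into_partitions(records, n_partitions):
    """Distribute records round-robin, mimicking data spread over several nodes."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def map_partition(partition):
    """Map step: compute a partial total per region using only local data."""
    partial = Counter()
    for region, amount in partition:
        partial[region] += amount
    return partial


def merge(left, right):
    """Reduce step: combine two partial results (amounts here are all positive)."""
    return left + right


def revenue_by_region(records, n_partitions=3):
    """Compute total revenue per region from independent partial results."""
    partitions = split_into_partitions(records, n_partitions)
    # On a cluster each partition would be processed in parallel on its own node;
    # here the partitions are simply processed one after another.
    partials = map(map_partition, partitions)
    return dict(reduce(merge, partials, Counter()))


if __name__ == "__main__":
    print(revenue_by_region(RECORDS))
```

The point of the design is that `map_partition` never needs to see the whole dataset, which is what allows the work to be spread over many machines; only the small partial totals have to be brought together at the end.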
2. DevOps/Cloud Engineer
You are moving towards this profession if you want to help companies with the infrastructural aspects of their Big Data project. The traditional deployment of data processing applications still follows a V cycle, i.e. a sequential flow: specification -> design -> development -> testing -> delivery to production. The problem with this approach is the delays that build up when going back and forth between the phases of the cycle. These delays are often cited to explain the statistic that only 25% of fully developed projects are actually deployed in production. They are even more pronounced in Big Data, where projects already face many technical, technological and organizational difficulties.
To solve this problem, agile development cycles (Scrum, XP, Lean, Kanban, SAFe, etc.) rely on reduced redundancy and maximum automation of tasks. DevOps, a contraction of Development and Operations, is the agile counterpart of 3 profiles: developer, tester and software integrator. The job essentially consists of automating the development and deployment flows of software applications in companies. It uses specialized tools such as Jenkins, Git, GitFlow, Docker, SonarQube, Ansible, Maven, Nexus, Artifactory and Kubernetes to provide continuous integration, i.e. automation of the chain from testing to application deployment.
This saves the company a great deal of time and makes it quicker to fix bugs found in production and to test and deploy new software versions (fixes, patches). Strictly speaking, a DevOps engineer does not do software development (although they must know its general principles). The role is more infrastructure-oriented and acts as an interface between the integrator and the developers. The work of DevOps greatly facilitates IT governance, because software, as a company asset, is better controlled and better managed, and the delays associated with deploying applications are kept in check. This year, beyond the buzzword, demand has been very strong for DevOps engineers, especially those who also have Cloud skills. DevOps was this year the most profitable niche in Big Data after the Data Engineer. Demand will increase further in 2022.
3. Big Data Architect
You are moving towards this job if you want to help companies with the organizational aspects of their Big Data project. The Data Architect (or Big Data Architect, depending on the scale of the project) is a technical-functional profession. It covers, on the one hand, the ability to choose the technological building blocks needed to solve a specific data problem and, on the other hand, the ability to integrate them into the company's existing IT architecture, or to modify that architecture so that they can be integrated. The Big Data Architect is very little involved in development. The architect can provide technological expertise if necessary but, most of the time, maps out the tools to be used, shows with benchmarks in support the impact these tools will have on the company's IS, and works with decision-makers to implement them. Big Data architecture consists largely of advice on the choice of technologies and machine configurations, and of validating the technical feasibility of use cases.
It essentially requires mastery of corporate IS governance frameworks such as COBIT, ITIL and TOGAF, and knowledge of the principles of information system urbanization, project management approaches (agile methodologies, Scrum, SAFe, V cycle), service-oriented architectures (SOA), business needs analysis and project ownership (MOA). It also requires a fairly in-depth knowledge of the main Big Data technologies. Demand for this job is not strong compared to Data Engineer or DevOps, but companies are very demanding about the skills they expect from a Big Data Architect, so suitable profiles are very rare. It should also be noted that, because their profile is so particular, Big Data Architects are by far the best paid of all Big Data professionals, with gross average daily rates (ADR) starting at 850 euros. Unlike Data Engineers or DevOps engineers, whose ADRs and salaries are driven exclusively by high demand (and the speculation this entails among IT services companies), the ADR of architects is high because of the many and diverse skills the job requires.
4. Big Data Administrator/Integrator
You are moving towards this profession if you want to help companies with the infrastructural aspects of their Big Data project. Big Data administration or integration is a job specifically concerned with administering Big Data technologies and keeping them running in production (the run). It consists of ensuring that the Big Data technologies used in the project operate correctly: creation and sizing of virtual machines, connection of nodes, configuration, installation of the operating system, installation of the software and tools needed for the project, implementation of the security policy, and management of resource provisioning and resizing.
It also consists of managing security and assigning authorizations and permission levels to the different users of the technologies. In some cases, this job is combined with that of Big Data integrator, in which case it also covers Production Releases (MEP) of projects and applications on the platform, as well as the run (production monitoring). Administration/integration requires a strong command of Linux, Hadoop administration tools (Ambari, Ranger), security protocols (Kerberos, SSL), Shell, administrative procedures for managing MEPs and production incidents, and to some extent DevOps tools (Jenkins, Git, GitFlow, Docker, SonarQube, Ansible, Maven, Nexus, Artifactory, Kubernetes, unit testing tools, integration testing tools, functional testing tools, etc.). Demand for this profession has been rising this year, because many companies are starting to move beyond the PoC stage and truly deploy their Big Data projects in production. Demand for this profile will therefore keep growing in the coming months, and a boom by 2022 would not be surprising.
5. Data Analyst
You are moving towards this job if you want to help companies with the front-end aspects of their Big Data project. The technical side of Big Data is very complex and vast. The profession of Data Analyst has recently started to appear in calls for tenders because companies have felt the need to have their data turned into value and summarized in the form of key performance indicators (KPIs) and dashboards.
The Data Analyst helps companies truly consume the data shaped by the Data Engineer, or the results returned by the Data Scientist's models, for effective decision-making. It is a profession at the intersection of Business Intelligence and Big Data engineering. The Data Analyst masters reporting and visualization tools (MicroStrategy, Business Objects, Microsoft Power BI), the ultimate monitoring tool for decision-makers (Microsoft Excel), VBA programming and SQL, and has very good communication skills for discussing the meaning of the calculated indicators with the company's decision-makers. The ultimate goal of the job is data analysis for decision-making purposes. It is a fascinating profession for people who see themselves more as research officers and analysts than as engineers. It is increasingly in demand in Big Data as projects are deployed in production, and demand is also growing thanks to renewed market interest in Data Visualization.
6. Big Data Developer
You choose this job if you want to help companies with the application aspects of their Big Data project. This profession is, as its name suggests, software development. It refers to the ability to use a programming language (mainly Java or Scala) and Big Data APIs expertly in order to develop application components that complement a massively parallel processing platform such as Hadoop, Spark, HBase, etc.
7. Data Scientist
You choose this job if you want to help companies with the operational aspects of extracting value from their data. This profile works downstream of the Data Engineer. The job essentially consists of “making the data speak”. The profession of Data Scientist requires skills in behavioral mathematical models (in other words, mathematical models that explain or anticipate the evolution of a variable). Examples of such models and techniques are: linear regression, logistic regression, LASSO, Bridge, decision trees, multi-layer perceptrons, descriptive statistics, statistical inference, K-means, K-nearest neighbours, CHAID 2, etc. Knowledge of these models is the keystone of the Data Scientist profession. These techniques are used to anticipate the behavior of a variable, recommend actions to perform, and categorize data according to their degree of similarity.
For example, in e-commerce and social networks, it is the Data Scientist who develops the recommendation algorithms behind “people you might also know”, “products you might also buy” and “pages you may also like”. In banking, Data Scientists develop scoring models that help decide whether or not to lend money to an individual, whether or not to invest in a project, which offers to define and propose according to each customer's profile, etc. Demand for this profession is down despite all the media hype it has enjoyed lately (cf. artificial intelligence, chatbots). This decrease is due to the decline in industrial Data Science projects. Many Data Scientists are moving into Data Engineering or Data Analyst roles. That said, our view of the outlook for this profession is mixed: we believe that demand will remain stable and that industrial Data Science projects will eventually emerge.
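To make the idea of a model that anticipates the evolution of a variable more concrete, here is a minimal sketch of simple linear regression, the first model in the list above, fitted by ordinary least squares in plain Python. The figures (advertising spend versus sales) and the function names are invented for illustration; a real project would normally rely on a statistics or machine learning library rather than hand-written formulas.

```python
def fit_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one input variable)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized,
    # since the 1/n factors cancel in the ratio).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use a fitted (slope, intercept) pair to anticipate y for a new x."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical data: advertising spend (x) and sales (y), same arbitrary unit.
    spend = [1.0, 2.0, 3.0, 4.0, 5.0]
    sales = [3.0, 5.0, 7.0, 9.0, 11.0]
    model = fit_linear_regression(spend, sales)
    print("slope, intercept:", model)
    print("anticipated sales for a spend of 6:", predict(model, 6.0))
```

The other models mentioned above follow the same pattern of fitting parameters on past data and then applying them to new observations; they differ in the form of the relationship they can capture.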
8. Big Data Tech Lead
In 2020, a new profession appeared in Big Data: the Tech Lead (for Technical Leader). It was born, on the one hand, from the increasingly complex integration between the technologies of the Hadoop ecosystem and companies' existing IT systems and, on the other hand, from the growing complexity of the technological ecosystem needed to develop and deploy Big Data solutions in production. The Tech Lead is, in a word, THE technical referent of the Big Data project (hence the name Technical Leader). It is a profession of expertise and support located at the border between 3 profiles: a senior Data Engineer, an integrator and an Architect. The Tech Lead is the technical point of reference both for the project's development and integration team and for the customers where the projects are carried out. The tasks of the Tech Lead can be summarized in two points:
- Support companies in defining a strategy for integrating Big Data technologies into their IS. This involves, first, deciding on the technical orientation of the project (validating the choice of technologies that address the problems of the company's Big Data project) and, second, validating that the chosen technologies work properly when deployed in production.
- Be the technical referent for the Big Data development teams and provide the technological expertise needed to build application solutions.
The Tech Lead is a polyglot profile combining a proven mastery of the main technologies of the Hadoop ecosystem with advanced skills in software development and Data Engineering, and basic skills in software architecture and continuous integration (CI/CD chain, DevOps, Jenkins, Ansible, Docker, Cloud). There is no dedicated training to become a Tech Lead. In general, it is a job that one takes on after at least 3 years of experience as a Data Engineer, and after having developed skills in continuous integration.
ABOUT LONDON DATA CONSULTING (LDC)
We, at London Data Consulting (LDC), provide all sorts of Data Solutions. This includes Data Science (AI/ML/NLP), Data Engineering, Data Architecture, Data Analysis, CRM & Leads Generation, Business Intelligence and Cloud solutions (AWS/GCP/Azure).
For more information about our range of services, please visit: https://london-data-consulting.com/services
Interested in working for London Data Consulting? Please visit our careers page at https://london-data-consulting.com/careers