Responsibilities:
· Perform data analyses on and discover new uses for existing data sources.
· Develop predictive statistical models and evaluate their performance: select features, then build and optimize classifiers using machine-learning techniques.
· Mine data using state-of-the-art methods.
· Create and interpret strategic and operational analyses, assess options objectively, and present conclusions and recommendations to all levels of management.
· Develop subject matter expertise on source systems data and metadata.
· Extend the company’s data with third-party sources of information when needed.
· Enhance data collection procedures to include information that is relevant for building analytic systems.
· Process, cleanse, and verify the integrity of data used for analysis.
· Perform ad-hoc analysis and present results in a clear manner.
· Create automated anomaly detection systems and continuously track their performance.
· Collaborate with management and business units on innovative ways to successfully utilize data and related tools to advance business objectives and develop new products and services.
· Gain a comprehensive understanding of operations, processes, and business objectives, and apply that knowledge to data analysis and business insight.
Requirements:
· BSCS or equivalent; a Master's in Data Science is preferred.
· 2+ years of experience as a Big Data Engineer or in a similar role.
· Solid experience with cloud platforms such as AWS, Azure, or GCP.
· Strong SQL and programming skills, with a preference for Python, Java, Scala, and shell scripting.
· Must be able to tune Hadoop solutions to improve performance and end-user experience.
· Must be proficient in working with a Hadoop cluster (with all included services), Cassandra, MapReduce, HDFS, Cloudera, and Storm or Spark Streaming.
· Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala.
· Experience with integration of data from multiple data sources.
· Experience with NoSQL databases, such as HBase, Cassandra, and MongoDB.
· Knowledge of various ETL/ingestion techniques and frameworks, such as NiFi, SSIS, Flume, Airflow, and Python.
· Good understanding of Lambda Architecture, along with its advantages and drawbacks.
· Experience with relational and non-relational databases (SQL, MySQL, NoSQL, Hadoop, MongoDB, etc.).
· Demonstrated ability to clearly form and communicate ideas to both technical and non-technical audiences.