H2O.ai Blog
Fine Tuning The H2O Danube2 LLM for The Singlish Language
Singlish is an informal version of English spoken in Singapore. The primary variations lie in the style and structure of the text and in the inclusion of elements of Chinese and Malay. Though Singlish is the common tongue in Singapore, it isn’t well defined or formalized. We fine-tuned H2O.ai’s Danube-2 1.8B LLM on Singlish instruction data, wi...
Announcing H2O Danube 2: The next generation of Small Language Models from H2O.ai
A new series of Small Language Models from H2O.ai, released under Apache 2.0 and ready to be fine-tuned for your specific needs to run offline and with a smaller footprint. Why Small Language Models? Like most decisions in AI and tech, the decision of which Language Model to use for your production use cases comes down to trade-offs. ...
H2O Release 3.46
We are excited to announce the release of H2O-3 3.46.0.1! Some of the highlights of this major release are that we added custom metric support for XGBoost, allowed grid search models to be sorted with custom metrics, and we enabled H2O MOJO and POJO to work with MLFlow. Several improvements were also made to the Uplift model (like MLI ...
Open-Weight AI Models: A Path to Responsible Innovation
The recent Request for Comments (RFC) issued by the National Telecommunications and Information Administration (NTIA) on open-weight AI models has sparked an important conversation about the future of AI. As we consider the potential benefits and risks associated with making AI model weights more accessible and transparent, it is clear ...
Transforming Latin American Companies with Artificial Intelligence: Strategies and Perspectives
Today there is clearly a great deal of excitement in forums and publications about the use of artificial intelligence (AI) across different areas of business. There is frequent talk of the major changes that AI brings to business processes; however, most of these successful use cases belong to ...
Unlocking GenAI Magic: GenAI AppStudio Revolutionizes App Development with LLMs! (Part 2)
GenAI AppStudio provides a no-code way to turn user sketches into generated code. Introducing GenAI AppStudio: GenAI AppStudio is a no-code platform crafted specifically for non-technical users, who can turn app ideas into reality in a few simple steps. One of its key features is the ability to sea...
H2O LLM DataStudio: V4.1 Release
H2O LLM DataStudio is a comprehensive no-code application designed to simplify data preparation tasks for Large Language Models (LLMs). This tool comprises three key components: Curate, Prepare, and Augment. Curate - Conversion of documents (PDFs, DOC & audio/video files) into question-answer pairs and summarization pairs Prepare ...
Introducing the H2O GenAI App Store: A Playground of Generative AI Innovation
As the world becomes increasingly interconnected and reliant on data-driven decisions, the need for powerful and innovative AI solutions has never been more critical. At H2O.ai, we've been at the forefront of AI and machine learning for the last decade, providing you with the tools and platforms to harness the power of data. Today, we're ...
Introducing the H2O GenAI App Store: A Playground of Generative Artificial Intelligence Innovation (Portuguese edition)
This blog was originally published in English here: https://h2o.ai/blog/2023/gen-ai-app-store/ As the world becomes increasingly interconnected and dependent on data-driven decisions, the need for powerful and innovative AI solutions has never been so critical. At H2O.ai, we have been at the forefront of AI and ...
Introducing the H2O GenAI App Store: A Playground of Generative Artificial Intelligence Innovation (Spanish edition)
This blog was originally published in English here: https://h2o.ai/blog/2023/gen-ai-app-store/ As the world becomes increasingly interconnected and dependent on data-driven decisions, the need for powerful and innovative artificial intelligence (AI) solutions has never been so critical. At H2O.ai, we have been at the ...
H2O Release 3.44
We are excited to announce the release of H2O-3 3.44.0.1! We have added and improved many items. A few of our highlights are the implementation of AdaBoost, Shapley values support, Python 3.10 and 3.11 support, and added custom metric support for Deep Learning, Uplift Distributed Random Forest (DRF), Stacked Ensemble, and AutoML. Please r...
Boosting LLMs to New Heights with Retrieval Augmented Generation
Businesses today can make leaps and bounds to revolutionize the way things are done with the use of Large Language Models (LLMs). LLMs are widely used by businesses today to automate certain tasks and create internal or customer-facing chatbots that boost efficiency. Challenges with dynamic adaption of LLMs As with any new hyped-up thi...
Training Your Own LLM Without Coding
This blog was originally published in English here: https://www.analyticsvidhya.com/blog/2023/09/training-your-own-llm-without-coding/ Introduction: Generative Artificial Intelligence, a fascinating field that promises to revolutionize how we interact with technology and generate content, has taken the world by storm. In this ...
H2O LLM DataStudio Part II: Convert Documents to QA Pairs for fine tuning of LLMs
Convert unstructured datasets to Question-answer pairs required for LLM fine-tuning and other downstream tasks with H2O LLM Data Studio Curate. Every organization needs to own its GPT as simply as it needs to bring its data, algorithms, and models (read more here). A common problem we see in organizations is that they want to be able to...
Building a Fraud Detection Model with H2O AI Cloud
In a previous article [1], we discussed how machine learning could be harnessed to mitigate fraud. This time, we’ll delve into a step-by-step guide on leveraging H2O AI Cloud to construct efficient fraud detection models. We’ll tackle this process in three critical stages: build, operate, and detect. First, we’ll utilize Driverless AI in ...
A Look at the UniformRobust Method for Histogram Type
Tree-based algorithms, especially Gradient Boosting Machines (GBMs), are among the most popular algorithms in use. They often outperform linear models and neural networks on tabular data because they use a boosted approach in which each new tree works to fix the error of the previous tree. As the model trains, it is continuously self-corr...
Testing Large Language Model (LLM) Vulnerabilities Using Adversarial Attacks
Adversarial analysis seeks to explain a machine learning model by understanding locally what changes need to be made to the input to change a model’s outcome. Depending on the context, adversarial results could be used as attacks, in which a change is made to trick a model into reaching a different outcome. Or they could be used as an exp...
H2O LLM EvalGPT: A Comprehensive Tool for Evaluating Large Language Models
In an era where Large Language Models (LLMs) are rapidly gaining traction for diverse applications, the need for comprehensive evaluation and comparison of these models has never been more critical. At H2O.ai, our commitment to democratizing AI is deeply ingrained in our ethos, and in this spirit, we are thrilled to introduce our innovati...
Reducing False Positives in Financial Transactions with AutoML
In an increasingly digital world, combating financial fraud is a high-stakes game. However, the systems we deploy to safeguard ourselves are raising too many false alarms, with over 90% of fraud alerts being false positives. These false positives, not only frustrating for consumers but also costly for financial institutions, can eclipse t...
Winner's Insight: Navigating the Parkinson's Disease Prediction Challenge with AI
Parkinson’s disease, a condition affecting movement, cognition, and sleep, is escalating rapidly. By 2037, it is projected that around 1.6 million U.S. residents will be confronting this disease, resulting in significant societal and economic challenges. Studies have hinted that disruptions in proteins or peptides could be instrumental in...
H2O.ai and Snowflake Enable Developers to Train, Deploy, and Score Containerized Software Without Compromising Data Security
H2O.ai today announced its participation as a launch partner for Snowflake’s Snowpark Container Services (available in private preview), which provides our joint customers with the flexibility to train, deploy, and score models all within their Snowflake account. This further expands the ease of use for data science teams to create machin...
H2O Releases 3.40.0.1 and 3.42.0.1
Our new major releases of H2O are packed with new features and fixes! Some of the major highlights of these releases are the new Decision Tree algorithm, the added ability to grid over Infogram, an upgrade to the version of XGBoost and an improvement to its speed, the completion of the maximum likelihood dispersion parameter and its expan...
Generating LLM Powered Apps using H2O LLM AppStudio – Part1: Sketch2App
sketch2app is an application that lets users instantly convert sketches to fully functional AI applications. This blog is Part 1 of the LLM AppStudio Blog Series and introduces sketch2app. The H2O.ai team is dedicated to democratizing AI and making it accessible to everyone. One of the focus areas of our team is to simplify the adoption of...
H2O LLM DataStudio: Streamlining Data Curation and Data Preparation for LLMs related tasks
A no-code application and toolkit to streamline data preparation tasks related to Large Language Models (LLMs) H2O LLM DataStudio is a no-code application designed to streamline data preparation tasks specifically for Large Language Models (LLMs). It offers a comprehensive range of preprocessing and preparation functions such as text cl...
Recap of H2O World India 2023: Advancements in AI and Insights from Industry Leaders
On April 19th, the H2O World made its debut in India, marking yet another milestone in its global journey. The conference gathered an array of notable experts and enthusiasts from deep learning, artificial intelligence, and data science. A broad spectrum of topics was covered, shedding light on the strides made in AI technology and its ...
Enhancing H2O Model Validation App with h2oGPT Integration
As machine learning practitioners, we’re always on the lookout for innovative ways to streamline and enhance our processes. What if we could integrate the power of language models into our workflows, especially in the critical phase of model validation? Imagine running validation procedures, interpreting results, or even troubleshooting i...
Building a Manufacturing Product Defect Classification Model and Application using H2O Hydrogen Torch, H2O MLOps, and H2O Wave
Primary Authors: Nishaanthini Gnanavel and Genevieve Richards Effective product quality control is of utmost importance in the manufacturing industry. The presence of defective components can have adverse effects on various aspects, including escalating production costs, compromising product quality, diminishing product longevity, and l...
Insights from AI for Good Hackathon: Using Machine Learning to Tackle Pollution
At H2O.ai, we believe technology can be a force for good, and we’re committed to leveraging its power to create a positive impact in the world. As part of this commitment, we recently organized an AI for Good Hackathon during the H2O World India event, where participants had the opportunity to apply their data science skills to a real-wor...
Democratization of LLMs
Every organization needs to own its GPT as simply as we need to own our data, algorithms and models. H2O LLM Studio democratizes LLMs for everyone allowing customers, communities and individuals to fine-tune large open source LLMs like h2oGPT and others on their own private data and on their servers. Every nation, state and city needs it...
Building the World's Best Open-Source Large Language Model: H2O.ai's Journey
At H2O.ai, we pride ourselves on developing world-class Machine Learning, Deep Learning, and AI platforms. We released H2O, the most widely used open-source distributed and scalable machine learning platform, before XGBoost, TensorFlow and PyTorch existed. H2O.ai is home to over 25 Kaggle grandmasters, including the current #1. In 2017, w...
Effortless Fine-Tuning of Large Language Models with Open-Source H2O LLM Studio
While the pace at which Large Language Models (LLMs) have been driving breakthroughs is remarkable, these pre-trained models may not always be tailored to specific domains. Fine-tuning — the process of adapting a pre-trained language model to a specific task or domain—plays a critical role in NLP applications. However, fine-tuning can be ...
What's new in the latest release of H2O AI Hybrid Cloud?
Check out the complete release notes here! v23.01.0 | Apr 14, 2023 Upgraded Components: Core Components: AI App Store v0.22.0. The AI App Store is a platform for accessing and operationalizing AI/ML applications and services that are built using H2O Wave. The 23.01.0 Hybrid Cloud release introduces multiple UI enhancements to make the us...
Navigating the challenges of time series forecasting
Jon Farland is a Senior Data Scientist and Director of Solutions Engineering for North America at H2O.ai. For the last decade, Jon has worked at the intersection of research, technology and energy sectors with a focus on developing large scale and real-time hierarchical forecasting systems. The machine learning models that drive these for...
How Commonwealth Bank is transforming operations with Document AI
Sonal Surana, General Manager at Commonwealth Bank of Australia, shares recent innovative ideas at H2O World Sydney. It’s been a rollercoaster of a ride this first year of our partnership with H2O.ai, and the momentum continues to get even more exciting. We’ve heard from Matt about our AI ambition and how front and center it is for CBA s...
Introduction to H2O Document AI
Mark Landry, H2O.ai Director of Data Science and Product and Kaggle Grandmaster, showcases H2O Document AI during the Technical Track Sessions at H2O World Sydney 2022. Mark Landry: I’m Mark Landry, with some different titles than you see on the screen here. I’ve got a bunch at H2O, so I’ve been at H2O for about seven and a half years...
AI in Insurance: Resolution Life's AI Journey with Rajesh Malla
Rajesh Malla, Head of Data Engineering – Data Platforms COE at Resolution Life insurance, takes the stage at H2O World Sydney 2022 to discuss AI transformation within the insurance industry. Resolution Life is the largest life insurer in Australasia. Malla discusses the use of H2O Driverless AI to predict claim triage and other insurance ...
AT&T panel: AI as a Service (AIaaS)
Mark Austin, Vice President of Data Science at AT&T joined us on stage at H2O World Dallas, along with his colleagues Mike Berry, Lead Solution Architect; Prince Paulraj, AVP of Engineering; Alan Gray, Principal-Solutions Architect; and Rob Woods, Lead Solution Architect, CDO to discuss what they’re doing today and where they see the ...
[Infographic] Healthcare providers: How to avoid AI “Pilot-Itis”
From increased clinician burnout and financial instability to delays in elective and preventative care, the pandemic created a perfect storm of conditions that have strained the healthcare system in lasting ways. This storm continues unabated and is unleashing new challenges and exacerbating old ones. Artificial intelligence (AI) technol...
Deploy a WAVE app on an AWS EC2 instance
This article was originally published by Greg Fousas and Michelle Tanco on Medium and reviewed by Martin Turoci (unusualcode). This guide will demonstrate how to deploy a WAVE app on an AWS EC2 instance. WAVE can run on many different OSs (macOS, Linux, Windows) and architectures (Mac, PC). In this document, Ubuntu Linux will be used. T...
How Horse Racing Predictions with H2O.ai Saved a Local Insurance Company $8M a Year
In this Technical Track session at H2O World Sydney 2022, SimplyAI’s Chief Data Scientist Matthew Foster explains his journey with machine learning and how applying the H2O framework resulted in significant success on and off the race track. Matthew Foster: I’m Matthew Foster, the Chief Data Scientist for SimplyAI. So, I’m going t...
AI and Humans Combating Extinction Together with Dr. Tanya Berger-Wolf
Dr. Tanya Berger-Wolf, Co-Founder and Director of AI for conservation nonprofit Wild Me, takes the stage at H2O World Sydney 2022 to discuss AI solutions for wildlife conservation, connecting data, people, and machines. AI can turn a massive collection of images into high-resolution information databases about wildlife, enabling scienti...
Improving Search Query Accuracy: A Beginner's Guide to Text Regression with H2O Hydrogen Torch
Although search engines are vital to our daily lives, they need help understanding complex user queries. Search engines rely on natural language processing (NLP) to understand the intent behind a user’s query and return relevant results. By formulating a well-formed question, users can provide more precise and specific information about w...
What it means, and takes, to be at AI’s edge with Dr. Tim Fountaine
Dr. Tim Fountaine, Senior Partner at McKinsey & Company joins us at H2O World Sydney 2022 to discuss why business leaders should care about AI, what mindset to adopt, and what actions to take to effectively bring AI into your organization. Dr. Fountaine discusses real-world examples and insights from McKinsey’s collaboration with the ...
10 Tips for Becoming a Successful Data Scientist
Data science is here to stay. Data scientists use their skills to help companies make better decisions about their products and services, optimize processes, cut costs, and improve profitability. Becoming a successful data scientist involves many things, including continuous study, since it is a ...
Explaining models built in H2O-3 – Part 1
Machine Learning explainability refers to understanding and interpreting the decisions and predictions made by a machine learning model. Explainability is crucial for ensuring the trustworthiness and transparency of machine learning models, particularly in high-stakes situations where the consequences of incorrect predictions can be signi...
H2O.ai at NeurIPS 2022
H2O.ai is proud to participate in the 36th Conference on Neural Information Processing Systems (NeurIPS) 2022, one of the biggest and most prestigious international conferences in artificial intelligence. NeurIPS 2022 will be a Hybrid Conference from Monday, November 28th through Friday, December 9th, with an in-person event at the New Or...
A Brief Overview of AI Governance for Responsible Machine Learning Systems
Our paper “A Brief Overview of AI Governance for Responsible Machine Learning Systems” was recently accepted to the Trustworthy and Socially Responsible Machine Learning (TSRML) workshop at NeurIPS 2022 (New Orleans). In this paper, we discuss the framework and value of AI Governance for organizations of all sizes, across all industries a...
H2O World Dallas Customer Talks
After three long years of not having an #H2OWorld, we finally held our first one in Sydney to a sold-out crowd! We then followed it up with H2O World Dallas in the same week! It was a fantastic and jam-packed event with customers, partners, colleagues, and community members sharing how they leverage H2O.ai to accelerate and transform AI l...
New in Wave 0.24.0
Another Wave release has arrived with quite a few exciting new features. Let’s quickly go over the biggest ones. Wave init CLI: How many times have you wanted to build a Wave app fast, only to realize you need to start from scratch, copy over the skeleton of your app, and work up from there? For these exact reasons, we introduced a new wave...
H2O.ai Raises $40 Million to Democratize Artificial Intelligence for the Enterprise
Series C round led by Wells Fargo and NVIDIA MOUNTAIN VIEW, CA – November 30, 2017 – H2O.ai, the leading company bringing AI to enterprises, today announced it has completed a $40 million Series C round of funding led by Wells Fargo and NVIDIA with participation from New York Life, Crane Venture Partners, Nexus Venture Partners and Tra...
H2O.ai Placed Furthest in Completeness of Vision in 2021 Gartner Data Science and Machine Learning Magic Quadrant in the Visionaries Quadrant.
At H2O.ai, our mission is to democratize AI, and we believe driving value from data is a team sport. Data needs to be organized and prepared, often by data engineers, and then models need to be built by data scientists. With models built, they need to be put into production and maintained by IT and DevOps personnel. Finally, these models...
H2O.ai Expands Market Footprint in Healthcare AI by Signing Hackensack Meridian Health and Other Key Providers
We’re excited to attend the HLTH conference this week in Las Vegas, NV. This industry event has quickly become the go-to event for c-level executives across all parts of the healthcare industry. It’s both incredible and inspiring to see how quickly the event has grown in its five years, and that’s why we’re excited to share some news abou...
An Introduction to H2O Wave Table
H2O Wave is a Python package for creating realtime ML/AI applications for a wide variety of data science workflows and industry use cases. Data scientists view a significant amount of data in tabular form. Running SQL queries, pivoting data in Excel or slicing a pandas dataframe are pretty much bread-and-butter tasks. With the growing u...
Saving Zebras: “Their stripes are like fingerprints. No two are alike.”
It’s been said that a picture is worth a thousand words. But to Tanya Berger-Wolf, a picture is far more valuable than that. To Berger-Wolf, photos, images and videos are key to protecting biodiversity and entire species around the world. Scientists have known for years that we are in the middle of the sixth mass extinction on our planet...
H2O Managed Cloud With AWS PrivateLink is Now Generally Available
An essential part of responsibly practicing machine learning is understanding how you secure your data. H2O Managed Cloud offers a single-tenant cloud environment with multiple layers of security – but how do you get your data securely into the cloud for training, and how do you score sensitive information without exposing it to the inte...
H2O.ai Receives Innovation Award for H2O Hydrogen Torch
We don’t like to brag, but we do like to celebrate the work our Makers create, and more importantly, why they create it: for you. H2O.ai was proud to accept the award for “Best Deep Learning Technology” at the AI Tech awards. H2O Hydrogen Torch, a no-code deep learning training engine, was released less than a year ago in February 2022...
AI for Good: PetFinder.my Levels Up Furry Matchmaking
Nothing tugs at the heart strings quite like a poster in your neighborhood about a missing cat or dog. For years, technology has enabled lost pets to be reunited with their families in the form of a small microchip that contains an owner’s contact information. Now some organizations are turning to emerging technology to help the millions ...
H2O Wave joins Hacktoberfest
It’s that time of the year again. Hacktoberfest, a great initiative by DigitalOcean that aims to bring more people to open source, is about to start. Hacktoberfest incentivizes people to make at least 4 valuable contributions (pull requests) to an open source repository and get the reward i...
Three Keys to Ethical Artificial Intelligence in Your Organization
There’s certainly been no shortage of examples of AI gone bad over the past few years, enough to give everyone pause on how (and if) this technology can truly be used for good. If it’s not Facebook selling data of its users, it’s self-driving cars from Uber that can’t recognize pedestrians in time to slow down or stop. So while the uses ...
Using GraphQL, HTTPX, and asyncio in H2O Wave
Today, I would like to cover the most basic use case for H2O Wave, which is collecting a bunch of data and displaying them in a nice and clean way. The goal is to build a simple dashboard that shows how H2O Wave compares against its main competitors in terms of popularity and codebase metrics. The main competitors in question are: Stre...
Predicting Gender Differences in the Brain Using Machine Learning Automation Solution H2O Driverless AI
Brain cognitive development in childhood affects higher cognitive functions such as memory, attention, and sociability, and leads to brain development in adolescence ...
Make with H2O.ai Recap: Validation Scheme Best Practices
Data Scientist and Kaggle Grandmaster, Dmitry Gordeev, presented at the Make with H2O.ai session on validation scheme best practices, our second accuracy masterclass. The session covered key concepts, different validation methods, data leaks, practical examples, and validation and ensembling. Key Concepts While the validation topics cove...
Integrating VSCode editor into H2O Wave
Let’s have a look at how to provide our users with a truly amazing experience when we need to allow them to edit pieces of code or configuration. We will use one of the most popular and well-known code editors called Monaco editor which powers VSCode. The resulting app will have the editor on the left side and a markdown card on the righ...
5 Tips for Improving Your H2O Wave Apps
Let’s quickly uncover a few simple tips that are quick to implement and have a big impact. Do not recreate navigation, update it: The most common error I see across the Wave apps is ugly navigation that seems to be laggy. The reason for this behavior is that we want to save the clicked value and set it e...
Make with H2O.ai Recap: Getting Started with H2O Document AI
Product Owner, Data Scientist, and Kaggle Grandmaster Mark Landry presented at the Make with H2O.ai session on getting started with H2O Document AI. The session covered an overview of H2O Document AI, a tool to extract insights and automate document processing. The session also included a product demo, looking at documents as data sets...
Advice for Those Getting Started on Their AI Journey
H2O.ai Innovation Day Summer ‘22 included a customer insights panel made up of Prince Paulraj, AVP, Data Insights and Chief Data Officer at AT&T; Chris Throop, Managing Director and Global Head of Data Science at Castleton Commodities International; and Sean Otto, Director of Advanced Analytics at AES. One of the questions panelists...
AES Transforms its Energy Business with AI and H2O.ai
AES is a leading renewable-energy company with global operations. The business produces and distributes energy for private, public, and governmental organizations. AES was recently named one of the World’s Most Ethical Companies for the ninth straight year and won the Edison Electric Institute’s (EEI’s) Edison Award, the indus...
The H2O.ai Wildfire Challenge Winners Blog Series - Team Titans
Note: this is a community blog post by Team Titans – one of the H2O.ai Wildfire Challenge winners. You can check out their app here. Background: Forest fires have been getting worse in recent years. According to a report by the WWF, the duration of fire seasons across the globe has increased by 19% on average. The fire season has been sta...
Improving Machine Learning Operations with H2O.ai and Snowflake
Operationalizing models is critical for companies to get a return on their machine learning investments, but deployment is only one part of that operationalization process. With H2O.ai’s latest Snowflake Integration Application, authorized Snowflake users can easily deploy models, significantly reducing deployment timelines and enabling a...
Improving Manufacturing Quality with H2O.ai and Snowflake
Manufacturers are rapidly expanding their machine learning use cases by leveraging the deep integration between Snowflake’s Data Cloud and the H2O AI Cloud. Many current manufacturing quality checks require that sensor data and image data be processed and analyzed separately. Standard tooling presents challenges in storing and referencin...
The H2O.ai Wildfire Challenge Winners Blog Series - Team PSR
Note: this is a community blog post by Team PSR – one of the H2O.ai Wildfire Challenge winners. This blog describes the experience we gained by participating in the H2O wildfire challenge. Competing in this challenge was like a journey through a pool of knowledge. For a person who is willing to get the knowledge of buildin...
Developing and Retaining Data Science Talent
It’s been almost a decade since the Harvard Business Review proclaimed that “Data Scientist” is the sexiest job of the 21st century. Since then, there has been an explosion of job opportunities and university degree programs claiming to give students all of the skills they need to excel in the field of data science. Yet, the scarcity of ...
The H2O.ai Wildfire Challenge Winners Blog Series - Team HTB
Note: this is a community blog post by Team HTB – one of the H2O.ai Wildfire Challenge winners. You can check out their app here. The Challenge: The purpose of the challenge was to develop an AI application to improve the forecast of bushfires and wildfires, with the main aim of reducing the human losses that these phenomena can cause...
The H2O.ai Wildfire Challenge Winners Blog Series - Team Too Hot Encoder
Note: this is a community blog post by Team Too Hot Encoder – one of the H2O.ai Wildfire Challenge winners. You can check out their app here. The Challenge: The aim of the project is to predict the probability of wildfire occurrence in Turkey for each month in 2020. Based on these predictions, the goal is to carry out more intensive...
Bias and Debiasing
An important aspect of practicing machine learning in a responsible manner is understanding how models perform differently for different groups of people, for instance with different races, ages, or genders. Protected groups frequently have fewer instances in a training set, contributing to larger error rates for those groups. Some models...
Comprehensive Guide to Image Classification using H2O Hydrogen Torch
In this article, we will learn how to build state-of-the-art models in computer vision and natural language processing within a couple of minutes using H2O Hydrogen Torch. Introduction to H2O Hydrogen Torch H2O Hydrogen Torch (HT) aims to simplify building and deploying deep learning models for a wide range of tasks in computer vision...
Read moreDemocratizing Lending through AI
According to the Federal Reserve, nearly 40% of adults in the U.S. sought credit in 2020, only slightly fewer than those who applied in the previous pre-pandemic year; among those who applied, more than 1 in 10 were denied credit or were approved for less than they had sought. The reasons behind these denials are many, however, the same r...
Read moreSetting Up Your Local Machine for H2O AI Cloud Wave App Development
This article is for users who would like to build H2O Wave apps and publish them in the App Store within the H2O AI Cloud (HAIC). We will walk through how to set up your local machine for HAIC Wave App development. Instructions Developing with Wave H2O Wave is a framework for building frontends using only python or R. In this article...
Read moreData Science with H2O.ai: An Introduction to Machine Learning and Predictive Modeling
Our own Jonathan Farland recently recorded a talk about machine learning and predictive modeling. In his talk, Jon also gave an overview of open source H2O and H2O AI Cloud. This video is a great resource for getting up to speed with the latest technology from H2O in half an hour. Some of you may prefer to go through the slides while l...
Read moreGene Mutation AI
A genomics AI solution from H2O.ai Health Powered by NVIDIA GPUs and NVIDIA AI As precision medicine becomes more widespread, both medical diagnosis and drug discovery are increasingly relying on and leveraging the individual’s genomic and phenotypic profiles. From the multiple types and subtypes of cancer to heart disease, to obesity or ...
Read moreExpression Biomarker AI
A drug discovery AI solution from H2O.ai Health Powered by NVIDIA GPUs and NVIDIA AI In a healthy individual, each cell type has its own metabolic program, carrying out specific functions. This organization is disrupted in disease, either as a cause or a result of it, or both, and this disruption is reflected in the patient’s gene exp...
Read moreGene Mutation AI and the Future of Cancer Research
A genomics AI solution from H2O.ai Health Powered by NVIDIA GPUs and NVIDIA AI Cancer is a multifactorial disease with exact causes we have only recently begun to understand. While inherited germline mutations are understood to create a genetic predisposition to the disease, stochastic accumulation of somatic mutations over a person’s...
Read moreVaccine NLP
A population and public health NLP solution from H2O.ai Health Powered by NVIDIA GPUs and NVIDIA AI Social media platforms such as Twitter and Reddit have become invaluable tools for communication between individuals or groups and are widely used globally. As messages on these platforms can instantly be accessed by all users and remain on...
Read moreTackling Illegal, Unreported, and Unregulated (IUU) Fishing with AI
According to a report by the High-Level Panel for a Sustainable Ocean Economy, it is estimated that illegal, unreported, and unregulated (IUU) fishing accounts for 20 percent of the seafood and up to 50 percent in some areas. These activities not only affect the marine ecosystem but, in a way, are linked to climate change on the planet a...
Read moreUnsupervised Learning Metrics
That which is measured improves – Karl Pearson, Mathematician. Almost everyone has heard of accuracy, precision, and recall – the most common metrics for supervised learning. But not as many people know the metrics for unsupervised learning. So, in this article, we will take you through the most common methods and how to implement th...
Read moreDemand Sensing with H2O Wave : Supply Chain Intelligence and Inventory Optimization for Retail, CPG, and FMCG Industries
Demand Sensing can help optimize inventories by analyzing and modeling short-term and real-time signals. The supply chains across the Consumer Packaged Goods (CPG), Fast-Moving Consumer Goods (FMCG) and Retail sectors need to continuously monitor the drivers that may impact their internal models and processes. These include systems around ...
Read moreAI Application to Demonstrate K-Means Clustering Using H2O Wave
Note: this is a community blog post by Shamil Dilshan Prematunga. It was first published on Medium. In this blog, I am going to highlight how cool H2O Wave is, by demonstrating my application called “K means App” which was built using Wave 0.20.0. This is a simple application I have created to demonstrate one of the unsupervised lea...
Read moreA Quick Introduction to PyTorch: Using Deep Learning for Stock Price Prediction
Torch is a scalable and efficient deep learning framework. It offers flexibility and speed to build large scale applications. It also includes a wide range of libraries for developing speech, image, and video-based applications. The basic building block of Torch is called a tensor. All the operations defined in Torch use a tensor. Ok, l...
Read moreIntroducing H2O Hydrogen Torch: A No-code Deep Learning Framework
Over and over again we heard from customers, “deep learning is cool, but it’s hard and time consuming.” They kept asking “could someone just make it easier?” In typical “Maker” fashion, you ask, we deliver: H2O Hydrogen Torch. H2O Hydrogen Torch is a new product that enables data scientists and developers to train and deploy state-of-t...
Read moreHow to Create Your Spotify EDA App with H2O Wave
In this article, I will show you how to build a Spotify Exploratory Data Analysis (EDA) app using H2O Wave from scratch. H2O Wave is an open-source Python development framework for interactive AI apps. You do not need to know Flask, HTML, CSS, etc. H2O Wave has ready-to-use user-interface components and charts, including dashboard templa...
Read moreH2O.ai releases new H2O MLOps features that improve the explainability, flexibility and configuration of machine learning workflows.
H2O.ai now provides data scientists and machine learning (ML) engineers even more powerful features that give greater control, governance, and scalability within their machine learning workflow – all available on our H2O AI Cloud. Now, H2O MLOps enables you to: Deploy model explanations in production Explainability is core to understa...
Read moreMission Impossible: Improving Patient Care Through Automated Document Processing
Don’t tell Bob Rogers’ team something can’t be done. When Rogers embarked on an ambitious project to automate the processing of the more than 1.4 million electronically faxed documents received annually by the Center for Digital Health Innovation at the University of California, San Francisco (UCSF CDHI), advisors and vendors initially t...
Read moreAn Introduction to Unsupervised Machine Learning
There are three major branches of machine learning (ML): supervised, unsupervised, and reinforcement. Supervised learning makes up the bulk of the models businesses use, and reinforcement learning is behind front-page-news AI such as AlphaGo. We believe unsupervised learning is the unsung hero of the three, and in this article, we brea...
Read moreRevisiting the Miracle of Istanbul
Introduction: On May 25th, 2005, the UEFA Champions League final between AC Milan and Liverpool was held at the Atatürk Olympic Stadium in Istanbul. The match is still considered one of the greatest finals in football history. AC Milan took a 3-0 lead in the first half but Liverpool made a miraculous comeback in the second half to tie the g...
Read moreInstall H2O Wave on AWS Lightsail or EC2
Note: this blog post was first published on Thomas’ personal blog Neural Market Trends. I recently had to set up H2O’s Wave Server on AWS Lightsail and build a simple Wave App as a Proof of Concept. If you’ve never heard of H2O Wave then you have been missing out on a new cool app development framework. We use it at H2O to build AI-ba...
Read moreWhat Are Feature Stores and Why Are They Important?
Machine learning (ML) models are only as good as the data fed into them. In tabular problems, the data is a collection of rows (samples) and columns (features). So, you could say that tabular ML models are only as good as the features fed into them. But how do you manage features? Can you share them across the company? Can you easily reu...
Read moreA Beginner’s View of H2O MLOps
Note: this is a community blog post by Shamil Dilshan Prematunga. It was first published on Medium. Stepping into the AI application world is not one easy step; it involves a series of combined tasks. To convert an idea to the workable stage we must fulfill the requirements in each stage. When we look at existing platforms, t...
Read moreShapley Values - A Gentle Introduction
If you can’t explain it to a six-year-old, you don’t understand it yourself. – Albert Einstein One fear caused by machine learning (ML) models is that they are blackboxes that cannot be explained. Some are so complex that no one, not even domain experts, can understand why they make certain decisions. This is of particular concern when s...
Read moreThe Bond Market & AI: How MarketAxess Brings it All Together
The vast majority of the equities market trades electronically while the bond market is still in its infancy by comparison, but MarketAxess is seeking to change that. Recently, we hosted a virtual event with the MarketAxess team where they explained how they were solving challenges in the world’s largest bond marketplace while leveraging ...
Read moreH2O Release 3.36 (Zorn)
There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release are Distributed Uplift Random Forest, an algorithm typically used in marketing and medicine to model uplift, and Infogram, a new research direction in machine learning that focuses on interpretability and fairness in...
Read more1st Place Winner's Blog - Kaggle 2021 Data Science and Machine Learning Survey
Kaggle, the largest global community of data scientists, conducted the 5th annual industry-wide survey that presented a truly comprehensive view of the state of data science and machine learning. A total of 25,973 responses were collected from participants from over 60 countries. Kaggle also launched the Data Science Survey Challenge in w...
Read moreWhy Companies Need to Think About MLOps
For years machine learning (ML) researchers have focused on building outstanding models and figuring out how to squeeze every last drop of performance from them. But many have realized that creating top-performing models doesn’t necessarily equate to having them deliver business value. Often the best models can be very complex and costly ...
Read moreAn Introduction to Time Series Modeling: Traditional Time Series Models and Their Limitations
In the first article in this series, we broke down the preprocessing and feature engineering techniques needed to build high-performing time series models. But we didn’t discuss the models themselves. In this article, we will dig into this. As a quick refresher, time series data has time on the x-axis and the value you are measuring (dema...
Read moreAnnouncing the Fully Managed H2O AI Cloud
The H2O AI Cloud is the leading platform to make and access your own AI models and apps. Customers have had access to the H2O AI Hybrid Cloud for the last year, where they could manage the platform themselves on their favorite cloud or on-prem infrastructure. Today, we’re excited to announce a fully managed version of the H2O AI Cloud. Y...
Read moreH2O.ai Tools for a Beginner
Note: this is a community blog post by Shamil Dilshan Prematunga. It was first published on Medium. Hey, this is not a deep technical blog. I’d like to share the experience I had with H2O tools when I was studying Machine Learning. As a Research Engineer, I am currently working on an area based on Telecommunication. Day by day with my e...
Read moreAmazon Redshift Integration for H2O.ai Model Scoring
We consistently work with our partners on innovative ways to use models in production here at H2O.ai, and we are excited to demonstrate our AWS Redshift integration for model scoring. Amazon Redshift is a very popular data warehouse on AWS. We wanted to expand on the existing capacities of using data from Redshift to train a model on the ...
Read moreBuilding Resilient Supply Chains with AI
A global pandemic, a fundamental shift in the demand for goods and services worldwide, and the recent blockage of a major international trade route have all highlighted the need to build and maintain resilient supply chains. At the foundation of resilient supply chains lie accurate and reliable forecasts. The majority of traditional softwa...
Read moreIntroducing the H2O.ai Wildfire Challenge
We are excited to announce our first AI competition for good – H2O.ai Wildfire Challenge. We’ve structured this challenge to be a global collaborative effort to do good for the world that we share. We want teams to submit their ideas and applications freely, knowing that other teams will learn from what they’ve done to improve their AI ap...
Read moreMLB Player Digital Engagement Forecasting
Are you a baseball fan? If so, you may notice that things are heating up right now as the Major League Baseball (MLB) World Series between the Houston Astros and Atlanta Braves is tied at 1-1. (Figure: MLB Postseason 2021 results as of October 28.) This also reminded me of the MLB Player Digital Engagement Forecasting competition in which my coll...
Read moreAnnouncing the H2O AI Feature Store
We’re really excited to announce the H2O AI Feature Store – The only intelligent feature store in the market. We’ve been working on this for many months with our co-development partner: AT&T. This enabled us to build a first-of-its-kind platform that is designed to be enterprise-grade from day 1. It is built with best-of-breed techno...
Read moreAn Introduction to Time Series Modeling: Time Series Preprocessing and Feature Engineering
Time is the only nonrenewable resource – Sri Ambati, Founder and CEO, H2O.ai. Prediction is very difficult, especially if it’s about the future – Niels Bohr, Nobel Prize-Winning Physicist. Despite its inherent difficulty, every business needs to make predictions. You may want to forecast sales or estimate demand or gauge future inventory ...
Read moreNew Features Now Available with the Latest Release of the H2O AI Cloud 21.10
The Makers here at H2O.ai have been busy building new features and enhancing capabilities across our AI platform. Designed to support our core mission of democratizing AI, these additions to our platform simplify the ability to make AI you can trust, operate it efficiently and innovate with ready-made AI applications. Launched in January ...
Read moreTime Series Forecasting Best Practices
Earlier this year, my colleague Vishal Sharma gave a talk about time series forecasting best practices. The talk was well-received so we decided to turn it into a blog post. Below are some of the highlights from his talk. You can also follow the two software demos and try it yourself using our H2O AI Cloud. (Note: The video links with ...
Read moreImproving NLP Model Performance with Context-Aware Feature Extraction
I would like to share with you a simple yet very effective trick to improve feature engineering for text analytics. After reading this article, you will be able to follow the exact steps and try it yourself using our H2O AI Cloud. First of all, let’s have a look at the off-the-shelf natural language processing (NLP) recipes in H2O Driver...
Read moreFeature Transformation with the H2O AI Cloud
It is well known throughout the data science community that data preparation, pre-processing, and feature engineering are among the most cumbersome parts of the data science workload. So as we continue to innovate here at H2O.ai with our end-to-end automated machine learning (autoML) capabilities, we challenged ourselves to evolve the...
Read moreIntroducing DatatableTon - Python Datatable Tutorials & Exercises
Datatable is a Python library for manipulating tabular data. It supports out-of-memory datasets, multi-threaded data processing and has a flexible API. If this reminds you of R’s data.table, you are spot on because Python’s datatable package is closely related to and inspired by the R library. Version 1.0.0 was released on 1st July,...
Read moreH2O Release 3.34 (Zizler)
There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve added Extended Isolation Forest for improved results on anomaly detection problems, and we’ve implemented the Type III SS test (ANOVAGLM) and the MAXR method to GLM. For existing algorithms, we improved the pe...
Read moreFrom the game of Go to Kaggle: The story of a Kaggle Grandmaster from Taiwan
In conversation with Kunhao Yeh: A Data Scientist and Kaggle Grandmaster. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand...
Read moreVisualizing Large Datasets with H2O-3
Exploratory data analysis is one of the essential parts of any data processing pipeline. However, when the magnitude of data is high, these visualizations become vague. If we were to plot millions of data points, it would become impossible to discern individual data points from each other. The visualized output in such a case is pleasing ...
Read moreInnovation with the H2O AI Cloud
Consumer expectations for responsiveness, personalization, and overall efficiency have risen dramatically over the past several years as technology has become ubiquitous across both our personal and professional lives. These rapidly growing expectations demand an expansion in focus from simply solving narrow use cases with machine learnin...
Read moreInterning with H2O.ai- Robie Gonzales
This blog post is by Robie Gonzales, who has interned with us for the last 8 months. Thank you for your awesome work, Robie! When I started my internship eight months ago, I had minimal knowledge about machine learning and artificial intelligence. Over the course of these months, my experience as a Full Stack Developer has allowed me to ...
Read moreAI-Driven Predictive Maintenance with H2O AI Cloud
According to a study conducted by the Wall Street Journal, unplanned downtime costs industrial manufacturers an estimated $50 billion annually. Forty-two percent of this unplanned downtime can be attributed to equipment failure alone. These downtimes can cause unnecessary delays and, as a result, affect the business. A better and superior al...
Read moreWhat are we buying today?
Note: this is a guest blog post by Shrinidhi Narasimhan. It’s 2021 and recommendation engines are everywhere. Be it online shopping, food, music, or even online dating, the race to provide personalized recommendations to the user has many contenders. The technology of giving users what they need based on their buying strategies or digit...
Read moreThe Emergence of Automated Machine Learning in Industry
This post was originally published by K-Tech, Centre of Excellence for Data Science and AI, powered by NASSCOM. The link of the post can be found here. The concept of Automated Machine Learning has gained much traction recently. Automated Machine Le...
Read moreWhat does it take to win a Kaggle competition? Let's hear it from the winner himself.
In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster. In this interview, I shall be ...
Read moreH2O Integrates with Snowflake Snowpark/Java UDFs: How to better leverage the Snowflake Data Marketplace and deploy In-Database
One of the goals of machine learning is to find unknown predictive features, even hidden from subject matter experts, in datasets that might not be apparent before, and use those 3rd party features to increase the accuracy of the model. A traditional way of doing this was to try and scrape and scour distributed, stagnant data sources on th...
Read moreGetting the best out of H2O.ai’s academic program
“H2O.ai provides impressively scalable implementations of many of the important machine learning tools in a user-friendly environment. Allowing for free academic use sets a generous example for commercial software developers — it is also the way forward in the era of open-source software.” – Professor Trevor J. Hastie, John A. Overdeck ...
Read moreSign Up for Your Free Trial and Explore H2O AI Cloud
We recently launched our 14-day free trial of H2O AI Cloud, giving you the opportunity to get hands-on experience with our newest machine learning platform. H2O AI Cloud is an end-to-end artificial intelligence platform that enables organizations to create, share, and use ...
Read moreHow Much is My Property Worth?
Note: this is a guest blog post by Jaafar Almusaad. How Much is My Property Worth? This is the million-dollar question – both figuratively and literally. Traditionally, qualified property valuers are tasked to answer this question. It’s a lengthy and costly process, but more critically, it’s inconsistent and largely subjective. Mind you, ...
Read moreSafer Sailing with Artificial Intelligence
Last month, the world watched as responders tried to free a cargo ship that had run aground in the Suez Canal. This incident blocked traffic through a waterway that is essential for commerce. Although the location was unusual, ship collisions, collisions of ships with fixed objects, and ...
Read moreWhat it takes to become a World No 1 on Kaggle
In conversation with Guanshuo Xu: A Data Scientist, Kaggle Competitions Grandmaster, and a Ph.D. in Electrical Engineering. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. The intention behind these interviews...
Read moreUnwrap Deep Neural Networks Using H2O Wave and Aletheia for Interpretability and Diagnostics
The use cases and the impact of machine learning can be observed clearly in almost every industry and in applications such as drug discovery and patient data analysis, fraud detection, customer engagement, and workflow optimization. The impact of leveraging AI is clear and understood by the business; however, AI systems are also seen as b...
Read moreShapley summary plots: the latest addition to the H2O.ai’s Explainability arsenal
It is impossible to deploy successful AI models without taking into account or analyzing the risk element involved. Model overfitting, perpetuating historical human bias, and data drift are some of the concerns that need to be taken care of before putting the models into production. At H2O.ai, explainability is an integral part of our ML ...
Read moreH2O.ai Achieves Strong Positioning for Completeness of Vision in the Visionaries Quadrant of the 2021 Gartner Magic Quadrant for Data Science and Machine Learning
At H2O.ai, our mission is to democratize AI, and we believe that driving value from data is a team effort. Often, data engineers must organize and prepare the data, and then data scientists must build models. Once built, the models must be put into production, and IT and DevOps personnel must ...
Read moreSafer Sailing with AI
In the last week, the world watched as responders tried to free a cargo ship that had gone aground in the Suez Canal. This incident blocked traffic through a waterway that is critical for commerce. While the location was an unusual one, ship collisions, allisions, and groundings are not uncommon. With all the technology that mariners hav...
Read moreH2O AI Cloud: Democratizing AI for Every Person and Every Organization
Harnessing AI’s true potential by enabling every employee, customer, and citizen with sophisticated AI technology and easy-to-use AI applications. Democratization is an essential step in the development of AI, and AutoML technologies lie at the heart of it. AutoML tools have played a pivotal role in transforming the way we consume an...
Read moreH2O.ai Is Furthest Ahead for Its Ability to Execute in the Visionaries Quadrant of the 2021 Gartner Report on Data Science and Machine Learning
*This article was originally written in English by SVP of Marketing Read Maloney and translated into Portuguese by Bruna Smith. At H2O.ai, our mission is to democratize artificial intelligence, and we believe that the value generated from data is a team effort. Data must be organized and prepared, generally ...
Read moreH2O.ai Placed Furthest in Completeness of Vision in 2021 Gartner Data Science and Machine Learning Magic Quadrant in the Visionaries Quadrant.
At H2O.ai, our mission is to democratize AI, and we believe driving value from data is a team sport. Data needs to be organized and prepared, often by data engineers, and then models need to be built by data scientists. With models built, they need to be put into production and maintained by IT and DevOps personnel. Finally, these models...
Read moreLearning from others is imperative to success on Kaggle says this Turkish GrandMaster
In conversation with Fatih Öztürk: A Data Scientist and a Kaggle Competition Grandmaster. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want...
Read moreH2O-3 Improvements from Two University Projects
In September 2019 H2O.ai became a silver partner of the Faculty of Informatics at Czech Technical University in Prague. The main goal of this partnership is to make connections between students and companies to prepare an environment where students can use their knowledge in practice and gain real-work experiences. In general, within th...
Read moreData to Production Ready Models to Business Apps in Just a Few Steps
Building a Credit Scoring Model and Business App using H2O. In the journey of a successful credit scoring implementation, multiple stakeholders and different personas are involved at different steps – Business Inputs, Dataset procurement, Data Analysis, Predictive Machine Learning, Data Storytelling, and Dashboarding. H2O.AI platforms such ...
Read moreUsing Python's datatable library seamlessly on Kaggle
Managing large datasets on Kaggle without fearing the out-of-memory error. Datatable is a Python package for manipulating large dataframes. It has been created to provide big data support and enable high performance. This toolkit resembles pandas very closely but is more focused on speed. It supports out-of-memory datasets, multi-thr...
Read moreSuccessful AI: Which Comes First, the Data or the Question?
Successful AI is a business process. Even the most sophisticated models, the latest algorithms, and highly experienced AI experts cannot make AI a practical success unless it is connected to a meaningful business goal. To make that happen, you need good interaction between those with knowledge of the business and the AI team. But ...
Read moreIntroducing H2O AI Cloud
Organizations have made large investments in modernizing their data infrastructure and operations, but most still struggle to drive maximum value from their data. Many companies experimented with building large teams of expert data scientists, and while this approach did produce some valuable models, the cost was high and the timeframes ...
Read moreUsing AI to unearth the unconscious bias in job descriptions
“Diversity is the collective strength of any successful organization.” Unconscious Bias in Job Descriptions: Unconscious bias is a term that affects us all in one way or the other. It is defined as the prejudice or unsupported judgments in favor of or against one thing, person, or group as compared to another, in a way that is usually con...
Read moreH2O Driverless AI 1.9.1: Continuing to Push the Boundaries for Responsible AI
At H2O.ai, we have been busy. Not only do we have our most significant new software launch coming up (details here), but we are also thrilled to announce the latest release of our flagship enterprise platform, H2O Driverless AI 1.9.1. With that said, let’s jump into what is new: Faster Python scoring pipelines with embedded MOJOs for r...
Read moreMeet the Data Scientist who just cannot stop winning on Kaggle.
In conversation with Philipp Singer: A Data Scientist, Kaggle Double Grandmaster, and a Ph.D. in Computer Science. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate an...
Read moreLiqui.do Speeds Credit Scoring for Fair Lending with H2O.ai
Liqui.do is a technological and innovative company developing a platform for leasing equipment for small and medium enterprises. As part of its business to provide a variety of credit options for companies that want to finance capital purchases, Liqui.do needs to rapidly and accurately assess the credit risk and scoring of a customer in o...
Read moreNew Improvements in H2O 3.32.0.2
There is a new minor release of H2O that introduces two useful improvements to our XGBoost integration: interaction constraints and feature interactions. Interaction Constraints: Feature interaction constraints allow users to decide which variables are allowed to interact and which are not. Potential benefits: Better predictive performance...
Read moreIntroducing H2O Wave
For almost a decade, H2O.ai has worked to build open source and commercial products that are on the leading edge of innovation in machine learning, from AutoML to Explainable AI. We are thrilled to announce the release of what we believe to be the future of AI Applications: H2O Wave. Wave is an open source, lightweight Python developmen...
Read moreGrandmaster Series: The inspiring journey of the ‘Beluga’ of Kaggle World 🐋
In conversation with Gábor Fodor: A Data Scientist at H2O.ai and a Kaggle Competitions’ Grandmaster. In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage othe...
Read moreAutomate your Model Documentation using H2O AutoDoc
Create model documentation for supervised learning models in H2O-3 and Scikit-Learn in minutes. The Federal Reserve’s 2011 guidelines state that without adequate documentation, model risk assessment and management would be ineffective. A similar requirement is put forward today by many regulatory and corporate governance bodies. Thus ...
Read moreMyths and Truths About AutoML
All the revolutions we have had to date, both technological and industrial, share one similarity: they are tied to the way humans deal with machines. Previously, processes were carried out very manually and, over time, they underwent a natural evolution toward automation. With ...
Read moreMaximizing your Value from AI
Some organizations have already identified the benefits that can be gained from Artificial Intelligence and Data Science, bringing in talented resources to enable them to build AI models and solutions. But more often than not, the business doesn’t understand the capabilities and huge potential of AI well enough, nor the investments that a...
Read moreAI in the Financial Industry: 8 Key Takeaways from the Bill.com + H2O.ai Fireside Chat
The current global pandemic crisis presents various challenges to businesses in all industries, including financial services institutions, who are monitoring and dealing with the effects of COVID-19 across the world. At a time of a pandemic, it is important that teams get together to share their insights and experience, with the goal of i...
Read moreThe Importance of Explainable AI
This blog post was written by Nick Patience, Co-Founder & Research Director, AI Applications & Platforms at 451 Research, a part of S&P Global Market Intelligence. From its inception in the mid-twentieth century, AI technology has come a long way. What was once purely the topic of science fiction and academic discussion is now...
Read moreBuilding an AI Aware Organization
Responsible AI is paramount when we think about models that impact humans, either directly or indirectly. All the models that are making decisions about people, be that about creditworthiness, insurance claims, HR functions, and even self-driving cars, have a huge impact on humans. We recently hosted James Orton, Parul Pandey, and Sudala...
Read moreH2O on Kubernetes using Helm
Deploying real-world applications to Kubernetes using bare YAML files is a rather complex task, and H2O is no exception, as demonstrated in one of the previous blog posts. Greatly simplified, a cluster of H2O open source machine learning nodes is brought up in the following manner: A headless service to make initial node discovery and ...
Read moreMaking AI a Reality
This blog post focuses on the content discussed in more depth in the free ebook “Practical Advice for Making AI Part of Your Company’s Future”. Do you want to make AI a part of your company? You can’t just mandate AI. But you can lead by example. All too often, especially in companies new to AI and machine learning, team leaders may be ta...
Read moreCombining the power of KNIME and H2O.ai in a single integrated workflow
KNIME and H2O.ai, the two data science pioneers known for their open source platforms, have partnered to further democratize AI. Our approaches are about being open, transparent, and pushing the leading edge of AI. We believe strongly that AI is not for the select few but for everyone. We are taking another step in democratizing AI by ...
Read moreThe Challenges and Benefits of AutoML
Machine Learning and Artificial Intelligence have revolutionized how organizations are utilizing their data. AutoML or Automatic Machine Learning automates and improves the end-to-end data science process. This includes everything from cleaning the data, engineering features, tuning the model, explaining the model, and deploying it into p...
Read moreH2O Release 3.32 (Zermelo)
There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve added RuleFit (an interpretable machine learning algorithm), introduced a new toolbox for model explainability, made Target Encoding work for all classes of problems, and integrated it in our AutoML framewor...
Read more5 Key Elements to Detecting Fraud Quicker With AI
The number of transactions using electronic financial instruments has been increasing by about 23% year over year. The global COVID-19 pandemic has only accelerated that process. Electronic means have become the primary vehicle of how people purchase their goods. With this sudden increase in transactions, fraud detection systems are stres...
Read moreEmpowering Snowflake Users with AI using SQL
At H2O.ai we work with many enterprise customers, all the way from Fortune 500 giants to small startups. What we heard from all these customers as they embark on their data science and machine learning journey is the need to capture and manage more data cost-effectively, and the ability to share that data across their organization to mak...
Read more3 Ways to Ensure Responsible AI Tools are Effective
Since we began our journey making tools for explainable AI (XAI) in late 2016, we’ve learned many lessons, and often the hard way. Through headlines, we’ve seen others grapple with the difficulties of deploying AI systems too. Whether it’s: a healthcare resource allocation system that likely discriminated against millions of black peop...
Read moreAccelerating AI Transformation in Healthcare
The healthcare industry is evolving rapidly with volumes of data and increasing challenges. Early adopters of AI and machine learning in the healthcare space have embraced new data-driven initiatives and are reaping the benefits not only in terms of patient care but also in their own operations. Hospitals, physicians, and laboratories can...
Read more5 Key Considerations for Machine Learning in Fair Lending
This month, we hosted a virtual panel with industry leaders and explainable AI experts from Discover, BLDS, and H2O.ai to discuss the considerations in using machine learning to expand access to credit fairly and transparently and the challenges of governance and regulatory compliance. The event was moderated by Sri Ambati, Founder and CE...
Read moreThe Benefits of Budget Allocation with AI-driven Marketing Mix Models
Excerpt of the white paper: “The Latest in AI Technologies Reinvent Media and Marketing Analytics @ Allergan”. Authors: Akhil Sood, Associate Director @ Marketing Sciences, Allergan; Dr. Michael Proksch, Senior Director @ H2O.ai; Vijay Raghavan, Associate Vice President @ Marketing Sciences, Allergan. Introduction: The call for accountability in...
Read moreMy Experience at the World’s Best AI Company
Blog post by Spencer Loggia. When H2O announced that remote work would continue through the summer due to Covid-19, I was a little disappointed. I expected that it would be difficult to connect with others as a new employee, especially as an intern. My internship now comes to an end, and I realize how completely wrong I was. I’ve met and w...
Read moreWhat it is like to intern at H2O.ai
Blog post by Jasmine Parekh. Let’s be honest, 2020 is not going to go down as a glory year in history, unless something absolutely miraculous happens in the next few months. Generations of high schoolers down the line will sit in history class learning about the pandemic that halted the world. In the face of the virus, everyone around the w...
Read moreDemystifying Artificial Intelligence and Its Role in Business Success
Artificial Intelligence is a widely used term these days, but does everyone know what it means in practice and how to benefit from this innovative technology? Like every buzzword, AI also generates many myths. Among them is the belief that machine learning will replace the work of data scien...
Read moreNLP Models with BERT
H2O Driverless AI 1.9 has just been released, and I am offering a series of articles on the latest innovative features of this Automated Machine Learning solution, starting with the implementation of BERT for NLP tasks. BERT, or “Bidirectional Encoder Representations from Transformers”, is considered today to be the sta...
Read moreExploring the Next Frontier of Automatic Machine Learning with H2O Driverless AI
At H2O.ai, it is our goal to democratize AI by bridging the gap between the State-of-the-Art (SOTA) in machine learning and a user-friendly, enterprise-ready platform. We have been working tirelessly to bring the SOTA from Kaggle competitions to our enterprise platform Driverless AI since its very first release. The growing list of Driver...
Read moreIn a World Where… AI is an Everyday Part of Business
Imagine a dramatically deep voice-over saying “In a world where…” This phrase from old movie trailers conjures up all sorts of futuristic settings, from an alien “world where the sun burns cold”, a Mad Max “world without gas” to a cyborg “world of the not too distant future”. Often the epic science fiction or futuristic stories also have a...
Read moreRunning Sparkling Water in Kubernetes
Sparkling Water can now be executed inside a Kubernetes cluster. Sparkling Water provides a Beta version of Kubernetes support in the form of nightlies. Both Kubernetes deployment modes, cluster and client, are supported. Both Sparkling Water backends and all clients are also ready to be tested. Sparkling Water in Kubernetes is ...
Read moreFrom GLM to GBM – Part 2
How an Economics Nobel Prize could revolutionize insurance and lending. Part 2: The Business Value of a Better Model. Introduction: In Part 1, we proposed better revenue and managing regulatory requirements with machine learning (ML). We made the first part of the argument by showing how gradient boosting machines (GBM), a type of ML, can mat...
Read moreArtificial Intelligence Is Transforming and Boosting Businesses. Understand How and Why
Did you know that artificial intelligence and machine learning are not new concepts? They first emerged in 1956 at Dartmouth College, in the United States, but have been changing and evolving significantly over time. Today, the amount of data a company has available for analysis is enormous, and its growth is...
Read moreOn-Ramp to AI
The path to democratize AI starts with one class. Artificial Intelligence (AI) is like a superhighway: it’s moving fast, evolving, and growing quickly. Like most things in life, data scientists are not born with AI and Machine Learning (ML) knowledge. They learn it. Learning is a journey. At H2O.ai, we are on a mission to democratize AI...
Read moreFrom GLM to GBM - Part 1
How an Economics Nobel Prize could revolutionize insurance and lending. Part 1: A New Solution to an Old Problem. Introduction: Insurance and credit lending are highly regulated industries that have relied heavily on mathematical modeling for decades. In order to provide explainable results for their models, data scientists and statisticians i...
Read moreSparkling Water 3.30.0.3 is out
Sparkling Water is about making machine learning simple, speedy, and scalable with Apache Spark. This blog provides an overview of the following new features: No H2O Client on Spark Driver; Speedups; Automatic String conversion to Categoricals. No H2O Client on Spark Driver: Previously, Sparkling Water always started worker nodes eith...
Read moreAre All Your AI and ML Models Wrong?
We are living in unprecedented times. Our society and economy are experiencing shocks beyond anything we have seen in living history. Beyond the human cost, there is a data science and machine learning elephant in the room (hopefully 2 meters away): Are your predictive models still doing the job you expect them to do? The challenge here i...
Read moreLessons of COVID-19 and Moving Forward: Key Takeaways
This week, we hosted our second virtual panel focused on how AI can empower healthcare organizations to make better decisions and save lives. Improved forecasting and predictions lead to higher chances in managing and mitigating adverse events, such as the COVID-19 pandemic. I’m proud to acknowledge that H2O.ai is committed to helping cus...
Read moreRunning H2O cluster on a Kubernetes cluster
H2O is an open-source, in-memory platform for distributed, scalable machine learning. It is a perfect match for deployment on a Kubernetes cluster, the very modern way of deploying, serving & scaling applications. With the major release 3.30.0.1, released in Q1 2020, H2O obtained first class Kubernetes support. This article explains how t...
Read moreH2O Release 3.30 (Zahradnik)
There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve introduced support for Generalized Additive Models, added an option to build many models in parallel on segments of your dataset, improved support for deploying on Kubernetes, upgraded XGBoost with newly added...
Read moreBrief Perspective on Key Terms and Ideas in Responsible AI
INTRODUCTION: As fields like explainable AI and ethical AI have continued to develop in academia and industry, we have seen a litany of new methodologies that can be applied to improve our ability to trust and understand our machine learning and deep learning models. As a result of this, we’ve seen several buzzwords emerge. In this short po...
Read moreThree Ways Data and AI is Helping Against COVID19
We are in the midst of a global crisis that epidemiologists have warned us about. As of today, 180 countries and sovereign regions have confirmed cases of patients infected with COVID19 (from here). Putting aside evidence that indicates the virulence of the disease could be much worse, the fast spread of the virus and the presence of hi...
Read moreModelling Currently Infected Cases of COVID-19 Using H2O Driverless AI
In response to the COVID-19 pandemic, H2O.ai organized a panel discussion to cover AI in healthcare and some best practices to put in place in order to achieve better outcomes. The attendees had many questions that we did not have the time to cover thoroughly during that 1-hour discussion. We hope ...
Read moreDeploying Models to Maximise the Impact of Machine Learning — Part 1
Introduction to the 4 key pillars of considerations for model deployment (1st part of a blog series). So you have built a machine learning (ML) model which delivers a high level of accuracy and does not overfit. What value does it have now? Well, at the moment, nothing, zero, diddly squat. There is no economic value in a machine learning mo...
Read moreIgniting the AI in Healthcare Community
Yesterday we held our first Community Discussion on AI in Healthcare. Our CEO and founder, Sri Ambati, led the discussion between Niki Athanasiadou, Marios Michailidis, one of our Grandmasters, and myself. We had nearly 1,300 participants registered from over 45 countries, and over half of those joined live; others are viewing the replay. ...
Read moreCOVID-19: Doing Good with Data + AI
During times of severe societal strain, individuals have historically shown an inclination to offer aid and assistance. Often these sacrifices have been at great cost to life or livelihood. In other cases, the efforts have been seemingly more mundane but nevertheless still essential. The efforts of the over 10,000 women code breakers of W...
Read moreTake Your Pega CRM on the Road to AI Transformation
How well does your company know its customers and prospects? Are your people empowered with relevant information when they interact with clients? What guides your employees at every step of the customer journey? Every successful company depends on how well it can address each of these questions. Investments in Customer Relationship Manage...
Read moreHow H2O.ai is Reinventing Healthcare with AI
H2O.ai is hosting a virtual Meetup on AI and Healthcare: Best Practices for Better Outcomes. Join us on 26th March, for a community discussion to collaborate with us and leading healthcare organizations to share ideas and best practices including predicting hospital staffing needs, ICU transfers, as well as sepsis detection and more. Reg...
Read moreSummary of a Responsible Machine Learning Workflow
A paper resulting from a collaboration between H2O.ai and BLDS, LLC was recently published in a special “Machine Learning with Python” issue of the journal Information (https://www.mdpi.com/2078-2489/11/3/137). In “A Responsible Machine Learning Workflow with Focus on Interpretable Models, Post-hoc Explanation, and Discrimination Testing...
Read moreIt is a privilege to serve the world in its hour of need – H2O.ai response to the COVID-19 pandemic
During the COVID-19 pandemic, our world, our nations, states, counties, cities and communities face an unprecedented challenge with an urgent need to help our citizens and ultimately our national and global economy. At highest risk are senior citizens, at-risk populations (individuals with immunodeficiency, hypertension, diabetes) and our...
Read moreHealth Outcomes and the Miracle of Data
In 1846, a physician named Ignatz Semmelweis, located at the Allgemeine Krankenhaus in Vienna, faced a dire healthcare crisis. He observed that the maternity ward in his own hospital (as well as those in other area hospitals) had a maternal mortality rate of over 15%. That is, one out of every six mothers who came to his hospital to give ...
Read moreDetecting Money Laundering Networks Using H2O Driverless AI
Note: Dr. Ashrith Barthur (Principal Security Scientist, H2O.ai) and Sandip Sharma (Director of Solution Engineering, H2O.ai) will be speaking about solving money laundering and other real-world problems using machine learning at our upcoming webinar. You can grab a spot here. Artificial Intelligence has evolved from being a buzzword t...
Read moreA Letter to the Makers at H2O.ai
To Team All, Many of you have already seen this alert from me in different variations over the last few weeks. Some of you are already remote and following some of these precautions. Starting today please make all meetings default to virtual or remote. Use Zoom, Webex, Slack, FaceTime, WhatsApp and WeChat to keep in touch with your teammates...
Read moreInsights From the New 2020 Gartner Magic Quadrant For Cloud AI Developer Services
We are excited to be named a Visionary in the new Gartner Magic Quadrant for Cloud AI Developer Services (Feb 2020), and have been recognized for both our completeness of vision and ability to execute in the emerging market for cloud-hosted artificial intelligence (AI) services for application developers. This is the second Gartner MQ tha...
Read moreAI & ML Platforms: My Fresh Look at H2O.ai Technology
2020: A new year, a new decade, and with that, I’m taking a new and deeper look at the technology H2O.ai offers for building AI and machine learning systems. I’ve been interested in H2O.ai since its early days as a company (it was 0xdata back then) in 2014. My involvement had been only peripheral, but now I’ve begun to work with this comp...
Read moreInterview with Patrick Hall | Machine Learning, H2O.ai & Machine Learning Interpretability
In this episode of Chai Time Data Science, Sanyam Bhutani interviews Patrick Hall, Sr. Director of Product at H2O.ai. Patrick has a background in Math and has completed an MS Course in Analytics. In this interview they talk all about Patrick’s journey into ML, ML Interpretability and his journey at H2O.ai, how his work has ev...
Read moreKey Takeaways from the 2020 Gartner Magic Quadrant for Data Science and Machine Learning
We have been named a Visionary in the Gartner Magic Quadrant for Data Science and Machine Learning Platforms (Feb 2020). We have been positioned furthest to the right for completeness of vision among all the vendors evaluated in the quadrant. So let’s walk you through the key strengths of our machine learning platforms. Automatic Machine Learn...
Read moreBlink: Data to AI/ML Production Pipeline Code in Just a Few Clicks
You have the data and now want to build a really really good AI/ML model and deliver to production. There are three options available today: Write the code yourself in a Jupyter notebook/R Studio etc., for training/validation and dev-ops model handoff. You decided to do the feature engineering also. Build your own features like above,...
Read moreSpeed up your Data Analysis with Python’s Datatable package
A while ago, I did a write-up on Python’s Datatable library. The article was an overview of the datatable package, whose focus is on big data support and high performance. The article also compared datatable’s performance with the pandas library on certain parameters. This is the second article in the series with a two-fold objective: ...
Read moreParallel Grid Search in H2O
H2O-3 is, at its core, a platform for distributed, in-memory computing. On top of the distributed computation platform, the machine learning algorithms are implemented. At H2O.ai, we design every operation, be it data transformation, training of machine learning models, or even parsing, to utilize the distributed computation model. In ord...
Read moreThe Super Bowl and Data Science: Changing the NFL with the Power of Machine Learning
Super Bowl LIV came and went. The San Francisco 49ers vs the Kansas City Chiefs. Personally, being from The Bay, I was rooting for the 49ers, but you can’t always get what you want. Whoever came out on top, though, we were all looking forward to a great game full of fantastic plays and the kind of gridiron tenacity where players lay i...
Read moreGrandmaster Series: How a Passion for Numbers Turned This Mechanical Engineer into a Kaggle Grandmaster
In conversation with Sudalai Rajkumar: A Kaggle Double Grandmaster and a Data Scientist at H2O.ai. It is rightly said that one should never seek praise. Instead, let the effort speak for itself. One of the essential traits of successful people is to never brag about their success but instead keep learning along the way. In the data science ...
Read moreHow H2O propels data scientists ahead of itself: enhancing Driverless AI models with advanced options, recipes and visualizations
H2O.ai engineers continually innovate and introduce new techniques by adopting the latest research, working on cutting-edge use cases, and participating in and winning machine learning competitions like Kaggle. But thanks to the explosion of AI research and applications, even the most advanced automated machine learning platform like H2O Drive...
Read moreH2O Release 3.28 (Yu)
There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve introduced support for Hierarchical GLM, added an option to parallelize Grid Search, upgraded XGBoost with newly added features, and improved our AutoML framework. The release is named after Bin Yu. Hierarchi...
Read moreInterview with Arno Candel | AutoML | Physics | CTDS.Show
In this episode, Sanyam Bhutani interviews Dr. Arno Candel, CTO at H2O.ai. They talk about Arno’s journey into the field, with amazing comments and insights by Arno applicable to the field. They talk all about Arno’s journey, ML, and Automated Machine Learning, broadly speaking. Arno’s journey from Physics to Software Engineering to Machine L...
Read moreWhy you should care about debugging machine learning models
This blog post was originally published here. Authors: Patrick Hall and Andrew Burt. For all the excitement about machine learning (ML), there are serious impediments to its widespread adoption. Not least is the broadening realization that ML models can fail. And that’s why model debugging, the art and science of understanding and fixing p...
Read moreHow to Effectively Employ an AI Strategy in your Business
Artificial Intelligence has evolved from being a buzzword to a reality today. Companies with expertise in machine learning systems are looking to graduate to Artificial Intelligence-based technologies. The enterprises that do not yet have a machine learning culture are trying to devise a strategy to put one in place. Amidst t...
Read moreScalable AutoML in H2O
Note: I’m grateful to Dr. Erin LeDell for the suggestions and corrections on the writeup. All of the images used here are from the talk’s slides. Erin LeDell’s talk was aimed at AutoML (Automated Machine Learning), broadly speaking, followed by an overview of H2O’s Open Source Project and the library. H2O AutoML provides an easy-to-use ...
Read moreMeet Yauhen Babakhin: The first and the only Kaggle Grandmaster from Belarus
There is more to competitive Data Science than simply applying algorithms to get the best possible model. The main takeaway from participating in these competitions is that they provide an excellent opportunity for learning and skill-building. The learnings can then be utilized in one’s academic or professional life. Kaggle is one of th...
Read moreClimbing the AI and ML Maturity Model Curve
AI/ML Maturity Model Curve/Steps: AI/ML Maturity models are published and updated periodically by a lot of vendors. The end goal is almost always about effecting transformation and automating processes in a short period and making AI the DNA/core of the business. One of the biggest challenges for businesses today is to clearly define what succ...
Read moreHow to write a Transformer Recipe for Driverless AI
What is a transformer recipe? A transformer (or feature) recipe is a collection of programmatic steps, the same steps that a data scientist would write as code to build a column transformation. The recipe makes it possible to engineer the transformer in training and in production. The transformer recipe, and recipes in general, provide a...
Read moreNovel Ways To Use Driverless AI
I am biased when I write that Driverless AI is amazing, but what’s more amazing is how I see customers using it. As a Sales Engineer, my job has been to help our customers and prospects use our flagship product. In return, they give us valuable feedback and talk about how they used it. Feedback is gold to us. Driverless AI has evolved in...
Read moreImage Tasks on H2O Driverless AI
I’d like to thank Grandmaster Yauhen Babakhin for reviewing the drafts and the very useful corrections & suggestions. Link to the video. Introduction: In this talk, Kaggle Grandmaster and Data Scientist at H2O.ai Yauhen Babakhin shows us a few prototype demos of how DriverlessAI’s upcoming release will work with Image Data and the relat...
Read moreAccelerate Machine Learning workflows with H2O.ai Driverless AI on Red Hat OpenShift, Enterprise Kubernetes Platform
Organizations globally are operationalizing containers and Kubernetes to accelerate Machine Learning lifecycles as these technologies provide data scientists and software developers with much needed agility, flexibility, portability, and scalability to train, test, and deploy ML models in production. Red Hat OpenShift is the industry’s mo...
Read moreImporting, Inspecting, and Scoring With MOJO Models Inside H2O
Machine-learning models created with H2O may be exported in two basic ways: binary format, or Model Object, Optimized (MOJO). An H2O model can be saved in a binary format, which is tied to the very specific version of H2O it has been created with. There are multiple reasons for such a restriction. One of the important reasons is that...
Read moreNatural Language Processing in H2O’s Driverless AI
Note: I’d like to thank Grandmaster SRK for a lot of suggestions and corrections on the writeup. Note: All images used here are from the talk. Link to the slides. Link to the video. Note 2: All of the discussion here is related to NLP. DriverlessAI also supports other domains that are covered in other talks and posts (releasing soon). Driv...
Read moreHighlights of H2O World New York 2019
H2O World New York happened a few days ago and we are still in awe of the conference. It is rewarding to see such a strong community and recognized industry professionals making meaningful connections and learning with each other. We are grateful for having so many makers and customers joining us – in person and via live stream – for a fu...
Read moreTakeaways from the World’s largest Kaggle Grandmaster Panel
Disclaimer: We were made aware by Kaggle of adversarial actions by one of the members of this panel. This panelist is no longer a Kaggle Grandmaster and no longer affiliated with H2O.ai as of January 10th, 2020. Personally, I’m a firm believer and fan of Kaggle and definitely look at it as the home of Data Science. ...
Read moreA Full-Time ML Role, 1 Million Blog Views, 10k Podcast Downloads: A Community Taught ML Engineer
Content originally posted in HackerNoon and Towards Data Science. 15th of October, 2019 marks a special milestone, actually quite a few milestones. So I considered sharing it in the form of a blog post, on a publication that has been home to all of my posts. The online community has been too kind to me and these blog posts have been a method ...
Read moreThe Data Scientist who rules the "Data Science for Good" competitions on Kaggle
In conversation with Shivam Bansal: A Data Scientist, a Kaggle Kernels Grandmaster, and three-time winner of Kaggle’s Data Science for Good Competition. Communication is an art and a useful tool in the Data Science domain. Being able to communicate the insights is necessary so that others can take the required actions based on the resu...
Read moreA Deep Dive into H2O’s AutoML
The demand for machine learning systems has soared over the past few years. This is largely due to the success of Machine Learning techniques in a wide range of applications. AutoML is fundamentally changing the face of ML-based solutions today by enabling people from diverse backgrounds to use machine learning models to address complex ...
Read moreMake your own AI — Add Your Game to Auto-ML Models
When Features and Algorithms compete, your Business Use Case(s) wins! H2O Driverless AI is an Automatic Feature Engineering/Machine Learning platform to build AI/ML models on tabular data. Driverless AI can build supervised learning models for Time Series forecasts, Regression, Classification, etc. It supports a myriad of built-i...
Read moreH2O World New York: The Countdown is On!
Every H2O World is magical. The preparation for the conference starts many months in advance and we put a lot of effort and love in every single detail to provide our beloved community with the best experience possible. Our upcoming H2O World New York on October 22 is the third edition I work on as part of the marketing team at H2O.ai. My...
Read more5 Key Takeaways On Overcoming Gender and Diversity Barriers
Overcoming gender and diversity barriers in the workplace is a challenge for many industries. Therefore, listening to women and discussing the topic is the first step towards finding out how to address gender bias and possible inequalities. Last month, H2O.ai organized a panel in New York: Breaking gender and diversity barriers in machi...
Read morePredicting Failures from Sensor Data using AI/ML — Part 2
This is Part 2 of the blog post series and a continuation of the original post, Predicting Failures from Sensor Data using AI/ML — Part 1. Missing Values & Data Imbalance: One of the things to note is that the hard-disk data set has a lot of missing values across its columns. Check out the Missing Data Heat Map on the training data set — ...
Read moreH2O Driverless AI: The Workbench for Data Science
This blog was written by Rohan Gupta and originally published here. 1. Introduction: In today’s world, being a Data Scientist is not limited to those with technical knowledge. While it is recommended and sometimes important to know a little bit of code, you can get by with just intuitive knowledge. Especially if you’re on H2O’s Driverle...
Read moreH2O Driverless AI Acceleration with Intel DAAL
This week at Strata NY 2019 we will be demoing a custom recipe that incorporates the Intel Data Analytics Acceleration Library (DAAL) algorithm into Driverless AI. This blog will provide an introduction to Intel DAAL and how the Make-Your-Own-Recipe capability extends H2O Driverless AI. If you are at Strata NY 2019, stop by the Intel bo...
Read moreCustom recipes for Driverless AI: Prophet and pmdarima cases
Last updated: 09/23/19 H2O Driverless AI provides a great new feature called “custom recipes”. These recipes are essentially custom snippets of code which can incorporate any machine learning algorithm, any scorer/metric and any feature transformer. A user can create custom recipes using Python utilizing any external library or his/her o...
Read moreFrom Academia to Kaggle and H2O.ai: How a Physicist found love in Data Science
Learning and taking inspiration from others is always helpful. It makes even more sense in the Data Science realm, which is continuously being bombarded with new courses, MOOCs, and recommendations with every passing day. So many choices can be not only overwhelming but also perplexing at times. With this thought in mind, we bring...
Read moreRegression Metrics' Guide
Introduction: As part of my role within the automated machine learning space with H2O.ai and Driverless AI, I have seen that many times people struggle to find the right optimization metric for their data science problems. This process is even more challenging in regression problems where the errors are often not bounded like you norma...
Series ‘D’emocratize
Last month was very emotional for me and I suspect it was the same for many of my fellow Makers at H2O.ai. The news broke that H2O.ai raised its Series D funding of $72.5 million led by Goldman Sachs and Ping An. While some of my friends were ecstatic for me, I felt like a big weight had been lifted off me. The best word to describe what ...
Driverless AI can help you choose what you consume next
Last updated: 09/06/19 Steve Jobs once said, “A lot of times, people don’t know what they want until you show it to them”. This makes sense, especially in this era of constant choice overload. Consumers today have access to a plethora of products just at the click of their mouse. These innumerable choices can sometimes turn out to be ...
Startup Aims to Democratize AI
Adam Janofsky at the Wall Street Journal wrote a wonderful article about our company, and our eloquent and philosophical CEO and Founder, Sri Ambati. The makers at H2O.ai believe deeply in our mission to democratize AI for everyone, and we can see a future where every company can be an AI company. Read more below, and enjoy! Startup Aims ...
Predicting Failures from Sensor Data using AI/ML— Part 1
Last updated: 08/26/19 Whether it’s healthcare, manufacturing or anything else that we depend on, personally or in business, prevention of a problem is always known to be better than cure! Classic prevention techniques involve time-based checks to see how things are progressing, positively or negatively. Time-based chec...
New Innovations in Driverless AI
What’s new in Driverless AI: We’re super excited to announce the latest release of H2O Driverless AI. This is a major release with a ton of new features and functionality. Let’s quickly dig into all of that: Make Your Own AI with Recipes for Every Use Case: In the last year, Driverless AI introduced time-series and NLP recipes to meet the...
Interns Gonna Make
Blog post by Megan Chan. When I first walked through the front doors of the H2O.ai Mountain View office, I have to admit, thoughts of robots, cyborgs, and Arnold Schwarzenegger as The Terminator were in the back of my mind. However, my initial preconceived notions were quickly put to rest. I am a third-year college intern studying Psycholog...
A Maker Data Scientist’s journey: from Sudoku to Kaggle
“If you put enough smart people together in one space, good things happen.” (Erik Hersman) One of the perks of being a part of H2O.ai is that you get to work with some of the brightest minds on the planet. Here you get to closely engage with people who have a great deal of experience, as well as expertise. One such set of specialists here ar...
My Summer Internship at H2O.ai
I can’t believe the summer is nearing an end. What an amazing experience I have had at H2O.ai. As I reflect back, I am so fortunate to have learned so much, formed meaningful relationships, developed people skills and applied my creativity. The whole team has been so encouraging, supportive, and inviting throughout my internship, makin...
Detecting Sarcasm is difficult, but AI may have an answer
Recently, while shopping for a laptop bag, I stumbled upon a pretty amusing customer review: “This is the best laptop bag ever. It is so good that within two months of use, it is worthy of being used as a grocery bag.” The innate sarcasm in the review is evident as the user isn’t happy with the quality of the bag. However, as the sentence...
Mitigating Bias in AI/ML Models with Disparate Impact Analysis
Everyone understands that the biggest plus of using AI/ML models is a better automation of day-to-day business decisions, personalized customer service, enhanced user experience, waste elimination, better ROI, etc. The common question that comes up often though is — How can we be sure that the AI/ML decisions are free from bias/discrimina...
H2O Release 3.26 (Yau)
There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve introduced the ability to define a Custom Loss Function in our GBM implementation, and we’ve extended the portfolio of our machine learning algorithms with the implementation of the SVM algorithm. The release...
A Driverless Approach to Make Forecasting Easy — Part 1
You are from the supply chain department or in a role in charge of creating future estimates of product sales, patient admissions, retail store staffing, energy use, ticket sales, etc., based on historical data. A common problem is to forecast numbers one week, 4 weeks, 6 months or 1–5 years, etc., into the future, basically short term &...
Custom Machine Learning Recipes: The ingredients for success
Last updated: 07/23/19 Machine learning is akin to cooking in several ways. A perfect dish originates from a tried-and-tested recipe, has the right combination of ingredients, and is baked at just the right temperature. Successful AI solutions work on the same principle. One needs fresh and right quality ingredients in the form of data, ...
AI for Smarter Manufacturing
Manufacturing is a centuries-old industry and has seen significant changes dating back to the first Industrial Revolution in the late 18th century. The use of conveyor belt assembly lines to replace assembly workers, newer precision robot technologies to further reduce manufacturing time, advances in ERP, historian databases, stora...
Leads to Leases
There is such a large amount of unstructured data being produced by companies. I personally find it so interesting that there is so much meaning and hidden value in text, audio, and visual content. Until recently, much of this data would go unused. However, since the rise of machine learning and artificial intelligence, it became possibl...
Getting started with H2O using Flow
This blog was originally published on towardsdatascience: https://towardsdatascience.com/getting-started-with-h2o-using-flow-b560b5d969b8 A look into H2O’s open-source UI for combining code execution, text, plots, and rich media in a single document. Data collection is easy. Decision making is hard. Today, we have access to a humongous...
ArmadaHealth Uses AI to Match Patients with Specialists to Improve Health Outcomes
As an intern for H2O.ai, I am amazed to see how instrumental AI has been in transforming people’s lives for the better. Especially in healthcare, AI is bringing increased efficiency, ease, and helping people lead healthier lives. In this blog, I learned about how AI is helping potential patients find the right specialist for their needs a...
Toward AutoML for Regulated Industry with H2O Driverless AI
Predictive models in financial services must comply with a complex regime of regulations including the Equal Credit Opportunity Act (ECOA), the Fair Credit Reporting Act (FCRA), and the Federal Reserve’s S.R. 11-7 Guidance on Model Risk Management. Among many other requirements, these and other applicable regulations stipulate predictive ...
Underwrite.ai Transforms Credit Risk Decision-Making Using AI
Determining credit has been done by traditional techniques for decades. The challenge with traditional credit underwriting is that it doesn’t take into account all of the various aspects or features of an individual’s credit ability. Underwrite.ai, a new credit startup, saw this as an opportunity to apply machine learning and AI to impro...
The Reproductive Science Center of SF Bay Area uses AI to Treat Infertility
Having your own baby may be a dream that many people have but some cannot realize until they seek specialized help. The Reproductive Science Center of SF Bay Area is one of the pioneer organizations conducting in-vitro fertilization. They strive to produce healthy babies for their patients. However, every patient has their own set of obst...
Machine Learning on VMware: Training a Model with H2O.ai Tools, Inference using a REST Server and Kubernetes
This blog was originally posted by Justin Murray of VMware and can be accessed here. In this article, we explore the tools and process for (1) training a machine learning model on a given dataset using the H2O Driverless AI (DAI) tool, and (2) deploying a trained model, as part of a scoring pipeline, to a REST server for use by busi...
An Overview of Python’s Datatable package
This blog originally appeared on Towardsdatascience.com. “There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days”: Eric Schmidt. If you are an R user, chances are that you have already been using the data.ta...
Building an Interpretable & Deployable Propensity AI/ML Model in 7 Steps…
To start with, you may have a tabular data set with a combination of dates/timestamps, categorical values, text strings, and numeric values. A business sponsor wants to build a Propensity to Buy model from historical data. How many steps does it take? Let’s find out. We are going to use H2O’s Driverless AI instance with 1 GPU (optional...
Forrester Research recognizes H2O.ai as a leader in the New Automatic Machine Learning Wave
Today, The Forrester New Wave™: Automation-Focused Machine Learning Solutions, Q2 2019 was published by Forrester Research. We are thrilled that this leading analyst firm recognized us as a clear leader in their Automatic Machine Learning evaluation. We could not be prouder of our unwavering strategy and hard work that we believe is prop...
H2O.ai Automatic Machine Learning on Red Hat OpenShift Container Platform Delivers Data Science Ease and Flexibility at Scale
Last week at Red Hat Summit in Boston, Sri Ambati, CEO and Founder, demonstrated how to use our award-winning automatic machine learning platform, H2O Driverless AI, on Red Hat OpenShift Container Platform. You can watch the replay here. What we showed not only helps data scientists achieve results, it also enables them to scale their ...
6 Tips to Having it All
I posted this blog on Medium two years ago, thought I’d share a slight rework of it with all the Mothers and Makers out there again. It’s Mother’s Day, and today is when I count my blessings. I am the mother of a wonderful blended family. I have four children of my own, and three stepchildren. Do the math… that’s 7! They are all great you...
AI/ML Projects — Don’t get stymied in the last mile
Data Scientists build AI/ML models from data, and then deploy them to production, in addition to a plethora of tasks around data insights, data cleansing, etc. Part of the Data Scientist job description/requirement is making models available for transparency, auditability as well as explainability for both regulators as well as internal bu...
Hortifrut uses AI to Determine the Freshness of Blueberries
Who doesn’t love sweet, delicious blueberries? Providing a steady supply of beautiful, tasty berries to the market is no small effort and Hortifrut, based in Chile, has been growing and distributing berries for the last 30 years. Today, they are using AI to provide fresh berries to the world every day. Hortifrut, the largest global producer ...
Can Your Machine Learning Model Be Hacked?!
I recently published a longer piece on security vulnerabilities and potential defenses for machine learning models. Here’s a synopsis. Introduction: Today it seems like there are about five major varieties of attacks against machine learning (ML) models and some general concerns and solutions of which to be aware. I’ll address them one-by-o...
H2O Driverless AI Updates
We are excited to announce the new release of H2O Driverless AI with lots of improved features. Below are some of the exciting new features we have added: Version 1.6.1 LTS (April 18, 2019) – Available here. Several improvements for MLI (partial dependence plots, Shapley values). Improved documentation for model deployment, time-series ...
H2O World Explainable Machine Learning Discussions Recap
Earlier this year, in the lead up to and during H2O World, I was lucky enough to moderate discussions around applications of explainable machine learning (ML) with industry-leading practitioners and thinkers. This post contains links to these discussions, written answers and pertinent resources for some of the most common questions asked ...
H2O-3, Sparkling Water and Enterprise Steam Updates
We are excited to announce the new release of H2O Core, Sparkling Water and Enterprise Steam. Below are some of the new features we have added: H2O-3 Yates (3.24.0.1) – 3/31/2019. Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html Bug [PUBDEV-6159] – The AutoMLTest.java test suite now runs correctly on a local mach...
H2O Release 3.24 (Yates)
There’s a new major release of H2O, and it’s packed with new features and fixes! Among the big new features in this release, we’ve introduced cross-version support for model import, added new features for model interpretation, provided much-improved support for reading data from Apache Hive, and included various algorithm and AutoML impr...
Building AI/ML models on Lending Club Data, with H2O.ai — Part 1
Lending Club publishes its basic loan databases to the public and a full version to its customers — anonymized of course. You can find the download page from this link (screenshot below): The publicly downloadable loan data has various attributes — roughly 150+ columns that have categorical, numeric, text and date fields. It also has a ‘...
AI/ML Model Scoring - What Good Looks Like in Production
One of the main reasons why we build AI/Machine Learning models is for them to be used in production to support expert decision making. Whether your business is deciding what creatives your customers should be getting on emails or determining a product recommendation for a web page, AI/Models provide relevance/context to customers to drive ...
Machine Learning with H2O – the Benefits of VMware
This blog was originally posted by Justin Murray of VMware and can be accessed here. This brief article introduces a short 4.5 minute video that explains the reasons why VMware vSphere is a great platform for data scientists/engineers to use as their base operating platform. The video then demonstrates an example of this, showing a data...
How to explain a model with H2O Driverless AI
The ability to explain and trust the outcome of an AI-driven business decision is now a crucial aspect of the data science journey. There are many tools in the marketplace that claim to provide transparency and interpretability around machine learning models but how does one actually explain a model? H2O Driverless AI provides robust inte...
Boosting your ROI with AutoML & Automatic Feature Engineering
If your business has started using AI/ML tools or just started to think about it, this blog is for you. Whether you are a data scientist, VP of data science or a line-of-business owner, you are probably wondering how AI will impact your organization in various ways or why your current strategies are not working somehow. If you are not ...
What is Your AI Thinking? Part 3
In the past two posts we’ve learned a little about interpretable machine learning in general. In this post, we will focus on how to accomplish interpretable machine learning using H2O Driverless AI. To review, the past two posts discussed: exploratory data analysis (EDA), accurate and interpretable models, global explanations, local...
8 Tips to Make AI Happen Without Getting Fired
“AI is the fastest growing workload on the planet,” Mike Gualtieri of Forrester Research. Last week, during H2O World San Francisco, we had the privilege to hear featured speaker Mike Gualtieri from Forrester Research offer tips on how to make AI happen without getting fired. This knowledge, he explained, was acquired by talking to enterp...
The Journey of Pi and AI: An AI conference with heart
I was in San Francisco this (past) week as part of H2O World 2019. I flew in the week before and took a red-eye flight back home right after the conference on Tuesday night. Like any technology conference, this one had fantastic presentations, training, and product roadmap presentations. We even live streamed it if you couldn’t be there i...
Key Takeaways from the Gartner Magic Quadrant For Data Science & Machine Learning
The Gartner Magic Quadrant for Data Science and Machine Learning Platforms (Jan 2019) is out and H2O.ai has been named a Visionary. The Gartner MQ evaluates platforms that enable expert data scientists, citizen data scientists and application developers to create, deploy and manage their own advanced analytic models. H2O.ai Key Highlights...
What is Your AI Thinking? Part 2
Explaining AI to the Business Person: Welcome to part 2 of our blog series: What is Your AI Thinking? We will explore some of the most promising testing methods for enhancing trust in AI and machine learning models and systems. We will also cover the best practice of model documentation from a business and regulatory standpoint. More Techniq...
H2O New Year releases
There were two releases shortly after each other. First, on December 21st, there was a minor (fix) release 3.22.0.3. It was immediately followed by a more major release (but still on the 3.22 branch), codename Xu, named after mathematician Jinchao Xu, whose work is focused on deep neural networks, besides many other fields of research. Of course, th...
What is Your AI Thinking? Part 1
Explaining AI to the Business Person: Explainable AI is in the news, and for good reason. Financial services companies have cited the ability to explain AI-based decisions as one of the critical roadblocks to further adoption of AI for their industry. Moreover, interpretability, fairness, and transparency of data-driven decision support sy...
Finally, You Can Plot H2O Decision Trees in R
Creating and plotting decision trees (like the one below) for the models created in H2O will be the main objective of this post. Figure 1. Decision Tree Visualization in R. Decision Trees with H2O: With release 3.22.0.1, H2O-3 (a.k.a. open source H2O or simply H2O) added to its family of tree-based algorithms (which already included DR...
Celebrating our community and wins!
The last year was an amazing year at H2O.ai. We organized two H2O World events, gathering thousands of attendees in person and online both in New York and London. Throughout the year, we garnered multiple industry awards and honors for AI and machine learning, but our customers received awards as well for the work they are doing with our techn...
What Business Leaders Need to Know About AI
The interest around artificial intelligence (AI) is at an all-time fevered pitch right now, and it’s important to understand why. AI can solve real business problems and address very complex situations. Organizations and business leaders should start with the idea of how AI can help by identifying a business problem or use case that they c...
Finding Clarity in the Automated Modeling Space
There is an arms race happening in the Data Science and Machine Learning space. It’s the race toward automation. Granted, the questions we as Data Scientists are asked to solve for will never be automated, but many of the routine tasks will be. What are these routine tasks? They range from data ingestion to feature generation. Then we have l...
For Today’s BI Analyst - Accelerating your AI/ML efforts with Driverless AI
Whether you are starting out as a novice data scientist or a veteran in AI and Machine Learning, modern tools can guide you in creating some of the best models from your data. Not to mention, ease of moving models to production. Also don’t forget the experienced BI Analysts in your organization, who want to play with data science, only t...
The Making of H2O Driverless AI - Automatic Machine Learning
It is my pleasure to share with you some never before exposed nuggets and insights from the making of H2O Driverless AI, our latest automatic machine learning product on our mission to democratize AI. This has been truly a team effort, and I couldn’t be more proud of our brilliant makers who continue to relentlessly create and innovate. T...
Gratitude and thank you, makers!
Makers, Happy Thanksgiving – Hope you get to spend time with your loved ones this week. Thank them on our behalf, on your own, thank our neighbors, thank our teachers, thank our firemen, doctors, our farmers, our uber/lyft drivers, our engineers, our assistants, painters, news writers, bartenders, our chefs and a million others who play the...
New features in H2O 3.22
Xia Release (H2O 3.22): There’s a new major release of H2O and it’s packed with new features and fixes! Among the big new features in this release, we introduce Isolation Forest to our portfolio of machine learning algorithms and integrate the XGBoost algorithm into our AutoML framework. The release is named after Zhihong Xia. Isolation ...
Top 5 things you should know about H2O World London
We had a blast at H2O World London last week! With a record number of attendees on-site and through the live stream, it’s clear that our AI and machine learning conference was indeed a huge success and we strongly believe this achievement is a result of dedicated preparation and great love – for and from – our community and makers. So, fi...
Anomaly Detection with Isolation Forests using H2O
Introduction: Anomaly detection is a common data science problem where the goal is to identify odd or suspicious observations, events, or items in our data that might be indicative of some issues in our data collection process (such as broken sensors, typos in collected forms, etc.) or unexpected events like security breaches, server failu...
Launching the Academic Program … OR ... What Made My First Four Weeks at H2O.ai so Special!
We just launched the H2O.ai Academic Program at our sold-out H2O World London. With nearly 1000 people in attendance, we received the first online sign-up forms submitted by professors and students alike. This program will massively democratize AI in academia, increasing the number of AI-skilled graduates – with both technical and busine...
Welcome H2O.ai's Driverless AI Community!
I am very excited to announce the formation of the inaugural community for H2O Driverless AI users. The Driverless AI Community is open for anyone looking to engage with other users as well as experts from H2O.ai’s Driverless AI. Driverless AI is an award-winning automatic machine learning platform that does “AI to do AI” to solve re...
How This AI Tool Breathes New Life Into Data Science
Ask any data scientist in your workplace. Any Data Science Supervised Learning ML/AI project will go through many steps and iterations before it can be put in production. Starting with the question of “Are we solving for a regression or classification problem?” Data Collection & Curation Are there Outliers? What is the Distribu...
What does NVIDIA’s Rapids platform mean for the Data Science community?
Today NVIDIA announced the launch of the RAPIDS suite of software libraries to enable GPU acceleration for data science workflows and we’re excited to partner with NVIDIA to bring GPU accelerated open source technology to the machine learning and AI community. “Machine learning is transforming businesses and NVIDIA GPUs are speeding...
Automatic Feature Engineering for Text Analytics - The Latest Addition to Our Kaggle Grandmasters' Recipes
According to Kaggle’s ‘The State of Machine Learning and Data Science’ survey, text data is the second most used data type at work for data scientists. There are a lot of interesting text analytics applications like sentiment prediction, product categorization, document classification and so on. In the latest version (1.3) of our Driver...
Key Takeaways from the Forrester Notebook Wave
The Forrester Wave: Notebook-Based Predictive Analytics and Machine Learning Solutions, Q3 2018 is out, and H2O.ai is a Strong Performer! The report looks at machine learning platforms centered on R and Python languages using notebooks like Jupyter and Zeppelin. Vendors are evaluated along three dimensions including market presence, curre...
H2O for Inexperienced Users
Some background: I am a rising senior in high school, and in the summer of 2018, I interned at H2O.ai. With no ML experience beyond Andrew Ng’s Introduction to Machine Learning course on Coursera and a couple of his deep learning courses, I initially found myself slightly overwhelmed by the variety of new algorithms H2O has to offer in both ...
Interpretability: The missing link between machine learning, healthcare, and the FDA?
Recent advances enable practitioners to break open machine learning’s “black box”. From machine learning algorithms guiding analytical tests in drug manufacture, to predictive models recommending courses of treatment, to sophisticated software that can read images better than doctors, machine learning has promised a new world of healthcar...
The different flavors of AutoML
In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software (e.g. H2O, scikit-learn, Keras). Although these tools have made it easy to train and evaluate ma...
H2O’s AutoML in Spark
This blog post demonstrates how H2O’s powerful automatic machine learning can be used together with Spark in Sparkling Water. We show the benefits of Spark & H2O integration, use Spark for data munging tasks and H2O for the modelling phase, where all these steps are wrapped inside a Spark Pipeline. The integration between Spark and...
H2O-3 on FfDL: Bringing deep learning and machine learning closer together
This post originally appeared in the IBM Developer blog here. This post is co-authored by Animesh Singh, Nicholas Png, Tommy Li, and Vinod Iyengar. Deep learning frameworks like TensorFlow, PyTorch, Caffe, MXNet, and Chainer have reduced the effort and skills needed to train and use deep learning models. But for AI developers and data ...
How to Frame Your Business Problem for Automatic Machine Learning
Over the last several years, machine learning has become an integral part of many organizations’ decision-making at various levels. With not enough data scientists to fill the increasing demand for data-driven business processes, H2O.ai has developed a product called Driverless AI that automates several time consuming aspects of a typica...
Time is Money! Automate Your Time-Series Forecasts with Driverless AI
Time-series forecasting is one of the most common and important tasks in business analytics. There are many real-world applications like sales, weather, stock market, energy demand, just to name a few. We strongly believe that automation can help our users deliver business value in a timely manner. Therefore, once again we translated our ...
H2O.ai and IBM build a Strategic Partnership to bring AI innovation to the market together
Excited to announce our strategic partnership with IBM that allows them to resell and take to market H2O Driverless AI to businesses worldwide. This partnership makes AI economical – faster, cheaper and easier to do experiments. H2O Driverless AI and IBM POWER9 GPU Systems are bringing together the best of breed AI innovation. We have b...
AI in Healthcare - Redefining Patient & Physician Experiences
Register for the Meetup here. Patients, physicians, nurses, health administrators and policymakers are beneficiaries of the rapid transformations in health and life sciences. These transformations are being driven by new discoveries (etiology, therapies, and drugs/implants), market reconfiguration and consolidation, a movement to value-bas...
From Kaggle Grand Masters’ Recipes to Production Ready in a Few Clicks
Introducing Accelerated Automatic Pipelines in H2O Driverless AI: At H2O, we work really hard to make machine learning fast, accurate, and accessible to everyone. With H2O Driverless AI, users can leverage years of world-class, Kaggle Grand Masters’ experience and our GPU-accelerated algorithms (H2O4GPU) to produce top quality predictive ...
H2O World coming to NYC
Whether you’re just starting out learning how machine learning and H2O.ai can supercharge your business or a veteran looking for more, we want to invite you to join some of the greatest minds in the field to learn how AI and H2O.ai can transform your business. Our flagship event, H2O World is back and it’s going to be bigger than ever! We’re ...
Democratize care with AI — AI to do AI for Healthcare
Very excited to have Prashant Natarajan (@natarpr) join us along with Sanjay Joshi on our vision to change the world of healthcare with AI. Health is wealth. And one worth saving the most. They bring invaluable domain knowledge and context to our cause. As one of our customers would like to say, Healthcare should be optimized for health...
Sparkling Water 2.3.0 is now available!
Hi Makers! We are happy to announce that Sparkling Water now fully supports Spark 2.3 and is available from our download page. If you are using an older version of Spark, that’s no problem. Even though we suggest upgrading to the latest version possible, we keep the Sparkling Water releases for Spark 2.2 and 2.1 up-to-date with the lates...
H2O + Kubeflow/Kubernetes How-To
Today, we are introducing a walkthrough on how to deploy H2O 3 on Kubeflow. Kubeflow is an open source project led by Google that sits on top of the Kubernetes engine. It is designed to alleviate some of the more tedious tasks associated with machine learning. Kubeflow helps orchestrate deployment of apps through the full cycle of devel...
Makers in Action: Community, Partners and Team Members at #GTC18
NVIDIA’s GPU Technology Conference (GTC) has been incredible! Folks from all over the world are exploring the latest breakthroughs in self-driving cars, smart cities, healthcare, high performance computing, virtual reality, and more, all propelled by the AI movement. If you’re attending GTC and would like to see our solutions in action (r...
H2O4GPU now available in R
In September, H2O.ai released a new open source software project for GPU machine learning called H2O4GPU. The initial release (blog post here) included a Python module with a scikit-learn compatible API, which allows it to be used as a drop-in replacement for scikit-learn with support for GPUs on selected (and ever-growing) algorithms. ...
Come meet the Makers!
NVIDIA’s GPU Technology Conference (GTC) Silicon Valley, March 26-29th is the premier AI and deep learning event, providing you with training, insights, and direct access to the industry’s best and brightest. It’s where you will see the latest breakthroughs in self-driving cars, smart cities, healthcare, high-performance computing, virtu...
How Driverless AI Prevents Overfitting and Leakage
By Marios Michailidis, Competitive Data Scientist, H2O.ai. In this post, I’ll provide an overview of overfitting, k-fold cross-validation, and leakage. I’ll also explain how Driverless AI avoids overfitting and leakage. An Introduction to Overfitting: A common pitfall that causes machine learning models to fail when tested in a real-world e...
Sparkling Water 2.2.10 is now available!
Hi Makers! There are several new features in the latest Sparkling Water. The major new addition is that we now publish Sparkling Water documentation as a website which is available here. This link is for Spark 2.2. We have also documented and fixed a few issues with LDAP on Sparkling Water. Exact steps are provided in the documentation...
Congratulations - H2O is a leader in the Gartner Magic Quadrant for Data Science and Machine Learning Platforms
Congratulations – Thanks to the support of our customer community over the past years, H2O.ai is a leader and one with the most completeness of vision in Gartner Magic Quadrant for Data Science and Machine Learning Platforms. It is an ecosystem we dedicated a good part of this decade to open up and spring. This is testimony to the incr...
New features in H2O 3.18
Wolpert Release (H2O 3.18). There’s a new major release of H2O and it’s packed with new features and fixes! We named this release after David Wolpert, who is famous for inventing Stacking (aka Stacked Ensembles). Stacking is a central component in H2O AutoML, so we’re very grateful for his contributions to machine learning! He is also fa...
Developing and Operationalizing H2O.ai Models with Azure
This post originally appeared here. It was authored by Daisy Deng, Software Engineer, and Abhinav Mithal, Senior Engineering Manager, at Microsoft. The focus on machine learning and artificial intelligence has soared over the past few years, even as fast, scalable and reliable ML and AI solutions are increasingly viewed as being vital to...
Happy Holidays from H2O.ai
Dear Community, Your intelligence, support and love have been the strength behind an incredible year of growth, product innovation, partnerships, investments and customer wins for H2O and AI in 2017. Thank you for answering our rallying call to democratize AI with our maker culture. Our mission to make AI ubiquitous is still fresh as da...
It’s all Water (or should I say H2O) to me!
By Krishna Visvanathan, Co-founder & Partner, Crane Venture Partners In the career of any venture capitalist, one dreads the “oh shit moment” . For those unfamiliar with this most technical of terms – it is that moment of clarity when a VC, in the immediate aftermath of closing one’s latest investment (often at the first post invest...
H2O4GPU Hands-On Lab (Video) + Updates
Aggregator, DBSCAN, Kalman Filters, K-nearest neighbors, Quantiles, Sort. If you’d like to learn more about H2O4GPU, I invite you to explore these helpful links: H2O4GPU README, Open Source License (Apache 2.0). Happy Holidays! Rosalie ...
Driverless AI - Introduction, Hands-On Lab and Updates
#H2OWorld was an incredible experience. Thank you to everyone who joined us! There were so many fascinating conversations and interesting presentations. I’d love to invite you to enjoy the presentations by visiting our YouTube channel . Over the next few weeks, we’ll be highlighting many of the talks. Today I’m excited to share two prese...
New versions of H2O-3 and Sparkling Water available
Dear H2O Community, #H2OWorld is on Monday and we can’t wait to see you there! We’ll also be live streaming the event starting at 9:25am PST. Explore the agenda here . Today we’re excited to share that new versions of H2O-3 and Sparkling Water are available. We invite you to download them here: http://www.h2o.ai/download/ H2O-3.16 – MO...
H2O.ai Raises $40 Million to Democratize Artificial Intelligence for the Enterprise
November 30, 2017 | Data Science, Machine Learning | H2O.ai Raises $40 Million to Democratize Artificial Intelligence for the Enterprise
Laying a Strong Foundation for Data Science Work
By William Merchan, CSO, DataScience.com In the past few years, data science has become the cornerstone of enterprise companies’ efforts to understand how to deliver better customer experiences. Even so, when DataScience.com commissioned Forrester to survey over 200 data-driven businesses last year, only 22% reported they were leverag...
H2O.ai Releases H2O4GPU, the Fastest Collection of GPU Algorithms on the Market, to Expedite Machine Learning in Python
H2O4GPU is an open-source collection of GPU solvers created by H2O.ai. It builds on the easy-to-use scikit-learn Python API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn with support for GPUs on selected (and ever-growing) algorithms. H2O4GPU inherits all the existing scikit-learn algor...
Driverless AI Blog
In today’s market, there aren’t enough data scientists to satisfy the growing demand for people in the field. With many companies moving towards automating processes across their businesses (everything from HR to Marketing), companies are forced to compete for the best data science talent to meet their needs. A report by McKinsey says th...
Scalable Automatic Machine Learning: Introducing H2O's AutoML
Prepared by: Erin LeDell, Navdeep Gill & Ray Peck In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts and experts...
XGBoost in the H2O Machine Learning Platform
The new H2O release 3.10.5.1 brings a shiny new feature – integration of the powerful XGBoost library algorithm into H2O Machine Learning Platform! XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that ...
H2O Platform Extensibility
The latest H2O release, 3.10.5.1, introduced several new concepts to improve extensibility and modularity of the H2O machine learning platform. This blog post will clarify motivation, explain design decisions we made, and demonstrate the overall approach for this release. Motivation: The H2O Machine Learning platform was designed as a mono...
Machine Learning on GPUs
With H2O GPU Edition, H2O.ai seeks to build the fastest artificial intelligence (AI) platform on GPUs. While deep learning has recently taken advantage of the tremendous performance boost provided by GPUs, many machine learning algorithms can benefit from the efficient fine-grained parallelism and high throughput of GPUs. Importantly, G...
The Race for Intelligence: How AI is Eating Hardware - Towards an AI-defined hardware world
With the AI arms race reaching a fever pitch, every data-driven company is (or at least should be) evaluating its approach to AI as a means to make their owned datasets as powerful as they can possibly be. In fact, any business that’s not currently thinking about how AI can transform its operations risks falling behind its competitors and...
H2O announces GPU Open Analytics Initiative with MapD & Continuum
H2O.ai, Continuum Analytics, and MapD Technologies have announced the formation of the GPU Open Analytics Initiative (GOAI) to create common data frameworks enabling developers and statistical researchers to accelerate data science on GPUs. GOAI will foster the development of a data science ecosystem on GPUs by allowing resident applicat...
Use H2O.ai on Azure HDInsight
This is a repost from this article on MSDN. We’re hosting an upcoming webinar to show you how to use H2O on HDInsight and to answer your questions. Sign up for our upcoming webinar on combining H2O and Azure HDInsight. We recently announced that H2O and Microsoft Azure HDInsight have integrated to provide Data Scientists with a Lead...
Sparkling Water on the Spark-Notebook
This is a guest post from our friends at Kensu. In the space of Data Science development in enterprises, two outstanding scalable technologies are Spark and H2O. Spark is a generic distributed computing framework and H2O is a very performant scalable platform for AI. Their complementarity is best exploited with the use of Sparkling Wat...
Stacked Ensembles and Word2Vec now available in H2O!
Prepared by: Erin LeDell and Navdeep Gill. Stacked Ensembles. R: ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train, base_models = my_models) Python: ensemble = H2OStackedEnsembleEstimator(base_models=my_models); ensemble.train(x=x, y=y, training...
Artificial Intelligence Is Already Deep Inside Your Wallet – Here’s How
Artificial intelligence (AI) is the key for financial service companies and banks to stay ahead of the ever-shifting digital landscape, especially given competition from Google , Apple , Facebook , Amazon and others moving strategically into fintech. AI startups are building data products that not only automate the ingestion of vast amou...
Football Flowers
Start Off 2017 with Our Stanford Advisors
We were very excited to meet with our advisors (Prof. Stephen Boyd, Prof. Rob Tibshirani and Prof. Trevor Hastie) at H2O.AI on Jan 6, 2017. Professors Boyd, Tibshirani & Hastie in the house! @h2oai #elementsofstatisticallearning #MachineLearning pic.twitter.com/FnlCNrY7Hy — H2O.ai (@h2oai) January 6, 2017 Our CEO, Sri Ambati, ma...
What is new in Sparkling Water 2.0.3 Release?
This release has H2O core 3.10.1.2. Important Feature: This architectural change allows connecting to an existing H2O cluster from Sparkling Water. The benefit is that we are no longer affected by Spark killing its executors, so we should have a more stable solution in environments with lots of H2O/Spark nodes. We are working on an article on ...
Behind the scenes of CRAN
(Just from my point of view as a package maintainer.) New users of R might not appreciate the full benefit of CRAN and new package maintainers may not appreciate the importance of keeping their packages updated and free of warnings and errors. This is something I only came to realize myself in the last few years so I thought I would write...
What is new in H2O latest release 3.10.2.1 (Tutte)?
Today we released H2O version 3.10.2.1 (Tutte). It’s available on our Downloads page, and release notes can be found here . Photo Credit: https://en.wikipedia.org/wiki/W._T._Tutte Top enhancements in this release: GLM MOJO Support: GLM now supports our smaller, faster, more efficient MOJO (Model ObJect, Optimized) format for model pu...
Using Sentiment Analysis to Measure Election Surprise
Sentiment Analysis is a powerful Natural Language Processing technique that can be used to compute and quantify the emotions associated with a body of text. One of the reasons that Sentiment Analysis is so powerful is because its results are easy to interpret and can give you a big-picture metric for your dataset. One recent event that ...
Indexing 1 Billion Time Series with H2O and ISax
At H2O, we have recently debuted a new feature called ISax that works on time series data in an H2O Dataframe. ISax stands for Indexable Symbolic Aggregate ApproXimation, which means it can represent complex time series patterns using a symbolic notation and thereby reducing the dimensionality of your data. From there you can run H2O’s ML...
Why We Bought A Happy Diwali Billboard
It’s been a dark year in many ways, so we wanted to lighten things up and celebrate Diwali — the festival of lights! Diwali is a holiday that celebrates joy, hope, knowledge and all that is full of light — the perfect antidote for some of the more negative developments coming out of the Silicon Valley recently. Throw in a polarizing pre...
Creating a Binary Classifier to Sort Trump vs. Clinton Tweets Using NLP
The problem : Can we determine if a tweet came from the Donald Trump Twitter account (@realDonaldTrump) or the Hillary Clinton Twitter account (@HillaryClinton) using text analysis and Natural Language Processing (NLP) alone? The Solution : Yes! We’ll divide this tutorial into three parts, the first on how to gather the necessary data, t...
sparklyr: R interface for Apache Spark
This post is reposted from Rstudio’s announcement on sparklyr, Rstudio’s extension for Spark. Connect to Spark from R. The sparklyr package provides a complete dplyr backend. Filter and aggregate Spark datasets then bring them into R for analysis and visualization. Use Spark’s distributed machine learning library from R. Create...
When is the Best Time to Look for Apartments on Craigslist?
A while ago I was looking for an apartment in San Francisco. There are a lot of problems with finding housing in San Francisco, mostly stemming from the fierce competition. I was checking Craigslist every single day. It still took me (and my girlfriend) a few months to find a place — and we had to sublet for three weeks in between. Thankf...
Focus
———- Forwarded message ——— From: SriSatish Ambati Date: Thu, Sep 15, 2016 at 10:17 PM Subject: changes and all hands tomorrow. To: team Team, Our focus has changed towards larger fewer deals & deeper engagements with handful of finance and insurance customers. We took a hard look at our marketing spend, pr programs and personnel. We l...
Distracted Driving
Last week, we started to examine the 7.2% increase in traffic fatalities from 2014 to 2015, the reversal of a near decade-long downward trend. We then broke out the data by various accident classifications , such as “speeding” or “driving with a positive BAC,” and identified those classifications that had the greatest increase. One label...
Introducing H2O Community & Support Portals
At H2O, we enjoy serving our customers and the community, and we take pride in making them successful while using H2O products. Today, we are very excited to announce two great platforms for our customers and for the community to better communicate with H2O. Let’s start with our community first: The success of every open source project ...
Fatal Traffic Accidents Rise in 2015
On Tuesday, August 30th, the National Highway Traffic Safety Administration released their annual dataset of traffic fatalities, asking interested parties to use the dataset to identify the causes of an increase of 7.2% in fatalities from 2014 to 2015. As part of H2O.ai’s vision of using artificial intelligence for the betterment of soci...
IoT - Take Charge of Your Business and IT Insights Starting at the Edge
Instead of just being hype, the Internet of Things (IoT) is now becoming a reality. Gartner forecasts that 6.4 billion connected devices will be in use worldwide, and 5.5 million new devices will get connected every day, in 2016. These devices range from wearables, to sensors in vehicles that can detect surrounding obstacles, to sensors in...
Hyperparameter Optimization in H2O: Grid Search, Random Search and the Future
“Good, better, best. Never let it rest. ‘Til your good is better and your better is best.” – St. Jerome. tl;dr: H2O now has random hyperparameter search with time- and metric-based early stopping. Bergstra and Bengio[1] write on p. 281: Compared with neural networks configured by a pure grid search, we find that random search over the s...
H2O GBM Tuning Tutorial for R
In this tutorial, we show how to build a well-tuned H2O GBM model for a supervised classification task. We specifically don’t focus on feature engineering and use a small dataset to allow you to reproduce these results in a few minutes on a laptop. This script can be directly transferred to datasets that are hundreds of GBs large and H...
Spam Detection with Sparkling Water and Spark Machine Learning Pipelines
This short post presents the “ham or spam” demo, which has already been posted earlier by Michal Malohlava , using our new API in latest Sparkling Water for Spark 1.6 and earlier versions, unifying Spark and H2O Machine Learning pipelines. It shows how to create a simple Spark Machine Learning pipeline and a model based on the fitted pipe...
Interview with Carolyn Phillips, Sr. Data Scientist, Neurensic
During Open Tour Chicago we conducted a series of interviews with data scientists attending the conference. This is the second of a multipart series recapping our conversations. Be sure to keep an eye out for updates by checking our website or following us on Twitter @h2oai. H2O.ai: How did you become a data scientist? Phillips: Until ...
Interview with Svetlana Kharlamova, Sr. Data Scientist, Grainger
During Open Tour Chicago we conducted a series of interviews with data scientists attending the conference. This is the first of a multipart series recapping our conversations. Be sure to keep an eye out for updates by checking our website or following us on Twitter @h2oai. H2O.ai: How did you become a data scientist? Kharlamova: I’m a...
H2O Day at Capital One
Here at H2O.ai one of our most important partners is Capital One, and we’re proud to have been working with them for over a year. One of the world’s leading financial services providers, Capital One has a strong reputation for being an extremely data and technology-focused organization. That’s why when the Capital One team invited us to t...
Red herring bites
At the Bay Area R User Group in February I presented progress in big-join in H2O which is based on the algorithm in R’s data.table package. The presentation had two goals: i) describe one test in great detail so everyone understands what is being tested so they can judge if it is relevant to them or not; and ii) show how it scales with...
Fast csv writing for R
R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do and if they have a poor experience (either hard to use, or very slow) they are less likely to progress. The data.table package in R solved csv import convenience and speed in 2...
Apache Spark and H2O on AWS
This is a guest post re-published with permission from our friends at Datapipe. The original lives here. One of the advantages of public cloud is the ability to experiment and run various workloads without the need to commit to purchasing hardware. However, to meet your data processing needs, a well-defined mapping between your objecti...
Connecting to Spark & Sparkling Water from R & Rstudio
Sparkling Water offers best-of-breed machine learning for Spark users. Sparkling Water brings all of H2O’s advanced algorithms and capabilities to Spark. This means that you can continue to use H2O from Rstudio or any other IDE of your choice. This post will walk you through the steps to get running on plain R or Rstudio from Spark. ...
Drink in the Data with H2O at Strata SJ 2016
It’s about to rain data in San Jose when Strata + Hadoop World comes to town March 29 – March 31st. H2O has a waterfall of action happening at the show. Here’s a rundown of what’s on tap. Keep it handy so you have less chance of FOMO (fear of missing out). Hang out with H2O at Booth #1225 to learn more about how machine learning can hel...
Road Ahead and BTUs
H2O.ai – Road Ahead – keynote presentation by Sri Ambati from Sri Ambati ...
Thank you, Cliff
Cliff resigned from the Company last week – He is parting on good terms and supports our success in future. Cliff and I worked closely since 2004 so this is a loss for me. It ends an era of prolific work supporting my vision as a partner. Let’s take this opportunity to congratulate Cliff on his work, in helping me build something from not...
The Top 10 Most Watched Videos From H2O World 2015
Now that we’re a few months out from H2O World we wanted to share with you all what the most popular talks were by online viewership. The talks covered a variety of topics from introductions, to in-depth examinations of use cases, to wide-ranging panels. Introduction to Data Science Featuring Erin LeDell, Statistician and Machine Learnin...
Compressing Zip Codes with Generalized Low Rank Models
This tutorial introduces the Generalized Low Rank Model (GLRM) [1], a new machine learning approach for reconstructing missing values and identifying important features in heterogeneous data. It demonstrates how to build a GLRM in H2O that condenses categorical information into a numeric representation, which can then be used in other mo...
Databricks and H2O Make it Rain with Sparkling Water
This blog post was first posted on the Databricks blog here. Databricks provides a cloud-based integrated workspace on top of Apache Spark for developers and data scientists. H2O.ai has been an early adopter of Apache Spark and has developed Sparkling Water to seamlessly integrate H2O.ai’s machine learning library on top of Spark. In thi...
H2O World from an Attendee's Perspective
Data Science is like Rome, and all roads lead to Rome. H2O WORLD is the crossroad, pulling in a confluence of math, statistics, science and computer science and incorporating all avenues of business. From the academic, research oriented models to the business and computer science analytics implementations of those ideas, H2O WORLD inform...
H2O.ai at ODSC SF 2015!
As promised, we’re here reporting from the floor of the (H2O.ai-sponsored) Open Data Science Conference (ODSC). It’s been another wild day for us, with an early start at 7:30am to set up ahead of the show. However, the long days are all worth it for a chance to see you all in the field. While we thought bringing two boxes of booklets woul...
H2O at ML Conf SF 2015
H2O is ubiquitous, and just like H2O, our team is everywhere! Today we attended the (H2O.ai-sponsored) 2015 Machine Learning Conference in San Francisco. Located at the gorgeous Julia Morgan Ballroom the ML Conference brought together some of the world’s foremost experts on machine learning, including the tireless Xavier Amatriain, VP of...
H2O World Third Day Wrap-Up
H2O fans, we know that distance and the twin holidays of Veteran’s Day and Diwali kept many of you from attending the grand finale of H2O World, but we want to at least give you a taste of all that went on at the Computer History Museum in Mountain View. Day 3 of H2O World got off to a strong start with a massive panel on creating a cultu...
H2O World Second Day Wrap-Up
H2O fans, we didn’t think that our second day could top our first, but somehow it did! Still, although we had record attendance, we know that a lot of you aren’t here. While we can’t hope to get across all that’s happened, we do want to share some of the highlights. The morning started off with CEO Sri Ambati welcoming attendees and givin...
H2O World First Day Wrap-Up
H2O fans, we wish that all of you were here, but we also know that our community is spread across the globe and not all of you could make it to H2O World. However, those of you not able to attend the conference are just as much a part of our community as those that are. While we can’t hope to convey the energy and excitement of H2O World,...
Pre-H2O World, Part 2
H2O fans, we have a day of data delights in store for you tomorrow! The first day of H2O World is totally devoted to demos and walkthroughs designed to help YOU get the most out of your data. In fact, we have so many sessions planned that unless you have Hermione’s Time Turner, you won’t be able to attend them all. So choose wisely! A...
A Newbie's Guide to H2O in Python - Guest Post
This blog was originally posted here. I created this guide to help fellow newbies get their feet wet with H2O, an open-source predictive analytics platform that is fast, powerful, and easy to use. Using a combination of extraordinary math and high-performance parallel processing, H2O allows you to quickly create models for big data. The st...
Pre-H2O World, Part 1
H2O fans, the H2O.ai team is burning the midnight oil to get H2O World ready for you all. With an audience size twice that of last year’s event we’re going to pack the house at the Computer History Museum! This year’s event will feature 70+ speakers spread out over 41 talks, 22 training sessions and eight panels during the course of the m...
How to Build a Machine Learning App Using Sparkling Water and Apache Spark
The Sparkling Water project is nearing its one-year anniversary, which means Michal Malohlava, our main contributor, has been very busy for the better part of this past year. The Sparkling Water project combines H2O machine-learning algorithms with the execution power of Apache Spark. This means that the project is heavily dependent on tw...
How I used H2O to crunch through a bank's customer data
This entry was originally posted here. Six months back I gingerly started exploring a few data science courses. After having successfully completed some of the courses I was restless. I wanted to try my data hacking skills on some real data (read Kaggle). I find that competing in hackathons helps you to benchmark yourself against your fellow ...
Fast, Scalable Machine Learning- Now with New and Improved Python API
H2O now has a new Python API, based on valuable feedback provided by our community. Newest features include: pandas-like dataframes, but for large, distributed computing; scikit-learn integration; and a machine learning pipeline API. Check out the tutorial below: ...
An Introduction to Data Science: Meetup Summary Guest Post by Zen Kishimoto
Originally posted on Tek-Tips forums by Zen here I went to two meetups at H2O , which provides an open source predictive analytics platform. The second meetup was full of participants because its theme was an introduction to data science. Data science is a new buzzword, and I feel like everyone claims to be a data scientist or somethin...
The Definitive Performance Tuning Guide for H2O Deep Learning (Ported scripts to H2O-3, results are taken from February's blog)
Introduction This document gives guidelines for performance tuning of H2O Deep Learning, both in terms of speed and accuracy. It is intended for existing users of H2O Deep Learning (which is easy to change if you’re not), as it assumes some familiarity with the parameters and use cases. Motivation This effort was in part motivated b...
Lending Club: Predict Bad Loans to Minimize Loss to Defaulted Accounts
As a sales engineer on the H2O.ai team I get asked a lot about the value add of H2O. How do you put a price tag on something that is open source? This typically revolves around the use cases; if a use case pertains to improving user experience or making apps that can improve internal operations then there’s no straightforward way of monet...
Introduction to Data Science using H2O - Chicago
Thank you to Chicago for the great meetup on 29 July 2015. Slides have been posted on GitHub. The links to the sample scripts and data are contained in the slides. If you have any further questions about H2O, please join our GoogleGroup or chat with us on Gitter. The slides are also available on the H2O Slideshare: Also, thank you t...
useR! Aalborg 2015 conference
The H2O team spent most of the useR! Aalborg 2015 conference at the booth giving demos and discussing H2O. Amy had a 16 node EC2 cluster running with 8 cores per node, making a total of 128 CPUs. The demo consisted of loading large files in parallel and then running our distributed machine learning algos in parallel. At an R conference, m...
KFold Cross Validation With H2O-3 and R
This blog also explains the solution to a Google Stream question we received. Note: KFold Cross Validation will be added to H2O-3 as an argument soon. This is a terse guide to building KFold cross-validated models with H2O using the R interface. There’s not very much R code needed to get up and running, but it’s by no means the one-magic-...
'Ask Craig'- Determining Craigslist Job Categories with Sparkling Water, Part 2
This is the second blog in a two blog series. The first blog is on turning these models into a Spark streaming application. The presentation on this application can be downloaded and viewed at Slideshare. In the last blog post we learned how to build a set of H2O and Spark models to predict categories for jobs posted on Craigslist using Spar...
Sparkling Water Tutorials Updated
This is an updated version of the Sparkling Water tutorials originally published by Amy Wang here. For the newest examples and updates, please visit the Sparkling Water GitHub page. The blog post introduces 3 tutorials: Running Sparkling Water Locally; Running Sparkling Water on Standalone Spark Cluster; Running H2O Commands from Spark Shell ...
'Ask Craig'- Determining Craigslist Job Categories with Sparkling Water
This is the first blog in a two blog series. The second blog is on turning these models into a Spark streaming application. The presentation on this application can be downloaded and viewed at Slideshare. One question we often get asked at Meetups or conferences is: “How are you guys different than other open-source machine-learning toolkits?...
Scaling R with H2O
With the advent of H2O 3.0, it seems an appropriate time to reintroduce the R API for H2O to help users better understand the differences between R dataframes and H2OFrames. Typically some of the first questions we get include: Does H2O support all R packages and functions? Is H2OFrame an extension of data.frame? Are H2O supported algo...
Using H2O for Kaggle: Guest Post by Gaston Besanson and Tim Kreienkamp
This post also appears on the GSE Data Science Blog. In this special H2O guest blog post, Gaston Besanson and Tim Kreienkamp talk about their experience using H2O for competitive data science. They are both students in the new Master of Data Science Program at the Barcelona Graduate School of Economics and used H2O in an in-class Kaggle...
PyData Dallas 2015
H2O was in attendance last week at PyData in Dallas, Texas. Our CTO, Cliff Click, spoke at PyData about driving H2O from Python to perform feature-engineering, group by, quantiles, and model building with H2O’s GBM, GLM, and Distributed Random Forest . We met a lot of great people and we are really excited to see the enthusiasm for H2O w...
Deep Learning for Public Safety
This article first appeared on KDnuggets. Contributors: Alex Tellez, Michal Malohlava, Prithvi Prabhu, Hank Roark, Amy Wang. Download full report. We’ve seen some incredible applications of Deep Learning with respect to image recognition and machine translation but this particular use case has to do with public safety; in particular, how De...
Culture
———- Forwarded message ———- From: SriSatish Ambati srisatish@0xdata.com Date: Sun, Jun 1, 2014 at 12:29 PM Subject: Re: jirassic hierarchy. To: Kevin kevin@0xdata.com Cc: Tom Kraljevic tomk@0xdata.com, engr engr@0xdata.com, team team@0xdata.com The best cultures are ones where it feels like there isn’t any. Not saying scrum won’t fit,...
Sparkling Water Certified by Cloudera
Last month, before the H2O.ai team publicly announced Sparkling Water at Strata San Jose, we made sure that the product was backed and certified by some major partners. This includes approval from Databricks itself as well as Cloudera. Integration Testing for Cloudera: For Cloudera, testing was mainly geared toward deployment and sustaina...
The Definitive Performance Tuning Guide for H2O Deep Learning
This document gives guidelines for performance tuning of H2O Deep Learning, both in terms of speed and accuracy. It is intended for existing users of H2O Deep Learning (which is easy to change if you’re not), as it assumes some familiarity with the parameters and use cases. Motivation This effort was in part motivated by a Deep Learn...
Strata San Jose 2015
I had a great time at Strata SJ 2015! I had a lot of fun answering questions and talking to enthusiastic and curious H2O users at our booth. It was great seeing how many people are involved in the H2O community and I also really enjoyed drinking free margaritas at the booth crawl. The H2O team met some really great people with lots of dif...
Read moreHow does Java Both Optimize Hot Loops and Allow Debugging
This blog came about because an old friend is trying to figure out how Java can do aggressive loop and inlining optimizations, while allowing the loading of new code and setting breakpoints… in that optimized code. On 2/21/2015 11:04 AM, IG wrote: Safepoint. I’m still confused because I don’t understand what is the required state at a s...
Read moreIntroducing first-Fridays Hackathon with H2O
Greetings fellow ML/AI enthusiasts! This blog post serves two purposes: 1) introduce our First Fridays initiative and 2) recap our first 12-hour Hackathon! WHAT: The first Friday of each month, H2O.ai will hold a Hack-A-Thon from 1pm – 10pm (yep, you read correctly!) whereby we invite ANYONE to come hack through a data problem with the H2...
Read moreLaunching H2O with Docker
Hello world, again. H2O is already relatively easy to launch (all the user needs is a compatible Java version), but now that level of difficulty is reduced to nil. Jeff, our DevOps engineer, presented me with a Docker container for H2O, making it possible to ship H2O regardless of your environment setup. You can now launch H2O in an isolated e...
Read moreH2O vs R - Winning KDDCup98 in 10 minutes with H2O
H2O is a scalable and open-source math and machine learning platform for big data. It can handle much bigger datasets and run a lot faster than R/SAS even on a single machine. How does the modeling experience with H2O differ from the experience using traditional tools such as R/SAS? This blog answers exactly this question. In particular, ...
Read moreH2O WORLD 2014 Machine Learning IS Fun.
Earlier this year I found myself sitting among 100 or so data scientists at a meetup , eating a taco and listening to how a former particle physicist found the Higgs Boson particle over a weekend using commodity hardware and open source software . Even more impressive was his ability to answer the unrelenting questions from the audience ...
Read moreWhat if the S language had been copyrighted?
At H2O World 2014, we were fortunate to have Josh Bloch give a reprise of his A Brief, Opinionated History of the API talk that he first delivered at SPLASH 2014 . (For those with the time, you can watch a 47 minute 21 second recording of this talk on the H2O.ai YouTube channel.) This is one of those subjects that I wish I could say m...
Read moreKey Takeaways from the World's Top Kagglers
Ever wondered why data science is so competitive? After a highly successful H2O World event last week, we’re shining some light on what we’ve learned from some of the world’s best data scientists and how they go about winning these data science challenges such as Kaggle . In case you missed it, we held a Competitive Data Science Panel ...
Read morePredictive Modeling at Scale: Cisco Modernizes Predictive Model Production with H2O (joint work with Lou Carvalheira)
Cisco’s Challenges: Cisco is the global leader in networking. It is a company that has long embraced the power of predictive analytics. In a regular quarter, Cisco’s Strategic Marketing Organization builds and deploys around 60,000 predictive models to treat each of 160M+ companies it maintains in its database. These models generate predict...
Read moreIntroducing Flow!
After several weeks of active development, we’re proud to unveil H2O Flow, our brand new, open-source user interface for H2O! We used it live during our H2O World keynote today, and this blog post is a brief introduction to some of the core ideas behind H2O Flow. H2O Flow is a web-based interactive computational environment where you can ...
Read moreCompetitive Data Science, Kaggle, Kdd and other Sports
Panelists: This panel promises to be just brilliant and full of sparks! Guocong Song https://www.kaggle.com/users/41275/guocong-song Jose Guerrero https://www.kaggle.com/users/5642/jos-a-guerrero Mark Landry https://github.com/mlandry22/kaggle/commits/master Arno Candel http://www.slideshare.net/0xdata/h2o-distributed-deep-learning-by-arno-...
Read moreHacking Algorithms in H2O With Cliff
Interested in Hacking Algorithms with me? I’ll be at H2O World all day Tuesday looking to join you in doing some fun hacking. Here are 3 sample starter hacks to help you get over the H2O learning curve: Hacking KMeans, Hacking Quantiles, and Hacking Grep. All 3 take you step-by-step through the process of building a new algorithm into H2O’...
Read moreHacking Algorithms into H2O: Grep
This is a presentation of hacking a simple algorithm into the new dev-friendly branch of H2O, h2o-dev. This is one of three “Hacking Algorithms into H2O” blogs. All of these blogs start out the same: getting the h2o-dev code and building it. They are the same until the section titled Building Our Algorithm: Copying from the Example, and ...
Read moreHacking Algorithms into H2O: Quantiles
This is a presentation of hacking a simple algorithm into the new dev-friendly branch of H2O, H2O 3.0. This is one of three “Hacking Algorithms into H2O” blogs. All three blogs start out the same: getting the h2o-3 code and building it. They are the same until the section titled Building Our Algorithm: Copying from the Example, and then ...
Read moreHacking Algorithms into H2O: KMeans
This is a presentation of hacking a simple algorithm into the new dev-friendly branch of H2O, h2o-dev. This is one of three “Hacking Algorithms into H2O” blogs. All blogs start out the same – getting the h2o-dev code and building it. They are the same until the section titled Building Our Algorithm: Copying from the Example, and then the ...
Read moreSparkling Water on YARN Example
Follow these easy steps to get your first Sparkling Water example to run on a YARN cluster. This example uses Hortonworks HDP 2.1. 1. Assumptions Installed: Java 1.7+ YARN cluster Note: In the current version of Sparkling Water running on YARN, the cluster formation requires multicast to work for the H2O nodes to find each oth...
Read moreRunning Your First Droplet on H2O
A number of us were at Strata in New York City this October, and one of the major benefits of these events is getting lots of in-person time with people who use your product. Michal and Amy spent some time with a developer who was trying to build on top of the h2o-dev repo, and we realized that we didn’t have a really basic example yet of ...
Read moreSparkling Water Tutorials
Please follow the updated version of the tutorials here. H2O is hosting a meetup tomorrow at our office where attendees are encouraged to hack away with us as we run Deep Learning on Sparkling Water. If you haven’t already read all about H2O’s integration into Spark, then get started with How Sparkling Water Brings H2O to Spark and Sparkling W...
Read moreHow to use R, H2O, and Domino for a Kaggle competition
Guest post by Jo-Fai Chow. The sample project (code and data) described below is available on Domino. If you’re in a hurry, feel free to skip to: Tutorial 1: Using Domino; Tutorial 2: Using H2O to Predict Soil Properties; Tutorial 3: Scaling up your analysis. Introduction: This blog post is the sequel to TTTAR1 a.k.a. An Introduction t...
Read moreHow Sparkling Water Brings H2O to Spark
This post provides a high-level introduction to the current integration plan between H2O and Spark. It describes an ongoing engineering effort involving collaboration between the open source teams, and what is currently underway. 1. Overall Approach: The first question one might ask is “Why?” What does one, as a user, gain from trying ...
Read moreSparkling Water!
H2O & Scala & Spark: Spark is an up-and-coming big data technology; it’s a whole lot faster and easier than existing Hadoop-based solutions. H2O does state-of-the-art Machine Learning algorithms over Big Data – and does them Fast. We are happy to announce that H2O now has a basic integration with Spark – Sparkling Water! This is...
Read moreIntroducing H2O Lagrange (2.6.0.11) to R
From my perspective the most important event that happened at useR! 2014 was that I got to meet the 0xdata team and now, long story short, here I am introducing the latest version of H2O, labeled Lagrange (2.6.0.11), to the R and greater data science communities. Before joining 0xdata, I was working at a competitor on a rival project and w...
Read moreuseR! 2014
Two weeks ago we attended the useR! conference hosted on the UCLA campus. I landed in Los Angeles at 8:30 P.M on Sunday June 29, and met up with Amy — another math hacker at 0xdata. After a harrowing cab ride we arrived on the UCLA campus at Sunset Village where we would be lodging for the next 3 evenings. Having just got the h2o R packag...
Read moreLearn to manage, munge, and model big data with H2O on the Hortonworks Sandbox
Working with big data might seem like a daunting task if like me, you’ve spent the majority of your college years doing pencil and paper proofs. Big data for me was anything that took longer than 30 minutes to ingest into single threaded R. For mathematicians and statisticians looking to understand widely used data platforms like Hadoop f...
Read moreH2O - The Killer-App on Spark
object AirlinesDemo extends Demo {
  override def run(conf: DemoConf): Unit = {
    // Prepare data
    // Dataset
    val dataset = "data/allyears2k_headers.csv"
    // Row parser
    val rowParser = AirlinesParser
    // Table name for SQL
    val tableName = "airlines_table"
    // Select all flights with destination == SFO
    val query = """SELECT * FROM airlin...
Read moreA K/V Store For In-Memory Analytics, Part 2
This is a continuation of a prior blog on the H2O K/V Store, Part 1. A quick review on key bits going into this next blog: H2O supports a very high performance in-memory Distributed K/V store. The store honors the full Java Memory Model with exact consistency by default. Keys can be cached locally for both reads & writes. A typi...
Read moreSJSU Tutorial on H2O and Random Forest
Our friends over at SJSU added this post to their course website after the H2O team stopped by earlier this semester to talk about H2O. We’ve reposted it here, but you can find the original at: http://sjsubigdata.wordpress.com/2014/04/24/oxdata-h2o-tutorial/ Oxdata (H2O) Tutorial. Posted on April 24, 2014 by bigsjsu. Oxdata (H2O) Tutori...
Read moreTableau: Math Hacker Amy Talks Big Data Visualization TONIGHT
Anqi and I are back from NY, and we brought Amy with us – she's incredible, and she's giving a presentation at our meetup tonight, where she will talk about Big Data, visualization, and presenting interpretable graphics. So we're looking forward to seeing you tonight – the details are here: ...
Read moreMLConf NY - Friday, April 11: Demo of Workflow and Collective Use Case
This Friday H2O will be at MLconf (http://mlconf.com) to give a live demo, introduce a customer use case, and talk about the implications of model specification in production. If you don’t get a chance to stop by our booth, or come see our demo, you can find the presentation slides on the MLconf website (they will be posted on Friday, Apr...
Read moreGoogle-scale Machine Learning & Deep Learning gets principal platform in Apache Mahout with Spark and H2O
H2O’s vision is direct and simple: scaling machine learning for powering intelligent applications. Our focus is distributed machine learning and a fully-featured set of industrial grade algorithms. Apache Mahout is where people learn their chops in Machine Learning. Like R, it’s the “hello world” first place many new users get exposed to ...
Read moreHang out with us tomorrow- Mar 26: H2O Math Hackers Present: Model Specification
Anqi and Irene present a hack-along preview of their upcoming talk at MLConf. Come join us as we talk about the implications of model specification, and walk through how to frame models when asking different questions of the same data.
Read moreMeetup TONIGHT - Arno Presents: Deep Learning: Theory and Practice!
If you were unable to join us on Thursday 3/21 because of the high volume of interest, we are offering the same meeting again! In this talk, Arno Candel, Physicist & Hacker at H2O.ai, will break down the basics of deep learning in theory & present implementation, early results from using MLP with Adaptive learning as implemented in...
Read moreIn-memory Big Data: Spark + H2O
Big Data has moved in-memory. Customers using SQL in their Join & Munging efforts via SHARK and Apache Spark need to use Regressions and Deep Learning. To make their experiences great & seamlessly weave SQL workflows with Data Science and Machine Learning, we are architecting a simple RDD data import-export in H2O. This brings c...
Read moreData Munging in H2O+R
Over the weekend we fielded a question from one of our users about the basics of data munging in H2O through R – and it was a good question, so I wanted to share the response with a wider audience – namely you guys. There are a few quick things about data munging in H2O+R: – It often looks and feels like you are manipulating data in R; we...
Read moreH2O Architecture
This is a top-level overview of the H2O architecture. H2O does in-memory analytics on clusters with distributed parallelized state-of-the-art Machine Learning algorithms . However, the platform is very generic, and very very fast. We’re building Machine Learning tools with it, because we think they’re cool and interesting, but the plat...
Read moreH2O at Code Mesh - API for in-memory Analytics - Cliff
Video link here: API for in-Memory Analytics – CodeMesh ...
Read moreHanging out at ShareThis
http://www.sharethis.com/blog/2014/02/24/machine-learning-prediction-model-hack-thon-oxdata-h2o/#sthash.jn8gTPzQ.dpbs We spent some time with the engineers and data scientists at ShareThis last week, and had a great time learning about their use cases, and getting H2O running on their data. It's nice to know that the ShareThis team had ...
Read moreAnd you know, we're on each other's team - Lorde
Walking past giant anti-burner consumerist strata booths, i was struck by Lorde's recent masterpiece. The Big Data Palace needs a release. No hype, it needs product. Product is the release. The emperor has no clothes and no one seems to dare. You see the propaganda machine. Working lock-step to Strata / stage setters. Darling startups tha...
Read moreGenerate A Mandelbrot Set In H2O
Roses are red, Violets are ~ Blue, H2O is sweet, And fractals are too! $$z_n = z_{n-1}^P + c$$ where $$c$$ is a “candidate” complex number. (Typically you’ll see $$P = 2$$; that’s what we’ll do too.) We set the size of the sequence to the number of iterations we want, and measure convergence by looking at the modulus of $$z_n$$ ...
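As a rough illustration of the recurrence in the excerpt above, here is a minimal plain-Python sketch (not H2O code, and not the post's own implementation). The function name `escape_count`, the starting value $$z_0 = 0$$, and the escape bound of 2 on the modulus are assumptions made for this example; the excerpt only says that convergence is judged from the modulus of $$z_n$$.

```python
def escape_count(c: complex, power: int = 2, max_iter: int = 50,
                 bound: float = 2.0) -> int:
    """Iterate z_n = z_{n-1}**power + c, starting from z_0 = 0 (assumed).

    Returns the index of the first iteration at which |z_n| exceeds
    `bound` (an assumed escape threshold), or `max_iter` if the modulus
    stays within the bound for every iteration.
    """
    z = 0j
    for n in range(max_iter):
        z = z ** power + c      # one step of the recurrence
        if abs(z) > bound:      # modulus test for divergence
            return n
    return max_iter


# Candidates that never escape are treated as belonging to the set.
in_set = escape_count(-1 + 0j) == 50
```

Evaluating `escape_count` over a grid of candidate values of `c` and plotting the returned counts would give the familiar picture of the set; a distributed version would apply the same per-point computation to each row of a data frame of candidates.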
Read moreA K/V Store For In-Memory Analytics: Part 1
0xdata.com is building in-memory analytics (no surprise, see 0xdata.com) . What may be a surprise, though, is that there’s a full-fledged high-performance Key/Value store built into H2O and that is central to both our data management and our control logic. We use the K/V store in two main ways: All the Big Data is stored striped acros...
Read moreI'll let you be in my model, if I can be in yours.
Bob Dylan* said that. User-centric modeling is here to stay. Rich insights are available when we combine knowledge of the world with knowledge of your customer. Yes, one at a time. However, users tangle in a network of events and overlap & become part of each other's models. Sensor data can avoid granularity mismatch by building models...
Read moreCome visit H2O at Strata Booth 919
Greetings H2O friends and fans! Let’s do the data dance at Strata Santa Clara, Feb. 11-13 and check out our latest H2O Prediction Engine demo. We will be exhibiting at booth 919 and offering a 20% discount off registration. The show is slated to sell out, so be sure to register today and get your 20% discount with our code: 0XDATA20 ,...
Read moreHack data with our resident data scientist, Earl
On the last Thursday of every month: hack data with Earl Hathaway, our resident data scientist.
Read morePathology of Data
Stephen Boyd's favorite way of summarizing a dataset at hand: “Understand the pathology of data. Sometimes it's not the pathology.” It's structure: dimensions, factors, outliers and principal components. It's very much what data scientists want from Adhoc Analytics – scope the data from enough angles and with different tools to get real in...
Read moreAll models are wrong, but some models are useful!
George Box said that. There is no best model that works for all of your data. Wolpert reiterates that as the No Free Lunch theorem. Model predictive performance is domain specific. What works in one data domain sometimes has very little consequence in another one. Predictably, the rise of Domain Science: Data science needs to get closer ...
Read moreHack data with R + H2O ( aka, the last thursday of the 2013 meetup!)
Come join us and 32 other Data Scientistas to Hack airline dataset with R. This is our small intimate open house setup that we did every last thursday of each month – And this is the season finale! And what a year it has been for H2O! Nidhi will walk you through RStudio – don't forget to bring your tool belt (& a laptop with R install...
Read moreR & Scala for fast in-memory predictions on Hadoop via H2O!
Three of our best and brightest gave a talk last night on H2O, R, Scala and Hadoop (yes, all together, and yes, highlighting the integration). If you missed the talk last night, the slides are linked here, and we're doing an encore next week (http://www.meetup.com/SF-Scala/events/153854762/). Tom Kraljevic presents using H2O on Hadoop – how w...
Read moreScalala on H2O at Typesafe
Please come catch us, catch up with us, and meet up with us next week, on the 17th. The makers & maintainers of Scala, Typesafe, is hosting us, where Adriaan Moors and the H2O team will be talking about Scala, working with data at scale, and getting the most out of your big data and domain. Meetup's in San Francisco, the details can ...
Read moreR & Scala for fast in-memory predictions on Hadoop via H2O!
Take R and Scala to Big Data using in-memory Algorithms from H2O. In this Triple Header for SF Big Data Science Anqi Fu , our resident R wiz, will present data munging and R adhoc analytics at scale. Be prepared for fireworks with R in RStudio and not a ton of powerpoint. Scala has reached tremendous adoption amongst Machine learning &...
Read moreMachine Learning for Adtech
Characteristics of advertising data: tens of thousands of columns or more (top 100k or 1 m sites) high collinearity factors: eg demographics, with a strong correlation between eg income and education collinearity: sports fans follow nfl + espn + bleacher report + fox sports; users of ravelry also shop etsy. Those features are certa...
Read moreMaking films is not too different from startups
Quentin Tarantino, Ang Lee and other great directors discuss making films, creative process, attention to detail and inspiring & directing one's team to do great work. ...
Read moreH2O goes to CodeMesh in London
An API for Distributed Computing: We have defined an API and built an open-source platform for dealing with in-memory distributed data. We’ve used it to build state-of-the-art predictive modeling and analytics (e.g. GLMNET, GBM, Random Forest) that’s 1000x faster than the disk-bound alternatives, and 100x faster than R (we love R but it’s...
Read moreH2O goes to qconsf
Math Algorithms have primarily been the domain of desktop data science. With the success of scalable algorithms at Google, Amazon, and Netflix, there is an ever growing demand for sophisticated algorithms over big data. In this talk, we get a ringside view in the making of the world's most scalable and fastest machine learning framework,...
Read moreDistributed Deep Learning with H2O in the Cloud @ Ebay
Cyprien Noel will present hand-picked algorithms that work on H2O at scale and a survey of the space. We will walk users through a couple of datasets (MNIST) and demonstrate the power of Multi-layer Neural Networks at Scale in EC2. Learn more and sign up at http://www.meetup.com/Silicon-Valley-Big-Data-Science/events/132780102/ ...
Read morePredictable Rise of Physicists: Domain Science
For years, I secretly suspected that a lot of our math came from Physics. Some of the greatest leaps in math were made closely alongside the greatest discoveries in Physics. Calculus. QED. Turing. The physics of our businesses is grounded in a complex systems understanding of domain. When Data science finally gets freed from time-sapping...
Read morePivotal hosts 0xdata - Distributed Random Forest, GBM, GLM & API for Big Data Algos
Distributed Machine Learning has come of age, just in time to meet the challenges of Big Data. We will present an API for extending and rolling your own Algorithms or use powerful contest-winning Gradient Boosting Machine, Generalized Linear Modeling and Random Forest at scale. Demo and Fireworks using big datasets from within ...
Read moreFrontier Big Data Meetup - Scalability & Availability
Come see Sri present on November 5th! 1. Sam Hamilton , Vice President of Data Technology at PayPal 2. SriSatish Ambati , Co-founder & CEO, 0xData 3. Sourav Mazumder, Technology Head of Big Data Practices, Infosys 4. Bruce Templeton, Co-founder & CEO, NephoScale At Room B3 in Mission City Ballroom, Santa Clara Convention Center...
Read more0xdata and Yelp - Machine Learning for Relevance and Serendipity/Distributed Gradient Boosting
Join us and Yelp for a chat on Machine Learning, and make sure not to miss Sri’s lightning talk on Distributed Gradient Boosting! Main Talk: Machine Learning for Relevance and Serendipity. Speaker: Aria Haghighi (Prismatic). Abstract: Careful use of well-designed machine learning systems can transform products by providing highly perso...
Read moreOur data, our math // our tools, our science!
Big data has always been with us. Our race's answer to data explosion was through math & computation. Whether it was Newton's calculus, Einstein's Relativity or Shannon's Information Theory, each generation's answer to its big data problem arose from its best and brightest. Our generation's challenge is here. Our lives are mired in d...
Read moreBuilding a Distributed GBM on H2O
At 0xdata we build state-of-the-art distributed algorithms – and recently we embarked on building GBM, an algorithm notorious for being impossible to parallelize, much less distribute. We built the algorithm shown in Elements of Statistical Learning II, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, on page 387 (shown at the bo...
Read moreAn API For Distributed Analytics
There are so many APIs to choose from… Features of the space: Lots of data – which I’ll qualify as “bigger than 1 machine” and thus needing parallel I/O, parallel memory, & parallel compute – and distributed algorithms. Ease of programming; hide details (but expose when wanted). High level for ease-of-use, but “under the covers” ...
Read moreStrata NYC & Hadoop World: How to Stop Worrying and Start Modeling Big Data with Better Algorithms and H2O
How to Stop Worrying and Start Modeling Big Data with Better Algorithms and H2O Srisatish Ambati (0xdata Inc), Cliff Click (0xdata Inc) 5:05pm Tuesday, 10/29/2013 Data Science Beekman Parlor – Sutton North Data Modeling has been constrained through scale; Sampling still rules the day for Adhoc Analytics. Scale brings much needed change t...
Read moreNYC Big Data Meetup - Distributed Random Forest, GBM, GLM & API for Big Data Algos
Distributed Machine Learning has come of age. Just in time to meet the challenges of Big Data, we present an API for extending and rolling your own Algorithms or using powerful contest-winning Gradient Boosting Machine, Generalized Linear Modeling and Random Forest at scale. Demo and Fireworks using big datasets from within the familiar...
Read moreGBM on Ecology - Recreating a model made for R
In the last couple of weeks we’ve had two meetups on GBM (gradient boosted classification and regression ), and hence a lot of excitement about running the algorithm as presented by Cliff, Earl and Dr. Hastie. You can find the hella cool videos of both presentations here: http://www.youtube.com/0xdata One of my favorite articles on GBM ...
Read moreJoin Us Tomorrow at Trulia - Distributed GBM!
Hi hackers! Just a quick reminder that we’ll be joining our friends at Trulia tomorrow for a meetup on machine learning discussing Distributed GBM. GBM is one of the most popular machine learning algorithms used in data mining competitions. Most of us use GBM through the R implementation. However, we have recently written a distributed version fo...
Read moreH2O & LiblineaR: A tale of L2-LR
tl;dr: H2O and LiblineaR have nearly identical predictive performance. Overview: In this blog, we examine the single-node implementations of L2-regularized logistic regression (LR) by H2O and LiblineaR. Both LibR and H2O are driven from the R console on the same hardware and evaluated on the same datasets. We compare regression coeffici...
Read more0xdata + Vendavo = Awesome
For those of you who missed our recent meetup at Vendavo: our data scientist, Earl Hathaway, and our CTO & Architect of Distributed Gradient Boosting, Cliff Click, gave a talk on GBM that was (without exaggeration) totally awesome! Eric, the Algorithms and Data Science guru at Vendavo, and their hacker-CEO, Neil Lustig, have been partnering with us d...
Read moreRunning a GLM Model in H2O + R (notes from the hands-on meetup Sept. 26)
This is a walkthrough of running H2O through R. Before you get started you will need three things: R (a recent version), H2O (which you can get through GitHub: https://github.com/0xdata/h2o or directly from our website: http://0xdata.com/h2O/), and the h2oWrapper R package, which is the tool that makes H2O talk to R, and lets you talk to ...
Read moreHands on Workshop: Hack Big Data With Math
Thursday night (September 26) at 7, resident Math Hackers will demonstrate hands on attitude combined with un-encumbered brain-power. Wielding powerful 0xdata machine learning technology at their fingertips, they will show you how to pull predictions from a gigantic distributed heap of Java built vectors . Bring your laptop, we have WIFI ...
Read moreGradient Boosting Machine in III Acts: Trevor Hastie, Netflix & 0xdata
Gradient Boosting Machine in III Acts: Dr. Trevor Hastie, Netflix & 0xdata. Triple Header on Boosting & GBM: Act I: Trevor Hastie, Of Stanford Mathematical Sciences, the mathematician behind Lasso & GBM speaks of the nuances of the Algorithm. Act II: Cliff Click, CTO of 0xdata, the implementor of parallel and distributed GB...
Read moreEven More MNIST
Since we've been fooling around with the MNIST data set quite a bit lately (Spence is using it in benchmarking), I've been following the leaderboard and methods for the ongoing Kaggle competition around the same data. It's really amazing to see what people come up with. But of course, the purpose of H2O is entirely that one need not devo...
Read moreReplay: Modeling MNIST With RF Hands-on Demo
Last week Spencer put together a great hands on for modeling data using H2O (http://www.meetup.com/H2Omeetup/). This post is a write-up of the workflow for generating an RF model on MNIST data for those of you who want to walk through the demo again, or maybe missed the live action version. I’m running through one of our local servers, ...
Read moreHands on Workshop: Hack Data With Math
Thursday night (August 29) at 7, resident math hacker Spencer A. is leading a hands on workshop on using H2O to analyze real-world data. For those of you who are new to the math side of H2O, we have notes below to help you get prepared. H2O is a distributed math platform featuring a set of analytical tools that can be accessed through an ...
Read morePicture it: H2O and R
August 22, 2013 | Uncategorized [EN] | Picture it: H2O and R
Read moreBig Data Science in H2O with R
Big Data Science with H2O in R from Anqi Fu We had a great turnout at our Meetup last night! We took a look at the H2O/R API, then dove right in to a hands-on demo, where we imported, cleaned, and ran GLM on the airlines data set in H2O using R commands. Here are the slides from my talk, and interested users can take a look at the ...
Read morePublic Data Sets
For your data analysis pleasure, I give you a giant list of super cool publicly available data. If you’re looking at the data sets and wondering “now what?” – you can find this list AND tutorials on how to use H2O for analysis at the H2O docs page (here: http://docs.0xdata.com) . You can also get a detailed hands on experience analyzing a...
Read moreTCP Is Not Reliable
Been too long between blogs… “TCP Is Not Reliable” – what's THAT mean? Means: I can cause TCP to reliably fail in under 5 mins, on at least 2 different modern Linux variants and on modern hardware, both in our datacenter (no hypervisor) and on EC2. What does “fail” mean? Means the client will open a socket to the server, write a bunch of st...
Read moreRun H2O From Within R
With the REST API, it’s simple to run H2O operations from within R using similar syntax to all your favorite R functions. In this post, we’ll walk through a simple demo of its capabilities. First, get H2O installed and running by following the tutorial here . Once you have the R package loaded, you can take a look at the included demos by...
Read moreUse R to run Better Algorithms on Big Data
Our resident R users will demonstrate how to use the R package and invoke big data modeling entirely from R. In this session our resident R & Math hacker, Anqi Fu will demonstrate the R API for H2O. Early users, community and customers of H2O have been invoking GLM, Random Forest and K-means from an RConsole or RStudio. In this meetu...
Read moreRandom Forest Measurements for the MNIST Dataset
This post discusses the performance of H2O’s Random Forest [5] algorithm. We compare different versions of H2O as well as the RF implementation by wise.io. We use wall-clock time to measure workflows that match up with the user experience. A link to the scripts used is available here [1]. Specifications: Hardware: Amazon EC2 in US-EAS...
Read moreWe the people: Our meetup member introductions
You may have noticed that we have a ton of stuff going on at 0xdata, including several upcoming meetups that I expect will be very well attended. I was feeling a little curious about who exactly would be attending. What are the common areas of interest, are our members mostly software people or data scientists? Anyhow, I find that when I ...
Read moreHey good looking; Visualization and Data Mining 1
I recently came across an article by Shaw et al, in Decision Support Systems (1). The article discussed the importance of data mining and information management to good customer relationship management in increasingly competitive markets. A key point of the paper that I agree with is the importance of heuristics in data mining, particular...
Read moreBig Data Cloud Computing Streaming Systems & Infrastructures
Big Data Science at Frontier Real Time Streaming Meetup. 250 Big Data enthusiasts have signed up for a Saturday presentation! Looks like it's going to be quite an interesting presentation and panel! ...
Read moreImplement a Machine Learning Algorithm in 2hrs
We will take a simple yet popular & powerful math algorithm such as Linear Regression and implement a distributed version in 2hrs. Pre-requisites: Knowledge of Java or R See: http://h2o.0xdata.com/ Warning: Only software programmers ignore Warnings! That said, this seriously is a very hands on java-intense exercise. Extinguished en...
Read moreGLM Bells and Whistles Part 2: Analysis and Results from Million Songs Data
Using the Million Songs Data we want to characterize a subset of the songs. To do this we’re going to run a binomial regression in H2O’s GLM. The approach to characterizing songs from the 90’s is the same method you can apply to your own data to characterize your customers relative to some larger group. In turn, those findings can be app...
GLM and K means to find Social Response Bias - Dating and Fibbers
In any field where data collection is dependent on what your clients, customers, public, whomever ... tell you, there’s the risk that people are big fat fibbers. This often happens because people respond the way they think they SHOULD rather than with their own personal truths. Social sciences and marketing people call this phenomenon soc...
Running analysis on the right data!
All in the day: Anqi Fu, our wickedly smart Math & Data Science hacker-intern from Stanford this summer, was characterizing GLMNet in R on sparse data and comparing with other tools. We were using a data set predicting two-bedroom median rent based on neighborhoods from huduser.org. DATA: http://www.huduser.org/portal/datasets/fmr/...
The MillionSongs Data Part 1: Bells and Whistles of GLM in H2O
Using the Million Songs Data Set I want to go from beginning to end through H2O's GLM tool. Note that the original data are large, so downloading and fiddling with the full data set can be quite painful if you just do it from your desktop. That said, you can find it here. It’s a good opportunity to take a really detailed look at H2O so ...
Data Science is NOT Rocket Science - H2O at Big Data Cloud
DJ Das brings Sri to talk about H2O by 0xdata to the Big Data Cloud Meetup July 10, 2013. Venue: 3200 Coronado Drive, Santa Clara ...
Building A TB-Scale Math Platform @ Uberconf 2013, Denver
Datasets have gotten to PB-scale, but the modeling you can do has been limited to a single node (e.g. R, SAS), stuck inside the database, or takes hours on Hadoop-like technologies. We have built a simple clustering package, and are using it to do distributed analytics on the sum of all RAM in a cluster...
Hands-on Data Science with H2O at GlobalBigDataConference
Experience a hands-on hack data session using H2O & R at BigDataBootCamp by GlobalBigDataConference. Every few months, Sridhar puts together a content-rich conference filled with a highly engaged audience. This weekend GlobalBigDataConference is doing a BigDataBootCamp – Tickets are on sale. Sri brings H2O & R to this audience, mun...
Age of the Intelligent Apps Ahead
The Age of the Intelligent Apps is here – Let's gear up! Businesses are continuously collecting data from... yes, applications & sensors. Applications are the key to data creation. The future of Applications is to analyze data in-motion – learn the rules of the game at creation, backed by a super-intelligent model from historical data! A powerfu...
Saving Big Data Science is Saving Science
For time is the ultimate non-renewable resource! Data Science represents the convergence of Domain knowledge, Data Collection and a series of hypotheses validated or invalidated by use of Math. And Big Data Science takes that one step further into the realm of massive datasets that become necessary and pre-condition in Science and Busines...
H2O at the Hadoop Summit - Machine Learning Evening with Big Data Science
In a triple header with Mahout, Alpine Data and 0xdata, Sri presents at the Machine Learning Evening at the Hadoop Summit. Be sure to bring your Data Science hat! Topic: “Big Data + Better Algorithms ==> Better Predictions with H2O” Abstract: “H2O’s fast high scale open source algorithms are set to revolutionize Predictive Analytics. A...
Hacking K-means with Cyprien Noel
Last night, 0x offices were well populated with some very bright programmers. Big thanks to Cyprien Noel, the 0x hacker who designed k-means. He led the group as we collaboratively worked through building the code underpinning K-means modeling. In case you missed the group last night – we’re doing it again. In about a month we’ll be goin...
Convert DOS to Unix - Insert Tab A into Slot B
Every day as part of my 0x immersion program one of our hackers tries to explain something he is working on – an especially beautiful bit of code or something about data science and how the mechanics of our project work, or whatever. Every day, at least once, I am completely confused. I realize that this must be exactly how someone who...
H2O and Big Data Meetup at Elance
While giving a talk at the Meetup at Elance Thursday night, Chris Pouliot (Netflix’s lead analyst) commented that good analysis happens not when you have an army of clones, but when you have a diverse set of bright, engaged minds all willing to tackle a problem. No more than an hour later, Sri was reminding us that good solutions and grea...
Standardized Coefficients
One of the (few) downsides of being in the Bay is the completely absurd traffic. Perhaps I am a bit more sensitive to this than most, given my epic daily commute. While I am normally inclined to whine about my cross-bay traverse a little, yesterday it paid off. You see, I’m used to making sense of things in my own way – which doesn’t al...
BIG VS. LITTLE: P-Values and Coefficients
The Quick and Dirty: For the moment let’s assume that we have some a priori hypothesis that we want to test. We can talk about two things: how big the relationship is and how strong it is. P-values don’t care about big – they only care about strong. To get a sense for this recall from ANOVA the fairly common test statistic F. We decide...
Chocolate Cake
Chocolate Cake (Wednesday, June 5, 2013) You know how sometimes you have one bite of really good chocolate cake, or a really amazing peach and totally assume that you could eat another 30lbs of whatever without regard for good manners or physical limitations? Yeah. Decreasing marginal returns dictate that it almost always turns out th...
Data Science is NOT Rocket Science
Finding myself at 0x is a lot less like starting fresh in a new profession and more like choosing cultural expatriation – it is a whole new (beautiful) world. On my first day everyone spoke what I was relatively sure should be English, but it felt like they were actually speaking in their own dialect (which I’ve come to think of as Hexpe...
Meetup: Distributed Random Forest at SF Data Mining
Come watch Jan Vitek present Distributed Random Forest at the SF Data Mining group. ...
Better Big Data Algorithms with H2O by 0xdata
Manhattan loves data + math better than anyone! Join us at our first New York City meetup talking high-scale algos at Pivotal Labs, Union Sq, NYC. Cliff and I will walk through a Big GLM over large datasets and deep dive into parallelizing and distributing algorithms over distributed array-let data structures. ...
Big Data Science Practice + Algo Implementation
In this double header we present a practitioner’s close view of the science and an engineer’s close view of the design and implementation of distributed algorithms. Day in the Life of a Data Scientist – Chris Pouliot: In this session, Netflix analytical leader Chris Pouliot shares his experience building a large team of data scientists at Netfli...
H2O Hack Data Meetup
Hack Data with Math, H2O Meetup. We derive insights from the Airline Dataset: we analyze airline takeoff and landing data from the past 20 years and infer how flying has changed (more delays, different airports) after 9/11. ...
Hack Data with Math using H2O - Silicon Valley Big Data Science Meetup at Google
Thanks for attending! Presentations: Cliff’s H2O and API for Big Data Math talk; Jan Vitek’s talk on Distributed Random Forest. Cliff & Jan will present a deep dive into H2O and Hacking Big Data with Math. We locked down the Venue – Google, Building 43, 1600 Amphitheatre Parkway, Mountain View, CA, 94040. Can’t wait for the fireworks...
Hack Airline Data with Math
Last Thursday of the month, April 25, 2013, is here! It’s BigDataWeek. Join us at our monthly open house and meet the artists and hackers behind H2O. This time we are hacking the airline dataset! “Have you ever been stuck in an airport because your flight was delayed or cancelled and wondered if you could have predicted it if you’d had ...
Time is ripe for a revolution in Math for Big Data!
Data has always been with us. Every time we as a race complained about data, a new kind of math evolved to crush the scourge of BigData, whether it was Newton with Calculus, Einstein with relativity, or Shannon with Information theory. Our generation’s response to BigData is due. The time is ripe. For a revolution in Math. One that opens ...
H2O does BigDataWeek at SF Data Mining
Todd Holloway hosts the SF Data Mining meetup at Trulia and brings a lot of goodness to the community of data scientists here. We are fortunate to present for his group and BigDataWeek. 0xdata’s own SriSatish Ambati will be giving a talk on H2O. Sri’s talk will dive into scaling GLM (Generalized Linear Model), Random Forest, and other...
Predicting Airline Data using a Generalized Linear Model (GLM)
Just recently I created a wiki post on the H2O Github page with step by step directions on how to predict if a flight’s arrival would be delayed or not. I essentially uploaded airline data from the American Statistical Association to H2O and used GLM (also known as generalized linear model, logistic regression, or logit regression) to...
H2O at Predictive Analytics World Conference in SF
Join H2O and the 0xdata team at the Predictive Analytics World conference in San Francisco, CA on April 15 – 16, 2013. Meet us at the 0xdata booth in the Exhibitor Center at PAW where we will be demoing H2O hacking large data sets. Not to mention showing off our latest video. Be sure to look out for great talks from Netflix’s Chris Poulio...