Keynote Speakers


Dr. Feryal Ozel

Professor of Astronomy & Astrophysics
University of Arizona, Department of Astronomy

Feryal Ozel is a Professor of Astronomy and Physics at the University of Arizona. Dr. Ozel's primary research interests are the physics of black holes, neutron stars, and theoretical high-energy astrophysics. She is Chair of the NASA Astrophysics Advisory Committee, Chair of the NASA Lynx Large Mission Concept Study, and Lead of the Event Horizon Telescope Modeling and Analysis Group. She received her PhD in astrophysics from Harvard University and was a Hubble Fellow at the Institute for Advanced Study in Princeton. She was awarded the American Physical Society Maria Goeppert Mayer Prize, was elected to the Science Academy in Turkey in 2013, and was elected a Fellow of the American Physical Society in 2015. She has been awarded a Radcliffe Fellowship, a Guggenheim Fellowship, and a Visiting Miller Professorship at UC Berkeley. In 2019, she shared the Breakthrough Prize and the National Science Foundation Diamond Achievement Award with the Event Horizon Telescope collaboration for taking the first image of a supermassive black hole. Dr. Ozel serves on numerous national committees and advisory boards in astrophysics and appears in science programs and documentaries worldwide.

Session Abstract

Astrophysical experiments of the current era generate the largest volume and rate of data encountered anywhere in the world. For example, the Event Horizon Telescope, which took the first image of a black hole in the nearby galaxy M87, recorded 5 Petabytes of data over a 5-night observing campaign, while the Large Synoptic Survey Telescope is poised to image the entire sky at a high cadence and gather 20 Terabytes per night that need to be processed in real time. Different datasets pose different types of challenges, ranging from identifying weak signals to making robust statistical models. I will describe various problem-specific hardware and algorithmic solutions developed in the astrophysics community for these use cases and show that one approach does not fit all.


Jenny Bryan

Software Engineer & Data Scientist

Jenny Bryan is a Software Engineer and Data Scientist at RStudio and an Adjunct Professor of Statistics at the University of British Columbia with a Ph.D. in Biostatistics from the University of California, Berkeley. She works on a team that develops open source R packages to make data science faster, easier and more fun.

Session Abstract

Coming soon.

Carol Willing

Willing Consulting

Carol Naslund Willing serves on Project Jupyter’s Steering Council and works as a Core Developer on JupyterHub. She serves as a co-editor of The Journal of Open Source Education (JOSE) and co-authored an open source book, Teaching and Learning with Jupyter.

She is a member of Python’s inaugural Steering Council and a core developer of CPython. She’s a Python Software Foundation Fellow and former Director. In 2019, she was awarded the Frank Willison Award for technical and community contributions to Python. With a strong commitment to community outreach, Carol co-organizes PyLadies San Diego and San Diego Python User Group.

Carol has an MS in Management from MIT and a BSE in Electrical Engineering from Duke University.

Session Abstract

Coming soon.

Fireside Chat - Building a Data Science Driven Company

Anna Counselman


Anna Counselman is co-founder and head of people and operations at Upstart, a leading AI lending startup with more than $160M in venture capital funding. Anna co-founded Upstart in 2012, after a career at Google leading operations. She helped scale Upstart from zero to 200+ employees and originate more than $3.5B in loans. At Google, Anna led Gmail Consumer Operations for 5 years as it scaled from 150 million to 450 million users and launched the global Enterprise Customer Programs team. She also held a variety of operations roles at McMaster Carr and several other startups. Anna graduated Summa Cum Laude from Boston University with a BA in Finance and Entrepreneurship. Anna received a White House Champion of Change award and was recognized as one of Silicon Valley Business Journal's 40 under 40.


Falon Donohue (Moderator)


Coming Soon.

Session Abstract

Coming soon.

Main Panel


Ritika Gunnar

VP, Data & AI Expert Services & Learning

Coming Soon.

Session Abstract

Coming soon.

Rehgan Avon (Moderator)

Women in Analytics

Coming Soon.

Session Abstract

Coming soon.

Advanced & Applied Methods: Machine Learning and Data Science

Julia Silge

Data Scientist - Stack Overflow
Text Mining Using Tidy Data Principles

Julia Silge is a data scientist at Stack Overflow and the author of Text Mining with R. She is both an international keynote speaker and a real-world practitioner focusing on data analysis and machine learning practice. She loves making beautiful charts and communicating about technical topics with diverse audiences.

Session Abstract

Text data is increasingly important in many domains, and tidy data principles and tidy tools can make text mining easier and more effective. In this talk, learn how to manipulate, summarize, and visualize the characteristics of text using these methods and R packages from the tidy tool ecosystem. These tools are highly effective for many analytical questions and allow analysts to integrate natural language processing into effective workflows already in wide use. Explore how to implement approaches such as measuring tf-idf, topic modeling, and building classification models.
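
For readers new to tf-idf, here is a minimal sketch of the idea in Python rather than the R packages the talk covers. The toy corpus and the function names are illustrative assumptions, not material from the talk. The text is first put into a one-token-per-row shape, then each word is scored by its term frequency times the natural log of the inverse document frequency.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
docs = {
    "a": "the cat sat on the mat",
    "b": "the dog sat on the log",
    "c": "the cat chased the dog",
}


def tidy_tokens(docs):
    """Return one (doc_id, word) row per token: a one-token-per-row layout."""
    return [
        (doc_id, word)
        for doc_id, text in docs.items()
        for word in text.lower().split()
    ]


def tf_idf(rows):
    """Score each (doc_id, word) pair as tf * idf.

    tf  = count of the word in the document / total words in the document
    idf = ln(number of documents / number of documents containing the word)
    """
    counts = Counter(rows)                               # n per (doc, word)
    doc_totals = Counter(doc_id for doc_id, _ in rows)   # words per document
    doc_freq = Counter(word for _, word in set(rows))    # documents per word
    n_docs = len(doc_totals)
    return {
        (doc_id, word): (n / doc_totals[doc_id]) * math.log(n_docs / doc_freq[word])
        for (doc_id, word), n in counts.items()
    }


scores = tf_idf(tidy_tokens(docs))
```

A word that appears in every document, such as "the", scores zero, while a word unique to one document, such as "mat", scores highest within that document.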


Vishakha Lall

Software Development Engineer I - Flipkart
Don’t let Neural Networks intimidate you – Understanding complex networks with simplicity

Vishakha is a recent Engineering graduate with a knack for technology! She loves building software that makes an impact and has a keen interest in the geeky world that revolves around data, analytics, cloud and blockchain! She enjoys sparking creativity with the software she builds. Looking at community problems from the perspective of an engineer lets her work on ingenious ideas and solutions. One such idea won her the title of 'Inspiring Innovator' at Anita's Moonshot Codeathon for working on a novel solution to combat traffic problems by encouraging intelligent lane driving. The same algorithm was also published as an IEEE research paper at ICCCNT, 2018.

Vishakha is a strong advocate of community, sharing knowledge and ideas with fellow tech-enthusiasts and collaborating to build better solutions for the planet. She often applies as a mentor in multiple initiatives to help beginners with their first steps in technology with the aim to galvanize them to think creatively. At work, she collaborates on and builds optimized solutions for complicated graph network problems.

Session Abstract

The session is a must-visit for any Data Science enthusiast with an interest in Computer Vision, Object Detection, Identification and Segmentation, Neural Networks and the like. No particular level of proficiency is required, although an overview and beginner-level understanding of the concepts would be helpful. The talk will use a specific example of Object Segmentation with a complicated Neural Network model, with the aim that attendees can extrapolate the knowledge to several independent applications. Neural Network models often become hard to understand as more layers are added, and it gets challenging for a beginner to get the hang of them. The session will, therefore, focus on how and when to add layers and parallelism, and on understanding the in-depth working of the model. During the session, we will work on a live demonstration in a Python notebook and go through every step together. I consider myself a rookie too, and I believe that will encourage all beginners in the crowd to believe in themselves and not just work on complicated solutions but understand them inside and out too!


Liz Wanless

Assistant Director of Analytics - Ohio University
Regularization Methods in R and Python: From Ridge to Elastic Net

Coming Soon.

Session Abstract

Coming Soon.


Jigyasa Grover

Machine Learning Engineer - Twitter, Inc.
Fueling Machine Learning with Feature Engineering

Red Hat Women in Open Source Academic Award Winner 2017, Google Summer of Code alumna and currently a Machine Learning Engineer at Twitter Inc., Jigyasa Grover is an ardent open-source enthusiast as well as a budding researcher. A feminist at heart, she leads initiatives that help bridge the gender gap in technology. In her quest to build a powerful bunch of girls and boys alike, and believing in “we rise by lifting others”, she mentors aspiring developers in various global programs.

Session Abstract

In the contemporary world of learning algorithms, “data is the new oil”. Data demands efficient refinement to expose valuable information. To lay a strong foundation for state-of-the-art machine learning algorithms to work their magic, the crude oil-like data needs to be infused with domain knowledge and extracted into “features”. This talk aims to introduce the audience to the subject of Feature Engineering and to highlight the power of this most creative aspect of data science, which often does not get its due limelight. It will also walk the audience through the process of feature engineering as done in formal settings, with a simple hands-on Pythonic example on publicly available data, and put forward some popular techniques like hashing, encoding, and embedding, which help get the most out of the data once it has a proper structure for predictive modeling. Terms pertaining to the realm of feature engineering, like relevance, selection, combination, and explosion, will also be discussed. The goal is to establish the importance of data, especially in its worthy format, and the spell it casts on building smart learning algorithms.
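
As a rough illustration of two of the techniques named above, encoding and hashing, here is a minimal Python sketch. The record, the category list, and the function names are hypothetical and are not taken from the talk's own example.

```python
import hashlib


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a known category list."""
    return [1 if value == category else 0 for category in categories]


def hashed_bucket(token, n_buckets):
    """Map a token to a stable bucket index.

    SHA-256 is used because Python's built-in hash() is salted per process
    for strings, so it would not give repeatable feature positions.
    """
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_features(tokens, n_buckets=8):
    """Hashing trick: count tokens into a fixed-length vector, no vocabulary needed."""
    vector = [0] * n_buckets
    for token in tokens:
        vector[hashed_bucket(token, n_buckets)] += 1
    return vector


# Hypothetical raw record, invented for illustration.
record = {"city": "Columbus", "tags": ["data", "science", "data"]}
cities = ["Chicago", "Columbus", "Denver"]

# Final feature vector: 3 one-hot slots followed by 8 hashed-count slots.
features = one_hot(record["city"], cities) + hash_features(record["tags"])
```

One-hot encoding needs the full category list up front, while the hashed vector keeps a fixed length however many distinct tags appear, at the cost of occasional collisions between tags.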

Leadership & Strategy: Leadership, Culture, Talent, Strategy


Wendy Anderson

Data & Analytics Group Manager - Intuit
Data as a product feature deliverable, not an afterthought

Wendy Anderson leads the Data and Analytics team for Intuit’s Mint and Turbo consumer finance applications, where data is not only the key to measuring business performance but, in the form of user insights, is also directly integrated into the experience to help users track and understand their financial health.

She has over 15 years of digital analytics experience and implemented data and analytics platforms for companies including Sony Entertainment Network prior to joining Intuit.

Session Description

Surprisingly, many new product experiences and capabilities are created without a plan to measure their impact. As a result, the data required to measure success are overlooked: data instrumentation or pipelines are missing; misconfigured A/B tests are inconclusive; data science models have no feedback loop to improve. Decision making is slower.

Data has to be treated as a deliverable, as important as any user-facing feature. In the near future, it will be the foundation of features. At Intuit, we realized that data has to be the center of our ecosystem and applications have to be considered consumers of the data. This required changes in our data infrastructure, processes, and organizational mindset.

Three key elements are necessary for this change: the decision plan, the data pod, and organizational discipline. Decision plans ensure that success criteria are defined upfront and provide a roadmap for data requirements. Data pods are virtual scrum teams that own the quality and availability of data for their feature and include: product manager, application developer, architect, data analyst, data engineer, and data scientist. Lastly, the organizational commitment to delivering data as part of the product feature is required to reinforce data quality and completeness.


Brandeis Marshall, PhD

Founder & CEO - The DataedX Group
Inclusion at Scale

Brandeis Marshall, PhD is a computer science scholar and educator who contributes to the business intelligence, data science, and computer science education fields. She focuses on broadening participation in computer and data science. She has served as a faculty member at Purdue University and Spelman College. She is the founder of DataedX, a consultancy firm that provides workforce development training to enhance data competencies. She holds a B.S., M.S., and Ph.D. in computer science from the University of Rochester and Rensselaer Polytechnic Institute, respectively.

Session Description

In this talk, we’ll describe public value failures in the data science ecosystem and focus on two that offer significant challenges and opportunities for data scientists and the organizations they serve. Finally, we pose the Participation, Access, Inclusion and Representation (PAIR) principles framework for organizations seeking to minimize the impacts of these failures via the creation of a taxonomy capable of deploying data science that reflects the values of the communities they aim to serve. Plausible approaches will be shared on how today’s enterprise can construct, support, and sustain inclusivity using data and the power of the PAIR principles.

Barr Moses

CEO - Monte Carlo
Making data downtime a pillar of your data strategy

Barr Moses is CEO & Co-Founder of Monte Carlo, a data/analytics startup backed by Accel and other top Silicon Valley investors. Previously, she was VP Customer Operations at Gainsight (an enterprise customer data platform) where she scaled the company 10x in revenue and among other functions, built the customer data/analytics team. Prior to that, she was a management consultant at Bain & Company and a research assistant at the Statistics Department at Stanford. She also served in the Israeli Air Force as a commander of an intelligence data analyst unit. Barr graduated from Stanford with a B.Sc. in Mathematical and Computational Science.

Session Description

Ever had your CEO look at a report and say the numbers look way off? Has a customer ever called out incorrect data in one of your product dashboards? If this sounds familiar, data reliability should be the cornerstone of your data strategy. In this talk, I'll introduce the concept of “data downtime” — periods of time when data is partial, erroneous, missing or otherwise inaccurate. Data downtime is highly costly for organizations, yet is often addressed ad hoc. We’ll discuss why data downtime matters to the data industry and tactics best-in-class organizations use to address it -- including org structure, culture, and technology.


Laura Ellis

Analytics Architect - IBM Cloud
Data Democratization: Enabling Citizen Data Scientists

Laura Ellis is a data geek who aims to make data science and analytics accessible to everyone. She started her analytic journey with databases and has been hooked ever since. As a natural progression, she moved from databases to BI and then on to data science and predictive modeling. Most recently she has focused on real-time event streaming platforms and data applications in an effort to enable data democracy. She has worked in the data field for 15 years, holding a variety of positions. She holds a Bachelor of Engineering Science, Software Engineering from the University of Western Ontario and a Master of Science in Predictive Analytics from Northwestern University.

Session Description

The most successful businesses will equip every layer in the organization with the data needed to identify and drive growth opportunities. The search for this self-service data platform and underlying community has fueled IBM's own internal business transformation in IBM Cloud Platform. Unsurprisingly, the path involves both technical and cultural challenges. Technically, you need to ensure that your data shows the full picture in a timely fashion, with quality information in the tool of choice. Culturally, you need to create an accessible environment that allows subject matter experts to perform deep analysis without becoming data scientists. Join us to learn how IBM Cloud built an internal data platform enabling all team members to drive quality and growth.

Tiffany Cross

Senior Development Manager - IBM Cloud
Data Democratization: Enabling Citizen Data Scientists

I specialize in data-driven, front-end web development and analytics, integrating great graphic design with RESTful backend services and databases. I lead key projects and initiatives that span teams and ensure that our customers have the best user experience possible.

Session Description

The most successful businesses will equip every layer in the organization with the data needed to identify and drive growth opportunities. The search for this self-service data platform and underlying community has fueled IBM's own internal business transformation in IBM Cloud Platform. Unsurprisingly, the path involves both technical and cultural challenges. Technically, you need to ensure that your data shows the full picture in a timely fashion, with quality information in the tool of choice. Culturally, you need to create an accessible environment that allows subject matter experts to perform deep analysis without becoming data scientists. Join us to learn how IBM Cloud built an internal data platform enabling all team members to drive quality and growth.


Ingy Youssef

Director, Information Risk Management (Cloud & Application Security) - Nationwide Insurance
Motivating the Case for Security - Metrics Based Conversations

Ingy Youssef is a Director of Information Risk Management at Nationwide whose work focuses on application and cloud security. It spans a wide range of secure-by-design practices, security pattern and protocol design, threat modeling techniques, and software security testing, in addition to cloud security controls, governance and automation. She is an engineer with a defender mission, a security professional who likes to build, break, research and defend systems, and she is dedicated to lifelong learning. With a Master's in Information Systems and years of experience in software development, team leadership and project management, she earned a PhD in Network Security focused on Cryptography while acquiring a wide range of ethical hacking skills and deep knowledge of system vulnerabilities as a certified ethical hacker. To round out her expertise in security, she extended her knowledge of various tools and techniques for systems defense and enterprise security architecture. She then moved on to application and cloud security, building upon her skills in offensive, defensive and foundational security to serve the developer community at Nationwide as part of Nationwide’s Information Risk Management team.

Session Description

In the security field, we're always asking for something, maybe some needed testing, some funding, more resources, or simply a critical conversation for shutting down a service or application. Data can be very powerful in motivating our case and in this session I want to share with you examples of metrics that can drive these conversations in the right direction for more secure applications and environments.

Platforms & Process: Tools, Data, Infrastructure

Jordan Hagan

Senior Data Scientist - Miner & Kasch
Optimizing Data Pipelines: Why SQL is Underrated

Jordan Hagan is a Senior Data Scientist with Miner & Kasch. She started her career in 2010 as a government subcontractor working with the OIG/DOJ providing analytic insights to support Medicare Part D fraud, waste, and abuse cases. With expansive healthcare domain knowledge, Jordan has worked closely with hospitals across the United States to build data-driven systems to improve patient outcomes, parse surgeons’ notes, and streamline provider workflows to ensure the highest quality of care. Recently expanding beyond healthcare, she has built workflows to help companies identify customers for personalized retention and marketing efforts powered by recommendation engines and prediction algorithms. Residing in Denver, Colorado, Jordan spends her free time exploring the mountains and sampling great beer.

Session Abstract

This talk is targeted at the many Data Analysts, Data Scientists, and Software Engineers that prefer to write as little SQL as possible and perform analysis and data manipulation in memory. SQL and databases can do a lot of heavy lifting, speeding up data pipelines. Pulling large volumes of data into memory can hinder workflows if the SQL and programming language of choice are not optimized properly. Knowing SQL best practices and what languages to use when will ease your data manipulation and extraction processes.
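
To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names, and values are hypothetical and chosen only for illustration. Both approaches produce the same totals, but the second lets the database do the aggregation, so far fewer rows leave it.

```python
import sqlite3

# Hypothetical in-memory table of visit costs per provider.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (provider TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("A", 100.0), ("A", 50.0), ("B", 75.0), ("B", 25.0), ("B", 20.0)],
)

# In-memory approach: pull every row, then aggregate in application code.
totals_in_memory = {}
for provider, cost in conn.execute("SELECT provider, cost FROM visits"):
    totals_in_memory[provider] = totals_in_memory.get(provider, 0.0) + cost

# Database approach: GROUP BY returns one row per provider.
totals_in_sql = dict(
    conn.execute("SELECT provider, SUM(cost) FROM visits GROUP BY provider")
)
```

With five rows the difference is invisible. With millions of rows, the first version moves every row into application memory while the second moves one row per provider.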

Tanya Berger-Wolf

Director - Translational Data Analytics Institute, The Ohio State University
Artificial Intelligence for Wildlife Conservation: AI and Humans Combating Extinction Together

Dr. Tanya Berger-Wolf is the Director of the Translational Data Analytics Institute and a Professor of Computer Science Engineering, Electrical and Computer Engineering, as well as Evolution, Ecology, and Organismal Biology at the Ohio State University. Berger-Wolf is also a director and co-founder of the AI for wildlife conservation non-profit Wild Me, home of the Wildbook project.

Berger-Wolf holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. She has received numerous awards for her research and mentoring, including University of Illinois Scholar, University of Illinois at Chicago Distinguished Researcher of the Year, US National Science Foundation CAREER, Association for Women in Science Chicago Innovator, and the UIC Mentor of the Year.

Session Abstract

Photographs, taken by field scientists, tourists, automated cameras, and incidental photographers, are the most abundant source of data on wildlife today. I will show how data science and machine learning methods can be used to turn massive collections of images into a high-resolution information database, enabling scientific inquiry, conservation, and policy decisions. I will demonstrate how computational data science methods are used to collect images from online social media, detect various species of animals and even identify individuals. I will present data science methods to infer and counter biases in the ad hoc data to provide accurate estimates of population sizes from those image data.

I will show how it all can come together in a deployed system, Wildbook, a project of the tech-for-conservation non-profit Wild Me, enabling scientific inquiry, conservation, and citizen science. We have built Wildbooks for over 30 species of animals, including whales, sharks, and giraffes, and are working on elephants. In January 2016, Wildbook enabled the first-ever full species census (of the endangered Grevy's zebra) using photographs taken by ordinary citizens in Kenya. The resulting numbers are now the official species census used by the IUCN Red List, and we repeated the effort in 2018, becoming the first certified census from an outside organization accepted by the Kenyan government. The 2020 event has just concluded on January 25-26. Wildbook is becoming the data foundation for wildlife science, conservation, and policy.


Sara Daqiq

OAuth and OIDC - Okta
Securing your ML Assets: OpenID Connect

Sara lives to build great products for businesses; a no-nonsense developer, she currently works at Okta, an identity management company, as a developer support engineer. In this role, she speaks with developers and helps them with the implementation and workflow of their SDKs. Sara enjoys helping others learn to code; she was a teacher for two summers for Girls Who Code and a code coach at theCoderSchool. She is also passionate about women’s rights, helping women become financially independent, and working with women around the world to help them develop useful skills. In pursuit of this goal, she founded AccessLocal, an organization that teaches underprivileged females in rural Afghanistan literacy and financial planning. Sara graduated from Georgetown College with a Bachelor’s in Information Systems. In her free time, she enjoys bike riding, yoga, and going to the gym.

Session Abstract

How can application developers provide their users with secure authentication without investing a lot of time, and instead focus on building the parts of their app that will drive their business? With OpenID Connect (OIDC), you grant authority to a trusted provider to prove that the user is who they say they are. OIDC is built on top of OAuth 2.0, so it has all the functionality of OAuth 2.0 and more. In this talk, we’ll explore how applications communicate to grant access to resources on behalf of a user via OIDC.
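
As a rough sketch of the first step in that exchange, the Python below builds an authorization request for the OIDC authorization code flow using only the standard library. The endpoint, client ID, and redirect URI are placeholder assumptions, not values from the talk or from any real provider.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_auth_request(authorize_endpoint, client_id, redirect_uri):
    """Return (url, state, nonce) for an OIDC authorization code flow request."""
    state = secrets.token_urlsafe(16)  # echoed back on the redirect; guards against CSRF
    nonce = secrets.token_urlsafe(16)  # echoed back inside the ID token; guards against replay
    params = {
        "response_type": "code",
        # The "openid" scope is what makes this an OIDC request
        # rather than a plain OAuth 2.0 one.
        "scope": "openid profile",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        "nonce": nonce,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state, nonce


# Placeholder values for illustration only.
url, state, nonce = build_auth_request(
    "https://idp.example.com/authorize",
    "my-client-id",
    "https://app.example.com/callback",
)

# Parse the URL back to inspect what the application sends to the provider.
query = parse_qs(urlparse(url).query)
```

The application redirects the user's browser to this URL. After the user signs in, the provider redirects back with a short-lived code, which the application exchanges for tokens. It should check that the returned state, and the nonce inside the ID token, match the values generated here.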


Paige Roberts

Open Source Relations Manager - Vertica
Architecting Production IoT Analytics

In two decades in the data management industry, Paige Roberts has worked as an engineer, a trainer, a support technician, a technical writer, a marketer, a product manager, and a consultant.

She has built data engineering pipelines and architectures, documented and tested large scale open source analytics implementations, spun up Hadoop clusters from bare metal, picked the brains of some of the stars in the data analytics and engineering industry, championed data quality when that was supposedly passé, worked with a lot of companies in a lot of different industries, and questioned a lot of people's assumptions.

Now, she promotes understanding of Vertica, MPP data processing, open source, and how the analytics revolution is changing the world.

Session Abstract

Analyzing Internet of Things data has broad applications in a variety of industries, from smart cities to smart farms, from network optimization for telecoms to preventative maintenance on expensive medical machines or factory robots. When you look at technology and data engineering choices, even in companies with wildly different use cases and requirements, you see something surprising: Successful production IoT architectures show a remarkable number of similarities.

Join us as we drill into the data architectures in a selection of companies like Philips, Anritsu, and Optimal+. Each company, regardless of industry or use case, has one thing in common: highly successful IoT analytics programs in large scale enterprise production deployments.

By comparing the architectures of these companies, you’ll see the commonalities, and gain a deep understanding of why certain architectural choices make sense in a variety of IoT applications.

By studying successful production architectures, you’ll learn to:

- Judge large scale IoT technology choices critically and objectively

- Avoid some of the traps that have cost other companies time and money and caused so many implementations to fail

- Insulate your company from some of the impact of rapid change in data management technology

- Learn from other companies’ mistakes and successes so you don’t have to reinvent the wheel

- Choose an architecture that will help ensure your AI and ML projects make it into production where they have real impact

Joy Payton

Supervisor, Data Education; Adjunct Professor - Children's Hospital of Philadelphia; Yeshiva University
Cloud Computing: Mastering Key Concepts

Joy Payton is a data scientist, data educator, and cloud engineer at Children’s Hospital of Philadelphia (CHOP), where she helps biomedical researchers learn the reproducible computational methods that will speed time to science and improve the quality and quantity of research conducted at CHOP. A longtime open source evangelist, Joy develops and delivers data science instruction on topics related to R, Python, SQL, Cloud Computing, and git to an audience that includes physicians, nurses, researchers, analysts, developers, and other staff.

She is also a curriculum developer and adjunct professor at Yeshiva University, where she trains new data analysts in the fundamentals of statistics, data wrangling, data communication, and code. Joy has a special interest in helping career changers and people without strong technical backgrounds climb the data science learning curve.

Her personal research interests include using natural language processing to identify linguistic differences in a neurodiverse population as well as the use of government open data portals to conduct citizen science that draws attention to issues affecting vulnerable groups.

When she's not writing or delivering curriculum, Joy is passionate about inclusion in STEM fields and can be found learning ASL, reading more about workplace bias, or arguing that degree requirements are classist and exclusionary. Joy holds a degree in philosophy and math from Agnes Scott College, a divinity degree from the Universidad Pontificia de Comillas (Madrid), and a data science Masters from the City University of New York (CUNY).

Session Abstract

Cloud computing is increasingly important in data analytics, particularly when it comes to big data. Having data and analytics processes that rely on cloud providers like AWS, GCP, and Azure can accelerate discovery, but it’s not always easy to integrate cloud computing into an organization with established methods. Where do you start? How can you explain cloud computing, and why it matters, to the stakeholders who hold the purse strings? Find out in this session. We'll cover the key concepts of cloud computing, including the business advantages that cloud computing brings and the basic terminology that describes various cloud offerings.

We'll then segue into a hands-on workshop that allows you to use the free tier of a major cloud computing provider, and finish by walking you through the next steps you will need to take in order to be cloud competent or demonstrate cloud concepts for key decisionmakers. This talk and workshop is appropriate for complete beginners to cloud computing as well as for people who need to hire and evaluate cloud architects and engineers or make judgment calls about the strategic application of cloud computing in your enterprise.


Lauren McDonald

Principal Consultant - Systems Evolution Inc.
Real-time Retail - Enabled by Kroger's Event Streaming Platform

Lauren McDonald is a principal consultant at Systems Evolution, Inc. specializing in big data technologies and on assignment at a Fortune 20 retail company. She previously worked for GE in big data and cloud engineering, starting her career in GE’s Information Management Leadership Program. Lauren has a BS in Systems Analysis from Miami University in Oxford, Ohio and an MS in Computer Information Technology from Northern Kentucky University. She lives in Cincinnati, Ohio with her husband and two children.

Session Abstract

To compete in a modern eCommerce world, retailers need to have a strong digital presence and react in near real-time to customer preferences and demands. Event streaming using technologies like Kafka enables this near real-time communication, but provides limited capabilities for managing data quality. Early attempts at real-time business event streaming at a Fortune 20 grocer were based on JSON-formatted events. Modifications to the event formats occasionally broke downstream consumers, causing costly downtime. In the course of reimagining what an industrial-strength streaming platform would look like, we decided to focus heavily on schema lifecycle and management as a foundation. The schema registry is a great service, but it’s only one part of the schema lifecycle management process. Modern business is always evolving, so it is critical that the tools and processes are built to gracefully support changes as they occur. In this talk, I'll discuss how we evolved into a robust schema-first development platform, including the key components which drive our lifecycle process and the core principles which inspired them.
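
To illustrate why schema management matters for downstream consumers, here is a minimal Python sketch of a backward-compatibility check. It uses a deliberately simplified rule set: new fields need defaults, and existing fields keep their types. The rules, the dictionary-based schema format, and the field names are assumptions for illustration and do not describe the platform discussed in the talk.

```python
def backward_compatible(old, new):
    """Return a list of problems; an empty list means a consumer using `new`
    can still read events that were written with `old`."""
    problems = []
    for name, spec in new.items():
        if name not in old:
            # Old events lack this field, so the reader needs a fallback value.
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif old[name]["type"] != spec["type"]:
            problems.append(
                f"field '{name}' changed type {old[name]['type']} -> {spec['type']}"
            )
    return problems


# Hypothetical event schemas, invented for illustration.
v1 = {"order_id": {"type": "string"}, "total": {"type": "double"}}

# Safe change: a new field that carries a default.
v2_ok = {**v1, "currency": {"type": "string", "default": "USD"}}

# Breaking change: a type change plus a new field without a default.
v2_bad = {
    "order_id": {"type": "long"},
    "total": {"type": "double"},
    "store": {"type": "string"},
}
```

Running a check like this before a new event format is published is one way to catch a breaking change before it reaches consumers.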

Trust & Governance: Ethics, Governance, Policy, Risk


Valeria Cortez Vaca Diez

Data Scientist - Lloyds Banking Group
Detecting Discriminatory Outcomes in Machine Learning Models: A Case Study of Credit Model

Valeria is a Data Scientist at Lloyds Banking Group, specializing in the design and build of scalable Machine Learning solutions for different business areas of LBG and their customers. Her current work at LBG focuses on building tools and processes to detect and mitigate bias in ML models.

Before working at LBG, Valeria studied Business Informatics and Technology Management in Germany. For her final dissertation, she led a study on the economics of privacy at Microsoft Research in Cambridge to understand the trade-offs between reward and disclosure of personal information across cultures.

After finalizing her studies, Valeria worked with the startup TAB as a product manager to build the most comprehensive analytics platform on P2P lending and crowdfunding. During this time, she discovered her passion for Data Science, which led her to take a year to complete a Master’s degree in Business Analytics at Imperial College London. As part of her final project, she researched discriminatory outcomes in machine learning to analyze unfair treatment in credit models.

Valeria is a strong advocate of ethics and responsibility in AI as well as bringing more diversity into tech teams.

Session Description

Machine Learning and AI are considered by many as techniques free of personal judgment and biases. However, there is significant evidence that proves the opposite, with these methods leading to harmful discrimination and potentially causing long-lasting negative impacts on society.

Policing, hiring and lending are some of the many areas where Machine Learning has disproportionately harmed the most vulnerable groups in our society. Understanding unfair treatment in AI and ML is now crucial to prevent automated discrimination at scale.

The fundamental techniques to analyze and detect bias in Machine Learning decisions can be explained through simple metrics applied to model outcomes. The aim of this presentation is to pass on this knowledge so that any person, in a technical or non-technical role, can be empowered to challenge how Machine Learning is implemented.

In this talk, I will first focus on how machine learning models make decisions that affect single individuals. Then, I will explain, through some examples, the metrics we can use to understand whether the model’s outcomes are having a negative impact on a group in society. Finally, I will present a case study that applies the topics covered.
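As an illustration of what a simple metric applied to model outcomes can look like, the sketch below compares approval rates between groups. The specific metric (ratio of the lowest to the highest group approval rate), the function names, and the toy data are assumptions made for this example; the talk itself may cover different metrics.

```python
from collections import defaultdict


def selection_rates(groups, approved):
    """Share of positive outcomes (for example, approved loans) per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in zip(groups, approved):
        totals[group] += 1
        positives[group] += int(outcome)
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group received a positive outcome")
    return min(rates.values()) / highest


# Made-up model decisions for two hypothetical groups, A and B.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
approved = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, approved)
ratio = disparate_impact_ratio(rates)
print(rates, round(ratio, 3))
```

A ratio well below 1.0 signals that one group receives positive outcomes far less often than another. It is a prompt for further investigation rather than proof of discrimination on its own.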


Dr. Doreen Galli

Chief of Research/Distinguished Analyst - TBW Advisors LLC
Seven Security and Governance Data Space Issues CxOs Don’t Know About

Dr. Galli has been there and done that in IT. She now dedicates her career to work as an industry analyst, researching and advising enterprise technologists on all things data.

As an IBM Director, Dr. Galli solved the services asset reuse problem, created the methodology for outsource development and established solution-based go-to-market strategies. Concurrently, she launched the IBM Global Services Intellectual Property program. Becoming a Fortune 10 officer before the age of 40, she served as the CIO and CTO for the Deutsche Post World Net Mail Division. There she integrated eight acquisitions into a cohesive line of business, establishing business continuity and portfolio management while migrating the infrastructure to MPLS.

Dr. Galli digitally transformed Dell as an IT Executive. The impact included real-time analytics spanning sales, supply chain and manufacturing—both factory operations data and IT data globally. Success included 10 black belt projects. As an AT&T strategist, she shepherded software-defined networking (SDN) solutions into cloud management platforms industry-wide. She’s led teams in excess of 2,000 employees, spanning 170 countries. Her largest profit and loss (P&L) responsibility was nearly $4 billion. She also has plenty of startup experience, given her involvement with WebMD, USA.Net, and Patient Forward.

Trained as a professional speaker, Dr. Galli has been invited to speak at such distinguished venues as Forbes Executive Summit, CIO Summit, CIO Academy, Network+ Interop Las Vegas & Atlanta, The Internet Security Conference, WITI Conference, Grace Hopper, Strategic Analytics Summit, CES WITI Innovation Panel and Gartner’s Catalyst Conference.

Session Description

The California Consumer Privacy Act (CCPA) and the European Union’s (EU) General Data Protection Regulation (GDPR) expect CxOs to be able to answer the question, “who shared what customer data with whom.” To answer that question, enterprise-wide data governance must exist. For governance to be enterprise-wide, it must encompass all data access and sharing. Unfortunately, due to technology choices, configuration errors, missing driver updates, missing log files or lack of understanding of vulnerabilities, many CxOs are not aware of which data copies exist, let alone how they are being shared.

This research shares seven security and governance issues in the data space that remain generally unknown by CxOs. Each of the seven potential issues is examined. Examples are provided of vendors that have solved the issues and those that have not. Other possible remedies are discussed.

Driving Value & Uncovering Insight: BI, Data Visualization, Storytelling

Helen Pollitt

Managing Director - Arrows Up
How to win friends and influence people with reports

Helen is Managing Director at Arrows Up, with over ten years of experience in digital marketing and analytics. She has worked with a variety of international corporates, small local businesses and start-ups to develop a holistic digital marketing strategy that relies on measurement and data. Helen has spoken across the globe and written for leading online publications about digital marketing and analytics.

Session Description

Reports are created to serve the purpose of communicating insight, but all too often they are an overlooked source of marketing potential.

This talk will highlight how to use your existing reports to gain budget and buy-in from stakeholders, to promote your work and make sure the value of your work is known.

Key takeaways will include:

- Learning how to articulate through your reports so anyone will understand your data

- Using your reports for your own marketing purposes

- Discovering how to tell a story with data

Learn from a marketer about how to get your reports to work for you beyond communicating what the data says.

Industry & Innovation: Use Cases, Emerging Trends

Abigail Baldridge

Assistant Director of Research - Northwestern University
A Practical Approach to Reproducibility in Academia and Beyond

Abigail (Abi) Baldridge is a research director and biostatistician in the Center for Global Cardiovascular Health and Department of Preventive Medicine, Feinberg School of Medicine at Northwestern University. She is an experienced public health leader with a history of working in academic research and in the pharmaceutical and medical device industries as an engineer and biostatistician. Abi is skilled in project management, data science, data visualization, biostatistics, clinical research, and teaching within higher education. She works daily in SAS, Stata, and R, and places particular personal emphasis on promoting and adhering to practices for reproducible research. Abi is currently pursuing a doctoral degree in public health at Johns Hopkins Bloomberg School of Public Health with a focus on implementation science.

Session Description

Reproducibility, wherein data analysis and documentation are sufficient so that results can be recomputed or verified, is an increasingly important component of statistical practice. Research is more efficient and robust when research teams can easily recreate and reproduce findings using original data. However, adopting reproducible research workflows can be daunting due to technical barriers, a perceived need to switch away from a favorite software, or the impression that reproducible research is an “all-or-nothing” endeavor.

In the first half of this session, we will explore how to approach reproducible research: steps for starting small, expanding capability, and both technical and non-technical strategies to help along the way. This session will include a broad overview of tools and software for source code control, electronic laboratory notebooks, containers, and manuscript preparation tools.

In the second half of this session, we will take a deeper dive into manuscript preparation and dynamic documents, with a focus on StatTag. StatTag is a free, open-source program that embeds statistical results from R/R Markdown, SAS, Stata, or Python directly in Microsoft Word. With StatTag, results inserted into a Word document can be updated automatically or on-demand, and retain their linkage to the code even when the document changes hands, is redlined, or the text is copied and pasted elsewhere.

This session is well suited for analytics professionals with any level of expertise.


Leigh Stauffer

Software Engineer - Mobikit
Challenges in Mobility and Telematics Data

Coming Soon.

Session Description

Coming soon. 


Sharon Wilhelm

CIO - The State of Ohio Department of Commerce
Session Title - Coming Soon

Coming Soon.

Session Description

Coming soon. 


Sabitha Darsi

Data Architect - Nationwide Insurance
Data Virtualization: A Use Case on Data Services Layers, Transitional/Hybrid Architectures, and Real-time Reporting

Sabitha Darsi has served as a Technology Area Leader, System Architect/Project Architect, and Tech Lead, with 15+ years of overall IT experience, including 8 years of development experience in the integration and governance space. She has expertise in data integration, data management, data governance, Cloud, DevOps, innovation, packaged applications, and data security, with extensive experience providing solutions for enterprise needs. She has worked as lead architect/tech lead on multiple complex projects, managing planning and different phases of DDIT. She also has experience in developing strategies, defining goal/target state architecture, data integration and data management, packaged applications, software selection, providing innovative solutions, code development, maintenance, production support, and upgrades/migration. Sabitha has volunteered for and led several hunger relief programs, participated in hackathons, and proposed and implemented innovation ideas. She has mentored several associates, contractors, and interns. She is an active participant and volunteer in several all-women ARG groups and has represented Nationwide at other women's conferences. Sabitha Darsi holds a Master's degree in Electrical and Computer Engineering from Wichita State University and is a TOGAF certified architect, AWS Big Data Specialty certified, and an AWS Solutions Architect-Associate.

Session Description

Do you have lots of data residing in multiple systems? Do you need a unified view of it in real time, but your current technology or process doesn't allow it? Are you struggling with persisting multiple copies of stale data? We've got you covered, literally, but with a virtual fabric. Believe us, it's not the Emperor’s New Clothes. It's Data Virtualization: you don’t need to physically move the data, and you get to access it in real time with easy metadata and access management. Best of all, it will increase the agility and speed to market of your data product with even less operational overhead. Come join our session to hear more about it, with a real-world success story from Nationwide Insurance on one of their high-profile programs. Data Virtualization allows the data to remain in place and provides real-time access to the source system, reducing the risk of data errors and the workload of moving around data that may be used only minimally. It enables consumers to transform, improve the quality of, and reformat data while continuing to use the same source data. While it is not a replacement for ETL, it increases the agility and speed to market of your data product with even less operational overhead. Some of the use cases include the Data Services Layer, transitional/hybrid architectures, and real-time reporting.

Workshop: Data Storytelling & Visualization Masterclass (6 HR)


Lea Pica

Founder | Analytics Presentation & Data Visualization Consulting

Coming Soon.

Workshop Description


When you are able to connect with your audience in a way they understand, they will be much more receptive to your message. Doing so requires thoughtful research into the deep desires of your audience. We'll discuss audience needs assessment techniques and a robust presentation planning framework that works.

Topics covered:

  • What your stakeholders are thinking but aren't telling you
  • Audience needs assessment + interview strategies
  • Transforming statements to insights
  • Creating recommendations that get acted upon
  • A proven influential presentation planning framework
  • Content brainstorming techniques
  • Digitizing and refining your content plan


The most important, innovative, or valuable ideas can be lost if they're communicated poorly. By learning to think like a graphic and data designer, you'll be able to present your information in a clear and compelling way. We'll review neuroscience-backed design principles, how to work with imagery, and how to avoid the most common visual pitfalls.

You'll learn time-saving tricks on how to customize a PowerPoint template with client branding for reuse, quick formatting tricks for basic charts, and my best keyboard shortcuts to get things out the door fast.


Students are introduced to Lea's proprietary PICA Protocol™, her prescription for healthy, actionable data stories. It includes the anatomy of a healthy data viz, a full framework for visualizing analytical narrative, as well as alternatives for avoiding the most common data visualization mistakes.

Topics covered:

  • Alternative strategies to death by bullet points
  • White space, typeface, color, imagery, and other design fundamentals
  • Time-saving PowerPoint productivity tricks
  • Common data visualization violations
  • Creative animation techniques
  • Lea's proprietary Chart Detox Checklist

Workshop: Storytelling & Presentation (4 HR)


Ruth Milligan

Founder / Managing Director

As an executive speech coach and trainer, Ruth Milligan now lives at the intersection of deep knowledge fields, from science and research to medicine and data/analytics, and speakers’ desires to be highly resonant and engaging. This is the culmination of her nearly 30-year career practicing some form of communications. Ruth founded Articulation in early 2010 after hosting one of the first TEDx events, TEDxColumbus, which is also now one of the longest-running TEDx programs in the world. Since then she and her team have coached over 500 people in TEDx or TED-style talks alongside training thousands in their original classes on content framing, storytelling, public speaking, executive presence and accessing science. Her professional passion is to help organizations of all sizes create storytelling cultures that elevate the opportunities for associates and executives to practice and deliver great presentations. She believes that a great talk can come from a nervous, reluctant or beginning speaker, given the right feedback and development environment. After ten years of curating, organizing and hosting TEDx events, she is also a seasoned host, emcee and consultant to a wide range of events from major donor events at universities to pitch events inside data and analytics teams. Ruth has been the official speaker coach for the Women in Analytics presenters since 2017 and helped to host the main stage presenters in 2018. She lives in Columbus with her husband and two teenage children.

Workshop Description

While data increases its influence in making business decisions and driving strategies, sharing insights requires an understanding of story to make the data come to life. Story means the way in which we reveal the problem, the solution and the ultimate impact that data will inform. This class will focus on how to take a basic data set and turn it into a compelling story that will resonate with your business partners. Bring a data set you are working on to apply during class.

Note: This class will not address data visualization.


Sandy Steiger

Director of the Center for Analytics and Data Science
Miami University

As Director of the Center for Analytics and Data Science, Sandy Steiger is responsible for providing co-curricular experiences that allow students to embrace the application of analytics and data science. Prior to joining Miami University, Steiger spent 15 years at 84.51° and dunnhumbyUSA. Her most recent role was Vice President, Data Science & Analytics, where she was responsible for developing a strategic roadmap that would bring transformational, innovative thinking to 84.51°, focused on driving data science at scale across the Enterprise. Steiger holds a Master of Science in Statistics from Miami University, Ohio, and a Bachelor of Arts in Mathematics and Business from Mount St. Joseph University.

Workshop Description

While data increases its influence in making business decisions and driving strategies, sharing insights requires an understanding of story to make the data come to life. Story means the way in which we reveal the problem, the solution and the ultimate impact that data will inform. This class will focus on how to take a basic data set and turn it into a compelling story that will resonate with your business partners. Bring a data set you are working on to apply during class.

Note: This class will not address data visualization.

Workshop: Introduction to Data Science in R (4 HR)


Ezgi Karaesmen

Graduate Research Assistant
The Ohio State University College of Pharmacy

Ezgi Karaesmen is a PhD candidate at the Ohio State University College of Pharmacy. She is a genomic data scientist with a cancer biology background. Currently, she works with large genomic and clinical datasets in the context of bone marrow transplants. Broadly, she is interested in associations of germline genetic variants with survival events of leukemia patients following their transplant.

Workshop Description

Are you sick and tired of manually manipulating data in Excel? Are you interested in learning one of the most popular FREE (i.e., open source) data science programming languages out there? WIA's Introduction to R workshop will provide a complete crash course on the full data science workflow from cleaning and wrangling to visualization, modeling, and repeatable reporting. Two RStudio Certified Tidyverse Instructors will cover the essentials of popular R packages including ggplot2, tidyr, dplyr, tidymodels, and rmarkdown. Code will be made available to attendees via RStudio Cloud prior to and following this hands-on-keyboard session, allowing beginner-level users to dive right in and revisit materials after training. No programming experience is required for the workshop; however, interested attendees are encouraged to explore the online book "R for Data Science" which can be found at

Katie Sasso-Schafer

Director of Data Science
Columbus Collaboratory

Coming soon.

Workshop Description

Are you sick and tired of manually manipulating data in Excel? Are you interested in learning one of the most popular FREE (i.e., open source) data science programming languages out there? WIA's Introduction to R workshop will provide a complete crash course on the full data science workflow from cleaning and wrangling to visualization, modeling, and repeatable reporting. Two RStudio Certified Tidyverse Instructors will cover the essentials of popular R packages including ggplot2, tidyr, dplyr, tidymodels, and rmarkdown. Code will be made available to attendees via RStudio Cloud prior to and following this hands-on-keyboard session, allowing beginner-level users to dive right in and revisit materials after training. No programming experience is required for the workshop; however, interested attendees are encouraged to explore the online book "R for Data Science" which can be found at

Workshop: AI & Marketing Attribution Analysis (4 HR)


Katie Robbert

Trust Insights

As CEO of Trust Insights, Katie oversees the growth of the company, manages operations and product commercialization, and sets overall strategy. Her expertise includes strategic planning, marketing operations management, organizational behavior, and market research.

Workshop Description

Marketers want to know what’s working and where leads are coming from. There is only so much budget to go around, so prioritizing your digital channels is essential. In this session, Katie will review the different kinds of attribution analysis you can get out of the box and why you’re better off building your own.

You’ll learn:

  • The different types of models available out of the box
  • What goes into building your own model
  • How to structure your project for success

Workshop: Preprocessing Data for Machine Learning in Python (4 HR)


Sarah Guido

Senior Data Scientist

Sarah is a Senior Data Scientist at InVision where she studies user collaboration through data. She is an accomplished conference speaker and O'Reilly Media author, and enjoys making data science as accessible as possible to a broad audience. Sarah attended graduate school at the University of Michigan's School of Information.

Workshop Description

Getting your data ready for modeling is the essential first step in the machine learning process. In this workshop, you'll learn the basics of how and when to perform data processing. You'll learn: how to perform basic techniques such as dealing with missing data and incorrect data types, how to standardize your data so that it's in the right form for modeling, the benefits of creating new features to leverage the information in your dataset, and the process for selecting the best features to improve your model fit.
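The basic techniques named above (fixing incorrect data types, handling missing values, standardizing) can be sketched in plain Python as follows. This is only an illustrative sketch using the standard library; the helper names and the sample `raw_ages` column are made up for this example, and the workshop itself may well use libraries such as pandas or scikit-learn instead.

```python
from statistics import mean, pstdev


def to_float(value):
    """Coerce a raw value to float; return None if it is missing or unparseable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit variance (assumes the values are not all equal)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up column: numbers stored as strings, with two missing entries.
raw_ages = ["34", "41", None, "29", "n/a", "56"]

ages = impute_mean([to_float(v) for v in raw_ages])
scaled = standardize(ages)
print(ages)
print([round(v, 2) for v in scaled])
```

Mean imputation is only one of several options for missing data. Dropping rows or using the median are common alternatives, and the right choice depends on the dataset.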

Speaker Bureau

Find speakers from previous WIA events.
View content, slides, videos.

Call for Speakers

Interested in sharing your expertise?
Apply by February 29, 2020.