Tracy Teal, Open Source Program Director at RStudio, joins us to explore the practices, challenges, and structural aspects of managing and maintaining open source projects. She also shares with us a few of her favorite new and exciting things in the world of open source.
This episode was recorded live at the 2022 DataConnect Conference.
About Tracy Teal
Tracy Teal is the Open Source Program Director at RStudio and previously the Executive Director of The Carpentries. She has been working with open source communities, developing curriculum, and teaching people how to work with data and code as a developer, instructor, and project leader throughout her career.
Relevant Links
- Every Other Thursday (Book)
- Data Umbrella (Organization)
- PyMC-Data Umbrella Sprints (Event Series)
- DataKind (Organization)
- Data Science for Social Good (Organization)
- Paper on running inclusive hackathons (Paper, by Daniela Huppenkothen et al.)
- The Carpentries (Organization)
- Data Carpentry (Organization)
- rOpenSci (Organization)
- R-Ladies (Organization)
- PyLadies (Organization)
- CSCCE, Center for Scientific Collaboration and Community Engagement (Organization)
- Tidyverse (Collection of R packages)
- Quarto (Open-source scientific and technical publishing system built on Pandoc)
- RStudio (Organization)
- @rstudio_glimpse (Twitter Account)
Follow Tracy
- GitHub
Follow Lauren
- Website
Lauren Burke [00:06]
Hello, and welcome to Episode two of Women in Analytics After Hours, the podcast where we hang out and learn with the WIA community. I'm your host, Lauren Burke, and I'd like to thank you for joining us today. Today, we are coming to you live from the 2022 DataConnect Conference hosted in Columbus, Ohio by WIA.
For this episode, I am excited to have Tracy Teal joining us. Tracy is the Open Source Program Director at RStudio and was previously the Executive Director of The Carpentries. She has been working with open source communities, developing curriculums and teaching people how to work with data and code as a developer, instructor, and project leader throughout her career.
Please join me in welcoming Tracy Teal. Tracy has just spoken at the 2022 DataConnect Conference about her work in open source and some of the things she's learned. So thank you, Tracy, for being here with us today, I'm happy to welcome you to the podcast.
I'm going to turn it over to you. Could you give us a little bit of information about your background and what you're currently doing in open source?
Tracy Teal [01:12]
Sure. Well, thank you so much for inviting me to be on this podcast. I'm very excited to be a part of the beginning of this series. This conference already so far has been great.
So yes, a little bit about my background and kind of around open source in particular is that I started in research software as a graduate student, where in biology we were generating a lot of data and needed to be able to analyze it. And it wasn't something that people had yet had the opportunity to learn about. So during that I had actually some system administration background, so got more into programming and worked on some open source software for analyzing genomic data, as well as just sort of some basic data analytics.
So I spent many years as a contributor and user of open source and then transitioned into leading a project called Data Carpentry, which is around teaching researchers how to work with data and software, a shared challenge that we were all facing. I was the Executive Director of that and then The Carpentries for five years. Then I worked at Dryad, which is an open source data repository. And now I'm the Program Director for Open Source at RStudio.
So my journey in open source has been kind of along the pathway of being an early-stage developer and contributor, to then doing some management of individual projects, to now really kind of looking out for an ecosystem of open source projects. And I've definitely learned a lot along the way. Still have a lot to learn, but I have seen, through that pathway, a lot of shared challenges and opportunities.
Lauren Burke [03:00]
That's awesome. That is such an interesting background. And open source is such an interesting area, especially for data and analytics and machine learning. There are just so many software tools and libraries being created, basically every day that are introducing such new, novel, and interesting concepts that are very, very beneficial to those who have been in the field for many years, or those who are just joining.
So you previously spent a good amount of time in academia. Like you mentioned, you received a PhD in Computation and Neural Systems from Caltech. So how has your experience in academia helped you in your open source work?
Tracy Teal [03:39]
Great question. I did spend many years in academia and then I would say academia adjacent through The Carpentries work. And so, how has that contributed? So I think a great thing about academia is that there is a lot of space to take a problem and then figure out how to solve it. And so that did sort of give me and many others, and still many others, some space to develop software, and also kind of the ethos of science and research is open.
We certainly could argue, very convincingly, that we need more of that. But that ethos around, like why you're creating things contributes to like how you decide to write and share your code. So I think that actually that foundation was really important in like giving me the space to do that work and then how to think about it.
I think one of the challenges in academia with that work, that is true in other contexts, is that while you can do that work, it isn't necessarily within the incentive structure of a university. So it's sort of hard to figure out where your work belongs, how you get credit and value for it, who is it that's doing this work with you. It can feel really lonely. It can feel like it's not supported.
And so I think that set of experiences in academia is actually pretty representative in a lot of different spaces and was like a good foundation in both the opportunities of what you can do, but also looking at kind of the social structure and management structure of these projects and some areas that could use some improvement.
Lauren Burke [05:16]
That makes a lot of sense. It sounds like in academia, you're researching something, and a lot of times it's a new thing you're trying to do. You're trying to discover something new or find a better way to do something. And a lot of that seems to translate super easily over to open source where you're doing those exact same things, just maybe in a different, even less structured environment, which makes it more difficult.
So what are some of the things that surprised you about working in open source, that might have changed over time, even from when you first started?
Tracy Teal [05:47]
So what are some things that surprised me? I think part of the thing that surprised me was that transition from like, I wrote some software and I use it, to the idea of other people using it, and everything that came along with it. With actually any sort of project I've been involved with, I always kind of start out thinking about the technology and, you know, where are the challenges there and what can we solve and what can we do together. And ultimately landing on that the real challenges are like societal or structural, around people.
So I think I kind of learned over and over again how important that piece is in any project and how it's sort of also continually a little bit neglected. So I think that, you know, I got into it really thinking like science, technology, and the journey is really now about people.
Lauren Burke [06:46]
I feel like the sense of community is very important in open source. Not just the people that are contributing, but the people that use it, the people that are wanting to see improvements. It's just a very large-scale community.
And I really liked in your presentation, if anyone wasn't able to see her presentation, she had this really interesting upside-down pyramid chart that showed the levels of contribution and how involved you are. And it kind of spoke to how so much of open source falls to really just the top few contributors or that team.
One thing you mentioned in your talk, and you said it was your favorite part of the talk, was templates, which I thought was really interesting. So do you feel that reproducibility is necessary for the success of an open source project?
Tracy Teal [07:30]
Templates are one of my favorite things. I love templates because I think it helps guide people in what to do. It's sort of like a lot of people have thought about this, you don't need to start from scratch. You know, what's my starting place, I already think that's a big win.
We talk a lot about onboarding people in open source, and things like templates are a way to onboard people in some sense without necessarily having to talk to each other. So just a shout-out to templates.
But the question then was about reproducibility. So is reproducibility important in open source? That's a really good question. I mean, there's a lot into what do you mean by reproducibility? So in the scientific context it means one thing, in software it means maybe something else. So maybe let's start by me describing right now what I'm gonna mean by reproducibility. So the idea that someone could take code and use it and have the same outcome. One, be able to use it. And two, have the same outcome as another person who used that software. Let's say that's reproducibility.
I do think that that is important, even if you're the only one who uses it. And part of that is because you're doing it for future you. So when you're doing something it's like all consuming. It's in your mind, you know what's going on. Your variable names make perfect sense. You're like, I will remember this forever. And then a month later, you go back and look at it and you're like, what the heck was I doing? I don't know. Maybe that's just me. But so the idea of like documenting your code, making sure things can run for future you, if nothing else, is really important.
So that's about, can someone run the code. What kind of information do I need to have that they're able to do that and then does the code work? Right? So those are important things. That's two pieces, does the code do what it says it's doing, that's a whole other deal, right. Then you have, you know, thinking about, okay, do we have tests and articulating what it is that it's supposed to do and making sure algorithms are correct. Like that's a whole other piece. And so, can it run again? Sure. That's important.
And then, do you have the same outcomes? That is also important, right? You don't want someone taking some code, running it, and then getting a different answer, especially if you're thinking about things like modeling. That's one where even the computer chip you're using can result in different outcomes. That's harder to test, but if that is something that you're saying, you can count on this code, or, I did this code, then it's either articulating how you ran it, like, what was your computer? What was your environment? So that someone could do that again. Or making sure that you're getting the same outcomes on other computers. And if you're the only contributor, that kind of, let me test on all systems in existence, just might not be possible.
So more capturing what it is you did. And that's where I think we're seeing the emergence of things like containers, like Docker. Where you can say, here's my environment. Here's what I did exactly. That can be helpful, but the Docker containerization is still not super approachable for a new user. So that has its challenges.
Lauren Burke [10:53]
I like the one comment you made on making it sort of reproducible for yourself. For me personally, if someone compliments not just my code, but says my comments are good, that is a very, very good compliment to me. And I really appreciate that, maybe even more than if they said they like my code. So that's interesting. And I like that you mentioned that as well, because I think that is such a good skill to learn early on.
One thing you mentioned, that I'd like to touch more on is that some of those things like containerization, Docker, might make open source a little bit harder for new people to get into. So what are some ways that you feel like open source could improve the way that they are attracting new contributors, or maybe people that are newer to the field or even contributing themselves?
Tracy Teal [11:40]
Yeah, I want to add the caveat, like there is a list of things that open source can do, but let's also recognize what was the topic of my talk, around managing an open source project, and what you alluded to with the triangle. You know, that you have two people maintaining a project. So sometimes we say, huh, open source should do X, Y, and Z. And it's like, well, but that's like two people doing it on Saturdays, right? So let's also acknowledge that as much as we might say, like, projects should do X, that, you know, practically there are limitations around what we can implement and the different skills and perspectives that maintainers have.
So with that caveat, I think that things that open source projects can do are some of the things that I alluded to in the talk. Which is around articulating and managing expectations through documentation and then using templates so that projects look kind of similar to each other.
So one of the challenges in open source, is you might be familiar with one open source package and you feel like, okay, I got a handle on this, and then you go to another one and something looks completely different. Right? So you have that extra cognitive overhead of oh, okay where do I start? What do I do? When all the projects look different, it makes it difficult to go from one to another. So it'd be nice if there were some similarities. Again, lots of challenges there and which set of things do you use.
But those pieces that are not the code. That are, like you said about the comments, the documentation. So how does someone get onboarded? How do they learn about the project? How do they get started?
Examples are really key. People like that, people really resonate with examples. Things like cookbooks. So a lot of pieces in the kind of documentation realm we know help, like studies show.
The other piece is feeling like it's a welcoming community, right? I belong here. And that's a big one. Sometimes when we look at these packages or these communities, we see, you know, what are the comments? What's the level of the comment? Are people being nice to each other?
There's a lot of sense of, I don't belong in open source. And so creating communities where people feel like they belong. And this is another one that I've been thinking about, and I don't know if there are solutions for, but spaces where you can make mistakes. So one thing that's really challenging is that the advice is go work on this open source project and here's how you get started.
And maybe I have great documentation and a good community. But it's still, if it's a package that a lot of people rely on, you don't feel like you can just come in there and do something and have it be wrong. Right? Because you're like, oh my gosh, all these people count on it, this is pretty high stakes. So how can you get started learning in a supportive community where it's okay to make mistakes? So in some sense I kind of wish that there were open source packages that were like sandboxes. Not that they didn't matter, but that gave us the space to learn together, make mistakes, without feeling like it's so high stakes all the time.
And that's actually why I feel like I was kind of fortunate in my learning journey of developing open source, just like for myself at the beginning, in that I could make those mistakes, and did make many mistakes, without anyone going, oh my gosh, I can't believe you did that. Because it wasn't something that a lot of people relied on. I did have friends or colleagues that I could talk to and say, oh my gosh, what am I doing? Again, fortunate there.
But, it'd be interesting to think about how we could kind of create these spaces for learning together in a way that doesn't necessarily exist in ongoing open source projects.
Lauren Burke [15:18]
I think that what you mentioned, the feeling you need to feel like you're part of the community, but you have to get to a point where you can feel that way yourself. I think that falls in line with the imposter syndrome we see in a lot of fields, especially for women in the technology fields.
So I like that you brought that up. That even if you want to contribute, when do you get to that point where you feel like you can be a contributor, you can call yourself a contributor? When do you feel like you know enough to make that first pull request and ask for a review on that?
So going off of that, what are some things that you think people can do to get involved, to work on their very first open source project? If they are feeling this way, they don't know where to start, they don't know if they should start.
Tracy Teal [16:02]
Yeah. This is like a new thing that I'm trying out as an idea. So, we'll see. I'm not really saying it's a solution. It's just a thought I had in seeing these challenges, like just what you were saying and, also you alluded to imposter syndrome. And that is an important thing, but I think also a lot of like what we've labeled imposter syndrome is not actually the personal responsibility. It's like, they don't feel like they belong because they haven't been made to feel like they belong. Like that's a societal, structural issue. Not just someone feeling like, oh my gosh, I don't have the confidence. I mean, it's both of those pieces, but what we can look at is that societal, that structural. What can we do to broaden that sense of belonging.
And so one idea I have around getting started is actually not to contribute to these big projects to start, but to have your own thing. Start on your own and have like a friend work with you on it. Maybe even someone who isn't doing the same thing, have them do code review, spend an hour doing paired programming. So as a way to get a sense of where you are and learn together in that like safe space. And I mentioned this book in my talk, Every Other Thursday. And it's a book about a group of women in academia who met every other Thursday and created this structure, where they supported each other through career challenges.
And so I would love to see a thing that was like Every Other Thursday for open source. And you could do it virtually, right? You could like sign up, and connect with some other people and like, here's my open source and let's work on stuff together. So I'm curious to think about this idea of, instead of saying the road to get started is go read the docs, be part of this community, to say find some people where you feel safe and start out there and build your confidence and help you navigate.
Even that first pull request that you put in, if you then have a community, you're like, hey, am I good to put this in? And they're like, yeah, that looks good. That just makes you feel better than like sitting on your own on your computer and being like I wonder how this pull request is gonna go.
Lauren Burke [18:11]
I think that is a great list of suggestions. I like what you mentioned earlier about how we sort of need a sandbox almost where people feel like they can go in there, they can try things out and they might mess it up, but they have support to fix it and no one's going to be mad at them. They're not going to sort of break this entire software by adding one change that they think is helpful.
I recently saw a sprint that Data Umbrella put on. It was like an intro to PyMC, which is a Bayesian probabilistic programming language, packaged through Python.
And I thought that was really interesting because they not only set up the sprint, but they had a lot of training sessions for what you should expect for your first sprint, what you should be doing to prepare, where you should go. I think things like that are really helpful for people who might want to go to a sprint, but they don't feel like they've been a long time contributor. They don't feel like they have the experience to do it. They feel like they might mess up something.
What other things like that have you seen that you feel like should be more prevalent in the open source community?
Tracy Teal [19:13]
Yeah, that's a great example of, you know, not just go to a sprint or go to a hackathon, but here's what it looks like, and here's what you need to know, and here's how you contribute. Like all that infrastructure around is already great.
Some other ones that I'd seen that are good, DataKind. And there's also Data Science for Social Good. And so those are sort of similar, in that they both have events and sort of onboard you to those events and are very explicit about creating welcoming and learning environments. And I think really a plus in those is like I was saying, oh, get together and talk about some stuff. Okay. That's great. But what we know that people really will sort of catalyze around is something that they care about, right?
So instead of just coming together around random code, it's, we want to solve this problem together, now let's come together around that. And so like the Data Science for Social Good is a great example where people are coming together because they care about a problem and then working together. So that creates like more of a connection to start because you have that shared interest and something is important to you.
And we've seen studies too. Even I think Carnegie Mellon, in how they teach computer science. If it's not just about code, but also about what challenges can you solve with code, that brings more people into coding. If the framing is around societal impact, more people will participate. So I really love projects like that.
There's a great paper, too, about how to run inclusive hackathons, that I don't know the name of right now. I'll have to follow up with the link, but so having those events is really key, but then also how you have them is so important. So I love that you alluded to both of those pieces.
Lauren Burke [20:57]
Yeah. That's a really good point just about the way that the community needs to be set up and needs to sort of embrace those new contributors because they are at the end of the day, the ones that are going to be maintaining that software down the line. I also like what you said about framing it as a problem-solving aspect. I think a lot of people in tech fields really love the problem-solving. They love working in tech, but I think for anyone, if you can't feel and see the impact that you're having in your job, on a project you're working on, it's hard to sort of find motivation after the excitement wears off. So I love that point that focusing on the impact you can have is something that really drives open source and really drives the people that are contributing to it.
We are about to wrap up, but I would like to ask you first, what are some of the open source projects that you're excited about? And these could be things you are working on, you have worked on or just something you've seen that looks really interesting.
Tracy Teal [21:57]
Wow. This is a good one. I have so many and I know I'm gonna miss some, so I will apologize in advance.
So some of the open source projects that I'm excited about that like build community. I would say I'm obviously biased, but The Carpentries. That's one that I think is really important for empowering instructors, empowering learners to get started kind of in this context of we're saying learning, community, making mistakes. So I'm really excited about that project.
But there's a lot of other projects like that that support the ecosystem of people developing software and building communities. So one is rOpenSci. There are projects like R-Ladies, PyLadies. CSCCE is one that's around supporting community managers.
So those are kind of the pieces that I think are really foundational. That aren't really just about code, but operate in this kind of open context.
In terms of what projects am I excited about? I am closest now to the R ecosystem and what we are developing at RStudio. And so like the Tidyverse ecosystem, I was already really excited about before I came to RStudio because I did teach a lot of R. And I taught in base R and that's already really powerful, but then we switched over to teaching Tidyverse.
The thing about the Tidyverse is that the way that you work with data and the functions that you use are matched with the way you would talk about something. And so when you say, oh, I want to select this data and filter on this, those are the actual words of the functions.
And so it makes it so much easier for a person to get started. And I hadn't taught in a while and I just taught again last week and just what it enables for people. Like how people get it, what they are excited about, it's so amazing. Like every time, it makes me happy. So I'm really excited about that.
Another thing that I'm excited about is sharing the results of what you do with code. The Jupyter ecosystem has really been very important for contributing around putting together the words and the code and being able to rerun analysis. In R you have knitr and R Markdown and the whole ecosystem of things that allow you to integrate code and text.
And at RStudio, we're now developing, it's out there already, Quarto, which brings together and combines a lot of these different principles for scientific publishing. Like you can do papers and presentations and all kinds of websites. And speaking of templates, it allows you to do these things much more easily so that you're not having to spend as much time figuring out, you know, what's my structure, but you can focus more on the content.
And so I can write the content and then have it come out in PDF, or LaTeX or whatever these are. So again just about creating structures that make it easier for people to not only write code, but communicate about, and share that code. So I'm really excited about that project and what it will enable.
Lauren Burke [25:03]
That's awesome. I love open source and I love Python and R because they are two languages where people are absolutely focused on bringing new and exciting things to the people that use those languages. Every day I feel like I see on LinkedIn or on Twitter a new package that someone has developed, and it's a problem I've heard of, it's a problem I've had. And I've always been like, I wish someone would fix this, and look, someone's fixed it. So that's awesome.
Before we close out, I wanted to ask, is there anywhere that our listeners can keep up with you, on social media or a website?
Tracy Teal [25:35]
Speaking of structures that I'm not good at. I am on Twitter. I don't post a lot on Twitter, but I'm there. And I don't have a website, but I'm working on it. So I guess I would say Twitter.
Another place though, where we're trying to communicate more about like what's going on with RStudio and the packages and like these kinds of documentation, is the Twitter handle @rstudio_glimpse. So we're trying to surface a lot more of what we've been talking about here. Kind of the nuts and bolts of what goes on with the open source. So I guess those are both good places.
But yeah, I just love the opportunity to talk about this topic today and with you now. It's something that's really important to me. And so I am going to be looking to try and develop resources, probably in GitHub. So I'm the same @tracykteal on GitHub as well as on Twitter.
Lauren Burke [26:22]
Awesome. Well, thank you so much for being here with us today. We really appreciate it. Your talk was so interesting earlier today. If anyone missed her talk at the conference, you can look at the WIA membership platform and you can see the recording. It should be available in a couple weeks.
But thank you so much, Tracy. It was such an interesting conversation. I'm so glad we were able to have you on today.
Tracy Teal [26:44]
Thank you so much for having me. It just was such a delight and a great chance to meet you and talk about something that I care a lot about. So thanks so much for the opportunity.