Behind the scenes: a day in the life of a data scientist


Helping others use data is “like giving them a superpower,” says Dana Seidel, senior data scientist at Plenty, an ag-tech startup.

Data Scientist Dana Seidel at work.

Image: Dana Seidel

Dana Seidel was “roaming rural Alberta, following herds of elk,” trying to figure out their movement patterns, what they ate and what brought them back to one place, when she had a revelation: data could help answer these questions.


At the time, in a master’s program at the University of Alberta, she was interested in tracking the movements of deer, elk and other central foragers. Seidel realized that she could use her math and ecology training from Cornell University to help evaluate a model that could answer these questions. She continued her studies and earned a doctorate at the University of California, Berkeley, focused on animal movement and the spread of disease, which she monitored, in part, by collecting data from collars. The collars work much like a Fitbit, Seidel explained, “following where you go throughout the day” and producing GPS data points that could be connected to terrestrial data, such as satellite imagery, providing a window into the movement of this wildlife.

Seidel, 31, has since moved from academia to the startup world, working as a senior data scientist at Plenty, an indoor vertical agriculture company. Or, as she would describe herself, a “data scientist interested in spatiotemporal time series data.”

Seidel was born in Tennessee but raised in Kansas. At 31, she says she is “old” for the startup world. As someone who spent her 20s “investing in a career path and then changing careers,” she doesn’t necessarily have the same industry experience as her colleagues. So while she is grateful for her experience, a degree is not a necessity, she said.

“I’m not sure my doctorate is helping me with my current job,” she said. One area where it helped her, however, was giving her access to internships – at Google Maps, Quantitative Analysts, and RStudio – where she gained experience in software development.

“But I don’t think writing more papers on anthrax and zebras really convinced anyone that I was a data scientist,” she said.

Seidel learned the R programming language, which she loved, in college, and began building databases in her master’s program. She said she “usually learned to use the tools alongside these courses.” A data scientist’s greatest skill “may very well be just knowing how to Google things,” she said. “That’s all coding really is, creative problem solving.”


The field of data science has been around for about a decade, Seidel said. Before that, it was statistics. “The idea of having someone who has a background in statistics or understands inferential modeling or machine learning has been around much longer than we have called that person a data scientist,” she said, and a master’s degree in data science did not exist until the final year of her doctorate.

In addition, the term “data scientist” is very broad, and it can cover many different roles. “There are data scientists who focus a lot on advanced analytics. Some data scientists only do natural language processing,” she said. The job also draws on a wide range of skills, she said, including “project management skills, data skills, analytical skills, critical thinking skills.”

Seidel has mentored others interested in getting into the field, starting with a weekly Women in Machine Learning and Data Science coffee hour in Berkeley. Her first tip? “I would tell them, ‘You have skills,’” Seidel said. Many young students, especially women, don’t realize everything they already know. “I don’t think we often communicate to ourselves in a positive way all the things we know how to do and how that could translate,” she said.

For those wishing to move from academia to industry, she also advises gaining experience in software development and best practices, which may have been missing from formal education. “If you understand industry-standard practices like version control, git and bash scripting, so that you have some of that language, some of that knowledge, you can be a more effective contributor.” Seidel also recommends learning SQL, one of the easiest languages in her opinion, which she calls “the lingua franca of data analysis and data science. Although I think it is something you can absolutely learn on the job, it will be the primary way you access data if you work on an industry data science team. They are going to have big databases with data, and you need a way to communicate with that,” she said. She also recommends building skills through things like the 25 days of Advent of Code, and other ways of demonstrating a clean coding style. “That takes a fair amount of legwork, and until you have your job in industry it is unpaid legwork, but it can really help you stand out,” she said.


On a typical morning at her current job, working from home, Seidel drinks coffee and responds to Slack messages in her home office, which doubles as a quilting studio. She checks to see if there are any questions about the data, a problem with the dashboard, or a question about plant health. Software engineers working with data may also have questions, she said. There is often a scrum meeting in the morning, and the teams work in two-week sprints with agile workflows.

“I have a pretty unique position where I can float between the different data scrums that we do. We have an agricultural performance scrum versus a perception team or a data infrastructure team,” Seidel explained. “I can decide: what will I contribute to in this sprint?” Twice a week there is a management meeting, which she attends on the software and data side, and where she can listen in on what else is being worked on and what is coming up. It is, she says, one of the most important meetings for her, because she can hear directly “when a change occurs on the software side or there is a new requirement coming from operations for software or for data.”

In the afternoon, she has a good block of development time, “to dig into whatever issue I’m working on this sprint,” she said.


Seidel manages the data warehouse and ensures that data feeds “are presented to end users in master data models.” Last week, she worked on the farm performance scrum, “validating the metrics coming out of the farm, anticipating new metrics we need to collect, and reflecting on the metrics we have on our farm in South San Francisco, streaming measurements from a few thousand devices.” She needs to ensure accurate measurement streams, covering everything from temperature to irrigation, to safeguard plant health and answer questions such as: “Why did last week’s arugula do better than this week’s arugula?”

The main task is to find out if they’re measuring the right thing, and to push back and say, “Oh, okay, what do you want this data to explain? What is the question you are asking?” She needs to stay a few steps ahead, she said, and ask, “What are all the new data sources that I need to be aware of, that we need to take care of?”

The hardest part of the job? “I really hate not having the answer. I hate having to say, ‘No, we’re not measuring this thing yet.’ Or: ‘We will have it in the next sprint.’” Balancing giving people the answers with giving them the tools to access the answers themselves is a daily challenge, she said, with the ultimate goal of making data accessible.

And saying, “Oh, yeah, that data is there and it’s just this simple query,” or, “Oh, have you seen this tool that I created a year ago that can fix this?” is really gratifying.

“Helping someone learn to ask and answer questions from data is like giving them a superpower,” Seidel said.
