I’ll admit it: For the majority of my adult life, spreadsheets have remained shrouded in mystery.
"Knowledge is powerful, be careful how you use it!" A collection of inspiring lists, manuals, cheat sheets, blogs, hacks, one-liners, CLI/web tools and more. What is it? This list is a collection of various materials and tools that I use every day in my work.
What advice do you give someone beginning to learn data science? originally appeared on Quora: the place to gain and share knowledge, empowering people to learn from others and better understand the world. It’s an exciting time to be a data scientist.
I joined LinkedIn about six years ago at a particularly interesting time. We were just beginning to run up against the limits of our monolithic, centralized database and needed to start the transition to a portfolio of specialized distributed systems.
I joined Facebook in 2011 as a business intelligence engineer. By the time I left in 2013, I was a data engineer. I wasn’t promoted or assigned to this new role. Instead, Facebook came to realize that the work we were doing transcended classic business intelligence.
For a company like Slack that strives to be as data-driven as possible, understanding how our users use our product is essential. We knew when we started building this system that we would need flexibility in choosing the tools to process and analyze our data.
The ability of statistics to accurately represent the world is declining. In its wake, a new age of big data controlled by private companies is taking over, putting democracy in peril. In theory, statistics should help settle arguments.
Five years ago, a team of researchers from Google announced a remarkable achievement in one of the world’s top scientific journals, Nature. Without needing the results of a single medical check-up, they were nevertheless able to track the spread of influenza across the US.
Let’s start with Facebook’s Surveillance Machine, by Zeynep Tufekci in last Monday’s New York Times. Among other things (all correct), Zeynep explains that “Facebook makes money, in other words, by profiling us and then selling our attention to advertisers, political actors and others.
Data is ubiquitous — but sometimes it can be hard to see the forest for the trees, as it were. Many companies of various sizes believe they have to collect their own data to see benefits from big data analytics, but it’s simply not true.
Many conversations about data and analytics (D&A) start by focusing on technology. Having the right tools is critically important, but too often executives overlook or underestimate the significance of the people and organizational components required to build a successful D&A function.
You don’t need to be a seasoned data scientist or have a degree in graphic design in order to create incredible data visualisations. It has become a lot simpler to mine your data and interpret your insights in an engaging, attractive, and most importantly easy to understand way.
“What is the relationship like between your team and the data scientists?” This is, without a doubt, the question I’m most frequently asked when conducting interviews for data platform engineers.
NOTICE: This repo is automatically generated by apd-core. Please DO NOT modify this file directly. We have provided a new way to contribute to Awesome Public Datasets. The original PR entrance directly on the repo is closed forever. This is a list of high-quality, topic-centric public data sources.
The creators of a revolutionary AI system that can write news stories and works of fiction – dubbed “deepfakes for text” – have taken the unusual step of not releasing their research publicly, for fear of potential misuse.
Introduction As I was browsing the web and catching up on some sites I visit periodically, I found a cool article from Tom Hayden about using Amazon Elastic Map Reduce (EMR) and mrjob to compute some statistics on win/loss ratios for chess games he downloaded from the millionbase archive.
Big Data, Data Science, and Predictive Analytics are the talk of the town, and it doesn’t matter which town you are referring to — it’s everywhere, from the White House hiring DJ Patil as the first chief data scientist to the United Nations using predictive analytics to forecast bombings on schools.
Notebooks have rapidly grown in popularity among data scientists to become the de facto standard for quick prototyping and exploratory analysis. At Netflix, we’re pushing the boundaries even further, reimagining what a notebook can be, who can use it, and what they can do with it.
The more experienced I become as a data scientist, the more convinced I am that data engineering is one of the most critical and foundational skills in any data scientist’s toolkit. I find this to be true for both evaluating project or job opportunities and scaling one’s work on the job.
Data science is an exciting, fast-moving field to become involved in. There’s no shortage of demand for talented, analytically-minded individuals. Companies of all sizes are hiring data scientists, and the role provides real value across a wide range of industries and applications.
That’s the question behind the new Chrome extension Data Selfie.
We’ve entered the age of big data, in which more and more companies are seeing the value and importance of data in many different areas of their business, from market and customer research, to internal sales figures and HR analytics.
Not a week goes by without us publishing something here at HBR about the value of data in business. Big data, small data, internal, external, experimental, observational — everywhere we look, information is being captured, quantified, and used to make business decisions.
In the last few years I have spent a significant amount of time reading books about Data Science. I found these 7 books the best. Together they are a very valuable source for learning the basics, walking you through everything you need to know.
Guest blog post by Rubens Zimbres, PhD. This article brings images from my work modeling with Mathematica, my experience as a Business Analyst and also my doctorate lessons.
Demystifying Data Science: 8 Skills that Will Get You Hired (Just launched, 5.24.18, the new Data Scientist Nanodegree program!) Regardless of your previous experience or skills, there exists a path for you to pursue a career in data science.
Uber is committed to delivering safer and more reliable transportation across our global markets.
Like many professionals, my job doesn’t require expertise in data or analytics. I’m a writer and editor, so I deal with words, not numbers. Still, nearly every knowledge worker today needs to be a regular consumer of data analysis.
The goal of microservice¹ architecture is to help engineering teams ship products faster, safer, and with higher quality. Decoupled services allow teams to iterate quickly and with minimal impact to the rest of the system. At Medium, our technical stack started with a monolithic Node.js app.
Data systems have mostly focused on the passive storage of data. Phrases like “data warehouse” or “data lake” or even the ubiquitous “data store” all evoke places data goes to sit.
A year and a half ago, I dropped out of one of the best computer science programs in Canada. I started creating my own data science master’s program using online resources. I realized that I could learn everything I needed through edX, Coursera, and Udacity instead.
This blogpost is an excerpt of Springboard's free guide to data science jobs and originally appeared on the Springboard blog. Most data scientists use a combination of skills every day, some of which they have taught themselves on the job or otherwise. They also come from various backgrounds.
Update: This article discusses the lower half of the stack. For the rest, see Part II: The Edge and Beyond. Uber’s mission is transportation as reliable as running water, everywhere, for everyone. To make that possible, we create and work with complex data.
If you were to stumble upon the whole microservices thing, without any prior context, you’d be forgiven for thinking it a little strange. Taking an application and splitting it into fragments, separated by a network, inevitably means injecting the complex failure modes of a distributed system.
In the interest of getting back into writing, I want to break the seal with a simple “what have I been up to and thinking about lately” style post. Hopefully future topics will be more focused and frequent. For the past year, I have been working on data and analytics at GitHub.
Service-Oriented Architecture has a well-deserved reputation amongst Ruby and Rails developers as a solid approach to easing painful growth by extracting concerns from large applications. These new, smaller services typically still use Rails or Sinatra, and use JSON to communicate over HTTP.
Those of us who have spent years studying “data smart” companies believe we’ve already lived through two eras in the use of analytics. We might call them BBD and ABD—before big data and after big data. Or, to use a naming convention matched to the topic, we might say Analytics 1.0 and 2.0.
The computing industry progresses in two mostly independent cycles: financial and product cycles. There has been a lot of handwringing lately about where we are in the financial cycle. Financial markets get a lot of attention. They tend to fluctuate unpredictably and sometimes wildly.
Machine learning is a branch in computer science that studies the design of algorithms that can learn. Typical tasks are concept learning, function learning or “predictive modeling”, clustering and finding predictive patterns.
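As a minimal illustration of the “predictive modeling” task mentioned above, here is a sketch of a 1-nearest-neighbor classifier in plain Python (the function name and toy data are mine, not from the article):

```python
from math import dist

def nearest_neighbor_predict(train, query):
    """Predict the label of `query` using the closest training point."""
    # train is a list of (features, label) pairs; pick the pair whose
    # feature vector is nearest to the query in Euclidean distance.
    features, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# Toy data: two clusters, labeled "a" and "b".
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]

print(nearest_neighbor_predict(train, (0.3, 0.1)))  # → a
print(nearest_neighbor_predict(train, (4.8, 5.1)))  # → b
```

The same learn-from-examples idea underlies clustering and function learning, with the algorithm swapped out.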
Big data! If you don’t have it, you better get yourself some. Your competition has it, after all. Bottom line: If your data is little, your rivals are going to kick sand in your face and steal your girlfriend.
Artificial intelligence is no longer just a niche subfield of computer science. Tech giants have been using AI for years: Machine learning algorithms power Amazon product recommendations, Google Maps, and the content that Facebook, Instagram, and Twitter display in social media feeds.
Over the past 7 years, Netflix streaming has expanded from thousands of members watching occasionally to millions of members watching over two billion hours every month.
Over a year ago, my fellow data infrastructure engineers and I broke ground on a total rewrite of our event delivery infrastructure. Our mission was to build a robust, centralized data integration platform tailored to the needs of our Data Scientists.
Data Science is an ever-growing field; there are numerous tools and techniques to remember. It is not possible for anyone to remember all the functions, operations and formulas of each concept. That’s why we have cheat sheets.
Data is essential to us at Airbnb. We characterize data as the voice of our users at scale. Thus, data science plays the role of an interpreter — we use data and statistics to understand our users and translate that understanding into a voice that people or machines can understand.
There are three elements to our "big data" efforts, or unhyped normal data efforts: Data Collection, Data Reporting, and Data Analysis. We are all aware that the best companies in the world have an optimal DC-DR-DA allocation when it comes to time/money/people: 15%-20%-65%.
(Sorry about the length! At some point in the distant past, this was supposed to be a short blog post. If you like, you can skip straight to the demo section to get a sense of what this article is about.) Embarrassingly, most of my app development to date has been confined to local devices.
Working with large JSON datasets can be a pain, particularly when they are too large to fit into memory. In cases like this, a combination of command line tools and Python can make for an efficient way to explore and analyze the data.
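One common version of that combination is line-delimited JSON (NDJSON), which tools like jq handle well on the command line and Python can stream one record at a time. A minimal sketch, with a hypothetical file name and field:

```python
import json
from collections import Counter

def iter_records(path):
    """Stream one JSON object per line (NDJSON) without loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Write a tiny demo file; a real dataset could be many gigabytes.
with open("events.ndjson", "w", encoding="utf-8") as f:
    f.write('{"city": "Oslo"}\n{"city": "Oslo"}\n{"city": "Lima"}\n')

# Aggregate while holding only one record in memory at a time.
counts = Counter(rec["city"] for rec in iter_records("events.ndjson"))
print(counts)  # Counter({'Oslo': 2, 'Lima': 1})
```

Because the generator yields records lazily, memory use stays flat no matter how large the file grows.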
Part three of my ongoing series about building a data science discipline at a startup. You can find links to all of the posts in the introduction, and a book based on this series on Amazon. Building data pipelines is a core component of data science at a startup.
The key to getting better at deep learning (or most fields in life) is practice. Practice on a variety of problems – from image processing to speech recognition. Each of these problems has its own unique nuance and approach.
Machine learning (ML) based data analytics is rewriting the rules for how enterprises handle data.
In August 2016, the Australian government released an “anonymised” data set comprising the medical billing records, including every prescription and surgery, of 2.9 million people.
Don't waste time testing different values individually in Excel. Use a data table to show the results for many different possible scenarios at once. Get the latest Microsoft stock price here.
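The same what-if idea can be sketched outside Excel: compute one result per combination of scenario inputs and print them as a grid. The loan-payment formula and numbers below are illustrative assumptions, not from the article:

```python
def monthly_payment(principal, annual_rate, years):
    """Standard amortized loan payment for a fixed-rate loan."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of payments
    return principal * r / (1 - (1 + r) ** -n)

rates = [0.03, 0.04, 0.05]
terms = [15, 30]
# Two-variable "data table": one row per rate, one column per term.
print("rate   " + "  ".join(f"{t:>10}y" for t in terms))
for rate in rates:
    row = [f"{monthly_payment(250_000, rate, t):10.2f}" for t in terms]
    print(f"{rate:.0%}   " + "  ".join(row))
```

Each cell is recomputed from its inputs, which is exactly what Excel's data table feature automates.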
An excellent visualization, according to Edward Tufte, expresses “complex ideas communicated with clarity, precision and efficiency.” I would add that an excellent visualization also tells a story through the graphical depiction of statistical information.
Together with our colleagues at the University of Hamburg, we — that is Felix Gessert, Wolfram Wingerath, Steffen Friedrich and Norbert Ritter — presented an overview of the NoSQL landscape at SummerSOC’16 last month. Here is the written gist.
Over the last week, I studied seven commonly used data structures in great depth. The impetus for embarking on such a project was a resolution I made at the beginning of the year to train myself to be a better software engineer and write about things I learned in the process.
In a tech startup industry that loves its shiny new objects, the term “Big Data” is in the unenviable position of sounding increasingly “3 years ago”. While Hadoop was created in 2006, interest in the concept of “Big Data” reached fever pitch sometime between 2011 and 2014.
Before MongoDB, before Cassandra, before “NoSQL”, there was Lucene. Did you know that Doug Cutting wrote the first versions of Lucene in 1999? To put things in context, this was around the time Google was more a research project than an actual trusted application.
It feels good to be a data geek in 2017. Last year, we asked “Is Big Data Still a Thing?”, observing that since Big Data is largely “plumbing”, it has been subject to enterprise adoption cycles that are much slower than the hype cycle.
You’re walking home alone on a quiet street. You hear footsteps approaching quickly from behind. It’s nighttime. Your senses scramble to help your brain figure out what to do. You listen for signs of threat or glance backward.
How can you go from zero programming skills to a job in technology or analytics? If you’re interested in learning these skills, whether for fun or for a career change, what’s the best way to go about it?
An obscure controversy has reared its ugly head again this past month. Two icons of the quantitative analysis community have locked horns on the greatest of public stages, Twitter. You may be forgiven for not following the controversy: I’ll do a quick review for the uninitiated.
An open source Data Science repository to learn and apply towards solving real world problems. This is a shortcut path to start studying Data Science. Just follow the steps to answer the questions, "What is Data Science and what should I study to learn Data Science?"
Since its founding in 2004, Palantir has managed to grow into a billion dollar company while being very surreptitious about what it does exactly. Conjecture abounds.
The full code is available on Github. In this post we will implement a model similar to Kim Yoon’s Convolutional Neural Networks for Sentence Classification.
When dealing with data, it helps to have a well defined workflow. Specifically, whether we want to perform an analysis with the sole intent of "telling the story" (Data Visualisation/Journalism) or build a system that relies on data to model a certain task (Data Mining), process matters.
You know that old saying, “If it seems too good to be true, it probably is?” We technologists should probably apply that saying to database vendor claims pretty regularly. In the summer of 2014, the Parse.ly team finally kicked the tires on Apache Cassandra.
More than a million people have now used our Wolfram|Alpha Personal Analytics for Facebook. And as part of our latest update, in addition to collecting some anonymized statistics, we launched a Data Donor program that allows people to contribute detailed data to us for research purposes.