Google Dataset Search
Access: Free to search, but does include some fee-based search results
Sample dataset: Global price of coffee, 1990-present

It seems we turn to Google for everything these days, and data is no exception. Launched in 2018, Google Dataset Search is like Google’s standard search engine, but strictly for data. It’s not the best tool if you prefer to browse, but if you have a particular topic or keyword in mind, it won’t disappoint. Google Dataset Search aggregates data from external sources, providing a clear summary of what’s available, a description of the data, who it’s provided by, and when it was last updated.

Kaggle
Sample dataset: Daily temperature of major cities

Like Google Dataset Search, Kaggle offers aggregated datasets, but it’s a community hub rather than a search engine. Kaggle launched in 2010 with a number of machine learning competitions, which subsequently solved problems for the likes of NASA and Ford. It has since evolved into a renowned open data platform, offering cloud-based collaboration for data scientists, as well as educational tools for teaching artificial intelligence and data analysis techniques…plus, of course, tonnes of great datasets covering almost any topic you can imagine.

Data.gov
Sample dataset: Lobster Report for Transshipment and Sales

In 2015, the US Government made all its data publicly available. With over 200,000 datasets covering everything from climate change to crime, you can lose yourself in the database for hours.

For a government website, it has some surprisingly user-friendly search functions, including the ability to drill down by geographical area, organization type, and file format. Search results are also clearly labeled at federal, state, county, and city levels. If you’re interested in more general data about the US population, you can also check out the US Census Bureau, which offers a rich selection of data about US citizens, their geography, education, and population growth.

Datahub
Type of data: Mostly business and finance
Access: Mostly free, no registration required
Sample dataset: Average mass of glaciers since 1945

The goal of many data analysts is to help drive savvy business decisions. As such, using economic or business datasets for your portfolio project might be worth considering. While Datahub covers a variety of topics from climate change to entertainment, it mainly focuses on areas like stock market data, property prices, inflation, and logistics. Because much of the data on the portal is updated monthly (or even daily), you’ll always have something fresh to work with, as well as data that covers broad timescales.

UCI Machine Learning Repository
Data compiled by: University of California Irvine
Sample dataset: Behavior of urban traffic in Sao Paulo, Brazil

Generalized repositories are great if you’re happy to browse. But if you’re seeking something more niche, why not specialize? Enter the UCI Machine Learning Repository. It was launched thirty years ago by the University of California Irvine, but don’t let the 90s vibe mislead you: the UCI repository has a strong reputation among students, teachers, and researchers as the go-to place for machine learning data.

Datasets are clearly categorized by task (e.g. classification, regression, or clustering), attribute (e.g. categorical, numerical), data type, and area of expertise. This makes it easy to find something suitable, whatever machine learning project you’re working on.

Earth Data
Sample dataset: Environmental conditions during fall moose hunting season in Alaska, 2000-2016

If you think space is awesome (let’s face it, space is awesome!) look no further than Earth Data. Publicly available since 1994, this repository provides access to all of NASA’s satellite observation data for our little blue planet.

As you can imagine, there’s plenty to peruse, from weather and climate measurements to atmospheric observations, ocean temperatures, vegetation mapping, and more. If Earth-based data isn’t your thing, NASA’s Planetary Data System takes things a step further with data from interplanetary missions, such as the Cassini probe (which orbited Saturn from 2004 to 2017).

CERN Open Data Portal
Sample dataset: Higgs candidate collision events from 20

Want to demonstrate your ability to work with highly complex datasets? Head to the CERN Open Data Portal. It offers access to over two petabytes of information, including datasets from the Large Hadron Collider particle accelerator. Frankly, these data aren’t for the faint of heart, but if you’re interested in particle physics, they’re worth checking out. While even the names of these datasets are pretty complex, each entry has a helpful breakdown of what’s included, as well as related datasets and how to go about analyzing them. Who knows, you might even make a scientific discovery…