Only 4 companies
Users send 400 million tweets every day. The only way to access 100% of those
tweets in real-time is through the Twitter “Firehose”. The other option for
accessing tweets is using one of Twitter’s direct API offerings.
Twitter’s Search API
First up is Twitter’s Search API, which involves polling Twitter’s data through a search or username. The Search API gives you access to a data set that already exists, built from tweets that have already occurred. Through the Search API, users request tweets that match some sort of “search” criteria. The criteria can be keywords, usernames, locations, named places, etc. A good way to think of the Twitter Search API is to consider how an individual user would search directly at Twitter (navigating to search.twitter.com and entering keywords).
How much data can you get with the Twitter Search API?
With the Twitter Search API, developers query (or poll) tweets that have occurred and are limited by Twitter’s rate limits. For an individual user, the maximum number of tweets you can receive is the last 3,200 tweets, regardless of the query criteria. With a specific keyword, you can typically only poll the last 5,000 tweets per keyword. You are further limited by the number of requests you can make in a certain time period. The Twitter request limits have changed over the years; the current limit is 180 requests per 15-minute period.
Twitter’s Streaming API
Unlike Twitter’s Search API, where you are polling data from tweets that have already happened, Twitter’s Streaming API is a push of data as tweets happen in near real-time. With Twitter’s Streaming API, users register a set of criteria (keywords, usernames, locations, named places, etc.) and as tweets match the criteria, they are pushed directly to the user. Think of this as an agreement between the end user and Twitter: you agree with Twitter that whenever they receive tweets that match keywords relating to “hockey”, they will deliver those tweets directly to you as they happen. This is a push of data by Twitter, rather than a pull of data initiated by the end user.
The major drawback of the Streaming API is that it provides only a sample of the tweets that are occurring. The actual percentage of total tweets users receive with Twitter’s Streaming API varies heavily based on the criteria users request and the current traffic. Studies have estimated that using Twitter’s Streaming API users can expect to receive anywhere from 1% to over 40% of tweets in near real-time. The reason that you do not receive all of the tweets from the Twitter Streaming API is simply that Twitter doesn’t currently have the infrastructure to support it, and doesn’t want to; hence, the Twitter Firehose.
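To make the push model concrete, here is a minimal sketch of the “agreement” described above: criteria are registered up front, and each matching tweet is handed to a callback as it arrives. This is only an illustration, not Twitter’s actual API. The tweet shape and the names `matches` and `push_matching` are assumptions made up for this example, and a plain list stands in for the live stream.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical minimal tweet shape for illustration only.
Tweet = Dict[str, str]


def matches(tweet: Tweet, keywords: Iterable[str]) -> bool:
    """Return True if any registered keyword appears in the tweet text."""
    text = tweet["text"].lower()
    return any(keyword.lower() in text for keyword in keywords)


def push_matching(stream: Iterable[Tweet], keywords: List[str],
                  on_tweet: Callable[[Tweet], None]) -> int:
    """Deliver each matching tweet to the callback as it arrives."""
    delivered = 0
    for tweet in stream:
        if matches(tweet, keywords):
            on_tweet(tweet)  # the provider pushes; the consumer never polls
            delivered += 1
    return delivered


# A list stands in for the live stream of incoming tweets.
incoming = [
    {"user": "alice", "text": "Great hockey game tonight"},
    {"user": "bob", "text": "Making pasta for dinner"},
    {"user": "carol", "text": "HOCKEY playoffs start soon"},
]
received: List[Tweet] = []
count = push_matching(incoming, ["hockey"], received.append)
print(count, [t["user"] for t in received])  # 2 ['alice', 'carol']
```

The consumer only supplies criteria and a callback; it never asks for data, which is the difference from polling the Search API.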
The Twitter Firehose is handled by two data providers, Gnip and DataSift, which have tight relationships with Twitter. Similar to the Streaming API, the Firehose consists of an agreement between an end user and the distributors of the Firehose (Gnip or DataSift) on what tweets the end user should receive in near real-time. As the data providers receive tweets, they push them directly to the end user.
The two differences between Twitter’s Streaming API and Twitter’s Firehose access are that with the Firehose you are guaranteed delivery of 100% of the tweets, and that it’s not free. The Twitter Streaming API is free to use but gives you limited results (and limited licensing usage of the data). Access to the Twitter Firehose removes a lot of the usage restrictions imposed by Twitter but is fairly costly for access to all the tweets.
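Returning to the Search API limit quoted earlier (180 requests per 15-minute period), a client that polls has to keep track of its own request budget. The sketch below is one way to do that, using a sliding window. The class name `RequestBudget` and its methods are hypothetical, and the sliding window is a conservative client-side assumption rather than a description of how Twitter itself counts requests.

```python
from collections import deque
from typing import Deque


class RequestBudget:
    """Client-side throttle: at most max_requests in any window_seconds span."""

    def __init__(self, max_requests: int = 180,
                 window_seconds: float = 15 * 60) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._sent: Deque[float] = deque()  # timestamps of accepted requests

    def _expire(self, now: float) -> None:
        # Forget requests that have fallen out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()

    def try_acquire(self, now: float) -> bool:
        """Record a request at time `now` if the budget allows it."""
        self._expire(now)
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False

    def seconds_until_free(self, now: float) -> float:
        """How long to wait before the next request would be accepted."""
        self._expire(now)
        if len(self._sent) < self.max_requests:
            return 0.0
        return self._sent[0] + self.window_seconds - now


budget = RequestBudget()
# Attempt one request per second for 200 seconds (simulated clock).
allowed = sum(budget.try_acquire(now=float(i)) for i in range(200))
print(allowed)                               # 180
print(budget.seconds_until_free(now=200.0))  # 700.0
```

At 180 requests per window and 96 windows per day, a single client tops out at 17,280 requests per day, no matter how many tweets match its query.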
Complete Twitter Data Access
Realtime and Historical
Twitter Data To Meet The Needs Of Your Business
Gnip was the first authorized reseller of Twitter
data. We provide realtime data as well as access to every publicly available
Tweet dating back to the very first Tweet from March 21, 2006. Whether you are
looking for Tweets about specific keywords, high volumes of data, or historical
data, we've got you covered!
API
(Application Programming Interface) An API dictates how two
interfaces work with each other. In the case of social data, most information
is shared through a streaming API.
Backfill
Backfill is Gnip's product that allows you to briefly
disconnect from your realtime stream and easily get all of your data when you
reconnect.
Big Data
Big Data is a term to describe the value that companies are
seeing from using data to create actionable insights.
Choice of Protocols
Choice of protocols means you can receive the data in the
format you prefer: GET, POST, or Streaming.
Complete
Complete data is when customers have access to the entire
set of data on a platform so they never miss a conversation.
Data Collector
Data Collector is Gnip's product that collects and
normalizes data from public APIs including Instagram, Flickr, YouTube &
more.
Data Mining
A method of computer science that sifts through data to find
patterns using machine learning, statistics, database systems and more.
Data Scientist
Considered a relatively new field, the profession of data
scientist means different things to different companies and often is a
combination of statistics, machine learning, business intelligence, etc.
Data Scraping
Data scraping is when a company doesn't get the data from a
social media publisher but rather scrapes content where they can find it. It is
never complete, reliable or sustainable.
Decahose
Gnip's Decahose provides a random 10 percent sample of the
full firehose. We'd also like to openly admit it should be called a decihose,
since deci- means a tenth while deca- means ten.
Enrichments
Enrichments are how Gnip provides additional metadata to its
data streams making it easier for our customers to digest data. Examples
include Klout scores, geo location, expanding shortened URLs and more.
Firehose
Firehose is a term first coined by Twitter to describe their
complete set of data. Now firehose in conjunction with social media means that
you have access to the full set of a social media publisher's data.
Geotagged
Geotagged data is when a social media publisher lets the
user decide if they want to provide the exact location of their content.
Geotagged content more often comes from a smart phone.
JSON
(JavaScript Object Notation) JSON is a text-based open
standard designed for data interchange that even the human eye can read and is
easy for computers to parse. JSON is the format Gnip delivers its social data
in.
Machine Learning
Machine learning is the concept that you can teach a machine
to make better predictions and decisions based on data.
Natural Language Processing
Natural language processing is the discipline of teaching
computers to understand the human language.
Node.js
A JavaScript runtime that makes it easy to build network
applications. It's another way to connect to Gnip and consume data.
Predictive Analytics
The ability to predict future behavior and actions based on
past data using machine learning, statistics, data mining and other
techniques.
Public API
Many social media publishers offer a public API providing
access to their data but it is often rate limited.
PowerTrack
PowerTrack is Gnip's powerful filtering language that gives
you the ability to get complete coverage of the data you need.
RESTful API
With a REST API, you make a request to the server within a
certain time period, and get data back only after you make the request.
Sentiment Analysis
Sentiment analysis is a technique for determining the
feelings expressed in text, that is, whether the sentiment of the text is
angry, sad, happy, etc.
Social Data
Expresses social media in a computer-readable format (e.g.
JSON) and shares metadata about the content to help provide not only content,
but context. Metadata often includes information about location, engagement and
links shared. Unlike social media, social data is focused strictly on publicly
shared experiences.
Social Media
User-generated content where one user communicates and
expresses themselves and that content is delivered to other users. Examples of
this are platforms such as Twitter, Facebook, YouTube, Tumblr and Disqus.
Social media is delivered in a great user experience, and is focused on sharing
and content discovery. Social media also offers both public and private
experiences with the ability to share messages privately.
Streaming API
With a Streaming API, your requests are ongoing as is the
data coming your way after you make the requests.
XML
Extensible Markup Language - a markup language that defines
a set of rules for encoding documents in a format that is both human-readable
and machine-readable.
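As a small illustration of the JSON and Social Data entries above, the sketch below parses a JSON record and separates the content from the metadata that gives it context. The payload and its field names are made up for this example and are not Gnip's actual delivery format.

```python
import json

# Hypothetical social data record; field names are illustrative only.
payload = '''
{
  "text": "Watching the hockey game",
  "user": "alice",
  "location": "Denver, CO",
  "links": ["http://example.com/game"]
}
'''

record = json.loads(payload)

# The content is what the user wrote; everything else is context (metadata).
content = record["text"]
context = {key: value for key, value in record.items() if key != "text"}

print(content)          # Watching the hockey game
print(sorted(context))  # ['links', 'location', 'user']
```

The same text is readable by eye and parsed by the standard library in one call, which is the property the JSON entry describes.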