Monitoring tweets about your company or an event, building any kind of real-time analysis on a variety of subjects, etc.: Twitter’s Streaming API has an enormous number of use cases for individuals, companies and governments. Let us dig into how it works and how to integrate it with PHP.
Recently, Twitter released a new version of their API, soberly named “Twitter V2”, including more features and filtering options (that is good). While it is not at feature parity yet, it is pretty close. Notably, all the stream-related endpoints surpass the previous version.
This new version comes with a number of changes that break older packages [1], most notably fennb/phirehose. There are alternatives, like spatie/laravel-twitter-streaming-api and spatie/twitter-streaming-api, which kept the same public API for V2 as for V1, drastically limiting the use cases but allowing for an easy upgrade.
Anyway, the last two both rely on my package, so let us use that. I will go over everything you need to know to work with the API.
Following along
You will need an approved developer account. If you do not have one, apply here. The process usually takes a few days for the "essential access". If you want more features and more tweets pulled per month, you may apply for the “elevated access” once you have the “essential access”; it takes a week in my experience.
You will then need to create an “application” and get your bearer token.
Make sure you have PHP and Composer installed and require the package:
```shell
composer require redwebcreation/twitter-stream-api
```
I also recommend downloading other packages to make development easier:
```shell
composer require --dev symfony/var-dumper vlucas/phpdotenv nunomaduro/collision
```
symfony/var-dumper is for debugging (it includes the dd and dump functions), vlucas/phpdotenv is for parsing and loading env files, and nunomaduro/collision is a nice error handler for the terminal.
You should also create an .env file with the following contents:
```
TWITTER_BEARER_TOKEN=...
```
Then, create a tinker.php file and paste the following:
```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Felix\TwitterStream\Streams\VolumeStream;
use Felix\TwitterStream\Streams\FilteredStream;
use Felix\TwitterStream\TwitterConnection;
use NunoMaduro\Collision\Provider;
use Dotenv\Dotenv;

(new Provider)->register();

$dotenv = Dotenv::createImmutable(__DIR__);
$dotenv->load();

$bearerToken = $_ENV['TWITTER_BEARER_TOKEN'];

$stream = new VolumeStream();
$connection = new TwitterConnection($bearerToken);

$stream->listen($connection, function (object $tweet) {
    echo $tweet->data->text . PHP_EOL;
});
```
You can already listen to a stream!
```shell
php tinker.php
```
You should see a bunch of text after a few seconds. Congrats, you are listening to Twitter’s Streaming API in real-time! Yay.
Types of stream
You have already used one of the two types available in the example above: the volume stream, which returns roughly 1% of all the new tweets. There is another one called the filtered stream, which returns all tweets matching a set of rules (more on that later).
There are no technicalities surrounding the volume stream. For this reason, I will use it to demonstrate how this package works before diving into the specifics of the filtered stream.
```php
use Felix\TwitterStream\Streams\VolumeStream;
use Felix\TwitterStream\TwitterConnection;

$stream = new VolumeStream();
$connection = new TwitterConnection(bearerToken: '...');

$stream
    ->withTweetLimit(100)
    ->listen($connection, function (object $tweet) {
        echo $tweet->data->text;
    });
```
Let's break down this piece of code. The TwitterConnection object uses your bearer token to authenticate you. Then, and this is true for any stream that implements the TwitterStream interface, we call the listen(Connection, callable) method to start listening to the stream. Be careful, as Twitter heavily limits the number of calls for both streams: "50 requests per 15-minute window" [2], that is, one request every 18 seconds on average.
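The 18-second figure is simply the window length divided by the number of allowed requests. Here is a minimal sketch of that arithmetic, which may help if you reconnect in a loop. The function name is made up for illustration and is not part of the package.

```php
<?php

declare(strict_types=1);

// Hypothetical helper, not part of the package: the average delay to keep
// between two connection attempts in order to stay within a rate limit.
function secondsBetweenRequests(int $maxRequests, int $windowInMinutes): float
{
    if ($maxRequests <= 0 || $windowInMinutes <= 0) {
        throw new InvalidArgumentException('Both arguments must be positive.');
    }

    return ($windowInMinutes * 60) / $maxRequests;
}

// 50 requests per 15-minute window, as quoted above.
echo secondsBetweenRequests(50, 15) . PHP_EOL; // 18
```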
Each incoming tweet is passed to the callable. In this case, $tweet contains a tweet object that follows the same structure as the default Tweet object.
You may also access inside the callable:
- The number of tweets received, via $stream->tweetsReceived()
- The UNIX timestamp at which the stream started, via $stream->createdAt()
- The number of seconds since the stream started, via $stream->timeElapsedInSeconds()
- A way to stop further processing, via $this->stopListening()
However, due to PHP's limitations, you cannot stop the stream after a given amount of time, but you may stop processing as soon as you get a tweet after an arbitrary deadline. It's usually a technicality more than a problem. However, if it's a deal-breaker for you, check out ReactPHP, an event-driven, non-blocking I/O toolset that could solve this problem [3].
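The "stop on the first tweet after the deadline" approach can be sketched as a wrapper around your callback. This is an illustration only: stopAfter, $stop and $clock are names invented here. In real code, $stop would be whatever ends the stream, for instance a closure that calls stopListening().

```php
<?php

declare(strict_types=1);

// Wraps a tweet handler so that it calls $stop as soon as a tweet arrives
// after the deadline. Nothing happens while no tweet comes in, which is
// exactly the limitation described above.
function stopAfter(int $seconds, callable $handler, callable $stop, ?callable $clock = null): Closure
{
    $clock ??= fn (): int => time(); // injectable clock, handy for testing
    $deadline = $clock() + $seconds;

    return function (object $tweet) use ($handler, $stop, $clock, $deadline): void {
        if ($clock() >= $deadline) {
            $stop();

            return;
        }

        $handler($tweet);
    };
}
```

The returned closure has the same shape as the callable passed to listen, so it could be used in its place.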
Fields & Expansions
By default, Twitter sends little data about the tweet. To get more information, you will need to explicitly request it using fields and (fields) expansions.
Fields
Fields allow for more customization regarding the payload returned per tweet. Let's see that in an example below:
```php
$stream
    ->fields([
        // alternatively, you can also pass in an array
        'tweet' => 'author_id'
    ])
    ->listen(...);
```
Which could return:
```json
{
    "data": {
        "id": "1234321234321234321",
        "text": "Hello world!",
        "author_id": "5678765678765678765"
    }
}
```
Here's the list of all the available field types and their respective object model (last updated: Aug. 2022):
You can also check out Twitter’s documentation for more details.
Expansions
Expansions let you expand IDs to their complete object. For example, if you request an extra author_id field, you may expand it using the author_id expansion:
```php
$stream
    ->fields(['tweet' => 'author_id'])
    ->expansions('author_id')
    ->listen(...);
```
Which could return:
```json
{
    "data": {
        "id": "1234321234321234321",
        "text": "Hello world!",
        "author_id": "5678765678765678765"
    },
    "includes": {
        "users": [
            {
                "id": "5678765678765678765",
                "name": "John Doe",
                "username": "johndoe"
            }
        ]
    }
}
```
The list of expansions is quite extensive and not all expansions work the same way, so please check out Twitter's documentation on the subject.
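Expanded objects arrive next to the tweet rather than inside it, so you have to match them up by ID yourself. Here is a small sketch of that lookup for the author_id expansion. The helper name findAuthor is hypothetical, and the payload is the sample shown above.

```php
<?php

declare(strict_types=1);

// Finds the expanded user object matching the tweet's author_id, or null
// when the payload has no matching entry in includes.users.
function findAuthor(object $payload): ?object
{
    $authorId = $payload->data->author_id ?? null;

    foreach ($payload->includes->users ?? [] as $user) {
        if ($user->id === $authorId) {
            return $user;
        }
    }

    return null;
}

// Same shape as the JSON payload shown above.
$payload = json_decode('{
    "data": {"id": "1234321234321234321", "text": "Hello world!", "author_id": "5678765678765678765"},
    "includes": {"users": [{"id": "5678765678765678765", "name": "John Doe", "username": "johndoe"}]}
}');

echo findAuthor($payload)?->username . PHP_EOL; // johndoe
```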
Filtering the stream
This part only applies if you're interested in the filtered stream.
Twitter built its own query language that enables fine-grained control over which tweets you receive. Let's dig into it.
Building a rule
Rules are a list of filters to narrow down the results from the 6000 tweets per second that you could theoretically get to "only" a few hundred per second, depending on the specificity of your filter, of course. They contain a query and a label ("tag") for this query and are stored on Twitter's side. Rules persist between connections. However, they do expire if unused for more than 180 days; you'll get a 30-day notice. A filtered stream can receive tweets from more than one rule: five for the "essential access", twenty-five for the "elevated access" and a thousand for the "academic research access" [4]. Each rule must be unique to your stream.
Note: if you change your rules while connected to the stream, Twitter will use the new rules immediately [5].
Before jumping into rule building, let's learn how to save and delete rules using this package.
Save, read and delete rules
You cannot update rules.
```php
use Felix\TwitterStream\Rule\RuleManager;

$rule = new RuleManager($connection);
```
Let's create a rule:
```php
$rule->save(
    # tweets must contain the word cat and have at least one image
    "cat has:images",
    "images of cats"
);
```
You may now retrieve your newly saved rule:
```php
$rule->all();
```
Which returns an array of Felix\TwitterStream\Rule\Rule:
```
[
    0 => Felix\TwitterStream\Rule\Rule{
        +value: "cat has:images",
        +tag: "images of cats",
        +id: "4567654567654567654"
    }
]
```
Note: Felix\TwitterStream\Rule\Rule is merely a data object; it does not contain any methods.
To delete the rule, pass its ID to the delete method:
```php
$rule->delete('4567654567654567654');
```
Batch Processing
To save many rules at once:
```php
use Felix\TwitterStream\Rule\Rule;

$rule->saveMany([
    new Rule("cats has:images", "cat pictures"),
    new Rule("dogs has:images", "dog pictures"),
    new Rule("horses has:images", "horse picture"),
]);
```
To delete these new rules:
```php
$rule->deleteMany([
    '1484148414841484148',
    '2585258525852585258',
    '5101510151015101510'
]);
```
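In practice, you rarely know the IDs by heart: they come from the list of saved rules. Here is a sketch of collecting the IDs of the rules carrying given tags. The helper name is hypothetical, and plain objects stand in for the Rule objects, assuming only the public id and tag properties visible in the dump earlier.

```php
<?php

declare(strict_types=1);

// Returns the IDs of the rules whose tag is in $tags. $rules is expected to
// be a list of objects exposing public "id" and "tag" properties.
function idsOfRulesTagged(array $rules, array $tags): array
{
    $matching = array_filter($rules, fn (object $rule): bool => in_array($rule->tag, $tags, true));

    return array_values(array_map(fn (object $rule): string => $rule->id, $matching));
}

// Stand-ins for what the list of saved rules might look like.
$rules = [
    (object) ['value' => 'cats has:images', 'tag' => 'cat pictures', 'id' => '1484148414841484148'],
    (object) ['value' => 'dogs has:images', 'tag' => 'dog pictures', 'id' => '2585258525852585258'],
    (object) ['value' => 'cat has:images', 'tag' => 'images of cats', 'id' => '4567654567654567654'],
];

print_r(idsOfRulesTagged($rules, ['cat pictures', 'dog pictures']));
```

The resulting array of IDs has the shape that deleteMany expects.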
Validating your rules
Twitter has a dry-run mode, meaning you'll hit the endpoint but no rules will be created.
You can either use the validate method:
```php
$rule->validate('cats ha:images');
```
Or, the save and saveMany methods both have a dryRun parameter:
```php
$rule->save('...', '...', dryRun: true);

$rule->saveMany([...], dryRun: true);
```
This package treats renaming a named parameter as a breaking change, so you may use them safely.
Both ways would throw the following exception:
[UnprocessableEntity] cats ha:images : Reference to invalid operator 'ha'. Operator is not available in current product or product packaging. Please refer to complete available operator list at http://t.co/filteredstreamoperators. ( at position 6); Reference to invalid field 'ha' (at position 6) [https://api.twitter.com/2/problems/invalid-rules]
Operators
Finally, how to build rules. We're well past 10,000 characters and the only rule you've seen was about cats. Images of cats. Let's do better.
Types of operators
To prevent you from retrieving all of Twitter in real-time, you have to have at least one "standalone" operator. Standalone operators may be a hashtag, a word, an emoji, etc. They cannot be a stopword, that is, a word like "the", "is", "an", "you", etc. [7] Here are a few examples:
- cats, tweets containing the word "cats"
- cool dogs, tweets containing the words “cool” and “dogs”, in any position. Writing operators with a space between them is equivalent to writing cool AND dogs (more on boolean operators later).
- "no way", tweets that contain the words “no way”, next to each other.
- #future, tweets containing the hashtag future.
- @afelixdorn, tweets that mention the given username
Standalone operators are case-insensitive, meaning that the rule cool dogs would match “COOL DOGS”, “cOol dOGs”… Accents and diacritics, on the other hand, are respected: pequeño and pequeno are two different rules.
Note: a rule like no way may return a tweet without "no way" in it because the "no way" is in the quoted tweet. You may also encounter this behavior for replies (the parent tweet matches the rule but the reply is returned).
Quick tip: while debugging your rules, you can look up a tweet without knowing the author using the following URL template: https://twitter.com/_/status/ID_HERE [8]
On the other hand, "conjunction-required" operators are not needed for a rule to be valid but allow you to filter out tweets to only the ones relevant for your use-case.
Here are a few examples (I will omit the standalone operator that would be required):
- -cats means “tweets without the word ‘cats’”. It is not a standalone operator because querying all tweets without a word may be too unspecific.
- is:retweet, tweets that are “true” retweets. It does not include quoted tweets; there is an is:quote operator for that.
- -is:retweet, all tweets except “true” retweets.
- lang:fr, tweets identified as written in French by Twitter.
- point_radius:[174 -41 20km], tweets posted in a circle whose center is defined by the first two parameters, the longitude (174) and the latitude (-41), the third one (20km) being the radius of the circle.
- …
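The order of the coordinates in point_radius is easy to mix up: the longitude comes first, and a latitude can never exceed 90 in absolute value. Here is a small, hypothetical helper (not part of the package) that formats the operator and rejects impossible coordinates:

```php
<?php

declare(strict_types=1);

// Hypothetical helper: formats a point_radius operator, longitude first,
// and throws when a coordinate is outside its valid range.
function pointRadius(float $longitude, float $latitude, int $radiusInKm): string
{
    if ($longitude < -180 || $longitude > 180) {
        throw new InvalidArgumentException('Longitude must be between -180 and 180.');
    }

    if ($latitude < -90 || $latitude > 90) {
        throw new InvalidArgumentException('Latitude must be between -90 and 90.');
    }

    if ($radiusInKm <= 0) {
        throw new InvalidArgumentException('Radius must be positive.');
    }

    return sprintf('point_radius:[%s %s %dkm]', $longitude, $latitude, $radiusInKm);
}

echo pointRadius(174, -41, 20) . PHP_EOL; // point_radius:[174 -41 20km]
```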
I will quickly list all the available operators as of now (August 2022), just to give you a peek into how much you can do with Twitter’s Stream API: from:, to:, url:, retweets_of:, context:, entity:, conversation_id:, bio:, bio_name:, bio_location:, place:, place_country:, point_radius:, bounding_box:, is:retweet, is:reply, is:quote, is:verified, -is:nullcast, has:hashtags, has:cashtags, has:links, has:mentions, has:media, has:images, has:videos, has:geo, sample:, lang:, followers_count:, tweets_count:, following_count:, listed_count:, url_title:, url_description:, url_contains:, source:, in_reply_to_tweet_id:, retweets_of_tweet_id:.
Wow. Took some time.
Let us build a rule that retrieves tweets about songs that people are listening to.
```php
$rule->new('listening to music')
    ->raw('#nowplaying')
    ->isNotRetweet()
    ->lang('en')
    ->save();
```
Okay, this is cool, you can try and run it. Here is a complete example:
```php
<?php

use Felix\TwitterStream\Rule\RuleManager;
use Felix\TwitterStream\Streams\FilteredStream;
use Felix\TwitterStream\TwitterConnection;

require __DIR__ . '/vendor/autoload.php';

$stream = new FilteredStream();
$connection = new TwitterConnection(bearerToken: '');
$rule = new RuleManager($connection);

$rule->new('listening to music')
    ->raw('#nowplaying')
    ->isNotRetweet()
    ->lang('en')
    ->sample(10) // only returns 10% of the available tweets
    ->save();

$stream->listen($connection, dump(...));
```
If you are unfamiliar with the first-class callable syntax dump(...), here is the RFC. This example also assumes that you have symfony/var-dumper installed.
Compiling this would produce the following:
```
#nowplaying -is:retweet lang:en sample:10
```
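To make the compilation step concrete, here is a deliberately tiny, self-contained builder that produces the same string. It is a sketch and not the package's implementation: the TinyQuery class and its internals are invented for illustration, and only the method names mirror the example above.

```php
<?php

declare(strict_types=1);

final class TinyQuery
{
    /** @var string[] */
    private array $operators = [];

    public function raw(string $operator): self
    {
        $this->operators[] = $operator;

        return $this;
    }

    public function isNotRetweet(): self
    {
        return $this->raw('-is:retweet');
    }

    public function lang(string $code): self
    {
        return $this->raw('lang:' . $code);
    }

    public function sample(int $percent): self
    {
        return $this->raw('sample:' . $percent);
    }

    // Operators separated by a space are implicitly ANDed together.
    public function compile(): string
    {
        return implode(' ', $this->operators);
    }
}

$compiled = (new TinyQuery())
    ->raw('#nowplaying')
    ->isNotRetweet()
    ->lang('en')
    ->sample(10)
    ->compile();

echo $compiled . PHP_EOL; // #nowplaying -is:retweet lang:en sample:10
```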
Note: while the query builder makes heavy use of magic methods to let you use nice method names like isNotRetweet, exceptFromLang, andNotFrom, etc., you still get full autocompletion (as long as your editor understands PHPDoc).
To quickly debug a rule, you may call dd() at any time on the query builder: $rule->new()->...()->dd(). If the function dd does not exist, it defaults to var_dump and die.
Boolean Operators
We are talking about ANDs and ORs here.
ANDs
We have seen previously that separating operators with a space is equivalent to writing “AND”, which means the AND keyword is never required. You may use it if it makes your query easier to read, but be careful: rules have a maximum length, and you lose 4 characters per AND you add.
Here are a few examples:
```php
$rule->raw('dog')->andRaw('doggy'); // (1) dog AND doggy
$rule->raw("I'm famous")->andNotVerified(); // (2) "I'm famous" AND -is:verified
$rule->raw('big')->and->raw('house'); // (3) big AND house
```
These would return exactly the same tweets as the examples below (without ANDs):
```php
$rule->raw('dog')->raw('doggy'); // (1) dog doggy
$rule->raw("I'm famous")->exceptVerified(); // (2) "I'm famous" -is:verified
$rule->raw('big')->raw('house'); // (3) big house
```
You can use exceptSomething or notSomething interchangeably for is and has operators. Often, one sounds better than the other; apart from that, there is no rule, no difference.
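To see how much room explicit ANDs take in a rule, here is a naive sketch that strips them and measures the difference. The function name is hypothetical, and the regular expression does not understand quoted phrases, so treat it as an illustration rather than a safe rewriting tool.

```php
<?php

declare(strict_types=1);

// Naive sketch: replaces every explicit " AND " with a single space. An
// exact-match phrase containing " AND " would be altered too.
function withoutExplicitAnds(string $query): string
{
    return preg_replace('/\s+AND\s+/', ' ', $query) ?? $query;
}

$verbose = '"I\'m famous" AND -is:verified AND lang:en';
$compact = withoutExplicitAnds($verbose);

echo $compact . PHP_EOL;                              // "I'm famous" -is:verified lang:en
echo (strlen($verbose) - strlen($compact)) . PHP_EOL; // 8
```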
ORs
ors follow the same syntax as ands and behave as one would expect: “Successive operators with OR between them will result in OR logic, meaning that Tweets will match if either condition is met.” [9]
```php
$rule->raw('study')->orRaw('paper'); // study OR paper
$rule->raw('apple')->raw('iphone')->or->raw('ipad'); // apple iphone OR ipad
```
About the order of operations: operators joined by AND are combined first, then ORs are applied. So tomato potato OR carrot is evaluated as (tomato potato) OR carrot, which corresponds to “tweets containing both 'tomato' and 'potato', or tweets containing 'carrot'”. Inversely, tomato OR potato carrot is evaluated as tomato OR (potato carrot). When in doubt, group terms with parentheses.
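Since precedence is easy to get wrong, explicit parentheses make the grouping unambiguous. The two helpers below are hypothetical (they are not part of the package) and simply build query strings in which every group is wrapped in parentheses:

```php
<?php

declare(strict_types=1);

// Joins terms with a space (implicit AND) and wraps the group in parentheses.
function allOf(string ...$terms): string
{
    return '(' . implode(' ', $terms) . ')';
}

// Joins terms with OR and wraps the group in parentheses.
function anyOf(string ...$terms): string
{
    return '(' . implode(' OR ', $terms) . ')';
}

echo allOf('tomato', anyOf('potato', 'carrot')) . PHP_EOL; // (tomato (potato OR carrot))
echo anyOf(allOf('tomato', 'potato'), 'carrot') . PHP_EOL; // ((tomato potato) OR carrot)
```

The outer parentheses are redundant, but they keep the helpers composable.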
Do not forget to check out Twitter's documentation.
Conclusion
Twitter’s Stream API is a great way to listen to what is happening now, but rules are very hard to get right, so iterate on them.
Trends change. If you are planning on running your script for a long time, check your data regularly to make sure you are getting what you think you are getting.
Anyway, thanks for reading.
[3] I am only assuming that using ReactPHP would be the most straightforward way to implement a timeout-based disconnection; it may not be the case. Please reach out if you know better. In the meantime, here's a link to two probably relevant packages: reactphp/promise and reactphp/promise-timer
[7] The list of stopwords isn't public, and common lists of stopwords match very poorly with Twitter's (undisclosed) list, most likely because those lists are meant to filter out insignificant words in natural language data and not to prevent developers from abusing Twitter's API. The list above was built through trial and error by calling the API to check each stopword individually. I checked ~700 stopwords and only found four ("the", "is", "an", "you").