Mike Verdone [Mon, 3 Feb 2014 21:51:53 +0000 (13:51 -0800)]
Merge pull request #196 from adonoho/pr-fix-stream
A Simpler Fix to the Streaming Code due to Changes from Twitter on Jan. 13, 2014.
Gentlefolk,
This is a candidate release patch. I propose it become the formal branch of this library and have dubbed it version v1.10.3. I once again formally thank RouxRC for his efforts moving this library forward. Any errors in this patch remain mine and do not reflect upon RouxRC or his code.
This library is a high performance streaming library. Compared to other Twitter libraries, it is easily an order of magnitude faster at delivering tweets to your application. Why is that? When streaming, this library pierces Python's urllib abstraction and takes control of the socket. It interprets the HTTP stream directly. That makes it fast. It also makes it vulnerable to changes. It needed to be upgraded when Twitter upgraded the protocol version.
Twitter's switch to HTTP v1.1 was long overdue.
Summary of changes:
- Based upon RouxRC's code, I turned off gzip compression. My version is slightly different than RouxRC's version.
- Instead of incrementally reading arbitrary lengths of bytes from the socket and seeing if they parse in the JSON parser, a good technique, the switch to HTTP chunking forced us to process in chunk sized blocks. Based upon inspection, Twitter never sends partial JSON in a chunk. They also send keep-alive delimiters in single 7 byte long chunks. This code depends upon both of these observations. It does not do general purpose HTTP chunk processing. It is a Twitter specific HTTP chunk parser.
- Chunk oriented processing allowed me to isolate stream interpretation to the chunk code and migrate the wrapper code to operate exclusively using strings. This makes the wrapper code more readable.
- Once I had opened up the wrapper code, I cleaned it up. This involved modest edits in how certain socket parameters were determined and moving data exclusive to the generator into the generator and out of the containing object.
- As this is exclusively socket oriented code, the HTTP exception catching was removed from the method. The exception handling was moved to wrap the opening of the socket by urllib.
- Due to reading the data in larger chunks and, hence, running it through the JSON parser less often, this code is about 10% faster than the prior generation.
- When Twitter hangs up on us, this code emits a `hangup` message in the stream.
- This code has been tested using Python v2.7.6 and v3.3.3 on OS X 10.8.5 (Mountain Lion). I have tested it on the high volume sample stream and on a user stream under both versions of Python. It is believed, but not tested, that it will function under Python v2.6.x. It uses the bytearray type. I believe that has been back ported all the way to Python v2.6.x. As the code is not particularly tricky, I do not foresee that it has introduced any new issues that were not already apparent in this library.
- I use this patch in production and have captured 50M+ tweets with it. It is solid and reliable. If you find otherwise, please contact me; I have a vested interest in ensuring that it catches all corner cases.
Thank you for your patience while I refine this patch and I ask Mr. Verdone to select this patch as the basis for moving this library forward.
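The chunk handling described in the summary can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: the names `iter_twitter_chunks` and `frame` and the shape of the hangup message (`{"hangup": True}`) are assumptions. It relies on the same two observations noted above, namely that a chunk never holds partial JSON and that keep-alives arrive in their own chunk.

```python
import io
import json


def iter_twitter_chunks(stream):
    """Yield one dict per JSON document found in an HTTP chunked body.

    `stream` is any binary file-like object positioned at the first
    chunk-size line. This is not a general-purpose chunk parser: it
    assumes each chunk holds only complete JSON documents or a keep-alive.
    """
    while True:
        size_line = stream.readline()
        if not size_line:
            # The peer closed the connection: report it in-band.
            yield {"hangup": True}  # message shape is an assumption
            return
        # The chunk size is hexadecimal; drop any chunk extension after ';'.
        size = int(size_line.split(b";", 1)[0].strip().decode("ascii"), 16)
        if size == 0:
            # A zero-length chunk ends the body.
            yield {"hangup": True}
            return
        payload = stream.read(size)
        stream.read(2)  # discard the CRLF that terminates the chunk
        if len(payload) < size:
            # Short read: the connection dropped mid-chunk.
            yield {"hangup": True}
            return
        for part in payload.decode("utf-8").split("\r\n"):
            if part.strip():  # a keep-alive chunk contains only CRLF
                yield json.loads(part)


def frame(payload):
    """Wrap `payload` as one HTTP chunk (hypothetical helper for the demo)."""
    return format(len(payload), "x").encode("ascii") + b"\r\n" + payload + b"\r\n"


body = (
    frame(b'{"text": "hello"}\r\n')
    + frame(b"\r\n")  # keep-alive
    + frame(b'{"text": "world"}\r\n')
)
messages = list(iter_twitter_chunks(io.BytesIO(body)))
```

In this framing a keep-alive whose payload is a bare CRLF occupies 7 bytes on the wire, which would be consistent with the 7 byte chunks mentioned above, although the patch description does not spell out the chunk's contents.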
Andrew W. Donoho [Tue, 28 Jan 2014 14:13:06 +0000 (08:13 -0600)]
Further refine socket management.
All HTTP chunks are read in their entirety.
Cosmetic code improvements. (The socket's blocking state is set in a more compact form after a De Morgan boolean transformation.)
Hangups by Twitter, as with timeouts, are signaled via a message to allow graceful recovery.
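A consumer can use such in-band messages to recover without catching exceptions. The sketch below is only illustrative: the function `consume`, its callbacks, and the message shapes `{"hangup": True}` and `{"timeout": True}` are assumptions, not names confirmed by this commit.

```python
def consume(messages, handle_tweet, on_timeout=None):
    """Dispatch stream messages and return the reason the loop ended.

    `messages` is any iterable of dicts. Control messages are assumed
    to look like {"hangup": True} and {"timeout": True}.
    """
    for msg in messages:
        if msg.get("hangup"):
            # The remote side closed the stream; let the caller reconnect.
            return "hangup"
        if msg.get("timeout"):
            # No data arrived in time; give the caller a chance to act.
            if on_timeout is not None:
                on_timeout()
            continue
        handle_tweet(msg)
    return "exhausted"
```

The caller can then decide whether to reconnect when the return value is `"hangup"`.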
Andrew W. Donoho [Mon, 27 Jan 2014 13:26:44 +0000 (07:26 -0600)]
As Twitter appears to send complete JSON in the chunks, we can simplify buffer management to only operate on strings and not re-encode the string as bytes. This improves readability at the expense of breakage if Twitter starts spanning JSON across HTTP chunks. This is an unlikely change to their infrastructure. That said, this is a totally optional patch.
Andrew W. Donoho [Thu, 23 Jan 2014 23:44:46 +0000 (17:44 -0600)]
Minimize string decoding and move to use a bytearray for the buffer. This reduces memory consumption and is faster than the += operator for buffer concatenation and trimming.
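As a rough illustration of the buffering idea (not the code from this commit), a `bytearray` can be extended and trimmed in place, and only the extracted line needs to be decoded. The helper name `take_line` and the CRLF delimiter are assumptions.

```python
def take_line(buf, delimiter=b"\r\n"):
    """Remove and return the first delimited line from `buf`, or None.

    `buf` is a bytearray that is modified in place, so no new buffer
    object is built each time data is appended or consumed.
    """
    pos = buf.find(delimiter)
    if pos == -1:
        return None  # no complete line buffered yet
    line = bytes(buf[:pos])
    del buf[:pos + len(delimiter)]  # trim the consumed bytes in place
    return line


buf = bytearray()
buf.extend(b'{"a": 1}\r\n{"b"')  # data as it might arrive from recv()
first = take_line(buf)           # complete line is extracted
second = take_line(buf)          # remainder is still incomplete
```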
Mike Verdone [Mon, 4 Nov 2013 11:47:16 +0000 (03:47 -0800)]
Merge pull request #185 from cegme/json_status_dump
Added a json format option
This addition allows the user to get the raw json tweet information from each row. This is helpful when the twitter json format is needed by another process.
Example usage: `twitter --format=json friends`
This would get the latest tweets from friends in the raw json format. That json can be ported into another process or another database for processing.
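A receiving process might read that output along the following lines. This is a sketch under the assumption that each row is printed as one JSON document per line; the function name `load_rows` is hypothetical.

```python
import json


def load_rows(lines):
    """Parse an iterable of text lines, one JSON document per line.

    Blank lines are skipped. The one-document-per-line layout is an
    assumption about the command's output.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if line:
            rows.append(json.loads(line))
    return rows
```

Passing `sys.stdin` as `lines` would let a script consume the output of the command through a pipe.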
Mike Verdone [Mon, 4 Nov 2013 11:43:32 +0000 (03:43 -0800)]
Merge pull request #178 from dkanygin/master
added timeout option to TwitterStream
In case of low tweet volume, we can now time out and exit the iterator to update the search query or perform other housekeeping tasks.
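One common way to bound the wait on a socket is `select.select` with a timeout. The sketch below shows only that general technique; the helper `recv_or_timeout` and the `{"timeout": True}` return value are assumptions and are not taken from this pull request.

```python
import select
import socket


def recv_or_timeout(sock, timeout, bufsize=4096):
    """Return received bytes, or a timeout marker if nothing arrives.

    Waits at most `timeout` seconds for `sock` to become readable.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return {"timeout": True}  # marker shape is an assumption
    return sock.recv(bufsize)
```

A caller that receives the marker can refresh its search query and then resume reading.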
Christan Grant [Fri, 18 Oct 2013 22:44:49 +0000 (18:44 -0400)]
Added a json format option
This addition allows the user to get the raw json tweet information from each row. This is helpful when the twitter json format is needed by another process.
Mike Verdone [Mon, 2 Sep 2013 16:35:56 +0000 (09:35 -0700)]
Merge pull request #167 from lumbric/master
Add stream documentation
It was very difficult to find information on this topic. Now that I figured out how to get direct messages, I added it to the README.
See also questions and discussions on this topic:
http://stackoverflow.com/a/17536438/859591
https://dev.twitter.com/discussions/8081
https://dev.twitter.com/discussions/8110
Mike Verdone [Mon, 2 Sep 2013 16:34:50 +0000 (09:34 -0700)]
Merge pull request #174 from RouxRC/master
POST for "statuses/filter" in Streaming API
Twitter recommends using POST for the filter method in the Streaming API: https://dev.twitter.com/docs/api/1.1/post/statuses/filter
So it should be listed here.
Mike Verdone [Sat, 22 Jun 2013 17:11:13 +0000 (10:11 -0700)]
Merge pull request #156 from mattcen/master
DM archiving, Twitter API upgrade, better timestamps.
You know what's awesome? Patching a program, realising you should rebase your patch on the latest commit (I based off twitter-1.8.0, so had a fair few changes to make), and then finding all the features (namely Favourites and Mentions) that got added to master in the meantime! Love your project! I will likely try to tweak the Favourites and Mentions behaviours in the near future though so they and Timeline-fetching aren't mutually exclusive.
NOTE: You'd need to update your Twitter App settings to allow viewing and posting of DMs for this to work out of the box for people.
Add argument to get DMs
Adapt statuses_portion()
Adapt statuses() to optionally handle DMs
Adapt main() to pull down DMs if instructed
Enforce Twitter API 1.1 for archiver and follow.
Add option to allow more accurate timestamps (specifically the timezone specification) in output files.
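For illustration, a timestamp that carries an explicit timezone can be produced from Twitter's `created_at` string as sketched below. The function name `to_iso` is hypothetical, and the archiver's actual option and output format are not shown in this message.

```python
from datetime import datetime


def to_iso(created_at):
    """Convert a Twitter `created_at` string to ISO 8601 with its UTC offset."""
    # Example input: "Sat Jun 22 17:11:13 +0000 2013"
    parsed = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return parsed.isoformat()


stamp = to_iso("Sat Jun 22 17:11:13 +0000 2013")
# stamp == "2013-06-22T17:11:13+00:00"
```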
Matthew Cengia [Sun, 9 Jun 2013 06:57:14 +0000 (16:57 +1000)]
Convert archiver.py and follow.py to API 1.1
This is mostly done. I've not yet decided on a tidy way to re-implement
the API limit tests, since this has changed significantly between API
versions 1.0 and 1.1.
Further, as I understand it, API 1.1 requires OAuth for everything, but
it is still an optional command argument which is off by default. This
should be fairly trivial to fix, but I've not yet done so.
Mike Verdone [Fri, 21 Jun 2013 20:08:23 +0000 (13:08 -0700)]
Merge pull request #160 from nicksloan/master
Fixes sixohsix/twitter#154: application-only authentication with oauth2 support
Tested on Python 2.7 and Python 3.3.
Mike Verdone [Thu, 18 Apr 2013 15:12:11 +0000 (08:12 -0700)]
Merge pull request #138 from DarkDefender/master
Fix twitter stream under python 3
This fixes the "AttributeError: '_io.BufferedReader' object has no attribute '_sock'" error when trying to create a twitter stream with python3.
There are two issues (#70 and #108) that are fixed with this commit.
I have written a twitter script in python 3 with your twitter lib so I would be really glad if you can merge this and then create a new point release of the lib.
That way I can release the script without having to tell the users to manually patch the twitter lib.
I'm sorry if you've already read my reply to #108. But I would really like this to get merged and released ASAP.
Mike Verdone [Mon, 25 Mar 2013 19:37:48 +0000 (12:37 -0700)]
Merge pull request #134 from pykler/master
Being safe looking for content-encoding header
I managed to bump into a situation with stream.twitter.com where the content encoding header was not in the headers ... so I am making the check more robust to handle the case where the header is not there.
@sixohsix, can you consider merging this in before the micro version bump?
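The kind of defensive check described here might look like the following sketch. The helper name `is_gzipped` is hypothetical, and `headers` is assumed to be any mapping-like object with a `get` method, such as the message object returned by a response's `info()`.

```python
from email.message import Message


def is_gzipped(headers):
    """Return True only if a Content-Encoding header is present and is gzip."""
    # .get() with a default avoids an error when the header is missing.
    return headers.get("Content-Encoding", "").strip().lower() == "gzip"


with_header = Message()
with_header["Content-Encoding"] = "gzip"
without_header = Message()  # the header is absent entirely
```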
Mike Verdone [Sun, 24 Mar 2013 17:46:16 +0000 (10:46 -0700)]
Merge pull request #125 from MrMitch/syntax-highlighting
Make code blocks use syntax highlighting in README
Make code blocks use syntax highlighting so they are more eye-friendly
Also turn URLs into `<a></a>` markup
Mike Verdone [Sun, 24 Mar 2013 17:23:11 +0000 (10:23 -0700)]
Merge pull request #131 from pykler/master
With Twitter 1.1 an invalid oauth raises httplib.IncompleteRead
An exception is happening in the TwitterHTTPError's __init__ when it is trying to read twitter's error message. This patch catches that error (IncompleteRead Error) and handles it.
Also included in this pull request is a test case to demonstrate this error as well as a runner.py file to help run all the tests.
Mike Verdone [Fri, 15 Feb 2013 15:10:28 +0000 (07:10 -0800)]
Merge pull request #124 from Adapptor/incompleteread
TwitterCall._handle_response(): try to recover from httplib.IncompleteRead
Use IncompleteRead.partial in the hope that what's there is actually complete.
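A minimal sketch of that recovery, assuming Python 3 where `httplib` is named `http.client`. The function name `read_json_body` is hypothetical, and `response` is assumed to be any object with a `read()` method.

```python
import json
from http.client import IncompleteRead


def read_json_body(response):
    """Read and decode a JSON body, falling back to a partial read.

    If the connection closes early, IncompleteRead.partial holds the
    bytes received so far; they are used in the hope that they form a
    complete document. json.loads still raises if they do not.
    """
    try:
        data = response.read()
    except IncompleteRead as exc:
        data = exc.partial
    return json.loads(data.decode("utf-8"))
```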
Mike Verdone [Fri, 15 Feb 2013 14:55:12 +0000 (06:55 -0800)]
Merge pull request #122 from patricksmith/update-readme
Update README example code
Updates the examples in the README because of changes to the Twitter API. The changes were basically copied from a similar commit to the API docs: https://github.com/sixohsix/twitter/commit/58ccea4e1489a735d2b01bcdd45677b2c4374f00.
Mike Verdone [Sat, 2 Feb 2013 13:57:42 +0000 (05:57 -0800)]
Merge pull request #118 from DracoThuban/patch-1
Update twitter/follow.py
Sometimes Twitter returns a list containing user IDs that have no matching user.
Users blocked or suspended? I don't know.
This modification prevents the script from stopping and returning an incomplete list.