25 August
Search engine privacy (conclusion)

In concluding this series, it is time to revisit a question we have glossed over: to what extent can the privacy features built into a web browser help improve privacy on the web? To recap, the previous three posts discussed HTTP state management (a glorified phrase for "cookies"), the cookie management functionality built into mainstream browsers, and one way to use that functionality to limit the lifetime of cookies. Now for the caveat emptor.

1. Even when search engine cookies are downgraded, other unique identifiers may exist which are consistent across multiple sessions. These are associated with services that require authentication, such as email, which exist in the same domain as the search engine. Given that 99% of authentication on the web uses cookies, it is almost guaranteed that cookies are written after signing in. These cookies may be visible to the search engine, providing an independent unique ID that remains constant over time. This brings up a recommendation made by EFF's Seth Schoen: do not log into search engine services when searching.

2. Cookies are not the only way to uniquely identify users, even though they are the "official" state management system in HTTP. There are a lot of moving pieces involved in web browsing and any one of them can create opportunities for tracking. Several well-known tricks exist to use the browser cache, history or other client-side state to correlate requests over time as belonging to the same user.

3. Even if the application layer itself could be purged of all unique identifiers, there is a fundamental problem in the networking layer. The design of the Internet requires that two communicating parties know each other's address in the network, known as the IP address, even when the communication passes through many hops along the way. (This is far from being an absolute engineering requirement.
Routing could still work if every node along the path remembered only its previous neighbour, without knowing about upstream nodes.) An IP address is at first blush a meaningless identifier like 65.54.153.237, which corresponds to spaces.live.com, but the combination of IP address and timestamp is often uniquely identifying. Looking in from the outside, it is impossible to know whether this information is used to correlate requests beyond what is possible with cookies alone.

cemp

14 August
Search engine privacy and cookie settings (part III)

The last posting discussed the idea of downgrading cookies and how it balances privacy with functionality. The next step is to configure a web browser to use this feature.

In Firefox this is easy: under Tools --> Options, on the Privacy tab, selecting "keep cookies for session only" applies the setting to all websites. Alternatively the site-specific settings can be used to specify that cookies from search engines are bounded in lifetime by the session itself. This latter, targeted fix can be necessary. Converting all cookies into session cookies does not break functionality as long as the browser is open. In other words, within one session everything appears normal. But long-term site customizations and "remember me" type behavior will not work correctly.

In Internet Explorer, activating the downgrade functionality is more difficult. IE does not expose a user interface to select this option, even though some of the built-in settings use it based on the P3P policy associated with the cookie. Instead a custom XML policy must be imported, using the Tools --> Internet Options menu item, selecting the Privacy tab and clicking the "Import" button to choose a file on the local computer. The XML language for the cookie settings is specified on this MSDN page. A quick web search turned up no ready-made policy to downgrade all cookies, which suggests that creating one for reference may be useful. (Stay tuned for an update to this problem on RandomOracle.)
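
For readers who prefer code to prose, here is a rough sketch of the two behaviors these posts describe: Firefox's session-only setting, which keeps every cookie until the session ends, and the IE6 downgrade, which discards a cookie at its own expiration or at session end, whichever comes first. The `Cookie` class and both function names are made up for illustration; neither browser exposes such an API, and times are simply seconds counted from the moment the cookie is set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cookie:
    """Hypothetical stand-in for a stored cookie."""
    name: str
    # Seconds after being set at which the site asked the cookie to expire;
    # None means the site set a plain session cookie.
    expires_at: Optional[float]


def discard_time_session_only(cookie: Cookie, session_end: float) -> float:
    """Session-only setting as described for Firefox: keep until session end."""
    return session_end


def discard_time_downgraded(cookie: Cookie, session_end: float) -> float:
    """Downgrade as described for IE6: expiration or session end, whichever is first."""
    if cookie.expires_at is None:
        return session_end
    return min(cookie.expires_at, session_end)


if __name__ == "__main__":
    short_lived = Cookie("id", expires_at=10.0)   # persistent, good for 10 seconds
    one_hour_session = 3600.0
    print(discard_time_session_only(short_lived, one_hour_session))
    print(discard_time_downgraded(short_lived, one_hour_session))
```

The two only disagree for persistent cookies whose requested lifetime is shorter than the session; for the years-long cookies typical of search engines, both discard at session end.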

Before moving on to the question of just how much privacy is gained this way, a few words on the technical details. You might be wondering why IE6 uses the shorter of the session lifetime and the actual expiration. The reason was to make the downgrade feature more transparent to websites, in order to help compatibility. A good way to view this problem is to ask whether a website can detect that a client has downgraded a persistent cookie that the website attempted to set. Why is this question interesting? Because if we can answer "no" then we can be confident that downgrading will not have any impact on functionality; a website which cannot detect this choice cannot experience any adverse effects from it.

For reference, detecting that a cookie has been rejected is trivial: in fact many websites verify whether cookies are enabled by attempting to set one and checking whether it is replayed. Some websites will not allow users to proceed if that step fails. By contrast, detecting a downgraded cookie in IE6 is nearly impossible during that session: a persistent cookie that was accepted will be replayed, and so will a persistent cookie which got downgraded. Observing the replay, the site cannot deduce which one happened on the client side. This is precisely the transparency required to guarantee compatibility. But the Firefox implementation can create one difference that allows distinguishing the outcomes: the lifetime of a short-lived persistent cookie will be extended for the session. Imagine setting a persistent cookie that was only good for 10 seconds-- if this cookie is still being replayed after 10 seconds, the website can conclude it has been converted to a session cookie.

Caveat emptor: depending on implementation, there are limits on this compatibility guarantee. For example a website that wanted to check if a cookie got downgraded by IE could force a download in a new session. (Challenge: how?)
This is more a limitation of the implementation; sharing downgraded cookies and reference-counting them by each session would solve the problem.

cemp

13 August
Search engine privacy and cookies (part II)

Both IE and Firefox allow users to disable cookies selectively based on the website. For Internet Explorer versions 6 and above, this functionality is exposed via the Tools --> Internet Options menu, under the Privacy tab. Clicking on the "Sites" button brings up a new dialog box where you can view and edit the cookie policy for each site. One subtlety that may not be obvious in this UI is that the site names listed are in fact wildcards. For example adding "foo.com" to the block list in fact blocks all subdomains such as "bar.foo.com." Firefox supports similar behavior in its own Privacy tab, also accessed from the Tools --> Options menu. Under the Privacy tab, there is a button called "Exceptions" which defines site-specific rules. One difference is that the Firefox UI distinguishes between blocking an entire site including all subdomains (as in *.foo.com) and blocking only the top-level DNS name.

In principle one can add the search engine websites to the block list to disable cookies inside a limited scope, without impacting the functionality of other sites. That is a crude solution because these cookies are typically set in the second-level domain, which means that other services offered in other subdomains will also stop functioning.

This is where a useful feature first introduced in IE6, called "downgrading," comes in handy. (Full disclosure: this blogger wrote most of the privacy and cookie management code in IE6.) In order to explain why it is useful for privacy, we have to revisit the two types of cookies. The first type is persistent, meaning that the cookie has a precise expiration date. This cookie is replayed until that time is reached, which can be-- and often is-- years into the future. The second type is a session cookie, which is discarded when the web browsing session ends.
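
On the wire the difference between the two types is small: a persistent cookie arrives in a Set-Cookie header carrying an Expires or Max-Age attribute, a session cookie carries neither. A minimal sketch using Python's standard `http.cookies` module; the function name `cookie_kinds` and the sample header values are invented for illustration.

```python
from http.cookies import SimpleCookie


def cookie_kinds(set_cookie_value: str) -> dict:
    """Label each cookie in a Set-Cookie header value as 'persistent' or 'session'."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    kinds = {}
    for name, morsel in jar.items():
        # An explicit lifetime (Expires or Max-Age) is what makes a cookie persistent.
        has_lifetime = bool(morsel["expires"]) or bool(morsel["max-age"])
        kinds[name] = "persistent" if has_lifetime else "session"
    return kinds


if __name__ == "__main__":
    # Hypothetical headers: a two-year identifier and a plain session cookie.
    print(cookie_kinds("id=abc123; Max-Age=63072000; Path=/"))
    print(cookie_kinds("sid=xyz; Path=/"))
```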
(Unfortunately the notion of a "session" is vague and dependent on the application. For example, when there are multiple windows open they may be part of the same session or different sessions.) Downgrading a persistent cookie means that it will be discarded either when the expiration time is reached or when the session ends, whichever occurs first. The "whichever occurs first" clause is a bit of a technicality which makes the feature more effective than simply converting the cookie into a session cookie. Incidentally it is also where IE and Firefox differ: the "allow for session" feature in Firefox only discards the cookie when the session ends, ignoring its original expiration time.

Downgrading cookies is a middle ground between functionality and privacy. Temporary identification by replaying cookies is required to get any functionality out of the web. But continuing to replay the same cookie over a long period of time allows the site to build a comprehensive profile by correlating different requests and associating them with the same user. The solution is to accept and replay cookies but only for a limited time; after that short time passes the cookie is discarded and the user is back to square one, "unidentified" as far as the website is concerned. (A disclaimer is in order here: strictly speaking, there are ways beyond cookies to identify users, but they are less reliable and far less common.) The length of a session turns out to be a natural way to define that time window.

(continued)

cemp

10 August
Search engine privacy in the wake of AOL (part I)

The disclosure of search query data for 650K users from AOL serves as a wake-up call for the privacy community. The original copy has been taken down from the official AOL site, but the disclosed data is now available online for others to search. In principle the data is anonymized, in the sense that users are identified by a numeric ID instead of their real subscriber identity.
That is no consolation-- often the query stream itself is sufficient to give clues to the identity of the user, as Declan's original CNet article points out by looking at the history of all search terms typed in by one user. Once again this proves the difficulty of pseudonymizing data. When enough secondary data is attached to a seemingly random pseudonym-- as the AOL user ID appears to be in this case-- the result is far from privacy enhancing.

What about users that only issued a handful of queries, not enough to create a uniquely identifying profile? Their prospects are better for sure, but even their privacy is not completely protected if AOL can translate the user ID back into a real-world subscriber identity. Even having the combination of IP address + timestamp for one of those queries is enough. Those 2 pieces of information are sufficient for an ISP to identify the subscriber if he/she is using a computer at home with a network connection associated with their name. That will be the case for the majority of homes with a dial-up or broadband connection. And this is a good thing for accountability, because it allows law enforcement to map online identities back to individuals: knowing an IP address and the time when a request, such as a search query, from that IP address was observed, they can subpoena the ISP for information about that user. On the other hand it is bad news for privacy, because search engines colluding with ISPs can accomplish the same objective for commercial motives far less admirable than law enforcement.

Putting aside such conspiracy theories (unless the search engine is the ISP, of course, as in the AOL case), there are simple steps users can take to reduce the amount of information available to search engines. Emphasis on simple: that excludes installing Tor, using anonymizing proxies, wearing a tinfoil hat (this MIT page discusses effective designs) etc.
Tweaking browser settings can be that first step that works on every PC and does not require expending significant resources.

As expected cookies are the culprit: they allow search engines to recognize users across different sessions. When a new unidentified user arrives at a search engine, the website writes a unique identifier into a cookie-- similar to the identifiers which appear in the AOL data set. On subsequent search queries, the web browser announces that ID to the search engine, allowing the engine to deduce that the user searching for "aspirin" right now is the same one that searched for "absinthe recipes" yesterday.

Cookies have been much maligned by the privacy community but the hostility is often unwarranted. Completely disabling cookie functionality, as some recommend, renders the web largely unusable. Shopping carts at ecommerce merchants would stop working, anything requiring authentication would likely break and all types of website customization would disappear. There are less radical work-arounds which can balance the need for privacy with getting maximum value out of the web.

(Continued tomorrow.)

cemp

DefCon 14 observations

What is going on here-- DefCon is back on the strip? After getting booted from virtually every respectable establishment in Las Vegas and being consigned to the Alexis Park near McCarran airport for the past couple of years, DC was hosted at the Riviera this year, August 4-6th. According to Jeff Moss the search for a new venue had been going on for the past three years, since the conference began running into space limitations. For the first time attendees could focus on listening to the talks inside air-conditioned hotel space instead of a makeshift tent baking in the desert heat.

Prediction: considering how badly and consistently the Alexis Park gets trashed every year, after just one experience the Riviera management will wake up, smell the coffee and say "never again," joining a distinguished lineup of other major venues who earlier reached the same remarkable conclusion, namely that an underage crowd of aspiring script-kiddies who can't (legally) gamble or purchase alcohol is not exactly a prime audience. Actual observation: very little damage was inflicted.

More than 6000 people attended, counting the BlackHat block which immediately precedes DefCon. This exceeded the number of battery-powered blinking-blue-LED badges designed by Joe Grand. The usual crowding occurred, with some of the rooms filled to capacity and enthusiastic audience members standing outside the door angling for a peek at the slides. In particular this happened with the Google/search-engine privacy presentation on Friday, which was remarkably well timed in light of the AOL search-results disclosure that broke on Monday. The only visible glitch was the program starting 2 hours late on Friday, which in itself is remarkable considering that this is the first year for the organizers at the new location.

cemp

24 July
OpenSSL and FIPS

Interesting story unfolding on the open-source front. The cryptographic library OpenSSL made headlines in January when it achieved FIPS compliance. FIPS stands for "Federal Information Processing Standards"; the relevant standard, FIPS 140, specifies security requirements for implementations of cryptographic modules. Getting FIPS 140 compliance is generally a requirement to compete in government and defense markets. OpenSSL achieved level 2, which is good for use with sensitive but not classified data. Until that point only commercial offerings had achieved FIPS certification, which generally involves submitting code and documentation to a third-party auditor/testing lab.
(Full disclosure: this blogger has worked on the Windows cryptography code base and its successor CNG in Vista.)

Last week the National Institute of Standards and Technology (NIST), which oversees the FIPS standards, revoked the OpenSSL certification. There is a lot of controversy over that apparent back-pedalling (inevitable Slashdot discussion) but one thing is certain: it will cause frustration for customers who already deployed the app or used the associated library for building their own applications. (There is no mention of these developments on the official OpenSSL website, where users can still access different flavors for download. It is also included in the popular kit Cygwin.) According to the article from GCN, the Open Source Software Institute which sponsored the evaluation is likely to continue the efforts.

cemp

22 July
Mobile computing environments in infancy

The promise of easily roaming your computing environment (applications, settings, data and all) without carrying around expensive hardware is still a long way off from crossing the chasm. Another recent entry into this space is the Bio-Computer-On-a-Stick, available from ThinkGeek. It is a USB-based device that combines biometric authentication with a compact Linux distribution. The website for the manufacturer FingerGear has other variants including simple USB drives and the computer-on-a-stick without biometric authentication capability.

Individually none of the specs are impressive. There is only 512MB of space, when USB flash drives from most manufacturers these days are pushing the 4GB limit at a cost under $50/GB. Powering the device is a Linux 2.6.x kernel with a Gnome front-end. Applications include Firefox, the OpenOffice suite (stressing the MSFT Office compatibility), a PDF viewer, standard Unix tools such as ssh for remote connectivity and the open source instant messaging client GAIM.
According to the web page, it can boot directly from USB on most computers, or you can use the included CD to get started. Boot times claimed are less than 40 seconds.

Perhaps the most distinctive feature is largely a distraction: there is an integrated fingerprint reader which can be configured to become the gatekeeper for access to the device. The website does not make any mention of tamper-resistance or FIPS compliance. Without additional measures to block access to the underlying storage, the fingerprint reader will only stop the casual attacker. Equally controversial will be this security claim from the website: " This can be misleading. When the device is plugged into a host PC, it is still very much at the mercy of that PC. A rogue host can still copy everything off the device or feed the device bogus data. That Word document the user was editing can get corrupted in memory before being saved back into the data segment. In practice this is going to be difficult because the host OS is not used; malware must embed itself in the BIOS or perhaps in a hypervisor that takes control before the ordinary boot sequence from the USB drive starts. (Judging by the specs, the OS itself is on write-protected non-flash memory, which ensures that at least it cannot be tampered with even by an untrusted host.)

There is also the obvious problem that you cannot roam the OS of your choosing-- you are stuck with what is available on the distribution itself. Flexibility this is not. For all these problems, this is still a promising start towards mobile computing made practical.

cemp

Post-script: the ThinkGeek website says they are sold out due to the popularity of the item.

08 July
Visualizing a social network

Combining some Perl scripts with the Graphviz package, here is a visualization of a fragment of MSN Spaces around a colleague on the Windows Live Identity team.

cemp

02 July
RFID security concerns: scanners everywhere, darkly

(Continuing an earlier post on the AmEx Blue cards with RFID chips.)
Perhaps the biggest problem with RFID tags is the lack of a consent element. Existing forms of identification require an affirmative step by the individual before he/she can be identified. It could be as old school as taking out a driver's license from your wallet or inserting the bank card into the ATM. Because RFID works at a distance-- and the preferred term these days is "contactless card" to emphasize that-- the tag can be read surreptitiously. All that is required is a transceiver operating on the right frequency; by design the tag broadcasts its identity to any source willing to inquire. (Kim Cameron described it as a "beacon" broadcasting identity in an omnidirectional way, as part of his critique of existing identity systems in the influential paper The laws of identity.)

That lack of user control is the basis for many of the nightmare scenarios critics of RFID like to spin. Smart billboards could select an advertisement based on scanning active RFID tags on objects you are carrying. Retail stores could immediately profile a customer the moment she walks in the door, based on scanning objects she is carrying. This is either a marketing dream or a privacy invasion, depending on the point of view. There are also darker predictions of high-tech criminals wandering around with an RFID scanner, searching for tags to copy information from.

It is no comfort that the intended reading range is a few inches. That is for normal usage, not an adversarial world where criminals have motivation to scan the identifier on the tag against the user's own interest. For example the next generation of US passports have RFID chips. The claim that they had a very limited reading range led to an interesting showdown at CFP 2005 in Seattle, between the State Department's Frank Moss and ACLU's Barry Steinhardt. Speaking on the same panel, Moss fired the first shot by insisting that the passports could not be read at distances greater than 4 inches.
Steinhardt countered later with a live demonstration showing that it could in fact be read at a distance of about one yard using home-brew electronics. Clearly this is a pure technology race: more advanced readers will achieve greater range, depending on the transmission power.

Applying the same logic to credit cards, the question becomes: what prevents unauthorized scanning of information on the card, and cloning it for use in unauthorized purchases? Annalee Newitz has written extensively about the dangers of RFID, including a piece in Wired. The subject is close to her heart, or actually in this case close to her arm-- she has an implanted RFID chip from Verichip. This application is unusual in humans, although it has been standard fare for identifying pets. One driving scenario is controlling access to restricted areas. Instead of carrying cards, typing in codes or staring into a camera for retinal scans, users will wave their arm at the reader in order to unlock the door, this story goes. Newitz's story shows that the cloning risks are very real in this case: with a little help from the subjects interviewed for the article, she was able to clone the RFID tag in her arm.

cp

04 May
CFP2006 first day

Quick impressions from the Computers, Freedom & Privacy 2006 conference, in Washington DC. My colleague Mike Hintze, senior attorney with MSFT, presented on an interesting panel yesterday about federal privacy regulation. He pointed out that the current patchwork of state regulation leads to confusion, and that the lack of consistent assurance in data protection is starting to erode customer confidence. New York Times author Eric Lichtblau, winner of the 2006 Pulitzer prize, appeared on a panel yesterday to discuss his experiences reporting on controversial current events.

cemp

12 April
On the virtualization front

Many developments in this space.

1. A couple of months ago VMware announced the availability of the VMware Player as a free download.
As the name suggests, this app allows running existing virtual machines but not creating new ones. Upping the ante, the Player is compatible with Microsoft Virtual PC and Virtual Server images. For the first time, it is possible to completely package a machine image and share it with a friend without worrying about whether they have the necessary software installed to run it.

2. The drive towards zero pricing continues: more recently, Microsoft announced that Virtual Server R2 is now a free download, including the x64 version. VMware likewise offers VMware Server, which replaces the GSX SKU, for free starting with the beta that boasts "experimental support for Intel VT." (Virtual PC was already available as a 45-day free download.)

3. Last week Apple's Boot Camp technology-- which allows dual booting between Mac OS and Windows-- received all the attention. Lost in that commotion was the announcement of a virtualization solution for Macs from Parallels. It supports Windows and Linux as guests.

cemp