Thursday, November 11, 2010

Google Releases Data Cleanser

Google has updated and re-released open-source software for cleaning, analyzing and transforming data sets, now called Google Refine.
The software, originally called Freebase Gridworks, came with Metaweb, a company Google purchased in July.
Google Refine is a collection of tools that could come in handy when wrangling useful information from data sets, particularly ones that contain inconsistencies.
This desktop application can, for instance, find all the variant spellings of a word in a data set and replace them with the appropriate term. This process, called normalization, is nothing new. But normalizing data usually requires writing code that is specific to one data set, noted Christopher Groskopf, a developer for the Chicago Tribune.
"The genius of Gridworks is that it is generic enough to work for a wide variety of data sets without the need to write any code at all. Even better the resulting operations are portable, so the process used to clean up 2009's data can be repeated for 2010," Groskopf wrote in a blog post.
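To make the idea of normalization concrete, here is a minimal Python sketch of the kind of one-off code Groskopf describes. It is not how Google Refine works internally; the function names `fingerprint` and `normalize` and the sample city list are hypothetical, and the matching rule (lowercase, drop punctuation, sort the unique words) is just one simple assumption about what counts as a variant spelling.

```python
import re
from collections import Counter, defaultdict

def fingerprint(value):
    """Build a crude matching key: lowercase, drop punctuation, sort unique words."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))

def normalize(values):
    """Replace each value with the most common spelling among values sharing its key."""
    groups = defaultdict(Counter)
    for v in values:
        groups[fingerprint(v)][v] += 1
    # For each key, pick the spelling that occurs most often as the canonical form.
    canonical = {key: counts.most_common(1)[0][0] for key, counts in groups.items()}
    return [canonical[fingerprint(v)] for v in values]

# Hypothetical sample data with inconsistent spellings of the same city.
cities = ["Chicago", "chicago", "Chicago ", "CHICAGO.", "Chicago", "Boston"]
cleaned = normalize(cities)
print(cleaned)
```

A sketch like this handles only case, spacing and punctuation differences; genuine misspellings would need a fuzzier comparison, which is where a generic tool saves the most effort.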
The software contains a number of other tools as well. It includes an expression language that can be used to analyze a set of data. Filters can be used to isolate subsets of data, which then can be analyzed or changed through a set of transform commands.
The software works with plain text files, the data in which can be split into different columns by the use of commas. Results can be exported back out in the JSON (JavaScript Object Notation) format, which can then be easily transformed into HTML tables or other formats.
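The round trip from comma-separated text to JSON can be illustrated with a short Python sketch using only the standard library. This is an illustration of the two formats, not Google Refine's own import or export code; the function name `csv_to_json` and the sample rows are hypothetical.

```python
import csv
import io
import json

def csv_to_json(csv_text):
    """Parse comma-separated text with a header row and return it as a JSON string."""
    # DictReader uses the first line as column names and yields one dict per row.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

# Hypothetical sample input: a header line followed by two records.
sample = "name,city\nAda,Chicago\nGrace,Boston\n"
result = csv_to_json(sample)
print(result)
```

Each row becomes a JSON object keyed by column name, a shape that is straightforward to render as an HTML table.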
The software can work with up to a few hundred thousand rows per data set, depending on the user's computer memory. And unlike most spreadsheet software, this software can interactively transform large subsets of data, the company asserted.
Google said this week that it has added several new features to the software, officially called Google Refine 2.0, including the ability to link records to other databases, and a number of new transformation commands and expressions.
The non-profit government watchdog organization ProPublica has used this software to aggregate data from seven different data sets to show how pharmaceutical companies pay doctors to recommend certain medications.

Tuesday, November 9, 2010

Facebook finds loophole in Google's ban, rivalry escalates

Google is all about transparency, unless Facebook is involved. Google recently decided to cut off Facebook’s access to its Gmail clients, citing the social networking site’s refusal to share its own information. Previous to this ban, Facebook users were able to instantly add Gmail contacts on Facebook to their Friends, a service no longer provided.
Add to this a recent partnership allowing rival search engine Bing to include Facebook profiles in its search results, and you can imagine there’s some tension between Google and Facebook.
But if Facebook is anything, it’s resourceful. Today the site rolled out a link under the “Find Friends” tab, allowing users to directly download their Gmail contacts. Under the tab, there’s an option labeled “Other Tools” where you can choose “Other Email Service.” Go ahead and enter your Gmail address, only to be told there are no results – but you’ll also see a link that lets you download your Gmail contacts as a CSV file. You can then upload this file to Facebook with one click – and voila, all your Gmail contacts are available on Facebook.

It looks like the workaround is staying put. “We’re disappointed that Facebook didn’t invest their time in making it possible for their users to get their contacts out of Facebook. As passionate believers that people should be able to control the data they create, we will continue to allow our users to export their Google contacts,” a spokeswoman for the company commented.
Facebook declined to make an official comment, but platform engineer Mike Vernal from the company wrote on TechCrunch today, criticizing Google for saying one thing and doing another. He accuses the company of “being open when its [sic] convenient,” and says, “E-mail is different from social networking because in an e-mail application, each person maintains and owns their own address book, whereas in a social network your friends maintain their information and you just maintain a list of friends. Because of this, we think it makes sense for e-mail applications to export e-mail addresses and for social networks to export friend lists.”
Vernal closes by saying, “We strongly hope that Google turns back on their API and doesn’t come up with yet another excuse to prevent their users from leaving Google products to use ones they like better instead.”
While users will remain able to upload their Gmail contacts (an issue very few people actually seem concerned with), look out for what promises to continue to be an escalating rift between the two Internet giants. Get ready for a host of little more than snide press releases and catty insults.

Sunday, November 7, 2010

BP to sign exploration deal with Chinese company: report

LONDON (AFP) – British oil giant BP is set to sign a major exploration deal with China's biggest offshore oil and gas producer when Prime Minister David Cameron visits Beijing next week, a report said on Saturday.
Sky News said BP was hoping to conclude an agreement with state-owned China National Offshore Oil Corporation (CNOOC) to explore an area of the South China Sea, where the two companies have already worked together.
A BP spokesman refused to comment on the report.
Cameron, accompanied by four of his top ministers, starts a two-day visit to China on Tuesday during which he will meet President Hu Jintao and Prime Minister Wen Jiabao for talks which are expected to focus partly on energy issues.
BP is seeking to rebuild its international reputation after the disastrous oil spill from a well it operated in the Gulf of Mexico.
The company said this month the spill will cost it 40 billion dollars after ramping up its estimate by almost a quarter.

Saturday, November 6, 2010

Cable subscribers flee, but is Internet to blame?

NEW YORK – TV subscribers have been ditching their cable companies at an ever faster rate over the past few months, and many of them aren't signing up with a satellite or phone competitor instead.
Their willingness to simply go without pay television could be a sign that Internet TV services such as Netflix and Hulu are finally starting to entice people to cancel cable, though company executives say the weak economy and housing market are to blame.
Third-quarter results reported this week by major cable and satellite TV companies show major losses, but don't settle the question of what's causing them.
If "cord-cutting" in favor of Internet video is finally taking hold, that has wide-ranging implications. Consumers who use the Internet to get their movies and TV shows bypass not just the cable companies, but the cable networks that produce the content. The move could have the same disruptive effect on the TV and movie industries as digital downloads have already had on music.
A few weeks ago, the CEO of phone company Verizon Communications Inc. likened cord-cutting to what started happening to the local-phone companies five or six years ago, when people started giving up their landlines in favor of relying solely on their cell phones.
"The first thing when that happens is you deny it," Ivan Seidenberg said. "I know the drill. I have been there."
On Thursday, Time Warner Cable Inc.'s chief operating officer, Landel Hobbs, said the company doesn't see evidence of people dropping cable in favor of the Internet. He said the biggest subscriber losses have been among people who don't have cable broadband services; high-speed Internet — from cable or a competitor — is key to watching video online. These people seem to be going to satellite or giving up on pay TV entirely.
On the theory that college students might be among the first to drop cable TV, the company looked at changes in subscriber figures in college towns such as Austin, Texas, and Columbus, Ohio. They weren't out of line with previous years, and they corresponded to the level of student enrollment, he said.
"We'll continue to monitor cord-cutting, but haven't found evidence where you might expect to see it," Hobbs told analysts on a conference call.
Time Warner Cable lost 155,000 video subscribers in the July-September quarter, compared with 64,000 a year ago.
The only larger cable company, Comcast Corp., reported last week that its subscriber loss more than doubled in the third quarter, to 275,000. Comcast said many of those leaving had taken advantage of low introductory rates that the company offered last year when the analog TV broadcast network was shut down.
Of the satellite companies, DirecTV gained subscribers and Dish Network Corp. lost them. On a conference call Friday, Dish CEO Charlie Ergen said the Internet was making itself noticed as a competitor.
"You know, my kids think I'm crazy for being in the pay-TV business because they don't pay for TV. They don't pay for movies," Ergen told analysts.
The country's eight largest publicly traded pay-TV companies, representing about 85 percent of the subscriber total, had reported their results for the third quarter by Friday. These cable, phone and satellite companies showed a combined gain of 66,700 video subscribers, or a 0.3 percent increase at an annualized rate, about a third the growth of the population.
The figure was a slight recovery from the seasonally weak second quarter, when they gained just 12,400 subscribers. But it's far short of the 401,300 subscribers gained a year ago.
Missing from the tally is the third-largest cable company, Cox Communications, which is privately held and doesn't report subscriber counts publicly. If it lost cable subscribers at the same rate as Comcast and Time Warner Cable, the nine largest pay-TV companies had zero net gain for the latest quarter and lost subscribers in the second.
Cable companies have been losing video subscribers for some time, but they have been compensating by upgrading basic subscribers to more expensive digital tiers, as well as adding broadband and phone subscribers.
However, both Time Warner Cable and Cablevision Systems Corp. lost digital video subscribers in the third quarter. Both added a record-low number of phone subscribers, as years of growth are coming to an end.
Meanwhile, Netflix Inc.'s streaming service has become so popular that it is now the largest source of U.S. Internet traffic during peak evening hours, according to Sandvine Inc., a Canadian company that supplies traffic-management equipment to Internet service providers.
A variety of gadgets can send Netflix's streams to the living room TV, including game consoles and the $99 Apple TV box. Many high-end TVs now come with the built-in ability to play Internet content.
Thomas Clancy Jr., 35, in Long Beach, N.Y., canceled the family's Cablevision subscription this spring. He said he has been happy with Netflix and other Internet video services since then, even though there isn't a lot of live sports to be had online.
"The amount of sports that I watched certainly didn't justify a hundred-dollar-a-month expense for all this stuff. I mean, that's twelve hundred dollars a year," Clancy said. "Twelve hundred dollars is ... near a vacation."
But Clancy — who has no relation to the thriller writer — is also an example of the hurdles cord cutters face. He uses an Internet-connected Blu-ray player to get Netflix movies to the TV. And he pulls a cable from his computer to the TV for Internet content Netflix doesn't have. Clancy owns a computer consulting firm and is tech-savvy enough to do all that. Most people wouldn't know how.
Cablevision wanted to raise Clancy's Internet bill when he canceled TV service. That would have made cord-cutting less attractive, but he happens to live in an area where Verizon provides Internet service at speeds that are comparable with the best cable has to offer. He got a better deal from Verizon and switched to that provider.
Most people who have the technological skills to take advantage of Internet video find that the selection of movies and shows isn't broad enough to make the jump worth it, Sanford Bernstein analyst Craig Moffett said.
On the other hand, poor people have an excellent motive to cut cable and simply replace it with an antenna or nothing at all, he said.
"The price of cable TV has risen to the point where it's simply not affordable to lots of lower-income homes. And right now there are an awful lot of lower-income homes," Moffett said. "The evidence suggests that what we're seeing is a poverty problem rather than a technology phenomenon."
In addition, high unemployment means fewer new households, as kids are probably delaying moving out of their parents' houses, or people move in with roommates. That can reduce the number of households that pay for TV.
Cable companies would like to get low-income customers back with cheaper cable packages, but their hands are tied. Content providers such as The Walt Disney Co. and News Corp. won't license their channels one by one, so subscribers have to take big, expensive channel packages, or very basic ones, which offer little beyond what's available with an antenna.
Content providers now get billions of dollars in fees from cable service providers, and they want to make sure that whatever new industry model comes along, they'll get paid. It's not obvious yet that Internet video will let them sustain their profit levels.
Six companies create the content that consumes 85 percent of U.S. viewing hours, Moffett said. "Until they get on board, the train's not leaving the station."

Wednesday, November 3, 2010

Perfect storm for GOP: Obama base stays home, white voters defect

Like converging thunderstorms, two distinct trends collided Tuesday night to power the Republican Party to the largest midterm gains for either party since 1938.
The portions of the electorate that remained loyal to President Obama and Democrats - particularly minority voters and young people - did not show up in anywhere near the numbers they did in 2008. And among the voters who did show up, Democratic candidates suffered crippling defections among white voters, particularly independents, seniors, and those without a college education, according to the national network exit poll of House elections.
The long-anticipated enthusiasm gap manifested itself in force Tuesday: The exit poll found that Republicans equaled Democrats as a share of the electorate; just two years ago, Democrats outnumbered Republicans as a share of all voters by 40 to 33 percent. That shift reflected the declining participation of some of the Democrats' best groups and a surge among those favoring Republicans.

Young people, who cast 18 percent of the ballots in 2008, dropped to just 11 percent. That was a slightly larger falloff than is typical in midterm elections. Likewise, the falloff between the minority share of the vote in 2008 and Tuesday night was the largest decline between a presidential and the subsequent midterm election in at least the past two decades. Two years ago, minorities cast 26 percent of all ballots in the presidential election; this year that number fell to 22 percent. Both groups largely stuck with Democrats - but their impact was severely diluted by their declining turnout.
Meanwhile, seniors, who represented one-sixth of voters in 2008, soared to fully 22 percent - their largest share since at least 1992. And nearly three-fifths of them backed Republican House candidates. Among white seniors, that number rose to over three-fifths.

Overall, the national exit poll measuring preferences in House races put the Republican vote among whites at a jaw-dropping 60 percent, up sharply from 53 percent in 2008. Democratic candidates attracted only about 35 percent of the vote among white men and women without a college education and college-educated white men. Following patterns evident in Obama's approval rating, the only segment of the white electorate that didn't collapse for Democrats was college-educated white women. But even they tilted slightly toward the GOP.
The stampede toward the GOP among blue-collar whites was powerful almost everywhere. In heartland states such as Arkansas, Ohio, Indiana, and even Illinois, Democrats were routed among college-educated whites, too, the exit polls found. But along the coasts - in such states as Delaware, California, and Connecticut - Democrats did a better job of holding college whites, especially women. That was critical to their Senate victories in those states. The exception to that coastal pattern was in Pennsylvania. Republican Pat Toomey attracted more of those suburban voters and, crucially, remained competitive in the Philadelphia suburbs (which two years ago gave Obama a crushing margin of nearly 200,000 votes); that helped power Toomey's narrow victory over Democrat Joe Sestak. In Colorado, which shares many cultural characteristics with the coastal states, strong support among college-educated whites in metropolitan areas such as Denver and Boulder allowed Democratic Sen. Michael Bennet to remain in a tight race with Republican challenger Ken Buck despite Buck's massive advantage among noncollege whites, especially in the state's rural areas.

Overall, Tuesday had the feel of a parliamentary election in which individual candidates had only limited ability to separate themselves from the national tide. Put another way, the name on the back of the jersey mattered less than the color on the front.
Particularly in the House, Democrats from all segments of the party were swept away. The Democratic House losses were greatest in the sorts of places that have been most skeptical of Obama from the outset. Democrats representing districts that voted for John McCain in 2008 were routed: Republicans appear to have gained at least three-fourths of the 48 seats now held by Democrats in that category.
In a geographic reflection of Obama's weakness among blue-collar white voters, a partial count showed that Republicans captured the seats of at least 35 House Democrats in districts where the percentage of whites with a college degree lags the national average of 30.4 percent. House Democrats elected in 2006 and 2008, when George W. Bush's weakness allowed the party to expand deep into traditionally Republican terrain, also suffered heavy losses. Geographically, Democrats were especially hard hit through the border states and industrial Midwest: The party lost five House seats in Ohio, five in Pennsylvania, three in Tennessee, two in Indiana, and at least three in Illinois. Meanwhile, Republicans captured governorships in Michigan, Wisconsin, and Ohio, flipped Senate seats in Wisconsin, Indiana, and Illinois, and easily held an open Republican seat in Ohio.

But the election's blast radius extended well beyond those highly vulnerable categories. Besides the freshman and sophomore Democrats, the election also claimed veteran House leaders such as Ike Skelton of Missouri, Paul Kanjorski of Pennsylvania and John Spratt of South Carolina. While the House losses were greatest in downscale blue-collar districts, Democrats also lost white-collar suburban seats in New Jersey, New Hampshire, and the Philadelphia suburbs, and failed to carry the suburban seat vacated by Mark Kirk, the successful GOP Senate challenger in Illinois. Those losses also extended the Democrats' vulnerability beyond swing states to reliably blue states that have been cornerstones of the party's coalition since the 1990s, including New York. There, the party dominated statewide races, but lost a stunning five House seats, mostly through economically squeezed upstate districts. Few Democrats anywhere Tuesday night could feel entirely sheltered from the storm.