Monday, December 6, 2010

PHP - Managing Memory When Processing Large Files

I found out something interesting about PHP while processing huge files: the garbage collector doesn't always work the way it is intended to, and the common tricks don't always work either.

What's even more frustrating is that the common method of reading a file line by line caused huge memory leaks for me.
Here are my findings and solutions (feel free to correct me if I'm wrong, even though this worked for me):

Common method fopen+fgets fails:
Using fopen() with fgets() to read line by line through a file that contains almost a million lines caused a crazy memory leak. It took only 10 seconds before the script consumed pretty much 100% of the system memory and went into swap.
   My solution:
   Using "head -1000 {file} | tail -1000" is much less memory intensive. The exact number of lines to process per batch depends on the system speed. I had it set to 2000 and it was running very smoothly.

Garbage Collector fails:
PHP's garbage collector fails to clean up memory after each loop iteration even if unset() is used (or the variables are set to null). The memory just keeps piling up. Unfortunately, gc_collect_cycles(), which forces a garbage collection cycle to run, is only available in the PHP 5.3 branch.

Example Code:
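// Walk through blah.xml in batches of 2000 lines
for ($i = 2000; $i <= 1000000; $i += 2000) {
    // head keeps the first $i lines, tail keeps the last 2000 of those
    $data = explode("\n", shell_exec("head -$i blah.xml | tail -2000"));
    // parse using simplexml
    unset($data);
}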
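
For anyone on PHP 5.3 or later, here is a minimal sketch of what forcing a collection looks like. The function name countCollectedCycles() is made up for illustration and the type hints assume PHP 7 or later; gc_enable() and gc_collect_cycles() are the real built-ins. The cycle collector only deals with values that reference each other in a loop, so treat this as an illustration, not as a cure for every kind of memory growth.

```php
<?php
// Hypothetical helper: builds $pairs pairs of objects that point at each
// other, drops them, and reports how many values the collector reclaimed.
function countCollectedCycles(int $pairs): int
{
    gc_enable();          // make sure the cycle collector is switched on
    gc_collect_cycles();  // clear out any garbage left over from earlier code

    for ($n = 0; $n < $pairs; $n++) {
        $a = new stdClass();
        $b = new stdClass();
        $a->peer = $b;    // $a references $b ...
        $b->peer = $a;    // ... and $b references $a: a cycle
    }
    unset($a, $b);        // reference counting alone cannot free the cycles

    // Force a collection run; the return value is the number reclaimed.
    return gc_collect_cycles();
}
```

Calling countCollectedCycles(10) should return a number greater than zero, since unset() by itself leaves the circular pairs behind.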

My Solution
You can FORCE the garbage collector to run by wrapping the processing in a function. PHP does clean up memory after each function call. So if the above code example is rewritten this way, memory usage will happily hover around 0.5% constantly.

Example Code:
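for ($i = 2000; $i <= 1000000; $i += 2000) {
    // Only the raw chunk lives in the loop scope
    $data = shell_exec("head -$i blah.xml | tail -2000");
    process($data);
    unset($data);
}

// Everything allocated in here is local and is released when the function returns
function process($data) {
    $data = explode("\n", $data);
    // parse using simplexml
    unset($data);
}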
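
The head | tail approach re-reads the file from the top on every iteration. Here is a rough alternative sketch that keeps the "do the work inside a function" idea but reads the file only once. It assumes PHP 7 or later (generators did not exist when this post was written), and readBatches() and processBatch() are hypothetical names, not part of the original script. Whether it avoids the memory growth described above is an assumption worth checking with memory_get_usage() on your own setup.

```php
<?php
// Hypothetical helper: yields arrays of up to $batchSize lines from $path.
function readBatches(string $path, int $batchSize): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $batch = [];
        while (($line = fgets($handle)) !== false) {
            $batch[] = rtrim($line, "\r\n");
            if (count($batch) === $batchSize) {
                yield $batch;   // hand one full batch to the caller
                $batch = [];    // start over so old lines can be freed
            }
        }
        if ($batch !== []) {
            yield $batch;       // final, partially filled batch
        }
    } finally {
        fclose($handle);
    }
}

// Hypothetical stand-in for the SimpleXML parsing step: it just counts
// the non-empty lines. All of its variables are local to the call.
function processBatch(array $lines): int
{
    $count = 0;
    foreach ($lines as $line) {
        if ($line !== '') {
            $count++;
        }
    }
    return $count;
}
```

Usage would look something like: foreach (readBatches('blah.xml', 2000) as $lines) { processBatch($lines); }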

Friday, March 19, 2010

New Google Search on Chrome

Google certainly loves its own browser more than any other browser. Who doesn't love their own child?

Today, Google changed the entire search experience just for Chrome. If you navigate to http://www.google.com, you'll see a slightly restyled layout. But as you proceed with your search, you'll soon find that the search results are completely rearranged, restyled, and reworked.

It looks like Google is stepping up against Bing's decision-driven search engine, and it has also taken a step up in terms of user friendliness.
The left rail is now filled with search filters and options. The most used ones, "Everything", "News", and "Maps", are very useful.

Here are a couple of screenshots:


Thursday, March 4, 2010

WoW Account Restored

I have finally gotten my hacked WoW account restored by Blizzard. After a lengthy investigation and restoration, Blizzard's "specialists" were kind enough to restore most of my items. They tried the best they could to restore all my lost gear, and for the other items they simply gave me 2500g per level 80 character plus 14 emblems of frost and 70 emblems of triumph to cover the loss.

I am very pleased with Blizzard's customer service quality, except for the fact that their hotline was nearly impossible to get through to. Everything got restored just in time for ICC.

Now the biggest challenge I'm facing is squeezing out time to play WoW, and I'm slowly getting cornered by other priorities in my life. Time has become extremely precious for me. With LA's traffic, by the time I get back home, it is already 7:30PM or 8:00PM. After dinner and playing with my son, there's pretty much no time left for anything other than going to bed. It has been a few days since I have even touched my game machine. Strangely, I don't feel sad at all. I do miss WoW, but I value activities with my family far more than quality WoW time. Yeah, there it is, I've said it, and maybe one day I will quit WoW as well (that will probably be a while :p)

Wednesday, March 3, 2010

SSH Clients on Windows

One of the things I really love about Mac and Linux over Windows is the built-in Terminal. Unfortunately, the majority of computer users are on Windows. Programmers who work for larger corporations are pretty much stuck with Windows.

When it comes to using SSH on Windows, there aren't that many great choices out there. The built-in Command Prompt just doesn't cut it, even if Microsoft somehow manages to build ssh support directly in.

Currently, I'm switching between PuTTY and mintty. Maybe once I have figured out how to overcome Windows 7's permission restrictions, I will drop PuTTY completely for mintty.

Here is a list of popular SSH clients on Windows:

  1. PuTTY
    This is probably the most popular SSH client on Windows. It's extremely lightweight and straightforward to use. After more than 10 years in development, it is still in beta. There is a long wishlist, and most of the wishes have been pending for a long time, like the popular wish for tabs.
    However, tabbed PuTTY isn't just a dream. Currently, there is an alternative solution. It's called PuTTY Connection Manager.
    A little trick: Once you've downloaded putty.exe, move it to the /windows/ directory. This way you can launch PuTTY by simply typing "putty" in the Start->Run prompt.
  2. SecureCRT
    Although it is not free, it does come with pretty much every feature you'll probably ever need from an ssh terminal. For larger corporations with enough budget, offering SecureCRT to programmers will definitely put some smiles on their faces.
  3. mintty
    mintty is a small but excellent terminal emulator for Cygwin. I'm not particularly a huge fan of Cygwin, but mintty does offer some of the natural features that you'll find in the Terminals available on Mac and Linux. Believe it or not, mintty is based on code from PuTTY, so don't expect it to have tabs.

Wednesday, February 3, 2010

PHP Session Storage

This is a continuation of my older article PHP Memcache Extension - Lesser Known Pitfalls. Reader Tony pointed out a very strong statement I made about not using memcache as PHP session storage.
I have to admit those were some strong statements I made when I was writing that post. Thank you Tony for pointing that out. :) I have since changed the wording a bit so that it doesn't rule out all uses of memcache as a replacement for session storage.

Here are a couple of situations where I think using memcache as the primary session storage is not a good idea:

  1. Putting shopping cart data in memcache on a high volume e-commerce site. If the memcache server runs out of memory or crashes, your customers will instantly lose everything in their carts.
  2. Putting user session data in memcache. This one is arguable. Again, in a high volume situation, if the memcache server goes away, just imagine how many queries will be triggered on your database.

I'm sure it can be argued that none of these will be problems when using clustered memcache servers, each installed with 64GB of memory (God forbid all 64GB of data gets lost because of a power outage).

However, if the size of each session is controllable, the growth of session data is foreseeable, and in case of failure the data is recoverable without triggering a major disaster, I'm definitely in favor of using memcache as the session storage. There are many ways to achieve this. Here are a couple of ideas of mine:

  1. Controlling the size of each session can easily be achieved by optimizing your code. A lot of times I see people toss everything into a session (like user data) just in case a piece of data might be needed somewhere, sometime in the future. This leads to a lot of unnecessary data being stored in the session.
  2. Optimize the query that gets the data before it is set into the session. In case of failure, you need to ensure that your database can handle the amount of traffic needed for recovering the data. If each lost session requires a major join query, you can well guess how long the database will last.
  3. Have a backup plan for the session data. If you have the luxury of using data storage mounts backed by NAS, utilize it. Put your session data in memcache for fast access, and leave a copy on the NAS for faster and safer recovery. (Remember, I don't mean leave a copy permanently.)
  4. Again, if you have the luxury of using storage mounts backed by NAS, try using SQLite for your user data. Each user will have his/her own SQLite file. Whenever data needs to be retrieved, the SQLite file gets hit first. Imagine the load getting spread across thousands of disks.
All in all, I'm not opposed to any form of session storage as long as all sides of it are well thought out and planned out.
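
To make idea 3 a bit more concrete, here is a minimal sketch assuming PHP 7.4 or later. Everything in it is hypothetical: KeyValueStore, ArrayStore, and TieredSessionStore are made-up names, and ArrayStore is just an in-memory stand-in for both memcache and the NAS-backed copy. A real version would wrap an actual memcache client and file storage behind the same interface.

```php
<?php
// Hypothetical interface that both the fast cache and the backup implement.
interface KeyValueStore
{
    public function get(string $key): ?string;
    public function set(string $key, string $value): void;
}

// In-memory stand-in used here for both tiers, for illustration only.
final class ArrayStore implements KeyValueStore
{
    private array $items = [];

    public function get(string $key): ?string
    {
        return $this->items[$key] ?? null;
    }

    public function set(string $key, string $value): void
    {
        $this->items[$key] = $value;
    }

    // Simulates the cache server going away and losing everything.
    public function clear(): void
    {
        $this->items = [];
    }
}

final class TieredSessionStore
{
    private KeyValueStore $cache;
    private KeyValueStore $backup;

    public function __construct(KeyValueStore $cache, KeyValueStore $backup)
    {
        $this->cache = $cache;
        $this->backup = $backup;
    }

    // Write to the fast cache and keep a copy in the backup.
    public function save(string $sessionId, array $data): void
    {
        $payload = json_encode($data, JSON_THROW_ON_ERROR);
        $this->cache->set($sessionId, $payload);
        $this->backup->set($sessionId, $payload);
    }

    // Read from the cache first; on a miss, recover from the backup
    // and put the data back into the cache.
    public function load(string $sessionId): ?array
    {
        $payload = $this->cache->get($sessionId);
        if ($payload === null) {
            $payload = $this->backup->get($sessionId);
            if ($payload === null) {
                return null; // lost in both places; the caller must rebuild
            }
            $this->cache->set($sessionId, $payload);
        }
        return json_decode($payload, true);
    }
}
```

The point of the design is that a cache failure turns into a read from the backup copy instead of a burst of queries against the database.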






Sunday, January 31, 2010

Fried Video Card

Things are just not heading north as I wished. Ever since I had my WoW account hacked, it seems like every little thing can turn south at the most unexpected time.

Just as I was trying to finish up the dev work on the three Chrome Extensions, my video card gave up on me. Sadly, I'm on a Dell XPS400 that was built 4 years ago and the video card is only a GeForce 6 series. I know, it's really a joke to most of you guys. It's definitely a legacy video card now, considering that it's only PCI-Express x16 without the buzzword "2.0". I guess I should've replaced it when the fans started making noises.... It's too late now. It finally froze up my screen and gave up on itself.

I'm actually writing this post from my wife's laptop, which I'm not allowed to use for programming. It's a netbook bought strictly for her to read online novels (those Chinese love novels.... are they really that good?)

I have placed an order for a new video card, a GeForce 9800 GT, from newegg.com. It should arrive by tomorrow. For those of you who came from the support link on my extensions, please be patient. I'll be back on track in a couple of days, I promise.

Friday, January 22, 2010

First Tweet from Space



Just about 10 hours ago, Timothy J. (TJ) Creamer, a NASA astronaut, tweeted from the International Space Station. This marks the first tweet from space.

According to the statement released today by NASA, "Astronauts aboard the International Space Station received a special software upgrade this week – personal access to the Internet and the World Wide Web via the ultimate wireless connection."

This personal Web access, called the Crew Support LAN, takes advantage of existing communication links to and from the station and gives astronauts the ability to browse and use the Web. The system will provide astronauts with direct private communications to enhance their quality of life during long-duration missions by helping to ease the isolation associated with life in a closed environment.


During periods when the station is actively communicating with the ground using high-speed Ku-band communications, the crew will have remote access to the Internet via a ground computer. The crew will view the desktop of the ground computer using an onboard laptop and interact remotely with their keyboard touchpad.


Ok, they don't have full access to the internet all the time, but probably a few hours per day as the station orbits the earth. Also, the access isn't directly through a Dell/Mac they brought with them, but rather through a computer on the ground which the astronauts will access using remote desktop. Still, this is really cool. This has got to be the most interesting place to use Twitter, and it marks the first step toward full internet access in space.

Now, here come some fun facts from the statement:

Astronauts will be subject to the same computer use guidelines as government employees on Earth


This translates to: no porn, no WoW, and the other 100,000 no's from the government computer use guidelines handbook.

Regular Expression Checker v1.1.2 Released

This is a very minor release, but it did include a significant update.
The tool now accepts HTML or any other markup language format. Previously, since the results were output directly into a div, anything wrapped within < and > would have been parsed by the browser as markup. These tags are now handled properly, and the regular expression matching mechanism has been updated to support this change.
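
The extension itself is written in JavaScript and its source isn't shown here, so the following is only a sketch of the general idea, written in PHP like the other code on this blog. The function highlightMatches() and the use of the mark tag are assumptions for illustration; the point is simply that each piece of the input gets escaped before it is placed into the page, so tags in the input show up as text.

```php
<?php
// Hypothetical helper: returns HTML in which every regex match is wrapped
// in <mark>, while the subject text itself is escaped so the browser
// displays tags instead of interpreting them.
function highlightMatches(string $pattern, string $subject): string
{
    $out = '';
    $pos = 0;
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

    foreach ($matches[0] as [$text, $offset]) {
        if ($text === '') {
            continue; // nothing to highlight for an empty match
        }
        // Escape the plain text between the previous match and this one.
        $out .= htmlspecialchars(substr($subject, $pos, $offset - $pos), ENT_QUOTES, 'UTF-8');
        // Escape the match itself, then wrap it in the highlight tag.
        $out .= '<mark>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</mark>';
        $pos = $offset + strlen($text);
    }

    // Escape whatever is left after the last match.
    return $out . htmlspecialchars(substr($subject, $pos), ENT_QUOTES, 'UTF-8');
}
```

For example, matching the pattern /<b>/ against the text a <b>x</b> produces a &lt;b&gt; wrapped in a mark element, with the closing tag shown as literal text.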

Release Notes:
v1.1.2
 - support for html is added
 - some code optimization

Monday, January 18, 2010

WoW Account Hacked

Sigh.... The unthinkable has happened. My WoW account was compromised last night. Some punk ass somehow got my logon info and went through all of my characters across realms. Everything that could be sold was gone and all my gold is gone as well. My characters are practically naked now.

There have been lots of phishing emails lately. So far, I've been able to identify them all. This time, I'm really suspecting it's the UI Addons I've been playing around with lately. There were probably key loggers in those freaking UIs. To up the security, I've just ordered an Authenticator and I will reinstall all my computers at home tonight.




I have contacted Blizzard via email and webform. So far their phone line has not worked. According to their support page, there's a chance to restore all my items/gold if I report the compromise within 2 weeks (which I did, obviously).  But, this restore can only happen once per account. If the account is compromised again, only lost characters would be restored, but no gold/item would be reimbursed.

When an account is reported or its security appears to have been compromised, the World of Warcraft Customer Support Staff will disable access to the account while we conduct an investigation. Once we are confident that account access privileges have been restored to the registered user (and only the registered user), we will enable access to the account. While we will attempt to restore any items/gold missing from the account, we cannot guarantee that lost items/gold will be reimbursed. Reimbursement is only a possibility if the situation is reported within 2 weeks (14 days) from the date the account was first compromised. Additionally, restoration due to an account compromise is offered only once. Should the account become compromised in the future, we may restore deleted characters, but no reimbursement of items/gold lost will be offered. 

Thursday, January 14, 2010

Chrome Armory v1.3.2 Released

After staying up very late last night to push out v1.3.0, I was very happy with the new code optimizations. Now that the whole code base is built on a much stronger structure/framework, I started doing some reality checks on the list of features in my notebook (btw, it was updated early this morning after seeing the new wowarmory). Surprisingly, most of the features can be done much sooner since the implementation part has gotten much easier and cleaner.

I had a long 3 hour drive home from work today (LA traffic SUCKS). Usually it's a long ass boring drive, but not today. My brain was going through each planned/unplanned feature to figure out how it can be done and in which order the features can be released. Don't underestimate the power of a 3 hour traffic jam. I not only reordered the release milestones on my basecamp, but also added a few more features to my notebook after I got home.

So here it is, v1.3.2. I quickly finished off some of the features that were interrupted by yesterday's emergency release and also added a simple but very very neat feature - 3D Model Viewer.

Release Notes:
v1.3.2
 - added new character 3D model viewer
 - added base/ranged/melee/spell/defense stats of the character
 - lots of code optimization and speed optimization



Wednesday, January 13, 2010

Chrome Armory v1.3.0 Released

v1.3.0 was supposed to be a big release, but since Blizzard decided to update wowarmory.com, this version has been taken over by code updates to support Blizzard's changes.

However, this release isn't a minor patch release. It contains quite a few changes, including a new feature. I did a major refactoring of the quick and dirty code (remember, this was my first extension and it took me only half a day to finish v1.0). The end result is a more organized, more secure, and much faster code base. You'll probably notice that the load time on a character's profile has gone down. This round of refactoring is only the first step of many that I have planned.

My goal is to have a release at least once a week (every Sunday). There are many ideas in my notebook (on Evernote), so stay tuned for more. :)

Here are the release notes:

v1.3.0
 - major release to support new wowarmory.com UI
 - added character specs
 - detects the active and inactive specs

Tuesday, January 12, 2010

China's Top Search Engine Baidu Got Hacked

China's top search engine Baidu (百度) was hacked this morning.
It appears that the hackers, who call themselves the "Iranian Cyber Army", changed Baidu's DNS record to point to another site.
A Baidu insider told Chinanews.com.cn that the problem has been solved and the site should be back in half an hour.

Ok, this is definitely embarrassing. It is ok to make mistakes; even Google makes mistakes, as one of its top five most viewed blog posts of 2009 shows. But getting your DNS record changed is like having your backyard set on fire by someone else.

Google opens up file uploads through Google Docs

I really feel like I should rename my blog to something Google related. It seems like almost every post I write is about Google.

Google has announced on its official blog that Google Docs will be opened up in the next few weeks for all kinds of file uploads, with support for files up to 250MB each. Uploaded files can also be shared with other people, just like docs can right now.

This is certainly great news for all of us. For the past couple of years, GDrive has been on my wishlist and probably on many of yours. I know this is not officially the GDrive, but it certainly is just as exciting.


Over the next few weeks, we’re rolling out the ability to upload all file types to the cloud through Google Docs, giving you one place where you can upload and access your key files online. Because Google Docs now supports files up to 250 MB in size, which is larger than the attachment limit on most email applications, you’ll be able to backup large graphics files, RAW photos, ZIP archives and much more to the cloud. More importantly, instead of carrying a USB drive, you can now use Google Docs as a more convenient option for accessing your files on different computers.


This feature can also help you work with teams to organize and collaborate on information online.

Google twitter search results

I just came across something really interesting in Google's search results.

I was searching for "chrome extensions" and noticed that the twitter search results were listed very differently. The results are updated in real time. If you wait and watch, you'll see more results get added to the top.


Sunday, January 10, 2010

AVATAR's 5 Nicest Touches

I just came back from the theater after the 3 hour long AVATAR movie in IMAX 3D. It was just beyond awesome. The picture quality was by far the best I've ever seen and the sound effects literally shook the floor. If you haven't seen it, go watch it. And if you have seen it but didn't see it in IMAX 3D, go see it in IMAX 3D. Wait, did I mention IMAX 3D? GO SEE IT IN IMAX 3D.

This marks my first IMAX experience. Yes, for a "nerd", this is kinda weird, but late is always better than never, right? :) I'm so glad that I saved my IMAX experience for AVATAR. I don't want to spoil it for people who have not seen the movie, so I'm not going to share any in-depth details. However, as I'm so excited, I will share some of the nicest touches from the movie (don't worry, they won't spoil the story).





Nicest Touch #1
The bugs in the forest of Pandora. The little insects were really nice touches on the 3D screen. As characters moved past them, they flew off in all directions just like real ones. It was so real that I lifted my hand and started to shoo them off.

Nicest Touch #2
The ashes and burning embers. This is potentially a story spoiler, so I'm not going to share which scene it was. However, you can try to picture a disaster movie scene where ashes and embers are being blown by the wind, flying across the screen. I'm sure this is not going to be as fun for a lot of the Californians who lived through the wildfires, because the scene was that realistic.

Nicest Touch #3
The movement of the AVATARS. You think the movie is one huge CG project? Then you're wrong. Every character in the movie, including the AVATARS, is real. Even though you know the AVATARS have to be CG, you just can't feel the stiffness of computer generated movement at all. Why? Because no movements were emulated by computers. They were all recorded using motion sensors. That's right, every single movement, including the mouth. When the AVATARS talk, their facial muscles move just the same way as a human's; there is no difference at all.

Nicest Touch #4
The scorpion helicopters. This again can potentially spoil your experience, so I'm just going to hint at it. The mechanics of the helicopters literally followed the laws of physics. I wouldn't be surprised if we could take the same mechanics into real life and build a scorpion helicopter that flies the same way.

Nicest Touch #5
The computer screens. The whole story is set 150 years from now. The fictional technology presented was absolutely amazing. The screens were like sheets of glass and they could be bent and wrapped around you. I'm working on a nice 24" here and two 23" at work, but I still find myself always running out of space, and I hate dual or triple monitors because of the disconnect between them. The fiberglass-like screens in the movie were just like a dream come true. I wish I could have that tomorrow. :D

This is as far as I'm going to share, as it is already midnight here. As of today, every show was still sold out. We had to watch the 8:30 show because the 4:40 show was completely sold out at 3:30. When we left the theater, there were still lines of people ready to watch the 1:00 show.

Go see the movie, and definitely go watch it in IMAX 3D. You'll not regret the $15; it is definitely money well spent. As for the movie itself, the controversial $500 million cost is, in my opinion, also money well spent. This movie carries so many achievements that no other movie can compete with it, or will be able to for a long time.

Thursday, January 7, 2010

Microsoft Joins W3C SVG Working Group

On Jan. 4th, Microsoft submitted their request to join the Scalable Vector Graphics (SVG) Working Group of the World Wide Web Consortium (W3C).

For the first time, instead of bashing Microsoft, I'm going to cheer for them. SVG has been hugely hyped among cutting-edge web developers. Browsers like Firefox, Chrome and Webkit have all tried to implement it in some way. With Microsoft now on board to help standardize the SVG format and platform, I think we have a pretty bright future ahead of us. We might just get it this year, and just as standardized as the "<body>" tag across all the browsers. (my fingers are crossed)

Wednesday, January 6, 2010

Google + Mobile = Total Freedom

Yes, Freedom. Don't we all love freedom, especially after being enslaved by our mobile carriers for so long? Maybe people outside of the United States won't feel so strongly, but anyone who lives in the US knows the pain.

Yesterday, Google announced its new mobile strategy based on the already popular (some people call it the "iPhone killer") Android platform. The idea behind the strategy literally smashes open the shackles that all the mobile carriers forced us to put on. We can now choose an "unlocked" phone first before choosing a carrier.

Just to clear things up a bit, I'm going to explain why this is so significant.

To some of us, the idea of buying an "unlocked" phone is almost unheard of. Of course, to people who are a bit more tech savvy, this isn't that special. However, buying an unlocked phone directly from the manufacturer instead of through eBay or some shady cellphone dealer in LA is definitely new. But in many other parts of the world, this isn't new at all. Their mobile carriers are built to serve as networks, not as cellphone dealers.

A quick example would be China, even though China has only two government owned mobile networks. In China, people save up their money to buy a phone first, and then decide which network they want to use. If you go to a computer/gadgets mall in Shanghai, you'll find hundreds of different cell phones sold by pretty much every store. There is no concept of "locked" or "unlocked" phones. All phones sold there are by nature "unlocked", and they have to be. You'll see exactly why shortly.

I remember I had to call T-Mobile while I was in China to unlock my MDA so that I could use it there. T-Mobile actually charged me $30 just for giving me the damn unlocking code.

The freedom of choosing which carrier you want to use with your phone is truly turning the US mobile industry upside down. For the first time, you don't have to deal with a carrier that you don't like or a plan you can't afford just because you want the phone from that carrier or plan. You now buy your preferred phone, then take it to the carrier you like and choose the plan you wish. I don't know how anyone else feels about this, but at least it sounds naturally right to me. Why would I settle for a two year contract at an insane monthly cost just for a phone that can potentially be outdated within a year?

Again, I'm going to use China's mobile industry as an example to compare with the US mobile industry.
Have you seen the messiness of the prepaid mobile market in the US? There are quite a few small to large carriers that specifically provide prepaid mobile service. They all charge an insane amount of money for every service provided (even MetroPCS. Oh yes, don't be fooled, it costs you $2 every time just to pay the bill online). What's even worse is that they all make you buy your phone from them, and most of the phones are really, really crappy. When my Mom visited me last year, she brought a really nice Samsung phone with her from China, thinking that she could just buy a prepaid SIM card and start making phone calls. But no, that did not happen, and quite frankly could not happen with any carrier. (We ended up buying a $60 crappy phone from MetroPCS and putting her on a MetroPCS plan.)
Now compare that to China's mobile industry, where it is just completely the other way around. You can buy prepaid SIM cards or prepaid minutes cards from any convenience store (including 7-Eleven). You have the freedom of choosing which network you want to use, which phone number you want to use (this part depends on what the store has on hand), and which prepaid plan you want to use (plans differ by minutes and services provided). If you happen to have your phone with you, you can put in the SIM card right on the spot and start using the service.
Of course, there are also long term service contracts you can sign with the networks, just like in the US, but buying a phone from the network is completely optional.

Ever since Google surfaced in the tech mainstream, public or not, it has kept its promise of being open. Most of the software and platforms developed by Google are open source. Google believes in "openness" when it comes to software development (I have a post coming up soon that talks specifically about this "openness"). This promise was also well kept with Android, an open source smart mobile platform.

The article on Techcrunch covering this story summarized the relationship between Google and the carriers quite well. All the carriers and manufacturers except AT&T & Apple had their balls held by the iPhone when it comes to the smartphone war. Every one of them wants something to compete with the iPhone. Now here comes Android. It's not a phone, but a base platform that can be implemented on different hardware. It makes the manufacturers' and the carriers' lives easier, and it gives them a magic wand that can compete with the iPhone.

From my personal view, it's a simple market share war between Google and Apple. If all the carriers sign up for Google's new strategy, how many Android phones will there be on the market? I'm guessing it would be a lot more than what AT&T+Apple can handle by themselves.

Monday, January 4, 2010

Regular Expression Checker v1.1.1 Released

I've just updated the Regular Expression Checker extension on https://chrome.google.com/extensions/
I didn't have the time to finish implementing all the features. However, I did fix a few critical usability issues and added a few useful tools.

Here are the release notes:

v1.1.1
 - added support for regex replacement
 - added support for changing highlight color
 - updated option checkboxes to take effect instantly
 - fixed styling issues on the results box
 - minor code optimizations







Again, if you have any suggestions, bugs to report, or ideas, please feel free to write in the comments. Trust me, I do read them, and valuable feedback (good or bad) is always extremely welcome.

Chrome Armory 1.2.1 Released

I have just released v1.2.1 of Chrome Armory on https://chrome.google.com/extensions/

Here are the release notes:

v1.2.1
 - fixed an image displaying issue which was caused by failed packaging


v1.2.0
 - added support for all other regions (except for china since all realms are offline there)
 - added international characters support 
 - code optimizations







Again, if you have any suggestions, bugs to report, or ideas, please feel free to write in the comments. Trust me, I do read them, and valuable feedback (good or bad) is always extremely welcome.