曾强's Blog: September 2011

Thursday, September 29, 2011

1st vimrc

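" Enable syntax highlighting.
syntax on
set background=dark " Use colors suited to a dark background.
set shiftwidth=2 " Indent by 2 columns.
set tabstop=2 " Display a tab as 2 columns.

" Load filetype-specific plugins and indent rules when autocommands are available.
if has("autocmd")
  filetype plugin indent on
endif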

set showcmd " Show (partial) command in status line.
set showmatch " Show matching brackets.
set ignorecase " Do case insensitive matching
set smartcase " Do smart case matching
set incsearch " Incremental search
set hidden " Hide buffers when they are abandoned

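I found this on Stack Overflow, where quite a few people discuss this vimrc.
I'm starting with the simplest setup and will add to it gradually.

Wednesday, September 28, 2011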

re: An open letter to those who want to start programming

An open letter to those who want to start programming

First off, welcome to the fraternity. There aren’t too many people who want to create stuff and solve problems. You are a hacker. You are one of those who wants to do something interesting.

“When you don’t create things, you become defined by your tastes rather than ability.”
– WhyTheLuckyStiff

Take the words below with a pinch of salt. All these come from me – a bag-and-tag programmer. I love to get things working, rather than sit at something and over-optimize it.

Start creating something just for fun. That’s a great start! There’s no way you will start if you say you “need to learn before doing”. Everybody’s got to start somewhere. Fire up your editor and start writing code.

Here’s something important which people might call bad advice, but I’m sure you’ll stand by me when I’m finished saying why. Initially, screw the algorithms and data structures. They do not have generic use-cases in most simple applications. You can learn them later when you need them. Over a period of time, you’ll know what to apply in situations. Knowing their names and what they do would suffice to be able to pick some paper, dust it and implement it. And that is… if no library (other programmers' re-usable code) is available, to do it in the programming language of your choice.

Choose a good language. One that you think you can produce something useful in short time.

So let C not be your first language. That might give you the satisfaction of doing things the really old-n-geeky way. C was the solution to the problem Assembly Language was. It offers better syntactic sugar than its prominent predecessor, Assembly Language. But today, C (or C++) is not a language in which you can produce something very quickly. I would suggest that you use a dynamic language; I won’t sideline any options. Choose a language whose syntax (and documentation) you think you might be comfortable with. For this, you might want to spend some time trying out different languages for a few hours. The purpose of choosing such a language is not to make you feel better or to suggest that programming is easy. Completing stuff faster and being able to see the output keeps you motivated. Don’t choose a language that requires a special heavy-weight IDE (a tool that helps you write code and run it) to program better in the language. All you should need is a text editor.

Choose a good editor.

An editor is to a programmer, like how a bow is to an archer. Here are some editors to get started with…

SublimeText 2 – recommended if you are just starting.
Emacs – huge learning curve. Complex key shortcuts. And to be able to customize it, you’ll need to learn Emacs Lisp.
Vim – used by many for its simplicity and the fact that it comes with Linux distros by default. I used Emacs for 2 years and then switched to Vim to get away from Emacs’s complex keystrokes when the little fingers on both my hands started hurting. Knowing Vim keystrokes is a must. When you work remotely and try to type out code on some server from your computer, you’ll find that often the only editor available from the command line without any installs is Vim.

Watch out! Emacs and Vim might be really old, but they both have some features which even most modern editors don’t have.

Use an operating system that’ll teach you something.

Windows won’t teach you anything. The only thing you learn using Windows is to click the .exe file to install the software and use it. It may seem cool in the beginning, but in the long run when you have to deploy applications, especially if you are aspiring to be a web developer, you’ll need at least basic knowledge of Linux. Linux also allows you to customize stuff the way you need it to be. Macs are cool too, but I assume that you cannot afford one of those now.

Don’t copy-paste files to backup stuff.

It’s usual among amateur programmers to copy-paste files to some temporary directory in order to back them up. That’s the only way they seem to know. Stop that! Use version control software. I strongly suggest Git, since it’s popular and easy to use. It has a nice community and resources to support newcomers. (Apart from Git, there’s Mercurial, Darcs, Fossil, etc. But just start with Git. I’m not going to bother you with the reasons for suggesting Git.)

Know where to get help.

Join a community that you can relate to (with the tools you use). StackOverflow is Facebook for programmers. There are no status messages and comments. Instead there are questions and answers. Also learn to use IRC. It’s an old form of chatroom and is now used mostly by developers to share information and help each other.

Develop your netiquette.

Know when to ask questions. Most problems you face might have been stumbled upon by others who might have already posted on the internet for answers. Before asking on IRC or any forums, google first (or should I say blekko first) to see if there’s already a solution to your problem. IRC needs patience. Remember people are helping you for free out of goodwill. Sometimes it might take hours, for someone in the chatroom to respond to you. So wait until they do. Besides, be polite. It's a small world. Karma, good or bad, comes back.

Meet people, because books only teach you routine stuff (oh and the "book" is dead they say).

There are some street smarts that you’ll learn when you tinker with stuff or learn from those who do it. Roam, meet people and say hello. You are not the only programmer in your place. Make friends and do stuff with them. If you've noticed, when a couple of geeks get together, whatever the starting point of the conversation, it always ends up getting technical. It's bound to happen. Enjoy it. Having programmed for a good number of years, I can tell you that I learnt nothing more than what the books and articles said until I started meeting people and getting technical with them 6 years back. So I always say that I’ve been programming for 6 years, because that’s when I started meeting people and feel I really started to learn.

Write opensource code.

Writing opensource code is giving back. It’s much more than charity. You are leaving code that others can use and improve on (maybe) for years to come. It also helps you refine your skills when someone else adds to your code or suggests changes. Code that you opensource doesn't have to be big. It can even be a useful little program that downloads youtube videos. Moreover, you’ll be surprised, that your code will often help you start and have interesting conversations with people.

Lastly, when years pass, return this favour by writing a similar letter to someone else who asks you for such help. And possibly correct me.

--
For a hacker, by a hacker
Akash Manohar

P.S: Wise men say, it takes 10 years or 10000 hours to get good at something. So don’t hurry.

Saturday, September 24, 2011

Do You Really Need to Rewrite Your Software?

Joel on Software
Things You Should Never Do, Part I
by Joel Spolsky
Thursday, April 06, 2000

Netscape 6.0 is finally going into its first public beta. There never was a version 5.0. The last major release, version 4.0, was released almost three years ago. Three years is an awfully long time in the Internet world. During this time, Netscape sat by, helplessly, as their market share plummeted.

It's a bit smarmy of me to criticize them for waiting so long between releases. They didn't do it on purpose, now, did they?

Well, yes. They did. They did it by making the single worst strategic mistake that any software company can make:

They decided to rewrite the code from scratch.

Netscape wasn't the first company to make this mistake. Borland made the same mistake when they bought Arago and tried to make it into dBase for Windows, a doomed project that took so long that Microsoft Access ate their lunch, then they made it again in rewriting Quattro Pro from scratch and astonishing people with how few features it had. Microsoft almost made the same mistake, trying to rewrite Word for Windows from scratch in a doomed project called Pyramid which was shut down, thrown away, and swept under the rug. Lucky for Microsoft, they had never stopped working on the old code base, so they had something to ship, making it merely a financial disaster, not a strategic one.

We're programmers. Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand. We're not excited by incremental renovation: tinkering, improving, planting flower beds.

There's a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:

It’s harder to read code than to write it.

This is why code reuse is so hard. This is why everybody on your team has a different function they like to use for splitting strings into arrays of strings. They write their own function because it's easier and more fun than figuring out how the old function works.

As a corollary of this axiom, you can ask almost any programmer today about the code they are working on. "It's a big hairy mess," they will tell you. "I'd like nothing better than to throw it out and start over."

Why is it a mess?

"Well," they say, "look at this function. It is two pages long! None of this stuff belongs in there! I don't know what half of these API calls are for."

Before Borland's new spreadsheet for Windows shipped, Philippe Kahn, the colorful founder of Borland, was quoted a lot in the press bragging about how Quattro Pro would be much better than Microsoft Excel, because it was written from scratch. All new source code! As if source code rusted.

The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they've been fixed. There's nothing wrong with it. It doesn't acquire bugs just by sitting around on your hard drive. Au contraire, baby! Is software supposed to be like an old Dodge Dart, that rusts just sitting in the garage? Is software like a teddy bear that's kind of gross if it's not made out of all new material?

Back to that two page function. Yes, I know, it's just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I'll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn't have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.

Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it's like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.

When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.

You are throwing away your market leadership. You are giving a gift of two or three years to your competitors, and believe me, that is a long time in software years.

You are putting yourself in an extremely dangerous position where you will be shipping an old version of the code for several years, completely unable to make any strategic changes or react to new features that the market demands, because you don't have shippable code. You might as well just close for business for the duration.

You are wasting an outlandish amount of money writing code that already exists.

Is there an alternative? The consensus seems to be that the old Netscape code base was really bad. Well, it might have been bad, but, you know what? It worked pretty darn well on an awful lot of real world computer systems.

When programmers say that their code is a holy mess (as they always do), there are three kinds of things that are wrong with it.

First, there are architectural problems. The code is not factored correctly. The networking code is popping up its own dialog boxes from the middle of nowhere; this should have been handled in the UI code. These problems can be solved, one at a time, by carefully moving code, refactoring, changing interfaces. They can be done by one programmer working carefully and checking in his changes all at once, so that nobody else is disrupted. Even fairly major architectural changes can be done without throwing away the code. On the Juno project we spent several months rearchitecting at one point: just moving things around, cleaning them up, creating base classes that made sense, and creating sharp interfaces between the modules. But we did it carefully, with our existing code base, and we didn't introduce new bugs or throw away working code.

A second reason programmers think that their code is a mess is that it is inefficient. The rendering code in Netscape was rumored to be slow. But this only affects a small part of the project, which you can optimize or even rewrite. You don't have to rewrite the whole thing. When optimizing for speed, 1% of the work gets you 99% of the bang.

Third, the code may be doggone ugly. One project I worked on actually had a data type called a FuckedString. Another project had started out using the convention of starting member variables with an underscore, but later switched to the more standard "m_". So half the functions started with "_" and half with "m_", which looked ugly. Frankly, this is the kind of thing you solve in five minutes with a macro in Emacs, not by starting from scratch.

It's important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time. First of all, you probably don't even have the same programming team that worked on version one, so you don't actually have "more experience". You're just going to make most of the old mistakes again, and introduce some new problems that weren't in the original version.

The old mantra "build one to throw away" is dangerous when applied to large scale commercial applications. If you are writing code experimentally, you may want to rip up the function you wrote last week when you think of a better algorithm. That's fine. You may want to refactor a class to make it easier to use. That's fine, too. But throwing away the whole program is a dangerous folly, and if Netscape actually had some adult supervision with software industry experience, they might not have shot themselves in the foot so badly.

Original article

Compiler differences between VS .NET and VS 2008

I ran into a problem today while converting a Visual C++ .NET project to Visual C++ 2008.

In Visual Studio .NET, this is legal:

for (int i=0;i<10;i++) {}
for (i=0;i<20;i++) {} // i is reused from the first loop

But the VS 2008 compiler reports an undeclared identifier error (C2065) for the second loop, because standard C++ ends the scope of i with the first loop.

There may be other incompatibilities, so be careful when converting projects between different versions of Visual Studio.
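
A minimal sketch of two portable ways to write such back-to-back loops, so that the code compiles under both compilers. The function names are made up for this example. As far as I know, the old behavior can also be restored with the compiler option /Zc:forScope-, but fixing the code is the safer route.

```cpp
// Two standard-conforming ways to write back-to-back loops.
// The function names are hypothetical, chosen only for this example.

// Option 1: each loop declares its own counter (usually the cleaner choice).
int count_with_separate_counters() {
    int iterations = 0;
    for (int i = 0; i < 10; i++) { ++iterations; }
    for (int i = 0; i < 20; i++) { ++iterations; }  // a new, unrelated i
    return iterations;
}

// Option 2: declare the counter before the loops so both can share it.
int count_with_shared_counter() {
    int iterations = 0;
    int i;
    for (i = 0; i < 10; i++) { ++iterations; }
    for (i = 0; i < 20; i++) { ++iterations; }
    return iterations;
}
```

Option 1 keeps each counter local to its loop; option 2 is only needed when the value of i must survive past the loop.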

Thursday, September 22, 2011

The C++ Programming Language (3rd Edition)

you can write structured programs in Fortran77 and object-oriented programs in C, but it is unnecessarily hard to do so because these languages do not directly support those techniques.

In some areas, such as interactive graphics, there is clearly enormous scope for object-oriented programming. In other areas, such as classical arithmetic types and computations based on them, there appears to be hardly any scope for more than data abstraction, and the facilities needed for the support of object-oriented programming seem unnecessary.

12.2.5 Type Fields [derived.typefield]

To use derived classes as more than a convenient shorthand in declarations, we must solve the following problem: Given a pointer of type base *, to which derived type does the object pointed to really belong? There are four fundamental solutions to the problem:
[1] Ensure that only objects of a single type are pointed to.
[2] Place a type field in the base class for the functions to inspect.
[3] Use dynamic_cast.
[4] Use virtual functions.
Pointers to base classes are commonly used in the design of container classes such as set, vector, and list. In this case, solution 1 yields homogeneous lists, that is, lists of objects of the same type. Solutions 2, 3, and 4 can be used to build heterogeneous lists, that is, lists of (pointers to) objects of several different types. Solution 3 is a language-supported variant of solution 2. Solution 4 is a special type-safe variation of solution 2. Combinations of solutions 1 and 4 are particularly interesting and powerful; in almost all situations, they yield cleaner code than do solutions 2 and 3.
...
In other words, use of a type field is an error-prone technique that leads to maintenance problems. The problems increase in severity as the size of the program increases because the use of a type field causes a violation of the ideals of modularity and data hiding. Each function using a type field must know about the representation and other details of the implementation of every class derived from the one containing the type field.
It also seems that the existence of any common data accessible from every derived class, such as a type field, tempts people to add more such data. The common base thus becomes the repository of all kinds of "useful information." This, in turn, gets the implementation of the base and derived classes intertwined in ways that are most undesirable. For clean design and simpler maintenance, we want to keep separate issues separate and avoid mutual dependencies.
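
To make the contrast between solution 2 and solution 4 concrete, here is a small sketch of my own, not code from the book. The class names (EmployeeTF, Employee, Manager) and the describe functions are hypothetical, and std::to_string requires C++11, which is newer than the edition quoted above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Solution 2: a type field that every function has to inspect.
struct EmployeeTF {
    enum Kind { EMPLOYEE, MANAGER };
    Kind kind;
    std::string name;
    int level;  // meaningful only when kind == MANAGER
};

std::string describe(const EmployeeTF& e) {
    switch (e.kind) {  // must be edited whenever a new kind is added
    case EmployeeTF::EMPLOYEE:
        return e.name;
    case EmployeeTF::MANAGER:
        return e.name + " (level " + std::to_string(e.level) + ")";
    }
    return "";
}

// Solution 4: virtual functions; each class knows how to describe itself.
class Employee {
public:
    explicit Employee(const std::string& name) : name_(name) {}
    virtual ~Employee() {}
    virtual std::string describe() const { return name_; }
protected:
    std::string name_;
};

class Manager : public Employee {
public:
    Manager(const std::string& name, int level)
        : Employee(name), level_(level) {}
    std::string describe() const {
        return name_ + " (level " + std::to_string(level_) + ")";
    }
private:
    int level_;
};

// Works unchanged for any current or future class derived from Employee.
std::string describe_all(const std::vector<const Employee*>& staff) {
    std::string out;
    for (std::size_t i = 0; i < staff.size(); ++i) {
        if (i > 0) out += ", ";
        out += staff[i]->describe();
    }
    return out;
}
```

Adding a new kind of employee in the type-field version means touching every switch like the one in describe; in the virtual-function version it only means adding one more derived class.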

Wednesday, September 21, 2011

A Few Problems I Ran Into Today

I spent basically the whole day fixing bugs.

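The email graph problem has been dragging on for a long time, close to a month. Yesterday I submitted my nth fix, but today I was told it still fails, which is rather frustrating. The problem is this: users can generate a chart from the report data of the previous few months, and the chart is a JPEG image generated by PHP. Its address is stored in the database as an absolute path (let's set aside for now that it shouldn't be an absolute path). When we give users the installer for the software (a tar.gz2 Drupal file), we pre-populate the database with some of these charts and their addresses. These are really demo images, mainly meant to let users know the feature exists. When users generate their first graph, it overwrites the demo. Finally, the generated chart can be set to be sent by email; this is a setting each user controls.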

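So where is the problem? The addresses of those demo images are hard-coded, and they are wrong. Our system is used under many different domains, so of course the images sent by email don't come through. Sending the email also involves a cron job: mail is not sent immediately but by a program that runs at a fixed time each day. My earlier fix used $_SERVER['HTTP_HOST'] plus the name of the Drupal folder on the server to build a base URL that replaces the wrong address, that is, it replaces only the domain + root folder part. In theory this works, and it tested fine on both localhost and the server. Our test method was to open the server address of systemCron.php directly in the browser. The emails went out fine too. But soon after I submitted the fix, I was told it failed. I tried several tweaks along the same lines, and every one tested fine on my side but failed on the user's side. The logs showed that the domain was wrong every time, which means $_SERVER['HTTP_HOST'] was not being read.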

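Later I googled it and found that $_SERVER['HTTP_HOST'] is not guaranteed to be defined under HTTP/1.0, where the Host header is optional. So I suspected the user's machine was using HTTP/1.0, but I quickly found that they were using HTTP/1.1, so that was not the cause.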

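So where exactly was the problem? It worked in my tests but not on their side. Why? Then I started thinking about the cron task and how it is actually invoked. I took a look at Bluehost and it suddenly became clear: those so-called scripts are run with the PHP CLI. The PHP CLI works from the file's local path and has nothing to do with the web server, so variables like 'HTTP_HOST' do not exist. I ran var_dump on $_SERVER and found no server information in it. My tests worked only because I ran the script directly in the browser, which sent an HTTP request. That is not what happens in production.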

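So my current fix is to define a constant, BASE_ADDRESS, in the app's settings.php. On the next update or install, its value is set to the correct address, for example https://eccc.com/gma. When the cron job runs, it can include_once settings.php and use BASE_ADDRESS directly. Setting BASE_ADDRESS means rewriting the file itself with file_get_contents(), strpos(), and string concatenation, because a constant created with define() cannot be changed at runtime in PHP. When I read the file with fgets I hit a small problem: the strings from settings.php did not show up. Stack Overflow suggested the htmlspecialchars() function. Sure enough, output echoed by PHP does not display special characters like "" unless they are converted first. In fact I don't need to print anything out, because the string in the buffer is fine. I just find that line of code, cut it at both ends, and insert the address obtained from $_SERVER in the middle.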
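
The idea behind the fix, sketched in C++ rather than PHP to stay consistent with the other code on this blog: use the request host only when one exists, and otherwise fall back to a configured base address. Everything here is an assumption made for illustration, including the names resolve_base_address and kConfiguredBaseAddress, the "https://" scheme, and the "gma" folder; it is not the actual Drupal code.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the BASE_ADDRESS constant in settings.php.
const std::string kConfiguredBaseAddress = "https://eccc.com/gma";

// Returns the base address to use when building image URLs.
// http_host is null or empty when no web request is involved,
// which is the situation of a script started by cron from the command line.
std::string resolve_base_address(const char* http_host,
                                 const std::string& root_folder,
                                 const std::string& configured) {
    if (http_host == nullptr || *http_host == '\0') {
        return configured;  // no request context: trust the configuration
    }
    // Assumption: the site is served over https.
    return "https://" + std::string(http_host) + "/" + root_folder;
}

// Convenience wrapper that reads HTTP_HOST from the process environment,
// where a CGI-style web server would place it.
std::string base_address_from_environment() {
    return resolve_base_address(std::getenv("HTTP_HOST"), "gma",
                                kConfiguredBaseAddress);
}
```

Keeping the decision in a function that takes the host as a parameter makes the command-line case easy to reproduce in a test, which is exactly the case that browser-based testing missed.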

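The current fix should work, because the root of the problem is now understood. I kept getting it wrong before because I had no way to replicate the bug. I always assumed my fix was correct, which really just shows my lack of experience.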

Today I also dealt with an SSL problem.
The website we built uses SSL. It encrypts what the client and server send to each other in order to protect the communication. The workflow is:

client requests a secure connection -> server sends its certificate, which contains its public key, to the client -> client verifies the certificate, generates a session key, encrypts it with the server's public key, and sends it to the server -> server decrypts the message with its private key -> both sides then use the session key to encrypt and decrypt messages

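The server needs to support SSL; Apache, for example, has mod_ssl. You also need a valid, CA-signed certificate, and then every request switches from http to https. The problem I ran into was that SSL was broken on one page of the site: the browser could not confirm that the page was secure, so it showed https struck through with a red line. I investigated and found that the browser could read the certificate and that it was valid, but the page was shown as only partially verified. I opened Firebug and saw that the page loaded many resources, such as CSS, JS, and JPEG files, all pointing to port 443. One of those resources could not be found and was shown in red. That missing resource was what affected the security status of the page. Once I removed it, the problem went away.

Sunday, September 18, 2011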

A Collection of Useful .gitignore Templates

https://github.com/github/gitignore

http://help.github.com/ignore-files/

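Tuesday, September 6, 2011

Some Thoughts on Managing My Own Work

I have been working for almost two months. Every day I keep my head down writing code, but I keep feeling that my efficiency is low. Especially when I have to handle several things at once, everything feels a bit chaotic.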

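Sometimes I think I need to develop my own set of work habits, one that can cope with any kind of work and keep me organized when I am handling several thorny problems at once. Most importantly, the habits must be easy to follow, and there should be tools to support them. For example, here are some simple ideas I have right now:
First, email must be sorted: administration, project 1, project 2, and so on should all be clearly separated, which makes searching easy. To avoid forgetting important emails, I can star them. All of this should be done the moment an email arrives, because after that I will probably forget, and cleaning up later both wastes time and risks mistakes.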

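Second, make full use of a task management tool, such as the calendar in Thunderbird. Every time I get a new task, I must create a task entry right away. It may have no deadline, but it must have details. Every small step should be listed clearly, so that each time I look at the task I have a clear picture in my mind. Often when I feel "blur blur" (muddled), it is because I don't understand the current problem well enough, and I think these things need to be written down. While a task is in progress, some customer requirements change frequently, and the task must then be updated promptly. Every day I should record how far along each task is, as a percentage, and write it down clearly. I think task management must be centralized; otherwise things pile up in scattered places, my thoughts get tangled, and the situation only gets worse.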

Like email, this calendar stays open the whole time I am working, so that whenever a new task arrives I can update it immediately.

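Next, when writing code I must have a good source control tool. I currently use Git, which is very convenient. It lets me commit at any time to set a restore point, so every change to the source is under control. Git can sync several different folders and can create branches very cheaply. But I still have one question about using Git: what to do when several features are being developed at once. Should I create a branch per feature, develop each on clean source, and merge everything at the end, or develop everything on the main branch? The first approach is clearer, and it is easy to tell which changes belong to the current branch, but the downside is that merging can cause a lot of trouble. For example, when merging resource files, the conflicting lines may differ only in a few coordinates, and it is hard to tell which ones to keep and which to discard. The downside of developing everything on the trunk is the mess; the only tool for keeping track is then meaningful commit messages. I currently use a mix of the two: with several features in progress at once, I work on one for a while and then another, so it all feels quite messy. This needs more exploration. Of course, this question is not really about Git but about the software development process.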

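The last point is about mindset: while working I must constantly remind myself what I have just done and what I should do now. I should read the task records and project documents often. That way I won't lose focus.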

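Doing these few things is, I think, only a start, and there is much more to do afterwards. But overall it comes down to this: build good habits, and be consistent and organized in how I work. Right now I am handling a project on my own and can just barely manage; the only thing I am dissatisfied with is my efficiency. These two months have not been long, but I have experienced many aspects of software development and learned quite a lot. My top priority now is to cultivate good habits, which I believe is the first step toward improving efficiency.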
