The ideal candidate's answers

Alexey posted a few questions he asks when interviewing candidates for a Ruby on Rails job. And here are my answers. Don’t take them for granted.

security problems in Database, Application, Internet, and User

First of all there’s one thing missing: The Developer. I don’t trust other people’s code and I don’t trust my own code. Paranoia ftw.

Database: Make sure that the few users who access your db directly have only the minimum access rights they need. Hide sensitive data like passwords, either by hashing them properly (sha1 with a random, unique salt for every user, stored in your database, is today’s standard; md5 isn’t) or by creating views that show them even less data. You can always grant more rights later, but if they have too many, deleting rows or tables and screwing up your data happens fast, and then there’s no way back.
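To make the salted-hash idea concrete, here is a minimal sketch in plain Ruby. The helper names (generate_salt, hash_password, password_matches?) and the "salt--password" layout are my own assumptions for illustration, not something Rails or a plugin provides. SHA1 is used only because the paragraph above names it.

```ruby
require 'digest/sha1'
require 'securerandom'

# Hypothetical helpers for illustration; not part of Rails or any plugin.

# Generate a random salt; store it in the same row as the user's digest.
def generate_salt
  SecureRandom.hex(16)
end

# Hash the password together with the salt. Only the salt and this digest
# are stored, never the plain password.
def hash_password(password, salt)
  Digest::SHA1.hexdigest("#{salt}--#{password}")
end

# Check a login attempt against the stored salt and digest.
def password_matches?(attempt, salt, stored_digest)
  hash_password(attempt, salt) == stored_digest
end
```

Because every user gets a different salt, two users with the same password end up with different digests.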

Application: The most insecure piece. Make sure your application handles any kind of bad or missing data coming from the user in a safe manner. SQL injections are only one danger; being too lax with your data is the bigger one. Read up on ActiveRecord’s attr_protected (or attr_accessible if you’re conservative). If your app features user roles, write your tests to run with every possible role.
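The whitelist idea behind attr_accessible can be shown without Rails. This is only a plain-Ruby sketch of the principle, not how ActiveRecord implements it; ACCESSIBLE_ATTRIBUTES and filter_attributes are hypothetical names.

```ruby
# Hypothetical whitelist. In a Rails model of that era you would declare
# something like `attr_accessible :name, :email` instead.
ACCESSIBLE_ATTRIBUTES = %w[name email].freeze

# Keep only whitelisted keys from user-supplied params; anything else
# (for example an "admin" flag smuggled into a form post) is dropped.
def filter_attributes(params, allowed = ACCESSIBLE_ATTRIBUTES)
  params.select { |key, _value| allowed.include?(key.to_s) }
end
```

The conservative part is that a newly added column stays unreachable from forms until someone adds it to the list on purpose.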

Internet: Uh? That’s a security issue? Well, maybe the van outside your house reads your network traffic, who knows. But there are more dangers, like cross-site request forgery and cross-site scripting. Rails has a few security measures that often cause problems for noobs when they send custom AJAX POST requests. It’s ridiculous that some noobs really want to deactivate Rails’ security filters.

User: If there’s something I trust less than my own code, then it’s the User. They lie to you by sending bad data. They fall for all kinds of stupid tricks like phishing. They are stupid and want better help text. They are lazy and want a signup form right under your review/comment/whatever form. So, don’t have users in your webapp. They are so 1.0. Webapps for machines, that’s the future.

one time migration for a large database

The most important thing for such a task: Make Best Friends Forever with your db-admin.

Mephisto has a plugin for migrating data from all kinds of blog software, but if you have a lot less to do, a single migration script should do. Db views are a big help: get the column names right and it starts to be fun. ActiveRecord can connect to more than one database. Don’t do too many things at once.

About iterating over all records, a find :all will hit memory limits, so do only a few 1000s with every cycle. About find_by_sql, I can’t remember the last time I used it. The returned objects are readonly and you save all the expensive things that ActiveRecord would do in a find. If you can’t or don’t want to use db views and if you don’t update the objects, then it’s find_by_sql.
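The "few thousand per cycle" idea boils down to walking the table in LIMIT/OFFSET windows. Here is a small plain-Ruby sketch of just the windowing; each_batch_window is a hypothetical helper, and the actual query per window is left out.

```ruby
# Yield [offset, limit] windows that together cover `total` rows,
# `batch_size` rows at a time. Returns an Enumerator when no block is given.
def each_batch_window(total, batch_size = 1000)
  return enum_for(:each_batch_window, total, batch_size) unless block_given?

  offset = 0
  while offset < total
    # The last window may be smaller than batch_size.
    yield offset, [batch_size, total - offset].min
    offset += batch_size
  end
end
```

In Rails 2.x each window would map to something like find(:all, :order => "id", :limit => limit, :offset => offset), so only one batch of objects is in memory at a time. Rails 2.3 and later also ship find_in_batches for the same purpose.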

About update_attribute, saves, connection.execute etc.: the fewer UPDATEs and INSERTs your db is tortured with, the better. connection.execute is THE thing if you can do your migration solely on the db level.

uploads and slow storages

Uploads are to be processed and distributed by a background process.

random selects from MySQL

Is :order => "RAND()" not good enough for you? (Update: Right you are.) But I think a more important thing is to use :select so you only read what you really need. That saves db load if you have fat models.
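One common alternative to ORDER BY RAND() is to count the rows first and then jump to a random offset, so the db does not have to sort the whole table. A sketch under assumptions: random_row_sql is a hypothetical helper, the table and column names are trusted identifiers (never user input), and the row count comes from a separate COUNT(*) query.

```ruby
# Build the SQL for picking one random row by offset instead of ORDER BY RAND().
# `columns` plays the role of :select, so only the needed columns are read.
def random_row_sql(table, row_count, columns = "*", rng = Random.new)
  return nil if row_count <= 0

  offset = rng.rand(row_count) # integer in 0...row_count
  "SELECT #{columns} FROM #{table} LIMIT 1 OFFSET #{offset}"
end
```

A large OFFSET still makes MySQL skip over that many rows, so this is a trade-off rather than a free lunch.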

download counter

Can’t we have this handled by say Google Analytics Events? If not then ActiveRecord::Base.update_counters is what you want. It’s doing an UPDATE only, no SELECT necessary.
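To see why no SELECT is needed, here is a plain-Ruby sketch that builds roughly the kind of statement a counter update comes down to. counter_update_sql is a hypothetical helper, not ActiveRecord code, and the table and column names are assumed to be trusted identifiers.

```ruby
# Build a single UPDATE that changes a counter in place. The db does the
# arithmetic, so no row has to be loaded into Ruby first.
# Integer() raises ArgumentError on anything that is not a clean integer.
def counter_update_sql(table, column, id, by = 1)
  amount = Integer(by)
  operator = amount < 0 ? "-" : "+"
  "UPDATE #{table} SET #{column} = COALESCE(#{column}, 0) #{operator} #{amount.abs} WHERE id = #{Integer(id)}"
end
```

With a hypothetical Download model, the Rails call would look like Download.update_counters(7, :hits => 1).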

Users with no documents

This is a big system, no? So we were smart enough to have a doc_count in the User model? Great. Problem solved.

Caching

General caching: Caching means less traffic, faster page delivery and fewer db requests. There are many techniques: memcache, db slaves, proxy balancers with data centers overseas. Every problem needs its own solution, and the problem solvers are usually well-paid consultants. But before you waste money, check your logs and find what can be optimized. Your app is most likely doing things that it doesn’t have to. As I said, I don’t trust even my own code.

Fragment caching, action caching, page caching, and ActiveRecord also caches SQL queries. Since 2.2 (or 2.1?) you can plug in your own cache store. I once wrote one that sits in between and adds TTL stamps, and it was so much more flexible than the old solutions that only worked for fragment caching. Fragment caching is great for small widgets that occur on many pages in your app. Action caching is usually the best choice if you rely on user authentication, but you can have that with page caching too if you use some JS, as it’s done in adva_cms. Page caching is the number one choice for static pages that don’t show data about the current_user. And SQL caching is done automagically, but make sure it really works. Always watch your dev log.
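For a feeling of what a store with TTL stamps does, here is a tiny in-memory sketch. TtlStore is a hypothetical class written for this post; it is neither the store mentioned above nor the ActiveSupport cache store API. The clock is injectable so the expiry logic can be tested without sleeping.

```ruby
# Minimal in-memory cache that stamps every entry with an expiry time.
class TtlStore
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock
    @data = {}
  end

  # Store a value together with the time at which it stops being valid.
  def write(key, value, ttl)
    @data[key] = [value, @clock.call + ttl]
    value
  end

  # Return the value, or nil if it is missing or its TTL stamp has passed.
  def read(key)
    value, expires_at = @data[key]
    return nil if expires_at.nil?

    if @clock.call >= expires_at
      @data.delete(key)
      return nil
    end
    value
  end

  # Return the cached value, or run the block and cache its result.
  def fetch(key, ttl)
    cached = read(key)
    return cached unless cached.nil?

    write(key, yield, ttl)
  end
end
```

A fragment would then be rendered inside the fetch block, so the expensive part only runs when the entry is missing or stale.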

HTTP offers ETags. The server tags each response with an ETag, and on a second request the browser sends that tag back in If-None-Match (or the Last-Modified time in If-Modified-Since); if the doc hasn’t changed, the server returns a 304 Not Modified. Use YSlow to see if it works.
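The conditional GET round trip fits in a few lines. This is a framework-free sketch: etag_for and conditional_response are hypothetical helpers, and using an MD5 digest of the body as the tag is just one possible choice. The browser echoes the tag it received in the If-None-Match request header.

```ruby
require 'digest/md5'

# Compute a quoted entity tag from the response body.
def etag_for(body)
  %("#{Digest::MD5.hexdigest(body)}")
end

# Return [status, headers, body] for a GET request.
# `if_none_match` is the value of the request's If-None-Match header, or nil.
def conditional_response(body, if_none_match = nil)
  etag = etag_for(body)
  if if_none_match == etag
    # The client's copy is still current: send headers only.
    [304, { "ETag" => etag }, ""]
  else
    [200, { "ETag" => etag }, body]
  end
end
```

Note that this sketch still builds the body before comparing, so it saves bandwidth but not rendering time. To save both, the tag has to come from something cheaper, such as an updated_at stamp.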

The trick about caching is not to expire everything at one point in time but to spread it over time. If you keep each cached fragment for five minutes after it was written, you are fine: the expiry times end up effectively random. If you expire everything on the full hour, you are screwed. Distribute your problem along the time axis.
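If your entries do tend to get written at the same moment, you can force the spread by adding some random jitter to each TTL. A minimal sketch; jittered_ttl is a hypothetical helper and expects integer seconds.

```ruby
# Return a TTL in seconds picked randomly from (base - spread)..(base + spread),
# so entries written together do not all expire together.
def jittered_ttl(base, spread, rng = Random.new)
  base - spread + rng.rand(2 * spread + 1)
end
```

For example, jittered_ttl(300, 60) gives every fragment a lifetime of somewhere between four and six minutes.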
