Software Engineering
Official Google Research Blog: Large-scale graph computing at Google
from Official Google Research Blog: Large-scale graph computing at Google.
If you squint the right way, you will notice that graphs are everywhere. For example, social networks, popularized by Web 2.0, are graphs that describe relationships among people. Transportation routes create a graph of physical connections among geographical locations. Paths of disease outbreaks form a graph, as do games among soccer teams, computer network topologies, and citations among scientific papers. Perhaps the most pervasive graph is the web itself, where documents are vertices and links are edges. Mining the web has become an important branch of information technology, and at least one major Internet company has been founded upon this graph.
Just like Map/Reduce, logic programming or OO, having more ways of thinking about a problem is a good thing :-)
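To make "documents are vertices and links are edges" concrete, here is a minimal Ruby sketch (not from the Google post). It stores a tiny, made-up web as an adjacency list and counts the incoming links for each page. The page names are hypothetical.

```ruby
# A tiny, made-up web: each page (vertex) maps to the pages it links to (edges).
WEB = {
  'home'   => %w[about blog],
  'about'  => %w[home],
  'blog'   => %w[home about post-1],
  'post-1' => %w[blog]
}.freeze

# Count how many links point at each page (its in-degree).
def in_degrees(graph)
  counts = Hash.new(0)
  graph.each_key { |page| counts[page] += 0 } # every page appears, even with no inbound links
  graph.each_value do |targets|
    targets.each { |target| counts[target] += 1 }
  end
  counts
end

in_degrees(WEB).sort_by { |page, count| [-count, page] }.each do |page, count|
  puts "#{page}: #{count}"
end
```

A crude count of inbound links is the simplest form of the link analysis that the post says an Internet company was founded on.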
One Div Zero: A Brief, Incomplete, and Mostly Wrong History of Programming Languages
1987 – Larry Wall falls asleep and hits Larry Wall’s forehead on the keyboard. Upon waking Larry Wall decides that the string of characters on Larry Wall’s monitor isn’t random but an example program in a programming language that God wants His prophet, Larry Wall, to design. Perl is born.
from One Div Zero: A Brief, Incomplete, and Mostly Wrong History of Programming Languages.
Domain Specific Editing Interface using RDFa and jQuery
I wrote back in January about Resource Lists, Semantic Web, RDFa and Editing Stuff. This was based on work we’d done in Talis Aspire.
Several people suggested this should be written up as a fuller paper, so Nad, Jeni and I wrote it up as a paper for the SFSW 2009 workshop. It’s been accepted and will be published there, but unfortunately due to work priorities that have come up we won’t be able to attend.
A draft of the paper is here: A Pattern for Domain Specific Editing Interfaces Using Embedded RDFa and HTML Manipulation Tools.
The camera ready copy will be published in the conference proceedings. Feedback welcomed.
Coghead closes for business
With the announcement that Coghead, a really very smart app development platform, is closing its doors it’s worth thinking about how you can protect yourself from the inevitable disappearance of a service.
Of course, there are all the obvious business-type due diligence activities, like ensuring that the company has sufficient funds and understanding how your subscription covers the cost (or doesn’t) of what you’re using, but all these can do is make you feel more comfortable – they can’t provide real protection. To be protected you need four key things – if you have these four things you can, if necessary, move to hosting it yourself.
- URLs within your own domain.
- Regular exports of your data.
- Regular exports of your application.
- The code.
Both you and your customers will bookmark parts of the app, email links, embed links in documents, build Excel spreadsheets that download the data and so on. You need to control the DNS for the host that is running your tenancy in the SaaS service. Without this you have no way to redirect your customers if you need to run the software somewhere else.
This is, really, the most important thing. You can re-create the data and the content, you can even re-write the application if you have to, but if you lose all the links then you will simply disappear.
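If you do control the DNS, moving can be as simple as pointing the old host name at a server that redirects every request to the same path on the new host. A minimal sketch, assuming a Rack-style server and a hypothetical new host name. It needs no gems because a Rack application is just an object that responds to `call(env)` and returns a status, headers and body.

```ruby
# Minimal Rack-style application that permanently redirects every request
# to the same path and query string on a new (hypothetical) host.
class MovedPermanently
  def initialize(new_base)
    @new_base = new_base.chomp('/')
  end

  def call(env)
    location = @new_base + env['PATH_INFO'].to_s
    query = env['QUERY_STRING'].to_s
    location += "?#{query}" unless query.empty?
    [301, { 'location' => location, 'content-type' => 'text/plain' }, ["Moved to #{location}\n"]]
  end
end

app = MovedPermanently.new('https://lists.example.org/')
status, headers, _body = app.call({ 'PATH_INFO' => '/lists/foo', 'QUERY_STRING' => 'format=rdf' })
puts status
puts headers['location']
```

A 301 tells browsers and crawlers the move is permanent, so bookmarks and links embedded in documents keep working.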
You may not get much notice of changes in a SaaS service. When you find out that they are having outages, going bust or simply disappearing, it is too late to work out how to get your data back out. Automate a regular export of your data so you know you can’t lose too much. Coghead allowed for that and is giving people time to get their data out.
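The automation itself can be small. This sketch assumes you already have some way to fetch the export from your provider (represented here by a plain string). It only writes a timestamped snapshot and keeps the most recent few. The file naming and the `keep` limit are assumptions. Run it from cron or whatever scheduler you use.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: write one timestamped snapshot of already-fetched
# export data into dir, then delete all but the newest `keep` snapshots.
def write_export(data, dir, now: Time.now, keep: 7)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "export-#{now.strftime('%Y%m%d-%H%M%S')}.json")
  File.write(path, data)

  # The timestamp format sorts chronologically, so the oldest files come first.
  snapshots = Dir.glob('export-*.json', base: dir).sort
  snapshots.first([snapshots.size - keep, 0].max).each do |name|
    File.delete(File.join(dir, name))
  end
  path
end

Dir.mktmpdir do |dir|
  3.times { |i| write_export(%({"run": #{i}}), dir, now: Time.at(1_000_000 + i), keep: 2) }
  puts Dir.children(dir).sort
end
```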
Having invested a lot in working out the right processes, rules and flows to make best use of your app, you want to be able to export that too, in a form that can be re-imported somewhere else. Coghead hasn’t allowed for this, meaning that Coghead customers will have to re-write their apps based on a human reading of the Coghead definitions. Which brings me on to my next point…
You want to be able to take the exact same code that was running the SaaS and install it on your own servers, import your exported application and data and update your DNS. Without the code you simply can’t do that. Making the code open-source may be a problem as others could establish equivalent services very quickly, but the software industry has had ways to deal with this problem through escrow and licensing for several decades. The code in escrow would be my absolute minimum.
SaaS and PaaS (Platform as a Service) providers promote a business model based on economies of scale, lower cost of ownership, improved availability, support and community. These benefits all still hold even if they meet the four needs above – but the priorities for these needs are with the customer, not with the provider. That’s because meeting these four needs makes the development of a SaaS product harder, and it also makes it harder for any individual customer to get set up. We certainly don’t meet all four with our SaaS and PaaS offerings at work yet, but I am confident that we’ll get there – and we’re not closing our doors any time soon ;-)
Ruby Mock Web Server
I spent the afternoon today working with Sarndeep, our very smart automated test guy. He’s been working on extending what we can do with rspec to cover testing of some more interesting things.
Last week he and Elliot put together a great set of tests using MailTrap to confirm that we’re sending the right mails to the right addresses under the right conditions. Nice tests to have for a web app that generates email in a few cases.
This afternoon we were working on a mock web server. We use a lot of RESTful services in what we’re doing, and being able to test our app’s handling of error conditions is important. We’ve had a static web server set up for a while, with particular requests and responses configured in it, but we’ve not really liked it because the responses are all separate from the tests and the server is another Apache vhost that has to be set up when you first check out the app.
So, we’d decided a while ago that we wanted to put in a little Ruby based web server that we could control from within the rspec tests and that’s what we built a first cut of this afternoon.
require File.expand_path(File.dirname(__FILE__) + "/../Helper")
require 'rubygems'
require 'rack'
require 'thin'

# A small Rack application that replays canned responses for requests
# that match previously registered expectations.
class MockServer
  def initialize
    @expectations = []
  end

  # env: a partial Rack environment hash; only the keys given are compared.
  # response: a Rack response triple [status, headers, body].
  def register(env, response)
    @expectations << [env, response]
  end

  def clear
    @expectations = []
  end

  def call(env)
    # Find the first (oldest) expectation whose entries all match the request.
    index = @expectations.index do |expected_env, _response|
      expected_env.all? { |key, value| env[key] == value }
    end

    if index
      # Each expectation is consumed once, so repeated requests can get different responses.
      _expected_env, response = @expectations.delete_at(index)
      response
    else
      # Nothing matched: return an explicit error instead of an invalid Rack response.
      [500, { 'Content-Type' => 'text/plain' }, ["No expectation matched #{env['REQUEST_METHOD']} #{env['PATH_INFO']}"]]
    end
  end
end

mock_server = MockServer.new
mock_server.register({ 'REQUEST_METHOD' => 'GET' }, [200, { 'Content-Type' => 'text/plain', 'Content-Length' => '11' }, ['Hello World']])
mock_server.register({ 'REQUEST_METHOD' => 'GET' }, [200, { 'Content-Type' => 'text/plain', 'Content-Length' => '11' }, ['Hello Again']])
Rack::Handler::Thin.run(mock_server, :Port => 4000)
The MockServer implements the Rack interface so it can run within the Thin web server from inside the rspec tests. Expectations are registered with the MockServer, and the first parameter is simply a hash in the same format as the Rack environment. You only specify the entries that you care about; any that you don’t specify are not compared with the request. Expectations don’t have to occur in order, except where the environments you give are ambiguous, in which case they are matched first in, first matched. Each expectation is removed once it has been matched.
As a first venture into writing more in Ruby than an rspec test, I have to say I found it pretty sweet. There was only one issue with getting at array indices that tripped me up, but Ross helped me out with that and it was pretty quickly sorted.
Plans for this include putting in a verify() and making it thread safe so that multiple requests can come in parallel. Any other suggestions (including improvements on my non-idiomatic code) very gratefully received.
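The two planned improvements might look something like this. This is a sketch of one way to do it, not code we have written. A `Mutex` serialises the match-and-remove step so that two concurrent requests can’t consume the same expectation. A `verify` method raises if any registered expectation was never requested. The class and method names are illustrative. Rack and Thin are left out so that `call` can be exercised directly.

```ruby
# Sketch only: a MockServer variant with a Mutex and a verify method.
# ThreadSafeMockServer, verify and UnmetExpectations are illustrative names.
class ThreadSafeMockServer
  class UnmetExpectations < StandardError; end

  def initialize
    @expectations = []
    @lock = Mutex.new
  end

  def register(env, response)
    @lock.synchronize { @expectations << [env, response] }
  end

  def clear
    @lock.synchronize { @expectations.clear }
  end

  # Rack entry point. Matching and removal happen under the lock.
  def call(env)
    @lock.synchronize do
      index = @expectations.index do |expected_env, _response|
        expected_env.all? { |key, value| env[key] == value }
      end
      if index
        @expectations.delete_at(index).last
      else
        [500, { 'content-type' => 'text/plain' }, ['No matching expectation']]
      end
    end
  end

  # Raise if any registered expectation was never requested.
  def verify
    unmet = @lock.synchronize { @expectations.map(&:first) }
    raise UnmetExpectations, "unmet expectations: #{unmet.inspect}" unless unmet.empty?
  end
end

server = ThreadSafeMockServer.new
server.register({ 'REQUEST_METHOD' => 'GET' }, [200, { 'content-type' => 'text/plain' }, ['Hello World']])
server.register({ 'REQUEST_METHOD' => 'POST' }, [201, { 'content-type' => 'text/plain' }, ['Created']])
puts server.call({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/' }).first

begin
  server.verify
rescue ThreadSafeMockServer::UnmetExpectations => e
  puts e.message # the POST was never requested
end
```

In an rspec suite, `verify` would naturally sit in an `after` block so every example checks that it made all the requests it set up.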
dev8D | Lightning talk: Agile Development
This is a great post on agile development coming out of the JISC dev8D days.
Example from the floor, Matthew: what worked well in a commercial company I was working for where we practiced extreme coding and used agile principles was: no code ownership (bound by strict rules), test-based development, rules about simplicity, never refactoring until you have to, stand up meetings, whiteboard designs, iterations so could find out when you’d messed something up almost immediately, everything had to have unit tests, there has to be a lot of trust in the system (you have to know that someone is not going to break your code)
Graham: building trust is central.
via dev8D | Lightning talk: Agile Development.
The quote above from Matthew and Graham mirrors my experience exactly – when we do those things well, are disciplined about it and trust each other, things work out well. When we do fewer of those things, they turn out less well.
Graham is Graham Klyne who I’ve met a few times at various meets like Vocamp 2008 in Oxford. He and his team are doing clever things with pictures of flies and semweb technologies.
Nounification of Verbs
For a long time I’ve felt uncomfortable every time I’ve written a class with a name like ‘FooManager’, ‘BarWatcher’ or ‘BazHelper’. This has always smelt bad and opening any codebase that is structured this way has always made me feel ever so slightly uneasy. My thoughts as to why are still slightly fuzzy, but here’s what I have so far…
First, some background: my perspective on object-oriented programming is deliberately naive. I don’t like to create interfaces for everything and I don’t use lots of factories. This comes, I guess, from my earliest education in C++, through one of the best books ever written on the subject: Simple C++ by Geoffrey Cogswell. While you stop laughing at the idea that you can learn something as complex as object-oriented programming from a thin paperback featuring a robot dog and the term POOP (Profound Object Orientated Programming), think about the very essence of what it is we’re trying to do.
OOP is about modelling objects. Objects are things that are, and to name things that are we use nouns. Then we give the objects responsibilities, things that they can do, behaviour. For those we use what my primary school English so beautifully called ‘doing words’, or verbs if you prefer.
Now, not long ago I wrote a ByteArrayHelper class in Java. I’m not ashamed of it. The code is good, efficient, readable code that does many of the common things I needed to do with a byte[]. However, help is a verb. My class’s responsibility is to help byte[] by doing things that byte[] doesn’t do. I’ve made the class name into a noun by nounifying the verb.
By de-nounifying it I can see where the responsibilities should really sit – with byte[]. My ByteArrayHelper does nothing for itself. All of its methods do something with a byte[]. The methods are things like SubArray(offset, length) and insertAt(offset, bytes). These are methods that I wanted on byte[].
Now, what I really wanted was to be able to add these methods to byte[], making them available wherever a byte[] was being handled, but Java gives you no way to add methods to an existing type, so I couldn’t do that (even if byte[] were a class, which it isn’t). In Smalltalk, JavaScript or Ruby I likely could have just added the methods I wanted. The next best thing would have been to declare a sub-class of byte[] and put the methods on that; then the initial construction of my byte[] instance could create my own, more capable, object, but still pass it around everywhere as a byte[]. But byte[] isn’t a class in Java, byte isn’t even a class, it’s a primitive – sort of an object, but much less powerful.
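Ruby has no byte[], but this sketch gives a rough idea of what "just adding the methods" looks like there. It reopens `Array`, treating an array of integers as a byte array, and adds the two methods mentioned above. The snake_case names `sub_array` and `insert_at` are my Ruby spellings. Patching a core class like this changes every array in the program, which is exactly the trade-off being discussed.

```ruby
# Reopen the core Array class and add the methods wanted on byte[].
# This changes Array globally; Ruby refinements are the scoped alternative.
class Array
  # Return `length` elements starting at `offset` (empty if out of range).
  def sub_array(offset, length)
    self[offset, length] || []
  end

  # Return a new array with `bytes` inserted before position `offset`.
  def insert_at(offset, bytes)
    dup.insert(offset, *bytes)
  end
end

bytes = [0x01, 0x02, 0x03, 0x04]
p bytes.sub_array(1, 2)
p bytes.insert_at(2, [0xAA, 0xBB])
```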
Following the search for a noun-based approach, I could have created my own ByteArray that may or may not have delegated to a byte[] internally. This could not have been passed around as a byte[] though, so it would have required substantial refactoring of the classes already there. So, I wrote a ByteArrayHelper instead. Having written the ByteArrayHelper, though, it was obvious that none of the methods required any instance variables; they all took and returned byte arrays, so I made them all static. So my nounified verb had actually led me to write nothing more than a function library.
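For comparison, here is the noun-based alternative described above, sketched in Ruby with the standard `Forwardable` module: a `ByteArray` that owns its bytes and delegates to an internal array. The class and its methods are illustrative only. Note that callers now have to receive a `ByteArray` instead of a raw array, which is the refactoring cost that put me off.

```ruby
require 'forwardable'

# Hypothetical noun-based wrapper that owns its bytes and delegates the basics
# to an internal Array.
class ByteArray
  extend Forwardable
  def_delegators :@bytes, :size, :[], :each

  def initialize(bytes)
    @bytes = bytes.dup
  end

  # Return a copy so callers can't mutate the internal array.
  def to_a
    @bytes.dup
  end

  def sub_array(offset, length)
    ByteArray.new(@bytes[offset, length] || [])
  end

  def insert_at(offset, bytes)
    ByteArray.new(@bytes.dup.insert(offset, *bytes))
  end
end

data = ByteArray.new([0x01, 0x02, 0x03, 0x04])
p data.insert_at(1, [0xFF]).to_a
p data.sub_array(2, 2).to_a
```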
Whether or not I made the right decision is left as an exercise for the reader.
Take another example, this time from a friend’s code. Looking through it we noticed that one of the classes was a FileLoaderManager – a class whose responsibility is to manage FileLoaders. A nounified verb looking after another nounified verb. I hasten to add that this is not bad code – the code in question does some awesome processing of relationships looking for similarities, like Amazon’s ‘people who bought this also bought’ but more generic.
When we looked into the FileLoaderManager and took away some of the responsibilities that fitted better with other classes, we were left with just the need to list all the files in a given path that matched a particular pattern. Knowing what files are at a given path sounds like the responsibility of a Directory to me. Now, as this was very lean C++, we didn’t bother looking for one of the readily available Directory classes; the code we already had could be refactored quickly. Having written the Directory class, it became obvious that it would be useful elsewhere, whereas the FileLoaderManager could only be used for the one specific case it originally fulfilled. The nounified verb had led to the code being far more specific than it needed to be.
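The Directory we wrote was C++, but the idea translates directly. Here is a minimal Ruby sketch built on the standard `Dir.glob`. The class and method names are hypothetical.

```ruby
require 'tmpdir'

# Hypothetical Directory class: its one job is knowing which files live at a path.
class Directory
  attr_reader :path

  def initialize(path)
    @path = path
  end

  # Full paths of regular files matching a glob pattern relative to this
  # directory, such as '*.csv', sorted for stable output.
  def files_matching(pattern)
    Dir.glob(pattern, base: @path)
       .map { |name| File.join(@path, name) }
       .select { |full_path| File.file?(full_path) }
       .sort
  end
end

Dir.mktmpdir do |dir|
  %w[a.csv b.csv notes.txt].each { |name| File.write(File.join(dir, name), '') }
  Dir.mkdir(File.join(dir, 'archive.csv')) # a directory, so it is skipped
  puts Directory.new(dir).files_matching('*.csv').map { |f| File.basename(f) }
end
```

Nothing here is specific to loading files, which is the point: the same class can answer "what's in this folder?" anywhere in the codebase.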
Two classes I came across in a PHP codebase recently were called FilePutter and FileGetter. These two classes wrap the file_put_contents and file_get_contents functions in PHP; wrapping these functions as classes allows them to be mocked, so code that uses them can be unit tested. Wouldn’t a single class called, simply, File be easier to follow? The nounified verb approach had led to a peculiar structure in the code that made it less obvious for a reader to follow.
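Here is a sketch of the single-class alternative in Ruby. Ruby already has a `File` class, so this hypothetical wrapper is called `ContentFile`. It wraps `File.read` and `File.write`, which are rough equivalents of PHP’s file_get_contents and file_put_contents. It is paired with an in-memory stand-in with the same two methods, so a test swaps one object instead of mocking two classes.

```ruby
require 'tmpdir'

# Hypothetical single wrapper for whole-file reads and writes.
class ContentFile
  def initialize(path)
    @path = path
  end

  def read
    File.read(@path)
  end

  def write(contents)
    File.write(@path, contents)
  end
end

# In-memory stand-in with the same two methods, for unit tests.
class InMemoryContentFile
  def initialize(contents = '')
    @contents = contents
  end

  def read
    @contents
  end

  def write(contents)
    @contents = contents
  end
end

Dir.mktmpdir do |dir|
  file = ContentFile.new(File.join(dir, 'greeting.txt'))
  file.write('hello')
  puts file.read
end
```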
So far then, my conclusion is that nounified verbs are likely to be a sign that I’m not using OO techniques for specialisation of behaviour; that my code is more specialised than it could be or that I’m writing in a way that is less easy to read than it could be.
Resource Lists, Semantic Web, RDFa and Editing Stuff
Some of the work I’ve been doing over the past few months has been on a resource lists product that helps lecturers and students make best use of the educational material for their courses.
One of the problems we hoped to address really well was the editing of lists. Historically, products that do this have been deemed cumbersome and difficult by academic staff, who will often produce lists as simple documents in Word or the like.
We wanted to make an editing interface that really worked for the academic community so they could keep the lists as accurate and current as they wanted.
Chris Clarke, our Programme Manager, and Fiona Grieg, one of our pilot customers, describe the work in a W3C case study. Ivan Herman then picks up on the way we decided to implement editing using RDFa within the HTML DOM. In the case study Chris describes it like this:
The interface to build or edit lists uses a WYSIWYG metaphor implemented in Javascript operating over RDFa markup, allowing the user to drag and drop resources and edit data quickly, without the need to round trip back to the server on completion of each operation. The user’s actions of moving, adding, grouping or editing resources directly manipulate the RDFa model within the page. When the user has finished editing, they hit a save button which serialises the RDFa model in the page into an RDF/XML model which is submitted back to the server. The server then performs a delta on the incoming model with that in the persistent store. Any changes identified are applied to the store, and the next view of the list will reflect the user’s updates.
This approach has several advantages. First, as Andrew says:
One thing I hadn’t used until recently was RDFa. We’ve used it on one of the main admin pages in our new product and it’s made what was initially quite a complex problem much simpler to implement.
The problem that’s made simpler is this – WYSIWYG editing of the page was best done using DOM manipulation techniques, and most easily using existing libraries such as Prototype. But what was being edited isn’t really the visual document, it is the underlying RDF model. Trying to keep a version of the model in a JS array or something in sync with the changes happening in the DOM seemed to be a difficult (and potentially bug-ridden) option.
By using RDFa we can distribute the model through the DOM and have the model updated by virtue of having updated the DOM itself. Andrew describes this process nicely:
Currently using Jeni Tennison’s RDFQuery library to parse an RDF model out of an XHTML+RDFa page we can mix this with our own code and end up with something that allows complex WYSIWYG editing on a reading list. We use RDFQuery to parse an initial model out of the page with JavaScript and then the user can start modifying the page in a WYSIWYG style. They can drag new sections onto the list, drag items from their library of bookmarked resources onto the list and re-order sections and items on the list. All this is done in the browser with just a few AJAX calls behind the scenes to pull in data for newly added items where required. At the end of the process, when the Save button is pressed, we can submit the ‘before’ and ‘after’ models to our back-end logic which builds a Changeset from before and after models and persists this to a data store on the Talis Platform.
Building a Changeset from the two RDF models makes quite a complex problem relatively straightforward. The complexity now just being in the WYSIWYG interface and the dynamic updating of the RDFa in the page as new items are added or re-arranged.
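The changeset idea is easy to see if triples are treated as plain values. This is a minimal sketch that assumes each model is a collection of `[subject, predicate, object]` arrays. The real system built Changesets for the Talis Platform, and that format isn’t shown here. Whatever is only in the before model is removed, and whatever is only in the after model is added.

```ruby
require 'set'

RL = 'http://purl.org/vocab/resourcelist/schema#'

# Compute a simple changeset between two models of [s, p, o] triples.
def changeset(before, after)
  before = before.to_set
  after = after.to_set
  { removals: (before - after).to_a, additions: (after - before).to_a }
end

# Example: item bar is replaced by item baz on list foo.
before = [
  ['http://example.com/lists/foo', "#{RL}contains", 'http://example.com/items/bar'],
  ['http://example.com/items/bar', "#{RL}list", 'http://example.com/lists/foo']
]
after = [
  ['http://example.com/lists/foo', "#{RL}contains", 'http://example.com/items/baz'],
  ['http://example.com/items/baz', "#{RL}list", 'http://example.com/lists/foo']
]

changes = changeset(before, after)
puts "remove: #{changes[:removals].size}, add: #{changes[:additions].size}"
```

Unchanged triples never appear in the changeset, so the store only has to apply what the user actually changed.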
As Andrew describes, the editing starts by extracting a copy of the model, which lets the browser maintain both before and after models. This is useful because, when both are posted to the server, the before model can be used to spot editing conflicts with someone else’s concurrent edit. This is an improvement on how Chris described it in the case study.
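This is one way the server could use the posted before model, sketched with the same triple-array assumption. The conflict rule is my assumption, not necessarily what we do. An edit conflicts if someone else changed a triple whose subject this edit also touches. Other concurrent changes are kept, and this edit’s removals and additions are applied on top.

```ruby
require 'set'

class EditConflict < StandardError; end

# Apply one user's edit (their before and after models) to the triples that
# are currently stored. Each triple is an [s, p, o] array.
# Assumed rule: conflict if someone else changed a triple whose subject this
# edit also touches; non-overlapping concurrent changes are kept.
def apply_edit(current, before, after)
  current = current.to_set
  before = before.to_set
  after = after.to_set

  removals = before - after
  additions = after - before
  concurrent = current ^ before # triples someone else added or removed meanwhile

  touched = (removals | additions).map(&:first).to_set
  clashes = concurrent.map(&:first).to_set & touched
  raise EditConflict, "concurrent edits to #{clashes.to_a.join(', ')}" unless clashes.empty?

  (current - removals) | additions
end

title = 'http://purl.org/dc/terms/title'
before = [['http://example.com/items/bar', title, 'Old title']]
after = [['http://example.com/items/bar', title, 'New title']]
# Meanwhile someone else added an unrelated item.
current = before + [['http://example.com/items/baz', title, 'Another item']]
p apply_edit(current, before, after).to_a
```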
There are some gotchas in this approach, though. First, some of the nodes have two-way links:
<http://example.com/lists/foo> <http://purl.org/vocab/resourcelist/schema#contains> <http://example.com/items/bar> .
<http://example.com/items/bar> <http://purl.org/vocab/resourcelist/schema#list> <http://example.com/lists/foo> .
So that the relationship from the list to the item gets removed when the item is deleted from the DOM, we use the @rev attribute. This allows us to put the relationship from the list to the item with the item, rather than with the list.
The second issue is that we use rdf:Seq to maintain the ordering of the lists, so when the order changes in the DOM we have to do a quick traversal of the DOM, changing the sequence predicates (rdf:_1, rdf:_2, etc.) to match the new visual order.
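Renumbering amounts to rebuilding the membership triples from the visual order. The real code walks the DOM in the browser. This Ruby sketch shows only the renumbering logic and assumes the new order is already available as an array of item URIs. The sequence URI is hypothetical.

```ruby
RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

# Rebuild the rdf:_1, rdf:_2, ... membership triples of an rdf:Seq so they
# follow the items' current visual order.
def sequence_triples(seq_uri, ordered_items)
  ordered_items.each_with_index.map do |item, i|
    [seq_uri, "#{RDF_NS}_#{i + 1}", item]
  end
end

# The items have just been dragged into this order.
items = ['http://example.com/items/baz', 'http://example.com/items/bar']
sequence_triples('http://example.com/lists/foo#items', items).each do |s, p, o|
  puts "<#{s}> <#{p}> <#{o}> ."
end
```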
Neither of these were difficult problems to solve :-)
My thanks go out to Jeni Tennison, who helped me get the initial prototype of this approach working while we were at Swig back in November.
How long to change culture
Over on Tiny Drops of Knowledge (the name belies the depth of the content) Lyndsay posted about changing culture – in his case being part of a team moving from VB to C#.
At Talis we find we’re constantly changing, and have been for longer than my three years here. At Talis that change has been about talent, responsibility, empowerment, achievement, learning, fun and real values.
Lyndsay says:
A reoccurring challenge we’ve faced on this project is re-learning how to write software. Its more than just learning a new language, tool or framework, its more like loosing the shackles of apartheid.
Over the past few years I’ve concluded that this is the normal state of affairs. When I first started work I was given some great advice – my boss told me, when I handed my notice in, to make sure that wherever I went I wasn’t the smartest person there. What he was getting at is the need in me, and many of the people I count as my closest friends, to be constantly finding out new things. I thought nothing more of it at the time.
But then, working with folks at Talis, I realised the very real truth that “the more I learn the less I know”. And I’m okay with that. Not everyone is, but I think it’s key to being as productive as you can be. The change Lyndsay is talking about may be the change from writing instructions for the computer to modelling a problem for yourself and your colleagues. Once you’ve made that leap, it makes sense to understand as many different ways of modelling problems as you can – at work we’ve been through OO design in Java, procedural code in PHP and functional code in XSLT, to discussions of the similarities between Prolog and how we initially pictured working with RDF.
There’ll be something else tomorrow. I know there will because I work with a load of people who are smarter than I am.
Technorati Tags: software engineering, talis
There’s no I in Team
Back in June, on The Berkun Blog, Scott talked about Asshole-Driven Development, and other great techniques for the dysfunctional office. He states clearly that his list is cynical, and that there is probably a happy list as well…
Well, I figured I’d take a pop at a happier list…
First up, let’s have:
Motivated and Empowered Individual method (ME,I)
This is how I’d describe the way Joel Spolsky has set up the guys at Fog Creek. Essentially the team breaks the solution down into parts and gives each part to a person. Each person is free to develop in their own way, within some bounds set by the team, and becomes the owner of an area of functionality. Without the distractions of other people working on the same code areas, the owner can become very productive within the bounds of the code they own. Joel describes the people he hires as “Smart and Gets Things Done”; he wrote a book of the same name.
Smart Friends Development Model (SFDM)
I spotted this one at XTech in Paris earlier this year. There I met three smart friends who, in their spare time, had developed Quakr. Friendship in a development team provides a real boost to the way the team communicates and negotiates decisions and issues. In the case of Quakr they were friends first and decided to build Quakr second, but I’ve seen teams formed by other companies where effort has been put into building great friendships.
Very-Clever and Nice People (VCNP)
Martin Fowler of Thoughtworks is open about trying to hire only the very best people. The main barrier to growing Thoughtworks is finding and hiring that talent. Once hired, they move people around, making sure they get to know all the other very clever people they’ve hired. Being clever isn’t enough though; they also look for soft skills, so they hire nice people. The end result is that they can form teams who can work at a very high level and have a lot of fun sharing ideas and helping each other. This is essentially what Microsoft did in the early days too, and how they came to have the Program Manager role. Comments over at Scott’s piece talk about responsibility without authority in a very negative way, but if you have very clever and nice people this can clearly work, and Thoughtworks show this with their teams.
Smart and Nice Entrepreneurs (SANE)
Back at Talis we also hire smart people. We also try very hard to make sure they’re nice too. We think we’re all pretty nice really. But there’s also a key self-motivational quality we look for; the ability to understand and be interested in how the software will make someone’s life better, as well as how clean the code is under the bonnet. We think that combination is what’s helping us develop some really great stuff and have fun doing it.
It saddens me to read posts like Scott’s and the subsequent comments. I’ve had bad experiences with employers and managers who seem to have different motivations and values to mine, and I know from friends around the industry how prevalent the problems Scott and his commenters talk about are. Surely the best thing to do is to find somewhere worth working and move, or as Martin Fowler apparently said, “If you can’t change your organization, change your organization!”
I had hoped to get to more than three happier methodologies. Perhaps that’s a sign that the cynics are right.
Technorati Tags: software engineering, talis