Thursday, November 30, 2006

Is text-mining a good way to elicit requirements?

Yes, it is.

A landmark work on this topic is AbstFinder. Several other strategies exist. Original documents from the Universe of Discourse are information sources to be used in eliciting requirements. They are of fundamental importance.

However, we must be careful about when and how to use text-mining. Some time ago, a simple text-mining strategy was commonly used by some OO practitioners: underlining words in a given text, usually a requirements document written in natural language by clients.
Such a strategy is not the best choice.

See what Mitchell Lubars, Colin Potts, and Charles Richter wrote in a 1993 ICSE paper.

"Some protagonists of OOA advocate a bottom-up
strategy in which the analyst underlines or highlights all
the noun phrases in the source material [Rum91].
This produces a list of candidate objects that must then
be pruned according to certain guidelines. This strategy
is sensible for small problems: objects are likely to be
referred to by noun phrases; making the list requires
little judgement and is almost trivial: and object-oriented
methods make many useful recommendations to help the
analyst prune the list.

However, these tasks become overwhelming for problems
of the size we faced. Listing the noun phrases in a
500-page requirements document is a daunting task
of questionable value. A long, unorganized list is
not a good starting point for the next stages of analysis."

Monday, November 27, 2006

Software Engineering as a Discipline

Some believe that Information Systems should be treated as a discipline in itself. In a recent article in CACM, Katerattanakul, Han and Rea report that Information Systems is growing from “an applied discipline drawing upon other disciplines” to “…the new perception that IS is a reference discipline for others”.

They used a cross-reference analysis to show that papers published in IS journals are being cited in other fields of knowledge. Although most citations came from the field itself and from the seed disciplines of computer science and management, it is interesting to point out that IS is being cited in journals from engineering, sociology and medicine.

The study used the following approach: select 1,120 articles published in one of the following journals (taken as representative of the area): CACM, EJIS, ISJ, ISR, I&M and MISQ; trace references to these articles in two large repositories, SSCI and SCI; then classify the citation sources by area.

The areas that provided the most references to these 1,120 articles were: IS (43.9%), Computer Science (28%), Management (7.6%), Engineering (5.8%), Sociology (2.6%) and Others (3.2%).
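As a quick sanity check on these figures, here is a minimal sketch that adds up the shares as I transcribed them from the article. The dictionary keys and variable names are my own; only the percentages come from the study. Note that the shares listed here do not add up to 100%, so I assume some areas were left out of my summary.

```python
# Citation shares (in %) as listed in this post; the names are illustrative only.
shares = {
    "IS": 43.9,
    "Computer Science": 28.0,
    "Management": 7.6,
    "Engineering": 5.8,
    "Sociology": 2.6,
    "Others": 3.2,
}

listed_total = sum(shares.values())

# Share coming from the field itself plus computer science.
is_plus_cs = shares["IS"] + shares["Computer Science"]

# Whatever is not covered by the areas listed in this post.
not_listed = 100.0 - listed_total

print(f"Listed areas add up to {listed_total:.1f}%")
print(f"IS + Computer Science: {is_plus_cs:.1f}%")
print(f"Not covered by this list: {not_listed:.1f}%")
print(f"Computer Science / Management ratio: {shares['Computer Science'] / shares['Management']:.1f}")
```

The ratio on the last line is what surprised me: Computer Science shows up several times more often than Management as a citing area.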

My understanding from the data is that the authors may have a point, but most of the references are still more multidisciplinary than interdisciplinary. I was surprised by the difference between Computer Science and Management. I would have supposed that most of the citations would come from Management, but… A possible explanation is that the sample of 1,120 could include a large set of articles that are really more about computer science than about information systems, but this just shows us how blurred these distinctions are.

Why did I post this note?

Well, I believe if somebody did a similar study of software engineering, it would stand even better as a reference discipline. The previous post on Jackson is one of the reasons why I believe SE is a discipline by itself (see a previous post on the topic).

The Machine

Michael Jackson has been a fundamental contributor to the field of software engineering.

I first learned from Michael through JSP, Jackson Structured Programming, and later through JSD, Jackson System Development. I have taught several editions of my course at PUC-Rio using the JSD book.

I often say that Michael's writings have evolved in a truly computer science way, that is: bottom up. He started with JSP, then moved to JSD, then to the great book on Software Requirements and Specifications, and recently to the really abstract book: Problem Frames.

I regret that the great book on Requirements used the “and” in its title: lack of cohesion. However, this is symptomatic of the confusion that these terms bring to the software engineer.

Anyway, this note is to call your attention to a must-read paper by Jackson: “Some Basic Tenets of Description”. It is short, 7 pages, and it is aimed at the “heart” of software engineering: modeling.

Let me start with the conclusion. Here Jackson brings out 3 important mantras of his own. They are:

• "Distinguish the machine from the problem domain"
• "Don’t restrict description to the machine", and
• "State explicitly what is described".

I would add another 4, extracted from the Section titles of the article.

• “Requirements Are Not Given Properties”
• “The Model Is Not the Reality”
• “The Problem Is Not at the Interface”
• “Describing the Machine Is Not Enough”

The article is concise and right to the point. It distinguishes the machine from the problem domain (which I prefer to call the Universe of Discourse) and explicitly points out that requirements are the desired needs of a set of actors, such that they must form a bridge from the problem domain to the machine.

p.s. In searching for the links to JSP and JSD, I found this commentary by Michael on the rationale for both methods.

Friday, November 10, 2006

UML Usage

In a recent article in CACM, Dobing and Parsons report on the analysis of a UML usage survey (182 respondents). 171 responses came from UML users and 11 came from partial UML users.

Of the several findings of the article, I would like to stress the following ones:

1) Respondents, on average, have been involved in 27 projects, of which only 6.2 used UML.

2) Class Diagram (73%) is the most frequently used technical description, followed by Use Case Diagram and Sequence Diagram.

3) Use Case Narratives (87%), Activity Diagrams (77%) and Use Case Diagrams (74%) are the preferred means with regard to client involvement.

4) Class Diagram (93%), Sequence Diagram (91%) and Statechart Diagram (82%) are the preferred means with regard to clarifying technical understanding.

7) In response to the question on “reasons for not using some UML components”, some findings are intriguing:
a. 50% said that Class Diagrams were not well understood by analysts, and 48% said that Activity Diagrams were not well understood by analysts.
b. 42% said that Statechart Diagrams were not useful for most projects.
c. 42% said that Use Case Diagrams have insufficient value to justify their cost and were not useful with programmers.
d. 49% said that Collaboration Diagrams would capture redundant information.

8) “The median “typical” UML project had a budget of around $1,000,000 and 6.5 person-years and required 50,000 lines of code”

Some observations:

1) Although it is the most used UML component, the Class Diagram is also the most misunderstood one! If the data collection and treatment are trustworthy, then this looks like a major contradiction. How could a team claim to use UML, and as such be object-oriented, if its members do not understand the Class Diagram?

2) Of the respondents' 27 previous projects (on average), only 6.2 (23%) used UML!

3) Use Case Narratives were preferred over Use Case Diagrams as a way to communicate with clients. This is a surprise, since the common belief is that Use Case Diagrams, given their iconic character, are the medium most used by OO developers when dealing with clients.

4) The data collected by Dobing and Parsons show a productivity of 7,692 lines of code per person-year (50,000 lines over 6.5 person-years). This is above the 2001 industry average of 6,200 reported by Gartner (as cited in the MSDN blog by EldarM). It amounts to a rate of 32 LOC per person per day (240 working days). Of course, it is widely known that LOC is not the best productivity measure, since different contexts yield different numbers.
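For those who want to check the arithmetic behind observations 2 and 4, here is a minimal sketch. The constant and function names are mine; the inputs are the survey figures quoted above, the Gartner average as cited, and my own assumption of 240 working days per year.

```python
# Figures quoted from the survey's median "typical" UML project.
LINES_OF_CODE = 50_000
PERSON_YEARS = 6.5

# Assumption made in this post, not a figure from the survey.
WORKING_DAYS_PER_YEAR = 240

# 2001 industry average attributed to Gartner (as cited in the blog post).
GARTNER_2001_LOC_PER_YEAR = 6_200

# Average project counts per respondent.
PROJECTS_TOTAL = 27
PROJECTS_WITH_UML = 6.2


def loc_per_person_year(loc: float, person_years: float) -> float:
    """Lines of code produced per person-year of effort."""
    return loc / person_years


def loc_per_person_day(loc: float, person_years: float, working_days: int) -> float:
    """Lines of code produced per person per working day."""
    return loc_per_person_year(loc, person_years) / working_days


per_year = loc_per_person_year(LINES_OF_CODE, PERSON_YEARS)
per_day = loc_per_person_day(LINES_OF_CODE, PERSON_YEARS, WORKING_DAYS_PER_YEAR)
uml_share = PROJECTS_WITH_UML / PROJECTS_TOTAL

print(f"{per_year:.0f} LOC per person-year")
print(f"{per_day:.0f} LOC per person per day")
print(f"{per_year / GARTNER_2001_LOC_PER_YEAR:.2f} times the cited Gartner average")
print(f"{uml_share:.0%} of previous projects used UML")
```

Changing WORKING_DAYS_PER_YEAR changes the daily rate, which is one more reason to read LOC-based numbers with care.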