This page contains the project's work plan, together with a list of scheduled project tasks / deliverables.


Work Plan

The emphasis in this medium-sized project is on three inter-related R&D areas: Parallel Programming Models and Environments, Parallel Algorithms for Numerical Methods, and Scalable Computer Vision. The establishment of an Excellence Nucleus for each of these R&D areas defines the three major tasks:

T1. To establish an Excellence Nucleus on Parallel Programming Models and Environments

T2. To establish an Excellence Nucleus on Parallel Algorithms for Numerical Methods

T3. To establish an Excellence Nucleus on Scalable Computer Vision


The remaining text describes each of these tasks in more detail, dividing them into subtasks and allocating the subtasks to each institution. The existing and required human resources are also included. The original proposal involved 5 institutions. However, the available funds negatively affected the priorities of some institutions, which led to the unavailability of Universidade de Coimbra (UC) to contribute to the deliverables. As a result, UC changed its status in this project from partner to consulting institution.

The deliverables from this project are specified in detail for each task or subtask. An annual Workshop will summarise the activities of the consortium; each institution will present technical communications, which will later be published as Workshop Proceedings. Overall, the deliverables include:


Tasks and Deliverables




T1. To establish an Excellence Nucleus on Parallel Programming Models and Environments

Leader: Prof. José Cunha (UNL)

Partners: UM (UC as a consultant)


Task description:

Several parallel software engineering tools have already been developed within the consortium, such as monitoring and debugging tools, parallel programming libraries and a scalable object-oriented programming and run-time environment; prototypes were also developed and tested on a limited set of case studies.

In this project we propose to use a basic parallel programming platform supported by the PVM and MPI systems; PVM is the most widely disseminated and portable parallel programming interface, while MPI is an emerging system which will probably become the standard interface. This approach takes advantage of the task members' previous R&D experience in this field and allows a convenient integration of the developed tools and environments.

Several main areas of activity, which are closely related, can be identified within this task; they are grouped into the following subtasks:

- Framework for Monitoring and Debugging Tools for Parallel and Distributed Systems

- Suite of Parallel Programming Libraries

- Scalable OO Programming and Run-time Environment, ParC++

Within this project we will continue our work on the improvement and assessment of the above existing models and tools. This will now be done under a tight co-operation scheme between the project partners, promoting cross-fertilisation of results, exchange of knowledge and convergence towards the common goal of establishing an excellence nucleus on parallel programming environments. The above tools and models will be integrated into the common platform and later evaluated through a set of parallel applications that will be selected jointly with the partners in the other tasks.

A monitoring architecture for parallel and distributed systems was developed at UNL, representing a trade-off between flexibility, efficiency and scalability. This design integrates a suite of mechanisms supporting several aspects of program execution, including performance monitoring, debugging, and visualisation of program behaviour. The implementation work has centred on the PVM system for local heterogeneous computer networks, including Transputer-based parallel machines. An event-based interface promotes the integration of this monitoring and debugging support layer with other tools in a parallel software environment, such as high-level language models, performance analysis and visualisation tools, and high-level debuggers.
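The event-based integration idea can be sketched as a small publish/subscribe layer. This is a hypothetical illustration, not the UNL implementation: the `Event`, `EventBus` and `count_sends` names are invented for the example, which only shows how consumer tools might subscribe to the event kinds they need.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of an event-based monitoring interface: the
// monitoring layer publishes event records; consumer tools (debuggers,
// visualisers) subscribe to the event kinds they are interested in.
struct Event {
    std::string kind;   // e.g. "send", "recv", "spawn"
    int process;        // id of the monitored process
    double timestamp;   // seconds since program start
};

class EventBus {
public:
    using Handler = std::function<void(const Event&)>;

    // A tool registers interest in one event kind.
    void subscribe(const std::string& kind, Handler h) {
        handlers_[kind].push_back(std::move(h));
    }

    // The monitoring layer publishes an observed event to all subscribers.
    void publish(const Event& e) {
        for (auto& h : handlers_[e.kind]) h(e);
    }

private:
    std::map<std::string, std::vector<Handler>> handlers_;
};

// Example consumer: a trivial "performance tool" counting message sends.
int count_sends(const std::vector<Event>& trace) {
    EventBus bus;
    int sends = 0;
    bus.subscribe("send", [&](const Event&) { ++sends; });
    for (const auto& e : trace) bus.publish(e);
    return sends;
}
```

The decoupling shown here is what lets high-level debuggers and visualisation packages be attached without changes to the monitoring layer itself.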

The development of a heterogeneous computer vision architecture within this consortium, and the associated R&D work, show that computer vision tasks may take advantage of scalable parallel computing environments; they require, however, adequate interfaces to develop portable and efficient applications, as well as better parallel algorithms for computationally intensive numerical computer vision tasks. A scalable object-oriented (OO) programming and run-time environment, ParC++, was designed and implemented at UM as a prototype for the development of computer vision and computer graphics applications on a Transputer-based parallel architecture. The parallelism granularity in ParC++ can be defined dynamically by run-time grain-size adaptation, based on the architecture and application behaviour; a similar property can be found in Ellie, although Ellie relies only on compile-time information, so the grain size is statically defined. The scalability of the parallel code is obtained by run-time adaptation of the grain size of parallel activities, based on processor and network load, and is aimed at supporting dynamic object allocation; this feature is the main difference between ParC++ and other C++-based approaches. More extensive testing of this environment in computer vision and computer graphics applications is still required, together with support for standard message-passing mechanisms (PVM/MPI) and further investigation of scalability and dynamic load balancing.
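The run-time grain-size adaptation idea can be illustrated with a minimal sketch, which is not ParC++ code: a recursive reduction spawns a new parallel activity only while the work unit is larger than a cutoff, and runs sequentially below it. In ParC++ the cutoff would be derived from processor and network load at run time; here it is a fixed parameter for illustration.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical sketch of grain-size adaptation: parallelism is created
// only while the grain (sub-range length) exceeds a cutoff; smaller
// grains are computed sequentially, bounding task-creation overhead.
long adaptive_sum(const std::vector<long>& v, std::size_t lo, std::size_t hi,
                  std::size_t grain_cutoff) {
    if (hi - lo <= grain_cutoff)  // grain too small: stay sequential
        return std::accumulate(v.begin() + lo, v.begin() + hi, 0L);
    std::size_t mid = lo + (hi - lo) / 2;
    // spawn a parallel activity for the left half, recurse on the right
    auto left = std::async(std::launch::async, adaptive_sum,
                           std::cref(v), lo, mid, grain_cutoff);
    long right = adaptive_sum(v, mid, hi, grain_cutoff);
    return left.get() + right;
}
```

Raising or lowering `grain_cutoff` at run time, as ParC++ does from load information, trades task-management overhead against the degree of parallelism.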

This project is a good opportunity for the consortium to apply the above work and assess its suitability for the development of a significant set of parallel applications drawn from the numerical methods and computer vision domains. It will also help ease the difficult task of parallel programming in several related aspects:

- the monitoring and debugging tools allow the programmer to obtain information from the actual parallel program execution, or to control its dynamic evolution, at several levels of abstraction; this will help reduce the development time of parallel applications, and it also highlights hidden aspects of program behaviour that are amenable to more efficient implementations or are potential causes of errors;

- a scalable (OO) parallel programming language provides a higher abstraction level, contributing to a clearer program structure, while the associated run-time system may reduce the complexity of writing high-performance, architecture-independent, scalable applications.


T 1.1 Framework for Monitoring and Debugging Tools for Parallel and Distributed Systems

This subtask includes the following activities:

T 1.1.1 To develop a monitoring and debugging architecture for parallel platforms based upon the PVM and MPI systems (UNL)

T 1.1.2 To develop an environment supporting a user interface to control the monitoring and debugging functionalities, namely to select specific events to be monitored and to specify the level of abstraction for viewing program execution (UNL)

T 1.1.3 To provide an interface between the monitoring layer and existing performance visualisation packages, thus giving access to a wider diversity of computation views (UNL)

T 1.1.4 To evaluate the monitoring and debugging functionalities by integrating them into a parallel and distributed logic programming system that supports the development of distributed artificial intelligence applications (UNL)

T 1.1.5 To evaluate the monitoring and debugging functionalities by integrating them into an environment supporting the parallel and distributed execution of genetic algorithms (UNL)

T 1.1.6 To evaluate and integrate the aspects developed in activities T 1.1.1, T 1.1.2, T1.1.3, T 1.1.4, T 1.1.5 and in related subtask T 1.2 (UNL, UM)

Deliverables from T 1.1:

Human Resources for T 1.1:

UNL: 6 Academic Researchers (2 PhDs, 4 PhD Students) and 2 M.Sc. Students.

Total manpower: 72 man*months


Co-ordinator: José Cunha

Participants: Pedro Medeiros, Vítor Duarte, João Lourenço, Maria Cecília Gomes, Rui Marques

2 M.Sc. Students (requiring grants)

UM: 2 Academic Researchers (PhD Students)


Participants: Luís Paulo Santos, João Luís Sobral (within their duties in Task T 1.3)


T 1.2 Suite of Parallel Programming Libraries

Cancelled due to lack of funds.


T 1.3 Scalable OO Programming and Run-time Environment, ParC++

This subtask includes the following activities:

T 1.3.1 To identify the services that should be included in the run-time system to ensure efficiency and portability; to study and test dynamic and distributed load balancing strategies (UM/UNL)

T 1.3.2 To implement a run-time version on PVM/MPI and to integrate it into the monitoring tools of task T 1.1 (UM based on work from UNL)

T 1.3.3 To extend the ParC++ model to support object based data-parallel programming, to include distributed state objects and method invocation broadcast (UM)

T 1.3.4 To include the PVM/MPI run-time system in the ParC++ environment, providing transparent dynamic load balancing and dynamic granularity control (UM)

T 1.3.5 To evaluate the ParC++ environment as far as programmer productivity and efficiency are concerned; this will be achieved by developing computer vision and computer graphics applications (UM/Groups from Numerical Methods/Computer Vision)

Deliverables from T 1.3:

Human Resources for T 1.3:

UM: 3 Academic Researchers (1 PhD, 2 PhD students)

Total manpower: 78 man*months


Co-ordinator: Alberto Proença

Participants: Luís Paulo Santos, João Luís Sobral



T2. To establish an Excellence Nucleus on Parallel Algorithms for Numerical Methods

Leader: Prof. Rui Ralha (UM)

Partners: FEUP, UM



Task description:

Many problems that arise in Engineering and Physics, namely in Fluid Dynamics, Structural Engineering, Computer Vision, Chemistry and Weather Forecasting, involve very large computations. This task aims to select applications, to identify adequate numerical methods, to produce and implement parallel algorithms, and to evaluate the final results. The search for parallel algorithms will be carried out mainly within the following areas: linear equations, optimisation, eigenvalues and singular values.

This task includes the following activities:


T 2.1 Distributed construction of the discretization matrices of the mathematical model, using domain decomposition techniques. The experience acquired by some members of our team in the parallelization of Fluid Dynamics applications will be continued in this project, with new applications and new decomposition strategies.

T 2.2 Selection of numerical methods: to identify algorithms that are the computational kernels of many scientific applications and require intensive computation; this process of selection will have to take into account the robustness of the numerical methods and the degree of potential parallelism that is offered by each method.

This subtask includes the following activities:

T 2.2.1 Algorithms for the computation of eigenvalues and singular values: this is an area where the advent of parallel computers has triggered intensive research, since the most popular sequential methods present important difficulties for parallel processing (mainly in a distributed-memory environment); this explains, for instance, the revival of the old Jacobi method for the diagonalisation of a symmetric matrix. In this context, we intend to follow a very specific line of research, which consists in replacing the classical two-sided orthogonal transformations with theoretically equivalent one-sided transformations.
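To make the one-sided idea concrete, the following is an illustrative sketch (not the project's actual code) of Hestenes' one-sided Jacobi variant: plane rotations are applied to pairs of columns only, so each rotation is a local column update, a property that eases distributed-memory parallelisation. The sketch computes the singular values of a small dense matrix stored column-wise.

```cpp
#include <cmath>
#include <vector>

// Illustrative one-sided (Hestenes) Jacobi sketch: sweep over column
// pairs, rotating each pair so the two columns become orthogonal;
// at convergence the column norms are the singular values.
using Matrix = std::vector<std::vector<double>>;  // cols[j] is column j

std::vector<double> one_sided_jacobi_sv(Matrix cols, int sweeps = 30) {
    const std::size_t n = cols.size();
    const std::size_t m = cols.empty() ? 0 : cols[0].size();
    for (int s = 0; s < sweeps; ++s)
        for (std::size_t p = 0; p + 1 < n; ++p)
            for (std::size_t q = p + 1; q < n; ++q) {
                double app = 0, aqq = 0, apq = 0;
                for (std::size_t i = 0; i < m; ++i) {
                    app += cols[p][i] * cols[p][i];
                    aqq += cols[q][i] * cols[q][i];
                    apq += cols[p][i] * cols[q][i];
                }
                if (std::fabs(apq) < 1e-15) continue;  // already orthogonal
                // rotation angle that orthogonalises columns p and q
                double tau = (aqq - app) / (2.0 * apq);
                double t = (tau >= 0 ? 1.0 : -1.0) /
                           (std::fabs(tau) + std::sqrt(1.0 + tau * tau));
                double c = 1.0 / std::sqrt(1.0 + t * t), sn = c * t;
                for (std::size_t i = 0; i < m; ++i) {
                    double vp = cols[p][i], vq = cols[q][i];
                    cols[p][i] = c * vp - sn * vq;
                    cols[q][i] = sn * vp + c * vq;
                }
            }
    std::vector<double> sv;
    for (std::size_t j = 0; j < n; ++j) {
        double nrm = 0;
        for (std::size_t i = 0; i < m; ++i) nrm += cols[j][i] * cols[j][i];
        sv.push_back(std::sqrt(nrm));
    }
    return sv;
}
```

Because each rotation touches only two columns, disjoint column pairs can be rotated concurrently, which is the appeal of the one-sided formulation on distributed-memory machines.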

T 2.2.2 Algorithms for systems of linear equations: the choice of the appropriate numerical method depends on the characteristics of the problem, and it has to be combined with the domain decomposition so as to concatenate the solutions obtained in parallel on the different processors (subdomains) into the global solution. The experience of our group in the use of robust iterative methods specially suited for nonsymmetric problems (namely GMRES), with different preconditioners, will be useful for this research kernel. This work will be continued by applying it to problems with other properties, which admit other preconditioners.

T 2.2.3 Algorithms for optimisation: the advent of high-performance computing has allowed the tackling of very large-scale optimisation problems. Following previous work on power scheduling optimisation, we intend to study the behaviour of systems under different demand scenarios; therefore we will investigate methods for stochastic optimisation and parallel dynamic programming.

T 2.3 Parallelisation of algorithms: to use different parallelisation techniques (domain decomposition, data parallelism, farming, etc.) to obtain efficient and reliable parallel algorithms; one major aspect we will be concerned with in this context is the study of the scalability of the parallel algorithms.
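The farming technique named above can be sketched in a few lines. This is an illustrative shared-memory analogue, not the project's message-passing code: a pool of workers pulls independent work items (here, rows of a matrix-vector product) from a shared counter until none remain, and scalability comes from varying the worker count.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Illustrative task-farming sketch: each worker repeatedly claims the
// next unprocessed row via an atomic counter, so faster workers simply
// take more rows -- a simple form of dynamic load balancing.
std::vector<double> farmed_matvec(const std::vector<std::vector<double>>& A,
                                  const std::vector<double>& x,
                                  unsigned nworkers) {
    std::vector<double> y(A.size(), 0.0);
    std::atomic<std::size_t> next{0};  // shared work counter
    auto worker = [&] {
        for (std::size_t i; (i = next.fetch_add(1)) < A.size(); ) {
            double s = 0;
            for (std::size_t j = 0; j < x.size(); ++j) s += A[i][j] * x[j];
            y[i] = s;  // each row is written by exactly one worker
        }
    };
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < nworkers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return y;
}
```

On a distributed-memory machine the same pattern appears as a master process handing work items to worker processes over PVM/MPI messages.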

T 2.4 Implementation of the parallel algorithms: to write code (Fortran, C) for the selected algorithms; for reasons of efficiency and portability across a wide range of machines, the developed code will integrate, as much as possible, the optimised routines of the BLAS library as building blocks.

T 2.5 Testing and Evaluation: to evaluate, in practical tests on a parallel machine, the quality of the produced codes, taking into account different aspects: robustness, functionality, scalability and efficiency; in each case, i.e., for each one of the algorithms produced, this evaluation will include a comparison with the results achieved by routines already in use in available libraries (LAPACK, ScaLAPACK).

Deliverables from T 2:

Human Resources for T 2:

UM/FEUP: 3 Academic Researchers (3 PhDs), 1 PhD Student and 1 M.Sc. Student

Total manpower: 60 man*months


Co-ordinator: Rui Ralha

Participants: Pedro Oliveira, Filomena Almeida, Ana Julia Viamonte, 1 M.Sc. Student (requiring a grant)



T3. To establish an Excellence Nucleus on Scalable Computer Vision

Leader: Prof. Hans du Buf (UAlg)

Partners: UM



Task description:

This task concerns the development of new, parallel algorithms for computer vision. Specifically, this proposal will address algorithm design for edge-preserving smoothing and segmentation of three-dimensional data (e.g. tomography, underwater acoustics), as well as for motion estimation in 2D image sequences.

These tasks will use, whenever possible, the environments and tools being developed in Task 1.

The goals and backgrounds are detailed in the following subtasks:


T 3.1 Edge-preserving smoothing

Edge-preserving smoothing by adaptive filtering is known for excessive CPU times, due to multiple iterations, and for instability, i.e. most known methods will affect edge sharpness while smoothing noise within regions. This holds for the 2D case, but poses an even bigger problem in 3D. For this reason, current effort in algorithm design includes the development of scalable algorithms specifically for MPP systems. This subtask includes the parallel implementation of a few existing methods, extending 2D algorithms to 3D, which will allow a comparison of results (filtering quality and CPU times) with the new methods. It also includes the development of new 3D methods, with an emphasis on scalability.
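One classical edge-preserving idea can be sketched in 1-D as a sigma filter; this is an illustrative example of the filter family, not one of the new methods this subtask will develop. Each sample is replaced by the mean of those neighbours whose value lies within a threshold of its own, so noise inside a region is averaged away while samples across an edge are excluded from the average.

```cpp
#include <cmath>
#include <vector>

// Illustrative 1-D sigma filter: average only the neighbours whose
// intensity is within `threshold` of the centre sample, so sharp
// edges are never mixed into the local mean.
std::vector<double> sigma_filter_1d(const std::vector<double>& in,
                                    int radius, double threshold) {
    std::vector<double> out(in.size());
    for (int i = 0; i < static_cast<int>(in.size()); ++i) {
        double sum = 0;
        int cnt = 0;
        for (int k = -radius; k <= radius; ++k) {
            int j = i + k;
            if (j < 0 || j >= static_cast<int>(in.size())) continue;
            if (std::fabs(in[j] - in[i]) <= threshold) { sum += in[j]; ++cnt; }
        }
        out[i] = sum / cnt;  // cnt >= 1: the sample itself always qualifies
    }
    return out;
}
```

In 3D the neighbourhood becomes a voxel cube, which is where the CPU-time problem mentioned above, and hence the need for scalable MPP algorithms, comes from.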

T 3.2 Image segmentation in 3D

One of the best methods for segmenting 2D images is based on constructing a quadtree, applying clustering at a certain tree level, and then projecting the initial boundaries down level by level, with a boundary refinement at each level on the basis of the information available. This is reasonably fast because of the data reduction in the tree, but the boundary accuracy is often poor. An excellent method for obtaining accurate boundaries is stochastic relaxation labeling (SRL), which exploits a dictionary of edge and vertex configurations, but its CPU time is unacceptable even in 2D (not to speak of 3D), because the dictionary must be checked at each pixel (voxel) position.

This subtask proposes to develop a hybrid and scalable method, which consists of constructing an octree and a first clustering at a given tree level, followed by SRL for refining the initial boundaries. During the down-projection, SRL will also be applied, but only in corridors around the boundary estimates, which will result in good quality and much smaller CPU times. Load balancing strategies under study in T1 will play an important role in this subtask, because an equidistant grid partitioning of the data cube implies an unequal amount of data analysis. Subtask T 3.1 fits into T 3.2 when seen as a preprocessing procedure for segmentation.
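The tree-construction step described above can be sketched as follows; this is an illustrative quadtree (2D) split with the SRL refinement stage omitted, not the hybrid method itself. A square image block becomes a leaf when its intensity range is within a homogeneity threshold, otherwise it is split into four sub-blocks; homogeneous regions collapse into few leaves, which is the data reduction the text refers to.

```cpp
#include <vector>

// Illustrative recursive quadtree split: a size x size block rooted at
// (r, c) is a leaf if its intensity range is within `thresh`,
// otherwise it splits into four quadrants. Returns the leaf count,
// a simple measure of the data reduction achieved.
using Image = std::vector<std::vector<int>>;

int quadtree_leaves(const Image& img, int r, int c, int size, int thresh) {
    int lo = img[r][c], hi = lo;
    for (int i = r; i < r + size; ++i)
        for (int j = c; j < c + size; ++j) {
            if (img[i][j] < lo) lo = img[i][j];
            if (img[i][j] > hi) hi = img[i][j];
        }
    if (hi - lo <= thresh || size == 1) return 1;  // homogeneous leaf
    int h = size / 2;                              // split into 4 quadrants
    return quadtree_leaves(img, r,     c,     h, thresh) +
           quadtree_leaves(img, r,     c + h, h, thresh) +
           quadtree_leaves(img, r + h, c,     h, thresh) +
           quadtree_leaves(img, r + h, c + h, h, thresh);
}
```

The 3D octree case is identical in structure, with eight sub-cubes per split; the hybrid method would then refine only the boundaries between neighbouring leaves with SRL.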

T 3.3 Motion estimation

The extraction of motion information from sequences of images is a well-known problem in computer vision. Existing techniques are based on coarse-to-fine block matching in a quad/octree and on the analysis of phase information from Gabor/wavelet filters in different orientations and scales. In this task, two different and complementary approaches will be used:

  1. a spatio-temporal smoothing technique followed by a 3D boundary detection, and
  2. a neural net based approach where training strategies to estimate local motion will be defined to take advantage of a parallel execution environment. The training of a neural net is a time consuming task that will profit from a parallel implementation, provided the right strategy is chosen.

We emphasise the relation between the first approach (1) and subtasks T 3.1 and T 3.2, which will lead to an active collaboration between the UM and UAlg partners.
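The block-matching stage of the existing coarse-to-fine techniques can be sketched in 1-D; this is an illustrative example, not part of the approaches proposed above. The displacement of a block between two frames is taken as the shift that minimises the sum of absolute differences (SAD) over a search range.

```cpp
#include <cstdlib>
#include <vector>

// Illustrative 1-D block matching: try every shift d in
// [-max_shift, max_shift] and keep the one with the smallest SAD
// between the block in frame f0 and the shifted block in frame f1.
int best_shift_1d(const std::vector<int>& f0, const std::vector<int>& f1,
                  int block_start, int block_len, int max_shift) {
    int best = 0;
    long best_sad = -1;
    for (int d = -max_shift; d <= max_shift; ++d) {
        long sad = 0;
        bool valid = true;
        for (int i = 0; i < block_len; ++i) {
            int j = block_start + i + d;
            if (j < 0 || j >= static_cast<int>(f1.size())) { valid = false; break; }
            sad += std::labs(static_cast<long>(f0[block_start + i]) - f1[j]);
        }
        if (valid && (best_sad < 0 || sad < best_sad)) { best_sad = sad; best = d; }
    }
    return best;
}
```

In the coarse-to-fine quad/octree setting, the search is first run on large blocks at a coarse level and the estimate is then refined at finer levels, which keeps the search range, and hence the CPU time, small.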

Deliverables from T3:

Human Resources for T 3:

UAlg: 1 Academic Researcher (PhD), 1 PhD Student and 2 M.Sc. Students.

Total manpower: 54 man*months


Co-ordinator: Hans du Buf

Participants: Fernando de Gouveia (PhD student, requiring a grant), João Rodrigues (M.Sc. student)

UM: 1 Academic Researcher (PhD) and 2 M.Sc. Students.

Total manpower: 24 man*months


Co-ordinator: Alberto Proença

Participants: Vitor Manuel Filipe (M.Sc. student) plus 1 M.Sc. Student (requiring a grant)

