Answers OnStartups is a question and answer site for entrepreneurs looking to start or run a new business. It's 100% free, no registration required.

This will be tough to explain but....

What do you do when you are developing software that needs to deal with a large number of variations in semi-structured data from the web?

I told myself it was finished, but today, when I fed it more random samples, it failed. Still, I am making progress as more bugs get fixed.

I feel like I'm so close, yet it also feels like every step forward results in two steps backward because more bugs are discovered. I'm not trying to be a perfectionist, but I feel that it needs to handle most variations in semi-structured data to be useful to my future users.

Past clients I have worked for have shown strong interest in this software, but I feel that once they realize they are getting a poor success rate, I will lose them.

Mentors say that I should be releasing early and often in small increments. However, I feel that my software currently does not perform well enough even for that.

My patience is running out and I am burning out. I have been working on this software and researching for the past 2 years. My release date is late January, but the number of bugs (caused by large variations in sample size) makes that difficult.

I'm not sure if I am expressing myself correctly, but this software is not a typical CRM or simple game with clear-cut features. At a minimum, its function is to deal with large variations across semi-structured data found on the web, but reaching this goal seems to require much more testing and fixing.

Every bug I fix does bring progress, but it just seems endless.

UPDATE: Happy New Year! Hi everyone, I realized why I kept getting the same bugs. The problem is that my algorithm overlooked a very simple parameter (one of the answers here gave me the idea). Thank you, James. It makes perfect sense why I kept seeing similar bugs over and over! I wish there was a way to select more than one answer! All of the answers here are equally great!

Are you using test-driven development? If so, it sounds like you need to write more test cases to catch the edge cases. If not, I highly recommend you check it out. – Ben Dec 30 '10 at 6:28
Where can I find some tutorials on TDD? I have been interested in this model for a while but never got around to it. – Kim Jong Woo Dec 30 '10 at 19:23
I recommend reading these two books: first, "Test-Driven Development" by Kent Beck and then "The Art of Unit Testing" by Roy Osherove. – Brandon King Dec 30 '10 at 20:55

5 Answers

up vote 2 down vote accepted

Without knowing the specifics of your product I can't offer any fixes. But, if you have been working on this for 2 years and you are still discovering major problems, it sounds like you have a fundamental problem with your parser/algorithm/methodology.

Are you just collecting more and more datasets/samples and then 'bug fixing' to handle each 'exception'? Did you start off with a clear concept of how you were going to solve the problem, or are you guessing your way through each new data sample?

To be the bearer of bad news: you might need to stop, take a deep breath, then revisit your core code and decide if it is properly designed to handle the job. Do you need a modular approach? For example, if you are parsing web logs you might need an Apache module, an IIS module, and so on, where you can add specific features instead of a monolithic one-thing-does-it-all-and-will-rule-the-world. It might give you a better marketing/sales approach too. When a customer has a new or different data set, you can sell or publish a new module.
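
As a rough illustration of the module idea, here is a minimal Python sketch of a parser registry. Everything in it is hypothetical: the `register` decorator, the `PARSERS` table, and both log layouts are simplified assumptions made for the example, not the real Apache or IIS formats and not the asker's code.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a format name to the parser "module" for it.
PARSERS: Dict[str, Callable[[str], dict]] = {}

def register(fmt: str):
    """Decorator that adds a parser for one format to the registry."""
    def wrap(fn: Callable[[str], dict]) -> Callable[[str], dict]:
        PARSERS[fmt] = fn
        return fn
    return wrap

@register("apache")
def parse_apache(line: str) -> dict:
    # Simplified assumption: host is the first field, the request is quoted.
    host = line.split(" ", 1)[0]
    request = line.split('"')[1]
    return {"host": host, "request": request}

@register("iis")
def parse_iis(line: str) -> dict:
    # Simplified assumption: fields are "date time host method uri ...".
    _date, _time, host, method, uri = line.split(" ")[:5]
    return {"host": host, "request": f"{method} {uri}"}

def parse(fmt: str, line: str) -> dict:
    """Dispatch to the module for this format; fail loudly if none exists."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"no module installed for format {fmt!r}")
    return parser(line)
```

Supporting a new customer's format then means adding one more registered function, while the core dispatch code stays untouched.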

Basically, each new dataset reveals bugs or requires a minor addition of a new rule or piece of logic. The original algorithm seems to hold, but these small external variations in the datasets require some extra feature or logic rule to increase accuracy. – Kim Jong Woo Dec 30 '10 at 2:34
I have developed an algorithm to solve my main problem. Then, as I process new datasets, I come across situations where the algorithm fails, so I add a new fix to handle each one. I am hoping that once enough samples have been collected and fixes provided, the success rate and accuracy of my system will increase. The latter is what I observe, but it appears more fixes need to be added before the desired accuracy is achieved. – Kim Jong Woo Dec 30 '10 at 2:41
Years ago when file viewers were problematic some view-every-file programs offered a 'file guarantee' - if a file can't be viewed then send it to us and we'll make it work. A clever way of collecting new file formats while setting a good user expectation. Add an 'oops, this file didn't parse, want to send it to tech support?' option. Your next problem will be trying to make money while continually supporting existing customers and their file requests – james Dec 30 '10 at 5:09
James, you pointed out that there could be a fundamental problem with the algorithm, and this made me take a big-picture look at my algo, and BAM, I discovered a serious flaw that I had overlooked! I can't believe I did not see this a year ago. I chose this answer because it directly helped me. I would also choose the answer below. – Kim Jong Woo Jan 12 '11 at 3:04
In addition, I have been thinking of implementing that as well. Upon error, it would ask for confirmation to alert the techies (me). That's the next uphill battle: trying to squeeze revenue out of clients, meeting their satisfaction level, and improving my current algo. I have done this before, and this is far more stressful. I think I might even consider taking on partners at this point. – Kim Jong Woo Jan 12 '11 at 3:08
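
The "didn't parse, want to send it to tech support?" idea from the comments above could look roughly like the following Python sketch. The names `parse_or_report` and `failed_samples` are made up for illustration, and the in-memory list is only a stand-in for whatever channel would actually deliver samples to support.

```python
from typing import Callable, List, Optional

# Hypothetical in-memory queue; a real product would forward these to support.
failed_samples: List[str] = []

def parse_or_report(
    parse: Callable[[str], dict], raw: str, user_consents: bool
) -> Optional[dict]:
    """Try to parse; on failure, keep the sample for review if the user agrees."""
    try:
        return parse(raw)
    except Exception:
        # Only collect the failing input when the user has confirmed.
        if user_consents:
            failed_samples.append(raw)
        return None
```

Each collected sample is a real-world input the parser could not handle, which makes it a natural candidate for the next fix.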

Take a deep breath and hang in there. Every piece of software has bugs, so don't let them get you too frustrated. I'd recommend doing something like the following:

  • Prioritize your bugs in order of severity and importance to the main features of your software.
  • Fix the most mission critical bugs so that your main features are more or less stable.
  • As you fix each bug, write a unit test for it. It'll take a few extra minutes now, but it will save time and provide more stability for the future.
  • Don't necessarily worry too much about non-mission-critical bugs. If it doesn't affect your main features, you can update it in a future release.
  • Slap a beta tag on your software. Generally, people will be more forgiving of bugs in a beta version. They may even expect them.
  • Acknowledge known bugs for each release. Transparency helps and your clients and community will be more understanding if they know you're working on it.
  • Don't develop new features until you take care of the majority or all of the existing bugs.
  • Iterate quickly.
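
The "write a unit test for each bug you fix" step can be sketched with Python's standard `unittest` module. The helper `normalize_price` and the two inputs are invented for illustration; imagine each input as a sample that once broke the parser.

```python
import unittest

def normalize_price(text: str) -> float:
    """Hypothetical extraction helper: turn a scraped price string into a float."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned)

class PriceRegressionTests(unittest.TestCase):
    # Each test pins down one input that (in this made-up scenario) used to fail.
    def test_thousands_separator(self):
        self.assertEqual(normalize_price("$1,299.00"), 1299.0)

    def test_surrounding_whitespace(self):
        self.assertEqual(normalize_price("  $15 \n"), 15.0)
```

Saved in a file, these run with `python -m unittest`. Once a failing sample is captured as a test, a later fix that reintroduces the bug shows up immediately instead of resurfacing in front of a client.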

If you don't feel like your main features are stable enough by your release date, push back the release. Delays happen in software all the time and it will be easier to overcome a delay by a few weeks or even months than the initial bad reputation that would result from your main features not working.

+1 for unit testing – Brandon King Dec 29 '10 at 22:44
In that situation I usually light one up. – Frank Dec 30 '10 at 1:45
Also sounds like you need to get more hands on deck. – Frank Dec 30 '10 at 1:54

The lean startup model is fantastic and I support it wholeheartedly, but the truth is, it doesn't work for every type of business.

There are some business models where it is absolutely critical to your market to release a product that has a full feature set and a high accuracy.

It sounds like you need to take some time to get your head out of the coding and think about your software as a business. Is it critical to release with the degree of accuracy your gut is telling you? If so why? What will the impact on your customers be if the software isn't as accurate as you would like?

If a high degree of accuracy is imperative, maybe you need to look for angel funding to get to the point you need to before release. Not the lean startup model, but still a relevant one depending on what your business model is.

Agreed, I'm a Scrum user and it just doesn't seem to apply to this case. I wouldn't rush the release -- What matters is whether he can deconstruct the minimally viable product further. Is there a different type of product you can release with the work you've already put in? – Henry the Hengineer Dec 29 '10 at 11:34
I too am a SCRUM BAG, and must admit it does not work on all projects, especially in the B2B market, or in one with highly established players or complex requirements. Sure, you could build a simple app in a few days, but you usually get only one chance to make a first impression. If your customers see your app as "limited", then good luck convincing them it has evolved. People still hate Windows because of Windows ME, 95, or Vista. It's hard to make a second impression. Make sure you understand that what you are building is of value, and worry less about the timelines. Look to hire help if possible. – Frank Dec 30 '10 at 1:56
You are right, it's software aimed at businesses, and there are a few competitors. I feel that if I fail at accuracy and rush to launch, I will only be giving up my leads to my competitors. This is my greatest fear. – Kim Jong Woo Dec 30 '10 at 2:44
This would be the second answer I choose. You are correct, I don't think the same model fits all. Some things need to deliver a good first impression. – Kim Jong Woo Jan 12 '11 at 3:05

Seth Godin talks about the ultimate technique in this video.

There is one secret to shipping on time and on budget. The secret is: when you run out of time or you run out of money, you ship. Then you are on time and on budget.

Time to market is important for technology products.

I suggest you use Scrum to achieve that goal.

I don't quite agree with him. For instance, if you were building a car, could you ship one that has only a chassis and no ergonomics installed, but that can be driven uncomfortably? – Kim Jong Woo Dec 30 '10 at 2:31
Your lizard brain is talking :) – user3997 Dec 30 '10 at 9:19
Godin's posts don't refer to high-accuracy products. While product revision would make Scrum more applicable, Godin does not provide the advice necessary to make that kind of transition. So no, nothing to do with lizard-brain. user3650's concerns are justified in this particular instance. – Henry the Hengineer Dec 30 '10 at 22:34
He doesn't go into the details because it's not needed. If you are given 30 days to build a car, what you will prioritize will be different. – user3997 Dec 30 '10 at 23:01

Cut scope.

Your basic problem is that there is a lot of data on the internet, and you can't possibly support every type of it. It's just not going to work: by the time you add support for all the types of data that exist today, new types will pop up. You will never finish.

Think about your first client (it can be an imaginary client) and the smallest subset of data they absolutely need. That is the smallest subset where using your software provides any value whatsoever, not the smallest subset you think is acceptable.

Release this initial version and continue to prioritize new data types by the frequency your customers are failing to process them.
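
Prioritizing by failure frequency can be as simple as counting. This is a minimal Python sketch; the `rank_failures` helper and the sample log entries are hypothetical and assume each failed input has already been tagged with a data type.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def rank_failures(failed_types: Iterable[str], top: int = 3) -> List[Tuple[str, int]]:
    """Return the data types that fail most often, most frequent first."""
    return Counter(failed_types).most_common(top)

# Hypothetical failure log: one entry per input a customer could not process.
failure_log = ["rss", "html-table", "rss", "json-ld", "rss", "html-table"]
ranked = rank_failures(failure_log)
# ranked == [("rss", 3), ("html-table", 2), ("json-ld", 1)]
```

The type at the top of the list is the next one to support; anything that rarely fails can wait for a later release.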

(I don't know exactly what data you are trying to process, so I may be wrong but from your question I get the feeling that it is one of those endlessly dynamic types of data where you can't possibly support everything).

this is a great answer. – Kim Jong Woo Dec 30 '10 at 2:29
