Software Testing Club - An Online Software Testing Community

What test data does your team use?

How does your team obtain the test data?

If you cannot use a snapshot of prod data, or if some elements of prod data are sensitive and you cannot have access to them (patient information, credit card numbers, social security numbers, etc.), what methods do you use to build test data?
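For the sensitive-data case raised above, one common method is to mask the sensitive columns in a copy of production data rather than generate everything from scratch. The sketch below is a minimal illustration, not any particular tool's behaviour: the column names (`patient_name`, `card_number`, `ssn`) and the salt are hypothetical, and the hashing is a simple layout-preserving substitution, not true format-preserving encryption.

```python
import hashlib
import string

# Hypothetical column names treated as sensitive; adjust to the real schema.
SENSITIVE_COLUMNS = {"patient_name", "card_number", "ssn"}

def mask_value(value: str, salt: str = "test-env-salt") -> str:
    """Deterministically replace each letter/digit while keeping the layout.

    The same value and salt always give the same output, so a masked key in
    one table still matches the same masked key in another table.
    """
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]  # cycle through the 32 hash bytes
        if ch.isdigit():
            out.append(string.digits[b % 10])
        elif ch.isalpha():
            letters = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(letters[b % 26])
        else:
            out.append(ch)  # keep separators such as '-' and ' ' so formats survive
    return "".join(out)

def mask_row(row: dict) -> dict:
    """Mask only the sensitive columns; pass everything else through unchanged."""
    return {col: mask_value(val) if col in SENSITIVE_COLUMNS else val
            for col, val in row.items()}

if __name__ == "__main__":
    prod_row = {"id": "42", "patient_name": "Jane Doe",
                "card_number": "4111 1111 1111 1111", "ssn": "123-45-6789"}
    print(mask_row(prod_row))
```

Because the salt drives the mapping, it should be kept out of the test environment itself. Masked card numbers will usually fail a Luhn check, so an application that validates them may need a different masking rule for that column.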

Replies to This Discussion

I'm very interested in this as well. I work for a company that develops a commercial test data generation tool, and we'd very much like to ensure that our tool ends up corresponding to what the market wants. My theory is that most shops use production data, whether it's "officially" allowed or not, rather than taking the effort to generate the data themselves. I'd be very curious to know what the experiences of other testers are regarding using realistic data to test against.

David,

My experience matches your theory; I've always been able to get a copy of at least some production data. But some of the data doesn't give me all of what I need, so I use the prod data as a base and build up from there. I blogged about sample data here: http://www.testingreflections.com/node/view/7701. While I certainly don't want to turn the forum commercial, I'd be interested in having a link to the company you work at and the tool you're referencing. Thanks for joining the discussion.

Sorry, I didn't mention it so as not to be accused of using this discussion to plug the product. I don't work in sales, after all! The company is Red Gate Software and the product is SQL Data Generator (http://www.red-gate.com/products/SQL_Data_Generator/index.htm). It generates data for SQL Server databases only. I'd be delighted if you (or any members of this forum) could give it a go and give me some feedback, either via this forum or a review in a blog, or maybe there is a space in softwaretestingclub.com where product reviews can be posted? The default install is fully functional but expires after 14 days; however, I'd be happy to provide trial extension licenses as and when required. My interest here is to understand the needs of the testing community better so the product can be further improved.

David,

I'm aware of the company and the tool. It would be great to have time with the tool in exchange for feedback. If we need to arrange details offline, my email is karen@karennjohnson.com

Thanks Karen. I've just sent a message to your personal email address.

I've read your blog, and it appears that the two main aims are to identify the best data as test inputs and to ensure that volume tests are run. The latter tests can cover system stress, can identify memory leaks, and could also verify performance requirements. A data generation tool is ideal for such testing. Identifying data that is most likely to break an application is harder, as business knowledge may be required to achieve this. The obvious approach would be to simply generate a set of records using boundary values for the field data types, and to assume that if these work flawlessly, other combinations will too, but somehow I think that this naive method would miss many failure conditions. This is why it's so useful to have access to production data for testing purposes: it provides an entirely realistic data sample, whereas anything else would necessarily be second best.
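The naive boundary-value approach described above can be sketched as follows. This is a minimal illustration, assuming hypothetical column specs held in plain dicts (a real generator would read types and ranges from the database schema). It varies one column at a time through its edge values while holding the others at a nominal value, which is exactly why it can miss failures that only appear in particular combinations of values.

```python
import datetime

def _dedupe(values):
    # Preserve order while dropping repeats (e.g. when min is 0).
    return list(dict.fromkeys(values))

def boundary_values(spec):
    """Edge-of-range values for one column, given a hypothetical spec dict."""
    kind = spec["type"]
    if kind == "int":
        lo, hi = spec["min"], spec["max"]
        return _dedupe(v for v in (lo, lo + 1, 0, hi - 1, hi) if lo <= v <= hi)
    if kind == "varchar":
        n = spec["max_len"]
        return _dedupe(["", "x", "x" * n])
    if kind == "date":
        lo, hi = spec["min"], spec["max"]
        one_day = datetime.timedelta(days=1)
        return _dedupe(v for v in (lo, lo + one_day, hi - one_day, hi) if lo <= v <= hi)
    raise ValueError(f"unsupported column type: {kind}")

def boundary_records(columns, nominal):
    """One record per boundary value: vary a single column, hold the rest at nominal."""
    records = []
    for name, spec in columns.items():
        for value in boundary_values(spec):
            row = dict(nominal)
            row[name] = value
            records.append(row)
    return records

if __name__ == "__main__":
    columns = {
        "quantity": {"type": "int", "min": -5, "max": 100},
        "code": {"type": "varchar", "max_len": 3},
        "order_date": {"type": "date",
                       "min": datetime.date(2000, 1, 1),
                       "max": datetime.date(2099, 12, 31)},
    }
    nominal = {"quantity": 1, "code": "abc", "order_date": datetime.date(2010, 6, 1)}
    print(len(boundary_records(columns, nominal)))  # prints 12
```

Twelve records cover every single-column edge here, but none of them combines, say, a zero quantity with an empty code; that gap is what realistic production data tends to fill.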

Well, this is a good question: what test data does my team use, and how does it obtain that data?

As far as my project goes, as soon as the requirements are frozen for a release, we have a specialised team known as the test data collectors. This team reviews the requirements and prepares a document detailing which attributes of the incoming facet are needed and which source file will carry the data. They then drill down to the source file generators and ask that team to generate source files that meet the exact data requirements.

Once the data requirement analysis has been done, the request is forwarded to the source systems to generate the files with the data. The data is then loaded into the tables through the usual ETL process. At that point the test data for the specific release is sitting in your database, ready for testing. Testing starts from the receipt of the files containing the test data, covers the ETL process, and goes up to reporting and any further downstream feeds (if required).
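As a rough illustration of the load step and a first check on it, the sketch below loads a delimited test file into a table and reconciles the row count and amount total between file and table. The file layout (`account_id`, `amount`), the `transactions` table, the trimming rule and the use of SQLite are all assumptions made for the example; a real ETL process would use the project's own tooling.

```python
import csv
import io
import sqlite3

def load_test_file(conn, file_obj):
    """Extract rows from a CSV test file, apply a simple transform, and load them."""
    rows = []
    for rec in csv.DictReader(file_obj):
        # Hypothetical transform rules: trim the key and cast the amount to an integer.
        rows.append((rec["account_id"].strip(), int(rec["amount"])))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "account_id TEXT NOT NULL, amount INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO transactions (account_id, amount) VALUES (?, ?)", rows)
    conn.commit()
    # File-side control figures: number of records and total amount.
    return len(rows), sum(amount for _, amount in rows)

def reconcile(conn, file_count, file_total):
    """Compare the file-side figures with what actually landed in the table."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM transactions"
    ).fetchone()
    return count == file_count and total == file_total

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    test_file = io.StringIO("account_id,amount\n A1 ,100\nB2,-50\n")
    n, total = load_test_file(conn, test_file)
    print(n, total, reconcile(conn, n, total))  # prints: 2 50 True
```

In practice the file-side figures would come from the received file itself (or a trailer record, if the feed has one) and the table-side figures from the target database after the real ETL run.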

Usually the above process is carried out in CAT (Combined Application Testing), where end-to-end testing occurs. Apart from this, in the system test stage Data In uses the same files for data loading, Reporting uses a copy of the database for its part of the testing, and finally Data Out takes over for its part. However, in system testing we end up mocking up a lot of test data to carry out negative testing; CAT involves only positive end-to-end testing.
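Mocked negative test data is often built by taking one valid record and breaking one rule at a time, so that each rejected record points to a single cause. The sketch below assumes a hypothetical record layout and a stand-in `validate` function playing the role of the system under test; the rules themselves are invented for illustration.

```python
# Hypothetical valid record for a feed with three fields.
VALID = {"account_id": "A1", "amount": "100", "currency": "GBP"}

# Each mutation breaks exactly one rule, labelled with the rejection we expect.
MUTATIONS = [
    ("missing column", lambda r: {k: v for k, v in r.items() if k != "currency"}),
    ("missing account_id", lambda r: {**r, "account_id": ""}),
    ("non-numeric amount", lambda r: {**r, "amount": "12x"}),
    ("unknown currency", lambda r: {**r, "currency": "XXX"}),
]

def negative_cases(valid):
    """Build (expected_reason, bad_record) pairs from one valid record."""
    return [(reason, mutate(dict(valid))) for reason, mutate in MUTATIONS]

def validate(rec):
    """Stand-in for the system under test: return a rejection reason or None."""
    if any(field not in rec for field in ("account_id", "amount", "currency")):
        return "missing column"
    if not rec["account_id"]:
        return "missing account_id"
    try:
        int(rec["amount"])
    except ValueError:
        return "non-numeric amount"
    if rec["currency"] not in {"GBP", "USD", "EUR"}:
        return "unknown currency"
    return None

if __name__ == "__main__":
    assert validate(VALID) is None  # the base record must itself be valid
    for expected, record in negative_cases(VALID):
        actual = validate(record)
        print(f"{expected:<20} -> {'PASS' if actual == expected else 'FAIL'}")
```

Keeping the base record valid and changing one field per case makes it easy to see which rule a failing negative test is exercising.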

We generally use a snapshot of production data to carry out performance testing as well as pre-production release testing.

Ritwick,

Thanks for sharing your experience. You are fortunate (I think) to have data collectors. I wonder what skill set the data collectors have and what they do when they are not generating or gathering data. Are they developers or testers?

You raise a good point about having to mock up data to test negative conditions. I have had the same experience. I had business users who were confused by this and assumed that if they tested with a snapshot of prod, everything (from a test perspective) would be covered. But there are additional tests, certainly negative tests, that sometimes require mock data.

I'm interested that you have had an environment for end-to-end testing. That was also fortunate.

Thanks for sharing. This type of specific reply is good to hear, and yet not many people have shared one. I suspect that much of the testing in the BI space is done by developers; I may be wrong about this, but I have yet to hear from many testers.

Karen

Karen,

Well, I'll try to clear up the questions you have raised.

1. What skill set do the data collectors have, and what do they do when they are not generating or gathering data? Are they developers or testers?

Well, the Data Collection team is made up of testers, and they work in close tandem with the business analysts who drive the requirements. For example, in a module with 5 testers, 1 senior tester joins the data collectors team. When their job is over, the data collectors take part in the test case review so they can check whether the test cases cover the data they have collected.

Well, in the BI space the testers basically come from development; however, the leads come from a specialist testing background.

Having specific environments for testing in BI is very useful, or should I say a necessity.

Ritwick

Thanks for clarifying about the team and the skill set.

Having data and an environment is important; agreed.

© 2010   Created by Rosie Sherry
